The H.264 Constrained Baseline Profile (CBP) is a subset of both the
Main Profile and the Baseline Profile. In principles, a hardware
decoder that supports either of those can decode CBP content. As it
happens, Main is supported by all VDPAU drivers, and Baseline is not.
So favor map CBP to MP for now. Hopefully in the future libvdpau will
offer an explicit choice for CBP.
This fixes bug 757.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Whenever av_gettime() is used to measure relative period of time,
av_gettime_relative() is prefered as it guarantee monotonic time
on supported platforms.
Signed-off-by: Martin Storsjö <martin@martin.st>
Currently, this option is accessed through AVCodecContext.mb_threshold,
which originally controlled reusing MB data when transcoding mpeg to
mpeg. Since the libvpx meaning is completely different from the original
mpegvideo meaning, it is better to use a separate private option for
this.
For streams which contain DRC metadata, the FDK decoder is able to
control rendering of the decoded output. The rendering parameters
are detailed in fdk_aac_dec_options [].
The default behavior is left up to the decoder.
Signed-off-by: Martin Storsjö <martin@martin.st>
The FDK decoder is capable of producing mono and stereo downmix from
multichannel streams. These streams may contain metadata that control
the downmix process. The decoder requires an Ancillary Buffer in order to
correctly apply downmix in streams containing downmix Metadata. The
decoder does not have an API interface to inform of the presence of
Metadata in the stream, and therefore the Ancillary Buffer is always
allocated whenever a downmix is requested.
When downmixing multichannel streams, the decoder requires the output
buffer in aacDecoder_DecodeFrame call to be of fixed size in order to
hold the actual number of channels contained in the stream. For example,
for a 5.1ch to stereo downmix, the decoder requires that the output buffer
is allocated for 6 channels, regardless of the fact that the output is in
fact two channels.
Due to this requirement, the output buffer is allocated for the maximum
output buffer size in case a downmix is requested (and also during
decoder init). When a downmix is requested, the buffer used for output
during init will also be used for the entire duration the decoder is open.
Otherwise, the initial decoder output buffer is freed and the decoder
decodes straight into the output AVFrame.
Signed-off-by: Martin Storsjö <martin@martin.st>
When decoding, this field holds the inverse of the framerate that can be
written in the headers for some codecs. Using a field called 'time_base'
for this is very misleading, as there are no timestamps associated with
it. Furthermore, this field is used for a very different purpose during
encoding.
Add a new field, called 'framerate', to replace the use of time_base for
decoding.
Decoding acceleration may work even if the codec level is higher than
the stated limit of the VDPAU driver. Or the problem may be considered
acceptable by the user. This flag allows skipping the codec level
capability checks and proceed with decoding.
Applications should obviously not set this flag by default, but only if
the user explicitly requested this behavior (and presumably knows how
to turn it back off if it fails).
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Currently, the amount of padding inserted at the beginning by some audio
encoders, is exported through AVCodecContext.delay. However
- the term 'delay' is heavily overloaded and can have multiple different
meanings even in the case of audio encoding.
- this field has entirely different meanings, depending on whether the
codec context is used for encoding or decoding (and has yet another
different meaning for video), preventing generic handling of the codec
context.
Therefore, add a new field -- AVCodecContext.initial_padding. It could
conceivably be used for decoding as well at a later point.
This makes the addition of arch optimized functions easier.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The register function now specifies that the user callback should
leave things in the same state that it found them on failure but
that failure to destroy is ignored by the library. The register
function is now explicit about its behavior on failure
(it unregisters the previous callback and destroys all mutex).
Signed-off-by: Manfred Georg <mgeorg@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This function provides an explicit VDPAU device and VDPAU driver to
libavcodec, so that the application is relieved from codec specifics
and VdpDevice life cycle management.
A stub flags parameter is added for future extension. For instance, it
could be used to ignore codec level capabilities (if someone feels
dangerous).
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is necessary to recreate the decoder with the correct parameters,
as not all codecs invoke get_format() in this case.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Using the not so new init and uninit callbacks, avcodec can now take
care of creating and destroying the VDPAU decoder instance.
The application is still responsible for creating the VDPAU device
and allocating video surfaces - this is necessary to keep video
surfaces on the GPU all the way to the output. But the application
will no longer needs to care about any codec-specific aspects.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is similar to what is done in libx264.c
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
These function pointers already existed in the ARM code. Adding them globally
allows calls to the function pointers to access arch-optimized versions of the
functions transparently.
Only set a value if _WIN32_WINNT is undefined or smaller than 0x0600. This is
cleaner than unconditional definition and avoids a number of redefinition
warnings. Also only define a value in one of the two dxva2 headers.
MpegEncContext based decoders are only fully initialized after the first
ff_thread_get_buffer() call. The RV30/40 decoders may fail before a frame
buffer was requested. ff_mpeg_update_thread_context() fails on half
initialized MpegEncContexts. Since this can only happen before a the
first frame was decoded there is no need to call
ff_mpeg_update_thread_context().
Based on patches by John Stebbins and tested by John Stebbins.
CC: libav-stable@libav.org
The packet buffer allocation considers the alpha channel as DCT-coded,
while it is actually run-coded and thus requires a larger buffer.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The buffer allocation may be incorrect (e.g. with an alpha plane),
and currently causes the buffer to be set to NULL by init_put_bits,
causing a crash later on.
So, detect that situation, and if detected, reallocate the buffer
and ask for a sample that shows the problem.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
If the allocated size, despite best efforts, is too small, exit
with the appropriate error.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The LZMA support is a semi-official extension supported by libtiff 4.0.0
and later.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Such changes are neither allowed nor supported
Found-by: ami_stuff
Bug-Id: CVE-2013-7020
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reduces the number of calls to tmvp derivation from 933685 to 586271 on
a sequence.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Only use PAL8 if palette is present, else use GRAY8 for pixfmt.
Instead of simulating a grayscale palette, use real grayscale pixels, if no
palette is actually defined.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
1) each of the loops run within a single CTB, so the relevant reference
list is constant
2) when that CTB is, or lies on the same slice as, the current one, we
can use a simple access instead of a relatively expensive call to
ff_hevc_get_ref_list()
It allows attaching other external, opaque data to the frame and passing it
through the reordering process, for cases when the caller wants other data
than just the plain packet pts. There is no way to cleanly achieve this
without the field.
The input data must remain constant, make a copy instead. This is in
theory a performance hit, but since I failed to find any samples
using this feature, this should not matter in practice.
Also, check the size of the header, avoiding invalid reads on truncated
data.
CC:libav-stable@libav.org
The previous implementation of the parser made four passes over each input
buffer (reduced to two if the container format already guaranteed the input
buffer corresponded to frames, such as with MKV). But these buffers are
often 200K in size, certainly enough to flush the data out of L1 cache, and
for many CPUs, all the way out to main memory. The passes were:
1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes
After this, the unescaped data was parsed to extract certain header fields,
but because the unescape operation was so large, this was usually also
effectively operating on uncached memory. Most of the unescaped data was
simply thrown away and never processed further. Only step 2 - because it
used memcpy - was using prefetch, making things even worse.
This patch reorganises these steps so that, aside from the copying, the
operations are performed in parallel, maximising cache utilisation. No more
than the worst-case number of bytes needed for header parsing is unescaped.
Most of the data is, in practice, only read in order to search for a start
code, for which optimised implementations already existed in the H264 codec
(notably the ARM version uses prefetch, so we end up doing both remaining
passes at maximum speed). For MKV files, we know when we've found the last
start code of interest in a given frame, so we are able to avoid doing even
that one remaining pass for most of the buffer.
In some use-cases (such as the Raspberry Pi) video decode is handled by the
GPU, but the entire elementary stream is still fed through the parser to
pick out certain elements of the header which are necessary to manage the
decode process. As you might expect, in these cases, the performance of the
parser is significant.
To measure parser performance, I used the same VC-1 elementary stream in
either an MPEG-2 transport stream or a MKV file, and fed it through avconv
with -c:v copy -c:a copy -f null. These are the gperftools counts for
those streams, both filtered to only include vc1_parse() and its callees,
and unfiltered (to include the whole binary). Lower numbers are better:
Before After
File Filtered Mean StdDev Mean StdDev Confidence Change
M2TS No 861.7 8.2 650.5 8.1 100.0% +32.5%
MKV No 868.9 7.4 731.7 9.0 100.0% +18.8%
M2TS Yes 250.0 11.2 27.2 3.4 100.0% +817.9%
MKV Yes 149.0 12.8 1.7 0.8 100.0% +8526.3%
Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
case does show a larger absolute improvement though, since it was worse
to begin with.
This patch has been tested with the FATE suite (albeit on x86 for speed).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The rationale is that you have a packed format in form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one thing about Greenscale instead.
An alias pixel format and color space name are provided for compatibility.
Bug-Id: CVE-2013-0868
inspired by a patch from Michael Niedermayer <michaelni@gmx.at>
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Diego Biurrun <diego@biurrun.de>
CC: libav-stable@libav.org
llvm's integrated assembler does not accept spaces as macro argument
delimiter when targeting darwin. Using a explicit delimiter is a good
idea in principle since it makes case like 'macro 4 -2' vs 'macro 4 - 2'
clear.
Properly address CVE-2011-3946 and parse bitstream as described in the spec.
CC: libav-stable@libav.org
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Make sure the buffer size does not exceed the expected
RLE size.
Prevent an out of array bound write.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Bug-Id: CVE-2013-0852
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The typedefs also exist in the avfft.h header and since typedefs cannot be
legally redefined in C, the code fails to compile with some compilers.
This reverts commits 11c7155cce and 57f1b1dcc7.
Intrinsics only used on aarch64 since the existing ARMv7 NEON asm
is slightly faster (Cortex-A9, gcc-4.8, micro-benchmarks and full
decoding time).
Signed-off-by: James Yu <james.yu@linaro.org>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Such files can be created using the --bff x264 option.
Sample-Id: h264_direct_temporal_mvs_bff.mkv
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>