The generic code copies the main context's private data to all the
others. However that is quite dangerous, as it might end up copying some
pointers that are or will become invalid.
Since everything we actually need will be copied later in
update_thread_context(), it's safest to zero the private data in
init_thread_copy(), so it works the same way as init for the main
context.
Additionally always set the software pixel format, so it's available
even if ff_get_format() is not called later. This will be useful for
exporting stream parameters from init().
Also change the method for allocating them. Instead of two possible
alloc calls from different places, just ensure they are allocated at the
start of each slice. This should be simpler and less bug-prone than the
previous method.
According to the WebP Lossless Bitstream Specification
"each transform is allowed to be used only once".
If a transform is more than once this can lead to memory
corruption.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
No global variables are used and the VLC tables are allocated without
static elements. This will allow using a JPEG decoding context within
other decoders.
This field is designed for marking codec properties useful to lavc internals.
Two internal capabilities are added:
- FF_CODEC_CAP_INIT_THREADSAFE: codec can be opened without locks;
- FF_CODEC_CAP_INIT_CLEANUP: codec frees memory if initialization fails.
So far it is only set in roq_encode_frame, but it is used in
roq_encode_end to free the coded_frame. This currently segfaults if
roq_encode_frame is not called between roq_encode_init and
roq_encode_end.
CC:libav-stable@libav.org
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The fourth substream is being discarded, since its not raw audio data,
but an encoded Atmos stream which needs a specialized decoder.
Fixes decoding of the true hd stream from Transformers\ -\ Age\ of\ Extinction\ 2014\ 1080P-003.mkv
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This buffer is resized when vpx_codec_get_cx_data() returns a
VPX_CODEC_STATS_PKT packet.
CC: libav-stable@libav.org
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The valid returned values are always at most 11bit.
Remove the previous check that assumed larger values plausible and
use a signed integer to check get_vlc2 return values.
CC: libav-stable@libav.org
According to the WebP Lossless Bitstream Specification the highest
allowed value for a prefix code is 39.
If prefix_code is too large, the calculated extra_bits has an invalid
value and triggers an assertion in get_bits.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
If it doesn't fit into 12 bits it triggers an assertion.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Although the specification mandates this bit to zero, it may happen
that software tools incorrectly flip it to one, invalidating a possibly
valid stream.
Relax this restriction, by failing only when AV_EF_BITSTREAM is set.
This behaviour is similar to aac decoders in Firefox and Quicktime.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Normally the aic decoder finds the proper slice combination (multiple of
some number less than 32) but in case of odd width, it resorts to the
default values, which were actually swapped.
The number of slices is modified to account for such odd width cases.
CC: libav-stable@libav.org
Some files produced by the official encoder have up to 16bit of
padding instead of the expected padding to the byte.
Use a self-explanatory macro instead of a simple number.
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
hevc seems to be the only place where the C implementation
of the av_clip function is explicitly selected, precluding
platform-specific optimizations
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The scaling list can be specified in either the SPS or PPS.
Additionally, compensate for the diagonal scan permutation applied
in the decoder.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Some files contain a few additional, all-0 bits.
Check for that case and don't print incorrect "not supported"
message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Alex Converse <alex.converse@gmail.com>
This fixes builds with vc1_parser enabled without vc1_decoder. All
the vc1_decoder object files were included in the vc1_parser line
in libavcodec/Makefile before, but architecture specific object files
for vc1_decoder were not.
Signed-off-by: Martin Storsjö <martin@martin.st>
It does not work correctly and apparently never did. There is no
indication that this (mis)feature is ever used in the wild or even that
any software other than the reference supports it.
Since the code that attempts to support it adds some nontrivial
complexity and has resulted in several bugs in the past, it is better to
just drop it.
Currently, it needs to be initialized by the ER caller (which is
currently either a mpegvideo decoder or h264dec). However, since none of
those decoders use MECmpContext for anything except ER, it makes more
sense to handle it purely inside ER.
This improves motion estimation and avoids using uninitialized data
for resolutions that aren't a multiple of 16.
Prior to d2a25c40, the edges used to be initialized so that encoding
was deterministic, but after that commit it started using uninitialized
data (for non multiple of 16 resolutions).
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
The 1.3 release branch of OpenH264 (as well as the master branch)
have been updated so that GCC no longer warns about this variable
as being unused.
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes out of array reads and/or infinite loops.
30 is the maximum number of bits that can be read into
coeff_abs below.
CC: libav-stable@libav.org
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Martin Storsjö <martin@martin.st>
Move the lavc/imgconvert functions and rename them as follows:
avpicture_get_size -> av_image_get_buffer_size()
avpicture_fill -> av_image_fill_arrays()
avpicture_layout -> av_image_copy_to_buffer()
The new functions have an align parameter, which allows to define the
linesize alignment assumed in the buffer (which is set or read).
The names of the functions are consistent with the lavu/samples API
(av_samples_get_buffer_size(), av_samples_fill_arrays()).
A redundant check has been dropped from av_image_fill_arrays().
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The latest version added support for a new option for enabling
a signal level limiter, which adds some extra delay. In fdk-aac, this
is enabled by default, but disable it by default here since we'd rather
have zero-delay decoding.
Signed-off-by: Martin Storsjö <martin@martin.st>
These have a DXSA tag and contain alpha in addition to
color values for palette.
Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Compared to existing, common opensource H264 encoders, this can be
useful since it has got a different license (BSD instead of GPL).
Performance- and qualitywise it is comparable to x264 in ultrafast
mode.
Hooking it up as an encoder in libavcodec also simplifies comparing
it against other common encoders.
This requires OpenH264 1.3 or newer. Since the OpenH264 API and ABI
changes frequently, only releases are supported.
To take advantage of the OpenH264 patent offer, the OpenH264 library
must not be redistributed, but downloaded at runtime at the end-user's
system.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is the order that the caller uses in the rest of the file.
Variables are modified to reflect the order above too and their
initialization is merged with their declarationt. No behavioral
change.
Bug-Id: CID 732286
On some video samples, VDA silently fails to decode frames and returns
kVDADecoderNoErr. Error out in these cases to avoid producing AVFrames with
empty planes.
Signed-off-by: Stefano Pigozzi <stefano.pigozzi@gmail.com>
Since the VDPAU pixel format does not distinguish between different
VDPAU video surface chroma types, we need another way to pass this
data to the application.
Originally VDPAU in libavcodec only supported decoding to 8-bits YUV
with 4:2:0 chroma sampling. Correspondingly, applications assumed that
libavcodec expected VDP_CHROMA_TYPE_420 video surfaces for output.
However some of the new HEVC profiles proposed for addition to VDPAU
would require different depth and/or sampling:
http://lists.freedesktop.org/archives/vdpau/2014-July/000167.html
...as would lossless AVC profiles:
http://lists.freedesktop.org/archives/vdpau/2014-November/000241.html
To preserve backward binary compatibility with existing applications,
a new av_vdpau_bind_context() flag is introduced in a further change.
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This can be used by the application to signal its ability to cope with
video surface of types other than 8-bits YUV 4:2:0.
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This carries the pixel format that would be used if it were not for
hardware acceleration. This is equal to AVCodecContext.pix_fmt if
hardware acceleration is not in use.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is to avoid proliferation of similar tables in following changes.
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Fixes invalid writes when there are more blocks in a run than total
remaining blocks.
CC: libav-stable@libav.org
Bug-ID: CVE-2014-8548
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Fixes invalid writes with very small image heights.
CC: libav-stable@libav.org
Bug-ID: CVE-2014-8547
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The frame size must be set by the caller and each dimension must be a
multiple of 2.
CC: libav-stable@libav.org
Bug-ID: CVE-2014-8543
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
The frame size must be set by the caller and each dimension must be a
multiple of 8.
CC: libav-stable@libav.org
Bug-ID: CVE-2014-8542
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Fixes possible invalid memory access.
Based on code by Michael Niedermayer <michaelni@gmx.at>
CC: libav-stable@libav.org
Bug-ID: CVE-2014-8541
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
This is the order that the caller uses in the rest of the file. The
same operation is applied to both parameters, so this change is only
done for consistency, it doesn't change the actual behaviour.
Bug-Id: CID 732285 / CID 732286
ff_mpv_common_init sets s->context_initialized.
This fixes decoding of h261 in the cases where the demuxer
hasn't already set the frame size.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
This avoids trying to do sliced encoding, even if a slice/packet
size is requested (via the -ps option or the rtp_payload_size
field), since the encoder currently doesn't support it (or at least
our decoder can't decode it, even if the h261_encode_gob_header
function is hooked up to be called from the slicing part in
mpegvideo_enc.c).
Signed-off-by: Martin Storsjö <martin@martin.st>
Also use the same type for add_entry and check_size.
Bug-Id: CID 700699
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Vittorio Giovara <vittorio.giovarao@gmail.com>
Fix linking when only a subset of vaapi decoders is enabled.
Bug-Id: 760
CC: libav-stable@libav.org
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The input packets are always assumed to be padded and
the av_fast_ family of function takes a pointer to a pointer.
Thanks to Nicolas Dufresne <nicolas.dufresne@collabora.com> for
a similar patch.
Introduced in 7b588bb691.
Bug-Id: 766
CC: libav-stable@libav.org
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Old VDPAU drivers do not support this newly defined profile, so falling
back to Main profile is necessary for backward binary compatibility.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This avoids potential out of bounds writes, with potential future
versions of the library.
Bug-Id: CID 1254945
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
According to the DPX file format description found at
http://www.fileformat.info/format/dpx/egff.htm the ImageElement part of
the GenericImageHeader also contains an an offset to the real image data
beside the same member that can be found in the GenericFileHeader.
Libav keeps this member empty (=0) while some applications expects it to
be filled properly. FATE test updated accordingly.
Bug-Id: 742
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Don't include the function pointer table in the code segment
in arm mode.
This shouldn't have any significant performance effect. It does
end up as a few more instructions than before, for ARM, but
only at the entry to this function, not within the fft functions
themselves.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is needed to avoid banding artifacts when gammaing the picture.
Currently, if done with a video filter, the process is done on uints
instead of full float.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Keep the code as similar as possible across the codepaths to
ease spotting it for factorization.
Based on a patch from Michael Niedermayer <michaelni@gmx.at>.
The code currently set the information in at least 4 places, spare
some pointless loops.
Make the code in the loop a little uniform to make easier factorize
it out later.
'ret' can only be used without initialization if s->height <= 0, which can
only happen if avctx->height <= 0, which is validated elsewhere. Doesn't hurt
to still initialize it though.
CC: libav-stable@libav.org
Bug-Id: CID 732296
This makes sure the default behaviour of using the internal encoder
stays the same regardless if libtwolame is enabled or not (as for
any external library).
This fixes fate-lavf-mpg if libav is built with libtwolame enabled.
CC: libav-stable@libav.org
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This allows running the code on android, where 64 bit binaries with
text relocations aren't allowed to be loaded.
Signed-off-by: Martin Storsjö <martin@martin.st>
Use av_mallocz_array instead of iterating and check the returned memory.
Check returned memory and cleanly exit in case of error during the loop.
Avoid a null pointer dereference for invalid data.
CC: libav-stable@libav.org
Bug-Id: CID 29575
vorbis_parser.o is built unconditionally since 5e80fb7ff, and the
unconditionally built parts of it depend on xiph.o.
This fixes builds with --disable-everything.
Signed-off-by: Martin Storsjö <martin@martin.st>
The latest fdk-aac code drop (from android 5.0) changed the channel
layout enums (changing the value of existing enum constants), and
renamed the option for downmixing.
The failsafe comparison between ctype and FF_ARRAY_ELEMS(channel_counts)
can trigger warnings (-Wtautological-constant-out-of-range-compare)
when building with the old FDK AAC releases, where it can't be
out of range with the enum values used there.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
The only parameters needed by the demuxers are the sample rate and sample
count, which can be trivially extracted manually, without resorting to
an avpriv function.
Currently, the API takes an external AVCodecContext, which is used only
for extradata and logging. This change will allow to it to work without
an AVCodecContext in the following commits.
The application will destroy the underlying hardware handles when
get_format() gets called again. Also this ensures the
deinitialization takes place if the get_format callback returns an
error.
Regression from 1c80c9d7ef.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The H.264 Constrained Baseline Profile (CBP) is a subset of both the
Main Profile and the Baseline Profile. In principles, a hardware
decoder that supports either of those can decode CBP content. As it
happens, Main is supported by all VDPAU drivers, and Baseline is not.
So favor map CBP to MP for now. Hopefully in the future libvdpau will
offer an explicit choice for CBP.
This fixes bug 757.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Whenever av_gettime() is used to measure relative period of time,
av_gettime_relative() is prefered as it guarantee monotonic time
on supported platforms.
Signed-off-by: Martin Storsjö <martin@martin.st>
Currently, this option is accessed through AVCodecContext.mb_threshold,
which originally controlled reusing MB data when transcoding mpeg to
mpeg. Since the libvpx meaning is completely different from the original
mpegvideo meaning, it is better to use a separate private option for
this.
For streams which contain DRC metadata, the FDK decoder is able to
control rendering of the decoded output. The rendering parameters
are detailed in fdk_aac_dec_options [].
The default behavior is left up to the decoder.
Signed-off-by: Martin Storsjö <martin@martin.st>
The FDK decoder is capable of producing mono and stereo downmix from
multichannel streams. These streams may contain metadata that control
the downmix process. The decoder requires an Ancillary Buffer in order to
correctly apply downmix in streams containing downmix Metadata. The
decoder does not have an API interface to inform of the presence of
Metadata in the stream, and therefore the Ancillary Buffer is always
allocated whenever a downmix is requested.
When downmixing multichannel streams, the decoder requires the output
buffer in aacDecoder_DecodeFrame call to be of fixed size in order to
hold the actual number of channels contained in the stream. For example,
for a 5.1ch to stereo downmix, the decoder requires that the output buffer
is allocated for 6 channels, regardless of the fact that the output is in
fact two channels.
Due to this requirement, the output buffer is allocated for the maximum
output buffer size in case a downmix is requested (and also during
decoder init). When a downmix is requested, the buffer used for output
during init will also be used for the entire duration the decoder is open.
Otherwise, the initial decoder output buffer is freed and the decoder
decodes straight into the output AVFrame.
Signed-off-by: Martin Storsjö <martin@martin.st>
When decoding, this field holds the inverse of the framerate that can be
written in the headers for some codecs. Using a field called 'time_base'
for this is very misleading, as there are no timestamps associated with
it. Furthermore, this field is used for a very different purpose during
encoding.
Add a new field, called 'framerate', to replace the use of time_base for
decoding.
Decoding acceleration may work even if the codec level is higher than
the stated limit of the VDPAU driver. Or the problem may be considered
acceptable by the user. This flag allows skipping the codec level
capability checks and proceed with decoding.
Applications should obviously not set this flag by default, but only if
the user explicitly requested this behavior (and presumably knows how
to turn it back off if it fails).
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Currently, the amount of padding inserted at the beginning by some audio
encoders, is exported through AVCodecContext.delay. However
- the term 'delay' is heavily overloaded and can have multiple different
meanings even in the case of audio encoding.
- this field has entirely different meanings, depending on whether the
codec context is used for encoding or decoding (and has yet another
different meaning for video), preventing generic handling of the codec
context.
Therefore, add a new field -- AVCodecContext.initial_padding. It could
conceivably be used for decoding as well at a later point.
This makes the addition of arch optimized functions easier.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The register function now specifies that the user callback should
leave things in the same state that it found them on failure but
that failure to destroy is ignored by the library. The register
function is now explicit about its behavior on failure
(it unregisters the previous callback and destroys all mutex).
Signed-off-by: Manfred Georg <mgeorg@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This function provides an explicit VDPAU device and VDPAU driver to
libavcodec, so that the application is relieved from codec specifics
and VdpDevice life cycle management.
A stub flags parameter is added for future extension. For instance, it
could be used to ignore codec level capabilities (if someone feels
dangerous).
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is necessary to recreate the decoder with the correct parameters,
as not all codecs invoke get_format() in this case.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Using the not so new init and uninit callbacks, avcodec can now take
care of creating and destroying the VDPAU decoder instance.
The application is still responsible for creating the VDPAU device
and allocating video surfaces - this is necessary to keep video
surfaces on the GPU all the way to the output. But the application
will no longer needs to care about any codec-specific aspects.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is similar to what is done in libx264.c
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
These function pointers already existed in the ARM code. Adding them globally
allows calls to the function pointers to access arch-optimized versions of the
functions transparently.
Only set a value if _WIN32_WINNT is undefined or smaller than 0x0600. This is
cleaner than unconditional definition and avoids a number of redefinition
warnings. Also only define a value in one of the two dxva2 headers.
MpegEncContext based decoders are only fully initialized after the first
ff_thread_get_buffer() call. The RV30/40 decoders may fail before a frame
buffer was requested. ff_mpeg_update_thread_context() fails on half
initialized MpegEncContexts. Since this can only happen before a the
first frame was decoded there is no need to call
ff_mpeg_update_thread_context().
Based on patches by John Stebbins and tested by John Stebbins.
CC: libav-stable@libav.org
The packet buffer allocation considers the alpha channel as DCT-coded,
while it is actually run-coded and thus requires a larger buffer.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The buffer allocation may be incorrect (e.g. with an alpha plane),
and currently causes the buffer to be set to NULL by init_put_bits,
causing a crash later on.
So, detect that situation, and if detected, reallocate the buffer
and ask for a sample that shows the problem.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
If the allocated size, despite best efforts, is too small, exit
with the appropriate error.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The LZMA support is a semi-official extension supported by libtiff 4.0.0
and later.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Such changes are neither allowed nor supported
Found-by: ami_stuff
Bug-Id: CVE-2013-7020
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reduces the number of calls to tmvp derivation from 933685 to 586271 on
a sequence.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Only use PAL8 if palette is present, else use GRAY8 for pixfmt.
Instead of simulating a grayscale palette, use real grayscale pixels, if no
palette is actually defined.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
1) each of the loops run within a single CTB, so the relevant reference
list is constant
2) when that CTB is, or lies on the same slice as, the current one, we
can use a simple access instead of a relatively expensive call to
ff_hevc_get_ref_list()
It allows attaching other external, opaque data to the frame and passing it
through the reordering process, for cases when the caller wants other data
than just the plain packet pts. There is no way to cleanly achieve this
without the field.
The input data must remain constant, make a copy instead. This is in
theory a performance hit, but since I failed to find any samples
using this feature, this should not matter in practice.
Also, check the size of the header, avoiding invalid reads on truncated
data.
CC:libav-stable@libav.org
The previous implementation of the parser made four passes over each input
buffer (reduced to two if the container format already guaranteed the input
buffer corresponded to frames, such as with MKV). But these buffers are
often 200K in size, certainly enough to flush the data out of L1 cache, and
for many CPUs, all the way out to main memory. The passes were:
1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes
After this, the unescaped data was parsed to extract certain header fields,
but because the unescape operation was so large, this was usually also
effectively operating on uncached memory. Most of the unescaped data was
simply thrown away and never processed further. Only step 2 - because it
used memcpy - was using prefetch, making things even worse.
This patch reorganises these steps so that, aside from the copying, the
operations are performed in parallel, maximising cache utilisation. No more
than the worst-case number of bytes needed for header parsing is unescaped.
Most of the data is, in practice, only read in order to search for a start
code, for which optimised implementations already existed in the H264 codec
(notably the ARM version uses prefetch, so we end up doing both remaining
passes at maximum speed). For MKV files, we know when we've found the last
start code of interest in a given frame, so we are able to avoid doing even
that one remaining pass for most of the buffer.
In some use-cases (such as the Raspberry Pi) video decode is handled by the
GPU, but the entire elementary stream is still fed through the parser to
pick out certain elements of the header which are necessary to manage the
decode process. As you might expect, in these cases, the performance of the
parser is significant.
To measure parser performance, I used the same VC-1 elementary stream in
either an MPEG-2 transport stream or a MKV file, and fed it through avconv
with -c:v copy -c:a copy -f null. These are the gperftools counts for
those streams, both filtered to only include vc1_parse() and its callees,
and unfiltered (to include the whole binary). Lower numbers are better:
Before After
File Filtered Mean StdDev Mean StdDev Confidence Change
M2TS No 861.7 8.2 650.5 8.1 100.0% +32.5%
MKV No 868.9 7.4 731.7 9.0 100.0% +18.8%
M2TS Yes 250.0 11.2 27.2 3.4 100.0% +817.9%
MKV Yes 149.0 12.8 1.7 0.8 100.0% +8526.3%
Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
case does show a larger absolute improvement though, since it was worse
to begin with.
This patch has been tested with the FATE suite (albeit on x86 for speed).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The rationale is that you have a packed format in form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one thing about Greenscale instead.
An alias pixel format and color space name are provided for compatibility.
Bug-Id: CVE-2013-0868
inspired by a patch from Michael Niedermayer <michaelni@gmx.at>
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Diego Biurrun <diego@biurrun.de>
CC: libav-stable@libav.org
llvm's integrated assembler does not accept spaces as macro argument
delimiter when targeting darwin. Using a explicit delimiter is a good
idea in principle since it makes case like 'macro 4 -2' vs 'macro 4 - 2'
clear.
Properly address CVE-2011-3946 and parse bitstream as described in the spec.
CC: libav-stable@libav.org
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Make sure the buffer size does not exceed the expected
RLE size.
Prevent an out of array bound write.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Bug-Id: CVE-2013-0852
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The typedefs also exist in the avfft.h header and since typedefs cannot be
legally redefined in C, the code fails to compile with some compilers.
This reverts commits 11c7155cce and 57f1b1dcc7.
Intrinsics only used on aarch64 since the existing ARMv7 NEON asm
is slightly faster (Cortex-A9, gcc-4.8, micro-benchmarks and full
decoding time).
Signed-off-by: James Yu <james.yu@linaro.org>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Such files can be created using the --bff x264 option.
Sample-Id: h264_direct_temporal_mvs_bff.mkv
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>