There is no point in delaying the check and it avoids bugs with a
half-initialized context.
Fixes invalid reads.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
If it was set before then we can end up trying to decode a slice without
a valid slice header, which can lead to invalid memory access.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
This is a temporary workaround to allow deprecating
avcodec_get_frame_defaults(). The proper solution will be using a
properly allocated AVFrame in Picture.
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
This can be optionally disabled whith the "output_corrupt" flags
option. When in "output_corrupt" mode, incomplete frames are
signalled through AVFrame.flags FRAME_FLAG_INCOMPLETE_FRAME.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It is only used for error resilience. This allows building the
h264 decoder without dsputil, if error resilience is disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Allows use of AVHWAccel based decoders with frame based multithreading.
The decoders will be forced into an non-concurrent mode by delaying
ff_thread_finish_setup() calls after decoding of the current frame
is finished.
This wastes memory by unnecessarily using multiple threads and thus
copies of the decoder context but allows seamless switching between
hardware accelerated and frame threaded software decoding when the
hardware decoder does not support the stream.
Error resilience is enabled by the h264 decoder, unless explicitly
disabled. --disable-everything --enable-decoder=h264 will produce
a h264 decoder with error resilience enabled, while
--disable-everything --enable-decoder=h264 --disable-error-resilience
will produce a h264 decoder with error resilience disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
AVCodecContext.bits_per_raw_sample is updated from the previous thread
in the generic update function before the codec specific update_thread
function is called. The check for reinitialization of dsp functions uses
bits_per_raw_sample. When called from update_thread_context it will be
already at the current value and the dsp functions aren't updated if
only the bit depth changes.
This ensures the hwaccel privdata does not leak when a frame buffer could
not be allocated (and toggle the assert when the frame is re-used).
Having no frame buffer available is quite common when using the DXVA2
hwaccel in situations where the DXVA2 renderer is being re-allocated, for
example when moving between displays.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This makes the decoder independent of mpegvideo.
This copy of the draw_horiz_band code is simplified compared to
the "generic" mpegvideo one which still has a number of special
cases for different codecs.
Signed-off-by: Martin Storsjö <martin@martin.st>
Not all hwaccels implement all codecs, so using one single list for
multiple such codecs means some codecs will be represented in the list,
even though they don't actually handle that codec. Copying specific
lists in each codec fixes that.
Signed-off-by: Martin Storsjö <martin@martin.st>
The decoder assumes a single bit depth for all the planes
while the specification allows different bit depths for luma
and chroma.
Avoid the possible problems described in CVE-2013-2277
CC: libav-stable@libav.org
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
Most of the changes are just trivial are just trivial replacements of
fields from MpegEncContext with equivalent fields in H264Context.
Everything in h264* other than h264.c are those trivial changes.
The nontrivial parts are:
1) extracting a simplified version of the frame management code from
mpegvideo.c. We don't need last/next_picture anymore, since h264 uses
its own more complex system already and those were set only to appease
the mpegvideo parts.
2) some tables that need to be allocated/freed in appropriate places.
3) hwaccels -- mostly trivial replacements.
for dxva, the draw_horiz_band() call is moved from
ff_dxva2_common_end_frame() to per-codec end_frame() callbacks,
because it's now different for h264 and MpegEncContext-based
decoders.
4) svq3 -- it does not use h264 complex reference system, so I just
added some very simplistic frame management instead and dropped the
use of ff_h264_frame_start(). Because of this I also had to move some
initialization code to svq3.
Additional fixes for chroma format and bit depth changes by
Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
ref_list is constructed from other fields per slice when needed, so do
not copy it for both frame and slice threading.
default_ref_list is constructed per frame and still needs to be copied
to per-slice contexts for slice threading, but a copy is not needed for
frame threading.
If the motion vector is at a subpixel position, we need 3 pixels below
the motion vector's wholepel position available, not 2, since the MC
filter is a sixtap filter for the hpel position, and then a bilin filter
for the qpel position.
This patch fixes highly irreproducible (0.1%) fate failures in frame 2
and 4 of h264-conformance-cama2_vtc_b (e.g. first P-frame, first field,
last line of MB x=40,y=2 and second field and last lines of MBs x=39-40,
y=3). These used pre-loopfilter instead of post-loopfilter data because
the await_progress() waited for one line too little in that field, and
the motion vector of these particular MBs happened to align exactly to a
position where that demonstrates the bug.
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Clobbering these tables will temporarily clobber the template used
as a basis for other threads to start decoding from. If the other
decoding thread updates from the template right at that moment,
subsequent threads will get invalid (or, usually, none at all) mmco
tables. This leads to invalid reference lists and subsequent decode
failures.
Therefore, instead, decode the mmco tables only for the first slice in
a field or frame. For other slices, decode the bits and ensure they
are identical to the mmco tables in the first slice, but don't ever
clobber the context state. This prevents other threads from using a
clobbered/invalid template as starting point for decoding, and thus
fixes decoding in these cases.
This fixes occasional (~1%) failures of h264-conformance-mr1_bt_a with
frame-multithreading enabled.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Fixes null pointer dereference later, since if this function failed,
a positive return value was returned to the caller.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Martin Storsjö <martin@martin.st>
Comparing AVCodecContext.pix_fmt against the get_pixel_format() return
value has the side effect of calling the get_format() callback on each
slice. Users of the callback will probably handle hardware accelerator
initialization in the callback.
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Since we can't know which stride a custom get_buffer() implementation is
going to use we have to allocate this scratch buffers after the linesize
is known. It was pretty safe for 8 bit per pixel pixel formats since we
always allocated memory for up to 16 bits per pixel. It broke hoever
with cmdutis.c's alloc_buffer() and high pixel bit depth since it
allocated larger edges than mpegvideo expected.
Fixes fuzzed sample nasa-8s2.ts_s244342.
It is not posible to call get_buffer during frame-mt codec
initialization. Libavformat might pass huge amounts of data as
extradata after parsing broken files. The 'extradata' for the fuzzed
sample sample_varPAR_s5374_r001-02.avi is 2.8M large and contains
multiple slices.
This does not seem to have an effect currently. Fate-h264 passes with
THREADS=1..16 and both threading types as before. It fixes however a
segfault during error resilience with my adaptive-frame-mt patchset.
A picture in use during error resilience gets realloced in another
thread in the fuzzed sample sample_varPAR.avi_s226019.
Dropping frames is undesirable but that is the only way by which the
decoder could return to low delay mode. Instead emit a warning and
continue with delayed frames.
Fixes a crash in fuzzed sample nasa-8s2.ts_s20033 caused by a larger
than expected has_b_frames value. Low delay keeps getting re-enabled
from a presumely broken SPS.
CC: libav-stable@libav.org
s->mb_x is reset to zero a couple of lines above. It does not make
sense to call ff_er_add_slice() with 0 as endx when the end of the
macroblock row was reached. Fixes unnecessary and counterproductive
error resilience in https://bugzilla.libav.org/show_bug.cgi?id=394.
CC: libav-stable@libav.org
Some invocations include a verb in the log message, others do not. Yet
av_log_missing_feature expects callers to provide a verb. Change the
function to include a verb instead and update the callers accordingly.
The result is a more natural function API and correct English in the
function invocations.
When decode_nal_units() previously encountered a NAL_END_SEQUENCE,
and there are some junk bytes left in the input buffer, but no start codes,
buf_index gets stuck 3 bytes before the end of the buffer.
This can trigger an infinite loop in the caller code, eg. in
try_decode_trame(), as avcodec_decode_video() then keeps returning zeroes,
with 3 bytes of the input packet still available.
With this change, the remaining bytes are skipped so the whole packet gets
consumed.
CC:libav-stable@libav.org
Signed-off-by: Jindřich Makovička <makovick@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It is possible in various error paths as well as gap handling
that this has already been allocated. It is not clear why that
would be a problem with the current code, thus disable the
assert to avoid a common assert failure when asserts are enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
The h264_vdpau decoder crashed if output colorspace was not 8-bit 420.
Add a check to error out instead (current hardware does not support
other colorspaces, so successful decoding is not possible).
Signed-off-by: Martin Storsjö <martin@martin.st>
Write out the NAL decoding loops in full so that they are easier
to parse for a preprocessor without it having to be aware of macros
or other such things in C code.
This also makes the code more readable.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Instead of inlining everything into ff_h264_hl_decode_mb(), use
explicit templating to create versions of the called functions
with constant parameters filled in. This greatly speeds up
compilation of h264.c and reduces the code size without any
measurable impact on performance.
Compilation time for h264.c on an i7 goes from 30s to 5.5s.
Code size is reduced by 430kB.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Override the frame size from the SPS with AVCodecContext values
if the latter specify a size smaller by less than one macroblock.
This is required for correct cropping of MOV files from Canon cameras.
Signed-off-by: Mans Rullgard <mans@mansr.com>
If decoding a second complementary field, and the first was
decoded in our thread, mark decoding of that field as complete.
If decoding fails, mark the decoded field/frame as complete.
Do not allow switching between field modes or field/frame mode
between slices within the same field/frame. Ensure that two
subsequent fields cover top/bottom (rather than top/frame,
bottom/frame or such nonsense situations).
Fixes various deadlocks when decoding samples with errors in
reference frames.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Progressive images can have only 16 references, error out if there are
more, since the data is almost certainly corrupt, and the invalid value
will lead to random crashes or invalid writes later on.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Parsing the entire NAL as SPS fixes decoding of some AVC bitstreams
with broken escaping. Since the size of the NAL unit is known and
checked against the buffer end we can parse it entirely without buffer
overreads.
Fixes playback of
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This reverts commit 729ebb2f18.
There was an off-by-one error in the bit mask calculation clearing
actually the last valid bit and causing
http://bugzilla.libav.org/show_bug.cgi?id=227
The broken sample (Mr_MrsSmith-h264_aac.mp4) the commit was fixing
does not work after correcting the off-by-one error.
CC: libav-stable@libav.org
Fixes invalid reads while initializing the dequant tables, which uses
the bit depth to determine the QP table size.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Conversion of the luma intra prediction mode to one of the constrained
("alzheimer") ones can happen by crafting special bitstreams, causing
a crash because we'll call a NULL function pointer for 16x16 block intra
prediction, since constrained intra prediction functions are only
implemented for chroma (8x8 blocks).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
This fixes standalone compilation of some decoders with --disable-optimizations.
cabac.h defines some inline functions that use symbols from cabac.c. Without
optimizations these inline functions are not eliminated and linking fails with
references to non-existing symbols.
Splitting the inline functions off into their own header and only #including
it in the places where the inline functions are used allows #including cabac.h
from anywhere without ill effects.