Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
Most of the changes are just trivial are just trivial replacements of
fields from MpegEncContext with equivalent fields in H264Context.
Everything in h264* other than h264.c are those trivial changes.
The nontrivial parts are:
1) extracting a simplified version of the frame management code from
mpegvideo.c. We don't need last/next_picture anymore, since h264 uses
its own more complex system already and those were set only to appease
the mpegvideo parts.
2) some tables that need to be allocated/freed in appropriate places.
3) hwaccels -- mostly trivial replacements.
for dxva, the draw_horiz_band() call is moved from
ff_dxva2_common_end_frame() to per-codec end_frame() callbacks,
because it's now different for h264 and MpegEncContext-based
decoders.
4) svq3 -- it does not use h264 complex reference system, so I just
added some very simplistic frame management instead and dropped the
use of ff_h264_frame_start(). Because of this I also had to move some
initialization code to svq3.
Additional fixes for chroma format and bit depth changes by
Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
ref_list is constructed from other fields per slice when needed, so do
not copy it for both frame and slice threading.
default_ref_list is constructed per frame and still needs to be copied
to per-slice contexts for slice threading, but a copy is not needed for
frame threading.
If the motion vector is at a subpixel position, we need 3 pixels below
the motion vector's wholepel position available, not 2, since the MC
filter is a sixtap filter for the hpel position, and then a bilin filter
for the qpel position.
This patch fixes highly irreproducible (0.1%) fate failures in frame 2
and 4 of h264-conformance-cama2_vtc_b (e.g. first P-frame, first field,
last line of MB x=40,y=2 and second field and last lines of MBs x=39-40,
y=3). These used pre-loopfilter instead of post-loopfilter data because
the await_progress() waited for one line too little in that field, and
the motion vector of these particular MBs happened to align exactly to a
position where that demonstrates the bug.
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Clobbering these tables will temporarily clobber the template used
as a basis for other threads to start decoding from. If the other
decoding thread updates from the template right at that moment,
subsequent threads will get invalid (or, usually, none at all) mmco
tables. This leads to invalid reference lists and subsequent decode
failures.
Therefore, instead, decode the mmco tables only for the first slice in
a field or frame. For other slices, decode the bits and ensure they
are identical to the mmco tables in the first slice, but don't ever
clobber the context state. This prevents other threads from using a
clobbered/invalid template as starting point for decoding, and thus
fixes decoding in these cases.
This fixes occasional (~1%) failures of h264-conformance-mr1_bt_a with
frame-multithreading enabled.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Fixes null pointer dereference later, since if this function failed,
a positive return value was returned to the caller.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Martin Storsjö <martin@martin.st>
Comparing AVCodecContext.pix_fmt against the get_pixel_format() return
value has the side effect of calling the get_format() callback on each
slice. Users of the callback will probably handle hardware accelerator
initialization in the callback.
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Since we can't know which stride a custom get_buffer() implementation is
going to use we have to allocate this scratch buffers after the linesize
is known. It was pretty safe for 8 bit per pixel pixel formats since we
always allocated memory for up to 16 bits per pixel. It broke hoever
with cmdutis.c's alloc_buffer() and high pixel bit depth since it
allocated larger edges than mpegvideo expected.
Fixes fuzzed sample nasa-8s2.ts_s244342.
It is not posible to call get_buffer during frame-mt codec
initialization. Libavformat might pass huge amounts of data as
extradata after parsing broken files. The 'extradata' for the fuzzed
sample sample_varPAR_s5374_r001-02.avi is 2.8M large and contains
multiple slices.
This does not seem to have an effect currently. Fate-h264 passes with
THREADS=1..16 and both threading types as before. It fixes however a
segfault during error resilience with my adaptive-frame-mt patchset.
A picture in use during error resilience gets realloced in another
thread in the fuzzed sample sample_varPAR.avi_s226019.
Dropping frames is undesirable but that is the only way by which the
decoder could return to low delay mode. Instead emit a warning and
continue with delayed frames.
Fixes a crash in fuzzed sample nasa-8s2.ts_s20033 caused by a larger
than expected has_b_frames value. Low delay keeps getting re-enabled
from a presumely broken SPS.
CC: libav-stable@libav.org
s->mb_x is reset to zero a couple of lines above. It does not make
sense to call ff_er_add_slice() with 0 as endx when the end of the
macroblock row was reached. Fixes unnecessary and counterproductive
error resilience in https://bugzilla.libav.org/show_bug.cgi?id=394.
CC: libav-stable@libav.org
Some invocations include a verb in the log message, others do not. Yet
av_log_missing_feature expects callers to provide a verb. Change the
function to include a verb instead and update the callers accordingly.
The result is a more natural function API and correct English in the
function invocations.
When decode_nal_units() previously encountered a NAL_END_SEQUENCE,
and there are some junk bytes left in the input buffer, but no start codes,
buf_index gets stuck 3 bytes before the end of the buffer.
This can trigger an infinite loop in the caller code, eg. in
try_decode_trame(), as avcodec_decode_video() then keeps returning zeroes,
with 3 bytes of the input packet still available.
With this change, the remaining bytes are skipped so the whole packet gets
consumed.
CC:libav-stable@libav.org
Signed-off-by: Jindřich Makovička <makovick@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It is possible in various error paths as well as gap handling
that this has already been allocated. It is not clear why that
would be a problem with the current code, thus disable the
assert to avoid a common assert failure when asserts are enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>