* commit 'ebfe622bb1ca57cecb932e42926745cba7161913':
mpegvideo: drop support for real (non-emulated) edges
Conflicts:
libavcodec/mpegvideo.c
libavcodec/mpegvideo_motion.c
libavcodec/wmv2.c
If this is slower on a major platform then it should be investigated
and potentially reverted.
See: 8fc52a5ef9
See: 3969b4b861
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Several decoders disable those anyway and they are not measurably faster
on x86. They might be somewhat faster on other platforms due to missing
emu edge SIMD, but the gain is not large enough (and those decoders
relevant enough) to justify the added complexity.
* commit '458446acfa1441d283dacf9e6e545beb083b8bb0':
lavc: Edge emulation with dst/src linesize
Conflicts:
libavcodec/cavs.c
libavcodec/h264.c
libavcodec/hevc.c
libavcodec/mpegvideo_enc.c
libavcodec/mpegvideo_motion.c
libavcodec/rv34.c
libavcodec/svq3.c
libavcodec/vc1dec.c
libavcodec/videodsp.h
libavcodec/videodsp_template.c
libavcodec/vp3.c
libavcodec/vp8.c
libavcodec/wmv2.c
libavcodec/x86/videodsp.asm
libavcodec/x86/videodsp_init.c
Changes to the asm are not merged, they are left for volunteers or
in their absence for later.
The changes this merge introduces are reordering of the function
arguments
See: face578d56
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
This allows supporting files for which the image stride is smaller than
the max. block size + number of subpel mc taps, e.g. a 64x64 VP9 file
or a 16x16 VP8 file with -fflags +emu_edge.
* commit '601c2015bc16f0b281160292a6a760cbbbb0eacb':
svq3: Avoid a division by zero
Conflicts:
libavcodec/svq3.c
See: 4fa706a4a6
Merged-by: Michael Niedermayer <michaelni@gmx.at>
If the height is zero, the decompression will probably end up
failing due to not fitting into the allocated buffer later
anyway, so this doesn't need any more elaborate check.
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '1115689d54ea95a084421f5a182b8dc56cbff978':
svq3: Check for any negative return value from ff_h264_check_intra_pred_mode
Conflicts:
libavcodec/svq3.c
See: 019eb2c77b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Also pass on any returned error code.
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'c4e43560fe6677e9d60bfb3cffc41c7324e92a0b':
h264data: Move some tables to the only place they are used
Conflicts:
libavcodec/h264data.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit bf36dc50ea, reversing
changes made to b7fc2693c7.
Conflicts:
libavcodec/h264.c
Keeping support for the old VDPAU API has been requested by our VDPAU maintainer
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'bd8ac882140a38868c33c000a430a1292a352533':
avcodec: Add av_cold attributes to end functions missing them
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The residual block data of 16x16 blocks was ignored for b-frames, which
leads to easy-to-identify artifacts. After this patch, the artifacts are
gone. Sample video: svq3_watermark.mov. (Fate results unaffected.)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Most of the changes are just trivial are just trivial replacements of
fields from MpegEncContext with equivalent fields in H264Context.
Everything in h264* other than h264.c are those trivial changes.
The nontrivial parts are:
1) extracting a simplified version of the frame management code from
mpegvideo.c. We don't need last/next_picture anymore, since h264 uses
its own more complex system already and those were set only to appease
the mpegvideo parts.
2) some tables that need to be allocated/freed in appropriate places.
3) hwaccels -- mostly trivial replacements.
for dxva, the draw_horiz_band() call is moved from
ff_dxva2_common_end_frame() to per-codec end_frame() callbacks,
because it's now different for h264 and MpegEncContext-based
decoders.
4) svq3 -- it does not use h264 complex reference system, so I just
added some very simplistic frame management instead and dropped the
use of ff_h264_frame_start(). Because of this I also had to move some
initialization code to svq3.
Additional fixes for chroma format and bit depth changes by
Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
* commit '35685a3c2a1ec09f3c62dcfc4368fe9e92bcddf6':
dsputil: Move ff_shrink* function declarations to separate header
dsputil: Move ff_svq3 function declarations to a separate header
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>