The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
Most of the changes are just trivial are just trivial replacements of
fields from MpegEncContext with equivalent fields in H264Context.
Everything in h264* other than h264.c are those trivial changes.
The nontrivial parts are:
1) extracting a simplified version of the frame management code from
mpegvideo.c. We don't need last/next_picture anymore, since h264 uses
its own more complex system already and those were set only to appease
the mpegvideo parts.
2) some tables that need to be allocated/freed in appropriate places.
3) hwaccels -- mostly trivial replacements.
for dxva, the draw_horiz_band() call is moved from
ff_dxva2_common_end_frame() to per-codec end_frame() callbacks,
because it's now different for h264 and MpegEncContext-based
decoders.
4) svq3 -- it does not use h264 complex reference system, so I just
added some very simplistic frame management instead and dropped the
use of ff_h264_frame_start(). Because of this I also had to move some
initialization code to svq3.
Additional fixes for chroma format and bit depth changes by
Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This prevents undefined behaviour of signed left shift if the coded
value is larger than 2^31. Large values are most likely invalid and
caused errors or by feeding random.
Validate every use of svq3_get_ue_golomb() and changed the place there
the return value was compared with negative numbers. dirac.c was clean,
fixed rv30 and svq3.
Fixes:
libavcodec/svq3.c:661:9: warning: passing argument 2 of 'svq3_decode_block' from incompatible pointer type
libavcodec/svq3.c:208:19: note: expected 'DCTELEM *' but argument is of type 'DCTELEM (*)[32]'
Signed-off-by: Mans Rullgard <mans@mansr.com>
Also break some long lines, remove codec function placeholder comments
and add spaces in sample/pixel format lists.
Signed-off-by: Martin Storsjö <martin@martin.st>
Results of IDCT can by far outreach the range of ff_cropTbl[], leading
to overreads and potentially crashes.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Conversion of the luma intra prediction mode to one of the constrained
("alzheimer") ones can happen by crafting special bitstreams, causing
a crash because we'll call a NULL function pointer for 16x16 block intra
prediction, since constrained intra prediction functions are only
implemented for chroma (8x8 blocks).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
FF_COMMON_FRAME holds the contents of the AVFrame structure and is also copied
to struct Picture. Replace by an embedded AVFrame structure in struct Picture.
If done before, some parameters aren't known yet.
With svq3/rtp, initializing before some parameters are known
can lead to calling av_malloc(0), which on OS X currently returns
broken pointers.
No speed improvement, but necessary for some future stuff.
Also opens up the possibility of asm chroma dc idct/dequant.
Originally committed as revision 26349 to svn://svn.ffmpeg.org/ffmpeg/trunk