A lot of the time the DC block is empty: don't do the WHT in this case.
A lot of the rest of the time, there's only one coefficient: make a special
DC-only transform for that case.
When the block is empty, don't incorrectly mark luma DCT blocks as having DC
coefficients.
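A minimal C sketch of the three-way dispatch described above. It assumes a hypothetical `nnz` value equal to one past the last nonzero coefficient in scan order, so 0 means an empty block and 1 means only the DC position can be nonzero. The full transform is modeled on the inverse WHT in the VP8 reference decoder (RFC 6386). The function names are illustrative, not FFmpeg's. When only the DC coefficient is set, every output of the full transform reduces to (dc + 3) >> 3, so the DC-only path just broadcasts that value.

```c
#include <assert.h>
#include <stdint.h>

/* Full 4x4 inverse WHT, modeled on the VP8 reference decoder.
 * Input and output are row-major. Assumes >> on negative ints is arithmetic. */
static void iwht4x4_full(const int16_t in[16], int16_t out[16])
{
    int tmp[16];

    for (int i = 0; i < 4; i++) {            /* vertical pass */
        int a1 = in[i] + in[12 + i];
        int b1 = in[4 + i] + in[8 + i];
        int c1 = in[4 + i] - in[8 + i];
        int d1 = in[i] - in[12 + i];
        tmp[i]      = a1 + b1;
        tmp[4 + i]  = c1 + d1;
        tmp[8 + i]  = a1 - b1;
        tmp[12 + i] = d1 - c1;
    }
    for (int i = 0; i < 4; i++) {            /* horizontal pass + rounding */
        const int *r = &tmp[4 * i];
        int a1 = r[0] + r[3];
        int b1 = r[1] + r[2];
        int c1 = r[1] - r[2];
        int d1 = r[0] - r[3];
        out[4 * i + 0] = (int16_t)((a1 + b1 + 3) >> 3);
        out[4 * i + 1] = (int16_t)((c1 + d1 + 3) >> 3);
        out[4 * i + 2] = (int16_t)((a1 - b1 + 3) >> 3);
        out[4 * i + 3] = (int16_t)((d1 - c1 + 3) >> 3);
    }
}

/* DC-only shortcut: with in[1..15] all zero, every output of the full
 * transform equals (in[0] + 3) >> 3, so broadcast that one value. */
static void iwht4x4_dc_only(const int16_t in[16], int16_t out[16])
{
    int16_t dc = (int16_t)((in[0] + 3) >> 3);
    for (int i = 0; i < 16; i++)
        out[i] = dc;
}

/* Returns 1 if the luma blocks received DC values. Returns 0 for an empty
 * block: out[] (zeroed by the caller) is left untouched, and the luma
 * blocks should not be marked as having DC coefficients. */
static int luma_dc_dispatch(const int16_t in[16], int nnz, int16_t out[16])
{
    assert(nnz >= 0 && nnz <= 16);
    if (nnz == 0)
        return 0;                 /* empty: skip the WHT entirely */
    if (nnz == 1)
        iwht4x4_dc_only(in, out); /* only the DC coefficient can be set */
    else
        iwht4x4_full(in, out);
    return 1;
}
```

The DC-only path replaces two passes of butterflies with a single add and shift, and the empty case costs nothing at all.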
Originally committed as revision 24670 to svn://svn.ffmpeg.org/ffmpeg/trunk
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.
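The pattern referred to here, sketched in C for readability with a DC-only inverse DCT: the transform clears the coefficients it consumes, so the caller needs no separate clearing pass. The function name and signature are hypothetical. The (dc + 4) >> 3 rounding follows the VP8 DC-only inverse DCT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* DC-only 4x4 inverse DCT, added to the prediction already in dst.
 * The transform clears block[0] itself right after reading it, so the
 * caller never has to zero the coefficient buffer separately. */
static void idct_dc_add_and_clear(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;   /* VP8 DC-only rounding */

    block[0] = 0;                   /* zeroing done here, not by the caller */
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
}
```

Because the transform already has the coefficients loaded, clearing them at that point is cheap, which is why putting the clear inside the (asm) transform beats a separate C loop.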
Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
on the Huffman tree, instead of traversing the tree in a while loop.
Based on the similar optimization in libvpx's detokenize.c
10% faster at normal bitrates, and 30% faster for high-bitrate intra-only
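The difference between the two decoding styles, sketched in C on a toy 4-token tree. The tree is stored in the VP8 tree-array convention: entries come in pairs, a positive value indexes the next pair, and a non-positive value is a negated leaf. `BitSource` is a hypothetical stand-in for VP8's boolean decoder and ignores branch probabilities. The real DCT token tree is larger, but the idea is the same. The unrolled form replaces a data-dependent loop with fixed branches the compiler can lay out directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit source: returns pre-decided bits, no probabilities. */
typedef struct {
    const uint8_t *bits;
    size_t pos;
} BitSource;

static int read_bit(BitSource *bs)
{
    return bs->bits[bs->pos++];
}

/* Toy tree: bit 0 -> token 0; bits 10 -> 1; 110 -> 2; 111 -> 3. */
static const int8_t toy_tree[6] = { -0, 2, -1, 4, -2, -3 };

/* Generic traversal: one loop iteration (and one unpredictable
 * loop-exit branch) per bit read. */
static int read_token_loop(BitSource *bs)
{
    int i = 0;
    while ((i = toy_tree[i + read_bit(bs)]) > 0) {
        /* keep walking down the tree */
    }
    return -i;
}

/* Same tree with its branches written out explicitly, in the spirit of
 * the libvpx detokenizer's hand-unrolled token decoding. */
static int read_token_unrolled(BitSource *bs)
{
    if (!read_bit(bs))
        return 0;
    if (!read_bit(bs))
        return 1;
    return read_bit(bs) ? 3 : 2;
}
```

Both functions consume the same bits and return the same tokens. Only the control flow differs.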
Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.
Originally committed as revision 24456 to svn://svn.ffmpeg.org/ffmpeg/trunk
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
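A scalar C sketch of a 4-at-a-time DC-only add over a 16x4 strip of four horizontally adjacent 4x4 blocks. It also includes a hypothetical check for when a whole row qualifies. Names and signatures are illustrative. `nnz` is assumed to be one past the last nonzero coefficient, so values of 0 or 1 mean a block has at most a DC coefficient. An empty block has a zero DC and adds nothing, which lets a row mixing empty and DC-only blocks still take the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* One DC-only 4x4 block: add the rounded DC, then clear the coefficient. */
static void idct_dc_add(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;

    block[0] = 0;
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
}

/* Hypothetical test: can all four blocks in this row use the DC path? */
static int row_is_dc_only(const uint8_t nnz[4])
{
    for (int i = 0; i < 4; i++)
        if (nnz[i] > 1)
            return 0;
    return 1;
}

/* Four adjacent DC-only blocks (a 16x4 strip) in one call. A SIMD version
 * can add all four DC values to a 16-pixel row at once; this scalar
 * sketch simply loops over the blocks. */
static void idct_dc_add4(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add(dst + 4 * i, block[i], stride);
}
```

A caller would check `row_is_dc_only` once per row and fall back to per-block transforms only when it fails.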
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
Don't prefetch reference frames that were used less than 1/32nd of the time so
far in the frame.
This gives a speedup of up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-ref frames.
This is a very common property of videos encoded by libvpx.
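A sketch of the prefetch test, assuming hypothetical per-frame counters of how many macroblocks have used each reference so far. Comparing against `mb_index >> 5` implements the 1/32 threshold with a shift. Because the comparison is strict, a reference that has not been used at all this frame is never prefetched.

```c
#include <assert.h>

/* Hypothetical reference-frame slots. */
enum { REF_LAST, REF_GOLDEN, REF_ALTREF, NUM_REFS };

/* Hypothetical per-frame statistics, reset to zero at the start of a frame. */
typedef struct {
    int ref_count[NUM_REFS];  /* macroblocks so far that used each ref */
} RefStats;

/* Record that one more macroblock predicted from `ref`. */
static void note_ref_use(RefStats *st, int ref)
{
    assert(ref >= 0 && ref < NUM_REFS);
    st->ref_count[ref]++;
}

/* Prefetch only references used by more than mb_index / 32 (rounded down)
 * of the mb_index macroblocks decoded so far in this frame. */
static int should_prefetch(const RefStats *st, int ref, int mb_index)
{
    return st->ref_count[ref] > (mb_index >> 5);
}
```

Early in a frame the threshold is 0, so any use at all enables prefetching. As the frame progresses, a reference needs proportionally more uses to stay worth prefetching.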
Originally committed as revision 24451 to svn://svn.ffmpeg.org/ffmpeg/trunk
Prefetch all refs (including altref), but only if they've been used so far this
frame.
~2.5% faster overall.
TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple of blocks of a rarely-used ref don't force us to prefetch
it.
Originally committed as revision 24444 to svn://svn.ffmpeg.org/ffmpeg/trunk
Uses a slightly nonintuitive ring buffer size of (width+height*2) to simplify
addressing logic.
Also split out the segmentation map to a separate structure, necessary to
implement the ring buffer.
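A C sketch of one layout consistent with a ring buffer of (width + height*2) entries. All names are hypothetical. Row `y` starts two slots below row `y - 1`. Relative to the current macroblock, left is at -1, top-left at +1, top at +2 and top-right at +3, with no modulo arithmetic. Writing macroblock (x, y) overwrites row y - 1's column x - 2, which no later macroblock needs. Every index for x in [-1, w] stays within [0, w + 2h - 1]. The outermost slots are where out-of-frame neighbours would live; `simulate_ring` does not use them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-macroblock record; only the position is stored here. */
typedef struct { int x, y; } MBInfo;

static size_t ring_slots(int w, int h)
{
    return (size_t)w + 2 * (size_t)h;
}

/* Slot for macroblock (x, y) in a frame h macroblocks tall. */
static size_t ring_index(int x, int y, int h)
{
    return (size_t)(1 + 2 * (h - 1 - y) + x);
}

/* Decode a w x h grid in raster order. Before storing each macroblock,
 * check that the neighbours intra prediction would read still hold the
 * right data. Returns the number of mismatches (0 means the layout works). */
static int simulate_ring(int w, int h, MBInfo *buf)
{
    int bad = 0;

    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            MBInfo *mb = &buf[ring_index(x, y, h)];

            if (x > 0 && (mb[-1].x != x - 1 || mb[-1].y != y))
                bad++;                                         /* left */
            if (y > 0) {
                if (mb[2].x != x || mb[2].y != y - 1)
                    bad++;                                     /* top */
                if (x > 0 && (mb[1].x != x - 1 || mb[1].y != y - 1))
                    bad++;                                     /* top-left */
                if (x + 1 < w && (mb[3].x != x + 1 || mb[3].y != y - 1))
                    bad++;                                     /* top-right */
            }
            mb->x = x;   /* overwrites row y-1, column x-2 */
            mb->y = y;
        }
    }
    return bad;
}
```

Per-frame data that must outlive a single frame cannot live in a buffer like this, since every slot is reused within the frame.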
Originally committed as revision 24426 to svn://svn.ffmpeg.org/ffmpeg/trunk
Gives better cache locality, since the VP8Macroblock structs are still in cache.
Inspired by the way x264 does it.
Originally committed as revision 24417 to svn://svn.ffmpeg.org/ffmpeg/trunk
As in the previous commit, they aren't used for context selection, so it saves
memory this way.
Originally committed as revision 24416 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is the correct solution to the warning "fixed" in the previous
commit.
Originally committed as revision 24367 to svn://svn.ffmpeg.org/ffmpeg/trunk
libavcodec/vp8.c:892: warning: suggest explicit braces to avoid ambiguous `else'
Originally committed as revision 24366 to svn://svn.ffmpeg.org/ffmpeg/trunk
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since the 8-pixel U and V
rows can be processed together in a single 16-byte xmm register.
This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.
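A scalar C sketch of a combined chroma entry point that filters an 8-pixel horizontal edge in both planes per call. The per-pixel filter borrows the arithmetic of VP8's simple loop filter purely as a stand-in; it is not the filter VP8 applies to chroma. `filter8uv_h` and its signature are hypothetical. What matters is the calling convention. With U and V handled together, a SIMD version can load 8 U pixels and 8 V pixels into one 16-byte register per row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int clamp_s8(int v) { return v < -128 ? -128 : v > 127 ? 127 : v; }
static uint8_t s2u(int v) { return (uint8_t)(clamp_s8(v) + 128); }

/* Stand-in edge filter on one pixel column across a horizontal edge:
 * p1, p0 above the edge, q0, q1 below. Uses the VP8 simple-filter
 * arithmetic. Assumes >> on negative ints is arithmetic. */
static void simple_filter_px(uint8_t *p, ptrdiff_t stride, int flim)
{
    int p1 = p[-2 * stride], p0 = p[-stride], q0 = p[0], q1 = p[stride];

    if (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > flim)
        return;                              /* real edge: leave it alone */

    int sp1 = p1 - 128, sp0 = p0 - 128, sq0 = q0 - 128, sq1 = q1 - 128;
    int a  = clamp_s8(clamp_s8(sp1 - sq1) + 3 * (sq0 - sp0));
    int f1 = clamp_s8(a + 4) >> 3;           /* adjustment for q0 */
    int f2 = clamp_s8(a + 3) >> 3;           /* adjustment for p0 */

    p[0]       = s2u(sq0 - f1);
    p[-stride] = s2u(sp0 + f2);
}

/* One call filters the same 8-pixel edge in both chroma planes. The
 * scalar loop just alternates U and V; the benefit comes from SIMD. */
static void filter8uv_h(uint8_t *dst_u, uint8_t *dst_v, ptrdiff_t stride, int flim)
{
    assert(dst_u != NULL && dst_v != NULL);
    for (int i = 0; i < 8; i++) {
        simple_filter_px(dst_u + i, stride, flim);
        simple_filter_px(dst_v + i, stride, flim);
    }
}
```

Since U and V always share the same edge positions and filter parameters within a macroblock, merging them into one call costs nothing in flexibility.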
Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk