vpx/vp9
Jingning Han 9d67495f72 Optimize 32x32 2D inverse DCT for speed-up
This commit exploits the sparsity of the quantized coefficient
matrix. It checks each 32x8 array and skips the corresponding
inverse transformation if all entries are zero.
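
The zero-skipping idea can be illustrated with a scalar sketch. This is
not the SSE2 code from the commit: the names band_is_zero,
idct32_rows_sparse, row_transform_fn and copy_row are hypothetical, the
32x8 array is assumed to mean 8 consecutive rows of 32 coefficients in
row-major order, and copy_row is only a stand-in for a real 1-D inverse
transform. The skip is valid because a linear transform of an all-zero
input is all zero.

```c
#include <stdint.h>
#include <string.h>

#define N 32        /* transform size: 32x32 block */
#define BAND_ROWS 8 /* rows per band, so each band holds 32x8 coefficients */

/* Hypothetical signature for a 1-D inverse transform of one 32-entry row. */
typedef void (*row_transform_fn)(const int16_t *in, int16_t *out);

/* Stand-in transform used only for illustration: copies the row. */
static void copy_row(const int16_t *in, int16_t *out) {
  memcpy(out, in, N * sizeof(*in));
}

/* Returns 1 if every coefficient in rows [row0, row0 + BAND_ROWS) is zero. */
static int band_is_zero(const int16_t *coeffs, int row0) {
  int acc = 0;
  for (int i = 0; i < BAND_ROWS * N; ++i) {
    acc |= coeffs[row0 * N + i]; /* OR stays zero only if all entries are zero */
  }
  return acc == 0;
}

/* Row pass that skips all-zero bands. Returns the number of bands skipped. */
static int idct32_rows_sparse(const int16_t *coeffs, int16_t *out,
                              row_transform_fn transform_row) {
  int skipped = 0;
  for (int band = 0; band < N / BAND_ROWS; ++band) {
    const int row0 = band * BAND_ROWS;
    if (band_is_zero(coeffs, row0)) {
      /* Zero input gives zero output, so write zeros and skip the transform. */
      memset(out + row0 * N, 0, BAND_ROWS * N * sizeof(*out));
      ++skipped;
      continue;
    }
    for (int r = 0; r < BAND_ROWS; ++r) {
      transform_row(coeffs + (row0 + r) * N, out + (row0 + r) * N);
    }
  }
  return skipped;
}
```

With a block whose only nonzero coefficient lies in the first 8 rows,
idct32_rows_sparse(coeffs, out, copy_row) transforms one band and skips
the other three. The savings depend on how often the high-frequency
bands quantize to zero.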

For ped1080p at 8000 kbps, this reduces the average runtime of the
32x32 inverse 2D-DCT SSE2 function from 6256 cycles to 5200 cycles.
It makes the overall encoding process about 2% faster at speed 0.
The speed-up is more pronounced for the decoding process.

Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf
2013-07-31 17:13:31 -07:00
..
common Optimize 32x32 2D inverse DCT for speed-up 2013-07-31 17:13:31 -07:00
decoder Tune tokenization/detokenization flow for speed-up 2013-07-29 16:15:30 -07:00
encoder Make the use of ref_frame index consistent 2013-07-30 19:49:36 -07:00
exports_dec support building vp8 and vp9 into a single lib 2012-11-15 10:46:17 -08:00
exports_enc support building vp8 and vp9 into a single lib 2012-11-15 10:46:17 -08:00
vp9_common.mk Add neon optimize vp9_short_idct8x8_add. 2013-07-18 16:40:41 -07:00
vp9_cx_iface.c use consistent framerate naming 2013-07-16 14:12:47 -07:00
vp9_dx_iface.c vp9_dx_iface: s/vp8/vp9/ where possible 2013-07-12 11:05:39 -07:00
vp9_iface_common.h yv12config: remove YUV_TYPE 2013-07-12 15:25:48 -07:00
vp9cx.mk Remove unused fwalsh/fdct x86 SIMD implementations. 2013-07-10 18:22:51 -07:00
vp9dx.mk Remove all asm offset files from VP9 2013-07-09 14:26:53 -07:00