Ronald S. Bultje
488fadebbc
vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
3d0ca2fe89
vp9: 10/12bpp sse2 SIMD for iadst16.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
0e80265b0a
vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
1338fb79d4
vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
cb054d061a
vp9: add 10/12bpp sse2 SIMD versions of iadst8x8.
2015-10-13 11:05:59 -04:00
Ronald S. Bultje
e0610787b2
vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8.
2015-10-13 11:05:59 -04:00
Ronald S. Bultje
a35f6bdb38
vp9: add 12bpp sse2 versions of iadst4.
2015-10-13 11:05:59 -04:00
Ronald S. Bultje
235e76aeb8
vp9: initial attempt at an idct_idct_4x4 12bpp x86 simd (sse2) impl.
...
The trouble with this function is that intermediates overflow 31+sign
bits (i.e. a signed 32-bit integer), so I've added some helpers (which
will also be used in 10/12bpp 8x8, 16x16 and 32x32) to make that easier,
basically emulating a half-assed pmaddqd using two pmaddwd instructions.
It's currently sse2-only; if anyone sees potential in adding ssse3, I'd
love to hear it.
2015-10-13 11:05:58 -04:00
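The "pmaddqd from two pmaddwd" idea in the commit above can be illustrated with a scalar model. This is a minimal sketch under stated assumptions, not the helpers from the commit: it assumes Q14 coefficients with |c| <= 2^14 (VP9's inverse-transform constants are Q14, e.g. 11585 for cos(pi/4) * 2^14), rounding as (sum + 2^13) >> 14, and inputs within [-2^29, 2^29) so the high part fits in 16 bits. Each 32-bit input is split as x = hi * 2^14 + lo with 0 <= lo < 2^14. One pmaddwd-style multiply handles the high parts and a second handles the low parts. Because hi * c * 2^14 is a multiple of 2^14, only the low-part sum needs rounding, and the result matches a 64-bit reference exactly. The names `pmaddwd_lane`, `ref_mul_add` and `emu_mul_add` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of SSE2 pmaddwd: two signed 16x16-bit
 * products summed into a signed 32-bit result. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Reference: (x*c0 + y*c1 + 2^13) >> 14 in 64-bit arithmetic. For large
 * x, y the products no longer fit in a 32-bit lane. Assumes >> on negative
 * values is an arithmetic shift (true on mainstream compilers,
 * implementation-defined in ISO C). */
static int32_t ref_mul_add(int32_t x, int32_t y, int16_t c0, int16_t c1)
{
    int64_t sum = (int64_t)x * c0 + (int64_t)y * c1 + (1 << 13);
    return (int32_t)(sum >> 14);
}

/* Hypothetical emulation of the wide multiply-add with two pmaddwd-style
 * operations, keeping every intermediate within 32 bits. */
static int32_t emu_mul_add(int32_t x, int32_t y, int16_t c0, int16_t c1)
{
    /* Assumed ranges: high parts must fit in int16, and Q14 coefficients
     * keep each high-part sum below 2^31. */
    assert(x >= -(1 << 29) && x < (1 << 29));
    assert(y >= -(1 << 29) && y < (1 << 29));
    assert(c0 >= -(1 << 14) && c0 <= (1 << 14));
    assert(c1 >= -(1 << 14) && c1 <= (1 << 14));

    /* Split x = x_hi * 2^14 + x_lo with 0 <= x_lo < 2^14 (same for y). */
    int16_t x_hi = (int16_t)(x >> 14), x_lo = (int16_t)(x & 0x3fff);
    int16_t y_hi = (int16_t)(y >> 14), y_lo = (int16_t)(y & 0x3fff);

    int32_t hi = pmaddwd_lane(x_hi, y_hi, c0, c1); /* already scaled down by 2^14 */
    int32_t lo = pmaddwd_lane(x_lo, y_lo, c0, c1); /* still needs the >> 14 */

    /* Only the low-part sum carries a fractional part, so round it alone. */
    return hi + ((lo + (1 << 13)) >> 14);
}
```

For example, `emu_mul_add(1000, 2000, 11585, 11585)` and `ref_mul_add(1000, 2000, 11585, 11585)` both give 2121. With x around 5 * 10^8, `x * 15137` alone exceeds `INT32_MAX`, yet `emu_mul_add` still agrees with the 64-bit reference. In this model the cost of staying in 32-bit lanes is the extra split (a mask and an arithmetic shift) and a second multiply per coefficient pair.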
Ronald S. Bultje
f76423d097
vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions.
2015-10-13 11:05:58 -04:00
Ronald S. Bultje
6b579cf547
vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4.
2015-10-13 11:05:58 -04:00
Ronald S. Bultje
1c3be32533
vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.
2015-10-13 11:05:57 -04:00