Ronald S. Bultje
|
92436e8ad9
|
vp9: implement top/left half (4x4) sub-8x8-IDCT.
For that specific case (eob>3&&eob<=12), the runtime of idct8x8 drops from
668 to 477 cycles. Across all idct8x8 calls, it drops from 521 to 490 cycles.
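This scalar C sketch shows why the top/left half path is cheaper. It assumes the 1-D 8-point IDCT follows the libvpx reference structure and the Q14 constants `cospi_k_64 = round(16384 * cos(k*pi/64))`. As the commit title implies, when every nonzero coefficient sits in the top-left 4x4 quadrant, each 1-D transform only sees nonzero values in inputs 0..3. Every product with inputs 4..7 is then zero and can be dropped without changing the result. The names `idct8_full`, `idct8_half` and `idct8_tail` are hypothetical and are not FFmpeg functions.

```c
#include <assert.h>
#include <stdint.h>

/* Q14 cosine constants: round(16384 * cos(k * pi / 64)). */
enum {
    COSPI_4_64 = 16069, COSPI_8_64 = 15137, COSPI_12_64 = 13623,
    COSPI_16_64 = 11585, COSPI_20_64 = 9102, COSPI_24_64 = 6270,
    COSPI_28_64 = 3196
};

/* Round to nearest and drop the 14 fractional bits.
 * Assumes >> on a negative int32_t is an arithmetic shift (gcc/clang). */
static int32_t rnd14(int32_t x) { return (x + (1 << 13)) >> 14; }

/* Stages 3-4 of the 8-point IDCT, shared by both variants. */
static void idct8_tail(const int32_t s2[8], int32_t out[8])
{
    int32_t a0 = s2[0] + s2[3], a1 = s2[1] + s2[2];
    int32_t a2 = s2[1] - s2[2], a3 = s2[0] - s2[3];
    int32_t a5 = rnd14((s2[6] - s2[5]) * COSPI_16_64);
    int32_t a6 = rnd14((s2[5] + s2[6]) * COSPI_16_64);

    out[0] = a0 + s2[7]; out[1] = a1 + a6; out[2] = a2 + a5; out[3] = a3 + s2[4];
    out[4] = a3 - s2[4]; out[5] = a2 - a5; out[6] = a1 - a6; out[7] = a0 - s2[7];
}

/* Full variant: any input may be nonzero (14 multiplies in stages 1-2). */
static void idct8_full(const int32_t in[8], int32_t out[8])
{
    int32_t s4 = rnd14(in[1] * COSPI_28_64 - in[7] * COSPI_4_64);
    int32_t s7 = rnd14(in[1] * COSPI_4_64 + in[7] * COSPI_28_64);
    int32_t s5 = rnd14(in[5] * COSPI_12_64 - in[3] * COSPI_20_64);
    int32_t s6 = rnd14(in[5] * COSPI_20_64 + in[3] * COSPI_12_64);
    int32_t s2[8] = {
        rnd14((in[0] + in[4]) * COSPI_16_64),
        rnd14((in[0] - in[4]) * COSPI_16_64),
        rnd14(in[2] * COSPI_24_64 - in[6] * COSPI_8_64),
        rnd14(in[2] * COSPI_8_64 + in[6] * COSPI_24_64),
        s4 + s5, s4 - s5, s7 - s6, s6 + s7
    };
    idct8_tail(s2, out);
}

/* Top/left half variant: requires in[4..7] == 0, so every product with
 * those inputs vanishes (7 multiplies in stages 1-2). */
static void idct8_half(const int32_t in[8], int32_t out[8])
{
    assert(in[4] == 0 && in[5] == 0 && in[6] == 0 && in[7] == 0);
    int32_t s4 = rnd14(in[1] * COSPI_28_64);
    int32_t s7 = rnd14(in[1] * COSPI_4_64);
    int32_t s5 = rnd14(-in[3] * COSPI_20_64);
    int32_t s6 = rnd14(in[3] * COSPI_12_64);
    int32_t dc = rnd14(in[0] * COSPI_16_64);
    int32_t s2[8] = {
        dc, dc,
        rnd14(in[2] * COSPI_24_64),
        rnd14(in[2] * COSPI_8_64),
        s4 + s5, s4 - s5, s7 - s6, s6 + s7
    };
    idct8_tail(s2, out);
}
```

In a separable 2-D transform, the first pass only has to run on the four columns that can be nonzero, because the other four transform to all zeros. After that pass, every row again has zeros in positions 4..7, so the second pass can also use the reduced form.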
|
2013-12-07 12:39:36 -05:00 |
|
Ronald S. Bultje
|
b2045c44a9
|
vp9: split pre-load of 11585x2 out of 1d idct macro.
This allows us to load the constant only once in this function, instead of twice.
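The name 11585x2 suggests the constant feeds SSSE3 `pmulhrsw`; this is an assumption, since the commit does not state it. That instruction computes round(a*b / 2^15), whereas VP9's cospi_16_64 = 11585 is a Q14 value. Doubling the constant therefore turns the Q15 multiply into the Q14 round-shift used by the reference IDCT. This scalar C sketch checks that equivalence for every 16-bit input and shows a single hoisted constant being reused for two rows. `pmulhrsw_lane`, `mul_cospi16_ref`, `doubled_constant_is_exact` and `scale_two_rows` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { COSPI_16_64 = 11585 };          /* cos(pi/4) in Q14 */

/* The doubled constant must still fit in a signed 16-bit lane. */
static_assert(2 * COSPI_16_64 <= INT16_MAX, "11585x2 must fit in int16");

/* Scalar model of one lane of SSSE3 pmulhrsw: (a * b + 2^14) >> 15,
 * i.e. a Q15 multiply rounded to nearest.
 * Assumes >> on a negative int32_t is an arithmetic shift (gcc/clang). */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + (1 << 14)) >> 15);
}

/* Reference Q14 rounding: (a * 11585 + 2^13) >> 14. */
static int16_t mul_cospi16_ref(int16_t a)
{
    return (int16_t)(((int32_t)a * COSPI_16_64 + (1 << 13)) >> 14);
}

/* Returns 1 if the pmulhrsw path with 11585x2 matches the reference
 * for every possible 16-bit input, else 0. */
static int doubled_constant_is_exact(void)
{
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a++)
        if (pmulhrsw_lane((int16_t)a, 2 * COSPI_16_64) != mul_cospi16_ref((int16_t)a))
            return 0;
    return 1;
}

/* Hypothetical helper: the constant is materialized once and reused for
 * both rows, mirroring a load hoisted out of the 1-D macro. */
static void scale_two_rows(int16_t r0[8], int16_t r1[8])
{
    const int16_t k = 2 * COSPI_16_64;   /* 23170, loaded once */
    for (int i = 0; i < 8; i++) {
        r0[i] = pmulhrsw_lane(r0[i], k);
        r1[i] = pmulhrsw_lane(r1[i], k);
    }
}
```

The equivalence is exact because (2x + 2^14) / 2^15 equals (x + 2^13) / 2^14 before flooring. In the assembly, keeping the constant in a register across both uses of the macro is what saves the second load. The arithmetic itself is unchanged.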
|
2013-12-07 12:39:36 -05:00 |
|
Ronald S. Bultje
|
f9a0d4c6e0
|
vp9: minor refactorings in idct ssse3 assembly.
Make register usage in macros explicit; change mulsub_2w_4x to use 2
instead of 3 temp registers.
|
2013-12-07 12:39:35 -05:00 |
|
Ronald S. Bultje
|
8729964b99
|
vp9: split x86 assembly in two files.
(In the future, the loopfilter or intra prediction code could likewise be
moved into separate files.)
|
2013-12-07 12:39:35 -05:00 |
|