Wrote SSE2 versions of vp9_short_idct8x8 and vp9_short_idct10_8x8.
Compared to the C version, the SSE2 version is 2x faster. The decoder
test didn't show a noticeable gain, since the 8x8 IDCT takes little of
the decoding time (less than 1% in my test).
Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
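Most of a fixed-point 8x8 IDCT is butterflies of the form
round((a*c0 + b*c1) / 2^14), and SSE2 can evaluate eight 16-bit lanes of
that at once. The sketch below shows only this multiply-round-shift step,
not the committed vp9_short_idct8x8 code. DCT_CONST_BITS = 14, the helper
names rotate_scalar/rotate_sse2, and the example constants 15137 and 6270
(cos(pi/8) and cos(3*pi/8) scaled by 2^14) are assumptions; it needs an
x86 target with SSE2.

```c
#include <emmintrin.h>
#include <stdint.h>

#define DCT_CONST_BITS 14
#define DCT_CONST_ROUNDING (1 << (DCT_CONST_BITS - 1))

/* Scalar reference: (a*c0 + b*c1) rounded and shifted down by 14 bits,
 * then saturated to int16 the way _mm_packs_epi32 saturates.
 * Assumes |c0|, |c1| < 2^14, as for DCT constants. */
static int16_t rotate_scalar(int16_t a, int16_t b, int16_t c0, int16_t c1) {
  int32_t t = ((int32_t)a * c0 + (int32_t)b * c1 + DCT_CONST_ROUNDING) >>
              DCT_CONST_BITS;
  if (t > INT16_MAX) t = INT16_MAX;
  if (t < INT16_MIN) t = INT16_MIN;
  return (int16_t)t;
}

/* SSE2 version: eight (a[i], b[i]) pairs per call. */
static void rotate_sse2(const int16_t *a, const int16_t *b, int16_t c0,
                        int16_t c1, int16_t *out) {
  const __m128i va = _mm_loadu_si128((const __m128i *)a);
  const __m128i vb = _mm_loadu_si128((const __m128i *)b);
  /* (c0, c1) repeated in every 32-bit lane; _mm_set_epi16 lists lanes
   * from highest to lowest. */
  const __m128i k = _mm_set_epi16(c1, c0, c1, c0, c1, c0, c1, c0);
  const __m128i rnd = _mm_set1_epi32(DCT_CONST_ROUNDING);
  /* Interleave so each 32-bit lane holds (a[i], b[i]). */
  const __m128i lo = _mm_unpacklo_epi16(va, vb);
  const __m128i hi = _mm_unpackhi_epi16(va, vb);
  /* madd multiplies 16-bit pairs and adds them: a[i]*c0 + b[i]*c1. */
  __m128i lo32 = _mm_madd_epi16(lo, k);
  __m128i hi32 = _mm_madd_epi16(hi, k);
  /* Add the rounding constant, then arithmetic shift right by 14. */
  lo32 = _mm_srai_epi32(_mm_add_epi32(lo32, rnd), DCT_CONST_BITS);
  hi32 = _mm_srai_epi32(_mm_add_epi32(hi32, rnd), DCT_CONST_BITS);
  /* Pack back to 16 bits with signed saturation. */
  _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(lo32, hi32));
}
```

Working on whole registers like this, rather than one coefficient at a
time, is the general reason an SSE2 IDCT can reach the kind of 2x speedup
reported above.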
Fixing code style, using array lookup instead of switch statements for
forward hybrid transforms (in the same way as for their inverses).
Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places.
Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f
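Replacing a switch on the transform type with an array lookup means
indexing a table of (column, row) 1-D transform pairs by TX_TYPE. The
sketch below illustrates that pattern for a 4x4 block. The enum ordering,
the transform_2d struct, FHT_4, fht4x4_sketch and the two placeholder 1-D
transforms are hypothetical; the placeholders only copy or reverse their
input so the dispatch can be exercised, where a real encoder would plug in
its forward DCT and ADST.

```c
/* Assumed ordering of the hybrid transform types (illustrative only). */
typedef enum { DCT_DCT = 0, ADST_DCT = 1, DCT_ADST = 2, ADST_ADST = 3 } TX_TYPE;

/* A 1-D transform over 4 samples. */
typedef void (*transform_1d)(const int *in, int *out);

/* Column (vertical) and row (horizontal) transform for one TX_TYPE. */
typedef struct {
  transform_1d cols;
  transform_1d rows;
} transform_2d;

/* Hypothetical placeholders standing in for real 1-D DCT/ADST code. */
static void fdct4_placeholder(const int *in, int *out) {
  for (int i = 0; i < 4; ++i) out[i] = in[i];      /* copy */
}
static void fadst4_placeholder(const int *in, int *out) {
  for (int i = 0; i < 4; ++i) out[i] = in[3 - i];  /* reverse */
}

/* Table indexed by TX_TYPE, used instead of a switch statement. */
static const transform_2d FHT_4[] = {
  { fdct4_placeholder, fdct4_placeholder },   /* DCT_DCT   */
  { fadst4_placeholder, fdct4_placeholder },  /* ADST_DCT  */
  { fdct4_placeholder, fadst4_placeholder },  /* DCT_ADST  */
  { fadst4_placeholder, fadst4_placeholder }, /* ADST_ADST */
};

/* Separable 2-D forward transform of a 4x4 block: columns, then rows. */
static void fht4x4_sketch(const int *input, int *output, int stride,
                          TX_TYPE tx_type) {
  const transform_2d ht = FHT_4[tx_type];
  int temp[4 * 4];
  int in[4], out[4];
  for (int i = 0; i < 4; ++i) {  /* column pass */
    for (int j = 0; j < 4; ++j) in[j] = input[j * stride + i];
    ht.cols(in, out);
    for (int j = 0; j < 4; ++j) temp[j * 4 + i] = out[j];
  }
  for (int i = 0; i < 4; ++i) {  /* row pass */
    for (int j = 0; j < 4; ++j) in[j] = temp[i * 4 + j];
    ht.rows(in, out);
    for (int j = 0; j < 4; ++j) output[i * 4 + j] = out[j];
  }
}
```

Adding a transform type then means adding a table row rather than another
case label, and the forward and inverse paths can share the same shape.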
Wrote an SSE2 version of the vp9_dc_only_idct_add_c function. In order to
improve performance, clipped the absolute diff values to [0, 255].
This allowed us to keep the additions/subtractions in 8 bits.
Testing showed a decoder performance increase of over 2%.
Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d
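The clipping trick can be checked with a scalar sketch. Because the
prediction pixel is already in [0, 255], any DC offset with magnitude of
255 or more saturates to the same result, so clipping the magnitude to
255 loses nothing and the add or subtract can stay in unsigned 8 bits.
The helpers adds_u8/subs_u8 are hypothetical scalar stand-ins for the
per-lane behaviour of the SSE2 saturating intrinsics _mm_adds_epu8 and
_mm_subs_epu8; the computation of the DC value itself is omitted.

```c
#include <stdint.h>

/* Reference: add a (possibly large) DC offset and clip to [0, 255]. */
static uint8_t clip_add_ref(uint8_t pred, int dc) {
  const int v = pred + dc;
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Unsigned saturating 8-bit add and subtract, one lane at a time. */
static uint8_t adds_u8(uint8_t a, uint8_t b) {
  const int v = a + b;
  return (uint8_t)(v > 255 ? 255 : v);
}
static uint8_t subs_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* 8-bit path: clip |dc| to [0, 255] once, then add or subtract it with
 * saturation depending on the sign of dc. */
static uint8_t clip_add_8bit(uint8_t pred, int dc) {
  const int mag = dc < 0 ? -dc : dc;
  const uint8_t m = (uint8_t)(mag > 255 ? 255 : mag);
  return dc >= 0 ? adds_u8(pred, m) : subs_u8(pred, m);
}
```

In a SIMD version, the clipped magnitude would be broadcast to all
lanes once per block and applied to 16 pixels per instruction.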
The commit improves the 32x32 forward DCT implementation:
1. change to use the same constants and rounding as the other forward DCTs
2. select rounding specifically to minimize the roundtrip error, which
reduced the average error from 19/block to 0.77/block over 100000 random inputs.
Testing showed a small but consistent gain on all test sets, about 0.15%.
Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53
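Why the rounding choice matters for roundtrip error can be seen on a much
simpler operation than the 32x32 DCT: scaling a value down by 2^n and back
up. The sketch below does not reproduce the transform or its measured
numbers; it only compares truncation with round-half-up. The macro is
written the way libvpx's ROUND_POWER_OF_TWO is commonly defined, but treat
its exact form here as an assumption, and roundtrip_abs_error is a
hypothetical helper.

```c
#include <stdlib.h>

/* Rounding right shift (assumed form of the ROUND_POWER_OF_TWO macro). */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Total absolute error of x -> (x / 2^n) -> (* 2^n) over x in [0, limit),
 * using either a truncating shift or a rounding shift for the downscale. */
static long roundtrip_abs_error(int n, int limit, int use_rounding) {
  long total = 0;
  for (int x = 0; x < limit; ++x) {
    const int down = use_rounding ? ROUND_POWER_OF_TWO(x, n) : (x >> n);
    const int back = down << n;
    total += labs((long)(back - x));
  }
  return total;
}
```

With n = 2, truncation errs by 0, 1, 2, 3 within each group of four
inputs while rounding errs by 0, 1, 2, 1, so over [0, 65536) the totals
are 98304 and 65536 respectively.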
Rebased.
Remove the old matrix-multiplication transform computation. The 16x16
ADST/DCT can be switched on or off for evaluation by setting ACTIVE_HT16
to 300 or 0, respectively, in vp9/common/vp9_blockd.h.
Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
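The on/off values 300 and 0 are consistent with ACTIVE_HT16 acting as a
threshold that a quantizer index in [0, 255] is compared against: every
index is below 300 and none is below 0. The sketch below assumes exactly
that; the comparison, the #ifndef guard and the helper use_hybrid_16x16
are hypothetical and may not match how vp9_blockd.h actually uses the
macro.

```c
/* Assumed semantics: the 16x16 hybrid (ADST/DCT) transform is used when
 * the quantizer index is below this threshold; q_index is assumed to lie
 * in [0, 255], so 300 means "always on" and 0 means "always off". */
#ifndef ACTIVE_HT16
#define ACTIVE_HT16 300
#endif

/* Hypothetical helper: returns 1 if the hybrid 16x16 transform applies. */
static int use_hybrid_16x16(int q_index) {
  return q_index < ACTIVE_HT16;
}
```

Under this assumption, building with -DACTIVE_HT16=0 makes the helper
return 0 for every index, which disables the hybrid transform.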
Removing redundant 'extern' keywords and parentheses, fixing indentation,
making variable names lower case, using short expressions x *= c
instead of x = x * c, minor code simplifications.
Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
Fixed format issues.
Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be factorized
into simpler operations and is hence retained.
Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
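One way to see why the 4-point ADST factorizes: with basis values
out[n] = sum_k in[k] * sin(pi*(n+1)*(2k+1)/9), every entry reduces to
plus or minus one of sin(pi/9), sin(2*pi/9), sin(3*pi/9), sin(4*pi/9),
and sin(pi/9) + sin(2*pi/9) = sin(4*pi/9). The sketch below compares a
factorized inverse against the plain 4x4 matrix product. It is not
claimed to be the committed code: this particular arrangement uses eight
multiplications (s0 to s7), and the names iadst4_factored/iadst4_direct
are hypothetical. The 14-bit constants are assumed to be
round(2^14 * (2*sqrt(2)/3) * sin(k*pi/9)); they satisfy 5283 + 9929 = 15212
exactly, so both paths agree bit for bit.

```c
#include <stdint.h>

/* Assumed 14-bit fixed-point sine constants. */
enum { SINPI_1_9 = 5283, SINPI_2_9 = 9929, SINPI_3_9 = 13377, SINPI_4_9 = 15212 };

static int round_shift14(int64_t v) { return (int)((v + (1 << 13)) >> 14); }

/* Reference: full 4x4 matrix product, S[n][k] ~ sin(pi*(n+1)*(2k+1)/9). */
static const int64_t S[4][4] = {
  { SINPI_1_9, SINPI_3_9, SINPI_4_9, SINPI_2_9 },
  { SINPI_2_9, SINPI_3_9, -SINPI_1_9, -SINPI_4_9 },
  { SINPI_3_9, 0, -SINPI_3_9, SINPI_3_9 },
  { SINPI_4_9, -SINPI_3_9, SINPI_2_9, -SINPI_1_9 },
};

static void iadst4_direct(const int in[4], int out[4]) {
  for (int n = 0; n < 4; ++n) {
    int64_t acc = 0;
    for (int k = 0; k < 4; ++k) acc += S[n][k] * in[k];
    out[n] = round_shift14(acc);
  }
}

/* Factorized form: shared partial sums replace most of the products. */
static void iadst4_factored(const int in[4], int out[4]) {
  const int64_t x0 = in[0], x1 = in[1], x2 = in[2], x3 = in[3];
  const int64_t s0 = SINPI_1_9 * x0;
  const int64_t s1 = SINPI_2_9 * x0;
  const int64_t s2 = SINPI_3_9 * x1;
  const int64_t s3 = SINPI_4_9 * x2;
  const int64_t s4 = SINPI_1_9 * x2;
  const int64_t s5 = SINPI_2_9 * x3;
  const int64_t s6 = SINPI_4_9 * x3;
  const int64_t s7 = SINPI_3_9 * (x0 - x2 + x3);
  const int64_t a = s0 + s3 + s5;  /* shared by out[0] and out[3] */
  const int64_t b = s1 - s4 - s6;  /* shared by out[1] and out[3] */
  out[0] = round_shift14(a + s2);
  out[1] = round_shift14(b + s2);
  out[2] = round_shift14(s7);
  out[3] = round_shift14(a + b - s2);  /* uses sin1 + sin2 = sin4, etc. */
}
```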