An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.
Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
This commit enables SSSE3 implementation of the inverse 2D-DCT
with only first 10 coefficients non-zero. It reduces the runtime
of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up.
Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d