This commit fixes a potential overflow in the SSE2 implementation
of the 32x32 forward DCT. It resolves the corruption seen in coded
frames at scene borders.
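The message does not say which operation overflowed. As a hypothetical illustration only, the scalar C sketch below models a single 16-bit SIMD lane. A plain 16-bit add (the wrap-around behavior of SSE2 `_mm_add_epi16`) silently produces a wrong sign once the true sum leaves [-32768, 32767], while widening the operands to 32 bits first keeps the exact value. The helper names are made up for this sketch and are not taken from the codebase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models one 16-bit SIMD lane doing a plain add
   (as _mm_add_epi16 does). The true sum is wrapped modulo 2^16 back
   into [-32768, 32767] without relying on implementation-defined
   narrowing conversions. */
static int16_t lane_add_wrap16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  uint32_t biased = ((uint32_t)(sum + 32768)) & 0xFFFFu;
  return (int16_t)((int32_t)biased - 32768);
}

/* Hypothetical helper: the same add with both operands widened to
   32 bits first, so the result is always exact. */
static int32_t lane_add_widen32(int16_t a, int16_t b) {
  return (int32_t)a + (int32_t)b;
}

/* Hypothetical check: returns nonzero if a 16-bit add of a and b
   would lose information. */
static int overflows16(int16_t a, int16_t b) {
  int32_t sum = lane_add_widen32(a, b);
  return sum > INT16_MAX || sum < INT16_MIN;
}
```

For example, `lane_add_wrap16(30000, 10000)` yields -25536, whereas `lane_add_widen32(30000, 10000)` returns the exact 40000.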
Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9

Enable the SSE2 implementation of the high-precision 32x32 forward
DCT. The intermediate stages operate on 32-bit values. The run time
drops from 32126 cycles to 13442 cycles (roughly a 2.4x speedup).
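A minimal sketch of why 32-bit intermediates matter, assuming 14-bit fixed-point cosine constants with round-to-nearest shifting. The constant and function names are hypothetical, chosen for this sketch. `madd_lane` models one 32-bit lane of SSE2 `_mm_madd_epi16`, which multiplies pairs of signed 16-bit values and adds adjacent products into a 32-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point parameters for this sketch. */
#define SKETCH_CONST_BITS 14
#define SKETCH_ROUNDING (1 << (SKETCH_CONST_BITS - 1))
#define SKETCH_COS_PI_4 11585 /* round(2^14 * cos(pi/4)) */

/* Models one 32-bit lane of _mm_madd_epi16: two 16x16-bit products
   summed into a 32-bit result. */
static int32_t madd_lane(int16_t a0, int16_t c0, int16_t a1, int16_t c1) {
  return (int32_t)a0 * c0 + (int32_t)a1 * c1;
}

/* Rounds a 14-bit fixed-point value to the nearest integer.
   Assumes >> on a negative int is an arithmetic shift, as on
   mainstream compilers. */
static int32_t round_shift(int32_t x) {
  return (x + SKETCH_ROUNDING) >> SKETCH_CONST_BITS;
}

/* Butterfly output (a + b) * cos(pi/4), kept in 32 bits instead of
   being packed back into a 16-bit lane. */
static int32_t butterfly_sum32(int16_t a, int16_t b) {
  return round_shift(madd_lane(a, SKETCH_COS_PI_4, b, SKETCH_COS_PI_4));
}

/* Butterfly output (a - b) * cos(pi/4), also kept in 32 bits. */
static int32_t butterfly_diff32(int16_t a, int16_t b) {
  return round_shift(madd_lane(a, SKETCH_COS_PI_4, b, -SKETCH_COS_PI_4));
}

/* Nonzero if x can be stored in a signed 16-bit lane unchanged. */
static int fits_int16(int32_t x) {
  return x >= INT16_MIN && x <= INT16_MAX;
}
```

With `a = b = 32767`, `butterfly_sum32` returns 46339, which no longer fits in a signed 16-bit lane; packing it back with saturation (as `_mm_packs_epi32` does) would clamp it to 32767, so keeping such values in 32 bits avoids that loss.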
Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56

This is in preparation for the SSE2 version of the high-precision
32x32 forward DCT, which will share much of its code with the
existing low-precision version used for rate-distortion search.
Change-Id: I7084b6bdfb480b1fabb8493fb14e3f7fcc7888c0