Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
Expand 93c32a55 which used SSE2 instructions to do two
idct/dequant/recons at a time to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost
Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1