This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
Rename updated version of x86inc.asm
Use "private_prefix" instead of "program_name" and make vpx the default
prefix.
Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458