Sindre Aamås 7486de2844 [Encoder] AVX2 DCT tweaks
Do some shuffling in load/store unpack/pack to save some
work in horizontal DCTs.

Use a few 128-bit broadcasts to compact data vectors a bit.

~1.04x speedup for the DCT case on Haswell.
~1.12x speedup for the IDCT case on Haswell.
2016-02-02 17:22:48 +01:00
..
2016-02-02 17:22:48 +01:00