Sindre Aamås c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
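
For reference, a minimal scalar sketch of what a single-block 4x4 forward transform computes. It assumes the routine is the standard H.264 4x4 integer core transform applied to a prediction residual (source minus prediction). The names `fwd4` and `dct4x4_ref` and the parameter layout are hypothetical, chosen for illustration; this is not the AVX2 code added by the commit.

```c
#include <stdint.h>

/* One 1-D pass of the H.264 4x4 forward core transform (butterfly form). */
static void fwd4(const int32_t in[4], int32_t out[4]) {
    int32_t s0 = in[0] + in[3];
    int32_t s1 = in[1] + in[2];
    int32_t s2 = in[1] - in[2];
    int32_t s3 = in[0] - in[3];
    out[0] = s0 + s1;
    out[1] = 2 * s3 + s2;
    out[2] = s0 - s1;
    out[3] = s3 - 2 * s2;
}

/* Hypothetical scalar reference: transform one 4x4 residual block.
 * coef receives 16 coefficients in row-major order. */
void dct4x4_ref(int16_t coef[16],
                const uint8_t *src, int src_stride,
                const uint8_t *pred, int pred_stride) {
    int32_t tmp[4][4];

    /* Horizontal pass over each row of the residual. */
    for (int y = 0; y < 4; y++) {
        int32_t row[4];
        for (int x = 0; x < 4; x++)
            row[x] = (int32_t)src[y * src_stride + x]
                   - (int32_t)pred[y * pred_stride + x];
        fwd4(row, tmp[y]);
    }

    /* Vertical pass over each column of the intermediate result. */
    for (int x = 0; x < 4; x++) {
        int32_t col[4], out[4];
        for (int y = 0; y < 4; y++)
            col[y] = tmp[y][x];
        fwd4(col, out);
        for (int y = 0; y < 4; y++)
            coef[y * 4 + x] = (int16_t)out[y];
    }
}
```

Each block needs only 16 loads, two short butterfly passes, and 16 stores, so there is little arithmetic to amortize the data movement against. That is consistent with the commit's note that returns diminish as vectors get wider.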