Sindre Aamås b6c4a5447c [Decoder/x86] IDCT one block at a time with SSE2
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
..
2016-01-19 17:16:39 -08:00
2014-06-25 18:50:41 -07:00