Sindre Aamås b6c4a5447c [Decoder/x86] IDCT one block at a time with SSE2
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
..
2015-11-27 08:56:20 +08:00
2015-11-27 08:56:20 +08:00
2015-06-16 14:17:12 +08:00
2015-06-16 14:17:12 +08:00