Sindre Aamås 62fb37d096 [Common/x86] DeblockLumaEq4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.
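
To illustrate the idea of staying in 8-bit lanes, here is a minimal sketch in portable C. It is not taken from this commit. It assumes that rounded byte averages (the operation the x86 PAVGB instruction performs) are a representative building block, and it uses a 64-bit integer as eight byte lanes instead of SSSE3 registers. The names avg_widened, avg_packed8 and count_mismatches are hypothetical.

```c
#include <stdint.h>

/* Reference approach: widen each byte to 16 bits, add, round, narrow. */
static uint8_t avg_widened(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b + 1u) >> 1);
}

/* Packed approach: rounded average of eight byte lanes at once.
 * Uses the identity (a + b + 1) >> 1 == (a | b) - ((a ^ b) >> 1),
 * which never needs a ninth bit. The mask drops bits that the shift
 * carries in from the neighbouring lane. Per lane, (a | b) is at least
 * (a ^ b) >> 1, so the subtraction never borrows across lanes. */
static uint64_t avg_packed8(uint64_t a, uint64_t b) {
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;
    return (a | b) - (((a ^ b) >> 1) & low7);
}

/* Check every byte pair against the widened reference.
 * Returns the number of mismatching pairs. */
static unsigned count_mismatches(void) {
    const uint64_t ones = 0x0101010101010101ULL;
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            /* Replicate each byte into all eight lanes. */
            uint64_t r = avg_packed8(a * ones, b * ones);
            uint8_t want = avg_widened((uint8_t)a, (uint8_t)b);
            if (r != want * ones) ++bad;
        }
    }
    return bad;
}
```

The packed form handles twice as many pixels per register as a 16-bit unpacked form would, which is one plausible source of the kind of speedup reported here; the actual filter arithmetic in DeblockLumaEq4_ssse3 may differ.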

Minimize register spills.

~2.31x speedup on Haswell (x86-64).
~2.40x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:39 +01:00