Sindre Aamås 732e1c5f78 [Common/x86] DeblockLumaLt4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~1.97x speedup on Haswell (x86-64).
~3.09x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:18 +01:00
2015-10-15 10:04:00 -07:00