10 Commits

Author SHA1 Message Date
Sindre Aamås
a009153741 [Common/x86] DeblockChromaEq4H_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

~5.80x speedup on Haswell (x86-64).
~1.69x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
Sindre Aamås
9909c306f1 [Common/x86] DeblockChromaLt4H_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

~5.72x speedup on Haswell (x86-64).
~1.85x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
Sindre Aamås
e96a7b5c92 [Common/x86] DeblockChromaEq4V_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~2.07x speedup on Haswell (x86-64).
~2.12x speedup on Haswell (x86 32-bit).
2016-02-15 02:08:03 +01:00
Sindre Aamås
fc16010583 [Common/x86] DeblockChromaLt4V_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~2.68x speedup on Haswell (x86-64).
~2.38x speedup on Haswell (x86 32-bit).
2016-02-15 02:07:25 +01:00
Sindre Aamås
62fb37d096 [Common/x86] DeblockLumaEq4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Minimize spills.

~2.31x speedup on Haswell (x86-64).
~2.40x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:39 +01:00
Sindre Aamås
732e1c5f78 [Common/x86] DeblockLumaLt4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~1.97x speedup on Haswell (x86-64).
~3.09x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:18 +01:00
zhiliang wang
01b74ea7c1 Add asm code for NoneZeroCount and refine related code 2015-01-04 16:39:17 +08:00
Martin Storsjö
7bc3e944ad Get rid of uneven spacing after WELS_EXTERN 2014-06-09 11:03:25 +03:00
Martin Storsjö
57f6bcc4b0 Convert all tabs to spaces in assembly sources, unify indentation
Previously the assembly sources had mixed indentation consisting
of both spaces and tabs, making it quite hard to read unless
the right tab size was used in the editor.

Tabs have been interpreted as 4 spaces in most cases, matching
the surrounding code.
2014-06-01 01:35:43 +03:00
Licai Guo
e39de8d404 reoranize common to inc/src/x86/arm 2014-03-18 19:41:32 -07:00