Christophe Gisquet
08e3ea60ff
x86: synth filter float: implement SSE2 version
Timings for Arrandale: C SSE win32: 2108 334 win64: 1152 322 Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with the jmp destination being aligned. Unrolling for ARCH_X86_64 is a 20 cycles gain. Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Libav README ------------ 1) Documentation ---------------- * Read the documentation in the doc/ directory. 2) Licensing ------------ * See the LICENSE file.
Description
Languages
C
92.1%
Assembly
6%
Makefile
1.2%
C++
0.3%
Objective-C
0.2%
Other
0.1%