Christophe Gisquet
2cdbcc0048
x86: synth filter float: implement SSE2 version
Timings for Arrandale:
          C    SSE
win32: 2108    334
win64: 1152    322

Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.

Unrolling for ARCH_X86_64 is a 20 cycles gain.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
FFmpeg README
-------------

1) Documentation
----------------

* Read the documentation in the doc/ directory in git.
  You can also view it online at http://ffmpeg.org/documentation.html

2) Licensing
------------

* See the LICENSE file.

3) Build and Install
--------------------

* See the INSTALL file.
Languages
C: 92.1%
Assembly: 6%
Makefile: 1.2%
C++: 0.3%
Objective-C: 0.2%
Other: 0.1%