SBR DSP x86: implement SSE sbr_sum_square_sse

The 32-bit targets were compiled with -mfpmath=sse so that the C reference is a fair comparison.
sbr_sum_square timings, in cycles:
  C,   32-bit: 82 (unrolled) / 102
  C,   64-bit: 69 (unrolled) / 82
  SSE, 32-bit: 42
  SSE, 64-bit: 31

Using the SSE4.1 dpps instruction to perform the final sum is slower.
Without unrolling the loop to perform 8 operations per iteration, the function takes 10 more cycles.
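
For illustration only, the idea can be sketched in C with SSE intrinsics rather than the assembly this commit adds. The function names, the flat interleaved re/im layout, the unaligned loads, and the requirement that n be a multiple of 4 are all assumptions made for this sketch; they are not taken from the commit. It processes 8 floats per iteration in two accumulators and does the final horizontal sum with a move and a shuffle instead of dpps.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Scalar reference (hypothetical name): sum of re^2 + im^2 over n complex
 * values stored as 2*n interleaved floats. */
static float sum_square_scalar(const float *x, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < 2 * n; i += 2)
        sum += x[i] * x[i] + x[i + 1] * x[i + 1];
    return sum;
}

#if defined(__SSE__)
/* SSE sketch (hypothetical name): 8 floats per iteration, two accumulators.
 * Assumes n is a multiple of 4; unaligned loads keep the example simple. */
static float sum_square_sse_sketch(const float *x, int n)
{
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();

    for (int i = 0; i < 2 * n; i += 8) {
        __m128 a = _mm_loadu_ps(x + i);
        __m128 b = _mm_loadu_ps(x + i + 4);
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(a, a));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(b, b));
    }
    acc0 = _mm_add_ps(acc0, acc1);

    /* Horizontal sum without dpps: fold the upper half onto the lower half,
     * then add lane 1 to lane 0. */
    __m128 hi = _mm_movehl_ps(acc0, acc0);
    acc0 = _mm_add_ps(acc0, hi);
    __m128 lane1 = _mm_shuffle_ps(acc0, acc0, 0x55);
    acc0 = _mm_add_ss(acc0, lane1);
    return _mm_cvtss_f32(acc0);
}
#endif
```

The vector version adds the squares in a different order from the scalar loop, so results on arbitrary data can differ in the last bits.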

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Christophe GISQUET
2012-02-23 19:48:58 +01:00
committed by Ronald S. Bultje
parent 2e74a5abc2
commit 34454c761f
5 changed files with 116 additions and 0 deletions

@@ -238,4 +238,6 @@ av_cold void ff_sbrdsp_init(SBRDSPContext *s)
     if (ARCH_ARM)
         ff_sbrdsp_init_arm(s);
+    if (HAVE_MMX)
+        ff_sbrdsp_init_x86(s);
 }
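
The hunk above follows a function-pointer dispatch pattern: the generic init installs C implementations, then an architecture-specific init overrides them when that path is compiled in. A minimal standalone sketch of the pattern follows. Every name in it (DSPContextSketch, HAVE_FAST_PATH, dsp_init_sketch and the rest) is hypothetical and stands in for the real FFmpeg symbols. The sketch also omits any runtime CPU feature check that a real architecture init might perform.

```c
#include <stddef.h>

/* Hypothetical context holding one replaceable DSP routine. */
typedef struct DSPContextSketch {
    float (*sum_square)(const float *x, int n);
} DSPContextSketch;

/* Portable fallback: sum of squares over 2*n interleaved floats. */
static float sum_square_c_sketch(const float *x, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < 2 * n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Stand-in for an optimized routine; here it simply reuses the C code. */
static float sum_square_fast_sketch(const float *x, int n)
{
    return sum_square_c_sketch(x, n);
}

/* Hypothetical build-time switch, playing the role of HAVE_MMX. */
#define HAVE_FAST_PATH 1

/* Architecture-specific init: overrides only the pointers it implements. */
static void dsp_init_fast_sketch(DSPContextSketch *s)
{
    s->sum_square = sum_square_fast_sketch;
}

/* Generic init: install the C fallback first, then let the
 * architecture-specific init replace it when compiled in. */
static void dsp_init_sketch(DSPContextSketch *s)
{
    s->sum_square = sum_square_c_sketch;
    if (HAVE_FAST_PATH)
        dsp_init_fast_sketch(s);
}
```

Callers only ever go through `s->sum_square`, so adding a new optimized routine does not change any call site.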