SBR DSP x86: implement SSE sbr_sum_square_sse
The 32bits targets have been compiled with -mfpmath=sse for proper reference.

sbr_sum_square
C  /32bits: 82c (unrolled)/102c
C  /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c

Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
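For illustration only, the approach described above (8 float operations per loop iteration, then a final horizontal sum done with shuffles rather than dpps) can be sketched with SSE intrinsics. This is not the assembly added by this commit: the names `sum_square_ref` and `sum_square_sse_sketch` are hypothetical, and the sketch assumes `n` is a multiple of 4 complex pairs.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar reference: sum of re^2 + im^2 over n complex pairs. */
static float sum_square_ref(float (*x)[2], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* SSE sketch (hypothetical, not the committed asm).
 * Assumption: n is a multiple of 4, so each iteration consumes 8 floats. */
static float sum_square_sse_sketch(float (*x)[2], int n)
{
    const float *p = &x[0][0];
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();

    assert(n % 4 == 0);

    /* Unrolled by two vectors: 8 squares accumulated per iteration. */
    for (int i = 0; i < 2 * n; i += 8) {
        __m128 a = _mm_loadu_ps(p + i);
        __m128 b = _mm_loadu_ps(p + i + 4);
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(a, a));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(b, b));
    }
    acc0 = _mm_add_ps(acc0, acc1);

    /* Final horizontal sum using plain SSE shuffles instead of dpps. */
    acc0 = _mm_add_ps(acc0, _mm_movehl_ps(acc0, acc0));         /* lanes 0+2, 1+3 */
    acc0 = _mm_add_ss(acc0, _mm_shuffle_ps(acc0, acc0, 0x01));  /* lane 0 + lane 1 */
    return _mm_cvtss_f32(acc0);
}
```

Because the vector version adds the terms in a different order than the scalar loop, the two results may differ in the last bits for general floating-point input.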

committed by Ronald S. Bultje
parent 2e74a5abc2
commit 34454c761f

@@ -46,5 +46,6 @@ extern const float ff_sbr_noise_table[][2];
 
 void ff_sbrdsp_init(SBRDSPContext *s);
 void ff_sbrdsp_init_arm(SBRDSPContext *s);
+void ff_sbrdsp_init_x86(SBRDSPContext *s);
 
 #endif