Commit Graph

7 Commits

Author SHA1 Message Date
James Almer
5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
James Almer
08810a8895 x86/flacdsp: remove unneeded ifdeffery
x86inc can translate r*m into a register or stack on its own

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-05 16:29:28 -03:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
James Almer
3cec54b7d7 x86/flacdsp: add SSE2 and AVX decorrelate functions
Two to four times faster depending on instruction set, block size and channel count.
2014-11-13 13:47:55 -03:00
James Almer
3f3d748cab x86: Move XOP emulation to x86util
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-24 08:30:19 +01:00
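
The aliasing problem described in 3f3d748cab can be sketched in plain C. This is only an illustration, not the x86inc or x86util macro code: the `vec4` type and all function names are hypothetical, and it assumes a `pmacsdd`-style operation (dst = a * b + c on 32-bit lanes) emulated as a multiply followed by an add. When the destination is also the accumulator, the multiply overwrites the accumulator before it is added, so a temporary is needed.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding four 32-bit lanes. */
typedef struct { int32_t lane[4]; } vec4;

/* Lane-wise multiply keeping the low 32 bits: dst = a * b. */
static void mul_lanes(vec4 *dst, const vec4 *a, const vec4 *b)
{
    for (int i = 0; i < 4; i++)
        dst->lane[i] = (int32_t)((uint32_t)a->lane[i] * (uint32_t)b->lane[i]);
}

/* Lane-wise wrapping add: dst = a + b. */
static void add_lanes(vec4 *dst, const vec4 *a, const vec4 *b)
{
    for (int i = 0; i < 4; i++)
        dst->lane[i] = (int32_t)((uint32_t)a->lane[i] + (uint32_t)b->lane[i]);
}

/* Naive two-step emulation: multiply into dst, then add c.
 * If dst and c are the same object, the multiply destroys c first,
 * so the result is 2 * a * b instead of a * b + c. */
static void macsdd_naive(vec4 *dst, const vec4 *a, const vec4 *b, const vec4 *c)
{
    mul_lanes(dst, a, b);
    add_lanes(dst, dst, c);
}

/* Emulation with a fifth argument used as a temporary:
 * the product goes to tmp, so c is still intact when it is added. */
static void macsdd_tmp(vec4 *dst, const vec4 *a, const vec4 *b,
                       const vec4 *c, vec4 *tmp)
{
    mul_lanes(tmp, a, b);
    add_lanes(dst, c, tmp);
}
```

Calling `macsdd_tmp(&acc, &a, &b, &acc, &tmp)` keeps the multiply-accumulate semantics even though the first and fourth arguments are the same object, whereas `macsdd_naive(&acc, &a, &b, &acc)` does not.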
James Almer
e87974bc00 flac/x86: add ff_flac_lpc_32_xop()
Tested on an AMD FX 6300

679081 decicycles in ff_flac_lpc_32_xop, 32768 runs
774425 decicycles in ff_flac_lpc_32_sse4, 32768 runs

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-13 22:14:59 +01:00
Loren Merritt
9c978f243a flac/x86: add ff_flac_lpc_32_sse4()
Benchmarked on Sandy Bridge x86_64:
1358232 decicycles in flac_lpc_32_c
1244575 decicycles in flac_lpc_32_sse4, James Almer's patch
 650045 decicycles in flac_lpc_32_sse4, this patch

I haven't tested the edge cases such as odd block lengths.

odd block length tested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-06 02:51:19 +01:00
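
The decicycle counts quoted in e87974bc00 and 9c978f243a can be turned into speedup ratios with a small helper. This is a sketch: the function name `speedup` is hypothetical, and it only assumes that both counts were measured over the same number of runs, so their ratio is meaningful.

```c
#include <stdint.h>

/* Ratio of baseline cost to optimized cost; values above 1.0 mean
 * the optimized version needed fewer decicycles. */
static double speedup(uint64_t baseline_decicycles, uint64_t optimized_decicycles)
{
    return (double)baseline_decicycles / (double)optimized_decicycles;
}
```

With the figures above, `speedup(1358232, 650045)` is roughly 2.09 for the SSE4 version against the C version, and `speedup(774425, 679081)` is roughly 1.14 for XOP against SSE4 on the AMD FX 6300.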