James Almer
b385c4c6a3
x86/swr: replace sse4 instructions in pack_6ch with sse ones
...
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-06 20:54:00 -03:00
Ronald S. Bultje
ad75d2b590
x86: Fix compilation with nasm on PPC & OS/2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:36:19 +02:00
Michael Niedermayer
ca2818b881
swresample/x86/audio_convert: add emms to CONV
...
Might fix Ticket1874
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-18 02:26:36 +02:00
Michael Niedermayer
3174616f59
Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
...
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
x86: include x86inc.asm in x86util.asm
cng: Reindent some incorrectly indented lines
cngdec: Allow flushing the decoder
cngdec: Make the dbov variable have the right unit
cngdec: Fix the memset size to cover the full array
cngdec: Update the LPC coefficients after averaging the reflection coefficients
configure: fix print_config() with broke awks
Conflicts:
libavcodec/x86/ac3dsp.asm
libavcodec/x86/dct32.asm
libavcodec/x86/deinterlace.asm
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputilenc.asm
libavcodec/x86/fft.asm
libavcodec/x86/fmtconvert.asm
libavcodec/x86/h264_chromamc.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264_idct.asm
libavcodec/x86/h264_idct_10bit.asm
libavcodec/x86/h264_intrapred.asm
libavcodec/x86/h264_intrapred_10bit.asm
libavcodec/x86/h264_weight.asm
libavcodec/x86/vc1dsp.asm
libavcodec/x86/vp3dsp.asm
libavcodec/x86/vp56dsp.asm
libavcodec/x86/vp8dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Carl Eugen Hoyos
52be5428c0
Add some missing _EXTERNAL suffixes to yasm source files.
2012-08-31 15:39:03 +02:00
Michael Niedermayer
c88e60af76
swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-05 15:28:10 +02:00
Michael Niedermayer
a927641e7a
libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:56:18 +02:00
Michael Niedermayer
ca986a06ad
libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:53:30 +02:00
Michael Niedermayer
c4047ad9e0
libswresample: make NOP_N macro less picky on its parameters
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:45:32 +02:00
Michael Niedermayer
57bc91c710
libswresample: Change FLOAT_TO_INT32_N to need 1 register less
...
same speed on sandy bridge
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:44:08 +02:00
Michael Niedermayer
ecfdd125f1
libswresample-simd: rename 6ch pack to what it is
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:31:12 +02:00
Michael Niedermayer
429b964e25
libswresample-simd: make the converter registers parameters
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:30:13 +02:00
Michael Niedermayer
b3915c4b70
libswresample: cosmetics
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:32:06 +02:00
Michael Niedermayer
24c0d1583c
libswresample: unaligned AVX/SSE4 float and int32 6ch pack
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:31:59 +02:00
Justin Ruggles
6f67d9833b
libswresample: Implement MMX, SSE4 and AVX 6ch float and int32 packing function.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:31:59 +02:00
Michael Niedermayer
cbbc472467
swr-x86-simd: add ff_unpack_2ch_int16_to_int16/int32/float_a_ssse3
...
more than 10% faster (tested on sandybridge)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 19:39:52 +02:00
Michael Niedermayer
72ae583b7d
swr-x86-simd: stereo unpack S16/S32/FLT-> S16/S32/FLT SSE/SSE2 (16 new SIMD functions)
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 17:25:52 +02:00
Michael Niedermayer
adfa53b91f
swr-x86-SIMD: 3 instructions less for stereo planar->packed s32/flt->s16
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 17:25:52 +02:00
Michael Niedermayer
5f4e18cd16
swr: replace the remaining 2 audio convert SIMD macros by the new ones
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:59:57 +02:00
Michael Niedermayer
df5ff103cd
swr: fix internal asm labels
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:43:11 +02:00
Michael Niedermayer
b6f4f0d9ef
swr: fix PACK_2CH register count
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:42:52 +02:00
Michael Niedermayer
aae3119643
swr: replace planar->planar/packed->packed FLT<->S16/S32 SIMD by new macros
...
this simplifies the code
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:41:39 +02:00
Michael Niedermayer
47055b8913
swr: implement stereo S16/S32/FLT->S16/S32/FLT planar->packed in SSE/SSE2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 18:32:34 +02:00
Michael Niedermayer
e8dd7928c8
swr: change simd len argument to be in samples instead of dst bytes.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 18:32:34 +02:00
Michael Niedermayer
c1fe2db376
swr: add ff_int32_to_float_a_avx
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-03 15:58:51 +02:00
Michael Niedermayer
65722e7fc5
swr: int32_to_int16_mmx/sse
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 14:20:35 +02:00
Michael Niedermayer
73edb58c3c
swr: float_to_int16_sse2()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 12:18:14 +02:00
Michael Niedermayer
5932938c9a
swr: float_to_int32_sse2()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 11:37:32 +02:00
Michael Niedermayer
b72a0f9c23
swr: add int16_to_float_sse2()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 19:07:30 +02:00
Michael Niedermayer
832c3b10d2
swr: add int32_to_float_sse2
...
could be done for sse/3dnow too if someone wants
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer
95057b1972
swr: int16->int32: use the old index negate trick to avoid 2 adds
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer
113738d6c2
swr: more correct cglobal parameters to int16->int32
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer
fa5daaca0d
swr: seperate functions for aligned & unaligned
...
If someone has an idea on how to do this cleaner, its welcome
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 13:15:44 +02:00
Michael Niedermayer
bcc66ff0e4
swr: add int16_to_int32_mmx/sse
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 13:15:44 +02:00