ffmpeg

Author	SHA1	Message	Date
Henrik Gramner	ab43beefab	x86inc: Drop SECTION_TEXT macro. The .text section is already 16-byte aligned by default on all supported platforms, so `SECTION_TEXT` is no different from `SECTION .text`. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2015-08-11 11:12:01 +02:00
Henrik Gramner	4a53c758d2	x86: dcadsp: Avoid SSE2 instructions in SSE functions. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2015-08-11 09:22:46 +02:00
James Almer	0f524b6c69	x86/synth_filter: remove the fma3 version ifdefs. This fixes compilation failures with --disable-fma3. Signed-off-by: James Almer <jamrial@gmail.com>; Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-13 11:29:28 +02:00
James Almer	c74b86699c	x86/synth_filter: add synth_filter_fma3. Signed-off-by: James Almer <jamrial@gmail.com>; Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-04 17:40:51 +02:00
James Almer	81e02fae6e	x86/synth_filter: add synth_filter_avx. Sandy Bridge, Win64: 180 cycles in ff_synth_filter_inner_sse2 vs. 150 cycles in ff_synth_filter_inner_avx. Also switch some instructions to a three-operand format to avoid assembly errors with Yasm 1.1.0 or older. Signed-off-by: James Almer <jamrial@gmail.com>; Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-04 17:40:51 +02:00
James Almer	2025d8026f	x86/synth_filter: add synth_filter_sse. Build only on x86_32 targets. Signed-off-by: James Almer <jamrial@gmail.com>; Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-04 17:40:51 +02:00
Christophe Gisquet	4cb6964244	dcadec: simplify decoding of VQ high frequencies. The vector dequantization has a test inside a loop that prevents an effective SIMD implementation. Moving the test out of the loop allows the loop to be DSPized, so the current DSP implementation is modified accordingly. In particular, the DSP implementation no longer has to handle zero-length loops. The decode_hf implementations have the following timings on x86 Arrandale: win32: C 260, SSE 162, SSE2 119, SSE4 104; win64: C 242, SSE N/A, SSE2 89, SSE4 72. The ARM NEON optimizations follow in a later patch as external asm. The now-unused check for the y modifier in ARM inline asm is removed from configure.	2014-02-28 13:03:22 +01:00
Christophe Gisquet	08e3ea60ff	x86: synth filter float: implement SSE2 version. Timings for Arrandale: win32: C 2108, SSE 334; win64: C 1152, SSE 322. Factorizing the inner loop with a call/jmp costs more than 15 cycles, even with the jmp destination aligned. Unrolling for ARCH_X86_64 gains 20 cycles. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-28 13:00:48 +01:00
Christophe Gisquet	ad507d7907	x86: dcadsp: implement SSE lfe_dir. Results for Arrandale/Windows (before -> after): 32-bit: 1670 -> 316; 64-bit: 728 -> 298. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-28 13:00:47 +01:00
Christophe Gisquet	5b59a9fc61	x86: dcadsp: implement int8x8_fmul_int32. For the callable function (as opposed to the inline one): Win32: C 47, SSE 42, SSE2 29, SSE4 26; Win64: C 30, SSE 33, SSE2 25, SSE4 23. The SSE version is neither compiled nor set for ARCH_X86_64, as the inlinable function takes over. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-07 22:52:40 +01:00

10 Commits