ffmpeg

Author	SHA1	Message	Date
Diego Biurrun	1648a508fa	x86: dsputil: Move specific optimization settings out of global init function They belong in the init functions specific to each CPU capability.	2012-09-11 10:12:17 +02:00
Diego Biurrun	a84edbacaf	x86: dsputil: Only compile motion_est code when encoders are enabled	2012-09-10 08:31:47 +02:00
Diego Biurrun	e0c6cce447	x86: Replace checks for CPU extensions and flags by convenience macros This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.	2012-09-08 18:18:34 +02:00
Hendrik Leppkes	fb4e983e0c	x86: mlpdsp: mlp_filter_channel_x86 requires inline asm Signed-off-by: Martin Storsjö <martin@martin.st>	2012-09-08 15:41:44 +03:00
Diego Biurrun	1169f0d0af	x86: more specific checks for availability of required assembly capabilities	2012-09-07 18:16:04 +02:00
Diego Biurrun	8cb7ed5562	x86: avcodec: Drop silly "_mmx" suffix from dsputil template names	2012-09-07 13:50:52 +02:00
Mans Rullgard	6efb698883	cavsdsp: set idct permutation independently of dsputil CAVS uses its own idct so using dsputil to set the permutation is fragile. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-07 11:42:35 +01:00
Mans Rullgard	5fe64d88f6	x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov For some reason add_hfyu_median_prediction_cmov is only selected on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions. This patch allows it to be selected on any cpu with cmov with the possibility of being overridden by the mmxext version. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-07 11:42:35 +01:00
Diego Biurrun	ef6ba1f237	x86: dsputil: Do not redundantly check for CPU caps before calling init funcs The init functions check for CPU capabilities on their own already.	2012-09-06 09:05:52 +02:00
Hendrik Leppkes	d914ea6fd8	x86: vp56: cmov version of vp56_rac_get_prob requires inline asm Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-09-05 21:30:46 +02:00
Diego Biurrun	a84ac7a860	x86: h264dsp: drop some unnecessary ifdefs around prototype declarations	2012-09-04 01:44:59 +02:00
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	2012-08-31 01:53:25 +02:00
Diego Biurrun	ec36aa6944	x86: Fix linking with some or all of yasm, mmx, optimizations disabled Some optimized template functions reference optimized symbols, so they must be explicitly disabled when those symbols are unavailable.	2012-08-30 19:37:32 +02:00
Diego Biurrun	a886b279a0	x86: cosmetics: Comment some #endifs for better readability	2012-08-30 18:50:33 +02:00
Diego Biurrun	2e6f93a284	x86: Always compile files with functions that are called unconditionally	2012-08-29 00:27:06 +02:00
Diego Biurrun	2f2aa2e542	x86: mpegvideoenc: fix linking with --disable-mmx The optimized dct_quantize template functions reference optimized fdct symbols, so these functions must only be enabled if the relevant optimizations have been enabled by configure.	2012-08-29 00:26:56 +02:00
Diego Biurrun	d39791bf39	x86: mpegvideoenc: Do not abuse HAVE_ variables for template instantiation This avoids trouble if HAVE_ variables are used elsewhere in the file.	2012-08-29 00:14:52 +02:00
Diego Biurrun	bcc45d6348	x86: avcodec: Drop silly "_mmx" suffixes from filenames	2012-08-28 18:37:34 +02:00
Diego Biurrun	efbd04c332	x86: avcodec: Drop silly "_sse" suffixes from filenames	2012-08-28 18:37:33 +02:00
Diego Biurrun	3f02c533f3	build: fft: x86: Drop unused YASM-OBJS-FFT- variable	2012-08-27 03:10:58 +02:00
Mans Rullgard	db70730291	x86: fft: remove unused fft_dispatch* functions These functions are not used since the yasm conversion. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-25 23:58:26 +01:00
Diego Biurrun	dc40285427	x86: mpegvideo: more sensible names for optimization file and init function	2012-08-24 02:23:16 +02:00
Diego Biurrun	d211547ddd	x86: mpegvideoenc: Split optimizations off into a separate file	2012-08-24 02:23:16 +02:00
Diego Biurrun	26ce9aec03	dnxhdenc: x86: more sensible names for optimization file and init function	2012-08-24 02:23:15 +02:00
Diego Biurrun	6fa488678f	build: x86: Only compile mpegvideo optimizations when necessary	2012-08-22 01:06:33 +02:00
Diego Biurrun	6961bdface	x86: avcodec: Consistently name all init files	2012-08-16 11:05:38 +02:00
Martin Storsjö	1d9c2dc89a	Don't include common.h from avutil.h Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-15 22:32:06 +03:00
Diego Biurrun	29cfdd3767	x86: avcodec: Appropriately name files containing only init functions	2012-08-15 03:24:08 +02:00
Diego Biurrun	be12958937	mpegvideo_mmx_template: drop some commented-out cruft	2012-08-15 03:24:07 +02:00
Mans Rullgard	8ec0204ee4	x86: cabac: allow building with suncc This fixes two issues preventing suncc from building this code. The undocumented 'a' operand modifier, causing gcc to omit a $ in front of immediate operands (as required in addresses), is not supported by suncc. Luckily, the also undocumented 'c' modifier has the same effect and is supported. On some asm statements with a large number of operands, suncc for no obvious reason fails to correctly substitute some of the operands. Fortunately, some of the operands in these statements are plain numbers which can be inserted directly into the code block instead of passed as operands. With these changes, the code builds correctly with both gcc and suncc. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-13 14:51:52 +01:00
Mans Rullgard	c8252e80eb	x86: mlpdsp: avoid taking address of void This code contains a C array of addresses of labels defined in inline asm. To do this, the names must be declared as external in C. The declared type does not matter since only the address is used, and for some reason, the author of the code used the 'void' type despite taking the address of a void expression being invalid. Changing the type to char, a reasonable choice since the alignment of the code labels cannot be known or guaranteed, eliminates gcc warnings and allows building with suncc. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-13 14:51:52 +01:00
Diego Biurrun	3b9e832e17	x86: Drop silly "_yasm" suffixes from filenames	2012-08-12 17:13:05 +02:00
Mans Rullgard	d7a4f8f8b9	Move MASK_ABS macro to libavcodec/mathops.h This macro is only used in two places, both in libavcodec, so this is a more sensible place for it. Two small tweaks to the macro are made: - removing the trailing semicolon - dropping unnecessary 'volatile' from the x86 asm Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Mans Rullgard	c318626ce2	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Dave Yeo	197439c1ef	x86: pngdsp: Fix assembly for OS/2 The a.out object format does not allow aligning sections. On OS/2 LD aligns sections to 16 bytes. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-08-08 15:45:09 +02:00
Mans Rullgard	2b140a3d09	x86: use 32-bit source registers with movd instruction yasm tolerates mismatch between movd/movq and source register size, adjusting the instruction according to the register. nasm is more strict. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:21:20 +01:00
Mans Rullgard	a3df4781f4	x86: add colons after labels nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:20:56 +01:00
Anton Khirnov	36ef5369ee	Replace all CODEC_ID_* with AV_CODEC_ID_*	2012-08-07 16:00:24 +02:00
Diego Biurrun	2096857551	x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2	2012-08-05 21:40:49 +02:00
Ronald S. Bultje	4a8143e73c	fft: 3dnow: fix register name typo in DECL_IMDCT macro Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-08-04 00:16:02 +02:00
Diego Biurrun	0c3ff1982c	x86: dct32: port to cpuflags	2012-08-03 22:51:06 +02:00
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	2012-08-03 22:51:05 +02:00
Ronald S. Bultje	da6505ad2f	dsputil: make add_hfyu_left_prediction_sse4() support unaligned src. This makes add_hfyu_left_prediction_sse4() handle sources that are not 16-byte aligned in its own function rather than by proxying the call to add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs) does not restore it, thus leading to XMM clobbering and RSP being off. Fixes bug 342.	2012-08-03 11:09:14 -07:00
Diego Biurrun	ca844b7be9	x86: Use consistent 3dnowext function and macro name suffixes Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.	2012-08-03 14:00:47 +02:00
Diego Biurrun	03737412a3	x86: proresdsp: improve SIGNEXTEND macro comments	2012-08-02 22:30:44 +02:00
Diego Biurrun	81905088a1	x86: h264dsp: K&R formatting cosmetics	2012-08-02 20:20:21 +02:00
Ronald S. Bultje	c728518b3c	x86: fft: fix imdct_half() for AVX Some calculations were changed in `b6a3849` to use mmsize, which was not correct for the AVX version, which uses INIT_YMM and therefore has mmsize == 32. Fixes Bug 341. Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-08-02 13:40:11 -04:00
Mans Rullgard	ec7c501ed5	x86: remove libmpeg2 mmx(ext) idct functions These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-02 12:14:52 +01:00
Ronald S. Bultje	b6a3849adb	fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64. 64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.	2012-07-31 21:20:47 -07:00
Ronald S. Bultje	53dfaedc01	x86/dsputilenc: bury inline asm under HAVE_INLINE_ASM.	2012-07-31 20:28:52 -07:00
Diego Biurrun	6376a3ad24	x86: h264dsp: Remove unused variable ff_pb_3_1	2012-08-01 00:17:16 +02:00
Diego Biurrun	8728b381cb	x86: h264dsp: Adjust YASM #ifdefs This fixes compilation with YASM disabled.	2012-07-31 13:54:07 +02:00
Ronald S. Bultje	b829b4ce29	h264: convert loop filter strength dsp function to yasm. This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.	2012-07-30 19:39:47 -07:00
Ronald S. Bultje	c83f44dba1	h264_idct_10bit: port x86 assembly to cpuflags.	2012-07-28 08:29:45 -07:00
Ronald S. Bultje	b3c5ae5607	fft: rename "z" to "zc" to prevent name collision. Without this, cglobal will expand "z" to "zh" to access the high byte in a register's word, which causes a name collision with the ZH(x) macro further up in this file.	2012-07-28 08:29:44 -07:00
Ronald S. Bultje	4d777eedfd	vp3: don't compile mmx IDCT functions on x86-64. 64-bit CPUs always have SSE2, and a SSE2 version exists, thus the MMX version will never be used.	2012-07-27 20:12:30 -07:00
Ronald S. Bultje	a5bbb1242c	h264_loopfilter: port x86 simd to cpuflags.	2012-07-27 20:12:11 -07:00
Ronald S. Bultje	d07ff3cd5a	h264_chromamc_10bit: port x86 simd to cpuflags.	2012-07-27 17:35:49 -07:00
Ronald S. Bultje	4a26fdd852	vp3: port x86 SIMD to cpuflags.	2012-07-27 17:35:49 -07:00
Ronald S. Bultje	76888c64b0	rv34: port x86 SIMD to cpuflags.	2012-07-27 15:13:26 -07:00
Ronald S. Bultje	158744a4cd	vp56: only compile MMX SIMD on x86-32. All x86-64 CPUs have SSE2, so the MMX version will never be used. This leads to smaller binaries.	2012-07-27 14:40:27 -07:00
Ronald S. Bultje	2734ba787b	vp56: port x86 simd to cpuflags.	2012-07-27 14:39:07 -07:00
Ronald S. Bultje	5361e10a5e	proresdsp: port x86 assembly to cpuflags.	2012-07-27 11:43:06 -07:00
Ronald S. Bultje	bde73f28af	mpegaudio: bury inline asm under HAVE_INLINE_ASM.	2012-07-26 13:43:16 -07:00
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	2012-07-26 13:43:16 -07:00
Ronald S. Bultje	a1878a88a1	vp3: don't use calls to inline asm in yasm code. Mixing yasm and inline asm is a bad idea, since if either yasm or inline asm is not supported by your toolchain, all of the asm stops working. Thus, better to use either one or the other alone. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:30 -04:00
Ronald S. Bultje	79195ce565	x86/dsputil: put inline asm under HAVE_INLINE_ASM. This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:27 -04:00
Yang Wang	845e92fd6a	dsputil_mmx: fix incorrect assembly code In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t", and so forth, have problems. From above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edi)". That is, it will stride by 810000 instead of 8. This will cause a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: This error is exposed when we build using Intel C++ Compiler, with IPO+PGO optimization enabled. Crashed when decoding an MJPEG video. Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:22:18 -04:00
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	2012-07-22 16:56:58 -04:00
Diego Biurrun	9f97af2688	x86: dsputil: drop some unused CPU flag debug code	2012-07-19 10:17:56 +02:00
Mans Rullgard	28f9ab7029	vp3: move idct and loop filter pointers to new vp3dsp context This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:19 +01:00
Mans Rullgard	ab9f987661	build: add CONFIG_VP3DSP, reduce repetition in OBJS lists Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:18 +01:00
Martin Storsjö	f27386cdc7	x86: h264_intrapred: Don't add the 'd' suffix to the SPLATB_REG macro The SPLATB_REG macro already adds the 'd' suffix internally. This fixes building on Win64, which has been broken since `878e66902`. This worked for unix, where r2 happened to be rdx in this case, which with the first suffix rdxd was mapped to eax, and eaxd is defined back to eax. On win64 however, r2 happened to be R8 in this case, and R8d maps to R8D just fine, but there's no mapping for R8Dd to anything. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-06 21:07:23 +03:00
Diego Biurrun	878e669029	x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros	2012-07-05 17:37:11 +02:00
Loren Merritt	4d4752366f	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-07-05 17:37:11 +02:00
Diego Biurrun	d20f133ef9	x86: h264_intrapred: port to cpuflag macros	2012-07-05 17:37:10 +02:00
Martin Storsjö	07eeeb1d4f	vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too This was missed in the previous commit in `70a1c800`. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-05 09:39:01 +03:00
Martin Storsjö	70a1c8000f	vp8: loopfilter >=sse2 functions need aligned stack on x86-32. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-07-04 08:25:50 -07:00
Ronald S. Bultje	723b266d72	dsputilenc: group yasm and inline asm function pointer assignment.	2012-07-04 07:46:27 -07:00
Ronald S. Bultje	ceabc13f12	dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section.	2012-06-30 09:24:52 -07:00
Ronald S. Bultje	66a02159ea	x86: fmtconvert: add special asm for float_to_int16_interleave_misc_* This gets rid of a variable-length array and a for loop in C code. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-06-30 19:10:36 +03:00
Mans Rullgard	f2fd167835	x86: vc1: fix and enable optimised loop filter The problem is that the ssse3 psign instruction does the wrong thing here. Commit `ea60dfe` incorrectly removed a macro emulating this instruction for pre-ssse3 code. However, the emulation is incorrect, and the code relies on the behaviour of the macro. Specifically, the psign sets destination elements to zero where the corresponding source element is zero, whereas the emulation only negates destination elements where the source is negative. Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus, which is why the original VC-1 code had an additional right shift when using it. Since the psign instruction cannot be used here, skip all the macro hell and use the working instruction sequence directly. None of this was noticed due to a stray return statement in ff_vc1dsp_init_mmx() which meant that only the mmx version of the loop filter was ever used (before being removed in `ea60dfe`). Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-30 00:12:05 +01:00
Christophe Gisquet	a5bfa66df5	x86: fft: replace call to memcpy by a loop The function call was a mess to handle, and memcpy cannot make the assumptions we do in the new code. Tested on an IMC sample: 430c -> 370c. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-27 12:49:33 +01:00
Mans Rullgard	0595334892	x86: fft: elf64: fix PIC build In a 64-bit PIC build, external functions must be called through the PLT. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 22:58:18 +01:00
Mans Rullgard	8725da49a2	x86: fft: win64: fix stack alignment for memcpy() call	2012-06-25 15:10:39 +01:00
Mans Rullgard	8299260470	x86: fft: convert sse inline asm to yasm	2012-06-25 13:31:00 +01:00
Ronald S. Bultje	8123e0901f	x86: place some inline asm under #if HAVE_INLINE_ASM Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 13:23:12 +01:00
Mans Rullgard	0b6f973635	h264: use asm cabac reader under a generic condition This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for other architectures than x86. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 22:14:21 +01:00
Diego Biurrun	fe07c9c6b5	x86: Only use optimizations with cmov if the CPU supports the instruction	2012-06-23 16:21:50 +02:00
Mans Rullgard	29686d6ea3	x86: remove unused inline asm macros from dsputil_mmx.h Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Mans Rullgard	685f5438bb	x86: move some inline asm macros to the only places they are used Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Diego Biurrun	a5a93fa8f5	cosmetics: do not use full path for local headers	2012-06-22 10:49:40 +02:00
Ronald S. Bultje	d9669eab0b	dwt: remove variable-length arrays Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-17 23:20:10 +01:00
Justin Ruggles	d5a7229ba4	Add a float DSP framework to libavutil Move vector_fmul() from DSPContext to AVFloatDSPContext.	2012-06-08 13:14:38 -04:00
Vitor Sessak	bac0729d9e	x86: use new schema for ASM macros Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-05-29 14:49:45 +02:00
Justin Ruggles	713548cbad	x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code. This is needed for older versions of yasm/nasm that do not support AVX. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-22 20:46:02 +02:00
Kieran Kunhya	5ff01259a8	Convert vector_fmul range of functions to YASM and add AVX versions Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-05-21 17:13:05 -04:00
Michael Kostylev	6797d1948b	x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.	2012-05-15 23:54:08 +02:00
Justin Ruggles	95a98ab3f0	ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16 Simplifies the code by using cpuflags and a new macro. Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only function.	2012-05-15 15:23:59 -04:00
Vitor Sessak	fcc456b829	x86: use more standard construct for setting ASM functions in FFT code Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-14 15:38:42 +02:00
Michael Kostylev	ea60dfe284	x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions.	2012-05-12 14:02:45 +02:00
Christophe Gisquet	110d0cdc9d	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC Code mostly inspired by vp8's MC, however: - its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy - that same coefficient redundancy allows better code for non-SSSE3 versions Benchmark (rounded to tens of units), columns V8x8/H8x8/2D8x8/V16x16/H16x16/2D16x16: C 445/358/985/1785/1559/3280; MMX* 219/271/478/714/929/1443; SSE2 131/158/294/425/515/892; SSSE3 120/122/248/387/390/763. End result is overall around a 15% speedup for the SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-10 18:42:43 +02:00
Ronald S. Bultje	bec207f9f9	snowdsp: explicitly state instruction size. Fixes a compile error with clang at -O0.	2012-05-02 09:57:12 -07:00
Christophe GISQUET	e75d1d4f73	dsputil x86: revert a test back to its previous value Commit `356ee8d` caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 11:00:51 -07:00
Christophe Gisquet	fe5ed69dc7	rv34dsp x86: implement MMX2 inverse transform 141 cycles down to 51. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 10:58:47 -07:00
Roland Scheidegger	9b9df1cdff	h264: new assembly version of get_cabac for x86_64 with PIC This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 09:43:25 -07:00
Roland Scheidegger	14e9ffc1e4	h264: use one table instead of several for cabac functions The reason is that this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for the PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 08:26:12 -07:00
Roland Scheidegger	444f47b55c	h264: (trivial) remove unneeded macro argument in x86/cabac.h Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 08:24:56 -07:00
Mans Rullgard	2bcbd98459	Remove lowres video decoding This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:56:19 +01:00
Mans Rullgard	95510be8c3	avcodec: remove AVCodecContext.dsp_mask This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:30:01 +01:00
Ronald S. Bultje	87a246341b	h264: use proper PROLOGUE statement for a function using 8 registers. Fixes crashes when using biweight on win64.	2012-04-16 08:07:21 -07:00
Ronald S. Bultje	b089ca871a	dsputil: fix optimized emu_edge function on Win64. Recent register allocation changes (x86inc.asm update) changed the register order and thus opcodes for the inner loops. One of them became >128bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.	2012-04-13 11:28:30 -07:00
Justin Ruggles	de7f22ab0c	ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-12 21:33:04 -07:00
Ronald S. Bultje	76538d7a78	h264: fix 10bit biweight functions after recent x86inc.asm fixes. This should have been updated in the x86inc.asm update, but was accidentally forgotten.	2012-04-12 21:13:57 -07:00
Diego Biurrun	7bb3a302fe	build: Consistently handle conditional compilation for all optimization OBJS.	2012-04-12 09:00:49 +02:00
Henrik Gramner	729f90e268	x86inc improvements for 64-bit Add support for all x86-64 registers; prefer caller-saved registers over callee-saved ones on WIN64; support up to 15 function arguments. Also (by Ronald S. Bultje): fix up our asm to work with the new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-04-11 15:47:00 -04:00
Christophe GISQUET	2130bd8f5b	rv40dsp x86: use only one register, for both increment and loop counter Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:07:09 -07:00
Christophe GISQUET	272b252c01	rv40dsp: implement prescaled versions for biweight. Quite often, the original weights are multiples of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:06:48 -07:00
Christophe GISQUET	6b81da2fd0	dsputil x86: use SSE float instruction instead of SSE2 integer equivalent All the more required since the users are pure SSE functions. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:27 -07:00
Christophe GISQUET	cd88105f6f	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:08 -07:00
Christophe GISQUET	f9888520cc	vp8dsp x86: perform rounding shift with a single instruction Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:23:36 -07:00
Ronald S. Bultje	a940198130	cabac: add overread protection to BRANCHLESS_GET_CABAC(). Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	448dc42571	cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	16f6e83f74	cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	951014e5bb	cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	a0bdcb019e	h264: add overread protection to get_cabac_bypass_sign_x86().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	95bfa4ead7	h264: reindent get_cabac_bypass_sign_x86().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	db025929f2	h264: use struct offsets in get_cabac_bypass_sign_x86().	2012-03-28 08:01:29 -07:00
Diego Biurrun	ad0e31f134	build: prettyprinting cosmetics	2012-03-26 13:00:10 +02:00
Diego Biurrun	62ce9defb8	x86: dsputil: prettyprint gcc inline asm	2012-03-25 11:50:48 +02:00
Diego Biurrun	3b54912113	x86: K&R prettyprinting cosmetics for dsputil_mmx.c	2012-03-25 11:50:48 +02:00
Diego Biurrun	915a2a0a65	x86: conditionally compile H.264 QPEL optimizations	2012-03-25 11:50:45 +02:00
Diego Biurrun	3816642eab	dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks. This makes them safe to use in non-fully braced if-blocks and similar.	2012-03-25 11:48:37 +02:00
Ronald S. Bultje	71ea26811c	aacsbr: handle m_max values smaller than 4. Prevents a signflip in the counter, and a subsequent crash because of overreads/overwrites. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org	2012-03-23 12:56:08 -07:00
Ronald S. Bultje	a928ed3751	vp8: convert mbedge loopfilter x86 assembly to use named arguments.	2012-03-10 11:36:33 -08:00
Ronald S. Bultje	bee330e300	vp8: convert inner loopfilter x86 assembly to use named arguments.	2012-03-10 11:36:33 -08:00
Reimar Döffinger	6eda85e15b	sbrdsp.asm: convert all instructions to float/SSE ones. Since the values are floats, using the float operations makes sense, improves performance on some CPUs and makes the code SSE compatible instead of needing SSE2. Based on suggestion by Jason. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-07 13:50:13 -08:00
Christophe GISQUET	7e1ce6a6ac	dsputil: remove shift parameter from scalarproduct_int16 There is only one caller, which does not need the shifting. Other use cases are situations where different roundings would be needed. The x86 and neon versions are modified accordingly. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-07 10:29:52 -08:00
Diego Biurrun	1e9d55e45e	x86: Remove duplicated AVG_3DNOW_OP / AVG_MMX2_OP macros from h264_qpel_mmx.c.	2012-03-07 09:36:04 +01:00
Reimar Döffinger	b5161908e0	SBR DSP: fix SSE code to not use SSE2 instructions. movq from SSE register _to_ memory is an SSE2 instruction. Use the SSE movlps function instead that does the same thing. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-06 13:40:35 -08:00
Mans Rullgard	356ee8d7de	x86: clean up ff_dsputil_init_mmx() This splits ff_dsputil_init_mmx() into multiple functions, one for each MMX/SSE level, somewhat simplifying the nested conditions. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-03-05 14:40:03 +01:00
Ronald S. Bultje	b4188f0d46	vp8: convert simple loopfilter x86 assembly to use named arguments.	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	8476ca3b4e	vp8: convert idct x86 assembly to use named arguments.	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	21ffc78fd7	vp8: convert mc x86 assembly to use named arguments.	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	28170f1a39	vp8: convert loopfilter x86 assembly to use cpuflags().	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	e25be47154	vp8: convert idct/mc x86 assembly to use cpuflags().	2012-03-03 20:39:59 -08:00
Ronald S. Bultje	291c9b6285	h264: change underread for 10bit QPEL to overread. This prevents us from reading before the start of the buffer, and thus prevents crashes resulting from this behaviour. Fixes bug 237.	2012-03-02 10:33:05 -08:00
Ronald S. Bultje	45549339bc	vp8: disable mmx functions with sse/sse2 counterparts on x86-64. x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2 functions will never be used in practice.	2012-03-02 10:32:05 -08:00
Ronald S. Bultje	bd66f073fe	vp8: change int stride to ptrdiff_t stride. On 64bit platforms with 32bit int, this means we won't have to sign-extend the integer anymore.	2012-03-02 10:31:50 -08:00
Ronald S. Bultje	b0c4f04338	h264: fix mmxext chroma deblock to use correct TC values.	2012-02-27 09:38:44 -08:00
Christophe GISQUET	2784d18791	SBR DSP x86: implement SSE sbr_hf_g_filt Unrolling the main loop to process 8 elements instead of 4 gives a minor gain of 2 cycles (not worth the extra object size); processing 2 elements loses 8 cycles. Assigning STEP to a register is a loss. Output address (Y) is almost always unaligned. Timings: C (32/64 bits) 117/109 cycles; SSE 57 cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-23 15:50:09 -08:00
Christophe GISQUET	34454c761f	SBR DSP x86: implement SSE sbr_sum_square_sse The 32-bit targets have been compiled with -mfpmath=sse for a proper reference. sbr_sum_square timings: C/32-bit 82c (unrolled) / 102c; C/64-bit 69c (unrolled) / 82c; SSE/32-bit 42c; SSE/64-bit 31c. Using SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-23 15:50:06 -08:00
Ronald S. Bultje	3ab9a2a557	rv34: change most "int stride" into "ptrdiff_t stride". This prevents having to sign-extend on 64-bit systems with 32-bit ints, such as x86-64. Also fixes crashes on systems where we don't do it and arguments are not in registers, such as Win64 for all weight functions.	2012-02-20 14:58:25 -08:00
Ronald S. Bultje	8fb26950ed	h264: don't use redzone in loopfilter on win64. Red zone usage is not allowed in the Win64 ABI.	2012-02-19 15:31:03 -08:00
Christophe GISQUET	f3e084909b	mpegaudio: replace memcpy by SIMD code By replacing memcpy with an unrolled loop using the alignment knowledge it has, some speedup can be obtained. Before (gcc 4.6.1): ~400 cycles After: ~370 cycles Overall, around 2% speed increase when decoding a 2400s mp3 to f32le. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-15 20:11:54 -08:00
Martin Storsjö	efd29844eb	mpegvideo: Add ff_ prefix to nonstatic functions Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:07:23 +02:00
Martin Storsjö	873c89e2a6	dsputil: Add ff_ prefix to inv_zigzag_direct16 Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:06:42 +02:00
Martin Storsjö	9cf0841ef3	dsputil: Add ff_ prefix to the dsputil_init functions Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:06:34 +02:00
Justin Ruggles	d483bb58c3	ac3dsp: do not use pshufb in ac3_extract_exponents_ssse3() We need to do unsigned saturation in order to cover the corner case when the absolute coefficient value is 16777215 (the maximum value). Fixes Bug #216	2012-02-09 21:04:44 -05:00
Diego Biurrun	0bba26466f	cosmetics: Delete empty lines at end of file.	2012-02-09 12:26:45 +01:00
Ronald S. Bultje	ce1e250ee9	h264: manually save/restore XMM registers for functions using INIT_MMX. On Win64, these registers are callee-save, so not saving/restoring them correctly is a violation of ABI and can lead to crashes or corrupt data.	2012-02-08 10:31:14 -08:00
Ronald S. Bultje	4ff6dea390	pngdsp: swap argument inversion.	2012-02-07 14:32:26 -08:00
Michael Kostylev	3206cccc0e	h264: mark h264_idct_add8_10 with number of XMM registers. This fixes XMM register clobber problems on Win64. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-07 11:37:13 -08:00
Ronald S. Bultje	7e4d9d5d45	win64: add a XMM clobber test configure option. This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof. Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>	2012-02-02 12:00:48 -08:00
Justin Ruggles	236a550c3f	Fix a typo in the x86 asm version of ff_vector_clip_int32() Specifies the correct number of xmm registers used so that they can be saved and restored on Win64 if necessary.	2012-02-01 19:02:32 -05:00
Christophe Gisquet	e5c9de2ab7	rv40: x86 SIMD for biweight Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely). *_TIMER report for the 16x16 and 8x8 cases: C: 9015 decicycles in 16, 524257 runs, 31 skips 2656 decicycles in 8, 524271 runs, 17 skips MMX: 4156 decicycles in 16, 262090 runs, 54 skips 1206 decicycles in 8, 262131 runs, 13 skips MMX on fast-path: 2760 decicycles in 16, 524222 runs, 66 skips 995 decicycles in 8, 524252 runs, 36 skips SSE2: 2163 decicycles in 16, 262131 runs, 13 skips 832 decicycles in 8, 262137 runs, 7 skips SSE2 with fast path: 1783 decicycles in 16, 524276 runs, 12 skips 711 decicycles in 8, 524283 runs, 5 skips SSSE3: 2117 decicycles in 16, 262136 runs, 8 skips 814 decicycles in 8, 262143 runs, 1 skips SSSE3 with fast path: 1315 decicycles in 16, 524285 runs, 3 skips 578 decicycles in 8, 524286 runs, 2 skips This means around a 4% speedup for some sequences. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-01-30 23:58:25 +01:00
Diego Biurrun	91bafb52ae	x86: Give RV40 init file a more suitable name.	2012-01-30 23:58:24 +01:00
Diego Biurrun	c30b198381	x86: Place mm_flags variable declaration below the appropriate #ifdef. This fixes some unused variable warnings with YASM disabled.	2012-01-30 23:58:23 +01:00
Christophe Gisquet	6b03900382	x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf While pshufb allows emulating bswap on XMM registers for SSSE3, more shuffling is needed for SSE2. Alignment is critical, so specific codepaths are provided for this case. For the huffyuv sequence "angels_480-huffyuvcompress.avi": C (using bswap instruction): ~ 55k cycles SSE2: ~ 40k cycles SSSE3 using unaligned loads: ~ 35k cycles SSSE3 using aligned loads: ~ 30k cycles Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-01-30 10:19:55 +01:00
Ronald S. Bultje	af79a0c48a	png: add support for bpp>4 to paeth x86 SIMD code. This fixes playback of e.g. RGB48 (bpp=6) content on x86 CPUs. Fixes bug 214.	2012-01-29 21:22:50 -08:00
Ronald S. Bultje	f91c4b7824	png: add SSE2 version for add_bytes_l2.	2012-01-29 18:52:17 -08:00
Ronald S. Bultje	59f474b49d	png: convert DSP functions to yasm.	2012-01-29 18:47:50 -08:00
Ronald S. Bultje	20a7d3178f	png: add missing #if HAVE_SSSE3 around function pointer assignment.	2012-01-29 12:31:59 -08:00
Ronald S. Bultje	331e7c4cb3	imdct36: mark SSE functions as using all 16 XMM registers. On x86-64, it indeed uses all 16 registers (and on x86-32, this gets clipped to 8). Not marking it properly causes callers of this function to fail randomly because of XMM register clobbering.	2012-01-29 08:14:05 -08:00
Ronald S. Bultje	e92003514d	png: move DSP functions to their own DSP context.	2012-01-29 08:11:18 -08:00
Ronald S. Bultje	3b15a6d742	config.asm: change %ifdef directives to %if directives. This allows combining multiple conditionals in a single statement.	2012-01-27 10:19:57 +08:00
Ronald S. Bultje	c3af52fa8b	dsputil: use vertical component for drawing bottom edge. Current code only writes 8 pixels of vertical edge for YUV422, which causes MC artifacts when subsequent frames use data from that edge.	2012-01-25 18:06:36 +08:00
Christophe GISQUET	9ba9c34024	rv34: 1-pass inter MB reconstruction Implement 1-pass inverse transform and reconstruction for inter blocks.	2012-01-16 19:26:41 +01:00
Christophe GISQUET	d78062386e	rv34: Intra 16x16 handling Extract processing of intra 16x16 blocks from intra macroblock processing. Also implement a function performing inverse transform and block reconstruction for DC-only blocks in 1 pass instead of 2.	2012-01-16 00:41:51 +01:00
Christophe GISQUET	3faa303a47	rv34: DC-only inverse transform When decoding coefficients, detect whether the block is DC-only, and take advantage of this knowledge to perform DC-only inverse transform. This is achieved by: - first, changing the 108x4 element modulo_three_table into a 108 element table (kind of base4), and accessing each value using mask and shifts. - then, checking low bits for 0 (as they represent the presence of higher frequency coefficients) Also provide x86 SIMD code for the DC-only inverse transform. Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>	2012-01-12 09:52:33 +01:00
Henrik Gramner	e7d02b04dc	fft: init functions with INIT_XMM/YMM. This is required to handle clobbering of XMM registers on Win64 correctly. Fixes FFT and all tests depending on FFT on Win64. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-01-11 20:12:26 +01:00
Vitor Sessak	39df0c434c	mpegaudiodec: optimized iMDCT transform Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-01-08 17:40:55 -08:00
Martin Storsjö	676a9ee1d2	x86: Fix constraints for decode_significance*_x86 Originally, prior to `8742a4ff8`, the caller code was compiled within this condition: ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS) Since HAVE_7REGS is defined as (ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE)) the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal to HAVE_7REGS (for 32 bit at least). The correct simplification of the original condition thus is HAVE_7REGS, not HAVE_EBX_AVAILABLE. This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0 and HAVE_EBX_AVAILABLE = 1. Signed-off-by: Martin Storsjö <martin@martin.st>	2011-12-27 09:05:14 +02:00
Diego Biurrun	6fdb2ce34a	x86: Tighten register constraints for decode_significance*_x86. On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register is not available, but required to assemble the functions. This reverts commit `8742a4f` to a simplified version of the original constraints.	2011-12-21 12:06:37 +01:00
Diego Biurrun	30bbd5cbc0	x86: conditionally compile dnxhd encoder optimizations	2011-12-19 13:54:10 +01:00
Diego Biurrun	88b9735753	build: conditionally compile x86 H.264 chroma optimizations	2011-12-14 11:58:45 +01:00
Martin Storsjö	f1dba9e498	x86: Require 7 registers for the cabac asm The change in `599b4c6ef` didn't turn out to work properly on i386 on OS X, where it broke building with PIC enabled. Signed-off-by: Martin Storsjö <martin@martin.st>	2011-12-12 15:36:20 +02:00
Mans Rullgard	599b4c6efd	x86: cabac: replace explicit memory references with "m" operands This replaces the explicit offset(reg) memory references with "m" operands for the same locations. As a result, one fewer register operand is needed for these inline asm statements. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-12-11 22:29:22 +00:00
Diego Biurrun	da9cea77e3	Fix a bunch of common typos.	2011-12-11 00:32:25 +01:00
Justin Ruggles	0e8fdd41c2	dsputil: use cpuflags in x86 emu_edge_core avoids passing around the extra argument among all the macros it uses	2011-11-22 15:40:51 -05:00
Justin Ruggles	395f2e70dd	dsputil: use movups instead of movdqu in ff_emu_edge_core_sse() This allows emulated_edge_mc_sse() and gmc_sse() to be used under AV_CPU_FLAG_SSE.	2011-11-22 15:40:51 -05:00
Justin Ruggles	9d06037d48	twinvq: add SSE/AVX optimized sum/difference stereo interleaving	2011-11-11 14:13:58 -05:00
Diego Biurrun	ce33320b30	Remove redundant filename self-references inside files. Filenames are brittle across renames and add no useful information.	2011-11-08 17:52:56 +01:00
Diego Biurrun	276b995d85	x86: drop pointless ARCH_X86 #ifdef from files in x86 subdirectory	2011-11-08 17:52:55 +01:00
Justin Ruggles	b8f02f5b4e	dsputil: use cpuflags in x86 versions of vector_clip_int32()	2011-11-06 20:50:06 -05:00
Ronald S. Bultje	717401aff2	h264_weight: remove duplicate functions.	2011-11-05 07:16:30 -07:00
Justin Ruggles	5463e83dbc	fmtconvert: fix int32_to_float_fmul_scalar() for windows x86_64 The calling convention only allows 4 non-stack parameters, with each float or int register being skipped if not used. fixes Bug 64	2011-11-02 21:44:58 -04:00
Daniel Kang	ded3e9f054	H.264: Cosmetics to dsputil_mmx.c Add whitespace. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-10-26 06:41:32 -07:00
Ronald S. Bultje	b0b3231074	h264_weight: initialize "height" function argument properly. Right now it's not actually initialized on 32-bit, leading to crashes on win32.	2011-10-22 00:23:24 -07:00
Justin Ruggles	aad3429d4e	fmtconvert: port float_to_int16_interleave() 2-channel x86 inline asm to yasm	2011-10-21 10:13:05 -04:00
Justin Ruggles	4e8e262476	fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm	2011-10-21 10:13:05 -04:00
Justin Ruggles	185142a5ea	fmtconvert: check compile-time x86 instruction set flags	2011-10-21 10:13:05 -04:00
Justin Ruggles	708ab7dd69	fmtconvert: port float_to_int16() x86 inline asm to yasm	2011-10-21 10:13:05 -04:00
Ronald S. Bultje	c2d337429c	H264: change weight/biweight functions to take a height argument. Neon parts by Mans Rullgard <mans@mansr.com>.	2011-10-21 01:00:45 -07:00
Ronald S. Bultje	229d263cc9	Support for lossless and inter H264 4:2:2.	2011-10-21 01:00:45 -07:00
Baptiste Coudurier	76741b0e56	h264: 4:2:2 intra decoding support Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-10-21 01:00:41 -07:00
Diego Biurrun	265980dabc	x86: Move some variable declarations below the appropriate #ifdef. This avoids some unused variable warnings with YASM disabled.	2011-10-20 16:19:27 +02:00
Diego Biurrun	2cb7c81669	x86: Fix linking of ProRes DSP ASM with YASM disabled.	2011-10-20 16:19:13 +02:00
Ronald S. Bultje	05c8f119cc	proresdsp: fix function prototypes. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2011-10-14 21:34:46 +02:00
Ronald S. Bultje	e3f530feca	prores: idct sse2/sse4 optimizations. ~3.0-3.5x as fast as original C version, 1.6x as fast overall.	2011-10-11 07:50:48 -07:00
Sean McGovern	c2d3f56107	fft: avoid a signed overflow As a signed integer, 1<<31 overflows, so force it to unsigned. Signed-off-by: Alex Converse <alex.converse@gmail.com>	2011-09-23 17:02:58 -07:00
Ronald S. Bultje	38e06c2969	Move clipd macros to x86util.asm. This allows sharing them between multiple .asm files.	2011-08-17 20:56:06 -07:00
Dave Yeo	cc73511e8e	Fix NASM include directive Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-08-15 11:24:35 -07:00
Alex Converse	48f7163f13	dsputil_mmx: Honor HAVE_AMD3DNOW	2011-08-15 11:20:08 -07:00
Ronald S. Bultje	b2c087871d	Move x86util.asm from libavcodec/ to libavutil/. This allows using it in swscale also.	2011-08-12 11:43:03 -07:00
Ronald S. Bultje	3a39195b1d	Move x86inc.asm to libavutil/. This allows using it in libswscale/ also.	2011-08-12 11:43:02 -07:00
Kostya Shishkov	d241f51e0f	Move RV3/4-specific DSP functions into their own context Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-08-11 16:07:15 -07:00
Vitor Sessak	18b131de04	dct32: Add SSE2 ASM optimizations Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-08-02 10:17:29 -07:00
Jason Garrett-Glaser	a3bf7b864a	H.264: tweak some other x86 asm for Atom	2011-07-29 12:24:15 -07:00
Mans Rullgard	3ad1684126	x86: cabac: add operand size suffixes missing from `6c32576` This fixes build with clang. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-28 18:59:23 -07:00
Mans Rullgard	f5f004bc5a	x86: cabac: don't load/store context values in asm Inspection of compiled code shows gcc handles these fine on its own. Benchmarking also shows no measurable speed difference. Removing the remaining cases in get_cabac_bypass_sign_x86() does cause more substantial changes to the compiled code with uncertain impact. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-28 22:25:21 +01:00
Jason Garrett-Glaser	6c32576548	H.264: optimize CABAC x86 asm for Atom	2011-07-28 13:06:13 -07:00
Mans Rullgard	da4c7cce21	x86: fix build with gcc 4.7 The upcoming gcc 4.7 has more advanced constant propagation resulting in some inline asm operands becoming constants and thus emitted as literals, sometimes in contexts where this results in invalid instructions. This patch changes the constraints of the relevant operands to "rm" thus forcing a valid type. While obviously suboptimal, this is what older gcc versions already did, and there is no change to the code generated with these. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-26 22:17:43 +01:00
Daniel Kang	406fbd24dc	H.264: Add optimizations to predict x86 assembly. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-22 14:54:33 -07:00
Joseph Artsimovich	5ab21439fd	dnxhd: 10-bit support Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-21 18:44:40 +01:00
Mans Rullgard	a617c6aaa3	dsputil: update per-arch init funcs for non-h264 high bit depth Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-21 18:10:58 +01:00
Mans Rullgard	874f1a901d	dsputil: template get_pixels() for different bit depths Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-21 18:10:58 +01:00
Mans Rullgard	0a72533e98	jfdctint: add 10-bit version Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-21 18:10:58 +01:00
Mans Rullgard	e7a972e113	simple_idct: add 10-bit version Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-20 17:49:48 +01:00
Diego Biurrun	65083b4911	dsputil: remove disabled code	2011-07-18 11:48:35 +02:00
Martin Storsjö	8f62ef0f95	x86: Use LOCAL_ALIGNED in mpegvideo_mmx_template Signed-off-by: Martin Storsjö <martin@martin.st>	2011-07-18 00:10:45 +03:00
Diego Biurrun	e0ae2174db	simple_idct: remove disabled code	2011-07-17 17:32:37 +02:00
Daniel Kang	ac4a85f476	H.264: Add more x86 assembly for 10-bit H.264 predict functions Mainly ported from 8-bit H.264 predict. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-13 18:44:51 -07:00
Jason Garrett-Glaser	b5bbc84fe2	H.264: add filter_mb_fast support for >8-bit decoding Much faster high bit depth deblocking.	2011-07-11 14:58:50 -07:00
Mans Rullgard	710b8df949	dsputil: remove ff_emulated_edge_mc macro used in one place This macro can cause problems in conjunction with the bitdepth template expansion. It was presumably added to keep source compatibility when high bitdepth support was added. However, emulated_edge_mc is a dsputil pointer and should not be called directly, so there is little reason to keep such a macro. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-10 17:55:58 +01:00
Daniel Kang	c0483d0c7a	H.264: Add x86 assembly for 10-bit H.264 predict functions Mainly ported from 8-bit H.264 predict. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-08 15:59:29 -07:00
Daniel Kang	3c7c16fde3	YASM: Shut up unused variable compiler warning with --disable-yasm. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-07-04 18:49:09 +02:00
Daniel Kang	567a32b5b2	x86_32: Fix build on x86_32 with --disable-yasm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-04 08:47:09 -07:00
Daniel Kang	58f7aad051	Fix build with --disable-yasm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-03 22:56:09 -07:00
Daniel Kang	9bfa5363da	H.264: Add x86 assembly for 10-bit H.264 qpel functions. Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-03 07:43:38 -07:00
Justin Ruggles	f99a5ef92e	ac3dsp: add x86-optimized versions of ac3dsp.extract_exponents().	2011-07-01 13:02:11 -04:00
Justin Ruggles	6054cd25b4	ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.	2011-07-01 13:02:11 -04:00
Diego Biurrun	d2ee495fb2	configure: Drop check for availability of ten assembler operands. This was done to support gcc 2.95, which is an old legacy compiler that fails to compile the current codebase anyway.	2011-06-28 13:14:37 +02:00
Diego Biurrun	adbfc605f6	doxygen: Consistently use '@' instead of '\' for Doxygen markup. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-06-24 00:37:49 +02:00
Daniel Kang	84e70ef004	h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. Mainly ported from 8-bit H.264 weight/biweight. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-06-21 15:24:13 +02:00
Mans Rullgard	c5ee740745	x86: cabac: fix register constraints for 32-bit mode Some operands need to be accessed in byte mode, which restricts the available registers in 32-bit mode. Using the 'q' constraint selects a suitable register. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 23:36:40 +01:00
Mans Rullgard	2143d69bdd	cabac: move x86 asm to libavcodec/x86/cabac.h Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:31 +01:00
Mans Rullgard	d075e7d540	x86: h264: cast pointers to intptr_t rather than int Only the low-order bits are used here so the type is not important, but this avoids a compiler warning. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:31 +01:00
Mans Rullgard	3a4edb76d6	x86: h264: remove hardcoded edi in decode_significance_8x8_x86() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:31 +01:00
Mans Rullgard	b92c1a6d26	x86: h264: remove hardcoded esi in decode_significance[_8x8]_x86() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:31 +01:00
Mans Rullgard	3fc4e36c78	x86: h264: remove hardcoded edx in decode_significance[_8x8]_x86() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:31 +01:00
Mans Rullgard	e4b5a204aa	x86: h264: remove hardcoded eax in decode_significance[_8x8]_x86() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:30 +01:00
Mans Rullgard	018c33838e	x86: cabac: remove hardcoded ebx in inline asm Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:30 +01:00
Mans Rullgard	6b712acc0e	x86: cabac: remove hardcoded struct offsets from inline asm Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-06-20 22:36:30 +01:00
Ronald S. Bultje	ed63f527f2	Fix build if yasm is not available.	2011-06-18 08:34:14 -04:00
Daniel Kang	f188a1e0ca	H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions. Mainly ported from 8-bit H.264 MC Chroma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-06-18 07:52:19 -04:00
Jason Garrett-Glaser	c90b94424c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser	504811baea	Roll back 4:4:4 H.264 for now Needs some ARM/PPC asm modifications.	2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser	c9c493872c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	2011-06-13 12:21:39 -07:00
Oskar Arvidsson	6c031a3338	h264: Fix 10-bit H.264 x86 chroma v loopfilter asm. The tc variable was not splatted correctly. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-06-10 14:44:57 -04:00
Daniel Kang	4de83b7b6d	H264: x86 predict init cosmetics. Change indentation and whitespace; also move HAVE_YASM blocks. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-06-08 00:22:52 +02:00
Daniel Kang	a8d44f9dd5	Add x86 assembly for some 10-bit H.264 intra predict functions. Parts are inspired from the 8-bit H.264 predict code in Libav. Other parts ported from x264 with relicensing permission from author. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-06-06 01:31:02 +02:00
Loren Merritt	53be7b23e9	Cosmetic changes to h264_idct_10bit.asm. Removes redundant dword tags and whitespace changes. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-06-02 07:07:15 -07:00
Loren Merritt	994c3550ff	2x faster h264_idct_add8_10. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-06-02 07:07:02 -07:00
Ronald S. Bultje	e6635a9a19	h264: remove CONFIG_GPL from x86 intra prediction code. The authors permitted relicensing to LGPL a long time ago (Holger, Loren and Jason).	2011-06-02 07:02:46 -07:00
Daniel Kang	f3aa65af3a	h264/10bit: add HAVE_ALIGNED_STACK checks. Fixes regression in `836f47d34b` in ICC-10.x, since ICC<=11.0 doesn't align stack upon function calls. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-05-31 21:43:20 -07:00
Daniel Kang	348493db60	Update 8-bit H.264 IDCT function names to reflect bit-depth. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	2011-05-31 15:02:32 -07:00
Daniel Kang	836f47d34b	Add IDCT functions for 10-bit H.264. Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	2011-05-31 15:02:32 -07:00
Justin Ruggles	70bb747a57	ac3dsp: do not use the ff_* prefix when referencing ff_ac3_bap_bits. this should fix the windows builds Signed-off-by: Martin Storsjö <martin@martin.st>	2011-05-28 22:43:40 +03:00
Justin Ruggles	6ca23db9cc	ac3enc: modify mantissa bit counting to keep bap counts for all values of bap instead of just 0 to 4. This does all the actual bit counting as a final step.	2011-05-28 12:39:28 -04:00
Diego Biurrun	5e528cffcf	x86: Add appropriate ifdefs around certain AVX functions. nasm versions prior to 2.09 have trouble assembling some of our AVX code. Protect these sections by preprocessor macros to allow compilation to pass.	2011-05-27 21:18:12 +02:00
Dave Yeo	a10fb79070	x86 asm: Add SECTION_TEXT to dct32_sse.asm. This fixes the following error on OS/2: error: segment name `.text align=16' not recognized Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-05-23 12:47:53 +02:00
Loren Merritt	422b2362fc	dct32_sse: eliminate some spills 125->104 cycles on penryn (x86_64 only)	2011-05-22 19:27:18 +02:00
Vitor Sessak	165c7c420d	Fix dct32() compilation with --disable-yasm Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-05-22 07:10:19 -04:00
Vitor Sessak	6204feb160	dct32: Add AVX implementation of 32-point DCT	2011-05-21 17:42:26 +02:00
Vitor Sessak	4e653b98c8	dct32: Change pass 6 permutation to allow for AVX implementation	2011-05-21 17:42:26 +02:00
Vitor Sessak	3758eb0eb9	dct32: port SSE 32-point DCT to YASM	2011-05-21 17:42:26 +02:00
Diego Biurrun	153382e1b6	multiple inclusion guard cleanup Add missing multiple inclusion guards; clean up #endif comments; add missing library prefixes; keep guard names consistent.	2011-05-21 13:48:10 +02:00
Dave Yeo	d69f9a4234	Add support for a.out object format to assembler macros. This format is still used by e.g. OS/2. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-05-20 17:52:21 +02:00
Mans Rullgard	0b5e44ed29	mpegaudiodsp: fix x86 and ppc makefiles Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-05-19 16:32:24 +01:00
Mans Rullgard	c4f5c2d6f4	Move some mpegaudio functions to new mpegaudiodsp subsystem This separation allows these functions to be used in a cleaner fashion from other codecs (e.g. qdm2) and simplifies creating optimised versions of them. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-05-19 12:25:34 +01:00
Justin Ruggles	e98a95e779	10l: wrap float_interleave functions in HAVE_YASM. fixes compilation with --disable-yasm	2011-05-18 20:18:08 -04:00
Justin Ruggles	32f8fb8ecf	Add float_interleave() to FmtConvertContext with x86-optimized versions. Partially based on patches by clsid2 in ffdshow-tryout. ff_float_interleave6() x86 improvements by Loren Merritt.	2011-05-18 17:27:05 -04:00
Daniel Kang	d0005d347d	Modify x86util.asm to ease transitioning to 10-bit H.264 assembly. Arguments for variable size instructions are added to many macros, along with other various changes. The x86util.asm code was ported from x264. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-05-17 20:44:48 +02:00
Gil Pedersen	257de5fb25	h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64. This fixes linking errors due to undefined symbols on x86_64 OS X. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-05-16 15:35:53 +02:00
Diego Biurrun	888fa31eca	Fix FSF address copy paste error in some license headers.	2011-05-14 21:32:31 +02:00
Jason Garrett-Glaser	5705b02079	10-bit H.264 x86 chroma v loopfilter asm Also delete some unused deblock asm macros.	2011-05-11 11:09:10 -07:00
Jason Garrett-Glaser	9f3d6ca4f1	Port x86 10-bit H.264 deblock asm from x264	2011-05-10 20:02:15 -07:00
Jason Garrett-Glaser	8ad77b65b5	Update x86 H.264 deblock asm Includes AVX versions from x264.	2011-05-10 20:01:58 -07:00
Ronald S. Bultje	86b29553f8	h264dsp_mmx: place bracket outside #if/#endif block. Should fix compile on systems missing yasm/nasm.	2011-05-10 08:39:38 -04:00
Oskar Arvidsson	19a0729b4c	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-05-10 07:24:36 -04:00
Diego Biurrun	a734fa575f	Remove disabled non-optimized code variants.	2011-04-29 20:01:13 +02:00
Vitor Sessak	9d35fa520e	Add AVX FFT implementation. Signed-off-by: Reinhard Tartler <siretart@tauware.de>	2011-04-26 18:25:24 +02:00
Vitor Sessak	33cbfa6fa3	Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX. Signed-off-by: Reinhard Tartler <siretart@tauware.de>	2011-04-26 18:18:22 +02:00
Alexander Strange	1500be13f2	dsputil: allow to skip drawing of top/bottom edges.	2011-03-26 17:45:38 -04:00
Justin Ruggles	e6e9823488	Add apply_window_int16() to DSPContext with x86-optimized versions and use it in the ac3_fixed encoder.	2011-03-22 21:08:30 -04:00
Mans Rullgard	0aded9484d	Move dct and rdft definitions to separate files This leaves fft.h with only the core FFT and MDCT definitions thus making it more manageable. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-20 17:15:33 +00:00
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-19 13:33:20 +00:00
Justin Ruggles	0f999cfddb	ac3enc: add float_to_fixed24() with x86-optimized versions to AC3DSPContext and use in scale_coefficients() for the floating-point AC-3 encoder.	2011-03-17 16:46:48 -04:00
Justin Ruggles	79414257e2	mathops: fix MULL() when the compiler does not inline the function. If the function is not inlined, an immediate cannot be used for the shift parameter, so the %cl register must be used instead in that case. This fixes compilation for x86-32 using gcc with --disable-optimizations.	2011-03-15 20:49:37 -04:00
Justin Ruggles	aaff3b312e	mathops: change "g" constraint to "rm" in x86-32 version of MUL64(). The 1-arg imul instruction cannot take an immediate argument, only a register or memory argument.	2011-03-15 13:43:47 -04:00
Justin Ruggles	b181b8fb96	mathops: convert MULL/MULH/MUL64 to inline functions rather than macros. This fixes unexpected name collisions that were occurring with variables declared within the macros. It also fixes the fate-acodec-ac3_fixed regression test on x86-32.	2011-03-15 13:43:47 -04:00
Justin Ruggles	f1efbca5e9	ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder.	2011-03-14 08:45:31 -04:00
Mans Rullgard	a5444fee06	Add CONFIG_AC3DSP symbol to simplify makefiles Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-12 11:35:26 +00:00
Ronald S. Bultje	bf6fa73245	dsputil_mmx.c: remove ff_vector128. Remove ff_vector128, it is identical to ff_pb_80.	2011-02-19 10:51:15 -05:00
Ronald S. Bultje	12802ec060	dsputil: move VC1-specific stuff into VC1DSPContext.	2011-02-17 17:35:35 -05:00
Justin Ruggles	1f004fc512	ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16(). Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-16 14:08:34 -05:00
Justin Ruggles	fbb6b49dab	ac3enc: Add x86-optimized function to speed up log2_tab(). AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute value of each element in an array of int16_t. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-13 16:49:39 -05:00
Loren Merritt	e6b1ed693a	FFT: factor a shuffle out of the inner loop and merge it into fft_permute. 6% faster SSE FFT on Conroe, 2.5% on Penryn. Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-02-13 15:36:39 +01:00
Justin Ruggles	dda3f0ef48	Add x86-optimized versions of exponent_min(). Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-10 15:32:47 -05:00
Ronald S. Bultje	17cf7c68ed	Fix ff_emu_edge_core_sse() on Win64. Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict on the size of registers and which registers are being used for operations where multiple are available. This fixes segfaults in emulated_edge() function calls on Win64.	2011-02-08 18:25:12 -05:00
Justin Ruggles	c73d99e672	Separate format conversion DSP functions from DSPContext. This will be beneficial for use with the audio conversion API without requiring it to depend on all of dsputil. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-02 02:44:53 +00:00
Alex Converse	770c410fbb	Fix ff_imdct_calc_sse() on gcc-4.6 Gcc 4.6 only preserves the first value when using an array with an "m" constraint. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-02 02:40:05 +00:00
Ronald S. Bultje	81f2a3f4ff	Implement a SIMD version of emulated_edge_mc() for x86. From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32) and 196 (SSE2/x86-32) cycles.	2011-01-31 20:55:56 -05:00
Justin Ruggles	d19b744a36	cosmetics: indentation Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-31 20:30:15 +00:00
Justin Ruggles	80ba1ddb58	Remove unneeded add bias from 3 functions. DSPContext.vector_fmul_window() DCADSPContext.lfe_fir() SynthFilterContext.synth_filter_float() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-31 20:28:42 +00:00
Mans Rullgard	80944df720	x86: fix overflow in h264 8x8 planar prediction Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-24 23:24:28 +00:00
Justin Ruggles	6eabb0d3ad	Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-22 17:53:27 +00:00
Justin Ruggles	1c189fc533	cosmetics related to LPC changes. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-21 19:59:08 +00:00
Justin Ruggles	77a78e9bdc	Separate window function from autocorrelation. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-21 19:59:08 +00:00
Justin Ruggles	56f8952b25	Move lpc_compute_autocorr() from DSPContext to a new struct LPCContext. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-21 19:58:59 +00:00
Ronald S. Bultje	b9c7f66e6d	Fix horizontal/horizontal_up 8x8l intra prediction x86/simd functions. The original functions did not work correctly for edge pixels, e.g. when CODEC_FLAG_EMU_EDGE is set, leading to corrupt output in e.g. VLC. Based on a patch by Daniel Kang <daniel d kang gmail com>. Signed-off-by: Ronald S. Bultje <rsbultje gmail com>	2011-01-19 20:34:42 -05:00
Mans Rullgard	ef4a65149d	Replace ASMALIGN() with .p2align This macro has unconditionally used .p2align for a long time and serves no useful purpose.	2011-01-18 20:48:24 +00:00
Mans Rullgard	ac3c9d0169	x86: remove VLA in ac3_downmix_sse	2011-01-18 20:48:24 +00:00
Janne Grunau	2c3589bfda	consolidate .gitignore patterns into a single file Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-01-18 21:32:05 +01:00
Janne Grunau	348b8218f7	convert svn:ignore properties to .gitignore files Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-01-17 15:50:14 +01:00
Ronald S. Bultje	1b3e43e4fd	Fix overflow in pred16x16_plane x86 simd code. Fixes issue 2547. Originally committed as revision 26381 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-15 22:00:44 +00:00
Ronald S. Bultje	ec3233a855	Fix ff_pw_3 alignment. Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 21:34:25 +00:00
Daniel Kang	004357a11f	Fix compilation on x86-32 with --disable-optimizations, fixes issue 2127. Patch by Daniel Kang, daniel.d.kang at gmail Originally committed as revision 26204 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-03 11:30:04 +00:00
Daniel Kang	0790caba60	Fix invalid reads in valgrind fate, patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26177 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-31 01:29:06 +00:00
Daniel Kang	536e9b2f58	Port pred8x8l_down_left_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26162 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 23:48:44 +00:00
Daniel Kang	720ea2d5b2	Port pred4x4_down_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26159 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:55:51 +00:00
Daniel Kang	d0aebe23e2	Port pred4x4_vertical_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26158 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:52:41 +00:00
Daniel Kang	76497232ef	Port pred4x4_horizontal_down_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26157 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:49:57 +00:00
Daniel Kang	e9c576a467	Port pred4x4_horizontal_up_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26156 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:42:33 +00:00
Daniel Kang	92f441ae86	Port pred4x4_vertical_left_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26155 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:35:34 +00:00
Ronald S. Bultje	e8d98764cc	Merge a few superfluous CONFIG_GPL checks. Originally committed as revision 26154 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:30:47 +00:00
Ronald S. Bultje	42a59278cf	Whitespace cosmetics. Originally committed as revision 26152 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:43:15 +00:00
Daniel Kang	57b1f334d1	Port pred8x8l_horizontal_down_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26151 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:42:15 +00:00
Daniel Kang	04cbdf3d24	Port pred8x8l_horizontal_down_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26150 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:38:06 +00:00
Daniel Kang	98c6053cd0	Port pred8x8l_horizontal_up_mmxext/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26149 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:35:31 +00:00
Daniel Kang	ecc7efbbb6	Port pred8x8l_vertical_left_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26148 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:06:22 +00:00
Daniel Kang	bdd93f1b25	Port pred8x8l_vertical_right_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26147 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:54:05 +00:00
Daniel Kang	f25112fc09	Port pred8x8l_vertical_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26146 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:46:09 +00:00
Daniel Kang	602a4cb25a	Port pred8x8l_down_right_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26145 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:19:49 +00:00
Daniel Kang	e916acbcd1	Port pred8x8l_down_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26143 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:12:02 +00:00
Daniel Kang	c249e66576	Port pred8x8l_down_left_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26142 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:02:50 +00:00
Daniel Kang	ee1ba9c326	Port pred8x8l_vertical_mmxext/ssse3 (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26140 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:46:40 +00:00
Daniel Kang	04207ef353	Port pred8x8l_horizontal_mmxext/ssse3 (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26139 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:40:53 +00:00

919 Commits