Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for the non-SSSE3
  versions (see the sketch below)
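To illustrate with a hypothetical 6-tap filter (the real coefficients
differ), the kind of redundancy in question lets two taps that share a
coefficient be summed before a single multiply:

    #include <stdint.h>

    /* hypothetical taps (-1, 5, C, C, 5, -1): because the middle taps share
     * the coefficient C, the two pixels are added first and multiplied once,
     * saving one multiplication per output sample */
    static int filter6(const uint8_t *src, int C)
    {
        return C * (src[2] + src[3])
             + 5 * (src[1] + src[4])
             -     (src[0] + src[5]);
    }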
Benchmark (rounded to the nearest ten units):
           V8x8    H8x8   2D8x8  V16x16  H16x16 2D16x16
C           445     358     985    1785    1559    3280
MMX*        219     271     478     714     929    1443
SSE2        131     158     294     425     515     892
SSSE3       120     122     248     387     390     763
The end result is an overall speedup of around 15% for the SSSE3 version (on
6 sequences); all loop filter functions now take around 55% of the decoding
time, while the luma MC dsp functions are around 6%, the chroma ones 1.3%,
and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version, this adds 2 lea instructions
and needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The reason is that this is easier for PIC code (in particular on Darwin...).
Keep the old names as pointers (static in cabac_functions.h, so gcc
knows these are just immediate offsets) so the C code can nicely stay the same
(alternatively, the offsets could be used directly in the functions needing
the tables). This should produce the same code as before with non-PIC and
better code (confirmed) with PIC.
The assembly uses the new table but still won't work for the PIC case.
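A minimal sketch of the scheme (the offsets here are hypothetical): one
exported table plus static pointer aliases, so the C code keeps the old
names while gcc still sees compile-time offsets into a single symbol:

    #include <stdint.h>

    extern const uint8_t ff_h264_cabac_tables[];

    /* hypothetical offsets into the merged table */
    static const uint8_t *const ff_h264_norm_shift = ff_h264_cabac_tables + 0;
    static const uint8_t *const ff_h264_lps_range  = ff_h264_cabac_tables + 512;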
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump. It has been superseded
by av_set_cpu_flags_mask() which, unlike this field, works everywhere.
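For reference, a minimal usage sketch of the replacement API:

    #include <libavutil/cpu.h>

    void force_c_codepaths(void)
    {
        /* mask out every SIMD flag so only the plain C code paths run,
         * globally rather than per-context as dsp_mask did */
        av_set_cpu_flags_mask(0);
    }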
Signed-off-by: Mans Rullgard <mans@mansr.com>
Recent register allocation changes (the x86inc.asm update) changed the
register order and thus the opcodes for the inner loops. One of them grew
beyond 128 bytes, which confuses other parts of this function, where it
jumps to fixed-offset positions to extend the edge by fixed amounts. A
simple register change fixes this.
- Add support for all x86-64 registers
- Prefer caller-saved registers over callee-saved on WIN64
- Support up to 15 function arguments
Also (by Ronald S. Bultje): fix up our asm to work with the new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Quite often, the original weights are multiples of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no per-call prescaling either.
The x86 code already used that trick.
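A minimal sketch of the idea, with hypothetical names:

    /* once per frame: when both weights are multiples of 512, fold the
     * 1/512 prescale into the stored weights so the per-call code needs
     * no extra shift */
    static int prescale_weights(int *wt1, int *wt2)
    {
        if ((*wt1 | *wt2) & 511)
            return 0;       /* generic path */
        *wt1 >>= 9;         /* prescale by 1/512 */
        *wt2 >>= 9;
        return 1;           /* fast path */
    }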
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Prevents a sign flip in the counter, and a subsequent crash because of
overreads/overwrites.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Since the values are floats, using the float operations
makes sense, improves performance on some CPUs, and
makes the code SSE-compatible instead of needing SSE2.
Based on suggestion by Jason.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.
The x86 and NEON versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
movq from an SSE register _to_ memory is an SSE2 instruction.
Use the SSE instruction movlps, which does the same thing, instead.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.
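A structural sketch of the split (hypothetical bodies; assumes the usual
dsputil.h and libavutil/cpu.h declarations):

    static void dsputil_init_mmx(DSPContext *c, AVCodecContext *avctx)
    {
        /* install MMX function pointers ... */
    }

    static void dsputil_init_sse2(DSPContext *c, AVCodecContext *avctx)
    {
        /* install SSE2 function pointers ... */
    }

    void ff_dsputil_init_mmx(DSPContext *c, AVCodecContext *avctx)
    {
        int cpu_flags = av_get_cpu_flags();

        if (cpu_flags & AV_CPU_FLAG_MMX)
            dsputil_init_mmx(c, avctx);
        if (cpu_flags & AV_CPU_FLAG_SSE2)
            dsputil_init_sse2(c, avctx);
    }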
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Unrolling the main loop to process, instead of 4 elements:
- 8: a minor gain of 2 cycles (not worth the extra object size)
- 2: a loss of 8 cycles
Assigning STEP to a register is a loss. The output address (Y) is almost
always unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The 32-bit targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square  C,   32-bit:  82c (unrolled) / 102c
                C,   64-bit:  69c (unrolled) /  82c
                SSE, 32-bit:  42c
                SSE, 64-bit:  31c
Using SSE4.1 dpps to perform the final sum is slower. Not unrolling the
loop to 8 operations per iteration costs 10 extra cycles.
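For reference, a scalar sketch of what the function computes (interleaved
re/im pairs; the exact signature is hedged):

    static float sbr_sum_square_c(float (*x)[2], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
        return sum;
    }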
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. It also fixes crashes on systems where we don't do it and
arguments are not passed in registers, such as Win64, for all weight
functions.
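A hypothetical before/after of the prototype change this implies (names
invented for illustration):

    #include <stddef.h>
    #include <stdint.h>

    /* before: a 32-bit int picked up from the stack on Win64 may carry
     * garbage in its upper 32 bits unless the asm sign-extends it */
    void weight_func_old(uint8_t *block, int stride, int height);

    /* after: a pointer-sized type arrives ready to use in 64-bit code */
    void weight_func_new(uint8_t *block, ptrdiff_t stride, int height);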
By replacing memcpy with an unrolled loop that exploits the alignment
knowledge the caller has, some speedup can be obtained.
Before (gcc 4.6.1): ~400 cycles
After: ~370 cycles
Overall, around 2% speed increase when decoding a 2400s mp3 to f32le.
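A minimal sketch of the replacement loop, assuming 16-byte-aligned buffers
and a count that is a multiple of 4:

    static void copy_floats_aligned(float *restrict dst,
                                    const float *restrict src, int n)
    {
        /* the fixed 4-wide unroll lets the compiler use SIMD moves without
         * the generic memcpy size/overlap dispatch */
        for (int i = 0; i < n; i += 4) {
            dst[i + 0] = src[i + 0];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
        }
    }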
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
We need to do unsigned saturation in order to cover the corner case when the
absolute coefficient value is 16777215 (the maximum value).
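A generic illustration of the distinction (a sketch, not the actual code):

    #include <stdint.h>

    /* unsigned saturation keeps the 24-bit maximum 16777215 (all ones)
     * intact; signed saturation would treat that bit pattern as negative
     * and clamp it to the wrong value */
    static inline uint32_t sat_u24(uint32_t v)
    {
        return v > 0xFFFFFF ? 0xFFFFFF : v;
    }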
Fixes Bug #216
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
Provide MMX, SSE2 and SSSE3 versions, with a fast path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases (decicycles / runs / skips):

                  16x16                 8x8
C                 9015 / 524257 / 31    2656 / 524271 / 17
MMX               4156 / 262090 / 54    1206 / 262131 / 13
MMX, fast path    2760 / 524222 / 66     995 / 524252 / 36
SSE2              2163 / 262131 / 13     832 / 262137 /  7
SSE2, fast path   1783 / 524276 / 12     711 / 524283 /  5
SSSE3             2117 / 262136 /  8     814 / 262143 /  1
SSSE3, fast path  1315 / 524285 /  3     578 / 524286 /  2
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
While pshufb allows emulating bswap on XMM registers with SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific code paths
are provided for this case.
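The SSSE3 trick, sketched with intrinsics (this only shows the idea, not
the actual code):

    #include <tmmintrin.h>

    /* byte-swap four 32-bit words at once: one pshufb with a constant mask
     * reverses the bytes within each dword */
    static __m128i bswap32x4_ssse3(__m128i x)
    {
        const __m128i mask = _mm_set_epi8(12, 13, 14, 15,  8,  9, 10, 11,
                                           4,  5,  6,  7,  0,  1,  2,  3);
        return _mm_shuffle_epi8(x, mask);
    }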
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using the bswap instruction):  ~55k cycles
SSE2:                             ~40k cycles
SSSE3, unaligned loads:           ~35k cycles
SSSE3, aligned loads:             ~30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Extract processing of intra 16x16 blocks from intra macroblock
processing.
Also implement a function performing inverse transform and block
reconstruction for DC-only blocks in one pass instead of two.
When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.
This is achieved by:
- first, changing the 108x4-element modulo_three_table into a 108-element
table (a kind of base-4 packing), and accessing each value using masks and
shifts (see the sketch below)
- then, checking the low bits for 0 (as they represent the presence of
higher-frequency coefficients)
Also provide x86 SIMD code for the DC-only inverse transform.
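A minimal sketch of the packing (the exact bit layout here is hypothetical):

    #include <stdint.h>

    /* four 0..2 values per entry, two bits each, with the DC value in the
     * high bits and the higher-frequency values in the low bits */
    static inline int unpack_val(uint8_t entry, int i)
    {
        return (entry >> (6 - 2 * i)) & 3;
    }

    /* if the low bits are all zero, no higher-frequency coefficients are
     * present and the 1-pass DC-only transform can be used */
    #define IS_DC_ONLY(entry) (((entry) & 0x3F) == 0)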
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This is required to handle clobbering of XMM registers on Win64
correctly. Fixes FFT and all tests depending on FFT on Win64.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Originally, prior to 8742a4ff8, the caller code was compiled
within this condition:
ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
Since HAVE_7REGS is defined as
(ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equivalent
to HAVE_7REGS (for 32-bit at least). The correct simplification
of the original condition thus is HAVE_7REGS, not
HAVE_EBX_AVAILABLE.
This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
and HAVE_EBX_AVAILABLE = 1.
Signed-off-by: Martin Storsjö <martin@martin.st>
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register
is not available, but it is required to assemble the functions.
This reverts commit 8742a4f to a simplified version of the original constraints.
The change in 599b4c6ef didn't turn out to work properly on
i386 on OS X, where it broke building with PIC enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
This replaces the explicit offset(reg) memory references with
"m" operands for the same locations. As a result, one fewer
register operand is needed for these inline asm statements.
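A minimal sketch of the transformation (hypothetical operands; movups is
used here to sidestep alignment concerns):

    static void load_example(const float *buf)
    {
        /* before: base pointer in a register operand, offset written by hand */
        __asm__ volatile ("movups 16(%0), %%xmm0" :: "r"(buf) : "xmm0");

        /* after: an "m" operand lets the compiler pick the addressing mode,
         * so no register operand is consumed for the base pointer (the
         * operand nominally covers buf[4] while the load reads 16 bytes,
         * as was common in such inline asm) */
        __asm__ volatile ("movups %0, %%xmm0" :: "m"(buf[4]) : "xmm0");
    }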
Signed-off-by: Mans Rullgard <mans@mansr.com>