ffmpeg

Author	SHA1	Message	Date
Diego Biurrun	311a592dfc	x86: Remove some duplicate function declarations	2013-04-22 02:29:57 +02:00
Martin Storsjö	b71a0507b0	x86: Remove unused inline asm instruction defines Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-20 00:44:54 +03:00
Ronald S. Bultje	8db00081a3	x86: hpeldsp: Move half-pel assembly from dsputil to hpeldsp Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-19 23:18:53 +03:00
Ronald S. Bultje	015821229f	vp3: Use full transpose for all IDCTs This way, the special IDCT permutations are no longer needed. This is similar to how H264 does it, and removes the dsputil dependency imposed by the scantable code. Also remove the unused type == 0 cases from the plain C version of the idct. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-15 12:32:05 +03:00
Ronald S. Bultje	c46819f229	x86: Move constants to the only place where they are used Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-15 12:17:39 +03:00
Diego Biurrun	a3cb865310	x86: dsputil: Move some ifdefs to avoid unused variable warnings	2013-04-12 09:36:47 +02:00
Diego Biurrun	2004c7c8f7	x86: dsputil: cosmetics: Remove two pointless variable indirections	2013-04-12 09:36:47 +02:00
Diego Biurrun	c51a3a5bd9	x86: dsputil: Refactor some ff_{avg|put}_pixels function declarations	2013-04-12 09:36:46 +02:00
Diego Biurrun	e027032fc6	x86: dsputil: ff_h263_*_loop_filter declarations to a more suitable place	2013-04-12 09:36:46 +02:00
Diego Biurrun	a89c05500f	x86: h264qpel: int --> ptrdiff_t for some line_size parameters	2013-04-12 09:30:12 +02:00
Diego Biurrun	ac9362c5d9	Move misplaced file author information where it belongs	2013-04-11 02:42:11 +02:00
Ronald S. Bultje	b93b27edb0	dsputil: Make dsputil selectable Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-10 11:04:05 +03:00
Ronald S. Bultje	62844c3fd6	h264: Integrate clear_blocks calls with IDCT The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-10 11:03:06 +03:00
Ronald S. Bultje	610b18e2e3	x86: qpel: Move fullpel and l2 functions to a separate file This way, they can be shared between mpeg4qpel and h264qpel without requiring either one to be compiled unconditionally. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-08 12:38:33 +03:00
Christophe Gisquet	f4b0d12f5b	x86: sbrdsp: Implement SSE neg_odd_64 Timing on Arrandale (C vs. SSE): Win32: 57 vs. 44; Win64: 47 vs. 38. Unrolling and not storing mask both save some cycles. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-04-05 22:47:04 +02:00
Diego Biurrun	b6649ab503	cosmetics: Remove unnecessary extern keywords from function declarations	2013-03-27 14:21:45 +01:00
Martin Storsjö	a2acadd058	x86: vc1dsp: Fix indentation Signed-off-by: Martin Storsjö <martin@martin.st>	2013-03-26 15:49:42 +02:00
Janne Grunau	e5c2794a71	x86: consistently use unaligned movs in the unaligned bswap Fixes fate errors in asv1, ffvhuff and huffyuv on x86_32.	2013-03-25 12:11:11 +01:00
Martin Storsjö	285ff14413	x86: Change a missed occurrence of int to ptrdiff_t for strides Signed-off-by: Martin Storsjö <martin@martin.st>	2013-03-24 12:06:53 +02:00
Martin Storsjö	352dbdb96c	x86: Remove win64 xmm clobbering wrappers for the now removed avcodec_encode_video function Signed-off-by: Martin Storsjö <martin@martin.st>	2013-03-23 23:37:27 +02:00
Luca Barbato	a8b6015823	dsputil: convert remaining functions to use ptrdiff_t strides Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-03-12 18:26:42 +01:00
Diego Biurrun	e8c52271c4	Revert "Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm." This reverts commit `f90ff772e7`. The code should be put back in h264_qpel_8bit.asm, but unfortunately it is unconditionally used from dsputil_mmx.c since `71155d7`.	2013-02-28 21:50:02 +01:00
Diego Biurrun	ebc701993f	x86: dsputil: Drop some unused function #defines	2013-02-26 23:36:24 +01:00
Diego Biurrun	845cfc92f9	x86: dsputil: Drop aliasing of ff_put_pixels8_mmx to ff_put_pixels8_mmxext The external assembly function uses mmxext instructions and should not be masqueraded as an mmx-only function. Instead, use the mmx-only inline assembly function.	2013-02-26 23:36:24 +01:00
Diego Biurrun	096cc11ec1	x86: vc1dsp: Move ff_avg_vc1_mspel_mc00_mmxext out of dsputil_mmx.c	2013-02-26 23:36:24 +01:00
Martin Storsjö	31a23a0dc6	x86: dsputil_mmx: Remove leftover inline assembly fragments These became unused in `71155d7b`. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-02-27 00:17:05 +02:00
Diego Biurrun	c242bbd8b6	Remove unnecessary dsputil.h #includes	2013-02-26 00:51:34 +01:00
Matt Wolenetz	311443f6c7	x86: h264: Don't use redzone in AVX h264_deblock on Win64 This fixes crashes in chromium on win64 on machines with AVX (crashes that apparently aren't triggered by fate). Signed-off-by: Martin Storsjö <martin@martin.st>	2013-02-21 15:02:16 +02:00
Ronald S. Bultje	e5ffffe48d	h264chroma: Remove duplicate 9/10 bit functions These functions do the same thing in 16 bit space and don't need any depth specific clipping. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-02-19 22:33:19 +02:00
Daniel Kang	9acd23d655	x86: dsputil: Fix h263 loop filter link error in some configurations This was caused by unconditionally referencing a conditionally compiled table. Now the code is also compiled conditionally. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-02-18 17:09:00 +01:00
Daniel Kang	7a03145ed7	x86: dsputil: int --> ptrdiff_t for ff_put_pixels16_mmxext line_size param This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-02-18 15:23:03 +01:00
Daniel Kang	b3f2a3fe3f	x86: mpeg4qpel: Make movsxifnidn do the right thing Fixes an instruction that does nothing by changing the source to dword. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-02-11 20:17:15 +01:00
Diego Biurrun	5d3d39c72e	dsputil: Move fdct function declarations to dct.h	2013-02-09 00:08:28 +01:00
Diego Biurrun	218aefce44	dsputil: Move LOCAL_ALIGNED macros to libavutil	2013-02-08 23:13:37 +01:00
Daniel Kang	a1d3673034	dsputil: x86: Fix compile error Accidentally prefixed ff_ with cextern. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-02-07 11:06:16 +02:00
Daniel Kang	659d4ba5af	dsputil: x86: Convert h263 loop filter to yasm Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-02-06 15:38:27 -08:00
Martin Storsjö	a846dccb29	h264chroma: x86: Fix building with yasm disabled Signed-off-by: Martin Storsjö <martin@martin.st>	2013-02-06 17:05:33 +02:00
Diego Biurrun	82bd04b170	rv34: Drop now unnecessary dsputil dependencies	2013-02-06 11:30:54 +01:00
Diego Biurrun	79dad2a932	dsputil: Separate h264chroma	2013-02-06 11:30:53 +01:00
Diego Biurrun	c9f933b5b6	Add av_cold attributes to arch-specific init functions	2013-02-05 17:01:05 +01:00
Diego Biurrun	25841dfe80	Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter. This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic.	2013-02-05 12:59:12 +01:00
Diego Biurrun	52acd79165	x86: hpel: Move {avg,put}_pixels16_sse2 to hpeldsp	2013-01-31 11:19:23 +01:00
Diego Biurrun	c59211b437	x86: Simplify some arch conditionals	2013-01-29 00:10:53 +01:00
Michael Niedermayer	834e9fb056	x86: hpeldsp: Fix a typo, use the right register This makes the code actually work. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-01-28 12:49:37 +02:00
Daniel Kang	05b0998f51	dsputil: Fix error by not using redzone and register name Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-28 07:23:20 +01:00
Daniel Kang	96753bd00d	dsputil: x86: Correct the number of registers used in put_no_rnd_pixels16_l2 put_no_rnd_pixels16_l2 allocated 5 instead of 6 registers. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-27 08:41:48 +01:00
Daniel Kang	0eedf5d74d	dsputil: add missing HAVE_YASM guard Fix compile error under "--disable-optimizations --disable-yasm --disable-inline-asm" Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-27 08:41:46 +01:00
Daniel Kang	71155d7b41	dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-27 06:45:31 +01:00
Ronald S. Bultje	f90ff772e7	Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm.	2013-01-26 20:35:42 -08:00
Diego Biurrun	033a86f9bb	x86: h264qpel: Move stray comment to the right spot and clarify it	2013-01-26 11:19:22 +01:00
Janne Grunau	c5c2060cf5	x86: h264qpel: add cpu flag checks for init function The code was copied from per cpu extension init function so the checks for supported extensions were overlooked.	2013-01-24 19:03:59 +01:00
Mans Rullgard	e9d817351b	dsputil: Separate h264 qpel The sh4 optimizations are removed, because the code is 100% identical to the C code, so it is unlikely to provide any real practical benefit. Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-24 10:44:43 +01:00
Ronald S. Bultje	baf35bb4bc	dsputil: remove one array dimension from avg_no_rnd_pixels_tab.	2013-01-22 18:41:36 -08:00
Ronald S. Bultje	32ff643228	dsputil: remove avg_no_rnd_pixels8. This is never used.	2013-01-22 18:41:36 -08:00
Diego Biurrun	88bd7fdc82	Drop DCTELEM typedef It does not help as an abstraction and adds dsputil dependencies. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2013-01-22 18:32:56 -08:00
Ronald S. Bultje	2e4bb99f4d	vorbisdsp: convert x86 simd functions from inline asm to yasm.	2013-01-22 18:02:24 -08:00
Ronald S. Bultje	d56668bd80	floatdsp: move scalarproduct_float from dsputil to avfloatdsp. This makes the aac decoder and all voice codecs independent of dsputil.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	42d3246948	floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. Now, nellymoserenc and aacenc no longer depends on dsputil. Independent of this patch, wmaprodec also does not depend on dsputil, so I removed it from there also.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	55aa03b9f8	floatdsp: move vector_fmul_add from dsputil to avfloatdsp.	2013-01-22 11:55:42 -08:00
Diego Biurrun	4f56e773fe	x86: ac3: Fix HAVE_MMXEXT condition to only refer to external assembly CC: libav-stable@libav.org	2013-01-21 23:54:32 +01:00
Daniel Kang	9f00b1cbab	dsputilenc: x86: Convert pixel inline asm to yasm Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-21 09:54:10 +01:00
Ronald S. Bultje	1768e43ceb	vorbisdsp: change block_size type from int to intptr_t. This saves one instruction in the x86-64 assembly.	2013-01-20 22:26:42 -08:00
Ronald S. Bultje	8a4f26206d	dsputil: remove butterflies_float_interleave. The function is unused.	2013-01-20 21:57:35 -08:00
Mans Rullgard	0b711ca3f3	dsputil: drop non-compliant "fast" qpel mc functions Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-20 14:50:42 +01:00
Ronald S. Bultje	fef906c77c	Move vorbis_inverse_coupling from dsputil to vorbisdspcontext. Conveniently (together with Justin's earlier patches), this makes our vorbis decoder entirely independent of dsputil.	2013-01-19 22:21:10 -08:00
Ronald S. Bultje	aeaf268e52	vp3: integrate clear_blocks with idct of previous block. This is identical to what e.g. vp8 does, and prevents the function call overhead (plus dependency on dsputil for this particular function). Arm asm updated by Janne Grunau <janne-libav@jannau.net>. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2013-01-19 22:04:55 -08:00
Diego Biurrun	822b0728f0	x86: dsputil: Drop some unused macro definitions	2013-01-18 22:24:58 +01:00
Justin Ruggles	e034cc6c60	lavc: Move vector_fmul_window to AVFloatDSPContext Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-16 10:45:45 +01:00
Diego Biurrun	dae1d507af	x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags	2013-01-15 17:29:43 +01:00
Diego Biurrun	51969a652c	x86: ABS2: port to cpuflags	2013-01-14 21:56:55 +01:00
Diego Biurrun	a0c5917f86	Drop Snow codec Snow is a toy codec with no real-world use and horrible code.	2013-01-06 16:30:02 +01:00
Christophe Gisquet	4f50646697	x86: sbrdsp: Implement SSE qmf_post_shuffle 255 to 174 cycles on Arrandale / Win64. Unrolling yields no gain. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-06 13:57:01 +01:00
Christophe Gisquet	44a0036d10	x86: sbrdsp: Implement SSE sum64x5 698 to 174 cycles on Arrandale. Unrolling is a 6 cycles gain. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-06 13:57:01 +01:00
Diego Biurrun	5b4dfbffc2	x86: ABS1: port to cpuflags	2013-01-06 13:57:01 +01:00
Ronald S. Bultje	8c53d39e7f	lavc: introduce VideoDSPContext Move some functions from dsputil. The idea is that videodsp contains functions that are useful for a large and varied set of video decoders. Currently, it contains emulated_edge_mc() and prefetch(). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2012-12-20 13:40:45 +01:00
Ronald S. Bultje	6f40e9f070	x86inc: support stack mem allocation and re-alignment in PROLOGUE Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2012-12-12 05:23:46 +01:00
Mans Rullgard	30b3916425	ac3dec: make downmix() take array of pointers to channel data	2012-12-09 15:52:01 +00:00
Christophe Gisquet	2aef3d66c9	SBR DSP x86: implement SSE sbr_hf_gen Start and end index are multiple of 2, therefore guaranteeing aligned access. Also, this allows to generate 4 floats per loop, keeping the alignment all along. Timing: 32 bits: 326c -> 172c; 64 bits: 323c -> 156c. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-12-07 11:04:26 +01:00
Diego Biurrun	9b15c0a9b3	x86: dsputilenc: port to cpuflags	2012-11-28 16:05:44 +01:00
Diego Biurrun	89145fbbfe	x86: h264dsp: Fix linking with yasm and optimizations disabled Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.	2012-11-28 14:45:28 +01:00
Diego Biurrun	2e89aeed65	x86: h264_idct: port to cpuflags	2012-11-28 00:28:09 +01:00
Diego Biurrun	28e1cf19aa	x86: h264_weight: port to cpuflags	2012-11-27 21:10:38 +01:00
Diego Biurrun	7ee4071362	x86: fix build without inline asm The qpel functions referenced here are not related to h264 and should thus never have been under CONFIG_H264QPEL. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-11-26 01:50:47 +01:00
Justin Ruggles	2d3993ce8c	x86: h264 qpel: use the correct number of utilized xmm regs in cglobal Fixes xmm register clobbering on win64.	2012-11-25 18:48:43 -05:00
Daniel Kang	610e00b359	x86: h264: Convert 8-bit QPEL inline assembly to YASM Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-11-25 20:38:35 +01:00
Daniel Kang	ad01ba6cea	x86: h264: Remove 3dnow QPEL code The only CPUs that have 3dnow and don't have mmxext are 12 years old. Moreover, AMD has dropped 3dnow extensions from newer CPUs. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-11-25 20:32:55 +01:00
Diego Biurrun	28c8e288fa	x86: h264_chromamc: port to cpuflags	2012-11-25 17:25:10 +01:00
Diego Biurrun	89923fce70	x86: h264_intrapred: Fix C function names in comments Function names changed after switching to declaration with PRED4x4/8x8/8x8L/16x16 macros in the C code.	2012-11-18 18:34:05 +01:00
Diego Biurrun	87af05c575	x86: SPLATD: port to cpuflags	2012-11-18 18:34:05 +01:00
Diego Biurrun	8c3849bc76	x86: dsputil: port to cpuflags	2012-11-16 10:38:23 +01:00
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	2012-11-14 00:58:51 +01:00
Diego Biurrun	5e9c6ef8f3	x86: h264_weight_10bit: port to cpuflags	2012-11-13 19:07:09 +01:00
Diego Biurrun	2b479bcab0	build: Drop AVX assembly ifdefs An assembler able to cope with AVX instructions is now required.	2012-11-11 20:43:28 +01:00
Diego Biurrun	6cd796049d	x86: h264_qpel_10bit: drop unused parameter from MC10/MC20/MC30 macros	2012-11-10 14:49:09 +01:00
Diego Biurrun	4b60fac419	x86: PALIGNR: port to cpuflags	2012-11-09 21:31:31 +01:00
Diego Biurrun	4d1f69f244	x86: h264_qpel_10bit: port to cpuflags	2012-11-09 21:17:05 +01:00
Diego Biurrun	6ca60d4ddd	x86: h264_intrapred: port to cpuflags	2012-11-08 18:05:23 +01:00
Diego Biurrun	930e26a3ea	x86: h264qpel: Only define mmxext QPEL functions if H264QPEL is enabled This fixes compilation with --disable-everything and components enabled.	2012-11-05 20:48:43 +01:00
Diego Biurrun	dbb37e7711	x86: PABSW: port to cpuflags	2012-11-05 14:51:10 +01:00
Diego Biurrun	6c104826bd	x86: vc1dsp: port to cpuflags	2012-11-05 14:51:10 +01:00
Diego Biurrun	0a7a94f2e5	x86: Refactor PSWAPD fallback implementations and port to cpuflags	2012-11-02 17:05:29 +01:00
Diego Biurrun	26f01bd106	x86: PMINUB: port to cpuflags	2012-11-02 15:38:15 +01:00
Diego Biurrun	9ce02e14f0	x86: ac3dsp: port to cpuflags	2012-11-02 15:24:50 +01:00
Diego Biurrun	c37322e68c	x86: Move optimization suffix to end of function names This simplifies cpuflags porting.	2012-10-31 18:21:55 +01:00
Diego Biurrun	fa8fcab1e0	x86: h264_chromamc_10bit: drop pointless PAVG %define It is only used in one place so there is no need for the abstraction.	2012-10-31 18:21:55 +01:00
Diego Biurrun	d8eda37080	x86: mmx2 ---> mmxext in function names	2012-10-31 17:53:57 +01:00
Diego Biurrun	be2c456e96	x86: fmtconvert: Refactor cvtps2pi emulation through cpuflags	2012-10-31 01:05:03 +01:00
Diego Biurrun	be923ed659	x86: fmtconvert: port to cpuflags	2012-10-31 01:05:03 +01:00
Diego Biurrun	588fafe7f3	x86: MMX2 ---> MMXEXT in macro names	2012-10-31 01:04:55 +01:00
Diego Biurrun	652f518594	x86: mmx2 ---> mmxext in comments and messages	2012-10-31 00:37:42 +01:00
Diego Biurrun	04581c8c77	x86: yasm: Use complete source path for macro helper %includes This is more consistent with the way we handle C #includes and it simplifies the build system.	2012-10-31 00:37:42 +01:00
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	2012-10-31 00:37:42 +01:00
Ronald S. Bultje	95c89da36e	Use ptrdiff_t instead of int for intra pred "stride" function parameter. This way, SIMD-optimized functions don't have to sign-extend their stride argument manually to be able to do pointer arithmetic.	2012-10-29 17:49:13 -07:00
Ronald S. Bultje	bad8e33dc9	x86: use PRED4x4/8x8/8x8L/16x16 macros to declare intrapred prototypes.	2012-10-29 17:48:23 -07:00
Ronald S. Bultje	c285edd06e	Remove usage of INIT_AVX in h264_intrapred_10bit.asm. Replace INIT_AVX by INIT_XMM avx. Port the whole file to use cpuflag based function declarations. Remove (now unused) cputype argument in function declaration macros. Change function prototypes to have mmx2 instead of mmxext as suffix, since that's required by cpuflags.	2012-10-29 14:10:51 -07:00
Luca Barbato	2d6caade22	dsputil: split out mlp dsp function	2012-10-11 12:01:08 +02:00
Janne Grunau	7e522859fc	x86: vc1: call ff_vc1dsp_init_x86() under if (ARCH_X86)	2012-10-08 11:54:05 +02:00
Janne Grunau	cb36febcbc	x86: cavs: call ff_cavsdsp_init_x86() under if (ARCH_X86)	2012-10-08 11:54:05 +02:00
Janne Grunau	f101eab1be	x86: call most of the x86 dsp init functions under if (ARCH_X86) Rename the called dsp init functions to *_init_x86.	2012-10-08 11:54:05 +02:00
Diego Biurrun	e4cbf7529b	Give all anonymously typedeffed structs in headers a name Anonymous structs cannot be forward declared and have no benefit.	2012-10-06 09:27:11 +02:00
Mans Rullgard	bcf07a15a0	x86: dsputil: kill VLA in gmc_mmx() Instead of using an evil VLA, fall back to C version when edge emulation is needed. MPEG4 GMC is a rarely used fringe feature so the speed loss is an acceptable cost for safer code. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-10-05 22:33:32 +01:00
Diego Biurrun	9c6cf7f2c9	avcodec: Drop silly and/or broken printf debug output	2012-10-01 10:24:28 +02:00
Michael Niedermayer	791b5954bc	dsputil_mmx: fix reading prior of the src array in sub_hfyu_median_prediction() This should fix the utvideoenc valgrind failure Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-09-28 12:25:07 -04:00
Diego Biurrun	58139e141b	x86: dsputil: Move Xvid IDCT put/add functions to a more suitable place	2012-09-14 01:59:47 +02:00
Diego Biurrun	2017f0fdb7	x86: Remove some leftover declarations for non-existent functions	2012-09-13 21:38:47 +02:00
Martin Storsjö	91ff4e83ca	x86: ac3dsp: Only refer to the ac3_downmix_sse symbol if it has been declared This fixes building without inline assembly. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-09-13 13:51:52 +03:00
Mans Rullgard	97cb9236cf	ac3: move ac3_downmix() from dsputil to ac3dsp Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-12 23:39:50 +01:00
Diego Biurrun	1648a508fa	x86: dsputil: Move specific optimization settings out of global init function They belong in the init functions specific to each CPU capability.	2012-09-11 10:12:17 +02:00
Diego Biurrun	a84edbacaf	x86: dsputil: Only compile motion_est code when encoders are enabled	2012-09-10 08:31:47 +02:00
Diego Biurrun	e0c6cce447	x86: Replace checks for CPU extensions and flags by convenience macros This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.	2012-09-08 18:18:34 +02:00
Hendrik Leppkes	fb4e983e0c	x86: mlpdsp: mlp_filter_channel_x86 requires inline asm Signed-off-by: Martin Storsjö <martin@martin.st>	2012-09-08 15:41:44 +03:00
Diego Biurrun	1169f0d0af	x86: more specific checks for availability of required assembly capabilities	2012-09-07 18:16:04 +02:00
Diego Biurrun	8cb7ed5562	x86: avcodec: Drop silly "_mmx" suffix from dsputil template names	2012-09-07 13:50:52 +02:00
Mans Rullgard	6efb698883	cavsdsp: set idct permutation independently of dsputil CAVS uses its own idct so using dsputil to set the permutation is fragile. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-07 11:42:35 +01:00
Mans Rullgard	5fe64d88f6	x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov For some reason add_hfyu_median_prediction_cmov is only selected on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions. This patch allows it to be selected on any cpu with cmov with the possibility of being overridden by the mmxext version. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-07 11:42:35 +01:00
Diego Biurrun	ef6ba1f237	x86: dsputil: Do not redundantly check for CPU caps before calling init funcs The init functions check for CPU capabilities on their own already.	2012-09-06 09:05:52 +02:00
Hendrik Leppkes	d914ea6fd8	x86: vp56: cmov version of vp56_rac_get_prob requires inline asm Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-09-05 21:30:46 +02:00
Diego Biurrun	a84ac7a860	x86: h264dsp: drop some unnecessary ifdefs around prototype declarations	2012-09-04 01:44:59 +02:00
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	2012-08-31 01:53:25 +02:00
Diego Biurrun	ec36aa6944	x86: Fix linking with some or all of yasm, mmx, optimizations disabled Some optimized template functions reference optimized symbols, so they must be explicitly disabled when those symbols are unavailable.	2012-08-30 19:37:32 +02:00
Diego Biurrun	a886b279a0	x86: cosmetics: Comment some #endifs for better readability	2012-08-30 18:50:33 +02:00
Diego Biurrun	2e6f93a284	x86: Always compile files with functions that are called unconditionally	2012-08-29 00:27:06 +02:00
Diego Biurrun	2f2aa2e542	x86: mpegvideoenc: fix linking with --disable-mmx The optimized dct_quantize template functions reference optimized fdct symbols, so these functions must only be enabled if the relevant optimizations have been enabled by configure.	2012-08-29 00:26:56 +02:00
Diego Biurrun	d39791bf39	x86: mpegvideoenc: Do not abuse HAVE_ variables for template instantiation This avoids trouble if HAVE_ variables are used elsewhere in the file.	2012-08-29 00:14:52 +02:00
Diego Biurrun	bcc45d6348	x86: avcodec: Drop silly "_mmx" suffixes from filenames	2012-08-28 18:37:34 +02:00
Diego Biurrun	efbd04c332	x86: avcodec: Drop silly "_sse" suffixes from filenames	2012-08-28 18:37:33 +02:00
Diego Biurrun	3f02c533f3	build: fft: x86: Drop unused YASM-OBJS-FFT- variable	2012-08-27 03:10:58 +02:00
Mans Rullgard	db70730291	x86: fft: remove unused fft_dispatch* functions These functions are not used since the yasm conversion. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-25 23:58:26 +01:00
Diego Biurrun	dc40285427	x86: mpegvideo: more sensible names for optimization file and init function	2012-08-24 02:23:16 +02:00
Diego Biurrun	d211547ddd	x86: mpegvideoenc: Split optimizations off into a separate file	2012-08-24 02:23:16 +02:00
Diego Biurrun	26ce9aec03	dnxhdenc: x86: more sensible names for optimization file and init function	2012-08-24 02:23:15 +02:00
Diego Biurrun	6fa488678f	build: x86: Only compile mpegvideo optimizations when necessary	2012-08-22 01:06:33 +02:00
Diego Biurrun	6961bdface	x86: avcodec: Consistently name all init files	2012-08-16 11:05:38 +02:00
Martin Storsjö	1d9c2dc89a	Don't include common.h from avutil.h Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-15 22:32:06 +03:00
Diego Biurrun	29cfdd3767	x86: avcodec: Appropriately name files containing only init functions	2012-08-15 03:24:08 +02:00
Diego Biurrun	be12958937	mpegvideo_mmx_template: drop some commented-out cruft	2012-08-15 03:24:07 +02:00
Mans Rullgard	8ec0204ee4	x86: cabac: allow building with suncc This fixes two issues preventing suncc from building this code. The undocumented 'a' operand modifier, causing gcc to omit a $ in front of immediate operands (as required in addresses), is not supported by suncc. Luckily, the also undocumented 'c' modifier has the same effect and is supported. On some asm statements with a large number of operands, suncc for no obvious reason fails to correctly substitute some of the operands. Fortunately, some of the operands in these statements are plain numbers which can be inserted directly into the code block instead of passed as operands. With these changes, the code builds correctly with both gcc and suncc. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-13 14:51:52 +01:00
Mans Rullgard	c8252e80eb	x86: mlpdsp: avoid taking address of void This code contains a C array of addresses of labels defined in inline asm. To do this, the names must be declared as external in C. The declared type does not matter since only the address is used, and for some reason, the author of the code used the 'void' type despite taking the address of a void expression being invalid. Changing the type to char, a reasonable choice since the alignment of the code labels cannot be known or guaranteed, eliminates gcc warnings and allows building with suncc. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-13 14:51:52 +01:00
Diego Biurrun	3b9e832e17	x86: Drop silly "_yasm" suffixes from filenames	2012-08-12 17:13:05 +02:00
Mans Rullgard	d7a4f8f8b9	Move MASK_ABS macro to libavcodec/mathops.h This macro is only used in two places, both in libavcodec, so this is a more sensible place for it. Two small tweaks to the macro are made: removing the trailing semicolon, and dropping unnecessary 'volatile' from the x86 asm. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Mans Rullgard	c318626ce2	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Dave Yeo	197439c1ef	x86: pngdsp: Fix assembly for OS/2 The a.out object format does not allow aligning sections. On OS/2 LD aligns sections to 16 bytes. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-08-08 15:45:09 +02:00
Mans Rullgard	2b140a3d09	x86: use 32-bit source registers with movd instruction yasm tolerates mismatch between movd/movq and source register size, adjusting the instruction according to the register. nasm is more strict. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:21:20 +01:00
Mans Rullgard	a3df4781f4	x86: add colons after labels nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:20:56 +01:00
Anton Khirnov	36ef5369ee	Replace all CODEC_ID_* with AV_CODEC_ID_*	2012-08-07 16:00:24 +02:00
Diego Biurrun	2096857551	x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2	2012-08-05 21:40:49 +02:00
Ronald S. Bultje	4a8143e73c	fft: 3dnow: fix register name typo in DECL_IMDCT macro Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-08-04 00:16:02 +02:00
Diego Biurrun	0c3ff1982c	x86: dct32: port to cpuflags	2012-08-03 22:51:06 +02:00
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	2012-08-03 22:51:05 +02:00
Ronald S. Bultje	da6505ad2f	dsputil: make add_hfyu_left_prediction_sse4() support unaligned src. This makes add_hfyu_left_prediction_sse4() handle sources that are not 16-byte aligned in its own function rather than by proxying the call to add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs) does not restore it, thus leading to XMM clobbering and RSP being off. Fixes bug 342.	2012-08-03 11:09:14 -07:00
Diego Biurrun	ca844b7be9	x86: Use consistent 3dnowext function and macro name suffixes Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.	2012-08-03 14:00:47 +02:00
Diego Biurrun	03737412a3	x86: proresdsp: improve SIGNEXTEND macro comments	2012-08-02 22:30:44 +02:00
Diego Biurrun	81905088a1	x86: h264dsp: K&R formatting cosmetics	2012-08-02 20:20:21 +02:00
Ronald S. Bultje	c728518b3c	x86: fft: fix imdct_half() for AVX Some calculations were changed in `b6a3849` to use mmsize, which was not correct for the AVX version, which uses INIT_YMM and therefore has mmsize == 32. Fixes Bug 341. Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-08-02 13:40:11 -04:00
Mans Rullgard	ec7c501ed5	x86: remove libmpeg2 mmx(ext) idct functions These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-02 12:14:52 +01:00
Ronald S. Bultje	b6a3849adb	fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64. 64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.	2012-07-31 21:20:47 -07:00
Ronald S. Bultje	53dfaedc01	x86/dsputilenc: bury inline asm under HAVE_INLINE_ASM.	2012-07-31 20:28:52 -07:00
Diego Biurrun	6376a3ad24	x86: h264dsp: Remove unused variable ff_pb_3_1	2012-08-01 00:17:16 +02:00
Diego Biurrun	8728b381cb	x86: h264dsp: Adjust YASM #ifdefs This fixes compilation with YASM disabled.	2012-07-31 13:54:07 +02:00
Ronald S. Bultje	b829b4ce29	h264: convert loop filter strength dsp function to yasm. This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.	2012-07-30 19:39:47 -07:00
Ronald S. Bultje	c83f44dba1	h264_idct_10bit: port x86 assembly to cpuflags.	2012-07-28 08:29:45 -07:00
Ronald S. Bultje	b3c5ae5607	fft: rename "z" to "zc" to prevent name collision. Without this, cglobal will expand "z" to "zh" to access the high byte in a register's word, which causes a name collision with the ZH(x) macro further up in this file.	2012-07-28 08:29:44 -07:00
Ronald S. Bultje	4d777eedfd	vp3: don't compile mmx IDCT functions on x86-64. 64-bit CPUs always have SSE2, and a SSE2 version exists, thus the MMX version will never be used.	2012-07-27 20:12:30 -07:00
Ronald S. Bultje	a5bbb1242c	h264_loopfilter: port x86 simd to cpuflags.	2012-07-27 20:12:11 -07:00
Ronald S. Bultje	d07ff3cd5a	h264_chromamc_10bit: port x86 simd to cpuflags.	2012-07-27 17:35:49 -07:00
Ronald S. Bultje	4a26fdd852	vp3: port x86 SIMD to cpuflags.	2012-07-27 17:35:49 -07:00
Ronald S. Bultje	76888c64b0	rv34: port x86 SIMD to cpuflags.	2012-07-27 15:13:26 -07:00
Ronald S. Bultje	158744a4cd	vp56: only compile MMX SIMD on x86-32. All x86-64 CPUs have SSE2, so the MMX version will never be used. This leads to smaller binaries.	2012-07-27 14:40:27 -07:00
Ronald S. Bultje	2734ba787b	vp56: port x86 simd to cpuflags.	2012-07-27 14:39:07 -07:00
Ronald S. Bultje	5361e10a5e	proresdsp: port x86 assembly to cpuflags.	2012-07-27 11:43:06 -07:00
Ronald S. Bultje	bde73f28af	mpegaudio: bury inline asm under HAVE_INLINE_ASM.	2012-07-26 13:43:16 -07:00
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	2012-07-26 13:43:16 -07:00
Ronald S. Bultje	a1878a88a1	vp3: don't use calls to inline asm in yasm code. Mixing yasm and inline asm is a bad idea, since if either yasm or inline asm is not supported by your toolchain, all of the asm stops working. Thus, better to use either one or the other alone. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:30 -04:00
Ronald S. Bultje	79195ce565	x86/dsputil: put inline asm under HAVE_INLINE_ASM. This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:27 -04:00
Yang Wang	845e92fd6a	dsputil_mmx: fix incorrect assembly code In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t", and so forth, have problems. From the above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edi)". That is, it will stride by 810000 instead of 8. This will cause a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: This error is exposed when we build using Intel C++ Compiler, with IPO+PGO optimization enabled. Crashed when decoding an MJPEG video. Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:22:18 -04:00
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	2012-07-22 16:56:58 -04:00
Diego Biurrun	9f97af2688	x86: dsputil: drop some unused CPU flag debug code	2012-07-19 10:17:56 +02:00
Mans Rullgard	28f9ab7029	vp3: move idct and loop filter pointers to new vp3dsp context This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:19 +01:00
Mans Rullgard	ab9f987661	build: add CONFIG_VP3DSP, reduce repetition in OBJS lists Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:18 +01:00
Martin Storsjö	f27386cdc7	x86: h264_intrapred: Don't add the 'd' suffix to the SPLATB_REG macro The SPLATB_REG macro already adds the 'd' suffix internally. This fixes building on Win64, which has been broken since `878e66902`. This worked for unix, where r2 happened to be rdx in this case, which with the first suffix rdxd was mapped to eax, and eaxd is defined back to eax. On win64 however, r2 happened to be R8 in this case, and R8d maps to R8D just fine, but there's no mapping for R8Dd to anything. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-06 21:07:23 +03:00
Diego Biurrun	878e669029	x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros	2012-07-05 17:37:11 +02:00
Loren Merritt	4d4752366f	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-07-05 17:37:11 +02:00
Diego Biurrun	d20f133ef9	x86: h264_intrapred: port to cpuflag macros	2012-07-05 17:37:10 +02:00
Martin Storsjö	07eeeb1d4f	vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too This was missed in the previous commit in `70a1c800`. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-05 09:39:01 +03:00
Martin Storsjö	70a1c8000f	vp8: loopfilter >=sse2 functions need aligned stack on x86-32. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-07-04 08:25:50 -07:00
Ronald S. Bultje	723b266d72	dsputilenc: group yasm and inline asm function pointer assignment.	2012-07-04 07:46:27 -07:00
Ronald S. Bultje	ceabc13f12	dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section.	2012-06-30 09:24:52 -07:00
Ronald S. Bultje	66a02159ea	x86: fmtconvert: add special asm for float_to_int16_interleave_misc_* This gets rid of a variable-length array and a for loop in C code. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-06-30 19:10:36 +03:00
Mans Rullgard	f2fd167835	x86: vc1: fix and enable optimised loop filter The problem is that the ssse3 psign instruction does the wrong thing here. Commit `ea60dfe` incorrectly removed a macro emulating this instruction for pre-ssse3 code. However, the emulation is incorrect, and the code relies on the behaviour of the macro. Specifically, the psign sets destination elements to zero where the corresponding source element is zero, whereas the emulation only negates destination elements where the source is negative. Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus, which is why the original VC-1 code had an additional right shift when using it. Since the psign instruction cannot be used here, skip all the macro hell and use the working instruction sequence directly. None of this was noticed due to a stray return statement in ff_vc1dsp_init_mmx() which meant that only the mmx version of the loop filter was ever used (before being removed in `ea60dfe`). Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-30 00:12:05 +01:00
Christophe Gisquet	a5bfa66df5	x86: fft: replace call to memcpy by a loop The function call was a mess to handle, and memcpy cannot make the assumptions we do in the new code. Tested on an IMC sample: 430c -> 370c. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-27 12:49:33 +01:00
Mans Rullgard	0595334892	x86: fft: elf64: fix PIC build In a 64-bit PIC build, external functions must be called through the PLT. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 22:58:18 +01:00
Mans Rullgard	8725da49a2	x86: fft: win64: fix stack alignment for memcpy() call	2012-06-25 15:10:39 +01:00
Mans Rullgard	8299260470	x86: fft: convert sse inline asm to yasm	2012-06-25 13:31:00 +01:00
Ronald S. Bultje	8123e0901f	x86: place some inline asm under #if HAVE_INLINE_ASM Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 13:23:12 +01:00
Mans Rullgard	0b6f973635	h264: use asm cabac reader under a generic condition This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for other architectures than x86. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 22:14:21 +01:00
Diego Biurrun	fe07c9c6b5	x86: Only use optimizations with cmov if the CPU supports the instruction	2012-06-23 16:21:50 +02:00
Mans Rullgard	29686d6ea3	x86: remove unused inline asm macros from dsputil_mmx.h Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Mans Rullgard	685f5438bb	x86: move some inline asm macros to the only places they are used Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Diego Biurrun	a5a93fa8f5	cosmetics: do not use full path for local headers	2012-06-22 10:49:40 +02:00
Ronald S. Bultje	d9669eab0b	dwt: remove variable-length arrays Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-17 23:20:10 +01:00
Justin Ruggles	d5a7229ba4	Add a float DSP framework to libavutil Move vector_fmul() from DSPContext to AVFloatDSPContext.	2012-06-08 13:14:38 -04:00
Vitor Sessak	bac0729d9e	x86: use new schema for ASM macros Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-05-29 14:49:45 +02:00
Justin Ruggles	713548cbad	x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code. This is needed for older versions of yasm/nasm that do not support AVX. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-22 20:46:02 +02:00
Kieran Kunhya	5ff01259a8	Convert vector_fmul range of functions to YASM and add AVX versions Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-05-21 17:13:05 -04:00
Michael Kostylev	6797d1948b	x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.	2012-05-15 23:54:08 +02:00
Justin Ruggles	95a98ab3f0	ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16 Simplifies the code by using cpuflags and a new macro. Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only function.	2012-05-15 15:23:59 -04:00
Vitor Sessak	fcc456b829	x86: use more standard construct for setting ASM functions in FFT code Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-14 15:38:42 +02:00
Michael Kostylev	ea60dfe284	x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions.	2012-05-12 14:02:45 +02:00
Christophe Gisquet	110d0cdc9d	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC Code mostly inspired by vp8's MC, however: (1) its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy; (2) that same coefficient redundancy allows better code for non-SSSE3 versions. Benchmark (rounded to tens of units), columns V8x8 / H8x8 / 2D8x8 / V16x16 / H16x16 / 2D16x16: C 445 / 358 / 985 / 1785 / 1559 / 3280; MMX* 219 / 271 / 478 / 714 / 929 / 1443; SSE2 131 / 158 / 294 / 425 / 515 / 892; SSSE3 120 / 122 / 248 / 387 / 390 / 763. End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-10 18:42:43 +02:00
Ronald S. Bultje	bec207f9f9	snowdsp: explicitly state instruction size. Fixes a compile error with clang at -O0.	2012-05-02 09:57:12 -07:00
Christophe GISQUET	e75d1d4f73	dsputil x86: revert a test back to its previous value Commit `356ee8d` caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 11:00:51 -07:00
Christophe Gisquet	fe5ed69dc7	rv34dsp x86: implement MMX2 inverse transform 141 cycles down to 51. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 10:58:47 -07:00
Roland Scheidegger	9b9df1cdff	h264: new assembly version of get_cabac for x86_64 with PIC This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 09:43:25 -07:00
Roland Scheidegger	14e9ffc1e4	h264: use one table instead of several for cabac functions The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 08:26:12 -07:00
Roland Scheidegger	444f47b55c	h264: (trivial) remove unneeded macro argument in x86/cabac.h Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 08:24:56 -07:00
Mans Rullgard	2bcbd98459	Remove lowres video decoding This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:56:19 +01:00
Mans Rullgard	95510be8c3	avcodec: remove AVCodecContext.dsp_mask This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:30:01 +01:00
Ronald S. Bultje	87a246341b	h264: use proper PROLOGUE statement for a function using 8 registers. Fixes crashes when using biweight on win64.	2012-04-16 08:07:21 -07:00
Ronald S. Bultje	b089ca871a	dsputil: fix optimized emu_edge function on Win64. Recent register allocation changes (x86inc.asm update) changed the register order and thus opcodes for the inner loops. One of them became >128bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.	2012-04-13 11:28:30 -07:00
Justin Ruggles	de7f22ab0c	ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-12 21:33:04 -07:00
Ronald S. Bultje	76538d7a78	h264: fix 10bit biweight functions after recent x86inc.asm fixes. This should have been updated in the x86inc.asm update, but was accidentally forgotten.	2012-04-12 21:13:57 -07:00
Diego Biurrun	7bb3a302fe	build: Consistently handle conditional compilation for all optimization OBJS.	2012-04-12 09:00:49 +02:00
Henrik Gramner	729f90e268	x86inc improvements for 64-bit: Add support for all x86-64 registers; Prefer caller-saved register over callee-saved on WIN64; Support up to 15 function arguments. Also (by Ronald S. Bultje): Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-04-11 15:47:00 -04:00
Christophe GISQUET	2130bd8f5b	rv40dsp x86: use only one register, for both increment and loop counter Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:07:09 -07:00
Christophe GISQUET	272b252c01	rv40dsp: implement prescaled versions for biweight. Quite often, the original weights are multiple of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:06:48 -07:00
Christophe GISQUET	6b81da2fd0	dsputil x86: use SSE float instruction instead of SSE2 integer equivalent All the more required since the users are pure SSE functions. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:27 -07:00
Christophe GISQUET	cd88105f6f	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:08 -07:00
Christophe GISQUET	f9888520cc	vp8dsp x86: perform rounding shift with a single instruction Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:23:36 -07:00
Ronald S. Bultje	a940198130	cabac: add overread protection to BRANCHLESS_GET_CABAC(). Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	448dc42571	cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().	2012-03-28 08:01:29 -07:00
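
Note on commit `f2fd167835` (x86: vc1: fix and enable optimised loop filter): the message contrasts the behaviour of the SSSE3 psign instruction with an emulation that only negates. The difference can be illustrated with a minimal scalar sketch. This is not the FFmpeg code; the function names `psign` and `negate_only_emulation` are hypothetical, elements are modelled as plain Python integers, and 16-bit wraparound is ignored.

```python
def psign(dest, src):
    """Per-element model of psign as described in the commit message:
    negate where src < 0, zero where src == 0, keep otherwise."""
    out = []
    for d, s in zip(dest, src):
        if s < 0:
            out.append(-d)
        elif s == 0:
            out.append(0)
        else:
            out.append(d)
    return out


def negate_only_emulation(dest, src):
    """Model of the emulation described in the commit message:
    it only negates where src < 0 and never zeroes an element."""
    return [-d if s < 0 else d for d, s in zip(dest, src)]


dest = [5, 5, 5]
src = [-1, 0, 1]
print(psign(dest, src))                  # [-5, 0, 5]
print(negate_only_emulation(dest, src))  # [-5, 5, 5]
```

The two models disagree only where a source element is zero, which is the case the commit message singles out.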

946 Commits