ffmpeg

Author	SHA1	Message	Date
Ronald S. Bultje	55aa03b9f8	floatdsp: move vector_fmul_add from dsputil to avfloatdsp.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	8a4f26206d	dsputil: remove butterflies_float_interleave. The function is unused.	2013-01-20 21:57:35 -08:00
Mans Rullgard	0b711ca3f3	dsputil: drop non-compliant "fast" qpel mc functions Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-20 14:50:42 +01:00
Ronald S. Bultje	fef906c77c	Move vorbis_inverse_coupling from dsputil to vorbisdspcontext. Conveniently (together with Justin's earlier patches), this makes our vorbis decoder entirely independent of dsputil.	2013-01-19 22:21:10 -08:00
Diego Biurrun	822b0728f0	x86: dsputil: Drop some unused macro definitions	2013-01-18 22:24:58 +01:00
Justin Ruggles	e034cc6c60	lavc: Move vector_fmul_window to AVFloatDSPContext Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-16 10:45:45 +01:00
Ronald S. Bultje	8c53d39e7f	lavc: introduce VideoDSPContext Move some functions from dsputil. The idea is that videodsp contains functions that are useful for a large and varied set of video decoders. Currently, it contains emulated_edge_mc() and prefetch(). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2012-12-20 13:40:45 +01:00
Diego Biurrun	7ee4071362	x86: fix build without inline asm The qpel functions referenced here are not related to h264 and should thus never have been under CONFIG_H264QPEL. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-11-26 01:50:47 +01:00
Daniel Kang	610e00b359	x86: h264: Convert 8-bit QPEL inline assembly to YASM Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-11-25 20:38:35 +01:00
Daniel Kang	ad01ba6cea	x86: h264: Remove 3dnow QPEL code The only CPUs that have 3dnow and don't have mmxext are 12 years old. Moreover, AMD has dropped 3dnow extensions from newer CPUs. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-11-25 20:32:55 +01:00
Diego Biurrun	8c3849bc76	x86: dsputil: port to cpuflags	2012-11-16 10:38:23 +01:00
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	2012-11-14 00:58:51 +01:00
Diego Biurrun	c37322e68c	x86: Move optimization suffix to end of function names This simplifies cpuflags porting.	2012-10-31 18:21:55 +01:00
Diego Biurrun	d8eda37080	x86: mmx2 ---> mmxext in function names	2012-10-31 17:53:57 +01:00
Diego Biurrun	588fafe7f3	x86: MMX2 ---> MMXEXT in macro names	2012-10-31 01:04:55 +01:00
Diego Biurrun	652f518594	x86: mmx2 ---> mmxext in comments and messages	2012-10-31 00:37:42 +01:00
Mans Rullgard	bcf07a15a0	x86: dsputil: kill VLA in gmc_mmx() Instead of using an evil VLA, fall back to C version when edge emulation is needed. MPEG4 GMC is a rarely used fringe feature so the speed loss is an acceptable cost for safer code. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-10-05 22:33:32 +01:00
Diego Biurrun	58139e141b	x86: dsputil: Move Xvid IDCT put/add functions to a more suitable place	2012-09-14 01:59:47 +02:00
Mans Rullgard	97cb9236cf	ac3: move ac3_downmix() from dsputil to ac3dsp Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-12 23:39:50 +01:00
Diego Biurrun	1648a508fa	x86: dsputil: Move specific optimization settings out of global init function They belong in the init functions specific to each CPU capability.	2012-09-11 10:12:17 +02:00
Diego Biurrun	8cb7ed5562	x86: avcodec: Drop silly "_mmx" suffix from dsputil template names	2012-09-07 13:50:52 +02:00
Mans Rullgard	6efb698883	cavsdsp: set idct permutation independently of dsputil CAVS uses its own idct so using dsputil to set the permutation is fragile. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-07 11:42:35 +01:00
Mans Rullgard	5fe64d88f6	x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov For some reason add_hfyu_median_prediction_cmov is only selected on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions. This patch allows it to be selected on any cpu with cmov with the possibility of being overridden by the mmxext version. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-09-07 11:42:35 +01:00
Diego Biurrun	ef6ba1f237	x86: dsputil: Do not redundantly check for CPU caps before calling init funcs The init functions check for CPU capabilities on their own already.	2012-09-06 09:05:52 +02:00
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	2012-08-31 01:53:25 +02:00
Diego Biurrun	a886b279a0	x86: cosmetics: Comment some #endifs for better readability	2012-08-30 18:50:33 +02:00
Diego Biurrun	bcc45d6348	x86: avcodec: Drop silly "_mmx" suffixes from filenames	2012-08-28 18:37:34 +02:00
Mans Rullgard	c318626ce2	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Anton Khirnov	36ef5369ee	Replace all CODEC_ID_* with AV_CODEC_ID_*	2012-08-07 16:00:24 +02:00
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	2012-08-03 22:51:05 +02:00
Diego Biurrun	ca844b7be9	x86: Use consistent 3dnowext function and macro name suffixes Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.	2012-08-03 14:00:47 +02:00
Mans Rullgard	ec7c501ed5	x86: remove libmpeg2 mmx(ext) idct functions These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-02 12:14:52 +01:00
Ronald S. Bultje	d07ff3cd5a	h264_chromamc_10bit: port x86 simd to cpuflags.	2012-07-27 17:35:49 -07:00
Ronald S. Bultje	79195ce565	x86/dsputil: put inline asm under HAVE_INLINE_ASM. This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:27 -04:00
Yang Wang	845e92fd6a	dsputil_mmx: fix incorrect assembly code In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t", and so forth, have problems. From above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edx)". That is, it will stride by 810000 instead of 8. This will cause a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: This error is exposed when we build using Intel C++ Compiler, with IPO+PGO optimization enabled. Crashed when decoding an MJPEG video. Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:22:18 -04:00
Diego Biurrun	9f97af2688	x86: dsputil: drop some unused CPU flag debug code	2012-07-19 10:17:56 +02:00
Mans Rullgard	28f9ab7029	vp3: move idct and loop filter pointers to new vp3dsp context This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:19 +01:00
Diego Biurrun	fe07c9c6b5	x86: Only use optimizations with cmov if the CPU supports the instruction	2012-06-23 16:21:50 +02:00
Mans Rullgard	685f5438bb	x86: move some inline asm macros to the only places they are used Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Justin Ruggles	d5a7229ba4	Add a float DSP framework to libavutil Move vector_fmul() from DSPContext to AVFloatDSPContext.	2012-06-08 13:14:38 -04:00
Kieran Kunhya	5ff01259a8	Convert vector_fmul range of functions to YASM and add AVX versions Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-05-21 17:13:05 -04:00
Christophe Gisquet	110d0cdc9d	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC Code mostly inspired by vp8's MC, however: - its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy - that same coefficient redundancy allows better code for non-SSSE3 versions Benchmark (rounded to tens of unit): V8x8 H8x8 2D8x8 V16x16 H16x16 2D16x16 C 445 358 985 1785 1559 3280 MMX* 219 271 478 714 929 1443 SSE2 131 158 294 425 515 892 SSSE3 120 122 248 387 390 763 End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-10 18:42:43 +02:00
Christophe GISQUET	e75d1d4f73	dsputil x86: revert a test back to its previous value Commit `356ee8d` caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 11:00:51 -07:00
Mans Rullgard	2bcbd98459	Remove lowres video decoding This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:56:19 +01:00
Mans Rullgard	95510be8c3	avcodec: remove AVCodecContext.dsp_mask This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:30:01 +01:00
Christophe GISQUET	cd88105f6f	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:08 -07:00
Diego Biurrun	62ce9defb8	x86: dsputil: prettyprint gcc inline asm	2012-03-25 11:50:48 +02:00
Diego Biurrun	3b54912113	x86: K&R prettyprinting cosmetics for dsputil_mmx.c	2012-03-25 11:50:48 +02:00
Diego Biurrun	915a2a0a65	x86: conditionally compile H.264 QPEL optimizations	2012-03-25 11:50:45 +02:00
Diego Biurrun	3816642eab	dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks. This makes them safe to use in non-fully braced if-blocks and similar.	2012-03-25 11:48:37 +02:00
Mans Rullgard	356ee8d7de	x86: clean up ff_dsputil_init_mmx() This splits ff_dsputil_init_mmx() into multiple functions, one for each MMX/SSE level, somewhat simplifying the nested conditions. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-03-05 14:40:03 +01:00
Martin Storsjö	9cf0841ef3	dsputil: Add ff_ prefix to the dsputil_init functions Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:06:34 +02:00
Christophe Gisquet	6b03900382	x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf While pshufb allows emulating bswap on XMM registers for SSSE3, more shuffling is needed for SSE2. Alignment is critical, so specific codepaths are provided for this case. For the huffyuv sequence "angels_480-huffyuvcompress.avi": C (using bswap instruction): ~ 55k cycles SSE2: ~ 40k cycles SSSE3 using unaligned loads: ~ 35k cycles SSSE3 using aligned loads: ~ 30k cycles Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-01-30 10:19:55 +01:00
Ronald S. Bultje	e92003514d	png: move DSP functions to their own DSP context.	2012-01-29 08:11:18 -08:00
Ronald S. Bultje	c3af52fa8b	dsputil: use vertical component for drawing bottom edge. Current code only writes 8 pixels of vertical edge for YUV422, which causes MC artifacts when subsequent frames use data from that edge.	2012-01-25 18:06:36 +08:00
Diego Biurrun	88b9735753	build: conditionally compile x86 H.264 chroma optimizations	2011-12-14 11:58:45 +01:00
Justin Ruggles	395f2e70dd	dsputil: use movups instead of movdqu in ff_emu_edge_core_sse() This allows emulated_edge_mc_sse() and gmc_sse() to be used under AV_CPU_FLAG_SSE.	2011-11-22 15:40:51 -05:00
Justin Ruggles	9d06037d48	twinvq: add SSE/AVX optimized sum/difference stereo interleaving	2011-11-11 14:13:58 -05:00
Justin Ruggles	b8f02f5b4e	dsputil: use cpuflags in x86 versions of vector_clip_int32()	2011-11-06 20:50:06 -05:00
Daniel Kang	ded3e9f054	H.264: Cometics to dsputil_mmx.c Add whitespace. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-10-26 06:41:32 -07:00
Ronald S. Bultje	e3f530feca	prores: idct sse2/sse4 optimizations. ~3.0-3.5x as fast as original C version, 1.6x as fast overall.	2011-10-11 07:50:48 -07:00
Alex Converse	48f7163f13	dsputil_mmx: Honor HAVE_AMD3DNOW	2011-08-15 11:20:08 -07:00
Kostya Shishkov	d241f51e0f	Move RV3/4-specific DSP functions into their own context Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-08-11 16:07:15 -07:00
Jason Garrett-Glaser	a3bf7b864a	H.264: tweak some other x86 asm for Atom	2011-07-29 12:24:15 -07:00
Mans Rullgard	a617c6aaa3	dsputil: update per-arch init funcs for non-h264 high bit depth Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-21 18:10:58 +01:00
Mans Rullgard	e7a972e113	simple_idct: add 10-bit version Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-20 17:49:48 +01:00
Diego Biurrun	65083b4911	dsputil: remove disabled code	2011-07-18 11:48:35 +02:00
Mans Rullgard	710b8df949	dsputil: remove ff_emulated_edge_mc macro used in one place This macro can cause problems in conjunction with the bitdepth template expansion. It was presumably added to keep source compatibility when high bitdepth support was added. However, emulated_edge_mc is a dsputil pointer and should not be called directly, so there is little reason to keep such a macro. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-07-10 17:55:58 +01:00
Daniel Kang	c0483d0c7a	H.264: Add x86 assembly for 10-bit H.264 predict functions Mainly ported from 8-bit H.264 predict. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-08 15:59:29 -07:00
Daniel Kang	3c7c16fde3	YASM: Shut up unused variable compiler warning with --disable-yasm. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-07-04 18:49:09 +02:00
Daniel Kang	58f7aad051	Fix build with --disable-yasm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-03 22:56:09 -07:00
Daniel Kang	9bfa5363da	H.264: Add x86 assembly for 10-bit H.264 qpel functions. Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-03 07:43:38 -07:00
Justin Ruggles	6054cd25b4	ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.	2011-07-01 13:02:11 -04:00
Diego Biurrun	d2ee495fb2	configure: Drop check for availability of ten assembler operands. This was done to support gcc 2.95, which is an old legacy compiler that fails to compile the current codebase anyway.	2011-06-28 13:14:37 +02:00
Ronald S. Bultje	ed63f527f2	Fix build if yasm is not available.	2011-06-18 08:34:14 -04:00
Daniel Kang	f188a1e0ca	H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions. Mainly ported from 8-bit H.264 MC Chroma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-06-18 07:52:19 -04:00
Jason Garrett-Glaser	c90b94424c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser	504811baea	Roll back 4:4:4 H.264 for now Needs some ARM/PPC asm modifications.	2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser	c9c493872c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	2011-06-13 12:21:39 -07:00
Jason Garrett-Glaser	9f3d6ca4f1	Port x86 10-bit H.264 deblock asm from x264	2011-05-10 20:02:15 -07:00
Oskar Arvidsson	19a0729b4c	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init chose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth is not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-05-10 07:24:36 -04:00
Alexander Strange	1500be13f2	dsputil: allow to skip drawing of top/bottom edges.	2011-03-26 17:45:38 -04:00
Justin Ruggles	e6e9823488	Add apply_window_int16() to DSPContext with x86-optimized versions and use it in the ac3_fixed encoder.	2011-03-22 21:08:30 -04:00
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-19 13:33:20 +00:00
Ronald S. Bultje	bf6fa73245	dsputil_mmx.c: remove ff_vector128. Remove ff_vector128, it is identical to ff_pb_80.	2011-02-19 10:51:15 -05:00
Ronald S. Bultje	12802ec060	dsputil: move VC1-specific stuff into VC1DSPContext.	2011-02-17 17:35:35 -05:00
Justin Ruggles	c73d99e672	Separate format conversion DSP functions from DSPContext. This will be beneficial for use with the audio conversion API without requiring it to depend on all of dsputil. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-02 02:44:53 +00:00
Ronald S. Bultje	81f2a3f4ff	Implement a SIMD version of emulated_edge_mc() for x86. From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32) and 196 (SSE2/x86-32) cycles.	2011-01-31 20:55:56 -05:00
Justin Ruggles	d19b744a36	cosmetics: indentation Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-31 20:30:15 +00:00
Justin Ruggles	80ba1ddb58	Remove unneeded add bias from 3 functions. DSPContext.vector_fmul_window() DCADSPContext.lfe_fir() SynthFilterContext.synth_filter_float() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-31 20:28:42 +00:00
Justin Ruggles	6eabb0d3ad	Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-22 17:53:27 +00:00
Mans Rullgard	ef4a65149d	Replace ASMALIGN() with .p2align This macro has unconditionally used .p2align for a long time and serves no useful purpose.	2011-01-18 20:48:24 +00:00
Mans Rullgard	ac3c9d0169	x86: remove VLA in ac3_downmix_sse	2011-01-18 20:48:24 +00:00
Ronald S. Bultje	ec3233a855	Fix ff_pw_3 alignment. Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 21:34:25 +00:00
Ronald S. Bultje	8d147f1f60	For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes and then using movlhps to dup it into the higher half of the register. Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-24 17:23:22 +00:00
Baptiste Coudurier	90f1f3bf00	In yadif filter, declare asm constants directly to avoid dependency on libavcodec Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-06 00:14:15 +00:00
Baptiste Coudurier	9e95999e2a	10l, add ff_pw_1 to dsputil_mmx for yadif sse2 Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-04 13:06:06 +00:00
İsmail Dönmez	80e33d2451	dsputil: Use explicit movzbl instead of movzx This fixes compilation with the latest clang trunk version. Patch by İsmail Dönmez, ismail at namtrac dot org Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-11-01 19:35:51 +00:00
Ramiro Polla	153ca56b38	xmm_clobbers: list xmm registers first in clobber list suncc does not like the leading commas inside the macro, but it has no problem with trailing commas. Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 18:14:48 +00:00


223 Commits
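Commit 845e92fd6a (dsputil_mmx: fix incorrect assembly code) depends on how GCC-style inline assembly works: the text the compiler chooses for an operand is substituted into the template verbatim. As a result, "movq 8%3, %%mm1" means "load from p + 8" only when operand 3 happens to print as a bare "(reg)". The C sketch below models that textual substitution with snprintf. It does not invoke a compiler, and the helper names expand_broken() and expand_reg() are hypothetical. The second helper shows one common alternative: the pointer is passed in a register and the displacement is written outside the operand. It is not necessarily the exact form the commit used.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of template expansion for "movq 8%3, %%mm1":
 * the literal "8" is pasted directly in front of whatever text the
 * compiler picks for operand 3 (e.g. "(%edi)" or "10000(%edi)"). */
static void expand_broken(char *out, size_t n, const char *operand3_text)
{
    snprintf(out, n, "movq 8%s, %%mm1", operand3_text);
}

/* Hypothetical model of "movq 8(%3), %%mm1" with operand 3 constrained
 * to a register ("r"): the operand always prints as a register name,
 * so the displacement 8 stays a separate token. */
static void expand_reg(char *out, size_t n, const char *operand3_reg)
{
    snprintf(out, n, "movq 8(%s), %%mm1", operand3_reg);
}
```

With operand text "(%edi)", the first helper yields the intended "movq 8(%edi), %mm1". Once constant propagation turns the operand into "10000(%edi)", it yields "movq 810000(%edi), %mm1", which is the 810000-byte offset the commit describes.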
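Commit 3816642eab wraps the QPEL macros in "do { } while (0);" blocks. A macro that expands to several statements behaves like a single statement only when it is wrapped this way. Without the wrapper, an unbraced "if" guards only the first statement, and an "if ... else" around the macro does not compile. The minimal sketch below uses a hypothetical INC_BOTH macro instead of the actual QPEL macros.

```c
/* Hypothetical two-statement macro, written without and with the wrapper. */
#define INC_BOTH_UNSAFE(a, b) (a)++; (b)++
#define INC_BOTH_SAFE(a, b)   do { (a)++; (b)++; } while (0)

static void bump_unsafe(int cond, int *x, int *y)
{
    if (cond)
        INC_BOTH_UNSAFE(*x, *y); /* expands to: if (cond) (*x)++; (*y)++; */
}

static void bump_safe(int cond, int *x, int *y)
{
    if (cond)
        INC_BOTH_SAFE(*x, *y);   /* the whole body is one statement */
}
```

When cond is 0, bump_unsafe() still increments *y. bump_safe() leaves both values untouched.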
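Commit 356ee8d7de splits ff_dsputil_init_mmx() into one init function per MMX/SSE level. Commit ef6ba1f237 then relies on each of those functions checking the CPU flags itself. The underlying pattern is a context struct of function pointers that first receives C versions, which are then overwritten by progressively newer instruction-set levels. The sketch below is a hypothetical reduction of that pattern. DemoDSPContext, the DEMO_CPU_* bits and the name_* stand-ins are invented for illustration. FFmpeg's real flags are the AV_CPU_FLAG_* constants in libavutil.

```c
/* Hypothetical CPU capability bits. */
enum {
    DEMO_CPU_MMX    = 1 << 0,
    DEMO_CPU_MMXEXT = 1 << 1,
    DEMO_CPU_SSE2   = 1 << 2,
};

/* Hypothetical DSP context holding a single function pointer. */
typedef struct DemoDSPContext {
    const char *(*impl_name)(void);
} DemoDSPContext;

/* Stand-ins for optimized implementations; each reports its level. */
static const char *name_c(void)      { return "c"; }
static const char *name_mmx(void)    { return "mmx"; }
static const char *name_mmxext(void) { return "mmxext"; }
static const char *name_sse2(void)   { return "sse2"; }

/* Each per-level init checks its own flag, so the caller need not. */
static void demo_init_mmx(DemoDSPContext *c, int flags)
{
    if (flags & DEMO_CPU_MMX)
        c->impl_name = name_mmx;
}

static void demo_init_mmxext(DemoDSPContext *c, int flags)
{
    if (flags & DEMO_CPU_MMXEXT)
        c->impl_name = name_mmxext;
}

static void demo_init_sse2(DemoDSPContext *c, int flags)
{
    if (flags & DEMO_CPU_SSE2)
        c->impl_name = name_sse2;
}

/* C defaults first, then later levels override earlier ones. */
static void demo_dsp_init(DemoDSPContext *c, int flags)
{
    c->impl_name = name_c;
    demo_init_mmx(c, flags);
    demo_init_mmxext(c, flags);
    demo_init_sse2(c, flags);
}
```

Calling the init functions in order from oldest to newest extension is what allows an SSE2 routine to replace an MMX one on a CPU that supports both.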
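Commit 6b03900382 adds SSE2/SSSE3 versions of bswap_buf, which byte-swaps every 32-bit word in a buffer. The sketch below is a portable C reference. It assumes a prototype modeled on dsputil's bswap_buf (destination, source, word count) and uses shifts and masks instead of the bswap instruction used by the commit's C baseline. The names bswap32_portable() and bswap_buf_ref() are hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word using shifts and masks. */
static uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
}

/* Reference loop: dst[i] = byte-swapped src[i] for w words.
 * A 128-bit XMM register holds 4 such words, which is what the
 * SSE2/SSSE3 versions exploit (pshufb does the shuffle on SSSE3). */
static void bswap_buf_ref(uint32_t *dst, const uint32_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = bswap32_portable(src[i]);
}
```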
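Commit 6054cd25b4 adds an int32_t array clipping function to DSPUtil for the AC-3 encoder, and commit b8f02f5b4e later ports its x86 versions to cpuflags. Every version must produce the same result: each element clamped into [min, max]. The C sketch below shows that reference behavior. Its name is hypothetical, and its argument order is an assumption modeled on vector_clip_int32() that may not match the real prototype exactly.

```c
#include <stdint.h>

/* Clamp each src[i] into [min, max] and store it in dst[i].
 * Assumes min <= max; dst may alias src. */
static void vector_clip_int32_ref(int32_t *dst, const int32_t *src,
                                  int32_t min, int32_t max, unsigned int len)
{
    for (unsigned int i = 0; i < len; i++) {
        int32_t v = src[i];
        dst[i] = v < min ? min : (v > max ? max : v);
    }
}
```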
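Commit 19a0729b4c names bit-depth-dependent functions <base name>_<bit depth>[_<prefix>], so clear_blocks_c becomes clear_blocks_8_c. Preprocessor token pasting is one way to generate such names from a single function body. The sketch below is a hypothetical illustration: FUNC_NAME, DEFINE_CLEAR_ROW and clear_row are invented here, and the commit's own template macros are not reproduced. It also illustrates the commit's note that some high-bit-depth functions depend only on pixel size, because the 9-bit and 10-bit variants expand to identical bodies.

```c
#include <stdint.h>
#include <string.h>

/* Two-level macro so that a macro argument such as a BIT_DEPTH define
 * is expanded before ## pastes it into the name. */
#define FUNC_NAME_(base, depth) base##_##depth##_c
#define FUNC_NAME(base, depth)  FUNC_NAME_(base, depth)

/* Generate one "clear a row of n pixels" function per bit depth. */
#define DEFINE_CLEAR_ROW(depth, pixel_t)                          \
    static void FUNC_NAME(clear_row, depth)(void *row, int n)     \
    {                                                             \
        memset(row, 0, (size_t)n * sizeof(pixel_t));              \
    }

DEFINE_CLEAR_ROW(8, uint8_t)   /* defines clear_row_8_c */
DEFINE_CLEAR_ROW(9, uint16_t)  /* defines clear_row_9_c */
DEFINE_CLEAR_ROW(10, uint16_t) /* defines clear_row_10_c, same body as 9 */

typedef void (*clear_row_fn)(void *row, int n);

/* Pick the variant that matches the stream's bit depth. */
static clear_row_fn select_clear_row(int bit_depth)
{
    switch (bit_depth) {
    case 8:  return clear_row_8_c;
    case 9:  return clear_row_9_c;
    default: return clear_row_10_c;
    }
}
```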
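Commit 81f2a3f4ff adds SIMD versions of emulated_edge_mc(), which commit 8c53d39e7f later moves into VideoDSPContext. The function allows motion compensation to read a reference block that lies partly outside the decoded frame. The block is copied into a scratch buffer, and each out-of-frame sample is replaced by the nearest edge sample. The per-pixel C sketch below shows only that behavior. Its name and parameter list are hypothetical and do not match FFmpeg's signature.

```c
#include <stddef.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy a block_w x block_h block whose top-left corner is (src_x, src_y)
 * from a w x h plane into buf. Coordinates outside the plane are clamped
 * to the nearest edge, which replicates the border pixels. */
static void emulated_edge_sketch(uint8_t *buf, ptrdiff_t buf_stride,
                                 const uint8_t *plane, ptrdiff_t plane_stride,
                                 int block_w, int block_h,
                                 int src_x, int src_y, int w, int h)
{
    for (int y = 0; y < block_h; y++) {
        int sy = clamp_int(src_y + y, 0, h - 1);
        for (int x = 0; x < block_w; x++) {
            int sx = clamp_int(src_x + x, 0, w - 1);
            buf[y * buf_stride + x] = plane[sy * plane_stride + sx];
        }
    }
}
```

The motion-compensation routine then reads from buf in the same way it would read from a block that lies entirely inside the frame.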
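Commit 6eabb0d3ad changes DSPContext.vector_fmul() from an in-place product (dst = dst * src) to a three-operand product (dst = src0 * src1). Commit d5a7229ba4 later moves the function to AVFloatDSPContext in libavutil. The C sketch below contrasts the two semantics. The function names are hypothetical, and the argument order of the second function is modeled on the real prototype rather than copied from it.

```c
/* Pre-change semantics: multiply dst in place by src. */
static void fmul_inplace(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Post-change semantics: dst = src0 * src1, element by element.
 * Passing dst as src0 reproduces the old in-place behavior. */
static void fmul_3op(float *dst, const float *src0, const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}
```

The three-operand form leaves both inputs intact, whereas the in-place form needs an extra copy to achieve the same result.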