ffmpeg

Author	SHA1	Message	Date
Jason Garrett-Glaser	4a384de5b8	Split h264dsp and h264pred in configure. Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-07 23:10:25 +00:00
Jason Garrett-Glaser	98fe09df7b	Add file missing in r24702 Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:49:48 +00:00
Eli Friedman	c12d6955e2	H.264: SSE2/SSSE3 weighted prediction asm Patch by Eli Friedman <eli.friedman at gmail dot com> Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:13:38 +00:00
Måns Rullgård	f079a64aea	Move cavs dsp functions to their own struct Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 20:59:00 +00:00
Jason Garrett-Glaser	8b9b5e085f	VP5/6/8: add one inline missed in r24677 Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 11:21:22 +00:00
Jason Garrett-Glaser	827d43bb9d	VP8: move zeroing of luma DC block into the WHT Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-02 20:18:09 +00:00
Ronald S. Bultje	6341838f3c	Use word-writing instead of dword-writing (with two cached but otherwise unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 23:13:15 +00:00
Vitor Sessak	fa738b3ad1	Remove x86/mmx.h. It is not used anymore and has been deprecated for years. Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 16:20:45 +00:00
Vitor Sessak	de4bc44abb	Convert deinterlacing MMX code to YASM Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 14:50:51 +00:00
Vitor Sessak	740dfe7012	Fix compilation in x86_64. I broke it with r24580. Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:45:21 +00:00
Vitor Sessak	2c3dda6838	Translate libmpeg2 MMX IDCT to plain asm Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:19:54 +00:00
Ronald S. Bultje	ab4d031889	Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster. Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 21:18:19 +00:00
Jason Garrett-Glaser	e25dee602f	VP8: Much faster SSE2 MC 5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 19:34:00 +00:00
Ronald S. Bultje	48adb7e7a4	Enable no-loop memory/register saving for ssse3/sse4 also. Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:07:57 +00:00
Ronald S. Bultje	2a180c69ea	Save a register (or regsize of stackspace for x86-32) for the no-loop mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:00:15 +00:00
Ronald S. Bultje	bcd4aa6498	Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:56:51 +00:00
Ronald S. Bultje	2208053bd3	Split pextrw macro-spaghetti into several opt-specific macros, this will make future new optimizations (imagine a sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:50:59 +00:00
Ronald S. Bultje	6de5b7c6b8	Fix obvious bug in assignment. Somehow, the test vectors don't test this... Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-25 02:42:40 +00:00
Ronald S. Bultje	e3f7bf774c	Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-24 19:33:05 +00:00
Eli Friedman	3611e7a309	Inline asm for VP56 arith coder This is a lot more reliable to get cmov rather than trying to trick gcc into generating it, useful since it's 2% faster overall. Patch by Eli Friedman <eli.friedman at gmail> Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 21:46:30 +00:00
Jason Garrett-Glaser	3ae079a3c8	VP8: optimize DC-only chroma case in the same way as luma. Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser	51c9156438	VP8 asm: cosmetics (spacing) Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser	8a467b2d44	VP8: 30% faster idct_mb Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks? Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser	c25c776708	VP8: clear DCT blocks in iDCT instead of using clear_blocks. ~0.3% faster overall. Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 00:07:16 +00:00
Ronald S. Bultje	dc5eec8085	Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on CPUs supporting it. Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 19:59:34 +00:00
Ronald S. Bultje	003243c3c2	Fix and enable horizontal >=SSE2 mbedge loopfilter. Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 01:35:26 +00:00
Loren Merritt	c7b1d9768c	relicense h264 deblock sse2 to lgpl Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 00:39:49 +00:00
Loren Merritt	532e769701	sync yasm macros from x264 Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:45:16 +00:00
Jason Garrett-Glaser	8731dbd890	Eliminate one instruction in VP8 dc_add_sse4 Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:41:37 +00:00
Jason Garrett-Glaser	7dd224a42d	Various VP8 x86 deblocking speedups SSSE3 versions, improve SSE2 versions a bit. SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them. Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser	b8b231b5dc	Make mmx VP8 WHT faster Avoid pextrw, since it's slow on many older CPUs. Now it doesn't require mmxext either. Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 20:51:01 +00:00
David Conrad	af521abc28	Add header declarations for mmx/sse constants missing them Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 10:02:07 +00:00
David Conrad	c7eec58170	Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c Should fix compilation with icc and should help prevent any future duplicates Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 10:02:03 +00:00
Ronald S. Bultje	e9e456d850	VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16) and chroma (width=8). Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-20 22:58:56 +00:00
Ronald S. Bultje	268821e76e	Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder. Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-20 22:04:18 +00:00
Ronald S. Bultje	c60ed66dbe	Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's wrong with it tomorrow or so, then re-submit. Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 23:57:09 +00:00
Ronald S. Bultje	6526976f0c	Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag, FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that have been checked specifically on such CPUs and are actually faster than their MMX counterparts. In addition, use this flag to enable particular VP8 and LPC SSE2 functions that are faster than their MMX counterparts. Based on a patch by Loren Merritt <lorenm AT u washington edu>. Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 22:38:23 +00:00
Ronald S. Bultje	1878f685c0	Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions. Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:53:28 +00:00
Ronald S. Bultje	fb9bdf048c	Be more efficient with registers or stack memory. Saves 8/16 bytes stack for x86-32, or 2 MM registers on x86-64. Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:45:36 +00:00
Ronald S. Bultje	3facfc99da	Change function prototypes for width=8 inner and mbedge loopfilter functions so that it does both U and V planes at the same time. This will have speed advantages when using SSE2 (or higher) optimizations, since we can do both the U and V rows together in a single xmm register. This also renames filter16 to filter16y and filter8 to filter8uv so that it's more obvious what each function is used for. Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:18:04 +00:00
Loren Merritt	1ee076b1b1	more credits to D. J. Bernstein for fft Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-18 20:06:42 +00:00
Ronald S. Bultje	819b2dd2b1	Attempt to fix x86-64 testsuite on fate. Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 21:35:30 +00:00
Ronald S. Bultje	6f323f1251	Remove duplicate define. Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:54:47 +00:00
Ronald S. Bultje	889b2c26ee	Revert 24270, it contained some stuff that shouldn't have been in there. Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:54:25 +00:00
Ronald S. Bultje	2356a7834b	Remove duplicate define. Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:42:32 +00:00
Ronald S. Bultje	ede1b9665a	Give x86 r%d registers names, this will simplify implementation of the chroma inner loopfilter, and it also allows us to save one register on x86-64/sse2. Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:38:10 +00:00
Ronald S. Bultje	526e831a46	Change return statement, the REP_RET is a mistake since the else case (x86-64, sse2) doesn't actually loop, so REP_RET isn't necessary. Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 18:29:14 +00:00
Ronald S. Bultje	a711eb4829	VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations. Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-15 23:02:34 +00:00
David Conrad	faa26db28b	MMX/SSE VC1 loop filter Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-11 22:53:01 +00:00
David Conrad	7af8fbd348	Make ff_pw_4 128 bits Originally committed as revision 24207 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-11 22:52:55 +00:00
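
Note on the VP8 DC-only iDCT commits above (r24448, r24452, r24455): when only the DC coefficient of a 4x4 block is nonzero, the inverse transform reduces to adding one rounded constant to all 16 pixels, and the coefficient can be zeroed in the same pass rather than by a separate clear_blocks call. The C sketch below illustrates that idea only; it is not the committed MMX/SSE2 code. The names `idct_dc_add_sketch`, `idct_dc_add4_sketch` and `clip_u8` are hypothetical, and the `(dc + 4) >> 3` rounding is assumed from the VP8 format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical helper: apply a DC-only 4x4 residual to dst.
 * Assumes the VP8 rounding (dc + 4) >> 3 and an arithmetic right
 * shift for negative values, which common compilers provide. */
static void idct_dc_add_sketch(uint8_t *dst, int16_t block[16],
                               ptrdiff_t stride)
{
    assert(stride >= 4);

    int dc = (block[0] + 4) >> 3;
    block[0] = 0; /* clear the coefficient here, not in a separate pass */

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}

/* Hypothetical 4-at-a-time variant: four horizontally adjacent
 * DC-only blocks, i.e. one 16x4 row of luma. */
static void idct_dc_add4_sketch(uint8_t *dst, int16_t block[4][16],
                                ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add_sketch(dst + 4 * i, block[i], stride);
}
```

A SIMD version would presumably broadcast the four DC values and perform the adds with saturating byte arithmetic, so that one pass covers the whole row of blocks; the scalar loop here only shows the arithmetic.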


187 Commits