ffmpeg

Author	SHA1	Message	Date
İsmail Dönmez	9276bdddca	snowdsp: Explicitly state the operand sizes. Fixes compilation with clang's builtin assembler. Patch by İsmail Dönmez, ismail at namtrac dot org. Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-04 13:08:13 +00:00
Ronald S. Bultje	a52ffc3f54	Move static inline function to a macro, so that constant propagation in inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE breakage after r25254. Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 17:42:26 +00:00
Eli Friedman	329d689f75	Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed increase to e.g. vc1, snow and mpeg decoding. Patch by Eli Friedman <eli dot friedman gmail com>. Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 15:34:43 +00:00
Ronald S. Bultje	cd17285e6c	Merge b_idx and edge variables, and optimize the ASM to directly load variables from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. This saves several lea calls and together saves about 10 cycles in h264_loop_filter_strength_mmx2(). Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:04:39 +00:00
Ronald S. Bultje	0cc8a5d088	Remove mv_mask variable. Replace the related pand -1/0 instructions by either a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction. Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:03:30 +00:00
Ronald S. Bultje	c0673f2cf4	Remove d_idx as a variable, and instead load it as a constant in the asm. This has no measurable speed effect because the surrounding code doesn't take advantage of this yet. Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:02:32 +00:00
Ronald S. Bultje	2c3135f6d3	Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid of the d_idx variable and therefore allows for future optimizations. No speed difference by this commit itself. Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 13:35:24 +00:00
Ronald S. Bultje	4b81511cab	Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows inlining various constants within the loop code. 20 cycles faster on cathedral sample. Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 13:34:20 +00:00
Reimar Döffinger	02b424d9c8	Add d suffix to movd target register to make it work with nasm. Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-26 09:15:18 +00:00
Reimar Döffinger	dc77e985b7	Split and then simplify address generation macro. Allows nasm to work for this code. Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-26 09:08:11 +00:00
Ronald S. Bultje	7e117771cd	Remove unused variable. Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 15:31:46 +00:00
Ronald S. Bultje	ae11291865	Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this inlines scan8[] and removes loop setup. 15% faster, 0.4% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML. Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 14:07:23 +00:00
Ronald S. Bultje	4bca677494	Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the code directly also and remove loop setup. 20% faster in function, 0.8% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML. Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 14:05:45 +00:00
Måns Rullgård	c0bc8b9afb	x86: disable SSE functions using stack when stack is not aligned. This fixes crashes with ICC 10.1. Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-21 17:57:21 +00:00
Måns Rullgård	f41237c9db	x86: remove hack disabling sse2 h264 loop filter with 32-bit icc Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-18 20:44:32 +00:00
Ronald S. Bultje	ada65af9d1	Don't access upper 32 bits of a 32-bit int on 64-bit systems. Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 12:24:22 +00:00
Ronald S. Bultje	6c3d021891	Properly add HAVE_YASM around yasmified symbols. Should fix compile error on configurations using --disable-yasm. Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 03:01:57 +00:00
Ronald S. Bultje	e2e341048e	Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm, which will hopefully solve the Win64/FATE failures caused by these functions. Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 01:56:06 +00:00
Ronald S. Bultje	d0acc2d2e9	Move sse16_sse2() from inline asm to yasm. It is one of the functions causing Win64/FATE issues. Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 01:44:17 +00:00
Ronald S. Bultje	1d16a1cf99	Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser	8acb554aff	LGPL SSE2 H.264 iDCT. This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-10 02:25:12 +00:00
Stefano Sabatini	c6c98d0897	Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-08 15:07:14 +00:00
Reimar Döffinger	b1c32fb5e5	Use "d" suffix for general-purpose registers used with movd. This increases compatibility with nasm and is also more consistent, e.g. with h264_intrapred.asm and h264_chromamc.asm that already do it that way. Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-05 10:10:16 +00:00
Stefano Sabatini	7160bb716b	Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_ symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h. Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-04 09:59:08 +00:00
Ronald S. Bultje	2c166c3af1	Port latest x264 deblock asm (before they moved to using NV12 as internal format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly. Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-03 16:52:46 +00:00
Eli Friedman	a10a9f5cd0	Fix typo in r25019. Patch by Eli Friedman <eli.friedman at gmail dot com>. Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 23:19:36 +00:00
Ronald S. Bultje	615da9b1d9	Unscrew breakage after my last commit because of symbol prefixes. Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 21:10:19 +00:00
Ronald S. Bultje	a33a2562c1	Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC. Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:56:16 +00:00
Ronald S. Bultje	14bc1f2485	Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own. Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:48:59 +00:00
Ronald S. Bultje	5929b3a651	Fix vertical align. Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-31 12:32:24 +00:00
Ronald S. Bultje	79ce0f002e	Fix compilation failure if yasm is disabled (missing vp3 symbols). Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 20:30:40 +00:00
Ronald S. Bultje	de1c253bab	Split intra prediction initialization (i.e. assigning of function pointers) into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c). Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:34:13 +00:00
Ronald S. Bultje	d0eb5a1174	Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 fate failures on Win64. Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:31:04 +00:00
Ronald S. Bultje	e9f5f020c6	Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6 issues on Win64. Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:25:46 +00:00
Ronald S. Bultje	7e7c4b6008	Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx() functions. Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:22:27 +00:00
Loren Merritt	19d929f9a3	cosmetics in imdct_sse Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-28 21:03:13 +00:00
Ronald S. Bultje	4eca52ed19	Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61. Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-26 14:33:39 +00:00
Ronald S. Bultje	6697bc33e2	Revert r24931, it broke Win32 and some BSD compiles (yay fate). Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 20:36:35 +00:00
Ronald S. Bultje	72f642400b	Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing to the VP6 fate failures on Win64. Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 19:57:05 +00:00
Måns Rullgård	69dad87c48	VP6: fix vp6_filter_diag4_mmx/sse on 64-bit. The stride can be negative and must be sign extended before being used in pointer arithmetic. Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 15:41:11 +00:00
Ronald S. Bultje	89fa3504ed	Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should help in fixing the Win64 fate failures. Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:44:16 +00:00
Ronald S. Bultje	3a0885146c	Move vp6_filter_diag4() from DSPContext to VP56DSPContext. Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:42:28 +00:00
Måns Rullgård	c0ec9918b0	Remove global mm_flags variable Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 17:47:05 +00:00
Ronald S. Bultje	3611c45ab7	Mark xmm registers as clobbered in simple loopfilter. Should fix the last two VP8-related fate failures on Win64. Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 16:52:27 +00:00
Alex Converse	cb4f12466b	imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits". It generates smaller cleaner code. Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-23 15:51:09 +00:00
Ronald S. Bultje	684d608bde	Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures). Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-23 02:41:22 +00:00
Alex Converse	78b5c97d3e	Convert ff_imdct_half_sse() to yasm. This is to avoid split asm sections that attempt to preserve some registers between sections. Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-22 14:39:58 +00:00
Jason Garrett-Glaser	05c04cdf54	VP5/6/8: ~7% faster arithmetic decoding. Grab from the bitstream in 16-bit chunks instead of 8-bit chunks. TODO: grab in 32-bit chunks on 64-bit systems. Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-12 01:11:32 +00:00
Jason Garrett-Glaser	4a384de5b8	Split h264dsp and h264pred in configure. Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-07 23:10:25 +00:00
Jason Garrett-Glaser	98fe09df7b	Add file missing in r24702 Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:49:48 +00:00
Eli Friedman	c12d6955e2	H.264: SSE2/SSSE3 weighted prediction asm. Patch by Eli Friedman <eli.friedman at gmail dot com>. Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:13:38 +00:00
Måns Rullgård	f079a64aea	Move cavs dsp functions to their own struct Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 20:59:00 +00:00
Jason Garrett-Glaser	8b9b5e085f	VP5/6/8: add one inline missed in r24677 Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 11:21:22 +00:00
Jason Garrett-Glaser	827d43bb9d	VP8: move zeroing of luma DC block into the WHT. Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-02 20:18:09 +00:00
Ronald S. Bultje	6341838f3c	Use word-writing instead of dword-writing (with two cached but otherwise unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 23:13:15 +00:00
Vitor Sessak	fa738b3ad1	Remove x86/mmx.h. It is not used anymore and has been deprecated for years. Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 16:20:45 +00:00
Vitor Sessak	de4bc44abb	Convert deinterlacing MMX code to YASM Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 14:50:51 +00:00
Vitor Sessak	740dfe7012	Fix compilation in x86_64. I broke it with r24580. Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:45:21 +00:00
Vitor Sessak	2c3dda6838	Translate libmpeg2 MMX IDCT to plain asm Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:19:54 +00:00
Ronald S. Bultje	ab4d031889	Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster. Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 21:18:19 +00:00
Jason Garrett-Glaser	e25dee602f	VP8: Much faster SSE2 MC. 5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 19:34:00 +00:00
Ronald S. Bultje	48adb7e7a4	Enable no-loop memory/register saving for ssse3/sse4 also. Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:07:57 +00:00
Ronald S. Bultje	2a180c69ea	Save a register (or regsize of stackspace for x86-32) for the no-loop mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:00:15 +00:00
Ronald S. Bultje	bcd4aa6498	Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:56:51 +00:00
Ronald S. Bultje	2208053bd3	Split pextrw macro-spaghetti into several opt-specific macros, this will make future new optimizations (imagine a sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:50:59 +00:00
Ronald S. Bultje	6de5b7c6b8	Fix obvious bug in assignment. Somehow, the test vectors don't test this... Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-25 02:42:40 +00:00
Ronald S. Bultje	e3f7bf774c	Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-24 19:33:05 +00:00
Eli Friedman	3611e7a309	Inline asm for VP56 arith coder. This is a lot more reliable to get cmov rather than trying to trick gcc into generating it, useful since it's 2% faster overall. Patch by Eli Friedman <eli.friedman at gmail>. Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 21:46:30 +00:00
Jason Garrett-Glaser	3ae079a3c8	VP8: optimize DC-only chroma case in the same way as luma. Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser	51c9156438	VP8 asm: cosmetics (spacing) Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser	8a467b2d44	VP8: 30% faster idct_mb. Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks? Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser	c25c776708	VP8: clear DCT blocks in iDCT instead of using clear_blocks. ~0.3% faster overall. Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 00:07:16 +00:00
Ronald S. Bultje	dc5eec8085	Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on CPUs supporting it. Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 19:59:34 +00:00
Ronald S. Bultje	003243c3c2	Fix and enable horizontal >=SSE2 mbedge loopfilter. Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 01:35:26 +00:00
Loren Merritt	c7b1d9768c	relicense h264 deblock sse2 to lgpl Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 00:39:49 +00:00
Loren Merritt	532e769701	sync yasm macros from x264 Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:45:16 +00:00
Jason Garrett-Glaser	8731dbd890	Eliminate one instruction in VP8 dc_add_sse4 Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:41:37 +00:00
Jason Garrett-Glaser	7dd224a42d	Various VP8 x86 deblocking speedups. SSSE3 versions, improve SSE2 versions a bit. SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them. Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser	b8b231b5dc	Make mmx VP8 WHT faster. Avoid pextrw, since it's slow on many older CPUs. Now it doesn't require mmxext either. Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 20:51:01 +00:00
David Conrad	af521abc28	Add header declarations for mmx/sse constants missing them Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 10:02:07 +00:00
David Conrad	c7eec58170	Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c. Should fix compilation with icc and should help prevent any future duplicates. Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 10:02:03 +00:00
Ronald S. Bultje	e9e456d850	VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16) and chroma (width=8). Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-20 22:58:56 +00:00
Ronald S. Bultje	268821e76e	Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder. Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-20 22:04:18 +00:00
Ronald S. Bultje	c60ed66dbe	Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's wrong with it tomorrow or so, then re-submit. Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 23:57:09 +00:00
Ronald S. Bultje	6526976f0c	Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag, FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that have been checked specifically on such CPUs and are actually faster than their MMX counterparts. In addition, use this flag to enable particular VP8 and LPC SSE2 functions that are faster than their MMX counterparts. Based on a patch by Loren Merritt <lorenm AT u washington edu>. Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 22:38:23 +00:00
Ronald S. Bultje	1878f685c0	Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions. Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:53:28 +00:00
Ronald S. Bultje	fb9bdf048c	Be more efficient with registers or stack memory. Saves 8/16 bytes stack for x86-32, or 2 MM registers on x86-64. Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:45:36 +00:00
Ronald S. Bultje	3facfc99da	Change function prototypes for width=8 inner and mbedge loopfilter functions so that it does both U and V planes at the same time. This will have speed advantages when using SSE2 (or higher) optimizations, since we can do both the U and V rows together in a single xmm register. This also renames filter16 to filter16y and filter8 to filter8uv so that it's more obvious what each function is used for. Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:18:04 +00:00
Loren Merritt	1ee076b1b1	more credits to D. J. Bernstein for fft Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-18 20:06:42 +00:00
Ronald S. Bultje	819b2dd2b1	Attempt to fix x86-64 testsuite on fate. Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 21:35:30 +00:00
Ronald S. Bultje	6f323f1251	Remove duplicate define. Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:54:47 +00:00
Ronald S. Bultje	889b2c26ee	Revert 24270, it contained some stuff that shouldn't have been in there. Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:54:25 +00:00
Ronald S. Bultje	2356a7834b	Remove duplicate define. Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:42:32 +00:00
Ronald S. Bultje	ede1b9665a	Give x86 r%d registers names, this will simplify implementation of the chroma inner loopfilter, and it also allows us to save one register on x86-64/sse2. Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:38:10 +00:00
Ronald S. Bultje	526e831a46	Change return statement, the REP_RET is a mistake since the else case (x86-64, sse2) doesn't actually loop, so REP_RET isn't necessary. Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 18:29:14 +00:00
Ronald S. Bultje	a711eb4829	VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations. Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-15 23:02:34 +00:00
David Conrad	faa26db28b	MMX/SSE VC1 loop filter Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-11 22:53:01 +00:00
David Conrad	7af8fbd348	Make ff_pw_4 128 bits Originally committed as revision 24207 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-11 22:52:55 +00:00
Vitor Sessak	881fd7a62f	Move SSE optimized 32-point DCT to its own file. Should fix breakage with YASM disabled. Originally committed as revision 24078 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-06 17:48:23 +00:00
Vitor Sessak	4dcc4f8eaa	SSE optimized 32-point DCT Originally committed as revision 24077 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-06 16:58:54 +00:00
Ronald S. Bultje	f2a30bd840	Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros). Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-03 19:26:30 +00:00
Jason Garrett-Glaser	b06855f18a	SSSE3 versions of vp8 width4 bilinear MC functions Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-03 00:48:12 +00:00
Jason Garrett-Glaser	dcc602d802	SSSE3 versions of width4 VP8 6-tap MC functions. Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a non-bitexactness bug. Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>. Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-02 05:27:41 +00:00
Jason Garrett-Glaser	8434fc26eb	Fix 100L in vp8dsp asm init Originally committed as revision 23946 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-01 22:09:22 +00:00
Jason Garrett-Glaser	17dc7c7a60	Fix h264/vp8 intra pred on Athlon XP. Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction? Originally committed as revision 23927 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-01 10:29:47 +00:00
Måns Rullgård	49bd8e4b84	Fix grammar errors in documentation Originally committed as revision 23904 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-30 15:38:06 +00:00
Jason Garrett-Glaser	82a8d0f114	Use add instead of lshift in mmxext vp8 idct Originally committed as revision 23891 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 17:23:17 +00:00
Ronald S. Bultje	565344e7e4	Remove unused macros (duplicates from the now-LGPL x86util.asm). Originally committed as revision 23890 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 17:04:29 +00:00
Ronald S. Bultje	2dd2f71692	MMX idct_add for VP8. Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 14:43:11 +00:00
Jason Garrett-Glaser	29e719377f	Add missing mm_support call to ff_h264_pred_init_x86. I'm not sure if this is supposed to be here, but it can't hurt. Originally committed as revision 23885 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 12:28:06 +00:00
Jason Garrett-Glaser	004cda8e79	Add mmxext version of VP8 DC Hadamard transform Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 01:41:59 +00:00
Jason Garrett-Glaser	37355fe823	Make x86util.asm LGPL so we can use it in LGPL asm. Strip out most x264-specific stuff (not used anywhere in ffmpeg). Originally committed as revision 23877 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 00:40:12 +00:00
Jason Garrett-Glaser	bc14f04b2f	MMXEXT version of vp8 4x4 vertical pred Originally committed as revision 23876 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 00:23:52 +00:00
Jason Garrett-Glaser	fb9927ad7d	Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8 Originally committed as revision 23875 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 23:53:07 +00:00
Jason Garrett-Glaser	8b746bb473	Add missing comment header for predict_4x4_dc_mmxext Originally committed as revision 23874 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 23:37:24 +00:00
Jason Garrett-Glaser	270a85d259	Fix some intra pred MMX functions that used MMXEXT instructions. Also add predict_4x4_dc MMXEXT function for vp8/h264. Originally committed as revision 23873 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 23:35:17 +00:00
Jason Garrett-Glaser	a912da761d	Fix VP8 bilinear mc on x86_64 Originally committed as revision 23872 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 22:13:14 +00:00
Baptiste Coudurier	50f70541d3	Change MMXEXT to MMX2, MMXEXT is deprecated Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 21:12:00 +00:00
Jason Garrett-Glaser	0fecad09fe	Add x86 asm functions for VP8 put_pixels Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 19:14:40 +00:00
Jason Garrett-Glaser	a173aa8940	Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 18:56:24 +00:00
Måns Rullgård	1f65b67c46	Fix x86 build with h264dsp disabled Originally committed as revision 23844 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 10:02:15 +00:00
Eli Friedman	b3858964d6	Add const to some pointer parameters. Patch by Eli Friedman, eli D friedman A gmail Originally committed as revision 23826 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-27 15:11:38 +00:00
David Conrad	30bdefd1de	Fix build without yasm Originally committed as revision 23816 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-27 02:52:43 +00:00
Jason Garrett-Glaser	0178d14fe5	First shot at VP8 optimizations: MMXEXT, SSE2 and SSSE3 MC functions; MMX and SSE4 IDCT dc_add functions. Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself. Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-27 02:01:45 +00:00
Måns Rullgård	0912db0206	Make vp8 select h264dsp and use this to pull in mmx intrapred Originally committed as revision 23790 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-25 19:10:08 +00:00
Carl Eugen Hoyos	0c59074868	Fix compilation without --enable-gpl. Originally committed as revision 23789 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-25 19:06:29 +00:00
Carl Eugen Hoyos	96da2a6967	Cosmetics: Fix indentation. Originally committed as revision 23785 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-25 18:34:03 +00:00
Jason Garrett-Glaser	4af8cdfc3f	16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264 Originally committed as revision 23783 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-25 18:25:49 +00:00
Vitor Sessak	89c7d8058c	Fix compilation on x64. Originally committed as revision 23753 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-24 08:53:32 +00:00
Vitor Sessak	57dbd12b6d	Fix asm constraints in apply_window() Originally committed as revision 23752 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-24 08:46:47 +00:00
Vitor Sessak	bc2b368215	SSE-optimized MP3 floating point windowing functions Originally committed as revision 23750 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-24 07:44:50 +00:00
Jason Garrett-Glaser	2966cc1849	Update x264asm header files to latest versions. Modify the asm accordingly. GLOBAL is now no longer necessary for PIC-compliant loads. Originally committed as revision 23739 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-23 19:20:46 +00:00
David Conrad	413abbe164	Add bitexact versions of put_no_rnd_pixels8 _x2 and _y2 for vp3/theora Originally committed as revision 23463 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-04 04:46:26 +00:00
David Conrad	179655b6c6	vp3: The DC-only IDCT is surprisingly not supposed to be bitexact to the full IDCT. Fix this. Originally committed as revision 23358 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-05-28 07:01:34 +00:00
Michael Niedermayer	22cb6fb60f	Adding missing () to mathops.h. Originally committed as revision 23083 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-05-11 00:22:50 +00:00
Reimar Döffinger	1c71b5c89a	Replace more "m" constraints with MANGLE to fix compilation issues with x86_32 gcc 4.4.4 and -fPIC. Originally committed as revision 23082 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-05-10 21:16:08 +00:00
Diego Biurrun	ba87f0801d	Remove explicit filename from Doxygen @file commands. Passing an explicit filename to this command is only necessary if the documentation in the @file block refers to a file different from the one the block resides in. Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-04-20 14:45:34 +00:00
David Conrad	eb6a6cd788	vp3: DC-only IDCT 2-4% faster overall decode Originally committed as revision 22896 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-04-17 02:04:30 +00:00
Reimar Döffinger	27eecec359	Convert two "m" constraints to MANGLE to fix compilation with some compilers. Originally committed as revision 22760 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-04-01 16:52:14 +00:00
Måns Rullgård	d343d59837	Replace remaining uses of ATTR_ALIGNED with DECLARE_ALIGNED Originally committed as revision 22593 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-18 15:00:17 +00:00
Måns Rullgård	3bd74e9243	Simplify arch-specific object file lists Originally committed as revision 22570 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-16 21:23:03 +00:00
Måns Rullgård	43f60eba19	Move arch-specific makefile parts into $arch/Makefile Originally committed as revision 22569 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-16 21:22:59 +00:00
Måns Rullgård	4693b031a3	Move H264 dsputil functions into their own struct This moves the H264-specific functions from DSPContext to the new H264DSPContext. The code is made conditional on CONFIG_H264DSP which is set by the codecs requiring it. The qpel and chroma MC functions are not moved as these are used by non-h264 code. Originally committed as revision 22565 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-16 01:17:00 +00:00
Måns Rullgård	05aec7bb87	Separate DWT from snow and dsputil This moves the DWT functions from snow.c and dsputil.c to a file of their own. A new struct, DWTContext, holds the function pointers previously part of DSPContext. Originally committed as revision 22522 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-14 17:50:12 +00:00
Måns Rullgård	f49747e904	x86: move function prototypes to header files Originally committed as revision 22266 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-06 22:37:08 +00:00
Måns Rullgård	c26e58e32c	Add some missing #includes Originally committed as revision 22258 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-06 22:36:36 +00:00
Måns Rullgård	1429224b04	Move FFT parts from dsputil.h to fft.h Originally committed as revision 22235 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-06 14:34:46 +00:00
Måns Rullgård	84dc2d8afa	Remove DECLARE_ALIGNED_{8,16} macros These macros are redundant. All uses are replaced with the generic DECLARE_ALIGNED macro instead. Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-06 14:24:59 +00:00
Måns Rullgård	5e46be96f8	Move NEG_[US]SR32 macros to mathops.h Originally committed as revision 21873 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-02-17 23:58:59 +00:00
David Conrad	19530266a5	Enable SSE2 (put|avg)_pixels_16_sse2. SVQ1 chroma has been special-cased aligned to 16-bytes since at least r15466. Other architectures also assume 16-byte alignment here too but set STRIDE_ALIGN to 16. Originally committed as revision 21736 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-02-10 02:02:06 +00:00
Reimar Döffinger	3d05c1fbec	Make the jump-table section-relative for x86_64 with PIC enabled. This allows to get rid of the macho64 specific hack that moves them to rodata (with worse cache behaviour) and avoids textrels which e.g. Gentoo does not allow for x86_64 libraries. Originally committed as revision 21551 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-30 19:26:47 +00:00
Loren Merritt	900479bb74	optimize h264_loop_filter_strength_mmx2 244->160 cycles on core2 Originally committed as revision 21462 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-26 17:17:48 +00:00
Alex Converse	3deb53849e	Implement an sse version of scalarproduct_float(). Originally committed as revision 21386 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-22 23:07:58 +00:00
Måns Rullgård	c67278098d	Move array specifiers outside DECLARE_ALIGNED() invocations Originally committed as revision 21377 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-22 03:25:11 +00:00
David Conrad	1f630b9717	Use two separate memory arguments since 8+() is invalid gas syntax Originally committed as revision 21360 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-21 09:46:57 +00:00
Michael Niedermayer	b4c2ada528	Attempt to fix asm compilation failure. Only tested on gcc 4 & x86_64. Originally committed as revision 21355 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-20 19:23:19 +00:00
Måns Rullgård	5e7dfb7de1	Move COPY3_IF_LT to lavc/mathops.h This obscure macro is only used in motion_est.c so having it in lavc makes more sense. See discussion here: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2008-November/056561.html Originally committed as revision 21346 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-20 06:01:54 +00:00
David Conrad	c4f2b6dce3	Use constant offsets for memory operands since gcc is unable to This fixes gcc failing to fit 6 memory locations into 7 registers on x86-32 Originally committed as revision 21337 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-20 00:34:10 +00:00
Michael Niedermayer	9ac4548ff7	Fix h264_loop_filter_strength_mmx2() so it works with b frames. Originally committed as revision 21327 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-19 16:40:36 +00:00
Michael Niedermayer	ebddd2e253	Remove -2 -> -1 remapping, it's not needed anymore as we must remap all references per LUT anyway. Originally committed as revision 21323 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-19 14:28:19 +00:00
Gwenole Beauchesne	5716aec3f9	Fix XvMC. XvMCCreateBlocks() may not allocate 16-byte aligned blocks, so we can't use SSE-optimized routines. Originally committed as revision 21011 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-04 09:19:32 +00:00
Reimar Döffinger	4a1289450a	Reduce number of ASM constraints for ff_lpc_compute_autocorr_sse2 since it causes no significant speed difference and can avoid compilation issues with --enable-pic. Originally committed as revision 21003 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-02 17:48:08 +00:00
Diego Biurrun	4052cbf161	Get rid of pointless CONFIG_ANY_H263 preprocessor definition. Originally committed as revision 20975 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-12-30 11:33:59 +00:00
Loren Merritt	758c7455f1	fix a crash in ape decoding on x86_32 sse2 Originally committed as revision 20777 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-12-08 21:24:01 +00:00
Loren Merritt	a4605efdf5	slightly faster scalarproduct_and_madd_int16_ssse3 on penryn, no change on conroe Originally committed as revision 20743 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-12-05 17:53:11 +00:00
Loren Merritt	91e644ff77	r20739 broke compilation on systems without yasm Originally committed as revision 20742 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-12-05 17:51:57 +00:00
Loren Merritt	b1159ad928	refactor and optimize scalarproduct 29-105% faster apply_filter, 6-90% faster ape decoding on core2 (Any x86 other than core2 probably gets much less, since this is mostly due to ssse3 cachesplit avoidance and I haven't written the full gamut of other cachesplit modes.) 9-123% faster ape decoding on G4. Originally committed as revision 20739 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-12-05 15:09:10 +00:00
Loren Merritt	b10fa1bb8b	port ape dsp functions from sse2 to mmx now requires yasm Originally committed as revision 20722 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-12-03 18:53:12 +00:00
Loren Merritt	4521308363	s/movdqa/movaps/ in sse1 fft. (regression in r20293) Originally committed as revision 20371 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-25 03:09:53 +00:00
Loren Merritt	b07781b6e4	fix linking on systems with a function name prefix (10l in r20287) Originally committed as revision 20294 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-18 21:44:03 +00:00
Loren Merritt	29e4edbbe7	sync yasm macros to x264 Originally committed as revision 20293 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-18 21:42:28 +00:00
Loren Merritt	e17ccf60fe	huffyuv: add some const qualifiers Originally committed as revision 20290 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-18 20:47:25 +00:00
Loren Merritt	2f77923d72	simd add_hfyu_left_prediction 2.2x faster than C on conroe, 3.6x on penryn. 4-6% faster huffyuv decoding if using left or plane mode and yuv Originally committed as revision 20287 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-18 20:10:10 +00:00
Justin Ruggles	f4d608e344	add CONFIG_LPC to the build system for lpc dsputil functions. fixes build problems when lpc.c is not compiled. Originally committed as revision 20285 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-18 19:51:18 +00:00
Justin Ruggles	fde82ca7e4	Move autocorrelation function from flacenc.c to lpc.c. Also rename the corresponding dsputil functions and remove their dependency on the FLAC encoder. Fixes Issue1486. Originally committed as revision 20266 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-17 21:00:39 +00:00
Reimar Döffinger	ec65675504	Use MANGLE in cavsdsp, the current version using "m" constraints will not compile on e.g. OpenBSD due to running out of registers. Originally committed as revision 20123 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-10-01 15:30:27 +00:00
Reimar Döffinger	003121091e	Replace several #ifdef PIC with the more obvious and correct #if !HAVE_EBX_AVAILABLE, since all it does is avoid using ebx. Originally committed as revision 20094 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-09-30 09:49:12 +00:00
Måns Rullgård	35de5d2412	cosmetics: fix indentation after previous commit Originally committed as revision 20062 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-09-27 16:52:00 +00:00
Måns Rullgård	952e872198	Drop unused args from vector_fmul_add_add, simplify code, and rename The src3 and step arguments to vector_fmul_add_add() are always zero and one, respectively. This removes these arguments from the function, simplifies the code accordingly, and renames the function to better match the new operation. Originally committed as revision 20061 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-09-27 16:51:54 +00:00
Måns Rullgård	01b2214758	Merge FFTContext and MDCTContext Originally committed as revision 19931 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-09-20 17:30:20 +00:00
Måns Rullgård	f486321395	Move per-arch fft init bits into the corresponding subdirs Originally committed as revision 19864 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-09-15 21:14:14 +00:00
Måns Rullgård	4e36a5b46f	Move declarations of some mmx functions to dsputil_mmx.h Originally committed as revision 19739 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-08-29 16:55:50 +00:00
Vitor Sessak	9263a05aab	Mark "i" parameter of vector_clipf_sse() as early-clobber Originally committed as revision 19731 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-08-27 15:52:44 +00:00
Vitor Sessak	50e23ae9d3	Mark parameter src of vector_clipf() as const Originally committed as revision 19729 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-08-27 15:38:59 +00:00
Vitor Sessak	0a68cd876e	SSE optimized vector_clipf(). 10% faster TwinVQ decoding. Originally committed as revision 19728 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-08-27 14:49:36 +00:00
John Adcock	3f87f39cb8	Update x264 asm code to latest to add support for 64-bit Windows. Use the new x86inc features to support 64-bit Windows on all non-x264 nasm assembly code as well. Patch by John Adcock, dscaler.johnad AT googlemail DOT com. Win64 changes originally by Anton Mitrofanov. x86util changes mostly by Holger Lubitz. Originally committed as revision 19580 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-08-04 07:42:55 +00:00
Diego Biurrun	9be6f0d2f8	Do not check for both CONFIG_VC1_DECODER and CONFIG_WMV3_DECODER, the former depends upon the latter. Originally committed as revision 19533 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-07-29 09:54:49 +00:00
Diego Biurrun	99e5a9d1ea	Do not redundantly check for both CONFIG_THEORA_DECODER and CONFIG_VP3_DECODER. The Theora decoder depends on the VP3 decoder. Originally committed as revision 19492 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-07-22 22:27:10 +00:00
Carl Eugen Hoyos	36904c4c9f	Icc 11.1 still does not align the stack pointer, disable some x264 functions. Originally committed as revision 19454 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-07-17 09:07:38 +00:00
Jason Garrett-Glaser	73b02e2460	SSE version of clear_blocks Originally committed as revision 19206 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-06-16 17:33:57 +00:00
Jason Garrett-Glaser	4f717c69ed	idct_dc for VC-1/WMV3 decoder; ~11% faster decoding overall. Includes mmx2 asm for the various functions. Note that the actual idct still does not have an x86 SIMD implementation. For wmv3 files using regular idct, the decoder just falls back to simple_idct, since simple_idct_dc doesn't exist (yet). Originally committed as revision 19204 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-06-16 09:00:55 +00:00
Ramiro Polla	74a841af8b	Replace more uses of __attribute__((aligned)) by DECLARE_ALIGNED. Originally committed as revision 19089 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-06-04 23:25:09 +00:00
Ramiro Polla	989b7181ac	Use fewer macros in x86-optimized mlpdsp. Fixes compilation on 32-bit llvm which didn't allow a cast in an m operand. Originally committed as revision 19086 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-06-03 23:48:28 +00:00
Alexander Strange	2b9969a945	H264: Fix out of bounds reads in SSSE3 MC Reading above src[-2] isn't safe, so move loads and palignr ahead 3 pixels to load starting at the first pixel actually used. Fixes issue941. Originally committed as revision 18999 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-05-30 22:19:14 +00:00
Ramiro Polla	7c4c60e520	mlp: Use LABEL_MANGLE() to export label symbols from inside asm block. Originally committed as revision 18935 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-05-25 01:52:05 +00:00
Ramiro Polla	5624766d18	MLP DSP functions x86-optimized. 12.59% overall speedup in x86_32 9.98% overall speedup in x86_64 compared to gcc 4.3.3 Originally committed as revision 18903 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-05-23 00:23:30 +00:00
David Conrad	c21c835b8d	avg_ pixel functions need to use (dst+pix+1)>>1 to average with existing pixels, not (dst+pix)>>1. This makes the mmx functions bitexact with the C functions. Originally committed as revision 18527 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-15 19:10:16 +00:00
David Conrad	9bf0fdf378	VC1: extend MMX qpel MC to include MMX2 avg qpel Originally committed as revision 18519 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-15 02:25:42 +00:00
David Conrad	8013da7364	VC1: add and use avg_no_rnd chroma MC functions Originally committed as revision 18518 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-14 23:56:10 +00:00
David Conrad	c374691b28	Rename put_no_rnd_h264_chroma* to reflect its usage in VC1 only Originally committed as revision 18517 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-14 23:55:39 +00:00
Michael Niedermayer	cfe675269b	Do not use SSE2 SAD for snow as it requires more alignment than can be easily provided. Fixes issue315. Originally committed as revision 18404 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-09 21:53:48 +00:00
Stefano Sabatini	6b4343616c	Rename FF_MM_MMXEXT to FF_MM_MMX2, for both clarity and consistency with libswscale. Originally committed as revision 18330 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-04 13:20:53 +00:00
Reimar Döffinger	0be9e73e38	Mark line_skip3 asm argument as output-only instead of using av_uninit. Originally committed as revision 18327 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-03 14:03:49 +00:00
Reimar Döffinger	d7460a9cac	Mark put_signed_pixels_clamped_mmx output operands as early-clobber because they are. Hopefully fixes some FATE errors, too. Originally committed as revision 18326 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-03 14:02:34 +00:00
Reimar Döffinger	531a3d2721	Use DECLARE_ASM_CONST for non-global ff_vector128 constant used via MANGLE Originally committed as revision 18325 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-03 14:01:24 +00:00
Alex Converse	3dd6531208	Rewrite put_signed_pixels_clamped_mmx() to eliminate mmx.h from dsputil_mmx.c. Originally committed as revision 18319 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-02 21:02:42 +00:00
David Conrad	710441c2f6	Add SSE4 detection support Originally committed as revision 18302 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-01 09:11:32 +00:00
Matthieu Castet	ecf05a5971	Remove useless casting in asm "m" operand. Patch by Matthieu Castet, castet D matthieu A free D fr Originally committed as revision 18054 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-03-19 23:29:11 +00:00
Zuxy Meng	d05f808dc9	Remove CPUID availability check on AMD64 as it's architectural. Originally committed as revision 17543 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-23 15:53:39 +00:00
Jason Garrett-Glaser	e27ad11840	Convert x264 asm files to proper unix line breaks Originally committed as revision 17524 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-22 11:35:32 +00:00
Jason Garrett-Glaser	9bd5f59b33	Remove (incorrect) filenames from x264 asm files, add descriptions. Originally committed as revision 17523 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-22 11:33:09 +00:00
Alexander Strange	b6188c5a55	Put dispatch_tab in the rodata section for macho64. This fixes linking shared libavcodec, since the linker doesn't allow text relocations in shared libraries under Darwin/x86_64. Based on a patch by Art Clarke (aclarke xuggle com) Originally committed as revision 17197 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-13 00:57:22 +00:00
Zuxy Meng	ecb24904fe	add SSE2 version of vp6_filter_diag original patch by Zuxy Meng zuxy.meng _at_ gmail _dot_ com Originally committed as revision 17195 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-13 00:02:33 +00:00
Sebastien Lucas	6af3c226c3	add MMX version of vp6_filter_diag original patch by Sebastien Lucas sebastien.lucas _at_ gmail _dot_ com Originally committed as revision 17194 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-12 23:52:52 +00:00
Aurelien Jacobs	5110b25e1e	convert ff_pw_64 into an xmm_reg for future use in vp6 sse code Originally committed as revision 17192 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-12 23:48:07 +00:00
Diego Biurrun	15c13dde98	Fix wrong file name in header, noticed by David DeHaven, dave sagetv com. Originally committed as revision 17158 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-11 16:12:04 +00:00
Diego Biurrun	d3a4b4e09c	Add check whether the compiler/assembler supports 10 or more operands. thanks to Loren for some help with the asm statements Originally committed as revision 17151 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-11 11:16:00 +00:00
Stefan Gehrer	e090c70f2f	avoid duplicating dsputil's clear_block Originally committed as revision 17135 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-10 16:45:02 +00:00
Diego Biurrun	ea399a87b2	Remove svn:executable property from source file. Originally committed as revision 17098 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-09 11:32:29 +00:00
Loren Merritt	3daa434a40	ff_add_hfyu_median_prediction_mmx2 overall ffvhuff decoding speedup: 28% on core2, 25% on k8. Originally committed as revision 17059 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-08 17:45:30 +00:00
Loren Merritt	6166516d1f	re-enable mid_pred asm on x86_64. (broke in r16681) Originally committed as revision 17058 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-08 17:44:04 +00:00
Baptiste Coudurier	353f87b8d4	fix typo in h264dsp_mmx (no effect currently as the function is not used), approved by Dark Shikari on IRC Originally committed as revision 17046 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-08 06:35:21 +00:00
Diego Biurrun	bad5537e2c	Use full internal pathname in doxygen @file directives. Otherwise doxygen complains about ambiguous filenames when files exist under the same name in different subdirectories. Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-01 02:00:19 +00:00
David Conrad	137ae32760	Workaround for gcc 3.4 to align sh properly Originally committed as revision 16797 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-26 03:40:48 +00:00
Diego Biurrun	406792e7b0	cosmetics: Remove pointless period after copyright statement non-sentences. Originally committed as revision 16684 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-19 15:46:40 +00:00
Aurelien Jacobs	199436b952	moves mid_pred() into mathops.h (with arch specific code split by directory) Originally committed as revision 16681 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-18 22:57:40 +00:00
Aurelien Jacobs	49fb20cb8a	replace all occurrence of ENABLE_ by the corresponding CONFIG_, HAVE_ or ARCH_ and remove all ENABLE_ definitions. Originally committed as revision 16600 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-14 17:19:17 +00:00
Aurelien Jacobs	b250f9c66d	Change semantic of CONFIG_, HAVE_ and ARCH_*. They are now always defined to either 0 or 1. Originally committed as revision 16590 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-13 23:44:16 +00:00
Ramiro Polla	1bb04d5a44	configure: allow to disable sse code. Based on patch by Michael Kostylev <mik at it-1 dot ru> Originally committed as revision 16490 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-07 23:38:54 +00:00
Diego Biurrun	c47d146be8	Add missing 'void' keyword to parameterless function declarations. Originally committed as revision 16436 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-05 13:57:43 +00:00
Mathieu Velten	21ff7689da	Use H264 MMX chroma functions to accelerate RV40 decoding. Patch by Mathieu Velten (matmaul A gmail) Originally committed as revision 16419 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-04 01:36:11 +00:00
Jason Garrett-Glaser	37fed10087	Add x264 SSE2 iDCT functions to H.264 decoder. Originally committed as revision 16409 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-03 00:46:17 +00:00
Carl Eugen Hoyos	2c67c65963	Fix h264 decoding on SSE2 cores with icc compilation. Originally committed as revision 16373 to svn://svn.ffmpeg.org/ffmpeg/trunk	2008-12-28 19:40:13 +00:00
Jason Garrett-Glaser	c1fc70362f	Fix compilation without optimization under 64-bit with x264 deblock asm enabled. Originally committed as revision 16313 to svn://svn.ffmpeg.org/ffmpeg/trunk	2008-12-26 00:19:08 +00:00
Diego Biurrun	a6493a8fbd	Rename libavcodec/i386/ --> libavcodec/x86/. It contains optimizations that are not specific to i386 and libavutil uses this naming scheme already. Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk	2008-12-22 09:12:42 +00:00

535 Commits