338 Commits

Author SHA1 Message Date
Panagiotis Issaris
9b5dc86746 Make vp3dsp*.c compilation optional.
Originally committed as revision 9025 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-14 14:28:13 +00:00
Reimar Döffinger
e36d79c837 Change some leftover __attribute__((unused)) and __attribute__((used)) to
attribute_unused and attribute_used respectively to ease compiling on non-gcc.

Originally committed as revision 9024 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-14 14:07:50 +00:00
Zuxy Meng
25e4f8aaee Faster SSE FFT/MDCT, patch by Zuxy Meng %zuxy P meng A gmail P com%
unrolls some loops, utilizing all 8 xmm registers. fft-test
shows ~10% speed up in (I)FFT and ~8% speed up in (I)MDCT on Dothan

Originally committed as revision 9017 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-13 16:32:32 +00:00
Loren Merritt
ff506a906e sse2 & ssse3 versions of dct_quantize.
core2: mmx2=154 sse2=73 ssse3=66 (cycles)
k8: mmx2=179 sse2=149
p4: mmx2=284 sse2=194

Originally committed as revision 9003 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-12 05:55:09 +00:00
Loren Merritt
1edbfe1994 factor sum_abs_dctelem out of dct_sad, and simd it.
sum_abs_dctelem_* alone:
core2: c=186 mmx2=39 sse2=21 ssse3=13 (cycles)
k8: c=163 mmx2=33 sse2=31
p4: c=370 mmx2=60 sse2=60
dct_sad including sum_abs_dctelem_*:
core2: c=405 mmx2=258 sse2=240 ssse3=232
k8: c=624 mmx2=394 sse2=392
p4: c=849 mmx2=556 sse2=556

Originally committed as revision 9001 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-12 02:41:25 +00:00
Loren Merritt
561f940c03 sse2 & ssse3 versions of hadamard. unroll and inline diff_pixels.
core2: before mmx2=193 cycles. after mmx2=174 sse2=122 ssse3=115 (cycles).
k8: before mmx2=205. after mmx2=184 sse2=180.
p4: before mmx2=342. after mmx2=314 sse2=309.

Originally committed as revision 9000 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-12 01:16:06 +00:00
Loren Merritt
ba53071acb 10l, r8991 broke mmx1 sad
Originally committed as revision 8993 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-11 03:29:06 +00:00
Loren Merritt
72946825fa sse2 version of fullpel sad.
16% faster on core2, 5% faster on p4. 10% slower (and thus disabled) on k8.

Originally committed as revision 8992 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-11 01:11:45 +00:00
Loren Merritt
164d75ebf3 tweak mmx2 sad.
40% faster on core2, 18% faster on k8, 5% faster on p4.

Originally committed as revision 8991 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-11 00:45:07 +00:00
Loren Merritt
eca3810e31 tweak mmx2 sad.
6% faster on core2 and k8, no change on p4.

Originally committed as revision 8984 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-10 22:24:19 +00:00
Loren Merritt
7c3a9fe2a3 sse2 version of fdct_col.
k8: 72->61 cycles, core2: 51->26 cycles.

Originally committed as revision 8966 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-10 03:13:41 +00:00
Loren Merritt
5adf43e47e cosmetics: remove code duplication in hadamard8_diff_mmx
Originally committed as revision 8946 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-09 01:46:33 +00:00
Loren Merritt
bba5293bb7 cosmetics: remove duplicate transpose macro
Originally committed as revision 8939 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-05-08 17:55:56 +00:00
Reimar Döffinger
a1ce61108b Fix parts missed in clip -> av_clip rename
Originally committed as revision 8760 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-04-19 16:12:06 +00:00
Diego Biurrun
fe0372296a typos
Originally committed as revision 8642 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-04-07 14:10:02 +00:00
Loren Merritt
5900637219 mmx 16-bit ssd. 2.3x faster svq1 encoding.
Originally committed as revision 8559 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-03-30 19:15:31 +00:00
Diego Biurrun
d42f88025a Fix wrong conditional, Snow decoding, not encoding, was SIMD-accelerated.
Originally committed as revision 8116 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-24 11:58:52 +00:00
Michael Niedermayer
58e31fb1d5 reorder a few more paddws to reduce dependency chains
chroma mc4 put 2480 -> 2460 dezicycles on duron

Originally committed as revision 8098 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-23 15:44:56 +00:00
Michael Niedermayer
b4fe97696c reorder paddws to reduce dependency chain
put_h264_chroma_mc2_mmx2() 927 -> 902 dezicycles on duron

Originally committed as revision 8097 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-23 15:28:35 +00:00
Michael Niedermayer
0c67082e02 shortening dependency chain in chroma mc2
Originally committed as revision 8095 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-23 15:03:30 +00:00
Michael Niedermayer
af26516261 remove now wrong comment
Originally committed as revision 8094 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-23 14:29:59 +00:00
Michael Niedermayer
61240ae556 fix chroma mc2 bug, this is based on a patch by (Oleg Metelitsa oleg hitron co kr)
and does slow the mc2 chroma put down, avg interestingly seems unaffected speedwise on duron
this of course should rather be done in a way which doesn't slow it down, but it's better a few %
slower but correct than incorrect

Originally committed as revision 8093 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-23 14:29:13 +00:00
Michael Niedermayer
470d2d03cc gcc 2.95 fix
Originally committed as revision 8059 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-22 00:04:36 +00:00
Måns Rullgård
459022f504 fix for x86-64
Originally committed as revision 8022 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-18 20:00:05 +00:00
Michael Niedermayer
b21e0b6dfc rewrite H264_CHROMA_MC4_TMPL (20% faster)
Originally committed as revision 8012 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-17 23:43:02 +00:00
Michael Niedermayer
2a115873af add a few asserts to ensure alignment
Originally committed as revision 7994 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-16 21:22:53 +00:00
Michael Niedermayer
00e210ddbb prevent h.264 MC related functions from being inlined (yes, this is much faster; the code just doesn't fit in the code cache otherwise)
Originally committed as revision 7993 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-16 21:21:07 +00:00
Reimar Döffinger
392b76ca93 Minor AMD64 compilation fix
Originally committed as revision 7907 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-10 13:33:08 +00:00
Michael Niedermayer
9bc0d3ef3e maybe fix x86_64 (untested)
Originally committed as revision 7906 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-10 03:56:49 +00:00
Michael Niedermayer
7c4fd7eb0c factor out common subexpression (gcc of course is too stupid to do this ...)
5% faster avg_h264_chroma_mc2_mmx2()
10% faster put_h264_chroma_mc2_mmx2()

Originally committed as revision 7898 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-09 12:37:38 +00:00
Michael Niedermayer
9301a0b4a9 merge asm fragments in H264_CHROMA_MC2_TMPL()
10% faster avg_h264_chroma_mc2_mmx2()
5% faster put_h264_chroma_mc2_mmx2()

Originally committed as revision 7897 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-02-09 12:24:22 +00:00
Panagiotis Issaris
9dd6c80453 Add the const specifier as needed to reduce the number of warnings.
Originally committed as revision 7764 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-01-30 10:31:34 +00:00
Diego Biurrun
9688979c51 Fix some more license headers.
Originally committed as revision 7637 to svn://svn.ffmpeg.org/ffmpeg/trunk
2007-01-22 01:16:42 +00:00
Guillaume Poirier
5a5c770d5a Add SSSE3 (Core2 aka Conroe/Merom/Woodcrest new instructions) detection
Originally committed as revision 7332 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-12-18 22:43:09 +00:00
Måns Rullgård
849f10351d rename always_inline to av_always_inline and move to common.h
Originally committed as revision 7256 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-12-08 00:35:08 +00:00
Måns Rullgård
486497e07b revert bad checkin
Originally committed as revision 7044 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-14 03:18:09 +00:00
Måns Rullgård
be6ed6fff4 move some CFLAGS settings away from config.* writing section
Originally committed as revision 7043 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-14 03:12:29 +00:00
Måns Rullgård
7466ed2f02 zigzag_direct_noperm doesn't exist, remove declaration
Originally committed as revision 6998 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-12 23:35:49 +00:00
Måns Rullgård
36cd306907 rename inverse -> ff_inverse
Originally committed as revision 6990 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-12 18:49:36 +00:00
Måns Rullgård
bb54f6ab62 adding more static keywords
Originally committed as revision 6976 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-12 03:34:12 +00:00
Michael Niedermayer
079e61db5d ensure alignment (no speed change)
Originally committed as revision 6891 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 16:54:05 +00:00
Michael Niedermayer
f5a9e8f33d merging mov & and (no speed change)
Originally committed as revision 6889 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 16:02:18 +00:00
Michael Niedermayer
e80cf125a7 2 instructions less (same speed)
Originally committed as revision 6888 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 15:40:57 +00:00
Michael Niedermayer
9347118237 comment about failed optimization
Originally committed as revision 6887 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 15:17:36 +00:00
Michael Niedermayer
38cfdc83f0 move luma tc0 related init into asm
5% faster filter_mb_fast() on P3

Originally committed as revision 6884 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 14:28:30 +00:00
Michael Niedermayer
25225c3773 2 instructions less in h264_loop_filter_luma_mmx2()
Originally committed as revision 6882 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 12:07:53 +00:00
Michael Niedermayer
bda2203d56 preempt possible overflow
Originally committed as revision 6881 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 11:07:35 +00:00
Michael Niedermayer
5a1553dee3 1 instruction less
Originally committed as revision 6880 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 09:59:15 +00:00
Michael Niedermayer
e9f1885c21 optimize H264_DEBLOCK_P0_Q0
2.5% faster filter_mb_fast() on P3

Originally committed as revision 6877 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-11-03 02:03:56 +00:00
Diego Biurrun
7c428ea681 Put libmpeg2 IDCT functions under CONFIG_GPL, fixes link failure
with --disable-opts.

Originally committed as revision 6691 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-14 17:04:50 +00:00