112 Commits

Author SHA1 Message Date
Michael Niedermayer
eb73bf723d x86 asm version of the decode significance loop (not 8x8) of decode_residual(); 5% faster decode_residual() on P3
Originally committed as revision 6724 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-17 22:18:29 +00:00
Michael Niedermayer
4041a495a8 cosmetic (%%eax->%0)
Originally committed as revision 6717 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-17 09:38:37 +00:00
Diego Biurrun
8dda3e796b Fix crash with illegal instruction, cmov is available on 686 and later only.
Originally committed as revision 6715 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-16 21:47:19 +00:00
Diego Biurrun
e962604f1c Expand some #endif comments.
Originally committed as revision 6714 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-16 21:22:47 +00:00
Michael Niedermayer
165c5f0909 fix !CMOV_IS_FAST case (I am not really happy with the fix, but I didn't come up with a better one quickly)
Originally committed as revision 6707 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-16 11:11:20 +00:00
Michael Niedermayer
1d7c111856 10l
Originally committed as revision 6704 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-15 21:04:10 +00:00
Michael Niedermayer
faff3a7ad0 this code will not work with PIC, as it needs 7 registers and gcc doesn't support that in PIC
Originally committed as revision 6703 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-15 20:50:05 +00:00
Michael Niedermayer
f24a515931 shift CABACContext.range right; this reduces the number of shifts needed in get_cabac() and is slightly faster on P3 (and should be much faster on P4, as the P4, except the more recent variants, lacks an integer shifter, so shifts have ~10 times longer latency than simple operations like adds)
Originally committed as revision 6702 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-15 20:40:50 +00:00
Michael Niedermayer
68a205edef dehack *ps_state indexing in the branchless decoder
Originally committed as revision 6683 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-13 14:21:25 +00:00
Michael Niedermayer
12ff5b0f3b add "memory" to the clobber list; we change memory, so we need it. This also fixes some problems with gcc svn
Originally committed as revision 6679 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-12 21:32:56 +00:00
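
The "memory" clobber mentioned in this commit can be illustrated with a minimal sketch. This is not the code from the commit: `asm_store` and `store_and_reload` are hypothetical names, and the example assumes GCC-style extended asm on an x86 target (with a plain C fallback elsewhere). Without the clobber, the compiler would be free to assume `*p` is still 0 after the asm statement.

```c
/* Hypothetical helper: stores v to *p from inside an asm statement. */
void asm_store(int *p, int v)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __asm__ volatile(
        "movl %1, (%0)"      /* AT&T syntax: write v to the address held in p */
        :                    /* no outputs */
        : "r"(p), "r"(v)     /* inputs: pointer and value */
        : "memory"           /* tell the compiler that memory was modified */
    );
#else
    *p = v;                  /* plain C fallback on other targets */
#endif
}

/* The return value must be reloaded from memory after the asm statement. */
int store_and_reload(int *p, int v)
{
    *p = 0;
    asm_store(p, v);
    return *p;
}
```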
Michael Niedermayer
851ded8918 prevent "mb level" get_cabac() calls from being inlined (3% faster decode_mb_cabac() on P3)
Originally committed as revision 6674 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-12 14:49:19 +00:00
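
One common way to keep a small function from being inlined at selected call sites is an out-of-line wrapper. The sketch below is an assumption about the general technique, not the commit's actual change: `toy_get_bit` and `toy_get_bit_noinline` are hypothetical names, and `__attribute__((noinline))` is the GCC/Clang attribute.

```c
/* Hypothetical stand-in for a small hot function that is inlined in tight loops. */
static inline int toy_get_bit(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (int)((*state >> 16) & 1u);
}

/* Out-of-line wrapper for call sites where inlining only bloats the caller. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
int toy_get_bit_noinline(unsigned *state)
{
    return toy_get_bit(state);
}
```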
Guillaume Poirier
a0490b324a adds some useful comments after some of the #else, #elseif,
#endif preprocessor directives to make it clearer which code
block depends on which #define xx

Originally committed as revision 6668 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-12 07:51:18 +00:00
Diego Biurrun
c26abfa541 Rename ABS macro to FFABS.
Originally committed as revision 6666 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 23:17:58 +00:00
Michael Niedermayer
1f4d5e9f69 slightly faster on P3, slightly slower on athlon and probably faster on P4
Originally committed as revision 6663 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 17:59:40 +00:00
Michael Niedermayer
2b5269b51c moving lps state transition code a little up in the branched asm code (1% faster on P3)
Originally committed as revision 6658 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 16:39:50 +00:00
Michael Niedermayer
b99f3cabed write cabac low and range variables as early as possible to prevent stalls from reading them before they were written; the P4 is said to dislike that a lot, on P3 it's 2% faster (START/STOP_TIMER over decode_residual)
Originally committed as revision 6657 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 16:11:41 +00:00
Michael Niedermayer
d17faef011 use ecx instead of cl (no speed change on P3 but might avoid partial register stalls on some cpus)
Originally committed as revision 6656 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 15:20:08 +00:00
Michael Niedermayer
d61c4e731e make state transition tables global as they are constant and the code is slightly faster that way
Originally committed as revision 6655 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 14:44:17 +00:00
Michael Niedermayer
5f3eca121e 10l
Originally committed as revision 6654 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 13:25:29 +00:00
Michael Niedermayer
0fa352c7e6 make lps_range a global table, it's constant anyway (saves 1 addition for accessing it)
Originally committed as revision 6653 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 13:21:42 +00:00
Michael Niedermayer
3650b43959 enable CMOV_IS_FAST as it's faster or equal speed on every cpu (duron, athlon, PM, P3) from which I've seen benchmarks; it might be slower on P4 but no one has posted benchmarks ...
Originally committed as revision 6652 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-11 12:23:40 +00:00
Diego Biurrun
0bc2e7f081 BRANCHLESS_CABAD --> BRANCHLESS_CABAC_DECODER
Originally committed as revision 6623 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-10 08:16:41 +00:00
Michael Niedermayer
9ed92c65f1 moving another bit&1 out, this is as fast as with it in there, but it makes more sense with it outside of the loop
Originally committed as revision 6618 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-10 06:56:51 +00:00
Michael Niedermayer
f1b37db48d move the &1 out of the asm so gcc can optimize it away in inlined cases (yes this is slightly faster)
Originally committed as revision 6616 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-10 01:17:39 +00:00
Michael Niedermayer
ab0151d163 replace a few and/sub/... by cmov
this is faster on P3, should be faster on AMD, and should be slower on P4
it's disabled by default (benchmarks welcome so we know when to enable it)

Originally committed as revision 6615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-10 01:08:39 +00:00
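
As a rough illustration of the trade-off this commit describes, the sketch below contrasts a select built from mask arithmetic (the and/sub style) with a plain conditional that a compiler targeting 686 or later may turn into cmov. Both function names are hypothetical, and whether cmov is actually emitted depends on the compiler and flags.

```c
#include <stdint.h>

/* Select with mask arithmetic: mask is all ones when cond is nonzero. */
uint32_t select_mask(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = 0u - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Plain conditional; a compiler may emit cmov here, but that is not guaranteed. */
uint32_t select_ternary(uint32_t cond, uint32_t a, uint32_t b)
{
    return cond ? a : b;
}
```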
Michael Niedermayer
a6672acf45 reading 8bit mem into an 8bit register needs 2 uops on P4, 8bit->32bit with zero extension needs just 1
Originally committed as revision 6612 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 21:57:10 +00:00
Michael Niedermayer
2d3df05ca0 on the P4 inc needs twice as much time as add
Originally committed as revision 6611 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 21:39:07 +00:00
Michael Niedermayer
2ee9dc65be 10l
Originally committed as revision 6610 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 21:21:10 +00:00
Michael Niedermayer
7822e1c1ff reverse remainder of the failed attempt to optimize *state=c->mps_state[s]
Originally committed as revision 6609 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 21:14:16 +00:00
Michael Niedermayer
ef0090a998 x86 branchless cabac decoder
slightly faster on P3

Originally committed as revision 6608 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 20:51:33 +00:00
Michael Niedermayer
2e1aee80f4 optimize branchless C CABAC decoder
Originally committed as revision 6607 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 20:44:11 +00:00
Michael Niedermayer
1c2a417f6a move commented-out START/STOP_TIMER to a hopefully better place for benchmarking ...
Originally committed as revision 6605 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 18:20:00 +00:00
Michael Niedermayer
30dc5f56ad drop failed attempt to optimize *state= c->mps_state[s];
Originally committed as revision 6604 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 15:52:17 +00:00
Michael Niedermayer
c56d23dacf 10l bugfix for some disabled code
Originally committed as revision 6603 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 14:15:53 +00:00
Michael Niedermayer
f7d0b68361 first try of a handwritten get_cabac() for x86, this is 10-20% faster on P3 depending on whether you try to subtract the START/STOP_TIMER overhead
Originally committed as revision 6602 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 14:15:14 +00:00
Michael Niedermayer
5bbe2a5292 remove bytestream_end checks, seems to work fine without them and the bitstream reader doesn't check for the end either
Originally committed as revision 6599 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 12:25:24 +00:00
Michael Niedermayer
c010d69a75 decrease ff_h264_norm_shift[] size
Originally committed as revision 6596 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-09 00:59:42 +00:00
Michael Niedermayer
6ff042699f cleanup
Originally committed as revision 6594 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-08 21:26:08 +00:00
Michael Niedermayer
260ceb6322 branchless renormalization (1% faster get_cabac); the old branchless renormalization wasn't faster because gcc was scared of the shift variable (misusing the bit variable now)
Originally committed as revision 6587 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-08 13:20:22 +00:00
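
The idea of a branchless single-step renormalization can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed code: `ToyCabac` is a hypothetical struct rather than FFmpeg's CABACContext, the range is treated as normalized when it is at least 0x100, and the refill of `low` from the bytestream is left out.

```c
#include <stdint.h>

/* Hypothetical, simplified decoder state. */
typedef struct {
    uint32_t low;    /* current code value */
    uint32_t range;  /* interval width, normalized when >= 0x100 */
} ToyCabac;

/* Branching form: double range and low when range has dropped below 0x100. */
void renorm_once_branch(ToyCabac *c)
{
    if (c->range < 0x100) {
        c->range <<= 1;
        c->low   <<= 1;
    }
}

/* Branchless form: (range - 0x100) wraps around and sets the top bit exactly
   when range < 0x100, so the shift count is 1 in that case and 0 otherwise. */
void renorm_once_branchless(ToyCabac *c)
{
    uint32_t shift = (c->range - 0x100u) >> 31;
    c->range <<= shift;
    c->low   <<= shift;
}
```

Both functions give the same result for the small range values a decoder of this kind works with; the second simply trades the conditional jump for a subtraction and a shift.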
Michael Niedermayer
99ce10873d 5% faster get_cabac()
Originally committed as revision 6586 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-08 11:24:37 +00:00
Michael Niedermayer
400d0f8e47 disable benchmarking code
disable asm optims as the fastest depends on cpu type

Originally committed as revision 6582 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-07 22:37:34 +00:00
Michael Niedermayer
4310580db5 renorm_cabac_decoder_once START/STOP_TIMER scores for athlon
Originally committed as revision 6581 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-07 22:34:32 +00:00
Michael Niedermayer
5659b509c7 refill cabac variables in 16bit steps, 3% faster get_cabac()
Originally committed as revision 6578 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-07 15:44:14 +00:00
Diego Biurrun
b78e7197a8 Change license headers to say 'FFmpeg' instead of 'this program/this library'
and fix GPL/LGPL version mismatches.

Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-07 15:30:46 +00:00
Michael Niedermayer
2ae7569dc8 () 10l
Originally committed as revision 6576 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-07 12:41:55 +00:00
Michael Niedermayer
ec8f483ab5 several x86 renorm_cabac_decoder_once optimizations
START/STOP_TIMER benchmarking code for them
please benchmark on P4 & athlon
(I'll remove the benchmarking code and the always slower variants as soon as p4/athlon benchmarks have been posted or committed)

Originally committed as revision 6573 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-07 11:15:10 +00:00
Loren Merritt
938dd84693 don't try to inline cabac functions. gcc ignored the hint anyway, and forcing it would make h264 slower.
Originally committed as revision 6549 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-10-04 07:16:10 +00:00
Loren Merritt
bfe328caf0 tweak cabac. 0.5% faster h264.
Originally committed as revision 6106 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-08-27 09:19:02 +00:00
Loren Merritt
2848ce84d2 don't force asserts in release builds. 2% faster h264.
Originally committed as revision 5332 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-04-29 00:43:15 +00:00
Diego Biurrun
5509bffa88 Update licensing information: The FSF changed postal address.
Originally committed as revision 4842 to svn://svn.ffmpeg.org/ffmpeg/trunk
2006-01-12 22:43:26 +00:00