The codebooks each consist of a small number of values repeated in
groups of 2 or 4. Storing the codebooks as a packed list of 2- or
4-bit indexes into a table reduces their size substantially (from 7.5k
to 1.5k), resulting in less cache pressure.
For the band types with sign bits in the bitstream, storing the number
and position of non-zero codebook values using a few bits avoids
multiple get_bits() calls and floating-point comparisons, which gcc
handles miserably.
Some float/int type punning also avoids gcc brain damage.
Overall speedup 20-35% on Cortex-A8, 20% on Core i7.
Originally committed as revision 21188 to svn://svn.ffmpeg.org/ffmpeg/trunk
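The packed codebook storage described in the entry above can be sketched
roughly as follows. This is an illustration only, not the code from the
commit: the table contents, the PACK4 macro and unpack_group() are
hypothetical names, and only the 2-bit case is shown (four indexes per byte).

```c
#include <stdint.h>

/* Hypothetical table holding the few distinct values a codebook uses. */
static const float values[4] = { 0.0f, 1.0f, -1.0f, 2.0f };

/* Pack four 2-bit indexes into one byte, lowest bits first. */
#define PACK4(a, b, c, d) \
    ((uint8_t)((a) | ((b) << 2) | ((c) << 4) | ((d) << 6)))

/* Each byte stands for a group of four values, stored as indexes. */
static const uint8_t packed_codebook[] = {
    PACK4(0, 1, 2, 0),
    PACK4(1, 1, 0, 3),
};

/* Expand one packed group back into four floats via the value table. */
static void unpack_group(uint8_t packed, float out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = values[packed & 3];
        packed >>= 2;
    }
}
```

In this sketch a group of four floats (16 bytes) shrinks to a single byte
plus a share of the small value table, which is where the reduced cache
pressure would come from.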
Two of these are in fact constant size, so use the constant instead of
a variable in the declarations. The remaining one is small enough
that always using the maximum size is acceptable.
Originally committed as revision 21183 to svn://svn.ffmpeg.org/ffmpeg/trunk
Seems to speed the code up a little...
The placement of many generic functions between h264.c and h264.h is still open.
Currently they are split somewhat arbitrarily between the two.
Originally committed as revision 21178 to svn://svn.ffmpeg.org/ffmpeg/trunk
called once per MB in the worst case and doesn't seem to benefit from static inline.
Actually the code might be a hair faster now (0.1% according to my benchmark, but
this could be random noise).
Originally committed as revision 21173 to svn://svn.ffmpeg.org/ffmpeg/trunk
No speed loss measured; also, it's really not touching anything that is speed relevant.
Originally committed as revision 21169 to svn://svn.ffmpeg.org/ffmpeg/trunk
No speed loss measured (it's slightly faster here, but that may be random fluctuation).
Originally committed as revision 21165 to svn://svn.ffmpeg.org/ffmpeg/trunk
Functions called more than once per MB are moved into the header; scan8 is
moved as well, as it must be known at compile time.
The code after this patch duplicates h264data.h; this has been done to minimize
the changes in this step and allow more fine-grained benchmarking.
Speedwise, this is 1% faster on my Pentium dual core with Diego's cursed cathedral
sample.
Originally committed as revision 21157 to svn://svn.ffmpeg.org/ffmpeg/trunk
The maximum length of escape_sequence is 21 bits, so adjust limit in
code to match this.
Up to 10% faster on Cortex-A8.
Originally committed as revision 21153 to svn://svn.ffmpeg.org/ffmpeg/trunk
The maximum length of escape_sequence is 21 bits, so adjust limit in
code to match this. Also fix the comment.
Originally committed as revision 21151 to svn://svn.ffmpeg.org/ffmpeg/trunk
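The 21-bit limit in the two entries above follows from the shape of an
escape sequence, assuming the usual AAC layout: a prefix of N one bits, a
zero separator, then an (N + 4)-bit escape word, with N at most 8, giving
8 + 1 + 12 = 21 bits. A minimal sketch of such a reader, using a
hypothetical MSB-first BitReader rather than FFmpeg's own get_bits() API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not FFmpeg's GetBitContext. */
typedef struct {
    const uint8_t *buf;
    size_t size_bits;   /* total number of readable bits */
    size_t pos;         /* current bit position */
} BitReader;

/* Read one bit; reads past the end return 0 to keep the sketch simple. */
static unsigned read_bit(BitReader *br)
{
    if (br->pos >= br->size_bits)
        return 0;
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Decode one escape sequence; returns -1 if the prefix exceeds 8 ones. */
static int read_escape(BitReader *br)
{
    int n = 0;
    while (read_bit(br)) {          /* count ones up to the zero separator */
        if (++n > 8)
            return -1;              /* would exceed the 21-bit maximum */
    }
    int word = 0;
    for (int i = 0; i < n + 4; i++) /* escape word is n + 4 bits long */
        word = (word << 1) | (int)read_bit(br);
    return (1 << (n + 4)) + word;
}
```

Under these assumptions the longest valid sequence consumes exactly 21 bits
and decodes to 8191.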
Fixes warnings:
libavcodec/mpegvideo_enc.c:574: warning: implicit declaration of function
'ff_match_2uint16'
libavcodec/ituh263enc.c:143: warning: implicit declaration of function
'ff_match_2uint16'
libavcodec/svq1enc.c:97: warning: implicit declaration of function
'ff_match_2uint16'
Originally committed as revision 21133 to svn://svn.ffmpeg.org/ffmpeg/trunk
This makes the 9/7 C wavelet at the decoder side 22% faster.
The old code is changed to match the new in terms of the order of operations
(which also makes it slightly faster).
Originally committed as revision 21132 to svn://svn.ffmpeg.org/ffmpeg/trunk
decoder which allows their usage without checking profile_idc.
Patch by Laurent Aimar (fenrir (AT) videolan org)
Originally committed as revision 21107 to svn://svn.ffmpeg.org/ffmpeg/trunk