This is faster for videos that have lots of MBs that fall in this category.
Originally committed as revision 21400 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is consistent with other codecs and will also avoid a crash on the
memcpy to data[1] if AVPALETTE_SIZE ever increases.
Originally committed as revision 21399 to svn://svn.ffmpeg.org/ffmpeg/trunk
inv_zigzag_direct16 16-byte aligned, so mark it appropriately.
Fixes encoder crashes e.g. with MPlayer's -vf lavc.
Originally committed as revision 21389 to svn://svn.ffmpeg.org/ffmpeg/trunk
Change order of operands as gcc uses a hardcoded register per operand it seems
even for static functions
thus reducing unneeded moved (now functions try to pass the same argument in
the same spot).
Change signed int to unsigned int for array indexes as signed requires signed
extension while unsigned is free.
move the +52 up and merge it where it will end as a lea instruction, gcc always
splits the 52 out there turning the free +52 into an expensive one otherwise.
The changed code becomes a little faster.
Originally committed as revision 21375 to svn://svn.ffmpeg.org/ffmpeg/trunk
This should be faster (couldnt meassue a difference), and its less picky
on slightly out of spec dquant.
Originally committed as revision 21373 to svn://svn.ffmpeg.org/ffmpeg/trunk
This makes ffmpeg stop printing millions of
Multiple frames in a packet from stream 0
when decoding adpcm.
Originally committed as revision 21362 to svn://svn.ffmpeg.org/ffmpeg/trunk
The various avcodec_thread_init() functions are updated to return
immediately after setting avctx->thread_count. This allows -threads 0
to pass through to codecs. It also simplifies the usage for apps
using libavcodec.
Originally committed as revision 21358 to svn://svn.ffmpeg.org/ffmpeg/trunk
It allows VLD H264 decoding using DXVA2 (GPU assisted decoding API under
VISTA and Windows 7).
It is implemented by using AVHWAccel API. It has been tested successfully
for some time in VLC using an nvidia card on Windows 7.
To compile it, you need to have the system header dxva2api.h (either from
microsoft or using http://downloads.videolan.org/pub/videolan/testing/contrib/dxva2api.h)
The generated libavcodec.dll does not depend directly on any new lib as
the necessary objects are given by the application using FFmpeg.
Originally committed as revision 21353 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes gcc failing to fit 6 memory locations into 7 registers on x86-32
Originally committed as revision 21337 to svn://svn.ffmpeg.org/ffmpeg/trunk
all inlined, its small and horizontal & vertical versions are build out of
them. no change as gcc already did this.
Originally committed as revision 21333 to svn://svn.ffmpeg.org/ffmpeg/trunk
With b_keyframe instead of IDR for detecting keyframes, ffmpeg should now
support periodic encoding with periodic intra refresh (although there is no
interface option for it yet).
Set the new timebase values for full VFR input support.
Bump configure to check for API version 83.
Originally committed as revision 21317 to svn://svn.ffmpeg.org/ffmpeg/trunk
Thats not possible except maybe in FMO which noone uses anyway.
iam also not sure if this wasnt missing a part_width.
Originally committed as revision 21312 to svn://svn.ffmpeg.org/ffmpeg/trunk
loop filter. This removes one obstacle of getting ff_h264_filter_mb_fast()
bitexact. code is maybe 0.1% faster
Originally committed as revision 21280 to svn://svn.ffmpeg.org/ffmpeg/trunk
Run loop filter per row instead of per MB, this also should make it
much easier to switch to per frame filtering and also doing so in a
seperate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)
This change also allows some optimizations to be tried that would not have
been possible before.
Originally committed as revision 21270 to svn://svn.ffmpeg.org/ffmpeg/trunk
Fixes build with --disable-encoders --enable-encoder=snow.
This fixes MPlayer build with --disable-mencoder.
Originally committed as revision 21259 to svn://svn.ffmpeg.org/ffmpeg/trunk
~200 bytes smaller ff_h264_filter_mb()
please everyone, NEVER add code with the assumtation that gcc will remove it
without checking gcc actually does. Chances are it does not.
Originally committed as revision 21251 to svn://svn.ffmpeg.org/ffmpeg/trunk
and 5% faster.
ff_h264_filter_mb_fast() stay the same size as gcc decided not to inline these
functions there in the first place.
Originally committed as revision 21250 to svn://svn.ffmpeg.org/ffmpeg/trunk
No benchmark because its just replacing variables with litteral constants
(so no risk for slowdown outside gcc silliness) and i need sleep.
Originally committed as revision 21237 to svn://svn.ffmpeg.org/ffmpeg/trunk
Using the low-level macros directly avoids redundant open/update/close
cycles.
2-3% faster on ARM, PPC, and Core i7.
Originally committed as revision 21224 to svn://svn.ffmpeg.org/ffmpeg/trunk
This could have caused the linking failure of pred_pskip_motion() missing if
a compiler included never used static functions.
Originally committed as revision 21221 to svn://svn.ffmpeg.org/ffmpeg/trunk
About 1% faster ff_ac3_bit_alloc_calc_psd on Intel Atom, overall speedup
not measurable though.
Should have a bigger effect on systems without cmov or with very slow cmov.
Originally committed as revision 21214 to svn://svn.ffmpeg.org/ffmpeg/trunk
Since BGR24 is decoded as BGR32, fill its alpha channel with 255
using the appropriate predictors.
Originally committed as revision 21211 to svn://svn.ffmpeg.org/ffmpeg/trunk
Simplify cur_band_type, group_len, and coef/offset calculations. This
makes the code easier to read and slightly faster.
Originally committed as revision 21189 to svn://svn.ffmpeg.org/ffmpeg/trunk
The codebooks each consist of small number of values repeated in
groups of 2 or 4. Storing the codebooks as a packed list of 2- or
4-bit indexes into a table reduces their size substantially (from 7.5k
to 1.5k), resulting in less cache pressure.
For the band types with sign bits in the bitstream, storing the number
and position of non-zero codebook values using a few bits avoids
multiple get_bits() calls and floating-point comparisons which gcc
handles miserably.
Some float/int type punning also avoids gcc brain damage.
Overall speedup 20-35% on Cortex-A8, 20% on Core i7.
Originally committed as revision 21188 to svn://svn.ffmpeg.org/ffmpeg/trunk
Two of these are in fact constant size, so use the constant instead of
a variable in the declarations. The remaining one is small enough
that always using the maximum size is acceptable.
Originally committed as revision 21183 to svn://svn.ffmpeg.org/ffmpeg/trunk
Seems to speed the code up a little...
The placement of many generic functions between h264.c and h264.h is still open
Currently they are a little randomly placed between them.
Originally committed as revision 21178 to svn://svn.ffmpeg.org/ffmpeg/trunk
called once per MB in worst case and doesnt seem to benefit from static inline.
Actually the code might be a hair faster now (0.1% according to my benchmark but
this could be random noise)
Originally committed as revision 21173 to svn://svn.ffmpeg.org/ffmpeg/trunk
no speedloss meassured, also its really not touching anything that is speed relevant.
Originally committed as revision 21169 to svn://svn.ffmpeg.org/ffmpeg/trunk
No speedloss meassured (its slightly faster here but that may be random fluctuations)
Originally committed as revision 21165 to svn://svn.ffmpeg.org/ffmpeg/trunk
functions called more than per mb are moved into the header, scan8 is also
as it must be known at compiletime.
The code after this patch duplicates h264data.h, this has been done to minimize
the changes in this step and allow more fine grained benchmarking.
Speedwise this is 1% faster on my pentium dual core with diegos cursed cathedral
sample.
Originally committed as revision 21157 to svn://svn.ffmpeg.org/ffmpeg/trunk
The maximum length of escape_sequence is 21 bits, so adjust limit in
code to match this.
Up to 10% faster on Cortex-A8.
Originally committed as revision 21153 to svn://svn.ffmpeg.org/ffmpeg/trunk
The maximum length of escape_sequence is 21 bits, so adjust limit in
code to match this. Also fix the comment.
Originally committed as revision 21151 to svn://svn.ffmpeg.org/ffmpeg/trunk
Fixes warnings:
libavcodec/mpegvideo_enc.c:574: warning: implicit declaration of function
'ff_match_2uint16'
libavcodec/ituh263enc.c:143: warning: implicit declaration of function
'ff_match_2uint16'
libavcodec/svq1enc.c:97: warning: implicit declaration of function
'ff_match_2uint16'
Originally committed as revision 21133 to svn://svn.ffmpeg.org/ffmpeg/trunk
this makes the 9/7 C wavelet at the decoder side 22% faster.
The old code is changed to match the new in terms of the order of operations
(which also makes it sligtly faster)
Originally committed as revision 21132 to svn://svn.ffmpeg.org/ffmpeg/trunk
decoder which allows their usage without checking profile_idc.
Patch by Laurent Aimar (fenrir (AT) videolan org)
Originally committed as revision 21107 to svn://svn.ffmpeg.org/ffmpeg/trunk