(576i50 25Mbps 4:1:1 special case was wrong).
Fixes issue2211
Patch by Niobos, niobos dest-unreach be
Originally committed as revision 25269 to svn://svn.ffmpeg.org/ffmpeg/trunk
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.
Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
increase to e.g. vc1, snow and mpeg decoding.
Patch by Eli Friedman <eli dot friedman gmail com>.
Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
That way the user app can set codec specific parameters in the private context
before opening it.
Originally committed as revision 25257 to svn://svn.ffmpeg.org/ffmpeg/trunk
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2().
Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
a pxor, or remove the instruction alltogether. Altogether, this saves 1
instruction.
Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.
Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.
Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.
Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
r25218 made assumptions about the existence of past reference frames that
weren't necessarily true.
Originally committed as revision 25243 to svn://svn.ffmpeg.org/ffmpeg/trunk
It was being set wrong for files with data_offset > 0
Patch by Michael Chinen, mchinen gmail
Originally committed as revision 25239 to svn://svn.ffmpeg.org/ffmpeg/trunk
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.
See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
code directly also and remove loop setup. 20% faster in function, 0.8% overall.
See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
but it used to do it only for h264 codec.
Allow it for other codecs, as mpeg2 and mpeg4 also set this flag.
Originally committed as revision 25156 to svn://svn.ffmpeg.org/ffmpeg/trunk
Apparently Apple platforms do not handle movw/movt relocations
properly, leading to runtime crashes in code using them.
Originally committed as revision 25150 to svn://svn.ffmpeg.org/ffmpeg/trunk
which will hopefully solve the Win64/FATE failures caused by these functions.
Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk
have been accepted before).
Also do not fail if they are invalid but instead override them to 0.
This allows decoding e.g. MPEG video when only the container values are corrupted.
For encoding a value of 0,0 of course makes no sense, but was allowed
through before and will be caught by an extra check in the encode function.
Originally committed as revision 25124 to svn://svn.ffmpeg.org/ffmpeg/trunk
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.
Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
Original patch by Zhou Zongyi, zhouzy A os pku edu cn, resubmitted by
James Darnley, james.darnley gmail, changes by me.
Originally committed as revision 25115 to svn://svn.ffmpeg.org/ffmpeg/trunk
For a PCE based configuration map the channels solely based on tags.
For an indexed configuration map the channels solely based on position.
This works with all known exotic samples including al17, elem_id0, bad_concat,
and lfe_is_sce.
Originally committed as revision 25098 to svn://svn.ffmpeg.org/ffmpeg/trunk
case the stride must be aligned to a multiple of 4.
The original CSCD encoder just compresses bitmaps it gets via Windows
API functions as-is, thus it uses exactly those alignment rules.
Originally committed as revision 25096 to svn://svn.ffmpeg.org/ffmpeg/trunk
mm_support instead.
Fix compilation if altivec is present and libxvidff.c is compiled.
Originally committed as revision 25075 to svn://svn.ffmpeg.org/ffmpeg/trunk
mm_support() instead.
Reduce complexity and simplify pending move to libavutil.
Originally committed as revision 25074 to svn://svn.ffmpeg.org/ffmpeg/trunk
Fixes a black line in non-swapped, non-mod-16-height Theora videos
when vp3_draw_horiz_band is used.
Originally committed as revision 25073 to svn://svn.ffmpeg.org/ffmpeg/trunk
Using floating-point here can cause erroneous rejection of
parameters due to rounding errors leading to a slightly too
large result.
This fixes the mxf regression test with gcc 4.5 on x86_32.
Originally committed as revision 25060 to svn://svn.ffmpeg.org/ffmpeg/trunk
that result in overdrawing areas again and again if s->flipped_image
is false.
Originally committed as revision 25051 to svn://svn.ffmpeg.org/ffmpeg/trunk
The #ifndef is required to allow for example some automated regression
tests by simply configuring with: --extra-cflags="-DFF_API_FOO=0".
Originally committed as revision 25043 to svn://svn.ffmpeg.org/ffmpeg/trunk
This increases compatibilty with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.
Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes building with --disable-everything --enable-muxer=rtp, closing
issue 2159.
Originally committed as revision 25036 to svn://svn.ffmpeg.org/ffmpeg/trunk
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.
Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.
Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.
Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
uint16_fast_t is unsigned int (or long) on Linux, which when compared
with int results in an unsigned compare.
Originally committed as revision 24994 to svn://svn.ffmpeg.org/ffmpeg/trunk
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).
Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
This removes duplicated definitions of 8x8 and 16x16 fullpel MC
functions with various names reducing dsputil.o by 8k on x86_64.
Originally committed as revision 24933 to svn://svn.ffmpeg.org/ffmpeg/trunk
The stride can be negative and must be sign extended before being
used in pointer arithmetic.
Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
Because the order of evaluation of subexpressions is undefined, two
get_bits() calls may not be part of the same expression.
See also r24902.
Originally committed as revision 24906 to svn://svn.ffmpeg.org/ffmpeg/trunk
Because the order of evaluation of subexpressions is undefined, two
get_bits() calls may not be part of the same expression. In this
specific case, using get_bits_long() is simpler.
This fixes msmpeg4v1 decoding with armcc.
Originally committed as revision 24902 to svn://svn.ffmpeg.org/ffmpeg/trunk
This performs quite a bit better than the current 3GPP-inspired window decision
on all the samples I have tested. On the castanets.wav sample it performs very
similar to iTunes window selection, and seems to perform better than Nero.
On fatboy.wav, it seems to perform at least as good as iTunes, if not better.
Nero performs horribly on this sample.
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 24892 to svn://svn.ffmpeg.org/ffmpeg/trunk
This allows cleaner implementation of other psymodels using the existing
structs. It also will make it easier to interchange individual parts of
the psymodel to create hybrid models.
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 24890 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is to avoid split asm sections that attempt to preserve some
registers between sections.
Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
since all of it ends up in a single 32-bit pixel.
This seems likely to be wrong though, since it is different
from the 15 and 16 bit modes and might explain the half-width
issue for 24 bit truemotion.
Originally committed as revision 24864 to svn://svn.ffmpeg.org/ffmpeg/trunk
BGR32 and not BGR24, do not swap red and blue on big-endian
for this format as well.
Originally committed as revision 24863 to svn://svn.ffmpeg.org/ffmpeg/trunk
The relevent extract of the iso 13818-2 about the value of the syntaxical
element top_field_first of the Picture Coding Extension is:
"top_field_first -- The meaning of this element depends upon picture_structure,
progressive_sequence and repeat_first_field.
[...]
In a field picture top_field_first shall have the value '0', and the only field
output by the decoding process is the decoded field picture."
Originally committed as revision 24853 to svn://svn.ffmpeg.org/ffmpeg/trunk
Conversion of an out of range float to int is undefined. Clipping to
the final range first avoids such problems. This fixes decoding on
MIPS, which handles these conversions differently from many other CPUs.
Originally committed as revision 24838 to svn://svn.ffmpeg.org/ffmpeg/trunk
ff_get_plane_bytewidth().
The new implementation is more generic, more compact and more correct.
Originally committed as revision 24786 to svn://svn.ffmpeg.org/ffmpeg/trunk
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.
Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
wrapper of it.
The new function is more generic, and does not depend on the
definition of the AVPicture struct.
Patch by S.N. Hemanth Meenakshisundaram s + "meenakshisundaram".substr(0, 7) + "@ucsd.edu".
Originally committed as revision 24768 to svn://svn.ffmpeg.org/ffmpeg/trunk
correct mb_xy to use mb_width.
tighten allocations.
reduce the amount of zeroing.
Originally committed as revision 24760 to svn://svn.ffmpeg.org/ffmpeg/trunk
two padding zeroes before it. Should fix fate failures on openBSD and crashes
on MacOSX (that I cannot reproduce).
Originally committed as revision 24750 to svn://svn.ffmpeg.org/ffmpeg/trunk
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.
Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
including libavcore/imgutils.h, which was required since the recent
avcodec_check_dimensions() -> av_check_image_size() transition.
Originally committed as revision 24734 to svn://svn.ffmpeg.org/ffmpeg/trunk
When sizeof(uint_fast8_t) >= sizeof(int) there are unintended size effects.
Originally committed as revision 24716 to svn://svn.ffmpeg.org/ffmpeg/trunk
Create a custom table for VP5/6/8's renorm to avoid depending on H.264's.
Saves one instruction in the arithmetic decoder as well.
Originally committed as revision 24701 to svn://svn.ffmpeg.org/ffmpeg/trunk
This reverts rev 24674 - the VP8 decoder actually depends on cabac.o.
vp8.c includes vp56.h, which includes cabac.h, which has inline functions
that reference tables from cabac.c.
This fixes compilation with --disable-everything --enable-decoder=vp8.
Originally committed as revision 24692 to svn://svn.ffmpeg.org/ffmpeg/trunk
Always inline the arithmetic coder, except in the case of header-parsing stuff,
in which case don't inline it at all to save code size.
Originally committed as revision 24677 to svn://svn.ffmpeg.org/ffmpeg/trunk
If sizeof uint_fast8_t > 1 and sizeof size_t <= 4, the expression that mallocs
classifs is susceptible to integer overflow.
Originally committed as revision 24675 to svn://svn.ffmpeg.org/ffmpeg/trunk
A lot of the time the DC block is empty: don't do the WHT in this case.
A lot of the rest of the time, there's only one coefficient: make a special
DC-only transform for that case.
When the block is empty, don't incorrectly mark luma DCT blocks as having DC
coefficients.
Originally committed as revision 24670 to svn://svn.ffmpeg.org/ffmpeg/trunk
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.
Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).
Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
can be known exactly, so larger frame sizes can be safely encoded without buffer
overwrite.
Originally committed as revision 24630 to svn://svn.ffmpeg.org/ffmpeg/trunk
av_fill_image_pointers() rather than their wrappers ff_fill_linesize()
and ff_fill_pointer().
Improve performance.
Originally committed as revision 24587 to svn://svn.ffmpeg.org/ffmpeg/trunk
libavcodec/imgconvert.c and make them public in libavcore/imgutils.h,
with the names av_fill_image_linesizes() and av_fill_image_pointers().
Originally committed as revision 24583 to svn://svn.ffmpeg.org/ffmpeg/trunk
in favor of the newly added corresponding functions
av_parse_video_size() and av_parse_video_rate() defined in
libavcore/parseutils.h.
This change also adds a linking-time dependency of libavcodec and of
libavfilter on libavcore.
Originally committed as revision 24518 to svn://svn.ffmpeg.org/ffmpeg/trunk
5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.
Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
fill_image_data_ptr(). ff_fill_linesize() and ff_fill_pointer() now wrap
these functions.
The new functions are more generic, and are going to be exported in a
future patch.
Patch by S.N. Hemanth Meenakshisundaram smeenaks # ucsd § edu.
Originally committed as revision 24512 to svn://svn.ffmpeg.org/ffmpeg/trunk
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.
Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...
Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.
Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
Reduce scalefactors in non-zero bands that underflow by twice as much as those
in bands that just fail to hit psy targets.
Originally committed as revision 24482 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is a lot more reliable to get cmov rather than trying to trick gcc into
generating it, useful since it's 2% faster overall.
Patch by Eli Friedman <eli.friedman at gmail>
Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
on the huffman tree, instead of traversing the tree in a while loop.
Based on the similar optimization in libvpx's detokenize.c
10% faster at normal bitrates, and 30% faster for high-bitrate intra-only
Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
No difference at the moment, but allows a future branchy variant
of vp56_rac_get_prob to be significantly faster
Originally committed as revision 24467 to svn://svn.ffmpeg.org/ffmpeg/trunk
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.
Originally committed as revision 24456 to svn://svn.ffmpeg.org/ffmpeg/trunk
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
Don't prefetch reference frames that were used less than 1/32th of the time so
far in the frame.
This helps speed up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-refs.
This is a very common property of videos encoded by libvpx.
Originally committed as revision 24451 to svn://svn.ffmpeg.org/ffmpeg/trunk
Prefetch all refs (including altref), but only if they've been used so far this
frame.
~2.5% faster overall.
TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple blocks of a rarely-used ref don't force us to prefetch
it.
Originally committed as revision 24444 to svn://svn.ffmpeg.org/ffmpeg/trunk
Uses a slightly nonintuitive ring buffer size of (width+height*2) to simplify
addressing logic.
Also split out the segmentation map to a separate structure, necessary to
implement the ring buffer.
Originally committed as revision 24426 to svn://svn.ffmpeg.org/ffmpeg/trunk
Gives better cache locality, since the VP8Macroblock structs are still in cache.
Inspired by the way x264 does it.
Originally committed as revision 24417 to svn://svn.ffmpeg.org/ffmpeg/trunk
As in the previous commit, they aren't used for context selection, so it saves
memory this way.
Originally committed as revision 24416 to svn://svn.ffmpeg.org/ffmpeg/trunk
Saves nothing except a bit of memory/cache now, but will allow future
optimizations.
Originally committed as revision 24411 to svn://svn.ffmpeg.org/ffmpeg/trunk
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.
Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.
Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
The ff_inverse table is used by FASTDIV macro, defined in libavutil, but up
to now the table was defined only in libavcodec.
After this change, the main copy of ff_inverse is part of libavutil (just
like FASTDIV), but if CONFIG_SMALL is unset, then a different copy is made
available to libavcodec, to avoid the performance penalty of using an
external look up table.
Dynamic linking works, because the libraries are linked with -Bsymbolic, so
the local copy of the symbol has priority over the external; static linking
works because the table is on a standalone object file in both libraries,
so the linker is able to discard one of the two.
Tested on Linux/x86-64 and Mac OS X/x86-64.
Originally committed as revision 24383 to svn://svn.ffmpeg.org/ffmpeg/trunk
Should fix compilation with icc and should help prevent any future duplicates
Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is the correct solution to the warning "fixed" in the previous
commit.
Originally committed as revision 24367 to svn://svn.ffmpeg.org/ffmpeg/trunk
libavcodec/vp8.c:892: warning: suggest explicit braces to avoid ambiguous `else'
Originally committed as revision 24366 to svn://svn.ffmpeg.org/ffmpeg/trunk
regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag,
FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
have been checked specifically on such CPUs and are actually faster than
their MMX counterparts.
In addition, use this flag to enable particular VP8 and LPC SSE2 functions
that are faster than their MMX counterparts.
Based on a patch by Loren Merritt <lorenm AT u washington edu>.
Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.
This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.
Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk