These modes were not originally exposed by the library at all.
In practice, only a few of them work for each sample rate/profile
combination, and they don't work at all for the more uncommon
sample rates.
Signed-off-by: Martin Storsjö <martin@martin.st>
Not all applications (e.g. MPlayer) set block_align, and
when using a different demuxer it might not even be
easily available.
So fall back to selecting mode based on bit rate as before
if block_align has not useful value.
It can't be worse than failing to decode completely.
(cherry picked from commit 1d0d63052b)
CC: libav-stable@libav.org
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Inline functions declared without extern do not provide an external
definition in standard C99. This code only works because most
compilers do not implement the inline semantics correctly. With a
stricter compiler, linking fails with unresolved references to these
functions.
Declaring the functions extern inline works correctly with some
compilers while some others still fail to create external definitions.
For maximum portability, create a static inline version with an
externally visible wrapper for ff_get_mb_score. ff_epzs_motion_search
is so large that no sane compiler inlines it anyway, so there the
inline keyword can simply be dropped with no effect.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Adds a flag context_reinit to MpegEncContext to relieable keep track
of frame parameter changes which require a context reinitialization.
This is required for broken inputs which change the frame size but
error out before the context can be reinitialized.
This is mainly required for frame parameter changes during frame based
multithreading but single threaded usage profits too from avoiding
ff_MPV_common_end()/ff_MPV_common_init() cycles.
ALS spec:
11.6.3.1.1 Quantization and encoding of parcor coefficients
...
In all cases the resulting quantized values ak are restricted to the range [-64,63].
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Fixes out of array write in quant_cof.
Also make sure no invalid opt_order stays in the context.
Fixes CVE-2012-2775
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Values that fail this check will cause failure of decode_rice()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
These arguments are either constants or copies of MpegEncContext
fields just as easily accessed within the function.
Signed-off-by: Mans Rullgard <mans@mansr.com>
These functions do not benefit from being inlined. They are large,
and there are no opportunities for constant propagation.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In both usages of FASTDIV the denominator might be 1.
Using a branch could make the function slower than using a normal
division.
Both denominator and numerator can be multiplied by 2 safely and
using shifts is faster than using a branch.
For some reason add_hfyu_median_prediction_cmov is only selected
on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions.
This patch allows it to be selected on any cpu with cmov with the
possibility of being overridden by the mmxext version.
Signed-off-by: Mans Rullgard <mans@mansr.com>
It calculates the sum of power of two series, which can be done in one step.
Suggested by Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
That division can be replaced with a comparison:
((c->value - c->low) << 1) + 1 >= range
By expanding 'range' definition and simplifying this inequation we obtain
the final expression.
Suggested by Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Prevents dangling pointers and makes access after free more obvious.
Setting AVFrame.qscale_table to NULL is required for successfully
allocating a previously freed Picture with ff_alloc_picture().
It is possible in various error paths as well as gap handling
that this has already been allocated. It is not clear why that
would be a problem with the current code, thus disable the
assert to avoid a common assert failure when asserts are enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
The data in coded_frame isn't allocated using get_buffer, but
is copied from the input frame to the encoder, so we should
not try to free it ourselves.
This fixes an assert failure when running in debug mode.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously, the put_bits call writing the value wrote a value
larger than the number of bits specified, failing asserts
in debug mode. There was no actual bitstream writer corruption,
since the overwritten bit already always was set to 1.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously, the value given to put_bits was 10 bits long for positive
predictors, even though 9 bits were to be written. The extra bit could
in some cases overwrite existing bits in the bitstream writer cache.
This fixes a failed assert in put_bits.h, when running a version
built with -DDEBUG.
The fate test result gets slightly improved, thanks to getting rid
of the overwritten bits in the bitstream writer cache.
Signed-off-by: Martin Storsjö <martin@martin.st>
This way it won't interfere with WMV9 initialisation inside MSS2 decoder and
avplay will play it fine.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Also, align the mangled RGB planes, which is required for the
SIMD versions of dsputils' median predict.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The optimized dct_quantize template functions reference optimized
fdct symbols, so these functions must only be enabled if the relevant
optimizations have been enabled by configure.
Using the malloc variant avoids pointless memcpy on size
increase and simplifies handling allocation failure.
Also change code to ensure that allocation, bswap and bitstream
reader all use the same size, even when the packet size is odd
for example.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This fixes building with DEBUG defined after the function was made
static and the prototype removed in d7f9786cbc.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is a preparatory step for the MSS2 decoder which needs to use
the WMV9 decoder to decode some kinds of frames.
From the patch by Alberto Delmás <adelmas@gmail.com>
This reverts commit 484a337cd7.
These functions were used in f8bed30 "VC1: merge idct8x8, coeff
adjustments and put_pixels" which was reverted in 18b6a69.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Instead, use it on the first member, since by definition, if
any member is aligned, the whole struct must be, in order to
maintain that alignment.
Fixes compilation with some finicky compilers.
Idea for fix from Måns Rullgård.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
The bitstream buffer must be padded, or the bitstream reader might
read over the end.
Fixes the following valgrind warning:
Use of uninitialised value of size 8 at 0x591BAE: cllc_decode_frame (cllc.c:166)
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Put the zero length check in place of code that was never used
during decoding, as zero-length slices were generally refused
in decode_frame().
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This table is used only by mpegaudiodsp and mpegaudioenc. Separating
it allows dropping some dependencies from mpc[78] and qdm2.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Pass pointer to sample buffer instead of channel number to various
functions called from decode_subframe(). Also simplify a few
expressions within this function.
The h264_vdpau decoder crashed if output colorspace was not 8-bit 420.
Add a check to error out instead (current hardware does not support
other colorspaces, so successful decoding is not possible).
Signed-off-by: Martin Storsjö <martin@martin.st>
The way this bit is decoded was accidentally flipped in b70feb405,
leading to warnings "Encountered a bad or corrupted frame" for each
decoded frame.
Signed-off-by: Martin Storsjö <martin@martin.st>
This function is always called with a non-negative argument, so
those special cases are not needed. In the places the argument
might be zero, the return value for a zero argument does not matter
since it would then be used to scale an array full of zeros.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Note that the symbols used to run the hardware decoder in asynchronous mode
have been marked deprecated and will be dropped at a future version bump.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This fixes two issues preventing suncc from building this code.
The undocumented 'a' operand modifier, causing gcc to omit a $ in
front of immediate operands (as required in addresses), is not
supported by suncc. Luckily, the also undocumented 'c' modifer
has the same effect and is supported.
On some asm statements with a large number of operands, suncc for no
obvious reason fails to correctly substitute some of the operands.
Fortunately, some of the operands in these statements are plain
numbers which can be inserted directly into the code block instead
of passed as operands.
With these changes, the code builds correctly with both gcc and
suncc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This code contains a C array of addresses of labels defined in
inline asm. To do this, the names must be declared as external
in C. The declared type does not matter since only the address is
used, and for some reason, the author of the code used the 'void'
type despite taking the address of a void expression being invalid.
Changing the type to char, a reasonable choice since the alignment
of the code labels cannot be known or guaranteed, eliminates gcc
warnings and allows building with suncc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Although a reasonable compiler will probably optimise out the
actual store and load, this operation still implies a truncation
to 16 bits which the compiler will probably not realise is not
necessary here.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Writing the scaled excitation to a scratch buffer (borrowing the
'audio' array) instead of modifying it in place avoids the need
to save and restore the unscaled values.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Use saturating addition functions instead of 64-bit intermediates
and separate clipping. This is much faster when dedicated
instructions are available.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Firstly, nothing in this function can overflow 32 bits so the use
of a 64-bit type is completely unnecessary. Secondly, the scale
is either a power of two or 0x7fff. Doing separate loops for these
cases avoids using multiplications. Finally, since only the number
of bits, not the actual value, of the maximum value is needed, the
bitwise or of all the values serves the purpose while being faster.
It is worth noting that even if overflow could happen, it was not
handled correctly anyway.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The operands in both cases are 16-bit so cannot overflow a 32-bit
destination. In gain_scale() the inputs are reduced to 14-bit,
so even the shift cannot overflow.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Adding instead of subtracting the products in the loop allows the
compiler to generate more efficient multiply-accumulate instructions
when 16-bit multiply-subtract is not available. ARM has only
multiply-accumulate for 16-bit operands. In general, if only one
variant exists, it is usually accumulate rather than subtract.
In the same spirit, using the dedicated saturation function enables
use of any special optimised versions of this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
C++ does not allow to mix different enums, so e.g. code comparing
ACodecID with CodecID would fail to compile with gcc.
This very evil hack should fix this problem.
In 16-bit arithmetic, x * 0xffffc is simply x * -4 with extra overflows,
(and the constant was probably meant to be 0xfffc). Combined with the
shift, this simplifies to -x >> 1. Finally, clearing the low two bits
with a 32-bit mask and switching to a 32-bit type allows more efficient
code on 32-bit machines.
Signed-off-by: Mans Rullgard <mans@mansr.com>
At both places this function is called, mb_[xy] == s->mb_[xy]
making the call together with following code equivalent to
simply assigning zeros.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The main benefit of inlining this function is from constant
propagation for the 'field_based' argument. Instead of inlining
all calls, create two versions of the function for field_based
values of 0 and 1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This file defines a single, huge function, MPV_motion(), which
although being declared inline is not actually inlined by the
compiler (for good reason). There is thus no sense in defining
this function in a header file, resulting in multiple copies of
it in the final library.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a hidden config variable for the mpegvideo.o dependency
and selects from the codecs which require it.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This macro is only used in two places, both in libavcodec, so this
is a more sensible place for it.
Two small tweaks to the macro are made:
- removing the trailing semicolon
- dropping unnecessary 'volatile' from the x86 asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
It expects maximum value to be 32767 but calculations in scale_vector()
which uses this function can give it ABS(-32768) which leads to wrong
result and thus clipping is needed.
yasm tolerates mismatch between movd/movq and source register size,
adjusting the instruction according to the register. nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The check is bogus since the nuv frameheader is already skipped
and the (decompressed) RTjpeg header is checked.
This reverts commit f6afacdb3b.
CC: libav-stable@libav.org
If there was a failure inflating, or reinitializing
the zstream, the current frame's buffer would be lost.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1. This patch makes
sure the correct expression is used in both cases.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code. A patch[1] for
clang to support the old syntax as well has been ignored since
January.
This patch chooses the syntax appropriate for each compiler,
allowing both to build the code. Notably, this change allows
building for iphone with the latest Apple Xcode update.
[1] http://llvm.org/bugs/show_bug.cgi?id=11855
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
For left HFYU prediction, we predict from the buffer buf+1 using 8- or
16-byte reads. This means that aligning the buffer by 16 bytes is in
itself not sufficient, because if the width itself is 16- or 8-byte
aligned, the buffer will not be padded, and thus a read of size 16 at
buf+1 will overflow boundaries at the right edge. Padding the buffer by
1 byte is sufficient to not overflow its boundaries.
Fixes bug 342.
This makes add_hfyu_left_prediction_sse4() handle sources that are not
16-byte aligned in its own function rather than by proxying the call to
add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the
sse4 version clobberes xmm6, but the ssse3 version (which uses MMX regs)
does not restore it, thus leading to XMM clobbering and RSP being off.
Fixes bug 342.
The scaling process for obtaining direct MVs from co-located field MVs
is the same for interlaced field and progressive pictures.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
In VC-1 interlaced field pictures, chroma motion vectors can extend beyond
picture boundary even if luma vectors are bounded. The problem shows up
only for hpel interpolated MVs, and may be due to the way motion vectors
are scaled / cropped.
Thanks to Konstantin Shishkov for suggesting the fix. This fixes
long-known segfaults in MC-VC1.ts from videolan streams archive.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
Fixed codebook mode in 5300 rate may write up to SUBFRAME_LEN + 4 and
that is considered normal by the reference decoder. Without that additional
padding it might overwrite first elements of LPC history.
Some calculations were changed in b6a3849 to use mmsize, which was not correct
for the AVX version, which uses INIT_YMM and therefore has mmsize == 32.
Fixes Bug 341.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
This allows building dct-test even if aandcttab.o is not pulled in
by any enabled codec. The DCT with which these tables are used does
not use them directly, so building it without the tables is possible.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reordering the members in this struct reduces the holes required
to maintain alignment. With this order, the only remaining, and
unavoidable, hole is 3 bytes following left_nnz.
Signed-off-by: Mans Rullgard <mans@mansr.com>
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The standard syntax requires two destination registers for
LDRD/STRD instructions. Some versions of the GNU assembler
allow using only one with the second implicit, others are
more strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
These decoders use a special non-MPEG2 IDCT. Call it directly
instead of going through dsputil. There is never any reason
to use a regular IDCT with these decoders or to use the EA IDCT
with other codecs.
This also fixes the bizarre situation of eamad and eatqi decoding
incorrectly if eatgq is disabled.
Signed-off-by: Mans Rullgard <mans@mansr.com>
There is no sense in pulling in this monster struct just for
a handful of fields. The code does not call any functions
expecting an MpegEncContext.
Signed-off-by: Mans Rullgard <mans@mansr.com>