ff_wma_init is used only by wmadec and wmaenc, and neither of them
can handle more than 2 channels.
This fixes crashes with invalid files.
Based on patch by Piotr Bandurski and Michael Niedermayer.
Signed-off-by: Martin Storsjö <martin@martin.st>
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.
References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The problem is that the ssse3 psign instruction does the wrong
thing here. Commit ea60dfe incorrectly removed a macro emulating
this instruction for pre-ssse3 code. However, the emulation is
incorrect, and the code relies on the behaviour of the macro.
Specifically, the psign sets destination elements to zero where
the corresponding source element is zero, whereas the emulation
only negates destination elements where the source is negative.
Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
which is why the original VC-1 code had an additional right shift
when using it. Since the psign instruction cannot be used here,
skip all the macro hell and use the working instruction sequence
directly.
None of this was noticed due a stray return statement in
ff_vc1dsp_init_mmx() which meant that only the mmx version of the
loop filter was ever used (before being removed in ea60dfe).
Signed-off-by: Mans Rullgard <mans@mansr.com>
The function call was a mess to handle, and memcpy cannot make
the assumptions we do in the new code.
Tested on an IMC sample: 430c -> 370c.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Apparently, some build environments require dxva.h even for dxva2,
while others lack this header entirely. Including it conditionally
allows building in both cases.
Signed-off-by: Martin Storsjö <martin@martin.st>
The MBAFF flag may only be signaled if we're actually dealing with
a full frame, and not singular fields, as it can happen in mixed content.
Signed-off-by: Martin Storsjö <martin@martin.st>
This removes a dependency on implementation details from generic
code and allows easy addition of the equivalent optimisation for
other architectures than x86.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Its documentation states that it is allocated/freed by the caller, but
it is declared as an AV_OPT_TYPE_STRING AVOption. Since
367732832f the AVOptions system frees
strings automatically. This can be considered an API break, since it
won't work when the caller doesn't use av_malloc() to allocate the
memory or wants to use the string after closing the codec.
Since there is not much value in this field being an AVOption, the best
solution is to remove it from the options table.
Fixes infinite loop in FLAC decoding in case of a truncated bitstream due to
the safe bitstream reader returning 0's at the end.
Fixes Bug 310.
CC:libav-stable@libav.org
Override the frame size from the SPS with AVCodecContext values
if the latter specify a size smaller by less than one macroblock.
This is required for correct cropping of MOV files from Canon cameras.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Unlike its predecessor, Indeo Audio codec generates tables depending on
sampling rate. Previously decoder used pre-generated tables for 22050 Hz
which obviously doesn't work with other frequencies.
Many thanks to Maxim Poliakovsky for providing all needed information
for this.
This ensures that these functions are inlined into the per-position
entry points, allowing constant propagation as needed for proper
optimisation.
18% faster VC1 decoding on Cortex-A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In Musepack SV8 codec property tell the maximum nonzero band, but every
frame codes maximum band as a limit (i.e. strictly less than given value).
Synthesis also expects maximum nonzero band, so there's a need to convert
frame maximum band limit value.
This prevents gcc from assuming that contents of it may have changed
between calls to vp56_range_get_prob(), thus preventing countless (and
unnecessary) movs. Decoding of sintel trailer goes from (avg+SG) 9.796
+/- 0.003 to 9.635 +/- 0.010.
We support every defined value for channel layout, bitrate and sample depth.
All other values are not unsupported, but reserved.
Update comments to say "are used" instead of "are known or exist".
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This silences some valgrind warnings.
CC: libav-stable@libav.org
Fixes second half of http://ffmpeg.org/trac/ffmpeg/ticket/794
Bug found by: Oana Stratulat
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
(cherry picked from commit f85334f58e)
In hybrid frames long window part ends at 36 samples for most of the cases
but at 72 for 8kHz case. For some reason decoder assumed it's 48 or even 36
samples, which caused wrong bitstream decoding for such blocks.
l3_25207.mpg from conformance suite demonstrates it the best.
Change the size specifiers to match the actual element sizes
of the data. This makes no practical difference with strict
alignment checking disabled (the default) other than somewhat
documenting the code. With strict alignment checking on, it
avoids trapping the unaligned loads.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
Benchmark (rounded to tens of unit):
V8x8 H8x8 2D8x8 V16x16 H16x16 2D16x16
C 445 358 985 1785 1559 3280
MMX* 219 271 478 714 929 1443
SSE2 131 158 294 425 515 892
SSSE3 120 122 248 387 390 763
End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The vertically interpolating variants of these functions read
ahead one line to optimise the loop. On the last line processed,
this might be outside the buffer. Fix these invalid reads by
processing the last line outside the loop.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Unlike other variants, for YUY2 we need to use different prediction:
* on line 0 for luma we should left predict starting from the second pixel
* on line 1 we should left predict first 4 pixels for luma and 2 for chroma
* median prediction employed here is taken directly from HuffYUV
Prevents subsequent overreads when these numbers are used as indices
in arrays.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
ZeroCodec relies on the keyframe flag being set in the container, and
prev is the previously decoded frame. A keyframe flags incorrectly set
will lead to this condition.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Wrong bit depth can lead to invalid rowsize values, which crashes the
decoder further down.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
The decoder assumes in various places that the image size
is a multiple of the block size, and there is no obvious
way to support odd sizes. Bailing out early if the header
specifies a bad size avoids various errors later on.
Fixes CVE-2012-0947.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Progressive data is allocated later in decode_sof(), not allocating
that data leads to NULL dereferences.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
This prevents sample_rate/data_length from going negative, which
caused various crashes and undefined behaviour further down.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
This properly synchronizes frame size changes between threads if
subsequent threads abort decoding before frame size is initialized, i.e.
it prevents the thread after that from ping-ponging back to the original
value.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
The index of the motion vector has to be checked before being
multiplied by 2 for the array index.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
ModeTab.fmode has only 3 elements, so indexing it with ftype
in the initialier for 'size' is invalid when ftype == FT_PPC.
This fixes crashes with gcc 4.8.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The shift parameter was removed from this interface in 7e1ce6a.
This updates the Altivec implementation to match.
Signed-off-by: Mans Rullgard <mans@mansr.com>
To load unaligned vector data in the usual way, explicit vec_ld()
should be used rather than dereferencing a pointer to a vector type.
When the VSX extension is enabled, gcc may compile vector pointer
dereferences using the VSX lxvw4x instruction instead of the lvx
instruction typically used with Altivec/VMX. As the behaviour of
these instructions with unaligned addresses differs, it is important
that only lvx is used here.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Allowing dsputil functions to assume the stride is a multiple of 16
even for smaller block sizes can simplify their implementation.
This appears to be the only place this guarantee is not met.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Non perceptual color model that aims to have an increase effectiveness
in compression like the normal YCbCr while having near-lossless/lossless
mapping to RGB.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The assembler may fail to place literal pools close enough to
instructions referencing them. An explicit .ltorg directive
fixes this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump. It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.
Signed-off-by: Mans Rullgard <mans@mansr.com>
General cosmetics, such as keeping lines under 80 characters,
fixing a couple of typos (predition -> prediction) and a
general style fix that was pointed out by Derek when I was having
my sliced multithreading patch in review by him.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Do not pointlessly call ff_alloc_packet multiple times,
and fix an infinite loop by clamping the maximum
number of bits to target in the algorithm that does
not use lambda.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Save the old output configuration (if it has been used
successfully) when trying a new configuration. If the new configuration
fails to decode, restore the last successful configuration.
There is no point in storing the value in a variable, since it is not
used anywhere else in the decoder.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This reworks a loop to get rid of an ugly pointer cast,
fixing errors seen with the PathScale ENZO compiler.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Recent register allocation changes (x86inc.asm update) changed the
register order and thus opcodes for the inner loops. One of them became
>128bytes, which confuses other parts of this function where it jumps
to fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.