Conversion of the luma intra prediction mode to one of the constrained
("alzheimer") ones can happen by crafting special bitstreams, causing
a crash because we'll call a NULL function pointer for 16x16 block intra
prediction, since constrained intra prediction functions are only
implemented for chroma (8x8 blocks).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
We need to do unsigned saturation in order to cover the corner case when the
absolute coefficient value is 16777215 (the maximum value).
Fixes Bug #216
That way all mix levels as exported by avpriv_ac3_parse_header()
will have the same meaning.
Previously the 3-bit center mix level for E-AC-3 was used to index in a
4-entry table, leading to out-of-array reads.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Signed-off-by: Alex Converse <alex.converse@gmail.com>
This fixes crashes on exit when closing a bitstream filter that
hasn't allocated any private data, on OS X.
Signed-off-by: Martin Storsjö <martin@martin.st>
This changes a number of FATE results, since before this commit, the
timestamps in all tests using rawenc were made up by lavf.
In most cases, the previous timestamps were completely bogus.
In some other cases -- raw formats, mostly h264 -- the new timestamps
are bogus as well. The only difference is that timestamps invented by
the muxer are replaced by timestamps invented by the demuxer.
cscd -- avconv sets output codec timebase from r_frame_rate
and r_frame_rate is in this case some guessed number 31.42 (377/12),
which is not accurate enough to represent all timestamps. This results
in some frames having duplicate pts. Therefore, vsync 0 needs to be
changed to vsync 2 and avconv drops two frames. A proper fix in the
future would be to set output timebase to something saner in avconv.
nuv -- previous timestamps for video were wrong AND the cscd
comment applies, one frame is dropped.
vp8-signbias -- the file contains two frames with identical timestamps,
so -vsync 0 needs to be removed/changed to -vsync 2 and avconv drops one
frame.
vc1-ism -- apparrently either the demuxer lies about timestamps or the
file is broken, since dts == pts on all packets, but reordering clearly
takes place.
The spec says the following speaker mapping is default:
center front speaker
left, right center front speakers,
left, right outside front speakers,
left surround, right surround rear speakers,
front low frequency effects speaker
This fixes crashes in e.g. PNG decoding with SSE2 enabled. In fact, many
x86 optimizations for codecs assume that our buffer strides are 16-byte
aligned.
Also slightly move around code not allocate a new frame if we won't
decode it. This prevents us from putting undecoded frames in frame
pointers, which (in mt decoding) other threads will use and wait on
as references, causing a deadlock (if we skipped decoding) or a crash
(if we didn't initialized next_framep[] at all).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
It makes sense in some cases to split up the output packet to save on memory
usage (ape frames can be very large), but the current/default size is
arbitrary. Allowing the user to configure this gives more flexibility and
requires minimal additional code.
FFALIGN doesn't work with non-powers-of-2.
This reverts commit 7ad1b612c8.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
Return the correct number of consumed bytes and set *data_size = 0.
Returned size is 1 too small, leading to that 1 byte being read as the next
frame, which results in an extra blank frame at the beginning of the stream.
Avoids doing malloc/free for each frame.
Also fixes valgrind errors due to use of uninitialized padding bytes.
Based on a patch by Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Wrapper around av_fast_malloc() that keeps FF_INPUT_BUFFER_PADDING_SIZE
zero-padded bytes at the end of the used buffer.
Based on a patch by Reimar Döffinger <Reimar.Doeffinger@gmx.de>.
get_ue_golomb_long() is only tested for values up to 2^15 - 2 since
we can not write larger values.
Silence the test on success and return a non-zero value on error.
Use an heap scratch buffer instead of large stack buffer.
Remove unneeded includes.
This way, if the AVCodecContext is allocated for a specific codec, the
caller doesn't need to store this codec separately and then pass it
again to avcodec_open2().
It also allows to set codec private options using av_opt_set_* before
opening the codec.
It allows to check whether an AVCodecContext is open in a documented
way. Right now the undocumented way this check is done in lavf/lavc is
by checking whether AVCodecContext.codec is NULL. However it's desirable
to be able to set AVCodecContext.codec before avcodec_open2().
In some cases, what is left to read from ptr is smaller than EXTRABYTES.
Based on a patch by Thierry Foucu <tfoucu@gmail.com>.
Signed-off-by: Alex Converse <alex.converse@gmail.com>
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Note: This fixes the following GCC warning :-
libavcodec/sunrast.c:94: warning: comparison of unsigned expression < 0 is always false.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
10l: Forgot to adjust deinterleave for new location of incoming samples in 7946a5a.
This produced incorrect, but surprisingly listenable results.
Thanks to Justin Ruggles for the report.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Earlier, calling avcodec_encode_audio worked fine even if time_base
wasn't set. Now it crashes due to trying to scale the output pts to
the codec context time base. This affects e.g. VLC.
If no time_base is set for audio codecs, set it to the sample
rate.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
MVDATA may or may not be transmitted. If it is not, both
dmv_x and dmv_y is to be assumed zero.
This may not trigger wrong picture in all systems, but
it's a bug nevertheless. Fixes SA10116.vc1 on my 64-bit
Windows 7.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Previously, it would not be read if refdist_flag was not set, however
according to the spec and the reference decoder, it should always be read.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
The MDCT buffers in the decoder are only sized for up to 11 bits. The
reverse engineered documentation for WMA1/2 headers say that that for
all samplerates above 32kHz 11 bits are used. 12 and 13 bit support
were added for WMAPro. I was unable to make any Microsoft tools generate
a test file at a samplerate above 48kHz.
Discovered by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Although it has been deprecated for a long time, its intended
replacement (request_channel_layout) is not actually used anywhere, so
request_channels is currently the only way to access that functionality.
This matches the spec as well as the reference decoder, and fixes a bug
with interlaced frame decoding.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Using threaded decoding by default breaks backward compatibility if
AVHWAccel is used or if an appliction sets threadunsafe callbacks.
Avconv and avplay still use -threads auto if not specified.
When either video dimension is only one macroblock, subtractions
based on v_edge_pos and the macroblock size may be negative. In
that situation, an unsigned comparison isn't sufficent to test for
MV overruns, because a limit of (unsigned)-1 will let any other
value pass.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.
squash: rv34 idct rearrange partial register loads
This allows audio encoders to optionally take an AVFrame as input and write
encoded output to an AVPacket.
This also adds AVCodec.encode2() which will also be usable by video and
subtitle encoders once support is implemented in the public functions.
Extract processing of intra 16x16 blocks from intra macroblock
processing.
Also implement a function performing inverse transform and block
reconstruction for DC-only blocks in 1 pass instead of 2.
Split inter/intra macroblock handling code. This will allow further
optimizations such as performing inverse transform and block reconstruction
in a single pass as well as specialize code.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Do not fail audio decoding with avcodec_decode_audio3 if user has set a
custom get_buffer. Strictly speaking, this was never allowed by the API,
but it seems that some software packages did so anyways. In order to
unbreak applications (cf. http://bugs.debian.org/655890), this change
clarifies the API and overrides the custom get_buffer() with the defaults.
This change is inspired by a similar
commit (c3846e3eba) in FFmpeg.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Reference decoder clips data before shifting it to final range and also
forces 32-bit lossy mode to be actually 24-bit lossy mode in order to be
able to perform proper clipping.
max_b_frames is initialized to -1 for libx264, to allow
distinguishing between an explicit user set 0 and a default not
touched 0 (see bb73cda2).
If max_b_frames is left as -1, this affects dts generation (where
expressions like max_b_frames != 0 are used), so make sure it is
left at the default 0 after the libx264 init function returns.
This avoids unnecessarily producing dts != pts when using
profile=baseline.
Signed-off-by: Martin Storsjö <martin@martin.st>
The alignment directive must obviously precede the label.
This was never noticed in ARM mode since the location is
already aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Due to apprent bugs in the GNU assembler and/or linker, relocations
can be incorrectly processed if the alignment of a Thumb instruction
is changed in the output file compared to the input object.
This fixes crashes in h264 decoding with Thumb enabled. No effect in
ARM mode since everything is 4-byte aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This fixes standalone compilation of some decoders with --disable-optimizations.
cabac.h defines some inline functions that use symbols from cabac.c. Without
optimizations these inline functions are not eliminated and linking fails with
references to non-existing symbols.
Splitting the inline functions off into their own header and only #including
it in the places where the inline functions are used allows #including cabac.h
from anywhere without ill effects.
The sporadic threading errors during fate-rv30 were caused by calling
ff_thread_await_progress with mb row -1 as argument. That returns
immediately since progress is initialized to -1. Not yet computed
motion vectors from the reference could be used for the first
macroblocks.
When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.
This is achieved by:
- first, changing the 108x4 element modulo_three_table into a 108 element
table (kind of base4), and accessing each value using mask and shifts.
- then, checking low bits for 0 (as they represent the presence of higher
frequency coefficients)
Also provide x86 SIMD code for the DC-only inverse transform.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This is required to handle clobbering of XMM registers on Win64
correctly. Fixes FFT and all tests depending on FFT on Win64.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The WAVE demuxer returns packets with many blocks per frame, which needs to be
parsed into single blocks. This has a side-effect of fixing the timestamps.
Statistics for bourne.rmvb -an -f null
1 thread: 37.12s user 0.03s system 99% cpu 37.174 total
2 threads: 47.63s user 0.24s system 185% cpu 25.807 total
4 threads: 41.21s user 0.30s system 327% cpu 12.674 total
Under certain conditions pictures could be released before they were
returned with frame-threading. Broken mv computation in the upcoming
rv34 frame-threading patch was caused by this.
To prevent contexts from running out of available pictures the loop
releasing "unused" pictures has to be run for B frames too.
Fixes Libav Bug 195.
This doesn't make the code handle sample rate or upsample/downsample
change properly but this is still a good sanity check.
Based on change by Michael Niedermayer.
Signed-off-by: Alex Converse <alex.converse@gmail.com>
The functions are not used in any part of Libav, therefore testing them in the
cabac-test is unnecessary. Since this makes them unused, remove the functions.
>> time ./avconv -i file.avi -f null -
Before : real 0m7.784s
After : real 0m3.662s
Tested on a Intel Core i3 Processor (2 cores, 4 threads).
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The mpeg4 video, H264 and VC-1 parser hold (directly or indirectly)
a MpegEncContext in their private context. Since they do not call the
common mpegvideo init function slice_context_count has explicitly set
to 1.
Prevents a null pointer dereference in the h264 parser and fixes
bug 193.
Check explicitly if enough bits are left to prevent an infinite loop
when the bitstream buffer is not followed by zero-padding.
Based on patches by Michael Niedermayer <michaelni@gmx.at>.
frame_size is the number of bytes left in the packet, so if we are passing
buf-4 we can safely read frame_size+4 bytes.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This fixes compilation failures related to START_TIMER/STOP_TIMER macros and
-Werror=declaration-after-statement. START_TIMER declares variables and thus
may not be placed after statements outside of a new block.
For small video dimensions, these calculations of the upper bound
for pixel access may have a negative result. Using an unsigned
comparison to bound a potentially negative value only works if
the greater operand is non-negative. Fixed by doing edge emulation
when the upper bound is probably negative, everywhere that this
pattern appears.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
For small video dimensions calculations of the upper bound for pixel
access may result in negative value. Using an unsigned comparison
works only if the greater operand is non-negative. This is fixed by
doing edge emulation explicitly for such conditions.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Perform dequantization while decoding coefficients instead of performing it
on the entire coefficients buffer.
Since quantized coefficients are very sparse, this usually causes a small
speedup. Speedup of around 1% on Panda board compared to the removed here
neon code. Global speedup is probably around 3%.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This fixes compilation failures related to START_TIMER/STOP_TIMER macros and
-Werror=declaration-after-statement. START_TIMER declares variables and thus
may not be placed after statements outside of a new block.
The previous code ended in multiple different infinite
loops. See stl_ten_1_big.sfd as example with and without zzuf
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Adds a new member to MpegEncContext to hold the number of used slice
contexts. Fixes segfaults with '-threads 17 -thread_type slice' and
fate-vsynth{1,2}-mpeg{2,4}thread{,_ilace} with --disable-pthreads.
The extra thread added in {frame_}*thread_init was not taken into
account. Explicitly sets thread_count to 1 if only one CPU core was
detected. Also fixes two typos in comments.
Some external codecs have their own code to determine the best number
of threads. This number is not necessary the number of cpu cores.
Thread_count will be only 0 if the codec has CODEC_CAP_AUTO_THREADS.
The safe bitstream reader does not allow using skip_bits_long() to seek to a
point before the start of the buffer, which was needed by the mp3 decoder.
This change instead calculates the start point of the first valid granule and
skips to that position.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Since the conditions for the actual usage are more specific a less
preferred method can be used. This would cause compilation errors
because necessary headers are not included.
Originally, prior to 8742a4ff8, the caller code was compiled
within this condition:
ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
Since HAVE_7REGS is defined as
(ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal
to HAVE_7REGS (for 32 bit at least). The correct simplification
of the original condition thus is HAVE_7REGS, not
HAVE_EBX_AVAILABLE.
This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
and HAVE_EBX_AVAILABLE = 1.
Signed-off-by: Martin Storsjö <martin@martin.st>
The format is a per-frame property, having it in AVFrame simplify the
operation of extraction of that information, since avoids the need to
access the codec/stream context.
width and height are per-frame properties, setting these values in
AVFrame simplify the operation of extraction of that information,
since avoids the need to check the codec/stream context.
The sample aspect ratio is a per-frame property, so it makes sense to
define it in AVFrame rather than in the codec/stream context.
Simplify application-level sample aspect ratio information extraction,
and allow further simplifications.
Based on a patch by Michael Niedermayer <michaelni@gmx.at>
Fixes NGS00145, CVE-2011-4352
Found-by: Phillip Langlois
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Use sched_getaffinity to determine the number of logical CPUs.
Limits the number of threads to 16 since slice threading of H.264
seems to be buggy with more than 16 threads.
This file does not use anything from get_bits.h but needs
intreadwrite.h.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The 'fiel' atoms can be found in H.264 tracks clobbering the extradata.
MJPEG supports non field based extradata, and this data should be
preserved when copying.
Also define a codec capability for codecs that can handle
parameters changed externally between decoded packets.
Signed-off-by: Martin Storsjö <martin@martin.st>
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register
is not available, but required to assemble the functions.
This reverts commit 8742a4f to a simplified version of the original constraints.
This fixes a deadlock VLC triggered with multithreaded decoding. The
wait forces one of the current waiters to wake and not the thread
which calls pthread_cond_signal() itself.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Interlaced content for most codec requires it.
This patch is a stop-gap pending a serious rework to support
codecs with non 16 pixel macroblocks.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Trailing bits are likely to be non-zero if the NAL unit is truncated.
Clearing the bits make overreads of the bitstream less likely in this
case. Fixes playback of
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4 which
has a forbidden byte sequence of 0x00 0x00 0x00 in it SPS.
Start code emulation prevention is only required in Annex B bytestream
packed NAL units. For other coding formats the size is already known.
Looking for a start code prefix can result in false positives like in
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4
which has a false positive in the SPS.
Width and height might get passed as 0 and would cause floating point
exceptions in decode_frame.
Fixes bugzilla #149
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This was intended as an optimisation for skipped blocks in MPEG2
P-frames and never used elsewhere. Removing this "optimisation"
speeds up MPEG2 decoding by 1-2% (ARM Cortex-A9).
Signed-off-by: Mans Rullgard <mans@mansr.com>
Previously the decoder only worked if the user had set avctx->pix_fmt
manually. For some reason the libavformat tmv demuxer sets this, so
the problem was not visible in avplay etc.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The buffer splicing relies on the bitstream reader over-reading
the end of the buffer as declared in init_get_bits(), although
more data is actually present. Manually moving the bitstream
boundary after init_get_bits() allows this to work as expected.
Signed-off-by: Mans Rullgard <mans@mansr.com>
When turned on, H264/CAVLC gets ~15% (CVPCMNL1_SVA_C.264) slower for
ultra-high-bitrate files, or ~2.5% (CVFI1_SVA_C.264) for lower-bitrate
files. Other codecs are affected to a lesser extent because they are
less optimized; e.g., VC-1 slows down by less than 1% (all on x86).
The patch generated 3 extra instructions (cmp, cmovae and mov) per
call to get_bits().
The performance penalty on ARM is within the error margin for most
files, up to 4% in extreme cases such as CVPCMNL1_SVA_C.264.
Based on work (for GCI) by Aneesh Dogra <lionaneesh@gmail.com>, and
inspired by patch in Chromium by Chris Evans <cevans@chromium.org>.
The A32 bitstream reader variant is only used on ARMv5 and for
Prores due to the larger bit cache this decoder requires.
In benchmarks on ARMv5 (Marvell Sheeva) with gcc 4.6, the only
statistically significant difference between ALT and A32 is
a 4% advantage for ALT in FLAC decoding. There is thus no (longer)
any reason to keep the A32 reader from this point of view.
This patch adds an option to the ALT reader increasing the bit
cache to 32 bits as required by the Prores decoder. Benchmarking
shows no significant change in speed on Intel i7. Again, the
A32 reader fails to justify its existence.
Signed-off-by: Mans Rullgard <mans@mansr.com>