When either video dimension is only one macroblock, subtractions
based on v_edge_pos and the macroblock size may be negative. In
that situation, an unsigned comparison isn't sufficent to test for
MV overruns, because a limit of (unsigned)-1 will let any other
value pass.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.
squash: rv34 idct rearrange partial register loads
This allows audio encoders to optionally take an AVFrame as input and write
encoded output to an AVPacket.
This also adds AVCodec.encode2() which will also be usable by video and
subtitle encoders once support is implemented in the public functions.
Extract processing of intra 16x16 blocks from intra macroblock
processing.
Also implement a function performing inverse transform and block
reconstruction for DC-only blocks in 1 pass instead of 2.
Split inter/intra macroblock handling code. This will allow further
optimizations such as performing inverse transform and block reconstruction
in a single pass as well as specialize code.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Do not fail audio decoding with avcodec_decode_audio3 if user has set a
custom get_buffer. Strictly speaking, this was never allowed by the API,
but it seems that some software packages did so anyways. In order to
unbreak applications (cf. http://bugs.debian.org/655890), this change
clarifies the API and overrides the custom get_buffer() with the defaults.
This change is inspired by a similar
commit (c3846e3eba) in FFmpeg.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Reference decoder clips data before shifting it to final range and also
forces 32-bit lossy mode to be actually 24-bit lossy mode in order to be
able to perform proper clipping.
max_b_frames is initialized to -1 for libx264, to allow
distinguishing between an explicit user set 0 and a default not
touched 0 (see bb73cda2).
If max_b_frames is left as -1, this affects dts generation (where
expressions like max_b_frames != 0 are used), so make sure it is
left at the default 0 after the libx264 init function returns.
This avoids unnecessarily producing dts != pts when using
profile=baseline.
Signed-off-by: Martin Storsjö <martin@martin.st>
The alignment directive must obviously precede the label.
This was never noticed in ARM mode since the location is
already aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Due to apprent bugs in the GNU assembler and/or linker, relocations
can be incorrectly processed if the alignment of a Thumb instruction
is changed in the output file compared to the input object.
This fixes crashes in h264 decoding with Thumb enabled. No effect in
ARM mode since everything is 4-byte aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This fixes standalone compilation of some decoders with --disable-optimizations.
cabac.h defines some inline functions that use symbols from cabac.c. Without
optimizations these inline functions are not eliminated and linking fails with
references to non-existing symbols.
Splitting the inline functions off into their own header and only #including
it in the places where the inline functions are used allows #including cabac.h
from anywhere without ill effects.
The sporadic threading errors during fate-rv30 were caused by calling
ff_thread_await_progress with mb row -1 as argument. That returns
immediately since progress is initialized to -1. Not yet computed
motion vectors from the reference could be used for the first
macroblocks.
When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.
This is achieved by:
- first, changing the 108x4 element modulo_three_table into a 108 element
table (kind of base4), and accessing each value using mask and shifts.
- then, checking low bits for 0 (as they represent the presence of higher
frequency coefficients)
Also provide x86 SIMD code for the DC-only inverse transform.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This is required to handle clobbering of XMM registers on Win64
correctly. Fixes FFT and all tests depending on FFT on Win64.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The WAVE demuxer returns packets with many blocks per frame, which needs to be
parsed into single blocks. This has a side-effect of fixing the timestamps.
Statistics for bourne.rmvb -an -f null
1 thread: 37.12s user 0.03s system 99% cpu 37.174 total
2 threads: 47.63s user 0.24s system 185% cpu 25.807 total
4 threads: 41.21s user 0.30s system 327% cpu 12.674 total
Under certain conditions pictures could be released before they were
returned with frame-threading. Broken mv computation in the upcoming
rv34 frame-threading patch was caused by this.
To prevent contexts from running out of available pictures the loop
releasing "unused" pictures has to be run for B frames too.
Fixes Libav Bug 195.
This doesn't make the code handle sample rate or upsample/downsample
change properly but this is still a good sanity check.
Based on change by Michael Niedermayer.
Signed-off-by: Alex Converse <alex.converse@gmail.com>
The functions are not used in any part of Libav, therefore testing them in the
cabac-test is unnecessary. Since this makes them unused, remove the functions.
>> time ./avconv -i file.avi -f null -
Before : real 0m7.784s
After : real 0m3.662s
Tested on a Intel Core i3 Processor (2 cores, 4 threads).
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The mpeg4 video, H264 and VC-1 parser hold (directly or indirectly)
a MpegEncContext in their private context. Since they do not call the
common mpegvideo init function slice_context_count has explicitly set
to 1.
Prevents a null pointer dereference in the h264 parser and fixes
bug 193.
Check explicitly if enough bits are left to prevent an infinite loop
when the bitstream buffer is not followed by zero-padding.
Based on patches by Michael Niedermayer <michaelni@gmx.at>.
frame_size is the number of bytes left in the packet, so if we are passing
buf-4 we can safely read frame_size+4 bytes.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This fixes compilation failures related to START_TIMER/STOP_TIMER macros and
-Werror=declaration-after-statement. START_TIMER declares variables and thus
may not be placed after statements outside of a new block.
For small video dimensions, these calculations of the upper bound
for pixel access may have a negative result. Using an unsigned
comparison to bound a potentially negative value only works if
the greater operand is non-negative. Fixed by doing edge emulation
when the upper bound is probably negative, everywhere that this
pattern appears.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
For small video dimensions calculations of the upper bound for pixel
access may result in negative value. Using an unsigned comparison
works only if the greater operand is non-negative. This is fixed by
doing edge emulation explicitly for such conditions.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Perform dequantization while decoding coefficients instead of performing it
on the entire coefficients buffer.
Since quantized coefficients are very sparse, this usually causes a small
speedup. Speedup of around 1% on Panda board compared to the removed here
neon code. Global speedup is probably around 3%.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This fixes compilation failures related to START_TIMER/STOP_TIMER macros and
-Werror=declaration-after-statement. START_TIMER declares variables and thus
may not be placed after statements outside of a new block.
The previous code ended in multiple different infinite
loops. See stl_ten_1_big.sfd as example with and without zzuf
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Adds a new member to MpegEncContext to hold the number of used slice
contexts. Fixes segfaults with '-threads 17 -thread_type slice' and
fate-vsynth{1,2}-mpeg{2,4}thread{,_ilace} with --disable-pthreads.
The extra thread added in {frame_}*thread_init was not taken into
account. Explicitly sets thread_count to 1 if only one CPU core was
detected. Also fixes two typos in comments.
Some external codecs have their own code to determine the best number
of threads. This number is not necessary the number of cpu cores.
Thread_count will be only 0 if the codec has CODEC_CAP_AUTO_THREADS.
The safe bitstream reader does not allow using skip_bits_long() to seek to a
point before the start of the buffer, which was needed by the mp3 decoder.
This change instead calculates the start point of the first valid granule and
skips to that position.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Since the conditions for the actual usage are more specific a less
preferred method can be used. This would cause compilation errors
because necessary headers are not included.
Originally, prior to 8742a4ff8, the caller code was compiled
within this condition:
ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
Since HAVE_7REGS is defined as
(ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal
to HAVE_7REGS (for 32 bit at least). The correct simplification
of the original condition thus is HAVE_7REGS, not
HAVE_EBX_AVAILABLE.
This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
and HAVE_EBX_AVAILABLE = 1.
Signed-off-by: Martin Storsjö <martin@martin.st>
The format is a per-frame property, having it in AVFrame simplify the
operation of extraction of that information, since avoids the need to
access the codec/stream context.
width and height are per-frame properties, setting these values in
AVFrame simplify the operation of extraction of that information,
since avoids the need to check the codec/stream context.
The sample aspect ratio is a per-frame property, so it makes sense to
define it in AVFrame rather than in the codec/stream context.
Simplify application-level sample aspect ratio information extraction,
and allow further simplifications.
Based on a patch by Michael Niedermayer <michaelni@gmx.at>
Fixes NGS00145, CVE-2011-4352
Found-by: Phillip Langlois
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Use sched_getaffinity to determine the number of logical CPUs.
Limits the number of threads to 16 since slice threading of H.264
seems to be buggy with more than 16 threads.
This file does not use anything from get_bits.h but needs
intreadwrite.h.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The 'fiel' atoms can be found in H.264 tracks clobbering the extradata.
MJPEG supports non field based extradata, and this data should be
preserved when copying.
Also define a codec capability for codecs that can handle
parameters changed externally between decoded packets.
Signed-off-by: Martin Storsjö <martin@martin.st>
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register
is not available, but required to assemble the functions.
This reverts commit 8742a4f to a simplified version of the original constraints.
This fixes a deadlock VLC triggered with multithreaded decoding. The
wait forces one of the current waiters to wake and not the thread
which calls pthread_cond_signal() itself.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Interlaced content for most codec requires it.
This patch is a stop-gap pending a serious rework to support
codecs with non 16 pixel macroblocks.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Trailing bits are likely to be non-zero if the NAL unit is truncated.
Clearing the bits make overreads of the bitstream less likely in this
case. Fixes playback of
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4 which
has a forbidden byte sequence of 0x00 0x00 0x00 in it SPS.
Start code emulation prevention is only required in Annex B bytestream
packed NAL units. For other coding formats the size is already known.
Looking for a start code prefix can result in false positives like in
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4
which has a false positive in the SPS.
Width and height might get passed as 0 and would cause floating point
exceptions in decode_frame.
Fixes bugzilla #149
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This was intended as an optimisation for skipped blocks in MPEG2
P-frames and never used elsewhere. Removing this "optimisation"
speeds up MPEG2 decoding by 1-2% (ARM Cortex-A9).
Signed-off-by: Mans Rullgard <mans@mansr.com>
Previously the decoder only worked if the user had set avctx->pix_fmt
manually. For some reason the libavformat tmv demuxer sets this, so
the problem was not visible in avplay etc.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The buffer splicing relies on the bitstream reader over-reading
the end of the buffer as declared in init_get_bits(), although
more data is actually present. Manually moving the bitstream
boundary after init_get_bits() allows this to work as expected.
Signed-off-by: Mans Rullgard <mans@mansr.com>
When turned on, H264/CAVLC gets ~15% (CVPCMNL1_SVA_C.264) slower for
ultra-high-bitrate files, or ~2.5% (CVFI1_SVA_C.264) for lower-bitrate
files. Other codecs are affected to a lesser extent because they are
less optimized; e.g., VC-1 slows down by less than 1% (all on x86).
The patch generated 3 extra instructions (cmp, cmovae and mov) per
call to get_bits().
The performance penalty on ARM is within the error margin for most
files, up to 4% in extreme cases such as CVPCMNL1_SVA_C.264.
Based on work (for GCI) by Aneesh Dogra <lionaneesh@gmail.com>, and
inspired by patch in Chromium by Chris Evans <cevans@chromium.org>.
The A32 bitstream reader variant is only used on ARMv5 and for
Prores due to the larger bit cache this decoder requires.
In benchmarks on ARMv5 (Marvell Sheeva) with gcc 4.6, the only
statistically significant difference between ALT and A32 is
a 4% advantage for ALT in FLAC decoding. There is thus no (longer)
any reason to keep the A32 reader from this point of view.
This patch adds an option to the ALT reader increasing the bit
cache to 32 bits as required by the Prores decoder. Benchmarking
shows no significant change in speed on Intel i7. Again, the
A32 reader fails to justify its existence.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In the case that (frame_flags & 0x03) == 3, hybrid_maxclip
may have had a signed integer overflow.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It doesn't make much sense to clip pre-shift,
nor is it correct for proper decoding.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The keyframe after a POC reset may not be the first to be returned to
the user. Therefore, don't reset the expected next POC once we return
a keyframe to the user, but once we know that the next frame in the
return-queue is a keyframe.
The mode is set in libgsm_decode_init, but the decoder
object is simply destroyed and recreated in the flush
function - therefore the mode has to be set again.
This fixes playback using the libgsm_ms decoder in avplay.
Signed-off-by: Martin Storsjö <martin@martin.st>
v410 is a packed 10-bit 4:4:4 YCbCr format used in
QuickTime.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This patch is a generalization of what Michael Niedermayer
fixed in a single case.
The wmv8-drm fate test had been updated accordingly.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>