Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Note: This fixes the following GCC warning :-
libavcodec/sunrast.c:94: warning: comparison of unsigned expression < 0 is always false.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
10l: Forgot to adjust deinterleave for new location of incoming samples in 7946a5a.
This produced incorrect, but surprisingly listenable results.
Thanks to Justin Ruggles for the report.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Earlier, calling avcodec_encode_audio worked fine even if time_base
wasn't set. Now it crashes due to trying to scale the output pts to
the codec context time base. This affects e.g. VLC.
If no time_base is set for audio codecs, set it to the sample
rate.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
MVDATA may or may not be transmitted. If it is not, both
dmv_x and dmv_y is to be assumed zero.
This may not trigger wrong picture in all systems, but
it's a bug nevertheless. Fixes SA10116.vc1 on my 64-bit
Windows 7.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Previously, it would not be read if refdist_flag was not set, however
according to the spec and the reference decoder, it should always be read.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
The MDCT buffers in the decoder are only sized for up to 11 bits. The
reverse engineered documentation for WMA1/2 headers say that that for
all samplerates above 32kHz 11 bits are used. 12 and 13 bit support
were added for WMAPro. I was unable to make any Microsoft tools generate
a test file at a samplerate above 48kHz.
Discovered by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Although it has been deprecated for a long time, its intended
replacement (request_channel_layout) is not actually used anywhere, so
request_channels is currently the only way to access that functionality.
This matches the spec as well as the reference decoder, and fixes a bug
with interlaced frame decoding.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Using threaded decoding by default breaks backward compatibility if
AVHWAccel is used or if an appliction sets threadunsafe callbacks.
Avconv and avplay still use -threads auto if not specified.
When either video dimension is only one macroblock, subtractions
based on v_edge_pos and the macroblock size may be negative. In
that situation, an unsigned comparison isn't sufficent to test for
MV overruns, because a limit of (unsigned)-1 will let any other
value pass.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.
squash: rv34 idct rearrange partial register loads
This allows audio encoders to optionally take an AVFrame as input and write
encoded output to an AVPacket.
This also adds AVCodec.encode2() which will also be usable by video and
subtitle encoders once support is implemented in the public functions.
Extract processing of intra 16x16 blocks from intra macroblock
processing.
Also implement a function performing inverse transform and block
reconstruction for DC-only blocks in 1 pass instead of 2.
Split inter/intra macroblock handling code. This will allow further
optimizations such as performing inverse transform and block reconstruction
in a single pass as well as specialize code.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Do not fail audio decoding with avcodec_decode_audio3 if user has set a
custom get_buffer. Strictly speaking, this was never allowed by the API,
but it seems that some software packages did so anyways. In order to
unbreak applications (cf. http://bugs.debian.org/655890), this change
clarifies the API and overrides the custom get_buffer() with the defaults.
This change is inspired by a similar
commit (c3846e3eba) in FFmpeg.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Reference decoder clips data before shifting it to final range and also
forces 32-bit lossy mode to be actually 24-bit lossy mode in order to be
able to perform proper clipping.
max_b_frames is initialized to -1 for libx264, to allow
distinguishing between an explicit user set 0 and a default not
touched 0 (see bb73cda2).
If max_b_frames is left as -1, this affects dts generation (where
expressions like max_b_frames != 0 are used), so make sure it is
left at the default 0 after the libx264 init function returns.
This avoids unnecessarily producing dts != pts when using
profile=baseline.
Signed-off-by: Martin Storsjö <martin@martin.st>
The alignment directive must obviously precede the label.
This was never noticed in ARM mode since the location is
already aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Due to apprent bugs in the GNU assembler and/or linker, relocations
can be incorrectly processed if the alignment of a Thumb instruction
is changed in the output file compared to the input object.
This fixes crashes in h264 decoding with Thumb enabled. No effect in
ARM mode since everything is 4-byte aligned there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This fixes standalone compilation of some decoders with --disable-optimizations.
cabac.h defines some inline functions that use symbols from cabac.c. Without
optimizations these inline functions are not eliminated and linking fails with
references to non-existing symbols.
Splitting the inline functions off into their own header and only #including
it in the places where the inline functions are used allows #including cabac.h
from anywhere without ill effects.
The sporadic threading errors during fate-rv30 were caused by calling
ff_thread_await_progress with mb row -1 as argument. That returns
immediately since progress is initialized to -1. Not yet computed
motion vectors from the reference could be used for the first
macroblocks.
When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.
This is achieved by:
- first, changing the 108x4 element modulo_three_table into a 108 element
table (kind of base4), and accessing each value using mask and shifts.
- then, checking low bits for 0 (as they represent the presence of higher
frequency coefficients)
Also provide x86 SIMD code for the DC-only inverse transform.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
This is required to handle clobbering of XMM registers on Win64
correctly. Fixes FFT and all tests depending on FFT on Win64.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The WAVE demuxer returns packets with many blocks per frame, which needs to be
parsed into single blocks. This has a side-effect of fixing the timestamps.
Statistics for bourne.rmvb -an -f null
1 thread: 37.12s user 0.03s system 99% cpu 37.174 total
2 threads: 47.63s user 0.24s system 185% cpu 25.807 total
4 threads: 41.21s user 0.30s system 327% cpu 12.674 total
Under certain conditions pictures could be released before they were
returned with frame-threading. Broken mv computation in the upcoming
rv34 frame-threading patch was caused by this.
To prevent contexts from running out of available pictures the loop
releasing "unused" pictures has to be run for B frames too.
Fixes Libav Bug 195.
This doesn't make the code handle sample rate or upsample/downsample
change properly but this is still a good sanity check.
Based on change by Michael Niedermayer.
Signed-off-by: Alex Converse <alex.converse@gmail.com>