When turned on, H264/CAVLC gets ~15% (CVPCMNL1_SVA_C.264) slower for
ultra-high-bitrate files, or ~2.5% (CVFI1_SVA_C.264) for lower-bitrate
files. Other codecs are affected to a lesser extent because they are
less optimized; e.g., VC-1 slows down by less than 1% (all on x86).
The patch generated 3 extra instructions (cmp, cmovae and mov) per
call to get_bits().
The performance penalty on ARM is within the error margin for most
files, up to 4% in extreme cases such as CVPCMNL1_SVA_C.264.
Based on work (for GCI) by Aneesh Dogra <lionaneesh@gmail.com>, and
inspired by patch in Chromium by Chris Evans <cevans@chromium.org>.
The A32 bitstream reader variant is only used on ARMv5 and for
Prores due to the larger bit cache this decoder requires.
In benchmarks on ARMv5 (Marvell Sheeva) with gcc 4.6, the only
statistically significant difference between ALT and A32 is
a 4% advantage for ALT in FLAC decoding. There is thus no (longer)
any reason to keep the A32 reader from this point of view.
This patch adds an option to the ALT reader increasing the bit
cache to 32 bits as required by the Prores decoder. Benchmarking
shows no significant change in speed on Intel i7. Again, the
A32 reader fails to justify its existence.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In the case that (frame_flags & 0x03) == 3, hybrid_maxclip
may have had a signed integer overflow.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It doesn't make much sense to clip pre-shift,
nor is it correct for proper decoding.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The keyframe after a POC reset may not be the first to be returned to
the user. Therefore, don't reset the expected next POC once we return
a keyframe to the user, but once we know that the next frame in the
return-queue is a keyframe.
The mode is set in libgsm_decode_init, but the decoder
object is simply destroyed and recreated in the flush
function - therefore the mode has to be set again.
This fixes playback using the libgsm_ms decoder in avplay.
Signed-off-by: Martin Storsjö <martin@martin.st>
v410 is a packed 10-bit 4:4:4 YCbCr format used in
QuickTime.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This patch is a generalization of what Michael Niedermayer
fixed in a single case.
The wmv8-drm fate test had been updated accordingly.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The change in 599b4c6ef didn't turn out to work properly on
i386 on OS X, where it broke building with PIC enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
Make the function prototype match the argument of
AVCodecCntext.execute() and remove the cast hiding
this mismatch.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This replaces the explicit offset(reg) memory references with
"m" operands for the same locations. As a result, one fewer
register operand is needed for these inline asm statements.
Signed-off-by: Mans Rullgard <mans@mansr.com>
When the buf and last pointers are equal, the FFSWAP() results
in an invalid call to memcpy() with same source and destination
on some targets. Although assigning a struct to itself is valid
C99, gcc does not check for this before calling memcpy().
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
Signed-off-by: Mans Rullgard <mans@mansr.com>
The existing functions defined in intfloat_readwrite.[ch] are
both slow and incorrect (infinities are not handled).
This introduces a new header with fast, inline conversion
functions using direct union punning assuming an IEEE-754
system, an assumption already made throughout the code.
The one use of Intel/Motorola extended 80-bit format is
replaced by simpler code sufficient under the present
constraints (positive normal values).
The old functions are marked deprecated and retained for
compatibility.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Return the whole packet as consumed in this case and not the size the
packet should have had. Move the insufficient data check into the for
condition to fix a ISO C90 error on bigendian.
This groups the encode/decode parts under single ifdefs and
eliminates the encode_init() function as it merely calls
common_init(). Also fix whitespace in moved code.
Signed-off-by: Mans Rullgard <mans@mansr.com>