It is not allowed to change mid-stream like it does currently. Instead we need
to buffer the first 8 frames before returning them as a single packet, then
only return single frame packets after that.
This prevents certain tags with a default value assigned to them (as per
the EBML syntax elements) from ever being assigned a NULL value. Other
parts of the code rely on these being non-NULL (i.e. they don't check for
NULL before e.g. using the string in strcmp() or similar), and thus in
effect this prevents crashes when reading of such specific tags fails,
either because of low memory or because of targeted file corruption.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Prevents crash when trying to copy from a non-existing plane in e.g.
a RGB32 reference image to a YUV420P target image
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Instead of clipping extrasize based on EXTRABYTES, clip based on the
amount of buffer actually left. Without this fix, there are warbles
and other distortions in the test case below.
http://kevincennis.com/mix/assets/sounds/1901_voxfx.mp3
compute_pkt_fields() is for unreliable estimates or guessing. The
keyframe information from the parser is (at least in theory) reliable,
so it should be used even when the other guessing is disabled with the
AVFMT_FLAG_NOFILLIN flag.
Therefore, move setting the packet keyframe flag based on parser
information from compute_pkt_fields() to read_frame_internal().
fixes a memleak for Vorbis and Theora, where the comment header from
avpriv_split_xiph_headers() is replaced by a buffer that must be freed
separately.
This prevents crashes when trying to read beyond the end of the buffer
while decoding frame data.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Unrolling the main loop to process, instead of 4 elements:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.
Assigning STEP to a register is a loss. Output address (Y) is almost always
unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>