Tables are bit identical.
Sample benchmark (Haswell, GNU/Linux+gcc):
old:
814485 decicycles in dsd_ctables_tableinit, 512 runs, 0 skips
new:
356808 decicycles in dsd_ctable_tableinit, 512 runs, 0 skips
Binary size should essentially be identical, and is in fact identical on
the configuration I tested on.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour but all later versions
I've seen have make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264).
It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.
Fixes fate-checkasm on x86 valgrind targets.
Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
get_ue_golomb() cannot decode values larger than 8190 (the maximum
value that can be golomb encoded in 25 bits) and produces the error
"Invalid UE golomb code" if a larger value is encountered. Use
get_ue_golomb_long() instead (which supports 63 bits, up to 4294967294)
when valid h264/hevc values can exceed 8190.
This updates decoding of the following values: (maximum)
first_mb_in_slice 36863* for level 5.2
abs_diff_pic_num_minus1 131071
difference_of_pic_nums_minus1 131071
idr_pic_id 65535
recovery_frame_cnt 65535
frame_packing_arrangement_id 4294967294
frame_packing_arrangement_repetition_period 16384
display_orientation_repetition_period 16384
An alternative would be to modify get_ue_golomb() to handle encoded
values of up to 49 bits as was done for get_se_golomb() in a92816c.
In that case get_ue_golomb() could continue to be used for all of
these except frame_packing_arrangement_id.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Should fix the regression, and also speeds up table generation.
Tables tested on GNU/Linux+clang: they are identical to the ones prior
to 5495c7f. ff_exp10 caused one slight change in one entry, 50000 became
50001 due to somewhat incorrect rounding.
Untested on ICC; passes FATE on GNU/Linux+gcc.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
The DCA core decoder converts integer coefficients read from the
bitstream to floats just after reading them (along with dequantization).
All the other steps of the audio reconstruction are done with floats
which makes the output for the DTS lossless extension (XLL)
actually lossy.
This patch changes the DCA core to work with integer coefficients
until QMF. At this point the integer coefficients are converted to floats.
The coefficients for the LFE channel (lfe_data) are not touched.
This is the first step for the really lossless XLL decoding.
This fixes an out-of-bounds read introduced in commit 0379603.
Reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Fix possible SF delta violation that would cause an
eventual assertion failure in some corner cases (esp
on very low bitrates) when marking bands for PNS due
to misuse of the sf_delta utilities
'erf' is far from the best name for a variable and is not very
descriptive since the actual variable points to the comparitively best
IS phase. Therefore rename it to 'best'.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Otherwise the too small buffer is directly used in the frame, causing
segmentation faults, when trying to use the frame.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This is used to check if the input buffer is large enough, so if this
overflows it can cause a false negative leading to a segmentation fault
in bytestream2_get_bufferu.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This macro unconditionally used out[-1], which causes an out of bounds
read, if out is the very beginning of the buffer.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
More don't fit into the integer output.
Also use get_bits_long, since get_bits only supports reading up to 25
bits, while get_bits_long supports the full integer range.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
The type of last_frame_pb_count was chosen to be an int since overflow
is impossible (the spec says the maximum bits per frame is 6144 per
channel and the encoder checks for that).
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
* commit '458e53f51fc75d08df884f8e9eb3d7ded23e97b3':
mpegvideo_enc: actually add the side data with vbv_delay to the packet
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '81c95eb8eee856d98d4ac37367dbc761f2faf875':
openh264: Directly include the deprecation guards header
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'c34df422628e6b7b657faee241fe7bb2629e0f57':
sgienc: Make sure to initialize skipped header portions
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
exp2 is faster.
It may be possible to optimize further; e.g the exponents seem to be
multiples of 0.25. This requires study though.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
If that is the case, the loop setting predictor_state in
sonic_decode_frame causes out of bounds reads of int_samples, which has
only frame_size number of elements.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
exp2 suffices here. Some trivial speedup is done in addition here by
reusing results.
This retains accuracy, and in particular results in identical values
with GNU libm + gcc/clang.
sample benchmark (Haswell, GNU/Linux):
proposed : 424160 decicycles in pow_table, 512 runs, 0 skips
exp2 only: 1262093 decicycles in pow_table, 512 runs, 0 skips
old : 2849085 decicycles in pow_table, 512 runs, 0 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
* commit 'e02de9df4b218bd6e1e927b67fd4075741545688':
lavc: export Dirac parsing API used by the ogg demuxer as public
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'f0b769c16daafa64720dcba7fa81a9f5255e1d29':
lavc: add a packet side data type for VBV-like parameters
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '31c51f7441de07b88cfea2550245bf1f5140cb8f':
avpacket: add a function for wrapping existing data as side data
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'b09ad37c83841c399abb7f2503a2ab214d0c2d48':
h264: derive the delay from the level when it's not present
Merged without changing the strict_std_compliance check, as it breaks FATE
and changes decoding behavior.
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '792b9c9dfcf44b657d7854368d975b5ca3bc22ca':
h264: set frame_num in start_frame(), not decode_slice_header()
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
This macro unconditionally used out[-1], which causes an out of bounds
read, if out is the very beginning of the buffer.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Due to this typo max_center can be too large, causing nlsf to be set to
too large values, which in turn can cause nlsf[i - 1] + min_delta[i] to
overflow to a negative value, which is not allowed for nlsf and can
cause an out of bounds read in silk_lsf2lpc.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Due to this typo max_center can be too large, causing nlsf to be set to
too large values, which in turn can cause nlsf[i - 1] + min_delta[i] to
overflow to a negative value, which is not allowed for nlsf and can
cause an out of bounds read in silk_lsf2lpc.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Also correct the check to reject log < 7, because UPDATE_CACHE only
guarantees 25 meaningful bits.
This fixes undefined behavior:
runtime error: shift exponent is negative
Testing with START/STOP timers in get_ue_golomb, one for the first
branch (A) and one for the second (B), shows that there is practically no
slowdown, e.g. for the cavs decoder:
With the check in the B branch:
629 decicycles in get_ue_golomb B, 4194260 runs, 44 skips
433 decicycles in get_ue_golomb A,268434102 runs, 1354 skips
Without the check:
624 decicycles in get_ue_golomb B, 4194273 runs, 31 skips
433 decicycles in get_ue_golomb A,268434203 runs, 1253 skips
Since the B branch is executed far less often than the A branch, this
change is negligible, even more so for the h264 decoder, where the ratio
B/A is a lot smaller.
Fixes: mozilla bug 1230239
Fixes: fbeb8b2c7c996e9b91c6b1af319d7ebc/asan_heap-oob_195450f_2743_e8856ece4579ea486670be2b236099a0.bit
Found-by: Tyson Smith
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
PSNR doesn't change as expected. The AAC spec doesn't really say
anything about how exactly to generate noise.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Quite a bit faster than int32_to_float_fmul_array8_c calling
ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
Number of cycles per int32_to_float_fmul_array8 call while decoding
padded.dts on exynos5422:
before after change
cortex-a7: 1270 951 -25%
cortex-a15: 434 285 -34%
checkasm --bench cycle counts: cortex-a15 cortex-a7
int32_to_float_fmul_array8_c: 1730.4 4384.5
int32_to_float_fmul_array8_neon_c: 571.5 1694.3
int32_to_float_fmul_array8_neon: 374.0 1448.8
Interesting are the differences between
int32_to_float_fmul_array8_neon_c and int32_to_float_fmul_array8_neon.
The former is current behaviour of calling
ff_int32_to_float_fmul_scalar_neon repeatedly from the c function,
The raw numbers differ since checkasm uses different lengths than the
dca decoder.
~25% faster dts decoding overall. The checkasm CPU cycles numbers are
not that useful since synth_filter_float() calls FFTContext.imdct_half().
cortex-a57 cortex-a53
synth_filter_float_c: 1866.2 3490.9
synth_filter_float_neon: 915.0 1531.5
With fftc.imdct_half forced to imdct_half_neon:
cortex-a57 cortex-a53
synth_filter_float_c: 1718.4 3025.3
synth_filter_float_neon: 926.2 1530.1
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.
Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
Almost all the places from which this function is called already check
the header manually and in the two that don't (the mp3 muxer) the check
should not cause any problems.
The limit is a conservative guess, the spec does not seem to specify a limit
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The transpose_4x4H is wrong which cost me much time to find this bug. The orders of r2 and r3 are wrong,
this bug waste me much time while I make aarch64 arm instruction which used the function.
This also removes a #ifdef and special case for the fixed point case
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If the input contains too many too large values, the imdct can overflow.
Even if it didn't, the output would be larger than the valid range of 29
bits.
Note that this is a very delicate limit: Allowing values up to 1<<25
does not prevent input larger than 1<<29 from arriving at
sbr_sum_square, while limiting values to 1<<23 breaks the
fate-aac-fixed-al_sbr_hq_cm_48_5.1 test.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Fix OOB access in search_for_pns which was using
w2 outside the window group loop, and fix a typo
in which it was checking sf_idx instead of band_type
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This gets rid of virtually useless hardcoded tables hackery. The reason
it is useless is that a 320 element lut is anyway placed regardless of
--enable-hardcoded-tables, from which all necessary tables are trivially
derived at runtime at very low cost:
sample benchmark (x86-64, Haswell, GNU/Linux, single run is really
what is relevant here since looping drastically changes the bench). Fluctuations
are on the order of 10% for the single run test:
39400 decicycles in aacsbr_tableinit, 1 runs, 0 skips
25325 decicycles in aacsbr_tableinit, 2 runs, 0 skips
18475 decicycles in aacsbr_tableinit, 4 runs, 0 skips
15008 decicycles in aacsbr_tableinit, 8 runs, 0 skips
13016 decicycles in aacsbr_tableinit, 16 runs, 0 skips
12005 decicycles in aacsbr_tableinit, 32 runs, 0 skips
11546 decicycles in aacsbr_tableinit, 64 runs, 0 skips
11506 decicycles in aacsbr_tableinit, 128 runs, 0 skips
11500 decicycles in aacsbr_tableinit, 256 runs, 0 skips
11183 decicycles in aacsbr_tableinit, 509 runs, 3 skips
Tested with FATE with/without --enable-hardcoded-tables.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
pow is a very wasteful function for this purpose. A low hanging fruit
would be simply to replace with exp2f, and that does yield some speedup.
However, there are 2 drawbacks of this:
1. It does not exploit the integer nature of the argument.
2. (minor) Some platforms lack a proper exp2f routine, making benefits available
only to non broken libm.
3. exp2f does not solve the same issue that plagues pow, namely terrible
worst case performance. This is a fundamental issue known as the
"table-maker's dilemma" recognized by Prof. Kahan himself and
subsequently elaborated and researched by many others. All this is clear from benchmarks below.
This exploits the IEEE-754 format to get very good performance even in
the worst case for integer powers of 2. This solves all the issues noted
above. Function tested with clang usan over [-1000, 1000] (beyond range of
relevance for this, which is [-255, 255]), patch itself with FATE.
Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
Low number of runs also given to prove the point about worst case:
pow:
216270 decicycles in pow, 1 runs, 0 skips
110175 decicycles in pow, 2 runs, 0 skips
56085 decicycles in pow, 4 runs, 0 skips
29013 decicycles in pow, 8 runs, 0 skips
15472 decicycles in pow, 16 runs, 0 skips
8689 decicycles in pow, 32 runs, 0 skips
5295 decicycles in pow, 64 runs, 0 skips
3599 decicycles in pow, 128 runs, 0 skips
2748 decicycles in pow, 256 runs, 0 skips
2304 decicycles in pow, 511 runs, 1 skips
2072 decicycles in pow, 1022 runs, 2 skips
1963 decicycles in pow, 2044 runs, 4 skips
1894 decicycles in pow, 4091 runs, 5 skips
1860 decicycles in pow, 8184 runs, 8 skips
exp2f:
134140 decicycles in pow, 1 runs, 0 skips
68110 decicycles in pow, 2 runs, 0 skips
34530 decicycles in pow, 4 runs, 0 skips
17677 decicycles in pow, 8 runs, 0 skips
9175 decicycles in pow, 16 runs, 0 skips
4931 decicycles in pow, 32 runs, 0 skips
2808 decicycles in pow, 64 runs, 0 skips
1747 decicycles in pow, 128 runs, 0 skips
1208 decicycles in pow, 256 runs, 0 skips
952 decicycles in pow, 512 runs, 0 skips
822 decicycles in pow, 1024 runs, 0 skips
765 decicycles in pow, 2047 runs, 1 skips
722 decicycles in pow, 4094 runs, 2 skips
693 decicycles in pow, 8190 runs, 2 skips
exp2fi:
2740 decicycles in pow, 1 runs, 0 skips
1530 decicycles in pow, 2 runs, 0 skips
955 decicycles in pow, 4 runs, 0 skips
622 decicycles in pow, 8 runs, 0 skips
477 decicycles in pow, 16 runs, 0 skips
368 decicycles in pow, 32 runs, 0 skips
317 decicycles in pow, 64 runs, 0 skips
291 decicycles in pow, 128 runs, 0 skips
277 decicycles in pow, 256 runs, 0 skips
268 decicycles in pow, 512 runs, 0 skips
265 decicycles in pow, 1024 runs, 0 skips
263 decicycles in pow, 2048 runs, 0 skips
263 decicycles in pow, 4095 runs, 1 skips
260 decicycles in pow, 8191 runs, 1 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This fixes out-of-bounds reads in avoid_clipping.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
In the merge commit 78265fcfee this behaviour
was broken and the CORRUPT flag would never ever be set on a frame. However
the flag on the AVCodecContext was taken into account properly, including
AV_CODEC_FLAG2_SHOW_ALL.
The reason for this was that the recovered field of the next output picture
was always set to TRUE whenever one of the two AVCodecContext flags was set,
which made it impossible to detect later, before outputting, if the frame was
really recovered or not. Now don't set it to TRUE unless the frame is really
recovered and check the AVCodecContext flags right before outputting.
Signed-off-by: Sebastian Dröge <sebastian@centricular.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
With only 7 coefficients per short window at most the extra precision
makes a difference and seems to reduce crackling and stddev even
further.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
* commit 'b805482b1fba1d82fbe47023a24c9261f18979b6':
aac: Provide more information on the failure message
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
Don't print a warning when dcadec_context_filter() returns positive
warning code. Most relevant warnings are now output through the callback
function.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Take request_channel_layout as a hint and don't force 2.0 downmix by
using both the 2CH and 6CH flags together.
Remove warnings about missing coefficients because they are no longer
relevant.
Honor AV_CH_LAYOUT_NATIVE and make it possible for native DTS channel
layout to be output.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These variables are coming from mpegvideoenc where are supposedly used
as bit counters on various frame properties. However their use is
unclear as they lack documentation, are available only from a very small
subset of encoders, and they are hardly used in the wild. Also frame_bits
in aacenc is employed in a similar way.
Remove this functionality from AVCodecContex, these variable are mostly
frame properties, and too few encoders support setting them with anything
useful.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Most option values are simply unused or ignored and in practice the
majory of codecs only need to check whether to enable rle or not.
Add appropriate codec private options which better expose the allowed
features.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* commit 'f023d57d355ff3b917f1aad9b03db5c293ec4244':
lavc: G.723.1 encoder
Split existing FFmpeg G.723.1 encoder into a new file.
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '165cc6fb9defcd79fd71c08167f3e8df26b058ff':
g723_1: Move sharable functions to a separate file
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'aac996cc01042194bf621d845bbe684549b5882e':
g723_1: Rename files to better reflect their purpose
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'b74b88f30da2389f333a31815d8326d5576d3331':
g723_1: Handle values at the ends of the table in lsp2lpc()
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
If the chroma components are subsampled, smaller buffers are allocated
for them. In that case the maximal block_offset for the chroma
components is not as large as for the luma component.
This fixes out of bounds writes causing segmentation faults or memory
corruption.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
There are a couple of major changes here:
1. Start using TNS coefficient compression.
2. Start using 3 bits per coefficient maximum for short windows.
The bits we save from these 2 changes seem to make a nice impact on the
rest of the file/windows.
3. Remove special case gain checking for short windows.
4. Modify the coefficient loop to support up to 3 windows.
The additional restrictions on TNS were something that was no in the
specifications and furthermore restricting TNS to only low energy short
windows was done to compensate for bugs elsewhere in the code.
Overall, the improvements here reduce crackling artifacts heard in very
noisy tracks.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
The original plan was to have TNS use data from the PNS search to better
tune itself to noise but this was never used nor necessary. This should
slightly boost the PNS accuracy if TNS was used.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Fall back to maximum DPB size if the level is unknown.
This should be more spec-compliant and does not depend on the caller
setting has_b_frames before opening the decoder.
The old behaviour, when the delay is supplied by the caller setting
has_b_frames, can still be obtained by setting strict_std_compliance
below normal.
According to the spec, the reference list for a slice should be
constructed by first generating an initial (what we now call "default")
reference list and then optionally applying modifications to it.
Our code has an optimization where the initial reference list is
constructed for the first inter slice and then rebuilt for other slices
if needed. This, however, adds complexity to the code, requires an extra
2.5kB array in the codec context and there is no reason to think that it
has any positive effect on performance. Therefore, simplify the code by
generating the reference list from scratch for each slice.
Fixes out of array read
Fixes: d41d8cd98f00b204e9800998ecf8427e/signal_sigsegv_321165b_7641_077dfcd8cbc80b1c0b470c8554cd6ffb.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: 99d142c47e6ba3510a74b872a1a2ae72/asan_heap-oob_11b36f4_3811_0f5c69e7609a88a580135678de1df844.dxa
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Thiss commit removes the experimental flag from the native AAC Encoder
and thus makes it the default.
After a lot of work, done by myself and Claudio Freire, the quality of
this encoder rivals and surpasses libfdk_aac in some situations. The
encoder had instability issues earlier which prevented it from having
its experimental flag removed, however the last commits done by Claudio
removed the last known source of instability and solved a lot of
problems which were previously observed. The issues were caused by the
various coding tools interfering with the scalefactor indices. Thus,
with these problems solved, it should now be possible to declare this
encoder as the default and recommend that the users should use it
instead of others provided by external libraries, as it is both faster
and has a subjectively higher quality with selected tracks.
The encoder has still yet to be fine tuned for every possible audio file
type like music or voice, so it is hoped that with the experimental flag
removed the users should be able to provide feedback and make the
encoder better than the alternatives for every type of audio and at
every bitrate.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
ANMR has some interesting things coming up but is currently not in a
shape fit for non-experimental usage. Same with "FAST".
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This coder produces a much lower quality audio than the rest, is much
slower and is unstable. Hasn't been updated for a very long time as
well, hence it is more appropriate to remove it since it also depends on
a big burden of a code (the encode_window_bands_info function which is
just as old, just as unstable and bad and in no way modifiable or
fixable).
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
If minq is negative, the range of sf_idx can be larger than
SCALE_MAX_DIFF allows, causing assertion failures later in
encode_scale_factors.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Including these headers is not needed and breaks building on Windows as it
tries to activate the full compat tools, which are not needed for host
tools.
Fixes: out of array read
Fixes: 36b8096fefab16c4c9326a508053e95c/signal_sigsegv_1d9ce18_3233_1a55196b018106dfabeace071a432d9e.r3d
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: 0a7ff0c1d93da9cef28a315ec91b692a/asan_heap-oob_4a52e5_3604_9c56dbb20e308f4faeef7b35f688521a.ape
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch does 4 things, all of which interact and thus it
woudln't be possible to commit them separately without causing
either quality regressions or assertion failures.
Fate comparison targets don't all reflect improvements in
quality, yet listening tests show substantially improved quality
and stability.
1. Increase SF range utilization.
The spec requires SF delta values to be constrained within the
range -60..60. The previous code was applying that range to
the whole SF array and not only the deltas of consecutive values,
because doing so requires smarter code: zeroing or otherwise
skipping a band may invalidate lots of SF choices.
This patch implements that logic to allow the coders to utilize
the full dynamic range of scalefactors, increasing quality quite
considerably, and fixing delta-SF-related assertion failures,
since now the limitation is enforced rather than asserted.
2. PNS tweaks
The previous modification makes big improvements in twoloop's
efficiency, and every time that happens PNS logic needs to be
tweaked accordingly to avoid it from stepping all over twoloop's
decisions. This patch includes modifications of the sort.
3. Account for lowpass cutoff during PSY analysis
The closer PSY's allocation is to final allocation the better
the quality is, and given these modifications, twoloop is now
very efficient at avoiding holes. Thus, to compute accurate
thresholds, PSY needs to account for the lowpass applied
implicitly during twoloop (by zeroing high bands).
This patch makes twoloop set the cutoff in psymodel's context
the first time it runs, and makes PSY account for it during
threshold computation, making PE and threshold computations
closer to the final allocation and thus achieving better
subjective quality.
4. Tweaks to RC lambda tracking loop in relation to PNS
Without this tweak some corner cases cause quality regressions.
Basically, lambda needs to react faster to overall bitrate
efficiency changes since now PNS can be quite successful in
enforcing maximum bitrates, when PSY allocates too many bits
to the lower bands, suppressing the signals RC logic uses to
lower lambda in those cases and causing aggressive PNS.
This tweak makes PNS much less aggressive, though it can still
use some further tweaks.
Also update MIPS specializations and adjust fuzz
Also in lavc/mips/aacpsy_mips.h: remove trailing whitespace
On systems having cbrt, there is no reason to use the slow pow function.
Sample benchmark (x86-64, Haswell, GNU/Linux):
new:
5124920 decicycles in cbrt_tableinit, 1 runs, 0 skips
old:
12321680 decicycles in cbrt_tableinit, 1 runs, 0 skips
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This further speeds up runtime initialization, with identical generated tables.
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
34441423 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
new:
10776291 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
Most low hanging fruit is taken care of here. For some idea, note that
83,064 array elements totalling 233,722 bytes need to be initialized.
Thus, with this patch, we average ~ 12.9 cycles per element or ~ 4.6
cycles per byte.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This does some miscellaneous stuff mainly avoiding the usage of pow to
achieve significant speedups. This is not speed critical, but is
unnecessary latency and cycles wasted for a user.
All tables tested and are identical to the old ones
(bit-exact even in floating point case).
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
102329530 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
new:
34111900 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Whoever wrote this stuff had a pretty bad libm - digits differ pretty
quickly.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
The table in question is a 253 byte one. In fact, it turns out that
dynamic generation of the table results in an increased binary size.
Code compiled with GCC 5.2.0, x86-64 (size in bytes), before and after
patch:
old: 62321064 libavcodec/libavcodec.so.57
new: 62320536 libavcodec/libavcodec.so.57
Thus, it always make sense to statically allocate this.
Tested with FATE with/without --enable-hardcoded-tables.
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Will Kelleher <wkelleher@gogoair.com>
Previous version reviewed-by: Ivan Uskov <ivan.uskov@nablet.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array reads.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Fixes: out of array read
Fixes: 76c515fc3779d1b838667c61ea13ce92/asan_heap-oob_1fc0d07_8913_794a4629a264ebdb25b58d3a94ed1785.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The DC VLC table used is too small, fixing this requires a sample,
thus request a sample.
Some samples are said to work even though the table has the wrong size, thus
this is left enabled if the user enables experimental features.
Fixes: 2abd25478c62a675f335fac00b467023/asan_static-oob_10aff98_1227_8811480c6ef1e970a7977ceb7e5a9958.mxf
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Approved-by: kurosu
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
As noted in a comment, pe.min in the reference encoder
is centered around current pe. The bit reservoir algo
needs pe.min to be a local minimum, because it can only
account for local PE variations. If it's set to a global
minimum as was being done, bit reservoir logic doesn't
work as efficiently.
This patch tries to forget old minimums and converge to
a local minimum without losing the stability of the
previous solution. Listening tests until now suggest this
solves numerous RC issues.
* commit '7831fb90503142e32cc3c9be43bc3f9d342ded6b':
textureencdsp: cosmetics: Use normal static const for tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '4a0918cae6394e503b17c71f8f171b4a795eb849':
sgienc: Support encoding high bit depth images with RLE
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'c12c085be7e86880924249e5cb3f898e45dee134':
dcadec: Do not check for overreads in auxiliary data
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
Fixes out of array read
Fixes: 59bb925e90201fa0f87f0a31945d43b5/asan_heap-oob_4a52e5_3388_66027f11e3d072f1e02401ecc6193361.jvt
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array access
Fixes: 482d8f2fd17c9f532b586458a33f267c/asan_heap-oob_4a52b6_7417_1d08d477736d66cdadd833d146bb8bae.mov
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: 2f95ddd996db8a6281d2e18c184595a7/asan_heap-oob_192fe91_3330_58e4441181e30a66c19f743dcb392347.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array access
Fixes: 08664a2a7921ef48172f26495c7455be/asan_heap-oob_23036c6_3301_523388ef84285a0270caf67a43247b59.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
ff_aac_tableinit is a macro in the case of hardcoded tables, so wrap
that up in a function (similar to how the decoder template does it) and
use that as the argument for ff_thread_once().
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Fixes out of array access
Fixes: 01859c9a9ac6cd60a008274123275574/asan_heap-oob_1dff571_8250_50d3d1611e294c3519fd1fa82198b69b.avi
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: 007c4a36608ebdf27ee260ad60a81184/asan_heap-oob_32076b4_2243_116b1cb29d91cc4974d6680e3d10bd91.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AAC-Fixed decoder segfaulted. This commit makes the aac encoder
and decoder init the table twice in case of transcoding again.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Since the ff_aac_tableinit() can be called by both the encoder and
the decoder (in case of transcoding) this commit shares the AVOnce
variable to prevent this.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This speeds up aac_tablegen to a ludicruous degree (~97%), i.e to the point
where it can be argued that runtime initialization can always be done instead of
hard-coded tables. The only cost is essentially a trivial increase in
the stack size.
Even if one does not care about this, the patch also improves accuracy
as detailed below.
Performance:
Benchmark obtained by looping 10^4 times over ff_aac_tableinit.
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
1295292 decicycles in ff_aac_tableinit, 512 runs, 0 skips
1275981 decicycles in ff_aac_tableinit, 1024 runs, 0 skips
1272932 decicycles in ff_aac_tableinit, 2048 runs, 0 skips
1262164 decicycles in ff_aac_tableinit, 4096 runs, 0 skips
1256720 decicycles in ff_aac_tableinit, 8192 runs, 0 skips
new:
21112 decicycles in ff_aac_tableinit, 511 runs, 1 skips
21269 decicycles in ff_aac_tableinit, 1023 runs, 1 skips
21352 decicycles in ff_aac_tableinit, 2043 runs, 5 skips
21386 decicycles in ff_aac_tableinit, 4080 runs, 16 skips
21299 decicycles in ff_aac_tableinit, 8173 runs, 19 skips
Accuracy:
The previous code was resulting in needless loss of
accuracy due to the pow being called in succession. As an illustration
of this:
ff_aac_pow34sf_tab[3]
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091778545
truncated to float
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091426864
showing that the old value was not correctly rounded. This affects a
large number of elements of the array.
Patch tested with FATE.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This hugely reduces the echo which was introduced with the previous
commit (though likely because previously everything was broken).
Makes LTP actually worthwhile now.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Should fix issues with ppc, tested by bug reporter.
Reported-by: John Warburton <john@johnwarburton.net>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Copy pointers to AVPicture after memory has been allocated.
Fixes NULL pointers in AVPicture after a17a766190.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
In some conditions, where the first band was being zeroed
mainly, the wrong global gain scalefactor would be written
to the stream since it's always taken from the first band
regardless of whether it's been marked as zero or not.
So, always make sure it contians something useful.
When both M/S coding and PNS are enabled, scalefactors
and coding books would be mistakenly clobbered when setting
the M/S flag on PNS'd bands. The flag needs to be set to
signal the generation of correlated noise, but the scalefactors,
coefficients and the coding books need to be kept intact.
Fixes out of array access
Fixes: 1430e9c43fae47a24c179c7c54f94918/signal_sigsegv_421427_2049_f2192b6829ab6e0eefcb035329c03c60.264
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
SGI RLE encoding is slighlty different than the one provided by rle
module (especially at high bit depth). The pixel count function however
does not change, so it is simply made library-public.
This is never mentioned in the specifications, and decoders work
just as fine without it. Update the fate references since the compressed
file is smaller.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
There is no such thing as a slice structured mode in the original version 1 H.263,
that mode was added in H.263+ in 1998. Also the headers for slice structured mode
are not part of the older version 1 and this would result in unplayable files
An alternative to this patch would be to merge the H263 and H263P AVCodecs and use
other means to distinguish the older and newer versions.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
VP8E_UPD_ENTROPY, VP8E_UPD_REFERENCE, VP8E_USE_REFERENCE were removed
from libvpx and the remaining values were never used here
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Zern <jzern@google.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The auxiliary data length field is not reliable,
and incorrect overread errors could be returned
for valid, real-world bitstreams.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This actually fixes an incorrect float literal. It is believed by
examining the precision that the literals were all pre-computed as
floats, resulting in this needless loss of precision. There is no
benefit to keeping such reduced precision:
1. These constants are used for static array computation, hence
compile-time.
2. They will be treated as doubles anyway, since f specifier was not
present.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This uses M_SQRT1_2, M_SQRT2 instead of the actual literals. This yields
greater precision in some places in avcodec/ac3, while fixed point
values remain unchanged.
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
* commit '4d8f536b535487063a08609636e712ad86d2ad54':
qsvenc: print the actual video parameters used by MSDK
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'f6c94457b44f41d900cd0991857f54e1f0ccedd6':
mpegvideo_enc: enable rtp_mode when multiple slices are used
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '79ae1e630b476889c251fc905687a3831b43ab5e':
avcodec: Define side data type for fallback track
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Doing that doesn't make sense, because the only purpose of sbr_dequant
is to process the data from read_sbr_data.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This function returns the encoded data of a frame, one slice at a time
directly when that slice is encoded, instead of waiting for the full
frame to be done. However this field has a debatable usefulness, since
it looks like it is just a convoluted way to get data at lowest
possible latency, or a somewhat hacky way to store h263 in RFC-2190
rtp encapsulation.
Moreover when multi-threading is enabled (which is by default) the order
of returned slices is not deterministic at all, making the use of this
function not reliable at all (or at the very least, more complicated
than it should be).
So, for the reasons stated above, and being used by only a single encoder
family (mpegvideo), this field is deemed unnecessary, overcomplicated,
and not really belonging to libavcodec. Libavformat features a complete
implementation of RFC-2190, for any other case.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
If videotoolbox_common_end_frame failed, then the AVFrame was returned
to the API user with the dummy buffer (in AVFrame.buf[0]) still set, and
the decode call indicating success.
These "half-set" AVFrames with dummy buffer are a videotoolbox specific
hack, because the decoder requires an allocated AVFrame for its internal
logic. Videotoolbox on the other hand allocates its frame itself
internally, and outputs it only on end_frame. At this point, the dummy
buffer is replaced with the real frame (unless decoding fails).
Currently, multiple slices with just one thread produce corrupted
output.
Additionally, enable slice structured mode for h263(+)
Bug-Id: 912
CC: libav-stabl@libav.org
The replaced check should have become redundant
Found-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It is used as size argument of ff_canopus_parse_info_tag, which uses it
as size argument to bytestream2_init, which only supports sizes up to
INT_MAX.
Changing it's type to unsigned simplifies the check.
Reviewed-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This allows removing a special case for the fixed point decoder and will
make error checks simpler
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In case of bitstream errors the deblock filter and slices can access uninitialized
top_borders from previous slices which did not fill them as they stoped halfway due
to error or where entirely missing.
This also makes code using these tables deterministic in case of missing or damaged
slices
Found-by: Tyson Smith
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This side data type is meant to be added to AVStream side data.
A fallback track indicates an alternate track to use when the
current track can not be decoded for some reason. e.g. no
decoder available for codec.
Signed-off-by: Anton Khirnov <anton@khirnov.net>