34685 Commits

Author SHA1 Message Date
Andreas Cadhalpun
5ea59b1f42 exr: fix out of bounds read in get_code
This macro unconditionally used out[-1], which causes an out-of-bounds
read if out points to the very beginning of the buffer.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-16 22:22:06 +01:00
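The exr fix above guards a "repeat the previous symbol" step against reading before the start of the output buffer. The sketch below shows that guard. It does not reproduce the actual get_code macro. The function repeat_previous and its parameters are hypothetical, and the run-length limit check is included as an assumption about what such a step also needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append `run` copies of the previously written symbol.
   Returns 0 on success, or -1 if there is no previous symbol (out[-1] would
   read before the buffer) or the run would write past `end`. */
static int repeat_previous(const uint16_t *start, uint16_t **out,
                           const uint16_t *end, int run)
{
    if (*out == start)          /* no previous symbol: out[-1] is out of bounds */
        return -1;
    if (run > end - *out)       /* not enough room left for the run */
        return -1;
    uint16_t prev = (*out)[-1];
    while (run-- > 0)
        *(*out)++ = prev;
    return 0;
}
```

The start-of-buffer check costs one pointer comparison per run, which is why it can be made unconditional.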
Andreas Cadhalpun
17776638c3 opus: Fix typo causing overflow in silk_stabilize_lsf
Due to this typo, max_center can be too large, causing nlsf to be set to
excessively large values. This in turn can cause nlsf[i - 1] + min_delta[i]
to overflow to a negative value, which is not allowed for nlsf and can
cause an out-of-bounds read in silk_lsf2lpc.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-16 22:19:58 +01:00
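For context on the two silk_stabilize_lsf commits, the sketch below computes the center bounds the way RFC 6716 (section 4.2.7.5.4) describes them. It then moves a too-close pair of NLSFs apart. It is written from the RFC, not from the libavcodec source. The function names are hypothetical, and it does not attempt to reproduce the original typo. The key property is that max_center subtracts every spacing above k, so that after clamping, nlsf[k] plus the remaining spacings still fits below 32768.

```c
#include <stdint.h>

/* Lower bound for the center of (nlsf[k-1], nlsf[k]); min_delta has order + 1 entries. */
static int min_center_q15(const int16_t *min_delta, int k)
{
    int c = min_delta[k] >> 1;
    for (int i = 0; i < k; i++)
        c += min_delta[i];
    return c;
}

/* Upper bound for the same center; each subtracted term uses the loop index i. */
static int max_center_q15(const int16_t *min_delta, int order, int k)
{
    int c = 32768 - (min_delta[k] >> 1);
    for (int i = k + 1; i <= order; i++)
        c -= min_delta[i];
    return c;
}

/* Spread the pair (nlsf[k-1], nlsf[k]) to exactly min_delta[k] apart, 0 < k < order. */
static void spread_pair(int16_t *nlsf, const int16_t *min_delta, int order, int k)
{
    int lo = min_center_q15(min_delta, k);
    int hi = max_center_q15(min_delta, order, k);
    int center = (nlsf[k - 1] + nlsf[k] + 1) >> 1;   /* computed in int, no overflow */
    if (center < lo)
        center = lo;
    if (center > hi)
        center = hi;
    nlsf[k - 1] = (int16_t)(center - (min_delta[k] >> 1));
    nlsf[k]     = (int16_t)(nlsf[k - 1] + min_delta[k]);
}
```

With a correct max_center, the largest possible nlsf[k] is 32768 minus the spacings above it, so it stays within int16_t range.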
Andreas Cadhalpun
f61d44b74a opus_silk: fix typo causing overflow in silk_stabilize_lsf
Due to this typo, max_center can be too large, causing nlsf to be set to
excessively large values. This in turn can cause nlsf[i - 1] + min_delta[i]
to overflow to a negative value, which is not allowed for nlsf and can
cause an out-of-bounds read in silk_lsf2lpc.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-16 19:29:17 +01:00
Ganesh Ajjanagadde
83a04f103d lavc: move exp2fi to ff_exp2fi in internal.h
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-16 07:57:26 -05:00
Stefano Sabatini
6e891d51f4 lavc/libopenh264: apply minor options text consistency fixes 2015-12-16 10:48:28 +01:00
Ganesh Ajjanagadde
65877ab935 lavc: typo fix uncliped -> unclipped
Untested due to lack of ppc.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-15 22:45:15 -05:00
Matthieu Bouron
ae1c750cb4 lavc/utils: use AVPixFmtDescriptor to probe palette formats
Also use the input frame's format instead of the AVCodecContext one, as
required by the documentation of AVCodecContext.get_buffer2().
2015-12-15 10:35:47 +01:00
Andreas Cadhalpun
22e960ad47 golomb: always check for invalid UE golomb codes in get_ue_golomb
Also correct the check to reject log < 7, because UPDATE_CACHE only
guarantees 25 meaningful bits.

This fixes undefined behavior:
runtime error: shift exponent is negative

Testing with START/STOP timers in get_ue_golomb, one for the first
branch (A) and one for the second (B), shows that there is practically no
slowdown, e.g. for the cavs decoder:

With the check in the B branch:
    629 decicycles in get_ue_golomb B, 4194260 runs,     44 skips
    433 decicycles in get_ue_golomb A,268434102 runs,   1354 skips

Without the check:
    624 decicycles in get_ue_golomb B, 4194273 runs,     31 skips
    433 decicycles in get_ue_golomb A,268434203 runs,   1253 skips

Since the B branch is executed far less often than the A branch, the
overhead is negligible, even more so for the h264 decoder, where the
ratio B/A is much smaller.

Fixes: mozilla bug 1230239
Fixes: fbeb8b2c7c996e9b91c6b1af319d7ebc/asan_heap-oob_195450f_2743_e8856ece4579ea486670be2b236099a0.bit

Found-by: Tyson Smith
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-14 20:51:39 +01:00
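The "log < 7" bound in the golomb commit follows from simple arithmetic. The sketch below is a standalone decoder for one unsigned Exp-Golomb code from an MSB-aligned 32-bit cache. It is not the get_ue_golomb code, and the helper names are hypothetical. With z leading zeros, the highest set bit is 31 - z, so log = 2 * (31 - z) - 31 = 31 - 2z and the codeword length is 2z + 1 = 32 - log. Requiring at most 25 valid bits therefore means log >= 7. The same check also rules out the negative shift that occurs once z >= 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Position of the highest set bit of x (x must be non-zero). */
static int highest_bit(uint32_t x)
{
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Decode one ue(v) code from an MSB-aligned 32-bit cache in which only the
   top 25 bits are assumed valid. Returns the value, or -1 if the code is
   invalid or too long. If len is non-NULL, the code length is stored there. */
static int decode_ue_sketch(uint32_t cache, int *len)
{
    if (cache == 0)
        return -1;                           /* no terminating 1 bit in the cache */
    int log = 2 * highest_bit(cache) - 31;   /* = 31 - 2 * leading_zeros */
    if (log < 7)                             /* longer than 25 bits; also keeps */
        return -1;                           /* the shift below non-negative    */
    if (len)
        *len = 32 - log;                     /* 2 * leading_zeros + 1 */
    return (int)((cache >> log) - 1);        /* (1 << z) + suffix - 1 */
}
```

For example, the bit string 011 (cache 0x60000000) decodes to 2 with a length of 3. A cache with 13 leading zeros would need 27 bits and is rejected.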
Rostislav Pehlivanov
ade31b9424 aacenc: switch to using the RNG from libavutil
As expected, PSNR doesn't change. The AAC spec doesn't really say
exactly how noise should be generated.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2015-12-14 18:53:09 +00:00
Janne Grunau
90b1b9350c arm: add ff_int32_to_float_fmul_array8_neon
Quite a bit faster than int32_to_float_fmul_array8_c calling
ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
Number of cycles per int32_to_float_fmul_array8 call while decoding
padded.dts on exynos5422:

               before  after   change
cortex-a7:     1270     951    -25%
cortex-a15:     434     285    -34%

checkasm --bench cycle counts:     cortex-a15   cortex-a7
int32_to_float_fmul_array8_c:      1730.4       4384.5
int32_to_float_fmul_array8_neon_c:  571.5       1694.3
int32_to_float_fmul_array8_neon:    374.0       1448.8

The differences between int32_to_float_fmul_array8_neon_c and
int32_to_float_fmul_array8_neon are interesting. The former is the
current behaviour of calling ff_int32_to_float_fmul_scalar_neon
repeatedly from the C function. The raw numbers differ from those above
because checkasm uses different lengths than the dca decoder.
2015-12-14 16:45:02 +01:00
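To make the "_neon_c" versus "_neon" rows in the two commits above easier to read, here is a plain C reference of the two operations. It is an illustrative sketch, not the FmtConvertContext code. The scalar kernel multiplies every converted sample by one factor. The array8 variant applies a separate factor to each block of 8 samples by calling the scalar kernel once per block. That is the call pattern the "_neon_c" row measures, with the NEON scalar kernel in place of the C one. Function names here are hypothetical.

```c
#include <stdint.h>

/* Scalar kernel: dst[i] = (float)src[i] * mul for all i < len. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int32_t *src,
                                           float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Array8 variant: mul[] holds len / 8 factors, one per block of 8 samples.
   len is assumed to be a multiple of 8. */
static void int32_to_float_fmul_array8_ref(float *dst, const int32_t *src,
                                           const float *mul, int len)
{
    for (int i = 0; i < len; i += 8)
        int32_to_float_fmul_scalar_ref(dst + i, src + i, *mul++, 8);
}
```

A dedicated array8 routine avoids the per-block call overhead and can keep the loop in registers, which is consistent with the gap between the two NEON rows.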
Janne Grunau
a0fc780a20 arm64: int32_to_float_fmul neon asm
3% faster dts decoding on a cortex-a57.

                                 cortex-a57   cortex-a53
int32_to_float_fmul_array8_c:    1270.9       4475.6
int32_to_float_fmul_array8_neon:  328.6        569.2
int32_to_float_fmul_scalar_c:     928.5       4119.6
int32_to_float_fmul_scalar_neon:  309.1        524.1
2015-12-14 16:45:02 +01:00
Janne Grunau
705f5e5e15 arm64: port synth_filter_float_neon from arm
~25% faster dts decoding overall. The checkasm CPU cycle counts are not
very meaningful since synth_filter_float() calls FFTContext.imdct_half().

                         cortex-a57   cortex-a53
synth_filter_float_c:    1866.2       3490.9
synth_filter_float_neon:  915.0       1531.5

With fftc.imdct_half forced to imdct_half_neon:
                         cortex-a57   cortex-a53
synth_filter_float_c:    1718.4       3025.3
synth_filter_float_neon:  926.2       1530.1
2015-12-14 16:45:01 +01:00
Janne Grunau
c33c1fa8af arm64: convert dcadsp neon asm from arm
~2% faster dts decoding overall.

                    cortex-a57   cortex-a53
dca_decode_hf_c:    474.8        1659.9
dca_decode_hf_neon: 225.2         301.1
dca_lfe_fir0_c:     913.2        1537.7
dca_lfe_fir0_neon:  286.8         451.9
dca_lfe_fir1_c:     848.7        1711.5
dca_lfe_fir1_neon:  387.1         506.4
2015-12-14 16:45:01 +01:00
Janne Grunau
e2710e790c arm: add a cpu flag for the VFPv2 vector mode
The vector mode was deprecated in ARMv7-A/VFPv3, and various CPU
implementations do not support it in hardware. Depending on the OS,
vector mode code will either be emulated in software or raise an
illegal instruction on CPUs that do not support it. This was not really
a problem in practice, since NEON implementations of the same functions
are preferred. It will, however, become a problem for checkasm, which
tests every CPU flag separately.

Since this is a CPU feature that newer CPUs no longer support, this flag
behaves differently from the other flags: it can only be activated by
runtime CPU feature selection.
2015-12-14 16:42:35 +01:00
Janne Grunau
5dfe4edad6 x86_64: int32_to_float_fmul_scalar sign extend integer length 2015-12-14 16:42:35 +01:00
Agatha Hu
758be45756 avcodec/nvenc: clamp initial qp value to [1, 51]
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2015-12-14 10:34:59 +01:00
Agatha Hu
f1a8897375 avcodec/nvenc: set slice number to 1 to improve encoding quality
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2015-12-14 10:27:36 +01:00
Kieran Kunhya
906c0b7716 get_bits: Support max_depth > 2 in GET_RL_VLC_INTERNAL 2015-12-13 22:56:49 +00:00
Anton Khirnov
de9e199a03 lavc: make avpriv_mpa_decode_header private on next bump
It's not used by anything outside of lavc anymore.
2015-12-12 21:26:29 +01:00
Anton Khirnov
955aec3c7c mpegaudiodecheader: check the header in avpriv_mpegaudio_decode_header
Almost all the places from which this function is called already check
the header manually, and in the two that don't (the mp3 muxer), the
check should not cause any problems.
2015-12-12 21:25:42 +01:00
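As an illustration of what "checking the header" involves, here is a sketch of a basic MPEG audio frame header validity test. It is based on the standard header bit layout: 11-bit sync, 2-bit version, 2-bit layer, protection bit, 4-bit bitrate index and 2-bit sample rate index. It is not claimed to be the exact libavcodec check, and the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 0 if the 32-bit header looks like a valid MPEG audio frame
   header, -1 otherwise. Only the fields listed below are checked. */
static int check_mpa_header_sketch(uint32_t header)
{
    if ((header & 0xffe00000u) != 0xffe00000u)    /* 11-bit frame sync */
        return -1;
    if ((header & (3u << 17)) == 0)               /* layer '00' is reserved */
        return -1;
    if ((header & (0xfu << 12)) == (0xfu << 12))  /* bitrate index 15 is invalid */
        return -1;
    if ((header & (3u << 10)) == (3u << 10))      /* sample rate index 3 is reserved */
        return -1;
    return 0;
}
```

For instance, 0xFFFB9064 (MPEG-1 Layer III, 128 kbit/s, 44.1 kHz) passes, while the same header with the sample rate bits set to 11 (0xFFFB9C64) is rejected.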
Anton Khirnov
cea1eef25c lavc: get the profile name through the codec descriptor in avcodec_string() 2015-12-12 21:24:29 +01:00
Anton Khirnov
2c6811397b lavc: add profiles to AVCodecDescriptor
The profiles are a property of the codec, so it makes sense to export
them through AVCodecDescriptors, not just the codec implementations.
2015-12-12 21:22:49 +01:00
Anton Khirnov
cdc9ce098e lavc: print the name of the codec, not its implementation, in avcodec_string 2015-12-12 21:21:54 +01:00
Anton Khirnov
458e53f51f mpegvideo_enc: actually add the side data with vbv_delay to the packet
Fixes 2507b5dd674834be7261772996f47ae3b95cca69
2015-12-12 21:16:41 +01:00
Michael Niedermayer
625b582d5a avcodec/aacsbr_template: Add Check to read_sbr_envelope()
The limit is a conservative guess; the spec does not seem to specify one.

Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-12-12 19:05:07 +01:00
zjh8890
c18176bd55 avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H
transpose_4x4H is wrong: the order of r2 and r3 is swapped. Finding this
bug cost me a lot of time while writing AArch64 code that uses the macro.
2015-12-12 14:20:01 +01:00
Michael Niedermayer
b78885a3c5 avcodec/aacsbr: Split the env_facs table
This also removes an #ifdef and a special case for fixed point.

Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-12-12 12:19:07 +01:00
Ganesh Ajjanagadde
b4f1636a4d lavc: typo fix cliping -> clipping, saftey -> safety
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-11 19:10:00 -05:00
Ganesh Ajjanagadde
b8e5b1d786 lavc/mdct_template: use lrint instead of floor hack
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-11 10:35:15 -05:00
Ganesh Ajjanagadde
df679f1264 lavc/dcaenc: avoid wasteful cos calls
cos has symmetry; use this.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-11 10:22:09 -05:00
Ganesh Ajjanagadde
a0ddebfedf lavc/nellymoserdec: replace pow by exp2
exp2 suffices here.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-11 10:21:47 -05:00
Dave Yeo
b0b133b8c0 hevcdsp: use a macro for .rodata section
fixes assembling on OS/2

Signed-off-by: Dave Yeo <dave.r.yeo@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-11 16:19:30 +01:00
Andreas Cadhalpun
fdc94db37e sbr_qmf_analysis: sanitize input for 32-bit imdct
If the input contains too many excessively large values, the imdct can
overflow. Even if it did not, the output would exceed the valid range
of 29 bits.

Note that this is a very delicate limit: Allowing values up to 1<<25
does not prevent input larger than 1<<29 from arriving at
sbr_sum_square, while limiting values to 1<<23 breaks the
fate-aac-fixed-al_sbr_hq_cm_48_5.1 test.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-11 00:04:04 +01:00
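The sanitizing step described in the sbr_qmf_analysis commit amounts to clamping each input sample before the 32-bit imdct. The sketch below assumes a bound of 1 << 24. The message only states that 1 << 25 is too loose and 1 << 23 breaks a FATE test, so the exact value and the function name clamp_qmf_input are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical pre-pass: clamp each sample to [-(1 << 24), 1 << 24].
   The bound is an assumption lying between the two limits ruled out in the
   commit message. Returns the number of samples that were clamped. */
static int clamp_qmf_input(int32_t *in, int n)
{
    const int32_t limit = 1 << 24;
    int clamped = 0;
    for (int i = 0; i < n; i++) {
        if (in[i] > limit) {
            in[i] = limit;
            clamped++;
        } else if (in[i] < -limit) {
            in[i] = -limit;
            clamped++;
        }
    }
    return clamped;
}
```

Returning the count lets a caller log how often sanitizing kicked in without changing behaviour for in-range input.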
Andreas Cadhalpun
a9c20e922c sbrdsp_fixed: assert that input values are in the valid range
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-11 00:04:04 +01:00
Andreas Cadhalpun
ff8816f717 aacsbr: ensure strictly monotone time borders
This fixes a division by zero in the aac_fixed decoder.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-11 00:04:04 +01:00
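Strict monotonicity matters in the aacsbr commit above because the distance between consecutive time borders ends up in a denominator. A zero distance means a zero divisor. This sketch assumes that is the divisor involved, which the message implies but does not spell out. It shows the validation as a standalone check, and check_time_borders is a hypothetical name.

```c
/* Hypothetical check: the envelope time borders t_env[0..num_env] must be
   strictly increasing, so every t_env[i + 1] - t_env[i] is a non-zero divisor.
   Returns 0 if valid, -1 otherwise. */
static int check_time_borders(const unsigned char *t_env, int num_env)
{
    for (int i = 0; i < num_env; i++)
        if (t_env[i] >= t_env[i + 1])
            return -1;
    return 0;
}
```

Rejecting equal borders at parse time keeps the later per-envelope averaging free of special cases.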
Rostislav Pehlivanov
d8f13e783a diracdec: remove duplicate codeblock decoding
Broken by commit 7424a6d0a589d31100d6067ebcb47236c00f4b36

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2015-12-10 22:50:58 +00:00
Kieran Kunhya
3652dd5d0c diracdec: Fix FPE on invalid low_delay data 2015-12-10 22:14:03 +00:00
Kieran Kunhya
cdf8c9038d diracdec: Replace dirac parse codes with better ones 2015-12-10 21:47:01 +00:00
Kieran Kunhya
7424a6d0a5 diracdec: Read picture types by using parse_code 2015-12-10 21:42:13 +00:00
Kieran Kunhya
8880ca2307 diracdec: Store version major/minor flags 2015-12-10 21:39:06 +00:00
Kieran Kunhya
8eb6acef92 diracdec: Support new extended quantiser range 2015-12-10 21:37:24 +00:00
Kieran Kunhya
8dcc99dc68 diracdec: Extract version parameters 2015-12-10 21:26:35 +00:00
Kieran Kunhya
9f374c5906 diracdec: Make slice parameters common between lowdelay and future hq profile 2015-12-10 21:04:04 +00:00
Kieran Kunhya
3bb6ce1af9 diracdec: Rename lowdelay_subband to decode_subband because it is shared with HQ profile 2015-12-10 19:11:21 +00:00
Kieran Kunhya
3f07f12f65 diracdec: Template DSP functions adding 10-bit versions 2015-12-10 18:25:02 +00:00
Kieran Kunhya
9553689854 diracdec: Move strides to bytes, and pointer types to uint8_t.
Start templating functions in preparation for 10-bit support.
Parts of this patch were written by Rostislav Pehlivanov
2015-12-10 16:52:48 +00:00
Claudio Freire
124c375938 AAC encoder: fix OOB access in search_for_pns
Fix an OOB access in search_for_pns, which was using
w2 outside the window group loop, and fix a typo
where it was checking sf_idx instead of band_type.

Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-09 22:29:18 +01:00
Ganesh Ajjanagadde
cb93df0dcb avcodec/aacsbr_tablegen: always initialize tables at runtime
This gets rid of the virtually useless hardcoded-tables hackery. It is
useless because a 320-element LUT is included anyway, regardless of
--enable-hardcoded-tables, and all necessary tables are trivially
derived from it at runtime at very low cost:

Sample benchmark (x86-64, Haswell, GNU/Linux; the single run is what
really matters here, since looping drastically changes the result).
Fluctuations are on the order of 10% for the single-run test:
39400 decicycles in aacsbr_tableinit,       1 runs,      0 skips
25325 decicycles in aacsbr_tableinit,       2 runs,      0 skips
18475 decicycles in aacsbr_tableinit,       4 runs,      0 skips
15008 decicycles in aacsbr_tableinit,       8 runs,      0 skips
13016 decicycles in aacsbr_tableinit,      16 runs,      0 skips
12005 decicycles in aacsbr_tableinit,      32 runs,      0 skips
11546 decicycles in aacsbr_tableinit,      64 runs,      0 skips
11506 decicycles in aacsbr_tableinit,     128 runs,      0 skips
11500 decicycles in aacsbr_tableinit,     256 runs,      0 skips
11183 decicycles in aacsbr_tableinit,     509 runs,      3 skips

Tested with FATE with/without --enable-hardcoded-tables.

Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-09 07:36:58 -05:00
Ganesh Ajjanagadde
42868ca569 avcodec/jpeg2000: replace naive pow call with smarter exp2fi
pow is a very wasteful function for this purpose. The low-hanging fruit
would be simply to replace it with exp2f, and that does yield some speedup.
However, this has three drawbacks:
1. It does not exploit the integer nature of the argument.
2. (minor) Some platforms lack a proper exp2f routine, so the benefit is
only available with a non-broken libm.
3. exp2f does not solve the issue that plagues pow, namely terrible
worst-case performance. This is a fundamental issue known as the
"table-maker's dilemma", recognized by Prof. Kahan himself and
subsequently elaborated and researched by many others. The benchmarks below make all this clear.

This exploits the IEEE-754 format to get very good performance, even in
the worst case, for integer powers of 2, which solves all the issues noted
above. The function was tested with clang UBSan over [-1000, 1000] (beyond
the range relevant here, which is [-255, 255]); the patch itself was tested with FATE.

Benchmarks were obtained on x86-64 (Haswell, GNU/Linux) via 10^5 iterations
of the pow call, START/STOP timers, and the command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
Results for low run counts are also given to illustrate the worst case:

pow:
 216270 decicycles in pow,       1 runs,      0 skips
 110175 decicycles in pow,       2 runs,      0 skips
  56085 decicycles in pow,       4 runs,      0 skips
  29013 decicycles in pow,       8 runs,      0 skips
  15472 decicycles in pow,      16 runs,      0 skips
   8689 decicycles in pow,      32 runs,      0 skips
   5295 decicycles in pow,      64 runs,      0 skips
   3599 decicycles in pow,     128 runs,      0 skips
   2748 decicycles in pow,     256 runs,      0 skips
   2304 decicycles in pow,     511 runs,      1 skips
   2072 decicycles in pow,    1022 runs,      2 skips
   1963 decicycles in pow,    2044 runs,      4 skips
   1894 decicycles in pow,    4091 runs,      5 skips
   1860 decicycles in pow,    8184 runs,      8 skips

exp2f:
 134140 decicycles in pow,       1 runs,      0 skips
  68110 decicycles in pow,       2 runs,      0 skips
  34530 decicycles in pow,       4 runs,      0 skips
  17677 decicycles in pow,       8 runs,      0 skips
   9175 decicycles in pow,      16 runs,      0 skips
   4931 decicycles in pow,      32 runs,      0 skips
   2808 decicycles in pow,      64 runs,      0 skips
   1747 decicycles in pow,     128 runs,      0 skips
   1208 decicycles in pow,     256 runs,      0 skips
    952 decicycles in pow,     512 runs,      0 skips
    822 decicycles in pow,    1024 runs,      0 skips
    765 decicycles in pow,    2047 runs,      1 skips
    722 decicycles in pow,    4094 runs,      2 skips
    693 decicycles in pow,    8190 runs,      2 skips

exp2fi:
   2740 decicycles in pow,       1 runs,      0 skips
   1530 decicycles in pow,       2 runs,      0 skips
    955 decicycles in pow,       4 runs,      0 skips
    622 decicycles in pow,       8 runs,      0 skips
    477 decicycles in pow,      16 runs,      0 skips
    368 decicycles in pow,      32 runs,      0 skips
    317 decicycles in pow,      64 runs,      0 skips
    291 decicycles in pow,     128 runs,      0 skips
    277 decicycles in pow,     256 runs,      0 skips
    268 decicycles in pow,     512 runs,      0 skips
    265 decicycles in pow,    1024 runs,      0 skips
    263 decicycles in pow,    2048 runs,      0 skips
    263 decicycles in pow,    4095 runs,      1 skips
    260 decicycles in pow,    8191 runs,      1 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-08 22:00:05 -05:00
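The exp2fi idea from the jpeg2000 commit can be sketched in a few lines. For an integer x, 2^x is exactly representable in IEEE-754 binary32 (within range), so the float can be assembled from its bit pattern instead of calling pow or exp2f. This is an illustrative version written from the binary32 format. It is not the ff_exp2fi source, and the name exp2fi_sketch is hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* 2^x for integer x, built directly from the binary32 bit pattern. */
static float exp2fi_sketch(int x)
{
    uint32_t bits;
    float f;
    if (x >= -126 && x <= 127)
        bits = (uint32_t)(x + 127) << 23;   /* normal: biased exponent, zero mantissa */
    else if (x > 127)
        return INFINITY;                    /* too large for binary32 */
    else if (x >= -149)
        bits = UINT32_C(1) << (x + 149);    /* subnormal: a single mantissa bit */
    else
        return 0.0f;                        /* below the smallest subnormal */
    memcpy(&f, &bits, sizeof f);            /* well-defined type punning */
    return f;
}
```

Every path is a comparison plus a shift, so the cost is essentially constant. That matches the flat worst-case numbers in the exp2fi benchmark above.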
Andreas Cadhalpun
5b0da6999f aacenc: update max_sfb when num_swb changes
This fixes out-of-bounds reads in avoid_clipping.

Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-12-08 22:53:09 +01:00