Commit Graph

669 Commits

Author SHA1 Message Date
Michael Niedermayer
21dfabfa64 Merge commit 'adff0a8166345bb9513f0f658043fb6387e90122'
* commit 'adff0a8166345bb9513f0f658043fb6387e90122':
  arm: dsputil: Coalesce all init files

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/arm/dsputil_arm.h
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv6.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-16 20:09:25 +02:00
Diego Biurrun
adff0a8166 arm: dsputil: Coalesce all init files 2014-07-16 06:18:23 -07:00
Ben Avison
42c1cc35b7 armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-13 15:17:04 +02:00
Michael Niedermayer
b8cdf04726 Merge commit '1173320249745eab01c901a39054fc0fced33c87'
* commit '1173320249745eab01c901a39054fc0fced33c87':
  dsputil: Drop unused bit_depth parameter from all init functions

Conflicts:
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-11 20:29:40 +02:00
Diego Biurrun
1173320249 dsputil: Drop unused bit_depth parameter from all init functions 2014-07-11 06:38:26 -07:00
Michael Niedermayer
2d5e9451de Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e'
* commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e':
  dsputil: Split off pixel block routines into their own context

Conflicts:
	configure
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/pixblockdsp_template.c
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-10 01:22:14 +02:00
Diego Biurrun
f46bb608d9 dsputil: Split off pixel block routines into their own context 2014-07-09 08:05:26 -07:00
Michael Niedermayer
1f935c3d0b Merge commit '79fce1ec8abd017593c003917fc123f7119a78d6'
* commit '79fce1ec8abd017593c003917fc123f7119a78d6':
  arm: Avoid using the 'setend' instruction on ARMv7 and newer

Conflicts:
	libavcodec/arm/h264dsp_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-08 14:44:12 +02:00
Martin Storsjö
79fce1ec8a arm: Avoid using the 'setend' instruction on ARMv7 and newer
This instruction is deprecated on ARMv8, and it is serializing on
some ARMv7 cores as well [1].

[1] http://article.gmane.org/gmane.linux.ports.arm.kernel/339293

CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-08 12:09:09 +03:00
Michael Niedermayer
020865f557 Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d'
* commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d':
  dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc

Conflicts:
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 15:36:58 +02:00
Diego Biurrun
c166148409 dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc 2014-07-06 14:26:53 -07:00
Michael Niedermayer
581b5f0b9b Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e'
* commit 'e3fcb14347466095839c2a3c47ebecff02da891e':
  dsputil: Split off IDCT bits into their own context

Conflicts:
	configure
	libavcodec/aic.c
	libavcodec/arm/Makefile
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/asvdec.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.c
	libavcodec/dvdec.c
	libavcodec/dxva2_mpeg2.c
	libavcodec/intrax8.c
	libavcodec/mdec.c
	libavcodec/mjpegdec.c
	libavcodec/mjpegenc_common.h
	libavcodec/mpegvideo.c
	libavcodec/ppc/dsputil_altivec.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/ppc/idctdsp.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:22:11 +02:00
Diego Biurrun
e3fcb14347 dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
Michael Niedermayer
32cf26cc6a Merge commit 'f23d26a6864128001b03876b0b92fffe131f2060'
* commit 'f23d26a6864128001b03876b0b92fffe131f2060':
  h264: avoid using uninitialized memory in NEON chroma mc

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 20:35:33 +02:00
Janne Grunau
f23d26a686 h264: avoid using uninitialized memory in NEON chroma mc
Adapt commit 982b596ea6 for the arm and
aarch64 NEON asm. 5-10% faster on Cortex-A9.
2014-06-23 16:32:15 +02:00
Michael Niedermayer
99497b4683 Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2'
* commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2':
  dsputil: Split audio operations off into a separate context

Conflicts:
	configure
	libavcodec/takdec.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-22 17:58:28 +02:00
Diego Biurrun
9a9e2f1c8a dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
Michael Niedermayer
08c5859f17 avcodec: add simpleauto idct
This will pick the "best" simple idct compatible idct

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 14:28:01 +02:00
Michael Niedermayer
2b05db4f81 Merge commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9'
* commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9':
  dsputil: Split clear_block*/fill_block* off into a separate context

Conflicts:
	configure
	libavcodec/asvdec.c
	libavcodec/dnxhddec.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.h
	libavcodec/eamad.c
	libavcodec/intrax8.c
	libavcodec/mjpegdec.c
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/vc1dec.c
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 04:54:38 +02:00
Diego Biurrun
e74433a8e6 dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
Christophe Gisquet
ccff45a0d3 apedsp: move to llauddsp
APE is not the sole codec using scalarproduct_and_madd_int16.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-05 20:31:59 +02:00
Michael Niedermayer
83e8650f77 Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e'
* commit '896a5bff64264f4d01ed98eacc97a67260c1e17e':
  arm: check if AS supports .dn

Conflicts:
	configure
	libavcodec/arm/vc1dsp_init_neon.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-03 18:19:21 +02:00
Janne Grunau
896a5bff64 arm: check if AS supports .dn
Move the GNU as check before the arch specific asm checks since the .dn
check requires gas compatible assembler.

Disable the VC-1 motion compensation NEON asm which is the only part
using that directive. The integrated assembler in the upcoming clang 3.5
does not support .dn/.qn without plans to change that. Too much effort
to implement it while it is rarely used.

http://llvm.org/bugs/show_bug.cgi?id=18199.
2014-06-03 14:23:03 +02:00
Michael Niedermayer
40f3a87c10 Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3'
* commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3':
  dsputil: Move APE-specific bits into apedsp

Conflicts:
	libavcodec/arm/int_neon.S
	libavcodec/x86/dsputil.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 00:59:15 +02:00
Diego Biurrun
054013a0fc dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
Michael Niedermayer
8b30702c44 Merge commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1'
* commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1':
  mpegvideo: move the MpegEncContext fields used from arm asm to the beginning

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-30 00:23:01 +02:00
Anton Khirnov
6a13505c06 mpegvideo: move the MpegEncContext fields used from arm asm to the beginning
This should reduce the frequency with which the offsets need to be
updated.
2014-04-29 14:49:42 +02:00
Ben Avison
9d8ecdd8ca vc-1: Add platform-specific start code search routine to VC1DSPContext.
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:36:11 +02:00
Ben Avison
270cede3f3 h264: Move search code search functions into separate source files.
This permits re-use with parsers for codecs which use similar start codes.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:35:56 +02:00
Michael Niedermayer
06e664366a Merge commit 'a88e1d1c598e641eecd5d43730211d91c82787c6'
* commit 'a88e1d1c598e641eecd5d43730211d91c82787c6':
  lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 00:55:40 +02:00
Janne Grunau
a88e1d1c59 lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets 2014-04-24 18:28:26 +02:00
Michael Niedermayer
af89a685c4 avcodec/arm/vc1dsp_init_neon: fix code so it compiles and passes fate-vc1
The original patch  seems to be missing a 16x16 function though

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 20:32:21 +02:00
Christophe Gisquet
319235c67c vc1dsp: introduce cases for 8x8 and 16x16
This allows further unrolling the DSP implementation where possible.

x86 and ARM DSP modified by simply moving the multiple calls from vc1dec
to the DSP code. Decoding improvements should only occurs because of the
compiler actually able to unroll more.

Decoding time: ~8.80s -> 8.64s (ie around 2%)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 18:25:36 +02:00
Michael Niedermayer
5440151fa4 Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d'
* commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d':
  Remove a number of unnecessary dsputil.h #includes

Conflicts:
	libavcodec/h264pred.c
	libavcodec/vc1dsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 18:54:15 +02:00
Diego Biurrun
3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
Michael Niedermayer
fb61ed1e9f Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f'
* commit 'ac4b32df71bd932838043a4838b86d11e169707f':
  On2 VP7 decoder

Conflicts:
	Changelog
	libavcodec/arm/h264pred_init_arm.c
	libavcodec/arm/vp8dsp.h
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/arm/vp8dsp_init_armv6.c
	libavcodec/arm/vp8dsp_init_neon.c
	libavcodec/avcodec.h
	libavcodec/h264pred.c
	libavcodec/version.h
	libavcodec/vp8.c
	libavcodec/vp8.h
	libavcodec/vp8data.h
	libavcodec/vp8dsp.c
	libavcodec/vp8dsp.h
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/vp8dsp_init.c

See: 89f2f5dbd7 and others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 14:46:10 +02:00
Janne Grunau
f37815b1d5 arm: asm decode_block_coeffs_internal is vp8 specific
Unbreaks compilation on arm due to conflicting types for
'ff_decode_block_coeffs_armv6'.
2014-04-04 10:39:29 +02:00
Peter Ross
ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Michael Niedermayer
68014c6ed9 Merge commit 'c3a0b3eb64be441ca897629e8ecd80d5b51fded7'
* commit 'c3a0b3eb64be441ca897629e8ecd80d5b51fded7':
  arm: build: Maintain decoder objects separate from infrastructure objects

Conflicts:
	libavcodec/arm/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-27 20:10:51 +01:00
Diego Biurrun
c3a0b3eb64 arm: build: Maintain decoder objects separate from infrastructure objects 2014-03-27 03:00:05 -07:00
Michael Niedermayer
50b68e323c Merge remote-tracking branch 'qatar/master'
* qatar/master:
  truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 21:23:09 +01:00
Ben Avison
89135716fd truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:50:05 +01:00
Michael Niedermayer
f38af0143c Merge commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3'
* commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3':
  truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/arm/mlpdsp_init_arm.c

See: 87b128d5ef
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:39:10 +01:00
Ben Avison
87b128d5ef truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:22:18 +01:00
Ben Avison
3b5946bcce truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.
Profiling results for overall decode and the output_data function in
particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     339.6  15.1     329.3  16.0    95.8%       +3.1%  (insignificant)
6:2 function  24.6   6.0      9.9    3.1     100.0%      +148.5%
8:2 total     324.5  15.5     323.6  14.3    15.2%       +0.3%  (insignificant)
8:2 function  20.4   3.9      9.9    3.4     100.0%      +104.7%
6:6 total     572.8  20.6     539.9  24.2    100.0%      +6.1%
6:6 function  54.5   5.6      16.0   3.8     100.0%      +240.9%
8:8 total     741.5  21.2     702.5  18.5    100.0%      +5.6%
8:8 function  63.9   7.6      18.4   4.8     100.0%      +247.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:32 +02:00
Ben Avison
483321fe78 truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:10 +02:00
Ben Avison
15a29c39d9 truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:53:52 +02:00
Peter Ross
a490970af2 libavcodec/*/vp8dsp_init: indent
Signed-off-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 13:29:29 +01:00
Peter Ross
89f2f5dbd7 On2 VP7 decoder
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: BBB
previous patch reviewed by jason
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 13:29:05 +01:00
Michael Niedermayer
77bc342975 Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81'
* commit '322a1dda973e802db7b57f2007fad3efcd5bab81':
  dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros

Conflicts:
	libavcodec/arm/hpeldsp_init_arm.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 22:53:33 +01:00