734 Commits

Author SHA1 Message Date
Michael Niedermayer
110420aac0 Merge commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755'
* commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755':
  idct: Move arm-specific declarations to a header in the arm directory

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-21 01:56:22 +02:00
Diego Biurrun
4de8b60684 idct: Move arm-specific declarations to a header in the arm directory 2014-07-20 13:02:17 -07:00
Michael Niedermayer
521f569734 Merge commit '8b0dd4942aac320d1ca3c40fa7ea1be342c71273'
* commit '8b0dd4942aac320d1ca3c40fa7ea1be342c71273':
  idctdsp: prettyprinting cosmetics

Conflicts:
	libavcodec/idctdsp.c
	libavcodec/ppc/idctdsp.c
	libavcodec/x86/idctdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 22:16:04 +02:00
Michael Niedermayer
42d326353c Merge commit 'b4987f72197e0c62cf2633bf835a9c32d2a445ae'
* commit 'b4987f72197e0c62cf2633bf835a9c32d2a445ae':
  idct: Convert IDCT permutation #defines to an enum

Conflicts:
	libavcodec/idctdsp.c
	libavcodec/x86/cavsdsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 22:01:17 +02:00
Diego Biurrun
8b0dd4942a idctdsp: prettyprinting cosmetics 2014-07-18 07:51:03 -07:00
Diego Biurrun
b4987f7219 idct: Convert IDCT permutation #defines to an enum
Also rename the enum values to be consistent with other DCT permutations.
2014-07-18 07:51:03 -07:00
Michael Niedermayer
d13effb0b4 Merge commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef'
* commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef':
  arm: cosmetics: Consistently use lowercase for shift operators

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 13:17:29 +02:00
Michael Niedermayer
cd4497d8c5 Merge commit 'fe67f3fbb5f9f6a6b60f837f6bc5e087ac11f3bf'
* commit 'fe67f3fbb5f9f6a6b60f837f6bc5e087ac11f3bf':
  arm: cosmetics: Fix a misaligned asm operand

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 12:44:03 +02:00
Martin Storsjö
7e18a727d2 arm: cosmetics: Consistently use lowercase for shift operators
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 11:17:40 +03:00
Martin Storsjö
fe67f3fbb5 arm: cosmetics: Fix a misaligned asm operand
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 11:17:35 +03:00
Michael Niedermayer
c27adb37ef Merge commit '87552d54d3337c3241e8a9e1a05df16eaa821496'
* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
  armv6: Accelerate ff_fft_calc for general case (nbits != 4)

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 03:12:02 +02:00
Ben Avison
87552d54d3 armv6: Accelerate ff_fft_calc for general case (nbits != 4)
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.

The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.

I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:23 +03:00
Ben Avison
5c22e8e4ad armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:08 +03:00
Michael Niedermayer
3a2d1465c8 Merge commit '2d60444331fca1910510038dd3817bea885c2367'
* commit '2d60444331fca1910510038dd3817bea885c2367':
  dsputil: Split motion estimation compare bits off into their own context

Conflicts:
	configure
	libavcodec/Makefile
	libavcodec/arm/Makefile
	libavcodec/dvenc.c
	libavcodec/error_resilience.c
	libavcodec/h264.h
	libavcodec/h264_slice.c
	libavcodec/me_cmp.c
	libavcodec/me_cmp.h
	libavcodec/motion_est.c
	libavcodec/motion_est_template.c
	libavcodec/mpeg4videoenc.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/Makefile
	libavcodec/x86/me_cmp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-17 23:27:40 +02:00
Diego Biurrun
2d60444331 dsputil: Split motion estimation compare bits off into their own context 2014-07-17 09:07:10 -07:00
Michael Niedermayer
21dfabfa64 Merge commit 'adff0a8166345bb9513f0f658043fb6387e90122'
* commit 'adff0a8166345bb9513f0f658043fb6387e90122':
  arm: dsputil: Coalesce all init files

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/arm/dsputil_arm.h
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv6.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-16 20:09:25 +02:00
Diego Biurrun
adff0a8166 arm: dsputil: Coalesce all init files 2014-07-16 06:18:23 -07:00
Ben Avison
42c1cc35b7 armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-13 15:17:04 +02:00
Michael Niedermayer
b8cdf04726 Merge commit '1173320249745eab01c901a39054fc0fced33c87'
* commit '1173320249745eab01c901a39054fc0fced33c87':
  dsputil: Drop unused bit_depth parameter from all init functions

Conflicts:
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-11 20:29:40 +02:00
Diego Biurrun
1173320249 dsputil: Drop unused bit_depth parameter from all init functions 2014-07-11 06:38:26 -07:00
Michael Niedermayer
2d5e9451de Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e'
* commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e':
  dsputil: Split off pixel block routines into their own context

Conflicts:
	configure
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/pixblockdsp_template.c
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-10 01:22:14 +02:00
Diego Biurrun
f46bb608d9 dsputil: Split off pixel block routines into their own context 2014-07-09 08:05:26 -07:00
Michael Niedermayer
1f935c3d0b Merge commit '79fce1ec8abd017593c003917fc123f7119a78d6'
* commit '79fce1ec8abd017593c003917fc123f7119a78d6':
  arm: Avoid using the 'setend' instruction on ARMv7 and newer

Conflicts:
	libavcodec/arm/h264dsp_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-08 14:44:12 +02:00
Martin Storsjö
79fce1ec8a arm: Avoid using the 'setend' instruction on ARMv7 and newer
This instruction is deprecated on ARMv8, and it is serializing on
some ARMv7 cores as well [1].

[1] http://article.gmane.org/gmane.linux.ports.arm.kernel/339293

CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-08 12:09:09 +03:00
Michael Niedermayer
020865f557 Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d'
* commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d':
  dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc

Conflicts:
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 15:36:58 +02:00
Diego Biurrun
c166148409 dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc 2014-07-06 14:26:53 -07:00
Michael Niedermayer
581b5f0b9b Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e'
* commit 'e3fcb14347466095839c2a3c47ebecff02da891e':
  dsputil: Split off IDCT bits into their own context

Conflicts:
	configure
	libavcodec/aic.c
	libavcodec/arm/Makefile
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/asvdec.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.c
	libavcodec/dvdec.c
	libavcodec/dxva2_mpeg2.c
	libavcodec/intrax8.c
	libavcodec/mdec.c
	libavcodec/mjpegdec.c
	libavcodec/mjpegenc_common.h
	libavcodec/mpegvideo.c
	libavcodec/ppc/dsputil_altivec.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/ppc/idctdsp.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:22:11 +02:00
Diego Biurrun
e3fcb14347 dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
Michael Niedermayer
32cf26cc6a Merge commit 'f23d26a6864128001b03876b0b92fffe131f2060'
* commit 'f23d26a6864128001b03876b0b92fffe131f2060':
  h264: avoid using uninitialized memory in NEON chroma mc

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 20:35:33 +02:00
Janne Grunau
f23d26a686 h264: avoid using uninitialized memory in NEON chroma mc
Adapt commit 982b596ea6640bfe218a31f6c3fc542d9fe61c31 for the arm and
aarch64 NEON asm. 5-10% faster on Cortex-A9.
2014-06-23 16:32:15 +02:00
Michael Niedermayer
99497b4683 Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2'
* commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2':
  dsputil: Split audio operations off into a separate context

Conflicts:
	configure
	libavcodec/takdec.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-22 17:58:28 +02:00
Diego Biurrun
9a9e2f1c8a dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
Michael Niedermayer
08c5859f17 avcodec: add simpleauto idct
This will pick the "best" simple idct compatible idct

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 14:28:01 +02:00
Michael Niedermayer
2b05db4f81 Merge commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9'
* commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9':
  dsputil: Split clear_block*/fill_block* off into a separate context

Conflicts:
	configure
	libavcodec/asvdec.c
	libavcodec/dnxhddec.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.h
	libavcodec/eamad.c
	libavcodec/intrax8.c
	libavcodec/mjpegdec.c
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/vc1dec.c
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 04:54:38 +02:00
Diego Biurrun
e74433a8e6 dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
Christophe Gisquet
ccff45a0d3 apedsp: move to llauddsp
APE is not the sole codec using scalarproduct_and_madd_int16.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-05 20:31:59 +02:00
Michael Niedermayer
83e8650f77 Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e'
* commit '896a5bff64264f4d01ed98eacc97a67260c1e17e':
  arm: check if AS supports .dn

Conflicts:
	configure
	libavcodec/arm/vc1dsp_init_neon.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-03 18:19:21 +02:00
Janne Grunau
896a5bff64 arm: check if AS supports .dn
Move the GNU as check before the arch specific asm checks since the .dn
check requires gas compatible assembler.

Disable the VC-1 motion compensation NEON asm which is the only part
using that directive. The integrated assembler in the upcoming clang 3.5
does not support .dn/.qn without plans to change that. Too much effort
to implement it while it is rarely used.

http://llvm.org/bugs/show_bug.cgi?id=18199.
2014-06-03 14:23:03 +02:00
Michael Niedermayer
40f3a87c10 Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3'
* commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3':
  dsputil: Move APE-specific bits into apedsp

Conflicts:
	libavcodec/arm/int_neon.S
	libavcodec/x86/dsputil.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 00:59:15 +02:00
Diego Biurrun
054013a0fc dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
Michael Niedermayer
8b30702c44 Merge commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1'
* commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1':
  mpegvideo: move the MpegEncContext fields used from arm asm to the beginning

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-30 00:23:01 +02:00
Anton Khirnov
6a13505c06 mpegvideo: move the MpegEncContext fields used from arm asm to the beginning
This should reduce the frequency with which the offsets need to be
updated.
2014-04-29 14:49:42 +02:00
Ben Avison
9d8ecdd8ca vc-1: Add platform-specific start code search routine to VC1DSPContext.
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:36:11 +02:00
Ben Avison
270cede3f3 h264: Move search code search functions into separate source files.
This permits re-use with parsers for codecs which use similar start codes.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:35:56 +02:00
Michael Niedermayer
06e664366a Merge commit 'a88e1d1c598e641eecd5d43730211d91c82787c6'
* commit 'a88e1d1c598e641eecd5d43730211d91c82787c6':
  lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 00:55:40 +02:00
Janne Grunau
a88e1d1c59 lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets 2014-04-24 18:28:26 +02:00
Michael Niedermayer
af89a685c4 avcodec/arm/vc1dsp_init_neon: fix code so it compiles and passes fate-vc1
The original patch  seems to be missing a 16x16 function though

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 20:32:21 +02:00
Christophe Gisquet
319235c67c vc1dsp: introduce cases for 8x8 and 16x16
This allows further unrolling the DSP implementation where possible.

x86 and ARM DSP modified by simply moving the multiple calls from vc1dec
to the DSP code. Decoding improvements should only occurs because of the
compiler actually able to unroll more.

Decoding time: ~8.80s -> 8.64s (ie around 2%)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 18:25:36 +02:00
Michael Niedermayer
5440151fa4 Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d'
* commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d':
  Remove a number of unnecessary dsputil.h #includes

Conflicts:
	libavcodec/h264pred.c
	libavcodec/vc1dsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 18:54:15 +02:00
Diego Biurrun
3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00