The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
This file no longer uses the pld instruction at all, all such uses
have been split into hpeldsp_arm.S.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'a03a642d5ceb5f2f7c6ebbf56ff365dfbcdb65eb':
h264: do not use 422 functions for monochrome
See: 07abf13da4
Merged-by: Michael Niedermayer <michaelni@gmx.at>
For:
ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon
ff_put_pixels8x8_neon
ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00)
Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans
Rullgard.
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 508.8 23.4 185.4 9.0 +174.4%
Overall 3068.5 31.7 2752.1 29.4 +11.5%
In combination with the preceding patch:
Before After
Mean StdDev Mean StdDev Change
Overall 2925.6 26.2 2752.1 29.4 +6.3%
Signed-off-by: Martin Storsjö <martin@martin.st>
When building for iOS in thumb mode, gas-preprocessor.pl doesn't
mark unused labels as thumb functions (as it does for other
local labels, where it can figure out that they are functions
due to being referenced in branch instructions). This leads to
linker warnings for some of those local labels, such as:
ld: warning: ARM function not 4-byte aligned: __a_evaluation from
libavcodec/libavcodec.a(simple_idct_arm.o)
Therefore, comment them out since they don't have any function.
They do still have a value in documenting key points in the
assembly source though.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'd6e4f5fef0d811e180fd7541941e07dca9e11dc0':
arm: Add VFP-accelerated version of int32_to_float_fmul_array8
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ce9ed10ac27b9cf32a6257e083ea2f052692d971':
arm: Add VFP-accelerated version of int32_to_float_fmul_scalar
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '41ef1d360bac65032aa32f6b43ae137666507ae5':
arm: Add VFP-accelerated version of synth_filter_float
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Before After
Mean StdDev Mean StdDev Change
This function 1323.0 98.0 746.2 60.6 +77.3%
Overall 15400.0 336.4 14147.5 288.4 +8.9%
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 1389.3 4.2 967.8 35.1 +43.6%
Overall 15577.5 83.2 15400.0 336.4 +1.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 868.2 33.5 436.0 27.0 +99.1%
Overall 15973.0 223.2 15577.5 83.2 +2.5%
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 2653.0 28.5 1108.8 51.4 +139.3%
Overall 17049.5 408.2 15973.0 223.2 +6.7%
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 366.2 18.3 277.8 13.7 +31.9%
Overall 18420.5 489.1 17049.5 408.2 +8.0%
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 1175.0 4.4 366.2 18.3 +220.8%
Overall 19285.5 292.0 18420.5 489.1 +4.7%
Signed-off-by: Martin Storsjö <martin@martin.st>
Before After
Mean StdDev Mean StdDev Change
This function 9295.0 114.9 4853.2 83.5 +91.5%
Overall 23699.8 397.6 19285.5 292.0 +22.9%
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '36a7df8cf1115aa37a1b0d42324ecde5ab6c2304':
arm: Only build the FFT init files if FFT is enabled
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9b9b2e9f3036abfd42916bcf734af14b4cb686aa':
build: arm: cosmetics: Place all OBJS declarations in alphabetical order
Merged-by: Michael Niedermayer <michaelni@gmx.at>
A few of the h264qpel neon functions are shared with other
hpeldsp functions in this file.
This fixes standalone compilation of the h264 decoder on arm.
Signed-off-by: Martin Storsjö <martin@martin.st>
It was previously declared as int.
Does not change fate results for x86.
Conflicts:
libavcodec/ppc/fmtconvert_altivec.c
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7f75f2f2bd692857c1c1ca7f414eb30ece3de93d':
ppc: Drop unnecessary ff_ name prefixes from static functions
x86: Drop unnecessary ff_ name prefixes from static functions
arm: Drop unnecessary ff_ name prefixes from static functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This way, the special IDCT permutations are no longer needed. This
is similar to how H264 does it, and removes the dsputil dependency
imposed by the scantable code.
Also remove the unused type == 0 cases from the plain C version
of the idct.
Signed-off-by: Martin Storsjö <martin@martin.st>
With the current code it fails due to running out
of registers.
So code the store offsets manually into the assembler
instead.
Passes "make fate-dts".
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
This way, the special IDCT permutations are no longer needed. Bfin code
is disabled until someone updates it. This is similar to how H264 does
it, and removes the dsputil dependency imposed by the scantable code.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '76b19a3984359b3be44d4f7e4e69b7b86729a622':
Fix a number of incorrect intmath.h #includes.
avconv: remove an unused variable
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a846dccb29d2bb0798af1d47d06100eda9ca87cc':
h264chroma: x86: Fix building with yasm disabled
rv34: Drop now unnecessary dsputil dependencies
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '620289a20e022b9c16c10d546ef86cc0bb77cc84':
sh4: Fix silly type vs. variable name search and replace typo
configure: Group all hwaccels together in a separate variable
Add av_cold attributes to arch-specific init functions
Conflicts:
configure
libavcodec/arm/mpegvideo_armv5te.c
libavcodec/x86/mlpdsp.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideoenc.c
libavcodec/x86/videodsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '25841dfe806a13de526ae09c11149ab1f83555a8':
Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
Conflicts:
libavcodec/alpha/dsputil_alpha.c
libavcodec/dsputil_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Use proper "" quotes for local header #includes
ppc: fmtconvert: Drop two unused variables.
bink demuxer: set framerate.
Conflicts:
libavcodec/kbdwin.c
libavcodec/ppc/fmtconvert_altivec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This makes the plain-armv6 version use the same registers as the
armv6t2 version above.
This fixes fate-vp8 on plain-armv6 devices.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '6bdb841b46d170d58488deaed720729b79223b1d':
arm: h264qpel: use neon h264 qpel functions only if supported
* bug was fixed previously (in merge of buggy code):
h264: copy h264qpel dsp context to slice thread copies
Merged-by: Michael Niedermayer <michaelni@gmx.at>