Tested-by: Andreas Haupt
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
(cherry picked from commit cab6302534962331753fb69c674df86a458b098d)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b8e57113ecba5494d4bf47c29634392ea5fdb17b':
arm: Avoid using the 'setend' instruction on ARMv7 and newer
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This instruction is deprecated on ARMv8, and it is serializing on
some ARMv7 cores as well [1].
[1] http://article.gmane.org/gmane.linux.ports.arm.kernel/339293
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit 79fce1ec8abd017593c003917fc123f7119a78d6)
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
The overread avoidance fix in cbddee1cca0ebd01e8c5aa694d31228eb4de4b41
broke the computation for the last row since it prevented the safe
reading from the height+1-th row.
CC: libav-stable@libav.org
(cherry picked from commit 61985ad72c47bbb668f2d3923bf5c9df83e79323)
Based on a patch by Russel King <rmk+libav@arm.linux.org.uk>
Bug-Id: 646
CC: libav-stable@libav.org
(cherry picked from commit cbddee1cca0ebd01e8c5aa694d31228eb4de4b41)
The integrated arm assembler in clang-503.0.38 (Xcode-5.1) seems
to get confused by the offset of 4 and decides to use a non-wide
thumb encoding. That fails since the labels are out of range of
the limited offset a 16-bit thumb encoding offers.
Was missed in aeaf268e52fc11c1f64914a319e0edddf1346d6a when integrating
clear_blocks into the idct.
(cherry picked from commit 4506a854a4d846692ba71daeeff661dc214c8fa2)
The overread avoidance fix in cbddee1cca0ebd01e8c5aa694d31228eb4de4b41
broke the computation for the last row since it prevented the safe
reading from the height+1-th row.
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae':
dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5bcbb516f2ff45290ef7995b081762e668693672':
arm: Add X() around all references to extern symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* qatar/master:
vp8: Use 2 registers for dst_stride and src_stride in neon bilin filter
Conflicts:
libavcodec/arm/vp8dsp_neon.S
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7151c5d04aed3b496c21f713dcb603e2cbdb9c49':
arm: Use full filenames as multiple inclusion guards
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
arm: Add an option for making sure NEON registers aren't clobbered
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5dae4872357613a0b51120b54a4c5221e0ec3f69':
arm: Allow overriding the alignment set in the function macro
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b7b932f5e3602bd34c3cc634b71c8bbbc0fb8dc0':
arm: Remove a leftover define for the pld instruction
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
This file no longer uses the pld instruction at all, all such uses
have been split into hpeldsp_arm.S.
Signed-off-by: Martin Storsjö <martin@martin.st>