Commit Graph

11 Commits

Author SHA1 Message Date
Johann
2057d3ef75 use memcpy for unaligned neon stores
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid getting alignment hints which would cause the
memory access to fail.

Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421

Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
2017-05-17 12:11:31 -07:00
Johann
1d14e42df7 Un-Revert "Restore vp8_sixtap_predict4x4_neon"
This restores d9dce2f48e

Switched to using signed shift-and-narrow. Instead of saturating
negative results to 0, it was saturating them to 255.

BUG=webm:817
BUG=webm:1273

Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
2016-09-23 14:58:57 -07:00
Johann Koenig
7795e99296 Revert "Restore vp8_sixtap_predict4x4_neon"
This reverts commit d9dce2f48e.

Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.

Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
2016-09-16 06:12:49 +00:00
Johann
d9dce2f48e Restore vp8_sixtap_predict4x4_neon
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421

The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.

The store, when unaligned, has a version that is ~25% slower but safe
when xoffset = 0 (second pass filter only). When the first pass filter
(or both) are in play, the new version is almost identical in speed.

Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.

BUG=webm:817
BUG=webm:1273

Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
2016-09-15 14:56:47 -07:00
Jim Bankoski
3e04114f3d prepend ++ instead of post in for loops.
Applied the following regex  :
search for: (for.*\(.*;.*;) ([a-zA-Z_]*)\+\+\)
replace with: \1 ++\2)

This misses some for loops:
ie : for (mb_col = 0; mb_col < oci->mb_cols; mb_col++, mi++)

Change-Id: Icf5f6fb93cced0992e0bb71d2241780f7fb1f0a8
2016-07-18 06:54:50 -07:00
clang-format
81a6739533 vp8: apply clang-format
Change-Id: I7605b6678014a5426ceb45c27b54885e0c4e06ed
2016-07-15 19:28:44 -07:00
Johann
ce11055d57 Remove sixtap/bilinear 4x4 neon implementations
These implementations rely on casting the pointers to load the data.
Clang implemented optimizations which automatically add alignment hints
to such loads. The 4x4 filters do not guarantee the necessary alignment
so the resulting assembly is broken.
https://llvm.org/bugs/show_bug.cgi?id=24421

BUG=webm:817
BUG=webm:892

Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe
2016-05-06 17:20:15 -07:00
Jia Jia
0ae866bd19 vp8/vp9: neon: msvc: move the 'ifdef _MSC_VER' bit to vpx_ports/mem.h.
fix compiling warning.

Change-Id: If8706a9046436f704c597e4275a6810c76ba7daa
2014-09-14 01:43:54 +08:00
James Zern
5e30127c7a vp8_sixtap_predict4x4_neon: init src vectors
quiets uninitialized warnings on the first load.

Change-Id: Ied9b03928537a9ed2cd414b9e8a0be00191b0f32
2014-07-10 23:48:47 -07:00
Martin Storsjo
d5d82a5e1a arm: Add a no-op define of __builtin_prefetch for MSVC
Both GCC and RVCT/ARMCC support __builtin_prefetch, but MSVC
doesn't.

Change-Id: I44e1eecead61bc88d8fdfd3fef03d76d4f5afe08
2014-05-07 10:43:24 +03:00
James Yu
08e38f06db VP8 for ARMv8 by using NEON intrinsics 14
Add sixtappredict_neon.c
- vp8_sixtap_predict16x16_neon
- vp8_sixtap_predict8x8_neon
- vp8_sixtap_predict8x4_neon
- vp8_sixtap_predict4x4_neon

Change-Id: I3b02fce48ae2e6c6099041ba5ddd7b090f1463b9
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-03 19:07:12 -07:00