Advise the compiler that the store is eventually going to a uint8_t
buffer. This keeps the compiler from adding alignment hints, which
would cause the memory access to fail.
Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421
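
A minimal portable sketch of the idea, assuming the hint is expressed
by copying through a uint8_t pointer with memcpy (the helper names are
hypothetical and not taken from this change):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store 32 bits to a byte buffer without promising
 * the compiler any alignment beyond that of uint8_t. */
static void store_u32_unaligned(uint8_t *buf, uint32_t value) {
  assert(buf != NULL);
  /* memcpy on a uint8_t pointer carries no 4-byte alignment assumption,
   * unlike a store through a casted (uint32_t *)buf. */
  memcpy(buf, &value, sizeof(value));
}

/* Hypothetical counterpart for loads. */
static uint32_t load_u32_unaligned(const uint8_t *buf) {
  uint32_t value;
  assert(buf != NULL);
  memcpy(&value, buf, sizeof(value));
  return value;
}
```

Compilers typically lower a fixed-size memcpy like this to a single
unaligned-safe access, so the sketch should not add a call.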
Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
This restores d9dce2f48e
Switched to using signed shift-and-narrow. The earlier version was
saturating negative results to 255 instead of 0.
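
A scalar sketch of how this kind of saturation bug can arise. It
assumes the failure comes from narrowing a negative 16-bit sum as if
it were unsigned; the function names and the shift of 7 used in the
note below are illustrative assumptions, and this is a model of the
arithmetic rather than the NEON code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Rounding shift right, then saturate a signed input to [0, 255]. */
static uint8_t narrow_signed(int16_t sum, int shift) {
  int32_t rounded;
  assert(shift >= 1 && shift <= 8);
  rounded = (int32_t)sum + (1 << (shift - 1));
  if (rounded < 0) return 0; /* negative results clamp to 0 */
  rounded >>= shift;
  return (uint8_t)(rounded > 255 ? 255 : rounded);
}

/* Same operation, but the input bits are interpreted as unsigned. */
static uint8_t narrow_unsigned(uint16_t sum, int shift) {
  uint32_t rounded;
  assert(shift >= 1 && shift <= 8);
  rounded = ((uint32_t)sum + (1u << (shift - 1))) >> shift;
  return (uint8_t)(rounded > 255 ? 255 : rounded);
}
```

With a shift of 7, a sum of -128 narrows to 0 in the signed version,
while the same bit pattern read as unsigned (65408) saturates to 255.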
BUG=webm:817
BUG=webm:1273
Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
This reverts commit d9dce2f48e.
This appears to fail the SixtapPredict tests in some configurations,
and possibly some test vectors as well.
Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been made safe with an implementation that uses _u8 loads
and over-reads slightly. Its performance is nearly indistinguishable
from the original.
The unaligned store now has a safe version. It is ~25% slower when
xoffset = 0 (second pass filter only). When the first pass filter (or
both) are in play, the new version is almost identical in speed.
Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.
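
A portable sketch of the over-reading load described above, assuming
it amounts to reading 8 bytes through a byte pointer and keeping only
the first 4 (the helper name is hypothetical, and the real code uses
NEON intrinsics rather than memcpy):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the first 4 bytes at src but reads 8.
 * The caller must guarantee that src[0..7] is readable. */
static uint32_t load_4_bytes_overread(const uint8_t *src) {
  uint8_t wide[8];
  uint32_t value;
  assert(src != NULL);
  /* Byte-wise 8-byte read: no 4-byte alignment is assumed. */
  memcpy(wide, src, sizeof(wide));
  /* Keep only the 4 bytes that are actually needed. */
  memcpy(&value, wide, sizeof(value));
  return value;
}
```

The trade-off is that the 4 extra bytes must lie inside a valid
buffer, even though their values are discarded.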
BUG=webm:817
BUG=webm:1273
Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
These implementations rely on casting the pointers to load the data.
Clang implemented optimizations which automatically add alignment hints
to such loads. The 4x4 filters do not guarantee the necessary alignment
so the resulting assembly is broken.
https://llvm.org/bugs/show_bug.cgi?id=24421
BUG=webm:817
BUG=webm:892
Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe