I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).
There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact b/c
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...
Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
This commit replaces the vp8_ prefixed subtract function with the
common vpx_subtract_block function. It removes redundant SIMD
optimization codes and unit tests.
Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
fails unit tests:
[ FAILED ] NEON/VP8SubpelVarianceTest.ExtremeRef/0, where GetParam() = (3, 3, 0x14e36d, 0)
[ FAILED ] NEON/VP8SubpelVarianceTest.Ref/0, where GetParam() = (3, 3, 0x14e36d, 0)
the tests were recently enabled in:
eb88b17 Make vp9 subpixel match vp8
the functions likely haven't changed since being converted from assembly
Change-Id: I6141717b111b8f735f436c160d74270af53ef722
Clang adds alignment hints when casting up the loads/stores. Although
this should be safe for most paths, it's causing some crashes. Either
the source of the misalignment needs to be determined and adjusted or
the intrinsics need to be rewritten to avoid using the cast to load the
data.
BUG=817,892
Change-Id: Ia3aa824d6a4cd97e14325ff49dc730b6f85ec7e8
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
The obj_int_extract code is no longer worth maintaining. It creates
significant issues when adapting for different build systems and no
longer offers as significant of a performance benefit due to
improvements in intrinsics.
Source files will remain until the various third-party builds are updated.
The neon fast quantizer has been moved to intrinsics. The armv6 version
has been removed because so few remaining targets require it.
Compilers and processors have improved significantly since the
pack_tokens code was written. The assembly is no longer faster than the
C code.
pack_tokens were the only optimizations for the armv5te targets so the targets
will be removed after the test infrastructure has been updated.
BUG=710
Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
Use intrinsics for neon quantization. Slight loss (<5%) of performance
compared to the assembly. Roughly 10x faster on arm64 because that was
running C code before.
Change-Id: I7cf5242d8f29b7eab5bca6a1c20c89c9fc9ca66d
vp8_build_intra_predictors_mbuv_s().
This patch replaces the assembly version with an intrinsic
version.
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~2.6%.
Change-Id: I9ef65bad929450c0215253fdae1c16c8b4a8f26f
C version and sse2 version, and off by default.
For the test clip used, the sse2 performance improved by ~5.6%
Change-Id: Ic2d815968849db51b9d62085d7a490d0e01574f6