Scott LaVarnway
|
8e6022844f
|
vpx: [x86] add vpx_satd_avx2()
SSE2 intrinsic vs AVX2 intrinsic speed gains:
blocksize 16: ~1.33x
blocksize 64: ~1.51x
blocksize 256: ~3.03x
blocksize 1024: ~3.71x
Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
|
2017-11-10 12:24:12 -08:00 |
|
Scott LaVarnway
|
3bf02ad74a
|
vpx: hadamard: use ptrdiff_t instead of int for stride
Eliminates the following sign-extension instruction from the
x86 (64-bit) intrinsic code:
movslq %esi,%rax
Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
|
2017-10-26 11:41:48 -07:00 |
|
Scott LaVarnway
|
512bf4e029
|
vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix
Use an intermediate buffer before storing to coeffs when
highbitdepth is enabled.
Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b
|
2017-10-23 08:49:32 -07:00 |
|
Scott LaVarnway
|
4906cea027
|
vpx: [x86] vpx_hadamard_16x16_avx2() improvements
~10% performance gain. Fixed the cosmetic issues noted in the
previous commit.
Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13
|
2017-10-20 08:55:06 -07:00 |
|
Scott LaVarnway
|
55c126a5d7
|
vpx: [x86] add vpx_hadamard_16x16_avx2()
This version is ~1.91x faster than the SSE2 version. When
highbitdepth is enabled, it is ~1.74x faster.
Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
|
2017-10-18 18:00:00 -07:00 |
|