371 Commits

Author SHA1 Message Date
Scott LaVarnway
4cae64c32c vpxdsp: [x86] add highbd_d117_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.04x

C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x

BUG=webm:1411

Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
2017-09-28 14:45:28 -07:00
Scott LaVarnway
80992a746c Merge "vpxdsp: [x86] add highbd_d153_predictor functions" 2017-09-27 20:40:21 +00:00
Linfeng Zhang
dbbbd44304 fix signed integer overflow of idct
Exposed by fuzz test in high bitdepth.
The bug was introduced in commit 64653fa.

BUG=webm:1466

Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
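In a high-bitdepth build the coefficient type is assumed to be 32 bits wide. Once the cospi constants become a 16-bit type, a product such as `coeff * cospi_16_64` is evaluated in 32-bit `int` arithmetic and can overflow for large (for example fuzzed) inputs. The sketch below shows the usual remedy of widening one operand to 64 bits before multiplying. The helper name `mul_round_shift` is hypothetical, and `DCT_CONST_BITS` = 14 and `cospi_16_64` = 11585 are assumed to match the libvpx constants.

```c
#include <stdint.h>

/* Assumed to match libvpx's transform constants. */
#define DCT_CONST_BITS 14
static const int16_t cospi_16_64 = 11585;

/* Hypothetical helper: multiply a 32-bit coefficient by a 16-bit cospi
   constant, then round and shift by DCT_CONST_BITS. Casting one operand
   to int64_t makes the multiply 64-bit, so it cannot overflow the way a
   plain int32_t * int16_t product (evaluated as int) can. */
static int32_t mul_round_shift(int32_t coeff, int16_t cospi) {
  const int64_t product = (int64_t)coeff * cospi;
  return (int32_t)((product + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS);
}
```

With `coeff` = 200000, the unwidened product 200000 * 11585 = 2,317,000,000 exceeds INT32_MAX (2,147,483,647), while the widened version returns 141418.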
Scott LaVarnway
19c45ccd43 vpxdsp: [x86] add highbd_d153_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.95x

C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x

BUG=webm:1411

Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
2017-09-27 11:01:11 -07:00
Linfeng Zhang
28762341ac Merge changes Ib9105462,Idfac00ed,If8d8a0e2
* changes:
  cosmetics: NEON scaling code
  Refactor convolve NEON code
  Refactor convolve code
2017-09-26 16:10:46 +00:00
Scott LaVarnway
a059dc0986 Merge "vpxdsp: [x86] add highbd_d45_predictor functions" 2017-09-25 11:34:14 +00:00
Scott LaVarnway
cf82f7276e vpxdsp: [x86] add highbd_d45_predictor functions
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x

BUG=webm:1411

Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
2017-09-22 05:20:12 -07:00
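A D45 predictor fills the block along 45-degree diagonals taken from the row above it, smoothed with a 3-tap [1, 2, 1] filter. The sketch below is a simplified textbook formulation, not the libvpx C reference. It assumes `above` holds 2 * bs pixels (the above and above-right rows) and clamps reads past the end to the last pixel, which may differ from libvpx's edge handling. The names `avg3` and `d45_predictor_sketch` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* [1, 2, 1] / 4 smoothing filter with rounding. */
static uint16_t avg3(uint16_t a, uint16_t b, uint16_t c) {
  return (uint16_t)((a + 2 * b + c + 2) >> 2);
}

/* Hypothetical high-bitdepth D45 sketch: pixel (r, c) depends only on
   r + c, so each up-right diagonal of the block holds one filtered value.
   `above` is assumed to hold 2 * bs pixels; indices past the end are
   clamped to the last one. */
static void d45_predictor_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                                 const uint16_t *above) {
  const int last = 2 * bs - 1;
  int r, c;
  for (r = 0; r < bs; ++r) {
    for (c = 0; c < bs; ++c) {
      const int i = r + c;
      const int i1 = i + 1 > last ? last : i + 1;
      const int i2 = i + 2 > last ? last : i + 2;
      dst[r * stride + c] = avg3(above[i], above[i1], above[i2]);
    }
  }
}
```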
Linfeng Zhang
d586cdb4d4 Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
BUG=webm:1450

Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-20 14:13:26 -07:00
Linfeng Zhang
76a3d3fcc5 Remove the unnecessary upcasts of (int)cospi_{1...31}_64
BUG=webm:1450

Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
2017-09-20 14:13:26 -07:00
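The `(int)` upcasts are redundant because of C's integer promotions: operands narrower than `int`, such as `int16_t` constants, are promoted to `int` before arithmetic, and the product of two 16-bit values always fits in a 32-bit `int`. A small illustration, assuming a 32-bit `int` and that `cospi_1_64` = 16364 as in libvpx; the function names are hypothetical.

```c
#include <stdint.h>

static const int16_t cospi_1_64 = 16364; /* assumed libvpx value */

/* Both operands are promoted to int before the multiply, so no explicit
   cast is needed; the largest possible magnitude of an int16_t * int16_t
   product, 32768 * 32768 = 2^30, fits comfortably in a 32-bit int. */
static int square_cospi_1(void) { return cospi_1_64 * cospi_1_64; }

/* The same computation with the redundant casts. */
static int square_cospi_1_cast(void) {
  return (int)cospi_1_64 * (int)cospi_1_64;
}
```

This guarantee only covers products of two 16-bit values. When the other operand is a 32-bit high-bitdepth coefficient, the product can still exceed `int`.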
Linfeng Zhang
64653fa133 Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
The unnecessary upcast to (int) will be cleaned up later.

BUG=webm:1450

Change-Id: Ia234575206d5a74540526924b06ed3939322d063
2017-09-20 14:13:26 -07:00
Scott LaVarnway
b85e391ac8 Merge "vpxdsp: [x86] add highbd_d63_predictor functions" 2017-09-20 11:39:28 +00:00
Linfeng Zhang
bf8bdae913 Refactor convolve code
Inline a couple of static functions into their callers.

Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0
2017-09-19 16:28:31 -07:00
Scott LaVarnway
bc86e2c6a2 vpxdsp: [x86] add highbd_d63_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.94x

C vs SSSE3 speed gains:
_8x8 : ~8.69x
_16x16 : ~6.32x
_32x32 : ~5.33x

BUG=webm:1411

Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4
2017-09-19 15:47:22 -07:00
Johann
eb4238ac70 Revert "Revert "quantize avx: copy 32x32 implementation""
This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65.

Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.

Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
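SSE/SSSE3 code that uses aligned 128-bit loads and stores (for example `_mm_store_si128`) requires 16-byte-aligned addresses, which is why the reference buffers must be aligned. A minimal C11 sketch of declaring such buffers and checking their alignment follows; the buffer names and sizes are hypothetical, and libvpx itself uses its own `DECLARE_ALIGNED` macro for this.

```c
#include <stdalign.h>
#include <stdint.h>

/* 32x32 block: 1024 coefficients (hypothetical test buffers). */
enum { kNumCoeffs = 32 * 32 };

/* alignas(16) makes the addresses valid for 128-bit aligned SIMD
   accesses such as _mm_load_si128 / _mm_store_si128. */
static alignas(16) int16_t ref_qcoeff[kNumCoeffs];
static alignas(16) int16_t ref_dqcoeff[kNumCoeffs];

static int is_aligned16(const void *p) {
  return ((uintptr_t)p & 15) == 0;
}
```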
Scott LaVarnway
d6c9bbc2b6 vpxdsp: [x86] add highbd_d207_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.31x

C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x

BUG=webm:1411

Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-11 07:36:24 -07:00
Linfeng Zhang
ef41c6286d Update convolve functions' assertions
So that 4 to 1 frame scaling can call them.

Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5
2017-09-07 12:33:58 -07:00
Linfeng Zhang
7219f31904 Merge "Remove get_filter_base() and get_filter_offset() in convolve" 2017-09-06 22:39:15 +00:00
Linfeng Zhang
d331e7a1c0 Remove get_filter_base() and get_filter_offset() in convolve
so that the convolve functions are independent of table alignment.

Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-09-05 15:22:36 -07:00
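Without those helpers, a convolve routine can take the kernel table and the starting sub-pixel phase directly, instead of recovering them from a filter pointer by relying on the table's alignment. The scalar sketch below assumes VP9-style conventions: 8-tap kernels, positions in 1/16-pel (q4) units, and `FILTER_BITS` = 7. `convolve_horiz_sketch` and its parameters are hypothetical, and so is the `x_step_q4 <= 64` assertion, chosen because a 4:1 downscale advances 4 * 16 = 64 q4 units per output pixel.

```c
#include <assert.h>
#include <stdint.h>

#define SUBPEL_BITS 4
#define SUBPEL_MASK 15
#define SUBPEL_TAPS 8
#define FILTER_BITS 7

typedef int16_t InterpKernel[SUBPEL_TAPS];

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar horizontal convolve. The kernel table and starting
   phase x0_q4 are passed explicitly, so nothing depends on how the table
   is aligned in memory. */
static void convolve_horiz_sketch(const uint8_t *src, uint8_t *dst, int w,
                                  const InterpKernel *x_filters, int x0_q4,
                                  int x_step_q4) {
  int x, k;
  int x_q4 = x0_q4;
  assert(x_step_q4 <= 64); /* hypothetical bound allowing 4:1 scaling */
  src -= SUBPEL_TAPS / 2 - 1; /* center the 8 taps on the sample */
  for (x = 0; x < w; ++x) {
    const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS]; /* integer part */
    const int16_t *const filter = x_filters[x_q4 & SUBPEL_MASK]; /* phase */
    int sum = 0;
    for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * filter[k];
    dst[x] = clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
    x_q4 += x_step_q4;
  }
}
```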
Scott LaVarnway
bc4bcca3fd vpxdsp: [x86] add highbd_dc_128_predictor functions
C vs SSE2 speed gains:
_4x4 : ~7.64x
_8x8 : ~16.60x
_16x16 : ~8.15x
_32x32 : ~5.05x

BUG=webm:1411

Change-Id: If165d419711cfda901bd428a05ca1560a009e62e
2017-09-05 07:57:42 -07:00
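A DC_128 predictor ignores the neighbouring pixels and fills the block with mid-grey for the current bit depth, 128 << (bd - 8), which equals 1 << (bd - 1). A scalar sketch with a hypothetical name and simplified signature:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: fill a bs x bs block with mid-grey. */
static void highbd_dc_128_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                                 int bd) {
  const uint16_t val = (uint16_t)(128 << (bd - 8)); /* == 1 << (bd - 1) */
  int r, c;
  for (r = 0; r < bs; ++r) {
    for (c = 0; c < bs; ++c) dst[c] = val;
    dst += stride;
  }
}
```

Because every output value is the same constant, a SIMD version reduces to broadcasting it once and storing it row by row.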
Scott LaVarnway
c39a05ff61 vpxdsp: [x86] add highbd_dc_left_predictor functions
C vs SSE2 speed gains:
_4x4 : ~6.49x
_8x8 : ~10.82x
_16x16 : ~7.61x
_32x32 : ~5.29x

BUG=webm:1411

Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b
2017-08-30 09:29:06 -07:00
Scott LaVarnway
f783e3a75d vpxdsp: [x86] add highbd_dc_top_predictor functions
C vs SSE2 speed gains:
_4x4 : ~7.39x
_8x8 : ~11.36x
_16x16 : ~8.68x
_32x32 : ~4.33x

BUG=webm:1411

Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58
2017-08-29 12:53:30 -07:00
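DC_TOP averages only the row above the block and DC_LEFT only the column to its left. The rounded mean, (sum + bs / 2) / bs, then fills the whole block. A scalar sketch covering both, with hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded mean of bs edge pixels. */
static uint16_t edge_mean(const uint16_t *edge, int bs) {
  uint32_t sum = 0;
  int i;
  for (i = 0; i < bs; ++i) sum += edge[i];
  return (uint16_t)((sum + (uint32_t)(bs >> 1)) / (uint32_t)bs);
}

static void fill_block(uint16_t *dst, ptrdiff_t stride, int bs, uint16_t v) {
  int r, c;
  for (r = 0; r < bs; ++r, dst += stride)
    for (c = 0; c < bs; ++c) dst[c] = v;
}

/* Hypothetical sketches: DC_TOP uses `above`, DC_LEFT uses `left`. */
static void highbd_dc_top_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                                 const uint16_t *above) {
  fill_block(dst, stride, bs, edge_mean(above, bs));
}

static void highbd_dc_left_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                                  const uint16_t *left) {
  fill_block(dst, stride, bs, edge_mean(left, bs));
}
```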
Scott LaVarnway
30d9a1916c vpxdsp: [x86] add highbd_h_predictor functions
C vs SSE2 speed gains:
_4x4 : ~8.12x
_8x8 : ~9.71x
_16x16 : ~8.21x
_32x32 : ~5.0x

BUG=webm:1422

Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993
2017-08-28 17:31:18 -07:00
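The H predictor copies each pixel of the left column across its row. A scalar sketch with a hypothetical name; a SIMD version can broadcast each left pixel into a vector register and store it once per row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: row r of the block is filled with left[r]. */
static void highbd_h_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                            const uint16_t *left) {
  int r, c;
  for (r = 0; r < bs; ++r, dst += stride)
    for (c = 0; c < bs; ++c) dst[c] = left[r];
}
```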
Marco Paniconi
8c42237bb2 Revert "quantize avx: copy 32x32 implementation"
This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301.

Reason for revert: failures in AVX/VP9QuantizeTest in nightly tests.

Original change's description:
> quantize avx: copy 32x32 implementation
> 
> Ensure avx and ssse3 stay in sync by testing them against each other.
> 
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org

Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-08-25 16:56:08 +00:00
Johann
f60d1dcd3d quantize avx: copy 32x32 implementation
Ensure avx and ssse3 stay in sync by testing them against each other.

Change-Id: I699f3b48785c83260825402d7826231f475f697c
2017-08-24 10:42:34 -07:00
Johann
1787e7dbe0 quantize ssse3: copy implementation to intrinsics
Still does not pass tests. It does match the previous assembly, although
saving the sign before multiplying is dubious.

Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
2017-08-24 07:47:51 -07:00
Johann Koenig
f53b656207 Merge "quantize avx: copy implementation to intrinsics" 2017-08-23 21:14:13 +00:00
Scott LaVarnway
1aad50c092 Merge "vpx_dsp: get32x32var_avx2() cleanup" 2017-08-23 19:59:25 +00:00
Johann
7c27872164 quantize avx: copy implementation to intrinsics
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.

Very close in speed to the assembly.

Can run in 32-bit builds, unlike the assembly. Allows reworking the function
prototype to use structs.

Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 09:19:16 -07:00
Johann
b9c1dcc5fa quantize ssse3: copy style from sse2
Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598
2017-08-22 14:25:27 -07:00
Johann
75752ab7c0 quantize sse2: copy opts from ssse3
Simplify eob calculations based on ssse3 implementation.

General clean up and re-scoping.

Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee
2017-08-22 13:01:44 -07:00
Johann
c02fdd0258 quantize: ignore skip_block in x86
Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8
2017-08-21 14:37:03 -07:00
Scott LaVarnway
eab3f5e0cc vpx_dsp: get32x32var_avx2() cleanup
renamed to get32x16var_avx2()

BUG=webm:1404

Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036
2017-08-18 13:44:09 -07:00
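Block variance is derived from the sum and the sum of squares (sse) of the source/reference differences: variance = sse - sum * sum / N for N pixels. Both quantities are additive, so a 32x32 result can be assembled from two 32x16 halves, which is presumably why a 32x16 helper suffices. The sketch is scalar and hypothetical (names and signature), not the AVX2 code.

```c
#include <stdint.h>

/* Hypothetical scalar helper: accumulate sum and sse of (src - ref)
   over a w x h block. */
static void get_var_sketch(const uint8_t *src, int src_stride,
                           const uint8_t *ref, int ref_stride, int w, int h,
                           uint32_t *sse, int *sum) {
  int r, c;
  *sse = 0;
  *sum = 0;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = src[r * src_stride + c] - ref[r * ref_stride + c];
      *sum += diff;
      *sse += (uint32_t)(diff * diff);
    }
  }
}

/* 32x32 variance from two 32x16 halves; N = 1024, so divide by >> 10. */
static uint32_t variance32x32_sketch(const uint8_t *src, int src_stride,
                                     const uint8_t *ref, int ref_stride) {
  uint32_t sse0, sse1;
  int sum0, sum1;
  int64_t sum;
  get_var_sketch(src, src_stride, ref, ref_stride, 32, 16, &sse0, &sum0);
  get_var_sketch(src + 16 * src_stride, src_stride, ref + 16 * ref_stride,
                 ref_stride, 32, 16, &sse1, &sum1);
  sum = (int64_t)sum0 + sum1;
  return sse0 + sse1 - (uint32_t)((sum * sum) >> 10);
}
```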
Scott LaVarnway
2c5478e383 Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup" 2017-08-18 20:30:59 +00:00
Scott LaVarnway
2f7497f341 vpx_dsp: vpx_get16x16var_avx2() cleanup
BUG=webm:1404

Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a
2017-08-18 12:23:49 -07:00
James Zern
bb15fd51be highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo
135 -> 34

fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12]

Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682
2017-08-17 15:37:38 -07:00
James Zern
e038d1610e inv_txfm_sse2.h: correct idct*/iadst* prototypes
fixes mismatch between prototypes and definitions

Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652
2017-08-16 23:06:09 -07:00
Linfeng Zhang
d72e20b123 Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
BUG=webm:1412

Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
2017-08-14 17:05:22 -07:00
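The numeric suffixes give how many coefficients, counted in scan order up to the end-of-block position (eob), each variant handles. The first 34 coefficients all fall in the top-left 8x8 and the first 135 in the top-left 16x16, so the smaller variants can skip rows and columns known to be zero. A sketch of how a caller might pick a variant from eob; the enum and function are hypothetical, and eob == 1 is assumed to go to a separate DC-only `_1` function.

```c
/* Hypothetical selector for the partial 32x32 inverse transforms. */
typedef enum {
  IDCT32_1 = 1,
  IDCT32_34 = 34,
  IDCT32_135 = 135,
  IDCT32_1024 = 1024
} Idct32Variant;

static Idct32Variant pick_idct32x32(int eob) {
  if (eob == 1) return IDCT32_1;     /* DC coefficient only */
  if (eob <= 34) return IDCT32_34;   /* nonzeros within the top-left 8x8 */
  if (eob <= 135) return IDCT32_135; /* nonzeros within the top-left 16x16 */
  return IDCT32_1024;                /* full transform */
}
```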
Linfeng Zhang
69775d2f40 Update highbd idct x86 optimizations.
BUG=webm:1412

Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069
2017-08-14 16:59:50 -07:00
Linfeng Zhang
3f05a70c41 Update 32x32 idct sse2 and ssse3 optimizations.
Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70
2017-08-14 16:59:31 -07:00
Linfeng Zhang
15193ce51f Merge "Clean highbd idct x86 code with inline functions" 2017-08-10 20:25:18 +00:00
Johann Koenig
0b393ae505 Merge "quantize: copy ssse3 optimizations to intrinsics" 2017-08-10 15:42:20 +00:00
Linfeng Zhang
39da7fb786 Clean highbd idct x86 code with inline functions
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()

BUG=webm:1412

Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
2017-08-08 17:53:28 -07:00
Johann
d52cb59729 quantize: copy ssse3 optimizations to intrinsics
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.

Fixes test failures in HBD and OS X builds.

Allows using it in 32-bit builds, where it is about 40% faster than sse2.

Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.

Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
2017-08-08 12:22:14 -07:00
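The scalar logic that the SSSE3 intrinsics vectorize can be sketched as follows. Taking the absolute value and restoring the sign correspond to `pabsw` and `psignw`, and eob is the scan-order index of the last nonzero quantized coefficient, plus one. This is a simplified sketch assuming DC/AC parameter pairs (index 0 for DC, 1 for AC) and 16-bit coefficients; the names and signature are hypothetical and do not match the libvpx prototype.

```c
#include <stdint.h>

/* Hypothetical scalar quantizer sketch; returns eob. */
static int quantize_sketch(const int16_t *coeff, int n, const int16_t *scan,
                           const int16_t zbin[2], const int16_t round[2],
                           const int16_t quant[2], const int16_t quant_shift[2],
                           const int16_t dequant[2], int16_t *qcoeff,
                           int16_t *dqcoeff) {
  int i, eob = -1;
  for (i = 0; i < n; ++i) qcoeff[i] = dqcoeff[i] = 0;
  for (i = 0; i < n; ++i) {
    const int rc = scan[i];
    const int ac = rc != 0;
    const int c = coeff[rc];
    const int sign = c >> 31;            /* 0 or -1 */
    const int abs_c = (c ^ sign) - sign; /* what pabsw computes */
    if (abs_c >= zbin[ac]) {
      int tmp = abs_c + round[ac];
      if (tmp > INT16_MAX) tmp = INT16_MAX;
      tmp = ((((tmp * quant[ac]) >> 16) + tmp) * quant_shift[ac]) >> 16;
      qcoeff[rc] = (int16_t)((tmp ^ sign) - sign); /* what psignw restores */
      dqcoeff[rc] = (int16_t)(qcoeff[rc] * dequant[ac]);
      if (tmp) eob = i; /* track last nonzero position in scan order */
    }
  }
  return eob + 1;
}
```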
Linfeng Zhang
853165ba39 Update 32x32 idct sse2 funcs, add partial case 135
Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a
2017-08-07 17:37:02 -07:00
Linfeng Zhang
d670678f26 Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx()
in idct x86 code

Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083
2017-08-04 15:33:37 -07:00
Linfeng Zhang
fa829e0e5a Replace multiplication_and_add() with butterfly() in idct x86 code
Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e
2017-08-04 15:33:34 -07:00
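In DCT terms, a butterfly is the 2x2 rotation at the core of each stage: two inputs are multiplied by a pair of cosine constants, cross-added, and rounded back down by `DCT_CONST_BITS`. The scalar sketch below assumes the sign convention shown in its comment and `DCT_CONST_BITS` = 14; the names are hypothetical, and the x86 helpers apply the same pattern to whole vectors of lanes at once.

```c
#include <stdint.h>

#define DCT_CONST_BITS 14 /* assumed, as in libvpx */

static int32_t round_shift(int64_t x) {
  return (int32_t)((x + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS);
}

/* Hypothetical scalar butterfly:
     out0 = round(in0 * c0 - in1 * c1)
     out1 = round(in0 * c1 + in1 * c0) */
static void butterfly_sketch(int32_t in0, int32_t in1, int16_t c0, int16_t c1,
                             int32_t *out0, int32_t *out1) {
  *out0 = round_shift((int64_t)in0 * c0 - (int64_t)in1 * c1);
  *out1 = round_shift((int64_t)in0 * c1 + (int64_t)in1 * c0);
}
```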
Linfeng Zhang
c9fb719ee1 Update butterfly() in idct x86 optimizations.
Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701
2017-08-04 15:33:28 -07:00
Linfeng Zhang
7f20c3ac44 Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1
BUG=webm:1412

Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
2017-08-04 15:31:17 -07:00
Linfeng Zhang
22b6dc9fdf Update for loop increment of idct x86 functions
Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4
2017-08-04 15:29:19 -07:00
Linfeng Zhang
0c61331244 Update high bitdepth 16x16 idct x86 code
Prepare for high bitdepth 16x16 idct sse4.1 code.
Only moves and renames functions.

BUG=webm:1412

Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
2017-08-04 15:12:33 -07:00