371 Commits

Author SHA1 Message Date
Scott LaVarnway
c42517568d vpx_dsp: merge avx2 variance files
BUG=webm:1404

Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4
2017-08-04 07:49:30 -07:00
Linfeng Zhang
e921c7ba8d Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function" 2017-08-04 01:16:35 +00:00
Scott LaVarnway
f6c6f37e0c Merge "vpx_dsp: Use correct check for halfpel in" 2017-08-03 23:17:09 +00:00
Linfeng Zhang
563d58ab84 Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function
BUG=webm:1412

Change-Id: I945f0fb6807b8948747243794dc7352b959221f7
2017-08-03 13:59:47 -07:00
Linfeng Zhang
6624f20785 Merge changes I76727df0,I66297d78,I1d000c6b
* changes:
  Extract inlined 16x16 idct sse2 code into header file
  Add transpose_32bit_8x4() sse2 optimization
  Update x86 idct optimization
2017-08-03 20:51:02 +00:00
Scott LaVarnway
8334a48d3a vpx_dsp: Use correct check for halfpel in
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2

see:
17fae3a Change to use correct check for halfpel

Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
2017-08-03 06:57:40 -07:00
Linfeng Zhang
15a47db730 Extract inlined 16x16 idct sse2 code into header file
Will be called by high bitdepth functions.

Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c
2017-08-02 16:17:43 -07:00
Linfeng Zhang
8c0ab7607e Add transpose_32bit_8x4() sse2 optimization
Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955
2017-08-02 16:15:58 -07:00
Scott LaVarnway
698e56f26c Merge "vpxdsp: variance_impl_avx2.c cleanup" 2017-08-02 19:08:10 +00:00
Scott LaVarnway
632fe8286a vpxdsp: variance_impl_avx2.c cleanup
BUG=webm:1404

Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02
2017-08-02 05:57:39 -07:00
Linfeng Zhang
6738ad7aaf Update x86 idct optimization
Move constant coefficients preparation into inline function.

Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1
2017-08-01 14:40:12 -07:00
Linfeng Zhang
bf14d468c1 Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.

The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().

Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
2017-07-31 16:36:13 -07:00
James Zern
78155b7ed5 highbd_inv_txfm_sse4: make << of neg. val a multiply
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.

Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
2017-07-30 12:48:28 -07:00
James Zern
d35b627340 Revert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"
This reverts commit aa1c4cd140007ea5b4be99732fbb23d1fd8cf2b5.

This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2

previously the optimized path was skipped in this range

Change-Id: I9af015a46eba96208834a219fafd651d37556a80
2017-07-29 11:12:27 -07:00
Linfeng Zhang
75653b7032 Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34
* changes:
  Update vpx_idct16x16_10_add_sse2()
  Add vpx_idct16x16_38_add_sse2()
  Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
  Refactor highbd idct 4x4 and 8x8 x86 functions
2017-07-28 22:43:00 +00:00
James Zern
3c73e587d1 Revert "quantize ssse3: declare all variables"
This reverts commit 03f5e300d69d368290305e19cc66bac8b0ea1ff8.

This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0

Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
2017-07-28 01:22:16 -07:00
Linfeng Zhang
5232e35bc2 Update vpx_idct16x16_10_add_sse2()
Change-Id: Ia0e20f5fa47382af5785221eebb05212b40bd35c
2017-07-27 18:03:25 -07:00
Linfeng Zhang
7f4acf8700 Add vpx_idct16x16_38_add_sse2()
Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898
2017-07-27 18:02:43 -07:00
Linfeng Zhang
aa1c4cd140 Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
BUG=webm:1412

Change-Id: I35df041b757d42278ac7a5cdbd909e8ffcee1455
2017-07-27 18:02:36 -07:00
Linfeng Zhang
9c43d81bc2 Refactor highbd idct 4x4 and 8x8 x86 functions
BUG=webm:1412

Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec
2017-07-27 18:01:03 -07:00
Johann Koenig
a83e1f1d53 Merge "quantize ssse3: declare all variables" 2017-07-27 21:18:35 +00:00
James Zern
1c666465af inv_txfm_{sse2,ssse3}: clear conversion warnings
visual studio reports tran_high_t (int64) -> short in calls to
_mm_set1_epi16

Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745
2017-07-25 20:13:49 -07:00
James Zern
62682ac8ad highbd_idct*_sse*.c: clear conversion warnings
visual studio reports tran_high_t (int64) -> int in calls to
_mm_setr_epi32

Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05
2017-07-25 20:11:09 -07:00
James Zern
85736e616e vpx_variance16x16_sse2: correct cast order
allow the right shift to operate on 64-bits, this matches the rest of
the implementations

previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order

Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
2017-07-25 16:45:40 -07:00
James Zern
b0f1ae1475 vpx_get16x16var_avx2: correct cast order
allow the right shift to operate on 64-bits, this matches the rest of
the implementations

missed in:
6acd061aa variance_avx2: sync variance functions with c-code

Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
2017-07-24 16:29:44 -07:00
Johann
03f5e300d6 quantize ssse3: declare all variables
Copy missing line from avx implementation.

Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27
2017-07-18 12:32:57 -07:00
James Zern
3dd993e4be highbd_idct8x8_add_sse4: make << of neg. val a multiply
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.

Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
2017-07-01 11:56:56 -07:00
Linfeng Zhang
c338f3635e Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
BUG=webm:1412

Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Linfeng Zhang
ee5cb8d87f sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
BUG=webm:1412

Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3
2017-06-29 17:18:01 -07:00
Linfeng Zhang
0fa59a4baf Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
Also clean highbd_inv_txfm_sse2.h

BUG=webm:1412

Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 17:17:43 -07:00
Linfeng Zhang
9ac78ae35f Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
BUG=webm:1412

Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-29 17:13:28 -07:00
Linfeng Zhang
0bb31a46a4 Update vpx_idct8x8_12_add_ssse3()
Change-Id: I0f38801c391db87ddae168602a786a062cd34b1d
2017-06-26 14:57:41 -07:00
Linfeng Zhang
a76b6b232c Update load_input_data() in x86
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.

Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
2017-06-26 13:38:33 -07:00
Linfeng Zhang
8253a27904 Add vpx_highbd_idct4x4_16_add_sse4_1()
BUG=webm:1412

Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-23 14:30:12 -07:00
Linfeng Zhang
b8a4b5dd8d Cosmetics, 8x8 idct SSE2 optimization
Change-Id: Id21fa94fd323e36cd19a2d890bf4a0cafb7d964d
2017-06-23 14:30:12 -07:00
Linfeng Zhang
466b667ff3 Clean vpx_idct16x16_256_add_sse2()
Remove macro IDCT16 which is redundant with idct16_8col().

Change-Id: I783c5f4fda038a22d5ee5c2b22e8c2cdfb38432c
2017-06-21 13:47:15 -07:00
Linfeng Zhang
42522ce0b7 Update vpx_idct{8x8,16x16,32x32}_1_add_sse2()
Change-Id: I365f8e53d9ccd028cef0f561d4de9e5916278609
2017-06-21 13:47:05 -07:00
Linfeng Zhang
2b43a1ee18 Clean 32x32 full idct sse2 and ssse3 code
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.

Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
2017-06-21 13:46:49 -07:00
Linfeng Zhang
c7e4917e97 Clean 8x8 idct x86 optimization
Create load_buffer_8x8() and write_buffer_8x8().

Change-Id: Ib26dd515d734a5402971c91de336ab481b213fdf
2017-06-15 14:30:00 -07:00
Linfeng Zhang
98967645a1 Remove vpx_idct8x8_64_add_ssse3()
It's almost identical with vpx_idct8x8_64_add_sse2(), except little
difference in instructions order.

Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
2017-06-15 14:09:33 -07:00
Linfeng Zhang
6da6a23291 Update high bitdepth load_input_data() in x86
BUG=webm:1412

Change-Id: Ibf9d120b80c7d3a7637e79e123cf2f0aae6dd78c
2017-06-13 16:53:53 -07:00
Linfeng Zhang
d6eeef9ee6 Clean array_transpose_{4X8,16x16,16x16_2) in x86
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
2017-06-13 16:50:44 -07:00
Linfeng Zhang
9c72e85e4c Remove array_transpose_8x8() in x86
Duplicate of transpose_16bit_8x8()

Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
2017-06-13 16:50:44 -07:00
Linfeng Zhang
cbb991b6b8 Convert 8x8 idct x86 macros to inline functions
Change-Id: Id59865fd6c453a24121ce7160048d67875fc67ce
2017-06-13 16:50:43 -07:00
Linfeng Zhang
45048dc9dc Update vpx_highbd_idct4x4_16_add_sse2()
BUG=webm:1412

Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84
2017-05-30 09:25:30 -07:00
Linfeng Zhang
6444958f62 Update inv_txfm_sse2.h and inv_txfm_sse2.c
Extract shared code into inline functions.

Change-Id: Iee1e5a4bc6396aeed0d301163095c9b21aa66b2f
2017-05-23 14:54:46 -07:00
Linfeng Zhang
c167345ffb Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2
BUG=webm:1412

Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-22 11:24:21 -07:00
Linfeng Zhang
18e8baa5c0 Add transpose_32bit_4x4() and rename transpose_4x4() for vpx_dsp/x86
Change-Id: Ib57377f6cf6573c04720d3cc5dea4285362b4220
2017-05-16 17:46:37 -07:00
Linfeng Zhang
ecd1eb2162 Update 4x4 idct sse2 functions
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2()

Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
2017-05-08 16:16:52 -07:00
Linfeng Zhang
2c3a2ad6f1 Merge changes I0cfe4117,I3581d80d,Ida62c941
* changes:
  Split dsp/x86/inv_txfm_sse2.c
  Update highbd idct functions arguments to use uint16_t dst
  Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-08 16:15:57 +00:00