Linfeng Zhang
76a3d3fcc5
Remove the unnecessary upcasts of (int)cospi_{1...31}_64
...
BUG=webm:1450
Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
2017-09-20 14:13:26 -07:00
Linfeng Zhang
64653fa133
Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
...
The unnecessary upcast to (int) will be cleaned later.
BUG=webm:1450
Change-Id: Ia234575206d5a74540526924b06ed3939322d063
2017-09-20 14:13:26 -07:00
Scott LaVarnway
b85e391ac8
Merge "vpxdsp: [x86] add highbd_d63_predictor functions"
2017-09-20 11:39:28 +00:00
Linfeng Zhang
bf8bdae913
Refactor convolve code
...
Extract a couple of static functions into their caller functions.
Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0
2017-09-19 16:28:31 -07:00
Scott LaVarnway
bc86e2c6a2
vpxdsp: [x86] add highbd_d63_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.94x
C vs SSSE3 speed gains:
_8x8 : ~8.69x
_16x16 : ~6.32x
_32x32 : ~5.33x
BUG=webm:1411
Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4
2017-09-19 15:47:22 -07:00
Johann
eb4238ac70
Revert "Revert "quantize avx: copy 32x32 implementation""
...
This reverts commit 8c42237bb2
.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
Scott LaVarnway
d6c9bbc2b6
vpxdsp: [x86] add highbd_d207_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.31x
C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x
BUG=webm:1411
Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-11 07:36:24 -07:00
Linfeng Zhang
ef41c6286d
Update convolve functions' assertions
...
So that 4 to 1 frame scaling can call them.
Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5
2017-09-07 12:33:58 -07:00
Linfeng Zhang
7219f31904
Merge "Remove get_filter_base() and get_filter_offset() in convolve"
2017-09-06 22:39:15 +00:00
Linfeng Zhang
d331e7a1c0
Remove get_filter_base() and get_filter_offset() in convolve
...
so that the convolve functions are independent of table alignment.
Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-09-05 15:22:36 -07:00
Scott LaVarnway
bc4bcca3fd
vpxdsp: [x86] add highbd_dc_128_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~7.64x
_8x8 : ~16.60x
_16x16 : ~8.15x
_32x32 : ~5.05x
BUG=webm:1411
Change-Id: If165d419711cfda901bd428a05ca1560a009e62e
2017-09-05 07:57:42 -07:00
Scott LaVarnway
c39a05ff61
vpxdsp: [x86] add highbd_dc_left_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~6.49x
_8x8 : ~10.82x
_16x16 : ~7.61x
_32x32 : ~5.29x
BUG=webm:1411
Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b
2017-08-30 09:29:06 -07:00
Scott LaVarnway
f783e3a75d
vpxdsp: [x86] add highbd_dc_top_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~7.39x
_8x8 : ~11.36x
_16x16 : ~8.68x
_32x32 : ~4.33x
BUG=webm:1411
Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58
2017-08-29 12:53:30 -07:00
Scott LaVarnway
30d9a1916c
vpxdsp: [x86] add highbd_h_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~8.12x
_8x8 : ~9.71x
_16x16 : ~8.21x
_32x32 : ~5.0x
BUG=webm:1422
Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993
2017-08-28 17:31:18 -07:00
Marco Paniconi
8c42237bb2
Revert "quantize avx: copy 32x32 implementation"
...
This reverts commit f60d1dcd3d
.
Reason for revert: <INSERT REASONING HERE>
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com ,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-08-25 16:56:08 +00:00
Johann
f60d1dcd3d
quantize avx: copy 32x32 implementation
...
Ensure avx and ssse3 stay in sync by testing them against each other.
Change-Id: I699f3b48785c83260825402d7826231f475f697c
2017-08-24 10:42:34 -07:00
Johann
1787e7dbe0
quantize ssse3: copy implementation to intrinsics
...
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
2017-08-24 07:47:51 -07:00
Johann Koenig
f53b656207
Merge "quantize avx: copy implementation to intrinsics"
2017-08-23 21:14:13 +00:00
Scott LaVarnway
1aad50c092
Merge "vpx_dsp: get32x32var_avx2() cleanup"
2017-08-23 19:59:25 +00:00
Johann
7c27872164
quantize avx: copy implementation to intrinsics
...
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 09:19:16 -07:00
Johann
b9c1dcc5fa
quantize ssse3: copy style from sse2
...
Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598
2017-08-22 14:25:27 -07:00
Johann
75752ab7c0
quantize sse2: copy opts from ssse3
...
Simplify eob calculations based on ssse3 implementation.
General clean up and re-scoping.
Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee
2017-08-22 13:01:44 -07:00
Johann
c02fdd0258
quantize: ignore skip_block in x86
...
Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8
2017-08-21 14:37:03 -07:00
Scott LaVarnway
eab3f5e0cc
vpx_dsp: get32x32var_avx2() cleanup
...
renamed to get32x16var_avx2()
BUG=webm:1404
Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036
2017-08-18 13:44:09 -07:00
Scott LaVarnway
2c5478e383
Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup"
2017-08-18 20:30:59 +00:00
Scott LaVarnway
2f7497f341
vpx_dsp: vpx_get16x16var_avx2() cleanup
...
BUG=webm:1404
Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a
2017-08-18 12:23:49 -07:00
James Zern
bb15fd51be
highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo
...
135 -> 34
fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12]
Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682
2017-08-17 15:37:38 -07:00
James Zern
e038d1610e
inv_txfm_sse2.h: correct idct*/iadst* prototypes
...
fixes mismatch between prototypes and definitions
Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652
2017-08-16 23:06:09 -07:00
Linfeng Zhang
d72e20b123
Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
...
BUG=webm:1412
Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
2017-08-14 17:05:22 -07:00
Linfeng Zhang
69775d2f40
Update highbd idct x86 optimizations.
...
BUG=webm:1412
Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069
2017-08-14 16:59:50 -07:00
Linfeng Zhang
3f05a70c41
Update 32x32 idct sse2 and ssse3 optimizations.
...
Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70
2017-08-14 16:59:31 -07:00
Linfeng Zhang
15193ce51f
Merge "Clean highbd idct x86 code with inline functions"
2017-08-10 20:25:18 +00:00
Johann Koenig
0b393ae505
Merge "quantize: copy ssse3 optimizations to intrinsics"
2017-08-10 15:42:20 +00:00
Linfeng Zhang
39da7fb786
Clean highbd idct x86 code with inline functions
...
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
2017-08-08 17:53:28 -07:00
Johann
d52cb59729
quantize: copy ssse3 optimizations to intrinsics
...
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
2017-08-08 12:22:14 -07:00
Linfeng Zhang
853165ba39
Update 32x32 idct sse2 funcs, add partial case 135
...
Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a
2017-08-07 17:37:02 -07:00
Linfeng Zhang
d670678f26
Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx()
...
in idct x86 code
Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083
2017-08-04 15:33:37 -07:00
Linfeng Zhang
fa829e0e5a
Replace multiplication_and_add() with butterfly() in idct x86 code
...
Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e
2017-08-04 15:33:34 -07:00
Linfeng Zhang
c9fb719ee1
Update butterfly() in idct x86 optimizations.
...
Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701
2017-08-04 15:33:28 -07:00
Linfeng Zhang
7f20c3ac44
Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1
...
BUG=webm:1412
Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
2017-08-04 15:31:17 -07:00
Linfeng Zhang
22b6dc9fdf
Update for loop increment of idct x86 functions
...
Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4
2017-08-04 15:29:19 -07:00
Linfeng Zhang
0c61331244
Update high bitdepth 16x16 idct x86 code
...
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
2017-08-04 15:12:33 -07:00
Scott LaVarnway
c42517568d
vpx_dsp: merge avx2 variance files
...
BUG=webm:1404
Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4
2017-08-04 07:49:30 -07:00
Linfeng Zhang
e921c7ba8d
Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function"
2017-08-04 01:16:35 +00:00
Scott LaVarnway
f6c6f37e0c
Merge "vpx_dsp: Use correct check for halfpel in"
2017-08-03 23:17:09 +00:00
Linfeng Zhang
563d58ab84
Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function
...
BUG=webm:1412
Change-Id: I945f0fb6807b8948747243794dc7352b959221f7
2017-08-03 13:59:47 -07:00
Linfeng Zhang
6624f20785
Merge changes I76727df0,I66297d78,I1d000c6b
...
* changes:
Extract inlined 16x16 idct sse2 code into header file
Add transpose_32bit_8x4() sse2 optimization
Update x86 idct optimization
2017-08-03 20:51:02 +00:00
Scott LaVarnway
8334a48d3a
vpx_dsp: Use correct check for halfpel in
...
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a
Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
2017-08-03 06:57:40 -07:00
Linfeng Zhang
15a47db730
Extract inlined 16x16 idct sse2 code into header file
...
Will be called by high bitdepth functions.
Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c
2017-08-02 16:17:43 -07:00
Linfeng Zhang
8c0ab7607e
Add transpose_32bit_8x4() sse2 optimization
...
Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955
2017-08-02 16:15:58 -07:00
Scott LaVarnway
698e56f26c
Merge "vpxdsp: variance_impl_avx2.c cleanup"
2017-08-02 19:08:10 +00:00
Scott LaVarnway
632fe8286a
vpxdsp: variance_impl_avx2.c cleanup
...
BUG=webm:1404
Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02
2017-08-02 05:57:39 -07:00
Linfeng Zhang
6738ad7aaf
Update x86 idct optimization
...
Move constant coefficients preparation into inline function.
Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1
2017-08-01 14:40:12 -07:00
Linfeng Zhang
bf14d468c1
Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
...
This replaces commit aa1c4cd
, which has a bug and was reverted in
commit 3c73e58
.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
2017-07-31 16:36:13 -07:00
James Zern
78155b7ed5
highbd_inv_txfm_sse4: make << of neg. val a multiply
...
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
2017-07-30 12:48:28 -07:00
James Zern
d35b627340
Revert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"
...
This reverts commit aa1c4cd140
.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
2017-07-29 11:12:27 -07:00
Linfeng Zhang
75653b7032
Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34
...
* changes:
Update vpx_idct16x16_10_add_sse2()
Add vpx_idct16x16_38_add_sse2()
Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
Refactor highbd idct 4x4 and 8x8 x86 functions
2017-07-28 22:43:00 +00:00
James Zern
3c73e587d1
Revert "quantize ssse3: declare all variables"
...
This reverts commit 03f5e300d6
.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
2017-07-28 01:22:16 -07:00
Linfeng Zhang
5232e35bc2
Update vpx_idct16x16_10_add_sse2()
...
Change-Id: Ia0e20f5fa47382af5785221eebb05212b40bd35c
2017-07-27 18:03:25 -07:00
Linfeng Zhang
7f4acf8700
Add vpx_idct16x16_38_add_sse2()
...
Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898
2017-07-27 18:02:43 -07:00
Linfeng Zhang
aa1c4cd140
Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
...
BUG=webm:1412
Change-Id: I35df041b757d42278ac7a5cdbd909e8ffcee1455
2017-07-27 18:02:36 -07:00
Linfeng Zhang
9c43d81bc2
Refactor highbd idct 4x4 and 8x8 x86 functions
...
BUG=webm:1412
Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec
2017-07-27 18:01:03 -07:00
Johann Koenig
a83e1f1d53
Merge "quantize ssse3: declare all variables"
2017-07-27 21:18:35 +00:00
James Zern
1c666465af
inv_txfm_{sse2,ssse3}: clear conversion warnings
...
visual studio reports tran_high_t (int64) -> short in calls to
_mm_set1_epi16
Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745
2017-07-25 20:13:49 -07:00
James Zern
62682ac8ad
highbd_idct*_sse*.c: clear conversion warnings
...
visual studio reports tran_high_t (int64) -> int in calls to
_mm_setr_epi32
Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05
2017-07-25 20:11:09 -07:00
James Zern
85736e616e
vpx_variance16x16_sse2: correct cast order
...
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147
vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
2017-07-25 16:45:40 -07:00
James Zern
b0f1ae1475
vpx_get16x16var_avx2: correct cast order
...
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa
variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
2017-07-24 16:29:44 -07:00
Johann
03f5e300d6
quantize ssse3: declare all variables
...
Copy missing line from avx implementation.
Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27
2017-07-18 12:32:57 -07:00
James Zern
3dd993e4be
highbd_idct8x8_add_sse4: make << of neg. val a multiply
...
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
2017-07-01 11:56:56 -07:00
Linfeng Zhang
c338f3635e
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
...
BUG=webm:1412
Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Linfeng Zhang
ee5cb8d87f
sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
...
BUG=webm:1412
Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3
2017-06-29 17:18:01 -07:00
Linfeng Zhang
0fa59a4baf
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
...
Also clean highbd_inv_txfm_sse2.h
BUG=webm:1412
Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 17:17:43 -07:00
Linfeng Zhang
9ac78ae35f
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
...
BUG=webm:1412
Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-29 17:13:28 -07:00
Linfeng Zhang
0bb31a46a4
Update vpx_idct8x8_12_add_ssse3()
...
Change-Id: I0f38801c391db87ddae168602a786a062cd34b1d
2017-06-26 14:57:41 -07:00
Linfeng Zhang
a76b6b232c
Update load_input_data() in x86
...
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
2017-06-26 13:38:33 -07:00
Linfeng Zhang
8253a27904
Add vpx_highbd_idct4x4_16_add_sse4_1()
...
BUG=webm:1412
Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-23 14:30:12 -07:00
Linfeng Zhang
b8a4b5dd8d
Cosmetics, 8x8 idct SSE2 optimization
...
Change-Id: Id21fa94fd323e36cd19a2d890bf4a0cafb7d964d
2017-06-23 14:30:12 -07:00
Linfeng Zhang
466b667ff3
Clean vpx_idct16x16_256_add_sse2()
...
Remove macro IDCT16 which is redundant with idct16_8col().
Change-Id: I783c5f4fda038a22d5ee5c2b22e8c2cdfb38432c
2017-06-21 13:47:15 -07:00
Linfeng Zhang
42522ce0b7
Update vpx_idct{8x8,16x16,32x32}_1_add_sse2()
...
Change-Id: I365f8e53d9ccd028cef0f561d4de9e5916278609
2017-06-21 13:47:05 -07:00
Linfeng Zhang
2b43a1ee18
Clean 32x32 full idct sse2 and ssse3 code
...
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
2017-06-21 13:46:49 -07:00
Linfeng Zhang
c7e4917e97
Clean 8x8 idct x86 optimization
...
Create load_buffer_8x8() and write_buffer_8x8().
Change-Id: Ib26dd515d734a5402971c91de336ab481b213fdf
2017-06-15 14:30:00 -07:00
Linfeng Zhang
98967645a1
Remove vpx_idct8x8_64_add_ssse3()
...
It's almost identical with vpx_idct8x8_64_add_sse2(), except little
difference in instructions order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
2017-06-15 14:09:33 -07:00
Linfeng Zhang
6da6a23291
Update high bitdepth load_input_data() in x86
...
BUG=webm:1412
Change-Id: Ibf9d120b80c7d3a7637e79e123cf2f0aae6dd78c
2017-06-13 16:53:53 -07:00
Linfeng Zhang
d6eeef9ee6
Clean array_transpose_{4X8,16x16,16x16_2) in x86
...
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
2017-06-13 16:50:44 -07:00
Linfeng Zhang
9c72e85e4c
Remove array_transpose_8x8() in x86
...
Duplicate of transpose_16bit_8x8()
Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
2017-06-13 16:50:44 -07:00
Linfeng Zhang
cbb991b6b8
Convert 8x8 idct x86 macros to inline functions
...
Change-Id: Id59865fd6c453a24121ce7160048d67875fc67ce
2017-06-13 16:50:43 -07:00
Linfeng Zhang
45048dc9dc
Update vpx_highbd_idct4x4_16_add_sse2()
...
BUG=webm:1412
Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84
2017-05-30 09:25:30 -07:00
Linfeng Zhang
6444958f62
Update inv_txfm_sse2.h and inv_txfm_sse2.c
...
Extract shared code into inline functions.
Change-Id: Iee1e5a4bc6396aeed0d301163095c9b21aa66b2f
2017-05-23 14:54:46 -07:00
Linfeng Zhang
c167345ffb
Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2
...
BUG=webm:1412
Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-22 11:24:21 -07:00
Linfeng Zhang
18e8baa5c0
Add transpose_32bit_4x4() and rename transpose_4x4() for vpx_dsp/x86
...
Change-Id: Ib57377f6cf6573c04720d3cc5dea4285362b4220
2017-05-16 17:46:37 -07:00
Linfeng Zhang
ecd1eb2162
Update 4x4 idct sse2 functions
...
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2()
Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
2017-05-08 16:16:52 -07:00
Linfeng Zhang
2c3a2ad6f1
Merge changes I0cfe4117,I3581d80d,Ida62c941
...
* changes:
Split dsp/x86/inv_txfm_sse2.c
Update highbd idct functions arguments to use uint16_t dst
Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-08 16:15:57 +00:00
Linfeng Zhang
2231669a83
Split dsp/x86/inv_txfm_sse2.c
...
Spin out highbd idct functions.
BUG=webm:1412
Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af
2017-05-03 15:43:02 -07:00
Linfeng Zhang
d5de63d2be
Update highbd idct functions arguments to use uint16_t dst
...
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 13:59:16 -07:00
Linfeng Zhang
081b39f2b7
Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
...
BUG=webm:1388
Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-05-03 13:58:31 -07:00
Yi Luo
a3452996a1
High bit depth inter prediction horizontal/vertical filters AVX2
...
User level speed improvement on i7-6700, cpu-used=1,
x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps:
- Decoder:
1080p: ~4%
4K: ~5%
- Encoder:
1080p: ~1%
4K: ~3%
Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
Linfeng Zhang
51dc998f3a
Update highbd convolve functions arguments to use uint16_t src/dst
...
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Linfeng Zhang
bf8a49abbd
Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
...
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
James Zern
4ba20da8b1
Merge "Add AVX2 optimization to copy/avg functions"
2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992
Add AVX2 optimization to copy/avg functions
...
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00