gxw
25d9adb74b
vp9: [loongson] optimize vpx_convolve8 with mmi.
...
1. vpx_convolve8_vert_mmi
2. vpx_convolve8_horiz_mmi
3. vpx_convolve8_mmi
4. vpx_convolve8_avg_mmi
5. vpx_convolve8_avg_vert_mmi
Change-Id: I41a6b3b4f327d6b67d282e0163cfa0aee8648abe
2018-03-28 18:11:16 +00:00
Linfeng Zhang
3636330490
Add vp9_highbd_iht4x4_16_add_neon()
...
BUG=webm:1403
Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
2018-02-05 13:42:16 -08:00
Johann
bd990cad72
quantize x86: dedup some parts
...
Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af
2017-11-27 13:09:21 -08:00
Kyle Siefring
b383a17fa4
Support building AVX-512 and implement sadx4 for AVX-512
...
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.
Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
2017-11-03 13:37:23 -04:00
Scott LaVarnway
b58259ab55
Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"
2017-10-19 23:32:10 +00:00
Scott LaVarnway
55c126a5d7
vpx: [x86] add vpx_hadamard_16x16_avx2()
...
This version is ~1.91x faster than the sse2 version. When
highbitdepth is enabled, it is ~1.74x.
Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
2017-10-18 18:00:00 -07:00
Kyle Siefring
55805e2786
Refactor x86/vpx_subpixel_8t_intrin_avx2.c
...
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
2017-10-17 11:57:40 -04:00
Linfeng Zhang
6543213e87
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
...
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Linfeng Zhang
0f756a307d
Add vpx_dsp/x86/mem_sse2.h
...
Add some load and store sse2 inline functions.
Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
2017-10-03 12:59:05 -07:00
Linfeng Zhang
9d0d13e939
Add vpx_scaled_2d_neon()
...
BUG=webm:1419
Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26 09:22:39 -07:00
Johann
eb4238ac70
Revert "Revert "quantize avx: copy 32x32 implementation""
...
This reverts commit 8c42237bb2.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
Scott LaVarnway
d6c9bbc2b6
vpxdsp: [x86] add highbd_d207_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.31x
C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x
BUG=webm:1411
Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-11 07:36:24 -07:00
Shiyou Yin
2c7b7424c5
Merge "vpxdsp: [loongson] optimize sad functions with mmi"
2017-09-08 00:55:14 +00:00
Linfeng Zhang
3ec20445b2
Refactor convolve8 NEON functions
...
Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c
2017-09-06 15:55:17 -07:00
Shiyou Yin
f4150163a2
vpxdsp: [loongson] optimize sad functions with mmi
...
1. vpx_sadWxH_c
2. vpx_sadWxH_avg_c
3. vpx_sadWxHx3_c
4. vpx_sadWxHx8_c
5. vpx_sadWxHx4d_c
Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a
2017-09-02 15:11:32 +00:00
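For readers unfamiliar with the SAD family named in the commit above, here is a minimal scalar sketch of what a WxH sum of absolute differences computes. The name sad_wxh_ref and its signature are hypothetical and chosen for illustration only; this is neither the libvpx C reference nor the MMI code from the commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a libvpx function): sum of absolute differences
 * between a width x height source block and a reference block, where each
 * block is addressed through its own row stride. */
static unsigned int sad_wxh_ref(const uint8_t *src, int src_stride,
                                const uint8_t *ref, int ref_stride,
                                int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* uint8_t operands promote to int, so the difference may be negative. */
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A SIMD version would typically process several pixels of a row per instruction and accumulate partial sums, but the result should match this scalar loop.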
Scott LaVarnway
30d9a1916c
vpxdsp: [x86] add highbd_h_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~8.12x
_8x8 : ~9.71x
_16x16 : ~8.21x
_32x32 : ~5.0x
BUG=webm:1422
Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993
2017-08-28 17:31:18 -07:00
Marco Paniconi
8c42237bb2
Revert "quantize avx: copy 32x32 implementation"
...
This reverts commit f60d1dcd3d.
Reason for revert: <INSERT REASONING HERE>
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-08-25 16:56:08 +00:00
Johann
f60d1dcd3d
quantize avx: copy 32x32 implementation
...
Ensure avx and ssse3 stay in sync by testing them against each other.
Change-Id: I699f3b48785c83260825402d7826231f475f697c
2017-08-24 10:42:34 -07:00
Johann
1787e7dbe0
quantize ssse3: copy implementation to intrinsics
...
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
2017-08-24 07:47:51 -07:00
Shiyou Yin
d080c92524
Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi."
2017-08-24 00:55:11 +00:00
Johann
7c27872164
quantize avx: copy implementation to intrinsics
...
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 09:19:16 -07:00
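The early exit described in the commit above can be illustrated with a scalar sketch. It assumes the check asks whether any coefficient in a group reaches the zero-bin threshold, and that a group failing the check can simply be written out as zeros. The function name and parameters are hypothetical, and the real code performs the test on whole vectors with ptest rather than in a loop.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar analog of a vector early-exit check (not libvpx code).
 * Returns 1 and zeroes qcoeff[0..n) when every |coeff[i]| is below zbin, so
 * the caller can skip the full quantization of this group. Returns 0 and
 * leaves qcoeff untouched otherwise. */
static int skip_group_if_below_zbin(const int16_t *coeff, int16_t *qcoeff,
                                    int n, int16_t zbin) {
  for (int i = 0; i < n; ++i) {
    if (abs(coeff[i]) >= zbin) return 0; /* at least one value needs work */
  }
  memset(qcoeff, 0, (size_t)n * sizeof(*qcoeff));
  return 1;
}
```

The trade-off the commit message mentions shows up here too: the check costs extra work when no group can be skipped, and saves work when many can.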
Shiyou Yin
59e065b6ed
vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi.
...
Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2
2017-08-23 15:14:15 +08:00
Shiyou Yin
bff5aa9827
Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."
2017-08-22 00:37:23 +00:00
Shiyou Yin
7d82e57f5b
vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi.
...
Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e
2017-08-18 09:06:49 +08:00
Linfeng Zhang
d72e20b123
Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
...
BUG=webm:1412
Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
2017-08-14 17:05:22 -07:00
Johann Koenig
0b393ae505
Merge "quantize: copy ssse3 optimizations to intrinsics"
2017-08-10 15:42:20 +00:00
Johann
d52cb59729
quantize: copy ssse3 optimizations to intrinsics
...
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
2017-08-08 12:22:14 -07:00
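As a reading aid for the pabsw and psignw remark in the commit above, the following scalar sketch models what each instruction does to a single 16-bit lane. The helper names abs16 and sign16 are hypothetical, and how the commit combines the two instructions is an assumption: a common pattern is to quantize the absolute value and then reapply the original sign.

```c
#include <stdint.h>

/* Hypothetical per-lane model of pabsw: absolute value of a 16-bit lane. */
static int16_t abs16(int16_t a) {
  return (int16_t)(a < 0 ? -a : a);
}

/* Hypothetical per-lane model of psignw: keep a when b is positive,
 * negate a when b is negative, and produce 0 when b is zero. */
static int16_t sign16(int16_t a, int16_t b) {
  if (b == 0) return 0;
  return (int16_t)(b < 0 ? -a : a);
}
```

With these two operations a value can be processed as a magnitude and then given back the sign of the original coefficient, for example sign16(abs16(c) >> 1, c) for some coefficient c.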
Linfeng Zhang
7f20c3ac44
Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1
...
BUG=webm:1412
Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
2017-08-04 15:31:17 -07:00
Scott LaVarnway
c42517568d
vpx_dsp: merge avx2 variance files
...
BUG=webm:1404
Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4
2017-08-04 07:49:30 -07:00
Johann
2d6b5df657
neon: vpx_quantize_b
...
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
2017-07-31 10:38:46 -07:00
Johann
87610ac45e
neon: consolidate horizontal adds
...
Change-Id: Iaf9e88ff636ccf8f0ef310869c6827f3f205cca8
2017-07-10 15:29:13 -07:00
Alexandra Hájková
d8c277030c
ppc: Add vpx_idct4x4_16_add_vsx
...
Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8
2017-07-01 12:32:18 -07:00
Linfeng Zhang
1e3a93e72e
Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7
...
* changes:
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
2017-06-30 20:49:19 +00:00
Linfeng Zhang
c338f3635e
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
...
BUG=webm:1412
Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Linfeng Zhang
0fa59a4baf
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
...
Also clean highbd_inv_txfm_sse2.h
BUG=webm:1412
Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 17:17:43 -07:00
Linfeng Zhang
9ac78ae35f
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
...
BUG=webm:1412
Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-29 17:13:28 -07:00
Johann
cf75ab6ccd
partial fdct neon: move 8x8_1 and enable hbd tests
...
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
2017-06-28 15:37:43 -07:00
Linfeng Zhang
8253a27904
Add vpx_highbd_idct4x4_16_add_sse4_1()
...
BUG=webm:1412
Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-23 14:30:12 -07:00
Johann
e67660cf37
fdct32x32 neon implementation
...
Almost 3x faster in constrained loop testing. Over 10x faster in HBD
builds.
BUG=webm:1424
Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9
2017-06-22 06:40:17 -07:00
Jerome Jiang
943f9ee25c
Merge "Merge skin detection code in vp8/9."
2017-06-08 16:36:00 +00:00
Johann Koenig
903375a48a
Merge "fdct16x16 neon optimization"
2017-06-08 15:19:36 +00:00
Jerome Jiang
658e854252
Merge skin detection code in vp8/9.
...
BUG=webm:1438
Change-Id: Ie3dc034c7dbb498a0b088a767b1936ddeed4df14
2017-06-07 21:20:34 -07:00
Johann
eae7cf2368
fdct16x16 neon optimization
...
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07 14:59:55 -07:00
Johann
f695b30ac2
comp_avg_pred neon: used by sub pixel avg variance
...
BUG=webm:1423
Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30 22:47:34 +00:00
Johann
105503b839
neon fdct: 4x4 implementation
...
Approximately twice as fast as C implementation.
BUG=webm:1424
Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f
2017-05-17 07:38:18 -07:00
Johann
1088b4f87c
move neon load/stores to a new file
...
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.
Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
2017-05-15 08:29:43 -07:00
James Zern
ac8f58f6ab
Merge changes I1b54a7a5,I3028bdad,I59788cd9
...
* changes:
ppc: Add get_mb_ss_vsx
ppc: Add get4x4sse_cs_vsx
ppc: Add comp_avg_pred_vsx
2017-05-12 15:24:59 +00:00
Luca Barbato
a7f8bd451b
ppc: Add comp_avg_pred_vsx
...
Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10
2017-05-12 17:22:55 +02:00
Alexandra Hájková
cc7f0c0f3e
ppc: Add vpx_sad16x8/16/32_vsx
...
Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9
2017-05-10 19:57:30 +00:00
Linfeng Zhang
2231669a83
Split dsp/x86/inv_txfm_sse2.c
...
Spin out highbd idct functions.
BUG=webm:1412
Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af
2017-05-03 15:43:02 -07:00