Scott LaVarnway
b58259ab55
Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"
2017-10-19 23:32:10 +00:00
Scott LaVarnway
55c126a5d7
vpx: [x86] add vpx_hadamard_16x16_avx2()
...
This version is ~1.91x faster than the sse2 version. When
highbitdepth is enabled, it is ~1.74x faster.
Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
2017-10-18 18:00:00 -07:00
Kyle Siefring
b3a36f7946
Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"
2017-10-18 16:19:52 +00:00
Linfeng Zhang
9336e01621
Merge changes I17fff122,Ic149e3cb
...
* changes:
Add 4 to 3 scaling SSSE3 optimization
Test extreme inputs in frame scale functions
2017-10-17 16:03:29 +00:00
Kyle Siefring
55805e2786
Refactor x86/vpx_subpixel_8t_intrin_avx2.c
...
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
2017-10-17 11:57:40 -04:00
Linfeng Zhang
580d32240f
Add 4 to 3 scaling SSSE3 optimization
...
Note that this change triggers a different C version on SSSE3 and
generates different scaled output.
It is 2x as fast as the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-16 15:42:42 -07:00
Kyle Siefring
caa116c9be
Merge changes I38783d97,If5160c0c
...
* changes:
Extend 16 wide AVX2 convolve8 code to support averaging.
Add AVX2 version of vpx_convolve8_avg.
2017-10-12 16:12:38 +00:00
Linfeng Zhang
16166bfdaa
Add 4 to 1 scaling x86 optimization
...
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
2017-10-10 16:24:06 -07:00
Linfeng Zhang
963cc22cef
Merge changes I9d4c1af5,I882da3a0
...
* changes:
Rename some inline functions in NEON scaling
Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
2017-10-10 17:29:50 +00:00
Kyle Siefring
1b2f92ee8e
Extend 16 wide AVX2 convolve8 code to support averaging.
...
Also adds vpx_convolve8_avg_horiz_avx2.
Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
2017-10-09 19:10:03 -04:00
Kyle Siefring
9ca06bcdd2
Add AVX2 version of vpx_convolve8_avg.
...
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages its result with the destination at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
2017-10-07 23:37:48 -04:00
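The two-pass structure described in this commit message can be sketched in plain C. This is a minimal illustration, not the libvpx code: the helper names (filter8, convolve8_avg_sketch), the fixed 64-pixel temporary stride, the single kernel shared by both passes, and the 7-bit kernel precision are all assumptions made for the example.

```c
#include <stdint.h>

#define TAPS 8
#define FILTER_BITS 7 /* assumption: kernel taps sum to 1 << 7 = 128 */
#define MAX_DIM 64    /* assumption: blocks are at most 64x64 */

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Apply one 8-tap filter to samples spaced `step` apart, with rounding.
 * `src` points at the first (leftmost or topmost) tap. */
static uint8_t filter8(const uint8_t *src, int step, const int16_t *kernel) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k * step] * kernel[k];
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Hypothetical helper: horizontal pass into a temporary buffer, then a
 * vertical pass whose output is averaged with what is already in dst.
 * `src` points at the top-left pixel of the block and must have 3 valid
 * pixels to the left/above and 4 to the right/below. w, h <= MAX_DIM. */
static void convolve8_avg_sketch(const uint8_t *src, int src_stride,
                                 uint8_t *dst, int dst_stride,
                                 const int16_t *kernel, int w, int h) {
  uint8_t temp[MAX_DIM * (MAX_DIM + TAPS - 1)];
  const int temp_h = h + TAPS - 1;

  /* Horizontal pass: produce h + 7 rows so the vertical taps have context. */
  for (int y = 0; y < temp_h; ++y)
    for (int x = 0; x < w; ++x)
      temp[y * MAX_DIM + x] =
          filter8(src + (y - 3) * src_stride + (x - 3), 1, kernel);

  /* Vertical pass: filter down each column, then average into dst. */
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int v = filter8(temp + y * MAX_DIM + x, MAX_DIM, kernel);
      dst[y * dst_stride + x] =
          (uint8_t)((dst[y * dst_stride + x] + v + 1) >> 1);
    }
  }
}
```

With an identity kernel {0, 0, 0, 128, 0, 0, 0, 0} both filter passes return the source pixel unchanged, so the output is simply the rounded average of the source and the previous destination contents. That makes the averaging step easy to check in isolation.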
James Zern
807248ec81
Merge "ppc: Add vpx_idct32x32_1024_add_vsx"
2017-10-07 19:08:26 +00:00
Linfeng Zhang
127864deb3
Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
...
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
2017-10-04 12:35:39 -07:00
Linfeng Zhang
9a71811d98
Merge changes Id6a8c549,Ib1e0650b,Ic369dd86
...
* changes:
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Add vpx_dsp/x86/mem_sse2.h
Add transpose_8bit_{4x4,8x8}() x86 optimization
2017-10-04 16:15:14 +00:00
James Zern
66b6b87471
Merge "vpx: fix nasm build errors"
2017-10-03 21:47:49 +00:00
Scott LaVarnway
bc4bc9b622
vpx: fix nasm build errors
...
BUG=webm:1462,766721
Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98
2017-10-03 20:02:21 +00:00
Linfeng Zhang
6543213e87
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
...
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Linfeng Zhang
0f756a307d
Add vpx_dsp/x86/mem_sse2.h
...
Add some load and store sse2 inline functions.
Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
2017-10-03 12:59:05 -07:00
Linfeng Zhang
67c38c92e7
Add transpose_8bit_{4x4,8x8}() x86 optimization
...
Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878
2017-10-03 10:00:30 -07:00
Alexandra Hájková
fb7fc1dbda
ppc: Add vpx_idct32x32_1024_add_vsx
...
Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d
2017-09-30 19:31:20 +00:00
Scott LaVarnway
3bbd62ed27
vpxdsp: [x86] add highbd_d135_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~1.81x
C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x
BUG=webm:1411
Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0
2017-09-29 08:56:38 -07:00
Scott LaVarnway
4cae64c32c
vpxdsp: [x86] add highbd_d117_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.04x
C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x
BUG=webm:1411
Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
2017-09-28 14:45:28 -07:00
Scott LaVarnway
80992a746c
Merge "vpxdsp: [x86] add highbd_d153_predictor functions"
2017-09-27 20:40:21 +00:00
James Zern
690fa6bb6e
Merge "fix signed integer overflow of idct"
2017-09-27 19:39:11 +00:00
Linfeng Zhang
dbbbd44304
fix signed integer overflow of idct
...
Exposed by fuzz test in high bitdepth.
The bug was introduced in commit 64653fa.
BUG=webm:1466
Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
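The log does not say where exactly the overflow occurred. One plausible mechanism, sketched below, is that once a coefficient is a 16-bit type, multiplying it by a 32-bit input is carried out in int and can exceed the int range for large high-bitdepth values, whereas a 64-bit coefficient forced a wide multiply. The tran_coef_t typedef is quoted elsewhere in this log; the int64_t definition of tran_high_t, the coefficient value, and the helper names are assumptions for illustration only.

```c
#include <stdint.h>

typedef int16_t tran_coef_t; /* typedef quoted in this log */
typedef int64_t tran_high_t; /* assumption: wide type in high-bitdepth builds */

/* Illustrative coefficient value, assumed for the example. */
static const tran_coef_t cospi_16_64 = 11585;

/* Widen the input BEFORE multiplying so the product is computed in
 * 64 bits. Writing `x * c` alone would multiply in int and could
 * overflow, which is undefined behavior for signed integers. */
static tran_high_t mul_coef_wide(int32_t x, tran_coef_t c) {
  return (tran_high_t)x * c;
}

/* Hypothetical helper: returns 1 when x * c does not fit in 32 bits,
 * i.e. when an unwidened multiply would have overflowed. */
static int product_exceeds_int32(int32_t x, tran_coef_t c) {
  const tran_high_t p = mul_coef_wide(x, c);
  return p > INT32_MAX || p < INT32_MIN;
}
```

The cast on the first operand is the whole point of the sketch: it keeps the narrow coefficient type while still performing the arithmetic in the wide type.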
Scott LaVarnway
19c45ccd43
vpxdsp: [x86] add highbd_d153_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~1.95x
C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x
BUG=webm:1411
Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
2017-09-27 11:01:11 -07:00
Linfeng Zhang
9d0d13e939
Add vpx_scaled_2d_neon()
...
BUG=webm:1419
Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26 09:22:39 -07:00
Linfeng Zhang
28762341ac
Merge changes Ib9105462,Idfac00ed,If8d8a0e2
...
* changes:
cosmetics: NEON scaling code
Refactor convolve NEON code
Refactor convolve code
2017-09-26 16:10:46 +00:00
Scott LaVarnway
a059dc0986
Merge "vpxdsp: [x86] add highbd_d45_predictor functions"
2017-09-25 11:34:14 +00:00
Scott LaVarnway
cf82f7276e
vpxdsp: [x86] add highbd_d45_predictor functions
...
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x
BUG=webm:1411
Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
2017-09-22 05:20:12 -07:00
Linfeng Zhang
d586cdb4d4
Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
...
BUG=webm:1450
Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-20 14:13:26 -07:00
Linfeng Zhang
76a3d3fcc5
Remove the unnecessary upcasts of (int)cospi_{1...31}_64
...
BUG=webm:1450
Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
2017-09-20 14:13:26 -07:00
Linfeng Zhang
64653fa133
Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
...
The unnecessary upcast to (int) will be cleaned up later.
BUG=webm:1450
Change-Id: Ia234575206d5a74540526924b06ed3939322d063
2017-09-20 14:13:26 -07:00
Scott LaVarnway
b85e391ac8
Merge "vpxdsp: [x86] add highbd_d63_predictor functions"
2017-09-20 11:39:28 +00:00
Linfeng Zhang
7c0529728a
cosmetics: NEON scaling code
...
Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
2017-09-19 16:39:17 -07:00
Linfeng Zhang
f357335c38
Refactor convolve NEON code
...
Rename a couple of hbd static functions.
Move the position of NEON function convolve8_4().
Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349
2017-09-19 16:28:36 -07:00
Linfeng Zhang
bf8bdae913
Refactor convolve code
...
Extract a couple of static functions into their caller functions.
Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0
2017-09-19 16:28:31 -07:00
Scott LaVarnway
bc86e2c6a2
vpxdsp: [x86] add highbd_d63_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.94x
C vs SSSE3 speed gains:
_8x8 : ~8.69x
_16x16 : ~6.32x
_32x32 : ~5.33x
BUG=webm:1411
Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4
2017-09-19 15:47:22 -07:00
Linfeng Zhang
a80bdfd081
Change sinpi_{1,2,3,4}_9 from tran_high_t to int16_t
...
Add "typedef int16_t tran_coef_t;"
BUG=webm:1450
Change-Id: I67866f104898d1dda8989e1abdaf6983fe324154
2017-09-18 09:26:03 -07:00
Linfeng Zhang
9d278465b5
Merge "cosmetics: vp9_rtcd_defs.pl"
2017-09-18 16:23:33 +00:00
Kaustubh Raste
4ca8f8f5e2
mips msa clean-up msa macros
...
Removed inline for GP load-store in case of (__mips_isa_rev >= 6)
Created one define LD_V for vector load and ST_V for vector store
Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448
2017-09-14 12:29:19 +05:30
Linfeng Zhang
535dee0fb6
cosmetics: vp9_rtcd_defs.pl
...
Change-Id: I1bf57824e07fa4f8b3b5574984117f2bd7a1c086
2017-09-13 12:13:55 -07:00
Johann Koenig
ed3a80cb5e
Merge "Revert "Revert "quantize avx: copy 32x32 implementation"""
2017-09-13 14:44:53 +00:00
Johann
eb4238ac70
Revert "Revert "quantize avx: copy 32x32 implementation""
...
This reverts commit 8c42237bb2.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
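The alignment requirement mentioned in this commit message can be illustrated with a small sketch. SIMD code that uses aligned loads and stores faults on misaligned buffers, so a reference buffer handed to such code has to be allocated with explicit alignment. The example uses standard C11 aligned_alloc as a stand-in for whatever allocation helper the test actually uses; the 32-byte alignment, the int16_t element type, and the function names are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define COEFF_COUNT (32 * 32) /* one 32x32 block of coefficients */
#define SIMD_ALIGN 32         /* assumption: alignment wanted by the SIMD code */

/* Allocate one aligned coefficient buffer. The size (2048 bytes) is a
 * multiple of the alignment, as aligned_alloc requires. The caller
 * releases the buffer with free(). */
static int16_t *alloc_aligned_coeffs(void) {
  return (int16_t *)aligned_alloc(SIMD_ALIGN, COEFF_COUNT * sizeof(int16_t));
}

/* Returns 1 when p is a multiple of `alignment` bytes. */
static int is_aligned(const void *p, size_t alignment) {
  return ((uintptr_t)p % alignment) == 0;
}
```

In a test that compares two SIMD versions against each other, both the reference buffers (qcoeff, dqcoeff) and the buffers under test would be allocated this way, since neither side is plain C that tolerates arbitrary addresses.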
Kaustubh Raste
30f1ff94e0
Optimize mips msa vp9 average mc functions
...
Use specific destination loads instead of a vector load
Change-Id: I65ca13ae8f608fad07121fef848e2a18f54171fe
2017-09-12 16:12:11 +05:30
Scott LaVarnway
c39cd9235e
Merge "vpxdsp: [x86] add highbd_d207_predictor functions"
2017-09-11 22:32:23 +00:00
Linfeng Zhang
a9bbe53dbb
Add 4 to 1 scaling NEON optimization
...
BUG=webm:1419
Change-Id: If82a93935d2453e61b7647aae70983db1740bec7
2017-09-11 10:17:28 -07:00
Scott LaVarnway
d6c9bbc2b6
vpxdsp: [x86] add highbd_d207_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.31x
C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x
BUG=webm:1411
Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-11 07:36:24 -07:00
James Zern
fb40b5d7a7
intrapred: sync highbd_d63_predictor w/d63_
...
8/16/32: ~6%/~18%/~33% faster
previously:
7012ba639
vp9_reconintra: simplify d63_predictor
BUG=webm:1411
Change-Id: Ie775f3a4f7fd74df44754e65686d826a51c2cdc2
2017-09-08 19:28:01 -07:00
James Zern
5c95fd921e
intrapred: sync highbd_d45_predictor w/d45_
...
8/16/32: ~19%/~54%/~75.5% faster
previously:
acc481eaa
vp9_reconintra: simplify d45_predictor
BUG=webm:1411
Change-Id: Ie8340b0c5070ae640f124733f025e4e749b660d8
2017-09-08 19:09:07 -07:00