Commit Graph

940 Commits

Author SHA1 Message Date
Scott LaVarnway
b58259ab55 Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()" 2017-10-19 23:32:10 +00:00
Scott LaVarnway
55c126a5d7 vpx: [x86] add vpx_hadamard_16x16_avx2()
This version is ~1.91x faster than the sse2 version.  When
highbitdepth is enabled, it is ~1.74x.

Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
2017-10-18 18:00:00 -07:00
Kyle Siefring
b3a36f7946 Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c" 2017-10-18 16:19:52 +00:00
Linfeng Zhang
9336e01621 Merge changes I17fff122,Ic149e3cb
* changes:
  Add 4 to 3 scaling SSSE3 optimization
  Test extreme inputs in frame scale functions
2017-10-17 16:03:29 +00:00
Kyle Siefring
55805e2786 Refactor x86/vpx_subpixel_8t_intrin_avx2.c
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
2017-10-17 11:57:40 -04:00
Linfeng Zhang
580d32240f Add 4 to 3 scaling SSSE3 optimization
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.

Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().

Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-16 15:42:42 -07:00
Kyle Siefring
caa116c9be Merge changes I38783d97,If5160c0c
* changes:
  Extend 16 wide AVX2 convolve8 code to support averaging.
  Add AVX2 version of vpx_convolve8_avg.
2017-10-12 16:12:38 +00:00
Linfeng Zhang
16166bfdaa Add 4 to 1 scaling x86 optimization
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
2017-10-10 16:24:06 -07:00
Linfeng Zhang
963cc22cef Merge changes I9d4c1af5,I882da3a0
* changes:
  Rename some inline functions in NEON scaling
  Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
2017-10-10 17:29:50 +00:00
Kyle Siefring
1b2f92ee8e Extend 16 wide AVX2 convolve8 code to support averaging.
Also adds vpx_convolve8_avg_horiz_avx2.

Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
2017-10-09 19:10:03 -04:00
Kyle Siefring
9ca06bcdd2 Add AVX2 version of vpx_convolve8_avg.
vpx_convolve8_avg works by first running a normal horizontal filter then a
vertical filter averages at the end.

The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.

vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.

Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
2017-10-07 23:37:48 -04:00
James Zern
807248ec81 Merge "ppc: Add vpx_idct32x32_1024_add_vsx" 2017-10-07 19:08:26 +00:00
Linfeng Zhang
127864deb3 Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
2017-10-04 12:35:39 -07:00
Linfeng Zhang
9a71811d98 Merge changes Id6a8c549,Ib1e0650b,Ic369dd86
* changes:
  Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
  Add vpx_dsp/x86/mem_sse2.h
  Add transpose_8bit_{4x4,8x8}() x86 optimization
2017-10-04 16:15:14 +00:00
James Zern
66b6b87471 Merge "vpx: fix nasm build errors" 2017-10-03 21:47:49 +00:00
Scott LaVarnway
bc4bc9b622 vpx: fix nasm build errors
BUG=webm:1462,766721

Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98
2017-10-03 20:02:21 +00:00
Linfeng Zhang
6543213e87 Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Linfeng Zhang
0f756a307d Add vpx_dsp/x86/mem_sse2.h
Add some load and store sse2 inline functions.

Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
2017-10-03 12:59:05 -07:00
Linfeng Zhang
67c38c92e7 Add transpose_8bit_{4x4,8x8}() x86 optimization
Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878
2017-10-03 10:00:30 -07:00
Alexandra Hájková
fb7fc1dbda ppc: Add vpx_idct32x32_1024_add_vsx
Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d
2017-09-30 19:31:20 +00:00
Scott LaVarnway
3bbd62ed27 vpxdsp: [x86] add highbd_d135_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.81x

C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x

BUG=webm:1411

Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0
2017-09-29 08:56:38 -07:00
Scott LaVarnway
4cae64c32c vpxdsp: [x86] add highbd_d117_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.04x

C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x

BUG=webm:1411

Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
2017-09-28 14:45:28 -07:00
Scott LaVarnway
80992a746c Merge "vpxdsp: [x86] add highbd_d153_predictor functions" 2017-09-27 20:40:21 +00:00
James Zern
690fa6bb6e Merge "fix signed integer overflow of idct" 2017-09-27 19:39:11 +00:00
Linfeng Zhang
dbbbd44304 fix signed integer overflow of idct
Exposed by fuzz test in high bitdepth.
The bug is introduced in commit 64653fa.

BUG=webm:1466

Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
Scott LaVarnway
19c45ccd43 vpxdsp: [x86] add highbd_d153_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.95x

C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x

BUG=webm:1411

Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
2017-09-27 11:01:11 -07:00
Linfeng Zhang
9d0d13e939 Add vpx_scaled_2d_neon()
BUG=webm:1419

Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26 09:22:39 -07:00
Linfeng Zhang
28762341ac Merge changes Ib9105462,Idfac00ed,If8d8a0e2
* changes:
  cosmetics: NEON scaling code
  Refactor convolve NEON code
  Refactor convolve code
2017-09-26 16:10:46 +00:00
Scott LaVarnway
a059dc0986 Merge "vpxdsp: [x86] add highbd_d45_predictor functions" 2017-09-25 11:34:14 +00:00
Scott LaVarnway
cf82f7276e vpxdsp: [x86] add highbd_d45_predictor functions
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x

BUG=webm:1411

Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
2017-09-22 05:20:12 -07:00
Linfeng Zhang
d586cdb4d4 Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
BUG=webm:1450

Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-20 14:13:26 -07:00
Linfeng Zhang
76a3d3fcc5 Remove the unnecessary upcasts of (int)cospi_{1...31}_64
BUG=webm:1450

Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
2017-09-20 14:13:26 -07:00
Linfeng Zhang
64653fa133 Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
The unnecessary upcast to (int) will be cleaned later.

BUG=webm:1450

Change-Id: Ia234575206d5a74540526924b06ed3939322d063
2017-09-20 14:13:26 -07:00
Scott LaVarnway
b85e391ac8 Merge "vpxdsp: [x86] add highbd_d63_predictor functions" 2017-09-20 11:39:28 +00:00
Linfeng Zhang
7c0529728a cosmetics: NEON scaling code
Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
2017-09-19 16:39:17 -07:00
Linfeng Zhang
f357335c38 Refactor convolve NEON code
Rename a couple of hbd static functions.
Move the position of NEON function convolve8_4().

Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349
2017-09-19 16:28:36 -07:00
Linfeng Zhang
bf8bdae913 Refactor convolve code
Extract a couple of static functions into their caller functions.

Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0
2017-09-19 16:28:31 -07:00
Scott LaVarnway
bc86e2c6a2 vpxdsp: [x86] add highbd_d63_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.94x

C vs SSSE3 speed gains:
_8x8 : ~8.69x
_16x16 : ~6.32x
_32x32 : ~5.33x

BUG=webm:1411

Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4
2017-09-19 15:47:22 -07:00
Linfeng Zhang
a80bdfd081 Change sinpi_{1,2,3,4}_9 from tran_high_t to int16_t
Add "typedef int16_t tran_coef_t;"

BUG=webm:1450

Change-Id: I67866f104898d1dda8989e1abdaf6983fe324154
2017-09-18 09:26:03 -07:00
Linfeng Zhang
9d278465b5 Merge "cosmetics: vp9_rtcd_defs.pl" 2017-09-18 16:23:33 +00:00
Kaustubh Raste
4ca8f8f5e2 mips msa clean-up msa macros
Removed inline for GP load-store in case of (__mips_isa_rev >= 6)
Created one define LD_V for vector load and ST_V for vector store

Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448
2017-09-14 12:29:19 +05:30
Linfeng Zhang
535dee0fb6 cosmetics: vp9_rtcd_defs.pl
Change-Id: I1bf57824e07fa4f8b3b5574984117f2bd7a1c086
2017-09-13 12:13:55 -07:00
Johann Koenig
ed3a80cb5e Merge "Revert "Revert "quantize avx: copy 32x32 implementation""" 2017-09-13 14:44:53 +00:00
Johann
eb4238ac70 Revert "Revert "quantize avx: copy 32x32 implementation""
This reverts commit 8c42237bb2.

Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.

Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
Kaustubh Raste
30f1ff94e0 Optimize mips msa vp9 average mc functions
Load the specific destination loads instead of vector load

Change-Id: I65ca13ae8f608fad07121fef848e2a18f54171fe
2017-09-12 16:12:11 +05:30
Scott LaVarnway
c39cd9235e Merge "vpxdsp: [x86] add highbd_d207_predictor functions" 2017-09-11 22:32:23 +00:00
Linfeng Zhang
a9bbe53dbb Add 4 to 1 scaling NEON optimization
BUG=webm:1419

Change-Id: If82a93935d2453e61b7647aae70983db1740bec7
2017-09-11 10:17:28 -07:00
Scott LaVarnway
d6c9bbc2b6 vpxdsp: [x86] add highbd_d207_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.31x

C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x

BUG=webm:1411

Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-11 07:36:24 -07:00
James Zern
fb40b5d7a7 intrapred: sync highbd_d63_predictor w/d63_
8/16/32: ~6%/~18%/~33% faster

previously:
7012ba639 vp9_reconintra: simplify d63_predictor

BUG=webm:1411

Change-Id: Ie775f3a4f7fd74df44754e65686d826a51c2cdc2
2017-09-08 19:28:01 -07:00
James Zern
5c95fd921e intrapred: sync highbd_d45_predictor w/d45_
8/16/32:: ~19%/~54%/~75.5% faster

previously:
acc481eaa vp9_reconintra: simplify d45_predictor

BUG=webm:1411

Change-Id: Ie8340b0c5070ae640f124733f025e4e749b660d8
2017-09-08 19:09:07 -07:00