914 Commits

Author SHA1 Message Date
Linfeng Zhang
dbbbd44304 fix signed integer overflow of idct
Exposed by fuzz test in high bitdepth.
The bug is introduced in commit 64653fa.

BUG=webm:1466

Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
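The kind of overflow this commit describes can be illustrated with a minimal sketch. It assumes, in line with commit 64653fa below, that the cosine constants are 16-bit `tran_coef_t` values, and that high-bitdepth coefficients are 32-bit. The helper names `mul_round_wide` and `product_overflows_int32` are hypothetical and are not taken from this change.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative typedefs: the names follow the commit messages, the widths
 * are assumptions for a high-bitdepth build. */
typedef int16_t tran_coef_t; /* 16-bit transform constant */
typedef int32_t tran_low_t;  /* coefficient storage */
typedef int64_t tran_high_t; /* wide intermediate */

static_assert(sizeof(tran_high_t) >= 2 * sizeof(tran_low_t),
              "wide type must be able to hold any product");

#define DCT_CONST_BITS 14
static const tran_coef_t cospi_16_64 = 11585; /* round(cos(pi/4) * 2^14) */

/* Hypothetical helper: widen to 64 bits BEFORE multiplying, so the product
 * cannot overflow a 32-bit int, then round and shift back down.
 * Note: right-shifting a negative value is implementation-defined in C;
 * common compilers perform an arithmetic shift. */
static tran_high_t mul_round_wide(tran_low_t input, tran_coef_t coef) {
  const tran_high_t product = (tran_high_t)input * coef;
  return (product + ((tran_high_t)1 << (DCT_CONST_BITS - 1))) >>
         DCT_CONST_BITS;
}

/* Hypothetical check: returns 1 if input * coef would not fit in 32 bits,
 * which is the case where a plain int multiply has undefined behavior. */
static int product_overflows_int32(tran_low_t input, tran_coef_t coef) {
  const tran_high_t product = (tran_high_t)input * coef;
  return product > INT32_MAX || product < INT32_MIN;
}
```

With a 16-bit constant, `input * coef` is evaluated in `int` unless one operand is cast first, so large high-bitdepth inputs can overflow even though small ones never do.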
Linfeng Zhang
28762341ac Merge changes Ib9105462,Idfac00ed,If8d8a0e2
* changes:
  cosmetics: NEON scaling code
  Refactor convolve NEON code
  Refactor convolve code
2017-09-26 16:10:46 +00:00
Scott LaVarnway
a059dc0986 Merge "vpxdsp: [x86] add highbd_d45_predictor functions" 2017-09-25 11:34:14 +00:00
Scott LaVarnway
cf82f7276e vpxdsp: [x86] add highbd_d45_predictor functions
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x

BUG=webm:1411

Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
2017-09-22 05:20:12 -07:00
Linfeng Zhang
d586cdb4d4 Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
BUG=webm:1450

Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-20 14:13:26 -07:00
Linfeng Zhang
76a3d3fcc5 Remove the unnecessary upcasts of (int)cospi_{1...31}_64
BUG=webm:1450

Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
2017-09-20 14:13:26 -07:00
Linfeng Zhang
64653fa133 Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
The unnecessary upcast to (int) will be cleaned later.

BUG=webm:1450

Change-Id: Ia234575206d5a74540526924b06ed3939322d063
2017-09-20 14:13:26 -07:00
Scott LaVarnway
b85e391ac8 Merge "vpxdsp: [x86] add highbd_d63_predictor functions" 2017-09-20 11:39:28 +00:00
Linfeng Zhang
7c0529728a cosmetics: NEON scaling code
Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
2017-09-19 16:39:17 -07:00
Linfeng Zhang
f357335c38 Refactor convolve NEON code
Rename a couple of hbd static functions.
Move the position of NEON function convolve8_4().

Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349
2017-09-19 16:28:36 -07:00
Linfeng Zhang
bf8bdae913 Refactor convolve code
Extract a couple of static functions into their caller functions.

Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0
2017-09-19 16:28:31 -07:00
Scott LaVarnway
bc86e2c6a2 vpxdsp: [x86] add highbd_d63_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.94x

C vs SSSE3 speed gains:
_8x8 : ~8.69x
_16x16 : ~6.32x
_32x32 : ~5.33x

BUG=webm:1411

Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4
2017-09-19 15:47:22 -07:00
Linfeng Zhang
a80bdfd081 Change sinpi_{1,2,3,4}_9 from tran_high_t to int16_t
Add "typedef int16_t tran_coef_t;"

BUG=webm:1450

Change-Id: I67866f104898d1dda8989e1abdaf6983fe324154
2017-09-18 09:26:03 -07:00
Linfeng Zhang
9d278465b5 Merge "cosmetics: vp9_rtcd_defs.pl" 2017-09-18 16:23:33 +00:00
Kaustubh Raste
4ca8f8f5e2 mips msa clean-up msa macros
Removed inline for GP load-store in case of (__mips_isa_rev >= 6)
Created one define LD_V for vector load and ST_V for vector store

Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448
2017-09-14 12:29:19 +05:30
Linfeng Zhang
535dee0fb6 cosmetics: vp9_rtcd_defs.pl
Change-Id: I1bf57824e07fa4f8b3b5574984117f2bd7a1c086
2017-09-13 12:13:55 -07:00
Johann Koenig
ed3a80cb5e Merge "Revert "Revert "quantize avx: copy 32x32 implementation""" 2017-09-13 14:44:53 +00:00
Johann
eb4238ac70 Revert "Revert "quantize avx: copy 32x32 implementation""
This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65.

Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.

Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
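The alignment requirement mentioned in this revert can be sketched as follows. The sketch assumes a 16-byte boundary, which is what aligned SSE loads and stores require, and 16-bit coefficients. The buffer names and the `is_aligned` helper are hypothetical, and the actual test code may declare its buffers differently.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define COEFF_COUNT 1024 /* one 32x32 block of coefficients */

/* Returns 1 if ptr is a multiple of alignment, which must be a power of
 * two. */
static int is_aligned(const void *ptr, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical reference buffers, forced onto a 16-byte boundary so that
 * SIMD code using aligned loads and stores can write to them safely. */
alignas(16) static int16_t qcoeff_ref[COEFF_COUNT];
alignas(16) static int16_t dqcoeff_ref[COEFF_COUNT];
```

A test can check `is_aligned(qcoeff_ref, 16)` before passing the buffer to the SIMD reference function, which turns a hard-to-diagnose crash into a clear assertion failure.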
Kaustubh Raste
30f1ff94e0 Optimize mips msa vp9 average mc functions
Use specific loads for the destination data instead of a full vector load

Change-Id: I65ca13ae8f608fad07121fef848e2a18f54171fe
2017-09-12 16:12:11 +05:30
Scott LaVarnway
c39cd9235e Merge "vpxdsp: [x86] add highbd_d207_predictor functions" 2017-09-11 22:32:23 +00:00
Linfeng Zhang
a9bbe53dbb Add 4 to 1 scaling NEON optimization
BUG=webm:1419

Change-Id: If82a93935d2453e61b7647aae70983db1740bec7
2017-09-11 10:17:28 -07:00
Scott LaVarnway
d6c9bbc2b6 vpxdsp: [x86] add highbd_d207_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.31x

C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x

BUG=webm:1411

Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
2017-09-11 07:36:24 -07:00
James Zern
fb40b5d7a7 intrapred: sync highbd_d63_predictor w/d63_
8/16/32: ~6%/~18%/~33% faster

previously:
7012ba639 vp9_reconintra: simplify d63_predictor

BUG=webm:1411

Change-Id: Ie775f3a4f7fd74df44754e65686d826a51c2cdc2
2017-09-08 19:28:01 -07:00
James Zern
5c95fd921e intrapred: sync highbd_d45_predictor w/d45_
8/16/32: ~19%/~54%/~75.5% faster

previously:
acc481eaa vp9_reconintra: simplify d45_predictor

BUG=webm:1411

Change-Id: Ie8340b0c5070ae640f124733f025e4e749b660d8
2017-09-08 19:09:07 -07:00
James Zern
9a2dd7e67e Merge changes I9ec438aa,I99c954ff
* changes:
  Update convolve functions' assertions
  Add 2 to 1 scaling NEON optimization
2017-09-08 19:23:40 +00:00
Shiyou Yin
2c7b7424c5 Merge "vpxdsp: [loongson] optimize sad functions with mmi" 2017-09-08 00:55:14 +00:00
Linfeng Zhang
ef41c6286d Update convolve functions' assertions
So that 4 to 1 frame scaling can call them.

Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5
2017-09-07 12:33:58 -07:00
Linfeng Zhang
3ec20445b2 Refactor convolve8 NEON functions
Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c
2017-09-06 15:55:17 -07:00
Linfeng Zhang
7219f31904 Merge "Remove get_filter_base() and get_filter_offset() in convolve" 2017-09-06 22:39:15 +00:00
Linfeng Zhang
d331e7a1c0 Remove get_filter_base() and get_filter_offset() in convolve
so that the convolve functions are independent of table alignment.

Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-09-05 15:22:36 -07:00
Scott LaVarnway
bc4bcca3fd vpxdsp: [x86] add highbd_dc_128_predictor functions
C vs SSE2 speed gains:
_4x4 : ~7.64x
_8x8 : ~16.60x
_16x16 : ~8.15x
_32x32 : ~5.05x

BUG=webm:1411

Change-Id: If165d419711cfda901bd428a05ca1560a009e62e
2017-09-05 07:57:42 -07:00
Shiyou Yin
f4150163a2 vpxdsp: [loongson] optimize sad functions with mmi
1. vpx_sadWxH_c
2. vpx_sadWxH_avg_c
3. vpx_sadWxHx3_c
4. vpx_sadWxHx8_c
5. vpx_sadWxHx4d_c

Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a
2017-09-02 15:11:32 +00:00
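For readers unfamiliar with the functions listed above, a sum of absolute differences (SAD) over a W x H block can be sketched in plain C as below. This is a generic illustration rather than the libvpx source: the helper name `sad_wxh` and its explicit `width` and `height` parameters are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute differences between a source block
 * and a reference block, each addressed with its own row stride. */
static unsigned int sad_wxh(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* uint8_t operands promote to int, so the difference cannot wrap. */
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A SIMD version such as the MMI one in this commit computes the same result, processing several pixels per instruction.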
James Zern
334e9abb0b Merge "inv_txfm_vsx: fix loads in high-bitdepth" 2017-09-01 03:09:49 +00:00
James Zern
f8f64c309b inv_txfm_vsx: fix loads in high-bitdepth
vec_vsx_ld -> load_tran_low

Change-Id: Id3144cdd528d2d406a515e5812e2ea9e4db64bf1
2017-08-30 23:47:56 -07:00
Scott LaVarnway
c39a05ff61 vpxdsp: [x86] add highbd_dc_left_predictor functions
C vs SSE2 speed gains:
_4x4 : ~6.49x
_8x8 : ~10.82x
_16x16 : ~7.61x
_32x32 : ~5.29x

BUG=webm:1411

Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b
2017-08-30 09:29:06 -07:00
Scott LaVarnway
f783e3a75d vpxdsp: [x86] add highbd_dc_top_predictor functions
C vs SSE2 speed gains:
_4x4 : ~7.39x
_8x8 : ~11.36x
_16x16 : ~8.68x
_32x32 : ~4.33x

BUG=webm:1411

Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58
2017-08-29 12:53:30 -07:00
Scott LaVarnway
30d9a1916c vpxdsp: [x86] add highbd_h_predictor functions
C vs SSE2 speed gains:
_4x4 : ~8.12x
_8x8 : ~9.71x
_16x16 : ~8.21x
_32x32 : ~5.0x

BUG=webm:1422

Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993
2017-08-28 17:31:18 -07:00
Marco Paniconi
3e069846b9 Merge "Revert "quantize avx: copy 32x32 implementation"" 2017-08-25 18:20:31 +00:00
Marco Paniconi
8c42237bb2 Revert "quantize avx: copy 32x32 implementation"
This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301.

Reason for revert: Failures in AVX/VP9QuantizeTest in nightly tests.

Original change's description:
> quantize avx: copy 32x32 implementation
> 
> Ensure avx and ssse3 stay in sync by testing them against each other.
> 
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org

Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-08-25 16:56:08 +00:00
Shiyou Yin
ece1989fa2 Merge "vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi." 2017-08-25 06:44:02 +00:00
Shiyou Yin
9e4647c7ab vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi.
Change-Id: Ia576a721df6312329b599c31cfe1fb1267a9f174
2017-08-25 01:58:49 +08:00
Johann
f60d1dcd3d quantize avx: copy 32x32 implementation
Ensure avx and ssse3 stay in sync by testing them against each other.

Change-Id: I699f3b48785c83260825402d7826231f475f697c
2017-08-24 10:42:34 -07:00
Johann
1787e7dbe0 quantize ssse3: copy implementation to intrinsics
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.

Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
2017-08-24 07:47:51 -07:00
Shiyou Yin
d080c92524 Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi." 2017-08-24 00:55:11 +00:00
Johann Koenig
f53b656207 Merge "quantize avx: copy implementation to intrinsics" 2017-08-23 21:14:13 +00:00
Scott LaVarnway
1aad50c092 Merge "vpx_dsp: get32x32var_avx2() cleanup" 2017-08-23 19:59:25 +00:00
Johann Koenig
dfafd10ef5 Merge "quantize neon: round dqcoeff towards zero" 2017-08-23 19:20:53 +00:00
Johann
7c27872164 quantize avx: copy implementation to intrinsics
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.

Very close in speed to the assembly.

Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.

Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 09:19:16 -07:00
Johann
2a5aa98a35 quantize neon: round dqcoeff towards zero
Add 1 if negative to get dqcoeff to round towards zero.

10-15% faster than converting to positive before shifting.

Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
2017-08-23 08:05:50 -07:00
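The rounding trick in this commit can be shown in scalar C. The sketch assumes the shift in question is a right shift by one (a divide by two); the commit message does not state the shift amount, and both helper names are hypothetical. An arithmetic right shift rounds toward negative infinity, so adding 1 to negative inputs first makes the result match C's truncating division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: halve x, truncating toward zero like x / 2, using
 * only an add and a shift. Right-shifting a negative value is
 * implementation-defined in C; this relies on an arithmetic shift, which
 * common compilers provide. */
static int32_t halve_toward_zero(int32_t x) {
  const int32_t is_negative = x < 0; /* 1 for negative inputs, else 0 */
  return (x + is_negative) >> 1;
}

/* The alternative the commit compares against: convert to a positive
 * value, shift, then restore the sign. */
static int32_t halve_via_abs(int32_t x) {
  assert(x != INT32_MIN); /* -x would overflow */
  const int32_t magnitude = x < 0 ? -x : x;
  const int32_t half = magnitude >> 1;
  return x < 0 ? -half : half;
}
```

Both helpers agree with `x / 2`. The first needs fewer operations, which is consistent with the speedup the commit reports for the NEON version.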
Shiyou Yin
59e065b6ed vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi.
Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2
2017-08-23 15:14:15 +08:00