Commit Graph

2144 Commits

Author SHA1 Message Date
Scott LaVarnway
8e6022844f vpx: [x86] add vpx_satd_avx2()
SSE2 instrinsic vs AVX2 intrinsic speed gains:
blocksize   16: ~1.33
blocksize   64: ~1.51
blocksize  256: ~3.03
blocksize 1024: ~3.71

Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
2017-11-10 12:24:12 -08:00
Scott LaVarnway
8c7213bc00 Merge "vpx: [x86] add vp9_block_error_fp_avx2()" 2017-11-10 00:45:47 +00:00
Scott LaVarnway
62ab5e99c1 vpx: [x86] add vp9_block_error_fp_avx2()
SSE2 asm vs AVX2 intrinsics speed gains:
blocksize   16: ~1.00
blocksize   64: ~1.17
blocksize  256: ~1.67
blocksize 1024: ~1.81

Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
2017-11-09 05:02:31 -08:00
Jerome Jiang
adbb4c4d32 Merge "vp9: Add nonref frame buffer test." 2017-11-09 04:41:10 +00:00
Jerome Jiang
a68bbcff29 vp9: Add nonref frame buffer test.
The new test will run a SVC bitstream which has non ref frames.
It checks the number of buffer acquired and released to make sure all
external frame buffers are released.

Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.

Disabled for now. Will be enabled with the fix.

BUG=b/68819248

Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
2017-11-08 18:41:33 -08:00
Kyle Siefring
b383a17fa4 Support building AVX-512 and implement sadx4 for AVX-512
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.

Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
2017-11-03 13:37:23 -04:00
Scott LaVarnway
3bf02ad74a vpx: hadamard: use ptrdiff_t instead of int for stride
Eliminates the following instruction for the x86 (64 bit)
intrinsic code:

movslq %esi,%rax

Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
2017-10-26 11:41:48 -07:00
Kyle Siefring
037e596f04 Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics" 2017-10-24 19:22:36 +00:00
Kyle Siefring
ae35425ae6 Optimize convolve8 SSSE3 and AVX2 intrinsics
Changed the intrinsics to perform summation similiar to the way the assembly does.

The new code diverges from the assembly by preferring unsaturated additions.

Results for haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
2017-10-24 10:39:48 -04:00
Scott LaVarnway
b58259ab55 Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()" 2017-10-19 23:32:10 +00:00
Scott LaVarnway
55c126a5d7 vpx: [x86] add vpx_hadamard_16x16_avx2()
This version is ~1.91x faster than the sse2 version.  When
highbitdepth is enabled, it is ~1.74x.

Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
2017-10-18 18:00:00 -07:00
Jerome Jiang
401e6d48bf Merge "Add datarate test for vp8 ROI." 2017-10-18 19:39:26 +00:00
Jerome Jiang
bd6d82e881 Add datarate test for vp8 ROI.
BUG=webm:1470

Change-Id: Icbc848837e64eacc49491dcc26b4c5802af2ee13
2017-10-18 11:19:59 -07:00
Jerome Jiang
ec2fced451 Merge "vp8: Enable use of ROI map." 2017-10-18 18:16:44 +00:00
Jerome Jiang
dbb8926b86 vp8: Enable use of ROI map.
Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.

BUG=webm:1470

Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda
2017-10-17 15:23:03 -07:00
Linfeng Zhang
9336e01621 Merge changes I17fff122,Ic149e3cb
* changes:
  Add 4 to 3 scaling SSSE3 optimization
  Test extreme inputs in frame scale functions
2017-10-17 16:03:29 +00:00
Linfeng Zhang
0d2e95193b Merge "Generalize CheckScalingFiltering in ConvolveTest" 2017-10-17 16:03:07 +00:00
Shiyou Yin
3e2770de4f Merge "vp8: [loongson] optimize dct with mmi" 2017-10-13 00:37:57 +00:00
Kyle Siefring
caa116c9be Merge changes I38783d97,If5160c0c
* changes:
  Extend 16 wide AVX2 convolve8 code to support averaging.
  Add AVX2 version of vpx_convolve8_avg.
2017-10-12 16:12:38 +00:00
Shiyou Yin
f70de09f2a vp8: [loongson] optimize dct with mmi
1. vp8_short_fdct4x4_mmi
2. vp8_short_fdct8x4_mmi
3. vp8_short_walsh4x4_mmi

Change-Id: I89a7df25cfd09fae309fac257ad8b6a3dc1c8acb
2017-10-12 08:50:04 +08:00
Shiyou Yin
bc4098a8e9 Merge "vp8: [loongson] optimize quantize with mmi" 2017-10-12 00:33:17 +00:00
Marco
72c69e14ad Adjust threshold in datarate tests for 1 pass VBR
Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>

Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16
2017-10-11 11:48:36 -07:00
Linfeng Zhang
1fa3ec3023 Test extreme inputs in frame scale functions
Change-Id: Ic149e3cb59be2ee0f98a3fcfd83226ad5ea30c99
2017-10-11 11:35:19 -07:00
Shiyou Yin
e8ed2bb762 vp8: [loongson] optimize quantize with mmi
1. vp8_fast_quantize_b_mmi
2. vp8_regular_quantize_b_mmi

Change-Id: Ic6e21593075f92c1004acd67184602d2aa5d5646
2017-10-11 16:45:58 +08:00
Linfeng Zhang
54f7d68c5c Generalize CheckScalingFiltering in ConvolveTest
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.

Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
2017-10-10 14:12:43 -07:00
Kyle Siefring
1b2f92ee8e Extend 16 wide AVX2 convolve8 code to support averaging.
Also adds vpx_convolve8_avg_horiz_avx2.

Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
2017-10-09 19:10:03 -04:00
Linfeng Zhang
e1ae3772da Merge "Update vp9_scale_and_extend_frame_ssse3()" 2017-10-09 16:20:00 +00:00
Kyle Siefring
9ca06bcdd2 Add AVX2 version of vpx_convolve8_avg.
vpx_convolve8_avg works by first running a normal horizontal filter then a
vertical filter averages at the end.

The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.

vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.

Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
2017-10-07 23:37:48 -04:00
James Zern
807248ec81 Merge "ppc: Add vpx_idct32x32_1024_add_vsx" 2017-10-07 19:08:26 +00:00
James Zern
107eb6a9d4 vp9_ethread_test: abort early/add more detailed output
in the case compare_fp_stats fails report the 2 values and their index

Change-Id: I927a832b7a1e24c392961093b7caee1134223def
2017-10-05 15:02:51 -07:00
Linfeng Zhang
b809442521 Update vp9_scale_and_extend_frame_ssse3()
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
2017-10-04 12:32:30 -07:00
Linfeng Zhang
9a71811d98 Merge changes Id6a8c549,Ib1e0650b,Ic369dd86
* changes:
  Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
  Add vpx_dsp/x86/mem_sse2.h
  Add transpose_8bit_{4x4,8x8}() x86 optimization
2017-10-04 16:15:14 +00:00
Jerome Jiang
ffa3a3c441 Merge "Fix image width alignment. Enable ImageSizeSetting test." 2017-10-04 14:48:03 +00:00
Linfeng Zhang
6543213e87 Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Alexandra Hájková
fb7fc1dbda ppc: Add vpx_idct32x32_1024_add_vsx
Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d
2017-09-30 19:31:20 +00:00
Scott LaVarnway
3bbd62ed27 vpxdsp: [x86] add highbd_d135_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.81x

C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x

BUG=webm:1411

Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0
2017-09-29 08:56:38 -07:00
Scott LaVarnway
4cae64c32c vpxdsp: [x86] add highbd_d117_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.04x

C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x

BUG=webm:1411

Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
2017-09-28 14:45:28 -07:00
Jerome Jiang
5a40c8fde1 Fix image width alignment. Enable ImageSizeSetting test.
BUG=b/64710201

Change-Id: I5465f6c6481d3c9a5e00fcab024cf4ae562b6b01
2017-09-28 11:25:24 -07:00
Scott LaVarnway
80992a746c Merge "vpxdsp: [x86] add highbd_d153_predictor functions" 2017-09-27 20:40:21 +00:00
James Zern
690fa6bb6e Merge "fix signed integer overflow of idct" 2017-09-27 19:39:11 +00:00
Linfeng Zhang
dbbbd44304 fix signed integer overflow of idct
Exposed by fuzz test in high bitdepth.
The bug is introduced in commit 64653fa.

BUG=webm:1466

Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
Scott LaVarnway
19c45ccd43 vpxdsp: [x86] add highbd_d153_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.95x

C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x

BUG=webm:1411

Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
2017-09-27 11:01:11 -07:00
Linfeng Zhang
d203a91a09 Merge "Add vpx_scaled_2d_neon()" 2017-09-27 16:12:48 +00:00
Jerome Jiang
878464150b Merge "Add unit test to expose vp8 bug when width is set odd." 2017-09-27 01:26:59 +00:00
Jerome Jiang
767503504f Add unit test to expose vp8 bug when width is set odd.
BUG=b/64710201

Change-Id: Ia518af5494a42e80949cf1165244fbed59606cf7
2017-09-26 17:40:13 -07:00
Linfeng Zhang
9d0d13e939 Add vpx_scaled_2d_neon()
BUG=webm:1419

Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26 09:22:39 -07:00
Linfeng Zhang
28762341ac Merge changes Ib9105462,Idfac00ed,If8d8a0e2
* changes:
  cosmetics: NEON scaling code
  Refactor convolve NEON code
  Refactor convolve code
2017-09-26 16:10:46 +00:00
Scott LaVarnway
cf82f7276e vpxdsp: [x86] add highbd_d45_predictor functions
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x

BUG=webm:1411

Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
2017-09-22 05:20:12 -07:00
Scott LaVarnway
b85e391ac8 Merge "vpxdsp: [x86] add highbd_d63_predictor functions" 2017-09-20 11:39:28 +00:00
Linfeng Zhang
7c0529728a cosmetics: NEON scaling code
Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
2017-09-19 16:39:17 -07:00