Scott LaVarnway
8e6022844f
vpx: [x86] add vpx_satd_avx2()
...
SSE2 intrinsic vs AVX2 intrinsic speed gains:
blocksize 16: ~1.33
blocksize 64: ~1.51
blocksize 256: ~3.03
blocksize 1024: ~3.71
Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
2017-11-10 12:24:12 -08:00
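For context on what vpx_satd computes, here is a minimal scalar sketch, assuming SATD here means the sum of absolute values over a block of transform coefficients and that the blocksize figures are coefficient counts. The name satd_ref and the int16_t coefficient type are hypothetical, chosen for illustration; this is not the AVX2 code from the commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute values of `length`
 * coefficients. Name and coefficient type are assumptions. */
static int satd_ref(const int16_t *coeff, int length) {
  int sum = 0;
  for (int i = 0; i < length; ++i) {
    sum += abs(coeff[i]); /* coeff[i] is promoted to int before abs() */
  }
  return sum;
}
```

A vectorized version would process many coefficients per iteration, which may be why the reported gains grow with block size.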
Scott LaVarnway
8c7213bc00
Merge "vpx: [x86] add vp9_block_error_fp_avx2()"
2017-11-10 00:45:47 +00:00
Scott LaVarnway
62ab5e99c1
vpx: [x86] add vp9_block_error_fp_avx2()
...
SSE2 asm vs AVX2 intrinsics speed gains:
blocksize 16: ~1.00
blocksize 64: ~1.17
blocksize 256: ~1.67
blocksize 1024: ~1.81
Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
2017-11-09 05:02:31 -08:00
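A rough scalar sketch of a block error computation, assuming it is the sum of squared differences between original and dequantized coefficients. The name block_error_ref and the int16_t coefficient type are hypothetical; this is not the AVX2 code from the commit.

```c
#include <stdint.h>

/* Hypothetical scalar reference: sum of squared differences between
 * `coeff` and `dqcoeff` over `block_size` entries. */
static int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                               int block_size) {
  int64_t error = 0;
  for (int i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    /* Widen before squaring: diff can reach 65535, whose square
     * does not fit in a 32-bit int. */
    error += (int64_t)diff * diff;
  }
  return error;
}
```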
Jerome Jiang
adbb4c4d32
Merge "vp9: Add nonref frame buffer test."
2017-11-09 04:41:10 +00:00
Jerome Jiang
a68bbcff29
vp9: Add nonref frame buffer test.
...
The new test will run an SVC bitstream which has non ref frames.
It checks the number of buffers acquired and released to make sure all
external frame buffers are released.
Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.
Disabled for now. Will be enabled with the fix.
BUG=b/68819248
Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
2017-11-08 18:41:33 -08:00
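The acquire/release accounting described in the test above can be sketched as follows. This is an assumption-laden illustration: BufferStats, acquire_buffer, release_buffer and all_released are hypothetical names, and malloc/free stand in for whatever external frame buffer pool the real test uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters for external frame buffer usage. */
typedef struct {
  int acquired;
  int released;
} BufferStats;

/* Hand out a buffer and count it only if the allocation succeeded. */
static void *acquire_buffer(BufferStats *stats, size_t size) {
  void *buf = malloc(size);
  if (buf != NULL) ++stats->acquired;
  return buf;
}

/* Take a buffer back and count the release. */
static void release_buffer(BufferStats *stats, void *buf) {
  if (buf != NULL) {
    free(buf);
    ++stats->released;
  }
}

/* No leak if every acquired buffer has been released. */
static int all_released(const BufferStats *stats) {
  return stats->acquired == stats->released;
}
```

A test in this style would decode the whole stream, destroy the decoder, and then check all_released().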
Kyle Siefring
b383a17fa4
Support building AVX-512 and implement sadx4 for AVX-512
...
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.
Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
2017-11-03 13:37:23 -04:00
Scott LaVarnway
3bf02ad74a
vpx: hadamard: use ptrdiff_t instead of int for stride
...
Eliminates the following instruction for the x86 (64 bit)
intrinsic code:
movslq %esi,%rax
Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
2017-10-26 11:41:48 -07:00
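To illustrate the stride change above: with an int stride, a 64-bit x86 compiler typically has to sign-extend the 32-bit value before using it in pointer arithmetic, which is what movslq does. A ptrdiff_t stride is already pointer-sized. The two functions below are hypothetical and only contrast the signatures; whether the extra instruction actually appears depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* int stride: may need a 32-to-64-bit sign extension before `src += stride`. */
static int column_sum_int_stride(const int16_t *src, int stride, int rows) {
  int sum = 0;
  for (int r = 0; r < rows; ++r) {
    sum += *src;
    src += stride;
  }
  return sum;
}

/* ptrdiff_t stride: already pointer-width, so no extension is required. */
static int column_sum_ptrdiff_stride(const int16_t *src, ptrdiff_t stride,
                                     int rows) {
  int sum = 0;
  for (int r = 0; r < rows; ++r) {
    sum += *src;
    src += stride;
  }
  return sum;
}
```

Both functions return the same result; only the generated address arithmetic may differ.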
Kyle Siefring
037e596f04
Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"
2017-10-24 19:22:36 +00:00
Kyle Siefring
ae35425ae6
Optimize convolve8 SSSE3 and AVX2 intrinsics
...
Changed the intrinsics to perform summation similar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for haswell
SSSE3
Horiz/Vert Size Speedup
Horiz x4 ~32%
Horiz x8 ~6%
Vert x8 ~4%
AVX2
Horiz/Vert Size Speedup
Horiz x16 ~16%
Vert x16 ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
2017-10-24 10:39:48 -04:00
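The difference between saturated and unsaturated 16-bit additions mentioned above can be modeled in scalar C. This is a sketch with hypothetical helper names, not the intrinsics from the commit, and it does not claim to reproduce the author's reasoning. It shows one general property: a wrapping sum still yields the correct final value when an intermediate result overflows, as long as the final result fits in 16 bits, while a saturating sum does not.

```c
#include <stdint.h>

/* Wrapping 16-bit add: behaves like a lane of a non-saturating packed add. */
static int16_t add_wrap16(int16_t a, int16_t b) {
  const uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)b);
  /* Map back to signed explicitly to avoid implementation-defined casts. */
  return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Saturating 16-bit add: clamps to the int16_t range. */
static int16_t add_sat16(int16_t a, int16_t b) {
  const int32_t s = (int32_t)a + b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}
```

For example, 30000 + 10000 - 20000 evaluates to 20000 with add_wrap16 but to 12767 with add_sat16, because the saturated intermediate sum has already lost information.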
Scott LaVarnway
b58259ab55
Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"
2017-10-19 23:32:10 +00:00
Scott LaVarnway
55c126a5d7
vpx: [x86] add vpx_hadamard_16x16_avx2()
...
This version is ~1.91x faster than the sse2 version. When
highbitdepth is enabled, it is ~1.74x.
Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
2017-10-18 18:00:00 -07:00
Jerome Jiang
401e6d48bf
Merge "Add datarate test for vp8 ROI."
2017-10-18 19:39:26 +00:00
Jerome Jiang
bd6d82e881
Add datarate test for vp8 ROI.
...
BUG=webm:1470
Change-Id: Icbc848837e64eacc49491dcc26b4c5802af2ee13
2017-10-18 11:19:59 -07:00
Jerome Jiang
ec2fced451
Merge "vp8: Enable use of ROI map."
2017-10-18 18:16:44 +00:00
Jerome Jiang
dbb8926b86
vp8: Enable use of ROI map.
...
Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.
BUG=webm:1470
Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda
2017-10-17 15:23:03 -07:00
Linfeng Zhang
9336e01621
Merge changes I17fff122,Ic149e3cb
...
* changes:
Add 4 to 3 scaling SSSE3 optimization
Test extreme inputs in frame scale functions
2017-10-17 16:03:29 +00:00
Linfeng Zhang
0d2e95193b
Merge "Generalize CheckScalingFiltering in ConvolveTest"
2017-10-17 16:03:07 +00:00
Shiyou Yin
3e2770de4f
Merge "vp8: [loongson] optimize dct with mmi"
2017-10-13 00:37:57 +00:00
Kyle Siefring
caa116c9be
Merge changes I38783d97,If5160c0c
...
* changes:
Extend 16 wide AVX2 convolve8 code to support averaging.
Add AVX2 version of vpx_convolve8_avg.
2017-10-12 16:12:38 +00:00
Shiyou Yin
f70de09f2a
vp8: [loongson] optimize dct with mmi
...
1. vp8_short_fdct4x4_mmi
2. vp8_short_fdct8x4_mmi
3. vp8_short_walsh4x4_mmi
Change-Id: I89a7df25cfd09fae309fac257ad8b6a3dc1c8acb
2017-10-12 08:50:04 +08:00
Shiyou Yin
bc4098a8e9
Merge "vp8: [loongson] optimize quantize with mmi"
2017-10-12 00:33:17 +00:00
Marco
72c69e14ad
Adjust threshold in datarate tests for 1 pass VBR
...
Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>
Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16
2017-10-11 11:48:36 -07:00
Linfeng Zhang
1fa3ec3023
Test extreme inputs in frame scale functions
...
Change-Id: Ic149e3cb59be2ee0f98a3fcfd83226ad5ea30c99
2017-10-11 11:35:19 -07:00
Shiyou Yin
e8ed2bb762
vp8: [loongson] optimize quantize with mmi
...
1. vp8_fast_quantize_b_mmi
2. vp8_regular_quantize_b_mmi
Change-Id: Ic6e21593075f92c1004acd67184602d2aa5d5646
2017-10-11 16:45:58 +08:00
Linfeng Zhang
54f7d68c5c
Generalize CheckScalingFiltering in ConvolveTest
...
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.
Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
2017-10-10 14:12:43 -07:00
Kyle Siefring
1b2f92ee8e
Extend 16 wide AVX2 convolve8 code to support averaging.
...
Also adds vpx_convolve8_avg_horiz_avx2.
Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
2017-10-09 19:10:03 -04:00
Linfeng Zhang
e1ae3772da
Merge "Update vp9_scale_and_extend_frame_ssse3()"
2017-10-09 16:20:00 +00:00
Kyle Siefring
9ca06bcdd2
Add AVX2 version of vpx_convolve8_avg.
...
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages its output with the destination at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
2017-10-07 23:37:48 -04:00
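The final averaging step of vpx_convolve8_avg can be sketched in scalar C as a rounded average of the filtered pixel and the pixel already in the destination. The function name avg_row is hypothetical, and the sketch assumes 8-bit pixels with round-half-up averaging; the filtering itself is omitted.

```c
#include <stdint.h>

/* Hypothetical helper: blend one row of already-filtered pixels into dst
 * using a rounded average, dst = (dst + filtered + 1) >> 1. */
static void avg_row(const uint8_t *filtered, uint8_t *dst, int w) {
  for (int x = 0; x < w; ++x) {
    dst[x] = (uint8_t)((dst[x] + filtered[x] + 1) >> 1);
  }
}
```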
James Zern
807248ec81
Merge "ppc: Add vpx_idct32x32_1024_add_vsx"
2017-10-07 19:08:26 +00:00
James Zern
107eb6a9d4
vp9_ethread_test: abort early/add more detailed output
...
If compare_fp_stats fails, report the two values and their index.
Change-Id: I927a832b7a1e24c392961093b7caee1134223def
2017-10-05 15:02:51 -07:00
Linfeng Zhang
b809442521
Update vp9_scale_and_extend_frame_ssse3()
...
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
2017-10-04 12:32:30 -07:00
Linfeng Zhang
9a71811d98
Merge changes Id6a8c549,Ib1e0650b,Ic369dd86
...
* changes:
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Add vpx_dsp/x86/mem_sse2.h
Add transpose_8bit_{4x4,8x8}() x86 optimization
2017-10-04 16:15:14 +00:00
Jerome Jiang
ffa3a3c441
Merge "Fix image width alignment. Enable ImageSizeSetting test."
2017-10-04 14:48:03 +00:00
Linfeng Zhang
6543213e87
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
...
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Alexandra Hájková
fb7fc1dbda
ppc: Add vpx_idct32x32_1024_add_vsx
...
Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d
2017-09-30 19:31:20 +00:00
Scott LaVarnway
3bbd62ed27
vpxdsp: [x86] add highbd_d135_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~1.81x
C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x
BUG=webm:1411
Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0
2017-09-29 08:56:38 -07:00
Scott LaVarnway
4cae64c32c
vpxdsp: [x86] add highbd_d117_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~2.04x
C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x
BUG=webm:1411
Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
2017-09-28 14:45:28 -07:00
Jerome Jiang
5a40c8fde1
Fix image width alignment. Enable ImageSizeSetting test.
...
BUG=b/64710201
Change-Id: I5465f6c6481d3c9a5e00fcab024cf4ae562b6b01
2017-09-28 11:25:24 -07:00
Scott LaVarnway
80992a746c
Merge "vpxdsp: [x86] add highbd_d153_predictor functions"
2017-09-27 20:40:21 +00:00
James Zern
690fa6bb6e
Merge "fix signed integer overflow of idct"
2017-09-27 19:39:11 +00:00
Linfeng Zhang
dbbbd44304
fix signed integer overflow of idct
...
Exposed by fuzz test in high bitdepth.
The bug is introduced in commit 64653fa.
BUG=webm:1466
Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
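As general background on this class of bug (not the actual patch, which is not shown in this log): multiplying a large high-bitdepth intermediate by a transform constant can exceed the 32-bit signed range, which is undefined behavior in C. A common remedy is to widen to 64 bits before multiplying. The helper name below is hypothetical, the 14-bit rounding shift is an assumption, and the sketch is only exercised with non-negative products.

```c
#include <stdint.h>

/* Hypothetical helper: multiply in 64 bits, then round and shift right by
 * 14 bits. Doing `x * c` in 32-bit int could overflow for large x. */
static int32_t mul_round_shift14(int32_t x, int32_t c) {
  const int64_t prod = (int64_t)x * c;
  return (int32_t)((prod + (1 << 13)) >> 14);
}
```

With x = 1 << 20 and c = 11585 the product is about 1.2e10, well outside the int32_t range, yet the shifted result fits comfortably.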
Scott LaVarnway
19c45ccd43
vpxdsp: [x86] add highbd_d153_predictor functions
...
C vs SSE2 speed gains:
_4x4 : ~1.95x
C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x
BUG=webm:1411
Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
2017-09-27 11:01:11 -07:00
Linfeng Zhang
d203a91a09
Merge "Add vpx_scaled_2d_neon()"
2017-09-27 16:12:48 +00:00
Jerome Jiang
878464150b
Merge "Add unit test to expose vp8 bug when width is set odd."
2017-09-27 01:26:59 +00:00
Jerome Jiang
767503504f
Add unit test to expose vp8 bug when width is set odd.
...
BUG=b/64710201
Change-Id: Ia518af5494a42e80949cf1165244fbed59606cf7
2017-09-26 17:40:13 -07:00
Linfeng Zhang
9d0d13e939
Add vpx_scaled_2d_neon()
...
BUG=webm:1419
Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26 09:22:39 -07:00
Linfeng Zhang
28762341ac
Merge changes Ib9105462,Idfac00ed,If8d8a0e2
...
* changes:
cosmetics: NEON scaling code
Refactor convolve NEON code
Refactor convolve code
2017-09-26 16:10:46 +00:00
Scott LaVarnway
cf82f7276e
vpxdsp: [x86] add highbd_d45_predictor functions
...
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x
BUG=webm:1411
Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
2017-09-22 05:20:12 -07:00
Scott LaVarnway
b85e391ac8
Merge "vpxdsp: [x86] add highbd_d63_predictor functions"
2017-09-20 11:39:28 +00:00
Linfeng Zhang
7c0529728a
cosmetics: NEON scaling code
...
Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
2017-09-19 16:39:17 -07:00