Commit Graph

9428 Commits

Author SHA1 Message Date
James Zern
f16ea6a6eb Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16" 2017-03-23 00:53:22 +00:00
Marco Paniconi
ff0e0a76e8 Merge "vp9: Adjust some speed settings for speed 8." 2017-03-22 22:56:17 +00:00
Marco
4d50991320 vp9: Adjust some speed settings for speed 8.
Allow for simple_block_rd for VGA resoln, and reduce
adaptive_rd_thresh to 1.

On average no loss on RTC set, ~4% speedup on mac.

Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc
2017-03-22 15:16:15 -07:00
Jerome Jiang
dcd6c87b80 Merge "vp9: Enable adaptive_rd_threshold for row mt for realtime speed 8." 2017-03-22 22:02:24 +00:00
James Zern
5661cd8ff4 vp9_rdopt: correct size to vpx_sum_squares_2d_i16
the current implementations expect pixel size, not the block type

BUG=webm:1392

Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-22 12:04:33 -07:00
Johann
36d732c22b vp9 temporal filter: add const to function prototype
The input frames are not modified.

Change-Id: Ideb810e3c5afeb4dbdc4c7d54024c43a8129ad39
2017-03-22 18:14:21 +00:00
Jerome Jiang
20c2892693 vp9: Enable adaptive_rd_threshold for row mt for realtime speed 8.
Change it to row based array to avoid the slow down cause by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.

Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
2017-03-21 18:49:47 -07:00
Jerome Jiang
dbed479d79 Fix the data race caused by vp9 denoiser.
BUG=webm:1391

Change-Id: I9669ae62fe9c695d4c6f9973094cb0f39bed51c7
2017-03-21 15:46:25 -07:00
Yunqing Wang
1935dfb294 Code refactoring in the partition search
Computed the partition search early termination score in a separate
function.

Change-Id: I1894b517ff179a38b1c05e054d373ac4b7f4cbb4
2017-03-21 10:00:44 -07:00
Marco Paniconi
05c7259525 Merge "vp9: Nonrd variance partition: improve split to 16x16." 2017-03-21 00:17:35 +00:00
Yunqing Wang
bf43b4c4b4 Merge "Record the sum of tx block eobs in the partition block" 2017-03-20 23:20:12 +00:00
Marco
3135b85423 vp9: Nonrd variance partition: improve split to 16x16.
Add additional condition to split to 16x16, for resolutions <= 360p,
reduces dragging artifact near moving boundary.

Small/no change on RTC metrics.

Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
2017-03-20 15:44:46 -07:00
Marco
06c8713e89 vp9: Use sb content measure to bias against golden.
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turnoff GF search in non-rd pickmode.

Only enabled for speed >= 8.

avgPNSR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.

Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
2017-03-20 12:42:26 -07:00
Yunqing Wang
9c2552a1c1 Record the sum of tx block eobs in the partition block
The sum of tx bloxk eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.

After the sum of eobs are calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.

Re-did the quality/speed test to check the impact of the fix.

1. Borg test BDRATE result:
4k set:     PSNR: +0.183%; SSIM: +0.100%;
hdres set:  PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;

2.Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.

The result is in line with the original result.

Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
2017-03-20 17:12:15 +00:00
Alex Converse
0842daa24e Merge "vp9_optimize_b: Combine extrabits cost with token lookup" 2017-03-17 16:18:21 +00:00
Marco
02975a604c vp9: Fix speed 8 condition for enabling copy_partition.
Change-Id: I2c090e6ba853a30fef1957b620853315f9471753
2017-03-16 17:08:37 -07:00
Alex Converse
3a6ec9ea72 vp9_optimize_b: Combine extrabits cost with token lookup
About 0.6% fewer cycles spent in vp9_optimize_b.

Change-Id: I2ae62a78374c594ed81d4e3100a5848e2f6f2c4e
2017-03-16 17:03:22 -07:00
Gabriel Marin
976ddb61d3 Add a vector form of routine vp9_model_rd_from_var_lapndz
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.

Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
2017-03-16 22:19:44 +00:00
Marco
bc7d4935bb vp9: Fixes in non-rd pickmode for denoising with SVC.
Don't denoise spatial layer frames whose base layer is a key frame.

Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes bad artifact.
Will re-enable when issue is resolved.

Change-Id: I87a6597812330500966458172acfce54af65f70f
2017-03-16 12:59:41 -07:00
Jerome Jiang
bf40776aa4 Merge "Refactor: Change cpi->resize_state to enum values." 2017-03-16 16:43:42 +00:00
Marco Paniconi
cd47c1942e Merge "vp9: Fix some issues with denoiser and SVC." 2017-03-16 02:42:55 +00:00
Marco
a340c64a79 vp9: Fix some issues with denoiser and SVC.
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.

Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
2017-03-15 17:19:17 -07:00
Jerome Jiang
b5f7f7737a Refactor: Change cpi->resize_state to enum values.
Change-Id: Iab1409b0fc1175bc5a14afc4749a08c536c98c41
2017-03-15 17:16:17 -07:00
Marco
2c8430e223 vp9: Turn off ml_partition_search_early_termination.
Fails on nightly ubsan, valgrind tests.
Enabled on commit:6701014

Change-Id: Ied3f5cb38e39cba54ac134f4514107cdfdfce159
2017-03-15 15:00:38 -07:00
Jerome Jiang
27d5a57072 Merge "vp9: Using source sad for speedup for dynamic resizing." 2017-03-15 00:03:52 +00:00
Jerome Jiang
2fa7092808 Merge "vp9: Enable row multithreading for SVC in real-time mode." 2017-03-14 23:29:46 +00:00
Jerome Jiang
02463273c9 vp9: Using source sad for speedup for dynamic resizing.
Only for speed >= 7.

Change-Id: I3ac85fbb4023cf7e6f8333806b345b0174382a09
2017-03-14 15:47:19 -07:00
James Zern
1b91f41935 Merge "vp9/encoder: fix segfault on win32 using vs < 2015" 2017-03-14 19:21:42 +00:00
Yunqing Wang
c3e290963d Merge "Apply machine learning-based early termination in VP9 partition search" 2017-03-14 18:07:05 +00:00
Marco Paniconi
78a6946904 Merge "vp9: Speed >= 8: Enable simple_block_yrd speed feature." 2017-03-14 17:50:17 +00:00
Marco
c0c789ab50 vp9: Adjust copy partition threshold, for speed 8.
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.

Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
2017-03-14 09:18:53 -07:00
Marco
c216c8d6f2 vp9: Speed >= 8: Enable simple_block_yrd speed feature.
Enable speed feature for resolutions > VGA.
avgPSNR on RTC down by ~1.7%.
Speedup on ARM: ~5%.

Change-Id: I7a3fe5f7425aa8df3f4a2eced1afa355bc0d4c95
2017-03-14 09:10:28 -07:00
Marco Paniconi
507204316a Merge "vp9: Fix to source_sad feature for SVC." 2017-03-13 19:18:31 +00:00
Linfeng Zhang
b0bfcc368c Merge "Add vpx_highbd_idct32x32_135_add_c()" 2017-03-13 18:49:01 +00:00
Marco
f0a22b23fe vp9: Fix to source_sad feature for SVC.
Allow speed feature sf->use_source_sad to be used
on highest spatial layer for SVC.

Change-Id: I260eb0478902764f49f83e43b17024fe86ff3b22
2017-03-13 11:00:40 -07:00
Yunqing Wang
670101439f Apply machine learning-based early termination in VP9 partition search
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.

The patch was rebased to the top-of-tree.

Borg test BDRATE result:
4k set:     PSNR: +0.144%; SSIM: +0.043%;
hdres set:  PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;

Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.

Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
2017-03-13 09:54:18 -07:00
Marco
8c18df7fcd vp9: Fix condition for intra search in non-rd pickmode.
Fixes an issue when the LAST and golden is not used as a reference,
in which case its possible no encoding mode is set (since intra may be
skipped under certain codtions). Fix is to make sure intra is searched
if no inter mode is checked.

Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c

Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
2017-03-12 22:30:39 -07:00
James Zern
c09b290cea vp9/encoder: fix segfault on win32 using vs < 2015
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.

https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code

BUG=webm:1054

Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
2017-03-10 17:37:17 -08:00
Marco
ffb3c50da1 vp9: Enable row multithreading for SVC in real-time mode.
Enable row-mt for SVC for real-time mode, speed >=5.

Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.

For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup.

Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
2017-03-10 01:01:07 +00:00
James Zern
cb60e66085 Merge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c" 2017-03-09 22:51:08 +00:00
James Zern
2f31a16445 move vp9_scale_and_extend_frame_c to vp9_frame_scale.c
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
2017-03-08 21:13:50 -08:00
Marco
ea3c817ac2 vp9: Enable two speed features for SVC real-time mode.
Enable short_circuit_low_temp_var and limit_newmv_early_exit
for SVC, 1 pass CBR mode.

Change-Id: I77df2b2c6cc40657bb8ea76e19dfc2fdaad6389e
2017-03-08 16:13:59 -08:00
Yunqing Wang
099e9bf1ff Make the partition search early termination feature to be frame size dependent
The 2 thresholds(i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch made this
feature to be frame size dependent consistently throughout the code.

Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
2017-03-08 12:56:41 -08:00
Linfeng Zhang
48f5886605 Add vpx_highbd_idct32x32_135_add_c()
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.

BUG=webm:1301

Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
Marco
45de35fc58 vp9: Fix for denoising with SVC.
Fix the conditon for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
2017-03-08 09:45:58 -08:00
Alex Converse
15dac923b9 Merge "Narrow cat6_high_cost tables to uint16_t" 2017-03-03 23:45:39 +00:00
Alex Converse
bcd12de6c3 Narrow cat6_high_cost tables to uint16_t
Saves 2688 bytes of rodata.

Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
2017-03-03 23:09:12 +00:00
Vignesh Venkatasubramanian
9e7140b451 Merge "vp9,realtime: Enable row multithreading for non-rd" 2017-03-03 19:05:52 +00:00
Marco Paniconi
1bb63bf669 Merge "vp9: Speed 8: reduce the adaptive_rd_thresh level." 2017-03-02 22:25:03 +00:00
Marco
b60617f5ff vp9: Speed 8: reduce the adaptive_rd_thresh level.
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decreaee in speed (~1-2% on mac).

Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
2017-03-02 13:34:10 -08:00