BUG=webm:1397
(yunqingwang)
To verify that this patch wouldn't cause much performance change,
the Borg tests were run. Here was the result:
         avg_psnr  overall_psnr  ssim
hdres:   -0.002    0.006         0.013
midres:  0         0             0
lowres:  0         0             0
Change-Id: Iae395ae7b741e0513cf5bab9dcace110b792a67d
The row mt sync read uses sync_range = 1, and wouldn't work if we want
to use a sync_range that is greater than 1. To make it work, this sync
read code is modified. Pass in col instead of col - 1 to make it
consistent with other row mt code in VP9, and then add 1 in the "while"
condition.
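A minimal sketch of the wait condition described above, written as a pure
predicate so it can be checked without threads. The helper name
row_mt_must_wait and its parameters are hypothetical, and the exact form of
the comparison is an assumption based on this description (col passed in
directly, with the +1 folded into the loop condition), not a copy of the
libvpx code. It assumes sync_range is a power of two and that cur_col_above
is the last column the row above has finished.

```c
#include <assert.h>

/* Hypothetical helper, not the libvpx function.
 * Returns 1 if the thread working on row r must keep waiting before it
 * starts column c, and 0 if it may proceed. */
static int row_mt_must_wait(int r, int c, int cur_col_above, int sync_range) {
  /* Assumption: sync_range is a positive power of two. */
  assert(sync_range > 0 && (sync_range & (sync_range - 1)) == 0);

  if (r == 0) return 0;               /* top row has no row above it */
  if (c & (sync_range - 1)) return 0; /* only check every sync_range columns */

  /* Proceed only once the row above has finished column
   * c + sync_range - 1, so the next sync_range columns need no re-check.
   * With sync_range == 1 this reduces to "wait while c > cur_col_above". */
  return c > cur_col_above - sync_range + 1;
}
```

In a real encoder this predicate would sit inside a mutex-protected
"while (...) pthread_cond_wait(...)" loop; the sketch only isolates the
comparison.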
Change-Id: I4a0e487190ac5d47b8216368da12d80fec779c1a
Issue/bug happens for denoising with spatial layers, where
the golden (spatial) reference is used in pickmode, but
denoising is only done with respect to last (temporal).
Fix is to make sure set_ref_ptrs is set before build predictors
in denoiser.
Change-Id: I793cf441341edf7c4a88b8ab1e1b22b3cb0eb508
Temporary override to condition for disallowing intra-search in SVC,
since golden (spatial) reference is currently suppressed due to
artifact issue.
Change-Id: I28ed7fdddc9fcdbcc0a4175a247a3ecc94c11767
For non-rd variance partition, avoid the chroma check
unless y_sad is below some threshold.
Small decrease in avgPSNR (~0.3) on RTC set.
Small/negligible decrease on RTC_derf.
Change-Id: I7af44235af514058ccf9a4f10bb737da9d720866
Refactor to split the 1 pass source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.
Added unittest for SVC with screen-content on.
Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
Since y_sad is not computed yet (on the early exit due to source_sad),
no need to check for setting color_sensitivity.
Only affects speed >=8. No change in behavior.
Change-Id: I3a6f2d20fed38d8b8ec51b75bcacf9a21f2db916
Allow for simple_block_rd for VGA resolution, and reduce
adaptive_rd_thresh to 1.
On average no loss on RTC set, ~4% speedup on mac.
Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc
Change it to row based array to avoid the slowdown caused by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
Add additional condition to split to 16x16, for resolutions <= 360p,
which reduces dragging artifact near moving boundary.
Small/no change on RTC metrics.
Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turn off GF search in non-rd pickmode.
Only enabled for speed >= 8.
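A minimal sketch of the bookkeeping described above. All names, the SAD
threshold, and the decision rule itself are assumptions for illustration;
this message does not state them. The rule used here skips the golden
search when the superblock has been unchanged since before the golden frame
was set, on the reasoning that LAST and golden then show the same content.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold: source SAD at or above this counts as a
 * significant content change for the superblock. */
#define SAD_CHANGE_THRESH 10000u

/* Update the per-superblock count of frames since the last significant
 * content change. The counter saturates at 255. */
static void update_frames_since_change(uint8_t *counter, uint32_t source_sad) {
  if (source_sad >= SAD_CHANGE_THRESH) {
    *counter = 0; /* content changed on this frame */
  } else if (*counter < 255) {
    ++*counter;   /* one more frame without a significant change */
  }
}

/* Assumed rule: skip the golden reference search when the last content
 * change is older than the golden frame. */
static int skip_golden_search(uint8_t frames_since_change,
                              int frames_since_golden) {
  return frames_since_change > frames_since_golden;
}
```

One counter per superblock would be updated once per frame, and the
predicate consulted in pickmode before the golden reference is evaluated.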
avgPSNR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
The sum of tx block eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
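A minimal sketch of that flow, assuming simplified stand-in types. The names
TxCandidate, PartitionCtx and pick_best_tx are hypothetical and do not
appear in the encoder; only the idea (sum the eobs per tx size, keep the sum
that belongs to the winning tx size) is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of one transform-size candidate. */
typedef struct {
  int64_t rd_cost;      /* RD cost of coding the block with this tx size */
  const uint16_t *eobs; /* end-of-block value for each tx block */
  int num_tx_blocks;    /* number of entries in eobs */
} TxCandidate;

/* Hypothetical stand-in for the context that stores the result. */
typedef struct {
  int sum_eobs;
} PartitionCtx;

/* Accumulate the eobs of all tx blocks of one candidate. */
static int sum_candidate_eobs(const TxCandidate *cand) {
  int sum = 0;
  for (int i = 0; i < cand->num_tx_blocks; ++i) sum += cand->eobs[i];
  return sum;
}

/* Search all candidates, remember the eob sum of the best one, and copy
 * it to ctx. Returns the index of the best candidate. */
static int pick_best_tx(const TxCandidate *cands, int n, PartitionCtx *ctx) {
  assert(n > 0);
  int best = 0;
  int best_sum = sum_candidate_eobs(&cands[0]);
  for (int i = 1; i < n; ++i) {
    const int sum = sum_candidate_eobs(&cands[i]);
    if (cands[i].rd_cost < cands[best].rd_cost) {
      best = i;
      best_sum = sum; /* keep the sum that matches the best tx size */
    }
  }
  ctx->sum_eobs = best_sum;
  return best;
}
```

The point of tracking the sum per candidate is that the value stored in ctx
always corresponds to the tx size that was actually chosen, not to whichever
size was searched last.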
After the sum of eobs is calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2. Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured an 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
Don't denoise spatial layer frames whose base layer is a key frame.
Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes bad artifact.
Will re-enable when issue is resolved.
Change-Id: I87a6597812330500966458172acfce54af65f70f
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.
Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.
Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.
The patch was rebased to the top-of-tree.
Borg test BDRATE result:
4k set: PSNR: +0.144%; SSIM: +0.043%;
hdres set: PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;
Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.
Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
Fixes an issue when the LAST and golden are not used as a reference,
in which case it's possible no encoding mode is set (since intra may be
skipped under certain conditions). Fix is to make sure intra is searched
if no inter mode is checked.
Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c
Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
Shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0], which leads to a
negative value, resulting in a crash. Restrict this to Visual Studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.
https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code
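A minimal sketch of how such a compiler-restricted layout change can look.
The struct below is a heavily reduced, hypothetical stand-in (the other
members and the BSSE_AT_FRONT macro are invented for illustration, and the
array size is an assumption). The version check relies on _MSC_VER being
1900 for Visual Studio 2015.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MB_PLANE 3 /* assumed value for this sketch */

/* Visual Studio 2015 reports _MSC_VER == 1900; older versions get the
 * workaround layout, everything else keeps the original layout. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define BSSE_AT_FRONT 1
#else
#define BSSE_AT_FRONT 0
#endif

/* Hypothetical reduced stand-in for the encoder's macroblock struct. */
struct macroblock_sketch {
#if BSSE_AT_FRONT
  int64_t bsse[MAX_MB_PLANE << 2]; /* moved to the front for old MSVC */
#endif
  int skip;
  int rdmult;
#if !BSSE_AT_FRONT
  int64_t bsse[MAX_MB_PLANE << 2]; /* original position elsewhere */
#endif
};

/* Report where bsse ended up, with a basic sanity check on the layout. */
static size_t bsse_offset(void) {
  const size_t off = offsetof(struct macroblock_sketch, bsse);
  assert(off + sizeof(((struct macroblock_sketch *)0)->bsse) <=
         sizeof(struct macroblock_sketch));
  return off;
}
```

Guarding both positions with the same macro keeps exactly one definition of
the member in every build, so code that accesses bsse is unaffected.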
BUG=webm:1054
Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
Enable row-mt for SVC for real-time mode, speed >=5.
Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.
For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about 5% speedup.
Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1