This could save some cycles since skin detection is used in multiple
places in vp9.
1~2% speed up on ARM.
Change-Id: I86b731945f85215bbb0976021cd0f2040ff2687c
Use it to limit NEWMV early exit in nonrd pickmode
Small change in RTC metrics, has some improvement
for high motion clips.
Change-Id: I1d89fd955e1b3486d5fb07f4472eeeecd553f67f
When temporal layers are used, only allow for copy partition
on the top temporal enhancement layer frames.
Change-Id: I5472abdc0f9f6c8dafa75a7a84c615e08ae22af8
Only affects speed 8.
Make changes to copy partition to fix a bug in setting microblock
offset. Avg PSNR shows 0.02% gain on rtc_derf and 0.08% loss on rtc.
Change-Id: I61c3e5914dde645331344388e7437e5638acd4f3
Increase the partition and acskip thresholds for temporal
enhancement layers.
~1-2% speedup, with negligible loss in quality.
Change-Id: Id527398a05855298ad9ddac10ada972482415627
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.
1~2% speedup on ARM for speed 8.
Borg tests show neutral results in both rtc and rtc_derf.
Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't cause
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).
Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
Set adaptive_rd_thresh to 2 when simple block yrd is not used.
Fix regression caused by computing y sad without
int_pro_motion_estimation on low res motion clips.
Overall 0.07% quality loss on rtc_derf.
Change only affects low res on speed 8.
Change-Id: Ic6a188a56529f1034d6431005fb4b0e24e8a7e27
For speed 5, 1 pass CBR: Don't use the nonrd_pick_partition
on the segment, rather use choose_partitioning followed by
nonrd_select_partition (as is done on base segment).
Little/no quality loss on RTC and RTC_derf (< 0.3%),
speedup of at least 5%.
Change-Id: I5273d5f950e60adf5e437b4ca8c4f63964641e83
Temporal denoiser runs in non-rd pickmode, so it is only used
for speed >= 5. Regression exists for speed 5, due to use of
reference_partition (which use non-rd pickmode for partitioning).
Avoid denoising for now at speed 5.
Change-Id: I74a74d2e1404d7cfd33dcf4ec06dd2e503256cf0
The row mt sync read uses sync_range = 1, and wouldn't work if we want
to use a sync_range that is greater than 1. To make it work, this sync
read code is modified. Pass in col instead of col - 1 to make it
consistent with other row mt code in VP9, and then add 1 in "while"
codition.
Change-Id: I4a0e487190ac5d47b8216368da12d80fec779c1a
For non-rd variance partition, avoid the chrome check
unless y_sad is below some threshold.
Small decrease in avgPSNR (~0.3) on RTC set.
Small/negligible decrease on RTC_derf.
Change-Id: I7af44235af514058ccf9a4f10bb737da9d720866
Refactor to split the 1 passs source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.
Added unittest for SVC with screen-content on.
Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
Since y_sad is not computed yet (on the early exit due to source_sad),
no need to check for setting color_sensitiviy.
Only affects speed >=8. No change in behavior.
Change-Id: I3a6f2d20fed38d8b8ec51b75bcacf9a21f2db916
Change it to row based array to avoid the slow down cause by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
Add additional condition to split to 16x16, for resolutions <= 360p,
reduces dragging artifact near moving boundary.
Small/no change on RTC metrics.
Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turnoff GF search in non-rd pickmode.
Only enabled for speed >= 8.
avgPNSR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
The sum of tx bloxk eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
After the sum of eobs are calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2.Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.
The patch was rebased to the top-of-tree.
Borg test BDRATE result:
4k set: PSNR: +0.144%; SSIM: +0.043%;
hdres set: PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;
Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.
Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
The 2 thresholds(i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch made this
feature to be frame size dependent consistently throughout the code.
Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
From commit:
https://chromium-review.googlesource.com/c/441393/
On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.
Small change in RTC metrics and speed.
Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.
Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.
Small change on metrics for rtc set: less than ~0.1 avgPNSR decrease
on RTC set, for both speed 7 and 8.
Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
Add factor to increase varianace partition and ac skip thresholds,
under certain conditions (noise level and sum_diff), to increase
denoiser speed.
Change-Id: I7671140ef3598bf5f114a72623d68792bcd7b77b
Threshold for partitioning only affects VGA and lower res.
0.07% quality regression is observed in borg tests on rtc_derf
and 0.2% regression on rtc.
5.6% speed up for low res and 6.8% for VGA on Nexus 6.
Change-Id: If85a2919b48c991de66059c90f32ed06980452be
Modified the encoding stage to have row level entry points with relevant
initializations and to access the token information at row level
Change-Id: Ife10e55a7c1a420ee906d711caf75002688d9e39