Refactor to split the 1 passs source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.
Added unittest for SVC with screen-content on.
Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
Since y_sad is not computed yet (on the early exit due to source_sad),
no need to check for setting color_sensitiviy.
Only affects speed >=8. No change in behavior.
Change-Id: I3a6f2d20fed38d8b8ec51b75bcacf9a21f2db916
Allow for simple_block_rd for VGA resoln, and reduce
adaptive_rd_thresh to 1.
On average no loss on RTC set, ~4% speedup on mac.
Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc
Change it to row based array to avoid the slow down cause by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
Add additional condition to split to 16x16, for resolutions <= 360p,
reduces dragging artifact near moving boundary.
Small/no change on RTC metrics.
Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turnoff GF search in non-rd pickmode.
Only enabled for speed >= 8.
avgPNSR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
The sum of tx bloxk eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
After the sum of eobs are calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2.Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
Don't denoise spatial layer frames whose base layer is a key frame.
Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes bad artifact.
Will re-enable when issue is resolved.
Change-Id: I87a6597812330500966458172acfce54af65f70f
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.
Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.
Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.
The patch was rebased to the top-of-tree.
Borg test BDRATE result:
4k set: PSNR: +0.144%; SSIM: +0.043%;
hdres set: PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;
Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.
Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
Fixes an issue when the LAST and golden is not used as a reference,
in which case its possible no encoding mode is set (since intra may be
skipped under certain codtions). Fix is to make sure intra is searched
if no inter mode is checked.
Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c
Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.
https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code
BUG=webm:1054
Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
Enable row-mt for SVC for real-time mode, speed >=5.
Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.
For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup.
Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
BUG=chromium:697956
Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
The 2 thresholds(i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch made this
feature to be frame size dependent consistently throughout the code.
Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
Fix the conditon for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.
No change in quality.
Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decreaee in speed (~1-2% on mac).
Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
This reverts commit d3db846cc5.
This change causes a large drop in psnr (4-5db) on low framerate
difficult content (tested at 360/480p)
BUG=b/35804225
Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0
From commit:
https://chromium-review.googlesource.com/c/441393/
On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.
Small change in RTC metrics and speed.
Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.
Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implemention was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.
BUG=webm:1371
Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.
Only affects 1 pass CBR.
Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.
Disabled for now.
Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.
Small change on metrics for rtc set: less than ~0.1 avgPNSR decrease
on RTC set, for both speed 7 and 8.
Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.
For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).
This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.
Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness
Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here his that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.
The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)
Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
vp9[_highbd]_quantize]_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.
scan is used for C code and iscan is used for SIMD implementations.
Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
* changes:
quantize_fp_32x32 highbd ssse3: enable existing function
quantize_fp highbd ssse3: use tran_low_t for coeff
quantize_fp highbd sse2: use tran_low_t for coeff
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
The weight segment needs to only be computed once per frame,
so remove it from the funciton vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.
Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db
vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.
Change-Id: If25933715783f31104a18a5092ea347b1221b5f5
This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.
The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.
The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.
Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.
SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.
Hence the threshold used for the test should also be scaled up.
Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
While the new-mt mode is enabled(namely, allowing to use row-based
multi-threading in encoder), several speed features that adaptively
adjust encoding parameters during encoding would cause mismatch
between single-thread encoded bitstream and multi-thread encoded
bitstream. This patch provides a set_control API to disable these
features, so that the bit match bitstream is obtained in the unit
test.
Change-Id: Ie9868bafdfe196296d1dd29e0dca517f6a9a4d60
broken since:
c3f095c8b Merge "Fix to avoid abrupt relaxation of max qindex in recode path"
5f21aba4b Fix to avoid abrupt relaxation of max qindex in recode path
the original change pre-dated the addition of .clang-format
Change-Id: If5e399d9a805bcad9147360b13b36fbc8c560a7c
VBR method that allows a wider Q range for the first normal frame
in each ARF group and then centers the min - max range for the rest of
the arf group on the chosen Q value for that first frame.
This allows for quite rapid adjustment of the active Q range even if the
initial estimate is poor.
In some cases where the ARF frames themselves are tending to
undershoot but the normal frames are overshooting this can still give
net undershoot. This can be corrected by allowing a larger Q delta for
arf frames but is usually is a sign that the allocation to the arfs was to
high.
Change-Id: Icec87758925d8f7aeb2dca29aac0ff9496237469
Temporary fix until optimization work for block_yrd is completed.
This essentially reverts back to the state before the change:
https://chromium-review.googlesource.com/c/433821/
Compression loss is about ~5-6% on RTC set.
Speed-up (from using this simple/model-based block_yrd) over the low
bitdepth builds (which uses more complex block_yrd) is ~5% on 720p.
Change-Id: Ie0af9eb0d111e5595f587870c44f08317403b8d8
Add factor to increase varianace partition and ac skip thresholds,
under certain conditions (noise level and sum_diff), to increase
denoiser speed.
Change-Id: I7671140ef3598bf5f114a72623d68792bcd7b77b
Threshold for partitioning only affects VGA and lower res.
0.07% quality regression is observed in borg tests on rtc_derf
and 0.2% regression on rtc.
5.6% speed up for low res and 6.8% for VGA on Nexus 6.
Change-Id: If85a2919b48c991de66059c90f32ed06980452be
Modified the encoding stage to have row level entry points with relevant
initializations and to access the token information at row level
Change-Id: Ife10e55a7c1a420ee906d711caf75002688d9e39
Clears up static clang analysis warning regarding a dead store. Only
declare 'c' when it will be used.
Change-Id: I1ac0fc7f94bc44da63938c63cd1efcd6b95e0eb3
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
In non-rd pickmode: Allow speed 7 to also use larger block size in
model_rd. Small change in behavior for speed 7.
Change-Id: I8c5523e424308e8f0bc71b3f6324dec42a464cc8
In non-rd pickmode: small change in behavior for speed 6 and 7.
Remove condition on HIGHBITDEPTH flag.
Change-Id: I360a13fcc313d72612fe9b918162ef4bb278cdea
Skip denoising for blocks < 16x16, and for block = 16x16
skip denoising for low noise levels and width > 480 for now.
Allow for some speed-up in denoiser.
Change-Id: Ib46cefe4741962d145fa08775defea3a9c928567
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
Increase the qp-delta, mainly for low resolutions,
excluding case of very low bitrates.
avgPSNR/SSSIM gain of ~3-5% on rtc_derf set.
Small change on rtc set.
Change-Id: Ice03d04bd0340404d1957666ef154fd64fed0606
Affecting only speed 8.
Speed tests on Nexus 6 show 4% faster for QVGA and 2.4% faster for VGA.
Little/negligible quality regression observed on both rtc and rtc_derf sets.
Change-Id: I337f301a2db49a568d18ba7623160f7678399ae1
(Yunqing)
This patch added the missing initialization in temporal filter.
Borg test BDRate results:
PSNR: -0.019%(lowres); -0.013%(hdres);
SSIM: -0.001%(lowres); -0.010%(hdres).
Other q values gave comparable but no better results.
Change-Id: I7ad0c18b39e6f558342688e2fe1e12fdb133ce9b
Only for speed >= 7, and affects skipping of intra modes.
Threshold is set low for now, needs to be tuned.
Small/no difference in metrics on rtc clips.
Change-Id: If9bdbd43f08d1f80407cdd2e9e5e96780dcd2424
For short_circuit set to level 1, skip newmv for 64x64 blocks if the
low temporal variance flag is set. Also modify threshold for 64x64 split
in variance partitioning.
Overall speed-up on noisy clips of 2-4%.
Only affect speed >= 7.
Change-Id: I384b3772007e84de6f8707e480d2ddf1fe1f907d
Avoid quality loss when copying partition of superblock with large motions.
Maximum consecutively copied frames can be set (currently 5).
Change-Id: I11c30575514f02194c0f001444cf4021609e5049
Also set the flag to 1 when exit early choosing 64x64 block
such that skipping new mv for golden works in these scenerios.
Change the size of prev_segment_id to the number of superblocks
to save memory.
Borg test shows quality regression of 0.012% on average PSNR
and 0.035% on SSIM.
Change-Id: I5014224c8617d439d35c66ece3fed9ae30b31d23
The fix relaxes the max qindex based on the data from previous loop of
coding if output frame size is greater than maximum frame size allowed
Change-Id: Iac1f63ec67559d68766e090a7cbb80b812b2560f
If enabled will compute source_sad for every superblock on every frame,
prior to encoding. Off by default, only on for speed=8 when
copy_partition is set.
Change-Id: Iab7903180a23dad369135e8234b7f896f20e1231
Avoid many visual artifacts. Compression quality is improved by more
than 1%. Encode speed is about 4% for QVGA and 6% for VGA faster on
android.
Change-Id: I4dd0a81429ddf7efdef1e80a191da5fb8de8e8af
For speed 8, it speeds up the encoding on android by 6% for QVGA and
7.4% for VGA with the new threshold. Overall PSNR is improved by 0.667
for rtc.
Change-Id: I4a644560b32c0b5b4e9f49ffb953d000413a3732
If enabled denoiser will only denoise the top spatial layer for now.
Added unittest for SVC with denoising.
Change-Id: Ifa373771c4ecfa208615eb163cc38f1c22c6664b
When aq=3 mode is on and the gf_cbr_boost is set: make sure golden frame
is always refreshed, and don't incorporate segement cost in qp setting
on the boosted golden frame.
Better performance on RTC set with gf_cbr_boost on,
for example with gf_cbr_boost=50, gains from ~0.5-3%.
Change-Id: Ie811f5e4d444ff3320bd6e2c1745b2c4c09a8460
Quality improved by 1.866 and 0.386 for two noisy clips (dark720p and
marcooffice720p), respectively.
Change-Id: Ib33a7672ae9ca53da156208f7cd13f07b5543e44
Avoid the qp-clamping on gf/alt frame if gf_cbr_boost_pct is set.
Change only affect CBR mode when gf_cbr_boost_pct is set.
Change-Id: I0655ed4f2b047c8ed1ed33a070c17960ad776704
Increase the boost threshold below which GOLDEN update will use same
rate correction factor as INTER_NORMAL.
Improves performance when gf_cbr_boost_pct is set (between 0 and 100)
in CBR mode.
Change-Id: I9f54cc18664786a100b13a416b7137ae03bd0cab
Set short_circuit_low_temp_var to 3 for speed 8 for all res.
No strong visual difference on all clips.
Change-Id: Ia6d9a314291ab1c14d5421bbdd769974083aeb2a
Constraints on encoder config:
-target_bandwidth is no larger than 80% of level bitrate limit
-target_bandwidth * (1 + max_over_shoot_pct) is no larger than
88% of level bitrate limit
-min_gf_interval is no smaller than level limit
-tile_columns is no larger than level limit
Constraints on rate control:
-current frame size plus previous three frames' size is no larger
than the CPB level limit
-current frame size is no larger than 50%/40%/20% of the CPB
level limit if it's a key/alt-ref/other frame.
Change-Id: I84d1a2d6d6e3c82bfd533b3309ce999cfaba2c8b
The source sad could be used to copy the partition without going into
choose_partitioning function to speed up vp9 encoding. Computing source
sad takes little time. Speed test on Android and Linux shows little
encoding time gain (less than 1.4%).
Turned off for now since partition copy is turned off.
Change-Id: I61c9d5b8f22329760cb29a4ee30a7f9c232ce8d3
vp9: Set short circuit to level 3 for VGA for speed 8. Also change the
threshold_32x32 to 5/8*thresholds[1] to improve quality regression
caused to VGA clips.
Change-Id: Ia1590e91e7cb22be78d5b85013387bb1be4272e3
For out-of-range cases, returned UINT_MAX instead of INT_MAX in the
sub-pixel mv search to be consistent with the "uint32_t" return type.
Change-Id: I8e206d771228c13d89bafbbe9f14722c8ecc6a7a
The function 'vp9_find_best_sub_pixel_tree_pruned_more' is modified
to return INT_MAX for handling invalid MV cases from UINT32_MAX.
yunqingwang:
patch 3: rebased on top of the tree.
patch 4: The return type of vp9_find_best_sub_pixel_tree* was changed
to uint32_t to fix ubsan warnings. Changing UINT_MAX back to INT_MAX
was not quite right. Patch 4 modified vp9_temporal_filter.c to accept
uint32_t.
(Note: Inconsistency exists in vp9_find_best_sub_pixel_tree*, which
will be fixed in a separate CL.)
Change-Id: Ib1a79dc2aa41ea6335c21669c76883cdbb7e0535
When source frame is altref, we only do zero-mv mode, so we can skip
the find_predictors(). No change in compression.
Small speed gain, ~1%.
Only affects 1 pass vbr with lookhead altref, for ytlive with
the macro flag USE_ALTREF_FOR_ONE_PASS on.
Change-Id: I9318c5da8521f017bf54919cd652438b3a6313d1
Remove superfluous test. Produces a small improvement in instruction scheduling.
Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b
with different compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3
Add feature to copy partition from the last frame.
The copy is only done under certain conditions that SAD is below threshold.
Feature is currently disabled, until threshold is tuned.
Feature will be initially used for Speed 8 (ARM).
Under extreme case of always copying partition for speed 8:
Encode time is reduced by 5.4% on rtc_derf and 7.8% on rtc.
Overall PSNR reduced by 2.1 on rtc_derf and 0.968 on rtc.
Change-Id: I1bcab515af3088e4d60675758f72613c2d3dc7a5
Simplify address arithmetic on token_costs to reduce the number of generated
instructions that are used for address arithmetic inside routine
vp9_optimize_b. It also helps improve instruction scheduling depending on
compiler and optimization level.
Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
Correctly set interp_filter to SWITCHABLE for INTRA mode.
Also reduce threshold on noise level for re-evaluating zeromv.
Change-Id: Id32c01e193209fb380aa07204f0be3babf29f70a
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.
Change-Id: Ic40d6025f3f398338cedd270d17c0ccd9a3daa84
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.
No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.
Change-Id: I1a69681e4a15c5244581a3dab4587fca08f02e0f
use_base_mv assumes 2x2 scaling, so fix is to shutoff
this feature unless spatial scale factors are 2.
Added svc unittest for 2 spatial layers with 5x5 scaling,
which generates the issue without this fix.
Also fix some settings in svc unittest:
let the speed setting vary (from 5 to 8), and enable static threshold.
BUG=webm:1344
Change-Id: Idfd0a6c633c21b49a0479601506302cfe974e30e
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.
Change-Id: I4c7c7bf9465fda838eb230814ef0c631c068c903
Use the segment weight factor based on the target (cr->percent_refresh)
if it less than the current estimate (avergae of past usage and target).
Small improvement at low bitrates.
Change-Id: Iba8fd909e203f94458901366d3a991f7ea854d49
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
better spatial quality.
Only affects real-time mode with aq-mode=3, at very low bitrates.
Change-Id: I5ccb784667f70d0c27d369806b93b1f93d5605d1
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed = 6 and 7, where
short_circuit_low_temp_var = 1.
Speed up of ~2-3% for speed 7, with little/no loss in compression.
Change-Id: I263a0f261ad9929034392d68f0153dc6376fdb5f
Add a new, more aggresive short circuit: short_circuit_low_temp_var = 3 to skip
golden of any mode when variance is lower than threshold for low res.
This change only affects speed = 8, low resolution.
Metrics for avgPSNR/SSIM on rtc_derf (low resolution) show loss of
0.27/0.31%.
On Nexus 6, the encoding time is reduced by ~2.3% on average across all
low-res clips.
Visually little change on rtc_derf clips.
Change-Id: Ia8f7366fc2d49181a96733a380b4dbd7390246ec
Changes only affects speed = 8 for low resolutions.
Metrics for avgPSNR/SSIM on rtc_derf (low resolutions) show loss of
0.5/0.6%.
On Nexus 6, the encoding time is reduced by ~5.9% on average across all
low-res clips.
Visually little/no change on rtc_derf clips.
Change-Id: I68dd50e558d72dcc1af8317d224bfae5e3bd872d
This commit enables asymptotic closed-loop encoding decision for
the key frame and alternate reference frame. It follows the regular
rate control scheme, but leaves out additional iteration on the
updated frame level probability model. It is enabled for speed 0.
The compression performance is improved:
lowres 0.2%
midres 0.35%
hdres 0.4%
Change-Id: I905ffa057c9a1ef2e90ef87c9723a6cf7dbe67cb
For noisy content, be more aggressive in skippping some blocks for
delta-qp to reduce noise pulsing artifact. Also treat frame boundary
case when dimension is not multiple of superblock size/64.
Only affects non-screen content case, and when source noise
is measured to be high (at least level kMedium).
Change-Id: Ib13a2a20ed1ce37ff3c44d95c3ef2635fd695222
Add condition that usable_ref_frame > LAST.
This is to avoid potentially skipping all last-nonzero mv modes,
if golden is used as a reference but skipped completely for the
current block.
This has no effect currenty, as we always consider testing golden
mode for each block.
Change-Id: I3182cf44664081935a90ed43aa7b32e710e60e22
Fixed formatting bug introduced by the fix to BUG=webm:1322
( Iedc4477aef1746aa0a4f84d88a1156296fd3ba87)
Change-Id: I715ee446c0e8584967ab87ba4e355759dd394187
This change is a step in a larger change to the way boost and interval are
determined for ARF and Key frames.
This patch contains some pluming for the general case but focuses on the
key frame boost calculation. This now relies more heavily on the rate at
which the error score increases between the primary and secondary reference
frame. This seems to be less fragile when dealing with different frame sizes.
For example larger image formats tend in the first pass to see a higher
% of intra coded blocks and the use of this number in calculating the frame
decay factor was leading to much lower boost numbers for 4K, for example,
than the same clip coded at 2K.
This change does give overall gains but they are MUCH larger for the 4K Netflix
set. For the 4K Netflix set the average gain is around 3% with some clips > 20%
whereas for the same set at 2K the average gain is 0.5-1%.
In general for small image formats the boost is most often reduced a little whereas
4K clips the boost is increased. There are some -ve cases such as Akiyo at 352x288
where the reduced boost hurts the metrics, especially for SSIM, even while
the set as a whole improves. This is most notable at very low Q and may be the
subject of a future patch.
Some common code for KF and ARF was separated in this patch for the purposes of
tuning but may later be re-merged if appropriate.
Change-Id: Iaa15ac5a58d2be89181100d95cef6a8dc4b12d0d
Fixes a case where recode is not triggered based on the value
of maxq passed into the recode loop test function.
BUG=b/32375284
Change-Id: I15ad985d0525c68e0443cfaf842440d2754b2266
Removed a couple of adjustments that no longer move the needle
much but complicate the process of tuning.
Change-Id: Ie320f5cf155e6aac14a4757ea9ada2cd59f27590
This patch modified the motion search counts used in:
https://chromium-review.googlesource.com/#/c/305640/
These 2 counts were originally added as thread data, and used to
make decisions in motion search. The tile encoding order can be
inconsistent while using different number of threads, which can
cause bitstream mismatch. Here moved them to tile data to solve
the issue.
BUG=webm:1322
Change-Id: Iedc4477aef1746aa0a4f84d88a1156296fd3ba87
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: I8a80da7c5089b837d0df79a5c49d5e3022dfc8ec
In variance partition low resolutions may use varianace based on
4x4 average for better partitioning.
Increase the threshold for doing this at speed = 8.
Improves speed by ~5%, with little loss, < 1%, on RTC_derf set.
Change-Id: Ib5ec420832ccff887a06cb5e1d2c73199b093941
This reverts commit 9e8efa5b18.
this change causes ubsan warnings, failures in
vpxenc_vp9_webm_rt_multithread_tiled
BUG=webm:1309
Change-Id: I020c7be985c771bfff4b3de1afe51cc8edb980da
Add stronger condition for splitting 64x64, for low noise content.
This reduces dragging artifact near moving head.
Little/no change in metrics on RTC set.
Change-Id: I39b38cfd20f2ece53ff49c2aaf76ba9f82761be1
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: Ia2c982da56697756e12f02643f589189b3271d98
For 1 pass vbr real-time mode:
Allow for the usage of alt-ref frame when non-zero lag-in-frames is used.
Use non-filtered alt-ref, and select usage based on fast scene/content
analysis/detection within the lag of frames.
Positive gains on ytlive set: overall avgPSNR ~3-4%.
Several clips are up between 5-14%, a few clips are neutral/small change.
Current speed decrease is about ~5-10%.
Use the flag USE_ALTREF_FOR_ONE_PASS to enable this feature
(off by default for now).
Change-Id: I802d2bf3d44f9cf01f6d15c76be9c90192314769
Put limit on gf interval based on lag, and allow
for the adjustment on next gf group also on key frame.
Small/neutral change on ytlive metrics.
Change only affects 1 pass vbr real-time mode.
Change-Id: I339c8f4398848698b6e10fe9482c52ca661b94a5
These caused the following warning with GCC 5:
warning: logical not is only applied to the left hand side of
comparison [-Wlogical-not-parentheses]
assert(!is_compound == (cm->reference_mode == SINGLE_REFERENCE));
Change-Id: If296aabb2311ceb7d903b395c1549ef81c2cbf9b
(cherry picked from commit c6cf7a6111)
This patch sets the 16x16 src_diff to zero and ensures correct calculation
of this_error for block sizes smaller than 16x16.
Change-Id: I7b7c02d267433c9f22c8ac9b8d5df2f499175172
change_config() may be called often in real-time application,
to update bitrate/framerate or qp-max/min.
No need to do update_frame_size() unless frame size has changed.
Change-Id: I23a51deade1e03adc91c468f9ffde3235298770c
For real-time mode at speed 8: turn off MINIMAL_LF at speed 8,
for non-screen content mode.
Visually better, avgPSNR/SSIM on rtc set go up by ~4-5%.
Speed decrease of about ~3%.
Change-Id: I8eb69330f02e0ceece1507d43cfc8a049a1d8291
Reduce the filt_guess for 1 pass cbr on inter-frames.
This reduces visual artifact seen in rtc clip (jimred.vga),
and improves metrics on rtc set.
Metrics on rtc set for cbr mode overall positive, most clips are up:
Speed 7 rtc: avgPSNR/SSIM up by: ~2.6/3.9%
Speed 8 rtc: avgPSNR/SSIM up by: ~1.3/2.5%
Change-Id: Ia4eccea1c19d65b583516df28823cd756c49464f
Added a cap on the maximum boost for an arf based on interval length.
Fixed bug where by the image size was not accounted for in determining
two of the motion breakout thresholds.
Overall small gains of 0.2-0.4% psnr but on large image format clips with
slow zooms the gain may be as much as 20% or more (e.g. in_to_tree
at 1080P)
Change-Id: Id0a47391203026742daa9c97afac5705fd8c4dfb
The vp9_mv_class0_tree is a balanced tree with two leafs and can
simply be coded as a boolean with probability class0[0].
Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
(cherry picked from commit be8a8ab62ebdd111c6f2e9a33b15630570671eba)
Remove the experiment LIMIT_QP_ONEPASS_VBR_LAG, as its
not currently used and no plan to use in near future.
Change-Id: Ib069f8d7225195be04b765d0ab477510dfba6a3b
Added casts to remove warnings:
BUG=webm:1274
In regards to the safety of these casts they are of two types:-
- Normalized bits per (16x16) MB stored in a 32 bit int (This is safe as bits
per MB even with << 9 normalization cant overflow 32 bits. Even raw 12
bits hdr source even would only be 29 bits :- (4+4+12+9) and the encoder
imposes much stricter limits than this on max bit rate.
- Cast as part of variance calculations. There is an internal cast up to 64 bit
for the Sum X Sum calculation, but after normalization dividing by the number
of points the result will always be <= the SSE value.
Change-Id: I4e700236ed83d6b2b1955e92e84c3b1978b9eaa0
Using a tighter resize constraint on undershoot seems to help
results (especially SSIM) as significant undershoot on a frame
seems to have more of a damaging impact than overshoot.
This patch has been tuned so that in local testing using the
derf set it is encode speed neutral for speed setting 2.
Average quality result for speed 2 (psnr,ssim) were as follows:-
lowres 0.039, 0.453
midres 0.249, 0.853
hdres 0.159, 0.659
NetFlix -0.241, 0.360
Change-Id: Ie8d3a0d7d6f7ea89d9965d1821be17f8bda85062
This patch is to address concerns that changes to allow
recodes on the first frame in each ARF group do not give a
good enough speed quality trade off for speed 2. Though the
average impact on encode speed is 1-2%, for some hard clips
it is > 5% rise. For speed 1 this is less an issue and for Speed 0
the previous patch actually improves speed.
Change-Id: Ie1bcefdbfdf846d3f4428590173f621465dffe3a
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.
The corresponding fix for vp10 is in
Ica3d625d6672b3c47e0e208b45eede29b9004030.
Change-Id: Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future
apply rate-control chosen quantizer while for bad predictors apply
worse one.
Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.
This patch selectively sets these features at other speed settings
based on block complexity.
For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set is > 2.5% with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.
The gains are biggest at low Q so some further optimization
may be possible.
Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
In the future this option will activate adaptive quantization special
for altref frames. Encoder will create the adaptive quantization map
on the basis of lookahead buffers similarity which is the estimate of
the future motion compensation performance.
Change-Id: Ia0088b3babb0f9a4899c79d8d819947ba5a03df2
Disabled the split mode while encoding 4k video to speed
up the encoder.
Borg test result on 4k set:
Overall PSNR: +0.029%; SSIM: +0.009%.
Average encoder speedup at speed 2 is 2.5%.
Change-Id: I1519c658f07c3ac838affbe5aff0ed9b94f3f8f4
Adjusted speed 2 features to speed up 4k video encoding.
BDBR results from borg test:
PSNR: +0.313%; SSIM: +0.268%.
Average speedup: 8.5%
Change-Id: I1e2695a01fb3f3817c1df4480e184c2aed8f2eba
Bias towards base_mv and skip 1/4 pixel motion search when using base mv.
2~3% speed up for 2 spatial layers, 3~5% speed up for 3 spatial layers.
PSNR loss:
(2 layers) 0.07dB for gips_stationary, 0.04dB for gips_motion;
(3 layers) 0.07dB for gips_stationary, 0.06dB for gips_motion.
Change-Id: I773acbda080c301cabe8cd259f842bcc5b8bc999
Add option, for newmv-last, to limit the rd-threshold update for early exit,
under a source varianace condition.
This can improve visual quality in low texture moving areas,
like forehead/faces.
Also add bias against golden to improve the speed/fps,
will little/negligible loss in quality.
Only affects CBR mode, non-svc, non-screen-content.
Change-Id: I3a5229eee860c71499a6fd464c450b167b07534d
Changes the default recode rule for Speed 0 and best quality
from ALLOW_RECODE to ALLOW_RECODE_KFARFGF.
Tested on the NF, hdres, midres and lowres test sets, this setting
when combined with patch I40cb559... now performs "as well" in
metrics terms (in fact it came out a tiny amount better overall)
but encode time is 9.6% faster (measured as the average
from 27 mid rate local encodes on clips in the derf/lowres set.
Change-Id: I8c781c0cdfa3a9929cd9406d15582fce47d6ae3b
Allow recodes for the first inter frame in each arf group
even when the recode rule is set to ALLOW_RECODE_KFARFGF.
Small gains of 0.05%.
Change-Id: I40cb559d36a2bf0ebf5cf758c3f92e452b480577
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.
For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.
Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
This patch fixed a motion vector(MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and only was seen
while encoding a 4k clip for 200 frames.
BUG=webm:1271
Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5