For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.
Small gain in metrics, ~0.18%, no change in speed.
Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e
For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turnoff the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altef_onepass is enabled.
Neutral/no change on metrics.
Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.
Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.
Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
Speed comparing with the one calling vpx_scaled_2d_neon()
~1.7 x in general
~2.8x for BILINEAR filter
BUG=webm:1419
Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
For real-time mode. Move the switch to fixed partition
for is_src_frame_alt_ref so all speeds may use it
if use_altref_onepass is set.
Improves metrics by ~2% for ytlive set at speed 4
(where use_altref_onepass is currently used).
Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee
Only has effect when sf->use_altref_onepass is enabled,
as in that case scene detection is skipped for non-show frame
and so high_source_sad does not get reset to 0.
No change in metrics or speed.
Change-Id: I421f066d239341449c18826089e1810b9fc5967f
Add stats for past ARF usage, and use it to disable
ARF usage based on some conditions.
Overall improvement on ytlive set, reduces the regression
on the problem clips for this feature.
Only affects when sf->use_altref_onepass is enabled
(currently off by default).
Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
This reverts commit 535b7b915a.
This is actually used in CBR to reset the rate control if high source sad is detected.
Original change's description:
> Remove the speed condition on scene detection in 1 pass code.
>
> Scene detection is used for VBR mode and for screen_content mode.
>
> It was also enabled for CBR mode via the speed condition,
> but currently the analysis in the scene detection is not used
> in CRB mode (similar computations are done locally at superblock level
> when the source_sad feature is enabled).
>
> For 1 pass code.
> No change in behavior. Small speed gain, ~0.5%.
>
> Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Ib4e6b02047f75632503e7b0fc870af97fa9291c3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Scene detection is used for VBR mode and for screen_content mode.
It was also enabled for CBR mode via the speed condition,
but currently the analysis in the scene detection is not used
in CRB mode (similar computations are done locally at superblock level
when the source_sad feature is enabled).
For 1 pass code.
No change in behavior. Small speed gain, ~0.5%.
Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
* changes:
Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
Remove the unnecessary upcasts of (int)cospi_{1...31}_64
Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
Add the condition frames_since_golden > 0 to the
early exit check for ARF usage in nonrd_pickmode.
This improves quality of first frame following ARF, where
frame_since_golden = 0.
Small/neutral gain in metrics for speed 6, neutral change in speed.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Change-Id: I82e73e6ff6fc849e5ca5448563cb8a0515fe0cdc
A new bug was introduced in a80bdfd "Change sinpi_{1,2,3,4}_9 from
tran_high_t to int16_t". Reverted the change in this file.
BUG=webm:1450
Failed test C/TransHT.AccuracyCheck/26.
Change-Id: Id001f57aad811803ef7d367d2b2bc008d8499991
Modify simple_block_yrd condition in nonrd_pickmode for SVC:
allow it to be used also on base temporal_layer, only when
spatial_layer > 1 and block size < 32x32.
Speed up of about ~2% for 3 layer SVC, with little/negligible
loss in quality.
Change-Id: I7734bdae51cf51f22b96f6b2b27da20ea1d84344
Fix the setting to frames_till_gf_update_due, and
adjust the limit value.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Neutral change to metrics and speed for ytlive.
Change-Id: I266d9a00b36221bc8602fa2746d4e8a8f7d4dfae
Only when USE_ALT_REF_ONE_PASS is enabled (off by default).
Force fixed partition to 64x64 when is_src_alt_ref_frame is true,
and don't force early exit for some modes in nonrd_pickmode
for ARF noshow frames.
Small gain ~0.2% on ytlive metrics for speed 6.
Neutral speed difference.
Change-Id: I27eb6622d0453c09a06ccdc3b16368762474d11d
In the new AUTO mode, restrict the minimum alt-ref interval and max column
tiles adaptively based on picture size, while not applying any rate control
constraints.
This mode aims to produce encodings that fit into levels corresponding to
the source picture size, with minimum compression quality lost. However, the
bitstream is not guaranteed to be level compatible, e.g., the average bitrate
may exceed level limit.
BUG=b/64451920
Change-Id: I02080b169cbbef4ab2e08c0df4697ce894aad83c
Also add column headings so that the output can still be parsed if the
set of headers changes later.
Change-Id: I4beaf266521e093db4acf5f715b18fdfb7e3d1cd
Scale 3x3 block instead of 16x16 block in each loop.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3. Optimization code
will be smaller and faster.
2. The maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: Ibb9242a629ddb03e1ff93b859bece738255e698c
The intra mode rd penalty was implemented as a rate penalty.
Code was added to scale the penalty according to block size but
this was not done correctly for the SB level or sub 8x8.
The code did a weird double scaling in regard to bit depth that
has been removed. Given that it is a rate penalty the bit depth
should not matter.
This bug fix improves average metrics on our standard test
sets by about 0.1%
Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
Move class VpxScaleBase to new file test/vpx_scale_test.h.
Add new file test/vp9_scale_test.cc with ScaleFrameTest.
BUG=webm:1419
Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
Neutral on rtc set for speed 8. Neutral on ytlive for speed 5.
Saves some computation cycles but no speed gain observed on Pixel.
Change-Id: I34c4642cd543aa89c5b9c4bff6b7113577c64c91
Rev d147771 fixed the test failure. So remove the resolution condition
for using source_sad in speed 6.
BUG=webm:1452
Change-Id: I1efba97e1ef5bd4de5f886299f6fcb907187abcd
Enable adapt_partition for vbr mode for speed 6.
This allows the usage of the pickmode-based partition
(used in speed 5), but only selectively for superblocks
with high source sad, otherwise the faster variance based
partition scheme is used.
For speed 6 on ytlive set: avgPSNR/SSIM metrics up by ~0.6%,
several clips up by ~1.5%. Small/negligible decrease in speed.
Change-Id: I12f3efef6b3e059391de330fdbe5a44c2587f1f8
For SVC at speed >= 7: only use the improved mv search
on base spatial layer, if top layer resolution is above 640x360.
~2.3% speedup
Small/negligible loss in avgPSNR metrics on rtc set.
Change-Id: Iaef75a57ebf1c248931bc1aa28d20b7fecac1851
For speeds < 7, increase threshold that controls the split
of 16x16->8x8 blocks, for resolutions 720p and higher.
Minor change for speed 5 (since it uses reference partition scheme
which only uses variance partition as first step).
For speed 6: ~0.5% increase in avgPSNR/SSIM metrics on ytlvie set.
No change in speed.
Change-Id: I5126580973201538d8ca26a9256b93c4d11d685b
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.
Both numbers would improve if the division for dqcoeff could be
simplified.
BUG=webm:1426
Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
This feature is used for the CBR RTC encoding mode
at speed >= 6. This change will exclude it for VBR mode.
For speed 6 live encoding (VBR):
avgPSNR/SSIM metrics on ytlive set up by ~1% (few clips up by 2/3%).
No change in speed.
Change-Id: I1a0dd94c334f7df309ab5a48d477d7e25355b798
This should probably be handled before vp9_regular_quantize_b_4x4 even
gets called.
Fixes an assert resulting from removing skip_block from the quantize
functions.
BUG=webm:1459
Change-Id: I7f52b53f959b4654b3d4517ebda31a678f4d0fde
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.
HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.
Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
Having a very low "lag_in_frames" value could cause the encoder to create
incorrect / corrupt ARF groups including displayed frames that update the
ARF buffer and false overlay frames that are coded at low rate but are not
actually overlays of a real ARF frame.
This is linked to a reported unit test "slow down" where the chosen parameters
(lag of 3 frames) gave rise to such "broken" ARF group(s).
See also BUG=webm:1454
Change-Id: If52d0236243ed5552537d1ea9ed3fed8c867232c
Some clips in nightly unit test exhibiting significant encoder slowdown which
appears to bisect to Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a.
The above change allowed for emergency iterations of the recode loop and
adjustment of the Q range if there is a large rate miss.
This patch disables the above adaptation for cases of cpu_speed >= 3 or more
specifically where cpi->sf.recode_loop >= ALLOW_RECODE_KFARFGF.
For speeds >= 3 the code does not currently run a dummy bit pack operation
inside the recode loop. Without this dummy pack operation there is no up to
date estimate of the current frame's size to use as a basis for assessing the
requirement for a recode. In practice it was using the previous frames size (or 0
for the first frame) which could cause odd behavior.
If we require the emergency rate correction added in Change-Id: I6923.. for
the higher speed settings it will be necessary to enable the dummy pack
which will in turn hurt encode speed.
BUG=webm:1454
Change-Id: I4fb3c6062ca9508325a6f31582f8e80f1a9b126f
Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.
To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising
For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'
Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
Change the denoiser frame buffer management for SVC to more generally
handle the layer patterns in SVC (where last is not always refreshed).
This change is only for SVC with denoising and is bitexact.
Change-Id: Ic2b146a924cdf6e7114609158afa3d4880fe3fae
Testing of 4k videos encoded with a fixed arbitrary chunking interval
uncovered a bug where by if a chunk ends 1 frame before a real scene cut,
the next chunk may be encoded with two consecutive key frames at the start
with the first being assigned 0 bits.
This fix insures that where there is a key frame group of length 1 it is
at least assigned 1 frames worth of bits not 0.
See also patch Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
which by virtue of allowing fast adaptation of Q made this bug more visible.
BUG=webm:1456
Change-Id: Ic9e016cb66d489b829412052273238975dc6f6ab
When adapt_partition_source_sad is enabled (currently only at
speed 6 for resoln <= 360p): use lower subsize (8x8 instead of 16x16)
for nonrd_select_partition on 32X32 blocks.
And force avoiding rectangular partition checks in
nonrd_pick_partition for speed >= 6.
Small increase ~0.5 in metrics for speed 6 on rtc_derf,
no change in speed.
Change-Id: Id751bc8f7573634571b2d6f5e29627cd5cebccae
Enable fast adaptation of Q when there is a large overshoot
for the #ifdef AGGRESSIVE_VBR test case.
AGGRESSIVE_VBR is not currently enabled by default.
Change-Id: I7240bb6589795964b6b0b66df4468e4f21504e0f
Originally, for the purpose of keeping a fast first pass, the first-pass
stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but
that difference is very small that doesn't cause a mismatch between the
final bitstreams. However, if the encoder changes, this minor difference
may cause a mismatch. Thus, this patch always forces the first pass to
be bit exact.
BUG=webm:1453
Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8
When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).
Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
This reverts commit c9266b8547.
Disable source_sad when resolution > 1080P. The test should
pass now.
BUG=webm:1452
Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082
Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.
Currently used only for speed 6 for resoln <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.
Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c
For 8-bit the subtrahend is small enough to fit into uint32_t.
For 10/12-bit apply:
63a37d16f Prevent negative variance
previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.
Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
0.007% regression on rtc and 0.004% gain on rtc_derf.
1 thread on QVGA,VGA and HD has ~0.2% speed regression while 2 threads has
~0.2% speed gain on Google Pixel.
Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc
To fix valgrind issueis with SVC tests.
SVC encoding uses prune_evenmore which is causing uinit value.
Will re-enable later when issue is resolved.
Change-Id: I257ff878cf78197ddd813db056582a4d5fe94f44