This patch is to address concerns that changes to allow
recodes on the first frame in each ARF group do not give a
good enough speed quality trade off for speed 2. Though the
average impact on encode speed is 1-2%, for some hard clips
it is > 5% rise. For speed 1 this is less an issue and for Speed 0
the previous patch actually improves speed.
Change-Id: Ie1bcefdbfdf846d3f4428590173f621465dffe3a
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.
The corresponding fix for vp10 is in
Ica3d625d6672b3c47e0e208b45eede29b9004030.
Change-Id: Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future
apply rate-control chosen quantizer while for bad predictors apply
worse one.
Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.
This patch selectively sets these features at other speed settings
based on block complexity.
For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set is > 2.5% with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.
The gains are biggest at low Q so some further optimization
may be possible.
Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
In the future this option will activate adaptive quantization special
for altref frames. Encoder will create the adaptive quantization map
on the basis of lookahead buffers similarity which is the estimate of
the future motion compensation performance.
Change-Id: Ia0088b3babb0f9a4899c79d8d819947ba5a03df2
Disabled the split mode while encoding 4k video to speed
up the encoder.
Borg test result on 4k set:
Overall PSNR: +0.029%; SSIM: +0.009%.
Average encoder speedup at speed 2 is 2.5%.
Change-Id: I1519c658f07c3ac838affbe5aff0ed9b94f3f8f4
Adjusted speed 2 features to speed up 4k video encoding.
BDBR results from borg test:
PSNR: +0.313%; SSIM: +0.268%.
Average speedup: 8.5%
Change-Id: I1e2695a01fb3f3817c1df4480e184c2aed8f2eba
this fixes a crash in vp9_dec_setup_mi() via
vp9_init_context_buffers() should decoding continue and the decoder
resyncs on a smaller frame
BUG=b/30593752
Change-Id: I9ce8d94abe89bcd058697e8bd8599690e61bd380
Bias towards base_mv and skip 1/4 pixel motion search when using base mv.
2~3% speed up for 2 spatial layers, 3~5% speed up for 3 spatial layers.
PSNR loss:
(2 layers) 0.07dB for gips_stationary, 0.04dB for gips_motion;
(3 layers) 0.07dB for gips_stationary, 0.06dB for gips_motion.
Change-Id: I773acbda080c301cabe8cd259f842bcc5b8bc999
Add option, for newmv-last, to limit the rd-threshold update for early exit,
under a source varianace condition.
This can improve visual quality in low texture moving areas,
like forehead/faces.
Also add bias against golden to improve the speed/fps,
will little/negligible loss in quality.
Only affects CBR mode, non-svc, non-screen-content.
Change-Id: I3a5229eee860c71499a6fd464c450b167b07534d
Changes the default recode rule for Speed 0 and best quality
from ALLOW_RECODE to ALLOW_RECODE_KFARFGF.
Tested on the NF, hdres, midres and lowres test sets, this setting
when combined with patch I40cb559... now performs "as well" in
metrics terms (in fact it came out a tiny amount better overall)
but encode time is 9.6% faster (measured as the average
from 27 mid rate local encodes on clips in the derf/lowres set.
Change-Id: I8c781c0cdfa3a9929cd9406d15582fce47d6ae3b
Allow recodes for the first inter frame in each arf group
even when the recode rule is set to ALLOW_RECODE_KFARFGF.
Small gains of 0.05%.
Change-Id: I40cb559d36a2bf0ebf5cf758c3f92e452b480577
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.
For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.
Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
This patch fixed a motion vector(MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and only was seen
while encoding a 4k clip for 200 frames.
BUG=webm:1271
Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5
* changes:
Use common transpose for vpx_idct32x32_1024_add_neon
Use common transpose for vpx_idct8x8_[12|64]_add_neon
Use common transpose for vp9_iht8x8_add_neon
Use common transpose for vpx_idct16x16_[10|256]_add_neon
Increase the minimum distance.
Reduces the overshoot somewhat on some clips,
small gain in avgPSNR (~0.1%) on ytlive set.
Change-Id: Id5ddde20c2907dbdb536e79542eff775019c142b
Move best index into the token state. Shrink it down to one byte. This
is more cache friendly (access are group together) and uses less total
memory.
Results in 4% fewer cycles in optimize_b().
Change-Id: I75db484fb3dc82f59928d54b659d79c80ee40452
Shutdown all threads before reclaiming any memory. The frame-level
parallel decoder may access data from another worker.
BUG=webm:1259
Change-Id: I26856ebd1f77cc4a4545331baa19bbf3e01c4ea4
This commit changes the call in vp9 encoder from vp9_deblock() to
vp9_post_proc_frame() to ensure the data structures used in the call
are properly allocated. This fixes an encoder crash when configured
with --enable-internal-stats.
Change-Id: I2393b336c0f566665336df4f1ba91c405eb56764
Allow usage of lookahead for VBR in real-time mode, for 1 pass vbr.
Current usage is for fast checking of future scene cuts/changes,
and adjusting rate control (gf interval and active_worst/target size).
Added unittests (datarate) for 1 pass vbr mode, with non-zero lag.
Added an experimental option to limit QP based on lookahead.
Overall positive gain in metrics on ytlive set:
avgPNSR/SSIM up on average ~1-3%; several clips up by 5, 7%.
Change-Id: I960d57dfc89de121c4824b9a9bf88d2814e74b56
This change eliminates redundant computation in the two stage
downscaling, which saves ~1% encoding time in 3-layer svc encoding.
Change-Id: Ib4b218811b68499a740af1f9b7b5a5445e28d671
Modify the gfu_boost and af_ratio setting based on the
average frame motion level.
Change only affects 1 pass vbr.
Metrics overall positive on ytlive set.
On average up by ~1%, several clips up by 2-4%.
Change-Id: Ic18c49eb2df74cb4986b63cdb11be36d86ab5e8d
Use the precise context to estimate the zero token cost in trellis
optimization process. This improves the speed 0 coding performance
by 0.15% for lowres and 0.1% for midres. It improves the speed 1
coding performance by 0.2% for midres and hdres.
Change-Id: I59c7c08702fc79dc4f8534b64ca594da909e2c91
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by
lowres 0.79%
midres 1.07%
hdres 1.44%
Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
Move the operations that update the context buffers outside this
function. The coeff_cost() takes all input as const value and returns
the coefficient cost.
This makes preparation for the next coefficient optimization CLs.
Change-Id: I850eec6e5470b91ea84646ff26b9231b09f70a0c
Replace the existing mv bias with a bias only for
NEWMV, and based on the motion vector difference of
its top/left neighbors.
For cbr non-screen-content mode.
Change-Id: I8a8cf56347cfa23e9ffd8ead69eec8746c8f9e09
Use a measure of noise energy to adjust Q estimate and
arf filter strength.
Gains 0.3-0.5% on Lowres and |Netflix sets.
Hdres and Midres neutral.
Change-Id: Ic0de552e7b6763e70eeeaa3651619831b423e151
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
Safer to have the decoder operate normally and have
better-hw-compatibility only implement encoding changes.
Fixes some test failures.
Change-Id: I0dd70d002e4e893992f0cd59774b9363e6f7fe76
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the sb is
a skin sb to avoid visual regression on the slowly moving face.
Refer to the cl: https://chromium-review.googlesource.com/#/c/356020/
Change-Id: I42c36666b2b474ce5ee274239d52ae8ab400fd46
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.
Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
these are debug-only modules that can be added in manually when needed.
leave a reference in vp8_common.mk / vp9_common.mk for easy addition.
quiets -Wmissing-prototypes warning
BUG=b/29584271
Change-Id: Ifc8637d877edfbd562b34dc5c540428bba7951fc
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Change-Id: I4cf8f23b86d7ebd85ffd2630dcfbd799c0b88101
The scaling of the threshold for 10 and 12 bit here appears
to be in the wrong direction. For 10 and 12 bit we expect sse
values to be higher and hence the threshold used should be
scaled up not down.
Change-Id: I2678116652b539aef48100e0f22873edd4f5a786
This function seems to scale the threshold for testing an
SSE value in the wrong direction for 10 and 12 bit inputs.
Also for a true SSE the scalings should probably be << 4 and 8
Change-Id: Iba8047b3f70d04aa46d9688a824f3d49c1c58e90
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the mode is
newmv last, which is less aggressive in skipping transform and
quantization, to avoid quality regression in some conditions.
Change-Id: Ifa30be587f2a8a4a7f182a172de6ce277c0f8556
For forced key frames in particular this helps to make them
blend better with the surrounding frames where noise tends
to be suppressed by a combination of quantization and alt
ref filtering.
Currently disabled by default under and IFDEF flag pending
wider testing.
Change-Id: I971b5cc2b2a4b9e1f11fe06c67ef073f01b25056
Bug fix: The crash is caused by not allocating buffer for prev_mip in
postproc_state and prev_mip in postproc_state is only used for MFQE,
ohter postproc modules, deblocking and etc., should not use it.
BUG=webm:1251
Change-Id: I3120d2f50603b4a2d400e92d583960a513953a28
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
For real-time mode, increase variance threshold for 32x32 blocks in
var-based partitioning for resolution >= 720p, so that it is more
likely to stay at 32x32 for high resolution which accelerates the
encoding speed with little/no PSNR drop.
PSNR effect on different speed settings:
speed 8 rtc: 0.02 overall PSNR drop, 0.285% SSIM drop
speed 7 rtc: 0.196% overall PSNR increase, 0.066% SSIM increase
speed 5 rtc_derf: no effect.
Speed up:
gips_motion_WHD, 1mbps: 2.5% faster on speed 7, 2.6% faster on speed8
gips_stat_WHD, 1mbps: 4.6% faster on speed 7, 5.6% faster on speed8
Change-Id: Ie7c33c4d2dd7d09294917e031357fc5476c3a4bb
Avoids a segfault in high-bitdepth builds.
This restores the condition to its state prior to:
7991241 vp9: Change the scheme for modeling rd for bsize 32x32.
BUG=webm:1250
Change-Id: I6183d5b34cb89dfbf27b7bb589812148a72cd7de
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some condtions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed settings:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8
BUG=webm:1250
Change-Id: I818babce5b8549b4b1a7c3978df8591bffde7173
decoder_peek_si_internal could potentially read more bytes than
what actually exists in the input buffer. We check for the buffer
size to be at least 8, but we try to read up to 10 bytes in the
worst case. A well crafted file could thus cause a segfault.
Likely change that introduced this bug was:
https://chromium-review.googlesource.com/#/c/70439 (git hash:
7c43fb6)
BUG=chromium:621095
Change-Id: Id74880cfdded44caaa45bbdbaac859c09d3db752
In vp9_pick_inter_mode(), instead of using
vp9_get_pred_context_switchable_interp(xd) to assign filter_ref,
we use a less strict condition on assigning filter_ref.
This is to reduce the probabily of entering the flow of not
assigning filter_ref and then skipping filter search.
Overall PSNR gain 0.074% for rtc dataset
Details:
Low Mid High
0.185% -0.008% -0.082%
Change-Id: Id5c5ab38d3766c213d5681e17b4d1afd1529e676
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some condtions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed setting:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8
Change-Id: I902f62def225ea01c145d7e5a93497398b8f5edf
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.
The known issue with this decoder is:
The decoder assumes a wrong value, 127 instead of the correct
value of 511 and 2047, for any assumed top-left corner pixel in
UV planes for 10 and 12 bit, respectively. Such assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not avalable, e.g. when it is
located inside the row above the top row or inside the column
left to the leftest column of a video image.
Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
development has moved to the nextgenv2 branch and a snapshot from here
was used to seed aomedia
BUG=b/29457125
Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It removes two memset operations on quantized
and dequantized coefficient sets. This improves the unit speed
by 10%.
Change-Id: I23f47c6e14582520a7f952f03ce8f72183e7f0e6
Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically
nonsense, ignore the user's pass setting when this happens and
behave as if the requested encode is a single pass encode.
BUG=webm:1233
Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2
For VBR: (1) allow newmv mode for golden ref to
select interpolation filter (as in last ref case), and
(2) don't use the more aggressive tx-skip testing logic for large blocks.
Only affects 1 pass real-time vbr mode (speed >= 5).
PSNR/SSIM metrics on ytlive set are all positive, ~0.5-2% gain.
Change-Id: I0ffbb0a9755563a5acd6230c58236e4f19a47266
This change is only for real-time mode if short_circuit_low_temp_var
is on. Add bias to last frame in choosing ref frame for partitioning,
when y_sad and y_sad_g are close. It speeds up real-time encoding by
0.5% on some clips with less than 0.1% overall PSNR drop on rtc test set.
Change-Id: I2a2110fe36455f3d8f0fc404aef2228f512e8df8