Small changes to the best quality default speed trade-off.
Some speedup settings are worthwhile even for best quality as they
have only a very small impact on quality but a significant impact on
encode time.
These changes give as much as a further 50-60% increase in encode
speed for my test animation clip with minimal impact on quality.
For this sequence these changes improve the best quality encode speed
to about the same level as good quality speed 0 in Q3 2015 whilst
retaining the large quality gain of over 1 dB.
For many natural videos though the quality difference from good 0
to best is much smaller.
Change-Id: I28b3840009d77e129817a78a7c41e29cb03e1132
This change alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
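The staged search described above can be sketched roughly as follows.
This is a minimal illustration, not the encoder's code: the MV and
MeshStage types, mesh_search() and the toy sq_dist_cost() distortion
are all hypothetical names, and the real search would use a block
distortion metric plus motion vector cost.

```c
#include <assert.h>

typedef struct { int row, col; } MV;               /* hypothetical MV type */
typedef struct { int range, interval; } MeshStage; /* one mesh stage */
typedef unsigned int (*CostFn)(MV mv, const void *ctx);

/* Toy distortion for demonstration: squared distance to a target MV. */
unsigned int sq_dist_cost(MV mv, const void *ctx) {
  const MV *t = (const MV *)ctx;
  const int dr = mv.row - t->row, dc = mv.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}

/* start/start_cost are the result of the preceding step search. The mesh
 * stages only run when that result is worse than the threshold. */
MV mesh_search(MV start, unsigned int start_cost, unsigned int threshold,
               const MeshStage *stages, int num_stages, CostFn cost,
               const void *ctx) {
  MV best = start;
  unsigned int best_cost = start_cost;
  if (start_cost <= threshold) return best; /* step search was good enough */

  for (int s = 0; s < num_stages; ++s) {
    const MV center = best; /* each stage recentres on the previous best */
    const int range = stages[s].range, step = stages[s].interval;
    assert(step > 0);
    for (int r = -range; r <= range; r += step) {
      for (int c = -range; c <= range; c += step) {
        const MV cand = { center.row + r, center.col + c };
        const unsigned int d = cost(cand, ctx);
        if (d < best_cost) {
          best_cost = d;
          best = cand;
        }
      }
    }
  }
  return best;
}
```

With the three example stages, the first stage visits 33 x 33 points
instead of the 129 x 129 of a full +/- 64 search at interval 1, which
is where most of the saving comes from.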
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 dB with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
This reverts commit 380a5519cc.
This causes an assertion failure in debug_check_frame_counts(), which
probably isn't valid with this change; leaving the investigation for
later.
Change-Id: Ieda5ca811ed2fa50a0cc6935919a8d10dca996e0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
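A check for the three properties could look roughly like the sketch
below. The function name and the table layout are assumptions made for
illustration: each component table is taken to be a pointer to the
centre of an array, so that indices from -max_abs to max_abs are valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the SAD cost tables satisfy the three properties the
 * AVX version relies on. comp[k] points at the centre of a table with
 * valid indices -max_abs..max_abs (assumed layout). */
bool sad_cost_tables_allow_avx(const int joint[4], const int *const comp[2],
                               int max_abs) {
  assert(max_abs >= 0);
  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint[1] != joint[2] || joint[2] != joint[3]) return false;
  for (int i = -max_abs; i <= max_abs; ++i) {
    /* Equal per component cost. */
    if (comp[0][i] != comp[1][i]) return false;
    /* Cost function is even. */
    if (comp[0][i] != comp[0][-i]) return false;
  }
  return true;
}
```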
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
Change is only for real-time mode, speed >= 5, and non-screen content mode.
Add bias to zero/low motion for big blocks, if noise estimation
is enabled and noise level is above threshold.
Change-Id: I3a0a4608ede6aa535bda6eca528d20f8aba738e7
For 1 pass CBR mode: increase the waiting time after a key frame
before we start sampling rate control behavior for determining
resize. This change needs to temporarily disable one internal
resize (DownUp), since it requires a longer clip.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
Use same setting for speed 5 (as it is for speed > 5).
Change is only for real-time (non-rd) mode.
Change-Id: I830250eac654328373cb318baa89d4f0e63942e1
Reduces Linux perf estimated cycle count for pack_mb_tokens on a
lossless encode on my desktop from 61858501855 to 48154040219 or from
26% of the overall profile to 21%.
Change-Id: I9ca3426d7e3272bc7f7030abda4f0d0cec87fb4a
This reverts commit f1342a7b07.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
Add threshold/condition on spatial_variance and brightness level.
Modification to normalization of block variance.
Change resolution limit below which we disable noise estimation.
Change-Id: If5be08a26ceda351242d8a58d2f0bc88c0a918f0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
Change is only for real-time mode, speed > 5, and non-screen content mode.
Bias is based on block size and motion vector level (motion above some threshold).
Helps to improve stability in the background under lighting changes.
PSNR/SSIM metrics on the RTC set are almost unchanged/neutral (within +/- 0.1).
Change-Id: I7eac13c1ae10be4ab1f40acc7f9f1df5653ece9d
Only use non-zero threshold(s) for breakout if
the motion level of the current tested mode is low.
Change-Id: I22aae961cc42371b49d3f648560181cc54708502
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
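To see the equivalence, here is a small stand-alone sketch. It assumes
p is a one-byte unsigned value (uint8_t here); the function names are
made up for the demonstration.

```c
#include <stdint.h>

/* What the old expression really does: the conditional is evaluated
 * first, its condition is the constant 0, so the result is always
 * p - 1, which is then truncated on assignment back to p. */
uint8_t old_workaround(uint8_t p) {
  p = 0 ? 0 : p - 1; /* parsed as p = (0 ? 0 : p - 1) */
  return p;
}

/* The same behaviour written out explicitly. */
uint8_t explicit_wrap(uint8_t p) {
  return (uint8_t)((p - 1) & 0xFF);
}
```

In particular p == 0 is not clamped to 0 but wraps around to 255.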
Change-Id: I587c44dd61c1f3767543c0126376f881889935af
Width and height of downscaling resolution should not be lower
than min_width and min_height which can be set as needed, both
are 180 for now.
Change-Id: I34d06704ea51affbdd814246e22ee8d41d991f00
This reverts commit 7f56cb2978.
It causes uninitialized reads in the first pass setting up later cost tables.
Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
Adjust variance threshold, delta-qp, and intra penalty cost,
based on estimated noise level in source.
Replace denoising_on with a level value=L/M/H.
Change-Id: I0c017dae75a5d897367d2c42dec26f2f37e447c1
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://code.google.com/p/webm/issues/detail?id=1089
Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
Bug relating to issue:- http://b/25090786
base_frame_target is supposed to track the idealized bit
allocation based on error score and not the actual bits
allocated to each frame.
The clamping of this value based on the VBR min and max pct values
was causing a bug where in some cases the loop that adjusts the
active max quantizer for each GF group was running out of bits at
the end of a KF group. This caused a spike in Q and some ugly artifacts.
A second change makes sure that the calculation of the active
Q range for a group DOES, however, take account of clamping.
Change-Id: I31035e97d18853530b0874b433c1da7703f607d1
Periodically estimate the noise level in the source, and only denoise
if the estimated noise level is above a threshold.
Change-Id: I54f967b3003b0c14d0b1d3dc83cb82ce8cc2d381
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
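The speculative 32 bit idea can be illustrated in plain C as below.
This is only a sketch of the principle, not a transcription of the
assembly: the function name is hypothetical, and the overflow test
(bounding every term by the OR of all magnitudes seen) is one simple
conservative choice that may differ from what the assembly does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns sum((coeff - dqcoeff)^2) and stores sum(coeff^2) in *ssz. */
int64_t block_error_speculative(const int16_t *coeff, const int16_t *dqcoeff,
                                size_t n, int64_t *ssz) {
  uint32_t err32 = 0, ssz32 = 0, seen = 0;
  assert(ssz != NULL);
  for (size_t i = 0; i < n; ++i) {
    const uint32_t c = (uint32_t)abs((int)coeff[i]);
    const uint32_t d = (uint32_t)abs((int)coeff[i] - (int)dqcoeff[i]);
    err32 += d * d; /* unsigned arithmetic wraps, no undefined behaviour */
    ssz32 += c * c;
    seen |= c | d;  /* upper bound on every magnitude seen so far */
  }
  /* Every term is <= seen^2, so the 32 bit sums are exact whenever
   * n * seen^2 fits in 32 bits. */
  if (n == 0 || (uint64_t)seen * seen <= UINT32_MAX / n) {
    *ssz = ssz32;
    return err32;
  }
  /* Rare path: overflow might have happened, redo in 64 bits. */
  int64_t err = 0, sq = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t c = coeff[i];
    const int64_t d = c - dqcoeff[i];
    err += d * d;
    sq += c * c;
  }
  *ssz = sq;
  return err;
}
```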
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.
This patch makes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it affects profile 0
only.
Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.
Change-Id: I86714a6b7364da20cd468cd784247009663a5140
VP8E_UPD_ENTROPY, VP8E_UPD_REFERENCE and VP8E_USE_REFERENCE have been
deprecated since the initial public release
Change-Id: Ied16b441eec13434d85f1ab115d49ccaf5f2f7b0
Adjust the qp threshold and consec_zeromv threshold for
limiting cyclic refresh. Also increase the refresh period
when the limit amount is significant, and some code-cleanup.
Small gain in PSNR/SSIM metrics: ~0.25/0.3 gain on RTC set, speed 7.
Change only affects non-screen content.
Change-Id: I1ced87a89a132684c071e722616e445b2d18236a
Adjust the qp threshold based on the denoising setting; do not allow
scaling directly from the original resolution to one half and vice versa.
Change-Id: I032a9b22f8e1c88de6bb81cf8351367223a3e40d
For the re-encoding (at max-qp) on the detected high-content change:
update rate correction factor, reset rate over/under-shoot flags,
and update/reset the rate control for layered coding.
Change-Id: I5dc72bb235427344dc87b5235f2b0f31704a034a
Changes to the breakout behavior for partition selection.
The biggest impact is on speed 0 where encode speed in
some cases more than doubles with typically less than 1%
impact on quality.
Speed 0 encode speed impact examples:
Animation test clip: +128%
Park Joy: +59%
Old Town Cross: +109%
Change-Id: I222720657e56cede1b2a5539096f788ffb2df3a1
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
The artifact occurs periodically when the VP9 denoiser is on and
refresh_golden_frame happens. When refresh_golden_frame happens,
we should copy the frame buffer instead of swapping the pointers.
Change-Id: Ib3204c4b04db28ecf439c6d9e61f3d146f04196d
this reduces the number of synchronizations in decode_tiles_mt() and
improves overall performance when the number of threads is less than the
number of tiles
Change-Id: Iaee6082673dc187ffe0e3d91a701d1e470c62924
Small code cleanup. consec_zeromv refresh threshold
does not need to be computed for every super-block.
No change in behavior.
Change-Id: I8c4b1b28072f42b01d917fff6d1f62722f1e1554
Use the existing VP9_SET_SVC control to set the
first spatial layer to encode.
Since we loop over all spatial layers inside the encoder, the
setting of spatial_layer_id via VP9_SET_SVC has no relevance.
Use it instead to set the first_spatial_layer_to_encode,
which allows an application to skip encoding lower layer(s).
Change only affects the 1 pass CBR SVC.
Change-Id: I5d63ab713c3e250fdf42c637f38d5ec8f60cd1fb
The resolution check fixes an issue where resize_pending was reset
unnecessarily, making the output not bit-exact with the previous
one-step version.
Change-Id: I4e7660b3c8f34f59781e2e61ca30d61080c322de
Temporary fix to denoiser when dynamic resizing is on.
-Reallocate denoiser buffers on resized frame.
-Force golden update on resized frame.
-Don't denoise resized frame, and copy source into denoised buffers.
Change-Id: Ife7638173b76a1c49eac7da4f2a30c9c1f4e2000
For screen-content mode, with frame dropper off, put a limit
on how low encoder buffer can go.
Under hard slide changes, the buffer level can go too low and then
take a long time to come back up (in particular when frame-dropping
is not used), which will affect the active_worst and target frame size.
Change-Id: Ie9fca097e05cd71141f978ec687f852daf9de332
Dynamic resizing now supports two-step scaling: first go down to
3/4 and then to 1/2. This feature is under a flag which controls the
switch between two-step scaling and one-step scaling (1/2 only).
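A rough sketch of the step selection, for illustration only. The enum,
the function names and the truncating integer division are assumptions
and are not taken from the encoder.

```c
#include <assert.h>

typedef enum { SCALE_ORIG, SCALE_THREE_QUARTER, SCALE_ONE_HALF } ScaleStep;

/* Next step when scaling down. With two_step set, the original size
 * first goes to 3/4; otherwise it goes straight to 1/2. */
ScaleStep next_scale_down(ScaleStep cur, int two_step) {
  if (cur == SCALE_ORIG) {
    return two_step ? SCALE_THREE_QUARTER : SCALE_ONE_HALF;
  }
  return SCALE_ONE_HALF;
}

/* Frame dimensions for a given step (rounding down is an assumption). */
void scaled_dims(int width, int height, ScaleStep step, int *out_w,
                 int *out_h) {
  static const int num[] = { 4, 3, 2 }; /* numerators over 4 */
  assert(out_w != 0 && out_h != 0);
  *out_w = width * num[step] / 4;
  *out_h = height * num[step] / 4;
}
```

For a 1280x720 source this gives 960x540 at the 3/4 step and 640x360
at the 1/2 step.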
Change-Id: I3a6c1d3d5668cf8e016a0a02aeca737565604a0f
vp9_filter_block_plane_ss11() and vp9_filter_block_plane_non420()
are only called for the uv planes.
Change-Id: Iacd3b3242c8ce581edd37c8f06d95efc8a0f88a3
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.
The encoder builds the masks for the entire frame prior
to calling the loopfilter.
Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.
Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in an int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.
Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
Use the existing QP condition on limiting cyclic refresh, and add an
additional condition that the block has been encoded with zero/small motion
x frames in a row (where x is at least several times the refresh period).
The additional condition only affects non-screen content mode.
This helps to improve visual stability for noisy input, where on steady
background areas the application of delta_qp may lead to encoding the noise.
Also added a change to use the true skip (after encoding) to update the
last QP.
Change-Id: I234a1128d017d284cf767fdb58ef6c59d809f679
Limit transform size for intra to 16x16, for non-screen content mode.
Little/no change in speed or metrics.
32x32 intra block is rarely selected in RTC (non-screen content) case,
but some visual improvement can be seen in some examples,
e.g., captured_video_dark_whd.yuv.
Change-Id: I68e2db87875343b3fb9bb407a7709f0088f84072
Reallocation of the mi buffer fails if the size changes on the first
frame and the config changes in subsequent frames. Add a resolution
check to avoid the assertion failure.
BUG=1074
Change-Id: Ie26ed816a57fa871ba27a72db9805baaaeaba9f3
Reference frame masking logic may skip checking zeromv-last mode.
Fix to avoid this and make sure zero-last is always checked.
No noticeable change in speed, and PSNR/SSIM metrics on RTC set overall
neutral (very small gain ~0.02).
Small visual improvement on few RTC clips.
Change-Id: I26eacdc449126424001a4a64e5ac31949f064417
Add SVC codec control to set the frame flags and buffer indices
for each spatial layer of the current (super)frame to be encoded.
This allows the application to set (and change on the fly) the
reference frame configuration for spatial layers.
Added an example layer pattern (spatial and temporal layers)
in vp9_spatial_svc_encoder for the bypass_mode using new control.
Change-Id: I05f941897cae13fb9275b939d11f93941cb73bee
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).
See issue 1059.
Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
For 1 pass CBR spatial-SVC:
Add cyclic refresh parameters to the svc-layer context.
This allows cyclic refresh (aq-mode=3) to be applied to
the whole super-frame (all spatial layers).
This gives a performance improvement for spatial layer encoding.
Add the aq_mode on/off setting as a command line option.
Change-Id: Ib9c3b5ba3cb7851bfb8c37d4f911664bef38e165
Fixes temporal scalability. Updates were inadvertently turned
off for two pass svc causing crashes due to gf_group.index
growing unchecked.
Change-Id: Iff759946bf61bbde70630347cc8fa4d51a8c2d2f
The normative (convolve8) filter is optimized/faster than
the nonnormative one. Pass usage of scaler (normative/nonnormative)
to vp9_scale_if_required(), and always use the normative one for 1 pass.
Change-Id: I2b71d9ff18b3c7499b058d1325a9554de993dd52
prevents an int -> vpx_img_fmt_t conversion warning with high-bitdepth
as it modifies the image format
Change-Id: Ie3135d031565312613a036a1e6937abb59760a7e
Access scaled reference frame in the sub8x8 rate-distortion
optimization loop only when the current test mode is an inter mode.
This prevents an ioc warning triggered by sending intra_frame index
to fetch scaled reference frame.
Change-Id: I6177ecc946651dd86c7ce362e3f65c4074444604
This commit allows the encoder to include sub8x8 inter mode with
scaled reference frame in the rate-distortion optimization scheme.
Change-Id: Ibbe9678801592826ef22566566dcdeeb008350d5
Sync the encoder's buffer offset calculation for sub8x8 block motion
compensated prediction with scaled reference frame to match the
decoder's behavior. This resolves an enc/dec mismatch issue when
sub8x8 inter mode with scaled reference frame is turned on.
Change-Id: I4bab3672b007a5ae0c992f8a701341892d2458b0
the check performed within the while was redundant; simply place the
accumulation after all tiles are decoded.
Change-Id: I6a74e87257c775fd8bfc8ac4511e4a6ad8f18346
If the encoder dynamic resize is triggered and change_config()
is then called, it will reset the current (resized) codec width/height
back to the config (unresized) width/height (which will then
prevent the resizing action from occurring in encoder_loop).
Avoid this by checking for a change in the config width/height
before resetting the cm->width/height.
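The guard amounts to something like the following sketch. The struct
and function names are hypothetical stand-ins; only the idea of
comparing the old and new config size before touching the current
coded size comes from this change.

```c
typedef struct { int width, height; } Dim;

typedef struct {
  Dim cfg; /* size requested by the application config */
  Dim cm;  /* current coded size, possibly reduced by dynamic resize */
} EncSize;

/* Apply a new config. The coded size is only reset when the configured
 * size really changed, so an unrelated config update does not undo a
 * dynamic resize that is in effect. */
void apply_config_size(EncSize *enc, Dim new_cfg) {
  const int changed = new_cfg.width != enc->cfg.width ||
                      new_cfg.height != enc->cfg.height;
  enc->cfg = new_cfg;
  if (changed) enc->cm = new_cfg;
}
```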
Change-Id: Id9d50c0ee8a943abe4b6c72bbaa02d9696f93177
For one pass CBR: only check for updating refresh_golden
if ext_refresh_frame_flags_pending is not set (i.e., == 0).
And move the resetting of ext_refresh_frame_flags_pending = 0
down to after the encode_loop (and account for dropped frames).
This is to prevent changing the refresh_golden flag when the user
supplies the reference/update flags.
Change-Id: I4d87b3e705ba43f243667e367503b585c61e2a54
In high bitdepth setting, the rate multiplier may be set as 0. In
lossless mode, the RD cost would always be 0, resulting in bad
partition and prediction mode choices.
Change-Id: I297014dd8bfa8a07ff0ab480119f75678300ff68
This patch just fixes the test for the time being, but does not
actually solve the underlying issue, which still needs investigation.
Change-Id: I54a35de839723f5b499b57e38dd2bdd400adc427
Switch to use the normative (convolve8) filter for source scaling,
only for 1/2x1/2 scaling for now. This is faster and has better
quality than either the vpx_scale_frame or the nonnormative scaler.
Remove the vp9_scale_if_required_fast, which is now not used.
Change-Id: I2f7d73950589d19baafb1fa650eac987d531bcc8
For 1 pass CBR mode under screen content mode:
if pre-analysis (source temporal-sad) indicates significant
change in content, then check the projected frame size after
encode_frame(), and if size is above threshold, force re-encode
of that frame at max QP.
Change-Id: I91e66d9f3167aff2ffcc6f16f47f19f1c21dc688
Only test for using golden as reference for variance partition
selection if it is used as a reference for that frame.
For temporal layers, golden may not be a reference on a given frame,
even though it was for some previous frame. If it is not a reference
for current frame, don't check/use it for partition selection.
Change-Id: I6b0f2bd36aebbb5903077c9a0a66d80f1de9a7b1
For speed 7, real-time mode: Base layer frames are further apart
(for #temporal layers = 3, this is every 4 frames) so worth keeping
same motion search parameters (as in speed 6) on the base layer frames.
Change-Id: Idebf49dda6ef4f3d9a55aee55129a68253f692fb
* changes:
Only use .text sections for aout
Use newer x86inc.asm
Use .text instead of .rodata on macho
Copy PIC handling code from x86_abi_support
Set 'private_extern' visibility for macho targets
Avoid 'amdnop' when building with nasm
Catch all elf formats
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Use libvpx defines to set name mangling rules
Customize x86inc.asm for libvpx
Rename updated version of x86inc.asm
Use "private_prefix" instead of "program_name" and make vpx the default
prefix.
Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
Use the correct period (in terms of cr->percent_refresh) for the condition
of larger delta-qp following key frame.
And account for larger interval for temporal layers.
Change-Id: Ibb43f5200f9b1eeb8bbb8211327b08ecda3c3b8a
Re-investigated the second-level sub-pixel motion search. Improved the
way of choosing search points. Rewrote the second-level search code.
At speed 0, the borg tests showed:
1. for stdhd set, Avg PSNR gain: 0.216%; Overall PSNR gain: 0.196%;
SSIM gain: 0.206%. Only 1 out of 15 clips showed PSNR loss.
2. for derf set, Avg PSNR gain: 0.171%; Overall PSNR gain: 0.192%;
SSIM gain: 0.207%. Only 3 out of 30 clips showed PSNR losses.
Added a condition for third-point checking, so that fewer points
are checked. Speed tests showed no speed loss (Avg 0.3% speedup at
speed 0).
Change-Id: I6284ebb3fa7ba63be8528184c49e06757211a7f1
-For ambient qp in active_worst setting: increase the initial
averaging time (from very first frame) to account for avg_qp of key_frame.
-In postencode on key frame: update the last_q/avg_q[key_frame] for
all temporal layers.
Change-Id: I5313153d350b1045b4835ce948dfffb7d2039b52
Condition usage of rc.frames_since_golden to non-svc mode.
rc.frames_since_golden, which is used in non-svc mode to add second reference,
was causing, under certain conditions, the turning off of golden reference
for svc case.
Change-Id: Icec644d235d0471e56d8ff73d6c37278bd6ecd3b
and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
This commit clears all the vp9_ prefix use case in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.
Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.
Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
Choose a different diagonal point to check when the two costs are
the same, making it consistent with the way we choose the best mv.
This slightly changes the encoding result, and the derflr set borg
test at speed 0 shows 0.027% Overall PSNR gain, 0.024% Avg PSNR
gain, and 0.043% SSIM gain.
Change-Id: Ic8ee3a6767394866d159e4f9e1c777604dd73c17
If the current best mv (namely, the search center) is still the best mv
after the first level search, the second level check is skipped. This
patch doesn't change the bitstream. At speed 0, it speeds up the encoder
by 1% - 2%.
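The early exit has roughly the shape sketched below. Everything here is
illustrative: the names are hypothetical, the toy cost is a squared
distance, and the plus-shaped neighbourhood used at both levels is a
simplification that does not reproduce the actual search points.

```c
typedef struct { int row, col; } Pos;
typedef unsigned int (*PosCostFn)(Pos p, const void *ctx);

/* Toy cost for demonstration: squared distance to a target position. */
unsigned int toy_cost(Pos p, const void *ctx) {
  const Pos *t = (const Pos *)ctx;
  const int dr = p.row - t->row, dc = p.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}

/* Check the four neighbours of center and return the best position. */
static Pos best_of_cross(Pos center, unsigned int *best_cost, PosCostFn cost,
                         const void *ctx) {
  static const Pos offs[4] = { { 0, -1 }, { 0, 1 }, { -1, 0 }, { 1, 0 } };
  Pos best = center;
  for (int i = 0; i < 4; ++i) {
    const Pos cand = { center.row + offs[i].row, center.col + offs[i].col };
    const unsigned int d = cost(cand, ctx);
    if (d < *best_cost) {
      *best_cost = d;
      best = cand;
    }
  }
  return best;
}

/* Two level search that skips the second level when the first level
 * leaves the search center as the best position. */
Pos two_level_search(Pos center, PosCostFn cost, const void *ctx,
                     int *ran_second) {
  unsigned int best_cost = cost(center, ctx);
  const Pos best = best_of_cross(center, &best_cost, cost, ctx);
  *ran_second = 0;
  if (best.row == center.row && best.col == center.col) return best;
  *ran_second = 1;
  return best_of_cross(best, &best_cost, cost, ctx);
}
```

Skipping the second level cannot change the result in this sketch only
because the toy cost has a single minimum; the claim that the bitstream
is unchanged is specific to the real search and is not shown here.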
Change-Id: I054c91b884d3f7aef157436c061744562bd6506d