Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: I8a80da7c5089b837d0df79a5c49d5e3022dfc8ec
This reverts commit 9e8efa5b18.
this change causes ubsan warnings, failures in
vpxenc_vp9_webm_rt_multithread_tiled
BUG=webm:1309
Change-Id: I020c7be985c771bfff4b3de1afe51cc8edb980da
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: Ia2c982da56697756e12f02643f589189b3271d98
For 1 pass vbr real-time mode:
Allow for the usage of alt-ref frame when non-zero lag-in-frames is used.
Use non-filtered alt-ref, and select usage based on fast scene/content
analysis/detection within the lag of frames.
Positive gains on ytlive set: overall avgPSNR ~3-4%.
Several clips are up between 5-14%, a few clips are neutral/small change.
Current speed decrease is about ~5-10%.
Use the flag USE_ALTREF_FOR_ONE_PASS to enable this feature
(off by default for now).
Change-Id: I802d2bf3d44f9cf01f6d15c76be9c90192314769
change_config() may be called often in real-time application,
to update bitrate/framerate or qp-max/min.
No need to do update_frame_size() unless frame size has changed.
Change-Id: I23a51deade1e03adc91c468f9ffde3235298770c
This patch is to address concerns that changes to allow
recodes on the first frame in each ARF group do not give a
good enough speed quality trade off for speed 2. Though the
average impact on encode speed is 1-2%, for some hard clips
it is > 5% rise. For speed 1 this is less an issue and for Speed 0
the previous patch actually improves speed.
Change-Id: Ie1bcefdbfdf846d3f4428590173f621465dffe3a
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.
The corresponding fix for vp10 is in
Ica3d625d6672b3c47e0e208b45eede29b9004030.
Change-Id: Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future
apply rate-control chosen quantizer while for bad predictors apply
worse one.
Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
Allow recodes for the first inter frame in each arf group
even when the recode rule is set to ALLOW_RECODE_KFARFGF.
Small gains of 0.05%.
Change-Id: I40cb559d36a2bf0ebf5cf758c3f92e452b480577
This commit changes the call in vp9 encoder from vp9_deblock() to
vp9_post_proc_frame() to ensure the data structures used in the call
are properly allocated. This fixes an encoder crash when configured
with --enable-internal-stats.
Change-Id: I2393b336c0f566665336df4f1ba91c405eb56764
Allow usage of lookahead for VBR in real-time mode, for 1 pass vbr.
Current usage is for fast checking of future scene cuts/changes,
and adjusting rate control (gf interval and active_worst/target size).
Added unittests (datarate) for 1 pass vbr mode, with non-zero lag.
Added an experimental option to limit QP based on lookahead.
Overall positive gain in metrics on ytlive set:
avgPNSR/SSIM up on average ~1-3%; several clips up by 5, 7%.
Change-Id: I960d57dfc89de121c4824b9a9bf88d2814e74b56
This change eliminates redundant computation in the two stage
downscaling, which saves ~1% encoding time in 3-layer svc encoding.
Change-Id: Ib4b218811b68499a740af1f9b7b5a5445e28d671
For forced key frames in particular this helps to make them
blend better with the surrounding frames where noise tends
to be suppressed by a combination of quantization and alt
ref filtering.
Currently disabled by default under and IFDEF flag pending
wider testing.
Change-Id: I971b5cc2b2a4b9e1f11fe06c67ef073f01b25056
Added actual and absolute rate miss values to the opsnr.stt
stats output line.
Changes to the borg graphing may be needed before merge.
Change-Id: I1e9d548ce445d29002f0c59ebfd3957a6f15e702
Add control API VP9E_SET_TARGET_LEVEL that allows the encoder to
control the output bitstream level and/or keep level related
statistics.
Usage:
255 do not care about level (default)
0 keep level related stats only
10 target for level 1
11 target for level 1.1
.
.
.
62 target for level 6.2
Usage for vpxenc:
--target-level=0/255/10/11...
Change-Id: I31d1aeca19358b893e7577b4e63748c8e614034a
In Aq mode 1 the segment and AQ delta for each block is based
on spatial variance. There may be a net imbalance between blocks
that have lower Q than the baseline value and those that have higher Q.
This patch monitors that imbalance and extends the allowed baseline
Q range for the frame to accommodate adjustment of that baseline value
to compensate.
Change-Id: Iae8a48c7c01fe2af94a141e149d03acf467237ca
So it can be used even with aq-mode=3 not enabled.
Also cleans up some code in the places where its used.
No change in behavior.
Change-Id: Ib6b265308dbd483f691200da9a0be4da4b380dbc
Change recursive weight for average_source_sad and
put some constraint on spacing between detected scene-cuts.
Change only affects 1 pass real-time mode.
Change-Id: I1917e748d845e244812d11aec2a9d755372ec182
Avoid copy-block when denoising is at LowLow level (i.e., no denoising is done).
Instead, don't enter denoiser at all, and when level goes back up over kLowLow
do a reset in denoiser.
Change-Id: I0544adf58f4dd51ecc4a4607fcb0353bfbbb7a59
Adds a second threshold for recodes even on frames where
recode is normally disabled if there is a big rate miss.
Change-Id: Ifd4a34707da55ec15eb7cfb87de4644b8d76deb2
Don't advance the svc frame counters on dropped frame,
since this can break the referencing scheme and lead
to a crash/assert.
Updated svc-datarate unittest to add a lower bitrate test.
Change only affects 1 pass cbr svc, with frame dropper enabled.
Change-Id: Ibb7530b7a587a9344d46898d9286fd9e2ef0779c
Use the superframe counter to set the key frame, and force
it to the key frame on base spatial layer only.
Also, update svc frame counters under frame dropping.
Update unittest: add specific tests with short key frame period.
https://bugs.chromium.org/p/webm/issues/detail?id=1150
Change-Id: I5b1c9a09253e6e5fbfce51b4cf603ae22d422b01
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of souce.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: I9d73f92a4a96927d0f1d6bf75315c1e60513226a
This reverts commit f51f0998e1.
This causes datarate tests to fail. Some are due to the new default
keyframe distance, another causes an assert even forcing 9999:
[ RUN ] VP9/DatarateOnePassCbrSvc.OnePassCbrSvc3SpatialLayers/0
test_libvpx:
vpx_dsp/x86/vpx_subpixel_8t_intrin_ssse3.c:853: scaledconvolve2d:
Assertion `y_step_q4 <= 32' failed.
Change-Id: I4ee4fea97f47e4f1a23b82a62e6afc6280961e38
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of souce.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: Ifcf526ec2aaf3e5fa7924588d9dd8660bf02fb46
Use the existing scene/content change detection to better
update/adjust golden frame refresh.
Change only affects 1 pass real-time vbr mode, speed >=5.
Change-Id: I2963a5bb7ca4a19f8cf8511b0a925e502f60e014
move to encoder_encode() as vp9_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: I8ddc390a1441afd0ff937842fa4ad1053c956133
External dynamic resize with swapping width and height was
not handled properly.
Fix is to re-init loop-filter under certain condtions.
Modify unittest to test this case.
Without this change test will fail.
Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140
Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp9_lookahead_push() under low memory conditions.
Change-Id: I4b79dca37cc7fadc4b7633f0db44c0e406799bc6
An issue exists with reference_masking in non-rd pickmode for spatial
scaling. It was kept off for internal dynamic resizing and svc, this
change is to keep it off also for external dynamic resizing.
Update to external resize test, and update TODO to re-enable this
at frame level when references have same scale as source.
Change-Id: If880a643572127def703ee5b2d16fd41bdbf256c
When the codec frame size is the same as the reference frame size,
release the scaled reference before assigning it a new buf_idx.
Only affects 1 pass non-svc mode, where the scaled references are
release only under certain conditions (to prevent un-needed scaling
of the references every frame).
Modified a unittest that can trigger this bug without this change.
https://code.google.com/p/chromium/issues/detail?id=582598
Change-Id: I9a884e36ebd7608b1641ec2a469e20a4f829cf43
If the application changes frame size (external size changes),
and aq-mode=3 is on, reset the cyclic refresh.
Modify the TestExternalResize unittest (longer run with more resize
actions). Without this change an assert would be triggered on this
longer test.
Change-Id: I0eefd2cd7ffa0c557cca96ae30d607034a2599ce
The postproc vp9_denoise() is a spatial denoise/blur function.
It was not intended to be used if temporal denoising is enabled.
Change-Id: I97d2dcb941e7cc49bbafce99d9286beb2693249d
Changes to mode selection for 1 pass SVC mode:
use base layer motion vector, changes to intra-prediction.
Change-Id: I3e883aa04db521cfa026a0b12c9478ea35a344c9
This patch fixes a bug that causes the loop filter search to reset to
a low value or zero after each arf overlay frame. We expect the overlay
frames to need little or no loop filtering but this should not propagate.
Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd
Different quality levels are used for different regions in
the frame depending on how far they are vertically from the
center. Specifically, three segments are used based on the
mi_row index with respect number to the number of mi_rows in
the frame.
Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Idf5ee179b277fa15d07a97f14f2ce5bbaae80a04
Keep track of frame indexes for the references, and
constrain inter mode search for reference with same
temporal alignment.
Improves speed by about ~15%, no noticeable loss in
compression performance.
Change-Id: I5c407a8acca921234060c4fcef4afd7d734201c8
This change alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
This reverts commit f1342a7b07.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
Periodically estiamte noise level in source, and only denoise
if estimated noise level is above threshold.
Change-Id: I54f967b3003b0c14d0b1d3dc83cb82ce8cc2d381
Use the existing VP9_SET_SVC control to set the
first spatial layer to encode.
Since we loop over all spatial layers inside the encoder, the
setting of spatial_layer_id via VP9_SET_SVC has no relevance.
Use it instead to set the first_spatial_layer_to_encode,
which allows an application to skip encoding lower layer(s).
Change only affects the 1 pass CBR SVC.
Change-Id: I5d63ab713c3e250fdf42c637f38d5ec8f60cd1fb
Temporary fix to denoiser when dynamic resizing is on.
-Reallocate denoiser buffers on resized frame.
-Force golden update on resized frame.
-Don't denoise resized frame, and copy source into denoised buffers.
Change-Id: Ife7638173b76a1c49eac7da4f2a30c9c1f4e2000
Dynamic resizing now support two-steps scaling: first go down to
3/4 and then 1/2. This feature is under a flag which controls the
switch between two-steps scaling and one-step scaling (1/2 only).
Change-Id: I3a6c1d3d5668cf8e016a0a02aeca737565604a0f
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.
The encoder builds the masks for the entire frame prior
to calling the loopfilter.
Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in a int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.
Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
Reallocation of mi buffer fails if change size on the first frame and
change config in subsequent frames. Add a condition for resolution
check to avoid assertion failure.
BUG=1074
Change-Id: Ie26ed816a57fa871ba27a72db9805baaaeaba9f3
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).
See issue 1059.
Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
For 1 pass CBR spatial-SVC:
Add cyclic refresh parameters to the svc-layer context.
This allows cyclic refresh (aq-mode=3) to be applied to
the whole super-frame (all spatial layers).
This gives a performance improvement for spatial layer encoding.
Addd the aq_mode mode on/off setting as command line option.
Change-Id: Ib9c3b5ba3cb7851bfb8c37d4f911664bef38e165
The normative (convolve8) filter is optimized/faster than
the nonnormative one. Pass usage of scaler (normative/nonomorative)
to vp9_scale_if_required(), and always use normative one for 1 pass.
Change-Id: I2b71d9ff18b3c7499b058d1325a9554de993dd52
If the encoder dynamic resize is triggered and change config()
is then called, it will reset the current (resized) codec width/height
back to the the config (unresized) width/height (which will then
prevent the resizing action from occurring in encoder_loop).
Avoid this by checking for a change in the config width/height
before resetting the cm->width/height.
Change-Id: Id9d50c0ee8a943abe4b6c72bbaa02d9696f93177
For one pass CBR: only check for updating refresh_golden
if ext_refresh_frame_flags_pending is not set (i.e., == 0).
And move the resetting of ext_refresh_frame_flags_pending = 0
down to after the encode_loop (and account for dropped frames).
This is to prevent changing refresh_golden flga when the user
supplies the reference/update flags.
Change-Id: I4d87b3e705ba43f243667e367503b585c61e2a54
Switch to use the normative (convolve8) filter for source scaling,
only for 1/2x1/2 scaling for now. This is faster and has better
quality than either the vpx_scale_frame or the nonnormative scaler.
Remove the vp9_scale_if_required_fast, which is now not used.
Change-Id: I2f7d73950589d19baafb1fa650eac987d531bcc8
For 1 pass CBR mode under screen content mode:
if pre-analysis (source temporal-sad) indicates significant
change in content, then check the projected frame size after
encode_frame(), and if size is above threshold, force re-encode
of that frame at max QP.
Change-Id: I91e66d9f3167aff2ffcc6f16f47f19f1c21dc688
and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
Ssim_vars is used to accumulate stats based 4x4 pixel blocks, this
commit changes the allocations size to be based on mi_rows and mi_cols
to avoid out-of-bound memory access for larger size videos. The hard
coded 720x480 can only work for image size up to 2880x1920.
Change-Id: Id9d07f3f777385b448ac88a6034b7472e4cf3c79
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
Avoid scaling the references if they have already been scaled.
Change only affects 1 pass non-svc mode for now.
Change-Id: I204f4079c026cba7adce7a7f855d072f6139ccec
The fast scaling for 1 pass mode was being used only on the
first frame after resizing event (because resize_scale_num/den
is set to 1 and only changed for first frame following resize event).
Change-Id: I723b63e21823eb858f25f5662d2bbe4f1842e61f
Proper use/update of resize_state and resize_pending to constrain
the total amount of downsizing to be at most one scale down, for now.
Change-Id: Id18fc32499f2fbdbec16728dcdc9e4eac09098f0
The encoder gets its dqcoeff from the context tree. In the decoder move
it to directly after MACROBLOCKD.
Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
Even if the recode loop is not enabled for the current frame type
trap the case where the projected size of a a frame is above the
maximum allowed in recode_loop_test()
Change-Id: I453004694b8f8699e3c2a83252e9f83adccdda4e
Adds two new vp9 parameters --min-gf-interval and --max-gf-interval
to enable testing based on frequency of alt-ref frames.
Also adds a unit-test to test enforcement of min-gf-interval.
For both these parameters the default value is 0, which indicates
they are picked by the encoder, based on resolution and framerate
considerations. If they are greater than zero, the specified
parameter is honored.
(Additional note by paulwilkins)
Note that there is a slight oddity in that key frames are also GFs and
considered part of GF only group. However they are treated as not
being part of an arf group because for arf groups the previous GF is
assumed to be the terminal or overlay frame for the previous group.
(end note)
Change-Id: Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
Decision to scale down/up is based on buffer state and average QP
over previous time window. Limit the total amount of down-scaling
to be at most one scale down for now.
Reset certain quantities after resize (buffer level, cyclic refresh,
rate correction factor).
Feature is enable via the setting rc_resize_allowed = 1.
Change-Id: I9b1a53024e1e1e953fb8a1e1f75d21d160280dc7
set_frame_size() is being called twice, once before entering
encode_encode_frame_to_data_rate(), and once again in that function.
No need to call it twice for one-pass mode.
Change-Id: I5fabaf0a90482d4f42cd89ef7ae1402c31aec600
For content that is identified as likely to contain some
animation or graphics content, increase the availability
of split modes for good quality speeds 1-3.
On a problem test animation clip this improves metrics
results by about 0.25 db and makes a noticeable difference
visually. It also causes a small drop in file size (~0.5%) but
a rise in encode time of about 5-6% at speed 2.
For more normal content it should have no effect.
Change-Id: Ic4cd9a8de065af9f9402f4477a17442aebf0e439
Some places are using the unoptimized variance function. This was never
intended and does not fit into the optimization framework.
Change-Id: Id96238407aad03b0ffd4a46cd183555a026daedc