Add code to monitor over and under spend and
apply limited correction to the data rate of subsequent
frames. To prevent the problem of starvation or overspend
on individual frames (especially near the end of a clip) the
maximum adjustment on a single frame is limited to a %
of its un-modified allocation.
Change-Id: I6e1ca035ab8afb0c98eac4392115d0752d9cbd7f
This commit compares the current original frame to the previous
original frame at 64x64 block level and decides if the entire
block belongs to background area. If it is in the background area,
skip non-RD partition search and copy the partition types of the
collocated block in the previous frame.
For vidyo1 in the rtc set, this makes the speed -5 coding speed
about 8% faster. The overall compression performance is down by
1.37% for rtc set.
Change-Id: Iccf920562fcc88f21d377fb6a44c547c8689b7ea
Delete code relating to the old VP8_TUNE_SSIM flag
as this code does not currently work and is largely made
redundant in VP9 by the various AQ modes.
Change-Id: I71f28e1f680573d296422254489000678552b17b
Remove duplicate rd_thresh code introduced when vp9_rd_pick_inter_mode_sub8x8()
was forked from vp9_rd_pick_inter_mode_sb().
Change-Id: I3c9b7143d182e1f28b29c16518eaca81dc2ecfed
Fix rate control bug whereby the rate factor heuristics
were being updated on arf overlays causing a rate surge
for a few frames followed by a corrective drop.
This fix eliminates many of the overshoot problems that
we were seeing on hard clips (even without applying
stricter vbr rate control) and also helps quality on
almost all clips with some hard clips improving by >5%.
Overall quality results measured at speed 2.
Derf +1.78% opsnr , +2.44% SSIM
Stdhd +2.41% opsnr, +2.85% SSIM
Change-Id: I2369df6295c2705963fa6307877f6acb304bcc39
We don't use declarations from this file. The real declarations
(differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad.
Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
Added command line flags "resize-width" & "resize-height"
to allow the user to specify the frame size to encode at.
These two flags are ignored if the "resize-allowed" switch
is not set to 1.
All frames in the clip are then encoded at this size, which
must be smaller than the raw frame size.
Change-Id: I3d64bd9303d5c0bd678461a866a1ea621700d744
A previous path improved speed 2 quality a little but
more extensive testing showed that it slowed encode
by a few %.
The change will have a similar effect for speed 3 but
should not impact speeds 4+;
This experiment should reverse that and give a speed
up at the cost of a small quality loss.
Borg results pending.
Change-Id: I4493fc1541aaf44587f1a41ff219f7088da9252c
Both values are already checked as command line arguments:
RANGE_CHECK_HI(cfg, g_lag_in_frames, MAX_LAG_BUFFERS);
RANGE_CHECK_HI(extra_cfg, sharpness, 7);
Change-Id: I584798d587152d88dfd517c210054b466f4e5f8a
With a more approriate one vp9_setup_src_planes() as only src buffer
pointers need to be initialized here.
Change-Id: I40fac4d8b2d39eb7d0c65b9b6afab45138a13936
This increases the range of Q values available to
normal inter frames to allow encoder a better chance
to hit the target rate.
Change-Id: I33cd96469a46577fdcea631e26d3355710909e6d
The limits applied under the flag
"LIMIT_QRANGE_FOR_ALTREF_AND_KEY"
behaved in an undesirable way if the gap between
active_worst_quality and active_best_quality was
changed.
In this patch, the adjustment is made using the
vp9_compute_qdelta_by_rate() function and fixed
rate multiplier values. Hence it is not impacted by
the relative value of active_best_quality.
Change-Id: I93b3308e04ade1e4eb5af63edf64f91cd3700249
The cause is because VP9 encoding use vp8_vpxyv12_extendframeborders_neon
on arm which only extend boarder size 32. But VP9's border size is 160
Change-Id: I1ff7e945344a658af862beb1197925e677e8ff57
Problem has been introduced recently with the cleanup patch
I0816ec12ec0a6f21d0f25f10c214b5fd327afc6c
Change-Id: Iaacb956a6039eb5826b82618dc03be32053fb892
This commit changed the initialization of best_mode_index to -1 to make
sure it is not mistakenly used for mode masking.
Change-Id: I75b05db51466070dd23c4ee57a4d4b40764dc019
In mode selection loop, once mode_index pass mode_skip_start, all
modes with a different reference frame from current best mode are
masked out using mode_skip_mask.
However, the setting of mode_skip_mask may use an invalid mode if
there is no mode tested yet. This commit fixes the issue by making
sure a mode has been tested and selected. Otherwise, no mode will be
masked out because of their reference frame.
Change-Id: Ib0009e8a96836a65cf5347440fff8a2e1a67f29f
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.
An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.
The selection of source_var_thresh needs to be investigated
further later.
RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.
Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
This commit allows the non-RD mode decision flow to select
prediction filter type in NEWMV mode. It provides 8.14% compression
performance gains in both settings of AQ=0 and 3. The current speed
impact is about 5% to 10% slower.
Change-Id: Id66ecebf77abd8f90fb3f6a066c0e8dfb4bf1c42
Adds some high-level hooks for profile 2 before further
progress on the implementation.
According to the definitiion in this patch:
1. Profile 2 only supports 10 or 12 bit color but not 8
2. Profile 2 supports all color sampling modes: 444, 422 and 420,
and alpha plane.
3. Profile 3 is currently undefined.
Please consider the definition carefully and suggest modifications
to the definition as needed.
Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995
Copy up to a certain bsize, otherwise set to a fixed bsize.
This helsp to reduce artifact near moving boundary caused by full partition
copy without checking motion of super-block.
This artifact can occur at speeds 3,4 in real-time mode.
Issue: https://code.google.com/p/webm/issues/detail?id=738.
Change-Id: I05812521fd38816a467f72eb6a951cae4c227931
ARF overlays now use the same rate correction factor as regular inter
frames, further testing would be needed to see if it makes sense to use
a completely separate rate correction factor for ARF overlays.
$ vpxenc --cpu-used=5 --fps=50/1 --target-bitrate=2000
parkjoy.y4m -o out.webm
=> Before: 3356 kb/s
=> After: 2271 kb/s
Change-Id: I73e4defa615ba7a8a2bdb845864f4b1721cbbffe
This commit estimates the motion vector rate cost right after full
pixel motion search. It combines this and the mode cost and compares
the corresponding rate-distortion cost. If it is already above the
current best one, skip the rest sub-pixel motion search and modeling
process. For pedestrian_area 1080p at 4000 kpbs, the speed -5 runtime
goes down from 39425 ms -> 38399 ms.
Change-Id: If4cd7119fd6c266798d5cf1d19d19ab425e52a26
Reinstates this macro and truns it on in order to avoid issues
due to some frames at the end starving in harder videos.
A more acceptable solution is in the works.
Change-Id: I3c46148e86fa6114e3fed245246fb3686a9e6700
This commit slightly increases the bit allocation for key frame. This
improves speed -5 coding performance by 2.77% with aq-mode=0 and by
2.78% with aq-mode=3.
Change-Id: Iaa3e777f80b9706306606af06e89852bac146659
For real-time mode under cbr, this increases the gain (5-10%)
for speed 5 (none/little change for 6), on vc-clips.
Change-Id: I9b38beeb3c820de22c43a0ba53a9456168dd24ba
Turns off the DISABLE_RC_LONG_TERM_MEM macro and makes other changes
in the way the bits are updated, to make 2-pass rate control track
target bitrates closer.
Change-Id: I5f3be4b11c2908e6a9a9a1dd4fcf4e65531c44d8
This commit reduces the frequency of frames using finer quantizer
in non-RD coding flow, and slightly tune up the quantizer resolution
when used. It provides 1.7% compression gains in speed -5 at no speed
difference.
Change-Id: I430249a51260a841a0402666e5ec1566e4f7d5a6
The new tolerance is a little higher than before (especially
for kf/gf/arf) so this change gives an encode speed up
for some clips up for speeds 0-2.
Change-Id: I63f7d6c9cc11c7f58742f41e250dcd3eab1741eb
This code/setting was actually not used (since speed features were not set on first frame,
until a recent change) and should be removed.
In CBR mode, the q value for the first frame can be controlled by setting
the target size via the parameters rc_buf_initial_sz (and max_intra_size_pct).
Change-Id: I65afc64972b36c449bd5a8c25800e65da5389066
While encoding a frame, its last frame source can be used to give
acurate motion information. This patch prevents last frame to be
overwritten so that it is available during current frame encoding.
The last source is scaled when it is necessary. cpi->Last_Source
points to the last source frame.
Change-Id: I0e1ef5e9e1d2badf9d0c7a1a44a7ed5b24c09425
Use a crude correction factor to correct for
lower compression efficiency at higher encode
speeds when estimating the max Q for the
clip.
Change-Id: I5ae377647f4adf5e91d700a8791fb3b8f70efc73
This commit optimizes the bit allocation for the non-RD coding flow.
It applies slightly better quantizer to the frames, where all blocks
run a non-RD partition search. Such frames typically have better
rate-distortion trade off, thus improving the reconstruction quality
for next few frames reference at reasonably low increment in rate
cost.
The coding performance for rtc set at speed -5 with error-resilient
tuned on and rate control set as cbr is improved by 19.58%. It improved
the coding speed by about 10% for a portion of local test clips.
Change-Id: I9d56696cd4359dc8136ca10aff10fff05aaa2686
This commit adjusted the speed steps in rt mode to make the steps
more evenly spaced on speed and quality, specifically:
1. Merged 3 and 4 into one single step 3 and removed confilicting
features.
2. Move 8, 7, 6, 5 to be 7, 6, 5, 4 repsectively.
Change-Id: I38d56d61531f3561d772aef953c411c8fb38c063
Root cause is the different default register length between x86
and x64 platform. Change spatial_layer_id to long long.
Change-Id: If1a5972365c7a59f7e76cb4fd714610f3d48a8ff
Small speed gain for speed 1.
Quality is generally a little up for speed 2.
Speed 3 was similar to speed 4 but now positioned more
evenly between 2 and 4 speed and quality wise.
(opsnr +5.6% ssim +8.25% across all sets)
Speed 4 is a little slower than before but sizable quality gains.
(opsnr +3.7% ssim +6.8% across all sets)
The code has been cleaned up a bit so that for each incremental
speed step changes over the previous speed step are applied.
This makes it easier to see what is changing from one setting to
the next.
Change-Id: I2d98d0d6230af23486adaec01908f58942a7cdeb
Allow tx search for ARF and GF helps quality but a little slower.
Setting subpel_iters_per_step to 1 improves encode speed.
Setting sf->mode_skip_start = 10 improves speed.
Initial local results suggest overall impact on quality is neutral
but encode is up to 15% faster.
Change-Id: Ibde02cae6626a44c10a1da0cefe888afbb51f037
Removes a TODO. Changed meaning of some parameters
(target-max-percent refresh and starting index) to be
defined relative to superblock. Also, modify turn-off condition.
Change-Id: I5e55f372b7079c24f9cdac0b06fa34620dbf456b
This commit enables the non-RD mode decision coding flow to
adaptively apply partition search in non-refresh frame, when the
collocated block in previous frame suggests there might be a motion
activity. It refactors the update_state_rt() function to support
buffer swap of mode_info struct, thereby unifying the encoding
stage across various non-RD coding modes.
It provides 5% compression performance gains in speed -6 for rtc
test set, at about 12% speed slow down.
Change-Id: Iefa374aed5a11c4b7ff9a3ed36a98ea8bd184edb
Fix so that vp9_update_segment_aq() will use the correct (i..e, chosen)
encoding mode (from ctx struct) in update_state.
Change-Id: Icc4b66f3935fad5ec4516a4d57e843d12c365e64
This commit allows the recursive non-RD partition search to early
terminate sub search tree when the cumulative rate-distortion is
already above the best available.
Change-Id: Ifdbcbb4bee229f47fde3033200829577c9f1fc1d
This commit added a speed feature to make the logic of calculating
skip_recode on a block level more explicit. This also enable the
feature to be enabled at speed 5 where the previous logic is too
conservative, help gain back the lost speed for --rt(-5).
Change-Id: Ieb37ca3e85c2e7bda343486edf13d5f5395f2233
There were a few conflicts between the new non-RD partition search
and recent clean-up patches, which were not caught by git control.
This commit fixed these issues.
Change-Id: Ieebefbd6c19d81d0d13e3c568877d5cce2ab7797
This commit takes out the if statements on using adaptive motion
search flag. This feature is automatically enabled in non-RD mode
decision flow for variable partition types search.
Change-Id: I5a25cf9109d84d07aa61b3e02c8d32dda1e90cb0
This commit enables a recursive partition type search for non-RD
mode decisions. It allows the encoder to choose variable block
sizes in a 64x64 block based on rate-distortion modeling.
It improves speed -6 coding efficiency for rtc set by 2.4%. Most
of the gains come from 32-40dB range, where many sequences gain
about 5% to 20%. Local tests suggest there is no speed change.
Change-Id: I06300016e500a21652812b7b3b081db39a783d66
This commit reformats non-RD coding flow layout to allow mode
decision with fixed and variable block sizes.
Change-Id: I2cdd3bb9f26c499ee4a9849004fd925cdd195d09
2 functions were optimized for avx2 by using full 256 bit register
In order to handle 32 elements in parallel instead of only 16 in parallel:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d
The function level gain is 66% and the user level gain is ~1%.
Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
The third pred_mv is stored in x->pred_mv[ref_frame]. This commit make
sure the correct mv is read.
Change-Id: Ibed24daf36703a63f0394c87b2381ee1d2eb7910
In non-rd pick_mode code, added mode skipping according to
thresholds. Used rd thresholds now, but we can modified them
later for real-time case.
RTC set borg test showed a 0.095% PSNR gain. For different rtc
clips, the real-time(speed 7) encoder speedup is 2% - 10%.
Change-Id: Ic72535c96b891092c662453be32d3168f7e34dcc
The flag x->skip_recode interacts badly with
the cpi->sf.use_nonrd_pick_mode and
cpi->sf.skip_encode_sb speed settings.
Restricting the use of the skip_decode flag when
these other speed choices are in use helps quality
for speeds 3 and 4 by a large amount with only a
small impact on speed.
Average improvmentes for 2 pass speed 4:
Derf +8.8%
Yt + 10.53%
Std-Hd +6.95%
yt-hd + 22.95%
Change-Id: I8010876d8012042a11077c92e69d813c3dfa58eb
This commit changed how q is validated in lossless mode. With this
commit, when --lossless=1 is specificed at commandline, --min-q and
--max-q are now ignored. This is to make the option non-ambiguious.
Change-Id: I33e85690460537509d33be75d6a3597be4affc09
This is an initial attempt to allow variable block size partition
in non-RD coding flow. It tests 8x8, 16x16 and 32x32 block size per
64x64 block, all using non-RD mode decision and the associated rate
distortion costs from modeling, then selects the best block size to
encode the entire 64x64 block. Such operations are triggered every
other 3 frames. The blocks of intermediate frames will reuse the
collocated block's partition type.
It improves the compression performance by 13.2%. Note that the gains
are not evenly distributed. For many hard clips, the compression
performance is improved by 20% to 28%. Local speed test shows that
it will also increase runtime by 50%, as compared to speed -7. It is
now enabled in speed -6 setting.
Change-Id: Ib4fb8659d21621c9075b3c369ddaa9ecb0a4b204
1. Save stats for each spatial layer
2. Add frame buffer management for svc first pass rc
3. Set default spatial layer to 1
4. Flush encoder at the end of stream in test app
This only supports spatial svc.
Change-Id: Ia89cfa87bb6394e6c0405b921d86c426d0a0c9ae
The use of uninitialized skip flag will trigger inconsistency in
coding statistics, when alternate RD and non-RD coding modes are
enabled. This commit fixes this issue and removes unnecessary if
statements from update_state_rt.
Change-Id: I7d549dcb0e3ef48b999e5bbc78174ba84502cfcf
The block coding skip flags are assigned in the normal RD mode
decision loop. They are then used in the final encoding stage.
In the non-RD mode decision, the forward transform and quantization
stages are replaced by modeling based on SSE and variance of
prediction residues. This commit applies reset to this array in
the non-RD coding mode.
Change-Id: I66584669b035e9c8ac23e95047849ff277472742
This commit moves the position where rdmult is saved to make sure it
is the correct value. Prior, an uninitialized value may be saved and
restored.
This addresses issue:
https://code.google.com/p/webm/issues/detail?id=733
Change-Id: I436407f289169bc63da3c5a6bf609bed16cb71b5
This commit unifies the non-RD partition use cases for both fixed
and variable block sizes. Deprecate and remove the separate function
for fixed partition type only.
Change-Id: I2b6cb945e90c1566f985adcebc4d0757480a8004
This commit allows the non-RD mode decision process to return the
rate and distortion costs associated with the selected mode.
Change-Id: Ibe0f67d323f65839fd9cb0a726c1219bf7b55da9
Brings back most of Jim's previous patch for choosing
partitioning based on variance while making it compatible
with the current state of the code. Also adds a
nonrd_use_partition() function to recursively encode for any
arbitrary sb_type decisions within a 64x64 block; and
includes some refactoring.
Currently, when the VAR_BASED_PARTITIONING mode is turned on
for speed 7, there is a 10+% speed-up observed.
Experiments/improvements with this new partitioning method
will be conducted subsequently.
Change-Id: Ie6f43bfbde30583e941f450bf07c3b48828c9571
This commit adjusts the rate-distortion modeling for non-RD mode
decision. It puts more weights on energy from AC coefficients when
estimating the cost. The coding performance for rtc testset is
improved by 0.72%.
Change-Id: Ifa6ff11311a513ec2b10586589e82a9a21f6c61c
Assign interp_kernel value in MACROBLOCKD. This will be used to
select prediction filter coefficient sets and generate motion
compensated prediction.
Change-Id: I28c8dfb2dae6566f6939bb328aca5875c94bee65
Clean-ups include
a. redundant code in rt -5 speed feature settings
b. code that guarantees square block availability in
rd_auto_partition_range()
Change-Id: Ic7b04d45b6dc15c461e0edbbb4e78aec20348291
The block size used for non-RD mode decision in FIXED_PARTITION
setting was uninitialized. This commit fixes it by setting block
size to be BLOCK_16X16.
Change-Id: Ief04c9f1ab668de69297d9ab3dc15e2fa0bc4e95
Adds a fast diamond search which is about 5% faster than FAST_HEX
with only a 0.1% drop in psnr when turned on for both speeds 5 and 7.
This search is turned on for speed 7.
Change-Id: I497630aa88a5148926086bb3038e7975e5f4eb98
This commit allows the non-RD mode decision to skip mode RD modelling
check, if the motion vector associated with the current mode is
same as that of NEARESTMV mode. This makes speed -7 about 2% faster.
Previous change that converts cost metric from SAD to model based RD
value makes the codec 6% slower at speed -7.
Change-Id: I30cfec5452f606a671b8432a2f7f0c94fbb49fc8
The rate-distortion model in non-RD coding mode is only applied to
luma component. This commit removed a few redundant addition steps.
Change-Id: Id8edc0a47c2dbef8deba43debe2c95db39454de3
This commit replaces SAD cost with modeled rate-distortion cost
for non-RD mode decision. It translates the prediction residual
SSE into estimate rate and reconstruction distorion costs, hence
capturing the quantization setting effect. The compression
performance of speed -7 for rtc set is improved by 14.79%.
Change-Id: Ifda014eb0501d13109fe7f92680bf1410b463632