Commit Graph

9234 Commits

Author SHA1 Message Date
Marco
768b1f7281 vp9: 1 pass cbr: allow noise estimation down to 360p.
Also adjust some thresholds for noise level setting.

Change-Id: I7e03d7057ef2061c9447728deb9c6aff5d3da4b7
2017-01-03 16:26:22 -08:00
Yunqing Wang
99c573f018 Merge "Fix for out of range motion vector bug in joint motion search" 2017-01-03 17:46:15 +00:00
Ranjit Kumar Tulabandu
b67e1f701f Fix for out of range motion vector bug in joint motion search
Clamped the initial mv in vp9_refining_search_8p_c.

BUG=webm:1354

Change-Id: I47d302b350937e3e6e52e95c983b5fb0b4c64fba
2017-01-03 09:12:32 -08:00
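The clamping described in commit b67e1f701f above can be pictured with a small sketch. This is not the libvpx code: the types `SketchMv` and `SketchMvLimits` and the helper `sketch_clamp_initial_mv` are hypothetical names invented here, and the only idea taken from the commit message is that the starting vector is forced into the legal range before the refining search uses it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the encoder's motion vector and search limits. */
typedef struct { int16_t row; int16_t col; } SketchMv;
typedef struct { int row_min, row_max, col_min, col_max; } SketchMvLimits;

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Force the starting vector into the legal window before refining it. */
static void sketch_clamp_initial_mv(SketchMv *mv, const SketchMvLimits *lim) {
  assert(lim->row_min <= lim->row_max && lim->col_min <= lim->col_max);
  mv->row = (int16_t)clamp_int(mv->row, lim->row_min, lim->row_max);
  mv->col = (int16_t)clamp_int(mv->col, lim->col_min, lim->col_max);
}
```

Clamping once at the entry point means every later candidate offset is computed from an in-range starting position.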
Yunqing Wang
ecdb6a00c2 Merge "Make sub-pixel mv search's return value consistent with the return type" 2016-12-29 19:16:01 +00:00
Yunqing Wang
c96a8dcb5b Merge "Bug fix to avoid random crashes during ARNR filtering" 2016-12-29 17:24:24 +00:00
Gabriel Marin
e6b9609fc0 Merge "Remove superfluous conditional on 'shortcut'" 2016-12-29 06:03:43 +00:00
Yunqing Wang
1d12559b09 Make sub-pixel mv search's return value consistent with the return type
For out-of-range cases, returned UINT_MAX instead of INT_MAX in the
sub-pixel mv search to be consistent with the "uint32_t" return type.

Change-Id: I8e206d771228c13d89bafbbe9f14722c8ecc6a7a
2016-12-27 12:08:38 -08:00
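A minimal sketch of why the sentinel should match the return type, as commit 1d12559b09 above describes. The function names and the placeholder cost are assumptions made for illustration; only the UINT_MAX versus INT_MAX distinction comes from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical search routine: returns an error score, or UINT_MAX when the
 * candidate vector lies outside [row_min, row_max]. */
static uint32_t sketch_subpel_search(int mv_row, int row_min, int row_max) {
  assert(row_min <= row_max);
  if (mv_row < row_min || mv_row > row_max) return UINT_MAX;
  return (uint32_t)(mv_row < 0 ? -mv_row : mv_row); /* placeholder cost */
}

/* Callers test against the same sentinel the routine returns. */
static int sketch_is_invalid(uint32_t score) { return score == UINT_MAX; }
```

A caller that still compared the result against INT_MAX would not recognize the out-of-range case, which is the kind of inconsistency the commit message refers to.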
Ranjit Kumar Tulabandu
7cf13826b7 Bug fix to avoid random crashes during ARNR filtering
The function 'vp9_find_best_sub_pixel_tree_pruned_more' is modified
to return INT_MAX instead of UINT32_MAX for invalid MV cases.

yunqingwang:
patch 3: rebased on top of the tree.
patch 4: The return type of vp9_find_best_sub_pixel_tree* was changed
to uint32_t to fix ubsan warnings. Changing UINT_MAX back to INT_MAX
was not quite right. Patch 4 modified vp9_temporal_filter.c to accept
uint32_t.
(Note: Inconsistency exists in vp9_find_best_sub_pixel_tree*, which
will be fixed in a separate CL.)

Change-Id: Ib1a79dc2aa41ea6335c21669c76883cdbb7e0535
2016-12-27 11:20:08 -08:00
Marco
e7c453b613 vp9: 1 pass vbr: Skip find_predictors in pickmode when source is altref.
When source frame is altref, we only do zero-mv mode, so we can skip
the find_predictors(). No change in compression.
Small speed gain, ~1%.

Only affects 1 pass vbr with lookahead altref, for ytlive with
the macro flag USE_ALTREF_FOR_ONE_PASS on.

Change-Id: I9318c5da8521f017bf54919cd652438b3a6313d1
2016-12-21 12:12:55 -08:00
Jerome Jiang
f27276f44f Merge "vp9: Add feature to copy partition from the last frame." 2016-12-20 21:46:44 +00:00
Gabriel Marin
fce163cd54 Remove superfluous conditional on 'shortcut'
Remove superfluous test. Produces a small improvement in instruction scheduling.
Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b
with different compilers.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3
2016-12-20 12:20:21 -08:00
Jerome Jiang
1d5ca84df6 vp9: Add feature to copy partition from the last frame.
Add feature to copy partition from the last frame.
The copy is only done when the SAD is below a threshold.
Feature is currently disabled, until threshold is tuned.
Feature will be initially used for Speed 8 (ARM).

Under extreme case of always copying partition for speed 8:
Encode time is reduced by 5.4% on rtc_derf and 7.8% on rtc.
Overall PSNR reduced by 2.1 on rtc_derf and 0.968 on rtc.

Change-Id: I1bcab515af3088e4d60675758f72613c2d3dc7a5
2016-12-19 16:24:03 -08:00
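The SAD-gated copy described in commit 1d5ca84df6 above could look roughly like the sketch below. The function names, the flat pixel layout, and the threshold value used in any call are all hypothetical. The commit message says only that the partition is copied when SAD is below a threshold that has yet to be tuned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between co-located pixels of two blocks. */
static unsigned int sketch_block_sad(const uint8_t *cur, const uint8_t *prev,
                                     int n) {
  assert(n >= 0);
  unsigned int sad = 0;
  for (int i = 0; i < n; ++i) {
    sad += (unsigned int)abs((int)cur[i] - (int)prev[i]);
  }
  return sad;
}

/* Reuse the previous frame's partition only when the block barely changed. */
static int sketch_should_copy_partition(const uint8_t *cur,
                                        const uint8_t *prev, int n,
                                        unsigned int sad_threshold) {
  return sketch_block_sad(cur, prev, n) < sad_threshold;
}
```

A higher threshold copies more often, which is the speed versus PSNR trade-off the commit message quantifies for the always-copy extreme.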
Gabriel Marin
85aead1790 Merge "Simplify address arithmetic in vp9_optimize_b" 2016-12-19 23:25:39 +00:00
Marco Paniconi
c1f5194842 Merge "vp9 denoiser: Fix the logic for re-evaluating zeromv after denoising." 2016-12-19 21:15:37 +00:00
Gabriel Marin
0549f5aae9 Simplify address arithmetic in vp9_optimize_b
Simplify address arithmetic on token_costs to reduce the number of generated
instructions that are used for address arithmetic inside routine
vp9_optimize_b. It also helps improve instruction scheduling depending on
compiler and optimization level.

Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
2016-12-19 13:10:04 -08:00
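The general idea behind commit 0549f5aae9 above, computing a table address once instead of on every access, can be illustrated as follows. This is a generic sketch and not the actual change to vp9_optimize_b: the table dimensions, the flat layout, and both function names are assumptions.

```c
#include <assert.h>

/* Hypothetical table dimensions, chosen only for this sketch. */
enum { SK_BANDS = 6, SK_CTXS = 6, SK_TOKENS = 12 };

/* Full index computation on every iteration. */
static int sketch_cost_indexed(const int *costs, int band, int ctx,
                               const int *tokens, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    total += costs[(band * SK_CTXS + ctx) * SK_TOKENS + tokens[i]];
  }
  return total;
}

/* Same result, but the row address is resolved once outside the loop. */
static int sketch_cost_hoisted(const int *costs, int band, int ctx,
                               const int *tokens, int n) {
  assert(band >= 0 && band < SK_BANDS && ctx >= 0 && ctx < SK_CTXS);
  const int *row = costs + (band * SK_CTXS + ctx) * SK_TOKENS;
  int total = 0;
  for (int i = 0; i < n; ++i) total += row[tokens[i]];
  return total;
}
```

An optimizing compiler may perform this hoisting on its own in a loop this simple, so any gain depends on the compiler and optimization level, as the commit message also notes.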
Marco
6e8dbc76ad vp9: With denoising on, only estimate noise level for higher resolns.
Allow it for resolns above 640x360 for now.

Change-Id: I087d0d8173f96b316164fdd4a499110ce2e7a233
2016-12-19 10:05:54 -08:00
Marco
61b569b461 vp9 denoiser: Fix the logic for re-evaluating zeromv after denoising.
Correctly set interp_filter to SWITCHABLE for INTRA mode.
Also reduce threshold on noise level for re-evaluating zeromv.

Change-Id: Id32c01e193209fb380aa07204f0be3babf29f70a
2016-12-19 09:30:16 -08:00
Marco
4260a7f2b3 vp9: Change condition to enable recheck_zeromv_after_denoising.
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.

Change-Id: Ic40d6025f3f398338cedd270d17c0ccd9a3daa84
2016-12-16 15:00:21 -08:00
Marco
5de798f2b2 vp9: Fix to usage of flag USE_ALTREF_FOR_ONE_PASS
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.

No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.

Change-Id: I1a69681e4a15c5244581a3dab4587fca08f02e0f
2016-12-14 15:07:38 -08:00
Linfeng Zhang
5d4aa325a6 Cosmetics by unifying dest_stride to stride in idct
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Marco
076d4bd91a vp9: Fix to crash in svc code.
use_base_mv assumes 2x2 scaling, so the fix is to shut off
this feature unless spatial scale factors are 2.

Added svc unittest for 2 spatial layers with 5x5 scaling,
which generates the issue without this fix.

Also fix some settings in svc unittest:
let the speed setting vary (from 5 to 8), and enable static threshold.

BUG=webm:1344

Change-Id: Idfd0a6c633c21b49a0479601506302cfe974e30e
2016-12-09 08:57:09 -08:00
Yunqing Wang
880adc3355 Merge "Remove an unused first pass statistic" 2016-12-08 22:46:44 +00:00
Yunqing Wang
394020383d Remove an unused first pass statistic
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.

Change-Id: I4c7c7bf9465fda838eb230814ef0c631c068c903
2016-12-07 15:32:25 -08:00
Linfeng Zhang
174528de1e Merge "Update idct NEON optimization to not use narrowing saturating shift" 2016-12-07 21:03:21 +00:00
Linfeng Zhang
018a2adcb1 Update idct NEON optimization to not use narrowing saturating shift
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
Marco
360ac89885 vp9: Adjust the weight factor for segment rate cost for aq-mode=3.
Use the segment weight factor based on the target (cr->percent_refresh)
if it is less than the current estimate (average of past usage and target).
Small improvement at low bitrates.

Change-Id: Iba8fd909e203f94458901366d3a991f7ea854d49
2016-12-05 12:42:56 -08:00
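The selection rule in commit 360ac89885 above reads as "take the smaller of the target and the running estimate". A sketch under that reading, using integer percentages and a hypothetical function name (the real code works on encoder state such as cr->percent_refresh and may scale values differently):

```c
#include <assert.h>

/* Inputs are percentages in [0, 100]. The estimate is the average of past
 * usage and the target; the target wins when it is the smaller of the two. */
static int sketch_segment_weight_percent(int target_percent,
                                         int past_usage_percent) {
  assert(target_percent >= 0 && target_percent <= 100);
  assert(past_usage_percent >= 0 && past_usage_percent <= 100);
  const int estimate = (past_usage_percent + target_percent) / 2;
  return target_percent < estimate ? target_percent : estimate;
}
```

With a target of 10 and past usage of 30, the estimate is 20 and the target (10) is used. With past usage of 2, the estimate (6) is used instead.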
Marco
d793950ec8 vp9: Adjust cyclic refresh parameters for low bitrates.
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
better spatial quality.

Only affects real-time mode with aq-mode=3, at very low bitrates.

Change-Id: I5ccb784667f70d0c27d369806b93b1f93d5605d1
2016-11-23 12:14:28 -08:00
Marco Paniconi
8b2cbaefcf Merge "vp9: Use more aggressive skip when short_circuit_low_temp_var = 1." 2016-11-23 18:15:58 +00:00
James Zern
cb22359d02 vp9,read_inter_block_mode_info: quiet msan warning
best_sub8x8[1] won't be used meaningfully when is_compound is false, but
may trigger an msan warning as the value is copied around and later
clamped.

BUG=667044

Change-Id: Icc24c3b72cdb550bebea44d4aaa4ff8bf3fbab56
2016-11-22 15:32:00 -08:00
Marco
b6597745f9 vp9: Use more aggressive skip when short_circuit_low_temp_var = 1.
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed = 6 and 7, where
short_circuit_low_temp_var = 1.

Speed up of ~2-3% for speed 7, with little/no loss in compression.

Change-Id: I263a0f261ad9929034392d68f0153dc6376fdb5f
2016-11-22 14:54:28 -08:00
Yaowu Xu
0ffbb36ddc Add validation of frame_parallel_decoding_mode
This is a boolean value that is written into bitstream, any value other
than 0 or 1 could have led to unexpected behavior. This commit fixes the
issue by adding validation of the value to make sure it is boolean.

BUG=webm:1339

Change-Id: I2d3e69e8dbefcab9a0db9cb39a91a40ce531c5a1
2016-11-21 10:53:25 -08:00
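The validation added in commit 0ffbb36ddc above amounts to rejecting any value other than 0 or 1 before it reaches the bitstream writer. A minimal sketch, with a hypothetical status enum and helper names rather than the library's own error codes:

```c
#include <assert.h>

typedef enum { SKETCH_OK = 0, SKETCH_INVALID_PARAM = 1 } SketchStatus;

/* Accept only 0 or 1 for a field that is written as a single bit. */
static SketchStatus sketch_validate_flag(unsigned int value) {
  return value <= 1u ? SKETCH_OK : SKETCH_INVALID_PARAM;
}

/* Append one bit to a packed header; callers must validate first. */
static unsigned int sketch_append_bit(unsigned int bits, unsigned int flag) {
  assert(flag <= 1u);
  return (bits << 1) | flag;
}
```

Without the check, a value such as 2 would set bits beyond the one reserved for the flag when it is packed.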
Jingning Han
f473e892f7 Merge "Enable asymptotic closed-loop encoding decision" 2016-11-19 04:12:55 +00:00
Jerome Jiang
4ddae8f524 Merge "vp9: Speed 8: More aggressive golden skip for low res." 2016-11-15 22:50:58 +00:00
Jerome Jiang
360217a233 vp9: Speed 8: More aggressive golden skip for low res.
Add a new, more aggressive short circuit: short_circuit_low_temp_var = 3 to skip
golden of any mode when variance is lower than threshold for low res.
This change only affects speed = 8, low resolution.

Metrics for avgPSNR/SSIM on rtc_derf (low resolution) show loss of
0.27/0.31%.
On Nexus 6, the encoding time is reduced by ~2.3% on average across all
low-res clips.

Visually little change on rtc_derf clips.

Change-Id: Ia8f7366fc2d49181a96733a380b4dbd7390246ec
2016-11-15 13:56:27 -08:00
Jerome Jiang
eff68a3a4d vp9: Speed 8: Turn off 4x4avg for low-res non-key frames.
Change only affects speed = 8 for low resolutions.

Metrics for avgPSNR/SSIM on rtc_derf (low resolutions) show loss of
0.5/0.6%.
On Nexus 6, the encoding time is reduced by ~5.9% on average across all
low-res clips.
Visually little/no change on rtc_derf clips.

Change-Id: I68dd50e558d72dcc1af8317d224bfae5e3bd872d
2016-11-14 11:17:14 -08:00
Jingning Han
44f8ee7258 Enable asymptotic closed-loop encoding decision
This commit enables asymptotic closed-loop encoding decision for
the key frame and alternate reference frame. It follows the regular
rate control scheme, but leaves out additional iteration on the
updated frame level probability model. It is enabled for speed 0.

The compression performance is improved:

lowres 0.2%
midres 0.35%
hdres  0.4%

Change-Id: I905ffa057c9a1ef2e90ef87c9723a6cf7dbe67cb
2016-11-14 09:22:55 -08:00
Marco Paniconi
b6f6169348 Merge "vp9: Adjust thresholds for limiting cyclic refresh for noisy content." 2016-11-11 17:11:19 +00:00
James Zern
4807f1584c *ppflags.h: remove unused *_DEBUG_* enum values
usage of the vp8 versions was removed in:
3f72509 vp8: remove VP8_SET_DBG* control support

vp9 had the usage stripped even earlier.

Change-Id: I978142eb6492552cd29c9c6feb1e89acfc5f7b84
2016-11-08 21:09:16 -08:00
Marco
18794d8ddc vp9: Adjust thresholds for limiting cyclic refresh for noisy content.
For noisy content, be more aggressive in skipping some blocks for
delta-qp to reduce noise pulsing artifact. Also treat frame boundary
case when dimension is not multiple of superblock size/64.

Only affects non-screen content case, and when source noise
is measured to be high (at least level kMedium).

Change-Id: Ib13a2a20ed1ce37ff3c44d95c3ef2635fd695222
2016-11-08 15:50:46 -08:00
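The frame-boundary case mentioned in commit 18794d8ddc above arises whenever a frame dimension is not a multiple of the 64-pixel superblock size. The sketch below shows only the arithmetic involved. Both helper names are hypothetical, and how the encoder actually treats the partial superblock is not stated in the commit message.

```c
#include <assert.h>

enum { SKETCH_SB_SIZE = 64 };

/* Number of superblocks needed to cover a dimension (rounds up). */
static int sketch_sb_count(int dim) {
  assert(dim > 0);
  return (dim + SKETCH_SB_SIZE - 1) / SKETCH_SB_SIZE;
}

/* Pixels of the last superblock that fall inside the frame. */
static int sketch_last_sb_extent(int dim) {
  assert(dim > 0);
  const int rem = dim % SKETCH_SB_SIZE;
  return rem == 0 ? SKETCH_SB_SIZE : rem;
}
```

For a 640x360 frame, the width divides evenly into 10 superblocks, while the height needs 6 superblock rows with only 40 valid pixel rows in the last one.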
Linfeng Zhang
d545c19afa Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
Also update its trigger threshold from 10 to 12.

Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
Johann
e10c95dc83 Update vp9_fdct8x8_quant_ssse3 for highbitdepth
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2

BUG=webm:1304

Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-11-05 01:23:07 +00:00
Marco Paniconi
cca774c7df Merge "vp9: Non-rd pickmode: fix logic in reference masking." 2016-11-03 23:12:05 +00:00
Marco
86b0042f44 vp9-svc: Add decoder control to decode up to x spatial layers.
Change-Id: I85536473b8722424785c84c5b5520960b4e5744a
2016-11-03 11:18:00 -07:00
Marco
da9f762e24 vp9: Non-rd pickmode: fix logic in reference masking.
Add condition that usable_ref_frame > LAST.
This is to avoid potentially skipping all last-nonzero mv modes,
if golden is used as a reference but skipped completely for the
current block.

This has no effect currently, as we always consider testing golden
mode for each block.

Change-Id: I3182cf44664081935a90ed43aa7b32e710e60e22
2016-11-03 10:32:57 -07:00
Debargha Mukherjee
f93305aa07 Merge "Speed-up recode loop for extreme bitrate diffs" 2016-11-03 17:04:17 +00:00
Paul Wilkins
295cd3b493 Merge "Fixed bug in formatting of debug stats." 2016-11-02 17:10:07 +00:00
paulwilkins
de76d2e315 Fixed bug in formatting of debug stats.
Fixed formatting bug introduced by the fix to BUG=webm:1322
(Iedc4477aef1746aa0a4f84d88a1156296fd3ba87)

Change-Id: I715ee446c0e8584967ab87ba4e355759dd394187
2016-11-02 09:38:18 +00:00
James Zern
1961a92a94 vp9,tile_worker_hook: correctly set jmp target
vp9_init_macroblockd() resets the error_info to cm's global copy; this
needs to be set to the thread-level target to avoid jumping to the
incorrect stack, resulting in hang or crash.
broken since:
1f4a6c8 vp9/tile_worker_hook: add multiple tile decoding
includes v1.5.0, v1.6.0

BUG=629481

Change-Id: Icbf1696b25ba8c479e845fbf227b3c3ca73542f5
2016-11-01 18:45:50 -07:00
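The failure mode in commit 1961a92a94 above can be sketched with plain `setjmp`/`longjmp`. Everything here is a simplified stand-in: the struct and function names are hypothetical, and the "init step" is reduced to a single pointer assignment. The point is that an error must jump to the buffer armed on the current worker's stack, not to a shared one.

```c
#include <assert.h>
#include <setjmp.h>

typedef struct {
  jmp_buf jmp;
  int setjmp_armed;
  int error_code;
} SketchErrorInfo;

typedef struct {
  SketchErrorInfo error_info; /* per-worker jump target */
  SketchErrorInfo *active;    /* where errors get reported */
} SketchWorker;

static void sketch_report_error(SketchErrorInfo *info, int code) {
  info->error_code = code;
  if (info->setjmp_armed) longjmp(info->jmp, 1);
}

/* Returns 1 on success, 0 if an error unwound back to this worker. */
static int sketch_worker_hook(SketchWorker *w, SketchErrorInfo *shared,
                              int inject_error) {
  if (setjmp(w->error_info.jmp)) {
    w->error_info.setjmp_armed = 0;
    return 0;
  }
  w->error_info.setjmp_armed = 1;

  w->active = shared;         /* stand-in for an init step that resets it */
  w->active = &w->error_info; /* the fix: point back at the worker's copy */
  assert(w->active == &w->error_info);

  if (inject_error) sketch_report_error(w->active, 7);
  w->error_info.setjmp_armed = 0;
  return 1;
}
```

If `active` were left pointing at the shared copy, the error would either not unwind at all or would jump to a buffer armed on another thread's stack, consistent with the hang or crash the commit message describes.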
Paul Wilkins
84dcfced5b Merge "Change to KF boost calculation." 2016-11-01 09:29:30 +00:00
Paul Wilkins
715c65914b Change to KF boost calculation.
This change is a step in a larger change to the way boost and interval are
determined for ARF and Key frames.

This patch contains some plumbing for the general case but focuses on the
key frame boost calculation. This now relies more heavily on the rate at
which the error score increases between the primary and secondary reference
frame. This seems to be less fragile when dealing with different frame sizes.
For example larger image formats tend in the first pass to see a higher
% of intra coded blocks and the use of this number in calculating the frame
decay factor was leading to much lower boost numbers for 4K, for example,
than the same clip coded at 2K.

This change does give overall gains but they are MUCH larger for the 4K Netflix
set. For the 4K Netflix set the average gain is around 3% with some clips > 20%
whereas for the same set at 2K the average gain is 0.5-1%.

In general for small image formats the boost is most often reduced a little, whereas for
4K clips the boost is increased. There are some negative cases such as Akiyo at 352x288
where the reduced boost hurts the metrics, especially for SSIM, even while
the set as a whole improves. This is most notable at very low Q and may be the
subject of a future patch.

Some common code for KF and ARF was separated in this patch for the purposes of
tuning but may later be re-merged if appropriate.

Change-Id: Iaa15ac5a58d2be89181100d95cef6a8dc4b12d0d
2016-10-28 15:35:59 +01:00