Commit Graph

9806 Commits

Author SHA1 Message Date
Marco
6c0011a255 vp9-svc: Avoid minmax variance for non-reference frames.
For choose_partitioning (speed >= 6): avoid computation
of minmax variance for non-reference frames in SVC.

Existing condition only avoided this for speed >= 8.
Combine that existing logic with non-reference condition.

Small speedup (~0.5-1%) for 3 layer SVC,
neutral change on avgPSNR/SSIM metrics.

Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
2017-11-09 16:27:27 -08:00
Jerome Jiang
fdb054a05d vp9: SVC feature to use partition from lower resolution.
For SVC with 3 spatial layers:
Add feature to copy/upscale partition from middle spatial layer
to the upper/highest resolution, when superblock sad is not high.

Enabled for speed >= 7 and only for non-reference frames.

Speedup ~3-4%, small loss in avgPNSR/SSIM of ~1%.

Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
2017-11-09 14:16:50 -08:00
Johann Koenig
cf8039c25f Merge "Support building AVX-512 and implement sadx4 for AVX-512" 2017-11-08 16:28:40 +00:00
Marco
6fbc354c97 Nonrd_pickmode: avoid computing UV cost when early_term is set.
For nonrd_pickmode: if early_term is set there should be
no need to include UV in rdcost (when color_sensitivity is set).

Neutral change on RTC and RTC_derf metrics, for speed >= 5.
No change for ytlive metrics.

Very small speed gain (~0.5%) on some clips with strong color content.

Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
2017-11-06 10:22:14 -08:00
Kyle Siefring
b383a17fa4 Support building AVX-512 and implement sadx4 for AVX-512
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.

Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
2017-11-03 13:37:23 -04:00
Marco
eb7d431cb5 Compound prediction mode for nonrd pickmode.
Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.

Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.

avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.

Small/negligible decrease in speed.

Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
2017-11-03 10:13:05 -07:00
Jerome Jiang
3ba9a2c8b2 Merge "vp9: Move allocation of vt2 after early exits." 2017-11-01 16:58:01 +00:00
Jerome Jiang
34805d6d0d vp9: Move allocation of vt2 after early exits.
Remove the memory deallocation on the early exits.

Change-Id: I00b4a814ae6705105ecab89644d055ca3311d9f4
2017-10-31 17:04:04 -07:00
Jerome Jiang
0c84b9b703 Merge "vp9: Reduce stack usage of choose_partitioning." 2017-10-31 21:42:18 +00:00
Jerome Jiang
18b470f486 vp9: Reduce stack usage of choose_partitioning.
Move vt2 to heap.
Reduce the stack usage from ~87K to ~44K.

BUG=b/68362457

Change-Id: I8f5f93712934d59a8cc4564378172d409a736a2e
2017-10-31 13:10:27 -07:00
Jerome Jiang
c77822615e Merge "vp9: Reduce stack usage of choose_partioning." 2017-10-30 23:39:41 +00:00
Jerome Jiang
cc47231187 vp9: Reduce stack usage of choose_partioning.
Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.

This reduces the stack usage from ~131K to ~87K.

BUG=b/68362457

Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
2017-10-30 13:53:20 -07:00
Marco
0738d90169 vp9-svc: Allow for adapt_rd_thresh with row-mt.
Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allow the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.

~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.

Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
2017-10-23 11:47:18 -07:00
Paul Wilkins
199971d606 Merge "Corpus VBR tweak for undershoot." 2017-10-19 10:07:45 +00:00
Paul Wilkins
0c493cbe2b Merge "Increase precision of some debug stats output for corpus VBR." 2017-10-19 10:07:30 +00:00
Paul Wilkins
d8c34a2552 Merge "Prevent double application of min rate in two pass." 2017-10-19 10:06:33 +00:00
Linfeng Zhang
9336e01621 Merge changes I17fff122,Ic149e3cb
* changes:
  Add 4 to 3 scaling SSSE3 optimization
  Test extreme inputs in frame scale functions
2017-10-17 16:03:29 +00:00
Linfeng Zhang
580d32240f Add 4 to 3 scaling SSSE3 optimization
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.

Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().

Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-16 15:42:42 -07:00
Marco
a9248457b1 Adjust threshold in gf_boost for 1 pass vbr
Small inncrease the sad_thresh1, avoids some false
detection of possible scene changes within lag.

Small improvement in few clips on ytlive, otherwise neutral change.

Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
2017-10-13 15:33:51 -07:00
Paul Wilkins
12df840777 Merge "Further Corpus VBR change." 2017-10-13 15:59:58 +00:00
Paul Wilkins
eaa593d293 Merge "Corpus Wide VBR test implementation." 2017-10-13 15:59:45 +00:00
paulwilkins
8842ee0b0d Corpus VBR tweak for undershoot.
In cases of strong undershoot adjust Q range down faster.

Change-Id: I84982beceb3c9b6dc50e52e4a6e891c7dd395d03
2017-10-13 10:27:15 +01:00
Marco Paniconi
28d1c0535d Merge "Adjust to scene detection for 1 pass vbr." 2017-10-12 19:36:33 +00:00
Marco
a673b4f4af Adjust to scene detection for 1 pass vbr.
Expose the threshold for setting key frame on cut,
and increase it for speed 5.
Also small adjustment to min_thresh.

No change in overall metrics or fps.
Small quality improvement and lower encode time on scene cuts.

Change-Id: I36e06ff3b26b6c29aede39c23fce454525fc9026
2017-10-12 10:59:23 -07:00
paulwilkins
2b247ae91c Increase precision of some debug stats output for corpus VBR.
Change-Id: I75841797cc0c215781b5b36e3a3e9f4b0e35ba63
2017-10-12 10:07:21 +01:00
Jerome Jiang
288890cd43 vp9: use nonrd pick_intra for small blocks on keyframes.
Keyframe encoding is more than 2x faster.
Disabled on Speed 8.

Change-Id: I2157318b6ac8253fa5398322c72d98cd7fa9b2b6
2017-10-11 21:38:01 -07:00
paulwilkins
416b7051d7 Prevent double application of min rate in two pass.
The initial allocation of bits in the two pass code to each frame
should be within the min max limits on the command line. However,
when forming an ARF group the cost of the ARF is shared by frames
in that group such that the residual bits for a frame could drop below
the min value. This change prevents the minimum being re-applied
after the cost of the ARF has been deducted as this may otherwise
cause low rate sections to overshoot their target.

Test runs comparing to a baseline run with min and max section pct
0-2000% vs one closer to the YT use case (50-150%) suggest that
this fix not only results in better rate control but also gives a better
rd outcome.

For example the HD set vs 0-2000% baseline (opsnr, ssim).
Old code (50-150):  +0.751, +1.099
New code(50-150): +0.241, -0.009

Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
2017-10-11 18:00:44 +01:00
Linfeng Zhang
16166bfdaa Add 4 to 1 scaling x86 optimization
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
2017-10-10 16:24:06 -07:00
Marco
017257a317 Adjustment to scene detection and key frame.
For 1 pass vbr: use higher threshold on avg_sad
and force key frame under scene cut detection if
above the threshold. Allow it for speed >= 6 for now,
since it does not use the full nonrd_pickmode partition
(as in speed 5).

Improves quality somewhat on scene cut frames.
Neutral on overall metrics and fps for speed 6 on
ytlive set.

Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
2017-10-10 11:20:05 -07:00
Linfeng Zhang
963cc22cef Merge changes I9d4c1af5,I882da3a0
* changes:
  Rename some inline functions in NEON scaling
  Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
2017-10-10 17:29:50 +00:00
paulwilkins
06d231c9fa Further Corpus VBR change.
Change to the bit allocation within a GF/ARF group.

Normal VBR and CQ mode allocate bits to a GF/ARF group based of the mean
complexity score of the frames in that group but then share bits evenly between
the "normal" frames in that group regardless of the individual frame complexity
scores (with the exception of the middle and last frames).

This patch alters the behavior for the experimental "Corpus VBR" mode such that
the allocation is always based on the individual complexity scores.

Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
2017-10-10 10:41:35 +01:00
paulwilkins
741bd6df4f Corpus Wide VBR test implementation.
This patch makes further changes to support an experimental
corpus wide VBR mode that uses a corpus complexity
number as the midpoint of the distribution used to allocate bits
within a clip, rather than some average error score derived from the
clip itself.

At the moment the midpoint number is hard wired for testing and
the mode is enabled or disabled through a #ifdef.  Ultimately this
would need to be controlled by command line parameters.

Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
2017-10-10 10:40:44 +01:00
Linfeng Zhang
27d21a3d13 Rename some inline functions in NEON scaling
Change-Id: I9d4c1af53d57f72fc716bacbe3b0965719c045ac
2017-10-09 11:23:00 -07:00
Linfeng Zhang
e1ae3772da Merge "Update vp9_scale_and_extend_frame_ssse3()" 2017-10-09 16:20:00 +00:00
Marco Paniconi
5bc4c37a89 Merge "Revert "Speed >=5 real-time: add TM intra mode for high_source_sad."" 2017-10-06 22:41:34 +00:00
Marco Paniconi
bcbc6ed82d Revert "Speed >=5 real-time: add TM intra mode for high_source_sad."
This reverts commit 9311ef18b4.

Reason for revert:
Notice small regression in some clips.
Will revisit in another change.

Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
> 
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
> 
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d

TBR=marpan@google.com,builds@webmproject.org,jianj@google.com

Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-10-06 22:14:56 +00:00
Marco
e405eb06b1 Adjust threshold in scene detection
For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.

Reduces possible false detection for scene cut detection.

Neutral/small change in metrics or speed for speed 5.

Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
2017-10-06 11:08:56 -07:00
Marco
9311ef18b4 Speed >=5 real-time: add TM intra mode for high_source_sad.
Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.

Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
2017-10-05 23:07:03 -07:00
Marco
18262a8576 Adjust threshold for adapt_partition for speed 6.
Lower SAD threshold to select non_rd pickmode partition
at superblock level more often.
Small gain in metrics, small/negligible decrease in speed.

Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
2017-10-04 18:04:09 -07:00
Marco
4bc1fc58b6 Avoid nonrd_pick_partition for speed >= 6.
For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
avoid doing the full nonrd_pickmode based partition.
No change in overall metrics or speed.
Reduces encode times on scene cuts by 10-20%.

Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308
2017-10-04 15:31:54 -07:00
Linfeng Zhang
127864deb3 Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
2017-10-04 12:35:39 -07:00
Linfeng Zhang
b809442521 Update vp9_scale_and_extend_frame_ssse3()
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
2017-10-04 12:32:30 -07:00
Marco
77e51e2035 Modify early exit for alt_ref in nonrd_pickmode.
For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.

Small gain in metrics, ~0.18%, no change in speed.

Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e
2017-10-04 11:53:39 -07:00
Marco
98dbf31c87 Enable arf usage for speed >= 6, 1 pass vbr.
For speed 6 on ytlive set:
On average, speed slowdown ~5%, quality gain ~2%.

Change-Id: Ia18237cc1d52c54d7e2cb3c71f571cf37ef61b44
2017-10-03 17:18:33 -07:00
Marco
ab2bd340ac vp9: 1 pass vbr: Limit qpdelta on high_source_sad.
For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turnoff the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altef_onepass is enabled.

Neutral/no change on metrics.

Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
2017-10-03 16:27:17 -07:00
Marco
c8678fb7f3 Use adapt_partition for ARF in 1 pass.
For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.

Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.

Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
2017-10-03 11:49:55 -07:00
Marco Paniconi
fe7b869104 Merge "ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode." 2017-10-03 03:01:14 +00:00
Marco
33e10dfa7e ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode.
Speedup of ~2-3% on 1080p clips speed 6.
Neutral/negligible loss in metrics on ytlive.

Change-Id: I7ac47a4d8b58c566920bae29a94a0e8d59c36dee
2017-10-02 19:04:03 -07:00
Linfeng Zhang
0e55b0b0a7 Add 4 to 3 scaling NEON optimization
Speed comparing with the one calling vpx_scaled_2d_neon()
  ~1.7 x in general
  ~2.8x for BILINEAR filter

BUG=webm:1419

Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
2017-10-02 15:04:09 -07:00
Linfeng Zhang
2c560c3c22 Specialize 4 to 3 frame scaling in C
Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.

Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
   Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
   (The drifting is 1/(3*16) in each step.)

BUG=webm:1419

Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
2017-10-02 11:56:15 -07:00