583 Commits

Author SHA1 Message Date
Jerome Jiang
6b9d130214 Merge "vp9: speed 8: Fix seg fault in partition copy when drop frames." 2017-05-13 03:20:49 +00:00
Jerome Jiang
1fcd5cca3c vp9: speed 8: Fix seg fault in partition copy when drop frames.
BUG=webm:1433

Change-Id: I4f3984ef28660d3218d48007d7c977bdbdaf8af6
2017-05-12 15:57:23 -07:00
Cheng Chen
76567d84ce Speed up encoding by skipping altref recode
Speed up for speed 0.
Reduce 10+% of encoding time for hdres in speed 0,
with less than 0.1% PSNR loss.
Compute total difference of previous and current frame context probability
model. If the diff is less than the threshold, skip recoding the frame.

Borg test (positive number means performance loss):
		lowres    midres    hdres
PSNR:		0.030     0.032     0.065

Local speed test: bitrate set at 1200
		blue_sky  pedestrian  rush_hour
Encoding time:	 -10.0%     -16.5%      -16.5%

Change-Id: I4e2d200ea3115d48b2c3e890143596b31b8ef9e9
2017-05-10 22:15:01 -07:00
Marco
4e23998fb4 vp9: SVC: Add option to set downsampling filter type.
Add option in SVC to set the filter type and phase for
the frame level downsampling filters.

For 3 spatial layers: set downsampling filter type to bilinear
and set phase to 8, for lowest spatial layer.

Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
2017-05-09 17:22:44 -07:00
Hui Su
5048d6e7ee Merge "vp9 level: add tentative max cpb values for high levels" 2017-05-03 20:51:03 +00:00
Yunqing Wang
b68f14d0ed Merge "Make the row based multi-threaded encoder deterministic" 2017-04-26 16:12:14 +00:00
hui su
d01c9febe9 vp9 level: add tentative max cpb values for high levels
Add tentative max cpb size values for levels 5.2 and up. Otherwise
encoding will fail when targeting for these levels.

Change-Id: Ib7e0ba4b9836ea1ac900b6822543812843d48463
2017-04-25 18:03:55 -07:00
Linfeng Zhang
51dc998f3a Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388

Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Yunqing Wang
10a497bd38 Make the row based multi-threaded encoder deterministic
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.

Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
2017-04-24 16:28:27 -07:00
Marco
29938b3a5a vp9: 1 pass SVC: Fix comment and condition for up-sampling reference.
No change in behavior.

Change-Id: I218fb30289091da623acb23324027435b8510d0e
2017-04-20 14:21:05 -07:00
Marco
3134a52d26 vp9: SVC: Redefine the source downsample filter choice.
Rename the source downsampling filter, and define it
per spatial layers. Used 1 pass CBR SVC.

Change-Id: I8135f2ab89c535c53429b9c58b586f746bb668c7
2017-04-20 10:17:13 -07:00
Linfeng Zhang
fbbdba3b04 Merge changes I9e18a73b,Ie47c8cd4
* changes:
  Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
  Create CAST_TO_BYTEPTR/SHORTPTR
2017-04-19 23:55:58 +00:00
Linfeng Zhang
bf8a49abbd Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.

BUG=webm:1388

Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
Marco
348bdc0195 vp9: Add phase to get averaging filter for 1:2 downsampling.
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.

Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
Reduces aliasing in downsampled image.

Keep the phase to 0/off for now.

Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
2017-04-18 16:56:15 -07:00
Marco
ad2e3598d2 vp9: Add key_frame condition to is_reference check for loopfilter.
This condiiton is not needed as key_frame should set the refresh
of the reference frames, but good to have for clarity in condition.

Change-Id: Icf9838e7e4f0ff5cf0a9562ae3b5d6c7e6f78702
2017-04-17 15:18:46 -07:00
Marco Paniconi
9aa429a66d Revert "Revert "vp9: Avoid encoder loopfilter for non-reference frames.""
This reverts commit e9b7f98c56b3b9c99a60eb41b83bf8346b3ad25f.

Reason for revert:
Commit d578bdad fixes the issue (encoder/decoder mismatch
in 3TL datarate test) that causes the original revert.

Original change's description:
> Revert "vp9: Avoid encoder loopfilter for non-reference frames."
>
> This reverts commit 863f860bfcf3bdc26eeecb299aa481d0f63d11ac.
>
> This causes encoder / decoder mismatches in various
> VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
>
> BUG=webm:1408
>
> Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
>

TBR=jzern@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com
BUG=webm:1408

Change-Id: Ifeb81460856d1d56482d4e0477a70ee98f8bfaa6
2017-04-17 11:02:02 -07:00
James Zern
e9b7f98c56 Revert "vp9: Avoid encoder loopfilter for non-reference frames."
This reverts commit 863f860bfcf3bdc26eeecb299aa481d0f63d11ac.

This causes encoder / decoder mismatches in various
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests

BUG=webm:1408

Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
2017-04-14 11:50:06 -07:00
Marco
863f860bfc vp9: Avoid encoder loopfilter for non-reference frames.
Useful for SVC, where the top layer enhancement frames may
not update any reference buffers, as is the case for the
patterns in the 1 pass CBR SVC when #temporal_layers > 1.

~3% encoder speedup for SVC patterns with temporal layers
in 1 pass CBR mode.

Updated the SVC datarate tests for the mismatch frames.
Set the frame-dropper off in some tests with #temporal_layers > 1
so we can correctly set #mismatch frames. Adjusted rate target
threshold for tests where frame-dropper was turned off.

Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793
2017-04-13 09:51:55 -07:00
Jerome Jiang
705fc9f107 Merge "Refactor: Clean memory allocation for copy partition." 2017-04-06 02:57:08 +00:00
Jerome Jiang
58ba880b94 Refactor: Clean memory allocation for copy partition.
Move the memory allocation from setting speed features.

Change-Id: I2e89dfaeb46daee63effe5a5df62feed732aa990
2017-04-05 15:33:24 -07:00
Jerome Jiang
fb60204d4c vp9: Remove legacy comments for avg_source_sad.
Change-Id: Ia6e8614535a097f17f37fc382cef8e22e03b70f6
2017-04-04 16:28:27 -07:00
Marco
6b3f4bc794 vp9: 1 pass CBR: cleanup to cyclic refresh.
Code cleanup: merged two functions that were doing postencode
update for cylic refresh, remove some unused code and fix comments.

No change in behavior.

Change-Id: I9be0d7e346d34dec29bf4e5bb380a7bf81c8480a
2017-04-03 16:37:45 -07:00
Marco
fc83fcb7c4 vp9: SVC: fix to allow output of denoised result.
Change-Id: Iaf55cfb5e9621d074eb33d6a32f184e4777968f8
2017-03-29 14:02:54 -07:00
Marco
66c6b4d6fc vp9: 1 pass: Move source sad computation into encodeframe loop.
Refactor to split the 1 passs source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).

This allows the source sad computation for CBR mode to be
multi-threaded.

No change in compression.

Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
2017-03-27 11:11:05 -07:00
Marco
07ad5a15c2 vp9: Fix to condition on using source_sad for 1 pass real-time.
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.

Added unittest for SVC with screen-content on.

Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
2017-03-24 10:21:47 -07:00
Marco
06c8713e89 vp9: Use sb content measure to bias against golden.
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turnoff GF search in non-rd pickmode.

Only enabled for speed >= 8.

avgPNSR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.

Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
2017-03-20 12:42:26 -07:00
Jerome Jiang
bf40776aa4 Merge "Refactor: Change cpi->resize_state to enum values." 2017-03-16 16:43:42 +00:00
Marco
a340c64a79 vp9: Fix some issues with denoiser and SVC.
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.

Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
2017-03-15 17:19:17 -07:00
Jerome Jiang
b5f7f7737a Refactor: Change cpi->resize_state to enum values.
Change-Id: Iab1409b0fc1175bc5a14afc4749a08c536c98c41
2017-03-15 17:16:17 -07:00
Jerome Jiang
27d5a57072 Merge "vp9: Using source sad for speedup for dynamic resizing." 2017-03-15 00:03:52 +00:00
Jerome Jiang
2fa7092808 Merge "vp9: Enable row multithreading for SVC in real-time mode." 2017-03-14 23:29:46 +00:00
Jerome Jiang
02463273c9 vp9: Using source sad for speedup for dynamic resizing.
Only for speed >= 7.

Change-Id: I3ac85fbb4023cf7e6f8333806b345b0174382a09
2017-03-14 15:47:19 -07:00
Marco
f0a22b23fe vp9: Fix to source_sad feature for SVC.
Allow speed feature sf->use_source_sad to be used
on highest spatial layer for SVC.

Change-Id: I260eb0478902764f49f83e43b17024fe86ff3b22
2017-03-13 11:00:40 -07:00
Marco
ffb3c50da1 vp9: Enable row multithreading for SVC in real-time mode.
Enable row-mt for SVC for real-time mode, speed >=5.

Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.

For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup.

Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
2017-03-10 01:01:07 +00:00
James Zern
2f31a16445 move vp9_scale_and_extend_frame_c to vp9_frame_scale.c
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
2017-03-08 21:13:50 -08:00
Marco
45de35fc58 vp9: Fix for denoising with SVC.
Fix the conditon for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
2017-03-08 09:45:58 -08:00
Vignesh Venkatasubramanian
453f18040f vp9,realtime: Enable row multithreading for non-rd
Enable row level multithreading for realtime encodes where non-rd
path is used (speed >= 5).

Change-Id: I5439cb49a02171166d8e1de06c7d5e6f8e819a41
2017-03-02 11:03:56 -08:00
Vignesh Venkatasubramanian
5881601488 vp9: Rename new_mt to row_mt
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.

Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
2017-02-27 09:43:26 -08:00
Jerome Jiang
0998a146d4 Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8.
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.

Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.

BUG=webm:1371

Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
2017-02-23 20:40:28 -08:00
Yunqing Wang
66f36f4735 Merge "Refactored the row based multi-threading code" 2017-02-22 16:55:04 +00:00
Jerome Jiang
b1dcaf7f1e Merge "Fix segmentation fault caused by denoiser working with spatial SVC." 2017-02-22 04:44:55 +00:00
Marco
7f2daa74a0 vp9: Incorporate source sum_diff into non-rd partition thresholds.
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.

Small change on metrics for rtc set: less than ~0.1 avgPNSR decrease
on RTC set, for both speed 7 and 8.

Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
2017-02-21 17:22:11 -08:00
Jerome Jiang
0d1e5a21c4 Fix segmentation fault caused by denoiser working with spatial SVC.
Re-enable the affected test.
BUG=webm:1374

Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-21 09:38:28 -08:00
Ranjit Kumar Tulabandu
97d6a4cbd1 Refactored the row based multi-threading code
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness

Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
2017-02-20 16:13:45 +05:30
paulwilkins
d218b0914e cosmetics: Fix spelling mistake in compile flag name.
agressive -> aggressive

after:
ce7b38459 Aggressive VBR method.

Change-Id: Ie0f30b1bbc77ed9f32bec047b4a9b3d0cf4853f5
2017-02-16 14:51:31 -08:00
paulwilkins
945ccfee59 Additional first pass stats.
Added counts that split the intra coded blocks into low and high variance.

Change-Id: Ic540144b34d5141659081bb22f7ee16fd6861f14
2017-02-15 10:44:37 +00:00
Paul Wilkins
7635ee0f37 Merge "Aggressive VBR method." 2017-02-15 10:37:02 +00:00
Ranjit Kumar Tulabandu
71061e9332 Row based multi-threading of encoding stage
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.

Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.

Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
2017-02-15 00:49:34 +00:00
paulwilkins
ce7b38459a Aggressive VBR method.
VBR method that allows a wider Q range for the first normal frame
in each ARF group and then centers the min - max range for the rest of
the arf group on the chosen Q value for that first frame.

This allows for quite rapid adjustment of the active Q range even if the
initial estimate is poor.

In some cases where the ARF frames themselves are tending to
undershoot but the normal frames are overshooting this can still give
net undershoot. This can be corrected by allowing a larger Q delta for
arf frames but is usually is a sign that the allocation to the arfs was to
high.

Change-Id: Icec87758925d8f7aeb2dca29aac0ff9496237469
2017-02-13 15:42:11 +00:00
Paul Wilkins
c3f095c8b3 Merge "Fix to avoid abrupt relaxation of max qindex in recode path" 2017-02-09 17:17:55 +00:00