Commit Graph

6459 Commits

Author SHA1 Message Date
Jerome Jiang
27d5a57072 Merge "vp9: Using source sad for speedup for dynamic resizing." 2017-03-15 00:03:52 +00:00
Jerome Jiang
2fa7092808 Merge "vp9: Enable row multithreading for SVC in real-time mode." 2017-03-14 23:29:46 +00:00
Jerome Jiang
02463273c9 vp9: Using source sad for speedup for dynamic resizing.
Only for speed >= 7.

Change-Id: I3ac85fbb4023cf7e6f8333806b345b0174382a09
2017-03-14 15:47:19 -07:00
James Zern
1b91f41935 Merge "vp9/encoder: fix segfault on win32 using vs < 2015" 2017-03-14 19:21:42 +00:00
Yunqing Wang
c3e290963d Merge "Apply machine learning-based early termination in VP9 partition search" 2017-03-14 18:07:05 +00:00
Marco Paniconi
78a6946904 Merge "vp9: Speed >= 8: Enable simple_block_yrd speed feature." 2017-03-14 17:50:17 +00:00
Marco
c0c789ab50 vp9: Adjust copy partition threshold, for speed 8.
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.

Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
2017-03-14 09:18:53 -07:00
Marco
c216c8d6f2 vp9: Speed >= 8: Enable simple_block_yrd speed feature.
Enable speed feature for resolutions > VGA.
avgPSNR on RTC down by ~1.7%.
Speedup on ARM: ~5%.

Change-Id: I7a3fe5f7425aa8df3f4a2eced1afa355bc0d4c95
2017-03-14 09:10:28 -07:00
Marco
f0a22b23fe vp9: Fix to source_sad feature for SVC.
Allow speed feature sf->use_source_sad to be used
on highest spatial layer for SVC.

Change-Id: I260eb0478902764f49f83e43b17024fe86ff3b22
2017-03-13 11:00:40 -07:00
Yunqing Wang
670101439f Apply machine learning-based early termination in VP9 partition search
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.

The patch was rebased to the top-of-tree.

Borg test BDRATE result:
4k set:     PSNR: +0.144%; SSIM: +0.043%;
hdres set:  PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;

Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.

Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
2017-03-13 09:54:18 -07:00
Marco
8c18df7fcd vp9: Fix condition for intra search in non-rd pickmode.
Fixes an issue when the LAST and golden is not used as a reference,
in which case its possible no encoding mode is set (since intra may be
skipped under certain codtions). Fix is to make sure intra is searched
if no inter mode is checked.

Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c

Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
2017-03-12 22:30:39 -07:00
James Zern
c09b290cea vp9/encoder: fix segfault on win32 using vs < 2015
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.

https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code

BUG=webm:1054

Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
2017-03-10 17:37:17 -08:00
Marco
ffb3c50da1 vp9: Enable row multithreading for SVC in real-time mode.
Enable row-mt for SVC for real-time mode, speed >=5.

Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.

For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup.

Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
2017-03-10 01:01:07 +00:00
James Zern
cb60e66085 Merge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c" 2017-03-09 22:51:08 +00:00
James Zern
2f31a16445 move vp9_scale_and_extend_frame_c to vp9_frame_scale.c
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
2017-03-08 21:13:50 -08:00
Marco
ea3c817ac2 vp9: Enable two speed features for SVC real-time mode.
Enable short_circuit_low_temp_var and limit_newmv_early_exit
for SVC, 1 pass CBR mode.

Change-Id: I77df2b2c6cc40657bb8ea76e19dfc2fdaad6389e
2017-03-08 16:13:59 -08:00
Yunqing Wang
099e9bf1ff Make the partition search early termination feature to be frame size dependent
The 2 thresholds(i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch made this
feature to be frame size dependent consistently throughout the code.

Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
2017-03-08 12:56:41 -08:00
Marco
45de35fc58 vp9: Fix for denoising with SVC.
Fix the conditon for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
2017-03-08 09:45:58 -08:00
Alex Converse
15dac923b9 Merge "Narrow cat6_high_cost tables to uint16_t" 2017-03-03 23:45:39 +00:00
Alex Converse
bcd12de6c3 Narrow cat6_high_cost tables to uint16_t
Saves 2688 bytes of rodata.

Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
2017-03-03 23:09:12 +00:00
Vignesh Venkatasubramanian
9e7140b451 Merge "vp9,realtime: Enable row multithreading for non-rd" 2017-03-03 19:05:52 +00:00
Marco
b60617f5ff vp9: Speed 8: reduce the adaptive_rd_thresh level.
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decreaee in speed (~1-2% on mac).

Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
2017-03-02 13:34:10 -08:00
Vignesh Venkatasubramanian
453f18040f vp9,realtime: Enable row multithreading for non-rd
Enable row level multithreading for realtime encodes where non-rd
path is used (speed >= 5).

Change-Id: I5439cb49a02171166d8e1de06c7d5e6f8e819a41
2017-03-02 11:03:56 -08:00
James Zern
8697d14ec8 Revert "Fix for max qindex calculation of a gf interval"
This reverts commit d3db846cc5.

This change causes a large drop in psnr (4-5db) on low framerate
difficult content (tested at 360/480p)

BUG=b/35804225

Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0
2017-02-28 16:26:13 -08:00
Marco
defe094e9e vp9: Fix an issue with setting variance thresholds.
From commit:
https://chromium-review.googlesource.com/c/441393/

On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.

Small change in RTC metrics and speed.

Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
2017-02-27 12:09:51 -08:00
Vignesh Venkatasubramanian
5881601488 vp9: Rename new_mt to row_mt
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.

Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
2017-02-27 09:43:26 -08:00
Jerome Jiang
e96ab22462 Merge "Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8." 2017-02-24 16:56:33 +00:00
Johann
904b957ae9 consolidate block_error functions
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implemention was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.

In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.

Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-24 05:25:26 +00:00
Johann Koenig
aa911e8b41 Merge "block error sse2: use tran_low_t" 2017-02-24 05:24:34 +00:00
Jerome Jiang
0998a146d4 Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8.
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.

Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.

BUG=webm:1371

Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
2017-02-23 20:40:28 -08:00
Johann
3c16bbb73b block error sse2: use tran_low_t
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
2017-02-24 01:33:35 +00:00
Marco Paniconi
1d12a125e7 Merge "vp9: 1pass CBR: modify condition for reducing loop filter." 2017-02-23 03:24:26 +00:00
Jerome Jiang
a6b6258284 Merge "vp9: Non-rd pickmode: use simple block_yrd under some conditons." 2017-02-22 23:19:29 +00:00
Marco
84f106f198 vp9: 1pass CBR: modify condition for reducing loop filter.
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.

Only affects 1 pass CBR.

Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
2017-02-22 15:09:45 -08:00
Marco
7e7d820d5b vp9: Non-rd pickmode: use simple block_yrd under some conditons.
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.

Disabled for now.

Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
2017-02-22 13:22:53 -08:00
Marco Paniconi
0acc270830 Merge "vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0." 2017-02-22 19:52:24 +00:00
Marco
7e79831016 vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0.
This prevent possible reduction of cyclic refresh after key frame.

Change-Id: Idd4e49b69cd95476e7eccfa31b2bd8669569e9e8
2017-02-22 10:50:08 -08:00
Jerome Jiang
3d1fa00fce vp9: Only compute y_sad for golden in variance partition for speed < 8.
Only affects speed 8. No obvious quality regression. Systematic speed
ups by ~1% on Nexus 6.

Change-Id: Ia904ca28ea041c3281c532911ec38fb7d7f46a17
2017-02-22 10:19:09 -08:00
Yunqing Wang
66f36f4735 Merge "Refactored the row based multi-threading code" 2017-02-22 16:55:04 +00:00
Jerome Jiang
b1dcaf7f1e Merge "Fix segmentation fault caused by denoiser working with spatial SVC." 2017-02-22 04:44:55 +00:00
Marco
7f2daa74a0 vp9: Incorporate source sum_diff into non-rd partition thresholds.
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.

Small change on metrics for rtc set: less than ~0.1 avgPNSR decrease
on RTC set, for both speed 7 and 8.

Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
2017-02-21 17:22:11 -08:00
Johann Koenig
1e224dcb83 Merge "Drop zbin_ptr and quant_shift_ptr" 2017-02-21 18:16:38 +00:00
Jerome Jiang
0d1e5a21c4 Fix segmentation fault caused by denoiser working with spatial SVC.
Re-enable the affected test.
BUG=webm:1374

Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-21 09:38:28 -08:00
Paul Wilkins
4d4231352c Merge "Change to prediction decay calculation." 2017-02-21 09:42:38 +00:00
Marco
4e1ba35458 vp9: Fix for non-rd pickmode for high-bitdepth build.
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.

For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).

This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.

Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
2017-02-20 20:25:36 -08:00
Ranjit Kumar Tulabandu
97d6a4cbd1 Refactored the row based multi-threading code
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness

Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
2017-02-20 16:13:45 +05:30
paulwilkins
a63adac604 Change to prediction decay calculation.
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here his that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.

The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)

Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
2017-02-17 09:29:24 +00:00
James Zern
b5bc9ee02d Merge "cosmetics: Fix spelling mistake in compile flag name." 2017-02-17 00:04:42 +00:00
Johann Koenig
a9b81da575 Merge "block error avx2: use tran_low_t" 2017-02-16 23:51:14 +00:00
paulwilkins
d218b0914e cosmetics: Fix spelling mistake in compile flag name.
agressive -> aggressive

after:
ce7b38459 Aggressive VBR method.

Change-Id: Ie0f30b1bbc77ed9f32bec047b4a9b3d0cf4853f5
2017-02-16 14:51:31 -08:00