Commit Graph

6435 Commits

Author SHA1 Message Date
Marco
defe094e9e vp9: Fix an issue with setting variance thresholds.
From commit:
https://chromium-review.googlesource.com/c/441393/

On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.

Small change in RTC metrics and speed.

Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
2017-02-27 12:09:51 -08:00
Vignesh Venkatasubramanian
5881601488 vp9: Rename new_mt to row_mt
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.

Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
2017-02-27 09:43:26 -08:00
Jerome Jiang
e96ab22462 Merge "Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8." 2017-02-24 16:56:33 +00:00
Johann
904b957ae9 consolidate block_error functions
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implemention was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.

In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.

Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-24 05:25:26 +00:00
Johann Koenig
aa911e8b41 Merge "block error sse2: use tran_low_t" 2017-02-24 05:24:34 +00:00
Jerome Jiang
0998a146d4 Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8.
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.

Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.

BUG=webm:1371

Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
2017-02-23 20:40:28 -08:00
Johann
3c16bbb73b block error sse2: use tran_low_t
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
2017-02-24 01:33:35 +00:00
Marco Paniconi
1d12a125e7 Merge "vp9: 1pass CBR: modify condition for reducing loop filter." 2017-02-23 03:24:26 +00:00
Jerome Jiang
a6b6258284 Merge "vp9: Non-rd pickmode: use simple block_yrd under some conditons." 2017-02-22 23:19:29 +00:00
Marco
84f106f198 vp9: 1pass CBR: modify condition for reducing loop filter.
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.

Only affects 1 pass CBR.

Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
2017-02-22 15:09:45 -08:00
Marco
7e7d820d5b vp9: Non-rd pickmode: use simple block_yrd under some conditons.
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.

Disabled for now.

Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
2017-02-22 13:22:53 -08:00
Marco Paniconi
0acc270830 Merge "vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0." 2017-02-22 19:52:24 +00:00
Marco
7e79831016 vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0.
This prevent possible reduction of cyclic refresh after key frame.

Change-Id: Idd4e49b69cd95476e7eccfa31b2bd8669569e9e8
2017-02-22 10:50:08 -08:00
Jerome Jiang
3d1fa00fce vp9: Only compute y_sad for golden in variance partition for speed < 8.
Only affects speed 8. No obvious quality regression. Systematic speed
ups by ~1% on Nexus 6.

Change-Id: Ia904ca28ea041c3281c532911ec38fb7d7f46a17
2017-02-22 10:19:09 -08:00
Yunqing Wang
66f36f4735 Merge "Refactored the row based multi-threading code" 2017-02-22 16:55:04 +00:00
Jerome Jiang
b1dcaf7f1e Merge "Fix segmentation fault caused by denoiser working with spatial SVC." 2017-02-22 04:44:55 +00:00
Marco
7f2daa74a0 vp9: Incorporate source sum_diff into non-rd partition thresholds.
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.

Small change on metrics for rtc set: less than ~0.1 avgPNSR decrease
on RTC set, for both speed 7 and 8.

Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
2017-02-21 17:22:11 -08:00
Johann Koenig
1e224dcb83 Merge "Drop zbin_ptr and quant_shift_ptr" 2017-02-21 18:16:38 +00:00
Jerome Jiang
0d1e5a21c4 Fix segmentation fault caused by denoiser working with spatial SVC.
Re-enable the affected test.
BUG=webm:1374

Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-21 09:38:28 -08:00
Paul Wilkins
4d4231352c Merge "Change to prediction decay calculation." 2017-02-21 09:42:38 +00:00
Marco
4e1ba35458 vp9: Fix for non-rd pickmode for high-bitdepth build.
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.

For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).

This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.

Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
2017-02-20 20:25:36 -08:00
Ranjit Kumar Tulabandu
97d6a4cbd1 Refactored the row based multi-threading code
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness

Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
2017-02-20 16:13:45 +05:30
paulwilkins
a63adac604 Change to prediction decay calculation.
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here his that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.

The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)

Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
2017-02-17 09:29:24 +00:00
James Zern
b5bc9ee02d Merge "cosmetics: Fix spelling mistake in compile flag name." 2017-02-17 00:04:42 +00:00
Johann Koenig
a9b81da575 Merge "block error avx2: use tran_low_t" 2017-02-16 23:51:14 +00:00
paulwilkins
d218b0914e cosmetics: Fix spelling mistake in compile flag name.
agressive -> aggressive

after:
ce7b38459 Aggressive VBR method.

Change-Id: Ie0f30b1bbc77ed9f32bec047b4a9b3d0cf4853f5
2017-02-16 14:51:31 -08:00
Johann
ca4e27f5da Drop zbin_ptr and quant_shift_ptr
vp9[_highbd]_quantize]_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.

scan is used for C code and iscan is used for SIMD implementations.

Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
2017-02-16 13:20:32 -08:00
Johann
2104454607 block error avx2: use tran_low_t
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 12:39:02 -08:00
Johann Koenig
cc43012674 Merge changes I267050a5,Iebade0ef,Id96a8df3
* changes:
  quantize_fp_32x32 highbd ssse3: enable existing function
  quantize_fp highbd ssse3: use tran_low_t for coeff
  quantize_fp highbd sse2: use tran_low_t for coeff
2017-02-16 20:34:48 +00:00
Yunqing Wang
0bf6b51572 Merge "Structured the mode ordering code to avoid redundant memcpy" 2017-02-16 16:22:54 +00:00
Johann
4682130b60 quantize_fp highbd ssse3: use tran_low_t for coeff
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 07:40:56 -08:00
Johann
ac3996a6d1 quantize_fp highbd sse2: use tran_low_t for coeff
Change-Id: Id96a8df33354a7987ce890a3d6798c7375ffa4aa
2017-02-16 07:40:55 -08:00
Johann
44600442dc bitdepth conversion: really use num elements
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.

Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
2017-02-16 15:02:48 +00:00
Ranjit Kumar Tulabandu
5127e58dab Structured the mode ordering code to avoid redundant memcpy
Change-Id: I4f5d6b54018bd1928cd9e5e42619e6f55b334803
2017-02-16 14:12:33 +00:00
Paul Wilkins
60a10116d1 Merge "Disconnect ARF breakout from frame boost." 2017-02-16 10:02:09 +00:00
Paul Wilkins
543ebc900f Merge "Remove unnecessary factor." 2017-02-16 10:01:58 +00:00
Paul Wilkins
9216ba58d8 Merge "Bug in scale_sse_threshold()" 2017-02-16 10:01:46 +00:00
Paul Wilkins
e6c1993f1b Merge "Additional first pass stats." 2017-02-16 09:39:29 +00:00
Marco
158b300952 vp9: Some code cleanup for aq-mode = 3.
The weight segment needs to only be computed once per frame,
so remove it from the funciton vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.

Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db
2017-02-15 14:07:04 -08:00
Marco
f82280820a vp9. Use same source_sad threshold for all speeds.
Only affects real-time mode.

Change-Id: Iba836f110c4da936f5173cc0f54424d5b6121bff
2017-02-15 11:28:26 -08:00
Marco
716c1d5ff5 Vp9: Speed 8 aq-mode=3: Reduce computation in estimating bits per mb.
vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.

Change-Id: If25933715783f31104a18a5092ea347b1221b5f5
2017-02-15 09:28:16 -08:00
paulwilkins
cfc79a357a Disconnect ARF breakout from frame boost.
This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.

The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.

The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.

Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c
2017-02-15 10:46:14 +00:00
paulwilkins
b89ba05ab4 Remove unnecessary factor.
Removed unnecessary scaling factor to simplify.

Change-Id: I3fc9c5975a2597e72f1324e09dd586dea1facfa7
2017-02-15 10:45:43 +00:00
paulwilkins
76550dfdc0 Bug in scale_sse_threshold()
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.

SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.

Hence the threshold used for the test should also be scaled up.

Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
2017-02-15 10:45:03 +00:00
paulwilkins
945ccfee59 Additional first pass stats.
Added counts that split the intra coded blocks into low and high variance.

Change-Id: Ic540144b34d5141659081bb22f7ee16fd6861f14
2017-02-15 10:44:37 +00:00
Paul Wilkins
7635ee0f37 Merge "Aggressive VBR method." 2017-02-15 10:37:02 +00:00
Johann Koenig
61927ba4ac Merge "vp9 fdct higbd neon: connect existing highbd calls" 2017-02-15 01:33:00 +00:00
Yunqing Wang
f2c1aea118 Merge "Row based multi-threading of encoding stage" 2017-02-15 00:54:10 +00:00
Ranjit Kumar Tulabandu
71061e9332 Row based multi-threading of encoding stage
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.

Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.

Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
2017-02-15 00:49:34 +00:00
Johann
3e7aa8fda9 vp9 fdct higbd neon: connect existing highbd calls
Change-Id: Ia8f822bd6e70b3911bc433a5a750bfb6f9a3a75c
2017-02-14 22:11:49 +00:00