17183 Commits

Author SHA1 Message Date
Linfeng Zhang
b90166665f Merge "Slow pshufb removal in 3 intra prediction functions." 2016-06-03 16:35:14 +00:00
Jacky Chen
a8c8bf1c99 Merge "vp9: Fix valgrind failure for short circuit on low temporal variance block." 2016-06-03 16:09:36 +00:00
Geza Lore
1354c6942c Compute rate of partition type accurately for edge blocks.
This patch factors in the different partition coding syntax used for
right and bottom edge blocks when doing RD search.

Change-Id: I2f31650512b6a4a7a2c03352414693aff6fbf87b
2016-06-03 06:43:34 -07:00
Debargha Mukherjee
353930d212 Merge "Add 1D version of vpx_sum_squares_i16" into nextgenv2 2016-06-03 13:27:50 +00:00
Debargha Mukherjee
5590c48937 Merge "Move template specializations into .cc from .h" into nextgenv2 2016-06-03 13:27:43 +00:00
Debargha Mukherjee
cfa03374f8 Merge "Factor out x86 SIMD intrinsic synonyms" into nextgenv2 2016-06-03 13:27:29 +00:00
Debargha Mukherjee
1e160ce559 Merge "Factor out model_rd_from_sse" into nextgenv2 2016-06-03 13:27:22 +00:00
Debargha Mukherjee
cbf51c5ba0 Merge "Pre-compute and use contiguous wedge masks." into nextgenv2 2016-06-03 13:27:02 +00:00
paulwilkins
45a26dd9c8 Slightly more damped VBR adjustment.
Increase in the damping used in adjusting the active Q range.
This does hurt rate accuracy a little in a few extreme cases
especially if the clip is very short*, but helps metrics.

* Note that the adjustment is applied at the GF/ARF group level based
on what happened in the last group.  Hence for very short clips where
the length of a single group may be a significant % of the clip length
there is still scope for some drift that cannot be accommodated.

In practice most data points in our test sets are now much closer to target
than was previously the case with default settings, and in some cases are
better even than they were when the command line undershoot and overshoot
parameters were set very low (e.g. 2%). For example in bridge_close at high rates
the old mechanism was unable to adapt enough to prevent extreme overshoot.

Change-Id: I634f8f0e015b5ee64a9f0ccaa2bcfdbc1d360489
2016-06-03 13:19:51 +01:00
paulwilkins
552fd02cf0 Change to get_twopass_worst_quality()
Change to the calculation of the error divisor used in
get_twopass_worst_quality(). This follows on from other
changes to the rate control that impact the output of this
function.

Change-Id: I414fa9aa1e6a68a64dccea17c3712f44b8a0c10c
2016-06-03 13:18:29 +01:00
paulwilkins
f9865d1701 Removed unused data structure.
Removed unused element from TWOPASS data structure.

Change-Id: I9b662fd8eea727a7978055bc14f7c7328f048a5e
2016-06-03 13:18:09 +01:00
paulwilkins
c7ac2f3864 Adjustment to VBR rate correction.
Changes to the function that redistributes bits from overshoot
or undershoot throughout the rest of the clip to respond more
quickly.

Change-Id: I90f10900cdd82cf2ce1d8da4b6f91eb5934310da
2016-06-03 13:17:43 +01:00
paulwilkins
cd700e1ab9 Adjustment calculation of active worst quality.
Added a factor based on the bit spend in the last arf group vs the
target to adjust the choice of the active worst quality in subsequent
groups.

Helps clips where previously there was a big overshoot or undershoot
to adapt and get closer to the target rate.

Change-Id: I67034b801679b99024409489a2273ea6fe23b8e6
2016-06-03 13:17:21 +01:00
paulwilkins
4328b08521 Remove gf_zeromotion_pct.
The use of this value is preventing rate adjustment on clips
or sections that have very little motion but high noise and
this can give rise to some sections with massive overshoot.

Change-Id: I9a65c7c1148dc5d3a7d8b23e50fc1733f3661621
2016-06-03 12:13:03 +01:00
Geza Lore
f19700fe52 Add 1D version of vpx_sum_squares_i16
Change-Id: I0d7bda2fe6f995a9e88a9f66540b4979b3f7fab1
2016-06-03 09:34:55 +01:00
Geza Lore
5a69ee0e11 Move template specializations into .cc from .h
Change-Id: I6d8775c1fa228fde25016a401e3c22a8e3da42f9
2016-06-03 09:34:55 +01:00
Geza Lore
9ebca46933 Factor out x86 SIMD intrinsic synonyms
Change-Id: Idc4ac3ccd2ba19087cdb74a3e4a6774ac50386aa
2016-06-03 09:34:55 +01:00
Geza Lore
73bc3119be Factor out model_rd_from_sse
Change-Id: Ia60ff0ecc8d083870fadbfe07d494d1e2c080489
2016-06-03 09:34:55 +01:00
Geza Lore
ab29978e9f Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with the overhead of a second loop.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
2016-06-03 00:16:22 -07:00
James Zern
462e0ff88b vpx_dsp,add_noise: remove mmx implementation
a sse2 version exists, this is a reasonable modern baseline.

Change-Id: If31d36c8412d25b53f41b4a93cf02f46802c0c33
2016-06-02 23:51:22 -07:00
James Zern
eea8ea88ab vpx_dsp: remove mmx variance implementations
there are sse2 equivalents for all remaining variance implementations

Change-Id: I10b947e73fc0067688181f819b59e47966bec3d2
2016-06-02 23:46:16 -07:00
James Zern
7aef9790cf Merge "ivfdec: tolerate invalid framerates" 2016-06-03 03:15:01 +00:00
JackyChen
891dbe1e52 vp9: Fix valgrind failure for short circuit on low temporal variance block.
Add check for actual split before using the variance of the split.

Change-Id: If0f93248be0b16d17738675d16c90516054dad2b
2016-06-02 15:56:58 -07:00
Debargha Mukherjee
17c4f1c7f5 Merge "Use standard rounding in combine_interintra." into nextgenv2 2016-06-02 19:29:16 +00:00
Debargha Mukherjee
7534a15c3a Merge "Warped motion functions added" into nextgenv2 2016-06-02 19:28:03 +00:00
Linfeng Zhang
ad0646cb84 Slow pshufb removal in 3 intra prediction functions.
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).

Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
2016-06-02 10:55:58 -07:00
JackyChen
a32f341539 Disable short circuit feature for low temporal variance.
The feature fails in libvpx_unit_tests-valgrind. Will re-enable it after
fixing the issue.

Change-Id: I8ba132f04e98f4615b31fbff2097eda83c5e42bc
2016-06-02 09:45:00 -07:00
Linfeng Zhang
10969dfc6e Merge "Update filter_selectively_vert_row2()" 2016-06-02 16:22:21 +00:00
Yaowu Xu
100dfc9eab Merge "firstpass.c: fix an UBSAN/IOC error" 2016-06-02 16:20:06 +00:00
Geza Lore
888e90e823 Use standard rounding in combine_interintra.
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.

Change-Id: I04e92850bc69a7d7a07b06e3d2ce97f6f2ada321
2016-06-02 16:26:05 +01:00
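
The rounding rule described in 888e90e823 above (the halfway value is rounded up rather than down) can be sketched for a division by a power of two. This is an illustration under stated assumptions: the helper names and the 6-bit blend weight are hypothetical, inputs are assumed non-negative, and the code is not taken from combine_interintra.

```c
#include <assert.h>
#include <stdint.h>

/* Round half up: add half the divisor before shifting right by n. */
static int32_t round_half_up_pow2(int32_t value, int n) {
  assert(value >= 0 && n > 0 && n < 31);
  return (value + (1 << (n - 1))) >> n;
}

/* Round half down, for contrast: the bias is one less than half the
 * divisor, so an exact halfway value falls to the lower integer. */
static int32_t round_half_down_pow2(int32_t value, int n) {
  assert(value >= 0 && n > 0 && n < 31);
  return (value + (1 << (n - 1)) - 1) >> n;
}

/* Hypothetical weighted blend of an intra and an inter pixel, with the
 * weight expressed in 1/64 units and the result rounded half up. */
static uint8_t blend_pixel(uint8_t inter, uint8_t intra, int weight) {
  assert(weight >= 0 && weight <= 64);
  return (uint8_t)round_half_up_pow2(intra * weight + inter * (64 - weight),
                                     6);
}
```

The two helpers differ only on exact halfway inputs: 3 / 2 = 1.5 becomes 2 with `round_half_up_pow2(3, 1)` and 1 with `round_half_down_pow2(3, 1)`.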
Yaowu Xu
fd500f955f firstpass.c: fix an UBSAN/IOC error
Change-Id: I579286e6741b689ae4281a35beb7b8f95c3ffce5
2016-06-02 00:31:32 +00:00
jackychen
bacc67f4a8 vp9: Skip some modes when variance is low for big blocks, for 1 pass real-time.
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance obtained from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32. It gives a ~2.5%
speed-up with less than 0.1% PSNR drop for the rtc test set. No visual
regression observed.

Change-Id: I70efbc95a1007231ae36f02c5b2fbf6cd35077ad
2016-06-01 13:54:18 -07:00
Linfeng Zhang
b26232eb1b Update filter_selectively_vert_row2()
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.

Change-Id: I9351509922855d8896ddef1ed093b3ca12619a61
2016-06-01 11:20:47 -07:00
Alex Converse
380c4ee32d Merge "segmentation: Don't use uninitialized probability data." into nextgenv2 2016-06-01 17:50:37 +00:00
Marco Paniconi
204809bfb3 Merge "vp9: Skip computation of best_sad for newmv, unless needed." 2016-06-01 17:37:29 +00:00
Yaowu Xu
6382727dc5 Fix UBSAN/IOC errors
1. test/dct16x16_test.cc
2. test/dct32x32_test.cc
3. test/fdct8x8_test.cc

BUG=webm:1225

Change-Id: I9c9315fbd65ddb3b44f688e01ba265fd22192198
2016-06-01 16:01:18 +00:00
Yaowu Xu
787b38ebb9 Fix VP8 encoder UBSAN/IOC errors
1. vp8/decoder/dboolhuff.c
2. vp8/decoder/dboolhuff.h
3. vp8/encoder/bitstream.c
4. vp8/encoder/boolhuff.h
5. vp8/encoder/rdopt.c

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1218

Change-Id: I5d315d63fd7aeaee6f3bd79178e593f3db38a6b1
2016-06-01 16:00:56 +00:00
James Zern
e5e2932cb3 ivfdec: tolerate invalid framerates
default invalid framerates to 30, quiets warnings in corrupt / fuzzed
files

Change-Id: Ib10d2b67df83cb6f9ed1cd6ef8e0e637aa7099ff
2016-05-31 17:37:59 -07:00
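
The fallback described in e5e2932cb3 above (default invalid framerates to 30) might look like the sketch below. The struct, the function name, and the definition of "invalid" (a non-positive numerator or denominator) are assumptions made for illustration; the actual validity checks in ivfdec are not shown in the commit message.

```c
#include <assert.h>

/* Hypothetical frame rate expressed as a fraction, e.g. 30000/1001. */
struct frame_rate {
  int numerator;
  int denominator;
};

/* Returns the rate unchanged when it looks usable, otherwise 30/1.
 * The validity test here is an assumption, not the ivfdec rule. */
static struct frame_rate sanitize_frame_rate(struct frame_rate rate) {
  if (rate.numerator <= 0 || rate.denominator <= 0) {
    rate.numerator = 30;
    rate.denominator = 1;
  }
  return rate;
}
```

Substituting a default instead of rejecting the file lets a decoder keep going on corrupt or fuzzed input, where the header fields cannot be trusted.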
Alex Converse
6bae20ca43 Merge "Replace some vpxbool calls with entropy coder agnostic calls." into nextgenv2 2016-05-31 23:58:00 +00:00
Yaowu Xu
46ff1072b3 variance_avx2.c: UBSAN/IOC fix
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1222

Change-Id: Ifb3bedf9b4e1b007b21aebaa4beb9ba50424efef
2016-05-31 16:44:35 -07:00
Alex Converse
7a6cb59dbb segmentation: Don't use uninitialized probability data.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I17b76fcf0d8c191850350d5aa50dcc007b8b0cdc
2016-05-31 16:42:29 -07:00
Hui Su
afaefc89eb Merge "ext-intra: speed up keyframe encoding" into nextgenv2 2016-05-31 23:21:03 +00:00
Hui Su
118167a47d Merge "Add a speed feature for inter tx type search" into nextgenv2 2016-05-31 23:20:57 +00:00
Hui Su
60b52a1334 Merge "Add a speed feature for intra tx type search" into nextgenv2 2016-05-31 23:20:52 +00:00
James Zern
1d9cf262f7 Merge "vp10_inv_txfm2d_test: fix memory leak" into nextgenv2 2016-05-31 23:19:47 +00:00
Alex Converse
aee0091161 Replace some vpxbool calls with entropy coder agnostic calls.
Change-Id: Ifbcd0714fcf994c43b69255185456c7a255df66c
2016-05-31 15:42:19 -07:00
Debargha Mukherjee
faf3c2cd38 Warped motion functions added
Change-Id: I5064ef1421e17c3ecafe70e7ff1fc7db0c16cc8f
2016-05-31 14:03:23 -07:00
hui su
fa933553da ext-intra: speed up keyframe encoding
130% speed increase for keyframe encoding, with 0.4%
compression loss.

When kf-max-dist=150, 1.5% speed increase with 0.03%
compression loss.

Change-Id: I4cf7314ab95b9eb6dd17f314aca8955522c82676
2016-05-31 10:34:44 -07:00
hui su
f523d7b540 Add a speed feature for inter tx type search
Separate prediction mode and tx type search for inter
modes. Enabled for speed >=1.

baseline:
speed increase     40%
compression drop   0.30%/0.29% on lowres/midres

ext-tx:
speed increase    160%
compression drop  1.08%/0.95% on lowres/midres

Change-Id: Ieb34b1ee80df6980d16e26a5783e08cc0deae55b
2016-05-31 10:34:35 -07:00
hui su
38e6dd71bb Add a speed feature for intra tx type search
Add a speed feature to separate prediction mode and tx type search
for intra modes: search for best intra prediction mode with fixed
default tx type first, then choose the best tx type for the
selected mode.

Coding performance drop:
baseline
  lowres 0.10% midres 0.08% hdres 0.14%
with ext-tx
  lowres 0.14% midres 0.25% hdres 0.20%

Speed improvement is 20% for baseline and 17% for ext-tx.

It is turned on for speed >= 1.

Change-Id: Ia5e8d39e8a4e2e42c521bfde938f8b6a98ab24f9
2016-05-31 10:33:56 -07:00
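
The two-stage search described in 38e6dd71bb above (pick the best prediction mode with a fixed default tx type, then pick the best tx type for that mode) can be sketched as follows. Everything here is hypothetical: the cost callback, the result struct, and the toy cost function stand in for the encoder's RD cost and are not libvpx code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RD cost callback: lower is better. */
typedef int64_t (*rd_cost_fn)(int mode, int tx_type);

typedef struct {
  int mode;
  int tx_type;
  int64_t cost;
  int evaluations; /* number of cost evaluations performed */
} search_result;

/* Stage 1: best mode using default_tx only.
 * Stage 2: best tx type for that mode (default_tx is already known). */
static search_result two_stage_search(rd_cost_fn cost, int num_modes,
                                      int num_tx_types, int default_tx) {
  search_result best = { 0, default_tx, INT64_MAX, 0 };
  assert(num_modes > 0 && default_tx >= 0 && default_tx < num_tx_types);
  for (int m = 0; m < num_modes; ++m) {
    const int64_t c = cost(m, default_tx);
    ++best.evaluations;
    if (c < best.cost) {
      best.cost = c;
      best.mode = m;
    }
  }
  for (int t = 0; t < num_tx_types; ++t) {
    if (t == default_tx) continue;
    const int64_t c = cost(best.mode, t);
    ++best.evaluations;
    if (c < best.cost) {
      best.cost = c;
      best.tx_type = t;
    }
  }
  return best;
}

/* Toy cost with its minimum at mode 2, tx type 1. */
static int64_t toy_cost(int mode, int tx_type) {
  return 10 * (int64_t)(mode - 2) * (mode - 2) +
         (int64_t)(tx_type - 1) * (tx_type - 1);
}
```

With 10 modes and 4 tx types, `two_stage_search(toy_cost, 10, 4, 0)` performs 10 + 3 = 13 evaluations, against 40 for an exhaustive joint search. The saving comes at a price: the joint optimum is missed whenever the best mode under the default tx type is not the best mode overall, which is consistent with the small coding performance drop the commit reports.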