1015 Commits

Author SHA1 Message Date
Geza Lore
73bc3119be Factor out model_rd_from_sse
Change-Id: Ia60ff0ecc8d083870fadbfe07d494d1e2c080489
2016-06-03 09:34:55 +01:00
Geza Lore
ab29978e9f Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of  operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
2016-06-03 00:16:22 -07:00
Alex Converse
380c4ee32d Merge "segmentation: Don't use uninitialized probability data." into nextgenv2 2016-06-01 17:50:37 +00:00
Alex Converse
6bae20ca43 Merge "Replace some vpxbool calls with entropy coder agnostic calls." into nextgenv2 2016-05-31 23:58:00 +00:00
Alex Converse
7a6cb59dbb segmentation: Don't use uninitialized probability data.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I17b76fcf0d8c191850350d5aa50dcc007b8b0cdc
2016-05-31 16:42:29 -07:00
Alex Converse
aee0091161 Replace some vpxbool calls with entropy coder agnostic calls.
Change-Id: Ifbcd0714fcf994c43b69255185456c7a255df66c
2016-05-31 15:42:19 -07:00
hui su
fa933553da ext-intra: speed up keyframe encoding
130% speed increase for keyframe encoding, with 0.4%
compression loss.

When kf-max-dist=150, 1.5% speed increase with 0.03%
compression loss.

Change-Id: I4cf7314ab95b9eb6dd17f314aca8955522c82676
2016-05-31 10:34:44 -07:00
hui su
f523d7b540 Add a speed feature for inter tx type search
Seperate prediction mode and tx type search for inter
modes. Enabled for speed >=1.

baseline:
speed increase     40%
compression drop   0.30%/0.29% on lowres/midres

ext-tx:
speed increase    160%
compression drop  1.08%/0.95% on lowres/midres

Change-Id: Ieb34b1ee80df6980d16e26a5783e08cc0deae55b
2016-05-31 10:34:35 -07:00
hui su
38e6dd71bb Add a speed feature for intra tx type search
Add a speed feature to seperate prediction mode and tx type search
for intra modes: search for best intra prediction mode with fixed
default tx type first, then choose the best tx type for the
selected mode.

Coding performance drop:
baseline
  lowres 0.10% midres 0.08% hdres 0.14%
with ext-tx
  lowres 0.14% midres 0.25% hdres 0.20%

Speed improvement is 20% for baseline and 17% for ext-tx.

It is turned on for speed >= 1.

Change-Id: Ia5e8d39e8a4e2e42c521bfde938f8b6a98ab24f9
2016-05-31 10:33:56 -07:00
Zoe Liu
e89ca180c2 Make the bi-predictive frame group interval adjustable
This is for the bidir-pred experiment. Previously the length of the
bi-predictive frame group interval is fixed at 2, i.e. one
bi-predictive frame may be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the use of the backward ref will be turned off if the
bi-predictive frame group interval is larger than the golden frame
group.

Further, an additional rate factor level has been added:
INTER_LOW
, which applies to LAST_BIPRED_UPDATE frames that are not used as
references.

Change-Id: I5514d34a64dd486bbb5756c2d0612946f598a789
2016-05-28 16:46:45 -07:00
Linfeng Zhang
af7fb17c09 Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.

Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.

Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.

Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-27 09:51:16 -07:00
hui su
e5f47d4334 ext-intra: refactor mode info. writing and reading
No performance changes.

Change-Id: I001068330ea217a993aee9b79d7ffead0d23100e
2016-05-26 14:56:40 -07:00
Hui Su
88eaf5d6ce Merge "Skip unnecessary calculations in ext-intra" into nextgenv2 2016-05-26 18:03:02 +00:00
Yi Luo
cb507ff29a Merge "HBD inverse HT 8x8 and 16x16 sse4.1 optimization" into nextgenv2 2016-05-24 22:06:07 +00:00
Zoe Liu
cf5083d4cd Added an experiment "bidir_pred" for backward prediction
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.

Note that when the experiment of "ext_refs" turned on, this experiment
will be turned off automatically currently.

RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:

lowres: Avg -3.312; BDRate -3.154
derflr: Avg -1.927; BDRate -1.176
midres: Avg -2.149; BDRate -2.001
hdres : Avg -0.567; BDRate -0.588

Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
2016-05-24 13:55:57 -07:00
Yi Luo
28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
hui su
4a741a5d5c Skip unnecessary calculations in ext-intra
Around 5% speedup.

Change-Id: I1c552e4e58fbf5637c0b5a97dd2cc4f83a1ca201
2016-05-23 17:24:19 -07:00
Zoe Liu
a63147ae77 Fix --test-decode=warn to test mismatch
This patch always compares the most recent show frames between
the encoder and the decoder to test the mismatch.

Change-Id: I68a91ad0996a598231450debfd616e24992419b5
2016-05-23 17:01:53 -07:00
Debargha Mukherjee
fa5022978d Merge "Wedge refactoring to handle signs better" into nextgenv2 2016-05-20 23:19:39 +00:00
Debargha Mukherjee
e5de2ad632 Wedge refactoring to handle signs better
Mostly refactoring. Handles signs better though results are
more or less neutral.

Change-Id: If499537c8f8da4f34d104ebfda072eb4c85fb12f
2016-05-20 14:12:52 -07:00
Yaowu Xu
a9fc1cc257 Fix a build issue
When both obmc and dual_filter is enabled.

Change-Id: I56b127573a6cca31469bb357cf7a6a9c3df64071
2016-05-19 14:24:41 -07:00
Yue Chen
a33e3d12cb Merge "Fix obmc + ext-interp interference" into nextgenv2 2016-05-19 18:08:52 +00:00
Jingning Han
936ed0804d Merge "Account sub8x8 block reference filter type for prob context" into nextgenv2 2016-05-19 15:04:31 +00:00
Geza Lore
009bd1153e Fix obmc + ext-interp interference
With ext-interp, a switchable interpolation filter is coded iff the
motion vector uses fractional pixel movement (ie, true subpixel
movement). With ext-interp and obmc enabled at the same time, the RD
search proceeds as:
1. Do motion search
2. Do interpolation filter search iff subpixel motion, otherwise use
   EIGHTTAP_REGULAR
3. Evaluate obmc=0
4. Evaluete obmc=1 - This involves another motion search

If the motion search in step 4 yields an integer motion vector, while
the search in step 1 did not, then an interp_filter value other than
EIGHTTAP_REGULAR is invalid, and will cause an assertion failure
at output time, or a mismatch if not using --enable-debug.

The fix sets the interp_filter to EIGHTTAP_REGULAR if obmc=1 is picked
with an integer motion vector.

Change-Id: I4685d1ad537f41d833dc9eb64845956b67886cca
2016-05-19 11:30:07 +01:00
Zoe Liu
011f020447 Refactor on getting upsampled reference frame
Reused a function that has been used in getting the normal
reference frames.

Change-Id: Ic4f7dac5c396d689a72699ab79fd580747f8bd65
2016-05-18 16:00:23 -07:00
Jingning Han
9161464f6c Account sub8x8 block reference filter type for prob context
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.

The coding performance gains of dual filter type coding scheme are
lowres  0.57%
hdres   0.88%

Change-Id: I68b98f2518d02f11c29d0256aeb45b2580fe5cac
2016-05-18 12:35:31 -07:00
Angie Chiang
6f28581b26 Turn on flip in inverse txfm2d
Fix build failed
Reduce txfm test time

Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6
2016-05-18 11:26:57 -07:00
Yi Luo
18ecb16c30 Merge "Integrate HBD row/column flip fwd txfm SSE4.1 optimization" into nextgenv2 2016-05-18 17:45:45 +00:00
Debargha Mukherjee
f1ddf6eb04 Merge "Reducing computation of interintra modes" into nextgenv2 2016-05-18 17:21:15 +00:00
Yi Luo
1d307368a9 Integrate HBD row/column flip fwd txfm SSE4.1 optimization
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
  block, for experiment, EXT_TX.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.

Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
2016-05-18 03:48:01 +00:00
Debargha Mukherjee
049dbe7786 Reducing computation of interintra modes
Use model for interintra mode search.
Speed-up about 5-10% with about 0.04 drop in efficiency.

lowres: -2.60%

Change-Id: I825bf0ba8a46eb7f19fc528c25b8df066fb8ea95
2016-05-17 07:28:06 -07:00
James Zern
a81a75184c Merge "vp10/rdopt,rd_pick_intra4x4block: port tsan fix from vp9" into nextgenv2 2016-05-17 03:04:00 +00:00
James Zern
8eba4ac46e vp10/rdopt,rd_pick_intra4x4block: port tsan fix from vp9
minus the non-existent nonrd portion. original change:

commit d642294b1c57a5adacb1038ff45766c38bae8a6d
Author: Jingning Han <jingning@google.com>
Date:   Thu Feb 11 12:36:49 2016 -0800

    Fix tsan error in VP9 sub8x8 intra mode search

    This commit fixes issue 1141. The issue was triggered in multi-tile
    encoding. The change properly saves and restores the block context
    information in the real-time mode selection process. It removes
    several redundant memcpy operations in sub8x8 intra block mode
    search.

    Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010

Change-Id: Idfa38c54c9e645479f6870d46f71fb1e91c071da
2016-05-16 17:20:29 -07:00
Jingning Han
4677e1a718 Unify the per directional filter type system for compound modes
For the current stage, we assume a single prediction filter type
per direction in the settings of compound inter prediction modes.

Change-Id: I12a1afdd364b93fcee870bd11ad01fc40ab48cff
2016-05-16 14:41:08 -07:00
Jingning Han
d567e14e81 Enable per motion component filter type selection
Change-Id: I73823fc94f296d225dece7156de71b30bae3fcb7
2016-05-16 14:38:43 -07:00
Debargha Mukherjee
fb8ea1736b Various wedge enhancements
Increases number of wedges for smaller block and removes
wedge coding mode for blocks larger than 32x32.

Also adds various other enhancements for subsequent experimentation,
including adding provision for multiple smoothing functions
(though one is used currently), adds a speed feature that decides
the sign for interinter wedges using a fast mechanism, and refactors
wedge representations.

lowres: -2.651% BDRATE

Most of the gain is due to increase in codebook size for 8x8 - 16x16.

Change-Id: I50669f558c8d0d45e5a6f70aca4385a185b58b5b
2016-05-16 12:41:47 -07:00
Angie Chiang
1e587ae616 Merge "Add flip option for vp10_fwd_txfm2d_#x#_c" into nextgenv2 2016-05-12 18:08:28 +00:00
Alex Converse
ccf4f47b99 Merge changes I412c24aa,I28a8bbf0
* changes:
  mcomp: Remove an obsolete undef.
  mcomp: Remove an obsolete comment.
2016-05-11 20:03:21 +00:00
Geza Lore
c1b739014f Cost wedge sign/index properly in rdopt.
Lowres improves by about 0.1%

lowres: -2.164 BDRATE

Change-Id: I393bbb92700bfbb8763ace424f4edc2d672a74b4
2016-05-11 11:59:10 -07:00
Yaowu Xu
a45596cff7 Merge "Added a measure of rc drift." 2016-05-11 18:02:00 +00:00
Yue Chen
372e12b959 Merge "Add single motion search for OBMC predictor" into nextgenv2 2016-05-11 17:20:32 +00:00
Paul Wilkins
5fd142e763 Merge "Fixed 8K two pass encoder crash." 2016-05-11 16:25:25 +00:00
paulwilkins
45df87ca57 Added a measure of rc drift.
Added actual and absolute rate miss values to the opsnr.stt
stats output line.

Changes to the borg graphing may be needed before merge.

Change-Id: I1e9d548ce445d29002f0c59ebfd3957a6f15e702
2016-05-11 15:15:07 +01:00
paulwilkins
65732c36a8 Fixed 8K two pass encoder crash.
Bug found by Yunqing relating to the correction for size at 8K and
above in get_twopass_worst_quality().

The basis for the correction was changed to the linear size relative to
1080P as a baseline and the adjustment has been clamped to prevent
problems at extreme images sizes.

For 1080P the results on our test sets were neutral but the low res and
mid res sets saw a small gain (0.1%-0.2% average).

I would also expect some gains on 4k and larger content where the
previous correction was overly aggressive.

Change-Id: I30b026b5f4535e9601e3178d738066459d19c8fb
2016-05-11 14:45:50 +01:00
Yue Chen
370f203a40 Add single motion search for OBMC predictor
Weighted single motion search is implemented for obmc predictor.
When NEWMV mode is used, to determine the MV for the current block,
we run weighted motion search to compare the weighted prediction
with (source - weighted prediction using neighbors' MVs), in which
the distortion is the actual prediction error of obmc prediction.

Coding gain: 0.404/0.425/0.366 for lowres/midres/hdres
Speed impact: +14% encoding time
              (obmc w/o mv search 13%-> obmc w/ mv search 27%)

Change-Id: Id7ad3fc6ba295b23d9c53c8a16a4ac1677ad835c
2016-05-10 18:27:45 -07:00
Angie Chiang
1954fa390f Add flip option for vp10_fwd_txfm2d_#x#_c
Will add unit test to test/vp10_fwd_txfm2d_test.cc later

Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362
2016-05-10 18:14:57 -07:00
Alex Converse
6dd5ec7efb mcomp: Remove an obsolete undef.
The macro was removed in 6724676.

Change-Id: I412c24aac49bd1ff60a331a30933e0d8ae3f2dd5
2016-05-10 18:04:24 -07:00
Alex Converse
7764f8af3e mcomp: Remove an obsolete comment.
This was copied over from VP8. VP9 doesn't seem to do this buffer copy.

Change-Id: I28a8bbf0503a7f99b2cb60620ab3674adde863bb
2016-05-10 18:04:24 -07:00
Yaowu Xu
dc73c3332e Merge "Move count buffers from stack to heap" into nextgenv2 2016-05-10 23:58:59 +00:00
Jingning Han
005564813d Merge "Remove unused highbd_fdct32x32 function" into nextgenv2 2016-05-10 23:16:41 +00:00