454 Commits

Author SHA1 Message Date
Yue Chen
1273c39c03 Fix RDO issue of obmc + speed feature fast_inter_tx_type_search
Change-Id: I86a967ad2d824ca7877626eed9eb11f0e057b22d
2016-06-20 16:38:12 +00:00
Jingning Han
887f020691 Fix unit test failure in obmc exp
Properly restore the rate cost in the inner search loop of obmc
prediction. This avoids unexpected encoding behavior. It fixes
the unit test failure in obmc experiment:

AltRefForcedKeyTestLarge.Frame1IsKey/2

Change-Id: I667b219dfcf2f2c63d9d984900ed3cfd10c354bd
2016-06-17 17:44:03 -07:00
Zoe Liu
5805a14ca6 Merge bi-predictive frames to EXT_REFS
This patch removed the experiment of BIDIR_PRED and merged the feature
into the experiment of EXT_REFS:

(1) Each frame now has up to 6 reference frames, namely
    LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME, (forward) and
    BWDREF_FRAME, ALTREF_FRAME (backward);
    LAST4_FRAME has been removed;
(2) First pass still keeps the 8 updates:
    KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE, and
    BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE;
(3) show_existing_frame==1 is supported in the experiment of EXT_REFS;
(4) New encoding modes are added for both single-ref and compound cases,
    through the use of the 2 extra forward references (LAST2 & LAST3)
    and the 1 extra backward reference (BWDREF).

RD performance wise, using Overall PSNR: Avg/BDRate
        Bipred only      Prev EXT_REFS    Current EXT_REFS with bipred
lowres: -3.474/-3.324    -1.748/-1.586    -4.613/-4.387
derflr: -2.097/-1.353    -1.439/-1.215    -3.120/-2.252
midres: -2.129/-1.901    -1.345/-1.185    -2.898/-2.636

If in vp10/encoder/firstpass.h, change BFG_INTERVAL from 2 to 3, i.e. to
use 2 bi-predictive frames than 1, a further improvement may be
obtained:
                 Current EXT_REFS with bipred
        1 bi-predictive frame    2 bi-predictive frames
lowres: -4.613/-4.387            -4.675/-4.465
derflr: -3.120/-2.252            -3.333/-2.516
midres: -2.898/-2.636            -3.406/-3.095

Change-Id: Ib06fe9ea0a5cfd7418a1d79b978ee9d80bf191cb
2016-06-17 12:43:39 -07:00
Alex Converse
cf7968dcea Cleanup dist_block()
Change-Id: Iff0c0548924efd5a01c3a301cc5b4cdfda42e87e
2016-06-16 15:27:22 -07:00
Debargha Mukherjee
095c88e470 Merge "Fix estimate_wedge_sign with high bit-depth." into nextgenv2 2016-06-15 16:43:51 +00:00
Jingning Han
c457fc3553 Merge "Rework transform quantization pipeline" into nextgenv2 2016-06-15 16:07:10 +00:00
Jingning Han
08bf788ebd Merge "Refactor the trellis optimization process" into nextgenv2 2016-06-15 16:07:03 +00:00
Geza Lore
d3df694fa8 Fix estimate_wedge_sign with high bit-depth.
This fixes some crashes in
VP10/EndToEndTestLarge.EndtoEndPSNRTest/ with high bit-depth and
ext-inter.

Change-Id: I10f0f08e1be4bd5c388616074d4aa3f91a2fda7a
2016-06-15 10:52:29 +01:00
Hui Su
1f493d1ff8 Merge "Speed up ext-intra inter frame encoding" into nextgenv2 2016-06-15 02:46:06 +00:00
Hui Su
e7fb03c8ae Merge "ext-intra: refactor rd loop in interframe" into nextgenv2 2016-06-15 02:46:00 +00:00
Jingning Han
1faf288798 Rework transform quantization pipeline
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres  0.73%

The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres  3.3%

The overall coding gains are:
lowres 1.1%
midres 1.8%
hdres  2.3%

Change-Id: Iaec1a3a4c1d5eac883ab526ed076d957060479dd
2016-06-14 16:32:04 -07:00
Hui Su
69f6fd2134 Merge "Fix rate cost calculation for ext-intra" into nextgenv2 2016-06-14 23:11:25 +00:00
hui su
8c3b3d3686 Handle intra modes when tx type speed feature is enabled
Change-Id: I9dc156214f3b3ded33ab30d558124b3151548161
2016-06-14 13:46:53 -07:00
hui su
8f9c9b28a8 Speed up ext-intra inter frame encoding
Skip filter intra mode search when regular intra modes have large
rd cost.

Encoding speed improvement:  8%.

Compression performance drop: 0.02%  / 0.09%  / 0.03% on
                              lowres / midres / hdres

Change-Id: I94d3e48781bff6ae6895a54f271dd65c959bb976
2016-06-14 13:46:17 -07:00
hui su
70566f0563 ext-intra: refactor rd loop in interframe
Move filter intra modes search to the end, after regular
mode search.

On average no performance changes.

Change-Id: I9293c8fdf706ebf831fbd61c6bb81959790f4848
2016-06-14 13:46:17 -07:00
hui su
7fa61d7d51 Fix rate cost calculation for ext-intra
It was broken by commit 8ee640f979.

Change-Id: I26b9eba810c74849b0805e64da2d269ab0685cb9
2016-06-14 13:46:17 -07:00
Debargha Mukherjee
902ee5060c A crash fix for supertx / ext-inter combination.
Change-Id: I9860376c98aa3b25f5bf86ed13d4a7631fa6b153
2016-06-13 13:57:30 -07:00
Jingning Han
a9a8c5993b Refactor the trellis optimization process
Speed up the trellis optimization unit by 10%.

Change-Id: If055f6c0589a405c008d2900bb8fbc11b1246f66
2016-06-13 12:19:57 -07:00
Jingning Han
4588676cfb Merge "Trellis based adaptive quantization" into nextgenv2 2016-06-13 17:36:19 +00:00
Debargha Mukherjee
81f8b3f31c Merge "Some refactoring to support warped motion mode" into nextgenv2 2016-06-10 23:18:39 +00:00
Jingning Han
25ca322957 Trellis based adaptive quantization
This commit combines uniform quantizer with trellis based coefficient
level optimization. It improves the codebase compression performance:

lowres 0.8%
midres 1.0%
hdres  1.6%

Note that the current trellis optimization unit is using C code. This
will make the cost of the overall quantization process slower. A number
of optimizations will come up next.

Change-Id: Id441dd238e4844409d0f08f82604be777f3f5282
2016-06-10 12:56:14 -07:00
Debargha Mukherjee
03be30ba3e Some refactoring to support warped motion mode
Change-Id: I15d54a3ae48b2b33082668116792c6595bdb3ddb
2016-06-10 12:04:18 -07:00
Sarah Parker
a21afd421b Move new quant experiment from nextgen
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a laplacian distribution of the coeficcients.

Performance Gain:
derflr: 0.15%
hevcmr: 0.675%

Change-Id: I25234244e3bcd94b87c1f77cf682190b61c8ef94
2016-06-10 08:06:22 -07:00
Angie Chiang
95340fccb3 Revert "Optimize wedge partition selection."
This reverts commit efda2831e5f758b4f350679b5c55c0b9282449b0.

This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0

Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53
2016-06-09 19:17:37 -07:00
Jingning Han
cedf90a9d6 Merge "Remove swap buffer speed feature" into nextgenv2 2016-06-08 19:45:54 +00:00
Jingning Han
0d6980d7a1 Remove swap buffer speed feature
The inter prediction residual can undergo different transform types
during the rate-distortion optimization search. The assumption used
in this speed feature no longer holds true. This commit removes the
related code to clean up the codebase and clear out unit test
failure in higher speed setting.

Change-Id: I7f7cd4df2345ed3e607c9fae75b38cd2dbde0cac
2016-06-08 11:27:00 -07:00
Jingning Han
b48eb90023 Merge "Add tx type speed feature to recursive transform block partitioning" into nextgenv2 2016-06-07 23:44:01 +00:00
Jingning Han
0b7f864213 Merge "Rework the tx type speed feature" into nextgenv2 2016-06-07 23:42:43 +00:00
Jingning Han
33dafdb58b Add tx type speed feature to recursive transform block partitioning
Change-Id: I45440a72b4287d98cbe21b72defc67138a8eb953
2016-06-07 11:34:30 -07:00
Jingning Han
9a858e868c Rework the tx type speed feature
This commit re-works the transform type speed feature. It moves
the transform type selection outside of the coding mode loop. This
avoids repeated motion search if the best prediction mode is
chosen as NEWMV. It improves the speed performance for clips that
contain more motion activities.

For mobile_cif at 1000 kbps, this makes the baseline encoding 7%
faster and makes the encoding with dynamic motion vector referencing
scheme enabled 10% faster.

Change-Id: I93e2714b3e461303372c4b66a4134ee212faffd1
2016-06-07 11:32:27 -07:00
Zoe Liu
5414abb4a0 Fix a RD performance bug in bipredictive frames
This patch will make sure the use of the BWDREF_FRAME for the
encoding of both the two types of bipredictive frames, namely
LAST_BIPRED_UPDATE and BIPRED_UPDATE. To realize it, the
updates on the cpi->ref_frame_flags have been moved to before
the encoding of one frame, instread of originally handled after
the encoding of one frame.

RD performance has been improved slightly, approximately by 0.17%
compared to before the applying of this patch:

lowres: Avg -3.474; BDRate -3.324
derflr: Avg -2.097; BDRate -1.353

Change-Id: I0aa19afd752293e345489fbff104c4351ca5498c
2016-06-07 09:45:10 -07:00
Debargha Mukherjee
13155e7725 Merge "Optimize wedge partition selection." into nextgenv2 2016-06-07 09:50:13 +00:00
Jingning Han
3713949b6d Merge "Make ref-mv experiment support ActiveMap" into nextgenv2 2016-06-06 16:06:41 +00:00
Geza Lore
efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains about ~4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
Debargha Mukherjee
b85d0adadf Merge "Always include the cost of tx size in rate for Y." into nextgenv2 2016-06-03 22:57:17 +00:00
Debargha Mukherjee
33c57e6223 Merge "Check if sub8x8 rd stats are valid before reusing them." into nextgenv2 2016-06-03 22:38:56 +00:00
Jingning Han
27d8a948c1 Make ref-mv experiment support ActiveMap
Reset the ref_mv_idx and predicted motion vector when the coding
block belongs to skip segment.

Change-Id: I5746ab315a436b829b64a1a25121989d3c11c995
2016-06-03 15:04:18 -07:00
Geza Lore
b87078d51e Always include the cost of tx size in rate for Y.
The transform can only be skipped if both Y and U/V can be skipped, so
we always include the cost of tx size in the rate for Y. This will
get later subtracted if the transform is actually skipped.

Change-Id: I136a223e5596f18b69bb9f743e7e08438183a215
2016-06-03 11:51:35 -07:00
Geza Lore
d9870c32a9 Check if sub8x8 rd stats are valid before reusing them.
Change-Id: I5d49f15a07de58c226d4003b4691e001abf1f3f8
2016-06-03 11:47:34 -07:00
Geza Lore
8ee640f979 Compute cost of UV mode accurately for intra blocks.
We used to cache the cost of the UV mode from the search with a
different previously tried Y mode, but the UV mode is contexted
on the Y mode, so caching the cost is inaccurate.

Change-Id: Ib003510afb6fc9befb7808b67b0be64f1c0a0804
2016-06-03 11:13:51 -07:00
Geza Lore
73bc3119be Factor out model_rd_from_sse
Change-Id: Ia60ff0ecc8d083870fadbfe07d494d1e2c080489
2016-06-03 09:34:55 +01:00
Geza Lore
ab29978e9f Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of  operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
2016-06-03 00:16:22 -07:00
hui su
fa933553da ext-intra: speed up keyframe encoding
130% speed increase for keyframe encoding, with 0.4%
compression loss.

When kf-max-dist=150, 1.5% speed increase with 0.03%
compression loss.

Change-Id: I4cf7314ab95b9eb6dd17f314aca8955522c82676
2016-05-31 10:34:44 -07:00
hui su
f523d7b540 Add a speed feature for inter tx type search
Seperate prediction mode and tx type search for inter
modes. Enabled for speed >=1.

baseline:
speed increase     40%
compression drop   0.30%/0.29% on lowres/midres

ext-tx:
speed increase    160%
compression drop  1.08%/0.95% on lowres/midres

Change-Id: Ieb34b1ee80df6980d16e26a5783e08cc0deae55b
2016-05-31 10:34:35 -07:00
hui su
38e6dd71bb Add a speed feature for intra tx type search
Add a speed feature to seperate prediction mode and tx type search
for intra modes: search for best intra prediction mode with fixed
default tx type first, then choose the best tx type for the
selected mode.

Coding performance drop:
baseline
  lowres 0.10% midres 0.08% hdres 0.14%
with ext-tx
  lowres 0.14% midres 0.25% hdres 0.20%

Speed improvement is 20% for baseline and 17% for ext-tx.

It is turned on for speed >= 1.

Change-Id: Ia5e8d39e8a4e2e42c521bfde938f8b6a98ab24f9
2016-05-31 10:33:56 -07:00
Zoe Liu
e89ca180c2 Make the bi-predictive frame group interval adjustable
This is for the bidir-pred experiment. Previously the length of the
bi-predictive frame group interval is fixed at 2, i.e. one
bi-predictive frame may be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the use of the backward ref will be turned off if the
bi-predictive frame group interval is larger than the golden frame
group.

Further, an additional rate factor level has been added:
INTER_LOW
, which applies to LAST_BIPRED_UPDATE frames that are not used as
references.

Change-Id: I5514d34a64dd486bbb5756c2d0612946f598a789
2016-05-28 16:46:45 -07:00
Hui Su
88eaf5d6ce Merge "Skip unnecessary calculations in ext-intra" into nextgenv2 2016-05-26 18:03:02 +00:00
Zoe Liu
cf5083d4cd Added an experiment "bidir_pred" for backward prediction
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.

Note that when the experiment of "ext_refs" turned on, this experiment
will be turned off automatically currently.

RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:

lowres: Avg -3.312; BDRate -3.154
derflr: Avg -1.927; BDRate -1.176
midres: Avg -2.149; BDRate -2.001
hdres : Avg -0.567; BDRate -0.588

Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
2016-05-24 13:55:57 -07:00
hui su
4a741a5d5c Skip unnecessary calculations in ext-intra
Around 5% speedup.

Change-Id: I1c552e4e58fbf5637c0b5a97dd2cc4f83a1ca201
2016-05-23 17:24:19 -07:00
Debargha Mukherjee
fa5022978d Merge "Wedge refactoring to handle signs better" into nextgenv2 2016-05-20 23:19:39 +00:00