81 Commits

Author SHA1 Message Date
Yaowu Xu
7e89c102c4 vp9-highbitdepth -> vpx-highbitdepth
Change-Id: I1e90cf7ab4bb02c0ef119b0bd1596771edefedff
2016-08-05 15:41:33 -07:00
Debargha Mukherjee
e5848dea5a Rectangular transforms 4x8 & 8x4
Added a new expt rect-tx to be used in conjunction with ext-tx.
[rect-tx is a temporary config flag and will eventually be
merged into ext-tx once it works correctly with all other
experiments].

Added 4x8 and 8x4 tranforms for use initially with rectangular
sub8x8 y blocks as part of this experiment.

There is about a -0.2% BDRATE improvement on lowres, others pending.

When var-tx is on rectangular transforms are currently not used.
That will be enabled in a subsequent patch.

Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
2016-07-21 10:46:41 -07:00
Jingning Han
2ad40b89b3 Align the quantizers for inter/inter modes in the first pass coding
Use regular extended zero bin quantizer for both inter and intra
modes in the first pass. This doesn't affect lowres and midres
significantly, but would bring back 0.9% coding gains for hdres.

Change-Id: Ifa5977fa7b141fc5be595c0f3a4fc81a93f6606f
2016-07-18 10:16:03 -07:00
Debargha Mukherjee
5f8ea94c1f Remove unused zcoeff_blk
from PICK_MODE_CONTEXT and MACROBLOCK

Change-Id: I42f98ce51871948244bdcaaaeb3d0191622116ae
2016-07-14 12:36:03 -07:00
Geza Lore
0b9b3d8643 Add a few branch hints to vp10_optimize_b.
vp10_optimize_b now takes between 40% to 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.

Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(dependig on bit-rate) when compiled with gcc/clang.

Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
2016-07-08 19:20:35 +01:00
Jingning Han
7c393d097f Merge "Fix ioc in trellis optimization with hbd" into nextgenv2 2016-07-08 01:11:17 +00:00
Jingning Han
07d35de056 Fix ioc in trellis optimization with hbd
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.

Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
2016-07-07 12:00:38 -07:00
Debargha Mukherjee
a85e84599b Remove redundant code in new_quant
Change-Id: Ie2534c7c0cc3fc59e7389b55cb066f2b347d846e
2016-07-07 11:55:20 -07:00
Debargha Mukherjee
a35597fc7f Various cosmetics on the new_quant experiment
Also extends quant profiles to include quality range.

Change-Id: Ia96e45b6425e1d42ca61fc401f63d4fd7214e448
2016-06-29 13:18:52 -07:00
Geza Lore
92922be83c Remove skip_txfm optimization.
Commit 0d6980d7a1caa592058f8d5d618b012c160772f7 removed some use
of the skip_txfm optimization, and the rest are not productive.

The current use of this optimization is only used with --good
and --cpu-used >= 3, however the overhead of this is higher than the
speedup it yields.

Removing this, and subsequently simplifying model_rd_for_sb yields
a net encoder speedup:
--cpu-used=0    ~1.5% faster
--cpu-used=3    ~2.0% faster

The code simplification is also significant.

Change-Id: I1dd668c32de15a2e912c59c42379d0f9e1032ff8
2016-06-28 10:03:03 +01:00
Debargha Mukherjee
f3dfa0c36a Quantization fix for new-quant/var-tx
Also use the fp quantizer consistently

lowres: -0.07 BDRATE improvement

Change-Id: I9174f6ad54a74d38541004b99cb3689d0c09be55
2016-06-27 17:22:09 -07:00
Jingning Han
813201e174 Disable trellis optimized quantization in the first-pass
This resolves the use of uninitialized value in the first-pass
encoding.

Change-Id: I78bc19214a1bfde5c5641424550cbbe4e52cae99
2016-06-27 12:46:07 -07:00
Sarah Parker
fbe6fb2773 Add multiple quantization profiles to new_quant experiment
Add the ability to pick between 3 quantization profiles.
The profile is chosen based on the entropy context at the
block level.

Change-Id: Iaea0485798441b7d635962c2563f3a477f582dac
2016-06-24 16:16:13 -07:00
Jingning Han
8aca4c3495 Enforce trellis optimization for 1-pass encoding
This fixes the unit test failure in the 1-pass settings of
EndToEndTestLarge.EndtoEndPSNRTest

bug=webm:1243

Change-Id: I7667c341f7c063f7ffb83786446bbbd1e498c1aa
2016-06-23 12:18:28 -07:00
Jingning Han
c797e709a2 Merge "Fix uninitialized context use case in supertx and var-tx" into nextgenv2 2016-06-22 05:47:45 +00:00
Jingning Han
d26815569f Fix uninitialized context use case in supertx and var-tx
This commit fixes the use of uninitialized context values in the
combination of supertx and var-tx.

Change-Id: I2d36badf5c9806ea402ce3e19515cc299e6b79e8
2016-06-22 00:46:22 +00:00
hui su
9981cb8b0f Remove an unnecessary if()
The condition of this if() is always true.

Change-Id: I251715d519414d1a3d0a78eb3d025df11d913298
2016-06-21 14:56:11 -07:00
hui su
e067755930 Skip optimizing larger coefficients in trellis quant module
This achieves a few percent speed increase without hurting
compression performance.

Change-Id: I040e9bb69274f7de843bdd15926a5c924b30a731
2016-06-21 14:55:52 -07:00
Jingning Han
5223a4b405 Handle two identical states in the trellis chain
When the next two states are identical, skip repeated cost table
fetch and multiplication operations. This makes the trellis unit
about 5% faster.

Change-Id: I0dbf7ad0a5732044e4e45dd59e9431a251c678f2
2016-06-20 16:59:28 -07:00
Jingning Han
019b750867 Use precise rate estimate for zero_token
This commit takes the precise rate estimate for zero_token rate
cost update. It improves the compression performance:

lowres 0.15%
midres 0.23%

Change-Id: I36761079f75ce43c814f8c663667e359d4ac2cd4
2016-06-17 10:57:30 -07:00
Jingning Han
90ea281f29 Optimize the use case of token_cost table
Reduce the cache footprint of the token_costs table.

Change-Id: Ie989e60c6479ac3251cadaac9c7e795ccba52f4e
2016-06-17 10:15:34 -07:00
Jingning Han
c187429865 Skip restore token_cache value
The trellis optimization is going backward. Hence there is no need
to restore the token_cache values that is behind the current node
in the scan order.

Change-Id: I4da8a2e3f78bf9630e6667c85d8c387c5d94de9a
2016-06-16 15:18:46 -07:00
Jingning Han
37bf29b916 Rework table access operations in vp10_optimize_b function
Localize table access. This provides another 10% speed-up to
the unit.

Change-Id: Ib902121f412f78e2bd501b9799c8c64462f803b5
2016-06-16 14:33:16 -07:00
Jingning Han
e9c44a76a2 Refactor trellis optimization process
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It also removes two memset operations on quantized
and dequantized coefficient sets.

The trellis coefficient optimization is on average running over
10% faster.

Change-Id: If3aa26d2a706c3012bf2b7ac059bf1825250e81f
2016-06-15 09:06:13 -07:00
Jingning Han
1faf288798 Rework transform quantization pipeline
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres  0.73%

The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres  3.3%

The overall coding gains are:
lowres 1.1%
midres 1.8%
hdres  2.3%

Change-Id: Iaec1a3a4c1d5eac883ab526ed076d957060479dd
2016-06-14 16:32:04 -07:00
Jingning Han
a9a8c5993b Refactor the trellis optimization process
Speed up the trellis optimization unit by 10%.

Change-Id: If055f6c0589a405c008d2900bb8fbc11b1246f66
2016-06-13 12:19:57 -07:00
Jingning Han
25ca322957 Trellis based adaptive quantization
This commit combines uniform quantizer with trellis based coefficient
level optimization. It improves the codebase compression performance:

lowres 0.8%
midres 1.0%
hdres  1.6%

Note that the current trellis optimization unit is using C code. This
will make the cost of the overall quantization process slower. A number
of optimizations will come up next.

Change-Id: Id441dd238e4844409d0f08f82604be777f3f5282
2016-06-10 12:56:14 -07:00
Sarah Parker
a21afd421b Move new quant experiment from nextgen
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a laplacian distribution of the coeficcients.

Performance Gain:
derflr: 0.15%
hevcmr: 0.675%

Change-Id: I25234244e3bcd94b87c1f77cf682190b61c8ef94
2016-06-10 08:06:22 -07:00
Jingning Han
025fa11c75 Take out skip_recode speed feature
The assumption doesn't hold true in the current codebase. Remove
this speed feature to simplify the codebase.

Change-Id: I9b69f484c9b7cd612b825047cc5b2fce63ee0af7
2016-06-08 18:27:36 +00:00
Yaowu Xu
0d7dc0cae1 Change to use proper type in vp10_token_state
"qc" in vp10_token_state is used to save quantized coefficients, this
commit changes the type from short to tran_low_t to properly reflect
the value range for highbitdepth build.

This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.

Change-Id: I914c6fd3d3f4b9d061f9ed7cc5f08a883ab59dcd
2016-05-04 11:59:10 -07:00
Alex Converse
8f2fa04181 Unbreak the non-var_tx build.
Change-Id: I76cc3d88122de42f035fbf6508bdf3fd7c995012
2016-04-21 13:27:19 -07:00
Debargha Mukherjee
53968c3917 Merge "Fix uninitialized blk_skip for VAR TX." into nextgenv2 2016-04-21 19:56:17 +00:00
hui su
ad59b08f76 Adjust optimize_b RD parameters
Coding gain:
lowres  0.44%
midres  0.24%
hdres   0.32%

Change-Id: Ie558203b2b2bf5c16cd49b114df3d696c4f35049
2016-04-19 09:54:08 -07:00
hui su
e43c21112d Enable optimize_b for intra blocks
Coding gain:
lowres  0.05%
midres  0.10%
hdres   0.18%

Change-Id: I508b150c02588f911a8ddddfe73c770f0819fe10
2016-04-19 09:50:45 -07:00
Geza Lore
7aa95be980 Fix uninitialized blk_skip for VAR TX.
x->blk_skip used to be uninitialized (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.

This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.

Also fixed an edge effect where encode_block in encodemb.c used the
formal width of the block (without cropping at the right edge), to
look up blk_skip, while select_tx_block in rdopt.c used the cropped
width to set blk_skip.

Change-Id: I76d0f49ac5ab3ab54203573e0d7fcfcc1c6aa10d
2016-04-19 17:00:20 +01:00
Geza Lore
8d64b53dc8 Revert "Fix uninitialized blk_skip for VAR TX."
This reverts commit e7b89d88354708790211ff3949fdc705a4fa1672.
2016-04-19 15:41:56 +01:00
Geza Lore
e7b89d8835 Fix uninitialized blk_skip for VAR TX.
x->blk_skip used to be uninitialzied (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when uning cm->tx_mode != TX_MODE_SELECT.

This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.

Change-Id: If39062927446798c626fc93694b4e6a4f35fa5da
2016-04-19 14:22:48 +01:00
Angie Chiang
027d12b7d6 Merge changes I359aa49c,Ic8ca5afb into nextgenv2
* changes:
  Generalize txfm scale in highbd quantizer
  Parameterize transform scale for quantizer
2016-04-12 18:02:05 +00:00
Geza Lore
511da8cbe5 Rename MI_BLOCK_SIZE and MI_MASK macros.
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*

There are no functional changes.

This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.

Change-Id: Iee08d97554cf4cc16a5dc166a3ffd1ab91529992
2016-03-31 09:57:41 +01:00
Angie Chiang
64413a6ca7 Parameterize transform scale for quantizer
This is to facilitate changing transform scale later

Change-Id: Ic8ca5afba57d2489ebd191ccc40c1b31605a0d8c
2016-03-30 15:25:26 -07:00
Geza Lore
552d5cd715 Extend superblock size fo 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Angie Chiang
46b234478f Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10
Change-Id: I996c48a90d7d71b52594a91a35cb8712c7fc212e
2016-03-28 11:08:40 -07:00
Angie Chiang
d9a0cbb1b7 Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10
Change-Id: Ie35bdbd7aafae693e3106d7ccbbdd8e65ee8800c
2016-03-23 12:05:12 -07:00
Geza Lore
f8cfb72a32 Refactor bsse and skip_txfm in MACROBLOCK.
Simple refactoring to 2 dimensional arrays, in preparation for 128
wide superblocks.

Change-Id: I40d447bd9fbd4f755534ea3cc82fc8f4676cea07
2016-03-18 15:30:10 +00:00
Geza Lore
efe7d4e5a2 Refactor mbmi->inter_tx_size to 2D array.
This is in preparation of increasing the superblock size.

Change-Id: I9197e397399fbe8aec1178a45ea0337dd90412d7
2016-03-18 15:30:09 +00:00
Alex Converse
b3ad81288f Port switch to 9-bit rate cost to vp10.
Brings the following commits to vp10:
269428e Tie the bit cost scale to a define.
d13385c Switch to 9-bit rate cost constants built on a 256 probability denominator.
ad43a73 Fix a signed overflow in vp9 motion cost.
1c9b091 Fix some interger overflow errors
fac947d Restore previous motion search bit-error scale.

Change-Id: I598ba7ee7efcde18439c31dfa96b86cbf297a580
2016-02-11 09:54:24 -08:00
hui su
2778b1cbb9 Fix a bug with ext-intra when skip_recode is enabled
Change-Id: I906945d61254149b315a6de81ac6373ed31791e6
2016-01-19 14:54:31 -08:00
Debargha Mukherjee
3787b17439 Super transform - ported from nextgen branch
Various additional changes were made to make the experiment
compatible with misc_fixes.

derflr: +0.979%
hevcmr: +0.865%

Speed-wise with --enable-supertx the encoder is only about 10%
slower than without. Decoding impact is about 30% slowdown.

Note this does not work with ext-tx or var-tx yet. That is
a TODO.

Change-Id: If25af4241a7a9efbd28f58eda3c4f044c7a7ef4b
2016-01-04 22:12:57 -08:00
Angie Chiang
0919edd4d2 Refactor vp10_encode_block_intra
1) Add VP10_XFORM_QUANT_SKIP_QUANT mode for vp10_xform_quant
2) Let encode_block call vp10_xform_quant so that its code flow
   is clear

Change-Id: I122d5cf6a089f444ae018f3e4bf844be847e17ee
2015-12-11 14:30:24 -08:00
Angie Chiang
88cae8b422 Refactor vp10_xform_quant
1) Add facade to quantize b/fp/dc version so that their interface
   are the same.
2) Merge vp10_xform_quant b/fp/dc version to one function so that
   the code flow in encodemb.c is clear

Change-Id: Ib62d6215438fc2d07f4e7e72393f964832d6746f
2015-12-03 15:28:11 -08:00