Added a new experiment, rect-tx, to be used in conjunction with ext-tx.
[rect-tx is a temporary config flag and will eventually be
merged into ext-tx once it works correctly with all other
experiments.]
Added 4x8 and 8x4 transforms, initially for use with rectangular
sub8x8 y blocks as part of this experiment.
There is a BD-rate improvement of about 0.2% (-0.2% BD-rate) on lowres;
results for the other sets are pending.
When var-tx is on, rectangular transforms are currently not used.
That will be enabled in a subsequent patch.
Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
Use the regular extended zero-bin quantizer for both inter and intra
modes in the first pass. This does not significantly affect lowres and
midres, but it recovers a 0.9% coding gain on hdres.
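Here is a minimal sketch. It assumes that "extended zero bin" means a dead zone wider than half a quantizer step. Such a quantizer zeroes every coefficient whose magnitude falls below zbin and rounds the rest. The function and parameter names are hypothetical and are not the vp10 API.

```c
#include <stdlib.h>

/* Dead-zone scalar quantizer (hypothetical helper):
   |coeff| < zbin maps to level 0; otherwise level = (|coeff| + round) / step,
   with the sign of coeff restored. */
static int deadzone_quantize(int coeff, int step, int zbin, int round) {
  const int abs_c = abs(coeff);
  if (abs_c < zbin) return 0;
  const int level = (abs_c + round) / step;
  return coeff < 0 ? -level : level;
}
```

With step = 16, zbin = 12 and round = 8, an input of 10 maps to level 0. With zbin = 0, the same input would map to level 1.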
Change-Id: Ifa5977fa7b141fc5be595c0f3a4fc81a93f6606f
vp10_optimize_b now takes between 40% and 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.
Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(depending on bit-rate) when compiled with gcc/clang.
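Branch hints of this kind are typically written with the GCC/Clang builtin __builtin_expect. The sketch below shows the usual macro pattern. The macro names and the toy loop are hypothetical and are not taken from the patch. Treating nonzero coefficients as the rare case is an illustrative assumption only.

```c
/* Hypothetical hint macros; they fall back to plain conditions elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

/* Toy loop: the nonzero path is hinted as unlikely, so the compiler lays
   out the zero case as the fall-through path. */
static int count_nonzero(const int *qcoeff, int n) {
  int count = 0;
  for (int i = 0; i < n; ++i) {
    if (UNLIKELY(qcoeff[i] != 0)) ++count;
  }
  return count;
}
```

The hints only change code layout and static prediction, not results. A wrong hint costs speed but never correctness.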
Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.
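Here is a quick sketch of why 32 bits is not enough. With 12-bit samples, a single 32x32 block whose squared error is maximal already sums to 1024 * 4095^2 = 17,171,481,600, which is well above INT32_MAX. The helper below is hypothetical. It only shows a sum of squared differences accumulated in int64_t; it is not the trellis code itself.

```c
#include <stdint.h>

/* Sum of squared differences over n high-bitdepth samples, accumulated in
   int64_t so that large blocks at 10/12 bits cannot overflow. */
static int64_t block_sse(const uint16_t *src, const uint16_t *rec, int n) {
  int64_t sse = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)src[i] - rec[i];
    sse += d * d;
  }
  return sse;
}
```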
Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
Commit 0d6980d7a1caa592058f8d5d618b012c160772f7 removed some uses
of the skip_txfm optimization, and the remaining uses are not productive.
This optimization is currently only used with --good and
--cpu-used >= 3; however, its overhead is higher than the
speedup it yields.
Removing this, and subsequently simplifying model_rd_for_sb yields
a net encoder speedup:
--cpu-used=0 ~1.5% faster
--cpu-used=3 ~2.0% faster
The code simplification is also significant.
Change-Id: I1dd668c32de15a2e912c59c42379d0f9e1032ff8
Add the ability to pick between 3 quantization profiles.
The profile is chosen based on the entropy context at the
block level.
Change-Id: Iaea0485798441b7d635962c2563f3a477f582dac
This fixes the unit test failure in the 1-pass settings of
EndToEndTestLarge.EndtoEndPSNRTest
bug=webm:1243
Change-Id: I7667c341f7c063f7ffb83786446bbbd1e498c1aa
This commit fixes the use of uninitialized context values in the
combination of supertx and var-tx.
Change-Id: I2d36badf5c9806ea402ce3e19515cc299e6b79e8
When the next two states are identical, skip repeated cost table
fetch and multiplication operations. This makes the trellis unit
about 5% faster.
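The saving can be sketched under simplified assumptions. Each trellis node has two successor states, and the rate cost of a successor depends only on the token it codes plus a context. When both successors code the same token, they can share one table lookup and one multiply. The struct, the cost table and the function names are hypothetical. g_fetches exists only to count lookups in this toy.

```c
#include <stdint.h>

#define NUM_TOKENS 4 /* toy token alphabet (assumption) */

typedef struct {
  int token;    /* token coded at the successor node */
  int64_t rate; /* accumulated rate of the path from that node */
} trellis_state;

/* Toy cost table indexed as [ctx][token]. */
static const int kCost[NUM_TOKENS][NUM_TOKENS] = {
  {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}, {4, 5, 6, 7}};

static int g_fetches = 0; /* counts cost-table lookups */

static int token_cost(int ctx, int token) {
  ++g_fetches;
  return kCost[ctx][token];
}

/* Picks the cheaper of two successor states. When both code the same token,
   the lookup and the rdmult multiply are done once and reused. */
static int64_t best_next_rate(const trellis_state next[2], int ctx,
                              int rdmult, int *best) {
  int64_t r0, r1;
  if (next[0].token == next[1].token) {
    const int64_t c = (int64_t)token_cost(ctx, next[0].token) * rdmult;
    r0 = next[0].rate + c;
    r1 = next[1].rate + c;
  } else {
    r0 = next[0].rate + (int64_t)token_cost(ctx, next[0].token) * rdmult;
    r1 = next[1].rate + (int64_t)token_cost(ctx, next[1].token) * rdmult;
  }
  *best = r1 < r0;
  return *best ? r1 : r0;
}
```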
Change-Id: I0dbf7ad0a5732044e4e45dd59e9431a251c678f2
This commit uses the precise rate estimate for the zero_token rate
cost update. It improves compression performance:
lowres 0.15%
midres 0.23%
Change-Id: I36761079f75ce43c814f8c663667e359d4ac2cd4
The trellis optimization runs backward through the scan order. Hence
there is no need to restore the token_cache values that are behind the
current node in the scan order.
Change-Id: I4da8a2e3f78bf9630e6667c85d8c387c5d94de9a
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It also removes two memset operations on quantized
and dequantized coefficient sets.
On average, the trellis coefficient optimization now runs more than
10% faster.
Change-Id: If3aa26d2a706c3012bf2b7ac059bf1825250e81f
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres 0.73%
The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres 3.3%
The overall coding gains are:
lowres 1.1%
midres 1.8%
hdres 2.3%
Change-Id: Iaec1a3a4c1d5eac883ab526ed076d957060479dd
This commit combines the uniform quantizer with trellis-based coefficient
level optimization. It improves compression performance:
lowres 0.8%
midres 1.0%
hdres 1.6%
Note that the trellis optimization unit is currently written in plain C,
which makes the overall quantization process slower. A number of
optimizations will follow.
Change-Id: Id441dd238e4844409d0f08f82604be777f3f5282
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a Laplacian distribution of the coefficients.
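Here is a minimal sketch of the idea. It assumes a simple schedule in which each bin is delta wider than the previous one; these messages do not describe the experiment's actual bin widths. Bin k >= 1 spans [t_k, t_{k+1}), with t_1 = step and t_{k+1} = t_k + step + k*delta. Reconstruction uses the bin midpoint. All names are hypothetical.

```c
#include <stdlib.h>

/* Returns the signed level whose bin contains |coeff|. */
static int nuq_quantize(int coeff, int step, int delta) {
  const int a = abs(coeff);
  int k = 0;
  int edge = step; /* t_{k+1}: upper edge of bin k (t_1 = step) */
  while (a >= edge) {
    ++k;
    edge += step + k * delta; /* t_{k+1} = t_k + step + k * delta */
  }
  return coeff < 0 ? -k : k;
}

/* Reconstructs a level at the midpoint of its bin. */
static int nuq_dequantize(int level, int step, int delta) {
  if (level == 0) return 0;
  const int k = abs(level);
  int lo = step; /* t_1 */
  for (int j = 1; j < k; ++j) lo += step + j * delta; /* advance to t_k */
  const int rec = lo + (step + k * delta) / 2;
  return level < 0 ? -rec : rec;
}
```

With step = 16 and delta = 4, the bin edges are 16, 36, 60, 88, ..., so levels 1, 2 and 3 reconstruct to 26, 48 and 74. Setting delta = 0 falls back to equally wide bins.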
Performance Gain:
derflr: 0.15%
hevcmr: 0.675%
Change-Id: I25234244e3bcd94b87c1f77cf682190b61c8ef94
The assumption doesn't hold true in the current codebase. Remove
this speed feature to simplify the codebase.
Change-Id: I9b69f484c9b7cd612b825047cc5b2fce63ee0af7
"qc" in vp10_token_state is used to save quantized coefficients. This
commit changes its type from short to tran_low_t to properly reflect
the value range in the highbitdepth build.
This fixes an out-of-range bug when optimize_b is used in the highbitdepth
build.
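The type pattern can be sketched as follows. It assumes the usual libvpx convention that tran_low_t is 32-bit in high-bitdepth builds and 16-bit otherwise. The trimmed-down struct and the helper are hypothetical. A value such as 40000 round-trips through a 32-bit qc field, but it lies outside the 16-bit range that short provides on common platforms.

```c
#include <stdint.h>

/* For this sketch, default to a high-bitdepth build. */
#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 1
#endif

#if CONFIG_VP9_HIGHBITDEPTH
typedef int32_t tran_low_t; /* wide enough for high-bitdepth coefficients */
#else
typedef int16_t tran_low_t;
#endif

/* Hypothetical, trimmed-down token state: only the field relevant here. */
typedef struct {
  tran_low_t qc; /* quantized coefficient */
} token_state;

/* Returns nonzero if v can be stored in a 16-bit signed integer. */
static int fits_in_int16(int32_t v) {
  return v >= INT16_MIN && v <= INT16_MAX;
}
```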
Change-Id: I914c6fd3d3f4b9d061f9ed7cc5f08a883ab59dcd
x->blk_skip used to be uninitialized (leftover from encoding the
previous block) if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.
This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.
Also fixes an edge effect where encode_block in encodemb.c used the
formal width of the block (without cropping at the right edge) to
look up blk_skip, while select_tx_block in rdopt.c used the cropped
width to set blk_skip.
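The edge effect is a stride mismatch in a row-major map. In this hypothetical sketch, a block is 4 units of 4x4 wide, but only 2 units are visible at the frame edge. The writer uses the cropped width as the stride. A reader that uses the formal width therefore lands on a different unit for every row after the first. Using one width on both sides removes the mismatch. The names are illustrative and are not taken from encodemb.c or rdopt.c.

```c
#include <string.h>

#define MAP_SIZE 64

/* Index of 4x4 unit (row, col) in a row-major map with the given stride. */
static int skip_index(int row, int col, int stride) {
  return row * stride + col;
}

/* Writes a distinct tag (row * 10 + col + 1) for every visible unit, using
   `stride` for the layout, so that lookups can be checked. */
static void fill_map(unsigned char *map, int stride, int vis_rows,
                     int vis_cols) {
  memset(map, 0, MAP_SIZE);
  for (int r = 0; r < vis_rows; ++r)
    for (int c = 0; c < vis_cols; ++c)
      map[skip_index(r, c, stride)] = (unsigned char)(r * 10 + c + 1);
}
```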
Change-Id: I76d0f49ac5ab3ab54203573e0d7fcfcc1c6aa10d
x->blk_skip used to be uninitialized (leftover from encoding the
previous block) if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.
This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.
Change-Id: If39062927446798c626fc93694b4e6a4f35fa5da
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*
There are no functional changes.
This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.
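The sketch below shows how the renamed constants relate. It assumes 8x8-pixel mode-info units and a 64x64 superblock, which gives 8 MI units per side. The _LOG2 helper macros and the two functions are illustrative and are not a copy of the vp10 headers.

```c
/* Assumed geometry: 8x8-pixel MI units, 64x64-pixel superblock. */
#define MI_SIZE_LOG2 3
#define MAX_SB_SIZE_LOG2 6
#define MAX_MIB_SIZE_LOG2 (MAX_SB_SIZE_LOG2 - MI_SIZE_LOG2)
#define MAX_MIB_SIZE (1 << MAX_MIB_SIZE_LOG2) /* MI units per SB side */
#define MAX_MIB_MASK (MAX_MIB_SIZE - 1)

/* Position of an MI row/column inside its superblock. */
static int mi_offset_in_sb(int mi_pos) { return mi_pos & MAX_MIB_MASK; }

/* First MI row/column of the superblock that contains mi_pos. */
static int sb_origin(int mi_pos) { return mi_pos & ~MAX_MIB_MASK; }
```

Once the superblock size becomes a frame-level variable, expressions like these would take a runtime size instead of the compile-time maximum.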
Change-Id: Iee08d97554cf4cc16a5dc166a3ffd1ab91529992
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.
Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
Brings the following commits to vp10:
269428e Tie the bit cost scale to a define.
d13385c Switch to 9-bit rate cost constants built on a 256 probability denominator.
ad43a73 Fix a signed overflow in vp9 motion cost.
1c9b091 Fix some interger overflow errors
fac947d Restore previous motion search bit-error scale.
Change-Id: I598ba7ee7efcde18439c31dfa96b86cbf297a580
Various additional changes were made to make the experiment
compatible with misc_fixes.
derflr: +0.979%
hevcmr: +0.865%
Speed-wise, with --enable-supertx the encoder is only about 10%
slower than without it, and the decoder is about 30% slower.
Note that this does not work with ext-tx or var-tx yet. That is
a TODO.
Change-Id: If25af4241a7a9efbd28f58eda3c4f044c7a7ef4b
1) Add VP10_XFORM_QUANT_SKIP_QUANT mode for vp10_xform_quant
2) Let encode_block call vp10_xform_quant so that its code flow
is clear
Change-Id: I122d5cf6a089f444ae018f3e4bf844be847e17ee
1) Add facades to the b/fp/dc quantize versions so that their interfaces
are the same.
2) Merge the b/fp/dc versions of vp10_xform_quant into one function so
that the code flow in encodemb.c is clear
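The merged entry point can be sketched as an enum-indexed table of quantizer facades that share one signature. Only VP10_XFORM_QUANT_SKIP_QUANT is named in these commit messages. The following are assumptions for illustration:
- the other enum names;
- the identity placeholder transform;
- the toy quantizers: b with a zero-bin threshold equal to the step, fp without one, and dc quantizing only the DC coefficient.

```c
#include <stdint.h>

typedef int32_t tran_low_t; /* assumption: high-bitdepth coefficient type */

typedef enum {
  VP10_XFORM_QUANT_FP,
  VP10_XFORM_QUANT_B,
  VP10_XFORM_QUANT_DC,
  VP10_XFORM_QUANT_SKIP_QUANT,
} xform_quant_mode;

/* Common facade signature shared by every quantizer variant. */
typedef void (*quant_facade_fn)(const tran_low_t *coeff, int n, int step,
                                tran_low_t *qcoeff);

/* Rounds |v| / step to nearest and restores the sign. */
static tran_low_t round_div(tran_low_t v, int step) {
  const tran_low_t a = v < 0 ? -v : v;
  const tran_low_t q = (a + step / 2) / step;
  return v < 0 ? -q : q;
}

static void quant_fp(const tran_low_t *coeff, int n, int step, tran_low_t *q) {
  for (int i = 0; i < n; ++i) q[i] = round_div(coeff[i], step);
}

static void quant_b(const tran_low_t *coeff, int n, int step, tran_low_t *q) {
  for (int i = 0; i < n; ++i) {
    const tran_low_t a = coeff[i] < 0 ? -coeff[i] : coeff[i];
    q[i] = a < step ? 0 : round_div(coeff[i], step); /* toy zero bin */
  }
}

static void quant_dc(const tran_low_t *coeff, int n, int step, tran_low_t *q) {
  q[0] = round_div(coeff[0], step);
  for (int i = 1; i < n; ++i) q[i] = 0;
}

/* Placeholder for the forward transform (identity copy). */
static void forward_transform(const tran_low_t *in, int n, tran_low_t *out) {
  for (int i = 0; i < n; ++i) out[i] = in[i];
}

/* Single entry point: transform, then dispatch on mode (or skip quant). */
static void xform_quant(const tran_low_t *residual, int n, int step,
                        xform_quant_mode mode, tran_low_t *coeff,
                        tran_low_t *qcoeff) {
  static const quant_facade_fn kQuant[] = {
    [VP10_XFORM_QUANT_FP] = quant_fp,
    [VP10_XFORM_QUANT_B] = quant_b,
    [VP10_XFORM_QUANT_DC] = quant_dc,
  };
  forward_transform(residual, n, coeff);
  if (mode == VP10_XFORM_QUANT_SKIP_QUANT) return;
  kQuant[mode](coeff, n, step, qcoeff);
}
```

Because every variant has the same signature, the caller needs no per-variant branches beyond picking the mode.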
Change-Id: Ib62d6215438fc2d07f4e7e72393f964832d6746f