vp10_optimize_b now takes between 40% to 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.
Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(dependig on bit-rate) when compiled with gcc/clang.
Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
The combination of the two experiments improves the compression
performance gains:
lowres 2.5%
midres 2.1%
Change-Id: Id26c0a9474ce08893aa1d946365c7ff850fab57a
When the prediction residuals are all zero, reset the coeff rate
cost and the distortion value to be zero. This change doesn't affect
lowres set significantly, but improves several clips in the midres
set, like sintel_480p and mobisode2_480p, by a few percents. The
average performance for midres set is improved by 0.2%.
Change-Id: Idd5ebf2652e556a1b1c569fe3c48dacef3f11c32
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.
Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
- Use int32_t instead of int in vpx_obmc{variance,sad} functions
- Remove weigthed_src and obmc mask strides and assume contiguous
buffers. These inputs can always be packed as contiguous arrays.
Change-Id: I74c09b3fb3337f13d39e13a9cb61e140536f345d
Originally we need to send the refresh flag and the virtual indices
mapping for the reference frame buffer update for show_existing_frame to
have the BWDREF_FRAME replace the LAST_FRAME.
To remove sending this information, we update the the virtual indices
of the reference frame buffer after the last_bipred_frame is encoded,
and therefore the decoder will receive the updated reference mapping
at the next non-show-existing frame.
As a result, we can save 4 bytes per show-existing frame, and get 0.12,
0.2, and 0.07 BDRATE improvement in lowres, derf, and midref test set
respectively.
Change-Id: I63d41ee6ea99884798f0778b789d2701e2f2d3e0
Reject ext-inter compound modes before doing full rate distortion
evaluation, if the corresponding single reference modes had a lower
modelled RD.
ext-inter speedup up to TBD.
Coding performance: TBD
Change-Id: I358bfb879c5ebe5e7afbf6f540cc784f8de14857
This offset value related to the bit depth has been taken care of
inside the function vp10_highbd_block_error.
Change-Id: I58dd8a53380ba4529d59837e56a951bc81a2962e
Commit 0d6980d7a1caa592058f8d5d618b012c160772f7 removed some use
of the skip_txfm optimization, and the rest are not productive.
The current use of this optimization is only used with --good
and --cpu-used >= 3, however the overhead of this is higher than the
speedup it yields.
Removing this, and subsequently simplifying model_rd_for_sb yields
a net encoder speedup:
--cpu-used=0 ~1.5% faster
--cpu-used=3 ~2.0% faster
The code simplification is also significant.
Change-Id: I1dd668c32de15a2e912c59c42379d0f9e1032ff8
Add the ability to pick between 3 quantization profiles.
The profile is chosen based on the entropy context at the
block level.
Change-Id: Iaea0485798441b7d635962c2563f3a477f582dac
This commit refactors the transform and quantization process for
sub8x8 blocks and unifies the related functions.
Change-Id: I005f61f3eb49eec44f947b906c4e308cab9935a2
Replace the expanded zero-bin quantizer with uniform quantizer in
the recursive transform block partitioning scheme. This improves
the compression performance by 0.4% for lowres.
Change-Id: I1c32ce9ebba0f0760e36a2c5bd20f2f5887ea5b4
This fixes the unit test failure in the 1-pass settings of
EndToEndTestLarge.EndtoEndPSNRTest
bug=webm:1243
Change-Id: I7667c341f7c063f7ffb83786446bbbd1e498c1aa