When the prediction residuals are all zero, reset the coeff rate
cost and the distortion value to be zero. This change doesn't affect
lowres set significantly, but improves several clips in the midres
set, like sintel_480p and mobisode2_480p, by a few percents. The
average performance for midres set is improved by 0.2%.
Change-Id: Idd5ebf2652e556a1b1c569fe3c48dacef3f11c32
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.
Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
- Use int32_t instead of int in vpx_obmc{variance,sad} functions
- Remove weigthed_src and obmc mask strides and assume contiguous
buffers. These inputs can always be packed as contiguous arrays.
Change-Id: I74c09b3fb3337f13d39e13a9cb61e140536f345d
Originally we need to send the refresh flag and the virtual indices
mapping for the reference frame buffer update for show_existing_frame to
have the BWDREF_FRAME replace the LAST_FRAME.
To remove sending this information, we update the the virtual indices
of the reference frame buffer after the last_bipred_frame is encoded,
and therefore the decoder will receive the updated reference mapping
at the next non-show-existing frame.
As a result, we can save 4 bytes per show-existing frame, and get 0.12,
0.2, and 0.07 BDRATE improvement in lowres, derf, and midref test set
respectively.
Change-Id: I63d41ee6ea99884798f0778b789d2701e2f2d3e0
Reject ext-inter compound modes before doing full rate distortion
evaluation, if the corresponding single reference modes had a lower
modelled RD.
ext-inter speedup up to TBD.
Coding performance: TBD
Change-Id: I358bfb879c5ebe5e7afbf6f540cc784f8de14857
This offset value related to the bit depth has been taken care of
inside the function vp10_highbd_block_error.
Change-Id: I58dd8a53380ba4529d59837e56a951bc81a2962e
Commit 0d6980d7a1caa592058f8d5d618b012c160772f7 removed some use
of the skip_txfm optimization, and the rest are not productive.
The current use of this optimization is only used with --good
and --cpu-used >= 3, however the overhead of this is higher than the
speedup it yields.
Removing this, and subsequently simplifying model_rd_for_sb yields
a net encoder speedup:
--cpu-used=0 ~1.5% faster
--cpu-used=3 ~2.0% faster
The code simplification is also significant.
Change-Id: I1dd668c32de15a2e912c59c42379d0f9e1032ff8
- Fix the over-writing bug in horizontal filtering as width = 2.
- Fix 10-tap vertical filtering which no longer reads one row of
pixel above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slow down ~4.0%, compared to,
81ad953 Convolution vertical filter SSSE3 optimization
Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
Add the ability to pick between 3 quantization profiles.
The profile is chosen based on the entropy context at the
block level.
Change-Id: Iaea0485798441b7d635962c2563f3a477f582dac
This commit refactors the transform and quantization process for
sub8x8 blocks and unifies the related functions.
Change-Id: I005f61f3eb49eec44f947b906c4e308cab9935a2
Replace the expanded zero-bin quantizer with uniform quantizer in
the recursive transform block partitioning scheme. This improves
the compression performance by 0.4% for lowres.
Change-Id: I1c32ce9ebba0f0760e36a2c5bd20f2f5887ea5b4
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit exact.
- Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
to 6.73%.
Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
This fixes the unit test failure in the 1-pass settings of
EndToEndTestLarge.EndtoEndPSNRTest
bug=webm:1243
Change-Id: I7667c341f7c063f7ffb83786446bbbd1e498c1aa