Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the supertx predictor.
Decoder speedup of up to 4% has been observed.
Change-Id: I255a5ba4cc24f78dc905d25b6e2f7fbafac13253
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
indicative that the function does alpha blending. The 6, or 6b
suffix was misleading, as the max mask value (64) does not fit into
6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
masks directly and apply them to all rows/columns
(vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized
horizontal version now falls back on the 2D version. This can be
improved upon if it shows up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
(i.e. a single pixel). This is for usage convenience. The SSE4.1
optimized versions fall back on the C implementation if
w <= 2 or h <= 2. This can again be improved if it becomes hot code.
Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
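The 1D mask variants described in the commit above can be illustrated with a minimal C sketch. It is not the libvpx implementation: the names blend_a64, blend_a64_hmask_sketch and blend_a64_vmask_sketch, the argument order, and the round-to-nearest step are assumptions made for illustration. Only the 0..64 alpha range and the 1x1 minimum block size come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Alpha runs from 0 to 64 inclusive, so it needs 7 bits, not 6. */
#define BLEND_A64_ROUND_BITS 6
#define BLEND_A64_MAX_ALPHA (1 << BLEND_A64_ROUND_BITS)

/* Hypothetical scalar blend: alpha/64 of v0 plus the remainder of v1,
 * rounded to nearest (the rounding mode is an assumption). */
static uint8_t blend_a64(int alpha, int v0, int v1) {
  assert(alpha >= 0 && alpha <= BLEND_A64_MAX_ALPHA);
  return (uint8_t)((alpha * v0 + (BLEND_A64_MAX_ALPHA - alpha) * v1 +
                    (BLEND_A64_MAX_ALPHA >> 1)) >>
                   BLEND_A64_ROUND_BITS);
}

/* Horizontal mask: mask[j] is applied to column j of every row. */
static void blend_a64_hmask_sketch(uint8_t *dst, int dst_stride,
                                   const uint8_t *src0, int src0_stride,
                                   const uint8_t *src1, int src1_stride,
                                   const uint8_t *mask, int w, int h) {
  assert(w >= 1 && h >= 1); /* blocks down to 1x1 are allowed */
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] = blend_a64(mask[j], src0[i * src0_stride + j],
                                          src1[i * src1_stride + j]);
    }
  }
}

/* Vertical mask: mask[i] is applied to every pixel of row i. */
static void blend_a64_vmask_sketch(uint8_t *dst, int dst_stride,
                                   const uint8_t *src0, int src0_stride,
                                   const uint8_t *src1, int src1_stride,
                                   const uint8_t *mask, int w, int h) {
  assert(w >= 1 && h >= 1);
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] = blend_a64(mask[i], src0[i * src0_stride + j],
                                          src1[i * src1_stride + j]);
    }
  }
}
```

Passing the 1D mask directly avoids building a full 2D mask whose rows (or columns) are all identical.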
Call the C functions directly. When EXT_TX is enabled, hybrid
transforms other than combinations of DCT/ADST have not been implemented,
so the calls would otherwise trigger assertion failures in the switch
statements in vp10_fhtnxn_msa() and vp10_ihtnxn_nxn_add_msa().
BUG=webm:1239
Change-Id: I2379a07e5406f9489edcd2f3205682f679c9b091
vp10_optimize_b now takes between 40% and 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.
Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(depending on bit-rate) when compiled with gcc/clang.
Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
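For readers unfamiliar with branch hints, the commit above can be illustrated with a minimal sketch built on the gcc/clang builtin __builtin_expect. The macro names LIKELY/UNLIKELY and the function last_nonzero_index are hypothetical and are not taken from vp10_optimize_b. The assumption that nonzero coefficients are the rare case is also only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Branch hints: only gcc and clang understand __builtin_expect, so other
 * compilers get a plain pass-through. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical helper: index of the last nonzero coefficient, or -1 if the
 * block is all zero. The hint tells the compiler to lay out the code so the
 * common "coefficient is zero" path falls straight through. */
static int last_nonzero_index(const int16_t *coeff, int n) {
  int last = -1;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    if (UNLIKELY(coeff[i] != 0)) last = i;
  }
  return last;
}
```

A hint only affects code layout and static prediction, never the result, so a wrong guess costs speed but not correctness.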
Combining the two experiments improves compression
performance:
lowres 2.5%
midres 2.1%
Change-Id: Id26c0a9474ce08893aa1d946365c7ff850fab57a
When the prediction residuals are all zero, reset the coeff rate
cost and the distortion value to zero. This change doesn't affect
the lowres set significantly, but improves several clips in the midres
set, like sintel_480p and mobisode2_480p, by a few percent. The
average performance for the midres set is improved by 0.2%.
Change-Id: Idd5ebf2652e556a1b1c569fe3c48dacef3f11c32
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.
Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
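The overflow addressed by the commit above can be shown with a small sketch. The function name block_distortion_sketch and the plain sum-of-squared-errors formula are assumptions for illustration and do not reproduce the trellis code. The point is only that a single squared error can exceed INT32_MAX once coefficients grow beyond 16 bits of magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical distortion: sum of squared differences between original and
 * dequantized coefficients. Each difference is widened to int64_t before
 * squaring, so neither the square nor the running sum wraps. */
static int64_t block_distortion_sketch(const int32_t *coeff,
                                       const int32_t *dqcoeff, int n) {
  int64_t dist = 0;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - (int64_t)dqcoeff[i];
    dist += diff * diff;
  }
  return dist;
}
```

With a 32-bit accumulator, a difference of 100000 would already overflow when squared, since 100000 * 100000 is roughly 4.7 times INT32_MAX.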
test/assertion_helpers.h
test/randomise.{cc,h}
test/snapshot.h
Modify blend_mask6_test.cc not to rely on these.
Change-Id: I88b8933fe0a729a606797e5cd421795a544c612d
This reinstates the tests from commit
efda2831e5f758b4f350679b5c55c0b9282449b0 with the appropriate
fixes for 32 bit x86 builds.
Change-Id: Ib331906c5b448ca964895ee9cbfd4266f67d1089
- Use int32_t instead of int in vpx_obmc{variance,sad} functions
- Remove weighted_src and obmc mask strides and assume contiguous
buffers. These inputs can always be packed as contiguous arrays.
Change-Id: I74c09b3fb3337f13d39e13a9cb61e140536f345d
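The contiguous-buffer layout from the commit above can be sketched as follows. This is not the vpx_obmc_sad code: the function name obmc_sad_sketch, the 12-bit rounding shift, and the exact error formula are assumptions for illustration. What the commit message does state is that the weighted source and mask are int32_t and carry no stride, so they advance by w per row while the prediction keeps its own stride.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed fixed-point precision of the weighted source and mask. */
#define OBMC_SKETCH_BITS 12

/* Hypothetical OBMC SAD: pre is strided, wsrc and mask are packed w*h
 * arrays with no stride arguments. */
static unsigned int obmc_sad_sketch(const uint8_t *pre, int pre_stride,
                                    const int32_t *wsrc, const int32_t *mask,
                                    int w, int h) {
  unsigned int sad = 0;
  assert(w >= 1 && h >= 1);
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      const int32_t err = abs(wsrc[j] - pre[j] * mask[j]);
      /* Round back down from the fixed-point domain. */
      sad += (unsigned int)((err + (1 << (OBMC_SKETCH_BITS - 1))) >>
                            OBMC_SKETCH_BITS);
    }
    pre += pre_stride; /* prediction rows may be padded */
    wsrc += w;         /* packed: next row starts right after this one */
    mask += w;
  }
  return sad;
}
```

Dropping the two stride parameters shortens every call site and lets optimized versions index both arrays with a single running offset.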
Originally we needed to send the refresh flag and the virtual indices
mapping for the reference frame buffer update for show_existing_frame to
have the BWDREF_FRAME replace the LAST_FRAME.
To avoid sending this information, we update the virtual indices
of the reference frame buffer after the last_bipred_frame is encoded,
so the decoder receives the updated reference mapping
at the next non-show-existing frame.
As a result, we save 4 bytes per show-existing frame, and get 0.12,
0.2, and 0.07 BDRATE improvement in the lowres, derf, and midres test sets
respectively.
Change-Id: I63d41ee6ea99884798f0778b789d2701e2f2d3e0
Reject ext-inter compound modes before doing full rate distortion
evaluation, if the corresponding single reference modes had a lower
modelled RD.
ext-inter speedup up to TBD.
Coding performance: TBD
Change-Id: I358bfb879c5ebe5e7afbf6f540cc784f8de14857