Originally the uniform quantization function was not being
replaced with the new_quant version in rdopt when new_quant
is turned on. This fixes the bug.
Change-Id: I593793bb909e1e1a6f89544eeca6783fe0576f25
1. Add "best_mv" in MACROBLOCK to store the best motion vector
during motion search, so that we don't need to pass its pointer
to various motion search functions.
2. Declare some functions as static when possible.
3. Fix some indents.
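
A minimal sketch of item 1, assuming a toy candidate search. All names
here (SketchMv, SketchMacroblock, sketch_motion_search) are hypothetical
and are not the actual vp10 types or functions. The point is only that
the search writes its result into the macroblock context rather than
through a separate output pointer:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real motion vector and macroblock types. */
typedef struct {
  int16_t row;
  int16_t col;
} SketchMv;

typedef struct {
  SketchMv best_mv; /* best motion vector found by the last search */
} SketchMacroblock;

/* Toy search: pick the candidate with the smallest L1 distance to `target`
 * and store it in x->best_mv. Returns the cost of the winner. */
static int sketch_motion_search(SketchMacroblock *x, const SketchMv *cands,
                                int n, SketchMv target) {
  assert(x != NULL && cands != NULL && n > 0);
  int best_cost = -1;
  for (int i = 0; i < n; ++i) {
    const int cost =
        abs(cands[i].row - target.row) + abs(cands[i].col - target.col);
    if (best_cost < 0 || cost < best_cost) {
      best_cost = cost;
      x->best_mv = cands[i]; /* no separate output pointer needed */
    }
  }
  return best_cost;
}
```

Callers read x->best_mv after the search returns, so nested search
helpers only need the macroblock pointer.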
Change-Id: I0778146c0866cbc55e245988c59222577ea8260e
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the obmc predictor. Clean up calc_target_weighted_pred.
Encoder speedup: 1.3%
Decoder speedup: 6.5%
Change-Id: I0c774fe53d22399e92a10d1daf3af0010d88d2c5
Fix the best error reported by loop filter selection. This value is used
during loop restoration to pick the best mode. The baseline remains
unchanged; the change in BDRate for the loop restoration experiment is:
-0.628 -> -0.625 for lowres,
-1.262 -> -1.283 for highres.
Change-Id: I69ef1608bc232b250ac46f59e31fdbed1a999dcd
- For experiment EXT_INTERP under high bit depth.
- Add a unit test to verify bit-exactness.
- Speed improvement:
On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
drops from 6682503 ms to 5390270 ms.
Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
* changes:
vp10/encoder/rdopt.c: make a function static
vp10/encoder/rd.c: make a function static
vp10_convolve_ssse3.c: make some functions static
vp10/encoder/bitstream.[hc]: correct a prototype
vp10/common/idct.h: add some missing prototypes
highbd_quantize_intrin_sse2.c: add missing rtcd include
vp10: add some missing includes
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the supertx predictor.
Decoder speedup of up to 4% has been observed.
Change-Id: I255a5ba4cc24f78dc905d25b6e2f7fbafac13253
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
indicative that the function does alpha blending. The 6, or 6b
suffix was misleading, as the max mask value (64) does not fit into
6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
masks directly and apply them to all rows/columns
(vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized
horizontal version now falls back on the 2D version. This can be
improved upon if it shows up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
(i.e. a single pixel). This is for usage convenience. The SSE4.1
optimized versions fall back on the C implementation if
w <= 2 or h <= 2. This can again be improved if it becomes hot code.
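
A minimal scalar sketch of the blend described above, assuming mask
values in [0, 64] and a 6-bit rounding shift. The sketch_* names and
the argument order are hypothetical and only illustrate the idea; they
are not the vpx_blend_a64_* prototypes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: alpha ranges over [0, 64], which needs 7 bits. */
#define SKETCH_A64_MAX_ALPHA 64
#define SKETCH_A64_ROUND_BITS 6

/* Blend two scalars: alpha weights v0, (64 - alpha) weights v1, with
 * round-to-nearest before the shift. */
static uint8_t sketch_blend_a64(int alpha, int v0, int v1) {
  assert(alpha >= 0 && alpha <= SKETCH_A64_MAX_ALPHA);
  return (uint8_t)((alpha * v0 + (SKETCH_A64_MAX_ALPHA - alpha) * v1 +
                    (1 << (SKETCH_A64_ROUND_BITS - 1))) >>
                   SKETCH_A64_ROUND_BITS);
}

/* Apply a 1D horizontal mask (one weight per column) to every row. */
static void sketch_blend_a64_hmask(uint8_t *dst, ptrdiff_t dst_stride,
                                   const uint8_t *src0, ptrdiff_t src0_stride,
                                   const uint8_t *src1, ptrdiff_t src1_stride,
                                   const uint8_t *mask, int h, int w) {
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] = sketch_blend_a64(
          mask[j], src0[i * src0_stride + j], src1[i * src1_stride + j]);
    }
  }
}
```

A vertical-mask variant would index the mask by row (mask[i]) instead
of by column. The loops also work for h = 1 and w = 1.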
Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
Call the C functions directly. When EXT_TX is enabled, hybrid transforms
other than combinations of DCT/ADST have not been implemented for MSA, so
calling the MSA versions causes assertion failures in the switch
statements in vp10_fhtnxn_msa() and vp10_ihtnxn_nxn_add_msa().
BUG=webm:1239
Change-Id: I2379a07e5406f9489edcd2f3205682f679c9b091
vp10_optimize_b now takes between 40% and 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.
Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(depending on bit-rate) when compiled with gcc/clang.
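
A minimal sketch of what a branch hint looks like under gcc/clang,
using __builtin_expect. The SKETCH_* macros and the counting loop are
hypothetical illustrations, not the code in vp10_optimize_b, and the
assumption that most coefficients are zero is made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint macros; they compile to a plain condition on
 * compilers without __builtin_expect. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_UNLIKELY(x) (x)
#endif

/* Count nonzero coefficients, telling the compiler that a nonzero
 * value is the rare case so the zero path is laid out as fall-through. */
static int sketch_count_nonzero(const int *coeff, size_t n) {
  assert(coeff != NULL || n == 0);
  int count = 0;
  for (size_t i = 0; i < n; ++i) {
    if (SKETCH_UNLIKELY(coeff[i] != 0)) ++count;
  }
  return count;
}
```

The hint changes only code layout and static prediction, never the
result, so a wrong guess costs speed but not correctness.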
Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
The combination of the two experiments improves the compression
performance gains:
lowres 2.5%
midres 2.1%
Change-Id: Id26c0a9474ce08893aa1d946365c7ff850fab57a
When the prediction residuals are all zero, reset the coeff rate
cost and the distortion value to zero. This change doesn't affect
the lowres set significantly, but improves several clips in the midres
set, like sintel_480p and mobisode2_480p, by a few percent. The
average performance for the midres set is improved by 0.2%.
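
A minimal sketch of the reset, assuming the residual is available as a
flat int16_t buffer. SketchRdStats and the sketch_* helpers are
hypothetical names, not the encoder's actual structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two values being reset. */
typedef struct {
  int rate;     /* coefficient rate cost */
  int64_t dist; /* distortion */
} SketchRdStats;

/* Returns 1 if every residual sample is zero, 0 otherwise. */
static int sketch_all_zero(const int16_t *residual, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (residual[i] != 0) return 0;
  }
  return 1;
}

/* With an all-zero residual there is nothing to code and no error
 * left over, so both the rate and the distortion are set to zero. */
static void sketch_reset_if_zero_residual(const int16_t *residual, size_t n,
                                          SketchRdStats *stats) {
  assert(residual != NULL && stats != NULL);
  if (sketch_all_zero(residual, n)) {
    stats->rate = 0;
    stats->dist = 0;
  }
}
```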
Change-Id: Idd5ebf2652e556a1b1c569fe3c48dacef3f11c32