Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.
Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
Added a speed feature that focuses only on thresholds
for new motion modes.
Moved sf->comp_inter_joint_search_thresh into speed
1. This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.
Slight adjustment to baseline thresholds.
Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
This commit enables SSE2 4x4 foward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.
Change-Id: Iad4d526346e05c7be896466c05500711bb763660
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles
Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.
Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 23426 -> 18134 cycles
32x32: 4906 -> 3664 cycles
Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.
The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.
Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
This commit change the partition search order to allow checking of
rectangular partition to be done after square partitions. It also
added a speed feature to skip rectangular partition check when
NONE is better than SPLIT in RD sense.
This feature roughly speed up encoder by 1.5X with loss on compression
-0.91% on cif set
-0.56% on stdhd set
Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.
Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.
Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.
Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
Also tweaks to other features and experiments with
what is on and off at different speed settings.
Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
The aligned array in parameter list caused win32 build to report
c2719 error. This commit fixed the issue by make the parameter
type a pointer instead of an array.
Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992