- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.
Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
Both first pass and mbgraph search use block size 16x16 for motion
estimation. This commit put a limit of motion vector range. The
effective range allows the entire 16x16 with required subpel
interpolation input to be completely outside image border, but
not any further away from image border.
Change-Id: Id70a5ed08be49e70959f064859d72adc7d775d08
INT64_MAX may be assigned as RDCOST when RDCSOST computation is skipped
for speed, this commit to prevent INT64_MAX from being used as real
RDCOST in transform size decision.
Change-Id: I89a945134191bbdea1f1431ade70424ac079eaac
After change of MI context storage , mi_8x8[] pointer may be null for
a block outside of image border. The commit changes to access the data
only after validation of mi_row and mi_col.
Change-Id: I039c4eb486a228ea9d8e5f35ab9ae6717d718bf3
The probability model used to code prediction mode is conditioned
on the immediate above and left 8x8 blocks' prediction modes. When
the above/left block is coded in sub8x8 mode, we use the prediction
mode of the bottom-right sub8x8 block as the reference to generate
the context.
This commit moves the update of mbmi.mode out of the sub8x8 decoding
loop, hence removing redundant update steps and keeping the bottom-
right block's mode for the decoding process of next blocks.
Change-Id: I1e8d749684d201c1a1151697621efa5d569218b6
39c7b01d accidently reverted the row/col initialization, which broke
mv clamps, which is dependent on the sites for valid motion vector
range. This commit fixed the issue.
Change-Id: Ibcce0226e0360b1ef483fe760b2e33f1af4bf494
This commit enables forcing all coefficients zero per transformed
block, when its rate-distortion cost is lower than regular coeff
quantization.
The overall performance improvement (including its parent patch on
calculating rd cost per transformed block) at speed 1:
derf: 0.298%
yt: 0.452%
hd: 0.741%
stdhd: 0.006%
Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
Adds modeled functions to decide the qp for altref frames in constant q
mode similar to other functions in use in bitrate mode.
Also turns on the constrained quality mode (end-usage=2) option which
was turned off before. Basic testing shows the mode works in principle,
to cap bitrate to the target-bitrate specified, while allowing lower
bitrate depending on the cq-level specified. The mode will need to be
improved over time.
Results for constant quality vs bitrate control mode:
derfraw300/fullderfraw: +3.0% at constant quality over bitrate control.
fullstdhdraw: +4.341%
stdhdraw250: +5.361%
Change-Id: If5027c9ec66c8e88d33e47062c6cb84a07b1cda9
This commit makes the rate-distortion optimization loop evaluate
the rd costs of regular quantization and all zero coeffs, per
transformed block. It improves speed 1 compression performance:
derf: 0.245%
yt: 0.515%
For a large partition that consists multiple transformed blocks,
this allows more flexibility to selectively force a portion of
them coded as all zero coeffs, as well be continued in the next
patches.
Change-Id: I211518be4179747b57375696f017d1160cc91851
The sub8x8 blocks has its own motion vector reference scheme. The
mv_pred is only used blocks of sizes 8x8 and above, to find the
starting point for motion search.
This change does not change any coding behavior. It makes the
encoding process slightly faster. (0.5% speed-up for local test on
speed 1.)
Change-Id: I746ee6ef0eac19aa3621be014afa12be8d82cbb9
The fake token EOSB may cause invaild memory read in pack token, this
commit reworked the loop to avoid such invalid read.
Change-Id: I37fdfce869b44a7f90003f82a02f84c45472a457