This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.
This seems to gives quite reasonable speed gains in local
(10-15%) and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1 as there may be clashes with the existing
temporal partition selection in speed 2.
Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However, I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.
Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
Used 3 * standard_deviation in internal threshold calculation
instead of fit curve. This actually approached the algorithm
better.
For comparison, similar tests were done:
The overall psnr loss is less than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;
Similar speedup is achieved. For example,
clip bitrate static-thresh psnr time
akiyo(cif) 500 0 48.952 5.077s(50f)
akiyo 500 500 48.866 4.169s(50f)
parkjoy(1080p) 4000 0 30.388 78.20s(30f)
parkjoy 4000 500 30.367 70.85s(30f)
sunflower(1080p) 4000 0 44.402 74.55s(30f)
sunflower 4000 500 44.414 68.69s(30f)
Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will be all
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on DCT's energy
conservation property, the skipping check can be estimated in
spatial domain.
The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for Y, U, and V planes are checked to decide skipping or not.
Test showed that
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;
For different clips, encoding speedup range is between several
percentage and 20+% when static-thresh <= 500. For example,
clip bitrate static-thresh psnr time
akiyo(cif) 500 0 48.923 5.635s(50f)
akiyo 500 500 48.863 4.402s(50f)
parkjoy(1080p) 4000 0 30.380 77.54s(30f)
parkjoy 4000 500 30.384 69.59s(30f)
sunflower(1080p) 4000 0 44.461 85.2s(30f)
sunflower 4000 500 44.418 78.1s(30f)
Higher static-thresh values give larger speedup with larger
quality loss.
Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53
Removing unused constants, macros, and function declarations. Using
ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
#include from *.h to *.c. Merging for loops for motion vectors.
Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
Simplified the code that extracts and uses the motion
vectors for the 4 sub-partitions in rd_pick_partition.
Change-Id: Iaf698ef7ee3aef9edd59015e1ae065dd359b17d9
This commit makes the initialization of trellis coeff optimization
a per-plane operation, thereby eliminating the redundant steps in
encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
faster.
Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd
The feature that uses small partition results as a measure to skip
mode evaluation at larger partition requires the flags to be reset.
The reset was missing in the code path that calls rd_use_partition().
Change-Id: Ia0a3a0aee1a862b6e2333d596808db7c48033d50
Adding plane type check condition because it was always used outside of
get_tx_type_{4x4, 8x8, 16x16}.
Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230
Although local copies of the mode member variables
(mode, ref_frame) were made, they were not used in
all places. Also, made a local copy of the
second_ref_frame member.
Change-Id: I84d8c822e5cb3d8a02fc3de8a4037ca3fea8bfad
Prevents doing duplicate IDCTs; encoding of first 50 frames of bus
(speed 0) @ 1500kbps goes from 1min4.0 to 1min3.5, i.e. 0.87% faster
overall.
Change-Id: I2df39e29ed9d5ea5e7d2704a34940ba622832ddd
Encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min5.4 to 1min4.0, i.e. 2.2% faster overall.
Change-Id: I8c32f2aff9a649ce7dd49d910dc5ba16b99c3bc6
Counts are separate from frame context. We have several frame contexts but
need only one copy of all counts.
Change-Id: I5279b0321cb450bbea7049adaa9275306a7cef7d
The struct optimize_block_args is defined same as encode_b_args.
Remove this redundant definition, and use encode_b_args consistently.
Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9
All filters are interpolating now, so we don't need this array, all
values from this array are evaluated to true.
Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
The xform_quant() module is only used by inter modes, hence removing
the redundant switches therein conditioned on tx_type.
Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f
When all the transform coefficients were quantized to zero, skip
the inverse transform operation. For bus_cif at 1000 kbps, the
runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
at speed 0.
Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6
many structures use bw and bh and they have different meanings. This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...
also removed partition type multiple operation code in bitstream.c.
Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
Renamed:
MAX_MB_SEGMENTS to MAX_SEGMENTS
MB_SEG_TREE_PROBS to SEG_TREE_PROBS
The minimum unit for segmentation in the segment map
is now 8x8 so it is misleading to use MB_ as macro-block
traditionally refers to a 16x16 region.
Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22