In the full-RD transform size search, we go through all transform
sizes to choose the one with the best RD score. This patch adds an
early termination that stops the search once we see that the
smaller size won't give a better RD score than the larger size. Also,
the search now starts from the largest transform size and goes down
to the smallest size.
A speed feature tx_size_search_breakout is added, which is turned off
at speed 0, and on for other speeds. The transform size search is
turned on at speed 1.
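A minimal sketch of the largest-to-smallest search with early termination, assuming the RD cost of each size is already available in an array. The names pick_tx_size, rd_cost and evaluated are hypothetical, and the stop rule shown (the smaller size is worse than the best so far) is a simplification; the actual breakout condition in the encoder may differ.

```c
#include <stdint.h>

/* Hypothetical transform-size labels, ordered smallest to largest. */
enum { TX_4X4 = 0, TX_8X8, TX_16X16, TX_32X32, TX_SIZES };

/*
 * Pick the transform size with the lowest RD cost, searching from max_tx
 * downward. rd_cost[] stands in for the per-size RD evaluation an encoder
 * would run. When breakout is nonzero the search stops as soon as a
 * smaller size is worse than the best found so far (assumed rule).
 * *evaluated receives the number of sizes actually examined.
 */
static int pick_tx_size(const int64_t rd_cost[TX_SIZES], int max_tx,
                        int breakout, int *evaluated) {
  int best_tx = max_tx;
  int64_t best_rd = INT64_MAX;
  int n;
  *evaluated = 0;
  for (n = max_tx; n >= TX_4X4; --n) {
    const int64_t rd = rd_cost[n];
    ++*evaluated;
    /* Early termination: going smaller did not help, so stop here. */
    if (breakout && n < max_tx && rd > best_rd) break;
    if (rd < best_rd) {
      best_rd = rd;
      best_tx = n;
    }
  }
  return best_tx;
}
```

With breakout enabled the search can miss a better, much smaller size when an intermediate size is worse, which is consistent with the small quality loss reported for the speeds where the feature is on.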
Borg test results:
1. At speed 1,
derf set: psnr gain: 0.618%, ssim gain: 0.377%;
stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
No noticeable speed change.
2. At speed 2,
derf set: psnr loss: 0.157%, ssim loss: 0.175%;
stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
speed gain: ~4%.
Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
This commit enables the encoder to record the location of the
center frame used to generate the alternate reference frame. This
allows the encoder to skip checking prediction modes of other
reference frame types when it encodes this frame.
The speed 3 runtime is reduced for the test sequences:
bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
speed-up.
No compression performance change observed.
Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
This commit enables the encoder to skip NEARMV and ZEROMV if the
above and left blocks have identical reference frame, and the
current reference is different from that. It reduces the runtime
of speed 3 for test sequences:
bus cif at 1000 kbps 10064 ms -> 9823 ms
pedestrian 1080p at 2000 kbps 193078 ms -> 189559 ms
The compression performance is changed by
derf -0.085%
stdhd -0.103%
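The skip rule above can be sketched as follows. All type and function names are hypothetical, a null pointer stands for an unavailable neighbor, and not skipping when the neighbors are intra-coded is an assumption rather than something stated in this commit.

```c
#include <stddef.h>

/* Hypothetical reference-frame and mode labels, for illustration only. */
typedef enum { INTRA_FRAME, LAST_FRAME, GOLDEN_FRAME, ALTREF_FRAME } RefFrame;
typedef enum { NEARESTMV, NEARMV, ZEROMV, NEWMV } InterMode;

/*
 * Return 1 if `mode` can be skipped for reference frame `cur_ref`:
 * both neighbors are available, they use the same reference frame,
 * and that frame differs from cur_ref. Only NEARMV and ZEROMV are
 * candidates for skipping.
 */
static int skip_mode_by_neighbor_ref(InterMode mode, RefFrame cur_ref,
                                     const RefFrame *above,
                                     const RefFrame *left) {
  if (mode != NEARMV && mode != ZEROMV) return 0;
  if (above == NULL || left == NULL) return 0;
  /* Assumption: intra neighbors carry no reference-frame hint. */
  if (*above == INTRA_FRAME) return 0;
  return *above == *left && cur_ref != *above;
}
```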
Change-Id: If304f26d42e6412152a84c3dd7b02635c38444f4
This commit allows the encoder to check the above and left neighbor
blocks' reference frames and motion vectors. If they are all
consistent, skip checking the NEARMV and ZEROMV modes. This is
enabled in speed 3. The encoding runtime is reduced:
pedestrian area 1080p at 2000 kbps,
from 74773 b/f, 41.101 dB, 198064 ms
to 74795 b/f, 41.099 dB, 193078 ms
park joy 1080p at 15000 kbps,
from 290727 b/f, 30.640 dB, 609113 ms
to 290558 b/f, 30.630 dB, 592815 ms
Overall compression performance of speed 3 is changed
derf -0.171%
stdhd -0.168%
Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
This commit enables the encoder to select a fast forward transform
and quantization path according to the prediction residual
sse/variance, in the rate-distortion optimization scheme.
Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe
get_chessboard_index() used to take a pointer to the entire
VP9_COMMON struct to retrieve the chessboard pattern index. This CL
makes it take the frame index directly.
Change-Id: I3cad9d209ea2e77a358085a04fe1ff0ddec5ba03
The assignment of the variable mode_excluded in
vp9_rd_pick_inter_mode_sub8x8 involves a redundant conditional jump.
This commit removes it.
Change-Id: Ie195fbe6e54ec2ade7093d562c456a2e93143704
The value of mode_excluded is already properly set in
vp9_rd_pick_inter_mode_sb(). It is redundant to pass it to
handle_inter_mode() and set the value again.
Change-Id: I408d4731f2f42e0bcf3ae62e85757717bb410471
This commit extends the chessboard pattern prediction filter search.
If the above and left blocks have the same prediction filter type,
the encoder will skip the prediction filter type search and use the
reference one.
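A minimal sketch of the neighbor agreement check, under the assumption that an unavailable neighbor is passed as a null pointer. FilterType and reuse_neighbor_filter are hypothetical names, not encoder identifiers.

```c
#include <stddef.h>

/* Hypothetical prediction filter labels. */
typedef enum { FILTER_REGULAR, FILTER_SMOOTH, FILTER_SHARP } FilterType;

/*
 * If both neighbors are available and share a filter type, store that
 * type in *chosen and return 1 so the caller can skip the filter search.
 * Otherwise return 0 and leave *chosen untouched; the caller then runs
 * the full search.
 */
static int reuse_neighbor_filter(const FilterType *above,
                                 const FilterType *left,
                                 FilterType *chosen) {
  if (above == NULL || left == NULL || *above != *left) return 0;
  *chosen = *above;
  return 1;
}
```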
The overall chessboard pattern prediction filter type search reduces
speed 3 runtime for hard clips. Experiments on park joy at 1080p
and 15000 kbps show that the runtime goes from 723265 ms to 65832 ms,
i.e., about 10% speed-up. Compression performance wise, it affects
the coding quality by
derf: -0.326%
yt: -0.257%
hd: -0.241%
stdhd: -0.417%
Change-Id: I880975497c7ad166532e9eea9bf46684d77ff327
This commit enables a chessboard pattern prediction filter type
search scheme for rate-distortion optimization speed-up. For the
inferred motion vector modes, the encoder can re-use its above/left
neighbor blocks' prediction filter type and skip a full test on
all possible filter types. This shortcut is switched on and off
alternately in a chessboard pattern.
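One way to picture the chessboard on/off decision is the sketch below. It assumes block positions are given in units of the block size and that the pattern flips with the parity of the frame index; filter_reuse_enabled is a hypothetical name, and which squares count as "on" is an arbitrary choice here.

```c
/* Parity of the frame index, so the pattern alternates between frames. */
static int chessboard_index(int frame_index) { return frame_index & 1; }

/*
 * Return 1 when the block at (block_row, block_col) falls on a square
 * where the neighbor-based filter shortcut is switched on for this frame,
 * and 0 where the full filter search is run instead. Horizontally and
 * vertically adjacent blocks always get opposite answers.
 */
static int filter_reuse_enabled(int block_row, int block_col,
                                int frame_index) {
  return (block_row + block_col + chessboard_index(frame_index)) & 1;
}
```

Flipping the pattern every frame means that a block position which took the shortcut in one frame gets a full search in the next, so no region relies on inherited filter types indefinitely.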
It is turned on in speed 3. For test clip pedestrian 1080p, the
runtime is reduced from 231500 ms -> 221700 ms. The compression
performance is changed:
derf: -0.147%
yt: -0.134%
hd: -0.079%
stdhd: -0.220%
Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a
This should be a local variable. Move the definition from
vp9_rd_pick_inter_mode_sb to handle_inter_mode.
Change-Id: I14f4168bb1c896ed04e8f6d4cd89fbf4c9839944
For gcc, add the flag -DNDEBUG when the libvpx debug config option is
disabled. This disables the assertions in libvpx for some speedup.
Change-Id: Ifcb7b9e8ef5cbe5d07a24407b53b9a2923f596ee
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.
Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
Moved the threshold adjustment before the reference flag check. That
check can set the threshold to INT_MAX for a disabled reference
frame, which causes overflow if the adjustment is done afterwards.
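The ordering issue can be illustrated with the sketch below. The factor/32 fixed-point scaling and the function names are assumptions made for the example, not the encoder's actual arithmetic.

```c
#include <limits.h>

/*
 * Scale a mode threshold by factor/32 (assumed fixed-point convention).
 * The caller must keep base * factor within int range, which is why this
 * must never be applied to a threshold that is already INT_MAX.
 */
static int scale_threshold(int base, int factor) {
  return (base * factor) >> 5;
}

/*
 * Safe ordering: adjust the finite base threshold first, then override
 * with INT_MAX when the reference frame is disabled. Doing it the other
 * way round would multiply INT_MAX by factor and overflow.
 */
static int mode_threshold(int base, int factor, int ref_enabled) {
  const int scaled = scale_threshold(base, factor);
  return ref_enabled ? scaled : INT_MAX;
}
```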
Change-Id: I85e94f8726d5e3ae93f65965aa978721dddc9957
The current threshold is rather low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
adds a speed feature that increases the NEWMV threshold, so that
fewer partition mode checks go on to check NEWMV. This feature
is enabled for speed 6 and 7.
Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
Average speedup on rtc set is 12.9%.
Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.
Change-Id: Ifd1bc28558017851b210b4004f2d80838938bcc5
This breaks the profile 1 bitstream.
Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
chroma and luma planes are the same size. Disallowing larger transforms
can result in a loss of compression efficiency and is inconsistent.
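A sketch of how the chroma transform size might be derived from the subsampling instead of being forced one step below the luma size. The helper names are hypothetical, block dimensions are in pixels, and ss_x / ss_y are the chroma subsampling shifts (1,1 for 4:2:0 and 0,0 for 4:4:4).

```c
/* Hypothetical transform-size labels, ordered smallest to largest. */
enum { TX_4X4 = 0, TX_8X8, TX_16X16, TX_32X32 };

/* Largest square transform that fits inside a w x h pixel block. */
static int max_tx_for_block(int w, int h) {
  const int side = w < h ? w : h;
  if (side >= 32) return TX_32X32;
  if (side >= 16) return TX_16X16;
  if (side >= 8) return TX_8X8;
  return TX_4X4;
}

/*
 * Chroma transform size for a luma block of luma_w x luma_h pixels that
 * uses luma transform size y_tx. The chroma size is capped only by what
 * fits in the subsampled chroma block, so 4:4:4 keeps the luma size.
 */
static int uv_tx_size(int y_tx, int luma_w, int luma_h, int ss_x, int ss_y) {
  const int max_uv = max_tx_for_block(luma_w >> ss_x, luma_h >> ss_y);
  return y_tx < max_uv ? y_tx : max_uv;
}
```

For a 32x32 luma block with a 32x32 transform, this gives a 16x16 chroma transform in 4:2:0 and a 32x32 chroma transform in 4:4:4.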
For sub-8x8 blocks only average corresponding motion vectors.
4:2:0 and profile 0 behavior remains unchanged.
Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8