This saves an extra 64x64 variance calculation and replaces two
32x32 variance functions with sad functions. The compression
performance change is unnoticeable.
Change-Id: I6d33868695664ec73b56c42945162ae61c484856
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
Use rectangular block size for integral projection motion estimation
if the the 64x64 block has over half block outside the frame. This
avoids the issue that the motion information of these blocks is
dominated by the extended pixels, instead of the pixels of interest.
Change-Id: I22f4d2bb7f6a20db9b3f5e2e5463a7f4b9d1b737
The rounding factor needs to be scaled down by a factor of 2.
Also, the quantized and dequantized coefficients are memset to 0
when dc quantizer is used.
Change-Id: Ifa68bab02addbf1b83d249c5b4cbd5cda796b1cf
Instead using only a fixed threshold, this commit adapts the threshold
for color sensitivity decision to luma signal energy: chroma channel's
sse is at least 1/6 of that in luma for color sensitivity flag to be
set to active.
This recoups a large portion of the speed loss due to accounting for
chroma component costs in RTC mode decision.
Change-Id: Ie01f747f6037dba6a1d1ed3e10b71a0ef1abc42c
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.
Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
This commit fixes an issue in source frame border extension. It
causes certain frame resolution such as 640x480 to have a portion
of the right/bottom extension filled by zeros, which misleads
motion search and degrades transform coding performance when large
block size is used.
This fix improves the speed 2 compression performance of a few
yt sequence, typically ranging from 1% - 2%, up to 5% at median
to low bit-rate.
Change-Id: Id6b09a5695d9e7651c6dfbc2c6a72288b08af7fb
This commit applies one-step refinement search to the resulting
motion vector of the integral projectiion based motion estimation,
per 64x64 block. It improves the coding performance of speed -6.
pedestrian 1080p 500 kbps
51735 b/f, 36.794 dB, 16044 ms ->
51382 b/f, 36.793 dB, 16282 ms
cloud 1080p 500 kbps
24081 b/f, 37.988 dB, 14016 ms ->
23597 b/f, 38.076 dB, 12774 ms
vidyo1 720p 1000 kbps
16552 b/f, 40.514 dB, 8279 ms ->
16553 b/f, 40.543 dB, 8510 ms
The rtc set compression performance is improved by 0.5%.
Change-Id: I3d09bea2caf58b2a4f3b38aa26fffafcbe9a2c17
This commit modifies the hierarchical vector match patter. It
avoids repeated SAD computation at same points. The function
vp9_vector_sad_sse2 is called 12 times per 64x64 block, instead
of 15 times as before. The effective coverage remains the same.
Change-Id: I91ad9d27d40db8963c907d02af84e10702136994
In ssse3 functions, DEFINE_ARGS macro hard codes qcoeff and dqcoeff
to r3 and r4. If skip is 1, qcoeff and dqcoeff need to be loaded
from the stack, which doesn't work because of the above definitions.
Currently, skip=1 case is not used in the encoder. This patch fixed
the issue, so it can be turned on later.
Change-Id: I998d696b1a7a85dca2b3bcee790b21c21e039147
When GF group adaptive maxQ is enabled this patch accounts
somewhat for accumulated error in the rate control.
This improves accuracy quite a bit on many clips especially
when there is overshoot.
Examples when the overshoot and undershoot command line
parameters are set to 100:
Hall @ 1200 overshoot is reduced from 67-24%.
Akiyo @ 400 undershoot is reduced from 28%-15%.
Setting a lower value for undershoot or overshoot still
reduces the error further.
Impact on metrics is mixed with some gains in average psnr
but generally a little lower (e.g. 0.5%) on overall and ssim.
The GF group adaptation is still off by default in this patch.
Compared to with the head, enabling this mode now gives
big average psnr gains on the YT sets (e.g. YT_HD >11.2%),
a drop in overall PSNR (YT-HD 3.9%) and a smaller drop or
neutral for SSIM.
Change-Id: If4b32cd0740d3fb941317b374f9c2951954eee90
Target higher delta-qp for big blocks with zero motion,
and for segment#1: avoid 64x64 partition size and force 8x8 tx size.
Metrics on RTC set mostly positive: SSIM up by ~4%, PSRN by ~1.5%.
Doesn't seem to be any change in speed.
Change-Id: I1f68fa3c4f62dab3b90cc58041f05ebb048ae5ac
Modified the thresholds of deciding whether or not to skip
the transforms in model_rd_for_sb_y(). Used zbin[] instead
of dequant[] to be more precise. Also, modified the checking
coditions.
Rtc set borg test results (at speed 6) showed:
average PSNR gain: 0.138%, overall PSNR gain: 0.158%,
and SSIM gain: 0.177%.
The data rate test was modified slightly as suggested by
Marco.
Change-Id: Ieaf633ab77f4838cb3c45cf69065b29d55f8ae6c
This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.
This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.
The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)
pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
blue_sky 1080p 500 kbps
70214 b/f, 33.600 dB, 13979 ms ->
53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)
jimred 400 kbps
13380 b/f, 36.014 dB, 5723 ms ->
13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower)
Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c