to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
Keep the same transform cutoff and partition selection
for speed 5 as in speeds >=6 (non-rd speed settings).
Existing setting for key frame at speed 5 allowed transform size
up to 32x32 on key frames, and did not allow for 4x4 block partition size.
This created more visual artifacts on first few frames.
avgPSNR/overallPSNR/SSIM gains of 0.2/0.7/0.8 for rtc_derf(low-res) set,
and 0/0.7/1.1 gains for rtc set.
Change-Id: I8c139ec6c9bb74e14b4ffbad5f12e94f18a59c0b
For speed 5 real-time mode, the selection of the partition size for
superblocks on the segment (aq-mode=3) uses the non-rd recursive
pick partition search, and can sometimes select 64x64.
For low resolutions, visually better to limit this to 32x32.
Change-Id: I69657a7ed8899f8b3cf8c9c318a2509c5c72c565
If the frame size increases, the tile data buffer needs to be
re-allocated according to the number of tiles existing in current
frame. This patch makes the multi-tile encoding work in spatial
SVC usage case, and partially solved WebM issue 1018.
Change-Id: I1ad6f33058cf5ce6f60ed5024455a709ca80c5ad
Increase the 32x32 split threshold, to allow for more 32x32
at expense of 16x16. Visually looks somewhat better.
Change-Id: Ia1439c3a0dc2d7933468b88bd59266fcd9f03505
Break out the setting of the block variance split thresholds,
since they are locally modified, e.g., based on local/segment qp.
No change in performance.
Change-Id: I0a3238e6dab05140657539fc4bd27ac5ff7a554e
This commit fixes the integral projection motion search crash when
frame resize is used. It fixes issue 994.
Change-Id: Ieeb52619121d7444f7d6b3d0cf09415f990d1506
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
Impose a limit on the rd auto partition search based on
the image format. Smaller formats require that the search
includes includes a smaller minimum block size.
This change is intended to mitigate the visual impact of
ringing in some problem clips, for smaller image formats.
Change-Id: Ie039e5f599ee079bbef5d272f3e40e2e27d8f97b
Remove one of the auto partition size cases.
This case can behaves badly in some types of animated content
and was only used for the rd encode path. A subsequent patch
will add additional checks to help further improve visual quality.
Change-Id: I0ebd8da3d45ab8501afa45d7959ced8c2d60ee4e
Calculated cpi->vbp_threshold_sad from this frame's dequant value.
The encoding quality and speed didn't change much. Borg test
result: PSNR: -0.002%, SSIM: -0.003%.
Change-Id: I97c9826986f39582f29910d637d08a69c90afdee
For color sampling format other than 420, valid partion size in Y may
not work for UV plane. This commit adds validation of UV partition
size before select the partition choice.
This fixes a crash for real time encoding of 422 input.
Change-Id: I1fe3282accfd58625e8b5e6a4c8d2c84199751b6
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)
For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.
Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks.
Also increase variance threshold for 32x32, and add exit condiiton in choose_partition
(with very safe threshold) based on sad used to select reference frame.
Some visual improvement near moving boundaries.
Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%.
Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip.
Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
The vbp thresholds are set seperately for boosted/non-boosted
superblocks according to their segment_id. This way we don't
have to force the boosted blocks to split to 32x32.
Speed 6 RTC set borg test result showed some quality gains.
Overall PSNR: +0.199%; Avg PSNR: +0.245%; SSIM: +0.802%.
No speed change was observed.
Change-Id: I37c6643a3e2da59c4b7dc10ebe05abc8abf4026a
This experiment biases the rd decision based on the impact
a mode decision has on the relative spatial complexity of the
reconstruction vs the source.
The aim is to better retain a semblance of texture even if it
is slightly misaligned / wrong, rather than use a simple rd
measure that tends to favor use of a flat predictor if a perfect
match can't be found.
This improves the appearance of texture and visual quality
on specific test clips but is hidden under a flag and currently
off by default pending visual quality testing on a wider Yt set.
Change-Id: Idf6e754a8949bf39ed9d314c6f2daaa20c888aad
Factor in segment#2 and skip blocks into the postencode estimated bits,
and increase somewhat the aggressiveness of the refresh.
PSNR/SSIM Metrics on RTC set go up by ~0.8/0.5%.
Change-Id: I5d4e7cb00a3aefb25d18c88b6b24118b72dc5d51
Use force_split to constrain the partition selection.
This is used because in the top-down approach to variance partition,
a block size may be selected even though one of its subblocks may have
high variance.
In this patch the selection of the 64x64 block size will only
be allowed if the variance of all the 32x32 subblocks are also below the threshold.
Stil testing, but some visual improvement for areas near slow moving boundary
can be seen. Metrics for RTC set increase by about ~0.5%.
Change-Id: Iab3e7b19bf70f534236f7a43fd873895a2bb261d
The compression performance of speed -5 is on average 12.6% better
than speed -6. At lower bit-rates, the gains are typically 20% or
more. For 2-thread encoding, the speed -5 takes about 1.6x time of
speed -6.
Change-Id: If7a73464a24d33e8f49b9533b51ec51c8da7fc80
Crash occured on very first key frame, because denoiser
temporal function was beng entered.
Updated denoiser unittest to set cpu_used from first frame,
and verified fix fixes the crash.
Change-Id: I3be1124b52846fbbe7248d2c3d6136e086c80bc1
Re-arrange the multiplication and right shift operations to avoid
integer overflow in choose_partitioning.
Change-Id: Ib4005cafb410a67c1960486471d75b6ebe38c4e0
Make the vp9_int_pro_motion_estimation() function return zero
motion vector if high bit depth is turned on, instead of removing
it from compiled codes.
Change-Id: Ia48f010eb590b2d517d5678c394110b326a1a95e
Choose_partition uses only the last frame as reference frame in making
partition decision, this commit adds the check on how well Golden
frame with (0,0) predicts the current block, and uses GF(0,0) as
basis for partition decision if it produces better prediction.
The commit improves rtc speed 6 and 7 encoding by 0.14% and 0.19%
respectively.
Change-Id: I156acf925bd6e0b586d48155d1940d27270a3915
Force 64x64 partitioning when a whole superblock is SEGMENT_LVL_SKIP. This
drops encode times of screens mostly at rest by 20%.
Change-Id: Ieba554b0b8a0c1679aae784a8bd11f038ab942c3
While turning on "--aq_mode=3", the quantizers are updated by each
thread. Fixed the me consts initialization function to make sure
that the correct thread data are updated.
Change-Id: Ied27bb7bae76fc3fa2cda4f8c35ac0b46271bef4
This saves an extra 64x64 variance calculation and replaces two
32x32 variance functions with sad functions. The compression
performance change is unnoticeable.
Change-Id: I6d33868695664ec73b56c42945162ae61c484856
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
Use rectangular block size for integral projection motion estimation
if the the 64x64 block has over half block outside the frame. This
avoids the issue that the motion information of these blocks is
dominated by the extended pixels, instead of the pixels of interest.
Change-Id: I22f4d2bb7f6a20db9b3f5e2e5463a7f4b9d1b737
The rounding factor needs to be scaled down by a factor of 2.
Also, the quantized and dequantized coefficients are memset to 0
when dc quantizer is used.
Change-Id: Ifa68bab02addbf1b83d249c5b4cbd5cda796b1cf
Instead using only a fixed threshold, this commit adapts the threshold
for color sensitivity decision to luma signal energy: chroma channel's
sse is at least 1/6 of that in luma for color sensitivity flag to be
set to active.
This recoups a large portion of the speed loss due to accounting for
chroma component costs in RTC mode decision.
Change-Id: Ie01f747f6037dba6a1d1ed3e10b71a0ef1abc42c
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.
Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
This commit applies one-step refinement search to the resulting
motion vector of the integral projectiion based motion estimation,
per 64x64 block. It improves the coding performance of speed -6.
pedestrian 1080p 500 kbps
51735 b/f, 36.794 dB, 16044 ms ->
51382 b/f, 36.793 dB, 16282 ms
cloud 1080p 500 kbps
24081 b/f, 37.988 dB, 14016 ms ->
23597 b/f, 38.076 dB, 12774 ms
vidyo1 720p 1000 kbps
16552 b/f, 40.514 dB, 8279 ms ->
16553 b/f, 40.543 dB, 8510 ms
The rtc set compression performance is improved by 0.5%.
Change-Id: I3d09bea2caf58b2a4f3b38aa26fffafcbe9a2c17
This commit modifies the hierarchical vector match patter. It
avoids repeated SAD computation at same points. The function
vp9_vector_sad_sse2 is called 12 times per 64x64 block, instead
of 15 times as before. The effective coverage remains the same.
Change-Id: I91ad9d27d40db8963c907d02af84e10702136994
Target higher delta-qp for big blocks with zero motion,
and for segment#1: avoid 64x64 partition size and force 8x8 tx size.
Metrics on RTC set mostly positive: SSIM up by ~4%, PSRN by ~1.5%.
Doesn't seem to be any change in speed.
Change-Id: I1f68fa3c4f62dab3b90cc58041f05ebb048ae5ac
This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.
This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.
The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)
pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
blue_sky 1080p 500 kbps
70214 b/f, 33.600 dB, 13979 ms ->
53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)
jimred 400 kbps
13380 b/f, 36.014 dB, 5723 ms ->
13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower)
Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
This is to avoid redo the same calculation repeatly, and also allow
easier adjustments for further experiments.
This commit shall have no effect on quality/compression.
Change-Id: I4460acf5c808ff5518da18d21e002c5da58af857
This commit fixes the sub block partition size used in
fill_mode_info_sb. Previous implementation effectively disabled
the rectangular block sizes. This commit resolved this issue.
Change-Id: Ic1c383ab0a9a2e7d59e85b388093f1f1f94d1e7f
On rtc set:
speed 7 quality improves about 0.5%
speed 8 quality improves about 1.0%
Encoding time for speed 7 changes from 67804ms to 65889ms
Encoding time for speed 8 changes from 58659ms to 56808ms
Change-Id: Iabcfb53012fc1b9f3326cdbc167e5758b8c7ad30
This commit allows the encoder to account for additional chroma
plane costs in the mode decision process, if the current block
potentially contains significant color change. It improves the
visual quality at very low bit-rates.
The compression performance of dark720p is improved by 12.39% in
speed 6. For jimred at 150 kbps, the PSNR of V component (red)
increased by 0.2 dB, at the expense of about 5% increase in
encoding time. Note that for sequences where the chroma components
are fairly consistent, the encoding time increase is negligible.
On average the rtc set compression performance is improved by
1.172% in PSNR and 1.920% in SSIM.
Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
1. move the check of search method of USE_TX_8X8 up one level to
avoid operations of build_tree_distributions()
2. count tx used and avoid computaton for coef udpate when one size
is not used at all.
Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c
Floating point is used in vp9_convert_qindex_to_q(), so sometime unit
test ActiveMapTest would cause run time error without properly call
to clear_system_state to reset register status.
Change-Id: I181e9395148c44a6ca8b97d6e109bd4a152143c6
Add distortion threshold condition to refresh state of a coding block,
and allow for qp adjustment also for some intra modes and non-zero motion modes.
Also some code cleanup (remove unused variables/code).
Change-Id: I735fa2b28bc64f60e0323976b82510577b074203
For low spatial resolutions: bias partittion selection to smaller block sizes,
and base the variance computation on 4x4 down-sampling.
Also move the threshold computations into the choose_partitioning,
so they are computed once for each sb block.
On low-res clips (RTC_derf) PSNR/SSIMetrics increase by about 4-5%.
No change for resolutions above CIF.
Change-Id: I93f8ff742c8044786977bb6e31dcf8efda6dd1b0
This makes the inter_mode counts update consistent with other symbols.
Also, forward updates should work corerctly now.
Change-Id: Id98be26fd08875162e644bb8f1de6f0918f85396
Check if block size is below 8x8 for rectangular block coding. It
is added to support 4x8 and 8x4 block coding for RTC mode.
Change-Id: I760b328f45b98ae48adc45ed5a39fb643cd8aebd
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
Initial patch to remove get_zbin_mode_boost() and
cpi->zbin_mode_boost.
For now sets a dummy value of 0 for zbin extra pending
a further clean up patch.
Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af
Change 72141 introduced a new use of vp9_avg_4x4.
This call needs to switch to using vp9_highbd_avg_4x4
when performing high bitdepth encodes.
Change-Id: I6a8ba4b62f8a75d0a917b365a55245e2f0438ea1
This commit fixes a bug in the PICK_MODE_CONTEXT index for
horizontal partition case. The compression performance change
is less than 0.01% level, since most blocks are selected to
use square block size in RTC coding mode.
Change-Id: I67effc18ae8795fccdd82a55f4efc609fa5cb3e1
For key frame under variance source partition: 4x4 prediction blocks
may be selected when variance of 8x8 block is very high (threshold is set fairly high for now).
Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame.
Encoded key frame size increases about ~10%. Key frame PSNR increases about ~0.1-0.2dB.
Change-Id: I56e203fac32ea6ef69897fb3ea269c59cb50d174
This commit explicitly uses the bit shift operation instead of
division for computing block variance.
Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74
This commit refactors the choose_partitioning function. It removes
redundant memset calls and makes the encoder to calculate
variance value per block only when it is needed. It reduces the
average runtime cost of choose_partitioning by 60%. Overall it
reduces speed -6 runtime by 2-5%.
Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9
Update the frame motion vector only if previous frame motion vector
is needed for next frame reference motion vector.
Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f
This commit enables the use of sub8x8 blocks in RTC key frame
encoding. It requires the block size to be preset and will decide
the coding mode and encode the bit-stream.
Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b
Rename set_modeinfo_offsets as set_mode_info_offsets, to be more
consistent with naming convention.
Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d
The later encoding process will take the top-left block's
mode_info for pre-determined block size.
Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5
The restructure moves the decision into the rd pick
modes loop and makes a decision based at the 16x16
block level instead of only the 64x64 level.
This gives finer granularity and better visual results
on the clips I have tested. Metrics results are worse
than the old AQ2 especially for PSNR and this mode
now falls between AQ0 and AQ1 in terms of visual
impact and metrics results.
Further tuning of this to follow.
It should be noted that if there are multiple iterations
of the recode loop the segment for a MB could change
in each loop if the previous loop causes a change in the
complexity / variance bin of the block. Also where a block
gets a delta Q this will alter the rd multiplier for this block
in subsequent recode iterations and frames where the
segmentation is applied.
Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
When block size is below 16x16, the encoder swap from non-RD to
RD mode for key frame coding. This largely brough back the key
frame compression performance. For vidyo1 at 1000 kbps, the key
frame coding statistics are changed
9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us
As compared to the full RD case
7187F, 34.930 dB, 214470 us
The overall rtc set coding performance (single key frame setting)
is improved by 1.5%.
Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878
Currently, VP9 supports column-tile encoding, which allows a frame
to be encoded in multiple column tiles independently. The number of
column tiles are set by encoder option "--tile-columns". This
provides a way to encode a frame in parallel.
Based on previous set of patches, this patch implemented the tile-
based multi-threaded encoder. Each thread processes one or more
tiles.
Usage:
For HD clips:
--tile-columns=2 --threads=1/2/3/4
While using 4 threads, tests showed that the encoder achieved
2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at
realtime speed 5.
Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.
Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.
Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.
Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.
Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
Each tile's tok starting address is calculated before the encoding
process. These addresses are stored so that the same calculation
won't be done again in packing bit stream.
Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50