Improves function to return sad of integer pels by reusing integer
pels already visited in the smallest scale.
Turns on BIGDIA search for speed 4. Also, turns on the
first version of the pruned subpel search at this speed.
derf: -0.32% (speed 4)
Speed seems to improve by at least 5% but subject to verification.
Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271
The speed feature that skips compound inter prediction modes was
subsumed by other speed features and effectively was not in use.
This commit removes it.
Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
This commit refactor the rate-distortion optimization search for
regular block sizes to remove the speed feature dependency on mode
search order.
Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
The use of use_lastframe_partitioning is totally removed in good-
quality encoding. Its usage in real-time encoding needs to be
evaluated to see if it can be removed too.
The Borg tests at speed 4 showed:
stdhd set: 0.220% psnr gain, 0.166% ssim gain;
derf set: 0.329% psnr gain, 0.476% ssim gain.
Speed test on selected clips showed 1.54% speedup.(Worst case:
pedestrian_area_1080p25.y4m, speed loss: 1.5%)
Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
The speedup in rd_pick_partition() function makes it possible
to drop use_lastframe_partitioning feature. By doing that, we
achieve good PSNR gain with small speed loss. Also, this makes
encoding loop less complicated. The code cleanup patch will
follow.
Borg tests showed:
1. At speed 2,
stdhd set: 0.201% PSNR gain, 0.133% SSIM gain;
derf set: 0.262% PSNR gain, 0.276% SSIM gain.
2. At speed 3,
stdhd set: 0.139% PSNR gain, 0.109% SSIM gain;
derf set: 0.447% PSNR gain, 0.442% SSIM gain.
The average speed loss over selected test clips is within 1%
with the worst case of 4%.
Change-Id: Icfd2ded7869372b585a6972855d933b3d0280d90
From 3 to 2, which seems to be slightly positive on compression for
all test sets, also reduces encoding time by 2%-5%, varying on the
test clips.
Change-Id: If045417bd27311700c919b4a335eff0dc1130ae0
This commit allows encoder to skip intra coding mode test, when
the known inter residual is less than the source variance. It
reduces the runtime of speed 3 for test clips:
bus cif 1000 kbps: 8587 ms -> 8260 ms, 3.8% speed-up
pedestrian 1080p 2000 kbps: 161381 ms -> 155241 ms, 3.7% speed-up.
The compression performance is down by
derf -0.36%
stdhd -0.25%
Change-Id: I75ce1e035b4da2153cb1ac14111d1a07c05a735d
In the partition search, the encoder checks all possible
partitionings in the superblock's partition search tree.
This patch proposed a set of criteria for partition search
early termination, which effectively decided whether or
not to terminate the search in current branch based on the
"skippable" result of the quantized transform coefficients.
The "skippable" information was gathered during the
partition mode search, and no overhead calculations were
introduced.
This patch gives significant encoding speed gains without
sacrificing the quality.
Borg test results:
1. At speed 1,
stdhd set: psnr: +0.074%, ssim: +0.093%;
derf set: psnr: -0.024%, ssim: +0.011%;
2. At speed 2,
stdhd set: psnr: +0.033%, ssim: +0.100%;
derf set: psnr: -0.062%, ssim: +0.003%;
3. At speed 3,
stdhd set: psnr: +0.060%, ssim: +0.190%;
derf set: psnr: -0.064%, ssim: -0.002%;
4. At speed 4,
stdhd set: psnr: +0.070%, ssim: +0.143%;
derf set: psnr: -0.104%, ssim: +0.039%;
The speedup ranges from several percent to 60+%.
speed1 speed2 speed3 speed4
(1080p, 100f):
old_town_cross: 48.2% 23.9% 20.8% 16.5%
park_joy: 11.4% 17.8% 29.4% 18.2%
pedestrian_area: 10.7% 4.0% 4.2% 2.4%
(720p, 200f):
mobcal: 68.1% 36.3% 34.4% 17.7%
parkrun: 15.8% 24.2% 37.1% 16.8%
shields: 45.1% 32.8% 30.1% 9.6%
(cif, 300f)
bus: 3.7% 10.4% 14.0% 7.9%
deadline: 13.6% 14.8% 12.6% 10.9%
mobile: 5.3% 11.5% 14.7% 10.7%
Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
Updates the vp9_pattern_search function to return integer one-away
neighbors' sad values, for subsequent use in speeding up the
sub-pel search. Also, removes code for the do_refine option
which is not being used currently.
Updates the integer and subpel functions to pass in a 5-element
sad list for output or input.
A new pruned sub-pel search algorithm is implemented that uses
the sad returned from the integer pel search. But it is not
deployed yet.
Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520
This commit addes a new strategy to reduce the search for optimal
interpolation filter type. The encoder counts and store how many each
filter type is selected and used for each of the reference frames.
A filter type that is rarely used for all three reference frames is
masked out to avoid computation.
The impact on compression is neglectible:
-0.02% on derf
+0.02% on stdhd
Encoding time is seen to reduce by 2~3%.
Change-Id: Ibafa92291b51185de40da513716222db4b230383
In the full-rd transform size search, we go through all transform
sizes to choose the one with best rd score. In this patch, an
early termination is added to stop the search once we see that the
smaller size won't give better rd score than the larger size. Also,
the search starts from largest transform size, then goes down to
smallest size.
A speed feature tx_size_search_breakout is added, which is turned off
at speed 0, and on for other speeds. The transform size search is
turned on at speed 1.
Borg test results:
1. At speed 1,
derf set: psnr gain: 0.618%, ssim gain: 0.377%;
stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
No noticeable speed change.
3. At speed 2,
derf set: psnr loss: 0.157%, ssim loss: 0.175%;
stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
speed gain: ~4%.
Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
This commit enables the encoder to record the location of the
center frame to generate alter reference frame. It then allows to
skip checking prediction modes of other reference frame types when
it comes to encode this frame.
The speed 3 runtime is reduced for the test sequences:
bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
speed-up.
No compression performance change observed.
Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
Add a speed feature to give the tighter partition search
range. Before partition search, calculate the histogram
of the partition sizes of the left, above and previous
co-located blocks of the current block. If the variance of
observed partition sizes is small enough, adjust the search
range around the mean partition size, which will be tigher.
The feature is currently turned on at speed 2. Experiments on
sample youtube clips show on average the runtime is reduced
by 3-7%.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 509251 ms -> 491953 ms (3.3%)
pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)
The PSNR performance is changed:
derf: -0.112%
yt: -0.099%
hd: -0.090%
stdhd:-0.102%
Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
At --good and speed 3 or above for resolution less than 720p. This
disables the tests for 64x64 intra prediction modes. Encoding time
reduction is about 1%.
Change-Id: Ib396e3d1417fece416e3f0fee929b128acbb130f
This commit moves the simplified coefficient probability model
and costing update to speed 4, and turns on chessboard pattern
mode search for sub 720p sequences. The overall coding performance
of speed 3 is improved:
derf 0.889%
stdhd 1.744%
The speed 3 runtime for test sequences are improved:
bus cif at 1000 kbps 9823 ms -> 9642 ms
pedestrian 1080p 2000 kbps 189559 ms -> 183284 ms
Change-Id: Iecbc7496a68f31fd49fb09f8dfd97c028d675a5d
This commit allows the encoder to check the above and left neighbor
blocks' reference frames and motion vectors. If they are all
consistent, skip checking the NEARMV and ZEROMV modes. This is
enabled in speed 3. The coding performance is improved:
pedestrian area 1080p at 2000 kbps,
from 74773 b/f, 41.101 dB, 198064 ms
to 74795 b/f, 41.099 dB, 193078 ms
park joy 1080p at 15000 kbps,
from 290727 b/f, 30.640 dB, 609113 ms
to 290558 b/f, 30.630 dB, 592815 ms
Overall compression performance of speed 3 is changed
derf -0.171%
stdhd -0.168%
Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
At speed 6 the smallest partitioning was 16x16 and biggest
intra block was 8x8, essentially disallowing all intra blocks
which produces ugly artifacts when revealing new video.
Change-Id: I364042d4c64e09be0666ade64aac94d0a1b586cf
We had a very complicated way to initialize cpi->pass from
cfg->g_pass:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->mode = ONE_PASS_GOOD;
break;
case VPX_RC_FIRST_PASS:
oxcf->mode = TWO_PASS_FIRST;
break;
case VPX_RC_LAST_PASS:
oxcf->mode = TWO_PASS_SECOND_BEST;
break;
}
cpi->pass = get_pass(oxcf->mode).
Now pass is moved to VP9EncoderConfig and initialization is simple:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->pass = 0;
break;
case VPX_RC_FIRST_PASS:
oxcf->pass = 1;
break;
case VPX_RC_LAST_PASS:
oxcf->pass = 2;
break;
}
Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9
This commit enables a chessboard pattern constrained partition
search for 720p and above resolutions. The scheme applies stricter
partition search to alternative blocks based on its above/left
neighboring blocks' partition range, as well as that of the
collocated blocks in the previous frame. It is currently turned
on at 16x16 block size level. The chessboard pattern is flipped
per coding frame.
The speed 3 runtime is reduced:
park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up)
pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up)
The compression performance is changed:
hd -0.223%
stdhd -0.295%
Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19
This commit enables a chessboard pattern prediction filter type
search scheme for rate-distortion optimization speed-up. For the
inferred motion vector modes, the encoder can re-use its above/left
neighbor blocks' prediction filter type and skip a full test on
all possible filter types. Such operation is turned on/off
alternatively in a chessboard manner.
It is turned on in speed 3. For test clip pedestrian 1080p, the
runtime is reduced from 231500 ms -> 221700 ms. The compression
performance is changed:
derf: -0.147%
yt: -0.134%
hd: -0.079%
stdhd: -0.220%
Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a
For sequences of resolution below 720p, the encoder will check
intra prediction modes and inter prediction modes from LAST_FRAME.
This commit turns on adaptive prediction filter scheme for sub8x8
blocks, where inter prediction modes are enabled. For the test
sequence bus at CIF, the speed 2 runtime goes down from 17879 ms
to 16783 ms, i.e., 6% speed up. The compression performance of
derf set is down by -0.128%.
Change-Id: I01d5321a5ceab4e0666ac5be56c52d896c7a8d45
This commit changed the hard-coded DEFAULT_INTERP_FILTER to a speed
feature with the same default value: SWITCHABLE.
Change-Id: I7f54f40f1bd3f5277841d04b85db7a84e47313f1
We target this speed to achieve similar encoding speed and better
compression than vp8 rt mode with cpu-used at -12.
Change-Id: Ic1bb4371c81a17ea80e83459c1cbf4c09a3498e8
This commit fixes a potential out-of-boundary memory access due to
the use of reuse_inter_pred_sby in the non-RD coding flow. It
resolves the corresponding asan error.
Change-Id: Iff605f5921230966990013541cd855d698810922
Use FAST_HEX in speed 5 and 6, which covers more points than
FAST_DIAMOND and improves motion search quality.
At speed 6, RTC set borg tests showed slight quality gain (psnr
gain: 0.143%, ssim gain: 0.226%). No noticeable encoding speed
change.
Change-Id: Ifa62875d9a52ee382ec494f271382bb77d8c67bf
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gains of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.
Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
* Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS
* Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size
replacing uses of reduce_first_step_size that don't add the speed
check with zero.
Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198
This commit added a speed feature to control the step_param used in
full pixel motion search. The intention is to reduced the search
steps for high speed real time coding.
Change-Id: I21d2f0105c2b647783a6688615da7fcf2b6d670b
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
The current threshold is knid of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
added a speed feature to increase NEWMV threshold, so that
less partition mode checking goes to check NEWMV. This feature
is enabled for speed 6 and 7.
Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
Average speedup on rtc set is 12.9%.
Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
For real time speed 7, once encode breakout is on(i.e. encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.
Set --static-thresh=1, RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.
Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.
Comparing with fixed 16x16 partitioning, rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.
The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.
Also, a minor modification was made in datarate unit test.
Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
This commit enables an adaptive transform size selection method
for speed -6. It uses largest transform size when the sse is more
than 4 times of variance, i.e., most energy is compacted in the
DC coefficient. Otherwise, use the default TX_8X8. It improves
the compression efficiency for rtc set of speed -6 by 0.8%, no
speed change observed.
Change-Id: Ie6ed1e728ff7bf88ebe940a60811361cdd19969c
This commit fixes the potential issue in the non-RD mode decision
flow that only checks part of the block to estimate the cost. It
was due to the use of fixed transform size, in replacing the
largest transform block size. This commit enables per transform
block cost estimation of the intra prediction mode in the non-RD
mode decision.
Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132
In real-time speed 6, no partition search is done. The inter
prediction results got from picking mode can be reused in the
following encoding process. A speed feature reuse_inter_pred_sby
is added to only enable the resue in speed 6.
This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.
Change-Id: I3884780f64ef95dd8be10562926542528713b92c
Fix some bugs relating to the use of buffers
in the overlay frames.
Fix bug where a mid sequence overlay was
propagating large partition and transform sizes into
the subsequent frame because of :-
sf->last_partitioning_redo_frequency > 1 and
sf->tx_size_search_method == USE_LARGESTALL
Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.
Change-Id: Ifd1bc28558017851b210b4004f2d80838938bcc5
Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.
This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.
Change-Id: I1502978d8fbe265b3bb235db0f9c35ba0703cd45
In non-rd real-time mode, choosing smaller transform size in
encoding gives better video quality and good speed gain than
choosing larger transform size. This patch set tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.
Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.
The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.
Change-Id: If4dc74ed2df359346b059f47fb73b4a0193ec548
Right now there is just one place to check: xd->lossless and for the first
pass there is a function is_lossless_requested().
Change-Id: I949a6834e64ce51e422e2892f097f2b871b5429a
Making this consistent with intra mode masks: you need to specify
allowed inter/intra modes to use.
Change-Id: Iaecd28bf79047259707d8e7a59a57bb7b856383e
This commit changed to enable the encoder to adjust motion dection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.
The change has no quality/speed impact for 720p and above. Test
showed the change increase encoding time by between 3% to 6% for
cpu-used 2 encodiong of 360p sequences. It also has a compression
gain about .3%.
For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.
Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
Set static-thresh to 800 for >=720p and 300 for <720p, rtc set
tests showed
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% -11%
2. Speed 6, psbr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs extensive search in alternate blocks and
allows the rest blocks to refer coding decisions of their nearby
neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
For speed 3 and above, such search is only allowed at speed 3.
The change helped cif and stdhd set by 1.2% and .7% in compression,
but increased the encoding time by around 5%.
Change-Id: Ifa4832327f1c1bef3decb032ceb769cbf50e059f
A previous path improved speed 2 quality a little but
more extensive testing showed that it slowed encode
by a few %.
The change will have a similar effect for speed 3 but
should not impact speeds 4+;
This experiment should reverse that and give a speed
up at the cost of a small quality loss.
Borg results pending.
Change-Id: I4493fc1541aaf44587f1a41ff219f7088da9252c
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.
An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.
The selection of source_var_thresh needs to be investigated
further later.
RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.
Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
Copy up to a certain bsize, otherwise set to a fixed bsize.
This helsp to reduce artifact near moving boundary caused by full partition
copy without checking motion of super-block.
This artifact can occur at speeds 3,4 in real-time mode.
Issue: https://code.google.com/p/webm/issues/detail?id=738.
Change-Id: I05812521fd38816a467f72eb6a951cae4c227931
This commit slightly increases the bit allocation for key frame. This
improves speed -5 coding performance by 2.77% with aq-mode=0 and by
2.78% with aq-mode=3.
Change-Id: Iaa3e777f80b9706306606af06e89852bac146659
This commit reduces the frequency of frames using finer quantizer
in non-RD coding flow, and slightly tune up the quantizer resolution
when used. It provides 1.7% compression gains in speed -5 at no speed
difference.
Change-Id: I430249a51260a841a0402666e5ec1566e4f7d5a6
The new tolerance is a little higher than before (especially
for kf/gf/arf) so this change gives an encode speed up
for some clips up for speeds 0-2.
Change-Id: I63f7d6c9cc11c7f58742f41e250dcd3eab1741eb
This commit adjusted the speed steps in rt mode to make the steps
more evenly spaced on speed and quality, specifically:
1. Merged 3 and 4 into one single step 3 and removed confilicting
features.
2. Move 8, 7, 6, 5 to be 7, 6, 5, 4 repsectively.
Change-Id: I38d56d61531f3561d772aef953c411c8fb38c063