Allocate memory space of dual buffer sets that store the coeff, qcoeff,
dqcoeff, and eobs. Connect the pointers of macroblock_plane and
macroblockd_plane to the actual buffer in use accordingly.
Change-Id: I2f0b5f482ca879fae39095013eaf8901db20a5a4
Make the macroblockd_plane contain dynamic buffer pointers instead
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
Removed:
goldfreq, avg_encode_time, avg_pick_mode_time,
cpu_freq, interquantizer
member variables from VP9_COMP since they are no longer
used in the code.
Change-Id: I010a82c217d0da03c3f53d1858d3462190c12dcf
Removed three members from the VP9_COMP data structure:
inter_zz_count, gf_bad_count, gf_update_recommended.
These were part of the VP8 real-time mode implementation
that was removed from the initial VP9 codecbase.
Change-Id: I866b083b88ef02c74837277d50ce532ca88492f3
This 2-pass rate control setting allocates bits based
on first pass stats to each kf group, gf group and individual
frame but does not correct the bits left and allocation after
each frame.
In other words it recommends a bit allocation for each frame
but does not try and correct any over or under spend on a
frame over the remainder of the clip. This reduces the accuracy
of rate control in terms of hitting an average bitrate but prevents
problems that may arise because early frames either use to many
or too few bits. This mode is currently more inclined to undershoot
than overshoot (particularly at higher data rates).
Also minor changes to rate of adaption when recode loop is not
enabled.
This mode is currently enabled by default for VBR.
It gives the following % performance gains.
derf +0.467, +1.072
yt 2.962, 2.645
stdhd 1.682, 1.595,
yt-hd 2.3, 2.174
Change-Id: I3c84a9bf8884e5b345698ff0e19187f792c2f3a0
Renames for consistency with other constants:
NUM_FRAME_TYPES -> FRAME_TYPES
NUM_PARTITION_CONTEXTS -> PARTITION_CONTEXTS
Change-Id: I3db30acb2868eb0a424237c831087b2e264ec47f
This should be similar to what x264 does with --aq-mode 1.
It works well with clips like parkjoy and touhou
(http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv).
At low bitrates, the segmentation signaling overhead may negate the
benefits of this feature.
(PGW) Default changed to feature OFF to allow provisional merge.
Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
The commit changes to mask available intra prediction modes for test
based on prediction block size.
With this patch, encoding time of CpuUsed 2 reduces from 10% to 20% for
HD clips with a compression drop of 0.2%
Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.
It reduces speed 0 runtime:
bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.
Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.
Change-Id: I363514c5276b5cf9a38c7251088ffc6ab7f9a4c3
Allow selective masking of individual split modes rather than
just a single on / off flag.
For speed 2 recovers the large speed loss seen for some derf
clips in change Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
and a small quality gain.
For speed 1 10 % speed increase observed locally on some derf clips
for minimal quality change.
Change-Id: If86191087b93cbc05351c26c60c7933e2149e485
This commit separates the rate-distortion optimization loop of
superblocks from that of sub8x8 blocks. This allows better design
rate-distortion optimization search loop for each setting. It also
removes the use of SPLITMV and I4X4_PRED therein.
No performance change in speed 0 settings. For bus@CIF at 2000kbps,
the speed 1 runtime goes from 48009ms to 43894ms (about 10% faster).
The overall compression performance on derf changed by -0.021%.
Speed 2 runtime goes from 27114ms to 28700ms (6% slower), while the
overall coding efficiency goes up by 1.629% for derf, 1.236% for yt.
Change-Id: Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
Substantial reworking of the speed vs quality trade offs for
speed 1 and 2.
In this patch I am attempting to freeze the "quality" meaning of
speeds 1 and 2 relative to speed 0 so that in future we can
better evaluate progress.
I am targeting :
Speed 1 quality ~-5% vs speed 0.
Speed 2 quality ~-10% vs speed 0
It is inevitable that quality will still fluctuate a little as we adjust
settings and add new features, but we will attempt to keep as
close as possible to these values. Above speed 2 things will remain
a bit more fluid for now.
In this patch speed 1 is approximately 4-5x as fast as speed 0. This
is similar to before but the quality hit is a lot less. Likewise speed 2
is approximately 2x as fast as speed 1 but is similar in quality to the
previous speed 1 configuration.
Also slight change to behavior of FLAG_EARLY_TERMINATE to insure
all reference frames get at least one rd test. Important for very low
variance regions.
WIP :- Added a new speed level with old speed 4 becoming speed 5.
Speed 3 and 4 tradeoffs still WIP
Change-Id: Ic7a38dd7b5b63ab1501f9352411972f480ac6264
The code now takes into account temporal and spatial
information to determine the partition size range, but the
frequency counts have been removed.
The net effect is similar in quality but about 10% faster.
Change-Id: I39a513fb79cec9177b73b2a7218f0da70963ae95
This patch deletes the variance based speed three partitioning.
Speed 3 now uses the same partitioning method as speed 2
but with some stricter conditions.
The speed and quality are now somewhere between speeds 2 and 4
whereas before it was worse in both than speed 4.
Change-Id: Ia142e7007299d79db3ceee6ca8670540db6f7a41
This commit enables adaptive constraint on motion search range for
smaller partitions, given the motion vectors of collocated larger
partition as a candidate initial search point.
It makes speed 0 runtime of bus at CIF and 2000 kbps goes from
167s down to 162s (3% speed-up), at 0.01dB performance gains. In
the settings of speed 1, this makes the runtime goes from 33687 ms
to 32142 ms (4.5% speed-up), at 0.03dB performance gains.
Compression performance wise, it gains at speed 1:
derf 0.118%
yt 0.237%
hd 0.203%
stdhd 0.438%
Change-Id: Ic8b34c67810d9504a9579bef2825d3fa54b69454
Removes this speed feature since it is very slow and unlikely
to be used in practice. This cleanup removes a bunch of unnecessary
complications in the outer encode loop.
Change-Id: I3c66ef1ca924fbfad7dadff297c9e7f652d308a1
Propose some changes to the speed 2 settings to improve quality.
In particular, turns off the adjust_thresholds_by_speed feature
which improves results by 6%. Also removes the code for
adjust_thresholds_by_speed since it conflicts with the adaptive
rd thresh feature.
Overall, with this change speed 2 is -15.2% from speed 0 settings,
on derf, which is significantly better than -21.6% down before.
Change-Id: I6e90a563470979eb0c258ec32d6183ed7ce9a505
Thank Paul for the suggestions. While turning on static-thresh
for static-image videos, a big jump on bitrate was seen. In this
patch, we detected static frames in the video using first-pass
stats. For different cases, disable encode breakout or reduce
encode breakout threshold to limit the skipping.
More modification need be done to break incorrect partition
picking pattern for static frames while skipping happens.
Change-Id: Ia25f47041af0f04e229c70a0185e12b0ffa6047f
A previous speed feature skipped modes not used in earlier
partitions but this not longer worked as intended following
changes to the partition coding order and in conjunction
with some other speed features (Especially speed 2 and above).
This modified mode skip feature sets a mask after the first X
modes have been tested in each partition depending on the
reference frame of the current best case.
This patch also makes some changes to the order modes are
tested to fit better with this skip functionality.
Initial testing suggests speed and rd hit count improvements
of up to 20% at speed 1. Quality results. (derf -1.9%, std hd +0.23%).
Change-Id: Idd8efa656cbc0c28f06d09690984c1f18b1115e1
Sample app: vp9_spatial_scalable_encoder
vpx_codec_control extensions:
VP9E_SET_SVC
VP9E_SET_WIDTH, VP9E_SET_HEIGHT, VP9E_SET_LAYER
VP9E_SET_MIN_Q, VP9E_SET_MAX_Q
expanded buffer size for vp9_convolve
modified setting of initial width in vp9_onyx_if.c so that layer size
can be set prior to initial encode
Default number of layers set to 3 (VPX_SS_DEFAULT_LAYERS)
Number of layers set explicitly in vpx_codec_enc_cfg.ss_number_layers
Change-Id: I2c7a6fe6d665113671337032f7ad032430ac4197
Added some code to output normalized rd hit count stats.
In effect this approximates to the average number of rd
operations/tests per pixel for the sequence.
The results are not quite accurate and I have not bothered
to account for partial SB64s at frame edges and for key frames
However they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope their
is for further gains either by reducing the number of partitions
examined or the modes per partition through heuristics.
Patch 3 moved place where count incremented so partial rd
tests that are aborted with INT_MAX return are also counted.
Example numbers for first 50 frames of Akiyo.
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9
Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
Incorporates a speed feature for fast forward updates of
coefficients. This feature takes 3 values:
0 - use standard 2-loop version
1 - use a 1-loop version
2 - use a 1-loop version with reduced updates
Results: derfraw300 +0.007% (on speed 0) at feature value = 1
-0.160% (on speed 0) at feature value = 2
There is substantial speed up at speeds 2 and above for low
resolution sequences where the entropy updates are a big part
of the overall computations.
Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae