We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
This commit separates the rate-distortion optimization loop of
superblocks from that of sub8x8 blocks. This allows better design
rate-distortion optimization search loop for each setting. It also
removes the use of SPLITMV and I4X4_PRED therein.
No performance change in speed 0 settings. For bus@CIF at 2000kbps,
the speed 1 runtime goes from 48009ms to 43894ms (about 10% faster).
The overall compression performance on derf changed by -0.021%.
Speed 2 runtime goes from 27114ms to 28700ms (6% slower), while the
overall coding efficiency goes up by 1.629% for derf, 1.236% for yt.
Change-Id: Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
Substantial reworking of the speed vs quality trade offs for
speed 1 and 2.
In this patch I am attempting to freeze the "quality" meaning of
speeds 1 and 2 relative to speed 0 so that in future we can
better evaluate progress.
I am targeting :
Speed 1 quality ~-5% vs speed 0.
Speed 2 quality ~-10% vs speed 0
It is inevitable that quality will still fluctuate a little as we adjust
settings and add new features, but we will attempt to keep as
close as possible to these values. Above speed 2 things will remain
a bit more fluid for now.
In this patch speed 1 is approximately 4-5x as fast as speed 0. This
is similar to before but the quality hit is a lot less. Likewise speed 2
is approximately 2x as fast as speed 1 but is similar in quality to the
previous speed 1 configuration.
Also slight change to behavior of FLAG_EARLY_TERMINATE to insure
all reference frames get at least one rd test. Important for very low
variance regions.
WIP :- Added a new speed level with old speed 4 becoming speed 5.
Speed 3 and 4 tradeoffs still WIP
Change-Id: Ic7a38dd7b5b63ab1501f9352411972f480ac6264
This commit causes use last partition to consider whether a 64x64 has
motion that might make a new partitioning worth while.
Change-Id: I3a57bedef4f3cd961fadbfa96651c206fa36da4a
Make encoder skip rectangular partition check in speed 1 and above,
when early termination was triggered in partition split.
Thanks Guillaume (gmartres@) for catching this issue.
This change makes bus_cif at 2000kbps speed 1 runtime goes down from
25612ms to 23438ms (about 9% speed-up), at the expense of -0.235%
performance down.
Change-Id: I98613fad081a261d30d5fa206f934ca70601c180
The code now takes into account temporal and spatial
information to determine the partition size range, but the
frequency counts have been removed.
The net effect is similar in quality but about 10% faster.
Change-Id: I39a513fb79cec9177b73b2a7218f0da70963ae95
This patch deletes the variance based speed three partitioning.
Speed 3 now uses the same partitioning method as speed 2
but with some stricter conditions.
The speed and quality are now somewhere between speeds 2 and 4
whereas before it was worse in both than speed 4.
Change-Id: Ia142e7007299d79db3ceee6ca8670540db6f7a41
After change of MI context storage , mi_8x8[] pointer may be null for
a block outside of image border. The commit changes to access the data
only after validation of mi_row and mi_col.
Change-Id: I039c4eb486a228ea9d8e5f35ab9ae6717d718bf3
This commit enables forcing all coefficients zero per transformed
block, when its rate-distortion cost is lower than regular coeff
quantization.
The overall performance improvement (including its parent patch on
calculating rd cost per transformed block) at speed 1:
derf: 0.298%
yt: 0.452%
hd: 0.741%
stdhd: 0.006%
Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
The commit added reset of pred_mv at the beginning of each SB64x64
partition mv search, also limited the usage of pred_mv only when
search on the largest partition is already done. This is to fix
a crash at speed 1/2 encoder where an invalid mv is used in mv
search.
Change-Id: I39010177da76d054e3c90b7899a44feb2e3a5b1b
This commit enables adaptive constraint on motion search range for
smaller partitions, given the motion vectors of collocated larger
partition as a candidate initial search point.
It makes speed 0 runtime of bus at CIF and 2000 kbps goes from
167s down to 162s (3% speed-up), at 0.01dB performance gains. In
the settings of speed 1, this makes the runtime goes from 33687 ms
to 32142 ms (4.5% speed-up), at 0.03dB performance gains.
Compression performance wise, it gains at speed 1:
derf 0.118%
yt 0.237%
hd 0.203%
stdhd 0.438%
Change-Id: Ic8b34c67810d9504a9579bef2825d3fa54b69454
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of pointers to MODE_INFO structs. The
MODE_INFO structs are now stored as a stream (decoder only),
eliminating unnecessary copies and is a little more cache
friendly.
Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
If the current obtained distortion is very small, which happens
for static image case, we pick the current partition type without
further split checking.
This won't affect regular videos. For static videos, we got 10%~12%
encoding speed gain. PSNR was better for some clips, and worse for
others. Overall it was even.
Change-Id: If787a57bedf46fc595ca4f5ded2b0c0a69e9fdef
A previous speed feature skipped modes not used in earlier
partitions but this not longer worked as intended following
changes to the partition coding order and in conjunction
with some other speed features (Especially speed 2 and above).
This modified mode skip feature sets a mask after the first X
modes have been tested in each partition depending on the
reference frame of the current best case.
This patch also makes some changes to the order modes are
tested to fit better with this skip functionality.
Initial testing suggests speed and rd hit count improvements
of up to 20% at speed 1. Quality results. (derf -1.9%, std hd +0.23%).
Change-Id: Idd8efa656cbc0c28f06d09690984c1f18b1115e1
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of a pointer to a MODE_INFO struct and
a "in the image" flag. The MODE_INFO structs are now stored
as a stream, eliminating unnecessary copies and is a little
more cache friendly.
For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).
Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
and 5.9% (720p).
Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
Speed 4 fixed partition size. Use fixed size unless it does not
fit inside image, in which case use the largest size that does.
Change-Id: I250f7a80506750dd82ab355721624a1344247223
Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.
Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
Previous change c4048dbd limits the mv search range assuming max block
size of 64x64, this commit change the search range using actual block
size instead.
Change-Id: Ibe07ab02b62bf64bd9f8675d2b997af20a2c7e11
Put rectangular partition check flag change according to the rd
costs of NONE and SPLIT partition types under the speed feature.
Change-Id: If681e1e078a8d43d86961ea4b748da5cd1b6c331
This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of sub8x8 partition search. It enable the use
of early termination in partition search for all block sizes.
For ped_area_1080p 50 frames coded at 4000 kbps, it makes the runtime
goes down from 844305ms -> 818003ms (3% speed-up) at speed 0.
This will further move towards making the in-search partition types
configurable, hence unifying various speed-up approaches.
Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info
Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. Will bring back/refine these features
subsequently. At this point, it makes derf set at speed 1 about
0.45% higher in compression performance, and 9% down in run-time.
Change-Id: I3db9f9d1d1a0d6cbe2e50e49bd9eda1cf705f37c
Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.
Results:
--------
derfraw300: threshold = 16, psnr -0.082%, speedup 2-3%
threshold = 32, psnr -0.218%, speedup 5-6%
stdhdraw250: threshold = 16, psnr -0.031%, speedup 2-3%
threshold = 32, psnr -0.273%, speedup 5-6%
Change-Id: I4b11ae8296cca6c2a9f644be7e40de7c423b8330
It appears that the above/left mb_skip_coeff used during
the pick modes, is left over from the previously
encode frame. This patch initializes the flag to the default
value of zero.
Change-Id: Ida4684cc99611d6e3e82628db35ed717e28ce550
Changes to code to auto select a partition size range
based on data from spatial neighbors.
Now looks at the sb_type in each 8x8 block of above
and left SB64.
The effect on speed 1 is now weaker giving better
quality but less speed gain. Now also used in speed 2.
Change-Id: Iace33a97d5c3498dd2a9a8a4067351941abcbabc
As the pixel values beyond image border are duplicates of pixels
on edge, the change limits the mv search range, any mv beyond
the limits no longer produce new/different prediction values
as entire block with pixels used for subpel interpolation are
outside image border.
Change-Id: I4c6fdf06e33c1cef1489f5470ce0fb4e5e01fb79
VP9_COMMON is the right place to segmentatation struct because it has
global segmentation parameters, not something specific to macroblock
processing.
Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
Adds a speed feature to disable split partition search based on a
given threshold on the source variance. A tighter threshold derived
from the threshold provided is used to also disable horizontal and
vertical partitions.
Results on derfraw300:
threshold = 16, psnr = -0.057%, speedup ~1% (football)
threshold = 32, psnr = -0.150%, speedup ~4-5% (football)
threshold = 64, psnr = -0.570%, speedup ~10-12% (football)
Results on stdhdraw250:
threshold = 32, psnr = -0.18%, speedup is somewhat more than derf
because of a larger number of smoother blocks at higher resolution.
Based on these results, a threshold of 32 is chosen for speed 1,
and a threshold of 64 is chosen for speeds 2 and above.
Change-Id: If08912fb6c67fd4242d12a0d094783a99f52f6c6
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.
This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.
TODO clean up the nomenclature more widely in respect of
mbmi and bmi.
Change-Id: Ia7305c6d0cb805dfe8cdc98dad21338f502e49c6
Don't do vertical or horizontal splits if subsize < min_partition_size,
except for edge blocks where it makes sense.
Change-Id: I479aa66ba1838d227b5de8312d46be184a8d6401
Loop filter configuration doesn't belong to macroblock, so moving it from
MACROBLOCKD to VP9_COMMON. Also moving the declaration of loopfilter struct
from vp9_blockd.h to vp9_loopfilter.h.
Change-Id: I4b3e34be9623b47cda35f9b1f9951f8c5b1d5d28
Different partitionings were not being evaluated against
best_rd and there were unnecessary calls to RDCOST. This
could have resulted in a non-optimal partioning being
selected.
I simplified the variables used to track the rate,
distortion and RD values throughout the function.
Change-Id: Ifa7085ee80d824e86791432a5bc6d8fea5a3e313
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.
Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.
Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
There was no benefit having this function. For example, inside
read_switchable_filter_type switchable filter context was calculated twice.
Change-Id: I79cd5bf95cbc0f6d8bf91a2e32289e01b18dcff1
Adds a speed feature to skip all intra modes other than
DC_PRED if the source variance is small. This feature is
made part of speed 1 and up.
Results on derf300: psnr -0.07%, speedup about 1-2%
Also uses the source variance to fine-tune the early
termination criteria when FLAG_EARLY_TERMINATE is on.
This feature is made part of speed 2 and up.
Results on derf300: psnr -0.52%, speedup about 5-7%
Change-Id: I59e38aa836557cfa5405ae706fc64815cbfe4232
This changeset allows to remove vp9_switchable_interp and
vp9_switchable_interp_map arrays and make code much clear. Actually we
still have to use these mapping but only inside read_interp_filter_type and
write_interp_filter_type functions.
Change-Id: I4026c6f8c4acefba6c81421b7bacbaa52cc45f50
Adds a function to compute source variance for various
sb_types to be used for pruning mode and partition searches.
[The existing activity measure function is currently specialized
for only 16x16 MBs and needs to be updated].
Change-Id: I22a41e6f1430184201487326fdbebb9b47e6fc24
If the partition is out of partition size range, we don't
need to process small partition information.
Change-Id: Ice9bfbbdebe1f2ef79271a3aee17de0ed4608376
use_min_partition_size and use_max_partition_size are not used
currently, and could be added back if needed later.
Change-Id: Ib22a9c06b064567a7c1d6d5445567ed77e0d3acc
This commit removes redundant arguments passing in the function of
rd_pick_reference_frame. This resolves the clang warnings about
potential use of uninitialized values.
Change-Id: Ic68f949a9f8fcd0a583786b0c75321104ea44739