Encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min5.4 to 1min4.0, i.e. 2.2% faster overall.
Change-Id: I8c32f2aff9a649ce7dd49d910dc5ba16b99c3bc6
All filters are interpolating now, so we don't need this array, all
values from this array are evaluated to true.
Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
many structures use bw and bh and they have different meanings. This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...
also removed partition type multiple operation code in bitstream.c.
Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).
Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.
This gives about 2% speed-up for bus_cif at 2000 kpbs, for speed 0.
Its efficacy depends how frequently the motion search will select an
integer motion vector.
Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
Mode search order in rd loop changed to better reflect
observed hit counts.
Also some adjustment of the baseline mode rd thresholds
to reflect the order change and observed frequencies.
Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
This commit allows the encoder to skip a few buffer update steps in
rd_pick_best_mbsegmentation, when early breakout has been triggered
in the rd_check_segment_txsize. It provides about 1% speed-up for
bus_cif at 2000 kbps, in the settings of speed 0.
Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
This patch modifies the auto_mv_step_size speed feature to
use a combination of the maximum magnitude mv from the last
inter frame, and the maximum magnitude mv for the two reference
mvs with the same reference. For arf frames, the max mav step
for the resolution is used.
The bounds therefore are slightly tighter. The feature is made
a speed 1 feature.
Rebased.
Results (when this feature is turned on over speed 0):
derfraw300: -0.046% psnr, about 5+% speedup
(tested on football: goes from 4m30.760s to 4m17.410s).
Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.
Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.
Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.
Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
This means we only do UV intra mode selection if we find any intra
mode to actually be useful at all; in addition, we only do UV intra
mode selection for the transform sizes that were selected, rather
than all sizes available in this partition.
First 50 frames of bus @ 1500kbps (speed 0) gains about 5% with this
change.
Change-Id: I7b461eb8b803247f57896c5a9505f745b55502b3
The break statement only breaks out of the nested loop, not the
top-level loop, so it doesn't always work as intended. Changing it
to a return statement does what's intended.
Change-Id: I585419823b39a04ec8826b1c8a216099b1728ba7
This could happen during golden overlay frame coding from a previous
alt-ref frame if the special overlay code was triggered.
Change-Id: I3056d0c547cd26903b260ef93c94026e96bd9868
Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.
Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.
Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.
Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
just at the sb8x8 level.
Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
patition checkings if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.
Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.
Further improvement will be done later.
Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.
Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.
Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
This commit makes the encoder to perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.
Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.
A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.
For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
performance loss.
speed 1: runtime 65312ms -> 61536ms, (7% speed-up) at 0.04dB
performance loss.
This operation is currently turned on in settings of speed 1.
Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
Implements some of the helper functions more efficiently with
lookups rathers than branches. Modeling function is consolidated
to reduce some computations.
Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).
No bitstream or output change.
About 0.5% speedup
Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.
Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
This commit fixed the mis-use of the tx_type for inverse transform
in intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.
Change-Id: I7fe9953175b74890357dbcee33c138573766e980
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.
Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.
Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.
Resuts on derfraw300:
psnr: -0.013% with the splitmv enhancements, -0.24% with the rd
breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
(tested on football sequence at 600 Kbps)
Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
Changes cost_mv_ref() into doing a LUT into pre-calculated cost
arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.
Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.
Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.
Specifically the flags and their impact are:
1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup
2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions.
derfraw300: -0.05%, about 9% speedup
3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup
4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup
Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
This commit allows encoder to detect the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the rest
operations and continue to next prediction mode test.
It reduces the runtime of bus at target bit-rate 2000 from 308 second
to 266 second, i.e., about 13% speed-up at no performance penalty.
Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.
sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.
Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.
Speedup: about 9-10%
Results: -0.05% only on derfraw300.
Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
(1) Refines the modeling function and uses that to add some speed
features. Specifically, intead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)
(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code
(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.
Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder) .
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).
Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.
Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.
Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.
Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
Added a speed feature that focuses only on thresholds
for new motion modes.
Moved sf->comp_inter_joint_search_thresh into speed
1. This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.
Slight adjustment to baseline thresholds.
Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles
Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 23426 -> 18134 cycles
32x32: 4906 -> 3664 cycles
Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.
The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.
Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.
Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
Also tweaks to other features and experiments with
what is on and off at different speed settings.
Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.
Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.
The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.
Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.
Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.
Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
Improves the rd modeling function and implements them using interpolation
from a table which is a little faster. Also uses sse as input to the
modeling function rather than var - since there is no dc prediction
used and as a result the sse works a little better.
derfraw300: +0.05%
Speedup: ~1%
Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
Change the argument of get_uv_tx_size() to be an MBMI pointer, so that the
correct column's MBMI can be passed to the function.
Change-Id: Ied6b8ec33b77cdd353119e8fd2d157811815fc98
Fixed point scaling factors are calculated once for each
reference frame by using integer division. Otherwise fixed point
scaling routines are used in all scaling calculations. This makes it
possible to calculate fixed point scaling factors on device driver
software and pass them to hardware and thus avoid division on hardware.
TODO:
- Missing check for maximum frame dimensions
(currently scaling uses 14 bits)
- Missing check for maximum scaling ratio
(upscaling 16:1, downscaling 2:1)
Problems:
- Straightforward fixed point implementation can cause error +-1
compared to integer division (i.e. in x_step_q4). Should only
be an issue for frames larger than 16k.
Change-Id: I3cf4dabd610a4dc18da3bdb31ae244ebaf5d579c
Adds coding of transform size within a frame by use of context
of transform sizes selected in left and above blocks.
Also incorporates code for generating stats.
TODO: generate and incorporate new default stats
Change-Id: I6a7af099f6ad61d448521d9a51167aedaf638ed6
Changes to the coding of transform sizes, along with forward
and backward probability updates.
Results:
derf300: +0.241%
Context based coding of transform sizes will be in a separate
patch.
Change-Id: I97241d60a926f014fee2de21fa4446ca56495756
Code intra/inter, then comp/single, then the ref frame selection.
Use contextualization for all steps. Don't code two past frames
in comp pred mode.
Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
Split partition probabilities between keyframes and non-keyframes,
since they are fairly different. Also have per-blocksize interframe
y intramode probabilities, since these vary heavily between different
blocksizes.
Lastly, replace default probabilities for partitioning and intra modes
with new ones generated from current codec. Replace counts with actual
probabilities also.
Change-Id: I77ca996e25e4a28e03bdbc542f27a3e64ca1234f
Added structures to support independent rd thresholds
for different block sizes (and set experimental block
size correction factors).
Added structure to to allow dynamic adaptation of thresholds
per mode and per block size basis depending on how often
the mode/block size combination is seen (currently fixed factor).
Removed some unused variables.
TODO
- Adaptation of thresholds based on how often each mode chosen.
- The baseline mode values could also be adjusted based on
the block size (e.g. for a particular intra mode use a low threshold
for 4x4 prediction blocks but a relatively high value for 64x64.
Change-Id: Iddee65ff3324ee309815ae7c1c5a8584720e7568
This avoids encoding tokens for blocks that are entirely
in the UMV border. This changes the bitstream.
Change-Id: I32b4df46ac8a990d0c37cee92fd34f8ddd4fb6c9
This commit makes the coding/reconstruction operations of intra
coding rate-distortion loop for UV components consistent with those
of the encoding process.
key frame coding gains:
derf: 0.11%
stdhd: 0.42%
Change-Id: I8d49f83924a320e3689ef2d60096c49d7f0c7a40
This commit makes operations of the superblock intra coding rate
distortion optimization consistent with those used in the encoding
process. Given the test prediction mode and transform size, the rd
optimizer encodes and reconstructs each transformed block of the
superblock consecutively, then computes the total rate-distortion
costs accosicated with the current superblock to select the coding
decisions.
It achieves coding performance gains:
derf 0.353%
yt 1.111%
Change-Id: I0da2eb7a71361dfb8c1384927fc536b0c2790d07
Enable iterative motion search for compound inter-inter prediction
of block sizes 4x4/4x8/8x4 only when best coding quality is selected.
The iterative motion search provides about 0.1% gains for derf and
stdhd at this point, at the expense of longer runtime.
Change-Id: Idc03e7f827e51f1bb8d269bc3752ee297a6bbfe5
Migrates costing changes/fixes from the rebalance expt to the head
without the expt on.
Rebased.
Change-Id: I51677d62f77ed08aca8d21a4c9a13103eb8de93f
Results:
derfraw300: +0.126%
This speed 1 - uses variance threshold stolen from static-thresh
to determine split. Any superblock with greater than the variance
set by static thresh * quantizer index squared is split. In addition
transform size is set to largest size less than or equal to partition
size, sub pixel filter is set to normal, and only 12 modes are used
at all.
Change-Id: If7a2858ee70f96d1eb989c04fd87a332b147abef
We leave it in rdopt.c as a local define for now - this can be removed
later. In all other places, we remove it, thereby slightly decreasing
the size of some arrays in the bitstream.
Change-Id: Ic2a9beb97a4eda0b086f62c039d994b192f99ca5
It remains as a local define in rdopt.c so we can distinguish between
split and non-split modes in the RD loop, but disappears outside that
scope in the codec.
Change-Id: I98c18fe5ab7e4fbd1d6620ec5695e2ea20513ce9
This commit enables iterative motion search for 4x4/4x8/8x4 block
size compound inter-inter prediction.
WIP: borg run testing
Change-Id: I2b318db4a03cdca5a8002b3fa6c0fa89b129288b
This patch changes the coefficient tree to move the EOB to below
the ZERO node in order to save number of bool decodes.
The advantages of moving EOB one step down as opposed to two steps down
in the other parallel patch are: 1. The coef modeling based on
the One-node becomes independent of the tree structure above it, and
2. Fewer conext/counter increases are needed.
The drawback is that the potential savings in bool decodes will be
less, but assuming that 0s are much more predominant than 1's the
potential savings is still likely to be substantial.
Results on derf300: -0.237%
Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
For 4x4 blocks valgrind points out the cache was uninitalized.
This resolves the issue by setting it.
Change-Id: I22733000da048643762813a84fbda66d8e4040d2
This commit makes clean-ups in the rate-distortion loop for 4x4,
4x8, and 8x4 block sizes for the use of iterative motion search.
Removed unnecessary use of bmi in handle_inter_mode.
Deprecated loop over labels in the 4x4/4x8/8x4 block rd search.
Change-Id: I71203dbb68b65e66f073b37abd90d82ef5ae6826
This patch checks at the frame level to see if the previous
mode info context can be used. This patch eliminates the
flag check that was done for every mode and removes another
check that was done prior to every vp9_find_mv_refs().
Change-Id: I9da5e18b7e7e28f8b1f90d527cad087073df2d73
scales for second reference frame vars are unitialized if the
second ref frame is one of of those disallowed by refframeflags
Change-Id: I4ce42de391178c1699dcaede18c5f12c84993c61
Proposal for tuning the residual coding by changing how the context
from previous tokens is calculated. Storing the energy class of previous
tokens instead of the token itself eases the critical path of
HW implementations.
Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
This commit pulls the iterative motion search for compound inter-
inter out from handle_inter_mode_ as a separate function. Hence,
it is applicable to 4x4/4x8/8x4 level compound inter search to be
enabled later.
Also edit the rd loop for 4x4 inter block sizes for cosmetic
purpose.
Change-Id: Ibc71a11cbe5a26cd52faba01026cf8446cf4d2b4
Removed one 4x4 prediction step that was unnessary in the rd loop.
Removed a unused modecosts estimate from encoder side.
Change-Id: I65221a52719d6876492996955ef04142d2752d86
1. remove prediction mode conversion
2. unified bmode, same for key and non-key frame
3. set I4X4_PRED count for pdf to 0, as I4X4_PRED is no longer
coded ever. It is determined by ref_frame and block partition
Change-Id: If5b282957c24339b241acdb9f2afef85658fe47d
Also do per-partition motion vector referencing in <sb8x8 partitions,
and adjust mvref finding for sub8x8 partitions.
Change-Id: Id3ed1ed4d2a8910d11d327db6cc63b8eb79f941f
This commit refactors the iterative motion search for compound
inter-inter mode, to make it support all partition types including
4x4/4x8/8x4 block sizes.
Change-Id: I5f1212b0f307377291763e45c6bdc9693b5f04c8
Move 4x4/4x8/8x4 partition coding out of experimental list.
This commit fixed the unit test failure issues. It also resolved
the merge conflicts between 4x4 block level partition and iterative
motion search for comp_inter_inter.
Change-Id: I898671f0631f5ddc4f5cc68d4c62ead7de9c5a58
Reverts to using 128 bit LUT for the coef models rather than 48
to ease hardware implementation.
Also incorporates some cleanups including removing various
hooks to support different lookup tables based on block_type and
ref_type.
Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6
This commit allows the rate-distortion optimization of intra coding
capable of supporting 8x4 and 4x8 partition settings.
It enables the entropy coding of intra modes in key frame using a
unified contextual probability model conditioned on its above/left
prediction modes.
Coding performance:
derf 0.464%
Change-Id: Ieed055084e11fcb64d5d5faeb0e706d30268ba18
Cleans up the experiment. Actually uses reduced counts for backward
updates, and reduced number of probabilities in the context.
No change in bitstream when the experiment is on.
Between expt on and off:
derfraw300 is down only -0.062% (which is better than when expts
were run previously).
Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
The recursive partition type search is enabled down to 4x4, 4x8 and
8x4, followed by the corresponding rate-distortion optimization for
the per-partition encoding mode decisions.
The bit-stream writing/reading synchronized in supporting the
rectangular partition of 8x8 block.
This provides above 1% coding performance gains on derf.
To do next:
1. re-design the rate-distortion loop for inter prediction below 8x8.
2. re-design the rate-distortion loop for intra prediction below 4x4.
3. make the loop-filter aware of rectangular partition of 8x8 block.
4. clean the unused probability models.
5. update default probability values.
Change-Id: Idd41a315b16879db08f045a322241f46f1d53f20
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:
- Add extra channel to all mismatch tests, PSNR, SSIM, etc
- Configurable subsampling
- Variable number of planes (currently always uses all 4)
- Loop filtering
- Per-plane lossless quantizer
- ARNR support
This implementation just uses the same contents as the Y channel
for the A channel, due to lack of content and general pain in
playing back 4 channel content. A later patch will use the actual
alpha channel passed in from outside the codec.
Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
These building blocks enable rate-distortion optimization search
over block sizes of 8x4 and 4x8. Need to convert them into mmx/sse
forms.
Change-Id: I570ea2d22d14ceec3fe3575128d7dfa172a577de
This commit allows proper transform type (DCT/ADST) selection in
the settings of partition 4x4 level.
Change-Id: Iec6f922a46480d777e7ca9142a99e8c131f0077b
This commit allows the rate-distortion optimization recursion
at encoder to go down to 4x4 block size. It deprecates the use
of I4X4_PRED and SPLITMV syntax elements from bit-stream
writing/reading. Will remove the unused probability models in
the next patch.
The partition type search and bit-stream are now capable of
supporting the rectangular partition of 8x8 block, i.e., 8x4
and 4x8. Need to revise the rate-distortion parts to get these
two partition tested in the rd loop.
Change-Id: I0dfe3b90a1507ad6138db10cc58e6e237a06a9d6
Allow motion search multiple times iteratively, and break out
the loop if this search couldn't find better motion vectors.
Limit the maximum number of search to 2.
Tests results:
1. stdhd set: 0.311%(overall psnr); 0.346%(ssim).
positive gain on 10 out of 16 clips(best: 2.746% on sunflower;
worst: -0.434% on old_town_cross).
2. derf set: 0.016%(overall psnr); 0.062%(ssim).
positive gain on half of the clips(best: 0.499% on bowing;
worst: -0.387 on city).
Change-Id: Ibf0a51776d4caf7707be0586346db08128117559
Change band calculation back to simpler model based
on the order in which coefficients are coded in scan order
not the absolute coefficient positions.
With the scatter scan experiment enabled the results were
appear broadly neutral on derf (-0.028) but up a little on std-hd +0.134).
Without the scatterscan experiment on the results were up derf as well.
Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
Use 4x4 block coding for UV components arbitrarily in I4X4_PRED and
SPLITMV coding modes. This is a temporary solution to enable
bit-stream support for recursive partition down to 4x4 block size.
Will separate the functionalities of 4x4 block coding rate-distortion
out from those of superblocks.
Change-Id: I03dc15d5897014f175f3f2c91e9b266091d56797
In current code, motion vectors got from single prediction mode are used
in compound prediction mode directly. These motion vectors may not give
accurate prediction since they are searched independently. In this patch,
we took Pascal's suggestion, and did joint motion search in compound
prediction mode to find better motion vectors in this situation.
Test results:
Overall PSNR: 0.570%(derf), 0.918%(stdhd);
SSIM: 0.572%(derf), 1.009%(stdhd);
The encoder is a little slower. This can be improved since some c
code is used in motion search.
Change-Id: Ib30c9240f6c56c9b070867b4ca89412a76d9f3c6
Makes the temporary storage of the filtered data agnostic to
the number of planes and how they're subsampled.
Change-Id: I12f352cd69a47ebe1ac622af30db29b49becb7f4
Iterating over all planes in the loop instead of custom y,uv code inside
handle_inter_mode function.
Change-Id: I301f9276d6d544c2fd7203d84f1318ac80ea625d
Disable the use of scaled reference frame for motion search in
SPLITMV mode. This fixes the unit test failure issue triggered
when merging sb8x8 from experimental list.
Change-Id: I02ac25fd8db8d5762f8fee29513b947189875fa0
This allows removing a large number of transform size specific functions,
as well as supporting 444/alpha by routing all code through the
subsampling-aware path.
Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction
Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
Output changes slightly because of a minor bug in (at least) the sb32x16
block2above tx16x16 tables that previously existed in vp9_blockd.c.
Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I7ff2fa72d22c29163eb558981c8193765a8113d9
This originally was "Removed update_blockd_bmi()". Now,
this patch removed bmi from blockd and uses the bmi found
in mode_info_context. Eliminates unnecessary bmi copies between
blockd and mode_info_context.
Change-Id: I287a4972974bb363f49e528daa9b2a2293f4bc76
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.
Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
The underlying storage for these buffers is in the per-plane MACROBLOCKD
area, so read it from there directly.
Change-Id: Id6bd835117fdd9dea07db95ad06eff9f12afaaf7
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I593fb0715e74cd84b48facd1c9b18c3ae1185d4b
This commit enables selecting probability models for recursive block
partition information syntax, depending on its above/left partition
information, as well as the current block size. These conditional
probability models are reasonably stationary and consistent across
frames, hence the backward adaptive approach is used to maintain and
update the contextual models.
It achieves coding performance gains (on top of enabling rectangular
block sizes):
derf: 0.242%
yt: 0.391%
hd: 0.376%
stdhd: 0.645%
Change-Id: Ie513d9673337f0d27abd65fb566b711d0844ec2e
This commit moves the coeff storage from the MACROBLOCK struct to its
per-plane part. The next commit will remove the coeff member from the
BLOCK structure so that it is consistently accessed per-plane.
Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
variable subsampling aware.
Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
First in a series of commits making certain MACROBLOCK members
addressable per-plane. This commit also refactors the block subtraction
functions vp9_subtract_b, vp9_subtract_sby_c, etc to be
loops-over-planes and variable subsampling aware.
Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3
This version of speed 1 only disables modes at higher resolution that
had distortions >2x the best mode we found...
The hope is that this could be a replacement for speed 0 ...
Change-Id: I7421f1016b8958314469da84c4dccddf25390720
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.
This cleans out various other sub-experiments that were part
of the origiinal patch, which will be later included if found
useful.
Results are slightly positive on all sets (0.1 - 0.2% range).
Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
Removes the redundant dst pointers from vp9_build_inter_predictors_sb{y,uv}
and the remaining mb specific functions.
Change-Id: I7b6bf439d9394b85ea79b4fe61a3ffc1025720da
First in a series of commits moving the framebuffers pointers to
per-plane data, so that they can be indexed numerically rather than
by name.
Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
Further simplification of mvref search to return
only the top two candidates. Distance weights removed
as the test order reflects distance anyway.
Change-Id: I0518cab7280258fec2058670add4f853fab7b855
As we are no longer able to sort the candidate
mvrefs in both encoder and decode and given
that the cost of explicit signalling has proved
prohibitive, it no longer makes sense to find more
than 2 candidates.
This patch:
Modifies and simplifies add_candidate_mv()
Removes the forced addition of a 0 vector in the
MAX_MV_REF_CANDIDATES-1 position (in preparation
to reducing MAX_MV_REF_CANDIDATES to 2).
Re-orders the addition of candidates slightly.
This actually gives small gains (circa 0.2% on std-hd)
A subsequent patch will remove NEW_MVREF experiment,
reduce MAX_MV_REF_CANDIDATES to 2 and remove distance
weights as these are implicit now in the order.
Change-Id: I3dbe1a6f8a1a18b3c108257069c22a1141a207a4
Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction.
This makes the macroblock buffer handling consistent with those of
superblock. Remove predictor buffer MACROBLOCKD.
Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
Adds RD integration for 32x16, 16x32, 64x32 and 32x64 rectangular blocks.
Derf almost +0.6%, HD a little over +1.0%, STDHD +1.3%.
Change-Id: Id651fdb6a655fdbb5c47009757e63317acfb88a5
This flag was added to VP8 to allow a mode where MB-level skipping
was not allowed, saving a bit per mb. It was never used in practice,
and hasn't been tested in VP9, so remove it.
Change-Id: Id450ec6904c6d06c1919508e7efc52d05cde5631