Changes to the coding of transform sizes, along with forward
and backward probability updates.
Results:
derf300: +0.241%
Context based coding of transform sizes will be in a separate
patch.
Change-Id: I97241d60a926f014fee2de21fa4446ca56495756
Added a condition to not skip filtering of transform block edges when
the edge is also a decoding block edge.
Change-Id: Iaccb6206c4202b78e5dca3b89379556e0f4aba0c
Wrong max data size (skip has no data) and use of vp9_get_segdata()
when it should be vp9_segfeature_active().
Change-Id: I1eb97d33df6e2a42cc589049f704266fe3639902
In the longer term the encoder should allow compound as long
as one of the buffers has opposite sign bias and as per the decoder
this buffer is then set as the fixed reference. However at the moment
the encoder and RD loop only support the case where the ALTREF_FRAME
buffer (or third of the 3 allowed in any given frame) is the odd one out.
This patch fixes a bug that would allow compound inter and set
fixed ref to ALTREF_FRAME when it is not the odd one out.
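As a rough illustration of the "odd one out" rule described above, the following Python sketch picks the fixed reference from three sign-bias flags. The frame constants mirror the names used in this message; the function names find_fixed_reference and encoder_allows_compound are hypothetical and this is not the codec's actual code.

```python
# Illustrative indices for the three references a frame may use.
LAST_FRAME, GOLDEN_FRAME, ALTREF_FRAME = 0, 1, 2


def find_fixed_reference(sign_bias):
    """Return the reference whose sign bias differs from the other two.

    sign_bias is a sequence of three 0/1 flags indexed by the constants
    above. Returns None when all three agree (no odd one out).
    """
    last = sign_bias[LAST_FRAME]
    golden = sign_bias[GOLDEN_FRAME]
    altref = sign_bias[ALTREF_FRAME]
    if last == golden == altref:
        return None
    if last == golden:
        return ALTREF_FRAME
    if last == altref:
        return GOLDEN_FRAME
    return LAST_FRAME


def encoder_allows_compound(sign_bias):
    """Current encoder restriction: only ALTREF_FRAME may be the odd one out."""
    return find_fixed_reference(sign_bias) == ALTREF_FRAME
```

Under this sketch, the bug described above corresponds to treating ALTREF_FRAME as the fixed reference even when find_fixed_reference would return a different buffer.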
Change-Id: Ic83a69486e088a147ba83a4aedc2a0042f6b3721
Adds the ability to have the decoder show one of the existing reference
frames directly, without having to code it indirectly as a series of
skip blocks.
Change-Id: Ib6c26c5f6a8709863cf304ab890db8559687d25e
ref_frame in MB_Mode_Info was changed in the ref frame coding patch
to be an array to handle the first and second reference frames. This patch
fixes the loop filter code that uses the pointer directly as the reference
frame.
Change-Id: I71afa5a49deb50c1bc38029fd07470b984c6dfe9
Code intra/inter, then comp/single, then the ref frame selection.
Use contextualization for all steps. Don't code two past frames
in comp pred mode.
Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
Split partition probabilities between keyframes and non-keyframes,
since they are fairly different. Also have per-blocksize interframe
y intramode probabilities, since these vary heavily between different
blocksizes.
Lastly, replace default probabilities for partitioning and intra modes
with new ones generated from current codec. Replace counts with actual
probabilities also.
Change-Id: I77ca996e25e4a28e03bdbc542f27a3e64ca1234f
This version of the loop filter supports non-4:2:0 subsampling and
a fourth plane, as well as changing the filtering order to be more
friendly to hardware implementations.
The filters are applied first to all vertical edges within the
64x64 SB, followed by the top horizontal edge and any internal
horizontal edges. Since filtering is applied on each 4x4 edge
serially, a dependency is created from filtering one block edge
to the next. It would be possible to remove this dependency by
building all filtering decisions from the unfiltered
reconstruction data.
Change-Id: I08f3e9683eb7bded8a76651cbc50fc0dfdd05fa7
Added structures to support independent rd thresholds
for different block sizes (and set experimental block
size correction factors).
Added structure to allow dynamic adaptation of thresholds
per mode and per block size basis depending on how often
the mode/block size combination is seen (currently fixed factor).
Removed some unused variables.
TODO
- Adaptation of thresholds based on how often each mode is chosen.
- The baseline mode values could also be adjusted based on
the block size (e.g. for a particular intra mode use a low threshold
for 4x4 prediction blocks but a relatively high value for 64x64).
Change-Id: Iddee65ff3324ee309815ae7c1c5a8584720e7568
This avoids encoding tokens for blocks that are entirely
in the UMV border. This changes the bitstream.
Change-Id: I32b4df46ac8a990d0c37cee92fd34f8ddd4fb6c9
This commit makes the coding/reconstruction operations of intra
coding rate-distortion loop for UV components consistent with those
of the encoding process.
key frame coding gains:
derf: 0.11%
stdhd: 0.42%
Change-Id: I8d49f83924a320e3689ef2d60096c49d7f0c7a40
Adds backward adaptation and differential forward updates of switchable
interpolation filter probabilities. Also adds some cosmetic cleanups
and minor fixes on mv_ref probabilities.
derfraw300: +0.353% (with most coming from switchable interp changes)
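For readers unfamiliar with backward adaptation, here is a minimal Python sketch of the general idea: after a frame is coded, the previous probability is blended toward the probability observed in that frame, with more weight the more samples were seen. The function name adapt_prob and the default values of count_sat and max_update_factor are assumptions for illustration, not values taken from this patch.

```python
def adapt_prob(prev_prob, observed_prob, num_samples,
               count_sat=20, max_update_factor=128):
    """Blend an 8-bit probability toward the value observed in the last frame.

    prev_prob, observed_prob: probabilities on a 1..255 scale.
    num_samples: how many symbols contributed to observed_prob.
    count_sat, max_update_factor: illustrative tuning constants; the
    blend weight saturates at max_update_factor / 256.
    """
    factor = max_update_factor * min(num_samples, count_sat) // count_sat
    # Weighted average with rounding, kept in integer arithmetic.
    return (prev_prob * (256 - factor) + observed_prob * factor + 128) >> 8
```

With no samples the probability is unchanged; once the sample count reaches count_sat, the result sits halfway between the old and observed values under these defaults.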
Change-Id: Ie2718be73528c945fd0d80cfd63ca2d9cb3032de
This commit makes operations of the superblock intra coding rate
distortion optimization consistent with those used in the encoding
process. Given the test prediction mode and transform size, the rd
optimizer encodes and reconstructs each transformed block of the
superblock consecutively, then computes the total rate-distortion
costs associated with the current superblock to select the coding
decisions.
It achieves coding performance gains:
derf 0.353%
yt 1.111%
Change-Id: I0da2eb7a71361dfb8c1384927fc536b0c2790d07
Enable iterative motion search for compound inter-inter prediction
of block sizes 4x4/4x8/8x4 only when best coding quality is selected.
The iterative motion search provides about 0.1% gains for derf and
stdhd at this point, at the expense of longer runtime.
Change-Id: Idc03e7f827e51f1bb8d269bc3752ee297a6bbfe5
Migrates costing changes/fixes from the rebalance expt to the head
without the expt on.
Rebased.
Change-Id: I51677d62f77ed08aca8d21a4c9a13103eb8de93f
Results:
derfraw300: +0.126%
Restrict get_matching_candidate() to considering
mvs at 8x8 and larger sizes for last frame case.
This is to reduce the HW load of using vectors down
to the 4x4 level from the previous frame.
Change-Id: I6505e610fd63a4e22d67f136aec7905a01b893ba
This is speed 1. It uses a variance threshold borrowed from static-thresh
to determine the split: any superblock whose variance is greater than
static thresh * quantizer index squared is split. In addition, the
transform size is set to the largest size less than or equal to the
partition size, the sub-pixel filter is set to normal, and only 12 modes
are used in total.
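The two decisions described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the threshold is read as static_thresh * (quantizer index)^2, and the set of transform sizes (4, 8, 16, 32) and the use of the smaller block dimension for rectangular partitions are assumptions.

```python
# Assumed square transform sizes, in pixels.
TX_SIZES = (4, 8, 16, 32)


def should_split(variance, static_thresh, q_index):
    """Split when variance exceeds static_thresh * q_index^2 (assumed reading)."""
    return variance > static_thresh * q_index * q_index


def largest_tx_size(block_width, block_height):
    """Largest transform size that fits inside the partition."""
    limit = min(block_width, block_height)
    return max(size for size in TX_SIZES if size <= limit)
```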
Change-Id: If7a2858ee70f96d1eb989c04fd87a332b147abef
We leave it in rdopt.c as a local define for now - this can be removed
later. In all other places, we remove it, thereby slightly decreasing
the size of some arrays in the bitstream.
Change-Id: Ic2a9beb97a4eda0b086f62c039d994b192f99ca5
It remains as a local define in rdopt.c so we can distinguish between
split and non-split modes in the RD loop, but disappears outside that
scope in the codec.
Change-Id: I98c18fe5ab7e4fbd1d6620ec5695e2ea20513ce9
This commit switches to a new variant of the Walsh-Hadamard Transform
by Tim Terriberry. This new variant has the best compression among a
number of variants that Tim developed.
Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9
Fixes an issue with reducing branch cts in the encoder causing
a drop in performance. The bug was introduced in a previous
clean up patch.
Test: Went back to the offending patch, applied this same fix
to it, and checked that results are identical to the parent
of that patch.
Change-Id: I0bad8e2d930235d0284300fcebf836ceb56f2498
The first 240 coeff positions (15 top-left blocks) are scanned in the
same order as in scatter scan, after that the coeffs are scanned in
"block bands", each band at a time, all coeffs in one band before
moving on to the next band. This brings down the amount of 4x4 coeff
blocks that need to be buffered while scanning, from 15 blocks to 8 blocks.
Change-Id: I478a991d63c48bd5e64d36e59fed7a00c9a651ba
This patch removes the implicit segmentation
experiment from the code base as the benefits
were still unproven as of the bitstream deadline.
Change-Id: I273b99d8d621d1853eac4182f97982cb5957247e
This commit enables iterative motion search for 4x4/4x8/8x4 block
size compound inter-inter prediction.
WIP: borg run testing
Change-Id: I2b318db4a03cdca5a8002b3fa6c0fa89b129288b
Added two flags to the frame header:
intra_only:
Signals that the frame is encoded using only INTRA
coding modes.
reset_frame_context:
Indicates that the coding context specified
in the frame header should be reset to default values before the
frame is encoded/decoded.
Change-Id: I182d46f1f84fb67a13c46ad767f246a38d7861a2
We could remove calling set_scale_factors() since it is also
done in set_refs() right after vp9_decode_mb_mode_mv() call in
decode_modes_b().
Change-Id: I9e62c90ffb770240987cd42815786567261b5d97
This patch changes the coefficient tree to move the EOB to below
the ZERO node in order to reduce the number of bool decodes.
The advantages of moving EOB one step down as opposed to two steps down
in the other parallel patch are: 1. The coef modeling based on
the One-node becomes independent of the tree structure above it, and
2. Fewer context/counter increases are needed.
The drawback is that the potential savings in bool decodes will be
less, but assuming that 0s are much more predominant than 1's the
potential savings is still likely to be substantial.
Results on derf300: -0.237%
Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
For 4x4 blocks valgrind points out the cache was uninitialized.
This resolves the issue by setting it.
Change-Id: I22733000da048643762813a84fbda66d8e4040d2
This commit makes clean-ups in the rate-distortion loop for 4x4,
4x8, and 8x4 block sizes for the use of iterative motion search.
Removed unnecessary use of bmi in handle_inter_mode.
Deprecated loop over labels in the 4x4/4x8/8x4 block rd search.
Change-Id: I71203dbb68b65e66f073b37abd90d82ef5ae6826
This patch checks at the frame level to see if the previous
mode info context can be used. This patch eliminates the
flag check that was done for every mode and removes another
check that was done prior to every vp9_find_mv_refs().
Change-Id: I9da5e18b7e7e28f8b1f90d527cad087073df2d73
Scales for the second reference frame vars are uninitialized if the
second ref frame is one of those disallowed by refframeflags.
Change-Id: I4ce42de391178c1699dcaede18c5f12c84993c61
Proposal for tuning the residual coding by changing how the context
from previous tokens is calculated. Storing the energy class of previous
tokens instead of the token itself eases the critical path of
HW implementations.
Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
Adding API to read/write uncompressed frame header bits (it is not final
yet). Separate functions to read/write uncompressed header. Moving
clr_type, error_resilient_mode, refresh_frame_context,
frame_parallel_decoding_mode, frame_context_idx from compressed partition
to uncompressed frame header.
Change-Id: Id3ed8a387980c652ae147549412f4ec24a0a5bd0
This commit pulls the iterative motion search for compound inter-
inter out from handle_inter_mode_ as a separate function. Hence,
it is applicable to 4x4/4x8/8x4 level compound inter search to be
enabled later.
Also edit the rd loop for 4x4 inter block sizes for cosmetic
purpose.
Change-Id: Ibc71a11cbe5a26cd52faba01026cf8446cf4d2b4
Removed one 4x4 prediction step that was unnecessary in the rd loop.
Removed an unused modecosts estimate from the encoder side.
Change-Id: I65221a52719d6876492996955ef04142d2752d86
1. remove prediction mode conversion
2. unified bmode, same for key and non-key frame
3. set I4X4_PRED count for pdf to 0, as I4X4_PRED is no longer
coded ever. It is determined by ref_frame and block partition
Change-Id: If5b282957c24339b241acdb9f2afef85658fe47d
This commit removes the use of bmi_ in the first-pass encoding by
forcing encode_intra4x4block_ to use DC_PRED, followed by DCT_DCT
only, as John suggested. This reduces the bmi buffer requirement to
at most 4 entries, instead of 16.
Change-Id: I3410007dfae789ee46a09ae20c39d3ce3c7954aa
Hardware implementation needs to load coeff probs based on the
transform size. For selectable transform size, moving these bits
earlier in the bitstream adds some delay giving time to preload
the probs and speeds up the decoding process.
Change-Id: I3bfc1f662ae6f219c9286fe9ae6310c7d8a63ea7
Also do per-partition motion vector referencing in <sb8x8 partitions,
and adjust mvref finding for sub8x8 partitions.
Change-Id: Id3ed1ed4d2a8910d11d327db6cc63b8eb79f941f
This code does not seem to be necessary anymore.
For the 1080p clip used, the decoder performance improved by
~2%.
Change-Id: I66bb0496d4998b0d6c6637c746b642b77bdbef88
1) Added an initialization of rd_tx_select_threshs[].
2) Made the updating of transform size counts consistent.
Change-Id: Iaa9d6c6be825b0364c9d61a9802873d01356815c
As intra coded blocks are always decoded using decode_sb_intra(), this
commit removes the code that is no longer in use.
Change-Id: I09f14fa9cdc875656e8fbe245f72c8fd83b9e31e
The change to make intra coding operate per transform block, i.e.
pred -> txfm -> quant -> dequant -> itxfm -> recon, made all blocks within
a prediction unit behave consistently, so there is no longer a need to
handle blocks differently based on their position within a prediction
block. This commit therefore simplifies the decision of transform type to
be based on prediction mode only.
Change-Id: If96cb72386f2e9186126ace88afa35ef085b6c96
This commit refactors the iterative motion search for compound
inter-inter mode, to make it support all partition types including
4x4/4x8/8x4 block sizes.
Change-Id: I5f1212b0f307377291763e45c6bdc9693b5f04c8
Move 4x4/4x8/8x4 partition coding out of experimental list.
This commit fixed the unit test failure issues. It also resolved
the merge conflicts between 4x4 block level partition and iterative
motion search for comp_inter_inter.
Change-Id: I898671f0631f5ddc4f5cc68d4c62ead7de9c5a58
Reverts to using 128 bit LUT for the coef models rather than 48
to ease hardware implementation.
Also incorporates some cleanups including removing various
hooks to support different lookup tables based on block_type and
ref_type.
Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6
This commit changed the encoding and decoding of intra blocks to be
based on transform block. In each prediction block, the intra coding
iterates through each transform block based on raster scan order.
This commit also fixed a bug in D135 prediction code.
TODO next:
The RD mode/txfm_size selection should take this into account when
computing RD values.
Change-Id: I6d1be2faa4c4948a52e830b6a9a84a6b2b6850f6
This commit makes the rate-distortion optimization of intra coding
capable of supporting 8x4 and 4x8 partition settings.
It enables the entropy coding of intra modes in key frame using a
unified contextual probability model conditioned on its above/left
prediction modes.
Coding performance:
derf 0.464%
Change-Id: Ieed055084e11fcb64d5d5faeb0e706d30268ba18
The API is not final yet and can be changed. Actual layout of
uncompressed frame part will be finalized later. Right now moving
clr_type, error_resilient_mode, refresh_frame_context,
frame_parallel_decoding_mode from first compressed partition to
uncompressed frame part.
Change-Id: I3afc5d4ea92c5a114f4c3d88f96858cccc15b76e
Uses more aggressive interpolation to reduce storage for the
model tables by more than half. Only 48 lists of probs are
stored (as opposed to 128 before), corresponding to ONE_NODE
probabilities of:
1,
3, 7, 11, ..., 115, 119,
127, 135, ..., 247, 255.
Besides, only 1 table is used as opposed to 2 before. So the overall
memory needed for the tables is just 48 * 8 = 384 bytes.
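The count of 48 and the 384-byte total can be checked with a short Python sketch that expands the ONE_NODE probability list given above. The function names are hypothetical, and reading the factor of 8 as eight one-byte probabilities per list is an assumption.

```python
def one_node_probs():
    """Expand 1, 3, 7, ..., 119, 127, 135, ..., 255 into an explicit list."""
    fine_steps = list(range(3, 120, 4))      # 3, 7, 11, ..., 115, 119
    coarse_steps = list(range(127, 256, 8))  # 127, 135, ..., 247, 255
    return [1] + fine_steps + coarse_steps


def table_bytes(entries_per_list=8):
    """Total storage, assuming one byte per stored probability."""
    return len(one_node_probs()) * entries_per_list
```

The list has 1 + 30 + 17 = 48 entries, so table_bytes() gives 48 * 8 = 384.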
The table currently used is based on a new Pareto distribution with
heavier tail than a generalized Gaussian - which improves results on
derf by about 0.1% over a single table Generalized Gaussian.
The overall result on derfraw300 is -0.14%.
Change-Id: I19bd03559cbf5894a9f8594b8023dcc3e546f6bd
Cleans up the experiment. Actually uses reduced counts for backward
updates, and reduced number of probabilities in the context.
No change in bitstream when the experiment is on.
Between expt on and off:
derfraw300 is down only -0.062% (which is better than when expts
were run previously).
Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
The new code is 0x49, 0x83, 0x42
There is nothing particularly special about this code bitstream wise.
Its derivation is the word "sync" coded using 4x6bit alphabetic indices.
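The derivation can be reproduced with a few lines of Python. This sketch assumes zero-based alphabet indices (a = 0) packed most significant bits first, which is the reading that yields the three bytes quoted above; the function name sync_code is hypothetical.

```python
def sync_code(word="sync"):
    """Pack each letter's zero-based alphabet index into 6 bits, MSB first."""
    bits = 0
    for ch in word:
        bits = (bits << 6) | (ord(ch) - ord("a"))
    num_bytes = len(word) * 6 // 8  # 4 letters * 6 bits = 24 bits = 3 bytes
    return bits.to_bytes(num_bytes, "big")
```

For "sync" the indices are 18, 24, 13 and 2, i.e. 010010 011000 001101 000010, which regroups into the bytes 0x49, 0x83, 0x42.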
Change-Id: Ie2430a854af32ddc5a5c25a6c1c90cf6497ba647
The recursive partition type search is enabled down to 4x4, 4x8 and
8x4, followed by the corresponding rate-distortion optimization for
the per-partition encoding mode decisions.
The bit-stream writing/reading is synchronized to support the
rectangular partitions of an 8x8 block.
This provides above 1% coding performance gains on derf.
To do next:
1. re-design the rate-distortion loop for inter prediction below 8x8.
2. re-design the rate-distortion loop for intra prediction below 4x4.
3. make the loop-filter aware of rectangular partition of 8x8 block.
4. clean the unused probability models.
5. update default probability values.
Change-Id: Idd41a315b16879db08f045a322241f46f1d53f20
Correct the stride parameter of 4x8 in vp9_sub_pixel_variance4x8_
and vp9_sub_pixel_avg_variance4x8.
Change-Id: I2ca74d4043817503b21737563994270e3b0619ff
Replace vp9_kf_default_bmode_counts structure with
direct default probabilities. The probability structure is
smaller and it removes the need to specify in the bitstream
how to convert the counts to probabilities.
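To show why storing counts requires an agreed conversion rule, here is a minimal Python sketch of one plausible count-to-probability mapping for a binary decision. The rounding, the 1..255 clamp, and the fallback of 128 for empty counts are assumptions for illustration; any such rule would have to be specified exactly if counts were kept in the bitstream definition.

```python
def clip_prob(p):
    """Keep a probability inside the usable 8-bit range 1..255."""
    return max(1, min(255, p))


def counts_to_prob(count0, count1):
    """Probability (out of 256) of branch 0, from two observation counts."""
    total = count0 + count1
    if total == 0:
        return 128  # no data: assume an even split
    return clip_prob((count0 * 256 + total // 2) // total)
```

Storing the resulting probabilities directly, as this patch does, makes the choice of rounding and clamping irrelevant to the decoder.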
Note that I have concerns still about the size and value of
the large intra mode context. This may cause problems for
HW but it also means we rely heavily on reverse update as
forwards update of a structure this size is problematic. I
intend to review this more generally in the next few days to
see if we can come up with a competitive solution that does
not rely on such a large context.
Change-Id: I0a36071079d5d26a57ab0e9fbf91af4199aa7984
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:
- Add extra channel to all mismatch tests, PSNR, SSIM, etc
- Configurable subsampling
- Variable number of planes (currently always uses all 4)
- Loop filtering
- Per-plane lossless quantizer
- ARNR support
This implementation just uses the same contents as the Y channel
for the A channel, due to lack of content and general pain in
playing back 4 channel content. A later patch will use the actual
alpha channel passed in from outside the codec.
Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
Deprecate set_block_index. Replace it with get_sb_index_ for
consistency with partition search and bit-stream writing/reading.
Use b_width/height_log2 instead of mi_width/height_log2, to support
4x4 resolution partition types.
Change-Id: Ic1e71981e163c669f7ea6b3c12b831c284c4a494
Replace mi_width/height_log2 with b_width/height_log2 in partition
type parsing at bit-stream writing stage. This allows parsing
resolution at 4x4 block level and makes the 4x4/4x8/8x4 partition
coding consistent with other superblock types.
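The reason the b_ units are needed can be seen in a small Python sketch. It assumes b_ sizes are measured in 4-pixel units and mi_ sizes in 8-pixel units; the helper names below are hypothetical stand-ins, not the codec's functions, and they raise an error where the real code might behave differently.

```python
def _log2_in_units(pixels, unit):
    """log2 of a block dimension measured in steps of `unit` pixels."""
    steps, remainder = divmod(pixels, unit)
    if remainder or steps < 1 or steps & (steps - 1):
        raise ValueError(f"{pixels} px is not a power-of-two multiple of {unit} px")
    return steps.bit_length() - 1


def b_size_log2(pixels):
    """Dimension in 4-pixel units (assumed meaning of the b_ prefix)."""
    return _log2_in_units(pixels, 4)


def mi_size_log2(pixels):
    """Dimension in 8-pixel units (assumed meaning of the mi_ prefix)."""
    return _log2_in_units(pixels, 8)
```

In this sketch a 4-pixel dimension is representable as b_size_log2(4) == 0 but has no value in 8-pixel units, which is why 4x4/4x8/8x4 partitions need the finer unit.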
Change-Id: I7db3617ea042e0db2dc898999b0c323bff91a22f
Test on cif set showed small but consistent compression gain for
almost all encodings with overall impact of .08%. The gains average
around .12% combined with D63 adst change.
Test encoding on std-hd set is ongoing.
Change-Id: If4d94799cf0486fb9c770b193e5c386d13d99d59
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
These building blocks enable rate-distortion optimization search
over block sizes of 8x4 and 4x8. Need to convert them into mmx/sse
forms.
Change-Id: I570ea2d22d14ceec3fe3575128d7dfa172a577de
This patch creates a new inter mode context that avoids
a dependence on the reconstructed motion vectors from
neighboring blocks. This was a change requested by
a hardware vendor to improve decode performance.
As part of this change I have also made some modifications
to stats output code (under a flag) to allow accumulation of
inter mode context flags over multiple clips
Some further changes will be required to accommodate the
deprecation of the split mv mode over the next few days.
Performance as stands is around -0.25% on derf and
std-hd but up on the YT and YT-HD sets. With further tuning
or some adjustment to the context criteria it should be
possible to make this change broadly neutral.
Change-Id: Ia15cb4470969b9e87332a59c546ae0bd40676f6c
Adds a subsampling aware border extension function. This may be reworked
soon to support more than 3 planes.
Change-Id: I76b81901ad10bb1e678dd4f0d22740ca6c76c43b
This commit allows proper transform type (DCT/ADST) selection in
the settings of partition 4x4 level.
Change-Id: Iec6f922a46480d777e7ca9142a99e8c131f0077b
Always initialize the mode_info with sb_type of BLOCK_SIZE_MB16X16
for the first-pass encoding test.
Change-Id: Ic86393eeef981bdd523a5b44cfac3f0b24c068b7
Combining encode_nmv_component with encode_nmv_component_fp
and read_nmv_component with read_nmv_component_fp. Bitstream is slightly
changed (only the order of bits), here are the results on test sets:
stdhd: +0.047, yt: -0.038, derf: +0.001, hd: -0.011.
Change-Id: I1be312e976796df78ca63368702d0ee19f2b8c50
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
Trial use of a combination of reference frame,
prediction block size and mv to define segmentation.
Change-Id: Ie8946a0446dbad777fdcf7626f89e5af0994db50
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
This commit allows the rate-distortion optimization recursion
at encoder to go down to 4x4 block size. It deprecates the use
of I4X4_PRED and SPLITMV syntax elements from bit-stream
writing/reading. Will remove the unused probability models in
the next patch.
The partition type search and bit-stream are now capable of
supporting the rectangular partition of 8x8 block, i.e., 8x4
and 4x8. Need to revise the rate-distortion parts to get these
two partition tested in the rd loop.
Change-Id: I0dfe3b90a1507ad6138db10cc58e6e237a06a9d6
Allow motion search to run multiple times iteratively, and break out of
the loop if a search pass cannot find better motion vectors.
Limit the maximum number of searches to 2.
Test results:
1. stdhd set: 0.311%(overall psnr); 0.346%(ssim).
positive gain on 10 out of 16 clips(best: 2.746% on sunflower;
worst: -0.434% on old_town_cross).
2. derf set: 0.016%(overall psnr); 0.062%(ssim).
positive gain on half of the clips(best: 0.499% on bowing;
worst: -0.387 on city).
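The control flow of the iterative search (repeat, stop early when no better vectors are found, cap at 2 passes) can be sketched in Python as below. This is a toy under stated assumptions: motion vectors are single integers, each pass refines one vector while holding the other fixed using a +/-1 neighbourhood, and the cost function is supplied by the caller. The names refine and joint_search are hypothetical.

```python
def refine(mv, cost_of):
    """Return the cheapest of mv and its +/-1 neighbours (ties keep mv)."""
    return min([mv, mv - 1, mv + 1], key=cost_of)


def joint_search(mv0, mv1, cost, max_iters=2):
    """Alternately refine two motion vectors; stop when a pass does not help."""
    best = cost(mv0, mv1)
    for _ in range(max_iters):
        new0 = refine(mv0, lambda m: cost(m, mv1))   # hold mv1 fixed
        new1 = refine(mv1, lambda m: cost(new0, m))  # hold new0 fixed
        new_cost = cost(new0, new1)
        if new_cost >= best:
            break  # this pass found nothing better
        mv0, mv1, best = new0, new1, new_cost
    return mv0, mv1, best
```

The cap on passes bounds the extra encoder time, which matches the trade-off between gain and runtime reported above.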
Change-Id: Ibf0a51776d4caf7707be0586346db08128117559
Change band calculation back to simpler model based
on the order in which coefficients are coded in scan order
not the absolute coefficient positions.
With the scatter scan experiment enabled the results
appear broadly neutral on derf (-0.028) but up a little on std-hd (+0.134).
Without the scatter scan experiment on, the results were up on derf as well.
Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
Move set_partition_seg_context_ to common file. Use consistent
context setup conditions for partition probability model update at
encoder and decoder.
Change-Id: I24b7ed3b1c48e3d2568191a46b70136b99b67b1a
Use 4x4 block coding for UV components arbitrarily in I4X4_PRED and
SPLITMV coding modes. This is a temporary solution to enable
bit-stream support for recursive partition down to 4x4 block size.
Will separate the functionalities of 4x4 block coding rate-distortion
out from those of superblocks.
Change-Id: I03dc15d5897014f175f3f2c91e9b266091d56797
In current code, motion vectors obtained from single prediction mode are used
in compound prediction mode directly. These motion vectors may not give
accurate prediction since they are searched independently. In this patch,
we took Pascal's suggestion, and did joint motion search in compound
prediction mode to find better motion vectors in this situation.
Test results:
Overall PSNR: 0.570%(derf), 0.918%(stdhd);
SSIM: 0.572%(derf), 1.009%(stdhd);
The encoder is a little slower. This can be improved since some c
code is used in motion search.
Change-Id: Ib30c9240f6c56c9b070867b4ca89412a76d9f3c6
This commit enables the search for the optimal superblock
partition types in the recursion form. The intention is to
make the optimization process more concise and ready to
support partition down to 4x4 block size next.
Change-Id: Iae279a67df3a7cc372553c84c775bc4d2f3e4336
The previous code was somewhat vestigial for 16x16 MI units, but was
incorrect when called with chroma blocks larger than 4x4 because the
block index caused a reference to a non-existent BMI. This patch uses
the same MV for all chroma subblocks in SPLITMV mode, which is
suboptimal for non-4:2:0 subsamplings, but as SPLITMV may be removed
in the near future, will use this as a stop gap.
Change-Id: I3211cee5ccf1cfb426e5eef5353b0ce5bb92b4cd
Make framebuffer allocations according to the chroma subsampling
factors in use. A bit is placed in the raw part of the frame header for
each of the two subsampling factors. This will be moved in a future
commit to make them part of the TBD feature set bits, probably only set
on keyframes, etc.
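As an illustration of how the two subsampling bits could drive plane dimensions, the following Python sketch computes a chroma plane size from horizontal and vertical subsampling factors. The function name plane_size, the parameter names ss_x and ss_y, and the round-up behaviour for odd dimensions are assumptions; alignment and border padding are ignored.

```python
def plane_size(width, height, ss_x, ss_y):
    """Chroma plane dimensions for subsampling shifts ss_x, ss_y (0 or 1).

    Odd luma dimensions are rounded up so no pixel column or row is lost.
    """
    return (width + ss_x) >> ss_x, (height + ss_y) >> ss_y
```

With both bits set this gives 4:2:0, with only ss_x set 4:2:2, and with neither set 4:4:4.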
Change-Id: I59ed38d3a3c0d4af3c7c277617de28d04a001853
The chroma planes are not used during the first pass encode,
but the vp9_encode_sb() function was operating on them anyway.
This was causing the use of uninitialized memory.
Change-Id: I5ebafcd3d5e34ed91a8336dad159b573995a939f
Update and buffer left/above partition information context per 8x8
block. This makes it possible to enable recursive partition down to
4x4 block size, and hence to deprecate I4X4_PRED and SPLITMV.
This commit also fixes a context buffer swap/restore issue in 32x32
partition type search. This gives 0.1% performance gain for derf/yt.
Will refactor the superblock partition type search into recursion
form.
Change-Id: Ib61975aca5f12b78d8018481d7fa1393d085689b
Makes the temporary storage of the filtered data agnostic to
the number of planes and how they're subsampled.
Change-Id: I12f352cd69a47ebe1ac622af30db29b49becb7f4
Skip Q values between the q.0 mode and a real q of
2.0 as these are not valuable from an RD perspective.
Change-Id: I110c4858c57f97315953f4d88a2596d4764360df
There is only one instance of these structures, no need for them
to be allocated separately on the heap.
Change-Id: I1333cc92d06bbe21be643c2b2f0e3936f0264cac
This setup is now handled by vp9_build_intra_predictors()
when left_available and/or up_available is zero.
Change-Id: I59cec0ab95f8be69ce885fd20727510e4deef8a0
Iterating over all planes in the loop instead of custom y,uv code inside
handle_inter_mode function.
Change-Id: I301f9276d6d544c2fd7203d84f1318ac80ea625d
Disable the use of scaled reference frame for motion search in
SPLITMV mode. This fixes the unit test failure issue triggered
when merging sb8x8 from experimental list.
Change-Id: I02ac25fd8db8d5762f8fee29513b947189875fa0
This allows removing a large number of transform size specific functions,
as well as supporting 444/alpha by routing all code through the
subsampling-aware path.
Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
The number of reference buffers is extended to 8 and
a reference sign-bias added for the LAST_FRAME.
Whilst the number of reference buffers used by an
individual frame remains unchanged at 3, these may
now be selected from 8 possible buffers.
Change-Id: I2d247b9c1c2b3a339d6c9fac125e81ba373f75a7
With this, encoder/decoder appear to match with sb8x8 experiment.
Needs some larger-scale testing.
Change-Id: I44d3cac37b3c98264985ed0a0fc763c30089aa64
Creates a common encode (subtract, transform, quantize, optimize,
inverse transform, reconstruct) function for all sb sizes, including
the old 16x16 path.
Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007
Don't allow i4x4 except for sb8x8 recursion step. Read only 4 (not 16)
i4x4 submodes if we are i4x4.
Change-Id: Iaaaced1a134006b2c96eed66f014300eae41e0ed
If a reference frame is inter, the only valid modes would
be inter modes. This check is unnecessary.
Change-Id: Ib8433ab5a3418f94149ee4e3062d48d7740d225a
This doesn't affect the output, because in previous cases where the
values were uninitialized, this was because the mb_row/col is outside
the codable area, and thus encode_sb will test them for the next
decomposition-level, but return right after that on size-check. All
this does is prevent a warning in valgrind.
Change-Id: I90d8a29e6f8ebb2b0143684e08fe77ae3a0816b1
The decode_mb only carries I8X8_PRED decoding, which will be covered
by the regular MB intra modes when SB8X8 is on. To be removed later.
Change-Id: I3b9ee55917a30b42518b81987bc10c22b1a19e7f
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction
Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
Changing the order of probabilities inside mb_segment_tree_probs in order
to use treed_read/treed_write function instead of custom code.
Change-Id: I843487d5057913b9358db73da270893eefecc6c8
Moving common code from encoder and decoder to vp9_get_qindex function.
Also moving quant-related constants from vp9_onyxc_int.h to
vp9_quant_common.h.
Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
Separate the decoding process of 4x4 block based coding (both intra
and inter) from decode_mb and move it into decode_atom_. This makes it
possible to move the remaining per-16x16-block decoding of decode_mb into
decode_sb, and hence eventually to deprecate decode_mb when SB8X8 is
enabled.
Change-Id: I678cb8007d8a57b792d7a23020edb0c74fbf4237
Separate the functionality of I4X4_PRED from decode_mb. Use
decode_atom_intra instead, to enable recursive partition of superblock
down to 8x8.
Change-Id: Ifc89a3be82225398954169d0a839abdbbfd8ca3b
Also fixed two minor subtle boundary conditions in intra prediction
code, and replaced memcpy/memset with vpx_ prefixed version.
Change-Id: I9cddff3be831228b628f1f2f065a61feacbcbee6
This commit switches to using the same intra prediction function for all
block sizes.
Some details on the changes:
1. All directional modes except DC/TM/V/H now have built-in filtering
for all pixels with filter taps either (1, 2, 1)/4 or (1, 1)/2.
2. The above edge is automatically extended to double width (bw*2), which
makes much of the prediction mode computation simpler.
3. Same intra prediction function is called with different size
for i4x4_pred and all other larger size.
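The two filter kernels named in item 1 can be written out as integer operations. This Python sketch assumes round-to-nearest via an added half before the shift; the function names filter3 and filter2 are hypothetical.

```python
def filter3(a, b, c):
    """(1, 2, 1)/4 smoothing of three neighbouring edge pixels, with rounding."""
    return (a + 2 * b + c + 2) >> 2


def filter2(a, b):
    """(1, 1)/2 average of two neighbouring edge pixels, with rounding."""
    return (a + b + 1) >> 1
```

Both kernels have taps that sum to the divisor, so a flat edge stays unchanged and 8-bit inputs cannot overflow the 8-bit output range.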
Overall, the change helped keyframe only coding for both cif size
and std-hd size test sets by .5% consistently on all encodings.
For normal coding with single/auto key frame, the change now also
is consistently net positive for all encodings. The overall gain
is about .15% on std-hd set.
Change-Id: I01ceb31fbc73d49776262e6bdc06853b03bbd1d1
Updates the tokenizer to use the common block walker used by the
detokenizer, to support non-4:2:0 and more than 3 planes.
Change-Id: If1854117a9c7c1427349209fa2b3051ce6459dcb
This doesn't change output, because the argument isn't actually used
ATM. However, we should fix it for consistency.
Change-Id: I7b7326a8e92c0d411c999ec2c781204b516ed53d
Unify the tokenize_ function and enable configurable block size for
superblock 8x8. We are migrating the functionalities of
macroblock handles into superblock ones, and eventually will remove
encode_mb and decode_mb. To be continued on detokenize_ module.
Change-Id: I9f81e8c2291082535cf5e0c4b662eb24fb7c8a7f
Output changes slightly because of a minor bug in (at least) the sb32x16
block2above tx16x16 tables that previously existed in vp9_blockd.c.
Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
Turns model based reverse updates on for coefficients in an
effort to reduce the memory requirement for counters.
With this patch the counters needed will be reduced by about
75% since only 3 counts are needed instead of 12.
The impact in performance is:
derf300: -0.252%
stdhd250: -0.046%
However retraining should alleviate some of the drop in
performance.
Change-Id: I6f2b3e13f6d5520aa3400b0b228fb5e8b4a43caa
First patch to make sb decoding based on the transform size. This patch
works for the sb modes; combining the parts of decode_mb that fit
into this framework will come as a second patch.
Change-Id: I26123416a7a87e096bbdb5eb944ce5bb198384f8
Conflicts:
vp9/common/vp9_findnearmv.c
vp9/common/vp9_rtcd_defs.sh
vp9/decoder/vp9_decodframe.c
vp9/decoder/x86/vp9_dequantize_sse2.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_common.mk
Resolve file name changes in favor of master. Resolve rdopt changes in
favor of experimental, preserving the newer experiments.
Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I7ff2fa72d22c29163eb558981c8193765a8113d9
The previous commit 15255ee "Move dequant from BLOCKD to per-plane MACROBLOCKD"
removed the vp9_asm_enc_offsets.c file, but didn't update the various secondary
build systems that special case these files. Restore it for now, to ensure any
in-progress changes and builds continue working, to allow time to more carefully
coordinate removal of these files.
Change-Id: I24b78db3fb874d5fbd226548b7366a05ed98e536
This originally was "Removed update_blockd_bmi()". Now,
this patch removes bmi from blockd and uses the bmi found
in mode_info_context. This eliminates unnecessary bmi copies between
blockd and mode_info_context.
Change-Id: I287a4972974bb363f49e528daa9b2a2293f4bc76
vp9_dequantize_x86 has only sse2 functions.
vp9_dct_sse2_intrinsics has no namespace collision and can drop
_intrinsics.
vp9_idct_mmx.h is unused.
Change-Id: Ic16e31fb372a1d1e841a62ecb4189fe8f95808ec
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.
Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
The quantizer can vary per-plane, and the dequantization vector is
available in the per-plane part of MACROBLOCKD. The previous code would
incorrectly use the Y quantizer for the whole macroblock.
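A minimal sketch of the per-plane lookup, assuming a two-entry dequantization
vector per plane (entry 0 for DC, entry 1 for AC). The struct and function
names are hypothetical and do not reproduce the actual MACROBLOCKD layout.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_MAX_PLANES = 3 }; /* Y, U, V; hypothetical constant */

/* Hypothetical per-plane state: dequant[0] for DC, dequant[1] for AC. */
struct plane_q {
  int16_t dequant[2];
};

/* Dequantize one coefficient with the vector of the plane it belongs to,
 * instead of always using the Y plane's vector. */
static int dequant_coeff(const struct plane_q *planes, int plane,
                         int coeff_idx, int qcoeff) {
  assert(plane >= 0 && plane < SKETCH_MAX_PLANES);
  return qcoeff * planes[plane].dequant[coeff_idx != 0];
}
```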
Change-Id: I3ab418aef9168ea0ddcfa4b7c0be32ae48b536d7
The underlying storage for these buffers is in the per-plane MACROBLOCKD
area, so read it from there directly.
Change-Id: Id6bd835117fdd9dea07db95ad06eff9f12afaaf7
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I593fb0715e74cd84b48facd1c9b18c3ae1185d4b
Remove similarly named header file. It is obsolete.
Move file to match naming style.
Adjust make file to include the file correctly and remove extra
unnecessary #if guard.
Change-Id: Ifba07ba9938a5df08a9f4eda54a3ac4d6983f7bf
Using ALLOWED_REFS_PER_FRAME constants instead of hard coded 3, replacing
memcpy with plain struct assignment.
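The pattern can be sketched as follows. The struct is a hypothetical stand-in
for whatever per-reference data is copied; only the constant name and its
value of 3 come from the description above.

```c
#include <assert.h>

#define ALLOWED_REFS_PER_FRAME 3 /* replaces the hard-coded 3 */

/* Hypothetical per-reference data; the real struct is not shown here. */
struct ref_scale {
  int x_scale;
  int y_scale;
};

/* Copy all reference entries. Plain struct assignment lets the compiler
 * check the types, unlike memcpy(&dst[i], &src[i], sizeof(dst[i])). */
static struct ref_scale *copy_refs(struct ref_scale *dst,
                                   const struct ref_scale *src) {
  int i;
  assert(dst != 0 && src != 0);
  for (i = 0; i < ALLOWED_REFS_PER_FRAME; ++i) dst[i] = src[i];
  return dst;
}
```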
Change-Id: Ibc86f5d175fcb3f3a3eddacf593525370f1f854c
Functions set_mb_row() and set_mb_col() do similar work and are always
called together. This commit merges them into a single function for
clarity and easier maintenance. This was a TODO item.
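A sketch of what a merged helper might compute, under stated assumptions: the
name set_mb_row_col, the result struct, the 16-pixel macroblock size, and the
1/8-pel edge units are all illustrative and are not taken from this commit.

```c
#include <assert.h>

enum { MB_SIZE = 16, SUBPEL = 8 }; /* assumed: 16-pixel MBs, 1/8-pel units */

/* Hypothetical result type holding what the two functions set between them. */
typedef struct {
  int mb_to_top_edge, mb_to_bottom_edge;
  int mb_to_left_edge, mb_to_right_edge;
  int up_available, left_available;
} mb_edges;

/* One call derives both the row and the column state for a block that is
 * bh x bw macroblocks in size at position (mb_row, mb_col). */
static mb_edges set_mb_row_col(int mb_rows, int mb_cols, int mb_row,
                               int mb_col, int bh, int bw) {
  mb_edges e;
  assert(mb_row >= 0 && mb_row + bh <= mb_rows);
  assert(mb_col >= 0 && mb_col + bw <= mb_cols);
  e.mb_to_top_edge = -(mb_row * MB_SIZE * SUBPEL);
  e.mb_to_bottom_edge = (mb_rows - bh - mb_row) * MB_SIZE * SUBPEL;
  e.mb_to_left_edge = -(mb_col * MB_SIZE * SUBPEL);
  e.mb_to_right_edge = (mb_cols - bw - mb_col) * MB_SIZE * SUBPEL;
  e.up_available = mb_row > 0;
  e.left_available = mb_col > 0;
  return e;
}
```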
Change-Id: I956bd9ed6afb8b2b0469b20fd8bc893b26f8a0f3
This commit enables selecting probability models for recursive block
partition information syntax, depending on its above/left partition
information, as well as the current block size. These conditional
probability models are reasonably stationary and consistent across
frames, hence the backward adaptive approach is used to maintain and
update the contextual models.
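To make the idea concrete, here is a minimal sketch of context selection and
backward adaptation. The context layout (four contexts per block size level),
the 8-bit probability scale, and the equal-weight blend are assumptions for
illustration, not the values used by this commit.

```c
#include <assert.h>

enum { CTX_PER_LEVEL = 4 }; /* assumed: 2 bits from above/left per size level */

/* Pick a probability model index from the block size level and from
 * whether the above and left neighbours were partitioned. */
static int partition_context(int size_level, int above_partitioned,
                             int left_partitioned) {
  assert(size_level >= 0);
  return size_level * CTX_PER_LEVEL + 2 * (left_partitioned != 0) +
         (above_partitioned != 0);
}

/* Backward adaptation: after a frame is coded, blend the previous
 * probability (out of 256) with the one observed in that frame's counts. */
static int adapt_prob(int prev_prob, unsigned count0, unsigned count1) {
  const unsigned total = count0 + count1;
  int observed;
  if (total == 0) return prev_prob; /* nothing observed, keep the model */
  observed = (int)((256u * count0 + total / 2) / total);
  if (observed < 1) observed = 1;
  if (observed > 255) observed = 255;
  return (prev_prob + observed + 1) >> 1; /* assumed equal-weight blend */
}
```

Because the update uses only counts that the decoder can also gather, no
probability deltas need to be sent in the bitstream.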
It achieves coding performance gains (on top of enabling rectangular
block sizes):
derf: 0.242%
yt: 0.391%
hd: 0.376%
stdhd: 0.645%
Change-Id: Ie513d9673337f0d27abd65fb566b711d0844ec2e
This minor tweak makes segment 0 neutral and used by
key frames and also extends beyond 4 segments.
Change-Id: Ife4744602aba66ac9432746db3113cc5cd88a482
Also some further simplification following removal
of top node code.
There is an issue with the shared file vp8cx.h
regarding the roi_map, as this interface assumes that
there are only 4 segments. I have left the value here as
4 for now, meaning that the roi_map interface is broken
for VP9.
Note that this change would have been easier if I hadn't
had to search for hard-wired instances of the number 4
and <= 3.
Change-Id: Ia8b6deea4be4dbd20deb1656e689dd43a5f190e8
Remove top node optimization.
The improvement this gives is not sufficient to justify
the extra complexity.
Change-Id: I2bb4a12a50ffd52cacfa4a3e8acbb2e522066905
Quantizers can vary per plane, but not per block. Move these values to
the per-plane part of MACROBLOCK.
Change-Id: I320a55e38b7b28b29aec751a4aca5ccd0c9b9326
This commit moves the coeff storage from the MACROBLOCK struct to its
per-plane part. The next commit will remove the coeff member from the
BLOCK structure so that it is consistently accessed per-plane.
Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
variable subsampling aware.
Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
This commit enables rectangular block prediction of compound
inter-intra mode. It combines the mb/sb32/sb64 prediction functions
into a unified version with configurable block width and height.
This fixes the enc/dec mismatch of the codebase when
comp-interintra-pred is enabled.
Change-Id: I1d0db2f1f184007802df04fcd12b9dadb3189ff0
Clean Windows build warnings:
warning C4028: formal parameter <N> different from declaration
This was fixed independently in master and experimental but the fixes
were in opposite directions. One added const to the declaration and the
other removed it from the implementation.
Also update the variable names. This doesn't modify the data so call it
ref, matching the functions in the vicinity, rather than dst.
Change-Id: I2ffc6b4a874cb98c26487b909d20a5e099b5582c
Fix warning on Windows: signed/unsigned mismatch on lines 415, 454.
The comparison was between size_t data_sz >= int index_sz on 415 and
unsigned int data_sz >= int index_sz on 454. Both might be changed to
size_t, but tracing and replacing all comparisons is
outside the scope of this change.
In the rest of these two functions ensure unsigned values are used
consistently.
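For reference, one common way to silence this class of warning without
changing the variable types is shown below. The function name is hypothetical
and this is not necessarily the fix applied on lines 415 and 454.

```c
#include <assert.h>
#include <stddef.h>

/* Compare an unsigned size against a signed size safely: reject negative
 * values first, then cast so both operands are unsigned. */
static int has_room(size_t data_sz, int index_sz) {
  return index_sz >= 0 && data_sz >= (size_t)index_sz;
}
```

Without the explicit check, a negative index_sz would be converted to a very
large unsigned value and the comparison would quietly fail.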
Change-Id: I922b399ceca612a92f44b9d1d331c1c6bae9d768
Also use explicitly named enum values in sb_type comparisons, rather
than relying on absolute integer values, because enum values may
change in the future.
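A small sketch of the difference, using an illustrative enum whose names and
order are assumptions rather than the real sb_type definition:

```c
#include <assert.h>

/* Illustrative block size enum; names and order are assumptions. */
typedef enum {
  SKETCH_BLOCK_MB16X16,
  SKETCH_BLOCK_SB32X32,
  SKETCH_BLOCK_SB64X64,
  SKETCH_BLOCK_TYPES
} sketch_block_size;

/* Fragile form: `sb_type > 0`, which silently depends on MB16X16 being 0.
 * Named form: still relies on enum ordering, but not on any absolute
 * integer value, so inserting a smaller size later does not break it. */
static int is_superblock(sketch_block_size sb_type) {
  assert(sb_type < SKETCH_BLOCK_TYPES);
  return sb_type >= SKETCH_BLOCK_SB32X32;
}
```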
Change-Id: I72d42971a98157af93413a25ac2c7e6f9b369cec
First in a series of commits making certain MACROBLOCK members
addressable per-plane. This commit also refactors the block subtraction
functions vp9_subtract_b, vp9_subtract_sby_c, etc to be
loops-over-planes and variable subsampling aware.
Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3
This version of speed 1 only disables modes at higher resolution that
had distortions >2x the best mode we found...
The hope is that this could be a replacement for speed 0 ...
Change-Id: I7421f1016b8958314469da84c4dccddf25390720
Mostly for cleanup purposes. Now we should be able to rework
the encoder/decoder to use a common idct/add function.
Change-Id: I1597cc59812f362ecec0a3493b6101a6cc6fa7ff
This fixes an intermittent mismatch issue caused by moving
the lossless mode decoding bit to after the loop filter
setup information. We need to ensure that the lossless bit
is decoded prior to loop filter setup.
Change-Id: I3faa3fff8e1013b7405dac91268350e059ed121e
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.
This cleans out various other sub-experiments that were part
of the original patch, which will be later included if found
useful.
Results are slightly positive on all sets (0.1 - 0.2% range).
Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
Removes the redundant dst pointers from vp9_build_inter_predictors_sb{y,uv}
and the remaining mb specific functions.
Change-Id: I7b6bf439d9394b85ea79b4fe61a3ffc1025720da
Use real pixels wherever they are available before falling back to the
assumed values 127 and 129.
This also makes DC_PRED for i4x4 consistent with DC_PRED for larger
blocks.
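The fallback can be sketched as below. Which of the two assumed values applies
to the above edge and which to the left edge is an assumption here (127 for
above, 129 for left), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pairing of fallback values with edges. */
enum { ASSUMED_ABOVE = 127, ASSUMED_LEFT = 129 };

/* Return the real above pixel when the edge exists, else the assumed value. */
static uint8_t above_or_assumed(const uint8_t *above, int available, int i) {
  assert(i >= 0);
  return available ? above[i] : (uint8_t)ASSUMED_ABOVE;
}

/* Return the real left pixel when the edge exists, else the assumed value. */
static uint8_t left_or_assumed(const uint8_t *left, int available, int i) {
  assert(i >= 0);
  return available ? left[i] : (uint8_t)ASSUMED_LEFT;
}
```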
Change-Id: I54372924826118da023f402c802ac6ce0caa70c3
First in a series of commits moving the framebuffer pointers to
per-plane data, so that they can be indexed numerically rather than
by name.
Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
For 1080 material, this buffer is currently 2,270,928 bytes. This patch swaps
ptrs instead of copying and uses the last show_frame flag instead of setting
the entire buffer to zero. For the test clip used, the decoder improved by up
to 1%.
Change-Id: I686825712ad56043e09ada9808dc489f875a6ce0