Almost all arguments for vp9_build_inter32x32_predictors_sb and
vp9_build_inter64x64_predictors_sb can be deduced from the first macroblock
argument.
Change-Id: I5d477a607586d05698d5b3b9b9bc03891dd3fe83
Extracting setup_frame_size and update_frame_context functions. Introducing
vp9_read_prob function as shortcut for (vp9_prob)vp9_read_literal(r, 8).
Change-Id: Ia5c68fd725b2d1b9c5eb20f69cacb62361b5a3dd
This gains about 0.2% on derf, 0.1% on hd and 0.4% on stdhd. I can put
this under an experimental flag if wanted, just trying to get my patch
queue in shape.
Change-Id: Ibe1a30fe0e0b07bec4802e0f3ff0ba22e505f576
Adds an experiment to use a weighted prediction of two INTER
predictors, where the weight is one of (1/4, 3/4), (3/8, 5/8),
(1/2, 1/2), (5/8, 3/8) or (3/4, 1/4), and is chosen implicitly
based on consistency of the predictors to the already
reconstructed pixels to the top and left of the current macroblock
or superblock.
Currently the weighting is not applied to SPLITMV modes, which
default to the usual (1/2, 1/2) weighting. However the code is in
place controlled by a macro. The same weighting is used for Y and
UV components, where the weight is derived from analyzing the Y
component only.
Results (over compound inter-intra experiment)
derf: +0.18%
yt: +0.34%
hd: +0.49%
stdhd: +0.23%
The experiment suggests bigger benefit for explicitly signaled weights.
Change-Id: I5438539ff4485c5752874cd1eb078ff14bf5235a
These are mostly just for experimental purposes. I saw small gains (in
the 0.1% range) when playing with this on derf.
Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
Now that the first AC coefficient in both directions use the same DC
as their context, there no longer is a purpose in letting both have
their own band. Merging these two bands allows us to split bands for
some of the very high-frequency AC bands.
In addition, I'm redoing the banding for the 1D-ADST col/row scans. I
don't think the old banding made any sense at all (it merged the last
coefficient of the first row/col in the same band as the first two of
the second row/col), which was clearly an oversight from the band being
applied in scan-order (rather than in their actual position). Now,
coefficients at the same position will be in the same band, regardless
what scan order is used. I think this makes most sense for the purpose
of banding, which is basically "predict energy for this coefficient
depending on the energy of context coefficients" (i.e. pt).
After full re-training, together with previous patch, derf gains about
1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.
Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
Pearson correlation for above or left is significantly higher than for
previous-in-scan-order (absolute values depend on position in scan, but
in general, we gain about 0.1-0.2 by using either above or left; using
both basically just makes this even better). For eob branch skipping,
we continue to use the previous token in scan order.
This helps about 0.9% on derf after re-training on a limited data set.
Full re-training and results on larger-resolution clips are pending.
Note that this commit breaks trellis, so we can probably get further
gains out of it by fixing trellis at some later point.
Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
Moving code from vp9_decode_frame function into setup_loopfilter and
setup_segmentation functions. A little bit of cleanup.
Change-Id: I2cce1813e4d7aeec701ccf752bf57e3bdd41b51c
Adds a per-frame, strength adjustable, in loop deringing filter. Uses
the existing vp9_post_proc_down_and_across 5 tap thresholded blur
code, with a brute force search for the threshold.
Results almost strictly positive on the YT HD set, either having no
effect or helping PSNR in the range of 1-3% (overall average 0.8%).
Results more mixed for the CIF set, (-0.5 min, 1.4 max, 0.1 avg).
This has an almost strictly negative impact to SSIM, so examining a
different filter or a more balanced search heuristic is in order.
Other test set results pending.
Change-Id: I5ca6ee8fe292dfa3f2eab7f65332423fa1710b58
Replaces the default tables for single coefficient magnitudes with
those obtained from an appropriate distribution. The EOB node
is left unchanged. The model is represeted as a 256-size codebook
where the index corresponds to the probability of the Zero or the
One node. Two variations are implemented corresponding to whether
the Zero node or the One-node is used as the peg. The main advantage
is that the default prob tables will become considerably smaller and
manageable. Besides there is substantially less risk of over-fitting
for a training set.
Various distributions are tried and the one that gives the best
results is the family of Generalized Gaussian distributions with
shape parameter 0.75. The results are within about 0.2% of fully
trained tables for the Zero peg variant, and within 0.1% of the
One peg variant.
The forward updates are optionally (controlled by a macro)
model-based, i.e. restricted to only convey probabilities from the
codebook. Backward updates can also be optionally (controlled by
another macro) model-based, but is turned off by default. Currently
model-based forward updates work about the same as unconstrained
updates, but there is a drop in performance with backward-updates
being model based.
The model based approach also allows the probabilities for the key
frames to be adjusted from the defaults based on the base_qindex of
the frame. Currently the adjustment function is a placeholder that
adjusts the prob of EOB and Zero node from the nominal one at higher
quality (lower qindex) or lower quality (higher qindex) ends of the
range. The rest of the probabilities are then derived based on the
model from the adjusted prob of zero.
Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
Lower case variable names, code simplification by using already defined
clamp and read_le16 functions.
Change-Id: I8fd544365bd8d1daed86d7b2ae0843e4ef80df08
Wrote sse2 version of vp9_short_idct10_16x16 function. Compared
to c version, the sse2 version is 2.3X faster.
Change-Id: I314c4f09369648721798321eeed6f58e38857f26
Making consistent initialization of mb_to_{top,botton,left,right}_edge
variables after set_mb_row & set_mb_col calls. A little bit of code cleanup
additionally.
Change-Id: I245bfe32c5701e9836956dc25cf8c770d109cbc1
Wrote sse2 version of vp9_short_idct16x16 function. Compared to c
version, the sse2 version is over 2.5X faster.
Change-Id: I38536e2b846427a2cc5c5423aaf305fd0e605d61
Renaming Width to width, Height to height and Version to version in
several structs and function signatures.
Change-Id: I084c3f7e747cb2ce3345aff27a3dff9b13a87543
Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8.
Compared to c version, the sse2 version is 2X faster. The decoder
test didn't show noticeable gain since 8x8 idct doesn't take much
of decoding time (less than 1% in my test).
Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
If the intended display size is different than the size the frame is
coded at, then send that size explicitly in the bitstream. Adds a new
bit to the frame header to indicate whether the extra size fields
are present.
Change-Id: I525c66f22d207efaf1e5f903c6a2a91b80245854
This fix resolves some of the mismatch issues being seen
recently. While this is the right thing to do when tiling
is used for this experiment, it is not the underlying cause
of the the mismatches.
Something else is causing writing outside of the allowable
frame area in the encoder leading to this mismatch.
Change-Id: If52c6f67555aa18ab8762865384e323b47237277
Moving identical code to separate functions, variable declaration and
initialization on the same line.
Change-Id: Ifa6474a64189f9d8051e88e19850453b0227752c
Updates the YV12_BUFFER_CONFIG structure to be crop-aware. The
exiting width/height parameters are left unchanged, storing the
width and height algined to a 16 byte boundary. The cropped
dimensions are added as new fields.
This fixes a nasty visual pulse when switching between scaled and
unscaled frame dimensions due to a mismatch between the scaling
ratio and the 16-byte aligned sizes.
Change-Id: Id4a3f6aea6b9b9ae38bdfa1b87b7eb2cfcdd57b6
This is like VP8_COPY_REFERENCE, but returns a pointer to the reference
frame rather than a copy of it. This is useful when the application
doesn't know what the size of the reference is, as is the case when
scaling is in effect.
Change-Id: I63667109f65510364d0e397ebe56217140772085
When coding the frame that corresponds to the midpoint frame
defining an ARF, do not update the last reference frame buffer.
Previously this buffer was updated meaning that when coding the next
ARF all the reference buffers were the same (or nearly so).
Turning the update off means that the frame before is still available
as an alternative predictor and for use in compound prediction.
Also fixed inconsistency in test for mismatch (patch from JK).
Net average gains (derf 0.049, yt 0.163, yt-hd 0.207, std-hd 0.286)
Change-Id: Ifee21da21ccbb1648ac2eafe890d3ce60562c7bc
Removing redundant code, introducing new functions for better
decomposition, adding 'clamp' function to vp9_common.h.
Change-Id: Ic3b8ca13bbc38f60f0c9c43910b5802005e31aaf
Adds probability updates for extra bits for the nzcs, code for
getting nzc stats, plus some minor cleanups and fixes.
Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).
Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
Yaowu found this function had a compiling issue with MSVC because
of using _mm_storel_pi((__m64 *)(dest + 0 * stride), (__m128)p0).
To be safe, changed back to use integer store instruction.
Also, for some build, diff could not always be 16-byte aligned.
Changed that in the code.
Change-Id: I9995e5446af15dad18f3c5c0bad1ae68abef6c0d
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.
STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic progrmaming apporach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.
TODO:
Training the default probs/counts for nzcs
Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.
In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.
Lastly, add a trellis optimize function for 32x32 transform blocks.
HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
a few negative points here and there that I might want to analyze
a little closer.
Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
Fixed a couple of variable/function definitions, as well as header
handling to support 16K sequence coding at high bit-rates.
The width and height are each specified by two bytes in the header.
Use an extra byte to explicitly indicate the scaling factors in
both directions, each ranging from 0 to 15.
Tested coding up to 16400x16400 dimension.
Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
Simplified idct32x32 calculation when there are only 10 or less
non-zero coefficients in 32x32 block. This helps the decoder
performance.
Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d
Removing redundant variables, using x *= y instead x = x * y, moving
variable declarations into inner blocks.
Change-Id: I884f95c755f55d51b7c1c6585f10296919063e41
This patch extends the previous support for using references of a
different resolution in ZEROMV mode to all inter prediction modes.
Subpixel based best-mv scoring is disabled when the reference frame
differs in resolution from the current frame.
Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac
This patch allows coding frames using references of different
resolution, in ZEROMV mode. For compound prediction, either
reference may be scaled.
To test, I use the resize_test and enable WRITE_RECON_BUFFER
in vp9_onyxd_if.c. It's also useful to apply this patch to
test/i420_video_source.h:
--- a/test/i420_video_source.h
+++ b/test/i420_video_source.h
@@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource {
virtual void FillFrame() {
// Read a frame from input_file.
+ if (frame_ != 3)
if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) {
limit_ = frame_;
}
This forces the frame that the resolution changes on to be coded
with no motion, only scaling, and improves the quality of the
result.
Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496
Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to
improve performance, clipped the absolute diff values to [0, 255].
This allowed us to keep the additions/subtractions in 8 bits.
Test showed an over 2% decoder performance increase.
Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d
Ensure that all inter prediction goes through a common code path
that takes scaling into account. Removes a bunch of duplicate
1st/2nd predictor code. Also introduces a 16x8 mode for 8x8
MVs, similar to the 8x4 trick we were doing before. This has an
unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that
case for now.
Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba
Rebased.
Remove the old matrix multiplication transform computation. The 16x16
ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16
300/0 in vp9/common/vp9_blockd.h.
Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
This patch alters the balance of context between the
coefficient bands (reflecting the position of coefficients
within a transform blocks) and the energy of the previous
token (or tokens) within a block.
In this case the number of coefficient bands is reduced
but more previous token energy bands are supported.
Some initial rebalancing of the default tables has been
by running multiple derf clips at multiple data rates using
the ENTOPY_STATS macro. Further balancing needs to be
done using larger image formatsd especially in regard to
the bigger transform sizes which are not as well represented
in encodings of smaller image formats.
Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a
Removing redundant 'extern' keywords and parentheses, fixing indentation,
making variable names lower case, using short expressions x *= c
instead of x = x * c, minor code simplifications.
Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
The issue was caused by a out-of-order merge, which leads to wrong
functions are called at lossless mode.
Change-Id: If157729abab62954c729e0377e7f53edb7db22ca
rebased.
This patch includes 16x16 butterfly inverse ADST/DCT hybrid
transform. It uses the variant ADST of kernel
sin((2k+1)*(2n+1)/4N),
which allows a butterfly implementation.
The coding gains as compared to DCT 16x16 are about 0.1% for
both derf and std-hd. It is noteworthy that for std-hd sets
many sequences gains about 0.5%, some 0.2%. There are also few
points that provides -1% to -3% performance. Hence the average
goes to about 0.1%.
Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17
Since there is no Y2, these values are always zero. This changes the
bitstream results slightly, hence a separate commit.
Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0
These allow sending partial bitstream packets over the network before
encoding a complete frame is completed, thus lowering end-to-end
latency. The tile-rows are not independent.
Change-Id: I99986595cbcbff9153e2a14f49b4aa7dee4768e2
Since addition of the larger-scale transforms (16x16, 32x32), these
don't give a benefit at macroblock-sizes anymore. At superblock-sizes,
2nd-order transform was never used over the larger transforms. Future
work should test whether there is a benefit for that use case.
Change-Id: I90cadfc42befaf201de3eb0c4f7330c56e33330a
This patch abstracts the selection of the coefficient band
context into a function as a precursor to further experiments
with the coefficient context.
It also removes the large per TX size coefficient band structures
and uses a single matrix for all block sizes within the test function.
This may have an impact on quality (results to follow) but is only an
intermediate step in the process of redefining the context. Also the
quality impact will be larger initially because the default tables will
be out of step with the new banding.
In particular the 4x4 will in this case only use 7 bands. If needed we
can add back block size dependency localized within the function, but
this can follow on after the other changes to the definition of the
context.
Change-Id: Id7009c2f4f9bb1d02b861af85fd8223d4285bde5
This is an initial step to facilitate experimentation
with changes to the prior token context used to code
coefficients to take better account of the energy of
preceding tokens.
This patch merely abstracts the selection of context into
two functions and does not alter the output.
Change-Id: I117fff0b49c61da83aed641e36620442f86def86
1. Added a bit in frame header to to indicate if a frame is encoded
in lossless mode, so decoder does not make the decision based on Q0
2. Minor changes to make sure that lossy coding works same as when
the lossless experiment is not enabled.
3. Renamed function pointers for transforms to be consistent, using
prefix fwd_txm and inv_txm for forward and inverse respectively
To encode in lossless mode, using "--lossless=1 --min-q=0 --max-q=0"
with vpxenc.
Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
Removal of the NEWCOEFCONTEXT experiment to
reduce code clutter and make it easier to experiment with
some other changes to the coefficient coding context.
Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98
This is after discussion with the hardware team. Update the unit test
to take these sizes into account. Split out some duplicate code into
a separate file so it can be shared.
Change-Id: I8311d11b0191d8bb37e8eb4ac962beb217e1bff5
fixed format issues.
Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be
factorized into simpler operations, hence is retained.
Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
Replace as_mv.{first, second} with a two element array, so that they
can easily be processed with an index variable.
Change-Id: I1e429155544d2a94a5b72a5b467c53d8b8728190
* changes:
Initial support for resolution changes on P-frames
Avoid allocating memory when resizing frames
Adds a test for the VP8E_SET_SCALEMODE control
Allows inter-frames to change resolution. Currently these are
almost equivalent to keyframes, as only intra prediction modes
are allowed, but without the other context resets that occur on
keyframes.
Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3
Tweak to default mode context to account for the fact
that when there are no non zero motion candidates
Nearest is now the preferred mode for coding a 0,0
vector.
Also resolve duplicate function name and typos.
Change-Id: I76802788d46c84e3d1c771be216a537ab7b12817
Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.
Fixed BUILD warning.
Devise a variant of the original ADST, which allows butterfly
computation structure. This new transform has kernel of the
form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.
This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.
Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independendly encodable (minus within-frame cost adjustments in
the RD loop) to speed-up hardware & software en/decoders if they used
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.
There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.
t log
Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vectors that point into any tile of frame
0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.
The loopfilter crosses tile boundaries. I discussed this briefly with Aki
and he says that's OK. An in-loop loopfilter would need to do some sync
between tile threads, but that shouldn't be a big issue.
Resuls: with tiling disabled, we go up slightly because of improved edge
use in the intra4x4 prediction. With 2 tiles, we lose about ~1% on derf,
~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
on derf ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
concentrated in the low-bitrate end of clips, and most of it is because
of the loss of edges at tile boundaries and the resulting loss of intra
predictors.
TODO:
- more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves
should not cross tile edges, or we should emulate such borders as
if they were off-frame, to limit error propagation to within one
tile only. This doesn't have to be the default behaviour but could
be an optional bitstream flag.
Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f
and called this function in vp9_dequant_idct_add_32x32_c when
eob == 1. For the test clip used, the decoder performance improved
by 21+%. Based on Yaowu's 16 point idct work.
Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43
This commit changes the inverse 16 point dct to use the same algorithm
as the one for 32 point idct. In fact, now 16 point dct uses the exact
version of the souce code for even portion of the 32 point idct.
Tests showed current implementation has significant better accuracy
than the previous version. With this implementation and the minor bug
fix on forward 16 point dct, encoding tests showed about 0.2% better
compression of CIF set, test results on std-hd setting pending.
Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.
Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
Adds a flag to disable features that would inhibit frame parallel
decoding. This includes backward adaptation and MV sorting based
on search in ref frame buffer.
Also includes some minor clean-ups.
Change-Id: I434846717a47b7bcb244b37ea670c5cdf776f14d
Added a quick eob == 0 check. Once the integer version of the dct32x32 is
complete, we can check for other eob cases.
For the 1080p clip used, the decoder performance improved by 4%.
Change-Id: I9390b6ed3c8be0c0c0a0c44c578d9a031d6e026e
Adds an error-resilient mode where frames can be continued
to be decoded even when there are errors (due to network losses)
on a prior frame. Specifically, backward updates are turned off
and probabilities of various symbols are reset to defaults at
the beginning of each frame. Further, the last frame's mvs are
not used for the mv reference list, and the sorting of the
initial list based on search on previous frames is turned off
as well.
Also adds a test where an arbitrary set of frames are skipped
from decoding to simulate errors. The test verifies (1) that if
the error frames are droppable - i.e. frame buffer updates have
been turned off - there are no mismatch errors for the remaining
frames after the error frames; and (2) if the error-frames are non
droppable, there are not only no decoding errors but the mismatch
PSNR between the decoder's version of the post-error frames and the
encoder's version is at least 20 dB.
Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5
This commit restores the quality lost when the buffer-to-buffer copy
logic was removed. Note that this is specific to the current use of
golden frames and will need rework when RTC functionality is added.
Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21
Previously there were two frame coding contexts tracked, one for normal
frames and one for alt-ref frames. Generalize this by signalling the
context to use in the bitstream, rather than tieing it to the alt ref
refresh bit. Also increase the number of contexts available to 4, which
may be useful for temporal scalability.
Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662
Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
encode side conventions. From the decoder's perspective, we want to be
dealing with numbered references.
Updates to active_ref 2 signal mode context switches, vestigial from
refresh_alt_ref_frame. This needs some clean up to make sense with
increased numbers of reference frames, as well as reimplementing the
swapping of alt/golden which was previously done using the
buffer-to-buffer copy mechanism removed in an earlier commit.
Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58
Do reference counting the same way on the encoder as the decoder does,
rather than maintaining the 'flags' member of YV12_BUFFER_CONFIG.
Change-Id: I91dc210ffca081acaf9d5c09a06e7461b3c3139c
This is the first in a series of commits to add additional reference
frames to the codec. Each frame will be able to update any of the
available references, but copying between references is not
supported.
Change-Id: I5945b5ce6cc3582c495102b4e7eed4f08c44d5a1
This experiment gives little gains and adds relatively much code
complexity (and it hinders other experiments), so let's get rid of
it.
Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.
If the pred-filter experiment is enabled, three interopolation
filters are tested during mode selection; the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.
The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.
The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.
Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
This is to fix a decoder crash when decoder skips a number of frame to
continue decoding from a later key frame.
Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
This commit did a couple of minor cleanup/refactoring to prepare for
futher loop filter experiments. It merged y_only version of loop filter
function into the regular one, which makes sure that same logic is used
for functions for picking level and for actual loop filtering.
Change-Id: Id10c94dccd45f58e5310bacfdf6ee63cbb60b86f
Read mode before calling vp9_find_best_ref_mvs(). If the mode is
ZEROMV, the best ref_mvs are not needed. Then, we can skip calling
vp9_find_best_ref_mvs().
Change-Id: I5baa3658dd3f1c7107211cbbbcf919b4584be2e2
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.
Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
Adds an experiment to derive the previous context of a coefficient
not just from the previous coefficient in the scan order but from a
combination of several neighboring coefficients previously encountered
in scan order. A precomputed table of neighbors for each location
for each scan type and block size is used. Currently 5 neighbors are
used.
Results are about 0.2% positive using a strategy where the max coef
magnitude from the 5 neigbors is used to derive the context.
Change-Id: Ie708b54d8e1898af742846ce2d1e2b0d89fd4ad5
This patch changes the token packing to call the bool encoder API rather
than inlining it into the token packing function, and similarly removes
a special get_signed case from the detokenizer. This allows easier
experimentation with changing the bool coder as a whole.
Change-Id: I52c3625bbe4960b68cfb873b0e39ade0c82f9e91
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).
Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
This is to fix a decoder crash when decoder skips a number of frame to
continue decoding from a later key frame.
Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function. For the 1080p test clip used, the decoder
performance improved by 17%.
Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
Add a function clip_pixel() to clip a pixel value to the [0,255] range
of allowed values, and use this where-ever appropriate (e.g. prediction,
reconstruction). Likewise, consistently use the recently added function
clip_prob(), which calculates a binary probability in the [1,255] range.
If possible, try to use get_prob() or its sister get_binary_prob() to
calculate binary probabilities, for consistency.
Since in some places, this means that binary probability calculations
are changed (we use {255,256}*count0/(total) in a range of places,
and all of these are now changed to use 256*count0+(total>>1)/total),
this changes the encoding result, so this patch warrants some extensive
testing.
Change-Id: Ibeeff8d886496839b8e0c0ace9ccc552351f7628
Some further changes and refactoring of mv
reference code and selection of center point for
searches. Mainly relates to not passing so many
different local copies of things around.
Some place holder comments.
Change-Id: I309f10ffe9a9cde7663e7eae19eb594371c8d055
This commit changed the ENTROPY_CONTEXT conversion between MBs that
have different transform sizes.
In additioin, this commit also did a number of cleanup/bug fix:
1. removed duplicate function vp9_fix_contexts() and changed to use
vp8_reset_mb_token_contexts() for both encoder and decoder
2. fixed a bug in stuff_mb_16x16 where wrong context was used for
the UV.
3. changed reset all context to 0 if a MB is skipped to simplify the
logic.
Change-Id: I7bc57a5fb6dbf1f85eac1543daaeb3a61633275c
Don't use vp9_decode_coefs_4x4() for 2nd order DC or luma blocks. The
code introduces some overhead which is unnecessary for these cases.
Also, remove variable declarations that are only used once, remove
magic offsets into the coefficient buffer (use xd->block[i].qcoeff
instead of xd->qcoeff + magic_offset), and fix a few Google Style
Guide violations.
Change-Id: I0ae653fd80ca7f1e4bccd87ecef95ddfff8f28b4
Use these, instead of the 4/5-dimensional arrays, to hold statistics,
counts, accumulations and probabilities for coefficient tokens. This
commit also re-allows ENTROPY_STATS to compile.
Change-Id: If441ffac936f52a3af91d8f2922ea8a0ceabdaa5
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.
Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
1 bit, or else they won't fit in int16_t (they are 17 bits). Because
of this, the RD error scoring does not right-shift the MSE score by
two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
simply using the 16x16 luma ones. A future commit will add newly
generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
ADST is desired, transform-size selection can scale back to 16x16
or lower, and use an ADST at that level.
Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
block than for the rest (DWT pixel differences) of the block. Therefore,
RD error scoring isn't easily scalable between coefficient and pixel
domain. Thus, unfortunately, we need to compute the RD distortion in
the pixel domain until we figure out how to scale these appropriately.
Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
Allows switchbale filters to be used without mismatch when the
superblock experiment is on.
Also removes a spurious clamping code in decodemv.c which causes
rare encode/decode mismatches.
Change-Id: I809d9ee0b2859552b613500b539a615515b863ae
Previously, the "!=" check is logically incorrect when eob is at 0 and
effective coefficient starting position is 1. This commit should have
no effect on bitstream.
Change-Id: I6ce3a847c7e72bfbe4f7c74f88e3310c6b9b6d30
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Rebase.
Change-Id: Ie3257bdf07bdb9c6e9476915e3a80183c8fa005a
This commit makes sure Y2 entropy coding context is always updated on
every macroblock even there is no Y2 block.
Change-Id: Ie307cfc46526efe55613be39f9f178d2531b56ba
This commit removed a couple of redundant data structures in frame
coding contextsm, mode_context and mode_context_a, and changed to
use vp9_mode_contexts only. The switch of the context for different
frame type now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy among
these redundant data structure.
Change-Id: I42e8174bd60f466b0860afc44c1263896471b0f3
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
Change-Id: Ie47cb80ef2d54292e6b8db9f699c57214a915bc4
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
Add a new function vp9_decode_mb_tokens() that handles the switch
between different per-tx-size detokenize functions. Make actual
implementations (vp9_decode_mb_tokens_NxN()) static.
Change-Id: I9e0c4ef410bfa90128a02b472c079a955776816d
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Split out UV and Y coefficient
loops for clarity. Use xd->block[].qcoeff instead of xd->qcoeff + magic
to remove use of magic offset variables.
Change-Id: I5b17eda1bb666c69c2b7ea957d5525cd78192e33
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Also remove unneeded brackets
around a code block, and remove the magic offsets 64 and 256 for chroma
values in the coefficient memory block.
Change-Id: I14fc14120a81ea1d6fb862674e8bf8cf6ba3d114
This prevents the relatively expensive token-from-coefficient lookup
function get_token(), plus a duplicate loop..
Change-Id: Ibecd407b2a91d3593d439ec4646e43fa26d2ff91
Just like for all other block modes, b_pred tokens can be read together
before starting macroblock reconstruction. This removes special cases
for b_pred in decode_macroblock() and allows to make decode_coefs_4x4()
static in detokenize.c.
While at it, remove the redundant handling and checking of plane_type
and block_index (i) in decode_coefs_4x4(). Since the function is static,
and is called only from decode_mb_tokens_4x4(), we don't need to worry
that the arguments ever go out of sync.
Change-Id: I2d415da0b51b89d0490a6b9e24cc86363c2090f7
Experiments with a larger set of contexts and some
clean up to replace magic numbers regarding the
number of contexts.
The starting values and rate of backwards adaption
are still suspect and based on a small set of tests.
Added forwards adjustment of probabilities.
The net result of adding the new context and forward
update is small compared to the old context from the
legacy find_near function. (down a little on derf but
up by a similar amount for HD)
HOWEVER.... with the new context and forward update
the impact of disabling the reverse update (which may be
necessary in some use cases to facilitate parallel decoding)
is hugely reduced.
For the old context without forward update, the impact of
turning off reverse update (Experiment was with SB off) was
Derf - 0.9, Yt -1.89, ythd -2.75 and sthd -8.35. The impact was
mainly at low data rates.
With the new context and forward update enabled the impact
for all the test sets was no more than 0.5-1% (again most at
the low end).
Change-Id: Ic751b414c8ce7f7f3ebc6f19a741d774d2b4b556
A patch on compound inter-intra prediction.
In compound inter-intra prediction, a new predictor for
16x16 inter coded MBs are obtained by combining a single
inter predictor with a 16x16 intra predictor, in a manner
that the weight varies with distance from the top/left
boundary. The current search strategy is to combine the best
inter mode with the best intra mode obtained independently.
Results so far:
derf +0.31%
yt +0.32%
std-hd +0.35%
hd +0.42%
It is conceivable that the results would improve somewhat
with a more thorough search strategy where all intra modes
are searched given the best mv, or even a joint search for
the best mv and the best intra mode.
Change-Id: I7951f1ed0d6eb31ca32ac24d120f1585bcd8d79b
Modify the decoder to return the ending position of the bool decoder and
use that as the starting position for the next frame.
The constant-space algorithm for parsing the appended frame lengths is
O(n^2), which is a potential DoS concern if n is unbounded. Revisit
the appended lengths for use as partition lengths when multipartition
support is added.
In addition, this allows decoding of raw streams outside of a container
without additional framing information, though it's insufficient to
be able to remux said stream into a container.
Change-Id: I71e801a9c3e37abe559a56a597635b0cbae1934b
Update decode_coefs() to break when c >= eob, since it's possible that
c starts the loop from 1 and eob is 0. The loop won't terminate in that
case.
Add new get_eob() function to consistently clamp the eob based on the
segment level EOB and the block size. It's possible to code a segment
level EOB that's greater than the block size, and that leads to an
out of bounds access.
Change-Id: I859563b30414615cf1b30dcc2aef8a1de358c42d
CONFIG_DEBUG was turning on some code to dump the reconstructed frame
to a buffer from within the decoder. Move this code to a more specific
debugging define.
Change-Id: I3ca9ea634bdbd186f2470bd644d3695ee0ab3037
Similar to 16x16 dequant and idct, based on the value of eobs, the
8x8 dequant and idct calculation was simplified to improve decorder
performance.
Combined vp9_dequant_idct_add_8x8 and vp9_dequant_dc_idct_add_8x8
to eliminate duplicate code.
Change-Id: Ia58e50ab27f7012b7379c495837c9c0b5ba9cf7f
This change is a fix / extension of the newbestrefmv
experiment. As such it is presented without IFDEF.
The change creates a new context for coding inter modes
in vp9_find_mv_refs(). This replaces the context that
was previously calculated in vp9_find_near_mvs().
The new context is unoptimized and not necessarily
any better at this stage (results pending), but eliminates
the need for a legacy call to vp9_find_near_mvs().
Based on numbers from Scott, this could help decode
speed by several %.
In a later patch I will add support for forward update of
context (assuming this helps) and refine the context as
necessary.
Change-Id: I1cd991b82c8df86cc02237a34185e6d67510698a
This fixes encoder/decoder mismatches with the superblock experiment
turned on whenever a superblock is encoded using the 4x4 transform.
Change-Id: Iefec7055e8d25f8efdbba66c4261bbd322d335a3
Preliminary patch on a new 4x4 intra mode B_CONTEXT_PRED where the
dominant direction from the context is used to encode. Various decoder
changes are needed to support decoding of B_CONTEXT_PRED in conjunction
with hybrid transforms since the scan order and tokenization depends on
the actual direction of prediction obtained from the context. Currently
the traditional directional modes are used in conjunction with the
B_CONTEXT_PRED, which also seems to provide the best results.
The gains are small - in the 0.1% range.
Change-Id: I5a7ea80b5218f42a9c0dfb42d3f79a68c7f0cdc2
Also split superblock handling code out of decode_macroblock() into
a new function decode_superblock(), for easier readability.
Derf +0.05%, HD +0.2%, STDHD +0.1%. We can likely get further gains
by allowing to select mb_skip_coeff for a subset of the complete SB
or something along those lines, because although this change allows
coding smaller transforms for bigger predictors, it increases the
overhead of coding EOBs to skip the parts where the residual is
near-zero, and thus the overall gain is not as high as we'd expect.
Change-Id: I552ce1286487267f504e3090b683e15515791efa
As suggested by Yaowu, simplified 16x16 dequant and idct. In decoder,
after detoken step, we know the number of non-zero dct coefficients
(eobs) in a macroblock. Idct calculation can be skipped or simplified
based on eobs, which improves the decoder performance.
Change-Id: I9ffa1cb134bcb5a7d64fcf90c81871a96d1b4018
there are still a couple type of warning left, which are related to
double constants assigned to float type. As those would be addressed
by the conversion of transforms into integer version. This commit
has left those un-dealt with.
Change-Id: I48fd9b489c0c27ad6b543f4177423419f929f2bb
The block sizes for decoding tokens are up to 16x16, which means
eobs is within [0, 256]. Using (signed) char is not enough. Changed
eobs data type to unsigned short to fix the problem.
Change-Id: I88a7d3098e1f1604c336d6adb88ffec971fb03a6