The commit changes the coding mode to lossless whenever the lowest
quantizer is chosen.
As expected, test results showed no difference for the cif and std-hd
sets, where Q0 is rarely used. For the yt and yt-hd sets, Q0 is used for
a number of clips, and this commit helped a lot at the high end.
Average over all clips in the sets:
yt: 2.391% 1.017% 1.066%
hd: 1.937% 0.764% 0.787%
Change-Id: I9fa9df8646fd70cb09ffe9e4202b86b67da16765
The 32x32 value in the splitmv case was uninitialized. This leads to
all kinds of erratic behaviour down the line. Also fill in dummy values
for superblocks in keyframes (the values are currently unused, but we
run into integer overflows anyway, which makes detecting bad cases
harder). Lastly, in case we did not find any RD value at all, don't
set tx_diff to INT_MIN, but instead set it to zero (since if we couldn't
find a mode, it's unlikely that any particular transform would have made
that worse or better; rather, it's likely equally bad for all tx_sizes).
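A minimal sketch of the zero fallback described above, assuming RD costs are int64_t values that use INT64_MAX as the "nothing found" sentinel. The function name tx_rd_diff is hypothetical and the sign convention is illustrative only.

```c
#include <stdint.h>

/* Hypothetical helper: RD difference between the overall best mode and the
 * best mode for one transform size. If no RD value was found at all, every
 * transform size is treated as equally bad, so the difference is 0 rather
 * than an extreme sentinel such as INT_MIN. */
static int64_t tx_rd_diff(int64_t best_rd, int64_t best_tx_rd) {
  if (best_rd == INT64_MAX)
    return 0;
  return best_rd - best_tx_rd;
}
```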
Change-Id: If236fd3aa2037e5b398d03f3b1978fbbc5ce740e
This issue breaks the encoding process. The effect emerges only in a
particular test sequence at certain bit-rates and frame limits.
Change-Id: I02e080f2a49624eef9a21c424053dc2a1d902452
Since there is no Y2, these values are always zero. This changes the
bitstream results slightly, hence a separate commit.
Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0
These allow sending partial bitstream packets over the network before
encoding of a complete frame has finished, thus lowering end-to-end
latency. The tile-rows are not independent.
Change-Id: I99986595cbcbff9153e2a14f49b4aa7dee4768e2
This patch abstracts the selection of the coefficient band
context into a function as a precursor to further experiments
with the coefficient context.
It also removes the large per TX size coefficient band structures
and uses a single matrix for all block sizes within the test function.
This may have an impact on quality (results to follow) but is only an
intermediate step in the process of redefining the context. Also the
quality impact will be larger initially because the default tables will
be out of step with the new banding.
In particular, the 4x4 will in this case only use 7 bands. If needed, we
can add back block-size dependency localized within the function, but
this can follow on after the other changes to the definition of the
context.
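A minimal sketch of a band-selection function driven by one matrix for all block sizes. The name get_coef_band and the table values are hypothetical, chosen only so that the 16 coefficients of a 4x4 block fall into 7 bands as described; positions past the table share the last band.

```c
/* Hypothetical single band table shared by every transform size. */
static const unsigned char coef_bands[16] = {
  0, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6
};

/* Map a coefficient's scan position to its context band. Larger blocks
 * reuse the last band for every position beyond index 15. */
static int get_coef_band(int coef_index) {
  return coef_bands[coef_index < 15 ? coef_index : 15];
}
```

Keeping the lookup behind a function means a block-size dependency could later be added inside it without touching the callers.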
Change-Id: Id7009c2f4f9bb1d02b861af85fd8223d4285bde5
Reverted part of change
I19981d1ef0b33e4e5732739574f367fe82771a84
That gives rise to an enc/dec mismatch.
As things stand the memsets are still needed.
Change-Id: I9fa076a703909aa0c4da0059ac6ae19aa530db30
This is an initial step to facilitate experimentation
with changes to the prior token context used to code
coefficients to take better account of the energy of
preceding tokens.
This patch merely abstracts the selection of context into
two functions and does not alter the output.
Change-Id: I117fff0b49c61da83aed641e36620442f86def86
1. Added a bit in the frame header to indicate if a frame is encoded
in lossless mode, so the decoder does not make the decision based on Q0
2. Minor changes to make sure that lossy coding works the same as when
the lossless experiment is not enabled.
3. Renamed function pointers for transforms to be consistent, using
prefix fwd_txm and inv_txm for forward and inverse respectively
To encode in lossless mode, use "--lossless=1 --min-q=0 --max-q=0"
with vpxenc.
Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
Remove the NEWCOEFCONTEXT experiment to
reduce code clutter and make it easier to experiment with
some other changes to the coefficient coding context.
Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98
A couple of scalar optimizations speeding up quantization by about 1.6x. Overall encoder speedup is around 3%.
Change-Id: I19981d1ef0b33e4e5732739574f367fe82771a84
This is after discussion with the hardware team. Update the unit test
to take these sizes into account. Split out some duplicate code into
a separate file so it can be shared.
Change-Id: I8311d11b0191d8bb37e8eb4ac962beb217e1bff5
Fixed format issues.
Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be
factorized into simpler operations, hence is retained.
Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
Experimental tweaks to various thresholds to measure
quality / speed trade off.
Add flag that allows static segmentation to be turned off
and disables it unless in the second pass of a two pass
encode.
Change-Id: I219702ffe858412a83db801cbbbd869924b8c61b
Replace as_mv.{first, second} with a two element array, so that they
can easily be processed with an index variable.
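A minimal sketch of why an array helps: with the two vectors stored as mv[0] and mv[1], per-reference work collapses into one loop. The types MV and block_mvs and the clamping helper are hypothetical stand-ins, not the codebase's actual structures.

```c
#include <stdint.h>

typedef struct {
  int16_t row;
  int16_t col;
} MV;

/* Hypothetical block MV storage: mv[0] replaces "first",
 * mv[1] replaces "second" (used only for compound prediction). */
typedef struct {
  MV mv[2];
} block_mvs;

/* Clamp each active vector to [-lim, lim]; the index variable covers
 * both references without duplicating code per member name. */
static void clamp_block_mvs(block_mvs *b, int is_compound, int16_t lim) {
  for (int i = 0; i < 1 + is_compound; ++i) {
    MV *m = &b->mv[i];
    if (m->row > lim) m->row = lim;
    if (m->row < -lim) m->row = (int16_t)-lim;
    if (m->col > lim) m->col = lim;
    if (m->col < -lim) m->col = (int16_t)-lim;
  }
}
```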
Change-Id: I1e429155544d2a94a5b72a5b467c53d8b8728190
Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this
makes them all slightly faster, particularly on x86-64. Remove SSE3
sad16x16 version, since the SSE2 version is now faster.
About 1.5% overall encoding speedup.
Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797
Cache the constant offset in one variable to prevent re-loading that
in each loop iteration, and mark the function as inline so we can use
the fact that the transform size is always known in the caller.
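A minimal scalar quantizer sketch showing both ideas. The rounding offset arrives through a pointer; because stores to the int16_t output could alias it, a compiler would otherwise have to reload it every iteration, so it is cached in a local first. Marking the function static inline lets a caller that passes a constant block size get a specialized loop. The signature and quantization rule are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical quantizer: returns the end-of-block position. */
static inline int quantize(const int16_t *coeff, int16_t *qcoeff, int n,
                           const int16_t *round_ptr, int q) {
  const int round = round_ptr[0]; /* loaded once, not per iteration */
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    const int z = coeff[i];
    const int sign = z < 0 ? -1 : 1;
    const int x = (z * sign + round) / q; /* quantize the magnitude */
    qcoeff[i] = (int16_t)(sign * x);
    if (x) eob = i + 1;
  }
  return eob;
}
```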
Almost 1% faster encoding overall.
Change-Id: Id78325a60b025057d8f4ecd9003a74086ccbf85a
Pass the current mb row and column around rather than the
recon_yoffset and recon_uvoffset, since those offsets will
change from predictor to predictor, based on the reference
frame selection.
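A minimal sketch of deriving buffer offsets from the macroblock position at the point of use, so each reference frame can apply its own stride. It assumes 16x16 luma macroblocks and 4:2:0 chroma; the helper names are hypothetical.

```c
/* Hypothetical helpers: offsets into a particular reference buffer,
 * computed from the MB position and that buffer's strides. */
static int mb_luma_offset(int mb_row, int mb_col, int y_stride) {
  return mb_row * 16 * y_stride + mb_col * 16;
}

static int mb_chroma_offset(int mb_row, int mb_col, int uv_stride) {
  return mb_row * 8 * uv_stride + mb_col * 8;
}
```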
Change-Id: If3f9df059e00f5048ca729d3d083ff428e1859c1
* changes:
Initial support for resolution changes on P-frames
Avoid allocating memory when resizing frames
Adds a test for the VP8E_SET_SCALEMODE control
Allows inter-frames to change resolution. Currently these are
almost equivalent to keyframes, as only intra prediction modes
are allowed, but without the other context resets that occur on
keyframes.
Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3
As long as the new frame is smaller than the size that was originally
allocated, we don't need to free and reallocate the memory.
Instead, do the allocation based on the size of the first frame. We could
instead have the application pass this size in, if we wanted to
support external upscaling.
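A minimal sketch of the "reallocate only if the frame no longer fits" rule, assuming a single 8-bit plane per buffer. The frame_buf type and frame_buf_resize are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame buffer: allocated size vs. current frame size. */
typedef struct {
  uint8_t *data;
  int alloc_w, alloc_h;
  int w, h;
} frame_buf;

/* Keep the existing allocation whenever the new frame fits inside it. */
static int frame_buf_resize(frame_buf *fb, int w, int h) {
  if (!fb->data || w > fb->alloc_w || h > fb->alloc_h) {
    uint8_t *p = realloc(fb->data, (size_t)w * (size_t)h);
    if (!p) return -1;
    fb->data = p;
    fb->alloc_w = w;
    fb->alloc_h = h;
  }
  fb->w = w;
  fb->h = h;
  return 0;
}
```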
Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038
Tests that the external interface to set the internal codec scaling
works as expected. Also updates the test to pull the height from
the decoded frame size rather than parsing the keyframe header,
in anticipation of allowing resolution changes on non-keyframes.
Change-Id: I3ed92117d8e5288fbbd1e7b618f2f233d0fe2c17
Tweak to default mode context to account for the fact
that when there are no non-zero motion candidates,
Nearest is now the preferred mode for coding a 0,0
vector.
Also resolve duplicate function name and typos.
Change-Id: I76802788d46c84e3d1c771be216a537ab7b12817
Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.
Fixed BUILD warning.
Devise a variant of the original ADST, which allows a butterfly
computation structure. This new transform has a kernel of the
form: sin(pi * (2k+1) * (2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.
This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.
Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
Added switches and code to skip/breakout from
doing SB32 and SB64 tests based on whether
the 16x16 MB tests used split modes. Also to
optionally skip 64x64 if 16x16 was chosen over
32x32.
The impact on encode speed varies by clip,
from a few % up to almost 50%. Only the
split mode breakout is currently enabled.
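A minimal sketch of the breakout decisions. The flag struct and both helpers are hypothetical; in the description only the split breakout is enabled by default.

```c
/* Hypothetical speed switches. */
typedef struct {
  int split_breakout;      /* skip SB32/SB64 if any 16x16 MB used split */
  int skip64_if_16_better; /* optionally skip SB64 if 16x16 beat 32x32 */
} sb_breakout_flags;

static int try_sb32(const sb_breakout_flags *f, int mb16_used_split) {
  return !(f->split_breakout && mb16_used_split);
}

static int try_sb64(const sb_breakout_flags *f, int mb16_used_split,
                    int mb16_beat_sb32) {
  if (f->split_breakout && mb16_used_split) return 0;
  if (f->skip64_if_16_better && mb16_beat_sb32) return 0;
  return 1;
}
```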
Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8
This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independently encodable (minus within-frame cost adjustments in
the RD loop) to speed up hardware & software en/decoders if they use
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.
There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.
Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vectors that point into any tile of frame
0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.
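A minimal sketch of splitting a frame's superblock columns into 1, 2 or 4 near-equal column tiles, plus a check that a neighbouring column may only serve as context when it lies in the same tile. Both helpers are hypothetical, and the even split rule is an assumption.

```c
/* First SB column of tile idx (idx == number of tiles gives the end). */
static int tile_col_start(int idx, int log2_tiles, int sb_cols) {
  return (idx * sb_cols) >> log2_tiles;
}

/* Non-zero if col_b lies in the same column tile as col_a. */
static int same_tile(int col_a, int col_b, int log2_tiles, int sb_cols) {
  const int n = 1 << log2_tiles;
  for (int t = 0; t < n; ++t) {
    const int start = tile_col_start(t, log2_tiles, sb_cols);
    const int end = tile_col_start(t + 1, log2_tiles, sb_cols);
    if (col_a >= start && col_a < end)
      return col_b >= start && col_b < end;
  }
  return 0;
}
```

For a 1920-pixel-wide frame with 64x64 superblocks (30 SB columns) and 4 tiles, this gives tile starts at columns 0, 7, 15 and 22.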
The loopfilter crosses tile boundaries. I discussed this briefly with Aki
and he says that's OK. An in-loop loopfilter would need to do some sync
between tile threads, but that shouldn't be a big issue.
Results: with tiling disabled, we go up slightly because of improved edge
use in the intra4x4 prediction. With 2 tiles, we lose about 1% on derf,
~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
concentrated in the low-bitrate end of clips, and most of it is because
of the loss of edges at tile boundaries and the resulting loss of intra
predictors.
TODO:
- more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves
should not cross tile edges, or we should emulate such borders as
if they were off-frame, to limit error propagation to within one
tile only. This doesn't have to be the default behaviour but could
be an optional bitstream flag.
Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f
Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.
There is no consensus on whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.
This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.
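A minimal sketch of the pass selection for separable subpel filtering: full-pel vectors skip filtering, and motion that is fractional in only one direction runs a single pass. It assumes MVs are stored in units of 1 / (1 << subpel_bits) pel; the enum and function names are hypothetical.

```c
/* Which separable filter passes to run for a given motion vector. */
enum { PRED_COPY = 0, PRED_H = 1, PRED_V = 2, PRED_HV = 3 };

static int pred_passes(int mv_row, int mv_col, int subpel_bits) {
  const int mask = (1 << subpel_bits) - 1;
  const int x_frac = mv_col & mask; /* fractional part, also for negatives */
  const int y_frac = mv_row & mask;
  return (x_frac ? PRED_H : 0) | (y_frac ? PRED_V : 0);
}
```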
Quality gains on derf are positive on almost all clips. The only clip
that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR,
0.347% SSIM.
Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.
Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0
This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or non zero vector
which means the compound nearest mv mode can
combine a 0,0 and a non zero vector.
Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a
Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.
Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.
Adjustment of thresholds for quality / speed tradeoff
to follow.
Change-Id: I117ee473157e151a1b93193d5f393449328de20d
The commit fixes a minor error in the 16-point fdct wherein a rotation can
produce a result of -1 instead of 0.
Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
This is identical to the later decisions made in encode_superblock().
This commit doesn't actually change anything, but makes the mbmi state
more consistent between the RD loop and the final encode result.
Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be
The RD loop would change the pointer after the first mode (DC) was tested,
leading to corrupt block objects being provided for the others. This
would essentially render the i8x8 predictor useless.
Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8
Remove eob_max_offset markers and replace
them with the generic skip_block flag to indicate
to the quantizer that all coeffs are to be set to 0
and the eob position set to 0.
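A minimal sketch of a quantizer honouring a skip_block flag. The signature is hypothetical, and the truncating division stands in for the real rounding and zero-bin logic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical quantizer: with skip_block set, every coefficient is 0 and
 * eob is 0; otherwise coefficients are quantized normally. */
static void quantize_b(const int16_t *coeff, int n, int skip_block, int q,
                       int16_t *qcoeff, int16_t *dqcoeff, int *eob) {
  memset(qcoeff, 0, n * sizeof(*qcoeff));
  memset(dqcoeff, 0, n * sizeof(*dqcoeff));
  *eob = 0;
  if (skip_block) return;
  for (int i = 0; i < n; ++i) {
    const int x = coeff[i] / q; /* truncating divide, for illustration */
    qcoeff[i] = (int16_t)x;
    dqcoeff[i] = (int16_t)(x * q);
    if (x) *eob = i + 1;
  }
}
```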
Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.
Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
Simplification to eliminate a number of very large data
structures. All zero-run zbin boosts for the different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.
Some further work still needs to be done to refactor, rationalize
and optimize the multiple quantizer functions.
The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.
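A minimal sketch of a zero-run zbin boost capped at run length 15, using one 16-element table for all transform sizes. The table values and the helper name are illustrative only.

```c
#include <stdint.h>

/* Hypothetical boost table shared by all transform sizes. */
static const int16_t zbin_boost[16] = {0,  0,  8,  10, 12, 14, 16, 20,
                                       24, 28, 32, 36, 40, 44, 48, 48};

/* Zero bin for the next coefficient after zero_run consecutive zeros;
 * runs longer than 15 saturate at the last entry. */
static int zbin_for_run(int zbin, int zero_run) {
  const int idx = zero_run < 15 ? zero_run : 15;
  return zbin + zbin_boost[idx];
}
```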
Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
Adds a flag to disable features that would inhibit frame parallel
decoding. This includes backward adaptation and MV sorting based
on search in ref frame buffer.
Also includes some minor clean-ups.
Change-Id: I434846717a47b7bcb244b37ea670c5cdf776f14d
Adds an error-resilient mode where frames can continue
to be decoded even when there are errors (due to network losses)
on a prior frame. Specifically, backward updates are turned off
and probabilities of various symbols are reset to defaults at
the beginning of each frame. Further, the last frame's mvs are
not used for the mv reference list, and the sorting of the
initial list based on search on previous frames is turned off
as well.
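A minimal sketch of the per-frame setup in error-resilient mode. The entropy_ctx and frame_flags structures and setup_frame are hypothetical simplifications. The point is that nothing carried over from earlier frames is used.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced entropy state. */
typedef struct {
  uint8_t coef_probs[8];
  uint8_t mv_probs[4];
} entropy_ctx;

typedef struct {
  int error_resilient;
  int use_prev_frame_mvs;        /* last frame's MVs in the MV ref list */
  int allow_backward_adaptation; /* probability updates from past frames */
} frame_flags;

static void setup_frame(entropy_ctx *fc, const entropy_ctx *defaults,
                        frame_flags *ff) {
  if (ff->error_resilient) {
    *fc = *defaults; /* reset probabilities at the start of each frame */
    ff->allow_backward_adaptation = 0;
    ff->use_prev_frame_mvs = 0;
  }
}
```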
Also adds a test where an arbitrary set of frames are skipped
from decoding to simulate errors. The test verifies (1) that if
the error frames are droppable - i.e. frame buffer updates have
been turned off - there are no mismatch errors for the remaining
frames after the error frames; and (2) if the error frames are non-droppable,
there are not only no decoding errors but the mismatch
PSNR between the decoder's version of the post-error frames and the
encoder's version is at least 20 dB.
Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5
This matches the behavior prior to generalizing the frame context
selection, and intuitively makes sense in that the first forward ref
is immediately after the keyframe, so its quality is improved a bit
by using the keyframe's entropy context rather than the default.
Change-Id: Ia82cef79382b9d8cfafdc44ba0533d4dc3e44053
This commit restores the quality lost when the buffer-to-buffer copy
logic was removed. Note that this is specific to the current use of
golden frames and will need rework when RTC functionality is added.
Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21
Previously there were two frame coding contexts tracked, one for normal
frames and one for alt-ref frames. Generalize this by signalling the
context to use in the bitstream, rather than tying it to the alt ref
refresh bit. Also increase the number of contexts available to 4, which
may be useful for temporal scalability.
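A minimal sketch of selecting one of 4 saved frame contexts by an index read from the frame header (2 bits for 4 contexts), then writing the adapted probabilities back into the same slot. The structures and function names are hypothetical.

```c
#define NUM_FRAME_CONTEXTS 4

typedef struct {
  unsigned char probs[8];
} frame_context;

typedef struct {
  frame_context fc; /* context in use for the current frame */
  frame_context frame_contexts[NUM_FRAME_CONTEXTS];
  int frame_context_idx; /* signalled in the bitstream */
} codec_state;

/* Start of frame: load the context chosen by the header. */
static void load_frame_context(codec_state *cm, int idx) {
  cm->frame_context_idx = idx & (NUM_FRAME_CONTEXTS - 1);
  cm->fc = cm->frame_contexts[cm->frame_context_idx];
}

/* End of frame: save the adapted probabilities back to that slot. */
static void store_frame_context(codec_state *cm) {
  cm->frame_contexts[cm->frame_context_idx] = cm->fc;
}
```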
Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662
Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
encode side conventions. From the decoder's perspective, we want to be
dealing with numbered references.
Updates to active_ref 2 signal mode context switches, vestigial from
refresh_alt_ref_frame. This needs some clean up to make sense with
increased numbers of reference frames, as well as reimplementing the
swapping of alt/golden which was previously done using the
buffer-to-buffer copy mechanism removed in an earlier commit.
Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58
Do reference counting the same way on the encoder as the decoder does,
rather than maintaining the 'flags' member of YV12_BUFFER_CONFIG.
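A minimal sketch of reference counting over a pool of frame buffers: a reference slot drops its count on the old buffer and takes one on the new buffer, and a buffer is free once its count reaches zero. The pool size and the helpers ref_cnt_fb and get_free_fb are assumptions for illustration.

```c
#define NUM_FRAME_BUFS 8

/* Repoint a reference slot (*idx) at new_idx, updating both counts. */
static void ref_cnt_fb(int *ref_cnt, int *idx, int new_idx) {
  if (ref_cnt[*idx] > 0) ref_cnt[*idx]--;
  *idx = new_idx;
  ref_cnt[new_idx]++;
}

/* First unreferenced buffer, claimed for the new frame; -1 if none. */
static int get_free_fb(int *ref_cnt) {
  for (int i = 0; i < NUM_FRAME_BUFS; ++i) {
    if (ref_cnt[i] == 0) {
      ref_cnt[i] = 1;
      return i;
    }
  }
  return -1;
}
```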
Change-Id: I91dc210ffca081acaf9d5c09a06e7461b3c3139c
This is the first in a series of commits to add additional reference
frames to the codec. Each frame will be able to update any of the
available references, but copying between references is not
supported.
Change-Id: I5945b5ce6cc3582c495102b4e7eed4f08c44d5a1
These variables have the type int64_t, not long long. long long could
be a larger type than 64 bits. Emulate INT64_MAX for older versions of
MSVC, and remove the unreferenced vpx_ports/vpxtypes.h
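A minimal sketch of one way to provide int64_t and INT64_MAX on compilers without <stdint.h>. It assumes that MSVC versions before Visual Studio 2010 (_MSC_VER 1600) lack the header and that their <limits.h> provides _I64_MAX. The helper best_rd_init is hypothetical.

```c
#if defined(_MSC_VER) && _MSC_VER < 1600
/* Older MSVC: no <stdint.h>, so use the native 64-bit type. */
#include <limits.h>
typedef __int64 int64_t;
#define INT64_MAX _I64_MAX
#else
#include <stdint.h>
#endif

/* Example use: an RD "not found yet" sentinel of exactly 64 bits. */
static int64_t best_rd_init(void) { return INT64_MAX; }
```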
Change-Id: Ideaca71838fcd3849d816d5ab17aa347c97d03b0