Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this
makes them all slightly faster, particularly on x86-64. Remove SSE3
sad16x16 version, since the SSE2 version is now faster.
About 1.5% overall encoding speedup.
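For reference, the quantity these routines compute (sum of absolute
differences) can be modelled in scalar form. This is an illustrative Python
sketch, not the x86inc.asm code; the function name `sad` and the
list-of-rows block layout are assumptions made for the example.

```python
def sad(src, ref):
    """Sum of absolute differences between two equally sized blocks.

    Blocks are lists of rows of sample values (assumed layout).
    """
    if len(src) != len(ref):
        raise ValueError("blocks must have the same height")
    total = 0
    for src_row, ref_row in zip(src, ref):
        if len(src_row) != len(ref_row):
            raise ValueError("blocks must have the same width")
        # Accumulate |src - ref| over one row.
        total += sum(abs(s - r) for s, r in zip(src_row, ref_row))
    return total
```

The same function covers 4x4, 16x16, 8x16 and 16x8 blocks, since the block
shape comes from the input.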
Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797
Cache the constant offset in one variable to prevent re-loading that
in each loop iteration, and mark the function as inline so we can use
the fact that the transform size is always known in the caller.
Almost 1% faster encoding overall.
Change-Id: Id78325a60b025057d8f4ecd9003a74086ccbf85a
Pass the current mb row and column around rather than the
recon_yoffset and recon_uvoffset, since those offsets will
change from predictor to predictor, based on the reference
frame selection.
Change-Id: If3f9df059e00f5048ca729d3d083ff428e1859c1
* changes:
Initial support for resolution changes on P-frames
Avoid allocating memory when resizing frames
Adds a test for the VP8E_SET_SCALEMODE control
Allows inter-frames to change resolution. Currently these are
almost equivalent to keyframes, as only intra prediction modes
are allowed, but without the other context resets that occur on
keyframes.
Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3
As long as the new frame is smaller than the size that was originally
allocated, we don't need to free and reallocate the memory allocated.
Instead, do the allocation on the size of the first frame. We could
make this passed in from the application instead, if we wanted to
support external upscaling.
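The allocation policy described above can be sketched roughly as follows.
`FrameBuffer`, its fields and the `allocations` counter are hypothetical
names for illustration. What happens when a frame exceeds the original
allocation is not stated here; the sketch assumes a fallback reallocation.

```python
class FrameBuffer:
    """Hypothetical frame buffer that keeps its first allocation."""

    def __init__(self):
        self.data = None
        self.alloc_width = 0
        self.alloc_height = 0
        self.width = 0
        self.height = 0
        self.allocations = 0  # counts real allocations, for illustration

    def resize(self, width, height):
        if width <= 0 or height <= 0:
            raise ValueError("frame dimensions must be positive")
        fits = (self.data is not None
                and width <= self.alloc_width
                and height <= self.alloc_height)
        if not fits:
            # First frame, or (assumed fallback) a frame larger than the
            # original allocation: allocate storage.
            self.data = bytearray(width * height)
            self.alloc_width = width
            self.alloc_height = height
            self.allocations += 1
        # A smaller frame reuses the existing storage; only the visible
        # dimensions change.
        self.width = width
        self.height = height
```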
Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038
Tests that the external interface to set the internal codec scaling
works as expected. Also updates the test to pull the height from
the decoded frame size rather than parsing the keyframe header,
in anticipation of allowing resolution changes on non-keyframes.
Change-Id: I3ed92117d8e5288fbbd1e7b618f2f233d0fe2c17
Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.
Fixed BUILD warning.
Devise a variant of the original ADST, which allows a butterfly
computation structure. This new transform has a kernel of the
form: sin(pi*(2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.
This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.
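As a sanity check on the kernel, a direct floating-point sketch is given
below. It is not the butterfly implementation from this patch. It assumes
the sine argument carries a factor of pi (the usual DST-IV form) and a
sqrt(2/N) normalization; the function names are made up for the example.

```python
import math


def adst_kernel(n_points):
    """N x N matrix with entries sqrt(2/N) * sin(pi*(2k+1)*(2n+1)/(4N))."""
    scale = math.sqrt(2.0 / n_points)
    return [
        [scale * math.sin(math.pi * (2 * k + 1) * (2 * n + 1) / (4.0 * n_points))
         for n in range(n_points)]
        for k in range(n_points)
    ]


def apply_transform(matrix, vector):
    """Plain matrix-vector product, O(N^2); no butterfly structure."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

With this normalization the matrix is symmetric and orthogonal, so applying
it twice returns the input; that gives a simple reference for checking a
fast version.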
Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
Added switches and code to skip/breakout from
doing SB32 and SB64 tests based on whether
the 16x16 MB tests used split modes. Also to
optionally skip 64x64 if 16x16 was chosen over
32x32.
Impact varies depending on clip from a few %
up to almost 50% on encode speed. Only the
split mode breakout is currently enabled.
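The breakout logic can be sketched as below. The function and parameter
names are hypothetical; the defaults mirror the statement that only the
split mode breakout is currently enabled.

```python
def block_sizes_to_test(used_split_mode, chose_16_over_32,
                        enable_split_breakout=True,
                        enable_sb64_skip=False):
    """Return the block sizes the mode search would evaluate.

    used_split_mode: the 16x16 MB tests picked a split mode.
    chose_16_over_32: 16x16 won against 32x32 (known after the SB32 test).
    """
    sizes = [16]
    if enable_split_breakout and used_split_mode:
        # Split modes at 16x16 suggest larger blocks will not win.
        return sizes
    sizes.append(32)
    if enable_sb64_skip and chose_16_over_32:
        # Optional switch: skip 64x64 when 16x16 beat 32x32.
        return sizes
    sizes.append(64)
    return sizes
```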
Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8
This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independently encodable (minus within-frame cost adjustments in
the RD loop) to speed up hardware & software en/decoders if they use
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.
There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.
Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vectors that point into any tile of frame
0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.
The loopfilter crosses tile boundaries. I discussed this briefly with Aki
and he says that's OK. An in-loop loopfilter would need to do some sync
between tile threads, but that shouldn't be a big issue.
Results: with tiling disabled, we go up slightly because of improved edge
use in the intra4x4 prediction. With 2 tiles, we lose ~1% on derf,
~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
concentrated in the low-bitrate end of clips, and most of it is because
of the loss of edges at tile boundaries and the resulting loss of intra
predictors.
TODO:
- more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves
should not cross tile edges, or we should emulate such borders as
if they were off-frame, to limit error propagation to within one
tile only. This doesn't have to be the default behaviour but could
be an optional bitstream flag.
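One way to picture the column split is the sketch below. It assumes the
superblock columns are divided as evenly as possible among the tiles; the
exact split used by the bitstream is not stated in this message, and the
function name is hypothetical.

```python
def tile_column_bounds(sb_cols, num_tiles):
    """Return [(start, end)] superblock-column ranges, end exclusive."""
    if num_tiles not in (1, 2, 4):
        raise ValueError("only 1, 2 or 4 column-tiles are supported")
    if sb_cols < num_tiles:
        raise ValueError("need at least one superblock column per tile")
    # Assumed even split; each tile covers a contiguous column range.
    return [(i * sb_cols // num_tiles, (i + 1) * sb_cols // num_tiles)
            for i in range(num_tiles)]
```

Each range can then be handed to its own thread, with all contexts reset to
defaults at the range's left edge.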
Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f
Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.
There is no consensus on whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.
This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.
Quality gains on derf are positive on almost all clips. The only clip
that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR,
0.347% SSIM.
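The filtering behaviour described above (filter only in the direction that
has subpel motion, skip entirely on full-pel MVs, and express shorter
filters as 8 taps) can be sketched as follows. The tap values, the 7-bit
precision, the edge replication and all names are assumptions for the
example, not the codec's actual filters.

```python
FILTER_BITS = 7  # taps sum to 1 << FILTER_BITS (assumed precision)

# Hypothetical half-pel bilinear (2-tap) filter written as 8 taps, so the
# 8-tap code path can be reused for it.
BILINEAR_HALF = [0, 0, 0, 64, 64, 0, 0, 0]


def clip_pixel(value):
    return max(0, min(255, value))


def filter_line(line, taps):
    """Apply an 8-tap filter along one line, replicating edge samples."""
    n = len(line)
    out = []
    for x in range(n):
        acc = 0
        for k, tap in enumerate(taps):
            idx = min(max(x + k - 3, 0), n - 1)
            acc += tap * line[idx]
        # Round to nearest, then drop the filter precision bits.
        out.append(clip_pixel((acc + (1 << (FILTER_BITS - 1))) >> FILTER_BITS))
    return out


def predict(block, taps_x, taps_y):
    """taps_x / taps_y set to None means full-pel in that direction."""
    out = [list(row) for row in block]
    if taps_x is not None:
        out = [filter_line(row, taps_x) for row in out]
    if taps_y is not None:
        cols = [filter_line(list(col), taps_y) for col in zip(*out)]
        out = [list(row) for row in zip(*cols)]
    return out
```

With both tap arguments set to None the block is copied unchanged, which
corresponds to the full-pel case.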
Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.
Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0
This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or a non-zero vector,
which means that the compound nearest mv mode can
combine a 0,0 and a non-zero vector.
Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a
Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.
Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.
Adjustment of thresholds for quality / speed tradeoff
to follow.
Change-Id: I117ee473157e151a1b93193d5f393449328de20d
This commit fixes a minor error in the 16-point fdct, where a rotation can
produce a result of -1 instead of 0.
Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
This is identical to the later decisions made in encode_superblock().
This commit doesn't actually change anything, but makes the mbmi state
more consistent between the RD loop and the final encode result.
Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be
The RD loop would change the pointer after the first mode (DC) was tested,
leading to corrupt block objects being provided for the others. This
would essentially render the i8x8 predictor useless.
Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8
Remove eob_max_offset markers and replace them
with the generic skip_block flag, which indicates
to the quantizer that all coeffs are to be set to 0
and the eob position set to 0.
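The effect of the flag on a quantizer can be sketched as below. This is a
plain dead-zone-free quantizer written for illustration; the function
signature and the division-based quantization are assumptions, not the
actual quantizer code.

```python
def quantize(coeffs, quant_step, skip_block):
    """Return (quantized coefficients, eob)."""
    if quant_step <= 0:
        raise ValueError("quant_step must be positive")
    if skip_block:
        # skip_block: every coefficient is zeroed and eob is 0.
        return [0] * len(coeffs), 0
    quantized = []
    eob = 0
    for i, coeff in enumerate(coeffs):
        level = abs(coeff) // quant_step
        quantized.append(level if coeff >= 0 else -level)
        if level:
            eob = i + 1  # one past the last non-zero coefficient
    return quantized, eob
```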
Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.
Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
Simplification to eliminate a number of very large
data structures. All zero run, zbin boosts for different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.
Some further work still needs to be done to refactor, rationalize
and optimize the multiple quantizer functions.
The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.
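The capped lookup can be sketched as follows. The table values are invented
for illustration (the real boosts are not listed in this message); only the
16-entry size and the cap at a run length of 15 come from the text above.

```python
MAX_ZERO_RUN = 15

# Hypothetical 16-element boost table shared by all transform sizes.
ZBIN_BOOST = [2 * run for run in range(MAX_ZERO_RUN + 1)]


def zbin_boost_for_run(zero_run):
    """Boost for a run of zeros; runs past 15 reuse the last entry."""
    if zero_run < 0:
        raise ValueError("zero_run must be non-negative")
    return ZBIN_BOOST[min(zero_run, MAX_ZERO_RUN)]
```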
Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
Adds a flag to disable features that would inhibit frame parallel
decoding. This includes backward adaptation and MV sorting based
on search in ref frame buffer.
Also includes some minor clean-ups.
Change-Id: I434846717a47b7bcb244b37ea670c5cdf776f14d
Adds an error-resilient mode where frames can continue
to be decoded even when there are errors (due to network losses)
on a prior frame. Specifically, backward updates are turned off
and probabilities of various symbols are reset to defaults at
the beginning of each frame. Further, the last frame's mvs are
not used for the mv reference list, and the sorting of the
initial list based on search on previous frames is turned off
as well.
Also adds a test where an arbitrary set of frames are skipped
from decoding to simulate errors. The test verifies (1) that if
the error frames are droppable - i.e. frame buffer updates have
been turned off - there are no mismatch errors for the remaining
frames after the error frames; and (2) if the error frames are
non-droppable, there are not only no decoding errors but the mismatch
PSNR between the decoder's version of the post-error frames and the
encoder's version is at least 20 dB.
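The mismatch check in (2) can be sketched with a plain PSNR computation.
The helper names and the flat sample lists are assumptions for the example;
only the 20 dB threshold comes from the description above.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    sse = sum((r - t) ** 2 for r, t in zip(reference, test))
    if sse == 0:
        return math.inf  # identical frames
    mse = sse / len(reference)
    return 10.0 * math.log10(peak * peak / mse)


def mismatch_acceptable(encoder_frame, decoder_frame, threshold_db=20.0):
    """True if the decoder's frame stays within the mismatch threshold."""
    return psnr(encoder_frame, decoder_frame) >= threshold_db
```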
Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5
This matches the behavior prior to generalizing the frame context
selection, and intuitively makes sense in that the first forward ref
is immediately after the keyframe, so its quality is improved a bit
by using the keyframe's entropy context rather than the default.
Change-Id: Ia82cef79382b9d8cfafdc44ba0533d4dc3e44053
This commit restores the quality lost when the buffer-to-buffer copy
logic was removed. Note that this is specific to the current use of
golden frames and will need rework when RTC functionality is added.
Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21
Previously there were two frame coding contexts tracked, one for normal
frames and one for alt-ref frames. Generalize this by signalling the
context to use in the bitstream, rather than tying it to the alt ref
refresh bit. Also increase the number of contexts available to 4, which
may be useful for temporal scalability.
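The generalized scheme can be pictured as a small bank of contexts selected
by a per-frame index. The class and method names are hypothetical, and the
dictionary of probabilities stands in for the real entropy context. How the
index is coded in the bitstream is not stated here.

```python
NUM_FRAME_CONTEXTS = 4  # from the description above


class FrameContextBank:
    """Hypothetical store of frame coding contexts, selected by index."""

    def __init__(self, default_probs):
        self.contexts = [dict(default_probs)
                         for _ in range(NUM_FRAME_CONTEXTS)]

    @staticmethod
    def _check(index):
        if not 0 <= index < NUM_FRAME_CONTEXTS:
            raise IndexError("frame context index out of range")

    def load(self, index):
        """Copy of the context the frame header selected."""
        self._check(index)
        return dict(self.contexts[index])

    def store(self, index, probs):
        """Save the adapted context back into the selected slot."""
        self._check(index)
        self.contexts[index] = dict(probs)
```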
Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662