Use these, instead of the 4/5-dimensional arrays, to hold statistics,
counts, accumulations and probabilities for coefficient tokens. This
commit also re-allows ENTROPY_STATS to compile.
Change-Id: If441ffac936f52a3af91d8f2922ea8a0ceabdaa5
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.
Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
1 bit, or else they won't fit in int16_t (they are 17 bits). Because
of this, the RD error scoring does not right-shift the MSE score by
two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
simply using the 16x16 luma ones. A future commit will add newly
generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
ADST is desired, transform-size selection can scale back to 16x16
or lower, and use an ADST at that level.
Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
block than for the rest (DWT pixel differences) of the block. Therefore,
RD error scoring isn't easily scalable between coefficient and pixel
domain. Thus, unfortunately, we need to compute the RD distortion in
the pixel domain until we figure out how to scale these appropriately.
Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
This patch reduces the cpu cost of the MV ref
search by only allowing insert for candidates
that would be in the current top 4.
This could alter the outcome and slightly favors
near candidates which are tested first but also
limits the worst case loop count to 4 and means in
many cases it will drop out and not happen.
Change-Id: Idd795a825f9fd681f30f4fcd550c34c38939e113
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
Adds support for compound inter-intra prediction with superblocks.
Also, fixes a bug that disabled intra modes for superblocks.
Change-Id: I4d711317e1bc19df8c2f32dc645429f7fff31036
Allows switchbale filters to be used without mismatch when the
superblock experiment is on.
Also removes a spurious clamping code in decodemv.c which causes
rare encode/decode mismatches.
Change-Id: I809d9ee0b2859552b613500b539a615515b863ae
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Rebase.
Change-Id: Ie3257bdf07bdb9c6e9476915e3a80183c8fa005a
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to read
starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave 2% decoder performance gain (tulip clip).
Change-Id: Ib14c2f1645c4a3436df17fe2f24789506bf0bb58
This commit removed a couple of redundant data structures in frame
coding contextsm, mode_context and mode_context_a, and changed to
use vp9_mode_contexts only. The switch of the context for different
frame type now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy among
these redundant data structure.
Change-Id: I42e8174bd60f466b0860afc44c1263896471b0f3
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
Change-Id: Ie47cb80ef2d54292e6b8db9f699c57214a915bc4
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
Vp9_sad3x16_sse2() is heavily called in decoder, in which the
unaligned reads consume lots of cpu cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do the
aligned reads. This reduced the reading time significantly. Tests
on 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFM off.
Change-Id: I953afe3ac5406107933ef49d0b695eafba9a6507
Experiments with a larger set of contexts and some
clean up to replace magic numbers regarding the
number of contexts.
The starting values and rate of backwards adaption
are still suspect and based on a small set of tests.
Added forwards adjustment of probabilities.
The net result of adding the new context and forward
update is small compared to the old context from the
legacy find_near function. (down a little on derf but
up by a similar amount for HD)
HOWEVER.... with the new context and forward update
the impact of disabling the reverse update (which may be
necessary in some use cases to facilitate parallel decoding)
is hugely reduced.
For the old context without forward update, the impact of
turning off reverse update (Experiment was with SB off) was
Derf - 0.9, Yt -1.89, ythd -2.75 and sthd -8.35. The impact was
mainly at low data rates.
With the new context and forward update enabled the impact
for all the test sets was no more than 0.5-1% (again most at
the low end).
Change-Id: Ic751b414c8ce7f7f3ebc6f19a741d774d2b4b556
added additional motion vectors at close neighborhood of a superblock
to the list of candiate motion vectors, and removed a couple that are
further away.
The change helped std-hd set about .8% (all metrics) and smaller gain
for derf set.
Change-Id: Iaa69b98614db43420ed3fd4738d0ca5587b90045
A patch on compound inter-intra prediction.
In compound inter-intra prediction, a new predictor for
16x16 inter coded MBs are obtained by combining a single
inter predictor with a 16x16 intra predictor, in a manner
that the weight varies with distance from the top/left
boundary. The current search strategy is to combine the best
inter mode with the best intra mode obtained independently.
Results so far:
derf +0.31%
yt +0.32%
std-hd +0.35%
hd +0.42%
It is conceivable that the results would improve somewhat
with a more thorough search strategy where all intra modes
are searched given the best mv, or even a joint search for
the best mv and the best intra mode.
Change-Id: I7951f1ed0d6eb31ca32ac24d120f1585bcd8d79b