We used to calculate SSIM only over the postproc buffer, whereas we
calculate PSNR for both. Compared to postproc-SSIM, this is about 0.3%
higher for derf, 1.4% lower for hd and 0.5% lower for stdhd, although
it is highly variable on a per-clip basis.
Change-Id: I8dd491f0f5b4201dedfb15d288c854d5d4caa10f
As things stand the zero bin mode boost is hurting somewhat.
In part this seems to be because the boost applied as is
interferes with the rd mode selection loop.
Average gains (derf 0.072, yt 0.243, ythd 0.179 std-hd 0.212%)
Change-Id: Icaecea3908d9a7352370e49b8fa822f2c2c49dc1
Adjust the filter length and strength for each
ARF group based on a measure of difficulty (the boost)
and the active q range.
Remove lower limit on RDMULT value.
Average gains on the different sets in range 0.4%-0.9%.
However the ARNR changes give a very big boost on a
few clips.
Eg. Soccer ~5%, in derf set and Cyclist ~ 10% in the std-hd set
Change-Id: I2078d78798e27ad2bcc2b32d703ea37b67412ec4
These variables are unused, and are subject to overflowing, causing
assertions when built with -ftrapv.
Change-Id: Ia00a3201af309906c05bcd4b23a643925ed6ea86
Updates the YV12_BUFFER_CONFIG structure to be crop-aware. The
exiting width/height parameters are left unchanged, storing the
width and height algined to a 16 byte boundary. The cropped
dimensions are added as new fields.
This fixes a nasty visual pulse when switching between scaled and
unscaled frame dimensions due to a mismatch between the scaling
ratio and the 16-byte aligned sizes.
Change-Id: Id4a3f6aea6b9b9ae38bdfa1b87b7eb2cfcdd57b6
Adds probability updates for extra bits for the nzcs, code for
getting nzc stats, plus some minor cleanups and fixes.
Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.
STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic progrmaming apporach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.
TODO:
Training the default probs/counts for nzcs
Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
Added a variant of the one shot maxQ flag
for two pass that forces a fixed Q for the
normal inter frames. Disabled by default.
Also small adjustment to the Bits per MB
estimation.
Change-Id: I87efdfb2d094fe1340ca9ddae37470d7b278c8b8
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.
In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.
Lastly, add a trellis optimize function for 32x32 transform blocks.
HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
a few negative points here and there that I might want to analyze
a little closer.
Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
Fixed a couple of variable/function definitions, as well as header
handling to support 16K sequence coding at high bit-rates.
The width and height are each specified by two bytes in the header.
Use an extra byte to explicitly indicate the scaling factors in
both directions, each ranging from 0 to 15.
Tested coding up to 16400x16400 dimension.
Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
This patch extends the previous support for using references of a
different resolution in ZEROMV mode to all inter prediction modes.
Subpixel based best-mv scoring is disabled when the reference frame
differs in resolution from the current frame.
Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac
Some minor refactoring code relating to estimates of
bits per MB at a given Q and estimating the allowed Q range.
Most of the changes here were included in a previous commit.
This commit seeks to separate out the refactoring from more
the material changes.
Two #define control flags have been added for experimentation.
ONE_SHOT_Q_ESTIMATE force the two pass encoder to
use its initial Q range estimate for the whole clip even if this results
in a miss on the target data rate. In effect this tightens the Q range
seen at the expense of rate control accuracy.
DISABLE_RC_LONG_TERM_MEM is a related flag that disables the
long term memory in the rate control. Local adjustments are still
made to try and better hit the rate target on a per frame basis but
the impact of rate control misses is not propagated to the remainder
of the clip. This means that for example an overshoot early on will not
cause frames later in the clip to be starved of bits. Again the result
of this relaxation amy be less rate control accuracy especially on short
clips.
The flags are disabled by default for now.
Change-Id: I7482f980146d8ea033b5d50cc689f772e4bd119e
The over quant code was added in VP8 post
bitstream freeze to allow compression to lower
data rates
In VP9 the real qualtizer range has been greatly
extended anyway.
Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526
These allow sending partial bitstream packets over the network before
encoding a complete frame is completed, thus lowering end-to-end
latency. The tile-rows are not independent.
Change-Id: I99986595cbcbff9153e2a14f49b4aa7dee4768e2
Experimental tweaks to various thresholds to measure
quality / speed trade off.
Add flag that allows static segmentation to be turned off
and disables it unless in the second pass of a two pass
encode.
Change-Id: I219702ffe858412a83db801cbbbd869924b8c61b
* changes:
Initial support for resolution changes on P-frames
Avoid allocating memory when resizing frames
Adds a test for the VP8E_SET_SCALEMODE control
As long as the new frame is smaller than the size that was originally
allocated, we don't need to free and reallocate the memory allocated.
Instead, do the allocation on the size of the first frame. We could
make this passed in from the application instead, if we wanted to
support external upscaling.
Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038
Added switches and code to skip/breakout from
doing SB32 and SB64 tests based on whether
the 16x16 MB tests used split modes. Also to
optionally skip 64x64 if 16x16 was chosen over
32x32.
Impact varies depending on clip from a few %
up to almost 50% on encode speed. Only the
split mode breakout is currently enabled.
Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8
Simplification to eliminate a number of very large data
data structures. All zero run, zbin boosts for different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.
Some further work still needs be done to refactor, rationalize
and optimize the multiple quantizer functions.
The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.
Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
encode side conventions. From the decoder's perspective, we want to be
dealing with numbered references.
Updates to active_ref 2 signal mode context switches, vestigial from
refresh_alt_ref_frame. This needs some clean up to make sense with
increased numbers of reference frames, as well as reimplementing the
swapping of alt/golden which was previously done using the
buffer-to-buffer copy mechanism removed in an earlier commit.
Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.
If the pred-filter experiment is enabled, three interopolation
filters are tested during mode selection; the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.
The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.
The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.
Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
Old Scheme:
When SWITCHABLE filter selection is enabled the encoder
evaluates the use of each interpolation filter type and
selects the best one to use at the MB level. A frame-
level flag can be set to force the use of a particular
filter type for all MBs in a frame if it is more efficient
to encode that way. The logic here involved a Q dependent
threshold that assumed that the second 8-tap filter was
a high-pass filter. However, this requires a trip around
the recode loop. If the frame-level flag indicates use
of a particular filter, the other filters are not
evaluated in the pick_mode loop.
New Scheme:
Each filter type is evaluated at the MB level and a record
of the best filter is kept, irrespective of what filter
is signaled at the frame-level. Once all MBs have been
encoded, a decision is made as to what frame-level mode
to set for the *next* frame. If one filter is used by 80%
or more of the MBs, then this filter is forced since it
is assumed that this will be more efficient if the
next frame has similar characteristics. i.e. there is a
one-frame lag between measuring the filter selection and
setting the frame-level mode to use.
Change-Id: I6a7e7ced8f27e120fafb99db2dc9c6293f8d20f7
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.
Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).
Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
Some further changes and refactoring of mv
reference code and selection of center point for
searches. Mainly relates to not passing so many
different local copies of things around.
Some place holder comments.
Change-Id: I309f10ffe9a9cde7663e7eae19eb594371c8d055
Use these, instead of the 4/5-dimensional arrays, to hold statistics,
counts, accumulations and probabilities for coefficient tokens. This
commit also re-allows ENTROPY_STATS to compile.
Change-Id: If441ffac936f52a3af91d8f2922ea8a0ceabdaa5
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.
Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
1 bit, or else they won't fit in int16_t (they are 17 bits). Because
of this, the RD error scoring does not right-shift the MSE score by
two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
simply using the 16x16 luma ones. A future commit will add newly
generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
ADST is desired, transform-size selection can scale back to 16x16
or lower, and use an ADST at that level.
Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
block than for the rest (DWT pixel differences) of the block. Therefore,
RD error scoring isn't easily scalable between coefficient and pixel
domain. Thus, unfortunately, we need to compute the RD distortion in
the pixel domain until we figure out how to scale these appropriately.
Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
This patch reduces the cpu cost of the MV ref
search by only allowing insert for candidates
that would be in the current top 4.
This could alter the outcome and slightly favors
near candidates which are tested first but also
limits the worst case loop count to 4 and means in
many cases it will drop out and not happen.
Change-Id: Idd795a825f9fd681f30f4fcd550c34c38939e113