Reverts to using 128 bit LUT for the coef models rather than 48
to ease hardware implementation.
Also incorporates some cleanups including removing various
hooks to support different lookup tables based on block_type and
ref_type.
Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6
This commit changed the encoding and decoding of intra blocks to be
based on transform block. In each prediction block, the intra coding
iterates thorough each transform block based on raster scan order.
This commit also fixed a bug in D135 prediction code.
TODO next:
The RD mode/txfm_size selection should take this into account when
computing RD values.
Change-Id: I6d1be2faa4c4948a52e830b6a9a84a6b2b6850f6
This commit allows the rate-distortion optimization of intra coding
capable of supporting 8x4 and 4x8 partition settings.
It enables the entropy coding of intra modes in key frame using a
unified contextual probability model conditioned on its above/left
prediction modes.
Coding performance:
derf 0.464%
Change-Id: Ieed055084e11fcb64d5d5faeb0e706d30268ba18
The API is not final yet and can be changed. Actual layout of
uncompressed frame part will be finalized later. Right now moving
clr_type, error_resilient_mode, refresh_frame_context,
frame_parallel_decoding_mode from first compressed partition to
uncompressed frame part.
Change-Id: I3afc5d4ea92c5a114f4c3d88f96858cccc15b76e
Cleans up the experiment. Actually uses reduced counts for backward
updates, and reduced number of probabilities in the context.
No change in bitstream when the experiment is on.
Between expt on and off:
derfraw300 is down only -0.062% (which is better than when expts
were run previously).
Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
The new code is 0x49, 0x83, 0x42
There is nothing particularly special about this code bitstream wise.
Its derivation is the word "sync" coded using 4x6bit alphabetic indices.
Change-Id: Ie2430a854af32ddc5a5c25a6c1c90cf6497ba647
The recursive partition type search is enabled down to 4x4, 4x8 and
8x4, followed by the corresponding rate-distortion optimization for
the per-partition encoding mode decisions.
The bit-stream writing/reading synchronized in supporting the
rectangular partition of 8x8 block.
This provides above 1% coding performance gains on derf.
To do next:
1. re-design the rate-distortion loop for inter prediction below 8x8.
2. re-design the rate-distortion loop for intra prediction below 4x4.
3. make the loop-filter aware of rectangular partition of 8x8 block.
4. clean the unused probability models.
5. update default probability values.
Change-Id: Idd41a315b16879db08f045a322241f46f1d53f20
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
Adds a subsampling aware border extension function. This may be reworked
soon to support more than 3 planes.
Change-Id: I76b81901ad10bb1e678dd4f0d22740ca6c76c43b
Combining encode_nmv_component with encode_nmv_component_fp
and read_nmv_component with read_nmv_component_fp. Bitstream is slightly
changed (only the order of bits), here are the results on test sets:
stdhd: +0.047, yt: -0.038, derf: +0.001, hd: -0.011.
Change-Id: I1be312e976796df78ca63368702d0ee19f2b8c50
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
This commit allows the rate-distortion optimization recursion
at encoder to go down to 4x4 block size. It deprecates the use
of I4X4_PRED and SPLITMV syntax elements from bit-stream
writing/reading. Will remove the unused probability models in
the next patch.
The partition type search and bit-stream are now capable of
supporting the rectangular partition of 8x8 block, i.e., 8x4
and 4x8. Need to revise the rate-distortion parts to get these
two partition tested in the rd loop.
Change-Id: I0dfe3b90a1507ad6138db10cc58e6e237a06a9d6
Change band calculation back to simpler model based
on the order in which coefficients are coded in scan order
not the absolute coefficient positions.
With the scatter scan experiment enabled the results were
appear broadly neutral on derf (-0.028) but up a little on std-hd +0.134).
Without the scatterscan experiment on the results were up derf as well.
Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
Move set_partition_seg_context_ to common file. Use consistent
context setup conditions for partition probability model update at
encoder and decoder.
Change-Id: I24b7ed3b1c48e3d2568191a46b70136b99b67b1a
This commit enables the search for the optimal superblock
partition types in the recursion form. The intention is to
make the optimization process more concise and ready to
support partition down to 4x4 block size next.
Change-Id: Iae279a67df3a7cc372553c84c775bc4d2f3e4336
Make framebuffer allocations according to the chroma subsamping
factors in use. A bit is placed in the raw part of the frame header for
each of the two subsampling factors. This will be moved in a future
commit to make them part of the TBD feature set bits, probably only set
on keyframes, etc.
Change-Id: I59ed38d3a3c0d4af3c7c277617de28d04a001853
Update and buffer left/above partition information context per 8x8
block. This allows to further enable recursive partition down to
4x4 block size, and hence deprecating I4X4_PRED and SPLITMV.
This commit also fixes a context buffer swap/restore issue in 32x32
partition type search. This gives 0.1% performance gain for derf/yt.
Will refactor the superblock partition type search into recursion
form.
Change-Id: Ib61975aca5f12b78d8018481d7fa1393d085689b
This setup is now handled by vp9_build_intra_predictors()
when left_available and/or up_available is zero.
Change-Id: I59cec0ab95f8be69ce885fd20727510e4deef8a0
The number of reference buffers is extended to 8 and
a reference sign-bias added for the LAST_FRAME.
Whilst the number of reference buffers used by an
individual frame remains unchanged at 3, these may
now be selected from 8 possible buffers.
Change-Id: I2d247b9c1c2b3a339d6c9fac125e81ba373f75a7
Don't allow i4x4 except for sb8x8 recursion step. Read only 4 (not 16)
i4x4 submodes if we are i4x4.
Change-Id: Iaaaced1a134006b2c96eed66f014300eae41e0ed
If a reference frame is inter, the only valid modes would
be inter modes. This check is unnecessary.
Change-Id: Ib8433ab5a3418f94149ee4e3062d48d7740d225a
The decode_mb only carries I8X8_PRED decoding, which will be covered
by the regular MB intra modes when SB8X8 is on. To be removed later.
Change-Id: I3b9ee55917a30b42518b81987bc10c22b1a19e7f
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction
Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
Changing the order of probabilities inside mb_segment_tree_probs in order
to use treed_read/treed_write function instead of custom code.
Change-Id: I843487d5057913b9358db73da270893eefecc6c8
Moving common code from encoder and decoder to vp9_get_qindex function.
Also moving quant-related constants from vp9_onyxc_int.h to
vp9_quant_common.h.
Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
Separate the decoding process of 4x4 block based coding (both intra
and inter) from decode_mb and move it into decode_atom_. This allows
to further move the rest per 16x16 block decoding of decode_mb into
decode_sb, and hence eventually deprecating decode_mb when SB8X8 is
enabled.
Change-Id: I678cb8007d8a57b792d7a23020edb0c74fbf4237
Separate the functionality of I4X4_PRED from decode_mb. Use
decode_atom_intra instead, to enable recursive partition of superblock
down to 8x8.
Change-Id: Ifc89a3be82225398954169d0a839abdbbfd8ca3b
Output changes slightly because of a minor bug in (at least) the sb32x16
block2above tx16x16 tables that previously existed in vp9_blockd.c.
Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
First patch to make sb decoding based on the transform size. This patch
is working for the sb modes, combining the parts of decode_mb that fit
into this framework will come as a second patch.
Change-Id: I26123416a7a87e096bbdb5eb944ce5bb198384f8
Conflicts:
vp9/common/vp9_findnearmv.c
vp9/common/vp9_rtcd_defs.sh
vp9/decoder/vp9_decodframe.c
vp9/decoder/x86/vp9_dequantize_sse2.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_common.mk
Resolve file name changes in favor of master. Resolve rdopt changes in
favor of experimental, preserving the newer experiments.
Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I7ff2fa72d22c29163eb558981c8193765a8113d9
This originally was "Removed update_blockd_bmi()". Now,
this patch removed bmi from blockd and uses the bmi found
in mode_info_context. Eliminates unnecessary bmi copies between
blockd and mode_info_context.
Change-Id: I287a4972974bb363f49e528daa9b2a2293f4bc76
vp9_dequantize_x86 has only sse2 functions.
vp9_dct_sse2_intrinsics has no namespace collision and can drop
_intrinsics.
vp9_idct_mmx.h is unused.
Change-Id: Ic16e31fb372a1d1e841a62ecb4189fe8f95808ec
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.
Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
The quantizer can vary per-plane, and the dequantization vector is
available in the per-plane part of MACROBLOCKD. The previous code would
incorrectly use the Y quantizer for the whole macroblock.
Change-Id: I3ab418aef9168ea0ddcfa4b7c0be32ae48b536d7
Using ALLOWED_REFS_PER_FRAME constants instead of hard coded 3, replacing
memcpy with plain struct assignment.
Change-Id: Ibc86f5d175fcb3f3a3eddacf593525370f1f854c
Function set_mb_row() and set_mb_col() do similar work and are always
called together, this commit merged them into a single function for
clarity and easy maintainence. This was a TODO item.
Change-Id: I956bd9ed6afb8b2b0469b20fd8bc893b26f8a0f3
This commit enables selecting probability models for recursive block
partition information syntax, depending on its above/left partition
information, as well as the current block size. These conditional
probability models are reasonably stationary and consistent across
frames, hence the backward adaptive approach is used to maintain and
update the contextual models.
It achieves coding performance gains (on top of enabling rectangular
block sizes):
derf: 0.242%
yt: 0.391%
hd: 0.376%
stdhd: 0.645%
Change-Id: Ie513d9673337f0d27abd65fb566b711d0844ec2e
Also some further simplification following removal
of top node code.
There is an issue in regards to the shared file vp8cx.h
in regard to the roi_map as this interface assumes that
there are only 4 segments. I have left the value here as
4 for now meaning that the roi_map interface is broken
for VP9.
Note that this change would have been easier if I hadn't
had to search for hard wire instances of the number 4
and <= 3.
Change-Id: Ia8b6deea4be4dbd20deb1656e689dd43a5f190e8
Remove top node optimization.
The improvement this gives is not sufficient to justify
the extra complexity.
Change-Id: I2bb4a12a50ffd52cacfa4a3e8acbb2e522066905
First in a series of commits making certain MACROBLOCK members
addressable per-plane. This commit also refactors the block subtraction
functions vp9_subtract_b, vp9_subtract_sby_c, etc to be
loops-over-planes and variable subsampling aware.
Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3
Mostly for cleanup purposes. Now we should be able to rework
the encoder/decoder to use a common idct/add function.
Change-Id: I1597cc59812f362ecec0a3493b6101a6cc6fa7ff
This fixes an intermittent mismatch issue cause by moving
the lossless mode decoding bit to after the loop filter
setup information. We need to ensure that the lossless bit
is decoded prior to loop filter setup.
Change-Id: I3faa3fff8e1013b7405dac91268350e059ed121e
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.
This cleans out various other sub-experiments that were part
of the origiinal patch, which will be later included if found
useful.
Results are slightly positive on all sets (0.1 - 0.2% range).
Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
Removes the redundant dst pointers from vp9_build_inter_predictors_sb{y,uv}
and the remaining mb specific functions.
Change-Id: I7b6bf439d9394b85ea79b4fe61a3ffc1025720da
First in a series of commits moving the framebuffers pointers to
per-plane data, so that they can be indexed numerically rather than
by name.
Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
For 1080 material, this buffer is currently 2,270,928 bytes. This patch swaps
ptrs instead of copying and uses the last show_frame flag instead of setting
the entire buffer to zero. For the test clip used, the decoder improved by up
to 1%.
Change-Id: I686825712ad56043e09ada9808dc489f875a6ce0
Further simplification of mvref search to return
only the top two candidates. Distance weights removed
as the test order reflects distance anyway.
Change-Id: I0518cab7280258fec2058670add4f853fab7b855
Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction.
This makes the macroblock buffer handling consistent with those of
superblock. Remove predictor buffer MACROBLOCKD.
Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
Moving all the probability updates after frame context selection.
This makes it clean and simple to store all the probs in single
struct that can be sent to hardware codec.
Change-Id: I2ec3de81adbd468d8ef34a914caae80a18c3ef56
List of moved functions: vp9_decode_uniform, vp9_decode_term_subexp,
vo9_inv_recenter_nonneg, vp9_decode_unsigned_max.
Change-Id: Ib518beb90b791690c5c93de17b8bdbf560033b41
Adds RD integration for 32x16, 16x32, 64x32 and 32x64 rectangular blocks.
Derf almost +0.6%, HD a little over +1.0%, STDHD +1.3%.
Change-Id: Id651fdb6a655fdbb5c47009757e63317acfb88a5
Enable recursive partition information coding from SB64X64 down to
MB16X16. The bit-stream syntax is now supporting rectangular block
sizes. It starts from SB64X64 and recursively describes the partition
type of the current block. If the partition type is PARTITION_NONE,
the block is coded as a single unit; if it is PARTITION_HORZ or
PARTITION_VERT, the block is segmented into two independently coded
rectangular units, with no further partition needed; otherwise, the
block is segmented into 4 square blocks. i.e., PARTITION_SPLIT case,
each can be potentially further partitioned.
Forward adaptive probability modeling is used for the partition
information coding, conditioned on the current block size.
Change-Id: I499365fb547839d555498e3bcc0387d8a3587d87
This flag was added to VP8 to allow a mode where MB-level skipping
was not allowed, saving a bit per mb. It was never used in practice,
and hasn't been tested in VP9, so remove it.
Change-Id: Id450ec6904c6d06c1919508e7efc52d05cde5631
tx_type == DCT_DCT check is an implementation detail of iht_add. Also
adding dequant_add_y function with explicit DCT_DCT check inside.
Change-Id: Ia3cb0225601752cdef0ff6f0acd3a09d9dbd8938
This is the first CL with vp9_reader changes. All another macro
definitions will be replaced after.
Change-Id: I1c6bd9c9a612ec1663d484d6adb4fb720af54063
This is work-in-progress, it implements multiple ARF
encoding behind an experimental flag.
It adds the ability to insert multiple ARF frames into a
single ARF group. This patch implements the reordering
of the coded frames, and implements a fixed-length coding
pattern. It applies a fixed quantizer strategy based on
where the frame is in the coding sequence.
Further work to modify the rate control strategy is
ongoing and will be submitted via a set of future patches.
In this first step, each ARF group is recursively
bisected and an ARF frame added at that position in the
sequence. The recursion continues until ARF frames are
within MIN_GF_INTERVAL frames.
The code sits behind the "multiple-arf" experimental
flag ("CONFIG_MULTIPLE_ARF"). The experimental flag
"oneshotq" ("CONFIG_ONESHOTQ") also needs to be enabled
for this patch to work correctly.
Change-Id: Ie473b05ebb43ac473c0cfb659b2b8042823085e2
Combine superblock inter predictors into a unified function that
allows configurable block width and height. The inter predictions
of block sizes smaller than 16x16 are handled differently. To be
continued on merging them later.
Change-Id: I14075959dd5e221f00c205c99ca35c1c31ef728e
The intra predictor supports configurable block sizes. It can handle
intra prediction down to 4x4 sizes, when enabled in BLOCK_SIZE_TYPE.
Change-Id: I7399ec2512393aa98aadda9813ca0c83e19af854
This patch changes the default with the modecoefprob expt
to use mode-based forward updates with one-node pegged
modeling.
The maximum difference with fully trained tables is now
less that 0.1%.
Change-Id: I06b44322e10c6703f93f3c1d48d973b1136a0618
This patch will use the dest buffer instead of the
predictor buffer. This will allow us in future commits
to remove the extra mem copy that occurs in the dequant
functions when eob == 0. We should also be able to remove
extra params that are passed into the dequant functions.
Change-Id: I7241bc1ab797a430418b1f3a95b5476db7455f6a
With these fixed, the codec produces identical results regardless of
what literal values are used for the enum members in BLOCK_SIZE_*.
Change-Id: I26db8e08019b58ba432af1f0950ebe6b0eb4ad8c
The unified dequantization, inverse transform, and adding functions
support rectangular block sizes. Also separate the operations on
luma and chroma components, in the consideration of the txfm_size
for uv components in rectangular block sizes.
Change-Id: I2a13246b2a9086b37d575d346070990d854cc110
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.
Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
Restructure the code to avoid the majority of per-block-size
switches, code duplication, etc. All block types (mb/sb32/sb64)
can be handled by the same code.
Change-Id: I4022718d66e31a15a7074e43f3b98cd0a5124ea7
Removing several commented code blocks, using uint32_t and uint8_t types,
removing redundant code.
Change-Id: Ifc5cc9863897925ea2a7cab4f7309ccf28d80bfe
Clamp only the motion vectors inferred from neighboring reference
macroblocks. The motion vectors obtained through motion search in
NEWMV mode are constrained during the search process, which allows
a relatively larger referencing region than the inferred mvs.
Hence further clamping the best mv provided by the motion search may
affect the efficacy of NEWMV mode.
Synchronized the decoding process. The decoded mvs in NEWMV modes
should be guaranteed to fit in the effective range. Put a mv range
clamping function there for security purpose.
This improves the coding performance of high motion sequences, e.g.,
derf set:
foreman 0.233%
husky 0.175%
icd 0.135%
mother_daughter 0.337%
pamphlet 0.561%
stdhd set:
blue_sky 0.408%
city 0.455%
also saw sunflower goes down by -0.469%.
Change-Id: I3fcbba669e56dab779857a8126a91b926e899cb5
Start grouping data per-plane, as part of refactoring to support
additional planes, and chroma planes with other-than 4:2:0
subsampling.
Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
Inside decode_sb_4x4 it should be
"get_tx_type_4x4(mb, y_idx * y_size + x_idx)"
but it was
"get_tx_type_4x4(mb, y_idx * (2 * y_size) + x_idx)".
Also making code of decode_sb_4x4, decode_sb_8x8, and decode_sb_16x16
formatted in the same way.
Change-Id: I15c7bef4fb575f7e9da19f953912324cb35d24dd
This code was only called in the BPRED case, but had no real special
case associated with it. Made BPRED behave like all other modes. No
bitstream change.
Change-Id: I87ba11fe723928b6314d094979011228d5ba006f
Took vp9_setup_scale_factors_for_frame() out from
vp9_setup_interp_filters(), so that it is only called once per
frame instead of per macroblock. Decoder tests showed a 1.5%
performance gain.
Change-Id: I770cb09eb2140ab85132f82aed388ac0bdd3a0aa
General code cleanup in loopfilter code. Modification of setup_frame_size,
so now VP9_COMMON is modified in one place after all width/height checks
passed.
Change-Id: Iedf32df43a912d7aae788ed276ac6c429973f6fe
Adding decode_sb_8x8 and decode_sb_4x4 with common code for superblock
decoding. Renaming decode_superblock32 to decode_sb32 and
decode_superblock64 to decode_sb64.
Change-Id: Id006d7e398b9bfa3acec4326e1e0c537ebfefdd3
The patch adds the flexibility to use standard EOB based coding
on smaller block sizes and nzc based coding on larger blocksizes.
The tx-sizes that use nzc based coding and those that use EOB based
coding are controlled by a function get_nzc_used().
By default, this function uses nzc based coding for 16x16 and 32x32
transform blocks, which seem to bridge the performance gap
substantially.
All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before.
Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
Almost all arguments for vp9_build_inter32x32_predictors_sb and
vp9_build_inter64x64_predictors_sb can be deduced from the first macroblock
argument.
Change-Id: I5d477a607586d05698d5b3b9b9bc03891dd3fe83
Extracting setup_frame_size and update_frame_context functions. Introducing
vp9_read_prob function as shortcut for (vp9_prob)vp9_read_literal(r, 8).
Change-Id: Ia5c68fd725b2d1b9c5eb20f69cacb62361b5a3dd
This gains about 0.2% on derf, 0.1% on hd and 0.4% on stdhd. I can put
this under an experimental flag if wanted, just trying to get my patch
queue in shape.
Change-Id: Ibe1a30fe0e0b07bec4802e0f3ff0ba22e505f576
Adds an experiment to use a weighted prediction of two INTER
predictors, where the weight is one of (1/4, 3/4), (3/8, 5/8),
(1/2, 1/2), (5/8, 3/8) or (3/4, 1/4), and is chosen implicitly
based on consistency of the predictors to the already
reconstructed pixels to the top and left of the current macroblock
or superblock.
Currently the weighting is not applied to SPLITMV modes, which
default to the usual (1/2, 1/2) weighting. However the code is in
place controlled by a macro. The same weighting is used for Y and
UV components, where the weight is derived from analyzing the Y
component only.
Results (over compound inter-intra experiment)
derf: +0.18%
yt: +0.34%
hd: +0.49%
stdhd: +0.23%
The experiment suggests bigger benefit for explicitly signaled weights.
Change-Id: I5438539ff4485c5752874cd1eb078ff14bf5235a
These are mostly just for experimental purposes. I saw small gains (in
the 0.1% range) when playing with this on derf.
Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
Now that the first AC coefficient in both directions use the same DC
as their context, there no longer is a purpose in letting both have
their own band. Merging these two bands allows us to split bands for
some of the very high-frequency AC bands.
In addition, I'm redoing the banding for the 1D-ADST col/row scans. I
don't think the old banding made any sense at all (it merged the last
coefficient of the first row/col in the same band as the first two of
the second row/col), which was clearly an oversight from the band being
applied in scan-order (rather than in their actual position). Now,
coefficients at the same position will be in the same band, regardless
what scan order is used. I think this makes most sense for the purpose
of banding, which is basically "predict energy for this coefficient
depending on the energy of context coefficients" (i.e. pt).
After full re-training, together with previous patch, derf gains about
1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.
Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
Pearson correlation for above or left is significantly higher than for
previous-in-scan-order (absolute values depend on position in scan, but
in general, we gain about 0.1-0.2 by using either above or left; using
both basically just makes this even better). For eob branch skipping,
we continue to use the previous token in scan order.
This helps about 0.9% on derf after re-training on a limited data set.
Full re-training and results on larger-resolution clips are pending.
Note that this commit breaks trellis, so we can probably get further
gains out of it by fixing trellis at some later point.
Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
Moving code from vp9_decode_frame function into setup_loopfilter and
setup_segmentation functions. A little bit of cleanup.
Change-Id: I2cce1813e4d7aeec701ccf752bf57e3bdd41b51c
Adds a per-frame, strength adjustable, in loop deringing filter. Uses
the existing vp9_post_proc_down_and_across 5 tap thresholded blur
code, with a brute force search for the threshold.
Results almost strictly positive on the YT HD set, either having no
effect or helping PSNR in the range of 1-3% (overall average 0.8%).
Results more mixed for the CIF set, (-0.5 min, 1.4 max, 0.1 avg).
This has an almost strictly negative impact to SSIM, so examining a
different filter or a more balanced search heuristic is in order.
Other test set results pending.
Change-Id: I5ca6ee8fe292dfa3f2eab7f65332423fa1710b58
Replaces the default tables for single coefficient magnitudes with
those obtained from an appropriate distribution. The EOB node
is left unchanged. The model is represeted as a 256-size codebook
where the index corresponds to the probability of the Zero or the
One node. Two variations are implemented corresponding to whether
the Zero node or the One-node is used as the peg. The main advantage
is that the default prob tables will become considerably smaller and
manageable. Besides there is substantially less risk of over-fitting
for a training set.
Various distributions are tried and the one that gives the best
results is the family of Generalized Gaussian distributions with
shape parameter 0.75. The results are within about 0.2% of fully
trained tables for the Zero peg variant, and within 0.1% of the
One peg variant.
The forward updates are optionally (controlled by a macro)
model-based, i.e. restricted to only convey probabilities from the
codebook. Backward updates can also be optionally (controlled by
another macro) model-based, but is turned off by default. Currently
model-based forward updates work about the same as unconstrained
updates, but there is a drop in performance with backward-updates
being model based.
The model based approach also allows the probabilities for the key
frames to be adjusted from the defaults based on the base_qindex of
the frame. Currently the adjustment function is a placeholder that
adjusts the prob of EOB and Zero node from the nominal one at higher
quality (lower qindex) or lower quality (higher qindex) ends of the
range. The rest of the probabilities are then derived based on the
model from the adjusted prob of zero.
Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
Lower case variable names, code simplification by using already defined
clamp and read_le16 functions.
Change-Id: I8fd544365bd8d1daed86d7b2ae0843e4ef80df08
Wrote sse2 version of vp9_short_idct10_16x16 function. Compared
to c version, the sse2 version is 2.3X faster.
Change-Id: I314c4f09369648721798321eeed6f58e38857f26
Making consistent initialization of mb_to_{top,botton,left,right}_edge
variables after set_mb_row & set_mb_col calls. A little bit of code cleanup
additionally.
Change-Id: I245bfe32c5701e9836956dc25cf8c770d109cbc1
Wrote sse2 version of vp9_short_idct16x16 function. Compared to c
version, the sse2 version is over 2.5X faster.
Change-Id: I38536e2b846427a2cc5c5423aaf305fd0e605d61
Renaming Width to width, Height to height and Version to version in
several structs and function signatures.
Change-Id: I084c3f7e747cb2ce3345aff27a3dff9b13a87543
Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8.
Compared to c version, the sse2 version is 2X faster. The decoder
test didn't show noticeable gain since 8x8 idct doesn't take much
of decoding time (less than 1% in my test).
Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
If the intended display size is different than the size the frame is
coded at, then send that size explicitly in the bitstream. Adds a new
bit to the frame header to indicate whether the extra size fields
are present.
Change-Id: I525c66f22d207efaf1e5f903c6a2a91b80245854
This fix resolves some of the mismatch issues being seen
recently. While this is the right thing to do when tiling
is used for this experiment, it is not the underlying cause
of the the mismatches.
Something else is causing writing outside of the allowable
frame area in the encoder leading to this mismatch.
Change-Id: If52c6f67555aa18ab8762865384e323b47237277
Moving identical code to separate functions, variable declaration and
initialization on the same line.
Change-Id: Ifa6474a64189f9d8051e88e19850453b0227752c
Updates the YV12_BUFFER_CONFIG structure to be crop-aware. The
exiting width/height parameters are left unchanged, storing the
width and height algined to a 16 byte boundary. The cropped
dimensions are added as new fields.
This fixes a nasty visual pulse when switching between scaled and
unscaled frame dimensions due to a mismatch between the scaling
ratio and the 16-byte aligned sizes.
Change-Id: Id4a3f6aea6b9b9ae38bdfa1b87b7eb2cfcdd57b6
This is like VP8_COPY_REFERENCE, but returns a pointer to the reference
frame rather than a copy of it. This is useful when the application
doesn't know what the size of the reference is, as is the case when
scaling is in effect.
Change-Id: I63667109f65510364d0e397ebe56217140772085
When coding the frame that corresponds to the midpoint frame
defining an ARF, do not update the last reference frame buffer.
Previously this buffer was updated meaning that when coding the next
ARF all the reference buffers were the same (or nearly so).
Turning the update off means that the frame before is still available
as an alternative predictor and for use in compound prediction.
Also fixed inconsistency in test for mismatch (patch from JK).
Net average gains (derf 0.049, yt 0.163, yt-hd 0.207, std-hd 0.286)
Change-Id: Ifee21da21ccbb1648ac2eafe890d3ce60562c7bc
Removing redundant code, introducing new functions for better
decomposition, adding 'clamp' function to vp9_common.h.
Change-Id: Ic3b8ca13bbc38f60f0c9c43910b5802005e31aaf
Adds probability updates for extra bits for the nzcs, code for
getting nzc stats, plus some minor cleanups and fixes.
Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).
Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
Yaowu found this function had a compiling issue with MSVC because
of using _mm_storel_pi((__m64 *)(dest + 0 * stride), (__m128)p0).
To be safe, changed back to use integer store instruction.
Also, for some build, diff could not always be 16-byte aligned.
Changed that in the code.
Change-Id: I9995e5446af15dad18f3c5c0bad1ae68abef6c0d
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.
STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic progrmaming apporach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.
TODO:
Training the default probs/counts for nzcs
Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.
In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.
Lastly, add a trellis optimize function for 32x32 transform blocks.
HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
a few negative points here and there that I might want to analyze
a little closer.
Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
Fixed a couple of variable/function definitions, as well as header
handling to support 16K sequence coding at high bit-rates.
The width and height are each specified by two bytes in the header.
Use an extra byte to explicitly indicate the scaling factors in
both directions, each ranging from 0 to 15.
Tested coding up to 16400x16400 dimension.
Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
Simplified idct32x32 calculation when there are only 10 or less
non-zero coefficients in 32x32 block. This helps the decoder
performance.
Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d
Removing redundant variables, using x *= y instead x = x * y, moving
variable declarations into inner blocks.
Change-Id: I884f95c755f55d51b7c1c6585f10296919063e41