Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction
Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
Changing the order of probabilities inside mb_segment_tree_probs in order
to use treed_read/treed_write function instead of custom code.
Change-Id: I843487d5057913b9358db73da270893eefecc6c8
Moving common code from encoder and decoder to vp9_get_qindex function.
Also moving quant-related constants from vp9_onyxc_int.h to
vp9_quant_common.h.
Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
Separate the functionality of I4X4_PRED from decode_mb. Use
decode_atom_intra instead, to enable recursive partition of superblock
down to 8x8.
Change-Id: Ifc89a3be82225398954169d0a839abdbbfd8ca3b
Also fixed two minor subtle boundary conditions in intra prediction
code, and replaced memcpy/memset with vpx_ prefixed version.
Change-Id: I9cddff3be831228b628f1f2f065a61feacbcbee6
The commmit changed to use same intra prediction function for all
block sizes.
Some details on the changes:
1. All directional modes except DC/TM/V/H now have built-in filtering
for all pixels with filter taps either (1, 2, 1)/4 or (1, 1)/2.
2. Above edge get automatic extended to double width (bw*2), which
makes a lot of the prediciton mode computation simpler.
3. Same intra prediction function is called with different size
for i4x4_pred and all other larger size.
Overall, the change helped keyframe only coding for both cif size
and std-hd size test sets by .5% consistently on all encodings.
For normal coding with single/auto key frame, the change now also
is consistently net positive for all encodings. The overall gains
is about .15% on std-hd set.
Change-Id: I01ceb31fbc73d49776262e6bdc06853b03bbd1d1
Output changes slightly because of a minor bug in (at least) the sb32x16
block2above tx16x16 tables that previously existed in vp9_blockd.c.
Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
Turns model based reverse updates on for coefficients in an
effort to reduce the memory requirement for counters.
With this patch the counters needed will be reduced by about
75% since only 3 counts are needed instead of 12.
The impact in performance is:
derf300: -0.252%
stdhd250: -0.046%
However retraining should alleviate some of the drop in
performance.
Change-Id: I6f2b3e13f6d5520aa3400b0b228fb5e8b4a43caa
Conflicts:
vp9/common/vp9_findnearmv.c
vp9/common/vp9_rtcd_defs.sh
vp9/decoder/vp9_decodframe.c
vp9/decoder/x86/vp9_dequantize_sse2.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_common.mk
Resolve file name changes in favor of master. Resolve rdopt changes in
favor of experimental, preserving the newer experiments.
Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I7ff2fa72d22c29163eb558981c8193765a8113d9
This originally was "Removed update_blockd_bmi()". Now,
this patch removed bmi from blockd and uses the bmi found
in mode_info_context. Eliminates unnecessary bmi copies between
blockd and mode_info_context.
Change-Id: I287a4972974bb363f49e528daa9b2a2293f4bc76
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.
Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
The underlying storage for these buffers is in the per-plane MACROBLOCKD
area, so read it from there directly.
Change-Id: Id6bd835117fdd9dea07db95ad06eff9f12afaaf7
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.
Change-Id: I593fb0715e74cd84b48facd1c9b18c3ae1185d4b
Remove similarly named header file. It is obsolete.
Move file to match naming style.
Adjust make file to include the file correctly and remove extra
unnecessary #if guard.
Change-Id: Ifba07ba9938a5df08a9f4eda54a3ac4d6983f7bf
Using ALLOWED_REFS_PER_FRAME constants instead of hard coded 3, replacing
memcpy with plain struct assignment.
Change-Id: Ibc86f5d175fcb3f3a3eddacf593525370f1f854c
Function set_mb_row() and set_mb_col() do similar work and are always
called together, this commit merged them into a single function for
clarity and easy maintainence. This was a TODO item.
Change-Id: I956bd9ed6afb8b2b0469b20fd8bc893b26f8a0f3
This commit enables selecting probability models for recursive block
partition information syntax, depending on its above/left partition
information, as well as the current block size. These conditional
probability models are reasonably stationary and consistent across
frames, hence the backward adaptive approach is used to maintain and
update the contextual models.
It achieves coding performance gains (on top of enabling rectangular
block sizes):
derf: 0.242%
yt: 0.391%
hd: 0.376%
stdhd: 0.645%
Change-Id: Ie513d9673337f0d27abd65fb566b711d0844ec2e
This minor tweak makes segment 0 neutral and used by
key frames and also extends beyond 4 segments.
Change-Id: Ife4744602aba66ac9432746db3113cc5cd88a482
Also some further simplification following removal
of top node code.
There is an issue in regards to the shared file vp8cx.h
in regard to the roi_map as this interface assumes that
there are only 4 segments. I have left the value here as
4 for now meaning that the roi_map interface is broken
for VP9.
Note that this change would have been easier if I hadn't
had to search for hard wire instances of the number 4
and <= 3.
Change-Id: Ia8b6deea4be4dbd20deb1656e689dd43a5f190e8
Remove top node optimization.
The improvement this gives is not sufficient to justify
the extra complexity.
Change-Id: I2bb4a12a50ffd52cacfa4a3e8acbb2e522066905
This commit enables rectangular block prediction of compound
inter-intra mode. It combines the mb/sb32/sb64 prediction functions
into a unified version with configurable block width and height.
This fixes the enc/dec mismatch of the codebase when
comp-interintra-pred is enabled.
Change-Id: I1d0db2f1f184007802df04fcd12b9dadb3189ff0
Clean Windows build warnings:
warning C4028: formal parameter <N> different from declaration
This was fixed independently in master and experimental but the fixes
were in opposite directions. One added const to the declaration and the
other removed it from the implementation.
Also update the variable names. This doesn't modify the data so call it
ref, matching the functions in the vicinity, rather than dst.
Change-Id: I2ffc6b4a874cb98c26487b909d20a5e099b5582c
Also use explicitely named enum values in sb_type comparisons, rather
than relying on absolute integer values, because enum values may
change in the future.
Change-Id: I72d42971a98157af93413a25ac2c7e6f9b369cec
First in a series of commits making certain MACROBLOCK members
addressable per-plane. This commit also refactors the block subtraction
functions vp9_subtract_b, vp9_subtract_sby_c, etc to be
loops-over-planes and variable subsampling aware.
Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3
Mostly for cleanup purposes. Now we should be able to rework
the encoder/decoder to use a common idct/add function.
Change-Id: I1597cc59812f362ecec0a3493b6101a6cc6fa7ff
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.
This cleans out various other sub-experiments that were part
of the origiinal patch, which will be later included if found
useful.
Results are slightly positive on all sets (0.1 - 0.2% range).
Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
Removes the redundant dst pointers from vp9_build_inter_predictors_sb{y,uv}
and the remaining mb specific functions.
Change-Id: I7b6bf439d9394b85ea79b4fe61a3ffc1025720da
Wherever there are real pixels available before falling back to use
assumed values 127 and 129.
This also make DC_PRED for i4x4 consistent with DC_PRED for larger
blocks.
Change-Id: I54372924826118da023f402c802ac6ce0caa70c3
First in a series of commits moving the framebuffers pointers to
per-plane data, so that they can be indexed numerically rather than
by name.
Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
For 1080 material, this buffer is currently 2,270,928 bytes. This patch swaps
ptrs instead of copying and uses the last show_frame flag instead of setting
the entire buffer to zero. For the test clip used, the decoder improved by up
to 1%.
Change-Id: I686825712ad56043e09ada9808dc489f875a6ce0
Remove the unnecessary _s_ from their names, and add a new
vp9_recon_sb() that calls the y and uv variants.
Change-Id: I7ffaa5ff5605a8472cac2a53de8cf889353039a6
Similar to the prior change that removed the rounding from non-SPLITMV
modes. Improves quality by a similar amount (Additional +0.087% on derf)
Change-Id: I39d80b4a3037a3aa7e285eb2320346ddaf646f52
Further simplification of mvref search to return
only the top two candidates. Distance weights removed
as the test order reflects distance anyway.
Change-Id: I0518cab7280258fec2058670add4f853fab7b855
As we are no longer able to sort the candidate
mvrefs in both encoder and decode and given
that the cost of explicit signalling has proved
prohibitive, it no longer makes sense to find more
than 2 candidates.
This patch:
Modifies and simplifies add_candidate_mv()
Removes the forced addition of a 0 vector in the
MAX_MV_REF_CANDIDATES-1 position (in preparation
to reducing MAX_MV_REF_CANDIDATES to 2).
Re-orders the addition of candidates slightly.
This actually gives small gains (circa 0.2% on std-hd)
A subsequent patch will remove NEW_MVREF experiment,
reduce MAX_MV_REF_CANDIDATES to 2 and remove distance
weights as these are implicit now in the order.
Change-Id: I3dbe1a6f8a1a18b3c108257069c22a1141a207a4
Consider the previous behavior for the MV 1 3/8 (11/8 pel). In the
existing code, the fractional part of the MV is considered separately,
and rounded is applied, giving a result of 6/8. Rounding is not required
in this case, as we're increasing the precision from a q3 to a q4, and
the correct value 11/16 can be represented exactly.
Slight gain observed (+.033 average on derf)
Change-Id: I320e160e8b12f1dd66aa0ce7966b5088870fe9f8
This commit converts the luma versions of vp9_build_inter_predictors_sb
to use a common function. Update the convolution functions to support
block sizes larger than 16x16, and add a foreach_predicted_block walker.
Next step will be to calculate the UV motion vector and implement SBUV,
then fold in vp9_build_inter16x16_predictors_mb and SPLITMV.
At the 16x16, 32x32, and 64x64 levels implemented in this commit, each
plane is predicted with only a single call to vp9_build_inter_predictor.
This is not yet called for SPLITMV. If the notion of SPLITMV/I8X8/I4X4
goes away, then the prediction block walker can go away, since we'll
always predict the whole bsize in a single step. Implemented using a
block walker at this stage for SPLITMV, as a 4x4 "prediction block size"
within the BLOCK_SIZE_MB16X16 macroblock. It would also support other
rectangular sizes too, if the blocks smaller than 16x16 remain
implemented as a SPLITMV-like thing. Just using 4x4 for now.
There's also a potential to combine with the foreach_transformed_block
walker if the logic for calculating the size of the subsampled
transform is made more straightforward, perhaps as a consequence of
supporing smaller macroblocks than 16x16. Will watch what happens there.
Change-Id: Iddd9973398542216601b630c628b9b7fdee33fe2
Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction.
This makes the macroblock buffer handling consistent with those of
superblock. Remove predictor buffer MACROBLOCKD.
Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
Updates the common convoloution code to support blocks larger than
16x16, and rectangular blocks. This uncovered a bug in the SSSE3
filtering routines due to the order of application of saturation.
This commit fixes that bug, adjusts the unit test to bias its
random values towards the extremes, and adds a test to ensure that
all filters conform to the expected pairwise addition structure.
Change-Id: I81f69668b1de0de5a8ed43f0643845641525c8f0
Adds RD integration for 32x16, 16x32, 64x32 and 32x64 rectangular blocks.
Derf almost +0.6%, HD a little over +1.0%, STDHD +1.3%.
Change-Id: Id651fdb6a655fdbb5c47009757e63317acfb88a5
Enable recursive partition information coding from SB64X64 down to
MB16X16. The bit-stream syntax is now supporting rectangular block
sizes. It starts from SB64X64 and recursively describes the partition
type of the current block. If the partition type is PARTITION_NONE,
the block is coded as a single unit; if it is PARTITION_HORZ or
PARTITION_VERT, the block is segmented into two independently coded
rectangular units, with no further partition needed; otherwise, the
block is segmented into 4 square blocks. i.e., PARTITION_SPLIT case,
each can be potentially further partitioned.
Forward adaptive probability modeling is used for the partition
information coding, conditioned on the current block size.
Change-Id: I499365fb547839d555498e3bcc0387d8a3587d87
This flag was added to VP8 to allow a mode where MB-level skipping
was not allowed, saving a bit per mb. It was never used in practice,
and hasn't been tested in VP9, so remove it.
Change-Id: Id450ec6904c6d06c1919508e7efc52d05cde5631
In decoder, the scaling calculation, such as (mv * x_num / x_den),
is fairly time-consuming. In this patch, we check if the scaling
happens or not at frame level, and then decide which function to
call to skip scaling calculation when no scaling is needed. Tests
showed a 3% decoder performance gain.
Change-Id: I270901dd0331048e50368cfd51ce273dd82b8733
This is work-in-progress, it implements multiple ARF
encoding behind an experimental flag.
It adds the ability to insert multiple ARF frames into a
single ARF group. This patch implements the reordering
of the coded frames, and implements a fixed-length coding
pattern. It applies a fixed quantizer strategy based on
where the frame is in the coding sequence.
Further work to modify the rate control strategy is
ongoing and will be submitted via a set of future patches.
In this first step, each ARF group is recursively
bisected and an ARF frame added at that position in the
sequence. The recursion continues until ARF frames are
within MIN_GF_INTERVAL frames.
The code sits behind the "multiple-arf" experimental
flag ("CONFIG_MULTIPLE_ARF"). The experimental flag
"oneshotq" ("CONFIG_ONESHOTQ") also needs to be enabled
for this patch to work correctly.
Change-Id: Ie473b05ebb43ac473c0cfb659b2b8042823085e2
Combine superblock inter predictors into a unified function that
allows configurable block width and height. The inter predictions
of block sizes smaller than 16x16 are handled differently. To be
continued on merging them later.
Change-Id: I14075959dd5e221f00c205c99ca35c1c31ef728e
To match the order of directional intra prediction modes for larger
blocks, also renamed the i4x4 prediction modes to mirror the larger
variants.
Change-Id: I77cea4d0add6c7758460bf9c7a2fe59aca601f0b
The intra predictor supports configurable block sizes. It can handle
intra prediction down to 4x4 sizes, when enabled in BLOCK_SIZE_TYPE.
Change-Id: I7399ec2512393aa98aadda9813ca0c83e19af854
This patch changes the default with the modecoefprob expt
to use mode-based forward updates with one-node pegged
modeling.
The maximum difference with fully trained tables is now
less that 0.1%.
Change-Id: I06b44322e10c6703f93f3c1d48d973b1136a0618
This patch will use the dest buffer instead of the
predictor buffer. This will allow us in future commits
to remove the extra mem copy that occurs in the dequant
functions when eob == 0. We should also be able to remove
extra params that are passed into the dequant functions.
Change-Id: I7241bc1ab797a430418b1f3a95b5476db7455f6a
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*,
vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the
generic _sb* functions that take a size argument, and call them with
BLOCK_SIZE_MB16X16.
Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
Merge various super_block_yrd and super_block_uvrd versions into one
common function that works for all sizes. Make transform size selection
size-agnostic also. This fixes a slight bug in the intra UV superblock
code where it used the wrong transform size for txsz > 8x8, and stores
the txsz selection for superblocks properly (instead of forgetting it).
Lastly, it removes the trellis search that was done for 16x16 intra
predictors, since trellis is relatively expensive and should thus only
be done after RD mode selection.
Gives basically identical results on derf (+0.009%).
Change-Id: If4485c6f0a0fe4038b3172f7a238477c35a6f8d3
The strategy to run fast loop filter picking for encoder speed-up
should be revisited at a later stage.
Change-Id: I3b75e06d767cff41be952a42e63b3292f4eab996
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.
Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
Restructure the code to avoid the majority of per-block-size
switches, code duplication, etc. All block types (mb/sb32/sb64)
can be handled by the same code.
Change-Id: I4022718d66e31a15a7074e43f3b98cd0a5124ea7
Clamp only the motion vectors inferred from neighboring reference
macroblocks. The motion vectors obtained through motion search in
NEWMV mode are constrained during the search process, which allows
a relatively larger referencing region than the inferred mvs.
Hence further clamping the best mv provided by the motion search may
affect the efficacy of NEWMV mode.
Synchronized the decoding process. The decoded mvs in NEWMV modes
should be guaranteed to fit in the effective range. Put a mv range
clamping function there for security purpose.
This improves the coding performance of high motion sequences, e.g.,
derf set:
foreman 0.233%
husky 0.175%
icd 0.135%
mother_daughter 0.337%
pamphlet 0.561%
stdhd set:
blue_sky 0.408%
city 0.455%
also saw sunflower goes down by -0.469%.
Change-Id: I3fcbba669e56dab779857a8126a91b926e899cb5
Start grouping data per-plane, as part of refactoring to support
additional planes, and chroma planes with other-than 4:2:0
subsampling.
Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
Took vp9_setup_scale_factors_for_frame() out from
vp9_setup_interp_filters(), so that it is only called once per
frame instead of per macroblock. Decoder tests showed a 1.5%
performance gain.
Change-Id: I770cb09eb2140ab85132f82aed388ac0bdd3a0aa
General code cleanup in loopfilter code. Modification of setup_frame_size,
so now VP9_COMMON is modified in one place after all width/height checks
passed.
Change-Id: Iedf32df43a912d7aae788ed276ac6c429973f6fe
Rename the file and clean up includes. In the future we would like to
pattern match the files which need additional compiler flags.
Change-Id: I2c76256467f392a78dd4ccc71e6e0a580e158e56