There are many places in handle_inter_mode that need to restore the
dst buffer pointers, due to buffer pointer swap and early rd search
breakout. This commit wraps these operations into an inline function
for clean-up.
Change-Id: I0462e8c41c8bc3cd8db07395489cac03d8e5be54
Commit a4a5a210 enabled lossless coding, but the commit incorrectly
disabled the usage of skip in encoder even when skip should be used.
This commit make sure that skip is enabled even in lossless mode.
Change-Id: I276954f952c6ac68f17a316ebc72f09001228a08
Overall change (using dual buffer scheme for superblocks of both inter
and intra modes) reduces speed 2 runtime:
bluesky_1080p at 6000kbps: 263553ms -> 257441ms
riverbed_1080p at 8000kbps: 233230ms -> 225308ms.
Change-Id: Idf8d70f768a4b0d97b2a8506372c57b7b4022119
This commit enables the dual buffer rate-distortion optimization
and encoding scheme. It stacks the original transform coefficients,
quantized levels, and reconstructed coefficients, in the rate-
distortion optimization search process, hence eliminates the need
to re-run residual generation, forward transform, and quantization
in the encoding stage.
Change-Id: I011bfad3a59a380a869ee552e91dae0394ec492e
Allocate memory space of dual buffer sets that store the coeff, qcoeff,
dqcoeff, and eobs. Connect the pointers of macroblock_plane and
macroblockd_plane to the actual buffer in use accordingly.
Change-Id: I2f0b5f482ca879fae39095013eaf8901db20a5a4
This to make sure that prediction residue always get coded in lossless
mode.
This commit also fixed lossless unit test
Change-Id: I537726ee55328d4e4cf0a0196393a67e12bfcde1
Removing special case handling from vp9_tree_probs_from_distribution(),
tree_merge_probs(), and vp9_tokens_from_tree_offset() functions. Replacing
inter_mode_offset() function with macro INTER_OFFSET which is used now for
vp9_inter_mode_tree definition.
Change-Id: Iff75a1499d460beb949ece543389c8754deaf178
The compound inter prediction could potentially run with initial
motion vectors of invalid value and check the mv_cost, which triggers
overheap read. This commit resolves this issue by forcing a motion
vector value check for compound inter modes of both superblock and
sub8x8 block sizes.
Change-Id: I4f4fc19ce83c8272782bc382f12c82a3f03212fc
We only update partition_probs for inter frames but they are constant
for key frames. It is not necessary to have constants inside frame
context and copy them every time. This change reduces FRAME_CONTEXT size
by at least 48 bytes.
Change-Id: If70a53be51043f37fe7d113853217937710932a7
This commit fixes the use case of plane_block_idx, which determines
the plane (Y/U/V) index based on block index. When block idx >= 4 in
sub8x8 block loop, it should be of chroma components.
Change-Id: I072705aa7b35445524ac607089ca8ce54b7ba478
This commit makes zcoeff_blk cache the case where the entire block
is quantized to be zero (without applying zero-forcing) in the rate-
distortion optimization loop, and skip the forward DCT, quantization,
inverse DCT, and reconstruction process in the encode_block stage.
It now works for all the block sizes, including sub8x8 blocks.
Change-Id: I5ae60a9c436ba3637d11666733554bec4580ef98
"<< SUBPEL_BITS" needs to be added in the calculation. Call
set_scaled_offsets() to calculate x_offset_q4 and y_offset_q4.
Change-Id: Ied130ea771510e918f51cd1dc3abe57f4c0962b5
It is enough to check just block type: intra or inter. Intra block implies
intra prediction mode, and inter block implies inter mode.
Change-Id: I3cf98731a3935f670a3cd8e2b2443483eb944be4
replaces use of cur_tile_mi_(row|col)_(start|end) by VP9_COMMON, making
it less stateful and more reusable for parallel tile decoding
Change-Id: I1df09382b4567a0e5f4434825d47c79afe2399be
Use a flag variable to determine if coded in inter mode, thus avoiding
multiple inter mode checks in super_block_yrd.
Change-Id: I0ef998b2811c38e185a2e0583f0f636cee45d2cf
The ref's scale_factors are set at frame level, and then copied for
each partition block. Since the struct members are mostly constant,
this patch separated the constant and non-constant members, and
reduced struct copying. This gave 0.5% ~ 1.4% decoder speed gain.
Change-Id: I94043bf5a6995c8042da52e5c661818dfa6f6d4c
The pointer was asigned only once with vp9_regular_quantize_b_4x4, calling
this function directly now. Also removing unused declarations:
prototype_quantize_block
prototype_quantize_block_pair
prototype_quantize_mb
vp9_regular_quantize_b_4x4_pair
vp9_regular_quantize_b_8x8
Change-Id: I14325bc2f082336820671eafbc06126651b79f73
This commit uses left_available flag to decide if the left mode_info
struct is available for left_block_mode. As discussed with James
Zern (jzern@), this prevents the codec from fetching mode_info from
blocks in the left tile, which although effectively not used might
present concerns for multi-threaded tile decoding.
This is NOT a bit-stream change.
Change-Id: I1dc8cf1bcbf056688eee27c7bc5706ac4b4e0125
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
Renames for consistency with other constants:
NUM_FRAME_TYPES -> FRAME_TYPES
NUM_PARTITION_CONTEXTS -> PARTITION_CONTEXTS
Change-Id: I3db30acb2868eb0a424237c831087b2e264ec47f
This commit makes the buffer allocation of zcoeff_blk array in
pick_mode_context block size aware. It calculates the number of
4x4 blocks in the partition and assigns the memory space accordingly.
This process (and the uninitialization) is done once for each encoding
pass. It allows memory copy of smaller buffer when possible.
For football at 600kbps, the runtimes improve by about 1%:
speed 1, 45961ms -> 45472ms
speed 2, 23863ms -> 23598ms
Change-Id: Id2ca24906fa89f46fa5fe742ec4b8efc2a61f877
Making this change in order to move allow_high_precision_mv field
from MACROBLOCKD structure to VP9_COMMON (because it is a frame level
flag).
Change-Id: I1d006ba36d938e0caf4d40fa051e2e38df9c1108
* changes:
Use a separate MODE_INFO stream for each tile column
Get rid of "this_mi", use "mi_8x8[0]" everywhere instead
Make the static_segmentation feature work again
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code
Change-Id: I979251782f90885fe962305bcc845bc05907f80c
Moving code that gets band_translate array from get_scan_and_band()
function to get_band_translate() function. Renaming get_scan_and_band() to
get_scan().
Change-Id: I43047c205a1ca2a6e24be44db39dc04b7a385008
This should be similar to what x264 does with --aq-mode 1.
It works well with clips like parkjoy and touhou
(http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv).
At low bitrates, the segmentation signaling overhead may negate the
benefits of this feature.
(PGW) Default changed to feature OFF to allow provisional merge.
Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
Updated the encoder to handle frames that are coded
intra-only. Intra-only frames must be non-showable,
that is, the "show frame" flag must be set to 0 in
the frame header.
Tested by forcing the ARF frames to be coded intra-
only.
Note: The rate control code will need to be modified
to account for intra-only frames better than they
are currently handled.
Change-Id: I6a9dd5337deddcecc599d3a44a7431909ed21079
Use the zcoeff_blk buffer of PICK_MODE_CONTEXT to store the indexes
of all-zero-coeff block of the current best mode. Remove the temporary
buffer best_zcoeff_blk defined in the rate-distortion optimization
loop. This improves the speed performance by about 0.5% in all speed
settings.
Change-Id: Ie3e15988ddfa581eafa2e19a8228d3fe4a46095c
This commit moves token_cache buffer into macroblock struct, instead
of defining as a local variable in cost_coeffs. This avoids repeatedly
re-allocating memory space in the rate-distortion optimization loop.
The runtime at speed 0 reduces:
bus 2000kbps, 161692ms to 159951ms
football 600kbps, 229505ms to 225821ms
Change-Id: If7da6b0b6d8c5138a16271a33c4548fba33d8840
Converts the constant rddiv parameter to 128 (from 100) and
implements RDCOST with bit-shift rather than multiplication.
Other parameters are also adjusted to roughly keep the same
balance between Rate and Distortion.
There is a slight speed-up of about 0.5-1% (at speed 0) as
testted on football_cif.
There is a slight change in performance due to small change
in the parameters.
derfraw300: +0.033%
stdhdraw250; +0.102%
Change-Id: I70ac69f58fa71c83108f68fe41796cd19d1fc760
The commit changes to mask available intra prediction modes for test
based on prediction block size.
With this patch, encoding time of CpuUsed 2 reduces from 10% to 20% for
HD clips with a compression drop of 0.2%
Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.
It reduces speed 0 runtime:
bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.
Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.
Change-Id: I363514c5276b5cf9a38c7251088ffc6ab7f9a4c3
Increases these parameters.
There is a small efficiency gain.
Change-Id: Ie5f0ddb39c907d335e0dafa5eb112365a81f4542
derfraw300: +0.091%
stdhdraw250: +0.238%
The intra mode distortion adjustment for skip_encode feature was
broken in the refactoring cc91851. This commit fixes it and tunes
the distortion models used therein.
Change-Id: I0d676e82f8e855536a90cf9b3e3fdefafcd886c6
Use b_mode_info to store the inter prediction mode of sub8x8 block,
in replacement of the use of partition_info. Remove redundant buffer
update for partition_info. For bus_cif at 2000 kbps, this seem to make
speed 0 about 1% faster.
Change-Id: Id1b3be45e75a24fb4b42335ac480c23e440978f6
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
This commit removes the redundant second reference frame check in
the rate-distortion optimization loop for sub8x8 blocks.
Change-Id: I13a57a6f624c4a9bcef02ff2a867fa30d8b44a93
This commit defines b_mode_info as a struct type. This will allow
us to further remove the use of PARTITION_INFO in the encoding process.
Change-Id: I975b0f7d557b5e0f66545a61b472def76b671cce
This commit separates the rate-distortion optimization loop of
superblocks from that of sub8x8 blocks. This allows better design
rate-distortion optimization search loop for each setting. It also
removes the use of SPLITMV and I4X4_PRED therein.
No performance change in speed 0 settings. For bus@CIF at 2000kbps,
the speed 1 runtime goes from 48009ms to 43894ms (about 10% faster).
The overall compression performance on derf changed by -0.021%.
Speed 2 runtime goes from 27114ms to 28700ms (6% slower), while the
overall coding efficiency goes up by 1.629% for derf, 1.236% for yt.
Change-Id: Ie6bdfa0a370148dd60bd800961077f7e97e67dd4