The difference from the old code is that originally the whole token_cache
was initialized with zeros at the beginning of the decode_coefs() function.
Now we set several zero values explicitly with "token_cache[scan[c]] = 0".
Change-Id: I88cc5031f01d13012d1a4491739c36cb44f9401e
Removing goto and using while loop instead, renaming seg_eob to max_eob,
moving eob token counter increment.
Change-Id: Idcc4b3a45e4f313596a71776aef56691a6647e5f
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct
and VP9Decompressor struct sizes by 24K.
Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
Renaming treed_read() to consistent vp9_read_tree() and moving it from
deleted vp9_treereader.h to vp9_dboolhuff.h file.
Change-Id: Iedd8655acbe25e4fcf62b79e5a13bdea69b6b004
The decoder will construct inter predictor using lazy border extension,
while the encoder, going with multiple runs of motion search in the rate-
distortion optimization loop for each block, does border extension at
frame level. This commit separates the inter predictors for the encoder
and the decoder.
Change-Id: Ieca2fecba3a7201a6d64ef9f219e5d91e50559c3
This commit takes out vp9_extend_frame_borders from
vp9_setup_scale_factors.
The refactoring is for the preparation of the use of lazy border
extension at decoder. This makes it necessary to handle border
extension separately at encoder/decoder. The use of
vp9_extend_frame_borders will be removed, when lazy border extension
is ready.
Change-Id: Ia3baba3d179d5f11eee1634f19b3b319d2a59186
Since they are used in the encoder only. This commit also reorders the
includes for the files that include vp9_extend.h.
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
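
A minimal sketch of the array-based band lookup described above. The
table values and the names coefband_4x4, band_for_scan_pos and
band_for_scan_pos_branchy are illustrative assumptions, not copied from
the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative band map for a 4x4 transform: entry i is the band of the
 * coefficient at scan position i. Values are assumed for this sketch. */
static const uint8_t coefband_4x4[16] = {
  0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5
};

/* Array version: one table read replaces a chain of comparisons. */
static int band_for_scan_pos(int c) {
  assert(c >= 0 && c < 16);
  return coefband_4x4[c];
}

/* Conditional version, kept only to compare against the table. */
static int band_for_scan_pos_branchy(int c) {
  if (c == 0) return 0;
  if (c <= 2) return 1;
  if (c <= 5) return 2;
  if (c <= 9) return 3;
  if (c <= 12) return 4;
  return 5;
}
```

Both functions return the same band for every scan position; the table
form simply moves the decision out of the inner loop.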
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
Make the macroblockd_plane contain dynamic buffer pointers instead of
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
This commit fixes the assignment of the mode_info pointer per tile. It
recognizes tiles in both row and column formats and properly
arranges the use of mode_info.
The bug was first introduced in
I6226456dd11f275fa991e4a7a930549da6675915
https://gerrit.chromium.org/gerrit/#/c/67492/
Change-Id: Ie12cd209f53241513728c461ee3d7b9599ddb860
The new expression is much more logical than previous one. Surprisingly
both expressions give exactly the same set of dependent values
-- have_top, have_left, have_right -- in vp9_predict_intra_block.
Change-Id: I63eb1b592b8c37883b3a0dbb1f3daa271e446109
Now tile decoding consists of two stages:
1. Find tile buffer start and its size, put this info into tile_buffers.
2. Decode each tile based on information from tile_buffers.
It seems that stage 1 can also be reused by multithreaded tile decoder.
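
A rough sketch of stage 1, assuming every tile except the last is
preceded by a 4-byte big-endian size and the last tile takes the
remaining bytes. The TileBuffer layout and the names read_be32 and
find_tile_buffers are hypothetical and only illustrate the split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one tile's slice of the frame data. */
typedef struct {
  const uint8_t *data;
  size_t size;
} TileBuffer;

/* Reads a 32-bit big-endian value. */
static uint32_t read_be32(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Stage 1: walk the data once and record where each tile starts and how
 * long it is. Returns 0 on success, -1 if the data is truncated. */
static int find_tile_buffers(const uint8_t *data, size_t len, int num_tiles,
                             TileBuffer *tile_buffers) {
  const uint8_t *const end = data + len;
  int i;
  assert(num_tiles > 0);
  for (i = 0; i < num_tiles; ++i) {
    size_t size;
    if (i < num_tiles - 1) {
      /* Assumed layout: size prefix on all tiles but the last. */
      if ((size_t)(end - data) < 4) return -1;
      size = read_be32(data);
      data += 4;
      if (size > (size_t)(end - data)) return -1;
    } else {
      size = (size_t)(end - data);
    }
    tile_buffers[i].data = data;
    tile_buffers[i].size = size;
    data += size;
  }
  return 0;
}
```

Stage 2 would then loop over tile_buffers and decode each entry, and
because the entries are independent they could be handed to separate
workers.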
Change-Id: If0cdaefdd6d10bb41c63561346c9ae4cfac081dd
It is more logical to use the dqcoeff buffer to hold *dequantized*
transform coefficients (inside the inverse_transform_block and
decode_coefs functions). Dequantization happens inside the
WRITE_COEF_CONTINUE macro.
The qcoeff buffer should only be used in the encoder for *quantized*
transform coefficients.
Change-Id: Ifd54bef272bbf5311ced6669c4f1079f998af5d7
Removing special case handling from vp9_tree_probs_from_distribution(),
tree_merge_probs(), and vp9_tokens_from_tree_offset() functions. Replacing
inter_mode_offset() function with macro INTER_OFFSET which is used now for
vp9_inter_mode_tree definition.
Change-Id: Iff75a1499d460beb949ece543389c8754deaf178
Removes stack-allocation of token_cache in the decode_coefs function.
Seems to achieve about 1% decode speed improvement as tested on
25 480p videos.
Change-Id: I8e7eb3361fa09d9654dfad0677a6d606701fdc6e
We only update partition_probs for inter frames but they are constant
for key frames. It is not necessary to have constants inside frame
context and copy them every time. This change reduces FRAME_CONTEXT size
by at least 48 bytes.
Change-Id: If70a53be51043f37fe7d113853217937710932a7
1. Reduced the size of the memset based on eob for the 32x32 transform.
The reset of non-zero coefficients should probably go where they are read
in the inverse transform functions. (TODO)
2. Removed a redundant level of indirection.
vp9_iht4x4_add() checks the transform type and calls vp9_iht4x4_16_add()
for transforms other than DCT_DCT. In this case, the DCT_DCT case
has already been handled here.
Change-Id: Iacbc77da761f0b308df5acea0f20c9add9f33d20
The change doesn't affect the bitstream. It changes the order of function
calls and affects how we reconstruct intra- and inter-blocks. The speed-up
is about 1-1.5%.
For intra-blocks:
  Before:
    for each transform block read tokens
    for each transform block do prediction
    for each transform block do inverse transform
  Now:
    for each transform block
      read tokens
      do prediction
      do inverse transform
For inter-blocks:
  Before:
    for each transform block read tokens
    for each transform block do inverse transform
  Now:
    for each transform block
      read tokens
      do inverse transform
Change-Id: I12a79bf1aa5a18c351b8010369bd3ff1deae1570
Both decode_modes_sb and decode_modes_b had conditions to return
immediately at the beginning. Eliminating these conditions and calling
these functions only when there is real work to do. Also unrolling the
loop for PARTITION_SPLIT.
Change-Id: I2fc41cb74ac491f045a2f04fe68d30ff4aaa555d
factorizes the code in decode_tiles(). reading the offsets backwards
wasn't doing anything to prove tile independence
Change-Id: I0395d3c77205852ebdc55efedc68291e93cef85c
"keyframe" variable in the current code actually means that previous
frame is a keyframe because cm->frame_type has not been initialized
in read_uncompressed_header.
Change-Id: I5645b0816c70abdef5dfc70113018d06276dac77
replaces use of cur_tile_mi_(row|col)_(start|end) by VP9_COMMON, making
it less stateful and more reusable for parallel tile decoding
Change-Id: I1df09382b4567a0e5f4434825d47c79afe2399be
update_partition_context / partition_plane_context: this will allow for
separate storage to be used in tile decoding
Change-Id: Ie0bc393531ab7e9d2ce35c95111849b294aad4ed
Splitting setup_inter_inter function into is_compound_prediction_allowed
and setup_compound_prediction. Moving setup_compound_prediction call
into read_comp_pred from read_uncompressed_header.
We should do the same in the encoder as well.
Change-Id: I40d75fdc4a221b2f7705df00d23a4b3fe79987c3
Assign the pointer to mode_info stream per tile. Remove the use of
tile_col in the decoding modules.
Change-Id: I7df87086708a3d92c5e20e86bcfb04e458ff47a6
This move is done to have all compressed header reading functions in one
place. Moved functions:
read_switchable_interp_probs
read_inter_mode_probs
read_comp_pred_mode
read_comp_pred
update_mv
read_mv_probs
Change-Id: I2aebb57d2826d03d11bf2f8fbbfc3a9978c4f9fb
The ref's scale_factors are set at frame level, and then copied for
each partition block. Since the struct members are mostly constant,
this patch separated the constant and non-constant members, and
reduced struct copying. This gave 0.5% ~ 1.4% decoder speed gain.
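
A minimal sketch of the split, assuming the frame-level values can be
shared through a const pointer while only the per-block offsets are
rewritten. The struct layouts, field names and set_block_scale are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame-level data: set once per reference frame. */
struct scale_factors_common {
  int x_scale_fp;
  int y_scale_fp;
  int x_step_q4;
  int y_step_q4;
};

/* Hypothetical per-block data: only the offsets change between blocks,
 * so the constant part is referenced through a pointer, not copied. */
struct scale_factors {
  int x_offset_fp;
  int y_offset_fp;
  const struct scale_factors_common *sfc;
};

/* Per-block setup writes two ints and one pointer instead of copying
 * the whole frame-level struct. */
static void set_block_scale(struct scale_factors *sf,
                            const struct scale_factors_common *sfc,
                            int x_offset_fp, int y_offset_fp) {
  assert(sf != NULL && sfc != NULL);
  sf->x_offset_fp = x_offset_fp;
  sf->y_offset_fp = y_offset_fp;
  sf->sfc = sfc;
}
```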
Change-Id: I94043bf5a6995c8042da52e5c661818dfa6f6d4c
This commit uses left_available flag to decide if the left mode_info
struct is available for left_block_mode. As discussed with James
Zern (jzern@), this prevents the codec from fetching mode_info from
blocks in the left tile, which although effectively not used might
present concerns for multi-threaded tile decoding.
This is NOT a bit-stream change.
Change-Id: I1dc8cf1bcbf056688eee27c7bc5706ac4b4e0125
We used set_partition_seg_context() only before calls to:
1. update_partition_context()
2. partition_plane_context()
Moving these functions from vp9_blockd.h to vp9_onyxc_int.h and
inlining set_partition_seg_context into them. After that it is not
necessary to have {above, left}_seg_context fields in the MACROBLOCKD
structure, so removing them as well.
Change-Id: I4723f59e1c8f3788432b7f51185d8d747b3a97f9
missed one in vp9_detokenize.c in the last
+ add some asserts in vp9_decode_frame() to catch regressions
Change-Id: Ide67505114ee17efdafb13694aed0c09039e5a16
replace VP9D_COMP usage with the (slightly) more targeted
VP9_COMMON/MACROBLOCKD/struct segmentation structures.
Change-Id: Iabb3616e231417b0e17b7e4b384ea63167a81745
Renames for consistency with other constants:
NUM_FRAME_TYPES -> FRAME_TYPES
NUM_PARTITION_CONTEXTS -> PARTITION_CONTEXTS
Change-Id: I3db30acb2868eb0a424237c831087b2e264ec47f
in most cases at least the left column was a harmless race as it was
left unused later in the code.
Change-Id: I43211df66fb157c6feecf08c681add4fcf18b644
That makes decoder and encoder (only bitstream writing part) a little bit
simpler and faster. Moving get_sb_index() function to the encoder.
Change-Id: Ie91aaeefd69c84b085948267b33556a7666c6278
cherry-picked from:
commit 988b70844e03efcfcc075a9bc25d846670494f36
Author: Pascal Massimino <pascal.massimino@gmail.com>
Date: Fri Aug 2 11:15:16 2013 -0700
add WebPWorkerExecute() for convenient bypass
This is mainly for re-using the worker structs without using the
thread.
Change-Id: I8e1be29e53874ef425b15c192fb68036b4c0a359
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob c0d318aee628fdf9ba4876451a28aa978f1066b8 src/utils/thread.c
100644 blob c2b92c9fe353f8e514f78922f3d237204a9cbc66 src/utils/thread.h
Change-Id: I13fe92b1e94062bb99fdeeb7cb0b4b0575d27793
* changes:
Use a separate MODE_INFO stream for each tile column
Get rid of "this_mi", use "mi_8x8[0]" everywhere instead
Make the static_segmentation feature work again
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code
Change-Id: I979251782f90885fe962305bcc845bc05907f80c
Moving code that gets band_translate array from get_scan_and_band()
function to get_band_translate() function. Renaming get_scan_and_band() to
get_scan().
Change-Id: I43047c205a1ca2a6e24be44db39dc04b7a385008
Updated the encoder to handle frames that are coded
intra-only. Intra-only frames must be non-showable,
that is, the "show frame" flag must be set to 0 in
the frame header.
Tested by forcing the ARF frames to be coded intra-
only.
Note: The rate control code will need to be modified
to account for intra-only frames better than they
are currently handled.
Change-Id: I6a9dd5337deddcecc599d3a44a7431909ed21079
For bad input data, the decoder may access the array out of bounds. This
commit adds a clamp to prevent such out-of-bounds access.
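
A minimal sketch of the idea, assuming an index derived from untrusted
input is clamped to the valid range before the array read. The table,
TABLE_SIZE and lookup_clamped are hypothetical.

```c
#include <assert.h>

/* Limits value to the inclusive range [low, high]. */
static int clamp(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Hypothetical lookup table indexed by a value read from the stream. */
#define TABLE_SIZE 8
static const int table[TABLE_SIZE] = {10, 11, 12, 13, 14, 15, 16, 17};

/* A corrupt stream can produce any index; clamping keeps the read
 * inside the array no matter what was decoded. */
static int lookup_clamped(int index_from_bitstream) {
  const int i = clamp(index_from_bitstream, 0, TABLE_SIZE - 1);
  assert(i >= 0 && i < TABLE_SIZE);
  return table[i];
}
```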
Change-Id: I0a1cfd9b8786ea7113a998053c76605c963b077a
Values of MODE_UPDATE_PROB and VP9_COEF_UPDATE_PROB are equal, so replacing
them with one constant. Inlining appropriate arguments for functions:
vp9_cond_prob_diff_update (encoder)
vp9_diff_update_prob (decoder)
Change-Id: I1255a1cb477743b799b3bfbbcd8de6b32b067338
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
Moving INTERPOLATIONFILTERTYPE enum and subpix_fn_table struct to
vp9_filter.h. Adding convenient typedef for subpel kernels.
Function vp9_setup_interp_filters() besides setting xd->subpix.filter_x &
xd->subpix.filter_y has a side effect of also setting scale factors. This
is not required inside decode_modes_b() because scale factors have been
already set by set_ref() calls. That's why replacing
vp9_setup_interp_filters() call with newly created vp9_get_filter_kernel()
call. The behavior of vp9_setup_interp_filters() is unchanged (it
is used from the encoder).
Change-Id: I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.
Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
The codec should effectively run with motion vectors in the range
(-2048, 2047) in full pixels, for sequences of 1080p and below. Add
assertions to clarify this behavior.
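
A sketch of such an assertion, assuming the bounds are inclusive and the
vector is already expressed in full pixels. The names MV_FULLPEL_LOW,
MV_FULLPEL_UPP, FullPelMv and check_fullpel_mv are hypothetical.

```c
#include <assert.h>

/* Assumed inclusive full-pixel limits taken from the range above. */
#define MV_FULLPEL_LOW (-2048)
#define MV_FULLPEL_UPP 2047

/* Hypothetical motion vector in full-pixel units. */
typedef struct {
  int row;
  int col;
} FullPelMv;

/* Returns 1 when both components lie inside the assumed range. */
static int is_fullpel_mv_in_range(const FullPelMv *mv) {
  return mv->row >= MV_FULLPEL_LOW && mv->row <= MV_FULLPEL_UPP &&
         mv->col >= MV_FULLPEL_LOW && mv->col <= MV_FULLPEL_UPP;
}

/* Documents the expectation at the call site; compiled out with NDEBUG. */
static void check_fullpel_mv(const FullPelMv *mv) {
  assert(is_fullpel_mv_in_range(mv));
}
```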
Change-Id: Ia0cac28249f587d8f8882205228fa480263ab313
Moving out decode_tokens function calls and adding decode_blocks boolean
variable. We only have to decode if eobtotal > 0, i.e. we have at least one
non-zero coefficient. Also inlining and remove vp9_set_pred_flag_mbskip
function.
Change-Id: I7be38b12ee8206faf0beea2bbf4d52be42575b03
We don't need these functions anymore. The only one which was actually
used is vp9_add_constant_residual_32x32. Addition of
vp9_short_idct32x32_1_add eliminates this single usage. SSE2 optimized
version of vp9_short_idct32x32_1_add will be added in the next patch set,
right now it is only C implementation. Now we have all idct functions
implemented in a consistent manner.
Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3
apparently we are going to have trouble completely removing the lint
issues in this file. It needs a bit more work. We need to include
vpx_config.h to know whether multi-threading is needed, and that means
vpx_config.h has to come before the system headers (a violation).
Change-Id: I023feeab1bf5643b79dccc3b80a4a9ad42689e7b
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
The probability model used to code prediction mode is conditioned
on the immediate above and left 8x8 blocks' prediction modes. When
the above/left block is coded in sub8x8 mode, we use the prediction
mode of the bottom-right sub8x8 block as the reference to generate
the context.
This commit moves the update of mbmi.mode out of the sub8x8 decoding
loop, hence removing redundant update steps and keeping the bottom-
right block's mode for the decoding process of next blocks.
Change-Id: I1e8d749684d201c1a1151697621efa5d569218b6
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now consists of pointers to MODE_INFO structs. The
MODE_INFO structs are now stored as a stream (decoder only),
which eliminates unnecessary copies and is a little more cache
friendly.
Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now consists of a pointer to a MODE_INFO struct and
an "in the image" flag. The MODE_INFO structs are now stored
as a stream, which eliminates unnecessary copies and is a little
more cache friendly.
For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).
Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
and 5.9% (720p).
Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
The segment feature SEG_LVL_SKIP requires the prediction unit size
to be at least BLOCK_8X8. This commit makes the requirement
explicit, to prevent future encoder implementations from
making wrong choices.
Change-Id: I0127f0bd4c66e130b81f0cb0a8d3dbfe3b2da5c2
Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.
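
A small sketch of the difference between the two styles, assuming a
block-size enum ordered from 4x4 to 64x64. The enum, the table contents
and both helper functions are written for illustration and are not
copied from the tree.

```c
#include <assert.h>

/* Assumed block-size ordering for this sketch. */
enum {
  BLOCK_4X4, BLOCK_4X8, BLOCK_8X4, BLOCK_8X8, BLOCK_8X16, BLOCK_16X8,
  BLOCK_16X16, BLOCK_16X32, BLOCK_32X16, BLOCK_32X32, BLOCK_32X64,
  BLOCK_64X32, BLOCK_64X64, BLOCK_SIZES
};

/* Old style: log2 of the width in 8x8 units, shifted at every use. */
static const int mi_width_log2_lookup[BLOCK_SIZES] = {
  0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3
};

/* New style: the width in 8x8 units is stored directly. */
static const int num_8x8_blocks_wide_lookup[BLOCK_SIZES] = {
  1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 4, 8, 8
};

static int width_from_log2(int bsize) {
  assert(bsize >= 0 && bsize < BLOCK_SIZES);
  return 1 << mi_width_log2_lookup[bsize];
}

static int width_from_lookup(int bsize) {
  assert(bsize >= 0 && bsize < BLOCK_SIZES);
  return num_8x8_blocks_wide_lookup[bsize];
}
```

The two helpers agree for every block size; the direct table drops the
shift and reads more plainly at call sites.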
Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
vp9_setup_interp_filters is called before each inter block is decoded, so
it is not necessary to call it just before the whole frame decoding.
Change-Id: Id1b0ee62f987474e27eafba0013a4896b492c400
We could avoid calling clamp_mv2 because it has been already called
inside vp9_find_best_ref_mvs function.
Change-Id: I08edeaf3e11e98c19e67b9711b2523ca5fb1416e
Fix of https://code.google.com/p/webm/issues/detail?id=608. We could have
used invalid display size equal to the previous frame size (not to the
current frame size).
Change-Id: I91b576be5032e47084214052a1990dc51213e2f0
Making code more compact, adding consts, removing redundant arguments,
adding do/while(0) for macros.
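
A short illustration of why multi-statement macros are wrapped in
do/while(0). SWAP_INT and sort_pair are hypothetical and exist only for
this sketch.

```c
#include <assert.h>

/* Hypothetical multi-statement macro. The do/while(0) wrapper makes the
 * expansion a single statement that still requires a trailing ';'. */
#define SWAP_INT(a, b)    \
  do {                    \
    const int tmp_ = (a); \
    (a) = (b);            \
    (b) = tmp_;           \
  } while (0)

/* With a bare { ... } body, "SWAP_INT(...);" followed by "else" would
 * not compile, because the ';' would end the if statement early. */
static void sort_pair(int *lo, int *hi) {
  if (*lo > *hi)
    SWAP_INT(*lo, *hi);
  else
    assert(*lo <= *hi);
}
```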
Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
It is possible to have invalid scale factors and not access them
during decoding. Error is reported if we really try to use invalid scale
factors.
Change-Id: Ie532d3ea7325ee0c7a6ada08269f804350c80fdf
Comment is wrong, we don't initialize any xd pointers. We only initialize
xd->planes[i]->dst and xd->planes[i]->pre[], which are actually initialized
for every block during the decoding.
Change-Id: If152ea872ebef1f83ca70712fa6f8df1b6855f56
the final macroblock rows are scheduled in the main thread. prior to
this change one additional macroblock row would be scheduled in the
worker forcing the main thread to wait before finishing.
Change-Id: I05f3168e5c629b898fcebb0d77eb6d6a90d6105e
Adding the set_contexts function and calling it instead of
set_contexts_on_border. Calling txfrm_block_to_raster_xy to get aoff and
loff.
Change-Id: I41897e344afd2cae1f923f4fdbe63daccf6fe80e
Updating all foreach_transformed_block_visitor functions to work with
plane block size instead of general block. Removing a lot of duplicated
code.
Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.
Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
When the frame size changes the last frame segment map must
be resized to match and initialized to 0.
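
A minimal sketch of that rule, assuming the map holds one byte per 8x8
unit. SegMapState and resize_seg_map are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for the previous frame's segment map. */
typedef struct {
  int mi_rows;
  int mi_cols;
  uint8_t *last_frame_seg_map;
} SegMapState;

/* Replaces the map with a zero-filled one of the new size.
 * Returns 0 on success, -1 if allocation fails (old map is kept). */
static int resize_seg_map(SegMapState *s, int mi_rows, int mi_cols) {
  uint8_t *map;
  assert(mi_rows > 0 && mi_cols > 0);
  map = (uint8_t *)calloc((size_t)mi_rows * (size_t)mi_cols, 1);
  if (map == NULL) return -1;
  free(s->last_frame_seg_map);
  s->last_frame_seg_map = map;
  s->mi_rows = mi_rows;
  s->mi_cols = mi_cols;
  return 0;
}
```

Allocating a fresh zeroed buffer, rather than growing the old one,
avoids carrying stale segment ids from a frame with different
dimensions.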
Change-Id: Idc10de109f55dbe9af3a6caae355a2974712243d
VP9_COMMON is the right place for the segmentation struct because it has
global segmentation parameters, not something specific to macroblock
processing.
Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
Making foreach_transformed_block_in_plane more clear (it's not finished
yet). Using explicit tx_size variable consistently instead of
(ss_txfrm_size / 2) or (ss_txfrm_size >> 1) expression.
Change-Id: I1b9bba2c0a9f817fca72c88324bbe6004766fb7d
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.
This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.
TODO clean up the nomenclature more widely in respect of
mbmi and bmi.
Change-Id: Ia7305c6d0cb805dfe8cdc98dad21338f502e49c6
Loop filter configuration doesn't belong to macroblock, so moving it from
MACROBLOCKD to VP9_COMMON. Also moving the declaration of loopfilter struct
from vp9_blockd.h to vp9_loopfilter.h.
Change-Id: I4b3e34be9623b47cda35f9b1f9951f8c5b1d5d28
There was no benefit having this function. For example, inside
read_switchable_filter_type switchable filter context was calculated twice.
Change-Id: I79cd5bf95cbc0f6d8bf91a2e32289e01b18dcff1
Currently the only threaded option for vp9 decode. Enabled when the
decoder config thread count is > 1.
Change-Id: I082959abac9e31aa4a38ed9fd68b94680e57f4df
This changeset allows us to remove the vp9_switchable_interp and
vp9_switchable_interp_map arrays and makes the code much clearer. We
still have to use these mappings, but only inside the
read_interp_filter_type and write_interp_filter_type functions.
Change-Id: I4026c6f8c4acefba6c81421b7bacbaa52cc45f50
Removing assign_and_clamp_mv function, making implementation of clamp_mv
and clamp_mv2 more clear and consistent.
Change-Id: Iecd08e1c1bf0379f8314ebe01811f8253f4ade58
This commit optimizes the tokenization and detokenization operational
flow for speed-up. It makes the coding process about 0.3% faster at
speed 0.
Change-Id: I28008df7482874e4b5f237f2d418ff82a249dd56
This commit adds special handling for the 16x16 inverse 2D-DCT when
only the DC coefficient is quantized to a non-zero value.
Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
This commit brought back the shortcut implementation of 8x8/16x16
inverse 2D-DCT. When the eob <= 10, it skips the inverse transform
operations on row 4:7/4:15 in the first round. For bus_cif at 1000
kbps, this provides about 2% speed-up at speed 0.
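
A sketch of how the eob value could select the inverse-transform path
for an 8x8 block, using the threshold of 10 stated above. IdctPath and
select_idct8x8_path are hypothetical and stand in for the real calls.

```c
#include <assert.h>

/* Hypothetical labels for the three 8x8 inverse-transform paths. */
typedef enum {
  IDCT_DC_ONLY,         /* only the DC coefficient is non-zero */
  IDCT_FIRST_ROWS_ONLY, /* shortcut: rows 4..7 skipped in the first pass */
  IDCT_FULL             /* all 64 coefficients processed */
} IdctPath;

/* Picks the cheapest path that is still exact for the given eob. */
static IdctPath select_idct8x8_path(int eob) {
  assert(eob > 0 && eob <= 64);
  if (eob == 1) return IDCT_DC_ONLY;
  if (eob <= 10) return IDCT_FIRST_ROWS_ONLY;
  return IDCT_FULL;
}
```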
Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
This commit enables special handling for the 8x8 inverse 2D-DCT when
only the DC coefficient is quantized to a non-zero value. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
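
A sketch of why the DC-only case is cheap: the whole block receives the
same offset, so the transform reduces to two scalar multiplies and one
add per pixel. The constants assume a 14-bit fixed-point cos(pi/4), and
idct8x8_dc_only_add and clip_pixel are hypothetical names; the exact
rounding in the tree may differ.

```c
#include <assert.h>
#include <stdint.h>

#define DCT_CONST_BITS 14
#define COSPI_16_64 11585 /* round(cos(pi/4) * 2^14), assumed scaling */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Saturates a reconstructed value to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Adds the DC-only inverse transform of an 8x8 block to dest.
 * Note: right-shifting a negative int is implementation-defined in C;
 * this sketch assumes an arithmetic shift. */
static void idct8x8_dc_only_add(int16_t dc, uint8_t *dest, int stride) {
  int r, c, a1;
  /* One scaling per 1-D pass (rows, then columns). */
  int out = ROUND_POWER_OF_TWO(dc * COSPI_16_64, DCT_CONST_BITS);
  out = ROUND_POWER_OF_TWO(out * COSPI_16_64, DCT_CONST_BITS);
  /* Final normalization assumed for the 8x8 transform. */
  a1 = ROUND_POWER_OF_TWO(out, 5);
  for (r = 0; r < 8; ++r)
    for (c = 0; c < 8; ++c)
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + a1);
}
```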
Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
Now read_inter_mode_info calls read_intra_block_part (renamed from
read_intra_block_modes) or read_inter_block_part (just added).
Change-Id: I541badea6b663e0ae692ec158665efb90ed20c03
Removing unused constants, macros, and function declarations. Using
ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
#include from *.h to *.c. Merging for loops for motion vectors.
Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
Adding plane type check condition because it was always used outside of
get_tx_type_{4x4, 8x8, 16x16}.
Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230
Counts are separate from frame context. We have several frame contexts but
need only one copy of all counts.
Change-Id: I5279b0321cb450bbea7049adaa9275306a7cef7d
Renamed:
MAX_MB_SEGMENTS to MAX_SEGMENTS
MB_SEG_TREE_PROBS to SEG_TREE_PROBS
The minimum unit for segmentation in the segment map
is now 8x8 so it is misleading to use MB_ as macro-block
traditionally refers to a 16x16 region.
Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).
Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
Adding loopfilter struct with fields from MACROBLOCKD and VP9Common.
Eventually it will be moved to vp9_loopfilter.h for better code structure.
Change-Id: Iaf5fb71c33719cdfa1b991f671caf071be9ea035
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.
Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.
Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.
Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
These arrays have constant values (they are never updated). Removing the
two corresponding memcpy calls. Making a little cleanup in
vp9_entropymode.h as well: removing the redundant 'extern' keyword and
moving all function declarations to the end.
Change-Id: Ia16b38b46aec2e2500f5df29c40a297ae241dede
Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.
Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
Making the implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using the confusing sub(a, b) macro.
Passing mi_row and mi_col to functions explicitly instead of relying on
mb_to_right_edge and mb_to_bottom_edge.
Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
This is a short term optimization till we work out a decoder
implementation requiring no frame border extension.
Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
Removing unused DEC_DEBUG define and dec_debug variable. Changing function
signatures to eliminate code duplication, renaming function
mb_init_dequantizer to init_dequantizer. Also removing redundant curly
braces, and comments.
Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298
This function is actually called from set_offsets which is called right
before vp9_read_mode_info.
Change-Id: Ibb9d5ad606194bc80eab264fad85b31c9dfd8f77
this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.
Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.
Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
Uses mapping tables instead of complicated modulo/division
operations for prob mapping for forward updates.
No bit-stream or output change.
Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.
Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.
Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
Adding read_skip_coeff function. Renaming decode_mv to read_mv for
consistency with other function names. Removing redundant function
arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv
to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info,
vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function
mb_mode_mv_init inside vp9_prepare_read_mode_info.
Change-Id: Ifee05d333da4cd331d4aff40ce41ccd9b70e494a
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles
Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.
Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.
Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.
Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1