Removing goto and using while loop instead, renaming seg_eob to max_eob,
moving eob token counter increment.
Change-Id: Idcc4b3a45e4f313596a71776aef56691a6647e5f
E.g. disable vertical partioning for 4:2:2. Until we come up with something
better to do with the chroma block size, this prevents an assert error.
Change-Id: I9394fb3f14ec1343abc3ad4769de208e6278f285
Considering a horizontal edge, if mask_16x16 is 1 for an even-
indexed 8x8 block, then mask_16x16 is 1 for next 8x8 block in
same row. Similiar to a verticle edge, if mask_16x16 is 1 for
an even-rowed 8x8 block, then mask_16x16 is 1 for the 8x8 block
right below it in next raw. Based on that, the mask_16x16 checking
can be simplified to save cycles. The corresponding 8-pixel
vp9_mb_lpf_horizontal_edge code can also be removed.
Change-Id: Ic3fe7a5674322239208cbe2731dc3216ce2084f3
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct
and VP9Decompressor struct sizes by 24K.
Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
Renaming treed_read() to consistent vp9_read_tree() and moving it from
deleted vp9_treereader.h to vp9_dboolhuff.h file.
Change-Id: Iedd8655acbe25e4fcf62b79e5a13bdea69b6b004
vp9_idct32x32_34_add_sse2:
speedup: 1.472
IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized
based on the fact that Only upper-left 8x8 has
non-zero values.
vp9_idct32x32_1024_add_sse2:
speedup: 1.032
Tested with: park_joy_420_720p50.y4m
Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
The idea here is to allow "in frame" adjustment of the final Q
value used to encode each SB64, using segmentation.
There is also adjustment of the rd mult in regions of overspend.
Activated using aq_mode=2
Change-Id: I2f140cd898c9f877c32cd6d2e667f5e11ada4b1c
The decoder will construct inter predictor using lazy border extension,
while the encoder, going with multiple runs of motion search in the rate-
distortion optimization loop for each block, does border extension at
frame level. This commit makes separate the inter predictors for encoder
and decoder, respectively.
Change-Id: Ieca2fecba3a7201a6d64ef9f219e5d91e50559c3
When calling check_initial_width through vp9_set_size_literal
the function was defaulting to using non-subsampled chroma.
This patch changes the default to assume sampled chroma as
an interim solution until complete support for other
color formats is added.
Change-Id: Id8e7e919b350e3473dfdf7551af6fd0716478b04
This commit takes out vp9_extend_frame_borders from
vp9_setup_scale_factors.
The refactoring is for the preparation of the use of lazy border
extension at decoder. This makes it necessary to handle border
extension separately at encoder/decoder. The use of
vp9_extend_frame_borders will be removed, when lazy border extension
is ready.
Change-Id: Ia3baba3d179d5f11eee1634f19b3b319d2a59186
The decoder ignored the display width & height
specified in the frame header.
This patch adds a control, VP9D_GET_DISPLAY_SIZE, to
allow the application to obtain the display width and
height from the frame header.
vpxdec has been modified to scale the output frame to
this size.
Should the request for the display size fail vpxdec will
use the native width and height of the raw decoded
frame instead.
Change-Id: I25db04407426dac730263720c75a7dd6400af68a
This patch followed "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit, and added x86 SSE2 optimization
to do 16-pixel filtering in parallel. For other optimizations
(neon and dspr2), current 16-pixel functions were done by calling
8-pixel functions twice, and real 16-pixel functions could be added
later.
Decoder speedup:
tulip clip: 2% speed gain;
old_town_cross: 1.2% speed gain;
bus: 2% speed gain.
Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.
Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
Using for loop based on max_tx_size instead of separate checks. Combining
build_coeff_contexts() with update_coef_probs().
Change-Id: Ie335a7db29830677fbc14478a9c190d3c1068665
Modifications are done to reduce the total clock cycle.
Speedup: 1.2
Tested with: park_joy_420_720p50.y4m
Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
Both functions have no relation to motion vectors, so moving them from
vp9_findnearmv.h to vp9_blockd.h.
Change-Id: I74f524267886ab0fff4a2da793a10c906ed0f43a
Added filter_selectively_vert_row2 to be ready for parallel
loopfiltering in vertical direction. This change did 2-row
filtering at a time. If 2 vertically adjacent 8x8 blocks do same
type of filtering, we can do 16-pixel filtering in parallel.
Next, we need to provide 16-pixel loopfiltering functions in c
and optimized versions for codec speedup.
Change-Id: Idf97bbdd70566e55bd30e1fd25cb8544e33291be
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
This function is called from vp9_setup_past_independence() which is called
before the modified piece of code. Moving reset of inter_mode_probs into
vp9_init_mbmode_probs() for consistency.
Change-Id: Ib188e8798e1fbe15407fd501406761b746fdda95
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.
Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca