vp9_idct32x32_34_add_sse2:
speedup: 1.472
IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized
based on the fact that Only upper-left 8x8 has
non-zero values.
vp9_idct32x32_1024_add_sse2:
speedup: 1.032
Tested with: park_joy_420_720p50.y4m
Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
The idea here is to allow "in frame" adjustment of the final Q
value used to encode each SB64, using segmentation.
There is also adjustment of the rd mult in regions of overspend.
Activated using aq_mode=2
Change-Id: I2f140cd898c9f877c32cd6d2e667f5e11ada4b1c
The decoder will construct inter predictor using lazy border extension,
while the encoder, going with multiple runs of motion search in the rate-
distortion optimization loop for each block, does border extension at
frame level. This commit makes separate the inter predictors for encoder
and decoder, respectively.
Change-Id: Ieca2fecba3a7201a6d64ef9f219e5d91e50559c3
When calling check_initial_width through vp9_set_size_literal
the function was defaulting to using non-subsampled chroma.
This patch changes the default to assume sampled chroma as
an interim solution until complete support for other
color formats is added.
Change-Id: Id8e7e919b350e3473dfdf7551af6fd0716478b04
This commit takes out vp9_extend_frame_borders from
vp9_setup_scale_factors.
The refactoring is for the preparation of the use of lazy border
extension at decoder. This makes it necessary to handle border
extension separately at encoder/decoder. The use of
vp9_extend_frame_borders will be removed, when lazy border extension
is ready.
Change-Id: Ia3baba3d179d5f11eee1634f19b3b319d2a59186
The decoder ignored the display width & height
specified in the frame header.
This patch adds a control, VP9D_GET_DISPLAY_SIZE, to
allow the application to obtain the display width and
height from the frame header.
vpxdec has been modified to scale the output frame to
this size.
Should the request for the display size fail vpxdec will
use the native width and height of the raw decoded
frame instead.
Change-Id: I25db04407426dac730263720c75a7dd6400af68a
This patch followed "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit, and added x86 SSE2 optimization
to do 16-pixel filtering in parallel. For other optimizations
(neon and dspr2), current 16-pixel functions were done by calling
8-pixel functions twice, and real 16-pixel functions could be added
later.
Decoder speedup:
tulip clip: 2% speed gain;
old_town_cross: 1.2% speed gain;
bus: 2% speed gain.
Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.
Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
Using for loop based on max_tx_size instead of separate checks. Combining
build_coeff_contexts() with update_coef_probs().
Change-Id: Ie335a7db29830677fbc14478a9c190d3c1068665
Modifications are done to reduce the total clock cycle.
Speedup: 1.2
Tested with: park_joy_420_720p50.y4m
Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
Both functions have no relation to motion vectors, so moving them from
vp9_findnearmv.h to vp9_blockd.h.
Change-Id: I74f524267886ab0fff4a2da793a10c906ed0f43a
Added filter_selectively_vert_row2 to be ready for parallel
loopfiltering in vertical direction. This change did 2-row
filtering at a time. If 2 vertically adjacent 8x8 blocks do same
type of filtering, we can do 16-pixel filtering in parallel.
Next, we need to provide 16-pixel loopfiltering functions in c
and optimized versions for codec speedup.
Change-Id: Idf97bbdd70566e55bd30e1fd25cb8544e33291be
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
This function is called from vp9_setup_past_independence() which is called
before the modified piece of code. Moving reset of inter_mode_probs into
vp9_init_mbmode_probs() for consistency.
Change-Id: Ib188e8798e1fbe15407fd501406761b746fdda95
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.
Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be pop out at the end.
Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
There are many places in handle_inter_mode that need to restore the
dst buffer pointers, due to buffer pointer swap and early rd search
breakout. This commit wraps these operations into an inline function
for clean-up.
Change-Id: I0462e8c41c8bc3cd8db07395489cac03d8e5be54
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.
Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
Reversing bit order of partition_context_lookup, and modifying accordingly
update_partition_context() and partition_plane_context().
Change-Id: I64a11f1a94962a3bf217de2f50698cb781db71a5