This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.
Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
Reversing bit order of partition_context_lookup, and modifying accordingly
update_partition_context() and partition_plane_context().
Change-Id: I64a11f1a94962a3bf217de2f50698cb781db71a5
This rebase is a better implementation of the previous ones.
Modifications are done to reduce the total clock cycle.
Speedup: 1.341
Compiled with -O3
Tested with: park_joy_420_720p50.y4m
Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
Since they used in encoder only. This commit also re-order includes
for the files that include vp9_extend.h
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.
Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730
on arm until we implenment real vp9_idct32x32_34_add_neon.
This issue is due to commit 47665452f0da3c11427ecb4852535e1787bb0c5b
Merge "Add 32x32 idct function for eob<=34 case".
Change-Id: I56b5f0abc20e7dd1bba521f78a995e85d65ea296
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
As Jim suggested, 1D array was used to store filter levels instead
of 2D array. This used shift_y in setup_mask directly, and saved
few cycles.
Change-Id: If61ab298784861f1806b1cd396d4e4e2e0f097b9