This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.
Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
When only upper-left 8x8 area has non-zero dct coefficients, we
could skip 1D IDCT for 9th to 32th rows to save operations. This
function is called when eob <= 34.
Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
For consistency with idct function names. Renames:
vp9_short_fdct4x4 -> vp9_fdct4x4
vp9_short_walsh4x4 -> vp9_fwht4x4
Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
This patch fixed a bug that caused 32bit PIC build mismatch. The
stack pointer was modified after "GET_GOT". Loading left pointer
from a hard-coded position gave wrong result.
Change-Id: Iea0aec6f917b12a6b3393ffc986bad74510248cc
Commit "d207 intra prediction ssse3 using bytes" caused mismatch
while building 32bit PIC code. Disabled these SSSE3 functions
until we fix the bug.
Change-Id: Ic444e531d3d4058092fe6eab09006b44fcb18e4c
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4
We have two SSE2-optimized functions for idct4_1d:
vp9_idct4_1d_sse2 <-- removing this one
idct4_1d_sse2
vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_see2
idct8_1d -> vp9_idct8x8_{16, 10, 1}_see2
vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2
Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.
Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
byte version of ronalds d153 ssse3 optimizations for
4x4 and 8x8
(commit: fc91a2a112238a1aee568f3b840585de4e928fca)
Change-Id: Iec4426032311483f615fd9e0dceba3ee85ddebd7
We don't need these functions anymore. The only one which was actually
used is vp9_add_constant_residual_32x32. Addition of
vp9_short_idct32x32_1_add eliminates this single usage. SSE2 optimized
version of vp9_short_idct32x32_1_add will be added in the next patch set,
right now it is only C implementation. Now we have all idct functions
implemented in a consistent manner.
Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3