Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
This reverts commit aa1c4cd140.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
This reverts commit 03f5e300d6.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617