This replaces commit aa1c4cd, which had a bug and was reverted in
commit 3c73e58.
The bug was caused by rounding -step1[5] in highbd_idct8x8_12_half1d()
(see the rounding sketch below).
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
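A minimal illustration of why rounding a negated value matters, assuming
dct_const_round_shift()-style rounding (add half, then arithmetic right
shift); the helper and the example value are illustrative, not the libvpx
sources:

  #include <stdint.h>

  #define DCT_CONST_BITS 14

  /* Mirrors dct_const_round_shift(): add half, then arithmetic right shift. */
  static int32_t round_shift(int64_t x) {
    return (int32_t)((x + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS);
  }

  /* With x = 3 << 13 (1.5 in Q14):
   *   round_shift(x)  ==  2
   *   round_shift(-x) == -1, while -round_shift(x) == -2,
   * so rounding -step1[5] can diverge from negating the rounded step1[5]. */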
Left-shifting a negative value is undefined; this quiets a ubsan warning.
The shift is applied to a constant, so there is no change in the generated
code (sketch below).
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
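A minimal sketch of the pattern, using a made-up constant rather than the
codec constants the change actually touches:

  #include <stdint.h>

  #define KERNEL_TAP 64 /* illustrative constant */

  /* Before (undefined behavior, flagged by ubsan):
   *   const int32_t scaled = (-KERNEL_TAP) << 3;
   * After: shift the positive magnitude, then negate. The value is
   * identical, so the generated code does not change. */
  static const int32_t scaled = -(KERNEL_TAP << 3);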
This reverts commit aa1c4cd140007ea5b4be99732fbb23d1fd8cf2b5.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
Previously the optimized path was skipped for this range of coefficients.
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
This reverts commit 03f5e300d69d368290305e19cc66bac8b0ea1ff8.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
Allow the right shift to operate on 64 bits; this matches the rest of the
implementations (see the sketch below).
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
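A sketch of the cast-order issue, assuming the variance is derived as sse
minus the squared sum scaled down by the pixel count (the helper name is
illustrative, not the libvpx function):

  #include <stdint.h>

  /* Wrong: (int64_t)(sum * sum) >> shift still multiplies in 32 bits and
   * can overflow before the cast takes effect.
   * Right: widen one operand first so the multiply and the right shift
   * both operate on 64 bits, matching the C reference code. */
  static uint32_t variance_final(uint64_t sse, int sum, int log2_count) {
    const int64_t mean_sq = ((int64_t)sum * sum) >> log2_count;
    return (uint32_t)(sse - (uint64_t)mean_sq);
  }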
Allow the right shift to operate on 64 bits; this matches the rest of the
implementations.
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
Left-shifting a negative value is undefined; this quiets a ubsan warning.
The shift is applied to a constant, so there is no change in the generated
code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
Split into load_input_data4() and load_input_data8().
Use the pack-with-signed-saturation instruction for high bitdepth (see the
sketch below).
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
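A rough sketch of the high-bitdepth path, assuming coefficients are stored
as int32 and the loads are 16-byte aligned (the function bodies below are
illustrative, not the committed code):

  #include <emmintrin.h>
  #include <stdint.h>

  typedef int32_t tran_low_t; /* high-bitdepth coefficient type */

  /* Pack two vectors of int32 coefficients into one vector of int16 with
   * signed saturation (packssdw). */
  static __m128i load_input_data8_sketch(const tran_low_t *data) {
    const __m128i lo = _mm_load_si128((const __m128i *)data);
    const __m128i hi = _mm_load_si128((const __m128i *)(data + 4));
    return _mm_packs_epi32(lo, hi); /* 8 x int16 */
  }

  static __m128i load_input_data4_sketch(const tran_low_t *data) {
    const __m128i in = _mm_load_si128((const __m128i *)data);
    return _mm_packs_epi32(in, in); /* low 4 x int16 are valid */
  }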
vpx_idct32x32_1024_add_ssse3() is actually an SSE2 function and is faster
than vpx_idct32x32_1024_add_sse2(). Replace the slower one. All changes are
code relocations; no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
It's almost identical to vpx_idct8x8_64_add_sse2(), except for a small
difference in instruction order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
Replace with CAST_TO_BYTEPTR/CAST_TO_SHORTPTR.
The rule is: if a short pointer is cast to a byte pointer, any offset
applied to the byte pointer must be doubled. We do this by casting to a
short pointer first, adding the offset, then casting back to a byte pointer
(see the sketch below).
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
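A minimal sketch of the offset rule; the macro definitions below mirror
what the casts do, and advance_rows() is an illustrative helper, not part
of libvpx:

  #include <stdint.h>

  #define CAST_TO_SHORTPTR(x) ((uint16_t *)(x))
  #define CAST_TO_BYTEPTR(x) ((uint8_t *)(x))

  /* A uint16_t buffer travelling through a uint8_t * interface: adding
   * the sample offset to the short pointer and casting back doubles the
   * byte offset, i.e. it moves 2 * rows * stride bytes. */
  static uint8_t *advance_rows(uint8_t *dst8, int stride, int rows) {
    uint16_t *dst = CAST_TO_SHORTPTR(dst8);
    return CAST_TO_BYTEPTR(dst + rows * stride);
  }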
Provides over 15x speedup for width > 8.
Due to the smaller loads and shifting needed for width == 8, it gets about
an 8x speedup.
For width == 4 it's only about a 4x speedup because there is a lot of
shuffling and shifting needed to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
For 8-bit, the subtrahend is small enough to fit into uint32_t (see the
sketch below).
This is the same as was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f Prevent negative variance
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
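A sketch of the arithmetic, with illustrative names: for 8-bit input and
blocks up to 64x64, |sum| <= 255 * 64 * 64 < 2^20, so (sum * sum) >> 12 is
below 2^28 and the subtrahend fits in uint32_t.

  #include <stdint.h>

  /* Illustrative final step of an 8-bit variance computation; the cast on
   * the subtrahend resolves -Wshorten-64-to-32 without changing the
   * result, since the value is provably below 2^32. */
  static uint32_t variance_8bit(uint32_t sse, int sum, int log2_count) {
    const uint32_t subtrahend =
        (uint32_t)(((int64_t)sum * sum) >> log2_count);
    return sse - subtrahend;
  }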