vpx_idct32x32_1024_add_ssse3() is actually an SSE2 function and is faster
than vpx_idct32x32_1024_add_sse2(). Replace the slower one. All changes are
code relocations; no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
It's almost identical to vpx_idct8x8_64_add_sse2(), except for small
differences in instruction order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
Add PartialIDctTest::PrintDiff() to help with debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for the 4x4 idct.
Update PartialIDctTest::InitInput().
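A 4x4 block has 16 coefficients, so "all combinations of +/-mask_" means 2^16 = 65536 sign patterns. The sketch below shows one way to enumerate them with a bitmask and a PrintDiff-style helper. The names `fill_sign_combination`, `print_diff` and `kNumSignCombinations4x4` are hypothetical, `mask` stands in for the test's `mask_` member, and the sketch is not the actual PartialIDctTest code.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of +/-mask patterns for a 16-coefficient (4x4) block: 2^16. */
enum { kNumSignCombinations4x4 = 1 << 16 };

/* Hypothetical helper: entry j becomes -mask when bit j of `combo` is set
 * and +mask otherwise, so combo = 0 .. 65535 covers every pattern once. */
static void fill_sign_combination(int32_t *input, uint32_t combo,
                                  int32_t mask) {
  for (int j = 0; j < 16; ++j) {
    input[j] = ((combo >> j) & 1u) ? -mask : mask;
  }
}

/* Hypothetical PrintDiff-style helper: print each (row, col) where the
 * reference and test outputs of a size x size block disagree and return
 * the number of mismatches. */
static int print_diff(const uint8_t *ref, const uint8_t *test, int size) {
  int mismatches = 0;
  for (int i = 0; i < size * size; ++i) {
    if (ref[i] != test[i]) {
      printf("(%d,%d): ref=%d test=%d\n", i / size, i % size, ref[i],
             test[i]);
      ++mismatches;
    }
  }
  return mismatches;
}
```

A test loop would iterate `for (uint32_t combo = 0; combo < kNumSignCombinations4x4; ++combo)`, fill the input, run both transforms and call `print_diff()` on any failure. Exhaustive enumeration is affordable here only because the 4x4 case is small; the larger transforms would need sampling.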
Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
It makes more sense to use the corresponding partial idct C function,
rather than the full idct C function, as the reference.
Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6
When eob is less than or equal to 135 for the high-bitdepth 32x32 idct,
call this function.
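A minimal sketch of eob-based dispatch follows. It assumes that eob counts coefficients up to and including the last non-zero one in scan order, so eob <= 135 leaves only the first 135 scan positions possibly non-zero. The function-pointer signature, the `*_sketch` stand-ins and the 32-bit `tran_low_t` are assumptions for illustration, not the libvpx declarations. A real decoder may also add cases for smaller eob values, and those are not shown.

```c
#include <stdint.h>

/* Assumption: tran_low_t is 32 bits wide in high-bitdepth builds. */
typedef int32_t tran_low_t;

/* Hypothetical signature for a high-bitdepth idct-and-add routine. */
typedef void (*highbd_idct_add_fn)(const tran_low_t *input, uint16_t *dest,
                                   int stride, int bd);

/* Hypothetical stand-ins for the partial (135) and full (1024) transforms. */
static void idct32x32_135_add_sketch(const tran_low_t *input, uint16_t *dest,
                                     int stride, int bd) {
  (void)input; (void)dest; (void)stride; (void)bd;
}

static void idct32x32_1024_add_sketch(const tran_low_t *input, uint16_t *dest,
                                      int stride, int bd) {
  (void)input; (void)dest; (void)stride; (void)bd;
}

/* Use the cheaper partial transform whenever at most 135 scan positions
 * can hold non-zero coefficients; otherwise fall back to the full one. */
static highbd_idct_add_fn select_highbd_idct32x32(int eob) {
  return eob <= 135 ? idct32x32_135_add_sketch : idct32x32_1024_add_sketch;
}
```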
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
When eob is less than or equal to 38 for the high-bitdepth 16x16 idct,
call this function.
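Our understanding, treated here as an assumption, is that the 38-coefficient variant relies on the default scan order placing the first 38 coefficients inside the upper-left 8x8 quadrant. Its first pass then only needs to transform 8 of the 16 rows. The hypothetical helper below checks that precondition on a row-major block and can be handy when debugging a partial transform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when every non-zero coefficient of a size x size row-major
 * block lies inside the upper-left region x region corner. A partial idct
 * that only processes `region` rows in its first pass depends on this. */
static bool nonzero_within_corner(const int32_t *coeffs, int size,
                                  int region) {
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      if (coeffs[r * size + c] != 0 && (r >= region || c >= region)) {
        return false;
      }
    }
  }
  return true;
}
```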
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
Running the RunQuantCheck() test on it exposes a 16-bit overflow in stage 7
of pass 2. Switch to saturating add/sub in both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
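The difference between a plain and a saturating 16-bit add is easiest to see one lane at a time. The scalar model below mirrors the per-lane behaviour of NEON intrinsics such as vaddq_s16 (wrapping) and vqaddq_s16/vqsubq_s16 (saturating). It is an illustration only, not the NEON code in this change.

```c
#include <stdint.h>

/* One lane of a saturating 16-bit add (cf. vqaddq_s16). */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  const int32_t sum = (int32_t)a + (int32_t)b; /* cannot overflow in 32 bits */
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* One lane of a saturating 16-bit subtract (cf. vqsubq_s16). */
static int16_t sat_sub_s16(int16_t a, int16_t b) {
  const int32_t diff = (int32_t)a - (int32_t)b;
  if (diff > INT16_MAX) return INT16_MAX;
  if (diff < INT16_MIN) return INT16_MIN;
  return (int16_t)diff;
}

/* One lane of a plain, wrapping 16-bit add (cf. vaddq_s16). The final
 * conversion to int16_t is implementation-defined in C but wraps on
 * common two's-complement compilers. */
static int16_t wrap_add_s16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)(a + b);
}
```

With wrapping arithmetic, 32767 + 1 becomes -32768, a sign flip that corrupts the reconstructed pixels. Saturation pins the value at 32767, which stays close to the true result.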
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
This currently runs 1000 * 1000 = one *million* times, which is
unnecessary. It's one of the slowest items in Jenkins and takes over an
hour for each of the larger transforms.
Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantize with the minimum allowed step sizes instead of the maximum.
This may generate larger inputs.
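Item 3 can be illustrated with a simplified quantize/dequantize round trip. A small step keeps the dequantized coefficient close to its original magnitude, while a large step collapses it toward zero. That is why quantizing with the minimum step can feed larger values into the idct. This is a sketch only: the real VP9 quantizer also applies rounding and a zero bin, and the step values in the check (4 and 2000) are arbitrary illustrations, not VP9's actual limits.

```c
#include <stdint.h>

/* Simplified round trip: divide by the step (C integer division truncates
 * toward zero) and multiply back. Rounding and zero-bin handling are
 * intentionally omitted. */
static int32_t quant_dequant(int32_t coef, int32_t step) {
  const int32_t level = coef / step;
  return level * step;
}
```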
Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
tests with 'Large' in the name are reserved for slow-running tests that
may not be run on all platforms.
Change-Id: I2a7d6dd46b29b50469893e46433844132fb727c2
this removes the need for __STDC_LIMIT_MACROS, which is defined in
vpx_integer.h but may be preceded by earlier includes of stdint.h;
fixes the build with the r13 NDK.
Change-Id: I3950c8837cf90d5584a20ce370ae370581c2182c
idct4x4 and idct8x8 were universally enabled for high-bitdepth builds
in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
BUG=webm:1294
Change-Id: If142afb169c48728cc4b222e7c41aa4a63f95f0f
replace load_and_transpose_s16_8x8() in idct32_6_neon() with separate calls
to load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon(), where the input is the correctly sized output
from the earlier stage.
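Our reading of this change is as follows. In idct32_6_neon() the input comes directly from the coefficient buffer as tran_low_t, assumed here to be 32 bits in high-bitdepth builds, so it has to be narrowed to int16_t before the 8x8 transpose. In idct32_8_neon() the input is already int16_t. The scalar sketch below models the two separated steps. The `_c` names are hypothetical, and this is not the NEON implementation.

```c
#include <stdint.h>

/* Assumption: tran_low_t is 32 bits wide in high-bitdepth builds. */
typedef int32_t tran_low_t;

/* Scalar model of a load_tran_low_to_s16()-style step: narrow n
 * coefficients to int16_t. Assumes the values already fit in 16 bits. */
static void load_tran_low_to_s16_c(const tran_low_t *src, int16_t *dst,
                                   int n) {
  for (int i = 0; i < n; ++i) dst[i] = (int16_t)src[i];
}

/* Scalar model of transpose_s16_8x8() on row-major 8x8 blocks:
 * out[c][r] = in[r][c]. */
static void transpose_s16_8x8_c(const int16_t *in, int16_t *out) {
  for (int r = 0; r < 8; ++r) {
    for (int c = 0; c < 8; ++c) out[c * 8 + r] = in[r * 8 + c];
  }
}
```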
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
Two functions do not pass this test:
vpx_idct8x8_64_add_ssse3
vpx_idct8x8_12_add_ssse3
The test has been modified to avoid triggering an issue with those
functions, but they still need to be investigated.
BUG=webm:1332
Change-Id: I52569a81e8e6e0b33c4a4d060d0b69c3fc4f578e
The result of the transform is added to the destination buffers. In the
existing tests the destination buffer is always empty, so that portion of
the code was never exercised.
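A minimal sketch of the "add to destination" step shows why an empty destination hides bugs. Each residual is added to the existing pixel and clamped to the 8-bit range. The helper names `add_and_clamp_u8` and `add_residuals` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: add a residual to an 8-bit pixel, clamp to [0, 255]. */
static uint8_t add_and_clamp_u8(uint8_t pixel, int residual) {
  const int v = pixel + residual;
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add n residuals to the existing contents of dest. */
static void add_residuals(const int16_t *residual, uint8_t *dest, int n) {
  for (int i = 0; i < n; ++i) dest[i] = add_and_clamp_u8(dest[i], residual[i]);
}
```

When dest starts out all zero, the output is just the clamped residual. A routine that ignored, or misread, the existing pixels would therefore still pass. Seeding dest with non-zero values, including values near 0 and 255, exercises both the addition and the clamping.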
Change-Id: I1858c4fed2274f1b9faf834d2ba4186a4510492a
Switch to using correctly sized inputs and outputs. This simplifies
adding tests with varying strides.
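With correctly sized buffers, a stride test can write a w x h block into a wider destination and check that the bytes between the block width and the stride are left untouched. The sketch below assumes `stride >= w` and uses a hypothetical helper name, `store_block`.

```c
#include <stdint.h>
#include <string.h>

/* Copy a w x h row-major block into dst, whose rows are `stride` bytes
 * apart (stride >= w). Bytes from column w up to stride are not written. */
static void store_block(const uint8_t *src, int w, int h, uint8_t *dst,
                        int stride) {
  for (int r = 0; r < h; ++r) {
    memcpy(dst + r * stride, src + r * w, (size_t)w);
  }
}
```

Filling the destination with a sentinel such as 0xAA before the call makes any write into the padding columns easy to detect.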
Change-Id: I716a0d8173dcf6a86d56656ac9d3101b7ec27642