IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. The use of random values
as input causes such overflows. This results in implementation-
dependent output depending on which type is used to hold
intermediate results. Use a template for the test reference
implementation to test implementations with different
intermediate representation.
Initialize input arrays with different random values.
Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.
Reduce duplication.
These values are read as two separate 16 bit integers from an
array in the FeatureSearchOne function, therefore we should
also store them in a well-defined order.
This fixes encoding of screen content on big endian; now the
full testsuite passes on big endian.
If the calling test hasn't set m_iPicResSize, it is set to the
maximum frame size, which takes much longer to initialize than the
current actual frame size.
This reduces the runtime of EncoderInterfaceTest.SkipFrameCheck
in valgrind from 229 seconds to 8 seconds, and the total runtime
of all the test cases in EncoderInterfaceTest from 405 seconds
to 89 seconds.
They are still used slightly differently in the encoder and decoder;
the decoder uses plain functions while the encoder uses one object
keeping track of the number of allocated bytes, and keeping track
of the requested alignment.
Use the decoder versions of the functions (which are capable
of handling widths 4/8/16 for luma, not only 16 as in the
encoder). By using the more generic versions, there may be a small
performance loss since the functions need to check the width
in every call. Actual measurements show that the actual change is
very small (and the shared routines turn out to actually be faster
than the existing ones in ARM NEON setups).
Even if there actually is no SIMD optimized version of the
width==2 cases, luma function for SIMD still needs to handle
it by calling McCopyWidthEq2_c for these cases.
This simplifies the UT code a little, and makes sure that
those codepaths are tested properly.
This speeds up the compile time from 21.3 to 2.6 seconds
for the MC test files.
This makes it slightly harder to see exactly which test
failed on a quick glance, but it makes the overall structure of
the unit test output more manageable and readable, by reducing
the number of tests from 1300 to 430.