We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.
Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. The use of random values
as input causes such overflows. This results in implementation-
dependent output depending on which type is used to hold
intermediate results. Use a template for the test reference
implementation to test implementations with different
intermediate representation.
This makes sure we don't accidentally return the same sequence
of random numbers multiple times within one test (which would
be very non-random).
Every time srand(time()) is called, the pseudo random number
generator is initialized to the same value (as long as time()
returned the same value).
By initializing the random number generator once and for all
before starting to run the unit tests, we are sure we don't
need to reinitialize it within all the tests and all the
functions that use random numbers.
This fixes occasional errors in MotionEstimateTest.
MotionEstimateTest was designed to allow the test to occasionally
not succeed - if it didn't succeed, it tried again, up to 100 times.
However, since the YUVPixelDataGenerator function reset the random
seed to time(), every attempt actually ran with the same random
data (as long as all 100 attempts ran within 1 second) - thus if
one attempt in MotionEstimateTest failed, all 100 of them would
fail. If the utility functions don't touch the random seed,
this is not an issue.