At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.
Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
Move asm routines to common. Delete obsolete decoder routines.
Use wider routines where applicable.
~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
This makes sure we don't accidentally return the same sequence
of random numbers multiple times within one test (which would
be very non-random).
Every time srand(time()) is called, the pseudo random number
generator is initialized to the same value (as long as time()
returned the same value).
By initializing the random number generator once and for all
before starting to run the unit tests, we are sure we don't
need to reinitialize it within all the tests and all the
functions that use random numbers.
This fixes occasional errors in MotionEstimateTest.
MotionEstimateTest was designed to allow the test to occasionally
not succeed - if it didn't succeed, it tried again, up to 100 times.
However, since the YUVPixelDataGenerator function reset the random
seed to time(), every attempt actually ran with the same random
data (as long as all 100 attempts ran within 1 second) - thus if
one attempt in MotionEstimateTest failed, all 100 of them would
fail. If the utility functions don't touch the random seed,
this is not an issue.