18 Commits

Author SHA1 Message Date
Sindre Aamås
4645bd26aa [Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
Avoid touching some cache lines by using popcnt instead of table
lookups.

Also gives a speedup of ~1.4x on Haswell as compared with SSE2.
2016-04-20 19:10:24 +02:00
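
The idea in the commit above (count non-zero coefficients with a population count rather than table lookups) can be sketched in portable C++. This is only an illustration, not the SSE4.2 routine the commit adds: the name `CountNonZero16` and the fixed size of 16 coefficients are assumptions made for the example, and whether the compiler emits a `popcnt` instruction depends on the target and build flags.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper (not from the commit): counts the non-zero entries
// of a 16-coefficient block by building a 16-bit mask and counting its
// set bits, so no lookup table is read.
int CountNonZero16(const std::int16_t (&coeffs)[16]) {
  std::bitset<16> mask;
  for (int i = 0; i < 16; ++i) {
    mask[i] = (coeffs[i] != 0);  // one bit per coefficient
  }
  return static_cast<int>(mask.count());  // population count of the mask
}
```

A vectorized version would typically produce the mask with a compare and a move-mask step before the population count; the scalar loop here only shows the counting idea.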
Sindre Aamås
d906dda224 [UT] Improve GetNonZeroCount tests
Reduce duplication.
Test more combinations.
Always test boundary cases.
2016-04-20 19:10:24 +02:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
cc8d541432 [UT] Utilize DCT function pointer typedefs 2016-01-19 22:00:24 +01:00
Sindre Aamås
a45c10cf91 [UT] Only run AVX2 tests if host supports AVX2 2016-01-19 14:27:46 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
8764231784 [UT] Improve DCT tests
Initialize input arrays with different random values.

Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.

Reduce duplication.
2016-01-19 13:12:28 +01:00
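
Why identical inputs make a weak DCT test can be seen from a scalar reference. The sketch below uses the 4x4 forward core transform as it is usually written for H.264; it is not taken from the commit, and the names `Block4x4`, `Residual` and `ForwardDct4x4` are assumptions made for the example. When source and prediction hold the same values, the residual is all zero and so is every output coefficient, so a broken routine could still pass.

```cpp
#include <array>

using Block4x4 = std::array<std::array<int, 4>, 4>;

// Hypothetical helper: residual = source - prediction, element by element.
Block4x4 Residual(const Block4x4& src, const Block4x4& pred) {
  Block4x4 r{};
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 4; ++x) r[y][x] = src[y][x] - pred[y][x];
  return r;
}

// One 1-D pass with rows (1,1,1,1), (2,1,-1,-2), (1,-1,-1,1), (1,-2,2,-1).
static void Transform1D(const int in[4], int out[4]) {
  const int s03 = in[0] + in[3], d03 = in[0] - in[3];
  const int s12 = in[1] + in[2], d12 = in[1] - in[2];
  out[0] = s03 + s12;
  out[1] = 2 * d03 + d12;
  out[2] = s03 - s12;
  out[3] = d03 - 2 * d12;
}

// Hypothetical scalar reference: transform every row, then every column.
Block4x4 ForwardDct4x4(const Block4x4& residual) {
  Block4x4 tmp{}, out{};
  for (int y = 0; y < 4; ++y) Transform1D(residual[y].data(), tmp[y].data());
  for (int x = 0; x < 4; ++x) {
    const int col[4] = {tmp[0][x], tmp[1][x], tmp[2][x], tmp[3][x]};
    int res[4];
    Transform1D(col, res);
    for (int y = 0; y < 4; ++y) out[y][x] = res[y];
  }
  return out;
}
```

A constant non-zero residual exercises only the DC coefficient, which is one reason to fill both input arrays with independent random values.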
Martin Storsjö
51efa57a3d Convert tabs to spaces in vertically aligned code 2015-06-10 10:21:29 +03:00
Martin Storsjö
723044837a Convert tabs to spaces in defines 2015-06-10 10:21:25 +03:00
huili2
c3cfce5223 modify some functions extending to sub8x8 usage, especially in ME part 2015-06-02 13:39:38 +08:00
ruil2
f7cd6e7aad use WelsEnc namespace instead of WelsSVCEnc 2014-08-11 16:08:49 +08:00
Martin Storsjö
d2afebd2d7 Add proper spacing in include directives 2014-07-01 10:55:04 +03:00
Martin Storsjö
4f594deff9 Don't reset the random number generator within the unit tests
This makes sure we don't accidentally return the same sequence
of random numbers multiple times within one test (which would
be very non-random).

Every time srand(time()) is called, the pseudo random number
generator is initialized to the same value (as long as time()
returned the same value).

By initializing the random number generator once and for all
before starting to run the unit tests, we are sure we don't
need to reinitialize it within all the tests and all the
functions that use random numbers.

This fixes occasional errors in MotionEstimateTest.

MotionEstimateTest was designed to allow the test to occasionally
not succeed - if it didn't succeed, it tried again, up to 100 times.
However, since the YUVPixelDataGenerator function reset the random
seed to time(), every attempt actually ran with the same random
data (as long as all 100 attempts ran within 1 second) - thus if
one attempt in MotionEstimateTest failed, all 100 of them would
fail. If the utility functions don't touch the random seed,
this is not an issue.
2014-07-01 10:20:45 +03:00
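
The failure mode described in the commit above can be reproduced in a few lines. This is a minimal sketch, not code from the repository: `DrawAfterReseed` is a hypothetical stand-in for a utility that reseeds before generating data, and a fixed seed plays the role of a `time()` value that has not changed between calls.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical utility: reseeds the generator on every call, then draws
// `count` values. With an unchanged seed, every call returns the same
// sequence, so repeated attempts see identical "random" data.
std::vector<int> DrawAfterReseed(unsigned seed, int count) {
  std::srand(seed);
  std::vector<int> values;
  for (int i = 0; i < count; ++i) {
    values.push_back(std::rand());
  }
  return values;
}
```

Calling `DrawAfterReseed` twice with the same seed yields equal vectors, which is why a retry loop gains nothing from reseeding inside the helper. Seeding once before the tests start lets each later call continue the sequence instead.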
Martin Storsjö
6e815e708d Use ENFORCE_STACK_ALIGN_1D instead of manually doing stack buffer alignment 2014-06-29 00:55:46 +03:00
huili2
dc3fae4477 astyle all 2014-06-25 18:50:41 -07:00
Martin Storsjö
1b2d3943f1 Use uintptr_t for casting pointers to integers
This fixes compilation on mingw-w64 and makes failing tests pass
on MSVC in 64 bit mode.
2014-04-08 09:46:47 +03:00
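
A typical place where this cast matters is an alignment check. The sketch below is an assumption about the kind of code involved, not the code the commit changed, and `IsAligned` is a hypothetical name. On 64-bit Windows `long` is 32 bits wide, so casting a pointer to `long` drops the upper half of the address; `uintptr_t` is defined to be wide enough to hold a pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `ptr` is a multiple of `alignment`.
// `alignment` is assumed to be a power of two.
bool IsAligned(const void* ptr, std::size_t alignment) {
  // uintptr_t holds the full pointer value on both 32-bit and 64-bit
  // targets, unlike `long` on 64-bit Windows.
  const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(ptr);
  return (address & (alignment - 1)) == 0;
}
```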
JuannyWang
801b664201 add encoder UT of mbAux 2014-04-03 16:25:06 +08:00