speed up accumulate, accumulateSquare, accumulateProduct and accumulateWeighted using SIMD

* use SSE and/or AVX based on configuration
* revise the test to verify the implementation
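
The diff itself is not shown here, so the following is only a minimal sketch of the general technique, not the code from this commit. It assumes an 8-bit source accumulated into a float destination and uses SSE2 when the compiler defines `__SSE2__`. The names `accumulateScalar` and `accumulateSimd` are hypothetical. The scalar version plays the role of the reference that a test could compare against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalar reference: dst[i] += src[i] (hypothetical helper, not OpenCV API).
void accumulateScalar(const std::uint8_t* src, float* dst, std::size_t len)
{
    assert(len == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < len; ++i)
        dst[i] += static_cast<float>(src[i]);
}

// SIMD sketch: handles 16 pixels per iteration with SSE2 when available,
// then finishes the remaining elements with the scalar loop.
void accumulateSimd(const std::uint8_t* src, float* dst, std::size_t len)
{
    assert(len == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= len; i += 16)
    {
        // Load 16 unsigned bytes and zero-extend them to 32-bit integers.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i lo16 = _mm_unpacklo_epi8(v, zero);   // elements 0..7 as 16-bit
        __m128i hi16 = _mm_unpackhi_epi8(v, zero);   // elements 8..15 as 16-bit
        __m128i w[4] = {
            _mm_unpacklo_epi16(lo16, zero),          // elements 0..3
            _mm_unpackhi_epi16(lo16, zero),          // elements 4..7
            _mm_unpacklo_epi16(hi16, zero),          // elements 8..11
            _mm_unpackhi_epi16(hi16, zero)           // elements 12..15
        };
        for (int k = 0; k < 4; ++k)
        {
            // Convert to float and add into the accumulator (unaligned access).
            float* p = dst + i + 4 * k;
            _mm_storeu_ps(p, _mm_add_ps(_mm_loadu_ps(p), _mm_cvtepi32_ps(w[k])));
        }
    }
#endif
    // Scalar tail; also the whole loop when SSE2 is not enabled.
    for (; i < len; ++i)
        dst[i] += static_cast<float>(src[i]);
}
```

Since 8-bit values convert to float exactly, a test can require bit-identical results between the two functions, including for lengths that are not a multiple of 16.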
Tomoaki Teshima
2016-07-15 08:04:18 +09:00
parent da69cd08db
commit 3c2f7ecc97
2 changed files with 1367 additions and 3 deletions

File diff suppressed because it is too large