
Use 64-byte cache lines, reduce the main loop from 128 bytes to 64 bytes, and adjust the prefetch distance to the optimal value.
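The commit message does not say which routine is changed, so the following is only a minimal sketch of the general technique, assuming a plain copy loop that handles one 64-byte cache line per iteration. The function name `copy_64b_lines` and the `PREFETCH_DISTANCE` value are hypothetical and are not taken from the repository. `__builtin_prefetch` is a GCC/Clang builtin, and the best distance has to be found by measurement on the target CPU.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE 64           /* cache-line size named in the commit message */
#define PREFETCH_DISTANCE 256   /* hypothetical value; tune by benchmarking */

/* Hypothetical helper: copy n bytes, one 64-byte line per main-loop pass. */
void copy_64b_lines(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= CACHE_LINE) {
        /* Hint the line PREFETCH_DISTANCE bytes ahead, but only while that
           address is still inside the source buffer. */
        if (n > PREFETCH_DISTANCE)
            __builtin_prefetch(s + PREFETCH_DISTANCE, 0, 0);

        memcpy(d, s, CACHE_LINE);   /* fixed size, typically inlined */
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }

    if (n > 0)
        memcpy(d, s, n);            /* tail shorter than one cache line */
}
```

Under these assumptions, a loop body of 64 bytes matches one prefetch to one cache line. A 128-byte body on a 64-byte-line machine would need two prefetches per pass to cover the same data.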
Languages
C 68.1%
Assembly 16.2%
C++ 13.4%
Makefile 1.1%
Python 0.9%
Other 0.2%