Also split the intrinsics into separate files. The C implementation is compiled only when neither SSE2 nor NEON is available.
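
For illustration, the selection rule could look roughly like the sketch below. It assumes the GCC/Clang predefined macros `__SSE2__` and `__ARM_NEON`; the `IMPL_*` macros and the `impl_name()` / `impl_is_fallback()` helpers are hypothetical names made up for this example, not identifiers from the change itself.

```c
#include <assert.h>
#include <string.h>

/* Backend selection, assuming GCC/Clang-style predefined macros.
 * All IMPL_* names are hypothetical and only serve this sketch. */
#if defined(__SSE2__)
#  define IMPL_SSE2 1
#else
#  define IMPL_SSE2 0
#endif

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  define IMPL_NEON 1
#else
#  define IMPL_NEON 0
#endif

/* The portable C fallback is enabled only when neither SIMD path is. */
#define IMPL_C (!IMPL_SSE2 && !IMPL_NEON)

/* Each per-backend source file would wrap its body in the matching
 * guard, e.g. "#if IMPL_C ... #endif" in the plain-C file. */
const char *impl_name(void)
{
#if IMPL_SSE2
    return "sse2";
#elif IMPL_NEON
    return "neon";
#else
    return "c";
#endif
}

/* Returns 1 when the plain-C implementation is the one compiled in. */
int impl_is_fallback(void)
{
    /* At least one backend must always be selected. */
    assert(IMPL_SSE2 + IMPL_NEON + IMPL_C >= 1);
    return strcmp(impl_name(), "c") == 0;
}
```

With guards like these, the build can list every backend file unconditionally and let the preprocessor leave the unused ones empty. Compilers that do not define these macros (MSVC, for example) would need their own detection.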