Merging these functions lets some of their loops be fused, which makes the resulting code much faster, particularly after SIMD optimizations.

(cherry picked from commit f8bed30d8b)
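
For illustration only, a minimal sketch of the kind of loop fusion described above. The commit message does not name the functions involved, so `scale`, `offset` and `scale_offset` are hypothetical stand-ins, not the code this commit touches. The two separate passes each load and store every element; the merged function does the same work with a single load and store per element, which also gives a vectorizer one simple loop body to work on.

```c
#include <stddef.h>

/* Hypothetical pass 1: scale every sample in place. */
static void scale(float *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}

/* Hypothetical pass 2: add a bias to every sample in place. */
static void offset(float *buf, size_t n, float bias)
{
    for (size_t i = 0; i < n; i++)
        buf[i] += bias;
}

/* Merged version (also hypothetical): the two loops are fused, so each
 * element is loaded and stored once instead of twice. */
static void scale_offset(float *buf, size_t n, float gain, float bias)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * gain + bias;
}
```

Calling `scale()` followed by `offset()` computes the same values as one call to `scale_offset()`, up to floating-point contraction if the compiler emits a fused multiply-add. How much faster the fused form is depends on the buffer size, the compiler and the target.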