Merging these functions lets some of their loops be merged as well, which makes the resulting code much faster, particularly after SIMD optimizations.

(cherry picked from commit f8bed30d8b176fa030f6737765338bb4a2bcabc9)
(cherry picked from commit 12802ec0601c3bd7b9c7a2503518e28fd5e7d744)