Modelled after the aarch64 code. On Cortex-A8, the s16 and s32 code is about 2x faster and the float code about 7x faster.

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Mans Rullgard <mans@mansr.com>