This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).
Arm asm updated by Janne Grunau <janne-libav@jannau.net>.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>