Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
~3.0-3.5x as fast as the original C version, 1.6x as fast overall.