Signed-off-by: Janne Grunau <janne-libav@jannau.net>
~3.0-3.5x as fast as the original C version, 1.6x as fast overall.