From the ARMv7 NEON version. 16 times faster than the C version, overall more than 12% faster vorbis decoding on Apple's A7.
This saves one instruction in the x86-64 assembly.
Conveniently (together with Justin's earlier patches), this makes our vorbis decoder entirely independent of dsputil.