Makes fate-h264 pass under valgrind --undef-value-errors=yes with -cpuflags none. {avg,put}_h264_chroma_mc8_8 are approximately 5% faster and {avg,put}_h264_chroma_mc4_8 are 2% faster, on both x86 and ARM.