This allows detecting CPU features in builds that have neither GCC inline assembly nor the required compiler intrinsics enabled.
Move vector_fmul() from DSPContext to AVFloatDSPContext.
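
For illustration, a minimal sketch of what such a move amounts to: the function pointer lives in a float-specific context instead of the general DSP context. The struct name SketchFloatDSP, the init helper, and the C fallback below are hypothetical stand-ins, and the signature (dst, src0, src1, len) is an assumption about how vector_fmul() is declared, not a copy of the actual header.

```c
#include <stddef.h>

/* Hypothetical stand-in for a float DSP context: a small table of
 * function pointers that an init routine fills in. */
typedef struct SketchFloatDSP {
    /* Assumed signature: dst[i] = src0[i] * src1[i] for i in [0, len). */
    void (*vector_fmul)(float *dst, const float *src0,
                        const float *src1, int len);
} SketchFloatDSP;

/* Plain C fallback: element-wise product of two float arrays. */
static void sketch_vector_fmul_c(float *dst, const float *src0,
                                 const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* Hypothetical init: install the C version. A real init routine could
 * replace the pointer with an optimized variant after CPU detection. */
static void sketch_float_dsp_init(SketchFloatDSP *fdsp)
{
    fdsp->vector_fmul = sketch_vector_fmul_c;
}
```

Callers would then go through the float context, e.g. fdsp.vector_fmul(out, a, b, n), rather than through the general DSP context.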