splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
also use the ssse3 codepath, without needing an explicit sse4 codepath.
Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk