Even if there actually is no SIMD optimized version of the
width==2 cases, luma function for SIMD still needs to handle
it by calling McCopyWidthEq2_c for these cases.
This simplifies the UT code a little, and makes sure that
those codepaths are tested properly.
This speeds up the compile time from 21.3 to 2.6 seconds
for the MC test files.
This makes it slightly harder to see exactly which test
failed on a quick glance, but it makes the overall structure of
the unit test output more manageable and readable, by reducing
the number of tests from 1300 to 430.