Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to
improve performance, clipped the absolute diff values to [0, 255].
This allowed us to keep the additions/subtractions in 8 bits.
Test showed an over 2% decoder performance increase.
Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d