Instead of doing 8-bit data unpack and 16-bit subtraction, use
psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
sign information. This does not bring noticable gain since
these functions are not called frequently.
Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e