The sse4_1 code returned the SAD as a uint16_t, but that
is too narrow for 32x32 and 64x64 blocks. This change fixes the
assembly for those sizes and also re-enables sse4_1 on Linux.
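
For illustration, a minimal C sketch of why a 16-bit return value is too
narrow. The function names sad_ref and sad_worst_case are hypothetical and
are not taken from the codebase; the sketch assumes 8-bit pixels, so each
absolute difference is at most 255.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference SAD over a w x h block of 8-bit pixels.
 * The sum is accumulated and returned as 32 bits. */
static uint32_t sad_ref(const uint8_t *src, int src_stride,
                        const uint8_t *ref, int ref_stride,
                        int w, int h) {
  uint32_t sad = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x)
      sad += (uint32_t)abs(src[x] - ref[x]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Largest possible SAD for a w x h block of 8-bit pixels
 * (every pixel differs by 255). */
static uint32_t sad_worst_case(int w, int h) {
  return (uint32_t)w * h * 255u;
}
```

With these assumptions the worst case for 16x16 is 65280, which still fits
in a uint16_t (maximum 65535). For 32x32 it is 261120 and for 64x64 it is
1044480, so both need a wider return type.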
Change-Id: I5ce7288d581db870a148e5f7c5092826f59edd81
Add support for gyp, which doesn't allow multiple objects in the same
static library to share a basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc