Added an SSE2 version of vp8_regular_quantize_b, which improved encode
performance (for the clip used) by ~10% for 32-bit builds and ~3% for
64-bit builds.

Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments.

Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af