Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877