Reduce dependency on offsets file by using intrinsics. Disassembly shows
improvements over previous assembly specifically in register management,
preloading, and {pro,epi}log. Speed change is within margin of error.
Change-Id: I8131b4b4d62bc092407fe847bfaa8f2c0e1384ff