* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)

Makes the code a bit faster on ARM (~1%)

Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c