Adds new code to memcpy function, optimized for Cortex A9.
Adds new ARM-only loop, for operations where source and
destination are aligned.
Copyright (C) ST-Ericsson SA 2010
Modified neon implementation to fit Cortex A9 cache line size,
for those running 32 bytes L2 cache line size.
Also split the implementation in aligned and unaligned access,
for those that allows unaligned memory access with Neon.
For totally aligned operations, arm-only code is used.
Change-Id: I95ebf6164cd6486b12a7e3e98e369db21e7e18d2
Author: Henrik Smiding henrik.smiding@stericsson.com for ST-Ericsson.
Signed-off-by: Christian Bejram <christian.bejram@stericsson.com>