Streamline the memcpy a bit removing some unnecessary instructions.
The biggest speed improvement comes from changing the size of
the preload. On krait, the sweet spot for the preload in the main
loop is twice the L1 cache line size.
In most cases, these small tweaks yield > 1000MB/s speed ups. As
the size of the memcpy approaches about 1MB, the speed improvement
disappears.
Change-Id: Ief79694d65324e2db41bee4707dae19b8c24be62
This uses the new strcmp.a15.S code as the basis for new versions
of strcmp.S.
The cortex-a15 code is the performance optimized version of strcmp.a15.S
taken with only the addition of a few pld instructions.
The cortex-a9 code is the same as the cortex-a15 code except that the
unaligned strcmp code was taken from the original strcmp.S.
The krait code is the same as the cortex-a15 code except that one path
in the unaligned strcmp code was taken from the original strcmp.S code
(the 2 byte overlap case).
The generic code is the original unmodified strmp.S from the bionic
subdirectory.
All three new versions underwent these test cases:
Strings the same, all same size:
- Both pointers double word aligned.
- One pointer double word aligned, one pointer word aligned.
- Both pointers word aligned.
- One pointer double word aligned, one pointer 1 off a word alignment.
- One pointer double word aligned, one pointer 2 off a word alignment.
- One pointer double word aligned, one pointer 3 off a word alignment.
- One pointer word aligned, one pointer 1 off a word alignment.
- One pointer word aligned, one pointer 2 off a word alignment.
- One pointer word aligned, one pointer 3 off a word alignment.
For all cases where it made sense, the two pointers were also tested
swapped.
Different strings, all same size:
- Single difference at double word boundary.
- Single difference at word boudary.
- Single difference at 1 off a word alignment.
- Single difference at 2 off a word alignment.
- Single difference at 3 off a word alignment.
Different sized strings, strings the same until the end:
- Shorter string ends on a double word boundary.
- Shorter string ends on word boundary.
- Shorter string ends at 1 off a word boundary.
- Shorter string ends at 2 off a word boundary.
- Shorter string ends at 3 off a word boundary.
For all different cases, run them through the same pointer alignment
cases when the strings are the same size.
For all cases the two pointers were also tested swapped.
Bug: 8005082
Change-Id: I5f3dc02b48afba2cb9c13332ab45c828ff171a1c
Move arch specific code for arm, mips, x86 into separate
makefiles.
In addition, add different arm cpu versions of memcpy/memset.
Bug: 8005082
Change-Id: I04f3d0715104fab618e1abf7cf8f7eec9bec79df