bionic/libc/arch-arm/cortex-a9/cortex-a9.mk

9 lines
499 B
Makefile
Raw Normal View History

$(call libc-add-cpu-variant-src,MEMCPY,arch-arm/cortex-a9/bionic/memcpy.S)
$(call libc-add-cpu-variant-src,MEMSET,arch-arm/cortex-a9/bionic/memset.S)
Optimize strcat/strcpy, small tweaks to strlen. Create one version of strcat/strcpy/strlen for cortex-a15/krait and another version for cortex-a9. Tested with the libc_test strcat/strcpy/strlen tests. Including new tests that verify that the src for strcat/strcpy do not overread across page boundaries. NOTE: The handling of unaligned strcpy (same code in strcat) could probably be optimized further such that the src is read 64 bits at a time instead of the partial reads occurring now. strlen improves slightly since it was recently optimized. Performance improvements for strcpy and strcat (using an empty dest string): cortex-a9 - Small copies vary from about 5% to 20% as the size gets above 10 bytes. - Copies >= 1024, about a 60% improvement. - Unaligned copies, from about 40% improvement. cortex-a15 - Most small copies exhibit a 100% improvement, a few copies only improve by 20%. - Copies >= 1024, about 150% improvement. - Unaligned copies, about 100% improvement. krait - Most small copies vary widely, but on average 20% improvement, then the performance gets better, hitting about a 100% improvement when copies 64 bytes of data. - Copies >= 1024, about 100% improvement. - When coping MBs of data, about 50% improvement. - Unaligned copies, about 90% improvement. As strcat destination strings get larger in size: cortex-a9 - about 40% improvement for small dst strings (>= 32). - about 250% improvement for dst strings >= 1024. cortex-a15 - about 200% improvement for small dst strings (>=32). - about 250% improvement for dst strings >= 1024. krait - about 25% improvement for small dst strings (>=32). - about 100% improvement for dst strings >=1024. Change-Id: Ifd091ebdbce70fe35a7c5d8f71d5914255f3af35
2013-07-15 21:49:26 +02:00
$(call libc-add-cpu-variant-src,STRCAT,arch-arm/cortex-a9/bionic/strcat.S)
Create arch specific versions of strcmp. This uses the new strcmp.a15.S code as the basis for new versions of strcmp.S. The cortex-a15 code is the performance optimized version of strcmp.a15.S taken with only the addition of a few pld instructions. The cortex-a9 code is the same as the cortex-a15 code except that the unaligned strcmp code was taken from the original strcmp.S. The krait code is the same as the cortex-a15 code except that one path in the unaligned strcmp code was taken from the original strcmp.S code (the 2 byte overlap case). The generic code is the original unmodified strmp.S from the bionic subdirectory. All three new versions underwent these test cases: Strings the same, all same size: - Both pointers double word aligned. - One pointer double word aligned, one pointer word aligned. - Both pointers word aligned. - One pointer double word aligned, one pointer 1 off a word alignment. - One pointer double word aligned, one pointer 2 off a word alignment. - One pointer double word aligned, one pointer 3 off a word alignment. - One pointer word aligned, one pointer 1 off a word alignment. - One pointer word aligned, one pointer 2 off a word alignment. - One pointer word aligned, one pointer 3 off a word alignment. For all cases where it made sense, the two pointers were also tested swapped. Different strings, all same size: - Single difference at double word boundary. - Single difference at word boudary. - Single difference at 1 off a word alignment. - Single difference at 2 off a word alignment. - Single difference at 3 off a word alignment. Different sized strings, strings the same until the end: - Shorter string ends on a double word boundary. - Shorter string ends on word boundary. - Shorter string ends at 1 off a word boundary. - Shorter string ends at 2 off a word boundary. - Shorter string ends at 3 off a word boundary. For all different cases, run them through the same pointer alignment cases when the strings are the same size. For all cases the two pointers were also tested swapped. Bug: 8005082 Change-Id: I5f3dc02b48afba2cb9c13332ab45c828ff171a1c
2013-03-09 01:50:31 +01:00
$(call libc-add-cpu-variant-src,STRCMP,arch-arm/cortex-a9/bionic/strcmp.S)
Optimize strcat/strcpy, small tweaks to strlen. Create one version of strcat/strcpy/strlen for cortex-a15/krait and another version for cortex-a9. Tested with the libc_test strcat/strcpy/strlen tests. Including new tests that verify that the src for strcat/strcpy do not overread across page boundaries. NOTE: The handling of unaligned strcpy (same code in strcat) could probably be optimized further such that the src is read 64 bits at a time instead of the partial reads occurring now. strlen improves slightly since it was recently optimized. Performance improvements for strcpy and strcat (using an empty dest string): cortex-a9 - Small copies vary from about 5% to 20% as the size gets above 10 bytes. - Copies >= 1024, about a 60% improvement. - Unaligned copies, from about 40% improvement. cortex-a15 - Most small copies exhibit a 100% improvement, a few copies only improve by 20%. - Copies >= 1024, about 150% improvement. - Unaligned copies, about 100% improvement. krait - Most small copies vary widely, but on average 20% improvement, then the performance gets better, hitting about a 100% improvement when copies 64 bytes of data. - Copies >= 1024, about 100% improvement. - When coping MBs of data, about 50% improvement. - Unaligned copies, about 90% improvement. As strcat destination strings get larger in size: cortex-a9 - about 40% improvement for small dst strings (>= 32). - about 250% improvement for dst strings >= 1024. cortex-a15 - about 200% improvement for small dst strings (>=32). - about 250% improvement for dst strings >= 1024. krait - about 25% improvement for small dst strings (>=32). - about 100% improvement for dst strings >=1024. Change-Id: Ifd091ebdbce70fe35a7c5d8f71d5914255f3af35
2013-07-15 21:49:26 +02:00
$(call libc-add-cpu-variant-src,STRCPY,arch-arm/cortex-a9/bionic/strcpy.S)
$(call libc-add-cpu-variant-src,STRLEN,arch-arm/cortex-a9/bionic/strlen.S)
include bionic/libc/arch-arm/generic/generic.mk