434 Commits

Author SHA1 Message Date
Christopher Ferris
b922ed3498 Fix strcpy.c that should have been strcpy.S. DO NOT MERGE
Merge from internal master.

(cherry-picked from 1ce665416307628f4bcaced86faa64bdf9c489c3)

Change-Id: I376b831df42248baadde7202a30a68112f752ff7
2013-08-08 12:09:37 -07:00
Christopher Ferris
4e24dcc8d8 Optimize strcat/strcpy, small tweaks to strlen. DO NOT MERGE
Create one version of strcat/strcpy/strlen for cortex-a15/krait and another
version for cortex-a9.

Tested with the libc_test strcat/strcpy/strlen tests.
Including new tests that verify that the src for strcat/strcpy do not
overread across page boundaries.

NOTE: The handling of unaligned strcpy (same code in strcat) could probably
be optimized further such that the src is read 64 bits at a time instead of
the partial reads occurring now.

strlen improves slightly since it was recently optimized.

Performance improvements for strcpy and strcat (using an empty dest string):

cortex-a9
- Small copies vary from about 5% to 20% as the size gets above 10 bytes.
- Copies >= 1024, about a 60% improvement.
- Unaligned copies, from about 40% improvement.

cortex-a15
- Most small copies exhibit a 100% improvement, a few copies only
  improve by 20%.
- Copies >= 1024, about 150% improvement.
- Unaligned copies, about 100% improvement.

krait
- Most small copies vary widely, but on average 20% improvement, then
  the performance gets better, hitting about a 100% improvement when
  copies 64 bytes of data.
- Copies >= 1024, about 100% improvement.
- When coping MBs of data, about 50% improvement.
- Unaligned copies, about 90% improvement.

As strcat destination strings get larger in size:

cortex-a9
- about 40% improvement for small dst strings (>= 32).
- about 250% improvement for dst strings >= 1024.

cortex-a15
- about 200% improvement for small dst strings (>=32).
- about 250% improvement for dst strings >= 1024.

krait
- about 25% improvement for small dst strings (>=32).
- about 100% improvement for dst strings >=1024.

Merge from internal master.

(cherry-picked from d119b7b6f48fe507088cfb98bcafa99b320fd884)

Change-Id: I296463b251ef9fab004ee4dded2793feca5b547a
2013-08-08 11:13:46 -07:00
Christopher Ferris
7c860db074 Optimize __memset_chk, __memcpy_chk.
This change creates assembler versions of __memcpy_chk/__memset_chk
that is implemented in the memcpy/memset assembler code. This change
avoids an extra call to memcpy/memset, instead allowing a simple fall
through to occur from the chk code into the body of the real
implementation.

Testing:

- Ran the libc_test on __memcpy_chk/__memset_chk on all nexus devices.
- Wrote a small test executable that has three calls to __memcpy_chk and
  three calls to __memset_chk. First call dest_len is length + 1. Second
  call dest_len is length. Third call dest_len is length - 1.
  Verified that the first two calls pass, and the third fails. Examined
  the logcat output on all nexus devices to verify that the fortify
  error message was sent properly.
- I benchmarked the new __memcpy_chk and __memset_chk on all systems. For
  __memcpy_chk and large copies, the savings is relatively small (about 1%).
  For small copies, the savings is large on cortex-a15/krait devices
  (between 5% to 30%).
  For cortex-a9 and small copies, the speed up is present, but relatively
  small (about 3% to 5%).
  For __memset_chk and large copies, the savings is also small (about 1%).
  However, all processors show larger speed-ups on small copies (about 30% to
  100%).

Bug: 9293744

Change-Id: I8926d59fe2673e36e8a27629e02a7b7059ebbc98
2013-08-06 15:38:29 -07:00
Christopher Ferris
1ce6654163 Fix strcpy.c that should have been strcpy.S.
Change-Id: Ib4609baad3a14c8b0f37556269781fa2b06916dc
2013-08-05 17:08:06 -07:00
Christopher Ferris
e1857431e8 Merge "Optimize strcat/strcpy, small tweaks to strlen." 2013-08-05 23:32:06 +00:00
Ben Cheng
b78f43579f am aa2733d1: Merge "Update the comments to reflect the current status."
* commit 'aa2733d17b87c607fccbd6e6a0f44d2d411ffd77':
  Update the comments to reflect the current status.
2013-08-02 17:58:47 -07:00
Elliott Hughes
d0313e7a2a am a6ed05c1: Merge "libgcc_compat: Introduce __aeabi_lasr for cortex-a9 and higher"
* commit 'a6ed05c1c4c787241b56df132e77512c64cbc595':
  libgcc_compat: Introduce __aeabi_lasr for cortex-a9 and higher
2013-08-02 17:58:46 -07:00
Ben Cheng
772b797b7b Update the comments to reflect the current status.
Change-Id: I3a6348b568230fe8b21d121e5b8d30561a9703c2
2013-08-02 15:53:18 -07:00
Christopher Ferris
d119b7b6f4 Optimize strcat/strcpy, small tweaks to strlen.
Create one version of strcat/strcpy/strlen for cortex-a15/krait and another
version for cortex-a9.

Tested with the libc_test strcat/strcpy/strlen tests.
Including new tests that verify that the src for strcat/strcpy do not
overread across page boundaries.

NOTE: The handling of unaligned strcpy (same code in strcat) could probably
be optimized further such that the src is read 64 bits at a time instead of
the partial reads occurring now.

strlen improves slightly since it was recently optimized.

Performance improvements for strcpy and strcat (using an empty dest string):

cortex-a9
- Small copies vary from about 5% to 20% as the size gets above 10 bytes.
- Copies >= 1024, about a 60% improvement.
- Unaligned copies, from about 40% improvement.

cortex-a15
- Most small copies exhibit a 100% improvement, a few copies only
  improve by 20%.
- Copies >= 1024, about 150% improvement.
- Unaligned copies, about 100% improvement.

krait
- Most small copies vary widely, but on average 20% improvement, then
  the performance gets better, hitting about a 100% improvement when
  copies 64 bytes of data.
- Copies >= 1024, about 100% improvement.
- When coping MBs of data, about 50% improvement.
- Unaligned copies, about 90% improvement.

As strcat destination strings get larger in size:

cortex-a9
- about 40% improvement for small dst strings (>= 32).
- about 250% improvement for dst strings >= 1024.

cortex-a15
- about 200% improvement for small dst strings (>=32).
- about 250% improvement for dst strings >= 1024.

krait
- about 25% improvement for small dst strings (>=32).
- about 100% improvement for dst strings >=1024.

Change-Id: Ifd091ebdbce70fe35a7c5d8f71d5914255f3af35
2013-08-02 10:31:51 -07:00
synergydev
efddf44c8e libgcc_compat: Introduce __aeabi_lasr for cortex-a9 and higher
This is needed when passing -mcpu=cortex-a9 or higher on a modern
toolchain for prebuilt library compatibility

Change-Id: I73eb2393377914ae26216a8c2828ad973d1c1225
2013-07-29 16:55:08 -07:00
Christopher Ferris
7ff868a630 am f63c28f0: Merge "Fix assembler errors in generic arm strlen.c."
* commit 'f63c28f0338fd647e88f1f9300b2220093af1aae':
  Fix assembler errors in generic arm strlen.c.
2013-07-16 17:22:05 -07:00
Christopher Ferris
b1d7fd4969 am 6f4fed74: Merge "Add new optimized strlen for arm."
* commit '6f4fed74cb9405c0f5322307085d15afed6be764':
  Add new optimized strlen for arm.
2013-07-16 17:21:55 -07:00
Christopher Ferris
9ad2a73ed6 Fix assembler errors in generic arm strlen.c.
Tested using a static version of the strlen libc_test program
on a nexus7 that uses the generic code.

Merge from internal master.

(cherry-picked from d8d10a8994472e40d19301b7087806630877b4d5)

Change-Id: I88f7dc01dc5b5c3ac2d5580d92153bc1bc36c564
2013-07-16 16:47:54 -07:00
Christopher Ferris
0aa9b52efa Add new optimized strlen for arm.
This optimized version is primarily targeted at cortex-a15.

Tested on all nexus devices using the system/extras/libc_test strlen test.
Tested alignments from 1 to 32 that are powers of 2.
Tested that strlen does not cross page boundaries at all alignments.

Speed improvements listed below:

cortex-a15
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~250% improvement.

cortex-a9
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~85% improvement.

krait
- Sizes >= 32 bytes, ~95% improvement.
- Sizes >= 1024 bytes, ~160% improvement.

Merge from internal master.

(cherry-picked from 2fc071797743b88a9a47427d46baed7c7b24f4d2)

Change-Id: I1ceceb4e745fd68e9d946f96d1d42e0cdaff6ccf
2013-07-16 16:47:37 -07:00
Elliott Hughes
62d6b7526a am 2a18ea14: am f152e386: Merge "EABI syscall cleanup."
* commit '2a18ea1462cf65cc51bfcb1a1c46972ee5af1d01':
  EABI syscall cleanup.
2013-07-16 15:36:11 -07:00
Elliott Hughes
2a18ea1462 am f152e386: Merge "EABI syscall cleanup."
* commit 'f152e386fcf477f3f5de9dc020c3660d4f9c4b81':
  EABI syscall cleanup.
2013-07-16 15:31:39 -07:00
Elliott Hughes
da4a3e6515 EABI syscall cleanup.
We cleaned up the auto-generated ones a while back to not touch
the stack unnecessarily if they have <= 4 arguments. This patch
cleans up some hand-crafted ones.

Also improve comments in clone.S.

Change-Id: I8850bf98f2b26829385315304472a760e6880ed8
2013-07-16 11:52:24 -07:00
Christopher Ferris
d8d10a8994 Fix assembler errors in generic arm strlen.c.
Tested using a static version of the strlen libc_test program
on a nexus7 that uses the generic code.

Change-Id: If04d15dcb6c0b18f27f2fefadca5510ed49016c5
2013-07-15 13:56:45 -07:00
Christopher Ferris
2fc0717977 Add new optimized strlen for arm.
This optimized version is primarily targeted at cortex-a15.

Tested on all nexus devices using the system/extras/libc_test strlen test.
Tested alignments from 1 to 32 that are powers of 2.
Tested that strlen does not cross page boundaries at all alignments.

Speed improvements listed below:

cortex-a15
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~250% improvement.

cortex-a9
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~85% improvement.

krait
- Sizes >= 32 bytes, ~95% improvement.
- Sizes >= 1024 bytes, ~160% improvement.

Change-Id: I361b1a36ed89ab991f2a8f0abbf0d7416d39c8f5
2013-07-15 12:37:51 -07:00
Elliott Hughes
be438a4c40 am fac9199c: am ebc8ce1d: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit 'fac9199c7698481805dd9b1adaf89a2584719f4c':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-03 10:28:19 -07:00
Elliott Hughes
fac9199c76 am ebc8ce1d: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit 'ebc8ce1de68a83d772106af98c7cb98150bb5662':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-03 10:23:41 -07:00
Will Newton
2753e12af5 libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.

This patch is a follow up the existing gerrit change:

I7f6f77995f3ca903ad9c66d14261441667a2a935

This version includes a tweak for performance on misaligned
buffers and splits the header comment into license and
documentation sections.

Change-Id: Ibd2e23c8d8e01357ba0247be1d05192de3ceba69
Signed-off-by: Will Newton <will.newton@linaro.org>
2013-07-03 10:20:43 -07:00
Christopher Ferris
c6ac3ae269 am 269daac2: am 7c14d67b: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit '269daac2f1d76a478b83ba4cbb57d28b47eef5ec':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-01 10:39:08 -07:00
Christopher Ferris
269daac2f1 am 7c14d67b: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit '7c14d67bc1cc2679365a784e68518bf602b81dc7':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-01 10:32:17 -07:00
Will Newton
b61103dff4 libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.

This patch is a follow up the existing gerrit change:

I7f6f77995f3ca903ad9c66d14261441667a2a935

But this version includes a tweak for performance on misaligned
buffers.

Change-Id: I285abac0068f8ae29a1cbf7862ea8590aadaf0a7
Signed-off-by: Will Newton <will.newton@linaro.org>
2013-07-01 11:15:27 +01:00
Rom Lemarchand
6937468d37 am baa61864: am 995f17e6: Merge "libc: add swapon and swapoff syscalls"
* commit 'baa61864c515a56d4dbeac46b149b4317b01797b':
  libc: add swapon and swapoff syscalls
2013-06-25 17:02:06 -07:00
Rom Lemarchand
baa61864c5 am 995f17e6: Merge "libc: add swapon and swapoff syscalls"
* commit '995f17e6a9a9903f03f542192da9a83b1cabc684':
  libc: add swapon and swapoff syscalls
2013-06-25 15:28:21 -07:00
Rom Lemarchand
d206b560e7 libc: add swapon and swapoff syscalls
Change-Id: Ie79dc8e3f2ff1cd427dd6d95e3850920c4b407b0
Signed-off-by: Rom Lemarchand <romlem@google.com>
2013-06-25 13:18:03 -07:00
Ben Cheng
d20a04c5cf am 77f90de7: am fc104f89: Merge "Fix abort(3) to raise SIGABRT rather than causing SIGSEGV."
* commit '77f90de728b9fa60b83b7f12a45c1113f3189cb2':
  Fix abort(3) to raise SIGABRT rather than causing SIGSEGV.
2013-06-10 17:28:46 -07:00
Ben Cheng
77f90de728 am fc104f89: Merge "Fix abort(3) to raise SIGABRT rather than causing SIGSEGV."
* commit 'fc104f899d47916f76c91127caf9aeaf7b69d4ef':
  Fix abort(3) to raise SIGABRT rather than causing SIGSEGV.
2013-06-10 17:25:31 -07:00
Ben Cheng
7e6ce1a3c5 Fix abort(3) to raise SIGABRT rather than causing SIGSEGV.
tgkill() needs the .save stack unwinding directive to get the complete
stack trace.

BUG: https://code.google.com/p/android/issues/detail?id=16672

Change-Id: Ifb447dca2147a592c48baf32769dfc175d8aea72
2013-06-10 17:17:46 -07:00
Ben Cheng
72ce296f28 am 404d491e: Merge "Use bl instead of blx to support interworking properly."
* commit '404d491eb655839bf4260cc168bb79864473e129':
  Use bl instead of blx to support interworking properly.
2013-06-01 08:19:07 -07:00
Ben Cheng
a123b5d319 Use bl instead of blx to support interworking properly.
(cherry picked from commit 9e1905794b4ecd8f7b87d8e4e2f954c8cfc6beda in
master)

Change-Id: I9b8c35ea9e201e00f84315f9f105013c23c94d85
2013-05-31 14:39:23 -07:00
Ben Cheng
9e1905794b Use bl instead of blx to support interworking properly.
BUG: 9227177
Change-Id: I742c2f2ecbe332f9c9743e3f4bde8de791a1d289
2013-05-31 14:25:48 -07:00
Erik Gilling
d5234a3b08 am 4c8eba6f: am 2e317075: Merge "libc/arm: add cortex-a8 cpu variant"
* commit '4c8eba6f2aaf351e29881ca4dc2ec47fc0246446':
  libc/arm: add cortex-a8 cpu variant
2013-05-16 13:20:53 -07:00
Erik Gilling
4c8eba6f2a am 2e317075: Merge "libc/arm: add cortex-a8 cpu variant"
* commit '2e317075b044e94fc75e36d08bec8a7eb5fc31ae':
  libc/arm: add cortex-a8 cpu variant
2013-05-16 13:19:07 -07:00
Rom Lemarchand
22bda4bd67 libc/arm: add cortex-a8 cpu variant
Change-Id: I30e8dd6d4b2e7889aea8f5ed21182a5941bfb489
2013-05-15 20:13:28 -07:00
Elliott Hughes
562804ff87 am f0f4fa3f: Merge "libc: add timerfd calls"
* commit 'f0f4fa3fb1ea8623b1e1bc59f7967e0470c8e532':
  libc: add timerfd calls
2013-05-14 14:59:16 -07:00
Todd Poynor
4200e6203a libc: add timerfd calls
(cherry-pick of 04c0ac14a49e0969333008a9522b64046d58fbdc.)

Change-Id: I06d0b6c2a8781602362b81f48faf1cca76b9ec05
2013-05-14 14:45:02 -07:00
Todd Poynor
04c0ac14a4 libc: add timerfd calls
Change-Id: Id63b907266d5b87c7422a51d393a1430551ca33d
2013-05-13 12:06:15 -07:00
Christopher Ferris
4d8fe5177e Tune the memcpy for krait.
Streamline the memcpy a bit removing some unnecessary instructions.

The biggest speed improvement comes from changing the size of
the preload. On krait, the sweet spot for the preload in the main
loop is twice the L1 cache line size.

In most cases, these small tweaks yield > 1000MB/s speed ups. As
the size of the memcpy approaches about 1MB, the speed improvement
disappears.

Change-Id: Ief79694d65324e2db41bee4707dae19b8c24be62
2013-05-02 14:04:31 -07:00
Andrew Hsieh
83966db80b am f7153fd1: Merge "Remove redundant space within square brackets"
* commit 'f7153fd13f469e9ba5aecbfa00fde42530ca2124':
  Remove redundant space within square brackets
2013-04-25 21:22:26 -07:00
Andrew Hsieh
e8f46e8edd Remove redundant space within square brackets
The new "as" in binutils-2.23 (with gcc4.8) is more picky:
it expects register right after [

Change-Id: I876124841582070ab2083ffafe38bc333b5812d0
2013-04-25 15:05:03 +08:00
Christopher Ferris
39e4ed9699 am 516a8970: Merge "Rewrite memset for cortexa15 to use strd."
* commit '516a89705378f43646678e75924529404e52b613':
  Rewrite memset for cortexa15 to use strd.
2013-04-12 12:30:22 -07:00
Christopher Ferris
796cbe249b Rewrite memset for cortexa15 to use strd.
Merge from internal master.

(cherry-picked from commit 7ffad9c120054eedebd5f56f8bed01144e93eafa)

Change-Id: Ia67f2a545399f4fa37b63d5634a3565e4f5482f9
2013-04-12 10:58:25 -07:00
Christopher Ferris
101dadf6a6 am fc76c7d3: Merge "Add missing branch in memcpy.S dst aligned case."
* commit 'fc76c7d394ebe0e585777955efadf7cc8ed86636':
  Add missing branch in memcpy.S dst aligned case.
2013-04-10 17:37:49 -07:00
Christopher Ferris
3fe5b10948 am 68fd78ef: Merge "Update to latest cortexa15 memcpy code."
* commit '68fd78efa05fc61adfbdeadeb757caa45663570c':
  Update to latest cortexa15 memcpy code.
2013-04-10 17:37:49 -07:00
Christopher Ferris
bf0d1ad72b Add missing branch in memcpy.S dst aligned case.
Merge from internal master.

(cherry-picked from commit 6ffaa931c362602a2b606a610c92326a425a876e)

Change-Id: Ifdcf01fd122866cf0d4c5b5f7a997803561d7889
2013-04-10 17:21:29 -07:00
Christopher Ferris
185ce72d00 Update to latest cortexa15 memcpy code.
This uses the new code original submitted as memcpy.a15.S as
the base. However, the old code handled unaligned src/dst better
so that was spliced in. I optimized the original unaligned code by
removing a few unnecessary instructions. I optimized the a15 code by
rewriting the pre and post code. I also modified the main loop to add
a pld so that larger copies would not stall waiting for memory.

Test cases for the new memcpy:

- Copy all sized values from 0 to 1024 bytes, using whatever alignment
  is returned by malloc.
For each alignment case described below, the test copied from 0 to 128
bytes.
- Src and dst pointers are both aligned to the same value, starting
  at one going through every power of two up to and including 128.
- Src aligned to double word boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to double word boundary.
- Src aligned to 16 bit boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to 16 byte boundary.
- Src aligned to word boundary, dst aligned to 1 byte from a word
  boundary.
- Src aligned to word boundary, dst aligned to 2 bytes from a word
  boundary.
- Src aligned to word boundary, dst aligned to 3 bytes from a word
  boundary.
- Src aligned to 1 byte from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 2 bytes from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 3 bytes from a word boundary, dst aligned to a word
  boundary.

Cases to verify the unaligned source code properly aligns to a 16 bit
boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  4 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  8 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  12 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  16 + 128 bit boundary.

In all cases, a two byte fencepost was placed at the end of the
destination to verify that only the requested number of bytes were copied.

Bug: 8005082

Merge from internal master.

(cherry-picked from commit 21ede92d794969f22cacbdb9f557818f1c5712b5)

Change-Id: Ief70c9e6dc8c6473ae245b6570b2c266fed9618c
2013-04-08 18:13:35 -07:00
Dima Zavin
369f92349f Merge "libc/arm: add cortex-a7 cpu variant" into jb-mr2-dev 2013-03-25 19:42:28 +00:00