316 Commits

Author SHA1 Message Date
Christopher Ferris
2fc0717977 Add new optimized strlen for arm.
This optimized version is primarily targeted at cortex-a15.

Tested on all nexus devices using the system/extras/libc_test strlen test.
Tested alignments from 1 to 32 that are powers of 2.
Tested that strlen does not cross page boundaries at all alignments.

Speed improvements listed below:

cortex-a15
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~250% improvement.

cortex-a9
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~85% improvement.

krait
- Sizes >= 32 bytes, ~95% improvement.
- Sizes >= 1024 bytes, ~160% improvement.

Change-Id: I361b1a36ed89ab991f2a8f0abbf0d7416d39c8f5
2013-07-15 12:37:51 -07:00
Elliott Hughes
be438a4c40 am fac9199c: am ebc8ce1d: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit 'fac9199c7698481805dd9b1adaf89a2584719f4c':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-03 10:28:19 -07:00
Elliott Hughes
fac9199c76 am ebc8ce1d: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit 'ebc8ce1de68a83d772106af98c7cb98150bb5662':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-03 10:23:41 -07:00
Will Newton
2753e12af5 libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.

This patch is a follow up the existing gerrit change:

I7f6f77995f3ca903ad9c66d14261441667a2a935

This version includes a tweak for performance on misaligned
buffers and splits the header comment into license and
documentation sections.

Change-Id: Ibd2e23c8d8e01357ba0247be1d05192de3ceba69
Signed-off-by: Will Newton <will.newton@linaro.org>
2013-07-03 10:20:43 -07:00
Christopher Ferris
c6ac3ae269 am 269daac2: am 7c14d67b: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit '269daac2f1d76a478b83ba4cbb57d28b47eef5ec':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-01 10:39:08 -07:00
Christopher Ferris
269daac2f1 am 7c14d67b: Merge "libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings."
* commit '7c14d67bc1cc2679365a784e68518bf602b81dc7':
  libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
2013-07-01 10:32:17 -07:00
Will Newton
b61103dff4 libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.
This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.

This patch is a follow up the existing gerrit change:

I7f6f77995f3ca903ad9c66d14261441667a2a935

But this version includes a tweak for performance on misaligned
buffers.

Change-Id: I285abac0068f8ae29a1cbf7862ea8590aadaf0a7
Signed-off-by: Will Newton <will.newton@linaro.org>
2013-07-01 11:15:27 +01:00
Rom Lemarchand
6937468d37 am baa61864: am 995f17e6: Merge "libc: add swapon and swapoff syscalls"
* commit 'baa61864c515a56d4dbeac46b149b4317b01797b':
  libc: add swapon and swapoff syscalls
2013-06-25 17:02:06 -07:00
Rom Lemarchand
baa61864c5 am 995f17e6: Merge "libc: add swapon and swapoff syscalls"
* commit '995f17e6a9a9903f03f542192da9a83b1cabc684':
  libc: add swapon and swapoff syscalls
2013-06-25 15:28:21 -07:00
Rom Lemarchand
d206b560e7 libc: add swapon and swapoff syscalls
Change-Id: Ie79dc8e3f2ff1cd427dd6d95e3850920c4b407b0
Signed-off-by: Rom Lemarchand <romlem@google.com>
2013-06-25 13:18:03 -07:00
Ben Cheng
d20a04c5cf am 77f90de7: am fc104f89: Merge "Fix abort(3) to raise SIGABRT rather than causing SIGSEGV."
* commit '77f90de728b9fa60b83b7f12a45c1113f3189cb2':
  Fix abort(3) to raise SIGABRT rather than causing SIGSEGV.
2013-06-10 17:28:46 -07:00
Ben Cheng
77f90de728 am fc104f89: Merge "Fix abort(3) to raise SIGABRT rather than causing SIGSEGV."
* commit 'fc104f899d47916f76c91127caf9aeaf7b69d4ef':
  Fix abort(3) to raise SIGABRT rather than causing SIGSEGV.
2013-06-10 17:25:31 -07:00
Ben Cheng
7e6ce1a3c5 Fix abort(3) to raise SIGABRT rather than causing SIGSEGV.
tgkill() needs the .save stack unwinding directive to get the complete
stack trace.

BUG: https://code.google.com/p/android/issues/detail?id=16672

Change-Id: Ifb447dca2147a592c48baf32769dfc175d8aea72
2013-06-10 17:17:46 -07:00
Ben Cheng
72ce296f28 am 404d491e: Merge "Use bl instead of blx to support interworking properly."
* commit '404d491eb655839bf4260cc168bb79864473e129':
  Use bl instead of blx to support interworking properly.
2013-06-01 08:19:07 -07:00
Ben Cheng
a123b5d319 Use bl instead of blx to support interworking properly.
(cherry picked from commit 9e1905794b4ecd8f7b87d8e4e2f954c8cfc6beda in
master)

Change-Id: I9b8c35ea9e201e00f84315f9f105013c23c94d85
2013-05-31 14:39:23 -07:00
Ben Cheng
9e1905794b Use bl instead of blx to support interworking properly.
BUG: 9227177
Change-Id: I742c2f2ecbe332f9c9743e3f4bde8de791a1d289
2013-05-31 14:25:48 -07:00
Erik Gilling
d5234a3b08 am 4c8eba6f: am 2e317075: Merge "libc/arm: add cortex-a8 cpu variant"
* commit '4c8eba6f2aaf351e29881ca4dc2ec47fc0246446':
  libc/arm: add cortex-a8 cpu variant
2013-05-16 13:20:53 -07:00
Erik Gilling
4c8eba6f2a am 2e317075: Merge "libc/arm: add cortex-a8 cpu variant"
* commit '2e317075b044e94fc75e36d08bec8a7eb5fc31ae':
  libc/arm: add cortex-a8 cpu variant
2013-05-16 13:19:07 -07:00
Rom Lemarchand
22bda4bd67 libc/arm: add cortex-a8 cpu variant
Change-Id: I30e8dd6d4b2e7889aea8f5ed21182a5941bfb489
2013-05-15 20:13:28 -07:00
Elliott Hughes
562804ff87 am f0f4fa3f: Merge "libc: add timerfd calls"
* commit 'f0f4fa3fb1ea8623b1e1bc59f7967e0470c8e532':
  libc: add timerfd calls
2013-05-14 14:59:16 -07:00
Todd Poynor
4200e6203a libc: add timerfd calls
(cherry-pick of 04c0ac14a49e0969333008a9522b64046d58fbdc.)

Change-Id: I06d0b6c2a8781602362b81f48faf1cca76b9ec05
2013-05-14 14:45:02 -07:00
Todd Poynor
04c0ac14a4 libc: add timerfd calls
Change-Id: Id63b907266d5b87c7422a51d393a1430551ca33d
2013-05-13 12:06:15 -07:00
Christopher Ferris
4d8fe5177e Tune the memcpy for krait.
Streamline the memcpy a bit removing some unnecessary instructions.

The biggest speed improvement comes from changing the size of
the preload. On krait, the sweet spot for the preload in the main
loop is twice the L1 cache line size.

In most cases, these small tweaks yield > 1000MB/s speed ups. As
the size of the memcpy approaches about 1MB, the speed improvement
disappears.

Change-Id: Ief79694d65324e2db41bee4707dae19b8c24be62
2013-05-02 14:04:31 -07:00
Andrew Hsieh
83966db80b am f7153fd1: Merge "Remove redundant space within square brackets"
* commit 'f7153fd13f469e9ba5aecbfa00fde42530ca2124':
  Remove redundant space within square brackets
2013-04-25 21:22:26 -07:00
Andrew Hsieh
e8f46e8edd Remove redundant space within square brackets
The new "as" in binutils-2.23 (with gcc4.8) is more picky:
it expects register right after [

Change-Id: I876124841582070ab2083ffafe38bc333b5812d0
2013-04-25 15:05:03 +08:00
Christopher Ferris
39e4ed9699 am 516a8970: Merge "Rewrite memset for cortexa15 to use strd."
* commit '516a89705378f43646678e75924529404e52b613':
  Rewrite memset for cortexa15 to use strd.
2013-04-12 12:30:22 -07:00
Christopher Ferris
796cbe249b Rewrite memset for cortexa15 to use strd.
Merge from internal master.

(cherry-picked from commit 7ffad9c120054eedebd5f56f8bed01144e93eafa)

Change-Id: Ia67f2a545399f4fa37b63d5634a3565e4f5482f9
2013-04-12 10:58:25 -07:00
Christopher Ferris
101dadf6a6 am fc76c7d3: Merge "Add missing branch in memcpy.S dst aligned case."
* commit 'fc76c7d394ebe0e585777955efadf7cc8ed86636':
  Add missing branch in memcpy.S dst aligned case.
2013-04-10 17:37:49 -07:00
Christopher Ferris
3fe5b10948 am 68fd78ef: Merge "Update to latest cortexa15 memcpy code."
* commit '68fd78efa05fc61adfbdeadeb757caa45663570c':
  Update to latest cortexa15 memcpy code.
2013-04-10 17:37:49 -07:00
Christopher Ferris
bf0d1ad72b Add missing branch in memcpy.S dst aligned case.
Merge from internal master.

(cherry-picked from commit 6ffaa931c362602a2b606a610c92326a425a876e)

Change-Id: Ifdcf01fd122866cf0d4c5b5f7a997803561d7889
2013-04-10 17:21:29 -07:00
Christopher Ferris
185ce72d00 Update to latest cortexa15 memcpy code.
This uses the new code original submitted as memcpy.a15.S as
the base. However, the old code handled unaligned src/dst better
so that was spliced in. I optimized the original unaligned code by
removing a few unnecessary instructions. I optimized the a15 code by
rewriting the pre and post code. I also modified the main loop to add
a pld so that larger copies would not stall waiting for memory.

Test cases for the new memcpy:

- Copy all sized values from 0 to 1024 bytes, using whatever alignment
  is returned by malloc.
For each alignment case described below, the test copied from 0 to 128
bytes.
- Src and dst pointers are both aligned to the same value, starting
  at one going through every power of two up to and including 128.
- Src aligned to double word boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to double word boundary.
- Src aligned to 16 bit boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to 16 byte boundary.
- Src aligned to word boundary, dst aligned to 1 byte from a word
  boundary.
- Src aligned to word boundary, dst aligned to 2 bytes from a word
  boundary.
- Src aligned to word boundary, dst aligned to 3 bytes from a word
  boundary.
- Src aligned to 1 byte from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 2 bytes from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 3 bytes from a word boundary, dst aligned to a word
  boundary.

Cases to verify the unaligned source code properly aligns to a 16 bit
boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  4 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  8 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  12 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  16 + 128 bit boundary.

In all cases, a two byte fencepost was placed at the end of the
destination to verify that only the requested number of bytes were copied.

Bug: 8005082

Merge from internal master.

(cherry-picked from commit 21ede92d794969f22cacbdb9f557818f1c5712b5)

Change-Id: Ief70c9e6dc8c6473ae245b6570b2c266fed9618c
2013-04-08 18:13:35 -07:00
Dima Zavin
369f92349f Merge "libc/arm: add cortex-a7 cpu variant" into jb-mr2-dev 2013-03-25 19:42:28 +00:00
Dima Zavin
0c973d7049 libc/arm: add cortex-a7 cpu variant
Change-Id: I541d665805ea69ca96bb6a5f4d50e56287f8c08c
Signed-off-by: Dima Zavin <dima@android.com>
2013-03-23 01:38:22 -07:00
Elliott Hughes
cda62094ef Use the correct names for the __ARM_NR_* syscalls.
This lets us move all the ARM syscall stubs over to the kernel <asm/unistd.h>.
Our generated <sys/linux-syscalls.h> is now unused, but I'll remove that in a
later change.

Change-Id: Ie5ff2cc4abce1938576af7cbaef615a79c7f310d
2013-03-22 13:53:43 -07:00
Elliott Hughes
8794ece296 Replace unnecessary ARM uses of <sys/linux-syscalls.h> with <asm/unistd.h>.
For some reason, socketcalls.c was only being compiled for ARM, where
it makes no sense. For x86 we generate stubs for the socket functions
that use __NR_socketcall directly.

Change-Id: I84181e6183fae2314ae3ed862276eba82ad21e8e
2013-03-21 23:07:11 -07:00
Elliott Hughes
5c2772f59d The SYS_ constants should cover all __NR_ values.
<sys/linux-syscalls.h> only contains constants for the syscalls
we're generating stubs for. We want all the syscalls available
on the architecture in question.

Keep using <sys/linux-syscalls.h> on ARM for now because the
__NR_ARM_set_tls and __NR_ARM_cacheflush values aren't in <asm/unistd.h>.

Change-Id: I66683950d87d9b18d6107d0acc0ed238a4496f44
2013-03-21 22:26:20 -07:00
Elliott Hughes
17a8b0db63 Expose wait4 as wait4 rather than __wait4.
This helps strace(1) compile with one fewer hack.

Change-Id: I5296d0cfec5546709cda990abd705ad33d7c4626
2013-03-21 16:14:06 -07:00
Christopher Ferris
31dea25b8b Create arch specific versions of strcmp.
This uses the new strcmp.a15.S code as the basis for new versions
of strcmp.S.

The cortex-a15 code is the performance optimized version of strcmp.a15.S
taken with only the addition of a few pld instructions.
The cortex-a9 code is the same as the cortex-a15 code except that the
unaligned strcmp code was taken from the original strcmp.S.
The krait code is the same as the cortex-a15 code except that one path
in the unaligned strcmp code was taken from the original strcmp.S code
(the 2 byte overlap case).
The generic code is the original unmodified strmp.S from the bionic
subdirectory.

All three new versions underwent these test cases:

Strings the same, all same size:
- Both pointers double word aligned.
- One pointer double word aligned, one pointer word aligned.
- Both pointers word aligned.
- One pointer double word aligned, one pointer 1 off a word alignment.
- One pointer double word aligned, one pointer 2 off a word alignment.
- One pointer double word aligned, one pointer 3 off a word alignment.
- One pointer word aligned, one pointer 1 off a word alignment.
- One pointer word aligned, one pointer 2 off a word alignment.
- One pointer word aligned, one pointer 3 off a word alignment.
For all cases where it made sense, the two pointers were also tested
swapped.

Different strings, all same size:
- Single difference at double word boundary.
- Single difference at word boudary.
- Single difference at 1 off a word alignment.
- Single difference at 2 off a word alignment.
- Single difference at 3 off a word alignment.

Different sized strings, strings the same until the end:
- Shorter string ends on a double word boundary.
- Shorter string ends on word boundary.
- Shorter string ends at 1 off a word boundary.
- Shorter string ends at 2 off a word boundary.
- Shorter string ends at 3 off a word boundary.

For all different cases, run them through the same pointer alignment
cases when the strings are the same size.
For all cases the two pointers were also tested swapped.

Bug: 8005082

Merge from internal master.

(cherry-picked from commit a9a5870d166f8060a8182cd61e5536b0becea74e)

Change-Id: I4c2b98f8a50804fb98ab67f75e9d660f1315a144
2013-03-20 14:33:54 -07:00
Elliott Hughes
8f2a5a0b40 Clean up internal libc logging.
We only need one logging API, and I prefer the one that does no
allocation and is thus safe to use in any context.

Also use O_CLOEXEC when opening the /dev/log files.

Move everything logging-related into one header file.

Change-Id: Ic1e3ea8e9b910dc29df351bff6c0aa4db26fbb58
2013-03-15 16:12:58 -07:00
Elliott Hughes
ec706c24ac Merge "Use the kernel's MAX_ERRNO in the syscall stubs." 2013-03-13 00:44:33 +00:00
Elliott Hughes
9aceab5015 Use the kernel's MAX_ERRNO in the syscall stubs.
Bug: http://code.google.com/p/android/issues/detail?id=53104
Change-Id: Iaabf7025b153e96dc5eca231a33a32d4cb7d8116
2013-03-12 17:43:58 -07:00
Christopher Ferris
04954a43b3 Break bionic implementations into arch versions.
Move arch specific code for arm, mips, x86 into separate
makefiles.
In addition, add different arm cpu versions of memcpy/memset.

Bug: 8005082

Merge from internal master (acdde8c1cf8e8beed98c052757d96695b820b50c).

Change-Id: I04f3d0715104fab618e1abf7cf8f7eec9bec79df
2013-03-12 14:06:08 -07:00
Ben Cheng
14283004f5 Add stack unwinding directives to memcpy.
Also include some Android specific header files.

Change-Id: Idbcbd43458ba945ca8c61bfbc04ea15fc0ae4e00
2013-03-01 14:56:04 -08:00
Greta Yorsh
eb149e954e Adding strcmp tuned for Cortex-A15.
The attached patch provides a new implementation of strcmp for ARM,
using LDRD instead of LDR whenever possible.

For older architectures that do not support LDRD, this implementation
uses the same algorithm as before.

Testing and benchmarking:
* Validation: successfully passes a test that compares different strings
of length 1-128 and offsets 0-8 from a word boundary. Checked on
qemu/A15/A9, ARM/Thumb mode, Big/Little Endian.
* Integration with gcc: no regression on qemu for arm-none-eabi --with-cpu
a15/a9 --with-mode arm/thumb.

Change-Id: I9e230e1b99dbdc9119b69ee858a89038c516a4ea
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>
2013-03-01 10:41:01 +00:00
Greta Yorsh
5b349fc22e Adding memcpy tuned for Cortex-A15.
The strategy for large block sizes is LDRD and STRD with offset addressing,
where the main loop copies 64 bytes in every iteration, (i.e., 8 calls to
LDRD and STRD pairs), interleaving load and stores (i.e., the pairs of LDRD
and STRD of the same data are consecutive instructions), and the writeback
of an updated address is a separate instruction, which allows us to write
back the accumulated update once per iteration.

This strategy is implemented in memcpy.S. In some configurations, a plain
version of memcpy (included from memcpy-stub.c) is used instead of the
optimized one.

Validation:
* Correctness: checked memcpy using a test harness for block sizes
ranging between 1 to 128, and source and destination buffers alignment
ranging in { 0,1,2,3,4,8,12 } bytes each.
* Performance: benchmarking on Cortex-A15 FPGA indicates that this strategy
is better for A15 than the strategy used by glibc and even slightly better
than using NEON. Benchmarking on Cortex-A9 bare metal and Linux shows
that the proposed strategy is reasonable: not as fast as the version of
memcpy from glibc (which is the best open source strategy for A9), but
comparable with csl and bionic.
* Integration with GCC: no regression for arm-none-eabi --with-cpu
cortex-a15 and cortex-a9.

Change-Id: Ied56354d8992c62ae3e02d582a2bd55585d814b9
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>
2013-03-01 10:40:50 +00:00
Elliott Hughes
40eabe24e4 Fix the pthread_setname_np test.
Fix the pthread_setname_np test to take into account that emulator kernels are
so old that they don't support setting the name of other threads.

The CLONE_DETACHED thread is obsolete since 2.5 kernels.

Rename kernel_id to tid.

Fix the signature of __pthread_clone.

Clean up the clone and pthread_setname_np implementations slightly.

Change-Id: I16c2ff8845b67530544bbda9aa6618058603066d
2013-02-15 12:08:59 -08:00
Elliott Hughes
6719500dbd Add a bunch more missing ENDs to assembler routines.
This isn't everything; I've missed out those x86 files that are

Change-Id: Idb7bb1a68796d6c0b70ea2b5c3300e49da6c62d2
2013-02-13 15:12:32 -08:00
Elliott Hughes
73964c592c Everyone has CLZ.
Even armv5 had CLZ.

Change-Id: I51bc8d1166d09940fd0d3f4c7717edf26977082c
2013-02-13 14:40:48 -08:00
Elliott Hughes
9f878c2fca Really set errno if __pthread_clone fails.
If r0 == 0, we're the child. If r0 > 0, we're the parent.
Otherwise set errno.

The __bionic_clone code I copy & pasted was wrong. This patch
fixes both.

Bug: 3461078
Change-Id: Ibb7d6cc7e54e666841f2f0dc59a141a0b31982e4
2013-02-12 16:07:06 -08:00
Elliott Hughes
d7a3a403c1 Use ENTRY/END in ARM __get_sp.
Change-Id: If2f159b266f5fa4ad9d188a17d4cd318b605e446
2013-02-11 16:58:34 -08:00