Don't enable the inlines when building libm itself. Otherwise clang gets
upset by seeing both an inline and a non-inline definition.
This reverts commit c5deb0f883cbdca7e5ab75f92f82c31d21367f49.
Change-Id: If7abdb351f5a5549d6a331b33af408e8fcfa9868
Use of "extern inline" breaks clang build.
This reverts commit d76f16973a9d06765fb1f482239b9559f893ffd0.
Change-Id: I995d0d38c3776f5c50b060f16770741c92a2acac
Also remove cruft meant to support long-obsolete compilers. More
benchmarks.
Bug: http://b/23195789
Change-Id: Ief538e41e77a77e8013b2f4f359584e8df2c47d8
The * flag to printf() wants an int instead of size_t, and these are
distinct types on 64-bit. To accomodate this, make the name column
width helpers return int.
In theory this truncates things, but in practice this only matters if
you have a benchmark with more than INT_MAX characters in its name (in
which case you have bigger problems).
Change-Id: I3338948c25a3a8d84f1ead2f5b457c05da8a01cf
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Changes:
- Modify the benchmarks to derive from a single Benchmark object.
- Rewrite the main iteration code. This includes changing the iteration
code to use the actual total time calculated by the benchmark as a basis
for determining whether there are enough iterations instead of using
the time it takes to run the benchmark.
- Allow benchmarks to take no argument, int, or double.
- Fix the PrettyInt printer for negative integers.
- Modify the max column width name to include the whole name including
the arg part.
- Reformat property_benchmark.cpp in line with the rest of the code.
- Modify a few of the math benchmarks to take an argument instead of
separate benchmarks for the same function with different args.
- Create a vector of regex_t structs to represent the args all at
once instead of when running each benchmark.
This change is in preparation for adding new math based benchmarks.
Tested by running on a nexus flo running at max using the new code
and the old code and comparing. All of the numbers are similar, but
some of the iterations are different due to the slightly different
algorithm used.
Change-Id: I57ad1f3ff083282b9ffeb72e687cab369ce3523a
This test reports the overhead of sem_post to sem_wake for a low thread count
and a high thread count.
Change-Id: Ic30dcc8a78d754979117446bf3a28b7575cabac7
This test tries its best to report the producer side underlying futex
wake syscall overhead wthin sem_post. It does not measure the time it
takes for the wakeup to propagate to the consumer. It suffers from
clock_gettime syscall overhead, so subtract that. Lock the CPU speed
for consistent results as we may not reach >50% cpu utilization.
Change-Id: I02fa9dab2e6ac27202f0290115150bd3c8de00f2
The old __isthreaded hack was never very useful on Android because all user
code runs in a VM where there are lots of threads running. But __fsetlocking
lets a caller say "I'll worry about the locking for this FILE*", which is
useful for the normal case where you don't share a FILE* between threads
so you don't need any locking.
Bug: 17154740
Bug: 18593728
Change-Id: I2a8dddc29d3edff39a3d7d793387f2253608a68d
Also switch throughput to GiB/s. I did play with using the new code,
but having consistent units for all results seemed easier to use
anyway (and doesn't require extra code).
Change-Id: I466fd573373bd05619e6f6e6d3dedd7dae0d9362
The benchmark run loop tries to run until 1s of time has been
accumulated between StartBenchmarkTiming() and StopBenchmarkTiming().
If a majority of the time is spent stopped this can lead to
benchmarks running for very long periods of time. This can easily
happen when benchmarking something that requires initialization or
cleanup on each iteration.
Modify the loop to run for 1s of real time instead of 1s of
benchmark time. For existing benchmarks this shouldn't make much
of a difference.
Change-Id: Iaba8a13b4dfc4a5e2cd9992041c9173ea556f9cc
Benchmarks for the following sequences:
1) pthread_rwlock_rdlock -> pthread_rwlock_unlock
2) pthread_rwlock_wrlock -> pthread_rwlock_unlock
Change-Id: I8d87d4d8afab8637ea7ff5d23a0b3a81d6d40835
In practice, with this implementation we never need to make a system call.
We get the main thread's tid (which is the same as our pid) back from
the set_tid_address system call we have to make during initialization.
A new pthread will have the same pid as its parent, and a fork child's
main (and only) thread will have a pid equal to its tid, which we get for
free from the kernel before clone returns.
The only time we'd actually have to make a getpid system call now is if
we take a signal during fork and the signal handler calls getpid. (That,
or we call getpid in the dynamic linker while it's still dealing with its
own relocations and hasn't even set up the main thread yet.)
Bug: 15387103
Change-Id: I6d4718ed0a5c912fc75b5f738c49a023dbed5189
This patch fixes the ARM64 ABI for libm. fenv_t is now split in 32bit status
and 32bit control. This mirrors the AArch64 FPU control and status
registers (FPCR, FPSR).
The patch also refactors the libm implementation for ARM64 into a finer
grained control over the FPU registers.
Bionic-benchmarks has been expanded with 3 more benchmarks for floating
point operations. The new libm implementation for ARM64 performs better
over all the math benchmarks available.
Change-Id: I2a7f81d6b4e55c91f8a63a4c69614fc8b1bcf2db
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
System calls can be pretty slow. This is mako, which has one of our
lowest latencies:
iterations ns/op
BM_unistd_getpid 10000000 209
BM_unistd_gettid 200000000 8
Bug: 15297299 (kernel panic from too many gettid calls)
Bug: 15315766 (excessive gettid overhead in liblogd)
Change-Id: I49656c0fc5b5d092390264a59e4f2c0d8a8b1aeb