micro-optimizations: vpcmpeqb+vpmaskmov is faster than vptest according
to uops.info; make usually untaken branches target forward.
reduce numbers of data dependant branches and code size.
Change-Id: Ie70b4bc99685368e5131f23344348bfaf7c27d3e
Signed-off-by: Nicola Torracca <shark@bitchx.it>
The file types.h has long been misnamed and overlaps with
functionality in the test helper routines.
Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
To support Intel CET, all indirect branch targets must start with
ENDBR32/ENDBR64. Here is a patch to define endbranch and add it to
function entries in x86 assembly codes which are indirect branch
targets as discovered by running testsuite on Intel CET machine and
visual inspection.
Verified with
$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
$ make -j8 check
with both nasm and yasm on both CET and non-CET machines.
Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Previously windows build could only use yasm because some procedural items such
as proc_start were not supported by nasm. This adds a few macros and fixes so
nasm can be used to build on windows.
Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
1) Implement the ErasureCode function in Altivec Intrinsics
2) Coding style update
Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
Some CPUs report "illegal instruction" error for the crc test because
they do not support the relevant optional feature . This can be fixed by
introducing CPU feature detection for AArch64 .
The difference with the x86 implementation is the dispatcher . It is based
on the glibc function `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)` , not
registers or instructions .
On a heterogeneous system (big.LITTLE) , it is dangerous to detect CPU
features using identification registers . And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms . Thus we use the HW_CAPs
exported from the kernel (and visible in getauxval) as the solution.
- According to kernel suggestion , getauxval should be used for this purpose .
- [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to AAPCS result/paramter registers should be saved/restore for function call
- [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
- [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
If an application treats these functions as function pointers, and this
lib (isa-l) is compiled into solib, a segmentation fault may occur.
For example: Ubuntu 16.04 on arm64 platfrom will be crash, because the
linker does not know that this symbol is a function, so mark the function
type explicitly with %function to solves this issue.
Change-Id: Iba41b1f1367146d7dcce09203694b08b1cb8ec20
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Mingw does not define WORDSIZE and incorrect int width was used.
Change-Id: Idc9f560dd1c722d51f6e54ba2342feafa13f8fa5
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
This patch introduces the base, avx and sse optimized zero detect memory function.
The zero detect memory function tests if a memory region is all zeroes. If all the
bytes in the memory region are zero, the function return a zero. Otherwise, if the
memory region has non zero bytes, the zero detect function returns a 1.
Change-Id: If965badf750377124d0067d09f888d0419554998
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>