+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
+ Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and move
them into a shared header, crc_common_pmull.h.
+ Use the same human-readable register aliases and register allocation
in all 4 implementations, and never refer to a register without its
human-readable alias.
+ Use #defines rather than .req so that the same names can be used
across several implementations.
+ Reduce tail case size from 1024B to 64B
+ Phrase the `eor` instructions in the main loop to show more clearly
that pairs of `eor` instructions can be replaced with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).
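
As an illustration of the `pmull`/`pmull2` and `eor3` points above, here is a
minimal C model of one folding step. It is a sketch under assumptions, not the
assembly from this change: the names `vec128`, `clmul64`, `pmull_lo`,
`pmull_hi`, `eor3` and `fold_step` are hypothetical, and the carry-less
multiply is emulated in software. It shows why keeping both constants in one
register removes the need for `dup`: `pmull` multiplies the low lanes and
`pmull2` the high lanes of the same two registers.

```c
#include <stdint.h>

/* A 128-bit ASIMD register modelled as two 64-bit lanes (hypothetical type). */
typedef struct { uint64_t lo, hi; } vec128;

/* Software carry-less (polynomial) multiply: 64 x 64 -> 128 bits. */
static vec128 clmul64(uint64_t a, uint64_t b)
{
    vec128 r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i)                      /* avoid an undefined shift by 64 */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Model of pmull: multiplies the low lanes of both operands. */
static vec128 pmull_lo(vec128 x, vec128 k) { return clmul64(x.lo, k.lo); }

/* Model of pmull2: multiplies the high lanes of both operands. */
static vec128 pmull_hi(vec128 x, vec128 k) { return clmul64(x.hi, k.hi); }

/* Model of eor3: three-way XOR, equivalent to two chained eor instructions. */
static vec128 eor3(vec128 a, vec128 b, vec128 c)
{
    vec128 r = { a.lo ^ b.lo ^ c.lo, a.hi ^ b.hi ^ c.hi };
    return r;
}

/* One folding step: both constants live in the single register k. */
static vec128 fold_step(vec128 acc, vec128 k, vec128 data)
{
    return eor3(pmull_lo(acc, k), pmull_hi(acc, k), data);
}
```

In this model `fold_step` needs only one constant register `k`; without
`pmull2`, the high lane would first have to be moved or duplicated into a low
lane before it could be multiplied.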
Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.
[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'
Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Some CPUs report an "illegal instruction" error for the crc test because
they do not support the relevant optional feature. This can be fixed by
introducing CPU feature detection for AArch64.
The difference from the x86 implementation is the dispatcher. It is based
on the glibc calls `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`, not
on registers or instructions.
On a heterogeneous system (big.LITTLE), it is dangerous to detect CPU
features using identification registers. And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms. Thus we use the HWCAPs
exported from the kernel (and visible through getauxval) as the solution.
- According to the kernel's recommendation, getauxval should be used for this purpose.
- [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to AAPCS, result/parameter registers should be saved and restored across the function call.
- [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
- [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)
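
A minimal sketch of HWCAP-based dispatch as described above. The names
`SKETCH_HWCAP_PMULL`, `crc_impl`, `select_crc_impl` and `detect_crc_impl` are
hypothetical and not taken from this change. The bit value mirrors
`HWCAP_PMULL` from the arm64 Linux kernel headers and is redefined locally
only so the sketch also compiles on other targets, where it falls back to
the base implementation.

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc) */
#endif

/* Assumption: same bit as HWCAP_PMULL in the arm64 kernel's hwcap.h. */
#define SKETCH_HWCAP_PMULL (1UL << 4)

/* Hypothetical selector for the implementation the dispatcher picks. */
enum crc_impl { CRC_IMPL_BASE, CRC_IMPL_PMULL };

/* Pure decision logic, kept separate so it can be checked with any mask. */
static enum crc_impl select_crc_impl(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_PMULL) ? CRC_IMPL_PMULL : CRC_IMPL_BASE;
}

/* Query the kernel-exported capabilities; the bit is only meaningful on AArch64. */
static enum crc_impl detect_crc_impl(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return select_crc_impl(getauxval(AT_HWCAP));
#else
    return CRC_IMPL_BASE;
#endif
}
```

Because the capability mask comes from the kernel rather than from the core
the process happens to be running on, the result is the same on every core
of a big.LITTLE system.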
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
Some arm64 machines don't support pmull instructions, so set these
crc interfaces to the base functions. As a long-term solution, better
multi-binary support with CPU feature detection will be provided.
Change-Id: I02791a2a50283dc8df2f9ba124eb309912b5b4b7
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Reason: Ceph directly copied some code from isal, which causes
a conflict when SPDK applications use isal-lib
(configured with '--with-isal') and also use Ceph
(configured with '--with-rbd').
Change-Id: I9f58412a68af76f8e29219a9c72cd44b9183033d
Signed-off-by: Jesse Hui <Chunyang.hui@intel.com>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Merge the crc32_gzip_refl function definitions, base code, and multi-binary
code into crc32.h, crc32_base.c and crc_multibinary.asm in order to
keep consistency. Add the crc32_gzip_refl files to crc/Makefile.am.
The original crc32_gzip_refl removed the NOT operation; re-add it.
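
To illustrate the NOT operation mentioned above, here is a bit-at-a-time
sketch of a reflected gzip-style CRC32. It assumes that "NOT operation" means
the usual gzip/zlib convention of inverting the CRC on entry and on exit. The
function name `crc32_gzip_refl_sketch` is hypothetical, and this is not the
table-driven base code from this change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: reflected CRC32 with the gzip polynomial 0xEDB88320. */
static uint32_t crc32_gzip_refl_sketch(uint32_t init_crc,
                                       const unsigned char *buf, size_t len)
{
    uint32_t crc = ~init_crc;               /* NOT on entry */

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)   /* process one bit, LSB first */
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;                            /* NOT on exit */
}
```

With both inversions in place, a call with `init_crc` of 0 over the ASCII
string "123456789" yields the standard CRC-32 check value 0xCBF43926, and the
result of one call can be passed as `init_crc` of the next to checksum a
buffer in pieces.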
Change-Id: Ib0cbbeb1ab3c9fcafec324b392596d2514202424
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
1. Add normal and reflected bit-order functions for the ISO format and
the Jones coefficients format.
2. Add a multi-binary macro for crc64 functions.
3. To reduce the number of repeated test.c and perf.c files,
use crc64_funcs_test.c and crc64_funcs_perf.c.
4. Add crc64_example.c as a demonstration.
Change-Id: Icb8c14f1a84cd98f58eb12206ca605dea8a2cefb
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
crc64_ecma_norm is used for the normal format.
crc64_ecma_refl is used for the reflected format.
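
The difference between the normal and reflected formats can be sketched with
two bit-at-a-time kernels for the ECMA-182 polynomial. The function names
`crc64_norm_sketch` and `crc64_refl_sketch` are hypothetical; the kernels take
the raw initial value and apply no inversion, so any seed or final XOR
convention is left to the caller. This is an illustration of the two bit
orders, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define ECMA_POLY_NORM 0x42F0E1EBA9EA3693ULL  /* ECMA-182 polynomial, MSB first */
#define ECMA_POLY_REFL 0xC96C5795D7870F42ULL  /* same polynomial, bit-reversed */

/* Normal format: bytes enter at the top of the register, shift left. */
static uint64_t crc64_norm_sketch(uint64_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)buf[i] << 56;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000000000000000ULL) ? (crc << 1) ^ ECMA_POLY_NORM
                                                : crc << 1;
    }
    return crc;
}

/* Reflected format: bytes enter at the bottom of the register, shift right. */
static uint64_t crc64_refl_sketch(uint64_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ ECMA_POLY_REFL : crc >> 1;
    }
    return crc;
}
```

The two kernels use the same polynomial but produce different checksums for
the same input, which is why separate norm and refl entry points exist.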
Change-Id: I8fa8aad48ed995ea7edcdb8e123e1a5f1a1f01ad
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Allows configure to again build in an external directory. When building ISAL in
an external path, the assembler and compiler need relative include paths.
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Greg Tucker <greg.b.tucker@intel.com>