Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.
Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
+ Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and moved
them into shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
+ Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B
+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).
Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
Removed the redundant parts that apply to all arch.
Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Replace hardcode gcc with $(CC). as_filter
will work correct in cross compile
Change-Id: I484d5074abdfc80ed5cd14fdd1358274f306bcfd
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
The third parameter must be 32bit register . Those assmebly
put 64bit register here , it is wrong .
Change-Id: Iebe17516b555a6a9b94ea7baa4778ad4b9dd0878
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
with -Wstric-prototype option , GCC report the
warning .
Change-Id: Ic2d1adb566ad21deec65c66552e2863254e1376a
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
- Disable clang test for travis and drone.io
- Add document about compiler requirement
Change-Id: I81f8dc31088d40f315dd4ec062bed5df8ab7b633
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Bug in Homebrew auto-update causes post-update install to use the old
environment.
Change-Id: I03e20d899f558f71579dfd4be3f96903b77f1998
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
1.Replace below erasure code interfaces to arm neon interface by mbin_interface function.
ec_encode_data
gf_vect_mul
gf_vect_dot_prod
gf_vect_mad
ec_encode_data_update
2.Utilise arm neon instrution to accelerate GF(2^8) set compute by 128bit registor.
Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4
Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>
Enable new arm64 architecture in TravisCI, add tests for
following compilers:
gcc: v5.4.0
clang: v3.8.0
Change-Id: Id0b2f2231fabcbeff7061f85050db99df12c9a67
Signed-off-by: Jun He <jun.he@arm.com>
ec_base.h has several variables, which were defined with
a global scope. Exactly those global variables caused issues
on linking a static compilation of libisal.a to a shared lib.
Adding -fPIC to CFLAGS somehow didn't help.
As all the variables in ec_base.h are only included
and used by a single C file, all of these can be
(file) static, which then will also helps the compiler to
make further optimizations. And which also solves the issue
to link the static libisal to a shared lib.
Also make the variables const, as these are constants and
must be modified.
Change-Id: I2b8141dabc1c7a528401f2778cdbdbed6c93c36b
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.
[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'
Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Also fix multibinary to try each available arch
Change-Id: Icd8496d169665bded478a33a02e739d1f8349b6f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Fixes a few scan build hits. A few are false positives such as a missed free but
better to clarify the code in this case. Others such as calling no-null
functions are made explicit.
Change-Id: Icb001a2bf7024dbaa4b4c87089eda818de830c78
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
- Add dispatcher layer
- Alias functions with assmebly
Change-Id: I84da1be539d890db0df64e5ea989b2fd1f276949
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Some CPUs report "illegal instruction" error for the crc test because
they do not support the relevant optional feature . This can be fixed by
introducing CPU feature detection for AArch64 .
The difference with the x86 implementation is the dispatcher . It is based
on the glibc function `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)` , not
registers or instructions .
On a heterogeneous system (big.LITTLE) , it is dangerous to detect CPU
features using identification registers . And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms . Thus we use the HW_CAPs
exported from the kernel (and visible in getauxval) as the solution.
- According to kernel suggestion , getauxval should be used for this purpose .
- [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to AAPCS result/paramter registers should be saved/restore for function call
- [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
- [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
Build with Makefile.unx on unsupported CPUs fail . It reports
"undefine references". Fix it with adding base aliases files
into sources list
Change-Id: I9fbdeee7cb82edc9d5d8461bee3f648be83feaa6
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
convert_dist_to_dist_sym uses long if/else branch to get look back distance.
The distance calculation is well formed for each distance range, so it could
be optimized for a branchless version.
Change-Id: I4e1e5170f8b3238631f3048087f95acc53e4498e
Signed-off-by: Jun He <jun.he@arm.com>
Also test the assembler for modern instruction support and set appropriate
defines.
Change-Id: I1628abd50b3babeeb7e010b86bda7ea97de0e6fb
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Some arm64 machines don't support pmull instructions, so set these
crc interface to base functions. For long-term solution, will
provide better multi-binary support with cpu features detection.
Change-Id: I02791a2a50283dc8df2f9ba124eb309912b5b4b7
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Xenial has a minimum nasm version but no longer min indent.
Change-Id: I3ec70b9d5be932e903b77fd07d23667746c6c9f8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>