Commit Graph

681 Commits

Author SHA1 Message Date
John Kariuki
5eeb33f69c ec: add AVX512 ec functions with 5 and 6 outputs
Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.

Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
2019-11-19 10:12:14 -07:00
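For readers unfamiliar with the operation, the following is a minimal scalar sketch of a GF(2^8) vector dot product that produces several outputs per pass over the sources. It is not the AVX512 code from this commit. The names `gf256_mul` and `gf256_dot_prod_multi`, the flat coefficient layout, and the choice of field polynomial 0x11d are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two GF(2^8) elements bit by bit.
 * Assumption: reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
	uint8_t p = 0;

	while (b) {
		if (b & 1)
			p ^= a;	/* addition in GF(2^8) is XOR */
		/* multiply a by x, reducing when the top bit falls off */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return p;
}

/* Hypothetical helper: compute ndest dot products in one pass.
 * dest[j][i] = XOR over k of coef[j * nsrc + k] * src[k][i] */
static void gf256_dot_prod_multi(size_t len, size_t nsrc, size_t ndest,
				 const uint8_t *coef,
				 const uint8_t *const *src,
				 uint8_t *const *dest)
{
	for (size_t i = 0; i < len; i++) {
		for (size_t j = 0; j < ndest; j++) {
			uint8_t acc = 0;

			for (size_t k = 0; k < nsrc; k++)
				acc ^= gf256_mul(coef[j * nsrc + k], src[k][i]);
			dest[j][i] = acc;
		}
	}
}
```

Producing 5 or 6 outputs at a time means each source byte is loaded once and reused for every output, which is the motivation for the wider variants.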
Samuel Lee
4785428d2f crc: arm64 implementation tweaks
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
  + Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and move
them into a shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
  + Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B

+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).

Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
2019-11-13 10:58:19 -07:00
Greg Tucker
0a8d05a81e doc: Move arch-dependent build instructions to readme
Removed the redundant parts that apply to all arch.

Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-11-01 15:55:44 -07:00
Hang Li
02a86dfb3f erasure_code: modify eor way in aarch64 neon codes
Change-Id: I9fb9219c5f280ed88194ec63234af046a5a036ae
Signed-off-by: Hang Li <lihang48@hisilicon.com>
2019-11-01 15:31:33 -07:00
Jerry Yu
ce9e56054a igzip: implement deflate hash with assembly
Change-Id: I39b3a37cd291c40f597750839c27db2a6a571fe5
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 14:41:46 -07:00
Jerry Yu
216d0f929b build: fix cross compile issue
Replace hardcoded gcc with $(CC) so that as_filter
works correctly when cross compiling.

Change-Id: I484d5074abdfc80ed5cd14fdd1358274f306bcfd
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:11:05 +08:00
Jerry Yu
5d7724898d build: fix wrong use of the register name
The third parameter must be a 32-bit register. The assembly
put a 64-bit register here, which is wrong.

Change-Id: Iebe17516b555a6a9b94ea7baa4778ad4b9dd0878
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:11:00 +08:00
Jerry Yu
b441659879 multibinary: fix strict-prototype warning
With the -Wstrict-prototypes option, GCC reports this
warning.

Change-Id: Ic2d1adb566ad21deec65c66552e2863254e1376a
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:10:57 +08:00
Jerry Yu
f0104600a0 build: disable clang support in ci
- Disable clang test for travis and drone.io
- Add document about compiler requirement

Change-Id: I81f8dc31088d40f315dd4ec062bed5df8ab7b633
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:10:50 +08:00
Zhiyuan Zhu
6b70da5051 igzip: implement set_long_icf_fg with assembly
Change-Id: I21ac55985a56c2b7b0a684934c076600d90f8b0a
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-31 11:02:54 -07:00
Greg Tucker
4ed944c4b1 build: Fix travis osx issue with brew update
Bug in Homebrew auto-update causes post-update install to use the old
environment.

Change-Id: I03e20d899f558f71579dfd4be3f96903b77f1998
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-30 11:16:49 -07:00
Hang Li
621cf92c52 erasure_code: modify perf benchmark loop
Change-Id: Ie45ceb3ac55ab943a155e2a3f9f6b765cd94d7a1
Signed-off-by: Hang Li <lihang48@hisilicon.com>
2019-10-30 10:34:40 -07:00
Greg Tucker
2f9eef537c build: Fix autoconf build for mingw target
Change-Id: Ie5ae17556f8cc95af8e59c8bd81a958c94455cd1
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 15:53:14 -07:00
Greg Tucker
e6848434ae test: Fix issue keeping mingw tests from running
Change-Id: I1e72ed99c2f09cbad488774313cddafdb1ce5de8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 15:52:48 -07:00
Greg Tucker
533ba53f11 crc: Fix symbol conflict with older assemblers
Change-Id: I6f1322a5fecdf21b2c774454cd51cb56767f30b8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 14:39:44 -07:00
Zhou Xiong
d7848c1d05 Implement aarch64 neon for erasure code.
1. Route the erasure code interfaces below to Arm NEON implementations through the mbin_interface function.
	ec_encode_data
	gf_vect_mul
	gf_vect_dot_prod
	gf_vect_mad
	ec_encode_data_update

2. Utilise Arm NEON instructions to accelerate GF(2^8) computation using 128-bit registers.

Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4
Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>
2019-10-25 11:09:03 -07:00
Jun He
c680d3aba7 Add arm64 to Travis matrix
Enable new arm64 architecture in TravisCI, add tests for
following compilers:
gcc: v5.4.0
clang: v3.8.0

Change-Id: Id0b2f2231fabcbeff7061f85050db99df12c9a67
Signed-off-by: Jun He <jun.he@arm.com>
2019-10-24 10:09:19 +08:00
Greg Tucker
5f698e9e41 doc: Update mailing list link
Change-Id: I57fdf1ab4ca9f57c11f361c873094c5c22dc5410
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-16 17:13:54 -07:00
Greg Tucker
66cff99954 doc: Remove non-extern headers and add treeview
Change-Id: Icee001e66d48f7a47b36ded5550c66832f81a4cc
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-16 17:13:54 -07:00
Bernd Schubert
d32d3f6902 Make variables in ec_base.h (file) static
ec_base.h has several variables, which were defined with
a global scope. Exactly those global variables caused issues
on linking a static compilation of libisal.a to a shared lib.
Adding -fPIC to CFLAGS somehow didn't help.
As all the variables in ec_base.h are only included
and used by a single C file, all of these can be
(file) static, which also helps the compiler to
make further optimizations. It also solves the issue
of linking the static libisal to a shared lib.

Also make the variables const, as these are constants and
must not be modified.

Change-Id: I2b8141dabc1c7a528401f2778cdbdbed6c93c36b
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
2019-10-11 15:39:56 -07:00
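The pattern described in this commit can be sketched as follows. This is an illustration only: the header name, the table `sketch_pow2_tbl`, and the accessor `sketch_pow2` are hypothetical and are not taken from ec_base.h.

```c
/* tables_sketch.h (hypothetical header, included by a single .c file) */
#ifndef TABLES_SKETCH_H
#define TABLES_SKETCH_H

/* `static` gives the table internal linkage, so no externally visible
 * symbol is emitted for it; `const` marks it read-only so the compiler
 * can place it in read-only data and fold constant lookups. */
static const unsigned char sketch_pow2_tbl[8] = {
	1, 2, 4, 8, 16, 32, 64, 128
};

/* Small accessor; the index is masked to stay inside the table. */
static unsigned char sketch_pow2(unsigned int i)
{
	return sketch_pow2_tbl[i & 7];
}

#endif /* TABLES_SKETCH_H */
```

Without `static`, a table defined in a header becomes a global definition in every object that includes it, which is the kind of symbol the commit message reports as troublesome at link time.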
Zhiyuan Zhu
f3993f5c0b crc: Fix dynamic relocation link failure on Arm
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.

[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'

Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 15:37:29 -07:00
Zhiyuan Zhu
be4d035227 igzip: Optimize isal update histogram with arm64
Change-Id: I944f9497d990e831de5e066055a21ea7e8d6693b
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 09:59:47 -07:00
Zhiyuan Zhu
290456231c igzip: Implement deflate icf body/finish with assembly
Change-Id: I40e4a9be2ae654c881460056de9730176d3d097c
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 09:59:40 -07:00
Jerry Yu
f3bb041799 igzip: Implement deflate body/finish with assembly
Change-Id: I556af7976294f31abd72ac49366f7259e3baf399
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-10-11 09:59:30 -07:00
Greg Tucker
fae4c3a499 Update release notes for v2.28 additions
Change-Id: Id295d5e615712f41d67d1130d5bcab1abed4c29f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-17 11:01:17 -07:00
Greg Tucker
36502ec33b build: Bump revision to 2.28
Change-Id: I57443be6b0f6dff6129943cd6e1508d73bc1aa80
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-17 10:43:53 -07:00
Greg Tucker
600b6d8d99 crc: Add new ecma_norm
Change-Id: I7747bfdca24bcd604c3eb118e7f1bcd98b2b6211
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
121bc635c9 crc: Add new jones_norm
Change-Id: I66118baeec2a1d63423c74edc3aa20a3e8955c6e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
ed528bb2ad crc: Add new iso_norm
Change-Id: If0b05d1a1029b02842935c5c43966d81c59fbbca
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
ea4cbf0ffa crc: Add new ecma_refl
Change-Id: Ifef4f8c6ce7da328b0cc03040b17e7443febf44d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
42bbc5a37e crc: Add new jones_refl
Change-Id: Ia4837b9125bce4e38ef6bae0a8c852d02e9b0bf2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
5c546ecddf crc: Add new arch CRC
Change-Id: I31d3a7e61eeed9d13a0cadd6d1ed25b0dbb39415
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
7a28c83879 test: Increase size of crc tests and simplify output
Change-Id: Ia0418b7889e591a0164c335e273caff263cdf640
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-14 16:01:28 -07:00
Greg Tucker
ae3c91ab85 build: Set assembler feature level in std make
Also fix multibinary to try each available arch

Change-Id: Icd8496d169665bded478a33a02e739d1f8349b6f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-14 16:01:28 -07:00
Roy Oursler
198b026a55 build: Add multi-binary checking for new arch
Change-Id: I8bb8d9e9ae28987ee583976871ff84ee205bdbdc
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-09-14 16:01:28 -07:00
Roy Oursler
e4b8f164ae build: Setup as_feature_level
Change-Id: I7443058c577cf8eafe10acc2b2bfdfe76e2ce264
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-09-14 16:01:28 -07:00
Roy Oursler
d3caab9c3a build: Avoid requiring AVX512 define when using dispatch functions
Change-Id: I76af2d6ab7eb61ae531bbc7427650d08737c20ab
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-09-14 16:01:28 -07:00
Greg Tucker
1ba280fa09 igzip: Fix and clarify a few code issues in the cli tool
Fixes a few scan build hits. A few are false positives such as a missed free but
better to clarify the code in this case. Others such as calling no-null
functions are made explicit.

Change-Id: Icb001a2bf7024dbaa4b4c87089eda818de830c78
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-04 14:39:01 -07:00
Jerry Yu
5f45f3f310 igzip: Optimize adler32 with arm neon
Change-Id: I9b8932eb02ed6bc44756f6505e7efbfad1706b46
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-08-29 10:11:06 +08:00
Jerry Yu
a2005c1fd6 igzip: enable multibinary interfaces
- Add dispatcher layer
- Alias functions with assembly

Change-Id: I84da1be539d890db0df64e5ea989b2fd1f276949
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-08-29 10:08:58 +08:00
Jerry Yu
183385f02f multibinary: Add run-time cpu feature detect for aarch64
Some CPUs report an "illegal instruction" error in the crc test because
they do not support the relevant optional feature. This can be fixed by
introducing CPU feature detection for AArch64.

The difference from the x86 implementation is the dispatcher. It is based
on the glibc calls `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`, not on
registers or instructions.

On a heterogeneous system (big.LITTLE), it is dangerous to detect CPU
features using identification registers. And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms. Thus we use the HWCAPs
exported from the kernel (and visible in getauxval) as the solution.

- According to the kernel documentation, getauxval should be used for this purpose.
  - [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to the AAPCS, result/parameter registers should be saved and restored across function calls.
  - [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
  - [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)

Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
2019-08-26 17:58:42 +08:00
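A minimal sketch of HWCAP-based selection, assuming Linux on AArch64, is shown below. It is not the dispatcher added by this commit: the enum, the function names, and the preference order are hypothetical. The two bit positions are the arm64 `HWCAP_PMULL` and `HWCAP_CRC32` values exported by the kernel, redefined locally so the block also compiles on other platforms (where it simply falls back to the base implementation).

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP */
#endif

/* arm64 HWCAP bits as exported by the Linux kernel (local copies). */
#define SKETCH_HWCAP_PMULL (1UL << 4)
#define SKETCH_HWCAP_CRC32 (1UL << 7)

/* Hypothetical set of implementations a dispatcher could pick from. */
enum crc_impl {
	CRC_IMPL_BASE,		/* portable C fallback */
	CRC_IMPL_CRC32_EXT,	/* uses the CRC32 extension */
	CRC_IMPL_PMULL		/* uses polynomial multiply */
};

/* Pure selection logic: map a HWCAP word to an implementation.
 * Assumption: PMULL is preferred over the CRC32 extension. */
static enum crc_impl choose_crc_impl(unsigned long hwcap)
{
	if (hwcap & SKETCH_HWCAP_PMULL)
		return CRC_IMPL_PMULL;
	if (hwcap & SKETCH_HWCAP_CRC32)
		return CRC_IMPL_CRC32_EXT;
	return CRC_IMPL_BASE;
}

/* Query the kernel-provided capabilities instead of reading
 * identification registers directly. */
static enum crc_impl detect_crc_impl(void)
{
#if defined(__linux__) && defined(__aarch64__)
	return choose_crc_impl(getauxval(AT_HWCAP));
#else
	return CRC_IMPL_BASE;
#endif
}
```

Keeping the bit test in a separate pure function makes the selection logic testable on any host, while only `detect_crc_impl` touches the platform-specific call.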
Jerry Yu
0c22fcd3e2 build: fix compile break for unsupported CPUs
Building with Makefile.unx on unsupported CPUs fails with
"undefined reference" errors. Fix it by adding the base alias files
to the sources list.

Change-Id: I9fbdeee7cb82edc9d5d8461bee3f648be83feaa6
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-08-23 17:28:22 +08:00
Jun He
a95292aa01 ci: add drone.io for arm64 verification
Change-Id: Ib357be80e7e9d7c0ab62433ee5fda4b962592553
Signed-off-by: Jun He <jun.he@arm.com>
2019-08-19 11:21:10 -07:00
Jun He
b721db98e5 igzip: optimize convert_dist_to_dist_sym to branchless
convert_dist_to_dist_sym uses a long if/else chain to map the look-back distance.
The distance calculation is regular within each distance range, so it can
be replaced with a branchless version.

Change-Id: I4e1e5170f8b3238631f3048087f95acc53e4498e
Signed-off-by: Jun He <jun.he@arm.com>
2019-08-13 11:02:53 +08:00
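The idea can be illustrated with a small sketch that maps a DEFLATE look-back distance (1 to 32768) to its distance code (0 to 29) without an if/else chain. This is not the code from this commit: the function name `dist_to_dist_sym_sketch` is hypothetical, and the use of `__builtin_clz` assumes GCC or Clang.

```c
#include <stdint.h>

/* Map a DEFLATE distance in [1, 32768] to its distance code in [0, 29].
 * For d = dist - 1 >= 2 the code is 2 * msb(d) plus the bit just below
 * the most significant bit; d = 0 and d = 1 map to themselves. */
static uint32_t dist_to_dist_sym_sketch(uint32_t dist)
{
	uint32_t d = dist - 1;

	/* All-ones when d < 2, zero otherwise (mask instead of a branch). */
	uint32_t small = 0u - (uint32_t)(d < 2);

	/* Force bit 1 on for the small case so that msb >= 1 and the
	 * shift below is always well defined; larger d is unchanged. */
	uint32_t dd = d | (small & 2u);

	uint32_t msb = 31u - (uint32_t)__builtin_clz(dd);
	uint32_t regular = 2u * msb + ((dd >> (msb - 1u)) & 1u);

	/* Select d itself for the small case, the formula otherwise. */
	return (d & small) | (regular & ~small);
}
```

For example, distances 5 and 6 both map to code 4, and the largest distance 32768 maps to code 29, matching the distance code table in RFC 1951.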
Greg Tucker
e2997062fb igzip: Optimize routine to find msb
Change-Id: I40e7898e2139c04f261980ca10886debc917842a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-08-12 14:28:33 -07:00
Greg Tucker
4b33238371 Update travis with more nasm builds
Change-Id: I78b48f80d22ea811a9ed2e3a537e8dfa0350c8c5
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-16 16:09:16 -07:00
Greg Tucker
38f4880a4e build: Set nasm as the default when using std makefile
Also test the assembler for modern instruction support and set appropriate
defines.

Change-Id: I1628abd50b3babeeb7e010b86bda7ea97de0e6fb
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-16 15:47:20 -07:00
Greg Tucker
4ac0e435eb ec: Fix incorrect min size stated for gf_vect_mad
Change-Id: If178913f01f0d500aa66ce0e8dd67aaba49a0871
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-16 15:41:34 -07:00
Zhiyuan Zhu
c80610a2bb crc: push the aarch64 crc optimization back to base functions
Some arm64 machines don't support pmull instructions, so set these
crc interfaces to the base functions. As a long-term solution, we will
provide better multi-binary support with CPU feature detection.

Change-Id: I02791a2a50283dc8df2f9ba124eb309912b5b4b7
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-07-16 07:18:54 +00:00
Greg Tucker
236fdcc28f Update travis with xenial builds and new indent
Xenial has a minimum nasm version but no longer min indent.

Change-Id: I3ec70b9d5be932e903b77fd07d23667746c6c9f8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-10 13:10:20 -07:00