Commit Graph

664 Commits

Author SHA1 Message Date
Jerry Yu
6c4d3dbf6c crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first
To reduce cache misses, the interleaved layout is changed to
PMULL first, then CRC. This also shortens the final delay caused
by data dependency.
As a result, cold performance improved by about 20% and warm
performance improved by about 4%.

Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-16 20:38:41 +08:00
Jerry Yu
92fc8733fa crc32: Fix prototype mismatch bug
Change-Id: I7c8a2348441f32a43ff386122612405e418d9947
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-10 00:46:41 +00:00
Jerry Yu
9bcd6768fd crc32:Adjust hardware folding algorithm flags
The hardware folding algorithm depends on both the CRC32 and PMULL
instructions, so it should be selected only when both flags match.

Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:50:15 +08:00
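The point of commit 9bcd6768fd, that the folding path needs both features rather than either one, comes down to how the capability mask is tested. A minimal sketch, assuming hypothetical flag names and bit positions (`FEAT_PMULL`, `FEAT_CRC32`) and hypothetical helpers (`has_all_features`, `can_use_hw_folding`); none of these are taken from the library's source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; the names and positions are assumptions. */
#define FEAT_PMULL (UINT32_C(1) << 4)
#define FEAT_CRC32 (UINT32_C(1) << 7)

/* True only when every bit in `required` is also set in `caps`. */
static bool has_all_features(uint32_t caps, uint32_t required)
{
    return (caps & required) == required;
}

/* Hypothetical dispatcher check: folding needs CRC32 and PMULL together. */
static bool can_use_hw_folding(uint32_t caps)
{
    return has_all_features(caps, FEAT_CRC32 | FEAT_PMULL);
}
```

A test written as `caps & (FEAT_CRC32 | FEAT_PMULL)` would also pass when only one of the two bits is set, which is the kind of mismatch such a fix guards against.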
Jerry Yu
0033f42189 crc32:Optimize crc32/c for cortex-a72
Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:49:38 +08:00
Greg Tucker
5e586843eb build: Change ms nmake default to nasm and add pdb gen
The nmake default is changed for a modern nasm. Older nasm and yasm versions
will still work with windows but the nmake options must be changed appropriately
for max AS_FEATURE_LEVEL to match. Also now generates debug symbol pdb files.

Change-Id: I94a2dd7ecf541c6564ccbd4a184c33995d7b31ad
Signed-off-by: Poornima Kumar <poornima.kumar@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-31 22:55:27 +00:00
Jerry Yu
a2fc2c000d crc32:Add optimization implementation for Neoverse N1
This patch is based on the algorithm in reference (1), with some changes:
- Redefine the block number to two.
  - Only two pipelines can be used for CRC32 calculation.
- Redefine the block size:
  - The block size is 1536B for CRC and 512B for PMULL.
- Interleave CRC and PMULL instructions.
The optimization parameters are calculated based on reference (2).

References:
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
- https://developer.arm.com/docs/swog309707/a

Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00
Jerry Yu
f2cf2609cd multi-binary:Add microarchitecture id reader
This patch provides microarchitecture information
and makes microarchitecture-specific optimization possible.
Reading the ID traps into the kernel because of the mrs
instruction, so it should be called only in the dispatcher,
which runs only once in the program lifecycle. HWCAP must
also match, which ensures there are no illegal
instruction errors.

Change-Id: I393ec742010bf3f10ce335482c0350aa4202c788
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00
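Commit f2cf2609cd concerns reading the CPU's ID register so the dispatcher can pick a microarchitecture-specific routine. As a rough illustration, the sketch below decodes a MIDR_EL1 value that has already been read; it does not execute `mrs` itself, so it runs on any host. The names (`midr_fields`, `decode_midr`, `is_neoverse_n1`) are hypothetical and not the library's API. The field layout, the Arm implementer code (0x41) and the Neoverse N1 part number (0xD0C) follow Arm's published register description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of MIDR_EL1 (hypothetical struct name). */
struct midr_fields {
    unsigned implementer;  /* bits [31:24] */
    unsigned variant;      /* bits [23:20] */
    unsigned architecture; /* bits [19:16] */
    unsigned part_num;     /* bits [15:4]  */
    unsigned revision;     /* bits [3:0]   */
};

/* Split an already-read MIDR_EL1 value into its fields. */
static struct midr_fields decode_midr(uint64_t midr)
{
    struct midr_fields f;
    f.implementer  = (unsigned)((midr >> 24) & 0xff);
    f.variant      = (unsigned)((midr >> 20) & 0xf);
    f.architecture = (unsigned)((midr >> 16) & 0xf);
    f.part_num     = (unsigned)((midr >> 4) & 0xfff);
    f.revision     = (unsigned)(midr & 0xf);
    return f;
}

/* Hypothetical predicate a dispatcher could use to select an N1 routine. */
static bool is_neoverse_n1(uint64_t midr)
{
    struct midr_fields f = decode_midr(midr);
    return f.implementer == 0x41 && f.part_num == 0xd0c;
}
```

Keeping the decode separate from the register read means the privileged access can happen once in the dispatcher, as the commit message recommends, while the comparison logic stays testable on any machine.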
Jerry Yu
85f947e120 ci: remove unused drone configuration
Change-Id: I20bded8111deb122757dbf259d17cd80010c2bb6
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-27 16:16:00 -07:00
Greg Tucker
af13ed6136 ec: Fix second windows reg push for avx512
Change improper stack push in windows prolog.  Error was not reachable without
windows nasm support and so went undetected.

Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-20 12:36:58 -07:00
Greg Tucker
ede04f0a1f build: Fix for windows to allow nasm use
Previously windows build could only use yasm because some procedural items such
as proc_start were not supported by nasm.  This adds a few macros and fixes so
nasm can be used to build on windows.

Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
5ab40c79cc ec: Fix windows reg push for avx512
Push of registers overlapped xmm push.  Error was not reachable without windows
nasm support and so went undetected.

Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
472e7011e8 ec: Change use of windows macro save_xmm128 to vec
For builds under windows this could emit a non-vec mov that's not optional for
AVX versions.

Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:04:54 -07:00
Greg Tucker
7c0ab1d459 build: Add auto regenerate of nmake file
Change-Id: Icaa64aa35697c87779df18c3941d3df0f3256546
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-10 14:00:05 -07:00
Greg Tucker
794413ddd2 ec: Remove arch-specific redundant gf_nvect tests
The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect
functions were not strictly called by the higher level ec_encode_data() and
needed independent testing.  As this has now changed the extra tests can be
removed as redundant.

Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-06 13:45:59 -07:00
Greg Tucker
806b55ee57 build: Bump revision to 2.29
Change-Id: I78cfa77864f3fd77c3b63199bc18fd1782fe3dc2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-26 18:29:49 -07:00
Greg Tucker
2db2cd557c Update release notes for v2.29 additions
Change-Id: Id9ba5da760ee60dbb1de47162e6276f522bc0850
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-26 12:04:18 -07:00
Greg Tucker
6136a04bbe crc: Add new vclmul version of crc16_t10dif
Change-Id: Ic068f35d5d8c34b74128b7a2ea8e82f5fa693c28
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 19:54:19 -07:00
Greg Tucker
5ef6eb5c68 crc: Add new vclmul version of crc32_ieee
Change-Id: Ib761e3240d8252ce84e9abeadb568dce60742717
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
25a673d75a crc: Add new vclmul version of gzip_refl
Change-Id: I8050853dcd177f4fb506f32f5fa723f7a1d3cded
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
4217930338 crc: Add vec version of crc16_t10dif_copy
Change-Id: I5f73e8a38efd1ff50d30a39689d9d85da702e809
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
02a41e0653 crc: Add vec version of crc32_ieee when avx avail
Change-Id: I5542ee93156c26f5a23feb89b82f4c51f282777d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
d4131bb3d3 crc: Add vec version of crc32_gzip_refl when avx avail
Change-Id: I4a069c318c809dcd21a6ebc47d3e0d1c131599ea
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
ad22a90686 crc: Add vec version of crc16 when avx available
Vec versions mix much better with other avx code.

Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Hong Bo Peng
180c74aefd enable VSX SIMD in ISA-L for ppc64le
1) Implement the ErasureCode function in Altivec Intrinsics
2) Coding style update

Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
2020-02-20 09:40:43 -07:00
Zhang Jinde
a3d5cd8642 igzip: Fix clang error on dep generation
Clang errors when generating dependencies due to a stray semicolon following a
function definition.

Change-Id: Iefb4aca988b643bb62a69bbbaf197aca20a2d085
Signed-off-by: Zhang Jinde <zjd5536@163.com>
2020-01-17 10:25:32 -07:00
Zhang Jinde
163b6cd934 igzip: Fix for deflate logic buffer management
Fixes invalid logic that attempted to eliminate unnecessary copy of input to the
history buffer in cases where it is not required. Correction should improve
performance and not change functionality.

Change-Id: Ife24dcc9d920ce220b1a394031e971321737a171
Signed-off-by: Zhang Jinde <zjd5536@163.com>
2020-01-08 09:46:16 -07:00
Jerry Yu
fc69e8fc79 igzip: fix deflate hash bug
If next_in equals end_in, the function should
return.

Change-Id: I59e631bb1f24835fd43f943a3736e016c4e2d0ac
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-12-31 13:15:35 -07:00
Jerry Yu
e2b07bbd44 build: fix debug build problem
Remove strip command when lib_debug=1

Change-Id: I1203fcbfefb3b87080e9ba12ccbfb8018a008147
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-12-31 13:15:05 -07:00
Jerry Yu
936d05fc4f igzip:Add decode huffman code for aarch64
Change-Id: If26cc4fd97b078b5f3b02e5f6f121a12ec73f671
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-12-19 16:10:04 +08:00
Greg Tucker
ad49e580dc doc: Fix missing description of gf_matrix_inverse
The documentation did not mention that the input matrix is destroyed.
Fixes #116

Change-Id: Ic840b27532d90518dd21ec2701c278a1c3b61a8b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-12-13 16:24:05 -07:00
Zhiyuan Zhu
2b8cc393af igzip: implement gen_icf_map with assembly
Change-Id: I74e6200a732acfaac44b7f5a82bd4a2215ba1535
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-12-13 07:54:12 +00:00
Zhiyuan Zhu
f430953f0a igzip: cleanup perf test related code
This patch addresses some cppcheck issues
and makes some minor changes to maintain code consistency.

- Cleanup cppcheck issues.
  [log][igzip/igzip_perf.c] (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
  [log][igzip/igzip_hist_perf.c:132]: (error) Memory leak: outbuf

- Some minor changes to maintain code consistency.
  igzip/igzip_build_hash_table_perf.c
  igzip/igzip_hist_perf.c
  igzip/igzip_semi_dyn_file_perf.c

- delete unused variable
  outbuf and outbuf_size from igzip/igzip_hist_perf.c

Change-Id: Icbbd8f70de689931c8a844d89e457af8d97c6793
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-12-06 15:33:20 +08:00
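The first cppcheck finding in commit f430953f0a is a common C pitfall: `1 << 31` shifts a signed `int` into its sign bit, which is undefined behaviour when `int` is 32 bits wide. A minimal sketch of the usual remedy, using a hypothetical helper `bit_mask32` that is not taken from the perf tests themselves:

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns a mask with only bit `n` set, for n in 0..31.
 * The cast makes the left operand unsigned before shifting, so shifting
 * into bit 31 is well defined.
 */
static uint32_t bit_mask32(unsigned n)
{
    return (uint32_t)1 << n;
}
```

Writing the literal as `1u << 31` has the same effect when the shift count is a constant.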
Zhiyuan Zhu
683364c47b igzip: implement encode_deflate_icf with assembly
Change-Id: I90b12da2d2a96bfdb47d29ab329648247a756585
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-11-29 14:45:45 -07:00
John Kariuki
5eeb33f69c ec: add AVX512 ec functions with 5 and 6 outputs
Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.

Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
2019-11-19 10:12:14 -07:00
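For readers unfamiliar with the operation the functions in commit 5eeb33f69c accelerate, here is a plain C reference sketch of a GF(2^8) dot product producing a single output. It assumes the reduction polynomial 0x11d and uses hypothetical names (`gf256_mul`, `gf256_dot_prod`); it is a bit-by-bit illustration, not the AVX512 code the commit adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two GF(2^8) elements, reducing by x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t product = 0;

    while (b) {
        if (b & 1)
            product ^= a;          /* addition in GF(2^8) is XOR */
        uint8_t carry = a & 0x80;  /* will a overflow 8 bits when doubled? */
        a = (uint8_t)(a << 1);
        if (carry)
            a ^= 0x1d;             /* reduce: low byte of 0x11d */
        b >>= 1;
    }
    return product;
}

/*
 * dest[i] = XOR over j of coeffs[j] * src[j][i], for i in [0, len).
 * `src` holds num_src input buffers of len bytes each.
 */
static void gf256_dot_prod(size_t len, size_t num_src, const uint8_t *coeffs,
                           const uint8_t *const *src, uint8_t *dest)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t acc = 0;
        for (size_t j = 0; j < num_src; j++)
            acc ^= gf256_mul(coeffs[j], src[j][i]);
        dest[i] = acc;
    }
}
```

Producing 5 or 6 outputs at a time, as the commit describes, amounts to running this inner loop with several coefficient rows over the same source bytes, so each source byte is loaded once and reused.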
Samuel Lee
4785428d2f crc: arm64 implementation tweaks
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
  + Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and moved
them into shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
  + Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B

+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).

Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
2019-11-13 10:58:19 -07:00
Greg Tucker
0a8d05a81e doc: Move arch-dependent build instructions to readme
Removed the redundant parts that apply to all arch.

Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-11-01 15:55:44 -07:00
Hang Li
02a86dfb3f erasure_code: modify eor way in aarch64 neon codes
Change-Id: I9fb9219c5f280ed88194ec63234af046a5a036ae
Signed-off-by: Hang Li <lihang48@hisilicon.com>
2019-11-01 15:31:33 -07:00
Jerry Yu
ce9e56054a igzip:implement deflate hash with assembly
Change-Id: I39b3a37cd291c40f597750839c27db2a6a571fe5
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 14:41:46 -07:00
Jerry Yu
216d0f929b build: fix cross compile issue
Replace the hardcoded gcc with $(CC) so that as_filter
works correctly when cross-compiling.

Change-Id: I484d5074abdfc80ed5cd14fdd1358274f306bcfd
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:11:05 +08:00
Jerry Yu
5d7724898d build: fix wrong use of register name
The third parameter must be a 32-bit register. The assembly
put a 64-bit register here, which is wrong.

Change-Id: Iebe17516b555a6a9b94ea7baa4778ad4b9dd0878
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:11:00 +08:00
Jerry Yu
b441659879 multibinary: fix strict-prototype warning
With the -Wstrict-prototypes option, GCC reports
a warning.

Change-Id: Ic2d1adb566ad21deec65c66552e2863254e1376a
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:10:57 +08:00
Jerry Yu
f0104600a0 build: disable clang support in ci
- Disable clang test for travis and drone.io
- Add document about compiler requirement

Change-Id: I81f8dc31088d40f315dd4ec062bed5df8ab7b633
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:10:50 +08:00
Zhiyuan Zhu
6b70da5051 igzip: implement set_long_icf_fg with assembly
Change-Id: I21ac55985a56c2b7b0a684934c076600d90f8b0a
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-31 11:02:54 -07:00
Greg Tucker
4ed944c4b1 build: Fix travis osx issue with brew update
Bug in Homebrew auto-update causes post-update install to use the old
environment.

Change-Id: I03e20d899f558f71579dfd4be3f96903b77f1998
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-30 11:16:49 -07:00
Hang Li
621cf92c52 erasure_code: modify perf benchmark loop
Change-Id: Ie45ceb3ac55ab943a155e2a3f9f6b765cd94d7a1
Signed-off-by: Hang Li <lihang48@hisilicon.com>
2019-10-30 10:34:40 -07:00
Greg Tucker
2f9eef537c build: Fix autoconf build for mingw target
Change-Id: Ie5ae17556f8cc95af8e59c8bd81a958c94455cd1
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 15:53:14 -07:00
Greg Tucker
e6848434ae test: Fix issue keeping mingw tests from running
Change-Id: I1e72ed99c2f09cbad488774313cddafdb1ce5de8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 15:52:48 -07:00
Greg Tucker
533ba53f11 crc: Fix symbol conflict with older assemblers
Change-Id: I6f1322a5fecdf21b2c774454cd51cb56767f30b8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 14:39:44 -07:00
Zhou Xiong
d7848c1d05 Implement aarch64 neon for erasure code.
1. Route the following erasure code interfaces to the Arm NEON implementations through the mbin_interface function:
	ec_encode_data
	gf_vect_mul
	gf_vect_dot_prod
	gf_vect_mad
	ec_encode_data_update

2. Use Arm NEON instructions with 128-bit registers to accelerate GF(2^8) computation.

Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4
Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>
2019-10-25 11:09:03 -07:00
Jun He
c680d3aba7 Add arm64 to Travis matrix
Enable the new arm64 architecture in Travis CI and add tests for
the following compilers:
gcc: v5.4.0
clang: v3.8.0

Change-Id: Id0b2f2231fabcbeff7061f85050db99df12c9a67
Signed-off-by: Jun He <jun.he@arm.com>
2019-10-24 10:09:19 +08:00