67 Commits

Author SHA1 Message Date
liuqinfei
4815174a68 crc: optimize by supporting arm xor fusion feature
Arrange the two xor instructions in the specified pattern so that
they can be fused and executed together, which saves one issue
slot and one execution latency.

Change-Id: Ic64bcfe569b2468e4dc9c13d073d367cc81fd937
Signed-off-by: liuqinfei <lucas.liuqinfei@huawei.com>
2023-08-18 07:53:59 +00:00
Pablo de Lara
f534a5c6a9 crc: fold 64 bytes of data if possible
When less than 256 bytes of data are left, fold data
in steps of 64 bytes, instead of 16 bytes, if there is enough
data.

Change-Id: I47d7cacdd1ba620078df528136945695c338db6d
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-08-17 11:54:24 +01:00
Pablo de Lara
beab678fb8 crc: optimize last bytes
Change-Id: I4b8f73b23eb50c4c50ca65fab19716f217fe5780
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-08-17 11:54:20 +01:00
Pablo de Lara
2bbce31943 crc: add CRC64 rocksoft implementation
- Added reference implementation
- Added base implementation
- Added functional and performance tests

Change-Id: I60c5097bd5fb89ee7a50910e71d449d50d155d0a
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2023-05-08 12:37:44 +00:00
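
For orientation, a bitwise reference for a reflected 64-bit CRC can be sketched as below. This is not the code added by the commit: the function name is hypothetical, and the parameters (reflected polynomial 0x9A6C9329AC4BC9B5, all-ones initial value and final XOR) are assumed from the commonly documented Rocksoft model.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed: bit-reflected form of the Rocksoft polynomial 0xAD93D23594C93659. */
#define CRC64_ROCKSOFT_POLY_REFL 0x9A6C9329AC4BC9B5ULL

/* Hypothetical name; processes one bit at a time, least significant bit first. */
static uint64_t crc64_rocksoft_bitwise(uint64_t seed, const uint8_t *buf, size_t len)
{
    uint64_t crc = ~seed; /* all-ones initial value when seed is 0 */

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC64_ROCKSOFT_POLY_REFL : crc >> 1;
    }
    return ~crc; /* final XOR with all ones */
}
```

Because the seed is inverted on entry and the result on exit, a buffer can be processed in pieces by passing the previous return value as the next seed.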
Pablo de Lara
16056ff4e4 crc: refactor SSE CRC64 implementations to use common code
Change-Id: I2d141f2ccd12ab338783e50736e36ed4aeb11f7f
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-05-08 12:37:44 +00:00
Pablo de Lara
22d33cf795 crc: use k-mask to load final bytes of data
Change-Id: Ibd8d2144bc6942e11911e25a6365c1cb108af477
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-05-08 12:37:23 +00:00
Greg Tucker
c2bec3ea65 crc: Use ternlog in by16 avx512 loop
Ternlog has an additional benefit in the by16 crc main loop for both reflected
and non-reflected polynomial crcs. Some architectures see a 4-7% improvement.
Revisited on suggestion by Nicola Torracca.

Change-Id: I806266a7080168cf33409634983e254a291a0795
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-11-02 12:16:20 -07:00
Taiju Yamada
1187583a97 Fixes for aarch64 mac
- It should be fine to enable pmull always on Apple Silicon
- macOS 12+ is required for PMULL instruction.
- Changed the conditional macro to __APPLE__
- Rewritten dispatcher using sysctlbyname
- Use __USER_LABEL_PREFIX__
- Use __TEXT,__const as readonly section
- use ASM_DEF_RODATA macro
- fix func decl

Change-Id: I800593f21085d8187b480c8bb3ab2bd70c4a6974
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2022-10-28 08:27:26 -07:00
Greg Tucker
9c7e3b9f22 test: Change perf tests to warm by default
The cold versions of tests depended on a fixed size of last level
cache that is too low on some arch and too high for the total
available memory on others.

Change-Id: Iee98403f9ace02e01b810c296a5fe44b933bfb17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-08-03 16:35:55 -07:00
Greg Tucker
9f75defd57 Remove all slver legacy segments
The relic slver is no longer used for individual versioning
on functions and is confusing tools looking for data in text
sections. This removes all instances instead of fixing since
its usefulness is waning. Fixes #221

Change-Id: Ife0b9f105950a90337c58e8a41ac2cffc0f67d99
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-07-14 19:23:52 -07:00
Chunsong Feng
e297ecae7a crc16: Accelerate T10DIF performance with prefetch and pmull2
The memory block size calculated by t10dif is generally 512 bytes in
sectors. Prefetching can effectively reduce cache misses. Use ldp instead
of ldr to reduce the number of instructions; pmull+pmull2 can reduce
register access. The perf test result shows that the performance is
improved by 5x ~ 14x after optimization.

Change-Id: Ibd3f08036b6a45443ffc15f808fd3b467294c283
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
2022-03-31 09:58:04 -07:00
Greg Tucker
112dd72c01 build: Remove unneeded file types.h
The file types.h has long been misnamed and overlaps with
functionality in the test helper routines.

Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-10 09:35:43 -07:00
Greg Tucker
cfdd3497d1 perf: Remove unneeded time include
Timing functions are made os-independent with test.h include.

Change-Id: Iab7d6325254d5c32263504efc756dbbe51d77153
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-09 18:33:57 -07:00
Greg Tucker
d7bac36be4 crc: Fix warning in perf test from uninitialized tmp ptr
Both gcc and clang are showing a warning on this despite the buffer
always being set before use.

Change-Id: I0e8f6b9e3451efe69e49814abc883d49b04f2666
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-20 11:57:56 -07:00
Greg Tucker
ec73d39086 crc: Add new vclmul version of crc32_iscsi
Change-Id: I1c509c6ea312b6eb4e1c2c1c8bb7044f7b043e0d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-21 17:15:58 -07:00
Jerry Yu
1c71f9c0ae crc32: tweak performance of crc32/crc32c
Tweak performance with prefetch instructions.

Below are the test results:
- Neoverse N1: ~30%
- Cortex-A72: ~3%
- Cortex-A57: ~90%
- Others: 50% - 5x

Change-Id: I3ab292a953043dbaea98af3c66778f57da3a1331
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-07-09 17:37:00 +08:00
H.J. Lu
cd888f01a4 x86: Add ENDBR32/ENDBR64 at function entries for Intel CET
To support Intel CET, all indirect branch targets must start with
ENDBR32/ENDBR64.  Here is a patch to define endbranch and add it to
function entries in x86 assembly codes which are indirect branch
targets as discovered by running testsuite on Intel CET machine and
visual inspection.

Verified with

$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
$ make -j8 check

with both nasm and yasm on both CET and non-CET machines.

Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2020-05-26 09:16:49 -07:00
Zhiyuan Zhu
031450f697 crc32: Implement default mix mode optimization
Change-Id: Ib3bf04215cca491db522ec33905fe48df173cc2f
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2020-05-09 08:10:34 +00:00
Jerry Yu
6c4d3dbf6c crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first
To reduce cache-miss events, the mix layout is changed
to PMULL+CRC. It also relaxes the final delay caused by data
dependency.
As a result, cold perf improved by about 20% and warm perf
improved by about 4%.

Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-16 20:38:41 +08:00
Jerry Yu
92fc8733fa crc32: Fix prototype mismatch bug
Change-Id: I7c8a2348441f32a43ff386122612405e418d9947
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-10 00:46:41 +00:00
Jerry Yu
9bcd6768fd crc32:Adjust hardware folding algorithm flags
The hardware folding algorithm depends on both the CRC32 and PMULL
instructions, so it should match both flags.

Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:50:15 +08:00
Jerry Yu
0033f42189 crc32:Optimize crc32/c for cortex-a72
Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:49:38 +08:00
Jerry Yu
a2fc2c000d crc32:Add optimization implementation for Neoverse N1
This patch is based on the reference(1) algorithm with some changes.
- Redefine the block number to two.
  - This is because only two pipelines can be used for CRC32 calculation.
- Redefine the block size:
  - The block size of CRC is 1536B and PMULL is 512B
- Interleave CRC and PMULL instructions.
The optimization parameters are calculated based on reference(2)

References:
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
- https://developer.arm.com/docs/swog309707/a

Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00
Greg Tucker
ede04f0a1f build: Fix for windows to allow nasm use
Previously windows build could only use yasm because some procedural items such
as proc_start were not supported by nasm.  This adds a few macros and fixes so
nasm can be used to build on windows.

Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
6136a04bbe crc: Add new vclmul version of crc16_t10dif
Change-Id: Ic068f35d5d8c34b74128b7a2ea8e82f5fa693c28
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 19:54:19 -07:00
Greg Tucker
5ef6eb5c68 crc: Add new vclmul version of crc32_ieee
Change-Id: Ib761e3240d8252ce84e9abeadb568dce60742717
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
25a673d75a crc: Add new vclmul version of gzip_refl
Change-Id: I8050853dcd177f4fb506f32f5fa723f7a1d3cded
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
4217930338 crc: Add vec version of crc16_t10dif_copy
Change-Id: I5f73e8a38efd1ff50d30a39689d9d85da702e809
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
02a41e0653 crc: Add vec version of crc32_ieee when avx avail
Change-Id: I5542ee93156c26f5a23feb89b82f4c51f282777d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
d4131bb3d3 crc: Add vec version of crc32_gzip_refl when avx avail
Change-Id: I4a069c318c809dcd21a6ebc47d3e0d1c131599ea
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
ad22a90686 crc: Add vec version of crc16 when avx available
Vec versions mix much better with other avx code.

Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Hong Bo Peng
180c74aefd enable VSX SIMD in ISA-L for ppc64le
1) Implement the ErasureCode function in Altivec Intrinsics
2) Coding style update

Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
2020-02-20 09:40:43 -07:00
Samuel Lee
4785428d2f crc: arm64 implementation tweaks
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
  + Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and move
them into a shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
  + Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B

+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).

Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
2019-11-13 10:58:19 -07:00
Greg Tucker
533ba53f11 crc: Fix symbol conflict with older assemblers
Change-Id: I6f1322a5fecdf21b2c774454cd51cb56767f30b8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 14:39:44 -07:00
Zhiyuan Zhu
f3993f5c0b crc: Fix dynamic relocation link failure on Arm
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.

[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'

Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 15:37:29 -07:00
Greg Tucker
600b6d8d99 crc: Add new ecma_norm
Change-Id: I7747bfdca24bcd604c3eb118e7f1bcd98b2b6211
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
121bc635c9 crc: Add new jones_norm
Change-Id: I66118baeec2a1d63423c74edc3aa20a3e8955c6e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
ed528bb2ad crc: Add new iso_norm
Change-Id: If0b05d1a1029b02842935c5c43966d81c59fbbca
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
ea4cbf0ffa crc: Add new ecma_refl
Change-Id: Ifef4f8c6ce7da328b0cc03040b17e7443febf44d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
42bbc5a37e crc: Add new jones_refl
Change-Id: Ia4837b9125bce4e38ef6bae0a8c852d02e9b0bf2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
5c546ecddf crc: Add new arch CRC
Change-Id: I31d3a7e61eeed9d13a0cadd6d1ed25b0dbb39415
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
7a28c83879 test: Increase size of crc tests and simplify output
Change-Id: Ia0418b7889e591a0164c335e273caff263cdf640
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-14 16:01:28 -07:00
Jerry Yu
183385f02f multibinary: Add run-time cpu feature detect for aarch64
Some CPUs report an "illegal instruction" error for the crc test because
they do not support the relevant optional feature. This can be fixed by
introducing CPU feature detection for AArch64.

The difference from the x86 implementation is the dispatcher. It is based
on the glibc function `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`, not
on registers or instructions.

On a heterogeneous system (big.LITTLE), it is dangerous to detect CPU
features using identification registers. And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms. Thus we use the HW_CAPs
exported from the kernel (and visible in getauxval) as the solution.

- According to the kernel's suggestion, getauxval should be used for this purpose.
  - [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to AAPCS, result/parameter registers should be saved/restored across a function call
  - [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
  - [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)

Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
2019-08-26 17:58:42 +08:00
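
The detection approach described above can be sketched roughly as follows. This is a minimal illustration rather than the dispatcher from the commit: the struct, enum and function names are hypothetical, and on anything other than AArch64 Linux the sketch simply reports no features. `getauxval`, `AT_HWCAP`, `HWCAP_CRC32` and `HWCAP_PMULL` are the real glibc/kernel names.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  /* getauxval, AT_HWCAP */
#include <asm/hwcap.h> /* HWCAP_CRC32, HWCAP_PMULL */
#endif

/* Hypothetical feature record, not ISA-L's own dispatcher state. */
struct crc_cpu_features {
    bool has_crc32;
    bool has_pmull;
};

/* Hypothetical implementation tiers used only in this sketch. */
enum crc_impl { CRC_IMPL_BASE, CRC_IMPL_HW_FOLD };

/* Ask the kernel which optional features it exposes to userspace. */
static struct crc_cpu_features detect_crc_features(void)
{
    struct crc_cpu_features f = { false, false };
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    f.has_crc32 = (hwcap & HWCAP_CRC32) != 0;
    f.has_pmull = (hwcap & HWCAP_PMULL) != 0;
#endif
    return f;
}

/* Use the accelerated path only when every required feature is present. */
static enum crc_impl select_crc32_impl(struct crc_cpu_features f)
{
    return (f.has_crc32 && f.has_pmull) ? CRC_IMPL_HW_FOLD : CRC_IMPL_BASE;
}
```

Falling back to the base function whenever a flag is missing is what avoids the "illegal instruction" failure on CPUs without the optional extensions.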
Zhiyuan Zhu
c80610a2bb crc: push the aarch64 crc optimization back to base functions
Some arm64 machines don't support pmull instructions, so set these
crc interfaces back to the base functions. As a long-term solution,
better multi-binary support with CPU feature detection will be provided.

Change-Id: I02791a2a50283dc8df2f9ba124eb309912b5b4b7
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-07-16 07:18:54 +00:00
Zhiyuan Zhu
a46da529d9 crc: optimize crc with arm64 assembly
Change-Id: I49166ee06b3ad24babb90aeb0b834d8aacfc2d03
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-06-21 17:02:16 +08:00
Zhiyuan Zhu
899c647628 crc: implement table-driven crc algorithm
Change-Id: Iebfb8ae1db09bf2dc882fd87e61627d74fab4a5c
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-05-08 17:50:03 -07:00
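
As a general illustration of what "table-driven" means here, the sketch below computes the common reflected CRC-32 (polynomial 0xEDB88320, as used by gzip) one byte per table lookup. It is not the code from the commit, and the function and table names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];

/* Precompute the CRC of every possible byte value, eight shifts each. */
static void crc32_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc32_table[i] = c;
    }
}

/* One lookup per input byte replaces the eight per-bit steps. */
static uint32_t crc32_table_driven(uint32_t seed, const uint8_t *buf, size_t len)
{
    uint32_t crc = ~seed;

    while (len--)
        crc = crc32_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return ~crc;
}
```

After calling `crc32_table_init()` once, `crc32_table_driven(0, (const uint8_t *)"123456789", 9)` returns 0xCBF43926, the standard check value for this CRC.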
Roy Oursler
699bb5bd3f all: Revamp performance testing to be time based
Change-Id: I6260d28e4adc974d8db0a1c770e3eb922d87f8e4
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
5c62f1e1ec crc: Use type cast in crc32_ieee_base to avoid undefined behavior
Change-Id: I8362831125927372c62ecb5eec2f5afe6f75ef24
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
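
The commit does not say which expression was affected. One plausible case, shown here purely as an assumption, is a `uint8_t` being promoted to signed `int` before a 24-bit left shift, which is undefined when the byte's top bit is set. The sketch below is a bitwise non-reflected CRC-32 (polynomial 0x04C11DB7) with the cast in place; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_norm_bitwise(uint32_t seed, const uint8_t *buf, size_t len)
{
    uint32_t rem = ~seed;

    for (size_t i = 0; i < len; i++) {
        /* Without the cast, buf[i] is promoted to int, and shifting a value
         * such as 0x80 left by 24 would overflow a 32-bit signed int. */
        rem ^= (uint32_t)buf[i] << 24;
        for (int j = 0; j < 8; j++)
            rem = (rem & 0x80000000u) ? (rem << 1) ^ 0x04C11DB7u : rem << 1;
    }
    return ~rem;
}
```

Casting before the shift keeps the whole computation in unsigned 32-bit arithmetic, where the result is well defined.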
Yibo Cai
7a44098a98 build: Add aarch64 support
Change-Id: If9594936a28355d89edd1a331b3b429dffa44184
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
2019-02-10 13:08:56 -07:00
Ziye Yang
bed578b4d6 crc: Make crc32_table_iscsi_base static
Reason: Ceph directly copied some code from isal,
which conflicts when SPDK applications use
isal-lib (configured with '--with-isal')
and also use Ceph (configured with '--with-rbd')

Change-Id: I9f58412a68af76f8e29219a9c72cd44b9183033d
Signed-off-by: Jesse Hui <Chunyang.hui@intel.com>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2019-02-01 17:37:33 +08:00