Commit Graph

12 Commits

Author SHA1 Message Date
Jerry Yu
1c71f9c0ae crc32: tweak performance of crc32/crc32c
Tweak performances with prefetch instructions.

Below is the test results:
- Neoverse N1: ~30%
- Cortex-A72: ~3%
- Cortex-A57: ~90%
- Others: 50% - 5x

Change-Id: I3ab292a953043dbaea98af3c66778f57da3a1331
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-07-09 17:37:00 +08:00
Zhiyuan Zhu
031450f697 crc32: Implement default mix mode optimization
Change-Id: Ib3bf04215cca491db522ec33905fe48df173cc2f
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2020-05-09 08:10:34 +00:00
Jerry Yu
6c4d3dbf6c crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first
To reduce the cache missing events, the mix layout is changed
to PMULL+CRC. It also relaxes the final delay caused by data
dependency.
As results, the cold perf was improved about 20% and warm perf
was improved about 4%.

Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-16 20:38:41 +08:00
Jerry Yu
92fc8733fa crc32: Fix prototype mismatch bug
Change-Id: I7c8a2348441f32a43ff386122612405e418d9947
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-10 00:46:41 +00:00
Jerry Yu
9bcd6768fd crc32:Adjust hardware folding algorithm flags
Hardware folding algorithm depend on CRC32 and PMULL instruction.
And it should match both flags .

Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:50:15 +08:00
Jerry Yu
0033f42189 crc32:Optimize crc32/c for cortex-a72
Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:49:38 +08:00
Jerry Yu
a2fc2c000d crc32:Add optimization implementation for Neoverse N1
This patch is base on reference(1) algorithm with some changes.
- Redefine the block number to two.
  - That's due to only two pipe-line can be used in CRC32 calculate.
- Redefine the block size:
  - The block size of CRC is 1536B and PMULL is 512B
- Interleave CRC and PMULL instructions.
The optimization parameters are calculated base on reference(2)

References:
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
- https://developer.arm.com/docs/swog309707/a

Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00
Samuel Lee
4785428d2f crc: arm64 implementation tweaks
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
  + Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and moved
them into shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
  + Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B

+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).

Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
2019-11-13 10:58:19 -07:00
Zhiyuan Zhu
f3993f5c0b crc: Fix dynamic relocation link failure on Arm
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.

[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'

Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 15:37:29 -07:00
Jerry Yu
183385f02f multibinary: Add run-time cpu feature detect for aarch64
Some CPUs  report "illegal instruction" error for the crc test because
they do not support the relevant optional feature . This can be fixed by
introducing CPU feature detection for AArch64 .

The difference with the x86 implementation is the dispatcher . It is based
on the glibc function `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)` , not
registers or instructions .

On a  heterogeneous system (big.LITTLE) , it is dangerous to detect CPU
features using identification registers . And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms . Thus we use the HW_CAPs
exported from the kernel (and visible in getauxval) as the solution.

- According to kernel suggestion , getauxval should be used for this purpose .
  - [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to  AAPCS result/paramter registers should be saved/restore for function call
  - [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
  - [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)

Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
2019-08-26 17:58:42 +08:00
Zhiyuan Zhu
c80610a2bb crc: push the aarch64 crc optimization back to base functions
Some arm64 machines don't support pmull instructions, so set these
crc interface to base functions. For long-term solution, will
provide better multi-binary support with cpu features detection.

Change-Id: I02791a2a50283dc8df2f9ba124eb309912b5b4b7
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-07-16 07:18:54 +00:00
Zhiyuan Zhu
a46da529d9 crc: optimize crc with arm64 assembly
Change-Id: I49166ee06b3ad24babb90aeb0b834d8aacfc2d03
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-06-21 17:02:16 +08:00