isa-l/crc/aarch64
Samuel Lee 4785428d2f crc: arm64 implementation tweaks
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
  + Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and move
them into a shared header, crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations; never refer to registers without using a human
readable alias.
  + Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B
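
As a rough illustration of the first bullet, the following is a portable C model (not the assembly from this commit) of how `pmull` and `pmull2` select the low and high 64-bit lanes of their operands. The names `v128`, `clmul64`, `pmull_lo`, `pmull2_hi` and `fold_step` are hypothetical and exist only for this sketch. It assumes the two 64-bit fold constants are packed into the low and high lanes of one register, which is why no `dup` is needed to line up the multiplicands.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit ASIMD register as two 64-bit lanes. */
typedef struct { uint64_t lo, hi; } v128;

/* Carry-less (polynomial, GF(2)) 64x64 -> 128-bit multiply, bit by bit. */
static v128 clmul64(uint64_t a, uint64_t b) {
    v128 r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i) r.hi ^= a >> (64 - i); /* bits shifted out of the low half */
        }
    }
    return r;
}

/* Model of pmull: multiplies the LOW 64-bit lanes of both operands. */
static v128 pmull_lo(v128 a, v128 b) { return clmul64(a.lo, b.lo); }

/* Model of pmull2: multiplies the HIGH 64-bit lanes of both operands. */
static v128 pmull2_hi(v128 a, v128 b) { return clmul64(a.hi, b.hi); }

/*
 * One fold step under the assumption above: k.lo and k.hi hold the two
 * constants, so x.lo meets k.lo via pmull and x.hi meets k.hi via pmull2.
 * The two products are XORed together with the next 16 bytes of input.
 */
static v128 fold_step(v128 x, v128 k, v128 data) {
    v128 lo = pmull_lo(x, k);
    v128 hi = pmull2_hi(x, k);
    v128 r = { lo.lo ^ hi.lo ^ data.lo, lo.hi ^ hi.hi ^ data.hi };
    return r;
}
```

Which constant belongs in which lane depends on whether the CRC is reflected or normal, so the lane assignment here is only a placeholder.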

+ Phrase the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).
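
The `eor`/`eor3` equivalence can be sketched in C as below. This is an illustration only, not code from this commit: `xor3_128` is a hypothetical helper, and the compile-time `__ARM_FEATURE_SHA3` check stands in for the run-time multibinary selection the note above suggests. Where the ACLE SHA3 intrinsics are available it uses `veor3q_u64`; otherwise it falls back to the pair of XORs that a single `eor3` would replace.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_SHA3)
#include <arm_neon.h>

/* Three-way XOR of 128-bit values using a single eor3 (ACLE veor3q_u64). */
static void xor3_128(uint64_t out[2], const uint64_t a[2],
                     const uint64_t b[2], const uint64_t c[2]) {
    vst1q_u64(out, veor3q_u64(vld1q_u64(a), vld1q_u64(b), vld1q_u64(c)));
}
#else

/* Fallback: the same result expressed as two dependent XORs per lane. */
static void xor3_128(uint64_t out[2], const uint64_t a[2],
                     const uint64_t b[2], const uint64_t c[2]) {
    for (int i = 0; i < 2; i++) {
        uint64_t t = a[i] ^ b[i]; /* first eor */
        out[i] = t ^ c[i];        /* second eor */
    }
}
#endif
```

Both paths produce the same value; the difference is only one instruction instead of two on the dependency chain.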

Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
2019-11-13 10:58:19 -07:00
crc16_t10dif_copy_pmull.S crc: Fix dynamic relocation link failure on Arm 2019-10-11 15:37:29 -07:00
crc16_t10dif_pmull.S crc: Fix dynamic relocation link failure on Arm 2019-10-11 15:37:29 -07:00
crc32_gzip_refl_hw_fold.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc32_gzip_refl_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc32_gzip_refl_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc32_ieee_norm_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc32_ieee_norm_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc32_iscsi_refl_hw_fold.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc32_iscsi_refl_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc32_iscsi_refl_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc32_norm_common_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc32_refl_common_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_ecma_norm_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_ecma_norm_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc64_ecma_refl_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_ecma_refl_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc64_iso_norm_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_iso_norm_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc64_iso_refl_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_iso_refl_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc64_jones_norm_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_jones_norm_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc64_jones_refl_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_jones_refl_pmull.S crc: optimize crc with arm64 assembly 2019-06-21 17:02:16 +08:00
crc64_norm_common_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc64_refl_common_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc_aarch64_dispatcher.c multibinary: Add run-time cpu feature detect for aarch64 2019-08-26 17:58:42 +08:00
crc_common_pmull.h crc: arm64 implementation tweaks 2019-11-13 10:58:19 -07:00
crc_multibinary_arm.S multibinary: Add run-time cpu feature detect for aarch64 2019-08-26 17:58:42 +08:00
Makefile.am multibinary: Add run-time cpu feature detect for aarch64 2019-08-26 17:58:42 +08:00