Commit Graph

3 Commits

Author SHA1 Message Date
Jerry Yu
6c4d3dbf6c crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first
To reduce the cache missing events, the mix layout is changed
to PMULL+CRC. It also relaxes the final delay caused by data
dependency.
As results, the cold perf was improved about 20% and warm perf
was improved about 4%.

Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-16 20:38:41 +08:00
Jerry Yu
92fc8733fa crc32: Fix prototype mismatch bug
Change-Id: I7c8a2348441f32a43ff386122612405e418d9947
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-10 00:46:41 +00:00
Jerry Yu
a2fc2c000d crc32:Add optimization implementation for Neoverse N1
This patch is base on reference(1) algorithm with some changes.
- Redefine the block number to two.
  - That's due to only two pipe-line can be used in CRC32 calculate.
- Redefine the block size:
  - The block size of CRC is 1536B and PMULL is 512B
- Interleave CRC and PMULL instructions.
The optimization parameters are calculated base on reference(2)

References:
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
- https://developer.arm.com/docs/swog309707/a

Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00