Compare commits

670 Commits

Author SHA1 Message Date
Pablo de Lara
e677f668c8 crc: only prefetch data that will be consumed for VPCLMUL functions
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
510de484c4 crc: only prefetch data that will be consumed for non-VPCLMUL functions
Also, use only 2 prefetch instructions for 128B.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
46b52726c8 crc: prefetch data with prefetcht1 for non-VPCLMUL implementations
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
81ee1cdb95 crc: prefetch data with prefetcht0 for VPCLMUL implementations
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
4613c5ac09 crc: delete unused CRC ISCSI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
0ed666031d crc: add PCLMUL CRC32 ISCSI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Tim Burke
a46e3f1588 igzip: Fix aarch64 register width for bfinal
We only ever load 32 bits into it, and we only ever want to compare against
32 bits. There was no need to declare it as 64 bits.

Furthermore, there were cases where a 64 bit comparison around
isal_out_overflow_1 led us to erroneously set the block state to
ISAL_BLOCK_INPUT_DONE when it should have been left at ISAL_BLOCK_NEW_HDR.

Fixes #316

Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-08-29 23:50:35 +08:00
Tim Burke
73c50447fc aarch64: Fix build on macOS
Somewhere between Command Line Tools for Xcode 16.2 and 16.3, clang
started complaining like

   <instantiation>:91:26: error: unexpected token in argument list
    movk x7, br_low_b2, lsl 32
                            ^
   crc/aarch64/crc32_ieee_norm_pmull.S:34:1: note: while in macro instantiation
   crc32_norm_func crc32_ieee_norm_pmull

It seems to have to do with some change to macro expansion; work around it by
replacing .equ directives with #defines.

Fixes #352
Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-08-21 16:20:04 +01:00
Pablo de Lara
8772e99fee Add MAINTAINERS file
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-21 09:05:44 +01:00
Pablo de Lara
fa32879c2d tests: [fuzz] fix potential null dereference
There is a possibility that zstate.msg = NULL, which is set
in inflateInit2() function. In that case, we should not
compare against another string.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-11 17:37:49 +01:00
Pablo de Lara
768b77219f igzip: [SHIM] fix memory leaks
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-11 17:37:49 +01:00
Pablo de Lara
8f2c02ab9e igzip: fix memory leak in test
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-11 17:37:49 +01:00
vkarpenk
f0320e1c30 shim: add zlib shim library
This experimental library is a drop-in replacement for zlib that
utilizes ISA-L for improved compression/decompression performance.

Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-08 07:47:35 +00:00
Pablo de Lara
5e9072107a cmake: add functional tests
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-17 09:57:45 +02:00
Pablo de Lara
612c210684 Add initial CMake build system
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2025-07-17 09:57:45 +02:00
lvshuo
d414b2702a erasure_code: optimize RVV implementation
The ISA-L EC code has been written using RVV vector instructions and the minimum multiplication table,
resulting in a performance improvement of over 10 times compared to the existing implementation.

Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-07-11 15:55:57 +02:00
Pablo de Lara
f2883f24fd raid: add cold cache test
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-09 14:15:18 +01:00
Pablo de Lara
55e25f7aa2 raid: add consolidated performance app
Added new RAID performance application which consolidates the
existing XOR and P+Q gen performance applications.

This application accepts buffer sizes to benchmark,
as a single value, list or range, and the RAID function
to test and the number of sources.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 15:00:56 +02:00
Pablo de Lara
8735bb4e20 crc: add cold cache test
To benchmark a cold cache scenario, the option `--cold`
has been added as a parameter of the CRC benchmark application,
where the addresses of the input buffers are randomized
within a 1GB preallocated memory buffer.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 12:11:56 +01:00
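
Note on the `--cold` option described above: the idea of randomizing input addresses inside a large preallocated pool can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the benchmark's code; the function name `cold_offsets`, the 64-byte alignment, and the fixed seed are all made up for the example.

```python
import random

POOL_SIZE = 1 << 30  # 1 GB pool, as described in the commit message


def cold_offsets(pool_size, buf_size, count, align=64, seed=0):
    """Return `count` random, aligned start offsets for buffers of
    `buf_size` bytes that fit entirely inside a pool of `pool_size` bytes.
    The alignment and the seed are illustrative assumptions."""
    if buf_size > pool_size:
        raise ValueError("buffer does not fit in the pool")
    rng = random.Random(seed)
    # Number of aligned start positions that keep the buffer inside the pool.
    slots = (pool_size - buf_size) // align + 1
    return [rng.randrange(slots) * align for _ in range(count)]


# Eight hypothetical 4 KB input buffers scattered across the pool.
offsets = cold_offsets(POOL_SIZE, buf_size=4096, count=8)
```

Because each iteration reads from a different, widely spread offset, the data is unlikely to still be in cache when it is processed, which is the effect a cold cache benchmark is after.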
Pablo de Lara
e97c91547f Add parentheses around parameters in macros
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 12:11:56 +01:00
Pablo de Lara
199a0a8151 crc: add CRC consolidated performance benchmark
Added new CRC performance application which consolidates the
existing CRC performance applications (CRC16, CRC32 and CRC64).

This application accepts buffer sizes to benchmark,
as a single value, list or range, and the CRC function
to test (or all of them).

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
5d437d72f1 Add missing base function symbol
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
fc37bd08e3 Further memory leak fixes
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
bf18da6770 Free allocated memory in test applications
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-11 12:17:43 +01:00
Rong Tao
8054d41db5 Ignore more generated files
Ignore files generated by 'make perf'.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
2025-05-26 19:28:17 +01:00
Tim Burke
9a6c32cb05 Optimize crc64_rocksoft for aarch64
Closes #326

Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-05-20 11:39:28 +01:00
Tim Burke
86e775b3b5 Remove unnecessary .text directives
Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-05-20 11:39:28 +01:00
Tim Burke
22810489c6 Normalize the width of some constants
This makes it easier to compare the constants used for crc/crc64_*_by8.asm,
crc/crc64_*_by16_10.asm, and crc/aarch64/crc64_*_pmull.h

Note that this revealed some discrepancies:

  ecma_refl: br_high != rk8 (92d8af2baf0e1e85 vs 92d8af2baf0e1e84)
  iso_refl: br_high != rk8 (b000000000000001 vs b000000000000000)
  jones_refl: br_high != rk8 (2b5926535897936b vs 2b5926535897936a)

but they should be innocuous.

Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-05-20 11:39:28 +01:00
sunyuechi
f74b0d27ab Update release notes
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-13 15:39:44 +02:00
sunyuechi
a7766a91b6 mem: R-V V mem_zero_detect
banana_f3:
    rvv: mem_zero_detect_perf_warm: runtime =    3062584 usecs, bandwidth 33784 MB in 3.0626 sec = 11031.32 MB/s
    c:   mem_zero_detect_perf_warm: runtime =    3000354 usecs, bandwidth 1594 MB in 3.0004 sec = 531.34 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-13 15:39:44 +02:00
Pablo de Lara
94690d01ca Remove 32-bit x86 architecture support
As already announced in issue #296, we are removing 32-bit x86 support,
which was not being validated anyway.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 18:37:08 +01:00
Pablo de Lara
8045bee170 Bump minimum NASM version to 2.14.01
NASM version 2.14.01 supports all x86 ISA in this library.
Since this version has been out since 2018, it is safe to
only permit the library to be compiled with this minimum version,
as announced in issue #297.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 16:20:08 +01:00
Pablo de Lara
d20335bba8 Update release notes
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 16:20:08 +01:00
sunyuechi
eb130eaf6b erasure_code: R-V V ec_encode_data
banana_f3:
    rvv:
        erasure_code_encode_warm: runtime =    3065696 usecs, bandwidth 108 MB in 3.0657 sec = 35.37 MB/s
        erasure_code_decode_warm: runtime =    3001213 usecs, bandwidth 136 MB in 3.0012 sec = 45.47 MB/s
    c:
        erasure_code_encode_warm: runtime =    3002512 usecs, bandwidth 52 MB in 3.0025 sec = 17.34 MB/s
        erasure_code_decode_warm: runtime =    3065235 usecs, bandwidth 57 MB in 3.0652 sec = 18.69 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
c5d75f1e27 erasure_code: R-V V gf_vect_dot_prod
banana_f3:
    rvv: gf_vect_dot_prod_warm: runtime =    3062964 usecs, bandwidth 490 MB in 3.0630 sec = 160.25 MB/s
    c:   gf_vect_dot_prod_warm: runtime =    3000581 usecs, bandwidth 173 MB in 3.0006 sec = 57.69 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
4174804684 riscv64_multibinary support more args
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
0a68e9434a erasure_code: R-V V gf_vect_mul
banana_f3:
    rvv: gf_vect_mul_warm: runtime =    3062541 usecs, bandwidth 1889 MB in 3.0625 sec = 616.84 MB/s
    c:   gf_vect_mul_warm: runtime =    3062014 usecs, bandwidth 285 MB in 3.0620 sec = 93.29 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
5518db11a9 Fix erasure_code/gf_vect_mul_test output
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
Pablo de Lara
9b3532244b Remove YASM support
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-29 17:37:34 +01:00
Pablo de Lara
8401831dc4 raid: add AVX2+GFNI implementation for P+Q gen
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-29 13:51:12 +01:00
Pablo de Lara
55a42d7717 raid: add AVX512+GFNI implementation for P+Q gen
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-29 13:51:12 +01:00
sunyuechi
359e2ac1af Update release notes for v2.32
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-04-29 12:16:01 +00:00
sunyuechi
0de3661ec0 raid: R-V V xor_gen
banana_f3:
        new: xor_gen_warm: runtime =    3006459 usecs, bandwidth 10685 MB in 3.0065 sec = 3554.17 MB/s
        old: xor_gen_warm: runtime =    3060970 usecs, bandwidth 514 MB in 3.0610 sec = 168.21 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-04-29 12:16:01 +00:00
sunyuechi
7fafc98a37 Fix xor_gen test pass when len % 256 == 0
If len > 255, the return value is taken modulo 256, which causes the test to pass incorrectly when len is a multiple of 256

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-04-29 12:16:01 +00:00
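
The failure mode fixed above can be illustrated with a short sketch. It assumes the value in question ends up as a process exit status, which POSIX systems truncate to its low 8 bits; the commit message does not spell this out. The helper names are hypothetical and this is not the test's code.

```python
def status_seen_by_parent(value):
    """Model how a POSIX parent sees an exit status: only the low 8 bits."""
    return value & 0xFF


def safe_status(failed):
    """Collapse any failure indication to 1 so it can never wrap to 0."""
    return 1 if failed else 0


# A nonzero "failure" value that is a multiple of 256 looks like success (0).
for length in (255, 256, 300, 512):
    print(length, status_seen_by_parent(length), safe_status(length != 0))
```

Returning a fixed nonzero value on failure, rather than a length or a count, avoids the wrap-around.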
sunyuechi
ba874ba762 raid: R-V V pq_gen
banana_f3:
        new: pq_gen_warm: runtime =    3062397 usecs, bandwidth 4737 MB in 3.0624 sec = 1546.92 MB/s
        old: pq_gen_warm: runtime =    3005894 usecs, bandwidth 2851 MB in 3.0059 sec = 948.80 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-04-29 12:16:00 +00:00
sunyuechi
b725bddd05 license: correct name to "ISCAS"
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-04-29 12:16:00 +00:00
sunyuechi
91da2ada9a add RISCV CI
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-04-24 15:29:34 +01:00
Pablo de Lara
ce957f9449 ci: update github actions to latest versions
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-24 10:11:26 +01:00
Mattias Ellert
7e01b2c812 Address type mismatch warnings on riscv64
The riscv64 dispatcher code uses the same PROVIDER_INFO macro as the
aarch64 dispatcher and has the same kind of warnings during compilation:

igzip/riscv64/igzip_multibinary_riscv64_dispatcher.c:39:24: warning: type of 'adler32_base' does not match original declaration [-Wlto-type-mismatch]
   39 |                 return PROVIDER_BASIC(adler32);
      |                        ^
igzip/adler32_base.c:34:1: note: return value type mismatch
   34 | adler32_base(uint32_t adler32, uint8_t *start, uint64_t length)
      | ^
igzip/adler32_base.c:34:1: note: type 'uint32_t' should match type 'void'
igzip/adler32_base.c:34:1: note: 'adler32_base' was previously declared here

This commit introduces the same correction for riscv64.

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2025-04-23 20:04:05 +01:00
Pablo de Lara
6b03bc4f1e igzip: fix coding style of inflate example
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-23 13:46:12 +01:00
Pablo de Lara
4fe61d3bce Show clang-format version
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-23 13:46:12 +01:00
Pablo de Lara
aa9e15f794 aarch64: remove unneeded defines
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-04-22 16:12:16 +01:00
Mattias Ellert
841f9e34ad Address type mismatch warnings on aarch64
The PROVIDER_INFO macro used in the aarch64 code declares all
functions with the signature:

extern void function(void);

The actual return types and parameter lists of the functions are, however,
different. The declarations provided by the PROVIDER_INFO macro
therefore conflict with the actual declarations of the functions
elsewhere in the code, causing compiler warnings.

This commit drops the PROVIDER_INFO macro and provides proper function
declarations, either by including a header file or by providing a
forward declaration. This corresponds to how the code for the other
architectures handles this issue.

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2025-04-22 12:55:53 +01:00
Karpenko, Veronika
3e03e91cef igzip: add inflate example
Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>
2025-04-08 10:13:32 +01:00
sunyuechi
c0bd84c20e add R-V V build check
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-03-20 19:22:40 +00:00
sunyuechi
027be4beb9 add volatile for igzip/checksum32_funs_test
When using RISC-V GCC 14, `gcc -O0` passes the test, but `gcc -O2` fails.

The log shows that it enters the branch `if (c_dut != c_ref) {`

even though `c_dut` and `c_ref` have the same value.

Adding `volatile` allows the test to pass.

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-03-20 19:22:40 +00:00
sunyuechi
e0687d4964 igzip: R-V V isal_adler32
banana_f3:
	new: adler32_warm: runtime =    3062612 usecs, bandwidth 3861 MB in 3.0626 sec = 1261.01 MB/s
	old: adler32_warm: runtime =    3062505 usecs, bandwidth 1027 MB in 3.0625 sec = 335.64 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-03-20 19:22:40 +00:00
sunyuechi
83d58b856c multibinary: Add run-time cpu feature detect for riscv64
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-03-20 19:22:40 +00:00
Daniel Gregory
726a6f7c02 build: Add riscv64 support
Use the base implementations for every function.

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
2025-03-20 19:22:40 +00:00
Pablo de Lara
633add1b56 igzip: fix header construction in Big Endian systems
When a file contains a number of repeated '0x00' or '0xff'
bytes, the block header is copied from a precomputed header,
which only worked for Little-Endian systems.

Fixes #311.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-02-04 10:13:32 +00:00
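
The commit above does not show the fix itself, so the following is only a sketch of the general class of bug: a precomputed header stored as an integer yields different byte sequences when copied from memory on hosts of different endianness. The header value and function names are hypothetical.

```python
import struct

# Hypothetical 32-bit precomputed header word, for illustration only.
HEADER_WORD = 0x12345678


def header_bytes_native_copy(word, host_is_big_endian):
    """Bytes produced by copying the in-memory integer as-is."""
    fmt = ">I" if host_is_big_endian else "<I"
    return struct.pack(fmt, word)


def header_bytes_portable(word):
    """Bytes produced by always writing the least significant byte first."""
    return struct.pack("<I", word)


little = header_bytes_native_copy(HEADER_WORD, host_is_big_endian=False)
big = header_bytes_native_copy(HEADER_WORD, host_is_big_endian=True)
```

On a little-endian host the raw copy happens to match the portable form, which is how such a bug can go unnoticed until the code runs on a big-endian machine.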
Mattias Ellert
e3c2d243a1 Address compiler warnings on ppc64le and s390x
igzip/igzip_icf_body.c:7:1: warning: type of 'gen_icf_map_lh1' does not match original declaration [-Wlto-type-mismatch]
    7 | gen_icf_map_lh1(struct isal_zstream *, struct deflate_icf *, uint32_t);
      | ^
igzip/igzip_base_aliases.c:177:1: note: return value type mismatch
  177 | gen_icf_map_lh1(struct isal_zstream *stream, struct deflate_icf *matches_icf_lookup,
      | ^
igzip/igzip_base_aliases.c:177:1: note: type 'void' should match type 'uint64_t'
igzip/igzip_base_aliases.c:177:1: note: 'gen_icf_map_lh1' was previously declared here
igzip/igzip_base_aliases.c:177:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used
igzip/igzip_icf_body.c:9:1: warning: type of 'set_long_icf_fg' does not match original declaration [-Wlto-type-mismatch]
    9 | set_long_icf_fg(uint8_t *, uint64_t, uint64_t, struct deflate_icf *);
      | ^
igzip/igzip_base_aliases.c:170:1: note: type mismatch in parameter 2
  170 | set_long_icf_fg(uint8_t *next_in, uint8_t *end_in, struct deflate_icf *match_lookup,
      | ^
igzip/igzip_base_aliases.c:170:1: note: 'set_long_icf_fg' was previously declared here
igzip/igzip_base_aliases.c:170:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used
igzip/igzip_base_aliases.c:62:1: warning: type of 'set_long_icf_fg_base' does not match original declaration [-Wlto-type-mismatch]
   62 | set_long_icf_fg_base(uint8_t *next_in, uint8_t *end_in, struct deflate_icf *match_lookup,
      | ^
igzip/igzip_icf_body.c:34:1: note: type mismatch in parameter 2
   34 | set_long_icf_fg_base(uint8_t *next_in, uint64_t processed, uint64_t input_size,
      | ^
igzip/igzip_icf_body.c:34:1: note: 'set_long_icf_fg_base' was previously declared here
igzip/igzip_icf_body.c:34:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used
igzip/igzip_base_aliases.c:54:1: warning: type of 'adler32_base' does not match original declaration [-Wlto-type-mismatch]
   54 | adler32_base(uint32_t init, const unsigned char *buf, uint64_t len);
      | ^
igzip/adler32_base.c:34:1: note: type mismatch in parameter 3
   34 | adler32_base(uint32_t adler32, uint8_t *start, uint32_t length)
      | ^
igzip/adler32_base.c:34:1: note: type 'uint32_t' should match type 'uint64_t'
igzip/adler32_base.c:34:1: note: 'adler32_base' was previously declared here
igzip/adler32_base.c:34:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2025-01-27 23:01:00 +01:00
Mattias Ellert
c387163fcb Revert soname change
The soname is equal to current minus age.
In version 2.31.0 current is 2 and age is set to 0.
In version 2.31.1 current is 2 and age is set to 1.
This means the soname goes backwards from 2 to 1.
The full library version changes from 2.0.31 to 1.1.31

The soname should not go backwards, so this soname change looks like a
mistake that should be reverted.

The current, revision, age for a library should change in one of three ways:

1) increase current by one, reset revision and age to 0.
2) increase current by one, reset revision to 0 and increase age by 1.
3) increase revision by 1, retain the values of current and age.

1) is for non-backward-compatible changes to the library (changes to or
removals from the old ABI). The soname changes, and applications using
the library must be recompiled.

2) is for when there are ABI additions to the library, but no ABI
changes or removals. Applications compiled against the old version of
the library don't need to be recompiled, and the soname (current minus
age) does not change.

3) is for minor updates with no ABI additions, changes or removals.

The major, minor, patch version of the software project should not be
used as current, revision, age for the library. Especially true for
using the patch version as age, because that means the soname goes
backwards for patch releases as happened here.

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2025-01-08 15:33:59 +00:00
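
The arithmetic in the commit message above can be checked with a small sketch. It assumes the Linux libtool convention, in which the soname major number is current minus age and the full version is `(current - age).age.revision`. The function names are hypothetical, and revision 31 is inferred from the versions quoted in the message.

```python
def linux_library_version(current, revision, age):
    """Map libtool's current:revision:age to (soname major, full version)
    using the Linux convention: major = current - age."""
    if age > current:
        raise ValueError("age must not exceed current")
    major = current - age
    return major, f"{major}.{age}.{revision}"


def bump(current, revision, age, change):
    """Apply one of the three update rules listed in the commit message."""
    if change == "abi_break":      # rule 1: ABI changed or removed
        return current + 1, 0, 0
    if change == "abi_addition":   # rule 2: ABI additions only
        return current + 1, 0, age + 1
    if change == "no_abi_change":  # rule 3: no ABI impact
        return current, revision + 1, age
    raise ValueError(f"unknown change kind: {change}")


print(linux_library_version(2, 31, 0))  # values quoted for 2.31.0
print(linux_library_version(2, 31, 1))  # values quoted for 2.31.1
```

Under rule 2 both current and age grow by one, so their difference, and therefore the soname, stays the same.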
Pablo de Lara
b0f067f94b mem: fix compilation with YASM
Fixes #294.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-01-08 15:32:19 +00:00
Pablo de Lara
28305ade9e Bump version to v2.31.1
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-01-03 10:26:01 +00:00
Pablo de Lara
504fa6721c Update release notes for v2.31.1
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-01-03 10:04:44 +00:00
Taiju Yamada
b1e6ac3c66 Assume pthread on MinGW
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-11-21 15:30:27 +00:00
Taiju Yamada
bd1ce56c43 add mingw CI
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-11-21 15:30:27 +00:00
Taiju Yamada
ae034d6f08 Use _byteswap_ushort etc for WIN32
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-11-21 15:30:27 +00:00
Taiju Yamada
ea1288fc6a Disable hardening build on mingw
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-11-21 15:30:27 +00:00
Marcel Cornu
aaad73e15d workflows: add validation to windows build
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-11-19 17:02:28 +00:00
Cornu, Marcel D
07f8028743 erasure_code: fix unaligned free error in perf apps on windows
Signed-off-by: Cornu, Marcel D <marcel.d.cornu@intel.com>
2024-11-19 14:20:33 +00:00
Cornu, Marcel D
00d6e6fe87 add perf target to windows makefile
Signed-off-by: Cornu, Marcel D <marcel.d.cornu@intel.com>
2024-11-19 14:20:33 +00:00
Marcel Cornu
496255cda6 tools: format source files in parallel
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-06-06 12:34:36 +01:00
Greg Troxel
0231d314f5 Extend FreeBSD conditional about byte ordering to NetBSD
NetBSD has the same byte-ordering idioms as FreeBSD.

Signed-off-by: Greg Troxel <gdt@lexort.com>
2024-06-06 12:33:54 +01:00
Bernd Schubert
dbaf284e11 aarch64_multibinary.h: Fix -Wasm-operand-widths
Compilation with clang gave warnings as per below.
Arm64 has a register width of 64 bits, so these warnings came up.

In file included from igzip/aarch64/igzip_multibinary_aarch64_dispatcher.c:29:
./include/aarch64_multibinary.h:338:35: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
                asm("mrs %0, MIDR_EL1 " : "=r" (id));
                                                ^
./include/aarch64_multibinary.h:338:12: note: use constraint modifier "w"
                asm("mrs %0, MIDR_EL1 " : "=r" (id));
                         ^~
                         %w0
1 warning generated.
In file included from mem/aarch64/mem_aarch64_dispatcher.c:29:
./include/aarch64_multibinary.h:338:35: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
                asm("mrs %0, MIDR_EL1 " : "=r" (id));
                                                ^
./include/aarch64_multibinary.h:338:12: note: use constraint modifier "w"
                asm("mrs %0, MIDR_EL1 " : "=r" (id));
                         ^~
                         %w0
1 warning generated.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
2024-05-31 17:02:19 +01:00
Pablo de Lara
4e898eced6 mem: fix build on FreeBSD
Fix build warnings on FreeBSD, due to unused value.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-05-31 13:30:48 +01:00
Pablo de Lara
7ebc65baa7 igzip: fix build on FreeBSD
Fix build warnings on FreeBSD, due to unused value.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-05-31 13:30:48 +01:00
Pablo de Lara
47b2c5ab15 Makefile: remove duplicated pattern match
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-05-31 13:30:48 +01:00
Marcel Cornu
0234d629a4 clang-format: ignore aarch64_label.h
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-05-03 13:19:17 +01:00
Marcel Cornu
84ad119970 programs: add igzip binary as man page dependency
Required to support parallel builds

Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-05-03 13:19:17 +01:00
Marcel Cornu
75ce489550 workflows: use clang-format-18 to check format
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
9ab5a9e579 tests: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
ae951677ab raid: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
cf6105271a programs: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
aaa78d6a7c mem: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
fa5b8baf84 include: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
55fbfabfc6 igzip: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
9d99f8215d examples: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
300260a4d9 erasure_code: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
671e67b62d crc: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Marcel Cornu
07bca509e7 tools: use clang-format for style checking
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Taiju Yamada
7b30857e20 Run macos-13 (actual x86_64 latest) and macos-14 (arm64) CIs
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-25 15:34:01 +00:00
Taiju Yamada
38279f5e9e Avoid using x18 register
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-25 15:34:01 +00:00
Taiju Yamada
14ec878aae enable macOS extended test
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-21 09:58:33 +00:00
Taiju Yamada
4b74fb2204 tools: replace echo -n with printf
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-21 09:58:33 +00:00
Pablo de Lara
69d4a8a081 Add CI/Coverity/OpenSSF scorecard badges
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-03-19 21:30:50 +00:00
Pablo de Lara
8c2ff41c7f build: allow alternative compiler
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-03-14 13:41:41 +00:00
orbea
37005a00fc tools: fix shebang
This causes a build failure with slibtool.

Gentoo issue: https://bugs.gentoo.org/829500

Signed-off-by: orbea <orbea@riseup.net>
2024-03-12 14:25:16 +00:00
Taiju Yamada
f1b144bbab Fix mach compilation again; fold_constant has to be the same section as crc16_t10dif_copy_pmull
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-07 10:10:51 +00:00
Taiju Yamada
4be96e2437 Fixed isal_deflate_icf_finish_lvl1 dispatcher
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-07 10:10:39 +00:00
Taiju Yamada
f36d1ede78 add libtool dependency for MacOS CI
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-01 11:37:29 +00:00
Colin Ian King
1500db751d Fix a handful of spelling mistakes and typos
There are quite a few spelling mistakes and typos in comments and
user facing message literal strings as found using codespell. Fix
these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-02-06 15:03:14 +00:00
Pablo de Lara
ffc16330d8 makefile: add spellcheck
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-02-06 15:03:14 +00:00
Mattias Ellert
1b1ee1e18f erasure_code: fix wrong return type
erasure_code/ppc64le/gf_vect_mul_vsx.c: In function '_gf_vect_mul_base':
erasure_code/ppc64le/gf_vect_mul_vsx.c:14:16: error: 'return' with a value, in function returning void [-Wreturn-mismatch]
   14 |         return 0;
      |                ^
erasure_code/ppc64le/gf_vect_mul_vsx.c:6:13: note: declared here
    6 | static void _gf_vect_mul_base(int len, unsigned char *a, unsigned char *src,
      |             ^~~~~~~~~~~~~~~~~

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2024-01-23 12:01:14 +00:00
Pablo de Lara
bd22637502 Bump version to v2.31
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-18 18:27:24 +00:00
Pablo de Lara
d4e1c21acb lib: add missing structure documentation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 16:58:43 +00:00
Pablo de Lara
4997190ab3 Update release notes for v2.31
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 16:55:28 +00:00
Greg Tucker
479b3f84f9 build: fix CET default in unix Makefile
CET default flag was clobbering CFLAGS.

Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2024-01-15 16:53:04 +00:00
Pablo de Lara
e0fd782974 erasure_code: use internal gf_vect_mul_base for ppc64le encoding
gf_vect_mul_base is expected to work for all buffer sizes.
However, this function is checking for size alignment to 32 bytes,
to follow the other gf_vect_mul implementations.
Therefore, another implementation for this function is included
inside ppc64le folder to be used by the encoding functions.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 15:48:14 +00:00
Pablo de Lara
b8d5633e51 erasure_code: check for size alignment on powerpc gf_vect_mul_vsx implementation
Follows the rest of the gf_vect_mul implementations for other architectures,
and checks for size alignment, stated in the documentation.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 15:48:14 +00:00
Pablo de Lara
91e7906f3f erasure_code: check for size on gf_vect_mul_sse/avx
gf_vect_mul requires length to be multiple of 32 bytes,
so this check is added in the SSE/AVX implementations.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 13:52:08 +00:00
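
A rough sketch of the length check described above, written in Python rather than SSE/AVX assembly. It assumes GF(2^8) arithmetic with the reduction polynomial 0x11d and a simplified signature that takes the constant directly instead of a precomputed table; neither detail is taken from the commit, and the function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def gf_vect_mul_sketch(length, constant, src, dest):
    """Multiply `length` bytes of `src` by `constant` into `dest`.
    Returns 0 on success, or -1 if `length` is not a multiple of 32."""
    if length % 32 != 0:
        return -1  # reject the size before touching any buffer
    for i in range(length):
        dest[i] = gf_mul(constant, src[i])
    return 0


src = bytes(range(32))
dest = bytearray(32)
status = gf_vect_mul_sketch(32, 2, src, dest)
```

Rejecting the size up front gives every implementation the same contract, so a caller can rely on the return value regardless of which one is dispatched.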
liuqinfei
275977156d gf_vect_mul_sve: fix error and enable unit tests for aarch64
Signed-off-by: liuqinfei <lucas.liuqinfei@huawei.com>
2024-01-12 15:18:37 +00:00
Pablo de Lara
e0fffbe48b erasure_code: disable unit tests temporarily for aarch64/ppc64le
Some aarch64 and ppc64le implementations of gf_vect_mul do not check
for invalid sizes, so the unit test checking for negative return value
from this function is disabled temporarily on these architectures.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-10 15:53:14 +00:00
Pablo de Lara
7145c7f8b4 Makefile: add architecture to CFLAGS
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-10 15:53:14 +00:00
Pablo de Lara
455fdded4e erasure_code: add missing aarch64 and powerpc interface for ec_init_tables
ec_init_tables is now a multi-implementation function,
so it requires a dispatcher for all architectures.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-09 13:38:43 +00:00
Pablo de Lara
ae0a688051 Update License file
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-09 09:42:48 +00:00
Tomasz Kantecki
75af1c4d4e build: detect availability of -z now, relro and noexecstack linker options
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2024-01-05 14:45:12 +00:00
Pablo de Lara
71575ae434 raid: [example] fix memory leak in CRC64 example
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-22 09:35:37 +00:00
Pablo de Lara
9ee34ec0f5 crc: use macro to print 64-bit value
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-22 09:35:37 +00:00
Pablo de Lara
cf967e5a37 README: add section for DLL injection attack mitigations
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-21 16:05:52 +00:00
Pablo de Lara
29d99fce26 igzip: add zlib header init function
Add isal_zlib_hdr_init() function to initialize
the isal_zlib_header structure to all 0.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-20 14:05:52 +00:00
Tomasz Kantecki
6ef2abe80e igzip: fix issues reported by static code analysis
compute_dist_code() and compute_dist_icf_code() in huffman.h:
    Correct `assert(msb >= 1)` to `assert(msb >= 2)`.
    `msb` cannot be lower than 2 as it would result in corrupt computations.

get_dist_code() in huffman.h:
    Remove dead `if` statement at the beginning of the function.
    `dist` must be equal to 1 or above in this function.

Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-19 20:36:39 +00:00
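
To see why a lower bound on `msb` matters in a distance-code computation like the one above, here is a sketch that follows the RFC 1951 distance table. It is not ISA-L's code: it assumes `msb` means the 1-based position of the highest set bit of `dist - 1`, and the function name is hypothetical.

```python
def deflate_dist_code(dist):
    """Return (code, num_extra_bits, extra_bits) for a DEFLATE match
    distance in 1..32768, following the RFC 1951 distance table."""
    if not 1 <= dist <= 32768:
        raise ValueError("distance out of range")
    d = dist - 1
    if d < 2:
        # Distances 1 and 2 map directly to codes 0 and 1.
        return d, 0, 0
    msb = d.bit_length()  # 1-based index of the highest set bit
    assert msb >= 2       # smaller values would make the shift count negative
    num_extra_bits = msb - 2
    extra_bits = d & ((1 << num_extra_bits) - 1)
    code = (d >> num_extra_bits) + 2 * num_extra_bits
    return code, num_extra_bits, extra_bits


for dist in (1, 4, 5, 6, 9, 24577, 32768):
    print(dist, deflate_dist_code(dist))
```

With `msb` below 2 the expression `msb - 2` goes negative, so the mask and the shift no longer describe a valid number of extra bits.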
Tomasz Kantecki
402bd4f773 erasure_code: various fixes for static code analysis issues
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-19 20:36:39 +00:00
Tomasz Kantecki
ac2ee91cdb mem_zero_detect_test: fix for issue reported by static code analysis
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-19 20:36:39 +00:00
Tomasz Kantecki
5a00eaec33 igzip: several fixes for issues reported by static code analysis
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-19 20:36:39 +00:00
Pablo de Lara
c83771eeec mem: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Pablo de Lara
a3e260436a erasure_code: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Pablo de Lara
abd80d3c5a erasure_code: check for size in gf_Xvect_mad_avx512_gfni
Length of data was not checked in implementation with AVX512+GFNI,
at the start of the gf_Xvect_mad_avx512_gfni functions, resulting
in buffer overflow if length was less than 64 bytes.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
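The kind of guard described in abd80d3c5a can be sketched in plain C. This is only an illustration under stated assumptions: `xor_blocks_sketch` and `BLOCK_SIZE` are hypothetical names, and a byte-wise XOR stands in for the GF(2^8) multiply-accumulate done by the real AVX512+GFNI kernels. The point is the early return before any 64-byte block is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64 /* bytes handled per iteration, one ZMM register wide */

/* XOR src into dest in whole 64-byte blocks.
 * Returns the number of bytes processed; the caller handles any tail. */
size_t xor_blocks_sketch(size_t len, const uint8_t *src, uint8_t *dest)
{
	size_t pos = 0;

	assert(src != NULL && dest != NULL);

	/* Length check up front: never start a block that does not fit. */
	if (len < BLOCK_SIZE)
		return 0;

	while (len - pos >= BLOCK_SIZE) {
		for (size_t i = 0; i < BLOCK_SIZE; i++)
			dest[pos + i] ^= src[pos + i];
		pos += BLOCK_SIZE;
	}
	return pos;
}
```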
Pablo de Lara
c06db0c60a igzip: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Pablo de Lara
d65d2b5572 crc: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Pablo de Lara
54d1153a61 raid: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Tomasz Kantecki
c183961175 build: enable full read-only relocations and control flow integrity for hardening check
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-18 10:47:23 +00:00
Tomasz Kantecki
809f536265 igzip_cli: add missing 'void' keyword to some function prototypes
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-18 10:47:23 +00:00
Marcel Cornu
561a419bc8 erasure_code: fix modules using incorrect unsigned jump
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
a53a20ea2a erasure_code: add AVX2 5vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
47ed2847af erasure_code: add AVX2 4vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
22b7f33d68 erasure_code: add AVX2 3vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Tomasz Kantecki
a139dd7302 igzip_cli: improve get_posix_filetime() to deal with potential fstat() errors
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-14 13:55:52 +00:00
Tomasz Kantecki
08f021c43f igzip_cli: fix for potential buffer overrun on 'outfile_name' buffer with strncat()
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-14 13:55:52 +00:00
Tomasz Kantecki
722144ee75 igzip_cli: simplify fopen_safe() by replacing access() calls with detailed error message after failed fopen()
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-14 13:55:52 +00:00
Tomasz Kantecki
0e6bc4a5a1 igzip: zero flags field in isal_gzip_header_init()
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-14 13:55:52 +00:00
Marcel Cornu
d22bb198f3 erasure_code: optimize AVX2-GFNI single vector mad implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-13 17:03:16 +00:00
Marcel Cornu
a0a149d674 erasure_code: add AVX2 2vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-13 17:03:16 +00:00
Marcel Cornu
0052080f53 erasure_code: optimize AVX2 GFNI 2 vector dot product
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
3f87141d03 erasure_code: optimize AVX2 GFNI single vector dot product
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
164d9ff1f0 erasure_code: add 2 vector AVX2 dot product with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Pablo de Lara
f82746491e tools: check code style first
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-11 15:23:56 +00:00
Pablo de Lara
8f2634aeac raid: remove unneeded load
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-11 09:31:11 +00:00
Pablo de Lara
5d6092c832 raid: optimize final parity check
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-11 09:31:11 +00:00
Pablo de Lara
bf8f2a25ba raid: fix function descriptions
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-11 09:31:11 +00:00
Marcel Cornu
307d737bf2 erasure_code: add 3 vector AVX2 dot product with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-07 14:01:18 +00:00
Pablo de Lara
4203d9628c igzip: fix null-terminated string setting
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-07 13:34:21 +00:00
Pablo de Lara
4a4635e8db igzip: remove unneeded check
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-07 13:34:21 +00:00
Pablo de Lara
02aa005c2d igzip: fix return value in wrapper header test
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-07 13:34:21 +00:00
Pablo de Lara
6dc544c661 Ignore obsolete warnings when using autoreconf
A few macros are declared obsolete from autoconf 2.70.
To avoid breaking compatibility with 2.69 by removing them,
these warnings are ignored.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-07 13:29:03 +00:00
Pablo de Lara
7e2b097f15 igzip: fix build warnings on Windows
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-07 13:23:09 +00:00
Pablo de Lara
6188bf7b2f crc: fix build warnings on Windows
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-07 13:23:09 +00:00
Pablo de Lara
df073be348 tools: allow testing on multiple architectures with Intel SDE
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-01 14:33:29 +00:00
Pablo de Lara
2ca781df19 lib: reduce verbosity by default in tests
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-01 14:33:29 +00:00
Marcel Cornu
5f23c03415 erasure_code: add initial AVX2 mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
447d9af75b erasure_code: add initial AVX2 dot product with GFNI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Marcel Cornu
637f5a631d include: add memcpy asm module
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Marcel Cornu
bc34d87427 erasure_code: update GF_MUL_XOR macro to support VEX encoding
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
c8dd92f04a lib: add new interface supporting AVX2 with GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
f971f02309 erasure_code: expose base implementation of init_tables
Expose ec_init_tables_base(), which should be used
with ec_encode_data_base() and ec_encode_data_update_base().

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
65e89717df erasure_code: implement EC update with AVX512 + GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
1eff12dddb erasure_code: implement EC with AVX512 + GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
9d487fd6db erasure_code: [perf] get parameters for number of buffers
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
07af4032ff erasure_code: fix stack allocation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
801df41929 erasure_code: fix vmovdqa instruction
vmovdqa needs to be vmovdqa32/64 when used on ZMMs (EVEX encoded).

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
34463cb663 ci: build with EC_ALIGNED_ADDR and NO_NT_LDST
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
e2acfbfe78 igzip: fix build warning
Fix the following build issue by initializing look_back_dist to 0.

igzip/igzip_inflate.c: In function ‘decode_huffman_code_block_stateless_base’:
igzip/igzip_inflate.c:1727:36:
 warning: ‘look_back_dist’ may be used uninitialized  [-Wmaybe-uninitialized]

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-15 13:46:52 +00:00
Pablo de Lara
acbe0deecf crc: fix build with NASM 2.14
Fix following compilation error
crc/crc32_iscsi_by16_10.s:408: error: invalid combination of opcode and operands

Fixes #257.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-15 13:42:00 +00:00
liuqinfei
4815174a68 crc: optimize by supporting arm xor fusion feature
Arrange the two xor instructions according to the specified
paradigm, then the two xor instructions can be fused to execute
which can save one issue slot and one execution latency.

Change-Id: Ic64bcfe569b2468e4dc9c13d073d367cc81fd937
Signed-off-by: liuqinfei <lucas.liuqinfei@huawei.com>
2023-08-18 07:53:59 +00:00
Pablo de Lara
f534a5c6a9 crc: fold 64 bytes of data if possible
When less than 256 bytes of data are left, fold data
in steps of 64 bytes, instead of 16 bytes, if there is enough
data.

Change-Id: I47d7cacdd1ba620078df528136945695c338db6d
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-08-17 11:54:24 +01:00
Pablo de Lara
beab678fb8 crc: optimize last bytes
Change-Id: I4b8f73b23eb50c4c50ca65fab19716f217fe5780
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-08-17 11:54:20 +01:00
Greg Tucker
e53db85631 doc: Add notes on reentrancy and threading
Fixes #249

Change-Id: Id56464436aeeb2c16bab2cbc0efeb4fded80dc4f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2023-07-19 13:11:43 -07:00
Pablo de Lara
e1e0df6c7e Update README.md
libtool is required for the autotool build; otherwise you will get the "error: possibly undefined macro: AC_PROG_LD" error message

Change-Id: Ifa4d8fd48dba6714246390aadedaecb844c206c9
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-07-06 09:41:56 +01:00
Pablo de Lara
2bbce31943 crc: add CRC64 rocksoft implementation
- Added reference implementation
- Added base implementation
- Added functional and performance tests

Change-Id: I60c5097bd5fb89ee7a50910e71d449d50d155d0a
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2023-05-08 12:37:44 +00:00
Pablo de Lara
16056ff4e4 crc: refactor SSE CRC64 implementations to use common code
Change-Id: I2d141f2ccd12ab338783e50736e36ed4aeb11f7f
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-05-08 12:37:44 +00:00
Pablo de Lara
22d33cf795 crc: use k-mask to load final bytes of data
Change-Id: Ibd8d2144bc6942e11911e25a6365c1cb108af477
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-05-08 12:37:23 +00:00
Greg Tucker
9f2b68f057 igzip: Add precautionary reset hist_bits on stateless_init
The zstate.hist_bits is an option and shouldn't be set randomly by a
stateless deflate run but, as with level, we may set it anyway.

Change-Id: I37d3b51863d4697e964d45a482ddd526f40a0902
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2023-03-14 17:26:58 -07:00
Greg Tucker
33a2d94845 doc: Updates and info on crc combine
Change-Id: Ibe5d6c61e73a03e7ff1840ca0335ada3657eaf00
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2023-02-10 17:44:52 -07:00
Greg Tucker
4cbd285861 ex: Add crc combine example for multiple polynomials
Change-Id: I55b6585f768877cffe1cbe16802456c8a12aea28
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2023-02-10 15:41:35 -07:00
Taiju Yamada
ad39d7ccfd Include hwcap.h only in C compilation
Change-Id: I08a75896ebd49634f31a80ed37acf2a1267fe156
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2022-12-08 14:10:30 -07:00
Greg Tucker
c2bec3ea65 crc: Use ternlog in by16 avx512 loop
Ternlog has additional benefit in by16 crc main loop in both reflected
and non-reflected polynomial crcs. Some arch see 4-7% improvement.
Revisited on suggestion by Nicola Torracca.

Change-Id: I806266a7080168cf33409634983e254a291a0795
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-11-02 12:16:20 -07:00
Greg Tucker
fec429e1b9 build: Add top-level read-only permissions to ci actions
This is recommended by ossf scorecard:
https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions

Change-Id: I48a36cc6625fa3f1e6babb9edbe81c9522f41a13
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-10-28 17:05:01 -07:00
Taiju Yamada
1187583a97 Fixes for aarch64 mac
- It should be fine to enable pmull always on Apple Silicon
- macOS 12+ is required for PMULL instruction.
- Changed the conditional macro to __APPLE__
- Rewritten dispatcher using sysctlbyname
- Use __USER_LABEL_PREFIX__
- Use __TEXT,__const as readonly section
- use ASM_DEF_RODATA macro
- fix func decl

Change-Id: I800593f21085d8187b480c8bb3ab2bd70c4a6974
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2022-10-28 08:27:26 -07:00
Surendar Chandra
85716fe2fe Correct loop bounds check in aarch64 gf_vect_mul
Prior to this change, a missing loop bounds check in the aarch64
version of gf_vect_mul would cause the routine to return 1 (error)
in the normal case.

This change introduces a check and branch to "return_pass" (success), and
also adds checks of the return code of gf_vect_mul to the supplied unit
test; it was previously ignored.

Change-Id: I9f7fe0014189b24f9600e0473ee02b5316c2da91
Signed-off-by: Surendar Chandra <vsurench@amazon.com>
2022-10-27 15:30:00 -07:00
Pawel Piatek
b6e96427d2 Use gindent on FreeBSD
Also add workaround for GNU indent bug.

Signed-off-by: Pawel Piatek <pawelx.piatek@intel.com>
Change-Id: I9478a06dc17675c858030cfe15552609fef021da
2022-10-11 12:30:53 +02:00
Greg Tucker
04f3125ea0 test: Move perf routine output from stack to heap
Large cold perf tests were allocating more than the allowed stack size.

Change-Id: I2c54f36ac6b42b359078dae7fffa5ce0b6d4890a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-08-08 15:19:03 -07:00
Greg Tucker
9c7e3b9f22 test: Change perf tests to warm by default
The cold versions of tests depended on a fixed size of last level
cache that is too low on some arch and too high for the total
available memory on others.

Change-Id: Iee98403f9ace02e01b810c296a5fe44b933bfb17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-08-03 16:35:55 -07:00
Greg Tucker
2bcbaf4c39 doc: Add security policy file
Change-Id: Id5703011c296bd79b57ce2342b3bc25f82c6bd99
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-07-18 19:53:53 -07:00
Greg Tucker
9f75defd57 Remove all slver legacy segments
The relic slver is no longer used for individual versioning
on functions and is confusing tools looking for data in text
sections. This removes all instances instead of fixing since
its usefulness is waning. Fixes #221

Change-Id: Ife0b9f105950a90337c58e8a41ac2cffc0f67d99
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-07-14 19:23:52 -07:00
Greg Tucker
62519d97ec build: Remove ms link flag for msvcrt
The cflag to link with dynamic msvcrt /MD is not necessary and causes
warnings when static linking.  Fixes #219

Change-Id: I0085d468afc4acbe323b0783cbbc6760b4c70704
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-07-11 16:16:07 -07:00
Martin Oliveira
8b7c1b80b2 igzip: fix neon adler32 load beyond buffer end
In the adler32_neon function, during the last iteration of the
loop through "accum32_neon", we would load data after the end of the
buffer (in the ld1 instruction, the "start" register points to the end
of the buffer).

If this memory is unmapped, this would cause a segfault. If the memory
is mapped, the checksum would be correct because that value would
only be used in the next iteration, but this happens during the last
iteration.

To fix this, we can simply do the load before incrementing "start". And
while we're at it, we can load directly into d0_v/d1_v, saving a couple
of mov's.

Finally, the ld1 done during the function initialization can be removed
as the values aren't used for anything.

Change-Id: I4a0f2811adc523852ebe774da0a6fb1f5419192f
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
2022-04-25 15:36:37 -07:00
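The load-before-increment idea in 8b7c1b80b2 can be shown with a scalar Adler-32 sketch. This is not the NEON routine: `adler32_sketch`, `ADLER_MOD` and `CHUNK` are names chosen for illustration. Each chunk is read inside the iteration that consumes it, and the pointer advances only afterwards, so nothing at or past the end of the buffer is accessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */
#define CHUNK 16u        /* bytes consumed per iteration */

uint32_t adler32_sketch(uint32_t init, const uint8_t *buf, size_t len)
{
	uint32_t a = init & 0xffff;
	uint32_t b = init >> 16;
	const uint8_t *start = buf;
	const uint8_t *end = buf + len;

	while (start < end) {
		size_t n = (size_t)(end - start);

		if (n > CHUNK)
			n = CHUNK;
		/* Read this chunk now; advance `start` only afterwards. */
		for (size_t i = 0; i < n; i++) {
			a += start[i];
			b += a;
		}
		start += n;
		a %= ADLER_MOD;
		b %= ADLER_MOD;
	}
	return (b << 16) | a;
}
```

Calling it with an initial value of 1 gives the conventional Adler-32 of the buffer.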
ZhaiMo
5b1a519ffc change some logic in compress_icf_map_g
Change-Id: Ibb59058b6d826e03833c53839613e54c3d2003a8
Signed-off-by: ZhaiMo <zhaimo14@mails.ucas.ac.cn>
2022-04-13 17:20:05 +00:00
Chunsong Feng
e297ecae7a crc16: Accelerate T10DIF performance with prefetch and pmull2
The memory block size calculated by t10dif is generally 512 bytes in
sectors. Prefetching can effectively reduce cache misses. Use ldp instead
of ldr to reduce the number of instructions; pmull+pmull2 can reduce
register access. The perf test result shows that the performance is
improved by 5x ~ 14x after optimization.

Change-Id: Ibd3f08036b6a45443ffc15f808fd3b467294c283
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
2022-03-31 09:58:04 -07:00
Greg Tucker
ad8dce15c6 doc: Add function overview and usage page
While the external headers define the API, we could really use this
overview to get users started and point them to examples.

Change-Id: Iba419e61d0d7723e1029a3b6e7259facfeb39522
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-02-15 16:59:31 -07:00
H.J. Lu
57846f414f Properly add .note.gnu.property section to assembly codes
1. Revert "x86: Generate .note.gnu.property section for ELF output"

This reverts commit 8074e3fe1b9398a9d3b717267790050fc5041594, which is
a hack to work around the old nasm which doesn't support

section .note.gnu.property  note  alloc noexec align=8

This hack doesn't work for downstream, like:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=2040091

2. If Intel CET is enabled, require nasm with note section support to
add

section .note.gnu.property  note  alloc noexec align=N

to assembly codes.

Verified with

$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8

on Tiger Lake.

Change-Id: I6d66fe6fd054420d7fde35b1508ca9f09defdeca
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2022-01-20 12:23:30 -07:00
Nicola Torracca
e3783f28f8 Add AVX512 implementation of mem_zero_detect().
Change-Id: I60fe0846d783787198b6a44a090fd9fe17c1807f
Signed-off-by: Nicola Torracca <shark@bitchx.it>
2022-01-04 12:25:23 -07:00
Ilya Leoshkevich
d3cfb2fb77 Fix s390 build
The goal of this patch is to make isa-l testsuite pass on s390 with
minimal changes to the library. The one and only reason isa-l does not
work on s390 at the moment is that s390 is big-endian, and isa-l
assumes little-endian at a lot of places.

There are two flavors of this: loading/storing integers from/to
memory, and overlapping structs. Loads/stores are already helpfully
wrapped by unaligned.h header, so replace the functions there with
endianness-aware variants. Solve struct member overlap by reversing
their order on big-endian.

Also, fix a couple of usages of uninitialized memory in the testsuite
(found with MemorySanitizer).

Fixes s390x part of #188.

Change-Id: Iaf14a113bd266900192cc8b44212f8a47a8c7753
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-01-04 11:06:17 -07:00
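The endianness-aware load/store approach described in d3cfb2fb77 can be sketched as follows. The function names are hypothetical and are not claimed to match those in unaligned.h. The value is assembled byte by byte, so the result is the same on little-endian and big-endian hosts such as s390x.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from memory, independent of host order. */
uint32_t load_le32_sketch(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value to memory in little-endian byte order. */
void store_le32_sketch(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Byte-wise access also avoids any alignment requirement on the pointer.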
Guodong Xu
3b3d7cc47b Enable SVE in ISA-L erasure code for aarch64
This patch adds Arm (aarch64) SVE [1] variable-length vector assembly support
into ISA-L erasure code library. "Arm designed the Scalable Vector Extension
(SVE) as a next-generation SIMD extension to AArch64. SVE allows flexible
vector length implementations with a range of possible values in CPU
implementations. The vector length can vary from a minimum of 128 bits up to
a maximum of 2048 bits, at 128-bit increments. The SVE design guarantees
that the same application can run on different implementations that support
SVE, without the need to recompile the code. " [3]

Test method:
 - This patch was tested on Fujitsu's A64FX [2], and it passed all erasure
     code related test cases, including "make checks" , "make test", and
     "make perf".
 - To ensure code testing coverage, parameters in files (erasure_code/
     erasure_code_test.c , erasure_code_update_test.c and gf_vect_mad_test.c)
     are modified to cover all _vect versions of _mad_sve() / _dot_prod_sve()
     routines.

Performance improvements over NEON:
In general, SVE benchmarks (bandwidth in MB/s) are 40% ~ 100% higher than NEON
when running _cold style (data uncached and pulled from memory) perfs. This
includes routines of dot_prod, mad, and mul.

Optimization points:
This patch was tuned for the best performance on A64FX. Tuning points being
touched in this patch include:
1) Data prefetch into L2 cache before loading. See _sve.S files.
2) Instruction sequence orchestration. Such as interleaving every two
     'ld1b/st1b' instructions with other instructions. See _sve.S files.
3) To improve dest vectors parallelism, in highlevel, running
     gf_4vect_dot_prod_sve twice is better than running gf_8vect_dot_prod_sve()
     once, and it's also better than running _7vect + _vect, _6vect + _2vect,
     and _5vect + _3vect. A similar idea is applied to improve 11 ~ 9 dest
     vectors dot product computing as well. The related change can be found
     in ec_encode_data_sve() of file:
     erasure_code/aarch64/ec_aarch64_highlevel_func.c

Notes:
1) About vector length: A64FX has a vector register length of 512bit. However,
     this patchset was written with variable length assembly so it works
     automatically on aarch64 machines with any SVE vector length,
     such as SVE-128, SVE-256, etc.
2) About optimization: Due to differences in microarchitecture and
     cache/memory design, to achieve optimum performance on SVE capable CPUs
     other than A64FX, it is considered necessary to do microarchitecture-level
     tunings on these CPUs.

[1] Introduction to SVE - Arm Developer.
      https://developer.arm.com/documentation/102476/latest/
[2] FUJITSU Processor A64FX.
      https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
[3] Introducing SVE.
      https://developer.arm.com/documentation/102476/0001/Introducing-SVE

Change-Id: If49eb8a956154d799dcda0ba4c9c6d979f5064a9
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2022-01-04 10:54:38 -07:00
Greg Tucker
642ef36874 Fix check signoff for github actions
Github actions checkout changed to pull only a single generated merge
commit instead of the actual PR commit id. This breaks check_format
test for signoff. Pulling history of 2 will include the actual commit
ID.

Change-Id: I7d83871159d24faaf2f8e6086f12173e14cbcf3c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-12-30 16:49:27 -07:00
John Zhang
0de83dbff7 add help2man as optional package
Change-Id: Id01a6d0fa77d5ec4959c2e9d9b0d6c3390cd43be
Signed-off-by: John Zhang <zsgsdesign@gmail.com>
2021-11-29 10:17:52 -07:00
Ruben Vorderman
78f5c31e66 Create github CI yaml file
This file automatically triggers testing on github actions.

Change-Id: I23848f2dca925e0c96e64f7d655f32b83498bed1
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-10-29 17:06:36 -07:00
Ruben Vorderman
fd83ed1924 Add -arch to unsupported arguments in [ny]asm-filters
Change-Id: Ieb53bb225815e204482e74bb383f1b61f12dabfd
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-10-12 15:53:32 -07:00
Greg Tucker
6d17992b6d mem: Add small allocs into test to help mem checkers
Change-Id: I6de3951ff66a715d8b1c0f36d691cb60e8396139
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-10-04 11:01:33 -07:00
Greg Tucker
87908c9060 mem: Move new mem_zero_detect function to avx2
New mem_zero_detect function will fail on avx only machines.

Change-Id: I3bca49bff886f9c130c89e8c74b31110e9bac76b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-09-30 17:47:57 -07:00
Nicola Torracca
0e65117138 mem_zero_detect_avx: OR multiple vector and test for non zero on the result
Micro-optimizations: vpcmpeqb+vpmaskmov is faster than vptest according
to uops.info; make usually untaken branches target forward.
Reduce the number of data-dependent branches and code size.

Change-Id: Ie70b4bc99685368e5131f23344348bfaf7c27d3e
Signed-off-by: Nicola Torracca <shark@bitchx.it>
2021-09-30 16:55:30 -07:00
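The "OR several vectors, then test once" idea from 0e65117138 can be sketched in portable C, with 64-bit words standing in for AVX registers. `zero_detect_sketch` is a hypothetical name; following the convention assumed here, it returns 0 when the whole region is zero and 1 otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

int zero_detect_sketch(const void *mem, size_t len)
{
	const uint8_t *p = mem;
	uint8_t tail = 0;

	assert(mem != NULL || len == 0);

	while (len >= 4 * sizeof(uint64_t)) {
		uint64_t w[4];

		memcpy(w, p, sizeof(w)); /* unaligned-safe load of four words */
		/* One branch per four words instead of one branch per word. */
		if ((w[0] | w[1] | w[2] | w[3]) != 0)
			return 1;
		p += sizeof(w);
		len -= sizeof(w);
	}

	/* Remaining bytes: accumulate with OR, test once at the end. */
	while (len-- > 0)
		tail |= *p++;

	return tail != 0;
}
```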
Greg Tucker
f980b36655 build: Change include shortcut D to not conflict with env
The variable D= can be used to quickly add defines. This sets a null
default so it can only be overridden by the make command line.

fixes #184

Change-Id: I84615174547f36208d6d577c1e30b6fac83139b3
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-09-14 19:18:31 -07:00
Taiju Yamada
998e03bf95 Strip -isysroot and related flags from asm-filter
This helps python-isal compatibility.

Change-Id: I8a2540e330f229f65903bdb2cc47aceeb0724dc5
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2021-09-13 10:02:38 -07:00
Greg Tucker
066940a9a7 build: Add ms rc file to put extra metadata on dll
Change-Id: Idf687c6b2f8d1dea203f01bf57c5158d19ed519e
Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-09-02 18:27:51 -07:00
Ruben Vorderman
908726e255 More prominently feature language bindings and igzip
Change-Id: Ief814eeb6d24f16d822e22327f40756ffba05869
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-08-24 18:22:54 -07:00
Ruben Vorderman
94ec6026ce Create headers based on compression parameters.
Instead of using a constant as default zlib header, create the header on the fly. Both zlib
header bytes depend on the wbits and compression level used.
Make sure that ISA-L compression level 0 is advertised as the fastest compression in
both the gzip header (setting xfl flag to 0x04) and the zlib header (as 0, fastest, other levels are 1, fast).

Change-Id: I1f30e4397a0f5fcf6df593c40178e7d6f6c05328
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-08-23 09:48:10 -07:00
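How the two zlib header bytes mentioned in 94ec6026ce depend on the window bits and the advertised level can be sketched from RFC 1950. `zlib_header_sketch` is a hypothetical helper, not the library's function. CMF carries the compression method (8 for deflate) and the window size; FLG carries the 2-bit level hint plus check bits that make the 16-bit header a multiple of 31.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 2-byte zlib header for a given window size and level hint.
 * wbits: log2 of the window size (8..15); flevel: 0 fastest .. 3 maximum. */
void zlib_header_sketch(uint32_t wbits, uint32_t flevel, uint8_t hdr[2])
{
	uint32_t cmf, flg;

	assert(wbits >= 8 && wbits <= 15);
	assert(flevel <= 3);

	cmf = ((wbits - 8) << 4) | 8; /* CINFO in high nibble, CM = 8 */
	flg = flevel << 6;            /* FLEVEL in top two bits, FDICT = 0 */
	/* FCHECK: make (CMF * 256 + FLG) divisible by 31. */
	flg |= (31 - ((cmf << 8) | flg) % 31) % 31;

	hdr[0] = (uint8_t)cmf;
	hdr[1] = (uint8_t)flg;
}
```

With a 32 KiB window, level hint 0 (fastest) yields the bytes 0x78 0x01 and level hint 1 (fast) yields 0x78 0x5e.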
Greg Tucker
1db0363c49 igzip: Add compress-decompress with dictionary to perf test
Change-Id: Ic396819537f5437e6aab3ebf5d023ed5cdbe852a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-14 15:55:39 -07:00
Greg Tucker
112dd72c01 build: Remove unneeded file types.h
The file types.h has long been misnamed and overlaps with
functionality in the test helper routines.

Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-10 09:35:43 -07:00
Greg Tucker
cfdd3497d1 perf: Remove unneeded time include
Timing functions are made os-independent with test.h include.

Change-Id: Iab7d6325254d5c32263504efc756dbbe51d77153
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-09 18:33:57 -07:00
Greg Tucker
d5928e3760 build: Fix missing ms function export
Windows def file was missing an exported ec support function.
Also added path in nmake file to build extra examples.

Change-Id: I59ac1599dcb8cdb45077347c74b57aeca4751c35
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-07 18:30:08 -07:00
Greg Tucker
628f4e91ea ex: Add makefile to build examples from installed lib
Change-Id: I10a51dfe90e0672bb33348de241a5be91c9caa37
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-03 17:53:20 -07:00
Greg Tucker
0f7bf1c04d doc: Update minimum nasm recommendation and details
Change-Id: Icb113242c0ab7f3c75af3e65a8d519511f4ed4c3
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-27 17:44:31 -07:00
Greg Tucker
393f69fcac build: Change travis osx to use std brew
The osx brew and older linux targets are failing the update.
This removes the older linux builds and changes the osx to
take the latest brew that comes with the image instead of
doing a brew update on every build.

Change-Id: Ib1543296a733875c9eff798326b0d45854153923
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-21 19:44:39 -07:00
Greg Tucker
240ca46ffb build: Change mingw to nasm by default
Change-Id: I80053b8cf62f5f2ef7c12661086e9aeaf2eea573
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-21 19:44:39 -07:00
Greg Tucker
d7bac36be4 crc: Fix warning in perf test from uninitialized tmp ptr
Both gcc and clang are showing a warning on this despite the buffer
always being set before use.

Change-Id: I0e8f6b9e3451efe69e49814abc883d49b04f2666
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-20 11:57:56 -07:00
Greg Tucker
fe4b7f9acc Add toplevel header gen in windows
Change-Id: I3a1e5fc495266d8ba223d75384625e22c3cf66fe
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-06 16:44:10 -07:00
Greg Tucker
2c705a26cb raid: Fix doc and base functions for min sources
The raid functions xor_gen, pq_gen and check functions
must have at least two sources. Fixes #175

Change-Id: I2e4509e037c2b1dc88f3f7449d80f4c763e1e124
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-04-26 16:23:58 -07:00
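The minimum-sources rule from 2c705a26cb can be sketched with a plain C XOR parity routine. `xor_gen_sketch` is a hypothetical name. The pointer layout (sources first, destination last) and the return convention (0 on success, 1 on invalid arguments) are assumptions for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* array[0 .. vects-2] are sources, array[vects-1] is the parity destination. */
int xor_gen_sketch(int vects, int len, void **array)
{
	uint8_t *dest;

	/* Two sources plus one destination is the smallest valid set. */
	if (vects < 3)
		return 1;

	dest = array[vects - 1];
	for (int i = 0; i < len; i++) {
		uint8_t parity = 0;

		for (int j = 0; j < vects - 1; j++)
			parity ^= ((const uint8_t *)array[j])[i];
		dest[i] = parity;
	}
	return 0;
}
```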
Greg Tucker
ebb78fc99e build: Fix warning from inconsistency in gnu make
Make changed the interpretation of escaped # in a quote causing
warnings in the test for pthreads.

Change-Id: Ice94116713aea3c3e9725b38232e03f53d6633cc
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-03-03 10:26:06 -07:00
luo rixin
bee5180a15 erasure_code: Fix text relocation on aarch64
Here is the bug report on ceph. https://tracker.ceph.com/issues/48681

Change-Id: Ie1c60a71f28c1a169c8899a621be9bb455f5e244
Signed-off-by: luo rixin <luorixin@huawei.com>
2021-01-08 15:23:15 -07:00
Jerry Yu
bc8b2aef55 Fix clang build fail
Author of this patch is Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
Re-organized by Jerry Yu <jerry.h.yu@arm.com>

Clang version must be later than 9.x according to https://reviews.llvm.org/D61719

Change-Id: I7516cca17ef4556b828fb6ecfa755e6451052359
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-12-09 14:37:55 +08:00
Greg Tucker
600d8d8f77 build: Update fuzz tests for deprecated clang args
Clang has deprecated the option -fsanitize-coverage=trace-pc-guard
for use with fuzzing.

Change-Id: I7fe5da0f57ab44110208d098858b786450a0a5e7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-12-04 15:04:02 -07:00
Greg Tucker
2df39cf5f1 build: Bump revision to 2.30
Change-Id: If6d696ee76f3949d3cf5aff34403df65bce2c6b9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-06 18:08:16 -07:00
Greg Tucker
05f6a0bb39 Update release notes for v2.30 additions
Change-Id: Icbb1faa2b67d8d18b1c7cde9f09774ebd895a6df
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-04 14:59:33 -07:00
Greg Tucker
ece814e912 doc: Add details on build and test
Change-Id: I58401ed26ba8a0a7fad0191b4c1bbb461d0311e6
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-04 12:40:08 -07:00
Greg Tucker
dca9dd221e igzip: Use unaligned load on static header to fix usan
Clang with the sanitizer enabled was flagging the cast of the static header.
Switching to the uload64 macro as a better general solution.

Change-Id: I495d440407bb1773841e2f7cdc48bd95fc1a2df4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-04 12:40:08 -07:00
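The general technique behind dca9dd221e, replacing a pointer cast with an unaligned-safe load, can be sketched with `memcpy`. `load_u64_sketch` is a hypothetical function and is not claimed to be how the uload64 macro is defined. Compilers typically reduce the fixed-size copy to a single load, while sanitizers no longer see a misaligned access through a cast pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 64-bit value from an address with no alignment guarantee. */
uint64_t load_u64_sketch(const void *p)
{
	uint64_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v)); /* byte copy, valid at any alignment */
	return v;
}
```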
Greg Tucker
269df8a67d igzip: Fix order of args check in new dictionary function
In the newly added function isal_deflate_process_dict(), a null check
was added to the dictionary struct but was ineffectual because of the
order.

Change-Id: I3b3e70997210794de102b1348e1467295871cee2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-03 08:50:30 -07:00
Greg Tucker
24a98e3e87 Fix missing files in extra dist
Change-Id: I83e62344fab72afd755453d4eb43e9c236ba2b86
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-28 17:43:53 -07:00
Greg Tucker
79143208ac test: Add testing for new dictionary functions
Change-Id: I0b0a151374acfe9b44c7a2be4bb959df59356d97
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-28 17:28:43 -07:00
Greg Tucker
19035917f4 igzip: Add new functions for faster dictionary compression
Change-Id: Id55728fea286d144f8a11192ab02ccc8503d7b25
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
438ecd8187 Update custom hufftable tool for saving histogram
Change-Id: I515217b19373b8f996ff887268862cf2b102f3a4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
89f7c46cd5 Change igzip_file_perf to accept 0 time
Change-Id: Ie2edf8e742d0bcdd9a008704f997006f8f5009ac
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
9968e7a032 Change gen cust hufftables to accept dictionary
Change-Id: I4eed03bdb91030b16b3ecfd8076adc890e4f59a2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
63dffab948 igzip: Change pre-gen inflate table to multi-symbol
Change-Id: I4b0dac1e5aa2796be17644b893e3b6c7aed05876
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
d7927673ba igzip: Inflate detect pre-gen header and use pre-expanded
Performance improvement for inflate to skip the time-consuming process of decode
table expansion when the header matches a known common dynamic one such as
produced by level 0 compression.

Change-Id: Ia2550b812a062b7cc2eb1b72bcb609f1a631e40b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
cc9ed53972 build: Fix nmake check for multiple arch
Change-Id: I36c3616163f6fec61dda9cf8b35ca561e59477c9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-27 11:16:30 -07:00
Greg Tucker
794b8b60c1 build: Add test to check for nmake consistency
Change-Id: I1180ba749d54e7ef433b01b33450e52ac5dbb2bb
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-26 11:41:03 -07:00
Greg Tucker
24623b8b82 crc: Fix missing object omitted from nmake file
Previous new crc version missed the update for nmake.

Change-Id: Ie529ee9d70d8d0ab8a8af3bd2720405802180d1e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-26 09:49:23 -07:00
Greg Tucker
ec73d39086 crc: Add new vclmul version of crc32_iscsi
Change-Id: I1c509c6ea312b6eb4e1c2c1c8bb7044f7b043e0d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-21 17:15:58 -07:00
Greg Tucker
ae45f60e78 igzip: Add cli feature to inflate concatenated gz files
Change-Id: I2beade6682e78fda30a18228a8660201ae7bf718
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-13 15:21:10 -07:00
Greg Tucker
93049d0d1f igzip: Fix read header for correct null checking and init
Issue with reading header only appears when combined with new feature in cli of
multiple concatenated gzip files.

Change-Id: Id8df9150c6f27d8b22e810b511291f3fcf136723
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-13 15:21:10 -07:00
Ruben Vorderman
2049d8dc81 Add conda shield to readme
This will make it easier for users to get the latest version. Installing with conda is easier than compiling it yourself. Distro packages (such as Debian's) do not always ship the latest version while conda-forge can. This badge will advertise this install method.

Change-Id: I99a1853a00e55fdf0c574c9906675738ac278121
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2020-07-27 11:36:55 -07:00
Jerry Yu
1c71f9c0ae crc32: tweak performance of crc32/crc32c
Tweak performances with prefetch instructions.

Below are the test results:
- Neoverse N1: ~30%
- Cortex-A72: ~3%
- Cortex-A57: ~90%
- Others: 50% - 5x

Change-Id: I3ab292a953043dbaea98af3c66778f57da3a1331
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-07-09 17:37:00 +08:00
Jerry Yu
14e0081bef build: fix build break on non-x86 platform
Arm64 and ppc64 builds report the following error:
"configure: error: conditional "INTEL_CET_ENABLED" was never defined."
The same error is reported on all non-x86 platforms.

Change-Id: I4c1b2fc99091424cfd5c62cf4d6536222b66712d
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-06-03 03:25:03 +00:00
H.J. Lu
8074e3fe1b x86: Generate .note.gnu.property section for ELF output
We should generate .note.gnu.property section with x86 assembly codes
for ELF outputs to mark Intel CET support when Intel CET is enabled
since all input files must be marked with Intel CET support in order
for linker to mark output with Intel CET support.  Since nasm and yasm
can't generate the proper .note.gnu.property section, yasm-cet-filter.sh
and yasm-filter.sh are added to generate the proper .note.gnu.property
with linker help.

Verified with

$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8

on Linux/x86-64.

Change-Id: I14e03a8a9031c8397dc36939a528cf5a827d775a
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2020-05-26 17:12:01 -07:00
H.J. Lu
cd888f01a4 x86: Add ENDBR32/ENDBR64 at function entries for Intel CET
To support Intel CET, all indirect branch targets must start with
ENDBR32/ENDBR64.  Here is a patch to define endbranch and add it to
function entries in x86 assembly codes which are indirect branch
targets as discovered by running testsuite on Intel CET machine and
visual inspection.

Verified with

$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
$ make -j8 check

with both nasm and yasm on both CET and non-CET machines.

Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2020-05-26 09:16:49 -07:00
Zhiyuan Zhu
031450f697 crc32: Implement default mix mode optimization
Change-Id: Ib3bf04215cca491db522ec33905fe48df173cc2f
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2020-05-09 08:10:34 +00:00
Jerry Yu
6c4d3dbf6c crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first
To reduce cache-miss events, the mix layout is changed
to PMULL+CRC. This also relaxes the final delay caused by the data
dependency.
As a result, cold performance improved by about 20% and warm
performance by about 4%.

Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-16 20:38:41 +08:00
Jerry Yu
92fc8733fa crc32: Fix prototype mismatch bug
Change-Id: I7c8a2348441f32a43ff386122612405e418d9947
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-10 00:46:41 +00:00
Jerry Yu
9bcd6768fd crc32:Adjust hardware folding algorithm flags
The hardware folding algorithm depends on both the CRC32 and PMULL
instructions, so it should be selected only when both flags match.

Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:50:15 +08:00
Jerry Yu
0033f42189 crc32:Optimize crc32/c for cortex-a72
Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-04-08 13:49:38 +08:00
Greg Tucker
5e586843eb build: Change ms nmake default to nasm and add pdb gen
The nmake default is changed for a modern nasm. Older nasm and yasm versions
will still work with windows but the nmake options must be changed appropriately
for max AS_FEATURE_LEVEL to match. Also now generates debug symbol pdb files.

Change-Id: I94a2dd7ecf541c6564ccbd4a184c33995d7b31ad
Signed-off-by: Poornima Kumar <poornima.kumar@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-31 22:55:27 +00:00
Jerry Yu
a2fc2c000d crc32:Add optimization implementation for Neoverse N1
This patch is based on the algorithm in reference (1), with some changes.
- Redefine the number of blocks to two.
  - Only two pipelines can be used for the CRC32 calculation.
- Redefine the block size:
  - The CRC block size is 1536B and the PMULL block size is 512B.
- Interleave CRC and PMULL instructions.
The optimization parameters are calculated based on reference (2).

References:
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
- https://developer.arm.com/docs/swog309707/a

Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00
Jerry Yu
f2cf2609cd multi-binary:Add microarchitecture id reader
This patch provides microarchitecture information
and makes microarchitecture-specific optimization possible. It
traps into the kernel because of the mrs instruction, so it
should be called only in the dispatcher, which is
called only once in the program lifecycle. HWCAP must
also match, which ensures there are no illegal
instruction errors.

Change-Id: I393ec742010bf3f10ce335482c0350aa4202c788
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-30 09:20:29 -07:00
Jerry Yu
85f947e120 ci: remove unused drone configuration
Change-Id: I20bded8111deb122757dbf259d17cd80010c2bb6
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-03-27 16:16:00 -07:00
Greg Tucker
af13ed6136 ec: Fix second windows reg push for avx512
Change improper stack push in windows prolog.  Error was not reachable without
windows nasm support and so went undetected.

Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-20 12:36:58 -07:00
Greg Tucker
ede04f0a1f build: Fix for windows to allow nasm use
Previously windows build could only use yasm because some procedural items such
as proc_start were not supported by nasm.  This adds a few macros and fixes so
nasm can be used to build on windows.

Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
5ab40c79cc ec: Fix windows reg push for avx512
Push of registers overlapped xmm push.  Error was not reachable without windows
nasm support and so went undetected.

Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
472e7011e8 ec: Change use of windows macro save_xmm128 to vec
For builds under windows this could emit a non-vec mov that's not optional for
AVX versions.

Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:04:54 -07:00
Greg Tucker
7c0ab1d459 build: Add auto regenerate of nmake file
Change-Id: Icaa64aa35697c87779df18c3941d3df0f3256546
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-10 14:00:05 -07:00
Greg Tucker
794413ddd2 ec: Remove arch-specific redundant gf_nvect tests
The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect
functions were not strictly called by the higher level ec_encode_data() and
needed independent testing.  As this has now changed the extra tests can be
removed as redundant.

Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-06 13:45:59 -07:00
Greg Tucker
806b55ee57 build: Bump revision to 2.29
Change-Id: I78cfa77864f3fd77c3b63199bc18fd1782fe3dc2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-26 18:29:49 -07:00
Greg Tucker
2db2cd557c Update release notes for v2.29 additions
Change-Id: Id9ba5da760ee60dbb1de47162e6276f522bc0850
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-26 12:04:18 -07:00
Greg Tucker
6136a04bbe crc: Add new vclmul version of crc16_t10dif
Change-Id: Ic068f35d5d8c34b74128b7a2ea8e82f5fa693c28
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 19:54:19 -07:00
Greg Tucker
5ef6eb5c68 crc: Add new vclmul version of crc32_ieee
Change-Id: Ib761e3240d8252ce84e9abeadb568dce60742717
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
25a673d75a crc: Add new vclmul version of gzip_refl
Change-Id: I8050853dcd177f4fb506f32f5fa723f7a1d3cded
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
4217930338 crc: Add vec version of crc16_t10dif_copy
Change-Id: I5f73e8a38efd1ff50d30a39689d9d85da702e809
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
02a41e0653 crc: Add vec version of crc32_ieee when avx avail
Change-Id: I5542ee93156c26f5a23feb89b82f4c51f282777d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
d4131bb3d3 crc: Add vec version of crc32_gzip_refl when avx avail
Change-Id: I4a069c318c809dcd21a6ebc47d3e0d1c131599ea
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Greg Tucker
ad22a90686 crc: Add vec version of crc16 when avx available
Vec versions mix much better with other avx code.

Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-02-21 10:11:16 -07:00
Hong Bo Peng
180c74aefd enable VSX SIMD in ISA-L for ppc64le
1) Implement the ErasureCode function in Altivec Intrinsics
2) Coding style update

Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
2020-02-20 09:40:43 -07:00
Zhang Jinde
a3d5cd8642 igzip: Fix clang error on dep generation
Clang errors when generating dependencies due to a stray semicolon following a
function definition.

Change-Id: Iefb4aca988b643bb62a69bbbaf197aca20a2d085
Signed-off-by: Zhang Jinde <zjd5536@163.com>
2020-01-17 10:25:32 -07:00
Zhang Jinde
163b6cd934 igzip: Fix for deflate logic buffer management
Fixes invalid logic that attempted to eliminate unnecessary copy of input to the
history buffer in cases where it is not required. Correction should improve
performance and not change functionality.

Change-Id: Ife24dcc9d920ce220b1a394031e971321737a171
Signed-off-by: Zhang Jinde <zjd5536@163.com>
2020-01-08 09:46:16 -07:00
Jerry Yu
fc69e8fc79 igzip: fix deflate hash bug
If next_in equals end_in, the function should
return.

Change-Id: I59e631bb1f24835fd43f943a3736e016c4e2d0ac
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-12-31 13:15:35 -07:00
Jerry Yu
e2b07bbd44 build: fix debug build problem
Remove strip command when lib_debug=1

Change-Id: I1203fcbfefb3b87080e9ba12ccbfb8018a008147
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-12-31 13:15:05 -07:00
Jerry Yu
936d05fc4f igzip:Add decode huffman code for aarch64
Change-Id: If26cc4fd97b078b5f3b02e5f6f121a12ec73f671
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-12-19 16:10:04 +08:00
Greg Tucker
ad49e580dc doc: Fix missing description of gf_matrix_inverse
Doc missed issue of input matrix destruction.
Fixes #116

Change-Id: Ic840b27532d90518dd21ec2701c278a1c3b61a8b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-12-13 16:24:05 -07:00
Zhiyuan Zhu
2b8cc393af igzip: implement gen_icf_map with assembly
Change-Id: I74e6200a732acfaac44b7f5a82bd4a2215ba1535
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-12-13 07:54:12 +00:00
Zhiyuan Zhu
f430953f0a igzip: cleanup perf test related code
This patch addresses some cppcheck issues
and makes some minor changes to maintain code consistency.

- Cleanup cppcheck issues.
  [log][igzip/igzip_perf.c] (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
  [log][igzip/igzip_hist_perf.c:132]: (error) Memory leak: outbuf

- Some minor changes to maintain code consistency.
  igzip/igzip_build_hash_table_perf.c
  igzip/igzip_hist_perf.c
  igzip/igzip_semi_dyn_file_perf.c

- delete unused variable
  outbuf and outbuf_size from igzip/igzip_hist_perf.c
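
For the shift issue cppcheck reported above, a minimal sketch of the usual remedy is to shift an unsigned value instead of a signed one. The function name is hypothetical; this is not the igzip_perf.c code.

```c
#include <stdint.h>

/*
 * `1 << 31` shifts a signed int into its sign bit, which is undefined
 * behaviour in C. Shifting an unsigned 32-bit constant is well defined.
 */
static inline uint32_t top_bit_mask_sketch(void)
{
	return UINT32_C(1) << 31;
}
```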

Change-Id: Icbbd8f70de689931c8a844d89e457af8d97c6793
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-12-06 15:33:20 +08:00
Zhiyuan Zhu
683364c47b igzip: implement encode_deflate_icf with assembly
Change-Id: I90b12da2d2a96bfdb47d29ab329648247a756585
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-11-29 14:45:45 -07:00
John Kariuki
5eeb33f69c ec: add AVX512 ec functions with 5 and 6 outputs
Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.

Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
2019-11-19 10:12:14 -07:00
Samuel Lee
4785428d2f crc: arm64 implementation tweaks
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
  + Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and moved
them into shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
  + Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B

+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).

Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
2019-11-13 10:58:19 -07:00
Greg Tucker
0a8d05a81e doc: Move arch-dependent build instructions to readme
Removed the redundant parts that apply to all arch.

Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-11-01 15:55:44 -07:00
Hang Li
02a86dfb3f erasure_code: modify eor way in aarch64 neon codes
Change-Id: I9fb9219c5f280ed88194ec63234af046a5a036ae
Signed-off-by: Hang Li <lihang48@hisilicon.com>
2019-11-01 15:31:33 -07:00
Jerry Yu
ce9e56054a igzip:implement deflate hash with assembly
Change-Id: I39b3a37cd291c40f597750839c27db2a6a571fe5
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 14:41:46 -07:00
Jerry Yu
216d0f929b build: fix cross compile issue
Replace the hardcoded gcc with $(CC) so that as_filter
works correctly when cross-compiling.

Change-Id: I484d5074abdfc80ed5cd14fdd1358274f306bcfd
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:11:05 +08:00
Jerry Yu
5d7724898d build: fix wrong use the register name
The third parameter must be a 32-bit register. The assembly
put a 64-bit register here, which is wrong.

Change-Id: Iebe17516b555a6a9b94ea7baa4778ad4b9dd0878
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:11:00 +08:00
Jerry Yu
b441659879 multibinary: fix strict-prototype warning
With the -Wstrict-prototypes option, GCC reports the
warning.

Change-Id: Ic2d1adb566ad21deec65c66552e2863254e1376a
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:10:57 +08:00
Jerry Yu
f0104600a0 build: disable clang support in ci
- Disable clang test for travis and drone.io
- Add document about compiler requirement

Change-Id: I81f8dc31088d40f315dd4ec062bed5df8ab7b633
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-11-01 18:10:50 +08:00
Zhiyuan Zhu
6b70da5051 igzip: implement set_long_icf_fg with assembly
Change-Id: I21ac55985a56c2b7b0a684934c076600d90f8b0a
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-31 11:02:54 -07:00
Greg Tucker
4ed944c4b1 build: Fix travis osx issue with brew update
Bug in Homebrew auto-update causes post-update install to use the old
environment.

Change-Id: I03e20d899f558f71579dfd4be3f96903b77f1998
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-30 11:16:49 -07:00
Hang Li
621cf92c52 erasure_code: modify perf benchmark loop
Change-Id: Ie45ceb3ac55ab943a155e2a3f9f6b765cd94d7a1
Signed-off-by: Hang Li <lihang48@hisilicon.com>
2019-10-30 10:34:40 -07:00
Greg Tucker
2f9eef537c build: Fix autoconf build for mingw target
Change-Id: Ie5ae17556f8cc95af8e59c8bd81a958c94455cd1
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 15:53:14 -07:00
Greg Tucker
e6848434ae test: Fix issue keeping mingw tests from running
Change-Id: I1e72ed99c2f09cbad488774313cddafdb1ce5de8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 15:52:48 -07:00
Greg Tucker
533ba53f11 crc: Fix symbol conflict with older assemblers
Change-Id: I6f1322a5fecdf21b2c774454cd51cb56767f30b8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-28 14:39:44 -07:00
Zhou Xiong
d7848c1d05 Implement aarch64 neon for erasure code.
1. Replace the erasure code interfaces below with Arm NEON versions via the mbin_interface function.
	ec_encode_data
	gf_vect_mul
	gf_vect_dot_prod
	gf_vect_mad
	ec_encode_data_update

2. Use Arm NEON instructions with 128-bit registers to accelerate GF(2^8) computation.

Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4
Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>
2019-10-25 11:09:03 -07:00
Jun He
c680d3aba7 Add arm64 to Travis matrix
Enable new arm64 architecture in TravisCI, add tests for
following compilers:
gcc: v5.4.0
clang: v3.8.0

Change-Id: Id0b2f2231fabcbeff7061f85050db99df12c9a67
Signed-off-by: Jun He <jun.he@arm.com>
2019-10-24 10:09:19 +08:00
Greg Tucker
5f698e9e41 doc: Update mailing list link
Change-Id: I57fdf1ab4ca9f57c11f361c873094c5c22dc5410
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-16 17:13:54 -07:00
Greg Tucker
66cff99954 doc: Remove non-extern headers and add treeview
Change-Id: Icee001e66d48f7a47b36ded5550c66832f81a4cc
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-10-16 17:13:54 -07:00
Bernd Schubert
d32d3f6902 Make variables in ec_base.h (file) static
ec_base.h has several variables, which were defined with
a global scope. Exactly those global variables caused issues
on linking a static compilation of libisal.a to a shared lib.
Adding -fPIC to CFLAGS somehow didn't help.
As all the variables in ec_base.h are only included
and used by a single C file, all of these can be
(file) static, which also helps the compiler
make further optimizations. It also solves the issue
of linking the static libisal to a shared lib.

Also make the variables const, as these are constants and
must not be modified.
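
A minimal sketch of the resulting pattern, with a hypothetical table name and contents rather than the actual ec_base.h data:

```c
#include <stddef.h>

/*
 * `static` gives the table internal linkage, so it no longer appears as
 * a global symbol when the object is linked into a shared library.
 * `const` lets the compiler place it in read-only data and rejects
 * accidental writes at compile time.
 */
static const unsigned char gf_pow2_sketch[4] = { 0x01, 0x02, 0x04, 0x08 };

/* Hypothetical accessor used by the single C file that includes the table. */
static unsigned char gf_pow2_at_sketch(size_t i)
{
	return gf_pow2_sketch[i & 3];
}
```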

Change-Id: I2b8141dabc1c7a528401f2778cdbdbed6c93c36b
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
2019-10-11 15:39:56 -07:00
Zhiyuan Zhu
f3993f5c0b crc: Fix dynamic relocation link failure on Arm
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.

[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'

Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 15:37:29 -07:00
Zhiyuan Zhu
be4d035227 igzip: Optimize isal update histogram with arm64
Change-Id: I944f9497d990e831de5e066055a21ea7e8d6693b
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 09:59:47 -07:00
Zhiyuan Zhu
290456231c igzip: Implement deflate icf body/finish with assembly
Change-Id: I40e4a9be2ae654c881460056de9730176d3d097c
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-10-11 09:59:40 -07:00
Jerry Yu
f3bb041799 igzip: Implement deflate body/finish with assembly
Change-Id: I556af7976294f31abd72ac49366f7259e3baf399
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-10-11 09:59:30 -07:00
Greg Tucker
fae4c3a499 Update release notes for v2.28 additions
Change-Id: Id295d5e615712f41d67d1130d5bcab1abed4c29f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-17 11:01:17 -07:00
Greg Tucker
36502ec33b build: Bump revision to 2.28
Change-Id: I57443be6b0f6dff6129943cd6e1508d73bc1aa80
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-17 10:43:53 -07:00
Greg Tucker
600b6d8d99 crc: Add new ecma_norm
Change-Id: I7747bfdca24bcd604c3eb118e7f1bcd98b2b6211
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
121bc635c9 crc: Add new jones_norm
Change-Id: I66118baeec2a1d63423c74edc3aa20a3e8955c6e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
ed528bb2ad crc: Add new iso_norm
Change-Id: If0b05d1a1029b02842935c5c43966d81c59fbbca
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
ea4cbf0ffa crc: Add new ecma_refl
Change-Id: Ifef4f8c6ce7da328b0cc03040b17e7443febf44d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
42bbc5a37e crc: Add new jones_refl
Change-Id: Ia4837b9125bce4e38ef6bae0a8c852d02e9b0bf2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
5c546ecddf crc: Add new arch CRC
Change-Id: I31d3a7e61eeed9d13a0cadd6d1ed25b0dbb39415
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-16 17:01:25 -07:00
Greg Tucker
7a28c83879 test: Increase size of crc tests and simplify output
Change-Id: Ia0418b7889e591a0164c335e273caff263cdf640
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-14 16:01:28 -07:00
Greg Tucker
ae3c91ab85 build: Set assembler feature level in std make
Also fix multibinary to try each available arch

Change-Id: Icd8496d169665bded478a33a02e739d1f8349b6f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-14 16:01:28 -07:00
Roy Oursler
198b026a55 build: Add multi-binary checking for new arch
Change-Id: I8bb8d9e9ae28987ee583976871ff84ee205bdbdc
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-09-14 16:01:28 -07:00
Roy Oursler
e4b8f164ae build: Setup as_feature_level
Change-Id: I7443058c577cf8eafe10acc2b2bfdfe76e2ce264
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-09-14 16:01:28 -07:00
Roy Oursler
d3caab9c3a build: Avoid requiring AVX512 define when using dispatch functions
Change-Id: I76af2d6ab7eb61ae531bbc7427650d08737c20ab
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-09-14 16:01:28 -07:00
Greg Tucker
1ba280fa09 igzip: Fix and clarify a few code issues in the cli tool
Fixes a few scan build hits. A few are false positives such as a missed free but
better to clarify the code in this case. Others such as calling no-null
functions are made explicit.

Change-Id: Icb001a2bf7024dbaa4b4c87089eda818de830c78
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-09-04 14:39:01 -07:00
Jerry Yu
5f45f3f310 igzip: Optimize adler32 with arm neon
Change-Id: I9b8932eb02ed6bc44756f6505e7efbfad1706b46
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-08-29 10:11:06 +08:00
Jerry Yu
a2005c1fd6 igzip: enable multibinary interfaces
- Add dispatcher layer
- Alias functions with assembly

Change-Id: I84da1be539d890db0df64e5ea989b2fd1f276949
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-08-29 10:08:58 +08:00
Jerry Yu
183385f02f multibinary: Add run-time cpu feature detect for aarch64
Some CPUs report an "illegal instruction" error for the crc test because
they do not support the relevant optional feature. This can be fixed by
introducing CPU feature detection for AArch64.

The difference from the x86 implementation is the dispatcher. It is based
on the glibc calls `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`, not
on registers or instructions.

On a heterogeneous system (big.LITTLE), it is dangerous to detect CPU
features using identification registers. And while it is possible to use
architectural feature registers from userspace on recent kernels, this
won't necessarily work with older platforms. Thus we use the HWCAPs
exported from the kernel (and visible in getauxval) as the solution.

- According to the kernel documentation, getauxval should be used for this purpose.
  - [CPU Feature detection](https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst)
- According to the AAPCS, result/parameter registers should be saved/restored around function calls.
  - [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf)
  - [GLibc](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=sysdeps/aarch64/dl-trampoline.S)
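
The detection approach described above can be sketched as follows. This is a minimal illustration, not the dispatcher added by this commit: the function name is hypothetical, the fallback bit values are the ones the Linux arm64 ABI assigns to these flags, and requiring both flags together is an assumption modelled on a PMULL-based CRC folding routine.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
/* Fallbacks in case the libc headers do not expose the arm64 flags. */
#ifndef HWCAP_PMULL
#define HWCAP_PMULL (1 << 4)
#endif
#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)
#endif
#endif

/*
 * Hypothetical check: true only if the kernel reports both the CRC32
 * and PMULL features. On other platforms it reports false, so a caller
 * would fall back to a base implementation.
 */
static bool cpu_has_crc32_and_pmull_sketch(void)
{
#if defined(__linux__) && defined(__aarch64__)
	unsigned long hwcap = getauxval(AT_HWCAP);
	unsigned long need = HWCAP_CRC32 | HWCAP_PMULL;

	return (hwcap & need) == need;
#else
	return false;
#endif
}
```

Because the answer comes from the kernel rather than from the core the thread happens to run on, it stays the same across the cores of a big.LITTLE system.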

Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change-Id: Ic9abe0d2268ac95537e1abf10acc642fc58a5054
2019-08-26 17:58:42 +08:00
Jerry Yu
0c22fcd3e2 build: fix compile break for unsupported CPUs
Building with Makefile.unx on unsupported CPUs fails with
"undefined reference" errors. Fix it by adding the base alias files
to the sources list.

Change-Id: I9fbdeee7cb82edc9d5d8461bee3f648be83feaa6
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2019-08-23 17:28:22 +08:00
Jun He
a95292aa01 ci: add drone.io for arm64 verification
Change-Id: Ib357be80e7e9d7c0ab62433ee5fda4b962592553
Signed-off-by: Jun He <jun.he@arm.com>
2019-08-19 11:21:10 -07:00
Jun He
b721db98e5 igzip: optimize convert_dist_to_dist_sym to branchless
convert_dist_to_dist_sym uses a long if/else chain to get the look-back distance.
The distance calculation is well formed for each distance range, so it can
be optimized into a branchless version.
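
One way to see why the calculation is regular: in DEFLATE, each pair of distance symbols covers one power-of-two range of (distance - 1), so the symbol follows from the position of the highest set bit plus the bit below it. The sketch below illustrates that with hypothetical names; it is not the code from this commit, and it assumes a GCC/Clang-style `__builtin_clz`.

```c
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit; v must be non-zero. */
static inline uint32_t msb_index_sketch(uint32_t v)
{
	return 31u - (uint32_t) __builtin_clz(v);
}

/*
 * Map a DEFLATE match distance (1..32768) to its distance symbol (0..29).
 * For d = dist - 1 >= 2 the symbol is 2 * msb(d) plus the bit just below
 * the msb. For d < 2 the symbol is d itself; OR-ing in 2 keeps the shift
 * count valid there, and the final select picks the right value.
 */
static inline uint32_t dist_to_dist_sym_sketch(uint32_t dist)
{
	uint32_t d = dist - 1;
	uint32_t msb = msb_index_sketch(d | 2);
	uint32_t sym = 2 * msb + ((d >> (msb - 1)) & 1);

	return d < 2 ? d : sym;
}
```

The final conditional is a simple select that compilers commonly lower to a conditional move rather than a jump, though that depends on the compiler and target.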

Change-Id: I4e1e5170f8b3238631f3048087f95acc53e4498e
Signed-off-by: Jun He <jun.he@arm.com>
2019-08-13 11:02:53 +08:00
Greg Tucker
e2997062fb igzip: Optimize routine to find msb
Change-Id: I40e7898e2139c04f261980ca10886debc917842a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-08-12 14:28:33 -07:00
Greg Tucker
4b33238371 Update travis with more nasm builds
Change-Id: I78b48f80d22ea811a9ed2e3a537e8dfa0350c8c5
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-16 16:09:16 -07:00
Greg Tucker
38f4880a4e build: Set nasm as the default when using std makefile
Also test the assembler for modern instruction support and set appropriate
defines.

Change-Id: I1628abd50b3babeeb7e010b86bda7ea97de0e6fb
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-16 15:47:20 -07:00
Greg Tucker
4ac0e435eb ec: Fix incorrect min size stated for gf_vect_mad
Change-Id: If178913f01f0d500aa66ce0e8dd67aaba49a0871
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-16 15:41:34 -07:00
Zhiyuan Zhu
c80610a2bb crc: push the aarch64 crc optimization back to base functions
Some arm64 machines don't support pmull instructions, so set these
crc interfaces to the base functions. As a long-term solution, we will
provide better multi-binary support with CPU feature detection.

Change-Id: I02791a2a50283dc8df2f9ba124eb309912b5b4b7
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-07-16 07:18:54 +00:00
Greg Tucker
236fdcc28f Update travis with xenial builds and new indent
Xenial has a minimum nasm version but no longer min indent.

Change-Id: I3ec70b9d5be932e903b77fd07d23667746c6c9f8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-10 13:10:20 -07:00
Greg Tucker
25374814c9 Format only changes for new indent version
Change-Id: I2b2a5caf1b31ad56665081145d5e7089fd34d0ab
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-03 15:17:02 -07:00
Greg Tucker
430e862a9c Change indent format for minimal changes with latest
Latest indent 2.2.12 has some changes and bug fixes including adding -sar to -kr
and recognizing size_t as a standard type.

Change-Id: Id613cfb3cebdbe8e9e8823236adb5ee6eb712229
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-07-03 15:12:50 -07:00
Greg Tucker
0111c21cbc Update release notes for v2.27 additions
Change-Id: I75416fab491ffbeb41b989e554f0c56b175d2d1f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-06-24 13:13:07 -07:00
Greg Tucker
10906f3d3a build: Bump revision to 2.27
Change-Id: Ia0f0f872614370475a29fdab32b587480a3ff760
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-06-24 10:47:15 -07:00
Greg Tucker
0a7e3167ce igzip: Add optional threaded compression to cli tool
Change-Id: Ia29e877cfa8bef2285d8b48bb9133b2ff5b2eea0
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-06-21 16:37:17 -07:00
Zhiyuan Zhu
a46da529d9 crc: optimize crc with arm64 assembly
Change-Id: I49166ee06b3ad24babb90aeb0b834d8aacfc2d03
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-06-21 17:02:16 +08:00
Roy Oursler
9c91a18c6e igzip: Fix igzip_rand_test to test on a single file
Change-Id: I21cb27f6012094adee6496f811792d6e3b11a8bc
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-05-21 13:52:41 -07:00
Zhiyuan Zhu
899c647628 crc: implement table-driven crc algorithm
Change-Id: Iebfb8ae1db09bf2dc882fd87e61627d74fab4a5c
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-05-08 17:50:03 -07:00
Greg Tucker
2000f8a3cd doc: Update references to new github group
Change-Id: Icca8d7da1f34cc7b31cb942baf7a22026f7fcf93
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-05-01 18:27:24 -07:00
Greg Tucker
88eff26884 build: Install pkg-config files
Change-Id: I712ef6565f613b7fffa5bae02b08b2224aeb2b17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-05-01 16:48:10 -07:00
Greg Tucker
f1252a9e79 build: Add missing files for distcheck
Change-Id: I644d59eece8a9eae9d467f42d668b13d8fae0d81
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-05-01 16:48:10 -07:00
Greg Tucker
e9e373ba3d build: Add gitignore
Change-Id: Id839000d997f565e5de5fd570e321587347474bf
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-05-01 16:48:10 -07:00
Roy Oursler
db59b1082f igzip: Remove undefined behavior in igzip.c
Remove unaligned data access in
write_deflate_header_unaligned_stateless

Change-Id: I7defa5621d8dc188edc51d22d29155ed3687c49d
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-04-29 16:22:07 -07:00
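
As an illustration of the kind of change db59b1082f describes: the commit message does not show the fix, but a common way to remove an unaligned access is to replace a cast-and-dereference with `memcpy`. The helper names below are hypothetical and are not taken from igzip.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the isa-l function: stores a 64-bit value at an
 * arbitrary (possibly unaligned) byte address without undefined behavior. */
static inline void store_u64_unaligned(uint8_t *dst, uint64_t value)
{
	/* memcpy has no alignment requirement; compilers typically lower this
	 * to a single store on targets that allow unaligned access. */
	memcpy(dst, &value, sizeof(value));
}

/* Matching hypothetical load helper. */
static inline uint64_t load_u64_unaligned(const uint8_t *src)
{
	uint64_t value;

	memcpy(&value, src, sizeof(value));
	return value;
}
```

Writing through `*(uint64_t *)dst` instead would be undefined whenever `dst` is not suitably aligned, which is the class of problem an undefined behavior sanitizer reports.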
Greg Tucker
906332850d build: Add version number to dll
Change-Id: Ic3411dbd0e959c41f4085cec38e4c95a6683a092
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-04-18 15:35:12 -07:00
Greg Tucker
fc9f7493a0 igzip: Fix help message in perf test
Change-Id: I2d4fd7c0eee176570d79bebf5f6e453f3b7dbba6
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-04-18 10:59:50 -07:00
Greg Tucker
09338c2ca7 Update release notes for v2.26 additions
Change-Id: I7c6aa4a60e16aec79178c3db6282904cd7af45f6
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-03-25 17:15:05 -07:00
Greg Tucker
f30db4c6c6 build: Bump revision to 2.26
Change-Id: I97ed7ab591e8174f7379be0563d6b9f2d0f90f0a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-03-25 12:57:17 -07:00
Greg Tucker
8f06ac6973 Add Adler32 performance test
Change-Id: I511dd34235c9a4ce2f3596d63236800cbf06703b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-03-22 14:29:42 -07:00
Yibo Cai
57eed2f02b aarch64: Cleanup build issues
This patch addresses one build failure and fixes several build warnings
for Arm (some for x86 too).

- Fix dynamic relocation link failure of ld.bfd 2.30 on Arm
  [log] relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `xor_gen_neon' which may bind externally can not be used when making a shared object

- Add arch dependent "other_tests" to exclude x86 specific tests on Arm
  [log] isa-l/erasure_code/gf_2vect_dot_prod_sse_test.c:181: undefined reference to `gf_2vect_dot_prod_sse'

- Check "fread" return value to fix gcc warnings on Arm and x86
  [log] warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
        fread(in_buf, 1, in_size, in_file);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Fix issue of comparing "char" with "int" on Arm. "char" is unsigned
  on Arm by default, so an unsigned char will never equal EOF (-1).
  [Log] programs/igzip_cli.c:318:31: warning: comparison is always true due to limited range of data type [-Wtype-limits]
        while (tmp != '\n' && tmp != EOF)
                                  ^~

- Include <stdlib.h> to several files to fix build warnings on Arm
  [log] igzip/igzip_inflate_perf.c:339:5: warning: incompatible implicit declaration of built-in function ‘exit’
        exit(0);
        ^~~~

Change-Id: I82c1b63316b634b3d398ffba2ff815679d9051a8
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
2019-03-20 10:15:40 +08:00
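
A minimal sketch of the "char" versus EOF point in 57eed2f02b, assuming the usual remedy of keeping the read result in an `int`. `discard_line` is a hypothetical helper and is not the code in programs/igzip_cli.c.

```c
#include <stdio.h>

/* Hypothetical helper: consumes the rest of the current line from `in`.
 * Returns the number of characters discarded before '\n' or end of file. */
static int discard_line(FILE *in)
{
	int tmp;		/* int, not char: must hold every byte value plus EOF */
	int count = 0;

	while ((tmp = getc(in)) != '\n' && tmp != EOF)
		count++;
	return count;
}
```

With `tmp` declared as `char` on a target where `char` is unsigned, the stored value is always in the range 0..255, so the `tmp != EOF` test can never be false and the loop cannot stop at end of file.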
Greg Tucker
3c009347b1 Fix a few c99isms in unit tests
Change-Id: Iea9ba619e337d5abea7ee791ddf3dd27e0f3e60f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-03-19 15:02:40 -07:00
Greg Tucker
e08dfab9b3 test: Fix c99 warn in perf helper functions
Change-Id: I7e116215dc95bbca96c7285b98f5b8ec4e340ef1
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-03-18 15:07:47 -07:00
Roy Oursler
31eca5035f igzip: Modify last byte retrieval method
Change-Id: I3ba7e9bd007277be543ba7a6299d5acc5c848bd2
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
67c4e26580 igzip: Remove unneeded generation of k register
Change-Id: I79bfb3b3a3feeb969a0c0ec92b7ae0633f6be1b0
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
28da992ad2 igzip: Reduce data used in encode_df_06.asm
Change-Id: I83dbca452840c07b0fd77faaf9d35c46065f8a08
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
863e72189b igzip: Modify igzip_rand_test to optionally use getopt
Change-Id: I8ad8e7f18f292b54158f1cda2eef9aec3919d175
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
35e90e73d5 igzip: Write out compressed data from igzip_perf
Change-Id: Iefea3e314e277112858874f826f54bdfa0172e04
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
d90220d935 igzip: Implement window size for igzip_perf
Change-Id: I6f5e9453aaff980b44c3e6d56b113da7625ec36e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
699bb5bd3f all: Revamp performance testing to be time based
Change-Id: I6260d28e4adc974d8db0a1c770e3eb922d87f8e4
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
bde3fc5ff1 igzip: Remove igzip_stateless_file_perf
Remove igzip_stateless_file_perf as all the functionality is included in
igzip_perf

Change-Id: Icfd4dfd25af1a3a6c16fa2c3299d277c18f204d9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
4ac2b7864b igzip: Remove igzip_inflate_perf
Remove igzip_inflate_perf as all the functionality is duplicated in igzip_perf

Change-Id: I510cc4643a3949e2fa8f30309b5d45a249320b9e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
4f79dc1e83 build: Remove non-extant perf test from nmake
Change-Id: I4f8872c9c48f9779e37347fb2a776f5f4013ffed
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
3a78c4a205 ec: Remove gf_vect_mad_perf.c
Remove gf_vect_mad_perf.c as it is architecture specific and does not provide
useful information in its current format.

Change-Id: I7819679db491a9b5572128e4fc05d989b870d22d
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
623d2f0dc0 igzip: Bitbuf improvements
Update Bitbuf to use stdmac and decrease register dependencies by replacing a sub
with an and.

Change-Id: Iaadf3c6ef7f533540a7adb57a418e9e80e9b8503
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:28:04 -07:00
Roy Oursler
53b92e83f4 igzip: Avoid UB pointer arithmetic underflow for virtual file start
Change-Id: I95c0e6f004eaf70227a6419fc14bf0958d1f4538
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
139fdb68b9 igzip: Fix latent pointer underflow bug
Change-Id: I60a3d6b355dc4ab5d74ad701d7c76d989ae30906
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
a4535d776c test: Add undefined behaviour sanitizer to test_checks
Change-Id: Id953cca99c6a6c64875185452e2ca6630cf47541
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
c0467e56d5 igzip: Use blind union to represent overlapped tables in hufftables_icf
Change-Id: I0260a705db81f4e7731d4d40757c5919be002e8f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
342cae57fc igzip: Modify fast hash table init to avoid signed shift
Change-Id: Ifd8e6e6540b6d6e6d82af74bb57c25733684bfd4
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
aae6e29d28 igzip: Remove unaligned stores
Change-Id: I8d351c8b7153178d26d6fc702ee3036b71165b93
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
a3169750b5 mem: Remove unaligned loads in base function
Change-Id: I8fb0f2e2e372485c864d5c60f816b661a865b707
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
5be1ba2215 igzip: Remove undefined unaligned loads
Change-Id: I02591d958f8691d07b261218cf5ab361e8ad36c9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
733901ee32 mem: Change test r and l data type to avoid unsigned add overflow
Change-Id: If9c30c5fda72ed5139a7cab01b5236f57a3ad0ef
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Roy Oursler
5c62f1e1ec crc: Use type cast in crc32_ieee_base to avoid undefined behavior
Change-Id: I8362831125927372c62ecb5eec2f5afe6f75ef24
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2019-03-07 09:27:50 -07:00
Zhiyuan Zhu
636272cff6 aarch64: Fix dynamic lib call crash
If an application treats these functions as function pointers, and this
lib (isa-l) is compiled into solib, a segmentation fault may occur.

For example, Ubuntu 16.04 on the arm64 platform will crash because the
linker does not know that this symbol is a function. Marking the function
type explicitly with %function solves this issue.

Change-Id: Iba41b1f1367146d7dcce09203694b08b1cb8ec20
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-03-01 02:55:50 +00:00
Zhiyuan Zhu
f5aa9d72de raid: Add license headers
Change-Id: I0d2d48eb30c31ff6967c132a415431dddd8a8982
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-02-22 14:32:19 -07:00
zhiyuan.zhu
2d6c8496f2 mem: mem-zero-detect optimization on Arm64
Change-Id: I9e7b8c80657c9c251d69efcfc73acc53567cfa33
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
2019-02-22 08:15:22 +00:00
Yibo Cai
abb6bd3ee8 build: Fix autoconf for non-x86 arch
Due to latest changes to configure.ac, some conditional variables are
now only defined for x86, which leads to building errors on non-x86
arch: 'configure: error: conditional "USE_YASM" was never defined'

Change-Id: If6dbd23c6898e04f4755d713b1e76e2b5fc34232
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
2019-02-12 10:52:44 +08:00
Zach Bjornson
f9588bbedc igzip: export isal_adler32
Change-Id: Iadb73851f826131cc59974b65240b501e9d57f98
Signed-off-by: Zach Bjornson <zbbjornson@gmail.com>
2019-02-10 13:37:52 -07:00
Yibo Cai
19fb012e81 raid: Add aarch64 NEON implementation
Change-Id: I6ad471d3b22a87bfa7e216713e04afa990a90edb
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
2019-02-10 13:08:56 -07:00
Yibo Cai
7a44098a98 build: Add aarch64 support
Change-Id: If9594936a28355d89edd1a331b3b429dffa44184
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
2019-02-10 13:08:56 -07:00
Greg Tucker
32b5c4131b build: Fix for mingw autoconf set proper yasm args
Change-Id: Ifb5423e429de0f0302a991d8e4ef5f426df1b80b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2019-02-01 09:34:46 -07:00
Ziye Yang
bed578b4d6 crc: Make crc32_table_iscsi_base static
Reason: Ceph directly copied some code from isal,
which causes a conflict when
SPDK applications use isal-lib (configured with '--with-isal')
and also use Ceph (configured with '--with-rbd').

Change-Id: I9f58412a68af76f8e29219a9c72cd44b9183033d
Signed-off-by: Jesse Hui <Chunyang.hui@intel.com>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2019-02-01 17:37:33 +08:00
Greg Tucker
ce9f3923da test: Ignore merge commits in signoff check
Change-Id: I205b7ec523eaf7576513f0ca3edb2bddea43b6ce
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-12-20 16:18:19 -07:00
Greg Tucker
cb1b7bb664 Update release notes for v2.25 additions
Change-Id: Iad8a1536cbabc577a70d9961fd80ed513debd0ea
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-12-12 11:11:47 -07:00
Greg Tucker
a4795d8011 build: Bump revision to 2.25
Change-Id: Ie85600426b36ab8c10cf2b9bc0c71667b9595e57
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-12-12 11:07:41 -07:00
Roy Oursler
ff3841d638 igzip: Fix Type 0 block_size calculation missing buffered bits
Change-Id: I7ee01353673ca66e79e2087bf814bccf76d1824c
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-12 11:03:43 -07:00
John Kariuki
2393791654 build: Add multi-arch autoconf support
Added multi-arch support to configure.ac.
Updated header files to only export sse and avx functions on x86

Change-Id: I4d1f8d0eccabad55ee887dc092a565c468f5c629
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
2018-12-10 13:40:55 -07:00
Roy Oursler
ebab4454ef igzip: Fix cli to verify checksum when decompressing
Change-Id: Ifa4d0eb5345bc3c3e97a23e3e32c732057bd8ba9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 17:12:25 -07:00
Roy Oursler
f76288339a igzip: Initialize avail_in in igzip_cli compress
Change-Id: I2c03c12c4afdc4118fc8dccb04229b9e70d7a610
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
8195446d19 igzip: Remove non-applicable possible igzip_cli extensions
Change-Id: Ieea1e883948012222357528d23d4a5d7ead9795e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
7177ff9966 igzip: Implement --test option in igzip_cli
Change-Id: Iff64f591a77cc3aee775b6c30e97641f8d650e69
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
cd5de57f1f igzip: Implement name/no-name option in igzip_cli
Change-Id: Ib45ef921223715cd2250eecb5dc4c6cd1d945bd0
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
6f3599c191 igzip: Modify igzip_cli to abort when in_file is out_file
Change-Id: Iccb1926e6eb4461ff9f02ab6d593689e96e94155
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
1fdc5941a3 igzip: Modify set_long_icf to handle small end_in
Change-Id: I24c3420df5d9e84d27fe28eff96155e5fcd51760
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
ba1a000680 igzip: Implement set_long_icf to compare more than 258
Change-Id: Ia8813d176da6bfcd3c6ef441eca1c59ac99db7f2
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
bdb6289bbe igzip: Reduce data usage of set_long_icf_fg_06
Change-Id: If05629100ef21fa43a0275110ad978c705c1a7bd
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Roy Oursler
fce71b0670 igzip: Implement icf_body random data skip ahead
Change-Id: I5dd5f37ec0cdfe4f2591685dc4a0a056f0b07ea3
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-12-04 10:41:40 -07:00
Greg Tucker
eaa1c18a94 doc: Fix spelling errors in headers
Change-Id: I0f4164b39b185fa808c66208df0731b5e031d7fd
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-12-04 10:12:14 -07:00
Greg Tucker
e19101f5de doc: Add detail of internal checksum value in gzip/zlib mode
Change-Id: I8f7fdcec40371e61eb19248cb24c9837d0845a0c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-12-04 10:11:40 -07:00
Greg Tucker
940515d51f igzip: Add missing initialization in perf test
Change-Id: Ia51a45edd48bd57c925f86c9de7616075a81e64d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-11-29 16:37:56 -07:00
Greg Tucker
e1470f70f6 igzip: Fixup a few labels and return warnings
Change-Id: Iaf2634a939fc741006895407b4b219a2f2cae98e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-11-29 16:37:56 -07:00
Greg Tucker
2e212f28fa build: Fix for mac nasm lack of symbol types
Change-Id: I9ee86a3e32876d3860477c8365fc459d94a8920e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-11-29 13:54:36 -07:00
Greg Tucker
86c865b784 test: Fix for script where no tags are pulled
Change-Id: I88753b8c7abcef3826078fbad21486fb403a1acf
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-11-16 11:06:00 -07:00
Greg Tucker
37a42dd2e8 build: Fix for older mingw that does not auto add extension
Change-Id: I5217da1f59ed747aa85da30fd005343e245c4fe2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-11-15 18:02:49 -07:00
Greg Tucker
9d7e8097bc build: Fix for change in mingw linker adding extension
New mingw linker will always add .exe extension to filenames regardless of the
-o file name.

Change-Id: I089bc95e91ca9a11c0f6fbb23ff138699d9b42f9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-11-15 10:57:25 -07:00
Greg Tucker
06b926fbb6 igzip: Fix portability issue when bad window size passed
If a user passed an invalid size for window bits, it could have triggered an
undefined shift by more than the variable width.

Change-Id: Ib2999b094af075596be3333418667ae9b498e2ae
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-10-25 14:43:27 -07:00
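
A minimal sketch of guarding against the undefined shift mentioned in 06b926fbb6, assuming the fix amounts to validating the shift count first. The function name and the 8..15 bounds are hypothetical; the limits igzip actually enforces are not stated in the commit message.

```c
#include <stdint.h>

/* Hypothetical bounds, chosen for illustration only. */
#define SKETCH_MIN_WINDOW_BITS 8
#define SKETCH_MAX_WINDOW_BITS 15

/* Returns the window size in bytes, or 0 if `window_bits` is out of range.
 * Checking before shifting keeps the shift count below the operand width;
 * shifting by a count >= the width of the promoted type is undefined in C. */
static uint32_t window_size_from_bits(unsigned int window_bits)
{
	if (window_bits < SKETCH_MIN_WINDOW_BITS || window_bits > SKETCH_MAX_WINDOW_BITS)
		return 0;
	return (uint32_t) 1 << window_bits;
}
```

Returning 0 for invalid input is one possible convention; a caller could equally fall back to a default window size.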
Roy Oursler
b4dfd61d06 igzip: Fix missing argument in base aliases
Fix missing argument in decode_huffman_code_block_stateless base alias that
causes runtime issues for some architectures.

Change-Id: I84c34bf2635dad2fca6235bd4fe0a5bc78dfbbe6
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-23 12:54:36 -07:00
Roy Oursler
8a2be4b693 igzip: Optimize reset hash table
Change-Id: I380353ec846190acc87f19fd7660991e2db62010
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-18 17:47:09 -07:00
Roy Oursler
412abd81ea igzip: Make all perf tests run a round for warmup
Change-Id: I9b6d4bd261b8633a6d44458aaa7c0b4bb7d713a5
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-18 15:13:31 -07:00
Roy Oursler
2458a651b7 igzip: Increase overestimate of compressed data size in igzip_perf
Change-Id: I16adcd817737f71fc59dac585c65c73eb3397d99
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-18 14:53:12 -07:00
Roy Oursler
e4bfb4d22e igzip: Remove unnecessary validation checks from igzip_perf
Change-Id: I0a79bfc9604153ab554275949caf987265e17113
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-17 13:52:55 -07:00
Roy Oursler
5793a8514e igzip: Modify igzip_perf to be able to avoid perf testing inflate
Change-Id: I384485a6ce8b35cae3738aebc9291d5ff5f8b029
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-17 13:52:47 -07:00
Roy Oursler
cb17625279 igzip: Implement zlib stateful perf for igzip_perf
Change-Id: I124de36254dd079e6cf46681fd0cbc86659f2561
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-17 13:52:34 -07:00
Roy Oursler
18fd996588 igzip: Add flush type performance testing to igzip_perf
Change-Id: I66de7fa27de43ada1930475aafd1558f09d32ea9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-17 13:51:09 -07:00
Roy Oursler
71a76468d6 igzip: Setup wrapper_hdr_test to use isal_gzip_hdr_init
Change-Id: I42c6ff0375b87d892bfebf28853403adef0a68c7
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-16 18:36:05 -07:00
Roy Oursler
943120532f igzip: Expose isal_gzip_header_init for Windows DLLs
Change-Id: I4594ac0b42c43295d60a87d54aed91943e8f762b
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-10-16 16:43:58 -07:00
Greg Tucker
09e787231b ec: Change gf_mad_test to use multi-binary function
Change-Id: Ibe484239b75514b5563dd043bb0e8c46d3bdac5e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-10-09 17:55:56 -07:00
Greg Tucker
4832ace5f2 build: Add file permissions check to format script
Also do white space checks on more files.

Change-Id: Ie6fae6726336bf8ebf3f6aa5c1534d98cd3f9510
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-10-05 16:08:54 -07:00
Ondřej Nový
fddcb00eb0 igzip: Set NAME section in man page to something useful
Change-Id: I57f9d51513adfbdc679e09a3d3f9690a7f04bb93
Signed-off-by: Ondřej Nový <ondrej.novy@firma.seznam.cz>
2018-10-01 09:28:06 -07:00
Greg Tucker
7031fa9072 Update release notes for v2.24 additions
Change-Id: Ib6ebccda1c0725d9755dd8e0b61646431984550b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-09-27 13:35:04 -07:00
Greg Tucker
8ddc8d0117 mem: Fix zero detect base function for mingw
Mingw does not define WORDSIZE, so an incorrect int width was used.

Change-Id: Idc9f560dd1c722d51f6e54ba2342feafa13f8fa5
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-09-26 10:32:31 -07:00
Greg Tucker
3983fac41f build: Bump revision to 2.24
Change-Id: I55b665f36867e827b5b3660e6fee297908cdc4ea
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-09-25 15:37:30 -07:00
Seth Howell
2065e638f2 tools: fix format of test_end function
This will make the function more flexible

Change-Id: I39acc83ca51ebd22d91166a47efa0d84f415669d
Signed-off-by: Seth Howell <seth.howell@intel.com>
2018-09-25 15:00:05 -07:00
John Kariuki
6e2013391a mem: Add zero detect memory functions
This patch introduces the base, avx and sse optimized zero detect memory functions.
The zero detect memory function tests if a memory region is all zeroes. If all the
bytes in the memory region are zero, the function returns a zero. Otherwise, if the
memory region has non-zero bytes, the zero detect function returns a 1.

Change-Id: If965badf750377124d0067d09f888d0419554998
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
2018-09-25 14:33:31 -07:00
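
The return convention described in 6e2013391a can be sketched with a byte-wise reference loop. `zero_detect_sketch` is a hypothetical name; this is not the base, SSE or AVX code from the patch, only an illustration of "0 if all zero, 1 otherwise".

```c
#include <stddef.h>

/* Hypothetical byte-wise reference.
 * Returns 0 when all `len` bytes at `mem` are zero, 1 otherwise. */
static int zero_detect_sketch(const void *mem, size_t len)
{
	const unsigned char *p = (const unsigned char *)mem;
	unsigned char acc = 0;
	size_t i;

	for (i = 0; i < len; i++)
		acc |= p[i];	/* any non-zero byte leaves a bit set */
	return acc != 0;
}
```

An empty region (`len == 0`) yields 0 here, since no non-zero byte was seen; whether the library treats that case the same way is not stated in the commit message.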
Greg Tucker
c872426b1c doc: Add man file for igzip
Change-Id: I13b054aebddcdc1bfa9ae9b82cf4fc5c8ebab94b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-09-25 11:03:09 -07:00
Greg Tucker
391db3314a igzip: Add a few missing descriptions to cli
Change-Id: I4000547679ccee01541e0203e49d62b99f4f317b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-09-25 11:01:45 -07:00
Greg Tucker
41fc273c43 igzip: Add missing files for distcheck
Change-Id: Ic6acd926c19113ecfa4233ca8d0a658e1a10ecd8
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-09-21 16:13:17 -07:00
Roy Oursler
a570a3e5d4 igzip: Limit max compare_large to limit redundant matching
Change-Id: I989c9b805700fdced4624fb5f4b4c19cce389448
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
906e5d213b igzip: Test swap flush for all levels
Change-Id: I889af775ba3474cf802e2bcb85f82d39d60ee518
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
9689ba7e13 igzip: Remove igzip copy of crc32_gzip
Change-Id: I859ed904effa0a8bd7462b77b13e359014912639
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
7205c23647 igzip: Improve optional compile options for wrapper_hdr_test
Change TEST_SEED and RANDOMS to be set only when they
are not already defined.

Change-Id: Ib101be4a1726875372eb7b19cd1bcdae90c9267d
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
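
The pattern named in 7205c23647 (set TEST_SEED and RANDOMS only when not already defined) typically looks like the sketch below. The default values are placeholders, since the commit message does not give the ones used by wrapper_hdr_test.

```c
/* Placeholder defaults for illustration only. A definition supplied on the
 * compiler command line takes precedence because of the #ifndef guards. */
#ifndef TEST_SEED
# define TEST_SEED 0x1234
#endif

#ifndef RANDOMS
# define RANDOMS 1000
#endif
```

With guards like these, a build could override either value using the compiler's `-D` option, for example `-DTEST_SEED=42`, without editing the test source.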
Roy Oursler
c19657241a igzip: Setup optimized static header table to be default
Change-Id: Ia54682f7e1e321a26f941da8f884f385cfd42ad9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
a673395ac0 igzip: Improve test to test create_hufftables_subset
Change-Id: Ibefb5daa37050e6739f7004ff6e2004c342dd422
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
b9a546058f igzip: Add gzip/zlib wrapper testing to large test
Change-Id: I656c29f123692d2b29d70cb3c6711b49aafb1dc9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
f1f8d0917d igzip: Modify stateless to limit max compression size
Change-Id: Ic2fcbe8fe643bcbd00bdc13e649e42b639098dad
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
49e3329729 igzip: Fix Stateless Full Flush type0 block bugs
Prevent type 0 block in stateless full flush from attempting to write a trailer
when it is not the end of the stream and fix the values of block_start and
block_end.

Change-Id: Ia8beac20fc244b1b3e5690cbc15d4d4bb8ada68e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
a1f8e55d11 igzip: Remove movnti instructions
Change-Id: I760b737bb5b138de4d62d841e2f24bc41c6a8b68
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
7345490999 igzip: Modify igzip_body assembly to run to last 16 bytes.
Change-Id: Ib2c688d0b2d7ff5d4fd7b14bb6eea72a7f689cd3
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
52d974762b igzip: Create precomputed static inflate_huff_code
Change-Id: I94a2de2b5e5bebc37e30f5d597a95c493da504c0
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
8387b65800 igzip: Modify hash table size based on input size
Change-Id: Ieeddb36ef8cd9615011876e4d8dc941a06622d1b
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
8e4f1a1a38 igzip: Create generic deflate performance utility
Change-Id: Idf180660797f97a492550fb557652f036cd55509
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
991fd252f1 igzip: Remove igzip_perf.c
Compression is too data dependent for igzip_perf.c to give any meaningful result.

Change-Id: I6d22cb5eb959404807b9c83de38e06d46c3ede76
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
2104a77d02 igzip: Create a command line utility
Change-Id: If283f03231ca3a5cd6f97d01c5268ad37cb3b538
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
ff1928e8ec igzip: Create functions to write gzip/zlib headers
Change-Id: If5aaa277a01214bd36406ee11680df0904ad12f7
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
43374f6776 igzip: Implement gzip/zlib header/trailer parsing
Change-Id: I3fe8653f2286212a9d6c6ecfa3b78752b2cac8ef
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
aa8b51930f igzip: Add variable history window to perf test
Change-Id: Ia5eb10094e8c84778ed6cf3a51ddade9a19103b5
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
1409f70c7e igzip: Modify test to check validity of window bits
Change-Id: Id198bb019057e54226a90c3d82d00746df04da63
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
11542000f0 igzip: Implement limited window size for inflate
Change-Id: Ib7fce6a51db99fc7e11f06f5916c2b755bfc5c67
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-20 11:12:02 -07:00
Roy Oursler
03bef684a4 igzip: Setup for variable hash mask
Change-Id: I3be94dbc40c2e02dcff4f89e5a9df8ed1f744f02
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-18 14:27:25 -07:00
Roy Oursler
6317ce2b78 igzip: Setup for variable lookback distance
Change-Id: Idd52c9392113dfc54feea3c66916a7f5aa128bef
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-09-18 14:27:25 -07:00
Roy Oursler
f421ea8d7a igzip: Modify isal_deflate buffer management
Change-Id: I2f12a0acf8ceeffb7328093e25205a6e73484159
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 16:35:30 -07:00
Roy Oursler
c1876a1221 igzip: Fixup level 3 first byte handling
Change-Id: Id9f59934d43b09af3c2ec722f5a825aa9b02e2dc
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 16:35:30 -07:00
Roy Oursler
cd7b70dd41 igzip: Fix level 3 gen_map end of buffer handling
Change-Id: I3ed75b0ade5af23a98d916e867bb93ee9ad3a992
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 16:35:30 -07:00
Roy Oursler
bca564035a igzip: Move stream off the stack in test_compres_multi_pass
Change-Id: Iac7e48d52159936b6623542a118227d4aa63c8e0
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 16:35:30 -07:00
Roy Oursler
a6f438f935 igzip: Make igzip_rand_test exit when an error happens
Change-Id: I35249618dad9668b361a87ee827820e977148a7c
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 16:35:30 -07:00
Roy Oursler
649ad89cdf igzip: Fix large buffer bug due to promoting total_in to uint64_t
Change-Id: Ibf7af1695d3d04a30c09f8bf2b444e4a5b87971a
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 16:35:23 -07:00
Roy Oursler
fac7e7ce18 igzip: Pass writeable buffer to test_compress
The function test_compress in igzip_rand_test modifies the input buffer as part
of the test, passing in a read only buffer can cause issues.

Change-Id: Ib1e67ec72d9c95ea983b5f7550deb3d56cde4260
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 13:18:05 -07:00
Roy Oursler
bb3c6c28c8 igzip: Remove need for total_in_start
Change-Id: Ie9ab3e702ce07a5ba8d6fb3275da98e03c25822b
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-21 13:18:05 -07:00
Greg Tucker
105eeb967c crc: Fix for small buffer readover in iscsi crc
Change-Id: Ib4d7e2c6838d490a539a0174b8eb128e4fb49bba
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-08-17 15:37:43 -07:00
Greg Tucker
bad3a0af87 test: Add build info to autotest output
Change-Id: I6a8fdd3c00f7b598c91ccf2b6d96507da164a991
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-08-16 13:27:50 -07:00
Seth Howell
f31da80345 test: add functions for indicating test completion
Change-Id: I30f2f1147989ec3411f7d16066f0e5a8eb208135
Signed-off-by: Seth Howell <seth.howell@intel.com>
2018-08-15 14:27:00 -07:00
Roy Oursler
64aefbfcba igzip: Fix level 3 compression drop
Change-Id: I67d66323850d1e42ab1c38b212f4cb5ad8699920
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-15 11:19:28 -07:00
Roy Oursler
a32245b97b igzip: Fix underflow bug in total_in in isal_deflate()
Fix bug introduced by patch 2a292689e.

Change-Id: If53af716d546e9430eb8aaae32c2f6133aba21a2
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-15 11:18:33 -07:00
Roy Oursler
2a292689ec igzip: Fix deflate for large buffers
Change-Id: I1993e0a6d3aa36c68af80229329316b2e0616a09
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-07 09:52:55 -07:00
Roy Oursler
995d420c6b igzip: Fix inflate for large output files
Change-Id: I3d3532100f8d60e0b446f5c90fc820cbfd2df1ec
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-08-03 12:54:05 -07:00
Roy Oursler
7da82d555f igzip: Add missing USE_HSWNI defines
Change-Id: Ic3f2e1dada0c61e7b78068131fbea37023215844
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-07-26 11:10:13 -07:00
Roy Oursler
cfa1400557 igzip: Add zlib inflate perf testing
Change-Id: Ida40fbeb4dae15c1fd85ae23644b6c08efe79182
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-07-12 13:42:39 -07:00
Greg Tucker
c66302370e igzip: Fix for inflate perf test stateless
Test was missing reset of inbuf for each iteration and not checking fopen.
Also redid the loop a bit.

Change-Id: Ia55e80e2ba79b14c26d0db6d722eebc2e6f14cc5
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-28 13:55:57 -07:00
Greg Tucker
8274daec55 Update release notes for v2.23 additions
Change-Id: I74603cf31e324dfc2273b3100f42eb2205131e22
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-25 10:19:43 -07:00
Greg Tucker
8e1f3c01f3 build: Bump revision to 2.23
Change-Id: If3dcae790b0c3acb83ab59140cf9a046e79ce6a4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:55:56 -07:00
Greg Tucker
b1c4517557 igzip: Add a few missing asm copyright headers
Change-Id: Iddcfbd357efa17dbbd32acacac952579fc052756
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:50:40 -07:00
Greg Tucker
a1869430c7 build: Fix warnings on mac for objects defining no symbols
Change-Id: I13ef334ae23a3370cbf2a5409974fa0dc9fba7a5
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
b0ffac5140 doc: Add contributing and mainpage to doxygen
Change-Id: Ie912ab61d7d7ac19c982d23bc43468f14cb3436c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
e8d15527fb igzip: Remove igzip_sync_flush_file_perf
Performance test had limited functionality.

Change-Id: I5abc839fafc1351de7543531e7770b6add0bcb1d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
bee68480b8 ec: Remove references in lib source to types.h
Change-Id: I3e8db92626c92d21c2426bbad89a10fa10c3e002
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
da1aee8714 igzip: Remove references to types.h
Previously included just for struct alignment but all restrictions
have been removed.

Change-Id: I3fa7cbab86fce419b3b3bfccb48d9129bd77cf64
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
804cf21206 test: Check for sign-off and format in more files
Change-Id: Id780756a6fb201b22fc3ac1a67955ed61b436e8d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
0ad8ea9a15 test: Fix compile warning in dump inflate corpus for fuzz
Change-Id: I2734ec46687c5188a207963c6631bb4280c1fb8f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Greg Tucker
d30b45ffbe igzip: Remove unused initialization in inflate expand huffcode
Change-Id: Ia9732fc4a2acc9990e3b2b77bc604f740ce6be30
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-06-21 14:39:08 -07:00
Roy Oursler
e8ca21baf4 igzip: Fix update_histogram_base buffer finish
Change-Id: Ib74988a79baca7f5095447458d7374f834d1c138
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:35:14 -07:00
Roy Oursler
31204ae96e igzip: Reduce igzip_gen_icf_map_lh1_04 data usage
Change-Id: Ida538675ef0ffe0e3d65e1aed382b2e04ed83baf
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
ff00a0f927 igzip: Reduce data usage in igzip_gen_icf_map_lh1_06
Change-Id: I453f6c6e71f236145c1e79493710c85847ed8c70
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
d389b8d6f8 igzip: Move COMPARE_TYPES usage
Change-Id: I87d88618b6f86c1f9618ba9cea132153a8ef2fa5
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
21e78d5aa3 igzip: improve igzip_body_compilations
Change-Id: I7ad859a986c643336be8824f6400b266ff140dcd
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
ac5a741420 tests: Improve usability of test_fuzz.sh
Change-Id: I595e15d155dc6aa759671da510198c39b2e9c23a
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
14ba3747b5 igzip: Optimize multibyte for small files
Change-Id: I8400e0be07da75fd549724147ab06aa71f7cc9df
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
5d6d9a21eb igzip: Optimize bit reverse
Change-Id: I45244b8c2f07fab0f237b11b92fa5f557aff878b
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
7faedc71bd igzip: Rename tzcnt to more accurate tzbytecnt
Change-Id: Ifc0b828f50e4c1feaf141e0164749eca3b227996
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
3732485914 igzip: Accept multiple inflate codes for matches of 258
Allow inflate to accept code 284 with extra bits 0x1f as a match of
length 258. This matches zlib's behaviour.

Change-Id: Id85052ceea2b23d3db9c147672dd7996a4c66786
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
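
The two encodings mentioned in the commit above can be illustrated with a minimal sketch of DEFLATE length decoding. The tables follow RFC 1951 section 3.2.5; the helper name `decode_match_length` is hypothetical and is not taken from the igzip sources. Code 285 encodes length 258 directly, while code 284 has base 227 and 5 extra bits, so extra bits 0x1f give 227 + 31 = 258 as well.

```c
#include <stdint.h>

/* Base match lengths for length codes 257..285 (RFC 1951, section 3.2.5). */
static const uint16_t len_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
};

/* Number of extra bits that follow each length code. */
static const uint8_t len_extra_bits[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0
};

/*
 * Hypothetical helper: map a length code and its extra-bit value to a
 * match length. Returns 0 for a code outside 257..285 or for an extra
 * value that does not fit in the code's extra-bit field.
 */
static unsigned decode_match_length(unsigned code, unsigned extra)
{
    unsigned idx;

    if (code < 257 || code > 285)
        return 0;
    idx = code - 257;
    if (extra >> len_extra_bits[idx])
        return 0;
    return len_base[idx] + extra;
}
```

A decoder built this way accepts both `(285, 0)` and `(284, 0x1f)` as a match of length 258 without any special casing.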
Roy Oursler
a778455448 igzip: Fix bug in create_rand_repeat_data
Change-Id: Ib3ab731ea9a96cdbd0380d6a88b3837ae0de0815
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
054758ea63 igzip: Add back inflate std vectors test with updated errors
Change-Id: I5e46fa028baec8b8b0a3435b5d1cc11303e39abd
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
a61e035445 igzip: Improve inflate invalid block identification
Change-Id: I31ab9fa641e448c643ff4c6e606837c07ec2b14f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
b3ddaff6c5 igzip: Create flag to determine maximum symbols decoded
Change-Id: I94c185bf10662931248ccae07aa5659626f1deb2
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
2de5a0fd88 igzip: Swap length code lookup with length lookup
Change-Id: I9f1c3ea5353f2c2fa98bab1d0cb1eb3c7b7397f6
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
ec6169ac3b igzip: Combine distance code and extra bit count lookup
Change-Id: I12dca7126313406afd6ccbbaa91e7a1cc4a91bc0
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
16f32e179c igzip: Reoptimize decode_stateless due to multibyte decode
Change-Id: I1479772062be584f5087bf2999ba4500a340ca51
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
abd1963a22 igzip: Allow decoding symbol triples
Change-Id: I82e088b65a37adb1853ce2525dafafec06586a0f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
bf4c2dab27 igzip: Allow decoding symbol pairs
Change-Id: I306404d7821cf4e43c28ae6477038b17a29b0c47
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
021afc5911 igzip: Remove inflate standard vectors
Change-Id: Ib7a5e8e8d63bc895eddc85c1eb1f5ab2edd56515
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
3d1b5b40d3 igzip: Rearrangement of make_inflate_huff_code
Change-Id: Ieff4a7c03827ff6a41b2e8e7316b239b94343c1a
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
222a68f760 igzip: Implement multibyte decode
Change-Id: I923a57a01f696f2082945fafcc2773655b9a5beb
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
0b8fe87648 igzip: Verbose printing for multi_pass inflate
Change-Id: Iea1ebf1f185bf90da441d27df479e164d21fb74c
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
7a1dc55c27 igzip: Increase size of large short code lookup
Change-Id: I05a564d1759ae417a966f3a12621799db0edf80a
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Roy Oursler
fbeb7c83c4 igzip: Some general cleanup of the decode_block assembly
Change-Id: Ie30955fcb47ffc9b23f0c50f520cbd9973b2b315
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-06-14 15:30:14 -07:00
Greg Tucker
9edac4799d ex: Allow erasure list in any order in ec example
Previous gf_gen_decode_matrix_simple() assumed that all source errors
were listed first before any erasures in parity.  Generalized to work
in any order.

Change-Id: I31b9c0c0db5d0155473424ccd0ecdcdd787ef71f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-05-25 14:44:50 -07:00
Konstantin Kharlamov
19f2c46d1b igzip: fix build failure for CPUs with BMI capability
…also makes use of an optimized algorithm for x86_64 CPUs without the BMI.

v2: use "defined()" macro

igzip: s/__bsfq/__builtin_ctzll

Per discussion at https://github.com/01org/isa-l/pull/38, __bsfq isn't
defined on clang, but __builtin_ctzll should work the same way.

Also, refactor the code a bit.

Change-Id: I1a251abe1fab1be1cbdc2c042298d0b500068c68
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
2018-05-25 10:27:50 -07:00
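
As a rough illustration of the substitution described in the commit above, the sketch below counts trailing zero bits with `__builtin_ctzll` when the compiler provides it and falls back to a plain loop otherwise. The function names are hypothetical and the fallback is an assumption for illustration, not the code from the igzip sources. Note that `__builtin_ctzll(0)` is undefined, so the zero case is handled explicitly.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of trailing zero bits in v.
 * Returns 64 when v is zero, because __builtin_ctzll(0) is undefined.
 */
static inline unsigned count_trailing_zeros64(uint64_t v)
{
    if (v == 0)
        return 64;
#if defined(__GNUC__) || defined(__clang__)
    /* GCC and clang both provide this builtin. */
    return (unsigned)__builtin_ctzll(v);
#else
    /* Portable fallback: shift until the lowest set bit is reached. */
    unsigned n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
#endif
}

/* Hypothetical helper: number of whole zero bytes at the low end of v. */
static inline unsigned trailing_zero_bytes(uint64_t v)
{
    return count_trailing_zeros64(v) / 8;
}
```

Dividing the bit count by 8 gives a byte count, which is the kind of quantity a byte-wise compare loop would want when locating the first differing byte in a 64-bit XOR result.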
Greg Tucker
951ec3198f test: Fix clang warning in inflate perf test
Change-Id: I4bd30057a9a6f508af871b0828193004e405daa7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-05-08 11:31:21 -07:00
Greg Tucker
c20260e361 test: Fix check script for old bash versions
Change-Id: I09975d3540993279c378f3dbbe93437dbcc4c142
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-05-07 14:14:32 -07:00
Roy Oursler
55481069ac tests: Run all make targets in test_checks
Change-Id: I484500fc5a943aebf5779846972595cb74f0e145
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-05-04 10:56:04 -07:00
Greg Tucker
bc48c1e2dc test: Change fuzz object to link .o instead of .lo
Automake is not cleaning .lo from extra directories.

Change-Id: Ib68f32954c58cb7a76d07b2562e020fbd854f46e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-05-01 16:30:45 -07:00
Greg Tucker
4c4185ba56 ex: Fix ec example for random params, min parity 1
Keep from trying to alloc 0 bytes for parity = 0.

Change-Id: Iafb9eeb9ac9da85f521ac5eeb4e85ea80c5b6e4c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-05-01 10:56:23 -07:00
Roy Oursler
7274d27ff6 igzip: Implement stateful blocks perf testing
Change-Id: I2a1baa2d3c09d894dee54d5be8c1e9aa2ed434af
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-04-24 10:09:45 -07:00
Roy Oursler
83c2ec02cc igzip: Add runtime options to inflate perf
Change-Id: If520c1d499d49779f35dffc550b5612dab6839c7
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-04-13 13:20:57 -07:00
Greg Tucker
2a8b061218 Update release notes for v2.22 additions
Change-Id: Ifffe687c16516f50422aa1543f64a3c9cd5c7861
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-30 11:54:04 -07:00
Greg Tucker
16a5d25988 test: Add llvm libFuzz arguments for builtin clang 6.0
Clang 6.0 has libFuzzer included and different args

Change-Id: Iad7470d13a93c6b5e41de63f634ba8d501eaaa37
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-30 00:36:52 -07:00
Greg Tucker
0ba5f0f7db build: Bump revision to 2.22
Change-Id: I72ab302f7fb4e23e2637f810cee131264b4e96d4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-29 23:12:31 -07:00
Greg Tucker
aaeedf60c4 ec: New example of piggyback codes
Change-Id: I872c48b150be1799b97b2115aed0804a36eb5a0c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-29 23:06:21 -07:00
Greg Tucker
041379a6c6 ec: New simple erasure coding example prog
Change-Id: Ic3090a9315c8c0fa7bf910c2855e95fbabea7f7a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-29 10:35:24 -07:00
Daniel Verkamp
99b45db17e types.h: remove [U]INT{8,16,32,64} typedefs
These can be replaced with the <stdint.h> types.

Additionally, the existing definitions weren't correct on some platforms
(e.g. IA-32, where 'long', used for INT64/UINT64, is only 32 bits).

Change-Id: I1d9235c693ca2dc0c51d085128cecc4effc165fd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-29 09:35:25 -07:00
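
The portability problem described in the commit above can be shown with a small sketch. The typedef `legacy_u64` is a hypothetical stand-in for the removed style of definition, not the original code from types.h. On IA-32 (and on 64-bit Windows) `long` is 32 bits wide, so a 64-bit typedef built on it is wrong there, while the `<stdint.h>` types are exact-width by definition.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a 'long'-based 64-bit typedef. */
typedef unsigned long legacy_u64;

/* uint64_t is exactly 64 bits wherever it is provided (requires C11). */
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t must be 64 bits");

/* Width in bits of the 'long'-based typedef: 32 or 64 depending on the ABI. */
static unsigned legacy_u64_bits(void)
{
    return (unsigned)(sizeof(legacy_u64) * CHAR_BIT);
}

/* Width in bits of the <stdint.h> type: always 64. */
static unsigned fixed_u64_bits(void)
{
    return (unsigned)(sizeof(uint64_t) * CHAR_BIT);
}
```

Compiling this with `-m32` on x86 would make `legacy_u64_bits()` report 32, while the static assertion on `uint64_t` still holds.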
Daniel Verkamp
d9ec2c4c8a ec: use standard types in struct slver
This matches the definition of struct slver elsewhere in the code.

Removes the last use of [U]INT{8,16,32,64} types.

Change-Id: I70761ac27add1e19808f1cebd6a7ee69ebd08dee
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-29 09:34:21 -07:00
Daniel Verkamp
6e9f576bff igzip: remove detect_repeated_char()
This was replaced with detect_repeated_char_length(), but the
implementation of the old function was never removed.

Change-Id: I55485cec324dce01033b73f24474f1aca2a31bd3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-29 09:33:04 -07:00
Daniel Verkamp
f9a61187c5 igzip: remove stray duplicate semicolon
Change-Id: I0df14600e2f49dd04d3e94bdfe2c155faa9ac2ee
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-29 09:31:40 -07:00
Roy Oursler
bb05f0fcdb igzip: Fix sizeof(void *) in isal_create_hufftables
Fixes #33

Change-Id: I43640b4ccdf165c84757ea7b3ace80f3dc4aafde
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-03-22 13:19:30 -07:00
Roy Oursler
4734834501 igzip: Fix uninitialized use of bl_count[0]
Fixes #34

Change-Id: I489b1ae24212d81875b96f687f3543ac548e2278
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-03-22 13:05:56 -07:00
Daniel Verkamp
bcde6e410b igzip: mark huff_codes.c internal functions static
Remove declarations from huff_codes.h (preserving doc comments where
applicable) and mark functions that are only called within huff_codes.c
as static.

Change-Id: Idc0113d4eca9e97347def86a502073ef7126114b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-15 12:34:46 -07:00
Daniel Verkamp
a301835e06 igzip: mark proc_heap_base.c internal function static
heapify() is only used within proc_heap_base.c.

Change-Id: I68cc11c2a82fa7f6a989a1838c0e744c0c23feb3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-15 12:34:33 -07:00
Daniel Verkamp
69176f5428 igzip: mark igzip_icf_body.c internal function static
compress_icf_map_g() is only used within igzip_icf_body.c.

Change-Id: Id488d6721c60c1909c922a5e0bd162b1542e71ca
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-15 12:34:12 -07:00
Daniel Verkamp
575664e78b igzip: mark internal igzip.c function static
detect_repeated_char_length() is only used within igzip.c.

Change-Id: I77ee5422e2cb58d81b9705ebcfad68f3a7017d6b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-15 12:33:49 -07:00
Daniel Verkamp
bc78944634 igzip: drop declarations of non-existent functions
valid_lit_huff_table() and valid_dist_huff_table() were declared in an
internal header, but they were never defined or called anywhere.

Change-Id: I59ddf35f161276fca6d6b58081cf640bbea41252
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-12 09:08:06 -07:00
Daniel Verkamp
e815371be7 igzip: remove unused hash_section() function
This isn't referenced anywhere and isn't part of the public API.

Change-Id: I1e4809c8cc4ac64310fa151425e710abb7351079
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-12 08:47:49 -07:00
Daniel Verkamp
07eac8fc12 igzip: mark private data as static
These globals are only used within huff_codes.c, so they don't need to
be globally visible.

Change-Id: I1e118b3a95cfb7d21bf33c66559362483e460d58
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-03-12 08:28:10 -07:00
Greg Tucker
9e79faeeff raid: Ensure example meets min align requirement
Change-Id: Ie9d367176046bb4919474981c84e957bed6c99d6
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-06 13:39:32 -07:00
Greg Tucker
5af4e4aa0a raid: Change example to use multi-binary function
Change-Id: I9a3edf4ad0b9b8afad6d0545bfc7436b4c8fdfe0
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-03-01 13:32:54 -07:00
Roy Oursler
4b45beff4f igzip: Avoid some nested multibinary calls in deflate_body
Change-Id: I2b433f63664ffa27fc125a6a859a1b8053212d7f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-03-01 13:27:48 -07:00
Roy Oursler
47e914f98f igzip: Fix Windows prologue for avx 512 gen_map and set_long
Change-Id: I8e326dc7fb67f30101d03dc364ffba25242e1f67
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-03-01 13:27:48 -07:00
Roy Oursler
aedf4f8cff igzip: create optimized set_long_icf_fg for avx2
Change-Id: I027e73eaa908ca69a5c1af5a52b464f1963fd2da
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-03-01 13:27:48 -07:00
Roy Oursler
9acc3ed2ac igzip: Create AVX2 optimized version of level 3
Change-Id: Icfdb67445ee5afff85441cfee23beb66bfe15d5e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-03-01 13:27:38 -07:00
Jean-Yves VET
52bb322912 build: fix compilation with CPU not supporting SSE
This patch makes the project compile and run (tests and
performance tests as well) on CPUs that do not support
SSE instructions.

Signed-off-by: Jean-Yves Vet <jyvet@ddn.com>
2018-02-23 06:13:10 -05:00
Greg Tucker
553f01f0c4 Include doxygen label in toplevel header
Change-Id: I8dfc08afa8255ff781104542c0a50da1519673e0
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2018-02-09 11:13:27 -07:00
Roy Oursler
3371542dce crc: Create a combined crc32 check test
Change-Id: Idae7634007363cfb59cca15270bd82c37fae26ea
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-02-02 13:51:17 -07:00
Roy Oursler
148898c6ba igzip: Print error when test_compress_file fails in igzip_rand_test
Change-Id: I5b03b006708778cb61edb3d456d22d08f1169309
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2018-01-04 09:13:59 -07:00
Greg Tucker
f3c218ae62 test: Remove travis job that pulls nasm from debian
Travis-ci is having an issue with installing nasm from debian repo.
Removing until they have it fixed.

Change-Id: I3b67e2fde0b2a9c7bc44d5a9077bf9a23f1fde24
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-21 17:36:04 -07:00
Greg Tucker
cf9cf4b430 Update release notes for v2.21 additions
Change-Id: I940bd2289d679fb40c2416db3258ef83fbc81c29
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-21 11:41:00 -07:00
Greg Tucker
35dc4f0b0c build: Change configure nasm min test for AVX512
Due to bug in nasm on vinserti32x8 instruction, need nasm v2.13 or
better to build new AVX512 igzip files.  Changed the configure test
and doc to reflect.

Change-Id: Iceaf65187cbb2d63c29f9c0f19346f03bb484a94
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-20 11:54:41 -07:00
Greg Tucker
249deb378d build: Bump revision to 2.21
Change-Id: I5a5bfe5e15ff56e791aaabd68915793ee1886ba3
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-19 17:41:33 -07:00
Greg Tucker
22ec5c7469 igzip: Fix igzip fuzz test for lev2-3 buf size
Also remove unnecessary deps in igzip makefile.

Change-Id: I1ff79461df6d60bfc52c99b574c39098f1fe238a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-19 11:01:46 -07:00
Greg Tucker
6b1c9a95c8 igzip: Fix syntax for win yasm for when avx512 is supported
Currently can't test because yasm doesn't support avx512 but should
get the syntax correct for when it does.

Change-Id: I672b47b83b96861d8b9bfb0af02e726a1949aca0
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-19 10:33:53 -07:00
Roy Oursler
48119c5c87 igzip: Increase long_code_lookup struct size to fix buffer overflow
Change-Id: I6546dcb7ffcd5895292d06fdc748c3cf279a4542
2017-12-19 10:24:23 -07:00
Greg Tucker
491035d956 crc: Add t10dif+copy function
Change-Id: Ic6c424a0aa746aa06643575f7fcc8d6944cbfc0e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-18 15:59:17 -07:00
Chunyang Hui
e7df3fdedc crc: crc32 iscsi performance improvement
Change-Id: Ic8c946546345cf92a19dea1bbc5ebaf66c0f98da
Signed-off-by: Chunyang Hui <Chunyang.hui@intel.com>
2017-12-18 15:51:10 -07:00
Roy Oursler
e3fad7c45a igzip: Fix out buffer overflow in write_type0_header
Fix a possible 1 byte overflow by creating a combined write_bits and flush.

Change-Id: I2d2455e9e32a820522ff1d89d016db72a82baed9
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
5d413c8b12 igzip: Use full 32 bits histogram elements in igzip_icf_body/finish
This fix prevents possible histogram overflow in compression.

Change-Id: Ie5f25d1bace7f443f432678fcfbd9050ac65113f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
8965584ae3 igzip: Implement large hash table method as level 2
Move current level 2 compression to be level 3 and create a level 2
compression which is the same algorithm as level 1 but with a larger
hashtable.

Change-Id: I398a6017c2ff218dded24e229fb51c23ea69f314
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
7a12bcb2a8 igzip: Separate concept of level and compression method
Change-Id: I82a5fbeb93adc77057893c643e044e311e4f393c
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
fe68f02dac igzip: Move hash_table and symbol histogram for icf compression
Change-Id: I50df9c8915ff3e1af450aeb8e4c0cc3baf9624ae
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
2573f3cd87 igzip: Create defines for lvl1 hash table and histogram
Change-Id: I4cdf7af4e482b8105aef024085323d0b1cd622ef
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
06cd70481c igzip: Separate defines for LVL1 and LVL0 hash tables
Change-Id: I19bdec8d2d0c74083bc1695763c9630516995885
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
9203f96c2a igzip: Change inputs to deflate_hash functions
Change-Id: I9ffff9aee82d5d1ece51853b4709f13ccd80c8ad
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
fe90c14bc3 igzip: Fix issues for large file inputs to igzip_rand_test
Change-Id: Ic77834d771b2eebdfb7eeed782ee0d69e7905fa5
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
b4ebd1c6e7 igzip: Fix bug in igzip_rand_test where invalid flag is set
Change-Id: I0d9418bcb48bd40e2fb8bb9105f38cffe77ba33d
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
4ae2d1be29 igzip: Implement optimized level 2 compression
Change-Id: I8cf5bcd56f290d17205ac36dc2828c8acfc66947
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
3c62216aa1 igzip: Remove some unnecessary random data generation from igzip_rand_test
Change-Id: I7c10c95549b9d825360d8f033f8dc3d8546d4d4c
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
4de751f35a igzip: Improve VERBOSE printing in igzip_rand_test
Change-Id: Ib04361d0980920e81f901da3df234f9299669a98
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
864634a62b igzip: Modify rand_test to not call rand() within VERBOSE blocks
Change-Id: Ice907a9eae72662a741d3d02a3926683e65c173d
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
a7d474a55c igzip: Fix rand_test bug where invalid flush case can have a valid flush
Change-Id: I6459432cb14b0c91f07b87a54c224f71ee03a05a
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
49156c3568 igzip: Implement block buffer
Change-Id: I9d7942740557e4ffaf8e223e190f4bd4e4f47b1e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
6049ce3ca7 igzip: Modify state to record total_in on deflate call
Change-Id: I13e5878a227732545aee5a762bf5a9a75ce73f02
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Roy Oursler
59b9990a39 igzip: Remove file_start from zstate
Change-Id: Ia4cb6dc86da54cc771f25a6d958bea730caa4801
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-12-15 14:27:14 -07:00
Greg Tucker
0fe95360bb test: Run with new test_seed in each extended test
Change-Id: I63903ecff9624a793b041671042c1fcaff2dd3a4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-12-11 16:15:55 -07:00
Greg Tucker
cb4cea60da test: Remove redundant arch-specific tests
Change-Id: Ifdbac9d8a99888bfd7a12da5d47dd07b8f85481d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-11-30 11:13:21 -07:00
Greg Tucker
6dcf6edba3 test: Fix ext script for darwin arch=noarch
Change-Id: If50f15fd1fef862e73eac50cebc88fbf18caf989
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-11-29 15:54:32 -07:00
Greg Tucker
54e1f157f7 test: Ensure fuzz tool returns 0 if libfuzzer not avail
Change-Id: Iafbeea0444529df5c14c65c0722653aba442df76
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-11-17 18:50:26 -07:00
Greg Tucker
4f59eeda90 test: Add llvm fuzz testing
Moved the afl fuzz test and added llvm fuzz tests including inflate
and round trip compress and inflate.  Currently only works with clang,
std makefile and libFuzzer installed.  Need to add checking and
support later when libfuzzer is more tightly integrated into the
compiler.

Change-Id: I2db9ad2335d6c5ed846886703b58225f67bcc935
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-11-17 17:28:27 -07:00
Roy Oursler
f7b6e73146 igzip: Improve igzip_rand_test compress_multi_pass by modifying in buffer
Modify igzip_rand_test to test that data after next_in can be modified without
producing incorrect compressed data.

Change-Id: Ic6b62f269c6783407fbec7222ae73f24d3735717
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-10-20 15:29:39 -07:00
Roy Oursler
cad5e7d479 ec: Fix doxygen comment format for gen_rs_matrix
Change-Id: I505155b0d57576814d876cd2d3595e522e03c469
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-10-17 10:18:45 -07:00
Greg Tucker
48d74d6e51 igzip: Fix perf test to only open dictionary if specified
Change-Id: Idf5048589af5b61da5ccccf25c575dfc05ea15ec
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-10-11 15:57:41 -07:00
Greg Tucker
daa6fd05b4 doc: Add issue on create_hufftables_subset to release notes
Change-Id: I1c8c4b7ffbd7a1485abfe40bcf1a3e56d6aea85d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-26 17:27:45 -07:00
Roy Oursler
e79c57c7e3 igzip: Fix issue with isal_create_hufftables_subset
isal_create_hufftables_subset failed to generate length symbols, but it should
generate them, since a histogram is not guaranteed to contain every length
found during compression.

Change-Id: I880210fe1b1078de8617cab0ecb93c9810b9c9de
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-09-21 17:24:56 -07:00
Greg Tucker
23159441e0 doc: Update readme with info on build doc and packages
Change-Id: I1dd04179fb2f7f9de8ce85c65dc422d43a2853c9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-20 11:25:37 -07:00
Greg Tucker
f78601adb9 igzip: Pick faster crc32 gzip function for sse4.2 machines
For machines with sse4.2 but not avx we can choose a faster version of
crc32_gzip. For highly compressible files using gzip format (rfc 1952) this can
be a significant advantage.

Change-Id: Ib699a68999290eed4b83f9bbf54dc6c304a5c445
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-20 10:50:16 -07:00
Greg Tucker
63c9aef4b7 update release notes for v2.20 additions
Change-Id: Ic8d7a37f715983852036f0b63273ea8f10fd83aa
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-15 18:59:31 -07:00
Greg Tucker
ebf720af04 build: Bump revision to 2.20
Change-Id: I0f994fa9d31d873706b41b0b5c50b5f277ad0988
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-15 18:48:44 -07:00
Greg Tucker
62ba0b6acd build: Update travis to run more tests
Change-Id: I0cb4967914c2240c555a9cd447a7fc404d64cc4c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-15 18:34:00 -07:00
Greg Tucker
7ab24b769e build: Add extended test scripts
Change-Id: Ia5d57d8e1c0037ecf3d235651adcc33913049c94
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-09-07 16:52:52 -07:00
Greg Tucker
74d5d3660b build: Add format checking script
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-08-25 13:37:13 -07:00
Peng Xiao
9099918dcb Fix huff_codes.c compile problem on CentOS 6.5, as gcc 4.4.7 only supports C89
Signed-off-by: Peng Xiao <xiao.peng61@zte.com.cn>
2017-08-25 11:07:23 +00:00
Roy Oursler
aff6555226 igzip: Optimized deflate_hash
Optimize deflate hash by unrolling crc calculations.

Change-Id: Ief882910619a2cc3b052416d30499f6226e47419
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-08-18 14:42:37 -07:00
Roy Oursler
cf936f0d84 igzip: Remove igzip_sync_flush_perf
Remove igzip_sync_flush_perf because the perf test does not give any new
information, and the data used is not a representative workload.

Change-Id: I7b68f8b7c6da0944ace5a2a9e31db378135689ff
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-08-18 14:39:14 -07:00
Roy Oursler
80bfbb33df igzip: Remove DECLARE_ALIGNED and optimize structure layout
Change-Id: I95bc3b8e2e30aff0d596c743158337400c4eb486
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-08-18 14:35:44 -07:00
Greg Tucker
a7fad4b9d2 igzip: Change a few conflicting functions to static
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-08-18 14:34:14 -07:00
Roy Oursler
48d7def5d9 igzip: Fix inflate total_out behavior to align with expected behavior
Modify total_out to be total data decompressed into user supplied buffers,
rather than to user supplied and internal buffers.

Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-08-18 14:28:10 -07:00
Xiaodong Liu
3ab8239097 multibinary: move WRT_OPT macro to common header
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
2017-08-18 14:24:57 -07:00
Xiaodong Liu
9d243d0ed7 Update windows config files
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
2017-06-26 06:34:35 -04:00
Xiaodong Liu
b39d72a7da Update release notes for v2.19 additions
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
2017-06-26 04:51:35 -04:00
Xiaodong Liu
96ade864f1 build: Bump revision to 2.19
Change-Id: Ib0f47911fc4745faf3535e73eefa4c012500316f
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
2017-06-26 04:47:40 -04:00
Roy Oursler
34c341db35 igzip: Add reset functions for both deflate and inflate.
Change-Id: I8677a4365ac5c2343751660176f3b2eb4746ddfe
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-26 04:14:44 -04:00
Xiaodong Liu
7137c4a5be crc: release crc32_gzip_refl code outside
Merge crc32_gzip_refl function definitions, base code, and multi-binary
code into crc32.h, crc32_base.c and crc_multibinary.asm in order to
keep consistency. Add crc32_gzip_refl files to crc/Makefile.am.
The original crc32_gzip_refl removed the NOT operation; re-add it.

Change-Id: Ib0cbbeb1ab3c9fcafec324b392596d2514202424
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
2017-06-26 04:14:27 -04:00
Xiaodong Liu
39ce870235 igzip: slver typo fix
Change-Id: I13e6d150d0c661ee6dda9c25162c9ade5136d367
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
2017-06-26 04:12:19 -04:00
Roy Oursler
ed15402f5b igzip: Add stateful dictionary support
Change-Id: I75dbac947787bc0041674468c88d0aa41b8b082f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-26 04:11:48 -04:00
Roy Oursler
82a6ac65dc ec: Determine exact conditions where gf_gen_rs_matrix works
Add a program calculating some of the exact conditions where gf_gen_rs_matrix
works, add comments stating these bounds to gf_gen_rs_matrix, and fix erasure
code test that violates the bounds.

Change-Id: I1d0010b09fea97731bfd24f4f76e24609538b24f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-26 04:11:12 -04:00
Roy Oursler
1a7c640ef9 igzip: Fix 0 length file and looping errors in igzip_inflate_test
Change-Id: I328f241ba07d8a0ae4fbc4c7de2ea8913912a188
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-26 04:11:03 -04:00
Greg Tucker
fc1467deb2 Format only patch from iindent and remove_whitespace
Change-Id: I114bfcfa8750c7ba3a50ad2be9dd9e87cb7a1042
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-26 04:10:47 -04:00
Greg Tucker
e3f36868fa Fix test helper for windows and gcc7 issues
Change-Id: Idb61d32d928536918dd243df825060c1b5bc484d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-26 04:08:37 -04:00
Greg Tucker
278a51979a Fix test helper for windows and gcc7 issues
Change-Id: Idb61d32d928536918dd243df825060c1b5bc484d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-26 04:06:22 -04:00
Greg Tucker
a8966b6709 igzip: Add unit tests for adler and crc32_gzip
Also renamed test helper function, fixed clang warnings and adler usage.

Change-Id: I4ad22d046809483456608be1f4fdc4adbf0e09e4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-26 04:03:35 -04:00
Greg Tucker
e1f5284ff8 igzip: Add sse optimized adler32 checksum
Change-Id: Id07727b8a8da4b41aa983b487ca881552d5190ee
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-26 04:01:29 -04:00
Greg Tucker
3025e83b91 igzip: Add avx2 optimized adler32 checksum
Change-Id: I019a38cf98836e3e6c7215a6914b85abb9399e33
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-26 03:56:49 -04:00
Roy Oursler
f4a5b303e2 igzip: Remove BITBUF8 and BITBUF_ELSE compile options
Change-Id: Iad3b2e6f9a32473b6e59910494c75d82558fc28e
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 06:03:51 -04:00
Roy Oursler
edacadc8fb igzip: Modify encode_df_04 to behave more like encode_df_06 algorithm
Change-Id: I39c5d0d8182efb0fe8aa6bea97d9361df4ee8ddf
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 06:02:13 -04:00
Roy Oursler
5a55e3096c igzip: Avx512 version for encode_df
Change-Id: I1625a3d7e016805791cfd09e31909562f432fd71
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 05:56:05 -04:00
Roy Oursler
b3c09b9b7c igzip: Improve random data generation in igzip_rand_test
Change-Id: I4835a9e376b4fa24080d765255703a959389487d
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 05:47:04 -04:00
Roy Oursler
64143a741e igzip: Implement type 0 blocks for level 1 compress
Change-Id: If55ab161623d29fa6fb08df3bc813e654918e592
2017-06-21 05:46:38 -04:00
Roy Oursler
87652b4489 igzip: Modify inflate to optionally calculate adler32 hash
Change-Id: I314617b89b59d53608e464c7d2cf299faa3528b5
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 05:42:12 -04:00
Roy Oursler
0cc3d93758 igzip: Modify test to test zlib compression
Change-Id: I52979c9e572ef9703995adf8d2163ba1797b8f53
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 05:41:47 -04:00
Roy Oursler
4259169107 igzip: Implement zlib compression format
Change-Id: I3d3cca425a494ac629cea230de74e3d32fcaea79
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 05:41:05 -04:00
Greg Tucker
c68e15dc53 Fix test helper function for mingw builds
Change-Id: Ic24b3ba89bf03bfbc829a78d2cb8f820885ada7e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-06-21 05:33:00 -04:00
Roy Oursler
4f2d148ae5 erasure_code: Limit efence test length
Change-Id: Ib3bb0fa2fbcbbb759af7ea54fef5ea24ee1ba7cd
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-06-21 05:32:19 -04:00
Hailiang Wang
52f644d3ff igzip:use fgetpos() to replace ftell() to get file's length.
Change-Id: Ia1f3c06e92c01da9d22b3d70b2b05fe4808c9f2c
Signed-off-by: Hailiang Wang <hailiangx.e.wang@intel.com>
2017-06-21 05:22:07 -04:00
Hailiang Wang
ff9c0c1842 igzip:change the default max file size for 32-bit builds.
Change-Id: Ifab108250cfd06211843b5eccb1f1f0482669426
Signed-off-by: Hailiang Wang <hailiangx.e.wang@intel.com>
2017-06-21 05:13:24 -04:00
Greg Tucker
7e1a337433 igzip: Fix warnings from nasm 2.13
A few legitimate warnings got masked by previous nasm bugs.  This cleans last as
of nasm 2.13rc20.

Change-Id: Iaa7e6148e0e506222cc207685263103d62bdd015
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-04-19 10:17:24 -07:00
Greg Tucker
f4d8d35084 Update release notes for v2.18 additions
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
6715e73d16 build: Bump revision to 2.18
Change-Id: I9b62a40d6a8c850476eb426d7c163f25d4af3a51
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
579b09ba52 raid: Update nmake obj list for avx512 versions
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
8f1155387c raid: Add avx512 version of pq_gen
Change-Id: Ic404e7f3c09c953fe3687355cc3f9728cfd16011
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
0a62b7c40d raid: Add avx512 version of xor
Change-Id: I8f8e79f3442ef76268f60a8e61d2c36aedb1ccc1
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
c28be0d306 igzip: Remove unnecessary casts causing warnings in 32bit build
Change-Id: I9fb85c097c9ade8fdc7907e9d6533d1997ccd406
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
181cc20404 igzip: Update nmake obj list
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 20:22:36 -07:00
Greg Tucker
8270237457 igzip: Add decode_huffman_code_block_stateless to base aliases
Change-Id: I60558f5c09df354a1e7608fe479182ca4e10efb6
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 18:23:55 -07:00
Greg Tucker
dbab2ddad7 igzip: Make crc32_gzip_base standalone c function
Change-Id: Iffb55919fb51e9e3d74c5c5cb06a3011bd19e99b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 18:22:20 -07:00
Greg Tucker
ec6e5de665 igzip: Move build_heap base functions to own file
Change-Id: I0161cd65c71df00fadad9dd69e207e9fb29a54ef
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 18:20:09 -07:00
Roy Oursler
f80a1ed62b igzip: Create base functions for build_heap and build_huff_tree
Change-Id: I19c2d7bbf1ac8270458b165a385c385a60f1cadc
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 18:18:14 -07:00
Greg Tucker
f316e96217 igzip: Add base function aliases and group src by arch
Change-Id: I4b6c8f62a09545d1ed4f48dc16b3fe8c8a5c72ea
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 18:16:36 -07:00
Greg Tucker
1268a57a32 raid: Add base function aliases and group src by arch
Change-Id: If7d987bcebb0ed1293d6836cd038746e7b0bbd85
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 18:09:32 -07:00
Greg Tucker
5d9cf8cadf ec: Fixes for 32-bit build
Change-Id: Iac362f0d7282716a8502afcec939b0d1877a943f
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 17:56:07 -07:00
Greg Tucker
a0bfd8d02b ec: Add base function aliases
Change-Id: I36f1a7948e0009ca5f4f67437f4aa704e737a05a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 17:40:16 -07:00
Greg Tucker
14f07a9134 crc: Add base function aliases
Change-Id: I45e14808418b203e1761ae3ee4e4e510ae1e07e9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 17:37:48 -07:00
Roy Oursler
871ad43fad igzip: Fix bug in isal_deflate where processed is not calculated correctly.
Change-Id: I61e15a18ebe3130e73010337d2d41f59e2227f08
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:20:32 -07:00
Roy Oursler
be09e56b36 igzip: Modify igzip_example to provide example of higher compression levels
Change-Id: Iccd0528ac088e1eec3921aa5fde99769d75e3334
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:19:54 -07:00
Roy Oursler
bad6569acc igzip: Document and add some defines for higher level compressions
Change-Id: I4d9fc9d7f697e721e247a64b2522ef2793c2d07f
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:18:55 -07:00
Roy Oursler
13692d61a3 igzip: Fix bug when 0 length buffer is passed
Avoid unsigned addition overflow when avail_in = 0 and next_in = -1 causing a
comparison to be correct when it should not be.

Change-Id: I3ff7123a89867317f383931cc95c8738ba2f8d56
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:17:04 -07:00
Greg Tucker
de7d639ab1 igzip: Change zero length array and typos found in windows build
Change-Id: Ia185b600af3c5f3b34bb9411daae6877f99b05d7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 17:15:57 -07:00
Greg Tucker
cd2146de95 igzip: Add missing file to dist list
Change-Id: I6dc920bbae9bd1800ba12e7174cdf6bc98f75489
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-30 17:15:14 -07:00
Roy Oursler
9992cc1920 igzip: Implement static header on level 1 compress
Change-Id: I0fe61eb6d3994a0977a4486a2a4cf21af38dc250
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:13:12 -07:00
Roy Oursler
e38ed4b54e igzip: Increase isal_mod_hist size to stop histogram overflow
Change-Id: I1c651c9625d0fb543cd89e53e2b78b391176ef68
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:11:51 -07:00
Roy Oursler
761b207376 igzip: delete generate_constant_block_header since it has no function
Change-Id: Id236f7a847f660b35165f9936850b3ff73609017
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 17:09:13 -07:00
Roy Oursler
91fef2d39c igzip: Fix generate_custom_huffcodes
Reimplement generate_custom_hufftables.c to avoid internal dependencies. Remove
the symbol subset feature since a default incomplete huffman code is dangerous
and that functionality is available when generating custom huffman codes.

Change-Id: I014d8e127b49583fe7d6ac9ce861cc3138ffec46
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:58:37 -07:00
Roy Oursler
5d15580467 igzip: Modify stateless_file_perf to test compression levels
Change-Id: If628e7f949da6862b99cebe6a0a59c6e6d5443e4
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:56:13 -07:00
Roy Oursler
a41d352a03 igzip: Modify assembly functions and structs to work on Windows
Change-Id: Icd106c3330dec72601e6f03340c07c6e1d1b5794
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:55:14 -07:00
Roy Oursler
f2c35a9fd4 igzip: Fix underflow bugs when null pointer is passed in igzip base functions
Change-Id: Ia24326dcde6f8be9ee019690b8206d2779ddebb2
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:54:35 -07:00
Roy Oursler
7c7272e89c igzip: Optimize two pass first pass
Change-Id: I45efa8304f929d91238c8173c0fac00ec64ab323
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:53:26 -07:00
Roy Oursler
4725626847 igzip: Setup to allow encode_df to decode lit lit pairs.
Change-Id: Ic93797a09c5a908fc1c6005b40439978d7484773
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:52:21 -07:00
Roy Oursler
8a05e7d780 igzip: vectorize encode_df_asm.asm
Change-Id: I11e005556e150a5bf8c6cc9410fa7d98196847f1
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:50:53 -07:00
Roy Oursler
fb13462fac igzip: Use SHLX and SHRX in encode_df_asm
Change-Id: Ic3165579587c905d8e347b35efa6cbedb5dbf5f3
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:50:53 -07:00
Roy Oursler
01dfbcc484 igzip: implement igzip two pass
Change-Id: I9564b2da251a02197b39cab5f141e7aff1ae8439
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:50:48 -07:00
Roy Oursler
43d1029b81 igzip: Modify sync_flush to not worry about writing eob
Change-Id: If3e7d5ff628574d715be348d96cdb82645985c25
Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>
2017-03-30 16:06:47 -07:00
Greg Tucker
4ec9df4f8a ec: Group src by arch
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-06 16:13:48 -07:00
Greg Tucker
8c975e9cbc crc: Group source by arch
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-06 15:46:02 -07:00
Greg Tucker
97bbddc723 build: Change to canonical system type in autoconf
Change-Id: I5eb76fcf5da46fe85ad4fc06511a47d822e7da2c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-03-06 09:40:07 -07:00
Greg Tucker
5ec8ea0e14 doc: Add build details and contributing
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-02-24 14:50:34 -07:00
Greg Tucker
81c8c823cd Fix configure for a strict shell
The 'test ==' can give issues under some shells; 'test =' is better.

Reported-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-01-20 16:04:47 -07:00
Greg Tucker
9be74b389f doc: Add doxyfile for API doc creation
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-01-11 19:14:23 -07:00
Greg Tucker
d549db38e5 doc: Fix igzip header description for isal_create_hufftables
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2017-01-11 10:31:44 -07:00
517 changed files with 101745 additions and 44764 deletions
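
Note on commit 81c8c823cd above: POSIX specifies only `=` for string equality in `test`; `==` is an extension that some shells accept and stricter ones reject. A minimal sketch of the portable form follows. The helper name `is_enabled` and its argument are hypothetical and are not taken from the project's configure script.

```shell
#!/bin/sh
# Portable string comparison: POSIX test(1) defines '=' but not '=='.
# 'is_enabled' is a hypothetical helper used only for illustration.
is_enabled() {
    if test "$1" = "yes"; then
        echo enabled
    else
        echo disabled
    fi
}

is_enabled yes
is_enabled no
```

Quoting `"$1"` also keeps the comparison well-formed when the argument is empty.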

46
.clang-format Normal file

@ -0,0 +1,46 @@
# Copyright (c) 2024, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
BasedOnStyle: LLVM
IndentWidth: 8
Language: Cpp
BreakBeforeBraces: Linux
AllowShortIfStatementsOnASingleLine: false
IndentCaseLabels: false
UseTab: Never
AlignConsecutiveMacros: true
AlignTrailingComments: true
AlwaysBreakAfterReturnType: All
SortIncludes: false
BreakBeforeInheritanceComma: true
AllowAllParametersOfDeclarationOnNextLine: false
BinPackParameters: true
BinPackArguments: true
ReflowComments: true
ColumnLimit: 100
Cpp11BracedListStyle: false
MaxEmptyLinesToKeep: 1
ContinuationIndentWidth: 8
SpaceAfterCStyleCast: true

6
.clang-format-ignore Normal file

@ -0,0 +1,6 @@
include/aarch64_multibinary.h
include/aarch64_label.h
**/aarch64/*.h
include/riscv64_multibinary.h
**/riscv64/*.h

143
.github/workflows/ci.yml vendored Normal file

@ -0,0 +1,143 @@
name: Continous integration
on:
  pull_request:
  push:
    branches:
      - master
    tags:
      - "*"
permissions:
  contents: read
jobs:
  check_format:
    env:
      CLANGFORMAT: clang-format-18
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4.2.2
        with:
          fetch-depth: 2
      - name: Install clang-format-18
        run: |
          wget https://apt.llvm.org/llvm.sh
          chmod +x llvm.sh
          sudo ./llvm.sh 18
          sudo apt install -y clang-format-18
      - name: Run format check
        run: bash tools/check_format.sh
  run_tests_unix:
    needs: check_format
    strategy:
      matrix:
        os:
          - ubuntu-latest
          - macos-13 # x86_64
          - macos-14 # arm64
        assembler:
          - nasm
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4.2.2
      - name: Install build dependencies (Linux)
        run: sudo apt install ${{ matrix.assembler }}
        if: runner.os == 'Linux'
      - name: Install build dependencies (Macos)
        run: brew install ${{ matrix.assembler }} automake autoconf coreutils libtool
        if: runner.os == 'macOS'
      - name: Build
        run: |
          ./autogen.sh
          ./configure
          bash -c 'make -j $(nproc)'
      - name: Run tests
        run: bash tools/test_checks.sh
      - name: Run extended tests
        run: bash tools/test_extended.sh
  run_tests_mingw_linux_64:
    needs: check_format
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4.2.2
      - name: Install build dependencies (Linux)
        run: sudo apt install nasm mingw-w64
      - name: Build
        shell: bash
        run: |
          make -j $(nproc) -f Makefile.unx programs/igzip tests arch=mingw host_cpu=x86_64
    # wine does not seem available, hence cannot run tests.
  run_tests_mingw_linux_32:
    needs: check_format
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4.2.2
      - name: Install build dependencies (Linux)
        run: sudo apt install nasm mingw-w64
      - name: Build
        shell: bash
        run: |
          make -j $(nproc) -f Makefile.unx programs/igzip tests arch=mingw host_cpu=base_aliases CC=i686-w64-mingw32-gcc
  run_tests_mingw_windows_64:
    needs: check_format
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4.2.2
      - name: Install nasm
        uses: ilammy/setup-nasm@v1.5.2
      - name: Build
        shell: bash
        run: |
          make -j $(nproc) -f Makefile.unx programs/igzip tests SIM= arch=mingw host_cpu=x86_64 AR=x86_64-w64-mingw32-gcc-ar
      - name: Run tests
        shell: bash
        run: |
          # autoconf is missing, hence simulates test_checks.sh
          make -j $(nproc) -f Makefile.unx check D=TEST_SEED=0 SIM= arch=mingw host_cpu=x86_64 AR=x86_64-w64-mingw32-gcc-ar
      - name: Run extended tests
        shell: bash
        run: |
          # simulates test_extended.sh
          make -j $(nproc) -f Makefile.unx perf D=TEST_SEED=0 SIM= arch=mingw host_cpu=x86_64 AR=x86_64-w64-mingw32-gcc-ar
          make -j $(nproc) -f Makefile.unx test D=TEST_SEED=0 SIM= arch=mingw host_cpu=x86_64 AR=x86_64-w64-mingw32-gcc-ar
    # seems like i686-w64-mingw32-gcc is not available on windows runner.
  run_tests_windows:
    needs: check_format
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4.2.2
      - name: Set MSVC developer prompt
        uses: ilammy/msvc-dev-cmd@v1.13.0
      - name: Install nasm
        uses: ilammy/setup-nasm@v1.5.2
      - name: Build
        run: |
          nmake -f Makefile.nmake || exit /b 1
          nmake checks -f Makefile.nmake || exit /b 1
          nmake perfs -f Makefile.nmake || exit /b 1
      - name: Run perf apps
        run: nmake perf -f Makefile.nmake || exit /b 1
      - name: Run checks
        run: nmake check -f Makefile.nmake || exit /b 1
  run_tests_linux-riscv64-v:
    needs: check_format
    runs-on: run_tests_linux-riscv64-v
    steps:
      - uses: actions/checkout@v4.2.2
      - name: Build
        run: |
          ./autogen.sh
          ./configure
          bash -c 'make -j $(nproc)'
      - name: Run tests
        run: bash tools/test_checks.sh
      - name: Run extended tests
        run: bash tools/test_extended.sh

40
.gitignore vendored Normal file

@ -0,0 +1,40 @@
# Objects
*~
*.o
*.lo
*.so
*.dll
*.exp
*.lib
bin
# Autobuild
Makefile
Makefile.in
aclocal.m4
autom4te.cache
build-aux
config.*
configure
.deps
.dirstamp
.libs
libtool
# Generated files
isa-l.h
/libisal.la
libisal.pc
crc/*_perf
crc/*_test
erasure_code/*_perf
erasure_code/*_test
erasure_code/gf_vect_dot_prod_1tbl
igzip/*_perf
igzip/*_test
igzip/shim/build
mem/*_perf
mem/*_test
programs/igzip
raid/*_perf
raid/*_test


@ -1,11 +1,86 @@
sudo: required
dist: trusty
before_script:
- sudo apt-get -q update
- sudo apt-get install -y yasm nasm
- ./autogen.sh
script: ./configure && make && make check
language: c
compiler:
- clang
- gcc
sudo: required
matrix:
include:
### OS X
- os: osx
osx_image: xcode12.5
addons:
homebrew:
packages:
- nasm
env: C_COMPILER=clang
### linux gcc and format check
- os: linux
dist: bionic
addons:
apt:
packages:
- nasm
install:
# Install newer indent to check formatting
- sudo apt-get install texinfo
- wget http://archive.ubuntu.com/ubuntu/pool/main/i/indent/indent_2.2.12.orig.tar.xz -O /tmp/indent.tar.xz
- tar -xJf /tmp/indent.tar.xz -C /tmp/
- pushd /tmp/indent-2.2.12 && ./configure --prefix=/usr && make && sudo make install && popd
env: C_COMPILER=gcc
### linux clang
- os: linux
dist: bionic
addons:
apt:
packages:
- nasm
env: C_COMPILER=clang
### linux older gcc
- os: linux
dist: xenial
addons:
apt:
sources:
- ubuntu-toolchain-r-test
packages:
- g++-4.7
- nasm
env: C_COMPILER=gcc-4.7
### arm64: gcc-5.4
- os: linux
dist: bionic
arch: arm64
env: C_COMPILER=gcc
### arm64: gcc-5.4 extended tests
- os: linux
dist: bionic
arch: arm64
env: TEST_TYPE=ext
### linux extended tests
- os: linux
dist: xenial
addons:
apt:
sources:
- ubuntu-toolchain-r-test
packages:
- binutils-mingw-w64-x86-64
- gcc-mingw-w64-x86-64
- wine
- nasm
env: TEST_TYPE=ext
before_install:
- if [ -n "${C_COMPILER}" ]; then export CC="${C_COMPILER}"; fi
- if [ -n "${AS_ASSEMBL}" ]; then export AS="${AS_ASSEMBL}"; fi
before_script:
- if [ $TRAVIS_OS_NAME = linux ]; then sudo apt-get -q update; fi
script:
- if [ -n "${CC}" ]; then $CC --version; fi
- if [ -n "${AS}" ]; then $AS --version || echo No version; fi
- ./tools/test_autorun.sh "${TEST_TYPE}"

230
CMakeLists.txt Normal file

@ -0,0 +1,230 @@
# cmake-format: off
# Copyright (c) 2025, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# cmake-format: on
cmake_minimum_required(VERSION 3.12)
cmake_policy(VERSION 3.12)
project(ISA-L
VERSION 2.31.0
DESCRIPTION "Intel's ISA-L (Intelligent Storage Acceleration Library)"
LANGUAGES C ASM
)
# Enable NASM for x86_64 builds
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" OR CMAKE_SYSTEM_PROCESSOR STREQUAL "AMD64")
enable_language(ASM_NASM)
endif()
# Enable testing
option(BUILD_TESTS "Build the testing tree" ON)
if(BUILD_TESTS)
enable_testing()
include(CTest)
endif()
# Enable building ISAL shim library
option(BUILD_ISAL_SHIM "Build the ISAL shim library" OFF)
# Set default build type
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()
# Detect processor architecture
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" OR CMAKE_SYSTEM_PROCESSOR STREQUAL "AMD64")
set(CPU_X86_64 ON)
set(ARCH_DEF "x86_64")
elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64" OR CMAKE_SYSTEM_PROCESSOR STREQUAL "arm64")
set(CPU_AARCH64 ON)
set(ARCH_DEF "aarch64")
elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "ppc64le")
set(CPU_PPC64LE ON)
set(ARCH_DEF "ppc64le")
elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "riscv64")
set(CPU_RISCV64 ON)
set(ARCH_DEF "riscv64")
else()
set(CPU_UNDEFINED ON)
endif()
# Compiler and assembler setup
if(CPU_X86_64)
# Configure NASM flags
set(CMAKE_ASM_NASM_FLAGS "-f elf64 -D LINUX")
set(CMAKE_ASM_NASM_INCLUDES "-I ${CMAKE_SOURCE_DIR}/include/")
set(USE_NASM ON)
elseif(CPU_AARCH64 OR CPU_RISCV64)
# Use C compiler for assembly on ARM and RISC-V
set(ASM_FILTER "${CMAKE_C_COMPILER} -D__ASSEMBLY__")
endif()
# Set include directories
set(ISAL_INCLUDE_DIRS
${CMAKE_SOURCE_DIR}/include
)
# Initialize EXTERN_HEADERS list
set(EXTERN_HEADERS)
# Compiler flags
if(ARCH_DEF)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D${ARCH_DEF}")
endif()
if(CPU_AARCH64 OR CPU_RISCV64)
set(CMAKE_ASM_FLAGS "${CMAKE_C_FLAGS}")
endif()
# Library version (semantic versioning)
set(LIBISAL_VERSION_MAJOR 2)
set(LIBISAL_VERSION_MINOR 31)
set(LIBISAL_VERSION_PATCH 0)
# Include CMake modules for each library component
include(cmake/erasure_code.cmake)
include(cmake/raid.cmake)
include(cmake/crc.cmake)
include(cmake/igzip.cmake)
include(cmake/mem.cmake)
# Conditionally build ISAL shim library
if(BUILD_ISAL_SHIM)
add_subdirectory(igzip/shim)
endif()
# Add test.h to extern headers (used by all modules)
list(APPEND EXTERN_HEADERS include/test.h)
# Create the main ISA-L library
# Build type option
option(BUILD_SHARED_LIBS "Build shared libraries" ON)
# Create the main ISA-L library
add_library(isal
${ERASURE_CODE_SOURCES}
${RAID_SOURCES}
${CRC_SOURCES}
${IGZIP_SOURCES}
${MEM_SOURCES}
)
# Set library properties
set_target_properties(isal PROPERTIES
VERSION ${LIBISAL_VERSION_MAJOR}.${LIBISAL_VERSION_MINOR}.${LIBISAL_VERSION_PATCH}
SOVERSION ${LIBISAL_VERSION_MAJOR}
PUBLIC_HEADER "${EXTERN_HEADERS}"
)
# Configure include directories for NASM assembly files
if(CPU_X86_64 AND USE_NASM)
# Filter assembly files by module and set appropriate include directories
foreach(source IN LISTS ERASURE_CODE_SOURCES RAID_SOURCES CRC_SOURCES IGZIP_SOURCES MEM_SOURCES)
if(source MATCHES "\\.asm$")
get_filename_component(source_dir ${source} DIRECTORY)
set_source_files_properties(${source} PROPERTIES
INCLUDE_DIRECTORIES "${CMAKE_SOURCE_DIR}/include;${CMAKE_SOURCE_DIR}/${source_dir}")
endif()
endforeach()
endif()
# Include directories
target_include_directories(isal
PUBLIC
$<BUILD_INTERFACE:${CMAKE_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:include>
)
# Generate isa-l.h header
set(ISAL_HEADER "${CMAKE_BINARY_DIR}/isa-l.h")
configure_file(${CMAKE_SOURCE_DIR}/cmake/isa-l.h.in ${ISAL_HEADER} @ONLY)
# Install targets
include(GNUInstallDirs)
# Install library
install(TARGETS isal
EXPORT ISALTargets
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/isa-l
)
# Install generated header
install(FILES ${ISAL_HEADER}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)
# Install headers
install(DIRECTORY include/
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/isa-l
FILES_MATCHING PATTERN "*.h"
)
# Export targets
install(EXPORT ISALTargets
FILE ISALTargets.cmake
NAMESPACE ISAL::
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ISAL
)
# Generate and install package config files
include(CMakePackageConfigHelpers)
configure_package_config_file(
"${CMAKE_SOURCE_DIR}/cmake/ISALConfig.cmake.in"
"${CMAKE_BINARY_DIR}/ISALConfig.cmake"
INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ISAL
)
write_basic_package_version_file(
"${CMAKE_BINARY_DIR}/ISALConfigVersion.cmake"
VERSION ${PROJECT_VERSION}
COMPATIBILITY SameMajorVersion
)
install(FILES
"${CMAKE_BINARY_DIR}/ISALConfig.cmake"
"${CMAKE_BINARY_DIR}/ISALConfigVersion.cmake"
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ISAL
)
# Optional: Create pkg-config file
set(prefix ${CMAKE_INSTALL_PREFIX})
set(exec_prefix \${prefix})
set(libdir \${prefix}/${CMAKE_INSTALL_LIBDIR})
set(includedir \${prefix}/${CMAKE_INSTALL_INCLUDEDIR})
set(VERSION ${PROJECT_VERSION})
configure_file(
"${CMAKE_SOURCE_DIR}/libisal.pc.in"
"${CMAKE_BINARY_DIR}/libisal.pc"
@ONLY
)
install(FILES "${CMAKE_BINARY_DIR}/libisal.pc"
DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig
)

39
CONTRIBUTING.md Normal file

@ -0,0 +1,39 @@
# Contributing to ISA-L
Everyone is welcome to contribute. Patches may be submitted using GitHub pull
requests (PRs). All commits must be signed off by the developer (--signoff)
which indicates that you agree to the Developer Certificate of Origin. Patch
discussion will happen directly on the GitHub PR. Design pre-work and general
discussion occurs on the [mailing list]. Anyone can provide feedback in either
location and all discussion is welcome. Decisions on whether to merge patches
will be handled by the maintainer.
## License
ISA-L is licensed using a BSD 3-clause [license]. All code submitted to
the project is required to carry that license.
## Certificate of Origin
In order to get a clear contribution chain of trust we use the
[signed-off-by language] used by the Linux kernel project.
## Mailing List
Contributors and users are welcome to submit new requests for our roadmap, submit
patches, file issues, and ask questions on our [mailing list].
## Coding Style
The coding style for ISA-L C code is roughly based on LLVM style with
some customizations. Use the included format script to format C code.
    ./tools/format.sh
And use check format script before submitting.
    ./tools/check_format.sh
[mailing list]:https://lists.01.org/hyperkitty/list/isal@lists.01.org/
[license]:LICENSE
[signed-off-by language]:https://01.org/community/signed-process
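
Note on the sign-off requirement in CONTRIBUTING.md: `git commit --signoff` (short form `-s`) appends a `Signed-off-by:` trailer to the commit message. The sketch below shows one way a contributor might check a message for that trailer before pushing. The function `has_signoff`, the sample message, and the example address are all hypothetical and are not part of the project's tooling.

```shell
#!/bin/sh
# Hypothetical pre-push check: does a commit message carry a
# "Signed-off-by: Name <address>" trailer on a line of its own?
has_signoff() {
    printf '%s\n' "$1" | grep -q '^Signed-off-by: .* <.*@.*>$'
}

# Sample message, as `git commit --signoff` would produce it.
msg='igzip: example change

Signed-off-by: Jane Doe <jane.doe@example.com>'

if has_signoff "$msg"; then
    echo "signed off"
else
    echo "missing sign-off"
fi
```

In a real checkout the message could come from `git log -1 --format=%B` instead of a literal string.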

35
Doxyfile Normal file

@ -0,0 +1,35 @@
PROJECT_NAME = "Intel Intelligent Storage Acceleration Library"
PROJECT_BRIEF = "ISA-L API reference doc"
OUTPUT_DIRECTORY = generated_doc
FULL_PATH_NAMES = NO
TAB_SIZE = 8
ALIASES = "requires=\xrefitem requires \"Requires\" \"Instruction Set Requirements for arch-specific functions (non-multibinary)\""
OPTIMIZE_OUTPUT_FOR_C = YES
HIDE_UNDOC_MEMBERS = YES
USE_MDFILE_AS_MAINPAGE = README.md
INPUT = isa-l.h \
include \
README.md \
CONTRIBUTING.md \
SECURITY.md \
Release_notes.txt \
doc/functions.md \
doc/test.md \
doc/build.md
EXCLUDE = include/test.h include/unaligned.h
EXCLUDE_PATTERNS = */include/*_multibinary.h
EXAMPLE_PATH = . crc raid erasure_code igzip
PAPER_TYPE = letter
LATEX_SOURCE_CODE = YES
GENERATE_TREEVIEW = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = YES
PREDEFINED = "DECLARE_ALIGNED(n, a)=ALIGN n" \
__declspec(x)='x' \
align(x)='ALIGN \
x'
EXPAND_AS_DEFINED = DECLARE_ALIGNED
EXTENSION_MAPPING = "txt=md"


@ -1,4 +1,4 @@
Copyright(c) 2011-2016 Intel Corporation All rights reserved.
Copyright(c) 2011-2024 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
@ -24,3 +24,5 @@
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SPDX-License-Identifier: BSD-3-Clause

34
MAINTAINERS.md Normal file

@ -0,0 +1,34 @@
ISA-L Maintainers
=================
The intention of this file is to provide a set of names that we can rely on
for assisting with pull requests and questions.
A pull request targeting architecture optimizations requires the approval
of the maintainer of that architecture.\
A pull request targeting the base implementation, general API or broader project
changes requires the approval of the general maintainer.
Descriptions of section entries:
M: Maintainer's Full Name <address@domain>
General Project Administration
------------------------------
M: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Base Implementations
--------------------
M: Pablo de Lara <pablo.de.lara.guarch@intel.com>
x86 Architecture
----------------
M: Pablo de Lara <pablo.de.lara.guarch@intel.com>
ARM Architecture
----------------
M: Liu Qinfei <lucas.liuqinfei@huawei.com>
RISC-V Architecture
-------------------
M: Sun Yuechi <sunyuechi@iscas.ac.cn>


@ -1,11 +1,18 @@
EXTRA_DIST = autogen.sh Makefile.unx make.inc Makefile.nmake isa-l.def LICENSE README.md
EXTRA_DIST = autogen.sh Makefile.unx make.inc Makefile.nmake isa-l.def LICENSE README.md Doxyfile CONTRIBUTING.md
CLEANFILES =
LDADD =
AM_MAKEFLAGS = --no-print-directory
noinst_HEADERS =
pkginclude_HEADERS = include/test.h
noinst_LTLIBRARIES =
bin_PROGRAMS =
INCLUDE = -I $(srcdir)/include/
D =
pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = libisal.pc
EXTRA_DIST += libisal.pc.in
CLEANFILES += libisal.pc
lsrc=
src_include=
@ -18,9 +25,15 @@ unit_tests_extra=
perf_tests_extra=
examples=
other_tests=
lsrc32=
unit_tests32=
perf_tests32=
other_tests_x86_64=
other_tests_ppc64le=
other_tests_riscv64=
lsrc_x86_64=
lsrc_aarch64=
lsrc_ppc64le=
lsrc_riscv64=
lsrc_base_aliases=
progs=
# Include units
@ -28,15 +41,48 @@ include erasure_code/Makefile.am
include raid/Makefile.am
include crc/Makefile.am
include igzip/Makefile.am
include tests/fuzz/Makefile.am
include examples/ec/Makefile.am
include programs/Makefile.am
include mem/Makefile.am
# LIB version info not necessarily the same as package version
LIBISAL_CURRENT=2
LIBISAL_REVISION=17
LIBISAL_REVISION=31
LIBISAL_AGE=0
lib_LTLIBRARIES = libisal.la
pkginclude_HEADERS += $(sort ${extern_hdrs})
libisal_la_SOURCES = ${lsrc}
if CPU_X86_64
ARCH=-Dx86_64
libisal_la_SOURCES += ${lsrc_x86_64}
other_tests += ${other_tests_x86_64}
endif
if CPU_AARCH64
ARCH=-Daarch64
libisal_la_SOURCES += ${lsrc_aarch64}
other_tests += ${other_tests_aarch64}
endif
if CPU_PPC64LE
ARCH=-Dppc64le
libisal_la_SOURCES += ${lsrc_ppc64le}
other_tests += ${other_tests_ppc64le}
endif
if CPU_RISCV64
ARCH=-Driscv64
libisal_la_SOURCES += ${lsrc_riscv64}
other_tests += ${other_tests_riscv64}
endif
if CPU_UNDEFINED
libisal_la_SOURCES += ${lsrc_base_aliases}
endif
nobase_include_HEADERS = isa-l.h
libisal_la_LDFLAGS = $(AM_LDFLAGS) \
-version-info $(LIBISAL_CURRENT):$(LIBISAL_REVISION):$(LIBISAL_AGE)
@ -57,6 +103,7 @@ EXTRA_PROGRAMS += ${other_tests}
EXTRA_PROGRAMS += ${examples}
CLEANFILES += ${EXTRA_PROGRAMS}
programs:${progs}
perfs: ${perf_tests}
tests: ${unit_tests}
checks: ${check_tests}
@ -70,19 +117,30 @@ test: $(addsuffix .run,$(unit_tests))
$<
@echo Completed run: $<
# Support for yasm/nasm
if USE_YASM
as_filter = ${srcdir}/tools/yasm-filter.sh
endif
# Support for nasm/gas
if USE_NASM
as_filter = ${srcdir}/tools/nasm-filter.sh
endif
if CPU_AARCH64
as_filter = $(CC) -D__ASSEMBLY__
endif
if CPU_RISCV64
as_filter = $(CC) -D__ASSEMBLY__
endif
CCAS = $(as_filter)
EXTRA_DIST += tools/yasm-filter.sh tools/nasm-filter.sh
EXTRA_DIST += tools/nasm-filter.sh
EXTRA_DIST += tools/nasm-cet-filter.sh
AM_CFLAGS = ${my_CFLAGS} ${INCLUDE} $(src_include) ${D}
AM_CCASFLAGS = ${yasm_args} ${INCLUDE} ${src_include} ${DEFS} ${D}
AM_CFLAGS = ${my_CFLAGS} ${INCLUDE} $(src_include) ${ARCH} ${D}
if CPU_AARCH64
AM_CCASFLAGS = ${AM_CFLAGS}
else
AM_CCASFLAGS = ${asm_args} ${INCLUDE} ${src_include} ${DEFS} ${D}
endif
if CPU_RISCV64
AM_CCASFLAGS = ${AM_CFLAGS}
endif
.asm.s:
@echo " MKTMP " $@;
@ -94,6 +152,11 @@ CLEANFILES += isa-l.h
isa-l.h:
@echo 'Building $@'
@echo '' >> $@
@echo '/**' >> $@
@echo ' * @file isa-l.h' >> $@
@echo ' * @brief Include for ISA-L library' >> $@
@echo ' */' >> $@
@echo '' >> $@
@echo '#ifndef _ISAL_H_' >> $@
@echo '#define _ISAL_H_' >> $@
@echo '' >> $@
@ -106,9 +169,7 @@ isa-l.h:
@for unit in $(sort $(extern_hdrs)); do echo "#include <isa-l/$$unit>" | sed -e 's;include/;;' >> $@; done
@echo '#endif //_ISAL_H_' >> $@
license = bsd
licc = $(srcdir)/doc/license_$(license)_c.txt
lica = $(srcdir)/doc/license_$(license)_asm.txt
licm = $(srcdir)/doc/license_$(license)_make.txt
doc: isa-l.h
(cat Doxyfile; echo 'PROJECT_NUMBER=${VERSION}') | doxygen -
$(MAKE) -C generated_doc/latex &> generated_doc/latex_build_api.log
cp generated_doc/latex/refman.pdf isa-l_api_${VERSION}.pdf


@ -1,8 +1,8 @@
########################################################################
# Copyright(c) 2011-2016 Intel Corporation All rights reserved.
# Copyright(c) 2011-2024 Intel Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
@ -25,121 +25,207 @@
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# SPDX-License-Identifier: BSD-3-Clause
########################################################################
objs = \
# This file can be auto-regenerated with $make -f Makefile.unx Makefile.nmake
objs = \
bin\ec_base.obj \
bin\raid_base.obj \
bin\crc_base.obj \
bin\crc64_base.obj \
bin\igzip.obj \
bin\hufftables_c.obj \
bin\igzip_base.obj \
bin\igzip_icf_base.obj \
bin\adler32_base.obj \
bin\flatten_ll.obj \
bin\encode_df.obj \
bin\igzip_icf_body.obj \
bin\huff_codes.obj \
bin\igzip_inflate.obj \
bin\mem_zero_detect_base.obj \
bin\ec_highlevel_func.obj \
bin\ec_multibinary.obj \
bin\gf_2vect_dot_prod_avx.obj \
bin\gf_2vect_dot_prod_avx2.obj \
bin\gf_2vect_dot_prod_avx512.obj \
bin\gf_2vect_dot_prod_sse.obj \
bin\gf_2vect_mad_avx.obj \
bin\gf_2vect_mad_avx2.obj \
bin\gf_2vect_mad_avx512.obj \
bin\gf_2vect_mad_sse.obj \
bin\gf_3vect_dot_prod_avx.obj \
bin\gf_3vect_dot_prod_avx2.obj \
bin\gf_3vect_dot_prod_avx512.obj \
bin\gf_3vect_dot_prod_sse.obj \
bin\gf_3vect_mad_avx.obj \
bin\gf_3vect_mad_avx2.obj \
bin\gf_3vect_mad_avx512.obj \
bin\gf_3vect_mad_sse.obj \
bin\gf_4vect_dot_prod_avx.obj \
bin\gf_4vect_dot_prod_avx2.obj \
bin\gf_4vect_dot_prod_avx512.obj \
bin\gf_4vect_dot_prod_sse.obj \
bin\gf_4vect_mad_avx.obj \
bin\gf_4vect_mad_avx2.obj \
bin\gf_4vect_mad_avx512.obj \
bin\gf_4vect_mad_sse.obj \
bin\gf_5vect_dot_prod_avx.obj \
bin\gf_5vect_dot_prod_avx2.obj \
bin\gf_5vect_dot_prod_sse.obj \
bin\gf_5vect_mad_avx.obj \
bin\gf_5vect_mad_avx2.obj \
bin\gf_5vect_mad_sse.obj \
bin\gf_6vect_dot_prod_avx.obj \
bin\gf_6vect_dot_prod_avx2.obj \
bin\gf_6vect_dot_prod_sse.obj \
bin\gf_6vect_mad_avx.obj \
bin\gf_6vect_mad_avx2.obj \
bin\gf_6vect_mad_sse.obj \
bin\gf_vect_mul_sse.obj \
bin\gf_vect_mul_avx.obj \
bin\gf_vect_dot_prod_sse.obj \
bin\gf_vect_dot_prod_avx.obj \
bin\gf_vect_dot_prod_avx2.obj \
bin\gf_vect_dot_prod_avx512.obj \
bin\gf_vect_dot_prod_sse.obj \
bin\gf_vect_mad_avx.obj \
bin\gf_vect_mad_avx2.obj \
bin\gf_vect_mad_avx512.obj \
bin\gf_2vect_dot_prod_sse.obj \
bin\gf_3vect_dot_prod_sse.obj \
bin\gf_4vect_dot_prod_sse.obj \
bin\gf_5vect_dot_prod_sse.obj \
bin\gf_6vect_dot_prod_sse.obj \
bin\gf_2vect_dot_prod_avx.obj \
bin\gf_3vect_dot_prod_avx.obj \
bin\gf_4vect_dot_prod_avx.obj \
bin\gf_5vect_dot_prod_avx.obj \
bin\gf_6vect_dot_prod_avx.obj \
bin\gf_2vect_dot_prod_avx2.obj \
bin\gf_3vect_dot_prod_avx2.obj \
bin\gf_4vect_dot_prod_avx2.obj \
bin\gf_5vect_dot_prod_avx2.obj \
bin\gf_6vect_dot_prod_avx2.obj \
bin\gf_vect_mad_sse.obj \
bin\gf_vect_mul_avx.obj \
bin\gf_vect_mul_sse.obj \
bin\gf_2vect_mad_sse.obj \
bin\gf_3vect_mad_sse.obj \
bin\gf_4vect_mad_sse.obj \
bin\gf_5vect_mad_sse.obj \
bin\gf_6vect_mad_sse.obj \
bin\gf_vect_mad_avx.obj \
bin\gf_2vect_mad_avx.obj \
bin\gf_3vect_mad_avx.obj \
bin\gf_4vect_mad_avx.obj \
bin\gf_5vect_mad_avx.obj \
bin\gf_6vect_mad_avx.obj \
bin\gf_vect_mad_avx2.obj \
bin\gf_2vect_mad_avx2.obj \
bin\gf_3vect_mad_avx2.obj \
bin\gf_4vect_mad_avx2.obj \
bin\gf_5vect_mad_avx2.obj \
bin\gf_6vect_mad_avx2.obj \
bin\ec_multibinary.obj \
bin\gf_vect_mad_avx2_gfni.obj \
bin\gf_2vect_mad_avx2_gfni.obj \
bin\gf_3vect_mad_avx2_gfni.obj \
bin\gf_4vect_mad_avx2_gfni.obj \
bin\gf_5vect_mad_avx2_gfni.obj \
bin\gf_vect_dot_prod_avx512.obj \
bin\gf_2vect_dot_prod_avx512.obj \
bin\gf_3vect_dot_prod_avx512.obj \
bin\gf_4vect_dot_prod_avx512.obj \
bin\gf_5vect_dot_prod_avx512.obj \
bin\gf_6vect_dot_prod_avx512.obj \
bin\gf_vect_dot_prod_avx512_gfni.obj \
bin\gf_vect_dot_prod_avx2_gfni.obj \
bin\gf_2vect_dot_prod_avx2_gfni.obj \
bin\gf_3vect_dot_prod_avx2_gfni.obj \
bin\gf_2vect_dot_prod_avx512_gfni.obj \
bin\gf_3vect_dot_prod_avx512_gfni.obj \
bin\gf_4vect_dot_prod_avx512_gfni.obj \
bin\gf_5vect_dot_prod_avx512_gfni.obj \
bin\gf_6vect_dot_prod_avx512_gfni.obj \
bin\gf_vect_mad_avx512.obj \
bin\gf_2vect_mad_avx512.obj \
bin\gf_3vect_mad_avx512.obj \
bin\gf_4vect_mad_avx512.obj \
bin\gf_5vect_mad_avx512.obj \
bin\gf_6vect_mad_avx512.obj \
bin\gf_vect_mad_avx512_gfni.obj \
bin\gf_2vect_mad_avx512_gfni.obj \
bin\gf_3vect_mad_avx512_gfni.obj \
bin\gf_4vect_mad_avx512_gfni.obj \
bin\gf_5vect_mad_avx512_gfni.obj \
bin\gf_6vect_mad_avx512_gfni.obj \
bin\xor_gen_sse.obj \
bin\pq_gen_sse.obj \
bin\xor_check_sse.obj \
bin\pq_check_sse.obj \
bin\pq_gen_avx.obj \
bin\pq_gen_avx2.obj \
bin\pq_gen_sse.obj \
bin\raid_base.obj \
bin\raid_multibinary.obj \
bin\xor_check_sse.obj \
bin\xor_gen_avx.obj \
bin\xor_gen_sse.obj \
bin\pq_gen_avx2.obj \
bin\pq_gen_avx2_gfni.obj \
bin\xor_gen_avx512.obj \
bin\pq_gen_avx512.obj \
bin\pq_gen_avx512_gfni.obj \
bin\raid_multibinary.obj \
bin\crc16_t10dif_01.obj \
bin\crc16_t10dif_by4.obj \
bin\crc32_gzip.obj \
bin\crc16_t10dif_02.obj \
bin\crc16_t10dif_by16_10.obj \
bin\crc16_t10dif_copy_by4.obj \
bin\crc16_t10dif_copy_by4_02.obj \
bin\crc32_ieee_01.obj \
bin\crc32_ieee_02.obj \
bin\crc32_ieee_by4.obj \
bin\crc32_iscsi_00.obj \
bin\crc32_ieee_by16_10.obj \
bin\crc32_iscsi_01.obj \
bin\crc64_base.obj \
bin\crc64_ecma_norm_by8.obj \
bin\crc64_ecma_refl_by8.obj \
bin\crc64_iso_norm_by8.obj \
bin\crc64_iso_refl_by8.obj \
bin\crc64_jones_norm_by8.obj \
bin\crc64_jones_refl_by8.obj \
bin\crc64_multibinary.obj \
bin\crc_base.obj \
bin\crc_data.obj \
bin\crc32_iscsi_by8_02.obj \
bin\crc32_iscsi_by16_10.obj \
bin\crc_multibinary.obj \
bin\huff_codes.obj \
bin\hufftables_c.obj \
bin\igzip.obj \
bin\igzip_base.obj \
bin\igzip_body_01.obj \
bin\igzip_body_02.obj \
bin\igzip_body_04.obj \
bin\igzip_decode_block_stateless_01.obj \
bin\igzip_decode_block_stateless_04.obj \
bin\crc64_multibinary.obj \
bin\crc64_ecma_refl_by8.obj \
bin\crc64_ecma_refl_by16_10.obj \
bin\crc64_ecma_norm_by8.obj \
bin\crc64_ecma_norm_by16_10.obj \
bin\crc64_iso_refl_by8.obj \
bin\crc64_iso_refl_by16_10.obj \
bin\crc64_iso_norm_by8.obj \
bin\crc64_iso_norm_by16_10.obj \
bin\crc64_jones_refl_by8.obj \
bin\crc64_jones_refl_by16_10.obj \
bin\crc64_jones_norm_by8.obj \
bin\crc64_jones_norm_by16_10.obj \
bin\crc64_rocksoft_refl_by8.obj \
bin\crc64_rocksoft_refl_by16_10.obj \
bin\crc64_rocksoft_norm_by8.obj \
bin\crc64_rocksoft_norm_by16_10.obj \
bin\crc32_gzip_refl_by8.obj \
bin\crc32_gzip_refl_by8_02.obj \
bin\crc32_gzip_refl_by16_10.obj \
bin\igzip_body.obj \
bin\igzip_finish.obj \
bin\igzip_inflate.obj \
bin\igzip_inflate_multibinary.obj \
bin\igzip_icf_body_h1_gr_bt.obj \
bin\igzip_icf_finish.obj \
bin\rfc1951_lookup.obj \
bin\adler32_sse.obj \
bin\adler32_avx2_4.obj \
bin\igzip_multibinary.obj \
bin\igzip_update_histogram_01.obj \
bin\igzip_update_histogram_04.obj \
bin\rfc1951_lookup.obj \
bin\detect_repeated_char.obj
bin\igzip_decode_block_stateless_01.obj \
bin\igzip_decode_block_stateless_04.obj \
bin\igzip_inflate_multibinary.obj \
bin\encode_df_04.obj \
bin\encode_df_06.obj \
bin\proc_heap.obj \
bin\igzip_deflate_hash.obj \
bin\igzip_gen_icf_map_lh1_06.obj \
bin\igzip_gen_icf_map_lh1_04.obj \
bin\igzip_set_long_icf_fg_04.obj \
bin\igzip_set_long_icf_fg_06.obj \
bin\mem_zero_detect_avx512.obj \
bin\mem_zero_detect_avx2.obj \
bin\mem_zero_detect_avx.obj \
bin\mem_zero_detect_sse.obj \
bin\mem_multibinary.obj
INCLUDES = -I./ -Ierasure_code/ -Iraid/ -Icrc/ -Iigzip/ -Iinclude/
LINKFLAGS = /nologo
CFLAGS = -O2 -D NDEBUG /nologo -D_USE_MATH_DEFINES -Qstd=c99 $(INCLUDES) $(D)
AFLAGS = -f win64 $(INCLUDES) $(D)
CC = icl
AS = yasm
INCLUDES = -I./ -Ierasure_code/ -Iraid/ -Icrc/ -Iigzip/ -Iprograms/ -Imem/ -Iinclude/ -Itests/fuzz/ -Iexamples/ec/
CFLAGS_REL = -O2 -DNDEBUG /Z7 /Gy
CFLAGS_DBG = -Od -DDEBUG /Z7
LINKFLAGS = -nologo -incremental:no -debug
CFLAGS = $(CFLAGS_REL) -nologo -D_USE_MATH_DEFINES $(INCLUDES) $(D)
AFLAGS = -f win64 $(INCLUDES) $(D)
CC = cl
# or CC = icl -Qstd=c99
AS = nasm
lib: bin static dll
static: bin isa-l_static.lib
dll: bin isa-l.dll
static: bin isa-l_static.lib isa-l.h
dll: bin isa-l.dll isa-l.h
bin: ; -mkdir $@
isa-l_static.lib: $(objs)
lib -out:$@ $?
lib -out:$@ @<<
$?
<<
!IF [rc] == 0
isa-l.dll: isa-l.res
!ELSE
!MESSAGE Optionally install rc to set file version info
!ENDIF
isa-l.dll: $(objs)
link -out:$@ -dll -def:isa-l.def $?
link -out:$@ -dll -def:isa-l.def $(LINKFLAGS) @<<
$?
<<
isa-l.res: isa-l.h
rc /fo $@ isa-l.rc
{erasure_code}.c.obj:
$(CC) $(CFLAGS) /c -Fo$@ $?
@ -161,9 +247,31 @@ isa-l.dll: $(objs)
{igzip}.asm.obj:
$(AS) $(AFLAGS) -o $@ $?
{programs}.c.obj:
$(CC) $(CFLAGS) /c -Fo$@ $?
{programs}.asm.obj:
$(AS) $(AFLAGS) -o $@ $?
{mem}.c.obj:
$(CC) $(CFLAGS) /c -Fo$@ $?
{mem}.asm.obj:
$(AS) $(AFLAGS) -o $@ $?
# Examples
ex = xor_example.exe crc_simple_test.exe igzip_example.exe igzip_sync_flush_example.exe
ex = \
xor_example.exe \
crc_simple_test.exe \
crc64_example.exe \
igzip_example.exe \
igzip_inflate_example.exe \
igzip_sync_flush_example.exe \
ec_simple_example.exe \
ec_piggyback_example.exe
{examples\ec}.c.obj:
$(CC) $(CFLAGS) /c -Fo$@ $?
ex: lib $(ex)
$(ex): $(@B).obj
@ -182,10 +290,13 @@ checks = \
xor_check_test.exe \
pq_check_test.exe \
crc16_t10dif_test.exe \
crc32_ieee_test.exe \
crc32_iscsi_test.exe \
crc16_t10dif_copy_test.exe \
crc64_funcs_test.exe \
igzip_rand_test.exe
crc32_funcs_test.exe \
igzip_rand_test.exe \
igzip_wrapper_hdr_test.exe \
checksum32_funcs_test.exe \
mem_zero_detect_test.exe
checks: lib $(checks)
$(checks): $(@B).obj
@ -206,41 +317,74 @@ $(tests): $(@B).obj
# Performance tests
perfs = \
gf_vect_mul_perf.exe \
gf_vect_mul_sse_perf.exe \
gf_vect_mul_avx_perf.exe \
gf_vect_dot_prod_sse_perf.exe \
gf_vect_dot_prod_avx_perf.exe \
gf_2vect_dot_prod_sse_perf.exe \
gf_3vect_dot_prod_sse_perf.exe \
gf_4vect_dot_prod_sse_perf.exe \
gf_5vect_dot_prod_sse_perf.exe \
gf_6vect_dot_prod_sse_perf.exe \
gf_vect_dot_prod_perf.exe \
gf_vect_dot_prod_1tbl.exe \
gf_vect_mad_perf.exe \
erasure_code_perf.exe \
erasure_code_base_perf.exe \
erasure_code_sse_perf.exe \
erasure_code_update_perf.exe \
xor_gen_perf.exe \
pq_gen_perf.exe \
raid_funcs_perf.exe \
crc16_t10dif_perf.exe \
crc16_t10dif_copy_perf.exe \
crc16_t10dif_op_perf.exe \
crc32_ieee_perf.exe \
crc32_iscsi_perf.exe \
igzip_perf.exe \
igzip_sync_flush_perf.exe
crc64_funcs_perf.exe \
crc32_gzip_refl_perf.exe \
crc_funcs_perf.exe \
adler32_perf.exe \
mem_zero_detect_perf.exe
perfs: lib $(perfs)
$(perfs): $(@B).obj
perf: $(perfs)
!$?
progs = \
igzip.exe
progs: lib $(progs)
igzip.exe: programs\igzip_cli.obj
link /out:$@ $(LINKFLAGS) isa-l.lib $?
isa-l.h:
@echo /**>> $@
@echo * @file isa-l.h>> $@
@echo * @brief Include for ISA-L library>> $@
@echo */>> $@
@echo.>> $@
@echo #ifndef _ISAL_H_>> $@
@echo #define _ISAL_H_>> $@
@echo.>> $@
@echo #define ISAL_MAJOR_VERSION 2 >> $@
@echo #define ISAL_MINOR_VERSION 31 >> $@
@echo #define ISAL_PATCH_VERSION 1 >> $@
@echo #define ISAL_MAKE_VERSION(maj, min, patch) ((maj) * 0x10000 + (min) * 0x100 + (patch))>> $@
@echo #define ISAL_VERSION ISAL_MAKE_VERSION(ISAL_MAJOR_VERSION, ISAL_MINOR_VERSION, ISAL_PATCH_VERSION)>> $@
@echo.>> $@
@echo #ifndef RC_INVOKED>> $@
@echo #include ^<isa-l/crc.h^>>> $@
@echo #include ^<isa-l/crc64.h^>>> $@
@echo #include ^<isa-l/erasure_code.h^>>> $@
@echo #include ^<isa-l/gf_vect_mul.h^>>> $@
@echo #include ^<isa-l/igzip_lib.h^>>> $@
@echo #include ^<isa-l/mem_routines.h^>>> $@
@echo #include ^<isa-l/raid.h^>>> $@
@echo #endif // RC_INVOKED>> $@
@echo #endif //_ISAL_H_>> $@
clean:
-if exist *.obj del *.obj
-if exist bin\*.obj del bin\*.obj
-if exist isa-l_static.lib del isa-l_static.lib
-if exist *.exe del *.exe
-if exist *.pdb del *.pdb
-if exist isa-l.lib del isa-l.lib
-if exist isa-l.dll del isa-l.dll
-if exist isa-l.exp del isa-l.exp
-if exist isa-l.res del isa-l.res
zlib.lib:
igzip_inflate_perf.exe: zlib.lib
igzip_perf.exe: zlib.lib
igzip_inflate_test.exe: zlib.lib


@ -2,7 +2,7 @@
# Copyright(c) 2011-2015 Intel Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
@ -27,15 +27,30 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
########################################################################
units = erasure_code raid crc igzip
units = erasure_code raid crc igzip programs mem
default: lib
ifeq (,$(findstring crc,$(units)))
ifneq (,$(findstring igzip,$(units)))
override units += crc
endif
endif
include $(foreach unit,$(units), $(unit)/Makefile.am)
ifneq (,$(findstring igzip,$(units)))
include tests/fuzz/Makefile.am
endif
ifneq (,$(findstring erasure_code,$(units)))
include examples/ec/Makefile.am
endif
# Override individual lib names to make one inclusive library.
lib_name := bin/isa-l.a
include make.inc
include tools/gen_nmake.mk
VPATH = . $(units) include
VPATH = . $(units) include tests/fuzz examples/ec


@ -1,57 +1,101 @@
=================================================
Intel(R) Intelligent Storage Acceleration Library
=================================================
[![Build Status](https://travis-ci.org/01org/isa-l.svg?branch=master)](https://travis-ci.org/01org/isa-l)
![Continuous Integration](https://github.com/intel/isa-l/actions/workflows/ci.yml/badge.svg)
[![Package on conda-forge](https://img.shields.io/conda/v/conda-forge/isa-l.svg)](https://anaconda.org/conda-forge/isa-l)
[![Coverity Status](https://scan.coverity.com/projects/29480/badge.svg)](https://scan.coverity.com/projects/intel-isa-l)
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/intel/isa-l/badge)](https://securityscorecards.dev/viewer/?uri=github.com/intel/isa-l)
ISA-L is a collection of optimized low-level functions targeting storage
applications. ISA-L includes:
* Erasure codes - Fast block Reed-Solomon type erasure codes for any
encode/decode matrix in GF(2^8).
* CRC - Fast implementations of cyclic redundancy check. Seven different
polynomials supported.
- iscsi32, ieee32, t10dif, ecma64, iso64, jones64.
- iscsi32, ieee32, t10dif, ecma64, iso64, jones64, rocksoft64.
* Raid - calculate and operate on XOR and P+Q parity found in common RAID
implementations.
* Compression - Fast deflate-compatible data compression.
* De-compression - Fast inflate-compatible data decompression.
* igzip - A command line application like gzip, accelerated with ISA-L.
See [ISA-L for updates.](https://github.com/01org/isa-l)
For crypto functions see [isa-l_crypto on github.](https://github.com/01org/isa-l_crypto)
Build Prerequisites
===================
ISA-L requires yasm version 1.2.0 or later or nasm v2.11.01 or later. Building
with autotools requires autoconf/automake packages.
Also see:
* [ISA-L for updates](https://github.com/intel/isa-l).
* For crypto functions see [isa-l_crypto on github](https://github.com/intel/isa-l_crypto).
* The [github wiki](https://github.com/intel/isa-l/wiki) including a list of
[distros/ports](https://github.com/intel/isa-l/wiki/Ports--Repos) offering binary packages
as well as a list of [language bindings](https://github.com/intel/isa-l/wiki/Language-Bindings).
* [Contributing](CONTRIBUTING.md).
* [Security Policy](SECURITY.md).
* Docs on [units](doc/functions.md), [tests](doc/test.md), or [build details](doc/build.md).
Building ISA-L
==============
--------------
Autotools
---------
### Prerequisites
To build and install the library with autotools it is usually sufficient to run
the following:
* Make: GNU 'make' or 'nmake' (Windows).
* Optional: Building with autotools requires autoconf/automake/libtool packages.
* Optional: Manual generation requires help2man package.
x86_64:
* Assembler: nasm. 2.14.01 minimum version required ([support](doc/build.md)).
* Compiler: gcc, clang, icc or VC compiler.
aarch64:
* Assembler: gas v2.24 or later.
* Compiler: gcc v4.7 or later.
other:
* Compiler: Portable base functions are available that build with most C compilers.
### Autotools
To build and install the library with autotools it is usually sufficient to run:
./autogen.sh
./configure
make
sudo make install
Other targets include: make check, make tests, make perfs, make ex (examples)
and make other.
### Makefile
To use a standard makefile run:
Windows
-------
make -f Makefile.unx
### Windows
On Windows use nmake to build dll and static lib:
nmake -f Makefile.nmake
Other targes include: nmake check.
or see [details on setting up environment here](doc/build.md).
### Other make targets
Other targets include:
* `make check` : create and run tests
* `make tests` : create additional unit tests
* `make perfs` : create included performance tests
* `make ex` : build examples
* `make other` : build other utilities such as compression file tests
* `make doc` : build API manual
DLL Injection Attack
--------------------
### Problem
The Windows OS has an insecure predefined search order and set of defaults when trying to locate a resource. If the resource location is not specified by the software, an attacker need only place a malicious version in one of the locations Windows will search, and it will be loaded instead. Although this weakness can occur with any resource, it is especially common with DLL files.
### Solutions
Applications using the libisal DLL may need to apply one of the following solutions to prevent a DLL injection attack.
Two solutions are available:
- Using a Fully Qualified Path is the most secure way to load a DLL
- Signature verification of the DLL
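As a rough illustration of the first option, the sketch below refuses to load anything that is not a fully qualified path. It is only a sketch under stated assumptions: `is_fully_qualified_win_path` and `load_isal_dll` are hypothetical helpers (not part of ISA-L), and the path in the usage note is a made-up install location. `LoadLibraryExA` and the `LOAD_LIBRARY_SEARCH_*` flags are the Win32 API described in the Microsoft documentation listed under the resources for this section.

```c
#include <ctype.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of ISA-L): returns 1 when the path looks
 * fully qualified, i.e. "C:\..." (drive letter) or "\\server\share\..." (UNC). */
static int is_fully_qualified_win_path(const char *p)
{
        if (p == NULL)
                return 0;
        /* Drive-letter form: letter, colon, then a path separator */
        if (isalpha((unsigned char) p[0]) && p[1] == ':' &&
            (p[2] == '\\' || p[2] == '/'))
                return 1;
        /* UNC form: two leading backslashes followed by a server name */
        if (p[0] == '\\' && p[1] == '\\' && p[2] != '\0')
                return 1;
        return 0;
}

#ifdef _WIN32
/* Hypothetical helper: load the DLL only from an explicit location.
 * Dependencies are resolved from the DLL's own directory and System32,
 * not from the current directory or PATH. */
static HMODULE load_isal_dll(const char *full_path)
{
        if (!is_fully_qualified_win_path(full_path))
                return NULL;
        return LoadLibraryExA(full_path, NULL,
                              LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR |
                              LOAD_LIBRARY_SEARCH_SYSTEM32);
}
#endif
```

An application might call `load_isal_dll("C:\\Program Files\\MyApp\\isa-l.dll")` and treat a `NULL` result as a hard error rather than falling back to a bare `isa-l.dll` name.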
### Resources and Solution Details
- Security remarks section of LoadLibraryEx documentation by Microsoft: <https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa#security-remarks>
- Microsoft Dynamic Link Library Security article: <https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-security>
- Hijack Execution Flow: DLL Search Order Hijacking: <https://attack.mitre.org/techniques/T1574/001>
- Hijack Execution Flow: DLL Side-Loading: <https://attack.mitre.org/techniques/T1574/002>


@ -1,25 +1,111 @@
=============================================================================
v2.17 Intel Intelligent Storage Acceleration Library Release Notes
=============================================================================
v2.32 Intel Intelligent Storage Acceleration Library Release Notes
====================================================================
=============================================================================
RELEASE NOTE CONTENTS
=============================================================================
1. KNOWN ISSUES
2. FIXED ISSUES
3. CHANGE LOG & FEATURES ADDED
=============================================================================
1. KNOWN ISSUES
=============================================================================
1. KNOWN ISSUES
----------------
* Perf tests do not run in Windows environment.
* 32-bit lib is not supported in Windows.
=============================================================================
2. FIXED ISSUES
=============================================================================
---------------
v2.31.1
* Fixed return type for PowerPC _gf_vect_mul_base function.
* Fixed isal_deflate_icf_finish_lvl1 dispatcher for aarch64.
* Fixed CRC compilation on aarch64.
* Fixed MacOS-14 compilation.
* Fixed MinGW build.
* Fixed Clang compilation on igzip library on aarch64.
* Fixed spelling mistakes and typos.
* Fixed Windows build on erasure code performance applications.
* Fixed FreeBSD build warnings.
* Fixed compilation with YASM.
v2.31
* Fixed various compilation issues/warnings for different platforms.
* Fixed documentation on xor/pq gen/check functions, with minimum
number of vectors.
* Fixed potential out-of-bounds read on Adler32 Neon implementation.
* Fixed potential out-of-bounds read on gf_vect_mul Neon implementation.
* Fixed x86 load/store instructions in erasure coding functions (aligned moves
that should be unaligned).
* Fixed memory leaks in unit tests.
v2.30
* Intel CET support.
* Windows nasm support fix.
v2.28
* Fix documentation on gf_vect_mad(). Min length listed as 32 instead of
required min 64 bytes.
v2.27
* Fix lack of install for pkg-config files
v2.26
* Fixes for sanitizer warnings.
v2.25
* Fix for nasm on Mac OS X/darwin.
v2.24
* Fix for crc32_iscsi(). Potential read-over for small buffer. For an input
buffer length of less than 8 bytes and aligned to an 8 byte boundary, function
could read past length. Previously had the possibility to cause a seg fault
only for length 0 and invalid buffer passed. Calculated CRC is unchanged.
* Fix for compression/decompression of > 4GB files. For streaming compression
of extremely large files, the total_out parameter would wrap and could
potentially flag an otherwise valid lookback distance as being invalid.
Total_out is still 32bit for zlib compatibility. No inconsistent compressed
buffers were generated by the issue.
v2.23
* Fix for histogram generation base function.
* Fix library build warnings on macOS.
* Fix igzip to use bsf instruction when tzcnt is not available.
v2.22
* Fix ISA-L builds for other architectures. Base function and examples
sanitized for non-IA builds.
* Fix fuzz test script to work with llvm 6.0 builtin libFuzz.
v2.20
* Inflate total_out behavior corrected for in-progress decompression.
Previously total_out represented the total bytes decompressed into the output
buffer or temp internal buffer. This is changed to be only the bytes put into
the output buffer.
* Fixed issue with isal_create_hufftables_subset. Affects semi-dynamic
compression use case when explicitly creating hufftables from histogram. The
_hufftables_subset function could fail to generate length symbols for any
lengths that were never seen.
v2.19
* Fix erasure code test that violates rs matrix bounds.
* Fix 0 length file and looping errors in igzip_inflate_test.
v2.18
* Mac OS X/darwin systems no longer require the --target=darwin config option.
The autoconf canonical build should detect the platform.
v2.17
* Fix igzip using 32K window and a shared object
@ -46,9 +132,206 @@ v2.10
affects windows versions of erasure code. GP register saves/restore were
pushed to same stack area as XMM.
=============================================================================
3. CHANGE LOG & FEATURES ADDED
=============================================================================
3. CHANGE LOG & FEATURES ADDED
------------------------------
v2.32
* General:
- Minimum NASM version required for x86 architecture is 2.14.01 now.
- 32-bit x86 support has been removed.
* RISCV support.
- Initial riscv64 support with runtime and build-time CPU feature detection.
* Igzip compression improvements:
- Added new RVV adler32 implementations.
* Igzip:
- Added experimental ISA-L shim library to provide drop-in compatibility with zlib.
* RAID improvements:
- Added new x86 AVX2+GFNI and AVX512+GFNI pq_gen implementations.
- Added new RVV xor_gen, pq_gen implementations.
* Erasure coding improvements:
- Added new RVV ec_encode_data, ec_encode_data_update, gf_vect_mad, gf_vect_dot_prod, gf_vect_mul implementations.
* Zero-memory detection improvements:
- Added new RVV implementations.
* CRC improvements:
- CRC64 Rocksoft implementation on aarch64 optimized similar to other CRC64
implementations.
* Performance applications:
- Add consolidated CRC performance application.
- Add consolidated RAID performance application.
v2.31
* API changes:
- gf_vect_mul_base() function now returns an integer, matching the return type
of gf_vect_mul() function (not a breaking change).
* Igzip compression improvements:
- Added compress/decompress with dictionary to perf test app.
- Zlib header can now be created on the fly when starting the compression.
- Added isal_zlib_hdr_init() function to initialize the zlib header to 0.
* Zero-memory detection improvements:
- Optimized AVX implementation.
- Added new AVX2 and AVX512 implementations.
* Erasure coding improvements:
- Added new AVX512 and AVX2 implementations using GFNI instructions.
- Added new SVE implementation.
* CRC improvements:
- Added new CRC64 Rocksoft algorithm.
- CRC x86 implementations optimized using ternary logic instructions and
folding of bigger data on the last bytes.
- CRC16 T10dif aarch64 implementation improved.
- CRC aarch64 implementations optimized using XOR fusion feature.
* Documentation:
- Added function overview documentation page.
- Added security file.
* Performance apps:
- Changed performance tests to warm by default.
* Example apps:
- Added CRC combine example `crc_combine_example` for multiple polynomials.
v2.30
* Igzip compression enhancements.
- New functions for dictionary acceleration. Split dictionary processing and
resetting can greatly accelerate the performance of compressing many small
files with a dictionary.
- New static level 0 header decode tables. Accelerates decompressing small
files that are level 0 compressed by skipping the known header parsing.
- New feature for igzip cli tool: support for concatenated .gz files. On
decompression, igzip will process a series of independent, concatenated .gz
files into one output stream.
* CRC Improvements
- New vclmul version of crc32_iscsi().
- Updates for aarch64.
v2.29
* CRC Improvements
- New AVX512 vclmul versions of crc16_t10dif(), crc32_ieee(), crc32_gzip_refl.
* Erasure code improvements
- Added AVX512 ec functions with 5 and 6 outputs. Can improve performance for
codes with 5 or more parity by running in batches of up to 6 at a time.
v2.28
* New next-arch versions of 64-bit CRC. All norm and reflected 64-bit
polynomials are expanded to utilize vpclmulqdq.
v2.27
* New multi-threaded compression option for igzip cli tool
v2.26
* Adler32 added to external API.
* Multi-arch improvements.
* Performance test improvements.
v2.25
* Igzip performance improvements and features.
- Performance improvements for incompressible files. Random or incompressible
files can be up to 3x faster in level 1 or 2 compression.
- Additional small file performance improvements.
- New options in igzip cli: use name from header or not, test compressed file.
* Multi-arch autoconf script.
- Autoconf should detect architecture and run base functions at minimum.
v2.24
* Igzip small file performance improvements and new features.
- Better performance on small files.
- New gzip/zlib header and trailer handling.
- New gzip/zlib header parsing helper functions.
- New user-space compression/decompression tool igzip.
* New mem unit added with first function isal_zero_detect().
v2.23
* Igzip inflate (decompression) performance improvements.
- Implemented multi-byte decode for inflate. Decode can pack up to three
symbols into the decode table making some compressed streams decompress much
faster depending on the prevalence of short codes.
v2.22
* Igzip: AVX2 version of level 3 compression added.
* Erasure code examples
- New examples for standard EC encode and decode.
- Example of piggyback EC encode and decode.
v2.21
* Igzip improvements
- New compression levels added. ISA-L fast deflate now has more levels to
balance speed vs. target compression level. Level 0, 1 are as in previous
generations. New levels 2 & 3 target higher compression roughly comparable
to zlib levels 2-3. Level 3 is currently only optimized for processors with
AVX512 instructions.
* New T10dif & copy function - crc16_t10dif_copy()
- CRC and copy was added to emulate T10dif operations such as DIF insert and
strip. This function stitches together CRC and memcpy operations
eliminating an extra data read.
* CRC32 iscsi performance improvements
- Fixes issue under some distributions where warm cache performance was
reduced.
v2.20
* Igzip improvements
- Optimized deflate_hash in compression functions.
Improves performance of using preset dictionary.
- Removed alignment restrictions on input structure.
v2.19
* Igzip improvements
- Add optimized Adler-32 checksum.
- Implement zlib compression format.
- Add stateful dictionary support.
- Add struct reset functions for both deflate and inflate.
* Reflected IEEE format CRC32 is now available. The function is named
crc32_gzip_refl.
* The exact working conditions of the Erasure Code Reed-Solomon matrix are
determined by the newly added program gen_rs_matrix_limits.
v2.18
* New 2-pass fully-dynamic deflate compression (level -1). ISA-L fast deflate
now has two levels. Level 0 (default) is the same as previous generations.
Setting to level 1 will switch to the fully-dynamic compression that will
typically reach higher compression ratios.
* RAID AVX512 functions.
v2.17
* New fast decompression (inflate)
@ -88,7 +371,7 @@ v2.14
v2.13
* Erasure code improvments
* Erasure code improvements
- 32-bit port of optimized gf_vect_dot_prod() functions. This makes
ec_encode_data() functions much faster on 32-bit processors.
- Avoton performance improvements. Performance on Avoton for

SECURITY.md (new file)

@ -0,0 +1,11 @@
# ISA-L Security Policy
## Report a Vulnerability
Please report security issues or vulnerabilities to the [Intel Security Center].
For more information on how Intel works to resolve security issues, see
[Vulnerability Handling Guidelines].
[Intel Security Center]:https://www.intel.com/security
[Vulnerability Handling Guidelines]:https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html


@ -1,6 +1,6 @@
#!/bin/sh -e
autoreconf --install --symlink -f
autoreconf --install --symlink -f -Wno-obsolete
libdir() {
echo $(cd $1/$(gcc -print-multi-os-directory); pwd)


@ -0,0 +1,5 @@
@PACKAGE_INIT@
include("${CMAKE_CURRENT_LIST_DIR}/ISALTargets.cmake")
check_required_components(ISAL)

cmake/README.md (new file)

@ -0,0 +1,132 @@
# CMake Build System for ISA-L
This directory contains CMake build configuration files for the ISA-L library (Intelligent Storage Acceleration Library).
## Prerequisites
### Required Tools
- **CMake** 3.12 or later
- **C compiler** (GCC, Clang, or compatible)
### Architecture-Specific Requirements
#### x86_64
- **NASM** (Netwide Assembler) - Required for optimized assembly implementations
#### ARM64/AArch64, RISC-V, PowerPC
- Standard C compiler with assembly support
## Building
### Quick Start
```bash
mkdir build
cd build
cmake ..
make -j$(nproc)
```
### Build Options
#### Specify build type
```bash
cmake -DCMAKE_BUILD_TYPE=Release .. # Default
cmake -DCMAKE_BUILD_TYPE=Debug ..
```
#### Cross-compilation example
```bash
cmake -DCMAKE_TOOLCHAIN_FILE=path/to/toolchain.cmake ..
```
## Installation
```bash
make install
```
Default installation paths:
- Libraries: `/usr/local/lib`
- Headers: `/usr/local/include/isa-l/`
- CMake config: `/usr/local/lib/cmake/ISAL/`
- pkg-config: `/usr/local/lib/pkgconfig/`
### Custom installation prefix
```bash
cmake -DCMAKE_INSTALL_PREFIX=/usr ..
make install
```
## Library Modules
The CMake build system is organized into the following modules:
- **erasure_code** - Erasure coding and Galois Field operations
- **raid** - RAID XOR and P+Q generation functions
- **crc** - CRC16, CRC32, and CRC64 implementations
- **igzip** - Fast deflate/inflate compression
- **mem** - Memory utility functions
## Using ISA-L in Your Project
### CMake Integration
```cmake
find_package(ISAL REQUIRED)
target_link_libraries(your_target ISAL::isal)
```
### pkg-config Integration
```bash
gcc $(pkg-config --cflags --libs libisal) your_program.c
```
### Direct Header Include
```c
#include <isa-l.h> // Includes all ISA-L headers
// or include specific headers:
#include <isa-l/erasure_code.h>
#include <isa-l/raid.h>
#include <isa-l/crc.h>
```
## Architecture Support
| Architecture | Status | Assembly Optimizations |
|--------------|--------|------------------------|
| x86_64 | ✅ | SSE, AVX, AVX2, AVX-512 |
| AArch64 | ✅ | NEON, SVE |
| RISC-V 64 | ✅ | RVV (Vector extensions) |
| PowerPC64LE | ✅ | VSX |
## Build Targets
- `isal` - Main shared library
- `install` - Install library and headers
## Troubleshooting
### NASM not found (x86_64)
```bash
sudo apt-get install nasm # Ubuntu/Debian
sudo yum install nasm # RHEL/CentOS
brew install nasm # macOS
```
### Assembly compilation errors
Ensure you have the correct assembler for your platform:
- x86_64: NASM
- AArch64/RISC-V: a C compiler (such as GCC) with assembly support
## Contributing
When adding new source files, update the appropriate module file in the `cmake/` directory:
- `cmake/erasure_code.cmake`
- `cmake/raid.cmake`
- `cmake/crc.cmake`
- `cmake/igzip.cmake`
- `cmake/mem.cmake`

139
cmake/crc.cmake Normal file

@ -0,0 +1,139 @@
# cmake-format: off
# Copyright (c) 2025, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# cmake-format: on
# CRC module CMake configuration
set(CRC_BASE_SOURCES
crc/crc_base.c
crc/crc64_base.c
)
set(CRC_BASE_ALIASES_SOURCES
crc/crc_base_aliases.c
)
set(CRC_X86_64_SOURCES
crc/crc16_t10dif_01.asm
crc/crc16_t10dif_by4.asm
crc/crc16_t10dif_02.asm
crc/crc16_t10dif_by16_10.asm
crc/crc16_t10dif_copy_by4.asm
crc/crc16_t10dif_copy_by4_02.asm
crc/crc32_ieee_01.asm
crc/crc32_ieee_02.asm
crc/crc32_ieee_by4.asm
crc/crc32_ieee_by16_10.asm
crc/crc32_iscsi_01.asm
crc/crc32_iscsi_by8_02.asm
crc/crc32_iscsi_by16_10.asm
crc/crc_multibinary.asm
crc/crc64_multibinary.asm
crc/crc64_ecma_refl_by8.asm
crc/crc64_ecma_refl_by16_10.asm
crc/crc64_ecma_norm_by8.asm
crc/crc64_ecma_norm_by16_10.asm
crc/crc64_iso_refl_by8.asm
crc/crc64_iso_refl_by16_10.asm
crc/crc64_iso_norm_by8.asm
crc/crc64_iso_norm_by16_10.asm
crc/crc64_jones_refl_by8.asm
crc/crc64_jones_refl_by16_10.asm
crc/crc64_jones_norm_by8.asm
crc/crc64_jones_norm_by16_10.asm
crc/crc64_rocksoft_refl_by8.asm
crc/crc64_rocksoft_refl_by16_10.asm
crc/crc64_rocksoft_norm_by8.asm
crc/crc64_rocksoft_norm_by16_10.asm
crc/crc32_gzip_refl_by8.asm
crc/crc32_gzip_refl_by8_02.asm
crc/crc32_gzip_refl_by16_10.asm
)
set(CRC_AARCH64_SOURCES
crc/aarch64/crc_multibinary_arm.S
crc/aarch64/crc_aarch64_dispatcher.c
crc/aarch64/crc16_t10dif_pmull.S
crc/aarch64/crc16_t10dif_copy_pmull.S
crc/aarch64/crc32_ieee_norm_pmull.S
crc/aarch64/crc64_ecma_refl_pmull.S
crc/aarch64/crc64_ecma_norm_pmull.S
crc/aarch64/crc64_iso_refl_pmull.S
crc/aarch64/crc64_iso_norm_pmull.S
crc/aarch64/crc64_jones_refl_pmull.S
crc/aarch64/crc64_jones_norm_pmull.S
crc/aarch64/crc64_rocksoft_refl_pmull.S
crc/aarch64/crc64_rocksoft_norm_pmull.S
crc/aarch64/crc32_iscsi_refl_pmull.S
crc/aarch64/crc32_gzip_refl_pmull.S
crc/aarch64/crc32_iscsi_3crc_fold.S
crc/aarch64/crc32_gzip_refl_3crc_fold.S
crc/aarch64/crc32_iscsi_crc_ext.S
crc/aarch64/crc32_gzip_refl_crc_ext.S
crc/aarch64/crc32_mix_default.S
crc/aarch64/crc32c_mix_default.S
crc/aarch64/crc32_mix_neoverse_n1.S
crc/aarch64/crc32c_mix_neoverse_n1.S
)
# Build source list based on architecture
set(CRC_SOURCES ${CRC_BASE_SOURCES})
if(CPU_X86_64)
list(APPEND CRC_SOURCES ${CRC_X86_64_SOURCES})
elseif(CPU_AARCH64)
list(APPEND CRC_SOURCES ${CRC_AARCH64_SOURCES})
elseif(CPU_PPC64LE OR CPU_RISCV64 OR CPU_UNDEFINED)
# These architectures use base aliases
list(APPEND CRC_SOURCES ${CRC_BASE_ALIASES_SOURCES})
endif()
# Headers exported by CRC module
set(CRC_HEADERS
include/crc.h
include/crc64.h
)
# Add to main extern headers list
list(APPEND EXTERN_HEADERS ${CRC_HEADERS})
# Add test applications for crc module
if(BUILD_TESTS)
# Check tests (unit tests that are run by CTest)
set(CRC_CHECK_TESTS
crc16_t10dif_test
crc16_t10dif_copy_test
crc64_funcs_test
crc32_funcs_test
)
# Create check test executables
foreach(test ${CRC_CHECK_TESTS})
add_executable(${test} crc/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include)
add_test(NAME ${test} COMMAND ${test})
endforeach()
endif()

240
cmake/erasure_code.cmake Normal file

@ -0,0 +1,240 @@
# cmake-format: off
# Copyright (c) 2025, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# cmake-format: on
# Erasure Code module CMake configuration
set(ERASURE_CODE_BASE_SOURCES
erasure_code/ec_base.c
)
set(ERASURE_CODE_BASE_ALIASES_SOURCES
erasure_code/ec_base_aliases.c
)
set(ERASURE_CODE_X86_64_SOURCES
erasure_code/ec_highlevel_func.c
erasure_code/gf_vect_mul_sse.asm
erasure_code/gf_vect_mul_avx.asm
erasure_code/gf_vect_dot_prod_sse.asm
erasure_code/gf_vect_dot_prod_avx.asm
erasure_code/gf_vect_dot_prod_avx2.asm
erasure_code/gf_2vect_dot_prod_sse.asm
erasure_code/gf_3vect_dot_prod_sse.asm
erasure_code/gf_4vect_dot_prod_sse.asm
erasure_code/gf_5vect_dot_prod_sse.asm
erasure_code/gf_6vect_dot_prod_sse.asm
erasure_code/gf_2vect_dot_prod_avx.asm
erasure_code/gf_3vect_dot_prod_avx.asm
erasure_code/gf_4vect_dot_prod_avx.asm
erasure_code/gf_5vect_dot_prod_avx.asm
erasure_code/gf_6vect_dot_prod_avx.asm
erasure_code/gf_2vect_dot_prod_avx2.asm
erasure_code/gf_3vect_dot_prod_avx2.asm
erasure_code/gf_4vect_dot_prod_avx2.asm
erasure_code/gf_5vect_dot_prod_avx2.asm
erasure_code/gf_6vect_dot_prod_avx2.asm
erasure_code/gf_vect_mad_sse.asm
erasure_code/gf_2vect_mad_sse.asm
erasure_code/gf_3vect_mad_sse.asm
erasure_code/gf_4vect_mad_sse.asm
erasure_code/gf_5vect_mad_sse.asm
erasure_code/gf_6vect_mad_sse.asm
erasure_code/gf_vect_mad_avx.asm
erasure_code/gf_2vect_mad_avx.asm
erasure_code/gf_3vect_mad_avx.asm
erasure_code/gf_4vect_mad_avx.asm
erasure_code/gf_5vect_mad_avx.asm
erasure_code/gf_6vect_mad_avx.asm
erasure_code/gf_vect_mad_avx2.asm
erasure_code/gf_2vect_mad_avx2.asm
erasure_code/gf_3vect_mad_avx2.asm
erasure_code/gf_4vect_mad_avx2.asm
erasure_code/gf_5vect_mad_avx2.asm
erasure_code/gf_6vect_mad_avx2.asm
erasure_code/ec_multibinary.asm
erasure_code/gf_vect_mad_avx2_gfni.asm
erasure_code/gf_2vect_mad_avx2_gfni.asm
erasure_code/gf_3vect_mad_avx2_gfni.asm
erasure_code/gf_4vect_mad_avx2_gfni.asm
erasure_code/gf_5vect_mad_avx2_gfni.asm
erasure_code/gf_vect_dot_prod_avx512.asm
erasure_code/gf_2vect_dot_prod_avx512.asm
erasure_code/gf_3vect_dot_prod_avx512.asm
erasure_code/gf_4vect_dot_prod_avx512.asm
erasure_code/gf_5vect_dot_prod_avx512.asm
erasure_code/gf_6vect_dot_prod_avx512.asm
erasure_code/gf_vect_dot_prod_avx512_gfni.asm
erasure_code/gf_vect_dot_prod_avx2_gfni.asm
erasure_code/gf_2vect_dot_prod_avx2_gfni.asm
erasure_code/gf_3vect_dot_prod_avx2_gfni.asm
erasure_code/gf_2vect_dot_prod_avx512_gfni.asm
erasure_code/gf_3vect_dot_prod_avx512_gfni.asm
erasure_code/gf_4vect_dot_prod_avx512_gfni.asm
erasure_code/gf_5vect_dot_prod_avx512_gfni.asm
erasure_code/gf_6vect_dot_prod_avx512_gfni.asm
erasure_code/gf_vect_mad_avx512.asm
erasure_code/gf_2vect_mad_avx512.asm
erasure_code/gf_3vect_mad_avx512.asm
erasure_code/gf_4vect_mad_avx512.asm
erasure_code/gf_5vect_mad_avx512.asm
erasure_code/gf_6vect_mad_avx512.asm
erasure_code/gf_vect_mad_avx512_gfni.asm
erasure_code/gf_2vect_mad_avx512_gfni.asm
erasure_code/gf_3vect_mad_avx512_gfni.asm
erasure_code/gf_4vect_mad_avx512_gfni.asm
erasure_code/gf_5vect_mad_avx512_gfni.asm
erasure_code/gf_6vect_mad_avx512_gfni.asm
)
set(ERASURE_CODE_AARCH64_SOURCES
erasure_code/aarch64/ec_aarch64_highlevel_func.c
erasure_code/aarch64/ec_aarch64_dispatcher.c
erasure_code/aarch64/gf_vect_dot_prod_neon.S
erasure_code/aarch64/gf_2vect_dot_prod_neon.S
erasure_code/aarch64/gf_3vect_dot_prod_neon.S
erasure_code/aarch64/gf_4vect_dot_prod_neon.S
erasure_code/aarch64/gf_5vect_dot_prod_neon.S
erasure_code/aarch64/gf_vect_mad_neon.S
erasure_code/aarch64/gf_2vect_mad_neon.S
erasure_code/aarch64/gf_3vect_mad_neon.S
erasure_code/aarch64/gf_4vect_mad_neon.S
erasure_code/aarch64/gf_5vect_mad_neon.S
erasure_code/aarch64/gf_6vect_mad_neon.S
erasure_code/aarch64/gf_vect_mul_neon.S
erasure_code/aarch64/gf_vect_mad_sve.S
erasure_code/aarch64/gf_2vect_mad_sve.S
erasure_code/aarch64/gf_3vect_mad_sve.S
erasure_code/aarch64/gf_4vect_mad_sve.S
erasure_code/aarch64/gf_5vect_mad_sve.S
erasure_code/aarch64/gf_6vect_mad_sve.S
erasure_code/aarch64/gf_vect_dot_prod_sve.S
erasure_code/aarch64/gf_2vect_dot_prod_sve.S
erasure_code/aarch64/gf_3vect_dot_prod_sve.S
erasure_code/aarch64/gf_4vect_dot_prod_sve.S
erasure_code/aarch64/gf_5vect_dot_prod_sve.S
erasure_code/aarch64/gf_6vect_dot_prod_sve.S
erasure_code/aarch64/gf_7vect_dot_prod_sve.S
erasure_code/aarch64/gf_8vect_dot_prod_sve.S
erasure_code/aarch64/gf_vect_mul_sve.S
erasure_code/aarch64/ec_multibinary_arm.S
)
set(ERASURE_CODE_PPC64LE_SOURCES
erasure_code/ppc64le/ec_base_vsx.c
erasure_code/ppc64le/gf_vect_mul_vsx.c
erasure_code/ppc64le/gf_vect_dot_prod_vsx.c
erasure_code/ppc64le/gf_vect_mad_vsx.c
erasure_code/ppc64le/gf_2vect_dot_prod_vsx.c
erasure_code/ppc64le/gf_2vect_mad_vsx.c
erasure_code/ppc64le/gf_3vect_dot_prod_vsx.c
erasure_code/ppc64le/gf_3vect_mad_vsx.c
erasure_code/ppc64le/gf_4vect_dot_prod_vsx.c
erasure_code/ppc64le/gf_4vect_mad_vsx.c
erasure_code/ppc64le/gf_5vect_dot_prod_vsx.c
erasure_code/ppc64le/gf_5vect_mad_vsx.c
erasure_code/ppc64le/gf_6vect_dot_prod_vsx.c
erasure_code/ppc64le/gf_6vect_mad_vsx.c
)
set(ERASURE_CODE_RISCV64_SOURCES
erasure_code/riscv64/ec_multibinary_riscv64_dispatcher.c
erasure_code/riscv64/ec_multibinary_riscv64.S
erasure_code/riscv64/ec_gf_vect_mul_rvv.S
erasure_code/riscv64/ec_gf_vect_dot_prod_rvv.S
erasure_code/riscv64/ec_encode_data_rvv.S
)
# Build source list based on architecture
set(ERASURE_CODE_SOURCES ${ERASURE_CODE_BASE_SOURCES})
if(CPU_X86_64)
list(APPEND ERASURE_CODE_SOURCES ${ERASURE_CODE_X86_64_SOURCES})
elseif(CPU_AARCH64)
list(APPEND ERASURE_CODE_SOURCES ${ERASURE_CODE_AARCH64_SOURCES})
elseif(CPU_PPC64LE)
list(APPEND ERASURE_CODE_SOURCES ${ERASURE_CODE_PPC64LE_SOURCES})
elseif(CPU_RISCV64)
list(APPEND ERASURE_CODE_SOURCES ${ERASURE_CODE_RISCV64_SOURCES})
elseif(CPU_UNDEFINED)
list(APPEND ERASURE_CODE_SOURCES ${ERASURE_CODE_BASE_ALIASES_SOURCES})
endif()
# Headers exported by erasure_code module
set(ERASURE_CODE_HEADERS
include/erasure_code.h
include/gf_vect_mul.h
)
# Add to main extern headers list
list(APPEND EXTERN_HEADERS ${ERASURE_CODE_HEADERS})
# Add test applications for erasure_code module
if(BUILD_TESTS)
# Check tests (unit tests that are run by CTest)
set(ERASURE_CODE_CHECK_TESTS
gf_vect_mul_test
erasure_code_test
gf_inverse_test
erasure_code_update_test
)
# Unit tests (additional unit tests)
set(ERASURE_CODE_UNIT_TESTS
gf_vect_mul_base_test
gf_vect_dot_prod_base_test
gf_vect_dot_prod_test
gf_vect_mad_test
erasure_code_base_test
)
# Other tests
set(ERASURE_CODE_OTHER_TESTS
gen_rs_matrix_limits
)
# Create check test executables
foreach(test ${ERASURE_CODE_CHECK_TESTS})
add_executable(${test} erasure_code/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include)
add_test(NAME ${test} COMMAND ${test})
endforeach()
# Create unit test executables
foreach(test ${ERASURE_CODE_UNIT_TESTS})
add_executable(${test} erasure_code/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include)
endforeach()
# Create other test executables
foreach(test ${ERASURE_CODE_OTHER_TESTS})
add_executable(${test} erasure_code/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include)
endforeach()
endif()

135
cmake/igzip.cmake Normal file

@ -0,0 +1,135 @@
# cmake-format: off
# Copyright (c) 2025, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# cmake-format: on
# IGZIP module CMake configuration
set(IGZIP_BASE_SOURCES
igzip/igzip.c
igzip/hufftables_c.c
igzip/igzip_base.c
igzip/igzip_icf_base.c
igzip/adler32_base.c
igzip/flatten_ll.c
igzip/encode_df.c
igzip/igzip_icf_body.c
igzip/huff_codes.c
igzip/igzip_inflate.c
)
set(IGZIP_BASE_ALIASES_SOURCES
igzip/igzip_base_aliases.c
igzip/proc_heap_base.c
)
set(IGZIP_X86_64_SOURCES
igzip/igzip_body.asm
igzip/igzip_finish.asm
igzip/igzip_icf_body_h1_gr_bt.asm
igzip/igzip_icf_finish.asm
igzip/rfc1951_lookup.asm
igzip/adler32_sse.asm
igzip/adler32_avx2_4.asm
igzip/igzip_multibinary.asm
igzip/igzip_update_histogram_01.asm
igzip/igzip_update_histogram_04.asm
igzip/igzip_decode_block_stateless_01.asm
igzip/igzip_decode_block_stateless_04.asm
igzip/igzip_inflate_multibinary.asm
igzip/encode_df_04.asm
igzip/encode_df_06.asm
igzip/proc_heap.asm
igzip/igzip_deflate_hash.asm
igzip/igzip_gen_icf_map_lh1_06.asm
igzip/igzip_gen_icf_map_lh1_04.asm
igzip/igzip_set_long_icf_fg_04.asm
igzip/igzip_set_long_icf_fg_06.asm
)
set(IGZIP_AARCH64_SOURCES
igzip/aarch64/igzip_inflate_multibinary_arm64.S
igzip/aarch64/igzip_multibinary_arm64.S
igzip/aarch64/igzip_isal_adler32_neon.S
igzip/aarch64/igzip_multibinary_aarch64_dispatcher.c
igzip/aarch64/igzip_deflate_body_aarch64.S
igzip/aarch64/igzip_deflate_finish_aarch64.S
igzip/aarch64/isal_deflate_icf_body_hash_hist.S
igzip/aarch64/isal_deflate_icf_finish_hash_hist.S
igzip/aarch64/igzip_set_long_icf_fg.S
igzip/aarch64/encode_df.S
igzip/aarch64/isal_update_histogram.S
igzip/aarch64/gen_icf_map.S
igzip/aarch64/igzip_deflate_hash_aarch64.S
igzip/aarch64/igzip_decode_huffman_code_block_aarch64.S
)
set(IGZIP_RISCV64_SOURCES
igzip/riscv64/igzip_multibinary_riscv64_dispatcher.c
igzip/riscv64/igzip_multibinary_riscv64.S
igzip/riscv64/igzip_isal_adler32_rvv.S
)
# Build source list based on architecture
set(IGZIP_SOURCES ${IGZIP_BASE_SOURCES})
if(CPU_X86_64)
list(APPEND IGZIP_SOURCES ${IGZIP_X86_64_SOURCES})
elseif(CPU_AARCH64)
list(APPEND IGZIP_SOURCES ${IGZIP_AARCH64_SOURCES})
elseif(CPU_PPC64LE)
# PPC64LE uses base aliases
list(APPEND IGZIP_SOURCES ${IGZIP_BASE_ALIASES_SOURCES})
elseif(CPU_RISCV64)
list(APPEND IGZIP_SOURCES ${IGZIP_RISCV64_SOURCES})
elseif(CPU_UNDEFINED)
list(APPEND IGZIP_SOURCES ${IGZIP_BASE_ALIASES_SOURCES})
endif()
# Headers exported by IGZIP module
set(IGZIP_HEADERS
include/igzip_lib.h
)
# Add to main extern headers list
list(APPEND EXTERN_HEADERS ${IGZIP_HEADERS})
# Add test applications for igzip module
if(BUILD_TESTS)
# Check tests (unit tests that are run by CTest)
set(IGZIP_CHECK_TESTS
igzip_rand_test
igzip_wrapper_hdr_test
checksum32_funcs_test
)
# Create check test executables
foreach(test ${IGZIP_CHECK_TESTS})
add_executable(${test} igzip/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include igzip)
add_test(NAME ${test} COMMAND ${test})
endforeach()
endif()

24
cmake/isa-l.h.in Normal file

@ -0,0 +1,24 @@
/**
* @file isa-l.h
* @brief Include for ISA-L library
*/
#ifndef _ISAL_H_
#define _ISAL_H_
#define ISAL_MAJOR_VERSION @PROJECT_VERSION_MAJOR@
#define ISAL_MINOR_VERSION @PROJECT_VERSION_MINOR@
#define ISAL_PATCH_VERSION @PROJECT_VERSION_PATCH@
#define ISAL_MAKE_VERSION(maj, min, patch) ((maj) * 0x10000 + (min) * 0x100 + (patch))
#define ISAL_VERSION ISAL_MAKE_VERSION(ISAL_MAJOR_VERSION, ISAL_MINOR_VERSION, ISAL_PATCH_VERSION)
#include <isa-l/erasure_code.h>
#include <isa-l/gf_vect_mul.h>
#include <isa-l/raid.h>
#include <isa-l/crc.h>
#include <isa-l/crc64.h>
#include <isa-l/igzip_lib.h>
#include <isa-l/mem_routines.h>
#include <isa-l/test.h>
#endif //_ISAL_H_
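
The `ISAL_MAKE_VERSION` macro above packs the three version components into a single integer so that versions can be compared numerically. As a rough illustration only, the same arithmetic can be reproduced in shell, for example in a packaging script. The function name `isal_make_version` is hypothetical and not part of ISA-L, and 2.31.1 (the version string in configure.ac in this change set) is used purely as sample input.

```sh
#!/bin/sh
# Hypothetical helper: same packing as ISAL_MAKE_VERSION in isa-l.h.in,
# i.e. major * 0x10000 + minor * 0x100 + patch.
isal_make_version() {
    echo $(( $1 * 0x10000 + $2 * 0x100 + $3 ))
}

# Sample input 2.31.1; prints 0x21f01
printf '0x%x\n' "$(isal_make_version 2 31 1)"
```

With this packing, numeric order matches version order only while the minor and patch components each stay below 256.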

94
cmake/mem.cmake Normal file

@ -0,0 +1,94 @@
# cmake-format: off
# Copyright (c) 2025, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# cmake-format: on
# MEM module CMake configuration
set(MEM_BASE_SOURCES
mem/mem_zero_detect_base.c
)
set(MEM_BASE_ALIASES_SOURCES
mem/mem_zero_detect_base_aliases.c
)
set(MEM_X86_64_SOURCES
mem/mem_zero_detect_avx512.asm
mem/mem_zero_detect_avx2.asm
mem/mem_zero_detect_avx.asm
mem/mem_zero_detect_sse.asm
mem/mem_multibinary.asm
)
set(MEM_AARCH64_SOURCES
mem/aarch64/mem_zero_detect_neon.S
mem/aarch64/mem_multibinary_arm.S
mem/aarch64/mem_aarch64_dispatcher.c
)
set(MEM_RISCV64_SOURCES
mem/riscv64/mem_multibinary_riscv64_dispatcher.c
mem/riscv64/mem_multibinary_riscv64.S
mem/riscv64/mem_zero_detect_rvv.S
)
# Build source list based on architecture
set(MEM_SOURCES ${MEM_BASE_SOURCES})
if(CPU_X86_64)
list(APPEND MEM_SOURCES ${MEM_X86_64_SOURCES})
elseif(CPU_AARCH64)
list(APPEND MEM_SOURCES ${MEM_AARCH64_SOURCES})
elseif(CPU_PPC64LE OR CPU_UNDEFINED)
# These architectures use base aliases
list(APPEND MEM_SOURCES ${MEM_BASE_ALIASES_SOURCES})
elseif(CPU_RISCV64)
list(APPEND MEM_SOURCES ${MEM_RISCV64_SOURCES})
endif()
# Headers exported by MEM module
set(MEM_HEADERS
include/mem_routines.h
)
# Add to main extern headers list
list(APPEND EXTERN_HEADERS ${MEM_HEADERS})
# Add test applications for mem module
if(BUILD_TESTS)
# Check tests (unit tests that are run by CTest)
set(MEM_CHECK_TESTS
mem_zero_detect_test
)
# Create check test executables
foreach(test ${MEM_CHECK_TESTS})
add_executable(${test} mem/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include)
add_test(NAME ${test} COMMAND ${test})
endforeach()
endif()

110
cmake/raid.cmake Normal file

@ -0,0 +1,110 @@
# cmake-format: off
# Copyright (c) 2025, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# cmake-format: on
# RAID module CMake configuration
set(RAID_BASE_SOURCES
raid/raid_base.c
)
set(RAID_BASE_ALIASES_SOURCES
raid/raid_base_aliases.c
)
set(RAID_X86_64_SOURCES
raid/xor_gen_sse.asm
raid/pq_gen_sse.asm
raid/xor_check_sse.asm
raid/pq_check_sse.asm
raid/pq_gen_avx.asm
raid/xor_gen_avx.asm
raid/pq_gen_avx2.asm
raid/pq_gen_avx2_gfni.asm
raid/xor_gen_avx512.asm
raid/pq_gen_avx512.asm
raid/pq_gen_avx512_gfni.asm
raid/raid_multibinary.asm
)
set(RAID_AARCH64_SOURCES
raid/aarch64/xor_gen_neon.S
raid/aarch64/pq_gen_neon.S
raid/aarch64/xor_check_neon.S
raid/aarch64/pq_check_neon.S
raid/aarch64/raid_multibinary_arm.S
raid/aarch64/raid_aarch64_dispatcher.c
)
set(RAID_RISCV64_SOURCES
raid/riscv64/raid_multibinary_riscv64_dispatcher.c
raid/riscv64/raid_multibinary_riscv64.S
raid/riscv64/raid_pq_gen_rvv.S
raid/riscv64/raid_xor_gen_rvv.S
)
# Build source list based on architecture
set(RAID_SOURCES ${RAID_BASE_SOURCES})
if(CPU_X86_64)
list(APPEND RAID_SOURCES ${RAID_X86_64_SOURCES})
elseif(CPU_AARCH64)
list(APPEND RAID_SOURCES ${RAID_AARCH64_SOURCES})
elseif(CPU_PPC64LE)
# PPC64LE uses base aliases
list(APPEND RAID_SOURCES ${RAID_BASE_ALIASES_SOURCES})
elseif(CPU_RISCV64)
list(APPEND RAID_SOURCES ${RAID_RISCV64_SOURCES})
elseif(CPU_UNDEFINED)
list(APPEND RAID_SOURCES ${RAID_BASE_ALIASES_SOURCES})
endif()
# Headers exported by RAID module
set(RAID_HEADERS
include/raid.h
)
# Add to main extern headers list
list(APPEND EXTERN_HEADERS ${RAID_HEADERS})
# Add test applications for raid module
if(BUILD_TESTS)
# Check tests (unit tests that are run by CTest)
set(RAID_CHECK_TESTS
xor_gen_test
pq_gen_test
xor_check_test
pq_check_test
)
# Create check test executables
foreach(test ${RAID_CHECK_TESTS})
add_executable(${test} raid/${test}.c)
target_link_libraries(${test} PRIVATE isal)
target_include_directories(${test} PRIVATE include)
add_test(NAME ${test} COMMAND ${test})
endforeach()
endif()


@ -3,10 +3,9 @@
AC_PREREQ(2.69)
AC_INIT([libisal],
[2.17.0],
[sg.support.isal@intel.com],
[isa-l],
[http://01.org/storage-acceleration-library])
[2.31.1],
[https://github.com/intel/isa-l/issues],
[isa-l])
AC_CONFIG_SRCDIR([])
AC_CONFIG_AUX_DIR([build-aux])
AM_INIT_AUTOMAKE([
@ -22,6 +21,24 @@ AM_INIT_AUTOMAKE([
])
AM_PROG_AS
AC_CANONICAL_HOST
CPU=""
AS_CASE([$host_cpu],
[x86_64], [CPU="x86_64"],
[amd64], [CPU="x86_64"],
[aarch64], [CPU="aarch64"],
[arm64], [CPU="aarch64"],
[powerpc64le], [CPU="ppc64le"],
[ppc64le], [CPU="ppc64le"],
[riscv64], [CPU="riscv64"],
)
AM_CONDITIONAL([CPU_X86_64], [test "$CPU" = "x86_64"])
AM_CONDITIONAL([CPU_AARCH64], [test "$CPU" = "aarch64"])
AM_CONDITIONAL([CPU_PPC64LE], [test "$CPU" = "ppc64le"])
AM_CONDITIONAL([CPU_RISCV64], [test "$CPU" = "riscv64"])
AM_CONDITIONAL([CPU_UNDEFINED], [test "x$CPU" = "x"])
AM_CONDITIONAL([HAVE_RVV], [false])
# Check for programs
AC_PROG_CC_STDC
AC_USE_SYSTEM_EXTENSIONS
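
The `AS_CASE` block in the hunk above folds the different spellings of `$host_cpu` into one canonical `CPU` value, which the `AM_CONDITIONAL` lines then test. A minimal plain-shell sketch of the same mapping is shown here. The function name `normalize_cpu` is hypothetical and does not appear in configure.ac. An unrecognized CPU yields an empty string, which corresponds to the `CPU_UNDEFINED` case.

```sh
#!/bin/sh
# Hypothetical helper mirroring the AS_CASE mapping in configure.ac.
# Prints the canonical CPU name, or an empty line for an unknown CPU.
normalize_cpu() {
    case "$1" in
        x86_64|amd64)        echo "x86_64" ;;
        aarch64|arm64)       echo "aarch64" ;;
        powerpc64le|ppc64le) echo "ppc64le" ;;
        riscv64)             echo "riscv64" ;;
        *)                   echo "" ;;
    esac
}

# Example: both spellings map to the same canonical name.
normalize_cpu amd64
normalize_cpu x86_64
```

Keeping the normalization in one place means the later checks only ever compare against four fixed strings, rather than repeating every alias.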
@ -31,6 +48,40 @@ AC_PREFIX_DEFAULT([/usr])
AC_PROG_SED
AC_PROG_MKDIR_P
case "${CPU}" in
x86_64)
is_x86=yes
;;
riscv64)
AC_MSG_CHECKING([for RVV support])
AC_COMPILE_IFELSE(
[AC_LANG_PROGRAM([], [
__asm__ volatile(
".option arch, +v\n"
"vsetivli zero, 0, e8, m1, ta, ma\n"
);
])],
[AC_DEFINE([HAVE_RVV], [1], [Enable RVV instructions])
AM_CONDITIONAL([HAVE_RVV], [true]) rvv=yes],
[AC_DEFINE([HAVE_RVV], [0], [Disable RVV instructions])
AM_CONDITIONAL([HAVE_RVV], [false]) rvv=no]
)
if test "x$rvv" = "xyes"; then
CFLAGS="$CFLAGS -march=rv64gcv"
CCASFLAGS="$CCASFLAGS -march=rv64gcv"
fi
AC_MSG_RESULT([$rvv])
;;
*)
is_x86=no
esac
# Options
AC_ARG_ENABLE([debug],
AS_HELP_STRING([--enable-debug], [enable debug messages @<:@default=disabled@:>@]),
@ -39,90 +90,105 @@ AS_IF([test "x$enable_debug" = "xyes"], [
AC_DEFINE(ENABLE_DEBUG, [1], [Debug messages.])
])
# Check for yasm and yasm features
AC_CHECK_PROG(HAVE_YASM, yasm, yes, no)
if test "$HAVE_YASM" = "no"; then
AC_MSG_RESULT([no yasm])
else
AC_MSG_CHECKING([for modern yasm])
AC_LANG_CONFTEST([AC_LANG_SOURCE([[vmovdqa %xmm0, %xmm1;]])])
if yasm -f elf64 -p gas conftest.c ; then
with_modern_yasm=yes
AC_MSG_RESULT([yes])
AC_MSG_CHECKING([for optional yasm AVX512 support])
AC_LANG_CONFTEST([AC_LANG_SOURCE([[vpshufb %zmm0, %zmm1, %zmm2;]])])
if yasm -f elf64 -p gas conftest.c 2> /dev/null; then
yasm_knows_avx512=yes
AC_MSG_RESULT([yes])
# If this build is for x86, look for nasm
if test x"$is_x86" = x"yes"; then
AC_MSG_CHECKING([whether Intel CET is enabled])
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[]], [[
#ifndef __CET__
# error CET is not enabled
#endif]])],[AC_MSG_RESULT([yes])
intel_cet_enabled=yes],[AC_MSG_RESULT([no])
intel_cet_enabled=no])
AS_IF([test "x$intel_cet_enabled" = "xyes"], [
AC_DEFINE(INTEL_CET_ENABLED, [1], [Intel CET enabled.])
])
# check if LD -z options are supported
LDFLAGS="\
-Wl,-z,noexecstack \
-Wl,-z,relro \
-Wl,-z,now \
"
AC_MSG_CHECKING([if $LD supports $LDFLAGS])
AC_LINK_IFELSE([AC_LANG_PROGRAM([[]], [[
int main(int argc, char **argv)
{
return 0;
}]])],
[AC_MSG_RESULT([yes])],
[AC_MSG_RESULT([no])
LDFLAGS=""]
)
# Pick NASM assembler
if test x"$AS" = x""; then
# Check for nasm and nasm features
nasm_feature_level=0
AC_CHECK_PROG(HAVE_NASM, nasm, yes, no)
if test "$HAVE_NASM" = "yes"; then
nasm_feature_level=1
else
AC_MSG_RESULT([no])
AC_MSG_RESULT([no nasm])
fi
else
AC_MSG_FAILURE([no])
fi
fi
# Check for nasm and nasm features
AC_CHECK_PROG(HAVE_NASM, nasm, yes, no)
if test "$HAVE_NASM" = "no"; then
AC_MSG_RESULT([no nasm])
else
AC_MSG_CHECKING([for modern nasm])
AC_LANG_CONFTEST([AC_LANG_SOURCE([[pblendvb xmm2, xmm1;]])])
sed -i -e '/pblendvb/!d' conftest.c
if nasm -f elf64 conftest.c 2> /dev/null; then
with_modern_nasm=yes
AC_MSG_RESULT([yes])
AC_MSG_CHECKING([for optional nasm AVX512 support])
AC_LANG_CONFTEST([AC_LANG_SOURCE([[vpshufb zmm0, zmm1, zmm2;]])])
sed -i -e '/vpshufb/!d' conftest.c
if nasm -f elf64 conftest.c 2> /dev/null; then
nasm_knows_avx512=yes
AC_MSG_RESULT([yes])
if test x"$nasm_feature_level" = x"1"; then
AC_MSG_CHECKING([for modern nasm])
AC_LANG_CONFTEST([AC_LANG_SOURCE([[vpcompressb zmm0 {k1}, zmm1;]])])
sed -i -e '/vpcompressb/!d' conftest.c
if nasm -f elf64 conftest.c 2> /dev/null; then
AC_MSG_RESULT([yes])
nasm_feature_level=10
else
AC_MSG_RESULT([no])
fi
fi
AS=nasm
as_feature_level=$nasm_feature_level
else
# Check for $AS supported features
as_feature_level=0
AC_CHECK_PROG(HAVE_AS, $AS, yes, no)
if test "$HAVE_AS" = "yes"; then
as_feature_level=1
else
AC_MSG_RESULT([no])
AC_MSG_ERROR([no $AS])
fi
else
AC_MSG_RESULT([no])
fi
fi
# Pick an assembler yasm or nasm
if test x"$AS" == x""; then
if test x"$yasm_knows_avx512" = x"yes"; then
AS=yasm
elif test x"$nasm_knows_avx512" = x"yes"; then
AS=nasm
elif test x"$with_modern_yasm" = x"yes"; then
AS=yasm
elif test x"$with_modern_nasm" = x"yes"; then
AS=nasm
else
AC_MSG_ERROR([No modern yasm or nasm found as required. Yasm should be 1.2.0 or later, and nasm should be v2.11.01 or later.])
if test x"$as_feature_level" = x"1"; then
AC_LANG_CONFTEST([AC_LANG_SOURCE([[vpcompressb zmm0, k1, zmm1;]])])
sed -i -e '/vpcompressb/!d' conftest.c
if $AS -f elf64 conftest.c 2> /dev/null; then
AC_MSG_RESULT([yes])
as_feature_level=10
else
AC_MSG_RESULT([no])
fi
fi
fi
fi
echo "Using assembler $AS"
if test \( x"$AS" = x"yasm" -a x"$yasm_knows_avx512" = x"yes" \) -o \( x"$AS" = x"nasm" -a x"$nasm_knows_avx512" = x"yes" \); then
AC_DEFINE(HAVE_AS_KNOWS_AVX512, [1], [Assembler can do AVX512.])
have_as_knows_avx512=yes
if test $as_feature_level -lt 10 ; then
AC_MSG_ERROR([No modern nasm found as required. Nasm should be v2.14.01 or later.])
fi
case $host_os in
*linux*) arch=linux asm_args="-f elf64";;
*darwin*) arch=darwin asm_args="-f macho64 --prefix=_ ";;
*netbsd*) arch=netbsd asm_args="-f elf64";;
*mingw*) arch=mingw asm_args="-f win64";;
*) arch=unknown asm_args="-f elf64";;
esac
AM_CONDITIONAL(USE_NASM, test x"$AS" = x"nasm")
AC_SUBST([asm_args])
AM_CONDITIONAL(DARWIN, test x"$arch" = x"darwin")
AC_MSG_RESULT([Using $AS args target "$arch" "$asm_args"])
else
AC_MSG_RESULT([Assembler does not understand AVX512 opcodes. Consider upgrading for best performance.])
# Disable below conditionals if not x86
AM_CONDITIONAL(USE_NASM, test "x" = "y")
AM_CONDITIONAL(DARWIN, test "x" = "y")
fi
AM_CONDITIONAL(USE_YASM, test x"$AS" = x"yasm")
AM_CONDITIONAL(USE_NASM, test x"$AS" = x"nasm")
AM_CONDITIONAL(WITH_AVX512, test x"$have_as_knows_avx512" = x"yes")
case $target in
*linux*) arch=linux yasm_args="-f elf64";;
*darwin*) arch=darwin yasm_args="-f macho64 --prefix=_ ";;
*netbsd*) arch=netbsd yasm_args="-f elf64";;
*) arch=unknown yasm_args="-f elf64";;
esac
AC_SUBST([yasm_args])
AM_CONDITIONAL(DARWIN, test x"$arch" = x"darwin")
AC_MSG_RESULT([Using yasm args target "$arch" "$yasm_args"])
# Check for header files
AC_CHECK_HEADERS([limits.h stdint.h stdlib.h string.h])
@ -137,7 +203,7 @@ AC_TYPE_UINT8_T
# Checks for library functions.
AC_FUNC_MALLOC # Used only in tests
AC_CHECK_FUNCS([memmove memset])
AC_CHECK_FUNCS([memmove memset getopt])
my_CFLAGS="\
-Wall \
@ -148,6 +214,8 @@ my_CFLAGS="\
-Wshadow \
-Wstrict-prototypes \
-Wtype-limits \
-fstack-protector \
-D_FORTIFY_SOURCE=2 \
"
AC_SUBST([my_CFLAGS])


@ -1,5 +1,5 @@
########################################################################
# Copyright(c) 2011-2015 Intel Corporation All rights reserved.
# Copyright(c) 2011-2017 Intel Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@ -27,33 +27,68 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
########################################################################
include crc/aarch64/Makefile.am
lsrc += \
crc/crc_base.c \
crc/crc64_base.c
lsrc_base_aliases += crc/crc_base_aliases.c
lsrc_ppc64le += crc/crc_base_aliases.c
lsrc_riscv64 += crc/crc_base_aliases.c
lsrc_x86_64 += \
crc/crc16_t10dif_01.asm \
crc/crc16_t10dif_by4.asm \
crc/crc16_t10dif_02.asm \
crc/crc16_t10dif_by16_10.asm \
crc/crc16_t10dif_copy_by4.asm \
crc/crc16_t10dif_copy_by4_02.asm \
crc/crc32_ieee_01.asm \
crc/crc32_ieee_02.asm \
crc/crc32_ieee_by4.asm \
crc/crc32_ieee_by16_10.asm \
crc/crc32_iscsi_01.asm \
crc/crc32_iscsi_00.asm \
crc/crc32_iscsi_by8_02.asm \
crc/crc32_iscsi_by16_10.asm \
crc/crc_multibinary.asm \
crc/crc64_multibinary.asm \
crc/crc64_ecma_refl_by8.asm \
crc/crc64_ecma_refl_by16_10.asm \
crc/crc64_ecma_norm_by8.asm \
crc/crc64_ecma_norm_by16_10.asm \
crc/crc64_iso_refl_by8.asm \
crc/crc64_iso_refl_by16_10.asm \
crc/crc64_iso_norm_by8.asm \
crc/crc64_iso_norm_by16_10.asm \
crc/crc64_jones_refl_by8.asm \
crc/crc64_jones_refl_by16_10.asm \
crc/crc64_jones_norm_by8.asm \
crc/crc64_base.c \
crc/crc_multibinary.asm \
crc/crc_base.c
crc/crc64_jones_norm_by16_10.asm \
crc/crc64_rocksoft_refl_by8.asm \
crc/crc64_rocksoft_refl_by16_10.asm \
crc/crc64_rocksoft_norm_by8.asm \
crc/crc64_rocksoft_norm_by16_10.asm \
crc/crc32_gzip_refl_by8.asm \
crc/crc32_gzip_refl_by8_02.asm \
crc/crc32_gzip_refl_by16_10.asm
src_include += -I $(srcdir)/crc
extern_hdrs += include/crc.h include/crc64.h
other_src += include/reg_sizes.asm include/types.h include/test.h
other_src += include/reg_sizes.asm include/test.h \
crc/crc_ref.h crc/crc64_ref.h
check_tests += crc/crc16_t10dif_test crc/crc32_ieee_test crc/crc32_iscsi_test \
crc/crc64_funcs_test
check_tests += crc/crc16_t10dif_test \
crc/crc16_t10dif_copy_test \
crc/crc64_funcs_test \
crc/crc32_funcs_test
perf_tests += crc/crc16_t10dif_perf crc/crc32_ieee_perf crc/crc32_iscsi_perf \
crc/crc64_funcs_perf
perf_tests += crc/crc16_t10dif_perf crc/crc16_t10dif_copy_perf \
crc/crc16_t10dif_op_perf \
crc/crc32_ieee_perf crc/crc32_iscsi_perf \
crc/crc64_funcs_perf crc/crc32_gzip_refl_perf \
crc/crc_funcs_perf
examples += crc/crc_simple_test crc/crc64_example

60
crc/aarch64/Makefile.am Normal file

@ -0,0 +1,60 @@
########################################################################
# Copyright(c) 2020 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
lsrc_aarch64 += \
crc/aarch64/crc_multibinary_arm.S \
crc/aarch64/crc_aarch64_dispatcher.c
lsrc_aarch64 += \
crc/aarch64/crc16_t10dif_pmull.S \
crc/aarch64/crc16_t10dif_copy_pmull.S \
crc/aarch64/crc32_ieee_norm_pmull.S \
crc/aarch64/crc64_ecma_refl_pmull.S \
crc/aarch64/crc64_ecma_norm_pmull.S \
crc/aarch64/crc64_iso_refl_pmull.S \
crc/aarch64/crc64_iso_norm_pmull.S \
crc/aarch64/crc64_jones_refl_pmull.S \
crc/aarch64/crc64_jones_norm_pmull.S \
crc/aarch64/crc64_rocksoft_refl_pmull.S \
crc/aarch64/crc64_rocksoft_norm_pmull.S
# CRC32/CRC32C variants selected per micro-architecture
lsrc_aarch64 += \
crc/aarch64/crc32_iscsi_refl_pmull.S \
crc/aarch64/crc32_gzip_refl_pmull.S \
crc/aarch64/crc32_iscsi_3crc_fold.S \
crc/aarch64/crc32_gzip_refl_3crc_fold.S \
crc/aarch64/crc32_iscsi_crc_ext.S \
crc/aarch64/crc32_gzip_refl_crc_ext.S \
crc/aarch64/crc32_mix_default.S \
crc/aarch64/crc32c_mix_default.S \
crc/aarch64/crc32_mix_neoverse_n1.S \
crc/aarch64/crc32c_mix_neoverse_n1.S


@ -0,0 +1,437 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
.arch armv8-a+crc+crypto
.text
.align 3
.global cdecl(crc16_t10dif_copy_pmull)
#ifndef __APPLE__
.type crc16_t10dif_copy_pmull, %function
#endif
/* uint16_t crc16_t10dif_copy_pmull(uint16_t seed, uint8_t *dst, uint8_t *src, uint64_t len) */
/* arguments */
w_seed .req w0
x_dst .req x1
x_src .req x2
x_len .req x3
w_len .req w3
/* returns */
w_ret .req w0
/* these are global temporary registers */
w_tmp .req w6
x_tmp .req x6
x_tmp1 .req x7
x_tmp2 .req x11
d_tmp1 .req d0
d_tmp2 .req d1
q_tmp1 .req q0
q_tmp2 .req q1
v_tmp1 .req v0
v_tmp2 .req v1
/* local variables */
w_counter .req w4
w_crc .req w0
x_crc .req x0
x_counter .req x4
x_crc16tab .req x5
x_src_saved .req x0
x_dst_saved .req x12
cdecl(crc16_t10dif_copy_pmull):
cmp x_len, 63
sub sp, sp, #16
uxth w_seed, w_seed
bhi .crc_fold
mov x_tmp, 0
mov w_counter, 0
.crc_table_loop_pre:
cmp x_len, x_tmp
bls .end
#ifndef __APPLE__
sxtw x_counter, w_counter
adrp x_crc16tab, .LANCHOR0
sub x_src, x_src, x_counter
sub x_dst, x_dst, x_counter
add x_crc16tab, x_crc16tab, :lo12:.LANCHOR0
#else
sxtw x_counter, w_counter
adrp x_crc16tab, .LANCHOR0@PAGE
sub x_src, x_src, x_counter
sub x_dst, x_dst, x_counter
add x_crc16tab, x_crc16tab, .LANCHOR0@PAGEOFF
#endif
.align 2
.crc_table_loop:
ldrb w_tmp, [x_src, x_counter]
strb w_tmp, [x_dst, x_counter]
add x_counter, x_counter, 1
cmp x_len, x_counter
eor w_tmp, w_tmp, w_crc, lsr 8
ldrh w_tmp, [x_crc16tab, w_tmp, sxtw 1]
eor w_crc, w_tmp, w_crc, lsl 8
uxth w_crc, w_crc
bhi .crc_table_loop
.end:
add sp, sp, 16
ret
/* carry less multiplication, part1 - before loop */
q_x0 .req q2
q_x1 .req q3
q_x2 .req q4
q_x3 .req q5
v_x0 .req v2
v_x1 .req v3
v_x2 .req v4
v_x3 .req v5
d_x0 .req d2
d_x1 .req d3
d_x2 .req d4
d_x3 .req d5
q_permutation .req q7
v_permutation .req v7
// the following registers are only used in part1
d_tmp3 .req d16
v_tmp3 .req v16
.align 3
.crc_fold:
fmov d_tmp1, x_crc
fmov d_tmp2, xzr
dup d_tmp3, v_tmp2.d[0]
shl d_tmp1, d_tmp1, 48
ins v_tmp3.d[1], v_tmp1.d[0]
and x_counter, x_len, -64
sub x_counter, x_counter, #64
cmp x_counter, 63
add x_src_saved, x_src, 64
add x_dst_saved, x_dst, 64
ldp q_x0, q_x1, [x_src]
ldp q_x2, q_x3, [x_src, 32]
stp q_x0, q_x1, [x_dst]
stp q_x2, q_x3, [x_dst, 32]
#ifndef __APPLE__
adrp x_tmp, .shuffle_mask_lanchor
ldr q_permutation, [x_tmp, :lo12:.shuffle_mask_lanchor]
#else
adrp x_tmp, .shuffle_mask_lanchor@PAGE
ldr q_permutation, [x_tmp, .shuffle_mask_lanchor@PAGEOFF]
#endif
tbl v_tmp1.16b, {v_x0.16b}, v7.16b
eor v_x0.16b, v_tmp3.16b, v_tmp1.16b
tbl v_x1.16b, {v_x1.16b}, v7.16b
tbl v_x2.16b, {v_x2.16b}, v7.16b
tbl v_x3.16b, {v_x3.16b}, v7.16b
bls .crc_fold_loop_end
/* carry less multiplication, part2 - loop */
q_y0 .req q28
q_y1 .req q29
q_y2 .req q30
q_y3 .req q31
v_y0 .req v28
v_y1 .req v29
v_y2 .req v30
v_y3 .req v31
d_x0_h .req d24
d_x0_l .req d2
d_x1_h .req d25
d_x1_l .req d3
d_x2_h .req d26
d_x2_l .req d4
d_x3_h .req d27
d_x3_l .req d5
v_x0_h .req v24
v_x0_l .req v2
v_x1_h .req v25
v_x1_l .req v3
v_x2_h .req v26
v_x2_l .req v4
v_x3_h .req v27
v_x3_l .req v5
v_tmp1_x0 .req v24
v_tmp1_x1 .req v25
v_tmp1_x2 .req v26
v_tmp1_x3 .req v27
q_fold_const .req q17
v_fold_const .req v17
ldr q_fold_const, fold_constant
.align 2
.crc_fold_loop:
add x_src_saved, x_src_saved, 64
add x_dst_saved, x_dst_saved, 64
sub x_counter, x_counter, #64
cmp x_counter, 63
ldp q_y0, q_y1, [x_src_saved, -64]
ldp q_y2, q_y3, [x_src_saved, -32]
stp q_y0, q_y1, [x_dst_saved, -64]
stp q_y2, q_y3, [x_dst_saved, -32]
prfm pldl2strm, [x_src_saved, #1024]
prfm pldl2strm, [x_src_saved, #1088]
pmull2 v_tmp1_x0.1q, v_x0.2d, v_fold_const.2d
pmull v_x0.1q, v_x0.1d, v_fold_const.1d
pmull2 v_tmp1_x1.1q, v_x1.2d, v_fold_const.2d
pmull v_x1.1q, v_x1.1d, v_fold_const.1d
pmull2 v_tmp1_x2.1q, v_x2.2d, v_fold_const.2d
pmull v_x2.1q, v_x2.1d, v_fold_const.1d
pmull2 v_tmp1_x3.1q, v_x3.2d, v_fold_const.2d
pmull v_x3.1q, v_x3.1d, v_fold_const.1d
tbl v_y0.16b, {v_y0.16b}, v_permutation.16b
eor v_x0.16b, v_tmp1_x0.16b, v_x0.16b
eor v_x0.16b, v_x0.16b, v_y0.16b
tbl v_y1.16b, {v_y1.16b}, v_permutation.16b
eor v_x1.16b, v_tmp1_x1.16b, v_x1.16b
eor v_x1.16b, v_x1.16b, v_y1.16b
tbl v_y2.16b, {v_y2.16b}, v_permutation.16b
eor v_x2.16b, v_tmp1_x2.16b, v_x2.16b
eor v_x2.16b, v_x2.16b, v_y2.16b
tbl v_y3.16b, {v_y3.16b}, v_permutation.16b
eor v_x3.16b, v_tmp1_x3.16b, v_x3.16b
eor v_x3.16b, v_x3.16b, v_y3.16b
bhi .crc_fold_loop
/* carry less multiplication, part3 - after loop */
/* folding 512bit ---> 128bit */
// input parameters:
// v_x0 => v2
// v_x1 => v3
// v_x2 => v4
// v_x3 => v5
// v0, v1, v6, v16, v30 are tmp registers
.crc_fold_loop_end:
mov x_tmp, 0x4c1a0000 /* p1 [1] */
fmov d0, x_tmp
mov x_tmp, 0xfb0b0000 /* p1 [0] */
fmov d1, x_tmp
and w_counter, w_len, -64
sxtw x_tmp, w_counter
add x_src, x_src, x_tmp
add x_dst, x_dst, x_tmp
dup d6, v_x0.d[1]
dup d30, v_x0.d[0]
pmull v6.1q, v6.1d, v0.1d
pmull v30.1q, v30.1d, v1.1d
eor v6.16b, v6.16b, v30.16b
eor v_x1.16b, v6.16b, v_x1.16b
dup d6, v_x1.d[1]
dup d30, v_x1.d[0]
pmull v6.1q, v6.1d, v0.1d
pmull v16.1q, v30.1d, v1.1d
eor v6.16b, v6.16b, v16.16b
eor v_x2.16b, v6.16b, v_x2.16b
dup d_x0, v_x2.d[1]
dup d30, v_x2.d[0]
pmull v0.1q, v_x0.1d, v0.1d
pmull v_x0.1q, v30.1d, v1.1d
eor v1.16b, v0.16b, v_x0.16b
eor v_x0.16b, v1.16b, v_x3.16b
/* carry less multiplication, part3 - after loop */
/* crc16 fold function */
d_16fold_p0_h .req d18
v_16fold_p0_h .req v18
d_16fold_p0_l .req d4
v_16fold_p0_l .req v4
v_16fold_from .req v_x0
d_16fold_from_h .req d3
v_16fold_from_h .req v3
v_16fold_zero .req v7
v_16fold_from1 .req v16
v_16fold_from2 .req v0
d_16fold_from2_h .req d6
v_16fold_from2_h .req v6
v_16fold_tmp .req v0
movi v_16fold_zero.4s, 0
mov x_tmp1, 0x2d560000 /* p0 [1] */
mov x_tmp2, 0x13680000 /* p0 [0] */
ext v_16fold_tmp.16b, v_16fold_zero.16b, v_16fold_from.16b, #8
ext v_16fold_tmp.16b, v0.16b, v_16fold_zero.16b, #4
dup d_16fold_from_h, v_16fold_from.d[1]
fmov d_16fold_p0_h, x_tmp1
pmull v_16fold_from1.1q, v_16fold_from_h.1d, v_16fold_p0_h.1d
eor v_16fold_from2.16b, v_16fold_tmp.16b, v_16fold_from1.16b
dup d_16fold_from2_h, v_16fold_from2.d[1]
fmov d_16fold_p0_l, x_tmp2
pmull v6.1q, v_16fold_from2_h.1d, v_16fold_p0_l.1d
eor v_x0.16b, v0.16b, v6.16b
/* carry less multiplication, part3 - after loop */
/* crc16 barrett reduction function */
// input parameters:
// v_x0: v2
// barrett reduction constant: br[0], br[1]
d_br0 .req d3
v_br0 .req v3
d_br1 .req d5
v_br1 .req v5
mov x_tmp1, 0x57f9 /* br[0] low */
movk x_tmp1, 0xf65a, lsl 16 /* br[0] high */
movk x_tmp1, 0x1, lsl 32
fmov d_br0, x_tmp1
dup d1, v_x0.d[0]
dup d1, v1.d[0]
ext v1.16b, v1.16b, v7.16b, #4
pmull v4.1q, v1.1d, v_br0.1d
ext v1.16b, v4.16b, v7.16b, #4
mov x_tmp1, 0x8bb70000 /* br[1] low */
movk x_tmp1, 0x1, lsl 32 /* br[1] high */
fmov d_br1, x_tmp1
pmull v_br1.1q, v1.1d, v_br1.1d
eor v_x0.16b, v_x0.16b, v_br1.16b
umov x0, v_x0.d[0]
ubfx x0, x0, 16, 16
b .crc_table_loop_pre
#ifndef __APPLE__
.size crc16_t10dif_copy_pmull, .-crc16_t10dif_copy_pmull
#endif
.align 4
fold_constant:
.word 0x87e70000
.word 0x00000000
.word 0x371d0000
.word 0x00000000
ASM_DEF_RODATA
.shuffle_mask_lanchor = . + 0
#ifndef __APPLE__
.type shuffle_mask, %object
.size shuffle_mask, 16
#endif
shuffle_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8
.byte 7, 6, 5, 4, 3, 2, 1, 0
.align 4
.LANCHOR0 = . + 0
#ifndef __APPLE__
.type crc16tab, %object
.size crc16tab, 512
#endif
crc16tab:
.hword 0x0000, 0x8bb7, 0x9cd9, 0x176e, 0xb205, 0x39b2, 0x2edc, 0xa56b
.hword 0xEFBD, 0x640a, 0x7364, 0xf8d3, 0x5db8, 0xd60f, 0xc161, 0x4ad6
.hword 0x54CD, 0xdf7a, 0xc814, 0x43a3, 0xe6c8, 0x6d7f, 0x7a11, 0xf1a6
.hword 0xBB70, 0x30c7, 0x27a9, 0xac1e, 0x0975, 0x82c2, 0x95ac, 0x1e1b
.hword 0xA99A, 0x222d, 0x3543, 0xbef4, 0x1b9f, 0x9028, 0x8746, 0x0cf1
.hword 0x4627, 0xcd90, 0xdafe, 0x5149, 0xf422, 0x7f95, 0x68fb, 0xe34c
.hword 0xFD57, 0x76e0, 0x618e, 0xea39, 0x4f52, 0xc4e5, 0xd38b, 0x583c
.hword 0x12EA, 0x995d, 0x8e33, 0x0584, 0xa0ef, 0x2b58, 0x3c36, 0xb781
.hword 0xD883, 0x5334, 0x445a, 0xcfed, 0x6a86, 0xe131, 0xf65f, 0x7de8
.hword 0x373E, 0xbc89, 0xabe7, 0x2050, 0x853b, 0x0e8c, 0x19e2, 0x9255
.hword 0x8C4E, 0x07f9, 0x1097, 0x9b20, 0x3e4b, 0xb5fc, 0xa292, 0x2925
.hword 0x63F3, 0xe844, 0xff2a, 0x749d, 0xd1f6, 0x5a41, 0x4d2f, 0xc698
.hword 0x7119, 0xfaae, 0xedc0, 0x6677, 0xc31c, 0x48ab, 0x5fc5, 0xd472
.hword 0x9EA4, 0x1513, 0x027d, 0x89ca, 0x2ca1, 0xa716, 0xb078, 0x3bcf
.hword 0x25D4, 0xae63, 0xb90d, 0x32ba, 0x97d1, 0x1c66, 0x0b08, 0x80bf
.hword 0xCA69, 0x41de, 0x56b0, 0xdd07, 0x786c, 0xf3db, 0xe4b5, 0x6f02
.hword 0x3AB1, 0xb106, 0xa668, 0x2ddf, 0x88b4, 0x0303, 0x146d, 0x9fda
.hword 0xD50C, 0x5ebb, 0x49d5, 0xc262, 0x6709, 0xecbe, 0xfbd0, 0x7067
.hword 0x6E7C, 0xe5cb, 0xf2a5, 0x7912, 0xdc79, 0x57ce, 0x40a0, 0xcb17
.hword 0x81C1, 0x0a76, 0x1d18, 0x96af, 0x33c4, 0xb873, 0xaf1d, 0x24aa
.hword 0x932B, 0x189c, 0x0ff2, 0x8445, 0x212e, 0xaa99, 0xbdf7, 0x3640
.hword 0x7C96, 0xf721, 0xe04f, 0x6bf8, 0xce93, 0x4524, 0x524a, 0xd9fd
.hword 0xC7E6, 0x4c51, 0x5b3f, 0xd088, 0x75e3, 0xfe54, 0xe93a, 0x628d
.hword 0x285B, 0xa3ec, 0xb482, 0x3f35, 0x9a5e, 0x11e9, 0x0687, 0x8d30
.hword 0xE232, 0x6985, 0x7eeb, 0xf55c, 0x5037, 0xdb80, 0xccee, 0x4759
.hword 0x0D8F, 0x8638, 0x9156, 0x1ae1, 0xbf8a, 0x343d, 0x2353, 0xa8e4
.hword 0xB6FF, 0x3d48, 0x2a26, 0xa191, 0x04fa, 0x8f4d, 0x9823, 0x1394
.hword 0x5942, 0xd2f5, 0xc59b, 0x4e2c, 0xeb47, 0x60f0, 0x779e, 0xfc29
.hword 0x4BA8, 0xc01f, 0xd771, 0x5cc6, 0xf9ad, 0x721a, 0x6574, 0xeec3
.hword 0xA415, 0x2fa2, 0x38cc, 0xb37b, 0x1610, 0x9da7, 0x8ac9, 0x017e
.hword 0x1F65, 0x94d2, 0x83bc, 0x080b, 0xad60, 0x26d7, 0x31b9, 0xba0e
.hword 0xF0D8, 0x7b6f, 0x6c01, 0xe7b6, 0x42dd, 0xc96a, 0xde04, 0x55b3


@ -0,0 +1,422 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
.arch armv8-a+crc+crypto
.text
.align 3
.global cdecl(crc16_t10dif_pmull)
#ifndef __APPLE__
.type crc16_t10dif_pmull, %function
#endif
/* uint16_t crc16_t10dif_pmull(uint16_t seed, uint8_t *buf, uint64_t len) */
/* arguments */
w_seed .req w0
x_buf .req x1
x_len .req x2
w_len .req w2
/* returns */
w_ret .req w0
/* these are global temporary registers */
w_tmp .req w5
x_tmp .req x5
x_tmp1 .req x6
x_tmp2 .req x7
d_tmp1 .req d0
d_tmp2 .req d1
q_tmp1 .req q0
q_tmp2 .req q1
v_tmp1 .req v0
v_tmp2 .req v1
/* local variables */
w_counter .req w3
w_crc .req w0
x_crc .req x0
x_counter .req x3
x_crc16tab .req x4
x_buf_saved .req x0
cdecl(crc16_t10dif_pmull):
cmp x_len, 63
sub sp, sp, #16
uxth w_seed, w_seed
bhi .crc_fold
mov x_tmp, 0
mov w_counter, 0
.crc_table_loop_pre:
cmp x_len, x_tmp
bls .end
#ifndef __APPLE__
sxtw x_counter, w_counter
adrp x_crc16tab, .LANCHOR0
sub x_buf, x_buf, x_counter
add x_crc16tab, x_crc16tab, :lo12:.LANCHOR0
#else
sxtw x_counter, w_counter
adrp x_crc16tab, .LANCHOR0@PAGE
sub x_buf, x_buf, x_counter
add x_crc16tab, x_crc16tab, .LANCHOR0@PAGEOFF
#endif
.align 2
.crc_table_loop:
ldrb w_tmp, [x_buf, x_counter]
add x_counter, x_counter, 1
cmp x_len, x_counter
eor w_tmp, w_tmp, w_crc, lsr 8
ldrh w_tmp, [x_crc16tab, w_tmp, sxtw 1]
eor w_crc, w_tmp, w_crc, lsl 8
uxth w_crc, w_crc
bhi .crc_table_loop
.end:
add sp, sp, 16
ret
/* carry less multiplication, part1 - before loop */
q_x0 .req q2
q_x1 .req q3
q_x2 .req q4
q_x3 .req q5
v_x0 .req v2
v_x1 .req v3
v_x2 .req v4
v_x3 .req v5
d_x0 .req d2
d_x1 .req d3
d_x2 .req d4
d_x3 .req d5
q_permutation .req q7
v_permutation .req v7
// the following registers are only used in part1
d_tmp3 .req d16
v_tmp3 .req v16
.align 3
.crc_fold:
fmov d_tmp1, x_crc
fmov d_tmp2, xzr
dup d_tmp3, v_tmp2.d[0]
shl d_tmp1, d_tmp1, 48
ins v_tmp3.d[1], v_tmp1.d[0]
and x_counter, x_len, -64
sub x_counter, x_counter, #64
cmp x_counter, 63
add x_buf_saved, x_buf, 64
ldp q_x0, q_x1, [x_buf]
ldp q_x2, q_x3, [x_buf, 32]
#ifndef __APPLE__
adrp x_tmp, .shuffle_mask_lanchor
ldr q7, [x_tmp, :lo12:.shuffle_mask_lanchor]
#else
adrp x_tmp, .shuffle_mask_lanchor@PAGE
ldr q7, [x_tmp, .shuffle_mask_lanchor@PAGEOFF]
#endif
tbl v_tmp1.16b, {v_x0.16b}, v7.16b
eor v_x0.16b, v_tmp3.16b, v_tmp1.16b
tbl v_x1.16b, {v_x1.16b}, v7.16b
tbl v_x2.16b, {v_x2.16b}, v7.16b
tbl v_x3.16b, {v_x3.16b}, v7.16b
bls .crc_fold_loop_end
/* carry less multiplication, part2 - loop */
q_y0 .req q28
q_y1 .req q29
q_y2 .req q30
q_y3 .req q31
v_y0 .req v28
v_y1 .req v29
v_y2 .req v30
v_y3 .req v31
d_x0_h .req d24
d_x0_l .req d2
d_x1_h .req d25
d_x1_l .req d3
d_x2_h .req d26
d_x2_l .req d4
d_x3_h .req d27
d_x3_l .req d5
v_x0_h .req v24
v_x0_l .req v2
v_x1_h .req v25
v_x1_l .req v3
v_x2_h .req v26
v_x2_l .req v4
v_x3_h .req v27
v_x3_l .req v5
v_tmp1_x0 .req v24
v_tmp1_x1 .req v25
v_tmp1_x2 .req v26
v_tmp1_x3 .req v27
q_fold_const .req q17
v_fold_const .req v17
ldr q_fold_const, fold_constant
.align 2
.crc_fold_loop:
add x_buf_saved, x_buf_saved, 64
sub x_counter, x_counter, #64
cmp x_counter, 63
ldp q_y0, q_y1, [x_buf_saved, -64]
ldp q_y2, q_y3, [x_buf_saved, -32]
prfm pldl2strm, [x_buf_saved, #1024]
prfm pldl2strm, [x_buf_saved, #1088]
pmull2 v_tmp1_x0.1q, v_x0.2d, v_fold_const.2d
pmull v_x0.1q, v_x0.1d, v_fold_const.1d
pmull2 v_tmp1_x1.1q, v_x1.2d, v_fold_const.2d
pmull v_x1.1q, v_x1.1d, v_fold_const.1d
pmull2 v_tmp1_x2.1q, v_x2.2d, v_fold_const.2d
pmull v_x2.1q, v_x2.1d, v_fold_const.1d
pmull2 v_tmp1_x3.1q, v_x3.2d, v_fold_const.2d
pmull v_x3.1q, v_x3.1d, v_fold_const.1d
tbl v_y0.16b, {v_y0.16b}, v_permutation.16b
eor v_x0.16b, v_tmp1_x0.16b, v_x0.16b
eor v_x0.16b, v_x0.16b, v_y0.16b
tbl v_y1.16b, {v_y1.16b}, v_permutation.16b
eor v_x1.16b, v_tmp1_x1.16b, v_x1.16b
eor v_x1.16b, v_x1.16b, v_y1.16b
tbl v_y2.16b, {v_y2.16b}, v_permutation.16b
eor v_x2.16b, v_tmp1_x2.16b, v_x2.16b
eor v_x2.16b, v_x2.16b, v_y2.16b
tbl v_y3.16b, {v_y3.16b}, v_permutation.16b
eor v_x3.16b, v_tmp1_x3.16b, v_x3.16b
eor v_x3.16b, v_x3.16b, v_y3.16b
bhi .crc_fold_loop
/* carry less multiplication, part3 - after loop */
/* folding 512bit ---> 128bit */
// input parameters:
// v_x0 => v2
// v_x1 => v3
// v_x2 => v4
// v_x3 => v5
// v0, v1, v6, v16, v30 are tmp registers
.crc_fold_loop_end:
mov x_tmp, 0x4c1a0000 /* p1 [1] */
fmov d0, x_tmp
mov x_tmp, 0xfb0b0000 /* p1 [0] */
fmov d1, x_tmp
and w_counter, w_len, -64
sxtw x_tmp, w_counter
add x_buf, x_buf, x_tmp
dup d6, v_x0.d[1]
dup d30, v_x0.d[0]
pmull v6.1q, v6.1d, v0.1d
pmull v30.1q, v30.1d, v1.1d
eor v6.16b, v6.16b, v30.16b
eor v_x1.16b, v6.16b, v_x1.16b
dup d6, v_x1.d[1]
dup d30, v_x1.d[0]
pmull v6.1q, v6.1d, v0.1d
pmull v16.1q, v30.1d, v1.1d
eor v6.16b, v6.16b, v16.16b
eor v_x2.16b, v6.16b, v_x2.16b
dup d_x0, v_x2.d[1]
dup d30, v_x2.d[0]
pmull v0.1q, v_x0.1d, v0.1d
pmull v_x0.1q, v30.1d, v1.1d
eor v1.16b, v0.16b, v_x0.16b
eor v_x0.16b, v1.16b, v_x3.16b
/* carry less multiplication, part3 - after loop */
/* crc16 fold function */
d_16fold_p0_h .req d18
v_16fold_p0_h .req v18
d_16fold_p0_l .req d4
v_16fold_p0_l .req v4
v_16fold_from .req v_x0
d_16fold_from_h .req d3
v_16fold_from_h .req v3
v_16fold_zero .req v7
v_16fold_from1 .req v16
v_16fold_from2 .req v0
d_16fold_from2_h .req d6
v_16fold_from2_h .req v6
v_16fold_tmp .req v0
movi v_16fold_zero.4s, 0
mov x_tmp1, 0x2d560000 /* p0 [1] */
mov x_tmp2, 0x13680000 /* p0 [0] */
ext v_16fold_tmp.16b, v_16fold_zero.16b, v_16fold_from.16b, #8
ext v_16fold_tmp.16b, v0.16b, v_16fold_zero.16b, #4
dup d_16fold_from_h, v_16fold_from.d[1]
fmov d_16fold_p0_h, x_tmp1
pmull v_16fold_from1.1q, v_16fold_from_h.1d, v_16fold_p0_h.1d
eor v_16fold_from2.16b, v_16fold_tmp.16b, v_16fold_from1.16b
dup d_16fold_from2_h, v_16fold_from2.d[1]
fmov d_16fold_p0_l, x_tmp2
pmull v6.1q, v_16fold_from2_h.1d, v_16fold_p0_l.1d
eor v_x0.16b, v0.16b, v6.16b
/* carry less multiplication, part3 - after loop */
/* crc16 barrett reduction function */
// input parameters:
// v_x0: v2
// barrett reduction constant: br[0], br[1]
d_br0 .req d3
v_br0 .req v3
d_br1 .req d5
v_br1 .req v5
mov x_tmp1, 0x57f9 /* br[0] low */
movk x_tmp1, 0xf65a, lsl 16 /* br[0] high */
movk x_tmp1, 0x1, lsl 32
fmov d_br0, x_tmp1
dup d1, v_x0.d[0]
dup d1, v1.d[0]
ext v1.16b, v1.16b, v7.16b, #4
pmull v4.1q, v1.1d, v_br0.1d
ext v1.16b, v4.16b, v7.16b, #4
mov x_tmp1, 0x8bb70000 /* br[1] low */
movk x_tmp1, 0x1, lsl 32 /* br[1] high */
fmov d_br1, x_tmp1
pmull v_br1.1q, v1.1d, v_br1.1d
eor v_x0.16b, v_x0.16b, v_br1.16b
umov x0, v_x0.d[0]
ubfx x0, x0, 16, 16
b .crc_table_loop_pre
#ifndef __APPLE__
.size crc16_t10dif_pmull, .-crc16_t10dif_pmull
#endif
.align 4
fold_constant:
.word 0x87e70000
.word 0x00000000
.word 0x371d0000
.word 0x00000000
ASM_DEF_RODATA
.shuffle_mask_lanchor = . + 0
#ifndef __APPLE__
.type shuffle_mask, %object
.size shuffle_mask, 16
#endif
shuffle_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8
.byte 7, 6, 5, 4, 3, 2, 1, 0
.align 4
.LANCHOR0 = . + 0
#ifndef __APPLE__
.type crc16tab, %object
.size crc16tab, 512
#endif
crc16tab:
.hword 0x0000, 0x8bb7, 0x9cd9, 0x176e, 0xb205, 0x39b2, 0x2edc, 0xa56b
.hword 0xEFBD, 0x640a, 0x7364, 0xf8d3, 0x5db8, 0xd60f, 0xc161, 0x4ad6
.hword 0x54CD, 0xdf7a, 0xc814, 0x43a3, 0xe6c8, 0x6d7f, 0x7a11, 0xf1a6
.hword 0xBB70, 0x30c7, 0x27a9, 0xac1e, 0x0975, 0x82c2, 0x95ac, 0x1e1b
.hword 0xA99A, 0x222d, 0x3543, 0xbef4, 0x1b9f, 0x9028, 0x8746, 0x0cf1
.hword 0x4627, 0xcd90, 0xdafe, 0x5149, 0xf422, 0x7f95, 0x68fb, 0xe34c
.hword 0xFD57, 0x76e0, 0x618e, 0xea39, 0x4f52, 0xc4e5, 0xd38b, 0x583c
.hword 0x12EA, 0x995d, 0x8e33, 0x0584, 0xa0ef, 0x2b58, 0x3c36, 0xb781
.hword 0xD883, 0x5334, 0x445a, 0xcfed, 0x6a86, 0xe131, 0xf65f, 0x7de8
.hword 0x373E, 0xbc89, 0xabe7, 0x2050, 0x853b, 0x0e8c, 0x19e2, 0x9255
.hword 0x8C4E, 0x07f9, 0x1097, 0x9b20, 0x3e4b, 0xb5fc, 0xa292, 0x2925
.hword 0x63F3, 0xe844, 0xff2a, 0x749d, 0xd1f6, 0x5a41, 0x4d2f, 0xc698
.hword 0x7119, 0xfaae, 0xedc0, 0x6677, 0xc31c, 0x48ab, 0x5fc5, 0xd472
.hword 0x9EA4, 0x1513, 0x027d, 0x89ca, 0x2ca1, 0xa716, 0xb078, 0x3bcf
.hword 0x25D4, 0xae63, 0xb90d, 0x32ba, 0x97d1, 0x1c66, 0x0b08, 0x80bf
.hword 0xCA69, 0x41de, 0x56b0, 0xdd07, 0x786c, 0xf3db, 0xe4b5, 0x6f02
.hword 0x3AB1, 0xb106, 0xa668, 0x2ddf, 0x88b4, 0x0303, 0x146d, 0x9fda
.hword 0xD50C, 0x5ebb, 0x49d5, 0xc262, 0x6709, 0xecbe, 0xfbd0, 0x7067
.hword 0x6E7C, 0xe5cb, 0xf2a5, 0x7912, 0xdc79, 0x57ce, 0x40a0, 0xcb17
.hword 0x81C1, 0x0a76, 0x1d18, 0x96af, 0x33c4, 0xb873, 0xaf1d, 0x24aa
.hword 0x932B, 0x189c, 0x0ff2, 0x8445, 0x212e, 0xaa99, 0xbdf7, 0x3640
.hword 0x7C96, 0xf721, 0xe04f, 0x6bf8, 0xce93, 0x4524, 0x524a, 0xd9fd
.hword 0xC7E6, 0x4c51, 0x5b3f, 0xd088, 0x75e3, 0xfe54, 0xe93a, 0x628d
.hword 0x285B, 0xa3ec, 0xb482, 0x3f35, 0x9a5e, 0x11e9, 0x0687, 0x8d30
.hword 0xE232, 0x6985, 0x7eeb, 0xf55c, 0x5037, 0xdb80, 0xccee, 0x4759
.hword 0x0D8F, 0x8638, 0x9156, 0x1ae1, 0xbf8a, 0x343d, 0x2353, 0xa8e4
.hword 0xB6FF, 0x3d48, 0x2a26, 0xa191, 0x04fa, 0x8f4d, 0x9823, 0x1394
.hword 0x5942, 0xd2f5, 0xc59b, 0x4e2c, 0xeb47, 0x60f0, 0x779e, 0xfc29
.hword 0x4BA8, 0xc01f, 0xd771, 0x5cc6, 0xf9ad, 0x721a, 0x6574, 0xeec3
.hword 0xA415, 0x2fa2, 0x38cc, 0xb37b, 0x1610, 0x9da7, 0x8ac9, 0x017e
.hword 0x1F65, 0x94d2, 0x83bc, 0x080b, 0xad60, 0x26d7, 0x31b9, 0xba0e
.hword 0xF0D8, 0x7b6f, 0x6c01, 0xe7b6, 0x42dd, 0xc96a, 0xde04, 0x55b3


@ -0,0 +1,320 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include "../include/aarch64_label.h"
.macro crc32_hw_common poly_type
.ifc \poly_type,crc32
mvn wCRC,wCRC
.endif
cbz LEN, .zero_length_ret
tbz BUF, 0, .align_short
ldrb wdata,[BUF],1
sub LEN,LEN,1
crc32_u8 wCRC,wCRC,wdata
.align_short:
tst BUF,2
ccmp LEN,1,0,ne
bhi .align_short_2
tst BUF,4
ccmp LEN,3,0,ne
bhi .align_word
.align_finish:
cmp LEN, 63
bls .loop_16B
.loop_64B:
ldp data0, data1, [BUF],#16
prfm pldl2keep,[BUF,2048]
sub LEN,LEN,#64
ldp data2, data3, [BUF],#16
prfm pldl1keep,[BUF,256]
cmp LEN,#64
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
ldp data0, data1, [BUF],#16
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
ldp data2, data3, [BUF],#16
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
bge .loop_64B
.loop_16B:
cmp LEN, 15
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16 // LEN entered .loop_16B at <= 63, so fewer than 16 bytes remain here
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
.less_16B:
cmp LEN, 7
bls .less_8B
ldr data0, [BUF], 8
sub LEN, LEN, #8
crc32_u64 wCRC, wCRC, data0
.less_8B:
cmp LEN, 3
bls .less_4B
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
.less_4B:
cmp LEN, 1
bls .less_2B
ldrh wdata, [BUF], 2
sub LEN, LEN, #2
crc32_u16 wCRC, wCRC, wdata
.less_2B:
cbz LEN, .zero_length_ret
ldrb wdata, [BUF]
crc32_u8 wCRC, wCRC, wdata
.zero_length_ret:
.ifc \poly_type,crc32
mvn w0, wCRC
.else
mov w0, wCRC
.endif
ret
.align_short_2:
ldrh wdata, [BUF], 2
sub LEN, LEN, 2
tst BUF, 4
crc32_u16 wCRC, wCRC, wdata
ccmp LEN, 3, 0, ne
bls .align_finish
.align_word:
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
b .align_finish
.endm
.macro crc32_3crc_fold poly_type
.ifc \poly_type,crc32
mvn wCRC,wCRC
.endif
cbz LEN, .zero_length_ret
tbz BUF, 0, .align_short
ldrb wdata,[BUF],1
sub LEN,LEN,1
crc32_u8 wCRC,wCRC,wdata
.align_short:
tst BUF,2
ccmp LEN,1,0,ne
bhi .align_short_2
tst BUF,4
ccmp LEN,3,0,ne
bhi .align_word
.align_finish:
cmp LEN,1023
adr const_adr, .Lconstants
bls 1f
ldp dconst0,dconst1,[const_adr]
2:
ldr crc0_data0,[ptr_crc0],8
prfm pldl2keep,[ptr_crc0,3*1024-8]
mov crc1,0
mov crc2,0
add ptr_crc1,ptr_crc0,336
add ptr_crc2,ptr_crc0,336*2
crc32_u64 crc0,crc0,crc0_data0
.set offset,0
.set ptr_offset,8
.rept 5
ldp crc0_data0,crc0_data1,[ptr_crc0],16
ldp crc1_data0,crc1_data1,[ptr_crc1],16
.set offset,offset+64
.set ptr_offset,ptr_offset+16
prfm pldl2keep,[ptr_crc0,3*1024-ptr_offset+offset]
crc32_u64 crc0,crc0,crc0_data0
crc32_u64 crc0,crc0,crc0_data1
ldp crc2_data0,crc2_data1,[ptr_crc2],16
crc32_u64 crc1,crc1,crc1_data0
crc32_u64 crc1,crc1,crc1_data1
crc32_u64 crc2,crc2,crc2_data0
crc32_u64 crc2,crc2,crc2_data1
.endr
.set l1_offset,0
.rept 10
ldp crc0_data0,crc0_data1,[ptr_crc0],16
ldp crc1_data0,crc1_data1,[ptr_crc1],16
.set offset,offset+64
.set ptr_offset,ptr_offset+16
prfm pldl2keep,[ptr_crc0,3*1024-ptr_offset+offset]
prfm pldl1keep,[ptr_crc0,2*1024-ptr_offset+l1_offset]
.set l1_offset,l1_offset+64
crc32_u64 crc0,crc0,crc0_data0
crc32_u64 crc0,crc0,crc0_data1
ldp crc2_data0,crc2_data1,[ptr_crc2],16
crc32_u64 crc1,crc1,crc1_data0
crc32_u64 crc1,crc1,crc1_data1
crc32_u64 crc2,crc2,crc2_data0
crc32_u64 crc2,crc2,crc2_data1
.endr
.rept 6
ldp crc0_data0,crc0_data1,[ptr_crc0],16
ldp crc1_data0,crc1_data1,[ptr_crc1],16
.set ptr_offset,ptr_offset+16
prfm pldl1keep,[ptr_crc0,2*1024-ptr_offset+l1_offset]
.set l1_offset,l1_offset+64
crc32_u64 crc0,crc0,crc0_data0
crc32_u64 crc0,crc0,crc0_data1
ldp crc2_data0,crc2_data1,[ptr_crc2],16
crc32_u64 crc1,crc1,crc1_data0
crc32_u64 crc1,crc1,crc1_data1
crc32_u64 crc2,crc2,crc2_data0
crc32_u64 crc2,crc2,crc2_data1
.endr
ldr crc2_data0,[ptr_crc2]
fmov dtmp0,xcrc0
fmov dtmp1,xcrc1
crc32_u64 crc2,crc2,crc2_data0
add ptr_crc0,ptr_crc0,1024-(336+8)
pmull vtmp0.1q,vtmp0.1d,vconst0.1d
sub LEN,LEN,1024
pmull vtmp1.1q,vtmp1.1d,vconst1.1d
cmp LEN,1024
fmov xcrc0,dtmp0
fmov xcrc1,dtmp1
crc32_u64 crc0,wzr,xcrc0
crc32_u64 crc1,wzr,xcrc1
eor crc0,crc0,crc2
eor crc0,crc0,crc1
bhs 2b
1:
cmp LEN, 63
bls .loop_16B
.loop_64B:
ldp data0, data1, [BUF],#16
sub LEN,LEN,#64
ldp data2, data3, [BUF],#16
cmp LEN,#64
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
ldp data0, data1, [BUF],#16
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
ldp data2, data3, [BUF],#16
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
bge .loop_64B
.loop_16B:
cmp LEN, 15
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16 // LEN entered .loop_16B at <= 63, so fewer than 16 bytes remain here
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
.less_16B:
cmp LEN, 7
bls .less_8B
ldr data0, [BUF], 8
sub LEN, LEN, #8
crc32_u64 wCRC, wCRC, data0
.less_8B:
cmp LEN, 3
bls .less_4B
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
.less_4B:
cmp LEN, 1
bls .less_2B
ldrh wdata, [BUF], 2
sub LEN, LEN, #2
crc32_u16 wCRC, wCRC, wdata
.less_2B:
cbz LEN, .zero_length_ret
ldrb wdata, [BUF]
crc32_u8 wCRC, wCRC, wdata
.zero_length_ret:
.ifc \poly_type,crc32
mvn w0, wCRC
.else
mov w0, wCRC
.endif
ret
.align_short_2:
ldrh wdata, [BUF], 2
sub LEN, LEN, 2
tst BUF, 4
crc32_u16 wCRC, wCRC, wdata
ccmp LEN, 3, 0, ne
bls .align_finish
.align_word:
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
b .align_finish
.Lconstants:
.ifc \poly_type,crc32
.quad 0xb486819b
.quad 0x76278617
.else
.quad 0xe417f38a
.quad 0x8f158014
.endif
.endm
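
The `crc32_3crc_fold` macro above splits each 1024-byte block into three lanes (an initial 8 bytes on lane 0, then 21 rounds of 16 bytes per lane, then a final 8 bytes on lane 2), runs three independent CRC chains, and merges them with `pmull` against the two constants at `.Lconstants`. A minimal C sketch of the same idea follows. It is not the library's code: it substitutes a slow "feed zero bytes" shift for the `pmull` multiply, and the names `crc_raw`, `crc_shift`, and `crc32_three_lane_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw (non-inverted) reflected CRC-32 state update, one bit at a time. */
static uint32_t crc_raw(uint32_t state, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        state ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
    }
    return state;
}

/* Advance a state across n zero bytes. The assembly gets the same effect
 * from a carry-less multiply by a precomputed constant (assumption: this
 * sketch trades speed for simplicity). */
static uint32_t crc_shift(uint32_t state, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (int bit = 0; bit < 8; bit++)
            state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
    return state;
}

/* Three independent lanes; lanes 1 and 2 start from a zero state. */
static uint32_t crc32_three_lane_sketch(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t len0 = len / 3;
    size_t len1 = len / 3;
    size_t len2 = len - len0 - len1;

    uint32_t s0 = crc_raw(~crc, buf, len0);          /* carries the incoming CRC */
    uint32_t s1 = crc_raw(0, buf + len0, len1);
    uint32_t s2 = crc_raw(0, buf + len0 + len1, len2);

    /* The raw update is linear over GF(2), so the lanes combine with XOR
     * once each earlier lane is shifted past the bytes that follow it. */
    uint32_t s = crc_shift(s0, len1 + len2) ^ crc_shift(s1, len2) ^ s2;
    return ~s;
}
```

The merge works because the three chains have no data dependency on each other, which is what lets the assembly keep three `crc32x` streams in flight at once.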

@ -0,0 +1,135 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.macro crc32_hw_common poly_type
cbz LEN, .zero_length_ret
.ifc \poly_type,crc32
mvn wCRC,wCRC
.endif
tbz BUF, 0, .align_short
ldrb wdata,[BUF],1
sub LEN,LEN,1
crc32_u8 wCRC,wCRC,wdata
.align_short:
tst BUF,2
ccmp LEN,1,0,ne
bhi .align_short_2
tst BUF,4
ccmp LEN,3,0,ne
bhi .align_word
.align_finish:
cmp LEN, 63
bls .loop_16B
.loop_64B:
ldp data0, data1, [BUF],#16
sub LEN,LEN,#64
ldp data2, data3, [BUF],#16
cmp LEN,#64
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
ldp data0, data1, [BUF],#16
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
ldp data2, data3, [BUF],#16
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
bge .loop_64B
.loop_16B:
cmp LEN, 15
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16 // LEN entered .loop_16B at <= 63, so fewer than 16 bytes remain here
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
.less_16B:
cmp LEN, 7
bls .less_8B
ldr data0, [BUF], 8
sub LEN, LEN, #8
crc32_u64 wCRC, wCRC, data0
.less_8B:
cmp LEN, 3
bls .less_4B
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
.less_4B:
cmp LEN, 1
bls .less_2B
ldrh wdata, [BUF], 2
sub LEN, LEN, #2
crc32_u16 wCRC, wCRC, wdata
.less_2B:
cbz LEN, .finish_exit
ldrb wdata, [BUF]
crc32_u8 wCRC, wCRC, wdata
.finish_exit:
.ifc \poly_type,crc32
mvn w0, wCRC
.else
mov w0, wCRC
.endif
ret
.zero_length_ret:
mov w0, wCRC
ret
.align_short_2:
ldrh wdata, [BUF], 2
sub LEN, LEN, 2
tst BUF, 4
crc32_u16 wCRC, wCRC, wdata
ccmp LEN, 3, 0, ne
bls .align_finish
.align_word:
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
b .align_finish
.endm
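
The control flow of `crc32_hw_common` above can be hard to follow through the `tst`/`ccmp` pairs. The C sketch below mirrors its order of operations under stated assumptions: a bitwise helper (`crc_bytes`, hypothetical) stands in for the `crc32b`/`crc32h`/`crc32w`/`crc32x` instructions, and the function name `crc32_hw_common_sketch` is illustrative only. It shows the head alignment steps of 1, 2, and 4 bytes, the 64-byte and 16-byte loops, and the 8/4/2/1-byte tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the CRC instructions: feeds n bytes into the raw state. */
static uint32_t crc_bytes(uint32_t state, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        state ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
    }
    return state;
}

/* Consume n bytes and advance, like a post-indexed load plus `sub LEN`. */
#define TAKE(n) do { s = crc_bytes(s, buf, (n)); buf += (n); len -= (n); } while (0)

static uint32_t crc32_hw_common_sketch(uint32_t crc, const unsigned char *buf, uint64_t len)
{
    if (len == 0)
        return crc;                          /* .zero_length_ret */
    assert(buf != NULL);
    uint32_t s = ~crc;                       /* entry `mvn` for poly_type crc32 */

    /* Head: move toward 8-byte alignment while enough bytes remain. */
    if ((uintptr_t)buf & 1)
        TAKE(1);
    if (((uintptr_t)buf & 2) && len > 1)
        TAKE(2);                             /* .align_short_2 */
    if (((uintptr_t)buf & 4) && len > 3)
        TAKE(4);                             /* .align_word */

    while (len >= 64)
        TAKE(64);                            /* .loop_64B */
    while (len >= 16)
        TAKE(16);                            /* .loop_16B, at most three rounds */
    if (len >= 8)
        TAKE(8);                             /* .less_16B */
    if (len >= 4)
        TAKE(4);                             /* .less_8B */
    if (len >= 2)
        TAKE(2);                             /* .less_4B */
    if (len >= 1)
        TAKE(1);                             /* .less_2B */
    return ~s;                               /* exit `mvn` at .finish_exit */
}
#undef TAKE
```

Because a CRC does not depend on how the input is chunked, the result is the same for any starting alignment; the head steps only exist so the wide loads in the main loops are aligned.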

@ -0,0 +1,432 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include "../include/aarch64_label.h"
.macro declare_var_vector_reg name:req,reg:req
\name\()_q .req q\reg
\name\()_v .req v\reg
\name\()_s .req s\reg
\name\()_d .req d\reg
.endm
declare_var_vector_reg k1k2,20
declare_var_vector_reg k3k4,21
declare_var_vector_reg poly,22
declare_var_vector_reg k5k0,23
declare_var_vector_reg mask,24
declare_var_vector_reg fold_poly,25
declare_var_vector_reg tmp0,0
declare_var_vector_reg tmp1,1
declare_var_vector_reg tmp2,2
declare_var_vector_reg tmp3,3
declare_var_vector_reg tmp4,4
declare_var_vector_reg tmp5,5
declare_var_vector_reg tmp6,6
declare_var_vector_reg tmp7,7
declare_var_vector_reg pmull_data0,16
declare_var_vector_reg pmull_data1,17
declare_var_vector_reg pmull_data2,18
declare_var_vector_reg pmull_data3,19
vzr .req v26
const_addr .req x3
crc_blk_ptr .req x4
pmull_blk_ptr .req x5
crc_data0 .req x6
crc_data1 .req x7
crc_data2 .req x9
crc_data3 .req x10
wPmull .req w11
xPmull .req x11
data0 .req x4
data1 .req x5
data2 .req x6
data3 .req x7
wdata .req w4
.macro pmull_fold
pmull2 tmp4_v.1q, tmp0_v.2d, k1k2_v.2d
pmull2 tmp5_v.1q, tmp1_v.2d, k1k2_v.2d
pmull2 tmp6_v.1q, tmp2_v.2d, k1k2_v.2d
pmull2 tmp7_v.1q, tmp3_v.2d, k1k2_v.2d
pmull tmp0_v.1q, tmp0_v.1d, k1k2_v.1d
pmull tmp1_v.1q, tmp1_v.1d, k1k2_v.1d
pmull tmp2_v.1q, tmp2_v.1d, k1k2_v.1d
pmull tmp3_v.1q, tmp3_v.1d, k1k2_v.1d
ld1 {pmull_data0_v.16b-pmull_data3_v.16b},[pmull_blk_ptr],#64
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp4_v.16b
eor tmp1_v.16b, tmp1_v.16b, tmp5_v.16b
eor tmp2_v.16b, tmp2_v.16b, tmp6_v.16b
eor tmp3_v.16b, tmp3_v.16b, tmp7_v.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, v16.16b
eor tmp1_v.16b, tmp1_v.16b, v17.16b
eor tmp2_v.16b, tmp2_v.16b, v18.16b
eor tmp3_v.16b, tmp3_v.16b, v19.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
.endm
.macro crc32_common_mix poly_type
.set MIX_BLK_SIZE,2048
.ifc \poly_type,crc32
mvn wCRC,wCRC
.endif
cmp LEN,MIX_BLK_SIZE-1
adr const_addr, .Lconstants
bls start_final
ld1 {k1k2_v.16b,k3k4_v.16b,poly_v.16b},[const_addr],#48
movi vzr.16b, #0
ld1 {k5k0_v.8b,mask_v.8b,fold_poly_v.8b},[const_addr]
loop_2048:
ld1 {tmp0_v.16b-tmp3_v.16b}, [BUF]
add pmull_blk_ptr,BUF,0x40
add crc_blk_ptr, BUF,512
mov tmp4_v.16b,vzr.16b
fmov tmp4_s, wCRC
ldp crc_data0,crc_data1,[crc_blk_ptr],16
eor tmp0_v.16b,tmp0_v.16b,tmp4_v.16b
mov wCRC, 0
sub LEN,LEN,MIX_BLK_SIZE
cmp LEN,MIX_BLK_SIZE
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
pmull_fold
pmull_fold
pmull_fold
pmull_fold
pmull_fold
pmull_fold
pmull_fold
/* Fold the 64-byte cache line (tmp0..tmp3) down to 128 bits */
pmull2 tmp4_v.1q, tmp0_v.2d, k3k4_v.2d
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
pmull tmp0_v.1q, tmp0_v.1d, k3k4_v.1d
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp4_v.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp1_v.16b
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
pmull2 tmp4_v.1q, tmp0_v.2d, k3k4_v.2d
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
pmull tmp0_v.1q, tmp0_v.1d, k3k4_v.1d
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp4_v.16b
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp2_v.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
pmull2 tmp4_v.1q, tmp0_v.2d, k3k4_v.2d
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
pmull tmp0_v.1q, tmp0_v.1d, k3k4_v.1d
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp4_v.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp3_v.16b
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
/**
* perform the last 64 bit fold, also
* adds 32 zeroes to the input stream
*/
ext tmp1_v.16b, tmp0_v.16b, tmp0_v.16b, #8
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
pmull2 tmp1_v.1q, tmp1_v.2d, k3k4_v.2d
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
ext tmp0_v.16b, tmp0_v.16b, vzr.16b, #8
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp1_v.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
/* final 32-bit fold */
ext tmp1_v.16b, tmp0_v.16b, vzr.16b, #4
and tmp0_v.16b, tmp0_v.16b, mask_v.16b
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
pmull tmp0_v.1q, tmp0_v.1d, k5k0_v.1d
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp1_v.16b
/**
* Finish up with the bit-reversed Barrett
* reduction 64 ==> 32 bits
*/
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
and tmp1_v.16b, tmp0_v.16b, mask_v.16b
ldp crc_data0,crc_data1,[crc_blk_ptr],16
ext tmp1_v.16b, vzr.16b, tmp1_v.16b, #8
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
pmull2 tmp1_v.1q, tmp1_v.2d, poly_v.2d
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
and tmp1_v.16b, tmp1_v.16b, mask_v.16b
ldp crc_data2,crc_data3,[crc_blk_ptr],16
pmull tmp1_v.1q, tmp1_v.1d, poly_v.1d
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
eor tmp0_v.16b, tmp0_v.16b, tmp1_v.16b
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
mov tmp4_v.16b,vzr.16b
mov tmp4_v.s[0], tmp0_v.s[1]
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
ldp crc_data0,crc_data1,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
ldp crc_data2,crc_data3,[crc_blk_ptr],16
crc32_u64 wCRC,wCRC,crc_data0
crc32_u64 wCRC,wCRC,crc_data1
crc32_u64 wCRC,wCRC,crc_data2
crc32_u64 wCRC,wCRC,crc_data3
pmull tmp4_v.1q, tmp4_v.1d, fold_poly_v.1d
add BUF,BUF,MIX_BLK_SIZE
fmov xPmull, tmp4_d
crc32_u64 wPmull, wzr, xPmull
eor wCRC, wPmull, wCRC
bge loop_2048
start_final:
cmp LEN, 63
bls .loop_16B
.loop_64B:
ldp data0, data1, [BUF],#16
sub LEN,LEN,#64
ldp data2, data3, [BUF],#16
cmp LEN,#64
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
ldp data0, data1, [BUF],#16
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
ldp data2, data3, [BUF],#16
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
crc32_u64 wCRC, wCRC, data2
crc32_u64 wCRC, wCRC, data3
bge .loop_64B
.loop_16B:
cmp LEN, 15
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16
cmp LEN,15
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
bls .less_16B
ldp data0, data1, [BUF],#16
sub LEN,LEN,#16 // LEN entered .loop_16B at <= 63, so fewer than 16 bytes remain here
crc32_u64 wCRC, wCRC, data0
crc32_u64 wCRC, wCRC, data1
.less_16B:
cmp LEN, 7
bls .less_8B
ldr data0, [BUF], 8
sub LEN, LEN, #8
crc32_u64 wCRC, wCRC, data0
.less_8B:
cmp LEN, 3
bls .less_4B
ldr wdata, [BUF], 4
sub LEN, LEN, #4
crc32_u32 wCRC, wCRC, wdata
.less_4B:
cmp LEN, 1
bls .less_2B
ldrh wdata, [BUF], 2
sub LEN, LEN, #2
crc32_u16 wCRC, wCRC, wdata
.less_2B:
cbz LEN, .finish_exit
ldrb wdata, [BUF]
crc32_u8 wCRC, wCRC, wdata
.finish_exit:
.ifc \poly_type,crc32
mvn w0, wCRC
.else
mov w0, wCRC
.endif
ret
.endm

@ -0,0 +1,99 @@
########################################################################
# Copyright(c) 2020 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "crc32_aarch64_common.h"
.text
.align 6
.arch armv8-a+crc+crypto
.macro crc32_u64 dst,src,data
crc32x \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32w \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32h \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32b \dst,\src,\data
.endm
.macro declare_var_vector_reg name:req,reg:req
q\name .req q\reg
v\name .req v\reg
s\name .req s\reg
d\name .req d\reg
.endm
BUF .req x1
ptr_crc0 .req x1
LEN .req x2
wCRC .req w0
crc0 .req w0
xcrc0 .req x0
crc1 .req w3
crc2 .req w4
xcrc1 .req x3
const_adr .req x3
ptr_crc1 .req x6
ptr_crc2 .req x7
crc0_data0 .req x9
crc0_data1 .req x10
crc1_data0 .req x11
crc1_data1 .req x12
crc2_data0 .req x13
crc2_data1 .req x14
wdata .req w3
data0 .req x3
data1 .req x4
data2 .req x5
data3 .req x6
declare_var_vector_reg tmp0,0
declare_var_vector_reg tmp1,1
declare_var_vector_reg const0,2
declare_var_vector_reg const1,3
/**
uint32_t crc32_gzip_refl_3crc_fold(
uint32_t wCRC,
const unsigned char *BUF,
uint64_t LEN
);
*/
.global cdecl(crc32_gzip_refl_3crc_fold)
#ifndef __APPLE__
.type crc32_gzip_refl_3crc_fold, %function
#endif
cdecl(crc32_gzip_refl_3crc_fold):
crc32_3crc_fold crc32
#ifndef __APPLE__
.size crc32_gzip_refl_3crc_fold, .-crc32_gzip_refl_3crc_fold
#endif

@ -0,0 +1,70 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.text
.align 6
.arch armv8-a+crc
#include "crc32_aarch64_common.h"
BUF .req x1
LEN .req x2
wCRC .req w0
data0 .req x4
data1 .req x5
data2 .req x6
data3 .req x7
wdata .req w3
.macro crc32_u64 dst,src,data
crc32x \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32w \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32h \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32b \dst,\src,\data
.endm
/**
* uint32_t crc32_gzip_refl_crc_ext(uint32_t wCRC,
* const unsigned char *BUF, uint64_t LEN);
*/
.global cdecl(crc32_gzip_refl_crc_ext)
#ifndef __APPLE__
.type crc32_gzip_refl_crc_ext, %function
#endif
cdecl(crc32_gzip_refl_crc_ext):
crc32_hw_common crc32
#ifndef __APPLE__
.size crc32_gzip_refl_crc_ext, .-crc32_gzip_refl_crc_ext
#endif
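
For checking an entry point such as `crc32_gzip_refl_crc_ext`, a slow bit-by-bit reference is often enough. The C sketch below is one such reference, assuming the register aliases above (`wCRC` in `w0`, `BUF` in `x1`, `LEN` in `x2`) define the argument order; the name `crc32_gzip_refl_ref` is hypothetical and not part of the library. The two inversions mirror the `mvn` instructions the macro emits for `poly_type` `crc32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reference for the reflected gzip CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_gzip_refl_ref(uint32_t crc, const unsigned char *buf, uint64_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;                               /* entry inversion */
    for (uint64_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = 0u - (crc & 1u);  /* all ones if the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;                              /* exit inversion */
}
```

Since the value is inverted on both entry and exit, a previous result can be passed back in as `crc` to continue over the next buffer, and an initial value of 0 yields the standard check value 0xCBF43926 for the ASCII string "123456789".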

@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc32_gzip_refl_pmull.h"
#include "crc32_refl_common_pmull.h"
crc32_refl_func crc32_gzip_refl_pmull

@ -0,0 +1,89 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 0x2d95
#define p4_low_b1 0x8f35
#define p4_high_b0 0x13d7
#define p4_high_b1 0x1d95
#define p1_low_b0 0x9191
#define p1_low_b1 0xae68
#define p1_high_b0 0x009e
#define p1_high_b1 0xccaa
#define p0_low_b0 0x6765
#define p0_low_b1 0xb8bc
#define p0_high_b0 p1_high_b0
#define p0_high_b1 p1_high_b1
#define br_low_b0 0x0641
#define br_low_b1 0xdb71
#define br_low_b2 0x1
#define br_high_b0 0x1641
#define br_high_b1 0xf701
#define br_high_b2 0x1
.text
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc32_table_gzip_refl, %object
.size crc32_table_gzip_refl, 1024
#endif
crc32_table_gzip_refl:
.word 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3
.word 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91
.word 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7
.word 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5
.word 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b
.word 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59
.word 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f
.word 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d
.word 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433
.word 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01
.word 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457
.word 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65
.word 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb
.word 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9
.word 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f
.word 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad
.word 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683
.word 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1
.word 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7
.word 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5
.word 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b
.word 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79
.word 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f
.word 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d
.word 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713
.word 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21
.word 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777
.word 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45
.word 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db
.word 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9
.word 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf
.word 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
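
The 256 words above match the usual byte-wise lookup table for the bit-reflected CRC-32 polynomial 0xEDB88320. As a cross-check, a minimal Python sketch can regenerate the table and compare a few entries against the listing. The function name is illustrative only.

```python
def make_reflected_table(poly):
    """Build a 256-entry table for an LSB-first (reflected) CRC-32."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # Shift right; apply the polynomial when the bit shifted out was 1.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

gzip_table = make_reflected_table(0xEDB88320)
print(", ".join(f"0x{v:08x}" for v in gzip_table[:4]))
```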

@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc32_ieee_norm_pmull.h"
#include "crc32_norm_common_pmull.h"
crc32_norm_func crc32_ieee_norm_pmull

@ -0,0 +1,89 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 0x8b11
#define p4_low_b1 0xe622
#define p4_high_b0 0x794c
#define p4_high_b1 0x8833
#define p1_low_b0 0x5605
#define p1_low_b1 0xe8a4
#define p1_high_b0 0xcd4c
#define p1_high_b1 0xc5b9
#define p0_low_b0 0x678d
#define p0_low_b1 0x490d
#define p0_high_b0 0xaa66
#define p0_high_b1 0xf200
#define br_low_b0 0x01df
#define br_low_b1 0x04d1
#define br_low_b2 0x1
#define br_high_b0 0x1db7
#define br_high_b1 0x04c1
#define br_high_b2 0x1
.text
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc32_table_ieee_norm, %object
.size crc32_table_ieee_norm, 1024
#endif
crc32_table_ieee_norm:
.word 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005
.word 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd
.word 0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75
.word 0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd
.word 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039, 0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5
.word 0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d
.word 0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95
.word 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1, 0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d
.word 0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072
.word 0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16, 0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca
.word 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde, 0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02
.word 0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba
.word 0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e, 0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692
.word 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6, 0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a
.word 0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2
.word 0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a
.word 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637, 0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb
.word 0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53
.word 0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b
.word 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff, 0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623
.word 0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b
.word 0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3
.word 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7, 0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b
.word 0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3
.word 0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640, 0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c
.word 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8, 0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24
.word 0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec
.word 0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088, 0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654
.word 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0, 0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c
.word 0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4
.word 0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c
.word 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668, 0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
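
Unlike the reflected tables in the neighbouring files, this one is built MSB-first from the polynomial 0x04C11DB7. A short Python sketch of that construction, offered only as a way to sanity-check the listing, follows. The function name is an assumption for illustration.

```python
def make_normal_table(poly):
    """Build a 256-entry table for an MSB-first (non-reflected) CRC-32."""
    table = []
    for i in range(256):
        crc = i << 24  # the index byte enters at the top of the register
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table

ieee_norm_table = make_normal_table(0x04C11DB7)
print(", ".join(f"0x{v:08x}" for v in ieee_norm_table[:4]))
```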

@ -0,0 +1,101 @@
########################################################################
# Copyright(c) 2020 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
.text
.align 6
.arch armv8-a+crc+crypto
#include "crc32_aarch64_common.h"
.macro crc32_u64 dst,src,data
crc32cx \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32cw \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32ch \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32cb \dst,\src,\data
.endm
.macro declare_var_vector_reg name:req,reg:req
q\name .req q\reg
v\name .req v\reg
s\name .req s\reg
d\name .req d\reg
.endm
BUF .req x0
LEN .req x1
wCRC .req w2
crc0 .req w2
crc1 .req w3
crc2 .req w4
xcrc0 .req x2
xcrc1 .req x3
const_adr .req x3
ptr_crc0 .req x0
ptr_crc1 .req x6
ptr_crc2 .req x7
crc0_data0 .req x9
crc0_data1 .req x10
crc1_data0 .req x11
crc1_data1 .req x12
crc2_data0 .req x13
crc2_data1 .req x14
wdata .req w3
data0 .req x3
data1 .req x4
data2 .req x5
data3 .req x6
declare_var_vector_reg tmp0,0
declare_var_vector_reg tmp1,1
declare_var_vector_reg const0,2
declare_var_vector_reg const1,3
/**
unsigned int crc32_iscsi_3crc_fold(
unsigned char *BUF,
int LEN,
unsigned int wCRC
);
*/
.global cdecl(crc32_iscsi_3crc_fold)
#ifndef __APPLE__
.type crc32_iscsi_3crc_fold, %function
#endif
cdecl(crc32_iscsi_3crc_fold):
crc32_3crc_fold crc32c
#ifndef __APPLE__
.size crc32_iscsi_3crc_fold, .-crc32_iscsi_3crc_fold
#endif

@ -0,0 +1,69 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.text
.align 6
.arch armv8-a+crc
#include "crc32_aarch64_common.h"
BUF .req x0
LEN .req x1
wCRC .req w2
data0 .req x4
data1 .req x5
data2 .req x6
data3 .req x7
wdata .req w3
.macro crc32_u64 dst,src,data
crc32cx \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32cw \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32ch \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32cb \dst,\src,\data
.endm
/**
* uint32_t crc32_iscsi_crc_ext(const unsigned char *BUF,
* uint64_t LEN,uint32_t wCRC);
*/
.global cdecl(crc32_iscsi_crc_ext)
#ifndef __APPLE__
.type crc32_iscsi_crc_ext, %function
#endif
cdecl(crc32_iscsi_crc_ext):
crc32_hw_common crc32c
#ifndef __APPLE__
.size crc32_iscsi_crc_ext, .-crc32_iscsi_crc_ext
#endif
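
The macros in this file map `crc32_u64`/`crc32_u32`/`crc32_u16`/`crc32_u8` onto the Armv8 `crc32cx`/`crc32cw`/`crc32ch`/`crc32cb` instructions, which fold 8, 4, 2 or 1 data bytes into a 32-bit CRC-32C state. The Python model below is a sketch of that behaviour, not the library's code. The 0xFFFFFFFF initial value and final XOR are the conventional CRC-32C framing and are an assumption here, since the handling of `wCRC` lives in `crc32_hw_common`, which is not shown.

```python
POLY_CRC32C_REFL = 0x82F63B78  # bit-reflected Castagnoli polynomial

def crc32c_step(state, value, nbytes):
    """Model of crc32cb/ch/cw/cx: fold nbytes of value (little-endian) into state."""
    for byte in value.to_bytes(nbytes, "little"):
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ POLY_CRC32C_REFL if state & 1 else state >> 1
    return state

def crc32c(data):
    """Consume data widest-first (8, 4, 2, 1 bytes), as a hardware loop might."""
    state = 0xFFFFFFFF  # assumed conventional initial value
    pos = 0
    for width in (8, 4, 2, 1):
        while len(data) - pos >= width:
            chunk = int.from_bytes(data[pos:pos + width], "little")
            state = crc32c_step(state, chunk, width)
            pos += width
    return state ^ 0xFFFFFFFF  # assumed conventional final XOR

print(hex(crc32c(b"123456789")))
```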

@ -0,0 +1,56 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc32_iscsi_refl_pmull.h"
#include "crc32_refl_common_pmull.h"
crc32_refl_func crc32_iscsi_refl_pmull_internal
.arch armv8-a+crc+crypto
.text
.align 3
.global cdecl(crc32_iscsi_refl_pmull)
#ifndef __APPLE__
.type crc32_iscsi_refl_pmull, %function
#endif
cdecl(crc32_iscsi_refl_pmull):
stp x29, x30, [sp, -32]!
mov x29, sp
mov w7, w2
sxtw x2, w1
mov x1, x0
mov w0, w7
mvn w0, w0
bl cdecl(crc32_iscsi_refl_pmull_internal)
mvn w0, w0
ldp x29, x30, [sp], 32
ret
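
The wrapper above only reshuffles arguments and complements the CRC: `(buf, len, crc)` arrives in `x0, w1, w2`, the internal routine is called with `(~crc, buf, sign-extended len)`, and the result is complemented again before returning. The Python sketch below models that flow. `internal_stand_in` is a hypothetical replacement for `crc32_iscsi_refl_pmull_internal`, and the assumption that it complements its seed on entry and its result on exit is mine, not confirmed by this diff.

```python
MASK32 = 0xFFFFFFFF

def _raw_crc32c(state, data):
    """Bitwise reflected CRC-32C update with no pre- or post-inversion."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ 0x82F63B78 if state & 1 else state >> 1
    return state

def internal_stand_in(crc, buf, length):
    # ASSUMPTION: the real internal routine inverts on entry and on exit.
    return _raw_crc32c(~crc & MASK32, buf[:max(length, 0)]) ^ MASK32

def crc32_iscsi_refl_pmull_model(buf, length, crc):
    # sxtw x2, w1: sign-extend the 32-bit length
    length = length - (1 << 32) if length & 0x80000000 else length
    crc = ~crc & MASK32                      # mvn w0, w0 before the call
    result = internal_stand_in(crc, buf, length)
    return ~result & MASK32                  # mvn w0, w0 after the call

print(hex(crc32_iscsi_refl_pmull_model(b"123456789", 9, 0xFFFFFFFF)))
```

If the assumption holds, the two complements in the wrapper cancel the two inside the internal routine, so the exported function performs a plain seed-in, state-out update.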

@ -0,0 +1,90 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 0xef02
#define p4_low_b1 0x740e
#define p4_high_b0 0xddf8
#define p4_high_b1 0x9e4a
#define p1_low_b0 0x0dfe
#define p1_low_b1 0xf20c
#define p1_high_b0 0x7d27
#define p1_high_b1 0x493c
#define p0_low_b0 0xaab8
#define p0_low_b1 0xdd45
#define p0_high_b0 p1_high_b0
#define p0_high_b1 p1_high_b1
#define br_low_b0 0x76f1
#define br_low_b1 0x05ec
#define br_low_b2 0x1
#define br_high_b0 0x13f1
#define br_high_b1 0xdea7
#define br_high_b2 0x0
.text
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc32_table_iscsi_refl, %object
.size crc32_table_iscsi_refl, 1024
#endif
crc32_table_iscsi_refl:
.word 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB
.word 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24
.word 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384
.word 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B
.word 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35
.word 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA
.word 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A
.word 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595
.word 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957
.word 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198
.word 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38
.word 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7
.word 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789
.word 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46
.word 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6
.word 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829
.word 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93
.word 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C
.word 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC
.word 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033
.word 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D
.word 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982
.word 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622
.word 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED
.word 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F
.word 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0
.word 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540
.word 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 0x7FAB5E8C, 0x8DC0DD8F
.word 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 0xC5914FF2, 0x37FACCF1
.word 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 0x4F48173D, 0xBD23943E
.word 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 0xD5CF889D, 0x27A40B9E
.word 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 0x5F16D052, 0xAD7D5351
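
A table like the one above is typically consumed one input byte at a time: the low byte of the running CRC is XORed with the data byte and used as the index. The Python sketch below rebuilds the table from the reflected CRC-32C polynomial 0x82F63B78 and applies that lookup step. Function names are illustrative, and the 0xFFFFFFFF framing in the usage line is the conventional one rather than something this file states.

```python
def make_iscsi_table():
    """Regenerate the 256-entry reflected CRC-32C table."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

ISCSI_TABLE = make_iscsi_table()

def table_update(crc, data):
    """One table lookup per input byte; no inversion is applied here."""
    for byte in data:
        crc = (crc >> 8) ^ ISCSI_TABLE[(crc ^ byte) & 0xFF]
    return crc

# Conventional CRC-32C framing around the raw update.
check = table_update(0xFFFFFFFF, b"123456789") ^ 0xFFFFFFFF
print(hex(check))
```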

@ -0,0 +1,123 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.arch armv8-a+crypto+crc
.text
.align 6
#define CRC32
.macro crc32_u64 dst,src,data
crc32x \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32w \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32h \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32b \dst,\src,\data
.endm
#include "crc32_mix_default_common.S"
.global cdecl(crc32_mix_default)
#ifndef __APPLE__
.type crc32_mix_default, %function
#endif
cdecl(crc32_mix_default):
crc32_mix_main_default
#ifndef __APPLE__
.size crc32_mix_default, .-crc32_mix_default
#endif
ASM_DEF_RODATA
.align 4
.set lanchor_crc32,. + 0
#ifndef __APPLE__
.type k1k2, %object
.size k1k2, 16
#endif
k1k2:
.xword 0x0154442bd4
.xword 0x01c6e41596
#ifndef __APPLE__
.type k3k4, %object
.size k3k4, 16
#endif
k3k4:
.xword 0x01751997d0
.xword 0x00ccaa009e
#ifndef __APPLE__
.type k5k0, %object
.size k5k0, 16
#endif
k5k0:
.xword 0x0163cd6124
.xword 0
#ifndef __APPLE__
.type poly, %object
.size poly, 16
#endif
poly:
.xword 0x01db710641
.xword 0x01f7011641
#ifndef __APPLE__
.type crc32_const, %object
.size crc32_const, 48
#endif
crc32_const:
.xword 0x1753ab84
.xword 0
.xword 0xbbf2f6d6
.xword 0
.xword 0x0c30f51d
.xword 0
.align 4
.set .lanchor_mask,. + 0
#ifndef __APPLE__
.type mask, %object
.size mask, 16
#endif
mask:
.word -1
.word 0
.word -1
.word 0

@ -0,0 +1,586 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include "../include/aarch64_label.h"
.macro declare_generic_reg name:req, reg:req, default:req
\name .req \default\reg
w_\name .req w\reg
x_\name .req x\reg
.endm
.macro declare_neon_reg name:req, reg:req, default:req
\name .req \default\reg
v_\name .req v\reg
q_\name .req q\reg
d_\name .req d\reg
s_\name .req s\reg
.endm
/**********************************************************************
variables
**********************************************************************/
declare_generic_reg crc, 0,w
declare_generic_reg buf, 1,x
declare_generic_reg len, 2,x
declare_generic_reg buf_saved, 3,x
declare_generic_reg buf_iter, 4,x
declare_generic_reg len_saved, 5,x
declare_generic_reg buf_tmp, 6,x
declare_generic_reg crc0, 7,x
declare_generic_reg crc1, 8,x
declare_generic_reg crc2, 9,x
declare_generic_reg pconst, 10,x
declare_generic_reg data_crc0, 11,x
declare_generic_reg data_crc1, 12,x
declare_generic_reg data_crc2, 13,x
declare_generic_reg size, 9,x
declare_generic_reg crc_tmp, 10,w
declare_generic_reg size_tmp, 11,x
declare_generic_reg data_tmp1, 11,x
declare_generic_reg data_tmp2, 12,x
declare_generic_reg data_tmp3, 13,x
declare_generic_reg tmp, 14,x
declare_generic_reg tmp1, 15,x
// return
declare_generic_reg ret_crc, 0,w
/**********************************************************************
simd variables
**********************************************************************/
declare_neon_reg a0, 0,v
declare_neon_reg a1, 1,v
declare_neon_reg a2, 2,v
declare_neon_reg a3, 3,v
declare_neon_reg a4, 4,v
declare_neon_reg a5, 16,v
declare_neon_reg a6, 17,v
declare_neon_reg a7, 18,v
declare_neon_reg a8, 19,v
declare_neon_reg y5, 20,v
declare_neon_reg y6, 21,v
declare_neon_reg y7, 22,v
declare_neon_reg y8, 23,v
declare_neon_reg neon_zero, 24,v
declare_neon_reg neon_tmp, 24,v
declare_neon_reg k5k0, 25,v
declare_neon_reg neon_tmp1, 26,v
declare_neon_reg neon_tmp2, 27,v
declare_neon_reg neon_tmp3, 28,v
declare_neon_reg crc_pmull, 29,v
declare_neon_reg neon_crc0, 30,v
declare_neon_reg neon_crc1, 31,v
declare_neon_reg neon_const0, 5,v
declare_neon_reg neon_const1, 6,v
declare_neon_reg neon_const2, 7,v
// constants
.equ offset_k3k4, 16
.equ offset_k5k0, 32
.equ offset_poly, 48
.equ offset_crc32_const, 64
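// pmull fold: one step folds the four 128-bit accumulators a1..a4 over the
// next 64 bytes at x_buf_tmp, interleaved with 64 bytes (8 doublewords) of
// crc32 instructions on each of the three streams crc0, crc1 and crc2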
.macro pmull_fold
ldr x_data_crc0, [x_buf_tmp, 464]
ldr x_data_crc1, [x_buf_tmp, 976]
ldr x_data_crc2, [x_buf_tmp, 1488]
pmull v_a5.1q, v_a1.1d, v_a0.1d
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
ldr x_data_crc0, [x_buf_tmp, 472]
ldr x_data_crc1, [x_buf_tmp, 984]
ldr x_data_crc2, [x_buf_tmp, 1496]
pmull v_a6.1q, v_a2.1d, v_a0.1d
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
ldr x_data_crc0, [x_buf_tmp, 480]
ldr x_data_crc1, [x_buf_tmp, 992]
ldr x_data_crc2, [x_buf_tmp, 1504]
pmull v_a7.1q, v_a3.1d, v_a0.1d
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
ldr x_data_crc0, [x_buf_tmp, 488]
ldr x_data_crc1, [x_buf_tmp, 1000]
ldr x_data_crc2, [x_buf_tmp, 1512]
pmull v_a8.1q, v_a4.1d, v_a0.1d
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
ldr x_data_crc0, [x_buf_tmp, 496]
ldr x_data_crc1, [x_buf_tmp, 1008]
ldr x_data_crc2, [x_buf_tmp, 1520]
pmull2 v_a1.1q, v_a1.2d, v_a0.2d
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
ld1 {v_y5.4s, v_y6.4s, v_y7.4s, v_y8.4s}, [x_buf_tmp]
ldr x_data_crc0, [x_buf_tmp, 504]
ldr x_data_crc1, [x_buf_tmp, 1016]
ldr x_data_crc2, [x_buf_tmp, 1528]
pmull2 v_a2.1q, v_a2.2d, v_a0.2d
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
pmull2 v_a3.1q, v_a3.2d, v_a0.2d
pmull2 v_a4.1q, v_a4.2d, v_a0.2d
eor v_y5.16b, v_y5.16b, v_a5.16b
eor v_y6.16b, v_y6.16b, v_a6.16b
eor v_y7.16b, v_y7.16b, v_a7.16b
eor v_y8.16b, v_y8.16b, v_a8.16b
ldr x_data_crc0, [x_buf_tmp, 512]
ldr x_data_crc1, [x_buf_tmp, 1024]
ldr x_data_crc2, [x_buf_tmp, 1536]
eor v_a1.16b, v_y5.16b, v_a1.16b
eor v_a2.16b, v_y6.16b, v_a2.16b
eor v_a3.16b, v_y7.16b, v_a3.16b
eor v_a4.16b, v_y8.16b, v_a4.16b
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
ldr x_data_crc0, [x_buf_tmp, 520]
ldr x_data_crc1, [x_buf_tmp, 1032]
ldr x_data_crc2, [x_buf_tmp, 1544]
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
.endm
// crc32 mix for 2048 byte input data
.macro crc32_mix2048
fmov s_a1, w_crc
movi v_neon_tmp.4s, 0
#ifndef __APPLE__
adrp x_pconst, lanchor_crc32
add x_buf_tmp, x_buf, 64
#else
adrp x_pconst, lanchor_crc32@PAGE
add x_buf_tmp, x_buf, 64
#endif
ldr x_data_crc0, [x_buf, 512]
ldr x_data_crc1, [x_buf, 1024]
ldr x_data_crc2, [x_buf, 1536]
crc32_u64 w_crc0, wzr, x_data_crc0
crc32_u64 w_crc1, wzr, x_data_crc1
crc32_u64 w_crc2, wzr, x_data_crc2
#ifdef CRC32
mvn v_a1.8b, v_a1.8b
#endif
ins v_neon_tmp.s[0], v_a1.s[0]
ld1 {v_a1.4s, v_a2.4s, v_a3.4s, v_a4.4s}, [x_buf]
ldr x_data_crc0, [x_buf, 520]
ldr x_data_crc1, [x_buf, 1032]
ldr x_data_crc2, [x_buf, 1544]
eor v_a1.16b, v_a1.16b, v_neon_tmp.16b
#ifndef __APPLE__
ldr q_a0, [x_pconst, #:lo12:lanchor_crc32] // k1k2
#else
ldr q_a0, [x_pconst, #lanchor_crc32@PAGEOFF] // k1k2
#endif
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
// loop start, unroll the loop
.align 4
pmull_fold
add x_buf_tmp, x_buf_tmp, 64
pmull_fold
add x_buf_tmp, x_buf_tmp, 64
pmull_fold
add x_buf_tmp, x_buf_tmp, 64
pmull_fold
add x_buf_tmp, x_buf_tmp, 64
pmull_fold
add x_buf_tmp, x_buf_tmp, 64
pmull_fold
add x_buf_tmp, x_buf_tmp, 64
pmull_fold
// loop end
// PMULL: fold into 128-bits
#ifndef __APPLE__
add x_pconst, x_pconst, :lo12:lanchor_crc32
#else
add x_pconst, x_pconst, lanchor_crc32@PAGEOFF
#endif
ldr x_data_crc0, [x_buf, 976]
ldr x_data_crc1, [x_buf, 1488]
ldr x_data_crc2, [x_buf, 2000]
ldr q_a0, [x_pconst, offset_k3k4] // k3k4
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
pmull v_a5.1q, v_a1.1d, v_a0.1d
pmull2 v_a1.1q, v_a1.2d, v_a0.2d
eor v_a1.16b, v_a5.16b, v_a1.16b
eor v_a1.16b, v_a1.16b, v_a2.16b
ldr x_data_crc0, [x_buf, 984]
ldr x_data_crc1, [x_buf, 1496]
ldr x_data_crc2, [x_buf, 2008]
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
pmull v_a5.1q, v_a1.1d, v_a0.1d
pmull2 v_a1.1q, v_a1.2d, v_a0.2d
ldr x_data_crc0, [x_buf, 992]
ldr x_data_crc1, [x_buf, 1504]
ldr x_data_crc2, [x_buf, 2016]
eor v_a1.16b, v_a5.16b, v_a1.16b
eor v_a1.16b, v_a1.16b, v_a3.16b
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
pmull v_a5.1q, v_a1.1d, v_a0.1d
pmull2 v_a1.1q, v_a1.2d, v_a0.2d
ldr x_data_crc0, [x_buf, 1000]
ldr x_data_crc1, [x_buf, 1512]
ldr x_data_crc2, [x_buf, 2024]
eor v_a1.16b, v_a5.16b, v_a1.16b
eor v_a1.16b, v_a1.16b, v_a4.16b
// PMULL: fold 128-bits to 64-bits
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
dup d_a0, v_a0.d[1]
pmull v_a2.1q, v_a1.1d, v_a0.1d
movi v_neon_zero.4s, 0
ldr q_k5k0, [x_pconst, offset_k5k0] // k5k0
#ifndef __APPLE__
adrp x_tmp, .lanchor_mask
#else
adrp x_tmp, .lanchor_mask@PAGE
#endif
ldr x_data_crc0, [x_buf, 1008]
ldr x_data_crc1, [x_buf, 1520]
ldr x_data_crc2, [x_buf, 2032]
ext v_a1.16b, v_a1.16b, v_neon_zero.16b, #8
eor v_a1.16b, v_a2.16b, v_a1.16b
#ifndef __APPLE__
ldr q_neon_tmp3, [x_tmp, #:lo12:.lanchor_mask]
#else
ldr q_neon_tmp3, [x_tmp, #.lanchor_mask@PAGEOFF]
#endif
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
dup d_a0, v_k5k0.d[1]
pmull v_a3.1q, v_a2.1d, v_a0.1d
ext v_a2.16b, v_a1.16b, v_neon_zero.16b, #4
and v_a1.16b, v_a1.16b, v_neon_tmp3.16b
pmull v_a1.1q, v_a1.1d, v_k5k0.1d
eor v_a1.16b, v_a2.16b, v_a1.16b
// PMULL: Barrett reduce to 32-bits
ldr q_neon_tmp1, [x_pconst, offset_poly] // poly
ldr x_data_crc0, [x_buf, 1016]
ldr x_data_crc1, [x_buf, 1528]
ldr x_data_crc2, [x_buf, 2040]
dup d_neon_tmp2, v_neon_tmp1.d[1]
crc32_u64 w_crc0, w_crc0, x_data_crc0
crc32_u64 w_crc1, w_crc1, x_data_crc1
crc32_u64 w_crc2, w_crc2, x_data_crc2
and v_a2.16b, v_a1.16b, v_neon_tmp3.16b
pmull v_a2.1q, v_a2.1d, v_neon_tmp2.1d
and v_a2.16b, v_neon_tmp3.16b, v_a2.16b
pmull v_a2.1q, v_a2.1d, v_neon_tmp1.1d
// crc_pmull result
eor v_a1.16b, v_a1.16b, v_a2.16b
dup s_crc_pmull, v_a1.s[1]
// merge crc_pmull, crc0, crc1, crc2 using pmull instruction
fmov s_neon_crc0, w_crc0
fmov s_neon_crc1, w_crc1
ldr q_neon_const0, [x_pconst, offset_crc32_const]
ldr q_neon_const1, [x_pconst, offset_crc32_const+16]
ldr q_neon_const2, [x_pconst, offset_crc32_const+32]
pmull v_crc_pmull.1q, v_crc_pmull.1d, v_neon_const0.1d
pmull v_neon_crc0.1q, v_neon_crc0.1d, v_neon_const1.1d
pmull v_neon_crc1.1q, v_neon_crc1.1d, v_neon_const2.1d
fmov x_tmp1, d_neon_crc0
crc32_u64 w_crc0, wzr, x_tmp1
fmov x_tmp1, d_neon_crc1
crc32_u64 w_crc1, wzr, x_tmp1
eor w_ret_crc, w_crc1, w_crc0
fmov x_tmp1, d_crc_pmull
crc32_u64 w_tmp, wzr, x_tmp1
eor w_crc2, w_tmp, w_crc2
// handle crc32/crc32c
#ifdef CRC32
eon w_ret_crc, w_crc2, w_ret_crc
#else
eor w_ret_crc, w_crc2, w_ret_crc
#endif
.endm
// crc32 mix main default
.macro crc32_mix_main_default
cmp x_len, 2047
mov x_len_saved, x_len
mov x_buf_saved, x_buf
bls .less_than_2048
sub x_buf_iter, x_len, #2048
stp x29, x30, [sp, -16]!
mov x29, sp
and x_buf_iter, x_buf_iter, -2048
add x_buf_iter, x_buf_iter, 2048
add x_buf_iter, x_buf, x_buf_iter
.align 4
.loop_mix:
mov x_buf, x_buf_saved
crc32_mix2048
add x_buf_saved, x_buf_saved, 2048
cmp x_buf_saved, x_buf_iter
bne .loop_mix
and x_len_saved, x_len_saved, 2047
cbnz x_len_saved, .remain_ldp
ldp x29, x30, [sp], 16
ret
.align 4
.remain_ldp:
mov w_crc_tmp, crc
ldp x29, x30, [sp], 16
mov size, x_len_saved
mov buf, x_buf_iter
b .crc32_hw_handle
.remain:
mov w_crc_tmp, crc
mov size, x_len_saved
mov buf, x_buf_saved
b .crc32_hw_handle
.align 4
.less_than_2048:
cbnz x_len, .remain
ret
.crc32_hw_handle:
cmp size, 63
#ifdef CRC32
mvn crc_tmp, crc_tmp
#endif
bls .less_than_64
sub buf_saved, size, #64
and buf_saved, buf_saved, -64
add buf_saved, buf_saved, 64
add buf_saved, buf, buf_saved
.align 4
.loop_64:
ldp data_tmp1, data_tmp2, [buf]
ldr data_tmp3, [buf, 16]
crc32_u64 crc_tmp, crc_tmp, data_tmp1
crc32_u64 crc_tmp, crc_tmp, data_tmp2
ldp data_tmp1, data_tmp2, [buf, 24]
add buf, buf, 64
crc32_u64 crc_tmp, crc_tmp, data_tmp3
ldr data_tmp3, [buf, -24]
crc32_u64 crc_tmp, crc_tmp, data_tmp1
crc32_u64 crc_tmp, crc_tmp, data_tmp2
ldp data_tmp1, data_tmp2, [buf, -16]
cmp buf_saved, buf
crc32_u64 crc_tmp, crc_tmp, data_tmp3
crc32_u64 crc_tmp, crc_tmp, data_tmp1
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bne .loop_64
and size, size, 63
.less_than_64:
cmp size, 7
bls .crc32_hw_w
ldr data_tmp2, [buf]
sub size_tmp, size, #8
cmp size_tmp, 7
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bls .crc32_hw_w_pre
ldr data_tmp2, [buf, 8]
sub data_tmp3, size, #16
cmp data_tmp3, 7
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bls .crc32_hw_w_pre
ldr data_tmp2, [buf, 16]
sub data_tmp3, size, #24
cmp data_tmp3, 7
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bls .crc32_hw_w_pre
ldr data_tmp2, [buf, 24]
sub data_tmp3, size, #32
cmp data_tmp3, 7
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bls .crc32_hw_w_pre
ldr data_tmp2, [buf, 32]
sub data_tmp3, size, #40
cmp data_tmp3, 7
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bls .crc32_hw_w_pre
ldr data_tmp2, [buf, 40]
sub data_tmp3, size, #48
cmp data_tmp3, 7
crc32_u64 crc_tmp, crc_tmp, data_tmp2
bls .crc32_hw_w_pre
ldr data_tmp2, [buf, 48]
crc32_u64 crc_tmp, crc_tmp, data_tmp2
.crc32_hw_w_pre:
and size_tmp, size_tmp, -8
and size, size, 7
add size_tmp, size_tmp, 8
add buf, buf, size_tmp
.crc32_hw_w:
cmp size, 3
bls .crc32_hw_h
ldr w_data_tmp2, [buf], 4
sub size, size, #4
crc32_u32 crc_tmp, crc_tmp, w_data_tmp2
.crc32_hw_h:
cmp size, 1
bls .crc32_hw_b
ldrh w_data_tmp2, [buf], 2
sub size, size, #2
crc32_u16 crc_tmp, crc_tmp, w_data_tmp2
.crc32_hw_b:
cbz size, .crc32_hw_done
ldrb w_data_tmp2, [buf]
crc32_u8 crc_tmp, crc_tmp, w_data_tmp2
.crc32_hw_done:
#ifdef CRC32
mvn ret_crc, crc_tmp
#else
mov ret_crc, crc_tmp
#endif
ret
.endm
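
The tail path in `.crc32_hw_handle` can be read as a cascade: 64-byte blocks, then 8-byte words, then at most one 4-byte, one 2-byte and one 1-byte step. The following is a minimal Python sketch of that cascade, assuming the CRC32C flavour of the `crc32_u*` macros (no inversion at entry or exit) and little-endian loads. The names `crc_step` and `crc32c_tail` are hypothetical and exist only for illustration; the bitwise inner loop stands in for the hardware `crc32c{b,h,w,x}` instructions.

```python
POLY_CRC32C_REFLECTED = 0x82F63B78  # reflected Castagnoli polynomial


def crc_step(crc, chunk):
    """Model of crc32cb/ch/cw/cx: fold a 1/2/4/8-byte chunk, in memory order."""
    for byte in chunk:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY_CRC32C_REFLECTED if crc & 1 else 0)
    return crc


def crc32c_tail(crc, buf):
    """Hypothetical model of the 64/8/4/2/1-byte cascade (not the library API)."""
    pos, size = 0, len(buf)
    # .loop_64: eight 8-byte steps per iteration
    while size - pos >= 64:
        for off in range(0, 64, 8):
            crc = crc_step(crc, buf[pos + off:pos + off + 8])
        pos += 64
    # .less_than_64: up to seven 8-byte steps, then word, halfword, byte
    for width in (8, 8, 8, 8, 8, 8, 8, 4, 2, 1):
        if size - pos >= width:
            crc = crc_step(crc, buf[pos:pos + width])
            pos += width
    return crc
```

Because every step consumes bytes in memory order, the cascade should give the same result as a plain byte-at-a-time CRC over the same buffer; only the number of instructions issued differs.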


@ -0,0 +1,73 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.text
.align 6
.arch armv8-a+crypto+crc
#include "crc32_common_mix_neoverse_n1.S"
.Lconstants:
.octa 0x00000001c6e415960000000154442bd4
.octa 0x00000000ccaa009e00000001751997d0
.octa 0x00000001F701164100000001DB710641
.quad 0x0000000163cd6124
.quad 0x00000000FFFFFFFF
.quad 0x000000001753ab84
.macro crc32_u64 dst,src,data
crc32x \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32w \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32h \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32b \dst,\src,\data
.endm
/**
* uint32_t crc32_mix_neoverse_n1(uint CRC, uint8_t * BUF,
* size_t LEN)
*/
BUF .req x1
LEN .req x2
CRC .req x0
wCRC .req w0
.align 6
.global cdecl(crc32_mix_neoverse_n1)
#ifndef __APPLE__
.type crc32_mix_neoverse_n1, %function
#endif
cdecl(crc32_mix_neoverse_n1):
crc32_common_mix crc32
#ifndef __APPLE__
.size crc32_mix_neoverse_n1, .-crc32_mix_neoverse_n1
#endif


@ -0,0 +1,146 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "crc_common_pmull.h"
.macro crc32_norm_func name:req
.arch armv8-a+crypto
.text
.align 3
.global cdecl(\name)
#ifndef __APPLE__
.type \name, %function
#endif
/* uint32_t crc32_norm_func(uint32_t seed, uint8_t * buf, uint64_t len) */
cdecl(\name\()):
mvn w_seed, w_seed
mov x_counter, 0
cmp x_len, (FOLD_SIZE - 1)
bhi .crc_clmul_pre
.crc_tab_pre:
cmp x_len, x_counter
bls .done
#ifndef __APPLE__
adrp x_tmp, .lanchor_crc_tab
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, :lo12:.lanchor_crc_tab
#else
adrp x_tmp, .lanchor_crc_tab@PAGE
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, .lanchor_crc_tab@PAGEOFF
#endif
.align 3
.loop_crc_tab:
ldrb w_tmp, [x_buf_iter], 1
cmp x_buf, x_buf_iter
eor w_tmp, w_tmp, w_seed, lsr 24
ldr w_tmp, [x_crc_tab_addr, w_tmp, uxtw 2]
eor w_seed, w_tmp, w_seed, lsl 8
bhi .loop_crc_tab
.done:
mvn w_crc_ret, w_seed
ret
.align 2
.crc_clmul_pre:
lsl x_seed, x_seed, 32
movi v_x0.2s, 0
fmov v_x0.d[1], x_seed // save crc to v_x0
crc_norm_load_first_block
bls .clmul_loop_end
crc32_load_p4
// 1024bit --> 512bit loop
// merge x0, x1, x2, x3, y0, y1, y2, y3 => x0, x1, x2, x3 (uint64x2_t)
crc_norm_loop
.clmul_loop_end:
// folding 512bit --> 128bit
crc32_fold_512b_to_128b
// folding 128bit --> 64bit
mov x_tmp, p0_high_b0
movk x_tmp, p0_high_b1, lsl 16
fmov d_p0_high, x_tmp
mov x_tmp2, p0_low_b0
movk x_tmp2, p0_low_b1, lsl 16
fmov d_p0_high2, x_tmp2
mov d_tmp_high, v_x3.d[0]
ext v_tmp_high.16b, v_tmp_high.16b, v_tmp_high.16b, #12
pmull2 v_x3.1q, v_x3.2d, v_p0.2d
eor v_tmp_high.16b, v_tmp_high.16b, v_x3.16b
pmull2 v_x3.1q, v_tmp_high.2d, v_p02.2d
// barrett reduction
mov x_tmp2, br_high_b0
movk x_tmp2, br_high_b1, lsl 16
movk x_tmp2, br_high_b2, lsl 32
fmov d_br_high, x_tmp2
mov x_tmp, br_low_b0
movk x_tmp, br_low_b1, lsl 16
movk x_tmp, br_low_b2, lsl 32
fmov d_br_low, x_tmp
eor v_tmp_high.16b, v_tmp_high.16b, v_x3.16b
mov s_x3, v_tmp_high.s[1]
pmull v_x3.1q, v_x3.1d, v_br_low.1d
mov s_x3, v_x3.s[1]
pmull v_x3.1q, v_x3.1d, v_br_high.1d
eor v_tmp_high.8b, v_tmp_high.8b, v_x3.8b
umov w_seed, v_tmp_high.s[0]
b .crc_tab_pre
#ifndef __APPLE__
.size \name, .-\name
.section .rodata.cst16,"aM",@progbits,16
#else
.section __TEXT,__const
#endif
.align 4
.shuffle_data:
.byte 15, 14, 13, 12, 11, 10, 9
.byte 8, 7, 6, 5, 4, 3, 2, 1, 0
.endm
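
For short inputs and for the bytes left after the PMULL folding, `crc32_norm_func` falls back to the table loop at `.loop_crc_tab`. A minimal Python sketch of that loop for the normal (MSB-first) bit order follows. The polynomial `0x04C11DB7` and the helper names are assumptions made for illustration; the macro itself takes whatever table `.lanchor_crc_tab` points at.

```python
POLY_IEEE_NORMAL = 0x04C11DB7  # assumed polynomial, for illustration only


def make_table_norm(poly):
    """Build a 256-entry lookup table for an MSB-first 32-bit CRC."""
    table = []
    for i in range(256):
        crc = i << 24
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
        table.append(crc)
    return table


def crc32_norm_bytes(seed, buf, table):
    """Hypothetical model of the byte-wise fallback in crc32_norm_func."""
    seed = ~seed & 0xFFFFFFFF                          # mvn w_seed, w_seed
    for byte in buf:
        idx = byte ^ (seed >> 24)                      # eor w_tmp, w_tmp, w_seed, lsr 24
        seed = table[idx] ^ ((seed << 8) & 0xFFFFFFFF) # eor w_seed, w_tmp, w_seed, lsl 8
    return ~seed & 0xFFFFFFFF                          # mvn w_crc_ret, w_seed
```

The explicit masking with `0xFFFFFFFF` mirrors what the 32-bit `w` registers do implicitly in the assembly.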


@ -0,0 +1,136 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "crc_common_pmull.h"
.macro crc32_refl_func name:req
.arch armv8-a+crypto
.text
.align 3
.global cdecl(\name)
#ifndef __APPLE__
.type \name, %function
#endif
/* uint32_t crc32_refl_func(uint32_t seed, uint8_t * buf, uint64_t len) */
cdecl(\name\()):
mvn w_seed, w_seed
mov x_counter, 0
cmp x_len, (FOLD_SIZE - 1)
bhi .crc32_clmul_pre
.crc_tab_pre:
cmp x_len, x_counter
bls .done
#ifndef __APPLE__
adrp x_tmp, .lanchor_crc_tab
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, :lo12:.lanchor_crc_tab
#else
adrp x_tmp, .lanchor_crc_tab@PAGE
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, .lanchor_crc_tab@PAGEOFF
#endif
.align 3
.loop_crc_tab:
ldrb w_tmp, [x_buf_iter], 1
cmp x_buf, x_buf_iter
eor w_tmp, w_tmp, w_seed
and w_tmp, w_tmp, 255
ldr w_tmp, [x_crc_tab_addr, w_tmp, uxtw 2]
eor w_seed, w_tmp, w_seed, lsr 8
bhi .loop_crc_tab
.done:
mvn w_crc_ret, w_seed
ret
.align 2
.crc32_clmul_pre:
fmov s_x0, w_seed // save crc to s_x0
crc_refl_load_first_block
bls .clmul_loop_end
crc32_load_p4
// 1024bit --> 512bit loop
// merge x0, x1, x2, x3, y0, y1, y2, y3 => x0, x1, x2, x3 (uint64x2_t)
crc_refl_loop
.clmul_loop_end:
// folding 512bit --> 128bit
crc32_fold_512b_to_128b
// folding 128bit --> 64bit
mov x_tmp, p0_low_b0
movk x_tmp, p0_low_b1, lsl 16
fmov d_p0_low2, x_tmp
mov d_tmp_high, v_x3.d[1]
mov d_p0_low, v_p1.d[1]
pmull v_x3.1q, v_x3.1d, v_p0.1d
eor v_tmp_high.16b, v_tmp_high.16b, v_x3.16b
mov s_x3, v_tmp_high.s[0]
ext v_tmp_high.16b, v_tmp_high.16b, v_tmp_high.16b, #4
pmull v_x3.1q, v_x3.1d, v_p02.1d
// barrett reduction
mov x_tmp2, br_high_b0
movk x_tmp2, br_high_b1, lsl 16
movk x_tmp2, br_high_b2, lsl 32
fmov d_br_high, x_tmp2
mov x_tmp, br_low_b0
movk x_tmp, br_low_b1, lsl 16
movk x_tmp, br_low_b2, lsl 32
fmov d_br_low, x_tmp
eor v_tmp_high.16b, v_tmp_high.16b, v_x3.16b
mov s_x3, v_tmp_high.s[0]
pmull v_x3.1q, v_x3.1d, v_br_high.1d
mov s_x3, v_x3.s[0]
pmull v_x3.1q, v_x3.1d, v_br_low.1d
eor v_tmp_high.8b, v_tmp_high.8b, v_x3.8b
umov w_seed, v_tmp_high.s[1]
b .crc_tab_pre
#ifndef __APPLE__
.size \name, .-\name
#endif
.endm
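
The reflected variant differs from the normal one only in which end of the CRC register the table index is taken from and in the shift direction. A minimal Python sketch of `.loop_crc_tab` in `crc32_refl_func` follows, assuming the reflected IEEE polynomial `0xEDB88320`. The polynomial choice and the helper names are assumptions for illustration, not taken from this file.

```python
POLY_IEEE_REFLECTED = 0xEDB88320  # assumed polynomial, for illustration only


def make_table_refl(poly_reflected):
    """Build a 256-entry lookup table for an LSB-first 32-bit CRC."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (poly_reflected if crc & 1 else 0)
        table.append(crc)
    return table


def crc32_refl_bytes(seed, buf, table):
    """Hypothetical model of the byte-wise fallback in crc32_refl_func."""
    seed = ~seed & 0xFFFFFFFF              # mvn w_seed, w_seed
    for byte in buf:
        idx = (byte ^ seed) & 255          # eor + and w_tmp, w_tmp, 255
        seed = table[idx] ^ (seed >> 8)    # eor w_seed, w_tmp, w_seed, lsr 8
    return ~seed & 0xFFFFFFFF              # mvn w_crc_ret, w_seed
```

Since the seed is inverted on entry and the result inverted on exit, feeding the result of one call in as the seed of the next continues the same CRC across buffers.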


@ -0,0 +1,125 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.text
.arch armv8-a+crypto+crc
.align 6
.macro crc32_u64 dst,src,data
crc32cx \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32cw \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32ch \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32cb \dst,\src,\data
.endm
#include "crc32_mix_default_common.S"
.global cdecl(crc32c_mix_default)
#ifndef __APPLE__
.type crc32c_mix_default, %function
#endif
cdecl(crc32c_mix_default):
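mov w3, w2 // stash the crc argument (w2) in a scratch register
sxtw x2, w1 // sign-extend the 32-bit length argument into x2
mov x1, x0 // move the buffer pointer to x1
mov w0, w3 // place the crc in w0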
crc32_mix_main_default
#ifndef __APPLE__
.size crc32c_mix_default, .-crc32c_mix_default
#endif
ASM_DEF_RODATA
.align 4
.set lanchor_crc32,. + 0
#ifndef __APPLE__
.type k1k2, %object
.size k1k2, 16
#endif
k1k2:
.xword 0x00740eef02
.xword 0x009e4addf8
#ifndef __APPLE__
.type k3k4, %object
.size k3k4, 16
#endif
k3k4:
.xword 0x00f20c0dfe
.xword 0x014cd00bd6
#ifndef __APPLE__
.type k5k0, %object
.size k5k0, 16
#endif
k5k0:
.xword 0x00dd45aab8
.xword 0
#ifndef __APPLE__
.type poly, %object
.size poly, 16
#endif
poly:
.xword 0x0105ec76f0
.xword 0x00dea713f1
#ifndef __APPLE__
.type crc32_const, %object
.size crc32_const, 48
#endif
crc32_const:
.xword 0x9ef68d35
.xword 0
.xword 0x170076fa
.xword 0
.xword 0xdd7e3b0c
.xword 0
.align 4
.set .lanchor_mask,. + 0
#ifndef __APPLE__
.type mask, %object
.size mask, 16
#endif
mask:
.word -1
.word 0
.word -1
.word 0
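
The `crc32_const` entries above are consumed by the merge step at the end of `crc32_mix2048`, where the PMULL lane and the three hardware-CRC lanes are combined into one value. As far as the load offsets suggest, each 2048-byte block is split into four 512-byte lanes. The sketch below illustrates why independent lanes can be merged at all: a raw CRC (no inversion) is linear over GF(2), so a lane's CRC can be "shifted" past the bytes that follow it and XORed with the later lanes. Note the swapped-in technique: the assembly does the shift with a carry-less multiply by a precomputed constant, whereas this sketch simply feeds zero bytes. All names here are hypothetical.

```python
POLY_CRC32C_REFLECTED = 0x82F63B78  # reflected Castagnoli polynomial


def raw_crc32c(state, data):
    """Bitwise CRC32C register update without any entry/exit inversion."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ (POLY_CRC32C_REFLECTED if state & 1 else 0)
    return state


def merge_lanes(state_in, lanes):
    """Combine per-lane CRCs; only the first lane starts from state_in."""
    merged = 0
    remaining = sum(len(lane) for lane in lanes)
    for index, lane in enumerate(lanes):
        remaining -= len(lane)
        start = state_in if index == 0 else 0
        lane_crc = raw_crc32c(start, lane)
        # Shift the lane CRC past the bytes that follow it by feeding zeros.
        merged ^= raw_crc32c(lane_crc, bytes(remaining))
    return merged
```

Feeding zeros costs time proportional to the lane length, which is exactly what the precomputed multiplication constants avoid; the sketch trades that speed for readability.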


@ -0,0 +1,72 @@
/**********************************************************************
Copyright(c) 2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
.text
.align 6
.arch armv8-a+crypto+crc
#include "crc32_common_mix_neoverse_n1.S"
.Lconstants:
.octa 0x000000009e4addf800000000740eef02
.octa 0x000000014cd00bd600000000f20c0dfe
.octa 0x00000000dea713f10000000105ec76f0
.quad 0x00000000dd45aab8
.quad 0x00000000FFFFFFFF
.quad 0x000000009ef68d35
.macro crc32_u64 dst,src,data
crc32cx \dst,\src,\data
.endm
.macro crc32_u32 dst,src,data
crc32cw \dst,\src,\data
.endm
.macro crc32_u16 dst,src,data
crc32ch \dst,\src,\data
.endm
.macro crc32_u8 dst,src,data
crc32cb \dst,\src,\data
.endm
/**
* uint32_t crc32c_mix_neoverse_n1(uint8_t * BUF,
* size_t LEN, uint CRC)
*/
BUF .req x0
LEN .req x1
CRC .req x2
wCRC .req w2
.align 6
.global cdecl(crc32c_mix_neoverse_n1)
#ifndef __APPLE__
.type crc32c_mix_neoverse_n1, %function
#endif
cdecl(crc32c_mix_neoverse_n1):
crc32_common_mix crc32c
#ifndef __APPLE__
.size crc32c_mix_neoverse_n1, .-crc32c_mix_neoverse_n1
#endif


@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_ecma_norm_pmull.h"
#include "crc64_norm_common_pmull.h"
crc64_norm_func crc64_ecma_norm_pmull


@ -0,0 +1,201 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 (0xf020)
#define p4_low_b1 0x540d
#define p4_low_b2 0x43ca
#define p4_low_b3 0x5f68
#define p4_high_b0 0xb83f
#define p4_high_b1 0x1205
#define p4_high_b2 0xb698
#define p4_high_b3 0xddf4
#define p1_low_b0 (0xfab6)
#define p1_low_b1 0xeb52
#define p1_low_b2 0xc3c7
#define p1_low_b3 0x05f5
#define p1_high_b0 0x740e
#define p1_high_b1 0xd257
#define p1_high_b2 0x38a7
#define p1_high_b3 0x4eb9
#define p0_low_b0 (0xfab6)
#define p0_low_b1 0xeb52
#define p0_low_b2 0xc3c7
#define p0_low_b3 0x05f5
#define p0_high_b0 0x0
#define p0_high_b1 0x0
#define p0_high_b2 0x0
#define p0_high_b3 0x0
#define br_low_b0 (0xf872)
#define br_low_b1 0x6cc4
#define br_low_b2 0x29d0
#define br_low_b3 0x578d
#define br_high_b0 0x3693
#define br_high_b1 0xa9ea
#define br_high_b2 0xe1eb
#define br_high_b3 0x42f0
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0x42f0e1eba9ea3693
.xword 0x85e1c3d753d46d26, 0xc711223cfa3e5bb5
.xword 0x493366450e42ecdf, 0x0bc387aea7a8da4c
.xword 0xccd2a5925d9681f9, 0x8e224479f47cb76a
.xword 0x9266cc8a1c85d9be, 0xd0962d61b56fef2d
.xword 0x17870f5d4f51b498, 0x5577eeb6e6bb820b
.xword 0xdb55aacf12c73561, 0x99a54b24bb2d03f2
.xword 0x5eb4691841135847, 0x1c4488f3e8f96ed4
.xword 0x663d78ff90e185ef, 0x24cd9914390bb37c
.xword 0xe3dcbb28c335e8c9, 0xa12c5ac36adfde5a
.xword 0x2f0e1eba9ea36930, 0x6dfeff5137495fa3
.xword 0xaaefdd6dcd770416, 0xe81f3c86649d3285
.xword 0xf45bb4758c645c51, 0xb6ab559e258e6ac2
.xword 0x71ba77a2dfb03177, 0x334a9649765a07e4
.xword 0xbd68d2308226b08e, 0xff9833db2bcc861d
.xword 0x388911e7d1f2dda8, 0x7a79f00c7818eb3b
.xword 0xcc7af1ff21c30bde, 0x8e8a101488293d4d
.xword 0x499b3228721766f8, 0x0b6bd3c3dbfd506b
.xword 0x854997ba2f81e701, 0xc7b97651866bd192
.xword 0x00a8546d7c558a27, 0x4258b586d5bfbcb4
.xword 0x5e1c3d753d46d260, 0x1cecdc9e94ace4f3
.xword 0xdbfdfea26e92bf46, 0x990d1f49c77889d5
.xword 0x172f5b3033043ebf, 0x55dfbadb9aee082c
.xword 0x92ce98e760d05399, 0xd03e790cc93a650a
.xword 0xaa478900b1228e31, 0xe8b768eb18c8b8a2
.xword 0x2fa64ad7e2f6e317, 0x6d56ab3c4b1cd584
.xword 0xe374ef45bf6062ee, 0xa1840eae168a547d
.xword 0x66952c92ecb40fc8, 0x2465cd79455e395b
.xword 0x3821458aada7578f, 0x7ad1a461044d611c
.xword 0xbdc0865dfe733aa9, 0xff3067b657990c3a
.xword 0x711223cfa3e5bb50, 0x33e2c2240a0f8dc3
.xword 0xf4f3e018f031d676, 0xb60301f359dbe0e5
.xword 0xda050215ea6c212f, 0x98f5e3fe438617bc
.xword 0x5fe4c1c2b9b84c09, 0x1d14202910527a9a
.xword 0x93366450e42ecdf0, 0xd1c685bb4dc4fb63
.xword 0x16d7a787b7faa0d6, 0x5427466c1e109645
.xword 0x4863ce9ff6e9f891, 0x0a932f745f03ce02
.xword 0xcd820d48a53d95b7, 0x8f72eca30cd7a324
.xword 0x0150a8daf8ab144e, 0x43a04931514122dd
.xword 0x84b16b0dab7f7968, 0xc6418ae602954ffb
.xword 0xbc387aea7a8da4c0, 0xfec89b01d3679253
.xword 0x39d9b93d2959c9e6, 0x7b2958d680b3ff75
.xword 0xf50b1caf74cf481f, 0xb7fbfd44dd257e8c
.xword 0x70eadf78271b2539, 0x321a3e938ef113aa
.xword 0x2e5eb66066087d7e, 0x6cae578bcfe24bed
.xword 0xabbf75b735dc1058, 0xe94f945c9c3626cb
.xword 0x676dd025684a91a1, 0x259d31cec1a0a732
.xword 0xe28c13f23b9efc87, 0xa07cf2199274ca14
.xword 0x167ff3eacbaf2af1, 0x548f120162451c62
.xword 0x939e303d987b47d7, 0xd16ed1d631917144
.xword 0x5f4c95afc5edc62e, 0x1dbc74446c07f0bd
.xword 0xdaad56789639ab08, 0x985db7933fd39d9b
.xword 0x84193f60d72af34f, 0xc6e9de8b7ec0c5dc
.xword 0x01f8fcb784fe9e69, 0x43081d5c2d14a8fa
.xword 0xcd2a5925d9681f90, 0x8fdab8ce70822903
.xword 0x48cb9af28abc72b6, 0x0a3b7b1923564425
.xword 0x70428b155b4eaf1e, 0x32b26afef2a4998d
.xword 0xf5a348c2089ac238, 0xb753a929a170f4ab
.xword 0x3971ed50550c43c1, 0x7b810cbbfce67552
.xword 0xbc902e8706d82ee7, 0xfe60cf6caf321874
.xword 0xe224479f47cb76a0, 0xa0d4a674ee214033
.xword 0x67c58448141f1b86, 0x253565a3bdf52d15
.xword 0xab1721da49899a7f, 0xe9e7c031e063acec
.xword 0x2ef6e20d1a5df759, 0x6c0603e6b3b7c1ca
.xword 0xf6fae5c07d3274cd, 0xb40a042bd4d8425e
.xword 0x731b26172ee619eb, 0x31ebc7fc870c2f78
.xword 0xbfc9838573709812, 0xfd39626eda9aae81
.xword 0x3a28405220a4f534, 0x78d8a1b9894ec3a7
.xword 0x649c294a61b7ad73, 0x266cc8a1c85d9be0
.xword 0xe17dea9d3263c055, 0xa38d0b769b89f6c6
.xword 0x2daf4f0f6ff541ac, 0x6f5faee4c61f773f
.xword 0xa84e8cd83c212c8a, 0xeabe6d3395cb1a19
.xword 0x90c79d3fedd3f122, 0xd2377cd44439c7b1
.xword 0x15265ee8be079c04, 0x57d6bf0317edaa97
.xword 0xd9f4fb7ae3911dfd, 0x9b041a914a7b2b6e
.xword 0x5c1538adb04570db, 0x1ee5d94619af4648
.xword 0x02a151b5f156289c, 0x4051b05e58bc1e0f
.xword 0x87409262a28245ba, 0xc5b073890b687329
.xword 0x4b9237f0ff14c443, 0x0962d61b56fef2d0
.xword 0xce73f427acc0a965, 0x8c8315cc052a9ff6
.xword 0x3a80143f5cf17f13, 0x7870f5d4f51b4980
.xword 0xbf61d7e80f251235, 0xfd913603a6cf24a6
.xword 0x73b3727a52b393cc, 0x31439391fb59a55f
.xword 0xf652b1ad0167feea, 0xb4a25046a88dc879
.xword 0xa8e6d8b54074a6ad, 0xea16395ee99e903e
.xword 0x2d071b6213a0cb8b, 0x6ff7fa89ba4afd18
.xword 0xe1d5bef04e364a72, 0xa3255f1be7dc7ce1
.xword 0x64347d271de22754, 0x26c49cccb40811c7
.xword 0x5cbd6cc0cc10fafc, 0x1e4d8d2b65facc6f
.xword 0xd95caf179fc497da, 0x9bac4efc362ea149
.xword 0x158e0a85c2521623, 0x577eeb6e6bb820b0
.xword 0x906fc95291867b05, 0xd29f28b9386c4d96
.xword 0xcedba04ad0952342, 0x8c2b41a1797f15d1
.xword 0x4b3a639d83414e64, 0x09ca82762aab78f7
.xword 0x87e8c60fded7cf9d, 0xc51827e4773df90e
.xword 0x020905d88d03a2bb, 0x40f9e43324e99428
.xword 0x2cffe7d5975e55e2, 0x6e0f063e3eb46371
.xword 0xa91e2402c48a38c4, 0xebeec5e96d600e57
.xword 0x65cc8190991cb93d, 0x273c607b30f68fae
.xword 0xe02d4247cac8d41b, 0xa2dda3ac6322e288
.xword 0xbe992b5f8bdb8c5c, 0xfc69cab42231bacf
.xword 0x3b78e888d80fe17a, 0x7988096371e5d7e9
.xword 0xf7aa4d1a85996083, 0xb55aacf12c735610
.xword 0x724b8ecdd64d0da5, 0x30bb6f267fa73b36
.xword 0x4ac29f2a07bfd00d, 0x08327ec1ae55e69e
.xword 0xcf235cfd546bbd2b, 0x8dd3bd16fd818bb8
.xword 0x03f1f96f09fd3cd2, 0x41011884a0170a41
.xword 0x86103ab85a2951f4, 0xc4e0db53f3c36767
.xword 0xd8a453a01b3a09b3, 0x9a54b24bb2d03f20
.xword 0x5d45907748ee6495, 0x1fb5719ce1045206
.xword 0x919735e51578e56c, 0xd367d40ebc92d3ff
.xword 0x1476f63246ac884a, 0x568617d9ef46bed9
.xword 0xe085162ab69d5e3c, 0xa275f7c11f7768af
.xword 0x6564d5fde549331a, 0x279434164ca30589
.xword 0xa9b6706fb8dfb2e3, 0xeb46918411358470
.xword 0x2c57b3b8eb0bdfc5, 0x6ea7525342e1e956
.xword 0x72e3daa0aa188782, 0x30133b4b03f2b111
.xword 0xf7021977f9cceaa4, 0xb5f2f89c5026dc37
.xword 0x3bd0bce5a45a6b5d, 0x79205d0e0db05dce
.xword 0xbe317f32f78e067b, 0xfcc19ed95e6430e8
.xword 0x86b86ed5267cdbd3, 0xc4488f3e8f96ed40
.xword 0x0359ad0275a8b6f5, 0x41a94ce9dc428066
.xword 0xcf8b0890283e370c, 0x8d7be97b81d4019f
.xword 0x4a6acb477bea5a2a, 0x089a2aacd2006cb9
.xword 0x14dea25f3af9026d, 0x562e43b4931334fe
.xword 0x913f6188692d6f4b, 0xd3cf8063c0c759d8
.xword 0x5dedc41a34bbeeb2, 0x1f1d25f19d51d821
.xword 0xd80c07cd676f8394, 0x9afce626ce85b507


@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_ecma_refl_pmull.h"
#include "crc64_refl_common_pmull.h"
crc64_refl_func crc64_ecma_refl_pmull


@ -0,0 +1,197 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 0x41f3
#define p4_low_b1 0x9dd4
#define p4_low_b2 0xefbb
#define p4_low_b3 0x6ae3
#define p4_high_b0 0x2df4
#define p4_high_b1 0xa784
#define p4_high_b2 0x6054
#define p4_high_b3 0x081f
#define p1_low_b0 0x3ae4
#define p1_low_b1 0xca39
#define p1_low_b2 0xd497
#define p1_low_b3 0xe05d
#define p1_high_b0 0x5f40
#define p1_high_b1 0xc787
#define p1_high_b2 0x95af
#define p1_high_b3 0xdabe
#define p0_low_b0 0x5f40
#define p0_low_b1 0xc787
#define p0_low_b2 0x95af
#define p0_low_b3 0xdabe
#define br_low_b0 0x63d5
#define br_low_b1 0x1729
#define br_low_b2 0x466c
#define br_low_b3 0x9c3e
#define br_high_b0 0x1e85
#define br_high_b1 0xaf0e
#define br_high_b2 0xaf2b
#define br_high_b3 0x92d8
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0xb32e4cbe03a75f6f
.xword 0xf4843657a840a05b, 0x47aa7ae9abe7ff34
.xword 0x7bd0c384ff8f5e33, 0xc8fe8f3afc28015c
.xword 0x8f54f5d357cffe68, 0x3c7ab96d5468a107
.xword 0xf7a18709ff1ebc66, 0x448fcbb7fcb9e309
.xword 0x0325b15e575e1c3d, 0xb00bfde054f94352
.xword 0x8c71448d0091e255, 0x3f5f08330336bd3a
.xword 0x78f572daa8d1420e, 0xcbdb3e64ab761d61
.xword 0x7d9ba13851336649, 0xceb5ed8652943926
.xword 0x891f976ff973c612, 0x3a31dbd1fad4997d
.xword 0x064b62bcaebc387a, 0xb5652e02ad1b6715
.xword 0xf2cf54eb06fc9821, 0x41e11855055bc74e
.xword 0x8a3a2631ae2dda2f, 0x39146a8fad8a8540
.xword 0x7ebe1066066d7a74, 0xcd905cd805ca251b
.xword 0xf1eae5b551a2841c, 0x42c4a90b5205db73
.xword 0x056ed3e2f9e22447, 0xb6409f5cfa457b28
.xword 0xfb374270a266cc92, 0x48190ecea1c193fd
.xword 0x0fb374270a266cc9, 0xbc9d3899098133a6
.xword 0x80e781f45de992a1, 0x33c9cd4a5e4ecdce
.xword 0x7463b7a3f5a932fa, 0xc74dfb1df60e6d95
.xword 0x0c96c5795d7870f4, 0xbfb889c75edf2f9b
.xword 0xf812f32ef538d0af, 0x4b3cbf90f69f8fc0
.xword 0x774606fda2f72ec7, 0xc4684a43a15071a8
.xword 0x83c230aa0ab78e9c, 0x30ec7c140910d1f3
.xword 0x86ace348f355aadb, 0x3582aff6f0f2f5b4
.xword 0x7228d51f5b150a80, 0xc10699a158b255ef
.xword 0xfd7c20cc0cdaf4e8, 0x4e526c720f7dab87
.xword 0x09f8169ba49a54b3, 0xbad65a25a73d0bdc
.xword 0x710d64410c4b16bd, 0xc22328ff0fec49d2
.xword 0x85895216a40bb6e6, 0x36a71ea8a7ace989
.xword 0x0adda7c5f3c4488e, 0xb9f3eb7bf06317e1
.xword 0xfe5991925b84e8d5, 0x4d77dd2c5823b7ba
.xword 0x64b62bcaebc387a1, 0xd7986774e864d8ce
.xword 0x90321d9d438327fa, 0x231c512340247895
.xword 0x1f66e84e144cd992, 0xac48a4f017eb86fd
.xword 0xebe2de19bc0c79c9, 0x58cc92a7bfab26a6
.xword 0x9317acc314dd3bc7, 0x2039e07d177a64a8
.xword 0x67939a94bc9d9b9c, 0xd4bdd62abf3ac4f3
.xword 0xe8c76f47eb5265f4, 0x5be923f9e8f53a9b
.xword 0x1c4359104312c5af, 0xaf6d15ae40b59ac0
.xword 0x192d8af2baf0e1e8, 0xaa03c64cb957be87
.xword 0xeda9bca512b041b3, 0x5e87f01b11171edc
.xword 0x62fd4976457fbfdb, 0xd1d305c846d8e0b4
.xword 0x96797f21ed3f1f80, 0x2557339fee9840ef
.xword 0xee8c0dfb45ee5d8e, 0x5da24145464902e1
.xword 0x1a083bacedaefdd5, 0xa9267712ee09a2ba
.xword 0x955cce7fba6103bd, 0x267282c1b9c65cd2
.xword 0x61d8f8281221a3e6, 0xd2f6b4961186fc89
.xword 0x9f8169ba49a54b33, 0x2caf25044a02145c
.xword 0x6b055fede1e5eb68, 0xd82b1353e242b407
.xword 0xe451aa3eb62a1500, 0x577fe680b58d4a6f
.xword 0x10d59c691e6ab55b, 0xa3fbd0d71dcdea34
.xword 0x6820eeb3b6bbf755, 0xdb0ea20db51ca83a
.xword 0x9ca4d8e41efb570e, 0x2f8a945a1d5c0861
.xword 0x13f02d374934a966, 0xa0de61894a93f609
.xword 0xe7741b60e174093d, 0x545a57dee2d35652
.xword 0xe21ac88218962d7a, 0x5134843c1b317215
.xword 0x169efed5b0d68d21, 0xa5b0b26bb371d24e
.xword 0x99ca0b06e7197349, 0x2ae447b8e4be2c26
.xword 0x6d4e3d514f59d312, 0xde6071ef4cfe8c7d
.xword 0x15bb4f8be788911c, 0xa6950335e42fce73
.xword 0xe13f79dc4fc83147, 0x521135624c6f6e28
.xword 0x6e6b8c0f1807cf2f, 0xdd45c0b11ba09040
.xword 0x9aefba58b0476f74, 0x29c1f6e6b3e0301b
.xword 0xc96c5795d7870f42, 0x7a421b2bd420502d
.xword 0x3de861c27fc7af19, 0x8ec62d7c7c60f076
.xword 0xb2bc941128085171, 0x0192d8af2baf0e1e
.xword 0x4638a2468048f12a, 0xf516eef883efae45
.xword 0x3ecdd09c2899b324, 0x8de39c222b3eec4b
.xword 0xca49e6cb80d9137f, 0x7967aa75837e4c10
.xword 0x451d1318d716ed17, 0xf6335fa6d4b1b278
.xword 0xb199254f7f564d4c, 0x02b769f17cf11223
.xword 0xb4f7f6ad86b4690b, 0x07d9ba1385133664
.xword 0x4073c0fa2ef4c950, 0xf35d8c442d53963f
.xword 0xcf273529793b3738, 0x7c0979977a9c6857
.xword 0x3ba3037ed17b9763, 0x888d4fc0d2dcc80c
.xword 0x435671a479aad56d, 0xf0783d1a7a0d8a02
.xword 0xb7d247f3d1ea7536, 0x04fc0b4dd24d2a59
.xword 0x3886b22086258b5e, 0x8ba8fe9e8582d431
.xword 0xcc0284772e652b05, 0x7f2cc8c92dc2746a
.xword 0x325b15e575e1c3d0, 0x8175595b76469cbf
.xword 0xc6df23b2dda1638b, 0x75f16f0cde063ce4
.xword 0x498bd6618a6e9de3, 0xfaa59adf89c9c28c
.xword 0xbd0fe036222e3db8, 0x0e21ac88218962d7
.xword 0xc5fa92ec8aff7fb6, 0x76d4de52895820d9
.xword 0x317ea4bb22bfdfed, 0x8250e80521188082
.xword 0xbe2a516875702185, 0x0d041dd676d77eea
.xword 0x4aae673fdd3081de, 0xf9802b81de97deb1
.xword 0x4fc0b4dd24d2a599, 0xfceef8632775faf6
.xword 0xbb44828a8c9205c2, 0x086ace348f355aad
.xword 0x34107759db5dfbaa, 0x873e3be7d8faa4c5
.xword 0xc094410e731d5bf1, 0x73ba0db070ba049e
.xword 0xb86133d4dbcc19ff, 0x0b4f7f6ad86b4690
.xword 0x4ce50583738cb9a4, 0xffcb493d702be6cb
.xword 0xc3b1f050244347cc, 0x709fbcee27e418a3
.xword 0x3735c6078c03e797, 0x841b8ab98fa4b8f8
.xword 0xadda7c5f3c4488e3, 0x1ef430e13fe3d78c
.xword 0x595e4a08940428b8, 0xea7006b697a377d7
.xword 0xd60abfdbc3cbd6d0, 0x6524f365c06c89bf
.xword 0x228e898c6b8b768b, 0x91a0c532682c29e4
.xword 0x5a7bfb56c35a3485, 0xe955b7e8c0fd6bea
.xword 0xaeffcd016b1a94de, 0x1dd181bf68bdcbb1
.xword 0x21ab38d23cd56ab6, 0x9285746c3f7235d9
.xword 0xd52f0e859495caed, 0x6601423b97329582
.xword 0xd041dd676d77eeaa, 0x636f91d96ed0b1c5
.xword 0x24c5eb30c5374ef1, 0x97eba78ec690119e
.xword 0xab911ee392f8b099, 0x18bf525d915feff6
.xword 0x5f1528b43ab810c2, 0xec3b640a391f4fad
.xword 0x27e05a6e926952cc, 0x94ce16d091ce0da3
.xword 0xd3646c393a29f297, 0x604a2087398eadf8
.xword 0x5c3099ea6de60cff, 0xef1ed5546e415390
.xword 0xa8b4afbdc5a6aca4, 0x1b9ae303c601f3cb
.xword 0x56ed3e2f9e224471, 0xe5c372919d851b1e
.xword 0xa26908783662e42a, 0x114744c635c5bb45
.xword 0x2d3dfdab61ad1a42, 0x9e13b115620a452d
.xword 0xd9b9cbfcc9edba19, 0x6a978742ca4ae576
.xword 0xa14cb926613cf817, 0x1262f598629ba778
.xword 0x55c88f71c97c584c, 0xe6e6c3cfcadb0723
.xword 0xda9c7aa29eb3a624, 0x69b2361c9d14f94b
.xword 0x2e184cf536f3067f, 0x9d36004b35545910
.xword 0x2b769f17cf112238, 0x9858d3a9ccb67d57
.xword 0xdff2a94067518263, 0x6cdce5fe64f6dd0c
.xword 0x50a65c93309e7c0b, 0xe388102d33392364
.xword 0xa4226ac498dedc50, 0x170c267a9b79833f
.xword 0xdcd7181e300f9e5e, 0x6ff954a033a8c131
.xword 0x28532e49984f3e05, 0x9b7d62f79be8616a
.xword 0xa707db9acf80c06d, 0x14299724cc279f02
.xword 0x5383edcd67c06036, 0xe0ada17364673f59
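
The 256-entry `crc64_tab` above can be cross-checked against a byte-wise table generator. The following is a minimal sketch, not the generator used by the project: it assumes the table belongs to a reflected (LSB-first) CRC-64 with the ECMA-182 polynomial `0x42F0E1EBA9EA3693`, and the names `reflect64`, `make_refl_table` and `ECMA_POLY` are hypothetical helpers introduced only for illustration.

```python
MASK64 = (1 << 64) - 1

def reflect64(value):
    """Reverse the bit order of a 64-bit integer."""
    return int(f"{value & MASK64:064b}"[::-1], 2)

def make_refl_table(poly_normal):
    """Build the 256-entry byte table for a reflected (LSB-first) CRC-64."""
    poly_refl = reflect64(poly_normal)
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift one bit out of the low end; fold in the reflected
            # polynomial whenever the bit shifted out was set.
            crc = (crc >> 1) ^ poly_refl if crc & 1 else crc >> 1
        table.append(crc)
    return table

# Assumption: ECMA-182 polynomial in normal (MSB-first) form.
ECMA_POLY = 0x42F0E1EBA9EA3693
ecma_refl_table = make_refl_table(ECMA_POLY)

# Print the first two rows in the same two-per-line layout as the .xword data.
for i in range(0, 4, 2):
    print(f".xword 0x{ecma_refl_table[i]:016x}, 0x{ecma_refl_table[i + 1]:016x}")
```

Entry 128 of a reflected table is the reflected polynomial itself, which gives a quick way to identify which polynomial a given table was built from.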


@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_iso_norm_pmull.h"
#include "crc64_norm_common_pmull.h"
crc64_norm_func crc64_iso_norm_pmull


@ -0,0 +1,202 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 (0x0101)
#define p4_low_b1 0x0100
#define p4_low_b2 0x0001
#define p4_low_b3 0x0000
#define p4_high_b0 0x1b1b
#define p4_high_b1 0x1b00
#define p4_high_b2 0x001b
#define p4_high_b3 0x0000
#define p1_low_b0 (0x0145)
#define p1_low_b1 0x0000
#define p1_low_b2 0x0000
#define p1_low_b3 0x0000
#define p1_high_b0 0x1db7
#define p1_high_b1 0x0000
#define p1_high_b2 0x0000
#define p1_high_b3 0x0000
#define p0_low_b0 (0x0145)
#define p0_low_b1 0x0000
#define p0_low_b2 0x0000
#define p0_low_b3 0x0000
#define p0_high_b0 0x0000
#define p0_high_b1 0x0000
#define p0_high_b2 0x0000
#define p0_high_b3 0x0000
#define br_low_b0 (0x001b)
#define br_low_b1 0x0000
#define br_low_b2 0x0000
#define br_low_b3 0x0000
#define br_high_b0 0x001b
#define br_high_b1 0x0000
#define br_high_b2 0x0000
#define br_high_b3 0x0000
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0x000000000000001b
.xword 0x0000000000000036, 0x000000000000002d
.xword 0x000000000000006c, 0x0000000000000077
.xword 0x000000000000005a, 0x0000000000000041
.xword 0x00000000000000d8, 0x00000000000000c3
.xword 0x00000000000000ee, 0x00000000000000f5
.xword 0x00000000000000b4, 0x00000000000000af
.xword 0x0000000000000082, 0x0000000000000099
.xword 0x00000000000001b0, 0x00000000000001ab
.xword 0x0000000000000186, 0x000000000000019d
.xword 0x00000000000001dc, 0x00000000000001c7
.xword 0x00000000000001ea, 0x00000000000001f1
.xword 0x0000000000000168, 0x0000000000000173
.xword 0x000000000000015e, 0x0000000000000145
.xword 0x0000000000000104, 0x000000000000011f
.xword 0x0000000000000132, 0x0000000000000129
.xword 0x0000000000000360, 0x000000000000037b
.xword 0x0000000000000356, 0x000000000000034d
.xword 0x000000000000030c, 0x0000000000000317
.xword 0x000000000000033a, 0x0000000000000321
.xword 0x00000000000003b8, 0x00000000000003a3
.xword 0x000000000000038e, 0x0000000000000395
.xword 0x00000000000003d4, 0x00000000000003cf
.xword 0x00000000000003e2, 0x00000000000003f9
.xword 0x00000000000002d0, 0x00000000000002cb
.xword 0x00000000000002e6, 0x00000000000002fd
.xword 0x00000000000002bc, 0x00000000000002a7
.xword 0x000000000000028a, 0x0000000000000291
.xword 0x0000000000000208, 0x0000000000000213
.xword 0x000000000000023e, 0x0000000000000225
.xword 0x0000000000000264, 0x000000000000027f
.xword 0x0000000000000252, 0x0000000000000249
.xword 0x00000000000006c0, 0x00000000000006db
.xword 0x00000000000006f6, 0x00000000000006ed
.xword 0x00000000000006ac, 0x00000000000006b7
.xword 0x000000000000069a, 0x0000000000000681
.xword 0x0000000000000618, 0x0000000000000603
.xword 0x000000000000062e, 0x0000000000000635
.xword 0x0000000000000674, 0x000000000000066f
.xword 0x0000000000000642, 0x0000000000000659
.xword 0x0000000000000770, 0x000000000000076b
.xword 0x0000000000000746, 0x000000000000075d
.xword 0x000000000000071c, 0x0000000000000707
.xword 0x000000000000072a, 0x0000000000000731
.xword 0x00000000000007a8, 0x00000000000007b3
.xword 0x000000000000079e, 0x0000000000000785
.xword 0x00000000000007c4, 0x00000000000007df
.xword 0x00000000000007f2, 0x00000000000007e9
.xword 0x00000000000005a0, 0x00000000000005bb
.xword 0x0000000000000596, 0x000000000000058d
.xword 0x00000000000005cc, 0x00000000000005d7
.xword 0x00000000000005fa, 0x00000000000005e1
.xword 0x0000000000000578, 0x0000000000000563
.xword 0x000000000000054e, 0x0000000000000555
.xword 0x0000000000000514, 0x000000000000050f
.xword 0x0000000000000522, 0x0000000000000539
.xword 0x0000000000000410, 0x000000000000040b
.xword 0x0000000000000426, 0x000000000000043d
.xword 0x000000000000047c, 0x0000000000000467
.xword 0x000000000000044a, 0x0000000000000451
.xword 0x00000000000004c8, 0x00000000000004d3
.xword 0x00000000000004fe, 0x00000000000004e5
.xword 0x00000000000004a4, 0x00000000000004bf
.xword 0x0000000000000492, 0x0000000000000489
.xword 0x0000000000000d80, 0x0000000000000d9b
.xword 0x0000000000000db6, 0x0000000000000dad
.xword 0x0000000000000dec, 0x0000000000000df7
.xword 0x0000000000000dda, 0x0000000000000dc1
.xword 0x0000000000000d58, 0x0000000000000d43
.xword 0x0000000000000d6e, 0x0000000000000d75
.xword 0x0000000000000d34, 0x0000000000000d2f
.xword 0x0000000000000d02, 0x0000000000000d19
.xword 0x0000000000000c30, 0x0000000000000c2b
.xword 0x0000000000000c06, 0x0000000000000c1d
.xword 0x0000000000000c5c, 0x0000000000000c47
.xword 0x0000000000000c6a, 0x0000000000000c71
.xword 0x0000000000000ce8, 0x0000000000000cf3
.xword 0x0000000000000cde, 0x0000000000000cc5
.xword 0x0000000000000c84, 0x0000000000000c9f
.xword 0x0000000000000cb2, 0x0000000000000ca9
.xword 0x0000000000000ee0, 0x0000000000000efb
.xword 0x0000000000000ed6, 0x0000000000000ecd
.xword 0x0000000000000e8c, 0x0000000000000e97
.xword 0x0000000000000eba, 0x0000000000000ea1
.xword 0x0000000000000e38, 0x0000000000000e23
.xword 0x0000000000000e0e, 0x0000000000000e15
.xword 0x0000000000000e54, 0x0000000000000e4f
.xword 0x0000000000000e62, 0x0000000000000e79
.xword 0x0000000000000f50, 0x0000000000000f4b
.xword 0x0000000000000f66, 0x0000000000000f7d
.xword 0x0000000000000f3c, 0x0000000000000f27
.xword 0x0000000000000f0a, 0x0000000000000f11
.xword 0x0000000000000f88, 0x0000000000000f93
.xword 0x0000000000000fbe, 0x0000000000000fa5
.xword 0x0000000000000fe4, 0x0000000000000fff
.xword 0x0000000000000fd2, 0x0000000000000fc9
.xword 0x0000000000000b40, 0x0000000000000b5b
.xword 0x0000000000000b76, 0x0000000000000b6d
.xword 0x0000000000000b2c, 0x0000000000000b37
.xword 0x0000000000000b1a, 0x0000000000000b01
.xword 0x0000000000000b98, 0x0000000000000b83
.xword 0x0000000000000bae, 0x0000000000000bb5
.xword 0x0000000000000bf4, 0x0000000000000bef
.xword 0x0000000000000bc2, 0x0000000000000bd9
.xword 0x0000000000000af0, 0x0000000000000aeb
.xword 0x0000000000000ac6, 0x0000000000000add
.xword 0x0000000000000a9c, 0x0000000000000a87
.xword 0x0000000000000aaa, 0x0000000000000ab1
.xword 0x0000000000000a28, 0x0000000000000a33
.xword 0x0000000000000a1e, 0x0000000000000a05
.xword 0x0000000000000a44, 0x0000000000000a5f
.xword 0x0000000000000a72, 0x0000000000000a69
.xword 0x0000000000000820, 0x000000000000083b
.xword 0x0000000000000816, 0x000000000000080d
.xword 0x000000000000084c, 0x0000000000000857
.xword 0x000000000000087a, 0x0000000000000861
.xword 0x00000000000008f8, 0x00000000000008e3
.xword 0x00000000000008ce, 0x00000000000008d5
.xword 0x0000000000000894, 0x000000000000088f
.xword 0x00000000000008a2, 0x00000000000008b9
.xword 0x0000000000000990, 0x000000000000098b
.xword 0x00000000000009a6, 0x00000000000009bd
.xword 0x00000000000009fc, 0x00000000000009e7
.xword 0x00000000000009ca, 0x00000000000009d1
.xword 0x0000000000000948, 0x0000000000000953
.xword 0x000000000000097e, 0x0000000000000965
.xword 0x0000000000000924, 0x000000000000093f
.xword 0x0000000000000912, 0x0000000000000909
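
The small values in this table follow from the short polynomial. As a hedged illustration (a sketch assuming a normal, MSB-first CRC-64 with the ISO polynomial `0x1B`; `make_norm_table` and `ISO_POLY` are hypothetical names, not part of the project), the entries can be regenerated like this:

```python
MASK64 = (1 << 64) - 1

def make_norm_table(poly):
    """Build the 256-entry byte table for a normal (MSB-first) CRC-64."""
    table = []
    for byte in range(256):
        crc = byte << 56  # place the input byte in the top 8 bits
        for _ in range(8):
            if crc & (1 << 63):
                # Top bit set: shift it out and fold in the polynomial.
                crc = ((crc << 1) ^ poly) & MASK64
            else:
                crc = (crc << 1) & MASK64
        table.append(crc)
    return table

# Assumption: ISO polynomial x^64 + x^4 + x^3 + x + 1, low 64 bits only.
ISO_POLY = 0x1B
iso_norm_table = make_norm_table(ISO_POLY)

# Print the first two rows in the same two-per-line layout as the .xword data.
for i in range(0, 4, 2):
    print(f".xword 0x{iso_norm_table[i]:016x}, 0x{iso_norm_table[i + 1]:016x}")
```

Because the polynomial fits in 5 bits, eight shift steps can never push an entry beyond 12 bits, which is why every entry above has its upper 52 bits clear.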


@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_iso_refl_pmull.h"
#include "crc64_refl_common_pmull.h"
crc64_refl_func crc64_iso_refl_pmull


@ -0,0 +1,198 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 0x0001
#define p4_low_b1 0xb000
#define p4_low_b2 0x01b1
#define p4_low_b3 0x01b0
#define p4_high_b0 0x0001
#define p4_high_b1 0x0000
#define p4_high_b2 0x0101
#define p4_high_b3 0xb100
#define p1_low_b0 0x0001
#define p1_low_b1 0x0000
#define p1_low_b2 0x0000
#define p1_low_b3 0x6b70
#define p1_high_b0 0x0001
#define p1_high_b1 0x0000
#define p1_high_b2 0x0000
#define p1_high_b3 0xf500
#define p0_low_b0 0x0001
#define p0_low_b1 0x0000
#define p0_low_b2 0x0000
#define p0_low_b3 0xf500
#define br_low_b0 0x0001
#define br_low_b1 0x0000
#define br_low_b2 0x0000
#define br_low_b3 0xb000
#define br_high_b0 0x0001
#define br_high_b1 0x0000
#define br_high_b2 0x0000
#define br_high_b3 0xb000
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0x01b0000000000000
.xword 0x0360000000000000, 0x02d0000000000000
.xword 0x06c0000000000000, 0x0770000000000000
.xword 0x05a0000000000000, 0x0410000000000000
.xword 0x0d80000000000000, 0x0c30000000000000
.xword 0x0ee0000000000000, 0x0f50000000000000
.xword 0x0b40000000000000, 0x0af0000000000000
.xword 0x0820000000000000, 0x0990000000000000
.xword 0x1b00000000000000, 0x1ab0000000000000
.xword 0x1860000000000000, 0x19d0000000000000
.xword 0x1dc0000000000000, 0x1c70000000000000
.xword 0x1ea0000000000000, 0x1f10000000000000
.xword 0x1680000000000000, 0x1730000000000000
.xword 0x15e0000000000000, 0x1450000000000000
.xword 0x1040000000000000, 0x11f0000000000000
.xword 0x1320000000000000, 0x1290000000000000
.xword 0x3600000000000000, 0x37b0000000000000
.xword 0x3560000000000000, 0x34d0000000000000
.xword 0x30c0000000000000, 0x3170000000000000
.xword 0x33a0000000000000, 0x3210000000000000
.xword 0x3b80000000000000, 0x3a30000000000000
.xword 0x38e0000000000000, 0x3950000000000000
.xword 0x3d40000000000000, 0x3cf0000000000000
.xword 0x3e20000000000000, 0x3f90000000000000
.xword 0x2d00000000000000, 0x2cb0000000000000
.xword 0x2e60000000000000, 0x2fd0000000000000
.xword 0x2bc0000000000000, 0x2a70000000000000
.xword 0x28a0000000000000, 0x2910000000000000
.xword 0x2080000000000000, 0x2130000000000000
.xword 0x23e0000000000000, 0x2250000000000000
.xword 0x2640000000000000, 0x27f0000000000000
.xword 0x2520000000000000, 0x2490000000000000
.xword 0x6c00000000000000, 0x6db0000000000000
.xword 0x6f60000000000000, 0x6ed0000000000000
.xword 0x6ac0000000000000, 0x6b70000000000000
.xword 0x69a0000000000000, 0x6810000000000000
.xword 0x6180000000000000, 0x6030000000000000
.xword 0x62e0000000000000, 0x6350000000000000
.xword 0x6740000000000000, 0x66f0000000000000
.xword 0x6420000000000000, 0x6590000000000000
.xword 0x7700000000000000, 0x76b0000000000000
.xword 0x7460000000000000, 0x75d0000000000000
.xword 0x71c0000000000000, 0x7070000000000000
.xword 0x72a0000000000000, 0x7310000000000000
.xword 0x7a80000000000000, 0x7b30000000000000
.xword 0x79e0000000000000, 0x7850000000000000
.xword 0x7c40000000000000, 0x7df0000000000000
.xword 0x7f20000000000000, 0x7e90000000000000
.xword 0x5a00000000000000, 0x5bb0000000000000
.xword 0x5960000000000000, 0x58d0000000000000
.xword 0x5cc0000000000000, 0x5d70000000000000
.xword 0x5fa0000000000000, 0x5e10000000000000
.xword 0x5780000000000000, 0x5630000000000000
.xword 0x54e0000000000000, 0x5550000000000000
.xword 0x5140000000000000, 0x50f0000000000000
.xword 0x5220000000000000, 0x5390000000000000
.xword 0x4100000000000000, 0x40b0000000000000
.xword 0x4260000000000000, 0x43d0000000000000
.xword 0x47c0000000000000, 0x4670000000000000
.xword 0x44a0000000000000, 0x4510000000000000
.xword 0x4c80000000000000, 0x4d30000000000000
.xword 0x4fe0000000000000, 0x4e50000000000000
.xword 0x4a40000000000000, 0x4bf0000000000000
.xword 0x4920000000000000, 0x4890000000000000
.xword 0xd800000000000000, 0xd9b0000000000000
.xword 0xdb60000000000000, 0xdad0000000000000
.xword 0xdec0000000000000, 0xdf70000000000000
.xword 0xdda0000000000000, 0xdc10000000000000
.xword 0xd580000000000000, 0xd430000000000000
.xword 0xd6e0000000000000, 0xd750000000000000
.xword 0xd340000000000000, 0xd2f0000000000000
.xword 0xd020000000000000, 0xd190000000000000
.xword 0xc300000000000000, 0xc2b0000000000000
.xword 0xc060000000000000, 0xc1d0000000000000
.xword 0xc5c0000000000000, 0xc470000000000000
.xword 0xc6a0000000000000, 0xc710000000000000
.xword 0xce80000000000000, 0xcf30000000000000
.xword 0xcde0000000000000, 0xcc50000000000000
.xword 0xc840000000000000, 0xc9f0000000000000
.xword 0xcb20000000000000, 0xca90000000000000
.xword 0xee00000000000000, 0xefb0000000000000
.xword 0xed60000000000000, 0xecd0000000000000
.xword 0xe8c0000000000000, 0xe970000000000000
.xword 0xeba0000000000000, 0xea10000000000000
.xword 0xe380000000000000, 0xe230000000000000
.xword 0xe0e0000000000000, 0xe150000000000000
.xword 0xe540000000000000, 0xe4f0000000000000
.xword 0xe620000000000000, 0xe790000000000000
.xword 0xf500000000000000, 0xf4b0000000000000
.xword 0xf660000000000000, 0xf7d0000000000000
.xword 0xf3c0000000000000, 0xf270000000000000
.xword 0xf0a0000000000000, 0xf110000000000000
.xword 0xf880000000000000, 0xf930000000000000
.xword 0xfbe0000000000000, 0xfa50000000000000
.xword 0xfe40000000000000, 0xfff0000000000000
.xword 0xfd20000000000000, 0xfc90000000000000
.xword 0xb400000000000000, 0xb5b0000000000000
.xword 0xb760000000000000, 0xb6d0000000000000
.xword 0xb2c0000000000000, 0xb370000000000000
.xword 0xb1a0000000000000, 0xb010000000000000
.xword 0xb980000000000000, 0xb830000000000000
.xword 0xbae0000000000000, 0xbb50000000000000
.xword 0xbf40000000000000, 0xbef0000000000000
.xword 0xbc20000000000000, 0xbd90000000000000
.xword 0xaf00000000000000, 0xaeb0000000000000
.xword 0xac60000000000000, 0xadd0000000000000
.xword 0xa9c0000000000000, 0xa870000000000000
.xword 0xaaa0000000000000, 0xab10000000000000
.xword 0xa280000000000000, 0xa330000000000000
.xword 0xa1e0000000000000, 0xa050000000000000
.xword 0xa440000000000000, 0xa5f0000000000000
.xword 0xa720000000000000, 0xa690000000000000
.xword 0x8200000000000000, 0x83b0000000000000
.xword 0x8160000000000000, 0x80d0000000000000
.xword 0x84c0000000000000, 0x8570000000000000
.xword 0x87a0000000000000, 0x8610000000000000
.xword 0x8f80000000000000, 0x8e30000000000000
.xword 0x8ce0000000000000, 0x8d50000000000000
.xword 0x8940000000000000, 0x88f0000000000000
.xword 0x8a20000000000000, 0x8b90000000000000
.xword 0x9900000000000000, 0x98b0000000000000
.xword 0x9a60000000000000, 0x9bd0000000000000
.xword 0x9fc0000000000000, 0x9e70000000000000
.xword 0x9ca0000000000000, 0x9d10000000000000
.xword 0x9480000000000000, 0x9530000000000000
.xword 0x97e0000000000000, 0x9650000000000000
.xword 0x9240000000000000, 0x93f0000000000000
.xword 0x9120000000000000, 0x9090000000000000

@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_jones_norm_pmull.h"
#include "crc64_norm_common_pmull.h"
crc64_norm_func crc64_jones_norm_pmull

@ -0,0 +1,202 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 (0xd25e)
#define p4_low_b1 0xca43
#define p4_low_b2 0x1e58
#define p4_low_b3 0x4e50
#define p4_high_b0 0xf643
#define p4_high_b1 0x8f27
#define p4_high_b2 0x6158
#define p4_high_b3 0x13c9
#define p1_low_b0 (0x7038)
#define p1_low_b1 0x5001
#define p1_low_b2 0xed27
#define p1_low_b3 0x4445
#define p1_high_b0 0xd736
#define p1_high_b1 0x7cfb
#define p1_high_b2 0x7415
#define p1_high_b3 0x698b
#define p0_low_b0 (0x7038)
#define p0_low_b1 0x5001
#define p0_low_b2 0xed27
#define p0_low_b3 0x4445
#define p0_high_b0 0x0000
#define p0_high_b1 0x0000
#define p0_high_b2 0x0000
#define p0_high_b3 0x0000
#define br_low_b0 (0x6cf8)
#define br_low_b1 0x98be
#define br_low_b2 0xeeb2
#define br_low_b3 0xddf3
#define br_high_b0 0x35a9
#define br_high_b1 0x94c9
#define br_high_b2 0xd235
#define br_high_b3 0xad93
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0xad93d23594c935a9
.xword 0xf6b4765ebd5b5efb, 0x5b27a46b29926b52
.xword 0x40fb3e88ee7f885f, 0xed68ecbd7ab6bdf6
.xword 0xb64f48d65324d6a4, 0x1bdc9ae3c7ede30d
.xword 0x81f67d11dcff10be, 0x2c65af2448362517
.xword 0x77420b4f61a44e45, 0xdad1d97af56d7bec
.xword 0xc10d4399328098e1, 0x6c9e91aca649ad48
.xword 0x37b935c78fdbc61a, 0x9a2ae7f21b12f3b3
.xword 0xae7f28162d3714d5, 0x03ecfa23b9fe217c
.xword 0x58cb5e48906c4a2e, 0xf5588c7d04a57f87
.xword 0xee84169ec3489c8a, 0x4317c4ab5781a923
.xword 0x183060c07e13c271, 0xb5a3b2f5eadaf7d8
.xword 0x2f895507f1c8046b, 0x821a8732650131c2
.xword 0xd93d23594c935a90, 0x74aef16cd85a6f39
.xword 0x6f726b8f1fb78c34, 0xc2e1b9ba8b7eb99d
.xword 0x99c61dd1a2ecd2cf, 0x3455cfe43625e766
.xword 0xf16d8219cea71c03, 0x5cfe502c5a6e29aa
.xword 0x07d9f44773fc42f8, 0xaa4a2672e7357751
.xword 0xb196bc9120d8945c, 0x1c056ea4b411a1f5
.xword 0x4722cacf9d83caa7, 0xeab118fa094aff0e
.xword 0x709bff0812580cbd, 0xdd082d3d86913914
.xword 0x862f8956af035246, 0x2bbc5b633bca67ef
.xword 0x3060c180fc2784e2, 0x9df313b568eeb14b
.xword 0xc6d4b7de417cda19, 0x6b4765ebd5b5efb0
.xword 0x5f12aa0fe39008d6, 0xf281783a77593d7f
.xword 0xa9a6dc515ecb562d, 0x04350e64ca026384
.xword 0x1fe994870def8089, 0xb27a46b29926b520
.xword 0xe95de2d9b0b4de72, 0x44ce30ec247debdb
.xword 0xdee4d71e3f6f1868, 0x7377052baba62dc1
.xword 0x2850a14082344693, 0x85c3737516fd733a
.xword 0x9e1fe996d1109037, 0x338c3ba345d9a59e
.xword 0x68ab9fc86c4bcecc, 0xc5384dfdf882fb65
.xword 0x4f48d60609870daf, 0xe2db04339d4e3806
.xword 0xb9fca058b4dc5354, 0x146f726d201566fd
.xword 0x0fb3e88ee7f885f0, 0xa2203abb7331b059
.xword 0xf9079ed05aa3db0b, 0x54944ce5ce6aeea2
.xword 0xcebeab17d5781d11, 0x632d792241b128b8
.xword 0x380add49682343ea, 0x95990f7cfcea7643
.xword 0x8e45959f3b07954e, 0x23d647aaafcea0e7
.xword 0x78f1e3c1865ccbb5, 0xd56231f41295fe1c
.xword 0xe137fe1024b0197a, 0x4ca42c25b0792cd3
.xword 0x1783884e99eb4781, 0xba105a7b0d227228
.xword 0xa1ccc098cacf9125, 0x0c5f12ad5e06a48c
.xword 0x5778b6c67794cfde, 0xfaeb64f3e35dfa77
.xword 0x60c18301f84f09c4, 0xcd5251346c863c6d
.xword 0x9675f55f4514573f, 0x3be6276ad1dd6296
.xword 0x203abd891630819b, 0x8da96fbc82f9b432
.xword 0xd68ecbd7ab6bdf60, 0x7b1d19e23fa2eac9
.xword 0xbe25541fc72011ac, 0x13b6862a53e92405
.xword 0x489122417a7b4f57, 0xe502f074eeb27afe
.xword 0xfede6a97295f99f3, 0x534db8a2bd96ac5a
.xword 0x086a1cc99404c708, 0xa5f9cefc00cdf2a1
.xword 0x3fd3290e1bdf0112, 0x9240fb3b8f1634bb
.xword 0xc9675f50a6845fe9, 0x64f48d65324d6a40
.xword 0x7f281786f5a0894d, 0xd2bbc5b36169bce4
.xword 0x899c61d848fbd7b6, 0x240fb3eddc32e21f
.xword 0x105a7c09ea170579, 0xbdc9ae3c7ede30d0
.xword 0xe6ee0a57574c5b82, 0x4b7dd862c3856e2b
.xword 0x50a1428104688d26, 0xfd3290b490a1b88f
.xword 0xa61534dfb933d3dd, 0x0b86e6ea2dfae674
.xword 0x91ac011836e815c7, 0x3c3fd32da221206e
.xword 0x671877468bb34b3c, 0xca8ba5731f7a7e95
.xword 0xd1573f90d8979d98, 0x7cc4eda54c5ea831
.xword 0x27e349ce65ccc363, 0x8a709bfbf105f6ca
.xword 0x9e91ac0c130e1b5e, 0x33027e3987c72ef7
.xword 0x6825da52ae5545a5, 0xc5b608673a9c700c
.xword 0xde6a9284fd719301, 0x73f940b169b8a6a8
.xword 0x28dee4da402acdfa, 0x854d36efd4e3f853
.xword 0x1f67d11dcff10be0, 0xb2f403285b383e49
.xword 0xe9d3a74372aa551b, 0x44407576e66360b2
.xword 0x5f9cef95218e83bf, 0xf20f3da0b547b616
.xword 0xa92899cb9cd5dd44, 0x04bb4bfe081ce8ed
.xword 0x30ee841a3e390f8b, 0x9d7d562faaf03a22
.xword 0xc65af24483625170, 0x6bc9207117ab64d9
.xword 0x7015ba92d04687d4, 0xdd8668a7448fb27d
.xword 0x86a1cccc6d1dd92f, 0x2b321ef9f9d4ec86
.xword 0xb118f90be2c61f35, 0x1c8b2b3e760f2a9c
.xword 0x47ac8f555f9d41ce, 0xea3f5d60cb547467
.xword 0xf1e3c7830cb9976a, 0x5c7015b69870a2c3
.xword 0x0757b1ddb1e2c991, 0xaac463e8252bfc38
.xword 0x6ffc2e15dda9075d, 0xc26ffc20496032f4
.xword 0x9948584b60f259a6, 0x34db8a7ef43b6c0f
.xword 0x2f07109d33d68f02, 0x8294c2a8a71fbaab
.xword 0xd9b366c38e8dd1f9, 0x7420b4f61a44e450
.xword 0xee0a5304015617e3, 0x43998131959f224a
.xword 0x18be255abc0d4918, 0xb52df76f28c47cb1
.xword 0xaef16d8cef299fbc, 0x0362bfb97be0aa15
.xword 0x58451bd25272c147, 0xf5d6c9e7c6bbf4ee
.xword 0xc1830603f09e1388, 0x6c10d43664572621
.xword 0x3737705d4dc54d73, 0x9aa4a268d90c78da
.xword 0x8178388b1ee19bd7, 0x2cebeabe8a28ae7e
.xword 0x77cc4ed5a3bac52c, 0xda5f9ce03773f085
.xword 0x40757b122c610336, 0xede6a927b8a8369f
.xword 0xb6c10d4c913a5dcd, 0x1b52df7905f36864
.xword 0x008e459ac21e8b69, 0xad1d97af56d7bec0
.xword 0xf63a33c47f45d592, 0x5ba9e1f1eb8ce03b
.xword 0xd1d97a0a1a8916f1, 0x7c4aa83f8e402358
.xword 0x276d0c54a7d2480a, 0x8afede61331b7da3
.xword 0x91224482f4f69eae, 0x3cb196b7603fab07
.xword 0x679632dc49adc055, 0xca05e0e9dd64f5fc
.xword 0x502f071bc676064f, 0xfdbcd52e52bf33e6
.xword 0xa69b71457b2d58b4, 0x0b08a370efe46d1d
.xword 0x10d4399328098e10, 0xbd47eba6bcc0bbb9
.xword 0xe6604fcd9552d0eb, 0x4bf39df8019be542
.xword 0x7fa6521c37be0224, 0xd2358029a377378d
.xword 0x891224428ae55cdf, 0x2481f6771e2c6976
.xword 0x3f5d6c94d9c18a7b, 0x92cebea14d08bfd2
.xword 0xc9e91aca649ad480, 0x647ac8fff053e129
.xword 0xfe502f0deb41129a, 0x53c3fd387f882733
.xword 0x08e45953561a4c61, 0xa5778b66c2d379c8
.xword 0xbeab1185053e9ac5, 0x1338c3b091f7af6c
.xword 0x481f67dbb865c43e, 0xe58cb5ee2cacf197
.xword 0x20b4f813d42e0af2, 0x8d272a2640e73f5b
.xword 0xd6008e4d69755409, 0x7b935c78fdbc61a0
.xword 0x604fc69b3a5182ad, 0xcddc14aeae98b704
.xword 0x96fbb0c5870adc56, 0x3b6862f013c3e9ff
.xword 0xa142850208d11a4c, 0x0cd157379c182fe5
.xword 0x57f6f35cb58a44b7, 0xfa6521692143711e
.xword 0xe1b9bb8ae6ae9213, 0x4c2a69bf7267a7ba
.xword 0x170dcdd45bf5cce8, 0xba9e1fe1cf3cf941
.xword 0x8ecbd005f9191e27, 0x235802306dd02b8e
.xword 0x787fa65b444240dc, 0xd5ec746ed08b7575
.xword 0xce30ee8d17669678, 0x63a33cb883afa3d1
.xword 0x388498d3aa3dc883, 0x95174ae63ef4fd2a
.xword 0x0f3dad1425e60e99, 0xa2ae7f21b12f3b30
.xword 0xf989db4a98bd5062, 0x541a097f0c7465cb
.xword 0x4fc6939ccb9986c6, 0xe25541a95f50b36f
.xword 0xb972e5c276c2d83d, 0x14e137f7e20bed94
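
The table above appears to be the usual MSB-first (non-reflected) byte table for the polynomial 0xad93d23594c935a9, which is also the value that `br_high_b3` .. `br_high_b0` spell out when concatenated. The following is a minimal Python sketch under that assumption, not the generator actually used for this file; `join16` and `make_norm_table` are illustrative names that do not appear in the source.

```python
MASK64 = (1 << 64) - 1
POLY_NORM = 0xAD93D23594C935A9  # assumption: br_high_b3..br_high_b0 concatenated


def join16(b0, b1, b2, b3):
    """Combine four 16-bit chunks the way mov + movk (lsl 16/32/48) builds a register."""
    return b0 | (b1 << 16) | (b2 << 32) | (b3 << 48)


def make_norm_table(poly):
    """Build the 256-entry MSB-first CRC-64 lookup table for poly."""
    table = []
    for i in range(256):
        crc = i << 56                      # place the byte in the top 8 bits
        for _ in range(8):
            if crc & (1 << 63):            # top bit set: shift and reduce
                crc = ((crc << 1) ^ poly) & MASK64
            else:                          # top bit clear: shift only
                crc = (crc << 1) & MASK64
        table.append(crc)
    return table


table = make_norm_table(POLY_NORM)
```

Under these assumptions, `table[1]` through `table[3]` reproduce the second, third and fourth `.xword` values of `crc64_tab` above, which is a quick way to check that a regenerated table uses the same polynomial and bit order.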

@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_jones_refl_pmull.h"
#include "crc64_refl_common_pmull.h"
crc64_refl_func crc64_jones_refl_pmull

@ -0,0 +1,198 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#define p4_low_b0 0xb4fb
#define p4_low_b1 0x6d9a
#define p4_low_b2 0xefb1
#define p4_low_b3 0xaf86
#define p4_high_b0 0x14e4
#define p4_high_b1 0x34f0
#define p4_high_b2 0x84a6
#define p4_high_b3 0xf497
#define p1_low_b0 0xa32c
#define p1_low_b1 0x505d
#define p1_low_b2 0xbe7d
#define p1_low_b3 0xd9d7
#define p1_high_b0 0x4444
#define p1_high_b1 0xc96f
#define p1_high_b2 0x0015
#define p1_high_b3 0x381d
#define p0_low_b0 0x4444
#define p0_low_b1 0xc96f
#define p0_low_b2 0x0015
#define p0_low_b3 0x381d
#define br_low_b0 0x9f77
#define br_low_b1 0x9aef
#define br_low_b2 0xfa32
#define br_low_b3 0x3e6c
#define br_high_b0 0x936b
#define br_high_b1 0x5897
#define br_high_b2 0x2653
#define br_high_b3 0x2b59
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0x7ad870c830358979
.xword 0xf5b0e190606b12f2, 0x8f689158505e9b8b
.xword 0xc038e5739841b68f, 0xbae095bba8743ff6
.xword 0x358804e3f82aa47d, 0x4f50742bc81f2d04
.xword 0xab28ecb46814fe75, 0xd1f09c7c5821770c
.xword 0x5e980d24087fec87, 0x24407dec384a65fe
.xword 0x6b1009c7f05548fa, 0x11c8790fc060c183
.xword 0x9ea0e857903e5a08, 0xe478989fa00bd371
.xword 0x7d08ff3b88be6f81, 0x07d08ff3b88be6f8
.xword 0x88b81eabe8d57d73, 0xf2606e63d8e0f40a
.xword 0xbd301a4810ffd90e, 0xc7e86a8020ca5077
.xword 0x4880fbd87094cbfc, 0x32588b1040a14285
.xword 0xd620138fe0aa91f4, 0xacf86347d09f188d
.xword 0x2390f21f80c18306, 0x594882d7b0f40a7f
.xword 0x1618f6fc78eb277b, 0x6cc0863448deae02
.xword 0xe3a8176c18803589, 0x997067a428b5bcf0
.xword 0xfa11fe77117cdf02, 0x80c98ebf2149567b
.xword 0x0fa11fe77117cdf0, 0x75796f2f41224489
.xword 0x3a291b04893d698d, 0x40f16bccb908e0f4
.xword 0xcf99fa94e9567b7f, 0xb5418a5cd963f206
.xword 0x513912c379682177, 0x2be1620b495da80e
.xword 0xa489f35319033385, 0xde51839b2936bafc
.xword 0x9101f7b0e12997f8, 0xebd98778d11c1e81
.xword 0x64b116208142850a, 0x1e6966e8b1770c73
.xword 0x8719014c99c2b083, 0xfdc17184a9f739fa
.xword 0x72a9e0dcf9a9a271, 0x08719014c99c2b08
.xword 0x4721e43f0183060c, 0x3df994f731b68f75
.xword 0xb29105af61e814fe, 0xc849756751dd9d87
.xword 0x2c31edf8f1d64ef6, 0x56e99d30c1e3c78f
.xword 0xd9810c6891bd5c04, 0xa3597ca0a188d57d
.xword 0xec09088b6997f879, 0x96d1784359a27100
.xword 0x19b9e91b09fcea8b, 0x636199d339c963f2
.xword 0xdf7adabd7a6e2d6f, 0xa5a2aa754a5ba416
.xword 0x2aca3b2d1a053f9d, 0x50124be52a30b6e4
.xword 0x1f423fcee22f9be0, 0x659a4f06d21a1299
.xword 0xeaf2de5e82448912, 0x902aae96b271006b
.xword 0x74523609127ad31a, 0x0e8a46c1224f5a63
.xword 0x81e2d7997211c1e8, 0xfb3aa75142244891
.xword 0xb46ad37a8a3b6595, 0xceb2a3b2ba0eecec
.xword 0x41da32eaea507767, 0x3b024222da65fe1e
.xword 0xa2722586f2d042ee, 0xd8aa554ec2e5cb97
.xword 0x57c2c41692bb501c, 0x2d1ab4dea28ed965
.xword 0x624ac0f56a91f461, 0x1892b03d5aa47d18
.xword 0x97fa21650afae693, 0xed2251ad3acf6fea
.xword 0x095ac9329ac4bc9b, 0x7382b9faaaf135e2
.xword 0xfcea28a2faafae69, 0x8632586aca9a2710
.xword 0xc9622c4102850a14, 0xb3ba5c8932b0836d
.xword 0x3cd2cdd162ee18e6, 0x460abd1952db919f
.xword 0x256b24ca6b12f26d, 0x5fb354025b277b14
.xword 0xd0dbc55a0b79e09f, 0xaa03b5923b4c69e6
.xword 0xe553c1b9f35344e2, 0x9f8bb171c366cd9b
.xword 0x10e3202993385610, 0x6a3b50e1a30ddf69
.xword 0x8e43c87e03060c18, 0xf49bb8b633338561
.xword 0x7bf329ee636d1eea, 0x012b592653589793
.xword 0x4e7b2d0d9b47ba97, 0x34a35dc5ab7233ee
.xword 0xbbcbcc9dfb2ca865, 0xc113bc55cb19211c
.xword 0x5863dbf1e3ac9dec, 0x22bbab39d3991495
.xword 0xadd33a6183c78f1e, 0xd70b4aa9b3f20667
.xword 0x985b3e827bed2b63, 0xe2834e4a4bd8a21a
.xword 0x6debdf121b863991, 0x1733afda2bb3b0e8
.xword 0xf34b37458bb86399, 0x8993478dbb8deae0
.xword 0x06fbd6d5ebd3716b, 0x7c23a61ddbe6f812
.xword 0x3373d23613f9d516, 0x49aba2fe23cc5c6f
.xword 0xc6c333a67392c7e4, 0xbc1b436e43a74e9d
.xword 0x95ac9329ac4bc9b5, 0xef74e3e19c7e40cc
.xword 0x601c72b9cc20db47, 0x1ac40271fc15523e
.xword 0x5594765a340a7f3a, 0x2f4c0692043ff643
.xword 0xa02497ca54616dc8, 0xdafce7026454e4b1
.xword 0x3e847f9dc45f37c0, 0x445c0f55f46abeb9
.xword 0xcb349e0da4342532, 0xb1eceec59401ac4b
.xword 0xfebc9aee5c1e814f, 0x8464ea266c2b0836
.xword 0x0b0c7b7e3c7593bd, 0x71d40bb60c401ac4
.xword 0xe8a46c1224f5a634, 0x927c1cda14c02f4d
.xword 0x1d148d82449eb4c6, 0x67ccfd4a74ab3dbf
.xword 0x289c8961bcb410bb, 0x5244f9a98c8199c2
.xword 0xdd2c68f1dcdf0249, 0xa7f41839ecea8b30
.xword 0x438c80a64ce15841, 0x3954f06e7cd4d138
.xword 0xb63c61362c8a4ab3, 0xcce411fe1cbfc3ca
.xword 0x83b465d5d4a0eece, 0xf96c151de49567b7
.xword 0x76048445b4cbfc3c, 0x0cdcf48d84fe7545
.xword 0x6fbd6d5ebd3716b7, 0x15651d968d029fce
.xword 0x9a0d8ccedd5c0445, 0xe0d5fc06ed698d3c
.xword 0xaf85882d2576a038, 0xd55df8e515432941
.xword 0x5a3569bd451db2ca, 0x20ed197575283bb3
.xword 0xc49581ead523e8c2, 0xbe4df122e51661bb
.xword 0x3125607ab548fa30, 0x4bfd10b2857d7349
.xword 0x04ad64994d625e4d, 0x7e7514517d57d734
.xword 0xf11d85092d094cbf, 0x8bc5f5c11d3cc5c6
.xword 0x12b5926535897936, 0x686de2ad05bcf04f
.xword 0xe70573f555e26bc4, 0x9ddd033d65d7e2bd
.xword 0xd28d7716adc8cfb9, 0xa85507de9dfd46c0
.xword 0x273d9686cda3dd4b, 0x5de5e64efd965432
.xword 0xb99d7ed15d9d8743, 0xc3450e196da80e3a
.xword 0x4c2d9f413df695b1, 0x36f5ef890dc31cc8
.xword 0x79a59ba2c5dc31cc, 0x037deb6af5e9b8b5
.xword 0x8c157a32a5b7233e, 0xf6cd0afa9582aa47
.xword 0x4ad64994d625e4da, 0x300e395ce6106da3
.xword 0xbf66a804b64ef628, 0xc5bed8cc867b7f51
.xword 0x8aeeace74e645255, 0xf036dc2f7e51db2c
.xword 0x7f5e4d772e0f40a7, 0x05863dbf1e3ac9de
.xword 0xe1fea520be311aaf, 0x9b26d5e88e0493d6
.xword 0x144e44b0de5a085d, 0x6e963478ee6f8124
.xword 0x21c640532670ac20, 0x5b1e309b16452559
.xword 0xd476a1c3461bbed2, 0xaeaed10b762e37ab
.xword 0x37deb6af5e9b8b5b, 0x4d06c6676eae0222
.xword 0xc26e573f3ef099a9, 0xb8b627f70ec510d0
.xword 0xf7e653dcc6da3dd4, 0x8d3e2314f6efb4ad
.xword 0x0256b24ca6b12f26, 0x788ec2849684a65f
.xword 0x9cf65a1b368f752e, 0xe62e2ad306bafc57
.xword 0x6946bb8b56e467dc, 0x139ecb4366d1eea5
.xword 0x5ccebf68aecec3a1, 0x2616cfa09efb4ad8
.xword 0xa97e5ef8cea5d153, 0xd3a62e30fe90582a
.xword 0xb0c7b7e3c7593bd8, 0xca1fc72bf76cb2a1
.xword 0x45775673a732292a, 0x3faf26bb9707a053
.xword 0x70ff52905f188d57, 0x0a2722586f2d042e
.xword 0x854fb3003f739fa5, 0xff97c3c80f4616dc
.xword 0x1bef5b57af4dc5ad, 0x61372b9f9f784cd4
.xword 0xee5fbac7cf26d75f, 0x9487ca0fff135e26
.xword 0xdbd7be24370c7322, 0xa10fceec0739fa5b
.xword 0x2e675fb4576761d0, 0x54bf2f7c6752e8a9
.xword 0xcdcf48d84fe75459, 0xb71738107fd2dd20
.xword 0x387fa9482f8c46ab, 0x42a7d9801fb9cfd2
.xword 0x0df7adabd7a6e2d6, 0x772fdd63e7936baf
.xword 0xf8474c3bb7cdf024, 0x829f3cf387f8795d
.xword 0x66e7a46c27f3aa2c, 0x1c3fd4a417c62355
.xword 0x935745fc4798b8de, 0xe98f353477ad31a7
.xword 0xa6df411fbfb21ca3, 0xdc0731d78f8795da
.xword 0x536fa08fdfd90e51, 0x29b7d047efec8728
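
The reflected table above looks like the LSB-first counterpart of the normal Jones table: entry 128 equals 0x95ac9329ac4bc9b5, the bit-reversed form of 0xad93d23594c935a9. The following Python sketch rebuilds it under that assumption; `reflect64` and `make_refl_table` are hypothetical helper names, and nothing here is taken from the project's own tooling.

```python
POLY_NORM = 0xAD93D23594C935A9  # assumption: Jones polynomial, MSB-first form


def reflect64(value):
    """Reverse the bit order of a 64-bit integer."""
    out = 0
    for _ in range(64):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def make_refl_table(poly_refl):
    """Build the 256-entry LSB-first CRC-64 lookup table for a reflected polynomial."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # low bit set: shift right and reduce; otherwise shift only
            crc = (crc >> 1) ^ poly_refl if crc & 1 else crc >> 1
        table.append(crc)
    return table


POLY_REFL = reflect64(POLY_NORM)
table = make_refl_table(POLY_REFL)
```

In an LSB-first table, index 128 (0x80) always holds the reflected polynomial itself, so it is a convenient entry to compare first.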

@ -0,0 +1,141 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "crc_common_pmull.h"
.macro crc64_norm_func name:req
.arch armv8-a+crypto
.text
.align 3
.global cdecl(\name)
#ifndef __APPLE__
.type \name, %function
#endif
/* uint64_t crc64_norm_func(uint64_t seed, const uint8_t * buf, uint64_t len) */
cdecl(\name\()):
mvn x_seed, x_seed
mov x_counter, 0
cmp x_len, (FOLD_SIZE-1)
bhi .crc_clmul_pre
.crc_tab_pre:
cmp x_len, x_counter
bls .done
#ifndef __APPLE__
adrp x_tmp, .lanchor_crc_tab
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, :lo12:.lanchor_crc_tab
#else
adrp x_tmp, .lanchor_crc_tab@PAGE
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, .lanchor_crc_tab@PAGEOFF
#endif
.align 3
.loop_crc_tab:
ldrb w_tmp, [x_buf_iter], 1
cmp x_buf, x_buf_iter
eor x_tmp, x_tmp, x_seed, lsr 56
ldr x_tmp, [x_crc_tab_addr, x_tmp, lsl 3]
eor x_seed, x_tmp, x_seed, lsl 8
bne .loop_crc_tab
.done:
mvn x_crc_ret, x_seed
ret
.align 2
.crc_clmul_pre:
movi v_x0.2s, 0
fmov v_x0.d[1], x_seed // save crc to v_x0
crc_norm_load_first_block
bls .clmul_loop_end
crc64_load_p4
// 1024bit --> 512bit loop
// merge x0, x1, x2, x3, y0, y1, y2, y3 => x0, x1, x2, x3 (uint64x2_t)
crc_norm_loop
.clmul_loop_end:
// folding 512bit --> 128bit
crc64_fold_512b_to_128b
// folding 128bit --> 64bit
mov x_tmp, p0_low_b0
movk x_tmp, p0_low_b1, lsl 16
movk x_tmp, p0_low_b2, lsl 32
movk x_tmp, p0_low_b3, lsl 48
fmov d_p0_high, x_tmp
pmull2 v_tmp_high.1q, v_x3.2d, v_p0.2d
movi v_tmp_low.2s, 0
ext v_tmp_low.16b, v_tmp_low.16b, v_x3.16b, #8
eor v_x3.16b, v_tmp_high.16b, v_tmp_low.16b
// barrett reduction
mov x_tmp, br_low_b0
movk x_tmp, br_low_b1, lsl 16
movk x_tmp, br_low_b2, lsl 32
movk x_tmp, br_low_b3, lsl 48
fmov d_br_low2, x_tmp
mov x_tmp2, br_high_b0
movk x_tmp2, br_high_b1, lsl 16
movk x_tmp2, br_high_b2, lsl 32
movk x_tmp2, br_high_b3, lsl 48
fmov d_br_high2, x_tmp2
pmull2 v_tmp_low.1q, v_x3.2d, v_br_low.2d
eor v_tmp_low.16b, v_x3.16b, v_tmp_low.16b
pmull2 v_tmp_low.1q, v_tmp_low.2d, v_br_high.2d
eor v_x3.8b, v_x3.8b, v_tmp_low.8b
umov x_seed, v_x3.d[0]
b .crc_tab_pre
#ifndef __APPLE__
.size \name, .-\name
.section .rodata.cst16,"aM",@progbits,16
#else
.section __TEXT,__const
#endif
.align 4
.shuffle_data:
.byte 15, 14, 13, 12, 11, 10, 9, 8
.byte 7, 6, 5, 4, 3, 2, 1, 0
.endm
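
As a reading aid for the `.loop_crc_tab` path in the macro above, here is a minimal Python model of the byte-at-a-time fallback, including the `mvn` on entry and exit. It is a sketch only: the polynomial is assumed to be the Jones one, the PMULL folding path is not modelled, and `FOLD_SIZE` (defined in `crc_common_pmull.h`, not shown here) is ignored. The function names are illustrative.

```python
MASK64 = (1 << 64) - 1
POLY = 0xAD93D23594C935A9  # assumption: Jones polynomial, MSB-first form


def make_norm_table(poly):
    """256-entry MSB-first table, one entry per possible top byte."""
    table = []
    for i in range(256):
        crc = i << 56
        for _ in range(8):
            crc = ((crc << 1) ^ poly if crc >> 63 else crc << 1) & MASK64
        table.append(crc)
    return table


TABLE = make_norm_table(POLY)


def crc64_norm_tab(seed, buf):
    """Table-driven loop mirroring .loop_crc_tab (normal bit order)."""
    crc = ~seed & MASK64                          # mvn x_seed, x_seed
    for byte in buf:                              # ldrb w_tmp, [x_buf_iter], 1
        idx = byte ^ (crc >> 56)                  # eor x_tmp, x_tmp, x_seed, lsr 56
        crc = (TABLE[idx] ^ (crc << 8)) & MASK64  # ldr, then eor ..., x_seed, lsl 8
    return ~crc & MASK64                          # mvn x_crc_ret, x_seed


def crc64_norm_bitwise(seed, buf):
    """Bit-at-a-time reference used only to cross-check the table loop."""
    crc = ~seed & MASK64
    for byte in buf:
        crc ^= byte << 56
        for _ in range(8):
            crc = ((crc << 1) ^ POLY if crc >> 63 else crc << 1) & MASK64
    return ~crc & MASK64
```

For example, `crc64_norm_tab(0, b"123456789") == crc64_norm_bitwise(0, b"123456789")` should hold, and an empty buffer returns the seed unchanged because the two inversions cancel, matching the `bls .done` early exit.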

@ -0,0 +1,136 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "crc_common_pmull.h"
.macro crc64_refl_func name:req
.arch armv8-a+crypto
.text
.align 3
.global cdecl(\name)
#ifndef __APPLE__
.type \name, %function
#endif
/* uint64_t crc64_refl_func(uint64_t seed, const uint8_t * buf, uint64_t len) */
cdecl(\name\()):
mvn x_seed, x_seed
mov x_counter, 0
cmp x_len, (FOLD_SIZE-1)
bhi .crc_clmul_pre
.crc_tab_pre:
cmp x_len, x_counter
bls .done
#ifndef __APPLE__
adrp x_tmp, .lanchor_crc_tab
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, :lo12:.lanchor_crc_tab
#else
adrp x_tmp, .lanchor_crc_tab@PAGE
add x_buf_iter, x_buf, x_counter
add x_buf, x_buf, x_len
add x_crc_tab_addr, x_tmp, .lanchor_crc_tab@PAGEOFF
#endif
.align 3
.loop_crc_tab:
ldrb w_tmp, [x_buf_iter], 1
eor w_tmp, w_tmp, w0
cmp x_buf, x_buf_iter
and x_tmp, x_tmp, 255
ldr x_tmp, [x_crc_tab_addr, x_tmp, lsl 3]
eor x_seed, x_tmp, x_seed, lsr 8
bne .loop_crc_tab
.done:
mvn x_crc_ret, x_seed
ret
.align 2
.crc_clmul_pre:
fmov d_x0, x_seed // save crc to d_x0
crc_refl_load_first_block
bls .clmul_loop_end
crc64_load_p4
// 1024bit --> 512bit loop
// merge x0, x1, x2, x3, y0, y1, y2, y3 => x0, x1, x2, x3 (uint64x2_t)
crc_refl_loop
.clmul_loop_end:
// folding 512bit --> 128bit
crc64_fold_512b_to_128b
// folding 128bit --> 64bit
mov x_tmp, p0_low_b0
movk x_tmp, p0_low_b1, lsl 16
movk x_tmp, p0_low_b2, lsl 32
movk x_tmp, p0_low_b3, lsl 48
fmov d_p0_low, x_tmp
pmull v_tmp_low.1q, v_x3.1d, v_p0.1d
mov d_tmp_high, v_x3.d[1]
eor v_x3.16b, v_tmp_high.16b, v_tmp_low.16b
// barrett reduction
mov x_tmp, br_low_b0
movk x_tmp, br_low_b1, lsl 16
movk x_tmp, br_low_b2, lsl 32
movk x_tmp, br_low_b3, lsl 48
fmov d_br_low, x_tmp
mov x_tmp2, br_high_b0
movk x_tmp2, br_high_b1, lsl 16
movk x_tmp2, br_high_b2, lsl 32
movk x_tmp2, br_high_b3, lsl 48
fmov d_br_high, x_tmp2
pmull v_tmp_low.1q, v_x3.1d, v_br_low.1d
pmull v_tmp_high.1q, v_tmp_low.1d, v_br_high.1d
ext v_tmp_low.16b, v_br_low.16b, v_tmp_low.16b, #8
eor v_tmp_low.16b, v_tmp_high.16b, v_tmp_low.16b
eor v_tmp_low.16b, v_x3.16b, v_tmp_low.16b
umov x_crc_ret, v_tmp_low.d[1]
b .crc_tab_pre
#ifndef __APPLE__
.size \name, .-\name
#endif
.endm
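
The reflected macro's `.loop_crc_tab` differs from the normal one in that it indexes the table with the low byte and shifts right. The following Python sketch models only that fallback loop, assuming the reflected Jones polynomial 0x95ac9329ac4bc9b5; the PMULL folding and Barrett reduction steps are not modelled, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
POLY_REFL = 0x95AC9329AC4BC9B5  # assumption: bit-reversed Jones polynomial


def make_refl_table(poly_refl):
    """256-entry LSB-first table, one entry per possible low byte."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly_refl if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_refl_table(POLY_REFL)


def crc64_refl_tab(seed, buf):
    """Table-driven loop mirroring .loop_crc_tab (reflected bit order)."""
    crc = ~seed & MASK64                 # mvn x_seed, x_seed
    for byte in buf:                     # ldrb w_tmp, [x_buf_iter], 1
        idx = (byte ^ crc) & 0xFF        # eor w_tmp, w_tmp, w0 ; and x_tmp, x_tmp, 255
        crc = TABLE[idx] ^ (crc >> 8)    # ldr, then eor ..., x_seed, lsr 8
    return ~crc & MASK64                 # mvn x_crc_ret, x_seed


def crc64_refl_bitwise(seed, buf):
    """Bit-at-a-time reference used only to cross-check the table loop."""
    crc = ~seed & MASK64
    for byte in buf:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFL if crc & 1 else crc >> 1
    return ~crc & MASK64
```

The two functions should agree on any input, which gives a simple cross-check when comparing the assembly's short-buffer behaviour against a reference.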

@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2025 Tim Burke All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_rocksoft_norm_pmull.h"
#include "crc64_norm_common_pmull.h"
crc64_norm_func crc64_rocksoft_norm_pmull

@ -0,0 +1,209 @@
########################################################################
# Copyright(c) 2025 Tim Burke All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
# rk15
#define p4_low_b0 0x488c
#define p4_low_b1 0x0488
#define p4_low_b2 0x4e6a
#define p4_low_b3 0xb441
# rk16
#define p4_high_b0 0x9860
#define p4_high_b1 0x9b66
#define p4_high_b2 0x30f1
#define p4_high_b3 0xa42a
# rk1
#define p1_low_b0 0x2f08
#define p1_low_b1 0xf0dd
#define p1_low_b2 0xc948
#define p1_low_b3 0x6b08
# rk2
#define p1_high_b0 0x76ae
#define p1_high_b1 0x7f04
#define p1_high_b2 0x8ba9
#define p1_high_b3 0x0857
# rk1
#define p0_low_b0 0x2f08
#define p0_low_b1 0xf0dd
#define p0_low_b2 0xc948
#define p0_low_b3 0x6b08
#define p0_high_b0 0x0000
#define p0_high_b1 0x0000
#define p0_high_b2 0x0000
#define p0_high_b3 0x0000
# rk7
#define br_low_b0 0x6fc8
#define br_low_b1 0x98be
#define br_low_b2 0xeeb2
#define br_low_b3 0xddf3
# rk8
#define br_high_b0 0x3659
#define br_high_b1 0x94c9
#define br_high_b2 0xd235
#define br_high_b3 0xad93
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0xad93d23594c93659
.xword 0xf6b4765ebd5b5aeb, 0x5b27a46b29926cb2
.xword 0x40fb3e88ee7f838f, 0xed68ecbd7ab6b5d6
.xword 0xb64f48d65324d964, 0x1bdc9ae3c7edef3d
.xword 0x81f67d11dcff071e, 0x2c65af2448363147
.xword 0x77420b4f61a45df5, 0xdad1d97af56d6bac
.xword 0xc10d439932808491, 0x6c9e91aca649b2c8
.xword 0x37b935c78fdbde7a, 0x9a2ae7f21b12e823
.xword 0xae7f28162d373865, 0x03ecfa23b9fe0e3c
.xword 0x58cb5e48906c628e, 0xf5588c7d04a554d7
.xword 0xee84169ec348bbea, 0x4317c4ab57818db3
.xword 0x183060c07e13e101, 0xb5a3b2f5eadad758
.xword 0x2f895507f1c83f7b, 0x821a873265010922
.xword 0xd93d23594c936590, 0x74aef16cd85a53c9
.xword 0x6f726b8f1fb7bcf4, 0xc2e1b9ba8b7e8aad
.xword 0x99c61dd1a2ece61f, 0x3455cfe43625d046
.xword 0xf16d8219cea74693, 0x5cfe502c5a6e70ca
.xword 0x07d9f44773fc1c78, 0xaa4a2672e7352a21
.xword 0xb196bc9120d8c51c, 0x1c056ea4b411f345
.xword 0x4722cacf9d839ff7, 0xeab118fa094aa9ae
.xword 0x709bff081258418d, 0xdd082d3d869177d4
.xword 0x862f8956af031b66, 0x2bbc5b633bca2d3f
.xword 0x3060c180fc27c202, 0x9df313b568eef45b
.xword 0xc6d4b7de417c98e9, 0x6b4765ebd5b5aeb0
.xword 0x5f12aa0fe3907ef6, 0xf281783a775948af
.xword 0xa9a6dc515ecb241d, 0x04350e64ca021244
.xword 0x1fe994870deffd79, 0xb27a46b29926cb20
.xword 0xe95de2d9b0b4a792, 0x44ce30ec247d91cb
.xword 0xdee4d71e3f6f79e8, 0x7377052baba64fb1
.xword 0x2850a14082342303, 0x85c3737516fd155a
.xword 0x9e1fe996d110fa67, 0x338c3ba345d9cc3e
.xword 0x68ab9fc86c4ba08c, 0xc5384dfdf88296d5
.xword 0x4f48d6060987bb7f, 0xe2db04339d4e8d26
.xword 0xb9fca058b4dce194, 0x146f726d2015d7cd
.xword 0x0fb3e88ee7f838f0, 0xa2203abb73310ea9
.xword 0xf9079ed05aa3621b, 0x54944ce5ce6a5442
.xword 0xcebeab17d578bc61, 0x632d792241b18a38
.xword 0x380add496823e68a, 0x95990f7cfcead0d3
.xword 0x8e45959f3b073fee, 0x23d647aaafce09b7
.xword 0x78f1e3c1865c6505, 0xd56231f41295535c
.xword 0xe137fe1024b0831a, 0x4ca42c25b079b543
.xword 0x1783884e99ebd9f1, 0xba105a7b0d22efa8
.xword 0xa1ccc098cacf0095, 0x0c5f12ad5e0636cc
.xword 0x5778b6c677945a7e, 0xfaeb64f3e35d6c27
.xword 0x60c18301f84f8404, 0xcd5251346c86b25d
.xword 0x9675f55f4514deef, 0x3be6276ad1dde8b6
.xword 0x203abd891630078b, 0x8da96fbc82f931d2
.xword 0xd68ecbd7ab6b5d60, 0x7b1d19e23fa26b39
.xword 0xbe25541fc720fdec, 0x13b6862a53e9cbb5
.xword 0x489122417a7ba707, 0xe502f074eeb2915e
.xword 0xfede6a97295f7e63, 0x534db8a2bd96483a
.xword 0x086a1cc994042488, 0xa5f9cefc00cd12d1
.xword 0x3fd3290e1bdffaf2, 0x9240fb3b8f16ccab
.xword 0xc9675f50a684a019, 0x64f48d65324d9640
.xword 0x7f281786f5a0797d, 0xd2bbc5b361694f24
.xword 0x899c61d848fb2396, 0x240fb3eddc3215cf
.xword 0x105a7c09ea17c589, 0xbdc9ae3c7edef3d0
.xword 0xe6ee0a57574c9f62, 0x4b7dd862c385a93b
.xword 0x50a1428104684606, 0xfd3290b490a1705f
.xword 0xa61534dfb9331ced, 0x0b86e6ea2dfa2ab4
.xword 0x91ac011836e8c297, 0x3c3fd32da221f4ce
.xword 0x671877468bb3987c, 0xca8ba5731f7aae25
.xword 0xd1573f90d8974118, 0x7cc4eda54c5e7741
.xword 0x27e349ce65cc1bf3, 0x8a709bfbf1052daa
.xword 0x9e91ac0c130f76fe, 0x33027e3987c640a7
.xword 0x6825da52ae542c15, 0xc5b608673a9d1a4c
.xword 0xde6a9284fd70f571, 0x73f940b169b9c328
.xword 0x28dee4da402baf9a, 0x854d36efd4e299c3
.xword 0x1f67d11dcff071e0, 0xb2f403285b3947b9
.xword 0xe9d3a74372ab2b0b, 0x44407576e6621d52
.xword 0x5f9cef95218ff26f, 0xf20f3da0b546c436
.xword 0xa92899cb9cd4a884, 0x04bb4bfe081d9edd
.xword 0x30ee841a3e384e9b, 0x9d7d562faaf178c2
.xword 0xc65af24483631470, 0x6bc9207117aa2229
.xword 0x7015ba92d047cd14, 0xdd8668a7448efb4d
.xword 0x86a1cccc6d1c97ff, 0x2b321ef9f9d5a1a6
.xword 0xb118f90be2c74985, 0x1c8b2b3e760e7fdc
.xword 0x47ac8f555f9c136e, 0xea3f5d60cb552537
.xword 0xf1e3c7830cb8ca0a, 0x5c7015b69871fc53
.xword 0x0757b1ddb1e390e1, 0xaac463e8252aa6b8
.xword 0x6ffc2e15dda8306d, 0xc26ffc2049610634
.xword 0x9948584b60f36a86, 0x34db8a7ef43a5cdf
.xword 0x2f07109d33d7b3e2, 0x8294c2a8a71e85bb
.xword 0xd9b366c38e8ce909, 0x7420b4f61a45df50
.xword 0xee0a530401573773, 0x43998131959e012a
.xword 0x18be255abc0c6d98, 0xb52df76f28c55bc1
.xword 0xaef16d8cef28b4fc, 0x0362bfb97be182a5
.xword 0x58451bd25273ee17, 0xf5d6c9e7c6bad84e
.xword 0xc1830603f09f0808, 0x6c10d43664563e51
.xword 0x3737705d4dc452e3, 0x9aa4a268d90d64ba
.xword 0x8178388b1ee08b87, 0x2cebeabe8a29bdde
.xword 0x77cc4ed5a3bbd16c, 0xda5f9ce03772e735
.xword 0x40757b122c600f16, 0xede6a927b8a9394f
.xword 0xb6c10d4c913b55fd, 0x1b52df7905f263a4
.xword 0x008e459ac21f8c99, 0xad1d97af56d6bac0
.xword 0xf63a33c47f44d672, 0x5ba9e1f1eb8de02b
.xword 0xd1d97a0a1a88cd81, 0x7c4aa83f8e41fbd8
.xword 0x276d0c54a7d3976a, 0x8afede61331aa133
.xword 0x91224482f4f74e0e, 0x3cb196b7603e7857
.xword 0x679632dc49ac14e5, 0xca05e0e9dd6522bc
.xword 0x502f071bc677ca9f, 0xfdbcd52e52befcc6
.xword 0xa69b71457b2c9074, 0x0b08a370efe5a62d
.xword 0x10d4399328084910, 0xbd47eba6bcc17f49
.xword 0xe6604fcd955313fb, 0x4bf39df8019a25a2
.xword 0x7fa6521c37bff5e4, 0xd2358029a376c3bd
.xword 0x891224428ae4af0f, 0x2481f6771e2d9956
.xword 0x3f5d6c94d9c0766b, 0x92cebea14d094032
.xword 0xc9e91aca649b2c80, 0x647ac8fff0521ad9
.xword 0xfe502f0deb40f2fa, 0x53c3fd387f89c4a3
.xword 0x08e45953561ba811, 0xa5778b66c2d29e48
.xword 0xbeab1185053f7175, 0x1338c3b091f6472c
.xword 0x481f67dbb8642b9e, 0xe58cb5ee2cad1dc7
.xword 0x20b4f813d42f8b12, 0x8d272a2640e6bd4b
.xword 0xd6008e4d6974d1f9, 0x7b935c78fdbde7a0
.xword 0x604fc69b3a50089d, 0xcddc14aeae993ec4
.xword 0x96fbb0c5870b5276, 0x3b6862f013c2642f
.xword 0xa142850208d08c0c, 0x0cd157379c19ba55
.xword 0x57f6f35cb58bd6e7, 0xfa6521692142e0be
.xword 0xe1b9bb8ae6af0f83, 0x4c2a69bf726639da
.xword 0x170dcdd45bf45568, 0xba9e1fe1cf3d6331
.xword 0x8ecbd005f918b377, 0x235802306dd1852e
.xword 0x787fa65b4443e99c, 0xd5ec746ed08adfc5
.xword 0xce30ee8d176730f8, 0x63a33cb883ae06a1
.xword 0x388498d3aa3c6a13, 0x95174ae63ef55c4a
.xword 0x0f3dad1425e7b469, 0xa2ae7f21b12e8230
.xword 0xf989db4a98bcee82, 0x541a097f0c75d8db
.xword 0x4fc6939ccb9837e6, 0xe25541a95f5101bf
.xword 0xb972e5c276c36d0d, 0x14e137f7e20a5b54
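
The 256-entry table above appears to be the usual byte-at-a-time lookup table for an MSB-first (normal) CRC64 whose polynomial equals entry 1, 0xad93d23594c93659. A minimal C sketch of how such a table can be regenerated follows; the function and macro names are hypothetical and are not part of this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Polynomial as it appears in table entry 1 and in br_high_b3..br_high_b0. */
#define CRC64_ROCKSOFT_POLY_NORM UINT64_C(0xad93d23594c93659)

/*
 * Hypothetical helper: fill a 256-entry table for an MSB-first CRC64.
 * Entry i is the CRC state after feeding byte i into a zero state.
 */
static void
crc64_norm_make_table(uint64_t poly, uint64_t table[256])
{
        assert(table != NULL);
        for (unsigned i = 0; i < 256; i++) {
                /* Place the byte in the top 8 bits of the state. */
                uint64_t crc = (uint64_t) i << 56;
                for (int bit = 0; bit < 8; bit++) {
                        /* Shift left; reduce by poly when the top bit falls out. */
                        if (crc & (UINT64_C(1) << 63))
                                crc = (crc << 1) ^ poly;
                        else
                                crc <<= 1;
                }
                table[i] = crc;
        }
}
```

Calling crc64_norm_make_table(CRC64_ROCKSOFT_POLY_NORM, t) should reproduce the leading .xword values listed above, which gives a quick sanity check when the constants are edited by hand.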

@ -0,0 +1,34 @@
########################################################################
# Copyright(c) 2025 Tim Burke All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
#include "crc64_rocksoft_refl_pmull.h"
#include "crc64_refl_common_pmull.h"
crc64_refl_func crc64_rocksoft_refl_pmull
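
The constant headers in this change split every 64-bit folding constant into four 16-bit pieces (_b0 to _b3), which the shared macros load with one mov followed by three movk instructions. The following C sketch shows the equivalent composition; compose_u64 is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the mov/movk sequence:
 *   mov  x, b0
 *   movk x, b1, lsl 16
 *   movk x, b2, lsl 32
 *   movk x, b3, lsl 48
 * mov clears the upper bits first, so OR-ing each piece in is equivalent
 * to movk's "replace these 16 bits, keep the rest".
 */
static uint64_t
compose_u64(uint16_t b0, uint16_t b1, uint16_t b2, uint16_t b3)
{
        uint64_t x = b0;
        x |= (uint64_t) b1 << 16;
        x |= (uint64_t) b2 << 32;
        x |= (uint64_t) b3 << 48;
        return x;
}
```

With the normal-form header's br_high pieces (0x3659, 0x94c9, 0xd235, 0xad93) this yields 0xad93d23594c93659, the same value as entry 1 of the normal-form lookup table.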

@ -0,0 +1,205 @@
########################################################################
# Copyright(c) 2025 Tim Burke All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
# rk16
#define p4_low_b0 0xa84a
#define p4_low_b1 0x1e18
#define p4_low_b2 0xcdb3
#define p4_low_b3 0x0c32
# rk15
#define p4_high_b0 0x045a
#define p4_high_b1 0xace5
#define p4_high_b2 0x2240
#define p4_high_b3 0x6224
# rk2
#define p1_low_b0 0xd420
#define p1_low_b1 0x2ba3
#define p1_low_b2 0x41fd
#define p1_low_b3 0xeadc
# rk1
#define p1_high_b0 0x21ac
#define p1_high_b1 0x2526
#define p1_high_b2 0x761e
#define p1_high_b3 0x21e9
# rk1
#define p0_low_b0 0x21ac
#define p0_low_b1 0x2526
#define p0_low_b2 0x761e
#define p0_low_b3 0x21e9
# rk7
#define br_low_b0 0x9f77
#define br_low_b1 0x9aef
#define br_low_b2 0xfa32
#define br_low_b3 0x27ec
# rk8 + 1?
#define br_high_b0 0x936b
#define br_high_b1 0x5897
#define br_high_b2 0x2653
#define br_high_b3 0x34d9
ASM_DEF_RODATA
.align 4
.set .lanchor_crc_tab,. + 0
#ifndef __APPLE__
.type crc64_tab, %object
.size crc64_tab, 2048
#endif
crc64_tab:
.xword 0x0000000000000000, 0x7f6ef0c830358979
.xword 0xfedde190606b12f2, 0x81b31158505e9b8b
.xword 0xc962e5739841b68f, 0xb60c15bba8743ff6
.xword 0x37bf04e3f82aa47d, 0x48d1f42bc81f2d04
.xword 0xa61cecb46814fe75, 0xd9721c7c5821770c
.xword 0x58c10d24087fec87, 0x27affdec384a65fe
.xword 0x6f7e09c7f05548fa, 0x1010f90fc060c183
.xword 0x91a3e857903e5a08, 0xeecd189fa00bd371
.xword 0x78e0ff3b88be6f81, 0x078e0ff3b88be6f8
.xword 0x863d1eabe8d57d73, 0xf953ee63d8e0f40a
.xword 0xb1821a4810ffd90e, 0xceecea8020ca5077
.xword 0x4f5ffbd87094cbfc, 0x30310b1040a14285
.xword 0xdefc138fe0aa91f4, 0xa192e347d09f188d
.xword 0x2021f21f80c18306, 0x5f4f02d7b0f40a7f
.xword 0x179ef6fc78eb277b, 0x68f0063448deae02
.xword 0xe943176c18803589, 0x962de7a428b5bcf0
.xword 0xf1c1fe77117cdf02, 0x8eaf0ebf2149567b
.xword 0x0f1c1fe77117cdf0, 0x7072ef2f41224489
.xword 0x38a31b04893d698d, 0x47cdebccb908e0f4
.xword 0xc67efa94e9567b7f, 0xb9100a5cd963f206
.xword 0x57dd12c379682177, 0x28b3e20b495da80e
.xword 0xa900f35319033385, 0xd66e039b2936bafc
.xword 0x9ebff7b0e12997f8, 0xe1d10778d11c1e81
.xword 0x606216208142850a, 0x1f0ce6e8b1770c73
.xword 0x8921014c99c2b083, 0xf64ff184a9f739fa
.xword 0x77fce0dcf9a9a271, 0x08921014c99c2b08
.xword 0x4043e43f0183060c, 0x3f2d14f731b68f75
.xword 0xbe9e05af61e814fe, 0xc1f0f56751dd9d87
.xword 0x2f3dedf8f1d64ef6, 0x50531d30c1e3c78f
.xword 0xd1e00c6891bd5c04, 0xae8efca0a188d57d
.xword 0xe65f088b6997f879, 0x9931f84359a27100
.xword 0x1882e91b09fcea8b, 0x67ec19d339c963f2
.xword 0xd75adabd7a6e2d6f, 0xa8342a754a5ba416
.xword 0x29873b2d1a053f9d, 0x56e9cbe52a30b6e4
.xword 0x1e383fcee22f9be0, 0x6156cf06d21a1299
.xword 0xe0e5de5e82448912, 0x9f8b2e96b271006b
.xword 0x71463609127ad31a, 0x0e28c6c1224f5a63
.xword 0x8f9bd7997211c1e8, 0xf0f5275142244891
.xword 0xb824d37a8a3b6595, 0xc74a23b2ba0eecec
.xword 0x46f932eaea507767, 0x3997c222da65fe1e
.xword 0xafba2586f2d042ee, 0xd0d4d54ec2e5cb97
.xword 0x5167c41692bb501c, 0x2e0934dea28ed965
.xword 0x66d8c0f56a91f461, 0x19b6303d5aa47d18
.xword 0x980521650afae693, 0xe76bd1ad3acf6fea
.xword 0x09a6c9329ac4bc9b, 0x76c839faaaf135e2
.xword 0xf77b28a2faafae69, 0x8815d86aca9a2710
.xword 0xc0c42c4102850a14, 0xbfaadc8932b0836d
.xword 0x3e19cdd162ee18e6, 0x41773d1952db919f
.xword 0x269b24ca6b12f26d, 0x59f5d4025b277b14
.xword 0xd846c55a0b79e09f, 0xa72835923b4c69e6
.xword 0xeff9c1b9f35344e2, 0x90973171c366cd9b
.xword 0x1124202993385610, 0x6e4ad0e1a30ddf69
.xword 0x8087c87e03060c18, 0xffe938b633338561
.xword 0x7e5a29ee636d1eea, 0x0134d92653589793
.xword 0x49e52d0d9b47ba97, 0x368bddc5ab7233ee
.xword 0xb738cc9dfb2ca865, 0xc8563c55cb19211c
.xword 0x5e7bdbf1e3ac9dec, 0x21152b39d3991495
.xword 0xa0a63a6183c78f1e, 0xdfc8caa9b3f20667
.xword 0x97193e827bed2b63, 0xe877ce4a4bd8a21a
.xword 0x69c4df121b863991, 0x16aa2fda2bb3b0e8
.xword 0xf86737458bb86399, 0x8709c78dbb8deae0
.xword 0x06bad6d5ebd3716b, 0x79d4261ddbe6f812
.xword 0x3105d23613f9d516, 0x4e6b22fe23cc5c6f
.xword 0xcfd833a67392c7e4, 0xb0b6c36e43a74e9d
.xword 0x9a6c9329ac4bc9b5, 0xe50263e19c7e40cc
.xword 0x64b172b9cc20db47, 0x1bdf8271fc15523e
.xword 0x530e765a340a7f3a, 0x2c608692043ff643
.xword 0xadd397ca54616dc8, 0xd2bd67026454e4b1
.xword 0x3c707f9dc45f37c0, 0x431e8f55f46abeb9
.xword 0xc2ad9e0da4342532, 0xbdc36ec59401ac4b
.xword 0xf5129aee5c1e814f, 0x8a7c6a266c2b0836
.xword 0x0bcf7b7e3c7593bd, 0x74a18bb60c401ac4
.xword 0xe28c6c1224f5a634, 0x9de29cda14c02f4d
.xword 0x1c518d82449eb4c6, 0x633f7d4a74ab3dbf
.xword 0x2bee8961bcb410bb, 0x548079a98c8199c2
.xword 0xd53368f1dcdf0249, 0xaa5d9839ecea8b30
.xword 0x449080a64ce15841, 0x3bfe706e7cd4d138
.xword 0xba4d61362c8a4ab3, 0xc52391fe1cbfc3ca
.xword 0x8df265d5d4a0eece, 0xf29c951de49567b7
.xword 0x732f8445b4cbfc3c, 0x0c41748d84fe7545
.xword 0x6bad6d5ebd3716b7, 0x14c39d968d029fce
.xword 0x95708ccedd5c0445, 0xea1e7c06ed698d3c
.xword 0xa2cf882d2576a038, 0xdda178e515432941
.xword 0x5c1269bd451db2ca, 0x237c997575283bb3
.xword 0xcdb181ead523e8c2, 0xb2df7122e51661bb
.xword 0x336c607ab548fa30, 0x4c0290b2857d7349
.xword 0x04d364994d625e4d, 0x7bbd94517d57d734
.xword 0xfa0e85092d094cbf, 0x856075c11d3cc5c6
.xword 0x134d926535897936, 0x6c2362ad05bcf04f
.xword 0xed9073f555e26bc4, 0x92fe833d65d7e2bd
.xword 0xda2f7716adc8cfb9, 0xa54187de9dfd46c0
.xword 0x24f29686cda3dd4b, 0x5b9c664efd965432
.xword 0xb5517ed15d9d8743, 0xca3f8e196da80e3a
.xword 0x4b8c9f413df695b1, 0x34e26f890dc31cc8
.xword 0x7c339ba2c5dc31cc, 0x035d6b6af5e9b8b5
.xword 0x82ee7a32a5b7233e, 0xfd808afa9582aa47
.xword 0x4d364994d625e4da, 0x3258b95ce6106da3
.xword 0xb3eba804b64ef628, 0xcc8558cc867b7f51
.xword 0x8454ace74e645255, 0xfb3a5c2f7e51db2c
.xword 0x7a894d772e0f40a7, 0x05e7bdbf1e3ac9de
.xword 0xeb2aa520be311aaf, 0x944455e88e0493d6
.xword 0x15f744b0de5a085d, 0x6a99b478ee6f8124
.xword 0x224840532670ac20, 0x5d26b09b16452559
.xword 0xdc95a1c3461bbed2, 0xa3fb510b762e37ab
.xword 0x35d6b6af5e9b8b5b, 0x4ab846676eae0222
.xword 0xcb0b573f3ef099a9, 0xb465a7f70ec510d0
.xword 0xfcb453dcc6da3dd4, 0x83daa314f6efb4ad
.xword 0x0269b24ca6b12f26, 0x7d0742849684a65f
.xword 0x93ca5a1b368f752e, 0xeca4aad306bafc57
.xword 0x6d17bb8b56e467dc, 0x12794b4366d1eea5
.xword 0x5aa8bf68aecec3a1, 0x25c64fa09efb4ad8
.xword 0xa4755ef8cea5d153, 0xdb1bae30fe90582a
.xword 0xbcf7b7e3c7593bd8, 0xc399472bf76cb2a1
.xword 0x422a5673a732292a, 0x3d44a6bb9707a053
.xword 0x759552905f188d57, 0x0afba2586f2d042e
.xword 0x8b48b3003f739fa5, 0xf42643c80f4616dc
.xword 0x1aeb5b57af4dc5ad, 0x6585ab9f9f784cd4
.xword 0xe436bac7cf26d75f, 0x9b584a0fff135e26
.xword 0xd389be24370c7322, 0xace74eec0739fa5b
.xword 0x2d545fb4576761d0, 0x523aaf7c6752e8a9
.xword 0xc41748d84fe75459, 0xbb79b8107fd2dd20
.xword 0x3acaa9482f8c46ab, 0x45a459801fb9cfd2
.xword 0x0d75adabd7a6e2d6, 0x721b5d63e7936baf
.xword 0xf3a84c3bb7cdf024, 0x8cc6bcf387f8795d
.xword 0x620ba46c27f3aa2c, 0x1d6554a417c62355
.xword 0x9cd645fc4798b8de, 0xe3b8b53477ad31a7
.xword 0xab69411fbfb21ca3, 0xd407b1d78f8795da
.xword 0x55b4a08fdfd90e51, 0x2ada5047efec8728
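
The reflected table above seems to follow the standard LSB-first construction, using the bit-reversed form of the normal polynomial (entry 128 of the table, 0x9a6c9329ac4bc9b5). A minimal sketch under that assumption follows; all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of a 64-bit value. */
static uint64_t
reflect64(uint64_t v)
{
        uint64_t r = 0;
        for (int i = 0; i < 64; i++) {
                r = (r << 1) | (v & 1);
                v >>= 1;
        }
        return r;
}

/*
 * Hypothetical helper: fill a 256-entry table for an LSB-first CRC64.
 * poly_refl is the bit-reversed polynomial.
 */
static void
crc64_refl_make_table(uint64_t poly_refl, uint64_t table[256])
{
        assert(table != NULL);
        for (unsigned i = 0; i < 256; i++) {
                uint64_t crc = i;
                for (int bit = 0; bit < 8; bit++) {
                        /* Shift right; reduce when a set bit falls out. */
                        if (crc & 1)
                                crc = (crc >> 1) ^ poly_refl;
                        else
                                crc >>= 1;
                }
                table[i] = crc;
        }
}
```

Passing reflect64(0xad93d23594c93659) as poly_refl ties the reflected table back to the polynomial used by the normal-form variant.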

@ -0,0 +1,279 @@
/**********************************************************************
Copyright(c) 2019-2020 Arm Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Arm Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <aarch64_multibinary.h>
#include "crc.h"
#include "crc64.h"
extern uint16_t
crc16_t10dif_pmull(uint16_t, uint8_t *, uint64_t);
extern uint16_t
crc16_t10dif_copy_pmull(uint16_t, uint8_t *, uint8_t *, uint64_t);
extern uint32_t
crc32_ieee_norm_pmull(uint32_t, uint8_t *, uint64_t);
extern unsigned int
crc32_iscsi_crc_ext(unsigned char *, int, unsigned int);
extern unsigned int
crc32_iscsi_3crc_fold(unsigned char *, int, unsigned int);
extern unsigned int
crc32_iscsi_refl_pmull(unsigned char *, int, unsigned int);
extern uint32_t
crc32_gzip_refl_crc_ext(uint32_t, uint8_t *, uint64_t);
extern uint32_t
crc32_gzip_refl_3crc_fold(uint32_t, uint8_t *, uint64_t);
extern uint32_t
crc32_gzip_refl_pmull(uint32_t, uint8_t *, uint64_t);
extern uint64_t
crc64_ecma_refl_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_ecma_norm_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_iso_refl_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_iso_norm_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_jones_refl_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_jones_norm_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_rocksoft_refl_pmull(uint64_t, const unsigned char *, uint64_t);
extern uint64_t
crc64_rocksoft_norm_pmull(uint64_t, const unsigned char *, uint64_t);
DEFINE_INTERFACE_DISPATCHER(crc16_t10dif)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc16_t10dif_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc16_t10dif_pmull;
#endif
return crc16_t10dif_base;
}
DEFINE_INTERFACE_DISPATCHER(crc16_t10dif_copy)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc16_t10dif_copy_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc16_t10dif_copy_pmull;
#endif
return crc16_t10dif_copy_base;
}
DEFINE_INTERFACE_DISPATCHER(crc32_ieee)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL) {
return crc32_ieee_norm_pmull;
}
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc32_ieee_norm_pmull;
#endif
return crc32_ieee_base;
}
DEFINE_INTERFACE_DISPATCHER(crc32_iscsi)
{
#if defined(__linux__)
        unsigned long auxval = getauxval(AT_HWCAP);
        if (auxval & HWCAP_CRC32) {
                switch (get_micro_arch_id()) {
                case MICRO_ARCH_ID(ARM, NEOVERSE_N1):
                case MICRO_ARCH_ID(ARM, CORTEX_A57):
                case MICRO_ARCH_ID(ARM, CORTEX_A72):
                        return crc32_iscsi_crc_ext;
                }
        }
        if ((HWCAP_CRC32 | HWCAP_PMULL) == (auxval & (HWCAP_CRC32 | HWCAP_PMULL))) {
                return crc32_iscsi_3crc_fold;
        }
        if (auxval & HWCAP_PMULL) {
                return crc32_iscsi_refl_pmull;
        }
#elif defined(__APPLE__)
        if (sysctlEnabled(SYSCTL_CRC32_KEY))
                return crc32_iscsi_3crc_fold;
        if (sysctlEnabled(SYSCTL_PMULL_KEY))
                return crc32_iscsi_refl_pmull;
#endif
        return crc32_iscsi_base;
}
DEFINE_INTERFACE_DISPATCHER(crc32_gzip_refl)
{
#if defined(__linux__)
        unsigned long auxval = getauxval(AT_HWCAP);
        if (auxval & HWCAP_CRC32) {
                switch (get_micro_arch_id()) {
                case MICRO_ARCH_ID(ARM, NEOVERSE_N1):
                case MICRO_ARCH_ID(ARM, CORTEX_A57):
                case MICRO_ARCH_ID(ARM, CORTEX_A72):
                        return crc32_gzip_refl_crc_ext;
                }
        }
        if ((HWCAP_CRC32 | HWCAP_PMULL) == (auxval & (HWCAP_CRC32 | HWCAP_PMULL))) {
                return crc32_gzip_refl_3crc_fold;
        }
        if (auxval & HWCAP_PMULL)
                return crc32_gzip_refl_pmull;
#elif defined(__APPLE__)
        if (sysctlEnabled(SYSCTL_CRC32_KEY))
                return crc32_gzip_refl_3crc_fold;
        if (sysctlEnabled(SYSCTL_PMULL_KEY))
                return crc32_gzip_refl_pmull;
#endif
        return crc32_gzip_refl_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_ecma_refl)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_ecma_refl_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_ecma_refl_pmull;
#endif
return crc64_ecma_refl_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_ecma_norm)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_ecma_norm_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_ecma_norm_pmull;
#endif
return crc64_ecma_norm_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_iso_refl)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_iso_refl_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_iso_refl_pmull;
#endif
return crc64_iso_refl_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_iso_norm)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_iso_norm_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_iso_norm_pmull;
#endif
return crc64_iso_norm_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_jones_refl)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_jones_refl_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_jones_refl_pmull;
#endif
return crc64_jones_refl_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_jones_norm)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_jones_norm_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_jones_norm_pmull;
#endif
return crc64_jones_norm_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_rocksoft_refl)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_rocksoft_refl_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_rocksoft_refl_pmull;
#endif
return crc64_rocksoft_refl_base;
}
DEFINE_INTERFACE_DISPATCHER(crc64_rocksoft_norm)
{
#if defined(__linux__)
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_PMULL)
return crc64_rocksoft_norm_pmull;
#elif defined(__APPLE__)
if (sysctlEnabled(SYSCTL_PMULL_KEY))
return crc64_rocksoft_norm_pmull;
#endif
return crc64_rocksoft_norm_base;
}
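
The Linux branch of the crc32_iscsi and crc32_gzip_refl dispatchers above applies a fixed priority order: the CRC32-extension routine on a few named cores, then the combined CRC32 and PMULL fold, then PMULL alone, then the base implementation. The sketch below restates that order as a pure function so it can be tested without hardware. The enum names, the capability bits, and the prefers_crc_ext flag (standing in for the get_micro_arch_id() switch) are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical capability bits; the real code tests HWCAP_CRC32 and HWCAP_PMULL. */
enum { CAP_CRC32 = 1u << 0, CAP_PMULL = 1u << 1 };

/* Hypothetical labels for the four candidate implementations. */
enum crc32_impl { IMPL_BASE, IMPL_REFL_PMULL, IMPL_3CRC_FOLD, IMPL_CRC_EXT };

/*
 * Select an implementation in the same priority order as the Linux branch.
 * prefers_crc_ext is nonzero for cores on the dispatcher's micro-arch list.
 */
static enum crc32_impl
select_crc32_impl(uint32_t caps, int prefers_crc_ext)
{
        const uint32_t both = CAP_CRC32 | CAP_PMULL;

        if ((caps & CAP_CRC32) && prefers_crc_ext)
                return IMPL_CRC_EXT;
        if ((caps & both) == both)
                return IMPL_3CRC_FOLD;
        if (caps & CAP_PMULL)
                return IMPL_REFL_PMULL;
        return IMPL_BASE;
}
```

One consequence of this ordering is that a core with CRC32 but without PMULL, and not on the micro-arch list, falls through to the base implementation.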

@ -0,0 +1,311 @@
########################################################################
# Copyright (c) 2019 Microsoft Corporation.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Microsoft Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include "../include/aarch64_label.h"
// parameters
#define w_seed w0
#define x_seed x0
#define x_buf x1
#define w_len w2
#define x_len x2
// return
#define w_crc_ret w0
#define x_crc_ret x0
// constant
#define FOLD_SIZE 64
// global variables
#define x_buf_end x3
#define w_counter w4
#define x_counter x4
#define x_buf_iter x5
#define x_crc_tab_addr x6
#define x_tmp2 x6
#define w_tmp w7
#define x_tmp x7
#define v_x0 v0
#define d_x0 d0
#define s_x0 s0
#define q_x1 q1
#define v_x1 v1
#define q_x2 q2
#define v_x2 v2
#define q_x3 q3
#define v_x3 v3
#define d_x3 d3
#define s_x3 s3
#define q_y0 q4
#define v_y0 v4
#define v_tmp_high v4
#define d_tmp_high d4
#define q_y1 q5
#define v_y1 v5
#define v_tmp_low v5
#define q_y2 q6
#define v_y2 v6
#define q_y3 q7
#define v_y3 v7
#define q_x0_tmp q30
#define v_x0_tmp v30
#define d_p4_high v30.d[1]
#define d_p4_low d30
#define v_p4 v30
#define d_p1_high v30.d[1]
#define d_p1_low d30
#define v_p1 v30
#define d_p0_high v30.d[1]
#define d_p0_low d30
#define v_p0 v30
#define d_br_low d30
#define d_br_low2 v30.d[1]
#define v_br_low v30
#define q_shuffle q31
#define v_shuffle v31
#define d_br_high d31
#define d_br_high2 v31.d[1]
#define v_br_high v31
#define d_p0_low2 d31
#define d_p0_high2 v31.d[1]
#define v_p02 v31
#define v_x0_high v16
#define v_x1_high v17
#define v_x2_high v18
#define v_x3_high v19
.macro crc_refl_load_first_block
ldr q_x0_tmp, [x_buf]
ldr q_x1, [x_buf, 16]
ldr q_x2, [x_buf, 32]
ldr q_x3, [x_buf, 48]
and x_counter, x_len, -64
sub x_tmp, x_counter, #64
cmp x_tmp, 63
add x_buf_iter, x_buf, 64
eor v_x0.16b, v_x0.16b, v_x0_tmp.16b
.endm
.macro crc_norm_load_first_block
#ifndef __APPLE__
adrp x_tmp, .shuffle_data
ldr q_shuffle, [x_tmp, #:lo12:.shuffle_data]
#else
adrp x_tmp, .shuffle_data@PAGE
ldr q_shuffle, [x_tmp, #.shuffle_data@PAGEOFF]
#endif
ldr q_x0_tmp, [x_buf]
ldr q_x1, [x_buf, 16]
ldr q_x2, [x_buf, 32]
ldr q_x3, [x_buf, 48]
and x_counter, x_len, -64
sub x_tmp, x_counter, #64
cmp x_tmp, 63
add x_buf_iter, x_buf, 64
tbl v_x0_tmp.16b, {v_x0_tmp.16b}, v_shuffle.16b
tbl v_x1.16b, {v_x1.16b}, v_shuffle.16b
tbl v_x2.16b, {v_x2.16b}, v_shuffle.16b
tbl v_x3.16b, {v_x3.16b}, v_shuffle.16b
eor v_x0.16b, v_x0.16b, v_x0_tmp.16b
.endm
.macro crc32_load_p4
add x_buf_end, x_buf_iter, x_tmp
mov x_tmp, p4_low_b0
movk x_tmp, p4_low_b1, lsl 16
fmov d_p4_low, x_tmp
mov x_tmp2, p4_high_b0
movk x_tmp2, p4_high_b1, lsl 16
fmov d_p4_high, x_tmp2
.endm
.macro crc64_load_p4
add x_buf_end, x_buf_iter, x_tmp
mov x_tmp, p4_low_b0
movk x_tmp, p4_low_b1, lsl 16
movk x_tmp, p4_low_b2, lsl 32
movk x_tmp, p4_low_b3, lsl 48
fmov d_p4_low, x_tmp
mov x_tmp2, p4_high_b0
movk x_tmp2, p4_high_b1, lsl 16
movk x_tmp2, p4_high_b2, lsl 32
movk x_tmp2, p4_high_b3, lsl 48
fmov d_p4_high, x_tmp2
.endm
.macro crc_refl_loop
.align 3
.clmul_loop:
// interleave ldr and pmull(2) for cores that can only issue a quadword load every
// other cycle (e.g. A55)
ldr q_y0, [x_buf_iter]
pmull2 v_x0_high.1q, v_x0.2d, v_p4.2d
ldr q_y1, [x_buf_iter, 16]
pmull2 v_x1_high.1q, v_x1.2d, v_p4.2d
ldr q_y2, [x_buf_iter, 32]
pmull2 v_x2_high.1q, v_x2.2d, v_p4.2d
ldr q_y3, [x_buf_iter, 48]
pmull2 v_x3_high.1q, v_x3.2d, v_p4.2d
pmull v_x0.1q, v_x0.1d, v_p4.1d
add x_buf_iter, x_buf_iter, 64
pmull v_x1.1q, v_x1.1d, v_p4.1d
cmp x_buf_iter, x_buf_end
pmull v_x2.1q, v_x2.1d, v_p4.1d
pmull v_x3.1q, v_x3.1d, v_p4.1d
eor v_x0.16b, v_x0_high.16b, v_x0.16b
eor v_x0.16b, v_x0.16b, v_y0.16b
eor v_x1.16b, v_x1_high.16b, v_x1.16b
eor v_x1.16b, v_x1.16b, v_y1.16b
eor v_x2.16b, v_x2_high.16b, v_x2.16b
eor v_x2.16b, v_x2.16b, v_y2.16b
eor v_x3.16b, v_x3_high.16b, v_x3.16b
eor v_x3.16b, v_x3.16b, v_y3.16b
bne .clmul_loop
.endm
.macro crc_norm_loop
.align 3
.clmul_loop:
// interleave ldr and pmull(2) for cores that can only issue a quadword load every
// other cycle (e.g. A55)
ldr q_y0, [x_buf_iter]
pmull2 v_x0_high.1q, v_x0.2d, v_p4.2d
ldr q_y1, [x_buf_iter, 16]
pmull2 v_x1_high.1q, v_x1.2d, v_p4.2d
ldr q_y2, [x_buf_iter, 32]
pmull2 v_x2_high.1q, v_x2.2d, v_p4.2d
ldr q_y3, [x_buf_iter, 48]
pmull2 v_x3_high.1q, v_x3.2d, v_p4.2d
pmull v_x0.1q, v_x0.1d, v_p4.1d
add x_buf_iter, x_buf_iter, 64
pmull v_x1.1q, v_x1.1d, v_p4.1d
cmp x_buf_iter, x_buf_end
pmull v_x2.1q, v_x2.1d, v_p4.1d
pmull v_x3.1q, v_x3.1d, v_p4.1d
tbl v_y0.16b, {v_y0.16b}, v_shuffle.16b
tbl v_y1.16b, {v_y1.16b}, v_shuffle.16b
tbl v_y2.16b, {v_y2.16b}, v_shuffle.16b
tbl v_y3.16b, {v_y3.16b}, v_shuffle.16b
eor v_x0.16b, v_x0.16b, v_x0_high.16b
eor v_x1.16b, v_x1.16b, v_x1_high.16b
eor v_x2.16b, v_x2.16b, v_x2_high.16b
eor v_x3.16b, v_x3.16b, v_x3_high.16b
eor v_x0.16b, v_x0.16b, v_y0.16b
eor v_x1.16b, v_x1.16b, v_y1.16b
eor v_x2.16b, v_x2.16b, v_y2.16b
eor v_x3.16b, v_x3.16b, v_y3.16b
bne .clmul_loop
.endm
.macro crc32_fold_512b_to_128b
mov x_tmp, p1_low_b0
movk x_tmp, p1_low_b1, lsl 16
fmov d_p1_low, x_tmp
mov x_tmp2, p1_high_b0
movk x_tmp2, p1_high_b1, lsl 16
fmov d_p1_high, x_tmp2
pmull2 v_tmp_high.1q, v_x0.2d, v_p1.2d
pmull v_tmp_low.1q, v_x0.1d, v_p1.1d
eor v_x1.16b, v_x1.16b, v_tmp_high.16b
eor v_x1.16b, v_x1.16b, v_tmp_low.16b
pmull2 v_tmp_high.1q, v_x1.2d, v_p1.2d
pmull v_tmp_low.1q, v_x1.1d, v_p1.1d
eor v_x2.16b, v_x2.16b, v_tmp_high.16b
eor v_x2.16b, v_x2.16b, v_tmp_low.16b
pmull2 v_tmp_high.1q, v_x2.2d, v_p1.2d
pmull v_tmp_low.1q, v_x2.1d, v_p1.1d
eor v_x3.16b, v_x3.16b, v_tmp_high.16b
eor v_x3.16b, v_x3.16b, v_tmp_low.16b
.endm
.macro crc64_fold_512b_to_128b
mov x_tmp, p1_low_b0
movk x_tmp, p1_low_b1, lsl 16
movk x_tmp, p1_low_b2, lsl 32
movk x_tmp, p1_low_b3, lsl 48
fmov d_p1_low, x_tmp
mov x_tmp2, p1_high_b0
movk x_tmp2, p1_high_b1, lsl 16
movk x_tmp2, p1_high_b2, lsl 32
movk x_tmp2, p1_high_b3, lsl 48
fmov d_p1_high, x_tmp2
pmull2 v_tmp_high.1q, v_x0.2d, v_p1.2d
pmull v_tmp_low.1q, v_x0.1d, v_p1.1d
eor v_x1.16b, v_x1.16b, v_tmp_high.16b
eor v_x1.16b, v_x1.16b, v_tmp_low.16b
pmull2 v_tmp_high.1q, v_x1.2d, v_p1.2d
pmull v_tmp_low.1q, v_x1.1d, v_p1.1d
eor v_x2.16b, v_x2.16b, v_tmp_high.16b
eor v_x2.16b, v_x2.16b, v_tmp_low.16b
pmull2 v_tmp_high.1q, v_x2.2d, v_p1.2d
pmull v_tmp_low.1q, v_x2.1d, v_p1.1d
eor v_x3.16b, v_x3.16b, v_tmp_high.16b
eor v_x3.16b, v_x3.16b, v_tmp_low.16b
.endm
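
The crc64_load_p4 and crc64_fold_512b_to_128b macros above build each 64-bit folding constant with one mov and three movk instructions, 16 bits at a time (the crc32 variants stop after the first movk). A minimal Python sketch of that composition follows. It assumes the `*_b0` to `*_b3` symbols are the four 16-bit chunks of the constant, least significant first; the helper names `split_u64` and `compose_mov_movk` are hypothetical, and the example value is arbitrary rather than a real CRC constant.

```python
def split_u64(value):
    """Split a 64-bit constant into four 16-bit chunks, b0 (lowest) to b3."""
    return [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]


def compose_mov_movk(chunks):
    """Model `mov x, b0` followed by `movk x, bN, lsl 16*N` for N = 1, 2, ..."""
    reg = chunks[0]  # mov writes the whole register, so the upper bits start at zero
    for n, chunk in enumerate(chunks[1:], start=1):
        shift = 16 * n
        reg &= ~(0xFFFF << shift) & 0xFFFFFFFFFFFFFFFF  # movk leaves the other bits alone
        reg |= chunk << shift
    return reg


example = 0x0123456789ABCDEF  # arbitrary illustrative value
chunks = split_u64(example)
print([hex(c) for c in chunks])
print(hex(compose_mov_movk(chunks)))      # all four chunks, as in the crc64 macros
print(hex(compose_mov_movk(chunks[:2])))  # two chunks only, as in the crc32 macros
```

Passing only the first two chunks models the crc32 macros, where the upper 32 bits of the register stay zero.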


@ -0,0 +1,44 @@
########################################################################
# Copyright(c) 2019 Arm Corporation All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Arm Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#########################################################################
#include <aarch64_multibinary.h>
mbin_interface crc32_iscsi
mbin_interface crc16_t10dif
mbin_interface crc16_t10dif_copy
mbin_interface crc32_ieee
mbin_interface crc32_gzip_refl
mbin_interface crc64_ecma_refl
mbin_interface crc64_ecma_norm
mbin_interface crc64_iso_refl
mbin_interface crc64_iso_norm
mbin_interface crc64_jones_refl
mbin_interface crc64_jones_norm
mbin_interface crc64_rocksoft_refl
mbin_interface crc64_rocksoft_norm


@ -44,7 +44,13 @@
%include "reg_sizes.asm"
%define fetch_dist 1024
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
@ -73,8 +79,9 @@ section .text
%endif
align 16
global crc16_t10dif_01:function
mk_global crc16_t10dif_01, function
crc16_t10dif_01:
endbranch
; adjust the 16-bit initial_crc value, scale it to 32 bits
shl arg1_low32, 16
@ -140,14 +147,92 @@ crc16_t10dif_01:
; at this section of the code, there is 128*x+y (0<=y<128) bytes of buffer. The _fold_128_B_loop
; loop will fold 128B at a time until we have 128+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least fetch_dist (4KB by default) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jb _fold_128_B_loop
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
_fold_128_B_loop:
align 16
_fold_and_prefetch_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
PREFETCH [arg2+fetch_dist+0]
movdqu xmm9, [arg2+16*0]
movdqu xmm12, [arg2+16*1]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm0
movdqa xmm13, xmm1
pclmulqdq xmm0, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm1, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm0, xmm9
xorps xmm0, xmm8
pxor xmm1, xmm12
xorps xmm1, xmm13
movdqu xmm9, [arg2+16*2]
movdqu xmm12, [arg2+16*3]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm2
movdqa xmm13, xmm3
pclmulqdq xmm2, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm3, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm2, xmm9
xorps xmm2, xmm8
pxor xmm3, xmm12
xorps xmm3, xmm13
PREFETCH [arg2+fetch_dist+64]
movdqu xmm9, [arg2+16*4]
movdqu xmm12, [arg2+16*5]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm4
movdqa xmm13, xmm5
pclmulqdq xmm4, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm5, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm4, xmm9
xorps xmm4, xmm8
pxor xmm5, xmm12
xorps xmm5, xmm13
movdqu xmm9, [arg2+16*6]
movdqu xmm12, [arg2+16*7]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm6
movdqa xmm13, xmm7
pclmulqdq xmm6, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm7, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm6, xmm9
xorps xmm6, xmm8
pxor xmm7, xmm12
xorps xmm7, xmm13
sub arg3, 128
; check if there is another 128B in the buffer to be able to fold
cmp arg3, (fetch_dist + 128)
jge _fold_and_prefetch_128_B_loop
%endif ; fetch_dist != 0
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
_fold_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
prefetchnta [arg2+fetch_dist+0]
movdqu xmm9, [arg2+16*0]
movdqu xmm12, [arg2+16*1]
pshufb xmm9, xmm11
@ -163,7 +248,6 @@ _fold_128_B_loop:
pxor xmm1, xmm12
xorps xmm1, xmm13
prefetchnta [arg2+fetch_dist+32]
movdqu xmm9, [arg2+16*2]
movdqu xmm12, [arg2+16*3]
pshufb xmm9, xmm11
@ -179,7 +263,6 @@ _fold_128_B_loop:
pxor xmm3, xmm12
xorps xmm3, xmm13
prefetchnta [arg2+fetch_dist+64]
movdqu xmm9, [arg2+16*4]
movdqu xmm12, [arg2+16*5]
pshufb xmm9, xmm11
@ -195,7 +278,6 @@ _fold_128_B_loop:
pxor xmm5, xmm12
xorps xmm5, xmm13
prefetchnta [arg2+fetch_dist+96]
movdqu xmm9, [arg2+16*6]
movdqu xmm12, [arg2+16*7]
pshufb xmm9, xmm11
@ -659,7 +741,3 @@ pshufb_shf_table:
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
;;; func core, ver, snum
slversion crc16_t10dif_01, 01, 06, 0010
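
The hunks above split the 128B fold into a prefetching loop that runs only while at least fetch_dist + 128 bytes remain, followed by a plain loop without prefetch. The Python sketch below models just the counters of those two loops to show that every prefetched 64B line lies inside the buffer. It is an illustration under stated assumptions, not the library's code: the function name `simulate_fold_loops` is hypothetical, offsets are relative to the start of the buffer, and the buffer is assumed to be at least 256 bytes long (shorter inputs take the `_less_than_256` path).

```python
def simulate_fold_loops(length, fetch_dist=4096, block=128):
    """Model the two 128B fold loops; offsets are relative to the buffer start."""
    assert length >= 2 * block           # shorter inputs take the _less_than_256 path
    offset = 0                           # models arg2 - buf
    remaining = length - 2 * block       # models arg3 after "sub arg3, 256"
    prefetched = []                      # start offsets of prefetched 64B lines
    plain_iterations = 0

    # _fold_and_prefetch_128_B_loop: runs only while fetch_dist + 128 bytes remain
    while fetch_dist != 0 and remaining >= fetch_dist + block:
        offset += block
        prefetched += [offset + fetch_dist, offset + fetch_dist + 64]
        remaining -= block

    # _fold_128_B_loop: no prefetch, runs until the counter goes negative
    while True:
        offset += block
        plain_iterations += 1
        remaining -= block
        if remaining < 0:
            break

    return prefetched, plain_iterations, offset, remaining


prefetched, plain, offset, remaining = simulate_fold_loops(8192)
print(len(prefetched) // 2, "prefetching iterations,", plain, "plain iterations")
print("all prefetches inside buffer:", all(p + 64 <= 8192 for p in prefetched))
```

In this model the prefetching loop stops early enough that the last prefetched line ends before the end of the buffer, which is the behaviour the "only prefetch data that will be consumed" commits describe.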

738
crc/crc16_t10dif_02.asm Normal file

@ -0,0 +1,738 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT16 crc16_t10dif_02(
; UINT16 init_crc, //initial CRC value, 16 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg1_low32 edi
%endif
%ifidn __OUTPUT_FORMAT__, win64
%define XMM_SAVE 16*2
%define VARIABLE_OFFSET 16*10+8
%else
%define VARIABLE_OFFSET 16*2+8
%endif
align 16
mk_global crc16_t10dif_02, function
crc16_t10dif_02:
endbranch
; adjust the 16-bit initial_crc value, scale it to 32 bits
shl arg1_low32, 16
; After this point, the code flow is exactly the same as for a 32-bit CRC.
; The only difference is that before returning eax, we shift it right by 16 bits to scale back to 16 bits.
sub rsp, VARIABLE_OFFSET
%ifidn __OUTPUT_FORMAT__, win64
; save the callee-saved xmm registers on the stack (required by the Win64 calling convention)
vmovdqa [rsp+16*2],xmm6
vmovdqa [rsp+16*3],xmm7
vmovdqa [rsp+16*4],xmm8
vmovdqa [rsp+16*5],xmm9
vmovdqa [rsp+16*6],xmm10
vmovdqa [rsp+16*7],xmm11
vmovdqa [rsp+16*8],xmm12
vmovdqa [rsp+16*9],xmm13
%endif
; check if smaller than 256
cmp arg3, 256
; for sizes less than 256, we can't fold 128B at a time...
jl _less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; the crc value does not need to be byte-reflected, but it must be moved to the high part of the register
; because the data will be byte-reflected and must line up with the initial crc.
vpslldq xmm10, 12
vmovdqa xmm11, [SHUF_MASK]
; receive the initial 128B data, xor the initial crc value
vmovdqu xmm0, [arg2+16*0]
vmovdqu xmm1, [arg2+16*1]
vmovdqu xmm2, [arg2+16*2]
vmovdqu xmm3, [arg2+16*3]
vmovdqu xmm4, [arg2+16*4]
vmovdqu xmm5, [arg2+16*5]
vmovdqu xmm6, [arg2+16*6]
vmovdqu xmm7, [arg2+16*7]
vpshufb xmm0, xmm11
; XOR the initial_crc value
vpxor xmm0, xmm10
vpshufb xmm1, xmm11
vpshufb xmm2, xmm11
vpshufb xmm3, xmm11
vpshufb xmm4, xmm11
vpshufb xmm5, xmm11
vpshufb xmm6, xmm11
vpshufb xmm7, xmm11
vmovdqa xmm10, [rk3] ;xmm10 has rk3 and rk4
;imm value of pclmulqdq instruction will determine which constant to use
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 256 instead of 128 to save one instruction from the loop
sub arg3, 256
; at this section of the code, there is 128*x+y (0<=y<128) bytes of buffer. The _fold_128_B_loop
; loop will fold 128B at a time until we have 128+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least fetch_dist (4KB by default) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jb _fold_128_B_loop
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
_fold_and_prefetch_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
PREFETCH [arg2+fetch_dist+0]
vmovdqu xmm9, [arg2+16*0]
vmovdqu xmm12, [arg2+16*1]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm0
vmovdqa xmm13, xmm1
vpclmulqdq xmm0, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm1, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [arg2+16*2]
vmovdqu xmm12, [arg2+16*3]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm2
vmovdqa xmm13, xmm3
vpclmulqdq xmm2, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm3, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
PREFETCH [arg2+fetch_dist+64]
vmovdqu xmm9, [arg2+16*4]
vmovdqu xmm12, [arg2+16*5]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm4
vmovdqa xmm13, xmm5
vpclmulqdq xmm4, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm5, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [arg2+16*6]
vmovdqu xmm12, [arg2+16*7]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm6
vmovdqa xmm13, xmm7
vpclmulqdq xmm6, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm7, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub arg3, 128
; check if there is another 128B in the buffer to be able to fold
cmp arg3, (fetch_dist + 128)
jge _fold_and_prefetch_128_B_loop
%endif ; fetch_dist != 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
_fold_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
vmovdqu xmm9, [arg2+16*0]
vmovdqu xmm12, [arg2+16*1]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm0
vmovdqa xmm13, xmm1
vpclmulqdq xmm0, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm1, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [arg2+16*2]
vmovdqu xmm12, [arg2+16*3]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm2
vmovdqa xmm13, xmm3
vpclmulqdq xmm2, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm3, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
vmovdqu xmm9, [arg2+16*4]
vmovdqu xmm12, [arg2+16*5]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm4
vmovdqa xmm13, xmm5
vpclmulqdq xmm4, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm5, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [arg2+16*6]
vmovdqu xmm12, [arg2+16*7]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm6
vmovdqa xmm13, xmm7
vpclmulqdq xmm6, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm7, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub arg3, 128
; check if there is another 128B in the buffer to be able to fold
jge _fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer
; fold the 8 xmm registers to 1 xmm register with different constants
vmovdqa xmm10, [rk9]
vmovdqa xmm8, xmm0
vpclmulqdq xmm0, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm0
vmovdqa xmm10, [rk11]
vmovdqa xmm8, xmm1
vpclmulqdq xmm1, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm1
vmovdqa xmm10, [rk13]
vmovdqa xmm8, xmm2
vpclmulqdq xmm2, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm2
vmovdqa xmm10, [rk15]
vmovdqa xmm8, xmm3
vpclmulqdq xmm3, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm3
vmovdqa xmm10, [rk17]
vmovdqa xmm8, xmm4
vpclmulqdq xmm4, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm4
vmovdqa xmm10, [rk19]
vmovdqa xmm8, xmm5
vpclmulqdq xmm5, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm5
vmovdqa xmm10, [rk1] ;xmm10 has rk1 and rk2
;imm value of pclmulqdq instruction will determine which constant to use
vmovdqa xmm8, xmm6
vpclmulqdq xmm6, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm6
; instead of 128, we add 112 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl _final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
_16B_reduction_loop:
vmovdqa xmm8, xmm7
vpclmulqdq xmm7, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vmovdqu xmm0, [arg2]
vpshufb xmm0, xmm11
vpxor xmm7, xmm0
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge _16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
_final_reduction_for_128:
; check if any more data to fold. If not, compute the CRC of the final 128 bits
add arg3, 16
je _128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
_get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
vpshufb xmm1, xmm11
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table + 16]
sub rax, arg3
vmovdqu xmm0, [rax]
; shift xmm2 to the left by arg3 bytes
vpshufb xmm2, xmm0
; shift xmm7 to the right by 16-arg3 bytes
vpxor xmm0, [mask1]
vpshufb xmm7, xmm0
vpblendvb xmm1, xmm1, xmm2, xmm0
; fold 16 Bytes
vmovdqa xmm2, xmm1
vmovdqa xmm8, xmm7
vpclmulqdq xmm7, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm2
_128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5] ; rk5 and rk6 in xmm10
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0x1
vpslldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vmovdqa xmm0, xmm7
vpand xmm0, [mask2]
vpsrldq xmm7, 12
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
_barrett:
vmovdqa xmm10, [rk7] ; rk7 and rk8 in xmm10
vmovdqa xmm0, xmm7
vpclmulqdq xmm7, xmm10, 0x01
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x11
vpslldq xmm7, 4
vpxor xmm7, xmm0
vpextrd eax, xmm7,1
_cleanup:
; scale the result back to 16 bits
shr eax, 16
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp+16*2]
vmovdqa xmm7, [rsp+16*3]
vmovdqa xmm8, [rsp+16*4]
vmovdqa xmm9, [rsp+16*5]
vmovdqa xmm10, [rsp+16*6]
vmovdqa xmm11, [rsp+16*7]
vmovdqa xmm12, [rsp+16*8]
vmovdqa xmm13, [rsp+16*9]
%endif
add rsp, VARIABLE_OFFSET
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
_less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl _less_than_32
vmovdqa xmm11, [SHUF_MASK]
; if there is, load the constants
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm11 ; byte-reflect the plaintext
vpxor xmm7, xmm0
; update the buffer pointer
add arg2, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp _16B_reduction_loop
align 16
_less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je _cleanup
vmovdqa xmm11, [SHUF_MASK]
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
cmp arg3, 16
je _exact_16_left
jl _less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm11 ; byte-reflect the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp _get_last_two_xmms
align 16
_less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
vpxor xmm1, xmm1
mov r11, rsp
vmovdqa [r11], xmm1
cmp arg3, 4
jl _only_less_than_4
; backup the counter value
mov r9, arg3
cmp arg3, 8
jl _less_than_8_left
; load 8 Bytes
mov rax, [arg2]
mov [r11], rax
add r11, 8
sub arg3, 8
add arg2, 8
_less_than_8_left:
cmp arg3, 4
jl _less_than_4_left
; load 4 Bytes
mov eax, [arg2]
mov [r11], eax
add r11, 4
sub arg3, 4
add arg2, 4
_less_than_4_left:
cmp arg3, 2
jl _less_than_2_left
; load 2 Bytes
mov ax, [arg2]
mov [r11], ax
add r11, 2
sub arg3, 2
add arg2, 2
_less_than_2_left:
cmp arg3, 1
jl _zero_left
; load 1 Byte
mov al, [arg2]
mov [r11], al
_zero_left:
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
lea rax, [pshufb_shf_table + 16]
sub rax, r9
vmovdqu xmm0, [rax]
vpxor xmm0, [mask1]
vpshufb xmm7, xmm0
jmp _128_done
align 16
_exact_16_left:
vmovdqu xmm7, [arg2]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
jmp _128_done
_only_less_than_4:
cmp arg3, 3
jl _only_less_than_3
; load 3 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
mov al, [arg2+2]
mov [r11+2], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 5
jmp _barrett
_only_less_than_3:
cmp arg3, 2
jl _only_less_than_2
; load 2 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 6
jmp _barrett
_only_less_than_2:
; load 1 Byte
mov al, [arg2]
mov [r11], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 7
jmp _barrett
section .data
; precomputed constants
; these constants are precomputed from the poly: 0x8bb70000 (0x8bb7 scaled to 32 bits)
align 16
; Q = 0x18BB70000
; rk1 = 2^(32*3) mod Q << 32
; rk2 = 2^(32*5) mod Q << 32
; rk3 = 2^(32*15) mod Q << 32
; rk4 = 2^(32*17) mod Q << 32
; rk5 = 2^(32*3) mod Q << 32
; rk6 = 2^(32*2) mod Q << 32
; rk7 = floor(2^64/Q)
; rk8 = Q
rk1:
DQ 0x2d56000000000000
rk2:
DQ 0x06df000000000000
rk3:
DQ 0x9d9d000000000000
rk4:
DQ 0x7cf5000000000000
rk5:
DQ 0x2d56000000000000
rk6:
DQ 0x1368000000000000
rk7:
DQ 0x00000001f65a57f8
rk8:
DQ 0x000000018bb70000
rk9:
DQ 0xceae000000000000
rk10:
DQ 0xbfd6000000000000
rk11:
DQ 0x1e16000000000000
rk12:
DQ 0x713c000000000000
rk13:
DQ 0xf7f9000000000000
rk14:
DQ 0x80a6000000000000
rk15:
DQ 0x044c000000000000
rk16:
DQ 0xe658000000000000
rk17:
DQ 0xad18000000000000
rk18:
DQ 0xa497000000000000
rk19:
DQ 0x6ee3000000000000
rk20:
DQ 0xe7b5000000000000
mask1:
dq 0x8080808080808080, 0x8080808080808080
mask2:
dq 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF
SHUF_MASK:
dq 0x08090A0B0C0D0E0F, 0x0001020304050607
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
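
The comments in crc16_t10dif_02 say that the 16-bit CRC is scaled to 32 bits on entry (`shl arg1_low32, 16`), processed as a 32-bit CRC, and scaled back before returning (`shr eax, 16`), and that the rk constants are powers of x reduced modulo Q = 0x18BB70000. The Python sketch below is a slow bit-at-a-time reference that follows the same scaling, plus a helper that evaluates the `2^(32*n) mod Q << 32` formula from the constants comment. It is meant only as a cross-check: the names `crc16_t10dif_ref`, `xpow_mod_q` and `rk` are hypothetical, and none of this is how the PCLMULQDQ code computes the result.

```python
Q = 0x18BB70000  # T10-DIF polynomial 0x8BB7 scaled to 32 bits, with the x^32 term


def crc16_t10dif_ref(init_crc, data):
    """Bit-at-a-time CRC16 T10-DIF, run as a 32-bit CRC like the assembly."""
    crc = (init_crc & 0xFFFF) << 16      # same scaling as "shl arg1_low32, 16"
    for byte in data:
        crc ^= byte << 24                # feed the next byte into the top of the register
        for _ in range(8):
            crc <<= 1
            if crc & (1 << 32):          # a bit fell out of the 32-bit register
                crc ^= Q                 # reduce by Q, which also clears bit 32
    return crc >> 16                     # same scaling as "shr eax, 16"


def xpow_mod_q(n):
    """Return x^n mod Q over GF(2) as an integer bit pattern."""
    r = 1
    for _ in range(n):
        r <<= 1
        if r & (1 << 32):
            r ^= Q
    return r


def rk(n):
    """Evaluate the "2^n mod Q << 32" formula from the constants comment."""
    return xpow_mod_q(n) << 32


print(hex(crc16_t10dif_ref(0, b"123456789")))  # 0xd0db, the standard check value
print(hex(rk(32 * 3)))                          # 0x2d56000000000000 (rk1 and rk5)
print(hex(rk(32 * 2)))                          # 0x1368000000000000 (rk6)
```

Because the low 16 bits of Q are zero, the low half of the 32-bit register stays zero throughout, which is why the final shift right by 16 loses no information.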


@ -0,0 +1,627 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT16 crc16_t10dif_by16_10(
; UINT16 init_crc, //initial CRC value, 16 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
;
;
%include "reg_sizes.asm"
%ifndef FUNCTION_NAME
%define FUNCTION_NAME crc16_t10dif_by16_10
%endif
%ifndef fetch_dist
%define fetch_dist 1536
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht0
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg1_low32 edi
%endif
%define TMP 16*0
%ifidn __OUTPUT_FORMAT__, win64
%define XMM_SAVE 16*2
%define VARIABLE_OFFSET 16*12+8
%else
%define VARIABLE_OFFSET 16*2+8
%endif
align 16
mk_global FUNCTION_NAME, function
FUNCTION_NAME:
endbranch
; adjust the 16-bit initial_crc value, scale it to 32 bits
shl arg1_low32, 16
; After this point, the code flow is exactly the same as for a 32-bit CRC.
; The only difference is that before returning eax, we shift it right by 16 bits to scale back to 16 bits.
sub rsp, VARIABLE_OFFSET
%ifidn __OUTPUT_FORMAT__, win64
; save the callee-saved xmm registers on the stack (required by the Win64 calling convention)
vmovdqa [rsp + XMM_SAVE + 16*0], xmm6
vmovdqa [rsp + XMM_SAVE + 16*1], xmm7
vmovdqa [rsp + XMM_SAVE + 16*2], xmm8
vmovdqa [rsp + XMM_SAVE + 16*3], xmm9
vmovdqa [rsp + XMM_SAVE + 16*4], xmm10
vmovdqa [rsp + XMM_SAVE + 16*5], xmm11
vmovdqa [rsp + XMM_SAVE + 16*6], xmm12
vmovdqa [rsp + XMM_SAVE + 16*7], xmm13
vmovdqa [rsp + XMM_SAVE + 16*8], xmm14
vmovdqa [rsp + XMM_SAVE + 16*9], xmm15
%endif
vbroadcasti32x4 zmm18, [SHUF_MASK]
cmp arg3, 256
jl .less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; the crc value does not need to be byte-reflected, but it must be moved to the high part of the register
; because the data will be byte-reflected and must line up with the initial crc.
vpslldq xmm10, 12
; load the initial 128B of data (two zmm registers), xor the initial crc value
vmovdqu8 zmm0, [arg2+16*0]
vmovdqu8 zmm4, [arg2+16*4]
vpshufb zmm0, zmm0, zmm18
vpshufb zmm4, zmm4, zmm18
vpxorq zmm0, zmm10
vbroadcasti32x4 zmm10, [rk3] ;zmm10 has rk3 and rk4 in each 128-bit lane
;imm value of pclmulqdq instruction will determine which constant to use
sub arg3, 256
cmp arg3, 256
jl .fold_128_B_loop
vmovdqu8 zmm7, [arg2+16*8]
vmovdqu8 zmm8, [arg2+16*12]
vpshufb zmm7, zmm7, zmm18
vpshufb zmm8, zmm8, zmm18
vbroadcasti32x4 zmm16, [rk_1] ;zmm16 has rk-1 and rk-2
sub arg3, 256
%if fetch_dist != 0
; check if there is at least 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jb .fold_256_B_loop
align 16
.fold_and_prefetch_256_B_loop:
add arg2, 256
PREFETCH [arg2+fetch_dist+0]
vmovdqu8 zmm3, [arg2+16*0]
vpshufb zmm3, zmm3, zmm18
vpclmulqdq zmm1, zmm0, zmm16, 0x00
vpclmulqdq zmm0, zmm0, zmm16, 0x11
vpternlogq zmm0, zmm1, zmm3, 0x96
PREFETCH [arg2+fetch_dist+64]
vmovdqu8 zmm9, [arg2+16*4]
vpshufb zmm9, zmm9, zmm18
vpclmulqdq zmm5, zmm4, zmm16, 0x00
vpclmulqdq zmm4, zmm4, zmm16, 0x11
vpternlogq zmm4, zmm5, zmm9, 0x96
PREFETCH [arg2+fetch_dist+64*2]
vmovdqu8 zmm11, [arg2+16*8]
vpshufb zmm11, zmm11, zmm18
vpclmulqdq zmm12, zmm7, zmm16, 0x00
vpclmulqdq zmm7, zmm7, zmm16, 0x11
vpternlogq zmm7, zmm12, zmm11, 0x96
PREFETCH [arg2+fetch_dist+64*3]
vmovdqu8 zmm17, [arg2+16*12]
vpshufb zmm17, zmm17, zmm18
vpclmulqdq zmm14, zmm8, zmm16, 0x00
vpclmulqdq zmm8, zmm8, zmm16, 0x11
vpternlogq zmm8, zmm14, zmm17, 0x96
sub arg3, 256
; check if there is another 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jge .fold_and_prefetch_256_B_loop
%endif ; fetch_dist != 0
.fold_256_B_loop:
add arg2, 256
vmovdqu8 zmm3, [arg2+16*0]
vpshufb zmm3, zmm3, zmm18
vpclmulqdq zmm1, zmm0, zmm16, 0x00
vpclmulqdq zmm0, zmm0, zmm16, 0x11
vpternlogq zmm0, zmm1, zmm3, 0x96
vmovdqu8 zmm9, [arg2+16*4]
vpshufb zmm9, zmm9, zmm18
vpclmulqdq zmm5, zmm4, zmm16, 0x00
vpclmulqdq zmm4, zmm4, zmm16, 0x11
vpternlogq zmm4, zmm5, zmm9, 0x96
vmovdqu8 zmm11, [arg2+16*8]
vpshufb zmm11, zmm11, zmm18
vpclmulqdq zmm12, zmm7, zmm16, 0x00
vpclmulqdq zmm7, zmm7, zmm16, 0x11
vpternlogq zmm7, zmm12, zmm11, 0x96
vmovdqu8 zmm17, [arg2+16*12]
vpshufb zmm17, zmm17, zmm18
vpclmulqdq zmm14, zmm8, zmm16, 0x00
vpclmulqdq zmm8, zmm8, zmm16, 0x11
vpternlogq zmm8, zmm14, zmm17, 0x96
sub arg3, 256
jge .fold_256_B_loop
;; Fold 256 into 128
add arg2, 256
vpclmulqdq zmm1, zmm0, zmm10, 0x00
vpclmulqdq zmm2, zmm0, zmm10, 0x11
vpternlogq zmm7, zmm1, zmm2, 0x96 ; xor ABC
vpclmulqdq zmm5, zmm4, zmm10, 0x00
vpclmulqdq zmm6, zmm4, zmm10, 0x11
vpternlogq zmm8, zmm5, zmm6, 0x96 ; xor ABC
vmovdqa32 zmm0, zmm7
vmovdqa32 zmm4, zmm8
add arg3, 128
jmp .fold_128_B_register
; at this section of the code, there are 128*x+y (0<=y<128) bytes of buffer. The fold_128_B_loop
; loop will fold 128B at a time until we have 128+y Bytes of buffer.
; Each iteration folds 2 zmm registers (the equivalent of 8 xmm registers) in parallel
.fold_128_B_loop:
add arg2, 128
vmovdqu8 zmm8, [arg2+16*0]
vpshufb zmm8, zmm8, zmm18
vpclmulqdq zmm2, zmm0, zmm10, 0x00
vpclmulqdq zmm0, zmm0, zmm10, 0x11
vpternlogq zmm0, zmm2, zmm8, 0x96
vmovdqu8 zmm9, [arg2+16*4]
vpshufb zmm9, zmm9, zmm18
vpclmulqdq zmm5, zmm4, zmm10, 0x00
vpclmulqdq zmm4, zmm4, zmm10, 0x11
vpternlogq zmm4, zmm5, zmm9, 0x96
sub arg3, 128
jge .fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer, where 0 <= y < 128
; the 128B of folded data is in 2 zmm registers: zmm0 and zmm4
.fold_128_B_register:
; fold the 8 128b parts into 1 xmm register with different constants
vmovdqu8 zmm16, [rk9] ; multiply by rk9-rk16
vmovdqu8 zmm11, [rk17] ; multiply by rk17-rk20, rk1,rk2, 0,0
vpclmulqdq zmm1, zmm0, zmm16, 0x00
vpclmulqdq zmm2, zmm0, zmm16, 0x11
vextracti64x2 xmm7, zmm4, 3 ; save last that has no multiplicand
vpclmulqdq zmm5, zmm4, zmm11, 0x00
vpclmulqdq zmm6, zmm4, zmm11, 0x11
vmovdqa xmm10, [rk1] ; Needed later in reduction loop
vpternlogq zmm1, zmm2, zmm5, 0x96 ; xor ABC
vpternlogq zmm1, zmm6, zmm7, 0x96 ; xor ABC
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
; instead of 128, we add 128-16 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl .final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
.16B_reduction_loop:
vpclmulqdq xmm8, xmm7, xmm10, 0x11
vpclmulqdq xmm7, xmm7, xmm10, 0x00
vpxor xmm7, xmm8
vmovdqu xmm0, [arg2]
vpshufb xmm0, xmm0, xmm18
vpxor xmm7, xmm0
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge .16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
.final_reduction_for_128:
add arg3, 16
je .128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset
; the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
.get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
vpshufb xmm1, xmm18
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table + 16]
sub rax, arg3
vmovdqu xmm0, [rax]
vpshufb xmm2, xmm0
vpxor xmm0, [mask1]
vpshufb xmm7, xmm0
vpblendvb xmm1, xmm1, xmm2, xmm0
vpclmulqdq xmm8, xmm7, xmm10, 0x11
vpclmulqdq xmm7, xmm7, xmm10, 0x00
vpxor xmm7, xmm8
vpxor xmm7, xmm1
.128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5]
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0x01 ; H*L
vpslldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vmovdqa xmm0, xmm7
vpand xmm0, [mask2]
vpsrldq xmm7, 12
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
.barrett:
vmovdqa xmm10, [rk7] ; rk7 and rk8 in xmm10
vmovdqa xmm0, xmm7
vpclmulqdq xmm7, xmm10, 0x01
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x11
vpslldq xmm7, 4
vpxor xmm7, xmm0
vpextrd eax, xmm7, 1
.cleanup:
; scale the result back to 16 bits
shr eax, 16
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + XMM_SAVE + 16*0]
vmovdqa xmm7, [rsp + XMM_SAVE + 16*1]
vmovdqa xmm8, [rsp + XMM_SAVE + 16*2]
vmovdqa xmm9, [rsp + XMM_SAVE + 16*3]
vmovdqa xmm10, [rsp + XMM_SAVE + 16*4]
vmovdqa xmm11, [rsp + XMM_SAVE + 16*5]
vmovdqa xmm12, [rsp + XMM_SAVE + 16*6]
vmovdqa xmm13, [rsp + XMM_SAVE + 16*7]
vmovdqa xmm14, [rsp + XMM_SAVE + 16*8]
vmovdqa xmm15, [rsp + XMM_SAVE + 16*9]
%endif
add rsp, VARIABLE_OFFSET
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
.less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl .less_than_32
; if there is, load the constants
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm18 ; byte-reflect the plaintext
vpxor xmm7, xmm0
; update the buffer pointer
add arg2, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp .16B_reduction_loop
align 16
.less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je .cleanup
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
cmp arg3, 16
je .exact_16_left
jl .less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp .get_last_two_xmms
align 16
.less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
vpxor xmm1, xmm1
mov r11, rsp
vmovdqa [r11], xmm1
cmp arg3, 4
jl .only_less_than_4
; backup the counter value
mov r9, arg3
cmp arg3, 8
jl .less_than_8_left
; load 8 Bytes
mov rax, [arg2]
mov [r11], rax
add r11, 8
sub arg3, 8
add arg2, 8
.less_than_8_left:
cmp arg3, 4
jl .less_than_4_left
; load 4 Bytes
mov eax, [arg2]
mov [r11], eax
add r11, 4
sub arg3, 4
add arg2, 4
.less_than_4_left:
cmp arg3, 2
jl .less_than_2_left
; load 2 Bytes
mov ax, [arg2]
mov [r11], ax
add r11, 2
sub arg3, 2
add arg2, 2
.less_than_2_left:
cmp arg3, 1
jl .zero_left
; load 1 Byte
mov al, [arg2]
mov [r11], al
.zero_left:
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
lea rax, [pshufb_shf_table + 16]
sub rax, r9
vmovdqu xmm0, [rax]
vpxor xmm0, [mask1]
vpshufb xmm7,xmm0
jmp .128_done
align 16
.exact_16_left:
vmovdqu xmm7, [arg2]
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
jmp .128_done
.only_less_than_4:
cmp arg3, 3
jl .only_less_than_3
; load 3 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
mov al, [arg2+2]
mov [r11+2], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 5
jmp .barrett
.only_less_than_3:
cmp arg3, 2
jl .only_less_than_2
; load 2 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 6
jmp .barrett
.only_less_than_2:
; load 1 Byte
mov al, [arg2]
mov [r11], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 7
jmp .barrett
section .data
align 32
%ifndef USE_CONSTS
; precomputed constants
rk_1: dq 0xdccf000000000000
rk_2: dq 0x4b0b000000000000
rk1: dq 0x2d56000000000000
rk2: dq 0x06df000000000000
rk3: dq 0x9d9d000000000000
rk4: dq 0x7cf5000000000000
rk5: dq 0x2d56000000000000
rk6: dq 0x1368000000000000
rk7: dq 0x00000001f65a57f8
rk8: dq 0x000000018bb70000
rk9: dq 0xceae000000000000
rk10: dq 0xbfd6000000000000
rk11: dq 0x1e16000000000000
rk12: dq 0x713c000000000000
rk13: dq 0xf7f9000000000000
rk14: dq 0x80a6000000000000
rk15: dq 0x044c000000000000
rk16: dq 0xe658000000000000
rk17: dq 0xad18000000000000
rk18: dq 0xa497000000000000
rk19: dq 0x6ee3000000000000
rk20: dq 0xe7b5000000000000
rk_1b: dq 0x2d56000000000000
rk_2b: dq 0x06df000000000000
dq 0x0000000000000000
dq 0x0000000000000000
%else
INCLUDE_CONSTS
%endif
mask1: dq 0x8080808080808080, 0x8080808080808080
mask2: dq 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF
SHUF_MASK: dq 0x08090A0B0C0D0E0F, 0x0001020304050607
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
dq 0x8080808080808080, 0x0f0e0d0c0b0a0908
dq 0x8080808080808080, 0x8080808080808080

View File

@ -45,7 +45,13 @@
%include "reg_sizes.asm"
%define fetch_dist 1024
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
@ -66,8 +72,9 @@ section .text
%endif
align 16
global crc16_t10dif_by4:function
mk_global crc16_t10dif_by4, function
crc16_t10dif_by4:
endbranch
; adjust the 16-bit initial_crc value, scale it to 32 bits
shl arg1_low32, 16
@ -123,15 +130,75 @@ crc16_t10dif_by4:
; buffer. The _fold_64_B_loop
; loop will fold 64B at a time until we have 64+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 64B in the buffer
cmp arg3, (fetch_dist + 64)
jb _fold_64_B_loop
; fold 64B at a time. This section of the code folds 4 xmm
; registers in parallel
_fold_64_B_loop:
align 16
_fold_and_prefetch_64_B_loop:
; update the buffer pointer
add arg2, 64 ; buf += 64;
PREFETCH [arg2+fetch_dist+0]
movdqu xmm4, xmm0
movdqu xmm5, xmm1
pclmulqdq xmm0, xmm6 , 0x11
pclmulqdq xmm1, xmm6 , 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, xmm2
movdqu xmm5, xmm3
pclmulqdq xmm2, xmm6, 0x11
pclmulqdq xmm3, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm2, xmm4
pxor xmm3, xmm5
movdqu xmm4, [arg2]
movdqu xmm5, [arg2+16]
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, [arg2+32]
movdqu xmm5, [arg2+48]
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm2, xmm4
pxor xmm3, xmm5
sub arg3, 64
; check if there is another 4KB (fetch distance) + 64B in the buffer
cmp arg3, (fetch_dist + 64)
jge _fold_and_prefetch_64_B_loop
%endif ; fetch_dist != 0
; fold 64B at a time. This section of the code folds 4 xmm
; registers in parallel
align 16
_fold_64_B_loop:
; update the buffer pointer
add arg2, 64 ; buf += 64;
prefetchnta [arg2+fetch_dist+0]
movdqu xmm4, xmm0
movdqu xmm5, xmm1
@ -144,7 +211,6 @@ _fold_64_B_loop:
pxor xmm0, xmm4
pxor xmm1, xmm5
prefetchnta [arg2+fetch_dist+32]
movdqu xmm4, xmm2
movdqu xmm5, xmm3
@ -557,6 +623,3 @@ pshufb_shf_table:
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
;;; func core, ver, snum
slversion crc16_t10dif_by4, 05, 02, 0016

View File

@ -0,0 +1,667 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2017 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Function API:
; UINT16 crc16_t10dif_copy_by4(
; UINT16 init_crc, //initial CRC value, 16 bits
; unsigned char *dst, //buffer pointer destination for copy
; const unsigned char *src, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://download.intel.com/design/intarch/papers/323102.pdf
;
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg4 r9
%xdefine tmp1 r10
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg4 rcx
%xdefine tmp1 r10
%xdefine arg1_low32 edi
%endif
align 16
mk_global crc16_t10dif_copy_by4, function
crc16_t10dif_copy_by4:
endbranch
; adjust the 16-bit initial_crc value, scale it to 32 bits
shl arg1_low32, 16
; After this point, code flow is exactly the same as a 32-bit CRC.
; The only difference is before returning eax, we will shift
; it right 16 bits, to scale back to 16 bits.
sub rsp,16*4+8
; save xmm6 and xmm7 on the stack so they can be restored before returning
movdqa [rsp+16*2],xmm6
movdqa [rsp+16*3],xmm7
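; Stack frame sketch, derived from the offsets used in this file. It assumes
; the caller follows the usual ABI rule that rsp is 16-byte aligned at the
; call instruction, so on entry rsp = 16*k + 8 (return address just pushed):
;   rsp - (16*4 + 8) = 16*(k - 4), which is 16-byte aligned as movdqa requires
;   [rsp + 16*0]  16B scratch area used for buffers shorter than 16 bytes
;   [rsp + 16*1]  not referenced in this file
;   [rsp + 16*2]  saved xmm6
;   [rsp + 16*3]  saved xmm7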
; check if smaller than 128B
cmp arg4, 128
; for sizes less than 128, we can't fold 64B at a time...
jl _less_than_128
; load the initial crc value
movd xmm6, arg1_low32 ; initial crc
; crc value does not need to be byte-reflected, but it needs to
; be moved to the high part of the register.
; because data will be byte-reflected and will align with
; initial crc at correct place.
pslldq xmm6, 12
movdqa xmm7, [SHUF_MASK]
; receive the initial 64B data, xor the initial crc value
movdqu xmm0, [arg3]
movdqu xmm1, [arg3+16]
movdqu xmm2, [arg3+32]
movdqu xmm3, [arg3+48]
; copy initial data
movdqu [arg2], xmm0
movdqu [arg2+16], xmm1
movdqu [arg2+32], xmm2
movdqu [arg2+48], xmm3
pshufb xmm0, xmm7
; XOR the initial_crc value
pxor xmm0, xmm6
pshufb xmm1, xmm7
pshufb xmm2, xmm7
pshufb xmm3, xmm7
movdqa xmm6, [rk3] ;xmm6 has rk3 and rk4
;imm value of pclmulqdq instruction
;will determine which constant to use
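; Sketch of the imm8 selection (standard PCLMULQDQ semantics):
;   imm8 bit 0 picks the qword of the first operand, bit 4 picks the
;   qword of the second operand (0 = low 64 bits, 1 = high 64 bits).
; With xmm6 = [rk3] (low qword rk3, high qword rk4), and xmmN standing in
; as a placeholder for any of the data registers:
;   pclmulqdq xmmN, xmm6, 0x00   ; low(xmmN)  * rk3
;   pclmulqdq xmmN, xmm6, 0x11   ; high(xmmN) * rk4
;   pclmulqdq xmmN, xmm6, 0x01   ; high(xmmN) * low qword of xmm6
;   pclmulqdq xmmN, xmm6, 0x10   ; low(xmmN)  * high qword of xmm6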
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 128 instead of 64 to save one instruction from the loop
sub arg4, 128
; at this section of the code, there are 64*x+y (0<=y<64) bytes of
; buffer. The _fold_64_B_loop
; loop will fold 64B at a time until we have 64+y Bytes of buffer
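; Worked example of the counter bias (a sketch; len = 200 is an arbitrary
; value and fetch_dist is assumed to be the default 4096):
;   entry:           first 64B already loaded, arg4 = 200 - 128 = 72
;   prefetch check:  72 < fetch_dist + 64, so jump to _fold_64_B_loop
;   iteration 1:     folds bytes 64..127,  arg4 = 72 - 64 =   8, jge taken
;   iteration 2:     folds bytes 128..191, arg4 =  8 - 64 = -56, jge falls through
;   after the loop:  8 bytes (192..199) remain unread, and
;                    arg4 + 64 = 8 recovers that count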
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 64B in the buffer
cmp arg4, (fetch_dist + 64)
jb _fold_64_B_loop
; fold 64B at a time. This section of the code folds 4 xmm
; registers in parallel
align 16
_fold_and_prefetch_64_B_loop:
; update the buffer pointer
add arg3, 64 ; buf += 64;
add arg2, 64
PREFETCH [arg3+fetch_dist+0]
movdqu xmm4, xmm0
movdqu xmm5, xmm1
pclmulqdq xmm0, xmm6 , 0x11
pclmulqdq xmm1, xmm6 , 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, xmm2
movdqu xmm5, xmm3
pclmulqdq xmm2, xmm6, 0x11
pclmulqdq xmm3, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm2, xmm4
pxor xmm3, xmm5
movdqu xmm4, [arg3]
movdqu xmm5, [arg3+16]
movdqu [arg2], xmm4
movdqu [arg2+16], xmm5
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, [arg3+32]
movdqu xmm5, [arg3+48]
movdqu [arg2+32], xmm4
movdqu [arg2+48], xmm5
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm2, xmm4
pxor xmm3, xmm5
sub arg4, 64
; check if there is another 4KB (fetch distance) + 64B in the buffer
cmp arg4, (fetch_dist + 64)
jge _fold_and_prefetch_64_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
%endif ; fetch_dist != 0
; fold 64B at a time. This section of the code folds 4 xmm
; registers in parallel
align 16
_fold_64_B_loop:
; update the buffer pointer
add arg3, 64 ; buf += 64;
add arg2, 64
movdqu xmm4, xmm0
movdqu xmm5, xmm1
pclmulqdq xmm0, xmm6 , 0x11
pclmulqdq xmm1, xmm6 , 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, xmm2
movdqu xmm5, xmm3
pclmulqdq xmm2, xmm6, 0x11
pclmulqdq xmm3, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm2, xmm4
pxor xmm3, xmm5
movdqu xmm4, [arg3]
movdqu xmm5, [arg3+16]
movdqu [arg2], xmm4
movdqu [arg2+16], xmm5
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, [arg3+32]
movdqu xmm5, [arg3+48]
movdqu [arg2+32], xmm4
movdqu [arg2+48], xmm5
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm2, xmm4
pxor xmm3, xmm5
sub arg4, 64
; check if there is another 64B in the buffer to be able to fold
jge _fold_64_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg3, 64
add arg2, 64
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer
; the 64B of folded data is in 4 of the xmm registers: xmm0, xmm1, xmm2, xmm3
; fold the 4 xmm registers to 1 xmm register with different constants
movdqa xmm6, [rk1] ;xmm6 has rk1 and rk2
;imm value of pclmulqdq instruction will
;determine which constant to use
movdqa xmm4, xmm0
pclmulqdq xmm0, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pxor xmm1, xmm4
pxor xmm1, xmm0
movdqa xmm4, xmm1
pclmulqdq xmm1, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pxor xmm2, xmm4
pxor xmm2, xmm1
movdqa xmm4, xmm2
pclmulqdq xmm2, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pxor xmm3, xmm4
pxor xmm3, xmm2
; instead of 64, we add 48 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg4, 64-16
jl _final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes
; is in register xmm3 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
_16B_reduction_loop:
movdqa xmm4, xmm3
pclmulqdq xmm3, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pxor xmm3, xmm4
movdqu xmm0, [arg3]
movdqu [arg2], xmm0
pshufb xmm0, xmm7
pxor xmm3, xmm0
add arg3, 16
add arg2, 16
sub arg4, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg4, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge _16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm3 register
_final_reduction_for_128:
; check if any more data to fold. If not, compute the CRC of the final 128 bits
add arg4, 16
je _128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer,
; we can offset the input pointer before the actual point,
; to receive exactly 16 bytes.
; after that the registers need to be adjusted.
_get_last_two_xmms:
movdqa xmm2, xmm3
movdqu xmm1, [arg3 - 16 + arg4]
movdqu [arg2 - 16 + arg4], xmm1
pshufb xmm1, xmm7
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table + 16]
sub rax, arg4
movdqu xmm0, [rax]
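; Worked example (sketch): for arg4 = 1 the load address is
; pshufb_shf_table + 15, so xmm0 holds bytes 8f 00 01 ... 0e (lowest byte
; first), i.e. the "shl 1" row listed in the table comments. pshufb zeroes
; every lane whose control byte has bit 7 set and otherwise copies the
; indexed source byte, so lane 0 becomes 0 and lane i (1..15) receives
; source byte i-1.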
; shift xmm2 to the left by arg4 bytes
pshufb xmm2, xmm0
; shift xmm3 to the right by 16-arg4 bytes
pxor xmm0, [mask1]
pshufb xmm3, xmm0
pblendvb xmm1, xmm2 ;xmm0 is implicit
; fold 16 Bytes
movdqa xmm2, xmm1
movdqa xmm4, xmm3
pclmulqdq xmm3, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pxor xmm3, xmm4
pxor xmm3, xmm2
_128_done:
; compute crc of a 128-bit value
movdqa xmm6, [rk5] ; rk5 and rk6 in xmm6
movdqa xmm0, xmm3
;64b fold
pclmulqdq xmm3, xmm6, 0x1
pslldq xmm0, 8
pxor xmm3, xmm0
;32b fold
movdqa xmm0, xmm3
pand xmm0, [mask2]
psrldq xmm3, 12
pclmulqdq xmm3, xmm6, 0x10
pxor xmm3, xmm0
;barrett reduction
_barrett:
movdqa xmm6, [rk7] ; rk7 and rk8 in xmm6
movdqa xmm0, xmm3
pclmulqdq xmm3, xmm6, 0x01
pslldq xmm3, 4
pclmulqdq xmm3, xmm6, 0x11
pslldq xmm3, 4
pxor xmm3, xmm0
pextrd eax, xmm3,1
_cleanup:
; scale the result back to 16 bits
shr eax, 16
movdqa xmm6, [rsp+16*2]
movdqa xmm7, [rsp+16*3]
add rsp,16*4+8
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
_less_than_128:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg4, 32
jl _less_than_32
movdqa xmm7, [SHUF_MASK]
; if there is, load the constants
movdqa xmm6, [rk1] ; rk1 and rk2 in xmm6
movd xmm0, arg1_low32 ; get the initial crc value
pslldq xmm0, 12 ; align it to its correct place
movdqu xmm3, [arg3] ; load the plaintext
movdqu [arg2], xmm3 ; store copy
pshufb xmm3, xmm7 ; byte-reflect the plaintext
pxor xmm3, xmm0
; update the buffer pointer
add arg3, 16
add arg2, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub arg4, 32
jmp _16B_reduction_loop
align 16
_less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg4, arg4
je _cleanup
movdqa xmm7, [SHUF_MASK]
movd xmm0, arg1_low32 ; get the initial crc value
pslldq xmm0, 12 ; align it to its correct place
cmp arg4, 16
je _exact_16_left
jl _less_than_16_left
movdqu xmm3, [arg3] ; load the plaintext
movdqu [arg2], xmm3 ; store the copy
pshufb xmm3, xmm7 ; byte-reflect the plaintext
pxor xmm3, xmm0 ; xor the initial crc value
add arg3, 16
add arg2, 16
sub arg4, 16
movdqa xmm6, [rk1] ; rk1 and rk2 in xmm6
jmp _get_last_two_xmms
align 16
_less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
pxor xmm1, xmm1
mov r11, rsp
movdqa [r11], xmm1
cmp arg4, 4
jl _only_less_than_4
; backup the counter value
mov tmp1, arg4
cmp arg4, 8
jl _less_than_8_left
; load 8 Bytes
mov rax, [arg3]
mov [arg2], rax
mov [r11], rax
add r11, 8
sub arg4, 8
add arg3, 8
add arg2, 8
_less_than_8_left:
cmp arg4, 4
jl _less_than_4_left
; load 4 Bytes
mov eax, [arg3]
mov [arg2], eax
mov [r11], eax
add r11, 4
sub arg4, 4
add arg3, 4
add arg2, 4
_less_than_4_left:
cmp arg4, 2
jl _less_than_2_left
; load 2 Bytes
mov ax, [arg3]
mov [arg2], ax
mov [r11], ax
add r11, 2
sub arg4, 2
add arg3, 2
add arg2, 2
_less_than_2_left:
cmp arg4, 1
jl _zero_left
; load 1 Byte
mov al, [arg3]
mov [arg2], al
mov [r11], al
_zero_left:
movdqa xmm3, [rsp]
pshufb xmm3, xmm7
pxor xmm3, xmm0 ; xor the initial crc value
; shl tmp1, 4
lea rax, [pshufb_shf_table + 16]
sub rax, tmp1
movdqu xmm0, [rax]
pxor xmm0, [mask1]
pshufb xmm3, xmm0
jmp _128_done
align 16
_exact_16_left:
movdqu xmm3, [arg3]
movdqu [arg2], xmm3
pshufb xmm3, xmm7
pxor xmm3, xmm0 ; xor the initial crc value
jmp _128_done
_only_less_than_4:
cmp arg4, 3
jl _only_less_than_3
; load 3 Bytes
mov al, [arg3]
mov [arg2], al
mov [r11], al
mov al, [arg3+1]
mov [arg2+1], al
mov [r11+1], al
mov al, [arg3+2]
mov [arg2+2], al
mov [r11+2], al
movdqa xmm3, [rsp]
pshufb xmm3, xmm7
pxor xmm3, xmm0 ; xor the initial crc value
psrldq xmm3, 5
jmp _barrett
_only_less_than_3:
cmp arg4, 2
jl _only_less_than_2
; load 2 Bytes
mov al, [arg3]
mov [arg2], al
mov [r11], al
mov al, [arg3+1]
mov [arg2+1], al
mov [r11+1], al
movdqa xmm3, [rsp]
pshufb xmm3, xmm7
pxor xmm3, xmm0 ; xor the initial crc value
psrldq xmm3, 6
jmp _barrett
_only_less_than_2:
; load 1 Byte
mov al, [arg3]
mov [arg2],al
mov [r11], al
movdqa xmm3, [rsp]
pshufb xmm3, xmm7
pxor xmm3, xmm0 ; xor the initial crc value
psrldq xmm3, 7
jmp _barrett
section .data
; precomputed constants
; these constants are precomputed from the poly: 0x8bb70000 (0x8bb7 scaled to 32 bits)
align 16
; Q = 0x18BB70000
; rk1 = 2^(32*3) mod Q << 32
; rk2 = 2^(32*5) mod Q << 32
; rk3 = 2^(32*15) mod Q << 32
; rk4 = 2^(32*17) mod Q << 32
; rk5 = 2^(32*3) mod Q << 32
; rk6 = 2^(32*2) mod Q << 32
; rk7 = floor(2^64/Q)
; rk8 = Q
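; Relation to the fold distances (arithmetic sketch; the interpretation is
; an assumption based on how the constants are used in this file):
;   rk1/rk2 fold across 16B = 128 bits: 32*3  =  96 = 128 - 32, 32*5  = 160 = 128 + 32
;   rk3/rk4 fold across 64B = 512 bits: 32*15 = 480 = 512 - 32, 32*17 = 544 = 512 + 32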
rk1:
DQ 0x2d56000000000000
rk2:
DQ 0x06df000000000000
rk3:
DQ 0x044c000000000000
rk4:
DQ 0xe658000000000000
rk5:
DQ 0x2d56000000000000
rk6:
DQ 0x1368000000000000
rk7:
DQ 0x00000001f65a57f8
rk8:
DQ 0x000000018bb70000
mask1:
dq 0x8080808080808080, 0x8080808080808080
mask2:
dq 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF
SHUF_MASK:
dq 0x08090A0B0C0D0E0F, 0x0001020304050607
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908

View File

@ -0,0 +1,666 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Function API:
; UINT16 crc16_t10dif_copy_by4_02(
; UINT16 init_crc, //initial CRC value, 16 bits
; unsigned char *dst, //buffer pointer destination for copy
; const unsigned char *src, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://download.intel.com/design/intarch/papers/323102.pdf
;
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg4 r9
%xdefine tmp1 r10
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg4 rcx
%xdefine tmp1 r10
%xdefine arg1_low32 edi
%endif
align 16
mk_global crc16_t10dif_copy_by4_02, function
crc16_t10dif_copy_by4_02:
endbranch
; adjust the 16-bit initial_crc value, scale it to 32 bits
shl arg1_low32, 16
; After this point, code flow is exactly the same as a 32-bit CRC.
; The only difference is before returning eax, we will shift
; it right 16 bits, to scale back to 16 bits.
sub rsp,16*4+8
; save xmm6 and xmm7 on the stack so they can be restored before returning
movdqa [rsp+16*2],xmm6
movdqa [rsp+16*3],xmm7
; check if smaller than 128B
cmp arg4, 128
; for sizes less than 128, we can't fold 64B at a time...
jl _less_than_128
; load the initial crc value
vmovd xmm6, arg1_low32 ; initial crc
; crc value does not need to be byte-reflected, but it needs to
; be moved to the high part of the register.
; because data will be byte-reflected and will align with
; initial crc at correct place.
vpslldq xmm6, 12
vmovdqa xmm7, [SHUF_MASK]
; receive the initial 64B data, xor the initial crc value
vmovdqu xmm0, [arg3]
vmovdqu xmm1, [arg3+16]
vmovdqu xmm2, [arg3+32]
vmovdqu xmm3, [arg3+48]
; copy initial data
vmovdqu [arg2], xmm0
vmovdqu [arg2+16], xmm1
vmovdqu [arg2+32], xmm2
vmovdqu [arg2+48], xmm3
vpshufb xmm0, xmm7
; XOR the initial_crc value
vpxor xmm0, xmm6
vpshufb xmm1, xmm7
vpshufb xmm2, xmm7
vpshufb xmm3, xmm7
vmovdqa xmm6, [rk3] ;xmm6 has rk3 and rk4
;imm value of pclmulqdq instruction
;will determine which constant to use
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 128 instead of 64 to save one instruction from the loop
sub arg4, 128
; at this section of the code, there is 64*x+y (0<=y<64) bytes of
; buffer. The _fold_64_B_loop
; loop will fold 64B at a time until we have 64+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 64B in the buffer
cmp arg4, (fetch_dist + 64)
jb _fold_64_B_loop
; fold 64B at a time. This section of the code folds 4 xmm
; registers in parallel
align 16
_fold_and_prefetch_64_B_loop:
; update the buffer pointer
add arg3, 64 ; buf += 64;
add arg2, 64
PREFETCH [arg3+fetch_dist+0]
vmovdqu xmm4, xmm0
vmovdqu xmm5, xmm1
vpclmulqdq xmm0, xmm6 , 0x11
vpclmulqdq xmm1, xmm6 , 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpclmulqdq xmm5, xmm6, 0x0
vpxor xmm0, xmm4
vpxor xmm1, xmm5
vmovdqu xmm4, xmm2
vmovdqu xmm5, xmm3
vpclmulqdq xmm2, xmm6, 0x11
vpclmulqdq xmm3, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpclmulqdq xmm5, xmm6, 0x0
vpxor xmm2, xmm4
vpxor xmm3, xmm5
vmovdqu xmm4, [arg3]
vmovdqu xmm5, [arg3+16]
vmovdqu [arg2], xmm4
vmovdqu [arg2+16], xmm5
vpshufb xmm4, xmm7
vpshufb xmm5, xmm7
vpxor xmm0, xmm4
vpxor xmm1, xmm5
vmovdqu xmm4, [arg3+32]
vmovdqu xmm5, [arg3+48]
vmovdqu [arg2+32], xmm4
vmovdqu [arg2+48], xmm5
vpshufb xmm4, xmm7
vpshufb xmm5, xmm7
vpxor xmm2, xmm4
vpxor xmm3, xmm5
sub arg4, 64
; check if there is another 64B in the buffer to be able to fold
cmp arg4, (fetch_dist + 64)
jge _fold_and_prefetch_64_B_loop
%endif ; fetch_dist != 0
; fold 64B at a time. This section of the code folds 4 xmm
; registers in parallel
align 16
_fold_64_B_loop:
; update the buffer pointer
add arg3, 64 ; buf += 64;
add arg2, 64
vmovdqu xmm4, xmm0
vmovdqu xmm5, xmm1
vpclmulqdq xmm0, xmm6 , 0x11
vpclmulqdq xmm1, xmm6 , 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpclmulqdq xmm5, xmm6, 0x0
vpxor xmm0, xmm4
vpxor xmm1, xmm5
vmovdqu xmm4, xmm2
vmovdqu xmm5, xmm3
vpclmulqdq xmm2, xmm6, 0x11
vpclmulqdq xmm3, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpclmulqdq xmm5, xmm6, 0x0
vpxor xmm2, xmm4
vpxor xmm3, xmm5
vmovdqu xmm4, [arg3]
vmovdqu xmm5, [arg3+16]
vmovdqu [arg2], xmm4
vmovdqu [arg2+16], xmm5
vpshufb xmm4, xmm7
vpshufb xmm5, xmm7
vpxor xmm0, xmm4
vpxor xmm1, xmm5
vmovdqu xmm4, [arg3+32]
vmovdqu xmm5, [arg3+48]
vmovdqu [arg2+32], xmm4
vmovdqu [arg2+48], xmm5
vpshufb xmm4, xmm7
vpshufb xmm5, xmm7
vpxor xmm2, xmm4
vpxor xmm3, xmm5
sub arg4, 64
; check if there is another 64B in the buffer to be able to fold
jge _fold_64_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg3, 64
add arg2, 64
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer
; the 64B of folded data is in 4 of the xmm registers: xmm0, xmm1, xmm2, xmm3
; fold the 4 xmm registers to 1 xmm register with different constants
vmovdqa xmm6, [rk1] ;xmm6 has rk1 and rk2
;imm value of pclmulqdq instruction will
;determine which constant to use
vmovdqa xmm4, xmm0
vpclmulqdq xmm0, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpxor xmm1, xmm4
vpxor xmm1, xmm0
vmovdqa xmm4, xmm1
vpclmulqdq xmm1, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpxor xmm2, xmm4
vpxor xmm2, xmm1
vmovdqa xmm4, xmm2
vpclmulqdq xmm2, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpxor xmm3, xmm4
vpxor xmm3, xmm2
; instead of 64, we add 48 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg4, 64-16
jl _final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes
; is in register xmm3 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
_16B_reduction_loop:
vmovdqa xmm4, xmm3
vpclmulqdq xmm3, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpxor xmm3, xmm4
vmovdqu xmm0, [arg3]
vmovdqu [arg2], xmm0
vpshufb xmm0, xmm7
vpxor xmm3, xmm0
add arg3, 16
add arg2, 16
sub arg4, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg4, 16-16
; check if there is another 16B in the buffer to be able to fold
jge _16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm3 register
_final_reduction_for_128:
; check if any more data to fold. If not, compute the CRC of the final 128 bits
add arg4, 16
je _128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer,
; we can offset the input pointer before the actual point,
; to receive exactly 16 bytes.
; after that the registers need to be adjusted.
_get_last_two_xmms:
vmovdqa xmm2, xmm3
vmovdqu xmm1, [arg3 - 16 + arg4]
vmovdqu [arg2 - 16 + arg4], xmm1
vpshufb xmm1, xmm7
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table + 16]
sub rax, arg4
vmovdqu xmm0, [rax]
; shift xmm2 to the left by arg4 bytes
vpshufb xmm2, xmm0
; shift xmm3 to the right by 16-arg4 bytes
vpxor xmm0, [mask1]
vpshufb xmm3, xmm0
vpblendvb xmm1, xmm1, xmm2, xmm0
; fold 16 Bytes
vmovdqa xmm2, xmm1
vmovdqa xmm4, xmm3
vpclmulqdq xmm3, xmm6, 0x11
vpclmulqdq xmm4, xmm6, 0x0
vpxor xmm3, xmm4
vpxor xmm3, xmm2
_128_done:
; compute crc of a 128-bit value
vmovdqa xmm6, [rk5] ; rk5 and rk6 in xmm6
vmovdqa xmm0, xmm3
;64b fold
vpclmulqdq xmm3, xmm6, 0x1
vpslldq xmm0, 8
vpxor xmm3, xmm0
;32b fold
vmovdqa xmm0, xmm3
vpand xmm0, [mask2]
vpsrldq xmm3, 12
vpclmulqdq xmm3, xmm6, 0x10
vpxor xmm3, xmm0
;barrett reduction
_barrett:
vmovdqa xmm6, [rk7] ; rk7 and rk8 in xmm6
vmovdqa xmm0, xmm3
vpclmulqdq xmm3, xmm6, 0x01
vpslldq xmm3, 4
vpclmulqdq xmm3, xmm6, 0x11
vpslldq xmm3, 4
vpxor xmm3, xmm0
vpextrd eax, xmm3,1
_cleanup:
; scale the result back to 16 bits
shr eax, 16
vmovdqa xmm6, [rsp+16*2]
vmovdqa xmm7, [rsp+16*3]
add rsp,16*4+8
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
_less_than_128:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg4, 32
jl _less_than_32
vmovdqa xmm7, [SHUF_MASK]
; if there is, load the constants
vmovdqa xmm6, [rk1] ; rk1 and rk2 in xmm6
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
vmovdqu xmm3, [arg3] ; load the plaintext
vmovdqu [arg2], xmm3 ; store copy
vpshufb xmm3, xmm7 ; byte-reflect the plaintext
vpxor xmm3, xmm0
; update the buffer pointer
add arg3, 16
add arg2, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub arg4, 32
jmp _16B_reduction_loop
align 16
_less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg4, arg4
je _cleanup
vmovdqa xmm7, [SHUF_MASK]
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
cmp arg4, 16
je _exact_16_left
jl _less_than_16_left
vmovdqu xmm3, [arg3] ; load the plaintext
vmovdqu [arg2], xmm3 ; store the copy
vpshufb xmm3, xmm7 ; byte-reflect the plaintext
vpxor xmm3, xmm0 ; xor the initial crc value
add arg3, 16
add arg2, 16
sub arg4, 16
vmovdqa xmm6, [rk1] ; rk1 and rk2 in xmm6
jmp _get_last_two_xmms
align 16
_less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
vpxor xmm1, xmm1
mov r11, rsp
vmovdqa [r11], xmm1
cmp arg4, 4
jl _only_less_than_4
; backup the counter value
mov tmp1, arg4
cmp arg4, 8
jl _less_than_8_left
; load 8 Bytes
mov rax, [arg3]
mov [arg2], rax
mov [r11], rax
add r11, 8
sub arg4, 8
add arg3, 8
add arg2, 8
_less_than_8_left:
cmp arg4, 4
jl _less_than_4_left
; load 4 Bytes
mov eax, [arg3]
mov [arg2], eax
mov [r11], eax
add r11, 4
sub arg4, 4
add arg3, 4
add arg2, 4
_less_than_4_left:
cmp arg4, 2
jl _less_than_2_left
; load 2 Bytes
mov ax, [arg3]
mov [arg2], ax
mov [r11], ax
add r11, 2
sub arg4, 2
add arg3, 2
add arg2, 2
_less_than_2_left:
cmp arg4, 1
jl _zero_left
; load 1 Byte
mov al, [arg3]
mov [arg2], al
mov [r11], al
_zero_left:
vmovdqa xmm3, [rsp]
vpshufb xmm3, xmm7
vpxor xmm3, xmm0 ; xor the initial crc value
; shl tmp1, 4
lea rax, [pshufb_shf_table + 16]
sub rax, tmp1
vmovdqu xmm0, [rax]
vpxor xmm0, [mask1]
vpshufb xmm3, xmm0
jmp _128_done
align 16
_exact_16_left:
vmovdqu xmm3, [arg3]
vmovdqu [arg2], xmm3
vpshufb xmm3, xmm7
vpxor xmm3, xmm0 ; xor the initial crc value
jmp _128_done
_only_less_than_4:
cmp arg4, 3
jl _only_less_than_3
; load 3 Bytes
mov al, [arg3]
mov [arg2], al
mov [r11], al
mov al, [arg3+1]
mov [arg2+1], al
mov [r11+1], al
mov al, [arg3+2]
mov [arg2+2], al
mov [r11+2], al
vmovdqa xmm3, [rsp]
vpshufb xmm3, xmm7
vpxor xmm3, xmm0 ; xor the initial crc value
vpsrldq xmm3, 5
jmp _barrett
_only_less_than_3:
cmp arg4, 2
jl _only_less_than_2
; load 2 Bytes
mov al, [arg3]
mov [arg2], al
mov [r11], al
mov al, [arg3+1]
mov [arg2+1], al
mov [r11+1], al
vmovdqa xmm3, [rsp]
vpshufb xmm3, xmm7
vpxor xmm3, xmm0 ; xor the initial crc value
vpsrldq xmm3, 6
jmp _barrett
_only_less_than_2:
; load 1 Byte
mov al, [arg3]
mov [arg2],al
mov [r11], al
vmovdqa xmm3, [rsp]
vpshufb xmm3, xmm7
vpxor xmm3, xmm0 ; xor the initial crc value
vpsrldq xmm3, 7
jmp _barrett
section .data
; precomputed constants
; these constants are precomputed from the poly: 0x8bb70000 (0x8bb7 scaled to 32 bits)
align 16
; Q = 0x18BB70000
; rk1 = 2^(32*3) mod Q << 32
; rk2 = 2^(32*5) mod Q << 32
; rk3 = 2^(32*15) mod Q << 32
; rk4 = 2^(32*17) mod Q << 32
; rk5 = 2^(32*3) mod Q << 32
; rk6 = 2^(32*2) mod Q << 32
; rk7 = floor(2^64/Q)
; rk8 = Q
rk1:
DQ 0x2d56000000000000
rk2:
DQ 0x06df000000000000
rk3:
DQ 0x044c000000000000
rk4:
DQ 0xe658000000000000
rk5:
DQ 0x2d56000000000000
rk6:
DQ 0x1368000000000000
rk7:
DQ 0x00000001f65a57f8
rk8:
DQ 0x000000018bb70000
mask1:
dq 0x8080808080808080, 0x8080808080808080
mask2:
dq 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF
SHUF_MASK:
dq 0x08090A0B0C0D0E0F, 0x0001020304050607
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
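
Note on the precomputed constants: the comment block in the `.data` section gives each fold constant as a power of x reduced modulo Q = 0x18BB70000 and then shifted left by 32. The following is a minimal C sketch of that arithmetic, not the generator used by the project; the helper names `xpow_mod_q`, `rk_from_exponent` and `check_rk_constants` are hypothetical and exist only for illustration. It recomputes rk1 (also rk5) and rk6 bit by bit and compares them with the table values.

```c
#include <assert.h>
#include <stdint.h>

/* Q from the comment block: 0x8bb7 scaled to 32 bits, plus the x^32 term. */
#define T10DIF_Q 0x18BB70000ULL

/* Hypothetical helper: x^n mod Q over GF(2), one shift per power of x. */
static uint64_t
xpow_mod_q(unsigned n)
{
        uint64_t r = 1; /* x^0 */
        unsigned i;

        for (i = 0; i < n; i++) {
                r <<= 1;               /* multiply by x */
                if (r & (1ULL << 32))  /* degree reached 32: reduce by Q */
                        r ^= T10DIF_Q;
        }
        return r;
}

/* Hypothetical helper: constant in the layout of the rk table (value << 32). */
uint64_t
rk_from_exponent(unsigned n)
{
        return xpow_mod_q(n) << 32;
}

/* Compare two recomputed constants against the values listed in the table. */
void
check_rk_constants(void)
{
        assert(rk_from_exponent(32 * 3) == 0x2d56000000000000ULL); /* rk1, rk5 */
        assert(rk_from_exponent(32 * 2) == 0x1368000000000000ULL); /* rk6 */
}
```

Because Q is the 16-bit polynomial shifted up by 16, every remainder has its low 16 bits clear, which is why only the top 16 bits of each table entry are non-zero and why `_cleanup` shifts the result right by 16.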


@ -1,8 +1,8 @@
/**********************************************************************
Copyright(c) 2011-2015 Intel Corporation All rights reserved.
Copyright(c) 2011-2017 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
@ -29,69 +29,62 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memset
#include "erasure_code.h"
#include <string.h>
#include <stdint.h>
#include "crc.h"
#include "test.h"
//#define CACHED_TEST
#ifdef CACHED_TEST
// Cached test, loop many times over small dataset
# define TEST_LEN 8*1024
# define TEST_LOOPS 4000000
# define TEST_TYPE_STR "_warm"
#else
# ifndef TEST_CUSTOM
// Uncached test. Pull from large mem base.
# define TEST_SOURCES 10
# define GT_L3_CACHE 32*1024*1024 /* some number > last level cache */
# define TEST_LEN GT_L3_CACHE / 2
# define TEST_LOOPS 1000
# define TEST_TYPE_STR "_cold"
# else
# define TEST_TYPE_STR "_cus"
# ifndef TEST_LOOPS
# define TEST_LOOPS 1000
# endif
# endif
#ifndef GT_L3_CACHE
#define GT_L3_CACHE 32 * 1024 * 1024 /* some number > last level cache */
#endif
#define TEST_MEM (2 * TEST_LEN)
#if !defined(COLD_TEST) && !defined(TEST_CUSTOM)
// Cached test, loop many times over small dataset
#define TEST_LEN 8 * 1024
#define TEST_TYPE_STR "_warm"
#elif defined(COLD_TEST)
// Uncached test. Pull from large mem base.
#define TEST_LEN (2 * GT_L3_CACHE)
#define TEST_TYPE_STR "_cold"
#endif
typedef unsigned char u8;
#ifndef TEST_SEED
#define TEST_SEED 0x1234
#endif
int main(int argc, char *argv[])
#define TEST_MEM TEST_LEN
int
main(int argc, char *argv[])
{
int i;
u8 *buff1, *buff2, gf_const_tbl[64], a = 2;
struct perf start, stop;
void *src, *dst;
uint16_t crc;
struct perf start;
printf("gf_vect_mul_sse_perf:\n");
printf("crc16_t10dif_copy_perf:\n");
gf_vect_mul_init(a, gf_const_tbl);
if (posix_memalign(&src, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
if (posix_memalign(&dst, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
// Allocate large mem region
buff1 = (u8 *) malloc(TEST_LEN);
buff2 = (u8 *) malloc(TEST_LEN);
if (NULL == buff1 || NULL == buff2) {
printf("Failed to allocate %dB\n", TEST_LEN);
return 1;
}
printf("Start timed tests\n");
fflush(0);
memset(buff1, 0, TEST_LEN);
memset(buff2, 0, TEST_LEN);
memset(src, 0, TEST_LEN);
BENCHMARK(&start, BENCHMARK_TIME, crc = crc16_t10dif_copy(TEST_SEED, dst, src, TEST_LEN));
printf("crc16_t10dif_copy" TEST_TYPE_STR ": ");
perf_print(start, (long long) TEST_LEN);
printf("Start timed tests\n");
fflush(0);
printf("finish 0x%x\n", crc);
gf_vect_mul_sse(TEST_LEN, gf_const_tbl, buff1, buff2);
perf_start(&start);
for (i = 0; i < TEST_LOOPS; i++) {
gf_vect_mul_init(a, gf_const_tbl); // in a re-build would only calc once
gf_vect_mul_sse(TEST_LEN, gf_const_tbl, buff1, buff2);
}
perf_stop(&stop);
printf("gf_vect_mul_sse" TEST_TYPE_STR ": ");
perf_print(stop, start, (long long)TEST_LEN * i);
// Free allocated memory
aligned_free(src);
aligned_free(dst);
return 0;
return 0;
}


@ -0,0 +1,196 @@
/**********************************************************************
Copyright(c) 2011-2017 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>
#include "crc.h"
#include "crc_ref.h"
#include "test.h"
#ifndef RANDOMS
#define RANDOMS 20
#endif
#ifndef TEST_SEED
#define TEST_SEED 0x1234
#endif
#define MAX_BUF 2345
#define TEST_SIZE 217
#define TEST_LEN (8 * 1024)
typedef uint16_t u16;
typedef uint8_t u8;
// bitwise crc version
uint16_t
crc16_t10dif_copy_ref(uint16_t seed, uint8_t *dst, uint8_t *src, uint64_t len);
void
rand_buffer(unsigned char *buf, long buffer_size)
{
long i;
for (i = 0; i < buffer_size; i++)
buf[i] = rand();
}
int
memtst(unsigned char *buf, unsigned char c, int len)
{
int i;
for (i = 0; i < len; i++)
if (*buf++ != c)
return 1;
return 0;
}
int
crc_copy_check(const char *description, u8 *dst, u8 *src, u8 dst_fill_val, int len, int tot)
{
u16 seed;
int rem;
assert(tot >= len);
seed = rand();
rem = tot - len;
memset(dst, dst_fill_val, tot);
// multi-binary crc version
u16 crc_dut = crc16_t10dif_copy(seed, dst, src, len);
u16 crc_ref = crc16_t10dif(seed, src, len);
if (crc_dut != crc_ref) {
printf("%s, crc gen fail: 0x%4x 0x%4x len=%d\n", description, crc_dut, crc_ref,
len);
return 1;
} else if (memcmp(dst, src, len)) {
printf("%s, copy fail: len=%d\n", description, len);
return 1;
} else if (memtst(&dst[len], dst_fill_val, rem)) {
printf("%s, writeover fail: len=%d\n", description, len);
return 1;
}
// bitwise crc version
crc_dut = crc16_t10dif_copy_ref(seed, dst, src, len);
crc_ref = crc16_t10dif_ref(seed, src, len);
if (crc_dut != crc_ref) {
printf("%s, crc gen fail (table-driven): 0x%4x 0x%4x len=%d\n", description,
crc_dut, crc_ref, len);
return 1;
} else if (memcmp(dst, src, len)) {
printf("%s, copy fail (table driven): len=%d\n", description, len);
return 1;
} else if (memtst(&dst[len], dst_fill_val, rem)) {
printf("%s, writeover fail (table driven): len=%d\n", description, len);
return 1;
}
return 0;
}
int
main(int argc, char *argv[])
{
int r = 0;
int i;
int len, tot;
u8 *src_raw = NULL, *dst_raw = NULL;
u8 *src, *dst;
printf("Test crc16_t10dif_copy_test:\n");
src_raw = (u8 *) malloc(TEST_LEN);
if (NULL == src_raw) {
printf("alloc error: Fail");
return -1;
}
dst_raw = (u8 *) malloc(TEST_LEN);
if (NULL == dst_raw) {
printf("alloc error: Fail");
return -1;
}
src = src_raw;
dst = dst_raw;
srand(TEST_SEED);
// Test of all zeros
memset(src, 0, TEST_LEN);
r |= crc_copy_check("zero tst", dst, src, 0x5e, MAX_BUF, TEST_LEN);
// Another simple test pattern
memset(src, 0xff, TEST_LEN);
r |= crc_copy_check("simp tst", dst, src, 0x5e, MAX_BUF, TEST_LEN);
// Do a few short len random data tests
rand_buffer(src, TEST_LEN);
rand_buffer(dst, TEST_LEN);
for (i = 0; i < MAX_BUF; i++) {
r |= crc_copy_check("short len", dst, src, rand(), i, MAX_BUF);
}
#ifdef TEST_VERBOSE
printf(".");
#endif
// Do a few longer tests, random data
for (i = TEST_LEN; i >= (TEST_LEN - TEST_SIZE); i--) {
r |= crc_copy_check("long len", dst, src, rand(), i, TEST_LEN);
}
#ifdef TEST_VERBOSE
printf(".");
#endif
// Do random size, random data
for (i = 0; i < RANDOMS; i++) {
len = rand() % TEST_LEN;
r |= crc_copy_check("rand len", dst, src, rand(), len, TEST_LEN);
}
#ifdef TEST_VERBOSE
printf(".");
#endif
// Run tests at end of buffer
for (i = 0; i < RANDOMS; i++) {
len = rand() % TEST_LEN;
src = &src_raw[TEST_LEN - len - 1];
dst = &dst_raw[TEST_LEN - len - 1];
tot = len;
r |= crc_copy_check("end of buffer", dst, src, rand(), len, tot);
}
#ifdef TEST_VERBOSE
printf(".");
#endif
printf("Test done: %s\n", r ? "Fail" : "Pass");
free(src_raw);
free(dst_raw);
return r;
}
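
Note on the reference used by this test: `crc16_t10dif_copy_ref` is only declared here, and its body is not part of this hunk. The following is a minimal sketch of what a bitwise copy-and-CRC routine with the same argument order could look like, assuming the T10-DIF polynomial 0x8bb7 with no bit reflection and no final XOR. The name `crc16_t10dif_copy_sketch` is hypothetical and this is not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* T10-DIF generator polynomial (x^16 term implicit). */
#define T10DIF_POLY 0x8bb7u

/*
 * Hypothetical sketch: copy len bytes from src to dst and return the
 * CRC16 T10-DIF of the copied data, starting from seed.
 * A zero-length call returns the seed unchanged.
 */
uint16_t
crc16_t10dif_copy_sketch(uint16_t seed, uint8_t *dst, const uint8_t *src, uint64_t len)
{
        uint16_t rem = seed;
        uint64_t i;
        int j;

        assert(len == 0 || (dst != 0 && src != 0));

        for (i = 0; i < len; i++) {
                dst[i] = src[i];                        /* copy in the same pass */
                rem ^= (uint16_t) (src[i] << 8);        /* feed byte, MSB first */
                for (j = 0; j < 8; j++) {
                        if (rem & 0x8000u)
                                rem = (uint16_t) ((rem << 1) ^ T10DIF_POLY);
                        else
                                rem = (uint16_t) (rem << 1);
                }
        }
        return rem;
}
```

With a seed of 0 and the ASCII input "123456789", this sketch returns 0xd0db, the published check value for CRC-16/T10-DIF, which makes it a convenient sanity check before comparing against an optimized version.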

123
crc/crc16_t10dif_op_perf.c Normal file

@ -0,0 +1,123 @@
/**********************************************************************
Copyright(c) 2011-2017 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "crc.h"
#include "test.h"
#define BLKSIZE (512)
#ifndef GT_L3_CACHE
#define GT_L3_CACHE 32 * 1024 * 1024 /* some number > last level cache */
#endif
#if !defined(COLD_TEST) && !defined(TEST_CUSTOM)
// Cached test, loop many times over small dataset
#define NBLOCKS 100
#define TEST_TYPE_STR "_warm"
#elif defined(COLD_TEST)
// Uncached test. Pull from large mem base.
#define TEST_LEN (2 * GT_L3_CACHE)
#define NBLOCKS (TEST_LEN / BLKSIZE)
#define TEST_TYPE_STR "_cold"
#endif
#ifndef TEST_SEED
#define TEST_SEED 0x1234
#endif
struct blk {
uint8_t data[BLKSIZE];
};
struct blk_ext {
uint8_t data[BLKSIZE];
uint32_t tag;
uint16_t meta;
uint16_t crc;
};
static void
crc16_t10dif_copy_perf(struct blk *blks, struct blk *blkp, struct blk_ext *blks_ext,
struct blk_ext *blkp_ext, uint16_t *crc)
{
int i;
for (i = 0, blkp = blks, blkp_ext = blks_ext; i < NBLOCKS; i++) {
*crc = crc16_t10dif_copy(TEST_SEED, blkp_ext->data, blkp->data, sizeof(blks->data));
blkp_ext->crc = *crc;
blkp++;
blkp_ext++;
}
}
int
main(int argc, char *argv[])
{
uint16_t crc;
struct blk *blks = NULL, *blkp = NULL;
struct blk_ext *blks_ext = NULL, *blkp_ext = NULL;
struct perf start;
printf("crc16_t10dif_streaming_insert_perf:\n");
if (posix_memalign((void *) &blks, 1024, NBLOCKS * sizeof(*blks))) {
printf("alloc error: Fail");
return -1;
}
if (posix_memalign((void *) &blks_ext, 1024, NBLOCKS * sizeof(*blks_ext))) {
printf("alloc error: Fail");
return -1;
}
printf(" size blk: %zu, blk_ext: %zu, blk data: %zu, stream: %zu\n", sizeof(*blks),
sizeof(*blks_ext), sizeof(blks->data), NBLOCKS * sizeof(blks->data));
memset(blks, 0xe5, NBLOCKS * sizeof(*blks));
memset(blks_ext, 0xe5, NBLOCKS * sizeof(*blks_ext));
printf("Start timed tests\n");
fflush(0);
// Copy and insert test
BENCHMARK(&start, BENCHMARK_TIME,
crc16_t10dif_copy_perf(blks, blkp, blks_ext, blkp_ext, &crc));
printf("crc16_t10pi_op_copy_insert" TEST_TYPE_STR ": ");
perf_print(start, (long long) sizeof(blks->data) * NBLOCKS);
printf("finish 0x%x\n", crc);
// Free allocated memory
aligned_free(blks);
aligned_free(blks_ext);
return 0;
}


@ -31,57 +31,53 @@
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/time.h>
#include "crc.h"
#include "test.h"
//#define CACHED_TEST
#ifdef CACHED_TEST
#ifndef GT_L3_CACHE
#define GT_L3_CACHE 32 * 1024 * 1024 /* some number > last level cache */
#endif
#if !defined(COLD_TEST) && !defined(TEST_CUSTOM)
// Cached test, loop many times over small dataset
# define TEST_LEN 8*1024
# define TEST_LOOPS 400000
# define TEST_TYPE_STR "_warm"
#else
#define TEST_LEN 8 * 1024
#define TEST_TYPE_STR "_warm"
#elif defined(COLD_TEST)
// Uncached test. Pull from large mem base.
# define GT_L3_CACHE 32*1024*1024 /* some number > last level cache */
# define TEST_LEN (2 * GT_L3_CACHE)
# define TEST_LOOPS 100
# define TEST_TYPE_STR "_cold"
#define TEST_LEN (2 * GT_L3_CACHE)
#define TEST_TYPE_STR "_cold"
#endif
#ifndef TEST_SEED
# define TEST_SEED 0x1234
#define TEST_SEED 0x1234
#endif
#define TEST_MEM TEST_LEN
int main(int argc, char *argv[])
int
main(int argc, char *argv[])
{
int i;
void *buf;
uint16_t crc;
struct perf start, stop;
void *buf;
uint16_t crc;
struct perf start;
printf("crc16_t10dif_perf:\n");
printf("crc16_t10dif_perf:\n");
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
printf("Start timed tests\n");
fflush(0);
printf("Start timed tests\n");
fflush(0);
memset(buf, 0, TEST_LEN);
crc = crc16_t10dif(TEST_SEED, buf, TEST_LEN);
perf_start(&start);
for (i = 0; i < TEST_LOOPS; i++) {
crc = crc16_t10dif(TEST_SEED, buf, TEST_LEN);
}
perf_stop(&stop);
printf("crc16_t10dif" TEST_TYPE_STR ": ");
perf_print(stop, start, (long long)TEST_LEN * i);
memset(buf, 0, TEST_LEN);
BENCHMARK(&start, BENCHMARK_TIME, crc = crc16_t10dif(TEST_SEED, buf, TEST_LEN));
printf("crc16_t10dif" TEST_TYPE_STR ": ");
perf_print(start, (long long) TEST_LEN);
printf("finish 0x%x\n", crc);
return 0;
printf("finish 0x%x\n", crc);
aligned_free(buf);
return 0;
}


@ -32,136 +32,168 @@
#include <stdint.h>
#include <stdlib.h>
#include "crc.h"
#include "types.h"
#include "crc_ref.h"
#include "test.h"
#ifndef TEST_SEED
# define TEST_SEED 0x1234
#define TEST_SEED 0x1234
#endif
#define MAX_BUF 512
#define TEST_SIZE 20
#define MAX_BUF 4096
#define TEST_SIZE 20
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t u8;
void rand_buffer(unsigned char *buf, long buffer_size)
uint16_t
crc16_t10dif_ref(uint16_t seed, uint8_t *buf, uint64_t len);
void
rand_buffer(unsigned char *buf, long buffer_size)
{
long i;
for (i = 0; i < buffer_size; i++)
buf[i] = rand();
long i;
for (i = 0; i < buffer_size; i++)
buf[i] = rand();
}
int main(int argc, char *argv[])
int
main(int argc, char *argv[])
{
int fail = 0;
u32 r = 0;
int verbose = argc - 1;
int i, s;
void *buf_raw;
unsigned char *buf;
int fail = 0;
u32 r = 0;
int i, s;
void *buf_raw = NULL;
unsigned char *buf;
printf("Test crc16_t10dif_test ");
if (posix_memalign(&buf_raw, MAX_BUF, MAX_BUF * TEST_SIZE)) {
printf("alloc error: Fail");
return -1;
}
buf = (unsigned char *)buf_raw;
printf("Test crc16_t10dif_test ");
if (posix_memalign(&buf_raw, 32, MAX_BUF * TEST_SIZE)) {
printf("alloc error: Fail");
return -1;
}
buf = (unsigned char *) buf_raw;
srand(TEST_SEED);
srand(TEST_SEED);
// Test of all zeros
memset(buf, 0, MAX_BUF * 10);
u16 crc = crc16_t10dif(TEST_SEED, buf, MAX_BUF);
u16 crc_ref = crc16_t10dif_base(TEST_SEED, buf, MAX_BUF);
if (crc != crc_ref) {
fail++;
printf("\n opt ref\n");
printf(" ------ ------\n");
printf("crc zero = 0x%4x 0x%4x \n", crc, crc_ref);
} else
printf(".");
// Test of all zeros
memset(buf, 0, MAX_BUF * 10);
u16 crc_ref = crc16_t10dif_ref(TEST_SEED, buf, MAX_BUF);
u16 crc_base = crc16_t10dif_base(TEST_SEED, buf, MAX_BUF);
u16 crc = crc16_t10dif(TEST_SEED, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("\n opt ref\n");
printf(" ------ ------\n");
printf("fail crc zero = 0x%4x 0x%4x 0x%4x \n", crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else
printf(".");
#endif
// Another simple test pattern
memset(buf, 0x8a, MAX_BUF);
crc = crc16_t10dif(TEST_SEED, buf, MAX_BUF);
crc_ref = crc16_t10dif_base(TEST_SEED, buf, MAX_BUF);
if (crc != crc_ref) {
fail++;
printf("crc all 8a = 0x%4x 0x%4x\n", crc, crc_ref);
} else
printf(".");
// Another simple test pattern
memset(buf, 0x8a, MAX_BUF);
crc_ref = crc16_t10dif_ref(TEST_SEED, buf, MAX_BUF);
crc_base = crc16_t10dif_base(TEST_SEED, buf, MAX_BUF);
crc = crc16_t10dif(TEST_SEED, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc all 8a = 0x%4x 0x%4x 0x%4x\n", crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else
printf(".");
#endif
// Do a few random tests
// Do a few random tests
rand_buffer(buf, MAX_BUF * TEST_SIZE);
rand_buffer(buf, MAX_BUF * TEST_SIZE);
for (i = 0; i < TEST_SIZE; i++) {
crc = crc16_t10dif(TEST_SEED, buf, MAX_BUF);
crc_ref = crc16_t10dif_base(TEST_SEED, buf, MAX_BUF);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc rand%3d = 0x%4x 0x%4x\n", i, crc, crc_ref);
else
printf(".");
buf += MAX_BUF;
}
for (i = 0; i < TEST_SIZE; i++) {
crc_ref = crc16_t10dif_ref(TEST_SEED, buf, MAX_BUF);
crc_base = crc16_t10dif_base(TEST_SEED, buf, MAX_BUF);
crc = crc16_t10dif(TEST_SEED, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc rand%3d = 0x%4x 0x%4x 0x%4x\n", i, crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else if (i % (TEST_SIZE / 8) == 0)
printf(".");
#endif
buf += MAX_BUF;
}
// Do a few random sizes
buf = (unsigned char *)buf_raw; //reset buf
r = rand();
// Do a few random sizes
buf = (unsigned char *) buf_raw; // reset buf
r = rand();
for (i = MAX_BUF; i >= 0; i--) {
crc = crc16_t10dif(r, buf, i);
crc_ref = crc16_t10dif_base(r, buf, i);
if (crc != crc_ref) {
fail++;
printf("fail random size%i 0x%8x 0x%8x\n", i, crc, crc_ref);
} else
printf(".");
}
for (i = MAX_BUF; i >= 0; i--) {
crc_ref = crc16_t10dif_ref(r, buf, i);
crc_base = crc16_t10dif_base(r, buf, i);
crc = crc16_t10dif(r, buf, i);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail random size%i 0x%8x 0x%8x 0x%8x\n", i, crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else if (i % (MAX_BUF / 8) == 0)
printf(".");
#endif
}
// Try different seeds
for (s = 0; s < 20; s++) {
buf = (unsigned char *)buf_raw; //reset buf
// Try different seeds
for (s = 0; s < 20; s++) {
buf = (unsigned char *) buf_raw; // reset buf
r = rand(); // just to get a new seed
rand_buffer(buf, MAX_BUF * TEST_SIZE); // new pseudo-rand data
r = rand(); // just to get a new seed
rand_buffer(buf, MAX_BUF * TEST_SIZE); // new pseudo-rand data
if (verbose)
printf("seed = 0x%x\n", r);
#ifdef TEST_VERBOSE
printf("seed = 0x%x\n", r);
#endif
for (i = 0; i < TEST_SIZE; i++) {
crc = crc16_t10dif(r, buf, MAX_BUF);
crc_ref = crc16_t10dif_base(r, buf, MAX_BUF);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc rand%3d = 0x%4x 0x%4x\n", i, crc, crc_ref);
else
printf(".");
buf += MAX_BUF;
}
}
for (i = 0; i < TEST_SIZE; i++) {
crc_ref = crc16_t10dif_ref(r, buf, MAX_BUF);
crc_base = crc16_t10dif_base(r, buf, MAX_BUF);
crc = crc16_t10dif(r, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc rand%3d = 0x%4x 0x%4x 0x%4x\n", i, crc_ref,
crc_base, crc);
}
#ifdef TEST_VERBOSE
else if (i % (TEST_SIZE * 20 / 8) == 0)
printf(".");
#endif
buf += MAX_BUF;
}
}
// Run tests at end of buffer
buf = (unsigned char *)buf_raw; //reset buf
buf = buf + ((MAX_BUF - 1) * TEST_SIZE); //Line up TEST_SIZE from end
for (i = 0; i < TEST_SIZE; i++) {
crc = crc16_t10dif(TEST_SEED, buf + i, TEST_SIZE - i);
crc_ref = crc16_t10dif_base(TEST_SEED, buf + i, TEST_SIZE - i);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc eob rand%3d = 0x%4x 0x%4x\n", i, crc, crc_ref);
else
printf(".");
}
// Run tests at end of buffer
buf = (unsigned char *) buf_raw; // reset buf
buf = buf + ((MAX_BUF - 1) * TEST_SIZE); // Line up TEST_SIZE from end
for (i = 0; i < TEST_SIZE; i++) {
crc_ref = crc16_t10dif_ref(TEST_SEED, buf + i, TEST_SIZE - i);
crc_base = crc16_t10dif_base(TEST_SEED, buf + i, TEST_SIZE - i);
crc = crc16_t10dif(TEST_SEED, buf + i, TEST_SIZE - i);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc eob rand%3d = 0x%4x 0x%4x 0x%4x\n", i, crc_ref, crc_base,
crc);
}
#ifdef TEST_VERBOSE
else
printf(".");
#endif
}
printf("Test done: %s\n", fail ? "Fail" : "Pass");
if (fail)
printf("\nFailed %d tests\n", fail);
printf("Test done: %s\n", fail ? "Fail" : "Pass");
if (fail)
printf("\nFailed %d tests\n", fail);
return fail;
if (buf)
aligned_free(buf_raw);
return fail;
}

352
crc/crc32_funcs_test.c Normal file

@ -0,0 +1,352 @@
/**********************************************************************
Copyright(c) 2011-2018 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "crc.h"
#include "crc_ref.h"
#include "test.h"
#ifndef TEST_SEED
#define TEST_SEED 0x1234
#endif
#define MAX_BUF 4096
#define TEST_SIZE 32
typedef uint32_t (*crc32_func_t)(uint32_t, const uint8_t *, uint64_t);
typedef uint32_t (*crc32_func_t_base)(uint32_t, uint8_t *, uint64_t);
typedef uint32_t (*crc32_func_t_ref)(uint32_t, uint8_t *, uint64_t);
typedef struct func_case {
char *note;
crc32_func_t crc32_func_call;
crc32_func_t_base crc32_base_call;
crc32_func_t_ref crc32_ref_call;
} func_case_t;
uint32_t
crc32_iscsi_wrap(uint32_t seed, const uint8_t *buf, uint64_t len)
{
return crc32_iscsi((uint8_t *) buf, len, seed);
}
uint32_t
crc32_iscsi_base_wrap(uint32_t seed, uint8_t *buf, uint64_t len)
{
return crc32_iscsi_base(buf, len, seed);
}
uint32_t
crc32_iscsi_ref_wrap(uint32_t seed, uint8_t *buf, uint64_t len)
{
return crc32_iscsi_ref(buf, len, seed);
}
func_case_t test_funcs[] = {
{ "crc32_ieee", crc32_ieee, crc32_ieee_base, crc32_ieee_ref },
{ "crc32_gzip_refl", crc32_gzip_refl, crc32_gzip_refl_base, crc32_gzip_refl_ref },
{ "crc32_iscsi", crc32_iscsi_wrap, crc32_iscsi_base_wrap, crc32_iscsi_ref_wrap }
};
// Generates pseudo-random data
void
rand_buffer(unsigned char *buf, long buffer_size)
{
long i;
for (i = 0; i < buffer_size; i++)
buf[i] = rand();
}
// Test cases
int
zeros_test(func_case_t *test_func);
int
simple_pattern_test(func_case_t *test_func);
int
seeds_sizes_test(func_case_t *test_func);
int
eob_test(func_case_t *test_func);
int
update_test(func_case_t *test_func);
void *buf_alloc = NULL;
int
main(int argc, char *argv[])
{
int fail = 0, fail_case;
int i, ret;
func_case_t *test_func;
// Align to TEST_SIZE boundary
ret = posix_memalign(&buf_alloc, TEST_SIZE, MAX_BUF * TEST_SIZE);
if (ret) {
printf("alloc error: Fail\n");
return -1;
}
srand(TEST_SEED);
printf("CRC32 Tests\n");
for (i = 0; i < sizeof(test_funcs) / sizeof(test_funcs[0]); i++) {
fail_case = 0;
test_func = &test_funcs[i];
printf("Test %s\t", test_func->note);
fail_case += zeros_test(test_func);
fail_case += simple_pattern_test(test_func);
fail_case += seeds_sizes_test(test_func);
fail_case += eob_test(test_func);
fail_case += update_test(test_func);
printf(" done: %s\n", fail_case ? "Fail" : "Pass");
if (fail_case) {
printf("\n%s Failed %d tests\n", test_func->note, fail_case);
fail++;
}
}
printf("CRC32 Tests all done: %s\n", fail ? "Fail" : "Pass");
aligned_free(buf_alloc);
return fail;
}
// Test of all zeros
int
zeros_test(func_case_t *test_func)
{
uint32_t crc_ref, crc_base, crc;
int fail = 0;
unsigned char *buf = NULL;
buf = (unsigned char *) buf_alloc;
memset(buf, 0, MAX_BUF * 10);
crc_ref = test_func->crc32_ref_call(TEST_SEED, buf, MAX_BUF * 10);
crc_base = test_func->crc32_base_call(TEST_SEED, buf, MAX_BUF * 10);
crc = test_func->crc32_func_call(TEST_SEED, buf, MAX_BUF * 10);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("\n                ref        base       opt\n");
printf("                ---------- ---------- ----------\n");
printf("fail crc zero = 0x%8x 0x%8x 0x%8x\n", crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else
printf(".");
#endif
return fail;
}
// Another simple test pattern
int
simple_pattern_test(func_case_t *test_func)
{
uint32_t crc_ref, crc_base, crc;
int fail = 0;
unsigned char *buf = NULL;
buf = (unsigned char *) buf_alloc;
memset(buf, 0x8a, MAX_BUF);
crc_ref = test_func->crc32_ref_call(TEST_SEED, buf, MAX_BUF);
crc_base = test_func->crc32_base_call(TEST_SEED, buf, MAX_BUF);
crc = test_func->crc32_func_call(TEST_SEED, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc all 8a = 0x%8x 0x%8x 0x%8x\n", crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else
printf(".");
#endif
return fail;
}
int
seeds_sizes_test(func_case_t *test_func)
{
uint32_t crc_ref, crc_base, crc;
int fail = 0;
int i;
uint64_t r, s;
unsigned char *buf = NULL;
// Do a few random tests
buf = (unsigned char *) buf_alloc; // reset buf
r = rand();
rand_buffer(buf, MAX_BUF * TEST_SIZE);
for (i = 0; i < TEST_SIZE; i++) {
crc_ref = test_func->crc32_ref_call(r, buf, MAX_BUF);
crc_base = test_func->crc32_base_call(r, buf, MAX_BUF);
crc = test_func->crc32_func_call(r, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc rand%3d = 0x%8x 0x%8x 0x%8x\n", i, crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else if (i % (TEST_SIZE / 8) == 0)
printf(".");
#endif
buf += MAX_BUF;
}
// Test every size from MAX_BUF down to 0
buf = (unsigned char *) buf_alloc; // reset buf
r = rand();
for (i = MAX_BUF; i >= 0; i--) {
crc_ref = test_func->crc32_ref_call(r, buf, i);
crc_base = test_func->crc32_base_call(r, buf, i);
crc = test_func->crc32_func_call(r, buf, i);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail random size%i 0x%8x 0x%8x 0x%8x\n", i, crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else if (i % (MAX_BUF / 8) == 0)
printf(".");
#endif
}
// Try different seeds
for (s = 0; s < 20; s++) {
buf = (unsigned char *) buf_alloc; // reset buf
r = rand(); // just to get a new seed
rand_buffer(buf, MAX_BUF * TEST_SIZE); // new pseudo-rand data
#ifdef TEST_VERBOSE
printf("seed = 0x%llx\n", (unsigned long long) r);
#endif
for (i = 0; i < TEST_SIZE; i++) {
crc_ref = test_func->crc32_ref_call(r, buf, MAX_BUF);
crc_base = test_func->crc32_base_call(r, buf, MAX_BUF);
crc = test_func->crc32_func_call(r, buf, MAX_BUF);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc rand%3d = 0x%8x 0x%8x 0x%8x\n", i, crc_ref,
crc_base, crc);
}
#ifdef TEST_VERBOSE
else if (i % (TEST_SIZE * 20 / 8) == 0)
printf(".");
#endif
buf += MAX_BUF;
}
}
return fail;
}
// Run tests at end of buffer
int
eob_test(func_case_t *test_func)
{
uint32_t crc_ref, crc_base, crc;
int fail = 0;
int i;
unsigned char *buf = NULL;
// Null test
if (0 != test_func->crc32_func_call(0, NULL, 0)) {
fail++;
printf("crc null test fail\n");
}
buf = (unsigned char *) buf_alloc; // reset buf
buf = buf + ((MAX_BUF - 1) * TEST_SIZE); // Line up TEST_SIZE from end
for (i = 0; i <= TEST_SIZE; i++) {
crc_ref = test_func->crc32_ref_call(TEST_SEED, buf + i, TEST_SIZE - i);
crc_base = test_func->crc32_base_call(TEST_SEED, buf + i, TEST_SIZE - i);
crc = test_func->crc32_func_call(TEST_SEED, buf + i, TEST_SIZE - i);
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc eob rand%3d = 0x%8x 0x%8x 0x%8x\n", i, crc_ref, crc_base,
crc);
}
#ifdef TEST_VERBOSE
else if (i % (TEST_SIZE / 8) == 0)
printf(".");
#endif
}
return fail;
}
int
update_test(func_case_t *test_func)
{
uint32_t crc_ref, crc_base, crc;
int fail = 0;
int i;
uint64_t r;
unsigned char *buf = NULL;
buf = (unsigned char *) buf_alloc; // reset buf
r = rand();
// Process the whole buf with reference func single call.
crc_ref = test_func->crc32_ref_call(r, buf, MAX_BUF * TEST_SIZE);
crc_base = test_func->crc32_base_call(r, buf, MAX_BUF * TEST_SIZE);
// Process buf with update method.
for (i = 0; i < TEST_SIZE; i++) {
crc = test_func->crc32_func_call(r, buf, MAX_BUF);
// Update crc seeds and buf pointer.
r = crc;
buf += MAX_BUF;
}
if ((crc_base != crc_ref) || (crc != crc_ref)) {
fail++;
printf("fail crc rand%3d = 0x%8x 0x%8x 0x%8x\n", i, crc_ref, crc_base, crc);
}
#ifdef TEST_VERBOSE
else
printf(".");
#endif
return fail;
}
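
update_test above relies on a chaining property: feeding the CRC returned for one chunk back in as the seed for the next chunk should give the same result as a single call over the whole buffer. The following is a minimal sketch of that property, not the library's implementation. It assumes a plain bitwise reflected CRC-32 (polynomial 0xEDB88320) that inverts the seed on entry and the result on exit, as the `not arg1_low32` / `not eax` pair in the assembly does. The names `sketch_crc32_gzip_refl` and `sketch_crc_update_demo` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a (seed, buf, len) CRC-32 routine:
 * bitwise, reflected, polynomial 0xEDB88320. */
static uint32_t
sketch_crc32_gzip_refl(uint32_t seed, const uint8_t *buf, uint64_t len)
{
        uint32_t crc = ~seed; /* invert on entry */

        for (uint64_t i = 0; i < len; i++) {
                crc ^= buf[i];
                for (int bit = 0; bit < 8; bit++)
                        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return ~crc; /* invert on exit */
}

/* Chains 32 calls of 128 bytes and compares against one call over 4096 bytes. */
static void
sketch_crc_update_demo(void)
{
        static uint8_t data[4096];
        uint32_t one_shot, chained = 0x1234;

        for (size_t i = 0; i < sizeof(data); i++)
                data[i] = (uint8_t) (i * 31u + 7u);

        one_shot = sketch_crc32_gzip_refl(0x1234, data, sizeof(data));
        for (size_t off = 0; off < sizeof(data); off += 128)
                chained = sketch_crc32_gzip_refl(chained, data + off, 128);

        assert(chained == one_shot);
        /* A zero-length call returns the seed unchanged. */
        assert(sketch_crc32_gzip_refl(0x1234, NULL, 0) == 0x1234);
}
```

The two inversions cancel between consecutive calls, which is why the intermediate return value can be reused directly as the next seed.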


@ -0,0 +1,575 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_gzip_refl_by16_10(
; UINT32 init_crc, //initial CRC value, 32 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://download.intel.com/design/intarch/papers/323102.pdf
;
;
; sample yasm command line:
; yasm -f x64 -f elf64 -X gnu -g dwarf2 crc32_gzip_refl_by8
;
; As explained here:
; http://docs.oracle.com/javase/7/docs/api/java/util/zip/package-summary.html
; CRC-32 checksum is described in RFC 1952
; Implementing RFC 1952 CRC:
; http://www.ietf.org/rfc/rfc1952.txt
%include "reg_sizes.asm"
%ifndef FUNCTION_NAME
%define FUNCTION_NAME crc32_gzip_refl_by16_10
%endif
%ifndef fetch_dist
%define fetch_dist 1536
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht0
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg1_low32 edi
%endif
align 16
mk_global FUNCTION_NAME, function
FUNCTION_NAME:
endbranch
not arg1_low32
%ifidn __OUTPUT_FORMAT__, win64
sub rsp, (16*10 + 8)
; save the non-volatile xmm registers on the stack, as the win64 calling convention requires
vmovdqa [rsp + 16*0], xmm6
vmovdqa [rsp + 16*1], xmm7
vmovdqa [rsp + 16*2], xmm8
vmovdqa [rsp + 16*3], xmm9
vmovdqa [rsp + 16*4], xmm10
vmovdqa [rsp + 16*5], xmm11
vmovdqa [rsp + 16*6], xmm12
vmovdqa [rsp + 16*7], xmm13
vmovdqa [rsp + 16*8], xmm14
vmovdqa [rsp + 16*9], xmm15
%endif
; check if smaller than 256B
cmp arg3, 256
jl .less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; load the initial 128B of data, xor the initial crc value
vmovdqu8 zmm0, [arg2+16*0]
vmovdqu8 zmm4, [arg2+16*4]
vpxorq zmm0, zmm10
vbroadcasti32x4 zmm10, [rk3] ;zmm10 has rk3 and rk4 in each 128-bit lane
;imm value of pclmulqdq instruction will determine which constant to use
sub arg3, 256
cmp arg3, 256
jl .fold_128_B_loop
vmovdqu8 zmm7, [arg2+16*8]
vmovdqu8 zmm8, [arg2+16*12]
vbroadcasti32x4 zmm16, [rk_1] ;zmm16 has rk-1 and rk-2
sub arg3, 256
%if fetch_dist != 0
; check if there is at least 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jb .fold_256_B_loop
align 16
.fold_and_prefetch_256_B_loop:
add arg2, 256
PREFETCH [arg2+fetch_dist+0]
vpclmulqdq zmm1, zmm0, zmm16, 0x10
vpclmulqdq zmm0, zmm0, zmm16, 0x01
vpternlogq zmm0, zmm1, [arg2+16*0], 0x96
PREFETCH [arg2+fetch_dist+64]
vpclmulqdq zmm2, zmm4, zmm16, 0x10
vpclmulqdq zmm4, zmm4, zmm16, 0x01
vpternlogq zmm4, zmm2, [arg2+16*4], 0x96
PREFETCH [arg2+fetch_dist+64*2]
vpclmulqdq zmm3, zmm7, zmm16, 0x10
vpclmulqdq zmm7, zmm7, zmm16, 0x01
vpternlogq zmm7, zmm3, [arg2+16*8], 0x96
PREFETCH [arg2+fetch_dist+64*3]
vpclmulqdq zmm5, zmm8, zmm16, 0x10
vpclmulqdq zmm8, zmm8, zmm16, 0x01
vpternlogq zmm8, zmm5, [arg2+16*12], 0x96
sub arg3, 256
; check if there is another 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jge .fold_and_prefetch_256_B_loop
%endif ; fetch_dist != 0
align 16
.fold_256_B_loop:
add arg2, 256
vpclmulqdq zmm1, zmm0, zmm16, 0x10
vpclmulqdq zmm0, zmm0, zmm16, 0x01
vpternlogq zmm0, zmm1, [arg2+16*0], 0x96
vpclmulqdq zmm2, zmm4, zmm16, 0x10
vpclmulqdq zmm4, zmm4, zmm16, 0x01
vpternlogq zmm4, zmm2, [arg2+16*4], 0x96
vpclmulqdq zmm3, zmm7, zmm16, 0x10
vpclmulqdq zmm7, zmm7, zmm16, 0x01
vpternlogq zmm7, zmm3, [arg2+16*8], 0x96
vpclmulqdq zmm5, zmm8, zmm16, 0x10
vpclmulqdq zmm8, zmm8, zmm16, 0x01
vpternlogq zmm8, zmm5, [arg2+16*12], 0x96
sub arg3, 256
jge .fold_256_B_loop
;; Fold 256 into 128
add arg2, 256
vpclmulqdq zmm1, zmm0, zmm10, 0x01
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpternlogq zmm7, zmm1, zmm2, 0x96 ; xor ABC
vpclmulqdq zmm5, zmm4, zmm10, 0x01
vpclmulqdq zmm6, zmm4, zmm10, 0x10
vpternlogq zmm8, zmm5, zmm6, 0x96 ; xor ABC
vmovdqa32 zmm0, zmm7
vmovdqa32 zmm4, zmm8
add arg3, 128
jmp .less_than_128_B
; at this section of the code, there are 128*x+y (0<=y<128) bytes of buffer. The fold_128_B_loop
; loop will fold 128B at a time until we have 128+y bytes of buffer
; fold 128B at a time. This section of the code folds 2 zmm registers (zmm0 and zmm4) in parallel
align 16
.fold_128_B_loop:
add arg2, 128
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpclmulqdq zmm0, zmm0, zmm10, 0x01
vpternlogq zmm0, zmm2, [arg2+16*0], 0x96
vpclmulqdq zmm5, zmm4, zmm10, 0x10
vpclmulqdq zmm4, zmm4, zmm10, 0x01
vpternlogq zmm4, zmm5, [arg2+16*4], 0x96
sub arg3, 128
jge .fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
align 16
.less_than_128_B:
;; At this point, the buffer pointer is pointing at the last
;; y bytes of the buffer, where 0 <= y < 128.
;; The 128 bytes of folded data is in 2 of the zmm registers:
;; zmm0 and zmm4
cmp arg3, -64
jl .fold_128_B_register
vbroadcasti32x4 zmm10, [rk15]
;; If there are still 64 bytes left, folds from 128 bytes to 64 bytes
;; and handles the next 64 bytes
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpclmulqdq zmm0, zmm0, zmm10, 0x01
vpternlogq zmm0, zmm2, zmm4, 0x96
add arg3, 128
jmp .fold_64B_loop
align 16
.fold_128_B_register:
; fold the 8 128b parts into 1 xmm register with different constants
vmovdqu8 zmm16, [rk9] ; multiply by rk9-rk16
vmovdqu8 zmm11, [rk17] ; multiply by rk17-rk20, rk1,rk2, 0,0
vpclmulqdq zmm1, zmm0, zmm16, 0x01
vpclmulqdq zmm2, zmm0, zmm16, 0x10
vextracti64x2 xmm7, zmm4, 3 ; save last that has no multiplicand
vpclmulqdq zmm5, zmm4, zmm11, 0x01
vpclmulqdq zmm6, zmm4, zmm11, 0x10
vmovdqa xmm10, [rk1] ; Needed later in reduction loop
vpternlogq zmm1, zmm2, zmm5, 0x96 ; xor ABC
vpternlogq zmm1, zmm6, zmm7, 0x96 ; xor ABC
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
; instead of 128, we add 128-16 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl .final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
align 16
.16B_reduction_loop:
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpternlogq xmm7, xmm8, [arg2], 0x96
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge .16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
align 16
.final_reduction_for_128:
add arg3, 16
je .128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset
; the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
align 16
.get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [rel pshufb_shf_table]
add rax, arg3
vmovdqu xmm0, [rax]
vpshufb xmm7, xmm0
vpxor xmm0, [mask3]
vpshufb xmm2, xmm0
vpblendvb xmm2, xmm2, xmm1, xmm0
;;;;;;;;;;
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpternlogq xmm7, xmm8, xmm2, 0x96
align 16
.128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5]
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0
vpsrldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vmovdqa xmm0, xmm7
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
align 16
.barrett:
vpand xmm7, [mask2]
vmovdqa xmm1, xmm7
vmovdqa xmm2, xmm7
vmovdqa xmm10, [rk7]
vpclmulqdq xmm7, xmm10, 0
vpternlogq xmm7, xmm2, [mask], 0x28
vmovdqa xmm2, xmm7
vpclmulqdq xmm7, xmm10, 0x10
vpternlogq xmm7, xmm2, xmm1, 0x96
vpextrd eax, xmm7, 2
align 16
.cleanup:
not eax
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + 16*0]
vmovdqa xmm7, [rsp + 16*1]
vmovdqa xmm8, [rsp + 16*2]
vmovdqa xmm9, [rsp + 16*3]
vmovdqa xmm10, [rsp + 16*4]
vmovdqa xmm11, [rsp + 16*5]
vmovdqa xmm12, [rsp + 16*6]
vmovdqa xmm13, [rsp + 16*7]
vmovdqa xmm14, [rsp + 16*8]
vmovdqa xmm15, [rsp + 16*9]
add rsp, (16*10 + 8)
%endif
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
.less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl .less_than_32
vmovd xmm1, arg1_low32 ; get the initial crc value
cmp arg3, 64
jl .less_than_64
;; receive the initial 64B data, xor the initial crc value
vmovdqu8 zmm0, [arg2]
vpxorq zmm0, zmm1
add arg2, 64
sub arg3, 64
cmp arg3, 64
jb .reduce_64B
vbroadcasti32x4 zmm10, [rk15]
align 16
.fold_64B_loop:
vmovdqu8 zmm4, [arg2]
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpclmulqdq zmm0, zmm0, zmm10, 0x01
vpternlogq zmm0, zmm2, zmm4, 0x96
add arg2, 64
sub arg3, 64
cmp arg3, 64
jge .fold_64B_loop
align 16
.reduce_64B:
; Reduce from 64 bytes to 16 bytes
vmovdqu8 zmm11, [rk17]
vpclmulqdq zmm1, zmm0, zmm11, 0x01
vpclmulqdq zmm2, zmm0, zmm11, 0x10
vextracti64x2 xmm7, zmm0, 3 ; save last that has no multiplicand
vpternlogq zmm1, zmm2, zmm7, 0x96
vmovdqa xmm10, [rk_1b] ; Needed later in reduction loop
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
sub arg3, 16
jns .16B_reduction_loop ; At least 16 bytes of data to digest
jmp .final_reduction_for_128
align 16
.less_than_64:
;; if there is, load the constants
vmovdqa xmm10, [rk_1b]
vmovdqu xmm7, [arg2] ; load the plaintext
vpxor xmm7, xmm1 ; xmm1 already has initial crc value
;; update the buffer pointer
add arg2, 16
;; update the counter
;; - subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp .16B_reduction_loop
align 16
.less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je .cleanup
vmovd xmm0, arg1_low32 ; get the initial crc value
cmp arg3, 16
je .exact_16_left
jl .less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp .get_last_two_xmms
align 16
.less_than_16_left:
xor r10, r10
bts r10, arg3
dec r10
kmovw k2, r10d
vmovdqu8 xmm7{k2}{z}, [arg2]
vpxor xmm7, xmm0 ; xor the initial crc value
cmp arg3, 4
jb .only_less_than_4
lea rax, [rel pshufb_shf_table]
vmovdqu xmm0, [rax + arg3]
vpshufb xmm7,xmm0
jmp .128_done
align 16
.exact_16_left:
vmovdqu xmm7, [arg2]
vpxor xmm7, xmm0 ; xor the initial crc value
jmp .128_done
align 16
.only_less_than_4:
lea r11, [rel pshufb_shift_table]
vmovdqu xmm0, [r11 + arg3]
vpshufb xmm7, xmm0
jmp .barrett
section .data
align 32
%ifndef USE_CONSTS
; precomputed constants
rk_1: dq 0x00000000e95c1271
rk_2: dq 0x00000000ce3371cb
rk1: dq 0x00000000ccaa009e
rk2: dq 0x00000001751997d0
rk3: dq 0x000000014a7fe880
rk4: dq 0x00000001e88ef372
rk5: dq 0x00000000ccaa009e
rk6: dq 0x0000000163cd6124
rk7: dq 0x00000001f7011640
rk8: dq 0x00000001db710640
rk9: dq 0x00000001d7cfc6ac
rk10: dq 0x00000001ea89367e
rk11: dq 0x000000018cb44e58
rk12: dq 0x00000000df068dc2
rk13: dq 0x00000000ae0b5394
rk14: dq 0x00000001c7569e54
rk15: dq 0x00000001c6e41596
rk16: dq 0x0000000154442bd4
rk17: dq 0x0000000174359406
rk18: dq 0x000000003db1ecdc
rk19: dq 0x000000015a546366
rk20: dq 0x00000000f1da05aa
rk_1b: dq 0x00000000ccaa009e
rk_2b: dq 0x00000001751997d0
dq 0x0000000000000000
dq 0x0000000000000000
%else
INCLUDE_CONSTS
%endif
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
align 16
pshufb_shift_table:
;; use these values to shift data for the pshufb instruction
db 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
db 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
db 0x08, 0x09, 0x0A
mask: dq 0xFFFFFFFFFFFFFFFF, 0x0000000000000000
mask2: dq 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF
mask3: dq 0x8080808080808080, 0x8080808080808080
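
The `vpternlogq` immediates used in this file (0x96 and 0x28) are 8-entry truth tables. For each bit position, the bits of the destination, second and third operands form a 3-bit index, and the result is the immediate's bit at that index. The sketch below emulates one 64-bit lane in portable C to show that 0x96 is a three-way XOR and 0x28 is `(a ^ b) & c`. The function names are hypothetical, and this is an illustration of the encoding rather than code from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates one 64-bit lane of vpternlogq.
 * a = destination (also first source), b = second source, c = third source. */
static uint64_t
sketch_ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
        uint64_t out = 0;

        for (int bit = 0; bit < 64; bit++) {
                unsigned idx = (unsigned) ((((a >> bit) & 1u) << 2) |
                                           (((b >> bit) & 1u) << 1) |
                                           ((c >> bit) & 1u));
                out |= (uint64_t) ((imm8 >> idx) & 1u) << bit;
        }
        return out;
}

static void
sketch_ternlog_demo(void)
{
        const uint64_t a = 0x0123456789abcdefULL;
        const uint64_t b = 0xfedcba9876543210ULL;
        const uint64_t c = 0x00000000ffffffffULL;

        /* 0x96: result bit is set when an odd number of inputs are set (a ^ b ^ c). */
        assert(sketch_ternlog64(a, b, c, 0x96) == (a ^ b ^ c));
        /* 0x28: result bit is set when a and b differ and c is set ((a ^ b) & c). */
        assert(sketch_ternlog64(a, b, c, 0x28) == ((a ^ b) & c));
}
```

Read this way, `vpternlogq zmm0, zmm1, [arg2+16*0], 0x96` folds two carry-less products and the next block of input in a single instruction instead of two XORs.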


@ -1,5 +1,5 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2016 Intel Corporation All rights reserved.
; Copyright(c) 2011-2017 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
@ -29,7 +29,7 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_gzip(
; UINT32 crc32_gzip_refl_by8(
; UINT32 init_crc, //initial CRC value, 32 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
@ -45,7 +45,7 @@
;
;
; sample yasm command line:
; yasm -f x64 -f elf64 -X gnu -g dwarf2 crc32_gzip
; yasm -f x64 -f elf64 -X gnu -g dwarf2 crc32_gzip_refl_by8
;
; As explained here:
; http://docs.oracle.com/javase/7/docs/api/java/util/zip/package-summary.html
@ -55,6 +55,15 @@
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
@ -84,8 +93,9 @@ section .text
%endif
align 16
global crc32_gzip_01
crc32_gzip_01:
mk_global crc32_gzip_refl_by8, function
crc32_gzip_refl_by8:
endbranch
; unsigned long c = crc ^ 0xffffffffL;
not arg1_low32 ;
@ -126,7 +136,7 @@ crc32_gzip_01:
; XOR the initial_crc value
pxor xmm0, xmm10
movdqa xmm10, [rk3] ;xmm10 has rk3 and rk4
movdqa xmm10, [rk3] ;xmm10 has rk3 and rk4
;imm value of pclmulqdq instruction will determine which constant to use
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 256 instead of 128 to save one instruction from the loop
@ -142,6 +152,7 @@ _fold_128_B_loop:
; update the buffer pointer
add arg2, 128
PREFETCH [arg2+fetch_dist+0]
movdqu xmm9, [arg2+16*0]
movdqu xmm12, [arg2+16*1]
movdqa xmm8, xmm0
@ -155,6 +166,7 @@ _fold_128_B_loop:
pxor xmm1, xmm12
xorps xmm1, xmm13
PREFETCH [arg2+fetch_dist+32]
movdqu xmm9, [arg2+16*2]
movdqu xmm12, [arg2+16*3]
movdqa xmm8, xmm2
@ -168,6 +180,7 @@ _fold_128_B_loop:
pxor xmm3, xmm12
xorps xmm3, xmm13
PREFETCH [arg2+fetch_dist+64]
movdqu xmm9, [arg2+16*4]
movdqu xmm12, [arg2+16*5]
movdqa xmm8, xmm4
@ -181,6 +194,7 @@ _fold_128_B_loop:
pxor xmm5, xmm12
xorps xmm5, xmm13
PREFETCH [arg2+fetch_dist+96]
movdqu xmm9, [arg2+16*6]
movdqu xmm12, [arg2+16*7]
movdqa xmm8, xmm6
@ -586,6 +600,12 @@ DQ 0x000000015a546366
rk20 :
DQ 0x00000000f1da05aa
mask:
dq 0xFFFFFFFFFFFFFFFF, 0x0000000000000000
mask2:
dq 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF
mask3:
dq 0x8080808080808080, 0x8080808080808080
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
@ -607,11 +627,3 @@ pshufb_shf_table:
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
mask:
dq 0xFFFFFFFFFFFFFFFF, 0x0000000000000000
mask2:
dq 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF
mask3:
dq 0x8080808080808080, 0x8080808080808080
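
The fold_and_prefetch loops elsewhere in this diff issue prefetches only while at least fetch_dist plus one block of data remains, then fall through to a plain loop for the rest. A rough C sketch of that two-phase structure follows. It assumes the GCC/Clang `__builtin_prefetch` extension, sums bytes instead of folding a CRC, and uses hypothetical names and constants. The length bookkeeping is simplified and does not mirror the assembly's pre-decremented counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FETCH_DIST 1536 /* bytes to prefetch ahead (assumed, tunable) */
#define SKETCH_BLOCK      256  /* bytes consumed per loop iteration */

/* Placeholder for the real per-block work: sums n bytes. */
static uint64_t
sketch_consume(const uint8_t *p, size_t n)
{
        uint64_t acc = 0;

        for (size_t i = 0; i < n; i++)
                acc += p[i];
        return acc;
}

static uint64_t
sketch_sum_prefetch(const uint8_t *buf, size_t len, size_t *prefetch_iters)
{
        uint64_t acc = 0;

        *prefetch_iters = 0;
        /* Phase 1: every prefetched line lies inside the buffer and will be read later. */
        while (len >= SKETCH_FETCH_DIST + SKETCH_BLOCK) {
                for (size_t off = 0; off < SKETCH_BLOCK; off += 64)
                        __builtin_prefetch(buf + SKETCH_FETCH_DIST + off, 0, 3);
                acc += sketch_consume(buf, SKETCH_BLOCK);
                buf += SKETCH_BLOCK;
                len -= SKETCH_BLOCK;
                (*prefetch_iters)++;
        }
        /* Phase 2: same work, no prefetch, for the final stretch of the buffer. */
        while (len >= SKETCH_BLOCK) {
                acc += sketch_consume(buf, SKETCH_BLOCK);
                buf += SKETCH_BLOCK;
                len -= SKETCH_BLOCK;
        }
        return acc + sketch_consume(buf, len); /* tail shorter than one block */
}

static void
sketch_prefetch_demo(void)
{
        static uint8_t data[4096];
        size_t iters = 0;

        for (size_t i = 0; i < sizeof(data); i++)
                data[i] = (uint8_t) i;

        assert(sketch_sum_prefetch(data, sizeof(data), &iters) ==
               sketch_consume(data, sizeof(data)));
        assert(iters == 10);
        /* Buffers shorter than fetch distance + one block never prefetch. */
        assert(sketch_sum_prefetch(data, 1024, &iters) == sketch_consume(data, 1024));
        assert(iters == 0);
}
```

Splitting the loop keeps the common path free of a per-iteration bounds check on the prefetch address and avoids requesting cache lines past the end of the input.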


@ -0,0 +1,623 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_gzip_refl_by8_02(
; UINT32 init_crc, //initial CRC value, 32 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://download.intel.com/design/intarch/papers/323102.pdf
;
;
; sample yasm command line:
; yasm -f x64 -f elf64 -X gnu -g dwarf2 crc32_gzip_refl_by8
;
; As explained here:
; http://docs.oracle.com/javase/7/docs/api/java/util/zip/package-summary.html
; CRC-32 checksum is described in RFC 1952
; Implementing RFC 1952 CRC:
; http://www.ietf.org/rfc/rfc1952.txt
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg1_low32 edi
%endif
%define TMP 16*0
%ifidn __OUTPUT_FORMAT__, win64
%define XMM_SAVE 16*2
%define VARIABLE_OFFSET 16*10+8
%else
%define VARIABLE_OFFSET 16*2+8
%endif
align 16
mk_global crc32_gzip_refl_by8_02, function
crc32_gzip_refl_by8_02:
endbranch
not arg1_low32
sub rsp, VARIABLE_OFFSET
%ifidn __OUTPUT_FORMAT__, win64
; save the non-volatile xmm registers on the stack, as the win64 calling convention requires
vmovdqa [rsp + XMM_SAVE + 16*0], xmm6
vmovdqa [rsp + XMM_SAVE + 16*1], xmm7
vmovdqa [rsp + XMM_SAVE + 16*2], xmm8
vmovdqa [rsp + XMM_SAVE + 16*3], xmm9
vmovdqa [rsp + XMM_SAVE + 16*4], xmm10
vmovdqa [rsp + XMM_SAVE + 16*5], xmm11
vmovdqa [rsp + XMM_SAVE + 16*6], xmm12
vmovdqa [rsp + XMM_SAVE + 16*7], xmm13
%endif
; check if smaller than 256B
cmp arg3, 256
jl .less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; load the initial 128B of data, xor the initial crc value
vmovdqu xmm0, [arg2+16*0]
vmovdqu xmm1, [arg2+16*1]
vmovdqu xmm2, [arg2+16*2]
vmovdqu xmm3, [arg2+16*3]
vmovdqu xmm4, [arg2+16*4]
vmovdqu xmm5, [arg2+16*5]
vmovdqu xmm6, [arg2+16*6]
vmovdqu xmm7, [arg2+16*7]
; XOR the initial_crc value
vpxor xmm0, xmm10
vmovdqa xmm10, [rk3] ;xmm10 has rk3 and rk4
;imm value of pclmulqdq instruction will determine which constant to use
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 256 instead of 128 to save one instruction from the loop
sub arg3, 256
; at this section of the code, there is 128*x+y (0<=y<128) bytes of buffer. The fold_128_B_loop
; loop will fold 128B at a time until we have 128+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jb .fold_128_B_loop
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
.fold_and_prefetch_128_B_loop:
add arg2, 128
PREFETCH [arg2+fetch_dist+0]
vmovdqu xmm9, [arg2+16*0]
vmovdqu xmm12, [arg2+16*1]
vpclmulqdq xmm8, xmm0, xmm10, 0x10
vpclmulqdq xmm0, xmm0, xmm10 , 0x1
vpclmulqdq xmm13, xmm1, xmm10, 0x10
vpclmulqdq xmm1, xmm1, xmm10 , 0x1
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [arg2+16*2]
vmovdqu xmm12, [arg2+16*3]
vpclmulqdq xmm8, xmm2, xmm10, 0x10
vpclmulqdq xmm2, xmm2, xmm10 , 0x1
vpclmulqdq xmm13, xmm3, xmm10, 0x10
vpclmulqdq xmm3, xmm3, xmm10 , 0x1
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
PREFETCH [arg2+fetch_dist+64]
vmovdqu xmm9, [arg2+16*4]
vmovdqu xmm12, [arg2+16*5]
vpclmulqdq xmm8, xmm4, xmm10, 0x10
vpclmulqdq xmm4, xmm4, xmm10 , 0x1
vpclmulqdq xmm13, xmm5, xmm10, 0x10
vpclmulqdq xmm5, xmm5, xmm10 , 0x1
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [arg2+16*6]
vmovdqu xmm12, [arg2+16*7]
vpclmulqdq xmm8, xmm6, xmm10, 0x10
vpclmulqdq xmm6, xmm6, xmm10 , 0x1
vpclmulqdq xmm13, xmm7, xmm10, 0x10
vpclmulqdq xmm7, xmm7, xmm10 , 0x1
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub arg3, 128
; check if there is another 4KB (fetch distance) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jge .fold_and_prefetch_128_B_loop
%endif ; fetch_dist != 0
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
.fold_128_B_loop:
add arg2, 128
vmovdqu xmm9, [arg2+16*0]
vmovdqu xmm12, [arg2+16*1]
vpclmulqdq xmm8, xmm0, xmm10, 0x10
vpclmulqdq xmm0, xmm0, xmm10 , 0x1
vpclmulqdq xmm13, xmm1, xmm10, 0x10
vpclmulqdq xmm1, xmm1, xmm10 , 0x1
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [arg2+16*2]
vmovdqu xmm12, [arg2+16*3]
vpclmulqdq xmm8, xmm2, xmm10, 0x10
vpclmulqdq xmm2, xmm2, xmm10 , 0x1
vpclmulqdq xmm13, xmm3, xmm10, 0x10
vpclmulqdq xmm3, xmm3, xmm10 , 0x1
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
vmovdqu xmm9, [arg2+16*4]
vmovdqu xmm12, [arg2+16*5]
vpclmulqdq xmm8, xmm4, xmm10, 0x10
vpclmulqdq xmm4, xmm4, xmm10 , 0x1
vpclmulqdq xmm13, xmm5, xmm10, 0x10
vpclmulqdq xmm5, xmm5, xmm10 , 0x1
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [arg2+16*6]
vmovdqu xmm12, [arg2+16*7]
vpclmulqdq xmm8, xmm6, xmm10, 0x10
vpclmulqdq xmm6, xmm6, xmm10 , 0x1
vpclmulqdq xmm13, xmm7, xmm10, 0x10
vpclmulqdq xmm7, xmm7, xmm10 , 0x1
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub arg3, 128
jge .fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer, where 0 <= y < 128
; the 128B of folded data is in 8 of the xmm registers: xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7
; fold the 8 xmm registers to 1 xmm register with different constants
vmovdqa xmm10, [rk9]
vpclmulqdq xmm8, xmm0, xmm10, 0x1
vpclmulqdq xmm0, xmm0, xmm10, 0x10
vpxor xmm7, xmm8
vxorps xmm7, xmm0
vmovdqa xmm10, [rk11]
vpclmulqdq xmm8, xmm1, xmm10, 0x1
vpclmulqdq xmm1, xmm1, xmm10, 0x10
vpxor xmm7, xmm8
vxorps xmm7, xmm1
vmovdqa xmm10, [rk13]
vpclmulqdq xmm8, xmm2, xmm10, 0x1
vpclmulqdq xmm2, xmm2, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm2
vmovdqa xmm10, [rk15]
vpclmulqdq xmm8, xmm3, xmm10, 0x1
vpclmulqdq xmm3, xmm3, xmm10, 0x10
vpxor xmm7, xmm8
vxorps xmm7, xmm3
vmovdqa xmm10, [rk17]
vpclmulqdq xmm8, xmm4, xmm10, 0x1
vpclmulqdq xmm4, xmm4, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm4
vmovdqa xmm10, [rk19]
vpclmulqdq xmm8, xmm5, xmm10, 0x1
vpclmulqdq xmm5, xmm5, xmm10, 0x10
vpxor xmm7, xmm8
vxorps xmm7, xmm5
vmovdqa xmm10, [rk1]
vpclmulqdq xmm8, xmm6, xmm10, 0x1
vpclmulqdq xmm6, xmm6, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm6
; instead of 128, we add 128-16 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl .final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
.16B_reduction_loop:
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpxor xmm7, xmm8
vmovdqu xmm0, [arg2]
vpxor xmm7, xmm0
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge .16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
.final_reduction_for_128:
add arg3, 16
je .128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset
; the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
.get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table]
add rax, arg3
vmovdqu xmm0, [rax]
vpshufb xmm7, xmm0
vpxor xmm0, [mask3]
vpshufb xmm2, xmm0
vpblendvb xmm2, xmm2, xmm1, xmm0
;;;;;;;;;;
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm2
.128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5]
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0
vpsrldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vmovdqa xmm0, xmm7
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
.barrett:
vpand xmm7, [mask2]
vmovdqa xmm1, xmm7
vmovdqa xmm2, xmm7
vmovdqa xmm10, [rk7]
vpclmulqdq xmm7, xmm10, 0
vpxor xmm7, xmm2
vpand xmm7, [mask]
vmovdqa xmm2, xmm7
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm2
vpxor xmm7, xmm1
vpextrd eax, xmm7, 2
.cleanup:
not eax
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + XMM_SAVE + 16*0]
vmovdqa xmm7, [rsp + XMM_SAVE + 16*1]
vmovdqa xmm8, [rsp + XMM_SAVE + 16*2]
vmovdqa xmm9, [rsp + XMM_SAVE + 16*3]
vmovdqa xmm10, [rsp + XMM_SAVE + 16*4]
vmovdqa xmm11, [rsp + XMM_SAVE + 16*5]
vmovdqa xmm12, [rsp + XMM_SAVE + 16*6]
vmovdqa xmm13, [rsp + XMM_SAVE + 16*7]
%endif
add rsp, VARIABLE_OFFSET
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
.less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl .less_than_32
; if there is, load the constants
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
vmovd xmm0, arg1_low32 ; get the initial crc value
vmovdqu xmm7, [arg2] ; load the plaintext
vpxor xmm7, xmm0
; update the buffer pointer
add arg2, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp .16B_reduction_loop
align 16
.less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je .cleanup
vmovd xmm0, arg1_low32 ; get the initial crc value
cmp arg3, 16
je .exact_16_left
jl .less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp .get_last_two_xmms
align 16
.less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
vpxor xmm1, xmm1
mov r11, rsp
vmovdqa [r11], xmm1
cmp arg3, 4
jl .only_less_than_4
; backup the counter value
mov r9, arg3
cmp arg3, 8
jl .less_than_8_left
; load 8 Bytes
mov rax, [arg2]
mov [r11], rax
add r11, 8
sub arg3, 8
add arg2, 8
.less_than_8_left:
cmp arg3, 4
jl .less_than_4_left
; load 4 Bytes
mov eax, [arg2]
mov [r11], eax
add r11, 4
sub arg3, 4
add arg2, 4
.less_than_4_left:
cmp arg3, 2
jl .less_than_2_left
; load 2 Bytes
mov ax, [arg2]
mov [r11], ax
add r11, 2
sub arg3, 2
add arg2, 2
.less_than_2_left:
cmp arg3, 1
jl .zero_left
; load 1 Byte
mov al, [arg2]
mov [r11], al
.zero_left:
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
lea rax,[pshufb_shf_table]
vmovdqu xmm0, [rax + r9]
vpshufb xmm7,xmm0
jmp .128_done
align 16
.exact_16_left:
vmovdqu xmm7, [arg2]
vpxor xmm7, xmm0 ; xor the initial crc value
jmp .128_done
.only_less_than_4:
cmp arg3, 3
jl .only_less_than_3
; load 3 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
mov al, [arg2+2]
mov [r11+2], al
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
vpslldq xmm7, 5
jmp .barrett
.only_less_than_3:
cmp arg3, 2
jl .only_less_than_2
; load 2 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
vpslldq xmm7, 6
jmp .barrett
.only_less_than_2:
; load 1 Byte
mov al, [arg2]
mov [r11], al
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
vpslldq xmm7, 7
jmp .barrett
section .data
; precomputed constants
align 16
rk1: dq 0x00000000ccaa009e
rk2: dq 0x00000001751997d0
rk3: dq 0x000000014a7fe880
rk4: dq 0x00000001e88ef372
rk5: dq 0x00000000ccaa009e
rk6: dq 0x0000000163cd6124
rk7: dq 0x00000001f7011640
rk8: dq 0x00000001db710640
rk9: dq 0x00000001d7cfc6ac
rk10: dq 0x00000001ea89367e
rk11: dq 0x000000018cb44e58
rk12: dq 0x00000000df068dc2
rk13: dq 0x00000000ae0b5394
rk14: dq 0x00000001c7569e54
rk15: dq 0x00000001c6e41596
rk16: dq 0x0000000154442bd4
rk17: dq 0x0000000174359406
rk18: dq 0x000000003db1ecdc
rk19: dq 0x000000015a546366
rk20: dq 0x00000000f1da05aa
mask: dq 0xFFFFFFFFFFFFFFFF, 0x0000000000000000
mask2: dq 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF
mask3: dq 0x8080808080808080, 0x8080808080808080
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
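
The two loops in the listing above (`.fold_and_prefetch_128_B_loop` and `.fold_128_B_loop`) split the work so that prefetches are issued only while the prefetched cache lines still lie inside the buffer. The following is a minimal C sketch of that split, not the library's implementation. It assumes GCC/Clang's `__builtin_prefetch`; `consume_block`, `process_with_bounded_prefetch`, `FETCH_DIST`, and `BLOCK` are hypothetical names, a plain byte sum stands in for the carry-less folding, and the bounds are written against the true remaining length rather than the assembly's biased counter.

```c
#include <stddef.h>
#include <stdint.h>

#define FETCH_DIST 4096 /* assumed value, mirrors fetch_dist in the assembly */
#define BLOCK 128       /* bytes consumed per loop iteration */

/* Hypothetical stand-in for one 128-byte folding step: a simple byte sum. */
static uint64_t
consume_block(const uint8_t *p, uint64_t acc)
{
        for (size_t i = 0; i < BLOCK; i++)
                acc += p[i];
        return acc;
}

/* Processes len bytes; stores the number of prefetching iterations in *prefetched. */
uint64_t
process_with_bounded_prefetch(const uint8_t *buf, size_t len, size_t *prefetched)
{
        uint64_t acc = 0;
        size_t off = 0;
        size_t n = 0;

        /* Phase 1: prefetch two 64B lines per block, but only while both
         * prefetched addresses are still inside the buffer. */
        while (len - off >= FETCH_DIST + BLOCK) {
                __builtin_prefetch(buf + off + FETCH_DIST);
                __builtin_prefetch(buf + off + FETCH_DIST + 64);
                acc = consume_block(buf + off, acc);
                off += BLOCK;
                n++;
        }

        /* Phase 2: same work without prefetching; everything left is
         * closer than FETCH_DIST + BLOCK to the end of the buffer. */
        while (len - off >= BLOCK) {
                acc = consume_block(buf + off, acc);
                off += BLOCK;
        }

        /* Tail: fewer than BLOCK bytes remain. */
        for (; off < len; off++)
                acc += buf[off];

        *prefetched = n;
        return acc;
}
```

Keeping the prefetch-free loop separate means short buffers never touch memory past their end, and long buffers stop prefetching once the remaining data is already covered by earlier prefetches.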


@ -1,8 +1,8 @@
/**********************************************************************
Copyright(c) 2011-2015 Intel Corporation All rights reserved.
Copyright(c) 2011-2017 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
@ -29,71 +29,65 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memset
#include "erasure_code.h"
#include <string.h>
#include <stdint.h>
#include "crc.h"
#include "test.h"
//#define CACHED_TEST
#ifdef CACHED_TEST
// Cached test, loop many times over small dataset
# define TEST_LEN 8*1024
# define TEST_LOOPS 4000000
# define TEST_TYPE_STR "_warm"
#else
# ifndef TEST_CUSTOM
// Uncached test. Pull from large mem base.
# define TEST_SOURCES 10
# define GT_L3_CACHE 32*1024*1024 /* some number > last level cache */
# define TEST_LEN GT_L3_CACHE / 2
# define TEST_LOOPS 1000
# define TEST_TYPE_STR "_cold"
# else
# define TEST_TYPE_STR "_cus"
# ifndef TEST_LOOPS
# define TEST_LOOPS 1000
# endif
# endif
#ifndef GT_L3_CACHE
#define GT_L3_CACHE 32 * 1024 * 1024 /* some number > last level cache */
#endif
#define TEST_MEM (2 * TEST_LEN)
#if !defined(COLD_TEST) && !defined(TEST_CUSTOM)
// Cached test, loop many times over small dataset
#define TEST_LEN 8 * 1024
#define TEST_TYPE_STR "_warm"
#elif defined(COLD_TEST)
// Uncached test. Pull from large mem base.
#define TEST_LEN (2 * GT_L3_CACHE)
#define TEST_TYPE_STR "_cold"
#endif
typedef unsigned char u8;
#ifndef TEST_SEED
#define TEST_SEED 0x1234
#endif
int main(int argc, char *argv[])
#define TEST_MEM TEST_LEN
int
main(int argc, char *argv[])
{
int i;
u8 *buff1, *buff2, gf_const_tbl[64], a = 2;
struct perf start, stop;
void *buf;
uint32_t crc;
struct perf start;
printf("gf_vect_mul_avx_perf:\n");
printf("crc32_gzip_refl_perf:\n");
gf_vect_mul_init(a, gf_const_tbl);
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
// Allocate large mem region
buff1 = (u8 *) malloc(TEST_LEN);
buff2 = (u8 *) malloc(TEST_LEN);
if (NULL == buff1 || NULL == buff2) {
printf("Failed to allocate %dB\n", TEST_LEN);
return 1;
}
printf("Start timed tests\n");
fflush(0);
memset(buff1, 0, TEST_LEN);
memset(buff2, 0, TEST_LEN);
memset(buf, 0, TEST_LEN);
BENCHMARK(&start, BENCHMARK_TIME, crc = crc32_gzip_refl(TEST_SEED, buf, TEST_LEN));
printf("crc32_gzip_refl" TEST_TYPE_STR ": ");
perf_print(start, (long long) TEST_LEN);
gf_vect_mul_avx(TEST_LEN, gf_const_tbl, buff1, buff2);
printf("finish 0x%x\n", crc);
printf("Start timed tests\n");
fflush(0);
printf("crc32_gzip_refl_base_perf:\n");
printf("Start timed tests\n");
fflush(0);
gf_vect_mul_avx(TEST_LEN, gf_const_tbl, buff1, buff2);
perf_start(&start);
for (i = 0; i < TEST_LOOPS; i++) {
gf_vect_mul_init(a, gf_const_tbl);
gf_vect_mul_avx(TEST_LEN, gf_const_tbl, buff1, buff2);
}
perf_stop(&stop);
printf("gf_vect_mul_avx" TEST_TYPE_STR ": ");
perf_print(stop, start, (long long)TEST_LEN * i);
BENCHMARK(&start, BENCHMARK_TIME, crc = crc32_gzip_refl_base(TEST_SEED, buf, TEST_LEN));
printf("crc32_gzip_refl_base" TEST_TYPE_STR ": ");
perf_print(start, (long long) TEST_LEN);
return 0;
printf("finish 0x%x\n", crc);
aligned_free(buf);
return 0;
}
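
The perf test above times `crc32_gzip_refl` against `crc32_gzip_refl_base` and prints the resulting CRC. When checking that both report the same value, a bit-at-a-time reference can help. The sketch below assumes the usual gzip convention (reflected polynomial 0xEDB88320, seed inverted on entry, result inverted on exit), which is what the `not` instructions in the assembly suggest. `crc32_gzip_refl_sketch` is a hypothetical name and is not part of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (gzip-style); slow, intended only for cross-checking. */
uint32_t
crc32_gzip_refl_sketch(uint32_t seed, const unsigned char *buf, size_t len)
{
        uint32_t crc = ~seed; /* invert the seed on entry */

        for (size_t i = 0; i < len; i++) {
                crc ^= buf[i];
                for (int bit = 0; bit < 8; bit++) {
                        /* Shift right; xor in the polynomial when the dropped bit was 1. */
                        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
                }
        }
        return ~crc; /* invert again on exit */
}
```

For a zero-length buffer the two inversions cancel and the seed is returned unchanged, which matches the `.less_than_32` path in the assembly that copies the inverted seed into `eax` before jumping to `.cleanup`.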


@ -44,7 +44,14 @@
%include "reg_sizes.asm"
%define fetch_dist 1024
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
@ -72,8 +79,9 @@ section .text
%define VARIABLE_OFFSET 16*2+8
%endif
align 16
global crc32_ieee_01:function
mk_global crc32_ieee_01, function
crc32_ieee_01:
endbranch
not arg1_low32 ;~init_crc
@ -138,13 +146,19 @@ crc32_ieee_01:
; loop will fold 128B at a time until we have 128+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jb _fold_128_B_loop
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
_fold_128_B_loop:
align 16
_fold_and_prefetch_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
prefetchnta [arg2+fetch_dist+0]
PREFETCH [arg2+fetch_dist+0]
movdqu xmm9, [arg2+16*0]
movdqu xmm12, [arg2+16*1]
pshufb xmm9, xmm11
@ -160,7 +174,6 @@ _fold_128_B_loop:
pxor xmm1, xmm12
xorps xmm1, xmm13
prefetchnta [arg2+fetch_dist+32]
movdqu xmm9, [arg2+16*2]
movdqu xmm12, [arg2+16*3]
pshufb xmm9, xmm11
@ -176,7 +189,83 @@ _fold_128_B_loop:
pxor xmm3, xmm12
xorps xmm3, xmm13
prefetchnta [arg2+fetch_dist+64]
PREFETCH [arg2+fetch_dist+64]
movdqu xmm9, [arg2+16*4]
movdqu xmm12, [arg2+16*5]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm4
movdqa xmm13, xmm5
pclmulqdq xmm4, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm5, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm4, xmm9
xorps xmm4, xmm8
pxor xmm5, xmm12
xorps xmm5, xmm13
movdqu xmm9, [arg2+16*6]
movdqu xmm12, [arg2+16*7]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm6
movdqa xmm13, xmm7
pclmulqdq xmm6, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm7, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm6, xmm9
xorps xmm6, xmm8
pxor xmm7, xmm12
xorps xmm7, xmm13
sub arg3, 128
; check if there is another 4KB (fetch distance) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jge _fold_and_prefetch_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
%endif ; fetch_dist != 0
align 16
_fold_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
PREFETCH [arg2+fetch_dist+0]
movdqu xmm9, [arg2+16*0]
movdqu xmm12, [arg2+16*1]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm0
movdqa xmm13, xmm1
pclmulqdq xmm0, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm1, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm0, xmm9
xorps xmm0, xmm8
pxor xmm1, xmm12
xorps xmm1, xmm13
movdqu xmm9, [arg2+16*2]
movdqu xmm12, [arg2+16*3]
pshufb xmm9, xmm11
pshufb xmm12, xmm11
movdqa xmm8, xmm2
movdqa xmm13, xmm3
pclmulqdq xmm2, xmm10, 0x0
pclmulqdq xmm8, xmm10 , 0x11
pclmulqdq xmm3, xmm10, 0x0
pclmulqdq xmm13, xmm10 , 0x11
pxor xmm2, xmm9
xorps xmm2, xmm8
pxor xmm3, xmm12
xorps xmm3, xmm13
PREFETCH [arg2+fetch_dist+64]
movdqu xmm9, [arg2+16*4]
movdqu xmm12, [arg2+16*5]
pshufb xmm9, xmm11
@ -192,7 +281,6 @@ _fold_128_B_loop:
pxor xmm5, xmm12
xorps xmm5, xmm13
prefetchnta [arg2+fetch_dist+96]
movdqu xmm9, [arg2+16*6]
movdqu xmm12, [arg2+16*7]
pshufb xmm9, xmm11
@ -649,7 +737,3 @@ pshufb_shf_table:
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
;;; func core, ver, snum
slversion crc32_ieee_01, 01, 06, 0011
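
The `crc32_ieee` routines process data most-significant bit first, which is why the assembly byte-reflects each 16-byte load with `pshufb`. A bit-at-a-time reference for cross-checking could look like the sketch below. It assumes polynomial 0x04C11DB7 (the low 32 bits of `rk8` in crc32_ieee_02.asm) and the invert-on-entry, invert-on-exit convention implied by the `not` instructions. `crc32_ieee_sketch` is a hypothetical name and is not part of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise non-reflected (MSB-first) CRC-32; slow, intended only for cross-checking. */
uint32_t
crc32_ieee_sketch(uint32_t seed, const unsigned char *buf, size_t len)
{
        uint32_t crc = ~seed; /* invert the seed on entry */

        for (size_t i = 0; i < len; i++) {
                /* Bring the next byte into the top of the register. */
                crc ^= (uint32_t) buf[i] << 24;
                for (int bit = 0; bit < 8; bit++) {
                        /* Shift left; xor in the polynomial when the dropped bit was 1. */
                        if (crc & 0x80000000u)
                                crc = (crc << 1) ^ 0x04C11DB7u;
                        else
                                crc <<= 1;
                }
        }
        return ~crc; /* invert again on exit */
}
```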

735
crc/crc32_ieee_02.asm Normal file

@ -0,0 +1,735 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_ieee_02(
; UINT32 init_crc, //initial CRC value, 32 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg1_low32 edi
%endif
%define TMP 16*0
%ifidn __OUTPUT_FORMAT__, win64
%define XMM_SAVE 16*2
%define VARIABLE_OFFSET 16*10+8
%else
%define VARIABLE_OFFSET 16*2+8
%endif
align 16
mk_global crc32_ieee_02, function
crc32_ieee_02:
endbranch
not arg1_low32 ;~init_crc
sub rsp,VARIABLE_OFFSET
%ifidn __OUTPUT_FORMAT__, win64
; save the non-volatile xmm registers used here (xmm6-xmm13) on the stack, as the win64 ABI requires
vmovdqa [rsp + XMM_SAVE + 16*0], xmm6
vmovdqa [rsp + XMM_SAVE + 16*1], xmm7
vmovdqa [rsp + XMM_SAVE + 16*2], xmm8
vmovdqa [rsp + XMM_SAVE + 16*3], xmm9
vmovdqa [rsp + XMM_SAVE + 16*4], xmm10
vmovdqa [rsp + XMM_SAVE + 16*5], xmm11
vmovdqa [rsp + XMM_SAVE + 16*6], xmm12
vmovdqa [rsp + XMM_SAVE + 16*7], xmm13
%endif
; check if smaller than 256
cmp arg3, 256
; for sizes less than 256, we can't fold 128B at a time...
jl _less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; the crc value does not need to be byte-reflected, but it must be moved to the high part of the register,
; because the data will be byte-reflected and must line up with the initial crc.
vpslldq xmm10, 12
vmovdqa xmm11, [SHUF_MASK]
; receive the initial 128B data, xor the initial crc value
vmovdqu xmm0, [arg2+16*0]
vmovdqu xmm1, [arg2+16*1]
vmovdqu xmm2, [arg2+16*2]
vmovdqu xmm3, [arg2+16*3]
vmovdqu xmm4, [arg2+16*4]
vmovdqu xmm5, [arg2+16*5]
vmovdqu xmm6, [arg2+16*6]
vmovdqu xmm7, [arg2+16*7]
vpshufb xmm0, xmm11
; XOR the initial_crc value
vpxor xmm0, xmm10
vpshufb xmm1, xmm11
vpshufb xmm2, xmm11
vpshufb xmm3, xmm11
vpshufb xmm4, xmm11
vpshufb xmm5, xmm11
vpshufb xmm6, xmm11
vpshufb xmm7, xmm11
vmovdqa xmm10, [rk3] ;xmm10 has rk3 and rk4
;imm value of pclmulqdq instruction will determine which constant to use
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 256 instead of 128 to save one instruction from the loop
sub arg3, 256
; at this point, there are 128*x+y (0<=y<128) bytes of buffer left. The _fold_128_B_loop
; will fold 128B at a time until we have 128+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jb _fold_128_B_loop
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
_fold_and_prefetch_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
PREFETCH [arg2+fetch_dist+0]
vmovdqu xmm9, [arg2+16*0]
vmovdqu xmm12, [arg2+16*1]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm0
vmovdqa xmm13, xmm1
vpclmulqdq xmm0, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm1, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [arg2+16*2]
vmovdqu xmm12, [arg2+16*3]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm2
vmovdqa xmm13, xmm3
vpclmulqdq xmm2, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm3, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
PREFETCH [arg2+fetch_dist+64]
vmovdqu xmm9, [arg2+16*4]
vmovdqu xmm12, [arg2+16*5]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm4
vmovdqa xmm13, xmm5
vpclmulqdq xmm4, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm5, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [arg2+16*6]
vmovdqu xmm12, [arg2+16*7]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm6
vmovdqa xmm13, xmm7
vpclmulqdq xmm6, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm7, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub arg3, 128
; check if there is another 4KB (fetch distance) + 128B in the buffer
cmp arg3, (fetch_dist + 128)
jge _fold_and_prefetch_128_B_loop
%endif ; fetch_dist != 0
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
_fold_128_B_loop:
; update the buffer pointer
add arg2, 128 ; buf += 128;
vmovdqu xmm9, [arg2+16*0]
vmovdqu xmm12, [arg2+16*1]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm0
vmovdqa xmm13, xmm1
vpclmulqdq xmm0, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm1, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [arg2+16*2]
vmovdqu xmm12, [arg2+16*3]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm2
vmovdqa xmm13, xmm3
vpclmulqdq xmm2, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm3, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
vmovdqu xmm9, [arg2+16*4]
vmovdqu xmm12, [arg2+16*5]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm4
vmovdqa xmm13, xmm5
vpclmulqdq xmm4, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm5, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [arg2+16*6]
vmovdqu xmm12, [arg2+16*7]
vpshufb xmm9, xmm11
vpshufb xmm12, xmm11
vmovdqa xmm8, xmm6
vmovdqa xmm13, xmm7
vpclmulqdq xmm6, xmm10, 0x0
vpclmulqdq xmm8, xmm10 , 0x11
vpclmulqdq xmm7, xmm10, 0x0
vpclmulqdq xmm13, xmm10 , 0x11
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub arg3, 128
; check if there is another 128B in the buffer to be able to fold
jge _fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer, where 0 <= y < 128
; the 128B of folded data is in 8 of the xmm registers: xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7
; fold the 8 xmm registers to 1 xmm register with different constants
vmovdqa xmm10, [rk9]
vmovdqa xmm8, xmm0
vpclmulqdq xmm0, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm0
vmovdqa xmm10, [rk11]
vmovdqa xmm8, xmm1
vpclmulqdq xmm1, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm1
vmovdqa xmm10, [rk13]
vmovdqa xmm8, xmm2
vpclmulqdq xmm2, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm2
vmovdqa xmm10, [rk15]
vmovdqa xmm8, xmm3
vpclmulqdq xmm3, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm3
vmovdqa xmm10, [rk17]
vmovdqa xmm8, xmm4
vpclmulqdq xmm4, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm4
vmovdqa xmm10, [rk19]
vmovdqa xmm8, xmm5
vpclmulqdq xmm5, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vxorps xmm7, xmm5
vmovdqa xmm10, [rk1] ;xmm10 has rk1 and rk2
;imm value of pclmulqdq instruction will determine which constant to use
vmovdqa xmm8, xmm6
vpclmulqdq xmm6, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm6
; instead of 128, we add 112 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl _final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
_16B_reduction_loop:
vmovdqa xmm8, xmm7
vpclmulqdq xmm7, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vmovdqu xmm0, [arg2]
vpshufb xmm0, xmm11
vpxor xmm7, xmm0
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge _16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
_final_reduction_for_128:
; check if any more data to fold. If not, compute the CRC of the final 128 bits
add arg3, 16
je _128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
_get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
vpshufb xmm1, xmm11
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table + 16]
sub rax, arg3
vmovdqu xmm0, [rax]
; shift xmm2 to the left by arg3 bytes
vpshufb xmm2, xmm0
; shift xmm7 to the right by 16-arg3 bytes
vpxor xmm0, [mask1]
vpshufb xmm7, xmm0
vpblendvb xmm1, xmm1, xmm2, xmm0
; fold 16 Bytes
vmovdqa xmm2, xmm1
vmovdqa xmm8, xmm7
vpclmulqdq xmm7, xmm10, 0x11
vpclmulqdq xmm8, xmm10, 0x0
vpxor xmm7, xmm8
vpxor xmm7, xmm2
_128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5] ; rk5 and rk6 in xmm10
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0x1
vpslldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vmovdqa xmm0, xmm7
vpand xmm0, [mask2]
vpsrldq xmm7, 12
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
_barrett:
vmovdqa xmm10, [rk7] ; rk7 and rk8 in xmm10
vmovdqa xmm0, xmm7
vpclmulqdq xmm7, xmm10, 0x01
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x11
vpslldq xmm7, 4
vpxor xmm7, xmm0
vpextrd eax, xmm7,1
_cleanup:
not eax
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + XMM_SAVE + 16*0]
vmovdqa xmm7, [rsp + XMM_SAVE + 16*1]
vmovdqa xmm8, [rsp + XMM_SAVE + 16*2]
vmovdqa xmm9, [rsp + XMM_SAVE + 16*3]
vmovdqa xmm10, [rsp + XMM_SAVE + 16*4]
vmovdqa xmm11, [rsp + XMM_SAVE + 16*5]
vmovdqa xmm12, [rsp + XMM_SAVE + 16*6]
vmovdqa xmm13, [rsp + XMM_SAVE + 16*7]
%endif
add rsp,VARIABLE_OFFSET
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
_less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl _less_than_32
vmovdqa xmm11, [SHUF_MASK]
; if there is, load the constants
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm11 ; byte-reflect the plaintext
vpxor xmm7, xmm0
; update the buffer pointer
add arg2, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp _16B_reduction_loop
align 16
_less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je _cleanup
vmovdqa xmm11, [SHUF_MASK]
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
cmp arg3, 16
je _exact_16_left
jl _less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm11 ; byte-reflect the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp _get_last_two_xmms
align 16
_less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
vpxor xmm1, xmm1
mov r11, rsp
vmovdqa [r11], xmm1
cmp arg3, 4
jl _only_less_than_4
; backup the counter value
mov r9, arg3
cmp arg3, 8
jl _less_than_8_left
; load 8 Bytes
mov rax, [arg2]
mov [r11], rax
add r11, 8
sub arg3, 8
add arg2, 8
_less_than_8_left:
cmp arg3, 4
jl _less_than_4_left
; load 4 Bytes
mov eax, [arg2]
mov [r11], eax
add r11, 4
sub arg3, 4
add arg2, 4
_less_than_4_left:
cmp arg3, 2
jl _less_than_2_left
; load 2 Bytes
mov ax, [arg2]
mov [r11], ax
add r11, 2
sub arg3, 2
add arg2, 2
_less_than_2_left:
cmp arg3, 1
jl _zero_left
; load 1 Byte
mov al, [arg2]
mov [r11], al
_zero_left:
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
; shl r9, 4
lea rax, [pshufb_shf_table + 16]
sub rax, r9
vmovdqu xmm0, [rax]
vpxor xmm0, [mask1]
vpshufb xmm7, xmm0
jmp _128_done
align 16
_exact_16_left:
vmovdqu xmm7, [arg2]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
jmp _128_done
_only_less_than_4:
cmp arg3, 3
jl _only_less_than_3
; load 3 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
mov al, [arg2+2]
mov [r11+2], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 5
jmp _barrett
_only_less_than_3:
cmp arg3, 2
jl _only_less_than_2
; load 2 Bytes
mov al, [arg2]
mov [r11], al
mov al, [arg2+1]
mov [r11+1], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 6
jmp _barrett
_only_less_than_2:
; load 1 Byte
mov al, [arg2]
mov [r11], al
vmovdqa xmm7, [rsp]
vpshufb xmm7, xmm11
vpxor xmm7, xmm0 ; xor the initial crc value
vpsrldq xmm7, 7
jmp _barrett
section .data
; precomputed constants
align 16
rk1 :
DQ 0xf200aa6600000000
rk2 :
DQ 0x17d3315d00000000
rk3 :
DQ 0x022ffca500000000
rk4 :
DQ 0x9d9ee22f00000000
rk5 :
DQ 0xf200aa6600000000
rk6 :
DQ 0x490d678d00000000
rk7 :
DQ 0x0000000104d101df
rk8 :
DQ 0x0000000104c11db7
rk9 :
DQ 0x6ac7e7d700000000
rk10 :
DQ 0xfcd922af00000000
rk11 :
DQ 0x34e45a6300000000
rk12 :
DQ 0x8762c1f600000000
rk13 :
DQ 0x5395a0ea00000000
rk14 :
DQ 0x54f2d5c700000000
rk15 :
DQ 0xd3504ec700000000
rk16 :
DQ 0x57a8445500000000
rk17 :
DQ 0xc053585d00000000
rk18 :
DQ 0x766f1b7800000000
rk19 :
DQ 0xcd8c54b500000000
rk20 :
DQ 0xab40b71e00000000
mask1:
dq 0x8080808080808080, 0x8080808080808080
mask2:
dq 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF
SHUF_MASK:
dq 0x08090A0B0C0D0E0F, 0x0001020304050607
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
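
The `_less_than_16_left` path above copies a short tail into a zeroed 16-byte stack slot using progressively smaller moves (8, 4, 2, then 1 byte), so it never reads past the end of the caller's buffer. A minimal C sketch of that idea follows; the helper name `load_tail_lt16` is hypothetical and not part of the library, and unlike the assembly (which branches to `_only_less_than_4` for 1 to 3 bytes) the sketch handles every length below 16 in one routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len (< 16) bytes into a zeroed 16-byte scratch
 * buffer with at most one 8-, 4-, 2- and 1-byte move. No byte beyond
 * src + len is ever read. */
static void load_tail_lt16(uint8_t dst[16], const uint8_t *src, size_t len)
{
    size_t off = 0;

    assert(len < 16);
    memset(dst, 0, 16);               /* zero the 16B slot first */

    if (len >= 8) {                   /* load 8 bytes */
        memcpy(dst + off, src + off, 8);
        off += 8;
        len -= 8;
    }
    if (len >= 4) {                   /* load 4 bytes */
        memcpy(dst + off, src + off, 4);
        off += 4;
        len -= 4;
    }
    if (len >= 2) {                   /* load 2 bytes */
        memcpy(dst + off, src + off, 2);
        off += 2;
        len -= 2;
    }
    if (len >= 1)                     /* load the final byte */
        dst[off] = src[off];
}
```

The zero fill matters because the 16-byte slot is subsequently loaded as a whole vector; any stale stack bytes would otherwise be folded into the CRC.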

crc/crc32_ieee_by16_10.asm Normal file

@ -0,0 +1,608 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_ieee_by16_10(
; UINT32 init_crc, //initial CRC value, 32 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
;
;
%include "reg_sizes.asm"
%ifndef FUNCTION_NAME
%define FUNCTION_NAME crc32_ieee_by16_10
%endif
%ifndef fetch_dist
%define fetch_dist 1536
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht0
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg1_low32 ecx
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg1_low32 edi
%endif
align 16
mk_global FUNCTION_NAME, function
FUNCTION_NAME:
endbranch
not arg1_low32
%ifidn __OUTPUT_FORMAT__, win64
sub rsp, (16*10 + 8)
; save xmm6-xmm15 on the stack; the Win64 ABI treats them as callee-saved
vmovdqa [rsp + 16*0], xmm6
vmovdqa [rsp + 16*1], xmm7
vmovdqa [rsp + 16*2], xmm8
vmovdqa [rsp + 16*3], xmm9
vmovdqa [rsp + 16*4], xmm10
vmovdqa [rsp + 16*5], xmm11
vmovdqa [rsp + 16*6], xmm12
vmovdqa [rsp + 16*7], xmm13
vmovdqa [rsp + 16*8], xmm14
vmovdqa [rsp + 16*9], xmm15
%endif
vbroadcasti32x4 zmm18, [SHUF_MASK]
cmp arg3, 256
jl .less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; the crc value does not need to be byte-reflected, but it must be moved to the high part of the register,
; because the data will be byte-reflected and must line up with the initial crc.
vpslldq xmm10, 12
; receive the initial 64B data, xor the initial crc value
vmovdqu8 zmm0, [arg2+16*0]
vmovdqu8 zmm4, [arg2+16*4]
vpshufb zmm0, zmm0, zmm18
vpshufb zmm4, zmm4, zmm18
vpxorq zmm0, zmm10
vbroadcasti32x4 zmm10, [rk3] ;zmm10 has rk3 and rk4 in every 128-bit lane
;imm value of pclmulqdq instruction will determine which constant to use
sub arg3, 256
cmp arg3, 256
jl .fold_128_B_loop
vmovdqu8 zmm7, [arg2+16*8]
vmovdqu8 zmm8, [arg2+16*12]
vpshufb zmm7, zmm7, zmm18
vpshufb zmm8, zmm8, zmm18
vbroadcasti32x4 zmm16, [rk_1] ;zmm16 has rk-1 and rk-2
sub arg3, 256
%if fetch_dist != 0
; check if there is at least 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jb .fold_256_B_loop
align 16
.fold_and_prefetch_256_B_loop:
add arg2, 256
vmovdqu8 zmm3, [arg2+16*0]
PREFETCH [arg2+fetch_dist+0]
vpshufb zmm3, zmm3, zmm18
vpclmulqdq zmm1, zmm0, zmm16, 0x00
vpclmulqdq zmm0, zmm0, zmm16, 0x11
vpternlogq zmm0, zmm1, zmm3, 0x96
vmovdqu8 zmm9, [arg2+16*4]
PREFETCH [arg2+fetch_dist+64]
vpshufb zmm9, zmm9, zmm18
vpclmulqdq zmm5, zmm4, zmm16, 0x00
vpclmulqdq zmm4, zmm4, zmm16, 0x11
vpternlogq zmm4, zmm5, zmm9, 0x96
vmovdqu8 zmm11, [arg2+16*8]
PREFETCH [arg2+fetch_dist+64*2]
vpshufb zmm11, zmm11, zmm18
vpclmulqdq zmm12, zmm7, zmm16, 0x00
vpclmulqdq zmm7, zmm7, zmm16, 0x11
vpternlogq zmm7, zmm12, zmm11, 0x96
vmovdqu8 zmm17, [arg2+16*12]
PREFETCH [arg2+fetch_dist+64*3]
vpshufb zmm17, zmm17, zmm18
vpclmulqdq zmm14, zmm8, zmm16, 0x00
vpclmulqdq zmm8, zmm8, zmm16, 0x11
vpternlogq zmm8, zmm14, zmm17, 0x96
sub arg3, 256
; check if there is another 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jge .fold_and_prefetch_256_B_loop
%endif ; fetch_dist != 0
align 16
.fold_256_B_loop:
add arg2, 256
vmovdqu8 zmm3, [arg2+16*0]
vpshufb zmm3, zmm3, zmm18
vpclmulqdq zmm1, zmm0, zmm16, 0x00
vpclmulqdq zmm0, zmm0, zmm16, 0x11
vpternlogq zmm0, zmm1, zmm3, 0x96
vmovdqu8 zmm9, [arg2+16*4]
vpshufb zmm9, zmm9, zmm18
vpclmulqdq zmm5, zmm4, zmm16, 0x00
vpclmulqdq zmm4, zmm4, zmm16, 0x11
vpternlogq zmm4, zmm5, zmm9, 0x96
vmovdqu8 zmm11, [arg2+16*8]
vpshufb zmm11, zmm11, zmm18
vpclmulqdq zmm12, zmm7, zmm16, 0x00
vpclmulqdq zmm7, zmm7, zmm16, 0x11
vpternlogq zmm7, zmm12, zmm11, 0x96
vmovdqu8 zmm17, [arg2+16*12]
vpshufb zmm17, zmm17, zmm18
vpclmulqdq zmm14, zmm8, zmm16, 0x00
vpclmulqdq zmm8, zmm8, zmm16, 0x11
vpternlogq zmm8, zmm14, zmm17, 0x96
sub arg3, 256
jge .fold_256_B_loop
;; Fold 256 into 128
add arg2, 256
vpclmulqdq zmm1, zmm0, zmm10, 0x00
vpclmulqdq zmm2, zmm0, zmm10, 0x11
vpternlogq zmm7, zmm1, zmm2, 0x96 ; xor ABC
vpclmulqdq zmm5, zmm4, zmm10, 0x00
vpclmulqdq zmm6, zmm4, zmm10, 0x11
vpternlogq zmm8, zmm5, zmm6, 0x96 ; xor ABC
vmovdqa32 zmm0, zmm7
vmovdqa32 zmm4, zmm8
add arg3, 128
jmp .less_than_128_B
; at this section of the code, there are 128*x+y (0<=y<128) bytes of buffer. The fold_128_B_loop
; loop will fold 128B at a time until we have 128+y bytes of buffer.
; This section of the code folds eight 128-bit lanes (2 zmm registers) in parallel
align 16
.fold_128_B_loop:
add arg2, 128
vmovdqu8 zmm8, [arg2+16*0]
vpshufb zmm8, zmm8, zmm18
vpclmulqdq zmm2, zmm0, zmm10, 0x00
vpclmulqdq zmm0, zmm0, zmm10, 0x11
vpternlogq zmm0, zmm2, zmm8, 0x96
vmovdqu8 zmm9, [arg2+16*4]
vpshufb zmm9, zmm9, zmm18
vpclmulqdq zmm5, zmm4, zmm10, 0x00
vpclmulqdq zmm4, zmm4, zmm10, 0x11
vpternlogq zmm4, zmm5, zmm9, 0x96
sub arg3, 128
jge .fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
align 16
.less_than_128_B:
;; At this point, the buffer pointer is pointing at the last
;; y bytes of the buffer, where 0 <= y < 128.
;; The 128 bytes of folded data is in 2 of the zmm registers:
;; zmm0 and zmm4
cmp arg3, -64
jl .fold_128_B_register
vbroadcasti32x4 zmm10, [rk15]
;; If there are still 64 bytes left, folds from 128 bytes to 64 bytes
;; and handles the next 64 bytes
vpclmulqdq zmm2, zmm0, zmm10, 0x00
vpclmulqdq zmm0, zmm0, zmm10, 0x11
vpternlogq zmm0, zmm2, zmm4, 0x96
add arg3, 128
jmp .fold_64B_loop
align 16
.fold_128_B_register:
; fold the 8 128b parts into 1 xmm register with different constants
vmovdqu8 zmm16, [rk9] ; multiply by rk9-rk16
vmovdqu8 zmm11, [rk17] ; multiply by rk17-rk20, rk1,rk2, 0,0
vpclmulqdq zmm1, zmm0, zmm16, 0x00
vpclmulqdq zmm2, zmm0, zmm16, 0x11
vextracti64x2 xmm7, zmm4, 3 ; save last that has no multiplicand
vpclmulqdq zmm5, zmm4, zmm11, 0x00
vpclmulqdq zmm6, zmm4, zmm11, 0x11
vmovdqa xmm10, [rk1] ; Needed later in reduction loop
vpternlogq zmm1, zmm2, zmm5, 0x96 ; xor ABC
vpternlogq zmm1, zmm6, zmm7, 0x96 ; xor ABC
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
; instead of 128, we add 128-16 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl .final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
align 16
.16B_reduction_loop:
vpclmulqdq xmm8, xmm7, xmm10, 0x11
vpclmulqdq xmm7, xmm7, xmm10, 0x00
vpxor xmm7, xmm8
vmovdqu xmm0, [arg2]
vpshufb xmm0, xmm0, xmm18
vpxor xmm7, xmm0
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge .16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
align 16
.final_reduction_for_128:
add arg3, 16
je .128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset
; the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
align 16
.get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
vpshufb xmm1, xmm18
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [rel pshufb_shf_table + 16]
sub rax, arg3
vmovdqu xmm0, [rax]
vpshufb xmm2, xmm0
vpxor xmm0, [mask1]
vpshufb xmm7, xmm0
vpblendvb xmm1, xmm1, xmm2, xmm0
vpclmulqdq xmm8, xmm7, xmm10, 0x11
vpclmulqdq xmm7, xmm7, xmm10, 0x00
vpternlogq xmm7, xmm8, xmm1, 0x96
align 16
.128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5]
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0x01 ; H*L
vpslldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vpand xmm0, xmm7, [mask2]
vpsrldq xmm7, 12
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
align 16
.barrett:
vmovdqa xmm10, [rk7] ; rk7 and rk8 in xmm10
vmovdqa xmm0, xmm7
vpclmulqdq xmm7, xmm10, 0x01
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x11
vpslldq xmm7, 4
vpxor xmm7, xmm0
vpextrd eax, xmm7, 1
align 16
.cleanup:
not eax
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + 16*0]
vmovdqa xmm7, [rsp + 16*1]
vmovdqa xmm8, [rsp + 16*2]
vmovdqa xmm9, [rsp + 16*3]
vmovdqa xmm10, [rsp + 16*4]
vmovdqa xmm11, [rsp + 16*5]
vmovdqa xmm12, [rsp + 16*6]
vmovdqa xmm13, [rsp + 16*7]
vmovdqa xmm14, [rsp + 16*8]
vmovdqa xmm15, [rsp + 16*9]
add rsp, (16*10 + 8)
%endif
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
.less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl .less_than_32
vmovd xmm1, arg1_low32 ; get the initial crc value
vpslldq xmm1, 12
cmp arg3, 64
jl .less_than_64
;; receive the initial 64B data, xor the initial crc value
vmovdqu8 zmm0, [arg2]
vpshufb zmm0, zmm18
vpxorq zmm0, zmm1
add arg2, 64
sub arg3, 64
cmp arg3, 64
jb .reduce_64B
vbroadcasti32x4 zmm10, [rk15]
align 16
.fold_64B_loop:
vmovdqu8 zmm4, [arg2]
vpshufb zmm4, zmm18
vpclmulqdq zmm2, zmm0, zmm10, 0x11
vpclmulqdq zmm0, zmm0, zmm10, 0x00
vpternlogq zmm0, zmm2, zmm4, 0x96
add arg2, 64
sub arg3, 64
cmp arg3, 64
jge .fold_64B_loop
align 16
.reduce_64B:
; Reduce from 64 bytes to 16 bytes
vmovdqu8 zmm11, [rk17]
vpclmulqdq zmm1, zmm0, zmm11, 0x11
vpclmulqdq zmm2, zmm0, zmm11, 0x00
vextracti64x2 xmm7, zmm0, 3 ; save last that has no multiplicand
vpternlogq zmm1, zmm2, zmm7, 0x96
vmovdqa xmm10, [rk_1b] ; Needed later in reduction loop
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
sub arg3, 16
jns .16B_reduction_loop ; At least 16 bytes of data to digest
jmp .final_reduction_for_128
align 16
.less_than_64:
;; at least 32 bytes are available here: load the constants
vmovdqa xmm10, [rk_1b]
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm18
vpxor xmm7, xmm1 ; xmm1 already has initial crc value
;; update the buffer pointer
add arg2, 16
;; update the counter
;; - subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp .16B_reduction_loop
align 16
.less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je .cleanup
vmovd xmm0, arg1_low32 ; get the initial crc value
vpslldq xmm0, 12 ; align it to its correct place
cmp arg3, 16
je .exact_16_left
jl .less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp .get_last_two_xmms
align 16
.less_than_16_left:
xor r10, r10
bts r10, arg3
dec r10
kmovw k2, r10d
vmovdqu8 xmm7{k2}{z}, [arg2]
vpshufb xmm7, xmm18 ; byte-reflect the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
cmp arg3, 4
jb .only_less_than_4
lea rax, [rel pshufb_shf_table + 16]
sub rax, arg3
vmovdqu xmm0, [rax]
vpxor xmm0, [mask1]
vpshufb xmm7,xmm0
jmp .128_done
align 16
.only_less_than_4:
lea r11, [rel pshufb_shift_table + 3]
sub r11, arg3
vmovdqu xmm0, [r11]
vpshufb xmm7, xmm0
jmp .barrett
align 32
.exact_16_left:
vmovdqu xmm7, [arg2]
vpshufb xmm7, xmm18
vpxor xmm7, xmm0 ; xor the initial crc value
jmp .128_done
section .data
align 32
%ifndef USE_CONSTS
; precomputed constants
rk_1: dq 0x1851689900000000
rk_2: dq 0xa3dc855100000000
rk1: dq 0xf200aa6600000000
rk2: dq 0x17d3315d00000000
rk3: dq 0x022ffca500000000
rk4: dq 0x9d9ee22f00000000
rk5: dq 0xf200aa6600000000
rk6: dq 0x490d678d00000000
rk7: dq 0x0000000104d101df
rk8: dq 0x0000000104c11db7
rk9: dq 0x6ac7e7d700000000
rk10: dq 0xfcd922af00000000
rk11: dq 0x34e45a6300000000
rk12: dq 0x8762c1f600000000
rk13: dq 0x5395a0ea00000000
rk14: dq 0x54f2d5c700000000
rk15: dq 0xd3504ec700000000
rk16: dq 0x57a8445500000000
rk17: dq 0xc053585d00000000
rk18: dq 0x766f1b7800000000
rk19: dq 0xcd8c54b500000000
rk20: dq 0xab40b71e00000000
rk_1b: dq 0xf200aa6600000000
rk_2b: dq 0x17d3315d00000000
dq 0x0000000000000000
dq 0x0000000000000000
%else
INCLUDE_CONSTS
%endif
align 16
pshufb_shift_table:
;; use these values to shift data for the pshufb instruction
db 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B,
db 0x0C, 0x0D, 0x0E, 0x0F, 0xFF, 0xFF, 0xFF, 0xFF
db 0xFF, 0xFF
mask1: dq 0x8080808080808080, 0x8080808080808080
mask2: dq 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF
SHUF_MASK: dq 0x08090A0B0C0D0E0F, 0x0001020304050607
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
dq 0x8080808080808080, 0x0f0e0d0c0b0a0908
dq 0x8080808080808080, 0x8080808080808080
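
In `.less_than_16_left` above, the stack-based tail copy is replaced by an AVX-512 opmask: `xor` / `bts` / `dec` builds `(1 << len) - 1`, and the zero-masking `vmovdqu8 xmm7{k2}{z}, [arg2]` loads only the bytes whose mask bit is set. The following scalar C model is a sketch of that behaviour; `tail_mask` and `masked_load16` are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mask bit per valid byte: (1 << len) - 1, as built by xor/bts/dec. */
static uint16_t tail_mask(unsigned len)
{
    assert(len < 16);
    return (uint16_t)((1u << len) - 1u);
}

/* Scalar model of a zero-masking byte load: bytes with a clear mask bit
 * are not read from src and are written as zero in dst. */
static void masked_load16(uint8_t dst[16], const uint8_t *src, uint16_t mask)
{
    for (unsigned i = 0; i < 16; i++)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0;
}
```

Because masked-off bytes are never touched, a buffer of 1 to 15 bytes can be loaded with a single instruction without reading beyond its end.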


@ -45,7 +45,13 @@
%include "reg_sizes.asm"
%define fetch_dist 1024
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
@ -74,8 +80,9 @@ section .text
%endif
align 16
global crc32_ieee_by4:function
mk_global crc32_ieee_by4, function
crc32_ieee_by4:
endbranch
not arg1_low32
@ -128,14 +135,17 @@ crc32_ieee_by4:
; buffer. The _fold_64_B_loop loop will fold 64B at a time until we
; have 64+y Bytes of buffer
%if fetch_dist != 0
; check if there is another 4KB (fetch distance) + 64B in the buffer
cmp arg3, (fetch_dist + 64)
jge _fold_and_prefetch_64_B_loop
; fold 64B at a time. This section of the code folds 4 xmm registers in parallel
_fold_64_B_loop:
align 16
_fold_and_prefetch_64_B_loop:
;update the buffer pointer
add arg2, 64
prefetchnta [arg2+fetch_dist+0]
movdqa xmm4, xmm0
movdqa xmm5, xmm1
@ -148,7 +158,61 @@ _fold_64_B_loop:
pxor xmm0, xmm4
pxor xmm1, xmm5
prefetchnta [arg2+fetch_dist+32]
movdqa xmm4, xmm2
movdqa xmm5, xmm3
pclmulqdq xmm2, xmm6, 0x11
pclmulqdq xmm3, xmm6, 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm2, xmm4
pxor xmm3, xmm5
movdqu xmm4, [arg2]
movdqu xmm5, [arg2+16]
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqu xmm4, [arg2+32]
movdqu xmm5, [arg2+48]
pshufb xmm4, xmm7
pshufb xmm5, xmm7
pxor xmm2, xmm4
pxor xmm3, xmm5
sub arg3, 64
; check if there is another 4KB (fetch distance) + 64B in the buffer
cmp arg3, (fetch_dist + 64)
jge _fold_and_prefetch_64_B_loop
%endif ; fetch_dist != 0
; fold 64B at a time. This section of the code folds 4 xmm registers in parallel
align 16
_fold_64_B_loop:
;update the buffer pointer
add arg2, 64
PREFETCH [arg2+fetch_dist+0]
movdqa xmm4, xmm0
movdqa xmm5, xmm1
pclmulqdq xmm0, xmm6 , 0x11
pclmulqdq xmm1, xmm6 , 0x11
pclmulqdq xmm4, xmm6, 0x0
pclmulqdq xmm5, xmm6, 0x0
pxor xmm0, xmm4
pxor xmm1, xmm5
movdqa xmm4, xmm2
movdqa xmm5, xmm3
@ -560,6 +624,3 @@ pshufb_shf_table:
SHUF_MASK dq 0x08090A0B0C0D0E0F, 0x0001020304050607
;;; func core, ver, snum
slversion crc32_ieee_by4, 05, 02, 0017
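
The change to `crc32_ieee_by4` splits the main loop in two: a prefetching loop that runs only while at least `fetch_dist + 64` bytes remain, and a plain loop for the rest, so no prefetch targets data past the end of the buffer. A rough C sketch of that control flow is below. Everything in it is an assumption made for illustration: `consume_block` is a stand-in for the real PCLMULQDQ fold, `process` and `PREFETCH_RD` are hypothetical names, and the locality hint passed to `__builtin_prefetch` is only a guess at an equivalent of `prefetcht1`.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK      64u      /* bytes folded per iteration */
#define FETCH_DIST 4096u    /* assumed to mirror fetch_dist in the diff */

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_RD(p) __builtin_prefetch((p), 0, 2)
#else
#define PREFETCH_RD(p) ((void)(p))
#endif

/* Stand-in for the real fold step: any function of one 64-byte block. */
static uint32_t consume_block(uint32_t acc, const uint8_t *p)
{
    for (size_t i = 0; i < BLOCK; i++)
        acc = acc * 31u + p[i];
    return acc;
}

/* Processes whole blocks; reports how many prefetches were issued. */
static uint32_t process(const uint8_t *buf, size_t len, size_t *prefetches)
{
    uint32_t acc = 0;
    size_t issued = 0;

    /* Loop 1: the block at buf + FETCH_DIST lies fully inside the buffer. */
    while (len >= FETCH_DIST + BLOCK) {
        PREFETCH_RD(buf + FETCH_DIST);
        acc = consume_block(acc, buf);
        buf += BLOCK;
        len -= BLOCK;
        issued++;
    }
    /* Loop 2: fewer than FETCH_DIST + BLOCK bytes remain, no prefetching. */
    while (len >= BLOCK) {
        acc = consume_block(acc, buf);
        buf += BLOCK;
        len -= BLOCK;
    }
    if (prefetches)
        *prefetches = issued;
    return acc;
}
```

The trade-off is some duplicated loop body in exchange for no wasted prefetches on short buffers or near the end of long ones.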


@ -31,57 +31,53 @@
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/time.h>
#include "crc.h"
#include "test.h"
//#define CACHED_TEST
#ifdef CACHED_TEST
#ifndef GT_L3_CACHE
#define GT_L3_CACHE 32 * 1024 * 1024 /* some number > last level cache */
#endif
#if !defined(COLD_TEST) && !defined(TEST_CUSTOM)
// Cached test, loop many times over small dataset
# define TEST_LEN 8*1024
# define TEST_LOOPS 400000
# define TEST_TYPE_STR "_warm"
#else
#define TEST_LEN 8 * 1024
#define TEST_TYPE_STR "_warm"
#elif defined(COLD_TEST)
// Uncached test. Pull from large mem base.
# define GT_L3_CACHE 32*1024*1024 /* some number > last level cache */
# define TEST_LEN (2 * GT_L3_CACHE)
# define TEST_LOOPS 100
# define TEST_TYPE_STR "_cold"
#define TEST_LEN (2 * GT_L3_CACHE)
#define TEST_TYPE_STR "_cold"
#endif
#ifndef TEST_SEED
# define TEST_SEED 0x1234
#define TEST_SEED 0x1234
#endif
#define TEST_MEM TEST_LEN
int main(int argc, char *argv[])
int
main(int argc, char *argv[])
{
int i;
void *buf;
uint32_t crc;
struct perf start, stop;
void *buf;
uint32_t crc;
struct perf start;
printf("crc32_ieee_perf:\n");
printf("crc32_ieee_perf:\n");
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
printf("Start timed tests\n");
fflush(0);
printf("Start timed tests\n");
fflush(0);
memset(buf, 0, TEST_LEN);
crc = crc32_ieee(TEST_SEED, buf, TEST_LEN);
perf_start(&start);
for (i = 0; i < TEST_LOOPS; i++) {
crc = crc32_ieee(TEST_SEED, buf, TEST_LEN);
}
perf_stop(&stop);
printf("crc32_ieee" TEST_TYPE_STR ": ");
perf_print(stop, start, (long long)TEST_LEN * i);
memset(buf, 0, TEST_LEN);
BENCHMARK(&start, BENCHMARK_TIME, crc = crc32_ieee(TEST_SEED, buf, TEST_LEN));
printf("crc32_ieee" TEST_TYPE_STR ": ");
perf_print(start, (long long) TEST_LEN);
printf("finish 0x%x\n", crc);
return 0;
printf("finish 0x%x\n", crc);
aligned_free(buf);
return 0;
}


@ -1,174 +0,0 @@
/**********************************************************************
Copyright(c) 2011-2015 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "crc.h"
#include "types.h"
#ifndef TEST_SEED
# define TEST_SEED 0x1234
#endif
#define MAX_BUF 512
#define TEST_SIZE 20
typedef uint64_t u64;
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t u8;
// Generates pseudo-random data
void rand_buffer(unsigned char *buf, long buffer_size)
{
long i;
for (i = 0; i < buffer_size; i++)
buf[i] = rand();
}
int main(int argc, char *argv[])
{
int fail = 0;
u32 r;
int verbose = argc - 1;
int i, s, ret;
void *buf_alloc;
unsigned char *buf;
printf("Test crc32_ieee ");
// Align to MAX_BUF boundary
ret = posix_memalign(&buf_alloc, MAX_BUF, MAX_BUF * TEST_SIZE);
if (ret) {
printf("alloc error: Fail");
return -1;
}
buf = (unsigned char *)buf_alloc;
srand(TEST_SEED);
// Test of all zeros
memset(buf, 0, MAX_BUF * 10);
u32 crc = crc32_ieee(TEST_SEED, buf, MAX_BUF);
u32 crc_ref = crc32_ieee_base(TEST_SEED, buf, MAX_BUF);
if (crc != crc_ref) {
fail++;
printf("\n opt ref\n");
printf(" ------ ------\n");
printf("crc zero = 0x%8x 0x%8x \n", crc, crc_ref);
} else
printf(".");
// Another simple test pattern
memset(buf, 0x8a, MAX_BUF);
crc = crc32_ieee(TEST_SEED, buf, MAX_BUF);
crc_ref = crc32_ieee_base(TEST_SEED, buf, MAX_BUF);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc all 8a = 0x%8x 0x%8x\n", crc, crc_ref);
else
printf(".");
// Do a few random tests
r = rand();
rand_buffer(buf, MAX_BUF * TEST_SIZE);
for (i = 0; i < TEST_SIZE; i++) {
crc = crc32_ieee(r, buf, MAX_BUF);
crc_ref = crc32_ieee_base(r, buf, MAX_BUF);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc rand%3d = 0x%8x 0x%8x\n", i, crc, crc_ref);
else
printf(".");
buf += MAX_BUF;
}
// Do a few random sizes
buf = (unsigned char *)buf_alloc; //reset buf
r = rand();
for (i = MAX_BUF; i >= 0; i--) {
crc = crc32_ieee(r, buf, i);
crc_ref = crc32_ieee_base(r, buf, i);
if (crc != crc_ref) {
fail++;
printf("fail random size%i 0x%8x 0x%8x\n", i, crc, crc_ref);
} else
printf(".");
}
// Try different seeds
for (s = 0; s < 20; s++) {
buf = (unsigned char *)buf_alloc; //reset buf
r = rand(); // just to get a new seed
rand_buffer(buf, MAX_BUF * TEST_SIZE); // new pseudo-rand data
if (verbose)
printf("seed = 0x%x\n", r);
for (i = 0; i < TEST_SIZE; i++) {
crc = crc32_ieee(r, buf, MAX_BUF);
crc_ref = crc32_ieee_base(r, buf, MAX_BUF);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc rand%3d = 0x%8x 0x%8x\n", i, crc, crc_ref);
else
printf(".");
buf += MAX_BUF;
}
}
// Run tests at end of buffer
buf = (unsigned char *)buf_alloc; //reset buf
buf = buf + ((MAX_BUF - 1) * TEST_SIZE); //Line up TEST_SIZE from end
for (i = 0; i < TEST_SIZE; i++) {
crc = crc32_ieee(TEST_SEED, buf + i, TEST_SIZE - i);
crc_ref = crc32_ieee_base(TEST_SEED, buf + i, TEST_SIZE - i);
if (crc != crc_ref)
fail++;
if (verbose)
printf("crc eob rand%3d = 0x%4x 0x%4x\n", i, crc, crc_ref);
else
printf(".");
}
printf("Test done: %s\n", fail ? "Fail" : "Pass");
if (fail)
printf("\nFailed %d tests\n", fail);
return fail;
}
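
The test above checks the optimized `crc32_ieee` against `crc32_ieee_base`. As a point of reference, here is a minimal bit-at-a-time sketch of what such a baseline might compute. It assumes the MSB-first polynomial 0x04C11DB7 with the seed and the result both inverted, which is what the `rk8` constant and the `not arg1_low32` / `not eax` pair in the assembly in this diff suggest; the function name `crc32_ieee_bitwise` is hypothetical and this is not the library's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise, MSB-first CRC-32 with polynomial 0x04C11DB7.
 * The seed is inverted on entry and the remainder inverted on exit,
 * so an empty buffer returns the seed unchanged. */
static uint32_t crc32_ieee_bitwise(uint32_t seed, const uint8_t *buf, size_t len)
{
    uint32_t crc = ~seed;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;          /* feed byte at the top */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u
                                      : (crc << 1);
    }
    return ~crc;
}
```

With a seed of 0, the ASCII string "123456789" yields 0xFC891918 under these parameters, which makes a convenient sanity check before comparing against a vectorized implementation.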


@ -1,656 +0,0 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2015 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function to compute iscsi CRC32 with table-based recombination
; crc done "by 3" with block sizes 1920, 960, 480, 240
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
%include "reg_sizes.asm"
default rel
; crcB3 MACRO to implement crc32 on 3 %%bSize-byte blocks
%macro crcB3 3
%define %%bSize %1 ; 1/3 of buffer size
%define %%td2 %2 ; table offset for crc0 (2/3 of buffer)
%define %%td1 %3 ; table offset for crc1 (1/3 of buffer)
%IF %%bSize=640
sub len, %%bSize*3
js %%crcB3_end ;; jump to next level if 3*blockSize > len
%ELSE
cmp len, %%bSize*3
jnae %%crcB3_end ;; jump to next level if 3*blockSize > len
%ENDIF
;;;;;; Calculate CRC of 3 blocks of the buffer ;;;;;;
%%crcB3_loop:
;; rax = crc0 = initial crc
xor rbx, rbx ;; rbx = crc1 = 0;
xor r10, r10 ;; r10 = crc2 = 0;
%assign i 0
%rep %%bSize/8 - 1
%if i < %%bSize*3/4
prefetchnta [bufptmp+ %%bSize*3 +i*4]
%endif
crc32 rax, qword [bufptmp+i + 0*%%bSize] ;; update crc0
crc32 rbx, qword [bufptmp+i + 1*%%bSize] ;; update crc1
crc32 r10, qword [bufptmp+i + 2*%%bSize] ;; update crc2
%assign i (i+8)
%endrep
crc32 rax, qword [bufptmp+i + 0*%%bSize] ;; update crc0
crc32 rbx, qword [bufptmp+i + 1*%%bSize] ;; update crc1
; SKIP ;crc32 r10, [bufptmp+i + 2*%%bSize] ;; update crc2
; merge in crc0
movzx bufp_dw, al
mov r9d, [crc_init + bufp*4 + %%td2]
movzx bufp_dw, ah
shr eax, 16
mov r11d, [crc_init + bufp*4 + %%td2]
shl r11, 8
xor r9, r11
movzx bufp_dw, al
mov r11d, [crc_init + bufp*4 + %%td2]
movzx bufp_dw, ah
shl r11, 16
xor r9, r11
mov r11d, [crc_init + bufp*4 + %%td2]
shl r11, 24
xor r9, r11
; merge in crc1
movzx bufp_dw, bl
mov r11d, [crc_init + bufp*4 + %%td1]
movzx bufp_dw, bh
shr ebx, 16
xor r9, r11
mov r11d, [crc_init + bufp*4 + %%td1]
shl r11, 8
xor r9, r11
movzx bufp_dw, bl
mov r11d, [crc_init + bufp*4 + %%td1]
movzx bufp_dw, bh
shl r11, 16
xor r9, r11
mov r11d, [crc_init + bufp*4 + %%td1]
shl r11, 24
xor r9, r11
xor r9, [bufptmp+i + 2*%%bSize]
crc32 r10, r9
mov rax, r10
add bufptmp, %%bSize*3 ;; move to next block
sub len, %%bSize*3
%IF %%bSize=640
jns %%crcB3_loop
%ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
%%crcB3_end:
%IF %%bSize=640
add len, %%bSize*3
%ENDIF
je do_return ;; return if remaining data is zero
%endmacro
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; ISCSI CRC 32 Implementation with crc32 Instruction
;;; unsigned int crc32_iscsi_00(unsigned char * buffer, int len, unsigned int crc_init);
;;;
;;; *buf = rcx
;;; len = rdx
;;; crc_init = r8
;;;
global crc32_iscsi_00:function
crc32_iscsi_00:
%ifidn __OUTPUT_FORMAT__, elf64
%define bufp rdi
%define bufp_dw edi
%define bufp_w di
%define bufp_b dil
%define bufptmp rcx
%define block_0 rcx
%define block_1 r8
%define block_2 r11
%define len rsi
%define len_dw esi
%define len_w si
%define len_b sil
%define crc_init rdx
%define crc_init_dw edx
%else
%define bufp rcx
%define bufp_dw ecx
%define bufp_w cx
%define bufp_b cl
%define bufptmp rdi
%define block_0 rdi
%define block_1 rsi
%define block_2 r11
%define len rdx
%define len_dw edx
%define len_w dx
%define len_b dl
%define crc_init r8
%define crc_init_dw r8d
%endif
push rdi
push rbx
mov rax, crc_init ;; rax = crc_init;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; 1) ALIGN: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov bufptmp, bufp ;; rdi = *buf
neg bufp
and bufp, 7 ;; calculate the unalignment
;; amount of the address
je proc_block ;; Skip if aligned
cmp len, 8
jb less_than_8
;;;; Calculate CRC of unaligned bytes of the buffer (if any) ;;;;
mov rbx, [bufptmp] ;; load a quadword from the buffer
add bufptmp, bufp ;; align buffer pointer for
;; quadword processing
sub len, bufp ;; update buffer length
align_loop:
crc32 eax, bl ;; compute crc32 of 1-byte
shr rbx, 8 ;; get next byte
dec bufp
jne align_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; 2) BLOCK LEVEL: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
proc_block:
cmp len, 240
jb bit8
lea crc_init, [mul_table_72] ;; load table base address
crcB3 640, 0x1000, 0x0c00 ; 640*3 = 1920 (Tables 1280, 640)
crcB3 320, 0x0c00, 0x0800 ; 320*3 = 960 (Tables 640, 320)
crcB3 160, 0x0800, 0x0400 ; 160*3 = 480 (Tables 320, 160)
crcB3 80, 0x0400, 0x0000 ; 80*3 = 240 (Tables 160, 80)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;4) LESS THAN 256-bytes REMAIN AT THIS POINT (8-bits of rdx are full)
bit8:
shl len_b, 1 ;; shift-out MSB (bit-8)
jnc bit7 ;; jump to bit-7 if bit-8 == 0
%assign i 0
%rep 16
crc32 rax, qword [bufptmp+i] ;; compute crc32 of 8-byte data
%assign i (i+8)
%endrep
je do_return ;; return if remaining data is zero
add bufptmp, 128 ;; buf +=128; (next 128 bytes)
bit7:
shl len_b, 1 ;; shift-out MSB (bit-7)
jnc bit6 ;; jump to bit-6 if bit-7 == 0
%assign i 0
%rep 8
crc32 rax, qword [bufptmp+i] ;; compute crc32 of 8-byte data
%assign i (i+8)
%endrep
je do_return ;; return if remaining data is zero
add bufptmp, 64 ;; buf +=64; (next 64 bytes)
bit6:
shl len_b, 1 ;; shift-out MSB (bit-6)
jnc bit5 ;; jump to bit-5 if bit-6 == 0
%assign i 0
%rep 4
crc32 rax, qword [bufptmp+i] ;; compute crc32 of 8-byte data
%assign i (i+8)
%endrep
je do_return ;; return if remaining data is zero
add bufptmp, 32 ;; buf +=32; (next 32 bytes)
bit5:
shl len_b, 1 ;; shift-out MSB (bit-5)
jnc bit4 ;; jump to bit-4 if bit-5 == 0
%assign i 0
%rep 2
crc32 rax, qword [bufptmp+i] ;; compute crc32 of 8-byte data
%assign i (i+8)
%endrep
je do_return ;; return if remaining data is zero
add bufptmp, 16 ;; buf +=16; (next 16 bytes)
bit4:
shl len_b, 1 ;; shift-out MSB (bit-4)
jnc bit3 ;; jump to bit-3 if bit-4 == 0
crc32 rax, qword [bufptmp] ;; compute crc32 of 8-byte data
je do_return ;; return if remaining data is zero
add bufptmp, 8 ;; buf +=8; (next 8 bytes)
bit3:
mov rbx, qword [bufptmp] ;; load 8 bytes from the buffer
shl len_b, 1 ;; shift-out MSB (bit-3)
jnc bit2 ;; jump to bit-2 if bit-3 == 0
crc32 eax, ebx ;; compute crc32 of 4-byte data
je do_return ;; return if remaining data is zero
shr rbx, 32 ;; get next 3 bytes
bit2:
shl len_b, 1 ;; shift-out MSB (bit-2)
jnc bit1 ;; jump to bit-1 if bit-2 == 0
crc32 eax, bx ;; compute crc32 of 2-byte data
je do_return ;; return if remaining data is zero
shr rbx, 16 ;; next byte
bit1:
test len_b,len_b
je do_return
crc32 eax, bl ;; compute crc32 of 1-byte data
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
do_return:
pop rbx
pop rdi
ret
less_than_8:
test len,4
jz less_than_4
crc32 eax, dword[bufptmp]
add bufptmp,4
less_than_4:
test len,2
jz less_than_2
crc32 eax, word[bufptmp]
add bufptmp,2
less_than_2:
test len,1
jz do_return
crc32 rax, byte[bufptmp]
pop rbx
pop bufptmp
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; global mul_table_72, mul_table_152, mul_table_312, mul_table_632, mul_table_1272
section .data
align 8
mul_table_72:
DD 0x00000000,0x39d3b296,0x73a7652c,0x4a74d7ba
DD 0xe74eca58,0xde9d78ce,0x94e9af74,0xad3a1de2
DD 0xcb71e241,0xf2a250d7,0xb8d6876d,0x810535fb
DD 0x2c3f2819,0x15ec9a8f,0x5f984d35,0x664bffa3
DD 0x930fb273,0xaadc00e5,0xe0a8d75f,0xd97b65c9
DD 0x7441782b,0x4d92cabd,0x07e61d07,0x3e35af91
DD 0x587e5032,0x61ade2a4,0x2bd9351e,0x120a8788
DD 0xbf309a6a,0x86e328fc,0xcc97ff46,0xf5444dd0
DD 0x23f31217,0x1a20a081,0x5054773b,0x6987c5ad
DD 0xc4bdd84f,0xfd6e6ad9,0xb71abd63,0x8ec90ff5
DD 0xe882f056,0xd15142c0,0x9b25957a,0xa2f627ec
DD 0x0fcc3a0e,0x361f8898,0x7c6b5f22,0x45b8edb4
DD 0xb0fca064,0x892f12f2,0xc35bc548,0xfa8877de
DD 0x57b26a3c,0x6e61d8aa,0x24150f10,0x1dc6bd86
DD 0x7b8d4225,0x425ef0b3,0x082a2709,0x31f9959f
DD 0x9cc3887d,0xa5103aeb,0xef64ed51,0xd6b75fc7
DD 0x47e6242e,0x7e3596b8,0x34414102,0x0d92f394
DD 0xa0a8ee76,0x997b5ce0,0xd30f8b5a,0xeadc39cc
DD 0x8c97c66f,0xb54474f9,0xff30a343,0xc6e311d5
DD 0x6bd90c37,0x520abea1,0x187e691b,0x21addb8d
DD 0xd4e9965d,0xed3a24cb,0xa74ef371,0x9e9d41e7
DD 0x33a75c05,0x0a74ee93,0x40003929,0x79d38bbf
DD 0x1f98741c,0x264bc68a,0x6c3f1130,0x55eca3a6
DD 0xf8d6be44,0xc1050cd2,0x8b71db68,0xb2a269fe
DD 0x64153639,0x5dc684af,0x17b25315,0x2e61e183
DD 0x835bfc61,0xba884ef7,0xf0fc994d,0xc92f2bdb
DD 0xaf64d478,0x96b766ee,0xdcc3b154,0xe51003c2
DD 0x482a1e20,0x71f9acb6,0x3b8d7b0c,0x025ec99a
DD 0xf71a844a,0xcec936dc,0x84bde166,0xbd6e53f0
DD 0x10544e12,0x2987fc84,0x63f32b3e,0x5a2099a8
DD 0x3c6b660b,0x05b8d49d,0x4fcc0327,0x761fb1b1
DD 0xdb25ac53,0xe2f61ec5,0xa882c97f,0x91517be9
DD 0x8fcc485c,0xb61ffaca,0xfc6b2d70,0xc5b89fe6
DD 0x68828204,0x51513092,0x1b25e728,0x22f655be
DD 0x44bdaa1d,0x7d6e188b,0x371acf31,0x0ec97da7
DD 0xa3f36045,0x9a20d2d3,0xd0540569,0xe987b7ff
DD 0x1cc3fa2f,0x251048b9,0x6f649f03,0x56b72d95
DD 0xfb8d3077,0xc25e82e1,0x882a555b,0xb1f9e7cd
DD 0xd7b2186e,0xee61aaf8,0xa4157d42,0x9dc6cfd4
DD 0x30fcd236,0x092f60a0,0x435bb71a,0x7a88058c
DD 0xac3f5a4b,0x95ece8dd,0xdf983f67,0xe64b8df1
DD 0x4b719013,0x72a22285,0x38d6f53f,0x010547a9
DD 0x674eb80a,0x5e9d0a9c,0x14e9dd26,0x2d3a6fb0
DD 0x80007252,0xb9d3c0c4,0xf3a7177e,0xca74a5e8
DD 0x3f30e838,0x06e35aae,0x4c978d14,0x75443f82
DD 0xd87e2260,0xe1ad90f6,0xabd9474c,0x920af5da
DD 0xf4410a79,0xcd92b8ef,0x87e66f55,0xbe35ddc3
DD 0x130fc021,0x2adc72b7,0x60a8a50d,0x597b179b
DD 0xc82a6c72,0xf1f9dee4,0xbb8d095e,0x825ebbc8
DD 0x2f64a62a,0x16b714bc,0x5cc3c306,0x65107190
DD 0x035b8e33,0x3a883ca5,0x70fceb1f,0x492f5989
DD 0xe415446b,0xddc6f6fd,0x97b22147,0xae6193d1
DD 0x5b25de01,0x62f66c97,0x2882bb2d,0x115109bb
DD 0xbc6b1459,0x85b8a6cf,0xcfcc7175,0xf61fc3e3
DD 0x90543c40,0xa9878ed6,0xe3f3596c,0xda20ebfa
DD 0x771af618,0x4ec9448e,0x04bd9334,0x3d6e21a2
DD 0xebd97e65,0xd20accf3,0x987e1b49,0xa1ada9df
DD 0x0c97b43d,0x354406ab,0x7f30d111,0x46e36387
DD 0x20a89c24,0x197b2eb2,0x530ff908,0x6adc4b9e
DD 0xc7e6567c,0xfe35e4ea,0xb4413350,0x8d9281c6
DD 0x78d6cc16,0x41057e80,0x0b71a93a,0x32a21bac
DD 0x9f98064e,0xa64bb4d8,0xec3f6362,0xd5ecd1f4
DD 0xb3a72e57,0x8a749cc1,0xc0004b7b,0xf9d3f9ed
DD 0x54e9e40f,0x6d3a5699,0x274e8123,0x1e9d33b5
mul_table_152:
DD 0x00000000,0x878a92a7,0x0af953bf,0x8d73c118
DD 0x15f2a77e,0x927835d9,0x1f0bf4c1,0x98816666
DD 0x2be54efc,0xac6fdc5b,0x211c1d43,0xa6968fe4
DD 0x3e17e982,0xb99d7b25,0x34eeba3d,0xb364289a
DD 0x57ca9df8,0xd0400f5f,0x5d33ce47,0xdab95ce0
DD 0x42383a86,0xc5b2a821,0x48c16939,0xcf4bfb9e
DD 0x7c2fd304,0xfba541a3,0x76d680bb,0xf15c121c
DD 0x69dd747a,0xee57e6dd,0x632427c5,0xe4aeb562
DD 0xaf953bf0,0x281fa957,0xa56c684f,0x22e6fae8
DD 0xba679c8e,0x3ded0e29,0xb09ecf31,0x37145d96
DD 0x8470750c,0x03fae7ab,0x8e8926b3,0x0903b414
DD 0x9182d272,0x160840d5,0x9b7b81cd,0x1cf1136a
DD 0xf85fa608,0x7fd534af,0xf2a6f5b7,0x752c6710
DD 0xedad0176,0x6a2793d1,0xe75452c9,0x60dec06e
DD 0xd3bae8f4,0x54307a53,0xd943bb4b,0x5ec929ec
DD 0xc6484f8a,0x41c2dd2d,0xccb11c35,0x4b3b8e92
DD 0x5ac60111,0xdd4c93b6,0x503f52ae,0xd7b5c009
DD 0x4f34a66f,0xc8be34c8,0x45cdf5d0,0xc2476777
DD 0x71234fed,0xf6a9dd4a,0x7bda1c52,0xfc508ef5
DD 0x64d1e893,0xe35b7a34,0x6e28bb2c,0xe9a2298b
DD 0x0d0c9ce9,0x8a860e4e,0x07f5cf56,0x807f5df1
DD 0x18fe3b97,0x9f74a930,0x12076828,0x958dfa8f
DD 0x26e9d215,0xa16340b2,0x2c1081aa,0xab9a130d
DD 0x331b756b,0xb491e7cc,0x39e226d4,0xbe68b473
DD 0xf5533ae1,0x72d9a846,0xffaa695e,0x7820fbf9
DD 0xe0a19d9f,0x672b0f38,0xea58ce20,0x6dd25c87
DD 0xdeb6741d,0x593ce6ba,0xd44f27a2,0x53c5b505
DD 0xcb44d363,0x4cce41c4,0xc1bd80dc,0x4637127b
DD 0xa299a719,0x251335be,0xa860f4a6,0x2fea6601
DD 0xb76b0067,0x30e192c0,0xbd9253d8,0x3a18c17f
DD 0x897ce9e5,0x0ef67b42,0x8385ba5a,0x040f28fd
DD 0x9c8e4e9b,0x1b04dc3c,0x96771d24,0x11fd8f83
DD 0xb58c0222,0x32069085,0xbf75519d,0x38ffc33a
DD 0xa07ea55c,0x27f437fb,0xaa87f6e3,0x2d0d6444
DD 0x9e694cde,0x19e3de79,0x94901f61,0x131a8dc6
DD 0x8b9beba0,0x0c117907,0x8162b81f,0x06e82ab8
DD 0xe2469fda,0x65cc0d7d,0xe8bfcc65,0x6f355ec2
DD 0xf7b438a4,0x703eaa03,0xfd4d6b1b,0x7ac7f9bc
DD 0xc9a3d126,0x4e294381,0xc35a8299,0x44d0103e
DD 0xdc517658,0x5bdbe4ff,0xd6a825e7,0x5122b740
DD 0x1a1939d2,0x9d93ab75,0x10e06a6d,0x976af8ca
DD 0x0feb9eac,0x88610c0b,0x0512cd13,0x82985fb4
DD 0x31fc772e,0xb676e589,0x3b052491,0xbc8fb636
DD 0x240ed050,0xa38442f7,0x2ef783ef,0xa97d1148
DD 0x4dd3a42a,0xca59368d,0x472af795,0xc0a06532
DD 0x58210354,0xdfab91f3,0x52d850eb,0xd552c24c
DD 0x6636ead6,0xe1bc7871,0x6ccfb969,0xeb452bce
DD 0x73c44da8,0xf44edf0f,0x793d1e17,0xfeb78cb0
DD 0xef4a0333,0x68c09194,0xe5b3508c,0x6239c22b
DD 0xfab8a44d,0x7d3236ea,0xf041f7f2,0x77cb6555
DD 0xc4af4dcf,0x4325df68,0xce561e70,0x49dc8cd7
DD 0xd15deab1,0x56d77816,0xdba4b90e,0x5c2e2ba9
DD 0xb8809ecb,0x3f0a0c6c,0xb279cd74,0x35f35fd3
DD 0xad7239b5,0x2af8ab12,0xa78b6a0a,0x2001f8ad
DD 0x9365d037,0x14ef4290,0x999c8388,0x1e16112f
DD 0x86977749,0x011de5ee,0x8c6e24f6,0x0be4b651
DD 0x40df38c3,0xc755aa64,0x4a266b7c,0xcdacf9db
DD 0x552d9fbd,0xd2a70d1a,0x5fd4cc02,0xd85e5ea5
DD 0x6b3a763f,0xecb0e498,0x61c32580,0xe649b727
DD 0x7ec8d141,0xf94243e6,0x743182fe,0xf3bb1059
DD 0x1715a53b,0x909f379c,0x1decf684,0x9a666423
DD 0x02e70245,0x856d90e2,0x081e51fa,0x8f94c35d
DD 0x3cf0ebc7,0xbb7a7960,0x3609b878,0xb1832adf
DD 0x29024cb9,0xae88de1e,0x23fb1f06,0xa4718da1
mul_table_312:
DD 0x00000000,0xbac2fd7b,0x70698c07,0xcaab717c
DD 0xe0d3180e,0x5a11e575,0x90ba9409,0x2a786972
DD 0xc44a46ed,0x7e88bb96,0xb423caea,0x0ee13791
DD 0x24995ee3,0x9e5ba398,0x54f0d2e4,0xee322f9f
DD 0x8d78fb2b,0x37ba0650,0xfd11772c,0x47d38a57
DD 0x6dabe325,0xd7691e5e,0x1dc26f22,0xa7009259
DD 0x4932bdc6,0xf3f040bd,0x395b31c1,0x8399ccba
DD 0xa9e1a5c8,0x132358b3,0xd98829cf,0x634ad4b4
DD 0x1f1d80a7,0xa5df7ddc,0x6f740ca0,0xd5b6f1db
DD 0xffce98a9,0x450c65d2,0x8fa714ae,0x3565e9d5
DD 0xdb57c64a,0x61953b31,0xab3e4a4d,0x11fcb736
DD 0x3b84de44,0x8146233f,0x4bed5243,0xf12faf38
DD 0x92657b8c,0x28a786f7,0xe20cf78b,0x58ce0af0
DD 0x72b66382,0xc8749ef9,0x02dfef85,0xb81d12fe
DD 0x562f3d61,0xecedc01a,0x2646b166,0x9c844c1d
DD 0xb6fc256f,0x0c3ed814,0xc695a968,0x7c575413
DD 0x3e3b014e,0x84f9fc35,0x4e528d49,0xf4907032
DD 0xdee81940,0x642ae43b,0xae819547,0x1443683c
DD 0xfa7147a3,0x40b3bad8,0x8a18cba4,0x30da36df
DD 0x1aa25fad,0xa060a2d6,0x6acbd3aa,0xd0092ed1
DD 0xb343fa65,0x0981071e,0xc32a7662,0x79e88b19
DD 0x5390e26b,0xe9521f10,0x23f96e6c,0x993b9317
DD 0x7709bc88,0xcdcb41f3,0x0760308f,0xbda2cdf4
DD 0x97daa486,0x2d1859fd,0xe7b32881,0x5d71d5fa
DD 0x212681e9,0x9be47c92,0x514f0dee,0xeb8df095
DD 0xc1f599e7,0x7b37649c,0xb19c15e0,0x0b5ee89b
DD 0xe56cc704,0x5fae3a7f,0x95054b03,0x2fc7b678
DD 0x05bfdf0a,0xbf7d2271,0x75d6530d,0xcf14ae76
DD 0xac5e7ac2,0x169c87b9,0xdc37f6c5,0x66f50bbe
DD 0x4c8d62cc,0xf64f9fb7,0x3ce4eecb,0x862613b0
DD 0x68143c2f,0xd2d6c154,0x187db028,0xa2bf4d53
DD 0x88c72421,0x3205d95a,0xf8aea826,0x426c555d
DD 0x7c76029c,0xc6b4ffe7,0x0c1f8e9b,0xb6dd73e0
DD 0x9ca51a92,0x2667e7e9,0xeccc9695,0x560e6bee
DD 0xb83c4471,0x02feb90a,0xc855c876,0x7297350d
DD 0x58ef5c7f,0xe22da104,0x2886d078,0x92442d03
DD 0xf10ef9b7,0x4bcc04cc,0x816775b0,0x3ba588cb
DD 0x11dde1b9,0xab1f1cc2,0x61b46dbe,0xdb7690c5
DD 0x3544bf5a,0x8f864221,0x452d335d,0xffefce26
DD 0xd597a754,0x6f555a2f,0xa5fe2b53,0x1f3cd628
DD 0x636b823b,0xd9a97f40,0x13020e3c,0xa9c0f347
DD 0x83b89a35,0x397a674e,0xf3d11632,0x4913eb49
DD 0xa721c4d6,0x1de339ad,0xd74848d1,0x6d8ab5aa
DD 0x47f2dcd8,0xfd3021a3,0x379b50df,0x8d59ada4
DD 0xee137910,0x54d1846b,0x9e7af517,0x24b8086c
DD 0x0ec0611e,0xb4029c65,0x7ea9ed19,0xc46b1062
DD 0x2a593ffd,0x909bc286,0x5a30b3fa,0xe0f24e81
DD 0xca8a27f3,0x7048da88,0xbae3abf4,0x0021568f
DD 0x424d03d2,0xf88ffea9,0x32248fd5,0x88e672ae
DD 0xa29e1bdc,0x185ce6a7,0xd2f797db,0x68356aa0
DD 0x8607453f,0x3cc5b844,0xf66ec938,0x4cac3443
DD 0x66d45d31,0xdc16a04a,0x16bdd136,0xac7f2c4d
DD 0xcf35f8f9,0x75f70582,0xbf5c74fe,0x059e8985
DD 0x2fe6e0f7,0x95241d8c,0x5f8f6cf0,0xe54d918b
DD 0x0b7fbe14,0xb1bd436f,0x7b163213,0xc1d4cf68
DD 0xebaca61a,0x516e5b61,0x9bc52a1d,0x2107d766
DD 0x5d508375,0xe7927e0e,0x2d390f72,0x97fbf209
DD 0xbd839b7b,0x07416600,0xcdea177c,0x7728ea07
DD 0x991ac598,0x23d838e3,0xe973499f,0x53b1b4e4
DD 0x79c9dd96,0xc30b20ed,0x09a05191,0xb362acea
DD 0xd028785e,0x6aea8525,0xa041f459,0x1a830922
DD 0x30fb6050,0x8a399d2b,0x4092ec57,0xfa50112c
DD 0x14623eb3,0xaea0c3c8,0x640bb2b4,0xdec94fcf
DD 0xf4b126bd,0x4e73dbc6,0x84d8aaba,0x3e1a57c1
mul_table_632:
DD 0x00000000,0x6b749fb2,0xd6e93f64,0xbd9da0d6
DD 0xa83e0839,0xc34a978b,0x7ed7375d,0x15a3a8ef
DD 0x55906683,0x3ee4f931,0x837959e7,0xe80dc655
DD 0xfdae6eba,0x96daf108,0x2b4751de,0x4033ce6c
DD 0xab20cd06,0xc05452b4,0x7dc9f262,0x16bd6dd0
DD 0x031ec53f,0x686a5a8d,0xd5f7fa5b,0xbe8365e9
DD 0xfeb0ab85,0x95c43437,0x285994e1,0x432d0b53
DD 0x568ea3bc,0x3dfa3c0e,0x80679cd8,0xeb13036a
DD 0x53adecfd,0x38d9734f,0x8544d399,0xee304c2b
DD 0xfb93e4c4,0x90e77b76,0x2d7adba0,0x460e4412
DD 0x063d8a7e,0x6d4915cc,0xd0d4b51a,0xbba02aa8
DD 0xae038247,0xc5771df5,0x78eabd23,0x139e2291
DD 0xf88d21fb,0x93f9be49,0x2e641e9f,0x4510812d
DD 0x50b329c2,0x3bc7b670,0x865a16a6,0xed2e8914
DD 0xad1d4778,0xc669d8ca,0x7bf4781c,0x1080e7ae
DD 0x05234f41,0x6e57d0f3,0xd3ca7025,0xb8beef97
DD 0xa75bd9fa,0xcc2f4648,0x71b2e69e,0x1ac6792c
DD 0x0f65d1c3,0x64114e71,0xd98ceea7,0xb2f87115
DD 0xf2cbbf79,0x99bf20cb,0x2422801d,0x4f561faf
DD 0x5af5b740,0x318128f2,0x8c1c8824,0xe7681796
DD 0x0c7b14fc,0x670f8b4e,0xda922b98,0xb1e6b42a
DD 0xa4451cc5,0xcf318377,0x72ac23a1,0x19d8bc13
DD 0x59eb727f,0x329fedcd,0x8f024d1b,0xe476d2a9
DD 0xf1d57a46,0x9aa1e5f4,0x273c4522,0x4c48da90
DD 0xf4f63507,0x9f82aab5,0x221f0a63,0x496b95d1
DD 0x5cc83d3e,0x37bca28c,0x8a21025a,0xe1559de8
DD 0xa1665384,0xca12cc36,0x778f6ce0,0x1cfbf352
DD 0x09585bbd,0x622cc40f,0xdfb164d9,0xb4c5fb6b
DD 0x5fd6f801,0x34a267b3,0x893fc765,0xe24b58d7
DD 0xf7e8f038,0x9c9c6f8a,0x2101cf5c,0x4a7550ee
DD 0x0a469e82,0x61320130,0xdcafa1e6,0xb7db3e54
DD 0xa27896bb,0xc90c0909,0x7491a9df,0x1fe5366d
DD 0x4b5bc505,0x202f5ab7,0x9db2fa61,0xf6c665d3
DD 0xe365cd3c,0x8811528e,0x358cf258,0x5ef86dea
DD 0x1ecba386,0x75bf3c34,0xc8229ce2,0xa3560350
DD 0xb6f5abbf,0xdd81340d,0x601c94db,0x0b680b69
DD 0xe07b0803,0x8b0f97b1,0x36923767,0x5de6a8d5
DD 0x4845003a,0x23319f88,0x9eac3f5e,0xf5d8a0ec
DD 0xb5eb6e80,0xde9ff132,0x630251e4,0x0876ce56
DD 0x1dd566b9,0x76a1f90b,0xcb3c59dd,0xa048c66f
DD 0x18f629f8,0x7382b64a,0xce1f169c,0xa56b892e
DD 0xb0c821c1,0xdbbcbe73,0x66211ea5,0x0d558117
DD 0x4d664f7b,0x2612d0c9,0x9b8f701f,0xf0fbefad
DD 0xe5584742,0x8e2cd8f0,0x33b17826,0x58c5e794
DD 0xb3d6e4fe,0xd8a27b4c,0x653fdb9a,0x0e4b4428
DD 0x1be8ecc7,0x709c7375,0xcd01d3a3,0xa6754c11
DD 0xe646827d,0x8d321dcf,0x30afbd19,0x5bdb22ab
DD 0x4e788a44,0x250c15f6,0x9891b520,0xf3e52a92
DD 0xec001cff,0x8774834d,0x3ae9239b,0x519dbc29
DD 0x443e14c6,0x2f4a8b74,0x92d72ba2,0xf9a3b410
DD 0xb9907a7c,0xd2e4e5ce,0x6f794518,0x040ddaaa
DD 0x11ae7245,0x7adaedf7,0xc7474d21,0xac33d293
DD 0x4720d1f9,0x2c544e4b,0x91c9ee9d,0xfabd712f
DD 0xef1ed9c0,0x846a4672,0x39f7e6a4,0x52837916
DD 0x12b0b77a,0x79c428c8,0xc459881e,0xaf2d17ac
DD 0xba8ebf43,0xd1fa20f1,0x6c678027,0x07131f95
DD 0xbfadf002,0xd4d96fb0,0x6944cf66,0x023050d4
DD 0x1793f83b,0x7ce76789,0xc17ac75f,0xaa0e58ed
DD 0xea3d9681,0x81490933,0x3cd4a9e5,0x57a03657
DD 0x42039eb8,0x2977010a,0x94eaa1dc,0xff9e3e6e
DD 0x148d3d04,0x7ff9a2b6,0xc2640260,0xa9109dd2
DD 0xbcb3353d,0xd7c7aa8f,0x6a5a0a59,0x012e95eb
DD 0x411d5b87,0x2a69c435,0x97f464e3,0xfc80fb51
DD 0xe92353be,0x8257cc0c,0x3fca6cda,0x54bef368
mul_table_1272:
DD 0x00000000,0xdd66cbbb,0xbf21e187,0x62472a3c
DD 0x7bafb5ff,0xa6c97e44,0xc48e5478,0x19e89fc3
DD 0xf75f6bfe,0x2a39a045,0x487e8a79,0x951841c2
DD 0x8cf0de01,0x519615ba,0x33d13f86,0xeeb7f43d
DD 0xeb52a10d,0x36346ab6,0x5473408a,0x89158b31
DD 0x90fd14f2,0x4d9bdf49,0x2fdcf575,0xf2ba3ece
DD 0x1c0dcaf3,0xc16b0148,0xa32c2b74,0x7e4ae0cf
DD 0x67a27f0c,0xbac4b4b7,0xd8839e8b,0x05e55530
DD 0xd34934eb,0x0e2fff50,0x6c68d56c,0xb10e1ed7
DD 0xa8e68114,0x75804aaf,0x17c76093,0xcaa1ab28
DD 0x24165f15,0xf97094ae,0x9b37be92,0x46517529
DD 0x5fb9eaea,0x82df2151,0xe0980b6d,0x3dfec0d6
DD 0x381b95e6,0xe57d5e5d,0x873a7461,0x5a5cbfda
DD 0x43b42019,0x9ed2eba2,0xfc95c19e,0x21f30a25
DD 0xcf44fe18,0x122235a3,0x70651f9f,0xad03d424
DD 0xb4eb4be7,0x698d805c,0x0bcaaa60,0xd6ac61db
DD 0xa37e1f27,0x7e18d49c,0x1c5ffea0,0xc139351b
DD 0xd8d1aad8,0x05b76163,0x67f04b5f,0xba9680e4
DD 0x542174d9,0x8947bf62,0xeb00955e,0x36665ee5
DD 0x2f8ec126,0xf2e80a9d,0x90af20a1,0x4dc9eb1a
DD 0x482cbe2a,0x954a7591,0xf70d5fad,0x2a6b9416
DD 0x33830bd5,0xeee5c06e,0x8ca2ea52,0x51c421e9
DD 0xbf73d5d4,0x62151e6f,0x00523453,0xdd34ffe8
DD 0xc4dc602b,0x19baab90,0x7bfd81ac,0xa69b4a17
DD 0x70372bcc,0xad51e077,0xcf16ca4b,0x127001f0
DD 0x0b989e33,0xd6fe5588,0xb4b97fb4,0x69dfb40f
DD 0x87684032,0x5a0e8b89,0x3849a1b5,0xe52f6a0e
DD 0xfcc7f5cd,0x21a13e76,0x43e6144a,0x9e80dff1
DD 0x9b658ac1,0x4603417a,0x24446b46,0xf922a0fd
DD 0xe0ca3f3e,0x3dacf485,0x5febdeb9,0x828d1502
DD 0x6c3ae13f,0xb15c2a84,0xd31b00b8,0x0e7dcb03
DD 0x179554c0,0xcaf39f7b,0xa8b4b547,0x75d27efc
DD 0x431048bf,0x9e768304,0xfc31a938,0x21576283
DD 0x38bffd40,0xe5d936fb,0x879e1cc7,0x5af8d77c
DD 0xb44f2341,0x6929e8fa,0x0b6ec2c6,0xd608097d
DD 0xcfe096be,0x12865d05,0x70c17739,0xada7bc82
DD 0xa842e9b2,0x75242209,0x17630835,0xca05c38e
DD 0xd3ed5c4d,0x0e8b97f6,0x6cccbdca,0xb1aa7671
DD 0x5f1d824c,0x827b49f7,0xe03c63cb,0x3d5aa870
DD 0x24b237b3,0xf9d4fc08,0x9b93d634,0x46f51d8f
DD 0x90597c54,0x4d3fb7ef,0x2f789dd3,0xf21e5668
DD 0xebf6c9ab,0x36900210,0x54d7282c,0x89b1e397
DD 0x670617aa,0xba60dc11,0xd827f62d,0x05413d96
DD 0x1ca9a255,0xc1cf69ee,0xa38843d2,0x7eee8869
DD 0x7b0bdd59,0xa66d16e2,0xc42a3cde,0x194cf765
DD 0x00a468a6,0xddc2a31d,0xbf858921,0x62e3429a
DD 0x8c54b6a7,0x51327d1c,0x33755720,0xee139c9b
DD 0xf7fb0358,0x2a9dc8e3,0x48dae2df,0x95bc2964
DD 0xe06e5798,0x3d089c23,0x5f4fb61f,0x82297da4
DD 0x9bc1e267,0x46a729dc,0x24e003e0,0xf986c85b
DD 0x17313c66,0xca57f7dd,0xa810dde1,0x7576165a
DD 0x6c9e8999,0xb1f84222,0xd3bf681e,0x0ed9a3a5
DD 0x0b3cf695,0xd65a3d2e,0xb41d1712,0x697bdca9
DD 0x7093436a,0xadf588d1,0xcfb2a2ed,0x12d46956
DD 0xfc639d6b,0x210556d0,0x43427cec,0x9e24b757
DD 0x87cc2894,0x5aaae32f,0x38edc913,0xe58b02a8
DD 0x33276373,0xee41a8c8,0x8c0682f4,0x5160494f
DD 0x4888d68c,0x95ee1d37,0xf7a9370b,0x2acffcb0
DD 0xc478088d,0x191ec336,0x7b59e90a,0xa63f22b1
DD 0xbfd7bd72,0x62b176c9,0x00f65cf5,0xdd90974e
DD 0xd875c27e,0x051309c5,0x675423f9,0xba32e842
DD 0xa3da7781,0x7ebcbc3a,0x1cfb9606,0xc19d5dbd
DD 0x2f2aa980,0xf24c623b,0x900b4807,0x4d6d83bc
DD 0x54851c7f,0x89e3d7c4,0xeba4fdf8,0x36c23643
;;; func core, ver, snum
slversion crc32_iscsi_00, 00, 03, 0014
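As a cross-check for the routine above, here is a minimal bitwise sketch in C of the same CRC (CRC32C, reflected Castagnoli polynomial 0x82F63B78, which is what the `crc32` instruction computes). The name `crc32c_ref` is hypothetical and not part of the library; the argument order follows the `crc32_iscsi_00` prototype comment, with `size_t` assumed for the length instead of `int`. Like the assembly, it applies no inversion to the initial value or to the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper, not part of the library. */
static uint32_t crc32c_ref(const unsigned char *buffer, size_t len, uint32_t crc_init)
{
    uint32_t crc = crc_init;

    assert(buffer != NULL || len == 0);
    for (size_t n = 0; n < len; n++) {
        crc ^= buffer[n];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right one bit; xor in the reflected polynomial
             * when the bit shifted out was 1. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}
```

Callers wanting the conventional CRC32C value would pass `0xFFFFFFFF` as `crc_init` and complement the return value.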


@ -31,7 +31,13 @@
%include "reg_sizes.asm"
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
default rel
section .text
%define CONCAT(a,b,c) a %+ b %+ c
; Define threshold where buffers are considered "small" and routed to more
@ -50,8 +56,9 @@ default rel
;;; len = rdx
;;; crc_init = r8
global crc32_iscsi_01:function
mk_global crc32_iscsi_01, function
crc32_iscsi_01:
endbranch
%ifidn __OUTPUT_FORMAT__, elf64
%define bufp rdi
@ -97,6 +104,11 @@ crc32_iscsi_01:
mov crc_init, crc_init_arg
%endif
;; If len is less than 8 we need to jump to special code to avoid
;; reading beyond the end of the buffer
cmp len, 8
jb less_than_8
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; 1) ALIGN: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@ -108,11 +120,6 @@ crc32_iscsi_01:
;; the address
je proc_block ;; Skip if aligned
;; If len is less than 8 and we're unaligned, we need to jump
;; to special code to avoid reading beyond the end of the buffer
cmp len, 8
jb less_than_8
;;;; Calculate CRC of unaligned bytes of the buffer (if any) ;;;
mov tmp, [bufptmp] ;; load a quadword from the buffer
add bufptmp, bufp ;; align buffer pointer for quadword
@ -182,7 +189,7 @@ full_block:
; ;; branch into array
; jmp CONCAT(crc_,128,)
; Fall thruogh into top of crc array (crc_128)
; Fall through into top of crc array (crc_128)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@ -190,19 +197,38 @@ full_block:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
crc_array:
cmp len, 128*24*2
jbe non_prefetch
%assign i 128
%rep 128-1
CONCAT(crc_,i,:)
CONCAT(_crc_,i,:)
crc32 crc_init, qword [block_0 - i*8]
crc32 crc1, qword [block_1 - i*8]
crc32 crc2, qword [block_2 - i*8]
%if i > 128*8 / 32 ; prefetch next 3KB data
prefetchnta [block_2 + 128*32 - i*32]
PREFETCH [block_2 + 128*32 - i*32]
%endif
%assign i (i-1)
%endrep
jmp next_
non_prefetch:
%assign i 128
%rep 128-1
CONCAT(crc_,i,:)
endbranch
crc32 crc_init, qword [block_0 - i*8]
crc32 crc1, qword [block_1 - i*8]
crc32 crc2, qword [block_2 - i*8]
%assign i (i-1)
%endrep
next_:
CONCAT(crc_,i,:)
crc32 crc_init, qword [block_0 - i*8]
crc32 crc1, qword [block_1 - i*8]
@ -316,17 +342,17 @@ do_16:
less_than_8:
test len,4
jz less_than_4
crc32 crc_init_dw, dword[bufptmp]
add bufptmp,4
crc32 crc_init_dw, dword[bufp]
add bufp,4
less_than_4:
test len,2
jz less_than_2
crc32 crc_init_dw, word[bufptmp]
add bufptmp,2
crc32 crc_init_dw, word[bufp]
add bufp,2
less_than_2:
test len,1
jz do_return
crc32 crc_init_dw, byte[bufptmp]
crc32 crc_init_dw, byte[bufp]
mov rax, crc_init
pop rsi
pop rdi
@ -566,7 +592,3 @@ K_table:
dq 0x045cddf4e, 0x0e0ac139e
dq 0x1a91647f2, 0x169cf9eb0
dq 0x1a0f717c4, 0x0170076fa
;;; func core, ver, snum
slversion crc32_iscsi_01, 01, 03, 0015

crc/crc32_iscsi_by16_10.asm Normal file

@ -0,0 +1,547 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2020 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_iscsi_by16_10(
; UINT32 init_crc, //initial CRC value, 32 bits
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len //buffer length in bytes (64-bit data)
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
;
;
%include "reg_sizes.asm"
%ifndef FUNCTION_NAME
%define FUNCTION_NAME crc32_iscsi_by16_10
%endif
%ifndef fetch_dist
%define fetch_dist 1536
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht0
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 r8
%xdefine arg2 rcx
%xdefine arg3 rdx
%xdefine arg1_low32 r8d
%else
%xdefine arg1 rdx
%xdefine arg2 rdi
%xdefine arg3 rsi
%xdefine arg1_low32 edx
%endif
align 16
mk_global FUNCTION_NAME, function
FUNCTION_NAME:
endbranch
%ifidn __OUTPUT_FORMAT__, win64
sub rsp, (16*10 + 8)
; save the callee-saved xmm registers (xmm6-xmm15) on the stack, as the win64 ABI requires
vmovdqa [rsp + 16*0], xmm6
vmovdqa [rsp + 16*1], xmm7
vmovdqa [rsp + 16*2], xmm8
vmovdqa [rsp + 16*3], xmm9
vmovdqa [rsp + 16*4], xmm10
vmovdqa [rsp + 16*5], xmm11
vmovdqa [rsp + 16*6], xmm12
vmovdqa [rsp + 16*7], xmm13
vmovdqa [rsp + 16*8], xmm14
vmovdqa [rsp + 16*9], xmm15
%endif
; check if smaller than 256B
cmp arg3, 256
jl .less_than_256
; load the initial crc value
vmovd xmm10, arg1_low32 ; initial crc
; load the initial 128B of data, xor the initial crc value into the first 64B
vmovdqu8 zmm0, [arg2+16*0]
vmovdqu8 zmm4, [arg2+16*4]
vpxorq zmm0, zmm10
vbroadcasti32x4 zmm10, [rk3] ;zmm10 has rk3 and rk4
;imm value of pclmulqdq instruction will determine which constant to use
sub arg3, 256
cmp arg3, 256
jl .fold_128_B_loop
vmovdqu8 zmm7, [arg2+16*8]
vmovdqu8 zmm8, [arg2+16*12]
vbroadcasti32x4 zmm16, [rk_1] ;zmm16 has rk-1 and rk-2
sub arg3, 256
%if fetch_dist != 0
; check if there is at least 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jb .fold_256_B_loop
align 16
.fold_and_prefetch_256_B_loop:
add arg2, 256
PREFETCH [arg2+fetch_dist+0]
vpclmulqdq zmm1, zmm0, zmm16, 0x10
vpclmulqdq zmm0, zmm0, zmm16, 0x01
vpternlogq zmm0, zmm1, [arg2+16*0], 0x96
PREFETCH [arg2+fetch_dist+64]
vpclmulqdq zmm2, zmm4, zmm16, 0x10
vpclmulqdq zmm4, zmm4, zmm16, 0x01
vpternlogq zmm4, zmm2, [arg2+16*4], 0x96
PREFETCH [arg2+fetch_dist+64*2]
vpclmulqdq zmm3, zmm7, zmm16, 0x10
vpclmulqdq zmm7, zmm7, zmm16, 0x01
vpternlogq zmm7, zmm3, [arg2+16*8], 0x96
PREFETCH [arg2+fetch_dist+64*3]
vpclmulqdq zmm5, zmm8, zmm16, 0x10
vpclmulqdq zmm8, zmm8, zmm16, 0x01
vpternlogq zmm8, zmm5, [arg2+16*12], 0x96
sub arg3, 256
; check if there is another 1.5KB (fetch distance) + 256B in the buffer
cmp arg3, (fetch_dist + 256)
jge .fold_and_prefetch_256_B_loop
%endif ; fetch_dist != 0
align 16
.fold_256_B_loop:
add arg2, 256
vpclmulqdq zmm1, zmm0, zmm16, 0x10
vpclmulqdq zmm0, zmm0, zmm16, 0x01
vpternlogq zmm0, zmm1, [arg2+16*0], 0x96
vpclmulqdq zmm2, zmm4, zmm16, 0x10
vpclmulqdq zmm4, zmm4, zmm16, 0x01
vpternlogq zmm4, zmm2, [arg2+16*4], 0x96
vpclmulqdq zmm3, zmm7, zmm16, 0x10
vpclmulqdq zmm7, zmm7, zmm16, 0x01
vpternlogq zmm7, zmm3, [arg2+16*8], 0x96
vpclmulqdq zmm5, zmm8, zmm16, 0x10
vpclmulqdq zmm8, zmm8, zmm16, 0x01
vpternlogq zmm8, zmm5, [arg2+16*12], 0x96
sub arg3, 256
jge .fold_256_B_loop
;; Fold 256 into 128
add arg2, 256
vpclmulqdq zmm1, zmm0, zmm10, 0x01
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpternlogq zmm7, zmm1, zmm2, 0x96 ; xor ABC
vpclmulqdq zmm5, zmm4, zmm10, 0x01
vpclmulqdq zmm6, zmm4, zmm10, 0x10
vpternlogq zmm8, zmm5, zmm6, 0x96 ; xor ABC
vmovdqa32 zmm0, zmm7
vmovdqa32 zmm4, zmm8
add arg3, 128
jmp .less_than_128_B
; at this point in the code, there are 128*x+y (0<=y<128) bytes of buffer. The fold_128_B_loop
; loop will fold 128B at a time until we have 128+y Bytes of buffer
; fold 128B at a time. This section of the code folds 2 zmm registers (8 x 128 bits) in parallel
align 16
.fold_128_B_loop:
add arg2, 128
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpclmulqdq zmm0, zmm0, zmm10, 0x01
vpternlogq zmm0, zmm2, [arg2+16*0], 0x96
vpclmulqdq zmm5, zmm4, zmm10, 0x10
vpclmulqdq zmm4, zmm4, zmm10, 0x01
vpternlogq zmm4, zmm5, [arg2+16*4], 0x96
sub arg3, 128
jge .fold_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add arg2, 128
align 16
.less_than_128_B:
;; At this point, the buffer pointer is pointing at the last
;; y bytes of the buffer, where 0 <= y < 128.
;; The 128 bytes of folded data is in 2 of the zmm registers:
;; zmm0 and zmm4
cmp arg3, -64
jl .fold_128_B_register
vbroadcasti32x4 zmm10, [rk15]
;; If there are still 64 bytes left, folds from 128 bytes to 64 bytes
;; and handles the next 64 bytes
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpclmulqdq zmm0, zmm0, zmm10, 0x01
vpternlogq zmm0, zmm2, zmm4, 0x96
add arg3, 128
jmp .fold_64B_loop
align 16
.fold_128_B_register:
; fold the 8 128b parts into 1 xmm register with different constants
vmovdqu8 zmm16, [rk9] ; multiply by rk9-rk16
vmovdqu8 zmm11, [rk17] ; multiply by rk17-rk20, rk1,rk2, 0,0
vpclmulqdq zmm1, zmm0, zmm16, 0x01
vpclmulqdq zmm2, zmm0, zmm16, 0x10
vextracti64x2 xmm7, zmm4, 3 ; save last that has no multiplicand
vpclmulqdq zmm5, zmm4, zmm11, 0x01
vpclmulqdq zmm6, zmm4, zmm11, 0x10
vmovdqa xmm10, [rk1] ; Needed later in reduction loop
vpternlogq zmm1, zmm2, zmm5, 0x96 ; xor ABC
vpternlogq zmm1, zmm6, zmm7, 0x96 ; xor ABC
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
; instead of 128, we add 128-16 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add arg3, 128-16
jl .final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
align 16
.16B_reduction_loop:
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpternlogq xmm7, xmm8, [arg2], 0x96
add arg2, 16
sub arg3, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp arg3, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge .16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
align 16
.final_reduction_for_128:
add arg3, 16
je .128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset
; the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
align 16
.get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [arg2 - 16 + arg3]
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [rel pshufb_shf_table]
add rax, arg3
vmovdqu xmm0, [rax]
vpshufb xmm7, xmm0
vpxor xmm0, [mask3]
vpshufb xmm2, xmm0
vpblendvb xmm2, xmm2, xmm1, xmm0
;;;;;;;;;;
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpternlogq xmm7, xmm8, xmm2, 0x96
align 16
.128_done:
; compute crc of a 128-bit value
xor rax, rax
vmovq r11, xmm7
crc32 rax, r11
vpextrq r11, xmm7, 1
crc32 rax, r11
align 16
.cleanup:
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + 16*0]
vmovdqa xmm7, [rsp + 16*1]
vmovdqa xmm8, [rsp + 16*2]
vmovdqa xmm9, [rsp + 16*3]
vmovdqa xmm10, [rsp + 16*4]
vmovdqa xmm11, [rsp + 16*5]
vmovdqa xmm12, [rsp + 16*6]
vmovdqa xmm13, [rsp + 16*7]
vmovdqa xmm14, [rsp + 16*8]
vmovdqa xmm15, [rsp + 16*9]
add rsp, (16*10 + 8)
%endif
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
.less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp arg3, 32
jl .less_than_32
vmovd xmm1, arg1_low32 ; get the initial crc value
cmp arg3, 64
jl .less_than_64
;; receive the initial 64B data, xor the initial crc value
vmovdqu8 zmm0, [arg2]
vpxorq zmm0, zmm1
add arg2, 64
sub arg3, 64
cmp arg3, 64
jb .reduce_64B
vbroadcasti32x4 zmm10, [rk15]
align 16
.fold_64B_loop:
vmovdqu8 zmm4, [arg2]
vpclmulqdq zmm2, zmm0, zmm10, 0x10
vpclmulqdq zmm0, zmm0, zmm10, 0x01
vpternlogq zmm0, zmm2, zmm4, 0x96
add arg2, 64
sub arg3, 64
cmp arg3, 64
jge .fold_64B_loop
align 16
.reduce_64B:
; Reduce from 64 bytes to 16 bytes
vmovdqu8 zmm11, [rk17]
vpclmulqdq zmm1, zmm0, zmm11, 0x01
vpclmulqdq zmm2, zmm0, zmm11, 0x10
vextracti64x2 xmm7, zmm0, 3 ; save the last 128-bit lane, which is not multiplied by a constant
vpternlogq zmm1, zmm2, zmm7, 0x96
vmovdqa xmm10, [rk_1b] ; Needed later in reduction loop
vshufi64x2 zmm8, zmm1, zmm1, 0x4e ; Swap 1,0,3,2 - 01 00 11 10
vpxorq ymm8, ymm8, ymm1
vextracti64x2 xmm5, ymm8, 1
vpxorq xmm7, xmm5, xmm8
sub arg3, 16
jns .16B_reduction_loop ; At least 16 bytes of data to digest
jmp .final_reduction_for_128
align 16
.less_than_64:
;; enough data (32 <= len < 64) to fold 16B at a time: load the constants
vmovdqa xmm10, [rk_1b]
vmovdqu xmm7, [arg2] ; load the plaintext
vpxor xmm7, xmm1 ; xmm1 already has initial crc value
;; update the buffer pointer
add arg2, 16
;; update the counter
;; - subtract 32 instead of 16 to save one instruction from the loop
sub arg3, 32
jmp .16B_reduction_loop
align 16
.less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, arg1_low32
test arg3, arg3
je .cleanup
vmovd xmm0, arg1_low32 ; get the initial crc value
cmp arg3, 16
je .exact_16_left
jl .less_than_16_left
vmovdqu xmm7, [arg2] ; load the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
add arg2, 16
sub arg3, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp .get_last_two_xmms
align 16
.less_than_16_left:
cmp arg3, 4
jl .only_less_than_4
xor r10, r10
bts r10, arg3
dec r10
kmovw k2, r10d
vmovdqu8 xmm7{k2}{z}, [arg2]
vpxor xmm7, xmm0 ; xor the initial crc value
lea rax, [rel pshufb_shf_table]
vmovdqu xmm0, [rax + arg3]
vpshufb xmm7,xmm0
jmp .128_done
align 16
.exact_16_left:
vmovdqu xmm7, [arg2]
vpxor xmm7, xmm0 ; xor the initial crc value
jmp .128_done
align 16
.only_less_than_4:
mov eax, arg1_low32
cmp arg3, 2
jb .only_1_left
je .only_2_left
; 3 bytes left
crc32 eax, word [arg2]
crc32 eax, byte [arg2 + 2]
jmp .cleanup
align 16
.only_2_left:
crc32 eax, word [arg2]
jmp .cleanup
align 16
.only_1_left:
crc32 eax, byte [arg2]
jmp .cleanup
section .data
align 32
%ifndef USE_CONSTS
; precomputed constants
rk_1: dq 0x00000000b9e02b86
rk_2: dq 0x00000000dcb17aa4
rk1: dq 0x00000000493c7d27
rk2: dq 0x0000000ec1068c50
rk3: dq 0x0000000206e38d70
rk4: dq 0x000000006992cea2
rk5: dq 0x00000000493c7d27
rk6: dq 0x00000000dd45aab8
rk7: dq 0x00000000dea713f0
rk8: dq 0x0000000105ec76f0
rk9: dq 0x0000000047db8317
rk10: dq 0x000000002ad91c30
rk11: dq 0x000000000715ce53
rk12: dq 0x00000000c49f4f67
rk13: dq 0x0000000039d3b296
rk14: dq 0x00000000083a6eec
rk15: dq 0x000000009e4addf8
rk16: dq 0x00000000740eef02
rk17: dq 0x00000000ddc0152b
rk18: dq 0x000000001c291d04
rk19: dq 0x00000000ba4fc28e
rk20: dq 0x000000003da6d0cb
rk_1b: dq 0x00000000493c7d27
rk_2b: dq 0x0000000ec1068c50
dq 0x0000000000000000
dq 0x0000000000000000
%else
INCLUDE_CONSTS
%endif
pshufb_shf_table:
; use these values for shift constants for the pshufb instruction
; different alignments result in values as shown:
; dq 0x8887868584838281, 0x008f8e8d8c8b8a89 ; shl 15 (16-1) / shr1
; dq 0x8988878685848382, 0x01008f8e8d8c8b8a ; shl 14 (16-2) / shr2
; dq 0x8a89888786858483, 0x0201008f8e8d8c8b ; shl 13 (16-3) / shr3
; dq 0x8b8a898887868584, 0x030201008f8e8d8c ; shl 12 (16-4) / shr4
; dq 0x8c8b8a8988878685, 0x04030201008f8e8d ; shl 11 (16-5) / shr5
; dq 0x8d8c8b8a89888786, 0x0504030201008f8e ; shl 10 (16-6) / shr6
; dq 0x8e8d8c8b8a898887, 0x060504030201008f ; shl 9 (16-7) / shr7
; dq 0x8f8e8d8c8b8a8988, 0x0706050403020100 ; shl 8 (16-8) / shr8
; dq 0x008f8e8d8c8b8a89, 0x0807060504030201 ; shl 7 (16-9) / shr9
; dq 0x01008f8e8d8c8b8a, 0x0908070605040302 ; shl 6 (16-10) / shr10
; dq 0x0201008f8e8d8c8b, 0x0a09080706050403 ; shl 5 (16-11) / shr11
; dq 0x030201008f8e8d8c, 0x0b0a090807060504 ; shl 4 (16-12) / shr12
; dq 0x04030201008f8e8d, 0x0c0b0a0908070605 ; shl 3 (16-13) / shr13
; dq 0x0504030201008f8e, 0x0d0c0b0a09080706 ; shl 2 (16-14) / shr14
; dq 0x060504030201008f, 0x0e0d0c0b0a090807 ; shl 1 (16-15) / shr15
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
mask: dq 0xFFFFFFFFFFFFFFFF, 0x0000000000000000
mask2: dq 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF
mask3: dq 0x8080808080808080, 0x8080808080808080
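
The short-buffer paths above (.only_less_than_4 and .128_done) finish with the x86 crc32 instruction instead of a Barrett reduction. A minimal C sketch of what those steps compute follows, assuming the raw convention the assembly appears to use (no pre- or post-inversion, since a zero-length buffer returns the initial CRC unchanged). The helper names crc32c_step, crc32c_raw, crc32c_tail3 and crc32c_of_128 are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial, the one the x86 crc32 instruction implements. */
#define CRC32C_POLY_REFL 0x82F63B78u

/* Hypothetical model of one `crc32 reg, r/m` step: feed the low nbytes (1..8)
 * of value, least significant byte first. No inversion is applied. */
static uint32_t crc32c_step(uint32_t crc, uint64_t value, unsigned nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    for (unsigned i = 0; i < nbytes; i++) {
        crc ^= (uint8_t)(value >> (8 * i));
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (CRC32C_POLY_REFL & (0u - (crc & 1u)));
    }
    return crc;
}

/* Byte-at-a-time reference over a buffer (raw: caller handles any inversion). */
static uint32_t crc32c_raw(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        crc = crc32c_step(crc, buf[i], 1);
    return crc;
}

/* Model of the 3-byte tail: one 16-bit step followed by one 8-bit step. */
static uint32_t crc32c_tail3(uint32_t crc, const uint8_t *buf)
{
    uint16_t w = (uint16_t)(buf[0] | (buf[1] << 8));
    crc = crc32c_step(crc, w, 2);
    return crc32c_step(crc, buf[2], 1);
}

/* Model of .128_done: start from zero and consume the low, then the high qword. */
static uint32_t crc32c_of_128(uint64_t lo, uint64_t hi)
{
    return crc32c_step(crc32c_step(0, lo, 8), hi, 8);
}
```

With the conventional 0xFFFFFFFF initial value and a final inversion applied by the caller, crc32c_raw over the ASCII string "123456789" gives the standard CRC-32C check value 0xE3069283.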

crc/crc32_iscsi_by8_02.asm Normal file

@ -0,0 +1,629 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Copyright(c) 2011-2025 Intel Corporation All rights reserved.
;
; Redistribution and use in source and binary forms, with or without
; modification, are permitted provided that the following conditions
; are met:
; * Redistributions of source code must retain the above copyright
; notice, this list of conditions and the following disclaimer.
; * Redistributions in binary form must reproduce the above copyright
; notice, this list of conditions and the following disclaimer in
; the documentation and/or other materials provided with the
; distribution.
; * Neither the name of Intel Corporation nor the names of its
; contributors may be used to endorse or promote products derived
; from this software without specific prior written permission.
;
; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
; A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
; OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
; LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
; (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
; OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Function API:
; UINT32 crc32_iscsi_by8_02(
; const unsigned char *buf, //buffer pointer to calculate CRC on
; UINT64 len, //buffer length in bytes (64-bit data)
; UINT32 init_crc //initial CRC value, 32 bits
; );
;
; Authors:
; Erdinc Ozturk
; Vinodh Gopal
; James Guilford
;
; Reference paper titled "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
; URL: http://download.intel.com/design/intarch/papers/323102.pdf
;
;
; The CRC-32C (Castagnoli) checksum used by iSCSI is described in RFC 3720:
; http://www.ietf.org/rfc/rfc3720.txt
%include "reg_sizes.asm"
%ifndef fetch_dist
%define fetch_dist 4096
%endif
%ifndef PREFETCH
%define PREFETCH prefetcht1
%endif
[bits 64]
default rel
section .text
%ifidn __OUTPUT_FORMAT__, win64
%xdefine arg1 rcx
%xdefine arg2 rdx
%xdefine arg3 r8
%xdefine arg3_low32 r8d
%else
%xdefine arg1 rdi
%xdefine arg2 rsi
%xdefine arg3 rdx
%xdefine arg3_low32 edx
%endif
%define in_buf arg1
%define buf_len arg2
%define init_crc arg3_low32
%ifidn __OUTPUT_FORMAT__, win64
%define XMM_OFFSET 16*2
%define VARIABLE_OFFSET 16*10+8
%else
%define VARIABLE_OFFSET 16*2+8
%endif
align 16
mk_global crc32_iscsi_by8_02, function
crc32_iscsi_by8_02:
endbranch
sub rsp,VARIABLE_OFFSET
%ifidn __OUTPUT_FORMAT__, win64
; save the callee-saved xmm registers (xmm6-xmm13) on the stack, as the win64 ABI requires
vmovdqa [rsp + XMM_OFFSET + 16*0], xmm6
vmovdqa [rsp + XMM_OFFSET + 16*1], xmm7
vmovdqa [rsp + XMM_OFFSET + 16*2], xmm8
vmovdqa [rsp + XMM_OFFSET + 16*3], xmm9
vmovdqa [rsp + XMM_OFFSET + 16*4], xmm10
vmovdqa [rsp + XMM_OFFSET + 16*5], xmm11
vmovdqa [rsp + XMM_OFFSET + 16*6], xmm12
vmovdqa [rsp + XMM_OFFSET + 16*7], xmm13
%endif
; check if smaller than 256
cmp buf_len, 256
; for sizes less than 256, we can't fold 128B at a time...
jl _less_than_256
; load the initial crc value
vmovd xmm10, init_crc ; initial crc
; receive the initial 128B data, xor the initial crc value
vmovdqu xmm0, [in_buf+16*0]
vmovdqu xmm1, [in_buf+16*1]
vmovdqu xmm2, [in_buf+16*2]
vmovdqu xmm3, [in_buf+16*3]
vmovdqu xmm4, [in_buf+16*4]
vmovdqu xmm5, [in_buf+16*5]
vmovdqu xmm6, [in_buf+16*6]
vmovdqu xmm7, [in_buf+16*7]
; XOR the initial_crc value
vpxor xmm0, xmm10
vmovdqa xmm10, [rk3] ;xmm10 has rk3 and rk4
;imm value of pclmulqdq instruction will determine which constant to use
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; we subtract 256 instead of 128 to save one instruction from the loop
sub buf_len, 256
; at this section of the code, there is 128*x+y (0<=y<128) bytes of buffer. The _fold_128_B_loop
; loop will fold 128B at a time until we have 128+y Bytes of buffer
%if fetch_dist != 0
; check if there is at least 4KB (fetch distance) + 128B in the buffer
cmp buf_len, (fetch_dist + 128)
jb _fold_128_B_loop
; fold 128B at a time. This section of the code folds 8 xmm registers in parallel
align 16
_fold_and_prefetch_128_B_loop:
; update the buffer pointer
add in_buf, 128 ; buf += 128;
PREFETCH [in_buf+fetch_dist+0]
vmovdqu xmm9, [in_buf+16*0]
vmovdqu xmm12, [in_buf+16*1]
vpclmulqdq xmm8, xmm0, xmm10, 0x10
vpclmulqdq xmm0, xmm0, xmm10, 0x1
vpclmulqdq xmm13, xmm1, xmm10, 0x10
vpclmulqdq xmm1, xmm1, xmm10, 0x1
vpxor xmm0, xmm9
vpxor xmm0, xmm8
vpxor xmm1, xmm12
vpxor xmm1, xmm13
vmovdqu xmm9, [in_buf+16*2]
vmovdqu xmm12, [in_buf+16*3]
vpclmulqdq xmm8, xmm2, xmm10, 0x10
vpclmulqdq xmm2, xmm2, xmm10, 0x1
vpclmulqdq xmm13, xmm3, xmm10, 0x10
vpclmulqdq xmm3, xmm3, xmm10, 0x1
vpxor xmm2, xmm9
vpxor xmm2, xmm8
vpxor xmm3, xmm12
vpxor xmm3, xmm13
PREFETCH [in_buf+fetch_dist+64]
vmovdqu xmm9, [in_buf+16*4]
vmovdqu xmm12, [in_buf+16*5]
vpclmulqdq xmm8, xmm4, xmm10, 0x10
vpclmulqdq xmm4, xmm4, xmm10, 0x1
vpclmulqdq xmm13, xmm5, xmm10, 0x10
vpclmulqdq xmm5, xmm5, xmm10, 0x1
vpxor xmm4, xmm9
vpxor xmm4, xmm8
vpxor xmm5, xmm12
vpxor xmm5, xmm13
vmovdqu xmm9, [in_buf+16*6]
vmovdqu xmm12, [in_buf+16*7]
vmovdqa xmm8, xmm6
vmovdqa xmm13, xmm7
vpclmulqdq xmm6, xmm10, 0x10
vpclmulqdq xmm8, xmm10 , 0x1
vpclmulqdq xmm7, xmm10, 0x10
vpclmulqdq xmm13, xmm10 , 0x1
vpxor xmm6, xmm9
vxorps xmm6, xmm8
vpxor xmm7, xmm12
vxorps xmm7, xmm13
sub buf_len, 128
; check if there is another 4KB (fetch distance) + 128B in the buffer
cmp buf_len, (fetch_dist + 128)
jge _fold_and_prefetch_128_B_loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
%endif ; fetch_dist != 0
align 16
_fold_128_B_loop:
; update the buffer pointer
add in_buf, 128 ; buf += 128;
vmovdqu xmm9, [in_buf+16*0]
vmovdqu xmm12, [in_buf+16*1]
vmovdqa xmm8, xmm0
vmovdqa xmm13, xmm1
vpclmulqdq xmm0, xmm10, 0x10
vpclmulqdq xmm8, xmm10 , 0x1
vpclmulqdq xmm1, xmm10, 0x10
vpclmulqdq xmm13, xmm10 , 0x1
vpxor xmm0, xmm9
vxorps xmm0, xmm8
vpxor xmm1, xmm12
vxorps xmm1, xmm13
vmovdqu xmm9, [in_buf+16*2]
vmovdqu xmm12, [in_buf+16*3]
vmovdqa xmm8, xmm2
vmovdqa xmm13, xmm3
vpclmulqdq xmm2, xmm10, 0x10
vpclmulqdq xmm8, xmm10 , 0x1
vpclmulqdq xmm3, xmm10, 0x10
vpclmulqdq xmm13, xmm10 , 0x1
vpxor xmm2, xmm9
vxorps xmm2, xmm8
vpxor xmm3, xmm12
vxorps xmm3, xmm13
vmovdqu xmm9, [in_buf+16*4]
vmovdqu xmm12, [in_buf+16*5]
vmovdqa xmm8, xmm4
vmovdqa xmm13, xmm5
vpclmulqdq xmm4, xmm10, 0x10
vpclmulqdq xmm8, xmm10 , 0x1
vpclmulqdq xmm5, xmm10, 0x10
vpclmulqdq xmm13, xmm10 , 0x1
vpxor xmm4, xmm9
vxorps xmm4, xmm8
vpxor xmm5, xmm12
vxorps xmm5, xmm13
vmovdqu xmm9, [in_buf+16*6]
vmovdqu xmm12, [in_buf+16*7]
vpclmulqdq xmm8, xmm6, xmm10, 0x10
vpclmulqdq xmm6, xmm6, xmm10, 0x1
vpclmulqdq xmm13, xmm7, xmm10, 0x10
vpclmulqdq xmm7, xmm7, xmm10, 0x1
vpxor xmm6, xmm9
vpxor xmm6, xmm8
vpxor xmm7, xmm12
vpxor xmm7, xmm13
sub buf_len, 128
; check if there is another 128B in the buffer to be able to fold
jge _fold_128_B_loop
add in_buf, 128
; at this point, the buffer pointer is pointing at the last y Bytes of the buffer
; the 128B of folded data is in 8 of the xmm registers: xmm0 - xmm7
; fold the 8 xmm registers to 1 xmm register with different constants
vmovdqa xmm10, [rk9]
vpclmulqdq xmm8, xmm0, xmm10, 0x1
vpclmulqdq xmm0, xmm0, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm0
vmovdqa xmm10, [rk11]
vpclmulqdq xmm8, xmm1, xmm10, 0x1
vpclmulqdq xmm1, xmm1, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm1
vmovdqa xmm10, [rk13]
vpclmulqdq xmm8, xmm2, xmm10, 0x1
vpclmulqdq xmm2, xmm2, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm2
vmovdqa xmm10, [rk15]
vpclmulqdq xmm8, xmm3, xmm10, 0x1
vpclmulqdq xmm3, xmm3, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm3
vmovdqa xmm10, [rk17]
vpclmulqdq xmm8, xmm4, xmm10, 0x1
vpclmulqdq xmm4, xmm4, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm4
vmovdqa xmm10, [rk19]
vpclmulqdq xmm8, xmm5, xmm10, 0x1
vpclmulqdq xmm5, xmm5, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm5
vmovdqa xmm10, [rk1] ;xmm10 has rk1 and rk2
;imm value of pclmulqdq instruction will determine which constant to use
vpclmulqdq xmm8, xmm6, xmm10, 0x1
vpclmulqdq xmm6, xmm6, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm6
; instead of 128, we add 112 to the loop counter to save 1 instruction from the loop
; instead of a cmp instruction, we use the negative flag with the jl instruction
add buf_len, 128-16
jl _final_reduction_for_128
; now we have 16+y bytes left to reduce. 16 Bytes is in register xmm7 and the rest is in memory
; we can fold 16 bytes at a time if y>=16
; continue folding 16B at a time
_16B_reduction_loop:
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpxor xmm7, xmm8
vmovdqu xmm0, [in_buf]
vpxor xmm7, xmm0
add in_buf, 16
sub buf_len, 16
; instead of a cmp instruction, we utilize the flags with the jge instruction
; equivalent of: cmp buf_len, 16-16
; check if there is any more 16B in the buffer to be able to fold
jge _16B_reduction_loop
;now we have 16+z bytes left to reduce, where 0<= z < 16.
;first, we reduce the data in the xmm7 register
_final_reduction_for_128:
; check if any more data to fold. If not, compute the CRC of the final 128 bits
add buf_len, 16
je _128_done
; here we are getting data that is less than 16 bytes.
; since we know that there was data before the pointer, we can offset
; the input pointer before the actual point, to receive exactly 16 bytes.
; after that the registers need to be adjusted.
_get_last_two_xmms:
vmovdqa xmm2, xmm7
vmovdqu xmm1, [in_buf - 16 + buf_len]
; get rid of the extra data that was loaded before
; load the shift constant
lea rax, [pshufb_shf_table]
add rax, buf_len
vmovdqu xmm0, [rax]
vpshufb xmm7, xmm0
vpxor xmm0, [mask3]
vpshufb xmm2, xmm0
vpblendvb xmm2, xmm2, xmm1, xmm0
;;;;;;;;;;
vpclmulqdq xmm8, xmm7, xmm10, 0x1
vpclmulqdq xmm7, xmm7, xmm10, 0x10
vpxor xmm7, xmm8
vpxor xmm7, xmm2
_128_done:
; compute crc of a 128-bit value
vmovdqa xmm10, [rk5]
vmovdqa xmm0, xmm7
;64b fold
vpclmulqdq xmm7, xmm10, 0
vpsrldq xmm0, 8
vpxor xmm7, xmm0
;32b fold
vmovdqa xmm0, xmm7
vpslldq xmm7, 4
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm0
;barrett reduction
_barrett:
vpand xmm7, [mask2]
vmovdqa xmm1, xmm7
vmovdqa xmm2, xmm7
vmovdqa xmm10, [rk7]
vpclmulqdq xmm7, xmm10, 0
vpxor xmm7, xmm2
vpand xmm7, [mask]
vmovdqa xmm2, xmm7
vpclmulqdq xmm7, xmm10, 0x10
vpxor xmm7, xmm2
vpxor xmm7, xmm1
vpextrd eax, xmm7, 2
_cleanup:
%ifidn __OUTPUT_FORMAT__, win64
vmovdqa xmm6, [rsp + XMM_OFFSET + 16*0]
vmovdqa xmm7, [rsp + XMM_OFFSET + 16*1]
vmovdqa xmm8, [rsp + XMM_OFFSET + 16*2]
vmovdqa xmm9, [rsp + XMM_OFFSET + 16*3]
vmovdqa xmm10, [rsp + XMM_OFFSET + 16*4]
vmovdqa xmm11, [rsp + XMM_OFFSET + 16*5]
vmovdqa xmm12, [rsp + XMM_OFFSET + 16*6]
vmovdqa xmm13, [rsp + XMM_OFFSET + 16*7]
%endif
add rsp,VARIABLE_OFFSET
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
align 16
_less_than_256:
; check if there is enough buffer to be able to fold 16B at a time
cmp buf_len, 32
jl _less_than_32
; if there is, load the constants
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
vmovd xmm0, init_crc ; get the initial crc value
vmovdqu xmm7, [in_buf] ; load the plaintext
vpxor xmm7, xmm0
; update the buffer pointer
add in_buf, 16
; update the counter. subtract 32 instead of 16 to save one instruction from the loop
sub buf_len, 32
jmp _16B_reduction_loop
align 16
_less_than_32:
; mov initial crc to the return value. this is necessary for zero-length buffers.
mov eax, init_crc
test buf_len, buf_len
je _cleanup
vmovd xmm0, init_crc ; get the initial crc value
cmp buf_len, 16
je _exact_16_left
jl _less_than_16_left
vmovdqu xmm7, [in_buf] ; load the plaintext
vpxor xmm7, xmm0 ; xor the initial crc value
add in_buf, 16
sub buf_len, 16
vmovdqa xmm10, [rk1] ; rk1 and rk2 in xmm10
jmp _get_last_two_xmms
align 16
_less_than_16_left:
; use stack space to load data less than 16 bytes, zero-out the 16B in memory first.
vpxor xmm1, xmm1
mov r11, rsp
vmovdqa [r11], xmm1
cmp buf_len, 4
jl _only_less_than_4
; backup the counter value
mov r9, buf_len
cmp buf_len, 8
jl _less_than_8_left
; load 8 Bytes
mov rax, [in_buf]
mov [r11], rax
add r11, 8
sub buf_len, 8
add in_buf, 8
_less_than_8_left:
cmp buf_len, 4
jl _less_than_4_left
; load 4 Bytes
mov eax, [in_buf]
mov [r11], eax
add r11, 4
sub buf_len, 4
add in_buf, 4
_less_than_4_left:
cmp buf_len, 2
jl _less_than_2_left
; load 2 Bytes
mov ax, [in_buf]
mov [r11], ax
add r11, 2
sub buf_len, 2
add in_buf, 2
_less_than_2_left:
cmp buf_len, 1
jl _zero_left
; load 1 Byte
mov al, [in_buf]
mov [r11], al
_zero_left:
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
lea rax, [pshufb_shf_table]
vmovdqu xmm0, [rax + r9]
vpshufb xmm7, xmm0
jmp _128_done
align 16
_exact_16_left:
vmovdqu xmm7, [in_buf]
vpxor xmm7, xmm0 ; xor the initial crc value
jmp _128_done
_only_less_than_4:
cmp buf_len, 3
jl _only_less_than_3
; load 3 Bytes
mov al, [in_buf]
mov [r11], al
mov al, [in_buf+1]
mov [r11+1], al
mov al, [in_buf+2]
mov [r11+2], al
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
vpslldq xmm7, 5
jmp _barrett
_only_less_than_3:
cmp buf_len, 2
jl _only_less_than_2
; load 2 Bytes
mov al, [in_buf]
mov [r11], al
mov al, [in_buf+1]
mov [r11+1], al
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
vpslldq xmm7, 6
jmp _barrett
_only_less_than_2:
; load 1 Byte
mov al, [in_buf]
mov [r11], al
vmovdqa xmm7, [rsp]
vpxor xmm7, xmm0 ; xor the initial crc value
vpslldq xmm7, 7
jmp _barrett
section .data
; precomputed constants
align 16
rk1: dq 0x00000000493c7d27
rk2: dq 0x0000000ec1068c50
rk3: dq 0x0000000206e38d70
rk4: dq 0x000000006992cea2
rk5: dq 0x00000000493c7d27
rk6: dq 0x00000000dd45aab8
rk7: dq 0x00000000dea713f0
rk8: dq 0x0000000105ec76f0
rk9: dq 0x0000000047db8317
rk10: dq 0x000000002ad91c30
rk11: dq 0x000000000715ce53
rk12: dq 0x00000000c49f4f67
rk13: dq 0x0000000039d3b296
rk14: dq 0x00000000083a6eec
rk15: dq 0x000000009e4addf8
rk16: dq 0x00000000740eef02
rk17: dq 0x00000000ddc0152b
rk18: dq 0x000000001c291d04
rk19: dq 0x00000000ba4fc28e
rk20: dq 0x000000003da6d0cb
mask:
dq 0xFFFFFFFFFFFFFFFF, 0x0000000000000000
mask2:
dq 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF
mask3:
dq 0x8080808080808080, 0x8080808080808080
pshufb_shf_table:
dq 0x8786858483828100, 0x8f8e8d8c8b8a8988
dq 0x0706050403020100, 0x000e0d0c0b0a0908
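
The pshufb_shf_table above is read at a byte offset equal to the number of leftover bytes, so a single 32-byte table yields every shift mask. The following C sketch is a scalar model of that trick, assuming standard pshufb semantics (a mask byte with its high bit set produces zero, otherwise the low four bits select a source byte). The names pshufb_model and shift_by_table are hypothetical, and the flip argument stands for the XOR with mask3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bytes of pshufb_shf_table in memory order (the dq values are little-endian). */
static const uint8_t shf_table[32] = {
    0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x00,
};

/* Scalar model of (v)pshufb on one 128-bit register. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    uint8_t out[16];
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
    memcpy(dst, out, 16);
}

/* Hypothetical helper: shuffle src with the 16-byte mask found at table offset n.
 * flip == 0: bytes move up by 16-n positions, low bytes become zero.
 * flip != 0: mask is XORed with 0x80 per byte (mask3), bytes move down by n. */
static void shift_by_table(uint8_t dst[16], const uint8_t src[16], unsigned n, int flip)
{
    uint8_t mask[16];
    assert(n >= 1 && n <= 15);
    for (int i = 0; i < 16; i++)
        mask[i] = (uint8_t)(shf_table[n + i] ^ (flip ? 0x80 : 0x00));
    pshufb_model(dst, src, mask);
}
```

For n = 4, the unflipped mask places the first four source bytes in positions 12 to 15, while the flipped mask places source bytes 4 to 15 in positions 0 to 11. The two results are complementary, which is what lets the blend in _get_last_two_xmms merge the shifted remainder with the final partial block.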


@ -31,57 +31,55 @@
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/time.h>
#include "crc.h"
#include "test.h"
//#define CACHED_TEST
#ifdef CACHED_TEST
#ifndef GT_L3_CACHE
#define GT_L3_CACHE 32 * 1024 * 1024 /* some number > last level cache */
#endif
#if !defined(COLD_TEST) && !defined(TEST_CUSTOM)
// Cached test, loop many times over small dataset
# define TEST_LEN 8*1024
# define TEST_LOOPS 1000000
# define TEST_TYPE_STR "_warm"
#else
#define TEST_LEN 8 * 1024
#define TEST_TYPE_STR "_warm"
#elif defined(COLD_TEST)
// Uncached test. Pull from large mem base.
# define GT_L3_CACHE 32*1024*1024 /* some number > last level cache */
# define TEST_LEN (2 * GT_L3_CACHE)
# define TEST_LOOPS 500
# define TEST_TYPE_STR "_cold"
#define TEST_LEN (2 * GT_L3_CACHE)
#define TEST_TYPE_STR "_cold"
#endif
#ifndef TEST_SEED
# define TEST_SEED 0x1234
#define TEST_SEED 0x1234
#endif
#define TEST_MEM TEST_LEN
int main(int argc, char *argv[])
int
main(int argc, char *argv[])
{
int i;
void *buf;
uint32_t crc;
struct perf start, stop;
void *buf;
uint32_t crc;
struct perf start;
printf("crc32_iscsi_perf:\n");
printf("crc32_iscsi_perf:\n");
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
if (posix_memalign(&buf, 1024, TEST_LEN)) {
printf("alloc error: Fail");
return -1;
}
printf("Start timed tests\n");
fflush(0);
printf("Start timed tests\n");
fflush(0);
memset(buf, 0, TEST_LEN);
crc = crc32_iscsi(buf, TEST_LEN, TEST_SEED);
perf_start(&start);
for (i = 0; i < TEST_LOOPS; i++) {
crc = crc32_iscsi(buf, TEST_LEN, TEST_SEED);
}
perf_stop(&stop);
printf("crc32_iscsi" TEST_TYPE_STR ": ");
perf_print(stop, start, (long long)TEST_LEN * i);
memset(buf, 0, TEST_LEN);
BENCHMARK(&start, BENCHMARK_TIME, crc = crc32_iscsi(buf, TEST_LEN, TEST_SEED));
printf("crc32_iscsi" TEST_TYPE_STR ": ");
perf_print(start, (long long) TEST_LEN);
printf("finish 0x%x\n", crc);
return 0;
printf("finish 0x%x\n", crc);
// Free allocated memory
aligned_free(buf);
return 0;
}


@ -1,171 +0,0 @@
/**********************************************************************
Copyright(c) 2011-2015 Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include "crc.h"
#include "types.h"
unsigned long crc32_table_iscsi[256] = {
0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4,
0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B,
0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B,
0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54,
0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A,
0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5,
0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45,
0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A,
0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48,
0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687,
0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927,
0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8,
0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096,
0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859,
0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9,
0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36,
0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C,
0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043,
0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3,
0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C,
0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652,
0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D,
0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D,
0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2,
0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530,
0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF,
0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F,
0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540,
0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90,
0x9E902E7B, 0x6CFBAD78, 0x7FAB5E8C, 0x8DC0DD8F,
0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE,
0x24AA3F05, 0xD6C1BC06, 0xC5914FF2, 0x37FACCF1,
0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321,
0xAE7367CA, 0x5C18E4C9, 0x4F48173D, 0xBD23943E,
0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81,
0x34F4F86A, 0xC69F7B69, 0xD5CF889D, 0x27A40B9E,
0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E,
0xBE2DA0A5, 0x4C4623A6, 0x5F16D052, 0xAD7D5351,
};
#define PAGESIZE 10240
int main(void)
{
unsigned int i, j, good, test, init_crc = 1;
printf("crc32_iscsi_test: ");
unsigned char *q_buf = malloc(PAGESIZE);
if (q_buf == NULL) {
printf("alloc of q_buf failed\n");
return -1;
}
// fill q_buf with semi-random data
for (i = 0; i < PAGESIZE; i++)
q_buf[i] = (unsigned char)(i ^ (13 + (i >> 8)) ^ ((i >> 16) - 13));
// Test case 1: Compare against base/simple crc32 implementation and
// try all offsets/alignments of buffer.
for (j = 0; j < 128; j++) {
for (i = 0; i < PAGESIZE - j; i++) {
good = crc32_iscsi_base(q_buf + j, i, -1);
test = crc32_iscsi(q_buf + j, i, -1);
if (good != test) {
printf("Error for size %d offset %d, %08X should be %08X\n",
i, j, test, good);
return -1;
}
} // end for i
putchar('.');
fflush(0);
} // end for j
// Test case 2: Also vary initial CRC
for (j = 0; j < 128; j++) { // do all offsets
for (i = 0; i < PAGESIZE - j; i++) {
good = crc32_iscsi_base(q_buf + j, i, init_crc);
test = crc32_iscsi(q_buf + j, i, init_crc);
if (good != test) {
printf("Error for size %d offset %d, %08X should be %08X\n",
i, j, test, good);
return -1;
}
// modify init_crc semi-randomly
init_crc ^= 1 << ((i * 3 + j * 5) & 31);
} // end for i
putchar('.');
fflush(0);
} // end for j
// Test case 3: do end of buffer
for (i = 0; i < PAGESIZE; i++) {
good = crc32_iscsi_base(q_buf + i, PAGESIZE - i, -1);
test = crc32_iscsi(q_buf + i, PAGESIZE - i, -1);
if (good != test) {
printf("Error for size %d at eob, %08X should be %08X\n",
i, test, good);
return -1;
}
} // end for i
putchar('.');
fflush(0);
printf("Pass\n");
return 0;
}
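
The lookup table in the removed test file can be regenerated from the reflected CRC-32C polynomial, which is one way to check a listing like the one above. This is a minimal sketch, assuming the table is the standard byte-wise table for polynomial 0x82F63B78 (its entry 128 is that value). The names crc32c_table_init and crc32c_table_update are hypothetical and are not the library's crc32_iscsi_base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32c_table[256];

/* Build the 256-entry table: entry i is the CRC state after feeding byte i
 * into a zero CRC, using the reflected polynomial 0x82F63B78. */
static void crc32c_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t r = i;
        for (int bit = 0; bit < 8; bit++)
            r = (r >> 1) ^ (0x82F63B78u & (0u - (r & 1u)));
        crc32c_table[i] = r;
    }
}

/* Table-driven update, one byte per step. Raw convention: no inversion,
 * so a zero-length buffer returns crc unchanged. */
static uint32_t crc32c_table_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc32c_table[(crc ^ buf[i]) & 0xFFu];
    return crc;
}
```

After crc32c_table_init(), entries 1, 128 and 255 should match the listing's 0xF26B8303, 0x82F63B78 and 0xAD7D5351. A test in the style of the one above would then compare crc32c_table_update against the optimized routine for every length and offset.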


@ -27,133 +27,644 @@
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdlib.h>
#include "crc64.h"
#define MAX_ITER 8
// crc64_ecma baseline function
// Slow crc64 from the definition. Can be sped up with a lookup table.
uint64_t crc64_ecma_refl_base(uint64_t seed, const uint8_t * buf, uint64_t len)
{
uint64_t rem = ~seed;
unsigned int i, j;
uint64_t poly = 0xC96C5795D7870F42ULL; // ECMA-182 standard reflected
for (i = 0; i < len; i++) {
rem = rem ^ (uint64_t) buf[i];
for (j = 0; j < MAX_ITER; j++) {
rem = (rem & 0x1ULL ? poly : 0) ^ (rem >> 1);
}
}
return ~rem;
}
uint64_t crc64_ecma_norm_base(uint64_t seed, const uint8_t * buf, uint64_t len)
{
uint64_t rem = ~seed;
unsigned int i, j;
uint64_t poly = 0x42F0E1EBA9EA3693ULL; // ECMA-182 standard
for (i = 0; i < len; i++) {
rem = rem ^ ((uint64_t) buf[i] << 56);
for (j = 0; j < MAX_ITER; j++) {
rem = (rem & 0x8000000000000000ULL ? poly : 0) ^ (rem << 1);
}
}
return ~rem;
}
// crc64_iso baseline function
// Slow crc64 from the definition. Can be sped up with a lookup table.
uint64_t crc64_iso_refl_base(uint64_t seed, const uint8_t * buf, uint64_t len)
{
uint64_t rem = ~seed;
unsigned int i, j;
uint64_t poly = 0xD800000000000000ULL; // ISO standard reflected
for (i = 0; i < len; i++) {
rem = rem ^ (uint64_t) buf[i];
for (j = 0; j < MAX_ITER; j++) {
rem = (rem & 0x1ULL ? poly : 0) ^ (rem >> 1);
}
}
return ~rem;
}
uint64_t crc64_iso_norm_base(uint64_t seed, const uint8_t * buf, uint64_t len)
{
uint64_t rem = ~seed;
unsigned int i, j;
uint64_t poly = 0x000000000000001BULL; // ISO standard
for (i = 0; i < len; i++) {
rem = rem ^ ((uint64_t) buf[i] << 56);
for (j = 0; j < MAX_ITER; j++) {
rem = (rem & 0x8000000000000000ULL ? poly : 0) ^ (rem << 1);
}
}
return ~rem;
}
// crc64_jones baseline function
// Slow crc64 from the definition. Can be sped up with a lookup table.
uint64_t crc64_jones_refl_base(uint64_t seed, const uint8_t * buf, uint64_t len)
{
uint64_t rem = ~seed;
unsigned int i, j;
uint64_t poly = 0x95ac9329ac4bc9b5ULL; // Jones coefficients reflected
for (i = 0; i < len; i++) {
rem = rem ^ (uint64_t) buf[i];
for (j = 0; j < MAX_ITER; j++) {
rem = (rem & 0x1ULL ? poly : 0) ^ (rem >> 1);
}
}
return ~rem;
}
uint64_t crc64_jones_norm_base(uint64_t seed, const uint8_t * buf, uint64_t len)
{
uint64_t rem = ~seed;
unsigned int i, j;
uint64_t poly = 0xad93d23594c935a9ULL; // Jones coefficients
for (i = 0; i < len; i++) {
rem = rem ^ ((uint64_t) buf[i] << 56);
for (j = 0; j < MAX_ITER; j++) {
rem = (rem & 0x8000000000000000ULL ? poly : 0) ^ (rem << 1);
}
}
return ~rem;
}
struct slver {
unsigned short snum;
unsigned char ver;
unsigned char core;
static const uint64_t crc64_ecma_refl_table[256] = {
0x0000000000000000ULL, 0xb32e4cbe03a75f6fULL, 0xf4843657a840a05bULL, 0x47aa7ae9abe7ff34ULL,
0x7bd0c384ff8f5e33ULL, 0xc8fe8f3afc28015cULL, 0x8f54f5d357cffe68ULL, 0x3c7ab96d5468a107ULL,
0xf7a18709ff1ebc66ULL, 0x448fcbb7fcb9e309ULL, 0x0325b15e575e1c3dULL, 0xb00bfde054f94352ULL,
0x8c71448d0091e255ULL, 0x3f5f08330336bd3aULL, 0x78f572daa8d1420eULL, 0xcbdb3e64ab761d61ULL,
0x7d9ba13851336649ULL, 0xceb5ed8652943926ULL, 0x891f976ff973c612ULL, 0x3a31dbd1fad4997dULL,
0x064b62bcaebc387aULL, 0xb5652e02ad1b6715ULL, 0xf2cf54eb06fc9821ULL, 0x41e11855055bc74eULL,
0x8a3a2631ae2dda2fULL, 0x39146a8fad8a8540ULL, 0x7ebe1066066d7a74ULL, 0xcd905cd805ca251bULL,
0xf1eae5b551a2841cULL, 0x42c4a90b5205db73ULL, 0x056ed3e2f9e22447ULL, 0xb6409f5cfa457b28ULL,
0xfb374270a266cc92ULL, 0x48190ecea1c193fdULL, 0x0fb374270a266cc9ULL, 0xbc9d3899098133a6ULL,
0x80e781f45de992a1ULL, 0x33c9cd4a5e4ecdceULL, 0x7463b7a3f5a932faULL, 0xc74dfb1df60e6d95ULL,
0x0c96c5795d7870f4ULL, 0xbfb889c75edf2f9bULL, 0xf812f32ef538d0afULL, 0x4b3cbf90f69f8fc0ULL,
0x774606fda2f72ec7ULL, 0xc4684a43a15071a8ULL, 0x83c230aa0ab78e9cULL, 0x30ec7c140910d1f3ULL,
0x86ace348f355aadbULL, 0x3582aff6f0f2f5b4ULL, 0x7228d51f5b150a80ULL, 0xc10699a158b255efULL,
0xfd7c20cc0cdaf4e8ULL, 0x4e526c720f7dab87ULL, 0x09f8169ba49a54b3ULL, 0xbad65a25a73d0bdcULL,
0x710d64410c4b16bdULL, 0xc22328ff0fec49d2ULL, 0x85895216a40bb6e6ULL, 0x36a71ea8a7ace989ULL,
0x0adda7c5f3c4488eULL, 0xb9f3eb7bf06317e1ULL, 0xfe5991925b84e8d5ULL, 0x4d77dd2c5823b7baULL,
0x64b62bcaebc387a1ULL, 0xd7986774e864d8ceULL, 0x90321d9d438327faULL, 0x231c512340247895ULL,
0x1f66e84e144cd992ULL, 0xac48a4f017eb86fdULL, 0xebe2de19bc0c79c9ULL, 0x58cc92a7bfab26a6ULL,
0x9317acc314dd3bc7ULL, 0x2039e07d177a64a8ULL, 0x67939a94bc9d9b9cULL, 0xd4bdd62abf3ac4f3ULL,
0xe8c76f47eb5265f4ULL, 0x5be923f9e8f53a9bULL, 0x1c4359104312c5afULL, 0xaf6d15ae40b59ac0ULL,
0x192d8af2baf0e1e8ULL, 0xaa03c64cb957be87ULL, 0xeda9bca512b041b3ULL, 0x5e87f01b11171edcULL,
0x62fd4976457fbfdbULL, 0xd1d305c846d8e0b4ULL, 0x96797f21ed3f1f80ULL, 0x2557339fee9840efULL,
0xee8c0dfb45ee5d8eULL, 0x5da24145464902e1ULL, 0x1a083bacedaefdd5ULL, 0xa9267712ee09a2baULL,
0x955cce7fba6103bdULL, 0x267282c1b9c65cd2ULL, 0x61d8f8281221a3e6ULL, 0xd2f6b4961186fc89ULL,
0x9f8169ba49a54b33ULL, 0x2caf25044a02145cULL, 0x6b055fede1e5eb68ULL, 0xd82b1353e242b407ULL,
0xe451aa3eb62a1500ULL, 0x577fe680b58d4a6fULL, 0x10d59c691e6ab55bULL, 0xa3fbd0d71dcdea34ULL,
0x6820eeb3b6bbf755ULL, 0xdb0ea20db51ca83aULL, 0x9ca4d8e41efb570eULL, 0x2f8a945a1d5c0861ULL,
0x13f02d374934a966ULL, 0xa0de61894a93f609ULL, 0xe7741b60e174093dULL, 0x545a57dee2d35652ULL,
0xe21ac88218962d7aULL, 0x5134843c1b317215ULL, 0x169efed5b0d68d21ULL, 0xa5b0b26bb371d24eULL,
0x99ca0b06e7197349ULL, 0x2ae447b8e4be2c26ULL, 0x6d4e3d514f59d312ULL, 0xde6071ef4cfe8c7dULL,
0x15bb4f8be788911cULL, 0xa6950335e42fce73ULL, 0xe13f79dc4fc83147ULL, 0x521135624c6f6e28ULL,
0x6e6b8c0f1807cf2fULL, 0xdd45c0b11ba09040ULL, 0x9aefba58b0476f74ULL, 0x29c1f6e6b3e0301bULL,
0xc96c5795d7870f42ULL, 0x7a421b2bd420502dULL, 0x3de861c27fc7af19ULL, 0x8ec62d7c7c60f076ULL,
0xb2bc941128085171ULL, 0x0192d8af2baf0e1eULL, 0x4638a2468048f12aULL, 0xf516eef883efae45ULL,
0x3ecdd09c2899b324ULL, 0x8de39c222b3eec4bULL, 0xca49e6cb80d9137fULL, 0x7967aa75837e4c10ULL,
0x451d1318d716ed17ULL, 0xf6335fa6d4b1b278ULL, 0xb199254f7f564d4cULL, 0x02b769f17cf11223ULL,
0xb4f7f6ad86b4690bULL, 0x07d9ba1385133664ULL, 0x4073c0fa2ef4c950ULL, 0xf35d8c442d53963fULL,
0xcf273529793b3738ULL, 0x7c0979977a9c6857ULL, 0x3ba3037ed17b9763ULL, 0x888d4fc0d2dcc80cULL,
0x435671a479aad56dULL, 0xf0783d1a7a0d8a02ULL, 0xb7d247f3d1ea7536ULL, 0x04fc0b4dd24d2a59ULL,
0x3886b22086258b5eULL, 0x8ba8fe9e8582d431ULL, 0xcc0284772e652b05ULL, 0x7f2cc8c92dc2746aULL,
0x325b15e575e1c3d0ULL, 0x8175595b76469cbfULL, 0xc6df23b2dda1638bULL, 0x75f16f0cde063ce4ULL,
0x498bd6618a6e9de3ULL, 0xfaa59adf89c9c28cULL, 0xbd0fe036222e3db8ULL, 0x0e21ac88218962d7ULL,
0xc5fa92ec8aff7fb6ULL, 0x76d4de52895820d9ULL, 0x317ea4bb22bfdfedULL, 0x8250e80521188082ULL,
0xbe2a516875702185ULL, 0x0d041dd676d77eeaULL, 0x4aae673fdd3081deULL, 0xf9802b81de97deb1ULL,
0x4fc0b4dd24d2a599ULL, 0xfceef8632775faf6ULL, 0xbb44828a8c9205c2ULL, 0x086ace348f355aadULL,
0x34107759db5dfbaaULL, 0x873e3be7d8faa4c5ULL, 0xc094410e731d5bf1ULL, 0x73ba0db070ba049eULL,
0xb86133d4dbcc19ffULL, 0x0b4f7f6ad86b4690ULL, 0x4ce50583738cb9a4ULL, 0xffcb493d702be6cbULL,
0xc3b1f050244347ccULL, 0x709fbcee27e418a3ULL, 0x3735c6078c03e797ULL, 0x841b8ab98fa4b8f8ULL,
0xadda7c5f3c4488e3ULL, 0x1ef430e13fe3d78cULL, 0x595e4a08940428b8ULL, 0xea7006b697a377d7ULL,
0xd60abfdbc3cbd6d0ULL, 0x6524f365c06c89bfULL, 0x228e898c6b8b768bULL, 0x91a0c532682c29e4ULL,
0x5a7bfb56c35a3485ULL, 0xe955b7e8c0fd6beaULL, 0xaeffcd016b1a94deULL, 0x1dd181bf68bdcbb1ULL,
0x21ab38d23cd56ab6ULL, 0x9285746c3f7235d9ULL, 0xd52f0e859495caedULL, 0x6601423b97329582ULL,
0xd041dd676d77eeaaULL, 0x636f91d96ed0b1c5ULL, 0x24c5eb30c5374ef1ULL, 0x97eba78ec690119eULL,
0xab911ee392f8b099ULL, 0x18bf525d915feff6ULL, 0x5f1528b43ab810c2ULL, 0xec3b640a391f4fadULL,
0x27e05a6e926952ccULL, 0x94ce16d091ce0da3ULL, 0xd3646c393a29f297ULL, 0x604a2087398eadf8ULL,
0x5c3099ea6de60cffULL, 0xef1ed5546e415390ULL, 0xa8b4afbdc5a6aca4ULL, 0x1b9ae303c601f3cbULL,
0x56ed3e2f9e224471ULL, 0xe5c372919d851b1eULL, 0xa26908783662e42aULL, 0x114744c635c5bb45ULL,
0x2d3dfdab61ad1a42ULL, 0x9e13b115620a452dULL, 0xd9b9cbfcc9edba19ULL, 0x6a978742ca4ae576ULL,
0xa14cb926613cf817ULL, 0x1262f598629ba778ULL, 0x55c88f71c97c584cULL, 0xe6e6c3cfcadb0723ULL,
0xda9c7aa29eb3a624ULL, 0x69b2361c9d14f94bULL, 0x2e184cf536f3067fULL, 0x9d36004b35545910ULL,
0x2b769f17cf112238ULL, 0x9858d3a9ccb67d57ULL, 0xdff2a94067518263ULL, 0x6cdce5fe64f6dd0cULL,
0x50a65c93309e7c0bULL, 0xe388102d33392364ULL, 0xa4226ac498dedc50ULL, 0x170c267a9b79833fULL,
0xdcd7181e300f9e5eULL, 0x6ff954a033a8c131ULL, 0x28532e49984f3e05ULL, 0x9b7d62f79be8616aULL,
0xa707db9acf80c06dULL, 0x14299724cc279f02ULL, 0x5383edcd67c06036ULL, 0xe0ada17364673f59ULL
};
struct slver crc64_ecma_refl_base_slver_0000001c;
struct slver crc64_ecma_refl_base_slver = { 0x001c, 0x00, 0x00 };
static const uint64_t crc64_ecma_norm_table[256] = {
0x0000000000000000ULL, 0x42f0e1eba9ea3693ULL, 0x85e1c3d753d46d26ULL, 0xc711223cfa3e5bb5ULL,
0x493366450e42ecdfULL, 0x0bc387aea7a8da4cULL, 0xccd2a5925d9681f9ULL, 0x8e224479f47cb76aULL,
0x9266cc8a1c85d9beULL, 0xd0962d61b56fef2dULL, 0x17870f5d4f51b498ULL, 0x5577eeb6e6bb820bULL,
0xdb55aacf12c73561ULL, 0x99a54b24bb2d03f2ULL, 0x5eb4691841135847ULL, 0x1c4488f3e8f96ed4ULL,
0x663d78ff90e185efULL, 0x24cd9914390bb37cULL, 0xe3dcbb28c335e8c9ULL, 0xa12c5ac36adfde5aULL,
0x2f0e1eba9ea36930ULL, 0x6dfeff5137495fa3ULL, 0xaaefdd6dcd770416ULL, 0xe81f3c86649d3285ULL,
0xf45bb4758c645c51ULL, 0xb6ab559e258e6ac2ULL, 0x71ba77a2dfb03177ULL, 0x334a9649765a07e4ULL,
0xbd68d2308226b08eULL, 0xff9833db2bcc861dULL, 0x388911e7d1f2dda8ULL, 0x7a79f00c7818eb3bULL,
0xcc7af1ff21c30bdeULL, 0x8e8a101488293d4dULL, 0x499b3228721766f8ULL, 0x0b6bd3c3dbfd506bULL,
0x854997ba2f81e701ULL, 0xc7b97651866bd192ULL, 0x00a8546d7c558a27ULL, 0x4258b586d5bfbcb4ULL,
0x5e1c3d753d46d260ULL, 0x1cecdc9e94ace4f3ULL, 0xdbfdfea26e92bf46ULL, 0x990d1f49c77889d5ULL,
0x172f5b3033043ebfULL, 0x55dfbadb9aee082cULL, 0x92ce98e760d05399ULL, 0xd03e790cc93a650aULL,
0xaa478900b1228e31ULL, 0xe8b768eb18c8b8a2ULL, 0x2fa64ad7e2f6e317ULL, 0x6d56ab3c4b1cd584ULL,
0xe374ef45bf6062eeULL, 0xa1840eae168a547dULL, 0x66952c92ecb40fc8ULL, 0x2465cd79455e395bULL,
0x3821458aada7578fULL, 0x7ad1a461044d611cULL, 0xbdc0865dfe733aa9ULL, 0xff3067b657990c3aULL,
0x711223cfa3e5bb50ULL, 0x33e2c2240a0f8dc3ULL, 0xf4f3e018f031d676ULL, 0xb60301f359dbe0e5ULL,
0xda050215ea6c212fULL, 0x98f5e3fe438617bcULL, 0x5fe4c1c2b9b84c09ULL, 0x1d14202910527a9aULL,
0x93366450e42ecdf0ULL, 0xd1c685bb4dc4fb63ULL, 0x16d7a787b7faa0d6ULL, 0x5427466c1e109645ULL,
0x4863ce9ff6e9f891ULL, 0x0a932f745f03ce02ULL, 0xcd820d48a53d95b7ULL, 0x8f72eca30cd7a324ULL,
0x0150a8daf8ab144eULL, 0x43a04931514122ddULL, 0x84b16b0dab7f7968ULL, 0xc6418ae602954ffbULL,
0xbc387aea7a8da4c0ULL, 0xfec89b01d3679253ULL, 0x39d9b93d2959c9e6ULL, 0x7b2958d680b3ff75ULL,
0xf50b1caf74cf481fULL, 0xb7fbfd44dd257e8cULL, 0x70eadf78271b2539ULL, 0x321a3e938ef113aaULL,
0x2e5eb66066087d7eULL, 0x6cae578bcfe24bedULL, 0xabbf75b735dc1058ULL, 0xe94f945c9c3626cbULL,
0x676dd025684a91a1ULL, 0x259d31cec1a0a732ULL, 0xe28c13f23b9efc87ULL, 0xa07cf2199274ca14ULL,
0x167ff3eacbaf2af1ULL, 0x548f120162451c62ULL, 0x939e303d987b47d7ULL, 0xd16ed1d631917144ULL,
0x5f4c95afc5edc62eULL, 0x1dbc74446c07f0bdULL, 0xdaad56789639ab08ULL, 0x985db7933fd39d9bULL,
0x84193f60d72af34fULL, 0xc6e9de8b7ec0c5dcULL, 0x01f8fcb784fe9e69ULL, 0x43081d5c2d14a8faULL,
0xcd2a5925d9681f90ULL, 0x8fdab8ce70822903ULL, 0x48cb9af28abc72b6ULL, 0x0a3b7b1923564425ULL,
0x70428b155b4eaf1eULL, 0x32b26afef2a4998dULL, 0xf5a348c2089ac238ULL, 0xb753a929a170f4abULL,
0x3971ed50550c43c1ULL, 0x7b810cbbfce67552ULL, 0xbc902e8706d82ee7ULL, 0xfe60cf6caf321874ULL,
0xe224479f47cb76a0ULL, 0xa0d4a674ee214033ULL, 0x67c58448141f1b86ULL, 0x253565a3bdf52d15ULL,
0xab1721da49899a7fULL, 0xe9e7c031e063acecULL, 0x2ef6e20d1a5df759ULL, 0x6c0603e6b3b7c1caULL,
0xf6fae5c07d3274cdULL, 0xb40a042bd4d8425eULL, 0x731b26172ee619ebULL, 0x31ebc7fc870c2f78ULL,
0xbfc9838573709812ULL, 0xfd39626eda9aae81ULL, 0x3a28405220a4f534ULL, 0x78d8a1b9894ec3a7ULL,
0x649c294a61b7ad73ULL, 0x266cc8a1c85d9be0ULL, 0xe17dea9d3263c055ULL, 0xa38d0b769b89f6c6ULL,
0x2daf4f0f6ff541acULL, 0x6f5faee4c61f773fULL, 0xa84e8cd83c212c8aULL, 0xeabe6d3395cb1a19ULL,
0x90c79d3fedd3f122ULL, 0xd2377cd44439c7b1ULL, 0x15265ee8be079c04ULL, 0x57d6bf0317edaa97ULL,
0xd9f4fb7ae3911dfdULL, 0x9b041a914a7b2b6eULL, 0x5c1538adb04570dbULL, 0x1ee5d94619af4648ULL,
0x02a151b5f156289cULL, 0x4051b05e58bc1e0fULL, 0x87409262a28245baULL, 0xc5b073890b687329ULL,
0x4b9237f0ff14c443ULL, 0x0962d61b56fef2d0ULL, 0xce73f427acc0a965ULL, 0x8c8315cc052a9ff6ULL,
0x3a80143f5cf17f13ULL, 0x7870f5d4f51b4980ULL, 0xbf61d7e80f251235ULL, 0xfd913603a6cf24a6ULL,
0x73b3727a52b393ccULL, 0x31439391fb59a55fULL, 0xf652b1ad0167feeaULL, 0xb4a25046a88dc879ULL,
0xa8e6d8b54074a6adULL, 0xea16395ee99e903eULL, 0x2d071b6213a0cb8bULL, 0x6ff7fa89ba4afd18ULL,
0xe1d5bef04e364a72ULL, 0xa3255f1be7dc7ce1ULL, 0x64347d271de22754ULL, 0x26c49cccb40811c7ULL,
0x5cbd6cc0cc10fafcULL, 0x1e4d8d2b65facc6fULL, 0xd95caf179fc497daULL, 0x9bac4efc362ea149ULL,
0x158e0a85c2521623ULL, 0x577eeb6e6bb820b0ULL, 0x906fc95291867b05ULL, 0xd29f28b9386c4d96ULL,
0xcedba04ad0952342ULL, 0x8c2b41a1797f15d1ULL, 0x4b3a639d83414e64ULL, 0x09ca82762aab78f7ULL,
0x87e8c60fded7cf9dULL, 0xc51827e4773df90eULL, 0x020905d88d03a2bbULL, 0x40f9e43324e99428ULL,
0x2cffe7d5975e55e2ULL, 0x6e0f063e3eb46371ULL, 0xa91e2402c48a38c4ULL, 0xebeec5e96d600e57ULL,
0x65cc8190991cb93dULL, 0x273c607b30f68faeULL, 0xe02d4247cac8d41bULL, 0xa2dda3ac6322e288ULL,
0xbe992b5f8bdb8c5cULL, 0xfc69cab42231bacfULL, 0x3b78e888d80fe17aULL, 0x7988096371e5d7e9ULL,
0xf7aa4d1a85996083ULL, 0xb55aacf12c735610ULL, 0x724b8ecdd64d0da5ULL, 0x30bb6f267fa73b36ULL,
0x4ac29f2a07bfd00dULL, 0x08327ec1ae55e69eULL, 0xcf235cfd546bbd2bULL, 0x8dd3bd16fd818bb8ULL,
0x03f1f96f09fd3cd2ULL, 0x41011884a0170a41ULL, 0x86103ab85a2951f4ULL, 0xc4e0db53f3c36767ULL,
0xd8a453a01b3a09b3ULL, 0x9a54b24bb2d03f20ULL, 0x5d45907748ee6495ULL, 0x1fb5719ce1045206ULL,
0x919735e51578e56cULL, 0xd367d40ebc92d3ffULL, 0x1476f63246ac884aULL, 0x568617d9ef46bed9ULL,
0xe085162ab69d5e3cULL, 0xa275f7c11f7768afULL, 0x6564d5fde549331aULL, 0x279434164ca30589ULL,
0xa9b6706fb8dfb2e3ULL, 0xeb46918411358470ULL, 0x2c57b3b8eb0bdfc5ULL, 0x6ea7525342e1e956ULL,
0x72e3daa0aa188782ULL, 0x30133b4b03f2b111ULL, 0xf7021977f9cceaa4ULL, 0xb5f2f89c5026dc37ULL,
0x3bd0bce5a45a6b5dULL, 0x79205d0e0db05dceULL, 0xbe317f32f78e067bULL, 0xfcc19ed95e6430e8ULL,
0x86b86ed5267cdbd3ULL, 0xc4488f3e8f96ed40ULL, 0x0359ad0275a8b6f5ULL, 0x41a94ce9dc428066ULL,
0xcf8b0890283e370cULL, 0x8d7be97b81d4019fULL, 0x4a6acb477bea5a2aULL, 0x089a2aacd2006cb9ULL,
0x14dea25f3af9026dULL, 0x562e43b4931334feULL, 0x913f6188692d6f4bULL, 0xd3cf8063c0c759d8ULL,
0x5dedc41a34bbeeb2ULL, 0x1f1d25f19d51d821ULL, 0xd80c07cd676f8394ULL, 0x9afce626ce85b507ULL
};
struct slver crc64_ecma_norm_base_slver_00000019;
struct slver crc64_ecma_norm_base_slver = { 0x0019, 0x00, 0x00 };
static const uint64_t crc64_iso_refl_table[256] = {
0x0000000000000000ULL, 0x01b0000000000000ULL, 0x0360000000000000ULL, 0x02d0000000000000ULL,
0x06c0000000000000ULL, 0x0770000000000000ULL, 0x05a0000000000000ULL, 0x0410000000000000ULL,
0x0d80000000000000ULL, 0x0c30000000000000ULL, 0x0ee0000000000000ULL, 0x0f50000000000000ULL,
0x0b40000000000000ULL, 0x0af0000000000000ULL, 0x0820000000000000ULL, 0x0990000000000000ULL,
0x1b00000000000000ULL, 0x1ab0000000000000ULL, 0x1860000000000000ULL, 0x19d0000000000000ULL,
0x1dc0000000000000ULL, 0x1c70000000000000ULL, 0x1ea0000000000000ULL, 0x1f10000000000000ULL,
0x1680000000000000ULL, 0x1730000000000000ULL, 0x15e0000000000000ULL, 0x1450000000000000ULL,
0x1040000000000000ULL, 0x11f0000000000000ULL, 0x1320000000000000ULL, 0x1290000000000000ULL,
0x3600000000000000ULL, 0x37b0000000000000ULL, 0x3560000000000000ULL, 0x34d0000000000000ULL,
0x30c0000000000000ULL, 0x3170000000000000ULL, 0x33a0000000000000ULL, 0x3210000000000000ULL,
0x3b80000000000000ULL, 0x3a30000000000000ULL, 0x38e0000000000000ULL, 0x3950000000000000ULL,
0x3d40000000000000ULL, 0x3cf0000000000000ULL, 0x3e20000000000000ULL, 0x3f90000000000000ULL,
0x2d00000000000000ULL, 0x2cb0000000000000ULL, 0x2e60000000000000ULL, 0x2fd0000000000000ULL,
0x2bc0000000000000ULL, 0x2a70000000000000ULL, 0x28a0000000000000ULL, 0x2910000000000000ULL,
0x2080000000000000ULL, 0x2130000000000000ULL, 0x23e0000000000000ULL, 0x2250000000000000ULL,
0x2640000000000000ULL, 0x27f0000000000000ULL, 0x2520000000000000ULL, 0x2490000000000000ULL,
0x6c00000000000000ULL, 0x6db0000000000000ULL, 0x6f60000000000000ULL, 0x6ed0000000000000ULL,
0x6ac0000000000000ULL, 0x6b70000000000000ULL, 0x69a0000000000000ULL, 0x6810000000000000ULL,
0x6180000000000000ULL, 0x6030000000000000ULL, 0x62e0000000000000ULL, 0x6350000000000000ULL,
0x6740000000000000ULL, 0x66f0000000000000ULL, 0x6420000000000000ULL, 0x6590000000000000ULL,
0x7700000000000000ULL, 0x76b0000000000000ULL, 0x7460000000000000ULL, 0x75d0000000000000ULL,
0x71c0000000000000ULL, 0x7070000000000000ULL, 0x72a0000000000000ULL, 0x7310000000000000ULL,
0x7a80000000000000ULL, 0x7b30000000000000ULL, 0x79e0000000000000ULL, 0x7850000000000000ULL,
0x7c40000000000000ULL, 0x7df0000000000000ULL, 0x7f20000000000000ULL, 0x7e90000000000000ULL,
0x5a00000000000000ULL, 0x5bb0000000000000ULL, 0x5960000000000000ULL, 0x58d0000000000000ULL,
0x5cc0000000000000ULL, 0x5d70000000000000ULL, 0x5fa0000000000000ULL, 0x5e10000000000000ULL,
0x5780000000000000ULL, 0x5630000000000000ULL, 0x54e0000000000000ULL, 0x5550000000000000ULL,
0x5140000000000000ULL, 0x50f0000000000000ULL, 0x5220000000000000ULL, 0x5390000000000000ULL,
0x4100000000000000ULL, 0x40b0000000000000ULL, 0x4260000000000000ULL, 0x43d0000000000000ULL,
0x47c0000000000000ULL, 0x4670000000000000ULL, 0x44a0000000000000ULL, 0x4510000000000000ULL,
0x4c80000000000000ULL, 0x4d30000000000000ULL, 0x4fe0000000000000ULL, 0x4e50000000000000ULL,
0x4a40000000000000ULL, 0x4bf0000000000000ULL, 0x4920000000000000ULL, 0x4890000000000000ULL,
0xd800000000000000ULL, 0xd9b0000000000000ULL, 0xdb60000000000000ULL, 0xdad0000000000000ULL,
0xdec0000000000000ULL, 0xdf70000000000000ULL, 0xdda0000000000000ULL, 0xdc10000000000000ULL,
0xd580000000000000ULL, 0xd430000000000000ULL, 0xd6e0000000000000ULL, 0xd750000000000000ULL,
0xd340000000000000ULL, 0xd2f0000000000000ULL, 0xd020000000000000ULL, 0xd190000000000000ULL,
0xc300000000000000ULL, 0xc2b0000000000000ULL, 0xc060000000000000ULL, 0xc1d0000000000000ULL,
0xc5c0000000000000ULL, 0xc470000000000000ULL, 0xc6a0000000000000ULL, 0xc710000000000000ULL,
0xce80000000000000ULL, 0xcf30000000000000ULL, 0xcde0000000000000ULL, 0xcc50000000000000ULL,
0xc840000000000000ULL, 0xc9f0000000000000ULL, 0xcb20000000000000ULL, 0xca90000000000000ULL,
0xee00000000000000ULL, 0xefb0000000000000ULL, 0xed60000000000000ULL, 0xecd0000000000000ULL,
0xe8c0000000000000ULL, 0xe970000000000000ULL, 0xeba0000000000000ULL, 0xea10000000000000ULL,
0xe380000000000000ULL, 0xe230000000000000ULL, 0xe0e0000000000000ULL, 0xe150000000000000ULL,
0xe540000000000000ULL, 0xe4f0000000000000ULL, 0xe620000000000000ULL, 0xe790000000000000ULL,
0xf500000000000000ULL, 0xf4b0000000000000ULL, 0xf660000000000000ULL, 0xf7d0000000000000ULL,
0xf3c0000000000000ULL, 0xf270000000000000ULL, 0xf0a0000000000000ULL, 0xf110000000000000ULL,
0xf880000000000000ULL, 0xf930000000000000ULL, 0xfbe0000000000000ULL, 0xfa50000000000000ULL,
0xfe40000000000000ULL, 0xfff0000000000000ULL, 0xfd20000000000000ULL, 0xfc90000000000000ULL,
0xb400000000000000ULL, 0xb5b0000000000000ULL, 0xb760000000000000ULL, 0xb6d0000000000000ULL,
0xb2c0000000000000ULL, 0xb370000000000000ULL, 0xb1a0000000000000ULL, 0xb010000000000000ULL,
0xb980000000000000ULL, 0xb830000000000000ULL, 0xbae0000000000000ULL, 0xbb50000000000000ULL,
0xbf40000000000000ULL, 0xbef0000000000000ULL, 0xbc20000000000000ULL, 0xbd90000000000000ULL,
0xaf00000000000000ULL, 0xaeb0000000000000ULL, 0xac60000000000000ULL, 0xadd0000000000000ULL,
0xa9c0000000000000ULL, 0xa870000000000000ULL, 0xaaa0000000000000ULL, 0xab10000000000000ULL,
0xa280000000000000ULL, 0xa330000000000000ULL, 0xa1e0000000000000ULL, 0xa050000000000000ULL,
0xa440000000000000ULL, 0xa5f0000000000000ULL, 0xa720000000000000ULL, 0xa690000000000000ULL,
0x8200000000000000ULL, 0x83b0000000000000ULL, 0x8160000000000000ULL, 0x80d0000000000000ULL,
0x84c0000000000000ULL, 0x8570000000000000ULL, 0x87a0000000000000ULL, 0x8610000000000000ULL,
0x8f80000000000000ULL, 0x8e30000000000000ULL, 0x8ce0000000000000ULL, 0x8d50000000000000ULL,
0x8940000000000000ULL, 0x88f0000000000000ULL, 0x8a20000000000000ULL, 0x8b90000000000000ULL,
0x9900000000000000ULL, 0x98b0000000000000ULL, 0x9a60000000000000ULL, 0x9bd0000000000000ULL,
0x9fc0000000000000ULL, 0x9e70000000000000ULL, 0x9ca0000000000000ULL, 0x9d10000000000000ULL,
0x9480000000000000ULL, 0x9530000000000000ULL, 0x97e0000000000000ULL, 0x9650000000000000ULL,
0x9240000000000000ULL, 0x93f0000000000000ULL, 0x9120000000000000ULL, 0x9090000000000000ULL
};
struct slver crc64_iso_refl_base_slver_00000022;
struct slver crc64_iso_refl_base_slver = { 0x0022, 0x00, 0x00 };
static const uint64_t crc64_iso_norm_table[256] = {
0x0000000000000000ULL, 0x000000000000001bULL, 0x0000000000000036ULL, 0x000000000000002dULL,
0x000000000000006cULL, 0x0000000000000077ULL, 0x000000000000005aULL, 0x0000000000000041ULL,
0x00000000000000d8ULL, 0x00000000000000c3ULL, 0x00000000000000eeULL, 0x00000000000000f5ULL,
0x00000000000000b4ULL, 0x00000000000000afULL, 0x0000000000000082ULL, 0x0000000000000099ULL,
0x00000000000001b0ULL, 0x00000000000001abULL, 0x0000000000000186ULL, 0x000000000000019dULL,
0x00000000000001dcULL, 0x00000000000001c7ULL, 0x00000000000001eaULL, 0x00000000000001f1ULL,
0x0000000000000168ULL, 0x0000000000000173ULL, 0x000000000000015eULL, 0x0000000000000145ULL,
0x0000000000000104ULL, 0x000000000000011fULL, 0x0000000000000132ULL, 0x0000000000000129ULL,
0x0000000000000360ULL, 0x000000000000037bULL, 0x0000000000000356ULL, 0x000000000000034dULL,
0x000000000000030cULL, 0x0000000000000317ULL, 0x000000000000033aULL, 0x0000000000000321ULL,
0x00000000000003b8ULL, 0x00000000000003a3ULL, 0x000000000000038eULL, 0x0000000000000395ULL,
0x00000000000003d4ULL, 0x00000000000003cfULL, 0x00000000000003e2ULL, 0x00000000000003f9ULL,
0x00000000000002d0ULL, 0x00000000000002cbULL, 0x00000000000002e6ULL, 0x00000000000002fdULL,
0x00000000000002bcULL, 0x00000000000002a7ULL, 0x000000000000028aULL, 0x0000000000000291ULL,
0x0000000000000208ULL, 0x0000000000000213ULL, 0x000000000000023eULL, 0x0000000000000225ULL,
0x0000000000000264ULL, 0x000000000000027fULL, 0x0000000000000252ULL, 0x0000000000000249ULL,
0x00000000000006c0ULL, 0x00000000000006dbULL, 0x00000000000006f6ULL, 0x00000000000006edULL,
0x00000000000006acULL, 0x00000000000006b7ULL, 0x000000000000069aULL, 0x0000000000000681ULL,
0x0000000000000618ULL, 0x0000000000000603ULL, 0x000000000000062eULL, 0x0000000000000635ULL,
0x0000000000000674ULL, 0x000000000000066fULL, 0x0000000000000642ULL, 0x0000000000000659ULL,
0x0000000000000770ULL, 0x000000000000076bULL, 0x0000000000000746ULL, 0x000000000000075dULL,
0x000000000000071cULL, 0x0000000000000707ULL, 0x000000000000072aULL, 0x0000000000000731ULL,
0x00000000000007a8ULL, 0x00000000000007b3ULL, 0x000000000000079eULL, 0x0000000000000785ULL,
0x00000000000007c4ULL, 0x00000000000007dfULL, 0x00000000000007f2ULL, 0x00000000000007e9ULL,
0x00000000000005a0ULL, 0x00000000000005bbULL, 0x0000000000000596ULL, 0x000000000000058dULL,
0x00000000000005ccULL, 0x00000000000005d7ULL, 0x00000000000005faULL, 0x00000000000005e1ULL,
0x0000000000000578ULL, 0x0000000000000563ULL, 0x000000000000054eULL, 0x0000000000000555ULL,
0x0000000000000514ULL, 0x000000000000050fULL, 0x0000000000000522ULL, 0x0000000000000539ULL,
0x0000000000000410ULL, 0x000000000000040bULL, 0x0000000000000426ULL, 0x000000000000043dULL,
0x000000000000047cULL, 0x0000000000000467ULL, 0x000000000000044aULL, 0x0000000000000451ULL,
0x00000000000004c8ULL, 0x00000000000004d3ULL, 0x00000000000004feULL, 0x00000000000004e5ULL,
0x00000000000004a4ULL, 0x00000000000004bfULL, 0x0000000000000492ULL, 0x0000000000000489ULL,
0x0000000000000d80ULL, 0x0000000000000d9bULL, 0x0000000000000db6ULL, 0x0000000000000dadULL,
0x0000000000000decULL, 0x0000000000000df7ULL, 0x0000000000000ddaULL, 0x0000000000000dc1ULL,
0x0000000000000d58ULL, 0x0000000000000d43ULL, 0x0000000000000d6eULL, 0x0000000000000d75ULL,
0x0000000000000d34ULL, 0x0000000000000d2fULL, 0x0000000000000d02ULL, 0x0000000000000d19ULL,
0x0000000000000c30ULL, 0x0000000000000c2bULL, 0x0000000000000c06ULL, 0x0000000000000c1dULL,
0x0000000000000c5cULL, 0x0000000000000c47ULL, 0x0000000000000c6aULL, 0x0000000000000c71ULL,
0x0000000000000ce8ULL, 0x0000000000000cf3ULL, 0x0000000000000cdeULL, 0x0000000000000cc5ULL,
0x0000000000000c84ULL, 0x0000000000000c9fULL, 0x0000000000000cb2ULL, 0x0000000000000ca9ULL,
0x0000000000000ee0ULL, 0x0000000000000efbULL, 0x0000000000000ed6ULL, 0x0000000000000ecdULL,
0x0000000000000e8cULL, 0x0000000000000e97ULL, 0x0000000000000ebaULL, 0x0000000000000ea1ULL,
0x0000000000000e38ULL, 0x0000000000000e23ULL, 0x0000000000000e0eULL, 0x0000000000000e15ULL,
0x0000000000000e54ULL, 0x0000000000000e4fULL, 0x0000000000000e62ULL, 0x0000000000000e79ULL,
0x0000000000000f50ULL, 0x0000000000000f4bULL, 0x0000000000000f66ULL, 0x0000000000000f7dULL,
0x0000000000000f3cULL, 0x0000000000000f27ULL, 0x0000000000000f0aULL, 0x0000000000000f11ULL,
0x0000000000000f88ULL, 0x0000000000000f93ULL, 0x0000000000000fbeULL, 0x0000000000000fa5ULL,
0x0000000000000fe4ULL, 0x0000000000000fffULL, 0x0000000000000fd2ULL, 0x0000000000000fc9ULL,
0x0000000000000b40ULL, 0x0000000000000b5bULL, 0x0000000000000b76ULL, 0x0000000000000b6dULL,
0x0000000000000b2cULL, 0x0000000000000b37ULL, 0x0000000000000b1aULL, 0x0000000000000b01ULL,
0x0000000000000b98ULL, 0x0000000000000b83ULL, 0x0000000000000baeULL, 0x0000000000000bb5ULL,
0x0000000000000bf4ULL, 0x0000000000000befULL, 0x0000000000000bc2ULL, 0x0000000000000bd9ULL,
0x0000000000000af0ULL, 0x0000000000000aebULL, 0x0000000000000ac6ULL, 0x0000000000000addULL,
0x0000000000000a9cULL, 0x0000000000000a87ULL, 0x0000000000000aaaULL, 0x0000000000000ab1ULL,
0x0000000000000a28ULL, 0x0000000000000a33ULL, 0x0000000000000a1eULL, 0x0000000000000a05ULL,
0x0000000000000a44ULL, 0x0000000000000a5fULL, 0x0000000000000a72ULL, 0x0000000000000a69ULL,
0x0000000000000820ULL, 0x000000000000083bULL, 0x0000000000000816ULL, 0x000000000000080dULL,
0x000000000000084cULL, 0x0000000000000857ULL, 0x000000000000087aULL, 0x0000000000000861ULL,
0x00000000000008f8ULL, 0x00000000000008e3ULL, 0x00000000000008ceULL, 0x00000000000008d5ULL,
0x0000000000000894ULL, 0x000000000000088fULL, 0x00000000000008a2ULL, 0x00000000000008b9ULL,
0x0000000000000990ULL, 0x000000000000098bULL, 0x00000000000009a6ULL, 0x00000000000009bdULL,
0x00000000000009fcULL, 0x00000000000009e7ULL, 0x00000000000009caULL, 0x00000000000009d1ULL,
0x0000000000000948ULL, 0x0000000000000953ULL, 0x000000000000097eULL, 0x0000000000000965ULL,
0x0000000000000924ULL, 0x000000000000093fULL, 0x0000000000000912ULL, 0x0000000000000909ULL
};
struct slver crc64_iso_norm_base_slver_0000001f;
struct slver crc64_iso_norm_base_slver = { 0x001f, 0x00, 0x00 };
static const uint64_t crc64_jones_refl_table[256] = {
0x0000000000000000ULL, 0x7ad870c830358979ULL, 0xf5b0e190606b12f2ULL, 0x8f689158505e9b8bULL,
0xc038e5739841b68fULL, 0xbae095bba8743ff6ULL, 0x358804e3f82aa47dULL, 0x4f50742bc81f2d04ULL,
0xab28ecb46814fe75ULL, 0xd1f09c7c5821770cULL, 0x5e980d24087fec87ULL, 0x24407dec384a65feULL,
0x6b1009c7f05548faULL, 0x11c8790fc060c183ULL, 0x9ea0e857903e5a08ULL, 0xe478989fa00bd371ULL,
0x7d08ff3b88be6f81ULL, 0x07d08ff3b88be6f8ULL, 0x88b81eabe8d57d73ULL, 0xf2606e63d8e0f40aULL,
0xbd301a4810ffd90eULL, 0xc7e86a8020ca5077ULL, 0x4880fbd87094cbfcULL, 0x32588b1040a14285ULL,
0xd620138fe0aa91f4ULL, 0xacf86347d09f188dULL, 0x2390f21f80c18306ULL, 0x594882d7b0f40a7fULL,
0x1618f6fc78eb277bULL, 0x6cc0863448deae02ULL, 0xe3a8176c18803589ULL, 0x997067a428b5bcf0ULL,
0xfa11fe77117cdf02ULL, 0x80c98ebf2149567bULL, 0x0fa11fe77117cdf0ULL, 0x75796f2f41224489ULL,
0x3a291b04893d698dULL, 0x40f16bccb908e0f4ULL, 0xcf99fa94e9567b7fULL, 0xb5418a5cd963f206ULL,
0x513912c379682177ULL, 0x2be1620b495da80eULL, 0xa489f35319033385ULL, 0xde51839b2936bafcULL,
0x9101f7b0e12997f8ULL, 0xebd98778d11c1e81ULL, 0x64b116208142850aULL, 0x1e6966e8b1770c73ULL,
0x8719014c99c2b083ULL, 0xfdc17184a9f739faULL, 0x72a9e0dcf9a9a271ULL, 0x08719014c99c2b08ULL,
0x4721e43f0183060cULL, 0x3df994f731b68f75ULL, 0xb29105af61e814feULL, 0xc849756751dd9d87ULL,
0x2c31edf8f1d64ef6ULL, 0x56e99d30c1e3c78fULL, 0xd9810c6891bd5c04ULL, 0xa3597ca0a188d57dULL,
0xec09088b6997f879ULL, 0x96d1784359a27100ULL, 0x19b9e91b09fcea8bULL, 0x636199d339c963f2ULL,
0xdf7adabd7a6e2d6fULL, 0xa5a2aa754a5ba416ULL, 0x2aca3b2d1a053f9dULL, 0x50124be52a30b6e4ULL,
0x1f423fcee22f9be0ULL, 0x659a4f06d21a1299ULL, 0xeaf2de5e82448912ULL, 0x902aae96b271006bULL,
0x74523609127ad31aULL, 0x0e8a46c1224f5a63ULL, 0x81e2d7997211c1e8ULL, 0xfb3aa75142244891ULL,
0xb46ad37a8a3b6595ULL, 0xceb2a3b2ba0eececULL, 0x41da32eaea507767ULL, 0x3b024222da65fe1eULL,
0xa2722586f2d042eeULL, 0xd8aa554ec2e5cb97ULL, 0x57c2c41692bb501cULL, 0x2d1ab4dea28ed965ULL,
0x624ac0f56a91f461ULL, 0x1892b03d5aa47d18ULL, 0x97fa21650afae693ULL, 0xed2251ad3acf6feaULL,
0x095ac9329ac4bc9bULL, 0x7382b9faaaf135e2ULL, 0xfcea28a2faafae69ULL, 0x8632586aca9a2710ULL,
0xc9622c4102850a14ULL, 0xb3ba5c8932b0836dULL, 0x3cd2cdd162ee18e6ULL, 0x460abd1952db919fULL,
0x256b24ca6b12f26dULL, 0x5fb354025b277b14ULL, 0xd0dbc55a0b79e09fULL, 0xaa03b5923b4c69e6ULL,
0xe553c1b9f35344e2ULL, 0x9f8bb171c366cd9bULL, 0x10e3202993385610ULL, 0x6a3b50e1a30ddf69ULL,
0x8e43c87e03060c18ULL, 0xf49bb8b633338561ULL, 0x7bf329ee636d1eeaULL, 0x012b592653589793ULL,
0x4e7b2d0d9b47ba97ULL, 0x34a35dc5ab7233eeULL, 0xbbcbcc9dfb2ca865ULL, 0xc113bc55cb19211cULL,
0x5863dbf1e3ac9decULL, 0x22bbab39d3991495ULL, 0xadd33a6183c78f1eULL, 0xd70b4aa9b3f20667ULL,
0x985b3e827bed2b63ULL, 0xe2834e4a4bd8a21aULL, 0x6debdf121b863991ULL, 0x1733afda2bb3b0e8ULL,
0xf34b37458bb86399ULL, 0x8993478dbb8deae0ULL, 0x06fbd6d5ebd3716bULL, 0x7c23a61ddbe6f812ULL,
0x3373d23613f9d516ULL, 0x49aba2fe23cc5c6fULL, 0xc6c333a67392c7e4ULL, 0xbc1b436e43a74e9dULL,
0x95ac9329ac4bc9b5ULL, 0xef74e3e19c7e40ccULL, 0x601c72b9cc20db47ULL, 0x1ac40271fc15523eULL,
0x5594765a340a7f3aULL, 0x2f4c0692043ff643ULL, 0xa02497ca54616dc8ULL, 0xdafce7026454e4b1ULL,
0x3e847f9dc45f37c0ULL, 0x445c0f55f46abeb9ULL, 0xcb349e0da4342532ULL, 0xb1eceec59401ac4bULL,
0xfebc9aee5c1e814fULL, 0x8464ea266c2b0836ULL, 0x0b0c7b7e3c7593bdULL, 0x71d40bb60c401ac4ULL,
0xe8a46c1224f5a634ULL, 0x927c1cda14c02f4dULL, 0x1d148d82449eb4c6ULL, 0x67ccfd4a74ab3dbfULL,
0x289c8961bcb410bbULL, 0x5244f9a98c8199c2ULL, 0xdd2c68f1dcdf0249ULL, 0xa7f41839ecea8b30ULL,
0x438c80a64ce15841ULL, 0x3954f06e7cd4d138ULL, 0xb63c61362c8a4ab3ULL, 0xcce411fe1cbfc3caULL,
0x83b465d5d4a0eeceULL, 0xf96c151de49567b7ULL, 0x76048445b4cbfc3cULL, 0x0cdcf48d84fe7545ULL,
0x6fbd6d5ebd3716b7ULL, 0x15651d968d029fceULL, 0x9a0d8ccedd5c0445ULL, 0xe0d5fc06ed698d3cULL,
0xaf85882d2576a038ULL, 0xd55df8e515432941ULL, 0x5a3569bd451db2caULL, 0x20ed197575283bb3ULL,
0xc49581ead523e8c2ULL, 0xbe4df122e51661bbULL, 0x3125607ab548fa30ULL, 0x4bfd10b2857d7349ULL,
0x04ad64994d625e4dULL, 0x7e7514517d57d734ULL, 0xf11d85092d094cbfULL, 0x8bc5f5c11d3cc5c6ULL,
0x12b5926535897936ULL, 0x686de2ad05bcf04fULL, 0xe70573f555e26bc4ULL, 0x9ddd033d65d7e2bdULL,
0xd28d7716adc8cfb9ULL, 0xa85507de9dfd46c0ULL, 0x273d9686cda3dd4bULL, 0x5de5e64efd965432ULL,
0xb99d7ed15d9d8743ULL, 0xc3450e196da80e3aULL, 0x4c2d9f413df695b1ULL, 0x36f5ef890dc31cc8ULL,
0x79a59ba2c5dc31ccULL, 0x037deb6af5e9b8b5ULL, 0x8c157a32a5b7233eULL, 0xf6cd0afa9582aa47ULL,
0x4ad64994d625e4daULL, 0x300e395ce6106da3ULL, 0xbf66a804b64ef628ULL, 0xc5bed8cc867b7f51ULL,
0x8aeeace74e645255ULL, 0xf036dc2f7e51db2cULL, 0x7f5e4d772e0f40a7ULL, 0x05863dbf1e3ac9deULL,
0xe1fea520be311aafULL, 0x9b26d5e88e0493d6ULL, 0x144e44b0de5a085dULL, 0x6e963478ee6f8124ULL,
0x21c640532670ac20ULL, 0x5b1e309b16452559ULL, 0xd476a1c3461bbed2ULL, 0xaeaed10b762e37abULL,
0x37deb6af5e9b8b5bULL, 0x4d06c6676eae0222ULL, 0xc26e573f3ef099a9ULL, 0xb8b627f70ec510d0ULL,
0xf7e653dcc6da3dd4ULL, 0x8d3e2314f6efb4adULL, 0x0256b24ca6b12f26ULL, 0x788ec2849684a65fULL,
0x9cf65a1b368f752eULL, 0xe62e2ad306bafc57ULL, 0x6946bb8b56e467dcULL, 0x139ecb4366d1eea5ULL,
0x5ccebf68aecec3a1ULL, 0x2616cfa09efb4ad8ULL, 0xa97e5ef8cea5d153ULL, 0xd3a62e30fe90582aULL,
0xb0c7b7e3c7593bd8ULL, 0xca1fc72bf76cb2a1ULL, 0x45775673a732292aULL, 0x3faf26bb9707a053ULL,
0x70ff52905f188d57ULL, 0x0a2722586f2d042eULL, 0x854fb3003f739fa5ULL, 0xff97c3c80f4616dcULL,
0x1bef5b57af4dc5adULL, 0x61372b9f9f784cd4ULL, 0xee5fbac7cf26d75fULL, 0x9487ca0fff135e26ULL,
0xdbd7be24370c7322ULL, 0xa10fceec0739fa5bULL, 0x2e675fb4576761d0ULL, 0x54bf2f7c6752e8a9ULL,
0xcdcf48d84fe75459ULL, 0xb71738107fd2dd20ULL, 0x387fa9482f8c46abULL, 0x42a7d9801fb9cfd2ULL,
0x0df7adabd7a6e2d6ULL, 0x772fdd63e7936bafULL, 0xf8474c3bb7cdf024ULL, 0x829f3cf387f8795dULL,
0x66e7a46c27f3aa2cULL, 0x1c3fd4a417c62355ULL, 0x935745fc4798b8deULL, 0xe98f353477ad31a7ULL,
0xa6df411fbfb21ca3ULL, 0xdc0731d78f8795daULL, 0x536fa08fdfd90e51ULL, 0x29b7d047efec8728ULL
};
struct slver crc64_jones_refl_base_slver_00000028;
struct slver crc64_jones_refl_base_slver = { 0x0028, 0x00, 0x00 };
static const uint64_t crc64_jones_norm_table[256] = {
0x0000000000000000ULL, 0xad93d23594c935a9ULL, 0xf6b4765ebd5b5efbULL, 0x5b27a46b29926b52ULL,
0x40fb3e88ee7f885fULL, 0xed68ecbd7ab6bdf6ULL, 0xb64f48d65324d6a4ULL, 0x1bdc9ae3c7ede30dULL,
0x81f67d11dcff10beULL, 0x2c65af2448362517ULL, 0x77420b4f61a44e45ULL, 0xdad1d97af56d7becULL,
0xc10d4399328098e1ULL, 0x6c9e91aca649ad48ULL, 0x37b935c78fdbc61aULL, 0x9a2ae7f21b12f3b3ULL,
0xae7f28162d3714d5ULL, 0x03ecfa23b9fe217cULL, 0x58cb5e48906c4a2eULL, 0xf5588c7d04a57f87ULL,
0xee84169ec3489c8aULL, 0x4317c4ab5781a923ULL, 0x183060c07e13c271ULL, 0xb5a3b2f5eadaf7d8ULL,
0x2f895507f1c8046bULL, 0x821a8732650131c2ULL, 0xd93d23594c935a90ULL, 0x74aef16cd85a6f39ULL,
0x6f726b8f1fb78c34ULL, 0xc2e1b9ba8b7eb99dULL, 0x99c61dd1a2ecd2cfULL, 0x3455cfe43625e766ULL,
0xf16d8219cea71c03ULL, 0x5cfe502c5a6e29aaULL, 0x07d9f44773fc42f8ULL, 0xaa4a2672e7357751ULL,
0xb196bc9120d8945cULL, 0x1c056ea4b411a1f5ULL, 0x4722cacf9d83caa7ULL, 0xeab118fa094aff0eULL,
0x709bff0812580cbdULL, 0xdd082d3d86913914ULL, 0x862f8956af035246ULL, 0x2bbc5b633bca67efULL,
0x3060c180fc2784e2ULL, 0x9df313b568eeb14bULL, 0xc6d4b7de417cda19ULL, 0x6b4765ebd5b5efb0ULL,
0x5f12aa0fe39008d6ULL, 0xf281783a77593d7fULL, 0xa9a6dc515ecb562dULL, 0x04350e64ca026384ULL,
0x1fe994870def8089ULL, 0xb27a46b29926b520ULL, 0xe95de2d9b0b4de72ULL, 0x44ce30ec247debdbULL,
0xdee4d71e3f6f1868ULL, 0x7377052baba62dc1ULL, 0x2850a14082344693ULL, 0x85c3737516fd733aULL,
0x9e1fe996d1109037ULL, 0x338c3ba345d9a59eULL, 0x68ab9fc86c4bceccULL, 0xc5384dfdf882fb65ULL,
0x4f48d60609870dafULL, 0xe2db04339d4e3806ULL, 0xb9fca058b4dc5354ULL, 0x146f726d201566fdULL,
0x0fb3e88ee7f885f0ULL, 0xa2203abb7331b059ULL, 0xf9079ed05aa3db0bULL, 0x54944ce5ce6aeea2ULL,
0xcebeab17d5781d11ULL, 0x632d792241b128b8ULL, 0x380add49682343eaULL, 0x95990f7cfcea7643ULL,
0x8e45959f3b07954eULL, 0x23d647aaafcea0e7ULL, 0x78f1e3c1865ccbb5ULL, 0xd56231f41295fe1cULL,
0xe137fe1024b0197aULL, 0x4ca42c25b0792cd3ULL, 0x1783884e99eb4781ULL, 0xba105a7b0d227228ULL,
0xa1ccc098cacf9125ULL, 0x0c5f12ad5e06a48cULL, 0x5778b6c67794cfdeULL, 0xfaeb64f3e35dfa77ULL,
0x60c18301f84f09c4ULL, 0xcd5251346c863c6dULL, 0x9675f55f4514573fULL, 0x3be6276ad1dd6296ULL,
0x203abd891630819bULL, 0x8da96fbc82f9b432ULL, 0xd68ecbd7ab6bdf60ULL, 0x7b1d19e23fa2eac9ULL,
0xbe25541fc72011acULL, 0x13b6862a53e92405ULL, 0x489122417a7b4f57ULL, 0xe502f074eeb27afeULL,
0xfede6a97295f99f3ULL, 0x534db8a2bd96ac5aULL, 0x086a1cc99404c708ULL, 0xa5f9cefc00cdf2a1ULL,
0x3fd3290e1bdf0112ULL, 0x9240fb3b8f1634bbULL, 0xc9675f50a6845fe9ULL, 0x64f48d65324d6a40ULL,
0x7f281786f5a0894dULL, 0xd2bbc5b36169bce4ULL, 0x899c61d848fbd7b6ULL, 0x240fb3eddc32e21fULL,
0x105a7c09ea170579ULL, 0xbdc9ae3c7ede30d0ULL, 0xe6ee0a57574c5b82ULL, 0x4b7dd862c3856e2bULL,
0x50a1428104688d26ULL, 0xfd3290b490a1b88fULL, 0xa61534dfb933d3ddULL, 0x0b86e6ea2dfae674ULL,
0x91ac011836e815c7ULL, 0x3c3fd32da221206eULL, 0x671877468bb34b3cULL, 0xca8ba5731f7a7e95ULL,
0xd1573f90d8979d98ULL, 0x7cc4eda54c5ea831ULL, 0x27e349ce65ccc363ULL, 0x8a709bfbf105f6caULL,
0x9e91ac0c130e1b5eULL, 0x33027e3987c72ef7ULL, 0x6825da52ae5545a5ULL, 0xc5b608673a9c700cULL,
0xde6a9284fd719301ULL, 0x73f940b169b8a6a8ULL, 0x28dee4da402acdfaULL, 0x854d36efd4e3f853ULL,
0x1f67d11dcff10be0ULL, 0xb2f403285b383e49ULL, 0xe9d3a74372aa551bULL, 0x44407576e66360b2ULL,
0x5f9cef95218e83bfULL, 0xf20f3da0b547b616ULL, 0xa92899cb9cd5dd44ULL, 0x04bb4bfe081ce8edULL,
0x30ee841a3e390f8bULL, 0x9d7d562faaf03a22ULL, 0xc65af24483625170ULL, 0x6bc9207117ab64d9ULL,
0x7015ba92d04687d4ULL, 0xdd8668a7448fb27dULL, 0x86a1cccc6d1dd92fULL, 0x2b321ef9f9d4ec86ULL,
0xb118f90be2c61f35ULL, 0x1c8b2b3e760f2a9cULL, 0x47ac8f555f9d41ceULL, 0xea3f5d60cb547467ULL,
0xf1e3c7830cb9976aULL, 0x5c7015b69870a2c3ULL, 0x0757b1ddb1e2c991ULL, 0xaac463e8252bfc38ULL,
0x6ffc2e15dda9075dULL, 0xc26ffc20496032f4ULL, 0x9948584b60f259a6ULL, 0x34db8a7ef43b6c0fULL,
0x2f07109d33d68f02ULL, 0x8294c2a8a71fbaabULL, 0xd9b366c38e8dd1f9ULL, 0x7420b4f61a44e450ULL,
0xee0a5304015617e3ULL, 0x43998131959f224aULL, 0x18be255abc0d4918ULL, 0xb52df76f28c47cb1ULL,
0xaef16d8cef299fbcULL, 0x0362bfb97be0aa15ULL, 0x58451bd25272c147ULL, 0xf5d6c9e7c6bbf4eeULL,
0xc1830603f09e1388ULL, 0x6c10d43664572621ULL, 0x3737705d4dc54d73ULL, 0x9aa4a268d90c78daULL,
0x8178388b1ee19bd7ULL, 0x2cebeabe8a28ae7eULL, 0x77cc4ed5a3bac52cULL, 0xda5f9ce03773f085ULL,
0x40757b122c610336ULL, 0xede6a927b8a8369fULL, 0xb6c10d4c913a5dcdULL, 0x1b52df7905f36864ULL,
0x008e459ac21e8b69ULL, 0xad1d97af56d7bec0ULL, 0xf63a33c47f45d592ULL, 0x5ba9e1f1eb8ce03bULL,
0xd1d97a0a1a8916f1ULL, 0x7c4aa83f8e402358ULL, 0x276d0c54a7d2480aULL, 0x8afede61331b7da3ULL,
0x91224482f4f69eaeULL, 0x3cb196b7603fab07ULL, 0x679632dc49adc055ULL, 0xca05e0e9dd64f5fcULL,
0x502f071bc676064fULL, 0xfdbcd52e52bf33e6ULL, 0xa69b71457b2d58b4ULL, 0x0b08a370efe46d1dULL,
0x10d4399328098e10ULL, 0xbd47eba6bcc0bbb9ULL, 0xe6604fcd9552d0ebULL, 0x4bf39df8019be542ULL,
0x7fa6521c37be0224ULL, 0xd2358029a377378dULL, 0x891224428ae55cdfULL, 0x2481f6771e2c6976ULL,
0x3f5d6c94d9c18a7bULL, 0x92cebea14d08bfd2ULL, 0xc9e91aca649ad480ULL, 0x647ac8fff053e129ULL,
0xfe502f0deb41129aULL, 0x53c3fd387f882733ULL, 0x08e45953561a4c61ULL, 0xa5778b66c2d379c8ULL,
0xbeab1185053e9ac5ULL, 0x1338c3b091f7af6cULL, 0x481f67dbb865c43eULL, 0xe58cb5ee2cacf197ULL,
0x20b4f813d42e0af2ULL, 0x8d272a2640e73f5bULL, 0xd6008e4d69755409ULL, 0x7b935c78fdbc61a0ULL,
0x604fc69b3a5182adULL, 0xcddc14aeae98b704ULL, 0x96fbb0c5870adc56ULL, 0x3b6862f013c3e9ffULL,
0xa142850208d11a4cULL, 0x0cd157379c182fe5ULL, 0x57f6f35cb58a44b7ULL, 0xfa6521692143711eULL,
0xe1b9bb8ae6ae9213ULL, 0x4c2a69bf7267a7baULL, 0x170dcdd45bf5cce8ULL, 0xba9e1fe1cf3cf941ULL,
0x8ecbd005f9191e27ULL, 0x235802306dd02b8eULL, 0x787fa65b444240dcULL, 0xd5ec746ed08b7575ULL,
0xce30ee8d17669678ULL, 0x63a33cb883afa3d1ULL, 0x388498d3aa3dc883ULL, 0x95174ae63ef4fd2aULL,
0x0f3dad1425e60e99ULL, 0xa2ae7f21b12f3b30ULL, 0xf989db4a98bd5062ULL, 0x541a097f0c7465cbULL,
0x4fc6939ccb9986c6ULL, 0xe25541a95f50b36fULL, 0xb972e5c276c2d83dULL, 0x14e137f7e20bed94ULL
};
struct slver crc64_jones_norm_base_slver_00000025;
struct slver crc64_jones_norm_base_slver = { 0x0025, 0x00, 0x00 };
static const uint64_t crc64_rocksoft_refl_table[256] = {
0x0000000000000000ULL, 0x7f6ef0c830358979ULL, 0xfedde190606b12f2ULL, 0x81b31158505e9b8bULL,
0xc962e5739841b68fULL, 0xb60c15bba8743ff6ULL, 0x37bf04e3f82aa47dULL, 0x48d1f42bc81f2d04ULL,
0xa61cecb46814fe75ULL, 0xd9721c7c5821770cULL, 0x58c10d24087fec87ULL, 0x27affdec384a65feULL,
0x6f7e09c7f05548faULL, 0x1010f90fc060c183ULL, 0x91a3e857903e5a08ULL, 0xeecd189fa00bd371ULL,
0x78e0ff3b88be6f81ULL, 0x078e0ff3b88be6f8ULL, 0x863d1eabe8d57d73ULL, 0xf953ee63d8e0f40aULL,
0xb1821a4810ffd90eULL, 0xceecea8020ca5077ULL, 0x4f5ffbd87094cbfcULL, 0x30310b1040a14285ULL,
0xdefc138fe0aa91f4ULL, 0xa192e347d09f188dULL, 0x2021f21f80c18306ULL, 0x5f4f02d7b0f40a7fULL,
0x179ef6fc78eb277bULL, 0x68f0063448deae02ULL, 0xe943176c18803589ULL, 0x962de7a428b5bcf0ULL,
0xf1c1fe77117cdf02ULL, 0x8eaf0ebf2149567bULL, 0x0f1c1fe77117cdf0ULL, 0x7072ef2f41224489ULL,
0x38a31b04893d698dULL, 0x47cdebccb908e0f4ULL, 0xc67efa94e9567b7fULL, 0xb9100a5cd963f206ULL,
0x57dd12c379682177ULL, 0x28b3e20b495da80eULL, 0xa900f35319033385ULL, 0xd66e039b2936bafcULL,
0x9ebff7b0e12997f8ULL, 0xe1d10778d11c1e81ULL, 0x606216208142850aULL, 0x1f0ce6e8b1770c73ULL,
0x8921014c99c2b083ULL, 0xf64ff184a9f739faULL, 0x77fce0dcf9a9a271ULL, 0x08921014c99c2b08ULL,
0x4043e43f0183060cULL, 0x3f2d14f731b68f75ULL, 0xbe9e05af61e814feULL, 0xc1f0f56751dd9d87ULL,
0x2f3dedf8f1d64ef6ULL, 0x50531d30c1e3c78fULL, 0xd1e00c6891bd5c04ULL, 0xae8efca0a188d57dULL,
0xe65f088b6997f879ULL, 0x9931f84359a27100ULL, 0x1882e91b09fcea8bULL, 0x67ec19d339c963f2ULL,
0xd75adabd7a6e2d6fULL, 0xa8342a754a5ba416ULL, 0x29873b2d1a053f9dULL, 0x56e9cbe52a30b6e4ULL,
0x1e383fcee22f9be0ULL, 0x6156cf06d21a1299ULL, 0xe0e5de5e82448912ULL, 0x9f8b2e96b271006bULL,
0x71463609127ad31aULL, 0x0e28c6c1224f5a63ULL, 0x8f9bd7997211c1e8ULL, 0xf0f5275142244891ULL,
0xb824d37a8a3b6595ULL, 0xc74a23b2ba0eececULL, 0x46f932eaea507767ULL, 0x3997c222da65fe1eULL,
0xafba2586f2d042eeULL, 0xd0d4d54ec2e5cb97ULL, 0x5167c41692bb501cULL, 0x2e0934dea28ed965ULL,
0x66d8c0f56a91f461ULL, 0x19b6303d5aa47d18ULL, 0x980521650afae693ULL, 0xe76bd1ad3acf6feaULL,
0x09a6c9329ac4bc9bULL, 0x76c839faaaf135e2ULL, 0xf77b28a2faafae69ULL, 0x8815d86aca9a2710ULL,
0xc0c42c4102850a14ULL, 0xbfaadc8932b0836dULL, 0x3e19cdd162ee18e6ULL, 0x41773d1952db919fULL,
0x269b24ca6b12f26dULL, 0x59f5d4025b277b14ULL, 0xd846c55a0b79e09fULL, 0xa72835923b4c69e6ULL,
0xeff9c1b9f35344e2ULL, 0x90973171c366cd9bULL, 0x1124202993385610ULL, 0x6e4ad0e1a30ddf69ULL,
0x8087c87e03060c18ULL, 0xffe938b633338561ULL, 0x7e5a29ee636d1eeaULL, 0x0134d92653589793ULL,
0x49e52d0d9b47ba97ULL, 0x368bddc5ab7233eeULL, 0xb738cc9dfb2ca865ULL, 0xc8563c55cb19211cULL,
0x5e7bdbf1e3ac9decULL, 0x21152b39d3991495ULL, 0xa0a63a6183c78f1eULL, 0xdfc8caa9b3f20667ULL,
0x97193e827bed2b63ULL, 0xe877ce4a4bd8a21aULL, 0x69c4df121b863991ULL, 0x16aa2fda2bb3b0e8ULL,
0xf86737458bb86399ULL, 0x8709c78dbb8deae0ULL, 0x06bad6d5ebd3716bULL, 0x79d4261ddbe6f812ULL,
0x3105d23613f9d516ULL, 0x4e6b22fe23cc5c6fULL, 0xcfd833a67392c7e4ULL, 0xb0b6c36e43a74e9dULL,
0x9a6c9329ac4bc9b5ULL, 0xe50263e19c7e40ccULL, 0x64b172b9cc20db47ULL, 0x1bdf8271fc15523eULL,
0x530e765a340a7f3aULL, 0x2c608692043ff643ULL, 0xadd397ca54616dc8ULL, 0xd2bd67026454e4b1ULL,
0x3c707f9dc45f37c0ULL, 0x431e8f55f46abeb9ULL, 0xc2ad9e0da4342532ULL, 0xbdc36ec59401ac4bULL,
0xf5129aee5c1e814fULL, 0x8a7c6a266c2b0836ULL, 0x0bcf7b7e3c7593bdULL, 0x74a18bb60c401ac4ULL,
0xe28c6c1224f5a634ULL, 0x9de29cda14c02f4dULL, 0x1c518d82449eb4c6ULL, 0x633f7d4a74ab3dbfULL,
0x2bee8961bcb410bbULL, 0x548079a98c8199c2ULL, 0xd53368f1dcdf0249ULL, 0xaa5d9839ecea8b30ULL,
0x449080a64ce15841ULL, 0x3bfe706e7cd4d138ULL, 0xba4d61362c8a4ab3ULL, 0xc52391fe1cbfc3caULL,
0x8df265d5d4a0eeceULL, 0xf29c951de49567b7ULL, 0x732f8445b4cbfc3cULL, 0x0c41748d84fe7545ULL,
0x6bad6d5ebd3716b7ULL, 0x14c39d968d029fceULL, 0x95708ccedd5c0445ULL, 0xea1e7c06ed698d3cULL,
0xa2cf882d2576a038ULL, 0xdda178e515432941ULL, 0x5c1269bd451db2caULL, 0x237c997575283bb3ULL,
0xcdb181ead523e8c2ULL, 0xb2df7122e51661bbULL, 0x336c607ab548fa30ULL, 0x4c0290b2857d7349ULL,
0x04d364994d625e4dULL, 0x7bbd94517d57d734ULL, 0xfa0e85092d094cbfULL, 0x856075c11d3cc5c6ULL,
0x134d926535897936ULL, 0x6c2362ad05bcf04fULL, 0xed9073f555e26bc4ULL, 0x92fe833d65d7e2bdULL,
0xda2f7716adc8cfb9ULL, 0xa54187de9dfd46c0ULL, 0x24f29686cda3dd4bULL, 0x5b9c664efd965432ULL,
0xb5517ed15d9d8743ULL, 0xca3f8e196da80e3aULL, 0x4b8c9f413df695b1ULL, 0x34e26f890dc31cc8ULL,
0x7c339ba2c5dc31ccULL, 0x035d6b6af5e9b8b5ULL, 0x82ee7a32a5b7233eULL, 0xfd808afa9582aa47ULL,
0x4d364994d625e4daULL, 0x3258b95ce6106da3ULL, 0xb3eba804b64ef628ULL, 0xcc8558cc867b7f51ULL,
0x8454ace74e645255ULL, 0xfb3a5c2f7e51db2cULL, 0x7a894d772e0f40a7ULL, 0x05e7bdbf1e3ac9deULL,
0xeb2aa520be311aafULL, 0x944455e88e0493d6ULL, 0x15f744b0de5a085dULL, 0x6a99b478ee6f8124ULL,
0x224840532670ac20ULL, 0x5d26b09b16452559ULL, 0xdc95a1c3461bbed2ULL, 0xa3fb510b762e37abULL,
0x35d6b6af5e9b8b5bULL, 0x4ab846676eae0222ULL, 0xcb0b573f3ef099a9ULL, 0xb465a7f70ec510d0ULL,
0xfcb453dcc6da3dd4ULL, 0x83daa314f6efb4adULL, 0x0269b24ca6b12f26ULL, 0x7d0742849684a65fULL,
0x93ca5a1b368f752eULL, 0xeca4aad306bafc57ULL, 0x6d17bb8b56e467dcULL, 0x12794b4366d1eea5ULL,
0x5aa8bf68aecec3a1ULL, 0x25c64fa09efb4ad8ULL, 0xa4755ef8cea5d153ULL, 0xdb1bae30fe90582aULL,
0xbcf7b7e3c7593bd8ULL, 0xc399472bf76cb2a1ULL, 0x422a5673a732292aULL, 0x3d44a6bb9707a053ULL,
0x759552905f188d57ULL, 0x0afba2586f2d042eULL, 0x8b48b3003f739fa5ULL, 0xf42643c80f4616dcULL,
0x1aeb5b57af4dc5adULL, 0x6585ab9f9f784cd4ULL, 0xe436bac7cf26d75fULL, 0x9b584a0fff135e26ULL,
0xd389be24370c7322ULL, 0xace74eec0739fa5bULL, 0x2d545fb4576761d0ULL, 0x523aaf7c6752e8a9ULL,
0xc41748d84fe75459ULL, 0xbb79b8107fd2dd20ULL, 0x3acaa9482f8c46abULL, 0x45a459801fb9cfd2ULL,
0x0d75adabd7a6e2d6ULL, 0x721b5d63e7936bafULL, 0xf3a84c3bb7cdf024ULL, 0x8cc6bcf387f8795dULL,
0x620ba46c27f3aa2cULL, 0x1d6554a417c62355ULL, 0x9cd645fc4798b8deULL, 0xe3b8b53477ad31a7ULL,
0xab69411fbfb21ca3ULL, 0xd407b1d78f8795daULL, 0x55b4a08fdfd90e51ULL, 0x2ada5047efec8728ULL
};
static const uint64_t crc64_rocksoft_norm_table[256] = {
0x0000000000000000ULL, 0xad93d23594c93659ULL, 0xf6b4765ebd5b5aebULL, 0x5b27a46b29926cb2ULL,
0x40fb3e88ee7f838fULL, 0xed68ecbd7ab6b5d6ULL, 0xb64f48d65324d964ULL, 0x1bdc9ae3c7edef3dULL,
0x81f67d11dcff071eULL, 0x2c65af2448363147ULL, 0x77420b4f61a45df5ULL, 0xdad1d97af56d6bacULL,
0xc10d439932808491ULL, 0x6c9e91aca649b2c8ULL, 0x37b935c78fdbde7aULL, 0x9a2ae7f21b12e823ULL,
0xae7f28162d373865ULL, 0x03ecfa23b9fe0e3cULL, 0x58cb5e48906c628eULL, 0xf5588c7d04a554d7ULL,
0xee84169ec348bbeaULL, 0x4317c4ab57818db3ULL, 0x183060c07e13e101ULL, 0xb5a3b2f5eadad758ULL,
0x2f895507f1c83f7bULL, 0x821a873265010922ULL, 0xd93d23594c936590ULL, 0x74aef16cd85a53c9ULL,
0x6f726b8f1fb7bcf4ULL, 0xc2e1b9ba8b7e8aadULL, 0x99c61dd1a2ece61fULL, 0x3455cfe43625d046ULL,
0xf16d8219cea74693ULL, 0x5cfe502c5a6e70caULL, 0x07d9f44773fc1c78ULL, 0xaa4a2672e7352a21ULL,
0xb196bc9120d8c51cULL, 0x1c056ea4b411f345ULL, 0x4722cacf9d839ff7ULL, 0xeab118fa094aa9aeULL,
0x709bff081258418dULL, 0xdd082d3d869177d4ULL, 0x862f8956af031b66ULL, 0x2bbc5b633bca2d3fULL,
0x3060c180fc27c202ULL, 0x9df313b568eef45bULL, 0xc6d4b7de417c98e9ULL, 0x6b4765ebd5b5aeb0ULL,
0x5f12aa0fe3907ef6ULL, 0xf281783a775948afULL, 0xa9a6dc515ecb241dULL, 0x04350e64ca021244ULL,
0x1fe994870deffd79ULL, 0xb27a46b29926cb20ULL, 0xe95de2d9b0b4a792ULL, 0x44ce30ec247d91cbULL,
0xdee4d71e3f6f79e8ULL, 0x7377052baba64fb1ULL, 0x2850a14082342303ULL, 0x85c3737516fd155aULL,
0x9e1fe996d110fa67ULL, 0x338c3ba345d9cc3eULL, 0x68ab9fc86c4ba08cULL, 0xc5384dfdf88296d5ULL,
0x4f48d6060987bb7fULL, 0xe2db04339d4e8d26ULL, 0xb9fca058b4dce194ULL, 0x146f726d2015d7cdULL,
0x0fb3e88ee7f838f0ULL, 0xa2203abb73310ea9ULL, 0xf9079ed05aa3621bULL, 0x54944ce5ce6a5442ULL,
0xcebeab17d578bc61ULL, 0x632d792241b18a38ULL, 0x380add496823e68aULL, 0x95990f7cfcead0d3ULL,
0x8e45959f3b073feeULL, 0x23d647aaafce09b7ULL, 0x78f1e3c1865c6505ULL, 0xd56231f41295535cULL,
0xe137fe1024b0831aULL, 0x4ca42c25b079b543ULL, 0x1783884e99ebd9f1ULL, 0xba105a7b0d22efa8ULL,
0xa1ccc098cacf0095ULL, 0x0c5f12ad5e0636ccULL, 0x5778b6c677945a7eULL, 0xfaeb64f3e35d6c27ULL,
0x60c18301f84f8404ULL, 0xcd5251346c86b25dULL, 0x9675f55f4514deefULL, 0x3be6276ad1dde8b6ULL,
0x203abd891630078bULL, 0x8da96fbc82f931d2ULL, 0xd68ecbd7ab6b5d60ULL, 0x7b1d19e23fa26b39ULL,
0xbe25541fc720fdecULL, 0x13b6862a53e9cbb5ULL, 0x489122417a7ba707ULL, 0xe502f074eeb2915eULL,
0xfede6a97295f7e63ULL, 0x534db8a2bd96483aULL, 0x086a1cc994042488ULL, 0xa5f9cefc00cd12d1ULL,
0x3fd3290e1bdffaf2ULL, 0x9240fb3b8f16ccabULL, 0xc9675f50a684a019ULL, 0x64f48d65324d9640ULL,
0x7f281786f5a0797dULL, 0xd2bbc5b361694f24ULL, 0x899c61d848fb2396ULL, 0x240fb3eddc3215cfULL,
0x105a7c09ea17c589ULL, 0xbdc9ae3c7edef3d0ULL, 0xe6ee0a57574c9f62ULL, 0x4b7dd862c385a93bULL,
0x50a1428104684606ULL, 0xfd3290b490a1705fULL, 0xa61534dfb9331cedULL, 0x0b86e6ea2dfa2ab4ULL,
0x91ac011836e8c297ULL, 0x3c3fd32da221f4ceULL, 0x671877468bb3987cULL, 0xca8ba5731f7aae25ULL,
0xd1573f90d8974118ULL, 0x7cc4eda54c5e7741ULL, 0x27e349ce65cc1bf3ULL, 0x8a709bfbf1052daaULL,
0x9e91ac0c130f76feULL, 0x33027e3987c640a7ULL, 0x6825da52ae542c15ULL, 0xc5b608673a9d1a4cULL,
0xde6a9284fd70f571ULL, 0x73f940b169b9c328ULL, 0x28dee4da402baf9aULL, 0x854d36efd4e299c3ULL,
0x1f67d11dcff071e0ULL, 0xb2f403285b3947b9ULL, 0xe9d3a74372ab2b0bULL, 0x44407576e6621d52ULL,
0x5f9cef95218ff26fULL, 0xf20f3da0b546c436ULL, 0xa92899cb9cd4a884ULL, 0x04bb4bfe081d9eddULL,
0x30ee841a3e384e9bULL, 0x9d7d562faaf178c2ULL, 0xc65af24483631470ULL, 0x6bc9207117aa2229ULL,
0x7015ba92d047cd14ULL, 0xdd8668a7448efb4dULL, 0x86a1cccc6d1c97ffULL, 0x2b321ef9f9d5a1a6ULL,
0xb118f90be2c74985ULL, 0x1c8b2b3e760e7fdcULL, 0x47ac8f555f9c136eULL, 0xea3f5d60cb552537ULL,
0xf1e3c7830cb8ca0aULL, 0x5c7015b69871fc53ULL, 0x0757b1ddb1e390e1ULL, 0xaac463e8252aa6b8ULL,
0x6ffc2e15dda8306dULL, 0xc26ffc2049610634ULL, 0x9948584b60f36a86ULL, 0x34db8a7ef43a5cdfULL,
0x2f07109d33d7b3e2ULL, 0x8294c2a8a71e85bbULL, 0xd9b366c38e8ce909ULL, 0x7420b4f61a45df50ULL,
0xee0a530401573773ULL, 0x43998131959e012aULL, 0x18be255abc0c6d98ULL, 0xb52df76f28c55bc1ULL,
0xaef16d8cef28b4fcULL, 0x0362bfb97be182a5ULL, 0x58451bd25273ee17ULL, 0xf5d6c9e7c6bad84eULL,
0xc1830603f09f0808ULL, 0x6c10d43664563e51ULL, 0x3737705d4dc452e3ULL, 0x9aa4a268d90d64baULL,
0x8178388b1ee08b87ULL, 0x2cebeabe8a29bddeULL, 0x77cc4ed5a3bbd16cULL, 0xda5f9ce03772e735ULL,
0x40757b122c600f16ULL, 0xede6a927b8a9394fULL, 0xb6c10d4c913b55fdULL, 0x1b52df7905f263a4ULL,
0x008e459ac21f8c99ULL, 0xad1d97af56d6bac0ULL, 0xf63a33c47f44d672ULL, 0x5ba9e1f1eb8de02bULL,
0xd1d97a0a1a88cd81ULL, 0x7c4aa83f8e41fbd8ULL, 0x276d0c54a7d3976aULL, 0x8afede61331aa133ULL,
0x91224482f4f74e0eULL, 0x3cb196b7603e7857ULL, 0x679632dc49ac14e5ULL, 0xca05e0e9dd6522bcULL,
0x502f071bc677ca9fULL, 0xfdbcd52e52befcc6ULL, 0xa69b71457b2c9074ULL, 0x0b08a370efe5a62dULL,
0x10d4399328084910ULL, 0xbd47eba6bcc17f49ULL, 0xe6604fcd955313fbULL, 0x4bf39df8019a25a2ULL,
0x7fa6521c37bff5e4ULL, 0xd2358029a376c3bdULL, 0x891224428ae4af0fULL, 0x2481f6771e2d9956ULL,
0x3f5d6c94d9c0766bULL, 0x92cebea14d094032ULL, 0xc9e91aca649b2c80ULL, 0x647ac8fff0521ad9ULL,
0xfe502f0deb40f2faULL, 0x53c3fd387f89c4a3ULL, 0x08e45953561ba811ULL, 0xa5778b66c2d29e48ULL,
0xbeab1185053f7175ULL, 0x1338c3b091f6472cULL, 0x481f67dbb8642b9eULL, 0xe58cb5ee2cad1dc7ULL,
0x20b4f813d42f8b12ULL, 0x8d272a2640e6bd4bULL, 0xd6008e4d6974d1f9ULL, 0x7b935c78fdbde7a0ULL,
0x604fc69b3a50089dULL, 0xcddc14aeae993ec4ULL, 0x96fbb0c5870b5276ULL, 0x3b6862f013c2642fULL,
0xa142850208d08c0cULL, 0x0cd157379c19ba55ULL, 0x57f6f35cb58bd6e7ULL, 0xfa6521692142e0beULL,
0xe1b9bb8ae6af0f83ULL, 0x4c2a69bf726639daULL, 0x170dcdd45bf45568ULL, 0xba9e1fe1cf3d6331ULL,
0x8ecbd005f918b377ULL, 0x235802306dd1852eULL, 0x787fa65b4443e99cULL, 0xd5ec746ed08adfc5ULL,
0xce30ee8d176730f8ULL, 0x63a33cb883ae06a1ULL, 0x388498d3aa3c6a13ULL, 0x95174ae63ef55c4aULL,
0x0f3dad1425e7b469ULL, 0xa2ae7f21b12e8230ULL, 0xf989db4a98bcee82ULL, 0x541a097f0c75d8dbULL,
0x4fc6939ccb9837e6ULL, 0xe25541a95f5101bfULL, 0xb972e5c276c36d0dULL, 0x14e137f7e20a5b54ULL
};
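
The 256-entry tables above can be regenerated from a single polynomial constant. The following is a minimal sketch, not the generator the project actually uses: the helper names `gen_refl_table` and `gen_norm_table` are hypothetical, and the two polynomial constants are simply read off the tables above (`crc64_rocksoft_norm_table[1]` and `crc64_rocksoft_refl_table[0x80]`).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: both constants are taken from the tables in this file.
 * norm_table[1] is the polynomial; refl_table[0x80] is its bit-reversed form. */
#define ROCKSOFT_POLY_NORM 0xad93d23594c93659ULL
#define ROCKSOFT_POLY_REFL 0x9a6c9329ac4bc9b5ULL

/* Hypothetical helper: build the table for the reflected (LSB-first) form.
 * Each entry is the index pushed through eight right-shift steps. */
static void
gen_refl_table(uint64_t poly_refl, uint64_t table[256])
{
        for (unsigned i = 0; i < 256; i++) {
                uint64_t r = i;
                for (int bit = 0; bit < 8; bit++)
                        r = (r & 1) ? (r >> 1) ^ poly_refl : r >> 1;
                table[i] = r;
        }
}

/* Hypothetical helper: build the table for the normal (MSB-first) form.
 * The index starts in the top byte and goes through eight left-shift steps. */
static void
gen_norm_table(uint64_t poly_norm, uint64_t table[256])
{
        for (unsigned i = 0; i < 256; i++) {
                uint64_t r = (uint64_t) i << 56;
                for (int bit = 0; bit < 8; bit++)
                        r = (r & 0x8000000000000000ULL) ? (r << 1) ^ poly_norm : r << 1;
                table[i] = r;
        }
}
```

Filling two local `uint64_t[256]` arrays with these helpers and comparing them against `crc64_rocksoft_refl_table` and `crc64_rocksoft_norm_table` is a cheap way to catch a corrupted table entry.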
/* Reflected (LSB-first) form: index with the low byte, shift right.
 * The seed is inverted on entry and the result is inverted on exit. */
uint64_t
crc64_ecma_refl_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_ecma_refl_table[(uint8_t) crc ^ byte] ^ (crc >> 8);
        }
        return ~crc;
}

/* Normal (MSB-first) form: index with the high byte, shift left. */
uint64_t
crc64_ecma_norm_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_ecma_norm_table[((crc >> 56) ^ byte) & 0xff] ^ (crc << 8);
        }
        return ~crc;
}
uint64_t
crc64_iso_refl_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_iso_refl_table[(uint8_t) crc ^ byte] ^ (crc >> 8);
        }
        return ~crc;
}

uint64_t
crc64_iso_norm_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_iso_norm_table[((crc >> 56) ^ byte) & 0xff] ^ (crc << 8);
        }
        return ~crc;
}
uint64_t
crc64_jones_refl_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_jones_refl_table[(uint8_t) crc ^ byte] ^ (crc >> 8);
        }
        return ~crc;
}

uint64_t
crc64_jones_norm_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_jones_norm_table[((crc >> 56) ^ byte) & 0xff] ^ (crc << 8);
        }
        return ~crc;
}
uint64_t
crc64_rocksoft_refl_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_rocksoft_refl_table[(uint8_t) crc ^ byte] ^ (crc >> 8);
        }
        return ~crc;
}

uint64_t
crc64_rocksoft_norm_base(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t i, crc = ~seed;

        for (i = 0; i < len; i++) {
                uint8_t byte = buf[i];
                crc = crc64_rocksoft_norm_table[((crc >> 56) ^ byte) & 0xff] ^ (crc << 8);
        }
        return ~crc;
}
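
Because every `*_base` function inverts the seed on entry and the result on exit, the value returned for one chunk can be passed as the seed for the next chunk, and the final result matches a single pass over the whole buffer. The sketch below illustrates this with a bit-at-a-time loop so that it needs no table. The names `crc64_refl_bitwise` and `crc64_two_chunks` are hypothetical and not part of the library, and `POLY_REFL` is assumed from `crc64_rocksoft_refl_table[0x80]`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: equals crc64_rocksoft_refl_table[0x80] in this file. */
#define POLY_REFL 0x9a6c9329ac4bc9b5ULL

/* Hypothetical bit-at-a-time counterpart of the reflected table-driven loop:
 * XOR the byte into the low end, then do eight conditional shift/XOR steps. */
static uint64_t
crc64_refl_bitwise(uint64_t seed, const uint8_t *buf, uint64_t len)
{
        uint64_t crc = ~seed;

        for (uint64_t i = 0; i < len; i++) {
                crc ^= buf[i];
                for (int bit = 0; bit < 8; bit++)
                        crc = (crc & 1) ? (crc >> 1) ^ POLY_REFL : crc >> 1;
        }
        return ~crc;
}

/* Checksum a buffer in two pieces, feeding the first result back as the seed.
 * The caller must ensure split <= len. */
static uint64_t
crc64_two_chunks(const uint8_t *buf, uint64_t len, uint64_t split)
{
        uint64_t crc = crc64_refl_bitwise(0, buf, split);

        return crc64_refl_bitwise(crc, buf + split, len - split);
}
```

For any `split` between 0 and `len`, `crc64_two_chunks(buf, len, split)` should equal `crc64_refl_bitwise(0, buf, len)`; a zero-length input returns the seed unchanged.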

Some files were not shown because too many files have changed in this diff