Commit Graph

782 Commits

Author SHA1 Message Date
Pablo de Lara
1a3e47f539 examples: fix clang 19 warning
Clang 19 warns about an unused variable.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-10-20 10:04:01 +01:00
Veronika Karpenko
74f29cfe8d shim: add crc32 and adler32 support
Signed-off-by: Veronika Karpenko <veronika.karpenko@intel.com>
2025-10-16 15:23:21 +01:00
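Adler-32, one of the two checksums the shim now exposes, is small enough to show in full. The following is a minimal, unoptimized C sketch that follows the zlib convention (start value 1, both sums taken modulo 65521). `adler32_ref` is a hypothetical name; this is not the ISA-L or shim implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Bytewise Adler-32. Pass 1 as the initial value for a fresh checksum,
 * or a previous result to continue over more data. */
static uint32_t adler32_ref(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffff;         /* running byte sum */
    uint32_t b = (adler >> 16) & 0xffff; /* running sum of the a values */

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

Because the state is fully captured in the returned value, checksumming "Wiki" and then continuing with "pedia" gives the same result as checksumming "Wikipedia" in one call.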
Karpenko, Veronika
c9076a6380 shim: fix EOF exception
Fix issue: #361

Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>
2025-10-14 14:53:29 +01:00
lvshuo
e64ed02065 erasure_code: set vsetvli to default parameter and add space
Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-10-13 09:49:38 +01:00
lvshuo
7684934179 erasure_code: add optimization implementation
Reduce one slli instruction and remove the dependency between the vle8.v and ld instructions.
gf5 and gf7 are not modified, since +5 and +7 are not used in actual scenarios.

Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-10-13 09:49:38 +01:00
Pablo de Lara
a90a880887 igzip: fix typos reported by codespell
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-10-10 16:33:25 +01:00
Pablo de Lara
0b3ec4f3b6 Add error checking to get_filesize function
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-10-02 10:29:58 +01:00
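A minimal sketch of what error checking around a file-size helper can look like with standard C stdio: every `fseek`/`ftell` result is checked and the caller's position is restored. The name `get_filesize_checked` and its -1 error convention are assumptions for illustration, not the signature used in the ISA-L applications.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the size of fp in bytes, or -1 on any seek/tell error.
 * On success the original file position is restored. */
static long get_filesize_checked(FILE *fp)
{
    long pos = ftell(fp);              /* remember where the caller was */
    if (pos < 0)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (size < 0)
        return -1;
    if (fseek(fp, pos, SEEK_SET) != 0) /* restore the caller's position */
        return -1;
    return size;
}
```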
Pablo de Lara
bfc99b6a18 igzip: exit with status 1 in applications upon failure
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-10-02 10:29:58 +01:00
Greg Tucker
9f77f65dbc igzip: Change return variable type consistent with usage
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2025-10-02 10:29:58 +01:00
Nicola Torracca
c0a10cac84 raid: keep sub and jcc in sequence to enable macrofusion.
Signed-off-by: Nicola Torracca <nicola.torracca@gmail.com>
2025-09-30 14:38:01 +01:00
vkarpenk
09d7b05dd4 shim: update README (#111)
Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>
2025-09-19 09:47:31 +01:00
Jonathan Swinney
aedcd375ba aarch64: Use NEON when SVE width is 128 bits
On AArch64 systems with SVE support, 128-bit SVE implementations can
perform significantly worse than equivalent NEON code due to the
different optimization strategies used in each implementation. The NEON
version is unrolled 4 times, providing excellent performance at the
fixed 128-bit width. The SVE version can achieve similar or better
performance through its variable-width operations on systems with
256-bit or 512-bit SVE, but on 128-bit SVE systems, the NEON unrolled
implementation is faster due to reduced overhead.

This change adds runtime detection of SVE vector length and falls back
to the optimized NEON implementation when SVE is operating at 128-bit
width, ensuring optimal performance across all AArch64 configurations.

This implementation checks the vector length with an intrinsic if the
compiler supports it (which works on Apple as well) and falls back to
using prctl otherwise.

This optimization ensures that systems benefit from:
- 4x unrolled NEON code on 128-bit SVE systems
- Variable-width SVE optimizations on wider SVE implementations
- Maintained compatibility across different AArch64 configurations

Performance improvement on systems with 128-bit SVE:
- Encode: 7509.80 MB/s → 8995.59 MB/s (+19.8% improvement)
- Decode: 9383.67 MB/s → 12272.38 MB/s (+30.8% improvement)

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
2025-09-18 17:11:00 +01:00
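The selection rule described above reduces to a comparison on the SVE vector length. The sketch below is hypothetical: the enum and function names are illustrative, the vector length is passed in (in bytes) rather than queried, and treating "no SVE" as the NEON case is an assumption of this example.

```c
#include <assert.h>

/* Hypothetical identifiers for the two code paths. */
typedef enum { IMPL_NEON_UNROLLED, IMPL_SVE } ec_impl;

/* sve_vl_bytes: SVE vector length in bytes, or 0 when SVE is unavailable.
 * 128-bit SVE (16 bytes) and no SVE both select the 4x-unrolled NEON path;
 * 256-bit and wider SVE keep the variable-width SVE path. */
static ec_impl select_ec_impl(unsigned sve_vl_bytes)
{
    return sve_vl_bytes > 16 ? IMPL_SVE : IMPL_NEON_UNROLLED;
}
```

On Linux, the length in bytes can be read with `prctl(PR_SVE_GET_VL)`, masking the result with `PR_SVE_VL_LEN_MASK`; with a compiler that supports the SVE ACLE, `svcntb()` returns the same value.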
Pablo de Lara
09cec64707 erasure_code: improve verbose output of test application
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-17 13:36:06 +01:00
Pablo de Lara
e677f668c8 crc: only prefetch data that will be consumed for VPCLMUL functions
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
510de484c4 crc: only prefetch data that will be consumed for non-VPCLMUL functions
Also, use only 2 prefetch instructions for 128B.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
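The idea of only prefetching data that will be consumed can be sketched in portable C with the GCC/Clang `__builtin_prefetch` builtin: a prefetch ahead of the current block is skipped once its target would lie past the end of the buffer, and each 128-byte block issues two prefetches (one per assumed 64-byte cache line). The prefetch distance, line size, locality hint and the byte-sum standing in for CRC work are all assumptions; the actual CRC code is assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 128           /* bytes consumed per loop iteration */
#define LINE 64             /* assumed cache-line size */
#define PF_DIST (4 * BLOCK) /* assumed prefetch distance */

/* Sums the buffer (a stand-in for real CRC work), prefetching ahead only
 * while the target stays inside [buf, buf + len). The number of
 * prefetches issued is written to *n_prefetch. */
static uint64_t sum_with_bounded_prefetch(const unsigned char *buf, size_t len,
                                          size_t *n_prefetch)
{
    uint64_t sum = 0;
    size_t issued = 0;

    for (size_t i = 0; i < len; i += BLOCK) {
        /* Two prefetches per 128-byte block: one per cache line. */
        for (size_t off = 0; off < BLOCK; off += LINE) {
            size_t target = i + PF_DIST + off;
            if (target < len) {
                /* rw = 0 (read), locality = 3 (typically prefetcht0 on x86) */
                __builtin_prefetch(buf + target, 0, 3);
                issued++;
            }
        }
        size_t end = (i + BLOCK < len) ? i + BLOCK : len;
        for (size_t j = i; j < end; j++)
            sum += buf[j];
    }
    *n_prefetch = issued;
    return sum;
}
```

For a 1024-byte buffer, only the first four blocks issue prefetches, since later targets would fall beyond the data.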
Pablo de Lara
46b52726c8 crc: prefetch data with prefetcht1 for non-VPCLMUL implementations
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
81ee1cdb95 crc: prefetch data with prefetcht0 for VPCLMUL implementations
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
4613c5ac09 crc: delete unused CRC ISCSI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
Pablo de Lara
0ed666031d crc: add PCLMUL CRC32 ISCSI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-02 14:17:47 +01:00
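CRC32 iSCSI is the CRC-32C (Castagnoli) code. A bit-at-a-time reference using the standard CRC-32C parameters (reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF) is a convenient cross-check for a PCLMUL implementation. `crc32c_ref` is a hypothetical name, and the library's own entry point may apply a different seed/inversion convention, so compare accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32C (Castagnoli), reflected form. */
static uint32_t crc32c_ref(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu; /* standard initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the dropped bit was 1 */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu; /* standard final XOR */
}
```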
Tim Burke
a46e3f1588 igzip: Fix aarch64 register width for bfinal
We only ever load 32 bits into it, and we only ever want to compare against
32 bits. There was no need to declare it as 64 bits.

Furthermore, there were cases where a 64-bit comparison around
isal_out_overflow_1 led us to erroneously set the block state to
ISAL_BLOCK_INPUT_DONE when it should have been left at ISAL_BLOCK_NEW_HDR.

Fixes #316

Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-08-29 23:50:35 +08:00
Tim Burke
73c50447fc aarch64: Fix build on macOS
Somewhere between Command Line Tools for Xcode 16.2 and 16.3, clang
started emitting errors like:

   <instantiation>:91:26: error: unexpected token in argument list
    movk x7, br_low_b2, lsl 32
                            ^
   crc/aarch64/crc32_ieee_norm_pmull.S:34:1: note: while in macro instantiation
   crc32_norm_func crc32_ieee_norm_pmull

This appears to be caused by a change in macro expansion; work around it
by replacing .equ directives with #defines.

Fixes #352
Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-08-21 16:20:04 +01:00
Pablo de Lara
8772e99fee Add MAINTAINERS file
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-21 09:05:44 +01:00
Pablo de Lara
fa32879c2d tests: [fuzz] fix potential null dereference
zstate.msg may be NULL, as set by the inflateInit2() function.
In that case, it must not be compared against another string.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-11 17:37:49 +01:00
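The pattern behind this kind of fix is a NULL check before the string comparison. A minimal sketch; `msg_equals` is a hypothetical helper, and treating two NULL messages as equal is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compares two optional error messages without dereferencing NULL.
 * Two NULL messages compare equal; NULL vs non-NULL compares unequal. */
static bool msg_equals(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}
```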
Pablo de Lara
768b77219f igzip: [SHIM] fix memory leaks
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-11 17:37:49 +01:00
Pablo de Lara
8f2c02ab9e igzip: fix memory leak in test
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-11 17:37:49 +01:00
vkarpenk
f0320e1c30 shim: add zlib shim library
This experimental library is a drop-in replacement for zlib that
uses ISA-L to improve compression/decompression performance.

Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-08-08 07:47:35 +00:00
Pablo de Lara
5e9072107a cmake: add functional tests
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-17 09:57:45 +02:00
Pablo de Lara
612c210684 Add initial CMake build system
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2025-07-17 09:57:45 +02:00
lvshuo
d414b2702a erasure_code: optimize RVV implementation
The ISA-L EC code has been rewritten using RVV vector instructions and a minimal multiplication table,
resulting in a performance improvement of more than 10x over the existing implementation.

Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-07-11 15:55:57 +02:00
Pablo de Lara
f2883f24fd raid: add cold cache test
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-09 14:15:18 +01:00
Pablo de Lara
55e25f7aa2 raid: add consolidated performance app
Added a new RAID performance application, which consolidates the
existing XOR and P+Q gen performance applications.

This application accepts the buffer sizes to benchmark
(as a single value, a list or a range), the RAID function
to test and the number of sources.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 15:00:56 +02:00
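For context, the P+Q functions benchmarked here compute RAID-6 style syndromes: P is the XOR of all sources, and Q weights source i by g^i in GF(2^8). The byte-wise sketch below assumes generator g = 2 and the field polynomial 0x11D, the usual RAID-6 choices; the function names and signature are hypothetical and do not reproduce the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by x (i.e. by 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1. */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1D : 0x00));
}

/* srcs: nsrc source buffers of len bytes each; p, q: len-byte outputs.
 * Q uses Horner's rule: Q = (...((D[n-1]*2) ^ D[n-2])*2 ...) ^ D[0]. */
static void pq_gen_ref(size_t nsrc, size_t len, const uint8_t *const *srcs,
                       uint8_t *p, uint8_t *q)
{
    for (size_t j = 0; j < len; j++) {
        uint8_t pv = 0, qv = 0;
        for (size_t i = nsrc; i-- > 0;) { /* highest-index source first */
            pv ^= srcs[i][j];
            qv = (uint8_t)(gf_mul2(qv) ^ srcs[i][j]);
        }
        p[j] = pv;
        q[j] = qv;
    }
}
```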
Pablo de Lara
8735bb4e20 crc: add cold cache test
To benchmark a cold cache scenario, the option `--cold`
has been added to the CRC benchmark application;
with it, the addresses of the input buffers are randomized
within a 1GB preallocated memory buffer.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 12:11:56 +01:00
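The cold-cache setup can be pictured as picking a random, aligned offset for each input buffer inside one large preallocated pool, so consecutive iterations are unlikely to hit cached lines. The helper below is hypothetical; the 64-byte alignment and the use of `rand()` are assumptions, not details of the benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ALIGN 64 /* assumed alignment for each test buffer */

/* Returns a pointer to buf_size bytes at a random ALIGN-aligned offset
 * inside pool (pool_size bytes, assumed ALIGN-aligned itself).
 * Note: rand() is a placeholder; with a small RAND_MAX it cannot reach
 * every slot of a 1GB pool. */
static unsigned char *random_buffer_in_pool(unsigned char *pool, size_t pool_size,
                                            size_t buf_size)
{
    assert(buf_size <= pool_size);
    size_t slots = (pool_size - buf_size) / ALIGN + 1; /* valid start positions */
    size_t slot = (size_t)rand() % slots;
    return pool + slot * ALIGN;
}
```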
Pablo de Lara
e97c91547f Add parentheses around parameters in macros
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 12:11:56 +01:00
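Why parentheses around macro parameters matter, in a minimal example (the macro names are hypothetical):

```c
#include <assert.h>

/* Without parentheses, the argument text is pasted in verbatim. */
#define MUL_BAD(a, b) a * b
/* With parentheses, each argument is evaluated as a unit. */
#define MUL_GOOD(a, b) ((a) * (b))

static int mul_bad_demo(void)  { return MUL_BAD(1 + 2, 4); }  /* 1 + 2 * 4 = 9 */
static int mul_good_demo(void) { return MUL_GOOD(1 + 2, 4); } /* (1 + 2) * 4 = 12 */
```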
Pablo de Lara
199a0a8151 crc: add CRC consolidated performance benchmark
Added a new CRC performance application, which consolidates the
existing CRC performance applications (CRC16, CRC32 and CRC64).

This application accepts the buffer sizes to benchmark
(as a single value, a list or a range) and the CRC function
to test (or all of them).

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
5d437d72f1 Add missing base function symbol
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
fc37bd08e3 Further memory leak fixes
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
bf18da6770 Free allocated memory in test applications
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-11 12:17:43 +01:00
Rong Tao
8054d41db5 Ignore more generated files
Ignore files generated by 'make perf'.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
2025-05-26 19:28:17 +01:00
Tim Burke
9a6c32cb05 Optimize crc64_rocksoft for aarch64
Closes #326

Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-05-20 11:39:28 +01:00
Tim Burke
86e775b3b5 Remove unnecessary .text directives
Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-05-20 11:39:28 +01:00
Tim Burke
22810489c6 Normalize the width of some constants
This makes it easier to compare the constants used in crc/crc64_*_by8.asm,
crc/crc64_*_by16_10.asm, and crc/aarch64/crc64_*_pmull.h.

Note that this revealed some discrepancies:

  ecma_refl: br_high != rk8 (92d8af2baf0e1e85 vs 92d8af2baf0e1e84)
  iso_refl: br_high != rk8 (b000000000000001 vs b000000000000000)
  jones_refl: br_high != rk8 (2b5926535897936b vs 2b5926535897936a)

but they should be innocuous.

Signed-off-by: Tim Burke <tim.burke@gmail.com>
2025-05-20 11:39:28 +01:00
sunyuechi
f74b0d27ab Update release notes
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-13 15:39:44 +02:00
sunyuechi
a7766a91b6 mem: R-V V mem_zero_detect
banana_f3:
    rvv: mem_zero_detect_perf_warm: runtime =    3062584 usecs, bandwidth 33784 MB in 3.0626 sec = 11031.32 MB/s
    c:   mem_zero_detect_perf_warm: runtime =    3000354 usecs, bandwidth 1594 MB in 3.0004 sec = 531.34 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-13 15:39:44 +02:00
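A scalar reference for the operation being vectorized, useful when checking the RVV version. It is assumed to follow the return convention of ISA-L's `isal_zero_detect` (0 when the region is all zeros, non-zero otherwise); `zero_detect_ref` itself is a hypothetical name, not the library code.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if all len bytes at mem are zero, non-zero otherwise. */
static int zero_detect_ref(const void *mem, size_t len)
{
    const unsigned char *p = mem;
    unsigned char acc = 0;

    for (size_t i = 0; i < len; i++)
        acc |= p[i]; /* OR-accumulate so any set bit survives */
    return acc != 0;
}
```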
Pablo de Lara
94690d01ca Remove 32-bit x86 architecture support
As already announced in issue #296, we are removing 32-bit x86 support,
which was not being validated anyway.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 18:37:08 +01:00
Pablo de Lara
8045bee170 Bump minimum NASM version to 2.14.01
NASM version 2.14.01 supports all the x86 ISA extensions used in this library.
Since this version has been available since 2018, it is safe to
require it as the minimum version for building the library,
as announced in issue #297.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 16:20:08 +01:00
Pablo de Lara
d20335bba8 Update release notes
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 16:20:08 +01:00
sunyuechi
eb130eaf6b erasure_code: R-V V ec_encode_data
banana_f3:
    rvv:
        erasure_code_encode_warm: runtime =    3065696 usecs, bandwidth 108 MB in 3.0657 sec = 35.37 MB/s
        erasure_code_decode_warm: runtime =    3001213 usecs, bandwidth 136 MB in 3.0012 sec = 45.47 MB/s
    c:
        erasure_code_encode_warm: runtime =    3002512 usecs, bandwidth 52 MB in 3.0025 sec = 17.34 MB/s
        erasure_code_decode_warm: runtime =    3065235 usecs, bandwidth 57 MB in 3.0652 sec = 18.69 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
c5d75f1e27 erasure_code: R-V V gf_vect_dot_prod
banana_f3:
    rvv: gf_vect_dot_prod_warm: runtime =    3062964 usecs, bandwidth 490 MB in 3.0630 sec = 160.25 MB/s
    c:   gf_vect_dot_prod_warm: runtime =    3000581 usecs, bandwidth 173 MB in 3.0006 sec = 57.69 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
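What a GF(2^8) dot product computes, written out byte by byte: each output byte is the XOR (field addition) of the source bytes multiplied by per-source coefficients. This scalar sketch assumes the field polynomial 0x11D, commonly used for Reed-Solomon erasure codes, and uses hypothetical names; it does not reproduce the library's table-driven `gf_vect_dot_prod` signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Carry-less multiply of a and b, reduced modulo x^8+x^4+x^3+x^2+1 (0x11D). */
static uint8_t gf_mul_ref(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a; /* add (XOR) the current multiple of a */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0)); /* a *= x */
        b >>= 1;
    }
    return r;
}

/* dest[j] = XOR over i of gf_mul(coef[i], src[i][j]). */
static void gf_dot_prod_ref(size_t len, size_t nsrc, const uint8_t *coef,
                            const uint8_t *const *src, uint8_t *dest)
{
    for (size_t j = 0; j < len; j++) {
        uint8_t acc = 0;
        for (size_t i = 0; i < nsrc; i++)
            acc ^= gf_mul_ref(coef[i], src[i][j]);
        dest[j] = acc;
    }
}
```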
sunyuechi
4174804684 riscv64_multibinary support more args
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
0a68e9434a erasure_code: R-V V gf_vect_mul
banana_f3:
    rvv: gf_vect_mul_warm: runtime =    3062541 usecs, bandwidth 1889 MB in 3.0625 sec = 616.84 MB/s
    c:   gf_vect_mul_warm: runtime =    3062014 usecs, bandwidth 285 MB in 3.0620 sec = 93.29 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00