Reduce the number of slli instructions by one and remove the dependency between the vle8.v and ld instructions.
gf5 and gf7 are not modified; +5 and +7 are not used in actual scenarios.
Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
On AArch64 systems with SVE support, 128-bit SVE implementations can
perform significantly worse than equivalent NEON code due to the
different optimization strategies used in each implementation. The NEON
version is unrolled 4 times, providing excellent performance at the
fixed 128-bit width. The SVE version can achieve similar or better
performance through its variable-width operations on systems with
256-bit or 512-bit SVE, but on 128-bit SVE systems, the NEON unrolled
implementation is faster due to reduced overhead.
This change adds runtime detection of SVE vector length and falls back
to the optimized NEON implementation when SVE is operating at 128-bit
width, ensuring optimal performance across all AArch64 configurations.
This implementation checks the vector length with an intrinsic if the
compiler supports it (which works on Apple as well) and falls back to
using prctl otherwise.
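
A minimal sketch of the detection described above, not the code from this change. The helper names `sve_vector_bytes` and `prefer_neon` are hypothetical, and the sketch assumes SVE availability has already been confirmed before the intrinsic path is reached:

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/prctl.h>
#endif
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: SVE vector length in bytes, or 0 when it cannot be
 * determined (for example, on a non-AArch64 build). */
static int
sve_vector_bytes(void)
{
#if defined(__ARM_FEATURE_SVE)
        /* Intrinsic path: svcntb() returns the number of bytes per vector. */
        return (int) svcntb();
#elif defined(__aarch64__) && defined(__linux__) && defined(PR_SVE_GET_VL)
        /* Fallback path: ask the kernel; the low bits hold the length. */
        int ret = prctl(PR_SVE_GET_VL);
        return (ret < 0) ? 0 : (ret & PR_SVE_VL_LEN_MASK);
#else
        return 0;
#endif
}

/* Hypothetical policy: use NEON unless SVE is wider than 128 bits. */
static int
prefer_neon(int vl_bytes)
{
        return vl_bytes <= 16;
}
```

A dispatcher could call `prefer_neon(sve_vector_bytes())` once at init time and select the NEON or SVE function pointers accordingly.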
This optimization ensures that systems benefit from:
- 4x unrolled NEON code on 128-bit SVE systems
- Variable-width SVE optimizations on wider SVE implementations
- Maintained compatibility across different AArch64 configurations
Performance improvement on systems with 128-bit SVE:
- Encode: 7509.80 MB/s → 8995.59 MB/s (+19.8%)
- Decode: 9383.67 MB/s → 12272.38 MB/s (+30.8%)
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
We only ever load 32 bits into it, and we only ever want to compare against
32 bits. There was no need to declare it as 64 bits.
Furthermore, there were cases where a 64 bit comparison around
isal_out_overflow_1 led us to erroneously set the block state to
ISAL_BLOCK_INPUT_DONE when it should have been left at ISAL_BLOCK_NEW_HDR.
Fixes #316
Signed-off-by: Tim Burke <tim.burke@gmail.com>
Somewhere between Command Line Tools for Xcode 16.2 and 16.3, clang
started complaining like
<instantiation>:91:26: error: unexpected token in argument list
movk x7, br_low_b2, lsl 32
^
crc/aarch64/crc32_ieee_norm_pmull.S:34:1: note: while in macro instantiation
crc32_norm_func crc32_ieee_norm_pmull
This seems to be caused by some change to macro expansion; work around it by
replacing .equ directives with #defines.
Fixes #352
Signed-off-by: Tim Burke <tim.burke@gmail.com>
There is a possibility that zstate.msg = NULL, which is set
in inflateInit2() function. In that case, we should not
compare against another string.
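
A minimal sketch of the kind of guard this implies. `struct stream_state` and `msg_equals` are hypothetical stand-ins for illustration, not the types or functions touched by this change:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the stream state; only the msg field matters. */
struct stream_state {
        const char *msg; /* may be NULL when no message has been set */
};

/* Returns 1 only if a message is set and equals `expected`.
 * A NULL msg is never passed to strcmp(). */
static int
msg_equals(const struct stream_state *s, const char *expected)
{
        if (s == NULL || s->msg == NULL || expected == NULL)
                return 0;
        return strcmp(s->msg, expected) == 0;
}
```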
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This experimental library is a drop-in replacement for zlib that
utilizes ISA-L for improved compression/decompression performance.
Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The ISA-L EC code has been written using RVV vector instructions and the minimum multiplication table,
resulting in a performance improvement of over 10 times compared to the existing implementation.
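
For illustration only, and assuming that "minimum multiplication table" refers to a compact 32-byte table per coefficient (16 low-nibble products followed by 16 high-nibble products), a scalar sketch of the lookup looks like this. The function names and the reduction polynomial 0x11d are assumptions of the sketch; the vector code itself is not reproduced here:

```c
#include <stdint.h>

/* Bitwise GF(2^8) multiply, reduction polynomial 0x11d (assumed). */
static uint8_t
gf_mul_ref(uint8_t a, uint8_t b)
{
        uint8_t p = 0;

        while (b) {
                if (b & 1)
                        p ^= a;
                a = (uint8_t) ((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
                b >>= 1;
        }
        return p;
}

/* Build a 32-byte table for constant c: products of c with each low
 * nibble (0x00..0x0f), then with each high nibble (0x00, 0x10, .. 0xf0). */
static void
gf_build_nibble_table(uint8_t c, uint8_t tbl[32])
{
        for (int i = 0; i < 16; i++) {
                tbl[i] = gf_mul_ref(c, (uint8_t) i);
                tbl[16 + i] = gf_mul_ref(c, (uint8_t) (i << 4));
        }
}

/* Multiply x by the constant baked into tbl: two lookups and one XOR.
 * This works because c * (lo ^ hi) == (c * lo) ^ (c * hi) in GF(2^8). */
static uint8_t
gf_mul_tbl(const uint8_t tbl[32], uint8_t x)
{
        return (uint8_t) (tbl[x & 0x0f] ^ tbl[16 + (x >> 4)]);
}
```

A vector implementation would perform the same two lookups on a whole register of bytes at once instead of one byte at a time.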
Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
Added a new RAID performance application which consolidates the
existing XOR and P+Q gen performance applications.
This application accepts the buffer sizes to benchmark
(as a single value, list or range), the RAID function
to test and the number of sources.
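
A minimal sketch of how such a size argument could be parsed. The syntax used here ("N", "a,b,c" and "min:step:max") and the names `read_ul` and `parse_sizes` are assumptions made for illustration, not the application's actual option format:

```c
#include <stddef.h>
#include <stdlib.h>

/* Read an unsigned decimal number at *p and advance *p past it.
 * Returns 0 if *p does not start with a digit. */
static int
read_ul(const char **p, unsigned long *v)
{
        char *end;

        if (**p < '0' || **p > '9')
                return 0;
        *v = strtoul(*p, &end, 10);
        *p = end;
        return 1;
}

/* Parse a size argument into out[]; returns the number of sizes stored,
 * or 0 on malformed input. Ranges are truncated to max_out entries;
 * a list with more than max_out entries is rejected. */
static size_t
parse_sizes(const char *arg, size_t *out, size_t max_out)
{
        unsigned long a, step, hi;
        size_t n = 0;

        if (arg == NULL || out == NULL || max_out == 0 || !read_ul(&arg, &a))
                return 0;

        if (*arg == ':') { /* range: min:step:max */
                arg++;
                if (!read_ul(&arg, &step) || step == 0 || *arg++ != ':')
                        return 0;
                if (!read_ul(&arg, &hi) || *arg != '\0' || hi < a)
                        return 0;
                for (; a <= hi && n < max_out; a += step)
                        out[n++] = (size_t) a;
                return n;
        }

        out[n++] = (size_t) a; /* single value or first list element */
        while (*arg == ',' && n < max_out) {
                arg++;
                if (!read_ul(&arg, &a))
                        return 0;
                out[n++] = (size_t) a;
        }
        return (*arg == '\0') ? n : 0;
}
```

With this sketch, `parse_sizes("1024:1024:4096", out, 8)` stores 1024, 2048, 3072 and 4096.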
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
To benchmark a cold cache scenario, the option `--cold`
has been added as a parameter of the CRC benchmark application,
where the addresses of the input buffers are randomized
within a 1GB preallocated memory buffer.
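
The idea can be sketched as follows. `cold_offset` is a hypothetical helper, and `rand()` is used only to keep the example short; the benchmark's actual randomization is not shown here:

```c
#include <stddef.h>
#include <stdlib.h>

/* Pick a pseudo-random, aligned offset such that a buffer of buf_len bytes
 * fits entirely inside a pool of pool_len bytes. Returns (size_t) -1 if
 * align is zero or the buffer cannot fit. */
static size_t
cold_offset(size_t pool_len, size_t buf_len, size_t align)
{
        size_t slots;

        if (align == 0 || buf_len > pool_len)
                return (size_t) -1;

        /* Number of aligned start positions that keep the buffer in bounds. */
        slots = (pool_len - buf_len) / align + 1;
        return ((size_t) rand() % slots) * align;
}
```

Each benchmark iteration would then read its input from `pool + cold_offset(pool_len, buf_len, align)`, so consecutive iterations are unlikely to hit data that is already cached.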
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Added a new CRC performance application which consolidates the
existing CRC performance applications (CRC16, CRC32 and CRC64).
This application accepts the buffer sizes to benchmark
(as a single value, list or range) and the CRC function
to test (or all of them).
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This makes it easier to compare the constants used for crc/crc64_*_by8.asm,
crc/crc64_*_by16_10.asm, and crc/aarch64/crc64_*_pmull.h
Note that this revealed some discrepancies:
ecma_refl: br_high != rk8 (92d8af2baf0e1e85 vs 92d8af2baf0e1e84)
iso_refl: br_high != rk8 (b000000000000001 vs b000000000000000)
jones_refl: br_high != rk8 (2b5926535897936b vs 2b5926535897936a)
but they should be innocuous.
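
As a quick check on the values listed above, each pair differs only in its least significant bit. The sketch below verifies that; the struct and function names are hypothetical, and the values are taken in the order they are listed:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a and b differ only in bit 0. */
static int
differs_only_in_lsb(uint64_t a, uint64_t b)
{
        return (a ^ b) == 1;
}

struct const_pair {
        const char *name;
        uint64_t br_high;
        uint64_t rk8;
};

/* Values copied from the discrepancies listed above. */
static const struct const_pair pairs[] = {
        { "ecma_refl", 0x92d8af2baf0e1e85ULL, 0x92d8af2baf0e1e84ULL },
        { "iso_refl", 0xb000000000000001ULL, 0xb000000000000000ULL },
        { "jones_refl", 0x2b5926535897936bULL, 0x2b5926535897936aULL },
};

/* Returns 1 if every listed pair differs only in bit 0. */
static int
all_pairs_differ_only_in_lsb(void)
{
        for (size_t i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++) {
                if (!differs_only_in_lsb(pairs[i].br_high, pairs[i].rk8))
                        return 0;
        }
        return 1;
}
```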
Signed-off-by: Tim Burke <tim.burke@gmail.com>
As already announced in issue #296, we are removing 32-bit x86 support,
which was not being validated anyway.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
NASM version 2.14.01 supports all the x86 instructions used in this library.
Since this version has been out since 2018, it is safe to
only permit the library to be compiled with this minimum version,
as announced in issue #297.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>