isa-l

mirror of https://github.com/intel/isa-l.git synced 2025-10-27 11:06:48 +01:00

Author	SHA1	Message	Date
Pablo de Lara	1a3e47f539	examples: fix clang 19 warning Clang 19 flags a variable not being used. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-10-20 10:04:01 +01:00
Veronika Karpenko	74f29cfe8d	shim: add crc32 and adler32 support Signed-off-by: Veronika Karpenko <veronika.karpenko@intel.com>	2025-10-16 15:23:21 +01:00
Karpenko, Veronika	c9076a6380	shim: fix EOF exception Fix issue: #361 Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>	2025-10-14 14:53:29 +01:00
lvshuo	e64ed02065	erasure_code: set vsetvli to default parameter and add space Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>	2025-10-13 09:49:38 +01:00
lvshuo	7684934179	erasure_code: add optimization implementation reduce one slli instructions and remove the dependence between vle8.v and ld instructions gf5 and gf7 are not modified, +5 and +7 are not used in actual scenarios. Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>	2025-10-13 09:49:38 +01:00
Pablo de Lara	a90a880887	igzip: fix typos reported by codespell Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-10-10 16:33:25 +01:00
Pablo de Lara	0b3ec4f3b6	Add error checking to get_filesize function Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-10-02 10:29:58 +01:00
Pablo de Lara	bfc99b6a18	igzip: exit with status 1 in applications upon failure Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-10-02 10:29:58 +01:00
Greg Tucker	9f77f65dbc	igzip: Change return variable type consistent with usage Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2025-10-02 10:29:58 +01:00
Nicola Torracca	c0a10cac84	raid: keep sub and jcc in sequence to enable macrofusion. Signed-off-by: Nicola Torracca <nicola.torracca@gmail.com>	2025-09-30 14:38:01 +01:00
vkarpenk	09d7b05dd4	shim: update README (#111) Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com>	2025-09-19 09:47:31 +01:00
Jonathan Swinney	aedcd375ba	aarch64: Use NEON when SVE width is 128 bits On AArch64 systems with SVE support, 128-bit SVE implementations can perform significantly worse than equivalent NEON code due to the different optimization strategies used in each implementation. The NEON version is unrolled 4 times, providing excellent performance at the fixed 128-bit width. The SVE version can achieve similar or better performance through its variable-width operations on systems with 256-bit or 512-bit SVE, but on 128-bit SVE systems, the NEON unrolled implementation is faster due to reduced overhead. This change adds runtime detection of SVE vector length and falls back to the optimized NEON implementation when SVE is operating at 128-bit width, ensuring optimal performance across all AArch64 configurations. This implementation checks the vector length with an intrinsic if the compiler supports it (which works on Apple as well) and falls back to using prctl otherwise. This optimization ensures that systems benefit from: - 4x unrolled NEON code on 128-bit SVE systems - Variable-width SVE optimizations on wider SVE implementations - Maintained compatibility across different AArch64 configurations Performance improvement on systems with 128-bit SVE: - Encode: 7509.80 MB/s → 8995.59 MB/s (+19.8% improvement) - Decode: 9383.67 MB/s → 12272.38 MB/s (+30.8% improvement) Signed-off-by: Jonathan Swinney <jswinney@amazon.com>	2025-09-18 17:11:00 +01:00
Pablo de Lara	09cec64707	erasure_code: improve verbose output of test application Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-17 13:36:06 +01:00
Pablo de Lara	e677f668c8	crc: only prefetch data that will be consumed for VPCLMUL functions Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-02 14:17:47 +01:00
Pablo de Lara	510de484c4	crc: only prefetch data that will be consumed for non-VPCLMUL functions Also, use only 2 prefetch instructions for 128B. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-02 14:17:47 +01:00
Pablo de Lara	46b52726c8	crc: prefetch data with prefetcht1 for non-VPCLMUL implementations Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-02 14:17:47 +01:00
Pablo de Lara	81ee1cdb95	crc: prefetch data with prefetcht0 for VPCLMUL implementations Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-02 14:17:47 +01:00
Pablo de Lara	4613c5ac09	crc: delete unused CRC ISCSI implementation Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-02 14:17:47 +01:00
Pablo de Lara	0ed666031d	crc: add PCLMUL CRC32 ISCSI implementation Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-09-02 14:17:47 +01:00
Tim Burke	a46e3f1588	igzip: Fix aarch64 registry width for bfinal We only ever load 32 bits into it, and we only ever want to compare against 32 bits. There was no need to declare it as 64 bits. Furthermore, there were cases where a 64 bit comparison around isal_out_overflow_1 led us to erroneously set the block state to ISAL_BLOCK_INPUT_DONE when it should have been left at ISAL_BLOCK_NEW_HDR. Fixes #316 Signed-off-by: Tim Burke <tim.burke@gmail.com>	2025-08-29 23:50:35 +08:00
Tim Burke	73c50447fc	aarch64: Fix build on macOS Somewhere between Command Line Tools for Xcode 16.2 and 16.3, clang started complaining like <instantiation>:91:26: error: unexpected token in argument list movk x7, br_low_b2, lsl 32 ^ crc/aarch64/crc32_ieee_norm_pmull.S:34:1: note: while in macro instantiation crc32_norm_func crc32_ieee_norm_pmull It seems to do with some change to macro expansion; work around it by replacing .equ directives with #defines. Fixes #352 Signed-off-by: Tim Burke <tim.burke@gmail.com>	2025-08-21 16:20:04 +01:00
Pablo de Lara	8772e99fee	Add MAINTAINERS file Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-08-21 09:05:44 +01:00
Pablo de Lara	fa32879c2d	tests: [fuzz] fix potential null dereference There is a possibility that zstate.msg = NULL, which is set in inflateInit2() function. In that case, we should not compare against another string. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-08-11 17:37:49 +01:00
Pablo de Lara	768b77219f	igzip: [SHIM] fix memory leaks Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-08-11 17:37:49 +01:00
Pablo de Lara	8f2c02ab9e	igzip: fix memory leak in test Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-08-11 17:37:49 +01:00
vkarpenk	f0320e1c30	shim: add zlib shim library This experimental library is a drop-in replacement for zlib that utilizes ISA-L for improved compression/decompression performance. Signed-off-by: Karpenko, Veronika <veronika.karpenko@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-08-08 07:47:35 +00:00
Pablo de Lara	5e9072107a	cmake: add functional tests Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-07-17 09:57:45 +02:00
Pablo de Lara	612c210684	Add initial CMake build system Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>	2025-07-17 09:57:45 +02:00
lvshuo	d414b2702a	erasure_code: optimize RVV implementation The ISA-L EC code has been written using RVV vector instructions and the minimum multiplication table, resulting in a performance improvement of over 10 times compared to the existing implementation. Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>	2025-07-11 15:55:57 +02:00
Pablo de Lara	f2883f24fd	raid: add cold cache test Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-07-09 14:15:18 +01:00
Pablo de Lara	55e25f7aa2	raid: add consolidated performance app Added new RAID performance application which consolidates the existing XOR and P+Q gen performance applications. This application accepts buffer sizes to benchmark, as a single value, list or range, and the RAID function to test and the number of sources. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-07-03 15:00:56 +02:00
Pablo de Lara	8735bb4e20	crc: add cold cache test To benchmark a cold cache scenario, the option `--cold` has been added as a parameter of the CRC benchmark application, where the addresses of the input buffers are randomized within a 1GB preallocated memory buffer. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-07-03 12:11:56 +01:00
Pablo de Lara	e97c91547f	Add parenthesis around parameters in macros Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-07-03 12:11:56 +01:00
Pablo de Lara	199a0a8151	crc: add CRC consolidated performance benchmark Added new CRC performance application which consolidates the existing CRC performance applications (CRC16, CRC32 and CRC64). This application accepts buffer sizes to benchmark, as a single value, list or range, and the CRC function to test (or all of them). Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-06-23 08:48:01 +01:00
Pablo de Lara	5d437d72f1	Add missing base function symbol Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-06-23 08:48:01 +01:00
Pablo de Lara	fc37bd08e3	Further memory leak fixes Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-06-23 08:48:01 +01:00
Pablo de Lara	bf18da6770	Free allocated memory in test applications Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-06-11 12:17:43 +01:00
Rong Tao	8054d41db5	Ignore more generated files ignore files generated by 'make perf'. Signed-off-by: Rong Tao <rongtao@cestc.cn>	2025-05-26 19:28:17 +01:00
Tim Burke	9a6c32cb05	Optimize crc64_rocksoft for aarch64 Closes #326 Signed-off-by: Tim Burke <tim.burke@gmail.com>	2025-05-20 11:39:28 +01:00
Tim Burke	86e775b3b5	Remove unnecessary .text directives Signed-off-by: Tim Burke <tim.burke@gmail.com>	2025-05-20 11:39:28 +01:00
Tim Burke	22810489c6	Normalize the width of some constants This makes it easier to compare the constants used for crc/crc64__by8.asm, crc/crc64__by16_10.asm, and crc/aarch64/crc64_*_pmull.h Note that this revealed some discrepancies: ecma_refl: br_high != rk8 (92d8af2baf0e1e85 vs 92d8af2baf0e1e84) iso_refl: br_high != rk8 (b000000000000001 vs b000000000000000) jones_refl: br_high != rk8 (2b5926535897936b vs 2b5926535897936a) but they should be innocuous. Signed-off-by: Tim Burke <tim.burke@gmail.com>	2025-05-20 11:39:28 +01:00
sunyuechi	f74b0d27ab	Update release notes Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>	2025-05-13 15:39:44 +02:00
sunyuechi	a7766a91b6	mem: R-V V mem_zero_detect banana_f3: rvv: mem_zero_detect_perf_warm: runtime = 3062584 usecs, bandwidth 33784 MB in 3.0626 sec = 11031.32 MB/s c: mem_zero_detect_perf_warm: runtime = 3000354 usecs, bandwidth 1594 MB in 3.0004 sec = 531.34 MB/s Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>	2025-05-13 15:39:44 +02:00
Pablo de Lara	94690d01ca	Remove 32-bit x86 architecture support As already announced in issue #296, we are removing 32-bit x86 support, which was not being validated anyway. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-05-08 18:37:08 +01:00
Pablo de Lara	8045bee170	Bump minimum NASM version to 2.14.01 NASM version 2.14.01 supports all x86 ISA in this library. Since this version has been out since 2018, it is safe to only permit the library to be compiled with this minimum version, as announced in issue #297. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-05-08 16:20:08 +01:00
Pablo de Lara	d20335bba8	Update release notes Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2025-05-08 16:20:08 +01:00
sunyuechi	eb130eaf6b	erasure_code: R-V V ec_encode_data banana_f3: rvv: erasure_code_encode_warm: runtime = 3065696 usecs, bandwidth 108 MB in 3.0657 sec = 35.37 MB/s erasure_code_decode_warm: runtime = 3001213 usecs, bandwidth 136 MB in 3.0012 sec = 45.47 MB/s c: erasure_code_encode_warm: runtime = 3002512 usecs, bandwidth 52 MB in 3.0025 sec = 17.34 MB/s erasure_code_decode_warm: runtime = 3065235 usecs, bandwidth 57 MB in 3.0652 sec = 18.69 MB/s Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>	2025-05-01 17:44:19 +01:00
sunyuechi	c5d75f1e27	erasure_code: R-V V gf_vect_dot_prod banana_f3: rvv: gf_vect_dot_prod_warm: runtime = 3062964 usecs, bandwidth 490 MB in 3.0630 sec = 160.25 MB/s c: gf_vect_dot_prod_warm: runtime = 3000581 usecs, bandwidth 173 MB in 3.0006 sec = 57.69 MB/s Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>	2025-05-01 17:44:19 +01:00
sunyuechi	4174804684	riscv64_multibinary support more args Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>	2025-05-01 17:44:19 +01:00
sunyuechi	0a68e9434a	erasure_code: R-V V gf_vect_mul banana_f3: rvv: gf_vect_mul_warm: runtime = 3062541 usecs, bandwidth 1889 MB in 3.0625 sec = 616.84 MB/s c: gf_vect_mul_warm: runtime = 3062014 usecs, bandwidth 285 MB in 3.0620 sec = 93.29 MB/s Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>	2025-05-01 17:44:19 +01:00


782 Commits