96 Commits

Author SHA1 Message Date
lvshuo
e64ed02065 erasure_code: set vsetvli to default parameter and add space
Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-10-13 09:49:38 +01:00
lvshuo
7684934179 erasure_code: add optimization implementation
Reduce one slli instruction and remove the dependency between the vle8.v and ld instructions.
gf5 and gf7 are not modified; +5 and +7 are not used in actual scenarios.

Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-10-13 09:49:38 +01:00
Jonathan Swinney
aedcd375ba aarch64: Use NEON when SVE width is 128 bits
On AArch64 systems with SVE support, 128-bit SVE implementations can
perform significantly worse than equivalent NEON code due to the
different optimization strategies used in each implementation. The NEON
version is unrolled 4 times, providing excellent performance at the
fixed 128-bit width. The SVE version can achieve similar or better
performance through its variable-width operations on systems with
256-bit or 512-bit SVE, but on 128-bit SVE systems, the NEON unrolled
implementation is faster due to reduced overhead.

This change adds runtime detection of SVE vector length and falls back
to the optimized NEON implementation when SVE is operating at 128-bit
width, ensuring optimal performance across all AArch64 configurations.

This implementation checks the vector length with an intrinsic if the
compiler supports it (which works on Apple as well) and falls back to
using prctl otherwise.

This optimization ensures that systems benefit from:
- 4x unrolled NEON code on 128-bit SVE systems
- Variable-width SVE optimizations on wider SVE implementations
- Maintained compatibility across different AArch64 configurations

Performance improvement on systems with 128-bit SVE:
- Encode: 7509.80 MB/s → 8995.59 MB/s (+19.8% improvement)
- Decode: 9383.67 MB/s → 12272.38 MB/s (+30.8% improvement)

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
2025-09-18 17:11:00 +01:00
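The fallback rule described in aedcd375ba can be sketched as a small selection function. This is a minimal illustration, not the library's dispatcher: `enum ec_impl`, `select_ec_impl`, and both parameters are hypothetical names, and the caller is assumed to have already obtained the SVE vector length in bytes.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not ISA-L's dispatcher API. */
enum ec_impl { EC_IMPL_NEON, EC_IMPL_SVE };

/*
 * Pick an implementation from the reported SVE capability.
 *   has_sve:      nonzero if the CPU reports SVE support
 *   sve_vl_bytes: SVE vector length in bytes (16 means 128-bit)
 */
enum ec_impl select_ec_impl(int has_sve, size_t sve_vl_bytes)
{
    if (!has_sve)
        return EC_IMPL_NEON;

    /* At 128 bits SVE is no wider than NEON, so prefer the unrolled NEON code. */
    if (sve_vl_bytes <= 16)
        return EC_IMPL_NEON;

    /* Wider vectors (256-bit, 512-bit, ...) favour the SVE code. */
    return EC_IMPL_SVE;
}
```

On a real system the vector length would typically come from the `svcntb()` intrinsic in `<arm_sve.h>` or, on Linux, from `prctl(PR_SVE_GET_VL)`; how the commit wires either into the dispatcher is not shown here.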
Pablo de Lara
09cec64707 erasure_code: improve verbose output of test application
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-09-17 13:36:06 +01:00
Pablo de Lara
612c210684 Add initial CMake build system
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2025-07-17 09:57:45 +02:00
lvshuo
d414b2702a erasure_code: optimize RVV implementation
The ISA-L EC code has been written using RVV vector instructions and the minimum multiplication table,
resulting in a performance improvement of over 10 times compared to the existing implementation.

Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
2025-07-11 15:55:57 +02:00
Pablo de Lara
e97c91547f Add parentheses around parameters in macros
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-07-03 12:11:56 +01:00
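Why e97c91547f wraps macro parameters can be seen in a short sketch. The macros below are hypothetical and are not taken from the library sources; they only show how an unparenthesized parameter changes the result when the argument is itself an expression.

```c
/* Hypothetical macros for illustration; not taken from the ISA-L sources. */

/* Unsafe: the arguments are pasted into the expression as written. */
#define MUL_UNSAFE(a, b) (a * b)

/* Safe: each parameter is parenthesized before it is used. */
#define MUL_SAFE(a, b) ((a) * (b))

/* Expands to (1 + 2 * 3), so operator precedence gives 7. */
int mul_unsafe_demo(void)
{
    return MUL_UNSAFE(1 + 2, 3);
}

/* Expands to ((1 + 2) * (3)), which gives the intended 9. */
int mul_safe_demo(void)
{
    return MUL_SAFE(1 + 2, 3);
}
```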
Pablo de Lara
fc37bd08e3 Further memory leak fixes
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-23 08:48:01 +01:00
Pablo de Lara
bf18da6770 Free allocated memory in test applications
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-06-11 12:17:43 +01:00
Pablo de Lara
94690d01ca Remove 32-bit x86 architecture support
As already announced in issue #296, we are removing 32-bit x86 support,
which was not being validated anyway.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 18:37:08 +01:00
Pablo de Lara
8045bee170 Bump minimum NASM version to 2.14.01
NASM version 2.14.01 supports all x86 ISA in this library.
Since this version has been out since 2018, it is safe to
only permit the library to be compiled with this minimum version,
as announced in issue #297.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2025-05-08 16:20:08 +01:00
sunyuechi
eb130eaf6b erasure_code: R-V V ec_encode_data
banana_f3:
    rvv:
        erasure_code_encode_warm: runtime =    3065696 usecs, bandwidth 108 MB in 3.0657 sec = 35.37 MB/s
        erasure_code_decode_warm: runtime =    3001213 usecs, bandwidth 136 MB in 3.0012 sec = 45.47 MB/s
    c:
        erasure_code_encode_warm: runtime =    3002512 usecs, bandwidth 52 MB in 3.0025 sec = 17.34 MB/s
        erasure_code_decode_warm: runtime =    3065235 usecs, bandwidth 57 MB in 3.0652 sec = 18.69 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
c5d75f1e27 erasure_code: R-V V gf_vect_dot_prod
banana_f3:
    rvv: gf_vect_dot_prod_warm: runtime =    3062964 usecs, bandwidth 490 MB in 3.0630 sec = 160.25 MB/s
    c:   gf_vect_dot_prod_warm: runtime =    3000581 usecs, bandwidth 173 MB in 3.0006 sec = 57.69 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
0a68e9434a erasure_code: R-V V gf_vect_mul
banana_f3:
    rvv: gf_vect_mul_warm: runtime =    3062541 usecs, bandwidth 1889 MB in 3.0625 sec = 616.84 MB/s
    c:   gf_vect_mul_warm: runtime =    3062014 usecs, bandwidth 285 MB in 3.0620 sec = 93.29 MB/s

Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
sunyuechi
5518db11a9 Fix erasure_code/gf_vect_mul_test output
Signed-off-by: sunyuechi <sunyuechi@iscas.ac.cn>
2025-05-01 17:44:19 +01:00
Mattias Ellert
841f9e34ad Address type mismatch warnings on aarch64
The PROVIDER_INFO macro used in the aarch64 code declares all
functions with the signature:

extern void function(void);

The actual return type and parameter list of the functions are however
different. The declarations provided by the PROVIDER_INFO macro
therefore conflict with the actual declarations of the functions
elsewhere in the code, causing compiler warnings.

This commit drops the PROVIDER_INFO macro and provides proper function
declarations, either by including a header file or by providing a
forward declaration. This corresponds to how the code for the other
architectures handles this issue.

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2025-04-22 12:55:53 +01:00
Daniel Gregory
726a6f7c02 build: Add riscv64 support
Use the base implementations for every function.

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
2025-03-20 19:22:40 +00:00
Cornu, Marcel D
07f8028743 erasure_code: fix unaligned free error in perf apps on windows
Signed-off-by: Cornu, Marcel D <marcel.d.cornu@intel.com>
2024-11-19 14:20:33 +00:00
Marcel Cornu
300260a4d9 erasure_code: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Taiju Yamada
38279f5e9e Avoid using x18 register
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-25 15:34:01 +00:00
Mattias Ellert
1b1ee1e18f erasure_code: fix wrong return type
erasure_code/ppc64le/gf_vect_mul_vsx.c: In function '_gf_vect_mul_base':
erasure_code/ppc64le/gf_vect_mul_vsx.c:14:16: error: 'return' with a value, in function returning void [-Wreturn-mismatch]
   14 |         return 0;
      |                ^
erasure_code/ppc64le/gf_vect_mul_vsx.c:6:13: note: declared here
    6 | static void _gf_vect_mul_base(int len, unsigned char *a, unsigned char *src,
      |             ^~~~~~~~~~~~~~~~~

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2024-01-23 12:01:14 +00:00
Pablo de Lara
e0fd782974 erasure_code: use internal gf_vect_mul_base for ppc64le encoding
gf_vect_mul_base is expected to work for all buffer sizes.
However, this function is checking for size alignment to 32 bytes,
to follow the other gf_vect_mul implementations.
Therefore, another implementation for this function is included
inside ppc64le folder to be used by the encoding functions.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 15:48:14 +00:00
Pablo de Lara
b8d5633e51 erasure_code: check for size alignment on powerpc gf_vect_mul_vsx implementation
Follows the rest of the gf_vect_mul implementations for other architectures,
and checks for size alignment, stated in the documentation.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 15:48:14 +00:00
Pablo de Lara
91e7906f3f erasure_code: check for size on gf_vect_mul_sse/avx
gf_vect_mul requires length to be multiple of 32 bytes,
so this check is added in the SSE/AVX implementations.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 13:52:08 +00:00
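The size check added in 91e7906f3f can be sketched in portable C. This is an assumption-laden illustration rather than the SSE/AVX code: `gf_mul_sketch` and `gf_vect_mul_checked` are hypothetical names, the multiply takes a plain constant instead of a precomputed table, and the reduction polynomial 0x11d is assumed. Only the multiple-of-32 rule and the negative return value for invalid sizes come from the commit messages in this log.

```c
/* Hypothetical helper: multiply two GF(2^8) elements bit by bit,
 * reducing by the polynomial 0x11d (an assumption for this sketch). */
unsigned char gf_mul_sketch(unsigned char a, unsigned char b)
{
    unsigned char p = 0;

    while (b) {
        if (b & 1)
            p ^= a;                 /* add a when the low bit of b is set */
        unsigned char carry = a & 0x80;
        a <<= 1;                    /* multiply a by x */
        if (carry)
            a ^= 0x1d;              /* reduce modulo the polynomial */
        b >>= 1;
    }
    return p;
}

/* Hypothetical wrapper: reject lengths that are not a multiple of 32
 * before touching the buffers, then multiply every byte by c. */
int gf_vect_mul_checked(int len, unsigned char c,
                        const unsigned char *src, unsigned char *dest)
{
    if (len < 0 || (len % 32) != 0)
        return -1;                  /* invalid size: report failure */

    for (int i = 0; i < len; i++)
        dest[i] = gf_mul_sketch(c, src[i]);
    return 0;
}
```

Checking the length up front means the vectorized loop never has to deal with a partial block, which is the behaviour the unit test mentioned in e0fffbe48b looks for.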
liuqinfei
275977156d gf_vect_mul_sve: fix error and enable unit tests for aarch64
Signed-off-by: liuqinfei <lucas.liuqinfei@huawei.com>
2024-01-12 15:18:37 +00:00
Pablo de Lara
e0fffbe48b erasure_code: disable unit tests temporarily for aarch64/ppc64le
Some aarch64 and ppc64le implementations of gf_vect_mul do not check
for invalid sizes, so the unit test checking for negative return value
from this function is disabled temporarily on these architectures.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-10 15:53:14 +00:00
Pablo de Lara
455fdded4e erasure_code: add missing aarch64 and powerpc interface for ec_init_tables
ec_init_tables is now a multi-implementation function,
so it requires a dispatcher for all architectures.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-09 13:38:43 +00:00
Tomasz Kantecki
402bd4f773 erasure_code: various fixes for static code analysis issues
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-19 20:36:39 +00:00
Pablo de Lara
a3e260436a erasure_code: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Pablo de Lara
abd80d3c5a erasure_code: check for size in gf_Xvect_mad_avx512_gfni
Length of data was not checked in implementation with AVX512+GFNI,
at the start of the gf_Xvect_mad_avx512_gfni functions, resulting
in buffer overflow if length was less than 64 bytes.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Marcel Cornu
561a419bc8 erasure_code: fix modules using incorrect unsigned jump
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
a53a20ea2a erasure_code: add AVX2 5vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
47ed2847af erasure_code: add AVX2 4vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
22b7f33d68 erasure_code: add AVX2 3vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
d22bb198f3 erasure_code: optimize AVX2-GFNI single vector mad implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-13 17:03:16 +00:00
Marcel Cornu
a0a149d674 erasure_code: add AVX2 2vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-13 17:03:16 +00:00
Marcel Cornu
0052080f53 erasure_code: optimize AVX2 GFNI 2 vector dot product
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
3f87141d03 erasure_code: optimize AVX2 GFNI single vector dot product
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
164d9ff1f0 erasure_code: add 2 vector AVX2 dot product with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
307d737bf2 erasure_code: add 3 vector AVX2 dot product with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-07 14:01:18 +00:00
Pablo de Lara
2ca781df19 lib: reduce verbosity by default in tests
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-01 14:33:29 +00:00
Marcel Cornu
5f23c03415 erasure_code: add initial AVX2 mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
447d9af75b erasure_code: add initial AVX2 dot product with GFNI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Marcel Cornu
bc34d87427 erasure_code: update GF_MUL_XOR macro to support VEX encoding
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
f971f02309 erasure_code: expose base implementation of init_tables
Expose ec_init_tables_base(), which should be used
with ec_encode_data_base() and ec_encode_data_update_base().

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
65e89717df erasure_code: implement EC update with AVX512 + GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
1eff12dddb erasure_code: implement EC with AVX512 + GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
9d487fd6db erasure_code: [perf] get parameters for number of buffers
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
07af4032ff erasure_code: fix stack allocation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
801df41929 erasure_code: fix vmovdqa instruction
vmovdqa needs to be vmovdqa32/64 when used on ZMMs (EVEX encoded).

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00