reduce one slli instructions and remove the dependence between vle8.v and ld instructions
gf5 and gf7 are not modified, +5 and +7 are not used in actual scenarios.
Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
On AArch64 systems with SVE support, 128-bit SVE implementations can
perform significantly worse than equivalent NEON code due to the
different optimization strategies used in each implementation. The NEON
version is unrolled 4 times, providing excellent performance at the
fixed 128-bit width. The SVE version can achieve similar or better
performance through its variable-width operations on systems with
256-bit or 512-bit SVE, but on 128-bit SVE systems, the NEON unrolled
implementation is faster due to reduced overhead.
This change adds runtime detection of SVE vector length and falls back
to the optimized NEON implementation when SVE is operating at 128-bit
width, ensuring optimal performance across all AArch64 configurations.
This implementation checks the vector length with an intrinsic if the
compiler supports it (which works on Apple as well) and falls back to
using prctl otherwise.
This optimization ensures that systems benefit from:
- 4x unrolled NEON code on 128-bit SVE systems
- Variable-width SVE optimizations on wider SVE implementations
- Maintained compatibility across different AArch64 configurations
Performance improvement on systems with 128-bit SVE:
- Encode: 7509.80 MB/s → 8995.59 MB/s (+19.8% improvement)
- Decode: 9383.67 MB/s → 12272.38 MB/s (+30.8% improvement)
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
The ISA-L EC code has been written using RVV vector instructions and the minimum multiplication table,
resulting in a performance improvement of over 10 times compared to the existing implementation.
Signed-off-by: Shuo Lv <lv.shuo@sanechips.com.cn>
As already announced in issue #296, we are removing 32-bit x86 support,
which was not being validated anyway.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
NASM version 2.14.01 supports all x86 ISA in this library.
Since this version has been out since 2018, it is safe to
only permit the library to be compiled with this minimum version,
as announced in issue #297.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The PROVIDER_INFO macro used in the aarch64 code declares all
functions with the signature:
extern void function(void);
The actual return type and parameter list of the functions are however
different. The declarations provided by the PROVIDER_INFO macro
therfore conflicts with the actual declarations of the functions
elsewhere in the code, causing compiler warnings.
This commit drops the PROVIDER_INFO macro and provides proper function
declarations, eiter by including a header file or by providing a
forward declaration. This corresponds to how the code for the other
architectures are handlinging this issue.
Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
gf_vect_mul_base is expected to work for all buffer sizes.
However, this function is checking for size alignment to 32 bytes,
to follow the other gf_vect_mul implementations.
Therefore, another implementation for this function is included
inside ppc64le folder to be used by the encoding functions.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Follows the rest of the gf_vect_mul implementations for other architectures,
and checks for size alignment, stated in the documentation.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
gf_vect_mul requires length to be multiple of 32 bytes,
so this check is added in the SSE/AVX implementations.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Some aarch64 and ppc64le implementations of gf_vect_mul do not check
for invalid sizes, so the unit test checking for negative return value
from this function is disabled temporarily on these architectures.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
ec_init_tables is now a multi-implementation function,
so it requires a dispatcher for all architectures.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Length of data was not checked in implementation with AVX512+GFNI,
at the start of the gf_Xvect_mad_avx512_gfni functions, resulting
in buffer overflow if length was less than 64 bytes.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Expose ec_init_tables_base(), which should be used
with ec_encode_data_base() and ec_encode_data_update_base().
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>