isa-l

mirror of https://github.com/intel/isa-l.git synced 2025-12-09 16:36:49 +01:00

Files

Jonathan Swinney aedcd375ba aarch64: Use NEON when SVE width is 128 bits

On AArch64 systems with SVE support, 128-bit SVE implementations can
perform significantly worse than equivalent NEON code because the two
implementations are optimized differently. The NEON version is unrolled
4 times, which performs very well at the fixed 128-bit width. The SVE
version can match or exceed that performance through its variable-width
operations on systems with 256-bit or 512-bit SVE. On 128-bit SVE
systems, however, the unrolled NEON implementation is faster because it
has less overhead.

This change adds runtime detection of the SVE vector length and falls
back to the optimized NEON implementation when SVE is operating at
128-bit width, so that each AArch64 configuration uses the faster of the
two code paths.
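A minimal sketch of the selection rule described above, in C. The names `ec_kernel`, `choose_ec_kernel`, and `SVE_VL_128_BYTES` are hypothetical and are not taken from ec_aarch64_dispatcher.c. The real dispatcher installs concrete function pointers rather than returning an enum.

```c
#include <stdbool.h>

/* Hypothetical labels for the two kernel families (not ISA-L names). */
enum ec_kernel { EC_KERNEL_NEON, EC_KERNEL_SVE };

/* A 128-bit SVE vector length, expressed in bytes. */
#define SVE_VL_128_BYTES 16u

/*
 * Choose a kernel family from SVE availability and the SVE vector
 * length in bytes. NEON is chosen when SVE is absent or only 128 bits
 * wide; SVE is chosen for 256-bit, 512-bit, or other wider lengths.
 */
static enum ec_kernel choose_ec_kernel(bool has_sve, unsigned sve_vl_bytes)
{
    if (has_sve && sve_vl_bytes > SVE_VL_128_BYTES)
        return EC_KERNEL_SVE;
    return EC_KERNEL_NEON;
}
```

Keeping the decision in a pure function, separate from the hardware probe, lets the rule be tested on any build host, including non-AArch64 machines.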

This implementation reads the vector length with an intrinsic when the
compiler supports it (this path also works on Apple platforms) and
otherwise falls back to prctl.
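The detection order can be sketched as follows. This assumes the ACLE intrinsic `svcntb()` from `<arm_sve.h>`, which is available when the compiler targets SVE (signalled by `__ARM_FEATURE_SVE`). It also assumes Linux's `prctl(PR_SVE_GET_VL)` as the fallback. The helper name `sve_vector_length_bytes` is hypothetical, and this is not ISA-L's actual code. It further assumes the caller has already confirmed that the CPU supports SVE before acting on the result.

```c
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__aarch64__) && defined(__linux__)
#include <sys/prctl.h>
#endif

/*
 * Hypothetical helper: return the SVE vector length in bytes, or 0 if
 * it cannot be determined (non-AArch64 builds, or kernels/CPUs without
 * SVE). Assumes the caller has already checked for SVE support.
 */
static unsigned sve_vector_length_bytes(void)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    /* ACLE intrinsic: number of 8-bit lanes, i.e. bytes per SVE vector. */
    return (unsigned)svcntb();
#elif defined(__aarch64__) && defined(__linux__) && defined(PR_SVE_GET_VL)
    /* Linux fallback: the low bits of the result hold the length in bytes. */
    int ret = prctl(PR_SVE_GET_VL);
    if (ret < 0)
        return 0; /* SVE not supported by this CPU or kernel */
    return (unsigned)(ret & PR_SVE_VL_LEN_MASK);
#else
    return 0; /* No way to query SVE on this target */
#endif
}
```

A result of 16 (128 bits) would then steer dispatch toward the NEON kernels. SVE vector lengths are always a multiple of 16 bytes, up to 256 bytes (2048 bits).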

This optimization ensures that systems benefit from:
- 4x unrolled NEON code on 128-bit SVE systems
- Variable-width SVE optimizations on wider SVE implementations
- Maintained compatibility across different AArch64 configurations

Performance improvement on systems with 128-bit SVE:
- Encode: 7509.80 MB/s → 8995.59 MB/s (+19.8%)
- Decode: 9383.67 MB/s → 12272.38 MB/s (+30.8%)
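The percentages are the relative throughput gain over the "before" figure in each line, which checks out as:

```latex
\[
\frac{8995.59 - 7509.80}{7509.80} = \frac{1485.79}{7509.80} \approx 0.198 \;(+19.8\%),
\qquad
\frac{12272.38 - 9383.67}{9383.67} = \frac{2888.71}{9383.67} \approx 0.308 \;(+30.8\%)
\]
```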

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>

2025-09-18 17:11:00 +01:00

ec_aarch64_dispatcher.c

aarch64: Use NEON when SVE width is 128 bits

2025-09-18 17:11:00 +01:00

ec_aarch64_highlevel_func.c

erasure_code: reformat using new code style

2024-04-22 11:35:03 +02:00

ec_multibinary_arm.S

erasure_code: add missing aarch64 and powerpc interface for ec_init_tables

2024-01-09 13:38:43 +00:00

gf_2vect_dot_prod_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_2vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_2vect_mad_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_2vect_mad_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_3vect_dot_prod_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_3vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_3vect_mad_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_3vect_mad_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_4vect_dot_prod_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_4vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_4vect_mad_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_4vect_mad_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_5vect_dot_prod_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_5vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_5vect_mad_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_5vect_mad_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_6vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_6vect_mad_neon.S

Avoid using x18 register

2024-03-25 15:34:01 +00:00

gf_6vect_mad_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_7vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_8vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_vect_dot_prod_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_vect_dot_prod_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_vect_mad_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_vect_mad_sve.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_vect_mul_neon.S

Fixes for aarch64 mac

2022-10-28 08:27:26 -07:00

gf_vect_mul_sve.S

gf_vect_mul_sve: fix error and enable unit tests for aarch64

2024-01-12 15:18:37 +00:00

Makefile.am

Enable SVE in ISA-L erasure code for aarch64

2022-01-04 10:54:38 -07:00