Commit Graph

79 Commits

Author SHA1 Message Date
Cornu, Marcel D
07f8028743 erasure_code: fix unaligned free error in perf apps on windows
Signed-off-by: Cornu, Marcel D <marcel.d.cornu@intel.com>
2024-11-19 14:20:33 +00:00
Marcel Cornu
300260a4d9 erasure_code: reformat using new code style
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2024-04-22 11:35:03 +02:00
Taiju Yamada
38279f5e9e Avoid using x18 register
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2024-03-25 15:34:01 +00:00
Mattias Ellert
1b1ee1e18f erasure_code: fix wrong return type
erasure_code/ppc64le/gf_vect_mul_vsx.c: In function '_gf_vect_mul_base':
erasure_code/ppc64le/gf_vect_mul_vsx.c:14:16: error: 'return' with a value, in function returning void [-Wreturn-mismatch]
   14 |         return 0;
      |                ^
erasure_code/ppc64le/gf_vect_mul_vsx.c:6:13: note: declared here
    6 | static void _gf_vect_mul_base(int len, unsigned char *a, unsigned char *src,
      |             ^~~~~~~~~~~~~~~~~

Signed-off-by: Mattias Ellert <mattias.ellert@physics.uu.se>
2024-01-23 12:01:14 +00:00
Pablo de Lara
e0fd782974 erasure_code: use internal gf_vect_mul_base for ppc64le encoding
gf_vect_mul_base is expected to work for all buffer sizes.
However, this function is checking for size alignment to 32 bytes,
to follow the other gf_vect_mul implementations.
Therefore, another implementation for this function is included
inside ppc64le folder to be used by the encoding functions.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 15:48:14 +00:00
Pablo de Lara
b8d5633e51 erasure_code: check for size alignment on powerpc gf_vect_mul_vsx implementation
Follows the rest of the gf_vect_mul implementations for other architectures,
and checks for size alignment, stated in the documentation.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 15:48:14 +00:00
Pablo de Lara
91e7906f3f erasure_code: check for size on gf_vect_mul_sse/avx
gf_vect_mul requires length to be multiple of 32 bytes,
so this check is added in the SSE/AVX implementations.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-15 13:52:08 +00:00
liuqinfei
275977156d gf_vect_mul_sve: fix error and enable unit tests for aarch64
Signed-off-by: liuqinfei <lucas.liuqinfei@huawei.com>
2024-01-12 15:18:37 +00:00
Pablo de Lara
e0fffbe48b erasure_code: disable unit tests temporarily for aarch64/ppc64le
Some aarch64 and ppc64le implementations of gf_vect_mul do not check
for invalid sizes, so the unit test checking for negative return value
from this function is disabled temporarily on these architectures.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-10 15:53:14 +00:00
Pablo de Lara
455fdded4e erasure_code: add missing aarch64 and powerpc interface for ec_init_tables
ec_init_tables is now a multi-implementation function,
so it requires a dispatcher for all architectures.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2024-01-09 13:38:43 +00:00
Tomasz Kantecki
402bd4f773 erasure_code: various fixes for static code analysis issues
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
2023-12-19 20:36:39 +00:00
Pablo de Lara
a3e260436a erasure_code: [test] fix memory leak
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Pablo de Lara
abd80d3c5a erasure_code: check for size in gf_Xvect_mad_avx512_gfni
Length of data was not checked in implementation with AVX512+GFNI,
at the start of the gf_Xvect_mad_avx512_gfni functions, resulting
in buffer overflow if length was less than 64 bytes.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-18 14:25:22 +00:00
Marcel Cornu
561a419bc8 erasure_code: fix modules using incorrect unsigned jump
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
a53a20ea2a erasure_code: add AVX2 5vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
47ed2847af erasure_code: add AVX2 4vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
22b7f33d68 erasure_code: add AVX2 3vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-14 17:55:49 +00:00
Marcel Cornu
d22bb198f3 erasure_code: optimize AVX2-GFNI single vector mad implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-13 17:03:16 +00:00
Marcel Cornu
a0a149d674 erasure_code: add AVX2 2vect mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-13 17:03:16 +00:00
Marcel Cornu
0052080f53 erasure_code: optimize AVX2 GFNI 2 vector dot product
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
3f87141d03 erasure_code: optimize AVX2 GFNI single vector dot product
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
164d9ff1f0 erasure_code: add 2 vector AVX2 dot product with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-11 22:44:07 +00:00
Marcel Cornu
307d737bf2 erasure_code: add 3 vector AVX2 dot product with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-07 14:01:18 +00:00
Pablo de Lara
2ca781df19 lib: reduce verbosity by default in tests
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-12-01 14:33:29 +00:00
Marcel Cornu
5f23c03415 erasure_code: add initial AVX2 mad with GFNI implementation
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
447d9af75b erasure_code: add initial AVX2 dot product with GFNI implementation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Marcel Cornu
bc34d87427 erasure_code: update GF_MUL_XOR macro to support VEX encoding
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
2023-12-01 14:20:56 +00:00
Pablo de Lara
f971f02309 erasure_code: expose base implementation of init_tables
Expose ec_init_tables_base(), which should be used
with ec_encode_data_base() and ec_encode_data_update_base().

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
65e89717df erasure_code: implement EC update with AVX512 + GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
1eff12dddb erasure_code: implement EC with AVX512 + GFNI
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
9d487fd6db erasure_code: [perf] get parameters for number of buffers
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
07af4032ff erasure_code: fix stack allocation
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Pablo de Lara
801df41929 erasure_code: fix vmovdqa instruction
vmovdqa needs to be vmovdqa32/64 when used on ZMMs (EVEX encoded).

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2023-11-23 10:56:28 +00:00
Taiju Yamada
1187583a97 Fixes for aarch64 mac
- It should be fine to enable pmull always on Apple Silicon
- macOS 12+ is required for PMULL instruction.
- Changed the conditional macro to __APPLE__
- Rewritten dispatcher using sysctlbyname
- Use __USER_LABEL_PREFIX__
- Use __TEXT,__const as readonly section
- use ASM_DEF_RODATA macro
- fix func decl

Change-Id: I800593f21085d8187b480c8bb3ab2bd70c4a6974
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2022-10-28 08:27:26 -07:00
Surendar Chandra
85716fe2fe Correct loop bounds check in aarch64 gf_vect_mul
Prior to this change, a missing loop bounds check in the aarch64
version of gf_vect_mul would cause the routine to return 1 (error)
in the normal case.

This change introduces a check and branch to "return_pass" (success), and
also adds checks of the return code of gf_vect_mul to the supplied unit
test; it was previously ignored.

Change-Id: I9f7fe0014189b24f9600e0473ee02b5316c2da91
Signed-off-by: Surendar Chandra <vsurench@amazon.com>
2022-10-27 15:30:00 -07:00
Greg Tucker
04f3125ea0 test: Move perf routine output from stack to heap
Large cold perf tests were allocating more then allowed stack size.

Change-Id: I2c54f36ac6b42b359078dae7fffa5ce0b6d4890a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-08-08 15:19:03 -07:00
Greg Tucker
9c7e3b9f22 test: Change perf tests to warm by default
The cold versions of tests depended on a fixed size of last level
cache that is too low on some arch and too high for the total
available memory on others.

Change-Id: Iee98403f9ace02e01b810c296a5fe44b933bfb17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-08-03 16:35:55 -07:00
Greg Tucker
9f75defd57 Remove all slver legacy segments
The relic slver is no longer used for individual versioning
on functions and is confusing tools looking for data in text
sections. This removes all instances instead of fixing since
its usefulness is waining. Fixes #221

Change-Id: Ife0b9f105950a90337c58e8a41ac2cffc0f67d99
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-07-14 19:23:52 -07:00
Ilya Leoshkevich
d3cfb2fb77 Fix s390 build
The goal of this patch is to make isa-l testsuite pass on s390 with
minimal changes to the library. The one and only reason isa-l does not
work on s390 at the moment is that s390 is big-endian, and isa-l
assumes little-endian at a lot of places.

There are two flavors of this: loading/storing integers from/to
memory, and overlapping structs. Loads/stores are already helpfully
wrapped by unaligned.h header, so replace the functions there with
endianness-aware variants. Solve struct member overlap by reversing
their order on big-endian.

Also, fix a couple of usages of uninitialized memory in the testsuite
(found with MemorySanitizer).

Fixes s390x part of #188.

Change-Id: Iaf14a113bd266900192cc8b44212f8a47a8c7753
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-01-04 11:06:17 -07:00
Guodong Xu
3b3d7cc47b Enable SVE in ISA-L erasure code for aarch64
This patch adds Arm (aarch64) SVE [1] variable-length vector assembly support
into ISA-L erasure code library. "Arm designed the Scalable Vector Extension
(SVE) as a next-generation SIMD extension to AArch64. SVE allows flexible
vector length implementations with a range of possible values in CPU
implementations. The vector length can vary from a minimum of 128 bits up to
a maximum of 2048 bits, at 128-bit increments. The SVE design guarantees
that the same application can run on different implementations that support
SVE, without the need to recompile the code. " [3]

Test method:
 - This patch was tested on Fujitsu's A64FX [2], and it passed all erasure
     code related test cases, including "make checks" , "make test", and
     "make perf".
 - To ensure code testing coverage, parameters in files (erasure_code/
     erasure_code_test.c , erasure_code_update_test.c and gf_vect_mad_test.c)
     are modified to cover all _vect versions of _mad_sve() / _dot_prod_sve()
     rutines.

Performance improvements over NEON:
In general, SVE benchmarks (bandwidth in MB/s) are 40% ~ 100% higher than NEON
when running _cold style (data uncached and pulled from memory) perfs. This
includes routines of dot_prod, mad, and mul.

Optimization points:
This patch was tuned for the best performance on A64FX. Tuning points being
touched in this patch include:
1) Data prefetch into L2 cache before loading. See _sve.S files.
2) Instruction sequence orchestration. Such as interleaving every two
     'ld1b/st1b' instructions with other instructions. See _sve.S files.
3) To improve dest vectors parallelism, in highlevel, running
     gf_4vect_dot_prod_sve twice is better than running gf_8vect_dot_prod_sve()
     once, and it's also better than running _7vect + _vect, _6vect + _2vect,
     and _5vect + _3vect. The similar idea is applied to improve 11 ~ 9 dest
     vectors dot product computing as well. The related change can be found
     in ec_encode_data_sve() of file:
     erasure_code/aarch64/ec_aarch64_highlevel_func.c

Notes:
1) About vector length: A64FX has a vector register length of 512bit. However,
     this patchset was written with variable length assembly so it work
     automatically on aarch64 machines with any types of SVE vector length,
     such as SVE-128, SVE-256, etc..
2) About optimization: Due to differences in microarchitecture and
     cache/memory design, to achieve optimum performance on SVE capable CPUs
     other than A64FX, it is considered necessary to do microarchitecture-level
     tunings on these CPUs.

[1] Introduction to SVE - Arm Developer.
      https://developer.arm.com/documentation/102476/latest/
[2] FUJITSU Processor A64FX.
      https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
[3] Introducing SVE.
      https://developer.arm.com/documentation/102476/0001/Introducing-SVE

Change-Id: If49eb8a956154d799dcda0ba4c9c6d979f5064a9
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2022-01-04 10:54:38 -07:00
Greg Tucker
112dd72c01 build: Remove unneeded file types.h
The file types.h has long been misnamed and overlaps with
functionality in the test helper routines.

Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-10 09:35:43 -07:00
luo rixin
bee5180a15 erasure_code: Fix text relocation on aarch64
Here is the bug report on ceph. https://tracker.ceph.com/issues/48681

Change-Id: Ie1c60a71f28c1a169c8899a621be9bb455f5e244
Signed-off-by: luo rixin <luorixin@huawei.com>
2021-01-08 15:23:15 -07:00
H.J. Lu
cd888f01a4 x86: Add ENDBR32/ENDBR64 at function entries for Intel CET
To support Intel CET, all indirect branch targets must start with
ENDBR32/ENDBR64.  Here is a patch to define endbranch and add it to
function entries in x86 assembly codes which are indirect branch
targets as discovered by running testsuite on Intel CET machine and
visual inspection.

Verified with

$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
$ make -j8 check

with both nasm and yasm on both CET and non-CET machines.

Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2020-05-26 09:16:49 -07:00
Greg Tucker
af13ed6136 ec: Fix second windows reg push for avx512
Change improper stack push in windows prolog.  Error was not reachable without
windows nasm support and so went undetected.

Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-20 12:36:58 -07:00
Greg Tucker
ede04f0a1f build: Fix for windows to allow nasm use
Previously windows build could only use yasm because some procedural items such
as proc_start were not supported by nasm.  This adds a few macros and fixes so
nasm can be used to build on windows.

Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
5ab40c79cc ec: Fix windows reg push for avx512
Push of registers overlapped xmm push.  Error was not reachable without windows
nasm support and so went undetected.

Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:05:46 -07:00
Greg Tucker
472e7011e8 ec: Change use of windows macro save_xmm128 to vec
For builds under windows this could emit a non-vec mov that's not optional for
AVX versions.

Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-17 18:04:54 -07:00
Greg Tucker
794413ddd2 ec: Remove arch-specific redundant gf_nvect tests
The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect
functions were not strictly called by the higher level ec_encode_data() and
needed independent testing.  As this has now changed the extra tests can be
removed as redundant.

Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-03-06 13:45:59 -07:00
Hong Bo Peng
180c74aefd enable VSX SIMD in ISA-L for ppc64le
1) Implement the ErasureCode function in Altivec Intrinsics
  2) Coding style update

Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
2020-02-20 09:40:43 -07:00
John Kariuki
5eeb33f69c ec: add AVX512 ec functions with 5 and 6 outputs
Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.

Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
2019-11-19 10:12:14 -07:00