isa-l

mirror of https://github.com/intel/isa-l.git synced 2024-12-12 17:33:50 +01:00

Author	SHA1	Message	Date
Marcel Cornu	561a419bc8	erasure_code: fix modules using incorrect unsigned jump Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-14 17:55:49 +00:00
Marcel Cornu	a53a20ea2a	erasure_code: add AVX2 5vect mad with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-14 17:55:49 +00:00
Marcel Cornu	47ed2847af	erasure_code: add AVX2 4vect mad with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-14 17:55:49 +00:00
Marcel Cornu	22b7f33d68	erasure_code: add AVX2 3vect mad with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-14 17:55:49 +00:00
Marcel Cornu	d22bb198f3	erasure_code: optimize AVX2-GFNI single vector mad implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-13 17:03:16 +00:00
Marcel Cornu	a0a149d674	erasure_code: add AVX2 2vect mad with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-13 17:03:16 +00:00
Marcel Cornu	0052080f53	erasure_code: optimize AVX2 GFNI 2 vector dot product Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-11 22:44:07 +00:00
Marcel Cornu	3f87141d03	erasure_code: optimize AVX2 GFNI single vector dot product Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-11 22:44:07 +00:00
Marcel Cornu	164d9ff1f0	erasure_code: add 2 vector AVX2 dot product with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-11 22:44:07 +00:00
Marcel Cornu	307d737bf2	erasure_code: add 3 vector AVX2 dot product with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-07 14:01:18 +00:00
Pablo de Lara	2ca781df19	lib: reduce verbosity by default in tests Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-12-01 14:33:29 +00:00
Marcel Cornu	5f23c03415	erasure_code: add initial AVX2 mad with GFNI implementation Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-01 14:20:56 +00:00
Pablo de Lara	447d9af75b	erasure_code: add initial AVX2 dot product with GFNI implementation Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-01 14:20:56 +00:00
Marcel Cornu	bc34d87427	erasure_code: update GF_MUL_XOR macro to support VEX encoding Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2023-12-01 14:20:56 +00:00
Pablo de Lara	f971f02309	erasure_code: expose base implementation of init_tables Expose ec_init_tables_base(), which should be used with ec_encode_data_base() and ec_encode_data_update_base(). Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-23 10:56:28 +00:00
Pablo de Lara	65e89717df	erasure_code: implement EC update with AVX512 + GFNI Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-23 10:56:28 +00:00
Pablo de Lara	1eff12dddb	erasure_code: implement EC with AVX512 + GFNI Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-23 10:56:28 +00:00
Pablo de Lara	9d487fd6db	erasure_code: [perf] get parameters for number of buffers Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-23 10:56:28 +00:00
Pablo de Lara	07af4032ff	erasure_code: fix stack allocation Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-23 10:56:28 +00:00
Pablo de Lara	801df41929	erasure_code: fix vmovdqa instruction vmovdqa needs to be vmovdqa32/64 when used on ZMMs (EVEX encoded). Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-23 10:56:28 +00:00
Taiju Yamada	1187583a97	Fixes for aarch64 mac - It should be fine to enable pmull always on Apple Silicon - macOS 12+ is required for PMULL instruction. - Changed the conditional macro to __APPLE__ - Rewritten dispatcher using sysctlbyname - Use __USER_LABEL_PREFIX__ - Use __TEXT,__const as readonly section - use ASM_DEF_RODATA macro - fix func decl Change-Id: I800593f21085d8187b480c8bb3ab2bd70c4a6974 Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>	2022-10-28 08:27:26 -07:00
Surendar Chandra	85716fe2fe	Correct loop bounds check in aarch64 gf_vect_mul Prior to this change, a missing loop bounds check in the aarch64 version of gf_vect_mul would cause the routine to return 1 (error) in the normal case. This change introduces a check and branch to "return_pass" (success), and also adds checks of the return code of gf_vect_mul to the supplied unit test; it was previously ignored. Change-Id: I9f7fe0014189b24f9600e0473ee02b5316c2da91 Signed-off-by: Surendar Chandra <vsurench@amazon.com>	2022-10-27 15:30:00 -07:00
Greg Tucker	04f3125ea0	test: Move perf routine output from stack to heap Large cold perf tests were allocating more than the allowed stack size. Change-Id: I2c54f36ac6b42b359078dae7fffa5ce0b6d4890a Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2022-08-08 15:19:03 -07:00
Greg Tucker	9c7e3b9f22	test: Change perf tests to warm by default The cold versions of tests depended on a fixed size of last level cache that is too low on some arch and too high for the total available memory on others. Change-Id: Iee98403f9ace02e01b810c296a5fe44b933bfb17 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2022-08-03 16:35:55 -07:00
Greg Tucker	9f75defd57	Remove all slver legacy segments The relic slver is no longer used for individual versioning on functions and is confusing tools looking for data in text sections. This removes all instances instead of fixing since its usefulness is waning. Fixes #221 Change-Id: Ife0b9f105950a90337c58e8a41ac2cffc0f67d99 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2022-07-14 19:23:52 -07:00
Ilya Leoshkevich	d3cfb2fb77	Fix s390 build The goal of this patch is to make isa-l testsuite pass on s390 with minimal changes to the library. The one and only reason isa-l does not work on s390 at the moment is that s390 is big-endian, and isa-l assumes little-endian at a lot of places. There are two flavors of this: loading/storing integers from/to memory, and overlapping structs. Loads/stores are already helpfully wrapped by unaligned.h header, so replace the functions there with endianness-aware variants. Solve struct member overlap by reversing their order on big-endian. Also, fix a couple of usages of uninitialized memory in the testsuite (found with MemorySanitizer). Fixes s390x part of #188. Change-Id: Iaf14a113bd266900192cc8b44212f8a47a8c7753 Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>	2022-01-04 11:06:17 -07:00
Guodong Xu	3b3d7cc47b	Enable SVE in ISA-L erasure code for aarch64 This patch adds Arm (aarch64) SVE [1] variable-length vector assembly support into ISA-L erasure code library. "Arm designed the Scalable Vector Extension (SVE) as a next-generation SIMD extension to AArch64. SVE allows flexible vector length implementations with a range of possible values in CPU implementations. The vector length can vary from a minimum of 128 bits up to a maximum of 2048 bits, at 128-bit increments. The SVE design guarantees that the same application can run on different implementations that support SVE, without the need to recompile the code." [3] Test method: - This patch was tested on Fujitsu's A64FX [2], and it passed all erasure code related test cases, including "make checks", "make test", and "make perf". - To ensure code testing coverage, parameters in files (erasure_code/erasure_code_test.c, erasure_code_update_test.c and gf_vect_mad_test.c) are modified to cover all _vect versions of _mad_sve() / _dot_prod_sve() routines. Performance improvements over NEON: In general, SVE benchmarks (bandwidth in MB/s) are 40% ~ 100% higher than NEON when running _cold style (data uncached and pulled from memory) perfs. This includes routines of dot_prod, mad, and mul. Optimization points: This patch was tuned for the best performance on A64FX. Tuning points being touched in this patch include: 1) Data prefetch into L2 cache before loading. See _sve.S files. 2) Instruction sequence orchestration. Such as interleaving every two 'ld1b/st1b' instructions with other instructions. See _sve.S files. 3) To improve dest vectors parallelism, at the high level, running gf_4vect_dot_prod_sve twice is better than running gf_8vect_dot_prod_sve() once, and it's also better than running _7vect + _vect, _6vect + _2vect, and _5vect + _3vect. The similar idea is applied to improve 11 ~ 9 dest vectors dot product computing as well. The related change can be found in ec_encode_data_sve() of file: erasure_code/aarch64/ec_aarch64_highlevel_func.c Notes: 1) About vector length: A64FX has a vector register length of 512 bits. However, this patchset was written with variable length assembly so it works automatically on aarch64 machines with any SVE vector length, such as SVE-128, SVE-256, etc. 2) About optimization: Due to differences in microarchitecture and cache/memory design, to achieve optimum performance on SVE capable CPUs other than A64FX, it is considered necessary to do microarchitecture-level tunings on these CPUs. [1] Introduction to SVE - Arm Developer. https://developer.arm.com/documentation/102476/latest/ [2] FUJITSU Processor A64FX. https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/ [3] Introducing SVE. https://developer.arm.com/documentation/102476/0001/Introducing-SVE Change-Id: If49eb8a956154d799dcda0ba4c9c6d979f5064a9 Signed-off-by: Guodong Xu <guodong.xu@linaro.org>	2022-01-04 10:54:38 -07:00
Greg Tucker	112dd72c01	build: Remove unneeded file types.h The file types.h has long been misnamed and overlaps with functionality in the test helper routines. Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2021-06-10 09:35:43 -07:00
luo rixin	bee5180a15	erasure_code: Fix text relocation on aarch64 Here is the bug report on ceph. https://tracker.ceph.com/issues/48681 Change-Id: Ie1c60a71f28c1a169c8899a621be9bb455f5e244 Signed-off-by: luo rixin <luorixin@huawei.com>	2021-01-08 15:23:15 -07:00
H.J. Lu	cd888f01a4	x86: Add ENDBR32/ENDBR64 at function entries for Intel CET To support Intel CET, all indirect branch targets must start with ENDBR32/ENDBR64. Here is a patch to define endbranch and add it to function entries in x86 assembly codes which are indirect branch targets as discovered by running testsuite on Intel CET machine and visual inspection. Verified with $ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux $ make -j8 $ make -j8 check with both nasm and yasm on both CET and non-CET machines. Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337 Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2020-05-26 09:16:49 -07:00
Greg Tucker	af13ed6136	ec: Fix second windows reg push for avx512 Change improper stack push in windows prolog. Error was not reachable without windows nasm support and so went undetected. Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-20 12:36:58 -07:00
Greg Tucker	ede04f0a1f	build: Fix for windows to allow nasm use Previously windows build could only use yasm because some procedural items such as proc_start were not supported by nasm. This adds a few macros and fixes so nasm can be used to build on windows. Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	5ab40c79cc	ec: Fix windows reg push for avx512 Push of registers overlapped xmm push. Error was not reachable without windows nasm support and so went undetected. Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	472e7011e8	ec: Change use of windows macro save_xmm128 to vec For builds under windows this could emit a non-vec mov that's not optional for AVX versions. Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:04:54 -07:00
Greg Tucker	794413ddd2	ec: Remove arch-specific redundant gf_nvect tests The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect functions were not strictly called by the higher level ec_encode_data() and needed independent testing. As this has now changed the extra tests can be removed as redundant. Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-06 13:45:59 -07:00
Hong Bo Peng	180c74aefd	enable VSX SIMD in ISA-L for ppc64le 1) Implement the ErasureCode function in Altivec Intrinsics 2) Coding style update Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606 Thanks-to: Daniel Axtens <dja@axtens.net> Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>	2020-02-20 09:40:43 -07:00
John Kariuki	5eeb33f69c	ec: add AVX512 ec functions with 5 and 6 outputs Added AVX512 optimized functions to calculate the GF(2^8) vector dot product with 5 and 6 outputs at a time. Also added GF(2^8) vector multiply AVX512 optimized functions with 5 and 6 accumulate. Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8 Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>	2019-11-19 10:12:14 -07:00
Hang Li	02a86dfb3f	erasure_code: modify eor way in aarch64 neon codes Change-Id: I9fb9219c5f280ed88194ec63234af046a5a036ae Signed-off-by: Hang Li <lihang48@hisilicon.com>	2019-11-01 15:31:33 -07:00
Hang Li	621cf92c52	erasure_code: modify perf benchmark loop Change-Id: Ie45ceb3ac55ab943a155e2a3f9f6b765cd94d7a1 Signed-off-by: Hang Li <lihang48@hisilicon.com>	2019-10-30 10:34:40 -07:00
Zhou Xiong	d7848c1d05	Implement aarch64 neon for erasure code. 1. Replace the erasure code interfaces below with arm neon interfaces via the mbin_interface function: ec_encode_data, gf_vect_mul, gf_vect_dot_prod, gf_vect_mad, ec_encode_data_update. 2. Utilise arm neon instructions to accelerate GF(2^8) computation with 128-bit registers. Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4 Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>	2019-10-25 11:09:03 -07:00
Bernd Schubert	d32d3f6902	Make variables in ec_base.h (file) static ec_base.h has several variables, which were defined with a global scope. Exactly those global variables caused issues on linking a static compilation of libisal.a to a shared lib. Adding -fPIC to CFLAGS somehow didn't help. As all the variables in ec_base.h are only included and used by a single C file, all of these can be (file) static, which also helps the compiler make further optimizations and solves the issue of linking the static libisal to a shared lib. Also make the variables const, as these are constants and must not be modified. Change-Id: I2b8141dabc1c7a528401f2778cdbdbed6c93c36b Signed-off-by: Bernd Schubert <bschubert@ddn.com>	2019-10-11 15:39:56 -07:00
Roy Oursler	d3caab9c3a	build: Avoid requiring AVX512 define when using dispatch functions Change-Id: I76af2d6ab7eb61ae531bbc7427650d08737c20ab Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>	2019-09-14 16:01:28 -07:00
Greg Tucker	4ac0e435eb	ec: Fix incorrect min size stated for gf_vect_mad Change-Id: If178913f01f0d500aa66ce0e8dd67aaba49a0871 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-07-16 15:41:34 -07:00
Yibo Cai	57eed2f02b	aarch64: Cleanup build issues This patch addresses one build failure and fixes several build warnings for Arm (some for x86 too). - Fix dynamic relocation link failure of ld.bfd 2.30 on Arm [log] relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `xor_gen_neon' which may bind externally can not be used when making a shared object - Add arch dependent "other_tests" to exclude x86 specific tests on Arm [log] isa-l/erasure_code/gf_2vect_dot_prod_sse_test.c:181: undefined reference to `gf_2vect_dot_prod_sse' - Check "fread" return value to fix gcc warnings on Arm and x86 [log] warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(in_buf, 1, in_size, in_file); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Fix issue of comparing "char" with "int" on Arm. "char" is unsigned on Arm by default, an unsigned char will never equal to EOF(-1). [Log] programs/igzip_cli.c:318:31: warning: comparison is always true due to limited range of data type [-Wtype-limits] while (tmp != '\n' && tmp != EOF) ^~ - Include <stdlib.h> to several files to fix build warnings on Arm [log] igzip/igzip_inflate_perf.c:339:5: warning: incompatible implicit declaration of built-in function ‘exit’ exit(0); ^~~~ Change-Id: I82c1b63316b634b3d398ffba2ff815679d9051a8 Signed-off-by: Yibo Cai <yibo.cai@arm.com>	2019-03-20 10:15:40 +08:00
Greg Tucker	3c009347b1	Fix a few c99isms in unit tests Change-Id: Iea9ba619e337d5abea7ee791ddf3dd27e0f3e60f Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-03-19 15:02:40 -07:00
Roy Oursler	699bb5bd3f	all: Revamp performance testing to be time based Change-Id: I6260d28e4adc974d8db0a1c770e3eb922d87f8e4 Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>	2019-03-07 09:28:04 -07:00
Roy Oursler	3a78c4a205	ec: Remove gf_vect_mad_perf.c Remove gf_vect_mad_perf.c as it is architecture specific and does not provide useful information in its current format. Change-Id: I7819679db491a9b5572128e4fc05d989b870d22d Signed-off-by: Roy Oursler <roy.j.oursler@intel.com>	2019-03-07 09:28:04 -07:00
Yibo Cai	7a44098a98	build: Add aarch64 support Change-Id: If9594936a28355d89edd1a331b3b429dffa44184 Signed-off-by: Yibo Cai <yibo.cai@arm.com>	2019-02-10 13:08:56 -07:00
Greg Tucker	2e212f28fa	build: Fix for mac nasm lack of symbol types Change-Id: I9ee86a3e32876d3860477c8365fc459d94a8920e Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2018-11-29 13:54:36 -07:00
Greg Tucker	09e787231b	ec: Change gf_mad_test to use multi-binary function Change-Id: Ibe484239b75514b5563dd043bb0e8c46d3bdac5e Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2018-10-09 17:55:56 -07:00


66 Commits