- It should be fine to always enable PMULL on Apple Silicon.
- macOS 12+ is required for the PMULL instruction.
- Changed the conditional macro to __APPLE__
- Rewrote the dispatcher using sysctlbyname (see the sketch below)
- Use __USER_LABEL_PREFIX__
- Use __TEXT,__const as readonly section
- Use the ASM_DEF_RODATA macro
- Fix function declarations
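A minimal sketch of the sysctlbyname check, assuming the
"hw.optional.arm.FEAT_PMULL" key reported on Apple Silicon under
macOS 12+ (the real dispatcher and its fallback policy may differ):

    #include <stddef.h>
    #if defined(__APPLE__)
    #include <sys/sysctl.h>
    #endif

    static int have_pmull(void)
    {
    #if defined(__APPLE__)
        int val = 0;
        size_t len = sizeof(val);

        /* Assumed key name; reported by Apple Silicon on macOS 12+ */
        if (sysctlbyname("hw.optional.arm.FEAT_PMULL", &val, &len, NULL, 0) == 0)
            return val != 0;
    #endif
        return 0; /* conservative fallback: no PMULL */
    }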
Change-Id: I800593f21085d8187b480c8bb3ab2bd70c4a6974
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
Prior to this change, a missing loop bounds check in the aarch64
version of gf_vect_mul would cause the routine to return 1 (error)
in the normal case.
This change introduces a check and branch to "return_pass" (success). It
also adds checks of gf_vect_mul's return code to the supplied unit test,
which previously ignored it (see the sketch below).
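A sketch of the kind of return-code check added to the test, assuming
ISA-L's public gf_vect_mul()/gf_vect_mul_init() API (the length must be
a multiple of 32; buffer setup and failure handling are illustrative):

    #include <stdio.h>
    #include "erasure_code.h" /* gf_vect_mul(), gf_vect_mul_init() */

    static int mul_and_check(int len, unsigned char c,
                             unsigned char *src, unsigned char *dest)
    {
        unsigned char gf_const_tbl[32]; /* 32-byte table per coefficient */

        gf_vect_mul_init(c, gf_const_tbl);
        if (gf_vect_mul(len, gf_const_tbl, src, dest) != 0) {
            printf("fail gf_vect_mul\n"); /* previously never detected */
            return -1;
        }
        return 0;
    }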
Change-Id: I9f7fe0014189b24f9600e0473ee02b5316c2da91
Signed-off-by: Surendar Chandra <vsurench@amazon.com>
Large cold perf tests were allocating more than the allowed stack size.
Change-Id: I2c54f36ac6b42b359078dae7fffa5ce0b6d4890a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The cold versions of tests depended on a fixed last-level cache size
that is too low on some architectures and too high for the total
available memory on others.
Change-Id: Iee98403f9ace02e01b810c296a5fe44b933bfb17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The relic slver is no longer used for individual versioning of
functions and confuses tools looking for data in text sections. This
removes all instances instead of fixing them, since its usefulness is
waning. Fixes #221
Change-Id: Ife0b9f105950a90337c58e8a41ac2cffc0f67d99
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The cflag to link with the dynamic msvcrt (/MD) is not necessary and
causes warnings when static linking. Fixes #219
Change-Id: I0085d468afc4acbe323b0783cbbc6760b4c70704
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
In the adler32_neon function, during the last iteration of the
loop through "accum32_neon", we would load data after the end of the
buffer (in the ld1 instruction, the "start" register points to the end
of the buffer).
If this memory is unmapped, the load causes a segfault. If the memory
is mapped, the checksum is still correct, because the loaded value would
only be used in the next iteration, and there is no next iteration after
the last one.
To fix this, we can simply do the load before incrementing "start". And
while we're at it, we can load directly into d0_v/d1_v, saving a couple
of mov's.
Finally, the ld1 done during the function initialization can be removed
as the values aren't used for anything.
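A C analogue of the pattern, purely illustrative (the actual fix is in
the NEON assembly): load first, then advance the pointer, so the final
iteration never reads past the end of the buffer.

    #include <stdint.h>
    #include <string.h>

    static void consume_blocks(const uint8_t *start, const uint8_t *end)
    {
        uint8_t block[16];

        while (end - start >= 16) {
            memcpy(block, start, sizeof(block)); /* load, then advance */
            start += sizeof(block);
            /* ... accumulate "block" into the running sums ... */
        }
    }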
Change-Id: I4a0f2811adc523852ebe774da0a6fb1f5419192f
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
The memory blocks t10dif computes over are generally 512-byte sectors,
so prefetching can effectively reduce cache misses. Use ldp instead
of ldr to reduce the number of instructions; pmull+pmull2 can reduce
register accesses. The perf test results show that performance is
improved by 5x ~ 14x after optimization.
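A C sketch of the prefetch idea under stated assumptions (the actual
change is in aarch64 assembly; the loop structure and the 64-byte
prefetch distance here are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    static void fold_sectors(const uint8_t *buf, size_t len)
    {
        size_t i;

        for (i = 0; i + 64 <= len; i += 64) {
            __builtin_prefetch(buf + i + 64); /* warm the next chunk */
            /* ... fold buf[i..i+63] into the CRC; the assembly pairs
             * loads with ldp and folds with pmull/pmull2 ... */
        }
    }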
Change-Id: Ibd3f08036b6a45443ffc15f808fd3b467294c283
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
While the external headers define the API, we could really use this
overview to get users started and point them to examples.
Change-Id: Iba419e61d0d7723e1029a3b6e7259facfeb39522
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
1. Revert "x86: Generate .note.gnu.property section for ELF output"
This reverts commit 8074e3fe1b, which was a hack to work around
old nasm versions that don't support
section .note.gnu.property note alloc noexec align=8
This hack doesn't work for downstream distributions, like:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=2040091
2. If Intel CET is enabled, require a nasm with note section support to add
section .note.gnu.property note alloc noexec align=N
to the assembly code.
Verified with
$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
on Tiger Lake.
Change-Id: I6d66fe6fd054420d7fde35b1508ca9f09defdeca
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
The goal of this patch is to make isa-l testsuite pass on s390 with
minimal changes to the library. The one and only reason isa-l does not
work on s390 at the moment is that s390 is big-endian, and isa-l
assumes little-endian in a lot of places.
There are two flavors of this: loading/storing integers from/to
memory, and overlapping structs. Loads/stores are already helpfully
wrapped by unaligned.h header, so replace the functions there with
endianness-aware variants. Solve struct member overlap by reversing
their order on big-endian.
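A minimal sketch of an endianness-aware unaligned load of the kind
described (the helper name is illustrative, not necessarily what
unaligned.h uses):

    #include <stdint.h>
    #include <string.h>

    static inline uint32_t load_le_u32(const void *p)
    {
        uint32_t v;

        memcpy(&v, p, sizeof(v)); /* unaligned-safe byte copy */
    #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        v = __builtin_bswap32(v); /* same result on s390 as on LE hosts */
    #endif
        return v;
    }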
Also, fix a couple of usages of uninitialized memory in the testsuite
(found with MemorySanitizer).
Fixes s390x part of #188.
Change-Id: Iaf14a113bd266900192cc8b44212f8a47a8c7753
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
This patch adds Arm (aarch64) SVE [1] variable-length vector assembly support
to the ISA-L erasure code library. "Arm designed the Scalable Vector Extension
(SVE) as a next-generation SIMD extension to AArch64. SVE allows flexible
vector length implementations with a range of possible values in CPU
implementations. The vector length can vary from a minimum of 128 bits up to
a maximum of 2048 bits, at 128-bit increments. The SVE design guarantees
that the same application can run on different implementations that support
SVE, without the need to recompile the code." [3]
Test method:
- This patch was tested on Fujitsu's A64FX [2], and it passed all erasure
code related test cases, including "make checks", "make test", and
"make perf".
- To ensure code testing coverage, parameters in the files (erasure_code/
erasure_code_test.c, erasure_code_update_test.c and gf_vect_mad_test.c)
are modified to cover all _vect versions of the _mad_sve() /
_dot_prod_sve() routines.
Performance improvements over NEON:
In general, SVE benchmarks (bandwidth in MB/s) are 40% ~ 100% higher than NEON
when running _cold style (data uncached and pulled from memory) perfs. This
includes routines of dot_prod, mad, and mul.
Optimization points:
This patch was tuned for the best performance on A64FX. Tuning points
touched in this patch include:
1) Data prefetch into L2 cache before loading. See _sve.S files.
2) Instruction sequence orchestration. Such as interleaving every two
'ld1b/st1b' instructions with other instructions. See _sve.S files.
3) To improve dest vector parallelism, at a high level, running
gf_4vect_dot_prod_sve() twice is better than running gf_8vect_dot_prod_sve()
once, and it's also better than running _7vect + _vect, _6vect + _2vect,
or _5vect + _3vect. A similar idea is applied to improve the 9 ~ 11 dest
vector dot product computations as well. The related change can be found
in ec_encode_data_sve() of file:
erasure_code/aarch64/ec_aarch64_highlevel_func.c (see the sketch below)
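A hypothetical sketch of the 8-dest split, assuming ISA-L's usual
multi-vector dot-product prototype and a table stride of 32 bytes per
src/dest coefficient (the real dispatch in ec_encode_data_sve() covers
more cases):

    void gf_4vect_dot_prod_sve(int len, int vlen, unsigned char *gftbls,
                               unsigned char **src, unsigned char **dest);

    static void encode_8_dests(int len, int k, unsigned char *g_tbls,
                               unsigned char **data, unsigned char **coding)
    {
        /* Two 4-wide passes expose more independent dest vectors than
         * one 8-wide pass, which measured faster on A64FX. */
        gf_4vect_dot_prod_sve(len, k, g_tbls, data, coding);
        gf_4vect_dot_prod_sve(len, k, g_tbls + 4 * k * 32, data, coding + 4);
    }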
Notes:
1) About vector length: A64FX has a vector register length of 512 bits.
However, this patchset was written in variable-length assembly, so it
works automatically on aarch64 machines with any SVE vector length,
such as SVE-128, SVE-256, etc.
2) About optimization: due to differences in microarchitecture and
cache/memory design, achieving optimum performance on SVE-capable CPUs
other than A64FX will likely require microarchitecture-level tuning
on those CPUs.
[1] Introduction to SVE - Arm Developer.
https://developer.arm.com/documentation/102476/latest/
[2] FUJITSU Processor A64FX.
https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
[3] Introducing SVE.
https://developer.arm.com/documentation/102476/0001/Introducing-SVE
Change-Id: If49eb8a956154d799dcda0ba4c9c6d979f5064a9
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
GitHub Actions checkout changed to pull only a single generated merge
commit instead of the actual PR commit id. This breaks the check_format
test for signoff. Pulling a history depth of 2 will include the actual
commit ID.
Change-Id: I7d83871159d24faaf2f8e6086f12173e14cbcf3c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The new mem_zero_detect function will fail on AVX-only machines.
Change-Id: I3bca49bff886f9c130c89e8c74b31110e9bac76b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Micro-optimizations: vpcmpeqb+vpmaskmov is faster than vptest according
to uops.info; make usually-untaken branches target forward. Reduce the
number of data-dependent branches and the code size.
Change-Id: Ie70b4bc99685368e5131f23344348bfaf7c27d3e
Signed-off-by: Nicola Torracca <shark@bitchx.it>
The variable D= can be used to quickly add defines. This sets a null
default so it can only be overridden from the make command line.
Fixes #184
Change-Id: I84615174547f36208d6d577c1e30b6fac83139b3
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Instead of using a constant default zlib header, create the header on the
fly. Both zlib header bytes depend on the wbits and compression level used.
Make sure that ISA-L compression level 0 is advertised as the fastest
compression in both the gzip header (setting the xfl flag to 0x04) and the
zlib header (as 0, fastest; other levels are 1, fast).
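A minimal sketch of the on-the-fly computation per RFC 1950, not the
literal ISA-L code (level_flag is the 2-bit FLEVEL field: 0 = fastest,
1 = fast, 2 = default, 3 = maximum):

    #include <stdint.h>

    static void make_zlib_header(int wbits, int level_flag, uint8_t hdr[2])
    {
        uint8_t cmf = (uint8_t)(((wbits - 8) << 4) | 8); /* CINFO | CM=8 */
        uint8_t flg = (uint8_t)(level_flag << 6);        /* FLEVEL, FDICT=0 */

        flg += 31 - (cmf * 256u + flg) % 31; /* FCHECK: value % 31 == 0 */
        hdr[0] = cmf;
        hdr[1] = flg;
    }

For example, wbits = 15 with level_flag = 0 yields the familiar
0x78 0x01 pair.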
Change-Id: I1f30e4397a0f5fcf6df593c40178e7d6f6c05328
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
The file types.h has long been misnamed and overlaps with
functionality in the test helper routines.
Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Timing functions are made os-independent with test.h include.
Change-Id: Iab7d6325254d5c32263504efc756dbbe51d77153
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The Windows def file was missing an exported ec support function.
Also added a path in the nmake file to build extra examples.
Change-Id: I59ac1599dcb8cdb45077347c74b57aeca4751c35
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The osx brew and older linux targets are failing the update.
This removes the older linux builds and changes the osx builds to
take the latest brew that comes with the image instead of
doing a brew update on every build.
Change-Id: Ib1543296a733875c9eff798326b0d45854153923
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Both gcc and clang are showing a warning on this despite the buffer
always being set before use.
Change-Id: I0e8f6b9e3451efe69e49814abc883d49b04f2666
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The raid functions xor_gen and pq_gen and their check functions
must have at least two sources. Fixes #175
Change-Id: I2e4509e037c2b1dc88f3f7449d80f4c763e1e124
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Make changed the interpretation of an escaped # in a quote, causing
warnings in the test for pthreads.
Change-Id: Ice94116713aea3c3e9725b38232e03f53d6633cc
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Here is the bug report on Ceph: https://tracker.ceph.com/issues/48681
Change-Id: Ie1c60a71f28c1a169c8899a621be9bb455f5e244
Signed-off-by: luo rixin <luorixin@huawei.com>
Author of this patch is Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
Re-organized by Jerry Yu <jerry.h.yu@arm.com>
The Clang version must be later than 9.x, according to https://reviews.llvm.org/D61719
Change-Id: I7516cca17ef4556b828fb6ecfa755e6451052359
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Clang has deprecated the option -fsanitize-coverage=trace-pc-guard
for use with fuzzing.
Change-Id: I7fe5da0f57ab44110208d098858b786450a0a5e7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Clang with the sanitizer on was catching a cast of the static header.
Switch to the uload64 macro for a better general solution.
Change-Id: I495d440407bb1773841e2f7cdc48bd95fc1a2df4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
In the newly added function isal_deflate_process_dict(), a null check
was added on the dictionary struct but was ineffectual because of its
ordering (see the sketch below).
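An illustrative pattern only, with hypothetical names rather than the
real ISA-L code: the check is only effective when it runs before the
first use of the pointer.

    /* hash_dictionary() stands in for whatever first touches the dict */
    int hash_dictionary(const unsigned char *dict, int dict_len);

    static int process_dict(const unsigned char *dict, int dict_len)
    {
        if (dict == NULL || dict_len <= 0) /* check first ... */
            return -1;
        return hash_dictionary(dict, dict_len); /* ... then use */
    }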
Change-Id: I3b3e70997210794de102b1348e1467295871cee2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>