isa-l

mirror of https://github.com/intel/isa-l.git synced 2024-12-12 09:23:50 +01:00

Author	SHA1	Message	Date
Greg Tucker	d7927673ba	igzip: Inflate detect pre-gen header and use pre-expanded Performance improvement for inflate to skip the time-consuming process of decode table expansion when the header matches a known common dynamic one such as produced by level 0 compression. Change-Id: Ia2550b812a062b7cc2eb1b72bcb609f1a631e40b Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-10-21 18:09:49 -07:00
Greg Tucker	cc9ed53972	build: Fix nmake check for multiple arch Change-Id: I36c3616163f6fec61dda9cf8b35ca561e59477c9 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-27 11:16:30 -07:00
Greg Tucker	794b8b60c1	build: Add test to check for nmake consistency Change-Id: I1180ba749d54e7ef433b01b33450e52ac5dbb2bb Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-26 11:41:03 -07:00
Greg Tucker	24623b8b82	crc: Fix missing object omitted from nmake file Previous new crc version missed the update for nmake. Change-Id: Ie529ee9d70d8d0ab8a8af3bd2720405802180d1e Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-26 09:49:23 -07:00
Greg Tucker	ec73d39086	crc: Add new vclmul version of crc32_iscsi Change-Id: I1c509c6ea312b6eb4e1c2c1c8bb7044f7b043e0d Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-21 17:15:58 -07:00
Greg Tucker	ae45f60e78	igzip: Add cli feature to inflate concatenated gz files Change-Id: I2beade6682e78fda30a18228a8660201ae7bf718 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-13 15:21:10 -07:00
Greg Tucker	93049d0d1f	igzip: Fix read header for correct null checking and init Issue with reading header only appears when combined with new feature in cli of multiple concatenated gzip files. Change-Id: Id8df9150c6f27d8b22e810b511291f3fcf136723 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-13 15:21:10 -07:00
Ruben Vorderman	2049d8dc81	Add conda shield to readme This will make it easier for users to get the latest version. Installing with conda is easier than compiling it yourself. Distro packages (such as Debian's) do not always ship the latest version while conda-forge can. This badge will advertise this install method. Change-Id: I99a1853a00e55fdf0c574c9906675738ac278121 Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>	2020-07-27 11:36:55 -07:00
Jerry Yu	1c71f9c0ae	crc32: tweak performance of crc32/crc32c Tweak performance with prefetch instructions. Below are the test results: - Neoverse N1: ~30% - Cortex-A72: ~3% - Cortex-A57: ~90% - Others: 50% - 5x Change-Id: I3ab292a953043dbaea98af3c66778f57da3a1331 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-07-09 17:37:00 +08:00
Jerry Yu	14e0081bef	build: fix build break on non-x86 platform Arm64 and ppc64 builds report the error below: "configure: error: conditional "INTEL_CET_ENABLED" was never defined." The error should be reported on all non-x86 platforms. Change-Id: I4c1b2fc99091424cfd5c62cf4d6536222b66712d Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-06-03 03:25:03 +00:00
H.J. Lu	8074e3fe1b	x86: Generate .note.gnu.property section for ELF output We should generate .note.gnu.property section with x86 assembly codes for ELF outputs to mark Intel CET support when Intel CET is enabled since all input files must be marked with Intel CET support in order for linker to mark output with Intel CET support. Since nasm and yasm can't generate the proper .note.gnu.property section, yasm-cet-filter.sh and yasm-filter.sh are added to generate the proper .note.gnu.property with linker help. Verified with $ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux $ make -j8 on Linux/x86-64. Change-Id: I14e03a8a9031c8397dc36939a528cf5a827d775a Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2020-05-26 17:12:01 -07:00
H.J. Lu	cd888f01a4	x86: Add ENDBR32/ENDBR64 at function entries for Intel CET To support Intel CET, all indirect branch targets must start with ENDBR32/ENDBR64. Here is a patch to define endbranch and add it to function entries in x86 assembly codes which are indirect branch targets as discovered by running testsuite on Intel CET machine and visual inspection. Verified with $ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux $ make -j8 $ make -j8 check with both nasm and yasm on both CET and non-CET machines. Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337 Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2020-05-26 09:16:49 -07:00
Zhiyuan Zhu	031450f697	crc32: Implement default mix mode optimization Change-Id: Ib3bf04215cca491db522ec33905fe48df173cc2f Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2020-05-09 08:10:34 +00:00
Jerry Yu	6c4d3dbf6c	crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first To reduce cache miss events, the mix layout is changed to PMULL+CRC. It also relaxes the final delay caused by data dependency. As a result, cold perf improved by about 20% and warm perf improved by about 4%. Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-16 20:38:41 +08:00
Jerry Yu	92fc8733fa	crc32: Fix prototype mismatch bug Change-Id: I7c8a2348441f32a43ff386122612405e418d9947 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-10 00:46:41 +00:00
Jerry Yu	9bcd6768fd	crc32:Adjust hardware folding algorithm flags The hardware folding algorithm depends on the CRC32 and PMULL instructions, so it should match both flags. Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-08 13:50:15 +08:00
Jerry Yu	0033f42189	crc32:Optimize crc32/c for cortex-a72 Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-08 13:49:38 +08:00
Greg Tucker	5e586843eb	build: Change ms nmake default to nasm and add pdb gen The nmake default is changed for a modern nasm. Older nasm and yasm versions will still work with windows but the nmake options must be changed appropriately for max AS_FEATURE_LEVEL to match. Also now generates debug symbol pdb files. Change-Id: I94a2dd7ecf541c6564ccbd4a184c33995d7b31ad Signed-off-by: Poornima Kumar <poornima.kumar@intel.com> Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-31 22:55:27 +00:00
Jerry Yu	a2fc2c000d	crc32:Add optimization implementation for Neoverse N1 This patch is based on the reference (1) algorithm with some changes. - Redefine the block number to two, because only two pipelines can be used in the CRC32 calculation. - Redefine the block size: the block size of CRC is 1536B and PMULL is 512B. - Interleave CRC and PMULL instructions. The optimization parameters are calculated based on reference (2). References: - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf - https://developer.arm.com/docs/swog309707/a Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-30 09:20:29 -07:00
Jerry Yu	f2cf2609cd	multi-binary:Add microarchitecture id reader This patch provides microarchitecture information and makes microarchitecture optimization possible. It traps into the kernel because of the mrs instruction, so it should be called only in the dispatcher, which is called only once in the program lifecycle. HWCAP must also match, which ensures there are no illegal instruction errors. Change-Id: I393ec742010bf3f10ce335482c0350aa4202c788 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-30 09:20:29 -07:00
Jerry Yu	85f947e120	ci: remove unused drone configuration Change-Id: I20bded8111deb122757dbf259d17cd80010c2bb6 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-27 16:16:00 -07:00
Greg Tucker	af13ed6136	ec: Fix second windows reg push for avx512 Change improper stack push in windows prolog. Error was not reachable without windows nasm support and so went undetected. Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-20 12:36:58 -07:00
Greg Tucker	ede04f0a1f	build: Fix for windows to allow nasm use Previously windows build could only use yasm because some procedural items such as proc_start were not supported by nasm. This adds a few macros and fixes so nasm can be used to build on windows. Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	5ab40c79cc	ec: Fix windows reg push for avx512 Push of registers overlapped xmm push. Error was not reachable without windows nasm support and so went undetected. Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	472e7011e8	ec: Change use of windows macro save_xmm128 to vec For builds under windows this could emit a non-vec mov that's not optional for AVX versions. Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:04:54 -07:00
Greg Tucker	7c0ab1d459	build: Add auto regenerate of nmake file Change-Id: Icaa64aa35697c87779df18c3941d3df0f3256546 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-10 14:00:05 -07:00
Greg Tucker	794413ddd2	ec: Remove arch-specific redundant gf_nvect tests The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect functions were not strictly called by the higher level ec_encode_data() and needed independent testing. As this has now changed the extra tests can be removed as redundant. Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-06 13:45:59 -07:00
Greg Tucker	806b55ee57	build: Bump revision to 2.29 Change-Id: I78cfa77864f3fd77c3b63199bc18fd1782fe3dc2 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-26 18:29:49 -07:00
Greg Tucker	2db2cd557c	Update release notes for v2.29 additions Change-Id: Id9ba5da760ee60dbb1de47162e6276f522bc0850 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-26 12:04:18 -07:00
Greg Tucker	6136a04bbe	crc: Add new vclmul version of crc16_t10dif Change-Id: Ic068f35d5d8c34b74128b7a2ea8e82f5fa693c28 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 19:54:19 -07:00
Greg Tucker	5ef6eb5c68	crc: Add new vclmul version of crc32_ieee Change-Id: Ib761e3240d8252ce84e9abeadb568dce60742717 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	25a673d75a	crc: Add new vclmul version of gzip_refl Change-Id: I8050853dcd177f4fb506f32f5fa723f7a1d3cded Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	4217930338	crc: Add vec version of crc16_t10dif_copy Change-Id: I5f73e8a38efd1ff50d30a39689d9d85da702e809 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	02a41e0653	crc: Add vec version of crc32_ieee when avx avail Change-Id: I5542ee93156c26f5a23feb89b82f4c51f282777d Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	d4131bb3d3	crc: Add vec version of crc32_gzip_refl when avx avail Change-Id: I4a069c318c809dcd21a6ebc47d3e0d1c131599ea Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	ad22a90686	crc: Add vec version of crc16 when avx available Vec versions mix much better with other avx code. Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Hong Bo Peng	180c74aefd	enable VSX SIMD in ISA-L for ppc64le 1) Implement the ErasureCode function in Altivec Intrinsics 2) Coding style update Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606 Thanks-to: Daniel Axtens <dja@axtens.net> Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>	2020-02-20 09:40:43 -07:00
Zhang Jinde	a3d5cd8642	igzip: Fix clang error on dep generation Clang errors when generating dependencies due to a stray semicolon following a function definition. Change-Id: Iefb4aca988b643bb62a69bbbaf197aca20a2d085 Signed-off-by: Zhang Jinde <zjd5536@163.com>	2020-01-17 10:25:32 -07:00
Zhang Jinde	163b6cd934	igzip: Fix for deflate logic buffer management Fixes invalid logic that attempted to eliminate unnecessary copy of input to the history buffer in cases where it is not required. Correction should improve performance and not change functionality. Change-Id: Ife24dcc9d920ce220b1a394031e971321737a171 Signed-off-by: Zhang Jinde <zjd5536@163.com>	2020-01-08 09:46:16 -07:00
Jerry Yu	fc69e8fc79	igzip: fix deflate hash bug If next_in equals end_in, the function should return. Change-Id: I59e631bb1f24835fd43f943a3736e016c4e2d0ac Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-12-31 13:15:35 -07:00
Jerry Yu	e2b07bbd44	build: fix debug build problem Remove strip command when lib_debug=1 Change-Id: I1203fcbfefb3b87080e9ba12ccbfb8018a008147 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-12-31 13:15:05 -07:00
Jerry Yu	936d05fc4f	igzip:Add decode huffman code for aarch64 Change-Id: If26cc4fd97b078b5f3b02e5f6f121a12ec73f671 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-12-19 16:10:04 +08:00
Greg Tucker	ad49e580dc	doc: Fix missing description of gf_matrix_inverse Doc missed issue of input matrix destruction. Fixes #116 Change-Id: Ic840b27532d90518dd21ec2701c278a1c3b61a8b Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-12-13 16:24:05 -07:00
Zhiyuan Zhu	2b8cc393af	igzip: implement gen_icf_map with assembly Change-Id: I74e6200a732acfaac44b7f5a82bd4a2215ba1535 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-12-13 07:54:12 +00:00
Zhiyuan Zhu	f430953f0a	igzip: cleanup perf test related code This patch addresses some cppcheck issues. And some minor changes to maintain code consistency. - Cleanup cppcheck issues. [log][igzip/igzip_perf.c] (error) Shifting signed 32-bit value by 31 bits is undefined behaviour [log][igzip/igzip_hist_perf.c:132]: (error) Memory leak: outbuf - Some minor changes to maintain code consistency. igzip/igzip_build_hash_table_perf.c igzip/igzip_hist_perf.c igzip/igzip_semi_dyn_file_perf.c - delete unused variable outbuf and outbuf_size from igzip/igzip_hist_perf.c Change-Id: Icbbd8f70de689931c8a844d89e457af8d97c6793 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-12-06 15:33:20 +08:00
Zhiyuan Zhu	683364c47b	igzip: implement encode_deflate_icf with assembly Change-Id: I90b12da2d2a96bfdb47d29ab329648247a756585 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-11-29 14:45:45 -07:00
John Kariuki	5eeb33f69c	ec: add AVX512 ec functions with 5 and 6 outputs Added AVX512 optimized functions to calculate the GF(2^8) vector dot product with 5 and 6 outputs at a time. Also added GF(2^8) vector multiply AVX512 optimized functions with 5 and 6 accumulate. Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8 Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>	2019-11-19 10:12:14 -07:00
Samuel Lee	4785428d2f	crc: arm64 implementation tweaks + Utilise `pmull2` instruction in main loops of arm64 crc functions and avoid the need for `dup` to align multiplicands. + Use just 1 ASIMD register to hold both 64b p4 constants, appropriately aligned. + Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls on existing LITTLE uarch (which can only issue these instructions every other cycle). + Similarly interleave scalar instructions with ASIMD instructions to increase likelihood of instruction level parallelism on a variety of uarch. + Cut down on needless instructions in non-critical sections to help performance for small buffers. + Extract common instruction sequences into inner macros and moved them into shared header - crc_common_pmull.h + Use the same human readable register aliases and register allocation in all 4 implementations, never refer to registers without using human readable alias. + Use #defines rather than .req to allow use of same names across several implementations + Reduce tail case size from 1024B to 64B + Phrased the `eor` instructions in the main loop to more clearly show that we can rewrite pairs of `eor` instructions with a single `eor3` instruction in the presence of Armv8.2-SHA (should probably be an option in multibinary in future). Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1 Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>	2019-11-13 10:58:19 -07:00
Greg Tucker	0a8d05a81e	doc: Move arch-dependent build instructions to readme Removed the redundant parts that apply to all arch. Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-11-01 15:55:44 -07:00
Hang Li	02a86dfb3f	erasure_code: modify eor way in aarch64 neon codes Change-Id: I9fb9219c5f280ed88194ec63234af046a5a036ae Signed-off-by: Hang Li <lihang48@hisilicon.com>	2019-11-01 15:31:33 -07:00

677 Commits