isa-l

mirror of https://github.com/intel/isa-l.git synced 2024-12-12 17:33:50 +01:00

Author	SHA1	Message	Date
Marcel Cornu	671e67b62d	crc: reformat using new code style Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>	2024-04-22 11:35:03 +02:00
Taiju Yamada	f1b144bbab	Fix mach compilation again; fold_constant has to be the same section as crc16_t10dif_copy_pmull Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>	2024-03-07 10:10:51 +00:00
Colin Ian King	1500db751d	Fix a handful of spelling mistakes and typos There are quite a few spelling mistakes and typos in comments and user facing message literal strings as found using codespell. Fix these. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2024-02-06 15:03:14 +00:00
Pablo de Lara	9ee34ec0f5	crc: use macro to print 64-bit value Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-12-22 09:35:37 +00:00
Pablo de Lara	d65d2b5572	crc: [test] fix memory leak Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-12-18 14:25:22 +00:00
Pablo de Lara	6188bf7b2f	crc: fix build warnings on Windows Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-12-07 13:23:09 +00:00
Pablo de Lara	2ca781df19	lib: reduce verbosity by default in tests Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-12-01 14:33:29 +00:00
Pablo de Lara	acbe0deecf	crc: fix build with NASM 2.14 Fix following compilation error crc/crc32_iscsi_by16_10.s:408: error: invalid combination of opcode and operands Fixes #257. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-11-15 13:42:00 +00:00
liuqinfei	4815174a68	crc: optimize by supporting arm xor fusion feature Arrange the two xor instructions according to the specified pattern so that they can be fused, which saves one issue slot and one instruction's execution latency. Change-Id: Ic64bcfe569b2468e4dc9c13d073d367cc81fd937 Signed-off-by: liuqinfei <lucas.liuqinfei@huawei.com>	2023-08-18 07:53:59 +00:00
Pablo de Lara	f534a5c6a9	crc: fold 64 bytes of data if possible When less than 256 bytes of data are left, fold data in steps of 64 bytes, instead of 16 bytes, if there is enough data. Change-Id: I47d7cacdd1ba620078df528136945695c338db6d Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-08-17 11:54:24 +01:00
Pablo de Lara	beab678fb8	crc: optimize last bytes Change-Id: I4b8f73b23eb50c4c50ca65fab19716f217fe5780 Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-08-17 11:54:20 +01:00
Pablo de Lara	2bbce31943	crc: add CRC64 rocksoft implementation - Added reference implementation - Added base implementation - Added functional and performance tests Change-Id: I60c5097bd5fb89ee7a50910e71d449d50d155d0a Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2023-05-08 12:37:44 +00:00
Pablo de Lara	16056ff4e4	crc: refactor SSE CRC64 implementations to use common code Change-Id: I2d141f2ccd12ab338783e50736e36ed4aeb11f7f Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-05-08 12:37:44 +00:00
Pablo de Lara	22d33cf795	crc: use k-mask to load final bytes of data Change-Id: Ibd8d2144bc6942e11911e25a6365c1cb108af477 Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2023-05-08 12:37:23 +00:00
Greg Tucker	c2bec3ea65	crc: Use ternlog in by16 avx512 loop Ternlog has an additional benefit in the by16 crc main loop for both reflected and non-reflected polynomial crcs. Some architectures see a 4-7% improvement. Revisited on suggestion by Nicola Torracca. Change-Id: I806266a7080168cf33409634983e254a291a0795 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2022-11-02 12:16:20 -07:00
Taiju Yamada	1187583a97	Fixes for aarch64 mac - It should be fine to enable pmull always on Apple Silicon - macOS 12+ is required for PMULL instruction. - Changed the conditional macro to __APPLE__ - Rewritten dispatcher using sysctlbyname - Use __USER_LABEL_PREFIX__ - Use __TEXT,__const as readonly section - use ASM_DEF_RODATA macro - fix func decl Change-Id: I800593f21085d8187b480c8bb3ab2bd70c4a6974 Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>	2022-10-28 08:27:26 -07:00
Greg Tucker	9c7e3b9f22	test: Change perf tests to warm by default The cold versions of tests depended on a fixed size of last level cache that is too low on some arch and too high for the total available memory on others. Change-Id: Iee98403f9ace02e01b810c296a5fe44b933bfb17 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2022-08-03 16:35:55 -07:00
Greg Tucker	9f75defd57	Remove all slver legacy segments The relic slver is no longer used for individual versioning on functions and is confusing tools looking for data in text sections. This removes all instances instead of fixing them, since its usefulness is waning. Fixes #221 Change-Id: Ife0b9f105950a90337c58e8a41ac2cffc0f67d99 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2022-07-14 19:23:52 -07:00
Chunsong Feng	e297ecae7a	crc16: Accelerate T10DIF performance with prefetch and pmull2 The block of memory over which t10dif is calculated is generally a 512-byte sector. Prefetching can effectively reduce cache misses. Use ldp instead of ldr to reduce the number of instructions; pmull+pmull2 can reduce register access. The perf test result shows that the performance is improved by 5x ~ 14x after optimization. Change-Id: Ibd3f08036b6a45443ffc15f808fd3b467294c283 Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>	2022-03-31 09:58:04 -07:00
Greg Tucker	112dd72c01	build: Remove unneeded file types.h The file types.h has long been misnamed and overlaps with functionality in the test helper routines. Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2021-06-10 09:35:43 -07:00
Greg Tucker	cfdd3497d1	perf: Remove unneeded time include Timing functions are made os-independent with test.h include. Change-Id: Iab7d6325254d5c32263504efc756dbbe51d77153 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2021-06-09 18:33:57 -07:00
Greg Tucker	d7bac36be4	crc: Fix warning in perf test from uninitialized tmp ptr Both gcc and clang are showing a warning on this despite the buffer always being set before use. Change-Id: I0e8f6b9e3451efe69e49814abc883d49b04f2666 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2021-05-20 11:57:56 -07:00
Greg Tucker	ec73d39086	crc: Add new vclmul version of crc32_iscsi Change-Id: I1c509c6ea312b6eb4e1c2c1c8bb7044f7b043e0d Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-08-21 17:15:58 -07:00
Jerry Yu	1c71f9c0ae	crc32: tweak performance of crc32/crc32c Tweak performance with prefetch instructions. Below are the test results: - Neoverse N1: ~30% - Cortex-A72: ~3% - Cortex-A57: ~90% - Others: 50% - 5x Change-Id: I3ab292a953043dbaea98af3c66778f57da3a1331 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-07-09 17:37:00 +08:00
H.J. Lu	cd888f01a4	x86: Add ENDBR32/ENDBR64 at function entries for Intel CET To support Intel CET, all indirect branch targets must start with ENDBR32/ENDBR64. Here is a patch to define endbranch and add it to function entries in x86 assembly codes which are indirect branch targets as discovered by running testsuite on Intel CET machine and visual inspection. Verified with $ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux $ make -j8 $ make -j8 check with both nasm and yasm on both CET and non-CET machines. Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337 Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2020-05-26 09:16:49 -07:00
Zhiyuan Zhu	031450f697	crc32: Implement default mix mode optimization Change-Id: Ib3bf04215cca491db522ec33905fe48df173cc2f Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2020-05-09 08:10:34 +00:00
Jerry Yu	6c4d3dbf6c	crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first To reduce the cache missing events, the mix layout is changed to PMULL+CRC. It also relaxes the final delay caused by data dependency. As results, the cold perf was improved about 20% and warm perf was improved about 4%. Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-16 20:38:41 +08:00
Jerry Yu	92fc8733fa	crc32: Fix prototype mismatch bug Change-Id: I7c8a2348441f32a43ff386122612405e418d9947 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-10 00:46:41 +00:00
Jerry Yu	9bcd6768fd	crc32:Adjust hardware folding algorithm flags The hardware folding algorithm depends on the CRC32 and PMULL instructions, so it should match both flags. Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-08 13:50:15 +08:00
Jerry Yu	0033f42189	crc32:Optimize crc32/c for cortex-a72 Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-08 13:49:38 +08:00
Jerry Yu	a2fc2c000d	crc32:Add optimization implementation for Neoverse N1 This patch is based on the algorithm in reference (1), with some changes. - Redefine the number of blocks to two. - This is because only two pipelines can be used for CRC32 calculation. - Redefine the block size: - The block size is 1536B for CRC and 512B for PMULL - Interleave CRC and PMULL instructions. The optimization parameters are calculated based on reference (2). References: - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf - https://developer.arm.com/docs/swog309707/a Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-30 09:20:29 -07:00
Greg Tucker	ede04f0a1f	build: Fix for windows to allow nasm use Previously windows build could only use yasm because some procedural items such as proc_start were not supported by nasm. This adds a few macros and fixes so nasm can be used to build on windows. Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	6136a04bbe	crc: Add new vclmul version of crc16_t10dif Change-Id: Ic068f35d5d8c34b74128b7a2ea8e82f5fa693c28 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 19:54:19 -07:00
Greg Tucker	5ef6eb5c68	crc: Add new vclmul version of crc32_ieee Change-Id: Ib761e3240d8252ce84e9abeadb568dce60742717 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	25a673d75a	crc: Add new vclmul version of gzip_refl Change-Id: I8050853dcd177f4fb506f32f5fa723f7a1d3cded Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	4217930338	crc: Add vec version of crc16_t10dif_copy Change-Id: I5f73e8a38efd1ff50d30a39689d9d85da702e809 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	02a41e0653	crc: Add vec version of crc32_ieee when avx avail Change-Id: I5542ee93156c26f5a23feb89b82f4c51f282777d Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	d4131bb3d3	crc: Add vec version of crc32_gzip_refl when avx avail Change-Id: I4a069c318c809dcd21a6ebc47d3e0d1c131599ea Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	ad22a90686	crc: Add vec version of crc16 when avx available Vec versions mix much better with other avx code. Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Hong Bo Peng	180c74aefd	enable VSX SIMD in ISA-L for ppc64le 1) Implement the ErasureCode function in Altivec Intrinsics 2) Coding style update Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606 Thanks-to: Daniel Axtens <dja@axtens.net> Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>	2020-02-20 09:40:43 -07:00
Samuel Lee	4785428d2f	crc: arm64 implementation tweaks + Utilise `pmull2` instruction in main loops of arm64 crc functions and avoid the need for `dup` to align multiplicands. + Use just 1 ASIMD register to hold both 64b p4 constants, appropriately aligned. + Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls on existing LITTLE uarch (which can only issue these instructions every other cycle). + Similarly interleave scalar instructions with ASIMD instructions to increase likelihood of instruction level parallelism on a variety of uarch. + Cut down on needless instructions in non-critical sections to help performance for small buffers. + Extract common instruction sequences into inner macros and moved them into shared header - crc_common_pmull.h + Use the same human readable register aliases and register allocation in all 4 implementations, never refer to registers without using human readable alias. + Use #defines rather than .req to allow use of same names across several implementations + Reduce tail case size from 1024B to 64B + Phrased the `eor` instructions in the main loop to more clearly show that we can rewrite pairs of `eor` instructions with a single `eor3` instruction in the presence of Armv8.2-SHA (should probably be an option in multibinary in future). Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1 Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>	2019-11-13 10:58:19 -07:00
Greg Tucker	533ba53f11	crc: Fix symbol conflict with older assemblers Change-Id: I6f1322a5fecdf21b2c774454cd51cb56767f30b8 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-10-28 14:39:44 -07:00
Zhiyuan Zhu	f3993f5c0b	crc: Fix dynamic relocation link failure on Arm This issue occurs when dynamic compilation is used and gcc's -fsanitize memory detection option is turned on. [Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata' Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-10-11 15:37:29 -07:00
Greg Tucker	600b6d8d99	crc: Add new ecma_norm Change-Id: I7747bfdca24bcd604c3eb118e7f1bcd98b2b6211 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-16 17:01:25 -07:00
Greg Tucker	121bc635c9	crc: Add new jones_norm Change-Id: I66118baeec2a1d63423c74edc3aa20a3e8955c6e Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-16 17:01:25 -07:00
Greg Tucker	ed528bb2ad	crc: Add new iso_norm Change-Id: If0b05d1a1029b02842935c5c43966d81c59fbbca Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-16 17:01:25 -07:00
Greg Tucker	ea4cbf0ffa	crc: Add new ecma_refl Change-Id: Ifef4f8c6ce7da328b0cc03040b17e7443febf44d Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-16 17:01:25 -07:00
Greg Tucker	42bbc5a37e	crc: Add new jones_refl Change-Id: Ia4837b9125bce4e38ef6bae0a8c852d02e9b0bf2 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-16 17:01:25 -07:00
Greg Tucker	5c546ecddf	crc: Add new arch CRC Change-Id: I31d3a7e61eeed9d13a0cadd6d1ed25b0dbb39415 Signed-off-by: Chunyang Hui <chunyang.hui@intel.com> Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-16 17:01:25 -07:00
Greg Tucker	7a28c83879	test: Increase size of crc tests and simplify output Change-Id: Ia0418b7889e591a0164c335e273caff263cdf640 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-09-14 16:01:28 -07:00


75 Commits
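Several of the commits above (the vclmul and pmull2 versions of crc16_t10dif, and the CRC64 Rocksoft change that adds a "reference implementation") concern fast folded CRC kernels alongside slow reference code. As a rough illustration of what such a reference computes, here is a minimal bit-at-a-time sketch of CRC-16/T10-DIF, assuming the standard parameters (polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR). The function name `crc16_t10dif_bitwise` and the macro `T10DIF_POLY` are hypothetical; this is not the ISA-L API or its actual reference code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1,
 * written MSB-first with the implicit x^16 term dropped.
 * (Hypothetical name; assumed standard T10-DIF polynomial.) */
#define T10DIF_POLY 0x8BB7u

/* Hypothetical helper: bit-at-a-time, non-reflected CRC-16/T10-DIF.
 * 'crc' is the running value: pass 0 to start, or the result of a
 * previous call to continue over the next chunk of data. */
static uint16_t crc16_t10dif_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Non-reflected CRC: each byte enters at the top of the register. */
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            /* If the top bit is set, shift it out and subtract (XOR) the polynomial. */
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With these parameters, the standard check string "123456789" yields 0xD0DB, and because there is no final XOR, feeding the data in two chunks (passing the first result as the running value) gives the same answer. Accelerated kernels such as the PMULL or VPCLMULQDQ versions avoid the per-bit loop by folding many bytes at a time with carry-less multiplication, but they should agree with a simple loop like this one on every input.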