isa-l

mirror of https://github.com/intel/isa-l.git synced 2024-12-12 09:23:50 +01:00

Author	SHA1	Message	Date
Jerry Yu	6c4d3dbf6c	crc32:NeoverseN1: Change CRC32/PMULL order to PMULL first To reduce cache-miss events, the mix layout is changed to PMULL+CRC. This also relaxes the final delay caused by the data dependency. As a result, cold performance improved by about 20% and warm performance by about 4%. Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-16 20:38:41 +08:00
Jerry Yu	92fc8733fa	crc32: Fix prototype mismatch bug Change-Id: I7c8a2348441f32a43ff386122612405e418d9947 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-10 00:46:41 +00:00
Jerry Yu	9bcd6768fd	crc32:Adjust hardware folding algorithm flags The hardware folding algorithm depends on the CRC32 and PMULL instructions, so it should match both flags. Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-08 13:50:15 +08:00
Jerry Yu	0033f42189	crc32:Optimize crc32/c for cortex-a72 Change-Id: Ib1658fd4b87b31d8ea6c93f697b50d9b409c186e Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-04-08 13:49:38 +08:00
Greg Tucker	5e586843eb	build: Change ms nmake default to nasm and add pdb gen The nmake default is changed for a modern nasm. Older nasm and yasm versions will still work with windows but the nmake options must be changed appropriately for max AS_FEATURE_LEVEL to match. Also now generates debug symbol pdb files. Change-Id: I94a2dd7ecf541c6564ccbd4a184c33995d7b31ad Signed-off-by: Poornima Kumar <poornima.kumar@intel.com> Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-31 22:55:27 +00:00
Jerry Yu	a2fc2c000d	crc32:Add optimization implementation for Neoverse N1 This patch is based on the algorithm in reference (1), with some changes. - Redefine the block number to two. - That's because only two pipelines can be used for the CRC32 calculation. - Redefine the block size: - The block size of CRC is 1536B and PMULL is 512B - Interleave CRC and PMULL instructions. The optimization parameters are calculated based on reference (2). References: - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf - https://developer.arm.com/docs/swog309707/a Change-Id: I1c9e593d59b521f56e4b3c807b396c083c181636 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-30 09:20:29 -07:00
Jerry Yu	f2cf2609cd	multi-binary:Add microarchitecture id reader This patch provides microarchitecture information and makes microarchitecture-specific optimization possible. It will trap into the kernel due to the mrs instruction, so it should be called only in the dispatcher, which is called only once in the program lifecycle. HWCAP must also match, which makes sure there are no illegal instruction errors. Change-Id: I393ec742010bf3f10ce335482c0350aa4202c788 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-30 09:20:29 -07:00
Jerry Yu	85f947e120	ci: remove unused drone configuration Change-Id: I20bded8111deb122757dbf259d17cd80010c2bb6 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2020-03-27 16:16:00 -07:00
Greg Tucker	af13ed6136	ec: Fix second windows reg push for avx512 Change improper stack push in windows prolog. Error was not reachable without windows nasm support and so went undetected. Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-20 12:36:58 -07:00
Greg Tucker	ede04f0a1f	build: Fix for windows to allow nasm use Previously windows build could only use yasm because some procedural items such as proc_start were not supported by nasm. This adds a few macros and fixes so nasm can be used to build on windows. Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	5ab40c79cc	ec: Fix windows reg push for avx512 Push of registers overlapped xmm push. Error was not reachable without windows nasm support and so went undetected. Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:05:46 -07:00
Greg Tucker	472e7011e8	ec: Change use of windows macro save_xmm128 to vec For builds under windows this could emit a non-vec mov that's not optional for AVX versions. Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-17 18:04:54 -07:00
Greg Tucker	7c0ab1d459	build: Add auto regenerate of nmake file Change-Id: Icaa64aa35697c87779df18c3941d3df0f3256546 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-10 14:00:05 -07:00
Greg Tucker	794413ddd2	ec: Remove arch-specific redundant gf_nvect tests The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect functions were not strictly called by the higher level ec_encode_data() and needed independent testing. As this has now changed the extra tests can be removed as redundant. Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-03-06 13:45:59 -07:00
Greg Tucker	806b55ee57	build: Bump revision to 2.29 Change-Id: I78cfa77864f3fd77c3b63199bc18fd1782fe3dc2 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-26 18:29:49 -07:00
Greg Tucker	2db2cd557c	Update release notes for v2.29 additions Change-Id: Id9ba5da760ee60dbb1de47162e6276f522bc0850 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-26 12:04:18 -07:00
Greg Tucker	6136a04bbe	crc: Add new vclmul version of crc16_t10dif Change-Id: Ic068f35d5d8c34b74128b7a2ea8e82f5fa693c28 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 19:54:19 -07:00
Greg Tucker	5ef6eb5c68	crc: Add new vclmul version of crc32_ieee Change-Id: Ib761e3240d8252ce84e9abeadb568dce60742717 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	25a673d75a	crc: Add new vclmul version of gzip_refl Change-Id: I8050853dcd177f4fb506f32f5fa723f7a1d3cded Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	4217930338	crc: Add vec version of crc16_t10dif_copy Change-Id: I5f73e8a38efd1ff50d30a39689d9d85da702e809 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	02a41e0653	crc: Add vec version of crc32_ieee when avx avail Change-Id: I5542ee93156c26f5a23feb89b82f4c51f282777d Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	d4131bb3d3	crc: Add vec version of crc32_gzip_refl when avx avail Change-Id: I4a069c318c809dcd21a6ebc47d3e0d1c131599ea Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Greg Tucker	ad22a90686	crc: Add vec version of crc16 when avx available Vec versions mix much better with other avx code. Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2020-02-21 10:11:16 -07:00
Hong Bo Peng	180c74aefd	enable VSX SIMD in ISA-L for ppc64le 1) Implement the ErasureCode function in Altivec Intrinsics 2) Coding style update Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606 Thanks-to: Daniel Axtens <dja@axtens.net> Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>	2020-02-20 09:40:43 -07:00
Zhang Jinde	a3d5cd8642	igzip: Fix clang error on dep generation Clang errors when generating dependencies due to a stray semicolon following a function definition. Change-Id: Iefb4aca988b643bb62a69bbbaf197aca20a2d085 Signed-off-by: Zhang Jinde <zjd5536@163.com>	2020-01-17 10:25:32 -07:00
Zhang Jinde	163b6cd934	igzip: Fix for deflate logic buffer management Fixes invalid logic that attempted to eliminate unnecessary copy of input to the history buffer in cases where it is not required. Correction should improve performance and not change functionality. Change-Id: Ife24dcc9d920ce220b1a394031e971321737a171 Signed-off-by: Zhang Jinde <zjd5536@163.com>	2020-01-08 09:46:16 -07:00
Jerry Yu	fc69e8fc79	igzip: fix deflate hash bug If next_in equals end_in, the function should return. Change-Id: I59e631bb1f24835fd43f943a3736e016c4e2d0ac Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-12-31 13:15:35 -07:00
Jerry Yu	e2b07bbd44	build: fix debug build problem Remove strip command when lib_debug=1 Change-Id: I1203fcbfefb3b87080e9ba12ccbfb8018a008147 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-12-31 13:15:05 -07:00
Jerry Yu	936d05fc4f	igzip:Add decode huffman code for aarch64 Change-Id: If26cc4fd97b078b5f3b02e5f6f121a12ec73f671 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-12-19 16:10:04 +08:00
Greg Tucker	ad49e580dc	doc: Fix missing description of gf_matrix_inverse Doc missed issue of input matrix destruction. Fixes #116 Change-Id: Ic840b27532d90518dd21ec2701c278a1c3b61a8b Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-12-13 16:24:05 -07:00
Zhiyuan Zhu	2b8cc393af	igzip: implement gen_icf_map with assembly Change-Id: I74e6200a732acfaac44b7f5a82bd4a2215ba1535 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-12-13 07:54:12 +00:00
Zhiyuan Zhu	f430953f0a	igzip: cleanup perf test related code This patch addresses some cppcheck issues. And some minor changes to maintain code consistency. - Cleanup cppcheck issues. [log][igzip/igzip_perf.c] (error) Shifting signed 32-bit value by 31 bits is undefined behaviour [log][igzip/igzip_hist_perf.c:132]: (error) Memory leak: outbuf - Some minor changes to maintain code consistency. igzip/igzip_build_hash_table_perf.c igzip/igzip_hist_perf.c igzip/igzip_semi_dyn_file_perf.c - delete unused variable outbuf and outbuf_size from igzip/igzip_hist_perf.c Change-Id: Icbbd8f70de689931c8a844d89e457af8d97c6793 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-12-06 15:33:20 +08:00
Zhiyuan Zhu	683364c47b	igzip: implement encode_deflate_icf with assembly Change-Id: I90b12da2d2a96bfdb47d29ab329648247a756585 Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-11-29 14:45:45 -07:00
John Kariuki	5eeb33f69c	ec: add AVX512 ec functions with 5 and 6 outputs Added AVX512 optimized functions to calculate the GF(2^8) vector dot product with 5 and 6 outputs at a time. Also added GF(2^8) vector multiply AVX512 optimized functions with 5 and 6 accumulate. Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8 Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>	2019-11-19 10:12:14 -07:00
Samuel Lee	4785428d2f	crc: arm64 implementation tweaks + Utilise `pmull2` instruction in main loops of arm64 crc functions and avoid the need for `dup` to align multiplicands. + Use just 1 ASIMD register to hold both 64b p4 constants, appropriately aligned. + Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls on existing LITTLE uarch (which can only issue these instructions every other cycle). + Similarly interleave scalar instructions with ASIMD instructions to increase likelihood of instruction level parallelism on a variety of uarch. + Cut down on needless instructions in non-critical sections to help performance for small buffers. + Extract common instruction sequences into inner macros and moved them into shared header - crc_common_pmull.h + Use the same human readable register aliases and register allocation in all 4 implementations, never refer to registers without using human readable alias. + Use #defines rather than .req to allow use of same names across several implementations + Reduce tail case size from 1024B to 64B + Phrased the `eor` instructions in the main loop to more clearly show that we can rewrite pairs of `eor` instructions with a single `eor3` instruction in the presence of Armv8.2-SHA (should probably be an option in multibinary in future). Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1 Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>	2019-11-13 10:58:19 -07:00
Greg Tucker	0a8d05a81e	doc: Move arch-dependent build instructions to readme Removed the redundant parts that apply to all arch. Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-11-01 15:55:44 -07:00
Hang Li	02a86dfb3f	erasure_code: modify eor way in aarch64 neon codes Change-Id: I9fb9219c5f280ed88194ec63234af046a5a036ae Signed-off-by: Hang Li <lihang48@hisilicon.com>	2019-11-01 15:31:33 -07:00
Jerry Yu	ce9e56054a	igzip:implement deflate hash with assembly Change-Id: I39b3a37cd291c40f597750839c27db2a6a571fe5 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-11-01 14:41:46 -07:00
Jerry Yu	216d0f929b	build: fix cross compile issue Replace hardcoded gcc with $(CC). as_filter will then work correctly when cross compiling. Change-Id: I484d5074abdfc80ed5cd14fdd1358274f306bcfd Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-11-01 18:11:05 +08:00
Jerry Yu	5d7724898d	build: fix wrong use of the register name The third parameter must be a 32-bit register. That assembly put a 64-bit register there, which is wrong. Change-Id: Iebe17516b555a6a9b94ea7baa4778ad4b9dd0878 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-11-01 18:11:00 +08:00
Jerry Yu	b441659879	multibinary: fix strict-prototype warning With the -Wstrict-prototypes option, GCC reports the warning. Change-Id: Ic2d1adb566ad21deec65c66552e2863254e1376a Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-11-01 18:10:57 +08:00
Jerry Yu	f0104600a0	build: disable clang support in ci - Disable clang test for travis and drone.io - Add document about compiler requirement Change-Id: I81f8dc31088d40f315dd4ec062bed5df8ab7b633 Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>	2019-11-01 18:10:50 +08:00
Zhiyuan Zhu	6b70da5051	igzip: implement set_long_icf_fg with assembly Change-Id: I21ac55985a56c2b7b0a684934c076600d90f8b0a Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>	2019-10-31 11:02:54 -07:00
Greg Tucker	4ed944c4b1	build: Fix travis osx issue with brew update Bug in Homebrew auto-update causes post-update install to use the old environment. Change-Id: I03e20d899f558f71579dfd4be3f96903b77f1998 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-10-30 11:16:49 -07:00
Hang Li	621cf92c52	erasure_code: modify perf benchmark loop Change-Id: Ie45ceb3ac55ab943a155e2a3f9f6b765cd94d7a1 Signed-off-by: Hang Li <lihang48@hisilicon.com>	2019-10-30 10:34:40 -07:00
Greg Tucker	2f9eef537c	build: Fix autoconf build for mingw target Change-Id: Ie5ae17556f8cc95af8e59c8bd81a958c94455cd1 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-10-28 15:53:14 -07:00
Greg Tucker	e6848434ae	test: Fix issue keeping mingw tests from running Change-Id: I1e72ed99c2f09cbad488774313cddafdb1ce5de8 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-10-28 15:52:48 -07:00
Greg Tucker	533ba53f11	crc: Fix symbol conflict with older assemblers Change-Id: I6f1322a5fecdf21b2c774454cd51cb56767f30b8 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>	2019-10-28 14:39:44 -07:00
Zhou Xiong	d7848c1d05	Implement aarch64 neon for erasure code. 1. Replace the erasure code interfaces below with Arm NEON implementations via the mbin_interface function: ec_encode_data, gf_vect_mul, gf_vect_dot_prod, gf_vect_mad, ec_encode_data_update. 2. Use Arm NEON instructions to accelerate GF(2^8) computation with 128-bit registers. Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4 Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>	2019-10-25 11:09:03 -07:00
Jun He	c680d3aba7	Add arm64 to Travis matrix Enable new arm64 architecture in TravisCI, add tests for following compilers: gcc: v5.4.0 clang: v3.8.0 Change-Id: Id0b2f2231fabcbeff7061f85050db99df12c9a67 Signed-off-by: Jun He <jun.he@arm.com>	2019-10-24 10:09:19 +08:00
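
The cppcheck finding quoted in commit f430953f0a ("Shifting signed 32-bit value by 31 bits is undefined behaviour") is commonly resolved by shifting an unsigned operand instead of a signed one. The sketch below is only an illustration of that general pattern; `bit_mask32` is a hypothetical helper and is not taken from igzip/igzip_perf.c, whose actual fix is not shown in this log.

```c
#include <stdint.h>

/* Hypothetical helper (not from the isa-l sources): returns a mask with
 * only bit n set, for n in 0..31.
 *
 * The operand is an unsigned 32-bit constant, so n == 31 is well defined.
 * With a signed operand, `1 << 31` produces a value that does not fit in
 * a 32-bit int, which is undefined behaviour in C. */
static inline uint32_t bit_mask32(unsigned int n)
{
	return UINT32_C(1) << n;
}
```

Calling `bit_mask32(31)` yields 0x80000000 without relying on signed overflow; callers are assumed to keep `n` below 32, since shifting by the full width of the type is also undefined.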


664 Commits