622 Commits

Author SHA1 Message Date
Chunsong Feng
e297ecae7a crc16: Accelerate T10DIF performance with prefetch and pmull2
The memory block processed by t10dif is generally one 512-byte sector.
Prefetching can effectively reduce cache misses. Use ldp instead of
ldr to reduce the number of instructions; pmull+pmull2 can reduce
register access. The perf test results show that performance is
improved by 5x to 14x after optimization.

Change-Id: Ibd3f08036b6a45443ffc15f808fd3b467294c283
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
2022-03-31 09:58:04 -07:00
Greg Tucker
ad8dce15c6 doc: Add function overview and usage page
While the external headers define the API, we could really use this
overview to get users started and point them to examples.

Change-Id: Iba419e61d0d7723e1029a3b6e7259facfeb39522
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2022-02-15 16:59:31 -07:00
H.J. Lu
57846f414f Properly add .note.gnu.property section to assembly codes
1. Revert "x86: Generate .note.gnu.property section for ELF output"

This reverts commit 8074e3fe1b, which is
a hack to work around old nasm versions that don't support
section .note.gnu.property  note  alloc noexec align=8

This hack doesn't work for downstream builds, such as:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=2040091

2. If Intel CET is enabled, require a nasm with note section support to
add

section .note.gnu.property  note  alloc noexec align=N

to the assembly code.

Verified with

$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8

on Tiger Lake.

Change-Id: I6d66fe6fd054420d7fde35b1508ca9f09defdeca
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2022-01-20 12:23:30 -07:00
Nicola Torracca
e3783f28f8 Add AVX512 implementation of mem_zero_detect().
Change-Id: I60fe0846d783787198b6a44a090fd9fe17c1807f
Signed-off-by: Nicola Torracca <shark@bitchx.it>
2022-01-04 12:25:23 -07:00
Ilya Leoshkevich
d3cfb2fb77 Fix s390 build
The goal of this patch is to make the isa-l testsuite pass on s390 with
minimal changes to the library. The one and only reason isa-l does not
work on s390 at the moment is that s390 is big-endian, and isa-l
assumes little-endian in a lot of places.

There are two flavors of this: loading/storing integers from/to
memory, and overlapping structs. Loads/stores are already helpfully
wrapped by the unaligned.h header, so replace the functions there with
endianness-aware variants. Solve struct member overlap by reversing
their order on big-endian.

Also, fix a couple of usages of uninitialized memory in the testsuite
(found with MemorySanitizer).

Fixes s390x part of #188.

Change-Id: Iaf14a113bd266900192cc8b44212f8a47a8c7753
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
2022-01-04 11:06:17 -07:00
Guodong Xu
3b3d7cc47b Enable SVE in ISA-L erasure code for aarch64
This patch adds Arm (aarch64) SVE [1] variable-length vector assembly support
to the ISA-L erasure code library. "Arm designed the Scalable Vector Extension
(SVE) as a next-generation SIMD extension to AArch64. SVE allows flexible
vector length implementations with a range of possible values in CPU
implementations. The vector length can vary from a minimum of 128 bits up to
a maximum of 2048 bits, at 128-bit increments. The SVE design guarantees
that the same application can run on different implementations that support
SVE, without the need to recompile the code." [3]

Test method:
 - This patch was tested on Fujitsu's A64FX [2], and it passed all erasure
     code related test cases, including "make check", "make test", and
     "make perf".
 - To ensure test coverage, parameters in files (erasure_code/erasure_code_test.c,
     erasure_code_update_test.c and gf_vect_mad_test.c) are modified to cover
     all _vect versions of the _mad_sve() / _dot_prod_sve() routines.

Performance improvements over NEON:
In general, SVE bandwidth (MB/s) is 40% to 100% higher than NEON when
running _cold-style perf tests (data uncached and pulled from memory). This
includes the dot_prod, mad, and mul routines.

Optimization points:
This patch was tuned for the best performance on A64FX. The tuning points
addressed in this patch include:
1) Data prefetch into L2 cache before loading. See _sve.S files.
2) Instruction sequence orchestration, such as interleaving every two
     'ld1b/st1b' instructions with other instructions. See _sve.S files.
3) To improve dest vector parallelism at the high level, running
     gf_4vect_dot_prod_sve() twice is better than running gf_8vect_dot_prod_sve()
     once, and it's also better than running _7vect + _vect, _6vect + _2vect,
     or _5vect + _3vect. A similar idea is applied to improve dot product
     computation for 9 to 11 dest vectors as well. The related change can be
     found in ec_encode_data_sve() in file:
     erasure_code/aarch64/ec_aarch64_highlevel_func.c

Notes:
1) About vector length: A64FX has a vector register length of 512 bits. However,
     this patchset was written with variable-length assembly, so it works
     automatically on aarch64 machines with any SVE vector length,
     such as SVE-128, SVE-256, etc.
2) About optimization: Due to differences in microarchitecture and
     cache/memory design, achieving optimum performance on SVE-capable CPUs
     other than A64FX is considered to require microarchitecture-level
     tuning on those CPUs.

[1] Introduction to SVE - Arm Developer.
      https://developer.arm.com/documentation/102476/latest/
[2] FUJITSU Processor A64FX.
      https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
[3] Introducing SVE.
      https://developer.arm.com/documentation/102476/0001/Introducing-SVE

Change-Id: If49eb8a956154d799dcda0ba4c9c6d979f5064a9
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2022-01-04 10:54:38 -07:00
Greg Tucker
642ef36874 Fix check signoff for github actions
GitHub Actions checkout changed to fetch only a single generated merge
commit instead of the actual PR commit. This breaks the check_format
test for signoff. Fetching a history depth of 2 includes the actual
commit ID.

Change-Id: I7d83871159d24faaf2f8e6086f12173e14cbcf3c
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-12-30 16:49:27 -07:00
John Zhang
0de83dbff7 add help2man as optional package
Change-Id: Id01a6d0fa77d5ec4959c2e9d9b0d6c3390cd43be
Signed-off-by: John Zhang <zsgsdesign@gmail.com>
2021-11-29 10:17:52 -07:00
Ruben Vorderman
78f5c31e66 Create github CI yaml file
This file automatically triggers testing on github actions.

Change-Id: I23848f2dca925e0c96e64f7d655f32b83498bed1
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-10-29 17:06:36 -07:00
Ruben Vorderman
fd83ed1924 Add -arch to unsupported arguments in [ny]asm-filters
Change-Id: Ieb53bb225815e204482e74bb383f1b61f12dabfd
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-10-12 15:53:32 -07:00
Greg Tucker
6d17992b6d mem: Add small allocs into test to help mem checkers
Change-Id: I6de3951ff66a715d8b1c0f36d691cb60e8396139
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-10-04 11:01:33 -07:00
Greg Tucker
87908c9060 mem: Move new mem_zero_detect function to avx2
The new mem_zero_detect function will fail on AVX-only machines.

Change-Id: I3bca49bff886f9c130c89e8c74b31110e9bac76b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-09-30 17:47:57 -07:00
Nicola Torracca
0e65117138 mem_zero_detect_avx: OR multiple vector and test for non zero on the result
Micro-optimizations: vpcmpeqb+vpmaskmov is faster than vptest according
to uops.info; make usually-untaken branches jump forward; reduce the
number of data-dependent branches and the code size.

Change-Id: Ie70b4bc99685368e5131f23344348bfaf7c27d3e
Signed-off-by: Nicola Torracca <shark@bitchx.it>
2021-09-30 16:55:30 -07:00
Greg Tucker
f980b36655 build: Change include shortcut D to not conflict with env
The variable D= can be used to quickly add defines. This sets a null
default so it can only be overridden by the make command line.

fixes #184

Change-Id: I84615174547f36208d6d577c1e30b6fac83139b3
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-09-14 19:18:31 -07:00
Taiju Yamada
998e03bf95 Strip -isysroot and related flags from asm-filter
This helps python-isal compatibility.

Change-Id: I8a2540e330f229f65903bdb2cc47aceeb0724dc5
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
2021-09-13 10:02:38 -07:00
Greg Tucker
066940a9a7 build: Add ms rc file to put extra metadata on dll
Change-Id: Idf687c6b2f8d1dea203f01bf57c5158d19ed519e
Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-09-02 18:27:51 -07:00
Ruben Vorderman
908726e255 More prominently feature language bindings and igzip
Change-Id: Ief814eeb6d24f16d822e22327f40756ffba05869
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-08-24 18:22:54 -07:00
Ruben Vorderman
94ec6026ce Create headers based on compression parameters.
Instead of using a constant as the default zlib header, create the header on the fly: both zlib
header bytes depend on the wbits and compression level used.
Make sure that ISA-L compression level 0 is advertised as the fastest compression in
both the gzip header (XFL set to 0x04) and the zlib header (FLEVEL 0, fastest; other levels use FLEVEL 1, fast).

Change-Id: I1f30e4397a0f5fcf6df593c40178e7d6f6c05328
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
2021-08-23 09:48:10 -07:00
Greg Tucker
1db0363c49 igzip: Add compress-decompress with dictionary to perf test
Change-Id: Ic396819537f5437e6aab3ebf5d023ed5cdbe852a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-14 15:55:39 -07:00
Greg Tucker
112dd72c01 build: Remove unneeded file types.h
The file types.h has long been misnamed and overlaps with
functionality in the test helper routines.

Change-Id: I774047d3a0074198b67a6b4e909f1e2ce1938195
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-10 09:35:43 -07:00
Greg Tucker
cfdd3497d1 perf: Remove unneeded time include
Timing functions are made OS-independent by the test.h include.

Change-Id: Iab7d6325254d5c32263504efc756dbbe51d77153
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-09 18:33:57 -07:00
Greg Tucker
d5928e3760 build: Fix missing ms function export
The Windows def file was missing an exported ec support function.
Also add a path in the nmake file to build extra examples.

Change-Id: I59ac1599dcb8cdb45077347c74b57aeca4751c35
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-07 18:30:08 -07:00
Greg Tucker
628f4e91ea ex: Add makefile to build examples from installed lib
Change-Id: I10a51dfe90e0672bb33348de241a5be91c9caa37
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-06-03 17:53:20 -07:00
Greg Tucker
0f7bf1c04d doc: Update minimum nasm recommendation and details
Change-Id: Icb113242c0ab7f3c75af3e65a8d519511f4ed4c3
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-27 17:44:31 -07:00
Greg Tucker
393f69fcac build: Change travis osx to use std brew
The osx brew and older linux targets are failing the update.
This removes the older linux builds and changes the osx target to
take the latest brew that comes with the image instead of
doing a brew update on every build.

Change-Id: Ib1543296a733875c9eff798326b0d45854153923
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-21 19:44:39 -07:00
Greg Tucker
240ca46ffb build: Change mingw to nasm by default
Change-Id: I80053b8cf62f5f2ef7c12661086e9aeaf2eea573
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-21 19:44:39 -07:00
Greg Tucker
d7bac36be4 crc: Fix warning in perf test from uninitialized tmp ptr
Both gcc and clang are showing a warning on this despite the buffer
always being set before use.

Change-Id: I0e8f6b9e3451efe69e49814abc883d49b04f2666
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-20 11:57:56 -07:00
Greg Tucker
fe4b7f9acc Add toplevel header gen in windows
Change-Id: I3a1e5fc495266d8ba223d75384625e22c3cf66fe
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-05-06 16:44:10 -07:00
Greg Tucker
2c705a26cb raid: Fix doc and base functions for min sources
The raid functions xor_gen, pq_gen and check functions
must have at least two sources. Fixes #175

Change-Id: I2e4509e037c2b1dc88f3f7449d80f4c763e1e124
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-04-26 16:23:58 -07:00
Greg Tucker
ebb78fc99e build: Fix warning from inconsistency in gnu make
GNU make changed the interpretation of an escaped # inside quotes,
causing warnings in the test for pthreads.

Change-Id: Ice94116713aea3c3e9725b38232e03f53d6633cc
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2021-03-03 10:26:06 -07:00
luo rixin
bee5180a15 erasure_code: Fix text relocation on aarch64
See the bug report on Ceph: https://tracker.ceph.com/issues/48681

Change-Id: Ie1c60a71f28c1a169c8899a621be9bb455f5e244
Signed-off-by: luo rixin <luorixin@huawei.com>
2021-01-08 15:23:15 -07:00
Jerry Yu
bc8b2aef55 Fix clang build fail
Author of this patch is Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
Re-organized by Jerry Yu <jerry.h.yu@arm.com>

Clang version must be later than 9.x according to https://reviews.llvm.org/D61719

Change-Id: I7516cca17ef4556b828fb6ecfa755e6451052359
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
2020-12-09 14:37:55 +08:00
Greg Tucker
600d8d8f77 build: Update fuzz tests for deprecated clang args
Clang has deprecated the option -fsanitize-coverage=trace-pc-guard
for use with fuzzing.

Change-Id: I7fe5da0f57ab44110208d098858b786450a0a5e7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-12-04 15:04:02 -07:00
Greg Tucker
2df39cf5f1 build: Bump revision to 2.30
Change-Id: If6d696ee76f3949d3cf5aff34403df65bce2c6b9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-06 18:08:16 -07:00
Greg Tucker
05f6a0bb39 Update release notes for v2.30 additions
Change-Id: Icbb1faa2b67d8d18b1c7cde9f09774ebd895a6df
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-04 14:59:33 -07:00
Greg Tucker
ece814e912 doc: Add details on build and test
Change-Id: I58401ed26ba8a0a7fad0191b4c1bbb461d0311e6
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-04 12:40:08 -07:00
Greg Tucker
dca9dd221e igzip: Use unaligned load on static header to fix usan
Clang with the sanitizer enabled was flagging the cast of the static
header. Switch to the uload64 macro as a better general solution.

Change-Id: I495d440407bb1773841e2f7cdc48bd95fc1a2df4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-04 12:40:08 -07:00
Greg Tucker
269df8a67d igzip: Fix order of args check in new dictionary function
In the newly added function isal_deflate_process_dict(), a null check
on the dictionary struct was added but was ineffective because of the
order in which the arguments were checked.

Change-Id: I3b3e70997210794de102b1348e1467295871cee2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-11-03 08:50:30 -07:00
Greg Tucker
24a98e3e87 Fix missing files in extra dist
Change-Id: I83e62344fab72afd755453d4eb43e9c236ba2b86
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-28 17:43:53 -07:00
Greg Tucker
79143208ac test: Add testing for new dictionary functions
Change-Id: I0b0a151374acfe9b44c7a2be4bb959df59356d97
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-28 17:28:43 -07:00
Greg Tucker
19035917f4 igzip: Add new functions for faster dictionary compression
Change-Id: Id55728fea286d144f8a11192ab02ccc8503d7b25
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
438ecd8187 Update custom hufftable tool for saving histogram
Change-Id: I515217b19373b8f996ff887268862cf2b102f3a4
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
89f7c46cd5 Change igzip_file_perf to accept 0 time
Change-Id: Ie2edf8e742d0bcdd9a008704f997006f8f5009ac
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
9968e7a032 Change gen cust hufftables to accept dictionary
Change-Id: I4eed03bdb91030b16b3ecfd8076adc890e4f59a2
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
63dffab948 igzip: Change pre-gen inflate table to multi-symbol
Change-Id: I4b0dac1e5aa2796be17644b893e3b6c7aed05876
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
d7927673ba igzip: Inflate detect pre-gen header and use pre-expanded
Performance improvement for inflate: skip the time-consuming process of decode
table expansion when the header matches a known, common dynamic header, such as
the one produced by level 0 compression.

Change-Id: Ia2550b812a062b7cc2eb1b72bcb609f1a631e40b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-10-21 18:09:49 -07:00
Greg Tucker
cc9ed53972 build: Fix nmake check for multiple arch
Change-Id: I36c3616163f6fec61dda9cf8b35ca561e59477c9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-27 11:16:30 -07:00
Greg Tucker
794b8b60c1 build: Add test to check for nmake consistency
Change-Id: I1180ba749d54e7ef433b01b33450e52ac5dbb2bb
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-26 11:41:03 -07:00
Greg Tucker
24623b8b82 crc: Fix missing object omitted from nmake file
The previous commit adding a new crc version missed the nmake update.

Change-Id: Ie529ee9d70d8d0ab8a8af3bd2720405802180d1e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-26 09:49:23 -07:00
Greg Tucker
ec73d39086 crc: Add new vclmul version of crc32_iscsi
Change-Id: I1c509c6ea312b6eb4e1c2c1c8bb7044f7b043e0d
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2020-08-21 17:15:58 -07:00