Performance improvement for inflate to skip the time-consuming process of decode
table expansion when the header matches a known common dymanic one such as
produced by level 0 compression.
Change-Id: Ia2550b812a062b7cc2eb1b72bcb609f1a631e40b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Previous new crc version missed the update for nmake.
Change-Id: Ie529ee9d70d8d0ab8a8af3bd2720405802180d1e
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Issue with reading header only appears when combined with new feature in cli of
multiple concatenated gzip files.
Change-Id: Id8df9150c6f27d8b22e810b511291f3fcf136723
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
This will make it easier for users to get the latest version. Installing with conda is easier than compiling it yourself. Distro packages (such as Debian's) do not always ship the latest version while conda-forge can. This badge will advertise this install method.
Change-Id: I99a1853a00e55fdf0c574c9906675738ac278121
Signed-off-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
Arm64 and ppc64 build reports below error:
"configure: error: conditional "INTEL_CET_ENABLED" was never defined."
And the error should be report in all non-x86 platform.
Change-Id: I4c1b2fc99091424cfd5c62cf4d6536222b66712d
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
We should generate .note.gnu.property section with x86 assembly codes
for ELF outputs to mark Intel CET support when Intel CET is enabled
since all input files must be marked with Intel CET support in order
for linker to mark output with Intel CET support. Since nasm and yasm
can't generate the proper .note.gnu.property section, yasm-cet-filter.sh
and yasm-filter.sh are added to generate the proper .note.gnu.property
with linker help.
Verified with
$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
on Linux/x86-64.
Change-Id: I14e03a8a9031c8397dc36939a528cf5a827d775a
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
To support Intel CET, all indirect branch targets must start with
ENDBR32/ENDBR64. Here is a patch to define endbranch and add it to
function entries in x86 assembly codes which are indirect branch
targets as discovered by running testsuite on Intel CET machine and
visual inspection.
Verified with
$ CC="gcc -Wl,-z,cet-report=error -fcf-protection" CXX="g++ -Wl,-z,cet-report=error -fcf-protection" .../configure x86_64-linux
$ make -j8
$ make -j8 check
with both nasm and yasm on both CET and non-CET machines.
Change-Id: I9822578e7294fb5043a64ab7de5c41de81a7d337
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
To reduce the cache missing events, the mix layout is changed
to PMULL+CRC. It also relaxes the final delay caused by data
dependency.
As results, the cold perf was improved about 20% and warm perf
was improved about 4%.
Change-Id: I7756f846edcb4f1665b4643a5a0e02283938cfdf
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Hardware folding algorithm depend on CRC32 and PMULL instruction.
And it should match both flags .
Change-Id: I361068402db1fe6d7c0bd8d2c7048f1d94880233
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
The nmake default is changed for a modern nasm. Older nasm and yasm versions
will still work with windows but the nmake options must be changed appropriately
for max AS_FEATURE_LEVEL to match. Also now generates debug symbol pdb files.
Change-Id: I94a2dd7ecf541c6564ccbd4a184c33995d7b31ad
Signed-off-by: Poornima Kumar <poornima.kumar@intel.com>
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
This patch provides microarchitecture information
and make microarchitecture optimization possible. It
will trap into kernel due to mrs instruction. So it
should be called only in dispatcher, that will be
called only once in program lifecycle. And HWCAP must
be match,That will make sure there are no illegal
instruction errors.
Change-Id: I393ec742010bf3f10ce335482c0350aa4202c788
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Change improper stack push in windows prolog. Error was not reachable without
windows nasm support and so went undetected.
Change-Id: I8b715195d1c8efd173843c043d42fc610ddebd17
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Previously windows build could only use yasm because some procedural items such
as proc_start were not supported by nasm. This adds a few macros and fixes so
nasm can be used to build on windows.
Change-Id: Ia05dc3ff482f33b0f915bb1be3c7df5e4a753b3a
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Push of registers overlapped xmm push. Error was not reachable without windows
nasm support and so went undetected.
Change-Id: I0ffd66f6d32ac37ea03fe9b11924968aa50f8fa7
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
For builds under windows this could emit a non-vec mov that's not optional for
AVX versions.
Change-Id: I31e6ea3b62d48c5a13f6e83f8d684f0b5551087b
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
The gf_{2-6}vect_dot_prod tests were kept in other_tests since the 5,6vect
functions were not strictly called by the higher level ec_encode_data() and
needed independent testing. As this has now changed the extra tests can be
removed as redundant.
Change-Id: I8a95e31487b150a2a8f929c5586785524d951fde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Vec versions mix much better with other avx code.
Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
1) Implement the ErasureCode function in Altivec Intrinsics
2) Coding style update
Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
Clang errors when generating dependencies due to a stray semicolon following a
function definition.
Change-Id: Iefb4aca988b643bb62a69bbbaf197aca20a2d085
Signed-off-by: Zhang Jinde <zjd5536@163.com>
Fixes invalid logic that attempted to eliminate unnecessary copy of input to the
history buffer in cases where it is not required. Correction should improve
performance and not change functionality.
Change-Id: Ife24dcc9d920ce220b1a394031e971321737a171
Signed-off-by: Zhang Jinde <zjd5536@163.com>
This patch addresses some cppcheck issues.
And some minor changes to maintain code consistency.
- Cleanup cppcheck issues.
[log][igzip/igzip_perf.c] (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[log][igzip/igzip_hist_perf.c:132]: (error) Memory leak: outbuf
- Some minor changes to maintain code consistency.
igzip/igzip_build_hash_table_perf.c
igzip/igzip_hist_perf.c
igzip/igzip_semi_dyn_file_perf.c
- delete unused variable
outbuf and outbuf_size from igzip/igzip_hist_perf.c
Change-Id: Icbbd8f70de689931c8a844d89e457af8d97c6793
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.
Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
+ Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and moved
them into shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
+ Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B
+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).
Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
Removed the redundant parts that apply to all arch.
Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>