Vec versions mix much better with other avx code.
Change-Id: I2544c75d09231ee70f16c384b1e57062976199d9
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
1) Implement the ErasureCode function in Altivec Intrinsics
2) Coding style update
Change-Id: I2c81d035f4083e9b011dbf3b741f628813b68606
Thanks-to: Daniel Axtens <dja@axtens.net>
Signed-off-by: Hong Bo Peng <penghb@cn.ibm.com>
Clang errors when generating dependencies due to a stray semicolon following a
function definition.
Change-Id: Iefb4aca988b643bb62a69bbbaf197aca20a2d085
Signed-off-by: Zhang Jinde <zjd5536@163.com>
Fixes invalid logic that attempted to eliminate unnecessary copy of input to the
history buffer in cases where it is not required. Correction should improve
performance and not change functionality.
Change-Id: Ife24dcc9d920ce220b1a394031e971321737a171
Signed-off-by: Zhang Jinde <zjd5536@163.com>
This patch addresses some cppcheck issues.
And some minor changes to maintain code consistency.
- Cleanup cppcheck issues.
[log][igzip/igzip_perf.c] (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[log][igzip/igzip_hist_perf.c:132]: (error) Memory leak: outbuf
- Some minor changes to maintain code consistency.
igzip/igzip_build_hash_table_perf.c
igzip/igzip_hist_perf.c
igzip/igzip_semi_dyn_file_perf.c
- delete unused variable
outbuf and outbuf_size from igzip/igzip_hist_perf.c
Change-Id: Icbbd8f70de689931c8a844d89e457af8d97c6793
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>
Added AVX512 optimized functions to calculate the
GF(2^8) vector dot product with 5 and 6 outputs
at a time. Also added GF(2^8) vector multiply
AVX512 optimized functions with 5 and 6 accumulate.
Change-Id: I6d2c080f4f4f8e4823ad9a9be2c65c3b5b3bb1f8
Signed-off-by: John Kariuki <John.K.Kariuki@intel.com>
+ Utilise `pmull2` instruction in main loops of arm64 crc functions and
avoid the need for `dup` to align multiplicands.
+ Use just 1 ASIMD register to hold both 64b p4 constants,
appropriately aligned.
+ Interleave quadword `ldr` with `pmull{2}` to avoid unnecessary stalls
on existing LITTLE uarch (which can only issue these instructions every
other cycle).
+ Similarly interleave scalar instructions with ASIMD instructions to
increase likelihood of instruction level parallelism on a variety of
uarch.
+ Cut down on needless instructions in non-critical sections to help
performance for small buffers.
+ Extract common instruction sequences into inner macros and moved
them into shared header - crc_common_pmull.h
+ Use the same human readable register aliases and register allocation
in all 4 implementations, never refer to registers without using human
readable alias.
+ Use #defines rather than .req to allow use of same names across
several implementations
+ Reduce tail case size from 1024B to 64B
+ Phrased the `eor` instructions in the main loop to more clearly show
that we can rewrite pairs of `eor` instructions with a single `eor3`
instruction in the presence of Armv8.2-SHA (should probably be an option
in multibinary in future).
Change-Id: I3688193ea4ad88b53cf47e5bd9a7fd5c2b4401e1
Signed-off-by: Samuel Lee <samuel.lee@microsoft.com>
Removed the redundant parts that apply to all arch.
Change-Id: I2015c436cc8ea09913a8d0d4ce2cf1f112d71dde
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Replace hardcode gcc with $(CC). as_filter
will work correct in cross compile
Change-Id: I484d5074abdfc80ed5cd14fdd1358274f306bcfd
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
The third parameter must be 32bit register . Those assmebly
put 64bit register here , it is wrong .
Change-Id: Iebe17516b555a6a9b94ea7baa4778ad4b9dd0878
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
with -Wstric-prototype option , GCC report the
warning .
Change-Id: Ic2d1adb566ad21deec65c66552e2863254e1376a
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
- Disable clang test for travis and drone.io
- Add document about compiler requirement
Change-Id: I81f8dc31088d40f315dd4ec062bed5df8ab7b633
Signed-off-by: Jerry Yu <jerry.h.yu@arm.com>
Bug in Homebrew auto-update causes post-update install to use the old
environment.
Change-Id: I03e20d899f558f71579dfd4be3f96903b77f1998
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
1.Replace below erasure code interfaces to arm neon interface by mbin_interface function.
ec_encode_data
gf_vect_mul
gf_vect_dot_prod
gf_vect_mad
ec_encode_data_update
2.Utilise arm neon instrution to accelerate GF(2^8) set compute by 128bit registor.
Change-Id: Ib0ecbfbd1837d2b1f823d26815c896724d2d22e4
Signed-off-by: Zhou Xiong <zhouxiong13@huawei.com>
Enable new arm64 architecture in TravisCI, add tests for
following compilers:
gcc: v5.4.0
clang: v3.8.0
Change-Id: Id0b2f2231fabcbeff7061f85050db99df12c9a67
Signed-off-by: Jun He <jun.he@arm.com>
ec_base.h has several variables, which were defined with
a global scope. Exactly those global variables caused issues
on linking a static compilation of libisal.a to a shared lib.
Adding -fPIC to CFLAGS somehow didn't help.
As all the variables in ec_base.h are only included
and used by a single C file, all of these can be
(file) static, which then will also helps the compiler to
make further optimizations. And which also solves the issue
to link the static libisal to a shared lib.
Also make the variables const, as these are constants and
must be modified.
Change-Id: I2b8141dabc1c7a528401f2778cdbdbed6c93c36b
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
This issue occurs when dynamic compilation is used
and gcc's -fsanitize memory detection option is turned on.
[Log] relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against `.rodata'
Change-Id: Ic2f82264610552f347e043f82ac5ebafc93748e2
Signed-off-by: Zhiyuan Zhu <zhiyuan.zhu@arm.com>