Commit Graph

33 Commits

Author SHA1 Message Date
Andy Polyakov
2b9a8ca15b x86gas.pl: add palignr and move pclmulqdq. 2011-05-16 18:07:00 +00:00
Andy Polyakov
b5c6aab57e x86_64-xlate.pl: allow "base-less" effective address, add palignr, move
pclmulqdq.
2011-05-16 17:44:38 +00:00
Andy Polyakov
56c5f703c1 IA-64 assembler pack: fix typos and make it work on HP-UX. 2011-05-07 20:36:05 +00:00
Andy Polyakov
1e86318091 ARM assembler pack: profiler-assisted optimizations and NEON support. 2011-04-01 20:58:34 +00:00
Andy Polyakov
bc5b136c5c ghash-x86.pl: optimize for Sandy Bridge. 2011-03-04 13:21:41 +00:00
Andy Polyakov
0ab8fd58e1 s390x assembler pack: tune-up and support for new z196 hardware. 2011-03-04 13:09:16 +00:00
Andy Polyakov
e822c756b6 s390x assembler pack: adapt for -m31 build, see commentary in Configure
for more details.
2010-11-29 20:52:43 +00:00
Andy Polyakov
8986e37249 ghash-s390x.pl: reschedule instructions for better performance. 2010-09-21 11:37:00 +00:00
Andy Polyakov
f8927c89d0 Alpha assembler pack: adapt for Linux.
PR: 2335
2010-09-13 13:28:52 +00:00
Andy Polyakov
7d1f55e9d9 Add ghash-s390x.pl. 2010-09-10 14:50:17 +00:00
Andy Polyakov
d52d5ad147 modes/asm/ghash-*.pl: switch to [more reproducible] performance results
collected with 'apps/openssl speed ghash'.
2010-09-05 19:52:14 +00:00
Andy Polyakov
a3b0c44b1b ghash-ia64.pl: 50% performance improvement of gcm_ghash_4bit. 2010-09-05 19:49:54 +00:00
Andy Polyakov
85e28dfa6f ghash-ia64.pl: excuse myself from implementing "528B" variant. 2010-07-26 21:54:21 +00:00
Andy Polyakov
133a7f9a50 perlasm/x86asm.pl: move aesni and pclmulqdq opcodes to aesni-x86.pl and
ghash-x86.pl.
2010-07-26 21:42:07 +00:00
Andy Polyakov
2d22e08083 ARM assembler pack: reschedule instructions for dual-issue pipeline.
Modest improvement coefficients mean that code already had some
parallelism and there was not very much room for improvement. Special
thanks to Ted Krovetz for benchmarking the code with such patience.
2010-07-13 14:03:31 +00:00
Andy Polyakov
396df7311e crypto/*/Makefile: unify "catch-all" assembler make rules and harmonize
ARM assembler modules.
2010-07-08 15:03:42 +00:00
Andy Polyakov
acbcc271b1 ghash-armv4.pl: excuse myself from implementing "528B" flavour. 2010-07-02 08:14:12 +00:00
Andy Polyakov
b28750877c ghash-sparcv9.pl: fix Makefile rule and add performance data for T1. 2010-07-02 08:09:30 +00:00
Andy Polyakov
c32fcca6f4 SPARCv9 assembler pack: refine CPU detection on Linux, fix for "unaligned
opcodes detected in executable segment" error.
2010-07-01 07:34:56 +00:00
Andy Polyakov
d364506a24 ghash-x86_64.pl: "528B" variant delivers further >30% improvement. 2010-06-09 15:05:59 +00:00
Andy Polyakov
04e2b793d6 ghash-x86.pl: commentary updates. 2010-06-09 15:05:14 +00:00
Andy Polyakov
8525950e7e ghash-x86.pl: "528B" variant of gcm_ghash_4bit_mmx gives 20-40%
improvement.
2010-06-04 13:21:01 +00:00
Andy Polyakov
07e29c1234 ghash-x86.pl: MMX optimization (+20-40%) and commentary update. 2010-05-23 12:37:01 +00:00
Andy Polyakov
1aa8a6297c ghash-x86[_64].pl: add due credit. 2010-05-13 17:21:52 +00:00
Andy Polyakov
c1f092d14e GCM "jumbo" update:
- gcm128.c: support for Intel PCLMULQDQ, readability improvements;
- asm/ghash-x86.pl: splitted vanilla, MMX, PCLMULQDQ subroutines;
- asm/ghash-x86_64.pl: add PCLMULQDQ implementations.
2010-05-13 15:32:43 +00:00
Andy Polyakov
8a682556b4 Add ghash-armv4.pl. 2010-05-03 18:23:29 +00:00
Andy Polyakov
5e19ee96f6 Add ghash-parisc.pl. 2010-04-28 18:51:45 +00:00
Andy Polyakov
4f39edbff1 gcm128.c and assembler modules: change argument order for gcm_ghash_4bit.
ghash-x86*.pl: fix performance numbers for Core2, as it turned out
previous ones were "tainted" by variable clock frequency.
2010-04-14 19:04:51 +00:00
Andy Polyakov
42feba4797 Add ghash-alpha.pl assembler module. 2010-04-10 13:44:20 +00:00
Andy Polyakov
c3473126b1 GHASH assembler: new ghash-sparcv9.pl module and saner descriptions. 2010-03-22 17:24:18 +00:00
Andy Polyakov
480cd6ab6e ghash-ia64.pl: new file, GHASH for Itanium.
ghash-x86_64.pl: minimize stack frame usage.
ghash-x86.pl: modulo-scheduling MMX loop in respect to input vector
results in up to 10% performance improvement.
2010-03-15 19:07:52 +00:00
Andy Polyakov
f093794e55 Add GHASH x86_64 assembler. 2010-03-11 16:19:46 +00:00
Andy Polyakov
e3a510f8a6 Add GHASH x86 assembler. 2010-03-09 23:03:33 +00:00