Also, "make update" has added some missing functions to libeay.num,
updated the TABLE for the alpha changes, and updated thousands of
dependancies that have changed from recent commits.
because we're only handling words anyway) in BN_mod_exp_mont_word
making it a little faster for very small exponents,
and adjust the performance gain estimate in CHANGES according
to slightly more thorough measurements.
(15% faster than BN_mod_exp_mont for "large" base,
20% faster than BN_mod_exp_mont for small base.)
like Malloc, Realloc and especially Free conflict with already existing names
on some operating systems or other packages. That is reason enough to change
the names of the OpenSSL memory allocation macros to something that has a
better chance of being unique, like prepending them with OPENSSL_.
This change includes all the name changes needed throughout all C files.
Remove one ampersand so the compiler may complain less.
Make rand() static so it will not conflict with the C RTL.
Make bug() static too, for good measure.
Apparently BN_div_recp reports an error for small divisors
(1,2,4,8,40).
I haven't got mismatches so far. If you can, please run the test
program for a few days (nohup divtest >out& or something), and if it
reports a mismatch, post the output.
X is 5120 on 32-bit and 151552 on 64-bit architectures and I varies
from 0 to 4. As result the test was *unreasonably* slow and virtually
impossible to complete on 64-bit architectures (e.g. IRIX bc couldn't
even swallow such long lines).
crypto/bn/bn_lcl.h for further details. It should be noted that for
the moment of this writing the code was tested only on Alpha. If
compiled with DEC C the C implementation exhibits 12% performance
improvement over the crypto/bn/asm/alpha.s (on EV56 box running
AlphaLinux). GNU C is (unfortunately) 8% behind the assembler
implementation. But it's OpenVMS Alpha users who *may* benefit most
as 'apps/openssl speed rsa' exhibits 6 (six) times performance
improvement over the original VMS bignum implementation. Where "*may*"
means "as soon as code is enabled though #define SIXTY_FOUR_BIT and
crypto/bn/asm/vms.mar is skipped."
because otherwise BN_rand will fail unless DEVRANDOM works,
which causes the programs to dump core because they
don't check the return value of BN_rand (and if they
did, we still couldn't test anything).
- add comment to some files that appear not to be used at all.
returns int (1 = ok, 0 = not seeded). New function RAND_add() is the
same as RAND_seed() but takes an estimate of the entropy as an additional
argument.
problem was that one of the replacement routines had not been working since
SSLeay releases. For now the offending routine has been replaced with
non-optimised assembler. Even so, this now gives around 95% performance
improvement for 1024 bit RSA signs.
the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.
CPU BN_div bn_div_words overall comment
------------------------------------------------------------------------
PII +16% accumulated by +2-3% PII multiplies damn fast! Taking
inlining multiplication out of the loop
didn't make too much difference.
Eliminating of the multiplication
involved in remainder calculation
is the major factor.
Pentium +45% accumulated by +7-9% mull isn't that fast and replacing
inlining multiplications with additions in
the loop has more visible effect:-)
MIPS +75% +12% +20-25% In addition to the taking mults
R10000 out of the loop (giving 12% in the
asm/mips3.s) three mults were
eliminated in BN_div.
Alpha +30% +50% +10-15% Same as above. But remember that
EV4 bn_div_words is a C implementation.
It takes 4 Alpha mults in C to do
the same thing as 1 MIPS mult in
assembler does. So the effect (50%)
is more impressive. But not the
overall one... Well, if Alpha
bn_mul_add would be implemented
in assembler overall improvement
would be closer to MIPS...