Andy Polyakov 4c22909e31 Extra i386+gcc bn_div.c tune-up featuring inline division and saving
the remainder left in %edx. The performance improvement matrix below
reflects both this *and* the previous tune-up committed two days ago.
The results were obtained by profiling the "div" part of
crypto/bn/bnspeed.c.

CPU	BN_div	bn_div_words	overall	comment
------------------------------------------------------------------------
PII	+16%	accumulated by	+2-3%	PII multiplies damn fast! Taking
		inlining		multiplication out of the loop
					didn't make too much difference.
					Eliminating the multiplication
					involved in remainder calculation
					is the major factor.

Pentium	+45%	accumulated by	+7-9%	mull isn't that fast and replacing
		inlining		multiplications with additions in
					the loop has more visible effect:-)

MIPS	+75%	+12%		+20-25%	In addition to taking mults
R10000					out of the loop (giving 12% in the
					asm/mips3.s) three mults were
					eliminated in BN_div.

Alpha	+30%	+50%		+10-15%	Same as above. But remember that
EV4					bn_div_words is a C implementation.
					It takes 4 Alpha mults in C to do
					the same thing as 1 MIPS mult in
					assembler. So the effect (50%)
					is more impressive. But not the
					overall one... Well, if Alpha
					bn_mul_add were implemented in
					assembler, the overall improvement
					would be closer to MIPS...
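
For readers unfamiliar with the trick: the i386 divl instruction divides the
two-word value in %edx:%eax by its operand and leaves the quotient in %eax and
the remainder in %edx, so the remainder comes for free. The sketch below is a
portable stand-in for that idea, not the code from bn_div.c. The names
div_2by1 and rem_by_mul are hypothetical, and a 32-bit word size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from bn_div.c): divide the two-word value
 * hi:lo by d. Returns the quotient and stores the remainder in *rem.
 * Like divl, it requires hi < d so that the quotient fits in one word. */
static uint32_t div_2by1(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    uint64_t n;

    assert(hi < d);
    n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);   /* what divl would leave in %edx */
    return (uint32_t)(n / d);   /* what divl would leave in %eax */
}

/* The slower route for comparison: recover the remainder from the
 * quotient with an extra multiplication and subtraction. */
static uint32_t rem_by_mul(uint32_t hi, uint32_t lo, uint32_t d, uint32_t q)
{
    uint64_t n = ((uint64_t)hi << 32) | lo;

    return (uint32_t)(n - (uint64_t)q * d);
}
```

Both routes give the same remainder. The first simply avoids the extra
multiplication, which is the saving the PII comment above refers to.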
1999-07-31 23:27:41 +00:00