a2eb9688a4
actually be used, namely x86* platforms [because they don't bomb on unaligned access]. This resulted in 30-40% [depending on message length] improvement for SHA-256 compiled with gcc and running on P4. In the lack of assembler implementation I give the compiler all the help it can possibly get:-)
C2.pl works