ghash-x86_64.pl: minimize stack frame usage. ghash-x86.pl: modulo-scheduling MMX loop in respect to input vector results in up to 10% performance improvement.