94376cccb4
ARM has optimized Cortex-A5x pipeline to favour pairs of complementary AES instructions. While modified code improves performance of post-r0p0 Cortex-A53 performance by >40% (for CBC decrypt and CTR), it hurts original r0p0. We favour later revisions, because one can't prevent future from coming. Improvement on post-r0p0 Cortex-A57 exceeds 50%, while new code is not slower on r0p0, or Apple A7 for that matter. [Update even SHA results for latest Cortex-A53.] Reviewed-by: Richard Levitte <levitte@openssl.org>