algorithm suggested in following paper:
Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
Polynomial Multiplication on ARM Processors using the NEON Engine.
http://conradoplg.cryptoland.net/files/2010/12/mocrysen13.pdf
(cherry picked from commit f8cee9d08181f9e966ef01d3b69ba78b6cb7c8a8)