Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.
Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712