The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2