vpx/vpx_dsp
Johann f3c97ed32e subpel variance neon: reduce stack usage
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. The first pass loads its incoming values with vld1_u32(),
which typically does require 4 byte alignment. However, the first pass
operates on user-supplied values, so we must handle unaligned input
anyway (and already do, see mem_neon.h).
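
As a rough illustration of handling unaligned input, here is a portable C
sketch that reads 4 bytes through memcpy() instead of dereferencing a
uint32_t pointer. It is not the code in mem_neon.h; the helper names
load_u32_unaligned() and load_two_rows() are hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from a pointer that carries no
 * alignment guarantee. memcpy() avoids the undefined behavior of casting
 * an arbitrary uint8_t pointer to uint32_t* and dereferencing it. */
static uint32_t load_u32_unaligned(const uint8_t *src) {
  uint32_t v;
  assert(src != NULL);
  memcpy(&v, src, sizeof(v));
  return v;
}

/* Hypothetical helper: gather two 4-byte rows that are `stride` bytes
 * apart, as a first pass over a strided user buffer might do. */
static void load_two_rows(const uint8_t *src, int stride, uint32_t out[2]) {
  out[0] = load_u32_unaligned(src);
  out[1] = load_u32_unaligned(src + stride);
}
```

Neither src nor src + stride needs to be a multiple of 4 here, which is
the situation a strided, user-supplied buffer creates.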

The local temporary values, however, have no stride, so they are loaded
with vld1_u8(), which does not require 4 byte alignment.

There are 3 temporary buffers. In the C version, one of them is uint16_t.
The arm version saturates between passes but still passes the tests. If
this becomes an issue, new functions will be needed.
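
The trade-off can be sketched in portable C. This is not the NEON code:
the 64x64 block size, the (H + 1) * W temporary size, the 7-bit filter
taps summing to 128, and the names saturate_u8() and first_pass_u8() are
all assumptions made for illustration.

```c
#include <stdint.h>

enum { W = 64, H = 64 }; /* assumed block size, for illustration only */

/* Temporaries as they might be declared with 16-bit vs 8-bit elements. */
typedef uint16_t Temp16[(H + 1) * W];
typedef uint8_t Temp8[(H + 1) * W];

/* Clamp an intermediate value into the uint8_t range. */
static uint8_t saturate_u8(int v) {
  if (v < 0) return 0;
  if (v > 255) return 255;
  return (uint8_t)v;
}

/* Hypothetical first pass: horizontal 2-tap filter with rounding, storing
 * saturated 8-bit results. Each source row must hold w + 1 pixels. */
static void first_pass_u8(const uint8_t *src, int stride, int w, int h,
                          int f0, int f1, uint8_t *dst) {
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int sum = (src[c] * f0 + src[c + 1] * f1 + 64) >> 7;
      dst[c] = saturate_u8(sum);
    }
    src += stride;
    dst += w;
  }
}
```

Under these assumptions sizeof(Temp8) is half of sizeof(Temp16), which is
where a stack saving would come from when a temporary is narrowed.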

Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
2017-05-24 13:28:13 -07:00