generic-library/vpx

Author	SHA1	Message	Date
Mans Rullgard	4fa93bcef4	vp9: neon: use aligned stores in convolve functions. The destination is block-aligned, so it is safe to use aligned stores. Change-Id: I38261e4fa40bc60e6472edffece59e372908da7e	2013-08-16 14:25:08 +01:00
Johann	4417c04531	Merge "vp9: neon: optimise convolve8_vert functions"	2013-08-12 17:54:47 -07:00
Mans Rullgard	ad7021dd6c	vp9: neon: optimise convolve8_vert functions. Invert loops to operate vertically in the inner loop. This allows removing redundant loads. Also add preloading of data. Change-Id: I4fa85c0ab1735bcb1dd6ea58937efac949172bdc	2013-08-12 15:37:48 +01:00
Mans Rullgard	b84dc949c8	vp9: neon: optimise convolve8_horiz functions. Each iteration of the horizontal loop reuses 7 of the 11 source values. Loading only the 4 new values saves some time. Also add preload for source data. Overall 4% faster on Chromebook. Change-Id: I8f69e749f2b7f79e9734620dcee51dbfcd716b44	2013-08-11 16:21:55 +01:00
Mans Rullgard	355cb14dc7	vp9: neon: convolve: replace some insns with simpler equivalents. Change-Id: I5d6906772e6e6adf68d7f0fd5b8b5207a64a3a37	2013-08-02 08:11:28 -07:00
Mans Rullgard	2003468df8	vp9: neon: convolve: simplify branching to C fallbacks. Change-Id: Ic7cacd02d6dc9243ad8fc85082c5618a9d1e66dc	2013-08-02 08:11:25 -07:00
Mans Rullgard	5e2e78d024	vp9: neon: optimise loads in horiz convolve functions. Loading to single lanes in multiple registers is expensive since it requires a read and a write of each register, which saturates register file access. Loading to single registers followed by a separate transpose reduces this pressure. Change-Id: I4cc35887ddbca80e5e635b50d2b1d158de9668ee	2013-08-02 08:11:08 -07:00
Johann	158c80cbb0	convolve8 optimizations for neon. Independent horizontal and vertical implementations. Requires that blocks be built from 4x4 and [xy]_step_q4 == 16. 6-10% improvement. CIF improved the least. Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda	2013-07-11 11:08:19 -07:00

8 Commits
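The source reuse described in b84dc949c8 can be illustrated with a scalar C sketch. This is not the NEON code from that commit. It assumes x_step_q4 == 16 and a width that is a multiple of 4 (the constraints stated in 158c80cbb0), filter taps that sum to 128 (a FILTER_BITS of 7, assumed here), and a hypothetical function name, convolve8_horiz_row. Four outputs of an 8-tap filter read 4 + 8 - 1 = 11 source pixels, and the next group of four starts 4 pixels later, so 7 of those 11 values are carried over and only 4 are loaded.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8
#define FILTER_BITS 7 /* assumed: taps sum to 1 << FILTER_BITS */

static uint8_t clip_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* Hypothetical helper: 8-tap horizontal filter over one row of w pixels.
   Output x reads src[x - 3] .. src[x + 4]; w must be a multiple of 4. */
static void convolve8_horiz_row(const uint8_t *src, uint8_t *dst, int w,
                                const int16_t filter[TAPS]) {
  assert(w % 4 == 0);
  int win[11];                /* 4 outputs need 4 + 8 - 1 = 11 source values */
  const uint8_t *s = src - 3; /* first tap sits 3 pixels left of the output */
  for (int i = 0; i < 7; i++) win[i] = s[i]; /* prime the 7 reused values */
  for (int x = 0; x < w; x += 4) {
    /* Load only the 4 values this group has not seen yet. */
    for (int i = 0; i < 4; i++) win[7 + i] = s[x + 7 + i];
    for (int j = 0; j < 4; j++) {
      int sum = 0;
      for (int k = 0; k < TAPS; k++) sum += win[j + k] * filter[k];
      dst[x + j] = clip_u8((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
    }
    /* Keep the last 7 values; the next group of 4 reuses them. */
    for (int i = 0; i < 7; i++) win[i] = win[4 + i];
  }
}
```

The NEON version keeps the window in vector registers rather than an array, but the bookkeeping is the same: one 4-value load per group instead of 11.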
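The loop inversion in ad7021dd6c applies the same idea vertically. In the scalar C sketch below, the outer loop walks columns and the inner loop walks down rows, so each output row needs one new source value while the other 7 taps are kept from the previous iteration. The function name convolve8_vert_block and the FILTER_BITS of 7 are assumptions for illustration, and the sketch handles one column at a time where the NEON code would process several lanes in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 8
#define FILTER_BITS 7 /* assumed: taps sum to 1 << FILTER_BITS */

static uint8_t clip_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* Hypothetical helper: 8-tap vertical filter over a w x h block.
   Output row y reads source rows y - 3 .. y + 4. */
static void convolve8_vert_block(const uint8_t *src, ptrdiff_t src_stride,
                                 uint8_t *dst, ptrdiff_t dst_stride,
                                 int w, int h, const int16_t filter[TAPS]) {
  assert(w % 4 == 0 && h % 4 == 0); /* blocks built from 4x4 */
  for (int x = 0; x < w; x++) {     /* outer loop: columns */
    const uint8_t *s = src + x - 3 * src_stride; /* first tap: 3 rows up */
    int win[TAPS];
    for (int i = 0; i < TAPS - 1; i++) win[i] = s[i * src_stride];
    for (int y = 0; y < h; y++) {   /* inner loop: rows */
      win[TAPS - 1] = s[(y + TAPS - 1) * src_stride]; /* the only new load */
      int sum = 0;
      for (int k = 0; k < TAPS; k++) sum += win[k] * filter[k];
      dst[y * dst_stride + x] =
          clip_u8((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
      /* Slide the window down one row for the next output. */
      for (int i = 0; i < TAPS - 1; i++) win[i] = win[i + 1];
    }
  }
}
```

With rows in the outer loop instead, every output row would reload all 8 source rows, which is the redundancy the commit removes.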