vpx/vpx_dsp/arm
Johann d6a7489dd5 neon variance: process two rows of 8 at a time
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.

8x16 was using this technique already, but still improved a little bit
with the rewrite.

Also use this technique for vpx_get8x8var_neon.

BUG=webm:1422

Change-Id: Id602909afcec683665536d11298b7387ac0a1207
2017-05-04 08:59:46 -07:00
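
The two-rows-at-a-time idea from the commit message above can be modelled in portable C. This is a scalar sketch under stated assumptions, not the code in variance_neon.c: the names `sum_sse_w8_two_rows` and `variance_w8` are hypothetical, and the 16-byte scratch buffers only stand in for a 128-bit vector register. In a NEON build the two 8-byte rows would presumably be joined into one 16-lane register so that every lane does useful work.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libvpx function): accumulates the sum and the
 * sum of squared differences for an 8-wide block, two rows per iteration.
 * The 16-byte buffers model one 128-bit register holding two 8-pixel rows. */
static void sum_sse_w8_two_rows(const uint8_t *a, int a_stride,
                                const uint8_t *b, int b_stride, int h,
                                uint32_t *sse, int *sum) {
  int i, j;
  assert(h > 0 && h % 2 == 0); /* rows are consumed in pairs */
  *sum = 0;
  *sse = 0;
  for (i = 0; i < h; i += 2) {
    uint8_t a16[16], b16[16];
    /* Row i goes in the low half, row i + 1 in the high half. */
    memcpy(a16, a, 8);
    memcpy(a16 + 8, a + a_stride, 8);
    memcpy(b16, b, 8);
    memcpy(b16 + 8, b + b_stride, 8);
    for (j = 0; j < 16; ++j) {
      const int diff = a16[j] - b16[j];
      *sum += diff;
      *sse += (uint32_t)(diff * diff);
    }
    a += 2 * a_stride;
    b += 2 * b_stride;
  }
}

/* Hypothetical wrapper: variance = sse - sum^2 / (width * height). */
static uint32_t variance_w8(const uint8_t *a, int a_stride, const uint8_t *b,
                            int b_stride, int h) {
  uint32_t sse;
  int sum;
  sum_sse_w8_two_rows(a, a_stride, b, b_stride, h, &sse, &sum);
  return sse - (uint32_t)(((int64_t)sum * sum) / (8 * h));
}
```

With one row per iteration an 8-wide block fills only half of a 128-bit register, which is consistent with the commit's report that the smallest block (8x4) gains the most from pairing rows.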
avg_neon.c satd highbd neon: use tran_low_t for coeff 2017-02-01 11:55:47 -08:00
deblock_neon.c postproc: vpx_mbpost_proc_down_neon 2017-01-09 10:21:56 -08:00
fwd_txfm_neon.c fdct8x8 highbd neon: use tran_low_t for output 2017-02-13 22:16:14 +00:00
hadamard_neon.c hadamard highbd neon: use tran_low_t for coeff 2017-02-01 11:50:46 -08:00
highbd_idct4x4_add_neon.c Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts 2017-02-14 13:08:41 -08:00
highbd_idct8x8_add_neon.c Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts 2017-02-14 13:08:41 -08:00
highbd_idct16x16_add_neon.c idct_neon: prefix non-static functions w/'vpx_' 2017-03-22 11:49:23 -07:00
highbd_idct32x32_34_add_neon.c Update 32x32 high bitdepth idct NEON optimization 2017-04-05 15:28:11 -07:00
highbd_idct32x32_135_add_neon.c Update 32x32 high bitdepth idct NEON optimization 2017-04-05 15:28:11 -07:00
highbd_idct32x32_1024_add_neon.c idct_neon: prefix non-static functions w/'vpx_' 2017-03-22 11:49:23 -07:00
highbd_idct32x32_add_neon.c Add vpx_highbd_idct{16x16,32x32}_1_add_neon() 2017-02-13 10:25:22 -08:00
highbd_intrapred_neon.c Add high bitdepth intra prediction NEON optimization (mode tm) 2016-11-15 14:19:46 -08:00
highbd_loopfilter_neon.c cosmetics,*loopfilter_neon.c: s/tranpose/transpose/ 2016-10-12 16:12:56 -07:00
highbd_vpx_convolve8_neon.c Update highbd convolve functions arguments to use uint16_t src/dst 2017-04-25 14:22:19 -07:00
highbd_vpx_convolve_avg_neon.c Update highbd convolve functions arguments to use uint16_t src/dst 2017-04-25 14:22:19 -07:00
highbd_vpx_convolve_copy_neon.c Update highbd convolve functions arguments to use uint16_t src/dst 2017-04-25 14:22:19 -07:00
highbd_vpx_convolve_neon.c Update highbd convolve functions arguments to use uint16_t src/dst 2017-04-25 14:22:19 -07:00
idct4x4_1_add_neon.asm Cosmetics by unifying dest_stride to stride in idct 2016-12-12 15:13:22 -08:00
idct4x4_1_add_neon.c Clean DC only idct NEON intrinsics 2016-12-28 13:51:44 -08:00
idct4x4_add_neon.asm Cosmetics by unifying dest_stride to stride in idct 2016-12-12 15:13:22 -08:00
idct4x4_add_neon.c Cosmetics by unifying dest_stride to stride in idct 2016-12-12 15:13:22 -08:00
idct8x8_1_add_neon.c Clean DC only idct NEON intrinsics 2016-12-28 13:51:44 -08:00
idct8x8_add_neon.c Add high bitdepth 8x8 idct NEON intrinsics 2016-12-27 16:28:53 -08:00
idct16x16_1_add_neon.c Clean DC only idct NEON intrinsics 2016-12-28 13:51:44 -08:00
idct16x16_add_neon.c idct_neon: prefix non-static functions w/'vpx_' 2017-03-22 11:49:23 -07:00
idct32x32_1_add_neon.c Clean DC only idct NEON intrinsics 2016-12-28 13:51:44 -08:00
idct32x32_34_add_neon.c Update 32x32 high bitdepth idct NEON optimization 2017-04-05 15:28:11 -07:00
idct32x32_135_add_neon.c Update 32x32 high bitdepth idct NEON optimization 2017-04-05 15:28:11 -07:00
idct32x32_add_neon.c idct_neon: prefix non-static functions w/'vpx_' 2017-03-22 11:49:23 -07:00
idct_neon.asm enable vpx_idct16x16_10_add_neon in hbd builds 2016-12-06 16:09:19 -08:00
idct_neon.h Update 32x32 high bitdepth idct NEON optimization 2017-04-05 15:28:11 -07:00
intrapred_neon_asm.asm Replace prefix vp9_ with vpx_ for intra prediction functions 2015-07-27 13:42:06 -07:00
intrapred_neon.c Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction" 2016-11-22 23:20:53 +00:00
loopfilter_4_neon.asm Check in vpx_lpf_vertical_4_dual_neon() assembly 2016-12-02 15:54:30 -08:00
loopfilter_8_neon.asm NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon() 2016-08-16 08:50:57 -07:00
loopfilter_16_neon.asm Refactor vpx lpf NEON files (step 2/2) 2016-09-30 09:56:28 -07:00
loopfilter_neon.c cosmetics,*loopfilter_neon.c: s/tranpose/transpose/ 2016-10-12 16:12:56 -07:00
sad4d_neon.c vpx_dsp: apply clang-format 2016-07-25 14:14:19 -07:00
sad_neon.c vpx_dsp: apply clang-format 2016-07-25 14:14:19 -07:00
save_reg_neon.asm replace by VSTM/VLDM to reduce one of VST1/VLD1 2016-07-28 23:01:38 +00:00
subpel_variance_neon.c vpx_dsp: apply clang-format 2016-07-25 14:14:19 -07:00
subtract_neon.c vpx_dsp: apply clang-format 2016-07-25 14:14:19 -07:00
transpose_neon.h Add vpx_highbd_idct32x32_135_add_neon() 2017-03-16 22:37:55 -07:00
variance_neon.c neon variance: process two rows of 8 at a time 2017-05-04 08:59:46 -07:00
vpx_convolve8_avg_neon_asm.asm VPX: removed step checks from neon convolve code 2015-08-12 16:46:53 -07:00
vpx_convolve8_neon_asm.asm VPX: removed step checks from neon convolve code 2015-08-12 16:46:53 -07:00
vpx_convolve8_neon.c add vpx high bitdepth convolve8 NEON intrinsics optimization 2016-10-17 15:23:54 -07:00
vpx_convolve_avg_neon_asm.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_convolve_avg_neon.c Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon() 2016-09-29 16:19:39 -07:00
vpx_convolve_copy_neon_asm.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_convolve_copy_neon.c Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon() 2016-09-29 16:19:39 -07:00
vpx_convolve_neon.c add vpx high bitdepth convolve8 NEON intrinsics optimization 2016-10-17 15:23:54 -07:00