Commit Graph

27 Commits

Author SHA1 Message Date
Johann
42ce25821d remove DECLARE_ALIGNED from neon code
Unlike x86, NEON only requires type alignment when loading into vectors.

Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f
2017-05-26 10:41:57 -07:00
Johann
1088b4f87c move neon load/stores to a new file
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.

Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
2017-05-15 08:29:43 -07:00
Linfeng Zhang
6fc2e57c2c Update 32x32 high bitdepth idct NEON optimization
Preparation for CONVERT_TO_BYTEPTR/SHORTPTR cleanup.

BUG=webm:1388

Change-Id: I928d30a5698023bb90888d783cf81c51ec183760
2017-04-05 15:28:11 -07:00
James Zern
f91c3bb3ab idct_neon: prefix non-static functions w/'vpx_'
Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 11:49:23 -07:00
Linfeng Zhang
27530d484e Add vpx_highbd_idct32x32_1024_add_neon()
BUG=webm:1301

Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 00:27:46 -07:00
Linfeng Zhang
50b13f75b8 Add vpx_highbd_idct32x32_34_add_neon()
BUG=webm:1301

Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 00:27:46 -07:00
Linfeng Zhang
65e9fb65e8 Add vpx_highbd_idct32x32_135_add_neon()
BUG=webm:1301

Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-16 22:37:55 -07:00
Linfeng Zhang
c756eb01c8 Fix overflow issue in 32x32 idct NEON intrinsics
Similar issue to Change bc1c18e.

The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes a 16-bit overflow in the final stage of
pass 2 when the test count is raised from 1,000 to 1,000,000.

Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.

Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
2017-03-14 16:59:14 -07:00
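
To illustrate the overflow that c756eb01c8 describes, here is a minimal scalar sketch in portable C of the difference between a wrapping and a saturating 16-bit add/sub. It is not the code from the commit: the function names are hypothetical, and the commit message does not say which NEON intrinsics were used (the saturating family such as vqaddq_s16()/vqsubq_s16() is the usual choice, but that is an assumption here).

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit add that wraps on overflow, which is what a
 * plain (non-saturating) per-lane vector add does. The final conversion to
 * int16_t is implementation-defined in C, but wraps on common compilers. */
static int16_t add_wrap_s16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Hypothetical helper: 16-bit add that clamps to the int16_t range instead
 * of wrapping. The sum is computed in 32 bits so it cannot overflow. */
static int16_t add_sat_s16(int16_t a, int16_t b) {
  const int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* Hypothetical helper: saturating 16-bit subtract, same idea as above. */
static int16_t sub_sat_s16(int16_t a, int16_t b) {
  const int32_t diff = (int32_t)a - (int32_t)b;
  if (diff > INT16_MAX) return INT16_MAX;
  if (diff < INT16_MIN) return INT16_MIN;
  return (int16_t)diff;
}
```

With inputs that stay in range, both adds agree. Once the true sum leaves the 16-bit range, the wrapping version flips sign while the saturating version pins the result at the nearest limit, which is the behaviour the commit switches to for the final stage of pass 2.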
Linfeng Zhang
3cf5c213f1 cosmetics,dsp/arm/: rename a variable
Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.

Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-08 08:55:41 -08:00
Linfeng Zhang
0620081731 Add vpx_highbd_idct16x16_10_add_neon()
BUG=webm:1301

Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-16 15:13:50 -08:00
Linfeng Zhang
81914ce68a Add vpx_highbd_idct16x16_38_add_neon()
BUG=webm:1301

Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
2017-02-15 09:12:02 -08:00
Linfeng Zhang
429e652809 Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts
Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
2017-02-14 13:08:41 -08:00
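
As a rough illustration of the shift that 429e652809 touches, the following scalar sketch shows a rounding right shift by DCT_CONST_BITS. The value 14 comes from the commit title; the helper name and the scalar form are assumptions for illustration, not the NEON code that the commit changes.

```c
#include <stdint.h>

/* Value taken from the commit title: the literal 14 is replaced by this
 * named constant. */
#define DCT_CONST_BITS 14

/* Hypothetical helper: divide by 2^DCT_CONST_BITS with rounding to nearest,
 * by adding half of the divisor before shifting. Intended here for
 * non-negative inputs; right-shifting a negative value is
 * implementation-defined in C. */
static int32_t round_shift_dct(int32_t x) {
  return (x + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}
```

Naming the constant keeps every shift in the idct functions tied to one definition, so the fixed-point precision is stated once instead of being repeated as a bare number.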
Linfeng Zhang
5ad4159ebb Add vpx_highbd_idct16x16_256_add_neon()
BUG=webm:1301

Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 15:50:33 -08:00
Johann
1eb8a718bf hadamard highbd neon: use tran_low_t for coeff
BUG=webm:1365

Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
2017-02-01 11:50:46 -08:00
Linfeng Zhang
6abdd31555 Refine 8-bit 16x16 idct NEON intrinsics
Speed tests show a 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() tripled in speed.

Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Linfeng Zhang
2d12a52ff0 Merge "Add high bitdepth 8x8 idct NEON intrinsics"
2017-01-06 16:47:23 +00:00
Linfeng Zhang
911bb980b1 Clean DC only idct NEON intrinsics
BUG=webm:1301

Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df Add high bitdepth 8x8 idct NEON intrinsics
BUG=webm:1301

Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Johann
2c24f7178d Move load_and_transpose to transpose_neon.h
Allows for use outside the idcts without pulling in idct_neon.h

Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
228c9940ea Merge changes Ibad079f2,I7858a0a1
* changes:
  enable vpx_idct16x16_10_add_neon in hbd builds
  idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089 enable vpx_idct16x16_10_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
Linfeng Zhang
cb339d628f Refine 8-bit 8x8 idct NEON intrinsics
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
17a8cf5cc3 Add high bitdepth 4x4 idct NEON intrinsics
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
James Zern
21a1abd8e3 enable vpx_idct32x32_135_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63 idct_neon: rename load_tran_low_to_s16 -> ...s16q
BUG=webm:1294

Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
Johann
bf8ab194ee arm idct: move to-be-shared code to header
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
3ae25974fd idct,NEON: add a tran_low_t->s16 load adapter
Enable idct4x4* and idct8x8*, which are compatible with 8-bit decodes in
high-bitdepth mode. The adapter narrows 32-bit input to 16 bits; whether
the expansion can be avoided at all in this case remains a TODO. Roughly
matches sse2.

BUG=webm:1294

Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
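
The load adapter described in 3ae25974fd can be sketched in scalar C as follows. This is an assumption-laden illustration, not the commit's code: it assumes a high-bitdepth build in which tran_low_t is 32 bits wide, and the function name is hypothetical (the real adapter is a NEON load, later renamed per 568d4b1d63).

```c
#include <stdint.h>

/* Assumption: high-bitdepth build, where coefficients are stored as
 * 32-bit values. */
typedef int32_t tran_low_t;

/* Hypothetical helper: read eight 32-bit coefficients and narrow each one
 * to 16 bits so that an existing 16-bit idct can consume them. The cast is
 * exact only when every value already fits in int16_t, which is the
 * "8-bit decodes in high-bitdepth mode" case named in the commit. */
static void narrow_tran_low_x8(const tran_low_t *coeff, int16_t out[8]) {
  int i;
  for (i = 0; i < 8; ++i) {
    out[i] = (int16_t)coeff[i];
  }
}
```

The adapter lets the 8-bit idct kernels stay unchanged; the cost is the extra narrowing step on load, which is the expansion the commit message flags as a TODO.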