138 Commits

Author SHA1 Message Date
Linfeng Zhang
6d5a3fe583 Clean idct 8x8 neon functions
BUG=webm:1301

Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
James Zern
a68b36c752 vpx_idct32x32_1024_add_neon: quiet uninitialized warning
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds

+ give the variable a more descriptive name

BUG=webm:1294

Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
Linfeng Zhang
7e23f895ca Merge "Clean hbd idct 4x4 neon functions and other" 2016-12-19 17:09:26 +00:00
Johann
41b0888a84 postproc: neon down and across macroblock filter
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.

BUG=webm:1320

Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
Linfeng Zhang
c8f25fa5c0 Clean hbd idct 4x4 neon functions and other
BUG=webm:1301

Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
James Zern
86e340c76e enable vpx_idct32x32_1024_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6 Cosmetics by unifying dest_stride to stride in idct
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d Move load_and_transpose to transpose_neon.h
Allows for use outside the idcts without pulling in idct_neon.h

Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2 idct16x16_add_neon: fix arm visual studio builds
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds

reorder INCLUDEs and fix indent of IF/ENDIFs

remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.

Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e Merge "Update idct NEON optimization to not use narrowing saturating shift" 2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4 Merge "enable vpx_idct16x16_256_add_neon in hbd builds" 2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1 Update idct NEON optimization to not use narrowing saturating shift
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac enable vpx_idct16x16_256_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea Merge changes Ibad079f2,I7858a0a1
* changes:
  enable vpx_idct16x16_10_add_neon in hbd builds
  idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089 enable vpx_idct16x16_10_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb idct16x16,NEON: rm output_stride from pass1 fns
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases, meaning the results are stored
contiguously; this allows the number of stores to be reduced.

Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
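Note on af9d7aa9fb: the reasoning in that message (a stride equal to the row width means the rows sit back to back in memory) can be illustrated with a small scalar sketch. This is not the libvpx code. The function names, the `kRowLen` constant, and the assumption that each row is 8 elements wide are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumed row width for this sketch; it matches the constant stride of 8. */
enum { kRowLen = 8 };

/* General form: one store per row, with rows `stride` elements apart. */
static void store_rows_strided(int16_t *out, int stride, const int16_t *rows,
                               int num_rows) {
  for (int r = 0; r < num_rows; ++r) {
    memcpy(out + r * stride, rows + r * kRowLen, kRowLen * sizeof(*out));
  }
}

/* When stride == kRowLen the rows are adjacent, so a single larger store
 * covers all of them and the stride parameter is unnecessary. */
static void store_rows_contiguous(int16_t *out, const int16_t *rows,
                                  int num_rows) {
  memcpy(out, rows, (size_t)num_rows * kRowLen * sizeof(*out));
}
```

With a stride of 8 both functions write identical output, which is why the parameter could be dropped.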
Linfeng Zhang
cb339d628f Refine 8-bit 8x8 idct NEON intrinsics
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43 Check in vpx_lpf_vertical_4_dual_neon() assembly
This replaces its C version.

Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da Merge changes I4afc130e,Iaa64d23f
* changes:
  Add high bitdepth 4x4 idct NEON intrinsics
  Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3 Add high bitdepth 4x4 idct NEON intrinsics
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
James Zern
c6641782c3 idct16x16,NEON,cosmetics: normalize fn signatures
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2

Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
21a1abd8e3 enable vpx_idct32x32_135_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63 idct_neon: rename load_tran_low_to_s16 -> ...s16q
BUG=webm:1294

Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998 Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
* changes:
  Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
  Refine 8-bit 4x4 idct NEON intrinsics
  Add idct speed test.
  Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Linfeng Zhang
05e2b5a59f Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction" 2016-11-22 23:20:53 +00:00
Linfeng Zhang
6cc76ec73f Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
Linfeng Zhang
974e81d184 Refine 8-bit 4x4 idct NEON intrinsics
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00
James Zern
cbeae53e76 Merge "Clean horizontal intra prediction NEON optimization" 2016-11-19 01:29:37 +00:00
Linfeng Zhang
85c1ee434d Add high bitdepth intra prediction NEON optimization (mode tm)
BUG=webm:1316

Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Linfeng Zhang
a3128ad33a Add high bitdepth intra prediction NEON optimization (h and v)
BUG=webm:1316

Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7 Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
* changes:
  fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
  fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
  partial_idct_test,NEON: add missing idct variants
  enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4 Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
BUG=webm:1316

Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6 enable vpx_idct32x32_34_add_neon in hbd builds
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.

BUG=webm:1294

Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c Optimize idct32x32_135_add for NEON
BUG=webm:1295

Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f Merge "Add high bitdepth intra prediction NEON optimization (mode dc)" 2016-11-08 16:53:42 +00:00
Martin Storsjo
34c35b6fb6 Add a missing END directive in idct_neon.asm
This fixes building with MS armasm.

Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Linfeng Zhang
1338c71dfb Clean horizontal intra prediction NEON optimization
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Linfeng Zhang
1868582e7d Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Johann Koenig
5ac7a59a05 Merge "arm idct: move to-be-shared code to header" 2016-11-02 18:09:45 +00:00
Linfeng Zhang
3b74066b10 Add high bitdepth intra prediction NEON optimization (mode dc)
BUG=webm:1316

Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee arm idct: move to-be-shared code to header
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898 Merge "idct32x32_1_add_neon: clear a couple conv warnings" 2016-11-01 22:34:59 +00:00
James Zern
9de91855ef Merge changes I08af3a54,If5959a25,I6763e62e
* changes:
  build/make/Android.mk: s/armv8/arm64/
  build/make/Android.mk: fix armeabi-v7a build
  use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
cc5f49767a Refine 8-bit intra prediction NEON optimization (mode tm)
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
James Zern
7625c803b3 idct32x32_1_add_neon: clear a couple conv warnings
int16_t -> uint8_t

Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
1ddb4c0362 use .S suffix rather than .s for NEON asm
for compatibility with other build systems

Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
James Zern
410d947c5f Merge "idct,NEON: add a tran_low_t->s16 load adapter" 2016-10-31 21:59:12 +00:00
James Zern
3ae25974fd idct,NEON: add a tran_low_t->s16 load adapter
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16; whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.

BUG=webm:1294

Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
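Note on 3ae25974fd: the narrowing described in that message can be sketched in scalar C. This is an illustration only, not the NEON adapter itself. `coeff32_t` is a hypothetical stand-in for the 32-bit coefficient type used in high-bitdepth builds, and `narrow_coeffs_to_s16` is an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit coefficient type. */
typedef int32_t coeff32_t;

/* Narrow each 32-bit coefficient to 16 bits so that a 16-bit transform can
 * consume it. This sketch assumes every value already fits in int16_t, as
 * the message implies for 8-bit decodes; it does not saturate. */
static void narrow_coeffs_to_s16(const coeff32_t *in, int16_t *out, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = (int16_t)in[i];
  }
}
```

A vector version would do the same narrowing several lanes at a time.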
Linfeng Zhang
a347118f3c Refine 8-bit intra prediction NEON optimization (mode h and v)
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Linfeng Zhang
4ae9f5c092 Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
dst += stride behaving better with gcc/clang.
Unroll loops.

Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00