Linfeng Zhang
81914ce68a
Add vpx_highbd_idct16x16_38_add_neon()
...
BUG=webm:1301
Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
2017-02-15 09:12:02 -08:00
Linfeng Zhang
429e652809
Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts
...
Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
2017-02-14 13:08:41 -08:00
Linfeng Zhang
de9ae32b93
Merge "Add vpx_highbd_idct16x16_256_add_neon()"
2017-02-14 01:15:34 +00:00
Linfeng Zhang
5ad4159ebb
Add vpx_highbd_idct16x16_256_add_neon()
...
BUG=webm:1301
Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 15:50:33 -08:00
Johann
5ecde212a8
fdct8x8 highbd neon: use tran_low_t for output
...
Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba
2017-02-13 22:16:14 +00:00
Linfeng Zhang
016933ad48
Add vpx_highbd_idct{16x16,32x32}_1_add_neon()
...
and update vpx_highbd_idct8x8_1_add_neon()
BUG=webm:1301
Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
2017-02-13 10:25:22 -08:00
Linfeng Zhang
bc1c18e18c
Add vpx_idct16x16_38_add_neon()
...
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
2017-02-08 12:15:22 -08:00
Linfeng Zhang
66695533a8
Merge "Update 16x16 8-bit idct NEON intrinsics"
2017-02-07 16:52:40 +00:00
Johann Koenig
726556dde9
Merge "Remove neon assembly for idct 16x16 and 8x8"
2017-02-02 03:25:31 +00:00
Johann Koenig
ce6318f254
Merge changes I43521ad3,I013659f6
...
* changes:
satd highbd neon: use tran_low_t for coeff
satd highbd sse2: use tran_low_t for coeff
2017-02-02 03:03:58 +00:00
Linfeng Zhang
e4985cf619
Update 16x16 8-bit idct NEON intrinsics
...
Remove redundant memory accesses.
Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8
2017-02-01 17:04:33 -08:00
Johann
f8d744d91a
satd highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
2017-02-01 11:55:47 -08:00
Johann
1eb8a718bf
hadamard highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
2017-02-01 11:50:46 -08:00
Johann
13234d3c43
Remove neon assembly for idct 16x16 and 8x8
...
Tested using test/partial_idct_test.cc:DISABLED_Speed
Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements
using the intrinsics:
<function> <clang asm> <gcc asm> <clang intrin> <gcc intrin>
idct16x16_256 1720ms 1703ms 1546ms 1554ms
idct16x16_10 1320ms 1247ms 518ms 488ms
idct16x16_1 107ms 108ms 64ms 68ms
idct8x8_64 924ms 931ms 866ms 989ms
idct8x8_12 826ms 824ms 519ms 514ms
idct8x8_1 172ms 166ms 110ms 125ms
idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics)
but as a counter example idct16x16_10 goes from ~1300ms to ~500ms
On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly
stable.
BUG=webm:1303
Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4
2017-01-19 12:27:31 -08:00
Johann
68d0f46ec0
arm idct16x16: remove extra config guards
...
This file is guarded by HAVE_NEON_ASM in the .mk file now.
Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a
2017-01-11 10:17:14 -08:00
James Zern
9480da21e8
Merge "Refine 8-bit 16x16 idct NEON intrinsics"
2017-01-09 23:52:29 +00:00
Johann
c23970ec25
postproc: vpx_mbpost_proc_down_neon
...
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630
Merge "postproc: vpx_mbpost_proc_across_ip_neon"
2017-01-09 18:17:26 +00:00
Linfeng Zhang
6abdd31555
Refine 8-bit 16x16 idct NEON intrinsics
...
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() got trippled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Johann
4dca923454
postproc: vpx_mbpost_proc_across_ip_neon
...
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Linfeng Zhang
2d12a52ff0
Merge "Add high bitdepth 8x8 idct NEON intrinsics"
2017-01-06 16:47:23 +00:00
Linfeng Zhang
911bb980b1
Clean DC only idct NEON intrinsics
...
BUG=webm:1301
Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df
Add high bitdepth 8x8 idct NEON intrinsics
...
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Linfeng Zhang
6d5a3fe583
Clean idct 8x8 neon functions
...
BUG=webm:1301
Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
James Zern
a68b36c752
vpx_idct32x32_1024_add_neon: quiet uninitialized warning
...
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
Linfeng Zhang
7e23f895ca
Merge "Clean hbd idct 4x4 neon functions and other"
2016-12-19 17:09:26 +00:00
Johann
41b0888a84
postproc: neon down and across macroblock filter
...
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
Linfeng Zhang
c8f25fa5c0
Clean hbd idct 4x4 neon functions and other
...
BUG=webm:1301
Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
James Zern
86e340c76e
enable vpx_idct32x32_1024_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6
Cosmetics by unifying dest_stride to stride in idct
...
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d
Move load_and_transpose to transpose_neon.h
...
Allows for use outside the idcts without pulling in idct_neon.h
Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2
idct16x16_add_neon: fix arm visual studio builds
...
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e
Merge "Update idct NEON optimization to not use narrowing saturating shift"
2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4
Merge "enable vpx_idct16x16_256_add_neon in hbd builds"
2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1
Update idct NEON optimization to not use narrowing saturating shift
...
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac
enable vpx_idct16x16_256_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea
Merge changes Ibad079f2,I7858a0a1
...
* changes:
enable vpx_idct16x16_10_add_neon in hbd builds
idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089
enable vpx_idct16x16_10_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb
idct16x16,NEON: rm output_stride from pass1 fns
...
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f
Refine 8-bit 8x8 idct NEON intrinsics
...
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43
Check in vpx_lpf_vertical_4_dual_neon() assembly
...
This replaces its C version.
Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da
Merge changes I4afc130e,Iaa64d23f
...
* changes:
Add high bitdepth 4x4 idct NEON intrinsics
Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3
Add high bitdepth 4x4 idct NEON intrinsics
...
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
James Zern
c6641782c3
idct16x16,NEON,cosmetics: normalize fn signatures
...
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2
Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
21a1abd8e3
enable vpx_idct32x32_135_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63
idct_neon: rename load_tran_low_to_s16 -> ...s16q
...
BUG=webm:1294
Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998
Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
...
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Linfeng Zhang
05e2b5a59f
Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"
2016-11-22 23:20:53 +00:00
Linfeng Zhang
6cc76ec73f
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
...
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
Linfeng Zhang
974e81d184
Refine 8-bit 4x4 idct NEON intrinsics
...
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00