Linfeng Zhang
5d4aa325a6
Cosmetics by unifying dest_stride to stride in idct
...
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d
Move load_and_transpose to transpose_neon.h
...
Allows for use outside the idcts without pulling in idct_neon.h
Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2
idct16x16_add_neon: fix arm visual studio builds
...
after:
2d3d95f
enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e
Merge "Update idct NEON optimization to not use narrowing saturating shift"
2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4
Merge "enable vpx_idct16x16_256_add_neon in hbd builds"
2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1
Update idct NEON optimization to not use narrowing saturating shift
...
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac
enable vpx_idct16x16_256_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea
Merge changes Ibad079f2,I7858a0a1
...
* changes:
enable vpx_idct16x16_10_add_neon in hbd builds
idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089
enable vpx_idct16x16_10_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb
idct16x16,NEON: rm output_stride from pass1 fns
...
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f
Refine 8-bit 8x8 idct NEON intrinsics
...
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43
Check in vpx_lpf_vertical_4_dual_neon() assembly
...
This replaces its C version.
Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da
Merge changes I4afc130e,Iaa64d23f
...
* changes:
Add high bitdepth 4x4 idct NEON intrinsics
Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3
Add high bitdepth 4x4 idct NEON intrinsics
...
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
James Zern
c6641782c3
idct16x16,NEON,cosmetics: normalize fn signatures
...
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2
Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
21a1abd8e3
enable vpx_idct32x32_135_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63
idct_neon: rename load_tran_low_to_s16 -> ...s16q
...
BUG=webm:1294
Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998
Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
...
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Linfeng Zhang
05e2b5a59f
Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"
2016-11-22 23:20:53 +00:00
Linfeng Zhang
6cc76ec73f
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
...
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
Linfeng Zhang
974e81d184
Refine 8-bit 4x4 idct NEON intrinsics
...
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00
James Zern
cbeae53e76
Merge "Clean horizontal intra prediction NEON optimization"
2016-11-19 01:29:37 +00:00
Linfeng Zhang
85c1ee434d
Add high bitdepth intra prediction NEON optimization (mode tm)
...
BUG=webm:1316
Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Linfeng Zhang
a3128ad33a
Add high bitdepth intra prediction NEON optimization (h and v)
...
BUG=webm:1316
Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7
Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
...
* changes:
fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
partial_idct_test,NEON: add missing idct variants
enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4
Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
...
BUG=webm:1316
Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6
enable vpx_idct32x32_34_add_neon in hbd builds
...
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c
Optimize idct32x32_135_add for NEON
...
BUG=webm:1295
Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f
Merge "Add high bitdepth intra prediction NEON optimization (mode dc)"
2016-11-08 16:53:42 +00:00
Martin Storsjo
34c35b6fb6
Add a missing END directive in idct_neon.asm
...
This fixes building with MS armasm.
Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Linfeng Zhang
1338c71dfb
Clean horizontal intra prediction NEON optimization
...
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Linfeng Zhang
1868582e7d
Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
...
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Johann Koenig
5ac7a59a05
Merge "arm idct: move to-be-shared code to header"
2016-11-02 18:09:45 +00:00
Linfeng Zhang
3b74066b10
Add high bitdepth intra prediction NEON optimization (mode dc)
...
BUG=webm:1316
Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee
arm idct: move to-be-shared code to header
...
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898
Merge "idct32x32_1_add_neon: clear a couple conv warnings"
2016-11-01 22:34:59 +00:00
James Zern
9de91855ef
Merge changes I08af3a54,If5959a25,I6763e62e
...
* changes:
build/make/Android.mk: s/armv8/arm64/
build/make/Android.mk: fix armeabi-v7a build
use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
cc5f49767a
Refine 8-bit intra prediction NEON optimization (mode tm)
...
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
James Zern
7625c803b3
idct32x32_1_add_neon: clear a couple conv warnings
...
int16_t -> uint8_t
Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
1ddb4c0362
use .S suffix rather than .s for NEON asm
...
for compatibility with other build systems
Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
James Zern
410d947c5f
Merge "idct,NEON: add a tran_low_t->s16 load adapter"
2016-10-31 21:59:12 +00:00
James Zern
3ae25974fd
idct,NEON: add a tran_low_t->s16 load adapter
...
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
BUG=webm:1294
Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
Linfeng Zhang
a347118f3c
Refine 8-bit intra prediction NEON optimization (mode h and v)
...
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Linfeng Zhang
4ae9f5c092
Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
...
dst += stride behaving better with gcc/clang.
Unroll loops.
Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00
Linfeng Zhang
9c0680bd43
Merge "Refine 8-bit intra prediction NEON optimization (mode dc)"
2016-10-26 16:51:44 +00:00
Johann
9720b58aac
Optimize idct32x32_34_add for NEON
...
Approximately 3 times faster than the 1024 version which was used
previously.
BUG=webm:1295
Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
Linfeng Zhang
ce88b8f5c5
Refine 8-bit intra prediction NEON optimization (mode dc)
...
dst += stride behaving better with gcc/clang
Expanding inline function dc_SIZExSIZE() save intructions for
vpx_dc_predictor_SIZExSIZE_neon().
Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-24 13:18:51 -07:00
James Zern
2e6a1976a0
Merge "remove idct32x32*_add_neon.asm"
2016-10-22 02:29:56 +00:00
James Zern
5d91752a98
Merge "vpx_highbd_convolve_copy_neon: use multi reg loads"
2016-10-22 02:28:15 +00:00
James Zern
9dbb3ad396
remove idct32x32*_add_neon.asm
...
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.
BUG=webm:1303
Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-20 19:47:14 -07:00