Linfeng Zhang
66695533a8
Merge "Update 16x16 8-bit idct NEON intrinsics"
2017-02-07 16:52:40 +00:00
Johann Koenig
726556dde9
Merge "Remove neon assembly for idct 16x16 and 8x8"
2017-02-02 03:25:31 +00:00
Johann Koenig
ce6318f254
Merge changes I43521ad3,I013659f6
...
* changes:
satd highbd neon: use tran_low_t for coeff
satd highbd sse2: use tran_low_t for coeff
2017-02-02 03:03:58 +00:00
Linfeng Zhang
e4985cf619
Update 16x16 8-bit idct NEON intrinsics
...
Remove redundant memory accesses.
Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8
2017-02-01 17:04:33 -08:00
Johann
f8d744d91a
satd highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
2017-02-01 11:55:47 -08:00
Johann
1eb8a718bf
hadamard highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
2017-02-01 11:50:46 -08:00
Johann
13234d3c43
Remove neon assembly for idct 16x16 and 8x8
...
Tested using test/partial_idct_test.cc:DISABLED_Speed
Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements
using the intrinsics:
<function> <clang asm> <gcc asm> <clang intrin> <gcc intrin>
idct16x16_256 1720ms 1703ms 1546ms 1554ms
idct16x16_10 1320ms 1247ms 518ms 488ms
idct16x16_1 107ms 108ms 64ms 68ms
idct8x8_64 924ms 931ms 866ms 989ms
idct8x8_12 826ms 824ms 519ms 514ms
idct8x8_1 172ms 166ms 110ms 125ms
idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics)
but as a counter example idct16x16_10 goes from ~1300ms to ~500ms
On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly
stable.
BUG=webm:1303
Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4
2017-01-19 12:27:31 -08:00
Johann
68d0f46ec0
arm idct16x16: remove extra config guards
...
This file is guarded by HAVE_NEON_ASM in the .mk file now.
Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a
2017-01-11 10:17:14 -08:00
James Zern
9480da21e8
Merge "Refine 8-bit 16x16 idct NEON intrinsics"
2017-01-09 23:52:29 +00:00
Johann
c23970ec25
postproc: vpx_mbpost_proc_down_neon
...
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630
Merge "postproc: vpx_mbpost_proc_across_ip_neon"
2017-01-09 18:17:26 +00:00
Linfeng Zhang
6abdd31555
Refine 8-bit 16x16 idct NEON intrinsics
...
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() got trippled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Johann
4dca923454
postproc: vpx_mbpost_proc_across_ip_neon
...
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Linfeng Zhang
2d12a52ff0
Merge "Add high bitdepth 8x8 idct NEON intrinsics"
2017-01-06 16:47:23 +00:00
Linfeng Zhang
911bb980b1
Clean DC only idct NEON intrinsics
...
BUG=webm:1301
Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df
Add high bitdepth 8x8 idct NEON intrinsics
...
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Linfeng Zhang
6d5a3fe583
Clean idct 8x8 neon functions
...
BUG=webm:1301
Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
James Zern
a68b36c752
vpx_idct32x32_1024_add_neon: quiet uninitialized warning
...
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c
enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
Linfeng Zhang
7e23f895ca
Merge "Clean hbd idct 4x4 neon functions and other"
2016-12-19 17:09:26 +00:00
Johann
41b0888a84
postproc: neon down and across macroblock filter
...
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
Linfeng Zhang
c8f25fa5c0
Clean hbd idct 4x4 neon functions and other
...
BUG=webm:1301
Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
James Zern
86e340c76e
enable vpx_idct32x32_1024_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6
Cosmetics by unifying dest_stride to stride in idct
...
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d
Move load_and_transpose to transpose_neon.h
...
Allows for use outside the idcts without pulling in idct_neon.h
Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2
idct16x16_add_neon: fix arm visual studio builds
...
after:
2d3d95f
enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e
Merge "Update idct NEON optimization to not use narrowing saturating shift"
2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4
Merge "enable vpx_idct16x16_256_add_neon in hbd builds"
2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1
Update idct NEON optimization to not use narrowing saturating shift
...
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac
enable vpx_idct16x16_256_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea
Merge changes Ibad079f2,I7858a0a1
...
* changes:
enable vpx_idct16x16_10_add_neon in hbd builds
idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089
enable vpx_idct16x16_10_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb
idct16x16,NEON: rm output_stride from pass1 fns
...
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f
Refine 8-bit 8x8 idct NEON intrinsics
...
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43
Check in vpx_lpf_vertical_4_dual_neon() assembly
...
This replaces its C version.
Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da
Merge changes I4afc130e,Iaa64d23f
...
* changes:
Add high bitdepth 4x4 idct NEON intrinsics
Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3
Add high bitdepth 4x4 idct NEON intrinsics
...
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
James Zern
c6641782c3
idct16x16,NEON,cosmetics: normalize fn signatures
...
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2
Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
21a1abd8e3
enable vpx_idct32x32_135_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63
idct_neon: rename load_tran_low_to_s16 -> ...s16q
...
BUG=webm:1294
Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998
Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
...
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Linfeng Zhang
05e2b5a59f
Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"
2016-11-22 23:20:53 +00:00
Linfeng Zhang
6cc76ec73f
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
...
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
Linfeng Zhang
974e81d184
Refine 8-bit 4x4 idct NEON intrinsics
...
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00
James Zern
cbeae53e76
Merge "Clean horizontal intra prediction NEON optimization"
2016-11-19 01:29:37 +00:00
Linfeng Zhang
85c1ee434d
Add high bitdepth intra prediction NEON optimization (mode tm)
...
BUG=webm:1316
Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Linfeng Zhang
a3128ad33a
Add high bitdepth intra prediction NEON optimization (h and v)
...
BUG=webm:1316
Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7
Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
...
* changes:
fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
partial_idct_test,NEON: add missing idct variants
enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4
Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
...
BUG=webm:1316
Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6
enable vpx_idct32x32_34_add_neon in hbd builds
...
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c
Optimize idct32x32_135_add for NEON
...
BUG=webm:1295
Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f
Merge "Add high bitdepth intra prediction NEON optimization (mode dc)"
2016-11-08 16:53:42 +00:00
Martin Storsjo
34c35b6fb6
Add a missing END directive in idct_neon.asm
...
This fixes building with MS armasm.
Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Linfeng Zhang
1338c71dfb
Clean horizontal intra prediction NEON optimization
...
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Linfeng Zhang
1868582e7d
Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
...
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Johann Koenig
5ac7a59a05
Merge "arm idct: move to-be-shared code to header"
2016-11-02 18:09:45 +00:00
Linfeng Zhang
3b74066b10
Add high bitdepth intra prediction NEON optimization (mode dc)
...
BUG=webm:1316
Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee
arm idct: move to-be-shared code to header
...
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898
Merge "idct32x32_1_add_neon: clear a couple conv warnings"
2016-11-01 22:34:59 +00:00
James Zern
9de91855ef
Merge changes I08af3a54,If5959a25,I6763e62e
...
* changes:
build/make/Android.mk: s/armv8/arm64/
build/make/Android.mk: fix armeabi-v7a build
use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
cc5f49767a
Refine 8-bit intra prediction NEON optimization (mode tm)
...
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
James Zern
7625c803b3
idct32x32_1_add_neon: clear a couple conv warnings
...
int16_t -> uint8_t
Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
1ddb4c0362
use .S suffix rather than .s for NEON asm
...
for compatibility with other build systems
Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
James Zern
410d947c5f
Merge "idct,NEON: add a tran_low_t->s16 load adapter"
2016-10-31 21:59:12 +00:00
James Zern
3ae25974fd
idct,NEON: add a tran_low_t->s16 load adapter
...
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
BUG=webm:1294
Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
Linfeng Zhang
a347118f3c
Refine 8-bit intra prediction NEON optimization (mode h and v)
...
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Linfeng Zhang
4ae9f5c092
Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
...
dst += stride behaving better with gcc/clang.
Unroll loops.
Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00
Linfeng Zhang
9c0680bd43
Merge "Refine 8-bit intra prediction NEON optimization (mode dc)"
2016-10-26 16:51:44 +00:00
Johann
9720b58aac
Optimize idct32x32_34_add for NEON
...
Approximately 3 times faster than the 1024 version which was used
previously.
BUG=webm:1295
Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
Linfeng Zhang
ce88b8f5c5
Refine 8-bit intra prediction NEON optimization (mode dc)
...
dst += stride behaving better with gcc/clang
Expanding inline function dc_SIZExSIZE() save intructions for
vpx_dc_predictor_SIZExSIZE_neon().
Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-24 13:18:51 -07:00
James Zern
2e6a1976a0
Merge "remove idct32x32*_add_neon.asm"
2016-10-22 02:29:56 +00:00
James Zern
5d91752a98
Merge "vpx_highbd_convolve_copy_neon: use multi reg loads"
2016-10-22 02:28:15 +00:00
James Zern
9dbb3ad396
remove idct32x32*_add_neon.asm
...
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.
BUG=webm:1303
Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-20 19:47:14 -07:00
Urvang Joshi
e084e05484
Fix warnings reported by -Wshadow: Part1: vpx_dsp directory
...
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 19:25:19 -07:00
James Zern
68cd3052ca
vpx_highbd_convolve_copy_neon: use multi reg loads
...
for copy16/32/64
BUG=webm:1299
Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf
2016-10-17 17:15:03 -07:00
Linfeng Zhang
9c8981c666
add vpx high bitdepth convolve8 NEON intrinsics optimization
...
BUG=webm:1299
Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
f910d14a1a
add vpx_highbd_convolve_{copy,avg}_neon()
...
BUG=webm:1299
Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
James Zern
fd270437f0
cosmetics,*loopfilter_neon.c: s/tranpose/transpose/
...
Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1
2016-10-12 16:12:56 -07:00
Linfeng Zhang
01454ec485
[vpx highbd lpf NEON 6/6] vertical 16
...
BUG=webm:1300
Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222
2016-10-11 22:59:19 -07:00
Linfeng Zhang
27479775c4
[vpx highbd lpf NEON 5/6] horizontal 16
...
BUG=webm:1300
Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12
2016-10-11 22:59:19 -07:00
Linfeng Zhang
251cbfbec8
[vpx highbd lpf NEON 4/6] vertical 8
...
BUG=webm:1300
Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d
2016-10-11 22:59:19 -07:00
Linfeng Zhang
96c7206ede
[vpx highbd lpf NEON 3/6] horizontal 8
...
BUG=webm:1300
Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81
2016-10-12 00:48:31 +00:00
Linfeng Zhang
49aa9b1f12
[vpx highbd lpf NEON 2/6] vertical 4
...
BUG=webm:1300
Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983
2016-10-06 14:22:26 -07:00
Linfeng Zhang
7aa27bd62f
[vpx highbd lpf NEON 1/6] horizontal 4
...
BUG=webm:1300
Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-06 14:03:04 -07:00
James Zern
1e1caad165
vpx_dsp/idct*_neon.asm: simplify immediate loads
...
mov supports 0-65535
Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd
2016-10-05 14:28:32 -07:00
James Zern
c6bc7499d9
Merge "cosmetics,*_neon.c: rm redundant return from void fns"
2016-10-03 22:40:42 +00:00
James Zern
50b9c467da
Merge "vpx_convolve8_neon,load/store*: correct param type"
2016-10-01 23:52:14 +00:00
James Zern
c449983c56
vpx_convolve8_neon,load/store*: correct param type
...
stride/pitch in convolve is expressed with a ptrdiff_t
Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71
2016-10-01 11:03:29 -07:00
Martin Storsjo
9255328f27
Remove a stray END declaration in loopfilter_4_neon.asm
...
Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69
2016-10-01 14:12:42 +03:00
Linfeng Zhang
da14d23e44
Merge "Refactor vpx lpf NEON files (step 2/2)"
2016-10-01 00:07:51 +00:00
Linfeng Zhang
edbca72a53
Merge "Refactor vpx lpf NEON files (step 1/2)"
2016-10-01 00:07:31 +00:00
James Zern
db80c23fd4
cosmetics,*_neon.c: rm redundant return from void fns
...
+ a couple of 'break's after a return
Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015
2016-09-30 13:09:57 -07:00
James Zern
b6277a47c7
Merge changes from topic '8bit-hbd-idct'
...
* changes:
*idct*_neon.c: add missing rtcd include
idct,msa/neon: exclude idct files from hbd build
*rtcd_defs.pl: remove empty specialize calls
2016-09-30 19:36:08 +00:00
James Zern
1396d12103
*idct*_neon.c: add missing rtcd include
...
+ correct declarations as necessary
BUG=webm:1294
Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d
2016-09-30 11:41:26 -07:00
Linfeng Zhang
ca2fe7a8c7
Refactor vpx lpf NEON files (step 2/2)
...
Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f
2016-09-30 09:56:28 -07:00
Linfeng Zhang
4779f5308d
Refactor vpx lpf NEON files (step 1/2)
...
Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb
2016-09-30 09:48:54 -07:00
Linfeng Zhang
8c744fd978
Merge "Unify loopfilter function names"
2016-09-30 15:58:08 +00:00
Linfeng Zhang
c435b7fbdd
Merge "Refine vpx convolve8 NEON intrinsics optimization"
2016-09-30 15:56:31 +00:00
Linfeng Zhang
7f1f35183a
Unify loopfilter function names
...
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
2016-09-29 16:25:42 -07:00
Linfeng Zhang
85a9e48d25
Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()
...
BUG=webm:1290
Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6
2016-09-29 16:19:39 -07:00
Linfeng Zhang
b3cb065ee4
Refine vpx convolve8 NEON intrinsics optimization
...
BUG=webm:1290
Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873
2016-09-29 12:48:59 -07:00
Urvang Joshi
0aa3e2564f
Add compiler warning flag -Wextra and fix related warnings.
...
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16
Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h
Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.
Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-09-27 12:05:01 -07:00
Linfeng Zhang
b46243d7ff
Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization"
2016-09-26 16:11:12 +00:00
James Zern
e372bfd5ac
variance_neon: sync variance*() w/c,sse2
...
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings
Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
2016-09-21 18:04:45 -07:00
Linfeng Zhang
761e5ec2f6
Refactor lpf (size 4 and 8) NEON intrinsics optimization
...
Also check in 8x8 8-bit transpose NEON intrinsics optimization
transpose_u8_8x8()
Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4
2016-09-19 16:41:37 -07:00
James Zern
aa0eb67bf7
loopfilter_mb_neon: remove unused load_8x8()
...
quiets a -Wunused-function warning for arm targets
Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6
2016-09-17 11:00:31 -07:00
Linfeng Zhang
8107368000
Refactor lpf (size 16) NEON intrinsics optimization
...
Extract shared code so later lpf size 4 and 8 functions can reuse.
Change-Id: Ibb43ef1fd8651bd2e32fcc4c56cf6fa7ca237401
2016-09-16 09:12:13 -07:00
Linfeng Zhang
bee7d837ab
Update NEON transpose functions.
...
Unify coding style.
Change-Id: I5826f40c02c882df7353391e0c9dd6cef6bd4b97
2016-08-31 14:58:40 -07:00
Linfeng Zhang
f7cbfed682
Update vpx_lpf_vertical_16_dual_neon() intrinsics
...
Process 16 samples together.
Change-Id: If6ee8e3377aa2786417f2fc411ba7d87ea8b6799
2016-08-30 11:17:33 -07:00
Linfeng Zhang
4916515511
Update vpx_lpf_horizontal_edge_16_neon() intrinsics
...
Process 16 samples together.
Change-Id: I9cfbe04c9d25d8b89f63f48f519e812746db754d
2016-08-27 14:47:48 -07:00
Linfeng Zhang
f9efbad392
NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon()
...
Also expose the NEON intrinsics version.
BUG=webm:1261, webm:1266.
Change-Id: I8c4ae658467dcf66ebf7a75982b2ef712dbb4535
2016-08-16 08:50:57 -07:00
Linfeng Zhang
f09b5a3328
NEON intrinsics for 4 loopfilter functions
...
New NEON intrinsics functions:
vpx_lpf_horizontal_edge_8_neon()
vpx_lpf_horizontal_edge_16_neon()
vpx_lpf_vertical_16_neon()
vpx_lpf_vertical_16_dual_neon()
BUG=webm:1262, webm:1263, webm:1264, webm:1265.
Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb
2016-08-12 09:58:17 -07:00
Johann Koenig
57f49db81f
Merge changes I6ef79702,Id332c641,I354b5d22,I84438013
...
* changes:
Use common transpose for vpx_idct32x32_1024_add_neon
Use common transpose for vpx_idct8x8_[12|64]_add_neon
Use common transpose for vp9_iht8x8_add_neon
Use common transpose for vpx_idct16x16_[10|256]_add_neon
2016-08-04 22:30:47 +00:00
Johann Koenig
17720b60bb
Merge "Remove armv6 target"
2016-08-04 22:21:13 +00:00
Johann
0325b95938
Use common transpose for vpx_idct32x32_1024_add_neon
...
Change-Id: I6ef7970206d588761ebe80005aecd35365ec50ff
2016-08-04 20:13:18 +00:00
Johann
f4e4ce7549
Use common transpose for vpx_idct8x8_[12|64]_add_neon
...
Change-Id: Id332c641f05336ef9a45e17493ff149fd0a168f0
2016-08-04 20:13:12 +00:00
Johann
8619203ddc
Use common transpose for vpx_idct16x16_[10|256]_add_neon
...
Change-Id: I84438013f483e82084d33ba9a63c33273d35fcaa
2016-08-04 20:12:53 +00:00
Johann Koenig
b757d89ff9
Merge "Extract neon transpose for re-use"
2016-08-04 20:12:38 +00:00
Johann
d55724fae9
Remove armv6 target
...
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-08-04 12:55:06 -07:00
Johann
377cfa31f0
Extract neon transpose for re-use
...
Change-Id: I5e1c7f4c80d1c6f7fd582ac468c6eaaa3603a06c
2016-08-04 19:04:25 +00:00
Johann
df69c751a7
Don't expand to Q register for 4x4 intrapred
...
The code was expanding to Q registers so that vqrshn could be used, for
vector quad round shift and narrow. If 4 values are added together,
there is a shift by 2. If 8 values, a shift by 3. Since this accounts
for any possibility of overflow, we can skip the narrowing shift.
This allows keeping the values in D registers and casting the 16 bit
value to 8 bits.
Change-Id: I8d9cfa07176271f492c116ffa6a7b351af0b8751
2016-08-04 18:51:46 +00:00
Min Chen
407c2e2974
replace by VSTM/VLDM to reduce one of VST1/VLD1
...
Change-Id: I596567570580babb1a52925541d1fd1045c352f5
2016-07-28 23:01:38 +00:00
clang-format
099bd7f07e
vpx_dsp: apply clang-format
...
Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975
2016-07-25 14:14:19 -07:00
Johann
c516dd67bc
neon hadamard 16x16
...
Runs about twice as fast as C
BUG=webm:1027
Change-Id: I6760d99f4e22259439ca35d746194b12a81bfa71
2016-06-14 19:23:38 +00:00
Johann
9b54e812f7
neon hadamard 8x8
...
Runs about 30% faster than the C
BUG=webm:1021
Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba
2016-05-16 11:58:02 -07:00
Johann
2f5840de3e
vpx_minmax_8x8_neon and test
...
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1156
Change-Id: Ief0ad8d6255b0ef0f233cda153799e3c72d3dbc6
2016-04-21 21:40:25 -07:00
James Zern
1b519fb666
split vpx_lpf_horizontal_16 in two
...
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
2016-02-16 22:57:45 -08:00
James Zern
b1e97c6a25
vpx_lpf_horizontal_4: remove unused count param
...
Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11
2016-02-16 22:57:27 -08:00
James Zern
bd5a5bb561
vpx_lpf_horizontal_8: remove unused count param
...
Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46
2016-02-16 22:54:40 -08:00
James Zern
109a47b342
vpx_lpf_vertical_4: remove unused count param
...
Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c
2016-02-16 14:59:00 -08:00
James Zern
37225744db
vpx_lpf_vertical_8: remove unused count param
...
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
2016-02-16 14:59:00 -08:00
James Zern
d36659cec7
move vp9_avg to vpx_dsp
...
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Scott LaVarnway
fa47212933
VPX: removed step checks from neon convolve code
...
The check is handled by the predictor table.
Change-Id: I42479f843e77a2d40cdcdfc9e2e6c48a05a36561
2015-08-12 16:46:53 -07:00
Jingning Han
08a453b9de
Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names
...
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.
Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
2015-08-04 13:46:11 -07:00
Jingning Han
6eabf229e2
Remove vp9_common.h from idct16x16_neon.c
...
Change-Id: I3df35a99900ef8ce549d315866849a10db1a4c7b
2015-08-02 09:57:25 -07:00
Jingning Han
e8b133c79c
Factor inverse transform functions into vpx_dsp
...
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.
Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Zoe Liu
7186a2dd86
Code refactor on InterpKernel
...
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
2015-07-31 10:27:33 -07:00
Hui Su
4cbf36b105
Merge "Replace prefix vp9_ with vpx_ for intra prediction functions"
2015-07-29 00:38:48 +00:00
Jingning Han
d12a4a825c
Merge "Replace vp9_ prefix in 2D-DCT functions with vpx_"
2015-07-29 00:07:31 +00:00
Jingning Han
fc18cf7a11
Merge "Move DC only forward 2D-DCT functions to vpx_dsp"
2015-07-29 00:06:37 +00:00
Jingning Han
4b5109cd73
Replace vp9_ prefix in 2D-DCT functions with vpx_
...
Clean up the forward 2D-DCT function names in vpx_dsp.
Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
2015-07-28 16:06:44 -07:00
Jingning Han
d19033fa4e
Move DC only forward 2D-DCT functions to vpx_dsp
...
This completes the forward transform functions layout refactoring.
Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
2015-07-28 14:52:30 -07:00
Hui Su
fe7cabe8b6
Merge "Move intra prediction functions from vp9/common/ to vpx_dsp/"
2015-07-28 20:41:01 +00:00
hui su
4013645353
Replace prefix vp9_ with vpx_ for intra prediction functions
...
Change-Id: I8ae6fb586f8d5d018ace228df11714f82b085076
2015-07-27 13:42:06 -07:00
hui su
7971846a5e
Move intra prediction functions from vp9/common/ to vpx_dsp/
...
Change-Id: I64edc26cf4aab050c83f2d393df6250628ad43b8
2015-07-27 13:38:16 -07:00
Jingning Han
5ebc8febdc
Refactor vp9_idct.h file
...
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This clears the use case of vp9_idct.h in vpx_dsp folder.
Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
2015-07-26 08:26:32 -07:00
Jingning Han
b67821f37b
Factor forward 2D-DCT transforms into vpx_dsp
...
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
2015-07-22 15:48:17 -07:00
Jingning Han
2992739b5d
Rename loop filter function from vp9_ to vpx_
...
Change-Id: I6f424bb8daec26bf8482b5d75dd9b0e45c11a665
2015-07-17 15:55:02 -07:00
Jingning Han
50adfdf5ba
Migrate loop filter functions from vp9/ to vpx_dsp/
...
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.
Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
2015-07-16 16:40:47 -07:00
Johann
6a82f0d7fb
Move sub pixel variance to vpx_dsp
...
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
2015-07-07 15:51:04 -07:00
Jingning Han
432cd4bfb7
Move subtract functions from vp9 to vpx_dsp
...
Factor out the subtraction operator as common function.
Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
2015-07-06 12:22:47 -07:00