James Zern
9480da21e8
Merge "Refine 8-bit 16x16 idct NEON intrinsics"
2017-01-09 23:52:29 +00:00
Johann Koenig
371a64bfe7
Merge "postproc: vpx_mbpost_proc_down_neon"
2017-01-09 19:53:15 +00:00
Johann Koenig
8a7847c2c9
Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"
2017-01-09 19:47:47 +00:00
Johann Koenig
bf168b24f5
Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"
2017-01-09 19:47:00 +00:00
Johann Koenig
08d0a7fd0f
Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"
2017-01-09 19:46:18 +00:00
Johann Koenig
ab20869221
Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"
2017-01-09 19:45:54 +00:00
Johann
c23970ec25
postproc: vpx_mbpost_proc_down_neon
...
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630
Merge "postproc: vpx_mbpost_proc_across_ip_neon"
2017-01-09 18:17:26 +00:00
Kaustubh Raste
50dd3eb62c
Fix mips dspr2 idct32x32 functions for large coefficient input
...
Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2
2017-01-09 17:21:09 +05:30
Kaustubh Raste
c06991fce6
Fix mips dspr2 idct16x16 functions for large coefficient input
...
Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a
2017-01-09 16:35:28 +05:30
Kaustubh Raste
24d804f79c
Fix mips dspr2 idct8x8 functions for large coefficient input
...
Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61
2017-01-09 16:22:19 +05:30
Kaustubh Raste
afd2d797eb
Fix mips dspr2 idct4x4 functions for large coefficient input
...
Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89
2017-01-09 15:28:30 +05:30
Linfeng Zhang
6abdd31555
Refine 8-bit 16x16 idct NEON intrinsics
...
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() became 3x faster.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Johann
4dca923454
postproc: vpx_mbpost_proc_across_ip_neon
...
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Linfeng Zhang
2d12a52ff0
Merge "Add high bitdepth 8x8 idct NEON intrinsics"
2017-01-06 16:47:23 +00:00
Linfeng Zhang
911bb980b1
Clean DC only idct NEON intrinsics
...
BUG=webm:1301
Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df
Add high bitdepth 8x8 idct NEON intrinsics
...
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Linfeng Zhang
6d5a3fe583
Clean idct 8x8 neon functions
...
BUG=webm:1301
Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
James Zern
a68b36c752
vpx_idct32x32_1024_add_neon: quiet uninitialized warning
...
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c
enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
Linfeng Zhang
7e23f895ca
Merge "Clean hbd idct 4x4 neon functions and other"
2016-12-19 17:09:26 +00:00
Johann
41b0888a84
postproc: neon down and across macroblock filter
...
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
Linfeng Zhang
c8f25fa5c0
Clean hbd idct 4x4 neon functions and other
...
BUG=webm:1301
Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
James Zern
86e340c76e
enable vpx_idct32x32_1024_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6
Cosmetics by unifying dest_stride to stride in idct
...
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d
Move load_and_transpose to transpose_neon.h
...
Allows for use outside the idcts without pulling in idct_neon.h
Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2
idct16x16_add_neon: fix arm visual studio builds
...
after:
2d3d95f
enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e
Merge "Update idct NEON optimization to not use narrowing saturating shift"
2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4
Merge "enable vpx_idct16x16_256_add_neon in hbd builds"
2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1
Update idct NEON optimization to not use narrowing saturating shift
...
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac
enable vpx_idct16x16_256_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea
Merge changes Ibad079f2,I7858a0a1
...
* changes:
enable vpx_idct16x16_10_add_neon in hbd builds
idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089
enable vpx_idct16x16_10_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb
idct16x16,NEON: rm output_stride from pass1 fns
...
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases, meaning the results are stored
contiguously; this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f
Refine 8-bit 8x8 idct NEON intrinsics
...
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43
Check in vpx_lpf_vertical_4_dual_neon() assembly
...
This replaces its C version.
Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da
Merge changes I4afc130e,Iaa64d23f
...
* changes:
Add high bitdepth 4x4 idct NEON intrinsics
Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3
Add high bitdepth 4x4 idct NEON intrinsics
...
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
Linfeng Zhang
264f6e70ec
Update idct x86 intrinsics to not use saturated add and sub
...
Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4
2016-11-29 17:06:08 -08:00
James Zern
c6641782c3
idct16x16,NEON,cosmetics: normalize fn signatures
...
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2
Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
21a1abd8e3
enable vpx_idct32x32_135_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63
idct_neon: rename load_tran_low_to_s16 -> ...s16q
...
BUG=webm:1294
Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998
Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
...
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Jerome Jiang
97ec6291ee
Change C/MSA post proc to match SSE2.
...
BUG=webm:1321
Change-Id: I719023375dc48cf7d8ed72188853f0f1ccc4ad7f
2016-11-23 10:42:11 -08:00
Linfeng Zhang
05e2b5a59f
Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"
2016-11-22 23:20:53 +00:00
Linfeng Zhang
6cc76ec73f
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
...
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
Linfeng Zhang
974e81d184
Refine 8-bit 4x4 idct NEON intrinsics
...
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00
Kaustubh Raste
ecc5998bcf
Fix mips dspr2 build warning
...
Change-Id: Ia8fb3ed124f01384e7896e309c9ff22c05b40719
2016-11-22 17:49:17 +05:30
Kaustubh Raste
a38e9f412d
Merge "Fix SingleLargeCoeff idct test"
2016-11-19 03:37:29 +00:00
James Zern
cbeae53e76
Merge "Clean horizontal intra prediction NEON optimization"
2016-11-19 01:29:37 +00:00
Jerome Jiang
de5fd00ec5
Change *_xmm to *_sse2 in deblocker assembly functions.
...
Some cosmetic changes because xmm is an anachronism.
Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7
2016-11-17 23:38:04 +00:00
Kaustubh Raste
c56e5dd620
Fix SingleLargeCoeff idct test
...
Updated idct code to handle a single large coefficient (-32768)
Change-Id: Ia13ab1ab434a9a1b9954a5914088977a88841cc7
2016-11-17 11:41:07 +00:00
Jerome Jiang
5d48663e04
Merge "Change C and msa to match results from sse2."
2016-11-17 05:16:27 +00:00
Jerome Jiang
cb1b1b8fef
Change C and msa to match results from sse2.
...
Re-enable the tests to check CvsAssembly.
BUG=webm:1321
Change-Id: Id7f7d74b06c469fb6c8f5d04e91359e9cd9097a6
2016-11-16 17:05:26 -08:00
Linfeng Zhang
85c1ee434d
Add high bitdepth intra prediction NEON optimization (mode tm)
...
BUG=webm:1316
Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
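For reference, TrueMotion (TM) prediction forms each pixel as left + above - top-left and clips the result to the valid sample range, which for high bitdepth is [0, (1 << bd) - 1]. The scalar sketch below shows that computation; the function name is hypothetical and this is not the NEON code from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of high-bitdepth TM prediction:
 *   pred[r][c] = clip(left[r] + above[c] - above[-1], 0, (1 << bd) - 1)
 * above[-1] is the top-left neighbor. */
static void highbd_tm_predictor_sketch(uint16_t *dst, ptrdiff_t stride,
                                       int bs, const uint16_t *above,
                                       const uint16_t *left, int bd) {
  const int max = (1 << bd) - 1;
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r) {
    for (int c = 0; c < bs; ++c) {
      const int v = left[r] + above[c] - top_left;
      dst[c] = (uint16_t)(v < 0 ? 0 : (v > max ? max : v)); /* clip to bd bits */
    }
    dst += stride; /* advance one row */
  }
}
```

The clip is what distinguishes the high-bitdepth variant: the upper bound depends on bd instead of being fixed at 255.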
Linfeng Zhang
a3128ad33a
Add high bitdepth intra prediction NEON optimization (h and v)
...
BUG=webm:1316
Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7
Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
...
* changes:
fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
partial_idct_test,NEON: add missing idct variants
enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4
Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
...
BUG=webm:1316
Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6
enable vpx_idct32x32_34_add_neon in hbd builds
...
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c
Optimize idct32x32_135_add for NEON
...
BUG=webm:1295
Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f
Merge "Add high bitdepth intra prediction NEON optimization (mode dc)"
2016-11-08 16:53:42 +00:00
Linfeng Zhang
d545c19afa
Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
...
Also update its trigger threshold from 10 to 12.
Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
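The threshold above decides when a cheaper partial 8x8 inverse transform can be used, based on the end-of-block position (eob). The dispatch below is a hypothetical sketch of that selection; the enum and function names are illustrative and are not libvpx's.

```c
/* Hypothetical sketch: choose an 8x8 inverse transform variant from eob.
 * eob <= 12 means only the first 12 coefficients in scan order can be
 * nonzero, so a reduced transform is sufficient. */
typedef enum {
  IDCT8X8_DC_ONLY,
  IDCT8X8_PARTIAL_12,
  IDCT8X8_FULL_64
} Idct8x8Variant;

static Idct8x8Variant select_idct8x8(int eob) {
  if (eob == 1) return IDCT8X8_DC_ONLY;     /* only the DC coefficient */
  if (eob <= 12) return IDCT8X8_PARTIAL_12; /* threshold raised from 10 */
  return IDCT8X8_FULL_64;                   /* general case */
}
```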
Linfeng Zhang
a9874961f0
Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift"
2016-11-07 16:55:01 +00:00
Johann
e10c95dc83
Update vp9_fdct8x8_quant_ssse3 for highbitdepth
...
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2
BUG=webm:1304
Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-11-05 01:23:07 +00:00
Linfeng Zhang
04c3bf3c85
Replace highbd_dct_const_round_shift with dct_const_round_shift
...
They are identical.
Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61
2016-11-04 16:15:02 -07:00
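The two helpers can be merged because the rounding shift is the same arithmetic regardless of bit depth. The sketch below writes it out, mirroring the DCT_CONST_BITS (14) and ROUND_POWER_OF_TWO definitions; tran_high_t is assumed to be int64_t, as in high-bitdepth builds.

```c
#include <stdint.h>

typedef int64_t tran_high_t; /* assumption: high-bitdepth width */
#define DCT_CONST_BITS 14
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

/* Round a product of a coefficient and a 14-bit cosine constant back to
 * coefficient scale (round half up, then shift). */
static tran_high_t dct_const_round_shift(tran_high_t input) {
  return ROUND_POWER_OF_TWO(input, DCT_CONST_BITS);
}
```

For example, 1000 multiplied by cospi_16_64 (11585, roughly cos(pi/4) * 2^14) rounds back to 707.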
Linfeng Zhang
32326c2f13
Merge "Cosmetics of inv_txfm.c"
2016-11-04 22:40:03 +00:00
Johann Koenig
900ec31bea
Merge "Extract high bit depth helper functions"
2016-11-04 21:03:17 +00:00
Linfeng Zhang
b68d8107cb
Cosmetics of inv_txfm.c
...
Unify the 8-bit and high-bitdepth code.
Change-Id: I3fe441577af0249030ca3a1ef769eb9030711434
2016-11-04 13:24:41 -07:00
Johann
cf35ffc025
Extract high bit depth helper functions
...
These can be used in the vp9 fdct as well.
Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e
2016-11-04 18:13:51 +00:00
Martin Storsjo
34c35b6fb6
Add a missing END directive in idct_neon.asm
...
This fixes building with MS armasm.
Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Linfeng Zhang
1338c71dfb
Clean horizontal intra prediction NEON optimization
...
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Linfeng Zhang
1868582e7d
Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
...
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Johann Koenig
5ac7a59a05
Merge "arm idct: move to-be-shared code to header"
2016-11-02 18:09:45 +00:00
Linfeng Zhang
3b74066b10
Add high bitdepth intra prediction NEON optimization (mode dc)
...
BUG=webm:1316
Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee
arm idct: move to-be-shared code to header
...
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898
Merge "idct32x32_1_add_neon: clear a couple conv warnings"
2016-11-01 22:34:59 +00:00
James Zern
9de91855ef
Merge changes I08af3a54,If5959a25,I6763e62e
...
* changes:
build/make/Android.mk: s/armv8/arm64/
build/make/Android.mk: fix armeabi-v7a build
use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
cc5f49767a
Refine 8-bit intra prediction NEON optimization (mode tm)
...
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
James Zern
7625c803b3
idct32x32_1_add_neon: clear a couple conv warnings
...
int16_t -> uint8_t
Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
1ddb4c0362
use .S suffix rather than .s for NEON asm
...
for compatibility with other build systems
Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
James Zern
410d947c5f
Merge "idct,NEON: add a tran_low_t->s16 load adapter"
2016-10-31 21:59:12 +00:00
James Zern
3ae25974fd
idct,NEON: add a tran_low_t->s16 load adapter
...
enable idct4x4* and idct8x8*, which are compatible with 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16 bits; whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
BUG=webm:1294
Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
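In high-bitdepth builds tran_low_t is 32 bits wide, but coefficients from 8-bit content fit in 16 bits, so narrowing lets the existing 16-bit idct code consume them. Below is a hypothetical scalar sketch of such an adapter. A NEON version would typically narrow lanes with vmovn_s32, which keeps the low 16 bits in the same way.

```c
#include <stdint.h>

typedef int32_t tran_low_t; /* assumption: high-bitdepth width */

/* Hypothetical scalar tran_low_t -> int16_t load adapter.
 * Inputs are assumed to already fit in int16_t (8-bit content). */
static void load_tran_low_to_s16_sketch(const tran_low_t *in, int16_t *out,
                                        int n) {
  for (int i = 0; i < n; ++i) out[i] = (int16_t)in[i];
}
```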
Linfeng Zhang
a347118f3c
Refine 8-bit intra prediction NEON optimization (mode h and v)
...
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Linfeng Zhang
4ae9f5c092
Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
...
Use dst += stride, which behaves better with gcc/clang.
Unroll loops.
Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00
Linfeng Zhang
9c0680bd43
Merge "Refine 8-bit intra prediction NEON optimization (mode dc)"
2016-10-26 16:51:44 +00:00
Johann
9720b58aac
Optimize idct32x32_34_add for NEON
...
Approximately 3 times faster than the 1024 version which was used
previously.
BUG=webm:1295
Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
Linfeng Zhang
ce88b8f5c5
Refine 8-bit intra prediction NEON optimization (mode dc)
...
Use dst += stride, which behaves better with gcc/clang.
Expanding the inline function dc_SIZExSIZE() saves instructions in
vpx_dc_predictor_SIZExSIZE_neon().
Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-24 13:18:51 -07:00
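As a scalar reference, DC prediction fills the block with the rounded average of the 2*bs edge pixels. The sketch below uses the dst += stride row-advance pattern mentioned above. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar bs x bs DC predictor: average the above row and
 * left column (rounding to nearest), then fill every row with it. */
static void dc_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                                const uint8_t *above, const uint8_t *left) {
  const int count = 2 * bs;
  int sum = 0;
  for (int i = 0; i < bs; ++i) sum += above[i] + left[i];
  const uint8_t dc = (uint8_t)((sum + (count >> 1)) / count);
  for (int r = 0; r < bs; ++r) {
    memset(dst, dc, (size_t)bs);
    dst += stride; /* step to the next row rather than indexing r * stride */
  }
}
```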
James Zern
2e6a1976a0
Merge "remove idct32x32*_add_neon.asm"
2016-10-22 02:29:56 +00:00
James Zern
5d91752a98
Merge "vpx_highbd_convolve_copy_neon: use multi reg loads"
2016-10-22 02:28:15 +00:00
James Zern
9dbb3ad396
remove idct32x32*_add_neon.asm
...
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.
BUG=webm:1303
Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-20 19:47:14 -07:00
James Zern
a60dd5c83a
Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory"
2016-10-18 22:09:29 +00:00
Kaustubh Raste
8ff5af773a
Merge "Optimize sad_64width_x4d_msa function"
2016-10-18 07:46:02 +00:00
Kaustubh Raste
b7310e2aff
Optimize sad_64width_x4d_msa function
...
Reduced HADD_UH_U32 macro calls
Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa
2016-10-18 04:05:33 +00:00
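For context, an "x4d" SAD function compares one source block against four candidate reference blocks in a single call, so each source pixel is read once and reused for all four sums. The scalar sketch below shows the computation only. The name and the generic width/height parameters are hypothetical; SIMD versions accumulate per-lane partial sums that must be reduced horizontally at the end.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch: sum of absolute differences of src against
 * each of four reference blocks. */
static void sad_x4d_sketch(const uint8_t *src, int src_stride,
                           const uint8_t *const ref[4], int ref_stride,
                           int w, int h, uint32_t sad[4]) {
  for (int k = 0; k < 4; ++k) sad[k] = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int s = src[r * src_stride + c]; /* loaded once, used 4 times */
      for (int k = 0; k < 4; ++k)
        sad[k] += (uint32_t)abs(s - ref[k][r * ref_stride + c]);
    }
  }
}
```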
Urvang Joshi
e084e05484
Fix warnings reported by -Wshadow: Part1: vpx_dsp directory
...
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 19:25:19 -07:00
James Zern
68cd3052ca
vpx_highbd_convolve_copy_neon: use multi reg loads
...
for copy16/32/64
BUG=webm:1299
Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf
2016-10-17 17:15:03 -07:00
Linfeng Zhang
9c8981c666
add vpx high bitdepth convolve8 NEON intrinsics optimization
...
BUG=webm:1299
Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
f910d14a1a
add vpx_highbd_convolve_{copy,avg}_neon()
...
BUG=webm:1299
Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
James Zern
1909270f65
Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/"
2016-10-13 07:12:51 +00:00
Kaustubh Raste
9e75c01353
Merge "Optimize vpx_mbpost_proc_across_ip_msa function"
2016-10-13 02:12:33 +00:00
Kaustubh Raste
99adf8b22e
Merge "Optimize vpx_get4x4sse_cs_msa function"
2016-10-13 02:12:00 +00:00
James Zern
fd270437f0
cosmetics,*loopfilter_neon.c: s/tranpose/transpose/
...
Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1
2016-10-12 16:12:56 -07:00
Linfeng Zhang
01454ec485
[vpx highbd lpf NEON 6/6] vertical 16
...
BUG=webm:1300
Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222
2016-10-11 22:59:19 -07:00
Linfeng Zhang
27479775c4
[vpx highbd lpf NEON 5/6] horizontal 16
...
BUG=webm:1300
Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12
2016-10-11 22:59:19 -07:00
Linfeng Zhang
251cbfbec8
[vpx highbd lpf NEON 4/6] vertical 8
...
BUG=webm:1300
Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d
2016-10-11 22:59:19 -07:00
Linfeng Zhang
96c7206ede
[vpx highbd lpf NEON 3/6] horizontal 8
...
BUG=webm:1300
Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81
2016-10-12 00:48:31 +00:00
Linfeng Zhang
57e4cbc632
Merge "[vpx highbd lpf NEON 2/6] vertical 4"
2016-10-10 16:57:55 +00:00
Linfeng Zhang
19046d9963
Merge "[vpx highbd lpf NEON 1/6] horizontal 4"
2016-10-10 16:56:23 +00:00
Kaustubh Raste
3da752fe00
Optimize vpx_mbpost_proc_across_ip_msa function
...
Removed HADD_SW_S32 calculation
Change-Id: I7384dc881451d197404d09beb7c27b222e1d6875
2016-10-10 18:03:28 +05:30
Kaustubh Raste
d05104b488
Optimize vpx_get4x4sse_cs_msa function
...
Reuse CALC_MSE_B macro
Change-Id: I39f0a92ac2dbb5fa8628df1a5d556cfdc42a3648
2016-10-10 16:31:57 +05:30
Kaustubh Raste
3c2f7eb339
Optimize vp9 loopfilter msa functions
...
Updated code to process in 8-bit, as saturation/clipping takes care of
overflow
Removed unused macro
Change-Id: I113df60286fb28b216df800d95b2d3695ef71440
2016-10-07 19:26:26 -07:00
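Staying in 8 bits works because intermediate results are clamped to [-128, 127] rather than wrapping. The scalar sketch below illustrates that saturating behavior; SIMD ISAs such as MSA provide saturating vector adds that apply it per lane. Both function names are hypothetical.

```c
#include <stdint.h>

/* Clamp an int to the signed 8-bit range instead of letting it wrap. */
static int8_t signed_char_clamp_sketch(int t) {
  return (int8_t)(t < -128 ? -128 : (t > 127 ? 127 : t));
}

/* Saturating signed 8-bit add built on the clamp. */
static int8_t sat_add_s8(int8_t a, int8_t b) {
  return signed_char_clamp_sketch((int)a + (int)b);
}
```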
Linfeng Zhang
49aa9b1f12
[vpx highbd lpf NEON 2/6] vertical 4
...
BUG=webm:1300
Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983
2016-10-06 14:22:26 -07:00
Linfeng Zhang
7aa27bd62f
[vpx highbd lpf NEON 1/6] horizontal 4
...
BUG=webm:1300
Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-06 14:03:04 -07:00
James Zern
1e1caad165
vpx_dsp/idct*_neon.asm: simplify immediate loads
...
mov supports 0-65535
Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd
2016-10-05 14:28:32 -07:00
James Zern
a6be7ba1aa
enable idct*_1_add_neon in high-bitdepth builds
...
these are compatible as they only load one element of the input so the
larger size of tran_low_t makes no difference in little endian builds.
note the asm is incompatible with big-endian, but there are other points of
failure there so currently it's considered unsupported.
BUG=webm:1294
Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
2016-10-05 11:14:25 -07:00
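The endianness argument can be shown directly. On a little-endian target the low 16 bits of an int32_t sit at the same address as the whole value, so a 16-bit load of the first coefficient reads the correct value as long as it fits in int16_t. On big-endian targets it would read the high half instead. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read only the first two bytes of a 32-bit coefficient as an int16_t. */
static int16_t load_first_coeff_as_s16(const int32_t *input) {
  int16_t v;
  memcpy(&v, input, sizeof(v));
  return v;
}

/* Returns 1 if the lowest-addressed byte holds the least significant bits. */
static int is_little_endian(void) {
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 1;
}
```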
Angie Chiang
5d635365bb
Merge "Move highbd txfm input range check from 2d iht transform to 1d idct/iadst"
2016-10-04 16:57:37 +00:00
Kaustubh Raste
0a92dd7319
Merge "Fix vpx_plane_add_noise_msa functionality bit-mismatch"
2016-10-04 06:35:47 +00:00
Angie Chiang
5b073c695b
Move highbd txfm input range check from 2d iht transform to 1d idct/iadst
...
This change makes the highbd txfm input range check more comprehensive.
The 25-bit highbd input range is composed of
12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for
1D inverse transform amplification + 1 bit for contingency in rounding and quantizing.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625
Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7
2016-10-03 17:21:08 -07:00
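The bit budget in the message above adds up to 12 + 7 + 5 + 1 = 25. The sketch below encodes that sum and checks inputs against it. The function name, and the interpretation that a valid input has magnitude below 2^25, are assumptions for illustration rather than the exact check in the commit.

```c
#include <stdint.h>
#include <stdlib.h>

enum {
  kSignalBits = 12,       /* input signal */
  kFwd2dAmplifyBits = 7,  /* 2D forward transform amplification */
  kInv1dAmplifyBits = 5,  /* 1D inverse transform amplification */
  kContingencyBits = 1,   /* rounding and quantizing contingency */
  kHighbdInputBits =
      kSignalBits + kFwd2dAmplifyBits + kInv1dAmplifyBits + kContingencyBits
};

/* Hypothetical check: returns 1 if every |input[i]| < 2^25. */
static int highbd_input_in_range(const int32_t *input, int n) {
  for (int i = 0; i < n; ++i)
    if (abs(input[i]) >= (1 << kHighbdInputBits)) return 0;
  return 1;
}
```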
James Zern
c6bc7499d9
Merge "cosmetics,*_neon.c: rm redundant return from void fns"
2016-10-03 22:40:42 +00:00
Kaustubh Raste
6922fc8230
Fix vpx_plane_add_noise_msa functionality bit-mismatch
...
Change-Id: I04961afb592ae6a67fdcfd8c9066e920dd4b30e7
2016-10-03 18:15:59 +00:00
James Zern
50b9c467da
Merge "vpx_convolve8_neon,load/store*: correct param type"
2016-10-01 23:52:14 +00:00
James Zern
c449983c56
vpx_convolve8_neon,load/store*: correct param type
...
stride/pitch in convolve is expressed with a ptrdiff_t
Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71
2016-10-01 11:03:29 -07:00
Martin Storsjo
9255328f27
Remove a stray END declaration in loopfilter_4_neon.asm
...
Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69
2016-10-01 14:12:42 +03:00
Linfeng Zhang
da14d23e44
Merge "Refactor vpx lpf NEON files (step 2/2)"
2016-10-01 00:07:51 +00:00
Linfeng Zhang
edbca72a53
Merge "Refactor vpx lpf NEON files (step 1/2)"
2016-10-01 00:07:31 +00:00
James Zern
db80c23fd4
cosmetics,*_neon.c: rm redundant return from void fns
...
+ a couple of 'break's after a return
Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015
2016-09-30 13:09:57 -07:00
James Zern
b6277a47c7
Merge changes from topic '8bit-hbd-idct'
...
* changes:
*idct*_neon.c: add missing rtcd include
idct,msa/neon: exclude idct files from hbd build
*rtcd_defs.pl: remove empty specialize calls
2016-09-30 19:36:08 +00:00
James Zern
1396d12103
*idct*_neon.c: add missing rtcd include
...
+ correct declarations as necessary
BUG=webm:1294
Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d
2016-09-30 11:41:26 -07:00
James Zern
b51c4df93a
idct,msa/neon: exclude idct files from hbd build
...
these functions are incompatible currently and unreferenced in rtcd,
exclude them from the build.
BUG=webm:1294
Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
2016-09-30 11:32:47 -07:00
Linfeng Zhang
ca2fe7a8c7
Refactor vpx lpf NEON files (step 2/2)
...
Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f
2016-09-30 09:56:28 -07:00
Linfeng Zhang
4779f5308d
Refactor vpx lpf NEON files (step 1/2)
...
Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb
2016-09-30 09:48:54 -07:00
Linfeng Zhang
8c744fd978
Merge "Unify loopfilter function names"
2016-09-30 15:58:08 +00:00
Linfeng Zhang
c435b7fbdd
Merge "Refine vpx convolve8 NEON intrinsics optimization"
2016-09-30 15:56:31 +00:00
Linfeng Zhang
bde905cba1
Merge "Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()"
2016-09-30 15:54:02 +00:00
James Zern
ed62d27c71
*rtcd_defs.pl: remove empty specialize calls
...
add_proto adds a 'c' specialization
Change-Id: I0ed0c2240d45264b0e0056ce7c8f63f4a00780bc
2016-09-29 20:38:26 -07:00
Linfeng Zhang
7f1f35183a
Unify loopfilter function names
...
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
2016-09-29 16:25:42 -07:00
Linfeng Zhang
85a9e48d25
Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()
...
BUG=webm:1290
Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6
2016-09-29 16:19:39 -07:00
Johann Koenig
ad55b1d270
Merge changes Ia3e9122f,Id33eb6c8,I956bd8ce
...
* changes:
Remove vp8_clear_system_state
vpx_dsp: clean up rtcd
vp8: clean up rtcd
2016-09-29 23:16:45 +00:00
Linfeng Zhang
b3cb065ee4
Refine vpx convolve8 NEON intrinsics optimization
...
BUG=webm:1290
Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873
2016-09-29 12:48:59 -07:00
Johann
7b5a348088
vpx_dsp: clean up rtcd
...
Remove avx2+ssse3 specialization. Disabling ssse3 now automatically
disables avx2.
Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765
2016-09-29 12:10:07 -07:00
James Zern
93c823e24b
vpx_dsp/get_prob: relocate den == 0 test
...
to get_binary_prob(). the only other caller mode_mv_merge_probs() does
its own test on 0.
BUG=chromium:639712
Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce
2016-09-28 17:42:49 -07:00
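The restructuring can be sketched as follows. get_prob() assumes a nonzero denominator, and the zero test lives in get_binary_prob(), the caller that can actually pass 0. The rounding formula, the [1, 255] clamp, and returning 128 (an even split) for den == 0 are written out here as assumptions for illustration.

```c
#include <stdint.h>

typedef uint8_t vpx_prob;

/* Rounded num/den scaled to 8 bits, clamped to [1, 255]; den must be != 0. */
static vpx_prob get_prob_sketch(unsigned int num, unsigned int den) {
  const unsigned int p =
      (unsigned int)(((uint64_t)num * 256 + (den >> 1)) / den);
  return (vpx_prob)(p < 1 ? 1 : (p > 255 ? 255 : p));
}

static vpx_prob get_binary_prob_sketch(unsigned int n0, unsigned int n1) {
  const unsigned int den = n0 + n1;
  if (den == 0) return 128u; /* no observations: even split */
  return get_prob_sketch(n0, den);
}
```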
James Zern
7481edb33f
vpx_dsp/get_prob: make clip_prob branchless
...
+ inline the function directly as there was only one consumer
(get_prob())
this is an attempt to reduce the number of branches to work around an amd
bug. this change is mildly faster or neutral across x86-64 and arm.
http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
665 Integer Divide Instruction May Cause Unpredictable Behavior
BUG=chromium:639712
Suggested-by: Pascal Massimino <pascal.massimino@gmail.com>
Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2
2016-09-28 11:51:46 -07:00
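One way to clamp without branches is to add comparison results, which are 0 or 1 and which compilers usually materialize without a conditional jump (for example setcc on x86). This is an illustrative formulation, not necessarily the exact expression used in libvpx. It assumes num <= den, so p lies in [0, 256].

```c
#include <stdint.h>

/* Branchless [1, 255] clamp of the rounded 8-bit probability.
 * Assumes den != 0 and num <= den. */
static uint8_t get_prob_branchless_sketch(unsigned int num, unsigned int den) {
  const int p = (int)(((uint64_t)num * 256 + (den >> 1)) / den);
  const int clipped = p - (p > 255) + (p == 0); /* 256 -> 255, 0 -> 1 */
  return (uint8_t)clipped;
}
```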
Johann
02fa245d15
mips: clean up wextra warnings
...
Remove unused zbin variable:
warning: unused parameter ‘zbin’
Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions
Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
2016-09-27 13:19:18 -07:00
Urvang Joshi
0aa3e2564f
Add compiler warning flag -Wextra and fix related warnings.
...
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16
Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h
Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.
Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-09-27 12:05:01 -07:00
Linfeng Zhang
b46243d7ff
Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization"
2016-09-26 16:11:12 +00:00
James Zern
deadda3dea
Merge "vpx_idct32x32_34_add_sse2: rm unneeded transposes"
2016-09-23 02:49:26 +00:00
James Zern
fdd1186f97
vpx_idct32x32_34_add_sse2: rm unneeded transposes
...
this change is neutral to mildly positive across various x86-64
platforms
Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784
2016-09-21 19:49:25 -07:00
James Zern
e372bfd5ac
variance_neon: sync variance*() w/c,sse2
...
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings
Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
2016-09-21 18:04:45 -07:00
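Block variance is the sum of squared differences minus sum^2 / N. The square of the sum can exceed 32 bits for large blocks, so it is computed in 64 bits, and the explicit (uint32_t) cast makes the narrowing visible, which is what -Wshorten-64-to-32 asks for. The scalar sketch below is hypothetical and is not the NEON code.

```c
#include <stdint.h>

/* Hypothetical scalar variance of a w x h block of differences a - b. */
static uint32_t variance_sketch(const uint8_t *a, int a_stride,
                                const uint8_t *b, int b_stride,
                                int w, int h, uint32_t *sse) {
  int sum = 0;
  uint32_t sq = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int diff = a[r * a_stride + c] - b[r * b_stride + c];
      sum += diff;
      sq += (uint32_t)(diff * diff);
    }
  }
  *sse = sq;
  /* 64-bit square, then an explicit narrowing cast */
  return sq - (uint32_t)(((int64_t)sum * sum) / (w * h));
}
```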
Linfeng Zhang
761e5ec2f6
Refactor lpf (size 4 and 8) NEON intrinsics optimization
...
Also check in 8x8 8-bit transpose NEON intrinsics optimization
transpose_u8_8x8()
Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4
2016-09-19 16:41:37 -07:00
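A scalar reference is handy when checking a SIMD transpose such as transpose_u8_8x8(). It computes out[c][r] = in[r][c]. NEON versions typically reach the same permutation with vtrn-style interleaving steps across registers. The function name below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar 8x8 byte transpose on row-major 64-byte buffers. */
static void transpose_u8_8x8_ref(const uint8_t *in, uint8_t *out) {
  for (int r = 0; r < 8; ++r)
    for (int c = 0; c < 8; ++c) out[c * 8 + r] = in[r * 8 + c];
}
```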
James Zern
6acd061aad
variance_avx2: sync variance functions with c-code
...
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings
Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
2016-09-19 16:19:29 -07:00
James Zern
aa0eb67bf7
loopfilter_mb_neon: remove unused load_8x8()
...
quiets a -Wunused-function warning for arm targets
Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6
2016-09-17 11:00:31 -07:00
Linfeng Zhang
5d73639d8f
Merge "Refactor lpf (size 16) NEON intrinsics optimization"
2016-09-17 00:33:30 +00:00