Commit Graph

600 Commits

Author SHA1 Message Date
Linfeng Zhang
c8f25fa5c0 Clean hbd idct 4x4 neon functions and other
BUG=webm:1301

Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
James Zern
86e340c76e enable vpx_idct32x32_1024_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6 Cosmetics by unifying dest_stride to stride in idct
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d Move load_and_transpose to transpose_neon.h
Allows for use outside the idcts without pulling in idct_neon.h

Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2 idct16x16_add_neon: fix arm visual studio builds
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds

reorder INCLUDEs and fix indent of IF/ENDIFs

remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.

Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e Merge "Update idct NEON optimization to not use narrowing saturating shift" 2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4 Merge "enable vpx_idct16x16_256_add_neon in hbd builds" 2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1 Update idct NEON optimization to not use narrowing saturating shift
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac enable vpx_idct16x16_256_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea Merge changes Ibad079f2,I7858a0a1
* changes:
  enable vpx_idct16x16_10_add_neon in hbd builds
  idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089 enable vpx_idct16x16_10_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb idct16x16,NEON: rm output_stride from pass1 fns
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.

Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f Refine 8-bit 8x8 idct NEON intrinsics
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43 Check in vpx_lpf_vertical_4_dual_neon() assembly
This replaces its C version.

Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da Merge changes I4afc130e,Iaa64d23f
* changes:
  Add high bitdepth 4x4 idct NEON intrinsics
  Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3 Add high bitdepth 4x4 idct NEON intrinsics
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
Linfeng Zhang
264f6e70ec Update idct x86 intrinsics to not use saturated add and sub
Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4
2016-11-29 17:06:08 -08:00
James Zern
c6641782c3 idct16x16,NEON,cosmetics: normalize fn signatures
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2

Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
21a1abd8e3 enable vpx_idct32x32_135_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63 idct_neon: rename load_tran_low_to_s16 -> ...s16q
BUG=webm:1294

Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998 Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
* changes:
  Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
  Refine 8-bit 4x4 idct NEON intrinsics
  Add idct speed test.
  Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Jerome Jiang
97ec6291ee Change C/MSA post proc to match SSE2.
BUG=webm:1321

Change-Id: I719023375dc48cf7d8ed72188853f0f1ccc4ad7f
2016-11-23 10:42:11 -08:00
Linfeng Zhang
05e2b5a59f Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction" 2016-11-22 23:20:53 +00:00
Linfeng Zhang
6cc76ec73f Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
Linfeng Zhang
974e81d184 Refine 8-bit 4x4 idct NEON intrinsics
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00
Kaustubh Raste
ecc5998bcf Fix mips dspr2 build warning
Change-Id: Ia8fb3ed124f01384e7896e309c9ff22c05b40719
2016-11-22 17:49:17 +05:30
Kaustubh Raste
a38e9f412d Merge "Fix SingleLargeCoeff idct test" 2016-11-19 03:37:29 +00:00
James Zern
cbeae53e76 Merge "Clean horizontal intra prediction NEON optimization" 2016-11-19 01:29:37 +00:00
Jerome Jiang
de5fd00ec5 Change *_xmm to *_sse2 in deblocker assembly functions.
Some cosmetic changes because xmm is an anachronism.

Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7
2016-11-17 23:38:04 +00:00
Kaustubh Raste
c56e5dd620 Fix SingleLargeCoeff idct test
Updated idct code to handle single large coefficient (-32768)

Change-Id: Ia13ab1ab434a9a1b9954a5914088977a88841cc7
2016-11-17 11:41:07 +00:00
Jerome Jiang
5d48663e04 Merge "Change C and msa to match results from sse2." 2016-11-17 05:16:27 +00:00
Jerome Jiang
cb1b1b8fef Change C and msa to match results from sse2.
Re-enable the tests to check CvsAssembly.
BUG=webm:1321

Change-Id: Id7f7d74b06c469fb6c8f5d04e91359e9cd9097a6
2016-11-16 17:05:26 -08:00
Linfeng Zhang
85c1ee434d Add high bitdepth intra prediction NEON optimization (mode tm)
BUG=webm:1316

Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Linfeng Zhang
a3128ad33a Add high bitdepth intra prediction NEON optimization (h and v)
BUG=webm:1316

Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7 Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
* changes:
  fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
  fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
  partial_idct_test,NEON: add missing idct variants
  enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4 Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
BUG=webm:1316

Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6 enable vpx_idct32x32_34_add_neon in hbd builds
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.

BUG=webm:1294

Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c Optimize idct32x32_135_add for NEON
BUG=webm:1295

Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f Merge "Add high bitdepth intra prediction NEON optimization (mode dc)" 2016-11-08 16:53:42 +00:00
Linfeng Zhang
d545c19afa Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
Also update its trigger threshold from 10 to 12.

Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
Linfeng Zhang
a9874961f0 Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift" 2016-11-07 16:55:01 +00:00
Johann
e10c95dc83 Update vp9_fdct8x8_quant_ssse3 for highbitdepth
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2

BUG=webm:1304

Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-11-05 01:23:07 +00:00
Linfeng Zhang
04c3bf3c85 Replace highbd_dct_const_round_shift with dct_const_round_shift
They are identical.

Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61
2016-11-04 16:15:02 -07:00
Linfeng Zhang
32326c2f13 Merge "Cosmetics of inv_txfm.c" 2016-11-04 22:40:03 +00:00
Johann Koenig
900ec31bea Merge "Extract high bit depth helper functions" 2016-11-04 21:03:17 +00:00
Linfeng Zhang
b68d8107cb Cosmetics of inv_txfm.c
Unify code of 8-bit and high bitdepth.

Change-Id: I3fe441577af0249030ca3a1ef769eb9030711434
2016-11-04 13:24:41 -07:00
Johann
cf35ffc025 Extract high bit depth helper functions
These can be used in the vp9 fdct as well.

Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e
2016-11-04 18:13:51 +00:00
Martin Storsjo
34c35b6fb6 Add a missing END directive in idct_neon.asm
This fixes building with MS armasm.

Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Linfeng Zhang
1338c71dfb Clean horizontal intra prediction NEON optimization
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Linfeng Zhang
1868582e7d Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Johann Koenig
5ac7a59a05 Merge "arm idct: move to-be-shared code to header" 2016-11-02 18:09:45 +00:00
Linfeng Zhang
3b74066b10 Add high bitdepth intra prediction NEON optimization (mode dc)
BUG=webm:1316

Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee arm idct: move to-be-shared code to header
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898 Merge "idct32x32_1_add_neon: clear a couple conv warnings" 2016-11-01 22:34:59 +00:00
James Zern
9de91855ef Merge changes I08af3a54,If5959a25,I6763e62e
* changes:
  build/make/Android.mk: s/armv8/arm64/
  build/make/Android.mk: fix armeabi-v7a build
  use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
cc5f49767a Refine 8-bit intra prediction NEON optimization (mode tm)
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
James Zern
7625c803b3 idct32x32_1_add_neon: clear a couple conv warnings
int16_t -> uint8_t

Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
1ddb4c0362 use .S suffix rather than .s for NEON asm
for compatibility with other build systems

Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
James Zern
410d947c5f Merge "idct,NEON: add a tran_low_t->s16 load adapter" 2016-10-31 21:59:12 +00:00
James Zern
3ae25974fd idct,NEON: add a tran_low_t->s16 load adapter
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.

BUG=webm:1294

Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
Linfeng Zhang
a347118f3c Refine 8-bit intra prediction NEON optimization (mode h and v)
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Linfeng Zhang
4ae9f5c092 Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
dst += stride behaving better with gcc/clang.
Unroll loops.

Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00
Linfeng Zhang
9c0680bd43 Merge "Refine 8-bit intra prediction NEON optimization (mode dc)" 2016-10-26 16:51:44 +00:00
Johann
9720b58aac Optimize idct32x32_34_add for NEON
Approximately 3 times faster than the 1024 version which was used
previously.

BUG=webm:1295

Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
Linfeng Zhang
ce88b8f5c5 Refine 8-bit intra prediction NEON optimization (mode dc)
dst += stride behaving better with gcc/clang
Expanding inline function dc_SIZExSIZE() save intructions for
vpx_dc_predictor_SIZExSIZE_neon().

Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-24 13:18:51 -07:00
James Zern
2e6a1976a0 Merge "remove idct32x32*_add_neon.asm" 2016-10-22 02:29:56 +00:00
James Zern
5d91752a98 Merge "vpx_highbd_convolve_copy_neon: use multi reg loads" 2016-10-22 02:28:15 +00:00
James Zern
9dbb3ad396 remove idct32x32*_add_neon.asm
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.

BUG=webm:1303

Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-20 19:47:14 -07:00
James Zern
a60dd5c83a Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory" 2016-10-18 22:09:29 +00:00
Kaustubh Raste
8ff5af773a Merge "Optimize sad_64width_x4d_msa function" 2016-10-18 07:46:02 +00:00
Kaustubh Raste
b7310e2aff Optimize sad_64width_x4d_msa function
Reduced HADD_UH_U32 macro calls

Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa
2016-10-18 04:05:33 +00:00
Urvang Joshi
e084e05484 Fix warnings reported by -Wshadow: Part1: vpx_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 19:25:19 -07:00
James Zern
68cd3052ca vpx_highbd_convolve_copy_neon: use multi reg loads
for copy16/32/64

BUG=webm:1299

Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf
2016-10-17 17:15:03 -07:00
Linfeng Zhang
9c8981c666 add vpx high bitdepth convolve8 NEON intrinsics optimization
BUG=webm:1299

Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
f910d14a1a add vpx_highbd_convolve_{copy,avg}_neon()
BUG=webm:1299

Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
James Zern
1909270f65 Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/" 2016-10-13 07:12:51 +00:00
Kaustubh Raste
9e75c01353 Merge "Optimize vpx_mbpost_proc_across_ip_msa function" 2016-10-13 02:12:33 +00:00
Kaustubh Raste
99adf8b22e Merge "Optimize vpx_get4x4sse_cs_msa function" 2016-10-13 02:12:00 +00:00
James Zern
fd270437f0 cosmetics,*loopfilter_neon.c: s/tranpose/transpose/
Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1
2016-10-12 16:12:56 -07:00
Linfeng Zhang
01454ec485 [vpx highbd lpf NEON 6/6] vertical 16
BUG=webm:1300

Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222
2016-10-11 22:59:19 -07:00
Linfeng Zhang
27479775c4 [vpx highbd lpf NEON 5/6] horizontal 16
BUG=webm:1300

Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12
2016-10-11 22:59:19 -07:00
Linfeng Zhang
251cbfbec8 [vpx highbd lpf NEON 4/6] vertical 8
BUG=webm:1300

Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d
2016-10-11 22:59:19 -07:00
Linfeng Zhang
96c7206ede [vpx highbd lpf NEON 3/6] horizontal 8
BUG=webm:1300

Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81
2016-10-12 00:48:31 +00:00
Linfeng Zhang
57e4cbc632 Merge "[vpx highbd lpf NEON 2/6] vertical 4" 2016-10-10 16:57:55 +00:00
Linfeng Zhang
19046d9963 Merge "[vpx highbd lpf NEON 1/6] horizontal 4" 2016-10-10 16:56:23 +00:00
Kaustubh Raste
3da752fe00 Optimize vpx_mbpost_proc_across_ip_msa function
Removed HADD_SW_S32 calculation

Change-Id: I7384dc881451d197404d09beb7c27b222e1d6875
2016-10-10 18:03:28 +05:30
Kaustubh Raste
d05104b488 Optimize vpx_get4x4sse_cs_msa function
Reuse CALC_MSE_B macro

Change-Id: I39f0a92ac2dbb5fa8628df1a5d556cfdc42a3648
2016-10-10 16:31:57 +05:30
Kaustubh Raste
3c2f7eb339 Optimize vp9 loopfilter msa functions
Updated code to process in 8bit as saturation/clipping takes care of
overflow
Removed unused macro

Change-Id: I113df60286fb28b216df800d95b2d3695ef71440
2016-10-07 19:26:26 -07:00
Linfeng Zhang
49aa9b1f12 [vpx highbd lpf NEON 2/6] vertical 4
BUG=webm:1300

Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983
2016-10-06 14:22:26 -07:00
Linfeng Zhang
7aa27bd62f [vpx highbd lpf NEON 1/6] horizontal 4
BUG=webm:1300

Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-06 14:03:04 -07:00
James Zern
1e1caad165 vpx_dsp/idct*_neon.asm: simplify immediate loads
mov supports 0-65535

Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd
2016-10-05 14:28:32 -07:00
James Zern
a6be7ba1aa enable idct*_1_add_neon in high-bitdepth builds
these are compatible as they only load one element of the input so the
larger size of tran_low_t makes no difference in little endian builds.
note the asm is incompatible with big-endian, but there are other points of
failure there so currently it's considered unsupported.

BUG=webm:1294

Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
2016-10-05 11:14:25 -07:00
Angie Chiang
5d635365bb Merge "Move highbd txfm input range check from 2d iht transform to 1d idct/iadst" 2016-10-04 16:57:37 +00:00
Kaustubh Raste
0a92dd7319 Merge "Fix vpx_plane_add_noise_msa functionality bit-mismatch" 2016-10-04 06:35:47 +00:00
Angie Chiang
5b073c695b Move highbd txfm input range check from 2d iht transform to 1d idct/iadst
This change will make the highbd txfm input range check more comprehensive

The 25-bit highbd input range is composed by
12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for
1D inverse transform amplification + 1 bit for contingency in rounding and quantizing

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625

Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7
2016-10-03 17:21:08 -07:00
James Zern
c6bc7499d9 Merge "cosmetics,*_neon.c: rm redundant return from void fns" 2016-10-03 22:40:42 +00:00
Kaustubh Raste
6922fc8230 Fix vpx_plane_add_noise_msa functionality bit-mismatch
Change-Id: I04961afb592ae6a67fdcfd8c9066e920dd4b30e7
2016-10-03 18:15:59 +00:00
James Zern
50b9c467da Merge "vpx_convolve8_neon,load/store*: correct param type" 2016-10-01 23:52:14 +00:00
James Zern
c449983c56 vpx_convolve8_neon,load/store*: correct param type
stride/pitch in convolve is expressed with a ptrdiff_t

Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71
2016-10-01 11:03:29 -07:00
Martin Storsjo
9255328f27 Remove a stray END declaration in loopfilter_4_neon.asm
Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69
2016-10-01 14:12:42 +03:00
Linfeng Zhang
da14d23e44 Merge "Refactor vpx lpf NEON files (step 2/2)" 2016-10-01 00:07:51 +00:00
Linfeng Zhang
edbca72a53 Merge "Refactor vpx lpf NEON files (step 1/2)" 2016-10-01 00:07:31 +00:00
James Zern
db80c23fd4 cosmetics,*_neon.c: rm redundant return from void fns
+ a couple of 'break's after a return

Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015
2016-09-30 13:09:57 -07:00
James Zern
b6277a47c7 Merge changes from topic '8bit-hbd-idct'
* changes:
  *idct*_neon.c: add missing rtcd include
  idct,msa/neon: exclude idct files from hbd build
  *rtcd_defs.pl: remove empty specialize calls
2016-09-30 19:36:08 +00:00
James Zern
1396d12103 *idct*_neon.c: add missing rtcd include
+ correct declarations as necessary

BUG=webm:1294

Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d
2016-09-30 11:41:26 -07:00
James Zern
b51c4df93a idct,msa/neon: exclude idct files from hbd build
these functions are incompatible currently and unreferenced in rtcd,
exclude them from the build.

BUG=webm:1294

Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
2016-09-30 11:32:47 -07:00
Linfeng Zhang
ca2fe7a8c7 Refactor vpx lpf NEON files (step 2/2)
Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f
2016-09-30 09:56:28 -07:00
Linfeng Zhang
4779f5308d Refactor vpx lpf NEON files (step 1/2)
Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb
2016-09-30 09:48:54 -07:00
Linfeng Zhang
8c744fd978 Merge "Unify loopfilter function names" 2016-09-30 15:58:08 +00:00
Linfeng Zhang
c435b7fbdd Merge "Refine vpx convolve8 NEON intrinsics optimization" 2016-09-30 15:56:31 +00:00
Linfeng Zhang
bde905cba1 Merge "Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()" 2016-09-30 15:54:02 +00:00
James Zern
ed62d27c71 *rtcd_defs.pl: remove empty specialize calls
add_proto adds a 'c' specialization

Change-Id: I0ed0c2240d45264b0e0056ce7c8f63f4a00780bc
2016-09-29 20:38:26 -07:00
Linfeng Zhang
7f1f35183a Unify loopfilter function names
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().

Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
2016-09-29 16:25:42 -07:00
Linfeng Zhang
85a9e48d25 Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()
BUG=webm:1290

Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6
2016-09-29 16:19:39 -07:00
Johann Koenig
ad55b1d270 Merge changes Ia3e9122f,Id33eb6c8,I956bd8ce
* changes:
  Remove vp8_clear_system_state
  vpx_dsp: clean up rtcd
  vp8: clean up rtcd
2016-09-29 23:16:45 +00:00
Linfeng Zhang
b3cb065ee4 Refine vpx convolve8 NEON intrinsics optimization
BUG=webm:1290

Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873
2016-09-29 12:48:59 -07:00
Johann
7b5a348088 vpx_dsp: clean up rtcd
Remove avx2+ssse3 specialization. Disabling ssse3 now automatically
disables avx2.

Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765
2016-09-29 12:10:07 -07:00
James Zern
93c823e24b vpx_dsp/get_prob: relocate den == 0 test
to get_binary_prob(). the only other caller mode_mv_merge_probs() does
its own test on 0.

BUG=chromium:639712

Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce
2016-09-28 17:42:49 -07:00
James Zern
7481edb33f vpx_dsp/get_prob: make clip_prob branchless
+ inline the function directly as there was only one consumer
(get_prob())

this is an attempt to reduce the amount of branches to workaround an amd
bug. this change is mildly faster or neutral across x86-64, arm.

http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
665 Integer Divide Instruction May Cause Unpredictable Behavior

BUG=chromium:639712

Suggested-by: Pascal Massimino <pascal.massimino@gmail.com>
Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2
2016-09-28 11:51:46 -07:00
Johann
02fa245d15 mips: clean up wextra warnings
Remove unused zbin variable:
warning: unused parameter ‘zbin’

Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions

Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
2016-09-27 13:19:18 -07:00
Urvang Joshi
0aa3e2564f Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16

Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h

Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.

Does not enable -Wextra yet. There are more issues to fix.

BUG=webm:1069

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-09-27 12:05:01 -07:00
Linfeng Zhang
b46243d7ff Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization" 2016-09-26 16:11:12 +00:00
James Zern
deadda3dea Merge "vpx_idct32x32_34_add_sse2: rm unneeded transposes" 2016-09-23 02:49:26 +00:00
James Zern
fdd1186f97 vpx_idct32x32_34_add_sse2: rm unneeded transposes
this change is neutral to mildly positive across various x86-64
platforms

Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784
2016-09-21 19:49:25 -07:00
James Zern
e372bfd5ac variance_neon: sync variance*() w/c,sse2
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings

Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
2016-09-21 18:04:45 -07:00
Linfeng Zhang
761e5ec2f6 Refactor lpf (size 4 and 8) NEON intrinsics optimization
Also check in 8x8 8-bit transpose NEON intrinsics optimization
transpose_u8_8x8()

Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4
2016-09-19 16:41:37 -07:00
James Zern
6acd061aad variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
2016-09-19 16:19:29 -07:00
James Zern
aa0eb67bf7 loopfilter_mb_neon: remove unused load_8x8()
quiets a -Wunused-function warning for arm targets

Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6
2016-09-17 11:00:31 -07:00
Linfeng Zhang
5d73639d8f Merge "Refactor lpf (size 16) NEON intrinsics optimization" 2016-09-17 00:33:30 +00:00
Linfeng Zhang
8107368000 Refactor lpf (size 16) NEON intrinsics optimization
Extract shared code so later lpf size 4 and 8 functions can reuse.

Change-Id: Ibb43ef1fd8651bd2e32fcc4c56cf6fa7ca237401
2016-09-16 09:12:13 -07:00
James Zern
33aef48f29 vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.

BUG=b/30970831

Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
2016-09-16 07:14:17 +00:00
clang-format
5f6d143b41 apply clang-format
Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487
2016-09-15 15:07:53 -07:00
James Zern
4b0e78bfda Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()" 2016-09-08 01:05:18 +00:00
Scott LaVarnway
309125b1e7 vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()
Change-Id: I140d93aebadb0eaf6220881e61a0451450081227
2016-09-07 05:58:29 -07:00
Johann Koenig
4d1540f8ce Merge changes from topic 'Wundef'
* changes:
  Enable -Wundef by default
  Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY
  Remove CONFIG_DEBUG guards from assert()
  Remove unused function vpx_de_mblock
  Fix -Wundef warning for OUTPUT_FPF
  Fix -Wundef warning for __SANITIZE_ADDRESS__
2016-09-02 01:39:18 +00:00
Johann
24f534ac90 Remove unused function vpx_de_mblock
vpx_config.h was not included so CONFIG_POSTPROC was never defined.

Change-Id: I777de499823afa286734549a8e7f4a93e7ad97f3
2016-08-31 23:01:45 -07:00
Linfeng Zhang
bee7d837ab Update NEON transpose functions.
Unify coding style.

Change-Id: I5826f40c02c882df7353391e0c9dd6cef6bd4b97
2016-08-31 14:58:40 -07:00
Linfeng Zhang
f7cbfed682 Update vpx_lpf_vertical_16_dual_neon() intrinsics
Process 16 samples together.

Change-Id: If6ee8e3377aa2786417f2fc411ba7d87ea8b6799
2016-08-30 11:17:33 -07:00
Linfeng Zhang
4916515511 Update vpx_lpf_horizontal_edge_16_neon() intrinsics
Process 16 samples together.

Change-Id: I9cfbe04c9d25d8b89f63f48f519e812746db754d
2016-08-27 14:47:48 -07:00
Johann Koenig
a70861c435 Merge "Remove halfpix specialization" 2016-08-26 21:28:01 +00:00
James Zern
3ddff4503a add_noise,vpx_setup_noise: correct 'char_dist' type
fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char.
visibly broken since:
0dc69c7 postproc : fix function parameters for noise functions.
where the types diverged (char vs. int8)
but likely the return changed in:
2ca24b0 postproc - move filling of noise buffer to vpx_dsp.
when multiple implementations were merged.

Change-Id: I176ca1f170217f05ba7872b0c4de63e41949e999
2016-08-24 21:46:26 -07:00
Johann
d393885af1 Remove halfpix specialization
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"

Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.

BUG=webm:1273

Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
2016-08-23 17:05:39 -07:00
Linfeng Zhang
f9efbad392 NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon()
Also expose the NEON intrinsics version.

BUG=webm:1261, webm:1266.

Change-Id: I8c4ae658467dcf66ebf7a75982b2ef712dbb4535
2016-08-16 08:50:57 -07:00
James Zern
dfcefe06fa Merge "variance_impl_avx2: restore table layout" 2016-08-12 23:02:27 +00:00
James Zern
bd7cfb46fb variance_impl_avx2: restore table layout
disable clang-format for bilinear_filters_avx2

restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format

Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
2016-08-12 11:52:53 -07:00
Linfeng Zhang
f09b5a3328 NEON intrinsics for 4 loopfilter functions
New NEON intrinsics functions:
vpx_lpf_horizontal_edge_8_neon()
vpx_lpf_horizontal_edge_16_neon()
vpx_lpf_vertical_16_neon()
vpx_lpf_vertical_16_dual_neon()

BUG=webm:1262, webm:1263, webm:1264, webm:1265.

Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb
2016-08-12 09:58:17 -07:00
Johann Koenig
57f49db81f Merge changes I6ef79702,Id332c641,I354b5d22,I84438013
* changes:
  Use common transpose for vpx_idct32x32_1024_add_neon
  Use common transpose for vpx_idct8x8_[12|64]_add_neon
  Use common transpose for vp9_iht8x8_add_neon
  Use common transpose for vpx_idct16x16_[10|256]_add_neon
2016-08-04 22:30:47 +00:00
Johann Koenig
17720b60bb Merge "Remove armv6 target" 2016-08-04 22:21:13 +00:00
James Zern
7f7c888c14 Merge "correct break placement" 2016-08-04 22:19:30 +00:00
Johann
0325b95938 Use common transpose for vpx_idct32x32_1024_add_neon
Change-Id: I6ef7970206d588761ebe80005aecd35365ec50ff
2016-08-04 20:13:18 +00:00