473 Commits

Author SHA1 Message Date
Kaustubh Raste
a38e9f412d Merge "Fix SingleLargeCoeff idct test" 2016-11-19 03:37:29 +00:00
James Zern
cbeae53e76 Merge "Clean horizontal intra prediction NEON optimization" 2016-11-19 01:29:37 +00:00
Jerome Jiang
de5fd00ec5 Change *_xmm to *_sse2 in deblocker assembly functions.
Some cosmetic changes because xmm is an anachronism.

Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7
2016-11-17 23:38:04 +00:00
Kaustubh Raste
c56e5dd620 Fix SingleLargeCoeff idct test
Updated idct code to handle single large coefficient (-32768)

Change-Id: Ia13ab1ab434a9a1b9954a5914088977a88841cc7
2016-11-17 11:41:07 +00:00
Jerome Jiang
5d48663e04 Merge "Change C and msa to match results from sse2." 2016-11-17 05:16:27 +00:00
Jerome Jiang
cb1b1b8fef Change C and msa to match results from sse2.
Re-enable the tests to check CvsAssembly.
BUG=webm:1321

Change-Id: Id7f7d74b06c469fb6c8f5d04e91359e9cd9097a6
2016-11-16 17:05:26 -08:00
Linfeng Zhang
85c1ee434d Add high bitdepth intra prediction NEON optimization (mode tm)
BUG=webm:1316

Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Linfeng Zhang
a3128ad33a Add high bitdepth intra prediction NEON optimization (h and v)
BUG=webm:1316

Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7 Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
* changes:
  fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
  fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
  partial_idct_test,NEON: add missing idct variants
  enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4 Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
BUG=webm:1316

Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6 enable vpx_idct32x32_34_add_neon in hbd builds
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.

BUG=webm:1294

Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c Optimize idct32x32_135_add for NEON
BUG=webm:1295

Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f Merge "Add high bitdepth intra prediction NEON optimization (mode dc)" 2016-11-08 16:53:42 +00:00
Linfeng Zhang
d545c19afa Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
Also update its trigger threshold from 10 to 12.

Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
Linfeng Zhang
a9874961f0 Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift" 2016-11-07 16:55:01 +00:00
Johann
e10c95dc83 Update vp9_fdct8x8_quant_ssse3 for highbitdepth
Borrow transition functions from fdct.h née vpx_quantize_b_sse2

BUG=webm:1304

Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-11-05 01:23:07 +00:00
Linfeng Zhang
04c3bf3c85 Replace highbd_dct_const_round_shift with dct_const_round_shift
They are identical.

Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61
2016-11-04 16:15:02 -07:00
Linfeng Zhang
32326c2f13 Merge "Cosmetics of inv_txfm.c" 2016-11-04 22:40:03 +00:00
Johann Koenig
900ec31bea Merge "Extract high bit depth helper functions" 2016-11-04 21:03:17 +00:00
Linfeng Zhang
b68d8107cb Cosmetics of inv_txfm.c
Unify code of 8-bit and high bitdepth.

Change-Id: I3fe441577af0249030ca3a1ef769eb9030711434
2016-11-04 13:24:41 -07:00
Johann
cf35ffc025 Extract high bit depth helper functions
These can be used in the vp9 fdct as well.

Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e
2016-11-04 18:13:51 +00:00
Martin Storsjo
34c35b6fb6 Add a missing END directive in idct_neon.asm
This fixes building with MS armasm.

Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Linfeng Zhang
1338c71dfb Clean horizontal intra prediction NEON optimization
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Johann Koenig
5ac7a59a05 Merge "arm idct: move to-be-shared code to header" 2016-11-02 18:09:45 +00:00
Linfeng Zhang
3b74066b10 Add high bitdepth intra prediction NEON optimization (mode dc)
BUG=webm:1316

Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee arm idct: move to-be-shared code to header
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898 Merge "idct32x32_1_add_neon: clear a couple conv warnings" 2016-11-01 22:34:59 +00:00
James Zern
9de91855ef Merge changes I08af3a54,If5959a25,I6763e62e
* changes:
  build/make/Android.mk: s/armv8/arm64/
  build/make/Android.mk: fix armeabi-v7a build
  use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
cc5f49767a Refine 8-bit intra prediction NEON optimization (mode tm)
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
James Zern
7625c803b3 idct32x32_1_add_neon: clear a couple conv warnings
int16_t -> uint8_t

Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
1ddb4c0362 use .S suffix rather than .s for NEON asm
for compatibility with other build systems

Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
James Zern
410d947c5f Merge "idct,NEON: add a tran_low_t->s16 load adapter" 2016-10-31 21:59:12 +00:00
James Zern
3ae25974fd idct,NEON: add a tran_low_t->s16 load adapter
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16; whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.

BUG=webm:1294

Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
Linfeng Zhang
a347118f3c Refine 8-bit intra prediction NEON optimization (mode h and v)
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Linfeng Zhang
4ae9f5c092 Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
dst += stride behaves better with gcc/clang.
Unroll loops.

Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00
Linfeng Zhang
9c0680bd43 Merge "Refine 8-bit intra prediction NEON optimization (mode dc)" 2016-10-26 16:51:44 +00:00
Johann
9720b58aac Optimize idct32x32_34_add for NEON
Approximately 3 times faster than the 1024 version which was used
previously.

BUG=webm:1295

Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
Linfeng Zhang
ce88b8f5c5 Refine 8-bit intra prediction NEON optimization (mode dc)
dst += stride behaves better with gcc/clang.
Expanding inline function dc_SIZExSIZE() saves instructions for
vpx_dc_predictor_SIZExSIZE_neon().

Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-24 13:18:51 -07:00
James Zern
2e6a1976a0 Merge "remove idct32x32*_add_neon.asm" 2016-10-22 02:29:56 +00:00
James Zern
5d91752a98 Merge "vpx_highbd_convolve_copy_neon: use multi reg loads" 2016-10-22 02:28:15 +00:00
James Zern
9dbb3ad396 remove idct32x32*_add_neon.asm
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.

BUG=webm:1303

Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-20 19:47:14 -07:00
James Zern
a60dd5c83a Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory" 2016-10-18 22:09:29 +00:00
Kaustubh Raste
8ff5af773a Merge "Optimize sad_64width_x4d_msa function" 2016-10-18 07:46:02 +00:00
Kaustubh Raste
b7310e2aff Optimize sad_64width_x4d_msa function
Reduced HADD_UH_U32 macro calls

Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa
2016-10-18 04:05:33 +00:00
Urvang Joshi
e084e05484 Fix warnings reported by -Wshadow: Part1: vpx_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 19:25:19 -07:00
James Zern
68cd3052ca vpx_highbd_convolve_copy_neon: use multi reg loads
for copy16/32/64

BUG=webm:1299

Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf
2016-10-17 17:15:03 -07:00
Linfeng Zhang
9c8981c666 add vpx high bitdepth convolve8 NEON intrinsics optimization
BUG=webm:1299

Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
f910d14a1a add vpx_highbd_convolve_{copy,avg}_neon()
BUG=webm:1299

Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
James Zern
1909270f65 Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/" 2016-10-13 07:12:51 +00:00
Kaustubh Raste
9e75c01353 Merge "Optimize vpx_mbpost_proc_across_ip_msa function" 2016-10-13 02:12:33 +00:00