generic-library/vpx

Author	SHA1	Message	Date
Linfeng Zhang	81914ce68a	Add vpx_highbd_idct16x16_38_add_neon() BUG=webm:1301 Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe	2017-02-15 09:12:02 -08:00
Linfeng Zhang	e07e74fb0f	Add vpx_highbd_idct16x16_38_add_c() When eob is less than or equal to 38 for high-bitdepth 16x16 idct, call this function. BUG=webm:1301 Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060	2017-02-14 17:25:52 -08:00
Linfeng Zhang	5ad4159ebb	Add vpx_highbd_idct16x16_256_add_neon() BUG=webm:1301 Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec	2017-02-13 15:50:33 -08:00
Linfeng Zhang	016933ad48	Add vpx_highbd_idct{16x16,32x32}_1_add_neon() and update vpx_highbd_idct8x8_1_add_neon() BUG=webm:1301 Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4	2017-02-13 10:25:22 -08:00
Linfeng Zhang	bc1c18e18c	Add vpx_idct16x16_38_add_neon() The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of pass 2. Change to use saturating add/sub for both vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high bitdepth. Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712	2017-02-08 12:15:22 -08:00
Linfeng Zhang	cf76ee2cb7	Add vpx_idct16x16_38_add_c() When eob is less than or equal to 38 for 16x16 idct, call this function. Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087	2017-02-07 09:40:51 -08:00
Jingning Han	bb40844e32	Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"	2017-02-02 22:18:32 +00:00
Johann Koenig	726556dde9	Merge "Remove neon assembly for idct 16x16 and 8x8"	2017-02-02 03:25:31 +00:00
Jingning Han	8f95389742	Add SSSE3 intrinsic 8x8 inverse 2D-DCT The intrinsic version reduces the average cycles from 183 to 175. Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03	2017-02-01 14:47:53 -08:00
Johann	270fadc135	PartialIDctTest: reduce number of RunQuantCheck iterations This currently runs 1000 * 1000 = one million times which is quite unnecessary. It's one of the slowest items in Jenkins and takes over an hour for each of the larger transforms. Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db	2017-01-23 13:32:09 -08:00
Johann	13234d3c43	Remove neon assembly for idct 16x16 and 8x8 Tested using test/partial_idct_test.cc:DISABLED_Speed Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements using the intrinsics (<function>: <clang asm> / <gcc asm> / <clang intrin> / <gcc intrin>): idct16x16_256: 1720ms / 1703ms / 1546ms / 1554ms; idct16x16_10: 1320ms / 1247ms / 518ms / 488ms; idct16x16_1: 107ms / 108ms / 64ms / 68ms; idct8x8_64: 924ms / 931ms / 866ms / 989ms; idct8x8_12: 826ms / 824ms / 519ms / 514ms; idct8x8_1: 172ms / 166ms / 110ms / 125ms. idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics) but as a counter example idct16x16_10 goes from ~1300ms to ~500ms. On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly stable. BUG=webm:1303 Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4	2017-01-19 12:27:31 -08:00
Kaustubh Raste	6377f9d966	Add mips dspr2 partial idct tests Change-Id: Idf4003ea6f9a2a42a9f26e156bee73697acb7a37	2017-01-09 17:30:16 +05:30
Linfeng Zhang	9b187954df	Add high bitdepth 8x8 idct NEON intrinsics BUG=webm:1301 Change-Id: I56e3bc3aab9214e2debac93796389a7194991084	2016-12-27 16:28:53 -08:00
Linfeng Zhang	c8f25fa5c0	Clean hbd idct 4x4 neon functions and other BUG=webm:1301 Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342	2016-12-14 11:38:28 -08:00
Linfeng Zhang	201dcefafe	Update idct test code to test 8-bit & high bitdepth simultaneously Change-Id: Icc0eb9c0ddf2a13ec832877a089450972134e8ec	2016-12-13 17:25:04 -08:00
Linfeng Zhang	834feffe08	Update TEST_P(PartialIDctTest, RunQuantCheck) 1. Use correct projections when copying real dct/quant outputs. 2. Remove local random number generator and combine loops. 3. Quantization with minimum allowed step sizes instead of maximum. This may generate larger inputs. Change-Id: I154afc26230c894d564671cff4b8fd5485b69598	2016-12-07 11:34:00 -08:00
Linfeng Zhang	17a8cf5cc3	Add high bitdepth 4x4 idct NEON intrinsics Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b	2016-11-30 13:07:13 -08:00
Linfeng Zhang	6cc76ec73f	Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10	2016-11-22 11:35:05 -08:00
Linfeng Zhang	45876b4550	Add idct speed test. Change-Id: I3b5fd3b36cac1fb3a93e27fd8fd0781c91d412ce	2016-11-22 11:19:24 -08:00
Linfeng Zhang	d479c9653e	Update partial_idct_test.cc to support high bitdepth BUG=webm:1301 Change-Id: Ieedadee221ce539e39bf806c41331f749f891a3c	2016-11-22 11:11:58 -08:00
James Zern	f6921412d4	partial_idct_test: s/SingleLargeCoef/SingleExtremeCoeff/ tests with 'Large' in the name are reserved for slow running tests which may not be run on all platforms Change-Id: I2a7d6dd46b29b50469893e46433844132fb727c2	2016-11-17 12:28:57 -08:00
James Zern	2218a4c292	partial_idct_test: use <limits> for int16_min/max this removes the need for __STDC_LIMIT_MACROS which is defined in vpx_integer.h, but may be preceded by earlier includes of stdint.h; fixes build with the r13 ndk Change-Id: I3950c8837cf90d5584a20ce370ae370581c2182c	2016-11-15 12:18:38 -08:00
James Zern	c344dee463	partial_idct_test,NEON: add missing idct variants idct4x4 and idct8x8 were universally enabled for high-bitdepth builds in: `3ae2597` idct,NEON: add a tran_low_t->s16 load adapter BUG=webm:1294 Change-Id: If142afb169c48728cc4b222e7c41aa4a63f95f0f	2016-11-08 18:29:35 -08:00
James Zern	738c8f23c6	enable vpx_idct32x32_34_add_neon in hbd builds replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is used in idct32_8_neon() where the input is the correctly sized output from the earlier stage. BUG=webm:1294 Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9	2016-11-08 17:03:36 -08:00
Johann	50b40f114c	Optimize idct32x32_135_add for NEON BUG=webm:1295 Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462	2016-11-08 22:06:07 +00:00
James Zern	40bcb96abd	partial_idct_test: set MinSupportedCoeff for NEON vpx_idct4x4_16_add_neon fails with INT16_MIN, +1 is all right BUG=webm:1335 Change-Id: I25830c8ab0782822fc3c9db6cc669c2e65f2700e	2016-11-07 15:47:09 -08:00
Johann	e851160642	idct test: use coeff consistently Change-Id: I913a13066993a3315a0ff8310b3cad1572d4cdd7	2016-11-04 18:41:59 -07:00
Johann	9ad3e14015	partial_idct_test: Add large coefficient test Two functions do not pass this test: vpx_idct8x8_64_add_ssse3 vpx_idct8x8_12_add_ssse3 The test has been modified to avoid triggering an issue with those functions but they still must be investigated. BUG=webm:1332 Change-Id: I52569a81e8e6e0b33c4a4d060d0b69c3fc4f578e	2016-11-04 18:37:58 -07:00
Johann	7994dba6c0	partial_idct_test: add _add_ test The result of the transform is added to the destination buffers. In the existing tests the destination buffer is always empty so that portion of the code was never exercised. Change-Id: I1858c4fed2274f1b9faf834d2ba4186a4510492a	2016-10-26 21:35:49 -07:00
Johann	ed2c240538	partial_idct_test: consolidate block size Use *input_block_ for sizeof() calculation like the other test Change-Id: I1e4bd227131662056405af78c5052ad6ef769e9f	2016-10-26 21:35:03 -07:00
Johann	08e0da30ca	Refactor partial idct test Switch to using correctly sized inputs and outputs. This simplifies adding tests with varying strides. Change-Id: I716a0d8173dcf6a86d56656ac9d3101b7ec27642	2016-10-26 12:28:18 -07:00
Johann	9720b58aac	Optimize idct32x32_34_add for NEON Approximately 3 times faster than the 1024 version which was used previously. BUG=webm:1295 Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9	2016-10-25 15:43:58 -07:00
James Zern	a6be7ba1aa	enable idct*_1_add_neon in high-bitdepth builds these are compatible as they only load one element of the input so the larger size of tran_low_t makes no difference in little endian builds. note the asm is incompatible with big-endian, but there are other points of failure there so currently it's considered unsupported. BUG=webm:1294 Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028	2016-10-05 11:14:25 -07:00
Johann	24c0146403	Connect partial IDCT tests Change-Id: Ie8d5d9123f5a9d39db4ec9c74f77ee979ae4e685	2016-10-04 10:31:01 -07:00
clang-format	9c9d92ae3a	test: apply clang-tidy google-readability-braces-around-statements applied against a x86_64 configure with and without --enable-vp9-highbitdepth clang-tidy-3.7.1 \ -checks='-,google-readability-braces-around-statements' \ -header-filter='.' -fix + clang-format afterward Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d	2016-08-05 20:02:28 -07:00
clang-format	33e40cb5db	test: apply clang-format Change-Id: I0d9ab85855eb723f653a7bb09b3d0d31dd6cfd2f	2016-07-27 01:58:52 +00:00
Johann	0266e70c52	test: remove x86inc.asm distinction BUG=b:29583530 Change-Id: I296a0b81755e3086bc0a40cb126d0200ff03c095	2016-06-30 11:14:10 -07:00
Jingning Han	08a453b9de	Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names This commit clears the function naming convention in vpx_dsp. It replaces vp9_ prefix of global functions with vpx_ prefix. It also removes the vp9_ prefix from static functions. Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf	2015-08-04 13:46:11 -07:00
Jingning Han	097d59c28c	Cosmetics - Fix header file order in unit tests Change-Id: I9582a8d74990125b71e8fe620f7f3f2585a30798	2015-07-29 20:48:25 -07:00
Jingning Han	4b5109cd73	Replace vp9_ prefix in 2D-DCT functions with vpx_ Clean up the forward 2D-DCT function names in vpx_dsp. Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25	2015-07-28 16:06:44 -07:00
Jingning Han	b67821f37b	Factor forward 2D-DCT transforms into vpx_dsp This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward transform operations into vpx_dsp folder. Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d	2015-07-22 15:48:17 -07:00
Johann	ff8505a54d	Fix --disable-use-x86inc Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06	2015-07-01 13:15:51 -07:00
Parag Salasakar	54a6f73958	mips msa vp9 idct4x4 and iwht4x4 optimization average improvement ~3x-4x moved assert to respective files Change-Id: I6c915059d456a00bdd76fab0dd2eede8b6c6ea58	2015-06-02 12:16:28 +05:30
Parag Salasakar	6af9d7f2e2	mips msa vp9 updated idct 8x8, 16x16 and 32x32 module Updated sources according to improved version of common MSA macros. Enabled idct MSA hooks and tests. Overall, this is just upgrading the code with styling changes. Change-Id: I1f488ab2c741f6c622b7a855388a202168082209	2015-06-01 09:24:23 +05:30
Parag Salasakar	f9f078ebb6	mips msa vp9 updated macros and disable all MSA functions Done little restructuring/styling changes to the sources like generic macro definitions, their use to reduce code lines, better code alignments etc. Disabled all MSA hooks and tests Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190	2015-05-29 13:34:33 +05:30
Parag Salasakar	7c5f00f868	mips msa vp9 idct 8x8 optimization average improvement ~4x-6x Change-Id: I5edf713721b9e24c7e0ce2e69d8fc3ecab625d91	2015-05-08 12:23:27 +05:30
Parag Salasakar	a8a9c2bb45	Merge "mips msa vp9 idct 32x32 optimization"	2015-05-08 04:27:44 +00:00
James Zern	fd3658b0e4	replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79	2015-05-07 11:55:08 -07:00
Parag Salasakar	1601c1385a	mips msa vp9 idct 32x32 optimization average improvement ~4x-6x Change-Id: Idaba7e49fbd7f388caee0d73773ccf6e4807ef17	2015-05-07 12:42:23 +05:30
Parag Salasakar	60052b618f	mips msa vp9 idct 16x16 optimization average improvement ~4x-6x Change-Id: I55e95b7f2ba403dff11813958dc7c73a900dd022	2015-05-05 12:37:06 +05:30

70 Commits