generic-library/vpx

Author	SHA1	Message	Date
Johann	327a02d77e	Use 'packssdw' for loading tran_low_t values This matches bitdepth_conversion_sse2.asm and produces substantially better assembly. The old way had lots of 'movzwl' and 'shl' and storing back to memory before loading into an xmm register. Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b	2017-02-14 22:39:49 +00:00
clang-format	4b402746ca	apply clang-format Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce	2017-02-14 12:45:52 -08:00
Yi Luo	bd86de1ac8	Replace idct32x32_34_add_ssse3 assembly with intrinsics - No user-level speed performance change. - Pass unit tests. Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6	2017-02-14 10:38:36 -08:00
Yi Luo	ac04d11abc	Replace idct8x8_12_add_ssse3 assembly code with intrinsics - Performance achieves the same as assembly. - Unit tests pass. Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd	2017-02-08 10:07:45 -08:00
Johann	641fda79bb	highbd x86: consolidate tran_low_t conversions Create new helper files specifically for converting tran_low_t types. Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905	2017-02-06 10:43:26 -08:00
Jingning Han	bb40844e32	Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"	2017-02-02 22:18:32 +00:00
Johann Koenig	ce6318f254	Merge changes I43521ad3,I013659f6 * changes: satd highbd neon: use tran_low_t for coeff satd highbd sse2: use tran_low_t for coeff	2017-02-02 03:03:58 +00:00
Jingning Han	8f95389742	Add SSSE3 intrinsic 8x8 inverse 2D-DCT The intrinsic version reduces the average cycles from 183 to 175. Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03	2017-02-01 14:47:53 -08:00
Johann	2ba383474d	satd highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb	2017-02-01 11:55:16 -08:00
Johann	0f751ecee3	hadamard highbd ssse3: use tran_low_t for coeff BUG=webm:1365 Change-Id: I374dfc08732932382043905f128e928b08cb4f57	2017-02-01 11:51:15 -08:00
Johann	2dac808dd1	hadamard highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934	2017-02-01 11:46:57 -08:00
Johann	dcfff3ccc8	quantize ssse3: remove unused pxor Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937	2017-01-30 17:02:57 -08:00
Jingning Han	39fff1bea0	Rework 8x8 transpose SSSE3 for avg computation Use same transpose process as inv_txfm_sse2 does. Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c	2017-01-12 15:16:07 -08:00
Jingning Han	f65170ea84	Rework 8x8 transpose SSSE3 for inverse 2D-DCT Use same transpose process as inv_txfm_sse2 does. Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94	2017-01-12 15:13:18 -08:00
Jingning Han	9a780fa7db	Rework forward 8x8 2D-DCT ssse3 implementation This commit reworks the SSSE3 implementation of the forward 8x8 2D-DCT. It uses a cyclic rotation approach to the temporary xmm registers. It reduces the average cycles from 158 to 154. The SSE2 version uses 169 cycles. Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa	2017-01-10 12:50:55 -08:00
Linfeng Zhang	c8f25fa5c0	Clean hbd idct 4x4 neon functions and other BUG=webm:1301 Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342	2016-12-14 11:38:28 -08:00
Linfeng Zhang	264f6e70ec	Update idct x86 intrinsics to not use saturated add and sub Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4	2016-11-29 17:06:08 -08:00
Jerome Jiang	de5fd00ec5	Change _xmm to _sse2 in deblocker assembly functions. Some cosmetic changes because xmm is an anachronism. Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7	2016-11-17 23:38:04 +00:00
Linfeng Zhang	d545c19afa	Rename vpx_highbd_idct8x8_10{}() to vpx_highbd_idct8x8_12{}() Also update its trigger threshold from 10 to 12. Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f	2016-11-07 09:07:55 -08:00
Linfeng Zhang	a9874961f0	Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift"	2016-11-07 16:55:01 +00:00
Johann	e10c95dc83	Update vp9_fdct8x8_quant_ssse3 for highbitdepth Borrow transition functions from fdct.h nee vpx_quantize_b_sse2 BUG=webm:1304 Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1	2016-11-05 01:23:07 +00:00
Linfeng Zhang	04c3bf3c85	Replace highbd_dct_const_round_shift with dct_const_round_shift They are identical. Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61	2016-11-04 16:15:02 -07:00
Johann	cf35ffc025	Extract high bit depth helper functions These can be used in the vp9 fdct as well. Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e	2016-11-04 18:13:51 +00:00
Urvang Joshi	e084e05484	Fix warnings reported by -Wshadow: Part1: vpx_dsp directory While we are at it: - Rename some variables to more meaningful names - Reuse some common consts from a header instead of redefining them. Change-Id: I75c4248cb75aa54c52111686f139b096dc119328 (cherry picked from aomedia 09eea21)	2016-10-17 19:25:19 -07:00
Linfeng Zhang	9c8981c666	add vpx high bitdepth convolve8 NEON intrinsics optimization BUG=webm:1299 Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1	2016-10-17 15:23:54 -07:00
Linfeng Zhang	7f1f35183a	Unify loopfilter function names Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16(). Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual(). Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52	2016-09-29 16:25:42 -07:00
Urvang Joshi	0aa3e2564f	Add compiler warning flag -Wextra and fix related warnings. Note: some of these warnings are enabled by a combination of -Wunused (added earlier) and -Wextra. Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16 Expands use of (void)x; on unused variables. AOM only supports one codec in codec_factory.h Does not include changes to HandleDecodeResult. AOM removed invalid_file_test.cc which does use the video parameter. Does not enable -Wextra yet. There are more issues to fix. BUG=webm:1069 Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf	2016-09-27 12:05:01 -07:00
James Zern	fdd1186f97	vpx_idct32x32_34_add_sse2: rm unneeded transposes this change is neutral to mildly positive across various x86-64 platforms Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784	2016-09-21 19:49:25 -07:00
James Zern	6acd061aad	variance_avx2: sync variance functions with c-code add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78	2016-09-19 16:19:29 -07:00
James Zern	33aef48f29	vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang assume __clang_major__==0 has the latest version of _mm256_broadcastsi128_si256. fixes builds with custom clang toolchains. BUG=b/30970831 Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6	2016-09-16 07:14:17 +00:00
clang-format	5f6d143b41	apply clang-format Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487	2016-09-15 15:07:53 -07:00
James Zern	4b0e78bfda	Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()"	2016-09-08 01:05:18 +00:00
Scott LaVarnway	309125b1e7	vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2() Change-Id: I140d93aebadb0eaf6220881e61a0451450081227	2016-09-07 05:58:29 -07:00
Johann	d393885af1	Remove halfpix specialization This function only exists as a shortcut to subpixel variance with predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical and both for "hv" Removing this allows the existing optimizations for the variance functions to be called. Instead of having only sse2 optimizations, this gives sse2, ssse3, msa and neon. BUG=webm:1273 Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6	2016-08-23 17:05:39 -07:00
James Zern	bd7cfb46fb	variance_impl_avx2: restore table layout disable clang-format for bilinear_filters_avx2 restores the row layout prior to: `099bd7f` vpx_dsp: apply clang-format but keeps the justification used by clang-format Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7	2016-08-12 11:52:53 -07:00
Alex Converse	c0241664aa	Resolve -Wshorten-64-to-32 in variance. The subtrahend is small enough to fit into uint32_t. Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f	2016-07-28 10:16:31 -07:00
clang-format	956af1d478	vpx_dsp/x86/quantize_sse2.c: apply clang-format post: `e429080` .clang-format: disable DerivePointerAlignment Change-Id: I21a0546668edb2b09660e216d4875a1d2ad24d53	2016-07-27 21:41:18 -07:00
clang-format	099bd7f07e	vpx_dsp: apply clang-format Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975	2016-07-25 14:14:19 -07:00
Ivan Krasin	91369fd9b7	Fix compilation error under Clang 4.0. The LLVM trunk has reached 4.0 and now __clang_major__ is not enough to distinguish between old XCode Clang and the new 'real' Clang. Using __apple_build_version__ allows to make this distinction. BUG=chromium:631144 Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9	2016-07-25 19:18:49 +00:00
Jim Bankoski	0dc69c70f7	postproc : fix function parameters for noise functions. Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c	2016-07-15 08:27:34 -07:00
Jim Bankoski	88e6951465	deblock filter : moved from vp8 code branch The deblocking filters used in vp8 have been moved to vpx_dsp for use by both vp8 and vp9. Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d	2016-07-12 05:53:00 -07:00
Jingning Han	7c1fdf02cd	Merge "Support measure distortion in the pixel domain"	2016-07-07 18:09:20 +00:00
Jingning Han	e357b9efe0	Support measure distortion in the pixel domain Use pixel domain distortion metric in speed 0. This improves the compression performance by 0.3% for both low and high resolution test sets. Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d	2016-07-06 18:25:17 -07:00
James Zern	5afa3b9150	Merge "improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw"	2016-07-02 03:08:33 +00:00
James Zern	3197172405	Merge "Update vpx subpixel 1d filter ssse3 asm"	2016-07-02 03:08:17 +00:00
Johann	1b833d63d9	vpx_dsp: remove x86inc.asm distinction BUG=b:29583530 Change-Id: I397d77536b0d3cee0a92cdfe8b76bc4e434d0720	2016-06-29 18:55:58 -07:00
James Zern	3a6a81fc9a	Merge changes I9433d858,Iafd05637,If08ce6ca * changes: tests: remove redundant round() definition remove visual studio < 2010 workarounds configure: remove old visual studio support (<2010)	2016-06-29 23:07:16 +00:00
Linfeng Zhang	6b350766bd	Update vpx subpixel 1d filter ssse3 asm Speed test shows the new vertical filters have degradation on Celeron Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control the vertical filters activated code. Now just simply active the code without degradation on Celeron. Later there should be 2 set of vertical filters ssse3 functions, and let jump table to choose based on CPU type. Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87	2016-06-29 13:48:41 -07:00
Yaowu Xu	63a37d16f3	Prevent negative variance Due to rounding, hbd variance may become negative. This commit put in check and clamp of negative values to 0. Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94	2016-06-29 11:08:17 -07:00
James Zern	c125f4a594	remove visual studio < 2010 workarounds BUG=b/29583530 Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a	2016-06-28 20:58:49 -07:00
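
The "Prevent negative variance" entry above (63a37d16f3) describes checking for a negative high-bitdepth variance and clamping it to 0. A minimal sketch of that idea in C follows. The function name `clamped_variance` and its parameters are hypothetical and are not taken from the commit; the sketch assumes the variance is computed as the sum of squared errors minus the squared sum divided by the pixel count, and that the two inputs may have been rounded independently beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the libvpx function touched by the commit.
 * Computes sse - sum^2 / n in 64-bit signed arithmetic and clamps a
 * negative result to 0 before narrowing to uint32_t. */
static uint32_t clamped_variance(uint32_t sse, int64_t sum, int n) {
  assert(n > 0); /* n is the number of pixels in the block */
  const int64_t var = (int64_t)sse - (sum * sum) / n;
  return var >= 0 ? (uint32_t)var : 0;
}
```

Doing the subtraction in a signed 64-bit type is what makes the check possible: in unsigned arithmetic a slightly negative result would wrap around to a very large value instead.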


241 Commits