generic-library/vpx

Author	SHA1	Message	Date
James Zern	73d1236384	inv_txfm_vsx.c: make code c90 compatible; move for loop declarations to function scope Change-Id: I84d92a1a6ca6c5ac30aacb0f55d87ca3aef4c98f	2018-02-01 19:40:28 -08:00
Linfeng Zhang	b14b616d96	Update vp9_iht8x8_64_add_neon() Change-Id: Ie70ed8b9273df5e1fd06bc93cb469e80630941d2	2018-01-29 15:17:08 -08:00
Linfeng Zhang	884d1681f8	Clean dct_const_round_shift() related neon code Change-Id: I8f4e0fc6ecb77b623519f2dd3cd2886f89218ddd	2018-01-29 10:23:24 -08:00
Linfeng Zhang	2654afc16c	Merge "cosmetic: clean idct neon functions"	2018-01-29 17:34:11 +00:00
Scott LaVarnway	15b261d854	Merge "BUG FIX: sse2 subpel variance is not PIC compliant"	2018-01-24 22:54:42 +00:00
Linfeng Zhang	6248f0c91f	cosmetic: clean idct neon functions Change-Id: I9c7c52567850aded0437b13ba1260e94441bc49d	2018-01-24 13:55:15 -08:00
Scott LaVarnway	cb9f4dc105	BUG FIX: sse2 subpel variance is not PIC compliant BUG=webm:1464 Change-Id: Ibc15bac54aaf509365bed5892a26a29972ad3540	2018-01-24 05:58:54 -08:00
Scott LaVarnway	b9e44842fc	Merge "vp9_quantize_fp_avx2()"	2018-01-24 13:58:08 +00:00
Linfeng Zhang	231012fdab	Add vp9_highbd_iht16x16_256_add_sse4_1() BUG=webm:1413 Change-Id: I8d7eeae1bd219eb848c1a86071046a477f7a91af	2018-01-23 11:24:42 -08:00
Linfeng Zhang	8f50e06012	Add "vpx_" prefix to 2 idct x86 functions Change-Id: I4f3052d8748e16b06e9155f8daf22f867dfaa7a3	2018-01-23 09:17:38 -08:00
Linfeng Zhang	6fea41abee	Merge "Add vp9_highbd_iht8x8_64_add_sse4_1()"	2018-01-23 17:04:20 +00:00
Linfeng Zhang	9874ec07bd	Add vp9_highbd_iht8x8_64_add_sse4_1() BUG=webm:1413 Change-Id: Id9038226902b2d793fc6c17ac81bb104c1a18988	2018-01-18 15:49:44 -08:00
Scott LaVarnway	c7449b482c	vp9_quantize_fp_avx2() Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b	2018-01-18 13:33:30 -08:00
Johann	97acbbb701	clang-format v5.0.0 vpx_dsp/ Remove comments above #define statements because they get indented unnecessarily. https://bugs.llvm.org/show_bug.cgi?id=35930 Add blank lines to prevent comments from being treated as blocks. Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705	2018-01-18 12:37:50 -08:00
Johann	f5b2dd2a66	adopt some clang 5.0.0 formatting At least the changes that don't conflict with 4.0.1 Change-Id: I9b6a7c14dadc0738cd0f628a10ece90fc7ee89fd	2018-01-11 12:35:24 -08:00
Linfeng Zhang	e20ca4fead	Add vp9_highbd_iht4x4_16_add_sse4_1() BUG=webm:1413 Change-Id: I14930d0af24370a44ab359de5bba5512eef4e29f	2018-01-08 10:14:20 -08:00
Linfeng Zhang	7a41610581	Update dct_test.cc Make 8-bit functions testing available in high bitdepth. Change-Id: Ic030c75aa4c6b649c52426abb4bb2122882de0fe	2018-01-08 10:07:38 -08:00
Linfeng Zhang	867b593caa	Update iadst4_sse2() Change-Id: I21ff81df0d6898170a3b80b3b5220f9f3ac7f4e8	2017-12-28 16:47:57 -08:00
Johann	e4b3f03c64	add copyright to rtcd files Allows them to pass the license check in chromium. BUG=chromium:98319 Change-Id: Iefc1706152a549d8c4ae774c917596bf1c9492d8	2017-12-14 22:50:08 +00:00
Shiyou Yin	90ce21e519	Merge "vpx_dsp: [loongson] optimize variance v2."	2017-12-04 01:30:06 +00:00
Johann	bdbecea1ba	explicitly label .text sections nasm should infer .text but does not for windows: https://bugzilla.nasm.us/show_bug.cgi?id=3392451 Change-Id: Ib195465e5f33405f5ff61c4cf88aa2a72640cacb	2017-12-01 14:33:04 -08:00
Shiyou Yin	298f5ca47d	vpx_dsp: [loongson] optimize variance v2. 1. Delete unnecessary zero setting process. 2. Optimize the method of calculating SSE in vpx_varianceWxH. Change-Id: I58890c6a2ed1543379acb48e03e620c144f6515f	2017-12-01 13:44:48 +08:00
Kaustubh Raste	8099220e6c	Merge "mips msa optimize vpx_scaled_2d function"	2017-12-01 01:24:25 +00:00
Shiyou Yin	8d70aef05f	Merge "vpx: [loongson] fix bug in var_filter_block2d_bil_16x"	2017-11-30 00:53:37 +00:00
Kyle Siefring	3ae909b0f9	Merge "Remove unnecessary includes of emmintrin_compat.h"	2017-11-29 19:14:45 +00:00
Kyle Siefring	a60da3a2eb	Remove unnecessary includes of emmintrin_compat.h Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9	2017-11-29 11:48:24 -05:00
Kaustubh Raste	339f4dcaee	mips msa optimize vpx_scaled_2d function Change-Id: I638507b360c71489ab0e87bd558d2719ad995333	2017-11-29 13:27:04 +05:30
Shiyou Yin	a0ca2a4079	vpx: [loongson] fix bug in var_filter_block2d_bil_16x The bug caused these test cases to fail: 1. MMI/VpxSubpelVarianceTest.Ref/6; 2. MMI/VpxSubpelVarianceTest.Ref/7; 3. MMI/VpxSubpelVarianceTest.ExtremeRef/6; 4. MMI/VpxSubpelVarianceTest.ExtremeRef/7. Change-Id: I122ca20089e14ac324edd61295cf8f506e06afc8	2017-11-29 10:26:43 +08:00
Johann	bd990cad72	quantize x86: dedup some parts Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af	2017-11-27 13:09:21 -08:00
Kyle Siefring	dd4cc5b596	Merge "Optimize AVX2 get16x16var and get32x16var functions"	2017-11-20 22:37:57 +00:00
Kyle Siefring	07a0bf038f	Optimize AVX2 get16x16var and get32x16var functions Change-Id: If8b91aaa883c01107f0ea3468139fa24cfb301d2	2017-11-17 13:55:49 -05:00
Johann	3e3a568616	fwd txfm ssse3: use GLOBAL() for loading constants Fixes a build issue when relocation is not allowed: relocation R_X86_64_32 against '.rodata' can not be used when making a shared object Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0	2017-11-15 13:01:44 -08:00
Scott LaVarnway	8e6022844f	vpx: [x86] add vpx_satd_avx2() SSE2 intrinsic vs AVX2 intrinsic speed gains: blocksize 16: ~1.33; blocksize 64: ~1.51; blocksize 256: ~3.03; blocksize 1024: ~3.71. Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58	2017-11-10 12:24:12 -08:00
Scott LaVarnway	2387024f41	runtime error fix: bitdepth_conversion_avx2.h Change-Id: I7364a157de39eb7137b599808474b8d46d19d376	2017-11-09 12:26:43 -08:00
Kyle Siefring	b383a17fa4	Support building AVX-512 and implement sadx4 for AVX-512 The added AVX-512 support requires the subset of AVX-512 added in Skylake-X. Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7	2017-11-03 13:37:23 -04:00
Scott LaVarnway	3bf02ad74a	vpx: hadamard: use ptrdiff_t instead of int for stride Eliminates the following instruction for the x86 (64 bit) intrinsic code: movslq %esi,%rax Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae	2017-10-26 11:41:48 -07:00
Kyle Siefring	037e596f04	Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"	2017-10-24 19:22:36 +00:00
Kyle Siefring	ae35425ae6	Optimize convolve8 SSSE3 and AVX2 intrinsics Changed the intrinsics to perform summation similar to the way the assembly does. The new code diverges from the assembly by preferring unsaturated additions. Results for haswell (Horiz/Vert, Size, Speedup): SSSE3: Horiz x4 ~32%; Horiz x8 ~6%; Vert x8 ~4%. AVX2: Horiz x16 ~16%; Vert x16 ~14%. BUG=webm:1471 Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668	2017-10-24 10:39:48 -04:00
Scott LaVarnway	512bf4e029	vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix Use an intermediate buffer before storing to coeffs when highbitdepth is enabled. Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b	2017-10-23 08:49:32 -07:00
Scott LaVarnway	4906cea027	vpx: [x86] vpx_hadamard_16x16_avx2() improvements ~10% performance gain. Fixed the cosmetics noted in the previous commit. Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13	2017-10-20 08:55:06 -07:00
Scott LaVarnway	b58259ab55	Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"	2017-10-19 23:32:10 +00:00
Scott LaVarnway	55c126a5d7	vpx: [x86] add vpx_hadamard_16x16_avx2() This version is ~1.91x faster than the sse2 version. When highbitdepth is enabled, it is ~1.74x. Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd	2017-10-18 18:00:00 -07:00
Kyle Siefring	b3a36f7946	Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"	2017-10-18 16:19:52 +00:00
Linfeng Zhang	9336e01621	Merge changes I17fff122,Ic149e3cb * changes: Add 4 to 3 scaling SSSE3 optimization; Test extreme inputs in frame scale functions	2017-10-17 16:03:29 +00:00
Kyle Siefring	55805e2786	Refactor x86/vpx_subpixel_8t_intrin_avx2.c Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305	2017-10-17 11:57:40 -04:00
Linfeng Zhang	580d32240f	Add 4 to 3 scaling SSSE3 optimization Note this change will trigger the different C version on SSSE3 and generate different scaled output. Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3(). Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194	2017-10-16 15:42:42 -07:00
Kyle Siefring	caa116c9be	Merge changes I38783d97,If5160c0c * changes: Extend 16 wide AVX2 convolve8 code to support averaging. Add AVX2 version of vpx_convolve8_avg.	2017-10-12 16:12:38 +00:00
Linfeng Zhang	16166bfdaa	Add 4 to 1 scaling x86 optimization Change-Id: I51c190f0a88685867df36912522e67bdae58a673	2017-10-10 16:24:06 -07:00
Linfeng Zhang	963cc22cef	Merge changes I9d4c1af5,I882da3a0 * changes: Rename some inline functions in NEON scaling; Generalize 2:1 vp9_scale_and_extend_frame_ssse3()	2017-10-10 17:29:50 +00:00
Kyle Siefring	1b2f92ee8e	Extend 16 wide AVX2 convolve8 code to support averaging. Also adds vpx_convolve8_avg_horiz_avx2. Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf	2017-10-09 19:10:03 -04:00

980 Commits