generic-library/vpx

Author	SHA1	Message	Date
hkuang	98e88e6ad8	Optimize the idct assembly code. Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e	2015-06-03 17:20:35 -07:00
James Zern	8c15ced172	vp9: move ssse3 convolve fns to intrinsics file + synchronize filter function signatures this makes any intrinsics filters available for inlining and has the side-effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1908875caffa585bd4fc65aaf10d17a5e20cfb46	2015-05-22 20:14:16 -07:00
James Zern	2161e44025	vp9: move avx2 convolve fns to intrinsics file + synchronize filter function signatures this makes any intrinsics filters available for inlining and has the side-effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1cd55c9d52547793ad65aa90c7620f0e426edaa2	2015-05-22 20:13:06 -07:00
James Zern	ef2b3cce50	add vp9/common/x86/convolve.h collect the vp9_convolve function definition macros there; this will allow some relocation of functions from vp9_asm_stubs.c Change-Id: Idadd117fa256dd48748379856973fd985b8204e8	2015-05-22 20:12:16 -07:00
James Zern	48d8291df4	vp9_subpixel_8t_intrin_ssse3: quiet vs9 warning reorder includes to avoid: warning C4985: 'ceil': attributes not present on previous declaration. this is the same workaround used in vp9/common/vp9_systemdependent.h Change-Id: Ia10dd63de24f96fa1507a6179220e9d6ec774db6	2015-05-22 12:05:02 -07:00
James Zern	330fba41e2	vp9 intrinsics: add vp9_rtcd include silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1	2015-05-15 10:43:47 -07:00
Johann	1d7ccd5325	Relocate memory operations for common code With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca	2015-05-13 11:41:15 -07:00
hkuang	f5574fb44c	Merge "Add more sse2 code for intra prediction."	2015-05-08 17:26:30 +00:00
James Zern	fd3658b0e4	replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79	2015-05-07 11:55:08 -07:00
hkuang	7153b822ed	Add more sse2 code for intra prediction. vp9_dc_left_predictor_16x16 vp9_dc_top_predictor_32x32 vp9_dc_left_predictor_32x32 vp9_dc_128_predictor_32x32 Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6	2015-05-06 17:17:00 -07:00
James Zern	ccae5d99d2	fix and enable vp9_dc_128_predictor_16x16 widen the loads and stores to 128-bit. this was added, but not enabled in: `493a857` Add some sse2 code for intra prediction. Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4	2015-05-05 11:40:13 -07:00
hkuang	e47811ef8f	Merge "Add some sse2 code for intra prediction."	2015-05-05 17:11:07 +00:00
James Zern	670b2c09ce	vp9_idct_intrin_sse2: cosmetics: reindent + fix some whitespace Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2	2015-05-01 16:07:54 -07:00
James Zern	c77b1f5acd	vp9: RECON_AND_STORE4X4: remove dest offset offsetting by a variable stride prevents instruction reordering, resulting in poor assembly Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446	2015-04-30 19:14:17 -07:00
James Zern	778845da05	vp9_idct_intrin_*: RECON_AND_STORE: remove dest offset offsetting by a variable stride prevents instruction reordering, resulting in poor assembly. additionally reroll 16x16/32x32 loops to reduce register spill with this new format Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11	2015-04-30 19:14:17 -07:00
hkuang	493a8579f1	Add some sse2 code for intra prediction. Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4	2015-04-30 15:42:57 -07:00
Yaowu Xu	47767609fe	Remove vp9_idct16x16_10_add_ssse3() The rotation computation using 2X of cos(pi/16) has a potential to overflow 32 bit, this commit disable the function to allow further investigation and optimization. Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf	2015-04-30 09:07:30 -07:00
Jingning Han	c4cb8059ff	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4"	2015-02-27 09:49:10 -08:00
Jingning Han	43bb97f7d0	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3"	2015-02-27 09:49:00 -08:00
Jingning Han	4800b0e80d	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2"	2015-02-27 09:48:51 -08:00
Jingning Han	8ec22296b3	Fix high bit-depth loop-filter sse2 compiling issue - part 3 Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69	2015-02-26 15:21:22 -08:00
Jingning Han	14ff1cb74a	Fix high bit-depth loop-filter sse2 compiling issue - part 2 Change-Id: I6728b69bb3dff1daa64ff7142f691e80a089f1c4	2015-02-26 12:41:19 -08:00
Jingning Han	2080e4b206	Fix high bit-depth loop-filter sse2 compiling issue - part 1 The intrinsic statement _mm_subs_epi16() should take immediate. Feeding variable as its input argument will cause compile failure in older version gcc. Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4	2015-02-25 09:59:50 -08:00
Jingning Han	5b87f1bb5a	Fix high bit-depth loop-filter sse2 compiling issue - part 4 Change-Id: I39f56f60425836f2e1ec07da71edd4810a4c78bb	2015-02-24 14:50:30 -08:00
James Zern	923cc0bf51	vp9_highbd_tm_predictor_16x16: fix win64 by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb	2015-02-10 19:34:12 -08:00
JackyChen	09673deba9	SSE2 code for the filter in MFQE. The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716	2015-01-18 16:07:59 -08:00
James Zern	89ee8923a8	Merge "Remove redundant loads on 1d16_v8 filter."	2014-12-12 14:32:52 -08:00
James Zern	f82d7fd854	Merge "Remove redundant loads on 1d8_v8 filter."	2014-12-12 14:32:26 -08:00
Frank Galligan	6a24dbd71f	Remove redundant loads on 1d16_v8 filter. This CL showed about a 3% gain in performance on some systems. Change-Id: Id27e7e0b8e69068aa364e67859436da852669250	2014-12-12 11:48:47 -08:00
Frank Galligan	44ee777905	Remove redundant loads on 1d8_v8 filter. This CL showed a modest gain in performance on some systems. Change-Id: Iad636a89a1a9804ab7a0dea302bf2c6a4d1653a4	2014-12-12 11:34:24 -08:00
James Zern	d456ccbc9d	vp9_loopfilter_mmx: remove some unused tables Change-Id: I964d25cc91c8e4864d73b142d9c7a1b39cb6cfbb	2014-12-12 11:16:24 -08:00
Peter de Rivaz	5c22224e9e	Corrected optimization of 8x8 DCT code The 8x8 DCT uses a fast version whenever possible. There was a mistake in the checking code which meant sometimes the fast version was used when it was not safe to do so. Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7 (cherry picked from commit `fd05fb0c21`)	2014-12-11 09:42:57 -08:00
Yunqing Wang	cddbdeabd0	Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"	2014-12-08 13:34:54 -08:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
levytamar82	8f9d94ec17	SSSE3 Optimization for Atom processors using new instruction selection and ordering The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors. By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCHBW + PALIGNR) performance can be improved. In the original code, the PSHUBF uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example: filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7 REG2 = PUNPCHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15 PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0	2014-12-08 13:11:01 -07:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit `005d80cd05`) (cherry picked from commit `08d2f54800`) (cherry picked from commit `4230c2306c`)	2014-12-02 11:16:24 -08:00
John Stark	71379b87df	Changes to assembler for NASM on mac. fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a	2014-11-24 12:00:50 -08:00
Yaowu Xu	2c4fee17bc	Fix visual studio 2013 compiler warnings For configured with --enable-vp9-highbitdepth Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6	2014-11-05 13:47:28 -08:00
levytamar82	86175a5788	WORKAROUND FIX FOR GCC4.9.1 In the function mb_lpf_horizontal_edge_w_avx2_16 the usage of the intrinsic _mm256_cvtepu8_epi16 cause a compiler bug in gcc 4.9.1. until it will be fixed I created a workaround that create the up convert by using broadcast128+shuffle. The bug was reported here: https://code.google.com/p/webm/issues/detail?id=867 Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351	2014-11-01 11:27:28 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
Deb Mukherjee	e2a90c0b21	Merge "High bit-depth loop/arf/postproc filter functions"	2014-09-23 17:26:32 -07:00
Deb Mukherjee	931ed516ba	High bit-depth loop/arf/postproc filter functions Adds high-bitdepth loopfilter, temporal filter and postproc functions Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3	2014-09-23 16:20:43 -07:00
Frank Galligan	49dc7b05d0	Merge "FIX: vp9_loopfilter_intrin_sse2.c"	2014-09-18 15:10:16 -07:00
Scott LaVarnway	13284311eb	FIX: vp9_loopfilter_intrin_sse2.c Fixes Visual Studio build failures Change-Id: I233719cd63b3ad0db16e2834bf1d7ea1df805880	2014-09-18 13:09:13 -07:00
Deb Mukherjee	6d0ee9860e	Merge "Adds high bitdepth convolve, interpred & scaling"	2014-09-18 10:52:23 -07:00
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Frank Galligan	4e066299d9	Merge "Improved mb_lpf_horizontal_edge_w_sse2_16() #2 "	2014-09-17 18:52:30 -07:00
Scott LaVarnway	217e3cb1fb	Improved mb_lpf_horizontal_edge_w_sse2_16() #2 The decoder performance improved up to 1% for the test clips used. Change-Id: I4621112bdccfba01640322facfa4ba8da8290ea5	2014-09-17 17:25:20 -07:00
Deb Mukherjee	81a8138fc3	Adding high-bitdepth intra prediction functions Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111	2014-09-16 15:04:39 -07:00
Johann	8645a53039	Allow specifying opt dependencies If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18	2014-09-11 13:43:48 -07:00

1 2 3 4 5 ...

275 Commits