generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	f48321974b	Merge "VPX: x86 asm version of vpx_idct32x32_34_add()"	2015-11-10 21:40:11 +00:00
Scott LaVarnway	9aeaa2016e	VPX: x86 asm version of vpx_idct32x32_34_add() Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb	2015-11-10 11:54:56 -08:00
James Zern	40dab58941	convolve_copy_sse2: replace SSE w/SSE2 code this should be neutral or slightly faster on modern (P4+) architectures Change-Id: Iec4c080275941eb8c9e05a66a2daf0405d86a69b	2015-11-09 23:45:16 -08:00
Geza Lore	9cfba09ac0	Optimize vpx_quantize_{b,b_32x32} assembler. Added optimization of the 8 bit assembly quantizer routines. This makes these functions up to 100% faster, depending on encoding parameters. This patch makes the encoder faster in both the high bitdepth and 8bit configurations. In the high bitdepth configuration, it affects profile 0 only. Based on my profiling using 1080p input, the net gain is between 1-3% for the 8 bit config, and around 2.5-4.5% for the high bitdepth config, depending on target bitrate. The difference between the 8 bit and high bitdepth configurations for the same encoder run is reduced by 1% in all cases I have profiled. Change-Id: I86714a6b7364da20cd468cd784247009663a5140	2015-10-20 10:11:19 +01:00
Johann	ec623a0bb7	Upstream Mozilla fix for older Apple clang builds Also use the _mm_broadcastsi128_si256 intrinsic for Apple clang versions 4.[012] https://bugzilla.mozilla.org/show_bug.cgi?id=1085607 https://code.google.com/p/webm/issues/detail?id=1082 Change-Id: I6bc821d8163387194ef663e94bfed91fa7281d88	2015-10-14 07:41:23 -07:00
Alex Converse	0c00af126d	Add vpx_highbd_convolve_{copy,avg}_sse2 single-threaded: swanky (silvermont): ~1% faster overall peppy (celeron,haswell): ~1.5% faster overall Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073	2015-10-09 11:50:25 -07:00
Geza Lore	cbada4a982	Remove 4 mova insts from quantize_ssse3_x86_64.asm Change-Id: If3cb9345b44162e600e6c74873e0cb4c207fc7fb	2015-10-09 07:52:04 -07:00
Julia Robson	37c68efee2	SSSE3 optimisation for quantize in high bit depth When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSSE3 optimisations. Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5	2015-10-06 13:32:02 +01:00
Scott LaVarnway	b212094839	Merge "VPX: refactor vpx_idct32x32_1_add_sse2()"	2015-10-06 11:35:15 +00:00
Julia Robson	5e6533e707	SSE2 optimisation for quantize in high bit depth When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSE2 optimisation (but not the SSSE3 optimisation). Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9	2015-10-05 10:59:16 -07:00
Scott LaVarnway	23d1c06268	VPX: refactor vpx_idct32x32_1_add_sse2() Change-Id: Ia1a2cac0e9dc05f3207b3433a6c1589fa7f2aee3	2015-10-05 06:33:42 -07:00
Julia Robson	406030d1b0	Accelerated transform in high bit depth When configured with high bitdepth enabled, the 8bit transform stopped using optimised code. This made 8bit content decode slowly. Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea	2015-09-28 21:09:16 -07:00
Johann	dd4f953350	Remove vpx_filter_block1d16_v8_intrin_ssse3 This was rewritten and moved to vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm in 195883023bb39b5ee5c6811a316ab96d9225034d Change-Id: I117ce983dae12006e302679ba7f175573dd9e874	2015-09-18 16:05:43 -07:00
James Zern	683b5a3161	vpx_subpixel_8t_ssse3: fix reg counts/access fixes build on windows x64; previously 'heightq' i.e., the 64-bit register was accessed when only the 32-bit value was needed. given this is from a stack variable the upper bits were undefined. + bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in 64-bit builds and filter_block1d16_v* uses one extra temp variable Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80	2015-09-17 12:27:34 -07:00
Scott LaVarnway	195883023b	VPX: subpixel_8t_ssse3 asm using x86inc This is based on the original patch optimized for 32bit platforms by Tamar/Ilya and now uses the x86inc style asm. The assembly was also modified to support 64bit platforms. Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2	2015-09-03 20:35:51 -07:00
Johann Koenig	5c245a46d8	Merge changes I53b5bdc5,Ib81168a7,Ie0113945 * changes: Only build ssse3 filter functions on 64 bit Clean up unused function warnings in vp8 encoder Clean up unused function warnings in vp8 onyx_if.c	2015-08-27 20:58:53 +00:00
Johann	a28b2c6ff0	Add sse2 versions of halfpix variance These were lost in the great sub pixel variance move of 6a82f0d7fb9ee908c389e8d55444bbaed3d54e9c Not having these functions caused a ~10% performance regression in some realtime vp8 encodes. Change-Id: I50658483d9198391806b27899f2c0d309233c4b5	2015-08-27 11:58:38 -07:00
Johann	f5507b514c	Only build ssse3 filter functions on 64 bit Avoid an unused function warning by only building the functions when they will be used. Change-Id: I53b5bdc5a180c79d63b34e4c8921d679bbc54009	2015-08-26 10:32:18 -07:00
Scott LaVarnway	6c0f6dd817	Merge "VPX: scaled convolve : fix windows build errors"	2015-08-21 12:06:34 +00:00
Scott LaVarnway	acf24cc1b8	VPX: scaled convolve : fix windows build errors Change-Id: Ic81d435ea928183197040cdf64b6afd7dbaf57e4	2015-08-20 13:09:27 -07:00
Scott LaVarnway	6a21ca20cc	Merge "VPX ssse3 scaled convolve"	2015-08-19 22:12:21 +00:00
Jingning Han	49f6ff1103	Rename inv_txfm_sse2.asm to inv_wht_sse2.asm Change-Id: I43bcc70680503e4c18d8f021097307778cf9ea70	2015-08-19 10:29:53 -07:00
Scott LaVarnway	2030c49cf8	VPX ssse3 scaled convolve Change-Id: I71d5994e21813554a927d35ebcc26bf7a68984fd	2015-08-18 15:13:02 -07:00
Scott LaVarnway	6cf95bd1e7	Merge "VPX: remove step == 16 and filter[3] != 128 checks"	2015-08-12 20:13:33 +00:00
Scott LaVarnway	b04dad328c	Merge "VPX: remove scaled calls from FUN_CONV_1D"	2015-08-11 21:46:50 +00:00
Scott LaVarnway	4ef08dcec8	Merge "VPX: Add rtcd support for scaling."	2015-08-11 13:19:00 +00:00
James Zern	9265bad906	Merge changes from topic 'x86inc' * changes: Only use .text sections for aout Use newer x86inc.asm Use .text instead of .rodata on macho Copy PIC handling code from x86_abi_support Set 'private_extern' visibility for macho targets Avoid 'amdnop' when building with nasm Catch all elf formats Expand PIC default to macho64 and respect CONFIG_PIC from libvpx Use libvpx defines to set name mangling rules Customize x86inc.asm for libvpx	2015-08-10 21:20:38 +00:00
Scott LaVarnway	a229dbc1f0	VPX: remove step == 16 and filter[3] != 128 checks from FUN_CONV_1D and FUN_CONV_2D macros. The functions will not be called with these inputs. Change-Id: I67ec75e4edafc0acee70190521a80ea85dfa521b	2015-08-10 13:44:32 -07:00
Johann	41a0a0cb35	Use newer x86inc.asm Rename updated version of x86inc.asm Use "private_prefix" instead of "program_name" and make vpx the default prefix. Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458	2015-08-07 16:44:44 -07:00
Alex Converse	c65e79d2e5	ssim: Replace unsigned long with uint32_t. The assembly only writes the low 4 bytes, and the HBD version only uses uint32_t bytes. Change-Id: Ie3694ecda511c231e55870df814cbae30e588073	2015-08-07 11:48:31 -07:00
Alex Converse	c7b7011b9b	Move VP9 SSIM metrics to vpx_dsp. Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e	2015-08-06 18:25:25 -07:00
Aℓex Converse	7ac505c726	Merge "Narrow a load in iwht4x4_16_add."	2015-08-06 22:21:16 +00:00
Alex Converse	0572052725	Narrow a load in iwht4x4_16_add. The top half is unused. Change-Id: I29b2f6a93e20ea43aff4ad0bd2d52257e1e752b6	2015-08-05 12:16:12 -07:00
Scott LaVarnway	4e6b5079c6	VPX: remove scaled calls from FUN_CONV_1D and FUN_CONV_2D macros. The predict lut now handles this case. The encoder now calls vpx_scaled_2d() instead of vpx_convolve8() for scaling. Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7	2015-08-05 10:47:06 -07:00
James Zern	afd2f68dae	Revert "VP9_COPY_CONVOLVE_SSE2 optimization" This reverts commit a5e97d874b16ae5826b68515f1e35ffb44361cf8. Additionally: Revert "vpx_convolve_copy_sse2: fix win64" This reverts commit 22a8474fe7ec30d96f746dc6e4b23771758c071e. This change performs poorly on various x86_64 devices affecting performance by 1-3% at 1080P. Performance on chromebook like devices was mixed neutral to slightly negative, so there should be minimal change there. Change-Id: I95831233b4b84ee96369baa192a2d4cc7639658c	2015-08-04 17:57:01 -07:00
Jingning Han	d621de7e8d	Change vp9_quantize to vpx_quantize This commit clears all the vp9_ prefix use case in vpx_dsp. It gets the vp9 folder ready to branch out vp10. Change-Id: I2906eec179ee792b4af8c9b4161313653050e931	2015-08-04 15:31:49 -07:00
Jingning Han	08a453b9de	Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names This commit clears the function naming convention in vpx_dsp. It replaces vp9_ prefix of global functions with vpx_ prefix. It also removes the vp9_ prefix from static functions. Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf	2015-08-04 13:46:11 -07:00
Scott LaVarnway	8f6b943100	VPX: Add rtcd support for scaling. Change-Id: If34bfb0d918967445aea7dc30cd7b55ebfedb1f2	2015-08-03 09:43:34 -07:00
Jingning Han	d10fc5af8f	Merge "Add vpx_dsp_rtcd.h to inv_txfm_sse2.c"	2015-08-03 16:03:09 +00:00
Jingning Han	80ae856c8b	Add vpx_dsp_rtcd.h to inv_txfm_sse2.c Change-Id: Ibab434fb4bd6da02dba087582ed74811f555c3ed	2015-08-02 08:25:13 -07:00
James Zern	22a8474fe7	vpx_convolve_copy_sse2: fix win64 xmm6-7 need to be stored Change-Id: I6c51559598d335946ec91be6246b49589c63b724	2015-08-01 11:45:49 -07:00
Jingning Han	b4c7d0523a	Merge "Factor inverse transform functions into vpx_dsp"	2015-08-01 16:20:24 +00:00
Jingning Han	e8b133c79c	Factor inverse transform functions into vpx_dsp This commit moves the module inverse transform functions from vp9 to vpx_dsp folder. The hybrid transform wrapper functions stay in the vp9 folder, since it involves codec-specific data structures. Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8	2015-07-31 16:21:00 -07:00
Scott LaVarnway	a5e97d874b	VP9_COPY_CONVOLVE_SSE2 optimization This function suffers from a couple of problems in small core (tablets): -The load of the next iteration is blocked by the store of previous iteration -4k aliasing (between future store and older loads) -current small core machines are in-order machines and because of it the store will spin the rehabQ until the load is finished fixed by: - prefetching 2 lines ahead - unroll copy of 2 rows of block - pre-load all xmm registers before the loop, final stores after the loop The function is optimized by: copy_convolve_sse2 64x64 - 16% copy_convolve_sse2 32x32 - 52% copy_convolve_sse2 16x16 - 6% copy_convolve_sse2 8x8 - 2.5% copy_convolve_sse2 4x4 - 2.7% credit goes to Tom Craver(tom.r.craver@intel.com) and Ilya Albrekht(ilya.albrekht@intel.com) Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671	2015-07-31 14:51:51 -07:00
Zoe Liu	7186a2dd86	Code refactor on InterpKernel It in essence refactors the code for both the interpolation filtering and the convolution. This change includes the moving of all the files as well as the changing of the code from vp9_ prefix to vpx_ prefix accordingly, for underneath architectures: (1) x86; (2) arm/neon; and (3) mips/msa. The work on mips/dspr2 will be done in a separate change list. Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46	2015-07-31 10:27:33 -07:00
Hui Su	4cbf36b105	Merge "Replace prefix vp9_ with vpx_ for intra prediction functions"	2015-07-29 00:38:48 +00:00
Jingning Han	d12a4a825c	Merge "Replace vp9_ prefix in 2D-DCT functions with vpx_"	2015-07-29 00:07:31 +00:00
Jingning Han	fc18cf7a11	Merge "Move DC only forward 2D-DCT functions to vpx_dsp"	2015-07-29 00:06:37 +00:00
Jingning Han	4b5109cd73	Replace vp9_ prefix in 2D-DCT functions with vpx_ Clean up the forward 2D-DCT function names in vpx_dsp. Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25	2015-07-28 16:06:44 -07:00
Jingning Han	d19033fa4e	Move DC only forward 2D-DCT functions to vpx_dsp This completes the forward transform functions layout refactoring. Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd	2015-07-28 14:52:30 -07:00

371 Commits