generic-library/vpx

Author	SHA1	Message	Date
James Zern	0ccad4d649	Revert "VPX: x86 asm version of vpx_idct32x32_34_add()" This reverts commit `9aeaa2016e`. This causes some test vectors to fail. Change-Id: I3659a2068404ec5a0591fba5c88b1bec0c9059a4	2015-11-11 11:12:38 -08:00
James Zern	e3efed7f4c	Merge "convolve_copy_sse2: replace SSE w/SSE2 code"	2015-11-10 22:35:12 +00:00
Scott LaVarnway	f48321974b	Merge "VPX: x86 asm version of vpx_idct32x32_34_add()"	2015-11-10 21:40:11 +00:00
Scott LaVarnway	9aeaa2016e	VPX: x86 asm version of vpx_idct32x32_34_add() Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb	2015-11-10 11:54:56 -08:00
James Zern	40dab58941	convolve_copy_sse2: replace SSE w/SSE2 code this should be neutral or slightly faster on modern (P4+) architectures Change-Id: Iec4c080275941eb8c9e05a66a2daf0405d86a69b	2015-11-09 23:45:16 -08:00
Debargha Mukherjee	65dd056e41	Merge "Optimize vpx_quantize_{b,b_32x32} assembler."	2015-10-26 18:04:49 +00:00
Ronald S. Bultje	53dc9fd0a0	vp10: merge ext_ipred_bltr experiment into misc_fixes. Change-Id: I2f2deb700748408b8278b7f5c29ee1f2e39785ec	2015-10-21 22:27:34 -04:00
Geza Lore	9cfba09ac0	Optimize vpx_quantize_{b,b_32x32} assembler. Added optimization of the 8 bit assembly quantizer routines. This makes these functions up to 100% faster, depending on encoding parameters. This patch makes the encoder faster in both the high bitdepth and 8bit configurations. In the high bitdepth configuration, it affects profile 0 only. Based on my profiling using 1080p input the net gain is between 1-3% for the 8 bit config, and around 2.5-4.5% for the high bitdepth config, depending on target bitrate. The difference between the 8 bit and high bitdepth configurations for the same encoder run is reduced by 1% in all cases I have profiled. Change-Id: I86714a6b7364da20cd468cd784247009663a5140	2015-10-20 10:11:19 +01:00
Ronald S. Bultje	c7dc1d78bf	vp10: add extended-intra prediction edges experiment. This experiment allows using full above/right edges for all transform sizes whenever available (for d45/d63), and adds bottom/left edges for d207. See issue 1043. Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b	2015-10-16 19:30:39 -04:00
Johann	ec623a0bb7	Upstream Mozilla fix for older Apple clang builds Also use the _mm_broadcastsi128_si256 intrinsic for Apple clang versions 4.[012] https://bugzilla.mozilla.org/show_bug.cgi?id=1085607 https://code.google.com/p/webm/issues/detail?id=1082 Change-Id: I6bc821d8163387194ef663e94bfed91fa7281d88	2015-10-14 07:41:23 -07:00
hui su	6f31722950	Fix compiler warnings Change-Id: I761256a8100d83abf1b937f3739580237e3fad2a	2015-10-13 10:33:17 -07:00
Alex Converse	0c00af126d	Add vpx_highbd_convolve_{copy,avg}_sse2 single-threaded: swanky (silvermont): ~1% faster overall peppy (celeron,haswell): ~1.5% faster overall Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073	2015-10-09 11:50:25 -07:00
Geza Lore	cbada4a982	Remove 4 mova insts from quantize_ssse3_x86_64.asm Change-Id: If3cb9345b44162e600e6c74873e0cb4c207fc7fb	2015-10-09 07:52:04 -07:00
Julia Robson	37c68efee2	SSSE3 optimisation for quantize in high bit depth When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSSE3 optimisations. Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5	2015-10-06 13:32:02 +01:00
Scott LaVarnway	b212094839	Merge "VPX: refactor vpx_idct32x32_1_add_sse2()"	2015-10-06 11:35:15 +00:00
Julia Robson	5e6533e707	SSE2 optimisation for quantize in high bit depth When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSE2 optimisation (but not the SSSE3 optimisation). Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9	2015-10-05 10:59:16 -07:00
Scott LaVarnway	23d1c06268	VPX: refactor vpx_idct32x32_1_add_sse2() Change-Id: Ia1a2cac0e9dc05f3207b3433a6c1589fa7f2aee3	2015-10-05 06:33:42 -07:00
Ronald S. Bultje	3fedf4a59b	Merge "vp10: reimplement d45/4x4 to match vp8 instead of vp9."	2015-10-02 17:15:59 +00:00
Debargha Mukherjee	cb5c47f20d	Merge "Accelerated transform in high bit depth"	2015-10-02 06:55:55 +00:00
Ronald S. Bultje	62a1579525	vp10: reimplement d45/4x4 to match vp8 instead of vp9. This is more a proof of concept than anything else. The problem here isn't so much how to code it, but rather where to place the resulting code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10 specific intra pred functions to live there, or in vp10/? See issue 1015. Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05	2015-10-01 10:11:54 -04:00
Ronald S. Bultje	c26a9ecaa2	vp8: change build_intra4x4_predictors() to use vpx_dsp. I've added a few new functions (d45e, d63e, he, ve) to cover the filtered h/v 4x4 predictors that are vp8-specific, the "correct" d45 with the correctly filtered bottom-right pixel (as opposed to the unfiltered version in vp9), and the "broken" d63 with weirdly filtered bottom-right pixels (which is correctly filtered in vp9). There may be a minor performance impact on all systems because we have to do an extra copy of the Above pixel array to incorporate the topleft pixel in the same array (thus fitting the vpx_dsp API). In addition, armv6 will have a more serious performance impact b/c I removed the armv6/vp8-specific assembly. I'm not sure anyone cares... Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86	2015-09-30 18:45:49 -04:00
Ronald S. Bultje	54d48955f6	vp8: change build_intra_predictors_mby_s to use vpx_dsp. Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2	2015-09-30 18:45:40 -04:00
Julia Robson	406030d1b0	Accelerated transform in high bit depth When configured with high bitdepth enabled, the 8bit transform stopped using optimised code. This made 8bit content decode slowly. Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea	2015-09-28 21:09:16 -07:00
Johann	dd4f953350	Remove vpx_filter_block1d16_v8_intrin_ssse3 This was rewritten and moved to vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm in `195883023b` Change-Id: I117ce983dae12006e302679ba7f175573dd9e874	2015-09-18 16:05:43 -07:00
James Zern	683b5a3161	vpx_subpixel_8t_ssse3: fix reg counts/access fixes build on windows x64; previously 'heightq' i.e., the 64-bit register was accessed when only the 32-bit value was needed. given this is from a stack variable the upper bits were undefined. + bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in 64-bit builds and filter_block1d16_v* uses one extra temp variable Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80	2015-09-17 12:27:34 -07:00
Ronald S. Bultje	a3df343cda	vp10: code sign bit before absolute value in non-arithcoded header. For reading, this makes the operation branchless, although it still requires two shifts. For writing, this makes the operation as fast as writing an unsigned value, branchlessly. This is also how other codecs typically code signed, non-arithcoded bitstream elements. See issue 1039. Change-Id: I6a8182cc88a16842fb431688c38f6b52d7f24ead	2015-09-16 19:35:03 -04:00
Debargha Mukherjee	1c8567ff09	Remove some trailing whitespaces Change-Id: Icf06d35ca347713253d1eba341a894b51efa81a9	2015-09-08 01:31:04 -07:00
Scott LaVarnway	195883023b	VPX: subpixel_8t_ssse3 asm using x86inc This is based on the original patch optimized for 32bit platforms by Tamar/Ilya and now uses the x86inc style asm. The assembly was also modified to support 64bit platforms. Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2	2015-09-03 20:35:51 -07:00
Johann	c5f11912ae	Include vpx_dsp_common.h when using VPXMIN/MAX Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee	2015-08-31 14:36:35 -07:00
Angie Chiang	45db71d0ac	Expand the idct4_c() function in idct8_c() Change-Id: I5afa3c351ba7c5e7deb3889f7471619ac60af255	2015-08-28 10:53:11 -07:00
Johann Koenig	5c245a46d8	Merge changes I53b5bdc5,Ib81168a7,Ie0113945 * changes: Only build ssse3 filter functions on 64 bit Clean up unused function warnings in vp8 encoder Clean up unused function warnings in vp8 onyx_if.c	2015-08-27 20:58:53 +00:00
Johann Koenig	18ea2a7e0c	Merge "Add sse2 versions of halfpix variance"	2015-08-27 20:56:32 +00:00
Johann	a28b2c6ff0	Add sse2 versions of halfpix variance These were lost in the great sub pixel variance move of `6a82f0d7fb` Not having these functions caused a ~10% performance regression in some realtime vp8 encodes. Change-Id: I50658483d9198391806b27899f2c0d309233c4b5	2015-08-27 11:58:38 -07:00
James Zern	5e16d397bd	vpx_dsp_common: add VPX prefix to MIN/MAX prevents redeclaration warnings; vp8 has its own define which will be resolved in a future commit Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c	2015-08-26 20:11:32 -07:00
Johann	f5507b514c	Only build ssse3 filter functions on 64 bit Avoid an unused function warning by only building the functions when they will be used. Change-Id: I53b5bdc5a180c79d63b34e4c8921d679bbc54009	2015-08-26 10:32:18 -07:00
Scott LaVarnway	6c0f6dd817	Merge "VPX: scaled convolve : fix windows build errors"	2015-08-21 12:06:34 +00:00
Scott LaVarnway	acf24cc1b8	VPX: scaled convolve : fix windows build errors Change-Id: Ic81d435ea928183197040cdf64b6afd7dbaf57e4	2015-08-20 13:09:27 -07:00
Scott LaVarnway	6a21ca20cc	Merge "VPX ssse3 scaled convolve"	2015-08-19 22:12:21 +00:00
Jingning Han	b1339751b9	Merge "Rename inv_txfm_sse2.asm to inv_wht_sse2.asm"	2015-08-19 18:26:30 +00:00
Jingning Han	49f6ff1103	Rename inv_txfm_sse2.asm to inv_wht_sse2.asm Change-Id: I43bcc70680503e4c18d8f021097307778cf9ea70	2015-08-19 10:29:53 -07:00
Scott LaVarnway	2030c49cf8	VPX ssse3 scaled convolve Change-Id: I71d5994e21813554a927d35ebcc26bf7a68984fd	2015-08-18 15:13:02 -07:00
Jingning Han	5de049b067	Turn on dspr2 loop filter functions in vpx_dsp Add the dspr2 files to vpx_dsp.mk and enable these functions in vpx_dsp_rtcd_defs.pl file. Change-Id: I79feb5af24f174f4a0788dc6f3b6df7f4e1fa467	2015-08-17 16:15:24 -07:00
James Zern	1794624c18	Merge changes I2fe52bfb,I5e5084eb * changes: VPX: removed filter == 128 checks from mips convolve code VPX: removed step checks from mips convolve code	2015-08-14 19:45:27 +00:00
James Zern	78629508f2	Merge "VPX: removed step checks from neon convolve code"	2015-08-14 19:23:46 +00:00
Yaowu Xu	94ba3939cd	vpx_highbd_ssim_parms_8x8: make parameter types consistent Change-Id: Ie1fe6603232adc22dbe4d51bd1008c856a6d40ca	2015-08-14 09:18:07 -07:00
Scott LaVarnway	89dcc13939	VPX: removed filter == 128 checks from mips convolve code The check is handled by the predictor table. Change-Id: I2fe52bfbbfccb2edd13ba250986e3a4b4b589459	2015-08-13 12:57:01 -07:00
Scott LaVarnway	aeea00cc4f	VPX: removed step checks from mips convolve code The check is handled by the predictor table. Change-Id: I5e5084ebb46be8087c8c9d80b5f76e919a1cd05b	2015-08-13 11:27:04 -07:00
Scott LaVarnway	fa47212933	VPX: removed step checks from neon convolve code The check is handled by the predictor table. Change-Id: I42479f843e77a2d40cdcdfc9e2e6c48a05a36561	2015-08-12 16:46:53 -07:00
Scott LaVarnway	6cf95bd1e7	Merge "VPX: remove step == 16 and filter[3] != 128 checks"	2015-08-12 20:13:33 +00:00
James Zern	345b11cd73	Merge "fix build w/only mmx+sse enabled"	2015-08-12 02:26:08 +00:00

152 Commits