generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	4cae64c32c	vpxdsp: [x86] add highbd_d117_predictor functions. C vs SSE2 speed gains: _4x4: ~2.04x. C vs SSSE3 speed gains: _8x8: ~2.82x, _16x16: ~5.93x, _32x32: ~2.79x. BUG=webm:1411 Change-Id: I31d949695991c067dac89d91e0bed3e666c94993	2017-09-28 14:45:28 -07:00
Scott LaVarnway	80992a746c	Merge "vpxdsp: [x86] add highbd_d153_predictor functions"	2017-09-27 20:40:21 +00:00
Linfeng Zhang	dbbbd44304	fix signed integer overflow of idct. Exposed by fuzz test in high bitdepth. The bug is introduced in commit 64653fa. BUG=webm:1466 Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5	2017-09-27 11:17:54 -07:00
Scott LaVarnway	19c45ccd43	vpxdsp: [x86] add highbd_d153_predictor functions. C vs SSE2 speed gains: _4x4: ~1.95x. C vs SSSE3 speed gains: _8x8: ~3.30x, _16x16: ~5.67x, _32x32: ~3.87x. BUG=webm:1411 Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904	2017-09-27 11:01:11 -07:00
Linfeng Zhang	28762341ac	Merge changes Ib9105462,Idfac00ed,If8d8a0e2. * changes: cosmetics: NEON scaling code; Refactor convolve NEON code; Refactor convolve code	2017-09-26 16:10:46 +00:00
Scott LaVarnway	a059dc0986	Merge "vpxdsp: [x86] add highbd_d45_predictor functions"	2017-09-25 11:34:14 +00:00
Scott LaVarnway	cf82f7276e	vpxdsp: [x86] add highbd_d45_predictor functions. C vs SSSE3 speed gains: _4x4: ~2.45x, _8x8: ~10.61x, _16x16: ~11.34x, _32x32: ~6.36x. BUG=webm:1411 Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09	2017-09-22 05:20:12 -07:00
Linfeng Zhang	d586cdb4d4	Remove the unnecessary cast of (int16_t)cospi_{1...31}_64 BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8	2017-09-20 14:13:26 -07:00
Linfeng Zhang	76a3d3fcc5	Remove the unnecessary upcasts of (int)cospi_{1...31}_64 BUG=webm:1450 Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858	2017-09-20 14:13:26 -07:00
Linfeng Zhang	64653fa133	Change cospi_{1...31}_64 from tran_high_t to tran_coef_t. The unnecessary upcast to (int) will be cleaned later. BUG=webm:1450 Change-Id: Ia234575206d5a74540526924b06ed3939322d063	2017-09-20 14:13:26 -07:00
Scott LaVarnway	b85e391ac8	Merge "vpxdsp: [x86] add highbd_d63_predictor functions"	2017-09-20 11:39:28 +00:00
Linfeng Zhang	bf8bdae913	Refactor convolve code. Extract a couple of static functions into their caller functions. Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0	2017-09-19 16:28:31 -07:00
Scott LaVarnway	bc86e2c6a2	vpxdsp: [x86] add highbd_d63_predictor functions. C vs SSE2 speed gains: _4x4: ~2.94x. C vs SSSE3 speed gains: _8x8: ~8.69x, _16x16: ~6.32x, _32x32: ~5.33x. BUG=webm:1411 Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4	2017-09-19 15:47:22 -07:00
Johann	eb4238ac70	Revert "Revert "quantize avx: copy 32x32 implementation"". This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65. Because ssse3 code is used for the reference, the qcoeff and dqcoeff reference buffers must be aligned. Original change's description: "quantize avx: copy 32x32 implementation. Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c" Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06	2017-09-12 14:25:38 -07:00
Scott LaVarnway	d6c9bbc2b6	vpxdsp: [x86] add highbd_d207_predictor functions. C vs SSE2 speed gains: _4x4: ~2.31x. C vs SSSE3 speed gains: _8x8: ~4.73x, _16x16: ~10.88x, _32x32: ~4.80x. BUG=webm:1411 Change-Id: I0bac29db261079181ddabc6814bd62c463109caf	2017-09-11 07:36:24 -07:00
Linfeng Zhang	ef41c6286d	Update convolve functions' assertions, so that 4 to 1 frame scaling can call them. Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5	2017-09-07 12:33:58 -07:00
Linfeng Zhang	7219f31904	Merge "Remove get_filter_base() and get_filter_offset() in convolve"	2017-09-06 22:39:15 +00:00
Linfeng Zhang	d331e7a1c0	Remove get_filter_base() and get_filter_offset() in convolve so that the convolve functions are independent of table alignment. Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee	2017-09-05 15:22:36 -07:00
Scott LaVarnway	bc4bcca3fd	vpxdsp: [x86] add highbd_dc_128_predictor functions. C vs SSE2 speed gains: _4x4: ~7.64x, _8x8: ~16.60x, _16x16: ~8.15x, _32x32: ~5.05x. BUG=webm:1411 Change-Id: If165d419711cfda901bd428a05ca1560a009e62e	2017-09-05 07:57:42 -07:00
Scott LaVarnway	c39a05ff61	vpxdsp: [x86] add highbd_dc_left_predictor functions. C vs SSE2 speed gains: _4x4: ~6.49x, _8x8: ~10.82x, _16x16: ~7.61x, _32x32: ~5.29x. BUG=webm:1411 Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b	2017-08-30 09:29:06 -07:00
Scott LaVarnway	f783e3a75d	vpxdsp: [x86] add highbd_dc_top_predictor functions. C vs SSE2 speed gains: _4x4: ~7.39x, _8x8: ~11.36x, _16x16: ~8.68x, _32x32: ~4.33x. BUG=webm:1411 Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58	2017-08-29 12:53:30 -07:00
Scott LaVarnway	30d9a1916c	vpxdsp: [x86] add highbd_h_predictor functions. C vs SSE2 speed gains: _4x4: ~8.12x, _8x8: ~9.71x, _16x16: ~8.21x, _32x32: ~5.0x. BUG=webm:1422 Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993	2017-08-28 17:31:18 -07:00
Marco Paniconi	8c42237bb2	Revert "quantize avx: copy 32x32 implementation". This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301. Reason for revert: <INSERT REASONING HERE> Failures in AVX/VP9QuantizeTest in nightly tests. Original change's description: "quantize avx: copy 32x32 implementation. Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c" TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org Change-Id: Ibd38636212269328317dd0721be9d25452113d1c No-Presubmit: true No-Tree-Checks: true No-Try: true	2017-08-25 16:56:08 +00:00
Johann	f60d1dcd3d	quantize avx: copy 32x32 implementation. Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c	2017-08-24 10:42:34 -07:00
Johann	1787e7dbe0	quantize ssse3: copy implementation to intrinsics. Still does not pass tests. Does match the previous assembly, although saving the sign before multiplying is dubious. Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a	2017-08-24 07:47:51 -07:00
Johann Koenig	f53b656207	Merge "quantize avx: copy implementation to intrinsics"	2017-08-23 21:14:13 +00:00
Scott LaVarnway	1aad50c092	Merge "vpx_dsp: get32x32var_avx2() cleanup"	2017-08-23 19:59:25 +00:00
Johann	7c27872164	quantize avx: copy implementation to intrinsics. Adds an early exit based on ptest. Slightly slower than ssse3 in the full case because of the extra check, but potentially faster if lots of rows can be skipped. Very close in speed to the assembly. Can run in 32 bit, unlike the assembly. Allows reworking the function prototype to use structs. Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e	2017-08-23 09:19:16 -07:00
Johann	b9c1dcc5fa	quantize ssse3: copy style from sse2 Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598	2017-08-22 14:25:27 -07:00
Johann	75752ab7c0	quantize sse2: copy opts from ssse3. Simplify eob calculations based on ssse3 implementation. General clean up and re-scoping. Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee	2017-08-22 13:01:44 -07:00
Johann	c02fdd0258	quantize: ignore skip_block in x86 Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8	2017-08-21 14:37:03 -07:00
Scott LaVarnway	eab3f5e0cc	vpx_dsp: get32x32var_avx2() cleanup. Renamed to get32x16var_avx2(). BUG=webm:1404 Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036	2017-08-18 13:44:09 -07:00
Scott LaVarnway	2c5478e383	Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup"	2017-08-18 20:30:59 +00:00
Scott LaVarnway	2f7497f341	vpx_dsp: vpx_get16x16var_avx2() cleanup BUG=webm:1404 Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a	2017-08-18 12:23:49 -07:00
James Zern	bb15fd51be	highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo 135 -> 34. Fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12]. Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682	2017-08-17 15:37:38 -07:00
James Zern	e038d1610e	inv_txfm_sse2.h: correct idct/iadst prototypes. Fixes mismatch between prototypes and definitions. Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652	2017-08-16 23:06:09 -07:00
Linfeng Zhang	d72e20b123	Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f	2017-08-14 17:05:22 -07:00
Linfeng Zhang	69775d2f40	Update highbd idct x86 optimizations. BUG=webm:1412 Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069	2017-08-14 16:59:50 -07:00
Linfeng Zhang	3f05a70c41	Update 32x32 idct sse2 and ssse3 optimizations. Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70	2017-08-14 16:59:31 -07:00
Linfeng Zhang	15193ce51f	Merge "Clean highbd idct x86 code with inline functions"	2017-08-10 20:25:18 +00:00
Johann Koenig	0b393ae505	Merge "quantize: copy ssse3 optimizations to intrinsics"	2017-08-10 15:42:20 +00:00
Linfeng Zhang	39da7fb786	Clean highbd idct x86 code with inline functions. Created inline functions highbd_butterfly_cospi16_sse2() and highbd_butterfly_cospi16_sse4_1(). BUG=webm:1412 Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a	2017-08-08 17:53:28 -07:00
Johann	d52cb59729	quantize: copy ssse3 optimizations to intrinsics. Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2	2017-08-08 12:22:14 -07:00
Linfeng Zhang	853165ba39	Update 32x32 idct sse2 funcs, add partial case 135 Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a	2017-08-07 17:37:02 -07:00
Linfeng Zhang	d670678f26	Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx() in idct x86 code Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083	2017-08-04 15:33:37 -07:00
Linfeng Zhang	fa829e0e5a	Replace multiplication_and_add() with butterfly() in idct x86 code Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e	2017-08-04 15:33:34 -07:00
Linfeng Zhang	c9fb719ee1	Update butterfly() in idct x86 optimizations. Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701	2017-08-04 15:33:28 -07:00
Linfeng Zhang	7f20c3ac44	Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca	2017-08-04 15:31:17 -07:00
Linfeng Zhang	22b6dc9fdf	Update for loop increment of idct x86 functions Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4	2017-08-04 15:29:19 -07:00
Linfeng Zhang	0c61331244	Update high bitdepth 16x16 idct x86 code. Prepare for high bitdepth 16x16 idct sse4.1 code. Just functions moving and renaming. BUG=webm:1412 Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a	2017-08-04 15:12:33 -07:00

371 Commits