generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	3bf02ad74a	vpx: hadamard: use ptrdiff_t instead of int for stride Eliminates the following instruction for the x86 (64 bit) intrinsic code: movslq %esi,%rax Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae	2017-10-26 11:41:48 -07:00
Kyle Siefring	037e596f04	Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"	2017-10-24 19:22:36 +00:00
Kyle Siefring	ae35425ae6	Optimize convolve8 SSSE3 and AVX2 intrinsics Changed the intrinsics to perform summation similar to the way the assembly does. The new code diverges from the assembly by preferring unsaturated additions. Results for Haswell (Horiz/Vert, size, speedup): SSSE3: Horiz x4 ~32%, Horiz x8 ~6%, Vert x8 ~4%. AVX2: Horiz x16 ~16%, Vert x16 ~14%. BUG=webm:1471 Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668	2017-10-24 10:39:48 -04:00
Scott LaVarnway	512bf4e029	vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix Use an intermediate buffer before storing to coeffs when highbitdepth is enabled. Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b	2017-10-23 08:49:32 -07:00
Scott LaVarnway	4906cea027	vpx: [x86] vpx_hadamard_16x16_avx2() improvements ~10% performance gain. Fixed the cosmetics noted in the previous commit. Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13	2017-10-20 08:55:06 -07:00
Scott LaVarnway	b58259ab55	Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"	2017-10-19 23:32:10 +00:00
Scott LaVarnway	55c126a5d7	vpx: [x86] add vpx_hadamard_16x16_avx2() This version is ~1.91x faster than the sse2 version. When highbitdepth is enabled, it is ~1.74x. Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd	2017-10-18 18:00:00 -07:00
Kyle Siefring	b3a36f7946	Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"	2017-10-18 16:19:52 +00:00
Linfeng Zhang	9336e01621	Merge changes I17fff122,Ic149e3cb * changes: Add 4 to 3 scaling SSSE3 optimization Test extreme inputs in frame scale functions	2017-10-17 16:03:29 +00:00
Kyle Siefring	55805e2786	Refactor x86/vpx_subpixel_8t_intrin_avx2.c Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305	2017-10-17 11:57:40 -04:00
Linfeng Zhang	580d32240f	Add 4 to 3 scaling SSSE3 optimization Note this change will trigger the different C version on SSSE3 and generate different scaled output. Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3(). Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194	2017-10-16 15:42:42 -07:00
Kyle Siefring	caa116c9be	Merge changes I38783d97,If5160c0c * changes: Extend 16 wide AVX2 convolve8 code to support averaging. Add AVX2 version of vpx_convolve8_avg.	2017-10-12 16:12:38 +00:00
Linfeng Zhang	16166bfdaa	Add 4 to 1 scaling x86 optimization Change-Id: I51c190f0a88685867df36912522e67bdae58a673	2017-10-10 16:24:06 -07:00
Linfeng Zhang	963cc22cef	Merge changes I9d4c1af5,I882da3a0 * changes: Rename some inline functions in NEON scaling Generalize 2:1 vp9_scale_and_extend_frame_ssse3()	2017-10-10 17:29:50 +00:00
Kyle Siefring	1b2f92ee8e	Extend 16 wide AVX2 convolve8 code to support averaging. Also adds vpx_convolve8_avg_horiz_avx2. Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf	2017-10-09 19:10:03 -04:00
Kyle Siefring	9ca06bcdd2	Add AVX2 version of vpx_convolve8_avg. vpx_convolve8_avg works by first running a normal horizontal filter, then a vertical filter that averages at the end. The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the horizontal step. vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code. Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983	2017-10-07 23:37:48 -04:00
James Zern	807248ec81	Merge "ppc: Add vpx_idct32x32_1024_add_vsx"	2017-10-07 19:08:26 +00:00
Linfeng Zhang	127864deb3	Generalize 2:1 vp9_scale_and_extend_frame_ssse3() Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5	2017-10-04 12:35:39 -07:00
Linfeng Zhang	9a71811d98	Merge changes Id6a8c549,Ib1e0650b,Ic369dd86 * changes: Refactor x86/vpx_subpixel_8t_intrin_ssse3.c Add vpx_dsp/x86/mem_sse2.h Add transpose_8bit_{4x4,8x8}() x86 optimization	2017-10-04 16:15:14 +00:00
James Zern	66b6b87471	Merge "vpx: fix nasm build errors"	2017-10-03 21:47:49 +00:00
Scott LaVarnway	bc4bc9b622	vpx: fix nasm build errors BUG=webm:1462,766721 Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98	2017-10-03 20:02:21 +00:00
Linfeng Zhang	6543213e87	Refactor x86/vpx_subpixel_8t_intrin_ssse3.c Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac	2017-10-03 13:02:05 -07:00
Linfeng Zhang	0f756a307d	Add vpx_dsp/x86/mem_sse2.h Add some load and store sse2 inline functions. Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222	2017-10-03 12:59:05 -07:00
Linfeng Zhang	67c38c92e7	Add transpose_8bit_{4x4,8x8}() x86 optimization Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878	2017-10-03 10:00:30 -07:00
Alexandra Hájková	fb7fc1dbda	ppc: Add vpx_idct32x32_1024_add_vsx Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d	2017-09-30 19:31:20 +00:00
Scott LaVarnway	3bbd62ed27	vpxdsp: [x86] add highbd_d135_predictor functions C vs SSE2 speed gains: _4x4 : ~1.81x C vs SSSE3 speed gains: _8x8 : ~1.96x _16x16 : ~1.88x _32x32 : ~2.02x BUG=webm:1411 Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0	2017-09-29 08:56:38 -07:00
Scott LaVarnway	4cae64c32c	vpxdsp: [x86] add highbd_d117_predictor functions C vs SSE2 speed gains: _4x4 : ~2.04x C vs SSSE3 speed gains: _8x8 : ~2.82x _16x16 : ~5.93x _32x32 : ~2.79x BUG=webm:1411 Change-Id: I31d949695991c067dac89d91e0bed3e666c94993	2017-09-28 14:45:28 -07:00
Scott LaVarnway	80992a746c	Merge "vpxdsp: [x86] add highbd_d153_predictor functions"	2017-09-27 20:40:21 +00:00
James Zern	690fa6bb6e	Merge "fix signed integer overflow of idct"	2017-09-27 19:39:11 +00:00
Linfeng Zhang	dbbbd44304	fix signed integer overflow of idct Exposed by fuzz test in high bitdepth. The bug is introduced in commit 64653fa. BUG=webm:1466 Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5	2017-09-27 11:17:54 -07:00
Scott LaVarnway	19c45ccd43	vpxdsp: [x86] add highbd_d153_predictor functions C vs SSE2 speed gains: _4x4 : ~1.95x C vs SSSE3 speed gains: _8x8 : ~3.30x _16x16 : ~5.67x _32x32 : ~3.87x BUG=webm:1411 Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904	2017-09-27 11:01:11 -07:00
Linfeng Zhang	9d0d13e939	Add vpx_scaled_2d_neon() BUG=webm:1419 Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96	2017-09-26 09:22:39 -07:00
Linfeng Zhang	28762341ac	Merge changes Ib9105462,Idfac00ed,If8d8a0e2 * changes: cosmetics: NEON scaling code Refactor convolve NEON code Refactor convolve code	2017-09-26 16:10:46 +00:00
Scott LaVarnway	a059dc0986	Merge "vpxdsp: [x86] add highbd_d45_predictor functions"	2017-09-25 11:34:14 +00:00
Scott LaVarnway	cf82f7276e	vpxdsp: [x86] add highbd_d45_predictor functions C vs SSSE3 speed gains: _4x4 : ~2.45x _8x8 : ~10.61x _16x16 : ~11.34x _32x32 : ~6.36x BUG=webm:1411 Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09	2017-09-22 05:20:12 -07:00
Linfeng Zhang	d586cdb4d4	Remove the unnecessary cast of (int16_t)cospi_{1...31}_64 BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8	2017-09-20 14:13:26 -07:00
Linfeng Zhang	76a3d3fcc5	Remove the unnecessary upcasts of (int)cospi_{1...31}_64 BUG=webm:1450 Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858	2017-09-20 14:13:26 -07:00
Linfeng Zhang	64653fa133	Change cospi_{1...31}_64 from tran_high_t to tran_coef_t The unnecessary upcast to (int) will be cleaned later. BUG=webm:1450 Change-Id: Ia234575206d5a74540526924b06ed3939322d063	2017-09-20 14:13:26 -07:00
Scott LaVarnway	b85e391ac8	Merge "vpxdsp: [x86] add highbd_d63_predictor functions"	2017-09-20 11:39:28 +00:00
Linfeng Zhang	7c0529728a	cosmetics: NEON scaling code Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618	2017-09-19 16:39:17 -07:00
Linfeng Zhang	f357335c38	Refactor convolve NEON code Rename a couple of hbd static functions. Move the position of NEON function convolve8_4(). Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349	2017-09-19 16:28:36 -07:00
Linfeng Zhang	bf8bdae913	Refactor convolve code Extract a couple of static functions into their caller functions. Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0	2017-09-19 16:28:31 -07:00
Scott LaVarnway	bc86e2c6a2	vpxdsp: [x86] add highbd_d63_predictor functions C vs SSE2 speed gains: _4x4 : ~2.94x C vs SSSE3 speed gains: _8x8 : ~8.69x _16x16 : ~6.32x _32x32 : ~5.33x BUG=webm:1411 Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4	2017-09-19 15:47:22 -07:00
Linfeng Zhang	a80bdfd081	Change sinpi_{1,2,3,4}_9 from tran_high_t to int16_t Add "typedef int16_t tran_coef_t;" BUG=webm:1450 Change-Id: I67866f104898d1dda8989e1abdaf6983fe324154	2017-09-18 09:26:03 -07:00
Linfeng Zhang	9d278465b5	Merge "cosmetics: vp9_rtcd_defs.pl"	2017-09-18 16:23:33 +00:00
Kaustubh Raste	4ca8f8f5e2	mips msa clean-up msa macros Removed inline for GP load-store in case of (__mips_isa_rev >= 6) Created one define LD_V for vector load and ST_V for vector store Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448	2017-09-14 12:29:19 +05:30
Linfeng Zhang	535dee0fb6	cosmetics: vp9_rtcd_defs.pl Change-Id: I1bf57824e07fa4f8b3b5574984117f2bd7a1c086	2017-09-13 12:13:55 -07:00
Johann Koenig	ed3a80cb5e	Merge "Revert "Revert "quantize avx: copy 32x32 implementation"""	2017-09-13 14:44:53 +00:00
Johann	eb4238ac70	Revert "Revert "quantize avx: copy 32x32 implementation"" This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65. Because ssse3 code is used for the reference, the qcoeff and dqcoeff reference buffers must be aligned. Original change's description: > quantize avx: copy 32x32 implementation > > Ensure avx and ssse3 stay in sync by testing them against each other. > > Change-Id: I699f3b48785c83260825402d7826231f475f697c Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06	2017-09-12 14:25:38 -07:00
Kaustubh Raste	30f1ff94e0	Optimize mips msa vp9 average mc functions Load the specific destination loads instead of vector load Change-Id: I65ca13ae8f608fad07121fef848e2a18f54171fe	2017-09-12 16:12:11 +05:30

995 Commits