generic-library/vpx

Author	SHA1	Message	Date
Johann	eae7cf2368	fdct16x16 neon optimization Roughly 2x speedup. Since the only change for HBD is to store(), the improvement appears to hold there as well. BUG=webm:1424 Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19	2017-06-07 14:59:55 -07:00
James Zern	ff42e04f9c	Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}"	2017-06-06 23:52:39 +00:00
James Zern	4753c23983	Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx"	2017-06-06 02:19:41 +00:00
Johann Koenig	755b3daf90	Merge "comp_avg_pred neon: used by sub pixel avg variance"	2017-05-31 18:17:28 +00:00
Linfeng Zhang	30ea3ef283	Merge "Update vpx_highbd_idct4x4_16_add_sse2()"	2017-05-31 15:56:20 +00:00
Johann	f695b30ac2	comp_avg_pred neon: used by sub pixel avg variance BUG=webm:1423 Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804	2017-05-30 22:47:34 +00:00
Linfeng Zhang	45048dc9dc	Update vpx_highbd_idct4x4_16_add_sse2() BUG=webm:1412 Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84	2017-05-30 09:25:30 -07:00
Johann Koenig	b9649d2407	Merge "comp_avg_pred: alignment"	2017-05-30 16:21:05 +00:00
Johann	ea8b4a450d	comp_avg_pred: alignment x86 requires 16 byte alignment for some vector loads/stores. arm does not have the same requirement. The asserts are still in avg_pred_sse2.c. This just removes them from the common code. Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6	2017-05-30 07:46:43 -07:00
Johann	42ce25821d	remove DECLARE_ALIGNED from neon code Unlike x86 neon only requires type alignment when loading into vectors. Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f	2017-05-26 10:41:57 -07:00
Johann	f3c97ed32e	subpel variance neon: reduce stack usage Unlike x86, arm does not impose additional alignment restrictions on vector loads. For incoming values to the first pass, it uses vld1_u32() which typically does impose a 4 byte alignment. However, as the first pass operates on user-supplied values we must prepare for unaligned values anyway (and have, see mem_neon.h). But for the local temporary values there is no stride and the load will use vld1_u8 which does not require 4 byte alignment. There are 3 temporary structures. In the C, one is uint16_t. The arm saturates between passes but still passes tests. If this becomes an issue new functions will be needed. Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1	2017-05-24 13:28:13 -07:00
Johann	d204c4bf01	Use vdup instead of vmov Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a	2017-05-24 11:38:15 -07:00
Johann Koenig	de1a9c77a7	Merge changes Iaab2b9a1,Idfb458d3 * changes: sub pel avg variance neon: 4x block sizes sub pel variance neon: 4x block sizes	2017-05-24 18:33:53 +00:00
Johann Koenig	b11a37f540	Merge changes I31fa6ef8,I228c6f29 * changes: sub pel avg variance neon: add neon optimizations sub pel variance neon: normalize variable names	2017-05-24 18:32:02 +00:00
Alexandra Hájková	8bf6eaf433	ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64} Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123	2017-05-24 13:27:09 +00:00
Linfeng Zhang	6444958f62	Update inv_txfm_sse2.h and inv_txfm_sse2.c Extract shared code into inline functions. Change-Id: Iee1e5a4bc6396aeed0d301163095c9b21aa66b2f	2017-05-23 14:54:46 -07:00
Johann	f6fcd3410d	sub pel avg variance neon: 4x block sizes BUG=webm:1423 Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b	2017-05-22 14:40:05 -07:00
Johann	188d58eaa9	sub pel variance neon: 4x block sizes Add optimizations for blocks of width 4 BUG=webm:1423 Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938	2017-05-22 14:40:01 -07:00
Johann	9b0d306a2f	sub pel avg variance neon: add neon optimizations These are missing an optimized version of vpx_comp_avg_pred BUG=webm:1423 Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed	2017-05-22 13:58:43 -07:00
Johann	e0d294c3af	sub pel variance neon: normalize variable names match vpx_dsp/variance.c variable names Change-Id: I228c6f296c183af147b079b7c8bcdf97bd09cf3a	2017-05-22 13:58:43 -07:00
Linfeng Zhang	27beada6d0	Merge "Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2"	2017-05-22 20:58:18 +00:00
Johann	67ac68e399	variance neon: assert overflow conditions Change-Id: I12faca82d062eb33dc48dfeb39739b25112316cd	2017-05-22 11:25:06 -07:00
Linfeng Zhang	c167345ffb	Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2 BUG=webm:1412 Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468	2017-05-22 11:24:21 -07:00
Johann	d217c87139	neon variance: special case 4x The sub pixel variance uses a temp buffer which guarantees width == stride. Take advantage of this with the 4x and avoid the very costly lane loads. Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1	2017-05-22 10:51:31 -07:00
Johann Koenig	e7cac13016	Merge changes Ib8dd96f7,Ie9854b77 * changes: neon variance: process 4x blocks use memcpy for unaligned neon stores	2017-05-22 17:48:33 +00:00
Johann Koenig	b5055002d7	Merge "neon 4 byte helper functions"	2017-05-19 17:11:30 +00:00
Johann Koenig	3c603eadb4	Merge "neon fdct: 4x4 implementation"	2017-05-19 17:08:58 +00:00
Johann	7b742da63e	neon variance: process 4x blocks Continue processing sets of 16 values. Plenty of improvement for 4x8 (doubles the speed) but only about 30% for 4x4. BUG=webm:1422 Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905	2017-05-17 17:35:01 -07:00
Johann	2057d3ef75	use memcpy for unaligned neon stores Advise the compiler that the store is eventually going to a uint8_t buffer. This helps avoid getting alignment hints which would cause the memory access to fail. Originally added as a workaround for clang: https://bugs.llvm.org//show_bug.cgi?id=24421 Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e	2017-05-17 12:11:31 -07:00
Johann	105503b839	neon fdct: 4x4 implementation Approximately twice as fast as C implementation. BUG=webm:1424 Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f	2017-05-17 07:38:18 -07:00
Linfeng Zhang	18e8baa5c0	Add transpose_32bit_4x4() and rename transpose_4x4() for vpx_dsp/x86 Change-Id: Ib57377f6cf6573c04720d3cc5dea4285362b4220	2017-05-16 17:46:37 -07:00
Johann Koenig	2300e16675	Revert "Add visibility="protected" attribute for global variables referenced in asm files." This reverts commit `0d88e15454`. Reason for revert: chromium builds are failing to locate vpx_rv during dlopen() dlopen failed: cannot locate symbol "vpx_rv" referenced by "libstandalonelibwebviewchromium.so" Original change's description: > Add visibility="protected" attribute for global variables referenced in asm files. > > During aosp builds with binutils-2.27, we're seeing linker error > messages of this form: > libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible > symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared > object > > subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm". > Other messages refer to symbol references from deblock_sse2.o and > subpixel_sse2.o, also assembled from asm files. > > This change marks such symbols as having "protected" visibility. This > satisfies the linker as the symbols are not preemptible from outside > the shared library now, which I think is the original intent anyway. > > Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452 > TBR=jzern@google.com,johannkoenig@google.com,rahulchaudhry@chromium.org,builds@webmproject.org Change-Id: I0c2ea375aa7ef5fda15b9d9e23e654bb315c941b	2017-05-16 15:54:33 -07:00
Johann	7498fe2e54	neon 4 byte helper functions When data is guaranteed to be aligned, use helper functions which assert that requirement. Change-Id: Ic4b188593aea0799d5bd8eda64f9858a1592a2a3	2017-05-15 13:42:31 -07:00
Johann	1088b4f87c	move neon load/stores to a new file Move the tran_low_t helper functions to a new file. Additional load/store functions will be added here. Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3	2017-05-15 08:29:43 -07:00
Alexandra Hájková	bcbc3929ae	ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f	2017-05-13 13:13:15 +00:00
Rahul Chaudhry	0d88e15454	Add visibility="protected" attribute for global variables referenced in asm files. During aosp builds with binutils-2.27, we're seeing linker error messages of this form: libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared object subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm". Other messages refer to symbol references from deblock_sse2.o and subpixel_sse2.o, also assembled from asm files. This change marks such symbols as having "protected" visibility. This satisfies the linker as the symbols are not preemptible from outside the shared library now, which I think is the original intent anyway. Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452	2017-05-12 11:11:16 -07:00
James Zern	ac8f58f6ab	Merge changes I1b54a7a5,I3028bdad,I59788cd9 * changes: ppc: Add get_mb_ss_vsx ppc: Add get4x4sse_cs_vsx ppc: Add comp_avg_pred_vsx	2017-05-12 15:24:59 +00:00
Luca Barbato	143b21e362	ppc: Add get_mb_ss_vsx Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f	2017-05-12 17:23:00 +02:00
Luca Barbato	6d225eb5f9	ppc: Add get4x4sse_cs_vsx Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8	2017-05-12 17:23:00 +02:00
Luca Barbato	a7f8bd451b	ppc: Add comp_avg_pred_vsx Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10	2017-05-12 17:22:55 +02:00
Alexandra Hájková	f48532e271	ppc: Add vpx_sad64x32/64_vsx Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2	2017-05-12 16:10:16 +02:00
Alexandra Hájková	0b15bf1e54	ppc Add vpx_sad32x16/32/64_vsx Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd	2017-05-12 16:10:11 +02:00
James Zern	a12ea1d5e9	Merge "ppc: Add vpx_sad16x8/16/32_vsx"	2017-05-12 13:33:51 +00:00
Alexandra Hájková	cc7f0c0f3e	ppc: Add vpx_sad16x8/16/32_vsx Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9	2017-05-10 19:57:30 +00:00
Linfeng Zhang	764b3b8090	Update specializations of idct functions Introduced append situation in Commit `0178d97` which could be confusing. Clean a little bit and add some comments. Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2	2017-05-10 12:51:18 -07:00
Johann Koenig	d713ec3c46	Merge changes I92eb4312,Ibb2afe4e * changes: subpel variance neon: add mixed sizes sub pixel variance neon: use generic variance	2017-05-10 18:19:52 +00:00
Linfeng Zhang	f532504864	Clean 32x32 idct C code Change-Id: I73b8104a9e7a70ffe827c1b7ff43618f24f5d7bd	2017-05-09 11:05:51 -07:00
Linfeng Zhang	ecd1eb2162	Update 4x4 idct sse2 functions It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2() Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876	2017-05-08 16:16:52 -07:00
Johann	f7d1486f48	neon variance: process 16 values at a time Read in a Q register. Works on blocks of 16 and larger. Improvement of about 20% for 64x64. The smaller blocks are faster, but don't have quite the same level of improvement. 16x32 is only about 5% BUG=webm:1422 Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b	2017-05-08 18:48:55 +00:00
Johann Koenig	1814463864	Merge changes Id602909a,Ib0e85608 * changes: neon variance: process two rows of 8 at a time neon variance: add small missing sizes	2017-05-08 17:34:20 +00:00
Linfeng Zhang	2c3a2ad6f1	Merge changes I0cfe4117,I3581d80d,Ida62c941 * changes: Split dsp/x86/inv_txfm_sse2.c Update highbd idct functions arguments to use uint16_t dst Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct	2017-05-08 16:15:57 +00:00
Johann	2346a6da4a	subpel variance neon: add mixed sizes Add support for everything except block sizes of 4. Performance is better but numbers will improve again when the variance optimizations land. BUG=webm:1423 Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80	2017-05-04 15:30:01 -07:00
Johann	19e1ec8359	sub pixel variance neon: use generic variance When a neon version is available it will be called. This allows decoupling the variance implementations and has no real downside. For most configurations, the call will be #define'd to the neon implementation. Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712	2017-05-04 15:30:01 -07:00
Johann	462e29703c	fdct 8x8 neon: minor comment cleanup Simplify HBD/non distinction in test. Document why transpose_neon.h is not used Change-Id: I17659414206ddbb8c2f1ef0d9f4a17f1745d5a52	2017-05-04 15:14:23 -07:00
Johann	d6a7489dd5	neon variance: process two rows of 8 at a time When the width is equal to 8, process two rows at a time. This doubles the speed of 8x4 and improves 8x8 by about 20%. 8x16 was using this technique already, but still improved a little bit with the rewrite. Also use this for vpx_get8x8var_neon BUG=webm:1422 Change-Id: Id602909afcec683665536d11298b7387ac0a1207	2017-05-04 08:59:46 -07:00
Johann	cb9133c72f	neon variance: add small missing sizes Some of the mixed sizes were missing. They can be implemented trivially using the existing helper function. When comparing the previous 16x8 and 8x16 implementations, the helper function is about 10% faster than the 16x8 version. The 8x16 is very close, but the existing version appears to be faster. BUG=webm:1422 Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004	2017-05-04 08:59:42 -07:00
Linfeng Zhang	2231669a83	Split dsp/x86/inv_txfm_sse2.c Spin out highbd idct functions. BUG=webm:1412 Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af	2017-05-03 15:43:02 -07:00
Linfeng Zhang	d5de63d2be	Update highbd idct functions arguments to use uint16_t dst BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5	2017-05-03 13:59:16 -07:00
Linfeng Zhang	081b39f2b7	Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct BUG=webm:1388 Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112	2017-05-03 13:58:31 -07:00
Yi Luo	a3452996a1	High bit depth inter prediction horizontal/vertical filters AVX2 User level speed improvement on i7-6700, cpu-used=1, x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps: - Decoder: 1080p: ~4% 4K: ~5% - Encoder: 1080p: ~1% 4K: ~3% Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640	2017-05-03 12:18:01 -07:00
Linfeng Zhang	a10a5cb356	Merge changes I8bb660de,Ica51d780,I6037525d * changes: Clean specializes of idct functions Clean add_protos of highbd idct functions Clean add_protos of idct functions	2017-05-03 19:17:55 +00:00
Luca Barbato	e2ad89092d	ppc: Add convolve8_vsx and convolve8_avg_vsx Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857	2017-05-02 20:27:47 -07:00
Luca Barbato	e6ca81ee67	ppc: Add convolve8_avg_vert_vsx Only the generic one again, speedups for 8x8 and larger blocks to come later. Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d	2017-05-02 20:27:42 -07:00
Luca Barbato	a65f1771ad	ppc: Add convolve8_vert Only the generic one again, speedups for 8x8 and larger blocks to come later. Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f	2017-05-02 20:27:33 -07:00
Luca Barbato	77772350f3	ppc: Add convolve8_horiz_avg The 8x8 and larger blocks cases can be sped up further. Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15	2017-05-02 20:27:28 -07:00
Luca Barbato	08edb85bd0	ppc: Add convolve8_horiz The 8x8 and larger blocks cases can be sped up further. Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046	2017-05-02 20:27:16 -07:00
Linfeng Zhang	0178d974e5	Clean specializes of idct functions Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2	2017-05-02 18:01:19 -07:00
Linfeng Zhang	4412996d59	Clean add_protos of highbd idct functions Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371	2017-05-02 17:59:38 -07:00
Linfeng Zhang	a7a57d9756	Clean add_protos of idct functions Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a	2017-05-02 17:58:40 -07:00
Luca Barbato	d51d3934f5	ppc: Add convolve_avg Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014	2017-04-29 15:47:25 +02:00
Luca Barbato	63860ba7b8	ppc: Add convolve_copy Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405	2017-04-29 15:47:25 +02:00
Linfeng Zhang	51dc998f3a	Update highbd convolve functions arguments to use uint16_t src/dst BUG=webm:1388 Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42	2017-04-25 14:22:19 -07:00
Luca Barbato	914b160fb5	ppc: h predictor 8x8 Slightly faster with the current compiler. Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47	2017-04-19 19:57:51 -07:00
Luca Barbato	0b9be93205	ppc: d63 predictor 8x8 10x faster. Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8	2017-04-19 19:57:51 -07:00
Luca Barbato	ee9325b0bd	ppc: tm predictor 4x4 Slightly faster. Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191	2017-04-19 19:57:51 -07:00
Luca Barbato	2904eb5800	ppc: h predictor 4x4 2x faster. Change-Id: I0583dec353299c6797401b646099f18db4e0420d	2017-04-19 19:57:51 -07:00
Luca Barbato	58245d7050	ppc: dc predictor 8x8 Slightly faster, the other dc predictors cannot be faster since the computation speedup is overwhelmed by the time spent reading dst to write just the 8x8 part. Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66	2017-04-19 19:57:51 -07:00
Luca Barbato	6b4a65e8b1	ppc: d45 predictor 8x8 11x faster. Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798	2017-04-19 19:57:51 -07:00
Luca Barbato	92e33c7b31	ppc: d63 predictor 32x32 About 10x faster. Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7	2017-04-19 19:57:51 -07:00
Luca Barbato	a5469a00a8	ppc: d63 predictor 16x16 About 18x faster. Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4	2017-04-19 19:57:51 -07:00
Luca Barbato	cc868da526	ppc: d45 predictor 32x32 About 12x faster. Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed	2017-04-19 19:57:51 -07:00
Luca Barbato	7a7dc9e624	ppc: d45 predictor 16x16 About 16x faster. Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9	2017-04-19 19:57:51 -07:00
Luca Barbato	c08baa2900	ppc: dc predictor 32x32 10x and 5x faster. Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf	2017-04-19 19:57:47 -07:00
Luca Barbato	22ca468c7c	ppc: dc top and left predictor 32x32 6x faster. Change-Id: I717995b4056e5579c68191d11b495372971fe1ae	2017-04-19 19:49:31 -07:00
Luca Barbato	ad9dea1f6d	ppc: dc top and left predictor 16x16 13x faster. Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6	2017-04-19 19:49:31 -07:00
Luca Barbato	d68d37872c	ppc: dc_128 predictor 32x32 6x faster. Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b	2017-04-19 19:49:31 -07:00
Luca Barbato	f9d20e6df2	ppc: dc_128 predictor 16x16 20x faster. Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709	2017-04-19 19:49:31 -07:00
Luca Barbato	0d9417de4a	ppc: tm predictor 32x32 About 8x faster. Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf	2017-04-19 19:49:26 -07:00
James Zern	a81f037f15	Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ... * changes: ppc: tm predictor 16x16 ppc: tm predictor 8x8 ppc: horizontal predictor 32x32 ppc: horizontal predictor 16x16 ppc: vertical intrapred 16x16 and 32x32 configure: Workaround clang not enabling altivec on -mvsx configure: Match power64 as ppc64	2017-04-20 02:45:45 +00:00
Linfeng Zhang	bf8a49abbd	Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve Replace by CAST_TO_BYTEPTR/SHORTPTR. The rule is: if a short ptr is casted to a byte ptr, any offset operation on the byte ptr must be doubled. We do this by casting to short ptr first, adding offset, then casting back to byte ptr. BUG=webm:1388 Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248	2017-04-19 12:13:49 -07:00
Luca Barbato	479443a570	ppc: tm predictor 16x16 About 10x faster. Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06	2017-04-19 01:48:10 +02:00
Luca Barbato	c8f5a55df4	ppc: tm predictor 8x8 About 5x faster. Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a	2017-04-19 01:48:09 +02:00
Luca Barbato	7b0e12934e	ppc: horizontal predictor 32x32 About 5x faster. Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8	2017-04-19 01:48:09 +02:00
Luca Barbato	a7a2d1653b	ppc: horizontal predictor 16x16 About 10x faster. Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2	2017-04-19 01:48:09 +02:00
Luca Barbato	7ad1faa6f8	ppc: vertical intrapred 16x16 and 32x32 Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab	2017-04-19 01:48:09 +02:00
Johann	9fa24f03b5	re-enable vpx_comp_avg_pred_sse2 Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed with "AvgPred test: use aligned buffers" and "sad avg: align intermediate buffer" Also re-enable asserts on the C version. BUG=webm:1390 Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514	2017-04-17 08:40:43 -07:00
Johann	069b772915	sad avg: align intermediate buffer comp_avg_pred has started declaring a requirement for aligned buffers. BUG=webm:1390 Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c	2017-04-17 14:26:33 +00:00
James Zern	4ba20da8b1	Merge "Add AVX2 optimization to copy/avg functions"	2017-04-15 00:26:08 +00:00
Yi Luo	aa5a941992	Add AVX2 optimization to copy/avg functions Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074	2017-04-14 16:50:10 -07:00
Johann	eaa7cdf05d	Disable vpx_comp_avg_pred_sse2 Failures on windows: unknown file: error: SEH exception with code 0xc0000005 thrown in the test body. Alignment check errors on linux: test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t *, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed. BUG=webm:1390 Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea	2017-04-14 08:43:06 -07:00
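
Commit 2057d3ef75 ("use memcpy for unaligned neon stores") describes routing a store through memcpy so the compiler treats the destination as a plain uint8_t buffer and does not assume wider alignment. A minimal portable sketch of that idea follows. It uses plain C rather than NEON intrinsics, and the helper names store_u32_unaligned and load_u32_unaligned are hypothetical, not taken from libvpx.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit value into a byte buffer that may
 * not be 4-byte aligned. memcpy tells the compiler only that dst is a
 * byte buffer, so it cannot attach a 4-byte alignment assumption. */
static void store_u32_unaligned(uint8_t *dst, uint32_t value) {
  memcpy(dst, &value, sizeof(value));
}

/* Hypothetical counterpart: read a 32-bit value back from a possibly
 * unaligned byte address. */
static uint32_t load_u32_unaligned(const uint8_t *src) {
  uint32_t value;
  memcpy(&value, src, sizeof(value));
  return value;
}
```

Compilers typically lower a fixed-size memcpy like this to a single load or store instruction, so the sketch should cost little compared with a direct pointer cast, while avoiding the undefined behavior of dereferencing a misaligned uint32_t pointer.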


776 Commits
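
Commit bf8a49abbd states a rule for high bit depth buffers: when a short pointer has been cast to a byte pointer, an offset must be applied by casting to a short pointer first, adding the offset, then casting back. The sketch below illustrates that rule under stated assumptions. The macros SKETCH_CAST_TO_SHORTPTR and SKETCH_CAST_TO_BYTEPTR and the function advance_samples are hypothetical stand-ins written for this example, not the definitions used in libvpx.

```c
#include <stdint.h>

/* Hypothetical stand-ins for plain pointer casts between the byte view
 * and the 16-bit sample view of the same buffer. */
#define SKETCH_CAST_TO_SHORTPTR(x) ((uint16_t *)(void *)(x))
#define SKETCH_CAST_TO_BYTEPTR(x) ((uint8_t *)(void *)(x))

/* Advance a byte-typed pointer that really addresses 16-bit samples by
 * n samples: cast to short pointer, add the offset, cast back. Adding n
 * directly to the byte pointer would move only n bytes, i.e. half as far. */
static uint8_t *advance_samples(uint8_t *byte_ptr, int n) {
  uint16_t *p = SKETCH_CAST_TO_SHORTPTR(byte_ptr);
  return SKETCH_CAST_TO_BYTEPTR(p + n);
}
```

Moving by two samples this way shifts the byte address by four, which is the "doubled" offset the commit message refers to.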