generic-library/vpx

Author	SHA1	Message	Date
Tamar Levy	54f9205653	mb_lpf_horizontal_edge AVX2 optimization This CL contains two AVX2 optimized loop filter functions, mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16. Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b	2013-10-31 10:26:15 -06:00
Parag Salasakar	1699eb0bf6	mips dsp-ase r2 vp9 decoder idct module optimizations (rebase) Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308	2013-10-24 11:29:04 +05:30
Yunqing Wang	3a0b59e3fd	Merge "SSE2 8-tap sub-pixel filter optimization"	2013-10-11 08:44:56 -07:00
Yunqing Wang	3fb728c749	SSE2 8-tap sub-pixel filter optimization To ensure fast encoding/decoding on devices without ssse3 support, SSE2 optimization of sub-pixel filters was done. Test using 1080p clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps with sse2 filters, and ~15fps with c filters. Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c	2013-10-10 14:12:47 -07:00
Dmitry Kovalev	9a1250e3e0	Merge "Moving all scan/iscan code into separate vp9_scan.{h, c} files."	2013-10-10 10:45:07 -07:00
Parag Salasakar	eeb5b62dc1	mips dsp-ase r2 vp9 decoder bilinear convolve optimizations Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0	2013-10-09 18:05:27 +05:30
Dmitry Kovalev	e3597c6af7	Moving all scan/iscan code into separate vp9_scan.{h, c} files. Now we have entropy code separate from scan/iscan code. The next step in future is to move iscan code from common part to the encoder. Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451	2013-10-07 13:55:56 -07:00
Parag Salasakar	40edab5e39	mips dsp-ase r2 vp9 decoder convolve module optimizations Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59	2013-10-02 16:58:37 -07:00
Dmitry Kovalev	efbacc9f89	Merge "Removing vp9_subpelvar.h from common."	2013-09-29 12:00:46 -07:00
Christian Duvivier	b1b4ba1bdd	Properly save neon registers. Replace current code which corrupts the stack by duplicate of vp8 code to save and restore neon registers. Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a	2013-09-27 14:25:33 -07:00
Christian Duvivier	5b1dc1515f	Fix a bunch of TODO from vp9_short_idct32x32_add_neon. - full ASM version, no more C gateway file. - integrate combine-add with last step of 2nd pass. - remove a few push/pop pairs. - some instruction reordering to hide latency. Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6	2013-09-25 21:15:19 -07:00
Dmitry Kovalev	64eff7f360	Removing vp9_subpelvar.h from common. Moving all code from that file to vp9_variace_c.c in the encoder. Change-Id: Ic803d5b4c78d5191e4d25541b3df97337878fc3e	2013-09-25 16:10:43 -07:00
James Zern	2d58761993	Revert "Improved 8t filters" This is incompatible with most toolchains other than gcc. Revert "Deleted #include <inttypes.h>" This reverts commit 4d018be950ef8b056a7c797a22ee58012443df26. This reverts commit d22a504d11a15dc3eab666859db0046b5a7d75c5. Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf	2013-09-13 15:13:06 -07:00
hkuang	86fb12b600	Merge "Add neon optimize iht8x8 which is 282% faster than C."	2013-09-12 15:42:44 -07:00
hkuang	182366c736	Add neon optimize iht8x8 which is 282% faster than C. Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530	2013-09-12 11:49:05 -07:00
Christian Duvivier	6a501462f8	First draft of vp9_short_idct32x32_add_neon. Lots of TODO which will be taken care in upcoming changes. As is, about 6x faster than C version. Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0	2013-09-11 15:19:38 -07:00
Scott LaVarnway	d22a504d11	Improved 8t filters Reformatted version of a patch submitted by Erik/Tamar from Intel. For the test clips used, the decoder performance improved by ~2%. Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b	2013-09-11 13:56:32 -04:00
hkuang	3c05bda058	Merge "Add neon optimize vp9_short_iht4x4_add."	2013-09-04 13:35:09 -07:00
hkuang	3b8614a8f6	Add neon optimize vp9_short_iht4x4_add. Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e	2013-09-04 12:37:58 -07:00
Jim Bankoski	79401542f7	make vp9 postproc a config option Vp9 postproc is disabled for now as its not been shown to help and may be merged with vp8. Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057	2013-09-04 10:02:08 -07:00
hkuang	3a679e56b2	Add neon optimize vp9_short_idct16x16_1_add. Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5	2013-08-27 14:00:27 -07:00
hkuang	36e9b82080	Add neon optimize vp9_short_idct8x8_1_add. Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4	2013-08-26 16:28:57 -07:00
hkuang	69384f4fad	Add neon optimize vp9_short_idct4x4_1_add. Change-Id: I6ecb5c4a1a472feb8e84e9f3352b536d5e28a4a5	2013-08-26 15:55:16 -07:00
Johann	a9aa7d07d0	Merge "vp9: neon: add vp9_convolve_avg_neon"	2013-08-15 14:55:15 -07:00
Johann	63e140eaa7	Merge "vp9: neon: add vp9_convolve_copy_neon"	2013-08-15 14:55:08 -07:00
hkuang	39f42c8713	Merge "Add neon optimize vp9_short_idct16x16_add."	2013-08-14 14:16:20 -07:00
hkuang	cf6beea661	Add neon optimize vp9_short_idct16x16_add. Change-Id: I27134b9a5cace2bdad53534562c91d829b48838d	2013-08-14 13:52:16 -07:00
Mans Rullgard	0f1deccf86	vp9: neon: add vp9_convolve_avg_neon Change-Id: I33cff9ac4f2234558f6f87729f9b2e88a33fbf58	2013-08-14 16:27:55 +01:00
Mans Rullgard	635ba269be	vp9: neon: add vp9_convolve_copy_neon Change-Id: I15adbbda15d1842e9f15f21878a5ffbb75c3c0c9	2013-08-14 16:27:55 +01:00
Dmitry Kovalev	8ffe85ad00	Moving scale_factors and related code to separate files. Change-Id: I531829e5aee2a4a7a112d528ecccbddf052d0e74	2013-08-09 14:07:09 -07:00
Christian Duvivier	78182538d6	Neon version of vp9_short_idct4x4_add. Change-Id: Idec4cae0cb9b3a29835fd2750d354c1393d47aa4	2013-08-06 18:41:27 -07:00
Jim Bankoski	6eb1254b88	sse3 intrapred x86inc protected Change-Id: I4a3c83119cdf8a205920034c8019d855d5504605	2013-08-06 14:17:13 -07:00
Jim Bankoski	25ec1375c9	intrapred x86inc guards Change-Id: If0399d8e11f4ebe75a5c91abb8d6a52a7709065b	2013-08-06 09:39:30 -07:00
Jim Bankoski	c3809f3de5	Begin to restrict x86inc.asm usage Chromium does not support 32bit builds for Mac which use x86inc.asm. Make the files which include it work if 64bit or not PIC enabled starting with vp9_copy_sse2.asm Consolidate these targets in vp9_rtcd_defs.sh Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248	2013-08-05 12:07:30 -07:00
Mans Rullgard	d85ae87183	vp9: neon: add vp9_mb_lpf_* functions Change-Id: I13e0880df234f15abc4cc7c57fe84488d5d46a75	2013-08-02 08:10:50 -07:00
hkuang	d757de744c	Add neon optimize vp9_short_idct8x8_add. Change-Id: Ic32acf3e2939c6d12d9c2bf192a5f5da59705fda	2013-07-18 16:40:41 -07:00
Johann	9ca66ec050	Merge "vp9_convolve8_neon placeholder"	2013-07-17 10:09:00 -07:00
Johann	59dc4e9cdd	vp9_convolve8_neon placeholder Call the individually optimized horizontal and vertical functions. This implementation abuses the temp buffer. This will be replaced with a custom optimized function. Over 2x speedup. Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd	2013-07-17 08:39:27 -07:00
James Zern	98e132bde0	Merge changes I40454d26,I892e76d5,I865ab3f9,I4a4bec17,I61c4351e,I37eb3559,I1031c556,I8c8f1f42 * changes: delete vp9_loopfilter_sse2.asm vp9_loopfilter_intrin_sse2: cosmetics: fix indent delete x86/vp9_loopfilter_x86.h vp9_loopfilter_intrin_sse2: make some funcs static vp9_loopfilter_intrin_sse2: remove unused uv funcs vp9_loopfilter: remove uv function typedef filter_block_plane: reuse some constants vp9_loopfilter.c: make some functions static	2013-07-16 14:25:32 -07:00
James Zern	50015f6eba	delete vp9_loopfilter_sse2.asm sse2 functions are provided by vp9_loopfilter_intrin_sse2.c Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b	2013-07-16 13:09:16 -07:00
James Zern	af58254267	delete x86/vp9_loopfilter_x86.h also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405	2013-07-16 13:09:05 -07:00
Dmitry Kovalev	baf0c959c7	Moving vp9_kf_default_bmode_probs to vp9_entropymode.c. Removing vp9_modelcontext.c. Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427	2013-07-16 10:54:34 -07:00
Johann	a15bebfc0a	vp9_convolve8_[horiz|vert]_avg Super basic conversion from the other implementations. Any changes to one should be trivial to copy over keep in sync. Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8	2013-07-12 16:21:33 -07:00
Johann	158c80cbb0	convolve8 optimizations for neon Independent horizontal and vertical implementations. Requires that blocks be built from 4x4 and [xy]_step_q4 == 16 6-10% improvement. CIF improved the least. Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda	2013-07-11 11:08:19 -07:00
hkuang	c9b25dcae4	Add neon optimize vp9_dc_only_idct_add. Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423	2013-07-11 10:30:47 -07:00
Ronald S. Bultje	decead7336	Replace copy_memNxM functions with a generic copy/avg function. Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa	2013-07-10 18:27:24 -07:00
Ronald S. Bultje	3f210f10eb	Remove unused iwalsh4x4 MMX/SSE2 functions. Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652	2013-07-10 14:52:47 -07:00
Ronald S. Bultje	48c53233fd	Remove unused 16x3/3x16 sad SSE2 functions. Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4	2013-07-10 14:52:47 -07:00
Ronald S. Bultje	e6f955251f	Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction."	2013-07-10 14:52:23 -07:00
Ronald S. Bultje	89810bfd71	Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction."	2013-07-10 10:13:16 -07:00

174 Commits