generic-library/vpx

Author	SHA1	Message	Date
Dmitry Kovalev	db60c02c9e	Merge "Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10."	2013-09-27 13:08:52 -07:00
Dmitry Kovalev	15a36a0a0d	Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10. Making function name consistent with vp9_short_idct16x16 and vp9_short_idct16x16_1. Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517	2013-09-26 14:01:25 -07:00
Scott LaVarnway	208658490c	d63 intra prediction ssse3 using bytes byte version of ronalds d63 ssse3 optimizations (commit: c5a1c8cf3541cf3665fee981b36d22c9fbd4191e) Change-Id: Ifd3e6d454a2246085f23eabb38518a930321e807	2013-09-25 16:16:44 -04:00
Yunqing Wang	9d901217c6	Fix x86inc.asm to build PIC code correctly Current x86inc.asm didn't handle 32bit PIC build properly. TEXTRELs were seen in the library built. The PIC macros from libvpx's x86_abi_support.asm was used to fix this problem. The assembly code was modified to use the macros. Notes: We need this fix in for decoder building. Functions in encoder will be fixed later. Change-Id: Ifa548d37b1d0bc7d0528db75009cc18cd5eb1838	2013-09-18 13:45:46 -07:00
James Zern	2d58761993	Revert "Improved 8t filters" This is incompatible with most toolchains other than gcc. Revert "Deleted #include <inttypes.h>" This reverts commit `4d018be950`. This reverts commit `d22a504d11`. Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf	2013-09-13 15:13:06 -07:00
Paul Wilkins	4d018be950	Deleted #include <inttypes.h> This seems not to be needed and is not supported in the Windows build. Change-Id: Iaca3bbf8cca283aee6bc336cb31ba9dd4610322b	2013-09-12 13:43:07 +01:00
Scott LaVarnway	d22a504d11	Improved 8t filters Reformatted version of a patch submitted by Erik/Tamar from Intel. For the test clips used, the decoder performance improved by ~2%. Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b	2013-09-11 13:56:32 -04:00
Scott LaVarnway	22dc946a7e	Improved mb_lpf_horizontal_edge_w_sse2_8 This patch is a reformatted version of optimizations done by engineers at Intel (Erik/Tamar) who have been providing performance feedback for VP9. For the test clips used (720p, 1080p), up to 1.2% performance improvement was seen. Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2	2013-08-29 08:30:17 -04:00
Jingning Han	9d67495f72	Optimize 32x32 2D inverse DCT for speed-up This commit exploits the sparsity of quantized coefficient matrix. It detects each 32x8 array and skips the corresponding inverse transformation if all entries are zero. For ped1080p at 8000 kbps, this on average reduces the runtime of 32x32 inverse 2D-DCT SSE2 function from 6256 cycles -> 5200 cycles. It makes the overall encoding process about 2% faster at speed 0. The speed-up is more pronounced for the decoding process. Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf	2013-07-31 17:13:31 -07:00
Jingning Han	a7c4de22e1	16x16 inverse 2D-DCT with DC only This commit provides special handle on 16x16 inverse 2D-DCT, where only DC coefficient is quantized to be non-zero value. Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c	2013-07-29 14:45:53 -07:00
Ronald S. Bultje	6f3054b65d	Merge "d45 intra prediction SSSE3 optimizations."	2013-07-26 17:21:09 -07:00
Jingning Han	325e0aa650	Special handle on DC only inverse 8x8 2D-DCT This commit enables a special handle for the 8x8 inverse 2D-DCT, where only DC coefficient is quantized to be non-zero. For bus_cif at 2000 kbps, it provides about 1% speed-up at speed 0. Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011	2013-07-26 14:16:51 -07:00
Ronald S. Bultje	94b0c6791d	d45 intra prediction SSSE3 optimizations. Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb	2013-07-26 13:30:02 -07:00
Jingning Han	384e37e32b	SSE2 inverse 4x4 2D-DCT with DC only Add SSE2 implementation to handle the special case of inverse 2D-DCT where only DC coefficient is non-zero. Change-Id: I2c6a59e21e5e77b8cf39a4af5eecf4d5ade32e2f	2013-07-24 23:19:56 -07:00
Jingning Han	d2de1ca37b	Merge vp9_dc_only_idct_add and vp9_short_idct4x4_1 They share the same functionality, so merging together. Change-Id: I98a0386fcee052cb854f9ff90c283c1b844bcb79	2013-07-24 16:51:15 -07:00
James Zern	98e132bde0	Merge changes I40454d26,I892e76d5,I865ab3f9,I4a4bec17,I61c4351e,I37eb3559,I1031c556,I8c8f1f42 * changes: delete vp9_loopfilter_sse2.asm vp9_loopfilter_intrin_sse2: cosmetics: fix indent delete x86/vp9_loopfilter_x86.h vp9_loopfilter_intrin_sse2: make some funcs static vp9_loopfilter_intrin_sse2: remove unused uv funcs vp9_loopfilter: remove uv function typedef filter_block_plane: reuse some constants vp9_loopfilter.c: make some functions static	2013-07-16 14:25:32 -07:00
James Zern	50015f6eba	delete vp9_loopfilter_sse2.asm sse2 functions are provided by vp9_loopfilter_intrin_sse2.c Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b	2013-07-16 13:09:16 -07:00
James Zern	8f4787a383	vp9_loopfilter_intrin_sse2: cosmetics: fix indent Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb	2013-07-16 13:09:16 -07:00
James Zern	af58254267	delete x86/vp9_loopfilter_x86.h also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405	2013-07-16 13:09:05 -07:00
Jingning Han	d05f66aa10	SSE2 16x16 inverse ADST/DCT hybrid transform This commit enables SSE2 implementation of 16x16 inverse ADST/DCT hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles. This provides about 1% encoding speed-up at speed 0. Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b	2013-07-16 12:51:42 -07:00
James Zern	04606d7258	vp9_loopfilter_intrin_sse2: make some funcs static + drop 'vp9_' Change-Id: I4a4bec175316aab8f65c3a23bacc8362399a1357	2013-07-13 18:48:00 -07:00
James Zern	dc968d3d45	vp9_loopfilter_intrin_sse2: remove unused uv funcs vp9_mbloop_filter_horizontal_edge_sse2 / vp9_mbloop_filter_vertical_edge_uv_sse2 Change-Id: I61c4351ef0cce79fa4156a47ddace781f1566869	2013-07-13 18:44:32 -07:00
James Zern	bd6b79c44d	vp9_loopfilter: remove uv function typedef loop_filter_uvfunction is unused Change-Id: I37eb3559e9eb2808f1f29dfea429441c94c9df2a	2013-07-13 18:38:28 -07:00
Jingning Han	91365addf8	SSE2 8x8 inverse ADST/DCT transform This commit enables SSE2 implementation of 8x8 inverse ADST/DCT transform. The runtime goes from 1216 cycles -> 266 cycles. For bus_cif at 2000 kbps, the overall runtime reduces from 253707ms -> 248430ms, i.e., 2% speed-up at speed 0. Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb	2013-07-12 21:03:16 -07:00
Jingning Han	dac5891a1a	Merge "SSE2 4x4 invserse ADST/DCT transform"	2013-07-11 14:17:23 -07:00
Johann	158c80cbb0	convolve8 optimizations for neon Independent horizontal and vertical implementations. Requires that blocks be built from 4x4 and [xy]_step_q4 == 16 6-10% improvement. CIF improved the least. Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda	2013-07-11 11:08:19 -07:00
Jim Bankoski	5000cdf0ff	Merge "Wide loopfilter 16 pix at a time"	2013-07-11 06:44:02 -07:00
Jingning Han	49b6302044	SSE2 4x4 invserse ADST/DCT transform Enable SSE2 4x4 inverse ADST/DCT transform. The runtime goes from 292 cycles down to 89 cycles. Running bus_cif at 2000 kbps, the overall runtime of speed 0 goes from 301s to 295s (2% speed-up). Change-Id: I24098136e7fee7ab2fbf1c11755bdf2ca37f3628	2013-07-10 20:16:02 -07:00
Ronald S. Bultje	decead7336	Replace copy_memNxM functions with a generic copy/avg function. Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa	2013-07-10 18:27:24 -07:00
John Koleszar	64f7a4d8cb	Wide loopfilter 16 pix at a time Where possible, do the 16 pixel wide filter while doing the horizontal filtering pass. The same approach can be taken for the mbloop_filter when that's implemented. Doing so on the vertical pass is a little more involved, but possible. Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320	2013-07-10 16:32:44 -07:00
Ronald S. Bultje	3f210f10eb	Remove unused iwalsh4x4 MMX/SSE2 functions. Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652	2013-07-10 14:52:47 -07:00
Ronald S. Bultje	48c53233fd	Remove unused 16x3/3x16 sad SSE2 functions. Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4	2013-07-10 14:52:47 -07:00
Ronald S. Bultje	e6f955251f	Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction."	2013-07-10 14:52:23 -07:00
Ronald S. Bultje	6a60249071	Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction."	2013-07-10 14:52:19 -07:00
Ronald S. Bultje	44b29a769c	Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction."	2013-07-10 10:24:16 -07:00
Ronald S. Bultje	89810bfd71	Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction."	2013-07-10 10:13:16 -07:00
Ronald S. Bultje	7fd643264a	SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction. Change-Id: Iad70966b986f65259329070e258f76ef0af816b4	2013-07-10 09:28:03 -07:00
Ronald S. Bultje	8dade638a1	SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction. Change-Id: I3441c059214c2956e8261331bbf521525a617a86	2013-07-10 09:28:03 -07:00
Ronald S. Bultje	75b33c68c7	SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction. Change-Id: I55a6cfa2daba738cbc0c4a02f806893f7e556997	2013-07-10 09:28:03 -07:00
Ronald S. Bultje	92c5d3665d	SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction. Change-Id: Ibe1690afc5459f3b3beca401e7734fcd03da6dd0	2013-07-10 09:28:03 -07:00
Dmitry Kovalev	aeed28f143	Removing vp9_maskingmv.c and corresponding assembly file. Change-Id: I9842d02d61d78d17dc3449bae8ffbe60f4b3ecb3	2013-07-09 11:22:56 -07:00
Ronald S. Bultje	8350e7fe38	Make intra prediction pointers RTCD-based. This probably has a mildly negative impact on performance, but will (in future commits - or possibly merged with this one) allow SIMD implementations of individual intra prediction functions. We may perhaps want to consider having separate functions per txfm-size also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for each intra prediction mode), but I haven't played much with that yet. Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269	2013-07-08 17:25:51 -07:00
Ronald S. Bultje	d9fc451666	Move subpixel variance function from common/ to encoder/. This seems to only be used in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't actually define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b	2013-06-17 16:54:09 -07:00
Scott LaVarnway	a81bd12a2e	Quick modifications to mb loopfilter intrinsic functions Modified to work with 8x8 blocks of memory. Will revisit later for further optimizations. For the HD clip used, the decoder improved by almost 20%. Change-Id: Iaa4785be293a32a42e8db07141bd699f504b8c67	2013-06-12 19:23:03 -04:00
Yaowu Xu	d682243012	Merge "Quick modifications to wide loopfilter intrinsic functions"	2013-06-12 15:16:11 -07:00
Scott LaVarnway	26496c52bf	Quick modifications to wide loopfilter intrinsic functions Modified to work with 8x8 blocks of memory. Will revisit later for further optimizations. For the HD clip used, the decoder improved by 20%. Change-Id: Ia0057f55d66d1445882351ea6c43b595a5a980e5	2013-06-12 16:49:08 -04:00
John Koleszar	8933a652fc	Remove some unused loopfilter code This code is unreachable, and not useful for later reference. Change-Id: I4c9a9e0fbf859c1081bbcfbcda9710afb4b4741f	2013-06-12 11:36:00 -07:00
Scott LaVarnway	a143152600	Removed unused idct functions No longer used. Change-Id: Id28c9247cebba183c6fa786dff96824ae100132c	2013-05-21 17:59:54 -04:00
Scott LaVarnway	0c3f3bf1d5	Removed vp9_recon functions No longer used. Change-Id: Ica5166f7117f4693dffdf7633dcfc1b263103d0d	2013-05-21 13:57:50 -04:00
Scott LaVarnway	ba48a11130	WIP: 4x4 idct/recon merge This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22	2013-05-20 13:03:17 -04:00

122 Commits