ffmpeg

Author	SHA1	Message	Date
James Almer	4ac41a52e2	x86/huffyuvdsp: fix some prototypes Remove duplicate prototypes and fix int -> intptr_t in another Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-31 00:29:00 +02:00
Christophe Gisquet	d136fe6fd7	x86: huffyuvdsp: fewer functions for x86_64 When there are 2 functions that are <= SSE2, only one is needed for x86_64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 21:39:06 +02:00
Timothy Gu	154cee9292	x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm Signed-off-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 16:57:52 +02:00
Timothy Gu	0b6292b7b8	x86: dsputilenc: move all the function prototypes together Signed-off-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 16:18:10 +02:00
Christophe Gisquet	f743fa9c7f	x86: huffyuvdsp: add_hfyu_left_pred_bgr32 C MMX SSE2 Cycles: 3092 1053 578 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 15:20:36 +02:00
Michael Niedermayer	7be79c76d3	avcodec/huffyuvdsp: Change w to intptr in add_hfyu_median_pred() and add_hfyu_left_pred() This avoids potential issues with the high 32bits being random in x86-64 asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 15:12:58 +02:00
Christophe Gisquet	884078d2df	x86: huffyuvdsp: add SSE2 median prediction From 5010c to 4566 on lagarith YUY2. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 14:57:57 +02:00
Michael Niedermayer	8c891d90ca	avcodec/x86/qpeldsp_init: Restore author attribution See: `368f50359e` See: `44eb495128`, and many others See: similarity index 83% copy from libavcodec/x86/dsputil_init.c copy to libavcodec/x86/qpeldsp_init.c index ebbf97f..8f296a1 100644 --- a/libavcodec/x86/dsputil_init.c +++ b/libavcodec/x86/qpeldsp_init.c @@ -1,6 +1,5 @@ /* - * Copyright (c) 2000, 2001 Fabrice Bellard - * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at> + * quarterpel DSP functions * * This file is part of FFmpeg. * Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 04:05:29 +02:00
Michael Niedermayer	98a6806fdd	Merge commit '368f50359eb328b0b9d67451f56fda20b3255f9a' * commit '368f50359eb328b0b9d67451f56fda20b3255f9a': dsputil: Split off quarterpel bits into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/h263dec.c libavcodec/mpegvideo.c libavcodec/mpegvideo_enc.c libavcodec/vc1dec.c libavcodec/vc1dsp.c libavcodec/x86/dsputil_init.c libavcodec/x86/qpeldsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 02:43:34 +02:00
Michael Niedermayer	40f3a87c10	Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3' * commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3': dsputil: Move APE-specific bits into apedsp Conflicts: libavcodec/arm/int_neon.S libavcodec/x86/dsputil.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 00:59:15 +02:00
Michael Niedermayer	c814a6c778	avcodec/x86/svq1enc_mmx: Add author attribution See: `5900637219` Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 00:30:05 +02:00
Michael Niedermayer	ea0931fb96	Merge commit '65d5d5865845f057cc6530a8d0f34db952d9009c' * commit '65d5d5865845f057cc6530a8d0f34db952d9009c': dsputil: Move SVQ1 encoding specific bits into svq1enc Conflicts: libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 00:01:45 +02:00
James Almer	02a3e327f1	x86/dsputilenc: add missing guards to ff_pix_sum16_xop XOP support was added in Yasm 1.0.0 and Nasm 2.06, and we still support older versions. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 22:31:28 +02:00
Christophe Gisquet	99a319c4e7	x86: huffyuvdsp: port add_bytes to yasm C MMX SSE2 Cycles: 2972 587 302 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 21:56:00 +02:00
Christophe Gisquet	2267003981	x86: hpeldsp: better factorization Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 21:47:40 +02:00
Michael Niedermayer	7b4c46050e	rename add_hfyu_left_prediction_int16 to add_hfyu_left_pred_int16 This makes the naming more consistent with the 8bit variant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 19:50:44 +02:00
Michael Niedermayer	550ae6c02f	rename add_hfyu_median_prediction_int16 to add_hfyu_median_pred_int16 This makes the naming more consistent with the 8bit variant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 19:49:29 +02:00
Michael Niedermayer	40a4ab8ba4	rename sub_hfyu_median_prediction_int16 to sub_hfyu_median_pred_int16 This makes the naming more consistent with the 8bit variant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 19:48:23 +02:00
James Almer	05de4d3011	x86/dsputilenc: implement XOP version of pix_sum16 SSE2: 137 cycles XOP: 87 cycles Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 18:40:23 +02:00
Diego Biurrun	368f50359e	dsputil: Split off quarterpel bits into their own context	2014-05-29 06:48:31 -07:00
Diego Biurrun	054013a0fc	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
Diego Biurrun	65d5d58658	dsputil: Move SVQ1 encoding specific bits into svq1enc	2014-05-29 06:41:15 -07:00
Michael Niedermayer	b50559fc0b	libavcodec/x86/dsputilenc: drop and 0xffff that should have become redundant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 00:16:52 +02:00
James Almer	561bfc85eb	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 23:29:34 +02:00
Christophe Gisquet	0810608e23	x86: hevc_mc: better register allocation The xmm reg count was incorrect, and manually loading the gprs furthermore noticeably reduces the number needed. The modified functions are used in weighted prediction, so only a few samples like WP_* exhibit a change. For this one and Win64 (some widths removed because of too few occurrences): WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w 16 32 before: 2194 3872 after: 2119 3767 WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w 16 32 64 before: 2819 4960 9396 after: 2617 4788 9150 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 17:39:34 +02:00
Michael Niedermayer	48a6916308	Merge commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c' * commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c': dsputil: Split off HuffYUV encoding bits into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/dsputil.h libavcodec/huffyuv.h libavcodec/huffyuvenc.c libavcodec/pngenc.c libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 00:03:59 +02:00
Michael Niedermayer	e2abc0d5ca	Merge commit '0d439fbede03854eac8a978cccf21a3425a3c82d' * commit '0d439fbede03854eac8a978cccf21a3425a3c82d': dsputil: Split off HuffYUV decoding bits into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/dsputil.h libavcodec/huffyuv.h libavcodec/huffyuvdec.c libavcodec/lagarith.c libavcodec/vble.c libavcodec/x86/Makefile libavcodec/x86/dsputil.asm libavcodec/x86/dsputil_init.c libavcodec/x86/dsputil_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 23:16:06 +02:00
Diego Biurrun	512f3ffe9b	dsputil: Split off HuffYUV encoding bits into their own context Also shorten HuffYUV context member names to avoid clutter.	2014-05-27 08:54:53 -07:00
Diego Biurrun	0d439fbede	dsputil: Split off HuffYUV decoding bits into their own context Also shorten HuffYUV context member names to avoid clutter.	2014-05-27 08:52:34 -07:00
James Almer	5863207086	x86/dsputilenc: use HADDD in ff_sse16_sse2 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 15:12:50 +02:00
James Almer	e64e079ece	x86/dsputilenc: implement SSE2 version of diff_pixels Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 05:55:11 +02:00
Michael Niedermayer	a0c5cd3475	avcodec/x86/dsputilenc: set the count of SSE registers correctly for get_pixels Found-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 05:52:25 +02:00
Christophe Gisquet	86ae0da60c	x86: hpeldsp: propagate changes across codecs Some codecs still use mmx versions, so have them use the versions with newer instruction sets. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-26 15:37:04 +02:00
Michael Niedermayer	a3950a90f6	Revert "x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm" This reverts commit `ad733089b0`. breaks with --disable-yasm revert requested by: Christophe Gisquet <christophe.gisquet@gmail.com>	2014-05-25 19:42:18 +02:00
Timothy Gu	ad733089b0	x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm Signed-off-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-25 16:30:08 +02:00
James Almer	d94e255dd1	x86/dsputilenc: make the SUM_ABS_DCTELEM macro more readable Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-25 02:03:54 +02:00
James Almer	61eea421b2	x86/dsputilenc: port sum_abs_dctelem functions to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 21:46:25 +02:00
Christophe Gisquet	81aa0f4604	x86: hpeldsp: implement SSSE3 version of _xy2 Loading pb_1 rather than pw_8192 was benchmarked to be more efficient. Loading of the 2 yields no advantage. Loading of one saves ~11 cycles. decicycles count: put8: 3223(mmx) -> 2387 avg8: 2863(mmxext) -> 2125 put16: 4356(sse2) -> 3553 avg16: 4481(sse2) -> 3513 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 15:15:56 +02:00
Christophe Gisquet	9722a6a3f3	x86: hpeldsp: implement SSE2 put_pixels16_xy2 This is obviously equivalent to the avg version, without the avg. 3223(mmx) -> 2006(sse2) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 03:45:17 +02:00
Christophe Gisquet	f0aca50e0b	x86: hpeldsp: implement SSE2 versions Those are mostly used in codecs older than H.264, eg MPEG-2. put16 versions: mmx mmx2 sse2 x2: 1888 1185 552 y2: 1778 1092 510 avg16 xy2: 3509(mmx2) -> 2169(sse2) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 03:29:48 +02:00
James Almer	7538ad2248	x86/hevc_deblock: improve chroma functions register allocation Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 01:16:26 +02:00
James Almer	584327f22f	x86/dsputil: fix argument declaration in vector_clipf Should fix fate failures in msvc x86_64 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 23:10:17 +02:00
James Almer	518cbf9b4a	x86/dsputil: fix VECTOR_CLIP_INT32 macro The inline loop was incrementing and using the value of %%i the wrong way. Disassembly of ff_vector_clip_int32_sse2 before and after this patch: movdqa (%rdx),%xmm0 | movdqa (%rdx),%xmm0 movdqa 0x10(%rdx),%xmm1 | movdqa 0x10(%rdx),%xmm1 movdqa 0x20(%rdx),%xmm2 | movdqa 0x20(%rdx),%xmm2 movdqa 0x30(%rdx),%xmm3 | movdqa 0x30(%rdx),%xmm3 [...] | movdqa %xmm0,(%rcx) | movdqa %xmm0,(%rcx) movdqa %xmm1,0x10(%rcx) | movdqa %xmm1,0x10(%rcx) movdqa %xmm2,0x20(%rcx) | movdqa %xmm2,0x20(%rcx) movdqa %xmm3,0x30(%rcx) | movdqa %xmm3,0x30(%rcx) movdqa (%rdx),%xmm0 | movdqa 0x40(%rdx),%xmm0 movdqa 0x20(%rdx),%xmm1 | movdqa 0x50(%rdx),%xmm1 movdqa 0x40(%rdx),%xmm2 | movdqa 0x60(%rdx),%xmm2 movdqa 0x60(%rdx),%xmm3 | movdqa 0x70(%rdx),%xmm3 [...] | movdqa %xmm0,(%rcx) | movdqa %xmm0,0x40(%rcx) movdqa %xmm1,0x20(%rcx) | movdqa %xmm1,0x50(%rcx) movdqa %xmm2,0x40(%rcx) | movdqa %xmm2,0x60(%rcx) movdqa %xmm3,0x60(%rcx) | movdqa %xmm3,0x70(%rcx) add $0x80,%rdx | add $0x80,%rdx add $0x80,%rcx | add $0x80,%rcx Other versions were unaffected. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 22:59:55 +02:00
James Almer	6a4832caae	x86/diracdsp: mark all functions as yasm No inline asm dirac code remains in the tree, so replace every relevant check. This also moves all the dirac functions from dsputil_mmx.c to diracdsp_mmx.c Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 15:02:42 +02:00
James Almer	1d36defe94	x86/dsputil: port ff_vector_clipf_sse to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 00:08:21 +02:00
Christophe Gisquet	c081ca851c	x86: hpeldsp: avg_pixels_xy2 for mmx2&3dnow This is a port of the inline assembly of the mmx version to use the pavg(us|)b instruction. 8 16 mmx 1498 4355 mmx2 1242 3509 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-22 20:17:49 +02:00
Christophe Gisquet	17ac998055	x86: hpeldsp: mark _xy2 versions as approximate Currently, only the mmx version is bitexact, the others (mmxext and 3dnow) are not, in spite of their naming. Therefore, make their name more obvious. Also restore a comment that was removed in `71155d7b`. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-22 20:17:45 +02:00
Christophe Gisquet	f8de35ebc4	x86: hpeldsp: kill hpeldsp_mmx.c before: 1987 decicycles in 8_x2, 262121 runs, 23 skips after: 1902 decicycles in 8_x2, 262112 runs, 32 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-22 20:17:40 +02:00
James Almer	80ee2dfcf6	x86/dsputil: port ff_put_signed_pixels_clamped_mmx to yasm Also add an SSE2 version Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 23:33:45 +02:00
James Almer	7b05267239	x86/dsputil: port clear_block functions to yasm Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 23:33:45 +02:00
Michael Niedermayer	3d4e365073	avcodec/x86/hpeldsp_init: remove redundant if() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 13:38:27 +02:00
Hendrik Leppkes	cd9e08e110	hpeldsp: fix build without inline asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 13:37:38 +02:00
Christophe Gisquet	d1a32c3f49	x86: kill fpel_mmx.c Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 03:25:08 +02:00
James Almer	d43c303038	x86/hevc_deblock: use constants instead of generating values at runtime Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-19 23:09:33 +02:00
James Almer	057ebf1222	x86/hevc_deblock: remove some duplicated instructions Also remove a couple unnecessary cmps Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-18 23:28:17 +02:00
Christophe Gisquet	f1793fe9cd	x86: hevc_mc: specify coefficients registers By default, macro EPEL_FILTER loads the coefficients unconditionally into m14/m15. This forces an unneeded higher register count. Reduce that count by making them parameters of EPEL_FILTER. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-18 16:23:58 +02:00
Carl Eugen Hoyos	ef2713747f	Fix compilation of libavcodec/x86/hevc_deblock.asm with nasm. Suggested-by: Reimar	2014-05-17 12:50:55 +02:00
James Almer	be1fbc02b8	x86/hevc_deblock: use movhps instead of shuffling values Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-17 05:40:14 +02:00
James Almer	8aac77fede	x86/hevc_deblock: fix label names Also remove some unnecessary jmps Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-17 05:40:08 +02:00
James Almer	521eaea63a	x86/hevc_deblock: fix usage of ABS1 The second argument is a temp register for non-SSSE3 cases Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-17 05:39:55 +02:00
James Almer	45110d2290	x86/hevc_deblock: merge movs with other instructions Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-17 05:39:34 +02:00
plepere	ef7c4cd001	avcodec/x86/hevc: updated to use x86util macros Reviewed-by: James Almer <jamrial@gmail.com> Reviewed-by: Ronald S. Bultje Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-16 21:11:07 +02:00
plepere	de7b89fd43	avcodec/x86/hevc: added DBF assembly functions Reviewed-by: James Almer <jamrial@gmail.com> Reviewed-by: Ronald S. Bultje Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-16 21:11:03 +02:00
Michael Niedermayer	bebce653e5	avcodec/x86/dsputil_mmx: Fix build with clang-usan Found-by: Katerina Barone-Adesi Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-15 23:56:39 +02:00
Christophe Gisquet	d1310c591e	x86: sbrdsp: implement SSE qmf_deint_neg From 133 (unrolled av_intfloat32 C) to 59 cycles on Arrandale/Win64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-15 23:11:18 +02:00
Hendrik Leppkes	87f2d8079a	hevcdsp: correctly indicate that hevc_put_hevc_bi_epel_h uses 9 GPRs Fixes FATE on Windows. Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-12 17:00:48 +02:00
James Almer	8e07800001	hevcdsp: include stddef.h for ptrdiff_t definition Including stdint.h was enough for systems like Mingw, but apparently not for Linux. This should fix make checkheaders failures on every platform Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-10 18:23:30 +02:00
James Almer	fa23190a7a	hevcdsp: add missing header include Fixes make checkheaders Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-10 14:55:03 +02:00
Michael Niedermayer	341cacb9ac	avcodec/x86/hevcdsp_init: fix build failure with --disable-mmx Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-09 05:16:27 +02:00
plepere	63832e01c3	avcodec/x86/hevcdsp: make macros more modular to support functions that are not sse4 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-09 00:14:50 +02:00
Matt Oliver	1898c2f49d	inline asm: fix arrays as named constraints. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-07 15:02:45 +02:00
Michael Niedermayer	fc7d0d8201	avcodec/x86/hevcdsp_init: fix SSE4 checks Found-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-06 18:27:49 +02:00
Michael Niedermayer	7be230b5fa	avcodec/x86/Makefile: remove duplicate line Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-06 18:23:42 +02:00
Michael Niedermayer	3b3db02f2e	avcodec/x86/hevcdsp_init: fix build on 32bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-06 18:23:42 +02:00
plepere	7a2491c436	HEVC : added assembly MC functions pretty print x86 Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-06 18:23:36 +02:00
Matt Oliver	ac9869ffb0	x86/mpegaudiodsp.c: msvc compilation error without sse/avx_external Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-06 14:15:02 +02:00
Michael Niedermayer	ebf2c2c3a8	avcodec/lossless_videodsp: fix incompatible pointer type warning Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-05 05:49:18 +02:00
Matt Oliver	3c3e02b8d1	x86/cavdsp: prevent named constraints appearing twice.	2014-05-03 17:47:55 +02:00
James Almer	5ac10d40fb	x86/mpegaudiodsp: define apply_window_mp3 as SSE None of the handwritten asm in this function seems to be SSE2 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-25 00:38:01 +02:00
Hendrik Leppkes	5809c2a99d	vc1dsp: fix build without inline asm Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-22 14:01:53 +02:00
Clément Bœsch	62d31307c1	avcodec/x86/vp9lpf: add a comment above a bunch of SWAP.	2014-04-20 21:33:58 +02:00
Clément Bœsch	f0d368d758	avcodec/x86/vp9lpf: merge a few movs with other instructions.	2014-04-20 21:29:11 +02:00
Christophe Gisquet	319235c67c	vc1dsp: introduce cases for 8x8 and 16x16 This allows further unrolling the DSP implementation where possible. x86 and ARM DSP modified by simply moving the multiple calls from vc1dec to the DSP code. Decoding improvements should only occur because the compiler is actually able to unroll more. Decoding time: ~8.80s -> 8.64s (i.e. around 2%) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-20 18:25:36 +02:00
Clément Bœsch	010732b73a	vp9/x86: simplify FILTER_INIT. In the 2 FILTER_INIT usages, the source is already preloaded so that extra complexity taken from FILTER_UPDATE is not necessary. Also add forgotten "mask" argument in FILTER_{INIT,UPDATE} comments.	2014-04-19 17:30:33 +02:00
Clément Bœsch	b8d002dc95	vp9/x86: clarify mixed splatb.	2014-04-19 17:00:51 +02:00
Carl Eugen Hoyos	b38910c979	Fix compilation with !HAVE_6REGS. Can be tested with: $ ./configure --cc='cc -m32' --disable-optimizations --enable-pic	2014-04-19 09:56:01 +02:00
Carl Eugen Hoyos	72c93abaad	Use MANGLE in cavsdsp.c to save two registers using gcc. Fixes compilation with !HAVE_6REGS.	2014-04-19 09:54:26 +02:00
James Almer	197fe392db	x86/dsputil: use HADDD where applicable Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-17 14:15:35 +02:00
James Almer	76ed71a72b	x86: move horizontal add macros to x86util Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-17 14:15:09 +02:00
Michael Niedermayer	46d5625f44	avcodec/x86/idct_sse2_xvid: fix non C99 inline function Found-by: Matt Oliver <protogonoi@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-14 18:04:57 +02:00
James Almer	0f524b6c69	x86/synth_filter: remove the fma3 version ifdefs This fixes compilation failures with --disable-fma3 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-13 11:29:28 +02:00
Timothy Gu	71c32ed533	DNxHD: convert inline asm to yasm	2014-04-11 12:09:09 +02:00
Timothy Gu	676856204b	DNxHD: make get_pixel_8x4_sym accept ptrdiff_t as stride	2014-04-11 12:09:09 +02:00
Matt Oliver	d1e6e5c887	avcodec/x86: Exclude broken get_cabac under icl. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-10 17:47:22 +02:00
Matt Oliver	158a80cc0b	Remove leal op to fix icl inline asm. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-07 13:02:54 +02:00
Hendrik Leppkes	fc7e02f0ff	dcadsp: fix SSE code to not use SSE2 instructions. movq from SSE register to memory is an SSE2 instruction. Instead, use SSE movlps, which does the same thing. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-06 18:31:22 +02:00
Michael Niedermayer	e6f69b324e	Merge commit '57b5b84e208ad61ffdd74ad849bed212deb92bc5' * commit '57b5b84e208ad61ffdd74ad849bed212deb92bc5': x86: dsputil: Move ff_apply_window_int16_* bits to ac3dsp, where they belong Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 19:36:21 +02:00
Michael Niedermayer	e3c3f277a9	Merge commit 'c2c5be57494e6117086771bca34c8cd4c72c8e99' * commit 'c2c5be57494e6117086771bca34c8cd4c72c8e99': x86: h264_qpel: Simplify an #if conditional Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 19:30:44 +02:00
Michael Niedermayer	ebb21887b8	Merge commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8' * commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8': x86: Drop some unnecessary YASM ifdefs Conflicts: libavfilter/x86/vf_yadif_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 19:16:39 +02:00
Michael Niedermayer	874f27a8f7	Merge commit 'b42f49e42f8cde25a788b2d13d03e99ca2956647' * commit 'b42f49e42f8cde25a788b2d13d03e99ca2956647': x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 19:05:00 +02:00
Michael Niedermayer	5440151fa4	Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d' * commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d': Remove a number of unnecessary dsputil.h #includes Conflicts: libavcodec/h264pred.c libavcodec/vc1dsp.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 18:54:15 +02:00
James Almer	a1ac12bddd	x86/dcadsp: add ff_dca_lfe_fir0_fma3 ~10% faster than the SSE version. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 13:55:59 +02:00
James Almer	7d2116dd09	x86/synth_filter: compile avx and fma3 functions unconditionally Fixes compilation failures with "--disable-{avx,fma3} --disable-optimizations" Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 05:15:27 +02:00
Michael Niedermayer	490d53e335	avcodec/x86/dcadsp_init: fix compilation failure without FMA3 alternatively the call could be put under #if or the #if over the function removed Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-05 00:11:48 +02:00
Michael Niedermayer	51fd962c0b	Merge commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3' * commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3': x86/synth_filter: add synth_filter_fma3 x86/synth_filter: add synth_filter_avx x86/synth_filter: add synth_filter_sse Conflicts: libavcodec/x86/dcadsp.asm libavcodec/x86/dcadsp_init.c See: `6467209836` See: `68c3ed936a` See: `7fd64e3e36` See: `aa1f38015c` See: `dfd865e51b` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-04 23:40:08 +02:00
Christophe Gisquet	dfd865e51b	x86/synth_filter: remove the main loop when it's not needed Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-04 22:35:45 +02:00
Diego Biurrun	57b5b84e20	x86: dsputil: Move ff_apply_window_int16_* bits to ac3dsp, where they belong	2014-04-04 19:08:05 +02:00
Diego Biurrun	c2c5be5749	x86: h264_qpel: Simplify an #if conditional The extra conditions are covered by previous #ifs and conditional compilation.	2014-04-04 19:08:05 +02:00
Diego Biurrun	01c5779f56	x86: Drop some unnecessary YASM ifdefs Dead code elimination is enough to avoid undefined references in these cases.	2014-04-04 19:08:05 +02:00
Diego Biurrun	b42f49e42f	x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes	2014-04-04 19:08:05 +02:00
Diego Biurrun	3dc6272bed	Remove a number of unnecessary dsputil.h #includes	2014-04-04 19:08:05 +02:00
James Almer	c74b86699c	x86/synth_filter: add synth_filter_fma3 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-04 17:40:51 +02:00
James Almer	81e02fae6e	x86/synth_filter: add synth_filter_avx Sandy Bridge Win64: 180 cycles in ff_synth_filter_inner_sse2 150 cycles in ff_synth_filter_inner_avx Also switch some instructions to a three operand format to avoid assembly errors with Yasm 1.1.0 or older. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-04 17:40:51 +02:00
James Almer	2025d8026f	x86/synth_filter: add synth_filter_sse Build only on x86_32 targets. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-04-04 17:40:51 +02:00
Michael Niedermayer	fb61ed1e9f	Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' * commit 'ac4b32df71bd932838043a4838b86d11e169707f': On2 VP7 decoder Conflicts: Changelog libavcodec/arm/h264pred_init_arm.c libavcodec/arm/vp8dsp.h libavcodec/arm/vp8dsp_init_arm.c libavcodec/arm/vp8dsp_init_armv6.c libavcodec/arm/vp8dsp_init_neon.c libavcodec/avcodec.h libavcodec/h264pred.c libavcodec/version.h libavcodec/vp8.c libavcodec/vp8.h libavcodec/vp8data.h libavcodec/vp8dsp.c libavcodec/vp8dsp.h libavcodec/x86/h264_intrapred_init.c libavcodec/x86/vp8dsp_init.c See: `89f2f5dbd7` and others Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-04 14:46:10 +02:00
Peter Ross	ac4b32df71	On2 VP7 decoder Further performance improvements and security fixes by Vittorio Giovara, Luca Barbato and Diego Biurrun. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-04-04 04:00:11 +02:00
Matt Oliver	0f2588d7e5	Use intel compliant CDQ instead of CLTD in inline asm. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-30 23:14:36 +02:00
Clément Bœsch	c4148a6668	x86/vp9mc: add vp9 namespace.	2014-03-29 18:13:15 +01:00
Timothy Gu	9d34dce05b	x86: convert DNxHDenc inline asm to yasm Signed-off-by: Timothy Gu <timothygu99@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-27 23:16:17 +01:00
Timothy Gu	cb11b9e89e	dnxhdenc: make get_pixel_8x4_sym accept ptrdiff_t as stride Signed-off-by: Timothy Gu <timothygu99@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-27 23:09:10 +01:00
Michael Niedermayer	4998a72b49	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: hpeldsp: Keep all rnd_template instantiations in hpeldsp_init Conflicts: libavcodec/x86/rnd_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-26 16:55:46 +01:00
Michael Niedermayer	0371eaebcd	Merge commit 'aba70bb5387f12dfa5e6cd8cb861c9c7e668151f' * commit 'aba70bb5387f12dfa5e6cd8cb861c9c7e668151f': Add missing headers to make template files compile (more) standalone Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-26 14:50:55 +01:00
Diego Biurrun	efc7290eb6	x86: hpeldsp: Keep all rnd_template instantiations in hpeldsp_init There is no point in having a separate file just for the instantiation that provides the public functions.	2014-03-26 04:31:27 -07:00
Diego Biurrun	aba70bb538	Add missing headers to make template files compile (more) standalone	2014-03-26 04:31:27 -07:00
Diego Biurrun	d0aabeab23	x86: h264_qpel: Fix typo in CALL_2X_PIXELS macro invocation This fixes FATE with mmxext CPUFLAGS set.	2014-03-26 12:00:01 +01:00
Peter Ross	a490970af2	libavcodec/*/vp8dsp_init: indent Signed-off-by: Peter Ross <pross@xvid.org> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-25 13:29:29 +01:00
Peter Ross	89f2f5dbd7	On2 VP7 decoder Signed-off-by: Peter Ross <pross@xvid.org> Reviewed-by: BBB previous patch reviewed by jason Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-25 13:29:05 +01:00
Michael Niedermayer	c25d2cd20b	avcodec/x86/mpegvideoenc_template: fix integer overflow Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-25 00:15:52 +01:00
Michael Niedermayer	c8246d3766	avcodec/x86/h264_qpel: Fix typo introduced by `322a1dda97` Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-23 15:04:53 +01:00
Michael Niedermayer	74fed968d1	Merge commit '82dd1026cfc1d72b04019185bea4c1c9621ace3f' * commit '82dd1026cfc1d72b04019185bea4c1c9621ace3f': x86: dsputil: Move hpeldsp-related declarations to a separate header Conflicts: libavcodec/x86/dsputil_x86.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-22 23:21:54 +01:00
Michael Niedermayer	9333bba6ed	Merge commit '6655c933a887a2d20707fff657b614aa1d86a25b' * commit '6655c933a887a2d20707fff657b614aa1d86a25b': x86: dsputil: Move fpel declarations to a separate header Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-22 23:08:22 +01:00
Michael Niedermayer	77bc342975	Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81' * commit '322a1dda973e802db7b57f2007fad3efcd5bab81': dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros Conflicts: libavcodec/arm/hpeldsp_init_arm.c libavcodec/x86/dsputil_x86.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-22 22:53:33 +01:00
Michael Niedermayer	d6d3cfb0aa	Merge commit '600b854ad8173995518bd917e7f86120b5505088' * commit '600b854ad8173995518bd917e7f86120b5505088': imgconvert: Move ff_deinterlace_line_*_mmx declarations out of dsputil Conflicts: libavcodec/imgconvert.c libavcodec/x86/dsputil_x86.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-22 22:17:54 +01:00
Michael Niedermayer	8fbc6e5911	Merge commit '1a8d0cf77ed2611e542ae98f341d4c43a04467bd' * commit '1a8d0cf77ed2611e542ae98f341d4c43a04467bd': x86: dsputil: Move inline assembly macros to a separate header Conflicts: libavcodec/x86/dsputil_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-22 22:11:27 +01:00
Diego Biurrun	82dd1026cf	x86: dsputil: Move hpeldsp-related declarations to a separate header	2014-03-22 06:17:29 -07:00
Diego Biurrun	6655c933a8	x86: dsputil: Move fpel declarations to a separate header	2014-03-22 06:17:29 -07:00
Diego Biurrun	322a1dda97	dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros	2014-03-22 06:17:29 -07:00
Diego Biurrun	600b854ad8	imgconvert: Move ff_deinterlace_line_*_mmx declarations out of dsputil	2014-03-22 06:17:29 -07:00
Diego Biurrun	1a8d0cf77e	x86: dsputil: Move inline assembly macros to a separate header	2014-03-22 06:17:29 -07:00
Matt Oliver	cd5cf395f6	Additional icl inline asm fix. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-22 14:07:03 +01:00
Michael Niedermayer	1cd107f637	avcodec/x86/snowdsp: add missing clobbers to inner_add_yblock_bw_8_obmc_16_bh_even_sse2() and inner_add_yblock_bw_16_obmc_32_sse2() Note, these functions are currently disabled Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-21 03:38:48 +01:00
Michael Niedermayer	e98bac82e5	Merge commit '82bb3048013201c0095d2853d4623633d912252f' * commit '82bb3048013201c0095d2853d4623633d912252f': dsputil: Use correct type in me_cmp_func function pointer Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-20 22:36:40 +01:00
Michael Niedermayer	011d83de48	Merge commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18' * commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18': build: Group general components separate from de/encoders in arch Makefiles Conflicts: libavcodec/arm/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-20 22:26:31 +01:00
Michael Niedermayer	ba85bfabf3	Merge commit '5169e688956be3378adb3b16a93962fe0048f1c9' * commit '5169e688956be3378adb3b16a93962fe0048f1c9': dsputil: Propagate bit depth information to all (sub)init functions Conflicts: libavcodec/arm/dsputil_init_arm.c libavcodec/arm/dsputil_init_armv5te.c libavcodec/arm/dsputil_init_armv6.c libavcodec/arm/dsputil_init_neon.c libavcodec/dsputil.c libavcodec/dsputil.h libavcodec/ppc/dsputil_ppc.c libavcodec/x86/dsputil_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-20 22:06:01 +01:00
Diego Biurrun	82bb304801	dsputil: Use correct type in me_cmp_func function pointer	2014-03-20 05:03:23 -07:00
Diego Biurrun	0e083d7e43	build: Group general components separate from de/encoders in arch Makefiles This is in line with how the top-level libavcodec Makefile is structured.	2014-03-20 05:03:23 -07:00
Diego Biurrun	5169e68895	dsputil: Propagate bit depth information to all (sub)init functions This avoids recalculating the value over and over again.	2014-03-20 05:03:23 -07:00
Carl Eugen Hoyos	57fdc74c34	Add one forgotten named inline asm operand in libavcodec/x86/motion_est.c.	2014-03-19 03:00:19 +01:00
Matt Oliver	8236747511	Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported. This is part of the patch-set for intel C inline asm on windows support Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-18 23:39:30 +01:00
Matt Oliver	b2d3a45598	avcodec/x86/mlpdsp: Only use asm when non-local inline asm labels are supported This is part of the patch-set for intel C inline asm on windows support Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-18 23:37:50 +01:00
James Almer	aa1f38015c	x86/synth_filter: improve FMA version Replace mulps+subps with fnmaddps, resulting in two less instructions inside the inner loops. About 1% faster FMA3 performance. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-17 21:04:15 +01:00
Matt Oliver	b73aae6fe9	avcodec/x86/idct_sse2_xvid: move offsets out of MANGLE() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-17 04:19:59 +01:00
Matt Oliver	9eb3f11c55	Add missing external declarations. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-17 00:48:09 +01:00
Matt Oliver	590805b7c3	Fixed 64bit conformance with movzbl. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-17 00:13:50 +01:00
Michael Niedermayer	5dd97d5809	Merge commit 'db3f61a04f1f66746660f921bb2780ddf1141f3b' * commit 'db3f61a04f1f66746660f921bb2780ddf1141f3b': x86: dsputil_init: Drop some unnecessary parentheses Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 01:25:57 +01:00
Michael Niedermayer	27cab16ce7	Merge commit '441b093915717afa7d24be34bdab2a4911b30a57' * commit '441b093915717afa7d24be34bdab2a4911b30a57': x86: dsputil_init: K&R formatting cosmetics Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 01:25:36 +01:00
Michael Niedermayer	236874a571	Merge commit '4cb4680c1087a2cd13d4b0c9167a2eb3147f99d8' * commit '4cb4680c1087a2cd13d4b0c9167a2eb3147f99d8': x86: dsputil_x86.h: K&R formatting cosmetics Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 01:25:19 +01:00
Michael Niedermayer	925ce6faf4	Merge commit 'f8bbebecfd7ea3dceb7c96f931beca33f80a3490' * commit 'f8bbebecfd7ea3dceb7c96f931beca33f80a3490': x86: motion_est: K&R formatting cosmetics Conflicts: libavcodec/x86/motion_est.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 01:20:43 +01:00
Michael Niedermayer	b7a5f5dc66	Merge commit 'a36947c167d7278b891453083b57dc56b7a7f5c5' * commit 'a36947c167d7278b891453083b57dc56b7a7f5c5': dsputilenc_mmx: K&R formatting cosmetics Conflicts: libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 01:09:57 +01:00
Michael Niedermayer	d926c4b240	Merge commit '38675229a879aa5258a8c71891fc8cbf74cf139f' * commit '38675229a879aa5258a8c71891fc8cbf74cf139f': dsputil_mmx: K&R formatting cosmetics Conflicts: libavcodec/x86/dsputil_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 01:01:37 +01:00
Michael Niedermayer	55f53f6c29	Merge commit '6a8b35dc88b4a1a452f192fbbf53ae7f59bc3f23' * commit '6a8b35dc88b4a1a452f192fbbf53ae7f59bc3f23': dsputilenc_mmx: Merge two assignment blocks with identical conditions Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 00:57:25 +01:00
Michael Niedermayer	4104eb44e6	Merge commit '55519926ef855c671d084ccc151056de9e3d3a77' * commit '55519926ef855c671d084ccc151056de9e3d3a77': x86: Make function prototype comments in assembly code consistent Conflicts: libavcodec/x86/sbrdsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 00:01:30 +01:00
Michael Niedermayer	a9b1936a4e	Merge commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0' * commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0': x86: h264_idct_10_bit: Use proper type in function prototype comments Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-14 00:00:16 +01:00
Michael Niedermayer	1c788eaca9	Merge commit '831a1180785a786272cdcefb71566a770bfb879e' * commit '831a1180785a786272cdcefb71566a770bfb879e': Update dsputil- and SIMD-related comments to match reality more closely Conflicts: libavcodec/x86/hpeldsp.asm libavutil/arm/float_dsp_init_arm.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-13 23:59:56 +01:00
Michael Niedermayer	d61e1156be	Merge commit '17608f6ee3d2088cdb8d1e704276d8b34f01160d' * commit '17608f6ee3d2088cdb8d1e704276d8b34f01160d': x86: Add some more missing headers Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-13 23:41:17 +01:00
Diego Biurrun	db3f61a04f	x86: dsputil_init: Drop some unnecessary parentheses	2014-03-13 08:15:51 -07:00
Diego Biurrun	441b093915	x86: dsputil_init: K&R formatting cosmetics	2014-03-13 08:15:51 -07:00
Diego Biurrun	4cb4680c10	x86: dsputil_x86.h: K&R formatting cosmetics	2014-03-13 08:15:51 -07:00
Diego Biurrun	f8bbebecfd	x86: motion_est: K&R formatting cosmetics	2014-03-13 08:15:51 -07:00
Diego Biurrun	a36947c167	dsputilenc_mmx: K&R formatting cosmetics	2014-03-13 08:15:51 -07:00
Diego Biurrun	38675229a8	dsputil_mmx: K&R formatting cosmetics	2014-03-13 08:15:51 -07:00
Diego Biurrun	6a8b35dc88	dsputilenc_mmx: Merge two assignment blocks with identical conditions	2014-03-13 08:15:51 -07:00
Diego Biurrun	55519926ef	x86: Make function prototype comments in assembly code consistent This helps grepping for functions, among other things.	2014-03-13 05:50:29 -07:00
Diego Biurrun	edd1f833fa	x86: h264_idct_10_bit: Use proper type in function prototype comments	2014-03-13 05:50:29 -07:00
Diego Biurrun	831a118078	Update dsputil- and SIMD-related comments to match reality more closely	2014-03-13 05:50:29 -07:00
Diego Biurrun	17608f6ee3	x86: Add some more missing headers	2014-03-13 05:50:28 -07:00
Diego Biurrun	08dba0e1c3	x86: mpegvideoenc: Remove some remnants of the long-gone libmpeg2 IDCT	2014-03-13 05:50:28 -07:00
James Almer	9e0e1f9067	x86/dsputil: add emms to ff_scalarproduct_int16_mmxext() Also undo the changes to ra144enc.c from previous commits. Should fix ticket #3429 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-06 18:23:55 +01:00
Michael Niedermayer	2d99de66b7	Merge commit '3bfdee00cd92ff07c364d4901c4aefda32780756' * commit '3bfdee00cd92ff07c364d4901c4aefda32780756': x86: dcadsp: Fix linking with yasm and optimizations disabled Conflicts: libavcodec/x86/dcadsp_init.c See: `206167a295` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-06 14:10:27 +01:00
Diego Biurrun	3bfdee00cd	x86: dcadsp: Fix linking with yasm and optimizations disabled Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.	2014-03-05 23:16:21 +01:00
Michael Niedermayer	146b476ba0	Merge commit '3741aa37c2a0d0717faff74a5c4cc357d16f6d1d' * commit '3741aa37c2a0d0717faff74a5c4cc357d16f6d1d': x86: cabac: Use correct #includes to make header compile standalone Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-05 21:33:44 +01:00
Diego Biurrun	3741aa37c2	x86: cabac: Use correct #includes to make header compile standalone	2014-03-05 13:32:25 +01:00
James Almer	7fd64e3e36	x86/synth_filter: add synth_filter_fma3 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-05 01:58:16 +01:00
James Almer	206167a295	x86/synth_filter: add missing HAVE_YASM guard Should fix compilation failures with --disable-yasm on some compilers Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-04 22:47:28 +01:00
James Almer	884e085d1e	x86/synth_filter: Revert the switch to float ops with SSE2 This reverts the changes `6467209836` and `68c3ed936a` did to the SSE2 version, which generated a hit of about 5 cycles. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-02 11:58:10 +01:00
James Almer	68c3ed936a	x86/synth_filter: add synth_filter_avx Sandy Bridge Win64: 180 cycles on ff_synth_filter_inner_sse2 150 cycles on ff_synth_filter_inner_avx Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-02 01:00:55 +01:00
James Almer	6467209836	x86/synth_filter: add synth_filter_sse Build only on x86_32 targets. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-01 15:32:40 +01:00
Michael Niedermayer	fb3c33f3cd	Merge commit '4cb6964244fd6c099383d8b7e99731e72cc844b9' * commit '4cb6964244fd6c099383d8b7e99731e72cc844b9': dcadec: simplify decoding of VQ high frequencies Conflicts: configure libavcodec/dcadec.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-28 21:41:19 +01:00
Michael Niedermayer	baf3adc621	Merge commit '08e3ea60ff4059341b74be04a428a38f7c3630b0' * commit '08e3ea60ff4059341b74be04a428a38f7c3630b0': x86: synth filter float: implement SSE2 version Conflicts: libavcodec/x86/dcadsp.asm libavcodec/x86/dcadsp_init.c See: `2cdbcc0048` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-28 20:38:39 +01:00
Christophe Gisquet	2cdbcc0048	x86: synth filter float: implement SSE2 version Timings for Arrandale: C SSE win32: 2108 334 win64: 1152 322 Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with the jmp destination being aligned. Unrolling for ARCH_X86_64 is a 20 cycles gain. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-28 20:34:40 +01:00
Michael Niedermayer	e346a59383	Merge commit 'ad507d7907457e678900bac132122ba7be4644cb' * commit 'ad507d7907457e678900bac132122ba7be4644cb': x86: dcadsp: implement SSE lfe_dir Conflicts: libavcodec/x86/dcadsp.asm See: `169243112c` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-28 19:22:00 +01:00
Christophe Gisquet	169243112c	x86: dcadsp: implement SSE lfe_dir Results for Arrandale/Windows: 32: 1670 -> 316 64: 728 -> 298 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-28 19:20:03 +01:00
Michael Niedermayer	5ba1648318	Merge commit 'b23650491fbd579a4365f42bd42575afb7b53f7e' * commit 'b23650491fbd579a4365f42bd42575afb7b53f7e': prores: Use consistent names for DSP arch initialization functions Conflicts: libavcodec/proresdsp.c libavcodec/proresdsp.h libavcodec/x86/proresdsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-28 17:13:00 +01:00
Christophe Gisquet	4cb6964244	dcadec: simplify decoding of VQ high frequencies The vector dequantization has a test in a loop preventing effective SIMD implementation. By moving it out of the loop, this loop can be DSPized. Therefore, modify the current DSP implementation. In particular, the DSP implementation no longer has to handle null loop sizes. The decode_hf implementations have following timings: For x86 Arrandale: C SSE SSE2 SSE4 win32: 260 162 119 104 win64: 242 N/A 89 72 The arm NEON optimizations follow in a later patch as external asm. The now unused check for the y modifier in arm inline asm is removed from configure.	2014-02-28 13:03:22 +01:00
Christophe Gisquet	08e3ea60ff	x86: synth filter float: implement SSE2 version Timings for Arrandale: C SSE win32: 2108 334 win64: 1152 322 Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with the jmp destination being aligned. Unrolling for ARCH_X86_64 is a 20 cycles gain. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-28 13:00:48 +01:00
Christophe Gisquet	ad507d7907	x86: dcadsp: implement SSE lfe_dir Results for Arrandale/Windows: 32: 1670 -> 316 64: 728 -> 298 Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-28 13:00:47 +01:00
Diego Biurrun	b23650491f	prores: Use consistent names for DSP arch initialization functions	2014-02-28 10:34:55 +01:00
James Almer	2163a40a46	x86/imdct36: use sse3 instructions in the last BUTTERF step when possible Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-27 23:28:15 +01:00
James Almer	fbf98375e4	x86/imdct36: don't build imdct36_float_sse on x86_64 targets There's an SSE2 version as well, and x86_64 guarantees that instruction set is present. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-27 22:54:03 +01:00
James Almer	3f3d748cab	x86: Move XOP emulation to x86util We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-24 08:30:19 +01:00
Michael Niedermayer	8372aaf721	Merge commit '017a06a9ee86b047079166c2694c9c655ff03356' * commit '017a06a9ee86b047079166c2694c9c655ff03356': x86: dsputil: Use correct file name as multiple inclusion guard Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-20 14:58:04 +01:00
Diego Biurrun	017a06a9ee	x86: dsputil: Use correct file name as multiple inclusion guard	2014-02-20 04:16:15 -08:00
Michael Niedermayer	130c33af35	Merge commit 'b23bc95920e2f10b9621857e829c45b064f356c0' * commit 'b23bc95920e2f10b9621857e829c45b064f356c0': x86: dca: Add missing multiple inclusion guards Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-19 15:44:48 +01:00
Diego Biurrun	b23bc95920	x86: dca: Add missing multiple inclusion guards	2014-02-19 10:19:15 +01:00
Hendrik Leppkes	7716eda0aa	vp9/x86: set correct number of registers used in intra pred asm Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-18 17:20:14 +01:00
James Almer	07b4b0ca62	tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4} Results are from a Win64 build running on an AMD FX 6300 1121 decicycles in ttafilter_process_dec_c, 16777112 runs, 104 skips 522 decicycles in ff_ttafilter_process_dec_ssse3, 16777149 runs, 67 skips 477 decicycles in ff_ttafilter_process_dec_sse4, 16777156 runs, 60 skips Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-17 13:51:19 +01:00
Ronald S. Bultje	fdb093c4e4	vp9/x86: intra prediction SIMD. Partially based on h264_intrapred. (I hope to eventually merge these two intrapred implementations back together.)	2014-02-17 13:39:00 +01:00
James Almer	ec482e738d	x86/flacdsp: add missing check to ff_flacdsp_init_x86() Fixes compilation with flac decoder disabled and encoder enabled Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-16 12:06:04 +01:00
Michael Niedermayer	d601106ab1	avcodec/x86/lossless_videodsp: fix w type Fixes fate issues on mingw64 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-15 06:41:38 +01:00
Peter Ross	b8664c9294	avcodec/vp8dsp: add VP7 idct and loop filter Signed-off-by: Peter Ross <pross@xvid.org> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-15 02:15:35 +01:00
James Almer	e87974bc00	flac/x86: add ff_flac_lpc_32_xop() Tested on an AMD FX 6300 679081 decicycles in ff_flac_lpc_32_xop, 32768 runs 774425 decicycles in ff_flac_lpc_32_sse4, 32768 runs Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-13 22:14:59 +01:00
James Darnley	623f380a18	lavc: fix flac encoder and decoder dependencies Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-13 21:00:32 +01:00
Michael Niedermayer	df98b36aa6	Merge commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae' * commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae': dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-08 17:25:31 +01:00
Michael Niedermayer	dd2b330347	Merge commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3' * commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3': x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-08 17:06:57 +01:00
Janne Grunau	5c1c6e8226	dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders	2014-02-08 13:38:36 +01:00
Janne Grunau	0cffd6fff5	x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available Fixes compilation with MSVC. Also does not rely on an earlier config.h include but includes it directly.	2014-02-08 12:10:56 +01:00
Clément Bœsch	669d4f9053	x86/vp9lpf: simplify 2nd transpose in 44/48/88/84. For non-avx optims, this saves 8 movs. before: 1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips 3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips 2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips 3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips after: 1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips 3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips 2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips 3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips	2014-02-08 11:10:23 +01:00
Michael Niedermayer	82ae8a44e6	Merge commit '5b59a9fc6152169599561f04b4f66370edda5c9c' * commit '5b59a9fc6152169599561f04b4f66370edda5c9c': x86: dcadsp: implement int8x8_fmul_int32 Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-08 01:20:33 +01:00
Christophe Gisquet	5b59a9fc61	x86: dcadsp: implement int8x8_fmul_int32 For the callable function (as opposed to the inline one): C SSE SSE2 SSE4 Win32: 47 42 29 26 Win64: 30 33 25 23 The SSE version is neither compiled nor set for ARCH_X86_64, as the inlinable function takes over. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-07 22:52:40 +01:00
Loren Merritt	9c978f243a	flac/x86: add ff_flac_lpc_32_sse4() benchmarked on sandybridge x86_64: 1358232 decicycles in flac_lpc_32_c 1244575 decicycles in flac_lpc_32_sse4, James Almer's patch 650045 decicycles in flac_lpc_32_sse4, this patch I haven't tested the edgecases such as odd block lengths odd block length tested-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-06 02:51:19 +01:00
Clément Bœsch	d92a725329	x86/vp9lpf: remove 8 SWAPs in 84/48 transpose.	2014-02-05 07:21:13 +01:00
Clément Bœsch	97dde561de	x86/vp9lpf: remove braindead double pxor.	2014-02-05 07:21:11 +01:00
Clément Bœsch	9a3b05b0a9	x86/vp9lpf: save a few mov in flat8in/hev masks calc.	2014-02-05 07:21:09 +01:00
Clément Bœsch	91d85bb167	x86/vp9lpf: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.	2014-02-05 07:21:06 +01:00
Michael Niedermayer	de17ccc774	Merge commit '51daafb02eaf96e0743a37ce95a7f5d02c1fa3c2' * commit '51daafb02eaf96e0743a37ce95a7f5d02c1fa3c2': x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such. Conflicts: libavcodec/x86/videodsp_init.c See: `1b3a7e1f42` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-31 14:30:30 +01:00
Clément Bœsch	c5dd73b890	x86/vp9lpf: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}(). 5.40s → 5.30s overall decode time with -threads 1 on ped1080p.webm (i7 920, ssse3)	2014-01-30 19:34:13 +01:00
Ronald S. Bultje	9ee9c679a7	x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-01-30 15:33:23 +01:00
Ronald S. Bultje	51daafb02e	x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such. Should fix crashes or corrupt output on pre-SSE2 CPUs when they were using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in hfix or hvar single-edge (left/right) extension functions. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-01-30 15:30:01 +01:00
James Almer	644c32ea4b	x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2() Similar gains as the ssse3 version once again Signed-off-by: James Almer <jamrial@gmail.com>	2014-01-28 09:30:55 +01:00
Clément Bœsch	222c46c531	x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}. 9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips 9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips 1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips 2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips 5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1) Adding SSE2 support should be relatively trivial (just a matter of changing the pshufb [mask_mix] with something else), patch welcome.	2014-01-28 07:36:38 +01:00
Clément Bœsch	822385d775	x86/vp9lpf: add a preload system in FILTER_UPDATE. Allow some macro refactoring in filter14().	2014-01-27 22:39:26 +01:00
Clément Bœsch	315b4775ad	x86/vp9lpf: refactor v/h using common macros for P7 to Q7.	2014-01-27 22:39:26 +01:00
Clément Bœsch	5d144086cc	x86/vp9lpf: faster P7..Q7 accesses. Introduce 2 additional registers for stride3 and mstride3 to allow direct accesses (lea drops). 3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3 Also uses defines to clarify the code.	2014-01-27 22:37:42 +01:00
Clément Bœsch	5f4d04d084	x86/lossless_videodsp: silly one-line cosmetic.	2014-01-25 16:24:50 +01:00
Clément Bœsch	5267e85056	x86/lossless_videodsp: use common macro for add and diff int16 loop.	2014-01-25 14:27:37 +01:00
Clément Bœsch	cddbfd2a95	x86/lossless_videodsp: simplify and explicit aligned/unaligned flags	2014-01-25 11:59:43 +01:00
Ronald S. Bultje	c9e6325ed9	vp9/x86: use explicit register for relative stack references. Before this patch, we explicitly modify rsp, which isn't necessarily universally acceptable, since the space under the stack pointer might be modified in things like signal handlers. Therefore, use an explicit register to hold the stack pointer relative to the bottom of the stack (i.e. rsp). This will also clear out valgrind errors about the use of uninitialized data that started occurring after the idct16x16/ssse3 optimizations were first merged.	2014-01-24 19:25:25 -05:00
Ronald S. Bultje	97474d527f	vp9/x86: iwht4x4 (lossless) mmx.	2014-01-24 19:25:25 -05:00
Ronald S. Bultje	d43efa68bd	vp9/x86: 4x4 iadst SIMD (ssse3) variants. Cycle measurements for intra itxfm_4x4_add on ped1080p.webm: idct_idct: 66 -> 67 cycles (noise measurement) idct_iadst: 199 -> 79 cycles iadst_idct: 165 -> 70 cycles iadst_iadst: 183 -> 82 cycles	2014-01-24 19:25:25 -05:00
Ronald S. Bultje	baf47020cd	vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants. Cycle measurements for intra itxfm_8x8_add on ped1080p.webm: idct_idct: 133 -> 135 cycles (noise measurement) idct_iadst: 900 -> 241 cycles iadst_idct: 864 -> 215 cycles iadst_iadst: 973 -> 310 cycles	2014-01-24 19:25:25 -05:00
Michael Niedermayer	e6d1c66d74	avcodec/x86/lossless_videodsp: disable median optimizations for 16bps They only support up to 15bps Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-23 01:51:24 +01:00
Michael Niedermayer	eaacfc7dd1	avcodec/lossless_videodsp: Pass AVCodecContext to init Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-23 01:43:00 +01:00
Michael Niedermayer	ef00ef7553	avcodec/x86/lossless_videodsp: port sub_hfyu_median_prediction_int16 to yasm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 23:27:27 +01:00
Michael Niedermayer	fad49aae28	avcodec/x86/lossless_videodsp: Port sub_hfyu_median_prediction_mmxext to int16 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 22:55:49 +01:00
Michael Niedermayer	fee97f25fa	avcodec/x86/lossless_videodsp: port add_hfyu_median_prediction_mmxext to 16bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 21:11:40 +01:00
Michael Niedermayer	631939bde6	avcodec/x86/lossless_videodsp: add diff_int16_mmx/sse2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 19:41:21 +01:00
Reimar Döffinger	76421982d0	lossless_videodsp.asm: fix compilation. Fixes these errors with nasm: libavcodec/x86/lossless_videodsp.asm:86: error: invalid combination of opcode and operands libavcodec/x86/lossless_videodsp.asm:88: error: invalid combination of opcode and operands I don't know whether movd or movq was meant, but either way maskq vs. maskd must match the mov size. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-01-21 19:46:02 +01:00
Michael Niedermayer	83b67ca056	avcodec/x86/lossless_videodsp: Port lorens add_hfyu_left_prediction_ssse3/sse4 to 16bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-21 02:55:41 +01:00
Michael Niedermayer	63d2be7533	avcodec/x86/lossless_videodsp: use SPLATW in add_int16 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-21 02:33:20 +01:00
Michael Niedermayer	f70d7eb20c	Move add/diff_int16 to lossless_videodsp Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-20 21:32:47 +01:00
Michael Niedermayer	a493f8541d	avcodec/x86/dsp: add_int16_mmx / add_int16_sse2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-20 04:06:46 +01:00
James Almer	26800e3864	vp9/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext pavgb is an sse integer instruction, so the mmxext flag is enough Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-18 17:08:25 +01:00
James Almer	d2a7314f1e	vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2(). Similar gains in performance as the SSSE3 version Signed-off-by: James Almer <jamrial@gmail.com>	2014-01-17 14:16:38 +01:00
Ronald S. Bultje	8173d1ffc0	vp9/x86: 16x16 iadst_idct, idct_iadst and iadst_iadst (ssse3+avx). Sample timings on ped1080p.webm (of the ssse3 functions): iadst_idct: 4672 -> 1175 cycles idct_iadst: 4736 -> 1263 cycles iadst_iadst: 4924 -> 1438 cycles Total decoding time changed from 6.565s to 6.413s.	2014-01-16 13:49:31 +01:00
Clément Bœsch	9cc8fa63dd	vp9/x86: simplify a few mc inits.	2014-01-16 07:48:27 +01:00
Michael Niedermayer	6391dec82a	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: dsputil: Simplify xvmc deprecation conditional Conflicts: libavcodec/x86/dsputil_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-15 20:41:08 +01:00
Diego Biurrun	aab40bbfd5	x86: dsputil: Simplify xvmc deprecation conditional	2014-01-15 15:23:46 +01:00
Clément Bœsch	8b4190da93	vp9/x86: add AVX for itxfm and lpf. 4412 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 4193462 runs, 842 skips 3600 decicycles in ff_vp9_loop_filter_h_16_16_avx, 4193621 runs, 683 skips 3010 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 4193528 runs, 776 skips 2678 decicycles in ff_vp9_loop_filter_v_16_16_avx, 4193742 runs, 562 skips 23025 decicycles in ff_vp9_idct_idct_32x32_add_ssse3, 2096871 runs, 281 skips 19943 decicycles in ff_vp9_idct_idct_32x32_add_avx, 2096815 runs, 337 skips 4675 decicycles in ff_vp9_idct_idct_16x16_add_ssse3, 4194018 runs, 286 skips 3980 decicycles in ff_vp9_idct_idct_16x16_add_avx, 4194022 runs, 282 skips 967 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 16776972 runs, 244 skips 887 decicycles in ff_vp9_idct_idct_8x8_add_avx, 16777002 runs, 214 skips	2014-01-15 15:54:03 +01:00
Michael Niedermayer	cb613657ee	avcodec/x86/proresdsp_init: x86 prores IDCT is bitexact again reenable it for bitexact mode Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-14 15:59:00 +01:00
Michael Niedermayer	b148a39d55	Merge commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7' * commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7': x86: Consistently use cpu flag detection macros in places that still miss it Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-14 14:44:59 +01:00
Diego Biurrun	46bacb5cc6	x86: Consistently use cpu flag detection macros in places that still miss it	2014-01-14 00:04:58 +01:00
Clément Bœsch	af68bd1c06	vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_ssse3(). 16662 decicycles in loop_filter_h_16_16_c, 8387355 runs, 1253 skips 17510 decicycles in loop_filter_v_16_16_c, 8387516 runs, 1092 skips 4941 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 8387887 runs, 721 skips 3899 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 8387980 runs, 628 skips Overall decode time goes from: ./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 8.10s user 0.02s system 99% cpu 8.126 total to: ./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 6.15s user 0.04s system 99% cpu 6.199 total (46 to 61 fps)	2014-01-12 20:20:24 +01:00
Clément Bœsch	e11ceea68f	vp9/x86: factor out some code in VP9_UNPACK_MULSUB_2W_4X.	2014-01-12 20:19:00 +01:00
Clément Bœsch	c9aa0b8f70	vp9/x86: remove reg redundancy in VP9_MULSUB_2W_2X.	2014-01-12 20:18:55 +01:00
Clément Bœsch	7c55ee6168	vp9/x86: merge IDCT coef macros.	2014-01-12 20:18:44 +01:00
Michael Niedermayer	92b2404571	Merge commit '4c642d8d98703faf52983243098f35865e15b312' * commit '4c642d8d98703faf52983243098f35865e15b312': x86: hpeldsp: Add missing av_cold attribute to init function Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-09 20:32:53 +01:00
Michael Niedermayer	390452bab6	Merge commit 'b0be1ae792ac8bbfb0fc7b9b9cb39eaf0feb489b' * commit 'b0be1ae792ac8bbfb0fc7b9b9cb39eaf0feb489b': x86: avcodec: Add a bunch of missing #includes for av_cold Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-09 20:24:15 +01:00
Diego Biurrun	4c642d8d98	x86: hpeldsp: Add missing av_cold attribute to init function	2014-01-09 15:09:07 +01:00
Diego Biurrun	b0be1ae792	x86: avcodec: Add a bunch of missing #includes for av_cold	2014-01-09 15:09:07 +01:00
Ronald S. Bultje	c6fe984f2f	vp9/x86: make STORE_2X2 macro local. Prevents this assembler warning: libavcodec/x86/vp9itxfm.asm:1208: warning: (VP9_IDCT32_1D:309) redefining multi-line macro `STORE_2X2' Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-08 14:07:15 +01:00
Ronald S. Bultje	04a187fb2a	vp9/x86: idct_32x32_add_ssse3 sub-8x8-idct. Runtime of the full 32x32 idct goes from 2446 to 2441 cycles (intra) or from 1425 to 1306 cycles (inter). Overall runtime is not significantly affected.	2014-01-07 20:43:35 -05:00
Ronald S. Bultje	37b001d14d	vp9/x86: idct_32x32_add_ssse3 sub-16x16-idct. Runtime of all IDCTs together goes from 3327 to 2473 cycles (intra, i.e. ~35% faster) or from 2312 to 1448 cycles (inter, i.e. ~60% faster). Total decode time of ped1080p.webm goes from 8.086sec to 7.974sec (1.4% faster).	2014-01-07 20:43:34 -05:00
Ronald S. Bultje	e84d14df10	vp9/x86: idct_32x32_add_ssse3. Sub-IDCTs will follow later. ped1080.webm goes from 9.295s to 8.191s (13.5% faster). The IDCT itself goes from 4372 (intra) or 4337 (inter) to 403 (intra) or 329 (inter) cycles for the DC-only form, 23755 (intra) or 23723 (inter) to 3497 (intra) or 3607 (inter) cycles for the no-DC form, which averages from 23393 (intra) or 16612 (inter) to 3449 (intra) or 2392 (inter) for all 32x32s together, i.e. about ~7x faster (all tests done on ped1080p.webm).	2014-01-07 20:43:30 -05:00
Michael Niedermayer	30056fd0be	Merge commit 'a03a642d5ceb5f2f7c6ebbf56ff365dfbcdb65eb' * commit 'a03a642d5ceb5f2f7c6ebbf56ff365dfbcdb65eb': h264: do not use 422 functions for monochrome See: `07abf13da4` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-06 16:51:23 +01:00
Anton Khirnov	a03a642d5c	h264: do not use 422 functions for monochrome Fixes invalid memory access. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC:libav-stable@libav.org	2014-01-06 08:25:36 +01:00
Ronald S. Bultje	18175baa54	vp9/x86: 16px MC functions (64bit only). Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0): 16x8: 876 -> 870 (0.7%) 16x16: 1444 -> 1435 (0.7%) 16x32: 2784 -> 2748 (1.3%) 32x16: 2455 -> 2349 (4.5%) 32x32: 4641 -> 4084 (13.6%) 32x64: 9200 -> 7834 (17.4%) 64x32: 8980 -> 7197 (24.8%) 64x64: 17330 -> 13796 (25.6%) Total decoding time goes from 9.326sec to 9.182sec.	2013-12-26 21:05:10 -05:00
Ronald S. Bultje	0d9375fc90	vp9/x86: 16x16 sub-IDCT for top-left 8x8 subblock (eob <= 38). Sub8x8 speed (w/o dc-only case) goes from ~750 cycles (inter) or ~735 cycles (intra) to ~415 cycles (inter) or ~430 cycles (intra). Average overall 16x16 idct speed goes from ~635 cycles (inter) or ~720 cycles (intra) to ~415 cycles (inter) or ~545 (intra) - all measurements done using ped1080p.webm.	2013-12-26 07:40:25 -05:00
Ivan Kalvachev	1c63aed232	Convert XvMC to hwaccel v3 Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-22 22:03:47 +01:00
Michael Niedermayer	ce612fc186	Merge commit 'dfc50ac85e9d68a771b556297b7c411650206f3b' * commit 'dfc50ac85e9d68a771b556297b7c411650206f3b': x86: mpegvideo: move denoise_dct asm to mpegvideoenc Conflicts: libavcodec/x86/mpegvideo.c libavcodec/x86/mpegvideoenc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-20 23:44:31 +01:00
Anton Khirnov	dfc50ac85e	x86: mpegvideo: move denoise_dct asm to mpegvideoenc This function is encoding-only. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-12-20 17:16:11 +01:00
Ronald S. Bultje	8d4c616fc0	vp9/x86: idct_add_16x16_ssse3. Currently only dc-only and full 16x16. Other subforms will follow in the near future. Total decoding time of ped1080p.webm goes from 9.7 to 9.3 seconds. DC-only goes from 957 -> 131 cycles, and the full IDCT goes from ~4050 to ~745 cycles.	2013-12-14 12:13:26 -05:00
Michael Niedermayer	8e70fdab36	Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' * commit '4958f35a2ebc307049ff2104ffb944f5f457feb3': dsputil: Move apply_window_int16 to ac3dsp Conflicts: libavcodec/arm/ac3dsp_init_arm.c libavcodec/arm/ac3dsp_neon.S libavcodec/x86/ac3dsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-09 04:12:40 +01:00
Diego Biurrun	4958f35a2e	dsputil: Move apply_window_int16 to ac3dsp The (optimized) functions are used nowhere else.	2013-12-08 17:57:15 +01:00
Ronald S. Bultje	92436e8ad9	vp9: implement top/left half (4x4) sub-8x8-IDCT. For that specific case (eob>3&&eob<=12), runtime of idct8x8 goes from 668 to 477 cycles. For all idct8x8, runtime goes from 521 to 490 cycles.	2013-12-07 12:39:36 -05:00
Ronald S. Bultje	b2045c44a9	vp9: split pre-load of 11585x2 out of 1d idct macro. This allows us to load it only once, instead of twice, in this function.	2013-12-07 12:39:36 -05:00
Ronald S. Bultje	f9a0d4c6e0	vp9: minor refactorings in idct ssse3 assembly. Make register usage in macros explicit; change mulsub_2w_4x to use 2 instead of 3 temp registers.	2013-12-07 12:39:35 -05:00
Ronald S. Bultje	8729964b99	vp9: split x86 assembly in two files. (And in future, loopfilter or intra pred could be put in their own respective files also.)	2013-12-07 12:39:35 -05:00
Michael Niedermayer	5b4d57455d	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: Initialize mmxext after amd3dnow optimizations Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-05 11:55:41 +01:00
Diego Biurrun	3d7c84747d	x86: Initialize mmxext after amd3dnow optimizations The mmxext optimizations should be at least equally fast if available and amd3dnow optimizations are being deprecated. Thus the former should override the latter, not the other way around.	2013-12-04 18:52:48 +01:00
Michael Niedermayer	be2312aa8f	Merge remote-tracking branch 'qatar/master' * qatar/master: dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo If someone optimizes dct_quantize for non x86 SIMD, then this probably needs to be reverted. Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-02 10:59:48 +01:00
Diego Biurrun	7ffaa19570	dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo The table is MMX-specific and used nowhere else.	2013-12-02 04:05:18 +01:00
Michael Niedermayer	3adb825650	Merge commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5' * commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5': x86: dsputil: Suppress deprecation warnings for XvMC bits Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-28 22:47:37 +01:00
Diego Biurrun	cf7860db60	x86: dsputil: Suppress deprecation warnings for XvMC bits These parts are scheduled for removal on the next version bump. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2013-11-28 16:04:30 +01:00
Clément Bœsch	616da59542	avcodec/x86/vp9dsp: merge a few SWAP together.	2013-11-21 23:06:21 +01:00
Clément Bœsch	e0434cfcfc	avcodec/x86: remove 3 sub in pred4x4_tm_vp8_8. before: 411 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388289 runs, 319 skips after: 389 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388308 runs, 300 skips Tested on i7 920.	2013-11-17 23:12:35 +01:00
Clément Bœsch	d28c79b003	avcodec/x86/vp9dsp: use EXTERNAL_* macros. Original fix by one of these developers: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> See `97962b2` / `72ca830` Personal guess is Diego Biurrun.	2013-11-16 17:03:17 +01:00
Michael Niedermayer	91e00c4a78	Merge commit '458446acfa1441d283dacf9e6e545beb083b8bb0' * commit '458446acfa1441d283dacf9e6e545beb083b8bb0': lavc: Edge emulation with dst/src linesize Conflicts: libavcodec/cavs.c libavcodec/h264.c libavcodec/hevc.c libavcodec/mpegvideo_enc.c libavcodec/mpegvideo_motion.c libavcodec/rv34.c libavcodec/svq3.c libavcodec/vc1dec.c libavcodec/videodsp.h libavcodec/videodsp_template.c libavcodec/vp3.c libavcodec/vp8.c libavcodec/wmv2.c libavcodec/x86/videodsp.asm libavcodec/x86/videodsp_init.c Changes to the asm are not merged, they are left for volunteers or in their absence for later. The changes this merge introduces are reordering of the function arguments See: `face578d56` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-15 15:07:10 +01:00
Ronald S. Bultje	72ca830f51	lavc: VP9 decoder Originally written by Ronald S. Bultje <rsbultje@gmail.com> and Clément Bœsch <u@pkh.me> Further contributions by: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> Signed-off-by: Luca Barbato <lu_zero@gentoo.org> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2013-11-15 10:16:28 +01:00
Ronald S. Bultje	458446acfa	lavc: Edge emulation with dst/src linesize Allow supporting files for which the image stride is smaller than the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9 file or a 16x16 VP8 file with -fflags +emu_edge.	2013-11-15 10:16:27 +01:00
Michael Niedermayer	5231eecdaf	Merge remote-tracking branch 'qatar/master' * qatar/master: Deprecate obsolete XvMC hardware decoding support Conflicts: libavcodec/mpeg12.c libavcodec/mpeg12dec.c libavcodec/mpegvideo.c libavcodec/options_table.h libavutil/pixdesc.c libavutil/version.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-14 03:26:35 +01:00
Diego Biurrun	19e30a58fc	Deprecate obsolete XvMC hardware decoding support XvMC has long ago been superseded by newer acceleration APIs, such as VDPAU, and few downstreams still support it. Furthermore XvMC is not implemented within the hwaccel framework, but requires its own specific code in the MPEG-1/2 decoder, which is a maintenance burden.	2013-11-13 21:07:45 +01:00
Michael Niedermayer	a30f7918b5	Merge commit '0338c396987c82b41d322630ea9712fe5f9561d6' * commit '0338c396987c82b41d322630ea9712fe5f9561d6': dsputil: Split off H.263 bits into their own H263DSPContext Conflicts: configure libavcodec/mpegvideo.h libavcodec/mpegvideo_enc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-08 17:42:56 +01:00
Diego Biurrun	0338c39698	dsputil: Split off H.263 bits into their own H263DSPContext	2013-11-08 12:40:47 +01:00
Clément Bœsch	87434cf373	avcodec/vp9: add ff_vp9_idct_idct_{4x4,8x8}_ssse3(). 1789 decicycles in idct_idct_4x4_add_c, 262136 runs, 8 skips 1839 decicycles in idct_idct_4x4_add_c, 524270 runs, 18 skips 1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips 529 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 262138 runs, 6 skips 516 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 524282 runs, 6 skips 474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips (~3.9x faster) 7726 decicycles in idct_idct_8x8_add_c, 1048433 runs, 143 skips 7732 decicycles in idct_idct_8x8_add_c, 2096882 runs, 270 skips 7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips 1145 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 1048549 runs, 27 skips 1137 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 2097097 runs, 55 skips 1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips (~7.1x faster) Overall decode time before commit: 16.48s user 0.03s system 99% cpu 16.526 total 16.54s user 0.01s system 99% cpu 16.566 total 16.46s user 0.03s system 99% cpu 16.511 total Overall decode time after commit: 16.34s user 0.02s system 99% cpu 16.378 total 16.28s user 0.02s system 99% cpu 16.315 total 16.32s user 0.03s system 99% cpu 16.366 total Tested on i7 920 with 40s 1080p footage.	2013-11-05 19:25:40 +01:00
Michael Niedermayer	934e489ee8	Merge commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930' * commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930': x86: rv40dsp: Use PAVGB instruction macro where appropriate Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-05 10:26:07 +01:00
Diego Biurrun	e2b5b09789	x86: rv40dsp: Use PAVGB instruction macro where appropriate	2013-11-04 21:14:39 +01:00
Mikulas Patocka	694d997afe	x86: hpeldsp: Use PAVGB instruction macro where necessary Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-11-04 01:29:23 +01:00
Mikulas Patocka	074155360d	avcodec/x86/hpeldsp: fix crash on AMD K6-3+ There are instructions pavgb and pavgusb. Both instructions do the same operation but they have different encoding. Pavgb exists in SSE (or MMXEXT) instruction set and pavgusb exists in 3D-NOW instruction set. libavcodec uses the macro PAVGB to select the proper instruction. However, the function avg_pixels8_xy2 doesn't use this macro, it uses pavgb directly. As a consequence, the function avg_pixels8_xy2 crashes on AMD K6-2 and K6-3 processors, because they have pavgusb, but not pavgb. This bug seems to be introduced by commit `71155d7b41`, "dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm" Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-03 19:49:11 +01:00
Michael Niedermayer	7146eacfc5	Merge commit '1700b4e678ed329611a16b20d11e64b7abda4839' * commit '1700b4e678ed329611a16b20d11e64b7abda4839': x86: vp8dsp: Split loopfilter code into a separate file Conflicts: libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-02 10:13:14 +01:00
Diego Biurrun	1700b4e678	x86: vp8dsp: Split loopfilter code into a separate file	2013-11-01 22:05:20 +01:00
Michael Niedermayer	fa6fa2162b	avcodec/cabac: support UNCHECKED_BITSTREAM_READER = 0 Fixes overreads in HEVC Fixes Ticket3070 Also fixed remaining issues from Ticket3075 and Ticket3076 Some lines of code taken from 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/x86/cabac.h and 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/cabac_functions.h Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-31 11:13:27 +01:00
Ronald S. Bultje	960490c0b2	avcodec/x86/videodsp: Small speedups in ff_emulated_edge_mc x86 SIMD. Don't use word-size multiplications if size == 2, and if we're using SIMD instructions (size >= 8), complete leftover 4byte sets using movd, not mov. Both of these changes lead to minor speedups. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-27 15:02:48 +01:00
Ronald S. Bultje	cd86eb265f	avcodec/x86/videodsp: fix a bug in a %if statement where we used '%%' instead of '&&'. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-27 15:02:48 +01:00
Michael Niedermayer	41efb8d9a7	avcodec/x86/cabac: include get_cabac_bypass_sign_x86() under #if !BROKEN_COMPILER. This might fix Ticket2999 as well as some fate clients. Untested, as the original patch submitter no longer has the environment to test. This should be reverted if it does not fix the issues. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-26 15:06:55 +02:00
Ronald S. Bultje	1b3a7e1f42	avcodec/x86/videodsp: Properly mark sse2 instructions in emulated_edge_mc x86 simd as such. Should fix crashes or corrupt output on pre-SSE2 CPUs when they were using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in hfix or hvar single-edge (left/right) extension functions. Tested-by: Ingo Brückl <ib@wupperonline.de> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-24 13:36:55 +02:00
Michael Niedermayer	c35d29a9c8	avcodec/x86/dsputil_init: move ff_idct_xvid_mmxext init This decreases the diff to libav Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-15 02:06:12 +02:00
Michael Niedermayer	ab8cbfe0dd	avcodec/x86/dsputil_init: remove duplicated sse2 idct init Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-15 01:59:36 +02:00
Michael Niedermayer	1bf8fa75ee	avcodec/x86/dsputil_init: fix cpu flag checks Fixes linking failure with --disable-sse2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-15 01:46:21 +02:00
Ronald S. Bultje	20d78a8606	libavcodec/x86: Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-10 13:36:06 +02:00
Ronald S. Bultje	ad75d2b590	x86: Fix compilation with nasm on PPC & OS/2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:36:19 +02:00
Michael Niedermayer	deb5addcff	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: h264_idct: Update comments to match 8/10-bit depth optimization split Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:10:02 +02:00
Michael Niedermayer	1f17619fe4	Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450' * commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450': x86inc: Utilize the shadow space on 64-bit Windows Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:23:00 +02:00
Ronald S. Bultje	ba9c557b92	avcodec/x86/vp9dsp: Fix compilation with nasm. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 02:27:12 +02:00
Diego Biurrun	6405ca7d4a	x86: h264_idct: Update comments to match 8/10-bit depth optimization split	2013-10-07 21:46:46 +02:00
Henrik Gramner	bbe4a6db44	x86inc: Utilize the shadow space on 64-bit Windows Store XMM6 and XMM7 in the shadow space in functions that clobbers them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:25:35 -04:00
Michael Niedermayer	b67cb58520	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: fdct: Employ more specific ifdefs Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-07 11:37:47 +02:00
Diego Biurrun	ce1e8045e0	x86: fdct: Employ more specific ifdefs This avoids building mmxext and sse2 code when disabled by configure.	2013-10-06 22:02:25 +02:00
Michael Niedermayer	c86955d24a	Merge commit '2ddb35b91131115c094d90e04031451023441b4d' * commit '2ddb35b91131115c094d90e04031451023441b4d': x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-06 11:50:01 +02:00
Michael Niedermayer	7fb123429e	Merge commit '258414d0771845d20f646ffe4d4e60f22fba217c' * commit '258414d0771845d20f646ffe4d4e60f22fba217c': x86: fdct: Initialize optimized fdct implementations in the standard way Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-06 11:31:01 +02:00
Michael Niedermayer	d0b2703676	Merge commit '0b8b2ae5e93d616c2ece59f7175f483154cff918' * commit '0b8b2ae5e93d616c2ece59f7175f483154cff918': x86: xviddct: Employ more specific ifdefs Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-06 11:25:22 +02:00
Diego Biurrun	2ddb35b911	x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx The function does not depend on MMX and compilation without MMX enabled fails if the function is compiled conditional on MMX availability.	2013-10-05 19:21:15 +02:00
Diego Biurrun	258414d077	x86: fdct: Initialize optimized fdct implementations in the standard way	2013-10-05 18:20:52 +02:00
Diego Biurrun	0b8b2ae5e9	x86: xviddct: Employ more specific ifdefs This avoids building mmxext and sse2 code when disabled by configure.	2013-10-05 18:14:58 +02:00
Michael Niedermayer	9d8e8495c9	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: fdct: Only build fdct code if encoders have been enabled Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-04 14:36:58 +02:00
Diego Biurrun	6cc133ec58	x86: fdct: Only build fdct code if encoders have been enabled fdct is only initialized if encoders are enabled.	2013-10-04 10:50:44 +02:00
Ronald S. Bultje	f1548c008f	Full-pixel MC functions. Decoding time of ped1080p.webm goes from 11.3sec to 11.1sec.	2013-10-02 21:03:15 -04:00
Ronald S. Bultje	c07ac8d467	VP9 MC (ssse3) optimizations. Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.	2013-10-02 21:03:15 -04:00
Ronald S. Bultje	face578d56	Rewrite emu_edge functions to have separate src/dst_stride arguments. This allows supporting files for which the image stride is smaller than the max. block size + number of subpel mc taps, e.g. a 64x64 VP9 file or a 16x16 VP8 file with -fflags +emu_edge.	2013-09-28 20:28:08 -04:00
Ronald S. Bultje	c341f734e5	Convert multiplier for MV from int to ptrdiff_t. This prevents emulated_edge_mc from not undoing mvy*stride-related integer overflows. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-09-28 11:28:09 +02:00
Martin Storsjö	ede42109e7	x86: Add an xmm clobbering wrapper for avcodec_encode_video2 This is required since `187105ff8` when we started trying to wrap this function as well. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-09-17 10:53:23 +02:00
Martin Storsjö	1daea5232f	x86: Add an xmm clobbering wrapper for avcodec_encode_video2 This is required since `187105ff8` when we started trying to wrap this function as well. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-09-16 22:22:41 +03:00
Hendrik Leppkes	a06a5b78e2	mathops/x86: work around inline asm miscompilation with GCC 4.8.1 The volatile is not required here, and prevents a miscompilation with GCC 4.8.1 when building on x86 with --cpu=i686 Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-09-15 11:15:07 -04:00
Michael Niedermayer	2ffead98dd	avcodec: add emuedge_linesize_type. Currently all uses of the emu edge code, as well as the code itself, assume int linesize. Changing some but not changing all would introduce a security issue. Once all use this typedef, a simple search and replace can be done to switch them all to ptrdiff_t. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-09-04 14:29:20 +02:00
Paul B Mahol	6053812814	x86/simple_idct: use LOCAL_ALIGNED instead of DECLARE_ALIGNED Signed-off-by: Paul B Mahol <onemda@gmail.com>	2013-09-03 17:02:49 +00:00
Thilo Borgmann	d814a839ac	Reinstate proper FFmpeg license for all files.	2013-08-30 15:47:38 +00:00
Carl Eugen Hoyos	8fe1fb41ac	Fix compilation with --disable-mmx.	2013-08-30 15:21:15 +02:00
Michael Niedermayer	62a6052974	Merge commit 'e998b56362c711701b3daa34e7b956e7126336f4' * commit 'e998b56362c711701b3daa34e7b956e7126336f4': x86: avcodec: Consistently structure CPU extension initialization Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-08-30 12:50:01 +02:00
Michael Niedermayer	7fb758cd8e	avcodec/x86/lpc: Fix cpu flag checks so they work Broken by `6369ba3c9c` Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-08-30 12:34:52 +02:00
Michael Niedermayer	c1913064e3	avcodec/x86/vp8dsp: Fix cpu flag checks so they work Broken by `6369ba3c9c` Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-08-30 12:33:56 +02:00
Michael Niedermayer	8be0e2bd43	Merge commit '6369ba3c9cc74becfaad2a8882dff3dd3e7ae3c0' * commit '6369ba3c9cc74becfaad2a8882dff3dd3e7ae3c0': x86: avcodec: Use convenience macros to check for CPU flags Conflicts: libavcodec/x86/dsputil_init.c libavcodec/x86/hpeldsp_init.c libavcodec/x86/motion_est.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-08-30 12:08:28 +02:00

1949 Commits