ffmpeg

Author	SHA1	Message	Date
Timothy Gu	0b6292b7b8	x86: dsputilenc: move all the function prototypes together Signed-off-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 16:18:10 +02:00
Christophe Gisquet	f743fa9c7f	x86: huffyuvdsp: add_hfyu_left_pred_bgr32 C MMX SSE2 Cycles: 3092 1053 578 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 15:20:36 +02:00
Michael Niedermayer	7be79c76d3	avcodec/huffyuvdsp: Change w to intptr in add_hfyu_median_pred() and add_hfyu_left_pred() This avoids potential issues with the high 32bits being random in x86-64 asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 15:12:58 +02:00
Christophe Gisquet	884078d2df	x86: huffyuvdsp: add SSE2 median prediction From 5010c to 4566 on lagarith YUY2. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 14:57:57 +02:00
Michael Niedermayer	8c891d90ca	avcodec/x86/qpeldsp_init: Restore author attribution See: 368f50359eb328b0b9d67451f56fda20b3255f9a See: 44eb49512888143905860af2de2932ab002cdbf7, and many others See: similarity index 83% copy from libavcodec/x86/dsputil_init.c copy to libavcodec/x86/qpeldsp_init.c index ebbf97f..8f296a1 100644 --- a/libavcodec/x86/dsputil_init.c +++ b/libavcodec/x86/qpeldsp_init.c @@ -1,6 +1,5 @@ /* - * Copyright (c) 2000, 2001 Fabrice Bellard - * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at> + * quarterpel DSP functions * * This file is part of FFmpeg. * Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 04:05:29 +02:00
Michael Niedermayer	98a6806fdd	Merge commit '368f50359eb328b0b9d67451f56fda20b3255f9a' * commit '368f50359eb328b0b9d67451f56fda20b3255f9a': dsputil: Split off quarterpel bits into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/h263dec.c libavcodec/mpegvideo.c libavcodec/mpegvideo_enc.c libavcodec/vc1dec.c libavcodec/vc1dsp.c libavcodec/x86/dsputil_init.c libavcodec/x86/qpeldsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 02:43:34 +02:00
Michael Niedermayer	40f3a87c10	Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3' * commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3': dsputil: Move APE-specific bits into apedsp Conflicts: libavcodec/arm/int_neon.S libavcodec/x86/dsputil.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 00:59:15 +02:00
Michael Niedermayer	c814a6c778	avcodec/x86/svq1enc_mmx: Add author attribution See: 5900637219ccccdd39ddafa4e7181da20b8e1f1b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 00:30:05 +02:00
Michael Niedermayer	ea0931fb96	Merge commit '65d5d5865845f057cc6530a8d0f34db952d9009c' * commit '65d5d5865845f057cc6530a8d0f34db952d9009c': dsputil: Move SVQ1 encoding specific bits into svq1enc Conflicts: libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-30 00:01:45 +02:00
James Almer	02a3e327f1	x86/dsputilenc: add missing guards to ff_pix_sum16_xop XOP support was added in Yasm 1.0.0 and Nasm 2.06, and we still support older versions. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 22:31:28 +02:00
Christophe Gisquet	99a319c4e7	x86: huffyuvdsp: port add_bytes to yasm C MMX SSE2 Cycles: 2972 587 302 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 21:56:00 +02:00
Christophe Gisquet	2267003981	x86: hpeldsp: better factorization Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 21:47:40 +02:00
Michael Niedermayer	7b4c46050e	rename add_hfyu_left_prediction_int16 to add_hfyu_left_pred_int16 This makes the naming more consistent with the 8bit variant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 19:50:44 +02:00
Michael Niedermayer	550ae6c02f	rename add_hfyu_median_prediction_int16 to add_hfyu_median_pred_int16 This makes the naming more consistent with the 8bit variant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 19:49:29 +02:00
Michael Niedermayer	40a4ab8ba4	rename sub_hfyu_median_prediction_int16 to sub_hfyu_median_pred_int16 This makes the naming more consistent with the 8bit variant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 19:48:23 +02:00
James Almer	05de4d3011	x86/dsputilenc: implement XOP version of pix_sum16 SSE2: 137 cycles XOP: 87 cycles Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 18:40:23 +02:00
Diego Biurrun	368f50359e	dsputil: Split off quarterpel bits into their own context	2014-05-29 06:48:31 -07:00
Diego Biurrun	054013a0fc	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
Diego Biurrun	65d5d58658	dsputil: Move SVQ1 encoding specific bits into svq1enc	2014-05-29 06:41:15 -07:00
Michael Niedermayer	b50559fc0b	libavcodec/x86/dsputilenc: drop and 0xffff that should have becomei redundant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 00:16:52 +02:00
James Almer	561bfc85eb	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 23:29:34 +02:00
Christophe Gisquet	0810608e23	x86: hevc_mc: better register allocation The xmm reg count was incorrect, and manual loading of the gprs furthermore allows to noticeable reduce the number needed. The modified functions are used in weighted prediction, so only a few samples like WP_* exhibit a change. For this one and Win64 (some widths removed because of too few occurrences): WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w 16 32 before: 2194 3872 after: 2119 3767 WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w 16 32 64 before: 2819 4960 9396 after: 2617 4788 9150 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 17:39:34 +02:00
Michael Niedermayer	48a6916308	Merge commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c' * commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c': dsputil: Split off HuffYUV encoding bits into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/dsputil.h libavcodec/huffyuv.h libavcodec/huffyuvenc.c libavcodec/pngenc.c libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 00:03:59 +02:00
Michael Niedermayer	e2abc0d5ca	Merge commit '0d439fbede03854eac8a978cccf21a3425a3c82d' * commit '0d439fbede03854eac8a978cccf21a3425a3c82d': dsputil: Split off HuffYUV decoding bits into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/dsputil.h libavcodec/huffyuv.h libavcodec/huffyuvdec.c libavcodec/lagarith.c libavcodec/vble.c libavcodec/x86/Makefile libavcodec/x86/dsputil.asm libavcodec/x86/dsputil_init.c libavcodec/x86/dsputil_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 23:16:06 +02:00
Diego Biurrun	512f3ffe9b	dsputil: Split off HuffYUV encoding bits into their own context Also shorten HuffYUV context member names to avoid clutter.	2014-05-27 08:54:53 -07:00
Diego Biurrun	0d439fbede	dsputil: Split off HuffYUV decoding bits into their own context Also shorten HuffYUV context member names to avoid clutter.	2014-05-27 08:52:34 -07:00
James Almer	5863207086	x86/dsputilenc: use HADDD in ff_sse16_sse2 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 15:12:50 +02:00
James Almer	e64e079ece	x86/dsputilenc: implement SSE2 version of diff_pixels Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 05:55:11 +02:00
Michael Niedermayer	a0c5cd3475	avcodec/x86/dsputilenc: set the count of SSE registers correctly for get_pixels Found-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-27 05:52:25 +02:00
Christophe Gisquet	86ae0da60c	x86: hpeldsp: propagate changes across codecs Some codecs still use mmx versions, so have them use the versions with newer instruction sets. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-26 15:37:04 +02:00
Michael Niedermayer	a3950a90f6	Revert "x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm" This reverts commit ad733089b024e4a3ff8f024d247a032f79a50ac8. breaks with --disable-yasm revert requested by: Christophe Gisquet <christophe.gisquet@gmail.com>	2014-05-25 19:42:18 +02:00
Timothy Gu	ad733089b0	x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm Signed-off-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-25 16:30:08 +02:00
James Almer	d94e255dd1	x86/dsputilenc: make the SUM_ABS_DCTELEM macro more readable Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-25 02:03:54 +02:00
James Almer	61eea421b2	x86/dsputilenc: port sum_abs_dctelem functions to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 21:46:25 +02:00
Christophe Gisquet	81aa0f4604	x86: hpeldsp: implement SSSE3 version of _xy2 Loading pb_1 rather than pw_8192 was benchmarked to be more efficient. Loading of the 2 yields no advantage. Loading of one saves ~11 cycles. decicycles count: put8: 3223(mmx) -> 2387 avg8: 2863(mmxext) -> 2125 put16: 4356(sse2) -> 3553 avg16: 4481(sse2) -> 3513 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 15:15:56 +02:00
Christophe Gisquet	9722a6a3f3	x86: hpeldsp: implement SSE2 put_pixels16_xy2 This is obviously equivalent to the avg version, without the avg. 3223(mmx) -> 2006(sse2) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 03:45:17 +02:00
Christophe Gisquet	f0aca50e0b	x86: hpeldsp: implement SSE2 versions Those are mostly used in codecs older than H.264, eg MPEG-2. put16 versions: mmx mmx2 sse2 x2: 1888 1185 552 y2: 1778 1092 510 avg16 xy2: 3509(mmx2) -> 2169(sse2) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 03:29:48 +02:00
James Almer	7538ad2248	x86/hevc_deblock: improve chroma functions register allocation Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-24 01:16:26 +02:00
James Almer	584327f22f	x86/dsputil: fix argument declaration in vector_clipf Should fix fate failures in msvc x86_64 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 23:10:17 +02:00
James Almer	518cbf9b4a	x86/dsputil: fix VECTOR_CLIP_INT32 macro The inline loop was incrementing and using the value of %%i the wrong way. Disassembly of ff_vector_clip_int32_sse2 before and after this patch: movdqa (%rdx),%xmm0 | movdqa (%rdx),%xmm0 movdqa 0x10(%rdx),%xmm1 | movdqa 0x10(%rdx),%xmm1 movdqa 0x20(%rdx),%xmm2 | movdqa 0x20(%rdx),%xmm2 movdqa 0x30(%rdx),%xmm3 | movdqa 0x30(%rdx),%xmm3 [...] | movdqa %xmm0,(%rcx) | movdqa %xmm0,(%rcx) movdqa %xmm1,0x10(%rcx) | movdqa %xmm1,0x10(%rcx) movdqa %xmm2,0x20(%rcx) | movdqa %xmm2,0x20(%rcx) movdqa %xmm3,0x30(%rcx) | movdqa %xmm3,0x30(%rcx) movdqa (%rdx),%xmm0 | movdqa 0x40(%rdx),%xmm0 movdqa 0x20(%rdx),%xmm1 | movdqa 0x50(%rdx),%xmm1 movdqa 0x40(%rdx),%xmm2 | movdqa 0x60(%rdx),%xmm2 movdqa 0x60(%rdx),%xmm3 | movdqa 0x70(%rdx),%xmm3 [...] | movdqa %xmm0,(%rcx) | movdqa %xmm0,0x40(%rcx) movdqa %xmm1,0x20(%rcx) | movdqa %xmm1,0x50(%rcx) movdqa %xmm2,0x40(%rcx) | movdqa %xmm2,0x60(%rcx) movdqa %xmm3,0x60(%rcx) | movdqa %xmm3,0x70(%rcx) add $0x80,%rdx | add $0x80,%rdx add $0x80,%rcx | add $0x80,%rcx Other versions were unaffected. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 22:59:55 +02:00
James Almer	6a4832caae	x86/diracdsp: mark all functions as yasm No inline asm dirac code remains in the tree, so replace every relevant check. This also moves all the dirac functions from dsputil_mmx.c to diracdsp_mmx.c Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 15:02:42 +02:00
James Almer	1d36defe94	x86/dsputil: port ff_vector_clipf_sse to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-23 00:08:21 +02:00
Christophe Gisquet	c081ca851c	x86: hpeldsp: avg_pixels_xy2 for mmx2&3dnow This is a port of the inline assembly of the mmx version to use the pavg(us|)b instruction. 8 16 mmx 1498 4355 mmx2 1242 3509 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-22 20:17:49 +02:00
Christophe Gisquet	17ac998055	x86: hpeldsp: mark _xy2 versions as approximate Currently, only the mmx version is bitexact, the others (mmxext and 3dnow) are not, in spite of their naming. Therefore, make their name more obvious. Also restore a comment that was removed in 71155d7b. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-22 20:17:45 +02:00
Christophe Gisquet	f8de35ebc4	x86: hpeldsp: kill hpeldsp_mmx.c before: 1987 decicycles in 8_x2, 262121 runs, 23 skips after: 1902 decicycles in 8_x2, 262112 runs, 32 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-22 20:17:40 +02:00
James Almer	80ee2dfcf6	x86/dsputil: port ff_put_signed_pixels_clamped_mmx to yasm Also add an SSE2 version Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 23:33:45 +02:00
James Almer	7b05267239	x86/dsputil: port clear_block functions to yasm Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 23:33:45 +02:00
Michael Niedermayer	3d4e365073	avcodec/x86/hpeldsp_init: remove redundant if() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 13:38:27 +02:00
Hendrik Leppkes	cd9e08e110	hpeldsp: fix build without inline asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 13:37:38 +02:00
Christophe Gisquet	d1a32c3f49	x86: kill fpel_mmx.c Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-21 03:25:08 +02:00


1646 Commits