ffmpeg

Author	SHA1	Message	Date
James Almer	c792528970	x86/imdct36: use extractps inside the STORE macro Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-01-28 13:35:15 -03:00
Derek Buitenhuis	ea2df33052	Merge commit '4f22b138886e29f7fffa8c715673951e51be9f32' Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2016-01-27 18:23:31 +00:00
James Almer	209f50e16b	avcodec/synth_filter: split off remaining code from dcadec files Signed-off-by: James Almer <jamrial@gmail.com>	2016-01-25 14:57:38 -03:00
Geza Lore	d39c229e54	x86inc: Add debug symbols indicating sizes of compiled functions Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they get can confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g. this fixes `gdb`, but not `perf`. Currently only implemented for ELF.	2016-01-21 23:19:46 +01:00
Ronald S. Bultje	0f88b3f82f	videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations. This can overread (either before start or beyond end) of the buffer in Nx1 (i.e. height=1) images. Fixes mozilla bug 1240080.	2016-01-18 11:12:47 -05:00
Diego Biurrun	4f22b13888	x86: ac3dsp: Drop forward declaration for nonexisting function	2016-01-18 11:55:38 +01:00
James Darnley	f59b727e2f	avcodec/v210: guard new avx2 functions from old assemblers	2016-01-17 21:23:58 +01:00
James Darnley	2cba1825f7	avcodec/v210: add avx2 version of the 10-bit line encoder Around 25% faster than the ssse3 version.	2016-01-17 16:03:43 +01:00
James Darnley	3836f404a8	avcodec/v210: add avx2 version of the 8-bit line encoder Around 35% faster than the avx version. Signed-off-by: Henrik Gramner <henrik@gramner.com>	2016-01-17 16:03:43 +01:00
Michael Niedermayer	da6f34516b	avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse() this should fix checkasm on x86_64-archlinux-gcc-valgrind Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-01-15 17:08:37 +01:00
Hendrik Leppkes	2214207d04	Merge commit '8563f9887194b07c972c3475d6b51592d77f73f7' * commit '8563f9887194b07c972c3475d6b51592d77f73f7': x86: use emms after ff_int32_to_float_fmul_scalar_sse Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 13:27:11 +01:00
Hendrik Leppkes	a9cd11b212	Merge commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c' * commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c': x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 13:23:25 +01:00
Hendrik Leppkes	d03da3e240	Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be' * commit '2008f76054906e9ff6bf744800af0e5a5bfe61be': dca: remove unused decode_hf function and quant_d tables Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 13:17:48 +01:00
Hendrik Leppkes	00e91d0676	Merge commit '5dfe4edad63971d669ae456b0bc40ef9364cca80' * commit '5dfe4edad63971d669ae456b0bc40ef9364cca80': x86_64: int32_to_float_fmul_scalar sign extend integer length Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 10:46:18 +01:00
Janne Grunau	8563f98871	x86: use emms after ff_int32_to_float_fmul_scalar_sse Intel's Instruction Set Reference (as of September 2015) clearly states that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the source is a memory location. The Instruction Set Reference from 1999 (Order Number 243191) describes this behaviour but all later versions I've seen have make no distinction whether MMX registers or memory is used as source. The documentation for the matching SSE2 instruction to convert to double (cvtpi2pd) was fixed (see the valgrind bug https://bugs.kde.org/show_bug.cgi?id=210264). It will take time to get a clarification and fixes in place. In the meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to be correct according to the documentation. The vast majority of users will have SSE2 so a change to the SSE version has little effect. Fixes fate-checkasm on x86 valgrind targets. Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059	2015-12-30 13:37:57 +01:00
Janne Grunau	f4f27e4cf1	x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly This reverts commit `5dfe4edad6`.	2015-12-29 11:42:51 +01:00
Alexandra Hájková	2008f76054	dca: remove unused decode_hf function and quant_d tables They were superseded with their integer equivalents. Rename integer decode_hf to decode_hf.	2015-12-24 13:58:18 +01:00
James Almer	d4c47333e1	x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12} Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-12-20 17:01:15 -03:00
James Almer	3ff2beff65	x86/hevc_sao: simplify sao_edge_filter 10/12bit Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-12-20 16:45:37 -03:00
James Almer	34b2bd03cf	x86/hevc_sao: simplify sao_band_filter 10/12bit Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-12-20 16:42:36 -03:00
Janne Grunau	5dfe4edad6	x86_64: int32_to_float_fmul_scalar sign extend integer length	2015-12-14 16:42:35 +01:00
Dave Yeo	b0b133b8c0	hevcdsp: use a macro for .rodata section fixes assembling on OS/2 Signed-off-by: Dave Yeo <dave.r.yeo@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2015-12-11 16:19:30 +01:00
Kieran Kunhya	3f07f12f65	diracdec: Template DSP functions adding 10-bit versions	2015-12-10 18:25:02 +00:00
Anton Khirnov	e7078e842d	hevcdsp: add x86 SIMD for MC	2015-12-05 21:11:52 +01:00
Timothy Gu	4b80b895a9	pixblockdsp: x86: Condense diff_pixels_* to a shared macro Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Reviewed-by: James Almer <jamrial@gmail.com>	2015-11-07 14:31:34 -08:00
Ganesh Ajjanagadde	38f4e973ef	all: fix -Wextra-semi reported on clang This fixes extra semicolons that clang 3.7 on GNU/Linux warns about. These were trigggered when built under -Wpedantic, which essentially checks for strict ISO compliance in numerous ways. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>	2015-10-24 17:58:17 -04:00
Ronald S. Bultje	52f84d82bd	videodsp: don't overread edges in vfix3 emu_edge. Fixes trac ticket 3226. Also see Andreas' analysis in https://bugs.debian.org/801745, which was very helpful.	2015-10-24 14:34:50 -04:00
Michael Niedermayer	ea5a1d1485	avcodec/x86/vc1dsp: Remove unused macro Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-10-22 21:13:42 +02:00
Carl Eugen Hoyos	775b84e30e	lavc/x86/vc1dsp_init: Fix compilation with --disable-yasm.	2015-10-22 11:37:42 +02:00
James Almer	73353af6e5	x86/Makefile: move decoder/encoder objects out of the subsystems section Signed-off-by: James Almer <jamrial@gmail.com>	2015-10-22 03:55:18 -03:00
Timothy Gu	ab5f43e634	vc1dsp: Port ff_vc1_put_ver_16b_shift2_mmx to yasm This function is only used within other inline asm functions, hence the HAVE_MMX_INLINE guard. Per recent discussions, we should not worry about the performance of inline asm-only builds.	2015-10-21 20:01:52 -07:00
Timothy Gu	98da061461	huffyuvencdsp: Cherry pick changes left out in the last commit Oops.	2015-10-21 12:42:33 -07:00
Timothy Gu	5e586e1bef	huffyuvencdsp: Add ff_diff_bytes_{sse2,avx2} SSE2 version 4%-35% faster than MMX depending on the width. AVX2 version 1%-13% faster than SSE2 depending on the width.	2015-10-21 12:25:32 -07:00
Timothy Gu	6b41b44149	huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm Heavily based upon ff_add_bytes by Christophe Gisquet. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2015-10-20 18:24:54 -07:00
Timothy Gu	068e6cb732	huffyuvencdsp: Use intptr_t for width It is done this way in huffyuvdsp as well.	2015-10-19 16:57:33 -07:00
Timothy Gu	a079cbf458	x86: vc1dsp_mmx: Move yasm initiation steps to vc1dsp_init That's where all yasm initiation steps are. Also removes the overlap between the two files.	2015-10-19 16:52:52 -07:00
Timothy Gu	607f820ec7	x86: fpel: Remove erroneous ff_put_pixels8_mmxext prototype This function does not exist.	2015-10-19 16:52:37 -07:00
Timothy Gu	cb6f1f8bf9	x86: fpel: Move prototypes for 4-px block functions	2015-10-19 16:52:33 -07:00
James Almer	74a87ae210	x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2 Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-10-13 20:21:33 -03:00
Christophe Gisquet	74c414202f	x86: simple_idct10_template: use const This avoid going through constants.c while still sharing them with proresdsp.asm Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-10-13 22:52:33 +02:00
Ronald S. Bultje	e578638382	vp9: use registers for constant loading where possible.	2015-10-13 11:06:01 -04:00
Ronald S. Bultje	408bb8556f	vp9: refactor itx coefficients and share between 8 and 10/12bpp.	2015-10-13 11:06:01 -04:00
Ronald S. Bultje	eb4b5ff738	vp9: add itxfm_add eob shortcuts to 10/12bpp functions. These aren't quite as helpful as the ones in 8bpp, since over there, we can use pmulhrsw, but here the coefficients have too many bits to be able to take advantage of pmulhrsw. However, we can still skip cols for which all coefs are 0, and instead just zero the input data for the row itx. This helps a few % on overall decoding speed.	2015-10-13 11:06:01 -04:00
Ronald S. Bultje	488fadebbc	vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version.	2015-10-13 11:06:00 -04:00
Ronald S. Bultje	3d0ca2fe89	vp9: 10/12bpp sse2 SIMD for iadst16.	2015-10-13 11:06:00 -04:00
Ronald S. Bultje	0e80265b0a	vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16.	2015-10-13 11:06:00 -04:00
Ronald S. Bultje	1338fb79d4	vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16.	2015-10-13 11:06:00 -04:00
Ronald S. Bultje	cb054d061a	vp9: add 10/12bpp sse2 SIMD versions of iadst8x8.	2015-10-13 11:05:59 -04:00
Ronald S. Bultje	e0610787b2	vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8.	2015-10-13 11:05:59 -04:00
Ronald S. Bultje	a35f6bdb38	vp9: add 12bpp sse2 versions of iadst4.	2015-10-13 11:05:59 -04:00

2096 Commits