ffmpeg

Author	SHA1	Message	Date
Christophe Gisquet	9dc45d1f42	x86: lavc: share more constants Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 23:35:02 +01:00
Mickaël Raulet	6ecc3fd612	x86/hevc_mc: use aligned loads Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 21:38:00 +01:00
James Almer	383fddeec6	x86/lossless_audiodsp: fix compilation with --disable-yasm Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-06 17:30:17 -03:00
James Almer	aea29a891f	x86/hevc_sao: fix loading of RIP address pb_eo must be handled as a rip relative address for MSVC64, so an intermediate register is needed. Should fix link failures. Suggested by Hendrik Leppkes and Christophe Gisquet. Tested-By: Hendrik Leppkes <h.leppkes@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-06 15:06:15 -03:00
Mickaël Raulet	bcb0925115	x86/hevc: use CLIPW macro when possible Conflicts: libavcodec/x86/hevc_mc.asm Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 17:38:47 +01:00
Christophe Gisquet	5eedd36df1	x86: hevc_mc: use epel_hv 16-wide function The epel_hv functions were still relying on epel_hv 8-wide being the maximum width instantiated. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 17:37:56 +01:00
Pierre Edouard Lepere	a0d1300f71	x86: hevc_mc: add AVX2 optimizations before 33304 decicycles in luma_bi_1, 523066 runs, 1222 skips 38138 decicycles in luma_bi_2, 523427 runs, 861 skips 13490 decicycles in luma_uni, 516138 runs, 8150 skips after 20185 decicycles in luma_bi_1, 519970 runs, 4318 skips 24620 decicycles in luma_bi_2, 521024 runs, 3264 skips 10397 decicycles in luma_uni, 515715 runs, 8573 skips Conflicts: libavcodec/x86/hevc_mc.asm libavcodec/x86/hevcdsp_init.c Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 17:20:47 +01:00
Michael Niedermayer	a6c2c8fe3f	Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar" This reverts commit `3b4ffba3af`. Unbreaks the SSSE3 code on mingw32 Conflicts: libavcodec/x86/lossless_audiodsp.asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 02:31:45 +01:00
Michael Niedermayer	f1214763af	avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code This is simpler and more robust, and fixes mismatched XMM save/restore Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 02:18:54 +01:00
Michael Niedermayer	3b4ffba3af	avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar This is needed as the mmx code is used as fallback from the ssse3 code Suggested-by: jamrial Tested-by: wm4 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 00:20:59 +01:00
James Almer	15574c505b	x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2} Original x86 intrinsics code by Pierre-Edouard Lepere. Yasm port, refactoring and optimizations by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips 29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips 13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips Width 64 581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips 59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips 28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-05 15:02:33 -03:00
James Almer	042c1159fc	x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2} Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere. Refactoring and optimizations by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips Width 64 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-05 15:02:27 -03:00
James Almer	aa945dc112	x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-02 00:01:35 -03:00
James Almer	71e2cb4706	x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-01 21:45:52 -03:00
Christophe Gisquet	bff7feb328	x86: hevc/sao: aligned source buffers Useful for at least band filter, for which: - Band filter call only: 32 64 Before: 16556 54015 After: 16497 52355 - Whole case: 32 64 Before: 37031 103008 After: 32045 93952	2015-02-01 20:22:54 -03:00
James Almer	fa3eccb4f9	x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2} Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere. 10/12bit yasm ports, refactoring and optimizations by James Almer Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U width 32 40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips 8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips 7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips 4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips width 64 136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips 28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips 26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips 14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-01 20:22:35 -03:00
Christophe Gisquet	7aeafacfd0	x86/sbrdsp: Use different mem moves Before 2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips After 2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-25 18:20:43 -03:00
James Almer	449b21bfab	x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3} 2 to 2.5 times faster. Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-25 18:20:39 -03:00
James Almer	08810a8895	x86/flacdsp: remove unneeded ifdeffery x86inc can translate r*m into a register or stack on its own Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-05 16:29:28 -03:00
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-30 23:05:27 -03:00
Ronald S. Bultje	3aefca68ca	vp9/x86: add myself to copyright holders for loopfilter assembly.	2014-12-27 16:55:16 -05:00
Ronald S. Bultje	afd8c464b7	vp9/x86: make filter_16_h work on 32-bit.	2014-12-27 16:55:16 -05:00
Ronald S. Bultje	b26bc3520f	vp9/x86: make filter_48/84/88_h work on 32-bit.	2014-12-27 16:55:15 -05:00
Ronald S. Bultje	8a1cff1c35	vp9/x86: make filter_44_h work on 32-bit.	2014-12-27 16:55:15 -05:00
Ronald S. Bultje	047088b8c6	vp9/x86: make filter_16_v work on 32-bit.	2014-12-27 16:55:14 -05:00
Ronald S. Bultje	0cc9c23ea1	vp9/x86: make filter_48/84_v work on 32-bit.	2014-12-27 16:55:14 -05:00
Ronald S. Bultje	6433a9133f	vp9/x86: make filter_88_v work on 32-bit.	2014-12-27 16:55:14 -05:00
Ronald S. Bultje	75f8e52089	vp9/x86: make filter_44_v work on 32-bit.	2014-12-27 16:55:13 -05:00
Ronald S. Bultje	7f80c3344c	vp8/x86: save one register in SIGN_ADD/SUB.	2014-12-27 16:55:13 -05:00
Ronald S. Bultje	8ea2194ebb	vp9/x86: store unpacked intermediates for filter6/14 on stack. filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.	2014-12-27 16:55:13 -05:00
Ronald S. Bultje	e42409479f	vp8/x86: move variable assigned inside macro branch. The value is not used outside the branch.	2014-12-27 16:55:12 -05:00
Ronald S. Bultje	418c202c63	vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning.	2014-12-27 16:55:12 -05:00
Ronald S. Bultje	d1c55654e1	vp8/x86: remove unused register from ABSSUB_CMP macro.	2014-12-27 16:55:12 -05:00
Ronald S. Bultje	e59bd08986	vp9/x86: slightly simplify 44/48/84/88 h stores.	2014-12-27 16:55:11 -05:00
Ronald S. Bultje	8132629bd5	vp9/x86: make cglobal statement more conservative in register allocation.	2014-12-27 16:55:11 -05:00
Ronald S. Bultje	c013ca58c5	vp9/x86: save one register in loopfilter surface coverage.	2014-12-27 16:55:11 -05:00
James Almer	32c836cb11	x86/vp9: remove duplicate function prototypes Fixes "redundant redeclaration" warnings. Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-23 00:56:51 -03:00
James Almer	7696e429c7	x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-20 13:25:43 +01:00
James Almer	a4d62f7775	x86/constants: fix alignment of pw_255 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-19 20:21:34 +01:00
Ronald S. Bultje	bdc1e3e3b2	vp9/x86: intra prediction sse2/32bit support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-19 14:07:19 +01:00
Ronald S. Bultje	b6e1711223	vp9/x86: invert hu_ipred left array ordering. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-19 14:07:18 +01:00
Ronald S. Bultje	0a7964dca5	vp9/x86: save one register on 32bit idct32x32. Fixes build on win32. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-16 02:51:26 +01:00
Ronald S. Bultje	cae893f692	vp9/x86: sse2 MC assembly. Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-15 02:34:05 +01:00
Ronald S. Bultje	fd77fbb390	vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-15 00:38:05 +01:00
Michael Niedermayer	a03f72e744	avcodec/x86/hevc_mc: fix sse register counts These fix failures of --enable-xmm-clobber-test It would be better to change the code to use fewer registers, but until someone does the used register count must not be too small Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-11 13:17:26 +01:00
Michael Niedermayer	d43d5c5707	avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-10 07:34:49 +01:00
Michael Niedermayer	ed9be7dd47	avcodec/x86/pngdsp: fix off by 1 error This fixes artifacts in the last pixel of rows with some widths and pixel formats Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com> Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-08 18:24:40 +01:00
Michael Niedermayer	1d048f762d	Merge commit '9a738c27dceb4b975784b23213a46f5cb560d1c2' * commit '9a738c27dceb4b975784b23213a46f5cb560d1c2': v210enc: Add SIMD optimised 8-bit and 10-bit encoders Conflicts: libavcodec/v210enc.c libavcodec/v210enc.h libavcodec/x86/Makefile libavcodec/x86/v210enc.asm libavcodec/x86/v210enc_init.c tests/ref/vsynth/vsynth1-v210 tests/ref/vsynth/vsynth2-v210 See: `36091742d1` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-06 01:54:10 +01:00
Kieran Kunhya	9a738c27dc	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2014-12-05 13:03:49 +00:00
Reimar Döffinger	49d9cbe55d	h264_i386: Fix operand size Fixes fate failure on macosx clang x86-64 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-03 23:03:13 +01:00
Christophe Gisquet	9fa056ba75	pngdsp x86: use unaligned access For test images manually generated to contain only up prediction, timing results: 8380x3032 255x185 before: 138635 1992 after: 139232 1996 Actually jumping to the proper version depending on the alignment: 8380x3032: 138767 A 0.5% speed improvement for gigantic images is not worth the code duplication. Fixes ticket #4148 Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com> Tested-by: Benoit Fouet <benoit.fouet@free.fr> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-03 11:56:22 +01:00
Kieran Kunhya	36091742d1	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-26 20:30:47 +01:00
Michael Niedermayer	ea41e6d637	Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600' * commit '9c12c6ff9539e926df0b2a2299e915ae71872600': motion_est: convert stride to ptrdiff_t Conflicts: libavcodec/me_cmp.c libavcodec/ppc/me_cmp.c libavcodec/x86/me_cmp_init.c See: `9c669672c7` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-24 12:13:00 +01:00
Vittorio Giovara	9c12c6ff95	motion_est: convert stride to ptrdiff_t CC: libav-stable@libav.org Bug-Id: CID 700556 / CID 700557 / CID 700558	2014-11-24 01:30:10 +00:00
Carl Eugen Hoyos	600e38f563	Fix standalone compilation of the apng decoder on x86.	2014-11-23 13:21:29 +01:00
Michael Niedermayer	65ce8f8895	avcodec/x86/Makefile: fix order Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-23 01:49:04 +01:00
Michael Niedermayer	d3512a0e89	avcodec/x86/lossless_audiodsp: fix fallback code for 32bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-22 21:08:38 +01:00
Michael Niedermayer	4327088da3	avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-22 20:40:36 +01:00
Reimar Döffinger	478c61ccb2	h264_i386: Optimize decode_significance_8x8_x86 for 64 bit. 11674 -> 10877 decicycles on my Phenom II. Overall speedup was unfortunately within measurement error. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-11-22 14:06:48 +01:00
James Almer	3cec54b7d7	x86/flacdsp: add SSE2 and AVX decorrelate functions Two to four times faster depending on instruction set, block size and channel count.	2014-11-13 13:47:55 -03:00
James Almer	84ccc317ce	x86/flacdsp: separate decoder and encoder dsp initialization Signed-off-by: James Almer <jamrial@gmail.com>	2014-11-12 14:41:45 -03:00
James Almer	7292b0477a	x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx Handle it inside the __asm__() block. Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-23 13:11:05 -03:00
Michael Niedermayer	3c1378ce0a	Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2' * commit '2d91abade29e43bb45c881d45909b8ee77e904e2': x86: h264_intrapred: Don't treat 32-bit integers as 64-bit Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-10-08 11:48:58 +02:00
Henrik Gramner	2d91abade2	x86: h264_intrapred: Don't treat 32-bit integers as 64-bit The upper halves are not guaranteed to be zero in x86-64. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-10-08 08:15:52 +00:00
Mickaël Raulet	4ba6371a83	x86/hevc: get rid of packusdw for ssse3 compatibility cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2 Fixes out of array access Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-10-04 21:14:15 +02:00
James Almer	0de1d6287e	x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2} 2x to 2.5x faster than the C version. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-02 22:11:55 -03:00
James Almer	acebff8e5d	x86/mpegvideoencdsp: improve ff_pix_sum16_sse2 ~15% faster. Also add an mmxext version that takes advantage of the new code, and build it alongside with the mmx version only on x86_32. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-01 13:07:22 -03:00
Michael Niedermayer	d22e88d120	avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse* Fixes acodec-dca2 fate failure Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-28 19:04:06 +02:00
James Almer	26cd7b1e1a	x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2} About two times faster than the c wrapper. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-26 20:48:40 -03:00
Carl Eugen Hoyos	c0f9df30dd	lavc/x86/idctdsp.h: Fix make checkheaders.	2014-09-25 22:18:25 +02:00
James Almer	a829870b2f	avcodec/svq1enc: align buffer used by simd functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-25 16:00:20 -03:00
James Almer	4b892e469b	x86/cavsdsp: fix buffer alignment in cavs_idct8_add_mmx() It may be used by ff_add_pixels_clamped_sse2(). Should fix fate-cavs failures on some systems. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-25 16:00:16 -03:00
James Almer	4f4f08e6f0	x86/idctdsp: port {put,add}_pixels_clamped to yasm Also add sse2 versions for both. put_pixels_clamped port and sse2 version originally written by Timothy Gu. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 21:52:13 -03:00
James Almer	c99a882814	avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 21:43:19 -03:00
James Almer	ad26e83f9c	avcodec/x86: use function pointers for {put,add}_pixels_clamped Same behavior as in simple_idct. This way the best optimized versions available will be used instead. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 18:52:32 -03:00
James Almer	70277d1d23	x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2 ~15% faster than sse2. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 16:12:55 -03:00
James Almer	164d6c7f5b	x86/videodsp: fix warning about discarded 'const' qualifier Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-23 19:59:20 -03:00
James Almer	6b2caa321f	x86/vp9: add AVX and AVX2 MC Roughly 25% faster MC than ssse3 for blocksizes 32 and 64. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-22 22:35:03 -03:00
James Almer	33c752be51	x86/me_cmp: port mmxext vsad functions to yasm Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of vsad16 and vsad_intra16. Since vsad8 and vsad16 are not bitexact, they are accordingly marked as approximate. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-19 20:50:20 -03:00
James Almer	77f9a81cca	x86/me_cmp: combine sad functions into a single macro No point in having the sad8 functions separate now that the loop is no longer unrolled. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-17 23:52:36 -03:00
Michael Niedermayer	41d82b85ab	avcodec/x86/vp9lpf: Always include x86util.asm Fixes executable stack Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 23:37:46 +02:00
Michael Niedermayer	85f2c0124d	avcodec/x86/me_cmp: fix sad8xh This adds back support for 8x4 and 8x16 it does not support 8x2, i think nothing uses that Found-by: ubitux Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 14:08:24 +02:00
James Almer	0456d169c4	x86/me_cmp: port mmxext and sse2 sad functions to yasm Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext). Since the _xy2 versions are not bitexact, they are accordingly marked as approximate. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 11:12:50 +02:00
James Almer	52ec81c67d	x86/hevc_res_add: add missing guards to hevc_transform_add32_8_avx2 Should fix compilation with old Yasm/Nasm versions. Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-04 23:34:01 -03:00
James Almer	c3d2426cca	x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2 ~20% faster than AVX. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-04 20:21:29 -03:00
James Darnley	46ef45ab59	lavc/x86/v210: give cpuflag to INIT macro This lets the cglobal macro automatically append a suffix to the function name. This means that INIT_XMM avx must be used rather than INIT_AVX. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-05 00:35:07 +02:00
Michael Niedermayer	5b58d79a99	Merge commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453' * commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453': xvid: Add C IDCT Conflicts: libavcodec/dct-test.c libavcodec/xvididct.c See: `298b3b6c1f` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-03 04:09:38 +02:00
Michael Niedermayer	5db23c07a3	Merge commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b' * commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b': idctdsp: Add global function pointers for {add|put}_pixels_clamped functions Conflicts: libavcodec/arm/idctdsp_init_arm.c libavcodec/dct.h libavcodec/idctdsp.c libavcodec/jrevdct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-03 03:19:40 +02:00
Pascal Massimino	7a1d6ddd2c	xvid: Add C IDCT Thanks to Pascal Massimino and Michael Militzer for relicensing as LGPL. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-09-02 14:41:13 -07:00
Diego Biurrun	95c0cec03a	idctdsp: Add global function pointers for {add|put}_pixels_clamped functions These function pointers already existed in the ARM code. Adding them globally allows calls to the function pointers to access arch-optimized versions of the functions transparently.	2014-09-02 14:41:13 -07:00
Reimar Döffinger	d9e2aceb7f	Add missing "const" all over the place. Only "./configure --enable-gpl" on x86 was tested. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-08-29 18:57:25 +02:00
Michael Niedermayer	5403a288a7	Merge commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc' * commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc': x86: xvid: K&R formatting cosmetics Conflicts: libavcodec/x86/xvididct_sse2.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:20:39 +02:00
Michael Niedermayer	b3b05a11d3	Merge commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5' * commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5': cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs Conflicts: libavcodec/mpeg4videodec.c libavcodec/x86/Makefile libavcodec/x86/dct-test.c libavcodec/x86/xvididct_sse2.c libavcodec/xvididct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:09:30 +02:00
Michael Niedermayer	3ff5ca89fc	Merge commit '1f156af4274dc72d588620f6bedb4e9e66023c92' * commit '1f156af4274dc72d588620f6bedb4e9e66023c92': x86: xvid_idct: Drop unused definitions Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:01:54 +02:00
Diego Biurrun	8d27bf1cff	x86: xvid: K&R formatting cosmetics	2014-08-27 05:58:04 -07:00
Diego Biurrun	dcb7c868ec	cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs	2014-08-27 04:54:05 -07:00
Diego Biurrun	1f156af427	x86: xvid_idct: Drop unused definitions	2014-08-27 04:36:41 -07:00
Christophe Gisquet	3e892b2bcd	x86: hevc_mc: split calls differently In some cases, 2 or 3 calls are performed to functions for unusual widths. Instead, perform 2 calls for different widths to split the workload. The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't be processed that way without modifications: some calls use unaligned buffers, and having branches to handle this was resulting in no micro-benchmark benefit. For block_w == 12 (around 1% of the pixels of the sequence): Before: 12758 decicycles in epel_uni, 4093 runs, 3 skips 19389 decicycles in qpel_uni, 8187 runs, 5 skips 22699 decicycles in epel_bi, 32743 runs, 25 skips 34736 decicycles in qpel_bi, 32733 runs, 35 skips After: 11929 decicycles in epel_uni, 4096 runs, 0 skips 18131 decicycles in qpel_uni, 8184 runs, 8 skips 20065 decicycles in epel_bi, 32750 runs, 18 skips 31458 decicycles in qpel_bi, 32753 runs, 15 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 12:05:33 +02:00
Christophe Gisquet	38e2aa3759	x86: hevc_mc: correct unneeded use of SSE4 code Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 11:43:33 +02:00
Christophe Gisquet	2346f2b5db	x86: hevcdsp: use compilation-time-fixed constant The stride for some buffers is known. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 16:26:30 +02:00
Christophe Gisquet	dad7f15567	hevcdsp: remove more instances of compile-time-fixed parameters Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 15:22:42 +02:00
Christophe Gisquet	d4f44b66d3	hevcdsp: remove compilation-time-fixed parameter The dststride parameter is always MAX_PB_SIZE. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 14:57:37 +02:00
Christophe Gisquet	fb1a98ec5b	x86: hevc_mc: assume 2nd source stride is 64 Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 13:21:37 +02:00
James Almer	54ca4dd43b	x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8 * Reduced xmm register count to 7 (As such they are now enabled for x86_32). * Removed four movdqa (affects the sse2 version only). * pxor is now used to clear m0 only once. ~5% faster. Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-21 15:01:33 -03:00
James Almer	76a99d467f	x86/hevc_res_add: add ff_hevc_transform_add{8,16,32}_8_avx ~15% faster than sse2 Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-20 16:54:52 -03:00
James Almer	9f498f4e6f	x86/hevc_res_add: fix register count in hevc_transform_add{16,32}_10_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-19 21:34:52 -03:00
Pierre Edouard Lepere	a6af4bf64d	x86: hevc: adding transform_add Reviewed-by: James Almer <jamrial@gmail.com> Approved-by: Ronald S. Bultje Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-20 01:28:56 +02:00
Michael Niedermayer	3bb2297351	Merge commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6' * commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6': build: Add explanatory comments to (optimization) blocks in the Makefiles Conflicts: libavcodec/ppc/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-15 20:25:12 +02:00
Michael Niedermayer	c1df467d73	Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' * commit '835f798c7d20bca89eb4f3593846251ad0d84e4b': mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes Conflicts: libavcodec/h261dec.c libavcodec/intrax8.c libavcodec/mjpegenc.c libavcodec/mpeg12dec.c libavcodec/mpeg12enc.c libavcodec/mpeg4videoenc.c libavcodec/mpegvideo.c libavcodec/mpegvideo.h libavcodec/mpegvideo_enc.c libavcodec/rv10.c libavcodec/x86/mpegvideoenc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-15 20:11:56 +02:00
Diego Biurrun	efd26bedec	build: Add explanatory comments to (optimization) blocks in the Makefiles	2014-08-15 02:55:21 -07:00
Diego Biurrun	835f798c7d	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
James Darnley	54a51d3840	lavc/flacenc: partially unroll loop in flac_enc_lpc_16 It now does 12 samples per iteration, up from 4. From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall. Runtime is reduced by a further 2 to 18%. Overall runtime reduced by 4 to 50%. Same conditions as before apply. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-13 03:09:26 +02:00
James Darnley	0081a14e7d	lavc/flacenc: add sse4 version of the 16-bit lpc encoder From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The speed-up generally increases with compression_level. This lpc encoder is not used with levels < 3 so it provides no speed-up in these cases. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-13 01:14:47 +02:00
Ronald S. Bultje	45bed0ab30	vp9/x86: fix bug in intra_pred_hd_32x32. Fixes mismatch in first keyframe in sample ffvp9_fails_where_libvpx.succeeds.webm from ticket 3849. There's still a second mismatch a few frames into the sample. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-12 13:11:21 +02:00
James Almer	c97870d1a1	x86/dca: remove unused header Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-12 12:46:53 +02:00
James Almer	e20ff251a6	x86/ttadsp: remove an unnecessary mova Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-12 12:29:05 +02:00
Michael Niedermayer	3841f2ae66	Merge commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36' * commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36': avcodec: Rename xvidmmx IDCT to xvid Conflicts: doc/APIchanges libavcodec/version.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-09 12:11:13 +02:00
Michael Niedermayer	0dcebb9f63	Merge commit '84d173d3de97c753234ab0c0b50551d51413d663' * commit '84d173d3de97c753234ab0c0b50551d51413d663': xvididct: Ensure that the scantable permutation is always set correctly Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-08 22:17:04 +02:00
Diego Biurrun	d35b94fbab	avcodec: Rename xvidmmx IDCT to xvid The Xvid IDCT is not MMX-specific.	2014-08-08 11:13:30 -07:00
Diego Biurrun	84d173d3de	xvididct: Ensure that the scantable permutation is always set correctly This fixes cases where the scantable permutation would get overwritten by the general idctdsp initialization.	2014-08-08 11:13:29 -07:00
Christophe Gisquet	75837e9add	x86: sbrdsp/fft: reuse ps_neg constant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:25:08 +02:00
Christophe Gisquet	51dd80e751	x86: diracdsp: reuse constants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:25:02 +02:00
Christophe Gisquet	6622a6cff3	x86: dwt: better share constants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:24:57 +02:00
Christophe Gisquet	71db2d08b1	x86: better share ff_pw_2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:24:49 +02:00
Christophe Gisquet	4e128ab0b1	x86: vpx/h264/hevc/mpeg2: share constants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 18:36:31 +02:00
Michael Niedermayer	305f72aee7	avcodec: Change get_pixels() to ptrdiff_t linesize Found-by: ubitux Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 15:50:54 +02:00
Christophe Gisquet	6786848585	hevc_deblock: change tc type The x86 asm expects int32_t so use that type. Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 12:38:26 +02:00
James Almer	de417982e8	x86/vp9lpf: use fewer instructions in SPLATB_MIX Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-05 02:47:54 +02:00
Christophe Gisquet	e8c003edd2	x86: hevc_deblock: remove unnecessary masking The unpacks/shuffles later on makes it unnecessary. Before: 1508 decicycles in h, 2096759 runs, 393 skips 2512 decicycles in v, 2095422 runs, 1730 skips After: 1477 decicycles in h, 2096745 runs, 407 skips 2484 decicycles in v, 2095297 runs, 1855 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 17:46:04 +02:00
James Almer	b7863c972c	x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12} Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 14:47:15 +02:00
James Almer	b1a44e6bf5	x86/hevc_mc: remove an unnecessary pxor Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 14:35:08 +02:00
James Almer	d0f56ca071	x86/hevc_deblock: improve 8bit transpose store macros Up to four instructions less depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-03 04:24:15 +02:00
Michael Niedermayer	f54e01c24e	Merge commit 'a786c8259dafeca9744252230b5d78f67810770c' * commit 'a786c8259dafeca9744252230b5d78f67810770c': idct: Split off Xvid IDCT Conflicts: libavcodec/Makefile libavcodec/mpeg4videodec.c libavcodec/x86/Makefile libavcodec/x86/idctdsp_init.c This split is somewhat restructured leaving the xvid IDCT available outside mpeg4 if manually selected. The code also could not be merged unchanged as it conflicted with a bugfix in FFmpeg Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-01 16:21:52 +02:00
Diego Biurrun	a786c8259d	idct: Split off Xvid IDCT The Xvid IDCT is only required to decode some Xvid-encoded MPEG-4 files, so there is no point in having it as an unconditional part of idctdsp.	2014-08-01 01:25:18 -07:00
James Almer	62baf5b853	x86/hevc_deblock: use existing x86util transpose macro in chroma_{10, 12} Cosmetic change. No measurable difference in speed. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-31 22:56:21 +02:00
Christophe Gisquet	a507623bad	x86: hevc_mc: fix register count usage A macro was using a fixed register, causing too many GPRs to be declared as used. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 22:50:50 +02:00
James Almer	73c4f63ba5	x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10, 12}_avx ~5% faster than SSSE3 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 14:04:59 +02:00
James Almer	88ba821f23	x86/hevc_deblock: improve luma functions register allocation Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 13:38:05 +02:00
James Almer	c74b08c5c6	x86/hevc_deblock: remove some unnecessary instructions Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 13:27:44 +02:00
James Almer	4f91bb0ff0	x86/hevc_deblock: use psignw instead of pmullw where possible It's slightly faster Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 03:42:29 +02:00
Michael Niedermayer	a91c5ed008	Merge commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d' * commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d': x86: build: Restore ordering of OBJS lines Conflicts: libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 00:34:53 +02:00
Diego Biurrun	4f8cf0dc4e	x86: build: Restore ordering of OBJS lines	2014-07-28 13:19:04 -07:00
James Almer	664e9e4331	x86/hevc_deblock: load less data in hevc_h_loop_filter_luma_8 Reading 8 bytes is enough. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-28 21:55:22 +02:00
James Almer	f137876182	x86/hevc_idct: add a colon to labels This fixes a warning spam when using NASM Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-28 21:43:32 +02:00
Christophe Gisquet	81943a10b5	x86: hevc_mc: load less data in epel filters Before: 5679 decicycles in epel_bi, 2059976 runs, 37176 skips 3468 decicycles in epel_uni, 1040886 runs, 7690 skips After: 5323 decicycles in epel_bi, 2059493 runs, 37659 skips 3262 decicycles in epel_uni, 1040871 runs, 7705 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 18:34:39 +02:00
Christophe Gisquet	36284ae981	x86: hevc_mc: replace one lea by add Should have been in `036f11bdb5`. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 17:42:56 +02:00
James Almer	bfb3b2b7a6	x86/hevc_idct: add 12bit idct_dc Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 00:30:56 +02:00
Michael Niedermayer	d4a9e89b27	avcodec/x86/hevcdsp_init: make license header consistent Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 00:28:44 +02:00
Michael Niedermayer	706f81a2c2	Merge commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d' * commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d': hevc: SSE2 and SSSE3 loop filters Conflicts: libavcodec/hevcdsp.c libavcodec/hevcdsp.h libavcodec/x86/Makefile libavcodec/x86/hevc_deblock.asm libavcodec/x86/hevcdsp_init.c See: `de7b89fd43` and several others Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 00:20:48 +02:00
James Almer	1ace9573dc	x86/hevc_idct: replace old and unused idct functions Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial). Benchmarks on an Intel Core i5-4200U: idct8x8_dc SSE2 MMXEXT C cycles 22 26 57 idct16x16_dc AVX2 SSE2 C cycles 27 32 249 idct32x32_dc AVX2 SSE2 C cycles 62 126 1375 Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-26 18:00:11 +02:00
Pierre Edouard Lepere	1a880b2fb8	hevc: SSE2 and SSSE3 loop filters Additional contributions by James Almer <jamrial@gmail.com>, Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and Anton Khirnov <anton@khirnov.net> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-07-26 15:01:01 +00:00
Christophe Gisquet	036f11bdb5	x86: hevc_mc: replace simple leas by adds lea is detrimental for those simple cases. No impact overall to the change though. Before: 15017 decicycles in q, 1016152 runs, 32424 skips 15382 decicycles in q_bi, 1013673 runs, 34903 skips 3713 decicycles in e, 2074534 runs, 22618 skips 3901 decicycles in e_bi, 2065509 runs, 31643 skips 7852 decicycles in q_uni, 520165 runs, 4123 skips 2398 decicycles in e_uni, 1043339 runs, 5237 skips After: 14898 decicycles in q, 1016295 runs, 32281 skips 15119 decicycles in q_bi, 1015392 runs, 33184 skips 3682 decicycles in e, 2073224 runs, 23928 skips 3720 decicycles in e_bi, 2065043 runs, 32109 skips 7643 decicycles in q_uni, 520280 runs, 4008 skips 2363 decicycles in e_uni, 1043780 runs, 4796 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-26 05:41:04 +02:00
Mickaël Raulet	bd0f2d316f	x86/hevc: add 12bits support for MC cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-26 01:55:20 +02:00
Mickaël Raulet	7df98d8c4d	x86/hevc: remove unused constant in deblocking filter cherry picked from commit a3f7282eaa6f1ab0524fb966c6eade50c3025f99 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-26 01:20:40 +02:00
Mickaël Raulet	7bdcf5c934	x86/hevc: add 12bits support for deblocking filter cherry picked from commit 97d46afe320c7d61d7b9525e5f5588355cde4bb0 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-26 01:19:42 +02:00
Michael Niedermayer	2904d052b7	Merge commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac' * commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac': qpeldsp: Mark source pointer in qpel_mc_func function pointer const Conflicts: libavcodec/h264qpel_template.c libavcodec/x86/cavsdsp.c libavcodec/x86/rv40dsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-25 13:05:08 +02:00
Diego Biurrun	7fb993d338	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	2014-07-25 02:52:54 -07:00
Christophe Gisquet	670b7f203a	x86: hevcdsp: align Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-23 22:18:08 +02:00
Carl Eugen Hoyos	c75fdee747	avcodec/x86/hevc_deblock: Fix compilation with nasm.	2014-07-23 10:32:27 +02:00
Michael Niedermayer	ca6b33b8bd	avcodec/x86/hevcdsp_init: Fix "warning: assignment from incompatible pointer type"	2014-07-22 16:36:12 +02:00
Anton Khirnov	d7e162d46b	hevcdsp: remove an unneeded variable in the loop filter beta0 and beta1 will always be the same within a CU Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr> cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:27:26 +02:00
Anton Khirnov	ae2f048fd7	avcodec/x86/hevc_deblock: cosmetics cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:18:05 +02:00
Anton Khirnov	b435043abb	hevc: cleanups in SSE2 and SSSE3 loop filters, use fewer instructions cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:17:29 +02:00
Anton Khirnov	e8581b17a8	avcodec/x86/hevc_deblock: use test instead of cmp 0 cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:16:05 +02:00
Anton Khirnov	dc69247de4	avcodec/x86/hevc_deblock: use of paddw instead of psllw cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:14:53 +02:00
Anton Khirnov	500a0394d5	avcodec/x86/hevc_deblock: add %ifs to avoid "do nothing instructions" cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:13:28 +02:00
Anton Khirnov	7a4cf67117	hevc: cleaning up SSE2 and SSSE3 deblocking filters Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr> cherry picked from commit b432041d7d1eca38831590f13b4e5baffff8186f Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-22 16:00:48 +02:00
Michael Niedermayer	d986c414de	Merge commit '81b9bf319226fe03436c80aaa8a2c91767cab7ce' * commit '81b9bf319226fe03436c80aaa8a2c91767cab7ce': dct-test: Move arch-specific bits into arch-specific subdirectories Conflicts: libavcodec/dct-test.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-21 13:33:51 +02:00
Diego Biurrun	81b9bf3192	dct-test: Move arch-specific bits into arch-specific subdirectories	2014-07-21 01:10:11 -07:00
Michael Niedermayer	776647360d	Merge commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0' * commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0': simple_idct: Move x86-specific declarations to a header in the x86 directory Conflicts: libavcodec/x86/simple_idct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-19 13:56:29 +02:00
Michael Niedermayer	6da96a9fc9	Merge commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314' * commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314': fdct: Move x86-specific declarations to a header in the x86 directory Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-19 13:45:59 +02:00
Diego Biurrun	5dcc201505	simple_idct: Move x86-specific declarations to a header in the x86 directory	2014-07-19 02:33:36 -07:00
Diego Biurrun	85cabb8d00	fdct: Move x86-specific declarations to a header in the x86 directory	2014-07-19 02:25:59 -07:00
Michael Niedermayer	097bf834ba	Merge commit '9e0b29911f1f167381a7dbdfca68bf417b8c767b' * commit '9e0b29911f1f167381a7dbdfca68bf417b8c767b': x86: dnxhdenc: Eliminate some unnecessary ifdefs Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-18 22:33:24 +02:00
Michael Niedermayer	521f569734	Merge commit '8b0dd4942aac320d1ca3c40fa7ea1be342c71273' * commit '8b0dd4942aac320d1ca3c40fa7ea1be342c71273': idctdsp: prettyprinting cosmetics Conflicts: libavcodec/idctdsp.c libavcodec/ppc/idctdsp.c libavcodec/x86/idctdsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-18 22:16:04 +02:00
Michael Niedermayer	42d326353c	Merge commit 'b4987f72197e0c62cf2633bf835a9c32d2a445ae' * commit 'b4987f72197e0c62cf2633bf835a9c32d2a445ae': idct: Convert IDCT permutation #defines to an enum Conflicts: libavcodec/idctdsp.c libavcodec/x86/cavsdsp.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-18 22:01:17 +02:00
Diego Biurrun	9e0b29911f	x86: dnxhdenc: Eliminate some unnecessary ifdefs	2014-07-18 09:58:17 -07:00
Diego Biurrun	8b0dd4942a	idctdsp: prettyprinting cosmetics	2014-07-18 07:51:03 -07:00
Diego Biurrun	b4987f7219	idct: Convert IDCT permutation #defines to an enum Also rename the enum values to be consistent with other DCT permutations.	2014-07-18 07:51:03 -07:00
Michael Niedermayer	3a2d1465c8	Merge commit '2d60444331fca1910510038dd3817bea885c2367' * commit '2d60444331fca1910510038dd3817bea885c2367': dsputil: Split motion estimation compare bits off into their own context Conflicts: configure libavcodec/Makefile libavcodec/arm/Makefile libavcodec/dvenc.c libavcodec/error_resilience.c libavcodec/h264.h libavcodec/h264_slice.c libavcodec/me_cmp.c libavcodec/me_cmp.h libavcodec/motion_est.c libavcodec/motion_est_template.c libavcodec/mpeg4videoenc.c libavcodec/mpegvideo.c libavcodec/mpegvideo_enc.c libavcodec/x86/Makefile libavcodec/x86/me_cmp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-17 23:27:40 +02:00
Michael Niedermayer	d6676a1605	Merge commit 'c23ce454b3e33634a188d6facfd2b7182af5af93' * commit 'c23ce454b3e33634a188d6facfd2b7182af5af93': x86: dsputil: Coalesce all init files Conflicts: libavcodec/x86/dsputil_init.c libavcodec/x86/dsputil_x86.h libavcodec/x86/motion_est.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-17 22:07:52 +02:00
Diego Biurrun	2d60444331	dsputil: Split motion estimation compare bits off into their own context	2014-07-17 09:07:10 -07:00
Diego Biurrun	c23ce454b3	x86: dsputil: Coalesce all init files This makes the init files match the structure of the dsputil split.	2014-07-17 03:32:56 -07:00
Michael Niedermayer	cc3e7a4c3d	Merge commit 'acf91215c74a91eb3b86af01dcb1d3c78d0e2310' * commit 'acf91215c74a91eb3b86af01dcb1d3c78d0e2310': x86: dsputil: Avoid pointless CONFIG_ENCODERS indirection Conflicts: libavcodec/x86/dsputil_init.c libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-13 21:51:20 +02:00
Diego Biurrun	acf91215c7	x86: dsputil: Avoid pointless CONFIG_ENCODERS indirection The remaining dsputil bits are encoding-specific anyway.	2014-07-13 07:01:05 -07:00
James Almer	276bef5340	x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10}_sse2 Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Kieran Kunhya <kierank@obe.tv> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-13 13:48:31 +02:00
James Almer	123649dd19	x86/dsputilenc: remove some empty if statements Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-12 15:04:58 +02:00
Michael Niedermayer	b8cdf04726	Merge commit '1173320249745eab01c901a39054fc0fced33c87' * commit '1173320249745eab01c901a39054fc0fced33c87': dsputil: Drop unused bit_depth parameter from all init functions Conflicts: libavcodec/dsputil.c libavcodec/dsputil.h libavcodec/ppc/dsputil_ppc.c libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-11 20:29:40 +02:00
Diego Biurrun	1173320249	dsputil: Drop unused bit_depth parameter from all init functions	2014-07-11 06:38:26 -07:00
Michael Niedermayer	2d5e9451de	Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e' * commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e': dsputil: Split off pixel block routines into their own context Conflicts: configure libavcodec/dsputil.c libavcodec/mpegvideo_enc.c libavcodec/pixblockdsp_template.c libavcodec/x86/dsputilenc.asm libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-10 01:22:14 +02:00
Diego Biurrun	f46bb608d9	dsputil: Split off pixel block routines into their own context	2014-07-09 08:05:26 -07:00
Michael Niedermayer	14e2406de7	Merge commit 'a9aee08d900f686e966c64afec5d88a7d9d130a3' * commit 'a9aee08d900f686e966c64afec5d88a7d9d130a3': dsputil: Split off FDCT bits into their own context Conflicts: configure libavcodec/Makefile libavcodec/asvenc.c libavcodec/dnxhdenc.c libavcodec/dsputil.c libavcodec/mpegvideo.h libavcodec/mpegvideo_enc.c libavcodec/x86/Makefile libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-08 03:19:06 +02:00
Diego Biurrun	a9aee08d90	dsputil: Split off FDCT bits into their own context	2014-07-07 12:28:45 -07:00
Michael Niedermayer	3790801f9c	Merge commit '3c650efb81aaa3b395ba4606ee68a47ee4efb57b' * commit '3c650efb81aaa3b395ba4606ee68a47ee4efb57b': dsputil: Move draw_edges() to mpegvideoencdsp Conflicts: libavcodec/mpegvideo_enc.c libavcodec/x86/Makefile libavcodec/x86/dsputil_init.c libavcodec/x86/dsputil_mmx.c libavcodec/x86/dsputil_x86.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-07 16:17:27 +02:00
Michael Niedermayer	020865f557	Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d' * commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d': dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc Conflicts: libavcodec/dsputil.c libavcodec/mpegvideo_enc.c libavcodec/x86/dsputilenc.asm libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-07 15:36:58 +02:00
Michael Niedermayer	462c6cdb8e	Merge commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94' * commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94': dsputil: Split off *_8x8basis to a separate context Conflicts: libavcodec/dsputil.c libavcodec/mpegvideo_enc.c libavcodec/x86/dsputilenc_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-07 15:08:55 +02:00
Diego Biurrun	3c650efb81	dsputil: Move draw_edges() to mpegvideoencdsp	2014-07-06 14:48:50 -07:00
Diego Biurrun	c166148409	dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc	2014-07-06 14:26:53 -07:00
Diego Biurrun	8d686ca59d	dsputil: Split off *_8x8basis to a separate context	2014-07-06 13:09:24 -07:00
James Almer	195f7bd23d	x86/svq1enc: use unaligned mov on SSE2 Might fix fate failures on some systems Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-06 20:27:57 +02:00

2058 Commits