ffmpeg

Author	SHA1	Message	Date
Christophe Gisquet	3e892b2bcd	x86: hevc_mc: split differently calls In some cases, 2 or 3 calls are performed to functions for unusual widths. Instead, perform 2 calls for different widths to split the workload. The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't be processed that way without modifications: some calls use unaligned buffers, and having branches to handle this was resulting in no micro-benchmark benefit. For block_w == 12 (around 1% of the pixels of the sequence): Before: 12758 decicycles in epel_uni, 4093 runs, 3 skips 19389 decicycles in qpel_uni, 8187 runs, 5 skips 22699 decicycles in epel_bi, 32743 runs, 25 skips 34736 decicycles in qpel_bi, 32733 runs, 35 skips After: 11929 decicycles in epel_uni, 4096 runs, 0 skips 18131 decicycles in qpel_uni, 8184 runs, 8 skips 20065 decicycles in epel_bi, 32750 runs, 18 skips 31458 decicycles in qpel_bi, 32753 runs, 15 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 12:05:33 +02:00
Christophe Gisquet	38e2aa3759	x86: hevc_mc: correct unneeded use of SSE4 code Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 11:43:33 +02:00
Christophe Gisquet	2346f2b5db	x86: hevcdsp: use compilation-time-fixed constant The stride for some buffers is known. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 16:26:30 +02:00
Christophe Gisquet	dad7f15567	hevcdsp: remove more instances of compile-time-fixed parameters Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 15:22:42 +02:00
Christophe Gisquet	d4f44b66d3	hevcdsp: remove compilation-time-fixed parameter The dststride parameter is always MAX_PB_SIZE. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 14:57:37 +02:00
Christophe Gisquet	fb1a98ec5b	x86: hevc_mc: assume 2nd source stride is 64 Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 13:21:37 +02:00
James Almer	54ca4dd43b	x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8 * Reduced xmm register count to 7 (As such they are now enabled for x86_32). * Removed four movdqa (affects the sse2 version only). * pxor is now used to clear m0 only once. ~5% faster. Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-21 15:01:33 -03:00
James Almer	76a99d467f	x86/hecv_res_add: add ff_hevc_transform_add{8,16,32}_8_avx ~15% faster than sse2 Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-20 16:54:52 -03:00
James Almer	9f498f4e6f	x86/hevc_res_add: fix register count in hevc_transform_add{16,32}_10_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-19 21:34:52 -03:00
Pierre Edouard Lepere	a6af4bf64d	x86: hevc: adding transform_add Reviewed-by: James Almer <jamrial@gmail.com> Approved-by: Ronald S. Bultje Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-20 01:28:56 +02:00
Michael Niedermayer	3bb2297351	Merge commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6' * commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6': build: Add explanatory comments to (optimization) blocks in the Makefiles Conflicts: libavcodec/ppc/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-15 20:25:12 +02:00
Michael Niedermayer	c1df467d73	Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' * commit '835f798c7d20bca89eb4f3593846251ad0d84e4b': mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes Conflicts: libavcodec/h261dec.c libavcodec/intrax8.c libavcodec/mjpegenc.c libavcodec/mpeg12dec.c libavcodec/mpeg12enc.c libavcodec/mpeg4videoenc.c libavcodec/mpegvideo.c libavcodec/mpegvideo.h libavcodec/mpegvideo_enc.c libavcodec/rv10.c libavcodec/x86/mpegvideoenc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-15 20:11:56 +02:00
Diego Biurrun	efd26bedec	build: Add explanatory comments to (optimization) blocks in the Makefiles	2014-08-15 02:55:21 -07:00
Diego Biurrun	835f798c7d	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
James Darnley	54a51d3840	lavc/flacenc: partially unroll loop in flac_enc_lpc_16 It now does 12 samples per iteration, up from 4. From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall. Runtime is reduced by a further 2 to 18%. Overall runtime reduced by 4 to 50%. Same conditions as before apply. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-13 03:09:26 +02:00
James Darnley	0081a14e7d	lavc/flacenc: add sse4 version of the 16-bit lpc encoder From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The speed-up generally increases with compression_level. This lpc encoder is not used with levels < 3 so it provides no speed-up in these cases. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-13 01:14:47 +02:00
Ronald S. Bultje	45bed0ab30	vp9/x86: fix bug in intra_pred_hd_32x32. Fixes mismatch in first keyframe in sample ffvp9_fails_where_libvpx.succeeds.webm from ticket 3849. There's still a second mismatch a few frames into the sample. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-12 13:11:21 +02:00
James Almer	c97870d1a1	x86/dca: remove unused header Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-12 12:46:53 +02:00
James Almer	e20ff251a6	x86/ttadsp: remove an unnecessary mova Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-12 12:29:05 +02:00
Michael Niedermayer	3841f2ae66	Merge commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36' * commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36': avcodec: Rename xvidmmx IDCT to xvid Conflicts: doc/APIchanges libavcodec/version.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-09 12:11:13 +02:00
Michael Niedermayer	0dcebb9f63	Merge commit '84d173d3de97c753234ab0c0b50551d51413d663' * commit '84d173d3de97c753234ab0c0b50551d51413d663': xvididct: Ensure that the scantable permutation is always set correctly Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-08 22:17:04 +02:00
Diego Biurrun	d35b94fbab	avcodec: Rename xvidmmx IDCT to xvid The Xvid IDCT is not MMX-specific.	2014-08-08 11:13:30 -07:00
Diego Biurrun	84d173d3de	xvididct: Ensure that the scantable permutation is always set correctly This fixes cases where the scantable permutation would get overwritten by the general idctdsp initialization.	2014-08-08 11:13:29 -07:00
Christophe Gisquet	75837e9add	x86: sbrdsp/fft: reuse ps_neg constant Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:25:08 +02:00
Christophe Gisquet	51dd80e751	x86: diracdsp: reuse constants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:25:02 +02:00
Christophe Gisquet	6622a6cff3	x86: dwt: better share constants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:24:57 +02:00
Christophe Gisquet	71db2d08b1	x86: better share ff_pw_2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 19:24:49 +02:00
Christophe Gisquet	4e128ab0b1	x86: vpx/h264/hevc/mpeg2: share constants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 18:36:31 +02:00
Michael Niedermayer	305f72aee7	avcodec: Change get_pixels() to ptrdiff_t linesize Found-by: ubitux Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 15:50:54 +02:00
Christophe Gisquet	6786848585	hevc_deblock: change tc type The x86 asm expects int32_t so use that type. Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-06 12:38:26 +02:00
James Almer	de417982e8	x86/vp9lpf: use fewer instructions in SPLATB_MIX Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-05 02:47:54 +02:00
Christophe Gisquet	e8c003edd2	x86: hevc_deblock: remove unnecessary masking The unpacks/shuffles later on makes it unnecessary. Before: 1508 decicycles in h, 2096759 runs, 393 skips 2512 decicycles in v, 2095422 runs, 1730 skips After: 1477 decicycles in h, 2096745 runs, 407 skips 2484 decicycles in v, 2095297 runs, 1855 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 17:46:04 +02:00
James Almer	b7863c972c	x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12} Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 14:47:15 +02:00
James Almer	b1a44e6bf5	x86/hevc_mc: remove an unnecessary pxor Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 14:35:08 +02:00
James Almer	d0f56ca071	x86/hevc_deblock: improve 8bit transpose store macros Up to four instructions less depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-03 04:24:15 +02:00
Michael Niedermayer	f54e01c24e	Merge commit 'a786c8259dafeca9744252230b5d78f67810770c' * commit 'a786c8259dafeca9744252230b5d78f67810770c': idct: Split off Xvid IDCT Conflicts: libavcodec/Makefile libavcodec/mpeg4videodec.c libavcodec/x86/Makefile libavcodec/x86/idctdsp_init.c This split is somewhat restructured leaving the xvid IDCT available outside mpeg4 if manually selected. The code also could not be merged unchanged as it conflicted with a bugfix in FFmpeg Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-01 16:21:52 +02:00
Diego Biurrun	a786c8259d	idct: Split off Xvid IDCT The Xvid IDCT is only required to decode some Xvid-encoded MPEG-4 files, so there is no point in having it as an unconditional part of idctdsp.	2014-08-01 01:25:18 -07:00
James Almer	62baf5b853	x86/hevc_deblock: use existing x86util transpose macro in chroma_{10, 12} Cosmetic change. No measurable difference in speed. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-31 22:56:21 +02:00
Christophe Gisquet	a507623bad	x86: hevc_mc: fix register count usage A macro was using a fixed register, causing too many GPRs to be declared as used. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 22:50:50 +02:00
James Almer	73c4f63ba5	x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10, 12}_avx ~5% faster than SSSE3 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 14:04:59 +02:00
James Almer	88ba821f23	x86/hevc_deblock: improve luma functions register allocation Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 13:38:05 +02:00
James Almer	c74b08c5c6	x86/hevc_deblock: remove some unnecessary instructions Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 13:27:44 +02:00
James Almer	4f91bb0ff0	x86/hevc_deblock: use psignw instead of pmullw where possible It's slightly faster Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 03:42:29 +02:00
Michael Niedermayer	a91c5ed008	Merge commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d' * commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d': x86: build: Restore ordering of OBJS lines Conflicts: libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-29 00:34:53 +02:00
Diego Biurrun	4f8cf0dc4e	x86: build: Restore ordering of OBJS lines	2014-07-28 13:19:04 -07:00
James Almer	664e9e4331	x86/hevc_deblock: load less data in hevc_h_loop_filter_luma_8 Reading 8 bytes is enough. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-28 21:55:22 +02:00
James Almer	f137876182	x86/hevc_idct: add a colon to labels This fixes a warning spam when using NASM Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-28 21:43:32 +02:00
Christophe Gisquet	81943a10b5	x86: hevc_mc: load less data in epel filters Before: 5679 decicycles in epel_bi, 2059976 runs, 37176 skips 3468 decicycles in epel_uni, 1040886 runs, 7690 skips After: 5323 decicycles in epel_bi, 2059493 runs, 37659 skips 3262 decicycles in epel_uni, 1040871 runs, 7705 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 18:34:39 +02:00
Christophe Gisquet	36284ae981	x86: hevc_mc: replace one lea by add Should have been in `036f11bdb5`. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 17:42:56 +02:00
James Almer	bfb3b2b7a6	x86/hevc_idct: add 12bit idct_dc Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-27 00:30:56 +02:00

1811 Commits