ffmpeg

Author	SHA1	Message	Date
Diego Biurrun	017a06a9ee	x86: dsputil: Use correct file name as multiple inclusion guard	2014-02-20 04:16:15 -08:00
Michael Niedermayer	130c33af35	Merge commit 'b23bc95920e2f10b9621857e829c45b064f356c0' * commit 'b23bc95920e2f10b9621857e829c45b064f356c0': x86: dca: Add missing multiple inclusion guards Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-19 15:44:48 +01:00
Diego Biurrun	b23bc95920	x86: dca: Add missing multiple inclusion guards	2014-02-19 10:19:15 +01:00
Hendrik Leppkes	7716eda0aa	vp9/x86: set correct number of registers used in intra pred asm Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-18 17:20:14 +01:00
James Almer	07b4b0ca62	tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4} Results are from a Win64 build running on an AMD FX 6300 1121 decicycles in ttafilter_process_dec_c, 16777112 runs, 104 skips 522 decicycles in ff_ttafilter_process_dec_ssse3, 16777149 runs, 67 skips 477 decicycles in ff_ttafilter_process_dec_sse4, 16777156 runs, 60 skips Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-17 13:51:19 +01:00
Ronald S. Bultje	fdb093c4e4	vp9/x86: intra prediction SIMD. Partially based on h264_intrapred. (I hope to eventually merge these two intrapred implementations back together.)	2014-02-17 13:39:00 +01:00
James Almer	ec482e738d	x86/fladsp: add missing check to ff_flacdsp_init_x86() Fixes compilation with flac decoder disabled and encoder enabled Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-16 12:06:04 +01:00
Michael Niedermayer	d601106ab1	avcodec/x86/lossless_videodsp: fix w type Fixes fate issues on mingw64 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-15 06:41:38 +01:00
Peter Ross	b8664c9294	avcodec/vp8dsp: add VP7 idct and loop filter Signed-off-by: Peter Ross <pross@xvid.org> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-15 02:15:35 +01:00
James Almer	e87974bc00	flac/x86: add ff_flac_lpc_32_xop() Tested on an AMD FX 6300 679081 decicycles in ff_flac_lpc_32_xop, 32768 runs 774425 decicycles in ff_flac_lpc_32_sse4, 32768 runs Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-13 22:14:59 +01:00
James Darnley	623f380a18	lavc: fix flac encoder and decoder dependencies Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-13 21:00:32 +01:00
Michael Niedermayer	df98b36aa6	Merge commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae' * commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae': dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-08 17:25:31 +01:00
Michael Niedermayer	dd2b330347	Merge commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3' * commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3': x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-08 17:06:57 +01:00
Janne Grunau	5c1c6e8226	dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders	2014-02-08 13:38:36 +01:00
Janne Grunau	0cffd6fff5	x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available Fixes compilation with MSVC. Also does not rely on an earlier config.h include but includes it directly.	2014-02-08 12:10:56 +01:00
Clément Bœsch	669d4f9053	x86/vp9lpf: simplify 2nd transpose in 44/48/88/84. For non-avx optims, this saves 8 movs. before: 1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips 3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips 2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips 3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips after: 1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips 3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips 2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips 3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips	2014-02-08 11:10:23 +01:00
Michael Niedermayer	82ae8a44e6	Merge commit '5b59a9fc6152169599561f04b4f66370edda5c9c' * commit '5b59a9fc6152169599561f04b4f66370edda5c9c': x86: dcadsp: implement int8x8_fmul_int32 Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-08 01:20:33 +01:00
Christophe Gisquet	5b59a9fc61	x86: dcadsp: implement int8x8_fmul_int32 For the callable function (as opposed to the inline one): C SSE SSE2 SSE4 Win32: 47 42 29 26 Win64: 30 33 25 23 The SSE version is neither compiled nor set for ARCH_X86_64, as the inlinable function takes over. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-07 22:52:40 +01:00
Loren Merritt	9c978f243a	flac/x86: add ff_flac_lpc_32_sse4() benchmarked on sandybridge x86_64: 1358232 decicycles in flac_lpc_32_c 1244575 decicycles in flac_lpc_32_sse4, James Almer's patch 650045 decicycles in flac_lpc_32_sse4, this patch I haven't tested the edgecases such as odd block lengths odd block length tested-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-06 02:51:19 +01:00
Clément Bœsch	d92a725329	x86/vp9lpf: remove 8 SWAPs in 84/48 transpose.	2014-02-05 07:21:13 +01:00
Clément Bœsch	97dde561de	x86/vp9lpf: remove braindead double pxor.	2014-02-05 07:21:11 +01:00
Clément Bœsch	9a3b05b0a9	x86/vp9lpf: save a few mov in flat8in/hev masks calc.	2014-02-05 07:21:09 +01:00
Clément Bœsch	91d85bb167	x86/vp9lpf: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.	2014-02-05 07:21:06 +01:00
Michael Niedermayer	de17ccc774	Merge commit '51daafb02eaf96e0743a37ce95a7f5d02c1fa3c2' * commit '51daafb02eaf96e0743a37ce95a7f5d02c1fa3c2': x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such. Conflicts: libavcodec/x86/videodsp_init.c See: `1b3a7e1f42` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-31 14:30:30 +01:00
Clément Bœsch	c5dd73b890	x86/vp9lpf: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}(). 5.40s → 5.30s overall decode time with -threads 1 on ped1080p.webm (i7 920, ssse3)	2014-01-30 19:34:13 +01:00
Ronald S. Bultje	9ee9c679a7	x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-01-30 15:33:23 +01:00
Ronald S. Bultje	51daafb02e	x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such. Should fix crashes or corrupt output on pre-SSE2 CPUs when they were using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in hfix or hvar single-edge (left/right) extension functions. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-01-30 15:30:01 +01:00
James Almer	644c32ea4b	x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2() Similar gains as the ssse3 version once again Signed-off-by: James Almer <jamrial@gmail.com>	2014-01-28 09:30:55 +01:00
Clément Bœsch	222c46c531	x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}. 9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips 9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips 1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips 2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips 5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1) Adding SSE2 support should be relatively trivial (just a matter of changing the pshufb [mask_mix] with something else), patch welcome.	2014-01-28 07:36:38 +01:00
Clément Bœsch	822385d775	x86/vp9lpf: add a preload system in FILTER_UPDATE. Allow some macro refactoring in filter14().	2014-01-27 22:39:26 +01:00
Clément Bœsch	315b4775ad	x86/vp9lpf: refactor v/h using common macros for P7 to Q7.	2014-01-27 22:39:26 +01:00
Clément Bœsch	5d144086cc	x86/vp9lpf: faster P7..Q7 accesses. Introduce 2 additional registers for stride3 and mstride3 to allow direct accesses (lea drops). 3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3 Also uses defines to clarify the code.	2014-01-27 22:37:42 +01:00
Clément Bœsch	5f4d04d084	x86/lossless_videodsp: silly one-line cosmetic.	2014-01-25 16:24:50 +01:00
Clément Bœsch	5267e85056	x86/lossless_videodsp: use common macro for add and diff int16 loop.	2014-01-25 14:27:37 +01:00
Clément Bœsch	cddbfd2a95	x86/lossless_videodsp: simplify and explicit aligned/unaligned flags	2014-01-25 11:59:43 +01:00
Ronald S. Bultje	c9e6325ed9	vp9/x86: use explicit register for relative stack references. Before this patch, we explicitly modify rsp, which isn't necessarily universally acceptable, since the space under the stack pointer might be modified in things like signal handlers. Therefore, use an explicit register to hold the stack pointer relative to the bottom of the stack (i.e. rsp). This will also clear out valgrind errors about the use of uninitialized data that started occurring after the idct16x16/ssse3 optimizations were first merged.	2014-01-24 19:25:25 -05:00
Ronald S. Bultje	97474d527f	vp9/x86: iwht4x4 (lossless) mmx.	2014-01-24 19:25:25 -05:00
Ronald S. Bultje	d43efa68bd	vp9/x86: 4x4 iadst SIMD (ssse3) variants. Cycle measurements for intra itxfm_4x4_add on ped1080p.webm: idct_idct: 66 -> 67 cycles (noise measurement) idct_iadst: 199 -> 79 cycles iadst_idct: 165 -> 70 cycles iadst_iadst: 183 -> 82 cycles	2014-01-24 19:25:25 -05:00
Ronald S. Bultje	baf47020cd	vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants. Cycle measurements for intra itxfm_8x8_add on ped1080p.webm: idct_idct: 133 -> 135 cycles (noise measurement) idct_iadst: 900 -> 241 cycles iadst_idct: 864 -> 215 cycles iadst_iadst: 973 -> 310 cycles	2014-01-24 19:25:25 -05:00
Michael Niedermayer	e6d1c66d74	avcodec/x86/lossless_videodsp: disable median optimizations for 16bps They only support up to 15bps Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-23 01:51:24 +01:00
Michael Niedermayer	eaacfc7dd1	avcodec/lossless_videodsp: Pass AVCodecContext to init Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-23 01:43:00 +01:00
Michael Niedermayer	ef00ef7553	avcodec/x86/lossless_videodsp: port sub_hfyu_median_prediction_int16 to yasm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 23:27:27 +01:00
Michael Niedermayer	fad49aae28	avcodec/x86/lossless_videodsp: Port sub_hfyu_median_prediction_mmxext to int16 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 22:55:49 +01:00
Michael Niedermayer	fee97f25fa	avcodec/x86/lossless_videodsp: port add_hfyu_median_prediction_mmxext to 16bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 21:11:40 +01:00
Michael Niedermayer	631939bde6	avcodec/x86/lossless_videodsp: add diff_int16_mmx/sse2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-22 19:41:21 +01:00
Reimar Döffinger	76421982d0	lossless_videodsp.asm: fix compilation. Fixes these errors with nasm: libavcodec/x86/lossless_videodsp.asm:86: error: invalid combination of opcode and operands libavcodec/x86/lossless_videodsp.asm:88: error: invalid combination of opcode and operands I don't know whether movd or movq was meant, but either way maskq vs. maskd must match the mov size. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-01-21 19:46:02 +01:00
Michael Niedermayer	83b67ca056	avcodec/x86/lossless_videodsp: Port lorens add_hfyu_left_prediction_ssse3/sse4 to 16bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-21 02:55:41 +01:00
Michael Niedermayer	63d2be7533	avcodec/x86/lossless_videodsp: use SPLATW in add_int16 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-21 02:33:20 +01:00
Michael Niedermayer	f70d7eb20c	Move add/diff_int16 to lossless_videodsp Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-20 21:32:47 +01:00
Michael Niedermayer	a493f8541d	avcodec/x86/dsp: add_int16_mmx / add_int16_sse2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-20 04:06:46 +01:00
James Almer	26800e3864	vp9/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext pavgb is an sse integer instruction, so the mmxext flag is enough Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-18 17:08:25 +01:00
James Almer	d2a7314f1e	vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2(). Similar gains in performance as the SSSE3 version Signed-off-by: James Almer <jamrial@gmail.com>	2014-01-17 14:16:38 +01:00
Ronald S. Bultje	8173d1ffc0	vp9/x86: 16x16 iadst_idct, idct_iadst and iadst_iadst (ssse3+avx). Sample timings on ped1080p.webm (of the ssse3 functions): iadst_idct: 4672 -> 1175 cycles idct_iadst: 4736 -> 1263 cycles iadst_iadst: 4924 -> 1438 cycles Total decoding time changed from 6.565s to 6.413s.	2014-01-16 13:49:31 +01:00
Clément Bœsch	9cc8fa63dd	vp9/x86: simplify a few mc inits.	2014-01-16 07:48:27 +01:00
Michael Niedermayer	6391dec82a	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: dsputil: Simplify xvmc deprecation conditional Conflicts: libavcodec/x86/dsputil_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-15 20:41:08 +01:00
Diego Biurrun	aab40bbfd5	x86: dsputil: Simplify xvmc deprecation conditional	2014-01-15 15:23:46 +01:00
Clément Bœsch	8b4190da93	vp9/x86: add AVX for itxfm and lpf. 4412 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 4193462 runs, 842 skips 3600 decicycles in ff_vp9_loop_filter_h_16_16_avx, 4193621 runs, 683 skips 3010 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 4193528 runs, 776 skips 2678 decicycles in ff_vp9_loop_filter_v_16_16_avx, 4193742 runs, 562 skips 23025 decicycles in ff_vp9_idct_idct_32x32_add_ssse3, 2096871 runs, 281 skips 19943 decicycles in ff_vp9_idct_idct_32x32_add_avx, 2096815 runs, 337 skips 4675 decicycles in ff_vp9_idct_idct_16x16_add_ssse3, 4194018 runs, 286 skips 3980 decicycles in ff_vp9_idct_idct_16x16_add_avx, 4194022 runs, 282 skips 967 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 16776972 runs, 244 skips 887 decicycles in ff_vp9_idct_idct_8x8_add_avx, 16777002 runs, 214 skips	2014-01-15 15:54:03 +01:00
Michael Niedermayer	cb613657ee	avcodec/x86/proresdsp_init: x86 prores IDCT is bitexact again reenable it for bitexact mode Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-14 15:59:00 +01:00
Michael Niedermayer	b148a39d55	Merge commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7' * commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7': x86: Consistently use cpu flag detection macros in places that still miss it Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-14 14:44:59 +01:00
Diego Biurrun	46bacb5cc6	x86: Consistently use cpu flag detection macros in places that still miss it	2014-01-14 00:04:58 +01:00
Clément Bœsch	af68bd1c06	vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_ssse3(). 16662 decicycles in loop_filter_h_16_16_c, 8387355 runs, 1253 skips 17510 decicycles in loop_filter_v_16_16_c, 8387516 runs, 1092 skips 4941 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 8387887 runs, 721 skips 3899 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 8387980 runs, 628 skips Overall decode time goes from: ./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 8.10s user 0.02s system 99% cpu 8.126 total to: ./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 6.15s user 0.04s system 99% cpu 6.199 total (46 to 61 fps)	2014-01-12 20:20:24 +01:00
Clément Bœsch	e11ceea68f	vp9/x86: factor out some code in VP9_UNPACK_MULSUB_2W_4X.	2014-01-12 20:19:00 +01:00
Clément Bœsch	c9aa0b8f70	vp9/x86: remove reg redundancy in VP9_MULSUB_2W_2X.	2014-01-12 20:18:55 +01:00
Clément Bœsch	7c55ee6168	vp9/x86: merge IDCT coef macros.	2014-01-12 20:18:44 +01:00
Michael Niedermayer	92b2404571	Merge commit '4c642d8d98703faf52983243098f35865e15b312' * commit '4c642d8d98703faf52983243098f35865e15b312': x86: hpeldsp: Add missing av_cold attribute to init function Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-09 20:32:53 +01:00
Michael Niedermayer	390452bab6	Merge commit 'b0be1ae792ac8bbfb0fc7b9b9cb39eaf0feb489b' * commit 'b0be1ae792ac8bbfb0fc7b9b9cb39eaf0feb489b': x86: avcodec: Add a bunch of missing #includes for av_cold Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-09 20:24:15 +01:00
Diego Biurrun	4c642d8d98	x86: hpeldsp: Add missing av_cold attribute to init function	2014-01-09 15:09:07 +01:00
Diego Biurrun	b0be1ae792	x86: avcodec: Add a bunch of missing #includes for av_cold	2014-01-09 15:09:07 +01:00
Ronald S. Bultje	c6fe984f2f	vp9/x86: make STORE_2X2 macro local. Prevents this assembler warning: libavcodec/x86/vp9itxfm.asm:1208: warning: (VP9_IDCT32_1D:309) redefining multi-line macro `STORE_2X2' Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-08 14:07:15 +01:00
Ronald S. Bultje	04a187fb2a	vp9/x86: idct_32x32_add_ssse3 sub-8x8-idct. Runtime of the full 32x32 idct goes from 2446 to 2441 cycles (intra) or from 1425 to 1306 cycles (inter). Overall runtime is not significantly affected.	2014-01-07 20:43:35 -05:00
Ronald S. Bultje	37b001d14d	vp9/x86: idct_32x32_add_ssse3 sub-16x16-idct. Runtime of all IDCTs together goes from 3327 to 2473 cycles (intra, i.e. ~35% faster) or from 2312 to 1448 cycles (inter, i.e. ~60% faster). Total decode time of ped1080p.webm goes from 8.086sec to 7.974sec (1.4% faster).	2014-01-07 20:43:34 -05:00
Ronald S. Bultje	e84d14df10	vp9/x86: idct_32x32_add_ssse3. Sub-IDCTs will follow later. ped1080.webm goes from 9.295s to 8.191s (13.5% faster). The IDCT itself goes from 4372 (intra) or 4337 (inter) to 403 (intra) or 329 (inter) cycles for the DC-only form, 23755 (intra) or 23723 (inter) to 3497 (intra) or 3607 (inter) cycles for the no-DC form, which averages from 23393 (intra) or 16612 (inter) to 3449 (intra) or 2392 (inter) for all 32x32s together, i.e. about ~7x faster (all tests done on ped1080p.webm).	2014-01-07 20:43:30 -05:00
Michael Niedermayer	30056fd0be	Merge commit 'a03a642d5ceb5f2f7c6ebbf56ff365dfbcdb65eb' * commit 'a03a642d5ceb5f2f7c6ebbf56ff365dfbcdb65eb': h264: do not use 422 functions for monochrome See: `07abf13da4` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-06 16:51:23 +01:00
Anton Khirnov	a03a642d5c	h264: do not use 422 functions for monochrome Fixes invalid memory access. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC:libav-stable@libav.org	2014-01-06 08:25:36 +01:00
Ronald S. Bultje	18175baa54	vp9/x86: 16px MC functions (64bit only). Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0): 16x8: 876 -> 870 (0.7%) 16x16: 1444 -> 1435 (0.7%) 16x32: 2784 -> 2748 (1.3%) 32x16: 2455 -> 2349 (4.5%) 32x32: 4641 -> 4084 (13.6%) 32x64: 9200 -> 7834 (17.4%) 64x32: 8980 -> 7197 (24.8%) 64x64: 17330 -> 13796 (25.6%) Total decoding time goes from 9.326sec to 9.182sec.	2013-12-26 21:05:10 -05:00
Ronald S. Bultje	0d9375fc90	vp9/x86: 16x16 sub-IDCT for top-left 8x8 subblock (eob <= 38). Sub8x8 speed (w/o dc-only case) goes from ~750 cycles (inter) or ~735 cycles (intra) to ~415 cycles (inter) or ~430 cycles (intra). Average overall 16x16 idct speed goes from ~635 cycles (inter) or ~720 cycles (intra) to ~415 cycles (inter) or ~545 (intra) - all measurements done using ped1080p.webm.	2013-12-26 07:40:25 -05:00
Ivan Kalvachev	1c63aed232	Convert XvMC to hwaccel v3 Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-22 22:03:47 +01:00
Michael Niedermayer	ce612fc186	Merge commit 'dfc50ac85e9d68a771b556297b7c411650206f3b' * commit 'dfc50ac85e9d68a771b556297b7c411650206f3b': x86: mpegvideo: move denoise_dct asm to mpegvideoenc Conflicts: libavcodec/x86/mpegvideo.c libavcodec/x86/mpegvideoenc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-20 23:44:31 +01:00
Anton Khirnov	dfc50ac85e	x86: mpegvideo: move denoise_dct asm to mpegvideoenc This function is encoding-only. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-12-20 17:16:11 +01:00
Ronald S. Bultje	8d4c616fc0	vp9/x86: idct_add_16x16_ssse3. Currently only dc-only and full 16x16. Other subforms will follow in the near future. Total decoding time of ped1080p.webm goes from 9.7 to 9.3 seconds. DC-only goes from 957 -> 131 cycles, and the full IDCT goes from ~4050 to ~745 cycles.	2013-12-14 12:13:26 -05:00
Michael Niedermayer	8e70fdab36	Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' * commit '4958f35a2ebc307049ff2104ffb944f5f457feb3': dsputil: Move apply_window_int16 to ac3dsp Conflicts: libavcodec/arm/ac3dsp_init_arm.c libavcodec/arm/ac3dsp_neon.S libavcodec/x86/ac3dsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-09 04:12:40 +01:00
Diego Biurrun	4958f35a2e	dsputil: Move apply_window_int16 to ac3dsp The (optimized) functions are used nowhere else.	2013-12-08 17:57:15 +01:00
Ronald S. Bultje	92436e8ad9	vp9: implement top/left half (4x4) sub-8x8-IDCT. For that specific case (eob>3&&eob<=12), runtime of idct8x8 goes from 668 to 477 cycles. For all idct8x8, runtime goes from 521 to 490 cycles.	2013-12-07 12:39:36 -05:00
Ronald S. Bultje	b2045c44a9	vp9: split pre-load of 11585x2 out of 1d idct macro. This allows us to load it only once, instead of twice, in this function.	2013-12-07 12:39:36 -05:00
Ronald S. Bultje	f9a0d4c6e0	vp9: minor refactorings in idct ssse3 assembly. Make register usage in macros explicit; change mulsub_2w_4x to use 2 instead of 3 temp registers.	2013-12-07 12:39:35 -05:00
Ronald S. Bultje	8729964b99	vp9: split x86 assembly in two files. (And in future, loopfilter or intra pred could be put in their own respective files also.)	2013-12-07 12:39:35 -05:00
Michael Niedermayer	5b4d57455d	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: Initialize mmxext after amd3dnow optimizations Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-05 11:55:41 +01:00
Diego Biurrun	3d7c84747d	x86: Initialize mmxext after amd3dnow optimizations The mmxext optimizations should be at least equally fast if available and amd3dnow optimizations are being deprecated. Thus the former should override the latter, not the other way around.	2013-12-04 18:52:48 +01:00
Michael Niedermayer	be2312aa8f	Merge remote-tracking branch 'qatar/master' * qatar/master: dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo If someone optimizes dct_quantize for non x86 SIMD, then this probably needs to be reverted. Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-02 10:59:48 +01:00
Diego Biurrun	7ffaa19570	dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo The table is MMX-specific and used nowhere else.	2013-12-02 04:05:18 +01:00
Michael Niedermayer	3adb825650	Merge commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5' * commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5': x86: dsputil: Suppress deprecation warnings for XvMC bits Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-28 22:47:37 +01:00
Diego Biurrun	cf7860db60	x86: dsputil: Suppress deprecation warnings for XvMC bits These parts are scheduled for removal on the next version bump. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2013-11-28 16:04:30 +01:00
Clément Bœsch	616da59542	avcodec/x86/vp9dsp: merge a few SWAP together.	2013-11-21 23:06:21 +01:00
Clément Bœsch	e0434cfcfc	avcodec/x86: remove 3 sub in pred4x4_tm_vp8_8. before: 411 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388289 runs, 319 skips after: 389 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388308 runs, 300 skips Tested on i7 920.	2013-11-17 23:12:35 +01:00
Clément Bœsch	d28c79b003	avcodec/x86/vp9dsp: use EXTERNAL_* macros. Original fix by one of these developers: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> See `97962b2` / `72ca830` Personal guess is Diego Biurrun.	2013-11-16 17:03:17 +01:00
Michael Niedermayer	91e00c4a78	Merge commit '458446acfa1441d283dacf9e6e545beb083b8bb0' * commit '458446acfa1441d283dacf9e6e545beb083b8bb0': lavc: Edge emulation with dst/src linesize Conflicts: libavcodec/cavs.c libavcodec/h264.c libavcodec/hevc.c libavcodec/mpegvideo_enc.c libavcodec/mpegvideo_motion.c libavcodec/rv34.c libavcodec/svq3.c libavcodec/vc1dec.c libavcodec/videodsp.h libavcodec/videodsp_template.c libavcodec/vp3.c libavcodec/vp8.c libavcodec/wmv2.c libavcodec/x86/videodsp.asm libavcodec/x86/videodsp_init.c Changes to the asm are not merged, they are left for volunteers or in their absence for later. The changes this merge introduces are reordering of the function arguments See: `face578d56` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-15 15:07:10 +01:00
Ronald S. Bultje	72ca830f51	lavc: VP9 decoder Originally written by Ronald S. Bultje <rsbultje@gmail.com> and Clément Bœsch <u@pkh.me> Further contributions by: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> Signed-off-by: Luca Barbato <lu_zero@gentoo.org> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2013-11-15 10:16:28 +01:00
Ronald S. Bultje	458446acfa	lavc: Edge emulation with dst/src linesize Allow supporting files for which the image stride is smaller than the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9 file or a 16x16 VP8 file with -fflags +emu_edge.	2013-11-15 10:16:27 +01:00
Michael Niedermayer	5231eecdaf	Merge remote-tracking branch 'qatar/master' * qatar/master: Deprecate obsolete XvMC hardware decoding support Conflicts: libavcodec/mpeg12.c libavcodec/mpeg12dec.c libavcodec/mpegvideo.c libavcodec/options_table.h libavutil/pixdesc.c libavutil/version.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-14 03:26:35 +01:00
Diego Biurrun	19e30a58fc	Deprecate obsolete XvMC hardware decoding support XvMC has long ago been superseded by newer acceleration APIs, such as VDPAU, and few downstreams still support it. Furthermore XvMC is not implemented within the hwaccel framework, but requires its own specific code in the MPEG-1/2 decoder, which is a maintenance burden.	2013-11-13 21:07:45 +01:00


1498 Commits