Commit Graph

90 Commits

Author SHA1 Message Date
Michael Niedermayer
696f5f98e2 Merge commit '6e9f8d6a7d7392a236df19fef6f4eba41f18167e'
* commit '6e9f8d6a7d7392a236df19fef6f4eba41f18167e':
  x86: vf_yadif: Remove stray dsputil_mmx #include

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-09 11:51:40 +02:00
Diego Biurrun
6e9f8d6a7d x86: vf_yadif: Remove stray dsputil_mmx #include 2013-05-08 18:18:23 +02:00
Michael Niedermayer
a8ff830b79 Merge commit '093804a93cc5da3f95f98265a5df116912443cec'
* commit '093804a93cc5da3f95f98265a5df116912443cec':
  avfilter: Add av_cold attributes to init/uninit functions

Conflicts:
	libavfilter/af_ashowinfo.c
	libavfilter/af_volume.c
	libavfilter/src_movie.c
	libavfilter/vf_lut.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-05 11:42:18 +02:00
Diego Biurrun
093804a93c avfilter: Add av_cold attributes to init/uninit functions 2013-05-04 21:10:05 +02:00
Michael Niedermayer
0a73803c86 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: Move some conditional code around to avoid unused variable warnings

Conflicts:
	libavcodec/x86/dsputil_mmx.c
	libavfilter/x86/vf_yadif_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-23 11:01:46 +02:00
Diego Biurrun
c1ad70c3cb x86: Move some conditional code around to avoid unused variable warnings 2013-04-22 17:50:02 +02:00
Clément Bœsch
1ae44c87c9 lavfi/gradfun: remove rounding to match C and SSE code.
There is no noticable benefit for such precision.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-28 07:59:29 +01:00
Clément Bœsch
38a2f88d39 lavfi/gradfun: fix dithering in MMX code.
Current dithering only uses the first 4 instead of the whole 8 random values.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-28 07:59:18 +01:00
Clément Bœsch
2d66fc543b lavfi/gradfun: fix rounding in MMX code.
Current code divides before increasing precision.

Also reduce upper bound for strength from 255 to 64.  This will prevent
an overflow in the SSSE3 and MMX filter_line code: delta is expressed as
an u16 being shifted by 2 to the left. If it overflows, having a
strength not above 64 will make sure that m is set to 0 (making the
m*m*delta >> 14 expression void).

A value above 64 should not make any sense unless gradfun is used as
a blur filter.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-28 07:59:04 +01:00
James Darnley
c9a51c29fc yadif: remove an 'm' from the LOAD macro definition
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:49 +01:00
James Darnley
1d3b14cac2 yadif: remove repeated check on width
The filter already checks that width (and height) are greater than 3.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:30 +01:00
James Darnley
7976d92dac yadif: cosmetic indentation from previous commits
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:06 +01:00
James Darnley
0a5814c9ba yadif: x86 assembly for 9 to 14-bit samples
These smaller samples do not need to be unpacked to double words
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2).  It also avoids emulating the missing double word
instructions on older instruction sets.

Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.

Athlon64:
1809275 decicycles in C,    32718 runs, 50 skips
 911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster

Core2Quad:
 921363 decicycles in C,     32756 runs, 12 skips
 486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
 293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
 284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:54 +01:00
James Darnley
17e7b49501 yadif: x86 assembly for 16-bit samples
This is a fairly dumb copy of the assembly for 8-bit samples but it
works and produces identical output to the C version.  The options have
been tested on an Athlon64 and a Core2Quad.

Athlon64:
1810385 decicycles in C,    32726 runs, 42 skips
1080744 decicycles in mmx,  32744 runs, 24 skips, 1.7x faster
 818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster

Core2Quad:
 924025 decicycles in C,     32750 runs, 18 skips
 623995 decicycles in mmx,   32767 runs,  1 skips, 1.5x faster
 406223 decicycles in sse2,  32764 runs,  4 skips, 2.3x faster
 387842 decicycles in ssse3, 32767 runs,  1 skips, 2.4x faster
 307726 decicycles in sse4,  32763 runs,  5 skips, 3.0x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:34 +01:00
James Darnley
0735b50880 yadif: restore speed of the C filtering code
Always use the special filter for the first and last 3 columns (only).

Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was.  This commit restores the speed while maintaining identical output.

For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-13 22:07:25 +01:00
Loren Merritt
5b3c1aecb2 hqdn3d: Fix out of array read in LOWPASS
CC:libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-13 09:14:59 +01:00
Michael Niedermayer
446f7c62a2 Merge commit '64ed397635ef2666b0ca0c8d8c60a8bc44581d82'
* commit '64ed397635ef2666b0ca0c8d8c60a8bc44581d82':
  vf_yadif: fix out-of line reads

Conflicts:
	libavfilter/vf_yadif.c
	tests/ref/fate/filter-yadif-mode0
	tests/ref/fate/filter-yadif-mode1

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-16 09:09:38 +01:00
Anton Khirnov
64ed397635 vf_yadif: fix out-of line reads
Some changes in the border pixels, visually indistinguishable.
2013-02-15 16:08:33 +01:00
Michael Niedermayer
6e9f3f3b65 Merge commit '238614de679a71970c20d7c3fee08a322967ec40'
* commit '238614de679a71970c20d7c3fee08a322967ec40':
  cdgraphics: do not rely on get_buffer() initializing the frame.
  svq1: replace struct svq1_frame_size with an array.
  vf_yadif: silence a warning.

Conflicts:
	libavcodec/svq1dec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-06 14:53:49 +01:00
Anton Khirnov
99162f8d46 vf_yadif: silence a warning.
clang says:
libavfilter/vf_yadif.c:192:28: warning: incompatible pointer types assigning to
'void (*)(uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int)'
from 'void (uint16_t *, uint16_t *, uint16_t *, uint16_t *, int, int, int, int, int)'
2013-02-06 10:21:51 +01:00
Michael Niedermayer
0b6f34cc9f Merge remote-tracking branch 'qatar/master'
* qatar/master:
  avfilter: x86: consistent filenames for filter optimizations

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-05 11:56:48 +01:00
Diego Biurrun
e66240f22e avfilter: x86: consistent filenames for filter optimizations 2013-02-04 15:00:47 +01:00
Michael Niedermayer
d593f2b241 avfilter/x86/vf_hqdn3d_init: fix author attribution & project name
Reference: 7a1944b907

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-02 13:18:09 +01:00
Michael Niedermayer
0d13a7b786 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  vf_hqdn3d: x86: Add proper arch optimization initialization

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-02 13:18:01 +01:00
Diego Biurrun
76d90125cd vf_hqdn3d: x86: Add proper arch optimization initialization 2013-02-01 13:11:45 +01:00
Michael Niedermayer
329675cfd7 Merge commit 'a1c525f7eb0783d31ba7a653865b6cbd3dc880de'
* commit 'a1c525f7eb0783d31ba7a653865b6cbd3dc880de':
  pcx: return meaningful error codes.
  tmv: return meaningful error codes.
  msrle: return meaningful error codes.
  cscd: return meaningful error codes.
  yadif: x86: fix build for compilers without aligned stack
  lavc: introduce the convenience function init_get_bits8
  lavc: check for overflow in init_get_bits

Conflicts:
	libavcodec/cscd.c
	libavcodec/pcx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-14 14:43:32 +01:00
Daniel Kang
67360ccd51 yadif: x86: fix build for compilers without aligned stack
Manually load registers to avoid using 8 registers on x86_32 with
compilers that do not align the stack (e.g. MSVC).

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-14 09:51:52 +01:00
Michael Niedermayer
65b8527993 Merge commit 'f7bf72a4a1146a7583577c9bdc066767e1ba3c6a'
* commit 'f7bf72a4a1146a7583577c9bdc066767e1ba3c6a':
  idcinvideo: correctly set AVFrame defaults
  yadif: Port inline assembly to yasm
  au: remove unnecessary casts
  au: return AVERROR codes instead of -1

Conflicts:
	libavcodec/idcinvideo.c
	libavfilter/x86/yadif_template.c
	libavformat/au.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-10 12:27:16 +01:00
Daniel Kang
899157b308 yadif: Port inline assembly to yasm
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-09 18:41:02 +01:00
Clément Bœsch
63e1fc2588 lavfi/gradfun: remove rounding to match C and SSE code.
There is no noticable benefit for such precision.
2012-12-19 03:13:25 +01:00
Clément Bœsch
60ba9a9a88 lavfi/gradfun: fix dithering in MMX code.
Current dithering only use the first 4w instead of the whole 8 random values.
2012-12-19 03:13:25 +01:00
Clément Bœsch
49de902a1e lavfi/gradfun: fix rounding in MMX code.
Current code divide before increasing precision.
2012-12-19 03:13:25 +01:00
Carl Eugen Hoyos
24b20087bd Fix compilation with yasm 0.6.2. 2012-12-07 00:26:45 +01:00
Michael Niedermayer
54a71f2e6c Merge commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56'
* commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56':
  pixdesc: fix yuva 10bit bit depth
  avconv: deprecate the -vol option
  x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
  x86: af_volume: add SSE2-optimized s16 volume scaling

Conflicts:
	ffmpeg.c
	tests/ref/lavfi/pixdesc
	tests/ref/lavfi/pixfmts_copy
	tests/ref/lavfi/pixfmts_null
	tests/ref/lavfi/pixfmts_scale
	tests/ref/lavfi/pixfmts_vflip

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-06 15:55:47 +01:00
Justin Ruggles
b30a363331 x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling 2012-12-05 11:23:37 -05:00
Justin Ruggles
f96f1e06a4 x86: af_volume: add SSE2-optimized s16 volume scaling 2012-12-05 11:23:37 -05:00
Michael Niedermayer
add7513e64 Merge commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415'
* commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415':
  x86: h264_chromamc_10bit: drop pointless PAVG %define
  x86: mmx2 ---> mmxext in function names
  swscale: do not forget to swap data in formats with different endianness

Conflicts:
	libavcodec/x86/dsputil_mmx.c
	libavfilter/x86/gradfun.c
	libswscale/input.c
	libswscale/utils.c
	libswscale/x86/swscale.c
	tests/ref/lavfi/pixfmts_scale

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-01 13:11:51 +01:00
Diego Biurrun
d8eda37080 x86: mmx2 ---> mmxext in function names 2012-10-31 17:53:57 +01:00
Michael Niedermayer
9766d9c985 Merge commit '04581c8c77ce779e4e70684ac45302972766be0f'
* commit '04581c8c77ce779e4e70684ac45302972766be0f':
  x86: yasm: Use complete source path for macro helper %includes

Conflicts:
	Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:57:09 +01:00
Michael Niedermayer
3174616f59 Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
  x86: include x86inc.asm in x86util.asm
  cng: Reindent some incorrectly indented lines
  cngdec: Allow flushing the decoder
  cngdec: Make the dbov variable have the right unit
  cngdec: Fix the memset size to cover the full array
  cngdec: Update the LPC coefficients after averaging the reflection coefficients
  configure: fix print_config() with broke awks

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/dct32.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/fft.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_idct_10bit.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_intrapred_10bit.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Diego Biurrun
04581c8c77 x86: yasm: Use complete source path for macro helper %includes
This is more consistent with the way we handle C #includes and
it simplifies the build system.
2012-10-31 00:37:42 +01:00
Diego Biurrun
6860b4081d x86: include x86inc.asm in x86util.asm
This is necessary to allow refactoring some x86util macros with cpuflags.
2012-10-31 00:37:42 +01:00
Michael Niedermayer
3b0bb321a5 Merge commit 'f6c38c5f4ed6683a6a61db2ed418a68bbe5f5507'
* commit 'f6c38c5f4ed6683a6a61db2ed418a68bbe5f5507':
  avfilter: call x86 init functions under if (ARCH_X86), not if (HAVE_MMX)
  rtspdec: Set the default port for listen mode, if none is specified
  tscc2: Fix an out of array access
  rtmpproto: Fix an out of array write
  rtspdec: Fix use of uninitialized byte
  vp8: reset loopfilter delta values at keyframes.
  avutil: add yuva422p and yuva444p formats

Conflicts:
	libavutil/pixdesc.c
	libavutil/pixfmt.h
	tests/ref/lavfi/pixdesc
	tests/ref/lavfi/pixfmts_copy
	tests/ref/lavfi/pixfmts_null
	tests/ref/lavfi/pixfmts_scale
	tests/ref/lavfi/pixfmts_vflip

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-13 14:14:11 +02:00
Diego Biurrun
f6c38c5f4e avfilter: call x86 init functions under if (ARCH_X86), not if (HAVE_MMX) 2012-10-12 19:58:51 +02:00
Loren Merritt
1b1b902e2c hqdn3d: Fix out of array read in LOWPASS
Fixes ticket1752

Commit message by commiter
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-22 03:17:28 +02:00
Michael Niedermayer
c617bed34f Merge remote-tracking branch 'qatar/master'
* qatar/master:
  MSS1 and MSS2: set final pixel format after common stuff has been initialised
  MSS2 decoder
  configure: handle --disable-asm before check_deps
  x86: Split inline and external assembly #ifdefs
  configure: x86: Separate inline from standalone assembler capabilities
  pktdumper: Use a custom define instead of PATH_MAX for buffers
  pktdumper: Use av_strlcpy instead of strncpy
  pktdumper: Use sizeof(variable) instead of the direct buffer length

Conflicts:
	Changelog
	configure
	libavcodec/allcodecs.c
	libavcodec/avcodec.h
	libavcodec/codec_desc.c
	libavcodec/dct-test.c
	libavcodec/imgconvert.c
	libavcodec/mss12.c
	libavcodec/version.h
	libavfilter/x86/gradfun.c
	libswscale/x86/yuv2rgb.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-31 13:34:32 +02:00
Michael Niedermayer
98298eb103 Merge commit 'ec36aa69448f20a78d8c4588265022e0b2272ab5'
* commit 'ec36aa69448f20a78d8c4588265022e0b2272ab5':
  x86: Fix linking with some or all of yasm, mmx, optimizations disabled
  configure: Add more fine-grained SSE CPU capabilities flags
  avfilter: x86: Use more precise compile template names
  x86: cosmetics: Comment some #endifs for better readability
  g723_1: add comfort noise generation
  utvideoenc: Switch to dsputils' median prediction
  utvideoenc: Avoid writing into the input picture
  avtools: remove the distinction between func_arg and func2_arg.
  avconv: make the -passlogfile option per-stream.
  avconv: make the -pass option per-stream.
  cmdutils: make -codecs print lossy/lossless flags.
  lavc: add lossy/lossless codec properties.

Conflicts:
	Changelog
	cmdutils.c
	configure
	doc/APIchanges
	ffmpeg.h
	ffmpeg_opt.c
	ffprobe.c
	libavcodec/codec_desc.c
	libavcodec/g723_1.c
	libavcodec/utvideoenc.c
	libavcodec/version.h
	libavcodec/x86/mpegaudiodec.c
	libavcodec/x86/rv40dsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-31 13:01:30 +02:00
Diego Biurrun
17337f54c0 x86: Split inline and external assembly #ifdefs 2012-08-31 01:53:25 +02:00
Diego Biurrun
cdaec0b240 avfilter: x86: Use more precise compile template names 2012-08-30 18:51:51 +02:00
Michael Niedermayer
17106a7c90 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  audio_frame_queue: Clean up ff_af_queue_log_state debug function
  dwt: Remove unused code.
  cavs: convert cavsdata.h to a .c file
  cavs: Move inline functions only used in one file out of the header
  cavs: Move data tables used in only one place to that file
  fate: Add a single symbol Ut Video decoder test
  vf_hqdn3d: x86 asm
  vf_hqdn3d: support 16bit colordepth
  avconv: prefer user-forced input framerate when choosing output framerate

Conflicts:
	ffmpeg.c
	libavcodec/audio_frame_queue.c
	libavcodec/dwt.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-26 22:40:02 +02:00