Commit Graph

938 Commits

Author SHA1 Message Date
Diego Biurrun
b0de1c7663 x86: build: Only compile FDCT code if MMX is enabled
All other files containing purely inline assembly are treated the same way.
2014-07-05 04:18:34 -07:00
Diego Biurrun
12f129e545 x86: Unconditionally compile blockdsp and svq1enc init files
This avoids a link failure with MMX disabled as the init functions
are referenced unconditionally.
2014-07-05 04:18:34 -07:00
Diego Biurrun
009331303a x86: huffyuvdsp: Move inline assembly to init file
This avoids a link failure with MMX disabled as now code and
initialization are compiled under the same condition.
2014-07-05 04:18:34 -07:00
Diego Biurrun
391ecc961c x86: mpegvideoenc: Change SIMD optimization name suffixes to lowercase 2014-07-03 13:41:41 -07:00
Diego Biurrun
79793f8337 Update Fiona's name in copyright statements. 2014-07-01 03:26:51 -07:00
Diego Biurrun
e3fcb14347 dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
Diego Biurrun
d2869aea04 dsputil: Move MMX/SSE2-optimized IDCT bits to the x86 subdirectory 2014-06-26 16:15:07 -07:00
Diego Biurrun
5ab03e41e5 x86: h264dsp: Fix link failure with optimizations disabled
With optimzations disabled compilers have trouble doing dead code
elimination on 'if (foo && 0)' expressions, while 'if (0 && foo)'
still works, so use the latter to avoid problems.

Bug-Id: 707
2014-06-25 15:24:51 -07:00
Diego Biurrun
fab9df63a3 dsputil: Split off global motion compensation bits into a separate context 2014-06-23 09:58:17 -07:00
Diego Biurrun
c67b449beb dsputil: Split bswap*_buf() off into a separate context 2014-06-22 18:22:31 -07:00
Diego Biurrun
9a9e2f1c8a dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
Diego Biurrun
e74433a8e6 dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
Martin Storsjö
570d4b2186 x86: h264: Don't keep data in the redzone across function calls on 64 bit unix
We know that the called function (ff_chroma_inter_body_mmxext)
doesn't touch the redzone, and thus will be kept intact - thus,
this doesn't fix any bug per se.

However, valgrind's memcheck tool intentionally assumes that the
redzone is clobbered on every function call and function return
(see a long comment in valgrind/memcheck/mc_main.c). This avoids
false positives in that tool, at the cost of an extra stack pointer
adjustment.

The other alternative would be a valgrind suppression for this issue,
but that's an extra burden for everybody that wants to run libavcodec
within valgrind.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-06-10 16:31:48 +03:00
Diego Biurrun
368f50359e dsputil: Split off quarterpel bits into their own context 2014-05-29 06:48:31 -07:00
Diego Biurrun
054013a0fc dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
Diego Biurrun
65d5d58658 dsputil: Move SVQ1 encoding specific bits into svq1enc 2014-05-29 06:41:15 -07:00
Diego Biurrun
512f3ffe9b dsputil: Split off HuffYUV encoding bits into their own context
Also shorten HuffYUV context member names to avoid clutter.
2014-05-27 08:54:53 -07:00
Diego Biurrun
0d439fbede dsputil: Split off HuffYUV decoding bits into their own context
Also shorten HuffYUV context member names to avoid clutter.
2014-05-27 08:52:34 -07:00
James Almer
0f524b6c69 x86/synth_filter: remove the fma3 version ifdefs
This fixes compilation failures with --disable-fma3

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-13 11:29:28 +02:00
Timothy Gu
71c32ed533 DNxHD: convert inline asm to yasm 2014-04-11 12:09:09 +02:00
Timothy Gu
676856204b DNxHD: make get_pixel_8x4_sym accept ptrdiff_t as stride 2014-04-11 12:09:09 +02:00
Diego Biurrun
57b5b84e20 x86: dsputil: Move ff_apply_window_int16_* bits to ac3dsp, where they belong 2014-04-04 19:08:05 +02:00
Diego Biurrun
c2c5be5749 x86: h264_qpel: Simplify an #if conditional
The extra conditions are covered by previous #ifs and conditional compilation.
2014-04-04 19:08:05 +02:00
Diego Biurrun
01c5779f56 x86: Drop some unnecessary YASM ifdefs
Dead code elimination is enough to avoid undefined references in these cases.
2014-04-04 19:08:05 +02:00
Diego Biurrun
b42f49e42f x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes 2014-04-04 19:08:05 +02:00
Diego Biurrun
3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
James Almer
c74b86699c x86/synth_filter: add synth_filter_fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-04 17:40:51 +02:00
James Almer
81e02fae6e x86/synth_filter: add synth_filter_avx
Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx

Also switch some instructions to a three operand format to avoid
assembly errors with Yasm 1.1.0 or older.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-04 17:40:51 +02:00
James Almer
2025d8026f x86/synth_filter: add synth_filter_sse
Build only on x86_32 targets.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-04 17:40:51 +02:00
Peter Ross
ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Diego Biurrun
efc7290eb6 x86: hpeldsp: Keep all rnd_template instantiations in hpeldsp_init
There is no point in having a separate file just for the instantiation
that provides the public functions.
2014-03-26 04:31:27 -07:00
Diego Biurrun
aba70bb538 Add missing headers to make template files compile (more) standalone 2014-03-26 04:31:27 -07:00
Diego Biurrun
d0aabeab23 x86: h264_qpel: Fix typo in CALL_2X_PIXELS macro invocation
This fixes FATE with mmxext CPUFLAGS set.
2014-03-26 12:00:01 +01:00
Diego Biurrun
82dd1026cf x86: dsputil: Move hpeldsp-related declarations to a separate header 2014-03-22 06:17:29 -07:00
Diego Biurrun
6655c933a8 x86: dsputil: Move fpel declarations to a separate header 2014-03-22 06:17:29 -07:00
Diego Biurrun
322a1dda97 dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros 2014-03-22 06:17:29 -07:00
Diego Biurrun
600b854ad8 imgconvert: Move ff_deinterlace_line_*_mmx declarations out of dsputil 2014-03-22 06:17:29 -07:00
Diego Biurrun
1a8d0cf77e x86: dsputil: Move inline assembly macros to a separate header 2014-03-22 06:17:29 -07:00
Diego Biurrun
82bb304801 dsputil: Use correct type in me_cmp_func function pointer 2014-03-20 05:03:23 -07:00
Diego Biurrun
0e083d7e43 build: Group general components separate from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
2014-03-20 05:03:23 -07:00
Diego Biurrun
5169e68895 dsputil: Propagate bit depth information to all (sub)init functions
This avoids recalculating the value over and over again.
2014-03-20 05:03:23 -07:00
Diego Biurrun
db3f61a04f x86: dsputil_init: Drop some unnecessary parentheses 2014-03-13 08:15:51 -07:00
Diego Biurrun
441b093915 x86: dsputil_init: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
4cb4680c10 x86: dsputil_x86.h: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
f8bbebecfd x86: motion_est: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
a36947c167 dsputilenc_mmx: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
38675229a8 dsputil_mmx: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
6a8b35dc88 dsputilenc_mmx: Merge two assignment blocks with identical conditions 2014-03-13 08:15:51 -07:00
Diego Biurrun
55519926ef x86: Make function prototype comments in assembly code consistent
This helps grepping for functions, among other things.
2014-03-13 05:50:29 -07:00
Diego Biurrun
edd1f833fa x86: h264_idct_10_bit: Use proper type in function prototype comments 2014-03-13 05:50:29 -07:00
Diego Biurrun
831a118078 Update dsputil- and SIMD-related comments to match reality more closely 2014-03-13 05:50:29 -07:00
Diego Biurrun
17608f6ee3 x86: Add some more missing headers 2014-03-13 05:50:28 -07:00
Diego Biurrun
08dba0e1c3 x86: mpegvideoenc: Remove some remnants of the long-gone libmpeg2 IDCT 2014-03-13 05:50:28 -07:00
Diego Biurrun
3bfdee00cd x86: dcadsp: Fix linking with yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2014-03-05 23:16:21 +01:00
Diego Biurrun
3741aa37c2 x86: cabac: Use correct #includes to make header compile standalone 2014-03-05 13:32:25 +01:00
Christophe Gisquet
4cb6964244 dcadec: simplify decoding of VQ high frequencies
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.

Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have following timings:

For x86 Arrandale:
        C  SSE SSE2 SSE4
win32: 260 162  119  104
win64: 242 N/A   89   72

The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
2014-02-28 13:03:22 +01:00
Christophe Gisquet
08e3ea60ff x86: synth filter float: implement SSE2 version
Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.

Unrolling for ARCH_X86_64 is a 20 cycles gain.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:48 +01:00
Christophe Gisquet
ad507d7907 x86: dcadsp: implement SSE lfe_dir
Results for Arrandale/Windows:
32: 1670 -> 316
64:  728 -> 298

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:47 +01:00
Diego Biurrun
b23650491f prores: Use consistent names for DSP arch initialization functions 2014-02-28 10:34:55 +01:00
Diego Biurrun
017a06a9ee x86: dsputil: Use correct file name as multiple inclusion guard 2014-02-20 04:16:15 -08:00
Diego Biurrun
b23bc95920 x86: dca: Add missing multiple inclusion guards 2014-02-19 10:19:15 +01:00
Janne Grunau
5c1c6e8226 dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders 2014-02-08 13:38:36 +01:00
Janne Grunau
0cffd6fff5 x86: use the inline int8x8_fmul_int32 only if inline SSE2 is availbale
Fixes compilation with MSVC. Also does not rely on on earlier config.h
include but include it directly.
2014-02-08 12:10:56 +01:00
Christophe Gisquet
5b59a9fc61 x86: dcadsp: implement int8x8_fmul_int32
For the callable function (as opposed to the inline one):
         C  SSE  SSE2  SSE4
Win32:  47   42   29    26
Win64:  30   33   25    23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-07 22:52:40 +01:00
Ronald S. Bultje
9ee9c679a7 x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:33:23 +01:00
Ronald S. Bultje
51daafb02e x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:30:01 +01:00
Diego Biurrun
aab40bbfd5 x86: dsputil: Simplify xvmc deprecation conditional 2014-01-15 15:23:46 +01:00
Diego Biurrun
46bacb5cc6 x86: Consistently use cpu flag detection macros in places that still miss it 2014-01-14 00:04:58 +01:00
Diego Biurrun
4c642d8d98 x86: hpeldsp: Add missing av_cold attribute to init function 2014-01-09 15:09:07 +01:00
Diego Biurrun
b0be1ae792 x86: avcodec: Add a bunch of missing #includes for av_cold 2014-01-09 15:09:07 +01:00
Anton Khirnov
a03a642d5c h264: do not use 422 functions for monochrome
Fixes invalid memory access.

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
2014-01-06 08:25:36 +01:00
Anton Khirnov
dfc50ac85e x86: mpegvideo: move denoise_dct asm to mpegvideoenc
This function is encoding-only.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-12-20 17:16:11 +01:00
Diego Biurrun
4958f35a2e dsputil: Move apply_window_int16 to ac3dsp
The (optimized) functions are used nowhere else.
2013-12-08 17:57:15 +01:00
Diego Biurrun
3d7c84747d x86: Initialize mmxext after amd3dnow optimizations
The mmxext optimizations should be at least equally fast if available and
amd3dnow optimizations are being deprecated. Thus the former should
override the latter, not the other way around.
2013-12-04 18:52:48 +01:00
Diego Biurrun
7ffaa19570 dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo
The table is MMX-specific and used nowhere else.
2013-12-02 04:05:18 +01:00
Diego Biurrun
cf7860db60 x86: dsputil: Suppress deprecation warnings for XvMC bits
These parts are scheduled for removal on the next version bump.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2013-11-28 16:04:30 +01:00
Ronald S. Bultje
72ca830f51 lavc: VP9 decoder
Originally written by Ronald S. Bultje <rsbultje@gmail.com> and
Clément Bœsch <u@pkh.me>

Further contributions by:
Anton Khirnov <anton@khirnov.net>
Diego Biurrun <diego@biurrun.de>
Luca Barbato <lu_zero@gentoo.org>
Martin Storsjö <martin@martin.st>

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-11-15 10:16:28 +01:00
Ronald S. Bultje
458446acfa lavc: Edge emulation with dst/src linesize
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
2013-11-15 10:16:27 +01:00
Diego Biurrun
19e30a58fc Deprecate obsolete XvMC hardware decoding support
XvMC has long ago been superseded by newer acceleration APIs, such as
VDPAU, and few downstreams still support it. Furthermore XvMC is not
implemented within the hwaccel framework, but requires its own specific
code in the MPEG-1/2 decoder, which is a maintenance burden.
2013-11-13 21:07:45 +01:00
Diego Biurrun
0338c39698 dsputil: Split off H.263 bits into their own H263DSPContext 2013-11-08 12:40:47 +01:00
Diego Biurrun
e2b5b09789 x86: rv40dsp: Use PAVGB instruction macro where appropriate 2013-11-04 21:14:39 +01:00
Mikulas Patocka
694d997afe x86: hpeldsp: Use PAVGB instruction macro where necessary
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-11-04 01:29:23 +01:00
Diego Biurrun
1700b4e678 x86: vp8dsp: Split loopfilter code into a separate file 2013-11-01 22:05:20 +01:00
Diego Biurrun
6405ca7d4a x86: h264_idct: Update comments to match 8/10-bit depth optimization split 2013-10-07 21:46:46 +02:00
Henrik Gramner
bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Diego Biurrun
ce1e8045e0 x86: fdct: Employ more specific ifdefs
This avoids building mmxext and sse2 code when disabled by configure.
2013-10-06 22:02:25 +02:00
Diego Biurrun
2ddb35b911 x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx
The function does not depend on MMX and compilation without MMX enabled
fails if the function is compiled conditional on MMX availability.
2013-10-05 19:21:15 +02:00
Diego Biurrun
258414d077 x86: fdct: Initialize optimized fdct implementations in the standard way 2013-10-05 18:20:52 +02:00
Diego Biurrun
0b8b2ae5e9 x86: xviddct: Employ more specific ifdefs
This avoids building mmxext and sse2 code when disabled by configure.
2013-10-05 18:14:58 +02:00
Diego Biurrun
6cc133ec58 x86: fdct: Only build fdct code if encoders have been enabled
fdct is only initialized if encoders are enabled.
2013-10-04 10:50:44 +02:00
Martin Storsjö
1daea5232f x86: Add an xmm clobbering wrapper for avcodec_encode_video2
This is required since 187105ff8 when we started trying to
wrap this function as well.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-16 22:22:41 +03:00
Hendrik Leppkes
a06a5b78e2 mathops/x86: work around inline asm miscompilation with GCC 4.8.1
The volatile is not required here, and prevents a miscompilation with GCC
4.8.1 when building on x86 with --cpu=i686

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-09-15 11:15:07 -04:00
Diego Biurrun
e998b56362 x86: avcodec: Consistently structure CPU extension initialization 2013-08-29 13:07:37 +02:00
Diego Biurrun
6369ba3c9c x86: avcodec: Use convenience macros to check for CPU flags 2013-08-29 13:07:37 +02:00
Diego Biurrun
cd52917237 x86: rv40dsp: Move inline assembly optimizations out of YASM init section 2013-08-28 23:59:24 +02:00
Diego Biurrun
a64f6a04ac dsputil: x86: Hide arch-specific initialization details
Also give consistent names to init functions.
2013-08-28 23:59:24 +02:00
Diego Biurrun
8506ff97c9 vp56: Mark VP6-only optimizations as such.
Most of our VP56 optimizations are VP6-only and will stay that way.
So avoid compiling them for VP5-only builds.
2013-08-23 14:42:19 +02:00
Diego Biurrun
e7b31844f6 x86: Split DCT and FFT initialization into separate files 2013-08-21 20:15:27 +02:00
Diego Biurrun
0b45269c2d x86: h264_idct: Remove incorrect comment 2013-08-21 15:09:58 +02:00
Diego Biurrun
3ac7fa81b2 Consistently use "cpu_flags" as variable/parameter name for CPU flags 2013-07-18 00:31:35 +02:00