Loren Merritt
25cb0c1a1e
x86inc: activate REP_RET automatically
...
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Alex Smith
08fa828b3f
avutil: Fix compilation with inline asm disabled on mingw
...
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-22 00:50:32 +03:00
Diego Biurrun
79aec43ce8
x86: Add and use more convenience macros to check CPU extension availability
2013-08-29 13:07:37 +02:00
Diego Biurrun
8410d6e93c
avutil: Refactor CPU extension availability macros
2013-08-28 23:54:14 +02:00
Diego Biurrun
b78b10c4b7
avutil: Move internal CPU detection function declarations to private header
2013-08-28 23:54:14 +02:00
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
2013-07-18 00:31:35 +02:00
Loren Merritt
c8b920a9b7
lls/x86: use 3-operator vaddpd in ADDPD_MEM
...
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-07-02 10:15:09 +02:00
Loren Merritt
1221bb6239
x86: lpc: fix a segfault in av_evaluate_lls_sse2()
2013-06-30 23:11:19 +00:00
Loren Merritt
b545179fdf
x86: lpc: simd av_evaluate_lls
...
1.5x-1.8x faster on sandybridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Loren Merritt
502ab21af0
x86: lpc: simd av_update_lls
...
4x-6x faster on sandybridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Diego Biurrun
1fda184a85
avutil: Add av_cold attributes to init functions missing them
2013-05-04 22:48:05 +02:00
Christophe Gisquet
566b7a20fd
x86: float dsp: butterflies_float SSE
...
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
2013-05-03 08:08:02 +02:00
Ronald S. Bultje
b93b27edb0
dsputil: Make dsputil selectable
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:04:05 +03:00
Christophe Gisquet
2e81acc687
x86inc: Fix number of operands for cmp* instructions
...
cmp{p,s}{s,d} instructions do take an imm8 operand.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Diego Biurrun
b6649ab503
cosmetics: Remove unnecessary extern keywords from function declarations
2013-03-27 14:21:45 +01:00
Ronald S. Bultje
0c0828ecc5
x86: Use simple nop codes for <= sse (rather than <= mmx)
...
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-19 22:33:19 +02:00
Diego Biurrun
4db96649ca
avutil: Ensure that emms_c is always defined, even on non-x86
2013-02-14 19:29:04 +01:00
Diego Biurrun
ab441e20ff
avutil: Move emms code to x86-specific header
2013-02-14 17:37:34 +01:00
Ronald S. Bultje
d56668bd80
floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
...
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
42d3246948
floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
...
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
55aa03b9f8
floatdsp: move vector_fmul_add from dsputil to avfloatdsp.
2013-01-22 11:55:42 -08:00
Martin Storsjö
f4facd2ce7
x86: Add a Yasm-based emms() replacement
...
This provides a fallback when building with Yasm enabled, but neither
inline assembly, nor the _mm_empty intrinsic are available or enabled.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:13 +01:00
Diego Biurrun
d633d12b2c
x86inc: Add cvisible macro for C functions with public prefix
...
This allows defining externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:03 +01:00
Diego Biurrun
ef5d41a553
x86inc: Rename "program_name" to "private_prefix"
...
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 20:29:53 +01:00
Martin Storsjö
973b4d44f1
float_dsp: Add #ifdef HAVE_INLINE_ASM around vector_fmul_window
...
This fixes builds on 64bit MSVC.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-01-17 19:07:35 +02:00
Justin Ruggles
e034cc6c60
lavc: Move vector_fmul_window to AVFloatDSPContext
...
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Diego Biurrun
dae1d507af
x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
2013-01-15 17:29:43 +01:00
Diego Biurrun
320e1d0df3
x86: ABSB2: port to cpuflags
2013-01-15 11:18:51 +01:00
Diego Biurrun
094a7405e5
x86: ABSB: port to cpuflags
2013-01-15 11:18:51 +01:00
Diego Biurrun
51969a652c
x86: ABS2: port to cpuflags
2013-01-14 21:56:55 +01:00
Diego Biurrun
5b4dfbffc2
x86: ABS1: port to cpuflags
2013-01-06 13:57:01 +01:00
Ronald S. Bultje
a34d9ad969
lavc: merge latest x86inc.asm fixes with x264
...
Unbreak NASM support.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-19 07:27:33 +01:00
Janne Grunau
0995ad8db4
x86inc: fully concatenate tokens to fix macro expansion for nasm
...
Fixes build errors with nasm introduced in 6f40e9f070
for stack
memory alignment. Noticed by BugMaster.
2012-12-13 23:57:09 +01:00
Ronald S. Bultje
140367aff9
x86inc: fix stack alignment on win64
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-12-12 21:30:49 +02:00
Ronald S. Bultje
6f40e9f070
x86inc: support stack mem allocation and re-alignment in PROLOGUE
...
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Justin Ruggles
1c012e6bfb
x86: float_dsp: fix loading of the len parameter on x86-32
2012-12-07 21:19:29 -05:00
Justin Ruggles
ecc8b02194
x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32
...
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-12-06 14:11:15 +01:00
Justin Ruggles
b30a363331
x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
2012-12-05 11:23:37 -05:00
Justin Ruggles
ac7eb4cb20
float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
...
Include x86-optimized versions for SSE2 and AVX.
2012-12-05 11:23:36 -05:00
Diego Biurrun
490df522c7
x86: cpu: Drop unused HAVE_RWEFLAGS condition
...
The test for rweflags was dropped in a previous commit.
2012-11-28 00:28:09 +01:00
Justin Ruggles
947f933687
x86: float_dsp: add SSE version of vector_fmul_scalar()
2012-11-26 11:30:19 -05:00
Diego Biurrun
87af05c575
x86: SPLATD: port to cpuflags
2012-11-18 18:34:05 +01:00
Diego Biurrun
26301caaa1
x86: mmx2 ---> mmxext in asm constructs
2012-11-14 00:58:51 +01:00
Diego Biurrun
2b479bcab0
build: Drop AVX assembly ifdefs
...
An assembler able to cope with AVX instructions is now required.
2012-11-11 20:43:28 +01:00
Diego Biurrun
f0d124f005
x86inc: Set program_name outside of x86inc.asm
...
This reduces the local difference to the x264 upstream version.
2012-11-11 11:06:19 +01:00
Diego Biurrun
4b60fac419
x86: PALIGNR: port to cpuflags
2012-11-09 21:31:31 +01:00
Diego Biurrun
dbb37e7711
x86: PABSW: port to cpuflags
2012-11-05 14:51:10 +01:00
Diego Biurrun
0a7a94f2e5
x86: Refactor PSWAPD fallback implementations and port to cpuflags
2012-11-02 17:05:29 +01:00
Diego Biurrun
26f01bd106
x86: PMINUB: port to cpuflags
2012-11-02 15:38:15 +01:00
Diego Biurrun
61bc2bc7d4
x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
...
"mmxext" is a more sensible name and more common in outside projects.
2012-11-02 15:22:34 +01:00
Diego Biurrun
012f73e271
x86inc: Only define program_name if the macro is unset
...
This allows overriding the value from outside of the file.
2012-11-02 14:38:00 +01:00
Dave Yeo
9c167914a1
x86: Fix assembly with NASM
...
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-10-31 10:20:35 +01:00
Diego Biurrun
588fafe7f3
x86: MMX2 ---> MMXEXT in macro names
2012-10-31 01:04:55 +01:00
Diego Biurrun
6860b4081d
x86: include x86inc.asm in x86util.asm
...
This is necessary to allow refactoring some x86util macros with cpuflags.
2012-10-31 00:37:42 +01:00
Ronald S. Bultje
08b028c18d
Remove INIT_AVX from x86inc.asm.
2012-10-29 14:51:14 -07:00
Diego Biurrun
a7329e5fc2
x86: get_cpu_flags: add necessary ifdefs around function body
...
ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined
elsewhere in the file. Surrounding the function body with ifdefs allows
building even when cpuid is not defined. An empty cpuflags mask is
returned in this case.
2012-10-04 19:29:14 +02:00
Diego Biurrun
f6fbce761e
x86: Drop CPU detection intrinsics
...
Now that there is CPU detection in YASM, there will always be one of
inline or external assembly enabled, which obviates the need to fall
back on CPU detection through compiler intrinsics.
2012-10-04 19:29:14 +02:00
Diego Biurrun
1f6d86991f
x86: Add YASM implementations of cpuid and xgetbv from x264
...
This allows detecting CPU features with builds that have neither
gcc inline assembly nor the right compiler intrinsics enabled.
2012-10-04 19:29:14 +02:00
Diego Biurrun
54b243141e
x86: cpu: Break out test for cpuid capabilities into separate function
2012-10-04 18:09:21 +02:00
Diego Biurrun
cc5e9e5ff0
x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection
2012-10-04 17:58:42 +02:00
Diego Biurrun
e0c6cce447
x86: Replace checks for CPU extensions and flags by convenience macros
...
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
2012-09-08 18:18:34 +02:00
Justin Ruggles
7327525997
x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
...
The SWAP macro does not work for explicit xmm/ymm usage, so instead just move
the scalar value from xmm2 to xmm0.
2012-09-07 14:49:10 -04:00
Diego Biurrun
f82c4fb27f
x86: Add convenience macros to check for CPU extensions and flags
2012-09-04 01:44:59 +02:00
Diego Biurrun
17337f54c0
x86: Split inline and external assembly #ifdefs
2012-08-31 01:53:25 +02:00
Diego Biurrun
a886b279a0
x86: cosmetics: Comment some #endifs for better readability
2012-08-30 18:50:33 +02:00
Loren Merritt
7a1944b907
vf_hqdn3d: x86 asm
...
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Justin Ruggles
6092dafb5a
lavr: x86: optimized 6-channel s16 to fltp conversion
2012-08-23 20:10:57 -04:00
Mans Rullgard
5b170c0bea
x86: remove FASTDIV inline asm
...
GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-22 14:29:10 +01:00
Martin Storsjö
33e112847d
Add more missing includes after removing the implicit common.h
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-16 10:49:54 +03:00
Martin Storsjö
70766c2182
Add some more missing includes after removing the implicit common.h
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-15 23:48:48 +03:00
Mans Rullgard
070a402b60
x86: move MANGLE() and related macros to libavutil/x86/asm.h
...
These x86-specific macros do not belong in generic code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard
c318626ce2
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
...
This puts x86-specific things in the x86/ subdirectory where they
belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard
edd8226795
x86: fix build with nasm 2.08
...
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro. This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard
180d43bc67
x86: use nop cpu directives only if supported
...
nasm does not support 'CPU foonop' directives. This adds a configure
test for the directive and uses it only if supported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard
7238265052
x86: fix rNmp macros with nasm
...
For some reason, nasm requires this. No harm done to yasm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Mans Rullgard
a3df4781f4
x86: add colons after labels
...
nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:20:56 +01:00
Diego Biurrun
239fdf1b4a
x86: build: replace mmx2 by mmxext
...
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Diego Biurrun
ca844b7be9
x86: Use consistent 3dnowext function and macro name suffixes
...
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Loren Merritt
f8d8fe255d
x86inc: clip num_args to 7 on x86-32.
...
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje
96c9cc1094
x86inc: sync to latest version from x264.
2012-07-28 08:29:44 -07:00
Justin Ruggles
79687079a9
x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
2012-07-27 11:25:48 -04:00
Ronald S. Bultje
30b45d9c38
x86inc: automatically insert vzeroupper for YMM functions.
2012-07-26 13:43:16 -07:00
Jason Garrett-Glaser
85a3c19ed1
dsputil: x86: add SHUFFLE_MASK_W macro
...
Simplifies pshufb masks that operate on words.
2012-07-22 16:56:58 -04:00
Ronald S. Bultje
358d854df8
x86/cpu: implement get/set_eflags using intrinsics
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:32 +03:00
Ronald S. Bultje
c0ee695bd7
x86/cpu: implement support for cpuid through intrinsics
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:24 +03:00
Ronald S. Bultje
3f150ffba3
x86/cpu: implement support for xgetbv through intrinsics
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:17 +03:00
Ronald S. Bultje
07b287020c
x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME).
2012-07-07 13:35:07 -07:00
Loren Merritt
4d4752366f
x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Loren Merritt
2cd1f5cadc
x86inc: modify ALIGN to not generate long nops on i586
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Mans Rullgard
889c1ec4cc
x86: cpu: clean up check for cpuid instruction support
...
This adds macros for accessing the EFLAGS register and uses
these instead of coding the entire check in inline asm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-01 12:25:33 +01:00
Mans Rullgard
963cdf39b4
x86: cpu: whitespace (mostly) cosmetics
...
This adds whitespace around operators, aligns line continuation
backslashes, and breaks long lines. Also fixes an ifdef halfway
through a statement. The one line of duplication this saved is
not worth the ugliness.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-06-25 16:24:31 +01:00
Ronald S. Bultje
8123e0901f
x86: place some inline asm under #if HAVE_INLINE_ASM
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-06-25 13:23:12 +01:00
Diego Biurrun
65345a5a30
x86: Add CPU flag for the i686 cmov instruction
2012-06-23 16:21:50 +02:00
Justin Ruggles
82b2df9790
float_dsp: add x86-optimized functions for vector_fmac_scalar()
2012-06-18 18:01:14 -04:00
Justin Ruggles
d5a7229ba4
Add a float DSP framework to libavutil
...
Move vector_fmul() from DSPContext to AVFloatDSPContext.
2012-06-08 13:14:38 -04:00
Vitor Sessak
4a301706fd
x86: Avoid movs on BUTTERFLYPS when in AVX mode
...
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-05-29 15:29:46 +02:00
Justin Ruggles
5cc6d5244d
lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX
...
The current SSE version is slower than the MMX version on Athlon64 and Sandy
Bridge, but the SSE4 and AVX versions are faster on Sandy Bridge.
2012-05-09 16:17:59 -04:00
Justin Ruggles
c8af852b97
Add libavresample
...
This is a new library for audio sample format, channel layout, and sample rate
conversion.
2012-04-24 21:28:27 -04:00
Loren Merritt
705f3d4759
x86inc: support AVX abstraction for 2-operand instructions
...
Add cvtdq2ps and cvtps2dq to the AVX instruction list.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-18 21:14:32 -04:00
Diego Biurrun
baaab6069a
build: Move all arch OBJS declarations into arch subdirectory Makefiles.
2012-04-12 21:30:13 +02:00