Commit Graph

50 Commits

Author SHA1 Message Date
Loren Merritt
b7d0d10a1d x86inc: Speed up assembling with Yasm
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-01-26 18:40:08 +01:00
Jason Garrett-Glaser
a3fabc6cb3 x86: more AVX2 framework
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser
c6908d6b4b x86inc: FMA3/4 Support
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis
206895708e x86inc: Remove our FMA4 support
This is so we can sync to x264's version of FMA4 support.

This partially reverts commit 79687079a9.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner
c108ba0175 x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.

This change makes it easier to extend existing code to use AVX2.

Also add support for AVX emulation of a few instructions that
were missing before.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Henrik Gramner
ad7d7d4f6a x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Henrik Gramner
3e2fa991db x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register, so removing those functions
lets us get rid of that complexity altogether.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541 x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54 x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing both the instruction count and
code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0 x86inc: create xm# and ym#, analogous to m#
For when we want to mix simd sizes within one function.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args, and
PERMUTE followed by SWAP with 2 named args, used to
produce the wrong permutation.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner
63f0d62310 x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner
ad76e6e7e1 x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.

The implementation involves lots of spurious labels, but that's OK
because we strip them.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Christophe Gisquet
2e81acc687 x86inc: Fix number of operands for cmp* instructions
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Ronald S. Bultje
0c0828ecc5 x86: Use simple nop codes for <= sse (rather than <= mmx)
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-19 22:33:19 +02:00
Diego Biurrun
d633d12b2c x86inc: Add cvisible macro for C functions with public prefix
This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:03 +01:00
Diego Biurrun
ef5d41a553 x86inc: Rename "program_name" to "private_prefix"
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 20:29:53 +01:00
Ronald S. Bultje
a34d9ad969 lavc: merge latest x86inc.asm fixes with x264
Unbreak NASM support.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-19 07:27:33 +01:00
Janne Grunau
0995ad8db4 x86inc: fully concatenate tokens to fix macro expansion for nasm
Fixes build errors with nasm introduced in 6f40e9f070 (stack
memory alignment support). Noticed by BugMaster.
2012-12-13 23:57:09 +01:00
Ronald S. Bultje
140367aff9 x86inc: fix stack alignment on win64
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-12-12 21:30:49 +02:00
Ronald S. Bultje
6f40e9f070 x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Justin Ruggles
b30a363331 x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
2012-12-05 11:23:37 -05:00
Diego Biurrun
f0d124f005 x86inc: Set program_name outside of x86inc.asm
This reduces the local difference to the x264 upstream version.
2012-11-11 11:06:19 +01:00
Diego Biurrun
012f73e271 x86inc: Only define program_name if the macro is unset
This allows overriding the value from outside of the file.
2012-11-02 14:38:00 +01:00
Ronald S. Bultje
08b028c18d Remove INIT_AVX from x86inc.asm.
2012-10-29 14:51:14 -07:00
Loren Merritt
7a1944b907 vf_hqdn3d: x86 asm
13% faster on Penryn, 16% on Sandy Bridge, 15% on Bulldozer.
Not SIMD; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Mans Rullgard
edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard
180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard
7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Diego Biurrun
ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", the more common name for the CPU flag (as reported, e.g.,
by the Linux kernel), unifies this.
2012-08-03 14:00:47 +02:00
Loren Merritt
f8d8fe255d x86inc: clip num_args to 7 on x86-32.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, making yasm code even easier
to write than before.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje
96c9cc1094 x86inc: sync to latest version from x264.
2012-07-28 08:29:44 -07:00
Justin Ruggles
79687079a9 x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
2012-07-27 11:25:48 -04:00
Ronald S. Bultje
30b45d9c38 x86inc: automatically insert vzeroupper for YMM functions.
2012-07-26 13:43:16 -07:00
Loren Merritt
2cd1f5cadc x86inc: modify ALIGN to not generate long nops on i586
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Loren Merritt
705f3d4759 x86inc: support AVX abstraction for 2-operand instructions
Add cvtdq2ps and cvtps2dq to the AVX instruction list.

Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-18 21:14:32 -04:00
Henrik Gramner
729f90e268 x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved registers over callee-saved ones on WIN64
Support up to 15 function arguments

Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-11 15:47:00 -04:00
Ronald S. Bultje
98b9da2ac7 x86inc: add *mp named argument support to DEFINE_ARGS.
2012-03-14 20:09:53 -07:00
Loren Merritt
0f53d0cf4b x86inc: don't "bake" stack_offset in named arguments.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-03 20:39:59 -08:00
Haruhiko Yamagata
166f399377 x86inc: support yasm -f win64 flag also.
This sets __OUTPUT_FORMAT__ to win64 instead of win32, even though both
(through -m amd64) produce 64-bit binary code.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-08 10:31:14 -08:00
Henrik Gramner
9cf7385309 x86inc: allow manual use of WIN64_SPILL_XMM.
Functions using INIT_MMX may still access XMM registers through direct
means (xmm0-15). Therefore, they still need to be marked for clobber
so they can be properly saved/restored.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-08 10:31:14 -08:00
Ronald S. Bultje
412b248edb x86inc.asm: fix typo.
Assemblers don't understand ! in %if statements.
2012-01-27 16:33:03 +08:00
Ronald S. Bultje
3b15a6d742 config.asm: change %ifdef directives to %if directives.
This allows combining multiple conditionals in a single statement.
2012-01-27 10:19:57 +08:00
Vitor Sessak
39df0c434c mpegaudiodec: optimized iMDCT transform
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-01-08 17:40:55 -08:00
Vitor Sessak
6b6ee58249 x86inc: Flag shufps as a floating-point instruction for the AVX emulation code.
Without this, code like "shufps m0, m1, m2, 0xaa" would not work on CPUs
that do not support SSE2.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-11-27 13:10:33 -08:00
Justin Ruggles
f2bd8a0786 x86inc: use sse versions of common macros instead of sse2 when applicable
2011-11-06 19:14:13 -05:00
Loren Merritt
2f7f2e4b41 Update x86inc.asm to latest x264 version, and add AVX symmetry.
We keep INIT_AVX (for backwards compatibility). 3-operand AVX ops with
a memory operand can only have it in src2, whereas the SSE emulation of
3-operand ops prefers to have it in src1 (i.e. the mov). So, if the op
is symmetric and the wrong operand is in memory, swap them.
2011-11-05 20:48:14 -07:00
Ronald S. Bultje
3a39195b1d Move x86inc.asm to libavutil/.
This allows using it in libswscale/ also.
2011-08-12 11:43:02 -07:00