Commit Graph

62 Commits

Author SHA1 Message Date
Henrik Gramner
44b4444120 x86inc: Various minor backports from x264
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-13 07:46:24 +02:00
Henrik Gramner
ab43beefab x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:12:01 +02:00
Henrik Gramner
1c6bb81328 x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:11:27 +02:00
Christophe Gisquet
f5e486f6f8 x86inc: Fix instantiation of YMM registers
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:09:08 +02:00
Anton Mitrofanov
b114d28a18 x86inc: warn when instructions incompatible with current cpuflags are used
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:07:18 +02:00
Henrik Gramner
9f1245eb96 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:04:11 +02:00
Anton Mitrofanov
8c75ba55a4 x86inc: warn if XOP integer FMA instruction emulation is impossible
Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.

Also add pmacsdql emulation.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:02:27 +02:00
Timothy Gu
dd4d709be7 x86inc: Clear __SECT__
Silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on
    section redeclaration

The cause of this warning is that because `struc` and `endstruc`
attempts to revert to the previous section state [1].

The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].

Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-28 11:40:15 +02:00
Henrik Gramner
f629705b02 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:25 -07:00
Loren Merritt
ec217218c2 x86inc: Free up variable name "n" in global namespace
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:19 -07:00
Henrik Gramner
176a0fca3f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 01:45:14 -07:00
Diego Biurrun
79793f8337 Update Fiona's name in copyright statements. 2014-07-01 03:26:51 -07:00
Loren Merritt
b7d0d10a1d x86inc: Speed up assembling with Yasm
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-01-26 18:40:08 +01:00
Jason Garrett-Glaser
a3fabc6cb3 x86: more AVX2 framework
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser
c6908d6b4b x86inc: FMA3/4 Support
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis
206895708e x86inc: Remove our FMA4 support
This is so we can sync to x264's version of FMA4 support.

This partialy reverts commit 79687079a9.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner
c108ba0175 x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.

This change makes it easier to extend existing code to use AVX2.

Also add support for AVX emulation of a few instructions that
were missing before.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Henrik Gramner
ad7d7d4f6a x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Henrik Gramner
3e2fa991db x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541 x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54 x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0 x86inc: create xm# and ym#, analagous to m#
For when we want to mix simd sizes within one function.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner
63f0d62310 x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner
ad76e6e7e1 x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.

The implementation involves lots of spurious labels, but that's OK
because we strip them.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Christophe Gisquet
2e81acc687 x86inc: Fix number of operands for cmp* instructions
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Ronald S. Bultje
0c0828ecc5 x86: Use simple nop codes for <= sse (rather than <= mmx)
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-19 22:33:19 +02:00
Diego Biurrun
d633d12b2c x86inc: Add cvisible macro for C functions with public prefix
This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:03 +01:00
Diego Biurrun
ef5d41a553 x86inc: Rename "program_name" to "private_prefix"
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 20:29:53 +01:00
Ronald S. Bultje
a34d9ad969 lavc: merge latest x86inc.asm fixes with x264
Unbreak NASM support.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-19 07:27:33 +01:00
Janne Grunau
0995ad8db4 x86inc: fully concatenate tokens to fix macro expansion for nasm
Fixes build errors with nasm introduced in 6f40e9f070 for stack
memory alignment. Noticed by BugMaster.
2012-12-13 23:57:09 +01:00
Ronald S. Bultje
140367aff9 x86inc: fix stack alignment on win64
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-12-12 21:30:49 +02:00
Ronald S. Bultje
6f40e9f070 x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Justin Ruggles
b30a363331 x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling 2012-12-05 11:23:37 -05:00
Diego Biurrun
f0d124f005 x86inc: Set program_name outside of x86inc.asm
This reduces the local difference to the x264 upstream version.
2012-11-11 11:06:19 +01:00
Diego Biurrun
012f73e271 x86inc: Only define program_name if the macro is unset
This allows overriding the value from outside of the file.
2012-11-02 14:38:00 +01:00
Ronald S. Bultje
08b028c18d Remove INIT_AVX from x86inc.asm. 2012-10-29 14:51:14 -07:00
Loren Merritt
7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Mans Rullgard
edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard
180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard
7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Diego Biurrun
ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Loren Merritt
f8d8fe255d x86inc: clip num_args to 7 on x86-32.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje
96c9cc1094 x86inc: sync to latest version from x264. 2012-07-28 08:29:44 -07:00
Justin Ruggles
79687079a9 x86: add support for fmaddps fma4 instruction with abstraction to avx/sse 2012-07-27 11:25:48 -04:00
Ronald S. Bultje
30b45d9c38 x86inc: automatically insert vzeroupper for YMM functions. 2012-07-26 13:43:16 -07:00
Loren Merritt
2cd1f5cadc x86inc: modify ALIGN to not generate long nops on i586
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Loren Merritt
705f3d4759 x86inc: support AVX abstraction for 2-operand instructions
Add cvtdq2ps and cvtps2dq to the AVX instruction list.

Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-18 21:14:32 -04:00