Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
BT.709 coefficients were gathered from the first two parts of BT.709
to BT.2020 conversion guide in ARIB STD-B62 (Pt. 1, Chapter 6.2.2).
They were additionally confirmed by manually calculating values.
The h264/hevc Annex E colour primaries table says that AVCOL_SPC_SMPTE170M is
similar than AVCOL_SPC_SMPTE240M. These two values are not similar than
AVCOL_SPC_BT470BG.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.
Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
When allocating stack space with a larger alignment than the known stack
alignment a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.
Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This fixes builds with --disable-vfp.
Checking for the armv6 cpu flag is incorrect, since vfpv2 isn't
armv6 specific.
Signed-off-by: Martin Storsjö <martin@martin.st>
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.
Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.