Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This fixes crashes in chromium on win64 on machines with AVX
(crashes that apparently aren't triggered by fate).
Signed-off-by: Martin Storsjö <martin@martin.st>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.
Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk