This is so we can sync to x264's version of FMA4 support.
This partialy reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
x86inc: Utilize the shadow space on 64-bit Windows
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
x86inc: Use SSE instead of SSE2 for copying data
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
x86inc: Set ELF hidden visibility for global constants
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '79aec43ce813a3e270743ca64fa3f31fa43df80b':
x86: Add and use more convenience macros to check CPU extension availability
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b':
avutil: Move internal CPU detection function declarations to private header
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Consistently use "cpu_flags" as variable/parameter name for CPU flags
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideo.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '502ab21af0ca68f76d6112722c46d2f35c004053':
x86: lpc: simd av_update_lls
The versions are bumped due to changes in lls.h which is used across
libraries affecting intra library ABI
(This version bump also covers changes to lls.h in the immedeatly previous
commits)
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
internal.h doesn't need to include cpu.h anymore since
the relevant code was moved to x86/emms.h
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '4db96649ca700db563d9da4ebe70bf9fc4c7a6ba':
avutil: Ensure that emms_c is always defined, even on non-x86
configure: Move MinGW CPPFLAGS setting to libc section, where it belongs
avutil: Move emms code to x86-specific header
Conflicts:
configure
libavutil/internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The "CPU: CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Change-Id: I7e7c52a2191006df30a9aadbc40d481a1db89106
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
* qatar/master:
x86: dsputil: Drop some unused macro definitions
x86: Add a Yasm-based emms() replacement
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d':
x86inc: Add cvisible macro for C functions with public prefix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
x86inc: Rename "program_name" to "private_prefix"
configure: Run SHFLAGS through ldflags_filter()
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This provides a fallback when building with Yasm enabled, but neither
inline assembly, nor the _mm_empty intrinsic are available or enabled.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
lavc: Move vector_fmul_window to AVFloatDSPContext
rtpdec_mpeg4: Check the remaining amount of data before reading
Conflicts:
libavcodec/dsputil.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'dae1d507af94261bafd3b11549884e5d1eca590e':
x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
vf_fps: add final flushed frames to the dropped frame count
rv34_parser: Adjust #if for disabling individual parsers
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '094a7405e5d8463d7d167d893e04934ec1a84ecd':
x86: ABSB: port to cpuflags
sdp: Include SRTP crypto params if using the srtp protocol
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd8c772de53d29afb1bada88afa859fce8489c668':
nutdec: Always return a value from nut_read_timestamp()
configure: Make warnings from -Wreturn-type fatal errors
x86: ABS2: port to cpuflags
vdpau: Remove av_unused attribute from function declaration
h264: fix ff_generate_sliding_window_mmcos() prototype.
Conflicts:
configure
libavformat/nutdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0':
x86: ABS1: port to cpuflags
v210x: cosmetics, reformat
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
aacdec: Fix an off-by-one overwrite when switching to LTP profile from MAIN.
x86inc: fix stack alignment on win64
rtpproto: Remove unused defines
Conflicts:
libavcodec/aacdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* qatar/master:
golomb: use unsigned arithmetics in svq3_get_ue_golomb()
x86: float_dsp: fix loading of the len parameter on x86-32
takdec: fix initialisation of LOCAL_ALIGNED array
takdec: fix initialisation of LOCAL_ALIGNED array
Conflicts:
libavcodec/rv30.c
libavcodec/svq3.c
libavcodec/takdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9d5c62ba5b586c80af508b5914934b1c439f6652':
lavu/opt: do not filter out the initial sign character except for flags
eval: treat dB as decibels instead of decibytes
float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
Conflicts:
libavutil/eval.c
tests/ref/fate/eval
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'af7d13ee4a4bf8d708f9b0598abb8f6e22b76de1':
asink_nullsink: plug a memory leak.
x86: h264_idct: port to cpuflags
x86: cpu: Drop unused HAVE_RWEFLAGS condition
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9':
riff: only warn on a bad INFO chunk code size instead of failing
configure: Add separate list for libraries and use where appropriate
x86: float_dsp: add SSE version of vector_fmul_scalar()
Conflicts:
configure
libavformat/riff.c
libavutil/x86/float_dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '930e26a3ea9d223e04bac4cdde13697cec770031':
x86: h264qpel: Only define mmxext QPEL functions if H264QPEL is enabled
x86: PABSW: port to cpuflags
x86: vc1dsp: port to cpuflags
rtmp: Use av_strlcat instead of strncat
Conflicts:
libavcodec/x86/h264_qpel.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9221efef7968463f3e3d9ce79ea72eaca082e73f':
lavf: fix av_interleaved_write_frame() doxy.
lavf: clarify the lifetime of demuxed packets.
avconv: do not free muxed packet on streamcopy.
crc: move doxy to the header
vf_drawtext: do not use deprecated av_tree_node_size
x86: Refactor PSWAPD fallback implementations and port to cpuflags
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9a07c1332cfe092b57b5758f22b686ca58806c60':
parser: Move Doxygen documentation to the header files
PGS subtitles: Expose forced flag
x86: PMINUB: port to cpuflags
Conflicts:
libavcodec/avcodec.h
libavcodec/pgssubdec.c
libavcodec/version.h
libavcodec/x86/ac3dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9ce02e14f01de50fcc6f7f459544b140be66d615':
x86: ac3dsp: port to cpuflags
x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
x86inc: Only define program_name if the macro is unset
Conflicts:
libavcodec/x86/ac3dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'be923ed659016350592acb9b3346f706f8170ac5':
x86: fmtconvert: port to cpuflags
x86: MMX2 ---> MMXEXT in macro names
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
x86: include x86inc.asm in x86util.asm
cng: Reindent some incorrectly indented lines
cngdec: Allow flushing the decoder
cngdec: Make the dbov variable have the right unit
cngdec: Fix the memset size to cover the full array
cngdec: Update the LPC coefficients after averaging the reflection coefficients
configure: fix print_config() with broke awks
Conflicts:
libavcodec/x86/ac3dsp.asm
libavcodec/x86/dct32.asm
libavcodec/x86/deinterlace.asm
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputilenc.asm
libavcodec/x86/fft.asm
libavcodec/x86/fmtconvert.asm
libavcodec/x86/h264_chromamc.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264_idct.asm
libavcodec/x86/h264_idct_10bit.asm
libavcodec/x86/h264_intrapred.asm
libavcodec/x86/h264_intrapred_10bit.asm
libavcodec/x86/h264_weight.asm
libavcodec/x86/vc1dsp.asm
libavcodec/x86/vp3dsp.asm
libavcodec/x86/vp56dsp.asm
libavcodec/x86/vp8dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* commit '9734b8ba56d05e970c353dfd5baafa43fdb08024':
Move avutil tables only used in libavcodec to libavcodec.
Conflicts:
libavcodec/mathtables.c
libavutil/intmath.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
h264: don't touch H264Context->ref_count[] during MB decoding
x86: get_cpu_flags: add necessary ifdefs around function body
x86: Drop CPU detection intrinsics
x86: Add YASM implementations of cpuid and xgetbv from x264
Conflicts:
configure
libavcodec/h264_cabac.c
libavcodec/h264_cavlc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '65d12900432ac880d764edbbd36818431484a76e':
configure: add --enable-lto option
x86: cpu: Break out test for cpuid capabilities into separate function
x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection
build: Factor out mpegaudio dependencies to CONFIG_MPEGAUDIO
segment: Add comments about calls that only are relevant for some muxers
segment: Add an option for omitting the first header and final trailer
Conflicts:
configure
libavcodec/Makefile
libavformat/segment.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined
elsewhere in the file. Surrounding the function body with ifdefs allows
building even when cpuid is not defined. An empty cpuflags mask is
returned in this case.
Now that there is CPU detection in YASM, there will always be one of
inline or external assembly enabled, which obviates the need to fall
back on CPU detection through compiler intrinsics.
* qatar/master:
swscale: Provide the right alignment for external mmx asm
x86: Replace checks for CPU extensions and flags by convenience macros
configure: msvc: fix/simplify setting of flags for hostcc
x86: mlpdsp: mlp_filter_channel_x86 requires inline asm
Conflicts:
libavcodec/x86/fft_init.c
libavcodec/x86/h264_intrapred_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/mpegaudiodec.c
libavcodec/x86/proresdsp_init.c
libavutil/x86/float_dsp_init.c
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mov_chan: Only set the channel_layout if setting it to a nonzero value
mov_chan: Reindent an incorrectly indented line
mp2 muxer: mark as AVFMT_NOTIMESTAMPS.
x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
x86: more specific checks for availability of required assembly capabilities
x86: avcodec: Drop silly "_mmx" suffix from dsputil template names
fate: Drop redundant setting of FUZZ to 1
cavsdsp: set idct permutation independently of dsputil
x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavformat/mp3enc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>