To access data at multiple fixed offsets from a base address, this
code uses a single "m" operand and code of the form "32%0", relying on
the memory operand instantiation having no displacement, giving a final
result of the form "32(%rax)". If the compiler uses a register and
displacement, e.g. "64(%rax)", the end result becomes "3264(%rax)",
which obviously does not work.
Replacing the "m" operands with "r" operands allows safe addition of a
displacement. In theory, multiple memory operands could use a shared
base register with different index registers, "(%rax,%rbx)", potentially
making more efficient use of registers. In the cases at hand, no such
sharing is possible since the addresses involved are entirely unrelated.
After this change, the code somewhat rudely accesses memory without
using a corresponding memory operand, which in some cases can lead to
unwanted "optimisations" of surrounding code. However, the original
code also accesses memory not covered by a memory operand, so this is
not adding any defect not already present. It is also hightly unlikely
that any such optimisations could be performed here since the memory
locations in questions are not accessed elsewhere in the same functions.
This fixes crashes with suncc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
About 30% faster on 32 bit Atom, 120% faster on 64 bit Phenom2.
This is interesting because supporting P16 is easier in e.g.
OpenGL (can misuse support for any 2-component 8 bit format),
whereas supporting p9/p10 without conversion needs a texture
format with at least 14 bits actual precision.
The shiftonly == 0 case is not optimized since the code is more
complex and the speed gain less obvious.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* qatar/master:
mpegvideo: reduce excessive inlining of mpeg_motion()
mpegvideo: convert mpegvideo_common.h to a .c file
build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
Move MASK_ABS macro to libavcodec/mathops.h
x86: move MANGLE() and related macros to libavutil/x86/asm.h
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
aacdec: Don't fall back to the old output configuration when no old configuration is present.
rtmp: Add message tracking
rtsp: Support mpegts in raw udp packets
rtsp: Support receiving plain data over UDP without any RTP encapsulation
rtpdec: Remove an unused include
rtpenc: Remove an av_abort() that depends on user-supplied data
vsrc_movie: discourage its use with avconv.
avconv: allow no input files.
avconv: prevent invalid reads in transcode_init()
avconv: rename OutputStream.is_past_recording_time to finished.
Conflicts:
configure
doc/filters.texi
ffmpeg.c
ffmpeg.h
libavcodec/Makefile
libavcodec/aacdec.c
libavcodec/mpegvideo.c
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavr: fix handling of custom mix matrices
fate: force pix_fmt in lagarith-rgb32 test
fate: add tests for lagarith lossless video codec.
ARMv6: vp8: fix stack allocation with Apple's assembler
ARM: vp56: allow inline asm to build with clang
fft: 3dnow: fix register name typo in DECL_IMDCT macro
x86: dct32: port to cpuflags
x86: build: replace mmx2 by mmxext
Revert "wmapro: prevent division by zero when sample rate is unspecified"
wmapro: prevent division by zero when sample rate is unspecified
lagarith: fix color plane inversion for YUY2 output.
lagarith: pad RGB buffer by 1 byte.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
Conflicts:
doc/APIchanges
libavcodec/lagarith.c
libavfilter/x86/gradfun.c
libavutil/cpu.h
libavutil/version.h
libswscale/utils.c
libswscale/version.h
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
* qatar/master:
avformat: Drop pointless "format" from container long names
swscale: bury one more piece of inline asm under HAVE_INLINE_ASM.
wv: K&R formatting cosmetics
configure: Add missing descriptions to help output
h264_ps: declare array of colorspace strings on its own line.
fate: amix: specify f32 sample format for comparison
tiny_psnr: support 32-bit float samples
eamad/eatgq/eatqi: call special EA IDCT directly
eamad: remove use of MpegEncContext
mpegvideo: remove unnecessary inclusions of faandct.h
af_asyncts: avoid overflow in out_size with large delta values
af_asyncts: add first_pts option
Conflicts:
configure
libavcodec/eamad.c
libavcodec/h264_ps.c
libavformat/crcenc.c
libavformat/ffmdec.c
libavformat/ffmenc.c
libavformat/framecrcenc.c
libavformat/md5enc.c
libavformat/nutdec.c
libavformat/rawenc.c
libavformat/yuv4mpeg.c
tests/tiny_psnr.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
v410dec: Implement explode mode support
zerocodec: fix direct rendering.
wav: init st to NULL to avoid a false-positive warning.
wavpack: set bits_per_raw_sample for S32 samples to properly identify 24-bit
h264: refactor NAL decode loop
RTMPTE protocol support
RTMPE protocol support
rtmp: Add ff_rtmp_calc_digest_pos()
rtmp: Rename rtmp_calc_digest to ff_rtmp_calc_digest and make it global
swscale: add missing HAVE_INLINE_ASM check.
lavfi: place x86 inline assembly under HAVE_INLINE_ASM.
vc1: Add a test for interlaced field pictures
swscale: Mark all init functions as av_cold
swscale: x86: Drop pointless _mmx suffix from filenames
lavf: use conditional notation for default codec in muxer declarations.
swscale: place inline assembly bilinear scaler under HAVE_INLINE_ASM.
dsputil: ppc: cosmetics: pretty-print
dsputil: x86: add SHUFFLE_MASK_W macro
configure: respect CC_O setting in check_cc
Conflicts:
Changelog
configure
libavcodec/v410dec.c
libavcodec/zerocodec.c
libavformat/asfenc.c
libavformat/version.h
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: swscale: Place inline assembly code under appropriate #ifdefs
rtsp: remove terminal comma in FF_RTP_FLAG_OPTS macro.
configure: Remove redundant RTMPT/RTMPTS dependencies
configure: add filtering of host cflags/ldflags
configure: initialise all flag filters at the same place
configure: add filtering of linker flags
configure: name some variables more consistently
configure: remove filter_cppflags
configure: set icc_version where it is needed
mpegenc: remove disabled code
Conflicts:
configure
libavformat/movenc.c
libswscale/x86/swscale_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Misaligned row artifacts showed up when a 624x352 frame was converted
to BGR24 format. When advancing to the next row the destination linesize
was added to the last output pointer position which was not linesize aligned,
resulting in a distorted picture.
Signed-off-by: Pavel Koshevoy <pavel@apple.aragog.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (29 commits)
lavfi: reclassify showfiltfmts as a TESTPROG
graph2dot: fix printf format specifier
swscale: yuv2planeX 8bit >=sse2 functions need aligned stack on x86-32.
vp8: loopfilter >=sse2 functions need aligned stack on x86-32.
amr: remove shift out of the AMR_BIT() macro.
dsputilenc: group yasm and inline asm function pointer assignment.
mov: use forward declaration of a function instead of a table.
Clarify Doxygen comment for FF_API_* #defines.
configure: simplify get_version()
Create version.h headers for libraries that lack them
gitignore: Use full path instead of relative path to specify patterns
mpegvideo: remove VLAs
Add XTEA encryption support in libavutil
Add Blowfish encryption support in libavutil
eval: Add the isinf() function and tests for it
flacdec: move lpc filter to flacdsp
flacdec: split off channel decorrelation as flacdsp
avplay: Add an option for not limiting the input buffer size
FATE: add a test for WMA cover art.
FATE: add a test for apetag cover art
...
Conflicts:
.gitignore
configure
ffplay.c
libavcodec/Makefile
libavcodec/error_resilience.c
libavcodec/mpegvideo.c
libavcodec/ratecontrol.c
libavdevice/avdevice.h
libavfilter/Makefile
libavfilter/filtfmts.c
libavfilter/version.h
libavformat/mov.c
libavformat/version.h
libavutil/Makefile
libavutil/avutil.h
libavutil/version.h
libswscale/swscale.h
libswscale/x86/swscale_mmx.c
tests/fate/libavutil.mak
tests/lavfi-regression.sh
tools/graph2dot.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
MS Screen 1 decoder
aacdec: Fix popping channel layouts.
av_gettime: support Win32 without gettimeofday()
Use av_gettime() in various places
Move av_gettime() to libavutil
dct-test: use emms_c() from libavutil instead of duplicating it
mov: fix operator precedence bug
mathematics.h: remove a couple of math defines
Remove unnecessary inclusions of [sys/]time.h
lavf: remove unnecessary inclusions of unistd.h
bfin: libswscale: add const where appropriate to fix warnings
bfin: libswscale: remove unnecessary #includes
udp: Properly check for invalid sockets
tcp: Check the return value from getsockopt
network: Use av_strerror for getting error messages
udp: Properly print error from getnameinfo
mmst: Use AVUNERROR() to convert error codes to the right range for strerror
network: Pass pointers of the right type to get/setsockopt/ioctlsocket on windows
rtmp: Reduce the number of idle posts sent by sleeping 50ms
Conflicts:
Changelog
configure
libavcodec/aacdec.c
libavcodec/allcodecs.c
libavcodec/avcodec.h
libavcodec/dct-test.c
libavcodec/version.h
libavformat/riff.c
libavformat/udp.c
libavutil/Makefile
libswscale/bfin/yuv2rgb_bfin.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
rtmp: Add a new option 'rtmp_buffer', for setting the client buffer time
rtmp: Set the client buffer time to 3s instead of 0.26s
rtmp: Handle server bandwidth packets
rtmp: Display a verbose message when an unknown packet type is received
lavfi/audio: use av_samples_copy() instead of custom code.
configure: add all filters hardcoded into avconv to avconv_deps
avfiltergraph: remove a redundant call to avfilter_get_by_name().
lavfi: allow building without swscale.
build: Do not delete tests/vsynth2 directory, which is no longer created.
lavfi: replace AVFilterContext.input/output_count with nb_inputs/outputs
lavfi: make AVFilterPad opaque after two major bumps.
lavfi: add avfilter_pad_get_type() and avfilter_pad_get_name().
lavfi: make avfilter_get_video_buffer() private on next bump.
jack: update to new latency range API as the old one has been deprecated
rtmp: Tokenize the AMF connection parameters manually instead of using strtok_r
ppc: Rename H.264 optimization template file for consistency.
lavfi: add channelsplit audio filter.
golomb: check remaining bits during unary decoding in get_ur_golomb_jpegls()
sws: fix planar RGB input conversions for 9/10/16 bpp.
Conflicts:
Changelog
configure
doc/APIchanges
ffmpeg.c
libavcodec/golomb.h
libavcodec/v210dec.h
libavfilter/Makefile
libavfilter/allfilters.c
libavfilter/asrc_anullsrc.c
libavfilter/audio.c
libavfilter/avfilter.c
libavfilter/avfilter.h
libavfilter/avfiltergraph.c
libavfilter/buffersrc.c
libavfilter/formats.c
libavfilter/version.h
libavfilter/vf_frei0r.c
libavfilter/vf_pad.c
libavfilter/vf_scale.c
libavfilter/video.h
libavfilter/vsrc_color.c
libavformat/rtmpproto.c
libswscale/input.c
tests/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
register starvation caused gcc4.2 to fail building 32 bit shared libs
on 64 bit OS X
Signed-off-by: Michael Bradshaw <mbradshaw@sorensonmedia.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>