Henrik Gramner
9f1245eb96
x86inc: Support arbitrary stack alignments
...
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:04:11 +02:00
Henrik Gramner
4a53c758d2
x86: dcadsp: Avoid SSE2 instructions in SSE functions
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 09:22:46 +02:00
James Almer
9c0407e856
x86/sbrdsp: remove an unnecessary mova in sbr_autocorrelate
...
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-06 23:42:19 -03:00
Henrik Gramner
f0b7882ceb
x86inc: Drop SECTION_TEXT macro
...
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Henrik Gramner
826790f596
x86inc: Support arbitrary stack alignments
...
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
2015-08-04 20:13:09 +02:00
James Almer
5750d6c5e9
x86: move XOP emulation code back to x86inc
...
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
Hendrik Leppkes
1ce298dac5
Merge commit 'ebaf571aca2dd6ce3caeeeec4210a3fccd47e7db'
...
* commit 'ebaf571aca2dd6ce3caeeeec4210a3fccd47e7db':
x86: dct: Disable dct32_float_sse on x86-64
Conflicts:
libavcodec/x86/dct32.asm
libavcodec/x86/dct_init.c
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-08-02 12:31:39 +02:00
Henrik Gramner
ebaf571aca
x86: dct: Disable dct32_float_sse on x86-64
...
There is an SSE2 implementation so the SSE version is never used. The "SSE"
version also happens to contain SSE2 instructions on x86-64.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-02 08:41:45 +02:00
James Almer
9dcaae70f2
x86/aacpsdsp: add SSE and SSE3 optimized functions
...
Between 1.5 and 2.5 times faster
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-30 19:01:15 -03:00
Michael Niedermayer
29d147c94d
Merge commit '059a934806d61f7af9ab3fd9f74994b838ea5eba'
...
* commit '059a934806d61f7af9ab3fd9f74994b838ea5eba':
lavc: Consistently prefix input buffer defines
Conflicts:
doc/examples/decoding_encoding.c
libavcodec/4xm.c
libavcodec/aac_adtstoasc_bsf.c
libavcodec/aacdec.c
libavcodec/aacenc.c
libavcodec/ac3dec.h
libavcodec/asvenc.c
libavcodec/avcodec.h
libavcodec/avpacket.c
libavcodec/dvdec.c
libavcodec/ffv1enc.c
libavcodec/g2meet.c
libavcodec/gif.c
libavcodec/h264.c
libavcodec/h264_mp4toannexb_bsf.c
libavcodec/huffyuvdec.c
libavcodec/huffyuvenc.c
libavcodec/jpeglsenc.c
libavcodec/libxvid.c
libavcodec/mdec.c
libavcodec/motionpixels.c
libavcodec/mpeg4videodec.c
libavcodec/mpegvideo.c
libavcodec/noise_bsf.c
libavcodec/nuv.c
libavcodec/nvenc.c
libavcodec/options.c
libavcodec/parser.c
libavcodec/pngenc.c
libavcodec/proresenc_kostya.c
libavcodec/qsvdec.c
libavcodec/svq1enc.c
libavcodec/tiffenc.c
libavcodec/truemotion2.c
libavcodec/utils.c
libavcodec/utvideoenc.c
libavcodec/vc1dec.c
libavcodec/wmalosslessdec.c
libavformat/adxdec.c
libavformat/aiffdec.c
libavformat/apc.c
libavformat/apetag.c
libavformat/avidec.c
libavformat/bink.c
libavformat/cafdec.c
libavformat/flvdec.c
libavformat/id3v2.c
libavformat/isom.c
libavformat/matroskadec.c
libavformat/mov.c
libavformat/mpc.c
libavformat/mpc8.c
libavformat/mpegts.c
libavformat/mvi.c
libavformat/mxfdec.c
libavformat/mxg.c
libavformat/nutdec.c
libavformat/oggdec.c
libavformat/oggparsecelt.c
libavformat/oggparseflac.c
libavformat/oggparseopus.c
libavformat/oggparsespeex.c
libavformat/omadec.c
libavformat/rawdec.c
libavformat/riffdec.c
libavformat/rl2.c
libavformat/rmdec.c
libavformat/rtpdec_latm.c
libavformat/rtpdec_mpeg4.c
libavformat/rtpdec_qdm2.c
libavformat/rtpdec_svq3.c
libavformat/sierravmd.c
libavformat/smacker.c
libavformat/smush.c
libavformat/spdifenc.c
libavformat/takdec.c
libavformat/tta.c
libavformat/utils.c
libavformat/vqf.c
libavformat/westwood_vqa.c
libavformat/xmv.c
libavformat/xwma.c
libavformat/yop.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-27 23:15:19 +02:00
Michael Niedermayer
94d68a41fa
Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615'
...
* commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615':
lavc: AV-prefix all codec flags
Conflicts:
doc/examples/muxing.c
ffmpeg.c
ffmpeg_opt.c
ffplay.c
libavcodec/aacdec.c
libavcodec/aacenc.c
libavcodec/ac3dec.c
libavcodec/ac3enc_float.c
libavcodec/atrac1.c
libavcodec/atrac3.c
libavcodec/atrac3plusdec.c
libavcodec/dcadec.c
libavcodec/ffv1enc.c
libavcodec/h264.c
libavcodec/h264_loopfilter.c
libavcodec/h264_mb.c
libavcodec/imc.c
libavcodec/libmp3lame.c
libavcodec/libtheoraenc.c
libavcodec/libtwolame.c
libavcodec/libvpxenc.c
libavcodec/libxavs.c
libavcodec/libxvid.c
libavcodec/mpeg12dec.c
libavcodec/mpeg12enc.c
libavcodec/mpegaudiodec_template.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo_enc.c
libavcodec/mpegvideo_motion.c
libavcodec/nellymoserdec.c
libavcodec/nellymoserenc.c
libavcodec/nvenc.c
libavcodec/on2avc.c
libavcodec/options_table.h
libavcodec/opus_celt.c
libavcodec/pngenc.c
libavcodec/ra288.c
libavcodec/ratecontrol.c
libavcodec/twinvq.c
libavcodec/vc1_block.c
libavcodec/vc1_loopfilter.c
libavcodec/vc1_mc.c
libavcodec/vc1dec.c
libavcodec/vorbisdec.c
libavcodec/vp3.c
libavcodec/wma.c
libavcodec/wmaprodec.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/me_cmp_init.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-27 22:10:35 +02:00
Vittorio Giovara
7c6eb0a1b7
lavc: AV-prefix all codec flags
...
Convert doxygen to multiline and express bitfields more simply.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-07-27 15:24:58 +01:00
James Almer
844bef578e
avcodec/x86: add missing colon to labels
...
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:50:14 -03:00
Michael Niedermayer
52b6d96268
Merge commit 'a344e5d094ebcf9a23acf3a27c56cbbbc829db42'
...
* commit 'a344e5d094ebcf9a23acf3a27c56cbbbc829db42':
x86: bswapdsp: Don't treat 32-bit integers as 64-bit
Conflicts:
libavcodec/x86/bswapdsp.asm
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 23:20:14 +02:00
Michael Niedermayer
115a9b5091
Merge commit 'd42191c78befc1983f23b1899b2dda513b72f1ed'
...
* commit 'd42191c78befc1983f23b1899b2dda513b72f1ed':
configure: Factor out vp8dsp module
Conflicts:
configure
libavcodec/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:45:34 +02:00
Michael Niedermayer
fd29dd432c
Merge commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418'
...
* commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418':
configure: Factor out rv34dsp module
Conflicts:
libavcodec/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:21:36 +02:00
Henrik Gramner
a344e5d094
x86: bswapdsp: Don't treat 32-bit integers as 64-bit
...
The upper halves are not guaranteed to be zero in x86-64.
Also use `test` instead of `and` when the result isn't used for anything other
than as a branch condition, this allows some register moves to be eliminated.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-07-17 20:02:28 +02:00
Vittorio Giovara
d42191c78b
configure: Factor out vp8dsp module
2015-07-17 18:46:24 +01:00
Vittorio Giovara
5cb4bdb2a0
configure: Factor out rv34dsp module
2015-07-17 18:46:24 +01:00
Michael Niedermayer
b8c438e762
videodsp: assert that linesize is larger than width
...
Suggested-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-07-08 01:32:04 +02:00
Andreas Cadhalpun
28efeb6502
doc: avoid incorrect phrase 'allows to'
...
Also fix typo found by Lou Logan:
Sacrifying -> Sacrificing
Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-06-16 21:48:51 +02:00
James Almer
9f815bc2c2
avcodec/jpeg200dsp: add ff_rct_int_{sse2,avx2}
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:31 -03:00
James Almer
7912a6830d
avcodec/jpeg200dsp: add ff_ict_float_{sse,avx}
...
Original intrinsics version by Nicolas Bertrand.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:27 -03:00
Michael Niedermayer
63b0356274
Merge commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb'
...
* commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb':
h264_qpel: Use the correct header
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-12 21:55:40 +02:00
Michael Niedermayer
b68b5ec513
Merge commit '5e87080f2c73186066df0b9c43877b4af0beef3a'
...
* commit '5e87080f2c73186066df0b9c43877b4af0beef3a':
h264_weight: Fix SSSE3 biweight code with weights of 128
Conflicts:
libavcodec/x86/h264_weight.asm
See: e100966575
See: fb2288834b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-12 21:47:01 +02:00
Vittorio Giovara
b7a4127a45
h264_qpel: Use the correct header
2015-06-12 17:02:48 +01:00
Michael Niedermayer
5e87080f2c
h264_weight: Fix SSSE3 biweight code with weights of 128
...
CC: libav-stable@libav.org
Sample-Id: test_bref.mp4
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-06-12 17:02:48 +01:00
Michael Niedermayer
e100966575
avcodec/x86/h264_weight: handle weight1=128
...
Fix ticket4596
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-09 05:11:09 +02:00
James Almer
c16e99e3b3
x86: check for AV_CPU_FLAG_AVXSLOW where useful
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
James Almer
d68c05380c
x86: check for AV_CPU_FLAG_AVXSLOW where useful
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Michael Niedermayer
b666e81c13
Merge commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd'
...
* commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd':
x86: cavs: Remove an unneeded scratch buffer
Conflicts:
libavcodec/x86/cavsdsp.c
See: d79f7bf0d6
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 22:12:41 +02:00
Michael Niedermayer
e4610300de
x86: cavs: Remove an unneeded scratch buffer
...
Simplifies the code and makes it build on certain compilers
running out of registers on x86.
CC: libav-stable@libav.org
Reported-By: mudler
2015-05-28 18:40:40 +02:00
Timothy Gu
2b388e6dde
Revert "Move struc FFTContext below SECTION_RODATA"
...
This reverts commit 599888a480
.
The commit does not silence the warning on ELF-based systems, and will be
fixed in the subsequent commit.
Conflicts:
libavcodec/x86/fft_mmx.asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:32 +02:00
Michael Niedermayer
d9b264bc73
Merge commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42'
...
* commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42':
mpegvideo: Drop flags and flags2
Conflicts:
libavcodec/mpeg12dec.c
libavcodec/mpeg12enc.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo_enc.c
libavcodec/mpegvideo_motion.c
libavcodec/ratecontrol.c
libavcodec/vc1_block.c
libavcodec/vc1_loopfilter.c
libavcodec/vc1_mc.c
libavcodec/vc1dec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-22 20:24:41 +02:00
Vittorio Giovara
848e86f74d
mpegvideo: Drop flags and flags2
...
They are just duplicates of AVCodecContext members so use those instead.
2015-05-22 15:34:39 +01:00
Michael Niedermayer
451be676f3
Merge remote-tracking branch 'rbultje/vp9-bugfixes'
...
* rbultje/vp9-bugfixes:
vp9: match another find_ref_mvs() bug in libvpx.
vp9: fix scaled motion vector clipping for sub8x8 blocks.
vp9: improve signbias check.
vp9: don't allow compound references if error_resilience is enabled.
vp9: clamp segmented lflvl before applying ref/mode deltas.
vp9: reset loopfilter mode/ref deltas on keyframe.
vp9: fix crash when playing back 440/440 content with width%64<56.
vp9: extend loopfilter workaround for vp9 h/v mix-up to work for 422.
vp9: clip motion vectors in the same way as libvpx does.
vp9: set skip flag if the block had no coded coefficients.
vp9: apply mv scaling workaround only when subsampling is enabled.
vp9: read all 4x4 blocks in sub8x8 blocks individually with scalability.
vp9: fix segmentation map referencing upon framesize change.
vp9: disable more pmulhrsw optimizations in idct16/32.
vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-18 02:35:16 +02:00
Carl Eugen Hoyos
e609cfd697
lavc/flac: Fix encoding and decoding with high lpc.
...
Based on an analysis by trac user lvqcl.
Fixes ticket #4421 , reported by Chase Walker.
2015-05-17 02:08:58 +02:00
Ronald S. Bultje
d32d0593f1
vp9: disable more pmulhrsw optimizations in idct16/32.
...
For idct16, only when called from a adst16x16 variant, so impact is
minor. For idct32, for all, so relatively major impact.
2015-05-14 14:15:27 -04:00
Ronald S. Bultje
96d30c3495
vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations.
...
They all overflow in various samples that are considered valid input.
2015-05-14 13:39:37 -04:00
Michael Niedermayer
cc77bb09e4
avcodec/x86/vp9dsp_init: Fix mix of declaration and statement
...
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-07 14:33:10 +02:00
Ronald S. Bultje
b224b165cb
vp9: add keyframe profile 2/3 support.
2015-05-06 15:10:41 -04:00
Michael Niedermayer
6ef3426d90
avcodec/x86/deinterlace: use INIT_MMX like other asm code does too
2015-05-05 02:41:15 +02:00
Michael Niedermayer
dfc0708e23
avcodec/x86/dct-test: Use uint8_t for idct_simple_mmx_perm
...
The table contains no element outside the unsigned 8bit range
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-02 13:43:15 +02:00
Michael Niedermayer
270e647adc
avcodec/x86/dct-test: Make static table const
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-02 13:42:46 +02:00
Ronald S. Bultje
3de13d5212
vp9: remove another optimization branch in iadst16 which causes overflows.
...
See sample vp90-2-14-resize-fp-tiles-16-8.webm from the vp9 test vector
set to reproduce the issue.
2015-04-24 16:54:31 +02:00
Ronald S. Bultje
d02d04a18f
vp9: remove one optimization branch in iadst16 which causes overflows.
...
See sample vp90-2-14-resize-fp-tiles-16-8-4-2-1.webm from the vp9 test
vector set which reproduces the issue. This probably costs a few cycles,
but I don't think there's an easy way to workaround that.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-22 21:37:10 +02:00
Michael Niedermayer
0245abc7c1
avcodec/x86/hpeldsp_init: Put CONFIG_* first in if()
...
This is more consistent and may fix a build failure
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-26 15:41:27 +01:00
James Almer
6b940b8c99
x86/xvididct: add some yasm guards
...
Should fix compilation on compilers with less-than-ideal dead code elimination
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 02:38:20 -03:00
James Almer
b0fea4ad7e
x86/xvididct: remove obsolete function prototypes
...
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 02:38:14 -03:00
Michael Niedermayer
1eb28479da
Merge commit '48aef27f5232794e70ecef0d347b9f65e27a9bad'
...
* commit '48aef27f5232794e70ecef0d347b9f65e27a9bad':
x86: Put COPY3_IF_LT under HAVE_6REGS
Conflicts:
libavcodec/x86/mathops.h
See: b38910c979
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-17 20:25:47 +01:00
Luca Barbato
48aef27f52
x86: Put COPY3_IF_LT under HAVE_6REGS
...
It uses 6 registers, unbreaks building on hardened x86 system.
Bug-Id: gentoo/541930
CC: libav-stable@libav.org
2015-03-17 12:31:04 +01:00
Michael Niedermayer
d79f7bf0d6
avcodec/x86/cavsdsp: remove incorrect LOCAL_ALIGN tmp
...
This is faster and simpler as well
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-16 14:51:51 +01:00
James Almer
e8374d7202
x86/proresdsp: remove ff_prores_idct_put_10_sse4
...
It's exactly the same as the sse2 version.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-16 01:52:44 -03:00
James Almer
bdd179c8cb
x86/proresdsp: remove unused macro
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-16 01:49:34 -03:00
Christophe Gisquet
238db7cc56
x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED
...
The later may yield incorrect code for on-stack variables.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 20:06:47 +01:00
Christophe Gisquet
15ce160183
x86: xvid_idct: SSE2 merged add version
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:36:47 +01:00
Christophe Gisquet
decd5193e1
x86: xvid_idct: merged idct_put SSE2 versions
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:36:29 +01:00
Christophe Gisquet
8200575d84
x86: dct-test: evaluate prores idct avx version
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:23:27 +01:00
Christophe Gisquet
4eb4451be1
x86: dct-test: fix compilation for prores
...
When the decoder is deactivated, the x86-optimized versions are
not compiled, resulting in a link error.
The C version is unaffected, as it is part of the idctdsp
subsystem.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:23:06 +01:00
Christophe Gisquet
c3bf52713a
x86: xvid_idct: port MMX iDCT to yasm
...
Also reduce the table duplication with SSE2 code, remove duplicated
macro parameters.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 11:45:11 +01:00
Christophe Gisquet
2999bd7da2
x86: xvid_idct: port SSE2 iDCT to yasm
...
The main difference consists in renaming properly labels, and
letting yasm select the gprs for skipping 1D transforms.
Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-13 01:04:52 +01:00
James Almer
5c8f747085
x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8
...
Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-01 20:02:43 -03:00
Michael Niedermayer
7fce8c752d
Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b'
...
* commit '71f1ad37d858b810b71a4af1c25771beaa50b27b':
lavc: do not compile fmtconvert unconditionally
Conflicts:
configure
libavcodec/ppc/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-01 00:06:42 +01:00
Michael Niedermayer
5c17377e28
Merge commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca'
...
* commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca':
fmtconvert: drop unused functions
Conflicts:
libavcodec/arm/fmtconvert_vfp_armv6.S
libavcodec/x86/fmtconvert.asm
libavcodec/x86/fmtconvert_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-28 23:58:29 +01:00
Anton Khirnov
71f1ad37d8
lavc: do not compile fmtconvert unconditionally
...
Only ac3dec and dcadec use it.
2015-02-28 21:51:24 +01:00
Anton Khirnov
d74a8cb7e4
fmtconvert: drop unused functions
2015-02-28 21:51:24 +01:00
Michael Niedermayer
23a90768a8
avcodec/v210dec: Add ff prefix to v210_x86_init()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-27 19:08:09 +01:00
Michael Niedermayer
0e699676f9
avcodec/snow: mark dwt init as av_cold
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-27 16:53:37 +01:00
Carl Eugen Hoyos
36a6fb989b
hevc_deblock: Fix compilation with nasm
...
CC: libav-stable@libav.org
Bug-Id: 795
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-02-22 22:34:20 +00:00
Michael Niedermayer
03f39fbb2a
avcodec/x86/mlpdsp_init: Simplify mlp_filter_channel_x86()
...
Based on patch by Francisco Blas Izquierdo Riera
Commit message partly taken from carl
fixes a compilation
error in mlpdsp_init.c with -fstack-check and some gcc compilers (I
reproduced the issue with gcc 4.7.3) by simplifying the code.
See also https://bugs.gentoo.org/show_bug.cgi?id=471756
$ make libavcodec/x86/mlpdsp_init.o
libavcodec/x86/mlpdsp_init.c: In function ‘mlp_filter_channel_x86’:
libavcodec/x86/mlpdsp_init.c:142:5: error: can’t find a register in
class ‘GENERAL_REGS’ while reloading ‘asm’
libavcodec/x86/mlpdsp_init.c:142:5: error: ‘asm’ operand has impossible
constraints
4551 -> 4509 dezicycles
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-21 16:05:41 +01:00
Christophe Gisquet
398f531915
x86: hevc_mc: fewer xmm regs used in epel h/v
...
11 xmm regs seem only required for avx2.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 15:19:19 +01:00
Christophe Gisquet
89cb4995fa
x86: hevc_mc: save 1 gpr in epel filter loading
...
The 3*stride value stored in r3src can be loaded much later,
so use r3src instead of a dedicated gpr when possible.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-16 21:53:51 +01:00
James Almer
03adafb318
x86/g722dsp: add ff_g722_apply_qmf_sse2
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-16 00:41:21 -03:00
Christophe Gisquet
b533949813
x86: hevc: remove a parameter to WP internals
...
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-14 17:22:50 +01:00
James Almer
1679d68dbf
x86/hevc_mc: optimize AVX2 mc functions
...
Before
40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
After
37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:58 -03:00
James Almer
14b44c1614
x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:30 -03:00
James Almer
06fe6dfe12
x86/hevc_sao: make sao_band_filter work on x86_32
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-09 20:41:21 -03:00
Christophe Gisquet
b61b9e4919
x86: hevc_mc: remove lea in EPEL_LOAD
...
The second parameter to the macro is always an immediate address,
so no lea is needed.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:35 +01:00
Christophe Gisquet
4919b38421
x86: hevc_mc: fewer gpr autoloads for _v filters
...
In that case, it's just to load my, but mx/r3src is not used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:34 +01:00
James Almer
92d903afaa
x86/vp9dsp: fix clobbering of xmm6 on IDCT sse2 functions
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-08 00:50:39 -03:00
Christophe Gisquet
626d6184ce
x86: lavc/hevc_mc: fix comments
...
The width parameter is now completely at the back, and actually
never used. This helps understanding the actual parameter list.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 20:52:03 +01:00
Christophe Gisquet
ed450d4acf
x86: lavc: share more constant through defines
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 17:48:14 +01:00
Christophe Gisquet
691b7f5e9e
lavc/lossless_audiodsp: revert various commits
...
Their intent was to make the DSP work with wmalossless pro.
The later was fixed to work with the DSP.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 15:15:19 +01:00
Christophe Gisquet
9dc45d1f42
x86: lavc: share more constants
...
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
Mickaël Raulet
6ecc3fd612
x86/hevc_mc: use aligned loads
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 21:38:00 +01:00
James Almer
383fddeec6
x86/lossless_audiodsp: fix compilation with --disable-yasm
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 17:30:17 -03:00
James Almer
aea29a891f
x86/hevc_sao: fix loading of RIP address
...
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.
Suggested by Hendrik Leppkes and Christophe Gisquet.
Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 15:06:15 -03:00
Mickaël Raulet
bcb0925115
x86/hevc: use CLIPW macro when possible
...
Conflicts:
libavcodec/x86/hevc_mc.asm
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:38:47 +01:00
Christophe Gisquet
5eedd36df1
x86: hevc_mc: use epel_hv 16-wide function
...
The epel_hv functions were still relying on only epel_hv 8-wide
being the maximum width instanciated.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:37:56 +01:00
Pierre Edouard Lepere
a0d1300f71
x86: hevc_mc: add AVX2 optimizations
...
before
33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
38138 decicycles in luma_bi_2, 523427 runs, 861 skips
13490 decicycles in luma_uni, 516138 runs, 8150 skips
after
20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
10397 decicycles in luma_uni, 515715 runs, 8573 skips
Conflicts:
libavcodec/x86/hevc_mc.asm
libavcodec/x86/hevcdsp_init.c
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:20:47 +01:00
Michael Niedermayer
a6c2c8fe3f
Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar"
...
This reverts commit 3b4ffba3af
.
Unbreaks the SSSE3 code on mingw32
Conflicts:
libavcodec/x86/lossless_audiodsp.asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 02:31:45 +01:00
Michael Niedermayer
f1214763af
avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code
...
This is simpler and more robust, and fixes mismatching XMM save restore
mismatches
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 02:18:54 +01:00
Michael Niedermayer
3b4ffba3af
avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar
...
This is needed as the mmx code is used as fallback from the ssse3 code
Suggested-by: jamrial
Tested-by: wm4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 00:20:59 +01:00
James Almer
15574c505b
x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
...
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:33 -03:00
James Almer
042c1159fc
x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}
...
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:27 -03:00
James Almer
aa945dc112
x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-02 00:01:35 -03:00
James Almer
71e2cb4706
x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 21:45:52 -03:00
Christophe Gisquet
bff7feb328
x86: hevc/sao: aligned source buffers
...
Usefull for at least band filter, for which:
- Band filter call only:
32 64
Before: 16556 54015
After: 16497 52355
- Whole case:
32 64
Before: 37031 103008
After: 32045 93952
2015-02-01 20:22:54 -03:00
James Almer
fa3eccb4f9
x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
...
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00
Christophe Gisquet
7aeafacfd0
x86/sbrdsp: Use different mem moves
...
Before
2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips
After
2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:43 -03:00
James Almer
449b21bfab
x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3}
...
2 to 2.5 times faster.
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:39 -03:00
James Almer
08810a8895
x86/flacdsp: remove unneeded ifdeffery
...
x86inc can translate r*m into a register or stack on its own
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-05 16:29:28 -03:00
James Almer
37b35feb64
x86/swr: add SSE2/AVX pack_8ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Ronald S. Bultje
3aefca68ca
vp9/x86: add myself to copyright holders for loopfilter assembly.
2014-12-27 16:55:16 -05:00
Ronald S. Bultje
afd8c464b7
vp9/x86: make filter_16_h work on 32-bit.
2014-12-27 16:55:16 -05:00
Ronald S. Bultje
b26bc3520f
vp9/x86: make filter_48/84/88_h work on 32-bit.
2014-12-27 16:55:15 -05:00
Ronald S. Bultje
8a1cff1c35
vp9/x86: make filter_44_h work on 32-bit.
2014-12-27 16:55:15 -05:00
Ronald S. Bultje
047088b8c6
vp9/x86: make filter_16_v work on 32-bit.
2014-12-27 16:55:14 -05:00
Ronald S. Bultje
0cc9c23ea1
vp9/x86: make filter_48/84_v work on 32-bit.
2014-12-27 16:55:14 -05:00
Ronald S. Bultje
6433a9133f
vp9/x86: make filter_88_v work on 32-bit.
2014-12-27 16:55:14 -05:00
Ronald S. Bultje
75f8e52089
vp9/x86: make filter_44_v work on 32-bit.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
7f80c3344c
vp8/x86: save one register in SIGN_ADD/SUB.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
8ea2194ebb
vp9/x86: store unpacked intermediates for filter6/14 on stack.
...
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
e42409479f
vp8/x86: move variable assigned inside macro branch.
...
The value is not used outside the branch.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
418c202c63
vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
d1c55654e1
vp8/x86: remove unused register from ABSSUB_CMP macro.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
e59bd08986
vp9/x86: slightly simplify 44/48/84/88 h stores.
2014-12-27 16:55:11 -05:00
Ronald S. Bultje
8132629bd5
vp9/x86: make cglobal statement more conservative in register allocation.
2014-12-27 16:55:11 -05:00
Ronald S. Bultje
c013ca58c5
vp9/x86: save one register in loopfilter surface coverage.
2014-12-27 16:55:11 -05:00
James Almer
32c836cb11
x86/vp9: remove duplicate function prototypes
...
Fixes "redundant redeclaration" warnings.
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-23 00:56:51 -03:00
James Almer
7696e429c7
x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-20 13:25:43 +01:00
James Almer
a4d62f7775
x86/constants: fix alignment of pw_255
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 20:21:34 +01:00
Ronald S. Bultje
bdc1e3e3b2
vp9/x86: intra prediction sse2/32bit support.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:19 +01:00
Ronald S. Bultje
b6e1711223
vp9/x86: invert hu_ipred left array ordering.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:18 +01:00
Ronald S. Bultje
0a7964dca5
vp9/x86: save one register on 32bit idct32x32.
...
Fixes build on win32.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-16 02:51:26 +01:00
Ronald S. Bultje
cae893f692
vp9/x86: sse2 MC assembly.
...
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 02:34:05 +01:00
Ronald S. Bultje
fd77fbb390
vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 00:38:05 +01:00
Michael Niedermayer
a03f72e744
avcodec/x86/hevc_mc: fix sse register counts
...
These fix failures of --enable-xmm-clobber-test
It would be better to change the code to use fewer registers, but until
someone does the used register count must not be too small
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-11 13:17:26 +01:00
Michael Niedermayer
d43d5c5707
avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-10 07:34:49 +01:00
Michael Niedermayer
ed9be7dd47
avcodec/x86/pngdsp: fix off by 1 error
...
This fixes artifacts in the last pixel of rows with some widths and pixel formats
Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-08 18:24:40 +01:00
Michael Niedermayer
1d048f762d
Merge commit '9a738c27dceb4b975784b23213a46f5cb560d1c2'
...
* commit '9a738c27dceb4b975784b23213a46f5cb560d1c2':
v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Conflicts:
libavcodec/v210enc.c
libavcodec/v210enc.h
libavcodec/x86/Makefile
libavcodec/x86/v210enc.asm
libavcodec/x86/v210enc_init.c
tests/ref/vsynth/vsynth1-v210
tests/ref/vsynth/vsynth2-v210
See: 36091742d1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-06 01:54:10 +01:00
Kieran Kunhya
9a738c27dc
v210enc: Add SIMD optimised 8-bit and 10-bit encoders
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Reimar Döffinger
49d9cbe55d
h264_i386: Fix operand size
...
Fixes fate failure on macosx clang x86-64
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 23:03:13 +01:00
Christophe Gisquet
9fa056ba75
pngdsp x86: use unaligned access
...
For test images manually generated to contain only up prediction,
timing results:
8380x3032 255x185
before: 138635 1992
after: 139232 1996
Actually jumping to the proper version depending on the alignment:
8380x3032: 138767
A 0.5% speed improvement for gigantic images is not worth the code
duplication.
Fixes ticket #4148
Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Tested-by: Benoit Fouet <benoit.fouet@free.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 11:56:22 +01:00
Kieran Kunhya
36091742d1
v210enc: Add SIMD optimised 8-bit and 10-bit encoders
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer
ea41e6d637
Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600'
...
* commit '9c12c6ff9539e926df0b2a2299e915ae71872600':
motion_est: convert stride to ptrdiff_t
Conflicts:
libavcodec/me_cmp.c
libavcodec/ppc/me_cmp.c
libavcodec/x86/me_cmp_init.c
See: 9c669672c7
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-24 12:13:00 +01:00
Vittorio Giovara
9c12c6ff95
motion_est: convert stride to ptrdiff_t
...
CC: libav-stable@libav.org
Bug-Id: CID 700556 / CID 700557 / CID 700558
2014-11-24 01:30:10 +00:00
Carl Eugen Hoyos
600e38f563
Fix standalone compilation of the apng decoder on x86.
2014-11-23 13:21:29 +01:00
Michael Niedermayer
65ce8f8895
avcodec/x86/Makefile: fix order
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-23 01:49:04 +01:00
Michael Niedermayer
d3512a0e89
avcodec/x86/lossless_audiodsp: fix fallback code for 32bit
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 21:08:38 +01:00
Michael Niedermayer
4327088da3
avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 20:40:36 +01:00
Reimar Döffinger
478c61ccb2
h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
...
11674 -> 10877 decicycles on my Phenom II.
Overall speedup was unfortunately within measurement error.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-11-22 14:06:48 +01:00
James Almer
3cec54b7d7
x86/flacdsp: add SSE2 and AVX decorrelate functions
...
Two to four times faster depending on instruction set, block size and channel count.
2014-11-13 13:47:55 -03:00
James Almer
84ccc317ce
x86/flacdsp: separate decoder and encoder dsp initialization
...
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-12 14:41:45 -03:00
James Almer
7292b0477a
x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx
...
Handle it inside the __asm__() block.
Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-23 13:11:05 -03:00
Michael Niedermayer
3c1378ce0a
Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2'
...
* commit '2d91abade29e43bb45c881d45909b8ee77e904e2':
x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-08 11:48:58 +02:00
Henrik Gramner
2d91abade2
x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
...
The upper halves are not guaranteed to be zero in x86-64.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-10-08 08:15:52 +00:00
Mickaël Raulet
4ba6371a83
x86/hevc: get rid off packusdw for ssse3 compatibility
...
cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2
Fixes out of array access
Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-04 21:14:15 +02:00
James Almer
0de1d6287e
x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
...
2x to 2.5x faster than the C version.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00
James Almer
acebff8e5d
x86/mpegvideoencdsp: improve ff_pix_sum16_sse2
...
~15% faster.
Also add an mmxext version that takes advantage of the new code, and
build it alongside with the mmx version only on x86_32.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-01 13:07:22 -03:00