2072 Commits

Author SHA1 Message Date
Michael Niedermayer
b8c438e762 videodsp: assert that linesize is larger than width
Suggested-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-07-08 01:32:04 +02:00
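Why such an assertion matters can be shown with a plain C row copy. If a buffer's line size (stride) is smaller than the row width, consecutive rows overlap in memory, and writing one row clobbers the start of the next. The function below is a hypothetical sketch, not FFmpeg's videodsp code; `copy_rows` and its parameters are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy h rows of w bytes between two strided buffers.
 * Each row starts linesize bytes after the previous one, so rows only stay
 * disjoint when linesize >= w; that is the precondition asserted here. */
static void copy_rows(unsigned char *dst, ptrdiff_t dst_linesize,
                      const unsigned char *src, ptrdiff_t src_linesize,
                      int w, int h)
{
    assert(dst_linesize >= w && src_linesize >= w);
    for (int y = 0; y < h; y++)
        memcpy(dst + y * dst_linesize, src + y * src_linesize, (size_t)w);
}
```

With a line size of 8 and a width of 6, bytes 6 and 7 of every destination row are left untouched as padding.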
Andreas Cadhalpun
28efeb6502 doc: avoid incorrect phrase 'allows to'
Also fix typo found by Lou Logan:
Sacrifying -> Sacrificing

Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-06-16 21:48:51 +02:00
James Almer
9f815bc2c2 avcodec/jpeg200dsp: add ff_rct_int_{sse2,avx2}
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:31 -03:00
James Almer
7912a6830d avcodec/jpeg200dsp: add ff_ict_float_{sse,avx}
Original intrinsics version by Nicolas Bertrand.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:27 -03:00
Michael Niedermayer
63b0356274 Merge commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb'
* commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb':
  h264_qpel: Use the correct header

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-12 21:55:40 +02:00
Michael Niedermayer
b68b5ec513 Merge commit '5e87080f2c73186066df0b9c43877b4af0beef3a'
* commit '5e87080f2c73186066df0b9c43877b4af0beef3a':
  h264_weight: Fix SSSE3 biweight code with weights of 128

Conflicts:
	libavcodec/x86/h264_weight.asm

See: e100966575
See: fb2288834b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-12 21:47:01 +02:00
Vittorio Giovara
b7a4127a45 h264_qpel: Use the correct header 2015-06-12 17:02:48 +01:00
Michael Niedermayer
5e87080f2c h264_weight: Fix SSSE3 biweight code with weights of 128
CC: libav-stable@libav.org
Sample-Id: test_bref.mp4

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-06-12 17:02:48 +01:00
Michael Niedermayer
e100966575 avcodec/x86/h264_weight: handle weight1=128
Fix ticket #4596

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-09 05:11:09 +02:00
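The two commits above do not describe how the fix works, but the range problem behind a weight of 128 can be shown in C. As we understand it, the SSSE3 biweight code multiplies pixels by weights with `pmaddubsw`, whose second operand is a signed byte, so 128 cannot be stored. H.264 implicit bi-prediction can still produce it: the two weights sum to 64, and one reaches 128 when the other is -64. The sketch below checks the byte range and evaluates the spec's bi-weighted prediction formula in plain `int` arithmetic; `fits_int8` and `biweight_pixel` are hypothetical names, not FFmpeg functions.

```c
#include <assert.h>
#include <stdint.h>

/* Can a weight be stored in a signed 8-bit lane (the signed operand of
 * pmaddubsw)? 128 is the implicit weight that cannot. */
static int fits_int8(int weight)
{
    return weight >= INT8_MIN && weight <= INT8_MAX;
}

/* Scalar H.264 bi-weighted prediction for one 8-bit sample, following the
 * spec formula; int arithmetic has no trouble with a weight of 128.
 * Assumes >> on a negative sum is arithmetic (true on common compilers). */
static uint8_t biweight_pixel(uint8_t p0, uint8_t p1, int w0, int w1,
                              int o0, int o1, int log_wd)
{
    assert(log_wd >= 0 && log_wd <= 7);
    int v = ((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1))
            + ((o0 + o1 + 1) >> 1);
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}
```

For example, with weights -64 and 128 and logWD = 5, samples 10 and 20 extrapolate to 30.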
James Almer
c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
James Almer
d68c05380c x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Michael Niedermayer
b666e81c13 Merge commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd'
* commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd':
  x86: cavs: Remove an unneeded scratch buffer

Conflicts:
	libavcodec/x86/cavsdsp.c

See: d79f7bf0d6
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 22:12:41 +02:00
Michael Niedermayer
e4610300de x86: cavs: Remove an unneeded scratch buffer
Simplifies the code and makes it build with certain compilers
that run out of registers on x86.

CC: libav-stable@libav.org
Reported-By: mudler
2015-05-28 18:40:40 +02:00
Timothy Gu
2b388e6dde Revert "Move struc FFTContext below SECTION_RODATA"
This reverts commit 599888a480.

The commit does not silence the warning on ELF-based systems; the warning
will be fixed in the subsequent commit.

Conflicts:
	libavcodec/x86/fft_mmx.asm

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:32 +02:00
Michael Niedermayer
d9b264bc73 Merge commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42'
* commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42':
  mpegvideo: Drop flags and flags2

Conflicts:
	libavcodec/mpeg12dec.c
	libavcodec/mpeg12enc.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo_enc.c
	libavcodec/mpegvideo_motion.c
	libavcodec/ratecontrol.c
	libavcodec/vc1_block.c
	libavcodec/vc1_loopfilter.c
	libavcodec/vc1_mc.c
	libavcodec/vc1dec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-22 20:24:41 +02:00
Vittorio Giovara
848e86f74d mpegvideo: Drop flags and flags2
They are just duplicates of AVCodecContext members so use those instead.
2015-05-22 15:34:39 +01:00
Michael Niedermayer
451be676f3 Merge remote-tracking branch 'rbultje/vp9-bugfixes'
* rbultje/vp9-bugfixes:
  vp9: match another find_ref_mvs() bug in libvpx.
  vp9: fix scaled motion vector clipping for sub8x8 blocks.
  vp9: improve signbias check.
  vp9: don't allow compound references if error_resilience is enabled.
  vp9: clamp segmented lflvl before applying ref/mode deltas.
  vp9: reset loopfilter mode/ref deltas on keyframe.
  vp9: fix crash when playing back 440/440 content with width%64<56.
  vp9: extend loopfilter workaround for vp9 h/v mix-up to work for 422.
  vp9: clip motion vectors in the same way as libvpx does.
  vp9: set skip flag if the block had no coded coefficients.
  vp9: apply mv scaling workaround only when subsampling is enabled.
  vp9: read all 4x4 blocks in sub8x8 blocks individually with scalability.
  vp9: fix segmentation map referencing upon framesize change.
  vp9: disable more pmulhrsw optimizations in idct16/32.
  vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-18 02:35:16 +02:00
Carl Eugen Hoyos
e609cfd697 lavc/flac: Fix encoding and decoding with high lpc.
Based on an analysis by trac user lvqcl.

Fixes ticket #4421, reported by Chase Walker.
2015-05-17 02:08:58 +02:00
Ronald S. Bultje
d32d0593f1 vp9: disable more pmulhrsw optimizations in idct16/32.
For idct16, this applies only when called from an adst16x16 variant, so the
impact is minor. For idct32, it applies to all variants, so the impact is
relatively major.
2015-05-14 14:15:27 -04:00
Ronald S. Bultje
96d30c3495 vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations.
They all overflow in various samples that are considered valid input.
2015-05-14 13:39:37 -04:00
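The vp9 commits attribute the problems to overflow without detailing the failing arithmetic. The scalar model below only illustrates the property that makes such shortcuts fragile: `pmulhrsw` multiplies two signed 16-bit values, rounds, shifts right by 15, and writes back a 16-bit result, so any value outside the int16 range cannot be represented. It models the instruction, not the vp9 transform code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of SSSE3 pmulhrsw: multiply, round by
 * adding 1 << 14, shift right by 15. The instruction keeps only the low
 * 16 bits of this value; it is returned here as int32_t so callers can
 * check whether it fits. Assumes >> on negative int32_t is arithmetic. */
static int32_t pmulhrsw_lane_wide(int16_t a, int16_t b)
{
    return ((int32_t)a * b + (1 << 14)) >> 15;
}

/* Does a value fit in the 16-bit lane the instruction writes back? */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

Multiplying by a Q15 constant such as 23170 (about 2^15 / sqrt(2)) stays in range. The single input pair -32768 * -32768 yields 32768, which does not fit.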
Michael Niedermayer
cc77bb09e4 avcodec/x86/vp9dsp_init: Fix mix of declaration and statement
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-07 14:33:10 +02:00
Ronald S. Bultje
b224b165cb vp9: add keyframe profile 2/3 support. 2015-05-06 15:10:41 -04:00
Michael Niedermayer
6ef3426d90 avcodec/x86/deinterlace: use INIT_MMX like other asm code does too 2015-05-05 02:41:15 +02:00
Michael Niedermayer
dfc0708e23 avcodec/x86/dct-test: Use uint8_t for idct_simple_mmx_perm
The table contains no element outside the unsigned 8-bit range.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-02 13:43:15 +02:00
Michael Niedermayer
270e647adc avcodec/x86/dct-test: Make static table const
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-02 13:42:46 +02:00
Ronald S. Bultje
3de13d5212 vp9: remove another optimization branch in iadst16 which causes overflows.
See sample vp90-2-14-resize-fp-tiles-16-8.webm from the vp9 test vector
set to reproduce the issue.
2015-04-24 16:54:31 +02:00
Ronald S. Bultje
d02d04a18f vp9: remove one optimization branch in iadst16 which causes overflows.
See sample vp90-2-14-resize-fp-tiles-16-8-4-2-1.webm from the vp9 test
vector set which reproduces the issue. This probably costs a few cycles,
but I don't think there's an easy way to work around that.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-22 21:37:10 +02:00
Michael Niedermayer
0245abc7c1 avcodec/x86/hpeldsp_init: Put CONFIG_* first in if()
This is more consistent and may fix a build failure

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-26 15:41:27 +01:00
James Almer
6b940b8c99 x86/xvididct: add some yasm guards
Should fix compilation on compilers with less-than-ideal dead code elimination

Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 02:38:20 -03:00
James Almer
b0fea4ad7e x86/xvididct: remove obsolete function prototypes
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 02:38:14 -03:00
Michael Niedermayer
1eb28479da Merge commit '48aef27f5232794e70ecef0d347b9f65e27a9bad'
* commit '48aef27f5232794e70ecef0d347b9f65e27a9bad':
  x86: Put COPY3_IF_LT under HAVE_6REGS

Conflicts:
	libavcodec/x86/mathops.h

See: b38910c979
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-17 20:25:47 +01:00
Luca Barbato
48aef27f52 x86: Put COPY3_IF_LT under HAVE_6REGS
It uses 6 registers; this unbreaks building on hardened x86 systems.

Bug-Id: gentoo/541930
CC: libav-stable@libav.org
2015-03-17 12:31:04 +01:00
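The semantics of COPY3_IF_LT can be sketched in plain C. The macro below is a renamed stand-in, `COPY3_IF_LT_SKETCH`, written to mirror (as we understand it) the generic C fallback in libavcodec/mathops.h. The x86 version instead uses inline asm with six register operands, which is what HAVE_6REGS guards; presumably a hardened (PIC) 32-bit build, which reserves a register, cannot supply six. `best_candidate` is a hypothetical usage example.

```c
#include <assert.h>

/* When y < x: copy y into x, b into a and d into c. All six operands are
 * live at once, which is why a register-only asm version needs six GPRs. */
#define COPY3_IF_LT_SKETCH(x, y, a, b, c, d) \
    do {                                     \
        if ((y) < (x)) {                     \
            (x) = (y);                       \
            (a) = (b);                       \
            (c) = (d);                       \
        }                                    \
    } while (0)

/* Hypothetical use: track the minimum cost together with the motion
 * vector that produced it. Ties keep the earliest candidate. */
static int best_candidate(const int *cost, const int *mx, const int *my,
                          int n, int *best_mx, int *best_my)
{
    assert(n > 0);
    int best = cost[0];
    *best_mx = mx[0];
    *best_my = my[0];
    for (int i = 1; i < n; i++)
        COPY3_IF_LT_SKETCH(best, cost[i], *best_mx, mx[i], *best_my, my[i]);
    return best;
}
```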
Michael Niedermayer
d79f7bf0d6 avcodec/x86/cavsdsp: remove incorrect LOCAL_ALIGN tmp
This is faster and simpler as well

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-16 14:51:51 +01:00
James Almer
e8374d7202 x86/proresdsp: remove ff_prores_idct_put_10_sse4
It's exactly the same as the sse2 version.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-16 01:52:44 -03:00
James Almer
bdd179c8cb x86/proresdsp: remove unused macro
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-16 01:49:34 -03:00
Christophe Gisquet
238db7cc56 x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED
The latter may yield incorrect code for on-stack variables.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 20:06:47 +01:00
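The idea behind a local-alignment helper is to avoid relying on the compiler to honor an alignment attribute for a stack array. It may not, for example when the incoming stack is less aligned than requested. Instead, the code over-allocates and rounds the address up. The sketch below shows that technique in plain C; `align_up` and `sum_aligned_scratch` are hypothetical. This is not FFmpeg's actual LOCAL_ALIGNED definition, which may use compiler attributes where they are known to work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a pointer up to the next multiple of align (a power of two). */
static void *align_up(void *p, size_t align)
{
    assert(align && (align & (align - 1)) == 0);
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + align - 1) & ~(uintptr_t)(align - 1));
}

/* Hypothetical function needing a 16-byte-aligned, 64-byte scratch block
 * on the stack: over-allocate by align - 1 bytes and round the pointer up
 * instead of relying on an alignment attribute for an on-stack array. */
static int sum_aligned_scratch(const uint8_t *src)
{
    uint8_t raw[64 + 15];
    uint8_t *tmp = align_up(raw, 16);
    int sum = 0;
    for (int i = 0; i < 64; i++)
        tmp[i] = src[i];
    for (int i = 0; i < 64; i++)
        sum += tmp[i];
    return sum;
}
```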
Christophe Gisquet
15ce160183 x86: xvid_idct: SSE2 merged add version
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:36:47 +01:00
Christophe Gisquet
decd5193e1 x86: xvid_idct: merged idct_put SSE2 versions
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:36:29 +01:00
Christophe Gisquet
8200575d84 x86: dct-test: evaluate prores idct avx version
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:23:27 +01:00
Christophe Gisquet
4eb4451be1 x86: dct-test: fix compilation for prores
When the decoder is deactivated, the x86-optimized versions are
not compiled, resulting in a link error.

The C version is unaffected, as it is part of the idctdsp
subsystem.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:23:06 +01:00
Christophe Gisquet
c3bf52713a x86: xvid_idct: port MMX iDCT to yasm
Also reduce table duplication with the SSE2 code and remove duplicated
macro parameters.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 11:45:11 +01:00
Christophe Gisquet
2999bd7da2 x86: xvid_idct: port SSE2 iDCT to yasm
The main differences are properly renamed labels and letting yasm
select the GPRs used for skipping 1D transforms.

Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-13 01:04:52 +01:00
James Almer
5c8f747085 x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8
Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-01 20:02:43 -03:00
Michael Niedermayer
7fce8c752d Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b'
* commit '71f1ad37d858b810b71a4af1c25771beaa50b27b':
  lavc: do not compile fmtconvert unconditionally

Conflicts:
	configure
	libavcodec/ppc/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-01 00:06:42 +01:00
Michael Niedermayer
5c17377e28 Merge commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca'
* commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca':
  fmtconvert: drop unused functions

Conflicts:
	libavcodec/arm/fmtconvert_vfp_armv6.S
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/fmtconvert_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-28 23:58:29 +01:00
Anton Khirnov
71f1ad37d8 lavc: do not compile fmtconvert unconditionally
Only ac3dec and dcadec use it.
2015-02-28 21:51:24 +01:00
Anton Khirnov
d74a8cb7e4 fmtconvert: drop unused functions 2015-02-28 21:51:24 +01:00
Michael Niedermayer
23a90768a8 avcodec/v210dec: Add ff prefix to v210_x86_init()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-27 19:08:09 +01:00
Michael Niedermayer
0e699676f9 avcodec/snow: mark dwt init as av_cold
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-27 16:53:37 +01:00
Carl Eugen Hoyos
36a6fb989b hevc_deblock: Fix compilation with nasm
CC: libav-stable@libav.org
Bug-Id: 795
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-02-22 22:34:20 +00:00
Michael Niedermayer
03f39fbb2a avcodec/x86/mlpdsp_init: Simplify mlp_filter_channel_x86()
Based on a patch by Francisco Blas Izquierdo Riera.
Commit message partly taken from Carl.

Fixes a compilation error in mlpdsp_init.c with -fstack-check and some gcc
compilers (I reproduced the issue with gcc 4.7.3) by simplifying the code.

See also https://bugs.gentoo.org/show_bug.cgi?id=471756

$ make libavcodec/x86/mlpdsp_init.o
libavcodec/x86/mlpdsp_init.c: In function ‘mlp_filter_channel_x86’:
libavcodec/x86/mlpdsp_init.c:142:5: error: can’t find a register in
class ‘GENERAL_REGS’ while reloading ‘asm’
libavcodec/x86/mlpdsp_init.c:142:5: error: ‘asm’ operand has impossible
constraints

4551 -> 4509 decicycles

Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-21 16:05:41 +01:00
Christophe Gisquet
398f531915 x86: hevc_mc: fewer xmm regs used in epel h/v
11 xmm registers seem to be required only for AVX2.

Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 15:19:19 +01:00
Christophe Gisquet
89cb4995fa x86: hevc_mc: save 1 gpr in epel filter loading
The 3*stride value stored in r3src can be loaded much later,
so use r3src instead of a dedicated gpr when possible.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-16 21:53:51 +01:00
James Almer
03adafb318 x86/g722dsp: add ff_g722_apply_qmf_sse2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-16 00:41:21 -03:00
Christophe Gisquet
b533949813 x86: hevc: remove a parameter to WP internals
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-14 17:22:50 +01:00
James Almer
1679d68dbf x86/hevc_mc: optimize AVX2 mc functions
Before
40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips

After
37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:58 -03:00
James Almer
14b44c1614 x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:30 -03:00
James Almer
06fe6dfe12 x86/hevc_sao: make sao_band_filter work on x86_32
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-09 20:41:21 -03:00
Christophe Gisquet
b61b9e4919 x86: hevc_mc: remove lea in EPEL_LOAD
The second parameter to the macro is always an immediate address,
so no lea is needed.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:35 +01:00
Christophe Gisquet
4919b38421 x86: hevc_mc: fewer gpr autoloads for _v filters
In that case, only my needs to be loaded; mx/r3src is not used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:34 +01:00
James Almer
92d903afaa x86/vp9dsp: fix clobbering of xmm6 on IDCT sse2 functions
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-08 00:50:39 -03:00
Christophe Gisquet
626d6184ce x86: lavc/hevc_mc: fix comments
The width parameter is now completely at the back, and actually
never used. This helps understanding the actual parameter list.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 20:52:03 +01:00
Christophe Gisquet
ed450d4acf x86: lavc: share more constant through defines
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 17:48:14 +01:00
Christophe Gisquet
691b7f5e9e lavc/lossless_audiodsp: revert various commits
Their intent was to make the DSP work with wmalossless pro.
The latter was fixed to work with the DSP.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 15:15:19 +01:00
Christophe Gisquet
9dc45d1f42 x86: lavc: share more constants
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
Mickaël Raulet
6ecc3fd612 x86/hevc_mc: use aligned loads
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 21:38:00 +01:00
James Almer
383fddeec6 x86/lossless_audiodsp: fix compilation with --disable-yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 17:30:17 -03:00
James Almer
aea29a891f x86/hevc_sao: fix loading of RIP address
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.

Suggested by Hendrik Leppkes and Christophe Gisquet.

Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 15:06:15 -03:00
Mickaël Raulet
bcb0925115 x86/hevc: use CLIPW macro when possible
Conflicts:
	libavcodec/x86/hevc_mc.asm

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:38:47 +01:00
Christophe Gisquet
5eedd36df1 x86: hevc_mc: use epel_hv 16-wide function
The epel_hv functions were still relying on 8 being the maximum
instantiated epel_hv width.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:37:56 +01:00
Pierre Edouard Lepere
a0d1300f71 x86: hevc_mc: add AVX2 optimizations
before
33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
38138 decicycles in luma_bi_2, 523427 runs, 861 skips
13490 decicycles in luma_uni, 516138 runs, 8150 skips
after
20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
10397 decicycles in luma_uni, 515715 runs, 8573 skips

Conflicts:
	libavcodec/x86/hevc_mc.asm
	libavcodec/x86/hevcdsp_init.c

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:20:47 +01:00
Michael Niedermayer
a6c2c8fe3f Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar"
This reverts commit 3b4ffba3af.

Unbreaks the SSSE3 code on mingw32

Conflicts:

	libavcodec/x86/lossless_audiodsp.asm

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 02:31:45 +01:00
Michael Niedermayer
f1214763af avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code
This is simpler and more robust, and fixes XMM save/restore
mismatches.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 02:18:54 +01:00
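Moving the `order & 8` tail into C works because the function is a per-element computation whose partial results can simply be added. Below is a scalar reference of what `scalarproduct_and_madd_int16` computes, as we understand its contract: the dot product of v1 and v2, taken before v1 is updated, while v1 += mul * v3. It is followed by a hypothetical dispatcher showing a 16-wide bulk plus an 8-element tail. The `_ref` and `_split` names are illustrative, not FFmpeg's, and both "kernels" are stood in for by the scalar loop.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: returns sum(v1[i] * v2[i]) using the original v1,
 * while updating v1[i] += mul * v3[i] in the same pass. */
static int32_t scalarproduct_and_madd_ref(int16_t *v1, const int16_t *v2,
                                          const int16_t *v3, int order, int mul)
{
    int32_t res = 0;
    for (int i = 0; i < order; i++) {
        res   += v1[i] * v2[i];
        v1[i] += mul * v3[i];
    }
    return res;
}

/* Hypothetical dispatcher: a 16-wide kernel handles order & ~15 elements
 * and an 8-wide kernel the remaining 8 when order % 16 == 8. Elements are
 * independent and the partial sums are added, so the split matches one
 * pass over all elements. */
static int32_t scalarproduct_and_madd_split(int16_t *v1, const int16_t *v2,
                                            const int16_t *v3, int order, int mul)
{
    assert(order % 8 == 0);
    int bulk = order & ~15;
    int32_t res = scalarproduct_and_madd_ref(v1, v2, v3, bulk, mul);
    if (order & 8)
        res += scalarproduct_and_madd_ref(v1 + bulk, v2 + bulk, v3 + bulk, 8, mul);
    return res;
}
```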
Michael Niedermayer
3b4ffba3af avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar
This is needed as the mmx code is used as a fallback from the ssse3 code

Suggested-by: jamrial
Tested-by: wm4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 00:20:59 +01:00
James Almer
15574c505b x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips

Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:33 -03:00
James Almer
042c1159fc x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips

Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:27 -03:00
James Almer
aa945dc112 x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-02 00:01:35 -03:00
James Almer
71e2cb4706 x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 21:45:52 -03:00
Christophe Gisquet
bff7feb328 x86: hevc/sao: aligned source buffers
Useful for at least the band filter, for which:
- Band filter call only:
           32      64
Before:  16556    54015
After:   16497    52355
- Whole case:
           32      64
Before:  37031   103008
After:   32045    93952
2015-02-01 20:22:54 -03:00
James Almer
fa3eccb4f9 x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips

width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00
Christophe Gisquet
7aeafacfd0 x86/sbrdsp: Use different mem moves
Before
2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips

After
2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:43 -03:00
James Almer
449b21bfab x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3}
2 to 2.5 times faster.

Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:39 -03:00
James Almer
08810a8895 x86/flacdsp: remove unneeded ifdeffery
x86inc can translate r*m into a register or stack on its own

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-05 16:29:28 -03:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Ronald S. Bultje
3aefca68ca vp9/x86: add myself to copyright holders for loopfilter assembly. 2014-12-27 16:55:16 -05:00
Ronald S. Bultje
afd8c464b7 vp9/x86: make filter_16_h work on 32-bit. 2014-12-27 16:55:16 -05:00
Ronald S. Bultje
b26bc3520f vp9/x86: make filter_48/84/88_h work on 32-bit. 2014-12-27 16:55:15 -05:00
Ronald S. Bultje
8a1cff1c35 vp9/x86: make filter_44_h work on 32-bit. 2014-12-27 16:55:15 -05:00
Ronald S. Bultje
047088b8c6 vp9/x86: make filter_16_v work on 32-bit. 2014-12-27 16:55:14 -05:00
Ronald S. Bultje
0cc9c23ea1 vp9/x86: make filter_48/84_v work on 32-bit. 2014-12-27 16:55:14 -05:00
Ronald S. Bultje
6433a9133f vp9/x86: make filter_88_v work on 32-bit. 2014-12-27 16:55:14 -05:00
Ronald S. Bultje
75f8e52089 vp9/x86: make filter_44_v work on 32-bit. 2014-12-27 16:55:13 -05:00
Ronald S. Bultje
7f80c3344c vp8/x86: save one register in SIGN_ADD/SUB. 2014-12-27 16:55:13 -05:00
Ronald S. Bultje
8ea2194ebb vp9/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
e42409479f vp8/x86: move variable assigned inside macro branch.
The value is not used outside the branch.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
418c202c63 vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning. 2014-12-27 16:55:12 -05:00
Ronald S. Bultje
d1c55654e1 vp8/x86: remove unused register from ABSSUB_CMP macro. 2014-12-27 16:55:12 -05:00
Ronald S. Bultje
e59bd08986 vp9/x86: slightly simplify 44/48/84/88 h stores. 2014-12-27 16:55:11 -05:00
Ronald S. Bultje
8132629bd5 vp9/x86: make cglobal statement more conservative in register allocation. 2014-12-27 16:55:11 -05:00
Ronald S. Bultje
c013ca58c5 vp9/x86: save one register in loopfilter surface coverage. 2014-12-27 16:55:11 -05:00
James Almer
32c836cb11 x86/vp9: remove duplicate function prototypes
Fixes "redundant redeclaration" warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-23 00:56:51 -03:00
James Almer
7696e429c7 x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-20 13:25:43 +01:00
James Almer
a4d62f7775 x86/constants: fix alignment of pw_255
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 20:21:34 +01:00
Ronald S. Bultje
bdc1e3e3b2 vp9/x86: intra prediction sse2/32bit support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:19 +01:00
Ronald S. Bultje
b6e1711223 vp9/x86: invert hu_ipred left array ordering.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:18 +01:00
Ronald S. Bultje
0a7964dca5 vp9/x86: save one register on 32bit idct32x32.
Fixes build on win32.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-16 02:51:26 +01:00
Ronald S. Bultje
cae893f692 vp9/x86: sse2 MC assembly.
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 02:34:05 +01:00
Ronald S. Bultje
fd77fbb390 vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 00:38:05 +01:00
Michael Niedermayer
a03f72e744 avcodec/x86/hevc_mc: fix sse register counts
These fix failures with --enable-xmm-clobber-test.
It would be better to change the code to use fewer registers, but until
someone does, the declared register count must not be too small.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-11 13:17:26 +01:00
Michael Niedermayer
d43d5c5707 avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-10 07:34:49 +01:00
Michael Niedermayer
ed9be7dd47 avcodec/x86/pngdsp: fix off by 1 error
This fixes artifacts in the last pixel of rows with some widths and pixel formats

Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-08 18:24:40 +01:00
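The message does not say where the off-by-one was. As a generic illustration of the pattern involved, SIMD row loops usually process whole vector-sized chunks and then a scalar tail, and the chunk loop bound decides whether the last pixels of a row are covered. The sketch below uses the PNG Up filter (each byte plus the byte above it, modulo 256) with a hypothetical `add_bytes_up` function. It is not FFmpeg's pngdsp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PNG "Up" filter reconstruction for one row: dst[i] = src[i] + top[i]
 * (mod 256). The first loop mimics a SIMD kernel working on 8-byte
 * chunks; the second finishes the remaining 0..7 bytes one at a time.
 * With "i + 8 <= w" as the chunk bound, no byte is skipped or written
 * twice, and nothing past w is touched, for any width w. */
static void add_bytes_up(uint8_t *dst, const uint8_t *src, const uint8_t *top,
                         size_t w)
{
    size_t i = 0;
    for (; i + 8 <= w; i += 8)
        for (size_t k = 0; k < 8; k++)
            dst[i + k] = (uint8_t)(src[i + k] + top[i + k]);
    for (; i < w; i++)
        dst[i] = (uint8_t)(src[i] + top[i]);
}
```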
Michael Niedermayer
1d048f762d Merge commit '9a738c27dceb4b975784b23213a46f5cb560d1c2'
* commit '9a738c27dceb4b975784b23213a46f5cb560d1c2':
  v210enc: Add SIMD optimised 8-bit and 10-bit encoders

Conflicts:
	libavcodec/v210enc.c
	libavcodec/v210enc.h
	libavcodec/x86/Makefile
	libavcodec/x86/v210enc.asm
	libavcodec/x86/v210enc_init.c
	tests/ref/vsynth/vsynth1-v210
	tests/ref/vsynth/vsynth2-v210

See: 36091742d1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-06 01:54:10 +01:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Reimar Döffinger
49d9cbe55d h264_i386: Fix operand size
Fixes fate failure on macosx clang x86-64

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 23:03:13 +01:00
Christophe Gisquet
9fa056ba75 pngdsp x86: use unaligned access
For test images manually generated to contain only up prediction,
timing results:
         8380x3032    255x185
before:   138635       1992
after:    139232       1996

Actually jumping to the proper version depending on the alignment:
8380x3032: 138767

A 0.5% speed improvement for gigantic images is not worth the code
duplication.

Fixes ticket #4148

Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Tested-by: Benoit Fouet <benoit.fouet@free.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 11:56:22 +01:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer
ea41e6d637 Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600'
* commit '9c12c6ff9539e926df0b2a2299e915ae71872600':
  motion_est: convert stride to ptrdiff_t

Conflicts:
	libavcodec/me_cmp.c
	libavcodec/ppc/me_cmp.c
	libavcodec/x86/me_cmp_init.c

See: 9c669672c7
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-24 12:13:00 +01:00
Vittorio Giovara
9c12c6ff95 motion_est: convert stride to ptrdiff_t
CC: libav-stable@libav.org
Bug-Id: CID 700556 / CID 700557 / CID 700558
2014-11-24 01:30:10 +00:00
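The commit does not spell out the reported defects. A common motivation for this change is that offsets such as `y * stride` are then computed in a signed, pointer-sized type instead of `int`. On x86-64, assembly receiving a pointer-sized argument also does not have to sign-extend it first. A hypothetical SAD function illustrates the signature style; negative strides (bottom-up traversal) keep working. `sad8x8` is illustrative, not FFmpeg's me_cmp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8x8 sum of absolute differences. With a ptrdiff_t stride,
 * the row offset y * stride is computed as ptrdiff_t, the same signed type
 * used for pointer differences, rather than in int. */
static int sad8x8(const uint8_t *a, const uint8_t *b, ptrdiff_t stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}
```

Passing a pointer to the last row together with a negative stride walks the same block upwards and gives the same result.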
Carl Eugen Hoyos
600e38f563 Fix standalone compilation of the apng decoder on x86. 2014-11-23 13:21:29 +01:00
Michael Niedermayer
65ce8f8895 avcodec/x86/Makefile: fix order
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-23 01:49:04 +01:00
Michael Niedermayer
d3512a0e89 avcodec/x86/lossless_audiodsp: fix fallback code for 32bit
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 21:08:38 +01:00
Michael Niedermayer
4327088da3 avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 20:40:36 +01:00
Reimar Döffinger
478c61ccb2 h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
11674 -> 10877 decicycles on my Phenom II.
Overall speedup was unfortunately within measurement error.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-11-22 14:06:48 +01:00
James Almer
3cec54b7d7 x86/flacdsp: add SSE2 and AVX decorrelate functions
Two to four times faster depending on instruction set, block size and channel count.
2014-11-13 13:47:55 -03:00
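Stereo decorrelation in FLAC reverses the encoder's channel transform. The sketch below shows the left/side and mid/side cases for one sample pair. The mid/side form is equivalent to the reconstruction described in the FLAC format specification, where the encoder stores side = L - R and mid = (L + R) >> 1. The function names are hypothetical. The SIMD versions in the commit apply the same arithmetic to many samples at once.

```c
#include <assert.h>
#include <stdint.h>

/* Mid/side: the bit dropped from mid equals the low bit of side, so
 * R = mid - (side >> 1) and L = R + side. Assumes >> on negative values
 * is an arithmetic shift (implementation-defined in C, but true on
 * mainstream compilers). */
static void decorrelate_ms(int32_t mid, int32_t side, int32_t *l, int32_t *r)
{
    int32_t right = mid - (side >> 1);
    *r = right;
    *l = right + side;
}

/* Left/side: channel 0 carries L, channel 1 carries side = L - R. */
static void decorrelate_ls(int32_t left, int32_t side, int32_t *l, int32_t *r)
{
    *l = left;
    *r = left - side;
}
```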
James Almer
84ccc317ce x86/flacdsp: separate decoder and encoder dsp initialization
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-12 14:41:45 -03:00
James Almer
7292b0477a x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx
Handle it inside the __asm__() block.
Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-23 13:11:05 -03:00
Michael Niedermayer
3c1378ce0a Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2'
* commit '2d91abade29e43bb45c881d45909b8ee77e904e2':
  x86: h264_intrapred: Don't treat 32-bit integers as 64-bit

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-08 11:48:58 +02:00
Henrik Gramner
2d91abade2 x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
The upper halves are not guaranteed to be zero in x86-64.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-10-08 08:15:52 +00:00
Mickaël Raulet
4ba6371a83 x86/hevc: get rid off packusdw for ssse3 compatibility
cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2

Fixes out-of-array access
Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-04 21:14:15 +02:00
James Almer
0de1d6287e x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
2x to 2.5x faster than the C version.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00
James Almer
acebff8e5d x86/mpegvideoencdsp: improve ff_pix_sum16_sse2
~15% faster.

Also add an mmxext version that takes advantage of the new code, and
build it alongside the mmx version only on x86_32.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-01 13:07:22 -03:00
Michael Niedermayer
d22e88d120 avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse*
Fixes acodec-dca2 fate failure

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-28 19:04:06 +02:00
James Almer
26cd7b1e1a x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2}
About two times faster than the C wrapper.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-26 20:48:40 -03:00
Carl Eugen Hoyos
c0f9df30dd lavc/x86/idctdsp.h: Fix make checkheaders. 2014-09-25 22:18:25 +02:00
James Almer
a829870b2f avcodec/svq1enc: align buffer used by simd functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-25 16:00:20 -03:00
James Almer
4b892e469b x86/cavsdsp: fix buffer alignment in cavs_idct8_add_mmx()
It may be used by ff_add_pixels_clamped_sse2().
Should fix fate-cavs failures on some systems.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-25 16:00:16 -03:00
James Almer
4f4f08e6f0 x86/idctdsp: port {put,add}_pixels_clamped to yasm
Also add sse2 versions for both.
put_pixels_clamped port and sse2 version originally written by Timothy Gu.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:52:13 -03:00
James Almer
c99a882814 avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:43:19 -03:00
James Almer
ad26e83f9c avcodec/x86: use function pointers for {put,add}_pixels_clamped
Same behavior as in simple_idct.
This way the best optimized versions available will be used instead.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 18:52:32 -03:00
James Almer
70277d1d23 x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2
~15% faster than sse2.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 16:12:55 -03:00
James Almer
164d6c7f5b x86/videodsp: fix warning about discarded 'const' qualifier
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-23 19:59:20 -03:00
James Almer
6b2caa321f x86/vp9: add AVX and AVX2 MC
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-22 22:35:03 -03:00
James Almer
33c752be51 x86/me_cmp: port mmxext vsad functions to yasm
Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of
vsad16 and vsad_intra16.
Since vsad8 and vsad16 are not bitexact, they are accordingly marked as
approximate.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-19 20:50:20 -03:00
James Almer
77f9a81cca x86/me_cmp: combine sad functions into a single macro
No point in having the sad8 functions separate now that the loop is no
longer unrolled.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-17 23:52:36 -03:00
Michael Niedermayer
41d82b85ab avcodec/x86/vp9lpf: Always include x86util.asm
Fixes executable stack

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 23:37:46 +02:00
Michael Niedermayer
85f2c0124d avcodec/x86/me_cmp: fix sad8xh
This adds back support for 8x4 and 8x16.
It does not support 8x2; I think nothing uses that.

Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 14:08:24 +02:00
James Almer
0456d169c4 x86/me_cmp: port mmxext and sse2 sad functions to yasm
Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext).
Since the _xy2 versions are not bitexact, they are accordingly marked as
approximate.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 11:12:50 +02:00
James Almer
52ec81c67d x86/hevc_res_add: add missing guards to hevc_transform_add32_8_avx2
Should fix compilation with old Yasm/Nasm versions.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 23:34:01 -03:00
James Almer
c3d2426cca x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2
~20% faster than AVX.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 20:21:29 -03:00
James Darnley
46ef45ab59 lavc/x86/v210: give cpuflag to INIT macro
This lets the cglobal macro automatically append a suffix to the function name.
This means that INIT_XMM avx must be used rather than INIT_AVX.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 00:35:07 +02:00