Commit Graph

276 Commits

Author SHA1 Message Date
James Almer
9dcaae70f2 x86/aacpsdsp: add SSE and SSE3 optimized functions
Between 1.5 and 2.5 times faster

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-30 19:01:15 -03:00
Michael Niedermayer
115a9b5091 Merge commit 'd42191c78befc1983f23b1899b2dda513b72f1ed'
* commit 'd42191c78befc1983f23b1899b2dda513b72f1ed':
  configure: Factor out vp8dsp module

Conflicts:
	configure
	libavcodec/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:45:34 +02:00
Michael Niedermayer
fd29dd432c Merge commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418'
* commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418':
  configure: Factor out rv34dsp module

Conflicts:
	libavcodec/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:21:36 +02:00
Vittorio Giovara
d42191c78b configure: Factor out vp8dsp module
2015-07-17 18:46:24 +01:00
Vittorio Giovara
5cb4bdb2a0 configure: Factor out rv34dsp module
2015-07-17 18:46:24 +01:00
James Almer
7912a6830d avcodec/jpeg2000dsp: add ff_ict_float_{sse,avx}
Original intrinsics version by Nicolas Bertrand.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:27 -03:00
Christophe Gisquet
c3bf52713a x86: xvid_idct: port MMX iDCT to yasm
Also reduce the table duplication with SSE2 code, remove duplicated
macro parameters.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 11:45:11 +01:00
Christophe Gisquet
2999bd7da2 x86: xvid_idct: port SSE2 iDCT to yasm
The main differences are renaming labels properly and
letting yasm select the GPRs for skipping 1D transforms.

Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-13 01:04:52 +01:00
Michael Niedermayer
7fce8c752d Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b'
* commit '71f1ad37d858b810b71a4af1c25771beaa50b27b':
  lavc: do not compile fmtconvert unconditionally

Conflicts:
	configure
	libavcodec/ppc/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-01 00:06:42 +01:00
Anton Khirnov
71f1ad37d8 lavc: do not compile fmtconvert unconditionally
Only ac3dec and dcadec use it.
2015-02-28 21:51:24 +01:00
James Almer
03adafb318 x86/g722dsp: add ff_g722_apply_qmf_sse2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-16 00:41:21 -03:00
James Almer
fa3eccb4f9 x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips

width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Carl Eugen Hoyos
600e38f563 Fix standalone compilation of the apng decoder on x86.
2014-11-23 13:21:29 +01:00
Michael Niedermayer
65ce8f8895 avcodec/x86/Makefile: fix order
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-23 01:49:04 +01:00
James Almer
0de1d6287e x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
2x to 2.5x faster than the C version.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00
James Almer
4f4f08e6f0 x86/idctdsp: port {put,add}_pixels_clamped to yasm
Also add sse2 versions for both.
put_pixels_clamped port and sse2 version originally written by Timothy Gu.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:52:13 -03:00
Michael Niedermayer
b3b05a11d3 Merge commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5'
* commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5':
  cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs

Conflicts:
	libavcodec/mpeg4videodec.c
	libavcodec/x86/Makefile
	libavcodec/x86/dct-test.c
	libavcodec/x86/xvididct_sse2.c
	libavcodec/xvididct.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:09:30 +02:00
Diego Biurrun
dcb7c868ec cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs
2014-08-27 04:54:05 -07:00
Pierre Edouard Lepere
a6af4bf64d x86: hevc: adding transform_add
Reviewed-by: James Almer <jamrial@gmail.com>
Approved-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-20 01:28:56 +02:00
Michael Niedermayer
3bb2297351 Merge commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6'
* commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6':
  build: Add explanatory comments to (optimization) blocks in the Makefiles

Conflicts:
	libavcodec/ppc/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-15 20:25:12 +02:00
Diego Biurrun
efd26bedec build: Add explanatory comments to (optimization) blocks in the Makefiles
2014-08-15 02:55:21 -07:00
James Darnley
0081a14e7d lavc/flacenc: add sse4 version of the 16-bit lpc encoder
From 1.8 to 2.4 times faster.  Runtime is reduced by 2 to 39%.  The
speed-up generally increases with compression_level.

This lpc encoder is not used with levels < 3 so it provides no speed-up
in these cases.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-13 01:14:47 +02:00
Michael Niedermayer
f54e01c24e Merge commit 'a786c8259dafeca9744252230b5d78f67810770c'
* commit 'a786c8259dafeca9744252230b5d78f67810770c':
  idct: Split off Xvid IDCT

Conflicts:
	libavcodec/Makefile
	libavcodec/mpeg4videodec.c
	libavcodec/x86/Makefile
	libavcodec/x86/idctdsp_init.c

This split is somewhat restructured leaving the xvid IDCT available
outside mpeg4 if manually selected.

The code also could not be merged unchanged as it conflicted with a
bugfix in FFmpeg

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-01 16:21:52 +02:00
Diego Biurrun
a786c8259d idct: Split off Xvid IDCT
The Xvid IDCT is only required to decode some Xvid-encoded MPEG-4 files,
so there is no point in having it as an unconditional part of idctdsp.
2014-08-01 01:25:18 -07:00
Michael Niedermayer
a91c5ed008 Merge commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d'
* commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d':
  x86: build: Restore ordering of OBJS lines

Conflicts:
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 00:34:53 +02:00
Diego Biurrun
4f8cf0dc4e x86: build: Restore ordering of OBJS lines
2014-07-28 13:19:04 -07:00
Pierre Edouard Lepere
1a880b2fb8 hevc: SSE2 and SSSE3 loop filters
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-07-26 15:01:01 +00:00
Michael Niedermayer
3a2d1465c8 Merge commit '2d60444331fca1910510038dd3817bea885c2367'
* commit '2d60444331fca1910510038dd3817bea885c2367':
  dsputil: Split motion estimation compare bits off into their own context

Conflicts:
	configure
	libavcodec/Makefile
	libavcodec/arm/Makefile
	libavcodec/dvenc.c
	libavcodec/error_resilience.c
	libavcodec/h264.h
	libavcodec/h264_slice.c
	libavcodec/me_cmp.c
	libavcodec/me_cmp.h
	libavcodec/motion_est.c
	libavcodec/motion_est_template.c
	libavcodec/mpeg4videoenc.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/Makefile
	libavcodec/x86/me_cmp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-17 23:27:40 +02:00
Michael Niedermayer
d6676a1605 Merge commit 'c23ce454b3e33634a188d6facfd2b7182af5af93'
* commit 'c23ce454b3e33634a188d6facfd2b7182af5af93':
  x86: dsputil: Coalesce all init files

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_x86.h
	libavcodec/x86/motion_est.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-17 22:07:52 +02:00
Diego Biurrun
2d60444331 dsputil: Split motion estimation compare bits off into their own context
2014-07-17 09:07:10 -07:00
Diego Biurrun
c23ce454b3 x86: dsputil: Coalesce all init files
This makes the init files match the structure of the dsputil split.
2014-07-17 03:32:56 -07:00
Michael Niedermayer
cc3e7a4c3d Merge commit 'acf91215c74a91eb3b86af01dcb1d3c78d0e2310'
* commit 'acf91215c74a91eb3b86af01dcb1d3c78d0e2310':
  x86: dsputil: Avoid pointless CONFIG_ENCODERS indirection

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-13 21:51:20 +02:00
Diego Biurrun
acf91215c7 x86: dsputil: Avoid pointless CONFIG_ENCODERS indirection
The remaining dsputil bits are encoding-specific anyway.
2014-07-13 07:01:05 -07:00
Michael Niedermayer
2d5e9451de Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e'
* commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e':
  dsputil: Split off pixel block routines into their own context

Conflicts:
	configure
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/pixblockdsp_template.c
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-10 01:22:14 +02:00
Diego Biurrun
f46bb608d9 dsputil: Split off pixel block routines into their own context
2014-07-09 08:05:26 -07:00
Michael Niedermayer
14e2406de7 Merge commit 'a9aee08d900f686e966c64afec5d88a7d9d130a3'
* commit 'a9aee08d900f686e966c64afec5d88a7d9d130a3':
  dsputil: Split off FDCT bits into their own context

Conflicts:
	configure
	libavcodec/Makefile
	libavcodec/asvenc.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.c
	libavcodec/mpegvideo.h
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-08 03:19:06 +02:00
Diego Biurrun
a9aee08d90 dsputil: Split off FDCT bits into their own context
2014-07-07 12:28:45 -07:00
Michael Niedermayer
3790801f9c Merge commit '3c650efb81aaa3b395ba4606ee68a47ee4efb57b'
* commit '3c650efb81aaa3b395ba4606ee68a47ee4efb57b':
  dsputil: Move draw_edges() to mpegvideoencdsp

Conflicts:
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 16:17:27 +02:00
Michael Niedermayer
020865f557 Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d'
* commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d':
  dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc

Conflicts:
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 15:36:58 +02:00
Michael Niedermayer
462c6cdb8e Merge commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94'
* commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94':
  dsputil: Split off *_8x8basis to a separate context

Conflicts:
	libavcodec/dsputil.c
	libavcodec/mpegvideo_enc.c
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 15:08:55 +02:00
Diego Biurrun
3c650efb81 dsputil: Move draw_edges() to mpegvideoencdsp
2014-07-06 14:48:50 -07:00
Diego Biurrun
c166148409 dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc
2014-07-06 14:26:53 -07:00
Diego Biurrun
8d686ca59d dsputil: Split off *_8x8basis to a separate context
2014-07-06 13:09:24 -07:00
James Almer
dad31083ae x86/svq1enc: port ssd_int8_vs_int16 to yasm
Also add an SSE2 version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 21:43:40 +02:00
Michael Niedermayer
19b79c1429 Merge commit 'b0de1c766329dd8c9960ad1722e2f653160abc1b'
* commit 'b0de1c766329dd8c9960ad1722e2f653160abc1b':
  x86: build: Only compile FDCT code if MMX is enabled

Conflicts:
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 20:12:31 +02:00
Michael Niedermayer
5036c8b17b Merge commit '12f129e545e5a5844b6ad7f3eb6a438015cad8bc'
* commit '12f129e545e5a5844b6ad7f3eb6a438015cad8bc':
  x86: Unconditionally compile blockdsp and svq1enc init files

Conflicts:
	libavcodec/x86/Makefile

blockdsp_mmx is renamed to blockdsp_init as we already have a blockdsp file
and _init is how all other such files are called

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 19:50:05 +02:00
Michael Niedermayer
6bef3e55bd Merge commit '009331303a6462d07cbe94aef9c446f1a1695519'
* commit '009331303a6462d07cbe94aef9c446f1a1695519':
  x86: huffyuvdsp: Move inline assembly to init file

Conflicts:
	libavcodec/x86/Makefile
	libavcodec/x86/huffyuvdsp.h
	libavcodec/x86/huffyuvdsp_init.c
	libavcodec/x86/huffyuvdsp_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 19:11:26 +02:00
Diego Biurrun
b0de1c7663 x86: build: Only compile FDCT code if MMX is enabled
All other files containing purely inline assembly are treated the same way.
2014-07-05 04:18:34 -07:00