Reimar Döffinger
478c61ccb2
h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
...
11674 -> 10877 decicycles on my Phenom II.
Overall speedup was unfortunately within measurement error.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-11-22 14:06:48 +01:00
James Almer
3cec54b7d7
x86/flacdsp: add SSE2 and AVX decorrelate functions
...
Two to four times faster depending on instruction set, block size and channel count.
2014-11-13 13:47:55 -03:00
James Almer
84ccc317ce
x86/flacdsp: separate decoder and encoder dsp initialization
...
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-12 14:41:45 -03:00
James Almer
7292b0477a
x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx
...
Handle it inside the __asm__() block.
Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-23 13:11:05 -03:00
Michael Niedermayer
3c1378ce0a
Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2'
...
* commit '2d91abade29e43bb45c881d45909b8ee77e904e2':
x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-08 11:48:58 +02:00
Henrik Gramner
2d91abade2
x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
...
The upper halves are not guaranteed to be zero in x86-64.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-10-08 08:15:52 +00:00
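A minimal sketch of the problem this commit addresses, modelled in Python rather than assembly. The register contents and the helper name `sign_extend_32` are hypothetical and chosen only for illustration; the actual fix lives in the x86 asm and is not reproduced here.

```python
MASK32 = 0xFFFFFFFF

def sign_extend_32(reg64: int) -> int:
    """Interpret only the low 32 bits of a 64-bit register as a signed int."""
    low = reg64 & MASK32
    return low - (1 << 32) if low & 0x80000000 else low

# Hypothetical register state: the callee expects the 32-bit value 16,
# but the upper half still holds leftover bits from earlier work.
reg = (0xDEADBEEF << 32) | 16

naive = reg                  # full 64-bit register used as-is (the bug)
fixed = sign_extend_32(reg)  # only the 32-bit value is used (the fix)

print(hex(naive))
print(fixed)
```

Using the full register as an offset or stride would address memory far from the intended location, which is why the 32-bit argument has to be extended explicitly before any 64-bit use.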
Mickaël Raulet
4ba6371a83
x86/hevc: get rid of packusdw for ssse3 compatibility
...
cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2
Fixes out of array access
Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-04 21:14:15 +02:00
James Almer
0de1d6287e
x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
...
2x to 2.5x faster than the C version.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00
James Almer
acebff8e5d
x86/mpegvideoencdsp: improve ff_pix_sum16_sse2
...
~15% faster.
Also add an mmxext version that takes advantage of the new code, and
build it alongside the mmx version only on x86_32.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-01 13:07:22 -03:00
Michael Niedermayer
d22e88d120
avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse*
...
Fixes acodec-dca2 fate failure
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-28 19:04:06 +02:00
James Almer
26cd7b1e1a
x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2}
...
About two times faster than the C wrapper.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-26 20:48:40 -03:00
Carl Eugen Hoyos
c0f9df30dd
lavc/x86/idctdsp.h: Fix make checkheaders.
2014-09-25 22:18:25 +02:00
James Almer
a829870b2f
avcodec/svq1enc: align buffer used by simd functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-25 16:00:20 -03:00
James Almer
4b892e469b
x86/cavsdsp: fix buffer alignment in cavs_idct8_add_mmx()
...
It may be used by ff_add_pixels_clamped_sse2().
Should fix fate-cavs failures on some systems.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-25 16:00:16 -03:00
James Almer
4f4f08e6f0
x86/idctdsp: port {put,add}_pixels_clamped to yasm
...
Also add sse2 versions for both.
put_pixels_clamped port and sse2 version originally written by Timothy Gu.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:52:13 -03:00
James Almer
c99a882814
avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:43:19 -03:00
James Almer
ad26e83f9c
avcodec/x86: use function pointers for {put,add}_pixels_clamped
...
Same behavior as in simple_idct.
This way the best optimized versions available will be used instead.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 18:52:32 -03:00
James Almer
70277d1d23
x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2
...
~15% faster than sse2.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 16:12:55 -03:00
James Almer
164d6c7f5b
x86/videodsp: fix warning about discarded 'const' qualifier
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-23 19:59:20 -03:00
James Almer
6b2caa321f
x86/vp9: add AVX and AVX2 MC
...
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-22 22:35:03 -03:00
James Almer
33c752be51
x86/me_cmp: port mmxext vsad functions to yasm
...
Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of
vsad16 and vsad_intra16.
Since vsad8 and vsad16 are not bitexact, they are accordingly marked as
approximate.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-19 20:50:20 -03:00
James Almer
77f9a81cca
x86/me_cmp: combine sad functions into a single macro
...
No point in having the sad8 functions separate now that the loop is no
longer unrolled.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-17 23:52:36 -03:00
Michael Niedermayer
41d82b85ab
avcodec/x86/vp9lpf: Always include x86util.asm
...
Fixes executable stack
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 23:37:46 +02:00
Michael Niedermayer
85f2c0124d
avcodec/x86/me_cmp: fix sad8xh
...
This adds back support for 8x4 and 8x16.
It does not support 8x2; I think nothing uses that.
Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 14:08:24 +02:00
James Almer
0456d169c4
x86/me_cmp: port mmxext and sse2 sad functions to yasm
...
Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext).
Since the _xy2 versions are not bitexact, they are accordingly marked as
approximate.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 11:12:50 +02:00
James Almer
52ec81c67d
x86/hevc_res_add: add missing guards to hevc_transform_add32_8_avx2
...
Should fix compilation with old Yasm/Nasm versions.
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 23:34:01 -03:00
James Almer
c3d2426cca
x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2
...
~20% faster than AVX.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 20:21:29 -03:00
James Darnley
46ef45ab59
lavc/x86/v210: give cpuflag to INIT macro
...
This lets the cglobal macro automatically append a suffix to the function name.
This means that INIT_XMM avx must be used rather than INIT_AVX.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 00:35:07 +02:00
Michael Niedermayer
5b58d79a99
Merge commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453'
...
* commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453':
xvid: Add C IDCT
Conflicts:
libavcodec/dct-test.c
libavcodec/xvididct.c
See: 298b3b6c1f
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-03 04:09:38 +02:00
Michael Niedermayer
5db23c07a3
Merge commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b'
...
* commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b':
idctdsp: Add global function pointers for {add|put}_pixels_clamped functions
Conflicts:
libavcodec/arm/idctdsp_init_arm.c
libavcodec/dct.h
libavcodec/idctdsp.c
libavcodec/jrevdct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-03 03:19:40 +02:00
Pascal Massimino
7a1d6ddd2c
xvid: Add C IDCT
...
Thanks to Pascal Massimino and Michael Militzer for relicensing as LGPL.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-02 14:41:13 -07:00
Diego Biurrun
95c0cec03a
idctdsp: Add global function pointers for {add|put}_pixels_clamped functions
...
These function pointers already existed in the ARM code. Adding them globally
allows calls to the function pointers to access arch-optimized versions of the
functions transparently.
2014-09-02 14:41:13 -07:00
Reimar Döffinger
d9e2aceb7f
Add missing "const" all over the place.
...
Only "./configure --enable-gpl" on x86 was tested.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-08-29 18:57:25 +02:00
Michael Niedermayer
5403a288a7
Merge commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc'
...
* commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc':
x86: xvid: K&R formatting cosmetics
Conflicts:
libavcodec/x86/xvididct_sse2.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:20:39 +02:00
Michael Niedermayer
b3b05a11d3
Merge commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5'
...
* commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5':
cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs
Conflicts:
libavcodec/mpeg4videodec.c
libavcodec/x86/Makefile
libavcodec/x86/dct-test.c
libavcodec/x86/xvididct_sse2.c
libavcodec/xvididct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:09:30 +02:00
Michael Niedermayer
3ff5ca89fc
Merge commit '1f156af4274dc72d588620f6bedb4e9e66023c92'
...
* commit '1f156af4274dc72d588620f6bedb4e9e66023c92':
x86: xvid_idct: Drop unused definitions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:01:54 +02:00
Diego Biurrun
8d27bf1cff
x86: xvid: K&R formatting cosmetics
2014-08-27 05:58:04 -07:00
Diego Biurrun
dcb7c868ec
cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs
2014-08-27 04:54:05 -07:00
Diego Biurrun
1f156af427
x86: xvid_idct: Drop unused definitions
2014-08-27 04:36:41 -07:00
Christophe Gisquet
3e892b2bcd
x86: hevc_mc: split differently calls
...
In some cases, 2 or 3 calls are performed to functions for unusual
widths. Instead, perform 2 calls for different widths to split the
workload.
The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't
be processed that way without modifications: some calls use unaligned
buffers, and having branches to handle this was resulting in no
micro-benchmark benefit.
For block_w == 12 (around 1% of the pixels of the sequence):
Before:
12758 decicycles in epel_uni, 4093 runs, 3 skips
19389 decicycles in qpel_uni, 8187 runs, 5 skips
22699 decicycles in epel_bi, 32743 runs, 25 skips
34736 decicycles in qpel_bi, 32733 runs, 35 skips
After:
11929 decicycles in epel_uni, 4096 runs, 0 skips
18131 decicycles in qpel_uni, 8184 runs, 8 skips
20065 decicycles in epel_bi, 32750 runs, 18 skips
31458 decicycles in qpel_bi, 32753 runs, 15 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-24 12:05:33 +02:00
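The relative gain behind the Before/After figures above can be worked out as follows. This is a small sketch: the decicycle counts are copied from the commit message, while the helper name `percent_faster` is an assumption made for illustration.

```python
def percent_faster(before: float, after: float) -> float:
    """Relative reduction in cycle count, as a percentage of the 'before' value."""
    return (before - after) / before * 100.0

# Decicycle counts copied from the commit message (block_w == 12).
results = {
    "epel_uni": (12758, 11929),
    "qpel_uni": (19389, 18131),
    "epel_bi": (22699, 20065),
    "qpel_bi": (34736, 31458),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_faster(before, after):.1f}% fewer decicycles")
```

The run and skip counts differ slightly between the two measurements, so the percentages are indicative rather than exact.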
Christophe Gisquet
38e2aa3759
x86: hevc_mc: correct unneeded use of SSE4 code
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-24 11:43:33 +02:00
Christophe Gisquet
2346f2b5db
x86: hevcdsp: use compilation-time-fixed constant
...
The stride for some buffers is known.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 16:26:30 +02:00
Christophe Gisquet
dad7f15567
hevcdsp: remove more instances of compile-time-fixed parameters
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 15:22:42 +02:00
Christophe Gisquet
d4f44b66d3
hevcdsp: remove compilation-time-fixed parameter
...
The dststride parameter is always MAX_PB_SIZE.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 14:57:37 +02:00
Christophe Gisquet
fb1a98ec5b
x86: hevc_mc: assume 2nd source stride is 64
...
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 13:21:37 +02:00
James Almer
54ca4dd43b
x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8
...
* Reduced xmm register count to 7 (As such they are now enabled for x86_32).
* Removed four movdqa (affects the sse2 version only).
* pxor is now used to clear m0 only once.
~5% faster.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-21 15:01:33 -03:00
James Almer
76a99d467f
x86/hecv_res_add: add ff_hevc_transform_add{8,16,32}_8_avx
...
~15% faster than sse2
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-20 16:54:52 -03:00
James Almer
9f498f4e6f
x86/hevc_res_add: fix register count in hevc_transform_add{16,32}_10_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-19 21:34:52 -03:00
Pierre Edouard Lepere
a6af4bf64d
x86: hevc: adding transform_add
...
Reviewed-by: James Almer <jamrial@gmail.com>
Approved-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-20 01:28:56 +02:00
Michael Niedermayer
3bb2297351
Merge commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6'
...
* commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6':
build: Add explanatory comments to (optimization) blocks in the Makefiles
Conflicts:
libavcodec/ppc/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-15 20:25:12 +02:00
Michael Niedermayer
c1df467d73
Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b'
...
* commit '835f798c7d20bca89eb4f3593846251ad0d84e4b':
mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes
Conflicts:
libavcodec/h261dec.c
libavcodec/intrax8.c
libavcodec/mjpegenc.c
libavcodec/mpeg12dec.c
libavcodec/mpeg12enc.c
libavcodec/mpeg4videoenc.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo.h
libavcodec/mpegvideo_enc.c
libavcodec/rv10.c
libavcodec/x86/mpegvideoenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-15 20:11:56 +02:00
Diego Biurrun
efd26bedec
build: Add explanatory comments to (optimization) blocks in the Makefiles
2014-08-15 02:55:21 -07:00
Diego Biurrun
835f798c7d
mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes
2014-08-15 01:26:33 -07:00
James Darnley
54a51d3840
lavc/flacenc: partially unroll loop in flac_enc_lpc_16
...
It now does 12 samples per iteration, up from 4.
From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall.
Runtime is reduced by a further 2 to 18%. Overall runtime reduced by
4 to 50%.
Same conditions as before apply.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-13 03:09:26 +02:00
James Darnley
0081a14e7d
lavc/flacenc: add sse4 version of the 16-bit lpc encoder
...
From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The
speed-up generally increases with compression_level.
This lpc encoder is not used with levels < 3 so it provides no speed-up
in these cases.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-13 01:14:47 +02:00
Ronald S. Bultje
45bed0ab30
vp9/x86: fix bug in intra_pred_hd_32x32.
...
Fixes mismatch in first keyframe in sample
ffvp9_fails_where_libvpx.succeeds.webm from ticket 3849. There's still
a second mismatch a few frames into the sample.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-12 13:11:21 +02:00
James Almer
c97870d1a1
x86/dca: remove unused header
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-12 12:46:53 +02:00
James Almer
e20ff251a6
x86/ttadsp: remove an unnecessary mova
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-12 12:29:05 +02:00
Michael Niedermayer
3841f2ae66
Merge commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36'
...
* commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36':
avcodec: Rename xvidmmx IDCT to xvid
Conflicts:
doc/APIchanges
libavcodec/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-09 12:11:13 +02:00
Michael Niedermayer
0dcebb9f63
Merge commit '84d173d3de97c753234ab0c0b50551d51413d663'
...
* commit '84d173d3de97c753234ab0c0b50551d51413d663':
xvididct: Ensure that the scantable permutation is always set correctly
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-08 22:17:04 +02:00
Diego Biurrun
d35b94fbab
avcodec: Rename xvidmmx IDCT to xvid
...
The Xvid IDCT is not MMX-specific.
2014-08-08 11:13:30 -07:00
Diego Biurrun
84d173d3de
xvididct: Ensure that the scantable permutation is always set correctly
...
This fixes cases where the scantable permutation would get overwritten by
the general idctdsp initialization.
2014-08-08 11:13:29 -07:00
Christophe Gisquet
75837e9add
x86: sbrdsp/fft: reuse ps_neg constant
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:25:08 +02:00
Christophe Gisquet
51dd80e751
x86: diracdsp: reuse constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:25:02 +02:00
Christophe Gisquet
6622a6cff3
x86: dwt: better share constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:24:57 +02:00
Christophe Gisquet
71db2d08b1
x86: better share ff_pw_2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:24:49 +02:00
Christophe Gisquet
4e128ab0b1
x86: vpx/h264/hevc/mpeg2: share constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 18:36:31 +02:00
Michael Niedermayer
305f72aee7
avcodec: Change get_pixels() to ptrdiff_t linesize
...
Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 15:50:54 +02:00
Christophe Gisquet
6786848585
hevc_deblock: change tc type
...
The x86 asm expects int32_t so use that type.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 12:38:26 +02:00
James Almer
de417982e8
x86/vp9lpf: use fewer instructions in SPLATB_MIX
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-05 02:47:54 +02:00
Christophe Gisquet
e8c003edd2
x86: hevc_deblock: remove unnecessary masking
...
The unpacks/shuffles later on make it unnecessary.
Before:
1508 decicycles in h, 2096759 runs, 393 skips
2512 decicycles in v, 2095422 runs, 1730 skips
After:
1477 decicycles in h, 2096745 runs, 407 skips
2484 decicycles in v, 2095297 runs, 1855 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 17:46:04 +02:00
James Almer
b7863c972c
x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12}
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 14:47:15 +02:00
James Almer
b1a44e6bf5
x86/hevc_mc: remove an unnecessary pxor
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 14:35:08 +02:00
James Almer
d0f56ca071
x86/hevc_deblock: improve 8bit transpose store macros
...
Up to four fewer instructions, depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-03 04:24:15 +02:00
Michael Niedermayer
f54e01c24e
Merge commit 'a786c8259dafeca9744252230b5d78f67810770c'
...
* commit 'a786c8259dafeca9744252230b5d78f67810770c':
idct: Split off Xvid IDCT
Conflicts:
libavcodec/Makefile
libavcodec/mpeg4videodec.c
libavcodec/x86/Makefile
libavcodec/x86/idctdsp_init.c
This split is somewhat restructured leaving the xvid IDCT available
outside mpeg4 if manually selected.
The code also could not be merged unchanged as it conflicted with a
bugfix in FFmpeg
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-01 16:21:52 +02:00
Diego Biurrun
a786c8259d
idct: Split off Xvid IDCT
...
The Xvid IDCT is only required to decode some Xvid-encoded MPEG-4 files,
so there is no point in having it as an unconditional part of idctdsp.
2014-08-01 01:25:18 -07:00
James Almer
62baf5b853
x86/hevc_deblock: use existing x86util transpose macro in chroma_{10, 12}
...
Cosmetic change. No measurable difference in speed.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-31 22:56:21 +02:00
Christophe Gisquet
a507623bad
x86: hevc_mc: fix register count usage
...
A macro was using a fixed register, causing too many GPRs to be
declared as used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 22:50:50 +02:00
James Almer
73c4f63ba5
x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10, 12}_avx
...
~5% faster than SSSE3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 14:04:59 +02:00
James Almer
88ba821f23
x86/hevc_deblock: improve luma functions register allocation
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 13:38:05 +02:00
James Almer
c74b08c5c6
x86/hevc_deblock: remove some unnecessary instructions
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 13:27:44 +02:00
James Almer
4f91bb0ff0
x86/hevc_deblock: use psignw instead of pmullw where possible
...
It's slightly faster
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 03:42:29 +02:00
Michael Niedermayer
a91c5ed008
Merge commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d'
...
* commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d':
x86: build: Restore ordering of OBJS lines
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 00:34:53 +02:00
Diego Biurrun
4f8cf0dc4e
x86: build: Restore ordering of OBJS lines
2014-07-28 13:19:04 -07:00
James Almer
664e9e4331
x86/hevc_deblock: load less data in hevc_h_loop_filter_luma_8
...
Reading 8 bytes is enough.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-28 21:55:22 +02:00
James Almer
f137876182
x86/hevc_idct: add a colon to labels
...
This fixes a warning spam when using NASM
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-28 21:43:32 +02:00
Christophe Gisquet
81943a10b5
x86: hevc_mc: load less data in epel filters
...
Before:
5679 decicycles in epel_bi, 2059976 runs, 37176 skips
3468 decicycles in epel_uni, 1040886 runs, 7690 skips
After:
5323 decicycles in epel_bi, 2059493 runs, 37659 skips
3262 decicycles in epel_uni, 1040871 runs, 7705 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 18:34:39 +02:00
Christophe Gisquet
36284ae981
x86: hevc_mc: replace one lea by add
...
Should have been in 036f11bdb5.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 17:42:56 +02:00
James Almer
bfb3b2b7a6
x86/hevc_idct: add 12bit idct_dc
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:30:56 +02:00
Michael Niedermayer
d4a9e89b27
avcodec/x86/hevcdsp_init: make license header consistent
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:28:44 +02:00
Michael Niedermayer
706f81a2c2
Merge commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d'
...
* commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d':
hevc: SSE2 and SSSE3 loop filters
Conflicts:
libavcodec/hevcdsp.c
libavcodec/hevcdsp.h
libavcodec/x86/Makefile
libavcodec/x86/hevc_deblock.asm
libavcodec/x86/hevcdsp_init.c
See: de7b89fd43
and several others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:20:48 +02:00
James Almer
1ace9573dc
x86/hevc_idct: replace old and unused idct functions
...
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:
idct8x8_dc
        SSE2  MMXEXT  C
cycles  22    26      57
idct16x16_dc
        AVX2  SSE2  C
cycles  27    32    249
idct32x32_dc
        AVX2  SSE2  C
cycles  62    126   1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 18:00:11 +02:00
Pierre Edouard Lepere
1a880b2fb8
hevc: SSE2 and SSSE3 loop filters
...
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-07-26 15:01:01 +00:00
Christophe Gisquet
036f11bdb5
x86: hevc_mc: replace simple leas by adds
...
lea is detrimental for those simple cases. The change has no overall
impact, though.
Before:
15017 decicycles in q, 1016152 runs, 32424 skips
15382 decicycles in q_bi, 1013673 runs, 34903 skips
3713 decicycles in e, 2074534 runs, 22618 skips
3901 decicycles in e_bi, 2065509 runs, 31643 skips
7852 decicycles in q_uni, 520165 runs, 4123 skips
2398 decicycles in e_uni, 1043339 runs, 5237 skips
After:
14898 decicycles in q, 1016295 runs, 32281 skips
15119 decicycles in q_bi, 1015392 runs, 33184 skips
3682 decicycles in e, 2073224 runs, 23928 skips
3720 decicycles in e_bi, 2065043 runs, 32109 skips
7643 decicycles in q_uni, 520280 runs, 4008 skips
2363 decicycles in e_uni, 1043780 runs, 4796 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 05:41:04 +02:00
Mickaël Raulet
bd0f2d316f
x86/hevc: add 12bits support for MC
...
cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 01:55:20 +02:00
Mickaël Raulet
7df98d8c4d
x86/hevc: remove unused constant in deblocking filter
...
cherry picked from commit a3f7282eaa6f1ab0524fb966c6eade50c3025f99
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 01:20:40 +02:00
Mickaël Raulet
7bdcf5c934
x86/hevc: add 12bits support for deblocking filter
...
cherry picked from commit 97d46afe320c7d61d7b9525e5f5588355cde4bb0
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 01:19:42 +02:00
Michael Niedermayer
2904d052b7
Merge commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac'
...
* commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac':
qpeldsp: Mark source pointer in qpel_mc_func function pointer const
Conflicts:
libavcodec/h264qpel_template.c
libavcodec/x86/cavsdsp.c
libavcodec/x86/rv40dsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-25 13:05:08 +02:00
Diego Biurrun
7fb993d338
qpeldsp: Mark source pointer in qpel_mc_func function pointer const
2014-07-25 02:52:54 -07:00
Christophe Gisquet
670b7f203a
x86: hevcdsp: align
...
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-23 22:18:08 +02:00
Carl Eugen Hoyos
c75fdee747
avcodec/x86/hevc_deblock: Fix compilation with nasm.
2014-07-23 10:32:27 +02:00
Michael Niedermayer
ca6b33b8bd
avcodec/x86/hevcdsp_init: Fix "warning: assignment from incompatible pointer type"
2014-07-22 16:36:12 +02:00
Anton Khirnov
d7e162d46b
hevcdsp: remove an unneeded variable in the loop filter
...
beta0 and beta1 will always be the same within a CU
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:27:26 +02:00
Anton Khirnov
ae2f048fd7
avcodec/x86/hevc_deblock: cosmetics
...
cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:18:05 +02:00
Anton Khirnov
b435043abb
hevc: cleanups in SSE2 and SSSE3 loop filters, use fewer instructions
...
cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:17:29 +02:00
Anton Khirnov
e8581b17a8
avcodec/x86/hevc_deblock: use test instead of cmp 0
...
cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:16:05 +02:00
Anton Khirnov
dc69247de4
avcodec/x86/hevc_deblock: use of paddw instead of psllw
...
cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:14:53 +02:00
Anton Khirnov
500a0394d5
avcodec/x86/hevc_deblock: add %ifs to avoid "do nothing instructions"
...
cherry picked from commit f7843356253459e6010320292dbbc1e888a5249b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:13:28 +02:00
Anton Khirnov
7a4cf67117
hevc: cleaning up SSE2 and SSSE3 deblocking filters
...
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit b432041d7d1eca38831590f13b4e5baffff8186f
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 16:00:48 +02:00
Michael Niedermayer
d986c414de
Merge commit '81b9bf319226fe03436c80aaa8a2c91767cab7ce'
...
* commit '81b9bf319226fe03436c80aaa8a2c91767cab7ce':
dct-test: Move arch-specific bits into arch-specific subdirectories
Conflicts:
libavcodec/dct-test.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-21 13:33:51 +02:00
Diego Biurrun
81b9bf3192
dct-test: Move arch-specific bits into arch-specific subdirectories
2014-07-21 01:10:11 -07:00
Michael Niedermayer
776647360d
Merge commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0'
...
* commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0':
simple_idct: Move x86-specific declarations to a header in the x86 directory
Conflicts:
libavcodec/x86/simple_idct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-19 13:56:29 +02:00
Michael Niedermayer
6da96a9fc9
Merge commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314'
...
* commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314':
fdct: Move x86-specific declarations to a header in the x86 directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-19 13:45:59 +02:00
Diego Biurrun
5dcc201505
simple_idct: Move x86-specific declarations to a header in the x86 directory
2014-07-19 02:33:36 -07:00
Diego Biurrun
85cabb8d00
fdct: Move x86-specific declarations to a header in the x86 directory
2014-07-19 02:25:59 -07:00
Michael Niedermayer
097bf834ba
Merge commit '9e0b29911f1f167381a7dbdfca68bf417b8c767b'
...
* commit '9e0b29911f1f167381a7dbdfca68bf417b8c767b':
x86: dnxhdenc: Eliminate some unnecessary ifdefs
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 22:33:24 +02:00
Michael Niedermayer
521f569734
Merge commit '8b0dd4942aac320d1ca3c40fa7ea1be342c71273'
...
* commit '8b0dd4942aac320d1ca3c40fa7ea1be342c71273':
idctdsp: prettyprinting cosmetics
Conflicts:
libavcodec/idctdsp.c
libavcodec/ppc/idctdsp.c
libavcodec/x86/idctdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 22:16:04 +02:00
Michael Niedermayer
42d326353c
Merge commit 'b4987f72197e0c62cf2633bf835a9c32d2a445ae'
...
* commit 'b4987f72197e0c62cf2633bf835a9c32d2a445ae':
idct: Convert IDCT permutation #defines to an enum
Conflicts:
libavcodec/idctdsp.c
libavcodec/x86/cavsdsp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 22:01:17 +02:00
Diego Biurrun
9e0b29911f
x86: dnxhdenc: Eliminate some unnecessary ifdefs
2014-07-18 09:58:17 -07:00
Diego Biurrun
8b0dd4942a
idctdsp: prettyprinting cosmetics
2014-07-18 07:51:03 -07:00
Diego Biurrun
b4987f7219
idct: Convert IDCT permutation #defines to an enum
...
Also rename the enum values to be consistent with other DCT permutations.
2014-07-18 07:51:03 -07:00
Michael Niedermayer
3a2d1465c8
Merge commit '2d60444331fca1910510038dd3817bea885c2367'
...
* commit '2d60444331fca1910510038dd3817bea885c2367':
dsputil: Split motion estimation compare bits off into their own context
Conflicts:
configure
libavcodec/Makefile
libavcodec/arm/Makefile
libavcodec/dvenc.c
libavcodec/error_resilience.c
libavcodec/h264.h
libavcodec/h264_slice.c
libavcodec/me_cmp.c
libavcodec/me_cmp.h
libavcodec/motion_est.c
libavcodec/motion_est_template.c
libavcodec/mpeg4videoenc.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo_enc.c
libavcodec/x86/Makefile
libavcodec/x86/me_cmp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-17 23:27:40 +02:00
Michael Niedermayer
d6676a1605
Merge commit 'c23ce454b3e33634a188d6facfd2b7182af5af93'
...
* commit 'c23ce454b3e33634a188d6facfd2b7182af5af93':
x86: dsputil: Coalesce all init files
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_x86.h
libavcodec/x86/motion_est.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-17 22:07:52 +02:00
Diego Biurrun
2d60444331
dsputil: Split motion estimation compare bits off into their own context
2014-07-17 09:07:10 -07:00
Diego Biurrun
c23ce454b3
x86: dsputil: Coalesce all init files
...
This makes the init files match the structure of the dsputil split.
2014-07-17 03:32:56 -07:00
Michael Niedermayer
cc3e7a4c3d
Merge commit 'acf91215c74a91eb3b86af01dcb1d3c78d0e2310'
...
* commit 'acf91215c74a91eb3b86af01dcb1d3c78d0e2310':
x86: dsputil: Avoid pointless CONFIG_ENCODERS indirection
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-13 21:51:20 +02:00
Diego Biurrun
acf91215c7
x86: dsputil: Avoid pointless CONFIG_ENCODERS indirection
...
The remaining dsputil bits are encoding-specific anyway.
2014-07-13 07:01:05 -07:00
James Almer
276bef5340
x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10}_sse2
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-13 13:48:31 +02:00
James Almer
123649dd19
x86/dsputilenc: remove some empty if statements
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-12 15:04:58 +02:00
Michael Niedermayer
b8cdf04726
Merge commit '1173320249745eab01c901a39054fc0fced33c87'
...
* commit '1173320249745eab01c901a39054fc0fced33c87':
dsputil: Drop unused bit_depth parameter from all init functions
Conflicts:
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/ppc/dsputil_ppc.c
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-11 20:29:40 +02:00
Diego Biurrun
1173320249
dsputil: Drop unused bit_depth parameter from all init functions
2014-07-11 06:38:26 -07:00
Michael Niedermayer
2d5e9451de
Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e'
...
* commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e':
dsputil: Split off pixel block routines into their own context
Conflicts:
configure
libavcodec/dsputil.c
libavcodec/mpegvideo_enc.c
libavcodec/pixblockdsp_template.c
libavcodec/x86/dsputilenc.asm
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-10 01:22:14 +02:00
Diego Biurrun
f46bb608d9
dsputil: Split off pixel block routines into their own context
2014-07-09 08:05:26 -07:00
Michael Niedermayer
14e2406de7
Merge commit 'a9aee08d900f686e966c64afec5d88a7d9d130a3'
...
* commit 'a9aee08d900f686e966c64afec5d88a7d9d130a3':
dsputil: Split off FDCT bits into their own context
Conflicts:
configure
libavcodec/Makefile
libavcodec/asvenc.c
libavcodec/dnxhdenc.c
libavcodec/dsputil.c
libavcodec/mpegvideo.h
libavcodec/mpegvideo_enc.c
libavcodec/x86/Makefile
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-08 03:19:06 +02:00
Diego Biurrun
a9aee08d90
dsputil: Split off FDCT bits into their own context
2014-07-07 12:28:45 -07:00
Michael Niedermayer
3790801f9c
Merge commit '3c650efb81aaa3b395ba4606ee68a47ee4efb57b'
...
* commit '3c650efb81aaa3b395ba4606ee68a47ee4efb57b':
dsputil: Move draw_edges() to mpegvideoencdsp
Conflicts:
libavcodec/mpegvideo_enc.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
libavcodec/x86/dsputil_x86.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 16:17:27 +02:00
Michael Niedermayer
020865f557
Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d'
...
* commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d':
dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc
Conflicts:
libavcodec/dsputil.c
libavcodec/mpegvideo_enc.c
libavcodec/x86/dsputilenc.asm
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 15:36:58 +02:00
Michael Niedermayer
462c6cdb8e
Merge commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94'
...
* commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94':
dsputil: Split off *_8x8basis to a separate context
Conflicts:
libavcodec/dsputil.c
libavcodec/mpegvideo_enc.c
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-07 15:08:55 +02:00
Diego Biurrun
3c650efb81
dsputil: Move draw_edges() to mpegvideoencdsp
2014-07-06 14:48:50 -07:00
Diego Biurrun
c166148409
dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc
2014-07-06 14:26:53 -07:00
Diego Biurrun
8d686ca59d
dsputil: Split off *_8x8basis to a separate context
2014-07-06 13:09:24 -07:00
James Almer
195f7bd23d
x86/svq1enc: use unaligned mov on SSE2
...
Might fix fate failures on some systems
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-06 20:27:57 +02:00
James Almer
dad31083ae
x86/svq1enc: port ssd_int8_vs_int16 to yasm
...
Also add an SSE2 version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 21:43:40 +02:00
Michael Niedermayer
19b79c1429
Merge commit 'b0de1c766329dd8c9960ad1722e2f653160abc1b'
...
* commit 'b0de1c766329dd8c9960ad1722e2f653160abc1b':
x86: build: Only compile FDCT code if MMX is enabled
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 20:12:31 +02:00
Michael Niedermayer
5036c8b17b
Merge commit '12f129e545e5a5844b6ad7f3eb6a438015cad8bc'
...
* commit '12f129e545e5a5844b6ad7f3eb6a438015cad8bc':
x86: Unconditionally compile blockdsp and svq1enc init files
Conflicts:
libavcodec/x86/Makefile
blockdsp_mmx is renamed to blockdsp_init as we already have a blockdsp file
and _init is how all other such files are called
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 19:50:05 +02:00
Michael Niedermayer
6bef3e55bd
Merge commit '009331303a6462d07cbe94aef9c446f1a1695519'
...
* commit '009331303a6462d07cbe94aef9c446f1a1695519':
x86: huffyuvdsp: Move inline assembly to init file
Conflicts:
libavcodec/x86/Makefile
libavcodec/x86/huffyuvdsp.h
libavcodec/x86/huffyuvdsp_init.c
libavcodec/x86/huffyuvdsp_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-05 19:11:26 +02:00
Diego Biurrun
b0de1c7663
x86: build: Only compile FDCT code if MMX is enabled
...
All other files containing purely inline assembly are treated the same way.
2014-07-05 04:18:34 -07:00
Diego Biurrun
12f129e545
x86: Unconditionally compile blockdsp and svq1enc init files
...
This avoids a link failure with MMX disabled as the init functions
are referenced unconditionally.
2014-07-05 04:18:34 -07:00
Diego Biurrun
009331303a
x86: huffyuvdsp: Move inline assembly to init file
...
This avoids a link failure with MMX disabled as now code and
initialization are compiled under the same condition.
2014-07-05 04:18:34 -07:00
Michael Niedermayer
5c65aed7fd
Merge commit '391ecc961ced2bde7aecb3053ac35191f838fae8'
...
* commit '391ecc961ced2bde7aecb3053ac35191f838fae8':
x86: mpegvideoenc: Change SIMD optimization name suffixes to lowercase
Conflicts:
libavcodec/x86/mpegvideoenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 01:17:39 +02:00
Diego Biurrun
391ecc961c
x86: mpegvideoenc: Change SIMD optimization name suffixes to lowercase
2014-07-03 13:41:41 -07:00
James Almer
a441a2437b
x86: rename dsputil.asm to idctdsp.asm
...
Its only function is no longer part of dsputil.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:08:04 +02:00
Michael Niedermayer
8d0c7031a8
Merge commit '79793f833784121d574454af4871866576c0749d'
...
* commit '79793f833784121d574454af4871866576c0749d':
Update Fiona's name in copyright statements.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:43:40 +02:00
Michael Niedermayer
581b5f0b9b
Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e'
...
* commit 'e3fcb14347466095839c2a3c47ebecff02da891e':
dsputil: Split off IDCT bits into their own context
Conflicts:
configure
libavcodec/aic.c
libavcodec/arm/Makefile
libavcodec/arm/dsputil_init_arm.c
libavcodec/arm/dsputil_init_armv6.c
libavcodec/asvdec.c
libavcodec/dnxhdenc.c
libavcodec/dsputil.c
libavcodec/dvdec.c
libavcodec/dxva2_mpeg2.c
libavcodec/intrax8.c
libavcodec/mdec.c
libavcodec/mjpegdec.c
libavcodec/mjpegenc_common.h
libavcodec/mpegvideo.c
libavcodec/ppc/dsputil_altivec.h
libavcodec/ppc/dsputil_ppc.c
libavcodec/ppc/idctdsp.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
libavcodec/x86/dsputil_x86.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:22:11 +02:00
Diego Biurrun
79793f8337
Update Fiona's name in copyright statements.
2014-07-01 03:26:51 -07:00
Diego Biurrun
e3fcb14347
dsputil: Split off IDCT bits into their own context
2014-06-30 07:58:46 -07:00
Michael Niedermayer
5bca5f87d1
Revert "x86/videodsp: add emulated_edge_mc_mmxext"
...
The commit causes minor out-of-array reads and was mainly intended for
future optimizations, which turned out not to be measurably faster.
By itself it was just 1 CPU cycle faster.
Approved-by: jamrial
This reverts commit 057d2704e7.
2014-06-28 05:39:07 +02:00
Michael Niedermayer
09a7a4704e
Merge commit 'd2869aea0494d3a20d53d5034cd41dbb488eb133'
...
* commit 'd2869aea0494d3a20d53d5034cd41dbb488eb133':
dsputil: Move MMX/SSE2-optimized IDCT bits to the x86 subdirectory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-27 03:05:33 +02:00
Diego Biurrun
d2869aea04
dsputil: Move MMX/SSE2-optimized IDCT bits to the x86 subdirectory
2014-06-26 16:15:07 -07:00
James Almer
057d2704e7
x86/videodsp: add emulated_edge_mc_mmxext
...
This also changes hfix8_mmx and above to use mmx regs instead of
gprs, and makes emulated_edge_mc_sse and emulated_edge_mc_sse2 use
mmxext hfix and hvar functions instead of mmx where possible.
This is mostly in preparation for an ssse3 version.
Signed-off-by: James Almer <jamrial@gmail.com>
The code is approximately 1 CPU cycle faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-26 17:58:57 +02:00
Michael Niedermayer
11ba0c8207
Merge commit '5ab03e41e553452118113d0c224fa32b325e45e5'
...
* commit '5ab03e41e553452118113d0c224fa32b325e45e5':
x86: h264dsp: Fix link failure with optimizations disabled
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-26 02:58:59 +02:00
Diego Biurrun
5ab03e41e5
x86: h264dsp: Fix link failure with optimizations disabled
...
With optimizations disabled, compilers have trouble doing dead code
elimination on 'if (foo && 0)' expressions, while 'if (0 && foo)'
still works, so use the latter to avoid problems.
Bug-Id: 707
2014-06-25 15:24:51 -07:00
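The operand-order issue in the commit above can be sketched as follows. `HAVE_EXAMPLE_SIMD`, `example_simd_path`, `select_fragile` and `select_robust` are hypothetical names, not taken from h264dsp. In the real scenario the optimized function would not be compiled at all when the feature is disabled, so any call the compiler fails to eliminate becomes an unresolved symbol at link time. Here the function is defined so that the sketch links with any compiler.

```c
/* Hypothetical build-time switch: 0 means the optimized code is not built. */
#define HAVE_EXAMPLE_SIMD 0

/* Stand-in for an optimized routine. In a real build with the feature
 * disabled this symbol would not exist, so a surviving reference to it
 * would fail at link time. It is defined here only to keep the sketch
 * self-contained. */
static int example_simd_path(void)
{
    return 1;
}

/* Fragile ordering: the constant comes last. Per the commit message,
 * some compilers at -O0 do not remove the call in this form. */
static int select_fragile(int cpu_has_feature)
{
    if (cpu_has_feature && HAVE_EXAMPLE_SIMD)
        return example_simd_path();
    return 0;
}

/* Robust ordering: the constant comes first, so the whole condition is
 * trivially false before the runtime operand is even considered. */
static int select_robust(int cpu_has_feature)
{
    if (HAVE_EXAMPLE_SIMD && cpu_has_feature)
        return example_simd_path();
    return 0;
}
```

Both functions behave identically at run time; the difference only matters for whether the reference to the optimized routine survives into the object file.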
Michael Niedermayer
1ace0ca60f
avcodec/x86/hevc_idct: fix function name in comment
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-25 18:22:25 +02:00
plepere
9ba6b17add
avcodec/x86/hevc_idct: fix number of sse registers
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-25 14:59:23 +02:00
plepere
942e22c651
avcodec/x86/hevc: add avx2 dc idct
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-25 14:49:44 +02:00
Michael Niedermayer
eab2509f8c
avcodec/x86/h264_qpel_10bit: locally define pb_0
...
somehow old llvm-gcc manages to ignore the alignment from ff_pb_0 causing a crash on freebsd
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-24 02:13:43 +02:00
James Almer
476bd3c7e4
x86/dsputil: move put_signed_pixels_clamped out of bswapdsp.asm
...
It's still a dsputil function
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 22:11:18 +02:00
Michael Niedermayer
d7463c6813
Merge commit 'fab9df63a3156ffe1f9490aafaea41e03ef60ddf'
...
* commit 'fab9df63a3156ffe1f9490aafaea41e03ef60ddf':
dsputil: Split off global motion compensation bits into a separate context
Conflicts:
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/ppc/dsputil_altivec.h
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
libavcodec/x86/dsputil_x86.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 21:10:10 +02:00
Diego Biurrun
fab9df63a3
dsputil: Split off global motion compensation bits into a separate context
2014-06-23 09:58:17 -07:00
Michael Niedermayer
35bb74900b
Merge commit 'c67b449bebbe0b35c73b203683e77a0a649bc765'
...
* commit 'c67b449bebbe0b35c73b203683e77a0a649bc765':
dsputil: Split bswap*_buf() off into a separate context
Conflicts:
configure
libavcodec/4xm.c
libavcodec/ac3dec.c
libavcodec/ac3dec.h
libavcodec/apedec.c
libavcodec/eamad.c
libavcodec/flacenc.c
libavcodec/fraps.c
libavcodec/huffyuv.c
libavcodec/huffyuvdec.c
libavcodec/motionpixels.c
libavcodec/truemotion2.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 13:31:26 +02:00
Diego Biurrun
c67b449beb
dsputil: Split bswap*_buf() off into a separate context
2014-06-22 18:22:31 -07:00
James Almer
c172683bf4
x86/dsputil: remove redundant global motion compensation code
...
The SSE version has been no different than the mmx one since commit a41bf09d
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 02:15:06 +02:00
James Almer
6ec3dc97fc
x86/audiodsp: move asm code out of dsputil
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-22 19:53:09 +02:00
Michael Niedermayer
99497b4683
Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2'
...
* commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2':
dsputil: Split audio operations off into a separate context
Conflicts:
configure
libavcodec/takdec.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
libavcodec/x86/dsputil_x86.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-22 17:58:28 +02:00
Diego Biurrun
9a9e2f1c8a
dsputil: Split audio operations off into a separate context
2014-06-22 06:20:15 -07:00
Michael Niedermayer
33f83a2157
avcodec/x86/rv40dsp_init: fix () in macros
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-20 21:36:43 +02:00
James Almer
a5ce608fc7
x86/blockdsp: restore author attribution
...
See commits
649c00c96d
5fecfb7d58
73b02e2460
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 18:31:44 +02:00
Michael Niedermayer
08c5859f17
avcodec: add simpleauto idct
...
This will pick the "best" simple idct compatible idct
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 14:28:01 +02:00
James Almer
454c019cb5
x86/hevc_idct: fix movd parameter size in DC_ADD_INIT
...
Fixes compilation with NASM x86_64
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 13:18:13 +02:00
James Almer
fe782233aa
x86/blockdsp: move asm code out of dsputil
...
Also replace INLINE_<opt> with EXTERNAL_<opt> that were wrongly
changed by commit 2b05db4f81
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 13:09:03 +02:00
Michael Niedermayer
042a82ca37
avcodec/x86/lossless_videodsp: Fix size of values read for left/left_top
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 05:53:41 +02:00
Michael Niedermayer
2b05db4f81
Merge commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9'
...
* commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9':
dsputil: Split clear_block*/fill_block* off into a separate context
Conflicts:
configure
libavcodec/asvdec.c
libavcodec/dnxhddec.c
libavcodec/dnxhdenc.c
libavcodec/dsputil.h
libavcodec/eamad.c
libavcodec/intrax8.c
libavcodec/mjpegdec.c
libavcodec/ppc/dsputil_ppc.c
libavcodec/vc1dec.c
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 04:54:38 +02:00
Diego Biurrun
e74433a8e6
dsputil: Split clear_block*/fill_block* off into a separate context
2014-06-18 14:07:23 -07:00
plepere
92cccb7bcd
avcodec/hevc: new idct + asm
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-17 13:23:36 +02:00
Christophe Gisquet
9107612818
x86util: add and use RSHIFT/LSHIFT macros
...
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:19:27 +02:00
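The idea behind the byte-count shift macros in the commit above can be modelled in C. This is only a sketch under the assumption that the 64-bit MMX shift (psrlq) counts in bits while the 128-bit SSE2 whole-register shift (psrldq) counts in bytes; the function names are hypothetical and the real macros are assembler macros in x86util, not C.

```c
#include <stdint.h>

/* Model of a 64-bit shift whose count is given in bits (psrlq-like). */
static uint64_t shift_right_bits(uint64_t value, unsigned bits)
{
    return bits >= 64 ? 0 : value >> bits;
}

/* Hypothetical wrapper: the caller always passes a byte count, and the
 * conversion to bits is hidden here, as a unifying macro would do. */
static uint64_t rshift_bytes64(uint64_t value, unsigned bytes)
{
    return shift_right_bits(value, 8 * bytes);
}

/* Model of a 128-bit register stored as 16 bytes, least significant
 * byte first. The shift count is natively in bytes (psrldq-like), so
 * no conversion is needed. */
static void rshift_bytes128(uint8_t reg[16], unsigned bytes)
{
    for (unsigned i = 0; i < 16; i++)
        reg[i] = (i + bytes < 16) ? reg[i + bytes] : 0;
}
```

With both wrappers taking a byte count, the calling code can stay the same regardless of register width.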
Ronald S. Bultje
385a3420d1
vp9/x86: fix overwrite in ipred_vl_4x4_ssse3.
...
Fixes trac ticket 3717.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-12 04:11:20 +02:00
Christophe Gisquet
508e7a5c16
x86: huffyuv: fix {add,diff}_int16
...
They used an extra, undeclared register. Fixes a crash in
fate-vsynth3-ffvhuff444p16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-12 00:26:19 +02:00
Michael Niedermayer
1a2ff62859
Merge commit '570d4b21863b6254d6bbca9c528bede471bb4478'
...
* commit '570d4b21863b6254d6bbca9c528bede471bb4478':
x86: h264: Don't keep data in the redzone across function calls on 64 bit unix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-10 18:35:49 +02:00
Martin Storsjö
570d4b2186
x86: h264: Don't keep data in the redzone across function calls on 64 bit unix
...
We know that the called function (ff_chroma_inter_body_mmxext)
doesn't touch the redzone, so the data kept there stays intact; this
change therefore doesn't fix any bug per se.
However, valgrind's memcheck tool intentionally assumes that the
redzone is clobbered on every function call and function return
(see a long comment in valgrind/memcheck/mc_main.c). This change avoids
false positives in that tool, at the cost of an extra stack pointer
adjustment.
The other alternative would be a valgrind suppression for this issue,
but that's an extra burden for everybody that wants to run libavcodec
within valgrind.
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-06-10 16:31:48 +03:00
Michael Niedermayer
06f576c4ab
avcodec/x86/dct_init: fix build failure with clang && disable-optimizations
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-09 19:32:41 +02:00
James Almer
6d408495b5
x86/dct32: don't build ff_dct32_float_sse on x86_64
...
There's an SSE2 version already, and technically the SSE version
on x86_64 was wrong (using pshufd and pshuflw, SSE2 instructions).
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-09 00:51:43 +02:00
James Almer
fc8db12a73
x86/vp9: initial AVX2 intra_pred
...
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 02:37:20 +02:00
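The decicycle figures in the commit above are easier to compare as ratios. The sketch below only restates the numbers from the commit message; the struct, array and function names are made up for illustration. Note that the baseline is the ssse3 version for the dc predictors and the avx version for tm, v and h.

```c
#include <stdio.h>

struct bench {
    const char *name;   /* 32x32 predictor, as named in the commit */
    int before;         /* decicycles, ssse3 or avx baseline */
    int after;          /* decicycles, avx2 version */
};

/* Figures copied from the commit message above. */
static const struct bench vp9_ipred_32x32[] = {
    { "dc",      1219,  439 },
    { "dc_top",  3570, 2494 },
    { "dc_left", 1419,  717 },
    { "tm",      2737, 2088 },
    { "v",       3090, 2226 },
    { "h",       1565,  922 },
};

/* Speedup factor: how many times faster the new version is. */
static double speedup(int before, int after)
{
    return (double)before / (double)after;
}

/* Print one line per predictor with its speedup factor. */
static void print_speedups(void)
{
    size_t n = sizeof(vp9_ipred_32x32) / sizeof(vp9_ipred_32x32[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-8s %.2fx\n", vp9_ipred_32x32[i].name,
               speedup(vp9_ipred_32x32[i].before, vp9_ipred_32x32[i].after));
}
```

For example, the dc predictor drops from 1219 to 439 decicycles, which is a little under 2.8 times faster.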
James Almer
ec98f80af4
x86/dsputil: move some mmx init code inside dsputil_init_mmx()
...
This reduces differences with the fork
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-06 05:26:04 +02:00
Christophe Gisquet
ccff45a0d3
apedsp: move to llauddsp
...
APE is not the sole codec using scalarproduct_and_madd_int16.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-05 20:31:59 +02:00
Michael Niedermayer
d5c9d055ea
avcodec/x86/dsputilenc_mmx: fix build without yasm
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-04 05:39:03 +02:00
James Almer
625ffa1457
x86/motion_est: sad_{x, y}2_mmxext functions are bitexact
...
Only the xy2 functions aren't.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-04 00:48:35 +02:00
Timothy Gu
108dec3055
x86: dsputilenc: convert hf_noise*_mmx to yasm
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Several bugfixes by: Christophe Gisquet <christophe.gisquet@gmail.com>
See: [FFmpeg-devel] [WIP] [PATCH 4/4] x86: dsputilenc: convert hf_noise*_mmx to yasm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-03 23:59:43 +02:00
Christophe Gisquet
dcd2a6ca36
x86: hevc_mc: remove unneeded shift
...
The immediate value may be 0.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 23:34:33 +02:00
Christophe Gisquet
09fc28aed1
x86: hevcdsp_init: fix macro usage
...
The macro was not using the parameter but unconditionally using sse4.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 23:20:07 +02:00
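The kind of macro bug described in the commit above can be reproduced in a few lines. All names below are hypothetical stand-ins, not the actual hevcdsp_init macros or functions; the sketch only shows a macro that ignores its parameter versus one that pastes it into the function name.

```c
/* Two stand-in implementations distinguished by their return value. */
static int example_put_sse4(void) { return 4; }
static int example_put_avx(void)  { return 8; }

/* Broken: the parameter is accepted but never used, so every caller
 * silently gets the sse4 function. */
#define PICK_BROKEN(opt) example_put_sse4

/* Fixed: the parameter is pasted into the function name. */
#define PICK_FIXED(opt) example_put_##opt

/* Both helpers ask for the avx variant; only the fixed macro honours it. */
static int call_broken_avx(void) { return PICK_BROKEN(avx)(); }
static int call_fixed_avx(void)  { return PICK_FIXED(avx)(); }
```

Because the broken form still compiles and runs, this class of error tends to show up only as missing speedups or as the wrong instruction set being used.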
James Almer
e1bd40fe6b
x86/motion_est: enable sad16_sse2 on k10 CPUs
...
The check is meant for k8 CPUs. sad16_sse2 is ~20% faster than sad16_mmxext on k10.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 02:10:32 +02:00