Christophe Gisquet
dad7f15567
hevcdsp: remove more instances of compile-time-fixed parameters
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 15:22:42 +02:00
Christophe Gisquet
d4f44b66d3
hevcdsp: remove compilation-time-fixed parameter
...
The dststride parameter is always MAX_PB_SIZE.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 14:57:37 +02:00
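A minimal sketch of what dropping a compile-time-fixed stride looks like, assuming MAX_PB_SIZE is 64 (consistent with the "2nd source stride is 64" entry that follows). The function names, the argument order, and the `<< 6` scaling are hypothetical and only illustrate the idea; they are not the actual hevcdsp prototypes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 64 /* assumed value, see lead-in */

/* Before: the caller passes dststride, although it is always MAX_PB_SIZE. */
void put_pixels_before(int16_t *dst, ptrdiff_t dststride,
                       const uint8_t *src, ptrdiff_t srcstride,
                       int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << 6); /* illustrative scaling only */
        dst += dststride;
        src += srcstride;
    }
}

/* After: the parameter is gone and the constant is used directly,
 * which frees an argument (and, in asm, a register). */
void put_pixels_after(int16_t *dst,
                      const uint8_t *src, ptrdiff_t srcstride,
                      int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << 6); /* illustrative scaling only */
        dst += MAX_PB_SIZE;
        src += srcstride;
    }
}
```

Both variants write identical output whenever the caller passes MAX_PB_SIZE as dststride, which is the premise of the commit.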
Christophe Gisquet
fb1a98ec5b
x86: hevc_mc: assume 2nd source stride is 64
...
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 13:21:37 +02:00
James Almer
54ca4dd43b
x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8
...
* Reduced xmm register count to 7 (As such they are now enabled for x86_32).
* Removed four movdqa (affects the sse2 version only).
* pxor is now used to clear m0 only once.
~5% faster.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-21 15:01:33 -03:00
James Almer
76a99d467f
x86/hevc_res_add: add ff_hevc_transform_add{8,16,32}_8_avx
...
~15% faster than sse2
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-20 16:54:52 -03:00
James Almer
9f498f4e6f
x86/hevc_res_add: fix register count in hevc_transform_add{16,32}_10_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2014-08-19 21:34:52 -03:00
Pierre Edouard Lepere
a6af4bf64d
x86: hevc: adding transform_add
...
Reviewed-by: James Almer <jamrial@gmail.com>
Approved-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-20 01:28:56 +02:00
Michael Niedermayer
3bb2297351
Merge commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6'
...
* commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6':
build: Add explanatory comments to (optimization) blocks in the Makefiles
Conflicts:
libavcodec/ppc/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-15 20:25:12 +02:00
Michael Niedermayer
c1df467d73
Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b'
...
* commit '835f798c7d20bca89eb4f3593846251ad0d84e4b':
mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes
Conflicts:
libavcodec/h261dec.c
libavcodec/intrax8.c
libavcodec/mjpegenc.c
libavcodec/mpeg12dec.c
libavcodec/mpeg12enc.c
libavcodec/mpeg4videoenc.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo.h
libavcodec/mpegvideo_enc.c
libavcodec/rv10.c
libavcodec/x86/mpegvideoenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-15 20:11:56 +02:00
Diego Biurrun
efd26bedec
build: Add explanatory comments to (optimization) blocks in the Makefiles
2014-08-15 02:55:21 -07:00
Diego Biurrun
835f798c7d
mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes
2014-08-15 01:26:33 -07:00
James Darnley
54a51d3840
lavc/flacenc: partially unroll loop in flac_enc_lpc_16
...
It now does 12 samples per iteration, up from 4.
From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall.
Runtime is reduced by a further 2 to 18%. Overall runtime reduced by
4 to 50%.
Same conditions as before apply.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-13 03:09:26 +02:00
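The idea of partial unrolling can be sketched in scalar C. This is an assumption-laden illustration, not the SSE4 code: the function names are hypothetical, the unroll factor is 4 for brevity (the commit goes from 4 to 12 samples per iteration in SIMD), and the residual formula is a simplified model of an LPC encoder. It assumes len >= order and an arithmetic right shift for negative values.

```c
#include <stdint.h>

/* Reference loop: one residual per iteration. */
void lpc_residual_scalar(int32_t *res, const int32_t *smp, int len,
                         int order, const int32_t *coefs, int shift)
{
    for (int i = 0; i < order; i++)
        res[i] = smp[i];                 /* warm-up samples are copied */
    for (int i = order; i < len; i++) {
        int32_t p = 0;
        for (int j = 0; j < order; j++)
            p += coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (p >> shift);
    }
}

/* Partially unrolled: four residuals per iteration, each coefficient is
 * loaded once and reused for four neighbouring samples. */
void lpc_residual_unrolled(int32_t *res, const int32_t *smp, int len,
                           int order, const int32_t *coefs, int shift)
{
    int i;
    for (i = 0; i < order; i++)
        res[i] = smp[i];
    for (i = order; i + 4 <= len; i += 4) {
        int32_t p0 = 0, p1 = 0, p2 = 0, p3 = 0;
        for (int j = 0; j < order; j++) {
            int32_t c = coefs[j];
            p0 += c * smp[i - j - 1];
            p1 += c * smp[i - j];
            p2 += c * smp[i - j + 1];
            p3 += c * smp[i - j + 2];
        }
        res[i]     = smp[i]     - (p0 >> shift);
        res[i + 1] = smp[i + 1] - (p1 >> shift);
        res[i + 2] = smp[i + 2] - (p2 >> shift);
        res[i + 3] = smp[i + 3] - (p3 >> shift);
    }
    for (; i < len; i++) {               /* leftover samples */
        int32_t p = 0;
        for (int j = 0; j < order; j++)
            p += coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (p >> shift);
    }
}
```

Processing more samples per iteration amortizes loop overhead and coefficient loads, which is presumably where the reported speed-up comes from.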
James Darnley
0081a14e7d
lavc/flacenc: add sse4 version of the 16-bit lpc encoder
...
From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The
speed-up generally increases with compression_level.
This lpc encoder is not used with levels < 3 so it provides no speed-up
in these cases.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-13 01:14:47 +02:00
Ronald S. Bultje
45bed0ab30
vp9/x86: fix bug in intra_pred_hd_32x32.
...
Fixes mismatch in first keyframe in sample
ffvp9_fails_where_libvpx.succeeds.webm from ticket 3849. There's still
a second mismatch a few frames into the sample.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-12 13:11:21 +02:00
James Almer
c97870d1a1
x86/dca: remove unused header
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-12 12:46:53 +02:00
James Almer
e20ff251a6
x86/ttadsp: remove an unnecessary mova
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-12 12:29:05 +02:00
Michael Niedermayer
3841f2ae66
Merge commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36'
...
* commit 'd35b94fbabd8beb5d566c0b5d01688aff62c3b36':
avcodec: Rename xvidmmx IDCT to xvid
Conflicts:
doc/APIchanges
libavcodec/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-09 12:11:13 +02:00
Michael Niedermayer
0dcebb9f63
Merge commit '84d173d3de97c753234ab0c0b50551d51413d663'
...
* commit '84d173d3de97c753234ab0c0b50551d51413d663':
xvididct: Ensure that the scantable permutation is always set correctly
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-08 22:17:04 +02:00
Diego Biurrun
d35b94fbab
avcodec: Rename xvidmmx IDCT to xvid
...
The Xvid IDCT is not MMX-specific.
2014-08-08 11:13:30 -07:00
Diego Biurrun
84d173d3de
xvididct: Ensure that the scantable permutation is always set correctly
...
This fixes cases where the scantable permutation would get overwritten by
the general idctdsp initialization.
2014-08-08 11:13:29 -07:00
Christophe Gisquet
75837e9add
x86: sbrdsp/fft: reuse ps_neg constant
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:25:08 +02:00
Christophe Gisquet
51dd80e751
x86: diracdsp: reuse constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:25:02 +02:00
Christophe Gisquet
6622a6cff3
x86: dwt: better share constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:24:57 +02:00
Christophe Gisquet
71db2d08b1
x86: better share ff_pw_2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 19:24:49 +02:00
Christophe Gisquet
4e128ab0b1
x86: vpx/h264/hevc/mpeg2: share constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 18:36:31 +02:00
Michael Niedermayer
305f72aee7
avcodec: Change get_pixels() to ptrdiff_t linesize
...
Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 15:50:54 +02:00
Christophe Gisquet
6786848585
hevc_deblock: change tc type
...
The x86 asm expects int32_t so use that type.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 12:38:26 +02:00
James Almer
de417982e8
x86/vp9lpf: use fewer instructions in SPLATB_MIX
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-05 02:47:54 +02:00
Christophe Gisquet
e8c003edd2
x86: hevc_deblock: remove unnecessary masking
...
The unpacks/shuffles later on make it unnecessary.
Before:
1508 decicycles in h, 2096759 runs, 393 skips
2512 decicycles in v, 2095422 runs, 1730 skips
After:
1477 decicycles in h, 2096745 runs, 407 skips
2484 decicycles in v, 2095297 runs, 1855 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 17:46:04 +02:00
James Almer
b7863c972c
x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12}
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 14:47:15 +02:00
James Almer
b1a44e6bf5
x86/hevc_mc: remove an unnecessary pxor
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 14:35:08 +02:00
James Almer
d0f56ca071
x86/hevc_deblock: improve 8bit transpose store macros
...
Up to four fewer instructions, depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-03 04:24:15 +02:00
Michael Niedermayer
f54e01c24e
Merge commit 'a786c8259dafeca9744252230b5d78f67810770c'
...
* commit 'a786c8259dafeca9744252230b5d78f67810770c':
idct: Split off Xvid IDCT
Conflicts:
libavcodec/Makefile
libavcodec/mpeg4videodec.c
libavcodec/x86/Makefile
libavcodec/x86/idctdsp_init.c
This split is somewhat restructured leaving the xvid IDCT available
outside mpeg4 if manually selected.
The code also could not be merged unchanged as it conflicted with a
bugfix in FFmpeg.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-01 16:21:52 +02:00
Diego Biurrun
a786c8259d
idct: Split off Xvid IDCT
...
The Xvid IDCT is only required to decode some Xvid-encoded MPEG-4 files,
so there is no point in having it as an unconditional part of idctdsp.
2014-08-01 01:25:18 -07:00
James Almer
62baf5b853
x86/hevc_deblock: use existing x86util transpose macro in chroma_{10, 12}
...
Cosmetic change. No measurable difference in speed.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-31 22:56:21 +02:00
Christophe Gisquet
a507623bad
x86: hevc_mc: fix register count usage
...
A macro was using a fixed register, causing too many GPRs to be
declared as used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 22:50:50 +02:00
James Almer
73c4f63ba5
x86/hevc_deblock: add ff_hevc_[hv]_loop_filter_luma_{8, 10, 12}_avx
...
~5% faster than SSSE3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 14:04:59 +02:00
James Almer
88ba821f23
x86/hevc_deblock: improve luma functions register allocation
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 13:38:05 +02:00
James Almer
c74b08c5c6
x86/hevc_deblock: remove some unnecessary instructions
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 13:27:44 +02:00
James Almer
4f91bb0ff0
x86/hevc_deblock: use psignw instead of pmullw where possible
...
It's slightly faster.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 03:42:29 +02:00
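Why psignw can stand in for pmullw can be shown with a scalar model of one 16-bit lane. The sketch assumes the replaced multiplications had a multiplier of -1, 0, or 1, which the commit message does not state; the helper names are hypothetical. psignw keeps, zeroes, or negates each word of the first operand according to the sign of the second, and pmullw keeps the low 16 bits of the product.

```c
#include <stdint.h>

/* One lane of pmullw: low 16 bits of the signed product. */
uint16_t pmullw_lane(int16_t a, int16_t b)
{
    return (uint16_t)((int32_t)a * (int32_t)b);
}

/* One lane of psignw: negate a if b < 0, zero if b == 0, else keep a. */
uint16_t psignw_lane(int16_t a, int16_t b)
{
    if (b < 0)
        return (uint16_t)(-(int32_t)a);
    if (b == 0)
        return 0;
    return (uint16_t)a;
}

/* Returns 1 if both models agree for every multiplier in {-1, 0, 1}. */
int lanes_agree(int16_t a)
{
    for (int b = -1; b <= 1; b++)
        if (pmullw_lane(a, (int16_t)b) != psignw_lane(a, (int16_t)b))
            return 0;
    return 1;
}
```

The two only coincide for such sign-like multipliers; for a general multiplier such as -7 the results differ, which matches the "where possible" in the commit subject.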
Michael Niedermayer
a91c5ed008
Merge commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d'
...
* commit '4f8cf0dc4ef6110174056df7edd9dc2f2a988b6d':
x86: build: Restore ordering of OBJS lines
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 00:34:53 +02:00
Diego Biurrun
4f8cf0dc4e
x86: build: Restore ordering of OBJS lines
2014-07-28 13:19:04 -07:00
James Almer
664e9e4331
x86/hevc_deblock: load less data in hevc_h_loop_filter_luma_8
...
Reading 8 bytes is enough.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-28 21:55:22 +02:00
James Almer
f137876182
x86/hevc_idct: add a colon to labels
...
This fixes a warning spam when using NASM
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-28 21:43:32 +02:00
Christophe Gisquet
81943a10b5
x86: hevc_mc: load less data in epel filters
...
Before:
5679 decicycles in epel_bi, 2059976 runs, 37176 skips
3468 decicycles in epel_uni, 1040886 runs, 7690 skips
After:
5323 decicycles in epel_bi, 2059493 runs, 37659 skips
3262 decicycles in epel_uni, 1040871 runs, 7705 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 18:34:39 +02:00
Christophe Gisquet
36284ae981
x86: hevc_mc: replace one lea by add
...
Should have been in 036f11bdb5.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 17:42:56 +02:00
James Almer
bfb3b2b7a6
x86/hevc_idct: add 12bit idct_dc
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:30:56 +02:00
Michael Niedermayer
d4a9e89b27
avcodec/x86/hevcdsp_init: make license header consistent
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:28:44 +02:00
Michael Niedermayer
706f81a2c2
Merge commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d'
...
* commit '1a880b2fb8456ce68eefe5902bac95fea1e6a72d':
hevc: SSE2 and SSSE3 loop filters
Conflicts:
libavcodec/hevcdsp.c
libavcodec/hevcdsp.h
libavcodec/x86/Makefile
libavcodec/x86/hevc_deblock.asm
libavcodec/x86/hevcdsp_init.c
See: de7b89fd43 and several others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 00:20:48 +02:00
James Almer
1ace9573dc
x86/hevc_idct: replace old and unused idct functions
...
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:
idct8x8_dc
          SSE2    MMXEXT  C
cycles    22      26      57
idct16x16_dc
          AVX2    SSE2    C
cycles    27      32      249
idct32x32_dc
          AVX2    SSE2    C
cycles    62      126     1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 18:00:11 +02:00