James Almer
3178931a14
x86/hevc_sao: move 10/12bit functions into a separate file
...
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-30 02:59:55 -03:00
Henrik Gramner
f0b7882ceb
x86inc: Drop SECTION_TEXT macro
...
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Michael Niedermayer
29d147c94d
Merge commit '059a934806d61f7af9ab3fd9f74994b838ea5eba'
...
* commit '059a934806d61f7af9ab3fd9f74994b838ea5eba':
lavc: Consistently prefix input buffer defines
Conflicts:
doc/examples/decoding_encoding.c
libavcodec/4xm.c
libavcodec/aac_adtstoasc_bsf.c
libavcodec/aacdec.c
libavcodec/aacenc.c
libavcodec/ac3dec.h
libavcodec/asvenc.c
libavcodec/avcodec.h
libavcodec/avpacket.c
libavcodec/dvdec.c
libavcodec/ffv1enc.c
libavcodec/g2meet.c
libavcodec/gif.c
libavcodec/h264.c
libavcodec/h264_mp4toannexb_bsf.c
libavcodec/huffyuvdec.c
libavcodec/huffyuvenc.c
libavcodec/jpeglsenc.c
libavcodec/libxvid.c
libavcodec/mdec.c
libavcodec/motionpixels.c
libavcodec/mpeg4videodec.c
libavcodec/mpegvideo.c
libavcodec/noise_bsf.c
libavcodec/nuv.c
libavcodec/nvenc.c
libavcodec/options.c
libavcodec/parser.c
libavcodec/pngenc.c
libavcodec/proresenc_kostya.c
libavcodec/qsvdec.c
libavcodec/svq1enc.c
libavcodec/tiffenc.c
libavcodec/truemotion2.c
libavcodec/utils.c
libavcodec/utvideoenc.c
libavcodec/vc1dec.c
libavcodec/wmalosslessdec.c
libavformat/adxdec.c
libavformat/aiffdec.c
libavformat/apc.c
libavformat/apetag.c
libavformat/avidec.c
libavformat/bink.c
libavformat/cafdec.c
libavformat/flvdec.c
libavformat/id3v2.c
libavformat/isom.c
libavformat/matroskadec.c
libavformat/mov.c
libavformat/mpc.c
libavformat/mpc8.c
libavformat/mpegts.c
libavformat/mvi.c
libavformat/mxfdec.c
libavformat/mxg.c
libavformat/nutdec.c
libavformat/oggdec.c
libavformat/oggparsecelt.c
libavformat/oggparseflac.c
libavformat/oggparseopus.c
libavformat/oggparsespeex.c
libavformat/omadec.c
libavformat/rawdec.c
libavformat/riffdec.c
libavformat/rl2.c
libavformat/rmdec.c
libavformat/rtpdec_latm.c
libavformat/rtpdec_mpeg4.c
libavformat/rtpdec_qdm2.c
libavformat/rtpdec_svq3.c
libavformat/sierravmd.c
libavformat/smacker.c
libavformat/smush.c
libavformat/spdifenc.c
libavformat/takdec.c
libavformat/tta.c
libavformat/utils.c
libavformat/vqf.c
libavformat/westwood_vqa.c
libavformat/xmv.c
libavformat/xwma.c
libavformat/yop.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-27 23:15:19 +02:00
James Almer
844bef578e
avcodec/x86: add missing colon to labels
...
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:50:14 -03:00
James Almer
5c8f747085
x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8
...
Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-01 20:02:43 -03:00
James Almer
14b44c1614
x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:30 -03:00
James Almer
06fe6dfe12
x86/hevc_sao: make sao_band_filter work on x86_32
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-09 20:41:21 -03:00
Christophe Gisquet
9dc45d1f42
x86: lavc: share more constants
...
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
James Almer
aea29a891f
x86/hevc_sao: fix loading of RIP address
...
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.
Suggested by Hendrik Leppkes and Christophe Gisquet.
Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 15:06:15 -03:00
James Almer
15574c505b
x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
...
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:33 -03:00
James Almer
042c1159fc
x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}
...
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:27 -03:00
James Almer
aa945dc112
x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-02 00:01:35 -03:00
James Almer
71e2cb4706
x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 21:45:52 -03:00
Christophe Gisquet
bff7feb328
x86: hevc/sao: aligned source buffers
...
Usefull for at least band filter, for which:
- Band filter call only:
32 64
Before: 16556 54015
After: 16497 52355
- Whole case:
32 64
Before: 37031 103008
After: 32045 93952
2015-02-01 20:22:54 -03:00
James Almer
fa3eccb4f9
x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
...
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00