Commit Graph

11 Commits

Author SHA1 Message Date
James Almer
5c8f747085 x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8
Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-01 20:02:43 -03:00
James Almer
14b44c1614 x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:30 -03:00
James Almer
06fe6dfe12 x86/hevc_sao: make sao_band_filter work on x86_32
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-09 20:41:21 -03:00
Christophe Gisquet
9dc45d1f42 x86: lavc: share more constants
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
James Almer
aea29a891f x86/hevc_sao: fix loading of RIP address
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.

Suggested by Hendrik Leppkes and Christophe Gisquet.

Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 15:06:15 -03:00
James Almer
15574c505b x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips

Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:33 -03:00
James Almer
042c1159fc x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips

Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:27 -03:00
James Almer
aa945dc112 x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-02 00:01:35 -03:00
James Almer
71e2cb4706 x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 21:45:52 -03:00
Christophe Gisquet
bff7feb328 x86: hevc/sao: aligned source buffers
Usefull for at least band filter, for which:
- Band filter call only:
           32      64
Before:  16556    54015
After:   16497    52355
- Whole case:
           32      64
Before:  37031   103008
After:   32045    93952
2015-02-01 20:22:54 -03:00
James Almer
fa3eccb4f9 x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips

width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00