Their intent was to make the DSP work with wmalossless pro.
The later was fixed to work with the DSP.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.
Suggested by Hendrik Leppkes and Christophe Gisquet.
Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The epel_hv functions were still relying on only epel_hv 8-wide
being the maximum width instanciated.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit 3b4ffba3af968ae702e3a44f6b5f53445efc7363.
Unbreaks the SSSE3 code on mingw32
Conflicts:
libavcodec/x86/lossless_audiodsp.asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is needed as the mmx code is used as fallback from the ssse3 code
Suggested-by: jamrial
Tested-by: wm4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Before
2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips
After
2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips
Signed-off-by: James Almer <jamrial@gmail.com>
x86inc can translate r*m into a register or stack on its own
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
These fix failures of --enable-xmm-clobber-test
It would be better to change the code to use fewer registers, but until
someone does the used register count must not be too small
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes artifacts in the last pixel of rows with some widths and pixel formats
Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>