
The inline loop was incrementing and using the value of %%i the wrong way.

Disassembly of ff_vector_clip_int32_sse2 before and after this patch:

    movdqa (%rdx),%xmm0     | movdqa (%rdx),%xmm0
    movdqa 0x10(%rdx),%xmm1 | movdqa 0x10(%rdx),%xmm1
    movdqa 0x20(%rdx),%xmm2 | movdqa 0x20(%rdx),%xmm2
    movdqa 0x30(%rdx),%xmm3 | movdqa 0x30(%rdx),%xmm3
    [...]                   |
    movdqa %xmm0,(%rcx)     | movdqa %xmm0,(%rcx)
    movdqa %xmm1,0x10(%rcx) | movdqa %xmm1,0x10(%rcx)
    movdqa %xmm2,0x20(%rcx) | movdqa %xmm2,0x20(%rcx)
    movdqa %xmm3,0x30(%rcx) | movdqa %xmm3,0x30(%rcx)
    movdqa (%rdx),%xmm0     | movdqa 0x40(%rdx),%xmm0
    movdqa 0x20(%rdx),%xmm1 | movdqa 0x50(%rdx),%xmm1
    movdqa 0x40(%rdx),%xmm2 | movdqa 0x60(%rdx),%xmm2
    movdqa 0x60(%rdx),%xmm3 | movdqa 0x70(%rdx),%xmm3
    [...]                   |
    movdqa %xmm0,(%rcx)     | movdqa %xmm0,0x40(%rcx)
    movdqa %xmm1,0x20(%rcx) | movdqa %xmm1,0x50(%rcx)
    movdqa %xmm2,0x40(%rcx) | movdqa %xmm2,0x60(%rcx)
    movdqa %xmm3,0x60(%rcx) | movdqa %xmm3,0x70(%rcx)
    add    $0x80,%rdx       | add    $0x80,%rdx
    add    $0x80,%rcx       | add    $0x80,%rcx

Other versions were unaffected.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
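The offsets in the disassembly can be reproduced with a small C model. This is only a sketch inferred from the two columns above, not the actual assembly macro, which is not shown here. The names MMSIZE, REGS, UNROLL, offset_before and offset_after are hypothetical, and the "before" formula is simply one expression that matches the listed offsets (0, 0x20, 0x40, 0x60 in the second unrolled pass).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical constants chosen to match the disassembly:
 * 16-byte SSE2 registers, 4 registers per pass, 2 unrolled passes. */
enum { MMSIZE = 16, REGS = 4, UNROLL = 2 };

/* Matches the "before" column: the register index is multiplied by a
 * counter that starts at 1, so pass 1 re-reads offset 0 and skips data. */
static size_t offset_before(int pass, int reg)
{
    return (size_t)MMSIZE * (size_t)reg * (size_t)(pass + 1);
}

/* Matches the "after" column: the counter is added to the register
 * index, so each pass continues where the previous one stopped. */
static size_t offset_after(int pass, int reg)
{
    return (size_t)MMSIZE * (size_t)(pass * REGS + reg);
}

/* Checks that the fixed formula covers every 16-byte slot exactly once
 * within the 0x80 bytes consumed per loop iteration. */
static void check_after_is_contiguous(void)
{
    size_t expected = 0;
    for (int pass = 0; pass < UNROLL; pass++) {
        for (int reg = 0; reg < REGS; reg++) {
            assert(offset_after(pass, reg) == expected);
            expected += MMSIZE;
        }
    }
    assert(expected == 0x80); /* equals the pointer increment */
}

/* Prints both offset sequences side by side, like the disassembly. */
static void print_offsets(void)
{
    for (int pass = 0; pass < UNROLL; pass++)
        for (int reg = 0; reg < REGS; reg++)
            printf("xmm%d: %#5zx | %#5zx\n", reg,
                   offset_before(pass, reg), offset_after(pass, reg));
}
```

Under these assumptions, the "before" formula touches only offsets 0 through 0x60 and reads offsets 0, 0x20 twice, while the pointers still advance by 0x80, so part of each 128-byte block would never be clipped.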
FFmpeg README
-------------

1) Documentation
----------------

* Read the documentation in the doc/ directory in git.

  You can also view it online at http://ffmpeg.org/documentation.html

2) Licensing
------------

* See the LICENSE file.

3) Build and Install
--------------------

* See the INSTALL file.
Languages
---------

* C            92.1%
* Assembly      6%
* Makefile      1.2%
* C++           0.3%
* Objective-C   0.2%
* Other         0.1%