generic-library/vpx

Author	SHA1	Message	Date
Johann	c7cfde42a9	Add save/restore xmm registers in x86 assembly code. Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67]. Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877	2011-04-18 16:30:38 -04:00
Johann	487c0299c9	remove dead code, add missing RESTORE_XMM. vp8_filter_block1d16_h4_ssse3 was never called. Because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was masked. However, if/when win64 used those registers for persistent data, issues could/will arise. Change-Id: I56d6effca0aeba1f86082689771cb10145d39651	2011-04-15 10:11:53 -04:00
Jan Kratochvil	5cdc3a4c29	nasm: address labels 'rel label' vice 'wrt rip' nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50	2010-10-04 19:47:54 -04:00
Fritz Koenig	b7dc9398f2	Use movq instead of movdqu. Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f	2010-09-20 11:34:26 -07:00
Fritz Koenig	6d90f867e4	Merge branch 'master' of git://review.webmproject.org/libvpx	2010-09-09 08:54:21 -07:00
John Koleszar	c2140b8af1	Use WebM in copyright notice for consistency Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba	2010-09-09 10:01:21 -04:00
Fritz Koenig	3fb37162a8	Bilinear subpixel optimizations for ssse3. Used pmaddubsw for multiply and add of two filter taps at once for 16x16 and 8x8 blocks. Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea	2010-09-07 17:19:40 -07:00
Jim Bankoski	b0660457fe	Revert "Removed ssse3 sixtap code" This reverts commit `6ea5bb85cd`.	2010-08-19 15:58:27 -04:00
Scott LaVarnway	6ea5bb85cd	Removed ssse3 sixtap code Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4	2010-08-18 15:34:09 -04:00
Scott LaVarnway	b07e5b6fa1	Finished vp8_sixtap_predict4x4_ssse3 function Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3 assembly routines. Also removed unused assembly. Change-Id: I01c1021835f2edda9da706822345f217087ca0d0	2010-08-11 13:49:00 -04:00
Scott LaVarnway	e4fe866949	Added ssse3 version of sixtap filters Improved decoder performance by 9% for the clip used. Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9	2010-08-10 17:33:49 -04:00

11 Commits