Instead of loading the registers one lane at a time, load full
registers and then transpose them.
This is faster, reducing the runtime for the function from about
506 cycles to 434 cycles (tested on a Cortex-A8).

This also avoids an issue that seems to be a CPU bug, present on the
Sony Xperia T (CPU implementer 0x51, architecture 7, variant 0x1,
part 0x04d). On that device, the "vswp q9, q10" occasionally seemed
to start executing before the preceding
vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1
lane loads had finished and written back their results. Changing the
"vswp q9, q10" into "vswp q10, q9", or into the separate
"vswp d18, d20; vswp d19, d21" (or the other way around), seemed to
avoid the issue. The problem only showed up occasionally (a couple
of times per 100000 invocations or so).
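
As a rough illustration of the load-and-transpose idea (a sketch
only, not the actual assembly touched here; the function name and
signature are made up), the same pattern can be written with NEON
intrinsics: load eight full 8-byte rows with vld1 and transpose
them in registers, instead of gathering one column element per row
with vld4 lane loads.

    #include <arm_neon.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only: load an 8x8 block of bytes one full row per register
     * and transpose it in registers, so that out[i] holds column i. */
    static void load_8x8_transposed(uint8x8_t out[8],
                                    const uint8_t *src, ptrdiff_t stride)
    {
        uint8x8_t r[8];
        for (int i = 0; i < 8; i++)
            r[i] = vld1_u8(src + i * stride);

        /* Classic NEON transpose: vtrn at 8, 16 and 32 bit granularity. */
        uint8x8x2_t  t01 = vtrn_u8(r[0], r[1]);
        uint8x8x2_t  t23 = vtrn_u8(r[2], r[3]);
        uint8x8x2_t  t45 = vtrn_u8(r[4], r[5]);
        uint8x8x2_t  t67 = vtrn_u8(r[6], r[7]);

        uint16x4x2_t u02 = vtrn_u16(vreinterpret_u16_u8(t01.val[0]),
                                    vreinterpret_u16_u8(t23.val[0]));
        uint16x4x2_t u13 = vtrn_u16(vreinterpret_u16_u8(t01.val[1]),
                                    vreinterpret_u16_u8(t23.val[1]));
        uint16x4x2_t u46 = vtrn_u16(vreinterpret_u16_u8(t45.val[0]),
                                    vreinterpret_u16_u8(t67.val[0]));
        uint16x4x2_t u57 = vtrn_u16(vreinterpret_u16_u8(t45.val[1]),
                                    vreinterpret_u16_u8(t67.val[1]));

        uint32x2x2_t v04 = vtrn_u32(vreinterpret_u32_u16(u02.val[0]),
                                    vreinterpret_u32_u16(u46.val[0]));
        uint32x2x2_t v15 = vtrn_u32(vreinterpret_u32_u16(u13.val[0]),
                                    vreinterpret_u32_u16(u57.val[0]));
        uint32x2x2_t v26 = vtrn_u32(vreinterpret_u32_u16(u02.val[1]),
                                    vreinterpret_u32_u16(u46.val[1]));
        uint32x2x2_t v37 = vtrn_u32(vreinterpret_u32_u16(u13.val[1]),
                                    vreinterpret_u32_u16(u57.val[1]));

        out[0] = vreinterpret_u8_u32(v04.val[0]);
        out[1] = vreinterpret_u8_u32(v15.val[0]);
        out[2] = vreinterpret_u8_u32(v26.val[0]);
        out[3] = vreinterpret_u8_u32(v37.val[0]);
        out[4] = vreinterpret_u8_u32(v04.val[1]);
        out[5] = vreinterpret_u8_u32(v15.val[1]);
        out[6] = vreinterpret_u8_u32(v26.val[1]);
        out[7] = vreinterpret_u8_u32(v37.val[1]);
    }
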
This fixes building with GNU binutils, which don't support this
nonstandard form of the instructions.

Once Apple's tools support the proper, standard form of the
instructions, the code should be updated to use that form everywhere
instead, and these macros should be removed.

This makes sure the Windows versions of these functions behave
more like the POSIX versions. The POSIX *snprintf functions return
how much would have been written if the buffer had been large
enough, which we can't easily figure out in the Windows versions.

This basically means that we can now assume that the return value
is >= 0, which can simplify the calling code.
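
A minimal sketch of what such a wrapper can look like (the function
name and the exact truncation policy are assumptions, not the actual
code): wrap _vsnprintf so that the buffer is always NUL terminated
and the return value is never negative, even when the output is
truncated.

    #include <stdarg.h>
    #include <stdio.h>

    /* Sketch: always terminate the buffer and never return a negative
     * value. Unlike POSIX vsnprintf, this does not report how much space
     * the full output would have needed. */
    static int compat_vsnprintf(char *buf, size_t size,
                                const char *format, va_list args)
    {
        int ret = _vsnprintf(buf, size, format, args);
        if (ret < 0 || (size_t) ret >= size) {
            /* Truncated (or exactly filled with no room for the
             * terminator): terminate manually, report the stored length. */
            if (size > 0)
                buf[size - 1] = '\0';
            ret = size > 0 ? (int) (size - 1) : 0;
        }
        return ret;
    }
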
Windows Phone lacks the old CreateThread/_beginthreadex APIs for
creating threads. (Technically, the functions still exist, but
they aren't officially supported and aren't visible in the headers
when targeting Windows Phone.)

Building code that uses the Windows Runtime language extensions
requires building with the -ZW option.
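
As an illustration of what thread creation can look like on this
target (a sketch only; the wrapper name and signature are made up),
a work item can be queued on the Windows Runtime thread pool via
the language extensions enabled by -ZW:

    // Sketch: start a background task via the WinRT thread pool instead
    // of CreateThread/_beginthreadex. Requires compiling with -ZW.
    using namespace Windows::System::Threading;
    using namespace Windows::Foundation;

    static void start_thread(void (*func)(void *), void *arg)
    {
        ThreadPool::RunAsync(ref new WorkItemHandler(
            [func, arg](IAsyncAction ^ action) {
                (void) action;
                func(arg);
            }));
    }

RunAsync only queues the work, so a real replacement also needs a
way to wait for the work item to finish, for example an event that
the function signals when it is done.
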
According to the Win64 ABI, these registers need to be preserved,
and compilers are allowed to rely on their contents staying
available, not only for float usage but for any usage, anywhere,
in the calling C++ code.

This adds a macro which pushes the clobbered registers onto the
stack if targeting Win64 (and a matching one which restores them).
The parameter to the macro is the number of xmm registers used
(e.g. if xmm0 - xmm7 are used, the parameter is 8), or in other
words, the number of the highest xmm register used plus one.

This is similar to how the same issue is handled for the NEON
registers q4-q7 with the vpush instruction, except that those need
to be preserved on all platforms, not only on one particular
platform.

This allows removing the XMMREG_PROTECT_* hacks, which can easily
fail if the compiler chooses to use the callee-saved xmm registers
in an unexpected spot.
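
As a rough sanity check of the parameter convention described above
(purely illustrative, not part of the change): on Win64 the
callee-saved xmm registers start at xmm6, so a parameter of N
(highest xmm register used plus one) means N - 6 registers have to
be spilled when N > 6, and none otherwise.

    // Illustrative only: the bookkeeping implied by the macro parameter.
    // On Win64, xmm6-xmm15 are callee-saved; xmm0-xmm5 are volatile.
    constexpr int xmm_regs_to_spill(int num_xmm_used)
    {
        return num_xmm_used > 6 ? num_xmm_used - 6 : 0;
    }

    // Using xmm0-xmm7 (parameter 8) means xmm6 and xmm7 must be saved.
    static_assert(xmm_regs_to_spill(8) == 2, "xmm6 and xmm7 need saving");
    static_assert(xmm_regs_to_spill(6) == 0, "xmm0-xmm5 are volatile");
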
This is what nasm ended up assembling movsx with a 32-bit input
into anyway.
Keep using plain movsx for 16-bit input.
This fixes building with yasm in 64-bit mode.