openh264

Author	SHA1	Message	Date
dongzha	80fdf09b26	Merge pull request #903 from zhilwang/arm64-sad Add Arm64 sad code	2014-05-30 09:26:04 +08:00
Sijia Chen	7413032185	using WelsRound for all the double-int32_t conversion	2014-05-20 14:06:31 +08:00
zhiliang wang	e6c9eb9824	Add Sad arm64 code	2014-05-14 17:06:48 +08:00
Martin Storsjö	3cc01c6239	Use CCASFLAGS when assembling .S sources This allows overriding whether all of CFLAGS should be passed when assembling.	2014-05-13 19:39:26 +03:00
sijchen	31a4d2aa3e	Merge pull request #829 from dongzha/FixBugforDeblocking Fix a bug in deblocking for neon 32 bit arm implementation for master	2014-05-13 17:21:48 +08:00
Martin Storsjö	6b9167199f	Use the built-in define __linux__ instead of the manually set LINUX	2014-05-12 12:14:33 +03:00
dongzhang	218adc7e29	Fix a bug in deblocking for neon 32 bit arm implementation	2014-05-09 14:06:16 +08:00
Martin Storsjö	6e715ddc10	Make an endif comment match the actual condition	2014-05-08 11:14:24 +03:00
huili2	5ed24f216b	astyle all files	2014-05-05 19:30:21 -07:00
Martin Storsjö	b8eeda1740	Properly back up and restore XMM registers on win64 in WelsSampleSadFour4x4_sse2	2014-05-04 15:47:56 +03:00
Licai Guo	fe5b8d1a69	refine format	2014-05-04 14:51:05 +08:00
Licai Guo	485b2b5b43	Add IntraSad asm code. Enable intraSad ASM code Refine format Add X86_ASM protect for intraSad ASM code UT remove duplicated code.	2014-05-04 12:12:38 +08:00
Martin Storsjö	23f57adaea	Do full register loads instead of single-lane loads in DeblockLumaEq4H_neon Instead of loading the registers one lane at a time, load full registers and then transpose them. This is faster, reducing the runtime for the function from about 506 cycles to 434 cycles (tested on a Cortex A8). This also avoids an issue which seems like a cpu bug, present on Sony Xperia T (cpu implementer 0x51 architecture 7 variant 0x1 part 0x04d). On such a device, it seemed like the "vswp q9, q10" could start executing before the previous vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1 had finished and written back their result. Changing the "vswp q9, q10" into "vswp q10, q9", or into separate "vswp d18, d20; vswp d19, d21" (or the other way around) seemed to avoid the issue. This happened occasionally (a couple times per 100000 invocations or so).	2014-04-28 10:12:16 +03:00
volvet	c65e286036	Merge pull request #738 from mstorsjo/gnu-aarch64 Fix building the aarch64 assembly using gnu binutils	2014-04-25 09:07:43 +08:00
Martin Storsjö	66f58e8357	Add macros for the non-standard mov.16b/mov.8b/ext.16b/ext.8b This fixes building with gnu binutils, which don't support this nonstandard form of the instructions. Once Apple's tools support the proper standard form of the instructions, the code should be updated to use that everywhere instead, and these macros should be removed.	2014-04-23 11:47:12 +03:00
Martin Storsjö	7cd175d097	Use the correct ext syntax in the gnu version of macros	2014-04-23 11:47:12 +03:00
Martin Storsjö	b13a399ab5	Use a plain "ret" instead of "ret lr" This fixes an issue with assembling with gnu binutils.	2014-04-23 11:47:12 +03:00
Martin Storsjö	f2642b308a	Add correct arguments to the gnu version of UNPACK_FILTER_SINGLE_TAG_16BITS	2014-04-23 11:47:12 +03:00
Martin Storsjö	90fad9fd98	Add \() to macro arguments to separate the argument from the following .8h or similar	2014-04-23 11:47:12 +03:00
Martin Storsjö	80bd541cbe	Remove .syntax unified from the aarch64 common header This directive isn't available in aarch64 code, only in arm code.	2014-04-23 11:47:12 +03:00
Martin Storsjö	3c2e9cd7bf	Regenerate makefiles to include the new arm64 assembly files	2014-04-23 11:44:47 +03:00
Martin Storsjö	764f787dcb	Rename the makefile variable for arm assembly sources This is in preparation for adding support for the aarch64 assembly files as well.	2014-04-23 10:55:30 +03:00
Martin Storsjö	a842f14a3c	Remove .orig files left over from running astyle	2014-04-23 09:24:23 +03:00
Martin Storsjö	45aef90d26	Remove the executable bit from source files	2014-04-23 09:23:56 +03:00
dongzhang	ad9e2dab4f	Add Motion Compensation ARM64 Neon Code	2014-04-23 13:26:28 +08:00
Licai Guo	b47606a4ff	Merge pull request #733 from dongzha/ExpandPic_ARM64 Add expand picture support for ARM64 NEON	2014-04-23 09:57:39 +08:00
dongzhang	2444327a6c	Add expand picture support for ARM64 NEON Remove duplicate MACROS	2014-04-23 09:14:32 +08:00
Martin Storsjö	564d16c2ef	Make WelsSnprintf return values be non-negative This makes sure the windows version of these functions behave more like the posix version. The posix snprintf returns how much would have been written if the buffer had been large enough, which we don't know easily in the windows versions. This basically means that we can assume that the return value is >= 0 now, which can simplify the calling code.	2014-04-21 22:03:20 +03:00
Licai Guo	3f2ea77908	Merge pull request #719 from dongzha/MC Modify ARM32 Neon code for Expand Chroma Picture, when UVWidth%16==8.	2014-04-21 14:38:51 +08:00
Licai Guo	039a547804	give accurate align information for mc copy functions this can improve the performance for target like javascript	2014-04-19 00:33:23 -07:00
Licai Guo	2f8c539e60	Merge pull request #707 from dongzha/FixIssueMcNEON Fix potential issue for neon implement on encoder mode decision.	2014-04-17 17:26:25 +08:00
dongzhang	a4f59bc0d7	Modify ARM32 Neon code for Expand Chroma Picture, when UVWidth%16==8.	2014-04-17 15:58:30 +08:00
Licai Guo	4062fa9d34	Merge pull request #703 from zhilwang/pf-test Move copy_mb neon code to common folder	2014-04-17 11:08:56 +08:00
Licai Guo	3d9d00b27c	Update targets.mk	2014-04-17 10:43:10 +08:00
Licai Guo	c8e1a41c29	Move copy_mb neon code to common folder	2014-04-17 10:06:48 +08:00
ruil2	b553468ad3	keep the declaration and definition in the same namespace	2014-04-17 09:45:26 +08:00
huili2	4ab8c88e98	divide copy_mb functions into new file for decoder use from encoder and add files for EC in decoder only.	2014-04-14 20:17:41 -07:00
Dong Zhang	8a4300be50	Fix potential issue for neon implement on encoder mode decision. Error happens when ME_REFINE_BUF_STRIDE is not equal to 32.	2014-04-13 19:41:29 -07:00
Martin Storsjö	b35c21201b	Use the Windows Runtime ThreadPool API for creating threads on Windows Phone Windows Phone lacks the old CreateThread/beginthreadex APIs for creating threads. (Technically, the functions still do exist, but they aren't officially supported and aren't visible in the headers when targeting Windows Phone.) Building code that uses the Windows Runtime language extensions requires building with the -ZW option.	2014-04-01 11:18:49 +03:00
Martin Storsjö	f293d26a62	Use more modern versions of functions that don't exist on Windows Phone	2014-04-01 11:18:48 +03:00
Martin Storsjö	4bcb03c5a0	Remove the unused function WelsSleep Windows Phone 8 doesn't have Sleep(), but there's no need to use the function at all.	2014-04-01 11:18:48 +03:00
volvet	9f50e0c91e	clean multi-threading macro	2014-03-31 18:24:10 -07:00
ruil2	6b3f89d582	move some common functions to common.cpp and add some functions in common	2014-03-25 15:35:55 +08:00
Licai Guo	e39de8d404	reorganize common to inc/src/x86/arm	2014-03-18 19:41:32 -07:00
volvet	7313ecdbd0	Merge pull request #538 from mstorsjo/use-apple-builtin-define Use __APPLE__ instead of APPLE_IOS for apple/arm specific features	2014-03-19 09:45:56 +08:00
Licai Guo	d897d362ab	Merge pull request #532 from huili2/WELS_CLIP1 Modify MACRO WELS_CLIP1 as inline functions	2014-03-19 08:50:04 +08:00
Martin Storsjö	9586c59b9e	Use __APPLE__ instead of APPLE_IOS in the arm assembly sources	2014-03-18 23:15:49 +02:00
Martin Storsjö	73ed237d73	Use __APPLE__ instead of APPLE_IOS for using the apple cpu feature detection	2014-03-18 23:15:49 +02:00
Ethan Hugg	197423f271	Merge pull request #520 from ylatuya/master Fix compiler warnings and remove dead code	2014-03-18 13:28:02 -07:00
Andoni Morales Alastruey	703c69de81	codec: add a new macro for unused functions Variables used only for tracing logs can trigger -Werror=unused-variable when tracing is disabled. This macro helps to silence gcc in those cases. WIP	2014-03-18 19:15:25 +01:00
Martin Storsjö	e1b5e038d2	Use .obj as suffix for object files on MSVC This avoids warnings when linking about "unrecognized source file type, object file assumed".	2014-03-18 19:41:06 +02:00
huili2	090e8cc1ed	modify WELS_CLIP1 to be inline functions	2014-03-18 01:54:25 -07:00
volvet	b21411ad7c	Merge pull request #511 from mstorsjo/remove-unused-define Remove the unused FORMAT_COFF define	2014-03-18 16:11:22 +08:00
volvet	fb1958ad13	Merge pull request #519 from mstorsjo/push-xmm-registers Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64 Reviewed by zhiliang	2014-03-18 15:04:54 +08:00
volvet	b5353c8455	Merge pull request #516 from mstorsjo/fix-yasm-64bit Fix building with yasm in 64 bit mode	2014-03-18 09:29:42 +08:00
volvet	e75cd2298b	Merge pull request #517 from mstorsjo/simplify-x86-asm-func-macro Fold ALIGN 16 and the function label into WELS_EXTERN	2014-03-18 09:29:17 +08:00
Martin Storsjö	4633626d69	Remove XMMREG_PROTECT This isn't necessary any longer, when all the assembly routines take care of restoring registers as necessary.	2014-03-17 13:47:01 +02:00
Martin Storsjö	3cf52554f7	Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64 According to the Win64 ABI, these registers need to be preserved, and compilers are allowed to rely on their content to stay available - not only for float usage but for any usage, anywhere, in the calling C++ code. This adds a macro which pushes the clobbered registers onto the stack if targeting win64 (and a matching one which restores them). The parameter to the macro is the number of xmm registers used (e.g. if using xmm0 - xmm7, the parameter is 8), or in other words, the number of the highest xmm register used plus one. This is similar to how the same issue is handled for the NEON registers q4-q7 with the vpush instruction, except that they needed to be preserved on all platforms, not only on one particular platform. This allows removing the XMMREG_PROTECT_* hacks, which can easily fail if the compiler chooses to use the callee saved xmm registers in an unexpected spot.	2014-03-17 13:44:33 +02:00
Martin Storsjö	9293f2f947	Remove commented out rodata sections and tables in assembly files	2014-03-17 13:42:18 +02:00
Martin Storsjö	eec968234d	Fold ALIGN 16 and the function label into WELS_EXTERN This simplifies the structure for all x86 assembly functions, reducing the amount of duplicated code structure.	2014-03-17 13:35:00 +02:00
Martin Storsjö	04f5bcd68d	Use movsxd in SIGN_EXTENSION This is what nasm ended up assembling movsx with 32 bit input to anyway. Keep using plain movsx for 16 bit input. This fixes building with yasm in 64 bit mode.	2014-03-17 13:26:46 +02:00
Martin Storsjö	f96918283f	Remove commented out code for old, 32-bit only x86 assembly function prologues/epilogues	2014-03-17 11:20:11 +02:00
Licai Guo	b5a4d706b9	Merge pull request #496 from mstorsjo/use-sign-extend-macro Use the SIGN_EXTENSION macro where possible	2014-03-17 16:31:03 +08:00
Licai Guo	1c0ba88b0e	Merge pull request #501 from mstorsjo/neon-register-backup Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon	2014-03-17 14:05:23 +08:00
Martin Storsjö	fc260b39e0	Remove the unused FORMAT_COFF define Nothing in the project currently sets FORMAT_COFF - the other generic branch works just fine on windows.	2014-03-16 17:54:55 +02:00
Martin Storsjö	eb238e6549	Use the SIGN_EXTENSION macro where possible This shortens the x86 assembly by 134 lines in total.	2014-03-16 17:54:24 +02:00
Martin Storsjö	91e5838621	Indent all WELS_ASM_FUNC_BEGIN properly By having all of them start at the start of the line, the code is more consistent and readable.	2014-03-16 12:01:54 +02:00
Martin Storsjö	c82f548e6f	Add defines of arg11 and arg12 in asm_inc.asm	2014-03-15 14:42:07 +02:00
Martin Storsjö	cde30c155b	Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon Remap q5 to q8, q6 to q9, q7 to q10 and q8 to q11, and push q4 to the stack. This was missed previously since the codec unittest doesn't test encoding with loop filter enabled yet.	2014-03-14 22:22:28 +02:00
Martin Storsjö	9199798f22	Fix a typo in a macro name, EXTENTION -> EXTENSION	2014-03-14 10:13:18 +02:00
volvet	6714b8ae99	Merge pull request #463 from mstorsjo/dont-clobber-neon-registers Avoid clobbering the neon registers q4-q7 Review and verified by zhilwang	2014-03-14 10:28:55 +08:00
Martin Storsjö	efe32b7900	Make arm assembly labels always start from the beginning of the line A few labels were misformatted.	2014-03-12 12:01:01 +02:00
Martin Storsjö	52e8973869	Mark the stack as non-executable in the arm assembly Otherwise the linker is forced to enable an executable stack for executables that the code is linked into.	2014-03-11 14:24:16 +02:00
Martin Storsjö	c011890764	Push clobbered neon registers on the stack According to the calling convention, the registers q4-q7 should be preserved by functions. The caller (generated by the compiler) could be using those registers anywhere for any intermediate data. Functions that use more than 12 of the qX registers must push the clobbered registers on the stack in order to be able to restore them afterwards. In functions that don't use all 16 registers, but clobber some of the callee saved registers q4-q7, one or more of them are remapped to reduce the number of registers that have to be saved/restored. This incurs a very small (around 0.5%) slowdown in the decoder and encoder.	2014-03-10 22:07:36 +02:00
Martin Storsjö	811c647c0e	Remap registers to avoid clobbering the neon registers q4-q7 According to the calling convention, the registers q4-q7 should be preserved by functions. The caller (generated by the compiler) could be using those registers anywhere for any intermediate data. Functions that use 12 or less of the qX registers can avoid violating the calling convention by simply using other registers instead of the callee saved registers q4-q7. This change only remaps the registers used within functions - therefore this does not affect performance at all. E.g. in functions using registers q0-q7, we now use q0-q3 and q8-q11 instead.	2014-03-10 22:07:25 +02:00
Ethan Hugg	3627875986	Merge pull request #456 from mstorsjo/use-common-threadlib Make the processing lib use mutexes from WelsThreadLib from the common library	2014-03-10 09:45:51 -07:00
ruil2	44a49b1fef	Merge pull request #458 from mstorsjo/android-threading Don't try to set thread scope and scheduling policy on android	2014-03-10 17:26:00 +08:00
ruil2	2539d6e447	Merge pull request #462 from mstorsjo/fix-typos Fix two typos in variable and macro names	2014-03-10 15:25:20 +08:00
Martin Storsjö	cc7b81f3c3	Fix a typo in arm assembly, LORD -> LOAD	2014-03-09 19:19:38 +02:00
Martin Storsjö	8d6b368a1c	Remove unnecessary stray __cdecl annotations in function signature comments in x86 assembly	2014-03-09 19:18:02 +02:00
Martin Storsjö	1c6a910c11	Don't try to set thread scope and scheduling policy on android These APIs aren't implemented on android.	2014-03-08 20:37:42 +02:00
Martin Storsjö	c5390521ec	Make the processing lib use mutexes from WelsThreadLib from the common library This requires always building the WelsMutex* functions, even if MT_ENABLED isn't set.	2014-03-08 12:46:25 +02:00
volvet	355bbacc2d	Merge pull request #443 from mstorsjo/rerun-mktargets Rerun mktargets.sh	2014-03-07 18:25:20 +08:00
Martin Storsjö	64b4556d13	Rerun mktargets.sh This fixes inconsistent indentation of one line, caused by manually editing one of the targets.mk files.	2014-03-07 11:30:19 +02:00
Martin Storsjö	5b8ee37162	Merge WelsThreadDestroy into WelsThreadJoin Now calling WelsThreadJoin is enough to finish and clean up the thread on all platforms. This unifies the thread cleanup code between windows and unix. Now all of the threading code should use the exact same codepaths between windows and unix.	2014-03-07 10:51:28 +02:00
Martin Storsjö	474deacd7a	Remove the now unused thread cancellation support This makes the thread library build on android - android does not have pthread_cancel.	2014-03-07 10:51:14 +02:00
volvet	38a3fada24	Merge pull request #435 from mstorsjo/threadlib-wait-single-unix Make WelsMultipleEventsWaitSingleBlocking usable on unix as well	2014-03-07 16:47:38 +08:00
Licai Guo	e5f36822a9	Update targets.mk files	2014-03-07 16:22:59 +08:00
Licai Guo	71467f948a	mv mc_neon.S to common,add MC arm code to encoder	2014-03-07 12:18:58 +08:00
volvet	14f5518e6a	Merge pull request #437 from mstorsjo/fix-arm-encoder-android Fix building arm encoder assembly for android	2014-03-07 10:41:34 +08:00
volvet	b3fa8dd334	Merge pull request #418 from mstorsjo/ios-neon-detection Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS	2014-03-07 09:15:17 +08:00
Martin Storsjö	11bdebb12c	Explicitly enable the UAL syntax when using gnu tools Arm assembly has got two variants of the syntax, the old legacy syntax, and the new modern UAL (unified assembly language) syntax. Most arm assembly is the same in the both syntaxes, but some uncommon cases change the order of suffixes - the "subscs" instruction would be written "subcss" in the old syntax. The apple tools default to UAL, while the GNU tools (e.g. in android) require you to specify ".syntax unified" to enable the new syntax. When enabling the new syntax with the GNU tools, some cases of "sub r0, r1, lsl #1" needs to be written explicitly as "sub r0, r0, r1, lsl #1", handled in the previous commit. This allows using the same, modern syntax for things like subscs, without needing to have two alternate forms of writing it.	2014-03-06 16:21:54 +02:00
Martin Storsjö	c0043f7053	Use the three-operand form of add/sub with shift When using unified syntax, the two operand form with a shift isn't allowed.	2014-03-06 16:21:54 +02:00
Martin Storsjö	4e4bfcc1bc	Regenerate makefiles to include the encoder arm assembly	2014-03-06 16:11:54 +02:00
Martin Storsjö	45e059ec5f	Rename expand_picture.S to expand_picture_neon.S This avoids ambiguity in the make based build system about whether expand_picture.o should be built from expand_picture.S or expand_picture.asm.	2014-03-06 16:11:40 +02:00
Martin Storsjö	276b585f03	Use the cpu-features NDK library for detecting the number of cores in WelsThreadLib On arm, the exact same detection is done in WelsCPUFeatureDetect, but in the x86 version of that function we use x86 cpuid for getting the core count, and this is not available on all processors. For the case when cpuid can't tell the core count, use the NDK function as higher level API. The thread lib itself doesn't build properly on android yet, but will do so soon.	2014-03-06 15:28:59 +02:00
Martin Storsjö	d0a81355b0	Add support for using a separate "master event" in WelsMultipleEventsWait*Blocking This allows making the WelsMultipleEventsWaitSingleBlocking function work properly in unix, without polling. If a master event is provided, the function first waits for a signal on that event - once such a signal is received, it is assumed that one of the individual events in the list have been signalled as well. Then the function can proceed to check each of the semaphores in the list using sem_trywait to find the first one of them that has been signalled. Assuming that the master event is signalled in pair with the other events, one of the sem_trywait calls should succeed. The same master event is also used in WelsMultipleEventsWaitAllBlocking, to keep the semaphore values in sync across calls to the both functions.	2014-03-06 15:03:59 +02:00
Martin Storsjö	de32455d87	Remove the timeout parameter from WelsMultipleEventsWaitSingleBlocking All users of the function passed the value corresponding to "infinite", and the (currently unused) unix implementation of it only supported infinite wait as well.	2014-03-06 15:03:59 +02:00
volvet	8cc332dea1	Merge pull request #432 from zhilwang/arm-asm Arm asm	2014-03-06 16:50:56 +08:00
volvet	8beb3c8c09	Merge pull request #417 from mstorsjo/unify-event-init Unify the interface for creating/deleting event objects	2014-03-06 09:13:13 +08:00


298 Commits
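Commit 090e8cc1ed (merged as #532) turns the WELS_CLIP1 macro into inline functions. The sketch below shows the usual motivation. It assumes that "CLIP1" means clamping to the 8-bit sample range [0, 255]. Both names, `CLIP1_MACRO` and `Clip1Inline`, are hypothetical and are not the openh264 definitions. A function-like macro re-expands its argument, so any side effects can run more than once. An inline function evaluates its argument exactly once and type-checks it.

```c
#include <stdint.h>

/* Hypothetical macro form: x is expanded, and may be evaluated, up to three times. */
#define CLIP1_MACRO(x) ((x) < 0 ? 0 : ((x) > 255 ? 255 : (x)))

/* Hypothetical inline form: x is evaluated once and converted to int32_t first. */
static inline uint8_t Clip1Inline(int32_t x) {
  return (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
}
```

For an in-range value, `CLIP1_MACRO(i++)` increments `i` three times and yields the value after two increments. `Clip1Inline(i++)` increments it once and clips the original value.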
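Commit d0a81355b0 describes a "master event" that lets WelsMultipleEventsWaitSingleBlocking block on unix without polling. The following is a minimal sketch of that scheme, assuming POSIX unnamed semaphores as on Linux (macOS does not implement `sem_init`). `EventSet`, `SignalEvent`, `WaitSingle`, `WaitAll` and the helpers are hypothetical names for illustration, not the WelsThreadLib API.

- A signaller posts the individual event and then the master.
- A single-waiter sleeps on the master, then uses `sem_trywait` to find an individual event that has been posted.
- Waiting for all events also consumes one master token per event, so both counts stay in sync, as the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical event set; names are illustrative, not WelsThreadLib's API. */
typedef struct {
  sem_t* events;  /* one counting semaphore per individual event */
  size_t count;
  sem_t master;   /* posted once for every post to any individual event */
} EventSet;

/* sem_wait can return early with EINTR when a signal handler runs; retry then. */
static void WaitRetry(sem_t* sem) {
  while (sem_wait(sem) != 0 && errno == EINTR) {
  }
}

static int InitEventSet(EventSet* s, sem_t* storage, size_t count) {
  s->events = storage;
  s->count = count;
  if (sem_init(&s->master, 0, 0) != 0) return -1;
  for (size_t i = 0; i < count; i++) {
    if (sem_init(&s->events[i], 0, 0) != 0) {
      while (i > 0) sem_destroy(&s->events[--i]);  /* undo earlier inits */
      sem_destroy(&s->master);
      return -1;
    }
  }
  return 0;
}

static void DestroyEventSet(EventSet* s) {
  for (size_t i = 0; i < s->count; i++) sem_destroy(&s->events[i]);
  sem_destroy(&s->master);
}

/* Post the individual event first, then the master, so a thread woken by the
   master is guaranteed that the individual post has already happened. */
static void SignalEvent(EventSet* s, size_t i) {
  sem_post(&s->events[i]);
  sem_post(&s->master);
}

/* Sleep (no polling) until any event is signalled; return its index. */
static size_t WaitSingle(EventSet* s) {
  assert(s->count > 0);
  WaitRetry(&s->master);
  /* Holding a master token means a matching individual token was posted.
     The outer loop only repeats if a concurrent waiter raced us for a slot. */
  for (;;) {
    for (size_t i = 0; i < s->count; i++) {
      if (sem_trywait(&s->events[i]) == 0) return i;
    }
  }
}

/* Wait for every event once, taking one master token per event so the master
   count keeps matching the sum of the individual counts. */
static void WaitAll(EventSet* s) {
  for (size_t i = 0; i < s->count; i++) {
    WaitRetry(&s->events[i]);
    WaitRetry(&s->master);
  }
}
```

The ordering inside `SignalEvent` is the important design choice. If the master were posted first, a waiter could wake up and scan before any individual semaphore had been posted.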
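Commit 3cf52554f7 adds a win64-only macro whose parameter is "the number of xmm registers used", i.e. the highest register number used plus one. The real macro lives in the x86 assembly sources and is not reproduced here. The arithmetic it implies can be sketched in C with the hypothetical helpers below. Under the Win64 ABI, xmm0-xmm5 are volatile and xmm6-xmm15 must be preserved, so only registers from xmm6 upward need a 16-byte spill slot each.

```c
/* Win64 ABI: xmm0-xmm5 are volatile, xmm6-xmm15 are callee-saved. */
enum { kFirstCalleeSavedXmm = 6, kXmmRegisterBytes = 16 };

/* num_used follows the commit's convention: highest xmm register used plus one
   (using xmm0-xmm7 gives 8). Returns how many callee-saved registers a
   function prologue would have to spill. */
static int XmmRegistersToSave(int num_used) {
  return num_used > kFirstCalleeSavedXmm ? num_used - kFirstCalleeSavedXmm : 0;
}

/* Stack space for those spills, 16 bytes per register (alignment padding is
   deliberately left out of this sketch). */
static int XmmSaveAreaBytes(int num_used) {
  return XmmRegistersToSave(num_used) * kXmmRegisterBytes;
}
```

A function using xmm0-xmm7 would therefore spill xmm6 and xmm7, which takes 32 bytes. The System V AMD64 ABI used on Linux treats every xmm register as caller-saved, which is why the commit only pushes them when targeting win64.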