openh264

Author	SHA1	Message	Date
huili2	090e8cc1ed	modify WELS_CLIP1 to be inline functions	2014-03-18 01:54:25 -07:00
volvet	b21411ad7c	Merge pull request #511 from mstorsjo/remove-unused-define Remove the unused FORMAT_COFF define	2014-03-18 16:11:22 +08:00
volvet	fb1958ad13	Merge pull request #519 from mstorsjo/push-xmm-registers Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64 Reviewed by zhiliang	2014-03-18 15:04:54 +08:00
volvet	b5353c8455	Merge pull request #516 from mstorsjo/fix-yasm-64bit Fix building with yasm in 64 bit mode	2014-03-18 09:29:42 +08:00
volvet	e75cd2298b	Merge pull request #517 from mstorsjo/simplify-x86-asm-func-macro Fold ALIGN 16 and the function label into WELS_EXTERN	2014-03-18 09:29:17 +08:00
Martin Storsjö	4633626d69	Remove XMMREG_PROTECT This isn't necessary any longer, now that all the assembly routines take care of restoring registers as necessary.	2014-03-17 13:47:01 +02:00
Martin Storsjö	3cf52554f7	Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64 According to the Win64 ABI, these registers need to be preserved, and compilers are allowed to rely on their content to stay available - not only for float usage but for any usage, anywhere, in the calling C++ code. This adds a macro which pushes the clobbered registers onto the stack if targeting win64 (and a matching one which restores them). The parameter to the macro is the number of xmm registers used (e.g. if using xmm0 - xmm7, the parameter is 8), or in other words, the number of the highest xmm register used plus one. This is similar to how the same issue is handled for the NEON registers q4-q7 with the vpush instruction, except that they needed to be preserved on all platforms, not only on one particular platform. This allows removing the XMMREG_PROTECT_* hacks, which can easily fail if the compiler chooses to use the callee saved xmm registers in an unexpected spot.	2014-03-17 13:44:33 +02:00
Martin Storsjö	9293f2f947	Remove commented out rodata sections and tables in assembly files	2014-03-17 13:42:18 +02:00
Martin Storsjö	eec968234d	Fold ALIGN 16 and the function label into WELS_EXTERN This simplifies the structure for all x86 assembly functions, reducing the amount of duplicated code structure.	2014-03-17 13:35:00 +02:00
Martin Storsjö	04f5bcd68d	Use movsxd in SIGN_EXTENSION This is what nasm ended up assembling movsx with 32 bit input to anyway. Keep using plain movsx for 16 bit input. This fixes building with yasm in 64 bit mode.	2014-03-17 13:26:46 +02:00
Martin Storsjö	f96918283f	Remove commented out code for old, 32-bit only x86 assembly function prologues/epilogues	2014-03-17 11:20:11 +02:00
Licai Guo	b5a4d706b9	Merge pull request #496 from mstorsjo/use-sign-extend-macro Use the SIGN_EXTENSION macro where possible	2014-03-17 16:31:03 +08:00
Licai Guo	1c0ba88b0e	Merge pull request #501 from mstorsjo/neon-register-backup Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon	2014-03-17 14:05:23 +08:00
Martin Storsjö	fc260b39e0	Remove the unused FORMAT_COFF define Nothing in the project currently sets FORMAT_COFF - the other generic branch works just fine on windows.	2014-03-16 17:54:55 +02:00
Martin Storsjö	eb238e6549	Use the SIGN_EXTENSION macro where possible This shortens the x86 assembly by 134 lines in total.	2014-03-16 17:54:24 +02:00
Martin Storsjö	91e5838621	Indent all WELS_ASM_FUNC_BEGIN properly By having all of them start at the start of the line, the code is more consistent and readable.	2014-03-16 12:01:54 +02:00
Martin Storsjö	c82f548e6f	Add defines of arg11 and arg12 in asm_inc.asm	2014-03-15 14:42:07 +02:00
Martin Storsjö	cde30c155b	Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon Remap q5 to q8, q6 to q9, q7 to q10 and q8 to q11, and push q4 to the stack. This was missed previously since the codec unittest doesn't test encoding with loop filter enabled yet.	2014-03-14 22:22:28 +02:00
Martin Storsjö	9199798f22	Fix a typo in a macro name, EXTENTION -> EXTENSION	2014-03-14 10:13:18 +02:00
volvet	6714b8ae99	Merge pull request #463 from mstorsjo/dont-clobber-neon-registers Avoid clobbering the neon registers q4-q7 Review and verified by zhilwang	2014-03-14 10:28:55 +08:00
Martin Storsjö	efe32b7900	Make arm assembly labels always start from the beginning of the line A few labels were misformatted.	2014-03-12 12:01:01 +02:00
Martin Storsjö	52e8973869	Mark the stack as non-executable in the arm assembly Otherwise the linker is forced to enable an executable stack for executables that the code is linked into.	2014-03-11 14:24:16 +02:00
Martin Storsjö	c011890764	Push clobbered neon registers on the stack According to the calling convention, the registers q4-q7 should be preserved by functions. The caller (generated by the compiler) could be using those registers anywhere for any intermediate data. Functions that use more than 12 of the qX registers must push the clobbered registers on the stack in order to be able to restore them afterwards. In functions that don't use all 16 registers, but clobber some of the callee saved registers q4-q7, one or more of them are remapped to reduce the number of registers that have to be saved/restored. This incurs a very small (around 0.5%) slowdown in the decoder and encoder.	2014-03-10 22:07:36 +02:00
Martin Storsjö	811c647c0e	Remap registers to avoid clobbering the neon registers q4-q7 According to the calling convention, the registers q4-q7 should be preserved by functions. The caller (generated by the compiler) could be using those registers anywhere for any intermediate data. Functions that use 12 or fewer of the qX registers can avoid violating the calling convention by simply using other registers instead of the callee-saved registers q4-q7. This change only remaps the registers used within functions - therefore this does not affect performance at all. E.g. in functions using registers q0-q7, we now use q0-q3 and q8-q11 instead.	2014-03-10 22:07:25 +02:00
Ethan Hugg	3627875986	Merge pull request #456 from mstorsjo/use-common-threadlib Make the processing lib use mutexes from WelsThreadLib from the common library	2014-03-10 09:45:51 -07:00
ruil2	44a49b1fef	Merge pull request #458 from mstorsjo/android-threading Don't try to set thread scope and scheduling policy on android	2014-03-10 17:26:00 +08:00
ruil2	2539d6e447	Merge pull request #462 from mstorsjo/fix-typos Fix two typos in variable and macro names	2014-03-10 15:25:20 +08:00
Martin Storsjö	cc7b81f3c3	Fix a typo in arm assembly, LORD -> LOAD	2014-03-09 19:19:38 +02:00
Martin Storsjö	8d6b368a1c	Remove unnecessary stray __cdecl annotations in function signature comments in x86 assembly	2014-03-09 19:18:02 +02:00
Martin Storsjö	1c6a910c11	Don't try to set thread scope and scheduling policy on android These APIs aren't implemented on android.	2014-03-08 20:37:42 +02:00
Martin Storsjö	c5390521ec	Make the processing lib use mutexes from WelsThreadLib from the common library This requires always building the WelsMutex* functions, even if MT_ENABLED isn't set.	2014-03-08 12:46:25 +02:00
volvet	355bbacc2d	Merge pull request #443 from mstorsjo/rerun-mktargets Rerun mktargets.sh	2014-03-07 18:25:20 +08:00
Martin Storsjö	64b4556d13	Rerun mktargets.sh This fixes inconsistent indentation of one line, caused by manually editing one of the targets.mk files.	2014-03-07 11:30:19 +02:00
Martin Storsjö	5b8ee37162	Merge WelsThreadDestroy into WelsThreadJoin Now calling WelsThreadJoin is enough to finish and clean up the thread on all platforms. This unifies the thread cleanup code between windows and unix. Now all of the threading code should use the exact same codepaths between windows and unix.	2014-03-07 10:51:28 +02:00
Martin Storsjö	474deacd7a	Remove the now unused thread cancellation support This makes the thread library build on android - android does not have pthread_cancel.	2014-03-07 10:51:14 +02:00
volvet	38a3fada24	Merge pull request #435 from mstorsjo/threadlib-wait-single-unix Make WelsMultipleEventsWaitSingleBlocking usable on unix as well	2014-03-07 16:47:38 +08:00
Licai Guo	e5f36822a9	Update targets.mk files	2014-03-07 16:22:59 +08:00
Licai Guo	71467f948a	Move mc_neon.S to common, add MC arm code to encoder	2014-03-07 12:18:58 +08:00
volvet	14f5518e6a	Merge pull request #437 from mstorsjo/fix-arm-encoder-android Fix building arm encoder assembly for android	2014-03-07 10:41:34 +08:00
volvet	b3fa8dd334	Merge pull request #418 from mstorsjo/ios-neon-detection Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS	2014-03-07 09:15:17 +08:00
Martin Storsjö	11bdebb12c	Explicitly enable the UAL syntax when using gnu tools Arm assembly has two variants of the syntax, the old legacy syntax and the new modern UAL (unified assembly language) syntax. Most arm assembly is the same in both syntaxes, but some uncommon cases change the order of suffixes - the "subscs" instruction would be written "subcss" in the old syntax. The apple tools default to UAL, while the GNU tools (e.g. in android) require you to specify ".syntax unified" to enable the new syntax. When enabling the new syntax with the GNU tools, some cases of "sub r0, r1, lsl #1" need to be written explicitly as "sub r0, r0, r1, lsl #1", handled in the previous commit. This allows using the same, modern syntax for things like subscs, without needing to have two alternate forms of writing it.	2014-03-06 16:21:54 +02:00
Martin Storsjö	c0043f7053	Use the three-operand form of add/sub with shift When using unified syntax, the two operand form with a shift isn't allowed.	2014-03-06 16:21:54 +02:00
Martin Storsjö	4e4bfcc1bc	Regenerate makefiles to include the encoder arm assembly	2014-03-06 16:11:54 +02:00
Martin Storsjö	45e059ec5f	Rename expand_picture.S to expand_picture_neon.S This avoids ambiguity in the make based build system about whether expand_picture.o should be built from expand_picture.S or expand_picture.asm.	2014-03-06 16:11:40 +02:00
Martin Storsjö	276b585f03	Use the cpu-features NDK library for detecting the number of cores in WelsThreadLib On arm, the exact same detection is done in WelsCPUFeatureDetect, but in the x86 version of that function we use x86 cpuid for getting the core count, and this is not available on all processors. For the case when cpuid can't tell the core count, use the NDK function as a higher-level API. The thread lib itself doesn't build properly on android yet, but will do so soon.	2014-03-06 15:28:59 +02:00
Martin Storsjö	d0a81355b0	Add support for using a separate "master event" in WelsMultipleEventsWait*Blocking This allows making the WelsMultipleEventsWaitSingleBlocking function work properly in unix, without polling. If a master event is provided, the function first waits for a signal on that event - once such a signal is received, it is assumed that one of the individual events in the list has been signalled as well. Then the function can proceed to check each of the semaphores in the list using sem_trywait to find the first one of them that has been signalled. Assuming that the master event is signalled in pair with the other events, one of the sem_trywait calls should succeed. The same master event is also used in WelsMultipleEventsWaitAllBlocking, to keep the semaphore values in sync across calls to both functions.	2014-03-06 15:03:59 +02:00
Martin Storsjö	de32455d87	Remove the timeout parameter from WelsMultipleEventsWaitSingleBlocking All users of the function passed the value corresponding to "infinite", and the (currently unused) unix implementation of it only supported infinite wait as well.	2014-03-06 15:03:59 +02:00
volvet	8cc332dea1	Merge pull request #432 from zhilwang/arm-asm Arm asm	2014-03-06 16:50:56 +08:00
volvet	8beb3c8c09	Merge pull request #417 from mstorsjo/unify-event-init Unify the interface for creating/deleting event objects	2014-03-06 09:13:13 +08:00
volvet	97376c6339	Merge pull request #413 from mstorsjo/remove-commented-code Remove commented out, unused code	2014-03-05 22:13:35 +08:00

197 Commits