openh264

Author	SHA1	Message	Date
volvet	fb1958ad13	Merge pull request #519 from mstorsjo/push-xmm-registers Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64 Reviewed by zhiliang	2014-03-18 15:04:54 +08:00
volvet	b5353c8455	Merge pull request #516 from mstorsjo/fix-yasm-64bit Fix building with yasm in 64 bit mode	2014-03-18 09:29:42 +08:00
volvet	e75cd2298b	Merge pull request #517 from mstorsjo/simplify-x86-asm-func-macro Fold ALIGN 16 and the function label into WELS_EXTERN	2014-03-18 09:29:17 +08:00
Martin Storsjö	4633626d69	Remove XMMREG_PROTECT This isn't necessary any longer, when all the assembly routines take care of restoring registers as necessary.	2014-03-17 13:47:01 +02:00
Martin Storsjö	3cf52554f7	Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64 According to the Win64 ABI, these registers need to be preserved, and compilers are allowed to rely on their content to stay available - not only for float usage but for any usage, anywhere, in the calling C++ code. This adds a macro which pushes the clobbered registers onto the stack if targeting win64 (and a matching one which restores them). The parameter to the macro is the number of xmm registers used (e.g. if using xmm0 - xmm7, the parameter is 8), or in other words, the number of the highest xmm register used plus one. This is similar to how the same issue is handled for the NEON registers q4-q7 with the vpush instruction, except that they needed to be preserved on all platforms, not only on one particular platform. This allows removing the XMMREG_PROTECT_* hacks, which can easily fail if the compiler chooses to use the callee saved xmm registers in an unexpected spot.	2014-03-17 13:44:33 +02:00
Martin Storsjö	9293f2f947	Remove commented out rodata sections and tables in assembly files	2014-03-17 13:42:18 +02:00
Martin Storsjö	eec968234d	Fold ALIGN 16 and the function label into WELS_EXTERN This simplifies the structure for all x86 assembly functions, reducing the amount of duplicated code structure.	2014-03-17 13:35:00 +02:00
Martin Storsjö	04f5bcd68d	Use movsxd in SIGN_EXTENSION This is what nasm ended up assembling movsx with 32 bit input to anyway. Keep using plain movsx for 16 bit input. This fixes building with yasm in 64 bit mode.	2014-03-17 13:26:46 +02:00
Martin Storsjö	f96918283f	Remove commented out code for old, 32-bit only x86 assembly function prologues/epilogues	2014-03-17 11:20:11 +02:00
Licai Guo	b5a4d706b9	Merge pull request #496 from mstorsjo/use-sign-extend-macro Use the SIGN_EXTENSION macro where possible	2014-03-17 16:31:03 +08:00
Licai Guo	1c0ba88b0e	Merge pull request #501 from mstorsjo/neon-register-backup Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon	2014-03-17 14:05:23 +08:00
Martin Storsjö	fc260b39e0	Remove the unused FORMAT_COFF define Nothing in the project currently sets FORMAT_COFF - the other generic branch works just fine on windows.	2014-03-16 17:54:55 +02:00
Martin Storsjö	eb238e6549	Use the SIGN_EXTENSION macro where possible This shortens the x86 assembly by 134 lines in total.	2014-03-16 17:54:24 +02:00
Martin Storsjö	91e5838621	Indent all WELS_ASM_FUNC_BEGIN properly By having all of them start at the beginning of the line, the code is more consistent and readable.	2014-03-16 12:01:54 +02:00
Martin Storsjö	c82f548e6f	Add defines of arg11 and arg12 in asm_inc.asm	2014-03-15 14:42:07 +02:00
Martin Storsjö	cde30c155b	Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon Remap q5 to q8, q6 to q9, q7 to q10 and q8 to q11, and push q4 to the stack. This was missed previously since the codec unittest doesn't test encoding with loop filter enabled yet.	2014-03-14 22:22:28 +02:00
Martin Storsjö	9199798f22	Fix a typo in a macro name, EXTENTION -> EXTENSION	2014-03-14 10:13:18 +02:00
volvet	6714b8ae99	Merge pull request #463 from mstorsjo/dont-clobber-neon-registers Avoid clobbering the neon registers q4-q7 Reviewed and verified by zhilwang	2014-03-14 10:28:55 +08:00
Martin Storsjö	efe32b7900	Make arm assembly labels always start from the beginning of the line A few labels were misformatted.	2014-03-12 12:01:01 +02:00
Martin Storsjö	52e8973869	Mark the stack as non-executable in the arm assembly Otherwise the linker is forced to enable an executable stack for executables that the code is linked into.	2014-03-11 14:24:16 +02:00
Martin Storsjö	c011890764	Push clobbered neon registers on the stack According to the calling convention, the registers q4-q7 should be preserved by functions. The caller (generated by the compiler) could be using those registers anywhere for any intermediate data. Functions that use more than 12 of the qX registers must push the clobbered registers on the stack in order to be able to restore them afterwards. In functions that don't use all 16 registers, but clobber some of the callee saved registers q4-q7, one or more of them are remapped to reduce the number of registers that have to be saved/restored. This incurs a very small (around 0.5%) slowdown in the decoder and encoder.	2014-03-10 22:07:36 +02:00
Martin Storsjö	811c647c0e	Remap registers to avoid clobbering the neon registers q4-q7 According to the calling convention, the registers q4-q7 should be preserved by functions. The caller (generated by the compiler) could be using those registers anywhere for any intermediate data. Functions that use 12 or fewer of the qX registers can avoid violating the calling convention by simply using other registers instead of the callee saved registers q4-q7. This change only remaps the registers used within functions; therefore this does not affect performance at all. E.g. in functions using registers q0-q7, we now use q0-q3 and q8-q11 instead.	2014-03-10 22:07:25 +02:00
Ethan Hugg	3627875986	Merge pull request #456 from mstorsjo/use-common-threadlib Make the processing lib use mutexes from WelsThreadLib from the common library	2014-03-10 09:45:51 -07:00
ruil2	44a49b1fef	Merge pull request #458 from mstorsjo/android-threading Don't try to set thread scope and scheduling policy on android	2014-03-10 17:26:00 +08:00
ruil2	2539d6e447	Merge pull request #462 from mstorsjo/fix-typos Fix two typos in variable and macro names	2014-03-10 15:25:20 +08:00
Martin Storsjö	cc7b81f3c3	Fix a typo in arm assembly, LORD -> LOAD	2014-03-09 19:19:38 +02:00
Martin Storsjö	8d6b368a1c	Remove unnecessary stray __cdecl annotations in function signature comments in x86 assembly	2014-03-09 19:18:02 +02:00
Martin Storsjö	1c6a910c11	Don't try to set thread scope and scheduling policy on android These APIs aren't implemented on android.	2014-03-08 20:37:42 +02:00
Martin Storsjö	c5390521ec	Make the processing lib use mutexes from WelsThreadLib from the common library This requires always building the WelsMutex* functions, even if MT_ENABLED isn't set.	2014-03-08 12:46:25 +02:00
volvet	355bbacc2d	Merge pull request #443 from mstorsjo/rerun-mktargets Rerun mktargets.sh	2014-03-07 18:25:20 +08:00
Martin Storsjö	64b4556d13	Rerun mktargets.sh This fixes inconsistent indentation of one line, caused by manually editing one of the targets.mk files.	2014-03-07 11:30:19 +02:00
Martin Storsjö	5b8ee37162	Merge WelsThreadDestroy into WelsThreadJoin Now calling WelsThreadJoin is enough to finish and clean up the thread on all platforms. This unifies the thread cleanup code between windows and unix. Now all of the threading code should use the exact same codepaths between windows and unix.	2014-03-07 10:51:28 +02:00
Martin Storsjö	474deacd7a	Remove the now unused thread cancellation support This makes the thread library build on android - android does not have pthread_cancel.	2014-03-07 10:51:14 +02:00
volvet	38a3fada24	Merge pull request #435 from mstorsjo/threadlib-wait-single-unix Make WelsMultipleEventsWaitSingleBlocking usable on unix as well	2014-03-07 16:47:38 +08:00
Licai Guo	e5f36822a9	Update targets.mk files	2014-03-07 16:22:59 +08:00
Licai Guo	71467f948a	Move mc_neon.S to common, add MC arm code to encoder	2014-03-07 12:18:58 +08:00
volvet	14f5518e6a	Merge pull request #437 from mstorsjo/fix-arm-encoder-android Fix building arm encoder assembly for android	2014-03-07 10:41:34 +08:00
volvet	b3fa8dd334	Merge pull request #418 from mstorsjo/ios-neon-detection Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS	2014-03-07 09:15:17 +08:00
Martin Storsjö	11bdebb12c	Explicitly enable the UAL syntax when using gnu tools Arm assembly has two variants of the syntax, the old legacy syntax, and the new modern UAL (unified assembly language) syntax. Most arm assembly is the same in both syntaxes, but some uncommon cases change the order of suffixes: the "subscs" instruction would be written "subcss" in the old syntax. The apple tools default to UAL, while the GNU tools (e.g. in android) require you to specify ".syntax unified" to enable the new syntax. When enabling the new syntax with the GNU tools, some cases of "sub r0, r1, lsl #1" need to be written explicitly as "sub r0, r0, r1, lsl #1", handled in the previous commit. This allows using the same, modern syntax for things like subscs, without needing to have two alternate forms of writing it.	2014-03-06 16:21:54 +02:00
Martin Storsjö	c0043f7053	Use the three-operand form of add/sub with shift When using unified syntax, the two operand form with a shift isn't allowed.	2014-03-06 16:21:54 +02:00
Martin Storsjö	4e4bfcc1bc	Regenerate makefiles to include the encoder arm assembly	2014-03-06 16:11:54 +02:00
Martin Storsjö	45e059ec5f	Rename expand_picture.S to expand_picture_neon.S This avoids ambiguity in the make based build system about whether expand_picture.o should be built from expand_picture.S or expand_picture.asm.	2014-03-06 16:11:40 +02:00
Martin Storsjö	276b585f03	Use the cpu-features NDK library for detecting the number of cores in WelsThreadLib On arm, the exact same detection is done in WelsCPUFeatureDetect, but in the x86 version of that function we use x86 cpuid for getting the core count, and this is not available on all processors. For the case when cpuid can't tell the core count, use the NDK function as a higher level API. The thread lib itself doesn't build properly on android yet, but will do so soon.	2014-03-06 15:28:59 +02:00
Martin Storsjö	d0a81355b0	Add support for using a separate "master event" in WelsMultipleEventsWait*Blocking This allows making the WelsMultipleEventsWaitSingleBlocking function work properly in unix, without polling. If a master event is provided, the function first waits for a signal on that event; once such a signal is received, it is assumed that one of the individual events in the list has been signalled as well. Then the function can proceed to check each of the semaphores in the list using sem_trywait to find the first one of them that has been signalled. Assuming that the master event is signalled in pair with the other events, one of the sem_trywait calls should succeed. The same master event is also used in WelsMultipleEventsWaitAllBlocking, to keep the semaphore values in sync across calls to both functions.	2014-03-06 15:03:59 +02:00
Martin Storsjö	de32455d87	Remove the timeout parameter from WelsMultipleEventsWaitSingleBlocking All users of the function passed the value corresponding to "infinite", and the (currently unused) unix implementation of it only supported infinite wait as well.	2014-03-06 15:03:59 +02:00
volvet	8cc332dea1	Merge pull request #432 from zhilwang/arm-asm Arm asm	2014-03-06 16:50:56 +08:00
volvet	8beb3c8c09	Merge pull request #417 from mstorsjo/unify-event-init Unify the interface for creating/deleting event objects	2014-03-06 09:13:13 +08:00
volvet	97376c6339	Merge pull request #413 from mstorsjo/remove-commented-code Remove commented out, unused code	2014-03-05 22:13:35 +08:00
volvet	7ea70491c8	Merge pull request #411 from mstorsjo/arm-add-func-markers Add .func/.endfunc markers in the arm assembly	2014-03-05 17:40:18 +08:00
Martin Storsjö	f384dde881	Add .func/.endfunc markers in the arm assembly This adds information to debug builds. This requires adding a separate definition of WELS_ASM_FUNC_END for apple tools.	2014-03-05 11:25:51 +02:00
Licai Guo	e7cc8c2780	Add arm asm code for processing.	2014-03-05 16:54:05 +08:00
Martin Storsjö	ef7e05d47d	Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS This avoids having to hardcode the names of devices that don't support neon. The devices that don't support neon don't run the armv7 variants of iOS binaries at all - they would need to be built for the armv6 architecture. (Building for armv6 isn't supported at all in modern iOS SDKs.) Therefore we can simply use the __ARM_NEON__ built-in compiler define to check if NEON code is allowed in the current build, and have the WelsCPUFeatureDetect function return flags accordingly. The only thing this disallows is doing an armv6 build which would optionally enable neon code at runtime if run on an armv7 capable device, but since Apple allows you to build the same binary for armv7 separately in the same app bundle, and since armv6 building isn't even possible in the current iOS SDKs, this isn't really a loss. This is in contrast to the android builds where the armv7 baseline does not include NEON.	2014-03-05 09:47:05 +02:00
Martin Storsjö	4814d5828d	Use unnamed semaphores on linux This avoids the risk of namespace collisions for named semaphores (where the names are global for the whole machine), on platforms where we strictly don't need to use the named semaphores.	2014-03-05 09:36:46 +02:00
Martin Storsjö	5480ffafdf	Use the WelsEventOpen interface with an event name on windows as well This unifies the event creation interface, even if the event name itself is unused on windows, allowing the exact same code to be used to initialize events regardless of the actual platform. Some ifdefs still remain in the event initialization code, since some events are only used on windows.	2014-03-05 09:36:04 +02:00
Martin Storsjö	04917cd13f	Remove commented out, unused code A few lines of commented out code that might be useful for debugging are left.	2014-03-05 08:50:59 +02:00
volvet	adb27ff0b1	Merge pull request #405 from mstorsjo/simplify-threads Adjust WELS_EVENT definitions to allow sharing more code between unix and win32 codepaths	2014-03-05 12:31:15 +08:00
Licai Guo	bb244d736b	Partly add arm asm code to encoder.	2014-03-05 10:24:05 +08:00
Martin Storsjö	dae8f4b737	Exclude the arm assembly header as well This avoids warnings about object files not containing any symbols.	2014-03-04 23:23:19 +02:00
Ethan Hugg	975a3e41bc	Merge pull request #404 from mstorsjo/arm-asm-type-func Mark the arm asm labels as functions	2014-03-04 10:17:07 -08:00
Ethan Hugg	01a2f582c3	Merge pull request #401 from mstorsjo/android-arm-assembly Enable the arm assembly in android builds	2014-03-04 09:50:07 -08:00
volvet	e61bd1b504	Merge pull request #408 from mstorsjo/exclude-asm-headers Exclude assembly files that are used as headers	2014-03-04 21:07:42 +08:00
volvet	bc9ee5b145	Merge pull request #406 from mstorsjo/use-proper-define Use the windows INFINITE define instead of manually casting -1 to uint32_t	2014-03-04 20:57:57 +08:00
Martin Storsjö	773cc4a797	Exclude assembly files that are used as headers This avoids some warnings about object files not containing any symbols.	2014-03-04 14:57:36 +02:00
Martin Storsjö	42592217c2	Use the windows INFINITE define instead of manually casting -1 to uint32_t	2014-03-04 14:47:25 +02:00
Martin Storsjö	71bc52d103	Change the unix version of WELS_EVENT to sem_t* Typedeffing WELS_EVENT as sem_t* makes the typedef behave similarly to the windows version (typedeffed as HANDLE), unifying the code that allocates and uses these event objects (getting rid of most of the need for separate codepaths and ifdefs).	2014-03-04 12:17:32 +02:00
Martin Storsjö	e20930ef73	Mark the arm asm labels as functions This fixes calling them from thumb code, on linux.	2014-03-04 11:23:04 +02:00
Martin Storsjö	d411d768b6	Add cpu feature detection for generic arm/linux For platforms without runtime detection, assume whoever built it with HAVE_NEON actually wanted it.	2014-03-04 10:18:30 +02:00
Martin Storsjö	9cf34e7615	Unify the interface for the different variants of WelsCPUFeatureDetect The caller of the function should not need to know exactly which implementation of it is being used. For the variants that don't support detecting the number of cores, the pNumberOfLogicProcessors parameter can be left untouched and the caller will use a higher level API for finding it out. This simplifies all the calling code, and simplifies adding more implementations of cpu feature detection.	2014-03-04 10:18:30 +02:00
volvet	4d729c9418	get cpu cores for android	2014-03-04 15:50:41 +08:00
Martin Storsjö	1118dd4f71	Update the makefile generator to support .S arm assembly files These are built if ASM_ARCH is set to arm.	2014-03-04 08:56:42 +02:00
Martin Storsjö	03e0dcd814	Convert the arm assembly sources to unix newlines	2014-03-04 08:42:28 +02:00
volvet	d9a02d27f2	remove execute attribute of the arm asm files	2014-03-04 11:45:10 +08:00
volvet	f8b0cec68d	Merge pull request #387 from zhilwang/arm-asm Arm asm	2014-03-04 11:08:17 +08:00
volvet	901b89f7ad	Merge pull request #376 from mstorsjo/simplify-x86-asm-makefiles Simplify makefiles with respect to x86 assembly	2014-03-04 10:16:01 +08:00
Ethan Hugg	1eb688264b	Merge pull request #395 from mstorsjo/printf-64bit-macro Use a standard macro for 64 bit printf conversion specifiers	2014-03-03 09:11:51 -08:00
Martin Storsjö	e0951599ea	Unify ifdef conditions related to threading code The two different variants of the threadlib basically are win32 and unix - use _WIN32 to check for this consistently, instead of occasionally using __GNUC__ to enable the unix codepath. (__GNUC__ is also defined on mingw, which still is a windows platform and should use the _WIN32 code.)	2014-03-03 14:55:53 +02:00
Martin Storsjö	3c7dde97ee	Use a standard macro for 64 bit printf conversion specifiers This avoids duplicating the printf line with an ifdef every time a 64 bit number needs to be printed.	2014-03-03 12:33:34 +02:00
Licai Guo	21d6b3481f	Remove trailing space.	2014-03-03 16:05:07 +08:00
Licai Guo	7768cd0a98	Modify code style, remove trailing space.	2014-03-03 15:42:01 +08:00
volvet	a3f129d8cd	Merge pull request #382 from mstorsjo/avoid-overflow-in-timespec Avoid overflow when populating a struct timespec	2014-03-03 09:05:27 +08:00
volvet	3a602a382b	Merge pull request #379 from mstorsjo/simplify-emms-calling Provide a no-op WelsEmms macro if X86_ASM is disabled	2014-03-03 09:03:41 +08:00
volvet	6c41cccb81	Merge pull request #377 from mstorsjo/threadlib-const-str Add const to string parameters in WelsThreadLib	2014-03-03 08:53:00 +08:00
volvet	8c7e0a6ac6	Merge pull request #381 from mstorsjo/clarify-threading-comment Clarify a comment in the threading code	2014-03-03 08:52:10 +08:00
Martin Storsjö	a96d83e762	Remove the broken WelsEventReset function This function didn't work properly with named semaphores, which are used in the unix codepaths. Since it's unused, just remove it instead.	2014-03-03 00:00:17 +02:00
Martin Storsjö	b7db015a8c	Avoid overflow when populating a struct timespec When adding the (dwMilliseconds % 1000) * 1000000 part to ts.tv_nsec, the ts.tv_nsec field can grow larger than one whole second. Therefore first add all of dwMilliseconds to the tv_nsec field and add all whole seconds to the tv_sec field instead - this way we make sure that the tv_nsec field actually is less than a second.	2014-03-02 23:53:51 +02:00
Martin Storsjö	d2fc2e47f2	Clarify a comment in the threading code Named semaphores are used instead of unnamed semaphores in the unix codepaths, since unnamed semaphores aren't available on OS X.	2014-03-02 23:51:59 +02:00
Martin Storsjö	dd47d4805f	Provide a no-op WelsEmms macro if X86_ASM is disabled This allows always calling this function, reducing the number of ifdefs in the calling code.	2014-03-02 23:46:20 +02:00
Martin Storsjö	8db97925a5	Add const to string parameters in WelsThreadLib	2014-03-02 23:43:45 +02:00
Martin Storsjö	3ccd2ae4cf	Remove a redundant makefile ifdef ASM_ARCH=x86 is only set if USE_ASM is enabled.	2014-03-01 23:56:14 +02:00
Licai Guo	b7a25df13f	Remove deblocking arm asm code to common folder, add cpu detect for arm, clean some code.	2014-02-28 17:08:24 +08:00
Martin Storsjö	7d2c761604	Allow using the USE_ASM makefile variable for architectures other than x86 Add an ASM_ARCH variable which specifies which kind of assembly is supposed to be built.	2014-02-28 10:19:53 +02:00
Licai Guo	0fd9db2878	Add ARM 32bit asm code for decoder.	2014-02-28 13:36:34 +08:00
Martin Storsjö	bb5b3978bf	Use higher level APIs for getting the number of cores if WelsCPUFeatureDetect didn't report anything On processors without HTT, WelsCPUFeatureDetect can't return a number of cores but might still return a nonzero set of CPU feature flags. Previously the nonzero cpu feature flag indicated that cpuid worked and the encoder wouldn't use the higher level API for getting the number of cores, even though the number of cores was left at 1.	2014-02-26 21:43:46 +02:00
sijchen	e45e859473	Squashed merge from writenal_refactor2	2014-02-20 14:50:04 +08:00
Martin Storsjö	3532781556	Mark source parameters to MC functions as const	2014-02-19 10:19:56 +02:00
Martin Storsjö	ce22f84a2b	Regenerate target makefiles after the latest mktargets.py changes	2014-02-12 22:11:05 +02:00
Ethan Hugg	8b8e0d4b3e	Merge pull request #274 from mstorsjo/typedef-cleanup Remove typedefs for standard C++ types	2014-02-10 10:46:12 -08:00
Ethan Hugg	1e549e6f9a	Merge pull request #271 from mstorsjo/merge-asm-headers Merge declarations of shared asm functions into the common library	2014-02-10 10:40:34 -08:00
Ethan Hugg	007f5ba773	Merge pull request #258 from mstorsjo/endian-cleanup Avoid endian-specific code	2014-02-10 10:36:05 -08:00
Martin Storsjö	80862eec77	Use the C++ constants true/false instead of defining our own. TRUE/FALSE has intentionally been left in use for the few platform specific APIs that define these constants themselves and expect them to be used, for consistency.	2014-02-10 08:06:37 +02:00
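
Commit 3cf52554f7 describes the parameter of the register-saving macro as the highest xmm register used plus one. The Win64 ABI treats xmm0-xmm5 as volatile and xmm6-xmm15 as callee-saved. So the number of registers a prologue has to spill follows directly from that parameter. The C sketch below only models this arithmetic. The function names are hypothetical, it is not the macro from the commit, and it ignores stack alignment.

```c
#include <assert.h>

/* Hypothetical helper (not part of openh264): given the macro parameter,
 * i.e. the highest xmm register used plus one, return how many
 * callee-saved registers (xmm6..xmm15 on Win64) must be spilled. */
static int win64_xmm_regs_to_save(int num_xmm_used)
{
    assert(num_xmm_used >= 0 && num_xmm_used <= 16);
    /* xmm0..xmm5 are volatile on Win64; only xmm6 and up need saving. */
    return num_xmm_used > 6 ? num_xmm_used - 6 : 0;
}

/* Each xmm register is 128 bits wide, i.e. 16 bytes of stack space. */
static int win64_xmm_save_bytes(int num_xmm_used)
{
    return win64_xmm_regs_to_save(num_xmm_used) * 16;
}
```

As an example, a function using xmm0-xmm7 (parameter 8) only needs to save xmm6 and xmm7, which takes 32 bytes of stack.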
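
Commits 04f5bcd68d and eb238e6549 concern the SIGN_EXTENSION macro. On x86-64, a 32-bit int argument arrives in a 64-bit register whose upper half is not guaranteed to hold anything meaningful. Assembly that uses such a value as a 64-bit offset must therefore sign-extend it first: movsxd for 32-bit input, movsx for 16-bit input.

The C functions below are a hypothetical model of what those two instructions compute. They are not part of openh264. They assume the usual two's-complement narrowing conversion that mainstream compilers perform.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movsxd r64, r32": keep the low 32 bits of the register and
 * sign-extend them; whatever was in the upper half is discarded. */
static int64_t model_movsxd(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}

/* Model of "movsx r64, r16", used for 16-bit inputs. */
static int64_t model_movsx16(uint64_t reg)
{
    return (int64_t)(int16_t)(uint16_t)reg;
}
```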
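
Commits 811c647c0e and c011890764 rely on a simple register budget. Of the 16 NEON q registers, 12 may be clobbered freely (q0-q3 and q8-q15). The other 4 are callee-saved: q4-q7, i.e. d8-d15 in the ARM procedure call standard.

A function that needs at most 12 registers can therefore be remapped to avoid q4-q7 entirely. One that needs more must push the excess.

The hypothetical C helpers below express that rule over a 16-bit register-usage mask. They illustrate the reasoning and are not tooling from the project.

```c
#include <assert.h>

/* Callee-saved NEON registers under the ARM procedure call standard:
 * q4..q7 (bits 4..7 of the mask). */
#define CALLEE_SAVED_Q_MASK 0x00F0u

/* Count the set bits of a 16-bit register-usage mask. */
static int popcount16(unsigned mask)
{
    int n = 0;
    for (mask &= 0xFFFFu; mask; mask &= mask - 1)
        n++;
    return n;
}

/* How many of q4..q7 a given allocation actually clobbers. */
static int clobbered_callee_saved(unsigned used_mask)
{
    return popcount16(used_mask & CALLEE_SAVED_Q_MASK);
}

/* Minimum number of q4..q7 that must be pushed after optimal remapping:
 * 12 registers may be clobbered freely, so only the excess needs saving. */
static int min_q_regs_to_push(unsigned used_mask)
{
    assert((used_mask & ~0xFFFFu) == 0);
    int n = popcount16(used_mask);
    return n > 12 ? n - 12 : 0;
}
```

The q0-q7 allocation mentioned in 811c647c0e clobbers four callee-saved registers. The remapped q0-q3/q8-q11 allocation clobbers none.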
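
The master-event scheme from d0a81355b0 can be sketched with POSIX semaphores. Each signal posts the individual event and then the master. The waiter blocks only on the master, then uses sem_trywait to find (and consume) one signalled event.

This sketch makes several assumptions:
- It uses unnamed semaphores created with sem_init, which works on Linux but not on OS X (see d2fc2e47f2).
- The function names are hypothetical.
- Error handling, such as retrying sem_wait after EINTR, is omitted.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>

/* Signal one event: post the individual semaphore first, then the master,
 * so a waiter woken by the master is guaranteed to find a posted event. */
static void sketch_event_signal(sem_t *event, sem_t *master)
{
    sem_post(event);
    sem_post(master);
}

/* Block until any of the events is signalled; return its index, or -1. */
static int sketch_wait_single(sem_t *events, size_t count, sem_t *master)
{
    assert(count > 0);
    if (sem_wait(master) != 0)             /* no polling: sleep on the master */
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (sem_trywait(&events[i]) == 0)  /* consumes that event's signal */
            return (int)i;
    }
    return -1;  /* only reachable if events and master got out of sync */
}
```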
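
Commit 3c7dde97ee does not name the macro it adopts. Assuming it refers to the standard <inttypes.h> conversion macros such as PRId64, a 64-bit value can be printed without a per-platform #ifdef, as below. The function name and the message text are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* PRId64 expands to the right conversion for int64_t on each platform
 * (for example "ld" or "lld"), so one format string works everywhere. */
static int format_byte_count(char *buf, size_t size, int64_t bytes)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "encoded %" PRId64 " bytes", bytes);
}
```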
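
Commit b7db015a8c describes a normalization: put the whole timeout into the nanosecond field, then move whole seconds into tv_sec. It can be sketched as follows. The 64-bit intermediate and the function name are assumptions of this sketch, not necessarily what the commit does. The 64-bit intermediate keeps the nanosecond sum from overflowing a 32-bit long.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Add a millisecond timeout to an absolute deadline: put the whole timeout
 * into the nanosecond part (in 64-bit arithmetic, so it cannot overflow),
 * then carry all whole seconds into tv_sec, leaving tv_nsec below 1e9. */
static void sketch_add_timeout_ms(struct timespec *ts, uint32_t ms)
{
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < SKETCH_NSEC_PER_SEC);
    int64_t nsec = (int64_t)ts->tv_nsec + (int64_t)ms * 1000000;
    ts->tv_sec += (time_t)(nsec / SKETCH_NSEC_PER_SEC);
    ts->tv_nsec = (long)(nsec % SKETCH_NSEC_PER_SEC);
}
```

For example, adding 1500 ms to {10 s, 999000000 ns} yields {12 s, 499000000 ns}. tv_nsec stays below one second, as required.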
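
Commits 9cf34e7615 and bb5b3978bf both describe a fallback. A CPU feature detector may report feature flags but leave the core count untouched. In that case the caller should ask a higher level API rather than assume a single core.

The sketch below keys the fallback on the core count rather than on the flags. It uses sysconf(_SC_NPROCESSORS_ONLN), which Linux and macOS provide but POSIX does not mandate. detect_features is a hypothetical stub, not WelsCPUFeatureDetect.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stub: reports some feature flags but, like a detector on a
 * processor without HTT, leaves the core count untouched. */
static uint32_t detect_features(int32_t *num_cores)
{
    (void)num_cores;
    return 0x1u;
}

/* Returns the number of cores, asking the OS when the detector did not
 * fill it in. The decision depends on the count itself, not on whether
 * any feature flags were returned. */
static int32_t get_core_count(void)
{
    int32_t cores = 0;                  /* 0 means "not reported" */
    uint32_t flags = detect_features(&cores);
    (void)flags;                        /* flags say nothing about cores */
    if (cores <= 0) {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        cores = n > 0 ? (int32_t)n : 1;
    }
    assert(cores >= 1);
    return cores;
}
```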

245 Commits