Commit Graph

197 Commits

Author SHA1 Message Date
huili2
090e8cc1ed modify WELS_CLIP1 to be inline functions 2014-03-18 01:54:25 -07:00
volvet
b21411ad7c Merge pull request #511 from mstorsjo/remove-unused-define
Remove the unused FORMAT_COFF define
2014-03-18 16:11:22 +08:00
volvet
fb1958ad13 Merge pull request #519 from mstorsjo/push-xmm-registers
Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64

Reviewed by zhiliang
2014-03-18 15:04:54 +08:00
volvet
b5353c8455 Merge pull request #516 from mstorsjo/fix-yasm-64bit
Fix building with yasm in 64 bit mode
2014-03-18 09:29:42 +08:00
volvet
e75cd2298b Merge pull request #517 from mstorsjo/simplify-x86-asm-func-macro
Fold ALIGN 16 and the function label into WELS_EXTERN
2014-03-18 09:29:17 +08:00
Martin Storsjö
4633626d69 Remove XMMREG_PROTECT
This isn't necessary any longer, when all the assembly routines
take care of restoring registers as necessary.
2014-03-17 13:47:01 +02:00
Martin Storsjö
3cf52554f7 Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64
According to the Win64 ABI, these registers need to be preserved,
and compilers are allowed to rely on their content to stay
available - not only for float usage but for any usage, anywhere,
in the calling C++ code.

This adds a macro which pushes the clobbered registers onto the
stack if targeting win64 (and a matching one which restores them).
The parameter to the macro is the number of xmm registers used
(e.g. if using xmm0 - xmm7, the parameter is 8), or in other
words, the number of the highest xmm register used plus one.

This is similar to how the same issue is handled for the NEON
registers q4-q7 with the vpush instruction, except that they needed
to be preserved on all platforms, not only on one particular platform.

This allows removing the XMMREG_PROTECT_* hacks, which can
easily fail if the compiler chooses to use the callee saved
xmm registers in an unexpected spot.
2014-03-17 13:44:33 +02:00
Martin Storsjö
9293f2f947 Remove commented out rodata sections and tables in assembly files 2014-03-17 13:42:18 +02:00
Martin Storsjö
eec968234d Fold ALIGN 16 and the function label into WELS_EXTERN
This simplifies the structure for all x86 assembly functions,
reducing the amount of duplicated code structure.
2014-03-17 13:35:00 +02:00
Martin Storsjö
04f5bcd68d Use movsxd in SIGN_EXTENSION
This is what nasm ended up assembling movsx with 32 bit input to
anyway.

Keep using plain movsx for 16 bit input.

This fixes building with yasm in 64 bit mode.
2014-03-17 13:26:46 +02:00
Martin Storsjö
f96918283f Remove commented out code for old, 32-bit only x86 assembly function prologues/epilogues 2014-03-17 11:20:11 +02:00
Licai Guo
b5a4d706b9 Merge pull request #496 from mstorsjo/use-sign-extend-macro
Use the SIGN_EXTENSION macro where possible
2014-03-17 16:31:03 +08:00
Licai Guo
1c0ba88b0e Merge pull request #501 from mstorsjo/neon-register-backup
Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon
2014-03-17 14:05:23 +08:00
Martin Storsjö
fc260b39e0 Remove the unused FORMAT_COFF define
Nothing in the project currently sets FORMAT_COFF - the other generic
branch works just fine on windows.
2014-03-16 17:54:55 +02:00
Martin Storsjö
eb238e6549 Use the SIGN_EXTENSION macro where possible
This shortens the x86 assembly by 134 lines in total.
2014-03-16 17:54:24 +02:00
Martin Storsjö
91e5838621 Indent all WELS_ASM_FUNC_BEGIN properly
By having all of them start at the start of the line, the code
is more consistent and readable.
2014-03-16 12:01:54 +02:00
Martin Storsjö
c82f548e6f Add defines of arg11 and arg12 in asm_inc.asm 2014-03-15 14:42:07 +02:00
Martin Storsjö
cde30c155b Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon
Remap q5 to q8, q6 to q9, q7 to q10 and q8 to q11, and push
q4 to the stack.

This was missed previously since the codec unittest doesn't
test encoding with loop filter enabled yet.
2014-03-14 22:22:28 +02:00
Martin Storsjö
9199798f22 Fix a typo in a macro name, EXTENTION -> EXTENSION 2014-03-14 10:13:18 +02:00
volvet
6714b8ae99 Merge pull request #463 from mstorsjo/dont-clobber-neon-registers
Avoid clobbering the neon registers q4-q7

Review and verified by zhilwang
2014-03-14 10:28:55 +08:00
Martin Storsjö
efe32b7900 Make arm assembly labels always start from the beginning of the line
A few labels were misformatted.
2014-03-12 12:01:01 +02:00
Martin Storsjö
52e8973869 Mark the stack as non-executable in the arm assembly
Otherwise the linker is forced to enable an executable stack for
executables that the code is linked into.
2014-03-11 14:24:16 +02:00
Martin Storsjö
c011890764 Push clobbered neon registers on the stack
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use more than 12 of the qX registers must push
the clobbered registers on the stack in order to be able to restore them
afterwards.

In functions that don't use all 16 registers, but clobber some of
the callee saved registers q4-q7, one or more of them are remapped
to reduce the number of registers that have to be saved/restored.

This incurs a very small (around 0.5%) slowdown in the decoder and
encoder.
2014-03-10 22:07:36 +02:00
Martin Storsjö
811c647c0e Remap registers to avoid clobbering the neon registers q4-q7
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use 12 or less of the qX registers can avoid
violating the calling convention by simply using other registers instead
of the callee saved registers q4-q7.

This change only remaps the registers used within functions - therefore
this does not affect performance at all. E.g. in functions using
registers q0-q7, we now use q0-q3 and q8-q11 instead.
2014-03-10 22:07:25 +02:00
Ethan Hugg
3627875986 Merge pull request #456 from mstorsjo/use-common-threadlib
Make the processing lib use mutexes from WelsThreadLib from the common library
2014-03-10 09:45:51 -07:00
ruil2
44a49b1fef Merge pull request #458 from mstorsjo/android-threading
Don't try to set thread scope and scheduling policy on android
2014-03-10 17:26:00 +08:00
ruil2
2539d6e447 Merge pull request #462 from mstorsjo/fix-typos
Fix two typos in variable and macro names
2014-03-10 15:25:20 +08:00
Martin Storsjö
cc7b81f3c3 Fix a typo in arm assembly, LORD -> LOAD 2014-03-09 19:19:38 +02:00
Martin Storsjö
8d6b368a1c Remove unnecessary stray __cdecl annotations in function signature comments in x86 assembly 2014-03-09 19:18:02 +02:00
Martin Storsjö
1c6a910c11 Don't try to set thread scope and scheduling policy on android
These APIs aren't implemented on android.
2014-03-08 20:37:42 +02:00
Martin Storsjö
c5390521ec Make the processing lib use mutexes from WelsThreadLib from the common library
This requires always building the WelsMutex* functions,
even if MT_ENABLED isn't set.
2014-03-08 12:46:25 +02:00
volvet
355bbacc2d Merge pull request #443 from mstorsjo/rerun-mktargets
Rerun mktargets.sh
2014-03-07 18:25:20 +08:00
Martin Storsjö
64b4556d13 Rerun mktargets.sh
This fixes inconsistent indentation of one line, caused by
manually editing one of the targets.mk files.
2014-03-07 11:30:19 +02:00
Martin Storsjö
5b8ee37162 Merge WelsThreadDestroy into WelsThreadJoin
Now calling WelsThreadJoin is enough to finish and clean up
the thread on all platforms.

This unifies the thread cleanup code between windows and unix.

Now all of the threading code should use the exact same codepaths
between windows and unix.
2014-03-07 10:51:28 +02:00
Martin Storsjö
474deacd7a Remove the now unused thread cancellation support
This makes the thread library build on android - android does
not have pthread_cancel.
2014-03-07 10:51:14 +02:00
volvet
38a3fada24 Merge pull request #435 from mstorsjo/threadlib-wait-single-unix
Make WelsMultipleEventsWaitSingleBlocking usable on unix as well
2014-03-07 16:47:38 +08:00
Licai Guo
e5f36822a9 Update targets.mk files 2014-03-07 16:22:59 +08:00
Licai Guo
71467f948a mv mc_neon.S to common,add MC arm code to encoder 2014-03-07 12:18:58 +08:00
volvet
14f5518e6a Merge pull request #437 from mstorsjo/fix-arm-encoder-android
Fix building arm encoder assembly for android
2014-03-07 10:41:34 +08:00
volvet
b3fa8dd334 Merge pull request #418 from mstorsjo/ios-neon-detection
Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS
2014-03-07 09:15:17 +08:00
Martin Storsjö
11bdebb12c Explicitly enable the UAL syntax when using gnu tools
Arm assembly has got two variants of the syntax, the old legacy
syntax, and the new modern UAL (unified assembly language) syntax.

Most arm assembly is the same in the both syntaxes, but some
uncommon cases change the order of suffixes - the "subscs"
instruction would be written "subcss" in the old syntax.

The apple tools default to UAL, while the GNU tools (e.g. in
android) require you to specify ".syntax unified" to enable the
new syntax. When enabling the new syntax with the GNU tools, some
cases of "sub r0, r1, lsl #1" needs to be written explicitly as
"sub r0, r0, r1, lsl #1", handled in the previous commit.

This allows using the same, modern syntax for things like subscs,
without needing to have two alternate forms of writing it.
2014-03-06 16:21:54 +02:00
Martin Storsjö
c0043f7053 Use the three-operand form of add/sub with shift
When using unified syntax, the two operand form with a shift
isn't allowed.
2014-03-06 16:21:54 +02:00
Martin Storsjö
4e4bfcc1bc Regenerate makefiles to include the encoder arm assembly 2014-03-06 16:11:54 +02:00
Martin Storsjö
45e059ec5f Rename expand_picture.S to expand_picture_neon.S
This avoids ambiguity in the make based build system about
whether expand_picture.o should be built from expand_picture.S
or expand_picture.asm.
2014-03-06 16:11:40 +02:00
Martin Storsjö
276b585f03 Use the cpu-features NDK library for detecting the number of cores in WelsThreadLib
On arm, the exact same detection is done in WelsCPUFeatureDetect,
but in the x86 version of that function we use x86 cpuid for getting
the core count, and this is not available on all processors. For the
case when cpuid can't tell the core count, use the NDK function as
higher level API.

The thread lib itself doesn't build properly on android yet, but will
do so soon.
2014-03-06 15:28:59 +02:00
Martin Storsjö
d0a81355b0 Add support for using a separate "master event" in WelsMultipleEventsWait*Blocking
This allows making the WelsMultipleEventsWaitSingleBlocking
function work properly in unix, without polling. If a master
event is provided, the function first waits for a signal on
that event - once such a signal is received, it is assumed that
one of the individual events in the list have been signalled as
well. Then the function can proceed to check each of the semaphores
in the list using sem_trywait to find the first one of them that
has been signalled. Assuming that the master event is signalled
in pair with the other events, one of the sem_trywait calls
should succeed.

The same master event is also used in
WelsMultipleEventsWaitAllBlocking, to keep the semaphore values
in sync across calls to the both functions.
2014-03-06 15:03:59 +02:00
Martin Storsjö
de32455d87 Remove the timeout parameter from WelsMultipleEventsWaitSingleBlocking
All users of the function passed the value corresponding to
"infinite", and the (currently unused) unix implementation of it
only supported infinite wait as well.
2014-03-06 15:03:59 +02:00
volvet
8cc332dea1 Merge pull request #432 from zhilwang/arm-asm
Arm asm
2014-03-06 16:50:56 +08:00
volvet
8beb3c8c09 Merge pull request #417 from mstorsjo/unify-event-init
Unify the interface for creating/deleting event objects
2014-03-06 09:13:13 +08:00
volvet
97376c6339 Merge pull request #413 from mstorsjo/remove-commented-code
Remove commented out, unused code
2014-03-05 22:13:35 +08:00