Commit Graph

737 Commits

Author SHA1 Message Date
Martin Storsjö
3cf52554f7 Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64
According to the Win64 ABI, these registers need to be preserved,
and compilers are allowed to rely on their content to stay
available - not only for float usage but for any usage, anywhere,
in the calling C++ code.

This adds a macro which pushes the clobbered registers onto the
stack if targeting win64 (and a matching one which restores them).
The parameter to the macro is the number of xmm registers used
(e.g. if using xmm0 - xmm7, the parameter is 8), or in other
words, the number of the highest xmm register used plus one.

This is similar to how the same issue is handled for the NEON
registers q4-q7 with the vpush instruction, except that they needed
to be preserved on all platforms, not only on one particular platform.

This allows removing the XMMREG_PROTECT_* hacks, which can
easily fail if the compiler chooses to use the callee saved
xmm registers in an unexpected spot.
2014-03-17 13:44:33 +02:00
Martin Storsjö
f96918283f Remove commented out code for old, 32-bit only x86 assembly function prologues/epilogues 2014-03-17 11:20:11 +02:00
Licai Guo
fc4e0cacec Merge pull request #483 from volvet/develop_b
use large/medium/similar to define scene change result
2014-03-17 16:32:31 +08:00
Licai Guo
b5a4d706b9 Merge pull request #496 from mstorsjo/use-sign-extend-macro
Use the SIGN_EXTENSION macro where possible
2014-03-17 16:31:03 +08:00
Licai Guo
1c0ba88b0e Merge pull request #501 from mstorsjo/neon-register-backup
Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon
2014-03-17 14:05:23 +08:00
Licai Guo
2c796337ba Merge pull request #510 from huili2/remove_basemb
remove BASE_MB related code
2014-03-17 08:46:25 +08:00
Martin Storsjö
eb238e6549 Use the SIGN_EXTENSION macro where possible
This shortens the x86 assembly by 134 lines in total.
2014-03-16 17:54:24 +02:00
Martin Storsjö
91e5838621 Indent all WELS_ASM_FUNC_BEGIN properly
By having all of them start at the start of the line, the code
is more consistent and readable.
2014-03-16 12:01:54 +02:00
volvet
e654bf6b7f Merge pull request #490 from ruil2/encoder_slice_auto
fix dump file issue
2014-03-16 15:41:26 +08:00
volvet
1c3e87dd3f Merge pull request #505 from mstorsjo/x86-asm-constants
Format x86 assembly constants in a yasm compatible manner
2014-03-16 15:37:16 +08:00
volvet
ac990fdc38 Merge pull request #507 from mstorsjo/more-x86-asm-args
Add defines for argument 11 and 12 in asm_inc
2014-03-16 10:21:35 +08:00
volvet
e606bae0e9 Merge pull request #504 from mstorsjo/fix-function-name-typo
Fix a typo, Smple -> Sample
2014-03-16 10:18:03 +08:00
Martin Storsjö
d45e624cd4 Simplify code by using the arg11 and arg12 defines 2014-03-15 14:42:27 +02:00
Martin Storsjö
c82f548e6f Add defines of arg11 and arg12 in asm_inc.asm 2014-03-15 14:42:07 +02:00
Martin Storsjö
ce8da27287 Format x86 assembly constants in a yasm compatible manner
This is similar to what was done in a6463be0cc - yasm requires
these constants to have a zero after $.
2014-03-15 14:38:28 +02:00
Licai Guo
6f2b98975e Merge pull request #502 from mstorsjo/fix-macro-indentation
Fix the indentation of some nasm macros
2014-03-15 07:16:37 +08:00
Martin Storsjö
f4fdb15397 Fix a typo, Smple -> Sample 2014-03-14 23:30:09 +02:00
Martin Storsjö
4d120781c1 Fix the indentation of some nasm macros 2014-03-14 22:26:33 +02:00
Martin Storsjö
cde30c155b Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon
Remap q5 to q8, q6 to q9, q7 to q10 and q8 to q11, and push
q4 to the stack.

This was missed previously since the codec unittest doesn't
test encoding with loop filter enabled yet.
2014-03-14 22:22:28 +02:00
Martin Storsjö
b3d04d88a0 Check for the right function pointer
This code checked whether one function pointer was non-null,
but the went on to call a different function pointer. Check
for the one that actually was called.
2014-03-14 22:20:40 +02:00
volvet
6da9a9e5c8 Merge pull request #489 from sijchen/me_refactor22
refactor ME for easier adding other search methods
2014-03-14 17:53:10 +08:00
huili2
b1f596fd69 remove BASE_MB related code 2014-03-14 02:03:41 -07:00
Martin Storsjö
9199798f22 Fix a typo in a macro name, EXTENTION -> EXTENSION 2014-03-14 10:13:18 +02:00
Martin Storsjö
2bce50283f Fix a mismatched ifdef comment
This is an ifdef block for HAVE_NEON.
2014-03-14 10:01:18 +02:00
unknown
94f8c351ca fix dump file issue 2014-03-14 15:13:24 +08:00
sijchen
6c3d83a8ac refactor ME for easier adding other search methods 2014-03-14 15:04:35 +08:00
Licai Guo
f589c580eb clean redundant checks in decoder 2014-03-13 23:56:54 -07:00
Licai Guo
8492aac917 Merge pull request #486 from huili2/nzc_bug_fix
nzc bug fix and clear
2014-03-14 13:11:45 +08:00
huili2
734e60aeeb add according to review opinion 2014-03-13 20:21:10 -07:00
Licai Guo
5bffb627d6 nzc bug fix and clear 2014-03-13 19:31:28 -07:00
volvet
6714b8ae99 Merge pull request #463 from mstorsjo/dont-clobber-neon-registers
Avoid clobbering the neon registers q4-q7

Review and verified by zhilwang
2014-03-14 10:28:55 +08:00
volvet
fc5c48830a fix the condition of scene change flag and comments 2014-03-14 09:53:24 +08:00
volvet
c8761c08ae use large/medium/similar to define scene change result 2014-03-13 10:43:20 +08:00
volvet
8962b7c98b Merge pull request #482 from sijchen/me_refactor1
mv range setting refactor
2014-03-13 10:21:39 +08:00
sijchen
d809a7981b mv range setting refactor 2014-03-13 10:18:01 +08:00
Licai Guo
3d6fdfee3d Merge pull request #480 from volvet/fix-idr-interval-issue
fix idr interval issue
2014-03-12 19:23:51 +08:00
Martin Storsjö
efe32b7900 Make arm assembly labels always start from the beginning of the line
A few labels were misformatted.
2014-03-12 12:01:01 +02:00
volvet
8b907c18fd fix idr interval issue 2014-03-12 17:38:25 +08:00
sijchen
cf37fa3ef4 Merge pull request #476 from ruil2/encoder_slice_auto
modify the parameter verification for SM_AUTO_SLICE mode, review at: https://rbcommons.com/s/OpenH264/r/184/
2014-03-12 16:41:57 +08:00
volvet
ce448a21d7 fix double free crash in encoder 2014-03-12 16:21:33 +08:00
volvet
4d74beb928 fix encode crash 2014-03-12 15:46:13 +08:00
ruil2
c7f2a0b7f6 3Author: ruil2 <ruil2@cisco.com>
modify the parameter verification for SM_AUTO_SLICE mode -- uiSliceNum
 iis ignored
2014-03-12 10:44:13 +08:00
Ethan Hugg
cece1fe2cf Merge pull request #473 from mstorsjo/arm-non-executable-stack
Mark the stack as non-executable in the arm assembly
2014-03-11 09:40:49 -07:00
Martin Storsjö
dbdf8fbe9d Remove something that looks like a personal todo note from the Android makefiles 2014-03-11 14:27:14 +02:00
Martin Storsjö
52e8973869 Mark the stack as non-executable in the arm assembly
Otherwise the linker is forced to enable an executable stack for
executables that the code is linked into.
2014-03-11 14:24:16 +02:00
Licai Guo
b773ec60ab Merge pull request #472 from mstorsjo/android-remove-mkdir-workaround
Remove a dubious/unnecessary workaround for an issue in a nonstandard toolchain
2014-03-11 17:30:00 +08:00
Martin Storsjö
966cc97496 Remove a dubious/unnecessary workaround for an issue in a nonstandard toolchain
There's no requirement to work with the Intel NDK. The NDK build
files don't even need to create any cpufeatures subdirectory since
all of the cpufeatures library is handled within the normal build
system. Finally, the encoder makefile tried to create a directory
for "welsdecdemo", not anything for "welsencdemo".

Thus, if this really was necessary, it would already have been noticed
that it missed an entry for "welsencdemo". Therefore, remove this
hack/workaround.
2014-03-11 10:09:17 +02:00
ruil2
7c8ce799c0 fix SM_FIXEDSLCNUM_SLICE bug, add SM_AUTO_SLICE mode 2014-03-11 10:23:46 +08:00
Martin Storsjö
c011890764 Push clobbered neon registers on the stack
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use more than 12 of the qX registers must push
the clobbered registers on the stack in order to be able to restore them
afterwards.

In functions that don't use all 16 registers, but clobber some of
the callee saved registers q4-q7, one or more of them are remapped
to reduce the number of registers that have to be saved/restored.

This incurs a very small (around 0.5%) slowdown in the decoder and
encoder.
2014-03-10 22:07:36 +02:00
Martin Storsjö
811c647c0e Remap registers to avoid clobbering the neon registers q4-q7
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use 12 or less of the qX registers can avoid
violating the calling convention by simply using other registers instead
of the callee saved registers q4-q7.

This change only remaps the registers used within functions - therefore
this does not affect performance at all. E.g. in functions using
registers q0-q7, we now use q0-q3 and q8-q11 instead.
2014-03-10 22:07:25 +02:00