Commit Graph

317 Commits

Author SHA1 Message Date
Martin Storsjö
5e22d5366e Remove the unused level parameter to welsStderrLevelTrace 2014-06-11 08:08:29 +03:00
Martin Storsjö
cfc9367610 Remove WelsStderrSetTraceLevel
The logging level is checked in welsCodecTrace anyway.

Previously, error logging wasn't ever shown if the trace
level was set to WELS_LOG_ERROR (as it was by default),
since welsStderrLevelTrace required the message level to
be strictly lower than the trace level.
2014-06-11 08:08:29 +03:00
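The off-by-one described in this message can be sketched as below. This is an illustration only: the enum, its numeric values and both function names are assumptions and are not taken from the library.

```cpp
#include <cassert>

// Hypothetical log levels, ordered from least to most verbose.
// The numeric values are illustrative, not the library's own.
enum LogLevel { kLogQuiet = 0, kLogError = 1, kLogWarning = 2, kLogInfo = 3 };

// Behaviour described as the old one: the message level had to be strictly
// lower than the trace level, so errors were dropped whenever the trace
// level itself was the error level.
bool ShouldLogStrict(int messageLevel, int traceLevel) {
  return messageLevel < traceLevel;
}

// Inclusive comparison: a message is shown when its level does not exceed
// the configured trace level.
bool ShouldLogInclusive(int messageLevel, int traceLevel) {
  return messageLevel <= traceLevel;
}

// Small self-check documenting the difference between the two variants.
void CheckLogLevelComparison() {
  assert(!ShouldLogStrict(kLogError, kLogError));     // error is hidden
  assert(ShouldLogInclusive(kLogError, kLogError));   // error is shown
  assert(!ShouldLogInclusive(kLogInfo, kLogError));   // info stays hidden
}
```

With the strict comparison, a default trace level equal to the error level silently discards every error message, which matches the symptom the commit message reports.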
Martin Storsjö
90be3d8215 Don't treat log levels as a bitmask
All uses of log levels in the library just do a numerical
greater-than comparison between the set log level and the
level of the current message.
2014-06-11 08:08:29 +03:00
Martin Storsjö
4f1ea1c4f8 Remove some unused typedefs 2014-06-11 08:08:29 +03:00
Martin Storsjö
cc65a1d76c Don't include a [ENCODER]: prefix in all logging
The same trace module is used for the decoder now as well.
2014-06-10 10:52:26 +03:00
Martin Storsjö
17fc6bd66e Remove the unnecessary method WelsTraceModuleIsExist(), which always returned true 2014-06-10 10:52:26 +03:00
Martin Storsjö
d93488448e Remove some commented out lines 2014-06-10 10:52:26 +03:00
Martin Storsjö
968d87045d Remove an unnecessary local function 2014-06-10 10:52:26 +03:00
Martin Storsjö
40af75c19d Remove the unnecessary WelsSet/GetLogLevel functions
Nothing actually used the variable that these functions
handled.
2014-06-10 10:52:06 +03:00
Martin Storsjö
ba1de16ac2 Make internal logging variables static
This avoids polluting the global namespace.
2014-06-10 09:28:45 +03:00
Martin Storsjö
ab4fe3fdf4 Remove an unused variable 2014-06-10 09:27:54 +03:00
dongzhang
0e0c8b5569 add arm 64 deblock code and Unit Test code 2014-06-10 11:23:51 +08:00
ruil2
4c12f8970c cleanup trace module 2014-06-10 10:24:45 +08:00
Martin Storsjö
7bc3e944ad Get rid of uneven spacing after WELS_EXTERN 2014-06-09 11:03:25 +03:00
Martin Storsjö
57f6bcc4b0 Convert all tabs to spaces in assembly sources, unify indentation
Previously the assembly sources had mixed indentation consisting
of both spaces and tabs, making it quite hard to read unless
the right tab size was used in the editor.

Tabs have been interpreted as 4 spaces in most cases, matching
the surrounding code.
2014-06-01 01:35:43 +03:00
Martin Storsjö
faaf62afad Get rid of double spaces in macro declarations 2014-06-01 01:13:01 +03:00
Martin Storsjö
ac03b8b503 Avoid unnecessary tabs in macro declarations 2014-06-01 01:13:01 +03:00
Martin Storsjö
932a38abc0 Reformat the copyright header of deblocking_neon.S
This makes it identical to the ones in the other files.
2014-05-31 13:44:21 +03:00
ruil2
14e5d740cd clean up expand picture. 2014-05-30 11:05:31 +08:00
dongzha
80fdf09b26 Merge pull request #903 from zhilwang/arm64-sad
Add Arm64 sad code
2014-05-30 09:26:04 +08:00
Sijia Chen
7413032185 using WelsRound for all the double-int32_t conversion 2014-05-20 14:06:31 +08:00
zhiliang wang
e6c9eb9824 Add Sad arm64 code 2014-05-14 17:06:48 +08:00
Martin Storsjö
3cc01c6239 Use CCASFLAGS when assembling .S sources
This allows overriding whether all of CFLAGS should be passed
when assembling.
2014-05-13 19:39:26 +03:00
sijchen
31a4d2aa3e Merge pull request #829 from dongzha/FixBugforDeblocking
Fix a bug in deblocking for neon 32 bit arm implementation for master
2014-05-13 17:21:48 +08:00
Martin Storsjö
6b9167199f Use the built-in define __linux__ instead of the manually set LINUX 2014-05-12 12:14:33 +03:00
dongzhang
218adc7e29 Fix a bug in deblocking for neon 32 bit arm implementation 2014-05-09 14:06:16 +08:00
Martin Storsjö
6e715ddc10 Make an endif comment match the actual condition 2014-05-08 11:14:24 +03:00
huili2
5ed24f216b astyle all files 2014-05-05 19:30:21 -07:00
Martin Storsjö
b8eeda1740 Properly back up and restore XMM registers on win64 in WelsSampleSadFour4x4_sse2 2014-05-04 15:47:56 +03:00
Licai Guo
fe5b8d1a69 refine format 2014-05-04 14:51:05 +08:00
Licai Guo
485b2b5b43 Add IntraSad asm code.
Enable intraSad ASM code

Refine format

Add X86_ASM protect for intraSad ASM code UT

remove duplicated code.
2014-05-04 12:12:38 +08:00
Martin Storsjö
23f57adaea Do full register loads instead of single-lane loads in DeblockLumaEq4H_neon
Instead of loading the registers one lane at a time, load full
registers and then transpose them.

This is faster, reducing the runtime for the function from about
506 cycles to 434 cycles (tested on a Cortex A8).

This also avoids an issue which seems like a cpu bug, present
on Sony Xperia T (cpu implementer 0x51 architecture 7 variant 0x1
part 0x04d). On such a device, it seemed like the "vswp q9, q10"
could start executing before the previous
vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1
had finished and written back their result. Changing the
"vswp q9, q10" into "vswp q10, q9", or into separate
"vswp d18, d20; vswp d19, d21" (or the other way around) seemed to
avoid the issue. This happened occasionally (a couple times per
100000 invocations or so).
2014-04-28 10:12:16 +03:00
volvet
c65e286036 Merge pull request #738 from mstorsjo/gnu-aarch64
Fix building the aarch64 assembly using gnu binutils
2014-04-25 09:07:43 +08:00
Martin Storsjö
66f58e8357 Add macros for the non-standard mov.16b/mov.8b/ext.16b/ext.8b
This fixes building with gnu binutils, which don't support this
nonstandard form of the instructions.

Once Apple's tools support the proper standard form of the
instructions, the code should be updated to use that everywhere
instead, and these macros should be removed.
2014-04-23 11:47:12 +03:00
Martin Storsjö
7cd175d097 Use the correct ext syntax in the gnu version of macros 2014-04-23 11:47:12 +03:00
Martin Storsjö
b13a399ab5 Use a plain "ret" instead of "ret lr"
This fixes an issue with assembling with gnu binutils.
2014-04-23 11:47:12 +03:00
Martin Storsjö
f2642b308a Add correct arguments to the gnu version of UNPACK_FILTER_SINGLE_TAG_16BITS 2014-04-23 11:47:12 +03:00
Martin Storsjö
90fad9fd98 Add \() to macro arguments to separate the argument from the following .8h or similar 2014-04-23 11:47:12 +03:00
Martin Storsjö
80bd541cbe Remove .syntax unified from the aarch64 common header
This directive isn't available in aarch64 code, only in arm code.
2014-04-23 11:47:12 +03:00
Martin Storsjö
3c2e9cd7bf Regenerate makefiles to include the new arm64 assembly files 2014-04-23 11:44:47 +03:00
Martin Storsjö
764f787dcb Rename the makefile variable for arm assembly sources
This is in preparation for adding support for the aarch64 assembly
files as well.
2014-04-23 10:55:30 +03:00
Martin Storsjö
a842f14a3c Remove .orig files left over from running astyle 2014-04-23 09:24:23 +03:00
Martin Storsjö
45aef90d26 Remove the executable bit from source files 2014-04-23 09:23:56 +03:00
dongzhang
ad9e2dab4f Add Motion Compensation ARM64 Neon Code 2014-04-23 13:26:28 +08:00
Licai Guo
b47606a4ff Merge pull request #733 from dongzha/ExpandPic_ARM64
Add expand picture support for ARM64 NEON
2014-04-23 09:57:39 +08:00
dongzhang
2444327a6c Add expand picture support for ARM64 NEON
Remove duplicate MACROS
2014-04-23 09:14:32 +08:00
Martin Storsjö
564d16c2ef Make Wels*Snprintf return values be non-negative
This makes sure the windows version of these functions behave
more like the posix version. The posix *snprintf returns how
much would have been written if the buffer had been large
enough, which we don't know easily in the windows versions.

This basically means that we can assume that the return value is
>= 0 now, which can simplify the calling code.
2014-04-21 22:03:20 +03:00
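One way to get a return value that is always >= 0 is to clamp the result of the underlying formatter. The sketch below is an assumption about the general approach, not the actual Wels*Snprintf code; the name ClampedSnprintf is hypothetical.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper, not the library's implementation.
// Returns a non-negative value: the formatted length when the output fits,
// otherwise size - 1 (the most that the buffer can hold).
int ClampedSnprintf(char* buf, std::size_t size, const char* format, ...) {
  assert(buf != nullptr && size > 0);
  va_list args;
  va_start(args, format);
  int written = std::vsnprintf(buf, size, format, args);
  va_end(args);
  // Error or truncation: terminate the buffer and report its capacity,
  // mirroring a formatter that returns -1 when the output does not fit.
  if (written < 0 || static_cast<std::size_t>(written) >= size) {
    buf[size - 1] = '\0';
    written = static_cast<int>(size - 1);
  }
  return written;
}
```

A caller can then advance a write pointer by the return value without first checking for a negative result.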
Licai Guo
3f2ea77908 Merge pull request #719 from dongzha/MC
Modify ARM32 Neon code for Expand Chroma Picture, when UVWidth%16==8.
2014-04-21 14:38:51 +08:00
Licai Guo
039a547804 give accurate align information for mc copy functions
this can improve the performance for target like javascript
2014-04-19 00:33:23 -07:00
Licai Guo
2f8c539e60 Merge pull request #707 from dongzha/FixIssueMcNEON
Fix potential issue for neon implement on encoder mode decision.
2014-04-17 17:26:25 +08:00
dongzhang
a4f59bc0d7 Modify ARM32 Neon code for Expand Chroma Picture, when UVWidth%16==8. 2014-04-17 15:58:30 +08:00
Licai Guo
4062fa9d34 Merge pull request #703 from zhilwang/pf-test
Move copy_mb neon code to common folder
2014-04-17 11:08:56 +08:00
Licai Guo
3d9d00b27c Update targets.mk 2014-04-17 10:43:10 +08:00
Licai Guo
c8e1a41c29 Move copy_mb neon code to common folder 2014-04-17 10:06:48 +08:00
ruil2
b553468ad3 keep the declaration and definition in the same namespace 2014-04-17 09:45:26 +08:00
huili2
4ab8c88e98 divide copy_mb functions into new file for decoder use from encoder and add files for EC in decoder only. 2014-04-14 20:17:41 -07:00
Dong Zhang
8a4300be50 Fix potential issue for neon implement on encoder mode decision.
Error happens when ME_REFINE_BUF_STRIDE is not equal to 32.
2014-04-13 19:41:29 -07:00
Martin Storsjö
b35c21201b Use the Windows Runtime ThreadPool API for creating threads on Windows Phone
Windows Phone lacks the old CreateThread/beginthreadex APIs for
creating threads. (Technically, the functions still do exist,
but they aren't officially supported and aren't visible in the
headers when targeting Windows Phone.)

Building code that uses the Windows Runtime language extensions
requires building with the -ZW option.
2014-04-01 11:18:49 +03:00
Martin Storsjö
f293d26a62 Use more modern versions of functions that don't exist on Windows Phone 2014-04-01 11:18:48 +03:00
Martin Storsjö
4bcb03c5a0 Remove the unused function WelsSleep
Windows Phone 8 doesn't have Sleep(), but there's no need to
use the function at all.
2014-04-01 11:18:48 +03:00
volvet
9f50e0c91e clean multi-threading macro 2014-03-31 18:24:10 -07:00
ruil2
6b3f89d582 move some common functions to common.cpp and add some functions in common 2014-03-25 15:35:55 +08:00
Licai Guo
e39de8d404 reorganize common to inc/src/x86/arm 2014-03-18 19:41:32 -07:00
volvet
7313ecdbd0 Merge pull request #538 from mstorsjo/use-apple-builtin-define
Use __APPLE__ instead of APPLE_IOS for apple/arm specific features
2014-03-19 09:45:56 +08:00
Licai Guo
d897d362ab Merge pull request #532 from huili2/WELS_CLIP1
Modify MACRO WELS_CLIP1 as inline functions
2014-03-19 08:50:04 +08:00
Martin Storsjö
9586c59b9e Use __APPLE__ instead of APPLE_IOS in the arm assembly sources 2014-03-18 23:15:49 +02:00
Martin Storsjö
73ed237d73 Use __APPLE__ instead of APPLE_IOS for using the apple cpu feature detection 2014-03-18 23:15:49 +02:00
Ethan Hugg
197423f271 Merge pull request #520 from ylatuya/master
Fix compiler warnings and remove dead code
2014-03-18 13:28:02 -07:00
Andoni Morales Alastruey
703c69de81 codec: add a new macro for unused functions
Variables used only for tracing logs can trigger
-Werror=unused-variable when tracing is disabled.
This macro helps to silence gcc in those cases.
2014-03-18 19:15:25 +01:00
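A minimal sketch of the pattern this message describes follows. Both macro names are hypothetical stand-ins; the macro the commit actually adds may have a different name and definition.

```cpp
#include <cassert>

// Hypothetical macro name: marks a variable as intentionally unused.
#define EXAMPLE_UNUSED(x) ((void)(x))

// Hypothetical stand-in for a trace call in a build with tracing disabled:
// the argument is discarded without being evaluated.
#define EXAMPLE_TRACE(value) ((void)0)

int Average(const int* data, int count) {
  assert(data != nullptr && count > 0);
  int sum = 0;
  for (int i = 0; i < count; ++i) {
    sum += data[i];
  }
  // Only the trace output reads this variable. With tracing compiled out,
  // gcc would report it as unused unless it is referenced some other way.
  const int remainder = sum % count;
  EXAMPLE_TRACE(remainder);
  EXAMPLE_UNUSED(remainder);
  return sum / count;
}
```

Casting to void references the variable without generating code, so the warning disappears in both the traced and untraced builds.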
Martin Storsjö
e1b5e038d2 Use .obj as suffix for object files on MSVC
This avoids warnings when linking about "unrecognized source file
type, object file assumed".
2014-03-18 19:41:06 +02:00
huili2
090e8cc1ed modify WELS_CLIP1 to be inline functions 2014-03-18 01:54:25 -07:00
volvet
b21411ad7c Merge pull request #511 from mstorsjo/remove-unused-define
Remove the unused FORMAT_COFF define
2014-03-18 16:11:22 +08:00
volvet
fb1958ad13 Merge pull request #519 from mstorsjo/push-xmm-registers
Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64

Reviewed by zhiliang
2014-03-18 15:04:54 +08:00
volvet
b5353c8455 Merge pull request #516 from mstorsjo/fix-yasm-64bit
Fix building with yasm in 64 bit mode
2014-03-18 09:29:42 +08:00
volvet
e75cd2298b Merge pull request #517 from mstorsjo/simplify-x86-asm-func-macro
Fold ALIGN 16 and the function label into WELS_EXTERN
2014-03-18 09:29:17 +08:00
Martin Storsjö
4633626d69 Remove XMMREG_PROTECT
This isn't necessary any longer, when all the assembly routines
take care of restoring registers as necessary.
2014-03-17 13:47:01 +02:00
Martin Storsjö
3cf52554f7 Backup/restore the xmm6-xmm15 SSE registers within asm functions on win64
According to the Win64 ABI, these registers need to be preserved,
and compilers are allowed to rely on their content to stay
available - not only for float usage but for any usage, anywhere,
in the calling C++ code.

This adds a macro which pushes the clobbered registers onto the
stack if targeting win64 (and a matching one which restores them).
The parameter to the macro is the number of xmm registers used
(e.g. if using xmm0 - xmm7, the parameter is 8), or in other
words, the number of the highest xmm register used plus one.

This is similar to how the same issue is handled for the NEON
registers q4-q7 with the vpush instruction, except that they needed
to be preserved on all platforms, not only on one particular platform.

This allows removing the XMMREG_PROTECT_* hacks, which can
easily fail if the compiler chooses to use the callee saved
xmm registers in an unexpected spot.
2014-03-17 13:44:33 +02:00
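The macro parameter convention above implies a simple count of registers that must be saved on win64. The helper below sketches that arithmetic only; its name is hypothetical and it does not reproduce the assembly macro.

```cpp
#include <cassert>

// On win64, xmm6-xmm15 must be preserved by the callee, as the commit
// message states; xmm0-xmm5 may be clobbered freely.
constexpr int kFirstCalleeSavedXmm = 6;

// xmmRegsUsed follows the convention in the commit message: the number of
// the highest xmm register used plus one (8 when using xmm0-xmm7).
// Returns how many registers a win64 prologue would have to save.
constexpr int XmmRegsToSaveOnWin64(int xmmRegsUsed) {
  return xmmRegsUsed > kFirstCalleeSavedXmm
             ? xmmRegsUsed - kFirstCalleeSavedXmm
             : 0;
}

static_assert(XmmRegsToSaveOnWin64(8) == 2, "xmm6 and xmm7 need saving");
static_assert(XmmRegsToSaveOnWin64(6) == 0, "xmm0-xmm5 need no saving");
static_assert(XmmRegsToSaveOnWin64(16) == 10, "xmm6-xmm15 need saving");
```

A function that stays within xmm0-xmm5 therefore pays no extra cost, while one using all sixteen registers has ten to save and restore.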
Martin Storsjö
9293f2f947 Remove commented out rodata sections and tables in assembly files 2014-03-17 13:42:18 +02:00
Martin Storsjö
eec968234d Fold ALIGN 16 and the function label into WELS_EXTERN
This simplifies the structure for all x86 assembly functions,
reducing the amount of duplicated code structure.
2014-03-17 13:35:00 +02:00
Martin Storsjö
04f5bcd68d Use movsxd in SIGN_EXTENSION
This is what nasm ended up assembling movsx with 32 bit input to
anyway.

Keep using plain movsx for 16 bit input.

This fixes building with yasm in 64 bit mode.
2014-03-17 13:26:46 +02:00
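For readers less familiar with the instruction, the effect of sign-extending a 32-bit value to 64 bits can be shown in C++. This is a sketch of the semantics only, with hypothetical function names; it says nothing about how the SIGN_EXTENSION macro itself is written.

```cpp
#include <cassert>
#include <cstdint>

// Widening a signed 32-bit value to 64 bits preserves its sign. On x86-64,
// compilers typically emit movsxd for this conversion.
int64_t SignExtend32To64(int32_t value) {
  return static_cast<int64_t>(value);
}

// Zero extension, shown for contrast: the upper 32 bits are cleared, so a
// negative input becomes a large positive number.
int64_t ZeroExtend32To64(int32_t value) {
  return static_cast<int64_t>(static_cast<uint32_t>(value));
}

void CheckExtension() {
  assert(SignExtend32To64(-1) == -1);
  assert(ZeroExtend32To64(-1) == 4294967295LL);
}
```

The distinction matters for 32-bit arguments that can be negative, since using one as a 64-bit offset without sign extension would address the wrong memory.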
Martin Storsjö
f96918283f Remove commented out code for old, 32-bit only x86 assembly function prologues/epilogues 2014-03-17 11:20:11 +02:00
Licai Guo
b5a4d706b9 Merge pull request #496 from mstorsjo/use-sign-extend-macro
Use the SIGN_EXTENSION macro where possible
2014-03-17 16:31:03 +08:00
Licai Guo
1c0ba88b0e Merge pull request #501 from mstorsjo/neon-register-backup
Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon
2014-03-17 14:05:23 +08:00
Martin Storsjö
fc260b39e0 Remove the unused FORMAT_COFF define
Nothing in the project currently sets FORMAT_COFF - the other generic
branch works just fine on windows.
2014-03-16 17:54:55 +02:00
Martin Storsjö
eb238e6549 Use the SIGN_EXTENSION macro where possible
This shortens the x86 assembly by 134 lines in total.
2014-03-16 17:54:24 +02:00
Martin Storsjö
91e5838621 Indent all WELS_ASM_FUNC_BEGIN properly
By having all of them start at the start of the line, the code
is more consistent and readable.
2014-03-16 12:01:54 +02:00
Martin Storsjö
c82f548e6f Add defines of arg11 and arg12 in asm_inc.asm 2014-03-15 14:42:07 +02:00
Martin Storsjö
cde30c155b Avoid clobbering the registers q4-q7 in DeblockingBSCalcEnc_neon
Remap q5 to q8, q6 to q9, q7 to q10 and q8 to q11, and push
q4 to the stack.

This was missed previously since the codec unittest doesn't
test encoding with loop filter enabled yet.
2014-03-14 22:22:28 +02:00
Martin Storsjö
9199798f22 Fix a typo in a macro name, EXTENTION -> EXTENSION 2014-03-14 10:13:18 +02:00
volvet
6714b8ae99 Merge pull request #463 from mstorsjo/dont-clobber-neon-registers
Avoid clobbering the neon registers q4-q7

Review and verified by zhilwang
2014-03-14 10:28:55 +08:00
Martin Storsjö
efe32b7900 Make arm assembly labels always start from the beginning of the line
A few labels were misformatted.
2014-03-12 12:01:01 +02:00
Martin Storsjö
52e8973869 Mark the stack as non-executable in the arm assembly
Otherwise the linker is forced to enable an executable stack for
executables that the code is linked into.
2014-03-11 14:24:16 +02:00
Martin Storsjö
c011890764 Push clobbered neon registers on the stack
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use more than 12 of the qX registers must push
the clobbered registers on the stack in order to be able to restore them
afterwards.

In functions that don't use all 16 registers, but clobber some of
the callee saved registers q4-q7, one or more of them are remapped
to reduce the number of registers that have to be saved/restored.

This incurs a very small (around 0.5%) slowdown in the decoder and
encoder.
2014-03-10 22:07:36 +02:00
Martin Storsjö
811c647c0e Remap registers to avoid clobbering the neon registers q4-q7
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use 12 or less of the qX registers can avoid
violating the calling convention by simply using other registers instead
of the callee saved registers q4-q7.

This change only remaps the registers used within functions - therefore
this does not affect performance at all. E.g. in functions using
registers q0-q7, we now use q0-q3 and q8-q11 instead.
2014-03-10 22:07:25 +02:00
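The q0-q7 example in this message corresponds to the index mapping sketched below. The helper is hypothetical and purely illustrative: the commit edits register names in assembly by hand, and individual functions may use a different mapping.

```cpp
#include <cassert>

// Hypothetical helper: maps the register index a function originally used
// to the index it would use after remapping. q4-q7 are callee-saved, so a
// function needing at most 12 registers can use q0-q3 and q8-q15 instead.
int RemapNeonRegister(int originalIndex) {
  assert(originalIndex >= 0 && originalIndex < 12);
  // q0-q3 stay in place; q4 and above shift past the callee-saved block.
  return originalIndex < 4 ? originalIndex : originalIndex + 4;
}

void CheckRemap() {
  assert(RemapNeonRegister(0) == 0);   // q0 stays q0
  assert(RemapNeonRegister(4) == 8);   // q4 becomes q8
  assert(RemapNeonRegister(7) == 11);  // q7 becomes q11
}
```

Because only the register names change and no instructions are added, this kind of remapping has no runtime cost, which is the point the message makes about performance.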
Ethan Hugg
3627875986 Merge pull request #456 from mstorsjo/use-common-threadlib
Make the processing lib use mutexes from WelsThreadLib from the common library
2014-03-10 09:45:51 -07:00
ruil2
44a49b1fef Merge pull request #458 from mstorsjo/android-threading
Don't try to set thread scope and scheduling policy on android
2014-03-10 17:26:00 +08:00
ruil2
2539d6e447 Merge pull request #462 from mstorsjo/fix-typos
Fix two typos in variable and macro names
2014-03-10 15:25:20 +08:00
Martin Storsjö
cc7b81f3c3 Fix a typo in arm assembly, LORD -> LOAD 2014-03-09 19:19:38 +02:00
Martin Storsjö
8d6b368a1c Remove unnecessary stray __cdecl annotations in function signature comments in x86 assembly 2014-03-09 19:18:02 +02:00
Martin Storsjö
1c6a910c11 Don't try to set thread scope and scheduling policy on android
These APIs aren't implemented on android.
2014-03-08 20:37:42 +02:00