Commit Graph

702 Commits

Author SHA1 Message Date
Martin Storsjö
dbc324d5bb Use the iMultipleThreadIdc field from SEncParamExt 2014-03-07 14:47:43 +02:00
volvet
355bbacc2d Merge pull request #443 from mstorsjo/rerun-mktargets
Rerun mktargets.sh
2014-03-07 18:25:20 +08:00
volvet
3382e8f8c4 Merge pull request #441 from mstorsjo/remove-stray-macro-parameters
Remove unused/undeclared arm assembly macro parameters
2014-03-07 18:24:33 +08:00
Martin Storsjö
64b4556d13 Rerun mktargets.sh
This fixes inconsistent indentation of one line, caused by
manually editing one of the targets.mk files.
2014-03-07 11:30:19 +02:00
Martin Storsjö
5b8ee37162 Merge WelsThreadDestroy into WelsThreadJoin
Now calling WelsThreadJoin is enough to finish and clean up
the thread on all platforms.

This unifies the thread cleanup code between windows and unix.

Now all of the threading code should use the exact same codepaths
between windows and unix.
2014-03-07 10:51:28 +02:00
Martin Storsjö
b4aa9be7de Use WelsThreadJoin on windows as well
This avoids using a separate event just for signalling that
a thread has finished running.
2014-03-07 10:51:28 +02:00
Martin Storsjö
474deacd7a Remove the now unused thread cancellation support
This makes the thread library build on android - android does
not have pthread_cancel.
2014-03-07 10:51:14 +02:00
Martin Storsjö
baaa38737e Use pExitEncodeEvent instead of thread cancellation on unix as well
This works now that we've got a suitably working implementation
of WelsMultipleEventsWaitSingleBlocking.
2014-03-07 10:49:39 +02:00
volvet
38a3fada24 Merge pull request #435 from mstorsjo/threadlib-wait-single-unix
Make WelsMultipleEventsWaitSingleBlocking usable on unix as well
2014-03-07 16:47:38 +08:00
Licai Guo
1b9aae8434 Merge pull request #439 from zhilwang/mc-arm-asm
mv mc_neon.S to common,add MC arm code to encoder
2014-03-07 16:36:48 +08:00
ruil2
b3c45946ff modify typing format 2014-03-07 16:29:12 +08:00
Martin Storsjö
c87bb2b449 Remove unused/undeclared arm assembly macro parameters
The SAD_VAR_16_END macro only takes 3 parameters, never 4,
and SAD_SSD_16_END never is called with more than 3 parameters
either.
2014-03-07 10:26:54 +02:00
Licai Guo
e5f36822a9 Update targets.mk files 2014-03-07 16:22:59 +08:00
Licai Guo
d986c27b9d remove mc_neon.S from encoder 2014-03-07 16:11:36 +08:00
ruil2
f0c6c2b318 Merge branch 'master' of https://github.com/cisco/openh264 into encoder_update 2014-03-07 15:59:23 +08:00
Licai Guo
8ad9e0b60d use temp buffer to store new arrived PPS/SPS 2014-03-06 22:05:12 -08:00
Licai Guo
8a3518f7be set bAuReadyFlag to true when we meet a PPS 2014-03-06 21:50:31 -08:00
Licai Guo
71467f948a mv mc_neon.S to common,add MC arm code to encoder 2014-03-07 12:18:58 +08:00
Licai Guo
a4cecd8004 Merge pull request #426 from volvet/simplify-layer-process
simplify-layer-process
2014-03-07 10:58:28 +08:00
volvet
14f5518e6a Merge pull request #437 from mstorsjo/fix-arm-encoder-android
Fix building arm encoder assembly for android
2014-03-07 10:41:34 +08:00
ruil2
594fc4fe7b dump file refactor 2014-03-07 10:23:25 +08:00
volvet
b3fa8dd334 Merge pull request #418 from mstorsjo/ios-neon-detection
Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS
2014-03-07 09:15:17 +08:00
Ethan Hugg
2f2801dc78 Merge pull request #434 from mstorsjo/threadlib-core-count-android
Use the cpu-features NDK library for detecting the number of cores in WelsThreadLib
2014-03-06 08:02:28 -08:00
Martin Storsjö
11bdebb12c Explicitly enable the UAL syntax when using gnu tools
Arm assembly has got two variants of the syntax, the old legacy
syntax, and the new modern UAL (unified assembly language) syntax.

Most arm assembly is the same in the both syntaxes, but some
uncommon cases change the order of suffixes - the "subscs"
instruction would be written "subcss" in the old syntax.

The apple tools default to UAL, while the GNU tools (e.g. in
android) require you to specify ".syntax unified" to enable the
new syntax. When enabling the new syntax with the GNU tools, some
cases of "sub r0, r1, lsl #1" needs to be written explicitly as
"sub r0, r0, r1, lsl #1", handled in the previous commit.

This allows using the same, modern syntax for things like subscs,
without needing to have two alternate forms of writing it.
2014-03-06 16:21:54 +02:00
Martin Storsjö
c0043f7053 Use the three-operand form of add/sub with shift
When using unified syntax, the two operand form with a shift
isn't allowed.
2014-03-06 16:21:54 +02:00
Martin Storsjö
f1502c26e3 Don't use WELS_ASM_FUNC_END in the middle of a function
WELS_ASM_FUNC_END declares the end of the function, and needs
to be paired with WELS_ASM_FUNC_BEGIN.
2014-03-06 16:21:54 +02:00
Martin Storsjö
8ba79262bf Rename a function to avoid conflicts between almost duplicate neon functions
There's a different version of the same function in the encoder,
but they're not identical - the encoder version has got stricter
alignment requirements.

If someone can confirm that it is ok to use the function from the
encoder, pixel_sad_neon.S in processing could be deleted, and the
encoder version moved to codec/common instead.
2014-03-06 16:19:48 +02:00
Martin Storsjö
4e4bfcc1bc Regenerate makefiles to include the encoder arm assembly 2014-03-06 16:11:54 +02:00
Martin Storsjö
45e059ec5f Rename expand_picture.S to expand_picture_neon.S
This avoids ambiguity in the make based build system about
whether expand_picture.o should be built from expand_picture.S
or expand_picture.asm.
2014-03-06 16:11:40 +02:00
Martin Storsjö
ce4fa9e272 Correct the endif comment
The code block is about HAVE_NEON, not X86_ASM.
2014-03-06 15:43:04 +02:00
Martin Storsjö
636df2bebb Use WelsMultipleEventsWaitSingleBlocking within the worker thread on unix as well
This avoids using a separate thread for handling pUpdateMbListEvent
events, and later allowing using the encode exit event on unix instead
of pthread cancellation.
2014-03-06 15:34:35 +02:00
Martin Storsjö
801da26d1d Use WelsMultipleEventsWaitSingleBlocking with a master event for waiting on finished threads
This allows using the same codepath for both unix and windows
for distributing new slices to code to threads.

This also improves the performance on unix - instead of waiting
for all the current threads to finish their current slice
before handing out a new slice to each of them (where the threads
that finish first will just wait instead of immediately getting
a new slice to work on), we now use the same logic as on windows.

In one setup, it improves the performance of encoding from ~920 fps
to ~950 fps, and in another setup it goes from ~390 fps to ~660 fps.
(These tests were done with the SM_ROWMB_SLICE mode, which
heavily exercises the code for distributing new slices to the
worker threads.)

The extra WelsEventSignal call on windows where it isn't strictly
necessary doesn't incur any measurable slowdown, so it is kept
without any extra ifdefs to keep the code more readable and unified.
2014-03-06 15:33:37 +02:00
Martin Storsjö
276b585f03 Use the cpu-features NDK library for detecting the number of cores in WelsThreadLib
On arm, the exact same detection is done in WelsCPUFeatureDetect,
but in the x86 version of that function we use x86 cpuid for getting
the core count, and this is not available on all processors. For the
case when cpuid can't tell the core count, use the NDK function as
higher level API.

The thread lib itself doesn't build properly on android yet, but will
do so soon.
2014-03-06 15:28:59 +02:00
Martin Storsjö
d0a81355b0 Add support for using a separate "master event" in WelsMultipleEventsWait*Blocking
This allows making the WelsMultipleEventsWaitSingleBlocking
function work properly in unix, without polling. If a master
event is provided, the function first waits for a signal on
that event - once such a signal is received, it is assumed that
one of the individual events in the list have been signalled as
well. Then the function can proceed to check each of the semaphores
in the list using sem_trywait to find the first one of them that
has been signalled. Assuming that the master event is signalled
in pair with the other events, one of the sem_trywait calls
should succeed.

The same master event is also used in
WelsMultipleEventsWaitAllBlocking, to keep the semaphore values
in sync across calls to the both functions.
2014-03-06 15:03:59 +02:00
Martin Storsjö
de32455d87 Remove the timeout parameter from WelsMultipleEventsWaitSingleBlocking
All users of the function passed the value corresponding to
"infinite", and the (currently unused) unix implementation of it
only supported infinite wait as well.
2014-03-06 15:03:59 +02:00
Licai Guo
201ab42d7e Merge pull request #431 from huili2/large_to_small_sps_bug
Large to small sps bug for issue #373
2014-03-06 16:51:59 +08:00
volvet
8cc332dea1 Merge pull request #432 from zhilwang/arm-asm
Arm asm
2014-03-06 16:50:56 +08:00
volvet
73452e0993 Merge pull request #429 from mstorsjo/simplify-ifdef-with-macro
Use a macro for conditionally logging based on ENABLE_TRACE_MT
2014-03-06 16:01:41 +08:00
Licai Guo
8a7a9195d9 remove unused function 2014-03-05 23:14:57 -08:00
Licai Guo
4260a9b2ba remove CS and RS syntaxs for issue 373 2014-03-05 22:59:27 -08:00
Licai Guo
7bfe801874 Remove trailing space 2014-03-06 14:55:36 +08:00
Licai Guo
67534b0fc0 arm asm code refine. 2014-03-06 14:30:16 +08:00
Martin Storsjö
fd6f8a83b3 Use a macro for conditionally logging based on ENABLE_TRACE_MT
This avoids having an extra ifdef around every single WelsLog
call.
2014-03-06 08:06:34 +02:00
ruil2
28a56a6752 Merge pull request #415 from volvet/remove-useless-mgs-code
remove un-supported mgs code
2014-03-06 14:05:04 +08:00
ruil2
a59c8ea04c Merge pull request #428 from sijchen/read_para3
[Encoder Console] add new para reading to get accord with the new API design
2014-03-06 14:04:33 +08:00
sijchen
a4a8eddb04 add new para reading to get accord with the new API design 2014-03-06 13:48:18 +08:00
volvet
50fe120a3e simplify-layer-process 2014-03-06 11:19:33 +08:00
ruil2
334c5765c7 remove inter-deblock related parameters 2014-03-06 10:26:53 +08:00
volvet
8beb3c8c09 Merge pull request #417 from mstorsjo/unify-event-init
Unify the interface for creating/deleting event objects
2014-03-06 09:13:13 +08:00
volvet
97376c6339 Merge pull request #413 from mstorsjo/remove-commented-code
Remove commented out, unused code
2014-03-05 22:13:35 +08:00
volvet
7ea70491c8 Merge pull request #411 from mstorsjo/arm-add-func-markers
Add .func/.endfunc markers in the arm assembly
2014-03-05 17:40:18 +08:00
Martin Storsjö
f384dde881 Add .func/.endfunc markers in the arm assembly
This adds information to debug builds.

This requires adding a separate definition of WELS_ASM_FUNC_END
for apple tools.
2014-03-05 11:25:51 +02:00
Licai Guo
eb3dd8f0ae Merge pull request #416 from huili2/move_iTotalNumMbRec_to_pCtx
move iTotalNumMbRec from refpic to ctx
2014-03-05 17:12:43 +08:00
Licai Guo
e7cc8c2780 Add arm asm code for processing. 2014-03-05 16:54:05 +08:00
Martin Storsjö
ef7e05d47d Use the __ARM_NEON__ built-in compiler define for identifying neon capability on iOS
This avoids having to hardcode the names of devices that don't
support neon.

The devices that don't support neon don't run the armv7 variants
of iOS binaries at all - they would need to be built for the armv6
architecture. (Building for armv6 isn't supported at all in
modern iOS SDKs.)

Therefore we can simply use the __ARM_NEON__ built-in compiler
define to check if NEON code is allowed in the current build,
and have the WelsCPUFeatureDetect function return flags accordingly.

The only thing this disallows is doing an armv6 build which would
optionally enable neon code at runtime if run on an armv7 capable
device, but since Apple allows you to build the same binary for
armv7 separately in the same app bundle, and since armv6 building
isn't even possible in the current iOS SDKs, this isn't really a loss.

This is in contrast to the android builds where the armv7 baseline
does not include NEON.
2014-03-05 09:47:05 +02:00
Martin Storsjö
d4bdef2916 Use an event name that contains the process id
This reduces the risk for namespace collisions if two processes
run the encoder simultaneously without address space layout
randomization.
2014-03-05 09:36:46 +02:00
Martin Storsjö
4814d5828d Use unnamed semaphores on linux
This avoids the risk of namespace collisions for named semaphores
(where the names are global for the whole machine), on platforms
where we strictly don't need to use the named semaphores.
2014-03-05 09:36:46 +02:00
Martin Storsjö
5480ffafdf Use the WelsEventOpen interface with an event name on windows as well
This unifies the event creation interface, even if the event
name itself is unused on windows, allowing use the exact same
code to initialize events regardless of the actual platform.

Some ifdefs still remain in the event initialization code, since
some events are only used on windows.
2014-03-05 09:36:04 +02:00
volvet
e9395bbd35 remove un-supported mgs code 2014-03-05 15:17:07 +08:00
Martin Storsjö
04917cd13f Remove commented out, unused code
Some few lines of commented out code is left, that might be useful
for debugging.
2014-03-05 08:50:59 +02:00
volvet
adb27ff0b1 Merge pull request #405 from mstorsjo/simplify-threads
Adjust WELS_EVENT definitions to allow sharing more code between unix and win32 codepaths
2014-03-05 12:31:15 +08:00
Licai Guo
9e4ab64c73 move iTotalNumMbRec from refpic to ctx 2014-03-04 19:23:15 -08:00
Licai Guo
ced9e41b5d Merge pull request #399 from volvet/refine-multi-layer-process
refine-multi-layer-process
2014-03-05 10:45:35 +08:00
Licai Guo
248f324c62 Add intra predictor arm asm code. 2014-03-05 10:25:15 +08:00
Licai Guo
efcee63692 Remove .DS_Store file. 2014-03-05 10:24:05 +08:00
Licai Guo
bb244d736b Partly add arm asm code to encoder. 2014-03-05 10:24:05 +08:00
volvet
7150adc91b Merge pull request #407 from mstorsjo/do-blocking-wait
Do a blocking wait with WelsMultipleEventsWaitSingleBlocking
2014-03-05 09:18:45 +08:00
Martin Storsjö
dae8f4b737 Exclude the arm assembly header as well
This avoids warnings about object files not containing any symbols.
2014-03-04 23:23:19 +02:00
Ethan Hugg
975a3e41bc Merge pull request #404 from mstorsjo/arm-asm-type-func
Mark the arm asm labels as functions
2014-03-04 10:17:07 -08:00
Ethan Hugg
01a2f582c3 Merge pull request #401 from mstorsjo/android-arm-assembly
Enable the arm assembly in android builds
2014-03-04 09:50:07 -08:00
volvet
e61bd1b504 Merge pull request #408 from mstorsjo/exclude-asm-headers
Exclude assembly files that are used as headers
2014-03-04 21:07:42 +08:00
volvet
bc9ee5b145 Merge pull request #406 from mstorsjo/use-proper-define
Use the windows INFINITE define instead of manually casting -1 to uint32_t
2014-03-04 20:57:57 +08:00
Martin Storsjö
773cc4a797 Exclude assembly files that are used as headers
This avoids some warnings about object files not containing any
symbols.
2014-03-04 14:57:36 +02:00
Martin Storsjö
cf07d61f06 Do a blocking wait with WelsMultipleEventsWaitSingleBlocking
There is no point in doing a timed wait here - there's no work
that we can do if the wait timed out, and sleeping for 1 ms
inbetween doesn't help, it only adds potential extra latency
to reacting to threads that need more work to do.
2014-03-04 14:51:33 +02:00
Martin Storsjö
42592217c2 Use the windows INFINITE define instead of manually casting -1 to uint32_t 2014-03-04 14:47:25 +02:00
Martin Storsjö
1eaa38b130 Simplify code by allocating the arrays of events and thread handles statically
This avoids having to malloc a whole lot of separate arrays,
all which are all bounded by MAX_THREADS_NUM.
2014-03-04 12:17:32 +02:00
Martin Storsjö
ae63f064a0 Share the declarations for WELS_EVENT arrays between win32 and unix codepaths 2014-03-04 12:17:32 +02:00
Martin Storsjö
71bc52d103 Change the unix version of WELS_EVENT to sem_t*
Typedeffing WELS_EVENT as sem_t* makes the typedef behave similarly
to the windows version (typedeffed as HANDLE), unifying the code
that allocates and uses these event objects (getting rid of
most of the need for separate codepaths and ifdefs).
2014-03-04 12:17:32 +02:00
Martin Storsjö
e9c3403674 Merge some WIN32 ifdefs that were directly next to each other 2014-03-04 12:17:32 +02:00
Martin Storsjö
e20930ef73 Mark the arm asm labels as functions
This fixes calling them from thumb code, on linux.
2014-03-04 11:23:04 +02:00
Martin Storsjö
d411d768b6 Add cpu feature detection for generic arm/linux
For platforms without runtime detection, assume whoever built it
with HAVE_NEON actually wanted it.
2014-03-04 10:18:30 +02:00
Martin Storsjö
9cf34e7615 Unify the interface for the different variants of WelsCPUFeatureDetect
The caller of the function should not need to know exactly which
implementation of it is being used.

For the variants that don't support detecting the number of cores,
the pNumberOfLogicProcessors parameter can be left untouched
and the caller will use a higher level API for finding it out.

This simplifies all the calling code, and simplifies adding
more implementations of cpu feature detection.
2014-03-04 10:18:30 +02:00
volvet
4d729c9418 get cpu cores for android 2014-03-04 15:50:41 +08:00
Martin Storsjö
1118dd4f71 Update the makefile generator to support .S arm assembly files
These are built if ASM_ARCH is set to arm.
2014-03-04 08:56:42 +02:00
Martin Storsjö
03e0dcd814 Convert the arm assembly sources to unix newlines 2014-03-04 08:42:28 +02:00
volvet
13d785ec6a refine-multi-layer-process 2014-03-04 12:04:04 +08:00
volvet
d9a02d27f2 remove execute attribute of the arm asm files 2014-03-04 11:45:10 +08:00
Licai Guo
26218731c6 Merge pull request #386 from volvet/refine_processing
refine build spatial list in processing
2014-03-04 11:15:35 +08:00
volvet
f8b0cec68d Merge pull request #387 from zhilwang/arm-asm
Arm asm
2014-03-04 11:08:17 +08:00
volvet
901b89f7ad Merge pull request #376 from mstorsjo/simplify-x86-asm-makefiles
Simplify makefiles with respect to x86 assembly
2014-03-04 10:16:01 +08:00
Licai Guo
cdeb6449c6 remove unused declarations in as264_common.h 2014-03-03 17:42:47 -08:00
Licai Guo
83d0791163 Modify function name according latest cisco master 2014-03-04 09:18:18 +08:00
volvet
bc1850a54d remove uiFrameIdxRc 2014-03-04 09:08:54 +08:00
Ethan Hugg
1eb688264b Merge pull request #395 from mstorsjo/printf-64bit-macro
Use a standard macro for 64 bit printf conversion specifiers
2014-03-03 09:11:51 -08:00
Ethan Hugg
e9593682eb Merge pull request #392 from mstorsjo/unify-threading-ifdefs
Unify ifdef conditions related to threading code
2014-03-03 08:23:30 -08:00
Ethan Hugg
d940a204eb Merge pull request #388 from mstorsjo/initialize-default
Initialize sSpatialLayers[0] in SEncParamExt for GetDefaultParams
2014-03-03 08:20:36 -08:00
Martin Storsjö
e0951599ea Unify ifdef conditions related to threading code
The two different variants of the threadlib basically are
win32 and unix - use _WIN32 to check for this consistently,
instead of occasionally using __GNUC__ to enable the unix
codepath. (__GNUC__ is also defined on mingw, which still is
a windows platform and should use the _WIN32 code.)
2014-03-03 14:55:53 +02:00
Martin Storsjö
3c7dde97ee Use a standard macro for 64 bit printf conversion specifiers
This avoids duplicating the printf line with an ifdef every
time a 64 bit number needs to be printed.
2014-03-03 12:33:34 +02:00
volvet
c7d98a8fa3 Merge pull request #394 from ruil2/encoder_update
add timestamp in encoder interface --- review request#138
2014-03-03 17:31:48 +08:00
unknown
e0e7107ff1 add timestamp in encoder interface 2014-03-03 17:05:06 +08:00