Commit Graph

368 Commits

Author SHA1 Message Date
volvet
6714b8ae99 Merge pull request #463 from mstorsjo/dont-clobber-neon-registers
Avoid clobbering the neon registers q4-q7

Review and verified by zhilwang
2014-03-14 10:28:55 +08:00
volvet
fc5c48830a fix the condition of scene change flag and comments 2014-03-14 09:53:24 +08:00
volvet
c8761c08ae use large/medium/similar to define scene change result 2014-03-13 10:43:20 +08:00
volvet
8962b7c98b Merge pull request #482 from sijchen/me_refactor1
mv range setting refactor
2014-03-13 10:21:39 +08:00
sijchen
d809a7981b mv range setting refactor 2014-03-13 10:18:01 +08:00
volvet
8b907c18fd fix idr interval issue 2014-03-12 17:38:25 +08:00
ruil2
c7f2a0b7f6 3Author: ruil2 <ruil2@cisco.com>
modify the parameter verification for SM_AUTO_SLICE mode -- uiSliceNum
 iis ignored
2014-03-12 10:44:13 +08:00
ruil2
7c8ce799c0 fix SM_FIXEDSLCNUM_SLICE bug, add SM_AUTO_SLICE mode 2014-03-11 10:23:46 +08:00
Martin Storsjö
c011890764 Push clobbered neon registers on the stack
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use more than 12 of the qX registers must push
the clobbered registers on the stack in order to be able to restore them
afterwards.

In functions that don't use all 16 registers, but clobber some of
the callee saved registers q4-q7, one or more of them are remapped
to reduce the number of registers that have to be saved/restored.

This incurs a very small (around 0.5%) slowdown in the decoder and
encoder.
2014-03-10 22:07:36 +02:00
Martin Storsjö
811c647c0e Remap registers to avoid clobbering the neon registers q4-q7
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.

Functions that use 12 or less of the qX registers can avoid
violating the calling convention by simply using other registers instead
of the callee saved registers q4-q7.

This change only remaps the registers used within functions - therefore
this does not affect performance at all. E.g. in functions using
registers q0-q7, we now use q0-q3 and q8-q11 instead.
2014-03-10 22:07:25 +02:00
ruil2
a922155c9a Merge pull request #466 from sijchen/add_memalign_test
Add memalign unit test
2014-03-10 17:25:41 +08:00
sijchen
385128e403 Merge pull request #465 from ruil2/encoder_trace
use global trace in encoder
reviewed at https://rbcommons.com/s/OpenH264/r/176/
2014-03-10 17:22:19 +08:00
sijchen
53a570556d add memalign unit test 2014-03-10 16:28:05 +08:00
ruil2
02bafd9320 Merge pull request #445 from mstorsjo/use-thread-param
Use the iMultipleThreadIdc field from SEncParamExt
2014-03-10 15:28:04 +08:00
ruil2
86f37f047c Merge pull request #452 from mstorsjo/use-slice-mode-enum
Use SliceModeEnum as data type for the slice mode fields
2014-03-10 15:27:04 +08:00
ruil2
2539d6e447 Merge pull request #462 from mstorsjo/fix-typos
Fix two typos in variable and macro names
2014-03-10 15:25:20 +08:00
ruil2
ba6b2a8d62 use global trace in encoder 2014-03-10 15:22:40 +08:00
Martin Storsjö
cc7b81f3c3 Fix a typo in arm assembly, LORD -> LOAD 2014-03-09 19:19:38 +02:00
Martin Storsjö
7c435ad295 Remove a stray inline keyword in a function signature comment in x86 assembly
Assembly functions written in external assembly files is obviously
not inlined.
2014-03-09 19:18:03 +02:00
Martin Storsjö
8d6b368a1c Remove unnecessary stray __cdecl annotations in function signature comments in x86 assembly 2014-03-09 19:18:02 +02:00
Martin Storsjö
5df2e2a996 Use SliceModeEnum as data type for the slice mode fields
This makes the use of the field clearer and safer by allowing
the compiler check that users actually assign proper enum
values.
2014-03-08 00:23:58 +02:00
Martin Storsjö
ce7b00ea72 Get rid of an unnecessary cast by declaring the right pointer type 2014-03-08 00:17:30 +02:00
Ethan Hugg
fb4f677f77 Merge pull request #446 from mstorsjo/remove-unnecessary-public-param
Move the iCountThreadsNum field to SWelsSvcCodingParam
2014-03-07 09:18:52 -08:00
Ethan Hugg
7632510209 Merge pull request #450 from mstorsjo/publish-slice-mode-enum
Move the slice mode enum to the public API
2014-03-07 09:17:03 -08:00
Martin Storsjö
5f1c207845 Move the slice mode enum to the public header
This simplifies setting the slice mode in the public API.
2014-03-07 14:53:29 +02:00
Martin Storsjö
495a4a392e Make ParamValidationExt use the actual type instead of a void pointer 2014-03-07 14:51:34 +02:00
Martin Storsjö
656e4c5c35 Move the iCountThreadsNum field to SWelsSvcCodingParam
There is no point in the user setting this field, it's only used
as an internal field within the encoder.
2014-03-07 14:48:38 +02:00
Martin Storsjö
dbc324d5bb Use the iMultipleThreadIdc field from SEncParamExt 2014-03-07 14:47:43 +02:00
Martin Storsjö
5b8ee37162 Merge WelsThreadDestroy into WelsThreadJoin
Now calling WelsThreadJoin is enough to finish and clean up
the thread on all platforms.

This unifies the thread cleanup code between windows and unix.

Now all of the threading code should use the exact same codepaths
between windows and unix.
2014-03-07 10:51:28 +02:00
Martin Storsjö
b4aa9be7de Use WelsThreadJoin on windows as well
This avoids using a separate event just for signalling that
a thread has finished running.
2014-03-07 10:51:28 +02:00
Martin Storsjö
baaa38737e Use pExitEncodeEvent instead of thread cancellation on unix as well
This works now that we've got a suitably working implementation
of WelsMultipleEventsWaitSingleBlocking.
2014-03-07 10:49:39 +02:00
volvet
38a3fada24 Merge pull request #435 from mstorsjo/threadlib-wait-single-unix
Make WelsMultipleEventsWaitSingleBlocking usable on unix as well
2014-03-07 16:47:38 +08:00
Licai Guo
1b9aae8434 Merge pull request #439 from zhilwang/mc-arm-asm
mv mc_neon.S to common,add MC arm code to encoder
2014-03-07 16:36:48 +08:00
ruil2
b3c45946ff modify typing format 2014-03-07 16:29:12 +08:00
Licai Guo
e5f36822a9 Update targets.mk files 2014-03-07 16:22:59 +08:00
Licai Guo
d986c27b9d remove mc_neon.S from encoder 2014-03-07 16:11:36 +08:00
ruil2
f0c6c2b318 Merge branch 'master' of https://github.com/cisco/openh264 into encoder_update 2014-03-07 15:59:23 +08:00
Licai Guo
71467f948a mv mc_neon.S to common,add MC arm code to encoder 2014-03-07 12:18:58 +08:00
Licai Guo
a4cecd8004 Merge pull request #426 from volvet/simplify-layer-process
simplify-layer-process
2014-03-07 10:58:28 +08:00
volvet
14f5518e6a Merge pull request #437 from mstorsjo/fix-arm-encoder-android
Fix building arm encoder assembly for android
2014-03-07 10:41:34 +08:00
ruil2
594fc4fe7b dump file refactor 2014-03-07 10:23:25 +08:00
Martin Storsjö
c0043f7053 Use the three-operand form of add/sub with shift
When using unified syntax, the two operand form with a shift
isn't allowed.
2014-03-06 16:21:54 +02:00
Martin Storsjö
f1502c26e3 Don't use WELS_ASM_FUNC_END in the middle of a function
WELS_ASM_FUNC_END declares the end of the function, and needs
to be paired with WELS_ASM_FUNC_BEGIN.
2014-03-06 16:21:54 +02:00
Martin Storsjö
4e4bfcc1bc Regenerate makefiles to include the encoder arm assembly 2014-03-06 16:11:54 +02:00
Martin Storsjö
ce4fa9e272 Correct the endif comment
The code block is about HAVE_NEON, not X86_ASM.
2014-03-06 15:43:04 +02:00
Martin Storsjö
636df2bebb Use WelsMultipleEventsWaitSingleBlocking within the worker thread on unix as well
This avoids using a separate thread for handling pUpdateMbListEvent
events, and later allowing using the encode exit event on unix instead
of pthread cancellation.
2014-03-06 15:34:35 +02:00
Martin Storsjö
801da26d1d Use WelsMultipleEventsWaitSingleBlocking with a master event for waiting on finished threads
This allows using the same codepath for both unix and windows
for distributing new slices to code to threads.

This also improves the performance on unix - instead of waiting
for all the current threads to finish their current slice
before handing out a new slice to each of them (where the threads
that finish first will just wait instead of immediately getting
a new slice to work on), we now use the same logic as on windows.

In one setup, it improves the performance of encoding from ~920 fps
to ~950 fps, and in another setup it goes from ~390 fps to ~660 fps.
(These tests were done with the SM_ROWMB_SLICE mode, which
heavily exercises the code for distributing new slices to the
worker threads.)

The extra WelsEventSignal call on windows where it isn't strictly
necessary doesn't incur any measurable slowdown, so it is kept
without any extra ifdefs to keep the code more readable and unified.
2014-03-06 15:33:37 +02:00
Martin Storsjö
de32455d87 Remove the timeout parameter from WelsMultipleEventsWaitSingleBlocking
All users of the function passed the value corresponding to
"infinite", and the (currently unused) unix implementation of it
only supported infinite wait as well.
2014-03-06 15:03:59 +02:00
volvet
8cc332dea1 Merge pull request #432 from zhilwang/arm-asm
Arm asm
2014-03-06 16:50:56 +08:00
volvet
73452e0993 Merge pull request #429 from mstorsjo/simplify-ifdef-with-macro
Use a macro for conditionally logging based on ENABLE_TRACE_MT
2014-03-06 16:01:41 +08:00
Licai Guo
7bfe801874 Remove trailing space 2014-03-06 14:55:36 +08:00
Licai Guo
67534b0fc0 arm asm code refine. 2014-03-06 14:30:16 +08:00
Martin Storsjö
fd6f8a83b3 Use a macro for conditionally logging based on ENABLE_TRACE_MT
This avoids having an extra ifdef around every single WelsLog
call.
2014-03-06 08:06:34 +02:00
ruil2
28a56a6752 Merge pull request #415 from volvet/remove-useless-mgs-code
remove un-supported mgs code
2014-03-06 14:05:04 +08:00
volvet
50fe120a3e simplify-layer-process 2014-03-06 11:19:33 +08:00
ruil2
334c5765c7 remove inter-deblock related parameters 2014-03-06 10:26:53 +08:00
Licai Guo
e7cc8c2780 Add arm asm code for processing. 2014-03-05 16:54:05 +08:00
Martin Storsjö
d4bdef2916 Use an event name that contains the process id
This reduces the risk for namespace collisions if two processes
run the encoder simultaneously without address space layout
randomization.
2014-03-05 09:36:46 +02:00
Martin Storsjö
5480ffafdf Use the WelsEventOpen interface with an event name on windows as well
This unifies the event creation interface, even if the event
name itself is unused on windows, allowing use the exact same
code to initialize events regardless of the actual platform.

Some ifdefs still remain in the event initialization code, since
some events are only used on windows.
2014-03-05 09:36:04 +02:00
volvet
e9395bbd35 remove un-supported mgs code 2014-03-05 15:17:07 +08:00
volvet
adb27ff0b1 Merge pull request #405 from mstorsjo/simplify-threads
Adjust WELS_EVENT definitions to allow sharing more code between unix and win32 codepaths
2014-03-05 12:31:15 +08:00
Licai Guo
ced9e41b5d Merge pull request #399 from volvet/refine-multi-layer-process
refine-multi-layer-process
2014-03-05 10:45:35 +08:00
Licai Guo
248f324c62 Add intra predictor arm asm code. 2014-03-05 10:25:15 +08:00
Licai Guo
efcee63692 Remove .DS_Store file. 2014-03-05 10:24:05 +08:00
Licai Guo
bb244d736b Partly add arm asm code to encoder. 2014-03-05 10:24:05 +08:00
volvet
7150adc91b Merge pull request #407 from mstorsjo/do-blocking-wait
Do a blocking wait with WelsMultipleEventsWaitSingleBlocking
2014-03-05 09:18:45 +08:00
Martin Storsjö
cf07d61f06 Do a blocking wait with WelsMultipleEventsWaitSingleBlocking
There is no point in doing a timed wait here - there's no work
that we can do if the wait timed out, and sleeping for 1 ms
inbetween doesn't help, it only adds potential extra latency
to reacting to threads that need more work to do.
2014-03-04 14:51:33 +02:00
Martin Storsjö
1eaa38b130 Simplify code by allocating the arrays of events and thread handles statically
This avoids having to malloc a whole lot of separate arrays,
all which are all bounded by MAX_THREADS_NUM.
2014-03-04 12:17:32 +02:00
Martin Storsjö
ae63f064a0 Share the declarations for WELS_EVENT arrays between win32 and unix codepaths 2014-03-04 12:17:32 +02:00
Martin Storsjö
71bc52d103 Change the unix version of WELS_EVENT to sem_t*
Typedeffing WELS_EVENT as sem_t* makes the typedef behave similarly
to the windows version (typedeffed as HANDLE), unifying the code
that allocates and uses these event objects (getting rid of
most of the need for separate codepaths and ifdefs).
2014-03-04 12:17:32 +02:00
Martin Storsjö
e9c3403674 Merge some WIN32 ifdefs that were directly next to each other 2014-03-04 12:17:32 +02:00
Martin Storsjö
9cf34e7615 Unify the interface for the different variants of WelsCPUFeatureDetect
The caller of the function should not need to know exactly which
implementation of it is being used.

For the variants that don't support detecting the number of cores,
the pNumberOfLogicProcessors parameter can be left untouched
and the caller will use a higher level API for finding it out.

This simplifies all the calling code, and simplifies adding
more implementations of cpu feature detection.
2014-03-04 10:18:30 +02:00
volvet
13d785ec6a refine-multi-layer-process 2014-03-04 12:04:04 +08:00
Licai Guo
26218731c6 Merge pull request #386 from volvet/refine_processing
refine build spatial list in processing
2014-03-04 11:15:35 +08:00
volvet
901b89f7ad Merge pull request #376 from mstorsjo/simplify-x86-asm-makefiles
Simplify makefiles with respect to x86 assembly
2014-03-04 10:16:01 +08:00
volvet
bc1850a54d remove uiFrameIdxRc 2014-03-04 09:08:54 +08:00
Ethan Hugg
1eb688264b Merge pull request #395 from mstorsjo/printf-64bit-macro
Use a standard macro for 64 bit printf conversion specifiers
2014-03-03 09:11:51 -08:00
Ethan Hugg
e9593682eb Merge pull request #392 from mstorsjo/unify-threading-ifdefs
Unify ifdef conditions related to threading code
2014-03-03 08:23:30 -08:00
Ethan Hugg
d940a204eb Merge pull request #388 from mstorsjo/initialize-default
Initialize sSpatialLayers[0] in SEncParamExt for GetDefaultParams
2014-03-03 08:20:36 -08:00
Martin Storsjö
e0951599ea Unify ifdef conditions related to threading code
The two different variants of the threadlib basically are
win32 and unix - use _WIN32 to check for this consistently,
instead of occasionally using __GNUC__ to enable the unix
codepath. (__GNUC__ is also defined on mingw, which still is
a windows platform and should use the _WIN32 code.)
2014-03-03 14:55:53 +02:00
Martin Storsjö
3c7dde97ee Use a standard macro for 64 bit printf conversion specifiers
This avoids duplicating the printf line with an ifdef every
time a 64 bit number needs to be printed.
2014-03-03 12:33:34 +02:00
volvet
c7d98a8fa3 Merge pull request #394 from ruil2/encoder_update
add timestamp in encoder interface --- review request#138
2014-03-03 17:31:48 +08:00
unknown
e0e7107ff1 add timestamp in encoder interface 2014-03-03 17:05:06 +08:00
Martin Storsjö
9ccabd1fe3 Fix cropping when using SEncParamBase
The iFrameWidth/iFrameHeight fields are already aligned by the
SetActualPicResolution() function. Previously when iFrameWidth was
aligned directly in ParamBaseTranscode, this aligned value was used
to set iActualWidth/iActualHeight - losing the original, cropped
size.

This makes sure the output bitstream from the test of encoding
res/Static_152_100.yuv actually is cropped as it should.
2014-03-03 10:34:37 +02:00
Martin Storsjö
e392932ad2 Initialize sSpatialLayers[0] in SEncParamExt for GetDefaultParams 2014-03-03 10:32:20 +02:00
volvet
775eebaf36 refine build spatial list in processing 2014-03-03 14:04:19 +08:00
ruil2
b552944453 fix sizeof() bug 2014-03-03 10:46:32 +08:00
volvet
e3bf5ced53 Merge pull request #371 from ruil2/encode_ret
add verification on return value -- review request #128
2014-03-03 10:27:26 +08:00
ruil2
abdeb1951d format update 2014-03-03 09:07:16 +08:00
ruil2
23df8a9ff6 add video format support verification 2014-03-03 09:03:59 +08:00
volvet
3a602a382b Merge pull request #379 from mstorsjo/simplify-emms-calling
Provide a no-op WelsEmms macro if X86_ASM is disabled
2014-03-03 09:03:41 +08:00
volvet
5be179e0aa Merge pull request #378 from mstorsjo/fix-building-debug-code
Fix building a logging statement in debug code
2014-03-03 09:00:59 +08:00
Martin Storsjö
2b82a5743d Fix printing an event name for debugging 2014-03-02 23:49:50 +02:00
Martin Storsjö
dd47d4805f Provide a no-op WelsEmms macro if X86_ASM is disabled
This allows always calling this function, reducing the number
of ifdefs in the calling code.
2014-03-02 23:46:20 +02:00
Martin Storsjö
26d66a4e1f Fix building a logging statement in debug code 2014-03-02 23:45:14 +02:00
Martin Storsjö
3ccd2ae4cf Remove a redundant makefile ifdef
ASM_ARCH=x86 is only set if USE_ASM is enabled.
2014-03-01 23:56:14 +02:00
Ethan Hugg
6e9df66272 Merge pull request #369 from sijchen/mt_refactor3
[Encoder] remove macros to clear codes
2014-02-28 08:28:18 -08:00
Martin Storsjö
7d2c761604 Allow using the USE_ASM makefile variable for architectures other than x86
Add an ASM_ARCH variable which specifies which kind of assembly
is supposed to be built.
2014-02-28 10:19:53 +02:00
volvet
4808eca022 update comments on welsEncoderEncodeExt 2014-02-28 15:27:54 +08:00
volvet
4c951aab83 refine welsEncoderEncodeExt 2014-02-28 15:13:38 +08:00