This avoids using a separate thread for handling pUpdateMbListEvent
events, and later allows using the encode exit event on unix instead
of pthread cancellation.
This allows using the same codepath on both unix and windows
for distributing new slices to encode to the worker threads.
This also improves the performance on unix - instead of waiting
for all the current threads to finish their current slice
before handing out a new slice to each of them (where the threads
that finish first will just wait instead of immediately getting
a new slice to work on), we now use the same logic as on windows.
In one setup, it improves the performance of encoding from ~920 fps
to ~950 fps, and in another setup it goes from ~390 fps to ~660 fps.
(These tests were done with the SM_ROWMB_SLICE mode, which
heavily exercises the code for distributing new slices to the
worker threads.)
The extra WelsEventSignal call on windows where it isn't strictly
necessary doesn't incur any measurable slowdown, so it is kept
without any extra ifdefs to keep the code more readable and unified.
All users of the function passed the value corresponding to
"infinite", and the (currently unused) unix implementation of it
only supported infinite wait as well.
This unifies the event creation interface, even if the event
name itself is unused on windows, allowing the exact same
code to initialize events regardless of the actual platform.
Some ifdefs still remain in the event initialization code, since
some events are only used on windows.
There is no point in doing a timed wait here - there's no work
that we can do if the wait timed out, and sleeping for 1 ms
in between doesn't help; it only adds potential extra latency
to reacting to threads that need more work to do.
Typedeffing WELS_EVENT as sem_t* makes the typedef behave similarly
to the windows version (typedeffed as HANDLE), unifying the code
that allocates and uses these event objects (getting rid of
most of the need for separate codepaths and ifdefs).
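As a rough illustration of the handle-style typedef described above,
the sketch below hides an unnamed POSIX semaphore behind a pointer
type. The names EventHandle, EventCreate, EventSignal, EventWait and
EventDestroy are hypothetical and are not the project's functions, and
the sketch assumes a platform where sem_init is supported (e.g. Linux).

```cpp
#include <semaphore.h>
#include <cstdlib>

// Hypothetical handle type: a plain pointer, similar in spirit to HANDLE.
typedef sem_t* EventHandle;

// Allocates and initializes an unnamed semaphore with an initial count of 0.
// Returns 0 on success and -1 on failure.
int EventCreate(EventHandle* pEvent) {
  sem_t* pSem = static_cast<sem_t*>(std::malloc(sizeof(sem_t)));
  if (pSem == nullptr)
    return -1;
  if (sem_init(pSem, 0, 0) != 0) {
    std::free(pSem);
    return -1;
  }
  *pEvent = pSem;
  return 0;
}

// Wakes up one waiter (or lets the next wait return immediately).
int EventSignal(EventHandle* pEvent) {
  return sem_post(*pEvent);
}

// Blocks until the event has been signalled (infinite wait only).
int EventWait(EventHandle* pEvent) {
  return sem_wait(*pEvent);
}

// Releases the semaphore and the memory holding it.
void EventDestroy(EventHandle* pEvent) {
  sem_destroy(*pEvent);
  std::free(*pEvent);
  *pEvent = nullptr;
}
```

Since the handle is a pointer on both platforms in this scheme, calling
code can declare, pass around and clear it the same way everywhere.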
The caller of the function should not need to know exactly which
implementation of it is being used.
For the variants that don't support detecting the number of cores,
the pNumberOfLogicProcessors parameter can be left untouched
and the caller will use a higher level API for finding it out.
This simplifies all the calling code, and simplifies adding
more implementations of cpu feature detection.
The two different variants of the threadlib basically are
win32 and unix - use _WIN32 to check for this consistently,
instead of occasionally using __GNUC__ to enable the unix
codepath. (__GNUC__ is also defined on mingw, which still is
a windows platform and should use the _WIN32 code.)
The iFrameWidth/iFrameHeight fields are already aligned by the
SetActualPicResolution() function. Previously when iFrameWidth was
aligned directly in ParamBaseTranscode, this aligned value was used
to set iActualWidth/iActualHeight - losing the original, cropped
size.
This makes sure the output bitstream from the test of encoding
res/Static_152_100.yuv actually is cropped as it should.
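The ordering issue can be sketched as follows. PicSizeSketch, AlignUp
and MakePicSize are hypothetical names, and the alignment of 16 is an
assumption made for the example; the point is only that the actual
(cropped) size has to be recorded before any alignment is applied.

```cpp
#include <cstdint>

// Hypothetical struct; the field names mirror those mentioned above.
struct PicSizeSketch {
  int32_t iActualWidth;   // original, cropped size
  int32_t iActualHeight;
  int32_t iFrameWidth;    // aligned size used for coding
  int32_t iFrameHeight;
};

// Rounds iValue up to the next multiple of iAlign (a power of two).
inline int32_t AlignUp(int32_t iValue, int32_t iAlign) {
  return (iValue + iAlign - 1) & ~(iAlign - 1);
}

// Stores the unaligned size first, then derives the aligned size from it.
inline PicSizeSketch MakePicSize(int32_t iWidth, int32_t iHeight) {
  PicSizeSketch sSize;
  sSize.iActualWidth  = iWidth;
  sSize.iActualHeight = iHeight;
  sSize.iFrameWidth   = AlignUp(iWidth, 16);   // assumed alignment
  sSize.iFrameHeight  = AlignUp(iHeight, 16);
  return sSize;
}
```

With these assumptions, a 152x100 input keeps 152x100 as its actual
size while the frame size becomes 160x112.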
This simplifies forward source compatibility: when new fields are
added to SEncParamExt, this method makes sure those fields are
initialized to their default values. Otherwise all API users would
have to manually check SEncParamExt every time it is updated to make
sure there are no new fields that should be set to a nonzero value by
default (e.g. bEnableFrameSkip).
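The usage pattern this enables looks roughly like the sketch below.
ExampleParamExt, FillExampleDefaults and MakeUserParams are
hypothetical stand-ins and not the real struct or method; the default
of true for bEnableFrameSkip is taken from the description above.

```cpp
// Hypothetical, much reduced stand-in for an extensible parameter struct.
struct ExampleParamExt {
  int  iPicWidth;
  int  iPicHeight;
  bool bEnableFrameSkip;  // a field that defaults to a nonzero value
};

// Library side: the single place that knows every default.
void FillExampleDefaults(ExampleParamExt* pParam) {
  pParam->iPicWidth        = 0;
  pParam->iPicHeight       = 0;
  pParam->bEnableFrameSkip = true;
}

// Caller side: fetch the defaults first, then override only what is needed.
// Fields added to the struct later get sane values without changes here.
ExampleParamExt MakeUserParams(int iWidth, int iHeight) {
  ExampleParamExt sParam;
  FillExampleDefaults(&sParam);
  sParam.iPicWidth  = iWidth;
  sParam.iPicHeight = iHeight;
  return sParam;
}
```

A caller that instead zero-fills the struct and sets individual fields
would silently get bEnableFrameSkip disabled in this sketch.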
On processors without HTT, WelsCPUFeatureDetect can't return
a number of cores but might still return a nonzero set of
CPU feature flags. Previously a nonzero set of cpu feature flags was
taken to mean that cpuid worked, so the encoder wouldn't use the
higher level API for getting the number of cores, even though the
number of cores was left at 1.
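A sketch of the decision logic, under the assumption that the caller
initializes the core count to 0 and treats 0 as "not detected".
DetectCpuFeaturesSketch and ResolveCoreCount are hypothetical, and the
flag value and the core count of 4 are made up for the example.

```cpp
#include <cstdint>

// Hypothetical detection routine: always reports some feature flags, but
// only fills in the core count when it is able to determine it.
uint32_t DetectCpuFeaturesSketch(int32_t* pNumberOfLogicProcessors,
                                 bool bCanCountCores) {
  const uint32_t uiFlags = 0x1;  // made-up nonzero feature flag
  if (bCanCountCores)
    *pNumberOfLogicProcessors = 4;  // made-up detected value
  return uiFlags;
}

// Caller: base the fallback decision on the core count itself, not on
// whether any feature flags were returned.
int32_t ResolveCoreCount(bool bCanCountCores, int32_t iHigherLevelApiCount) {
  int32_t iCores = 0;  // 0 means "not detected"
  DetectCpuFeaturesSketch(&iCores, bCanCountCores);
  if (iCores == 0)
    iCores = iHigherLevelApiCount;  // e.g. a value from an OS level query
  return iCores;
}
```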
Previously the loop filter was unconditionally enabled
regardless of what encoder parameter was set. If using
SEncParamBase instead, the loop filter was always disabled.
Previously, these fields kept whatever value was set by
FillDefault. The corresponding fields were set properly within
sSpatialLayers, but the fields within the main struct were left
with the default values.
This doesn't change the hashes in the unit test, since these
fields don't seem to be used in the produced bitstream at all.
Instead just duplicate the common fields. These fields had to
be duplicated for the C interface compatibility anyway - but
this way there is no risk to accidentally introduce an ABI
break since there is no need for the layout of SEncParamBase to
actually match the start of SEncParamExt.
Remove the useless iInitType parameter, make the method
private within CWelsH264SVCEncoder class, give the pointer
parameter the correct type, avoiding needless casting.
TRUE/FALSE has intentionally been left in use for the few
platform specific APIs that define these constants themselves
and expect them to be used, for consistency.
Instead of byteswapping a 32 bit word and writing it out as a
whole (which could even possibly lead to crashes due to
incorrect alignment on some platforms), write it out explicitly
in the intended byte order.
This avoids having to set a define indicating the endianness.
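Writing the bytes one at a time can be sketched like this.
WriteU32BigEndian is a hypothetical helper, and big endian (most
significant byte first) is assumed as the intended order for the
example.

```cpp
#include <cstdint>

// Stores uiValue at pDst with the most significant byte first. Only byte
// stores are used, so pDst needs no particular alignment and the result
// does not depend on the byte order of the host.
inline void WriteU32BigEndian(uint8_t* pDst, uint32_t uiValue) {
  pDst[0] = static_cast<uint8_t>(uiValue >> 24);
  pDst[1] = static_cast<uint8_t>(uiValue >> 16);
  pDst[2] = static_cast<uint8_t>(uiValue >> 8);
  pDst[3] = static_cast<uint8_t>(uiValue);
}
```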
The code interprets an array of 4 uint8_t values as one uint32_t
and does shifts on the value. The same optimization can be
kept in big endian as well, but the shift has to be done in the
other direction.
This code could be made truly independent of endianness, but
that could cause some minimal performance degradation, at least
in theory.
This makes "make test" pass on big endian, assuming that
WORDS_BIGENDIAN is defined while building.
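Why the shift direction depends on the byte order can be seen in the
sketch below, which drops the first of four bytes by shifting the
whole word. DropFirstByte and IsBigEndianHost are hypothetical and are
not the project's code; the byte order is detected at run time here
only to keep the example self-contained, where a build would normally
use a define such as WORDS_BIGENDIAN.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the first byte in memory is the most significant one.
inline bool IsBigEndianHost() {
  const uint32_t uiProbe = 1;
  uint8_t uiFirst;
  std::memcpy(&uiFirst, &uiProbe, 1);
  return uiFirst == 0;
}

// Treats four bytes as one 32 bit word, removes pBytes[0] and moves the
// remaining bytes one position towards the front, filling with zero.
inline void DropFirstByte(uint8_t* pBytes) {
  uint32_t uiWord;
  std::memcpy(&uiWord, pBytes, 4);
  if (IsBigEndianHost())
    uiWord <<= 8;  // first byte in memory is the top byte of the word
  else
    uiWord >>= 8;  // first byte in memory is the bottom byte of the word
  std::memcpy(pBytes, &uiWord, 4);
}
```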
This makes the code work properly on big endian.
The MC case is similar to how it's done in the encoder.
Neither of these should have any significant performance
impact.
This simplifies the code and makes the buffer size checks
more consistent. Additionally, the previous version wrote
the extra space character without checking if it actually fit
into the buffer.
strlen is not dangerous if the string is known to be null
terminated (and MSVC does not warn about its use either).
For the cases in the decoder welsCodecTrace.cpp, the string
passed to all WriteString instances is produced by WelsVsnprintf
which always null terminates the buffer nowadays.
Additionally, as the string was passed to OutputDebugStringA
without any length specifier before, it was already assumed to
be null terminated.
The file name parameter passed to DumpDependencyRec and
DumpRecFrame in encoder.cpp is always null terminated,
which was already assumed as it is passed to WelsFopen as is.
As for the encoder utils.cpp, the strings returned by GetLogPath
are string constants that are null terminated.
As long as WelsFileHandle* is equal to FILE* this doesn't matter,
but for consistency use the WelsF* functions for all handles
opened by WelsFopen, and use WelsFileHandle* as type for it
instead of FILE*.
Both encoder and decoder versions were functionally equivalent,
and I picked the decoder version (adding the static inline
keywords to it) since the encoder one was quite messy with a lot
of commented out code.
Instead of using "defined(MSC_VER) || defined(__MINGW32__)" to
indicate the windows platform, just check for the _WIN32 define
instead.
Also remove an unused codepath - the removed codepath would
only be used under the condition
"(defined(MSC_VER) || defined(__MINGW32__)) && !defined(_WIN32)",
and I'm not aware of any environment with MSVC or MinGW that
doesn't define _WIN32, thus this codepath never was used.
This fixes two separate issues.
First, with the MSVC _snprintf implementations, the return value
is negative if the buffer wasn't large enough - this would in
the worst case lead to making iBufferUsed negative, writing before
the start of the buffer.
Secondly, when both iBufferUsed and iBufferLeft are accumulated,
one can't do "iBufferLeft -= iBufferUsed;". As an example,
say the buffer is 100 bytes in total and iBufferLeft is 40 and
iBufferUsed is 60. If SNPRINTF then writes 5 more bytes to the
buffer, iBufferUsed would be 65, but if we now do
"iBufferLeft -= iBufferUsed;" then iBufferLeft would end up as
-25 even though there are 35 bytes left in the buffer to use.
Therefore, we use a separate variable to store the return value
from the latest SNPRINTF call. This is checked to make sure it
wasn't negative, and only this amount is added to iBufferUsed
and subtracted from iBufferLeft.
This is the same pattern used in codec/encoder/core/src/utils.cpp.
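The pattern can be sketched as below. BufferCursor and AppendInt are
hypothetical names, and the sketch uses the C99 style std::snprintf, so
it also treats a return value that doesn't fit in the remaining space
as a failure (with the MSVC _snprintf variants that case shows up as a
negative value instead).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical bookkeeping for a buffer that is filled piece by piece.
struct BufferCursor {
  char*   pBuf;
  int32_t iBufferUsed;  // bytes written so far
  int32_t iBufferLeft;  // bytes still available
};

// Appends a decimal number. The return value of the latest snprintf call
// is kept in its own variable; only that amount is added to iBufferUsed
// and subtracted from iBufferLeft.
inline bool AppendInt(BufferCursor* pCursor, int iValue) {
  if (pCursor->iBufferLeft <= 0)
    return false;
  const int iWritten = std::snprintf(pCursor->pBuf + pCursor->iBufferUsed,
                                     static_cast<size_t>(pCursor->iBufferLeft),
                                     "%d", iValue);
  if (iWritten < 0 || iWritten >= pCursor->iBufferLeft)
    return false;  // error or truncated output: leave the counters as is
  pCursor->iBufferUsed += iWritten;
  pCursor->iBufferLeft -= iWritten;
  return true;
}
```

Using the numbers from the example above, starting at 60 used and 40
left and appending 5 characters ends at 65 used and 35 left.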
strftime never returns negative numbers, so those calls don't
need as much checking.
Checking iBufferLeft > iBufferUsed does not make sense, since
this would stop writing into the buffer already after the buffer
is half full, when there is less space left than has been used.
The right check is iBufferLeft > 0.
The following pattern is unsafe on all platforms:
    n = SNPRINTF(buf, ...);
    buf[n] = '\0';
On windows, the _snprintf variants return a negative number
if the buffer was too small, thus buf[n] would be outside
of (before the start of) the buffer.
On other platforms, the C99 snprintf function returns the
total number of characters which would have been written if
the buffer had been large enough, which can be larger than
the buffer size itself, and thus buf[n] would be beyond the
end of the buffer.
The C99 snprintf function always null terminates the buffer.
These invocations of SNPRINTF are within !WIN32, so we can
be sure that the SNPRINTF call itself already null terminated
the buffer.
The decoder used WelsMedian while the encoder used WELS_MEDIAN.
The former has two different implementations; WELS_MEDIAN was
identical to the disabled version of WelsMedian.
Settle on using the same implementation for both decoder and
encoder - whichever version of the implementations is faster
should be used for both.
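For reference, one common way to write a median of three values is
shown below. Median3 is a hypothetical name, and this is not claimed to
match either of the existing implementations.

```cpp
#include <algorithm>
#include <cstdint>

// Returns the middle one of three values using min/max only.
inline int32_t Median3(int32_t iA, int32_t iB, int32_t iC) {
  return std::max(std::min(iA, iB), std::min(std::max(iA, iB), iC));
}
```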
All functions that are assigned to function pointers with this
typedef (WelsHadamardQuant2x2Skip_c and WelsHadamardQuant2x2Skip_mmx)
use int32_t instead of BOOL_T for the return value.
No code exists within the project for building such a trace library.
This also fixes building on OS X with -Wno-deprecated-declarations
removed, since this code contained calls to deprecated functions
within #ifdef MACOS, which now are enabled when building on OS X.
bundleloader.h, which is included if MACOS is defined, defines
inline functions that reference bundle loading system functions,
which requires linking to the core foundation framework.
Avoid requiring linking to extra libraries/frameworks if
NO_DYNAMIC_VP is defined.
Add a struct that matches the C++ interface vtable.
This requires that the C++ interface methods are declared to use
the same calling convention as normal C functions, and that the
C struct exactly matches the layout and ordering of the C++
virtual table - MSVC seemed to reorder methods if there were
overloaded methods.
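The C side of such an interface can be sketched as a struct of
function pointers whose order defines the slot order. All names below
(ICounter, ICounterVtbl, CounterImpl and the functions) are
hypothetical. The table is built by hand here, so the sketch does not
depend on how any particular compiler lays out a real C++ virtual
table; it only shows the shape the C struct has to have.

```cpp
// An "object" is seen by C code as a pointer to its table of methods.
struct ICounterVtbl;
typedef const ICounterVtbl* ICounter;

// The member order is the slot order; it must match the table exactly.
struct ICounterVtbl {
  int (*Add)(ICounter* pSelf, int iDelta);
  int (*Get)(ICounter* pSelf);
};

// Concrete object: the table pointer has to be the first member.
struct CounterImpl {
  ICounter pVtbl;
  int      iValue;
};

static int CounterAdd(ICounter* pSelf, int iDelta) {
  CounterImpl* pImpl = reinterpret_cast<CounterImpl*>(pSelf);
  pImpl->iValue += iDelta;
  return pImpl->iValue;
}

static int CounterGet(ICounter* pSelf) {
  return reinterpret_cast<CounterImpl*>(pSelf)->iValue;
}

static const ICounterVtbl kCounterVtbl = { CounterAdd, CounterGet };

// Points the object at its method table and resets its state.
inline void CounterInit(CounterImpl* pImpl) {
  pImpl->pVtbl  = &kCounterVtbl;
  pImpl->iValue = 0;
}
```

A C caller would then invoke a method as (*pObj)->Add(pObj, 5), where
pObj is an ICounter*.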
This is required to make the order in the C++ virtual table
consistent in MSVC - previously the overloaded methods were
ordered differently in the vtable compared to the interface
declaration.
Commit f38111d76b updated these files
manually (based on older versions of them) to something not generated
by the current mktargets.py/sh, losing the compact pattern rules.
Previously this used _MSC_VER && !WIN64 to enable the inline
assembly, which meant the code was still used on windows on arm.
Using _MSC_VER && _M_IX86 is enough since _M_IX86 is defined only
when targeting 32 bit x86, not for x64.
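The resulting condition, shown with a hypothetical helper
(UsesMsvcX86InlineAsm) in place of the actual inline assembly:

```cpp
// Reports whether the MSVC inline assembly path would be compiled in.
inline bool UsesMsvcX86InlineAsm() {
#if defined(_MSC_VER) && defined(_M_IX86)
  return true;   // MSVC targeting 32 bit x86 only
#else
  return false;  // x64, arm, and all other compilers
#endif
}
```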
astyle was only run on .cpp files this time - already in
ff6b66917 where the style cleanup was done initially, not all
.h files seem to have gotten the same styling (rerunning astyle
on .h files at that commit produces a huge diff).