This avoids using a separate thread for handling pUpdateMbListEvent
events, and later allowing using the encode exit event on unix instead
of pthread cancellation.
This allows using the same codepath for both unix and windows
for distributing new slices to code to threads.
This also improves the performance on unix - instead of waiting
for all the current threads to finish their current slice
before handing out a new slice to each of them (where the threads
that finish first will just wait instead of immediately getting
a new slice to work on), we now use the same logic as on windows.
In one setup, it improves the performance of encoding from ~920 fps
to ~950 fps, and in another setup it goes from ~390 fps to ~660 fps.
(These tests were done with the SM_ROWMB_SLICE mode, which
heavily exercises the code for distributing new slices to the
worker threads.)
The extra WelsEventSignal call on windows where it isn't strictly
necessary doesn't incur any measurable slowdown, so it is kept
without any extra ifdefs to keep the code more readable and unified.
Typedeffing WELS_EVENT as sem_t* makes the typedef behave similarly
to the windows version (typedeffed as HANDLE), unifying the code
that allocates and uses these event objects (getting rid of
most of the need for separate codepaths and ifdefs).
The two different variants of the threadlib basically are
win32 and unix - use _WIN32 to check for this consistently,
instead of occasionally using __GNUC__ to enable the unix
codepath. (__GNUC__ is also defined on mingw, which still is
a windows platform and should use the _WIN32 code.)