Fix for Race condition can hang miniserver thread.

Add 'pendingWorkerThreadStart' field to the ThreadPool structure, to avoid
a race condition when waiting for a new thread to be created. The
race condition occurs when a thread is destroyed while the master
thread is waiting for a new thread to be created.

Thanks to Chuck Thomason for pointing out the problem.

Summary: Race condition can hang miniserver thread - ID: 3158591

Details:
Hello,

I have found a race condition in the thread pool handling of
libupnp-1.6.6 that periodically results in the miniserver thread
getting blocked indefinitely.

In my setup, I have the miniserver thread pool configured with 1
job per thread, 2 threads minimum, and 50 threads maximum.
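
For reference, that configuration corresponds roughly to the following
setup through the threadutil attribute API (a sketch only; it assumes the
TPAttr* helpers and ThreadPoolInit() declared in ThreadPool.h, and the
helper name init_miniserver_pool is purely illustrative):

    #include "ThreadPool.h"

    /* Sketch: a miniserver-style pool with 1 job per thread, 2 threads
     * minimum and 50 threads maximum, the values described above. */
    static int init_miniserver_pool(ThreadPool *tp)
    {
        ThreadPoolAttr attr;

        TPAttrInit(&attr);
        TPAttrSetMinThreads(&attr, 2);
        TPAttrSetMaxThreads(&attr, 50);
        TPAttrSetJobsPerThread(&attr, 1);

        /* ThreadPoolInit() spawns the minimum number of workers up front
         * via CreateWorker(), the function discussed below. */
        return ThreadPoolInit(tp, &attr);
    }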

Just before the lockup occurs, the miniserver thread pool contains
2 threads: one worker thread hanging around from a previous HTTP
request job (let's call that thread "old_worker") and the
miniserver thread itself.

A new HTTP request comes in. Accordingly, the miniserver enters
schedule_request_job() and then ThreadPoolAdd(). In
ThreadPoolAdd(), the job gets added to the medium-priority queue,
and AddWorker() is called. In AddWorker(), jobs = 1 and threads =
1, so CreateWorker gets called.
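
The decision AddWorker() makes can be summarised by the following
simplified sketch (not the verbatim source; the helper name is
illustrative, and jobs and threads are assumed to be passed in already
counted, with tp->mutex held as in the real function):

    /* Sketch of AddWorker()'s ratio check: spawn workers while the queued
     * jobs outnumber the non-persistent workers by jobsPerThread or more. */
    static void add_worker_if_needed(ThreadPool *tp, long jobs, long threads)
    {
        while (threads == 0 || jobs / threads >= tp->attr.jobsPerThread) {
            /* CreateWorker() returns non-zero at maxThreads or on
             * thread creation failure. */
            if (CreateWorker(tp) != 0)
                break;
            threads++;
        }
    }

With jobs = 1, threads = 1 and jobsPerThread = 1 the ratio test holds
once, so exactly one CreateWorker() call is made.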

When we enter CreateWorker(), tp->totalThreads is 2, so
currentThreads is 3. The function creates a new thread and then
blocks on tp->start_and_shutdown. The miniserver thread expects
the newly created thread to increment tp->totalThreads and then
signal the condition variable to wake up the miniserver thread and
let it proceed.
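
Condensed from the 1.6.6 CreateWorker() (attribute setup and most error
handling omitted; the caller holds tp->mutex), that wait looks roughly
like this:

    /* Condensed sketch of the pre-fix CreateWorker() wait. */
    static int CreateWorker_old_sketch(ThreadPool *tp)
    {
        ithread_t temp;
        int rc;
        int currentThreads = tp->totalThreads + 1;

        if (tp->attr.maxThreads != INFINITE_THREADS &&
            currentThreads > tp->attr.maxThreads)
            return EMAXTHREADS;
        rc = ithread_create(&temp, NULL, WorkerThread, tp);
        if (rc == 0) {
            rc = ithread_detach(temp);
            /* Wait for the new worker to raise tp->totalThreads to
             * currentThreads.  Nothing stops another thread from
             * lowering the counter again before this wait wakes up. */
            while (tp->totalThreads < currentThreads)
                ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
        }
        return rc;
    }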

The newly created thread starts in the WorkerThread() function. It
increments tp->totalThreads to 3, does a broadcast on the
start_and_shutdown condition, and starts running its job. However,
before the miniserver thread wakes up, "old_worker" times out. It
sees that there are no jobs in any queue and that the total number
of threads (3) is more than the minimum (2). As a result, it
reduces tp->totalThreads to 2 and dies.
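
Both transitions touch the same counter under tp->mutex. A simplified
sketch of the two WorkerThread() critical sections involved (not the
verbatim source; the helper names are illustrative):

    /* Startup: the new worker announces itself and wakes CreateWorker(). */
    static void worker_announce_start(ThreadPool *tp)
    {
        ithread_mutex_lock(&tp->mutex);
        tp->totalThreads++;
        ithread_cond_broadcast(&tp->start_and_shutdown);
        ithread_mutex_unlock(&tp->mutex);
    }

    /* Idle timeout: with no queued jobs and more than minThreads workers
     * alive, an idle worker removes itself.  This is the decrement that
     * "old_worker" performs while the miniserver thread is still asleep. */
    static int worker_exit_if_idle(ThreadPool *tp, long jobs_queued)
    {
        int exiting = 0;

        ithread_mutex_lock(&tp->mutex);
        if (jobs_queued == 0 && tp->totalThreads > tp->attr.minThreads) {
            tp->totalThreads--;
            ithread_cond_broadcast(&tp->start_and_shutdown);
            exiting = 1;
        }
        ithread_mutex_unlock(&tp->mutex);
        return exiting;
    }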

Now the miniserver thread finally wakes up. It checks
tp->totalThreads and sees that its value is 2, so it blocks on
tp->start_and_shutdown again. It has now "missed" seeing
tp->totalThreads get incremented to 3 and will never be unblocked
again.

When this issue does occur for a server device, the miniserver
port remains open, but becomes unresponsive since the miniserver
thread is stuck. SSDP alive messages keep getting sent out, as
they are handled by a separate thread. Reproducing the issue is
difficult due to the timing coincidence involved, but in my
environment I am presently seeing it at least once a day. I
figured out the sequence described above through the addition of my
own debug logs.

The relevant code involved in this bug has not changed
substantially in libupnp-1.6.10, though I am planning to test
against 1.6.10 as well in the near future.

Do you have any input for an elegant fix for this issue?

Thanks,

Chuck Thomason
Chandra Penke 2011-01-20 04:45:27 -02:00 committed by Marcelo Roberto Jimenez
parent 639d3a5a03
commit c4e9757bcf
3 changed files with 96 additions and 6 deletions

View File

@@ -2,6 +2,79 @@
Version 1.6.11
*******************************************************************************
2011-01-20 Chandra Penke <chandrapenke(at)mcntech.com>
Fix for Race condition can hang miniserver thread.
Add 'pendingWorkerThreadStart' field to the ThreadPool structure, to avoid
a race condition when waiting for a new thread to be created. The
race condition occurs when a thread is destroyed while the master
thread is waiting for a new thread to be created.
Thanks to Chuck Thomason for pointing out the problem.
Summary: Race condition can hang miniserver thread - ID: 3158591
Details:
Hello,
I have found a race condition in the thread pool handling of
libupnp-1.6.6 that periodically results in the miniserver thread
getting blocked indefinitely.
In my setup, I have the miniserver thread pool configured with 1
job per thread, 2 threads minimum, and 50 threads maximum.
Just before the lockup occurs, the miniserver thread pool contains
2 threads: one worker thread hanging around from a previous HTTP
request job (let's call that thread "old_worker") and the
miniserver thread itself.
A new HTTP request comes in. Accordingly, the miniserver enters
schedule_request_job() and then ThreadPoolAdd(). In
ThreadPoolAdd(), the job gets added to the medium-priority queue,
and AddWorker() is called. In AddWorker(), jobs = 1 and threads =
1, so CreateWorker gets called.
When we enter CreateWorker(), tp->totalThreads is 2, so
currentThreads is 3. The function creates a new thread and then
blocks on tp->start_and_shutdown. The miniserver thread expects
the newly created thread to increment tp->totalThreads and then
signal the condition variable to wake up the miniserver thread and
let it proceed.
The newly created thread starts in the WorkerThread() function. It
increments tp->totalThreads to 3, does a broadcast on the
start_and_shutdown condition, and starts running its job. However,
before the miniserver thread wakes up, "old_worker" times out. It
sees that there are no jobs in any queue and that the total number
of threads (3) is more than the minimum (2). As a result, it
reduces tp->totalThreads to 2 and dies.
Now the miniserver thread finally wakes up. It checks
tp->totalThreads and sees that its value is 2, so it blocks on
tp->start_and_shutdown again. It has now "missed" seeing
tp->totalThreads get incremented to 3 and will never be unblocked
again.
When this issue does occur for a server device, the miniserver
port remains open, but becomes unresponsive since the miniserver
thread is stuck. SSDP alive messages keep getting sent out, as
they are handled by a separate thread. Reproducing the issue is
difficult due to the timing coincidence involved, but in my
environment I am presently seeing it at least once a day. I
figured out the sequence described above through the addition of my
own debug logs.
The relevant code involved in this bug has not changed
substantially in libupnp-1.6.10, though I am planning to test
against 1.6.10 as well in the near future.
Do you have any input for an elegant fix for this issue?
Thanks,
Chuck Thomason
2011-01-16 Marcelo Roberto Jimenez <mroberto(at)users.sourceforge.net>
Define _FILE_OFFSET_BITS, _LARGEFILE_SOURCE and _LARGE_FILE_SOURCE in

View File

@@ -222,6 +222,8 @@ typedef struct THREADPOOL
int shutdown;
/*! total number of threads */
int totalThreads;
/*! flag that's set when waiting for a new worker thread to start */
int pendingWorkerThreadStart;
/*! number of threads that are currently executing jobs */
int busyThreads;
/*! number of persistent threads */

View File

@@ -97,7 +97,8 @@ static void StatsInit(
stats->workerThreads = 0;
stats->idleThreads = 0;
stats->persistentThreads = 0;
stats->maxThreads = 0; stats->totalThreads = 0;
stats->maxThreads = 0;
stats->totalThreads = 0;
}
/*!
@@ -459,6 +460,7 @@ static void *WorkerThread(
/* Increment total thread count */
ithread_mutex_lock(&tp->mutex);
tp->totalThreads++;
tp->pendingWorkerThreadStart = 0;
ithread_cond_broadcast(&tp->start_and_shutdown);
ithread_mutex_unlock(&tp->mutex);
@@ -602,6 +604,9 @@ static ThreadPoolJob *CreateThreadPoolJob(
* \brief Creates a worker thread, if the thread pool does not already have
* max threads.
*
* \remark The ThreadPool object mutex must be locked prior to calling this
* function.
*
* \internal
*
* \return
@@ -610,16 +615,20 @@ static ThreadPoolJob *CreateThreadPoolJob(
* \li \c EAGAIN if system can not create thread.
*/
static int CreateWorker(
/*! . */
/*! A pointer to the ThreadPool object. */
ThreadPool *tp)
{
ithread_t temp;
int rc = 0;
int currentThreads = tp->totalThreads + 1;
ithread_attr_t attr;
/* if a new worker is in the process of starting, wait until it fully starts */
while (tp->pendingWorkerThreadStart) {
ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
}
if (tp->attr.maxThreads != INFINITE_THREADS &&
currentThreads > tp->attr.maxThreads) {
tp->totalThreads + 1 > tp->attr.maxThreads) {
return EMAXTHREADS;
}
ithread_attr_init(&attr);
@@ -628,7 +637,9 @@ static int CreateWorker(
ithread_attr_destroy(&attr);
if (rc == 0) {
rc = ithread_detach(temp);
while (tp->totalThreads < currentThreads) {
tp->pendingWorkerThreadStart = 1;
/* wait until the new worker thread starts */
while (tp->pendingWorkerThreadStart) {
ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
}
}
@@ -643,10 +654,13 @@
* \brief Determines whether or not a thread should be added based on the
* jobsPerThread ratio. Adds a thread if appropriate.
*
* \remark The ThreadPool object mutex must be locked prior to calling this
* function.
*
* \internal
*/
static void AddWorker(
/*! . */
/*! A pointer to the ThreadPool object. */
ThreadPool *tp)
{
long jobs = 0;
@@ -709,6 +723,7 @@ int ThreadPoolInit(ThreadPool *tp, ThreadPoolAttr *attr)
tp->totalThreads = 0;
tp->busyThreads = 0;
tp->persistentThreads = 0;
tp->pendingWorkerThreadStart = 0;
for (i = 0; i < tp->attr.minThreads; ++i) {
retCode = CreateWorker(tp);
if (retCode) {
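
Taken together, the patch replaces CreateWorker()'s wait on a counter that
can move in both directions with a wait on a flag that only the new worker
clears. A condensed sketch of the fixed handshake (names as in the diff
above; the rest of both functions is omitted, and the sketch function
names are illustrative):

    /* CreateWorker(), as patched; called with tp->mutex held. */
    static int CreateWorker_fixed_sketch(ThreadPool *tp)
    {
        ithread_t temp;
        int rc;

        /* Never overlap two pending worker starts. */
        while (tp->pendingWorkerThreadStart)
            ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
        if (tp->attr.maxThreads != INFINITE_THREADS &&
            tp->totalThreads + 1 > tp->attr.maxThreads)
            return EMAXTHREADS;
        rc = ithread_create(&temp, NULL, WorkerThread, tp);
        if (rc == 0) {
            rc = ithread_detach(temp);
            tp->pendingWorkerThreadStart = 1;
            /* Only the new worker clears the flag, so an idle thread
             * exiting in this window can no longer make the wait
             * condition true again and hide the wakeup. */
            while (tp->pendingWorkerThreadStart)
                ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
        }
        return rc;
    }

    /* WorkerThread() startup, as patched. */
    static void worker_announce_start_fixed(ThreadPool *tp)
    {
        ithread_mutex_lock(&tp->mutex);
        tp->totalThreads++;
        tp->pendingWorkerThreadStart = 0;
        ithread_cond_broadcast(&tp->start_and_shutdown);
        ithread_mutex_unlock(&tp->mutex);
    }

Because the flag only transitions from 1 to 0 once the new worker has
registered itself, the miniserver thread can no longer miss the wakeup,
even if "old_worker" exits in the same window.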