Fix for Race condition can hang miniserver thread.

Add 'pendingWorkerThreadStart' field to the ThreadPool structure, to avoid
a race condition when waiting for a new thread to be created. The
race condition occurs when a thread is destroyed while the master
thread is waiting for a new thread to be created.

Thanks to Chuck Thomason for pointing out the problem.

Summary: Race condition can hang miniserver thread - ID: 3158591

Details:
Hello,

I have found a race condition in the thread pool handling of
libupnp-1.6.6 that periodically results in the miniserver thread
getting blocked indefinitely.

In my setup, I have the miniserver thread pool configured with 1
job per thread, 2 threads minimum, and 50 threads maximum.
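
For reference, that configuration corresponds roughly to the following
setup through the threadutil attribute API (a sketch only; it assumes the
TPAttr* helpers and ThreadPoolInit() declared in ThreadPool.h, and the
helper name init_miniserver_pool is purely illustrative):

    #include "ThreadPool.h"

    /* Sketch: a miniserver-style pool with 1 job per thread, 2 threads
     * minimum and 50 threads maximum, the values described above. */
    static int init_miniserver_pool(ThreadPool *tp)
    {
        ThreadPoolAttr attr;

        TPAttrInit(&attr);
        TPAttrSetMinThreads(&attr, 2);
        TPAttrSetMaxThreads(&attr, 50);
        TPAttrSetJobsPerThread(&attr, 1);

        /* ThreadPoolInit() spawns the minimum number of workers up front
         * via CreateWorker(), the function discussed below. */
        return ThreadPoolInit(tp, &attr);
    }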

Just before the lockup occurs, the miniserver thread pool contains
2 threads: one worker thread hanging around from a previous HTTP
request job (let's call that thread "old_worker") and the
miniserver thread itself.

A new HTTP request comes in. Accordingly, the miniserver enters
schedule_request_job() and then ThreadPoolAdd(). In
ThreadPoolAdd(), the job gets added to the medium-priority queue,
and AddWorker() is called. In AddWorker(), jobs = 1 and threads =
1, so CreateWorker gets called.
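
The decision AddWorker() makes can be summarised by the following
simplified sketch (not the verbatim source; the helper name is
illustrative, and jobs and threads are assumed to be passed in already
counted, with tp->mutex held as in the real function):

    /* Sketch of AddWorker()'s ratio check: spawn workers while the queued
     * jobs outnumber the non-persistent workers by jobsPerThread or more. */
    static void add_worker_if_needed(ThreadPool *tp, long jobs, long threads)
    {
        while (threads == 0 || jobs / threads >= tp->attr.jobsPerThread) {
            /* CreateWorker() returns non-zero at maxThreads or on
             * thread creation failure. */
            if (CreateWorker(tp) != 0)
                break;
            threads++;
        }
    }

With jobs = 1, threads = 1 and jobsPerThread = 1 the ratio test holds
once, so exactly one CreateWorker() call is made.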

When we enter CreateWorker(), tp->totalThreads is 2, so
currentThreads is 3. The function creates a new thread and then
blocks on tp->start_and_shutdown. The miniserver thread expects
the newly created thread to increment tp->totalThreads and then
signal the condition variable to wake up the miniserver thread and
let it proceed.
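
Condensed from the 1.6.6 CreateWorker() (attribute setup and most error
handling omitted; the caller holds tp->mutex), that wait looks roughly
like this:

    /* Condensed sketch of the pre-fix CreateWorker() wait. */
    static int CreateWorker_old_sketch(ThreadPool *tp)
    {
        ithread_t temp;
        int rc;
        int currentThreads = tp->totalThreads + 1;

        if (tp->attr.maxThreads != INFINITE_THREADS &&
            currentThreads > tp->attr.maxThreads)
            return EMAXTHREADS;
        rc = ithread_create(&temp, NULL, WorkerThread, tp);
        if (rc == 0) {
            rc = ithread_detach(temp);
            /* Wait for the new worker to raise tp->totalThreads to
             * currentThreads.  Nothing stops another thread from
             * lowering the counter again before this wait wakes up. */
            while (tp->totalThreads < currentThreads)
                ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
        }
        return rc;
    }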

The newly created thread starts in the WorkerThread() function. It
increments tp->totalThreads to 3, does a broadcast on the
start_and_shutdown condition, and starts running its job. However,
before the miniserver thread wakes up, "old_worker" times out. It
sees that there are no jobs in any queue and that the total number
of threads (3) is more than the minimum (2). As a result, it
reduces tp->totalThreads to 2 and dies.
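
Both transitions touch the same counter under tp->mutex. A simplified
sketch of the two WorkerThread() critical sections involved (not the
verbatim source; the helper names are illustrative):

    /* Startup: the new worker announces itself and wakes CreateWorker(). */
    static void worker_announce_start(ThreadPool *tp)
    {
        ithread_mutex_lock(&tp->mutex);
        tp->totalThreads++;
        ithread_cond_broadcast(&tp->start_and_shutdown);
        ithread_mutex_unlock(&tp->mutex);
    }

    /* Idle timeout: with no queued jobs and more than minThreads workers
     * alive, an idle worker removes itself.  This is the decrement that
     * "old_worker" performs while the miniserver thread is still asleep. */
    static int worker_exit_if_idle(ThreadPool *tp, long jobs_queued)
    {
        int exiting = 0;

        ithread_mutex_lock(&tp->mutex);
        if (jobs_queued == 0 && tp->totalThreads > tp->attr.minThreads) {
            tp->totalThreads--;
            ithread_cond_broadcast(&tp->start_and_shutdown);
            exiting = 1;
        }
        ithread_mutex_unlock(&tp->mutex);
        return exiting;
    }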

Now the miniserver thread finally wakes up. It checks
tp->totalThreads and sees that its value is 2, so it blocks on
tp->start_and_shutdown again. It has now "missed" seeing
tp->totalThreads get incremented to 3 and will never be unblocked
again.

When this issue does occur for a server device, the miniserver
port remains open, but becomes unresponsive since the miniserver
thread is stuck. SSDP alive messages keep getting sent out, as
they are handled by a separate thread. Reproducing the issue is
difficult due to the timing coincidence involved, but in my
environment I am presently seeing it at least once a day. I
figured out the sequence described above through the addition of my
own debug logs.

The relevant code involved in this bug has not changed
substantially in libupnp-1.6.10, though I am planning to test
against 1.6.10 as well in the near future.

Do you have any input for an elegant fix for this issue?

Thanks,

Chuck Thomason
Chandra Penke 2011-01-20 04:45:27 -02:00 committed by Marcelo Roberto Jimenez
parent 639d3a5a03
commit c4e9757bcf
3 changed files with 96 additions and 6 deletions

View File

@@ -2,6 +2,79 @@
Version 1.6.11
*******************************************************************************
2011-01-20 Chandra Penke <chandrapenke(at)mcntech.com>
Fix for Race condition can hang miniserver thread.
Add 'pendingWorkerThreadStart' field to the ThreadPool structure, to avoid
a race condition when waiting for a new thread to be created. The
race condition occurs when a thread is destroyed while the master
thread is waiting for a new thread to be created.
Thanks to Chuck Thomason for pointing out the problem.
Summary: Race condition can hang miniserver thread - ID: 3158591
Details:
Hello,
I have found a race condition in the thread pool handling of
libupnp-1.6.6 that periodically results in the miniserver thread
getting blocked indefinitely.
In my setup, I have the miniserver thread pool configured with 1
job per thread, 2 threads minimum, and 50 threads maximum.
Just before the lockup occurs, the miniserver thread pool contains
2 threads: one worker thread hanging around from a previous HTTP
request job (let's call that thread "old_worker") and the
miniserver thread itself.
A new HTTP request comes in. Accordingly, the miniserver enters
schedule_request_job() and then ThreadPoolAdd(). In
ThreadPoolAdd(), the job gets added to the medium-priority queue,
and AddWorker() is called. In AddWorker(), jobs = 1 and threads =
1, so CreateWorker gets called.
When we enter CreateWorker(), tp->totalThreads is 2, so
currentThreads is 3. The function creates a new thread and then
blocks on tp->start_and_shutdown. The miniserver thread expects
the newly created thread to increment tp->totalThreads and then
signal the condition variable to wake up the miniserver thread and
let it proceed.
The newly created thread starts in the WorkerThread() function. It
increments tp->totalThreads to 3, does a broadcast on the
start_and_shutdown condition, and starts running its job. However,
before the miniserver thread wakes up, "old_worker" times out. It
sees that there are no jobs in any queue and that the total number
of threads (3) is more than the minimum (2). As a result, it
reduces tp->totalThreads to 2 and dies.
Now the miniserver thread finally wakes up. It checks
tp->totalThreads and sees that its value is 2, so it blocks on
tp->start_and_shutdown again. It has now "missed" seeing
tp->totalThreads get incremented to 3 and will never be unblocked
again.
When this issue does occur for a server device, the miniserver
port remains open, but becomes unresponsive since the miniserver
thread is stuck. SSDP alive messages keep getting sent out, as
they are handled by a separate thread. Reproducing the issue is
difficult due to the timing coincidence involved, but in my
environment I am presently seeing it at least once a day. I
figured out the sequence described above through the addition of my
own debug logs.
The relevant code involved in this bug has not changed
substantially in libupnp-1.6.10, though I am planning to test
against 1.6.10 as well in the near future.
Do you have any input for an elegant fix for this issue?
Thanks,
Chuck Thomason
2011-01-16 Marcelo Roberto Jimenez <mroberto(at)users.sourceforge.net>
Define _FILE_OFFSET_BITS, _LARGEFILE_SOURCE and _LARGE_FILE_SOURCE in

View File

@@ -222,6 +222,8 @@ typedef struct THREADPOOL
int shutdown;
/*! total number of threads */
int totalThreads;
/*! flag that's set when waiting for a new worker thread to start */
int pendingWorkerThreadStart;
/*! number of threads that are currently executing jobs */
int busyThreads;
/*! number of persistent threads */

View File

@@ -97,7 +97,8 @@ static void StatsInit(
stats->workerThreads = 0;
stats->idleThreads = 0;
stats->persistentThreads = 0;
stats->maxThreads = 0; stats->totalThreads = 0;
stats->maxThreads = 0;
stats->totalThreads = 0;
}
/*!
@@ -459,6 +460,7 @@ static void *WorkerThread(
/* Increment total thread count */
ithread_mutex_lock(&tp->mutex);
tp->totalThreads++;
tp->pendingWorkerThreadStart = 0;
ithread_cond_broadcast(&tp->start_and_shutdown);
ithread_mutex_unlock(&tp->mutex);
@@ -602,6 +604,9 @@ static ThreadPoolJob *CreateThreadPoolJob(
* \brief Creates a worker thread, if the thread pool does not already have
* max threads.
*
* \remark The ThreadPool object mutex must be locked prior to calling this
* function.
*
* \internal
*
* \return
@@ -610,16 +615,20 @@ static ThreadPoolJob *CreateThreadPoolJob(
* \li \c EAGAIN if system can not create thread.
*/
static int CreateWorker(
/*! . */
/*! A pointer to the ThreadPool object. */
ThreadPool *tp)
{
ithread_t temp;
int rc = 0;
int currentThreads = tp->totalThreads + 1;
ithread_attr_t attr;
/* if a new worker is in the process of starting, wait until it fully starts */
while (tp->pendingWorkerThreadStart) {
ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
}
if (tp->attr.maxThreads != INFINITE_THREADS &&
currentThreads > tp->attr.maxThreads) {
tp->totalThreads + 1 > tp->attr.maxThreads) {
return EMAXTHREADS;
}
ithread_attr_init(&attr);
@@ -628,7 +637,9 @@ static int CreateWorker(
ithread_attr_destroy(&attr);
if (rc == 0) {
rc = ithread_detach(temp);
while (tp->totalThreads < currentThreads) {
tp->pendingWorkerThreadStart = 1;
/* wait until the new worker thread starts */
while (tp->pendingWorkerThreadStart) {
ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
}
}
@@ -643,10 +654,13 @@
* \brief Determines whether or not a thread should be added based on the
* jobsPerThread ratio. Adds a thread if appropriate.
*
* \remark The ThreadPool object mutex must be locked prior to calling this
* function.
*
* \internal
*/
static void AddWorker(
/*! . */
/*! A pointer to the ThreadPool object. */
ThreadPool *tp)
{
long jobs = 0;
@@ -709,6 +723,7 @@ int ThreadPoolInit(ThreadPool *tp, ThreadPoolAttr *attr)
tp->totalThreads = 0;
tp->busyThreads = 0;
tp->persistentThreads = 0;
tp->pendingWorkerThreadStart = 0;
for (i = 0; i < tp->attr.minThreads; ++i) {
retCode = CreateWorker(tp);
if (retCode) {
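
Taken together, the patch replaces CreateWorker()'s wait on a counter that
can move in both directions with a wait on a flag that only the new worker
clears. A condensed sketch of the fixed handshake (names as in the diff
above; the rest of both functions is omitted, and the sketch function
names are illustrative):

    /* CreateWorker(), as patched; called with tp->mutex held. */
    static int CreateWorker_fixed_sketch(ThreadPool *tp)
    {
        ithread_t temp;
        int rc;

        /* Never overlap two pending worker starts. */
        while (tp->pendingWorkerThreadStart)
            ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
        if (tp->attr.maxThreads != INFINITE_THREADS &&
            tp->totalThreads + 1 > tp->attr.maxThreads)
            return EMAXTHREADS;
        rc = ithread_create(&temp, NULL, WorkerThread, tp);
        if (rc == 0) {
            rc = ithread_detach(temp);
            tp->pendingWorkerThreadStart = 1;
            /* Only the new worker clears the flag, so an idle thread
             * exiting in this window can no longer make the wait
             * condition true again and hide the wakeup. */
            while (tp->pendingWorkerThreadStart)
                ithread_cond_wait(&tp->start_and_shutdown, &tp->mutex);
        }
        return rc;
    }

    /* WorkerThread() startup, as patched. */
    static void worker_announce_start_fixed(ThreadPool *tp)
    {
        ithread_mutex_lock(&tp->mutex);
        tp->totalThreads++;
        tp->pendingWorkerThreadStart = 0;
        ithread_cond_broadcast(&tp->start_and_shutdown);
        ithread_mutex_unlock(&tp->mutex);
    }

Because the flag only transitions from 1 to 0 once the new worker has
registered itself, the miniserver thread can no longer miss the wakeup,
even if "old_worker" exits in the same window.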