Major bug fix in miniserver.c for IPv6, bug was introduced when
changing implementation of get_port in November 20th 2010 ("gena:fix
several compiler warnings" commit).
The files FreeList.h, LinkedList.h, ThreadPool.h and TimerThread.h
from the threautil library were being installed in the include
directory of the library, incorrectly exposing internal data structure
of the library.
Add 'requiredThreads' field to the ThreadPool structure, to avoid
a race condition when waiting for a new thread to be created. The
race condition occurs when a thread is destroyed while the master
thread is waiting for a new thread to be created.
Thanks to Chuck Thomason for pointing the problem.
Summary: Race condition can hang miniserver thread - ID: 3158591
Details:
Hello,
I have found a race condition in the thread pool handling of
libupnp-1.6.6 that periodically results in the miniserver thread
getting blocked infinitely.
In my setup, I have the miniserver thread pool configured with 1
job per thread, 2 threads minimum, and 50 threads maximum.
Just before the lockup occurs, the miniserver thread pool contains
2 threads: one worker thread hanging around from a previous HTTP
request job (let's call that thread "old_worker") and the
miniserver thread itself.
A new HTTP request comes in. Accordingly, the miniserver enters
schedule_request_job() and then ThreadPoolAdd(). In
ThreadPoolAdd(), the job gets added to the medium-priority queue,
and AddWorker() is called. In AddWorker(), jobs = 1 and threads =
1, so CreateWorker gets called.
When we enter CreateWorker(), tp->totalThreads is 2, so
currentThreads is 3. The function creates a new thread and then
blocks on tp->start_and_shutdown. The miniserver thread expects
the newly created thread to increment tp->totalThreads and then
signal the condition variable to wake up the miniserver thread and
let it proceed.
The newly created thread starts in the WorkerThread() function. It
increments tp->totalThreads to 3, does a broadcast on the
start_and_shutdown condition, and starts running its job. However,
before the miniserver thread wakes up, "old_worker" times out. It
sees that there are no jobs in any queue and that the total number
of threads (3) is more than the minimum (2). As a result, it
reduces tp->totalThreads to 2 and dies.
Now the miniserver thread finally wakes up. It checks
tp->totalThreads and sees that its value is 2, so it blocks on
tp->start_and_shutdown again. It has now "missed" seeing
tp->totalThreads get incremented to 3 and will never be unblocked
again.
When this issue does occur for a server device, the miniserver
port remains open, but becomes unresponsive since the miniserver
thread is stuck. SSDP alive messages keep getting sent out, as
they are handled by a separate thread. Reproducing the issue is
difficult due to the timing coincidence involved, but in my
environment I am presently seeing it at least once a day. I
figured out the sequence described above through addition of my
own debug logs.
The relevant code involved in this bug has not changed
substantially in libupnp-1.6.10, though I am planning to test
against 1.6.10 as well in the near future.
Do you have any input for an elegant fix for this issue?
Thanks,
Chuck Thomason
Details:
Hello. I trying compile libupnp-1.6.10 on the Fedora 14 MinGW
Environment and get many errors. I create patch to fix it. With this
patch i can get static library. This patch is very raw.
Submitted: Ivan Romanov (ivanromanov) - 2010-12-16 23:29:19 UTC
Currently, http_SendMessage was not able to write to write a buffer
due to a bad use of file_buf instead of buf. This bug was introduced by
the 0197-Doxygen-reformating-compiler-warnings patch.
Currently, Upnp_Event_Subscribe always contains an empty chain in the
Sid parameter. This patch now saves the client Subscription ID in this
parameter so Control Points can see and use the same SID in the
Upnp_Event_Subscribe and in the Upnp_Event structures.
Currently, in notify_send_and_recv function, pupnp waits for
HTTP_DEFAULT_TIMEOUT seconds when trying to send a GENA notification.
When there is a lot of notifications with CPs which was disconnected
without unsusbcribing, all the pupnp threads are blocked on this
timeout. To correct, this issue, this patch adds a new variable,
GENA_NOTIFICATION_SENDING_TIMEOUT, which can be used to lower the
timeout so GENA threads return quickly when writing is impossible. By
the same mean, pupnp waits the CP's answer to the NOTIFY for
HTTP_DEFAULT_TIMEOUT seconds, so this patch adds a new variable,
GENA_NOTIFICATION_ANSWERING_TIMEOUT, to customize this value.
Currently, pupnp is using a blocking connect to sends GENA
notifications. As a result, when there is a lot of notifications with
CPs which were disconnected without unsusbcribing, all the pupnp
threads are blocked for 20s (timeout). To correct this issue, this
patch replace the call to connect with a call to private_connect and add
a compilation flag to disable blocking TCP connections, so if we are not
able to connect to the CP, the notification is lost.
SF Bug Tracker - ID: 3104527
Submitted: OBATA Akio ( obache ) - 2010-11-07 07:10:28 BRST
In threadutil/inc/ithread.h, it is expected that
PTHREAD_MUTEX_RECURSIVE is defined as macro. But on DragonFly BSD,
it is defined as enum, so not works as expected.
Attachment patch treat that DragonFly BSD always
have PTHREAD_MUTEX_RECURSIVE.
SF Bug Tracker - ID: 3104521
Submitted: OBATA Akio ( obache ) - 2010-11-07 07:03:44 BRST
In configure.ac
AC_CHECK_FUNCS(ftime,, [AC_CHECK_LIB(compat, ftime)])
But since version 1.6.3, ftime(3) is not used, so it should be
removed, or introduce unwanted linkage with -lcompat.
Fix for bug introduced in samples code in svn revision 502, commit
git:25c908c558c8e60eb386c155a6b93add447ffec0
Sample device and combo were aborting with the message:
"***** SampleUtil_Initialize was called multiple times!"
Put the loop to send multiple copies of each SSDP advertisements in
ssdp_server.c instead of ssdp_device.c so we have only one call to
imillisleep ( SSDP_PAUSE ) to speed up advertisements.
Previously, NUM_COPY was used in ssdp_device.c to send multiple copies
of each advertisements but also multiple replies to each M-SEARCH
request. As sending multiple replies is not compliant with HTTPU/MU
spec, NUM_COPY has been set to 1 in an older patch. However, as this
variable is not needed and has been replaced with SSDP_COPY, it has
been removed.
Currently, SSDP_COPY is used only to send multiple M-SEARCH requests (in
ssdp_ctrlpt.c). With this patch, SSDP_COPY is also used to send multiple
copies of each advertisements packets (in ssdp_device.c).
By "Carl Benson" <carl.benson@windriver.com>:
I had to do some modifications myself though, because the Android
build system insists on having a file named "util.h" taking precedence
in its include path, libupnp gets confused because of the same filename
in upnp/src/inc/util.h
Fix a bug in miniserver.c, in which maxMiniSock was wrongly declared as
unsigned int and as a result it was beeng set to ((unsigned int)(-1)).
As a result, after beeing incremented, it became zero, and this value
was beeing used in the select() call.
Thanks to Fabrice Fontaine for helping and testing with this issue.
the sleep() call, it was just a workaround.
SF Bug Tracker [ 3086852 ] 99% CPU loop in miniserver.c on a non ipv6
system.
Submitted by: Jin ( jin_eld ) - 2010-10-13 19:29:13 UTC
I cross compiled libupnp 1.6.7 for ARM9 using the --disable-ipv6
option, my system is an ipv4 only setup.
I do not know why this problem only appears when running the app in the
background (for instance using nohup &), but then it starts using 99%
CPU.
I traced the problem down to the select() call in miniserver.c in the
RunMiniServer() function. Select returns code 1, but errno is set to
"Socket operation on non-socket", I also see this when running my app
under strace.
I set all ...Sock6 variables to INVALID_SOCKET to make sure that they
do not get added to the FD_SET and the problem is gone.
Adding a configure flag to disable GENA notification reordering as even
with an imillisleep(1), this mechanism consumes too much CPU on embedded
devices when there is a burst of notifications.
When a device with embedded devices (like IGD) when created and one of
the embedded devices did not have any service, there was a Segmentation
Fault (see SF Tracker [ 2688125 ]).
When a lot of notifications were generated by a device in a short
period of time then 100% of the CPU was used to reorder those
notifications by pushing back the thread in the job queue. This
mechanism has been modified so now thread sleep 1 ms before being
pushed back into the job queue.
Removing DEFAULT_SCHED_PARAM parameter and use
sched_get_priority_min(DEFAULT_POLICY) instead.
Devices must respond to M-SEARCH requests for any supported version and the
response should specify the same version as was contained in the search target.
Previously, the device did not answer if the M-SEARCH request did not
contain the same version number than the version number of the device.
This patch adds the WEB_SERVER_CONTENT_LANGUAGE parameter so the user can specify
the language used by the device during Description and Presentation steps of UPnP
through the HTTP CONTENT-LANGUAGE header.
By default, the WEB_SERVER_CONTENT_LANGUAGE is an empty string so no
CONTENT-LANGUAGE is added.
This patch allows a user to customize the stack size of the threads used by
pupnp through the new THREAD_STACK_SIZE variable. This is especially useful
on embedded systems with limited memory where the user can set THREAD_STACK_SIZE
to ITHREAD_STACK_MIN.
However, as this modification can have side effects, I set 0 as the default
value, so threads will continue to use the default stack size of the system
(which varies greatly as stated in
https://computing.llnl.gov/tutorials/pthreads/).
I discovered a reliable denial-of-service issue on the last stable
release of libupnp (1.6.6) remotely triggerable by any
unauthenticated user. The issue is related with a bad parsing of
malformed XML.
Sending messages over UDP is broken in some Apple OSes
such as OS X and iOS. This might be broken in other OSes to but didn't
verify.
The fix is to modify the socket lenght argument of sendto to use the correct
sockaddr lenght dependng on whether the socket is IPV4 or IPV6.
Also added some error checks and debugging related to the issue
crash. This happens when the file being downloaded exceeds the device
memory - entirely possible when transferring video files.
The programmatic cause is that the logic implemented in the function
http_ReadHttpGet (which UpnpReadHttpGet calls) reads the entire file
into memory. The fix modifies the existing logic to discard data after
it's been read; there's no reason to keep it around since the caller
of UpnpReadHttpGet already has a copy of it.
This issue exists in 1.6.6 as well as the latest sources.
Patch submitted by Chandra (inactiveneurons).
use connect() are broken. More specifically, connect() in these methods
is returning with an EINVAL. The programatic cause is that the address_len
argument passed to connect() is different in IPV4 vs IPV6 (as described in:
http://www.opengroup.org/onlinepubs/009695399/functions/connect.html).
The current code always uses the IPV6 size. The fix modifies each use of
connect() to use the correct size based on the address family being used.
Patch submitted by Chandra (inactiveneurons).