
[05/16] Fix a couple of signal issues

Message ID 1367531197-8987-6-git-send-email-bmarzins@redhat.com (mailing list archive)
State Deferred, archived

Commit Message

Benjamin Marzinski May 2, 2013, 9:46 p.m. UTC
This patch cleans up a couple of signal issues.
First, when the vecs locking around reconfigure() got shuffled
around earlier, the locking was removed from sighup. This patch
restores it.

Second, a new sigusr1 handler was created. However the existing
one was never removed.  Since signal handlers are per-process, and
not per-thread, the original handler will get overwritten by the
new one, so this patch deletes the original handler.

Third, sighup takes the vecs lock and sigusr1 takes logq_lock.
However, these signals weren't being blocked before threads took
those locks.  This patch blocks those signals while the locks are
held, to avoid deadlocks.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/log_pthread.c |  3 +++
 multipathd/main.c          | 12 +++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

Comments

Bart Van Assche May 3, 2013, 6:36 a.m. UTC | #1
On 05/02/13 23:46, Benjamin Marzinski wrote:
> The patch cleans up some signal issues.
> First, when the vecs locking around reconfigure got shuffled
> around earlier, it was removed from sighup. This patch restores
> that.
>
> Second, a new sigusr1 handler was created. However the existing
> one was never removed.  Since signal handlers are per-process, and
> not per-thread, the original handler will get overwritten by the
> new one, so this patch deletes the original handler.
>
> Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
> However, these signals weren't being blocked before threads locked
> those locks.  This patch blocks those signals while those locks
> are being taken to avoid locking deadlocks.

Are you aware that POSIX does not allow any locking function to be 
invoked from inside a signal handler? See e.g. 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 
and the text that starts with "The following table defines a set of 
functions that shall be async-signal-safe. Therefore, applications can 
invoke them, without restriction, from signal-catching functions" for a 
list of C library functions that may be invoked from inside a signal 
handler.

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski May 3, 2013, 8:24 p.m. UTC | #2
On Fri, May 03, 2013 at 08:36:19AM +0200, Bart Van Assche wrote:
> On 05/02/13 23:46, Benjamin Marzinski wrote:
>> The patch cleans up some signal issues.
>> First, when the vecs locking around reconfigure got shuffled
>> around earlier, it was removed from sighup. This patch restores
>> that.
>>
>> Second, a new sigusr1 handler was created. However the existing
>> one was never removed.  Since signal handlers are per-process, and
>> not per-thread, the original handler will get overwritten by the
>> new one, so this patch deletes the original handler.
>>
>> Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
>> However, these signals weren't being blocked before threads locked
>> those locks.  This patch blocks those signals while those locks
>> are being taken to avoid locking deadlocks.
>
> Are you aware that POSIX does not allow any locking function to be invoked 
> from inside a signal handler? See e.g. 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 
> and the text that starts with "The following table defines a set of 
> functions that shall be async-signal-safe. Therefore, applications can 
> invoke them, without restriction, from signal-catching functions" for a 
> list of C library functions that may be invoked from inside a signal 
> handler.

Um, no. I clearly wasn't aware of that. Nuts. So is the risk that a
signal will arrive in some thread that's not blocking it, and its
handler will try to acquire the pthread mutex at the same time as
another thread (one that is blocking the signal) is trying to acquire
the same mutex, and that this will corrupt the lock?

The Ubuntu man page for pthread_mutex_lock says

"The mutex functions are not async-signal safe. What this means is that
 they should not be called from a signal handler. In particular, calling
 pthread_mutex_lock or pthread_mutex_unlock from a signal handler may
 deadlock the calling thread."

If that's the only risk, then blocking the signal before we lock the
mutexes should avoid the issue.  Since the point of mutexes is to deal
with multiple threads that are trying to acquire them at the same time,
it seems like they should be able to handle the case where one of the
threads happens to be in a signal handler. Also, the locks in sighup
were there for a long time before they were removed, and while I have
definitely seen deadlocks when we forget to mask the signal before
locking on that thread, I've never encountered a bug that seems to be
related to the kind of lock corruption I speculated about at the
beginning of my reply.

Now, since my patch was meant to fix some potential deadlocks from
using pthread_mutex_lock() in signal handlers, I do realize that the
way we are doing things isn't the safest.  Also, it seems pretty
obvious from looking at the URL you posted that we are in
undefined-behavior territory, where if we are safe, it's solely because
of some unspecified parts of the implementation. So I agree, we aren't
following the spec, and I'll work on redesigning things to avoid this.

But the locking I added in this patch fixes corruption that definitely
happens, and is very much able to crash multipathd.  I still think this
patch should go in, since those locks were previously there for a long
time, and I can easily crash multipathd by repeatedly sending it
sighups without this patch.

Does that sound reasonable?

-Ben
>
> Bart.

Bart Van Assche May 4, 2013, 7:19 a.m. UTC | #3
On 05/03/13 22:24, Benjamin Marzinski wrote:
> On Fri, May 03, 2013 at 08:36:19AM +0200, Bart Van Assche wrote:
>> On 05/02/13 23:46, Benjamin Marzinski wrote:
>>> The patch cleans up some signal issues.
>>> First, when the vecs locking around reconfigure got shuffled
>>> around earlier, it was removed from sighup. This patch restores
>>> that.
>>>
>>> Second, a new sigusr1 handler was created. However the existing
>>> one was never removed.  Since signal handlers are per-process, and
>>> not per-thread, the original handler will get overwritten by the
>>> new one, so this patch deletes the original handler.
>>>
>>> Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
>>> However, these signals weren't being blocked before threads locked
>>> those locks.  This patch blocks those signals while those locks
>>> are being taken to avoid locking deadlocks.
>>
>> Are you aware that POSIX does not allow any locking function to be invoked
>> from inside a signal handler? See e.g.
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>> and the text that starts with "The following table defines a set of
>> functions that shall be async-signal-safe. Therefore, applications can
>> invoke them, without restriction, from signal-catching functions" for a
>> list of C library functions that may be invoked from inside a signal
>> handler.
>
> Um. no. I clearly wasn't aware of that. Nuts. So is the risk that a
> signal will come into some thread that's not blocking it, and try to
> acquire the pthread mutex at the same time as another thread that is
> blocking the signal is trying to acquire the same mutex and that this
> will corrupt the lock?
>
> The Ubuntu man page for pthread_mutex_lock says
>
> "The mutex functions are not async-signal safe. What this means is that
>   they should not be called from a signal handler. In particular, calling
>   pthread_mutex_lock or pthread_mutex_unlock from a signal handler may
>   deadlock the calling thread."
>
> If that's the only risk, then blocking the signal before we lock the
> mutexes should avoid the issue.  Since the point of mutexes is to deal
> with multiple threads that are trying to acquire them at the same time,
> it seems like they should be able to handle this when one of the threads
> happens to be in a signal handler. Also, the locks in sighup were there
> for a long time before they were removed, and while I have definitely
> seen deadlocks when we don't remember to mask the signal before locking
> on that thread, I've never encountered a bug that seems to be related to
> a locking corruption like I speculated in the beginning of my reply.
>
> Now, since my patch was to fix some potential deadlocks from using
> pthread_mutex_locks in signal handlers, I do realize that the way
> we are doing things isn't the safest way around.  Also, it seems pretty
> obvious from looking at the URL you posted that we are in undefined
> behavior territory, where if we are safe, it's solely because of
> some unspecified parts of the implementation. So I agree, we aren't
> following the specs, and I'll work on redesigning things to avoid this.
>
> But the locking I added in this patch fixes corruption that definitely
> happens, and is very much able to crash multipathd.  I still think this
> patch should go in, since those locks were previously there for a long
> time, and I can easily crash multipathd by repeatedly sending it sighups
> without this patch.
>
> Does that sound reasonable?

Sorry, but in my opinion the patch at the start of this thread makes 
the approach for signal handling in multipathd more complex and harder 
to maintain than strictly necessary. Will e.g. the next person who 
modifies this code be aware of all these subtleties? The approach I 
follow for handling signals in multithreaded code that I maintain 
myself is to let each signal handler write some data into a pipe, wait 
on that pipe in the main thread and let the main thread take the 
appropriate action. This approach is easy to maintain, does not require 
blocking and unblocking signals at runtime, is portable between Unix 
systems, and is POSIX-compliant. See e.g. 
http://github.com/bvanassche/srptools/commit/b6589892206ca628dbbb7fd8a1d613bf4f442ee6 
for an example. Note: a possible alternative is to use signalfd() 
instead of creating a pipe.

Bart.


Patch

diff --git a/libmultipath/log_pthread.c b/libmultipath/log_pthread.c
index b8f9119..b3daf8f 100644
--- a/libmultipath/log_pthread.c
+++ b/libmultipath/log_pthread.c
@@ -55,14 +55,17 @@  void log_safe (int prio, const char * fmt, va_list ap)
 
 void log_thread_flush (void)
 {
+	sigset_t old;
 	int empty;
 
 	do {
+		block_signal(SIGUSR1, &old);
 		pthread_mutex_lock(&logq_lock);
 		empty = log_dequeue(la->buff);
 		pthread_mutex_unlock(&logq_lock);
 		if (!empty)
 			log_syslog(la->buff);
+		pthread_sigmask(SIG_SETMASK, &old, NULL);
 	} while (empty == 0);
 }
 
diff --git a/multipathd/main.c b/multipathd/main.c
index 95264fc..f6e68e8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1467,7 +1467,9 @@  sighup (int sig)
 	if (running_state != DAEMON_RUNNING)
 		return;
 
+	lock(gvecs->lock);
 	reconfigure(gvecs);
+	unlock(gvecs->lock);
 
 #ifdef _DEBUG_
 	dbg_free_final(NULL);
@@ -1481,16 +1483,9 @@  sigend (int sig)
 }
 
 static void
-sigusr1 (int sig)
-{
-	condlog(3, "SIGUSR1 received");
-}
-
-static void
 signal_init(void)
 {
 	signal_set(SIGHUP, sighup);
-	signal_set(SIGUSR1, sigusr1);
 	signal_set(SIGINT, sigend);
 	signal_set(SIGTERM, sigend);
 	signal(SIGPIPE, SIG_IGN);
@@ -1646,6 +1641,7 @@  child (void * param)
 	 */
 	running_state = DAEMON_CONFIGURE;
 
+	block_signal(SIGHUP, &set);
 	lock(vecs->lock);
 	if (configure(vecs, 1)) {
 		unlock(vecs->lock);
@@ -1653,6 +1649,7 @@  child (void * param)
 		exit(1);
 	}
 	unlock(vecs->lock);
+	pthread_sigmask(SIG_SETMASK, &set, NULL);
 
 	/*
 	 * start threads
@@ -1685,6 +1682,7 @@  child (void * param)
 	 */
 	running_state = DAEMON_SHUTDOWN;
 	pthread_sigmask(SIG_UNBLOCK, &set, NULL);
+	block_signal(SIGUSR1, NULL);
 	block_signal(SIGHUP, NULL);
 	lock(vecs->lock);
 	if (conf->queue_without_daemon == QUE_NO_DAEMON_OFF)