diff mbox

multipathd: fix inverted signal blocking logic

Message ID 20180302211807.11434-1-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Martin Wilck March 2, 2018, 9:18 p.m. UTC
On Fri, 2018-03-02 at 14:00 -0600, Benjamin Marzinski wrote:

> But
> in any case, after commit 534ec4c, we are only blocking SIGUSR2 on
> all
> threads, which is the only signal we don't need to block.  This means

I can see what you mean. Better like this, maybe?

multipathd is supposed to block all signals in all threads, except
the uxlsnr thread which handles termination and reconfiguration
signals (SIGUSR1) in its ppoll() call, SIGUSR2 in the waiter thread, and
occasional SIGALRM. The current logic does exactly the opposite, it blocks
termination signals in SIGPOLL and allows multipathd to be killed e.g.
by SIGALRM.

Fix that bin inverting the logic. The argument to pthread_sigmask and
ppoll is the set of *blocked* signals, not vice versa.

Fixes: 534ec4c "multipathd: Ensure that SIGINT, SIGTERM, SIGHUP and SIGUSR1
are delivered to the uxsock thread"

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c   |  7 +++++--
 multipathd/uxlsnr.c | 10 +++++-----
 2 files changed, 10 insertions(+), 7 deletions(-)

Comments

Bart Van Assche March 2, 2018, 9:35 p.m. UTC | #1
On Fri, 2018-03-02 at 22:18 +0100, Martin Wilck wrote:
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 61739ac6ea59..85ee9b713d75 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2270,10 +2270,13 @@ signal_init(void)
>  {
>  	sigset_t set;
>  
> -	sigemptyset(&set);
> -	sigaddset(&set, SIGUSR2);
> +	/* block all signals */
> +	sigfillset(&set);
> +	/* SIGPIPE occurs if logging fails */
> +	sigdelset(&set, SIGPIPE);
>  	pthread_sigmask(SIG_SETMASK, &set, NULL);

The modified code was introduced by commit 90dd424afa65 ("multipathd:
fix SIGUSR2 handling"). This change looks fine to me.

> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index 98ac25a68c43..a2ca36ba1653 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -170,11 +170,11 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  		condlog(0, "uxsock: failed to allocate poll fds");
>  		return NULL;
>  	}
> -	sigemptyset(&mask);
> -	sigaddset(&mask, SIGINT);
> -	sigaddset(&mask, SIGTERM);
> -	sigaddset(&mask, SIGHUP);
> -	sigaddset(&mask, SIGUSR1);
> +	sigfillset(&mask);
> +	sigdelset(&mask, SIGINT);
> +	sigdelset(&mask, SIGTERM);
> +	sigdelset(&mask, SIGHUP);
> +	sigdelset(&mask, SIGUSR1);
>  	while (1) {
>  		struct client *c, *tmp;
>  		int i, poll_count, num_clients;

This change looks more complicated to me than necessary. Have you considered
to pass an empty signal set as the fourth ppoll() argument?

Thanks,

Bart.



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 2, 2018, 10:15 p.m. UTC | #2
On Fri, 2018-03-02 at 21:35 +0000, Bart Van Assche wrote:
> On Fri, 2018-03-02 at 22:18 +0100, Martin Wilck wrote:
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 61739ac6ea59..85ee9b713d75 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2270,10 +2270,13 @@ signal_init(void)
> >  {
> >  	sigset_t set;
> >  
> > -	sigemptyset(&set);
> > -	sigaddset(&set, SIGUSR2);
> > +	/* block all signals */
> > +	sigfillset(&set);
> > +	/* SIGPIPE occurs if logging fails */
> > +	sigdelset(&set, SIGPIPE);
> >  	pthread_sigmask(SIG_SETMASK, &set, NULL);
> 
> The modified code was introduced by commit 90dd424afa65 ("multipathd:
> fix SIGUSR2 handling"). This change looks fine to me.
> 
> > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> > index 98ac25a68c43..a2ca36ba1653 100644
> > --- a/multipathd/uxlsnr.c
> > +++ b/multipathd/uxlsnr.c
> > @@ -170,11 +170,11 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >  		condlog(0, "uxsock: failed to allocate poll fds");
> >  		return NULL;
> >  	}
> > -	sigemptyset(&mask);
> > -	sigaddset(&mask, SIGINT);
> > -	sigaddset(&mask, SIGTERM);
> > -	sigaddset(&mask, SIGHUP);
> > -	sigaddset(&mask, SIGUSR1);
> > +	sigfillset(&mask);
> > +	sigdelset(&mask, SIGINT);
> > +	sigdelset(&mask, SIGTERM);
> > +	sigdelset(&mask, SIGHUP);
> > +	sigdelset(&mask, SIGUSR1);
> >  	while (1) {
> >  		struct client *c, *tmp;
> >  		int i, poll_count, num_clients;
> 
> This change looks more complicated to me than necessary. Have you
> considered
> to pass an empty signal set as the fourth ppoll() argument?

An empty set would mean that no signal is blocked during ppoll().
Therefore e.g. SIGALRM would terminate multipathd if it arrives
during the ppoll (no handler set, and default action is "Term").

Regards,
Martin
Bart Van Assche March 2, 2018, 10:23 p.m. UTC | #3
On Fri, 2018-03-02 at 23:15 +0100, Martin Wilck wrote:
> On Fri, 2018-03-02 at 21:35 +0000, Bart Van Assche wrote:
> > This change looks more complicated to me than necessary. Have you
> > considered
> > to pass an empty signal set as the fourth ppoll() argument?
> 
> An empty set would mean that no signal is blocked during ppoll().
> Therefore e.g. SIGALRM would terminate multipathd if it arrives
> during the ppoll (no handler set, and default action is "Term").

Have you considered to only block SIGALRM while ppoll() is in progress?

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 2, 2018, 11:16 p.m. UTC | #4
On Fri, 2018-03-02 at 22:23 +0000, Bart Van Assche wrote:
> On Fri, 2018-03-02 at 23:15 +0100, Martin Wilck wrote:
> > On Fri, 2018-03-02 at 21:35 +0000, Bart Van Assche wrote:
> > > This change looks more complicated to me than necessary. Have you
> > > considered
> > > to pass an empty signal set as the fourth ppoll() argument?
> > 
> > An empty set would mean that no signal is blocked during ppoll().
> > Therefore e.g. SIGALRM would terminate multipathd if it arrives
> > during the ppoll (no handler set, and default action is "Term").
> 
> Have you considered to only block SIGALRM while ppoll() is in
> progress?

Why should we? The same reasoning applies to other signals such as e.g.
SIGUSR2. We need to block all signals except those that we can handle.

Regards,
Martin
Bart Van Assche March 2, 2018, 11:27 p.m. UTC | #5
On Sat, 2018-03-03 at 00:16 +0100, Martin Wilck wrote:
> On Fri, 2018-03-02 at 22:23 +0000, Bart Van Assche wrote:
> > On Fri, 2018-03-02 at 23:15 +0100, Martin Wilck wrote:
> > > On Fri, 2018-03-02 at 21:35 +0000, Bart Van Assche wrote:
> > > > This change looks more complicated to me than necessary. Have you
> > > > considered
> > > > to pass an empty signal set as the fourth ppoll() argument?
> > > 
> > > An empty set would mean that no signal is blocked during ppoll().
> > > Therefore e.g. SIGALRM would terminate multipathd if it arrives
> > > during the ppoll (no handler set, and default action is "Term").
> > 
> > Have you considered to only block SIGALRM while ppoll() is in
> > progress?
> 
> Why should we? The same reasoning applies to other signals such as e.g.
> SIGUSR2. We need to block all signals except those that we can handle.

I see two reasons:
- All signals that are delivered to the multipathd process except SIGALRM
  should interrupt ppoll(). SIGPIPE is delivered to the thread that caused
  that signal to be raised. SIGUSR2 is sent to a specific thread. Hence
  neither SIGPIPE nor SIGUSR2 will ever be delivered to the thread that
  calls ppoll().
- Asking ppoll() to block all signals is weird because some signals, e.g.
  SIGKILL and SIGSTOP, cannot be blocked.

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 3, 2018, 12:31 a.m. UTC | #6
On Fri, 2018-03-02 at 23:27 +0000, Bart Van Assche wrote:
> On Sat, 2018-03-03 at 00:16 +0100, Martin Wilck wrote:
> > On Fri, 2018-03-02 at 22:23 +0000, Bart Van Assche wrote:
> > > On Fri, 2018-03-02 at 23:15 +0100, Martin Wilck wrote:
> > > > On Fri, 2018-03-02 at 21:35 +0000, Bart Van Assche wrote:
> > > > > This change looks more complicated to me than necessary. Have
> > > > > you
> > > > > considered
> > > > > to pass an empty signal set as the fourth ppoll() argument?
> > > > 
> > > > An empty set would mean that no signal is blocked during
> > > > ppoll().
> > > > Therefore e.g. SIGALRM would terminate multipathd if it arrives
> > > > during the ppoll (no handler set, and default action is
> > > > "Term").
> > > 
> > > Have you considered to only block SIGALRM while ppoll() is in
> > > progress?
> > 
> > Why should we? The same reasoning applies to other signals such as
> > e.g.
> > SIGUSR2. We need to block all signals except those that we can
> > handle.
> 
> I see two reasons:
> - All signals that are delivered to the multipathd process except
> SIGALRM
>   should interrupt ppoll(). SIGPIPE is delivered to the thread that
> caused
>   that signal to be raised. SIGUSR2 is sent to a specific thread.
> Hence
>   neither SIGPIPE nor SIGUSR2 will ever be delivered to the thread
> that
>   calls ppoll().

So, there's no reason not to block them, right? Is it expected behavior
that a user running 'kill -USR2 $(pidof multipathd)' terminates the
process? Why do you think these signals should interrupt ppoll()
although the uxlsnr can't can't handle them? Isn't it sufficient that
they're handled by the threads that are meant to do that?

IMO the idea of ppoll() is to specify a set of signals the calling
thread is interested in, and that's precisely the set of signals
handle_signals() can deal with.

I'm rather neutral about SIGPIPE, we can add a sigdelset(&set, SIGPIPE)
if you insist (multipathd will die anyway), but clearing SIGUSR2 in the
uxlsnr would be wrong IMO, as it's dedicated to the waiter thread.

> - Asking ppoll() to block all signals is weird because some signals,
> e.g.
>   SIGKILL and SIGSTOP, cannot be blocked.

And the kernel makes sure that doesn't happen. So I can add
sigdelset(&set, SIGKILL) bit it's kind of pointless.

Martin

> 
> Bart.
Bart Van Assche March 5, 2018, 4:27 p.m. UTC | #7
On Sat, 2018-03-03 at 01:31 +0100, Martin Wilck wrote:
> So, there's no reason not to block them, right? Is it expected behavior
> that a user running 'kill -USR2 $(pidof multipathd)' terminates the
> process? Why do you think these signals should interrupt ppoll()
> although the uxlsnr can't can't handle them? Isn't it sufficient that
> they're handled by the threads that are meant to do that?

Blocking all signals except the ones for which we installed a handler sounds
weird to me. I think users expect daemons to process signals instead of
blocking all but a specific set of signals. This is a rather philosophical
argument and not an argument of which I think that it is strong enough to
prevent this patch of being integrated in the upstream multipath-tools
repository.

Bart.



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 5, 2018, 5:28 p.m. UTC | #8
Hello Bart,

On Mon, 2018-03-05 at 16:27 +0000, Bart Van Assche wrote:
> On Sat, 2018-03-03 at 01:31 +0100, Martin Wilck wrote:
> > So, there's no reason not to block them, right? Is it expected
> > behavior
> > that a user running 'kill -USR2 $(pidof multipathd)' terminates the
> > process? Why do you think these signals should interrupt ppoll()
> > although the uxlsnr can't can't handle them? Isn't it sufficient
> > that
> > they're handled by the threads that are meant to do that?
> 
> Blocking all signals except the ones for which we installed a handler
> sounds
> weird to me. I think users expect daemons to process signals instead
> of
> blocking all but a specific set of signals. This is a rather
> philosophical
> argument and not an argument of which I think that it is strong
> enough to
> prevent this patch of being integrated in the upstream multipath-
> tools
> repository.

Thank you. Meanwhile, I found that this patch requires a change in the
"marginal paths" code. I'll post an official patch as update.

Regarding the philosophical discussion, may I remind you of your own
words from 534ec4c?

    The POSIX standard mentions that the only way to guarantee that
    signals are delivered to a specific thread is:
     * Block all signals before the first pthread_create() call.
     * Unblock signals from the thread that should receive signals.

If we unblock signals with a default disposition of "Term" for which we
have no handler, stray signals caught would terminate multipathd. I
don't think that's what we want. Of course we must react on
SIGINT/SIGTERM/SIGHUP in the way the user expects us to, but
terminating on other signals would be wrong.

Let's see if others have to say anything about this.

Martin
Benjamin Marzinski March 6, 2018, 12:46 a.m. UTC | #9
On Mon, Mar 05, 2018 at 06:28:05PM +0100, Martin Wilck wrote:
> Hello Bart,
> 
> On Mon, 2018-03-05 at 16:27 +0000, Bart Van Assche wrote:
> > On Sat, 2018-03-03 at 01:31 +0100, Martin Wilck wrote:
> > > So, there's no reason not to block them, right? Is it expected
> > > behavior
> > > that a user running 'kill -USR2 $(pidof multipathd)' terminates the
> > > process? Why do you think these signals should interrupt ppoll()
> > > although the uxlsnr can't can't handle them? Isn't it sufficient
> > > that
> > > they're handled by the threads that are meant to do that?
> > 
> > Blocking all signals except the ones for which we installed a handler
> > sounds
> > weird to me. I think users expect daemons to process signals instead
> > of
> > blocking all but a specific set of signals. This is a rather
> > philosophical
> > argument and not an argument of which I think that it is strong
> > enough to
> > prevent this patch of being integrated in the upstream multipath-
> > tools
> > repository.
> 
> Thank you. Meanwhile, I found that this patch requires a change in the
> "marginal paths" code. I'll post an official patch as update.
> 
> Regarding the philosophical discussion, may I remind you of your own
> words from 534ec4c?
> 
>     The POSIX standard mentions that the only way to guarantee that
>     signals are delivered to a specific thread is:
>      * Block all signals before the first pthread_create() call.
>      * Unblock signals from the thread that should receive signals.
> 
> If we unblock signals with a default disposition of "Term" for which we
> have no handler, stray signals caught would terminate multipathd. I
> don't think that's what we want. Of course we must react on
> SIGINT/SIGTERM/SIGHUP in the way the user expects us to, but
> terminating on other signals would be wrong.
> 
> Let's see if others have to say anything about this.

This is prety close to what I had in mind. The only things missing, that
I noticed when I was looking at multipathd's signal handling, is dealing
with SIGUSR2 in io_err_stat.c, which it sounds like you have already
spotted, and changing the calls to setitimer and usleep outside of the
vecs lock to use nanosleep, so they don't interact with SIGALRM.

I'll send a patch based on yours, to fix these.

-Ben

> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 6, 2018, 8:48 a.m. UTC | #10
On Mon, 2018-03-05 at 18:46 -0600, Benjamin Marzinski wrote:
> 
> This is prety close to what I had in mind. The only things missing,
> that
> I noticed when I was looking at multipathd's signal handling, is
> dealing
> with SIGUSR2 in io_err_stat.c, which it sounds like you have already
> spotted, and changing the calls to setitimer and usleep outside of
> the
> vecs lock to use nanosleep, so they don't interact with SIGALRM.
> 
> I'll send a patch based on yours, to fix these.

I'm curious to see what you come up with. I've something on my
workbench which seems to work well but needs some cleanup. The idea is
to have a dedicated "alarm dispatcher" thread that deals with kernel
timers exclusively, and forwards expiry events to the threads that have
asked for it. The dispatcher maintains a list of active alarms and sets
a kernel timer to the expiry time of the first alarm in the list.

Cheers,
Martin
diff mbox

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index 61739ac6ea59..85ee9b713d75 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2270,10 +2270,13 @@  signal_init(void)
 {
 	sigset_t set;
 
-	sigemptyset(&set);
-	sigaddset(&set, SIGUSR2);
+	/* block all signals */
+	sigfillset(&set);
+	/* SIGPIPE occurs if logging fails */
+	sigdelset(&set, SIGPIPE);
 	pthread_sigmask(SIG_SETMASK, &set, NULL);
 
+	/* Other signals will be unblocked in the uxlsnr thread */
 	signal_set(SIGHUP, sighup);
 	signal_set(SIGUSR1, sigusr1);
 	signal_set(SIGUSR2, sigusr2);
diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index 98ac25a68c43..a2ca36ba1653 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -170,11 +170,11 @@  void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 		condlog(0, "uxsock: failed to allocate poll fds");
 		return NULL;
 	}
-	sigemptyset(&mask);
-	sigaddset(&mask, SIGINT);
-	sigaddset(&mask, SIGTERM);
-	sigaddset(&mask, SIGHUP);
-	sigaddset(&mask, SIGUSR1);
+	sigfillset(&mask);
+	sigdelset(&mask, SIGINT);
+	sigdelset(&mask, SIGTERM);
+	sigdelset(&mask, SIGHUP);
+	sigdelset(&mask, SIGUSR1);
 	while (1) {
 		struct client *c, *tmp;
 		int i, poll_count, num_clients;