diff mbox

[2/2] multipathd: handle errors in uxlsnr as fatal

Message ID 20180320165010.15259-2-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Martin Wilck March 20, 2018, 4:50 p.m. UTC
The ppoll() calls of the uxlsnr thread are vital for proper functioning of
multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll()
for other reasons, quit the daemon. If we don't do that, multipathd may
hang in a state where it can't be terminated any more, because the uxlsnr
thread is responsible for handling all signals. This happens e.g. if
systemd's multipathd.socket is running in and multipathd is started from
outside systemd.

24f2844 "multipathd: fix signal blocking logic" has made this problem more
severe. Before that patch, the signals weren't actually blocked in any thread.
That's not to say 24f2844 was wrong. I still think it's correct, we just
need this one on top.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/uxlsnr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Chongyun Wu March 21, 2018, 2:43 a.m. UTC | #1
On 2018/3/21 0:51, Martin Wilck wrote:
> The ppoll() calls of the uxlsnr thread are vital for proper functioning of
> multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll()
> for other reasons, quit the daemon. If we don't do that, multipathd may
> hang in a state where it can't be terminated any more, because the uxlsnr
> thread is responsible for handling all signals. This happens e.g. if
> systemd's multipathd.socket is running in and multipathd is started from
> outside systemd.
> 
> 24f2844 "multipathd: fix signal blocking logic" has made this problem more
> severe. Before that patch, the signals weren't actually blocked in any thread.
> That's not to say 24f2844 was wrong. I still think it's correct, we just
> need this one on top.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   multipathd/uxlsnr.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index cdafd82943e7..6f666663fc6f 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   
>   	if (ux_sock == -1) {
>   		condlog(1, "could not create uxsock: %d", errno);
> -		return NULL;
> +		exit_daemon();
>   	}
>   
>   	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
>   	if (!polls) {
>   		condlog(0, "uxsock: failed to allocate poll fds");
> -		return NULL;
> +		exit_daemon();
>   	}
>   	sigfillset(&mask);
>   	sigdelset(&mask, SIGINT);
> @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   
>   			/* something went badly wrong! */
>   			condlog(0, "uxsock: poll failed with %d", errno);
> +			exit_daemon();
>   			break;
>   		}
Hi Martin,

Your analysis is reasonable. It is necessary to deal with fatal error 
not only to return, if not doing this multipathd can't exit normally and 
multipathd commands can't work any more. I think your patch is OK, but I 
have some ideas inspired by your patch.
Calling exit_daemon() is to shut down the multipathd, relay on the 
outside to pull multipathd again. Is there a function can be use to deal 
with fatal error? Its function are close the socket(if create 
successfully before) and create a new socket to make uxlsnr thread work 
properly again or continue to create uxsocket? This function actually is 
try to repair those errors.
It just an idea, maybe not quite right.

Regards,
Chongyun



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 21, 2018, 7:48 a.m. UTC | #2
On Wed, 2018-03-21 at 02:43 +0000, Chongyun Wu wrote:
> On 2018/3/21 0:51, Martin Wilck wrote:
> > The ppoll() calls of the uxlsnr thread are vital for proper
> > functioning of
> > multipathd. If the uxlsnr thread can't open the socket or fails to
> > call ppoll()
> > for other reasons, quit the daemon. If we don't do that, multipathd
> > may
> > hang in a state where it can't be terminated any more, because the
> > uxlsnr
> > thread is responsible for handling all signals. This happens e.g.
> > if
> > systemd's multipathd.socket is running in and multipathd is started
> > from
> > outside systemd.
> > 
> > 24f2844 "multipathd: fix signal blocking logic" has made this
> > problem more
> > severe. Before that patch, the signals weren't actually blocked in
> > any thread.
> > That's not to say 24f2844 was wrong. I still think it's correct, we
> > just
> > need this one on top.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >   multipathd/uxlsnr.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> > index cdafd82943e7..6f666663fc6f 100644
> > --- a/multipathd/uxlsnr.c
> > +++ b/multipathd/uxlsnr.c
> > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   
> >   	if (ux_sock == -1) {
> >   		condlog(1, "could not create uxsock: %d", errno);
> > -		return NULL;
> > +		exit_daemon();
> >   	}
> >   
> >   	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) *
> > sizeof(struct pollfd));
> >   	if (!polls) {
> >   		condlog(0, "uxsock: failed to allocate poll
> > fds");
> > -		return NULL;
> > +		exit_daemon();
> >   	}
> >   	sigfillset(&mask);
> >   	sigdelset(&mask, SIGINT);
> > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   
> >   			/* something went badly wrong! */
> >   			condlog(0, "uxsock: poll failed with %d",
> > errno);
> > +			exit_daemon();
> >   			break;
> >   		}
> 
> Hi Martin,
> 
> Your analysis is reasonable. It is necessary to deal with fatal
> error 
> not only to return, if not doing this multipathd can't exit normally
> and 
> multipathd commands can't work any more. I think your patch is OK,
> but I 
> have some ideas inspired by your patch.
> Calling exit_daemon() is to shut down the multipathd, relay on the 
> outside to pull multipathd again. Is there a function can be use to
> deal 
> with fatal error?

I don't think so, that's why I call the error "fatal". But feel free to
come up with a more intelligent solution. In the only case with
practical relevance I've seen so far (socket can't be opened because
it's open in systemd), it's sufficient that multipathd is started via
systemd (this should be done in production anyway), or that the admin
runs "systemctl stop multipathd.socket" before starting the daemon
manually.

In general, I don't think it makes sense to try and be overly smart.
Errors that prevent the ux socket from being opened (with the mentioned
exception) are likely to be so severe that any attempts to work around
them will probably fail as well.

Martin
Benjamin Marzinski March 22, 2018, 11:31 p.m. UTC | #3
On Tue, Mar 20, 2018 at 05:50:10PM +0100, Martin Wilck wrote:
> The ppoll() calls of the uxlsnr thread are vital for proper functioning of
> multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll()
> for other reasons, quit the daemon. If we don't do that, multipathd may
> hang in a state where it can't be terminated any more, because the uxlsnr
> thread is responsible for handling all signals. This happens e.g. if
> systemd's multipathd.socket is running in and multipathd is started from
> outside systemd.
> 
> 24f2844 "multipathd: fix signal blocking logic" has made this problem more
> severe. Before that patch, the signals weren't actually blocked in any thread.
> That's not to say 24f2844 was wrong. I still think it's correct, we just
> need this one on top.
> 
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/uxlsnr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index cdafd82943e7..6f666663fc6f 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  
>  	if (ux_sock == -1) {
>  		condlog(1, "could not create uxsock: %d", errno);
> -		return NULL;
> +		exit_daemon();
>  	}
>  
>  	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
>  	if (!polls) {
>  		condlog(0, "uxsock: failed to allocate poll fds");
> -		return NULL;
> +		exit_daemon();
>  	}
>  	sigfillset(&mask);
>  	sigdelset(&mask, SIGINT);
> @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  
>  			/* something went badly wrong! */
>  			condlog(0, "uxsock: poll failed with %d", errno);
> +			exit_daemon();
>  			break;
>  		}
>  
> -- 
> 2.16.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index cdafd82943e7..6f666663fc6f 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -178,7 +178,7 @@  void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 
 	if (ux_sock == -1) {
 		condlog(1, "could not create uxsock: %d", errno);
-		return NULL;
+		exit_daemon();
 	}
 
 	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
@@ -187,7 +187,7 @@  void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
 	if (!polls) {
 		condlog(0, "uxsock: failed to allocate poll fds");
-		return NULL;
+		exit_daemon();
 	}
 	sigfillset(&mask);
 	sigdelset(&mask, SIGINT);
@@ -249,6 +249,7 @@  void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 
 			/* something went badly wrong! */
 			condlog(0, "uxsock: poll failed with %d", errno);
+			exit_daemon();
 			break;
 		}