Message ID | 20180320165010.15259-2-mwilck@suse.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | christophe varoqui |
Headers | show |
On 2018/3/21 0:51, Martin Wilck wrote: > The ppoll() calls of the uxlsnr thread are vital for proper functioning of > multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll() > for other reasons, quit the daemon. If we don't do that, multipathd may > hang in a state where it can't be terminated any more, because the uxlsnr > thread is responsible for handling all signals. This happens e.g. if > systemd's multipathd.socket is running in and multipathd is started from > outside systemd. > > 24f2844 "multipathd: fix signal blocking logic" has made this problem more > severe. Before that patch, the signals weren't actually blocked in any thread. > That's not to say 24f2844 was wrong. I still think it's correct, we just > need this one on top. > > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > multipathd/uxlsnr.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c > index cdafd82943e7..6f666663fc6f 100644 > --- a/multipathd/uxlsnr.c > +++ b/multipathd/uxlsnr.c > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) > > if (ux_sock == -1) { > condlog(1, "could not create uxsock: %d", errno); > - return NULL; > + exit_daemon(); > } > > pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock); > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) > polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd)); > if (!polls) { > condlog(0, "uxsock: failed to allocate poll fds"); > - return NULL; > + exit_daemon(); > } > sigfillset(&mask); > sigdelset(&mask, SIGINT); > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) > > /* something went badly wrong! */ > condlog(0, "uxsock: poll failed with %d", errno); > + exit_daemon(); > break; > } Hi Martin, Your analysis is reasonable. It is necessary to deal with fatal error not only to return, if not doing this multipathd can't exit normally and multipathd commands can't work any more. I think your patch is OK, but I have some ideas inspired by your patch. Calling exit_daemon() is to shut down the multipathd, relay on the outside to pull multipathd again. Is there a function can be use to deal with fatal error? Its function are close the socket(if create successfully before) and create a new socket to make uxlsnr thread work properly again or continue to create uxsocket? This function actually is try to repair those errors. It just an idea, maybe not quite right. Regards, Chongyun -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, 2018-03-21 at 02:43 +0000, Chongyun Wu wrote: > On 2018/3/21 0:51, Martin Wilck wrote: > > The ppoll() calls of the uxlsnr thread are vital for proper > > functioning of > > multipathd. If the uxlsnr thread can't open the socket or fails to > > call ppoll() > > for other reasons, quit the daemon. If we don't do that, multipathd > > may > > hang in a state where it can't be terminated any more, because the > > uxlsnr > > thread is responsible for handling all signals. This happens e.g. > > if > > systemd's multipathd.socket is running in and multipathd is started > > from > > outside systemd. > > > > 24f2844 "multipathd: fix signal blocking logic" has made this > > problem more > > severe. Before that patch, the signals weren't actually blocked in > > any thread. > > That's not to say 24f2844 was wrong. I still think it's correct, we > > just > > need this one on top. > > > > Signed-off-by: Martin Wilck <mwilck@suse.com> > > --- > > multipathd/uxlsnr.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c > > index cdafd82943e7..6f666663fc6f 100644 > > --- a/multipathd/uxlsnr.c > > +++ b/multipathd/uxlsnr.c > > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > > > if (ux_sock == -1) { > > condlog(1, "could not create uxsock: %d", errno); > > - return NULL; > > + exit_daemon(); > > } > > > > pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock); > > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * > > sizeof(struct pollfd)); > > if (!polls) { > > condlog(0, "uxsock: failed to allocate poll > > fds"); > > - return NULL; > > + exit_daemon(); > > } > > sigfillset(&mask); > > sigdelset(&mask, SIGINT); > > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > > > /* something went badly wrong! */ > > condlog(0, "uxsock: poll failed with %d", > > errno); > > + exit_daemon(); > > break; > > } > > Hi Martin, > > Your analysis is reasonable. It is necessary to deal with fatal > error > not only to return, if not doing this multipathd can't exit normally > and > multipathd commands can't work any more. I think your patch is OK, > but I > have some ideas inspired by your patch. > Calling exit_daemon() is to shut down the multipathd, relay on the > outside to pull multipathd again. Is there a function can be use to > deal > with fatal error? I don't think so, that's why I call the error "fatal". But feel free to come up with a more intelligent solution. In the only case with practical relevance I've seen so far (socket can't be opened because it's open in systemd), it's sufficient that multipathd is started via systemd (this should be done in production anyway), or that the admin runs "systemctl stop multipathd.socket" before starting the daemon manually. In general, I don't think it makes sense to try and be overly smart. Errors that prevent the ux socket from being opened (with the mentioned exception) are likely to be so severe that any attempts to work around them will probably fail as well. Martin
On Tue, Mar 20, 2018 at 05:50:10PM +0100, Martin Wilck wrote: > The ppoll() calls of the uxlsnr thread are vital for proper functioning of > multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll() > for other reasons, quit the daemon. If we don't do that, multipathd may > hang in a state where it can't be terminated any more, because the uxlsnr > thread is responsible for handling all signals. This happens e.g. if > systemd's multipathd.socket is running in and multipathd is started from > outside systemd. > > 24f2844 "multipathd: fix signal blocking logic" has made this problem more > severe. Before that patch, the signals weren't actually blocked in any thread. > That's not to say 24f2844 was wrong. I still think it's correct, we just > need this one on top. > Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > multipathd/uxlsnr.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c > index cdafd82943e7..6f666663fc6f 100644 > --- a/multipathd/uxlsnr.c > +++ b/multipathd/uxlsnr.c > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) > > if (ux_sock == -1) { > condlog(1, "could not create uxsock: %d", errno); > - return NULL; > + exit_daemon(); > } > > pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock); > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) > polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd)); > if (!polls) { > condlog(0, "uxsock: failed to allocate poll fds"); > - return NULL; > + exit_daemon(); > } > sigfillset(&mask); > sigdelset(&mask, SIGINT); > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) > > /* something went badly wrong! */ > condlog(0, "uxsock: poll failed with %d", errno); > + exit_daemon(); > break; > } > > -- > 2.16.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c index cdafd82943e7..6f666663fc6f 100644 --- a/multipathd/uxlsnr.c +++ b/multipathd/uxlsnr.c @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) if (ux_sock == -1) { condlog(1, "could not create uxsock: %d", errno); - return NULL; + exit_daemon(); } pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock); @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd)); if (!polls) { condlog(0, "uxsock: failed to allocate poll fds"); - return NULL; + exit_daemon(); } sigfillset(&mask); sigdelset(&mask, SIGINT); @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data) /* something went badly wrong! */ condlog(0, "uxsock: poll failed with %d", errno); + exit_daemon(); break; }
The ppoll() calls of the uxlsnr thread are vital for proper functioning of multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll() for other reasons, quit the daemon. If we don't do that, multipathd may hang in a state where it can't be terminated any more, because the uxlsnr thread is responsible for handling all signals. This happens e.g. if systemd's multipathd.socket is running in and multipathd is started from outside systemd. 24f2844 "multipathd: fix signal blocking logic" has made this problem more severe. Before that patch, the signals weren't actually blocked in any thread. That's not to say 24f2844 was wrong. I still think it's correct, we just need this one on top. Signed-off-by: Martin Wilck <mwilck@suse.com> --- multipathd/uxlsnr.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)