Message ID | 20250227174104.206721-1-mwilck@suse.com |
---|---|
State | New |
Series | multipathd: fix hang during shutdown with queuing maps |
On Thu, Feb 27, 2025 at 06:41:04PM +0100, Martin Wilck wrote:
> Since c9689b6 ("multipathd: Remove dependency on
> systemd-udev-settle.service"), multipathd.service starts very early during
> boot, which in systemd's service ordering logic means that it is stopped
> late. While this is generally a good thing, it means that, when systemd
> unmounts file systems and tears down the block device stack, multipathd
> is still running. Therefore our "queue_without_daemon" logic, which disables
> queuing when multipathd exits, isn't effective yet. If there are any
> multipath maps that are in queueing state at this point in time, the system
> may hang indefinitely.
>
> To fix that, add a new service which starts later (and thus stops earlier) and
> disables queueing on all multipath maps during shutdown. Similar to lvm2's
> blk-availability.service, the service does nothing when started.
>
> Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service")

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/Makefile                       | 6 ++++--
>  multipathd/multipathd-queueing.service.in | 9 +++++++++
>  multipathd/multipathd.service.in          | 2 +-
>  3 files changed, 14 insertions(+), 3 deletions(-)
>  create mode 100644 multipathd/multipathd-queueing.service.in
>
> diff --git a/multipathd/Makefile b/multipathd/Makefile
> index 61cf1af..4bcee6b 100644
> --- a/multipathd/Makefile
> +++ b/multipathd/Makefile
> @@ -41,7 +41,7 @@ ifeq ($(FPIN_SUPPORT),1)
>  OBJS += fpin_handlers.o
>  endif
>
> -all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
> +all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket $(EXEC)-queueing.service
>
>  $(EXEC): $(OBJS) $(multipathdir)/libmultipath.so $(mpathcmddir)/libmpathcmd.so
>  	@echo building $@ because of $?
> @@ -64,6 +64,7 @@ install:
>  ifdef SYSTEMD
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(unitdir)
>  	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).service $(DESTDIR)$(unitdir)
> +	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC)-queueing.service $(DESTDIR)$(unitdir)
>  	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).socket $(DESTDIR)$(unitdir)
>  endif
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)/man8
> @@ -74,11 +75,12 @@ uninstall:
>  	$(Q)$(RM) $(DESTDIR)$(bindir)/$(EXEC) $(DESTDIR)$(bindir)/$(CLI)
>  	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(EXEC).8
>  	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(CLI).8
> +	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC)-queueing.service
>  	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).service
>  	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).socket
>
>  clean: dep_clean
> -	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
> +	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC)-queueing.service $(EXEC).socket
>
>  include $(wildcard $(OBJS:.o=.d) $(CLI_OBJS:.o=.d))
>
> diff --git a/multipathd/multipathd-queueing.service.in b/multipathd/multipathd-queueing.service.in
> new file mode 100644
> index 0000000..18b0ca6
> --- /dev/null
> +++ b/multipathd/multipathd-queueing.service.in
> @@ -0,0 +1,9 @@
> +[Unit]
> +Description=Enable queuing for multipath maps
> +After=local-fs.target remote-fs.target multipathd.service
> +
> +[Service]
> +Type=oneshot
> +RemainAfterExit=yes
> +ExecStart=/bin/true
> +ExecStop=@BINDIR@/multipathd disablequeueing maps
> diff --git a/multipathd/multipathd.service.in b/multipathd/multipathd.service.in
> index b6a25b3..eb58943 100644
> --- a/multipathd/multipathd.service.in
> +++ b/multipathd/multipathd.service.in
> @@ -2,7 +2,7 @@
>  Description=Device-Mapper Multipath Device Controller
>  Before=lvm2-activation-early.service
>  Before=local-fs-pre.target blk-availability.service shutdown.target
> -Wants=systemd-udevd-kernel.socket @MODPROBE_UNIT@
> +Wants=systemd-udevd-kernel.socket multipathd-queueing.service @MODPROBE_UNIT@
>  After=systemd-udevd-kernel.socket @MODPROBE_UNIT@
>  After=multipathd.socket systemd-remount-fs.service
>  Before=initrd-cleanup.service
> -- 
> 2.48.1
On Mon, 2025-03-03 at 14:42 -0500, Benjamin Marzinski wrote:
> On Thu, Feb 27, 2025 at 06:41:04PM +0100, Martin Wilck wrote:
> > Since c9689b6 ("multipathd: Remove dependency on
> > systemd-udev-settle.service"), multipathd.service starts very early
> > during boot, which in systemd's service ordering logic means that it
> > is stopped late. While this is generally a good thing, it means that,
> > when systemd unmounts file systems and tears down the block device
> > stack, multipathd is still running. Therefore our
> > "queue_without_daemon" logic, which disables queuing when multipathd
> > exits, isn't effective yet. If there are any multipath maps that are
> > in queueing state at this point in time, the system may hang
> > indefinitely.
> >
> > To fix that, add a new service which starts later (and thus stops
> > earlier) and disables queueing on all multipath maps during shutdown.
> > Similar to lvm2's blk-availability.service, the service does nothing
> > when started.
> >
> > Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service")
>
> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

Thanks - do you reckon this is suitable for the stable tree?
There is not much of a regression risk, but it requires shipping
another file, so it's non-trivial for packagers.

Personally I'm inclined to add it to stable (also because I'll need to
backport it anyway).

Martin
On Tue, Mar 04, 2025 at 09:26:12PM +0100, Martin Wilck wrote:
> On Mon, 2025-03-03 at 14:42 -0500, Benjamin Marzinski wrote:
> > On Thu, Feb 27, 2025 at 06:41:04PM +0100, Martin Wilck wrote:
> > > Since c9689b6 ("multipathd: Remove dependency on
> > > systemd-udev-settle.service"), multipathd.service starts very
> > > early during boot, which in systemd's service ordering logic means
> > > that it is stopped late. While this is generally a good thing, it
> > > means that, when systemd unmounts file systems and tears down the
> > > block device stack, multipathd is still running. Therefore our
> > > "queue_without_daemon" logic, which disables queuing when
> > > multipathd exits, isn't effective yet. If there are any multipath
> > > maps that are in queueing state at this point in time, the system
> > > may hang indefinitely.
> > >
> > > To fix that, add a new service which starts later (and thus stops
> > > earlier) and disables queueing on all multipath maps during
> > > shutdown. Similar to lvm2's blk-availability.service, the service
> > > does nothing when started.
> > >
> > > Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service")
> >
> > Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Thanks - do you reckon this is suitable for the stable tree?
> There is not much of a regression risk, but it requires shipping
> another file, so it's non-trivial for packagers.
>
> Personally I'm inclined to add it to stable (also because I'll need to
> backport it anyway).

I'm on the fence about this one. It does fix a bug, but adding a new
service is kinda feature-y. But you're right that there's not much risk
of it breaking things, so it doesn't really matter much to me either way.

-Ben

>
> Martin
```diff
diff --git a/multipathd/Makefile b/multipathd/Makefile
index 61cf1af..4bcee6b 100644
--- a/multipathd/Makefile
+++ b/multipathd/Makefile
@@ -41,7 +41,7 @@ ifeq ($(FPIN_SUPPORT),1)
 OBJS += fpin_handlers.o
 endif
 
-all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
+all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket $(EXEC)-queueing.service
 
 $(EXEC): $(OBJS) $(multipathdir)/libmultipath.so $(mpathcmddir)/libmpathcmd.so
 	@echo building $@ because of $?
@@ -64,6 +64,7 @@ install:
 ifdef SYSTEMD
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(unitdir)
 	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).service $(DESTDIR)$(unitdir)
+	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC)-queueing.service $(DESTDIR)$(unitdir)
 	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).socket $(DESTDIR)$(unitdir)
 endif
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)/man8
@@ -74,11 +75,12 @@ uninstall:
 	$(Q)$(RM) $(DESTDIR)$(bindir)/$(EXEC) $(DESTDIR)$(bindir)/$(CLI)
 	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(EXEC).8
 	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(CLI).8
+	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC)-queueing.service
 	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).service
 	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).socket
 
 clean: dep_clean
-	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
+	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC)-queueing.service $(EXEC).socket
 
 include $(wildcard $(OBJS:.o=.d) $(CLI_OBJS:.o=.d))
 
diff --git a/multipathd/multipathd-queueing.service.in b/multipathd/multipathd-queueing.service.in
new file mode 100644
index 0000000..18b0ca6
--- /dev/null
+++ b/multipathd/multipathd-queueing.service.in
@@ -0,0 +1,9 @@
+[Unit]
+Description=Enable queuing for multipath maps
+After=local-fs.target remote-fs.target multipathd.service
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/bin/true
+ExecStop=@BINDIR@/multipathd disablequeueing maps
diff --git a/multipathd/multipathd.service.in b/multipathd/multipathd.service.in
index b6a25b3..eb58943 100644
--- a/multipathd/multipathd.service.in
+++ b/multipathd/multipathd.service.in
@@ -2,7 +2,7 @@
 Description=Device-Mapper Multipath Device Controller
 Before=lvm2-activation-early.service
 Before=local-fs-pre.target blk-availability.service shutdown.target
-Wants=systemd-udevd-kernel.socket @MODPROBE_UNIT@
+Wants=systemd-udevd-kernel.socket multipathd-queueing.service @MODPROBE_UNIT@
 After=systemd-udevd-kernel.socket @MODPROBE_UNIT@
 After=multipathd.socket systemd-remount-fs.service
 Before=initrd-cleanup.service
```
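The new unit follows the same pattern the commit message attributes to lvm2's blk-availability.service. ExecStart= is a no-op (/bin/true), and RemainAfterExit=yes keeps the Type=oneshot unit active after that command exits. Because the unit is still active, systemd has something to stop at shutdown, and stopping it is what runs ExecStop=. The toy Python model below parses the unit text from the diff and records which command would run on start and on stop. It is a hedged sketch with several assumptions:

- `OneshotModel` is a hypothetical name and is not systemd's real state machine.
- Commands are recorded, not executed.
- `@BINDIR@` is left unsubstituted.
- The RemainAfterExit=no case is deliberately not modeled.

```python
import configparser

# Unit text copied from the patch; @BINDIR@ is a build-time placeholder
# that is intentionally left unsubstituted here.
UNIT = """\
[Unit]
Description=Enable queuing for multipath maps
After=local-fs.target remote-fs.target multipathd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=@BINDIR@/multipathd disablequeueing maps
"""


class OneshotModel:
    """Hypothetical toy model of a Type=oneshot, RemainAfterExit=yes service.

    Records commands instead of running them; not systemd's implementation.
    """

    def __init__(self, unit_text):
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep systemd's case-sensitive key names
        parser.read_string(unit_text)
        service = parser["Service"]
        if service["Type"] != "oneshot" or not service.getboolean("RemainAfterExit"):
            raise NotImplementedError("only oneshot + RemainAfterExit=yes is modeled")
        self.exec_start = service["ExecStart"]
        self.exec_stop = service["ExecStop"]
        self.state = "inactive"
        self.ran = []  # commands that would have been executed, in order

    def start(self):
        # Boot: ExecStart=/bin/true exits at once, but the unit stays active.
        self.ran.append(self.exec_start)
        self.state = "active"

    def stop(self):
        # Shutdown: stopping the active unit runs ExecStop=;
        # stopping an already inactive unit is a no-op in this model.
        if self.state == "active":
            self.ran.append(self.exec_stop)
            self.state = "inactive"


if __name__ == "__main__":
    unit = OneshotModel(UNIT)
    unit.start()
    print(unit.state)  # active
    unit.stop()
    print(unit.state, unit.ran)
```

In this model nothing useful happens at boot. The only effect of the unit is the `disablequeueing maps` command issued when it is stopped.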
Since c9689b6 ("multipathd: Remove dependency on
systemd-udev-settle.service"), multipathd.service starts very early during
boot, which in systemd's service ordering logic means that it is stopped
late. While this is generally a good thing, it means that, when systemd
unmounts file systems and tears down the block device stack, multipathd
is still running. Therefore our "queue_without_daemon" logic, which disables
queuing when multipathd exits, isn't effective yet. If there are any
multipath maps that are in queueing state at this point in time, the system
may hang indefinitely.

To fix that, add a new service which starts later (and thus stops earlier) and
disables queueing on all multipath maps during shutdown. Similar to lvm2's
blk-availability.service, the service does nothing when started.

Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/Makefile                       | 6 ++++--
 multipathd/multipathd-queueing.service.in | 9 +++++++++
 multipathd/multipathd.service.in          | 2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)
 create mode 100644 multipathd/multipathd-queueing.service.in
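The commit message relies on systemd applying start-up ordering in reverse at shutdown: a unit ordered After= another starts later and therefore stops earlier. Below is a minimal Python sketch of that idea, under loudly simplified assumptions:

- The unit graph is a hand-written approximation, not the real dependency set on any system.
- `home.mount` is a hypothetical stand-in for a file system on a multipath map.
- `start_order()` and `stop_order()` are illustrative helpers, not systemd code.
- Real systemd runs stop jobs in parallel wherever ordering allows, rather than in one serial list.

```python
from collections import defaultdict, deque

# Simplified ordering model (an assumption, not systemd's real unit graph):
# each key lists the units it is ordered After=. A Before= in one unit is
# written here as the equivalent After= edge on the other unit.
AFTER = {
    # multipathd.service has Before=local-fs-pre.target
    "local-fs-pre.target": ["multipathd.service"],
    # hypothetical mount unit for a file system living on a multipath map
    "home.mount": ["local-fs-pre.target"],
    "local-fs.target": ["local-fs-pre.target", "home.mount"],
    "remote-fs.target": [],
    # the new unit from this patch
    "multipathd-queueing.service": [
        "local-fs.target", "remote-fs.target", "multipathd.service",
    ],
}


def start_order(after):
    """Linearize the graph with Kahn's algorithm: a unit is emitted only
    after every unit it is ordered After=."""
    units = set(after)
    for deps in after.values():
        units.update(deps)
    indegree = {u: 0 for u in units}
    successors = defaultdict(list)
    for unit, deps in after.items():
        for dep in deps:
            successors[dep].append(unit)
            indegree[unit] += 1
    ready = deque(sorted(u for u in units if indegree[u] == 0))
    order = []
    while ready:
        unit = ready.popleft()
        order.append(unit)
        for succ in sorted(successors[unit]):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(units):
        raise ValueError("ordering cycle")
    return order


def stop_order(after):
    """At shutdown the same ordering constraints apply in reverse."""
    return list(reversed(start_order(after)))


if __name__ == "__main__":
    for unit in stop_order(AFTER):
        print("stop", unit)
```

With this graph the sketch stops multipathd-queueing.service first and multipathd.service last, after home.mount. That late window is where the queue_without_daemon logic, which only acts when multipathd exits, comes too late to help.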