multipathd: fix hang during shutdown with queuing maps

Message ID 20250227174104.206721-1-mwilck@suse.com (mailing list archive)
State New

Commit Message

Martin Wilck Feb. 27, 2025, 5:41 p.m. UTC
Since c9689b6 ("multipathd: Remove dependency on
systemd-udev-settle.service"), multipathd.service starts very early during
boot, which in systemd's service ordering logic means that it is stopped
late. While this is generally a good thing, it means that, when systemd
unmounts file systems and tears down the block device stack, multipathd
is still running. Therefore our "queue_without_daemon" logic, which disables
queuing when multipathd exits, isn't effective yet. If there are any
multipath maps that are in queueing state at this point in time, the system
may hang indefinitely.

To fix that, add a new service which starts later (and thus stops earlier) and
disables queueing on all multipath maps during shutdown. Similar to lvm2's
blk-availability.service, the service does nothing when started.

Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service")

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/Makefile                       | 6 ++++--
 multipathd/multipathd-queueing.service.in | 9 +++++++++
 multipathd/multipathd.service.in          | 2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)
 create mode 100644 multipathd/multipathd-queueing.service.in
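
The ordering argument in the commit message can be illustrated with a small model. This is only a sketch, not how systemd is implemented: it assumes that units linked by After= are stopped in the inverse of their start order, and it models only the units named in this patch. The `AFTER` table and the `start_order`/`stop_order` helpers are hypothetical names introduced for illustration, and the entry ordering local-fs.target after local-fs-pre.target is an assumption about the stock systemd unit.

```python
from graphlib import TopologicalSorter

# Simplified After= relations: unit -> units it is ordered after.
# Only units named in this patch are modelled. remote-fs.target's own
# dependencies are left out, and the local-fs.target entry is an assumption.
AFTER = {
    "multipathd.service": [],
    # multipathd.service declares Before=local-fs-pre.target
    "local-fs-pre.target": ["multipathd.service"],
    "local-fs.target": ["local-fs-pre.target"],
    "remote-fs.target": [],
    # After= line of the new unit added by this patch
    "multipathd-queueing.service": [
        "local-fs.target",
        "remote-fs.target",
        "multipathd.service",
    ],
}


def start_order(after):
    """Return one valid start order: each unit comes after its After= units."""
    return list(TopologicalSorter(after).static_order())


def stop_order(after):
    """Return the matching stop order, modelled as the reversed start order."""
    return list(reversed(start_order(after)))


if __name__ == "__main__":
    for unit in stop_order(AFTER):
        print(unit)
```

In this model multipathd-queueing.service is the first unit to stop, so its ExecStop command runs while multipathd.service is still up and before the file system targets are torn down.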

Comments

Benjamin Marzinski March 3, 2025, 7:42 p.m. UTC | #1
On Thu, Feb 27, 2025 at 06:41:04PM +0100, Martin Wilck wrote:
> Since c9689b6 ("multipathd: Remove dependency on
> systemd-udev-settle.service"), multipathd.service starts very early during
> boot, which in systemd's service ordering logic means that it is stopped
> late. While this is generally a good thing, it means that, when systemd
> unmounts file systems and tears down the block device stack, multipathd
> is still running. Therefore our "queue_without_daemon" logic, which disables
> queuing when multipathd exits, isn't effective yet. If there are any
> multipath maps that are in queueing state at this point in time, the system
> may hang indefinitely.
> 
> To fix that, add a new service which starts later (and thus stops earlier) and
> disables queueing on all multipath maps during shutdown. Similar to lvm2's
> blk-availability.service, the service does nothing when started.
> 
> Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-settle.service")

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/Makefile                       | 6 ++++--
>  multipathd/multipathd-queueing.service.in | 9 +++++++++
>  multipathd/multipathd.service.in          | 2 +-
>  3 files changed, 14 insertions(+), 3 deletions(-)
>  create mode 100644 multipathd/multipathd-queueing.service.in
> 
> diff --git a/multipathd/Makefile b/multipathd/Makefile
> index 61cf1af..4bcee6b 100644
> --- a/multipathd/Makefile
> +++ b/multipathd/Makefile
> @@ -41,7 +41,7 @@ ifeq ($(FPIN_SUPPORT),1)
>  OBJS += fpin_handlers.o
>  endif
>  
> -all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
> +all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket $(EXEC)-queueing.service
>  
>  $(EXEC): $(OBJS) $(multipathdir)/libmultipath.so $(mpathcmddir)/libmpathcmd.so
>  	@echo building $@ because of $?
> @@ -64,6 +64,7 @@ install:
>  ifdef SYSTEMD
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(unitdir)
>  	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).service $(DESTDIR)$(unitdir)
> +	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC)-queueing.service $(DESTDIR)$(unitdir)
>  	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).socket $(DESTDIR)$(unitdir)
>  endif
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)/man8
> @@ -74,11 +75,12 @@ uninstall:
>  	$(Q)$(RM) $(DESTDIR)$(bindir)/$(EXEC) $(DESTDIR)$(bindir)/$(CLI)
>  	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(EXEC).8
>  	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(CLI).8
> +	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC)-queueing.service
>  	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).service
>  	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).socket
>  
>  clean: dep_clean
> -	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
> +	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC)-queueing.service $(EXEC).socket
>  
>  include $(wildcard $(OBJS:.o=.d) $(CLI_OBJS:.o=.d))
>  
> diff --git a/multipathd/multipathd-queueing.service.in b/multipathd/multipathd-queueing.service.in
> new file mode 100644
> index 0000000..18b0ca6
> --- /dev/null
> +++ b/multipathd/multipathd-queueing.service.in
> @@ -0,0 +1,9 @@
> +[Unit]
> +Description=Enable queuing for multipath maps
> +After=local-fs.target remote-fs.target multipathd.service
> +
> +[Service]
> +Type=oneshot
> +RemainAfterExit=yes
> +ExecStart=/bin/true
> +ExecStop=@BINDIR@/multipathd disablequeueing maps
> diff --git a/multipathd/multipathd.service.in b/multipathd/multipathd.service.in
> index b6a25b3..eb58943 100644
> --- a/multipathd/multipathd.service.in
> +++ b/multipathd/multipathd.service.in
> @@ -2,7 +2,7 @@
>  Description=Device-Mapper Multipath Device Controller
>  Before=lvm2-activation-early.service
>  Before=local-fs-pre.target blk-availability.service shutdown.target
> -Wants=systemd-udevd-kernel.socket @MODPROBE_UNIT@
> +Wants=systemd-udevd-kernel.socket multipathd-queueing.service @MODPROBE_UNIT@
>  After=systemd-udevd-kernel.socket @MODPROBE_UNIT@
>  After=multipathd.socket systemd-remount-fs.service
>  Before=initrd-cleanup.service
> -- 
> 2.48.1

Martin Wilck March 4, 2025, 8:26 p.m. UTC | #2
On Mon, 2025-03-03 at 14:42 -0500, Benjamin Marzinski wrote:
> On Thu, Feb 27, 2025 at 06:41:04PM +0100, Martin Wilck wrote:
> > Since c9689b6 ("multipathd: Remove dependency on
> > systemd-udev-settle.service"), multipathd.service starts very early
> > during
> > boot, which in systemd's service ordering logic means that it is
> > stopped
> > late. While this is generally a good thing, it means that, when
> > systemd
> > unmounts file systems and tears down the block device stack,
> > multipathd
> > is still running. Therefore our "queue_without_daemon" logic, which
> > disables
> > queuing when multipathd exits, isn't effective yet. If there are
> > any
> > multipath maps that are in queueing state at this point in time,
> > the system
> > may hang indefinitely.
> > 
> > To fix that, add a new service which starts later (and thus stops
> > earlier) and
> > disables queueing on all multipath maps during shutdown. Similar to
> > lvm2's
> > blk-availability.service, the service does nothing when started.
> > 
> > Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-
> > settle.service")
> 
> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

Thanks - do you reckon this is suitable for the stable tree?
There is not much of a regression risk, but it requires shipping
another file, so it's non-trivial for packagers.

Personally I'm inclined to add it to stable (also because I'll need to
backport it anyway).

Martin

Benjamin Marzinski March 5, 2025, 9:24 p.m. UTC | #3
On Tue, Mar 04, 2025 at 09:26:12PM +0100, Martin Wilck wrote:
> On Mon, 2025-03-03 at 14:42 -0500, Benjamin Marzinski wrote:
> > On Thu, Feb 27, 2025 at 06:41:04PM +0100, Martin Wilck wrote:
> > > Since c9689b6 ("multipathd: Remove dependency on
> > > systemd-udev-settle.service"), multipathd.service starts very early
> > > during
> > > boot, which in systemd's service ordering logic means that it is
> > > stopped
> > > late. While this is generally a good thing, it means that, when
> > > systemd
> > > unmounts file systems and tears down the block device stack,
> > > multipathd
> > > is still running. Therefore our "queue_without_daemon" logic, which
> > > disables
> > > queuing when multipathd exits, isn't effective yet. If there are
> > > any
> > > multipath maps that are in queueing state at this point in time,
> > > the system
> > > may hang indefinitely.
> > > 
> > > To fix that, add a new service which starts later (and thus stops
> > > earlier) and
> > > disables queueing on all multipath maps during shutdown. Similar to
> > > lvm2's
> > > blk-availability.service, the service does nothing when started.
> > > 
> > > Fixes: c9689b6 ("multipathd: Remove dependency on systemd-udev-
> > > settle.service")
> > 
> > Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> Thanks - do you reckon this is suitable for the stable tree?
> There is not much of a regression risk, but it requires shipping
> another file, so it's non-trivial for packagers.
> 
> Personally I'm inclined to add it to stable (also because I'll need to
> backport it anyway).

I'm on the fence about this one. It does fix a bug, but adding a new
service is kinda feature-y. But you're right that there's not much risk of
it breaking things, so it doesn't really matter much to me either way.

-Ben

> 
> Martin
Patch

diff --git a/multipathd/Makefile b/multipathd/Makefile
index 61cf1af..4bcee6b 100644
--- a/multipathd/Makefile
+++ b/multipathd/Makefile
@@ -41,7 +41,7 @@  ifeq ($(FPIN_SUPPORT),1)
 OBJS += fpin_handlers.o
 endif
 
-all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
+all : $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket $(EXEC)-queueing.service
 
 $(EXEC): $(OBJS) $(multipathdir)/libmultipath.so $(mpathcmddir)/libmpathcmd.so
 	@echo building $@ because of $?
@@ -64,6 +64,7 @@  install:
 ifdef SYSTEMD
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(unitdir)
 	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).service $(DESTDIR)$(unitdir)
+	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC)-queueing.service $(DESTDIR)$(unitdir)
 	$(Q)$(INSTALL_PROGRAM) -m 644 $(EXEC).socket $(DESTDIR)$(unitdir)
 endif
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)/man8
@@ -74,11 +75,12 @@  uninstall:
 	$(Q)$(RM) $(DESTDIR)$(bindir)/$(EXEC) $(DESTDIR)$(bindir)/$(CLI)
 	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(EXEC).8
 	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(CLI).8
+	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC)-queueing.service
 	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).service
 	$(Q)$(RM) $(DESTDIR)$(unitdir)/$(EXEC).socket
 
 clean: dep_clean
-	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC).socket
+	$(Q)$(RM) core *.o $(EXEC) $(CLI) $(MANPAGES) $(EXEC).service $(EXEC)-queueing.service $(EXEC).socket
 
 include $(wildcard $(OBJS:.o=.d) $(CLI_OBJS:.o=.d))
 
diff --git a/multipathd/multipathd-queueing.service.in b/multipathd/multipathd-queueing.service.in
new file mode 100644
index 0000000..18b0ca6
--- /dev/null
+++ b/multipathd/multipathd-queueing.service.in
@@ -0,0 +1,9 @@ 
+[Unit]
+Description=Enable queuing for multipath maps
+After=local-fs.target remote-fs.target multipathd.service
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/bin/true
+ExecStop=@BINDIR@/multipathd disablequeueing maps
diff --git a/multipathd/multipathd.service.in b/multipathd/multipathd.service.in
index b6a25b3..eb58943 100644
--- a/multipathd/multipathd.service.in
+++ b/multipathd/multipathd.service.in
@@ -2,7 +2,7 @@ 
 Description=Device-Mapper Multipath Device Controller
 Before=lvm2-activation-early.service
 Before=local-fs-pre.target blk-availability.service shutdown.target
-Wants=systemd-udevd-kernel.socket @MODPROBE_UNIT@
+Wants=systemd-udevd-kernel.socket multipathd-queueing.service @MODPROBE_UNIT@
 After=systemd-udevd-kernel.socket @MODPROBE_UNIT@
 After=multipathd.socket systemd-remount-fs.service
 Before=initrd-cleanup.service