[0/6] Assorted patches relating to mdmon

Message ID 167745586347.16565.4353184078424535907.stgit@noble.brown (mailing list archive)

Message

NeilBrown Feb. 27, 2023, 12:13 a.m. UTC
mdmon is a root-storage daemon in the sense defined by systemd
documentation, but it does not follow the practice that systemd
recommends.  Specifically, it is run from the root filesystem when
possible.  The instance started in the initrd hands over to a root-fs
based instance, which then hands over to an initrd-based instance
started by dracut at shutdown.

Part of the reason that we ignore systemd's advice is that mdmon needs
access to the filesystem - specifically /dev and /sys - which is not
available in the initrd context after switchroot.  We could possibly
change mdmon to work in the systemd-preferred way by splitting mdmon
into two processes instead of just having 2 threads.  The "monitor"
process could run entirely in the initrd context, while the "manager"
process could safely run in the root-fs context, passing newly opened
file descriptors to the monitor over a unix-domain socket.
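
To illustrate the descriptor-passing idea only (this is not part of the
series, and mdmon has no such code today), here is a minimal sketch of
handing an open file descriptor across a connected AF_UNIX socket with
SCM_RIGHTS.  The function names send_fd() and recv_fd() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one open fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char byte = 'F';	/* one byte of ordinary data carries the ancillary data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);	/* passing an invalid fd is a caller bug */

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	/* Accept only a single SCM_RIGHTS message holding exactly one fd. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In such a split, the manager would open files under /dev or /sys in the
root-fs namespace and call send_fd(); the monitor would call recv_fd()
and use the descriptor without ever resolving a path itself.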

But we aren't there yet and may never be.

For now, mdmon doesn't work correctly.  There is no mechanism to ensure
a new instance starts after switchroot.  Until recently the initrd
instance of the systemd mdmon unit would be stopped at switchroot time
because udev would temporarily forget about md devices.  This would
allow the "udevadm trigger" process to start a new instance.  udev was
recently fixed:

Commit: 7ec624147a41 ("udevadm: cleanup-db: don't delete information for kept db entries")

so now the attempt to start mdmon via "udevadm trigger" does nothing as
mdmon already has an active unit.

The net result is that mdmon continues running in the initrd mount
namespace and so cannot access new devices.  Adding a device to a root
md array that depends on mdmon will no longer work.

We want the initrd instance of mdmon to continue running until the
root-fs based instance starts, and that really requires we have two
different systemd units.  This series achieves this in the final patch by
using a different instance name inside the initrd and outside:
"initrd-mdfoo" and "mdfoo".

Other patches in the series are mostly clean-ups and minor improvements
in related code.
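
As a rough illustration of the instance naming (a sketch under
assumptions, not the code in the patches), the choice could look like
the following.  The helper names in_initrd_at() and mdmon_unit_name()
are hypothetical, and the real series may well build the name elsewhere,
for example in the udev rule.  The /etc/initrd-release check is the one
named in the first patch title.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: report whether the given marker file exists.
 * In an initrd the marker would be "/etc/initrd-release". */
int in_initrd_at(const char *marker)
{
	return access(marker, F_OK) == 0;
}

/* Hypothetical helper: build the systemd unit name for the mdmon
 * instance that manages container `name` (e.g. "md127").
 * Returns 0 on success, -1 if the name is empty or does not fit. */
int mdmon_unit_name(char *buf, size_t len, const char *name, int in_initrd)
{
	int n;

	assert(buf && name);
	if (strlen(name) == 0)
		return -1;

	/* Inside the initrd the instance gets an "initrd-" prefix, so the
	 * initrd and root-fs instances are distinct systemd units. */
	n = snprintf(buf, len, "mdmon@%s%s.service",
		     in_initrd ? "initrd-" : "", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass in_initrd_at("/etc/initrd-release") as the last
argument.  Because the two names differ, systemd treats the root-fs
instance as a new unit and starts it even while the initrd one is still
active.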

NeilBrown



---

NeilBrown (6):
      Use existence of /etc/initrd-release to detect initrd.
      Improvements for IMSM_NO_PLATFORM testing.
      mdmon: don't test both 'all' and 'container_name'.
      mdmon: change systemd unit file to use --foreground
      mdmon: Remove need for KillMode=none
      mdmon improvements for switchroot


 Grow.c                    |  4 ++--
 mdadm.h                   |  4 +++-
 mdmon.c                   | 21 ++++++++++++-------
 super-intel.c             | 43 ++++++++++++++++++++++++++++++++++++---
 systemd/mdmon@.service    | 15 +++++++-------
 udev-md-raid-arrays.rules |  3 ++-
 util.c                    | 17 +++++-----------
 7 files changed, 74 insertions(+), 33 deletions(-)

--

Comments

Mariusz Tkaczyk March 1, 2023, 8:55 a.m. UTC | #1
On Mon, 27 Feb 2023 11:13:07 +1100
NeilBrown <neilb@suse.de> wrote:

> mdmon is a root-storage daemon is the sense defined by systemd
> documentation, but it does not follow the practice that systemd
> recommends.  Specifically it is run from the root filesystem when
> possible.  The instance started in the initrd hands-over to a root-fs
> based instance, which then hands-over to an initrd-based instance
> started by dracut at shutdown.
> 
> Part of the reason that we ignore systemd advise is that mdmon needs
> access to the filesystem - specifically /dev and /sys - which is not
> available in the initrd context after switchroot.  We could possibly
> change mdmon to work in the systemd-preferred way by splitting mdmon
> into two processes instead of just having 2 threads.  The "monitor"
> process could running entirely in the initrd context, the "manager"
> process could safely run in the root-fs context, passing newly opened
> file descriptors to the monitor over a unix-domain socket.
> 
> But we aren't there yet and may never be.
> 
> For now, mdmon doesn't work correctly.  There is no mechanism to ensure
> a new instance starts after switchroot.  Until recently the initrd
> instance of the systemd mdmon unit would be stopped at switchroot time
> because udev would temporarily forget about md devices.  This would
> allow the "udevadm trigger" process to start a new instance.  udev was
> recently fixed:
> 
> Commit: 7ec624147a41 ("udevadm: cleanup-db: don't delete information for kept
> db entries")
> 
> so now the attempt to start mdmon via "udevadm trigger" does nothing as
> mdmon already has an active unit.
> 
> The net result is that mdmon continues running in the initrd mount
> namespace and so cannot access new devices.  Adding a device to a root
> md array that depends on mdmon will no longer work.
> 
> We want the initrd instance of mdmon to continue running until the
> root-fs based instance starts, and that really requires we have two
> different systemd units.  This series achieves this in the final patch by
> using a different instance name inside or initrd and outside.
> "initrd-mdfoo" and "mdfoo".
> 
> Other patches in the series are mostly clean-ups and minor improvements
> in related code.
> 
> NeilBrown
> 

Hi Jes,
The problem described by Neil is critical for IMSM. I will test the patchset
ASAP.
Additionally, it resolves the "KillMode=none" problem, so we will finally be
able to drop it.

I will be back with results soon.

Thanks,
Mariusz