diff mbox series

[6/6] mdmon improvements for switchroot

Message ID 167745678753.16565.5052083348539533042.stgit@noble.brown (mailing list archive)
State Superseded, archived
Delegated to: Mariusz Tkaczyk
Headers show
Series Assorted patches relating to mdmon | expand

Commit Message

NeilBrown Feb. 27, 2023, 12:13 a.m. UTC
We need a new mdmon@mdfoo instance to run in the root filesystem after
switch root, as /sys and /dev are removed from the initrd.

systemd will not start a new unit with the same name running while the
old unit is still active, and we want the two mdmon processes to overlap
in time to avoid any risk of deadlock which a write is attempted with no
mdmon running.

So we need a different unit name in the initrd than in the root.  Apart
from the name, everything else should be the same.

This is easily achieved using a different instance name as the
mdmon@.service unit file already supports multiple instances (for
different arrays).

So start "mdmon@mdfoo.service" from root, but
"mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
circumstance is the case by looking for /etc/initrd-release.
continue_from_systemd() is enhanced so that the "initrd-" prefix can be
requested.

Teach mdmon that a container name like "initrd/foo" should be treated
just like "foo".  Note that systemd passes the instance name
"initrd-foo" as "initrd/foo".

We don't need a similar machanism at shutdown because dracut runs
"mdmon --takeover --all" when appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Grow.c                    |    4 ++--
 mdadm.h                   |    2 +-
 mdmon.c                   |    6 +++++-
 systemd/mdmon@.service    |    2 +-
 udev-md-raid-arrays.rules |    3 ++-
 util.c                    |    7 ++++---
 6 files changed, 15 insertions(+), 9 deletions(-)

Comments

Mariusz Tkaczyk March 1, 2023, 1:50 p.m. UTC | #1
Hi Neil,
We found typo. We fixed that to test the change.
Other comments are less important.

On Mon, 27 Feb 2023 11:13:07 +1100
NeilBrown <neilb@suse.de> wrote:

> We need a new mdmon@mdfoo instance to run in the root filesystem after
> switch root, as /sys and /dev are removed from the initrd.
> 
> systemd will not start a new unit with the same name running while the
> old unit is still active, and we want the two mdmon processes to overlap
> in time to avoid any risk of deadlock which a write is attempted with no
> mdmon running.
> 
> So we need a different unit name in the initrd than in the root.  Apart
> from the name, everything else should be the same.
> 
> This is easily achieved using a different instance name as the
> mdmon@.service unit file already supports multiple instances (for
> different arrays).
> 
> So start "mdmon@mdfoo.service" from root, but
> "mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
> circumstance is the case by looking for /etc/initrd-release.
> continue_from_systemd() is enhanced so that the "initrd-" prefix can be
> requested.
> 
> Teach mdmon that a container name like "initrd/foo" should be treated
> just like "foo".  Note that systemd passes the instance name
> "initrd-foo" as "initrd/foo".
> 
> We don't need a similar machanism at shutdown because dracut runs
> "mdmon --takeover --all" when appropriate.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

> diff --git a/mdmon.c b/mdmon.c
> index 6d37b17c3f53..25abdd71fb1e 100644
> --- a/mdmon.c
> +++ b/mdmon.c
> @@ -368,7 +368,11 @@ int main(int argc, char *argv[])
>  	}
>  
>  	if (!all && argv[optind]) {
> -		container_name = get_md_name(argv[optind]);
> +		static const char prefix[] = "initrd/";
> +		container_name = argv[optind];
> +		if (strncmp(container_name, prefix, sizeof(prefix)-1) == 0)
> +			container_name += sizeof(prefix)-1;
> +		container_name = get_md_name(container_name);

"sizeof(prefix)-1" there should be spaces before and after operator.

You are defining similar literals in 2 places:
prefix[] = "initrd/"
*prefix = in_initrd() ? "initrd-", "";

When I see something like this, I need to ask why it is not globally defined
because in the future we would need to define it for the firth and fourth time.
I see the difference in last sign ('/' and '-'). We can omit that.
I would like propose something like:

in mdadm.h:
#DEFINE MDMON_PREFIX "initrd"

in mdmon, do not check last sign. whatever it is, we don't really care, just
skip it. All we need to know is that it not belongs to container name.
Hope it works correctly:
	if (strncmp(container_name, MDMON_PREFIX, sizeof(prefix) - 1) == 0)
		container_name += sizeof(MDMON_PREFIX);
	
And later in start_mdmon include '-' in snprintf:
		 "%s@%s%s.service", service_name, MDMON_PREFIX"-" ?: "",

I think that we don't need to pass whole char* value, we can use bool, the one
possibility is "initrd" now. If that would be changed, we can use enum and maps
interface:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/maps.c

This is lesson learned by code study, we needed to put big effort to correct
similar case with reshapes because pointers become overkill through
years:
https://lore.kernel.org/linux-raid/20230102083524.28893-1-mateusz.kusiak@intel.com/

It my my personal view so you are free to make decision. I will accept it but
please note that mdadm is full of same literals (just find /dev/md or /dev/md/)
so that is why I'm especially sensitive in that cases.

> --git a/util.c b/util.c index 6b44662db7cd..1d433d1826b5 100644 --- a/util.c
> +++ b/util.c @@ -1906,6 +1906,7 @@ int start_mdmon(char *devnm)
>  	int len;
>  	pid_t pid;
>  	int status;
> +	char *prefix = in_initrd() ? "initrd-", "";

The most important thing:
typo, should be in_initrd() ? "initrd-": "";

>  	char pathbuf[1024];
>  	char *paths[4] = {
>  		pathbuf,
> @@ -1916,7 +1917,7 @@ int start_mdmon(char *devnm)
>  
>  	if (check_env("MDADM_NO_MDMON"))
>  		return 0;
> -	if (continue_via_systemd(devnm, MDMON_SERVICE))
> +	if (continue_via_systemd(devnm, MDMON_SERVICE, prefix))
>  		return 0;
>  
>  	/* That failed, try running mdmon directly */
> @@ -2187,7 +2188,7 @@ void manage_fork_fds(int close_all)
>   *	1- if systemd service has been started
>   *	0- otherwise
>   */
> -int continue_via_systemd(char *devnm, char *service_name)
> +int continue_via_systemd(char *devnm, char *service_name, char *prefix)
>  {
>  	int pid, status;
>  	char pathbuf[1024];
> @@ -2199,7 +2200,7 @@ int continue_via_systemd(char *devnm, char
> *service_name) case  0:
>  		manage_fork_fds(1);
>  		snprintf(pathbuf, sizeof(pathbuf),
> -			 "%s@%s.service", service_name, devnm);
> +			 "%s@%s%s.service", service_name, prefix ?: "",
> devnm); status = execl("/usr/bin/systemctl", "systemctl", "restart",
>  			       pathbuf, NULL);
>  		status = execl("/bin/systemctl", "systemctl", "restart",
> 
> 

Thanks,
Mariusz
NeilBrown March 5, 2023, 10:10 p.m. UTC | #2
On Thu, 02 Mar 2023, Mariusz Tkaczyk wrote:
> Hi Neil,
> We found typo. We fixed that to test the change.
> Other comments are less important.
> 
> On Mon, 27 Feb 2023 11:13:07 +1100
> NeilBrown <neilb@suse.de> wrote:
> 
> > We need a new mdmon@mdfoo instance to run in the root filesystem after
> > switch root, as /sys and /dev are removed from the initrd.
> > 
> > systemd will not start a new unit with the same name running while the
> > old unit is still active, and we want the two mdmon processes to overlap
> > in time to avoid any risk of deadlock which a write is attempted with no
> > mdmon running.
> > 
> > So we need a different unit name in the initrd than in the root.  Apart
> > from the name, everything else should be the same.
> > 
> > This is easily achieved using a different instance name as the
> > mdmon@.service unit file already supports multiple instances (for
> > different arrays).
> > 
> > So start "mdmon@mdfoo.service" from root, but
> > "mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
> > circumstance is the case by looking for /etc/initrd-release.
> > continue_from_systemd() is enhanced so that the "initrd-" prefix can be
> > requested.
> > 
> > Teach mdmon that a container name like "initrd/foo" should be treated
> > just like "foo".  Note that systemd passes the instance name
> > "initrd-foo" as "initrd/foo".
> > 
> > We don't need a similar machanism at shutdown because dracut runs
> > "mdmon --takeover --all" when appropriate.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> 
> > diff --git a/mdmon.c b/mdmon.c
> > index 6d37b17c3f53..25abdd71fb1e 100644
> > --- a/mdmon.c
> > +++ b/mdmon.c
> > @@ -368,7 +368,11 @@ int main(int argc, char *argv[])
> >  	}
> >  
> >  	if (!all && argv[optind]) {
> > -		container_name = get_md_name(argv[optind]);
> > +		static const char prefix[] = "initrd/";
> > +		container_name = argv[optind];
> > +		if (strncmp(container_name, prefix, sizeof(prefix)-1) == 0)
> > +			container_name += sizeof(prefix)-1;
> > +		container_name = get_md_name(container_name);
> 
> "sizeof(prefix)-1" there should be spaces before and after operator.
> 
> You are defining similar literals in 2 places:
> prefix[] = "initrd/"
> *prefix = in_initrd() ? "initrd-", "";
> 
> When I see something like this, I need to ask why it is not globally defined
> because in the future we would need to define it for the firth and fourth time.
> I see the difference in last sign ('/' and '-'). We can omit that.
> I would like propose something like:
> 
> in mdadm.h:
> #DEFINE MDMON_PREFIX "initrd"

To my mind this is "premature optimisation".  I think it makes the core
harder to read.  If it makes the code easier to maintain then it might
be worth the cost.  I don't think it does.
> 
> in mdmon, do not check last sign. whatever it is, we don't really care, just
> skip it. All we need to know is that it not belongs to container name.
> Hope it works correctly:
> 	if (strncmp(container_name, MDMON_PREFIX, sizeof(prefix) - 1) == 0)
> 		container_name += sizeof(MDMON_PREFIX);
> 	
> And later in start_mdmon include '-' in snprintf:
> 		 "%s@%s%s.service", service_name, MDMON_PREFIX"-" ?: "",
> 
> I think that we don't need to pass whole char* value, we can use bool, the one
> possibility is "initrd" now. If that would be changed, we can use enum and maps
> interface:
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/maps.c

Passing bools makes the calling code hard to read.  "What does that
"true" or "false" mean??".  Sometimes it really is best, but I think
passing the string "initrd" makes more sense to the reader than passing "true".

Passing enums is better than simple bools - and has the benefit of
spelling out the meaning of NULL.  These still feel a bit clumsy to me
so I would only use them when there is a clear benefit.

> 
> This is lesson learned by code study, we needed to put big effort to correct
> similar case with reshapes because pointers become overkill through
> years:
> https://lore.kernel.org/linux-raid/20230102083524.28893-1-mateusz.kusiak@intel.com/
> 
> It my my personal view so you are free to make decision. I will accept it but
> please note that mdadm is full of same literals (just find /dev/md or /dev/md/)
> so that is why I'm especially sensitive in that cases.
> 
> > --git a/util.c b/util.c index 6b44662db7cd..1d433d1826b5 100644 --- a/util.c
> > +++ b/util.c @@ -1906,6 +1906,7 @@ int start_mdmon(char *devnm)
> >  	int len;
> >  	pid_t pid;
> >  	int status;
> > +	char *prefix = in_initrd() ? "initrd-", "";
> 
> The most important thing:
> typo, should be in_initrd() ? "initrd-": "";

Certainly - thanks for catching that!

Jes - should I resent the whole series, or just this patch, or will you
edit before applying?

Thanks,
NeilBrown

> 
> >  	char pathbuf[1024];
> >  	char *paths[4] = {
> >  		pathbuf,
> > @@ -1916,7 +1917,7 @@ int start_mdmon(char *devnm)
> >  
> >  	if (check_env("MDADM_NO_MDMON"))
> >  		return 0;
> > -	if (continue_via_systemd(devnm, MDMON_SERVICE))
> > +	if (continue_via_systemd(devnm, MDMON_SERVICE, prefix))
> >  		return 0;
> >  
> >  	/* That failed, try running mdmon directly */
> > @@ -2187,7 +2188,7 @@ void manage_fork_fds(int close_all)
> >   *	1- if systemd service has been started
> >   *	0- otherwise
> >   */
> > -int continue_via_systemd(char *devnm, char *service_name)
> > +int continue_via_systemd(char *devnm, char *service_name, char *prefix)
> >  {
> >  	int pid, status;
> >  	char pathbuf[1024];
> > @@ -2199,7 +2200,7 @@ int continue_via_systemd(char *devnm, char
> > *service_name) case  0:
> >  		manage_fork_fds(1);
> >  		snprintf(pathbuf, sizeof(pathbuf),
> > -			 "%s@%s.service", service_name, devnm);
> > +			 "%s@%s%s.service", service_name, prefix ?: "",
> > devnm); status = execl("/usr/bin/systemctl", "systemctl", "restart",
> >  			       pathbuf, NULL);
> >  		status = execl("/bin/systemctl", "systemctl", "restart",
> > 
> > 
> 
> Thanks,
> Mariusz
>
Paul Menzel March 6, 2023, 8:31 a.m. UTC | #3
Dear Neil,


Thank you for your patch.

Am 27.02.23 um 01:13 schrieb NeilBrown:

For commit message summary, I’d use a statement – maybe:

mdmon: Improve switchroot

> We need a new mdmon@mdfoo instance to run in the root filesystem after
> switch root, as /sys and /dev are removed from the initrd.
> 
> systemd will not start a new unit with the same name running while the
> old unit is still active, and we want the two mdmon processes to overlap
> in time to avoid any risk of deadlock which a write is attempted with no
> mdmon running.

I do not understand the part after *deadlock*.

> So we need a different unit name in the initrd than in the root.  Apart
> from the name, everything else should be the same.
> 
> This is easily achieved using a different instance name as the
> mdmon@.service unit file already supports multiple instances (for
> different arrays).
> 
> So start "mdmon@mdfoo.service" from root, but
> "mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
> circumstance is the case by looking for /etc/initrd-release.
> continue_from_systemd() is enhanced so that the "initrd-" prefix can be
> requested.
> 
> Teach mdmon that a container name like "initrd/foo" should be treated
> just like "foo".  Note that systemd passes the instance name
> "initrd-foo" as "initrd/foo".
> 
> We don't need a similar machanism at shutdown because dracut runs

mechanism

> "mdmon --takeover --all" when appropriate.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>   Grow.c                    |    4 ++--
>   mdadm.h                   |    2 +-
>   mdmon.c                   |    6 +++++-
>   systemd/mdmon@.service    |    2 +-
>   udev-md-raid-arrays.rules |    3 ++-
>   util.c                    |    7 ++++---
>   6 files changed, 15 insertions(+), 9 deletions(-)

[…]


Kind regards,

Paul
diff mbox series

Patch

diff --git a/Grow.c b/Grow.c
index bb5fe45c851c..06001f2d334a 100644
--- a/Grow.c
+++ b/Grow.c
@@ -3516,7 +3516,7 @@  started:
 
 	if (!forked)
 		if (continue_via_systemd(container ?: sra->sys_name,
-					 GROW_SERVICE)) {
+					 GROW_SERVICE, NULL)) {
 			free(fdlist);
 			free(offsets);
 			sysfs_free(sra);
@@ -3714,7 +3714,7 @@  int reshape_container(char *container, char *devname,
 	ping_monitor(container);
 
 	if (!forked && !freeze_reshape)
-		if (continue_via_systemd(container, GROW_SERVICE))
+		if (continue_via_systemd(container, GROW_SERVICE, NULL))
 			return 0;
 
 	switch (forked ? 0 : fork()) {
diff --git a/mdadm.h b/mdadm.h
index 0939fd3cb1ee..f208ed5ec22f 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1605,7 +1605,7 @@  extern int same_dev(char *one, char *two);
 extern int compare_paths (char* path1,char* path2);
 extern void enable_fds(int devices);
 extern void manage_fork_fds(int close_all);
-extern int continue_via_systemd(char *devnm, char *service_name);
+extern int continue_via_systemd(char *devnm, char *service_name, char *prefix);
 
 extern void ident_init(struct mddev_ident *ident);
 
diff --git a/mdmon.c b/mdmon.c
index 6d37b17c3f53..25abdd71fb1e 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -368,7 +368,11 @@  int main(int argc, char *argv[])
 	}
 
 	if (!all && argv[optind]) {
-		container_name = get_md_name(argv[optind]);
+		static const char prefix[] = "initrd/";
+		container_name = argv[optind];
+		if (strncmp(container_name, prefix, sizeof(prefix)-1) == 0)
+			container_name += sizeof(prefix)-1;
+		container_name = get_md_name(container_name);
 		if (!container_name)
 			return 1;
 	}
diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
index 3ab908c45a3b..8d55fa63ce28 100644
--- a/systemd/mdmon@.service
+++ b/systemd/mdmon@.service
@@ -6,7 +6,7 @@ 
 #  (at your option) any later version.
 
 [Unit]
-Description=MD Metadata Monitor on /dev/%I
+Description=MD Metadata Monitor on %I
 DefaultDependencies=no
 Before=initrd-switch-root.target
 Documentation=man:mdmon(8)
diff --git a/udev-md-raid-arrays.rules b/udev-md-raid-arrays.rules
index 2967ace1f040..4e64b249b2db 100644
--- a/udev-md-raid-arrays.rules
+++ b/udev-md-raid-arrays.rules
@@ -38,7 +38,8 @@  ENV{MD_LEVEL}=="raid[1-9]*", ENV{SYSTEMD_WANTS}+="mdmonitor.service"
 
 # Tell systemd to run mdmon for our container, if we need it.
 ENV{MD_LEVEL}=="raid[1-9]*", ENV{MD_CONTAINER}=="?*", PROGRAM="/usr/bin/readlink $env{MD_CONTAINER}", ENV{MD_MON_THIS}="%c"
-ENV{MD_MON_THIS}=="?*", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdmon@%c.service"
+ENV{MD_MON_THIS}=="?*", TEST=="/etc/initrd-release", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdmon@initrd-%c.service"
+ENV{MD_MON_THIS}=="?*", TEST!="/etc/initrd-release", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdmon@%c.service"
 ENV{RESHAPE_ACTIVE}=="yes", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdadm-grow-continue@%c.service"
 
 LABEL="md_end"
diff --git a/util.c b/util.c
index 6b44662db7cd..1d433d1826b5 100644
--- a/util.c
+++ b/util.c
@@ -1906,6 +1906,7 @@  int start_mdmon(char *devnm)
 	int len;
 	pid_t pid;
 	int status;
+	char *prefix = in_initrd() ? "initrd-", "";
 	char pathbuf[1024];
 	char *paths[4] = {
 		pathbuf,
@@ -1916,7 +1917,7 @@  int start_mdmon(char *devnm)
 
 	if (check_env("MDADM_NO_MDMON"))
 		return 0;
-	if (continue_via_systemd(devnm, MDMON_SERVICE))
+	if (continue_via_systemd(devnm, MDMON_SERVICE, prefix))
 		return 0;
 
 	/* That failed, try running mdmon directly */
@@ -2187,7 +2188,7 @@  void manage_fork_fds(int close_all)
  *	1- if systemd service has been started
  *	0- otherwise
  */
-int continue_via_systemd(char *devnm, char *service_name)
+int continue_via_systemd(char *devnm, char *service_name, char *prefix)
 {
 	int pid, status;
 	char pathbuf[1024];
@@ -2199,7 +2200,7 @@  int continue_via_systemd(char *devnm, char *service_name)
 	case  0:
 		manage_fork_fds(1);
 		snprintf(pathbuf, sizeof(pathbuf),
-			 "%s@%s.service", service_name, devnm);
+			 "%s@%s%s.service", service_name, prefix ?: "", devnm);
 		status = execl("/usr/bin/systemctl", "systemctl", "restart",
 			       pathbuf, NULL);
 		status = execl("/bin/systemctl", "systemctl", "restart",