
[v2,17/20] multipath.rules: find_multipaths "smart" logic

Message ID 20180319150155.5363-18-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui

Commit Message

Martin Wilck March 19, 2018, 3:01 p.m. UTC
When the first path to a device appears, we don't know if more paths are going
to follow. The find_multipaths "smart" logic attempts to solve this dilemma by
waiting for additional paths for a configurable time before giving up
and releasing single paths to upper layers.

These rules apply only if find_multipaths is set to "smart" in
multipath.conf. In this mode, multipath -u sets DM_MULTIPATH_DEVICE_PATH=2 if
there's no clear evidence whether a given device should be a multipath member
(not blacklisted, not listed as "failed", not in the WWIDs file, not a member
of an existing map, only one path seen yet).

In this case, pretend that the path is a multipath member, disallow further
processing by systemd (allowing multipathd some time to grab the path),
and check again after some time. If the path is still not multipathed by then,
pass it on to systemd for further processing.

The timeout is controlled by the "find_multipaths_timeout" config option.
Note that delays caused by waiting don't "add up" during boot, because the
timers run concurrently.

Implementation note: This logic requires obtaining the current time. It's not
trivial to do this in udev rules in a portable way, because "/bin/date" is
often not available in restricted environments such as the initrd. I chose
the sysfs method, because /sys/class/rtc/rtc0 seems to be quite universally
available. I'm open to better suggestions if there are any.
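
For illustration, both of the commands below print seconds since the epoch,
but only the second one works when coreutils is unavailable in the initrd
(this assumes rtc0 exists and its clock is set):

    /bin/date +%s
    /bin/cat /sys/class/rtc/rtc0/since_epoch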

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipath/multipath.rules | 80 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 2 deletions(-)

Comments

Benjamin Marzinski March 27, 2018, 9:03 p.m. UTC | #1
On Mon, Mar 19, 2018 at 04:01:52PM +0100, Martin Wilck wrote:
> When the first path to a device appears, we don't know if more paths are going
> to follow. find_multipath "smart" logic attempts to solve this dilemma by
> waiting for additional paths for a configurable time before giving up
> and releasing single paths to upper layers.
> 
> These rules apply only if both find_multipaths is set to "smart" in
> multipath.conf. In this mode, multipath -u sets DM_MULTIPATH_DEVICE_PATH=2 if
> there's no clear evidence wheteher a given device should be a multipath member
> (not blacklisted, not listed as "failed", not in WWIDs file, not member of an
> exisiting map, only one path seen yet).
> 
> In this case, pretend that the path is multipath member, disallow further
> processing by systemd (allowing multipathd some time to grab the path),
> and check again after some time. If the path is still not multipathed by then,
> pass it on to systemd for further processing.
> 
> The timeout is controlled by the "find_multipaths_timeout" config option.
> Note that delays caused by waiting don't "add up" during boot, because the
> timers run concurrently.
> 
> Implementation note: This logic requires obtaining the current time. It's not
> trivial to do this in udev rules in a portable way, because "/bin/date" is
> often not available in restricted environments such as the initrd. I chose
> the sysfs method, because /sys/class/rtc/rtc0 seems to be quite universally
> available. I'm open for better suggestions if there are any.

I have a couple of code issues that I'll point out below, but I have an
overall question.  If multipath exists in the initramfs, and a device is
not claimed there, then after the pivot, multipath will not temporarily
claim it, correct?  I'm pretty sure, but not totally certain, that the
udev database persists between the udev running in the initramfs and the
regular system. On the other hand, if multipath isn't in the initramfs
but it is in the regular system, then AFAICS, once the system pivots to
the regular fs, there is nothing to warn multipath that these devices
could already be in use, correct?  So, even if you don't need to
multipath any devices in your initramfs, you will need multipath in your
initramfs, or it could go setting devices to not ready, right?

> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipath/multipath.rules | 80 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 78 insertions(+), 2 deletions(-)
> 
> diff --git a/multipath/multipath.rules b/multipath/multipath.rules
> index aab64dc7182c..32d33991db3d 100644
> --- a/multipath/multipath.rules
> +++ b/multipath/multipath.rules
> @@ -21,7 +21,83 @@ TEST!="$env{MPATH_SBIN_PATH}/multipath", ENV{MPATH_SBIN_PATH}="/usr/sbin"
>  
>  # multipath -u sets DM_MULTIPATH_DEVICE_PATH
>  ENV{DM_MULTIPATH_DEVICE_PATH}!="1", IMPORT{program}="$env{MPATH_SBIN_PATH}/multipath -u %k"
> -ENV{DM_MULTIPATH_DEVICE_PATH}=="1", ENV{ID_FS_TYPE}="mpath_member", \
> -	ENV{SYSTEMD_READY}="0"
> +
> +# case 1: this is definitely multipath
> +ENV{DM_MULTIPATH_DEVICE_PATH}=="1", \
> +	ENV{ID_FS_TYPE}="mpath_member", ENV{SYSTEMD_READY}="0", \
> +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> +	GOTO="end_mpath"
> +
> +# case 2: this is definitely not multipath
> +ENV{DM_MULTIPATH_DEVICE_PATH}!="2", \
> +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> +	GOTO="end_mpath"
> +
> +# All code below here is only run in "smart" mode.
> +
> +# FIND_MULTIPATHS_WAIT_UNTIL is the timeout (in seconds after the
> +# epoch). If waiting ends for any reason, it is set to "finished".
> +IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> +
> +# At this point we know DM_MULTIPATH_DEVICE_PATH==2.
> +# (multipath -u indicates this is "maybe" multipath)
> +
> +# case 3: waiting has already finished. Treat as non-multipath.
> +ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="finished", \
> +	ENV{DM_MULTIPATH_DEVICE_PATH}="", GOTO="end_mpath"
> +
> +# The timeout should have been set by the multipath -u call above, set a default
> +# value it that didn't happen for whatever reason
> +ENV{FIND_MULTIPATHS_PATH_TMO}!="?*", ENV{FIND_MULTIPATHS_PATH_TMO}="5"
> +

This code adds three more callouts.  I know that the udev people dislike
these, and they do eat up time that can cause udev to time out on busy
systems.  To avoid the overhead of these execs, as well as to make the
rules simpler, what do you think about moving the

IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"

line before the "multipath -u" call and passing that as a parameter if
present?  Then multipath could check the current time and compare it.
It could also return an updated FIND_MULTIPATHS_WAIT_UNTIL as a udev
environment variable, instead of returning FIND_MULTIPATHS_PATH_TMO and
forcing udev to calculate the new timeout. That would remove the need
for the other PROGRAM calls.
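
Roughly something like this (just a sketch; the extra argument to
"multipath -u" and the variables it would import back are hypothetical,
not options that exist today):

# hand the stored deadline (if any) to multipath -u and let it do the time
# arithmetic; it imports back DM_MULTIPATH_DEVICE_PATH and, in the "maybe"
# case, an updated FIND_MULTIPATHS_WAIT_UNTIL
IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
ENV{DM_MULTIPATH_DEVICE_PATH}!="1", \
	IMPORT{program}="$env{MPATH_SBIN_PATH}/multipath -u %k $env{FIND_MULTIPATHS_WAIT_UNTIL}"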

-Ben

> +PROGRAM="/bin/cat $sys/class/rtc/rtc0/since_epoch", \
> +	ENV{.TIME_NOW}="$result"
> +ENV{.TIME_NOW}!="?*", ENV{DM_MULTIPATH_DEVICE_PATH}="", \
> +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="error", GOTO="end_mpath"
> +
> +ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="?*", GOTO="start_wait"
> +
> +# At this point, we know that FIND_MULTIPATHS_WAIT_UNTIL is a timeout value.
> +# If it's expired, give up waiting and assume this is non-multipath.
> +PROGRAM="/bin/sh -c '[ $env{.TIME_NOW} -ge $env{FIND_MULTIPATHS_WAIT_UNTIL} ]'", \
> +	ENV{DM_MULTIPATH_DEVICE_PATH}="", \
> +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> +	GOTO="end_mpath"
> +
> +# Timer not expired yet. The current uevent has not been triggered by the
> +# systemd timer but something else, e.g. 60-block.rules. Continue pretending.
> +ACTION!="add", GOTO="pretend_mpath"
> +
> +# Special case: Another "add" event happened before the timeout expired.
> +# This can happen during "coldplug" after switching root FS, if a
> +# systemd timer started during initramfs processing had been cancelled.
> +# We need to start the timer again.
> +
> +LABEL="start_wait"
> +
> +# We are seeing this path for the first time, and it's "maybe" multipath.
> +# Pretend that it is actually multipath, and set a timer to create another
> +# uevent at FIND_MULTIPATHS_WAIT_UNTIL seconds after the epoch.
> +ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="?*", \
> +	PROGRAM="/bin/sh -c 'echo $(( $env{.TIME_NOW} + $env{FIND_MULTIPATHS_PATH_TMO} ))'", \
> +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="$result"
> +
> +# The actual start command for the timer.
> +#
> +# The purpose of this command is only to make sure we will receive another
> +# uevent eventually. *Any* uevent may cause waiting to finish if it either ends
> +# in case 1-3 above, or if it arrives after FIND_MULTIPATHS_WAIT_UNTIL.
> +#
> +# Note that this will try to activate multipathd if it isn't running yet.
> +# If that fails, the unit starts and expires nonetheless. If multipathd
> +# startup needs to wait for other services, this wait time will add up with
> +# the --on-active timeout.
> +#
> +# We must trigger an "add" event because LVM2 will only act on those.
> +RUN+="/usr/bin/systemd-run --unit=cancel-multipath-wait-$kernel --description 'cancel waiting for multipath siblings of $kernel' --no-block --timer-property DefaultDependencies=no --timer-property Conflicts=shutdown.target --timer-property Before=shutdown.target --timer-property AccuracySec=500ms --property DefaultDependencies=no --property Conflicts=shutdown.target --property Before=shutdown.target --property Wants=multipathd.service --property After=multipathd.service --on-active=$env{FIND_MULTIPATHS_PATH_TMO} /usr/bin/udevadm trigger --action=add $sys$devpath"
> +
> +LABEL="pretend_mpath"
> +ENV{DM_MULTIPATH_DEVICE_PATH}="1"
> +ENV{SYSTEMD_READY}="0"
>  
>  LABEL="end_mpath"
> -- 
> 2.16.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 27, 2018, 9:34 p.m. UTC | #2
On Tue, 2018-03-27 at 16:03 -0500, Benjamin Marzinski wrote:
> On Mon, Mar 19, 2018 at 04:01:52PM +0100, Martin Wilck wrote:
> > When the first path to a device appears, we don't know if more
> > paths are going
> > to follow. find_multipath "smart" logic attempts to solve this
> > dilemma by
> > waiting for additional paths for a configurable time before giving
> > up
> > and releasing single paths to upper layers.
> > 
> > These rules apply only if both find_multipaths is set to "smart" in
> > multipath.conf. In this mode, multipath -u sets
> > DM_MULTIPATH_DEVICE_PATH=2 if
> > there's no clear evidence wheteher a given device should be a
> > multipath member
> > (not blacklisted, not listed as "failed", not in WWIDs file, not
> > member of an
> > exisiting map, only one path seen yet).
> > 
> > In this case, pretend that the path is multipath member, disallow
> > further
> > processing by systemd (allowing multipathd some time to grab the
> > path),
> > and check again after some time. If the path is still not
> > multipathed by then,
> > pass it on to systemd for further processing.
> > 
> > The timeout is controlled by the "find_multipaths_timeout" config
> > option.
> > Note that delays caused by waiting don't "add up" during boot,
> > because the
> > timers run concurrently.
> > 
> > Implementation note: This logic requires obtaining the current
> > time. It's not
> > trivial to do this in udev rules in a portable way, because
> > "/bin/date" is
> > often not available in restricted environments such as the initrd.
> > I chose
> > the sysfs method, because /sys/class/rtc/rtc0 seems to be quite
> > universally
> > available. I'm open for better suggestions if there are any.
> 
> I have a couple of code issues, that I'll point out below, but I have
> an
> overall question.  If multipath exists in the initramfs, and a device
> is
> not claimed there, then after the pivot, multipath will not
> temporarily
> claim it, correct? 

Incorrect, it will do the temporary claim.

>  I'm pretty sure, but not totally certain, that udev
> database persists between the udev running in the initramfs and the
> regular system.

That's only true for devices that set OPTIONS+="db_persist", and dracut
sets this only for dm and md devices. For other devices,
/usr/lib/systemd/system/initrd-udevadm-cleanup-db.service cleans up the
udev database, and devices are seen as "new" during coldplug. So, if
there's still only one path and no other information (e.g. wwids file)
after pivot, we'll wait.
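
For illustration, the mechanism is a per-device udev option; a rule roughly
like the following (dracut ships something similar for device-mapper nodes)
marks the database entry so that it survives the cleanup:

SUBSYSTEM=="block", KERNEL=="dm-[0-9]*", ACTION=="add|change", OPTIONS+="db_persist"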

>  On the other hand, if multipth isn't in the initramfs
> but it is in the regular system, then AFAICS, once the system pivots
> to
> the regular fs, there is nothing to warn multipath that these devices
> could already be in use, correct? 

Correct.

>  So, even if you don't need to
> multipath any devices in your initramfs, you will need multipath in
> your
> initramfs, or it could go setting devices to not ready. right?

The following happens: multipath -u temporarily claims the device. When
multipathd starts, it fails to set up the map, sets the "failed"
marker, and retriggers udev. The second time, multipath -u unclaims the
device because it recognizes it as failed.

I admit I haven't tested the default Red Hat setup with a very
restrictive multipath.conf in the initrd. But I'm pretty certain that
in that case, the same thing happens.
I'd be grateful if you could give it a try :-)

> 
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  multipath/multipath.rules | 80
> > +++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 78 insertions(+), 2 deletions(-)
> > 
> > diff --git a/multipath/multipath.rules b/multipath/multipath.rules
> > index aab64dc7182c..32d33991db3d 100644
> > --- a/multipath/multipath.rules
> > +++ b/multipath/multipath.rules
> > @@ -21,7 +21,83 @@ TEST!="$env{MPATH_SBIN_PATH}/multipath",
> > ENV{MPATH_SBIN_PATH}="/usr/sbin"
> >  
> >  # multipath -u sets DM_MULTIPATH_DEVICE_PATH
> >  ENV{DM_MULTIPATH_DEVICE_PATH}!="1",
> > IMPORT{program}="$env{MPATH_SBIN_PATH}/multipath -u %k"
> > -ENV{DM_MULTIPATH_DEVICE_PATH}=="1",
> > ENV{ID_FS_TYPE}="mpath_member", \
> > -	ENV{SYSTEMD_READY}="0"
> > +
> > +# case 1: this is definitely multipath
> > +ENV{DM_MULTIPATH_DEVICE_PATH}=="1", \
> > +	ENV{ID_FS_TYPE}="mpath_member", ENV{SYSTEMD_READY}="0", \
> > +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> > +	GOTO="end_mpath"
> > +
> > +# case 2: this is definitely not multipath
> > +ENV{DM_MULTIPATH_DEVICE_PATH}!="2", \
> > +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> > +	GOTO="end_mpath"
> > +
> > +# All code below here is only run in "smart" mode.
> > +
> > +# FIND_MULTIPATHS_WAIT_UNTIL is the timeout (in seconds after the
> > +# epoch). If waiting ends for any reason, it is set to "finished".
> > +IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> > +
> > +# At this point we know DM_MULTIPATH_DEVICE_PATH==2.
> > +# (multipath -u indicates this is "maybe" multipath)
> > +
> > +# case 3: waiting has already finished. Treat as non-multipath.
> > +ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="finished", \
> > +	ENV{DM_MULTIPATH_DEVICE_PATH}="", GOTO="end_mpath"
> > +
> > +# The timeout should have been set by the multipath -u call above,
> > set a default
> > +# value it that didn't happen for whatever reason
> > +ENV{FIND_MULTIPATHS_PATH_TMO}!="?*",
> > ENV{FIND_MULTIPATHS_PATH_TMO}="5"
> > +
> 
> This code adds three more callouts.  I know that the udev people
> dislike
> these, and they do eat up time that can cause udev to timeout on busy
> systems.  To avoid the overhead of these execs, as well as to make
> the
> rules simpler, what do you thing about moving the 
> 
> IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> 
> line before the "multipath -u" call, and passing that as a parameter
> if
> present.  Then multipath could check the current time and compare it.
> It could also return an updated FIND_MULTIPATHS_WAIT_UNTIL as a udev
> environment variable, instead of returning FIND_MULTIPATHS_PATH_TMO,
> and
> forcing udev to calculate the new timeout. That would remove the need
> for the other PROGRAM calls.

That's a nice idea. Why didn't I have it?

Martin
Benjamin Marzinski March 27, 2018, 11:07 p.m. UTC | #3
On Tue, Mar 27, 2018 at 11:34:00PM +0200, Martin Wilck wrote:
> On Tue, 2018-03-27 at 16:03 -0500, Benjamin Marzinski wrote:
> > On Mon, Mar 19, 2018 at 04:01:52PM +0100, Martin Wilck wrote:
> > > When the first path to a device appears, we don't know if more
> > > paths are going
> > > to follow. find_multipath "smart" logic attempts to solve this
> > > dilemma by
> > > waiting for additional paths for a configurable time before giving
> > > up
> > > and releasing single paths to upper layers.
> > > 
> > > These rules apply only if both find_multipaths is set to "smart" in
> > > multipath.conf. In this mode, multipath -u sets
> > > DM_MULTIPATH_DEVICE_PATH=2 if
> > > there's no clear evidence wheteher a given device should be a
> > > multipath member
> > > (not blacklisted, not listed as "failed", not in WWIDs file, not
> > > member of an
> > > exisiting map, only one path seen yet).
> > > 
> > > In this case, pretend that the path is multipath member, disallow
> > > further
> > > processing by systemd (allowing multipathd some time to grab the
> > > path),
> > > and check again after some time. If the path is still not
> > > multipathed by then,
> > > pass it on to systemd for further processing.
> > > 
> > > The timeout is controlled by the "find_multipaths_timeout" config
> > > option.
> > > Note that delays caused by waiting don't "add up" during boot,
> > > because the
> > > timers run concurrently.
> > > 
> > > Implementation note: This logic requires obtaining the current
> > > time. It's not
> > > trivial to do this in udev rules in a portable way, because
> > > "/bin/date" is
> > > often not available in restricted environments such as the initrd.
> > > I chose
> > > the sysfs method, because /sys/class/rtc/rtc0 seems to be quite
> > > universally
> > > available. I'm open for better suggestions if there are any.
> > 
> > I have a couple of code issues, that I'll point out below, but I have
> > an
> > overall question.  If multipath exists in the initramfs, and a device
> > is
> > not claimed there, then after the pivot, multipath will not
> > temporarily
> > claim it, correct? 
> 
> Incorrect, it will do the temporary claim.
> 
> >  I'm pretty sure, but not totally certain, that udev
> > database persists between the udev running in the initramfs and the
> > regular system.
> 
> That's only true for devices that set OPTIONS+="db_persist", and dracut
> sets this only for dm and md devices. For other devices,
> /usr/lib/systemd/system/initrd-udevadm-cleanup-db.service cleans up the
> udev data base, and devices are seen as "new" during coldplug. So, if
> there's still only one path and no other information (e.g. wwids file)
> after pivot, we'll wait.
> 
> >  On the other hand, if multipth isn't in the initramfs
> > but it is in the regular system, then AFAICS, once the system pivots
> > to
> > the regular fs, there is nothing to warn multipath that these devices
> > could already be in use, correct? 
> 
> Correct.
> 
> >  So, even if you don't need to
> > multipath any devices in your initramfs, you will need multipath in
> > your
> > initramfs, or it could go setting devices to not ready. right?
> 
> The following happens: multipath -u temporarily claims the device. When
> multipathd starts, it fails to set up the map, sets the "failed"
> marker, and retriggers udev. The second time, multipath -u unclaims the
> device because it recognizes it as failed.

But if that device is already in use because multipath didn't claim it
in the initramfs, and you suddenly mark it as ENV{SYSTEMD_READY}="0",
this can cause systemd to automatically unmount any filesystem on it.
This isn't just a problem with Red Hat's setup.  If it's not a
configured device type, there will only be a short timeout, but that's
still enough to mess with devices that are already in use.  I'm pretty
sure that the multipath temporary claiming is only safe the very first
time a device appears. Otherwise, it's possible that something else will
claim it first, and then multipath will claim it and mess with that
other user.

> I admit I haven't tested the default Red Hat setup with a very
> restrictive multipath.conf in the initrd. But I'm pretty certain that
> in that case, the same thing happens.
> I'd be grateful if you could give it a try :-)
> 
> > 
> > > 
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > >  multipath/multipath.rules | 80
> > > +++++++++++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 78 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/multipath/multipath.rules b/multipath/multipath.rules
> > > index aab64dc7182c..32d33991db3d 100644
> > > --- a/multipath/multipath.rules
> > > +++ b/multipath/multipath.rules
> > > @@ -21,7 +21,83 @@ TEST!="$env{MPATH_SBIN_PATH}/multipath",
> > > ENV{MPATH_SBIN_PATH}="/usr/sbin"
> > >  
> > >  # multipath -u sets DM_MULTIPATH_DEVICE_PATH
> > >  ENV{DM_MULTIPATH_DEVICE_PATH}!="1",
> > > IMPORT{program}="$env{MPATH_SBIN_PATH}/multipath -u %k"
> > > -ENV{DM_MULTIPATH_DEVICE_PATH}=="1",
> > > ENV{ID_FS_TYPE}="mpath_member", \
> > > -	ENV{SYSTEMD_READY}="0"
> > > +
> > > +# case 1: this is definitely multipath
> > > +ENV{DM_MULTIPATH_DEVICE_PATH}=="1", \
> > > +	ENV{ID_FS_TYPE}="mpath_member", ENV{SYSTEMD_READY}="0", \
> > > +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> > > +	GOTO="end_mpath"
> > > +
> > > +# case 2: this is definitely not multipath
> > > +ENV{DM_MULTIPATH_DEVICE_PATH}!="2", \
> > > +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> > > +	GOTO="end_mpath"
> > > +
> > > +# All code below here is only run in "smart" mode.
> > > +
> > > +# FIND_MULTIPATHS_WAIT_UNTIL is the timeout (in seconds after the
> > > +# epoch). If waiting ends for any reason, it is set to "finished".
> > > +IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> > > +
> > > +# At this point we know DM_MULTIPATH_DEVICE_PATH==2.
> > > +# (multipath -u indicates this is "maybe" multipath)
> > > +
> > > +# case 3: waiting has already finished. Treat as non-multipath.
> > > +ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="finished", \
> > > +	ENV{DM_MULTIPATH_DEVICE_PATH}="", GOTO="end_mpath"
> > > +
> > > +# The timeout should have been set by the multipath -u call above,
> > > set a default
> > > +# value it that didn't happen for whatever reason
> > > +ENV{FIND_MULTIPATHS_PATH_TMO}!="?*",
> > > ENV{FIND_MULTIPATHS_PATH_TMO}="5"
> > > +
> > 
> > This code adds three more callouts.  I know that the udev people
> > dislike
> > these, and they do eat up time that can cause udev to timeout on busy
> > systems.  To avoid the overhead of these execs, as well as to make
> > the
> > rules simpler, what do you thing about moving the 
> > 
> > IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> > 
> > line before the "multipath -u" call, and passing that as a parameter
> > if
> > present.  Then multipath could check the current time and compare it.
> > It could also return an updated FIND_MULTIPATHS_WAIT_UNTIL as a udev
> > environment variable, instead of returning FIND_MULTIPATHS_PATH_TMO,
> > and
> > forcing udev to calculate the new timeout. That would remove the need
> > for the other PROGRAM calls.
> 
> That's a nice idea. Why didn't I have it?
> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 28, 2018, 2:51 p.m. UTC | #4
On Tue, 2018-03-27 at 18:07 -0500, Benjamin Marzinski wrote:
> On Tue, Mar 27, 2018 at 11:34:00PM +0200, Martin Wilck wrote:
> > 
> > The following happens: multipath -u temporarily claims the device.
> > When
> > multipathd starts, it fails to set up the map, sets the "failed"
> > marker, and retriggers udev. The second time, multipath -u unclaims
> > the
> > device because it recognizes it as failed.
> 
> But if that device is already in use because multipath didn't claim
> it
> in the initramfs, and you suddenly mark it as ENV{SYSTEMD_READY}="0",
> this can cause systemd to automatically unmount any filesystem on it.
> This isn't just a problem with Red Hat's setup.  If it's not a
> configured device type, there will only be a short timeout, but
> that's
> still enough to mess with devices that are already in use.  I'm
> pretty
> sure that the multipath temporary claiming is only safe the very
> first
> time a device appears. Otherwise, it's possible that something else
> will
> claim it first, and then multipath will claim it and mess with that
> other user.
> 

Arrgh. Thanks for pointing that out. It's even worse because even if we
could figure out whether we're being called for the first or second
time, it wouldn't be sufficient. multipath.rules may not even be
present in the initrd, so the device may be present in the system, and
used, without multipath.rules having ever been called.

The only way I see to avoid this is to try calling open(O_EXCL) on the
path device in "multipath -u". So far we've avoided that because it's
not completely race-free. But we're not talking about a race here, but
a situation where some entity grabbed a device before pivot-root.
So we could attempt open(O_EXCL) only in the "is maybe a valid path"
case, and only return "maybe" if that open succeeds. Otherwise we'd
return "no", as we already checked that the device isn't currently part
of a multipath map.

Agreed?

Martin
Benjamin Marzinski March 28, 2018, 5:12 p.m. UTC | #5
On Wed, Mar 28, 2018 at 04:51:12PM +0200, Martin Wilck wrote:
> On Tue, 2018-03-27 at 18:07 -0500, Benjamin Marzinski wrote:
> > On Tue, Mar 27, 2018 at 11:34:00PM +0200, Martin Wilck wrote:
> > > 
> > > The following happens: multipath -u temporarily claims the device.
> > > When
> > > multipathd starts, it fails to set up the map, sets the "failed"
> > > marker, and retriggers udev. The second time, multipath -u unclaims
> > > the
> > > device because it recognizes it as failed.
> > 
> > But if that device is already in use because multipath didn't claim
> > it
> > in the initramfs, and you suddenly mark it as ENV{SYSTEMD_READY}="0",
> > this can cause systemd to automatically unmount any filesystem on it.
> > This isn't just a problem with Red Hat's setup.  If it's not a
> > configured device type, there will only be a short timeout, but
> > that's
> > still enough to mess with devices that are already in use.  I'm
> > pretty
> > sure that the multipath temporary claiming is only safe the very
> > first
> > time a device appears. Otherwise, it's possible that something else
> > will
> > claim it first, and then multipath will claim it and mess with that
> > other user.
> > 
> 
> Arrgh. Thanks for pointing that out. It's even worse because even if we
> could figure out whether we're being called for the first or second
> time, it wouldn't be sufficient. multipath.rules may not even be
> present in the initrd, so the device may be present in the system, and
> used, without multipath.rules having ever been called.
> 
> The only way I see to avoid this is to try calling open(O_EXCL) on the
> path device in "multipath -u". So far we've avoided that because it's
> not completely race-free. But we're not talking about a race here, but
> a situation where some entity grabbed a device before pivot-root.
> So we could attempt open(O_EXCL) only in the "is maybe a valid path"
> case, and only return "maybe" if that open succeeds. Otherwise we'd
> return "no", as we already checked that the device isn't currently part
> of a multipath map.
> 

Something like that should work.  I think we need to be careful to only
ever do the exclusive open on an "add" event. In fact it probably
wouldn't hurt to add some checks to make sure that we never start
a temporary claim on a change event.  If we get a change event and
FIND_MULTIPATHS_WAIT_UNTIL is unset, we should just assume that we
missed the add event, and it should be set to "finished".
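
In rule terms, roughly (just a sketch, untested):

# never start or restart the wait from anything but an "add" event; if a
# change event arrives without a stored deadline, assume we missed the add
ACTION!="add", ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="?*", \
	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
	ENV{DM_MULTIPATH_DEVICE_PATH}="", GOTO="end_mpath"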

What would be really nice would be to be able to tell if we are doing a
cold-plug add after the pivot.  This is the only time where we actually
need to check if a device is in use. Otherwise nobody else will have had
a chance to use it yet.

-Ben

> Agreed?
> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Hannes Reinecke March 29, 2018, 6:04 a.m. UTC | #6
On Wed, 28 Mar 2018 16:51:12 +0200
Martin Wilck <mwilck@suse.com> wrote:

> On Tue, 2018-03-27 at 18:07 -0500, Benjamin Marzinski wrote:
> > On Tue, Mar 27, 2018 at 11:34:00PM +0200, Martin Wilck wrote:  
> > > 
> > > The following happens: multipath -u temporarily claims the device.
> > > When
> > > multipathd starts, it fails to set up the map, sets the "failed"
> > > marker, and retriggers udev. The second time, multipath -u
> > > unclaims the
> > > device because it recognizes it as failed.  
> > 
> > But if that device is already in use because multipath didn't claim
> > it
> > in the initramfs, and you suddenly mark it as
> > ENV{SYSTEMD_READY}="0", this can cause systemd to automatically
> > unmount any filesystem on it. This isn't just a problem with Red
> > Hat's setup.  If it's not a configured device type, there will only
> > be a short timeout, but that's
> > still enough to mess with devices that are already in use.  I'm
> > pretty
> > sure that the multipath temporary claiming is only safe the very
> > first
> > time a device appears. Otherwise, it's possible that something else
> > will
> > claim it first, and then multipath will claim it and mess with that
> > other user.
> >   
> 
> Arrgh. Thanks for pointing that out. It's even worse because even if
> we could figure out whether we're being called for the first or second
> time, it wouldn't be sufficient. multipath.rules may not even be
> present in the initrd, so the device may be present in the system, and
> used, without multipath.rules having ever been called.
> 
> The only way I see to avoid this is to try calling open(O_EXCL) on the
> path device in "multipath -u". So far we've avoided that because it's
> not completely race-free. But we're not talking about a race here, but
> a situation where some entity grabbed a device before pivot-root.
> So we could attempt open(O_EXCL) only in the "is maybe a valid path"
> case, and only return "maybe" if that open succeeds. Otherwise we'd
> return "no", as we already checked that the device isn't currently
> part of a multipath map.
> 
Ho-hum.

Watch out when you do this; systemd will generate another event
whenever you close this fd, and suddenly you find yourself in the
middle of an event storm ...

Cheers,

Hannes

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 29, 2018, 9:15 a.m. UTC | #7
On Thu, 2018-03-29 at 08:04 +0200, Hannes Reinecke wrote:
> 
> > The only way I see to avoid this is to try calling open(O_EXCL) on
> > the
> > path device in "multipath -u". So far we've avoided that because
> > it's
> > not completely race-free. But we're not talking about a race here,
> > but
> > a situation where some entity grabbed a device before pivot-root.
> > So we could attempt open(O_EXCL) only in the "is maybe a valid
> > path"
> > case, and only return "maybe" if that open succeeds. Otherwise we'd
> > return "no", as we already checked that the device isn't currently
> > part of a multipath map.
> > 
> 
> Ho-hum.
> 
> Watch out when you do this; systemd will generate another event
> whenever you close this fd, and suddenly you find yourself in a
> middle
> of an event storm ...

Thanks for mentioning that.

Udev listens for inotify events for "close after write". We will do the
test with O_RDONLY|O_EXCL, so no event should be generated. Moreover,
events should only occur if udev is watching the device, which
shouldn't be the case while an event is processed (but I'm not 100%
certain about that). Anyway, I'll test this.
 
Cheers,
Martin

Patch

diff --git a/multipath/multipath.rules b/multipath/multipath.rules
index aab64dc7182c..32d33991db3d 100644
--- a/multipath/multipath.rules
+++ b/multipath/multipath.rules
@@ -21,7 +21,83 @@  TEST!="$env{MPATH_SBIN_PATH}/multipath", ENV{MPATH_SBIN_PATH}="/usr/sbin"
 
 # multipath -u sets DM_MULTIPATH_DEVICE_PATH
 ENV{DM_MULTIPATH_DEVICE_PATH}!="1", IMPORT{program}="$env{MPATH_SBIN_PATH}/multipath -u %k"
-ENV{DM_MULTIPATH_DEVICE_PATH}=="1", ENV{ID_FS_TYPE}="mpath_member", \
-	ENV{SYSTEMD_READY}="0"
+
+# case 1: this is definitely multipath
+ENV{DM_MULTIPATH_DEVICE_PATH}=="1", \
+	ENV{ID_FS_TYPE}="mpath_member", ENV{SYSTEMD_READY}="0", \
+	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
+	GOTO="end_mpath"
+
+# case 2: this is definitely not multipath
+ENV{DM_MULTIPATH_DEVICE_PATH}!="2", \
+	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
+	GOTO="end_mpath"
+
+# All code below here is only run in "smart" mode.
+
+# FIND_MULTIPATHS_WAIT_UNTIL is the timeout (in seconds after the
+# epoch). If waiting ends for any reason, it is set to "finished".
+IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
+
+# At this point we know DM_MULTIPATH_DEVICE_PATH==2.
+# (multipath -u indicates this is "maybe" multipath)
+
+# case 3: waiting has already finished. Treat as non-multipath.
+ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="finished", \
+	ENV{DM_MULTIPATH_DEVICE_PATH}="", GOTO="end_mpath"
+
+# The timeout should have been set by the multipath -u call above, set a default
+# value if that didn't happen for whatever reason
+ENV{FIND_MULTIPATHS_PATH_TMO}!="?*", ENV{FIND_MULTIPATHS_PATH_TMO}="5"
+
+PROGRAM="/bin/cat $sys/class/rtc/rtc0/since_epoch", \
+	ENV{.TIME_NOW}="$result"
+ENV{.TIME_NOW}!="?*", ENV{DM_MULTIPATH_DEVICE_PATH}="", \
+	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="error", GOTO="end_mpath"
+
+ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="?*", GOTO="start_wait"
+
+# At this point, we know that FIND_MULTIPATHS_WAIT_UNTIL is a timeout value.
+# If it's expired, give up waiting and assume this is non-multipath.
+PROGRAM="/bin/sh -c '[ $env{.TIME_NOW} -ge $env{FIND_MULTIPATHS_WAIT_UNTIL} ]'", \
+	ENV{DM_MULTIPATH_DEVICE_PATH}="", \
+	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
+	GOTO="end_mpath"
+
+# Timer not expired yet. The current uevent has not been triggered by the
+# systemd timer but something else, e.g. 60-block.rules. Continue pretending.
+ACTION!="add", GOTO="pretend_mpath"
+
+# Special case: Another "add" event happened before the timeout expired.
+# This can happen during "coldplug" after switching root FS, if a
+# systemd timer started during initramfs processing had been cancelled.
+# We need to start the timer again.
+
+LABEL="start_wait"
+
+# We are seeing this path for the first time, and it's "maybe" multipath.
+# Pretend that it is actually multipath, and set a timer to create another
+# uevent at FIND_MULTIPATHS_WAIT_UNTIL seconds after the epoch.
+ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="?*", \
+	PROGRAM="/bin/sh -c 'echo $(( $env{.TIME_NOW} + $env{FIND_MULTIPATHS_PATH_TMO} ))'", \
+	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="$result"
+
+# The actual start command for the timer.
+#
+# The purpose of this command is only to make sure we will receive another
+# uevent eventually. *Any* uevent may cause waiting to finish if it either ends
+# in case 1-3 above, or if it arrives after FIND_MULTIPATHS_WAIT_UNTIL.
+#
+# Note that this will try to activate multipathd if it isn't running yet.
+# If that fails, the unit starts and expires nonetheless. If multipathd
+# startup needs to wait for other services, this wait time will add up with
+# the --on-active timeout.
+#
+# We must trigger an "add" event because LVM2 will only act on those.
+RUN+="/usr/bin/systemd-run --unit=cancel-multipath-wait-$kernel --description 'cancel waiting for multipath siblings of $kernel' --no-block --timer-property DefaultDependencies=no --timer-property Conflicts=shutdown.target --timer-property Before=shutdown.target --timer-property AccuracySec=500ms --property DefaultDependencies=no --property Conflicts=shutdown.target --property Before=shutdown.target --property Wants=multipathd.service --property After=multipathd.service --on-active=$env{FIND_MULTIPATHS_PATH_TMO} /usr/bin/udevadm trigger --action=add $sys$devpath"
+
+LABEL="pretend_mpath"
+ENV{DM_MULTIPATH_DEVICE_PATH}="1"
+ENV{SYSTEMD_READY}="0"
 
 LABEL="end_mpath"