Message ID | 20170902223900.7339-21-mwilck@suse.com (mailing list archive)
---|---
State | Not Applicable, archived
Delegated to: | christophe varoqui
On Sun, Sep 03, 2017 at 12:38:49AM +0200, Martin Wilck wrote:
> The fact alone that a map changes from not ready to ready does
> not imply that it is activating.

NAK on this one and
[PATCH 21/31] 11-dm-mpath.rules: Remember DM_ACTIVATION

This breaks things. It's often the case that there are devices stacked
on top of multipath. When multipath loses all of its paths, if it
doesn't have queue_if_no_path set, it can fail IO up to these higher
devices, forcing them to react to the failure. There is no way that I
know of to check in udev whether this has occurred. The only safe
course of action is to assume the worst and notify lvm that things are
better now.

A simple reproducer of the problem is something like:

- create 2 multipath devices (call them mpatha and mpathb)
- set up raid1 on top of them

# pvcreate /dev/mapper/mpatha
# pvcreate /dev/mapper/mpathb
# vgcreate test /dev/mapper/mpathb /dev/mapper/mpatha
# lvcreate --type raid1 -m 1 -L 1G -n raid test

- verify that everything is o.k.

# dd if=/dev/zero of=/dev/test/raid bs=1M count=100
# multipath -ll
# pvs
# vgs
# lvs

- remove all of the path devices from mpatha
- verify that the raid device is in partial mode

# dd if=/dev/zero of=/dev/test/raid bs=1M count=100
# multipath -ll
# pvs
# vgs
# lvs

- add the path devices back to the system
- check if the raid device has recovered

# multipath -ll
# pvs
# vgs
# lvs

With these patches, the raid device remains in partial mode, and the
mpatha pv is still listed as unknown. If I revert these two patches,
everything works fine again.
-Ben

> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipath/11-dm-mpath.rules | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/multipath/11-dm-mpath.rules b/multipath/11-dm-mpath.rules
> index 0be22ae4..3f47744f 100644
> --- a/multipath/11-dm-mpath.rules
> +++ b/multipath/11-dm-mpath.rules
> @@ -64,8 +64,7 @@ ENV{MPATH_DEVICE_READY}=="0", ENV{.MPATH_DEVICE_READY_OLD}=="1",\
>  ENV{MPATH_DEVICE_READY}=="0", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="1"
>  ENV{MPATH_DEVICE_READY}!="0", ENV{.MPATH_DEVICE_READY_OLD}=="0",\
>  	ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="$env{DM_DISABLE_OTHER_RULES_FLAG_OLD}",\
> -	ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}="",\
> -	ENV{DM_ACTIVATION}="1"
> +	ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}=""
>
>  LABEL="scan_import"
>  ENV{DM_NOSCAN}!="1", GOTO="mpath_end"
> --
> 2.14.0

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
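The reproducer above can be condensed into a small script. This is only a sketch: the map names mpatha/mpathb and the VG/LV names test/raid come straight from the thread, while the DRY_RUN mechanism is an addition so the script can be inspected without touching real devices. Removing and re-adding the path devices is transport-specific, so those steps stay manual.

```shell
#!/bin/sh
# Sketch of Ben's reproducer. Map names (mpatha/mpathb) and VG/LV names
# (test/raid) are taken from the thread. DRY_RUN is on by default so the
# script only prints the commands; clear it to actually run them.
: "${DRY_RUN:=1}"

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "# $*"
    else
        "$@"
    fi
}

check_state() {
    # Write to the LV and dump multipath/LVM state, as in the thread.
    run dd if=/dev/zero of=/dev/test/raid bs=1M count=100
    run multipath -ll
    run pvs
    run vgs
    run lvs
}

# Set up raid1 on top of the two multipath maps.
run pvcreate /dev/mapper/mpatha
run pvcreate /dev/mapper/mpathb
run vgcreate test /dev/mapper/mpathb /dev/mapper/mpatha
run lvcreate --type raid1 -m 1 -L 1G -n raid test

check_state    # 1. everything should be o.k. here
# (manually remove all path devices from mpatha, then:)
check_state    # 2. the raid device should now be in partial mode
# (manually re-add the path devices, then:)
check_state    # 3. with the NAKed patches, the raid stays in partial mode
```

Running it with DRY_RUN cleared on a test box with the two maps present reproduces the sequence exactly as described.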
On Wed, 2017-09-13 at 16:19 -0500, Benjamin Marzinski wrote:
> On Sun, Sep 03, 2017 at 12:38:49AM +0200, Martin Wilck wrote:
> > The fact alone that a map changes from not ready to ready does
> > not imply that it is activating.
>
> NAK on this one and
> [PATCH 21/31] 11-dm-mpath.rules: Remember DM_ACTIVATION
>
> This breaks things. It's often the case that there are devices stacked
> on top of multipath. When multipath loses all of its paths, if it
> doesn't have queue_if_no_path set, it can fail IO up to these higher
> devices, forcing them to react to the failure. There is no way that I
> know of to check in udev whether this has occurred. The only safe
> course of action is to assume the worst and notify lvm that things
> are better now.

You are right, thanks. I didn't consider the case without queueing.
I see no problem with just skipping these two.

Best,
Martin

> A simple reproducer of the problem is something like:
>
> - create 2 multipath devices (call them mpatha and mpathb)
> - set up raid1 on top of them
>
> # pvcreate /dev/mapper/mpatha
> # pvcreate /dev/mapper/mpathb
> # vgcreate test /dev/mapper/mpathb /dev/mapper/mpatha
> # lvcreate --type raid1 -m 1 -L 1G -n raid test
>
> - verify that everything is o.k.
>
> # dd if=/dev/zero of=/dev/test/raid bs=1M count=100
> # multipath -ll
> # pvs
> # vgs
> # lvs
>
> - remove all of the path devices from mpatha
> - verify that the raid device is in partial mode
>
> # dd if=/dev/zero of=/dev/test/raid bs=1M count=100
> # multipath -ll
> # pvs
> # vgs
> # lvs
>
> - add the path devices back to the system
> - check if the raid device has recovered
>
> # multipath -ll
> # pvs
> # vgs
> # lvs
>
> With these patches, the raid device remains in partial mode, and the
> mpatha pv is still listed as unknown. If I revert these two patches,
> everything works fine again.
>
> -Ben
>
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  multipath/11-dm-mpath.rules | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/multipath/11-dm-mpath.rules b/multipath/11-dm-mpath.rules
> > index 0be22ae4..3f47744f 100644
> > --- a/multipath/11-dm-mpath.rules
> > +++ b/multipath/11-dm-mpath.rules
> > @@ -64,8 +64,7 @@ ENV{MPATH_DEVICE_READY}=="0", ENV{.MPATH_DEVICE_READY_OLD}=="1",\
> >  ENV{MPATH_DEVICE_READY}=="0", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="1"
> >  ENV{MPATH_DEVICE_READY}!="0", ENV{.MPATH_DEVICE_READY_OLD}=="0",\
> >  	ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="$env{DM_DISABLE_OTHER_RULES_FLAG_OLD}",\
> > -	ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}="",\
> > -	ENV{DM_ACTIVATION}="1"
> > +	ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}=""
> >
> >  LABEL="scan_import"
> >  ENV{DM_NOSCAN}!="1", GOTO="mpath_end"
> > --
> > 2.14.0
On Wed, 2017-09-13 at 16:19 -0500, Benjamin Marzinski wrote:
> On Sun, Sep 03, 2017 at 12:38:49AM +0200, Martin Wilck wrote:
> > The fact alone that a map changes from not ready to ready does
> > not imply that it is activating.
>
> NAK on this one and
> [PATCH 21/31] 11-dm-mpath.rules: Remember DM_ACTIVATION
>
> This breaks things. It's often the case that there are devices stacked
> on top of multipath. When multipath loses all of its paths, if it
> doesn't have queue_if_no_path set, it can fail IO up to these higher
> devices, forcing them to react to the failure.

Thinking about this again: most of what we do in the multipath-related
udev rules is for the queueing case. Without "queue_if_no_path", all
that checking whether or not the map will sustain IO is useless; it
would be better to simply pass the device on to the upper layers, which
will try to probe it and fail, which is the intended behavior in the
non-queueing setup.

Perhaps we should check "queue_if_no_path" early on, and skip most of
this queueing-only logic when it is not set? If we did that, maybe
even my patches 20/21 would be worth reconsidering for the
"queue_if_no_path" case only?

Cheers,
Martin
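One cheap way to do such an early check would be to look for the queue_if_no_path feature in the map's "dmsetup table" output, which a rule could invoke via a small helper. This is only a sketch of the idea: the function name and the example table lines are made up for illustration, and none of this is in the actual rules.

```shell
#!/bin/sh
# Sketch: decide whether a multipath map queues IO by looking for the
# "queue_if_no_path" feature in its "dmsetup table" output. The function
# name and the string-argument form are illustrative only.

map_queues_if_no_path() {
    # $1: one "dmsetup table <map>" line for a multipath target
    case "$1" in
        *" queue_if_no_path"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example multipath table lines (argument counts abbreviated/invented):
queueing="0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:16 1 1"
nonqueueing="0 2097152 multipath 0 0 1 1 service-time 0 1 2 8:16 1 1"
```

A rule could then branch on the helper's exit status and GOTO past the readiness checks early, so the non-queueing case skips them entirely.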
diff --git a/multipath/11-dm-mpath.rules b/multipath/11-dm-mpath.rules
index 0be22ae4..3f47744f 100644
--- a/multipath/11-dm-mpath.rules
+++ b/multipath/11-dm-mpath.rules
@@ -64,8 +64,7 @@ ENV{MPATH_DEVICE_READY}=="0", ENV{.MPATH_DEVICE_READY_OLD}=="1",\
 ENV{MPATH_DEVICE_READY}=="0", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="1"
 ENV{MPATH_DEVICE_READY}!="0", ENV{.MPATH_DEVICE_READY_OLD}=="0",\
 	ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="$env{DM_DISABLE_OTHER_RULES_FLAG_OLD}",\
-	ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}="",\
-	ENV{DM_ACTIVATION}="1"
+	ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}=""

 LABEL="scan_import"
 ENV{DM_NOSCAN}!="1", GOTO="mpath_end"
The fact alone that a map changes from not ready to ready does
not imply that it is activating.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipath/11-dm-mpath.rules | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)