Message ID | 1494349025-6561-5-git-send-email-bmarzins@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | christophe varoqui |
Headers | show |
On Tue, 2017-05-09 at 11:57 -0500, Benjamin Marzinski wrote: > When multipath reloads a device, it can either fail while loading the > new table or while resuming the device. If it fails while resuming > the > device, the device can get stuck in the suspended state. To fix > this, > multipath needs to resume the device again so that it can continue > using > the old table. > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > --- > libmultipath/devmapper.c | 19 ++++++++++++++++++- > libmultipath/devmapper.h | 1 + > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c > index 2c4a13a..69b634b 100644 > --- a/libmultipath/devmapper.c > +++ b/libmultipath/devmapper.c > @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, char > *params, int flush) > if (r) > r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, > !flush, > 1, udev_flags, 0); > - return r; > + if (r) > + return r; > + > + if (dm_is_suspended(mpp->alias)) > + dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush, > 1, > + udev_flags, 0); > + return 0; > } Why would the second DM_DEVICE_RESUME call succeed if the first one failed? Martin
On Thu, May 11, 2017 at 10:26:52PM +0200, Martin Wilck wrote: > On Tue, 2017-05-09 at 11:57 -0500, Benjamin Marzinski wrote: > > When multipath reloads a device, it can either fail while loading the > > new table or while resuming the device. If it fails while resuming > > the > > device, the device can get stuck in the suspended state. To fix > > this, > > multipath needs to resume the device again so that it can continue > > using > > the old table. > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > > --- > > libmultipath/devmapper.c | 19 ++++++++++++++++++- > > libmultipath/devmapper.h | 1 + > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c > > index 2c4a13a..69b634b 100644 > > --- a/libmultipath/devmapper.c > > +++ b/libmultipath/devmapper.c > > @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, char > > *params, int flush) > > if (r) > > r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, > > !flush, > > 1, udev_flags, 0); > > - return r; > > + if (r) > > + return r; > > + > > + if (dm_is_suspended(mpp->alias)) > > + dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush, > > 1, > > + udev_flags, 0); > > + return 0; > > } > > Why would the second DM_DEVICE_RESUME call succeed if the first one > failed? Because if the first resume fails, device-mapper rolls back to the original table. The specific way that someone found this was by running a multipathd resize when only some of the paths had been rescaned and noticed the new, larger, size. Since the first path had changed size (that's all the multipath looks at to find the new size) the multipath device tried to reload larger. However, when it tried to resume with the larger size, it failed since some of the devices were too small. The second resume lets it try again with the original table. Of course, that isn't the only way that this could fail, and it's nice to be able to gracefully continue using the old table instead of just being suspended. -Ben > > Martin > > -- > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, 2017-05-16 at 12:33 -0500, Benjamin Marzinski wrote: > On Thu, May 11, 2017 at 10:26:52PM +0200, Martin Wilck wrote: > > On Tue, 2017-05-09 at 11:57 -0500, Benjamin Marzinski wrote: > > > When multipath reloads a device, it can either fail while loading > > > the > > > new table or while resuming the device. If it fails while > > > resuming > > > the > > > device, the device can get stuck in the suspended state. To fix > > > this, > > > multipath needs to resume the device again so that it can > > > continue > > > using > > > the old table. > > > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > > > --- > > > libmultipath/devmapper.c | 19 ++++++++++++++++++- > > > libmultipath/devmapper.h | 1 + > > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > > > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c > > > index 2c4a13a..69b634b 100644 > > > --- a/libmultipath/devmapper.c > > > +++ b/libmultipath/devmapper.c > > > @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, > > > char > > > *params, int flush) > > > if (r) > > > r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, > > > !flush, > > > 1, udev_flags, 0); > > > - return r; > > > + if (r) > > > + return r; > > > + > > > + if (dm_is_suspended(mpp->alias)) > > > + dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, > > > !flush, > > > 1, > > > + udev_flags, 0); > > > + return 0; > > > } > > > > Why would the second DM_DEVICE_RESUME call succeed if the first one > > failed? > > Because if the first resume fails, device-mapper rolls back to the > original table. Ah. I didn't know that, and I'm likely to forget it again. Would you mind adding a short comment at that point in the code? Otherwise, ACK. Martin
On Tue, May 16, 2017 at 08:56:40PM +0200, Martin Wilck wrote: > On Tue, 2017-05-16 at 12:33 -0500, Benjamin Marzinski wrote: > > On Thu, May 11, 2017 at 10:26:52PM +0200, Martin Wilck wrote: > > > On Tue, 2017-05-09 at 11:57 -0500, Benjamin Marzinski wrote: > > > > When multipath reloads a device, it can either fail while loading > > > > the > > > > new table or while resuming the device. If it fails while > > > > resuming > > > > the > > > > device, the device can get stuck in the suspended state. To fix > > > > this, > > > > multipath needs to resume the device again so that it can > > > > continue > > > > using > > > > the old table. > > > > > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > > > > --- > > > > libmultipath/devmapper.c | 19 ++++++++++++++++++- > > > > libmultipath/devmapper.h | 1 + > > > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c > > > > index 2c4a13a..69b634b 100644 > > > > --- a/libmultipath/devmapper.c > > > > +++ b/libmultipath/devmapper.c > > > > @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, > > > > char > > > > *params, int flush) > > > > if (r) > > > > r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, > > > > !flush, > > > > 1, udev_flags, 0); > > > > - return r; > > > > + if (r) > > > > + return r; > > > > + > > > > + if (dm_is_suspended(mpp->alias)) > > > > + dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, > > > > !flush, > > > > 1, > > > > + udev_flags, 0); > > > > + return 0; > > > > } > > > > > > Why would the second DM_DEVICE_RESUME call succeed if the first one > > > failed? > > > > Because if the first resume fails, device-mapper rolls back to the > > original table. > > Ah. I didn't know that, and I'm likely to forget it again. Would you > mind adding a short comment at that point in the code? Sure. -Ben > > Otherwise, ACK. > > Martin > > -- > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c index 2c4a13a..69b634b 100644 --- a/libmultipath/devmapper.c +++ b/libmultipath/devmapper.c @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, char *params, int flush) if (r) r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush, 1, udev_flags, 0); - return r; + if (r) + return r; + + if (dm_is_suspended(mpp->alias)) + dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush, 1, + udev_flags, 0); + return 0; } static int @@ -1030,6 +1036,17 @@ dm_geteventnr (char *name) return info.event_nr; } +int +dm_is_suspended(const char *name) +{ + struct dm_info info; + + if (do_get_info(name, &info) != 0) + return -1; + + return info.suspended; +} + char * dm_mapname(int major, int minor) { diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h index 5b66865..fa739bc 100644 --- a/libmultipath/devmapper.h +++ b/libmultipath/devmapper.h @@ -52,6 +52,7 @@ int dm_enablegroup(char * mapname, int index); int dm_disablegroup(char * mapname, int index); int dm_get_maps (vector mp); int dm_geteventnr (char *name); +int dm_is_suspended(const char *name); int dm_get_major_minor (const char *name, int *major, int *minor); char * dm_mapname(int major, int minor); int dm_remove_partmaps (const char * mapname, int need_sync,
When multipath reloads a device, it can either fail while loading the new table or while resuming the device. If it fails while resuming the device, the device can get stuck in the suspended state. To fix this, multipath needs to resume the device again so that it can continue using the old table. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> --- libmultipath/devmapper.c | 19 ++++++++++++++++++- libmultipath/devmapper.h | 1 + 2 files changed, 19 insertions(+), 1 deletion(-)