Message ID | 20250324205504.2523493-2-bmarzins@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fix issue of multipathd not tracking device | expand |
On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote: > If a multipath device was created by the multipath command, > multipathd > might not agree with how the device was created. ev_add_map() can > reload > the device with a different table by calling add_map_without_path() - > > > update_map(). If this reloading of the map failed, multipathd was > simply > ignoring the multipath device, even though it still existed. > > One way that reloading can fail is if a path that multipathd already > has > initialized goes offline. If a multipath device is created by the > multipath command while the path is offline, it will not use the > offline > path, since multipath won't be able to get the necessary pathinfo. > However, multipathd will already have the pathinfo for the path, and > may > not even know that it's offline, since the path is an orphan. When it > tries to reload the device, it will include the offline path, and the > reload will fail. Am I understanding correctly that during the reload, bdev_open() failed in the kernel because the path was offline? I was thinking about my recent tests [1] where I'd come to the conclusion that reload failure is hardly possible. While I'd realized that "trying to add a path device that was not previously mapped" might fail, I didn't think of the scenario you describe here. [1] https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com/ > > Instead of ignoring the device if it can't reload it, multipathd > should > just montior it as it is. When the path device is no longer offline, > it > can be added back to the multipath device by calling > "multipathd reconfigure" or "multipathd add path <path>". > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
On Tue, Mar 25, 2025 at 11:33:12PM +0100, Martin Wilck wrote: > On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote: > > If a multipath device was created by the multipath command, > > multipathd > > might not agree with how the device was created. ev_add_map() can > > reload > > the device with a different table by calling add_map_without_path() - > > > > > update_map(). If this reloading of the map failed, multipathd was > > simply > > ignoring the multipath device, even though it still existed. > > > > One way that reloading can fail is if a path that multipathd already > > has > > initialized goes offline. If a multipath device is created by the > > multipath command while the path is offline, it will not use the > > offline > > path, since multipath won't be able to get the necessary pathinfo. > > However, multipathd will already have the pathinfo for the path, and > > may > > not even know that it's offline, since the path is an orphan. When it > > tries to reload the device, it will include the offline path, and the > > reload will fail. > > Am I understanding correctly that during the reload, bdev_open() failed > in the kernel because the path was offline? Yep. It's failing in sd_open() -> scsi_block_when_processing_errors() see the comment here: https://github.com/torvalds/linux/blob/5c2a430e85994f4873ea5ec42091baa1153bc731/drivers/scsi/sd.c#L1523 > I was thinking about my recent tests [1] where I'd come to the > conclusion that reload failure is hardly possible. While I'd realized > that "trying to add a path device that was not previously mapped" might > fail, I didn't think of the scenario you describe here. > > [1] https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com/ > > > > > Instead of ignoring the device if it can't reload it, multipathd > > should > > just montior it as it is. When the path device is no longer offline, > > it > > can be added back to the multipath device by calling > > "multipathd reconfigure" or "multipathd add path <path>". > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > > Reviewed-by: Martin Wilck <mwilck@suse.com>
diff --git a/multipathd/main.c b/multipathd/main.c index e63b6aa7..7aaae773 100644 --- a/multipathd/main.c +++ b/multipathd/main.c @@ -679,7 +679,7 @@ retry: } fail: - if (new_map && (retries < 0 || wait_for_events(mpp, vecs))) { + if (new_map && wait_for_events(mpp, vecs)) { condlog(0, "%s: failed to create new map", mpp->alias); remove_map(mpp, vecs->pathvec, vecs->mpvec); return 1;
If a multipath device was created by the multipath command, multipathd might not agree with how the device was created. ev_add_map() can reload the device with a different table by calling add_map_without_path() -> update_map(). If this reloading of the map failed, multipathd was simply ignoring the multipath device, even though it still existed. One way that reloading can fail is if a path that multipathd already has initialized goes offline. If a multipath device is created by the multipath command while the path is offline, it will not use the offline path, since multipath won't be able to get the necessary pathinfo. However, multipathd will already have the pathinfo for the path, and may not even know that it's offline, since the path is an orphan. When it tries to reload the device, it will include the offline path, and the reload will fail. Instead of ignoring the device if it can't reload it, multipathd should just montior it as it is. When the path device is no longer offline, it can be added back to the multipath device by calling "multipathd reconfigure" or "multipathd add path <path>". Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> --- multipathd/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)