diff mbox series

[1/3] multipathd: monitor new multipath dev even if we can't update it

Message ID 20250324205504.2523493-2-bmarzins@redhat.com (mailing list archive)
State New
Headers show
Series fix issue of multipathd not tracking device | expand

Commit Message

Benjamin Marzinski March 24, 2025, 8:55 p.m. UTC
If a multipath device was created by the multipath command, multipathd
might not agree with how the device was created. ev_add_map() can reload
the device with a different table by calling add_map_without_path() ->
update_map(). If this reloading of the map failed, multipathd was simply
ignoring the multipath device, even though it still existed.

One way that reloading can fail is if a path that multipathd already has
initialized goes offline. If a multipath device is created by the
multipath command while the path is offline, it will not use the offline
path, since multipath won't be able to get the necessary pathinfo.
However, multipathd will already have the pathinfo for the path, and may
not even know that it's offline, since the path is an orphan. When it
tries to reload the device, it will include the offline path, and the
reload will fail.

Instead of ignoring the device if it can't reload it, multipathd should
just montior it as it is. When the path device is no longer offline, it
can be added back to the multipath device by calling
"multipathd reconfigure" or "multipathd add path <path>".

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Martin Wilck March 25, 2025, 10:33 p.m. UTC | #1
On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> If a multipath device was created by the multipath command,
> multipathd
> might not agree with how the device was created. ev_add_map() can
> reload
> the device with a different table by calling add_map_without_path() -
> >
> update_map(). If this reloading of the map failed, multipathd was
> simply
> ignoring the multipath device, even though it still existed.
> 
> One way that reloading can fail is if a path that multipathd already
> has
> initialized goes offline. If a multipath device is created by the
> multipath command while the path is offline, it will not use the
> offline
> path, since multipath won't be able to get the necessary pathinfo.
> However, multipathd will already have the pathinfo for the path, and
> may
> not even know that it's offline, since the path is an orphan. When it
> tries to reload the device, it will include the offline path, and the
> reload will fail.

Am I understanding correctly that during the reload, bdev_open() failed
in the kernel because the path was offline?

I was thinking about my recent tests [1] where I'd come to the
conclusion that reload failure is hardly possible. While I'd realized
that "trying to add a path device that was not previously mapped" might
fail, I didn't think of the scenario you describe here.

[1] https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com/

> 
> Instead of ignoring the device if it can't reload it, multipathd
> should
> just montior it as it is. When the path device is no longer offline,
> it
> can be added back to the multipath device by calling
> "multipathd reconfigure" or "multipathd add path <path>".


> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>
Benjamin Marzinski March 27, 2025, 11:51 p.m. UTC | #2
On Tue, Mar 25, 2025 at 11:33:12PM +0100, Martin Wilck wrote:
> On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> > If a multipath device was created by the multipath command,
> > multipathd
> > might not agree with how the device was created. ev_add_map() can
> > reload
> > the device with a different table by calling add_map_without_path() -
> > >
> > update_map(). If this reloading of the map failed, multipathd was
> > simply
> > ignoring the multipath device, even though it still existed.
> > 
> > One way that reloading can fail is if a path that multipathd already
> > has
> > initialized goes offline. If a multipath device is created by the
> > multipath command while the path is offline, it will not use the
> > offline
> > path, since multipath won't be able to get the necessary pathinfo.
> > However, multipathd will already have the pathinfo for the path, and
> > may
> > not even know that it's offline, since the path is an orphan. When it
> > tries to reload the device, it will include the offline path, and the
> > reload will fail.
> 
> Am I understanding correctly that during the reload, bdev_open() failed
> in the kernel because the path was offline?

Yep. It's failing in sd_open() -> scsi_block_when_processing_errors()
see the comment here:
https://github.com/torvalds/linux/blob/5c2a430e85994f4873ea5ec42091baa1153bc731/drivers/scsi/sd.c#L1523
 
> I was thinking about my recent tests [1] where I'd come to the
> conclusion that reload failure is hardly possible. While I'd realized
> that "trying to add a path device that was not previously mapped" might
> fail, I didn't think of the scenario you describe here.
> 
> [1] https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com/
> 
> > 
> > Instead of ignoring the device if it can't reload it, multipathd
> > should
> > just montior it as it is. When the path device is no longer offline,
> > it
> > can be added back to the multipath device by calling
> > "multipathd reconfigure" or "multipathd add path <path>".
> 
> 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> Reviewed-by: Martin Wilck <mwilck@suse.com>
diff mbox series

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index e63b6aa7..7aaae773 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -679,7 +679,7 @@  retry:
 	}
 
 fail:
-	if (new_map && (retries < 0 || wait_for_events(mpp, vecs))) {
+	if (new_map && wait_for_events(mpp, vecs)) {
 		condlog(0, "%s: failed to create new map", mpp->alias);
 		remove_map(mpp, vecs->pathvec, vecs->mpvec);
 		return 1;