diff mbox series

[v3,15/15] multipathd: enable pathgroups in checker_finished()

Message ID 20250117202738.126196-16-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: Benjamin Marzinski
Headers show
Series multipathd: More map reload handling, and checkerloop work | expand

Commit Message

Martin Wilck Jan. 17, 2025, 8:27 p.m. UTC
multipathd calls enable_group() from update_path_state() if a path in a
previously disabled pathgroup is reinstated. This call may be mistakenly
skipped if the path group status wasn't up-to-date while update_path_state()
was executed. This can happen after applying the previous patch "multipathd:
sync maps at end of checkerloop", if the kernel has disabled the group during
the last checker interval.

Therefore add another check in checker_finished() after calling sync_mpp(),
and enable groups if necessary. This step can be skipped if the map was
reloaded, because after a reload, all pathgroups are enabled by default.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

Comments

Benjamin Marzinski Jan. 20, 2025, 11:51 p.m. UTC | #1
On Fri, Jan 17, 2025 at 09:27:38PM +0100, Martin Wilck wrote:
> multipathd calls enable_group() from update_path_state() if a path in a
> previously disabled pathgroup is reinstated. This call may be mistakenly
> skipped if the path group status wasn't up-to-date while update_path_state()
> was executed. This can happen after applying the previous patch "multipathd:
> sync maps at end of checkerloop", if the kernel has disabled the group during
> the last checker interval.
> 
> Therefore add another check in checker_finished() after calling sync_mpp(),
> and enable groups if necessary. This step can be skipped if the map was
> reloaded, because after a reload, all pathgroups are enabled by default.

The logic running in checker_finished() isn't the same as the logic
running in update_path_state(). It update_path_state(), we only enable a
pathgroup when a path switches to PATH_UP. In checker_finished(), we
enable it whenever a path is in PATH_UP. I worry that this might not
always be correct. Perhaps instead of just checking if the path is in
PATH_UP, we should also make sure that pp->is_checked isn't in
CHECK_PATH_SKIPPED, which would mean that the checker is pending or
messed up, and the path is using its old state. This still assumes that
the path switched to some other state before the pathgroup got delayed
in re-initializing again, but that seems pretty safe. I just don't want
to go re-enabling pathgroups where the controller is actually busy,
since I'm pretty sure that the kernel usually does the right thing
without multipathd's help.

Alternatively, we could make the check_finished() logic match the
update_path_state() logic exactly by just setting a flag in the path's
pathgroup during enable_group() (and possibly not call dm_enablegroup()
at all, but I suppose that there could be a benefit to re-enabling the
group as soon as possible. I'm still kinda fuzzy on whether the kernel's
own pathgroup re-enabling code makes all this redundant). Then, in
enable_pathrgoups(), instead of checking each path, we just need to
check if pgp->need_reenable is set for the pathgroup().

The other benefit to using a flag like pgp->need_reenable is that we
could clear it on all pathgroups if we called switch_pathgroups(), since
that will cause the kernel to enable all the pathgroups anyways, which
makes our pgp->status invalid. Although we probably should also update
pgp->status to be PGSTATE_ENABLED (or at least PGSTATE_UNDEF) if we
were messing with the pathgroups in switch_pathgroups(). And if we
update pgp->status, that would avoid unnecessary re-enables with my
first idea as well (since no pathgroups would be disabled anymore). 

But perhaps the best answer is to just to say that this is a corner case
where we skip a multipathd action that might be totally unnecessary. And
even if it is a problem, it will be fixed if multipathd ever decides
that the kernel is using the wrong pathgroup and should switch, or
whenever the table gets reloaded. Maybe the whole patch is unnecessary.

Thoughts? I'm clearly thinking too much.
-Ben

> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/main.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 310d7ef..98abadc 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2917,6 +2917,34 @@ update_paths(struct vectors *vecs, int *num_paths_p, time_t start_secs)
>  	return CHECKER_FINISHED;
>  }
>  
> +static void enable_pathgroups(struct multipath *mpp)
> +{
> +	struct pathgroup *pgp;
> +	int i;
> +
> +	vector_foreach_slot(mpp->pg, pgp, i) {
> +		struct path *pp;
> +		int j;
> +
> +		if (pgp->status != PGSTATE_DISABLED)
> +			continue;
> +
> +		vector_foreach_slot(pgp->paths, pp, j) {
> +			if (pp->state != PATH_UP)
> +				continue;
> +
> +			if (dm_enablegroup(mpp->alias, i + 1) == 0) {
> +				condlog(2, "%s: enabled pathgroup #%i",
> +					mpp->alias, i + 1);
> +				pgp->status = PGSTATE_ENABLED;
> +			} else
> +				condlog(2, "%s: failed to enable pathgroup #%i",
> +					mpp->alias, i + 1);
> +			break;
> +		}
> +	}
> +}
> +
>  static void checker_finished(struct vectors *vecs, unsigned int ticks)
>  {
>  	struct multipath *mpp;
> @@ -2943,12 +2971,16 @@ static void checker_finished(struct vectors *vecs, unsigned int ticks)
>  				i--;
>  				continue;
>  			}
> -		} else if (prio_reload || failback_reload || ghost_reload || inconsistent)
> +		} else if (prio_reload || failback_reload || ghost_reload || inconsistent) {
>  			if (reload_and_sync_map(mpp, vecs) == 2) {
>  				/* multipath device deleted */
>  				i--;
>  				continue;
>  			}
> +		} else
> +			/* not necessary after map reloads */
> +			enable_pathgroups(mpp);
> +
>  		/* need_reload was cleared in dm_addmap and then set again */
>  		if (inconsistent && mpp->need_reload)
>  			condlog(1, "BUG: %s; map remained in inconsistent state after reload",
> -- 
> 2.47.1
Martin Wilck Jan. 21, 2025, 11:12 a.m. UTC | #2
On Mon, 2025-01-20 at 18:51 -0500, Benjamin Marzinski wrote:
> On Fri, Jan 17, 2025 at 09:27:38PM +0100, Martin Wilck wrote:
> > multipathd calls enable_group() from update_path_state() if a path
> > in a
> > previously disabled pathgroup is reinstated. This call may be
> > mistakenly
> > skipped if the path group status wasn't up-to-date while
> > update_path_state()
> > was executed. This can happen after applying the previous patch
> > "multipathd:
> > sync maps at end of checkerloop", if the kernel has disabled the
> > group during
> > the last checker interval.
> > 
> > Therefore add another check in checker_finished() after calling
> > sync_mpp(),
> > and enable groups if necessary. This step can be skipped if the map
> > was
> > reloaded, because after a reload, all pathgroups are enabled by
> > default.
> 
> The logic running in checker_finished() isn't the same as the logic
> running in update_path_state(). It update_path_state(), we only
> enable a
> pathgroup when a path switches to PATH_UP. In checker_finished(), we
> enable it whenever a path is in PATH_UP. I worry that this might not
> always be correct. Perhaps instead of just checking if the path is in
> PATH_UP, we should also make sure that pp->is_checked isn't in
> CHECK_PATH_SKIPPED, which would mean that the checker is pending or
> messed up, and the path is using its old state. This still assumes
> that
> the path switched to some other state before the pathgroup got
> delayed
> in re-initializing again, but that seems pretty safe. I just don't
> want
> to go re-enabling pathgroups where the controller is actually busy,
> since I'm pretty sure that the kernel usually does the right thing
> without multipathd's help.

OK. I'll see if I can add a test for the checker state.

> Alternatively, we could make the check_finished() logic match the
> update_path_state() logic exactly by just setting a flag in the
> path's
> pathgroup during enable_group() (and possibly not call
> dm_enablegroup()
> at all, 

That was my first thought, but it doesn't work, because we call  
free_pgvec(mpp->pg, KEEP_PATHS) update_multipath_strings(), and
reallocate the vector in disassemble_map. I guess we could change that,
but it would be another intrusive patch set. Currently we just can't
store any persistent properties in struct pathgroup.

> but I suppose that there could be a benefit to re-enabling the
> group as soon as possible. I'm still kinda fuzzy on whether the
> kernel's
> own pathgroup re-enabling code makes all this redundant).

The kernel will even try "bypassed" PGs if it finds no other. But that
means that e.g. the marginal pathgroup would be tried before disabled
PGs [1]. In general, I don't think we can rely on the kernel in this
respect.

>  Then, in
> enable_pathrgoups(), instead of checking each path, we just need to
> check if pgp->need_reenable is set for the pathgroup().

As I said, as long as we discard the PGs, I guess we'll have to test
the path flags. It's difficult to come up with a decent test case for
this, in particular if checking for PATH_UP is not enough...


> [...]
> 
> But perhaps the best answer is to just to say that this is a corner
> case
> where we skip a multipathd action that might be totally unnecessary.
> And
> even if it is a problem, it will be fixed if multipathd ever decides
> that the kernel is using the wrong pathgroup and should switch, or
> whenever the table gets reloaded. Maybe the whole patch is
> unnecessary.

I'm going to submit a v4 that adds a test for the checker flags.

Btw, I could only find one place in the kernel code where the kernel
actively sets the "bypassed" flag, when the SCSI device handler returns
SCSL_DH_DEV_TEMP_BUSY [2]. This seems rather questionable to me. First
of all, this seems to assume that PGs are mapped to "controllers", i.e.
something like "group_by_tpg", which isn't necessarily the case.
Second, while the EMC device handler uses this error code in this way,
in the ALUA handler it rather means a memory allocation failure, and
the other device handlers don't support it at all.

I think we can conclude that the kernel setting "bypassed" on a path
group by itself is indeed a corner case, except for EMC Clariion, which
is quite dated by now.

> Thoughts? I'm clearly thinking too much.
> 

No, you are not. I strongly appreciate these comments.

Regards,
Martin

[1] Unrelated wild thought: in view of the logic the kernel applies for
"bypassed" PGs, it might actually make sense to use this flag for 
marginal pathgroups. They would be tried if all else fails, which is
more or less the behavior we want for them. What do you think?

[2] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/md/dm-mpath.c#L1564
Benjamin Marzinski Jan. 21, 2025, 9:16 p.m. UTC | #3
On Tue, Jan 21, 2025 at 12:12:50PM +0100, Martin Wilck wrote:
> On Mon, 2025-01-20 at 18:51 -0500, Benjamin Marzinski wrote:
> > On Fri, Jan 17, 2025 at 09:27:38PM +0100, Martin Wilck wrote:
> 
> > Alternatively, we could make the check_finished() logic match the
> > update_path_state() logic exactly by just setting a flag in the
> > path's
> > pathgroup during enable_group() (and possibly not call
> > dm_enablegroup()
> > at all, 
> 
> That was my first thought, but it doesn't work, because we call  
> free_pgvec(mpp->pg, KEEP_PATHS) update_multipath_strings(), and
> reallocate the vector in disassemble_map. I guess we could change that,
> but it would be another intrusive patch set. Currently we just can't
> store any persistent properties in struct pathgroup.

Oh, yeah. nuts.

> > but I suppose that there could be a benefit to re-enabling the
> > group as soon as possible. I'm still kinda fuzzy on whether the
> > kernel's
> > own pathgroup re-enabling code makes all this redundant).
> 
> The kernel will even try "bypassed" PGs if it finds no other. But that
> means that e.g. the marginal pathgroup would be tried before disabled
> PGs [1]. In general, I don't think we can rely on the kernel in this
> respect.
>
> >  Then, in
> > enable_pathrgoups(), instead of checking each path, we just need to
> > check if pgp->need_reenable is set for the pathgroup().
> 
> As I said, as long as we discard the PGs, I guess we'll have to test
> the path flags. It's difficult to come up with a decent test case for
> this, in particular if checking for PATH_UP is not enough...
> 
> 
> > [...]
> > 
> > But perhaps the best answer is to just to say that this is a corner
> > case
> > where we skip a multipathd action that might be totally unnecessary.
> > And
> > even if it is a problem, it will be fixed if multipathd ever decides
> > that the kernel is using the wrong pathgroup and should switch, or
> > whenever the table gets reloaded. Maybe the whole patch is
> > unnecessary.
> 
> I'm going to submit a v4 that adds a test for the checker flags.

Seems reasonable.

> 
> Btw, I could only find one place in the kernel code where the kernel
> actively sets the "bypassed" flag, when the SCSI device handler returns
> SCSL_DH_DEV_TEMP_BUSY [2]. This seems rather questionable to me. First
> of all, this seems to assume that PGs are mapped to "controllers", i.e.
> something like "group_by_tpg", which isn't necessarily the case.
> Second, while the EMC device handler uses this error code in this way,
> in the ALUA handler it rather means a memory allocation failure, and
> the other device handlers don't support it at all.

Yeah, I really wish I knew how this was supposed to work. For the ALUA
devices, it's likely fine to clear bypassed when the device is in
PATH_UP. Either the hardware controller just ran into a temporary error,
and we should go back to using this pathgroup based on its priority, or
there is a persistent memory issue, it which case it likey doesn't
matter what we do, since the system has larger problems than using a
non-optimal pathgroup.

It's the EMC devices that seem like there should be a real right and
wrong way to do things. For instance, even though the code doesn't
currently clear bypassed for PATH_GHOST devices, that might be the right
thing to do.  Imagine if you tried to switch to a new pathgroup, but the
controller had some issue and so you bypassed it, and that issue was
later resolved.  In that case, you want to clear the bypassed flag. But
since you are using some other controller, the paths in the bypassed
pathgroup could well be in PATH_GHOST. This means that you are less
likely to pick that pathgroup in the future, so the bypass flag won't
get cleared. And multipathd won't do anything to fix this. So the
correct answer might be to just check for
pp->is_checked == CHECK_PATH_NEW_UP. But maybe not. I dunno.

> I think we can conclude that the kernel setting "bypassed" on a path
> group by itself is indeed a corner case, except for EMC Clariion, which
> is quite dated by now.
> 
> > Thoughts? I'm clearly thinking too much.
> > 
> 
> No, you are not. I strongly appreciate these comments.
> 
> Regards,
> Martin
> 
> [1] Unrelated wild thought: in view of the logic the kernel applies for
> "bypassed" PGs, it might actually make sense to use this flag for 
> marginal pathgroups. They would be tried if all else fails, which is
> more or less the behavior we want for them. What do you think?

Maybe, but I kind of doubt it. It will get cleared by the kernel if we
ever call switch_pathgroup(), but we could reset it. AFAICS the only
real "benefit" would be that regular bypassed pathgroups would get used
instead of marginal pathgroups. But if we're doing the right thing with
clearing the bypassed flag, then when a pathgroup is bypassed, it likely
can't be initialized (at least for the EMC handler). It might make more
sense to use a marginal pathgroup that can be initialized instead of a
regular pathgroup that quite possibly can't at all.

-Ben

> [2] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/md/dm-mpath.c#L1564
diff mbox series

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index 310d7ef..98abadc 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2917,6 +2917,34 @@  update_paths(struct vectors *vecs, int *num_paths_p, time_t start_secs)
 	return CHECKER_FINISHED;
 }
 
+static void enable_pathgroups(struct multipath *mpp)
+{
+	struct pathgroup *pgp;
+	int i;
+
+	vector_foreach_slot(mpp->pg, pgp, i) {
+		struct path *pp;
+		int j;
+
+		if (pgp->status != PGSTATE_DISABLED)
+			continue;
+
+		vector_foreach_slot(pgp->paths, pp, j) {
+			if (pp->state != PATH_UP)
+				continue;
+
+			if (dm_enablegroup(mpp->alias, i + 1) == 0) {
+				condlog(2, "%s: enabled pathgroup #%i",
+					mpp->alias, i + 1);
+				pgp->status = PGSTATE_ENABLED;
+			} else
+				condlog(2, "%s: failed to enable pathgroup #%i",
+					mpp->alias, i + 1);
+			break;
+		}
+	}
+}
+
 static void checker_finished(struct vectors *vecs, unsigned int ticks)
 {
 	struct multipath *mpp;
@@ -2943,12 +2971,16 @@  static void checker_finished(struct vectors *vecs, unsigned int ticks)
 				i--;
 				continue;
 			}
-		} else if (prio_reload || failback_reload || ghost_reload || inconsistent)
+		} else if (prio_reload || failback_reload || ghost_reload || inconsistent) {
 			if (reload_and_sync_map(mpp, vecs) == 2) {
 				/* multipath device deleted */
 				i--;
 				continue;
 			}
+		} else
+			/* not necessary after map reloads */
+			enable_pathgroups(mpp);
+
 		/* need_reload was cleared in dm_addmap and then set again */
 		if (inconsistent && mpp->need_reload)
 			condlog(1, "BUG: %s; map remained in inconsistent state after reload",