diff mbox series

[04/13] multipathd: reload maps in do_sync_mpp() if necessary

Message ID 20241206233617.382200-5-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: Benjamin Marzinski
Headers show
Series multipathd: More map reload handling, and checkerloop work | expand

Commit Message

Martin Wilck Dec. 6, 2024, 11:36 p.m. UTC
update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
settings. In this case, the map should be reloaded, but so far we don't
do this reliably. A previous patch added a call to reload_and_sync_map()
in the CHECKER_FINISHED state, but in the mean time the checker may have
waited for checker threads to finish, and may have dropped and re-acquired the
vecs lock. As mpp->need_reload is a serious but rare condition, also try
to fix it early in the checker loop. Because of the previous patch, we
can call reload_and_sync_map() here.

Fixes: https://github.com/opensvc/multipath-tools/issues/105
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Benjamin Marzinski Dec. 10, 2024, 7:20 p.m. UTC | #1
On Sat, Dec 07, 2024 at 12:36:08AM +0100, Martin Wilck wrote:
> update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
> settings. In this case, the map should be reloaded, but so far we don't
> do this reliably. A previous patch added a call to reload_and_sync_map()
> in the CHECKER_FINISHED state, but in the mean time the checker may have
> waited for checker threads to finish, and may have dropped and re-acquired the
> vecs lock. As mpp->need_reload is a serious but rare condition, also try
> to fix it early in the checker loop. Because of the previous patch, we
> can call reload_and_sync_map() here.

Again, if the map has any paths in the INIT_PARTIAL or INIT_REMOVED
state, the reload will remove them from the pathvec and mess up our
looping through it. Reloading maps with need_reload will already happen
at the end of this path check loop, so it will still get done very
quickly.

Also, since we try to get all the paths for a device checked in the same
checker loop now, it seems to me that there could a benefit to waiting
till the end, to sync our state with the kernel. This could avoid some
path ping-ponging in a corner case. If a path failed and the kernel
noticed it while we are in a checker loop, but haven't gotten to
update_path_state() for that path yet, the path will still appear up to
multipathd. In this case, if we reloaded the multipath device while
updating an earlier path in the device, we would reinstate this failed
path in sync_map_state(), only for us or the kernel to fail it again
right afterwards.

-Ben
 
> Fixes: https://github.com/opensvc/multipath-tools/issues/105
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/main.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 131dab6..18083c7 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2450,12 +2450,25 @@ get_new_state(struct path *pp)
>  static int
>  do_sync_mpp(struct vectors *vecs, struct multipath *mpp)
>  {
> -	int ret;
> +	const int MAX_RETRIES = 1;
> +	int ret, retry = 0;
>  
> +try_again:
>  	ret = refresh_multipath(vecs, mpp);
>  	if (ret)
>  		return ret;
>  
> +	else if (mpp->need_reload) {
> +		if (retry++ < MAX_RETRIES) {
> +			condlog(2, "%s: %s needs reload", __func__, mpp->alias);
> +			if (reload_and_sync_map(mpp, vecs) == 2)
> +				/* map removed */
> +				return 1;
> +			goto try_again;
> +		} else
> +			condlog(1, "%s: %s still needs reload after %d retries",
> +				__func__, mpp->alias, retry);
> +	}
>  	set_no_path_retry(mpp);
>  	return 0;
>  }
> -- 
> 2.47.0
diff mbox series

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index 131dab6..18083c7 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2450,12 +2450,25 @@  get_new_state(struct path *pp)
 static int
 do_sync_mpp(struct vectors *vecs, struct multipath *mpp)
 {
-	int ret;
+	const int MAX_RETRIES = 1;
+	int ret, retry = 0;
 
+try_again:
 	ret = refresh_multipath(vecs, mpp);
 	if (ret)
 		return ret;
 
+	else if (mpp->need_reload) {
+		if (retry++ < MAX_RETRIES) {
+			condlog(2, "%s: %s needs reload", __func__, mpp->alias);
+			if (reload_and_sync_map(mpp, vecs) == 2)
+				/* map removed */
+				return 1;
+			goto try_again;
+		} else
+			condlog(1, "%s: %s still needs reload after %d retries",
+				__func__, mpp->alias, retry);
+	}
 	set_no_path_retry(mpp);
 	return 0;
 }