[v2,04/14] multipathd: quickly re-sync if a map is inconsistent

Message ID 20241211225909.298770-5-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Series multipathd: More map reload handling, and checkerloop work

Commit Message

Martin Wilck Dec. 11, 2024, 10:58 p.m. UTC
After reading the kernel device-mapper table, update_pathvec_from_dm()
sets the mpp->need_reload flag if an inconsistent state was found (often a
path with wrong WWID). We expect reload_and_sync_map() to fix this situation.
However, schedule a quick resync in this case, to double-check that the
inconsistency has been fixed.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
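
To illustrate the mechanism the commit message relies on, here is a minimal sketch of a per-map tick countdown. It assumes that sync_tick holds the number of ticks left until the next resync and that a resync re-reads the kernel state and sets need_reload on a mismatch. The struct map_sketch type, the sketch_sync() helper and its interval and kernel_inconsistent parameters are hypothetical and are not taken from multipathd.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for multipathd's struct multipath. */
struct map_sketch {
	unsigned int sync_tick;  /* ticks left until the next resync */
	bool need_reload;        /* set when the kernel table looks inconsistent */
	unsigned int syncs;      /* resyncs performed so far, for illustration */
};

/*
 * Count sync_tick down by the elapsed ticks and resync once it expires.
 * Returns true if a resync happened in this call.
 */
bool sketch_sync(struct map_sketch *m, unsigned int ticks,
		 unsigned int interval, bool kernel_inconsistent)
{
	if (m->sync_tick > ticks) {
		/* countdown still running, nothing to do yet */
		m->sync_tick -= ticks;
		return false;
	}
	/* countdown expired: re-read the kernel state, restart the countdown */
	m->sync_tick = interval;
	m->syncs++;
	m->need_reload = kernel_inconsistent;
	return true;
}
```

Under these assumptions, setting sync_tick to 1 makes the next call with ticks >= 1 resync immediately, which is the effect the patch aims for.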

Comments

Benjamin Marzinski Dec. 19, 2024, 9:57 p.m. UTC | #1
On Wed, Dec 11, 2024 at 11:58:59PM +0100, Martin Wilck wrote:
> After reading the kernel device-mapper table, update_pathvec_from_dm()
> sets the mpp->need_reload flag if an inconsistent state was found (often a
> path with wrong WWID). We expect reload_and_sync_map() to fix this situation.
> However, schedule a quick resync in this case, to double-check that the
> inconsistency has been fixed.

I'm not too sure about this. My biggest worry with handling
mpp->need_reload in the checkerloop is what happens if for some reason
multipathd and the kernel keep disagreeing on something. You would just
keep reloading the device. That seems unlikely, so I'm o.k. with
handling it here, but if that does happen, this would make it much
worse.  Instead of reloading every path check, you would reload every
loop.

If you do detect an inconsistent state, and trigger a reload, and the
state is still inconsistent after that, I would argue that yet another
reload is more likely to remain inconsistent than it is to fix the
problem. So I would rather not speed it up.

If I'm overlooking a case where a second reload would fix a problem,
please let me know.

-Ben

Benjamin Marzinski Dec. 19, 2024, 11:05 p.m. UTC | #2
On Wed, Dec 11, 2024 at 11:58:59PM +0100, Martin Wilck wrote:
> After reading the kernel device-mapper table, update_pathvec_from_dm()
> sets the mpp->need_reload flag if an inconsistent state was found (often a
> path with wrong WWID). We expect reload_and_sync_map() to fix this situation.
> However, schedule a quick resync in this case, to double-check that the
> inconsistency has been fixed.

Like I mentioned in my second comment to patch 03, this sync already
happens as part of reload_and_sync_map().


Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index e4e6bf7..178618c 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -3026,13 +3026,22 @@  checkerloop (void *ap)
 							     start_time.tv_sec);
 			if (checker_state == CHECKER_FINISHED) {
 				vector_foreach_slot(vecs->mpvec, mpp, i) {
+					bool inconsistent;
+
 					sync_mpp(vecs, mpp, ticks);
-					if ((update_mpp_prio(mpp) || mpp->need_reload) &&
+					inconsistent = mpp->need_reload;
+					if ((update_mpp_prio(mpp) || inconsistent) &&
 					    reload_and_sync_map(mpp, vecs) == 2) {
 						/* multipath device deleted */
 						i--;
 						continue;
 					}
+					/*
+					 * If we reloaded due to inconsistent state,
+					 * schedule another sync at the next tick.
+					 */
+					if (inconsistent)
+						mpp->sync_tick = 1;
 				}
 			}
 			lock_cleanup_pop(vecs->lock);
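
The effect of the hunk above on a map that never becomes consistent can be sketched with a small simulation. This is a rough model under stated assumptions, not multipathd code: one loop iteration equals one tick, a regular resync happens every interval ticks, every resync finds the map inconsistent, and a reload restarts the countdown. The struct sim_map type and the count_reloads() function are hypothetical names, and the sync that reload_and_sync_map() performs internally is not modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified map state for the simulation. */
struct sim_map {
	unsigned int sync_tick;  /* ticks left until the next resync */
	bool need_reload;        /* inconsistency found by the last resync */
};

/*
 * Run total_ticks loop iterations against a map whose kernel state stays
 * inconsistent, and return how many reloads were issued.
 */
unsigned int count_reloads(unsigned int total_ticks, unsigned int interval,
			   bool quick_resync)
{
	struct sim_map m = { .sync_tick = interval, .need_reload = false };
	unsigned int reloads = 0;

	for (unsigned int t = 0; t < total_ticks; t++) {
		bool inconsistent;

		/* stand-in for sync_mpp(): count down, resync on expiry */
		if (m.sync_tick > 1) {
			m.sync_tick--;
		} else {
			m.sync_tick = interval;
			m.need_reload = true; /* assumed: still inconsistent */
		}

		inconsistent = m.need_reload;
		if (inconsistent) {
			/* stand-in for reload_and_sync_map() */
			reloads++;
			m.need_reload = false;
			m.sync_tick = interval;
		}
		/* the behaviour added by the patch */
		if (quick_resync && inconsistent)
			m.sync_tick = 1;
	}
	return reloads;
}
```

In this model, count_reloads(20, 5, false) issues 4 reloads, while count_reloads(20, 5, true) issues 16: once the first inconsistency is seen, every later iteration reloads. That is the scenario raised in comment #1.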