diff mbox series

[4/5] multipathd: reset __delayed_reconfig from ev_add_map()

Message ID 20220314213036.12074-5-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show
Series multipathd: fix __delayed_reconfig logic | expand

Commit Message

Martin Wilck March 14, 2022, 9:30 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

With the previous patch, one race condition between child() and
the uevent handler (ev_add_map()) remains: 1. child() sets
__delayed_reconfig, 2. ev_add_map() calls schedule_reconfigure() and
thus DAEMON_CONFIGURE, 3. child() sets DAEMON_IDLE. This would cause
the pending reconfigure request to be missed.

To fix this, reset __delayed_reconfig immediately from ev_add_map()
(and likewise, missing_uev_wait_tick()). This way the wait loop in
child() terminates on the reconfigure_pending condition.

Also, these schedule_reconfigure() callers don't really request a
reconfigure() call; they just unblock processing previously requested
reconfigure() calls. Add a new helper, unblock_reconfigure(), that
does just that.

Suggested-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

Comments

Benjamin Marzinski March 16, 2022, 8:13 p.m. UTC | #1
On Mon, Mar 14, 2022 at 10:30:35PM +0100, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> With the previous patch, one race condition between child() and
> the uevent handler (ev_add_map()) remains: 1. child() sets
> __delayed_reconfig, 2. ev_add_map() calls schedule_reconfigure() and
> thus DAEMON_CONFIGURE, 3. child() sets DAEMON_IDLE. This would cause
> the pending reconfigure request to be missed.
> 
> To fix this, reset __delayed_reconfig immediately from ev_add_map()
> (and likewise, missing_uev_wait_tick()). This way the wait loop in
> child() terminates on the reconfigure_pending condition.
> 
> Also, these schedule_reconfigure() callers don't really request a
> reconfigure() call; they just unblock processing previously requested
> reconfigure() calls. Add a new helper, unblock_reconfigure(), that
> does just that.
>

Doesn't unblock_reconfigure() allow us to set the state to
DAEMON_CONFIGURE, right after the checkerloop() has set it to
DAEMON_RUNNING. schedule_reconfigure() didn't update the state if it was
DAEMON_RUNNING. Instead, it would wait till checkerloop() set the state
back to DAEMON_IDLE for the reconfigure to start.

I have also come up with a different bug that effects these fixes,
including mine. But it will effect this fix worse. This fix assumes that
when a reconfigure is delayed, it should remain delayed until either a
change event happens on the multipath device (ev_add_map) or multipathd
times out waiting for one (missing_uev_wait_tick). However the map could
be removed before either of those things happen.  It's possible that
multipathd could get a remove uevent after the add event but before the
change event. Either that or a multipathd client command could remove
it, or a dm event could happen either removing the device, or updating
it, but with __setup_multipath() removing it. Those are just the
examples I came up with off the top of my head.

So there's a specific problem where we don't handle clearing
__delayed_reconfig if a map was removed. That can be handled in a
different patch.  But this, or any other oversight we might have here
can be mitigated by being more proactive in removing __delayed_reconfig.
For instance, when then main thread tries to reconfigure, it checks
need_to_delay_reconfig(), and only does the reconfigure if this returns
that it's ok.  __delayed_reconfig mostly exists so that the main thread
doesn't need to grab the vecs lock and loop through the the multipath
devices to know if it need to delay, but need_to_delay_reconfig() is
the definitive answer. When the main thread checks this, we should be
updating __delayed_reconfig to match the result. This patch removes the
"__delayed_reconfig = false;" line, which I think is a mistake.

Imagine if, because a map got removed before ev_add_map() was called,
__delayed_reconfig was true, even though there were no maps with
mpp->wait_for_udev set. Any existing config request would still be
delayed, but if another reconfigure was requested not only would it
happen, but if the "__delayed_reconfig = false;" line was back,
__delayed_reconfig would go back to the correct value.

-Ben
 
> Suggested-by: Benjamin Marzinski <bmarzins@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/main.c | 41 ++++++++++++++++++++---------------------
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index d033a9a..2424db7 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -155,16 +155,6 @@ int should_exit(void)
>  	return get_running_state() == DAEMON_SHUTDOWN;
>  }
>  
> -static bool get_delayed_reconfig(void)
> -{
> -	bool val;
> -
> -	pthread_mutex_lock(&config_lock);
> -	val = __delayed_reconfig;
> -	pthread_mutex_unlock(&config_lock);
> -	return val;
> -}
> -
>  /*
>   * global copy of vecs for use in sig handlers
>   */
> @@ -308,6 +298,22 @@ void post_config_state(enum daemon_status state)
>  	pthread_cleanup_pop(1);
>  }
>  
> +static bool unblock_reconfigure(void)
> +{
> +	bool was_delayed;
> +
> +	pthread_mutex_lock(&config_lock);
> +	was_delayed = __delayed_reconfig;
> +	if (was_delayed) {
> +		__delayed_reconfig = false;
> +		__post_config_state(DAEMON_CONFIGURE);
> +	}
> +	pthread_mutex_unlock(&config_lock);
> +	if (was_delayed)
> +		condlog(2, "reconfigure (delayed)");
> +	return was_delayed;
> +}
> +
>  void schedule_reconfigure(enum force_reload_types requested_type)
>  {
>  	pthread_mutex_lock(&config_lock);
> @@ -790,12 +796,9 @@ ev_add_map (char * dev, const char * alias, struct vectors * vecs)
>  		dm_get_info(mpp->alias, &mpp->dmi);
>  		if (mpp->wait_for_udev) {
>  			mpp->wait_for_udev = 0;
> -			if (get_delayed_reconfig() &&
> -			    !need_to_delay_reconfig(vecs)) {
> -				condlog(2, "reconfigure (delayed)");
> -				schedule_reconfigure(FORCE_RELOAD_WEAK);
> +			if (!need_to_delay_reconfig(vecs) &&
> +			    unblock_reconfigure())
>  				return 0;
> -			}
>  		}
>  		/*
>  		 * Not really an error -- we generate our own uevent
> @@ -1899,11 +1902,8 @@ missing_uev_wait_tick(struct vectors *vecs)
>  		}
>  	}
>  
> -	if (timed_out && get_delayed_reconfig() &&
> -	    !need_to_delay_reconfig(vecs)) {
> -		condlog(2, "reconfigure (delayed)");
> -		schedule_reconfigure(FORCE_RELOAD_WEAK);
> -	}
> +	if (timed_out && !need_to_delay_reconfig(vecs))
> +		unblock_reconfigure();
>  }
>  
>  static void
> @@ -3395,7 +3395,6 @@ child (__attribute__((unused)) void *param)
>  			reload_type = reconfigure_pending == FORCE_RELOAD_YES ?
>  				FORCE_RELOAD_YES : FORCE_RELOAD_WEAK;
>  			reconfigure_pending = FORCE_RELOAD_NONE;
> -			__delayed_reconfig = false;
>  			pthread_mutex_unlock(&config_lock);
>  
>  			rc = reconfigure(vecs, reload_type);
> -- 
> 2.35.1
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 16, 2022, 10:01 p.m. UTC | #2
On Wed, 2022-03-16 at 15:13 -0500, Benjamin Marzinski wrote:
> On Mon, Mar 14, 2022 at 10:30:35PM +0100, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > With the previous patch, one race condition between child() and
> > the uevent handler (ev_add_map()) remains: 1. child() sets
> > __delayed_reconfig, 2. ev_add_map() calls schedule_reconfigure()
> > and
> > thus DAEMON_CONFIGURE, 3. child() sets DAEMON_IDLE. This would
> > cause
> > the pending reconfigure request to be missed.
> > 
> > To fix this, reset __delayed_reconfig immediately from ev_add_map()
> > (and likewise, missing_uev_wait_tick()). This way the wait loop in
> > child() terminates on the reconfigure_pending condition.
> > 
> > Also, these schedule_reconfigure() callers don't really request a
> > reconfigure() call; they just unblock processing previously
> > requested
> > reconfigure() calls. Add a new helper, unblock_reconfigure(), that
> > does just that.
> > 
> 
> Doesn't unblock_reconfigure() allow us to set the state to
> DAEMON_CONFIGURE, right after the checkerloop() has set it to
> DAEMON_RUNNING.

Right. I added the __post_config_state() there in order to be sure that
child() would be woken up. But unblock_reconfigure() should only call
__post_config_state if running_state is DAEMON_IDLE, like
schedule_reconfigure(). Otherwise, it can be sure that some other
process will wake up child().

>  schedule_reconfigure() didn't update the state if it was
> DAEMON_RUNNING. Instead, it would wait till checkerloop() set the
> state
> back to DAEMON_IDLE for the reconfigure to start.

> I have also come up with a different bug that effects these fixes,
> including mine. But it will effect this fix worse. This fix assumes
> that
> when a reconfigure is delayed, it should remain delayed until either
> a
> change event happens on the multipath device (ev_add_map) or
> multipathd
> times out waiting for one (missing_uev_wait_tick). However the map
> could
> be removed before either of those things happen.  It's possible that
> multipathd could get a remove uevent after the add event but before
> the
> change event. Either that or a multipathd client command could remove
> it, or a dm event could happen either removing the device, or
> updating
> it, but with __setup_multipath() removing it. Those are just the
> examples I came up with off the top of my head.


> So there's a specific problem where we don't handle clearing
> __delayed_reconfig if a map was removed. That can be handled in a
> different patch.  But this, or any other oversight we might have here
> can be mitigated by being more proactive in removing
> __delayed_reconfig.
> For instance, when then main thread tries to reconfigure, it checks
> need_to_delay_reconfig(), and only does the reconfigure if this
> returns
> that it's ok.  __delayed_reconfig mostly exists so that the main
> thread
> doesn't need to grab the vecs lock and loop through the the multipath
> devices to know if it need to delay, but need_to_delay_reconfig() is
> the definitive answer. When the main thread checks this, we should be
> updating __delayed_reconfig to match the result. This patch removes
> the
> "__delayed_reconfig = false;" line, which I think is a mistake.

Agreed. My thinking was that this is done by the uevent handler when
the state of need_to_delay_reconfig() changes. But you're right with
the "remove" special case. We should re-add this, it definitely doesn't
hurt.

> Imagine if, because a map got removed before ev_add_map() was called,
> __delayed_reconfig was true, even though there were no maps with
> mpp->wait_for_udev set. Any existing config request would still be
> delayed, but if another reconfigure was requested not only would it
> happen, but if the "__delayed_reconfig = false;" line was back,
> __delayed_reconfig would go back to the correct value.

So we should call unblock_reconfigure() from remove_map().
Unfortunately that requires another callback from libmultipath into
multipathd. Trying to figure it out... 

Thanks,
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index d033a9a..2424db7 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -155,16 +155,6 @@  int should_exit(void)
 	return get_running_state() == DAEMON_SHUTDOWN;
 }
 
-static bool get_delayed_reconfig(void)
-{
-	bool val;
-
-	pthread_mutex_lock(&config_lock);
-	val = __delayed_reconfig;
-	pthread_mutex_unlock(&config_lock);
-	return val;
-}
-
 /*
  * global copy of vecs for use in sig handlers
  */
@@ -308,6 +298,22 @@  void post_config_state(enum daemon_status state)
 	pthread_cleanup_pop(1);
 }
 
+static bool unblock_reconfigure(void)
+{
+	bool was_delayed;
+
+	pthread_mutex_lock(&config_lock);
+	was_delayed = __delayed_reconfig;
+	if (was_delayed) {
+		__delayed_reconfig = false;
+		__post_config_state(DAEMON_CONFIGURE);
+	}
+	pthread_mutex_unlock(&config_lock);
+	if (was_delayed)
+		condlog(2, "reconfigure (delayed)");
+	return was_delayed;
+}
+
 void schedule_reconfigure(enum force_reload_types requested_type)
 {
 	pthread_mutex_lock(&config_lock);
@@ -790,12 +796,9 @@  ev_add_map (char * dev, const char * alias, struct vectors * vecs)
 		dm_get_info(mpp->alias, &mpp->dmi);
 		if (mpp->wait_for_udev) {
 			mpp->wait_for_udev = 0;
-			if (get_delayed_reconfig() &&
-			    !need_to_delay_reconfig(vecs)) {
-				condlog(2, "reconfigure (delayed)");
-				schedule_reconfigure(FORCE_RELOAD_WEAK);
+			if (!need_to_delay_reconfig(vecs) &&
+			    unblock_reconfigure())
 				return 0;
-			}
 		}
 		/*
 		 * Not really an error -- we generate our own uevent
@@ -1899,11 +1902,8 @@  missing_uev_wait_tick(struct vectors *vecs)
 		}
 	}
 
-	if (timed_out && get_delayed_reconfig() &&
-	    !need_to_delay_reconfig(vecs)) {
-		condlog(2, "reconfigure (delayed)");
-		schedule_reconfigure(FORCE_RELOAD_WEAK);
-	}
+	if (timed_out && !need_to_delay_reconfig(vecs))
+		unblock_reconfigure();
 }
 
 static void
@@ -3395,7 +3395,6 @@  child (__attribute__((unused)) void *param)
 			reload_type = reconfigure_pending == FORCE_RELOAD_YES ?
 				FORCE_RELOAD_YES : FORCE_RELOAD_WEAK;
 			reconfigure_pending = FORCE_RELOAD_NONE;
-			__delayed_reconfig = false;
 			pthread_mutex_unlock(&config_lock);
 
 			rc = reconfigure(vecs, reload_type);