diff mbox series

[2/2] multipathd: don't keep looping when config is delayed

Message ID 1646964130-21800-3-git-send-email-bmarzins@redhat.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Series fix looping when reconfigure is delayed

Commit Message

Benjamin Marzinski March 11, 2022, 2:02 a.m. UTC
If a reconfigure is delayed because multipathd is waiting on a change
uevent for a new multipath device, the main thread will not pause, but
will keep looping and rechecking to see if it can reconfigure.

To solve this, when __post_config_state(DAEMON_IDLE) is called, if
__delayed_reconfig is set we really do want to switch to the
DAEMON_IDLE state, even if there is a pending reconfigure, since it's
being delayed. When the last change uevent for a new map arrives (or
we time out waiting for it), a reconfigure will get triggered.

However, we need to avoid a race where the main thread calls
enable_delayed_reconfig() and sets __delayed_reconfig, and then the
uevent thread processes a change uevent that sets the state to
DAEMON_CONFIGURE, and then the main thread calls post_config_state().
In this case, multipathd would move to DAEMON_IDLE, even though
the reconfigure should no longer be delayed. To avoid this, when
schedule_reconfigure() is called and the daemon is currently in
DAEMON_CONFIGURE or DAEMON_RUNNING, __delayed_reconfig should be
cleared, so switching to DAEMON_IDLE will instead become
DAEMON_CONFIGURE.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Martin Wilck March 11, 2022, 10:21 a.m. UTC | #1
On Thu, 2022-03-10 at 20:02 -0600, Benjamin Marzinski wrote:
> If a reconfigure is delayed because multipathd is waiting on a change
> uevent for a new multipath device, the main thread will not pause,
> but
> will keep looping and rechecking to see if it can reconfigure.
> 
> To solve this, when __post_config_state(DAEMON_IDLE) is called, if
> __delayed_reconfig is set we really do want to switch to the
> DAEMON_IDLE state, even if there is a pending reconfigure, since it's
> being delayed. When the last change uevent for a new map arrives (or
> we time out waiting for it), a reconfigure will get triggered.

I had thought about something like this, too. I think there's one good
reason to switch to DAEMON_IDLE even if reconfigure is delayed: if we
don't, and for whatever reason the uevents we expect arrive with large delay
or not at all, we risk being killed by systemd, which will kill
processes that stay in "RELOADING=1" state for more than
TimeoutStartSec seconds. It's unlikely, but I think we should try to
avoid it if we can, because we have no control over systemd's timeout
configuration.

> However, we need to avoid a race where the main thread calls
> enable_delayed_reconfig() and sets __delayed_reconfig, and then the
> uevent thread processes a change uevent that sets the state to
> DAEMON_CONFIGURE, and then the main thread calls post_config_state().
> In this case, multipathd would move to DAEMON_IDLE, even though
> the reconfigure should no longer be delayed. To avoid this, when
> schedule_reconfigure() is called and the daemon is currently in
> DAEMON_CONFIGURE or DAEMON_RUNNING, __delayed_reconfig should be
> cleared, so switching to DAEMON_IDLE will instead become
> DAEMON_CONFIGURE.

I suppose this would work. The part I don't like so much is that the
DAEMON_CONFIGURE logic remains complex and distributed over different
functions (__post_config_state(), schedule_reconfigure(), child())
which interact in non-obvious ways. I noticed that while looking into
Guozhonghua's problem yesterday - the logic is hard to grok, even
though I wrote a significant part of it myself. In particular, I have
started to dislike the complexity we added in __post_config_state(),
which today doesn't do what a caller would expect it to do (which is:
simply setting the state passed to it). I'm aware that this complexity
was created by my own commit 250708c :-)

By adding extra semantics to the DAEMON_RUNNING state (which used to
simply mean "checkers running"), the logic gets even harder to
understand, IMO.

Please have a look at my alternative approach (@dm-devel: only posted
off-list so far). If you think that'd be a viable solution too, I'd
prefer it, because it moves most of the logic into a single place
(child()).

Regards,
Martin


> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 86b1745a..9bd1f530 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -309,6 +309,7 @@ static void __post_config_state(enum
> daemon_status state)
>                  * again and start another reconfigure cycle.
>                  */
>                 if (reconfigure_pending != FORCE_RELOAD_NONE &&
> +                   !__delayed_reconfig &&
>                     state == DAEMON_IDLE &&
>                     (old_state == DAEMON_CONFIGURE ||
>                      old_state == DAEMON_RUNNING)) {
> @@ -353,6 +354,7 @@ void schedule_reconfigure(enum force_reload_types
> requested_type)
>                 break;
>         case DAEMON_CONFIGURE:
>         case DAEMON_RUNNING:
> +               __delayed_reconfig = false;
>                 reconfigure_pending = type;
>                 break;
>         default:

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Benjamin Marzinski March 11, 2022, 6:37 p.m. UTC | #2
On Fri, Mar 11, 2022 at 10:21:29AM +0000, Martin Wilck wrote:
> On Thu, 2022-03-10 at 20:02 -0600, Benjamin Marzinski wrote:
> > If a reconfigure is delayed because multipathd is waiting on a change
> > uevent for a new multipath device, the main thread will not pause,
> > but
> > will keep looping and rechecking to see if it can reconfigure.
> > 
> > To solve this, when __post_config_state(DAEMON_IDLE) is called, if
> > __delayed_reconfig is set we really do want to switch to the
> > DAEMON_IDLE state, even if there is a pending reconfigure, since it's
> > being delayed. When the last change uevent for a new map arrives (or
> > we time out waiting for it), a reconfigure will get triggered.
> 
> I had thought about something like this, too. I think there's one good
> reason to switch to DAEMON_IDLE even if reconfigure is delayed: if we
> don't, and for whatever reason the uevents we expect arrive with large delay
> or not at all, we risk being killed by systemd, which will kill
> processes that stay in "RELOADING=1" state for more than
> TimeoutStartSec seconds. It's unlikely, but I think we should try to
> avoid it if we can, because we have no control over systemd's timeout
> configuration.
> 
> > However, we need to avoid a race where the main thread calls
> > enable_delayed_reconfig() and sets __delayed_reconfig, and then the
> > uevent thread processes a change uevent that sets the state to
> > DAEMON_CONFIGURE, and then the main thread calls post_config_state().
> > In this case, multipathd would move to DAEMON_IDLE, even though
> > the reconfigure should no longer be delayed. To avoid this, when
> > schedule_reconfigure() is called and the daemon is currently in
> > DAEMON_CONFIGURE or DAEMON_RUNNING, __delayed_reconfig should be
> > cleared, so switching to DAEMON_IDLE will instead become
> > DAEMON_CONFIGURE.
> 
> I suppose this would work. The part I don't like so much is that the
> DAEMON_CONFIGURE logic remains complex and distributed over different
> functions (__post_config_state(), schedule_reconfigure(), child())
> which interact in non-obvious ways. I noticed that while looking into
> Guozhonghua's problem yesterday - the logic is hard to grok, even
> though I wrote a significant part of it myself. In particular, I have
> started to dislike the complexity we added in __post_config_state(),
> which today doesn't do what a caller would expect it to do (which is:
> simply setting the state passed to it). I'm aware that this complexity
> was created by my own commit 250708c :-)
> 
> By adding extra semantics to the DAEMON_RUNNING state (which used to
> simply mean "checkers running"), the logic gets even harder to
> understand, IMO.
> 
> Please have a look at my alternative approach (@dm-devel: only posted
> off-list so far). If you think that'd be a viable solution too, I'd
> prefer it, because it moves most of the logic into a single place
> (child()).
> 

Err.. Patch 2 is still broken. The child process will only stop waiting
in the DAEMON_IDLE state and perform the reconfigure if
__delayed_reconfig is false. The only way that __delayed_reconfig can be
set to false is when a reconfigure actually happens. So you can fail to
reconfig if:

- main thread notices that it needs to delay the reconfigure(), and sets
  __delayed_reconfig to true.
- the uevent thread processes a change event on the last device that was
  delaying the reconfigure and calls schedule_reconfigure(), which sets
  reconfigure_pending, but doesn't set __delayed_reconfig to false
- the main thread calls post_config_state(DAEMON_IDLE)

The solution is to set __delayed_reconfig to false in
schedule_reconfigure().

Patch 3 looks fine.

While we're looking at this, does running_state need to be volatile,
given that we only ever access it while holding the config_lock?

-Ben
> Regards,
> Martin
> 
> 
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipathd/main.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 86b1745a..9bd1f530 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -309,6 +309,7 @@ static void __post_config_state(enum
> > daemon_status state)
> >                  * again and start another reconfigure cycle.
> >                  */
> >                 if (reconfigure_pending != FORCE_RELOAD_NONE &&
> > +                   !__delayed_reconfig &&
> >                     state == DAEMON_IDLE &&
> >                     (old_state == DAEMON_CONFIGURE ||
> >                      old_state == DAEMON_RUNNING)) {
> > @@ -353,6 +354,7 @@ void schedule_reconfigure(enum force_reload_types
> > requested_type)
> >                 break;
> >         case DAEMON_CONFIGURE:
> >         case DAEMON_RUNNING:
> > +               __delayed_reconfig = false;
> >                 reconfigure_pending = type;
> >                 break;
> >         default:

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index 86b1745a..9bd1f530 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -309,6 +309,7 @@  static void __post_config_state(enum daemon_status state)
 		 * again and start another reconfigure cycle.
 		 */
 		if (reconfigure_pending != FORCE_RELOAD_NONE &&
+		    !__delayed_reconfig &&
 		    state == DAEMON_IDLE &&
 		    (old_state == DAEMON_CONFIGURE ||
 		     old_state == DAEMON_RUNNING)) {
@@ -353,6 +354,7 @@  void schedule_reconfigure(enum force_reload_types requested_type)
 		break;
 	case DAEMON_CONFIGURE:
 	case DAEMON_RUNNING:
+		__delayed_reconfig = false;
 		reconfigure_pending = type;
 		break;
 	default: