diff mbox series

[v2,07/11] multipathd: use remove_map_callback for delayed reconfigure

Message ID 20220318223339.4226-8-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show
Series multipathd: fix __delayed_reconfig logic | expand

Commit Message

Martin Wilck March 18, 2022, 10:33 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

If multipathd needs to delay reconfigure() because it's waiting for a map
creation uevent, it can happen that child() isn't woken up if the map
being waited for is removed before the uevent arrives. Fix this by
unblocking reconfigure() with the remove_map_callback() function.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Benjamin Marzinski March 22, 2022, 12:34 a.m. UTC | #1
On Fri, Mar 18, 2022 at 5:33 PM <mwilck@suse.com> wrote:
>
> From: Martin Wilck <mwilck@suse.com>
>
> If multipathd needs to delay reconfigure() because it's waiting for a map
> creation uevent, it can happen that child() isn't woken up if the map
> being waited for is removed before the uevent arrives. Fix this by
> unblocking reconfigure() with the remove_map_callback() function.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/main.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index f3b8eb4..4721cd8 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -319,6 +319,17 @@ static bool unblock_reconfigure(void)
>         return was_delayed;
>  }
>
> +/*
> + * Make sure child() is woken up when a map is removed that multipathd
> + * is currently waiting for.
> + * Overrides libmultipath's weak symbol by the same name
> + */
> +void remove_map_callback(struct multipath *mpp)
> +{
> +       if (mpp->wait_for_udev > 0)

Is there a reason why you don't decrement wait_for_udev, and check
need_to_delay_reconfig() like in ev_add_map()? I realize that it
doesn't hurt anything to unblock the reconfigure even if there are
other devices that need a delay. The main thread will notice that it
still needs to delay. I'm just wondering why we do it differently
here?

-Ben

> +               unblock_reconfigure();
> +}
> +
>  void schedule_reconfigure(enum force_reload_types requested_type)
>  {
>         pthread_mutex_lock(&config_lock);
> --
> 2.35.1
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Martin Wilck March 22, 2022, 9:08 a.m. UTC | #2
On Mon, 2022-03-21 at 19:34 -0500, Benjamin Marzinski wrote:
> On Fri, Mar 18, 2022 at 5:33 PM <mwilck@suse.com> wrote:
> > 
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > If multipathd needs to delay reconfigure() because it's waiting for
> > a map
> > creation uevent, it can happen that child() isn't woken up if the
> > map
> > being waited for is removed before the uevent arrives. Fix this by
> > unblocking reconfigure() with the remove_map_callback() function.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  multipathd/main.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index f3b8eb4..4721cd8 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -319,6 +319,17 @@ static bool unblock_reconfigure(void)
> >         return was_delayed;
> >  }
> > 
> > +/*
> > + * Make sure child() is woken up when a map is removed that
> > multipathd
> > + * is currently waiting for.
> > + * Overrides libmultipath's weak symbol by the same name
> > + */
> > +void remove_map_callback(struct multipath *mpp)
> > +{
> > +       if (mpp->wait_for_udev > 0)
> 
> Is there a reason why you don't decrement wait_for_udev, and check
> need_to_delay_reconfig() like in ev_add_map()? I realize that it
> doesn't hurt anything to unblock the reconfigure even if there are
> other devices that need a delay. The main thread will notice that it
> still needs to delay. I'm just wondering why we do it differently
> here?

The main reason was to keep it simple. need_to_delay_reconfig() needs
to be passed a "vecs" pointer, and requires the vecs lock to be
held. remove_map() is deep down in the call stack and has lots of
direct and indirect callers. It can be called with mpvec == NULL and
(in theory) also with pathvec == NULL, in which case it simply frees
the memory obtained by the map, without unlinking itself or its members
from any vectors. In that case it *could* be called without the vecs
lock held (AFAICS, that's not the case today, but the function could be
used this way, e.g. in an error handling code path).

I thought it was easier and safer not to have to assert that every
current and future caller holds the vecs lock, in particular because
this is called indirectly from libmultipath, the function that grabs
the lock is usually high up in the call stack.

I had indeed pondered whether I should remove the call to
need_to_delay_reconfig() from the ev_add_map(), too. I decided against
it, because I realized that waking up child() for nothing is not
cheap,as child() needs to grab the vecs lock just for calling
need_to_delay_reconfig(). We should avoid this for the common case of 
an uevent which we are waiting for. 
The remove_map() code path, OTOH, is a corner case (map being removed
while in the process of being added). We need to cover it, but we know
that this code path will be rarely executed in practice, and thus
unlikely to cause vecs lock contention.

I hope this makes sense.

Regards
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski March 22, 2022, 3:36 p.m. UTC | #3
On Tue, Mar 22, 2022 at 4:08 AM Martin Wilck <mwilck@suse.com> wrote:
>
> On Mon, 2022-03-21 at 19:34 -0500, Benjamin Marzinski wrote:
> > On Fri, Mar 18, 2022 at 5:33 PM <mwilck@suse.com> wrote:
> > >
> > > From: Martin Wilck <mwilck@suse.com>
> > >
> > > If multipathd needs to delay reconfigure() because it's waiting for
> > > a map
> > > creation uevent, it can happen that child() isn't woken up if the
> > > map
> > > being waited for is removed before the uevent arrives. Fix this by
> > > unblocking reconfigure() with the remove_map_callback() function.
> > >
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > >  multipathd/main.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > index f3b8eb4..4721cd8 100644
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -319,6 +319,17 @@ static bool unblock_reconfigure(void)
> > >         return was_delayed;
> > >  }
> > >
> > > +/*
> > > + * Make sure child() is woken up when a map is removed that
> > > multipathd
> > > + * is currently waiting for.
> > > + * Overrides libmultipath's weak symbol by the same name
> > > + */
> > > +void remove_map_callback(struct multipath *mpp)
> > > +{
> > > +       if (mpp->wait_for_udev > 0)
> >
> > Is there a reason why you don't decrement wait_for_udev, and check
> > need_to_delay_reconfig() like in ev_add_map()? I realize that it
> > doesn't hurt anything to unblock the reconfigure even if there are
> > other devices that need a delay. The main thread will notice that it
> > still needs to delay. I'm just wondering why we do it differently
> > here?
>
> The main reason was to keep it simple. need_to_delay_reconfig() needs
> to be passed a "vecs" pointer, and requires the vecs lock to be
> held. remove_map() is deep down in the call stack and has lots of
> direct and indirect callers. It can be called with mpvec == NULL and
> (in theory) also with pathvec == NULL, in which case it simply frees
> the memory obtained by the map, without unlinking itself or its members
> from any vectors. In that case it *could* be called without the vecs
> lock held (AFAICS, that's not the case today, but the function could be
> used this way, e.g. in an error handling code path).
>
> I thought it was easier and safer not to have to assert that every
> current and future caller holds the vecs lock, in particular because
> this is called indirectly from libmultipath, the function that grabs
> the lock is usually high up in the call stack.
>
> I had indeed pondered whether I should remove the call to
> need_to_delay_reconfig() from the ev_add_map(), too. I decided against
> it, because I realized that waking up child() for nothing is not
> cheap,as child() needs to grab the vecs lock just for calling
> need_to_delay_reconfig(). We should avoid this for the common case of
> an uevent which we are waiting for.
> The remove_map() code path, OTOH, is a corner case (map being removed
> while in the process of being added). We need to cover it, but we know
> that this code path will be rarely executed in practice, and thus
> unlikely to cause vecs lock contention.
>
> I hope this makes sense.

Yeah. Whenever a map is getting removed from vecs->mpvec, the vecs
lock better be held. But since a map could be getting removed from
some other vector of maps, we can't use (mpvec != NULL) to be sure
that the vecs lock must be held.

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

>
> Regards
> Martin
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index f3b8eb4..4721cd8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -319,6 +319,17 @@  static bool unblock_reconfigure(void)
 	return was_delayed;
 }
 
+/*
+ * Make sure child() is woken up when a map is removed that multipathd
+ * is currently waiting for.
+ * Overrides libmultipath's weak symbol by the same name
+ */
+void remove_map_callback(struct multipath *mpp)
+{
+	if (mpp->wait_for_udev > 0)
+		unblock_reconfigure();
+}
+
 void schedule_reconfigure(enum force_reload_types requested_type)
 {
 	pthread_mutex_lock(&config_lock);