diff mbox series

[v2,4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry

Message ID 20240425233517.2125142-5-bmarzins@redhat.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show
Series multipath: fix hang in flush_map_nopaths | expand

Commit Message

Benjamin Marzinski April 25, 2024, 11:35 p.m. UTC
Currently multipath makes sure that dev_loss_tmo is at least as long as
the configured no path queueing time. The goal is to make sure that path
devices aren't removed while the multipath device is still queueing in
hopes that they will become usable again.

This is racy. Multipathd may take longer to check the paths than
configured. If strict_timing isn't set, it will definitely take longer.
To account for this, pad the minimum dev_loss_tmo value by five seconds
(one default checker interval) plus one second per minute of no path
queueing time.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Martin Wilck May 2, 2024, 3:14 p.m. UTC | #1
On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> Currently multipath makes sure that dev_loss_tmo is at least as long
> as
> the configured no path queueing time. The goal is to make sure that
> path
> devices aren't removed while the multipath device is still queueing
> in
> hopes that they will become usable again.
> 
> This is racy. Multipathd may take longer to check the paths than
> configured. If strict_timing isn't set, it will definitely take
> longer.
> To account for this, pad the minimum dev_loss_tmo value by five
> seconds
> (one default checker interval) plus one second per minute of no path
> queueing time.

What's the rationale for the "one second per minute" part?
It feels kind of arbitrary to me, and I don't find it obvious that we
need larger padding for larger timeouts.

Regards
Martin
Benjamin Marzinski May 2, 2024, 4:05 p.m. UTC | #2
On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > Currently multipath makes sure that dev_loss_tmo is at least as long
> > as
> > the configured no path queueing time. The goal is to make sure that
> > path
> > devices aren't removed while the multipath device is still queueing
> > in
> > hopes that they will become usable again.
> > 
> > This is racy. Multipathd may take longer to check the paths than
> > configured. If strict_timing isn't set, it will definitely take
> > longer.
> > To account for this, pad the minimum dev_loss_tmo value by five
> > seconds
> > (one default checker interval) plus one second per minute of no path
> > queueing time.
> 
> What's the rationale for the "one second per minute" part?
> It feels kind of arbitrary to me, and I don't find it obvious that we
> need larger padding for larger timeouts.

The way the checker works, without strict_timing, we do all the checking
work, and then we wait for a second. Obviously, the checking work takes
time, so every nominal one second checker loop will in reality take
longer than a second. The larger you have configured (no_path_retry *
checkint), the further away the actual time when we disable queueing
will be from (no_path_retry * checkint) seconds.

With strict_timing on, things work better. As long as the checking work
doesn't take more than a second, you will stay fairly close to one
second per loop, but even still, it can be longer. nanosleep() will
sleep till at least the time specified. If multipathd is not running as
a realtime process, it will usually sleep for longer.

Also, if the checker work takes longer than a second, then strict_timing
will make sure that the loop takes (close to) exactly 2 (or whatever the
next number is) seconds. However, retry_count_tick() still only ticks
down by 1 for each loop. This means that the real time before queueing
is disabled can diverge from (no_path_retry * checkint). And again, the
longer you've configured it to wait, the more it diverges.

I thought about passing ticks to retry_count_tick(), so that it would
correctly deal with these multi-tick loops, but this can cause it to
diverge from the actual number of retries attempted, in a way that would
mean we disable queueing before the desired number of retries have
occurred, which is a bigger problem, IMHO.  No matter how long the
checker work takes, you will only ever do one path check per loop. In
the worse case, say that the checker work (or anything else in
multipathd) gets stuck for a couple of minutes. You would only do one
path check, but hundreds of ticks would be passed to the
retry_count_tick(), meaning that you could disable queueing after only
one retry. Now you could limit the number of ticks you passed to
retry_count_tick(), but even still, if the a path was 1 tick away from
being checked, and the checker work took 3 seconds, you would still pass
3 ticks to retry_count_tick(), which would cause it to start to diverge
from the actual time it takes for the path to do the configured amount
of retries.

We could improve retry_count_tick(), so that it actually tracks the
number of retries you did, but that still leaves the problems when
either strict_timing isn't on or multipathd isn't a realtime process.
It seemed easier to just say, the longer you have configured multipathd
to wait to disable queueing, the more the time when it actually disables
queueing can diverge from the configured time.

The number I picked for the rate of divergence is just a guest, and if
you don't like it, I'm fine with removing it. After all, the whole point
of this patchset is to make sure that the worst thing that can happen if
paths disappear while we are still queueing is that a multipath device
isn't autoremoved.

-Ben

> 
> Regards
> Martin
Martin Wilck May 2, 2024, 6:44 p.m. UTC | #3
On Thu, 2024-05-02 at 12:05 -0400, Benjamin Marzinski wrote:
> On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > Currently multipath makes sure that dev_loss_tmo is at least as
> > > long
> > > as
> > > the configured no path queueing time. The goal is to make sure
> > > that
> > > path
> > > devices aren't removed while the multipath device is still
> > > queueing
> > > in
> > > hopes that they will become usable again.
> > > 
> > > This is racy. Multipathd may take longer to check the paths than
> > > configured. If strict_timing isn't set, it will definitely take
> > > longer.
> > > To account for this, pad the minimum dev_loss_tmo value by five
> > > seconds
> > > (one default checker interval) plus one second per minute of no
> > > path
> > > queueing time.
> > 
> > What's the rationale for the "one second per minute" part?
> > It feels kind of arbitrary to me, and I don't find it obvious that
> > we
> > need larger padding for larger timeouts.
> 
> The way the checker works, without strict_timing, we do all the
> checking
> work, and then we wait for a second. Obviously, the checking work
> takes
> time, so every nominal one second checker loop will in reality take
> longer than a second. The larger you have configured (no_path_retry *
> checkint), the further away the actual time when we disable queueing
> will be from (no_path_retry * checkint) seconds.
> 
> With strict_timing on, things work better. As long as the checking
> work
> doesn't take more than a second, you will stay fairly close to one
> second per loop, but even still, it can be longer. nanosleep() will
> sleep till at least the time specified. If multipathd is not running
> as
> a realtime process, it will usually sleep for longer.
> 
> Also, if the checker work takes longer than a second, then
> strict_timing
> will make sure that the loop takes (close to) exactly 2 (or whatever
> the
> next number is) seconds. However, retry_count_tick() still only ticks
> down by 1 for each loop. This means that the real time before
> queueing
> is disabled can diverge from (no_path_retry * checkint). And again,
> the
> longer you've configured it to wait, the more it diverges.
> 
> I thought about passing ticks to retry_count_tick(), so that it would
> correctly deal with these multi-tick loops, but this can cause it to
> diverge from the actual number of retries attempted, in a way that
> would
> mean we disable queueing before the desired number of retries have
> occurred, which is a bigger problem, IMHO.  No matter how long the
> checker work takes, you will only ever do one path check per loop. In
> the worse case, say that the checker work (or anything else in
> multipathd) gets stuck for a couple of minutes. You would only do one
> path check, but hundreds of ticks would be passed to the
> retry_count_tick(), meaning that you could disable queueing after
> only
> one retry. Now you could limit the number of ticks you passed to
> retry_count_tick(), but even still, if the a path was 1 tick away
> from
> being checked, and the checker work took 3 seconds, you would still
> pass
> 3 ticks to retry_count_tick(), which would cause it to start to
> diverge
> from the actual time it takes for the path to do the configured
> amount
> of retries.
> 
> We could improve retry_count_tick(), so that it actually tracks the
> number of retries you did, but that still leaves the problems when
> either strict_timing isn't on or multipathd isn't a realtime process.
> It seemed easier to just say, the longer you have configured
> multipathd
> to wait to disable queueing, the more the time when it actually
> disables
> queueing can diverge from the configured time.
> 
> The number I picked for the rate of divergence is just a guest, and
> if
> you don't like it, I'm fine with removing it. After all, the whole
> point
> of this patchset is to make sure that the worst thing that can happen
> if
> paths disappear while we are still queueing is that a multipath
> device
> isn't autoremoved.

I was just asking for a rationale. This is fine.

I think we can, and should, improve multipathd's timing. Unfortunately
this has been on my agenda for a long time already.

For 4/5, too:

Reviewed-by: Martin Wilck <mwilck@suse.com>
diff mbox series

Patch

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 6fd4dabb..e2052422 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -901,6 +901,11 @@  sysfs_set_scsi_tmo (struct config *conf, struct multipath *mpp)
 		uint64_t no_path_retry_tmo =
 			(uint64_t)mpp->no_path_retry * conf->checkint;
 
+		/* pad no_path_retry_tmo by one standard check interval
+		 * plus one second per minute to account for timing
+		 * issues with the rechecks */
+		no_path_retry_tmo += no_path_retry_tmo / 60 + DEFAULT_CHECKINT;
+
 		if (no_path_retry_tmo > MAX_DEV_LOSS_TMO)
 			min_dev_loss = MAX_DEV_LOSS_TMO;
 		else