[v3,18/19] libmultipath: Don't blank intialized paths

Message ID	1537571127-10143-19-git-send-email-bmarzins@redhat.com (mailing list archive)
State	Not Applicable, archived
Delegated to:	christophe varoqui
Headers	show Return-Path: <dm-devel-bounces@redhat.com> From: "Benjamin Marzinski" <bmarzins@redhat.com> To: device-mapper development <dm-devel@redhat.com> Date: Fri, 21 Sep 2018 18:05:26 -0500 Message-Id: <1537571127-10143-19-git-send-email-bmarzins@redhat.com> In-Reply-To: <1537571127-10143-1-git-send-email-bmarzins@redhat.com> References: <1537571127-10143-1-git-send-email-bmarzins@redhat.com> Cc: Martin Wilck <mwilck@suse.com> Subject: [dm-devel] [PATCH v3 18/19] libmultipath: Don't blank intialized paths Precedence: junk MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com
Series	Misc Multipath patches \| expand [v3,00/19] Misc Multipath patches [v3,01/19] libmultipath: fix tur checker timeout [v3,02/19] libmultipath: fix tur checker double locking [v3,03/19] libmultipath: fix tur memory misuse [v3,04/19] libmultipath: cleanup tur locking [v3,05/19] libmultipath: fix tur checker timeout issue [v3,06/19] libmultipath: fix set_int error path [v3,07/19] libmultipath: fix length issues in get_vpd_sgio [v3,08/19] libmultipath: _install_keyword cleanup [v3,09/19] libmultipath: remove unused code [v3,10/19] libmultipath: fix memory issue in path_latency prio [v3,11/19] libmultipath: fix null dereference int alloc_path_group [v3,12/19] libmutipath: don't use malformed uevents [v3,13/19] multipath: fix max array size in print_cmd_valid [v3,14/19] multipathd: function return value tweaks [v3,15/19] multipathd: minor fixes [v3,16/19] multipathd: remove useless check and fix format [v3,17/19] multipathd: fix memory leak on error in configure [v3,18/19] libmultipath: Don't blank intialized paths [v3,19/19] libmultipath: Fixup updating paths

Benjamin Marzinski Sept. 21, 2018, 11:05 p.m. UTC

When pathinfo fails for some likely transient reason, it clears the path
wwid, but otherwise returns successfully, to keep the path around but
not usable until it gets fully initialized. However, if the path has
already been initialized, and pathinfo hits a transient error, it
shouldn't clear the wwid.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Martin Wilck Oct. 1, 2018, 10 p.m. UTC | #1

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> When pathinfo fails for some likely transient reason, it clears the
> path
> wwid, but otherwise returns successfully, to keep the path around but
> not usable until it gets fully initialized. However, if the path has
> already been initialized, and pathinfo hits a transient error, it
> shouldn't clear the wwid.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 3e0db7f..33815dc 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1991,9 +1991,9 @@ blank:
>  	/*
>  	 * Recoverable error, for example faulty or offline path
>  	 */
> -	memset(pp->wwid, 0, WWID_SIZE);
>  	pp->chkrstate = pp->state = PATH_DOWN;
> -	pp->initialized = INIT_FAILED;
> +	if (pp->initialized == INIT_FAILED)
> +		memset(pp->wwid, 0, WWID_SIZE);
>  
>  	return PATHINFO_OK;
>  }

I am uncertain about this one. The old code sets pp->initialized to
INIT_FAILED. If the state had been INIT_MISSING_UDEV or
INIT_REQUESTED_UDEV before, this patch might change how the code
behaves later in check_path(), where these conditions are checked.

Likewise, tests for strlen(pp->wwid) are used in various places around
the code. These tests would now yield different results for paths in
"recoverable error" state.

Have you considered these possible side effects?

Martin

Martin Wilck Oct. 2, 2018, 10:37 p.m. UTC | #2

Hi Ben,

On Tue, 2018-10-02 at 00:00 +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > When pathinfo fails for some likely transient reason, it clears the
> > path
> > wwid, but otherwise returns successfully, to keep the path around
> > but
> > not usable until it gets fully initialized. However, if the path
> > has
> > already been initialized, and pathinfo hits a transient error, it
> > shouldn't clear the wwid.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 3e0db7f..33815dc 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1991,9 +1991,9 @@ blank:
> >  	/*
> >  	 * Recoverable error, for example faulty or offline path
> >  	 */
> > -	memset(pp->wwid, 0, WWID_SIZE);
> >  	pp->chkrstate = pp->state = PATH_DOWN;
> > -	pp->initialized = INIT_FAILED;
> > +	if (pp->initialized == INIT_FAILED)
> > +		memset(pp->wwid, 0, WWID_SIZE);
> >  
> >  	return PATHINFO_OK;
> >  }
> 
> I am uncertain about this one. The old code sets pp->initialized to
> INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> INIT_REQUESTED_UDEV before, this patch might change how the code
> behaves later in check_path(), where these conditions are checked.
> 
> Likewise, tests for strlen(pp->wwid) are used in various places
> around
> the code. These tests would now yield different results for paths in
> "recoverable error" state.
> 
> Have you considered these possible side effects?

I've pondered over this a lot. The dust is clearing up a bit.

1. With your patch in place, INIT_FAILED is never set except in
alloc_path() (we might rename it to INIT_NEW or the like, but see
below).

2. I don't understand how you handle repeated failure to retrieve the
WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
retrieve the WWID from sysfs after retriggers are exhausted. But I
don't see how pathinfo(DI_WWID) would ever be called in this situation:

In the last invocation, pathinfo() had failed to retrieve the WWID and
set pp->initialized = INIT_MISSING_UDEV. There it will remain because
check_path() won't set it to INIT_REQUESTED_UDEV any more after retries
are exhausted. And now, check_path() won't call pathinfo(DI_ALL) any
more from the "add missing path" code, because of the (pp->initialized
!= INIT_MISSING_UDEV) condition.

Am I overlooking something?

3. If "blank" state means that important device information couldn't be
retrieved because of presumably transient failure conditions, we should
retry to retrieve this information by calling pathinfo again later. But
unless the WWID is (reset to) the empty string, check_path() won't call
pathinfo(DI_ALL) any more.

4. The "blank" logic in pathinfo() combines several very different
cases.
  a) PATH_REMOVED status from path_offline(). This means that
elementary sysfs attributes were missing. This is almost the same as
failure in sysfs_pathinfo(), which results in PATHINFO_FAILED return
status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
around.
  b) Failure in checker_check(). If the path is offline in the first
place, the checker isn't called, and WWID determination is attempted.
But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto "blank"
state.
  c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Both
functions never fail, so this can't happen. I've patches here to fix
that.  
  d) Failure to open pp->fd. 

d) is the only case in which the "blank" logic makes really sense to
me. It can happen only at the first pathinfo() invocation, meaning 
pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your patch
would change nothing for this case.

a) and b) can happen for paths that have been initialized already. I
think in case a) the WWID should be reset, probably initialized should
be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In case
b) we should IMO proceed normally rather than goto "blank". Resetting
the WWID in case b) is nonsense, agreed.

Altogether, if my analysis is correct, your patch (not blanking the
WWID) should be applied to case b) only.

Please comment - I still feel a bit confused and may have overlooked
something essential.

Regards
Martin

Benjamin Marzinski Oct. 5, 2018, 7:38 p.m. UTC | #3

On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:
> Hi Ben,
> 
> > I am uncertain about this one. The old code sets pp->initialized to
> > INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> > INIT_REQUESTED_UDEV before, this patch might change how the code
> > behaves later in check_path(), where these conditions are checked.
> > 
> > Likewise, tests for strlen(pp->wwid) are used in various places
> > around
> > the code. These tests would now yield different results for paths in
> > "recoverable error" state.
> > 
> > Have you considered these possible side effects?
> 
> I've pondered over this a lot. The dust is clearing up a bit.
> 
> 1. With your patch in place, INIT_FAILED is never set except in
> alloc_path() (we might rename it to INIT_NEW or the like, but see
> below).

Correct. The idea is that INIT_FAILED means that the path has never been
fully initialized and that it is missing more than simply the wwid
(which would set INIT_MISSING_UDEV instead).

> 2. I don't understand how you handle repeated failure to retrieve the
> WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
> retrieve the WWID from sysfs after retriggers are exhausted. But I
> don't see how pathinfo(DI_WWID) would ever be called in this situation:
>
> In the last invocation, pathinfo() had failed to retrieve the WWID and
> set pp->initialized = INIT_MISSING_UDEV. There it will remain because
> check_path() won't set it to INIT_REQUESTED_UDEV any more after retries
> are exhausted. And now, check_path() won't call pathinfo(DI_ALL) any
> more from the "add missing path" code, because of the (pp->initialized
> != INIT_MISSING_UDEV) condition.
> 
> Am I overlooking something?

Your analysis looks right. This shouldn't have anything to do with the
correctness of the not blanking initialized paths, but we should fix it
anyway.  Right now, as you mentioned, if multipathd fails to get the
wwid after retrigger_tries new uevents, it only has one shot at the
failback methods, and then mutipathd cannot use that path in the future.
However, it will still check if the path is up or down, which is pretty
pointless. It should either

1. set pp->intialized to INIT_OK, to keep it from being checked at all
in the future, or

2. try to grab the missing pathinfo, so that it can use the path.

The benefit of doing the first is that mutipathd doesn't keep messing
with paths that really aren't set up to be multipathed. On the other
hand it might ignore some paths that really should be multipathed.
perhaps the best idea is to keep trying to get the missing info, but
just increase pp->tick (possibly progressively) so we try less often.

> 3. If "blank" state means that important device information couldn't be
> retrieved because of presumably transient failure conditions, we should
> retry to retrieve this information by calling pathinfo again later. But
> unless the WWID is (reset to) the empty string, check_path() won't call
> pathinfo(DI_ALL) any more.

But if we set the wwid at some point in time, that means we got all the
necessary information.  My argument is that if something happens to
cause a later pathinfo() call to fail, we shouldn't pretend like we
didn't already get this info.

> 4. The "blank" logic in pathinfo() combines several very different
> cases.
>   a) PATH_REMOVED status from path_offline(). This means that
> elementary sysfs attributes were missing. This is almost the same as
> failure in sysfs_pathinfo(), which results in PATHINFO_FAILED return
> status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
> around.

True, but I've always assumed that this means that we will be getting a
uevent to remove this path momentarily, so I doesn't really matter what
we do with it. We could blank the path here, but if we already go the
necessary info, it seems like just setting the path to PATH_DOWN, and
waiting for the remove uevent should be fine.

>   b) Failure in checker_check(). If the path is offline in the first
> place, the checker isn't called, and WWID determination is attempted.
> But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto "blank"
> state.

This is clearly the wrong thing to do.

>   c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Both
> functions never fail, so this can't happen. I've patches here to fix
> that.

I just ignored this state because it was impossible. But even with your
patches, my argument remains the same. If we already got this info,
failing to get it again shouldn't cause us to delete the wwid. We should
probably remember the existing values, to make sure that we don't lose
them if the calls fail for some reason.

>   d) Failure to open pp->fd. 

This is clearly the right thing to do.

> d) is the only case in which the "blank" logic makes really sense to
> me. It can happen only at the first pathinfo() invocation, meaning 
> pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your patch
> would change nothing for this case.
> 
> a) and b) can happen for paths that have been initialized already. I
> think in case a) the WWID should be reset, probably initialized should
> be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In case
> b) we should IMO proceed normally rather than goto "blank". Resetting
> the WWID in case b) is nonsense, agreed.
> 
> Altogether, if my analysis is correct, your patch (not blanking the
> WWID) should be applied to case b) only.

I don't really care about blanking the wwid in case a). I just think
it's unnecessary.  Why do you think we should blank it in case c) if we
have previously gotten those values?

-Ben

> Please comment - I still feel a bit confused and may have overlooked
> something essential.
> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Martin Wilck Oct. 8, 2018, 9:35 a.m. UTC | #4

On Fri, 2018-09-21 at 18:05 -0500,  Benjamin Marzinski  wrote:
> When pathinfo fails for some likely transient reason, it clears the
> path
> wwid, but otherwise returns successfully, to keep the path around but
> not usable until it gets fully initialized. However, if the path has
> already been initialized, and pathinfo hits a transient error, it
> shouldn't clear the wwid.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

After follow-up discussion:

Reviewed-by: Martin Wilck <mwilck@suse.com>


> ---
>  libmultipath/discovery.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 3e0db7f..33815dc 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1991,9 +1991,9 @@ blank:
>  	/*
>  	 * Recoverable error, for example faulty or offline path
>  	 */
> -	memset(pp->wwid, 0, WWID_SIZE);
>  	pp->chkrstate = pp->state = PATH_DOWN;
> -	pp->initialized = INIT_FAILED;
> +	if (pp->initialized == INIT_FAILED)
> +		memset(pp->wwid, 0, WWID_SIZE);
>  
>  	return PATHINFO_OK;
>  }

Martin Wilck Oct. 8, 2018, 9:41 a.m. UTC | #5

On Fri, 2018-10-05 at 14:38 -0500,  Benjamin Marzinski  wrote:
> On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:
> > Hi Ben,
> > 
> > > I am uncertain about this one. The old code sets pp->initialized
> > > to
> > > INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> > > INIT_REQUESTED_UDEV before, this patch might change how the code
> > > behaves later in check_path(), where these conditions are
> > > checked.
> > > 
> > > Likewise, tests for strlen(pp->wwid) are used in various places
> > > around
> > > the code. These tests would now yield different results for paths
> > > in
> > > "recoverable error" state.
> > > 
> > > Have you considered these possible side effects?
> > 
> > I've pondered over this a lot. The dust is clearing up a bit.
> > 
> > 1. With your patch in place, INIT_FAILED is never set except in
> > alloc_path() (we might rename it to INIT_NEW or the like, but see
> > below).
> 
> Correct. The idea is that INIT_FAILED means that the path has never
> been
> fully initialized and that it is missing more than simply the wwid
> (which would set INIT_MISSING_UDEV instead).
> 
> > 2. I don't understand how you handle repeated failure to retrieve
> > the
> > WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
> > retrieve the WWID from sysfs after retriggers are exhausted. But I
> > don't see how pathinfo(DI_WWID) would ever be called in this
> > situation:
> > 
> > In the last invocation, pathinfo() had failed to retrieve the WWID
> > and
> > set pp->initialized = INIT_MISSING_UDEV. There it will remain
> > because
> > check_path() won't set it to INIT_REQUESTED_UDEV any more after
> > retries
> > are exhausted. And now, check_path() won't call pathinfo(DI_ALL)
> > any
> > more from the "add missing path" code, because of the (pp-
> > >initialized
> > != INIT_MISSING_UDEV) condition.
> > 
> > Am I overlooking something?
> 
> Your analysis looks right. This shouldn't have anything to do with
> the
> correctness of the not blanking initialized paths, but we should fix
> it
> anyway.  Right now, as you mentioned, if multipathd fails to get the
> wwid after retrigger_tries new uevents, it only has one shot at the
> failback methods, and then mutipathd cannot use that path in the
> future.
> However, it will still check if the path is up or down, which is
> pretty
> pointless. It should either
> 
> 1. set pp->intialized to INIT_OK, to keep it from being checked at
> all
> in the future, or

That would be lying to ourselves. I'd rather call it INIT_BROKEN or so,
or maybe even delete the path from pathvec altogether.

[ While we're at this: we call pathinfo with DI_ALL from 
configure->path_discovery(), but with DI_ALL|DI_BLACKLIST from
uev_add_path(). As a result, paths discovered in uevents won't be added
to pathvec in the first place if they're blacklisted, while blacklisted
paths discovered at startup will be in the pathvec, but orphaned. Is
that intentional, and if yes, why ? ]

> 2. try to grab the missing pathinfo, so that it can use the path.
> 
> The benefit of doing the first is that mutipathd doesn't keep messing
> with paths that really aren't set up to be multipathed. On the other
> hand it might ignore some paths that really should be multipathed.
> perhaps the best idea is to keep trying to get the missing info, but
> just increase pp->tick (possibly progressively) so we try less often.

The latter sounds overly complex to me, in particular as in the daemon,
we're only looking at paths retrieved via udev anyway, as you pointed
out in your other post. So perhaps INIT_MISSING_UDEV is just about the
right state for this. We may just consider changing uev_update_path()
such that the path is passed to uev_add_path() not only for
INIT_REQUESTED_UDEV, but also for INIT_MISSING_UDEV state.

> 
> > 3. If "blank" state means that important device information
> > couldn't be
> > retrieved because of presumably transient failure conditions, we
> > should
> > retry to retrieve this information by calling pathinfo again later.
> > But
> > unless the WWID is (reset to) the empty string, check_path() won't
> > call
> > pathinfo(DI_ALL) any more.
> 
> But if we set the wwid at some point in time, that means we got all
> the
> necessary information.  My argument is that if something happens to
> cause a later pathinfo() call to fail, we shouldn't pretend like we
> didn't already get this info.

The question is whether the problem encountered means that the earlier
determined WWID is still reliable. If the path is entirely gone from
sysfs (case a) below), I wouldn't bet on that. In this case we rely on
the assumption that we'll receive an uevent later - likely but not
guaranteed.

OTOH, by blanking the WWID, we forfeit the opportunity to detect a WWID
change later - with that in mind, blanking is actually wrong in this
situation, and your patch should be an improvement.

>  
> > 4. The "blank" logic in pathinfo() combines several very different
> > cases.
> >   a) PATH_REMOVED status from path_offline(). This means that
> > elementary sysfs attributes were missing. This is almost the same
> > as
> > failure in sysfs_pathinfo(), which results in PATHINFO_FAILED
> > return
> > status; but for PATH_REMOVED we return PATHINFO_OK and keep the
> > path
> > around.
> 
> True, but I've always assumed that this means that we will be getting
> a
> uevent to remove this path momentarily, so I doesn't really matter
> what
> we do with it. We could blank the path here, but if we already go the
> necessary info, it seems like just setting the path to PATH_DOWN, and
> waiting for the remove uevent should be fine.

Perhaps. I just wanted to point out the different treatment of two very
similar scenarios.

> >   b) Failure in checker_check(). If the path is offline in the
> > first
> > place, the checker isn't called, and WWID determination is
> > attempted.
> > But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto
> > "blank"
> > state.
> 
> This is clearly the wrong thing to do.
> 
> >   c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo().
> > Both
> > functions never fail, so this can't happen. I've patches here to
> > fix
> > that.
> 
> I just ignored this state because it was impossible. But even with
> your
> patches, my argument remains the same. If we already got this info,
> failing to get it again shouldn't cause us to delete the wwid. We
> should
> probably remember the existing values, to make sure that we don't
> lose
> them if the calls fail for some reason.

My patches wouldn't change a thing here. They just turn these two
functions to void type.

> 
> >   d) Failure to open pp->fd. 
> 
> This is clearly the right thing to do.
> 
> > d) is the only case in which the "blank" logic makes really sense
> > to
> > me. It can happen only at the first pathinfo() invocation, meaning 
> > pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your
> > patch
> > would change nothing for this case.
> > 
> > a) and b) can happen for paths that have been initialized already.
> > I
> > think in case a) the WWID should be reset, probably initialized
> > should
> > be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In
> > case
> > b) we should IMO proceed normally rather than goto "blank".
> > Resetting
> > the WWID in case b) is nonsense, agreed.
> > 
> > Altogether, if my analysis is correct, your patch (not blanking the
> > WWID) should be applied to case b) only.
> 
> I don't really care about blanking the wwid in case a). I just think
> it's unnecessary.  Why do you think we should blank it in case c) if
> we
> have previously gotten those values?

Summarizing, 

 a) you convinced me about that one,
 b) your patch is good, but we may need additional changes in a follow-
up patch,
 c) doesn't happen,
 d) I agreed in the first place.

And my concern about INIT_MISSING_UDEV was orthogonal to your patch
anyway. So I'll give your patch a reviewed-by. Sorry for the noise. I
guess you'll concede the handling of path states in multipath-tools is
non-trivial. 

Martin

Benjamin Marzinski Oct. 9, 2018, 10:20 p.m. UTC | #6

On Mon, Oct 08, 2018 at 11:41:35AM +0200, Martin Wilck wrote:
> On Fri, 2018-10-05 at 14:38 -0500,  Benjamin Marzinski  wrote:
> > On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:

 
> > Your analysis looks right. This shouldn't have anything to do with
> > the
> > correctness of the not blanking initialized paths, but we should fix
> > it
> > anyway.  Right now, as you mentioned, if multipathd fails to get the
> > wwid after retrigger_tries new uevents, it only has one shot at the
> > failback methods, and then mutipathd cannot use that path in the
> > future.
> > However, it will still check if the path is up or down, which is
> > pretty
> > pointless. It should either
> > 
> > 1. set pp->intialized to INIT_OK, to keep it from being checked at
> > all
> > in the future, or
> 
> That would be lying to ourselves. I'd rather call it INIT_BROKEN or so,
> or maybe even delete the path from pathvec altogether.
> 
> [ While we're at this: we call pathinfo with DI_ALL from 
> configure->path_discovery(), but with DI_ALL|DI_BLACKLIST from
> uev_add_path(). As a result, paths discovered in uevents won't be added
> to pathvec in the first place if they're blacklisted, while blacklisted
> paths discovered at startup will be in the pathvec, but orphaned. Is
> that intentional, and if yes, why ? ]

I really don't know why configure() works this way. The call to
filter_path() just after path_discovery() will remove the blacklisted
paths, so they won't actually end up in the pathvec. But it does seem
pretty pointless to not just check if they are blacklisted while
initializing them.

> > 2. try to grab the missing pathinfo, so that it can use the path.
> > 
> > The benefit of doing the first is that mutipathd doesn't keep messing
> > with paths that really aren't set up to be multipathed. On the other
> > hand it might ignore some paths that really should be multipathed.
> > perhaps the best idea is to keep trying to get the missing info, but
> > just increase pp->tick (possibly progressively) so we try less often.
> 
> The latter sounds overly complex to me, in particular as in the daemon,
> we're only looking at paths retrieved via udev anyway, as you pointed
> out in your other post. So perhaps INIT_MISSING_UDEV is just about the
> right state for this. We may just consider changing uev_update_path()
> such that the path is passed to uev_add_path() not only for
> INIT_REQUESTED_UDEV, but also for INIT_MISSING_UDEV state.

The thought behind not just using uev_update_path() is that it doesn't
get called very predicably. There's a chance that the path may become
usable, but we won't get any event for it.  There's also a chance that a
path that will never have a wwid (and thus shouldn't be multipathed)
will be part of a uevent storm, and multipath will do a bunch of
pointless work.  If we do the checks in the checkerloop(), we can
control exactly how often we check these paths.
 
> 
> Summarizing, 
> 
>  a) you convinced me about that one,
>  b) your patch is good, but we may need additional changes in a follow-
> up patch,
>  c) doesn't happen,
>  d) I agreed in the first place.
> 
> And my concern about INIT_MISSING_UDEV was orthogonal to your patch
> anyway. So I'll give your patch a reviewed-by. Sorry for the noise. I
> guess you'll concede the handling of path states in multipath-tools is
> non-trivial. 

Sure. We do need to figure out what to do when we really can't get a
wwid for the device, but that can happen in a different patchset.

-Ben
 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

[v3,18/19] libmultipath: Don't blank intialized paths

Commit Message

Comments

Patch