diff mbox series

[v2,23/24] domap(): never return DOMAP_RETRY in daemon mode

Message ID 20181203193624.21870-6-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show
Series multipath-tools: improve logging at -v3 | expand

Commit Message

Martin Wilck Dec. 3, 2018, 7:36 p.m. UTC
DOMAP_RETRY is only used by the multipath tool. multipathd always treats
it exactly like DOMAP_FAIL. Simplify logic by only returning
DOMAP_RETRY in non-daemon mode from domap().

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/configure.c | 28 +++++++++++++---------------
 multipathd/main.c        |  9 +--------
 2 files changed, 14 insertions(+), 23 deletions(-)

Comments

Benjamin Marzinski Dec. 3, 2018, 11:45 p.m. UTC | #1
On Mon, Dec 03, 2018 at 08:36:23PM +0100, Martin Wilck wrote:
> DOMAP_RETRY is only used by the multipath tool. multipathd always treats
> it exactly like DOMAP_FAIL. Simplify logic by only returning
> DOMAP_RETRY in non-daemon mode from domap().
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/configure.c | 28 +++++++++++++---------------
>  multipathd/main.c        |  9 +--------
>  2 files changed, 14 insertions(+), 23 deletions(-)
> 
> diff --git a/libmultipath/configure.c b/libmultipath/configure.c
> index 84ae5f56..1c549817 100644
> --- a/libmultipath/configure.c
> +++ b/libmultipath/configure.c
> @@ -831,7 +831,7 @@ int domap(struct multipath *mpp, char *params, int is_daemon)
>  		if (lock_multipath(mpp, 1)) {
>  			condlog(3, "%s: failed to create map (in use)",
>  				mpp->alias);
> -			return DOMAP_RETRY;
> +			return is_daemon ? DOMAP_FAIL : DOMAP_RETRY;
>  		}
>  
>  		sysfs_set_max_sectors_kb(mpp, 0);
> @@ -1110,20 +1110,18 @@ int coalesce_paths (struct vectors * vecs, vector newmp, char * refwwid,
>  
>  		r = domap(mpp, params, is_daemon);
>  
> -		if (r == DOMAP_FAIL || r == DOMAP_RETRY) {
> -			condlog(3, "%s: domap (%u) failure "
> -				   "for create/reload map",
> -				mpp->alias, r);
> -			if (r == DOMAP_FAIL || is_daemon) {
> -				condlog(2, "%s: %s map",
> -					mpp->alias, (mpp->action == ACT_CREATE)?
> -					"ignoring" : "removing");
> -				remove_map(mpp, vecs, 0);
> -				continue;
> -			} else /* if (r == DOMAP_RETRY && !is_daemon) */ {
> -				ret = CP_RETRY;
> -				goto out;
> -			}
> +		if (r == DOMAP_RETRY) {
> +			/* This happens in non-daemon case only */
> +			ret = CP_RETRY;
> +			goto out;
> +		}
> +
> +		if (r == DOMAP_FAIL) {
> +			condlog(2, "%s: domap failure, %s map",
> +				mpp->alias, (mpp->action == ACT_CREATE) ?
> +				"ignoring" : "removing");
> +			remove_map(mpp, vecs, 0);
> +			continue;
>  		}
>  		if (r == DOMAP_DRY)
>  			continue;
> diff --git a/multipathd/main.c b/multipathd/main.c
> index d0e7107c..1bf3c481 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -998,15 +998,8 @@ rescan:
>  	/*
>  	 * reload the map for the multipath mapped device
>  	 */
> -retry:
>  	ret = domap(mpp, params, 1);
> -	if (ret == DOMAP_FAIL || ret == DOMAP_RETRY) {
> -		if (ret < 0 && retries-- > 0) {
> -			condlog(0, "%s: retry domap for addition of new "
> -				"path %s", mpp->alias, pp->dev);
> -			sleep(1);
> -			goto retry;
> -		}
> +	if (ret == DOMAP_FAIL) {

I'm kind of conflicted about this change.  According to the commit
message (commit 1c50cd32), Hannes put this code in to deal with lock
contention on the paths (back when we used an exclusive lock in
lock_multipath(), and actually saw contention). I don't know of any
purpose for lock_multipath(), so removing this code probably won't hurt
anything. But if we believe that we need lock_multipath(), then we
should probably retry here (unless you have some understanding of
lock_multipath()'s purpose where retrying won't help).

If we do keep lock_multipath(), I definitely don't see a reason to retry
here if we got DOMAP_FAIL instead of DOMAP_RETRY, which means that
multipathd needs to continue to distingush between them.  On the other
hand, if we decide we don't need lock_multipath(), and thus DOMAP_RETRY,
then there's no point in coalesce_paths returning symbolic return
values, since we only have one non-failure return value.

Probably the most important question I have is "Does anyone know why we
lock the paths?" I believe that the code originally existed to keep
multipath and multipathd from trying to load a device at the same time.
It ended up causing problems with udev, because that tries to grab a
shared lock on the path while working on it.  When multipath switched to
using a shared lock (commit 5ec07b3a), I agreed based on the rational
that udev must be protecting against something, and we could need
protecting against the same thing. The discussion is at:

https://www.redhat.com/archives/dm-devel/2016-May/msg00009.html

But we've always seemed to be one step away from just removing this
code. I would be nice to either determine what we do need it for, or that we
don't need it at all.

-Ben

>  		condlog(0, "%s: failed in domap for addition of new "
>  			"path %s", mpp->alias, pp->dev);
>  		/*
> -- 
> 2.19.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Martin Wilck Dec. 9, 2018, 9:06 p.m. UTC | #2
On Mon, 2018-12-03 at 17:45 -0600,  Benjamin Marzinski  wrote:
> On Mon, Dec 03, 2018 at 08:36:23PM +0100, Martin Wilck wrote:
> > DOMAP_RETRY is only used by the multipath tool. multipathd always
> > treats
> > it exactly like DOMAP_FAIL. Simplify logic by only returning
> > DOMAP_RETRY in non-daemon mode from domap().
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  libmultipath/configure.c | 28 +++++++++++++---------------
> >  multipathd/main.c        |  9 +--------
> >  2 files changed, 14 insertions(+), 23 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index d0e7107c..1bf3c481 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -998,15 +998,8 @@ rescan:
> >  	/*
> >  	 * reload the map for the multipath mapped device
> >  	 */
> > -retry:
> >  	ret = domap(mpp, params, 1);
> > -	if (ret == DOMAP_FAIL || ret == DOMAP_RETRY) {
> > -		if (ret < 0 && retries-- > 0) {
> > -			condlog(0, "%s: retry domap for addition of new
> > "
> > -				"path %s", mpp->alias, pp->dev);
> > -			sleep(1);
> > -			goto retry;
> > -		}
> > +	if (ret == DOMAP_FAIL) {
> 
> I'm kind of conflicted about this change.

Right, I shot over the top here. I was irritated by the fact that 
"ret < 0" translates to "ret == DOMAP_RETRY" and that therefore
DOMAP_RETRY is tested twice.

>  According to the commit
> message (commit 1c50cd32), Hannes put this code in to deal with lock
> contention on the paths (back when we used an exclusive lock in
> lock_multipath(), and actually saw contention). I don't know of any
> purpose for lock_multipath(), so removing this code probably won't
> hurt
> anything. But if we believe that we need lock_multipath(), then we
> should probably retry here (unless you have some understanding of
> lock_multipath()'s purpose where retrying won't help).
>
> [...]
> Probably the most important question I have is "Does anyone know why
> we > Probably the most important question I have is "Does anyone know
> why we lock the paths?

TL;DR: I'm 99.7% sure we don't need lock_multipath() any more.

The historic reason is 4d7a270:

    'Multiple multipath(8) execs can race with udev storm.
    
    We can simulate this with the following :
    "multipath -F; /sbin/multipath 8:16 & /sbin/multipath 8:32"
    
    Problem arise when two runs are about to create the same map.
    One will fail, leaving us with a choice : abord or retry.'

This commit was made at a time (October 2005) when multipath was called
directly from udev rules to set up maps. Earlier versions of multipath
had a general locking that would not allow multiple multipath commands
to run in parallel, but that has been removed later. This was an
attempt to lock only (would-be) members of one specific map.

Obviously, the goal of this patch wouldn't be achieved any more since
the lock has been change to non-exclusive (1c50cd32). Multiple
multipath instances run happily on members of the same set now. I
haven't tested it, but I believe the historic race "/sbin/multipath
8:16 & /sbin/multipath 8:32" still exists; just we don't run multipath
this way from udev rules any more. 

lock_multipath() doesn't help us void this race, as we can't go back to
exclusive locking. If we want to avoid it, we could create a lock file
from the WWID before calling domap(), /dev/shm/multipath/$WWID.lock or
so. Or we could use a SYSV semaphore set.

> I believe that the code originally existed to keep
> multipath and multipathd from trying to load a device at the same
> time.
> It ended up causing problems with udev, because that tries to grab a
> shared lock on the path while working on it.  When multipath switched
> to
> using a shared lock (commit 5ec07b3a), I agreed based on the rational
> that udev must be protecting against something, and we could need
> protecting against the same thing. The discussion is at:
> 
> https://www.redhat.com/archives/dm-devel/2016-May/msg00009.html
> 

The message of udev commit 3ebdb81, which introduced "device ownership"
in udev, says just "udev: serialize/synchronize block device event
handling with file locks". A comment says "this establishes a concept
of device "ownership" to serialize device access. External processes
holding a "write lock" will cause udev to skip the event handling; in
the case udev acquired the lock, the external process will block until
udev has finished its event handling".

Because multipathd is an "external process" from udev's point of view,
this would mean that our old way of using a write lock was correct (at
least, it was what the udev author intended). But udev gets this really
wrong, it should itself block or retry if the device is locked, rather
than aborting the event processing.

Note that both dm and md devices are excluded from udev's locking
anyway, because the udev people encountered problems with this scheme.

Martin
Benjamin Marzinski Dec. 11, 2018, 5:41 p.m. UTC | #3
On Sun, Dec 09, 2018 at 10:06:05PM +0100, Martin Wilck wrote:
> On Mon, 2018-12-03 at 17:45 -0600,  Benjamin Marzinski  wrote:
> 
> TL;DR: I'm 99.7% sure we don't need lock_multipath() any more.
> 
> The historic reason is 4d7a270:
> 
>     'Multiple multipath(8) execs can race with udev storm.
>     
>     We can simulate this with the following :
>     "multipath -F; /sbin/multipath 8:16 & /sbin/multipath 8:32"
>     
>     Problem arise when two runs are about to create the same map.
>     One will fail, leaving us with a choice : abord or retry.'
> 
> This commit was made at a time (October 2005) when multipath was called
> directly from udev rules to set up maps. Earlier versions of multipath
> had a general locking that would not allow multiple multipath commands
> to run in parallel, but that has been removed later. This was an
> attempt to lock only (would-be) members of one specific map.
> 
> Obviously, the goal of this patch wouldn't be achieved any more since
> the lock has been change to non-exclusive (1c50cd32). Multiple
> multipath instances run happily on members of the same set now. I
> haven't tested it, but I believe the historic race "/sbin/multipath
> 8:16 & /sbin/multipath 8:32" still exists; just we don't run multipath
> this way from udev rules any more. 
> 
> lock_multipath() doesn't help us void this race, as we can't go back to
> exclusive locking. If we want to avoid it, we could create a lock file
> from the WWID before calling domap(), /dev/shm/multipath/$WWID.lock or
> so. Or we could use a SYSV semaphore set.

I vote for removing lock_multipath(). Personally, I've never seen anyone
report the anything that looks like a multiple creation race since we've
changed the locking to shared, so I'm fine with leaving it out, but I
certainly wouldn't NAK a patch that added useful locking back in.

-Ben

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 84ae5f56..1c549817 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -831,7 +831,7 @@  int domap(struct multipath *mpp, char *params, int is_daemon)
 		if (lock_multipath(mpp, 1)) {
 			condlog(3, "%s: failed to create map (in use)",
 				mpp->alias);
-			return DOMAP_RETRY;
+			return is_daemon ? DOMAP_FAIL : DOMAP_RETRY;
 		}
 
 		sysfs_set_max_sectors_kb(mpp, 0);
@@ -1110,20 +1110,18 @@  int coalesce_paths (struct vectors * vecs, vector newmp, char * refwwid,
 
 		r = domap(mpp, params, is_daemon);
 
-		if (r == DOMAP_FAIL || r == DOMAP_RETRY) {
-			condlog(3, "%s: domap (%u) failure "
-				   "for create/reload map",
-				mpp->alias, r);
-			if (r == DOMAP_FAIL || is_daemon) {
-				condlog(2, "%s: %s map",
-					mpp->alias, (mpp->action == ACT_CREATE)?
-					"ignoring" : "removing");
-				remove_map(mpp, vecs, 0);
-				continue;
-			} else /* if (r == DOMAP_RETRY && !is_daemon) */ {
-				ret = CP_RETRY;
-				goto out;
-			}
+		if (r == DOMAP_RETRY) {
+			/* This happens in non-daemon case only */
+			ret = CP_RETRY;
+			goto out;
+		}
+
+		if (r == DOMAP_FAIL) {
+			condlog(2, "%s: domap failure, %s map",
+				mpp->alias, (mpp->action == ACT_CREATE) ?
+				"ignoring" : "removing");
+			remove_map(mpp, vecs, 0);
+			continue;
 		}
 		if (r == DOMAP_DRY)
 			continue;
diff --git a/multipathd/main.c b/multipathd/main.c
index d0e7107c..1bf3c481 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -998,15 +998,8 @@  rescan:
 	/*
 	 * reload the map for the multipath mapped device
 	 */
-retry:
 	ret = domap(mpp, params, 1);
-	if (ret == DOMAP_FAIL || ret == DOMAP_RETRY) {
-		if (ret < 0 && retries-- > 0) {
-			condlog(0, "%s: retry domap for addition of new "
-				"path %s", mpp->alias, pp->dev);
-			sleep(1);
-			goto retry;
-		}
+	if (ret == DOMAP_FAIL) {
 		condlog(0, "%s: failed in domap for addition of new "
 			"path %s", mpp->alias, pp->dev);
 		/*