export table lookup: was [PATCH 10/10 v7] nfsd: Allows user un-mounting filesystem where nfsd exports base on

Message ID 20150724094657.0ca793b4@noble (mailing list archive)
State New, archived

Commit Message

NeilBrown July 23, 2015, 11:46 p.m. UTC
On Wed, 22 Jul 2015 11:08:40 -0400 "J. Bruce Fields"
<bfields@fieldses.org> wrote:


> I've had this nagging todo to work out if there are other interesting
> consequences of the fact that the cache is internally keyed on one thing
> and appears to mountd to be keyed on another.  (And that there's a
> complicated many<->many relationship between those two things.)  But I
> haven't gotten to it.  Could be all unlikely corner cases, for all I
> know.

Even corner cases are worth resolving - and you got me interested now:-)

I think the distinction between pathnames and mnt+dentry is not quite
the important one.
I think mnt+dentry is the primary object - it is what a filehandle maps
to and what a pathname maps to.

The problem is that some mnt+dentry pairs do not have matching path
names.  If nfsd gets hold of one of these pairs, it shouldn't try
asking mountd about it because there is no way to ask the question, and
even if there was it isn't clear there is any way for mountd to answer.

I think that nfsd should assume that any such mountpoint is not
exported.

So something vaguely like the patch below (see "Patch") would mean that
if I run

# cd /tmp/a/b/c
# mount --bind /etc /tmp/a
# /bin/pwd

I get

/tmp//(unreachable)/a/b/c

which could be checked for by nfsd to decide that there is no point
asking user-space.  I'm not at all certain that this is a good interface
(or that the code isn't racy) - it is just a proof-of-concept.

We should probably place the (unreachable) at the front rather than in the middle.
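
A minimal user-space sketch of the check nfsd might make on the resulting
path, assuming the marker text from the proof-of-concept patch.  The name
path_is_unreachable() is hypothetical and is not an existing kernel or
nfsd interface.  It uses strstr() so that it works whether the marker ends
up at the front or in the middle of the path:

```c
#include <stdbool.h>
#include <string.h>

/* Marker text taken from the proof-of-concept patch (an assumption). */
#define UNREACHABLE_MARKER "(unreachable)"

/*
 * Hypothetical helper, not an existing kernel function: return true if
 * a path string carries the marker, in which case there is no point
 * asking user-space about it.
 */
bool path_is_unreachable(const char *path)
{
	return path != NULL && strstr(path, UNREACHABLE_MARKER) != NULL;
}
```

One weakness of a plain substring match is that a directory literally
named "(unreachable)" would be a false positive, which is another reason
the interface would need more thought.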

Does that seem like a reasonable approach from your understanding of the problem?

Thanks,
NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

J. Bruce Fields July 24, 2015, 7:48 p.m. UTC | #1
On Fri, Jul 24, 2015 at 09:46:57AM +1000, NeilBrown wrote:
> On Wed, 22 Jul 2015 11:08:40 -0400 "J. Bruce Fields"
> <bfields@fieldses.org> wrote:
> 
> 
> > I've had this nagging todo to work out if there are other interesting
> > consequences of the fact that the cache is internally keyed on one thing
> > and appears to mountd to be keyed on another.  (And that there's a
> > complicated many<->many relationship between those two things.)  But I
> > haven't gotten to it.  Could be all unlikely corner cases, for all I
> > know.
> 
> Even corner cases are worth resolving - and you got me interested now:-)
> 
> I think the distinction between pathnames and mnt+dentry is not quite
> the important one.
> I think mnt+dentry is the primary object - it is what a filehandle maps
> to and what a pathname maps to.
> 
> The problem is that some mnt+dentry pairs do not have matching path
> names.  If nfsd gets hold of one of these pairs, it shouldn't try
> asking mountd about it because there is no way to ask the question, and
> even if there was it isn't clear there is any way for mountd to answer.
> 
> I think that nfsd should assume that any such mountpoint is not
> exported.
> 
> So something vaguely like:
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 5c8ea15e73a5..a0651872ae8e 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2943,6 +2943,12 @@ restart:
>  		if (error)
>  			break;
>  
> +		if (unlikely(d_mountpoint(dentry))) {
> +			struct mount *mounted = __lookup_mnt(vfsmnt, dentry);
> +			if (mounted)
> +				prepend(&bptr, &blen, "//(unreachable)",15);
> +		}
> +
>  		dentry = parent;
>  	}
>  	if (!(seq & 1))
> 
> Would mean that if I
> 
> # cd /tmp/a/b/c
> # mount --bind /etc /tmp/a
> # /bin/pwd
> 
> I get 
> 
> /tmp//(unreachable)/a/b/c
> 
> which could be checked for by nfsd to decide that there is no point asking user-space.
> I'm not at all certain that this is a good interface (or that the code isn't racy) - it is just
> a proof-of-concept.
> 
> We should probably place the (unreachable) at the front rather than in the middle.
> 
> Does that seem like a reasonable approach from your understanding of the problem?

So something like that could give us a way to prevent asking mountd
about mounts that it can't see.

Except when things change: it's possible that a mount that would pass
this test at the time we create the request no longer passes it by the
time we get mountd's reply.

You can tell people not to do that.  It still bugs me to have the
possibility of an unanswerable request.

--b.

NeilBrown July 25, 2015, 12:40 a.m. UTC | #2
On Fri, 24 Jul 2015 15:48:17 -0400 "J. Bruce Fields"
<bfields@fieldses.org> wrote:

> On Fri, Jul 24, 2015 at 09:46:57AM +1000, NeilBrown wrote:

> > Does that seem like a reasonable approach from your understanding of the problem?
> 
> So something like that could give us a way to prevent asking mountd
> about mounts that it can't see.
> 
> Except when things change: it's possible that a mount that would pass
> this test at the time we create the request no longer passes it by the
> time we get mountd's reply.
> 
> You can tell people not to do that.  It still bugs me to have the
> possibility of an unanswerable request.

I can see three general ways to cope with the fact that things could
change between creating a request and receiving the reply:
 - lock to prevent the change
 - refcounts to provide a stable reference to the thing of interest
 - detect change and retry.

These correspond roughly to spinlock, kref, and seqlock.

Trying to prevent changes in the filesystem over an upcall-and-reply is
out of the question.

A refcount could be implemented as a file descriptor, i.e. when nfsd
finds a mountpoint at the end of a 'lookup', it creates a file
descriptor for the target object and passes that to mountd.  mountd
does what it does and sends the reply back with the same file
descriptor.  I think that is needlessly complex.

Detect-change-and-retry is, I think, the best.  The cache already has a
retry mechanism.  It can often detect a change implicitly if it gets
told about some filesystem object that it doesn't really care about.
The only weakness is that it can't currently detect if its question can
no longer be answered...
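
The detect-change-and-retry idea can be sketched as a single-threaded
user-space simulation.  Everything here is hypothetical (struct
mount_table, lookup_with_retry(), the upcall callback) and is not how the
sunrpc cache is actually written.  Real kernel code would need the
seqcount primitives and their memory barriers rather than a plain counter:

```c
#include <stdbool.h>

/* Hypothetical stand-in for mount state; seq is bumped on every change. */
struct mount_table {
	unsigned int seq;
};

/* Hypothetical upcall: asks "is this exported?" and returns the reply. */
typedef bool (*upcall_fn)(struct mount_table *t, void *ctx);

enum lookup_result { LOOKUP_OK, LOOKUP_GAVE_UP };

/*
 * Snapshot the sequence number, make the upcall, and accept the reply
 * only if nothing changed in between; otherwise ask again.
 */
enum lookup_result lookup_with_retry(struct mount_table *t, upcall_fn ask,
				     void *ctx, int max_tries, bool *answer)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned int start = t->seq;	/* state before the upcall */
		bool reply = ask(t, ctx);	/* mounts may change meanwhile */

		if (t->seq == start) {		/* no change: reply is usable */
			*answer = reply;
			return LOOKUP_OK;
		}
	}
	return LOOKUP_GAVE_UP;	/* the question kept changing under us */
}

/* Demo upcall: simulates a mount change racing with the first request. */
bool racy_once_upcall(struct mount_table *t, void *ctx)
{
	int *calls = ctx;

	if ((*calls)++ == 0)
		t->seq++;
	return true;
}
```

With racy_once_upcall() the first reply is discarded because the sequence
number moved, and the second attempt succeeds.  The LOOKUP_GAVE_UP case
is where a request that can no longer be answered would surface.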

I agree that an unanswerable request seems ugly.  But sometimes the
best way to handle races is to let ugly things happen temporarily.

I should probably double-check that the cache will retry the upcall in
a reasonable time frame - I only have a vague recollection of how that
works...

NeilBrown


Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 5c8ea15e73a5..a0651872ae8e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2943,6 +2943,12 @@  restart:
 		if (error)
 			break;
 
+		if (unlikely(d_mountpoint(dentry))) {
+			struct mount *mounted = __lookup_mnt(vfsmnt, dentry);
+			if (mounted)
+				prepend(&bptr, &blen, "//(unreachable)",15);
+		}
+
 		dentry = parent;
 	}
 	if (!(seq & 1))