[RFC] NFSD: fix cannot umounting mount points under pseudo root

On Fri, 1 May 2015 09:29:53 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Fri, May 01, 2015 at 01:08:26PM +1000, NeilBrown wrote:
> > On Fri, 1 May 2015 03:29:40 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > 
> > > On Fri, May 01, 2015 at 12:23:33PM +1000, NeilBrown wrote:
> > > > > What kind of consistency warranties do callers expect, BTW?  You do realize
> > > > > that between iterate_dir() and callbacks an entry might have been removed
> > > > > and/or replaced?
> > > > 
> > > > For READDIR_PLUS, lookup_one_len is called on each name and it requires
> > > > i_mutex, so the code currently holds i_mutex over the whole sequence.
> > > > This is triggering a deadlock.
> > > 
> > > Yes, I've seen the context.  However, you are _not_ holding it between
> > > actual iterate_dir() and those callbacks, which opens a window when
> > > directory might have been changed.
> > > 
> > > Again, what kind of consistency is expected by callers?  Are they ready to
> > > cope with "there's no such entry anymore" or "inumber is nothing like
> > > what we'd put in ->ino, since it's not the same object" or "->d_type is
> > > completely unrelated to what we'd found, since the damn thing had been
> > > removed and created from scratch"?
> > 
> > Ah, sorry.
> > 
> > Yes, the callers are prepared for "there's no such entry anymore".
> > They don't use d_type, so don't care if it might be meaningless.
> > NFSv4 doesn't use ino either, but NFSv3 does and isn't properly cautious
> > about ino changing.
> > 
> > In nfs3xdr, we should probably pass 'ino' to encode_entryplus_baggage() and
> > thence to compose_entry_fh() and it should report failure if
> > dchild->d_inode->i_ino doesn't match.
> 
> Just to make sure I understand the concern..... So it shouldn't really
> be a problem if readdir and lookup find different objects for the same
> name, the problem is just when we mix attributes from the two objects,
> right?  Looks like the v3 code could return an inode number derived from
> the readdir and a filehandle from the lookup, which is a problem.  The
> v4 code will get everything from the result of the lookup, which should
> be OK.

That agrees with my understanding, yes.

I did wonder for a little while about the possibility of a directory
containing both 'a' and 'b', and NFSv4 doing the readdir and the stat of 'a',
and then a "mv a b" happening before the stat of 'b'.

Then the readdir response will show both 'a' and 'b' referring to the same
object with a link count of 1.

I can't quite decide if that is a problem or not.
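
The sequence can be replayed in userspace. This is only a sketch under
stated assumptions: a scratch directory under /tmp, POSIX calls standing
in for the server-side readdir and per-entry stat, and a hypothetical
function name replay_rename_race(). It performs the steps in the order
described: stat 'a', rename 'a' to 'b', stat 'b'.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical replay of the race (illustration only).
 * Fills *st_a with the stat of 'a' taken before the rename and *st_b with
 * the stat of 'b' taken after it. Returns 0 on success, -1 on any error.
 */
static int replay_rename_race(struct stat *st_a, struct stat *st_b)
{
    char dir[] = "/tmp/readdir-race-XXXXXX";  /* assumed scratch location */
    int dfd, fd, ret = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        goto out_rmdir;

    /* The directory starts out holding two distinct files, 'a' and 'b'. */
    fd = openat(dfd, "a", O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        goto out_close;
    close(fd);
    fd = openat(dfd, "b", O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        goto out_unlink;
    close(fd);

    /* "readdir" has already returned both names; now stat 'a' ... */
    if (fstatat(dfd, "a", st_a, AT_SYMLINK_NOFOLLOW) != 0)
        goto out_unlink;
    /* ... "mv a b" slips in ... */
    if (renameat(dfd, "a", dfd, "b") != 0)
        goto out_unlink;
    /* ... and the stat of 'b' finds the object that used to be 'a'. */
    if (fstatat(dfd, "b", st_b, AT_SYMLINK_NOFOLLOW) == 0)
        ret = 0;

out_unlink:
    unlinkat(dfd, "a", 0);  /* may fail with ENOENT after the rename */
    unlinkat(dfd, "b", 0);
out_close:
    close(dfd);
out_rmdir:
    rmdir(dir);
    return ret;
}
```

On success the two stat results carry the same st_ino while each reports
st_nlink of 1, which is the combination a client would see in the readdir
response.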

> 
> > Simply not returning the extra attributes is perfectly acceptable in NFSv3.
> 
> Right, so no big deal anyway.
> 
> --b.

Not a big deal, but we should really add a patch like the following ("like"
as in "actually compile tested and documented" which this one isn't).

NeilBrown

> 
> > So it looks like we are mostly OK here - we don't really need i_mutex to be
> > held for very long.
> > 
> > NeilBrown
> > 
>
