Regression with initramfs and nfsroot (appears to be in the dcache)

Message ID 20121130020047.GA4939@ZenIV.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Al Viro Nov. 30, 2012, 2 a.m. UTC
On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
> > 	Very interesting.  Do you have anything mounted on the corresponding
> > directories on server?  The picture looks like you are getting empty
> > fhandles in readdir+ response for exactly the same directories that happen
> > to be mountpoints on client.  In any case, we shouldn't do that blind
> > d_drop() - empty fhandles can happen.  The only remaining question is
> > why do they happen on that set of entries.  From my reading of
> > encode_entryplus_baggage() it looks like we have compose_entry_fh()
> > failing for those entries and those entries alone.  One possible cause
> > would be d_mountpoint(dchild) being true on server.  If it is true, we
> > can declare the case closed; if not, I really wonder what's going on.
> 
> Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace.
> 
> Unmounting those directories on the server does appear to stop the WARN_ON from triggering.

OK, that settles it.  WARN_ON() and printks in the area can be dropped;
the right fix is below.  However, there's a similar place in cifs that
also needs to be dealt with and I really, really wonder why the hell do
we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
bug, but I would like to understand what's wrong with simply returning
0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
of unhashing, etc. itself.  Would make have_submounts() in there pointless
as well - we could just return 0 and let d_invalidate() take care of the
checks...  Trond?
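
The alternative floated above can be sketched as a small userspace model. This is a toy, not kernel code: toy_dentry, toy_revalidate(), toy_d_invalidate() and toy_lookup() are hypothetical names, and the single "busy" flag is an assumption standing in for whatever checks d_invalidate() performs. The point is only the division of labour: ->d_revalidate() reports staleness by returning 0, and the caller decides whether the dentry may be unhashed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry; every name here is hypothetical. */
struct toy_dentry {
	bool hashed;	/* still reachable through name lookup */
	bool stale;	/* what the toy ->d_revalidate() will report */
	bool busy;	/* assumption: stands in for d_invalidate()'s checks */
};

enum toy_result { TOY_USE_CACHED, TOY_NEED_LOOKUP };

/* Toy ->d_revalidate(): returns 0 for "stale" and never unhashes by itself. */
int toy_revalidate(const struct toy_dentry *d)
{
	assert(d != NULL);
	return d->stale ? 0 : 1;
}

/* Toy d_invalidate(): refuses with -EBUSY instead of dropping blindly. */
int toy_d_invalidate(struct toy_dentry *d)
{
	assert(d != NULL);
	if (!d->hashed)
		return 0;
	if (d->busy)
		return -EBUSY;
	d->hashed = false;
	return 0;
}

/* Toy caller: the only place that decides whether to unhash. */
enum toy_result toy_lookup(struct toy_dentry *d)
{
	if (toy_revalidate(d) > 0)
		return TOY_USE_CACHED;
	if (toy_d_invalidate(d) == 0)
		return TOY_NEED_LOOKUP;	/* unhashed: look the name up again */
	return TOY_USE_CACHED;		/* busy: keep the existing dentry */
}
```

In this model a stale but busy dentry stays hashed and keeps being used, while a stale idle one is unhashed and looked up again, without the filesystem method needing a have_submounts() check of its own.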

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Patrick McLean Nov. 30, 2012, 2:33 a.m. UTC | #1
On 29/11/12 06:00 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
>>> 	Very interesting.  Do you have anything mounted on the corresponding
>>> directories on server?  The picture looks like you are getting empty
>>> fhandles in readdir+ response for exactly the same directories that happen
>>> to be mountpoints on client.  In any case, we shouldn't do that blind
>>> d_drop() - empty fhandles can happen.  The only remaining question is
>>> why do they happen on that set of entries.  From my reading of
>>> encode_entryplus_baggage() it looks like we have compose_entry_fh()
>>> failing for those entries and those entries alone.  One possible cause
>>> would be d_mountpoint(dchild) being true on server.  If it is true, we
>>> can declare the case closed; if not, I really wonder what's going on.
>>
>> Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace.
>>
>> Unmounting those directories on the server does appear to stop the WARN_ON from triggering.
> 
> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  			goto out;
>  		} else {
> -			d_drop(dentry);
> +			if (d_invalidate(dentry) != 0)
> +				goto out;
>  			dput(dentry);
>  		}
>  	}

Excellent, thanks. Is there any chance this will make it to 3.7? Also we might want to cc stable@ on this as well since it is a regression in 3.6.

Al Viro Nov. 30, 2012, 4:11 a.m. UTC | #2
On Thu, Nov 29, 2012 at 06:33:53PM -0800, Patrick McLean wrote:

> Excellent, thanks. Is there any chance this will make it to 3.7? Also we might want to cc stable@ on this as well since it is a regression in 3.6.

Definitely.  I've dropped that into vfs.git#for-linus and vfs.git#for-next
and tomorrow to Linus it goes...

Trond Myklebust Nov. 30, 2012, 1:58 p.m. UTC | #3
On Fri, 2012-11-30 at 02:00 +0000, Al Viro wrote:
> On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
> > > 	Very interesting.  Do you have anything mounted on the corresponding
> > > directories on server?  The picture looks like you are getting empty
> > > fhandles in readdir+ response for exactly the same directories that happen
> > > to be mountpoints on client.  In any case, we shouldn't do that blind
> > > d_drop() - empty fhandles can happen.  The only remaining question is
> > > why do they happen on that set of entries.  From my reading of
> > > encode_entryplus_baggage() it looks like we have compose_entry_fh()
> > > failing for those entries and those entries alone.  One possible cause
> > > would be d_mountpoint(dchild) being true on server.  If it is true, we
> > > can declare the case closed; if not, I really wonder what's going on.
> > 
> > Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace.
> > 
> > Unmounting those directories on the server does appear to stop the WARN_ON from triggering.
> 
> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?

The reason for the choice of d_drop over d_invalidate() is the d_count
checks. It really doesn't matter whether or not the client thinks it has
users for a directory if the server is telling you that it is ESTALE. So
we force a d_drop to prevent further lookups from finding it.

IOW: It is there in order to fix the case where the user does
'rmdir("foo"); mkdir("foo")' on the server.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
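
The d_count argument above can be illustrated with a minimal userspace sketch. It is a toy model, not the kernel implementation: toy_dentry, toy_d_drop() and toy_d_invalidate() are hypothetical names, and the exact refusal condition (a directory with more than one reference) is an assumption about what "the d_count checks" amount to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry; every name here is hypothetical. */
struct toy_dentry {
	int  refcount;	/* stand-in for d_count */
	bool is_dir;
	bool hashed;	/* true while name lookups can still find it */
};

/* Toy d_drop(): unhash unconditionally, whoever is still using it. */
void toy_d_drop(struct toy_dentry *d)
{
	assert(d != NULL);
	d->hashed = false;
}

/*
 * Toy d_invalidate(): assumed to refuse when a directory still has
 * other users, leaving it hashed and visible to later lookups.
 */
int toy_d_invalidate(struct toy_dentry *d)
{
	assert(d != NULL);
	if (!d->hashed)
		return 0;
	if (d->refcount > 1 && d->is_dir)
		return -EBUSY;
	d->hashed = false;
	return 0;
}
```

With "foo" held open on the client (refcount 2) and replaced on the server by rmdir plus mkdir, toy_d_invalidate() returns -EBUSY and the stale entry stays findable; toy_d_drop() removes it regardless, which is the behaviour Trond describes wanting for ESTALE.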
Simon Kirby Dec. 1, 2012, 2:18 a.m. UTC | #4
On Fri, Nov 30, 2012 at 02:00:48AM +0000, Al Viro wrote:

> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  			goto out;
>  		} else {
> -			d_drop(dentry);
> +			if (d_invalidate(dentry) != 0)
> +				goto out;
>  			dput(dentry);
>  		}
>  	}

Hello,

With your previous patch (with the WARN_ON), I hit the WARN_ON() in the
test case described here: https://patchwork.kernel.org/patch/1446851/ .
The __d_move()ing mountpoint case no longer hits, and there is no longer
an EBUSY, so this seems to work for me (in 3.6, where it broke).

Simon-

Al Viro Dec. 1, 2012, 9:40 p.m. UTC | #5
On Fri, Nov 30, 2012 at 01:58:18PM +0000, Myklebust, Trond wrote:

> The reason for the choice of d_drop over d_invalidate() is the d_count
> checks. It really doesn't matter whether or not the client thinks it has
> users for a directory if the server is telling you that it is ESTALE. So
> we force a d_drop to prevent further lookups from finding it.
> 
> IOW: It is there in order to fix the case where the user does
> 'rmdir("foo"); mkdir("foo")' on the server.

You do realize that your have_submounts() check in there is inherently
racy, right?
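
The message does not spell the race out. One plausible reading is the usual check-then-act window: something gets mounted between the have_submounts() test and the d_drop() that follows it. The sketch below replays that interleaving deterministically in userspace; all names (toy_dentry, toy_have_submounts(), toy_mount_on(), toy_drop_unless_mounted(), the "between" hook) are hypothetical and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry; every name here is hypothetical. */
struct toy_dentry {
	bool hashed;
	bool has_submounts;
};

/* Hook type used to simulate another task running inside the window. */
typedef void (*toy_hook)(struct toy_dentry *);

bool toy_have_submounts(const struct toy_dentry *d)
{
	return d->has_submounts;
}

/* Simulated concurrent mount on top of the dentry. */
void toy_mount_on(struct toy_dentry *d)
{
	d->has_submounts = true;
}

void toy_d_drop(struct toy_dentry *d)
{
	d->hashed = false;
}

/* Check-then-act: the check and the drop are not one atomic step. */
void toy_drop_unless_mounted(struct toy_dentry *d, toy_hook between)
{
	assert(d != NULL);
	if (toy_have_submounts(d))
		return;			/* check says: leave it alone */
	if (between != NULL)
		between(d);		/* window: another task runs here */
	toy_d_drop(d);			/* act on a now-outdated answer */
}
```

Passing toy_mount_on as the hook leaves a dentry that has a submount and is nevertheless unhashed, which is exactly the state the check was meant to rule out.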

Patch

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -450,7 +450,8 @@  void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
 			nfs_refresh_inode(dentry->d_inode, entry->fattr);
 			goto out;
 		} else {
-			d_drop(dentry);
+			if (d_invalidate(dentry) != 0)
+				goto out;
 			dput(dentry);
 		}
 	}
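
For readers who want to see the effect of the one-line change in isolation, here is a small userspace model of the patched branch. It is a sketch under stated assumptions, not the code in fs/nfs/dir.c: toy_dentry, toy_prime_old() and toy_prime_new() are hypothetical, "same_file" stands in for the result of the file-handle comparison that precedes this branch, and treating "is a mountpoint" as the reason toy_d_invalidate() fails is a simplification of the real checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry; every name here is hypothetical. */
struct toy_dentry {
	bool hashed;
	bool is_mountpoint;	/* assumption: what makes invalidation fail */
};

enum toy_outcome { TOY_REFRESHED, TOY_KEPT, TOY_REPLACED };

void toy_d_drop(struct toy_dentry *d)
{
	d->hashed = false;
}

int toy_d_invalidate(struct toy_dentry *d)
{
	if (!d->hashed)
		return 0;
	if (d->is_mountpoint)
		return -EBUSY;
	d->hashed = false;
	return 0;
}

/* Before the patch: a mismatching entry is unhashed unconditionally. */
enum toy_outcome toy_prime_old(struct toy_dentry *d, bool same_file)
{
	assert(d != NULL);
	if (same_file)
		return TOY_REFRESHED;
	toy_d_drop(d);
	return TOY_REPLACED;
}

/* After the patch: a failed invalidation leaves the dentry alone. */
enum toy_outcome toy_prime_new(struct toy_dentry *d, bool same_file)
{
	assert(d != NULL);
	if (same_file)
		return TOY_REFRESHED;
	if (toy_d_invalidate(d) != 0)
		return TOY_KEPT;	/* corresponds to "goto out" */
	return TOY_REPLACED;
}
```

Feeding both functions a hashed mountpoint with same_file == false (the empty-fhandle case from the thread) shows the difference: the old path unhashes it, the new path keeps it.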