[3.6-rc3] rdirplus broken? (EBUSY)

Message ID 20120912212035.GB28555@hostway.ca (mailing list archive)
State New, archived

Commit Message

Simon Kirby Sept. 12, 2012, 9:20 p.m. UTC
On Wed, Sep 12, 2012 at 08:16:13AM -0400, J. Bruce Fields wrote:

> On Tue, Sep 11, 2012 at 12:25:23PM -0700, Simon Kirby wrote:
> > On Mon, Aug 27, 2012 at 02:55:10PM -0700, Simon Kirby wrote:
> > 
> > > Something seems broken in 3.6-rc[123] which was fine in 3.5 and earlier.
> > > This is a 3.4.1 knfsd server with ext3 and XFS-based NFS exports:
> > > 
> > > /	192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > > /pics	192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > > /raid	192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > > 
> > > and a 3.6-rc3 client with this in fstab:
> > > 
> > > flick:/		/flick		nfs	rw,vers=3
> > > flick:/raid	/flick/raid	nfs	rw,vers=3
> > > flick:/pics	/flick/pics	nfs	rw,vers=3
> > > 
> > > This seems to fail now as follows:
> > > 
> > > [sroot@oof:/]# mount flick
> > > [sroot@oof:/]# mount flick/raid
> > > [sroot@oof:/]# mount flick/pics
> > > [sroot@oof:/]# ls -l flick
> > > ls: cannot access flick/pics: Device or resource busy
> > > ls: cannot access flick/raid: Device or resource busy
> > > total 2180
> > > drwxr-xr-x  45 root root    4096 Jun 18 14:19 ./
> > > drwxr-xr-x  58 root root    4096 Jul  3 22:24 ../
> > > ...
> > > ??????????   ? ?    ?          ?            ? pics
> > > ??????????   ? ?    ?          ?            ? raid
> > > ...
> > > [sroot@oof:/]# cd flick/pics
> > > flick/pics: Device or resource busy.
> > > 
> > > These mount points are now stuck and cannot be unmounted until
> > > I reboot (umount -l fails with EBUSY).
> > > 
> > > If I mount with "nordirplus", I can't seem to get it to break. However,
> > > sometimes it will work regardless. I can bisect this if it would help..
> > 
> > This is still the case with 3.6-rc5. I hadn't noticed any problem since
> > mounting with nordirplus, and it broke immediately after removing the
> > option again. I will bisect.
> 
> The symptoms sound similar to
> http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2
> 
> Might be worth checking whether it's that patch?

Indeed! I tried this hack (see the patch at the end of this page):

With this applied, "ls -l flick" prints:

[   77.217420] VFS: __d_move()ing a d_mountpoint(), uh oh
[   77.222390] VFS: __d_move()ing a d_mountpoint(), uh oh

...and "pics" and "raid" then work as they did before, or with "nordirplus"
set. So, is something broken with nordirplus or the NFS layer, or should
__d_unalias() really move a mountpoint? With nordirplus, it works without
complaining about moving a mountpoint.
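The question above comes down to one branch, so here is a small user-space model of it. This is only a sketch: `struct model_dentry`, `unalias_unpatched()` and `unalias_hacked()` are hypothetical names, not kernel code, and the `-EBUSY` return for the unpatched case is an assumption inferred from the "Device or resource busy" errors reported in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry; NOT the kernel's struct dentry. */
struct model_dentry {
	const char *name;
	bool is_mountpoint;	/* models d_mountpoint(alias) being true */
	bool moved;		/* set when the modelled __d_move() runs */
};

/*
 * Models the unpatched branch: the alias is moved only when it is not a
 * mountpoint. Returning -EBUSY otherwise is an assumption based on the
 * symptoms in this thread.
 */
static int unalias_unpatched(struct model_dentry *alias)
{
	assert(alias != NULL);
	if (!alias->is_mountpoint) {
		alias->moved = true;
		return 0;
	}
	return -EBUSY;
}

/*
 * Models the hack: the alias is always moved, and *warnings counts how
 * often the "uh oh" message would have been printed.
 */
static int unalias_hacked(struct model_dentry *alias, int *warnings)
{
	assert(alias != NULL && warnings != NULL);
	if (alias->is_mountpoint)
		(*warnings)++;
	alias->moved = true;
	return 0;
}
```

In this model a mountpoint alias such as "pics" fails in the unpatched variant and succeeds with one warning in the hacked variant, which matches the two warning lines seen for "pics" and "raid". It says nothing about whether moving a mountpoint is actually safe in the kernel.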

Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Simon Kirby Sept. 20, 2012, 12:13 a.m. UTC | #1
On Wed, Sep 12, 2012 at 02:20:35PM -0700, Simon Kirby wrote:

> On Wed, Sep 12, 2012 at 08:16:13AM -0400, J. Bruce Fields wrote:
> 
> > The symptoms sound similar to
> > http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2
> > 
> > Might be worth checking whether it's that patch?
> 
> Indeed! I tried this hack:
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 8086636..649a112 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2404,6 +2404,10 @@ out_unalias:
>  	if (likely(!d_mountpoint(alias))) {
>  		__d_move(alias, dentry);
>  		ret = alias;
> +	} else {
> +		printk(KERN_WARNING "VFS: __d_move()ing a d_mountpoint(), uh oh\n");
> +		__d_move(alias, dentry);
> +		ret = alias;
>  	}
>  out_err:
>  	spin_unlock(&inode->i_lock);
> 
> With this applied, "ls -l flick" prints:
> 
> [   77.217420] VFS: __d_move()ing a d_mountpoint(), uh oh
> [   77.222390] VFS: __d_move()ing a d_mountpoint(), uh oh
> 
> ...and "pics" and "raid" then work as they did before, or with "nordirplus"
> set. So, is something broken with nordirplus or the NFS layer, or should
> __d_unalias() really move a mountpoint? With nordirplus, it works without
> complaining about moving a mountpoint.

By the way, this seems fixed in 3.6-rc6, likely due to
c3f52af3e03013db5237e339c817beaae5ec9e3a. Thanks!
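To check whether a given kernel still shows the symptom without reading "ls -l" output by eye, something like the following could be used. It is a minimal sketch that mimics what "ls -l" does (read the directory, then stat each entry without following symlinks) and counts entries that fail with EBUSY. The function name `count_busy_entries()` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Returns the number of entries in dir_path whose stat fails with EBUSY,
 * or -1 if the directory itself cannot be opened.
 */
static int count_busy_entries(const char *dir_path)
{
	DIR *d = opendir(dir_path);
	struct dirent *e;
	int busy = 0;

	if (d == NULL)
		return -1;

	while ((e = readdir(d)) != NULL) {
		struct stat st;

		/* Same kind of call "ls -l" makes for each directory entry. */
		if (fstatat(dirfd(d), e->d_name, &st, AT_SYMLINK_NOFOLLOW) == -1 &&
		    errno == EBUSY) {
			fprintf(stderr, "%s/%s: Device or resource busy\n",
				dir_path, e->d_name);
			busy++;
		}
	}
	closedir(d);
	return busy;
}
```

Calling `count_busy_entries("/flick")` after mounting all three exports would be expected to return 2 on an affected client (for "pics" and "raid") and 0 otherwise. Since the failure was reported as intermittent, a single clean run does not prove the problem is gone.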

Simon-

Simon Kirby Oct. 3, 2012, 1:17 a.m. UTC | #2
On Wed, Sep 19, 2012 at 05:13:03PM -0700, Simon Kirby wrote:

> On Wed, Sep 12, 2012 at 02:20:35PM -0700, Simon Kirby wrote:
> 
> > On Wed, Sep 12, 2012 at 08:16:13AM -0400, J. Bruce Fields wrote:
> > 
> > > The symptoms sound similar to
> > > http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2
> > > 
> > > Might be worth checking whether it's that patch?
> > 
> > Indeed! I tried this hack:
> > 
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 8086636..649a112 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -2404,6 +2404,10 @@ out_unalias:
> >  	if (likely(!d_mountpoint(alias))) {
> >  		__d_move(alias, dentry);
> >  		ret = alias;
> > +	} else {
> > +		printk(KERN_WARNING "VFS: __d_move()ing a d_mountpoint(), uh oh\n");
> > +		__d_move(alias, dentry);
> > +		ret = alias;
> >  	}
> >  out_err:
> >  	spin_unlock(&inode->i_lock);
> > 
> > With this applied, "ls -l flick" prints:
> > 
> > [   77.217420] VFS: __d_move()ing a d_mountpoint(), uh oh
> > [   77.222390] VFS: __d_move()ing a d_mountpoint(), uh oh
> > 
> > ...and "pics" and "raid" then work as they did before, or with "nordirplus"
> > set. So, is something broken with nordirplus or the NFS layer, or should
> > __d_unalias() really move a mountpoint? With nordirplus, it works without
> > complaining about moving a mountpoint.
> 
> By the way, this seems fixed in 3.6-rc6, likely due to
> c3f52af3e03013db5237e339c817beaae5ec9e3a. Thanks!

I confused myself with my own patch here. This still happens to me in
release 3.6, making it impossible for me to use these NFS mounts unless
I set "nordirplus" or apply my above call-__d_move-anyway patch.
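Since the workaround depends on "nordirplus" actually being in effect, it can help to confirm the option on the live mount rather than trusting fstab. Below is a minimal sketch using glibc's getmntent() and hasmntopt(). The helper name `nfs_mount_has_nordirplus()` is hypothetical, and it assumes the NFS client lists "nordirplus" among the options in the mount table (for example /proc/mounts) when the option is set.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Scans an already-open mount table for an NFS mount on dir.
 * Returns 1 if it is mounted with nordirplus, 0 if it is mounted
 * without it, and -1 if no NFS mount on dir was found.
 * When several entries match, the last one in the table wins.
 */
static int nfs_mount_has_nordirplus(FILE *tab, const char *dir)
{
	struct mntent *m;
	int result = -1;

	assert(tab != NULL && dir != NULL);

	while ((m = getmntent(tab)) != NULL) {
		if (strcmp(m->mnt_dir, dir) != 0)
			continue;
		/* Matches "nfs" and "nfs4" filesystem types. */
		if (strncmp(m->mnt_type, "nfs", 3) != 0)
			continue;
		result = hasmntopt(m, "nordirplus") != NULL;
	}
	return result;
}
```

A caller would typically open the table with `setmntent("/proc/mounts", "r")`, pass the result along with a mount directory such as "/flick/pics", and close it with `endmntent()`.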

I'm also getting file data corruption when mounted over TCP, for some stupid
reason, even with all TSO/GSO/GRO disabled, and this goes away with UDP.
Reproducible on different client hardware, and on client kernels back to
2.6.32. Probably related to the 3.4.1 server. More debugging to do...

Simon-

J. Bruce Fields Aug. 27, 2013, 3:33 p.m. UTC | #3
On Tue, Oct 02, 2012 at 06:17:53PM -0700, Simon Kirby wrote:
> I'm also getting file data corruption when mounted over TCP, for some stupid
> reason, even with all TSO/GSO/GRO disabled, and this goes away with UDP.
> Reproducible on different client hardware, and on client kernels back to
> 2.6.32. Probably related to the 3.4.1 server. More debugging to do...

Is this still happening?

--b.

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 8086636..649a112 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2404,6 +2404,10 @@  out_unalias:
 	if (likely(!d_mountpoint(alias))) {
 		__d_move(alias, dentry);
 		ret = alias;
+	} else {
+		printk(KERN_WARNING "VFS: __d_move()ing a d_mountpoint(), uh oh\n");
+		__d_move(alias, dentry);
+		ret = alias;
 	}
 out_err:
 	spin_unlock(&inode->i_lock);