Message ID | d2076a27c1f3faa0d732e64d49bcbab054cae23b.1566850914.git.bcodding@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | NFSv3: nfs_instantiate() might succeed leaving dentry negative unhashed | expand |
On Mon, 2019-08-26 at 16:24 -0400, Benjamin Coddington wrote: > After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent > multiple > aliases for directory inode") my NFS client crashes while doing > lustre race > tests simultaneously on a local filesystem and the same filesystem > exported > via knfsd: > > BUG: unable to handle kernel NULL pointer dereference at > 0000000000000028 > Call Trace: > ? iput+0x76/0x200 > ? d_splice_alias+0x307/0x3c0 > ? dput.part.31+0x96/0x110 > ? nfs_instantiate+0x45/0x160 [nfs] > nfs3_proc_setacls+0xa/0x20 [nfsv3] > nfs3_proc_create+0x1cc/0x230 [nfsv3] > nfs_create+0x83/0x160 [nfs] > path_openat+0x11aa/0x14d0 > do_filp_open+0x93/0x100 > ? __check_object_size+0xa3/0x181 > do_sys_open+0x184/0x220 > do_syscall_64+0x5b/0x1b0 > entry_SYSCALL_64_after_hwframe+0x65/0xca > > 158 static int __nfs3_proc_setacls(struct inode *inode, struct > posix_acl *acl, > 159 struct posix_acl *dfacl) > 160 { > > > 161 struct nfs_server *server = NFS_SERVER(inode); > > The 0x28 offset is i_sb in struct inode, we passed a NULL inode to > nfs3_proc_setacls(). > > After taking this apart, I find the dentry in R12 has a NULL inode > after > nfs_instantiate(), which makes sense if we move it to the alias just > after > nfs_fhget() (See the referenced commit above). Indeed, on the list > of > children is the identical positive dentry that is left behind after > d_splice_alias(). Moving it would usualy be fine for callers, except > for > NFSv3 because we want the inode pointer to ride the dentry back up > the > stack so we can set ACLs on it and/or set attributes in the case of > EXCLUSIVE. > > A similar problem existed in nfsd_create_locked(), and was fixed by > commit > 3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry > negative > unhashed"). This patch takes the same approach to fixing the > problem: in > the rare case that we lost the race to the dentry, look it up and get > the > inode from there. > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for > directory inode") > Cc: Al Viro <viro@zeniv.linux.org.uk> > --- > fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c > index a3ad2d46fd42..292c53c082f7 100644 > --- a/fs/nfs/nfs3proc.c > +++ b/fs/nfs/nfs3proc.c > @@ -20,6 +20,7 @@ > #include <linux/nfs_mount.h> > #include <linux/freezer.h> > #include <linux/xattr.h> > +#include <linux/namei.h> > > #include "iostat.h" > #include "internal.h" > @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct dentry > *dentry, struct iattr *sattr, > struct posix_acl *default_acl, *acl; > struct nfs3_createdata *data; > int status = -ENOMEM; > + struct dentry *d = NULL; > > dprintk("NFS call create %pd\n", dentry); > > @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct > dentry *dentry, struct iattr *sattr, > if (status != 0) > goto out_release_acls; > > + /* Possible that nfs_instantiate() lost a race to open-by- > fhandle, > + * in which case we don't have a reference to the dentry */ > + if (unlikely(d_unhashed(dentry))) { > + d = lookup_one_len(dentry->d_name.name, dentry- > >d_parent, > + dentry- > >d_name.len); > + if (IS_ERR(d)) { > + status = PTR_ERR(d); > + goto out_release_acls; > + } > + if (unlikely(d_is_negative(d))) { > + status = -ENOENT; > + goto out_put_d; > + } > + dentry = d; > + } > + If this is a consequence of a race in nfs_instantiate, then why are we not fix it there? Won't we otherwise end up having to duplicate the above code in all the other callers? IOW: why not simply modify nfs_instantiate() to return the dentry from d_splice_alias(), much like we already do for nfs_lookup()? Cheers Trond
On 26 Aug 2019, at 16:39, Trond Myklebust wrote: > On Mon, 2019-08-26 at 16:24 -0400, Benjamin Coddington wrote: >> After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent >> multiple >> aliases for directory inode") my NFS client crashes while doing >> lustre race >> tests simultaneously on a local filesystem and the same filesystem >> exported >> via knfsd: >> >> BUG: unable to handle kernel NULL pointer dereference at >> 0000000000000028 >> Call Trace: >> ? iput+0x76/0x200 >> ? d_splice_alias+0x307/0x3c0 >> ? dput.part.31+0x96/0x110 >> ? nfs_instantiate+0x45/0x160 [nfs] >> nfs3_proc_setacls+0xa/0x20 [nfsv3] >> nfs3_proc_create+0x1cc/0x230 [nfsv3] >> nfs_create+0x83/0x160 [nfs] >> path_openat+0x11aa/0x14d0 >> do_filp_open+0x93/0x100 >> ? __check_object_size+0xa3/0x181 >> do_sys_open+0x184/0x220 >> do_syscall_64+0x5b/0x1b0 >> entry_SYSCALL_64_after_hwframe+0x65/0xca >> >> 158 static int __nfs3_proc_setacls(struct inode *inode, struct >> posix_acl *acl, >> 159 struct posix_acl *dfacl) >> 160 { >>>> 161 struct nfs_server *server = NFS_SERVER(inode); >> >> The 0x28 offset is i_sb in struct inode, we passed a NULL inode to >> nfs3_proc_setacls(). >> >> After taking this apart, I find the dentry in R12 has a NULL inode >> after >> nfs_instantiate(), which makes sense if we move it to the alias just >> after >> nfs_fhget() (See the referenced commit above). Indeed, on the list >> of >> children is the identical positive dentry that is left behind after >> d_splice_alias(). Moving it would usualy be fine for callers, except >> for >> NFSv3 because we want the inode pointer to ride the dentry back up >> the >> stack so we can set ACLs on it and/or set attributes in the case of >> EXCLUSIVE. >> >> A similar problem existed in nfsd_create_locked(), and was fixed by >> commit >> 3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry >> negative >> unhashed"). This patch takes the same approach to fixing the >> problem: in >> the rare case that we lost the race to the dentry, look it up and get >> the >> inode from there. >> >> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> >> Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for >> directory inode") >> Cc: Al Viro <viro@zeniv.linux.org.uk> >> --- >> fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++- >> 1 file changed, 21 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c >> index a3ad2d46fd42..292c53c082f7 100644 >> --- a/fs/nfs/nfs3proc.c >> +++ b/fs/nfs/nfs3proc.c >> @@ -20,6 +20,7 @@ >> #include <linux/nfs_mount.h> >> #include <linux/freezer.h> >> #include <linux/xattr.h> >> +#include <linux/namei.h> >> >> #include "iostat.h" >> #include "internal.h" >> @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct dentry >> *dentry, struct iattr *sattr, >> struct posix_acl *default_acl, *acl; >> struct nfs3_createdata *data; >> int status = -ENOMEM; >> + struct dentry *d = NULL; >> >> dprintk("NFS call create %pd\n", dentry); >> >> @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct >> dentry *dentry, struct iattr *sattr, >> if (status != 0) >> goto out_release_acls; >> >> + /* Possible that nfs_instantiate() lost a race to open-by- >> fhandle, >> + * in which case we don't have a reference to the dentry */ >> + if (unlikely(d_unhashed(dentry))) { >> + d = lookup_one_len(dentry->d_name.name, dentry- >>> d_parent, >> + dentry- >>> d_name.len); >> + if (IS_ERR(d)) { >> + status = PTR_ERR(d); >> + goto out_release_acls; >> + } >> + if (unlikely(d_is_negative(d))) { >> + status = -ENOENT; >> + goto out_put_d; >> + } >> + dentry = d; >> + } >> + > > > If this is a consequence of a race in nfs_instantiate, then why are we > not fix it there? Won't we otherwise end up having to duplicate the > above code in all the other callers? > > IOW: why not simply modify nfs_instantiate() to return the dentry from > d_splice_alias(), much like we already do for nfs_lookup()? None of the other callers care about the dentry and it seemed more invasive. It is also an accepted pattern for VFS - that's why Al justified it in b0c6108ecf64. If you'd rather change all the callers, let me know and I can send that. Ben
On Tue, 2019-08-27 at 06:33 -0400, Benjamin Coddington wrote: > On 26 Aug 2019, at 16:39, Trond Myklebust wrote: > > > On Mon, 2019-08-26 at 16:24 -0400, Benjamin Coddington wrote: > > > After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent > > > multiple > > > aliases for directory inode") my NFS client crashes while doing > > > lustre race > > > tests simultaneously on a local filesystem and the same > > > filesystem > > > exported > > > via knfsd: > > > > > > BUG: unable to handle kernel NULL pointer dereference at > > > 0000000000000028 > > > Call Trace: > > > ? iput+0x76/0x200 > > > ? d_splice_alias+0x307/0x3c0 > > > ? dput.part.31+0x96/0x110 > > > ? nfs_instantiate+0x45/0x160 [nfs] > > > nfs3_proc_setacls+0xa/0x20 [nfsv3] > > > nfs3_proc_create+0x1cc/0x230 [nfsv3] > > > nfs_create+0x83/0x160 [nfs] > > > path_openat+0x11aa/0x14d0 > > > do_filp_open+0x93/0x100 > > > ? __check_object_size+0xa3/0x181 > > > do_sys_open+0x184/0x220 > > > do_syscall_64+0x5b/0x1b0 > > > entry_SYSCALL_64_after_hwframe+0x65/0xca > > > > > > 158 static int __nfs3_proc_setacls(struct inode *inode, struct > > > posix_acl *acl, > > > 159 struct posix_acl *dfacl) > > > 160 { > > > > > 161 struct nfs_server *server = NFS_SERVER(inode); > > > > > > The 0x28 offset is i_sb in struct inode, we passed a NULL inode > > > to > > > nfs3_proc_setacls(). > > > > > > After taking this apart, I find the dentry in R12 has a NULL > > > inode > > > after > > > nfs_instantiate(), which makes sense if we move it to the alias > > > just > > > after > > > nfs_fhget() (See the referenced commit above). Indeed, on the > > > list > > > of > > > children is the identical positive dentry that is left behind > > > after > > > d_splice_alias(). Moving it would usualy be fine for callers, > > > except > > > for > > > NFSv3 because we want the inode pointer to ride the dentry back > > > up > > > the > > > stack so we can set ACLs on it and/or set attributes in the case > > > of > > > EXCLUSIVE. > > > > > > A similar problem existed in nfsd_create_locked(), and was fixed > > > by > > > commit > > > 3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry > > > negative > > > unhashed"). This patch takes the same approach to fixing the > > > problem: in > > > the rare case that we lost the race to the dentry, look it up and > > > get > > > the > > > inode from there. > > > > > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > > > Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases > > > for > > > directory inode") > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > --- > > > fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++- > > > 1 file changed, 21 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c > > > index a3ad2d46fd42..292c53c082f7 100644 > > > --- a/fs/nfs/nfs3proc.c > > > +++ b/fs/nfs/nfs3proc.c > > > @@ -20,6 +20,7 @@ > > > #include <linux/nfs_mount.h> > > > #include <linux/freezer.h> > > > #include <linux/xattr.h> > > > +#include <linux/namei.h> > > > > > > #include "iostat.h" > > > #include "internal.h" > > > @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct > > > dentry > > > *dentry, struct iattr *sattr, > > > struct posix_acl *default_acl, *acl; > > > struct nfs3_createdata *data; > > > int status = -ENOMEM; > > > + struct dentry *d = NULL; > > > > > > dprintk("NFS call create %pd\n", dentry); > > > > > > @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct > > > dentry *dentry, struct iattr *sattr, > > > if (status != 0) > > > goto out_release_acls; > > > > > > + /* Possible that nfs_instantiate() lost a race to open-by- > > > fhandle, > > > + * in which case we don't have a reference to the dentry */ > > > + if (unlikely(d_unhashed(dentry))) { > > > + d = lookup_one_len(dentry->d_name.name, dentry- > > > > d_parent, > > > + dentry- > > > > d_name.len); > > > + if (IS_ERR(d)) { > > > + status = PTR_ERR(d); > > > + goto out_release_acls; > > > + } > > > + if (unlikely(d_is_negative(d))) { > > > + status = -ENOENT; > > > + goto out_put_d; > > > + } > > > + dentry = d; > > > + } > > > + > > > > If this is a consequence of a race in nfs_instantiate, then why are > > we > > not fix it there? Won't we otherwise end up having to duplicate the > > above code in all the other callers? > > > > IOW: why not simply modify nfs_instantiate() to return the dentry > > from > > d_splice_alias(), much like we already do for nfs_lookup()? > > None of the other callers care about the dentry and it seemed more > invasive. > It is also an accepted pattern for VFS - that's why Al justified it > in > b0c6108ecf64. It is racy, though. Nothing guarantees that a dentry for that file is still hashed under the same name when you look it up again, so it is better to pass it back directly from the d_splice_alias() call. > If you'd rather change all the callers, let me know and I can send > that. If you'd prefer not to have to change all the callers, then perhaps split the function into two parts: - The inner part that returns the dentry from d_splice_alias() on success, and which can be called directly from nfs3_do_create(). - Then a wrapper that works like nfs_instantiate() by dput()ing the valid dentry (and returning 0) or otherwise converting the ERR_PTR() and returning that.
On 27 Aug 2019, at 7:46, Trond Myklebust wrote: > On Tue, 2019-08-27 at 06:33 -0400, Benjamin Coddington wrote: >> On 26 Aug 2019, at 16:39, Trond Myklebust wrote: >>> If this is a consequence of a race in nfs_instantiate, then why are >>> we >>> not fix it there? Won't we otherwise end up having to duplicate the >>> above code in all the other callers? >>> >>> IOW: why not simply modify nfs_instantiate() to return the dentry >>> from >>> d_splice_alias(), much like we already do for nfs_lookup()? >> >> None of the other callers care about the dentry and it seemed more >> invasive. >> It is also an accepted pattern for VFS - that's why Al justified it >> in >> b0c6108ecf64. > > It is racy, though. Nothing guarantees that a dentry for that file is > still hashed under the same name when you look it up again, so it is > better to pass it back directly from the d_splice_alias() call. > >> If you'd rather change all the callers, let me know and I can send >> that. > > If you'd prefer not to have to change all the callers, then perhaps > split the function into two parts: > - The inner part that returns the dentry from d_splice_alias() on > success, and which can be called directly from nfs3_do_create(). > - Then a wrapper that works like nfs_instantiate() by dput()ing the > valid dentry (and returning 0) or otherwise converting the ERR_PTR() > and returning that. Ok, sounds fine. One thing that strikes me as odd is the d_drop() at the top of nfs_instantiate(). That seems wrong if the next check for positive bypasses the work of hashing it again. Can you give me a hint as to why the paths are that way? Otherwise, I think it should change as: diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 8d501093660f..7720a19b38d3 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1682,11 +1682,12 @@ int nfs_instantiate(struct dentry *dentry, struct nfs_fh *fhandle, struct dentry *d; int error = -EACCES; - d_drop(dentry); - /* We may have been initialized further down */ if (d_really_is_positive(dentry)) goto out; + + d_drop(dentry); if (fhandle->size == 0) { error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, fhandle, fattr, NULL); if (error) Ben
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index a3ad2d46fd42..292c53c082f7 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -20,6 +20,7 @@ #include <linux/nfs_mount.h> #include <linux/freezer.h> #include <linux/xattr.h> +#include <linux/namei.h> #include "iostat.h" #include "internal.h" @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr, struct posix_acl *default_acl, *acl; struct nfs3_createdata *data; int status = -ENOMEM; + struct dentry *d = NULL; dprintk("NFS call create %pd\n", dentry); @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr, if (status != 0) goto out_release_acls; + /* Possible that nfs_instantiate() lost a race to open-by-fhandle, + * in which case we don't have a reference to the dentry */ + if (unlikely(d_unhashed(dentry))) { + d = lookup_one_len(dentry->d_name.name, dentry->d_parent, + dentry->d_name.len); + if (IS_ERR(d)) { + status = PTR_ERR(d); + goto out_release_acls; + } + if (unlikely(d_is_negative(d))) { + status = -ENOENT; + goto out_put_d; + } + dentry = d; + } + /* When we created the file with exclusive semantics, make * sure we set the attributes afterwards. */ if (data->arg.create.createmode == NFS3_CREATE_EXCLUSIVE) { @@ -372,11 +390,13 @@ nfs3_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr, nfs_post_op_update_inode(d_inode(dentry), data->res.fattr); dprintk("NFS reply setattr (post-create): %d\n", status); if (status != 0) - goto out_release_acls; + goto out_put_d; } status = nfs3_proc_setacls(d_inode(dentry), acl, default_acl); +out_put_d: + dput(d); out_release_acls: posix_acl_release(acl); posix_acl_release(default_acl);