diff mbox

vfs: Fix, simpliy and stop using d_dname for the /proc/*/ns/* files.

Message ID 87bnhedhw5.fsf_-_@x220.int.ebiederm.org (mailing list archive)
State New, archived
Headers show

Commit Message

Eric W. Biederman May 20, 2015, 10:05 p.m. UTC
Omar Sandoval <osandov@osandov.com> writes:

> On Fri, May 15, 2015 at 08:58:20PM +0200, Ivan Delalande wrote:
>> Hi,
>> 
>> I’m still struggling to understand all the relations between the VFS
>> structures and at which point they get initialized precisely. Do you
>> have any idea how to fix this problem?

The relationships in this case are quite odd and can be simplified.

> I'm attaching a minimal script that reproduces this on 4.1-rc4. I'm
> taking a look, but Eric or Al will probably figure this out before I get
> the chance :)

This works for me, can you confirm it works for you folks as well?

Thank you,
Eric


From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Wed, 20 May 2015 16:39:00 -0500
Subject: [PATCH] vfs: Fix, simpliy and stop using d_dname for the /proc/*/ns/* files.

Now that these files are no longer in proc we can stop playing games
with d_alloc_pseudo and d_dname.  This change causes a couple of user
visible changes:

- Opening /proc/*/ns/* and then readlink on /proc/self/fd/N now sees a
  prepended / in the filename.

- /proc/mountinfo now gives useful information on what is mounted.

- Bind mounting /proc/*/ns/*, opening the mounted file, removing the
  bind mount, and readlink on /proc/self/fd/N now sees / (as it should)
  instead of triggering a warning and a kernel stack backtrace from
  prepend_path.

Cc: stable@vger.kernel.org
Reported-by: Ivan Delalande <colona@arista.com>
Reported-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/dcache.c |  7 +------
 fs/nsfs.c   | 19 +++++--------------
 2 files changed, 6 insertions(+), 20 deletions(-)

Comments

Omar Sandoval May 20, 2015, 11 p.m. UTC | #1
On Wed, May 20, 2015 at 05:05:30PM -0500, Eric W. Biederman wrote:
> This works for me, can you confirm it works for you folks as well?
> 
> Thank you,
> Eric

Yup, neither reproducer produces the warning anymore.
Reported-and-Tested-by: Omar Sandoval <osandov@osandov.com>

> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Wed, 20 May 2015 16:39:00 -0500
> Subject: [PATCH] vfs: Fix, simpliy and stop using d_dname for the /proc/*/ns/* files.
> 
> Now that these files are no longer in proc we can stop playing games
> with d_alloc_pseudo and d_dname.  This change causes a couple of user
> visible changes:
> 
> - Opening /proc/*/ns/* and then readlink on /proc/self/fd/N now sees a
>   prepended / in the filename.
> 
> - /proc/mountinfo now gives useful information on what is mounted.
> 
> - Bind mounting /proc/*/ns/*, opening the mounted file, removing the
>   bind mount, and readlink on /proc/self/fd/N now sees / (as it should)

What's the rationale for "/" in this case? (I'm not disagreeing, just
looking to learn something.)

Thanks,
Ivan Delalande May 20, 2015, 11:08 p.m. UTC | #2
On Wed, May 20, 2015 at 05:05:30PM -0500, Eric W. Biederman wrote:
> Omar Sandoval <osandov@osandov.com> writes:
>> I'm attaching a minimal script that reproduces this on 4.1-rc4. I'm
>> taking a look, but Eric or Al will probably figure this out before I get
>> the chance :)
> 
> This works for me, can you confirm it works for you folks as well?

Yeah, your patch works with our setup and when running our initial
test-case as well. Thanks a lot Eric.

> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Wed, 20 May 2015 16:39:00 -0500
> Subject: [PATCH] vfs: Fix, simpliy and stop using d_dname for the /proc/*/ns/* files.
> 
> Now that these files are no longer in proc we can stop playing games
> with d_alloc_pseudo and d_dname.  This change causes a couple of user
> visible changes:
> 
> - Opening /proc/*/ns/* and then readlink on /proc/self/fd/N now sees a
>   prepended / in the filename.
> 
> - /proc/mountinfo now gives useful information on what is mounted.
> 
> - Bind mounting /proc/*/ns/*, opening the mounted file, removing the
>   bind mount, and readlink on /proc/self/fd/N now sees / (as it should)
>   instead of triggering a warning and a kernel stack backtrace from
>   prepend_path.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Ivan Delalande <colona@arista.com>
> Reported-by: Omar Sandoval <osandov@osandov.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Tested-by: Ivan Delalande <colona@arista.com>
Al Viro May 20, 2015, 11:23 p.m. UTC | #3
On Wed, May 20, 2015 at 05:05:30PM -0500, Eric W. Biederman wrote:

> -	dentry = d_alloc_pseudo(mnt->mnt_sb, &qname);
> +	dentry = d_alloc_name(mnt->mnt_root, name);
>  	if (!dentry) {
>  		iput(inode);
>  		mntput(mnt);
>  		return ERR_PTR(-ENOMEM);
>  	}
> -	d_instantiate(dentry, inode);
> +	d_add(dentry, inode);

Careful - that might have non-trivial effects.  Namely, you are making
the root dentry of that sucker a contention point and adding to hash
pollution...  It's probably not going to cause visible problems, but
it's worth profiling just to be sure.

Besides, you are violating a bunch of rules here - several hashed
children of the same directory with the same name all at once...
not nice.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ivan Delalande June 3, 2015, 8:51 p.m. UTC | #4
On Thu, May 21, 2015 at 12:23:25AM +0100, Al Viro wrote:
> On Wed, May 20, 2015 at 05:05:30PM -0500, Eric W. Biederman wrote:
> > -	dentry = d_alloc_pseudo(mnt->mnt_sb, &qname);
> > +	dentry = d_alloc_name(mnt->mnt_root, name);
> >  	if (!dentry) {
> >  		iput(inode);
> >  		mntput(mnt);
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> > -	d_instantiate(dentry, inode);
> > +	d_add(dentry, inode);
> 
> Careful - that might have non-trivial effects.  Namely, you are making
> the root dentry of that sucker a contention point and adding to hash
> pollution...  It's probably not going to cause visible problems, but
> it's worth profiling just to be sure.
> 
> Besides, you are violating a bunch of rules here - several hashed
> children of the same directory with the same name all at once...
> not nice.

Hey Eric, did you have any thought about Al’s concerns?

Thanks,
Eric W. Biederman June 6, 2015, 7:27 p.m. UTC | #5
Ivan Delalande <colona@arista.com> writes:

> On Thu, May 21, 2015 at 12:23:25AM +0100, Al Viro wrote:
>> On Wed, May 20, 2015 at 05:05:30PM -0500, Eric W. Biederman wrote:
>> > -	dentry = d_alloc_pseudo(mnt->mnt_sb, &qname);
>> > +	dentry = d_alloc_name(mnt->mnt_root, name);
>> >  	if (!dentry) {
>> >  		iput(inode);
>> >  		mntput(mnt);
>> >  		return ERR_PTR(-ENOMEM);
>> >  	}
>> > -	d_instantiate(dentry, inode);
>> > +	d_add(dentry, inode);
>> 
>> Careful - that might have non-trivial effects.  Namely, you are making
>> the root dentry of that sucker a contention point and adding to hash
>> pollution...  It's probably not going to cause visible problems, but
>> it's worth profiling just to be sure.
>> 
>> Besides, you are violating a bunch of rules here - several hashed
>> children of the same directory with the same name all at once...
>> not nice.
>
> Hey Eric, did you have any thought about Al’s concerns?

Massive difference in perspective for the most part.  It did cause me to
step back and really look at what that code is doing and why.

For the immediate problem the issue is that the WARN_ON is warning about
something nothing in the kernel has done for 5 years and in practice as
you have seen is actually wrong.  So deleting the warning message
appears the best way to handle the situation you are seeing.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ivan Delalande June 6, 2015, 8:08 p.m. UTC | #6
On Sat, Jun 06, 2015 at 02:27:24PM -0500, Eric W. Biederman wrote:
> Ivan Delalande <colona@arista.com> writes:
> 
> > On Thu, May 21, 2015 at 12:23:25AM +0100, Al Viro wrote:
> >> On Wed, May 20, 2015 at 05:05:30PM -0500, Eric W. Biederman wrote:
> >> > -	dentry = d_alloc_pseudo(mnt->mnt_sb, &qname);
> >> > +	dentry = d_alloc_name(mnt->mnt_root, name);
> >> >  	if (!dentry) {
> >> >  		iput(inode);
> >> >  		mntput(mnt);
> >> >  		return ERR_PTR(-ENOMEM);
> >> >  	}
> >> > -	d_instantiate(dentry, inode);
> >> > +	d_add(dentry, inode);
> >> 
> >> Careful - that might have non-trivial effects.  Namely, you are making
> >> the root dentry of that sucker a contention point and adding to hash
> >> pollution...  It's probably not going to cause visible problems, but
> >> it's worth profiling just to be sure.
> >> 
> >> Besides, you are violating a bunch of rules here - several hashed
> >> children of the same directory with the same name all at once...
> >> not nice.
> >
> > Hey Eric, did you have any thought about Al’s concerns?
> 
> Massive difference in perspective for the most part.  It did cause me to
> step back and really look at what that code is doing and why.
> 
> For the immediate problem the issue is that the WARN_ON is warning about
> something nothing in the kernel has done for 5 years and in practice as
> you have seen is actually wrong.  So deleting the warning message
> appears the best way to handle the situation you are seeing.

Ok, great. Thanks for all the info and the new patch, Eric.
diff mbox

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 656ce522a218..e0f0967ef24c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3087,13 +3087,8 @@  char *d_path(const struct path *path, char *buf, int buflen)
 	 * thus don't need to be hashed.  They also don't need a name until a
 	 * user wants to identify the object in /proc/pid/fd/.  The little hack
 	 * below allows us to generate a name for these objects on demand:
-	 *
-	 * Some pseudo inodes are mountable.  When they are mounted
-	 * path->dentry == path->mnt->mnt_root.  In that case don't call d_dname
-	 * and instead have d_path return the mounted path.
 	 */
-	if (path->dentry->d_op && path->dentry->d_op->d_dname &&
-	    (!IS_ROOT(path->dentry) || path->dentry != path->mnt->mnt_root))
+	if (path->dentry->d_op && path->dentry->d_op->d_dname)
 		return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
 
 	rcu_read_lock();
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 99521e7c492b..b9ecfa6cb46d 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -11,15 +11,6 @@  static const struct file_operations ns_file_operations = {
 	.llseek		= no_llseek,
 };
 
-static char *ns_dname(struct dentry *dentry, char *buffer, int buflen)
-{
-	struct inode *inode = d_inode(dentry);
-	const struct proc_ns_operations *ns_ops = dentry->d_fsdata;
-
-	return dynamic_dname(dentry, buffer, buflen, "%s:[%lu]",
-		ns_ops->name, inode->i_ino);
-}
-
 static void ns_prune_dentry(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
@@ -33,7 +24,6 @@  const struct dentry_operations ns_dentry_operations =
 {
 	.d_prune	= ns_prune_dentry,
 	.d_delete	= always_delete_dentry,
-	.d_dname	= ns_dname,
 };
 
 static void nsfs_evict(struct inode *inode)
@@ -47,7 +37,7 @@  void *ns_get_path(struct path *path, struct task_struct *task,
 			const struct proc_ns_operations *ns_ops)
 {
 	struct vfsmount *mnt = mntget(nsfs_mnt);
-	struct qstr qname = { .name = "", };
+	char name[DNAME_INLINE_LEN];
 	struct dentry *dentry;
 	struct inode *inode;
 	struct ns_common *ns;
@@ -80,6 +70,7 @@  slow:
 		mntput(mnt);
 		return ERR_PTR(-ENOMEM);
 	}
+	snprintf(name, sizeof(name), "%s:[%u]", ns_ops->name, ns->inum);
 	inode->i_ino = ns->inum;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
 	inode->i_flags |= S_IMMUTABLE;
@@ -87,13 +78,13 @@  slow:
 	inode->i_fop = &ns_file_operations;
 	inode->i_private = ns;
 
-	dentry = d_alloc_pseudo(mnt->mnt_sb, &qname);
+	dentry = d_alloc_name(mnt->mnt_root, name);
 	if (!dentry) {
 		iput(inode);
 		mntput(mnt);
 		return ERR_PTR(-ENOMEM);
 	}
-	d_instantiate(dentry, inode);
+	d_add(dentry, inode);
 	dentry->d_fsdata = (void *)ns_ops;
 	d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
 	if (d) {
@@ -143,7 +134,7 @@  static const struct super_operations nsfs_ops = {
 static struct dentry *nsfs_mount(struct file_system_type *fs_type,
 			int flags, const char *dev_name, void *data)
 {
-	return mount_pseudo(fs_type, "nsfs:", &nsfs_ops,
+	return mount_pseudo(fs_type, "/", &nsfs_ops,
 			&ns_dentry_operations, NSFS_MAGIC);
 }
 static struct file_system_type nsfs = {