vfs: fsmount: add missing mntget()

Message ID	20190612184313.143456-1-ebiggers@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> From: Eric Biggers <ebiggers@kernel.org> To: David Howells <dhowells@redhat.com>, Alexander Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com>, linux-kernel@vger.kernel.org Subject: [PATCH] vfs: fsmount: add missing mntget() Date: Wed, 12 Jun 2019 11:43:13 -0700 Message-Id: <20190612184313.143456-1-ebiggers@kernel.org> In-Reply-To: <20190610183031.GE63833@gmail.com> References: <20190610183031.GE63833@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk
Series	vfs: fsmount: add missing mntget() \| expand vfs: fsmount: add missing mntget()

Eric Biggers June 12, 2019, 6:43 p.m. UTC

From: Eric Biggers <ebiggers@google.com>

sys_fsmount() needs to take a reference to the new mount when adding it
to the anonymous mount namespace.  Otherwise the filesystem can be
unmounted while it's still in use, as found by syzkaller.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/namespace.c | 1 +
 1 file changed, 1 insertion(+)

Miklos Szeredi June 13, 2019, 8:47 a.m. UTC | #1

On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> sys_fsmount() needs to take a reference to the new mount when adding it
> to the anonymous mount namespace.  Otherwise the filesystem can be
> unmounted while it's still in use, as found by syzkaller.

So it needs one count for the file (which dentry_open() obtains) and one for the
attachment into the anonymous namespace.  The latter one is dropped at cleanup
time, so your patch appears to be correct at getting that ref.

I wonder why such a blatant use-after-free was missed in normal testing.  RCU
delayed freeing, I guess?

How about this additional sanity checking patch?

Thanks,
Miklos


diff --git a/fs/namespace.c b/fs/namespace.c
index b26778bdc236..c638f220805a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -153,10 +153,10 @@ static inline void mnt_add_count(struct mount *mnt, int n)
 /*
  * vfsmount lock must be held for write
  */
-unsigned int mnt_get_count(struct mount *mnt)
+int mnt_get_count(struct mount *mnt)
 {
 #ifdef CONFIG_SMP
-	unsigned int count = 0;
+	int count = 0;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
@@ -1140,6 +1140,8 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput);
 
 static void mntput_no_expire(struct mount *mnt)
 {
+	int count;
+
 	rcu_read_lock();
 	if (likely(READ_ONCE(mnt->mnt_ns))) {
 		/*
@@ -1162,11 +1164,13 @@ static void mntput_no_expire(struct mount *mnt)
 	 */
 	smp_mb();
 	mnt_add_count(mnt, -1);
-	if (mnt_get_count(mnt)) {
+	count = mnt_get_count(mnt);
+	if (count > 0) {
 		rcu_read_unlock();
 		unlock_mount_hash();
 		return;
 	}
+	WARN_ON(count < 0);
 	if (unlikely(mnt->mnt.mnt_flags & MNT_DOOMED)) {
 		rcu_read_unlock();
 		unlock_mount_hash();
diff --git a/fs/pnode.h b/fs/pnode.h
index 49a058c73e4c..26f74e092bd9 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
 void propagate_mount_unlock(struct mount *);
 void mnt_release_group_id(struct mount *);
 int get_dominating_id(struct mount *mnt, const struct path *root);
-unsigned int mnt_get_count(struct mount *mnt);
+int mnt_get_count(struct mount *mnt);
 void mnt_set_mountpoint(struct mount *, struct mountpoint *,
 			struct mount *);
 void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,

Mark Rutland June 13, 2019, 9:03 a.m. UTC | #2

On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> sys_fsmount() needs to take a reference to the new mount when adding it
> to the anonymous mount namespace.  Otherwise the filesystem can be
> unmounted while it's still in use, as found by syzkaller.
> 
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
> Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
> Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
> Signed-off-by: Eric Biggers <ebiggers@google.com>

With this patch applied, my reproducer from [1] no longer triggers the
issue. I polled /proc/meminfo, and don't see any leak. FWIW:

Tested-by: Mark Rutland <mark.rutland@arm.com>

Thanks for fixing this!

Mark.

[1] https://lore.kernel.org/lkml/20190605135401.GB30925@lakrids.cambridge.arm.com/

> ---
>  fs/namespace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b26778bdc236e..5dc137a22d406 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3445,6 +3445,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
>  	ns->root = mnt;
>  	ns->mounts = 1;
>  	list_add(&mnt->mnt_list, &ns->list);
> +	mntget(newmount.mnt);
>  
>  	/* Attach to an apparent O_PATH fd with a note that we need to unmount
>  	 * it, not just simply put it.
> -- 
> 2.22.0.rc2.383.gf4fbbf30c2-goog
>

Eric Biggers June 13, 2019, 4:41 p.m. UTC | #3

On Thu, Jun 13, 2019 at 10:47:28AM +0200, Miklos Szeredi wrote:
> On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > sys_fsmount() needs to take a reference to the new mount when adding it
> > to the anonymous mount namespace.  Otherwise the filesystem can be
> > unmounted while it's still in use, as found by syzkaller.
> 
> So it needs one count for the file (which dentry_open() obtains) and one for the
> attachment into the anonymous namespace.  The latter one is dropped at cleanup
> time, so your patch appears to be correct at getting that ref.

Yes.

> 
> I wonder why such a blatant use-after-free was missed in normal testing.  RCU
> delayed freeing, I guess?

It's because mount freeing is delayed by task_work_add(), so normally the refcnt
would be dropped to -1 when the file is closed without problems.  The problems
only showed up if you took another reference, e.g. by fchdir().

> 
> How about this additional sanity checking patch?

Seems like a good idea.  Without my fix, the WARNING is triggered by the
following program (no fchdir() needed):

	int main(void)
	{
		int fs;

		fs = syscall(__NR_fsopen, "ramfs", 0);
		syscall(__NR_fsconfig, fs, 6, 0, 0, 0);
		syscall(__NR_fsmount, fs, 0, 0);
	}

Can you send it as a formal patch?

- Eric

Al Viro June 17, 2019, 9:34 p.m. UTC | #4

On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> sys_fsmount() needs to take a reference to the new mount when adding it
> to the anonymous mount namespace.  Otherwise the filesystem can be
> unmounted while it's still in use, as found by syzkaller.
> 
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
> Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
> Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  fs/namespace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b26778bdc236e..5dc137a22d406 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3445,6 +3445,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
>  	ns->root = mnt;
>  	ns->mounts = 1;
>  	list_add(&mnt->mnt_list, &ns->list);
> +	mntget(newmount.mnt);
>  
>  	/* Attach to an apparent O_PATH fd with a note that we need to unmount
>  	 * it, not just simply put it.

Nice catch...  Applied, with apologies for having been MIA lately.

Eric Biggers July 9, 2019, 11 p.m. UTC | #5

On Thu, Jun 13, 2019 at 10:47:28AM +0200, Miklos Szeredi wrote:
> On Wed, Jun 12, 2019 at 11:43:13AM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > sys_fsmount() needs to take a reference to the new mount when adding it
> > to the anonymous mount namespace.  Otherwise the filesystem can be
> > unmounted while it's still in use, as found by syzkaller.
> 
> So it needs one count for the file (which dentry_open() obtains) and one for the
> attachment into the anonymous namespace.  The latter one is dropped at cleanup
> time, so your patch appears to be correct at getting that ref.
> 
> I wonder why such a blatant use-after-free was missed in normal testing.  RCU
> delayed freeing, I guess?
> 
> How about this additional sanity checking patch?
> 
> Thanks,
> Miklos
> 
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b26778bdc236..c638f220805a 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -153,10 +153,10 @@ static inline void mnt_add_count(struct mount *mnt, int n)
>  /*
>   * vfsmount lock must be held for write
>   */
> -unsigned int mnt_get_count(struct mount *mnt)
> +int mnt_get_count(struct mount *mnt)
>  {
>  #ifdef CONFIG_SMP
> -	unsigned int count = 0;
> +	int count = 0;
>  	int cpu;
>  
>  	for_each_possible_cpu(cpu) {
> @@ -1140,6 +1140,8 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput);
>  
>  static void mntput_no_expire(struct mount *mnt)
>  {
> +	int count;
> +
>  	rcu_read_lock();
>  	if (likely(READ_ONCE(mnt->mnt_ns))) {
>  		/*
> @@ -1162,11 +1164,13 @@ static void mntput_no_expire(struct mount *mnt)
>  	 */
>  	smp_mb();
>  	mnt_add_count(mnt, -1);
> -	if (mnt_get_count(mnt)) {
> +	count = mnt_get_count(mnt);
> +	if (count > 0) {
>  		rcu_read_unlock();
>  		unlock_mount_hash();
>  		return;
>  	}
> +	WARN_ON(count < 0);
>  	if (unlikely(mnt->mnt.mnt_flags & MNT_DOOMED)) {
>  		rcu_read_unlock();
>  		unlock_mount_hash();
> diff --git a/fs/pnode.h b/fs/pnode.h
> index 49a058c73e4c..26f74e092bd9 100644
> --- a/fs/pnode.h
> +++ b/fs/pnode.h
> @@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
>  void propagate_mount_unlock(struct mount *);
>  void mnt_release_group_id(struct mount *);
>  int get_dominating_id(struct mount *mnt, const struct path *root);
> -unsigned int mnt_get_count(struct mount *mnt);
> +int mnt_get_count(struct mount *mnt);
>  void mnt_set_mountpoint(struct mount *, struct mountpoint *,
>  			struct mount *);
>  void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,

Miklos, are you planning to send this as a formal patch?

- Eric

Al Viro July 10, 2019, 12:31 a.m. UTC | #6

On Tue, Jul 09, 2019 at 04:00:29PM -0700, Eric Biggers wrote:

> > index 49a058c73e4c..26f74e092bd9 100644
> > --- a/fs/pnode.h
> > +++ b/fs/pnode.h
> > @@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
> >  void propagate_mount_unlock(struct mount *);
> >  void mnt_release_group_id(struct mount *);
> >  int get_dominating_id(struct mount *mnt, const struct path *root);
> > -unsigned int mnt_get_count(struct mount *mnt);
> > +int mnt_get_count(struct mount *mnt);
> >  void mnt_set_mountpoint(struct mount *, struct mountpoint *,
> >  			struct mount *);
> >  void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,
> 
> Miklos, are you planning to send this as a formal patch?

Hold it for a while, OK?  There's an unpleasant issue (a very long-standing
one) with boxen that have an obscene amount of RAM.  Some of the counters
involved will need to become long.  This is the coming cycle fodder (mounts
and inodes are relatively easy; it's dentry->d_count that brings arseloads
of fun) and I'd rather deal with that sanity check as part of the same series.
It's not forgotten...  Patch series re limiting the number of negative
dentries is also getting into the same mix.  Watch #work.dcache - what's
in there is basically prep work for the big pile for the next cycle; it'll
be interesting...

Eric Biggers Oct. 16, 2019, 12:52 a.m. UTC | #7

On Wed, Jul 10, 2019 at 01:31:13AM +0100, Al Viro wrote:
> On Tue, Jul 09, 2019 at 04:00:29PM -0700, Eric Biggers wrote:
> 
> > > index 49a058c73e4c..26f74e092bd9 100644
> > > --- a/fs/pnode.h
> > > +++ b/fs/pnode.h
> > > @@ -44,7 +44,7 @@ int propagate_mount_busy(struct mount *, int);
> > >  void propagate_mount_unlock(struct mount *);
> > >  void mnt_release_group_id(struct mount *);
> > >  int get_dominating_id(struct mount *mnt, const struct path *root);
> > > -unsigned int mnt_get_count(struct mount *mnt);
> > > +int mnt_get_count(struct mount *mnt);
> > >  void mnt_set_mountpoint(struct mount *, struct mountpoint *,
> > >  			struct mount *);
> > >  void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,
> > 
> > Miklos, are you planning to send this as a formal patch?
> 
> Hold it for a while, OK?  There's an unpleasant issue (a very long-standing
> one) with boxen that have an obscene amount of RAM.  Some of the counters
> involved will need to become long.  This is the coming cycle fodder (mounts
> and inodes are relatively easy; it's dentry->d_count that brings arseloads
> of fun) and I'd rather deal with that sanity check as part of the same series.
> It's not forgotten...  Patch series re limiting the number of negative
> dentries is also getting into the same mix.  Watch #work.dcache - what's
> in there is basically prep work for the big pile for the next cycle; it'll
> be interesting...

Al, whatever happened to the refcounting patches you mentioned here?

- Eric

vfs: fsmount: add missing mntget()

Commit Message

Comments

Patch