[2/2] super: ensure valid info

Message ID	20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@vger.kernel.org> From: Christian Brauner <brauner@kernel.org> Date: Mon, 28 Aug 2023 13:26:24 +0200 Subject: [PATCH 2/2] super: ensure valid info MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org> References: <20230828-vfs-super-fixes-v1-0-b37a4a04a88f@kernel.org> In-Reply-To: <20230828-vfs-super-fixes-v1-0-b37a4a04a88f@kernel.org> To: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de> Cc: linux-fsdevel@vger.kernel.org, Christian Brauner <brauner@kernel.org>, syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com Precedence: bulk
Series	Small follow-up fixes for this cycle \| expand [0/2] Small follow-up fixes for this cycle [1/2] super: move lockdep assert [2/2] super: ensure valid info

Christian Brauner Aug. 28, 2023, 11:26 a.m. UTC

For keyed filesystems that recycle superblocks based on s_fs_info or
information contained therein s_fs_info must be kept as long as the
superblock is on the filesystem type super list. This isn't guaranteed
as s_fs_info will be freed latest in sb->kill_sb().

The fix is simply to perform notification and list removal in
kill_anon_super(). Any filesystem needs to free s_fs_info after they
call the kill_*() helpers. If they don't they risk use-after-free right
now so fixing it here is guaranteed that s_fs_info remain valid.

For block backed filesystems notifying in pass sb->kill_sb() in
deactivate_locked_super() remains unproblematic and is required because
multiple other block devices can be shut down after kill_block_super()
has been called from a filesystem's sb->kill_sb() handler. For example,
ext4 and xfs close additional devices. Block based filesystems don't
depend on s_fs_info (btrfs does use s_fs_info but also uses
kill_anon_super() and not kill_block_super().).

Sorry for that braino. Goal should be to unify this behavior during this
cycle obviously. But let's please do a simple bugfix now.

Fixes: 2c18a63b760a ("super: wait until we passed kill super")
Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/super.c | 49 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 19 deletions(-)

Christoph Hellwig Aug. 28, 2023, 12:04 p.m. UTC | #1

> For block backed filesystems notifying in pass sb->kill_sb() in

I can't parse the 'in pass' here.

> @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb)
>  {
>  	dev_t dev = sb->s_dev;
>  	generic_shutdown_super(sb);
> +	kill_super_notify(sb);
>  	free_anon_bdev(dev);

Maybe I didn't read the commit log carefully enough, but why do we
need to call kill_super_notify before free_anon_bdev and any potential
action in ->kill_sb after calling kill_anon_super here given that
we already add a call to kill_super_notify after ->kill_sb?

Christian Brauner Aug. 28, 2023, 12:28 p.m. UTC | #2

> Maybe I didn't read the commit log carefully enough, but why do we
> need to call kill_super_notify before free_anon_bdev and any potential
> action in ->kill_sb after calling kill_anon_super here given that
> we already add a call to kill_super_notify after ->kill_sb?

Yeah, the commit log explains this. We leave the superblock on fs_supers
past sb->kill_sb() and notify after device closure. For block based
filesystems that's the correct thing. They don't rely on sb->s_fs_info
and we need to ensure that all devices are closed.

But for filesystems like kernfs that rely on get_keyed_super() they rely
on sb->s_fs_info to recycle sbs. sb->s_fs_info is currently always freed
in sb->kill_sb()

kernfs_kill_sb()
-> kill_anon_super()
   -> kfree(info)

For such fses sb->s_fs_info is freed with the superblock still on
fs_supers which means we get a UAF when the sb is still found on the
list. So for such filesystems we need to remove and notify before
sb->s_fs_info is freed. That's done in kill_anon_super(). For such
filesystems the call in deactivate_locked_super() is a nop.

Jan Kara Aug. 28, 2023, 1:07 p.m. UTC | #3

On Mon 28-08-23 13:26:24, Christian Brauner wrote:
> For keyed filesystems that recycle superblocks based on s_fs_info or
> information contained therein s_fs_info must be kept as long as the
> superblock is on the filesystem type super list. This isn't guaranteed
> as s_fs_info will be freed latest in sb->kill_sb().
> 
> The fix is simply to perform notification and list removal in
> kill_anon_super(). Any filesystem needs to free s_fs_info after they
> call the kill_*() helpers. If they don't they risk use-after-free right
> now so fixing it here is guaranteed that s_fs_info remain valid.
> 
> For block backed filesystems notifying in pass sb->kill_sb() in
> deactivate_locked_super() remains unproblematic and is required because
> multiple other block devices can be shut down after kill_block_super()
> has been called from a filesystem's sb->kill_sb() handler. For example,
> ext4 and xfs close additional devices. Block based filesystems don't
> depend on s_fs_info (btrfs does use s_fs_info but also uses
> kill_anon_super() and not kill_block_super().).
> 
> Sorry for that braino. Goal should be to unify this behavior during this
> cycle obviously. But let's please do a simple bugfix now.
> 
> Fixes: 2c18a63b760a ("super: wait until we passed kill super")
> Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
> Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
> Signed-off-by: Christian Brauner <brauner@kernel.org>

So AFAICT this fixes the UAF issues. It does reintroduce the EBUSY problems
with mount racing with umount e.g. for EROFS which calls bdev_release()
after calling kill_anon_super() but I think we can live with that until we
come up with a neater solution for this problem. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/super.c | 49 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 779247eb219c..ad7ac3a24d38 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -434,6 +434,33 @@ void put_super(struct super_block *sb)
>  	spin_unlock(&sb_lock);
>  }
>  
> +static void kill_super_notify(struct super_block *sb)
> +{
> +	lockdep_assert_not_held(&sb->s_umount);
> +
> +	/* already notified earlier */
> +	if (sb->s_flags & SB_DEAD)
> +		return;
> +
> +	/*
> +	 * Remove it from @fs_supers so it isn't found by new
> +	 * sget{_fc}() walkers anymore. Any concurrent mounter still
> +	 * managing to grab a temporary reference is guaranteed to
> +	 * already see SB_DYING and will wait until we notify them about
> +	 * SB_DEAD.
> +	 */
> +	spin_lock(&sb_lock);
> +	hlist_del_init(&sb->s_instances);
> +	spin_unlock(&sb_lock);
> +
> +	/*
> +	 * Let concurrent mounts know that this thing is really dead.
> +	 * We don't need @sb->s_umount here as every concurrent caller
> +	 * will see SB_DYING and either discard the superblock or wait
> +	 * for SB_DEAD.
> +	 */
> +	super_wake(sb, SB_DEAD);
> +}
>  
>  /**
>   *	deactivate_locked_super	-	drop an active reference to superblock
> @@ -453,6 +480,8 @@ void deactivate_locked_super(struct super_block *s)
>  		unregister_shrinker(&s->s_shrink);
>  		fs->kill_sb(s);
>  
> +		kill_super_notify(s);
> +
>  		/*
>  		 * Since list_lru_destroy() may sleep, we cannot call it from
>  		 * put_super(), where we hold the sb_lock. Therefore we destroy
> @@ -461,25 +490,6 @@ void deactivate_locked_super(struct super_block *s)
>  		list_lru_destroy(&s->s_dentry_lru);
>  		list_lru_destroy(&s->s_inode_lru);
>  
> -		/*
> -		 * Remove it from @fs_supers so it isn't found by new
> -		 * sget{_fc}() walkers anymore. Any concurrent mounter still
> -		 * managing to grab a temporary reference is guaranteed to
> -		 * already see SB_DYING and will wait until we notify them about
> -		 * SB_DEAD.
> -		 */
> -		spin_lock(&sb_lock);
> -		hlist_del_init(&s->s_instances);
> -		spin_unlock(&sb_lock);
> -
> -		/*
> -		 * Let concurrent mounts know that this thing is really dead.
> -		 * We don't need @sb->s_umount here as every concurrent caller
> -		 * will see SB_DYING and either discard the superblock or wait
> -		 * for SB_DEAD.
> -		 */
> -		super_wake(s, SB_DEAD);
> -
>  		put_filesystem(fs);
>  		put_super(s);
>  	} else {
> @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb)
>  {
>  	dev_t dev = sb->s_dev;
>  	generic_shutdown_super(sb);
> +	kill_super_notify(sb);
>  	free_anon_bdev(dev);
>  }
>  EXPORT_SYMBOL(kill_anon_super);
> 
> -- 
> 2.34.1
>

Christian Brauner Aug. 28, 2023, 1:17 p.m. UTC | #4

> So AFAICT this fixes the UAF issues. It does reintroduce the EBUSY problems
> with mount racing with umount e.g. for EROFS which calls bdev_release()

It shouldn't as erofs doesn't use block devices when it is in fscache
mode which is when kill_anon_super() is called. For regular erofs
kill_block_super() is used and notification happens as before.

> after calling kill_anon_super() but I think we can live with that until we
> come up with a neater solution for this problem. So feel free to add:

I think ultimately we might just provide callback to kill_*_super() like
we do for sget_fc() or something. But let's keep thinking of course.

Christoph Hellwig Aug. 28, 2023, 2:03 p.m. UTC | #5

On Mon, Aug 28, 2023 at 02:28:48PM +0200, Christian Brauner wrote:
> > Maybe I didn't read the commit log carefully enough, but why do we
> > need to call kill_super_notify before free_anon_bdev and any potential
> > action in ->kill_sb after calling kill_anon_super here given that
> > we already add a call to kill_super_notify after ->kill_sb?
> 
> Yeah, the commit log explains this. We leave the superblock on fs_supers
> past sb->kill_sb() and notify after device closure. For block based
> filesystems that's the correct thing. They don't rely on sb->s_fs_info
> and we need to ensure that all devices are closed.
> 
> But for filesystems like kernfs that rely on get_keyed_super() they rely
> on sb->s_fs_info to recycle sbs. sb->s_fs_info is currently always freed
> in sb->kill_sb()
> 
> kernfs_kill_sb()
> -> kill_anon_super()
>    -> kfree(info)
> 
> For such fses sb->s_fs_info is freed with the superblock still on
> fs_supers which means we get a UAF when the sb is still found on the
> list. So for such filesystems we need to remove and notify before
> sb->s_fs_info is freed. That's done in kill_anon_super(). For such
> filesystems the call in deactivate_locked_super() is a nop.

Ok, so I did fail to parse the commit log.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

[2/2] super: ensure valid info

Commit Message

Comments

Patch