diff mbox series

[2/2] super: ensure valid info

Message ID 20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org (mailing list archive)
State New, archived
Headers show
Series Small follow-up fixes for this cycle | expand

Commit Message

Christian Brauner Aug. 28, 2023, 11:26 a.m. UTC
For keyed filesystems that recycle superblocks based on s_fs_info or
information contained therein s_fs_info must be kept as long as the
superblock is on the filesystem type super list. This isn't guaranteed
as s_fs_info will be freed latest in sb->kill_sb().

The fix is simply to perform notification and list removal in
kill_anon_super(). Any filesystem needs to free s_fs_info after they
call the kill_*() helpers. If they don't they risk use-after-free right
now so fixing it here is guaranteed that s_fs_info remain valid.

For block backed filesystems notifying in pass sb->kill_sb() in
deactivate_locked_super() remains unproblematic and is required because
multiple other block devices can be shut down after kill_block_super()
has been called from a filesystem's sb->kill_sb() handler. For example,
ext4 and xfs close additional devices. Block based filesystems don't
depend on s_fs_info (btrfs does use s_fs_info but also uses
kill_anon_super() and not kill_block_super().).

Sorry for that braino. Goal should be to unify this behavior during this
cycle obviously. But let's please do a simple bugfix now.

Fixes: 2c18a63b760a ("super: wait until we passed kill super")
Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/super.c | 49 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 19 deletions(-)

Comments

Christoph Hellwig Aug. 28, 2023, 12:04 p.m. UTC | #1
> For block backed filesystems notifying in pass sb->kill_sb() in

I can't parse the 'in pass' here.

> @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb)
>  {
>  	dev_t dev = sb->s_dev;
>  	generic_shutdown_super(sb);
> +	kill_super_notify(sb);
>  	free_anon_bdev(dev);

Maybe I didn't read the commit log carefully enough, but why do we
need to call kill_super_notify before free_anon_bdev and any potential
action in ->kill_sb after calling kill_anon_super here given that
we already add a call to kill_super_notify after ->kill_sb?
Christian Brauner Aug. 28, 2023, 12:28 p.m. UTC | #2
> Maybe I didn't read the commit log carefully enough, but why do we
> need to call kill_super_notify before free_anon_bdev and any potential
> action in ->kill_sb after calling kill_anon_super here given that
> we already add a call to kill_super_notify after ->kill_sb?

Yeah, the commit log explains this. We leave the superblock on fs_supers
past sb->kill_sb() and notify after device closure. For block based
filesystems that's the correct thing. They don't rely on sb->s_fs_info
and we need to ensure that all devices are closed.

But for filesystems like kernfs that rely on get_keyed_super() they rely
on sb->s_fs_info to recycle sbs. sb->s_fs_info is currently always freed
in sb->kill_sb()

kernfs_kill_sb()
-> kill_anon_super()
   -> kfree(info)

For such fses sb->s_fs_info is freed with the superblock still on
fs_supers which means we get a UAF when the sb is still found on the
list. So for such filesystems we need to remove and notify before
sb->s_fs_info is freed. That's done in kill_anon_super(). For such
filesystems the call in deactivate_locked_super() is a nop.
Jan Kara Aug. 28, 2023, 1:07 p.m. UTC | #3
On Mon 28-08-23 13:26:24, Christian Brauner wrote:
> For keyed filesystems that recycle superblocks based on s_fs_info or
> information contained therein s_fs_info must be kept as long as the
> superblock is on the filesystem type super list. This isn't guaranteed
> as s_fs_info will be freed latest in sb->kill_sb().
> 
> The fix is simply to perform notification and list removal in
> kill_anon_super(). Any filesystem needs to free s_fs_info after they
> call the kill_*() helpers. If they don't they risk use-after-free right
> now so fixing it here is guaranteed that s_fs_info remain valid.
> 
> For block backed filesystems notifying in pass sb->kill_sb() in
> deactivate_locked_super() remains unproblematic and is required because
> multiple other block devices can be shut down after kill_block_super()
> has been called from a filesystem's sb->kill_sb() handler. For example,
> ext4 and xfs close additional devices. Block based filesystems don't
> depend on s_fs_info (btrfs does use s_fs_info but also uses
> kill_anon_super() and not kill_block_super().).
> 
> Sorry for that braino. Goal should be to unify this behavior during this
> cycle obviously. But let's please do a simple bugfix now.
> 
> Fixes: 2c18a63b760a ("super: wait until we passed kill super")
> Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
> Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
> Signed-off-by: Christian Brauner <brauner@kernel.org>

So AFAICT this fixes the UAF issues. It does reintroduce the EBUSY problems
with mount racing with umount e.g. for EROFS which calls bdev_release()
after calling kill_anon_super() but I think we can live with that until we
come up with a neater solution for this problem. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/super.c | 49 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 779247eb219c..ad7ac3a24d38 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -434,6 +434,33 @@ void put_super(struct super_block *sb)
>  	spin_unlock(&sb_lock);
>  }
>  
> +static void kill_super_notify(struct super_block *sb)
> +{
> +	lockdep_assert_not_held(&sb->s_umount);
> +
> +	/* already notified earlier */
> +	if (sb->s_flags & SB_DEAD)
> +		return;
> +
> +	/*
> +	 * Remove it from @fs_supers so it isn't found by new
> +	 * sget{_fc}() walkers anymore. Any concurrent mounter still
> +	 * managing to grab a temporary reference is guaranteed to
> +	 * already see SB_DYING and will wait until we notify them about
> +	 * SB_DEAD.
> +	 */
> +	spin_lock(&sb_lock);
> +	hlist_del_init(&sb->s_instances);
> +	spin_unlock(&sb_lock);
> +
> +	/*
> +	 * Let concurrent mounts know that this thing is really dead.
> +	 * We don't need @sb->s_umount here as every concurrent caller
> +	 * will see SB_DYING and either discard the superblock or wait
> +	 * for SB_DEAD.
> +	 */
> +	super_wake(sb, SB_DEAD);
> +}
>  
>  /**
>   *	deactivate_locked_super	-	drop an active reference to superblock
> @@ -453,6 +480,8 @@ void deactivate_locked_super(struct super_block *s)
>  		unregister_shrinker(&s->s_shrink);
>  		fs->kill_sb(s);
>  
> +		kill_super_notify(s);
> +
>  		/*
>  		 * Since list_lru_destroy() may sleep, we cannot call it from
>  		 * put_super(), where we hold the sb_lock. Therefore we destroy
> @@ -461,25 +490,6 @@ void deactivate_locked_super(struct super_block *s)
>  		list_lru_destroy(&s->s_dentry_lru);
>  		list_lru_destroy(&s->s_inode_lru);
>  
> -		/*
> -		 * Remove it from @fs_supers so it isn't found by new
> -		 * sget{_fc}() walkers anymore. Any concurrent mounter still
> -		 * managing to grab a temporary reference is guaranteed to
> -		 * already see SB_DYING and will wait until we notify them about
> -		 * SB_DEAD.
> -		 */
> -		spin_lock(&sb_lock);
> -		hlist_del_init(&s->s_instances);
> -		spin_unlock(&sb_lock);
> -
> -		/*
> -		 * Let concurrent mounts know that this thing is really dead.
> -		 * We don't need @sb->s_umount here as every concurrent caller
> -		 * will see SB_DYING and either discard the superblock or wait
> -		 * for SB_DEAD.
> -		 */
> -		super_wake(s, SB_DEAD);
> -
>  		put_filesystem(fs);
>  		put_super(s);
>  	} else {
> @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb)
>  {
>  	dev_t dev = sb->s_dev;
>  	generic_shutdown_super(sb);
> +	kill_super_notify(sb);
>  	free_anon_bdev(dev);
>  }
>  EXPORT_SYMBOL(kill_anon_super);
> 
> -- 
> 2.34.1
>
Christian Brauner Aug. 28, 2023, 1:17 p.m. UTC | #4
> So AFAICT this fixes the UAF issues. It does reintroduce the EBUSY problems
> with mount racing with umount e.g. for EROFS which calls bdev_release()

It shouldn't as erofs doesn't use block devices when it is in fscache
mode which is when kill_anon_super() is called. For regular erofs
kill_block_super() is used and notification happens as before.

> after calling kill_anon_super() but I think we can live with that until we
> come up with a neater solution for this problem. So feel free to add:

I think ultimately we might just provide callback to kill_*_super() like
we do for sget_fc() or something. But let's keep thinking of course.
Christoph Hellwig Aug. 28, 2023, 2:03 p.m. UTC | #5
On Mon, Aug 28, 2023 at 02:28:48PM +0200, Christian Brauner wrote:
> > Maybe I didn't read the commit log carefully enough, but why do we
> > need to call kill_super_notify before free_anon_bdev and any potential
> > action in ->kill_sb after calling kill_anon_super here given that
> > we already add a call to kill_super_notify after ->kill_sb?
> 
> Yeah, the commit log explains this. We leave the superblock on fs_supers
> past sb->kill_sb() and notify after device closure. For block based
> filesystems that's the correct thing. They don't rely on sb->s_fs_info
> and we need to ensure that all devices are closed.
> 
> But for filesystems like kernfs that rely on get_keyed_super() they rely
> on sb->s_fs_info to recycle sbs. sb->s_fs_info is currently always freed
> in sb->kill_sb()
> 
> kernfs_kill_sb()
> -> kill_anon_super()
>    -> kfree(info)
> 
> For such fses sb->s_fs_info is freed with the superblock still on
> fs_supers which means we get a UAF when the sb is still found on the
> list. So for such filesystems we need to remove and notify before
> sb->s_fs_info is freed. That's done in kill_anon_super(). For such
> filesystems the call in deactivate_locked_super() is a nop.

Ok, so I did fail to parse the commit log.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/fs/super.c b/fs/super.c
index 779247eb219c..ad7ac3a24d38 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -434,6 +434,33 @@  void put_super(struct super_block *sb)
 	spin_unlock(&sb_lock);
 }
 
+static void kill_super_notify(struct super_block *sb)
+{
+	lockdep_assert_not_held(&sb->s_umount);
+
+	/* already notified earlier */
+	if (sb->s_flags & SB_DEAD)
+		return;
+
+	/*
+	 * Remove it from @fs_supers so it isn't found by new
+	 * sget{_fc}() walkers anymore. Any concurrent mounter still
+	 * managing to grab a temporary reference is guaranteed to
+	 * already see SB_DYING and will wait until we notify them about
+	 * SB_DEAD.
+	 */
+	spin_lock(&sb_lock);
+	hlist_del_init(&sb->s_instances);
+	spin_unlock(&sb_lock);
+
+	/*
+	 * Let concurrent mounts know that this thing is really dead.
+	 * We don't need @sb->s_umount here as every concurrent caller
+	 * will see SB_DYING and either discard the superblock or wait
+	 * for SB_DEAD.
+	 */
+	super_wake(sb, SB_DEAD);
+}
 
 /**
  *	deactivate_locked_super	-	drop an active reference to superblock
@@ -453,6 +480,8 @@  void deactivate_locked_super(struct super_block *s)
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
 
+		kill_super_notify(s);
+
 		/*
 		 * Since list_lru_destroy() may sleep, we cannot call it from
 		 * put_super(), where we hold the sb_lock. Therefore we destroy
@@ -461,25 +490,6 @@  void deactivate_locked_super(struct super_block *s)
 		list_lru_destroy(&s->s_dentry_lru);
 		list_lru_destroy(&s->s_inode_lru);
 
-		/*
-		 * Remove it from @fs_supers so it isn't found by new
-		 * sget{_fc}() walkers anymore. Any concurrent mounter still
-		 * managing to grab a temporary reference is guaranteed to
-		 * already see SB_DYING and will wait until we notify them about
-		 * SB_DEAD.
-		 */
-		spin_lock(&sb_lock);
-		hlist_del_init(&s->s_instances);
-		spin_unlock(&sb_lock);
-
-		/*
-		 * Let concurrent mounts know that this thing is really dead.
-		 * We don't need @sb->s_umount here as every concurrent caller
-		 * will see SB_DYING and either discard the superblock or wait
-		 * for SB_DEAD.
-		 */
-		super_wake(s, SB_DEAD);
-
 		put_filesystem(fs);
 		put_super(s);
 	} else {
@@ -1260,6 +1270,7 @@  void kill_anon_super(struct super_block *sb)
 {
 	dev_t dev = sb->s_dev;
 	generic_shutdown_super(sb);
+	kill_super_notify(sb);
 	free_anon_bdev(dev);
 }
 EXPORT_SYMBOL(kill_anon_super);