Message ID | 20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Small follow-up fixes for this cycle | expand |
> For block backed filesystems notifying in pass sb->kill_sb() in I can't parse the 'in pass' here. > @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb) > { > dev_t dev = sb->s_dev; > generic_shutdown_super(sb); > + kill_super_notify(sb); > free_anon_bdev(dev); Maybe I didn't read the commit log carefully enough, but why do we need to call kill_super_notify before free_anon_bdev and any potential action in ->kill_sb after calling kill_anon_super here given that we already add a call to kill_super_notify after ->kill_sb?
> Maybe I didn't read the commit log carefully enough, but why do we > need to call kill_super_notify before free_anon_bdev and any potential > action in ->kill_sb after calling kill_anon_super here given that > we already add a call to kill_super_notify after ->kill_sb? Yeah, the commit log explains this. We leave the superblock on fs_supers past sb->kill_sb() and notify after device closure. For block based filesystems that's the correct thing. They don't rely on sb->s_fs_info and we need to ensure that all devices are closed. But for filesystems like kernfs that rely on get_keyed_super() they rely on sb->s_fs_info to recycle sbs. sb->s_fs_info is currently always freed in sb->kill_sb() kernfs_kill_sb() -> kill_anon_super() -> kfree(info) For such fses sb->s_fs_info is freed with the superblock still on fs_supers which means we get a UAF when the sb is still found on the list. So for such filesystems we need to remove and notify before sb->s_fs_info is freed. That's done in kill_anon_super(). For such filesystems the call in deactivate_locked_super() is a nop.
On Mon 28-08-23 13:26:24, Christian Brauner wrote: > For keyed filesystems that recycle superblocks based on s_fs_info or > information contained therein s_fs_info must be kept as long as the > superblock is on the filesystem type super list. This isn't guaranteed > as s_fs_info will be freed latest in sb->kill_sb(). > > The fix is simply to perform notification and list removal in > kill_anon_super(). Any filesystem needs to free s_fs_info after they > call the kill_*() helpers. If they don't they risk use-after-free right > now so fixing it here is guaranteed that s_fs_info remain valid. > > For block backed filesystems notifying in pass sb->kill_sb() in > deactivate_locked_super() remains unproblematic and is required because > multiple other block devices can be shut down after kill_block_super() > has been called from a filesystem's sb->kill_sb() handler. For example, > ext4 and xfs close additional devices. Block based filesystems don't > depend on s_fs_info (btrfs does use s_fs_info but also uses > kill_anon_super() and not kill_block_super().). > > Sorry for that braino. Goal should be to unify this behavior during this > cycle obviously. But let's please do a simple bugfix now. > > Fixes: 2c18a63b760a ("super: wait until we passed kill super") > Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com > Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com > Signed-off-by: Christian Brauner <brauner@kernel.org> So AFAICT this fixes the UAF issues. It does reintroduce the EBUSY problems with mount racing with umount e.g. for EROFS which calls bdev_release() after calling kill_anon_super() but I think we can live with that until we come up with a neater solution for this problem. So feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/super.c | 49 ++++++++++++++++++++++++++++++------------------- > 1 file changed, 30 insertions(+), 19 deletions(-) > > diff --git a/fs/super.c b/fs/super.c > index 779247eb219c..ad7ac3a24d38 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -434,6 +434,33 @@ void put_super(struct super_block *sb) > spin_unlock(&sb_lock); > } > > +static void kill_super_notify(struct super_block *sb) > +{ > + lockdep_assert_not_held(&sb->s_umount); > + > + /* already notified earlier */ > + if (sb->s_flags & SB_DEAD) > + return; > + > + /* > + * Remove it from @fs_supers so it isn't found by new > + * sget{_fc}() walkers anymore. Any concurrent mounter still > + * managing to grab a temporary reference is guaranteed to > + * already see SB_DYING and will wait until we notify them about > + * SB_DEAD. > + */ > + spin_lock(&sb_lock); > + hlist_del_init(&sb->s_instances); > + spin_unlock(&sb_lock); > + > + /* > + * Let concurrent mounts know that this thing is really dead. > + * We don't need @sb->s_umount here as every concurrent caller > + * will see SB_DYING and either discard the superblock or wait > + * for SB_DEAD. > + */ > + super_wake(sb, SB_DEAD); > +} > > /** > * deactivate_locked_super - drop an active reference to superblock > @@ -453,6 +480,8 @@ void deactivate_locked_super(struct super_block *s) > unregister_shrinker(&s->s_shrink); > fs->kill_sb(s); > > + kill_super_notify(s); > + > /* > * Since list_lru_destroy() may sleep, we cannot call it from > * put_super(), where we hold the sb_lock. Therefore we destroy > @@ -461,25 +490,6 @@ void deactivate_locked_super(struct super_block *s) > list_lru_destroy(&s->s_dentry_lru); > list_lru_destroy(&s->s_inode_lru); > > - /* > - * Remove it from @fs_supers so it isn't found by new > - * sget{_fc}() walkers anymore. Any concurrent mounter still > - * managing to grab a temporary reference is guaranteed to > - * already see SB_DYING and will wait until we notify them about > - * SB_DEAD. > - */ > - spin_lock(&sb_lock); > - hlist_del_init(&s->s_instances); > - spin_unlock(&sb_lock); > - > - /* > - * Let concurrent mounts know that this thing is really dead. > - * We don't need @sb->s_umount here as every concurrent caller > - * will see SB_DYING and either discard the superblock or wait > - * for SB_DEAD. > - */ > - super_wake(s, SB_DEAD); > - > put_filesystem(fs); > put_super(s); > } else { > @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb) > { > dev_t dev = sb->s_dev; > generic_shutdown_super(sb); > + kill_super_notify(sb); > free_anon_bdev(dev); > } > EXPORT_SYMBOL(kill_anon_super); > > -- > 2.34.1 >
> So AFAICT this fixes the UAF issues. It does reintroduce the EBUSY problems > with mount racing with umount e.g. for EROFS which calls bdev_release() It shouldn't as erofs doesn't use block devices when it is in fscache mode which is when kill_anon_super() is called. For regular erofs kill_block_super() is used and notification happens as before. > after calling kill_anon_super() but I think we can live with that until we > come up with a neater solution for this problem. So feel free to add: I think ultimately we might just provide callback to kill_*_super() like we do for sget_fc() or something. But let's keep thinking of course.
On Mon, Aug 28, 2023 at 02:28:48PM +0200, Christian Brauner wrote: > > Maybe I didn't read the commit log carefully enough, but why do we > > need to call kill_super_notify before free_anon_bdev and any potential > > action in ->kill_sb after calling kill_anon_super here given that > > we already add a call to kill_super_notify after ->kill_sb? > > Yeah, the commit log explains this. We leave the superblock on fs_supers > past sb->kill_sb() and notify after device closure. For block based > filesystems that's the correct thing. They don't rely on sb->s_fs_info > and we need to ensure that all devices are closed. > > But for filesystems like kernfs that rely on get_keyed_super() they rely > on sb->s_fs_info to recycle sbs. sb->s_fs_info is currently always freed > in sb->kill_sb() > > kernfs_kill_sb() > -> kill_anon_super() > -> kfree(info) > > For such fses sb->s_fs_info is freed with the superblock still on > fs_supers which means we get a UAF when the sb is still found on the > list. So for such filesystems we need to remove and notify before > sb->s_fs_info is freed. That's done in kill_anon_super(). For such > filesystems the call in deactivate_locked_super() is a nop. Ok, so I did fail to parse the commit log. Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de>
diff --git a/fs/super.c b/fs/super.c index 779247eb219c..ad7ac3a24d38 100644 --- a/fs/super.c +++ b/fs/super.c @@ -434,6 +434,33 @@ void put_super(struct super_block *sb) spin_unlock(&sb_lock); } +static void kill_super_notify(struct super_block *sb) +{ + lockdep_assert_not_held(&sb->s_umount); + + /* already notified earlier */ + if (sb->s_flags & SB_DEAD) + return; + + /* + * Remove it from @fs_supers so it isn't found by new + * sget{_fc}() walkers anymore. Any concurrent mounter still + * managing to grab a temporary reference is guaranteed to + * already see SB_DYING and will wait until we notify them about + * SB_DEAD. + */ + spin_lock(&sb_lock); + hlist_del_init(&sb->s_instances); + spin_unlock(&sb_lock); + + /* + * Let concurrent mounts know that this thing is really dead. + * We don't need @sb->s_umount here as every concurrent caller + * will see SB_DYING and either discard the superblock or wait + * for SB_DEAD. + */ + super_wake(sb, SB_DEAD); +} /** * deactivate_locked_super - drop an active reference to superblock @@ -453,6 +480,8 @@ void deactivate_locked_super(struct super_block *s) unregister_shrinker(&s->s_shrink); fs->kill_sb(s); + kill_super_notify(s); + /* * Since list_lru_destroy() may sleep, we cannot call it from * put_super(), where we hold the sb_lock. Therefore we destroy @@ -461,25 +490,6 @@ void deactivate_locked_super(struct super_block *s) list_lru_destroy(&s->s_dentry_lru); list_lru_destroy(&s->s_inode_lru); - /* - * Remove it from @fs_supers so it isn't found by new - * sget{_fc}() walkers anymore. Any concurrent mounter still - * managing to grab a temporary reference is guaranteed to - * already see SB_DYING and will wait until we notify them about - * SB_DEAD. - */ - spin_lock(&sb_lock); - hlist_del_init(&s->s_instances); - spin_unlock(&sb_lock); - - /* - * Let concurrent mounts know that this thing is really dead. - * We don't need @sb->s_umount here as every concurrent caller - * will see SB_DYING and either discard the superblock or wait - * for SB_DEAD. - */ - super_wake(s, SB_DEAD); - put_filesystem(fs); put_super(s); } else { @@ -1260,6 +1270,7 @@ void kill_anon_super(struct super_block *sb) { dev_t dev = sb->s_dev; generic_shutdown_super(sb); + kill_super_notify(sb); free_anon_bdev(dev); } EXPORT_SYMBOL(kill_anon_super);
For keyed filesystems that recycle superblocks based on s_fs_info or information contained therein s_fs_info must be kept as long as the superblock is on the filesystem type super list. This isn't guaranteed as s_fs_info will be freed latest in sb->kill_sb(). The fix is simply to perform notification and list removal in kill_anon_super(). Any filesystem needs to free s_fs_info after they call the kill_*() helpers. If they don't they risk use-after-free right now so fixing it here is guaranteed that s_fs_info remain valid. For block backed filesystems notifying in pass sb->kill_sb() in deactivate_locked_super() remains unproblematic and is required because multiple other block devices can be shut down after kill_block_super() has been called from a filesystem's sb->kill_sb() handler. For example, ext4 and xfs close additional devices. Block based filesystems don't depend on s_fs_info (btrfs does use s_fs_info but also uses kill_anon_super() and not kill_block_super().). Sorry for that braino. Goal should be to unify this behavior during this cycle obviously. But let's please do a simple bugfix now. Fixes: 2c18a63b760a ("super: wait until we passed kill super") Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> --- fs/super.c | 49 ++++++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 19 deletions(-)