Message ID: 20181211172423.7709-3-jack@suse.cz (mailing list archive)
State: New, archived
Series: vfs: Fix crashes when reading /proc/mounts
On Tue, Dec 11, 2018 at 06:24:23PM +0100, Jan Kara wrote:
> Readers of /proc/mounts (and similar files) are currently unprotected
> from concurrently running remount on the filesystem they are reporting.
> This can not only result in bogus reported information but also in
> confusion in filesystems ->show_options callbacks resulting in
> use-after-free issues or similar (for example ext4 handling of quota
> file names is prone to such races).
>
> Fix the problem by protecting showing of mount information with
> sb->s_umount semaphore.

Umm... Which tree is it against and what exactly does your hold_sb() do?
On Tue, Dec 11, 2018 at 06:36:19PM +0000, Al Viro wrote:
> On Tue, Dec 11, 2018 at 06:24:23PM +0100, Jan Kara wrote:
> > Readers of /proc/mounts (and similar files) are currently unprotected
> > from concurrently running remount on the filesystem they are reporting.
> > This can not only result in bogus reported information but also in
> > confusion in filesystems ->show_options callbacks resulting in
> > use-after-free issues or similar (for example ext4 handling of quota
> > file names is prone to such races).
> >
> > Fix the problem by protecting showing of mount information with
> > sb->s_umount semaphore.
>
> Umm... Which tree is it against and what exactly does your hold_sb() do?

D'oh... I need more coffee. Nevermind.
On Tue, Dec 11, 2018 at 06:24:23PM +0100, Jan Kara wrote:
> Readers of /proc/mounts (and similar files) are currently unprotected
> from concurrently running remount on the filesystem they are reporting.
> This can not only result in bogus reported information but also in
> confusion in filesystems ->show_options callbacks resulting in
> use-after-free issues or similar (for example ext4 handling of quota
> file names is prone to such races).
>
> Fix the problem by protecting showing of mount information with
> sb->s_umount semaphore.

> +static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
> +{
> +	if (p->locked_sb == sb)
> +		return true;
> +	if (p->locked_sb) {
> +		drop_super(p->locked_sb);
> +		p->locked_sb = NULL;
> +	}
> +	if (down_read_trylock(&sb->s_umount)) {
> +		hold_sb(sb);
> +		p->locked_sb = sb;
> +		return true;
> +	}
> +	return false;
> +}

Bad calling conventions, and you are paying for those with making
hold_sb() non-static (and having it, in the first place).

> +	if (mounts_trylock_super(p, sb))
> +		return p->cached_mount;
> +	/*
> +	 * Trylock failed. Since namespace_sem ranks below s_umount (through
> +	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
> +	 * have to drop it, wait for s_umount and then try again to guarantee
> +	 * forward progress.
> +	 */
> +	hold_sb(sb);

That. Just hoist that hold_sb() into your trylock (and put it before the
down_read_trylock() there, while we are at it). And turn the other caller
into
	if (unlikely(!.....))
		ret = -EAGAIN;
	else
		ret = p->show(m, &r->mnt);
followed by unconditional drop_super(). And I would probably go for
	mount_trylock_super(&p->locked_super, sb)
while we are at it, so that it's isolated from proc_mounts and can
be moved to fs/super.c

> +	up_read(&namespace_sem);
> +	down_read(&sb->s_umount);
> +	/*
> +	 * Sb may be dead by now but that just means we won't find it in any
> +	 * mount and drop lock & reference anyway.
> +	 */
> +	p->locked_sb = sb;
> +	goto restart;

No.
	if (likely(sb->s_root))
		p->locked_sb = sb;
	else
		drop_super(sb);
	goto restart;
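As an aside for readers following the archive: the trylock-or-drop-and-restart dance under review can be condensed into a few lines of userspace C. This is a hypothetical single-threaded sketch (the names `toy_lock`, `try_read_lock` and `lock_sb_or_restart` are invented; the kernel rwsems are reduced to flags and "blocking" is simulated by assuming the writer finishes), meant only to show the control flow dictated by the namespace_sem < s_umount lock order:

```c
#include <stdbool.h>

/* Each "lock" is reduced to a writer-busy flag. */
struct toy_lock { bool write_locked; };

/* Models down_read_trylock(): readers get in unless a writer holds it. */
static bool try_read_lock(struct toy_lock *l)
{
	return !l->write_locked;
}

/*
 * Models the m_start() slow path from the patch: while holding the
 * lower-ranked namespace_sem we may only *try* the higher-ranked
 * s_umount; if that fails we must drop namespace_sem first, then
 * block on s_umount (simulated here by waiting out the writer), and
 * tell the caller to restart the walk from scratch.
 */
static bool lock_sb_or_restart(bool *ns_held, struct toy_lock *umount)
{
	if (try_read_lock(umount))
		return true;              /* fast path: proceed under both locks */
	*ns_held = false;                 /* up_read(&namespace_sem) */
	umount->write_locked = false;     /* "block" until the remount finishes */
	return false;                     /* goto restart with s_umount now held */
}
```

The key property the sketch preserves is that the lower-ranked lock is never held while sleeping on the higher-ranked one.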
On Tue, Dec 11, 2018 at 06:58:31PM +0000, Al Viro wrote:
> > +static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
> > +{
> > +	if (p->locked_sb == sb)
> > +		return true;
> > +	if (p->locked_sb) {
> > +		drop_super(p->locked_sb);
> > +		p->locked_sb = NULL;
> > +	}
> > +	if (down_read_trylock(&sb->s_umount)) {
> > +		hold_sb(sb);
> > +		p->locked_sb = sb;
> > +		return true;
> > +	}
> > +	return false;
> > +}
>
> Bad calling conventions, and you are paying for those with making
> hold_sb() non-static (and having it, in the first place).
>
> > +	if (mounts_trylock_super(p, sb))
> > +		return p->cached_mount;
> > +	/*
> > +	 * Trylock failed. Since namespace_sem ranks below s_umount (through
> > +	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
> > +	 * have to drop it, wait for s_umount and then try again to guarantee
> > +	 * forward progress.
> > +	 */
> > +	hold_sb(sb);
>
> That. Just hoist that hold_sb() into your trylock (and put it before the
> down_read_trylock() there, while we are at it). And turn the other caller
> into
> 	if (unlikely(!.....))
> 		ret = -EAGAIN;
> 	else
> 		ret = p->show(m, &r->mnt);
> followed by unconditional drop_super(). And I would probably go for
> 	mount_trylock_super(&p->locked_super, sb)
> while we are at it, so that it's isolated from proc_mounts and can
> be moved to fs/super.c

Looking at it some more... I still hate it ;-/  Take a look at traverse()
in fs/seq_file.c and think what kind of clusterfuck will it cause...
On Tue 11-12-18 19:14:52, Al Viro wrote:
> On Tue, Dec 11, 2018 at 06:58:31PM +0000, Al Viro wrote:
> > > +static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
> > > +{
> > > +	if (p->locked_sb == sb)
> > > +		return true;
> > > +	if (p->locked_sb) {
> > > +		drop_super(p->locked_sb);
> > > +		p->locked_sb = NULL;
> > > +	}
> > > +	if (down_read_trylock(&sb->s_umount)) {
> > > +		hold_sb(sb);
> > > +		p->locked_sb = sb;
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> >
> > Bad calling conventions, and you are paying for those with making
> > hold_sb() non-static (and having it, in the first place).
> >
> > > +	if (mounts_trylock_super(p, sb))
> > > +		return p->cached_mount;
> > > +	/*
> > > +	 * Trylock failed. Since namespace_sem ranks below s_umount (through
> > > +	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
> > > +	 * have to drop it, wait for s_umount and then try again to guarantee
> > > +	 * forward progress.
> > > +	 */
> > > +	hold_sb(sb);
> >
> > That. Just hoist that hold_sb() into your trylock (and put it before the
> > down_read_trylock() there, while we are at it). And turn the other caller
> > into
> > 	if (unlikely(!.....))
> > 		ret = -EAGAIN;
> > 	else
> > 		ret = p->show(m, &r->mnt);
> > followed by unconditional drop_super(). And I would probably go for
> > 	mount_trylock_super(&p->locked_super, sb)
> > while we are at it, so that it's isolated from proc_mounts and can
> > be moved to fs/super.c
>
> Looking at it some more... I still hate it ;-/  Take a look at traverse()
> in fs/seq_file.c and think what kind of clusterfuck will it cause...

I guess you mean that in case we fail to lock the s_umount semaphore, we'll
return -EAGAIN and traverse() will abort? That is true, but since we return
-EAGAIN, callers will call into traverse() again - both do:

	while ((err = traverse(m, *ppos)) == -EAGAIN)
		;

and then in m_start() we will do the blocking lock on s_umount.

I agree it is ugly and twisted but it should be rare... Now looking at the
code, maybe we could avoid this weird retry dance with traverse().
Something like the following in m_show():

	sb = mnt->mnt_sb;
	if (mount_trylock_super())
		show and done
	get passive sb reference
	namespace_unlock();
	down_read(&sb->s_umount);
	namespace_lock();
	new_mnt = seq_list_start();
	if (new_mnt != mnt)
		retry
	show and done

This could be handled completely inside m_show() so no strange retry dance
with traverse(). Do you find this better?

								Honza
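The retry contract Jan describes — m_show() returning -EAGAIN, traverse() aborting, and seq_read()/seq_lseek() looping until a restarted m_start() takes the lock blockingly — can be condensed into a toy model. Everything below (`toy_seq`, `toy_traverse`, `toy_read`) is a hypothetical userspace sketch of that control flow, not the real fs/seq_file.c code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_seq {
	bool sb_contended;   /* models a remount holding sb->s_umount */
	void *locked_sb;     /* models p->locked_sb carried across passes */
	int passes;          /* how many times traverse() ran */
};

/* Models one traverse() pass over the mounts seq_file. */
static int toy_traverse(struct toy_seq *p)
{
	p->passes++;
	if (!p->locked_sb && p->sb_contended) {
		/*
		 * m_show() failed the trylock and returned -EAGAIN; on the
		 * restarted pass, m_start() takes s_umount blockingly and
		 * remembers it in locked_sb, so the retry cannot fail the
		 * same way - that is the forward-progress guarantee.
		 */
		p->sb_contended = false;
		p->locked_sb = (void *)1;
		return -EAGAIN;
	}
	return 0;            /* ->show() succeeded */
}

/* Models the caller-side loop in seq_read()/seq_lseek(). */
static int toy_read(struct toy_seq *p)
{
	int err;

	while ((err = toy_traverse(p)) == -EAGAIN)
		;
	return err;
}
```

The point the model makes is that the -EAGAIN loop terminates after one retry, because the state saved on the failing pass guarantees the next pass holds the lock.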
diff --git a/fs/mount.h b/fs/mount.h
index f39bc9da4d73..5ca0a9e714b3 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -134,6 +134,7 @@ struct proc_mounts {
 	void *cached_mount;
 	u64 cached_event;
 	loff_t cached_index;
+	struct super_block *locked_sb;
 };
 
 extern const struct seq_operations mounts_op;
diff --git a/fs/namespace.c b/fs/namespace.c
index a7f91265ea67..706c802cb75b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1237,26 +1237,68 @@ struct vfsmount *mnt_clone_internal(const struct path *path)
 }
 
 #ifdef CONFIG_PROC_FS
+
+static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
+{
+	if (p->locked_sb == sb)
+		return true;
+	if (p->locked_sb) {
+		drop_super(p->locked_sb);
+		p->locked_sb = NULL;
+	}
+	if (down_read_trylock(&sb->s_umount)) {
+		hold_sb(sb);
+		p->locked_sb = sb;
+		return true;
+	}
+	return false;
+}
+
 /* iterator; we want it to have access to namespace_sem, thus here... */
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
 	struct proc_mounts *p = m->private;
+	struct mount *mount;
+	struct super_block *sb;
 
+restart:
 	down_read(&namespace_sem);
 	if (p->cached_event == p->ns->event) {
 		void *v = p->cached_mount;
 		if (*pos == p->cached_index)
-			return v;
+			goto lock_sb;
 		if (*pos == p->cached_index + 1) {
 			v = seq_list_next(v, &p->ns->list, &p->cached_index);
-			return p->cached_mount = v;
+			p->cached_mount = v;
+			goto lock_sb;
 		}
 	}
 
 	p->cached_event = p->ns->event;
 	p->cached_mount = seq_list_start(&p->ns->list, *pos);
 	p->cached_index = *pos;
-	return p->cached_mount;
+lock_sb:
+	if (!p->cached_mount)
+		return NULL;
+	mount = list_entry(p->cached_mount, struct mount, mnt_list);
+	sb = mount->mnt.mnt_sb;
+	if (mounts_trylock_super(p, sb))
+		return p->cached_mount;
+	/*
+	 * Trylock failed. Since namespace_sem ranks below s_umount (through
+	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
+	 * have to drop it, wait for s_umount and then try again to guarantee
+	 * forward progress.
+	 */
+	hold_sb(sb);
+	up_read(&namespace_sem);
+	down_read(&sb->s_umount);
+	/*
+	 * Sb may be dead by now but that just means we won't find it in any
+	 * mount and drop lock & reference anyway.
+	 */
+	p->locked_sb = sb;
+	goto restart;
 }
 
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
@@ -1277,7 +1319,16 @@ static int m_show(struct seq_file *m, void *v)
 {
 	struct proc_mounts *p = m->private;
 	struct mount *r = list_entry(v, struct mount, mnt_list);
-	return p->show(m, &r->mnt);
+	struct super_block *sb = r->mnt.mnt_sb;
+	int ret;
+
+	/* Protect show function against racing remount */
+	if (!mounts_trylock_super(p, sb))
+		return -EAGAIN;
+	ret = p->show(m, &r->mnt);
+	drop_super(sb);
+	p->locked_sb = NULL;
+	return ret;
 }
 
 const struct seq_operations mounts_op = {
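Al's suggested calling convention — taking the reference *before* the trylock, so the helper is self-contained enough to move into fs/super.c — might look roughly like the following toy userspace model. All of it is invented stand-in code (`toy_sb` with plain `s_count`, `umount_busy` and `read_locked` fields replaces the real super_block refcount and s_umount rwsem), sketching the convention rather than the kernel implementation:

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_sb {
	int s_count;         /* models the superblock reference count */
	bool umount_busy;    /* models a writer holding s_umount */
	bool read_locked;    /* models holding s_umount for read */
};

static void hold_sb(struct toy_sb *sb)
{
	sb->s_count++;
}

static void drop_super(struct toy_sb *sb)
{
	/* models up_read(&sb->s_umount) plus dropping the reference */
	sb->read_locked = false;
	sb->s_count--;
}

/*
 * Sketch of mount_trylock_super(&p->locked_sb, sb): the reference is
 * taken before the trylock, so on failure the caller already holds it
 * and can block on the lock itself without a separate hold_sb().
 */
static bool mount_trylock_super(struct toy_sb **locked, struct toy_sb *sb)
{
	if (*locked == sb)
		return true;              /* already locked from the last cycle */
	if (*locked) {
		drop_super(*locked);
		*locked = NULL;
	}
	hold_sb(sb);                      /* reference first, per the review */
	if (!sb->umount_busy) {           /* down_read_trylock(&sb->s_umount) */
		sb->read_locked = true;
		*locked = sb;
		return true;
	}
	return false;                     /* ref retained; caller blocks/retries */
}
```

Note how the helper now touches nothing but the `locked` slot and the superblock, which is what makes it independent of proc_mounts.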
Readers of /proc/mounts (and similar files) are currently unprotected
from concurrently running remount on the filesystem they are reporting.
This can not only result in bogus reported information but also in
confusion in filesystems ->show_options callbacks resulting in
use-after-free issues or similar (for example ext4 handling of quota
file names is prone to such races).

Fix the problem by protecting showing of mount information with
sb->s_umount semaphore.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/mount.h     |  1 +
 fs/namespace.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 4 deletions(-)