[2/2] proc: Protect readers of /proc/mounts from remount
diff mbox series

Message ID 20181211172423.7709-3-jack@suse.cz
State New
Headers show
Series
  • vfs: Fix crashes when reading /proc/mounts
Related show

Commit Message

Jan Kara Dec. 11, 2018, 5:24 p.m. UTC
Readers of /proc/mounts (and similar files) are currently unprotected
from concurrently running remount on the filesystem they are reporting.
This can not only result in bogus reported information but also in
confusion in filesystems ->show_options callbacks resulting in
use-after-free issues or similar (for example ext4 handling of quota
file names is prone to such races).

Fix the problem by protecting showing of mount information with
sb->s_umount semaphore.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/mount.h     |  1 +
 fs/namespace.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 4 deletions(-)

Comments

Al Viro Dec. 11, 2018, 6:36 p.m. UTC | #1
On Tue, Dec 11, 2018 at 06:24:23PM +0100, Jan Kara wrote:
> Readers of /proc/mounts (and similar files) are currently unprotected
> from concurrently running remount on the filesystem they are reporting.
> This can not only result in bogus reported information but also in
> confusion in filesystems ->show_options callbacks resulting in
> use-after-free issues or similar (for example ext4 handling of quota
> file names is prone to such races).
> 
> Fix the problem by protecting showing of mount information with
> sb->s_umount semaphore.

Umm...  Which tree is it against and what exactly does your hold_sb() do?
Al Viro Dec. 11, 2018, 6:37 p.m. UTC | #2
On Tue, Dec 11, 2018 at 06:36:19PM +0000, Al Viro wrote:
> On Tue, Dec 11, 2018 at 06:24:23PM +0100, Jan Kara wrote:
> > Readers of /proc/mounts (and similar files) are currently unprotected
> > from concurrently running remount on the filesystem they are reporting.
> > This can not only result in bogus reported information but also in
> > confusion in filesystems ->show_options callbacks resulting in
> > use-after-free issues or similar (for example ext4 handling of quota
> > file names is prone to such races).
> > 
> > Fix the problem by protecting showing of mount information with
> > sb->s_umount semaphore.
> 
> Umm...  Which tree is it against and what exactly does your hold_sb() do?

D'oh... I need more coffee.  Nevermind.
Al Viro Dec. 11, 2018, 6:58 p.m. UTC | #3
On Tue, Dec 11, 2018 at 06:24:23PM +0100, Jan Kara wrote:
> Readers of /proc/mounts (and similar files) are currently unprotected
> from concurrently running remount on the filesystem they are reporting.
> This can not only result in bogus reported information but also in
> confusion in filesystems ->show_options callbacks resulting in
> use-after-free issues or similar (for example ext4 handling of quota
> file names is prone to such races).
> 
> Fix the problem by protecting showing of mount information with
> sb->s_umount semaphore.

> +static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
> +{
> +	if (p->locked_sb == sb)
> +		return true;
> +	if (p->locked_sb) {
> +		drop_super(p->locked_sb);
> +		p->locked_sb = NULL;
> +	}
> +	if (down_read_trylock(&sb->s_umount)) {
> +		hold_sb(sb);
> +		p->locked_sb = sb;
> +		return true;
> +	}
> +	return false;
> +}

Bad calling conventions, and you are paying for those with making
hold_sb() non-static (and having it, in the first place).

> +	if (mounts_trylock_super(p, sb))
> +		return p->cached_mount;
> +	/*
> +	 * Trylock failed. Since namepace_sem ranks below s_umount (through
> +	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
> +	 * have to drop it, wait for s_umount and then try again to guarantee
> +	 * forward progress.
> +	 */
> +	hold_sb(sb);

That.  Just hoist that hold_sb() into your trylock (and put it before the
down_read_trylock() there, while we are at it).  And turn the other caller
into
	if (unlikely(!.....))
		ret = -EAGAIN;
	else
		ret = p->show(m, &r->mnt);
followed by unconditional drop_super().  And I would probably go for
	mount_trylock_super(&p->locked_super, sb)
while we are at it, so that it's isolated from proc_mounts and can
be moved to fs/super.c


> +	up_read(&namespace_sem);
> +	down_read(&sb->s_umount);
> +	/*
> +	 * Sb may be dead by now but that just means we won't find it in any
> +	 * mount and drop lock & reference anyway.
> +	 */
> +	p->locked_sb = sb;
> +	goto restart;

No.
	if (likely(sb->s_root))
		p->locked_sb = sb;
	else
		drop_super(sb);
	goto restart;
Al Viro Dec. 11, 2018, 7:14 p.m. UTC | #4
On Tue, Dec 11, 2018 at 06:58:31PM +0000, Al Viro wrote:

> > +static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
> > +{
> > +	if (p->locked_sb == sb)
> > +		return true;
> > +	if (p->locked_sb) {
> > +		drop_super(p->locked_sb);
> > +		p->locked_sb = NULL;
> > +	}
> > +	if (down_read_trylock(&sb->s_umount)) {
> > +		hold_sb(sb);
> > +		p->locked_sb = sb;
> > +		return true;
> > +	}
> > +	return false;
> > +}
> 
> Bad calling conventions, and you are paying for those with making
> hold_sb() non-static (and having it, in the first place).
> 
> > +	if (mounts_trylock_super(p, sb))
> > +		return p->cached_mount;
> > +	/*
> > +	 * Trylock failed. Since namepace_sem ranks below s_umount (through
> > +	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
> > +	 * have to drop it, wait for s_umount and then try again to guarantee
> > +	 * forward progress.
> > +	 */
> > +	hold_sb(sb);
> 
> That.  Just hoist that hold_sb() into your trylock (and put it before the
> down_read_trylock() there, while we are at it).  And turn the other caller
> into
> 	if (unlikely(!.....))
> 		ret = -EAGAIN;
> 	else
> 		ret = p->show(m, &r->mnt);
> followed by unconditional drop_super().  And I would probably go for
> 	mount_trylock_super(&p->locked_super, sb)
> while we are at it, so that it's isolated from proc_mounts and can
> be moved to fs/super.c

Looking at it some more...  I still hate it ;-/  Take a look at traverse()
in fs/seq_file.c and think what kind of clusterfuck will it cause...
Jan Kara Dec. 12, 2018, 12:56 p.m. UTC | #5
On Tue 11-12-18 19:14:52, Al Viro wrote:
> On Tue, Dec 11, 2018 at 06:58:31PM +0000, Al Viro wrote:
> 
> > > +static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
> > > +{
> > > +	if (p->locked_sb == sb)
> > > +		return true;
> > > +	if (p->locked_sb) {
> > > +		drop_super(p->locked_sb);
> > > +		p->locked_sb = NULL;
> > > +	}
> > > +	if (down_read_trylock(&sb->s_umount)) {
> > > +		hold_sb(sb);
> > > +		p->locked_sb = sb;
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> > 
> > Bad calling conventions, and you are paying for those with making
> > hold_sb() non-static (and having it, in the first place).
> > 
> > > +	if (mounts_trylock_super(p, sb))
> > > +		return p->cached_mount;
> > > +	/*
> > > +	 * Trylock failed. Since namepace_sem ranks below s_umount (through
> > > +	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
> > > +	 * have to drop it, wait for s_umount and then try again to guarantee
> > > +	 * forward progress.
> > > +	 */
> > > +	hold_sb(sb);
> > 
> > That.  Just hoist that hold_sb() into your trylock (and put it before the
> > down_read_trylock() there, while we are at it).  And turn the other caller
> > into
> > 	if (unlikely(!.....))
> > 		ret = -EAGAIN;
> > 	else
> > 		ret = p->show(m, &r->mnt);
> > followed by unconditional drop_super().  And I would probably go for
> > 	mount_trylock_super(&p->locked_super, sb)
> > while we are at it, so that it's isolated from proc_mounts and can
> > be moved to fs/super.c
> 
> Looking at it some more...  I still hate it ;-/  Take a look at traverse()
> in fs/seq_file.c and think what kind of clusterfuck will it cause...

I guess you mean that in case we fail to lock s_umount semaphore, we'll
return -EAGAIN and traverse() will abort? That is true but since we return
-EAGAIN, callers will call into traverse() again - both do:

while ((err = traverse(m, *ppos)) == -EAGAIN) ;

and then in m_start() we will do the blocking lock on s_umount. I agree it
is ugly and twisted but it should be rare...

Now looking at the code, maybe we could avoid this weird retry dance with
traverse(). Something like following in m_show():

        sb = mnt->mnt_sb;
	if (mount_trylock_super())
		show and done
	get passive sb reference
	namespace_unlock();
	down_read(&sb->s_umount);
	namespace_lock();
	new_mnt = seq_list_start();
	if (new_mnt != mnt)
		retry
	show and done

This could be handled completely inside m_show() so no strange retry dance
with traverse(). Do you find this better?

								Honza

Patch
diff mbox series

diff --git a/fs/mount.h b/fs/mount.h
index f39bc9da4d73..5ca0a9e714b3 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -134,6 +134,7 @@  struct proc_mounts {
 	void *cached_mount;
 	u64 cached_event;
 	loff_t cached_index;
+	struct super_block *locked_sb;
 };
 
 extern const struct seq_operations mounts_op;
diff --git a/fs/namespace.c b/fs/namespace.c
index a7f91265ea67..706c802cb75b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1237,26 +1237,68 @@  struct vfsmount *mnt_clone_internal(const struct path *path)
 }
 
 #ifdef CONFIG_PROC_FS
+
+static bool mounts_trylock_super(struct proc_mounts *p, struct super_block *sb)
+{
+	if (p->locked_sb == sb)
+		return true;
+	if (p->locked_sb) {
+		drop_super(p->locked_sb);
+		p->locked_sb = NULL;
+	}
+	if (down_read_trylock(&sb->s_umount)) {
+		hold_sb(sb);
+		p->locked_sb = sb;
+		return true;
+	}
+	return false;
+}
+
 /* iterator; we want it to have access to namespace_sem, thus here... */
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
 	struct proc_mounts *p = m->private;
+	struct mount *mount;
+	struct super_block *sb;
 
+restart:
 	down_read(&namespace_sem);
 	if (p->cached_event == p->ns->event) {
 		void *v = p->cached_mount;
 		if (*pos == p->cached_index)
-			return v;
+			goto lock_sb;
 		if (*pos == p->cached_index + 1) {
 			v = seq_list_next(v, &p->ns->list, &p->cached_index);
-			return p->cached_mount = v;
+			p->cached_mount = v;
+			goto lock_sb;
 		}
 	}
 
 	p->cached_event = p->ns->event;
 	p->cached_mount = seq_list_start(&p->ns->list, *pos);
 	p->cached_index = *pos;
-	return p->cached_mount;
+lock_sb:
+	if (!p->cached_mount)
+		return NULL;
+	mount = list_entry(p->cached_mount, struct mount, mnt_list);
+	sb = mount->mnt.mnt_sb;
+	if (mounts_trylock_super(p, sb))
+		return p->cached_mount;
+	/*
+	 * Trylock failed. Since namepace_sem ranks below s_umount (through
+	 * sb->s_umount > dir->i_rwsem > namespace_sem in the mount path), we
+	 * have to drop it, wait for s_umount and then try again to guarantee
+	 * forward progress.
+	 */
+	hold_sb(sb);
+	up_read(&namespace_sem);
+	down_read(&sb->s_umount);
+	/*
+	 * Sb may be dead by now but that just means we won't find it in any
+	 * mount and drop lock & reference anyway.
+	 */
+	p->locked_sb = sb;
+	goto restart;
 }
 
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
@@ -1277,7 +1319,16 @@  static int m_show(struct seq_file *m, void *v)
 {
 	struct proc_mounts *p = m->private;
 	struct mount *r = list_entry(v, struct mount, mnt_list);
-	return p->show(m, &r->mnt);
+	struct super_block *sb = r->mnt.mnt_sb;
+	int ret;
+
+	/* Protect show function against racing remount */
+	if (!mounts_trylock_super(p, sb))
+		return -EAGAIN;
+	ret = p->show(m, &r->mnt);
+	drop_super(sb);
+	p->locked_sb = NULL;
+	return ret;
 }
 
 const struct seq_operations mounts_op = {