From patchwork Tue Jan  3 04:00:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Al Viro <viro@ZenIV.linux.org.uk>
X-Patchwork-Id: 9494367
Return-Path: <linux-fsdevel-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	3011760413 for <patchwork-linux-fsdevel@patchwork.kernel.org>;
	Tue,  3 Jan 2017 04:01:00 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2223720246
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
	Tue,  3 Jan 2017 04:01:00 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 13024269E2; Tue,  3 Jan 2017 04:01:00 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8858B20246
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
	Tue,  3 Jan 2017 04:00:59 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756442AbdACEA6 (ORCPT
	<rfc822;patchwork-linux-fsdevel@patchwork.kernel.org>);
	Mon, 2 Jan 2017 23:00:58 -0500
Received: from zeniv.linux.org.uk ([195.92.253.2]:48996 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752258AbdACEA5 (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Mon, 2 Jan 2017 23:00:57 -0500
Received: from viro by ZenIV.linux.org.uk with local (Exim 4.87 #1 (Red Hat
	Linux)) id 1cOGHM-0003Y9-O1; Tue, 03 Jan 2017 04:00:52 +0000
Date: Tue, 3 Jan 2017 04:00:52 +0000
From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] Fix a race in put_mountpoint.
Message-ID: <20170103040052.GB1555@ZenIV.linux.org.uk>
References: <20161231041001.GA2448@templeofstupid.com>
	<20161231061729.GX1555@ZenIV.linux.org.uk>
	<874m1hdkyv.fsf@xmission.com>
	<20170103014806.GA1555@ZenIV.linux.org.uk>
	<87ful07ryd.fsf@xmission.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <87ful07ryd.fsf@xmission.com>
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Tue, Jan 03, 2017 at 04:17:14PM +1300, Eric W. Biederman wrote:
> Al Viro <viro@ZenIV.linux.org.uk> writes:
> 
> > On Tue, Jan 03, 2017 at 01:51:36PM +1300, Eric W. Biederman wrote:
> >
> >> The only significant thing I see is that you have not taken the
> >> mount_lock on the path where new_mountpoint adds the new struct
> >> mountpoint into the mountpoint hash table.
> >
> > Umm...  Point, but I really don't like that bouncing mount_lock up
> > and down there.  It's not going to cause any serious overhead,
> > but it just looks ugly... ;-/
> >
> > Let me think for a while...
> 
> The other possibility is to grab namespace_sem in mntput_no_expire
> around the call of umount_mnt.  That is the only path where
> put_mountpoint can be called where we are not holding namespace_sem.
> That works in the small but I haven't traced the callers of mntput and
> mntput_no_expire yet to see if it works in practice.

No, that's a really bad idea.  Final mntput should _not_ happen under
namespace_lock, but I don't want grabbing it in that place.

How about this instead:
---
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/fs/namespace.c b/fs/namespace.c
index b5b1259e064f..20fc797277f8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -742,29 +742,6 @@ static struct mountpoint *lookup_mountpoint(struct dentry *dentry)
 	return NULL;
 }
 
-static struct mountpoint *new_mountpoint(struct dentry *dentry)
-{
-	struct hlist_head *chain = mp_hash(dentry);
-	struct mountpoint *mp;
-	int ret;
-
-	mp = kmalloc(sizeof(struct mountpoint), GFP_KERNEL);
-	if (!mp)
-		return ERR_PTR(-ENOMEM);
-
-	ret = d_set_mounted(dentry);
-	if (ret) {
-		kfree(mp);
-		return ERR_PTR(ret);
-	}
-
-	mp->m_dentry = dentry;
-	mp->m_count = 1;
-	hlist_add_head(&mp->m_hash, chain);
-	INIT_HLIST_HEAD(&mp->m_list);
-	return mp;
-}
-
 static void put_mountpoint(struct mountpoint *mp)
 {
 	if (!--mp->m_count) {
@@ -1595,11 +1572,11 @@ void __detach_mounts(struct dentry *dentry)
 	struct mount *mnt;
 
 	namespace_lock();
+	lock_mount_hash();
 	mp = lookup_mountpoint(dentry);
 	if (IS_ERR_OR_NULL(mp))
 		goto out_unlock;
 
-	lock_mount_hash();
 	event++;
 	while (!hlist_empty(&mp->m_list)) {
 		mnt = hlist_entry(mp->m_list.first, struct mount, mnt_mp_list);
@@ -1609,9 +1586,9 @@ void __detach_mounts(struct dentry *dentry)
 		}
 		else umount_tree(mnt, UMOUNT_CONNECTED);
 	}
-	unlock_mount_hash();
 	put_mountpoint(mp);
 out_unlock:
+	unlock_mount_hash();
 	namespace_unlock();
 }
 
@@ -2027,8 +2004,11 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 
 static struct mountpoint *lock_mount(struct path *path)
 {
+	struct mountpoint *mp = NULL;
 	struct vfsmount *mnt;
 	struct dentry *dentry = path->dentry;
+	int err;
+
 retry:
 	inode_lock(dentry->d_inode);
 	if (unlikely(cant_mount(dentry))) {
@@ -2037,29 +2017,60 @@ static struct mountpoint *lock_mount(struct path *path)
 	}
 	namespace_lock();
 	mnt = lookup_mnt(path);
-	if (likely(!mnt)) {
-		struct mountpoint *mp = lookup_mountpoint(dentry);
-		if (!mp)
-			mp = new_mountpoint(dentry);
-		if (IS_ERR(mp)) {
+	if (unlikely(mnt)) {
+		namespace_unlock();
+		inode_unlock(path->dentry->d_inode);
+		path_put(path);
+		path->mnt = mnt;
+		dentry = path->dentry = dget(mnt->mnt_root);
+		goto retry;
+	}
+
+	/*
+	 * OK, we have namespace_lock held, nothing is overmounting
+	 * *path and inode of mountpoint to be is locked.
+	 */
+	if (likely(!d_mountpoint(dentry)))
+		mp = kmalloc(sizeof(struct mountpoint), GFP_KERNEL);
+	read_seqlock_excl(&mount_lock);
+	if (!mp && !d_mountpoint(dentry)) {
+		read_sequnlock_excl(&mount_lock);
+		mp = kmalloc(sizeof(struct mountpoint), GFP_KERNEL);
+		read_seqlock_excl(&mount_lock);
+	}
+	if (d_mountpoint(dentry)) {
+		kfree(mp);
+		mp = lookup_mountpoint(dentry);
+	} else {
+		if (unlikely(!mp)) {
+			read_sequnlock_excl(&mount_lock);
 			namespace_unlock();
 			inode_unlock(dentry->d_inode);
-			return mp;
+			return ERR_PTR(-ENOMEM);
 		}
-		return mp;
+		err = d_set_mounted(dentry);
+		if (unlikely(err)) {
+			kfree(mp);
+			read_sequnlock_excl(&mount_lock);
+			namespace_unlock();
+			inode_unlock(dentry->d_inode);
+			return ERR_PTR(err);
+		}
+		mp->m_dentry = dentry;
+		mp->m_count = 1;
+		hlist_add_head(&mp->m_hash, mp_hash(dentry));
+		INIT_HLIST_HEAD(&mp->m_list);
 	}
-	namespace_unlock();
-	inode_unlock(path->dentry->d_inode);
-	path_put(path);
-	path->mnt = mnt;
-	dentry = path->dentry = dget(mnt->mnt_root);
-	goto retry;
+	read_sequnlock_excl(&mount_lock);
+	return mp;
 }
 
 static void unlock_mount(struct mountpoint *where)
 {
 	struct dentry *dentry = where->m_dentry;
+	read_seqlock_excl(&mount_lock);
 	put_mountpoint(where);
+	read_sequnlock_excl(&mount_lock);
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 }
@@ -3137,7 +3148,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	list_del_init(&new_mnt->mnt_expire);
 	unlock_mount_hash();
 	chroot_fs_refs(&root, &new);
+	read_seqlock_excl(&mount_lock);
 	put_mountpoint(root_mp);
+	read_sequnlock_excl(&mount_lock);
 	error = 0;
 out4:
 	unlock_mount(old_mp);