diff mbox

races in ll_splice_alias()

Message ID 20160308211148.GX17997@ZenIV.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Al Viro March 8, 2016, 9:11 p.m. UTC
On Tue, Mar 08, 2016 at 08:44:24PM +0000, Drokin, Oleg wrote:

> The links are hardlinks, right? (because otherwise they would not be
> parallel due to parent dir i_mutex held).

Yes.

> Hm, The race is a "safe" one, since if the alias have appeared, it does not break
> anything other than using up some RAM, I think?
> In fact what's to stop one appearing after we released the locking leading to the
> same situation?

Kinda-sorta.  It's safe unless a rename on server gets you see the same
directory in two places.  d_splice_alias() would've coped with that, this
code won't.

> > If so, how about adding d_hash_exact_alias(dentry) that would try to find
> > and hash an exact unhashed alias, so that this thing would
> > 	* call d_hash_exact_alias(dentry), if found - just return it
> > 	* ll_d_init(dentry)
> > 	* return d_splice_alias() ?: dentry
> > Do you see any problems with that?  Parent directory is locked, so no new
> > unhashed exact aliases should have a chance to appear and d_splice_alias()
> > would take care of everything else (correctness and detached ones).
> 
> This sounds fine on the surface. I think I remember there were some other
> complications with d_splice_alias.
> Andreas, do ou remember?

FWIW, a patch in my queue kills d_add_unique(), replacing it with
d_exact_alias() and d_splice_alias(); it could be used to implement
ll_splice_alias() as well (with code duplication gone *and* capable
of dealing with directories moving around), except for the odd
rules re inode refcount on error; it would've been easier if ll_splice_alias()
would always either inserted inode reference into a new dentry or dropped it.
I'm still trying to trace what does iput() in case of error in your current
code; I understand the one in do_statahead_enter(), but what does it in
ll_lookup_it_finish()?

Anyway, d_add_unique() removal (and switching its only caller to replacement)
follows:

replace d_add_unique() with saner primitive

new primitive: d_exact_alias(dentry, inode).  If there is an unhashed
dentry with the same name/parent and given inode, rehash, grab and
return it.  Otherwise, return NULL.  The only caller of d_add_unique()
switched to d_exact_alias() + d_splice_alias().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Drokin, Oleg March 8, 2016, 11:18 p.m. UTC | #1
On Mar 8, 2016, at 4:11 PM, Al Viro wrote:

> On Tue, Mar 08, 2016 at 08:44:24PM +0000, Drokin, Oleg wrote:
> 
>> The links are hardlinks, right? (because otherwise they would not be
>> parallel due to parent dir i_mutex held).
> 
> Yes.
> 
>> Hm, The race is a "safe" one, since if the alias have appeared, it does not break
>> anything other than using up some RAM, I think?
>> In fact what's to stop one appearing after we released the locking leading to the
>> same situation?
> 
> Kinda-sorta.  It's safe unless a rename on server gets you see the same
> directory in two places.  d_splice_alias() would've coped with that, this
> code won't.

Rename on server cannot get us to see the same directory in two places, or what's
the scenario?
In Lustre:
thread1: lookup a directory on the server.
         get a lock back
         instantiate a dentry (guarded by the lock)
         make a lock revocable (ll_lookup_finish_locks() in ll_lookup_it())
thread2: rename the directory moving it somewhere else
         server attempts to revoke the lock from thread1
         node that runs thread1 drops the lock and marks all dentries for that
              inode invalid
         server completes rename and releases the lock it holds
thread3: lookup the directory under new name on the server
         this can only return from server when the rename has completed and the
         dentry from thread1 marked invalid.

>>> If so, how about adding d_hash_exact_alias(dentry) that would try to find
>>> and hash an exact unhashed alias, so that this thing would
>>> 	* call d_hash_exact_alias(dentry), if found - just return it
>>> 	* ll_d_init(dentry)
>>> 	* return d_splice_alias() ?: dentry
>>> Do you see any problems with that?  Parent directory is locked, so no new
>>> unhashed exact aliases should have a chance to appear and d_splice_alias()
>>> would take care of everything else (correctness and detached ones).
>> 
>> This sounds fine on the surface. I think I remember there were some other
>> complications with d_splice_alias.
>> Andreas, do ou remember?
> 
> FWIW, a patch in my queue kills d_add_unique(), replacing it with
> d_exact_alias() and d_splice_alias(); it could be used to implement
> ll_splice_alias() as well (with code duplication gone *and* capable
> of dealing with directories moving around), except for the odd
> rules re inode refcount on error; it would've been easier if ll_splice_alias()

Ok, let me try my hand at that. Hopefully whatever complications are there would
show themselves right away too.

> would always either inserted inode reference into a new dentry or dropped it.
> I'm still trying to trace what does iput() in case of error in your current
> code; I understand the one in do_statahead_enter(), but what does it in
> ll_lookup_it_finish()?

You mean when ll_d_init() fails in ll_splice_alias()?
Hm… It appears that we are indeed leaking the inode in that case, thanks.
I'll try to address that too.
Hm, in fact this was almost noticed, but I guess nobody retested it after
fixing the initial crash we had with 7486bc06ab2c46d6957f0211d09bc549aaf9cc87

Thanks.

Bye,
    Oleg--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 2398f9f9..4d20bf5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1782,81 +1782,6 @@  void d_instantiate(struct dentry *entry, struct inode * inode)
 EXPORT_SYMBOL(d_instantiate);
 
 /**
- * d_instantiate_unique - instantiate a non-aliased dentry
- * @entry: dentry to instantiate
- * @inode: inode to attach to this dentry
- *
- * Fill in inode information in the entry. On success, it returns NULL.
- * If an unhashed alias of "entry" already exists, then we return the
- * aliased dentry instead and drop one reference to inode.
- *
- * Note that in order to avoid conflicts with rename() etc, the caller
- * had better be holding the parent directory semaphore.
- *
- * This also assumes that the inode count has been incremented
- * (or otherwise set) by the caller to indicate that it is now
- * in use by the dcache.
- */
-static struct dentry *__d_instantiate_unique(struct dentry *entry,
-					     struct inode *inode)
-{
-	struct dentry *alias;
-	int len = entry->d_name.len;
-	const char *name = entry->d_name.name;
-	unsigned int hash = entry->d_name.hash;
-
-	if (!inode) {
-		__d_instantiate(entry, NULL);
-		return NULL;
-	}
-
-	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
-		/*
-		 * Don't need alias->d_lock here, because aliases with
-		 * d_parent == entry->d_parent are not subject to name or
-		 * parent changes, because the parent inode i_mutex is held.
-		 */
-		if (alias->d_name.hash != hash)
-			continue;
-		if (alias->d_parent != entry->d_parent)
-			continue;
-		if (alias->d_name.len != len)
-			continue;
-		if (dentry_cmp(alias, name, len))
-			continue;
-		__dget(alias);
-		return alias;
-	}
-
-	__d_instantiate(entry, inode);
-	return NULL;
-}
-
-struct dentry *d_instantiate_unique(struct dentry *entry, struct inode *inode)
-{
-	struct dentry *result;
-
-	BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
-
-	if (inode)
-		spin_lock(&inode->i_lock);
-	result = __d_instantiate_unique(entry, inode);
-	if (inode)
-		spin_unlock(&inode->i_lock);
-
-	if (!result) {
-		security_d_instantiate(entry, inode);
-		return NULL;
-	}
-
-	BUG_ON(!d_unhashed(result));
-	iput(inode);
-	return result;
-}
-
-EXPORT_SYMBOL(d_instantiate_unique);
-
-/**
  * d_instantiate_no_diralias - instantiate a non-aliased dentry
  * @entry: dentry to complete
  * @inode: inode to attach to this dentry
@@ -2437,6 +2362,56 @@  void d_rehash(struct dentry * entry)
 EXPORT_SYMBOL(d_rehash);
 
 /**
+ * d_exact_alias - find and hash an exact unhashed alias
+ * @entry: dentry to add
+ * @inode: The inode to go with this dentry
+ *
+ * If an unhashed dentry with the same name/parent and desired
+ * inode already exists, hash and return it.  Otherwise, return
+ * NULL.
+ *
+ * Parent directory should be locked.
+ */
+struct dentry *d_exact_alias(struct dentry *entry, struct inode *inode)
+{
+	struct dentry *alias;
+	int len = entry->d_name.len;
+	const char *name = entry->d_name.name;
+	unsigned int hash = entry->d_name.hash;
+
+	spin_lock(&inode->i_lock);
+	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
+		/*
+		 * Don't need alias->d_lock here, because aliases with
+		 * d_parent == entry->d_parent are not subject to name or
+		 * parent changes, because the parent inode i_mutex is held.
+		 */
+		if (alias->d_name.hash != hash)
+			continue;
+		if (alias->d_parent != entry->d_parent)
+			continue;
+		if (alias->d_name.len != len)
+			continue;
+		if (dentry_cmp(alias, name, len))
+			continue;
+		spin_lock(&alias->d_lock);
+		if (!d_unhashed(alias)) {
+			spin_unlock(&alias->d_lock);
+			alias = NULL;
+		} else {
+			__dget_dlock(alias);
+			_d_rehash(alias);
+			spin_unlock(&alias->d_lock);
+		}
+		spin_unlock(&inode->i_lock);
+		return alias;
+	}
+	spin_unlock(&inode->i_lock);
+	return NULL;
+}
+EXPORT_SYMBOL(d_exact_alias);
+
+/**
  * dentry_update_name_case - update case insensitive dentry with a new name
  * @dentry: dentry to be updated
  * @name: new name
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4bfc33a..9a5d67f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2461,14 +2461,16 @@  static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
 
 	dentry = opendata->dentry;
 	if (d_really_is_negative(dentry)) {
-		/* FIXME: Is this d_drop() ever needed? */
+		struct dentry *alias;
 		d_drop(dentry);
-		dentry = d_add_unique(dentry, igrab(state->inode));
-		if (dentry == NULL) {
-			dentry = opendata->dentry;
-		} else if (dentry != ctx->dentry) {
+		alias = d_exact_alias(dentry, state->inode);
+		if (!alias)
+			alias = d_splice_alias(igrab(state->inode), dentry);
+		/* d_splice_alias() can't fail here - it's a non-directory */
+		if (alias) {
+			dentry = dget(alias);
 			dput(ctx->dentry);
-			ctx->dentry = dget(dentry);
+			ctx->dentry = dentry;
 		}
 		nfs_set_verifier(dentry,
 				nfs_save_change_attribute(d_inode(opendata->dir)));
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c4b5f4b..bda4ec5 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -246,6 +246,7 @@  extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_pseudo(struct super_block *, const struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_exact_alias(struct dentry *, struct inode *);
 extern struct dentry *d_find_any_alias(struct inode *inode);
 extern struct dentry * d_obtain_alias(struct inode *);
 extern struct dentry * d_obtain_root(struct inode *);
@@ -288,23 +289,6 @@  static inline void d_add(struct dentry *entry, struct inode *inode)
 	d_rehash(entry);
 }
 
-/**
- * d_add_unique - add dentry to hash queues without aliasing
- * @entry: dentry to add
- * @inode: The inode to attach to this dentry
- *
- * This adds the entry to the hash queues and initializes @inode.
- * The entry was actually filled in earlier during d_alloc().
- */
-static inline struct dentry *d_add_unique(struct dentry *entry, struct inode *inode)
-{
-	struct dentry *res;
-
-	res = d_instantiate_unique(entry, inode);
-	d_rehash(res != NULL ? res : entry);
-	return res;
-}
-
 extern void dentry_update_name_case(struct dentry *, struct qstr *);
 
 /* used for rename() and baskets */