diff mbox

[RFC,2/3] vfs: add d_replace()

Message ID 527297d2dae27845b3b7ba8ad6e81ef477fddee2.1479802448.git.osandov@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Omar Sandoval Nov. 22, 2016, 8:25 a.m. UTC
From: Omar Sandoval <osandov@fb.com>

Changes the inode associated with a dentry. This'll be useful for
implementations of linkat() AT_REPLACE.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/dcache.c            | 68 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/dcache.h |  1 +
 2 files changed, 63 insertions(+), 6 deletions(-)

Comments

Al Viro Nov. 23, 2016, 3:40 a.m. UTC | #1
On Tue, Nov 22, 2016 at 12:25:02AM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Changes the inode associated with a dentry. This'll be useful for
> implementations of linkat() AT_REPLACE.

Hard NAK.  That violates all kinds of assumptions made by VFS and
filesystems alike; never, ever do that.  If you have a reference to
a positive dentry, inode should *NEVER* change.

If it unhashed the old dentry, created a new one and attached inode to
it, it _might_ have a chance.  I'm less than sure it's a good idea, but
it this form it's a non-starter.

Again,

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>

and don't bring it back in that form.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Omar Sandoval Nov. 23, 2016, 3:57 a.m. UTC | #2
On Wed, Nov 23, 2016 at 03:40:04AM +0000, Al Viro wrote:
> On Tue, Nov 22, 2016 at 12:25:02AM -0800, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Changes the inode associated with a dentry. This'll be useful for
> > implementations of linkat() AT_REPLACE.
> 
> Hard NAK.  That violates all kinds of assumptions made by VFS and
> filesystems alike; never, ever do that.  If you have a reference to
> a positive dentry, inode should *NEVER* change.

Okay, thanks for the sanity check.

> If it unhashed the old dentry, created a new one and attached inode to
> it, it _might_ have a chance.  I'm less than sure it's a good idea, but
> it this form it's a non-starter.

One thing I considered was having the filesystem unhash the dentry and
just letting the next lookup that comes along instantiate the new one.
Is that better or worse than doing something like your suggestion?

> Again,
> 
> NAKed-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> and don't bring it back in that form.
Linus Torvalds Nov. 23, 2016, 4:29 p.m. UTC | #3
On Tue, Nov 22, 2016 at 7:57 PM, Omar Sandoval <osandov@osandov.com> wrote:
> On Wed, Nov 23, 2016 at 03:40:04AM +0000, Al Viro wrote:
>
>> If it unhashed the old dentry, created a new one and attached inode to
>> it, it _might_ have a chance.  I'm less than sure it's a good idea, but
>> it this form it's a non-starter.
>
> One thing I considered was having the filesystem unhash the dentry and
> just letting the next lookup that comes along instantiate the new one.
> Is that better or worse than doing something like your suggestion?

I don't have the background for why you want this, but the two
approaches should be equivalent.

However, it's not a safe operation in the general case, since the
low-level filesystem may depend on the single unique dentry meaning
that operations on one particular filename are serialized and that the
dentry is unique, and your "unhash and create new" would leave old
users with a stale dentry that is no longer "unique" in that filename.
So you certainly cannot do even that kind of "d_replace()" in some
general situation.

An example of that kind of situation is the whole "d_in_lookup()"
where we use the dentry itself to guarantee uniqueness while possibly
looking up multiple entries in parallell in the same directory.

So for some particular filesystem, under some very particular
situations, such a d_replace() may be valid. But without seeing the
background, it's hard to tell. Apparently this was discussed on the
fsdevel list that google doesn't even index, and looking at the
fsdevel archives is a pain. Looks like it's AT_REPLACE for linkat().

In that context it superficially looks ok to me (ie it's
filesystem-controlled and done only when we've serialized the
directory for the link() operation anyway).  But I didn't think about
it _that_ much.

                   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Omar Sandoval Dec. 2, 2016, 1:10 a.m. UTC | #4
On Wed, Nov 23, 2016 at 08:29:57AM -0800, Linus Torvalds wrote:
> On Tue, Nov 22, 2016 at 7:57 PM, Omar Sandoval <osandov@osandov.com> wrote:
> > On Wed, Nov 23, 2016 at 03:40:04AM +0000, Al Viro wrote:
> >
> >> If it unhashed the old dentry, created a new one and attached inode to
> >> it, it _might_ have a chance.  I'm less than sure it's a good idea, but
> >> it this form it's a non-starter.
> >
> > One thing I considered was having the filesystem unhash the dentry and
> > just letting the next lookup that comes along instantiate the new one.
> > Is that better or worse than doing something like your suggestion?
> 
> I don't have the background for why you want this, but the two
> approaches should be equivalent.
> 
> However, it's not a safe operation in the general case, since the
> low-level filesystem may depend on the single unique dentry meaning
> that operations on one particular filename are serialized and that the
> dentry is unique, and your "unhash and create new" would leave old
> users with a stale dentry that is no longer "unique" in that filename.
> So you certainly cannot do even that kind of "d_replace()" in some
> general situation.
> 
> An example of that kind of situation is the whole "d_in_lookup()"
> where we use the dentry itself to guarantee uniqueness while possibly
> looking up multiple entries in parallell in the same directory.

Thanks, this was helpful. As you mentioned below, since this is for
linkat(), we're serialized on i_rwsem, so this particular case should be
fine. But if there's something that tries to serialize on the dentry
itself without holding i_rwsem at least shared, then this would
definitely be wrong. Does such a thing exist?

> So for some particular filesystem, under some very particular
> situations, such a d_replace() may be valid. But without seeing the
> background, it's hard to tell. Apparently this was discussed on the
> fsdevel list that google doesn't even index, and looking at the
> fsdevel archives is a pain. Looks like it's AT_REPLACE for linkat().

That's right, here's the thread:

[RFC PATCH 0/3] http://marc.info/?t=147980325700004&r=1&w=2
[RFC PATCH 1/3] http://marc.info/?t=147980325700006&r=1&w=2
[RFC PATCH 2/3] http://marc.info/?t=147980325700005&r=1&w=2
[RFC PATCH 3/3] http://marc.info/?t=147980325700007&r=1&w=2

(Man, I miss gmane.) I'll cc lkml on the next submission.

> In that context it superficially looks ok to me (ie it's
> filesystem-controlled and done only when we've serialized the
> directory for the link() operation anyway).  But I didn't think about
> it _that_ much.
> 
>                    Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 5c7cc95..aaa2b16 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -316,6 +316,16 @@  static void dentry_free(struct dentry *dentry)
 		call_rcu(&dentry->d_u.d_rcu, __d_free);
 }
 
+static void dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	if (!inode->i_nlink)
+		fsnotify_inoderemove(inode);
+	if (dentry->d_op && dentry->d_op->d_iput)
+		dentry->d_op->d_iput(dentry, inode);
+	else
+		iput(inode);
+}
+
 /*
  * Release the dentry's inode, using the filesystem
  * d_iput() operation if defined.
@@ -335,12 +345,7 @@  static void dentry_unlink_inode(struct dentry * dentry)
 		raw_write_seqcount_end(&dentry->d_seq);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&inode->i_lock);
-	if (!inode->i_nlink)
-		fsnotify_inoderemove(inode);
-	if (dentry->d_op && dentry->d_op->d_iput)
-		dentry->d_op->d_iput(dentry, inode);
-	else
-		iput(inode);
+	dentry_iput(dentry, inode);
 }
 
 /*
@@ -1816,6 +1821,24 @@  void d_instantiate(struct dentry *entry, struct inode * inode)
 }
 EXPORT_SYMBOL(d_instantiate);
 
+static void lock_two_inodes(struct inode *inode1, struct inode *inode2)
+{
+	if (inode1 > inode2)
+		swap(inode1, inode2);
+	if (inode1)
+		spin_lock(&inode1->i_lock);
+	if (inode2)
+		spin_lock(&inode2->i_lock);
+}
+
+static void unlock_two_inodes(struct inode *inode1, struct inode *inode2)
+{
+	if (inode1)
+		spin_unlock(&inode1->i_lock);
+	if (inode2)
+		spin_unlock(&inode2->i_lock);
+}
+
 /**
  * d_instantiate_no_diralias - instantiate a non-aliased dentry
  * @entry: dentry to complete
@@ -2339,6 +2362,39 @@  void d_delete(struct dentry * dentry)
 }
 EXPORT_SYMBOL(d_delete);
 
+/**
+ * d_replace - change the inode a dentry is associated with
+ * @dentry: dentry to modify
+ * @inode: inode to attach to this dentry
+ *
+ * Fill in new inode information in a dentry that may have previously been
+ * instantiated. This handles both negative and positive dentries.
+ */
+void d_replace(struct dentry *dentry, struct inode *inode)
+{
+	struct inode *old_inode = dentry->d_inode;
+	unsigned int add_flags;
+
+	lock_two_inodes(old_inode, inode);
+	spin_lock(&dentry->d_lock);
+	add_flags = d_flags_for_inode(inode);
+
+	if (old_inode)
+		hlist_del(&dentry->d_u.d_alias);
+	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
+
+	raw_write_seqcount_begin(&dentry->d_seq);
+	__d_set_inode_and_type(dentry, inode, add_flags);
+	raw_write_seqcount_end(&dentry->d_seq);
+	fsnotify_update_flags(dentry);
+
+	spin_unlock(&dentry->d_lock);
+	unlock_two_inodes(old_inode, inode);
+	if (old_inode)
+		dentry_iput(dentry, old_inode);
+}
+EXPORT_SYMBOL(d_replace);
+
 static void __d_rehash(struct dentry *entry)
 {
 	struct hlist_bl_head *b = d_hash(entry->d_name.hash);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 5beed7b..0610bb0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -224,6 +224,7 @@  extern int d_instantiate_no_diralias(struct dentry *, struct inode *);
 extern void __d_drop(struct dentry *dentry);
 extern void d_drop(struct dentry *dentry);
 extern void d_delete(struct dentry *);
+extern void d_replace(struct dentry *, struct inode *);
 extern void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op);
 
 /* allocate/de-allocate */