diff mbox series

[v2] fs: change test in inode_insert5 for adding to the sb list

Message ID 20220511165339.85614-1-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v2] fs: change test in inode_insert5 for adding to the sb list | expand

Commit Message

Jeff Layton May 11, 2022, 4:53 p.m. UTC
The inode_insert5 currently looks at I_CREATING to decide whether to
insert the inode into the sb list. This test is a bit ambiguous though
as I_CREATING state is not directly related to that list.

This test is also problematic for some upcoming ceph changes to add
fscrypt support. We need to be able to allocate an inode using new_inode
and insert it into the hash later if we end up using it, and doing that
now means that we double add it and corrupt the list.

What we really want to know in this test is whether the inode is already
in its superblock list, and then add it if it isn't. Have it test for
list_empty instead and ensure that we always initialize the list by
doing it in inode_init_once. It's only ever removed from the list with
list_del_init, so that should be sufficient.

There doesn't seem to be any need to hold the inode_hash_lock for this
operation either, so drop that before adding to to the list.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/inode.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

A small revision to the patch that I sent as part of the ceph+fscrypt
series. I didn't see any need to hold the inode_hash_lock when adding
the inode to the sb list, so do that outside the lock. I also revised
the comment to be more clear.

Al, I'm planning to merge this via the ceph tree since I have other
patches that depend on it. Let me know if you'd rather take this via
your tree instead.

Comments

Christoph Hellwig May 12, 2022, 6:21 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Dave Chinner May 12, 2022, 2:21 p.m. UTC | #2
On Wed, May 11, 2022 at 12:53:39PM -0400, Jeff Layton wrote:
> The inode_insert5 currently looks at I_CREATING to decide whether to
> insert the inode into the sb list. This test is a bit ambiguous though
> as I_CREATING state is not directly related to that list.
> 
> This test is also problematic for some upcoming ceph changes to add
> fscrypt support. We need to be able to allocate an inode using new_inode
> and insert it into the hash later if we end up using it, and doing that
> now means that we double add it and corrupt the list.
> 
> What we really want to know in this test is whether the inode is already
> in its superblock list, and then add it if it isn't. Have it test for
> list_empty instead and ensure that we always initialize the list by
> doing it in inode_init_once. It's only ever removed from the list with
> list_del_init, so that should be sufficient.
> 
> There doesn't seem to be any need to hold the inode_hash_lock for this
> operation either, so drop that before adding to to the list.
> 
> Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/inode.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> A small revision to the patch that I sent as part of the ceph+fscrypt
> series. I didn't see any need to hold the inode_hash_lock when adding
> the inode to the sb list, so do that outside the lock. I also revised
> the comment to be more clear.

I'm not sure that's valid. The moment the hash lock is dropped,
other lookups can find this inode in the cache. Hence there's no
guarantee that someone else isn't already accessing the inode
once the hash lock is dropped. Hence it's not clear to me that this
is a safe modification to make.

Given that we already do the list add under the hash lock, I don't
see any real gain to removing it from that context and it isn't
necessary to address the problem.

If you are concerned about reducing inode_has_lock contention, then
the answer to that is to convert the global lock to bit list locks
as the dentry cache uses. I wrote a patch a while back to do this:

https://lore.kernel.org/linux-fsdevel/20210406123343.1739669-1-david@fromorbit.com/

Hence at this stage, I prefer the original version that doesn't
change locking because there's much less risk associated with it.

Cheers,

DAve.
Jeff Layton May 12, 2022, 3:51 p.m. UTC | #3
On Fri, 2022-05-13 at 00:21 +1000, Dave Chinner wrote:
> On Wed, May 11, 2022 at 12:53:39PM -0400, Jeff Layton wrote:
> > The inode_insert5 currently looks at I_CREATING to decide whether to
> > insert the inode into the sb list. This test is a bit ambiguous though
> > as I_CREATING state is not directly related to that list.
> > 
> > This test is also problematic for some upcoming ceph changes to add
> > fscrypt support. We need to be able to allocate an inode using new_inode
> > and insert it into the hash later if we end up using it, and doing that
> > now means that we double add it and corrupt the list.
> > 
> > What we really want to know in this test is whether the inode is already
> > in its superblock list, and then add it if it isn't. Have it test for
> > list_empty instead and ensure that we always initialize the list by
> > doing it in inode_init_once. It's only ever removed from the list with
> > list_del_init, so that should be sufficient.
> > 
> > There doesn't seem to be any need to hold the inode_hash_lock for this
> > operation either, so drop that before adding to to the list.
> > 
> > Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> > ---
> >  fs/inode.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > A small revision to the patch that I sent as part of the ceph+fscrypt
> > series. I didn't see any need to hold the inode_hash_lock when adding
> > the inode to the sb list, so do that outside the lock. I also revised
> > the comment to be more clear.
> 
> I'm not sure that's valid. The moment the hash lock is dropped,
> other lookups can find this inode in the cache. Hence there's no
> guarantee that someone else isn't already accessing the inode
> once the hash lock is dropped. Hence it's not clear to me that this
> is a safe modification to make.
> 
> Given that we already do the list add under the hash lock, I don't
> see any real gain to removing it from that context and it isn't
> necessary to address the problem.
> 
> If you are concerned about reducing inode_has_lock contention, then
> the answer to that is to convert the global lock to bit list locks
> as the dentry cache uses. I wrote a patch a while back to do this:
> 
> https://lore.kernel.org/linux-fsdevel/20210406123343.1739669-1-david@fromorbit.com/
> 
> Hence at this stage, I prefer the original version that doesn't
> change locking because there's much less risk associated with it.
> 

I didn't see anything that cares about the i_sb_list while
simultaneously holding the inode_hash_lock. It gets removed from the
list in evict, but we hold a reference to it at this point, so
nothingThe thing also has I_NEW set on it so there had better not be
anyone.

That said, iget_locked also holds it while adding it to the list and
you're correct that it's not required to fix the problem. I'll plan to
revert it to the v1 version. We can always experiment with moving it
outside the lock later if we think that's useful.

Christoph, are you OK with the original version of that patch as well? I
can resend it again if you missed it the first time.

Thanks,
Jeff Layton June 27, 2022, 11:15 a.m. UTC | #4
On Wed, 2022-05-11 at 12:53 -0400, Jeff Layton wrote:
> The inode_insert5 currently looks at I_CREATING to decide whether to
> insert the inode into the sb list. This test is a bit ambiguous though
> as I_CREATING state is not directly related to that list.
> 
> This test is also problematic for some upcoming ceph changes to add
> fscrypt support. We need to be able to allocate an inode using new_inode
> and insert it into the hash later if we end up using it, and doing that
> now means that we double add it and corrupt the list.
> 
> What we really want to know in this test is whether the inode is already
> in its superblock list, and then add it if it isn't. Have it test for
> list_empty instead and ensure that we always initialize the list by
> doing it in inode_init_once. It's only ever removed from the list with
> list_del_init, so that should be sufficient.
> 
> There doesn't seem to be any need to hold the inode_hash_lock for this
> operation either, so drop that before adding to to the list.
> 
> Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/inode.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> A small revision to the patch that I sent as part of the ceph+fscrypt
> series. I didn't see any need to hold the inode_hash_lock when adding
> the inode to the sb list, so do that outside the lock. I also revised
> the comment to be more clear.
> 
> Al, I'm planning to merge this via the ceph tree since I have other
> patches that depend on it. Let me know if you'd rather take this via
> your tree instead.
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d9b422504d1..9d429247a4f0 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -422,6 +422,7 @@ void inode_init_once(struct inode *inode)
>  	INIT_LIST_HEAD(&inode->i_io_list);
>  	INIT_LIST_HEAD(&inode->i_wb_list);
>  	INIT_LIST_HEAD(&inode->i_lru);
> +	INIT_LIST_HEAD(&inode->i_sb_list);
>  	__address_space_init_once(&inode->i_data);
>  	i_size_ordered_init(inode);
>  }
> @@ -1021,7 +1022,6 @@ struct inode *new_inode_pseudo(struct super_block *sb)
>  		spin_lock(&inode->i_lock);
>  		inode->i_state = 0;
>  		spin_unlock(&inode->i_lock);
> -		INIT_LIST_HEAD(&inode->i_sb_list);
>  	}
>  	return inode;
>  }
> @@ -1165,7 +1165,6 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>  {
>  	struct hlist_head *head = inode_hashtable + hash(inode->i_sb, hashval);
>  	struct inode *old;
> -	bool creating = inode->i_state & I_CREATING;
>  
>  again:
>  	spin_lock(&inode_hash_lock);
> @@ -1199,11 +1198,17 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>  	inode->i_state |= I_NEW;
>  	hlist_add_head_rcu(&inode->i_hash, head);
>  	spin_unlock(&inode->i_lock);
> -	if (!creating)
> -		inode_sb_list_add(inode);
>  unlock:
>  	spin_unlock(&inode_hash_lock);
>  
> +	/*
> +	 * Add it to the sb list if it's not already. If there is an inode,
> +	 * then it has I_NEW at this point, so it should be safe to test
> +	 * i_sb_list locklessly.
> +	 */
> +	if (inode && list_empty(&inode->i_sb_list))
> +		inode_sb_list_add(inode);
> +
>  	return inode;
>  }
>  EXPORT_SYMBOL(inode_insert5);

Hi Al,

Could I get your Acked-by or R-b on this patch? I'd like to take this in
via the ceph tree, and I'd prefer an explicit ack on this if possible.

Thanks!
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index 9d9b422504d1..9d429247a4f0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -422,6 +422,7 @@  void inode_init_once(struct inode *inode)
 	INIT_LIST_HEAD(&inode->i_io_list);
 	INIT_LIST_HEAD(&inode->i_wb_list);
 	INIT_LIST_HEAD(&inode->i_lru);
+	INIT_LIST_HEAD(&inode->i_sb_list);
 	__address_space_init_once(&inode->i_data);
 	i_size_ordered_init(inode);
 }
@@ -1021,7 +1022,6 @@  struct inode *new_inode_pseudo(struct super_block *sb)
 		spin_lock(&inode->i_lock);
 		inode->i_state = 0;
 		spin_unlock(&inode->i_lock);
-		INIT_LIST_HEAD(&inode->i_sb_list);
 	}
 	return inode;
 }
@@ -1165,7 +1165,6 @@  struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 {
 	struct hlist_head *head = inode_hashtable + hash(inode->i_sb, hashval);
 	struct inode *old;
-	bool creating = inode->i_state & I_CREATING;
 
 again:
 	spin_lock(&inode_hash_lock);
@@ -1199,11 +1198,17 @@  struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 	inode->i_state |= I_NEW;
 	hlist_add_head_rcu(&inode->i_hash, head);
 	spin_unlock(&inode->i_lock);
-	if (!creating)
-		inode_sb_list_add(inode);
 unlock:
 	spin_unlock(&inode_hash_lock);
 
+	/*
+	 * Add it to the sb list if it's not already. If there is an inode,
+	 * then it has I_NEW at this point, so it should be safe to test
+	 * i_sb_list locklessly.
+	 */
+	if (inode && list_empty(&inode->i_sb_list))
+		inode_sb_list_add(inode);
+
 	return inode;
 }
 EXPORT_SYMBOL(inode_insert5);