[4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes()

Message ID 20241002014017.3801899-5-david@fromorbit.com (mailing list archive)
State New
Series vfs: improving inode cache iteration scalability

Commit Message

Dave Chinner Oct. 2, 2024, 1:33 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Convert all the remaining superblock inode iterators to use
super_iter_inodes(). These are mostly straightforward conversions
for the iterations that use references, and the bdev use cases that
didn't even validate the inode before dereferencing it are now
inherently safe.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 block/bdev.c           |  76 ++++++++--------------
 fs/drop_caches.c       |  38 ++++-------
 fs/gfs2/ops_fstype.c   |  67 ++++++-------------
 fs/notify/fsnotify.c   |  75 ++++++---------------
 fs/quota/dquot.c       |  79 +++++++++--------------
 security/landlock/fs.c | 143 ++++++++++++++---------------------------
 6 files changed, 154 insertions(+), 324 deletions(-)

Comments

Christoph Hellwig Oct. 3, 2024, 7:23 a.m. UTC | #1
On Wed, Oct 02, 2024 at 11:33:21AM +1000, Dave Chinner wrote:
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -1223,109 +1223,60 @@ static void hook_inode_free_security_rcu(void *inode_security)
>  
>  /*
>   * Release the inodes used in a security policy.
> - *
> - * Cf. fsnotify_unmount_inodes() and invalidate_inodes()
>   */
> +static int release_inode_fn(struct inode *inode, void *data)

Looks like this is called from the sb_delete LSM hook, which
is only implemented by landlock, and only called from
generic_shutdown_super, separated from evict_inodes only by a call
to fsnotify_sb_delete.  Why did LSM not hook into that and instead
add another iteration of the per-sb inode list?

(Note that this is not trying to get Dave to fix this, just noticed
it when reviewing this series)
Christoph Hellwig Oct. 3, 2024, 7:38 a.m. UTC | #2
On Thu, Oct 03, 2024 at 12:23:41AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 02, 2024 at 11:33:21AM +1000, Dave Chinner wrote:
> > --- a/security/landlock/fs.c
> > +++ b/security/landlock/fs.c
> > @@ -1223,109 +1223,60 @@ static void hook_inode_free_security_rcu(void *inode_security)
> >  
> >  /*
> >   * Release the inodes used in a security policy.
> > - *
> > - * Cf. fsnotify_unmount_inodes() and invalidate_inodes()
> >   */
> > +static int release_inode_fn(struct inode *inode, void *data)
> 
> Looks like this is called from the sb_delete LSM hook, which
> is only implemented by landlock, and only called from
> generic_shutdown_super, separated from evict_inodes only by a call
> to fsnotify_sb_delete.  Why did LSM not hook into that and instead

And the main thing that fsnotify_sb_delete does is yet another inode
iteration..

Any chance you all could get together and figure out how to get down
to a single sb inode iteration per unmount?
Jan Kara Oct. 3, 2024, 11:57 a.m. UTC | #3
On Thu 03-10-24 00:38:05, Christoph Hellwig wrote:
> On Thu, Oct 03, 2024 at 12:23:41AM -0700, Christoph Hellwig wrote:
> > On Wed, Oct 02, 2024 at 11:33:21AM +1000, Dave Chinner wrote:
> > > --- a/security/landlock/fs.c
> > > +++ b/security/landlock/fs.c
> > > @@ -1223,109 +1223,60 @@ static void hook_inode_free_security_rcu(void *inode_security)
> > >  
> > >  /*
> > >   * Release the inodes used in a security policy.
> > > - *
> > > - * Cf. fsnotify_unmount_inodes() and invalidate_inodes()
> > >   */
> > > +static int release_inode_fn(struct inode *inode, void *data)
> > 
> > Looks like this is called from the sb_delete LSM hook, which
> > is only implemented by landlock, and only called from
> > generic_shutdown_super, separated from evict_inodes only by a call
> > to fsnotify_sb_delete.  Why did LSM not hook into that and instead
> 
> And the main thing that fsnotify_sb_delete does is yet another inode
> iteration..
> 
> Any chance you all could get together and figure out how to get down
> to a single sb inode iteration per unmount?

Fair enough. If we go with the iterator variant I've suggested to Dave in
[1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
Landlock's hook_sb_delete() into a single iteration relatively easily. But
I'd wait with that conversion until this series lands.

								Honza

[1] https://lore.kernel.org/all/20241003114555.bl34fkqsja4s5tok@quack3
Christoph Hellwig Oct. 3, 2024, 12:11 p.m. UTC | #4
On Thu, Oct 03, 2024 at 01:57:21PM +0200, Jan Kara wrote:
> Fair enough. If we go with the iterator variant I've suggested to Dave in
> [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
> Landlock's hook_sb_delete() into a single iteration relatively easily. But
> I'd wait with that conversion until this series lands.

I don't see how that has anything to do with iterators or not.
Jan Kara Oct. 3, 2024, 12:26 p.m. UTC | #5
On Thu 03-10-24 05:11:11, Christoph Hellwig wrote:
> On Thu, Oct 03, 2024 at 01:57:21PM +0200, Jan Kara wrote:
> > Fair enough. If we go with the iterator variant I've suggested to Dave in
> > [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
> > Landlock's hook_sb_delete() into a single iteration relatively easily. But
> > I'd wait with that conversion until this series lands.
> 
> I don't see how that has anything to do with iterators or not.

Well, the patches would obviously conflict, which seems pointless if we
could live with three iterations for a few years until somebody noticed :).
And with Dave's current version of the iterators it will not be possible to
integrate the evict_inodes() iteration with the other two without a layering
violation. Still, we could go from 3 to 2 iterations.

								Honza
Christoph Hellwig Oct. 3, 2024, 12:39 p.m. UTC | #6
On Thu, Oct 03, 2024 at 02:26:57PM +0200, Jan Kara wrote:
> On Thu 03-10-24 05:11:11, Christoph Hellwig wrote:
> > On Thu, Oct 03, 2024 at 01:57:21PM +0200, Jan Kara wrote:
> > > Fair enough. If we go with the iterator variant I've suggested to Dave in
> > > [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
> > > Landlock's hook_sb_delete() into a single iteration relatively easily. But
> > > I'd wait with that conversion until this series lands.
> > 
> > I don't see how that has anything to do with iterators or not.
> 
> Well, the patches would obviously conflict

Conflict with what?

> which seems pointless if we
> could live with three iterations for a few years until somebody noticed :).
> And with current Dave's version of iterators it will not be possible to
> integrate evict_inodes() iteration with the other two without a layering
> violation. Still we could go from 3 to 2 iterations.

What layering violation?

Below is a quick, compile-tested patch to do the fsnotify side and
get rid of the fsnotify iteration, which looks easily worth it.

Landlock is a bit more complex because of the LSM hooks and the internal
underobj abstraction inside of landlock.  But looking at the release
inode data vs unmount synchronization it has, and the duplicate code, I
think doing it this way is worth the effort even more; it'll just need
someone who knows landlock and the LSM layering to help with the work.

diff --git a/fs/inode.c b/fs/inode.c
index 3f335f78c5b228..7d5f8a09e4d29d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -789,11 +789,23 @@ static bool dispose_list(struct list_head *head)
  */
 static int evict_inode_fn(struct inode *inode, void *data)
 {
+	struct super_block *sb = inode->i_sb;
 	struct list_head *dispose = data;
+	bool post_unmount = !(sb->s_flags & SB_ACTIVE);
 
 	spin_lock(&inode->i_lock);
-	if (atomic_read(&inode->i_count) ||
-	    (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
+	if (atomic_read(&inode->i_count)) {
+		spin_unlock(&inode->i_lock);
+
+		/* for each watch, send FS_UNMOUNT and then remove it */
+		if (post_unmount && fsnotify_sb_info(sb)) {
+			fsnotify_inode(inode, FS_UNMOUNT);
+			fsnotify_inode_delete(inode);
+		}
+		return INO_ITER_DONE;
+	}
+
+	if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 		spin_unlock(&inode->i_lock);
 		return INO_ITER_DONE;
 	}
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 68c34ed9427190..cf89aa69e82c8d 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -28,16 +28,6 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
 	fsnotify_clear_marks_by_mount(mnt);
 }
 
-static int fsnotify_unmount_inode_fn(struct inode *inode, void *data)
-{
-	spin_unlock(&inode->i_lock);
-
-	/* for each watch, send FS_UNMOUNT and then remove it */
-	fsnotify_inode(inode, FS_UNMOUNT);
-	fsnotify_inode_delete(inode);
-	return INO_ITER_DONE;
-}
-
 void fsnotify_sb_delete(struct super_block *sb)
 {
 	struct fsnotify_sb_info *sbinfo = fsnotify_sb_info(sb);
@@ -46,19 +36,6 @@ void fsnotify_sb_delete(struct super_block *sb)
 	if (!sbinfo)
 		return;
 
-	/*
-	 * If i_count is zero, the inode cannot have any watches and
-	 * doing an __iget/iput with SB_ACTIVE clear would actually
-	 * evict all inodes with zero i_count from icache which is
-	 * unnecessarily violent and may in fact be illegal to do.
-	 * However, we should have been called /after/ evict_inodes
-	 * removed all zero refcount inodes, in any case. Hence we use
-	 * INO_ITER_REFERENCED to ensure zero refcount inodes are filtered
-	 * properly.
-	 */
-	super_iter_inodes(sb, fsnotify_unmount_inode_fn, NULL,
-			INO_ITER_REFERENCED);
-
 	fsnotify_clear_marks_by_sb(sb);
 	/* Wait for outstanding object references from connectors */
 	wait_var_event(fsnotify_sb_watched_objects(sb),
diff --git a/fs/super.c b/fs/super.c
index 971ad4e996e0ba..88dd1703fe73db 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -167,28 +167,17 @@ static void super_wake(struct super_block *sb, unsigned int flag)
 	wake_up_var(&sb->s_flags);
 }
 
-bool super_iter_iget(struct inode *inode, int flags)
+bool super_iter_iget(struct inode *inode)
 {
-	bool	ret = false;
+	bool ret = false;
 
 	spin_lock(&inode->i_lock);
-	if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
-		goto out_unlock;
-
-	/*
-	 * Skip over zero refcount inode if the caller only wants
-	 * referenced inodes to be iterated.
-	 */
-	if ((flags & INO_ITER_REFERENCED) &&
-	    !atomic_read(&inode->i_count))
-		goto out_unlock;
-
-	__iget(inode);
-	ret = true;
-out_unlock:
+	if (!(inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
+		__iget(inode);
+		ret = true;
+	}
 	spin_unlock(&inode->i_lock);
 	return ret;
-
 }
 EXPORT_SYMBOL_GPL(super_iter_iget);
 
@@ -216,7 +205,7 @@ int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
 
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (!super_iter_iget(inode, flags))
+		if (!super_iter_iget(inode))
 			continue;
 		spin_unlock(&sb->s_inode_list_lock);
 		iput(old_inode);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ee544556cee728..5a174e690424fb 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1654,8 +1654,7 @@ xfs_iter_vfs_igrab(
 	if (ip->i_flags & XFS_ITER_VFS_NOGRAB_IFLAGS)
 		goto out_unlock_noent;
 
-	if ((flags & INO_ITER_UNSAFE) ||
-	    super_iter_iget(inode, flags))
+	if ((flags & INO_ITER_UNSAFE) || super_iter_iget(inode))
 		ret = true;
 
 out_unlock_noent:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2aa335228b84bf..a3c682f0d94c1b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2224,7 +2224,7 @@ enum freeze_holder {
 typedef int (*ino_iter_fn)(struct inode *inode, void *priv);
 int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
 		void *private_data, int flags);
-bool super_iter_iget(struct inode *inode, int flags);
+bool super_iter_iget(struct inode *inode);
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
Jan Kara Oct. 3, 2024, 12:56 p.m. UTC | #7
On Thu 03-10-24 05:39:23, Christoph Hellwig wrote:
> On Thu, Oct 03, 2024 at 02:26:57PM +0200, Jan Kara wrote:
> > On Thu 03-10-24 05:11:11, Christoph Hellwig wrote:
> > > On Thu, Oct 03, 2024 at 01:57:21PM +0200, Jan Kara wrote:
> > > > Fair enough. If we go with the iterator variant I've suggested to Dave in
> > > > [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
> > > > Landlock's hook_sb_delete() into a single iteration relatively easily. But
> > > > I'd wait with that conversion until this series lands.
> > > 
> > > I don't see how that has anything to do with iterators or not.
> > 
> > Well, the patches would obviously conflict
> 
> Conflict with what?

I thought you wanted the iterations to be unified in the current state of
the code. If you meant after Dave's series, then we are in agreement.

> > which seems pointless if we
> > could live with three iterations for a few years until somebody noticed :).
> > And with current Dave's version of iterators it will not be possible to
> > integrate evict_inodes() iteration with the other two without a layering
> > violation. Still we could go from 3 to 2 iterations.
> 
> What layering violation?
> 
> Below is quick compile tested part to do the fsnotify side and
> get rid of the fsnotify iteration, which looks easily worth it.

...

> @@ -789,11 +789,23 @@ static bool dispose_list(struct list_head *head)
>   */
>  static int evict_inode_fn(struct inode *inode, void *data)
>  {
> +	struct super_block *sb = inode->i_sb;
>  	struct list_head *dispose = data;
> +	bool post_unmount = !(sb->s_flags & SB_ACTIVE);
>  
>  	spin_lock(&inode->i_lock);
> -	if (atomic_read(&inode->i_count) ||
> -	    (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
> +	if (atomic_read(&inode->i_count)) {
> +		spin_unlock(&inode->i_lock);
> +
> +		/* for each watch, send FS_UNMOUNT and then remove it */
> +		if (post_unmount && fsnotify_sb_info(sb)) {
> +			fsnotify_inode(inode, FS_UNMOUNT);
> +			fsnotify_inode_delete(inode);
> +		}

This will not work because you are in an unsafe iterator holding
sb->s_inode_list_lock. To be able to call into fsnotify, you need to do the
iget / iput dance and release s_inode_list_lock, which does not work
when a filesystem has its own inode iterator AFAICT... That's why I've
called it a layering violation.

									Honza

> +		return INO_ITER_DONE;
> +	}
> +
> +	if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
>  		spin_unlock(&inode->i_lock);
>  		return INO_ITER_DONE;
>  	}
Christoph Hellwig Oct. 3, 2024, 1:04 p.m. UTC | #8
On Thu, Oct 03, 2024 at 02:56:50PM +0200, Jan Kara wrote:
> > +	if (atomic_read(&inode->i_count)) {
> > +		spin_unlock(&inode->i_lock);
> > +
> > +		/* for each watch, send FS_UNMOUNT and then remove it */
> > +		if (post_unmount && fsnotify_sb_info(sb)) {
> > +			fsnotify_inode(inode, FS_UNMOUNT);
> > +			fsnotify_inode_delete(inode);
> > +		}
> 
> This will not work because you are in unsafe iterator holding
> sb->s_inode_list_lock. To be able to call into fsnotify, you need to do the
> iget / iput dance and releasing of s_inode_list_lock which does not work
> when a filesystem has its own inodes iterator AFAICT... That's why I've
> called it a layering violation.

Ah, yes.  So we'll need to special-case it somehow either way.  Still
feels saner to do it in one iteration and make the inode eviction not
use the unsafe version, but maybe that's indeed better postponed until
after Dave's series.
Dave Chinner Oct. 3, 2024, 1:59 p.m. UTC | #9
On Thu, Oct 03, 2024 at 02:56:50PM +0200, Jan Kara wrote:
> On Thu 03-10-24 05:39:23, Christoph Hellwig wrote:
> > @@ -789,11 +789,23 @@ static bool dispose_list(struct list_head *head)
> >   */
> >  static int evict_inode_fn(struct inode *inode, void *data)
> >  {
> > +	struct super_block *sb = inode->i_sb;
> >  	struct list_head *dispose = data;
> > +	bool post_unmount = !(sb->s_flags & SB_ACTIVE);
> >  
> >  	spin_lock(&inode->i_lock);
> > -	if (atomic_read(&inode->i_count) ||
> > -	    (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
> > +	if (atomic_read(&inode->i_count)) {
> > +		spin_unlock(&inode->i_lock);
> > +
> > +		/* for each watch, send FS_UNMOUNT and then remove it */
> > +		if (post_unmount && fsnotify_sb_info(sb)) {
> > +			fsnotify_inode(inode, FS_UNMOUNT);
> > +			fsnotify_inode_delete(inode);
> > +		}
> 
> This will not work because you are in unsafe iterator holding
> sb->s_inode_list_lock. To be able to call into fsnotify, you need to do the
> iget / iput dance and releasing of s_inode_list_lock which does not work
> when a filesystem has its own inodes iterator AFAICT... That's why I've
> called it a layering violation.

The whole point of the iget/iput dance is to stabilise the
s_inodes list iteration whilst it is unlocked - the actual fsnotify
calls don't need an inode reference to work correctly.

IOWs, we don't need to run the fsnotify stuff right here - we can
defer that like we do with the dispose list for all the inodes we
mark as I_FREEING here.

So if we pass a structure:

struct evict_inode_args {
	struct list_head	dispose;
	struct list_head	fsnotify;
};

If we use __iget() instead of requiring an inode state flag to keep
the inode off the LRU for the fsnotify cleanup, then the code
fragment above becomes:

	if (atomic_read(&inode->i_count)) {
		if (post_unmount && fsnotify_sb_info(sb)) {
			__iget(inode);
			inode_lru_list_del(inode);
			spin_unlock(&inode->i_lock);
			list_add(&inode->i_lru, &args->fsnotify);
		}
		return INO_ITER_DONE;
	}

And then once we return to evict_inodes(), we do this:

	while (!list_empty(&args->fsnotify)) {
		struct inode *inode;

		inode = list_first_entry(&args->fsnotify, struct inode, i_lru);
		list_del_init(&inode->i_lru);

		fsnotify_inode(inode, FS_UNMOUNT);
		fsnotify_inode_delete(inode);
		iput(inode);
		cond_resched();
	}

And so now all the fsnotify cleanup is done outside the traversal in
one large batch from evict_inodes().

As for the landlock code, I think it needs to have its own internal
tracking mechanism and not search the sb inode list for inodes that
it holds references to. LSM cleanup should be run before we
get to tearing down the inode cache, not after....

-Dave.
Jan Kara Oct. 3, 2024, 4:17 p.m. UTC | #10
On Thu 03-10-24 23:59:51, Dave Chinner wrote:
> On Thu, Oct 03, 2024 at 02:56:50PM +0200, Jan Kara wrote:
> > On Thu 03-10-24 05:39:23, Christoph Hellwig wrote:
> > > @@ -789,11 +789,23 @@ static bool dispose_list(struct list_head *head)
> > >   */
> > >  static int evict_inode_fn(struct inode *inode, void *data)
> > >  {
> > > +	struct super_block *sb = inode->i_sb;
> > >  	struct list_head *dispose = data;
> > > +	bool post_unmount = !(sb->s_flags & SB_ACTIVE);
> > >  
> > >  	spin_lock(&inode->i_lock);
> > > -	if (atomic_read(&inode->i_count) ||
> > > -	    (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
> > > +	if (atomic_read(&inode->i_count)) {
> > > +		spin_unlock(&inode->i_lock);
> > > +
> > > +		/* for each watch, send FS_UNMOUNT and then remove it */
> > > +		if (post_unmount && fsnotify_sb_info(sb)) {
> > > +			fsnotify_inode(inode, FS_UNMOUNT);
> > > +			fsnotify_inode_delete(inode);
> > > +		}
> > 
> > This will not work because you are in unsafe iterator holding
> > sb->s_inode_list_lock. To be able to call into fsnotify, you need to do the
> > iget / iput dance and releasing of s_inode_list_lock which does not work
> > when a filesystem has its own inodes iterator AFAICT... That's why I've
> > called it a layering violation.
> 
> The whole point of the iget/iput dance is to stabilise the
> s_inodes list iteration whilst it is unlocked - the actual fsnotify
> calls don't need an inode reference to work correctly.
> 
> IOWs, we don't need to run the fsnotify stuff right here - we can
> defer that like we do with the dispose list for all the inodes we
> mark as I_FREEING here.
> 
> So if we pass a structure:
> 
> struct evict_inode_args {
> 	struct list_head	dispose;
> 	struct list_head	fsnotify;
> };
> 
> If we use __iget() instead of requiring an inode state flag to keep
> the inode off the LRU for the fsnotify cleanup, then the code
> fragment above becomes:
> 
> 	if (atomic_read(&inode->i_count)) {
> 		if (post_unmount && fsnotify_sb_info(sb)) {
> 			__iget(inode);
> 			inode_lru_list_del(inode);
> 			spin_unlock(&inode->i_lock);
> 			list_add(&inode->i_lru, &args->fsnotify);
> 		}

Nit: need to release i_lock in the else branch here.  Otherwise an
interesting idea. Yes, something like this could work even in an unsafe
iterator.

> 		return INO_ITER_DONE;
> 	}
> And then once we return to evict_inodes(), we do this:
> 
> 	while (!list_empty(&args->fsnotify)) {
> 		struct inode *inode;
> 
> 		inode = list_first_entry(&args->fsnotify, struct inode, i_lru);
> 		list_del_init(&inode->i_lru);
> 
> 		fsnotify_inode(inode, FS_UNMOUNT);
> 		fsnotify_inode_delete(inode);
> 		iput(inode);
> 		cond_resched();
> 	}
> 
> And so now all the fsnotify cleanup is done outside the traversal in
> one large batch from evict_inodes().

Yup.

> As for the landlock code, I think it needs to have its own internal
> tracking mechanism and not search the sb inode list for inodes that
> it holds references to. LSM cleanup should be run before we
> get to tearing down the inode cache, not after....

Well, I think LSM cleanup could in principle be handled together with the
fsnotify cleanup but I didn't check the details.


								Honza
Dave Chinner Oct. 4, 2024, 12:46 a.m. UTC | #11
On Thu, Oct 03, 2024 at 06:17:31PM +0200, Jan Kara wrote:
> On Thu 03-10-24 23:59:51, Dave Chinner wrote:
> > As for the landlock code, I think it needs to have its own internal
> > tracking mechanism and not search the sb inode list for inodes that
> > it holds references to. LSM cleanup should be run before we
> > get to tearing down the inode cache, not after....
> 
> Well, I think LSM cleanup could in principle be handled together with the
> fsnotify cleanup but I didn't check the details.

I'm not sure how we tell if an inode potentially has an LSM-related
reference hanging off it. The landlock code looks to make the
assumption that the only referenced inodes it sees will have a
valid inode->i_security pointer if landlock is enabled, i.e. it
calls landlock_inode(inode) and dereferences the returned value
without ever checking if inode->i_security is NULL or not.

I mean, we could do a check for inode->i_security when the refcount
is elevated and replace the security_sb_delete hook with a
security_evict_inode hook similar to the proposed fsnotify eviction
from evict_inodes().

But screwing with the LSM infrastructure looks ....  obnoxiously complex
from the outside...

-Dave.
Christian Brauner Oct. 4, 2024, 7:21 a.m. UTC | #12
On Fri, Oct 04, 2024 at 10:46:27AM GMT, Dave Chinner wrote:
> On Thu, Oct 03, 2024 at 06:17:31PM +0200, Jan Kara wrote:
> > On Thu 03-10-24 23:59:51, Dave Chinner wrote:
> > > As for the landlock code, I think it needs to have its own internal
> > > tracking mechanism and not search the sb inode list for inodes that
> > > it holds references to. LSM cleanup should be run before we
> > > get to tearing down the inode cache, not after....
> > 
> > Well, I think LSM cleanup could in principle be handled together with the
> > fsnotify cleanup but I didn't check the details.
> 
> I'm not sure how we tell if an inode potentially has a LSM related
> reference hanging off it. The landlock code looks to make an
> assumption in that the only referenced inodes it sees will have a
> valid inode->i_security pointer if landlock is enabled. i.e. it
> calls landlock_inode(inode) and dereferences the returned value
> without ever checking if inode->i_security is NULL or not.
> 
> I mean, we could do a check for inode->i_security when the refcount
> is elevated and replace the security_sb_delete hook with a
> security_evict_inode hook similar to the proposed fsnotify eviction
> from evict_inodes().
> 
> But screwing with the LSM infrastructure looks ....  obnoxiously complex
> from the outside...

Imho, please just focus on the immediate feedback and ignore all the
extra bells and whistles that we could or should do. I prefer all of
that to be done after this series lands.
Christoph Hellwig Oct. 4, 2024, 12:14 p.m. UTC | #13
On Fri, Oct 04, 2024 at 09:21:19AM +0200, Christian Brauner wrote:
> > But screwing with the LSM infrastructure looks ....  obnoxiously complex
> > from the outside...
> 
> Imho, please just focus on the immediate feedback and ignore all the
> extra bells and whistles that we could or should do. I prefer all of
> that to be done after this series lands.

For the LSM mess: absolutely.  For fsnotify it seems like Dave has
a good idea to integrate it, and it removes the somewhat awkward
need for the reffed flag.  So if that delayed notify idea works out
I'd prefer to see that in over the flag.
Jan Kara Oct. 4, 2024, 1:49 p.m. UTC | #14
On Fri 04-10-24 05:14:36, Christoph Hellwig wrote:
> On Fri, Oct 04, 2024 at 09:21:19AM +0200, Christian Brauner wrote:
> > > But screwing with the LSM infrastructure looks ....  obnoxiously complex
> > > from the outside...
> > 
> > Imho, please just focus on the immediate feedback and ignore all the
> > extra bells and whistles that we could or should do. I prefer all of
> > that to be done after this series lands.
> 
> For the LSM mess: absolutely.  For fsnotify it seems like Dave has
> a good idea to integrate it, and it removes the somewhat awkward
> need for the reffed flag.  So if that delayed notify idea works out
> I'd prefer to see that in over the flag.

As I wrote in one of the emails in this (now huge) thread, I'm fine with
completely dropping that inode->i_refcount check from the
fsnotify_unmount_inodes(). It made sense when it was called before
evict_inodes() but after 1edc8eb2e931 ("fs: call fsnotify_sb_delete after
evict_inodes") the usefulness of this check is rather doubtful. So we can
drop the awkward flag regardless of whether we unify evict_inodes() with
fsnotify_unmount_inodes() or not. I have no strong preference whether the
unification happens as part of this patch set or later on so it's up to
Dave as far as I'm concerned.

								Honza
Paul Moore Oct. 4, 2024, 6:15 p.m. UTC | #15
On Fri, Oct 4, 2024 at 9:49 AM Jan Kara <jack@suse.cz> wrote:
> On Fri 04-10-24 05:14:36, Christoph Hellwig wrote:
> > On Fri, Oct 04, 2024 at 09:21:19AM +0200, Christian Brauner wrote:
> > > > But screwing with the LSM infrastructure looks ....  obnoxiously complex
> > > > from the outside...
> > >
> > > Imho, please just focus on the immediate feedback and ignore all the
> > > extra bells and whistles that we could or should do. I prefer all of
> > > that to be done after this series lands.
> >
> > For the LSM mess: absolutely.  For fsnotify it seems like Dave has
> > a good idea to integrate it, and it removes the somewhat awkward
> > need for the reffed flag.  So if that delayed notify idea works out
> > I'd prefer to see that in over the flag.
>
> As I wrote in one of the emails in this (now huge) thread, I'm fine with
> completely dropping that inode->i_refcount check from the
> fsnotify_unmount_inodes(). It made sense when it was called before
> evict_inodes() but after 1edc8eb2e931 ("fs: call fsnotify_sb_delete after
> evict_inodes") the usefulness of this check is rather doubtful. So we can
> drop the awkward flag regardless whether we unify evict_inodes() with
> fsnotify_unmount_inodes() or not. I have no strong preference whether the
> unification happens as part of this patch set or later on so it's up to
> Dave as far as I'm concerned.

I didn't get a chance to look at this thread until just now, and I'm
noticing that the email used for Mickaël is likely not the best. I'm
adding the email he uses in MAINTAINERS as well as that of Günther
Noack, a designated Landlock reviewer.

Mickaël, Günther, the lore link for the full discussion is below:

https://lore.kernel.org/all/Zv5GfY1WS_aaczZM@infradead.org
Dave Chinner Oct. 4, 2024, 10:57 p.m. UTC | #16
On Fri, Oct 04, 2024 at 09:21:19AM +0200, Christian Brauner wrote:
> On Fri, Oct 04, 2024 at 10:46:27AM GMT, Dave Chinner wrote:
> > On Thu, Oct 03, 2024 at 06:17:31PM +0200, Jan Kara wrote:
> > > On Thu 03-10-24 23:59:51, Dave Chinner wrote:
> > > > As for the landlock code, I think it needs to have its own internal
> > > > tracking mechanism and not search the sb inode list for inodes that
> > > > it holds references to. LSM cleanup should be run before we
> > > > get to tearing down the inode cache, not after....
> > > 
> > > Well, I think LSM cleanup could in principle be handled together with the
> > > fsnotify cleanup but I didn't check the details.
> > 
> > I'm not sure how we tell if an inode potentially has a LSM related
> > reference hanging off it. The landlock code looks to make an
> > assumption in that the only referenced inodes it sees will have a
> > valid inode->i_security pointer if landlock is enabled. i.e. it
> > calls landlock_inode(inode) and dereferences the returned value
> > without ever checking if inode->i_security is NULL or not.
> > 
> > I mean, we could do a check for inode->i_security when the refcount
> > is elevated and replace the security_sb_delete hook with a
> > security_evict_inode hook similar to the proposed fsnotify eviction
> > from evict_inodes().
> > 
> > But screwing with the LSM infrastructure looks ....  obnoxiously complex
> > from the outside...
> 
> Imho, please just focus on the immediate feedback and ignore all the
> extra bells and whistles that we could or should do. I prefer all of
> that to be done after this series lands.

Actually, it's not as bad as I thought it was going to be. I've
already moved both fsnotify and LSM inode eviction to
evict_inodes() as preparatory patches...

Dave Chinner (2):
      vfs: move fsnotify inode eviction to evict_inodes()
      vfs, lsm: rework lsm inode eviction at unmount

 fs/inode.c                    |  52 +++++++++++++---
 fs/notify/fsnotify.c          |  60 -------------------
 fs/super.c                    |   8 +--
 include/linux/lsm_hook_defs.h |   2 +-
 include/linux/security.h      |   2 +-
 security/landlock/fs.c        | 134 ++++++++++--------------------------------
 security/security.c           |  31 ++++++----
 7 files changed, 99 insertions(+), 190 deletions(-)

-Dave.
Patch

diff --git a/block/bdev.c b/block/bdev.c
index b5a362156ca1..5f720e12f731 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1226,56 +1226,36 @@  void bdev_mark_dead(struct block_device *bdev, bool surprise)
  */
 EXPORT_SYMBOL_GPL(bdev_mark_dead);
 
+static int sync_bdev_fn(struct inode *inode, void *data)
+{
+	struct block_device *bdev;
+	bool wait = *(bool *)data;
+
+	if (inode->i_mapping->nrpages == 0)
+		return INO_ITER_DONE;
+
+	bdev = I_BDEV(inode);
+	mutex_lock(&bdev->bd_disk->open_mutex);
+	if (!atomic_read(&bdev->bd_openers)) {
+		; /* skip */
+	} else if (wait) {
+		/*
+		 * We keep the error status of individual mapping so
+		 * that applications can catch the writeback error using
+		 * fsync(2). See filemap_fdatawait_keep_errors() for
+		 * details.
+		 */
+		filemap_fdatawait_keep_errors(inode->i_mapping);
+	} else {
+		filemap_fdatawrite(inode->i_mapping);
+	}
+	mutex_unlock(&bdev->bd_disk->open_mutex);
+	return INO_ITER_DONE;
+}
+
 void sync_bdevs(bool wait)
 {
-	struct inode *inode, *old_inode = NULL;
-
-	spin_lock(&blockdev_superblock->s_inode_list_lock);
-	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
-		struct address_space *mapping = inode->i_mapping;
-		struct block_device *bdev;
-
-		spin_lock(&inode->i_lock);
-		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
-		    mapping->nrpages == 0) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&blockdev_superblock->s_inode_list_lock);
-		/*
-		 * We hold a reference to 'inode' so it couldn't have been
-		 * removed from s_inodes list while we dropped the
-		 * s_inode_list_lock  We cannot iput the inode now as we can
-		 * be holding the last reference and we cannot iput it under
-		 * s_inode_list_lock. So we keep the reference and iput it
-		 * later.
-		 */
-		iput(old_inode);
-		old_inode = inode;
-		bdev = I_BDEV(inode);
-
-		mutex_lock(&bdev->bd_disk->open_mutex);
-		if (!atomic_read(&bdev->bd_openers)) {
-			; /* skip */
-		} else if (wait) {
-			/*
-			 * We keep the error status of individual mapping so
-			 * that applications can catch the writeback error using
-			 * fsync(2). See filemap_fdatawait_keep_errors() for
-			 * details.
-			 */
-			filemap_fdatawait_keep_errors(inode->i_mapping);
-		} else {
-			filemap_fdatawrite(inode->i_mapping);
-		}
-		mutex_unlock(&bdev->bd_disk->open_mutex);
-
-		spin_lock(&blockdev_superblock->s_inode_list_lock);
-	}
-	spin_unlock(&blockdev_superblock->s_inode_list_lock);
-	iput(old_inode);
+	super_iter_inodes(blockdev_superblock, sync_bdev_fn, &wait, 0);
 }
 
 /*
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d45ef541d848..901cda15537f 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -16,36 +16,20 @@ 
 /* A global variable is a bit ugly, but it keeps the code simple */
 int sysctl_drop_caches;
 
-static void drop_pagecache_sb(struct super_block *sb, void *unused)
+static int invalidate_inode_fn(struct inode *inode, void *data)
 {
-	struct inode *inode, *toput_inode = NULL;
-
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		spin_lock(&inode->i_lock);
-		/*
-		 * We must skip inodes in unusual state. We may also skip
-		 * inodes without pages but we deliberately won't in case
-		 * we need to reschedule to avoid softlockups.
-		 */
-		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
-		    (mapping_empty(inode->i_mapping) && !need_resched())) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
-
+	if (!mapping_empty(inode->i_mapping))
 		invalidate_mapping_pages(inode->i_mapping, 0, -1);
-		iput(toput_inode);
-		toput_inode = inode;
+	return INO_ITER_DONE;
+}
 
-		cond_resched();
-		spin_lock(&sb->s_inode_list_lock);
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-	iput(toput_inode);
+/*
+ * Note: it would be nice to check mapping_empty() before we get a reference on
+ * the inode in super_iter_inodes(), but that's a future optimisation.
+ */
+static void drop_pagecache_sb(struct super_block *sb, void *unused)
+{
+	super_iter_inodes(sb, invalidate_inode_fn, NULL, 0);
 }
 
 int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e83d293c3614..f20862614ad6 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1714,53 +1714,10 @@  static int gfs2_meta_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
-/**
- * gfs2_evict_inodes - evict inodes cooperatively
- * @sb: the superblock
- *
- * When evicting an inode with a zero link count, we are trying to upgrade the
- * inode's iopen glock from SH to EX mode in order to determine if we can
- * delete the inode.  The other nodes are supposed to evict the inode from
- * their caches if they can, and to poke the inode's inode glock if they cannot
- * do so.  Either behavior allows gfs2_upgrade_iopen_glock() to proceed
- * quickly, but if the other nodes are not cooperating, the lock upgrading
- * attempt will time out.  Since inodes are evicted sequentially, this can add
- * up quickly.
- *
- * Function evict_inodes() tries to keep the s_inode_list_lock list locked over
- * a long time, which prevents other inodes from being evicted concurrently.
- * This precludes the cooperative behavior we are looking for.  This special
- * version of evict_inodes() avoids that.
- *
- * Modeled after drop_pagecache_sb().
- */
-static void gfs2_evict_inodes(struct super_block *sb)
+/* Nothing to do because we just want to bounce the inode through iput() */
+static int gfs2_evict_inode_fn(struct inode *inode, void *data)
 {
-	struct inode *inode, *toput_inode = NULL;
-	struct gfs2_sbd *sdp = sb->s_fs_info;
-
-	set_bit(SDF_EVICTING, &sdp->sd_flags);
-
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		spin_lock(&inode->i_lock);
-		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) &&
-		    !need_resched()) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		atomic_inc(&inode->i_count);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
-
-		iput(toput_inode);
-		toput_inode = inode;
-
-		cond_resched();
-		spin_lock(&sb->s_inode_list_lock);
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-	iput(toput_inode);
+	return INO_ITER_DONE;
 }
 
 static void gfs2_kill_sb(struct super_block *sb)
@@ -1779,7 +1736,23 @@  static void gfs2_kill_sb(struct super_block *sb)
 	sdp->sd_master_dir = NULL;
 	shrink_dcache_sb(sb);
 
-	gfs2_evict_inodes(sb);
+	/*
+	 * When evicting an inode with a zero link count, we are trying to
+	 * upgrade the inode's iopen glock from SH to EX mode in order to
+	 * determine if we can delete the inode.  The other nodes are supposed
+	 * to evict the inode from their caches if they can, and to poke the
+	 * inode's inode glock if they cannot do so.  Either behavior allows
+	 * gfs2_upgrade_iopen_glock() to proceed quickly, but if the other nodes
+	 * are not cooperating, the lock upgrading attempt will time out.  Since
+	 * inodes are evicted sequentially, this can add up quickly.
+	 *
+	 * evict_inodes() tries to keep the s_inode_list_lock list locked over a
+	 * long time, which prevents other inodes from being evicted
+	 * concurrently.  This precludes the cooperative behavior we are looking
+	 * for.
+	 */
+	set_bit(SDF_EVICTING, &sdp->sd_flags);
+	super_iter_inodes(sb, gfs2_evict_inode_fn, NULL, 0);
 
 	/*
 	 * Flush and then drain the delete workqueue here (via
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 272c8a1dab3c..68c34ed94271 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -28,63 +28,14 @@  void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
 	fsnotify_clear_marks_by_mount(mnt);
 }
 
-/**
- * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
- * @sb: superblock being unmounted.
- *
- * Called during unmount with no locks held, so needs to be safe against
- * concurrent modifiers. We temporarily drop sb->s_inode_list_lock and CAN block.
- */
-static void fsnotify_unmount_inodes(struct super_block *sb)
+static int fsnotify_unmount_inode_fn(struct inode *inode, void *data)
 {
-	struct inode *inode, *iput_inode = NULL;
+	spin_unlock(&inode->i_lock);
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		/*
-		 * We cannot __iget() an inode in state I_FREEING,
-		 * I_WILL_FREE, or I_NEW which is fine because by that point
-		 * the inode cannot have any associated watches.
-		 */
-		spin_lock(&inode->i_lock);
-		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		/*
-		 * If i_count is zero, the inode cannot have any watches and
-		 * doing an __iget/iput with SB_ACTIVE clear would actually
-		 * evict all inodes with zero i_count from icache which is
-		 * unnecessarily violent and may in fact be illegal to do.
-		 * However, we should have been called /after/ evict_inodes
-		 * removed all zero refcount inodes, in any case.  Test to
-		 * be sure.
-		 */
-		if (!atomic_read(&inode->i_count)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
-
-		iput(iput_inode);
-
-		/* for each watch, send FS_UNMOUNT and then remove it */
-		fsnotify_inode(inode, FS_UNMOUNT);
-
-		fsnotify_inode_delete(inode);
-
-		iput_inode = inode;
-
-		cond_resched();
-		spin_lock(&sb->s_inode_list_lock);
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-
-	iput(iput_inode);
+	/* for each watch, send FS_UNMOUNT and then remove it */
+	fsnotify_inode(inode, FS_UNMOUNT);
+	fsnotify_inode_delete(inode);
+	return INO_ITER_DONE;
 }
 
 void fsnotify_sb_delete(struct super_block *sb)
@@ -95,7 +46,19 @@  void fsnotify_sb_delete(struct super_block *sb)
 	if (!sbinfo)
 		return;
 
-	fsnotify_unmount_inodes(sb);
+	/*
+	 * If i_count is zero, the inode cannot have any watches and
+	 * doing an __iget/iput with SB_ACTIVE clear would actually
+	 * evict all inodes with zero i_count from icache which is
+	 * unnecessarily violent and may in fact be illegal to do.
+	 * However, we should have been called /after/ evict_inodes
+	 * removed all zero refcount inodes, in any case. Hence we use
+	 * INO_ITER_REFERENCED to ensure zero refcount inodes are filtered
+	 * properly.
+	 */
+	super_iter_inodes(sb, fsnotify_unmount_inode_fn, NULL,
+			INO_ITER_REFERENCED);
+
 	fsnotify_clear_marks_by_sb(sb);
 	/* Wait for outstanding object references from connectors */
 	wait_var_event(fsnotify_sb_watched_objects(sb),
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index ea0bd807fed7..ea9fce7acd1b 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1017,56 +1017,40 @@  static int dqinit_needed(struct inode *inode, int type)
 	return 0;
 }
 
+struct dquot_ref_data {
+	int	type;
+	int	reserved;
+};
+
+static int add_dquot_ref_fn(struct inode *inode, void *data)
+{
+	struct dquot_ref_data *ref = data;
+	int ret;
+
+	if (!dqinit_needed(inode, ref->type))
+		return INO_ITER_DONE;
+
+#ifdef CONFIG_QUOTA_DEBUG
+	if (unlikely(inode_get_rsv_space(inode) > 0))
+		ref->reserved++;
+#endif
+	ret = __dquot_initialize(inode, ref->type);
+	if (ret < 0)
+		return ret;
+	return INO_ITER_DONE;
+}
+
 /* This routine is guarded by s_umount semaphore */
 static int add_dquot_ref(struct super_block *sb, int type)
 {
-	struct inode *inode, *old_inode = NULL;
-#ifdef CONFIG_QUOTA_DEBUG
-	int reserved = 0;
-#endif
-	int err = 0;
-
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		spin_lock(&inode->i_lock);
-		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
-		    !atomic_read(&inode->i_writecount) ||
-		    !dqinit_needed(inode, type)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
-
-#ifdef CONFIG_QUOTA_DEBUG
-		if (unlikely(inode_get_rsv_space(inode) > 0))
-			reserved = 1;
-#endif
-		iput(old_inode);
-		err = __dquot_initialize(inode, type);
-		if (err) {
-			iput(inode);
-			goto out;
-		}
+	struct dquot_ref_data ref = {
+		.type = type,
+	};
+	int err;
 
-		/*
-		 * We hold a reference to 'inode' so it couldn't have been
-		 * removed from s_inodes list while we dropped the
-		 * s_inode_list_lock. We cannot iput the inode now as we can be
-		 * holding the last reference and we cannot iput it under
-		 * s_inode_list_lock. So we keep the reference and iput it
-		 * later.
-		 */
-		old_inode = inode;
-		cond_resched();
-		spin_lock(&sb->s_inode_list_lock);
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-	iput(old_inode);
-out:
+	err = super_iter_inodes(sb, add_dquot_ref_fn, &ref, 0);
 #ifdef CONFIG_QUOTA_DEBUG
-	if (reserved) {
+	if (ref.reserved) {
 		quota_error(sb, "Writes happened before quota was turned on "
 			"thus quota information is probably inconsistent. "
 			"Please run quotacheck(8)");
@@ -1075,11 +1059,6 @@  static int add_dquot_ref(struct super_block *sb, int type)
 	return err;
 }
 
-struct dquot_ref_data {
-	int	type;
-	int	reserved;
-};
-
 static int remove_dquot_ref_fn(struct inode *inode, void *data)
 {
 	struct dquot_ref_data *ref = data;
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 7d79fc8abe21..013ec4017ddd 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1223,109 +1223,60 @@  static void hook_inode_free_security_rcu(void *inode_security)
 
 /*
  * Release the inodes used in a security policy.
- *
- * Cf. fsnotify_unmount_inodes() and invalidate_inodes()
  */
+static int release_inode_fn(struct inode *inode, void *data)
+{
+	struct landlock_object *object;
+	rcu_read_lock();
+	object = rcu_dereference(landlock_inode(inode)->object);
+	if (!object) {
+		rcu_read_unlock();
+		return INO_ITER_DONE;
+	}
+
+	/*
+	 * If there is no concurrent release_inode() ongoing, then we
+	 * are in charge of calling iput() on this inode, otherwise we
+	 * will just wait for it to finish.
+	 */
+	spin_lock(&object->lock);
+	if (object->underobj != inode) {
+		spin_unlock(&object->lock);
+		rcu_read_unlock();
+		return INO_ITER_DONE;
+	}
+
+	object->underobj = NULL;
+	spin_unlock(&object->lock);
+	rcu_read_unlock();
+
+	/*
+	 * Because object->underobj was not NULL, release_inode() and
+	 * get_inode_object() guarantee that it is safe to reset
+	 * landlock_inode(inode)->object while it is not NULL.  It is therefore
+	 * not necessary to lock inode->i_lock.
+	 */
+	rcu_assign_pointer(landlock_inode(inode)->object, NULL);
+
+	/*
+	 * At this point, we own the ihold() reference that was originally set
+	 * up by get_inode_object() as well as the reference the inode iterator
+	 * obtained before calling us.  Therefore the following call to iput()
+	 * will not sleep nor drop the inode because there is now at least two
+	 * references to it.
+	 */
+	iput(inode);
+	return INO_ITER_DONE;
+}
+
 static void hook_sb_delete(struct super_block *const sb)
 {
-	struct inode *inode, *prev_inode = NULL;
-
 	if (!landlock_initialized)
 		return;
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		struct landlock_object *object;
+	super_iter_inodes(sb, release_inode_fn, NULL, 0);
 
-		/* Only handles referenced inodes. */
-		if (!atomic_read(&inode->i_count))
-			continue;
-
-		/*
-		 * Protects against concurrent modification of inode (e.g.
-		 * from get_inode_object()).
-		 */
-		spin_lock(&inode->i_lock);
-		/*
-		 * Checks I_FREEING and I_WILL_FREE  to protect against a race
-		 * condition when release_inode() just called iput(), which
-		 * could lead to a NULL dereference of inode->security or a
-		 * second call to iput() for the same Landlock object.  Also
-		 * checks I_NEW because such inode cannot be tied to an object.
-		 */
-		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		rcu_read_lock();
-		object = rcu_dereference(landlock_inode(inode)->object);
-		if (!object) {
-			rcu_read_unlock();
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		/* Keeps a reference to this inode until the next loop walk. */
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-
-		/*
-		 * If there is no concurrent release_inode() ongoing, then we
-		 * are in charge of calling iput() on this inode, otherwise we
-		 * will just wait for it to finish.
-		 */
-		spin_lock(&object->lock);
-		if (object->underobj == inode) {
-			object->underobj = NULL;
-			spin_unlock(&object->lock);
-			rcu_read_unlock();
-
-			/*
-			 * Because object->underobj was not NULL,
-			 * release_inode() and get_inode_object() guarantee
-			 * that it is safe to reset
-			 * landlock_inode(inode)->object while it is not NULL.
-			 * It is therefore not necessary to lock inode->i_lock.
-			 */
-			rcu_assign_pointer(landlock_inode(inode)->object, NULL);
-			/*
-			 * At this point, we own the ihold() reference that was
-			 * originally set up by get_inode_object() and the
-			 * __iget() reference that we just set in this loop
-			 * walk.  Therefore the following call to iput() will
-			 * not sleep nor drop the inode because there is now at
-			 * least two references to it.
-			 */
-			iput(inode);
-		} else {
-			spin_unlock(&object->lock);
-			rcu_read_unlock();
-		}
-
-		if (prev_inode) {
-			/*
-			 * At this point, we still own the __iget() reference
-			 * that we just set in this loop walk.  Therefore we
-			 * can drop the list lock and know that the inode won't
-			 * disappear from under us until the next loop walk.
-			 */
-			spin_unlock(&sb->s_inode_list_lock);
-			/*
-			 * We can now actually put the inode reference from the
-			 * previous loop walk, which is not needed anymore.
-			 */
-			iput(prev_inode);
-			cond_resched();
-			spin_lock(&sb->s_inode_list_lock);
-		}
-		prev_inode = inode;
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-
-	/* Puts the inode reference from the last loop walk, if any. */
-	if (prev_inode)
-		iput(prev_inode);
-	/* Waits for pending iput() in release_inode(). */
+	/* Waits for pending iput()s in release_inode(). */
 	wait_var_event(&landlock_superblock(sb)->inode_refs,
 		       !atomic_long_read(&landlock_superblock(sb)->inode_refs));
 }