[v2,3/3] vfs: Use per-cpu list for superblock's inode list

Message ID 20160222130435.GM7791@quack.suse.cz (mailing list archive)
State New, archived

Commit Message

Jan Kara Feb. 22, 2016, 1:04 p.m. UTC
On Mon 22-02-16 13:12:22, Peter Zijlstra wrote:
> On Mon, Feb 22, 2016 at 12:54:35PM +0100, Jan Kara wrote:
> > > Also, I think fsnotify_unmount_inodes() (as per mainline) is missing a
> > > final iput(need_iput) at the very end, but I could be mistaken, that
> > > code hurts my brain.
> > 
> > I think the code is actually correct since need_iput contains "inode
> > further in the list than the current inode". Thus we will always go though
> > another iteration of the loop which will drop the reference. And inode
> > cannot change state to I_FREEING or I_WILL_FREE because we hold inode
> > reference. But it is subtle as hell so I agree that code needs rewrite.
> 
> So while talking to dchinner, he doubted fsnotify will actually remove
> inodes from the sb-list, but wasn't sure and too tired to check now.
> 
> (I got lost in the fsnotify code real quick and gave up, for I was
> mostly trying to make a point that we don't need the CPP magic and can
> do with 'readable' code).
> 
> If it doesn't, it doesn't need to do this extra special magic dance and
> can use the 'normal' iterator pattern used in all the other functions,
> greatly reducing complexity.

Yeah, that would be nice. But fsnotify code needs to iterate over all
inodes, drop sb_list_lock and do some fsnotify magic with the inode which
is not relevant to our discussion. Now, that fsnotify magic may actually
drop all the remaining inode references, so once we drop our reference
pinning the inode, it can just disappear. We don't want to restart the scan
for each inode we have to process, so that is the reason why we play ugly
tricks with pinning the next inode in the list.

But I agree it should be possible to just use list_for_each_entry() instead
of list_for_each_entry_safe() and keep current inode pinned till the next
iteration to make it stick in the sb->s_inodes list. That would make the
iteration more standard. Lightly tested patch attached.

								Honza
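
The iteration pattern described above (keep the current entry pinned until the next iteration has taken its own reference) can be sketched in userspace C. This is only a simplified model under stated assumptions, not the kernel code: struct node, node_put(), process() and walk() are hypothetical stand-ins for struct inode, iput(), the fsnotify work and the loop in fsnotify_unmount_inodes(), and locking is shown only as comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode: a refcounted entry on a
 * circular doubly linked list with a sentinel head. */
struct node {
	struct node *prev, *next;
	int refcount;
	int visited;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Models iput(): dropping the last reference unlinks the entry. */
static void node_put(struct node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0) {
		n->prev->next = n->next;
		n->next->prev = n->prev;
		n->prev = n->next = NULL;
	}
}

/* Models the per-inode work, which may drop every reference but ours. */
static void process(struct node *n)
{
	n->visited = 1;
	while (n->refcount > 1)
		node_put(n);
}

/* Walk the list with a plain forward iteration; returns entries seen. */
static int walk(struct node *head)
{
	struct node *n, *deferred = NULL;
	int count = 0;

	/* the list lock would be taken here */
	for (n = head->next; n != head; n = n->next) {
		n->refcount++;		/* pin the current entry */
		/* the list lock would be dropped here */
		if (deferred)
			node_put(deferred);	/* previous entry may go away now */
		process(n);
		deferred = n;		/* stay pinned so n->next remains valid */
		count++;
		/* the list lock would be re-taken here */
	}
	/* the list lock would be dropped here */
	if (deferred)
		node_put(deferred);	/* drop the reference on the last entry */
	return count;
}
```

The point of the deferred put is that the entry the loop is standing on is never released before the loop has stepped past it, so no "safe" iterator and no pinning of the next entry is needed.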

Comments

Dave Chinner Feb. 22, 2016, 9:08 p.m. UTC | #1
On Mon, Feb 22, 2016 at 02:04:35PM +0100, Jan Kara wrote:
> On Mon 22-02-16 13:12:22, Peter Zijlstra wrote:
> > On Mon, Feb 22, 2016 at 12:54:35PM +0100, Jan Kara wrote:
> > > > Also, I think fsnotify_unmount_inodes() (as per mainline) is missing a
> > > > final iput(need_iput) at the very end, but I could be mistaken, that
> > > > code hurts my brain.
> > > 
> > > I think the code is actually correct since need_iput contains "inode
> > > further in the list than the current inode". Thus we will always go though
> > > another iteration of the loop which will drop the reference. And inode
> > > cannot change state to I_FREEING or I_WILL_FREE because we hold inode
> > > reference. But it is subtle as hell so I agree that code needs rewrite.
> > 
> > So while talking to dchinner, he doubted fsnotify will actually remove
> > inodes from the sb-list, but wasn't sure and too tired to check now.
> > 
> > (I got lost in the fsnotify code real quick and gave up, for I was
> > mostly trying to make a point that we don't need the CPP magic and can
> > do with 'readable' code).
> > 
> > If it doesn't, it doesn't need to do this extra special magic dance and
> > can use the 'normal' iterator pattern used in all the other functions,
> > greatly reducing complexity.
> 
> Yeah, that would be nice. But fsnotify code needs to iterate over all
> inodes, drop sb_list_lock and do some fsnotify magic with the inode which
> is not substantial for our discussion. Now that fsnotify magic may actually
> drop all the remaining inode references so once we drop our reference
> pinning the inode, it can just disappear. We don't want to restart the scan
> for each inode we have to process so that is the reason why we play ugly
> tricks with pinning the next inode in the list.
> 
> But I agree it should be possible to just use list_for_each_entry() instead
> of list_for_each_entry_safe() and keep current inode pinned till the next
> iteration to make it stick in the sb->s_inodes list. That would make the
> iteration more standard. Lightly tested patch attached.

That's exactly what I was thinking. Patch looks ok from a quick
reading of it, but I haven't got anything here to test it
at all. Perhaps we need some xfstests coverage of this code....

Cheers,

Dave.
Jan Kara Feb. 22, 2016, 10:18 p.m. UTC | #2
On Tue 23-02-16 08:08:14, Dave Chinner wrote:
> On Mon, Feb 22, 2016 at 02:04:35PM +0100, Jan Kara wrote:
> > On Mon 22-02-16 13:12:22, Peter Zijlstra wrote:
> > > On Mon, Feb 22, 2016 at 12:54:35PM +0100, Jan Kara wrote:
> > > > > Also, I think fsnotify_unmount_inodes() (as per mainline) is missing a
> > > > > final iput(need_iput) at the very end, but I could be mistaken, that
> > > > > code hurts my brain.
> > > > 
> > > > I think the code is actually correct since need_iput contains "inode
> > > > further in the list than the current inode". Thus we will always go though
> > > > another iteration of the loop which will drop the reference. And inode
> > > > cannot change state to I_FREEING or I_WILL_FREE because we hold inode
> > > > reference. But it is subtle as hell so I agree that code needs rewrite.
> > > 
> > > So while talking to dchinner, he doubted fsnotify will actually remove
> > > inodes from the sb-list, but wasn't sure and too tired to check now.
> > > 
> > > (I got lost in the fsnotify code real quick and gave up, for I was
> > > mostly trying to make a point that we don't need the CPP magic and can
> > > do with 'readable' code).
> > > 
> > > If it doesn't, it doesn't need to do this extra special magic dance and
> > > can use the 'normal' iterator pattern used in all the other functions,
> > > greatly reducing complexity.
> > 
> > Yeah, that would be nice. But fsnotify code needs to iterate over all
> > inodes, drop sb_list_lock and do some fsnotify magic with the inode which
> > is not substantial for our discussion. Now that fsnotify magic may actually
> > drop all the remaining inode references so once we drop our reference
> > pinning the inode, it can just disappear. We don't want to restart the scan
> > for each inode we have to process so that is the reason why we play ugly
> > tricks with pinning the next inode in the list.
> > 
> > But I agree it should be possible to just use list_for_each_entry() instead
> > of list_for_each_entry_safe() and keep current inode pinned till the next
> > iteration to make it stick in the sb->s_inodes list. That would make the
> > iteration more standard. Lightly tested patch attached.
> 
> That's exactly what I was thinking. Patch looks ok from a quick
> reading of it, but I haven't got anything here to test it
> at all. Perhaps we need some xfstests coverage of this code....

I've tested it by adding watches to some files and then unmounting the
filesystem. That should give basic testing to the code. There's some
reasonable inotify coverage (including unmount events) in LTP but most of
the testing happens in tmpdir so it is not particularly useful for
stressing this code.

								Honza
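
A manual test of the kind described above could look roughly like the following sketch. It uses the standard Linux inotify API (inotify_init1(), inotify_add_watch(), IN_UNMOUNT); the helper names is_unmount_event() and wait_for_unmount() are hypothetical, and this is not the test that was actually run. The caller is assumed to pass a path on the filesystem under test and to unmount that filesystem from another shell.

```c
#include <stdint.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the event mask says the backing filesystem was unmounted. */
static int is_unmount_event(uint32_t mask)
{
	return (mask & IN_UNMOUNT) != 0;
}

/*
 * Add a watch on path and block until the kernel reports IN_UNMOUNT.
 * Returns 0 once the unmount event arrives, -1 on any error (including
 * an interrupted read, which this sketch does not retry).
 */
static int wait_for_unmount(const char *path)
{
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	int fd = inotify_init1(IN_CLOEXEC);

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, path, IN_ALL_EVENTS) < 0) {
		close(fd);
		return -1;
	}

	for (;;) {
		ssize_t len = read(fd, buf, sizeof(buf));
		char *p;

		if (len <= 0) {
			close(fd);
			return -1;
		}
		/* One read may return several variable-length events. */
		for (p = buf; p < buf + len; ) {
			const struct inotify_event *ev =
				(const struct inotify_event *)p;

			if (is_unmount_event(ev->mask)) {
				close(fd);
				return 0;
			}
			p += sizeof(*ev) + ev->len;
		}
	}
}
```

Calling wait_for_unmount() on a file in the mounted filesystem and then running umount should make the call return 0, which exercises the FS_UNMOUNT path in fsnotify_unmount_inodes().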
Waiman Long Feb. 23, 2016, 7:01 p.m. UTC | #3
On 02/22/2016 08:04 AM, Jan Kara wrote:
> On Mon 22-02-16 13:12:22, Peter Zijlstra wrote:
>> On Mon, Feb 22, 2016 at 12:54:35PM +0100, Jan Kara wrote:
>>>> Also, I think fsnotify_unmount_inodes() (as per mainline) is missing a
>>>> final iput(need_iput) at the very end, but I could be mistaken, that
>>>> code hurts my brain.
>>> I think the code is actually correct since need_iput contains "inode
>>> further in the list than the current inode". Thus we will always go though
>>> another iteration of the loop which will drop the reference. And inode
>>> cannot change state to I_FREEING or I_WILL_FREE because we hold inode
>>> reference. But it is subtle as hell so I agree that code needs rewrite.
>> So while talking to dchinner, he doubted fsnotify will actually remove
>> inodes from the sb-list, but wasn't sure and too tired to check now.
>>
>> (I got lost in the fsnotify code real quick and gave up, for I was
>> mostly trying to make a point that we don't need the CPP magic and can
>> do with 'readable' code).
>>
>> If it doesn't, it doesn't need to do this extra special magic dance and
>> can use the 'normal' iterator pattern used in all the other functions,
>> greatly reducing complexity.
> Yeah, that would be nice. But fsnotify code needs to iterate over all
> inodes, drop sb_list_lock and do some fsnotify magic with the inode which
> is not substantial for our discussion. Now that fsnotify magic may actually
> drop all the remaining inode references so once we drop our reference
> pinning the inode, it can just disappear. We don't want to restart the scan
> for each inode we have to process so that is the reason why we play ugly
> tricks with pinning the next inode in the list.
>
> But I agree it should be possible to just use list_for_each_entry() instead
> of list_for_each_entry_safe() and keep current inode pinned till the next
> iteration to make it stick in the sb->s_inodes list. That would make the
> iteration more standard. Lightly tested patch attached.
>
> 								Honza

Your patch looks good to me. I would like to put your patch into my 
per-cpu list patchset if you don't mind.

Cheers,
Longman
Patch

From b73ae63fff14dea2afac34523d5ebfc5f030eff6 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 22 Feb 2016 13:54:32 +0100
Subject: [PATCH] fsnotify: Simplify inode iteration on umount

fsnotify_unmount_inodes() played complex tricks to pin next inode in the
sb->s_inodes list when iterating over all inodes. If we switch to
keeping current inode pinned somewhat longer, we can make the code much
simpler and standard.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/inode_mark.c | 45 +++++++++------------------------------------
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 741077deef3b..a3645249f7ec 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -150,12 +150,10 @@  int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
  */
 void fsnotify_unmount_inodes(struct super_block *sb)
 {
-	struct inode *inode, *next_i, *need_iput = NULL;
+	struct inode *inode, *iput_inode = NULL;
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next_i, &sb->s_inodes, i_sb_list) {
-		struct inode *need_iput_tmp;
-
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		/*
 		 * We cannot __iget() an inode in state I_FREEING,
 		 * I_WILL_FREE, or I_NEW which is fine because by that point
@@ -178,49 +176,24 @@  void fsnotify_unmount_inodes(struct super_block *sb)
 			continue;
 		}
 
-		need_iput_tmp = need_iput;
-		need_iput = NULL;
-
-		/* In case fsnotify_inode_delete() drops a reference. */
-		if (inode != need_iput_tmp)
-			__iget(inode);
-		else
-			need_iput_tmp = NULL;
+		__iget(inode);
 		spin_unlock(&inode->i_lock);
-
-		/* In case the dropping of a reference would nuke next_i. */
-		while (&next_i->i_sb_list != &sb->s_inodes) {
-			spin_lock(&next_i->i_lock);
-			if (!(next_i->i_state & (I_FREEING | I_WILL_FREE)) &&
-						atomic_read(&next_i->i_count)) {
-				__iget(next_i);
-				need_iput = next_i;
-				spin_unlock(&next_i->i_lock);
-				break;
-			}
-			spin_unlock(&next_i->i_lock);
-			next_i = list_next_entry(next_i, i_sb_list);
-		}
-
-		/*
-		 * We can safely drop s_inode_list_lock here because either
-		 * we actually hold references on both inode and next_i or
-		 * end of list.  Also no new inodes will be added since the
-		 * umount has begun.
-		 */
 		spin_unlock(&sb->s_inode_list_lock);
 
-		if (need_iput_tmp)
-			iput(need_iput_tmp);
+		if (iput_inode)
+			iput(iput_inode);
 
 		/* for each watch, send FS_UNMOUNT and then remove it */
 		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 
 		fsnotify_inode_delete(inode);
 
-		iput(inode);
+		iput_inode = inode;
 
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);
+
+	if (iput_inode)
+		iput(iput_inode);
 }
-- 
2.6.2