[02/10] ocfs2: free inode when i_count becomes zero

Message ID 53e290c2.UrOYpRJudvt5Qabp%akpm@linux-foundation.org
State New, archived

Commit Message

Andrew Morton Aug. 6, 2014, 8:32 p.m. UTC
From: Xue jiufei <xuejiufei@huawei.com>
Subject: ocfs2: free inode when i_count becomes zero

Deletion of the on-disk inode may be heavily delayed when one node unlinks a
file after the same dentry has been freed on another node (say N1) because of
memory shrinking while the inode is left in memory.  This inode can only be
freed when N1 does the orphan scan work.

However, N1 may skip the orphan scan several times because other nodes may
do the work earlier.  In our tests, it may take 1 hour on a 4-node cluster,
which causes a bad user experience.  So we think the inode should be
freed when i_count becomes zero to avoid such circumstances.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/inode.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

Comments

Mark Fasheh Aug. 13, 2014, 6:03 p.m. UTC | #1
On Wed, Aug 06, 2014 at 01:32:02PM -0700, Andrew Morton wrote:
> From: Xue jiufei <xuejiufei@huawei.com>
> Subject: ocfs2: free inode when i_count becomes zero
> 
> Disk inode deletion may be heavily delayed when one node unlink a file
> after the same dentry is freed on another node(say N1) because of memory
> shrink but inode is left in memory.  This inode can only be freed while N1
> doing the orphan scan work.
> 
> However, N1 may skip orphan scan for several times because other nodes may
> do the work earlier.  In our tests, it may take 1 hour on 4 nodes cluster
> and this will cause bad user experience.  So we think the inode should be
> freed when i_count becomes zero to avoid such circumstances.

Firstly, thanks for the patch Xue.

I understand your problem and I definitely agree that it hurts the user
experience. If the inode is free to be deleted we shouldn't be taking so
long to get rid of it.

What I'm worried about is that we're always going to tell the kernel to
evict the inode now, which will always cause some sort of cluster locking.

I need to look at this more and think about it a bit. Maybe there's a better
way?
	--Mark

--
Mark Fasheh

Xue jiufei Aug. 30, 2014, 7:03 a.m. UTC | #2
On 2014/8/14 2:03, Mark Fasheh wrote:
> On Wed, Aug 06, 2014 at 01:32:02PM -0700, Andrew Morton wrote:
>> From: Xue jiufei <xuejiufei@huawei.com>
>> Subject: ocfs2: free inode when i_count becomes zero
>>
>> Disk inode deletion may be heavily delayed when one node unlink a file
>> after the same dentry is freed on another node(say N1) because of memory
>> shrink but inode is left in memory.  This inode can only be freed while N1
>> doing the orphan scan work.
>>
>> However, N1 may skip orphan scan for several times because other nodes may
>> do the work earlier.  In our tests, it may take 1 hour on 4 nodes cluster
>> and this will cause bad user experience.  So we think the inode should be
>> freed when i_count becomes zero to avoid such circumstances.
> 
> Firstly, thanks for the patch Xue.
> 
> I understand your problem and I definitely agree that it hurts the user
> experience. If the inode is free to be deleted we shouldn't be taking so
> long to get rid of it.
> 
> What I'm worried about is that we're always going to tell the kernel to
> evict the inode now, which will always cause some sort of cluster locking.
> 
> I need to look at this more and think about it a bit. Maybe there's a better
> way?
> 	--Mark
> 
> --
In most cases, the refcount of the inode will not be zero because there
are one or more dentries associated with it. So only in the situation
where a dentry is forced to be freed because of memory pressure but the
inode is left do we increase the probability of inode eviction. I think
that is acceptable.

Thanks,
Xuejiufei

> Mark Fasheh
> .
>
Xue jiufei Dec. 2, 2014, 6:50 a.m. UTC | #3
Hi
On 2014/8/30 15:03, Xue jiufei wrote:
> On 2014/8/14 2:03, Mark Fasheh wrote:
>> On Wed, Aug 06, 2014 at 01:32:02PM -0700, Andrew Morton wrote:
>>> From: Xue jiufei <xuejiufei@huawei.com>
>>> Subject: ocfs2: free inode when i_count becomes zero
>>>
>>> Disk inode deletion may be heavily delayed when one node unlink a file
>>> after the same dentry is freed on another node(say N1) because of memory
>>> shrink but inode is left in memory.  This inode can only be freed while N1
>>> doing the orphan scan work.
>>>
>>> However, N1 may skip orphan scan for several times because other nodes may
>>> do the work earlier.  In our tests, it may take 1 hour on 4 nodes cluster
>>> and this will cause bad user experience.  So we think the inode should be
>>> freed when i_count becomes zero to avoid such circumstances.
>>
>> Firstly, thanks for the patch Xue.
>>
>> I understand your problem and I definitely agree that it hurts the user
>> experience. If the inode is free to be deleted we shouldn't be taking so
>> long to get rid of it.
>>
>> What I'm worried about is that we're always going to tell the kernel to
>> evict the inode now, which will always cause some sort of cluster locking.
>>
>> I need to look at this more and think about it a bit. Maybe there's a better
>> way?
>> 	--Mark
>>
I am sorry that I made a mistake. This patch may lead to data loss: when i_count
becomes zero but there are still dirty pages in i_mapping, the dirty pages
would be freed without the data being flushed.

To avoid this problem, we should flush dirty pages before dropping
the inode, but I don't think it is a good idea to flush pages in
ocfs2_drop_inode().

Is there any better way to solve this problem?

Thanks,
Xuejiufei
>> --
> In most cases, the refcount of inode would not be zero because there
> is one or more dentrys associated with it. So only in this situation
> that a dentry is force to be freed because of memory pressure but the
> inode is left, we increase the probability of inode eviction. I think it
> is acceptable.
> 
> Thanks,
> Xuejiufei
> 
>> Mark Fasheh
>> .
>>
>

Patch

diff -puN fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero fs/ocfs2/inode.c
--- a/fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero
+++ a/fs/ocfs2/inode.c
@@ -1192,17 +1192,9 @@  void ocfs2_evict_inode(struct inode *ino
 int ocfs2_drop_inode(struct inode *inode)
 {
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
-	int res;
-
 	trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
 				inode->i_nlink, oi->ip_flags);
-
-	if (oi->ip_flags & OCFS2_INODE_MAYBE_ORPHANED)
-		res = 1;
-	else
-		res = generic_drop_inode(inode);
-
-	return res;
+	return 1;
 }
 
 /*
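
For readers following the thread, here is a minimal userspace sketch of the decision the patch changes. It is a toy model, not kernel code: `struct toy_inode` and every `toy_*` function are hypothetical names invented for illustration. The model of generic_drop_inode() (drop when the link count is zero or the inode is unhashed) reflects my understanding of the helper at the time of this patch, and the data-loss check only restates the concern raised in comment #3 rather than anything verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state the decision reads. */
struct toy_inode {
	unsigned int nlink;    /* models inode->i_nlink */
	bool unhashed;         /* models inode_unhashed(inode) */
	bool maybe_orphaned;   /* models OCFS2_INODE_MAYBE_ORPHANED in ip_flags */
	bool has_dirty_pages;  /* models dirty pages left in i_mapping */
};

/* Assumed model of generic_drop_inode(): drop if unlinked or unhashed. */
bool toy_generic_drop(const struct toy_inode *inode)
{
	return inode->nlink == 0 || inode->unhashed;
}

/* ocfs2_drop_inode() before the patch, as shown by the removed lines. */
bool toy_drop_before(const struct toy_inode *inode)
{
	if (inode->maybe_orphaned)
		return true;
	return toy_generic_drop(inode);
}

/* ocfs2_drop_inode() after the patch: always ask for the inode to be dropped. */
bool toy_drop_after(const struct toy_inode *inode)
{
	(void)inode;
	return true;
}

/* Concern from comment #3: dropping while dirty pages remain loses data. */
bool toy_loses_data(const struct toy_inode *inode, bool drop)
{
	return drop && inode->has_dirty_pages;
}

/* Walks through the case discussed in the thread. */
void toy_demo(void)
{
	/* Linked, hashed inode whose dentry was shrunk; dirty pages remain. */
	struct toy_inode n1 = { 1, false, false, true };

	assert(!toy_drop_before(&n1));  /* old code keeps it cached */
	assert(toy_drop_after(&n1));    /* patched code drops it */
	assert(toy_loses_data(&n1, toy_drop_after(&n1)));
}
```

In this model the old code keeps a linked, hashed inode cached after its last reference goes away, which is the delay the commit message describes, while the patched code drops it unconditionally, which is where the dirty-page concern comes from.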