Message ID | 20200919093923.19016-1-luoshijie1@huawei.com |
---|---|
State | New, archived |
Series | [RESEND] fs: fix race condition oops between destroy_inode and writeback_sb_inodes |
On Sat, Sep 19, 2020 at 05:39:23AM -0400, Shijie Luo wrote:
> There is a race condition between destroy_inode and writeback_sb_inodes,
> thread-1                                  thread-2
> wb_workfn
>   writeback_inodes_wb
>     __writeback_inodes_wb
>       writeback_sb_inodes
>         wbc_attach_and_unlock_inode
>                                           iget_locked
>                                             destroy_inode
>                                               inode_detach_wb
>                                                 inode->i_wb = NULL;
>
>         inode_to_wb_and_lock_list
>           locked_inode_to_wb_and_lock_list
>             wb_get
>               oops
>
> so destroy inode after adding I_FREEING to inode state and the I_SYNC state
> being cleared.
>
> Reported-by: Tianxiong Lu <lutianxiong@huawei.com>
> Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
> Signed-off-by: Haotian Li <lihaotian9@huawei.com>
> ---
>  fs/inode.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 72c4c347afb7..b28a2a9e15d5 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1148,10 +1148,17 @@ struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
>  		struct inode *new = alloc_inode(sb);
>
>  		if (new) {
> +			spin_lock(&new->i_lock);
>  			new->i_state = 0;
> +			spin_unlock(&new->i_lock);

This part is unnecessary. We just allocated 'new' two lines above;
nobody else can see 'new' yet. We make it visible with hlist_add_head_rcu()
which uses rcu_assign_pointer() which contains a memory barrier, so it's
impossible for another CPU to see a stale i_state.

>  			inode = inode_insert5(new, hashval, test, set, data);
> -			if (unlikely(inode != new))
> +			if (unlikely(inode != new)) {
> +				spin_lock(&new->i_lock);
> +				new->i_state |= I_FREEING;
> +				spin_unlock(&new->i_lock);
> +				inode_wait_for_writeback(new);
>  				destroy_inode(new);

This doesn't make sense either. If an inode is returned here which is not
'new', then adding 'new' to the hash failed, and new was never visible
to another CPU.

> @@ -1218,6 +1225,11 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>  		 * allocated.
>  		 */
>  		spin_unlock(&inode_hash_lock);
> +
> +		spin_lock(&inode->i_lock);
> +		inode->i_state |= I_FREEING;
> +		spin_unlock(&inode->i_lock);
> +		inode_wait_for_writeback(inode);
>  		destroy_inode(inode);

Again, this doesn't make sense. This is also a codepath which failed to
make 'inode' visible to any other thread.

I don't understand how this patch could fix anything.
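Wilcox's argument rests on two facts: an object can be initialised with plain stores, without taking its lock, as long as it is published with release semantics, and the loser of an insertion race can be freed directly because no other thread ever obtained a pointer to it. The following is a minimal userspace C11 sketch of that reasoning, not kernel code. It assumes a single hash slot holding one inode number, and it substitutes C11 `atomic_compare_exchange_strong_explicit()` for the kernel's `inode_hash_lock` plus `hlist_add_head_rcu()`/`rcu_assign_pointer()`; `struct sk_obj`, `insert_or_get()` and `get_obj()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode: only what the argument needs. */
struct sk_obj {
	unsigned long ino;
	unsigned int state;	/* plays the role of i_state */
};

/*
 * One hash slot that only ever holds a single inode number (assumption);
 * a real table would hash ino to an hlist chain.
 */
static _Atomic(struct sk_obj *) slot;

/*
 * Publish 'new' if the slot is empty. The release half of acq_rel on
 * success orders the earlier plain stores to *new before the pointer
 * becomes visible, the job rcu_assign_pointer() does in the kernel.
 * Returns whichever object ends up in the slot.
 */
static struct sk_obj *insert_or_get(struct sk_obj *new)
{
	struct sk_obj *expected = NULL;

	if (atomic_compare_exchange_strong_explicit(&slot, &expected, new,
						    memory_order_acq_rel,
						    memory_order_acquire))
		return new;	/* won: 'new' is now visible to others */
	return expected;	/* lost: 'new' was never visible */
}

static struct sk_obj *get_obj(unsigned long ino)
{
	struct sk_obj *new = malloc(sizeof(*new));
	struct sk_obj *found;

	if (!new)
		return NULL;
	new->ino = ino;
	new->state = 0;		/* plain store: nobody else can see 'new' yet */

	found = insert_or_get(new);
	if (found != new)
		free(new);	/* never published, so no lock or wait is needed */
	return found;
}
```

Calling `get_obj(42)` twice returns the same pointer both times; the second allocation is freed with no locking and no wait for writeback, because the calling thread was the only one that ever held a pointer to it.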
On 2020/9/19 22:56, Matthew Wilcox wrote:
> This part is unnecessary. We just allocated 'new' two lines above;
> nobody else can see 'new' yet. We make it visible with hlist_add_head_rcu()
> which uses rcu_assign_pointer() which contains a memory barrier, so it's
> impossible for another CPU to see a stale i_state.
>
>>  			inode = inode_insert5(new, hashval, test, set, data);
>> -			if (unlikely(inode != new))
>> +			if (unlikely(inode != new)) {
>> +				spin_lock(&new->i_lock);
>> +				new->i_state |= I_FREEING;
>> +				spin_unlock(&new->i_lock);
>> +				inode_wait_for_writeback(new);
>>  				destroy_inode(new);
> This doesn't make sense either. If an inode is returned here which is not
> 'new', then adding 'new' to the hash failed, and new was never visible
> to another CPU.
>
>> @@ -1218,6 +1225,11 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>>  		 * allocated.
>>  		 */
>>  		spin_unlock(&inode_hash_lock);
>> +
>> +		spin_lock(&inode->i_lock);
>> +		inode->i_state |= I_FREEING;
>> +		spin_unlock(&inode->i_lock);
>> +		inode_wait_for_writeback(inode);
>>  		destroy_inode(inode);
> Again, this doesn't make sense. This is also a codepath which failed to
> make 'inode' visible to any other thread.
>
> I don't understand how this patch could fix anything.

Thanks for your review. The underlying filesystem is ext4, and ext4_alloc_inode()
doesn't allocate a new VFS inode from the slab. In the vmcore I found that the
"new" inode was being used by another thread; in other words, the new inode
should have been a fresh one, but it was not. Maybe this is not a filesystem
problem, and fixing it in iget_locked() is not the right approach. I'll try to
find the root cause and fix it.
On Sat 19-09-20 05:39:23, Shijie Luo wrote:
> We hit an oops in Linux 4.18. The call trace is as follows:
>
> [255946.665989] Oops: 0000 [#1] SMP PTI
> [255946.674811] Workqueue: writeback wb_workfn (flush-253:6)
> [255946.676443] RIP: 0010:locked_inode_to_wb_and_lock_list+0x20/0x120
> [255946.683916] RSP: 0018:ffffbb0e44727c00 EFLAGS: 00010286
> [255946.685518] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [255946.687699] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ef282be5398
> [255946.689866] RBP: ffff9ef282be5398 R08: ffffbb0e44727cd8 R09: ffff9ef3064f306e
> [255946.692037] R10: 0000000000000000 R11: 0000000000000010 R12: ffff9ef282be5420
> [255946.694208] R13: ffff9ef3351cc800 R14: 0000000000000000 R15: ffff9ef3352e2058
> [255946.696378] FS: 0000000000000000(0000) GS:ffff9ef33ad80000(0000) knlGS:0000000000000000
> [255946.698835] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [255946.700604] CR2: 0000000000000000 CR3: 000000000760a005 CR4: 00000000003606e0
> [255946.702787] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [255946.704955] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [255946.707123] Call Trace:
> [255946.707918]  writeback_sb_inodes+0x1fe/0x460
> [255946.709244]  __writeback_inodes_wb+0x5d/0xb0
> [255946.710575]  wb_writeback+0x265/0x2f0
> [255946.711728]  ? wb_workfn+0x3cf/0x4d0
> [255946.712850]  wb_workfn+0x3cf/0x4d0
> [255946.713923]  process_one_work+0x195/0x390
> [255946.715173]  worker_thread+0x30/0x390
> [255946.716319]  ? process_one_work+0x390/0x390
> [255946.717625]  kthread+0x10d/0x130
> [255946.718789]  ? kthread_flush_work_fn+0x10/0x10
> [255946.720170]  ret_from_fork+0x35/0x40

So 4.18 is rather old and we had several fixes in this area for crashes
similar to the one you show above. The last was likely:

68f23b89067 ("memcg: fix a crash in wb_workfn when a device disappears")

but there were multiple changes before that to bdi logic to fix lifetime
issues when devices are hot-removed.

> There is a race condition between destroy_inode and writeback_sb_inodes,
> thread-1                                  thread-2
> wb_workfn
>   writeback_inodes_wb
>     __writeback_inodes_wb
>       writeback_sb_inodes
>         wbc_attach_and_unlock_inode
>                                           iget_locked
>                                             destroy_inode
>                                               inode_detach_wb
>                                                 inode->i_wb = NULL;

So thread-1 looks sensible but I don't see how what is in thread-2 can
ever happen. We can call destroy_inode() from iget_locked() only for
inodes that were never added to the inode hash (and so they could never be
dirty or even be handled by the flusher thread). Active inodes must (and
AFAIK always do) pass through fs/inode.c:evict(), which takes care of
waiting for the running flusher thread (through inode_wait_for_writeback()).

Honza
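The guarantee Jan describes, that eviction cannot free an inode while the flusher is working on it, comes from two cooperating rules: the flusher skips inodes marked I_FREEING and marks the ones it does take with I_SYNC, while evict() sets I_FREEING and then waits for I_SYNC to clear. The following is a minimal pthread sketch of that handshake, not the kernel implementation. The flag values, `struct sk_inode`, and the `sk_*` functions are hypothetical, and a condition variable stands in for the kernel's bit waitqueue used by inode_wait_for_writeback().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag bits modelled on I_SYNC and I_FREEING. */
#define SK_I_SYNC	0x1u
#define SK_I_FREEING	0x2u

struct sk_inode {
	pthread_mutex_t i_lock;	/* stands in for inode->i_lock */
	pthread_cond_t sync_wq;	/* stands in for the I_SYNC bit waitqueue */
	unsigned int i_state;	/* protected by i_lock */
};

/* Flusher side: skip inodes being freed, otherwise mark them I_SYNC. */
static bool sk_writeback_begin(struct sk_inode *inode)
{
	bool ok;

	pthread_mutex_lock(&inode->i_lock);
	ok = !(inode->i_state & SK_I_FREEING);
	if (ok)
		inode->i_state |= SK_I_SYNC;
	pthread_mutex_unlock(&inode->i_lock);
	return ok;
}

/* Flusher side: clear I_SYNC and wake anyone waiting in sk_evict(). */
static void sk_writeback_end(struct sk_inode *inode)
{
	pthread_mutex_lock(&inode->i_lock);
	inode->i_state &= ~SK_I_SYNC;
	pthread_cond_broadcast(&inode->sync_wq);
	pthread_mutex_unlock(&inode->i_lock);
}

/*
 * Eviction side: set I_FREEING so no new writeback starts, then wait for
 * a writeback that is already running to finish.
 */
static void sk_evict(struct sk_inode *inode)
{
	pthread_mutex_lock(&inode->i_lock);
	inode->i_state |= SK_I_FREEING;
	while (inode->i_state & SK_I_SYNC)
		pthread_cond_wait(&inode->sync_wq, &inode->i_lock);
	pthread_mutex_unlock(&inode->i_lock);
	/* Only now would it be safe to detach i_wb and free the inode. */
}
```

Because I_FREEING is set under the same lock the flusher checks, any writeback either started before eviction (and is waited for) or sees I_FREEING and never starts. An inode that iget_locked() destroys without ever hashing it never reaches either side of this handshake, which is why Jan doubts the thread-2 path.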
On 2020/9/21 18:25, Jan Kara wrote:
> On Sat 19-09-20 05:39:23, Shijie Luo wrote:
> So 4.18 is rather old and we had several fixes in this area for crashes
> similar to the one you show above. The last was likely:
>
> 68f23b89067 ("memcg: fix a crash in wb_workfn when a device disappears")
>
> but there were multiple changes before that to bdi logic to fix lifetime
> issues when devices are hot-removed.

Thanks for your reply. We checked several fixes in wb_workfn and finally found
that commit ceff86fddae8 ("ext4: Avoid freeing inodes on dirty list") fixes
the problem.

Our fsstress process randomly uses the ioctl interface to set the journal data
flag on inodes, and an ext4 inode with the journal data flag can be marked dirty
and added to the writeback lists again. When locked_inode_to_wb_and_lock_list()
in __mark_inode_dirty() has released inode->i_lock but has not yet taken
wb->list_lock, the inode can be evicted and removed from the writeback lists at
the same time, and it may then be added to a writeback list again. As a result,
an inode freshly allocated from the slab may still be on a writeback list, and a
crash can follow because destroy_inode() sets inode->i_wb to NULL.
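The window Shijie describes is a lock-ordering hazard: fs/inode.c orders wb->list_lock before inode->i_lock, so the dirtying path must drop i_lock before taking list_lock, and the inode's state can change in between. One generic way to close such a window is to recheck the state once both locks are held. The following is a minimal pthread sketch of that recheck pattern under those assumptions. It is not the ext4 change in ceff86fddae8 nor the kernel's __mark_inode_dirty(); `struct sk_inode`, `sk_mark_dirty()`, `sk_evict()` and the `on_dirty_list` flag are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SK_I_FREEING 0x1u	/* hypothetical stand-in for I_FREEING */

struct sk_inode {
	pthread_mutex_t i_lock;	/* stands in for inode->i_lock */
	unsigned int i_state;	/* protected by i_lock */
	bool on_dirty_list;	/* changed only with list_lock and i_lock held */
};

/* Stands in for wb->list_lock; lock order is list_lock, then i_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Dirtying path: returns true if the inode was put on the dirty list. */
static bool sk_mark_dirty(struct sk_inode *inode)
{
	bool queued;

	pthread_mutex_lock(&inode->i_lock);
	if (inode->i_state & SK_I_FREEING) {
		pthread_mutex_unlock(&inode->i_lock);
		return false;
	}
	/* Lock order forces us to drop i_lock here: this is the race window. */
	pthread_mutex_unlock(&inode->i_lock);

	pthread_mutex_lock(&list_lock);
	pthread_mutex_lock(&inode->i_lock);
	/* Recheck: eviction may have started while i_lock was not held. */
	queued = !(inode->i_state & SK_I_FREEING);
	if (queued)
		inode->on_dirty_list = true;
	pthread_mutex_unlock(&inode->i_lock);
	pthread_mutex_unlock(&list_lock);
	return queued;
}

/* Eviction path: mark the inode as being freed, then unlink it. */
static void sk_evict(struct sk_inode *inode)
{
	pthread_mutex_lock(&inode->i_lock);
	inode->i_state |= SK_I_FREEING;
	pthread_mutex_unlock(&inode->i_lock);

	pthread_mutex_lock(&list_lock);
	pthread_mutex_lock(&inode->i_lock);
	inode->on_dirty_list = false;
	pthread_mutex_unlock(&inode->i_lock);
	pthread_mutex_unlock(&list_lock);
	/* No later sk_mark_dirty() call can put the inode back on the list. */
}
```

Without the recheck, sk_evict() could unlink the inode inside the window and sk_mark_dirty() would then re-add an inode that is about to be freed, which is the shape of the bug described above.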
```diff
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c347afb7..b28a2a9e15d5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1148,10 +1148,17 @@ struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
 		struct inode *new = alloc_inode(sb);
 
 		if (new) {
+			spin_lock(&new->i_lock);
 			new->i_state = 0;
+			spin_unlock(&new->i_lock);
 			inode = inode_insert5(new, hashval, test, set, data);
-			if (unlikely(inode != new))
+			if (unlikely(inode != new)) {
+				spin_lock(&new->i_lock);
+				new->i_state |= I_FREEING;
+				spin_unlock(&new->i_lock);
+				inode_wait_for_writeback(new);
 				destroy_inode(new);
+			}
 		}
 	}
 	return inode;
@@ -1218,6 +1225,11 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 		 * allocated.
 		 */
 		spin_unlock(&inode_hash_lock);
+
+		spin_lock(&inode->i_lock);
+		inode->i_state |= I_FREEING;
+		spin_unlock(&inode->i_lock);
+		inode_wait_for_writeback(inode);
 		destroy_inode(inode);
 		if (IS_ERR(old))
 			return NULL;
```