diff mbox series

[RFC] mm: truncate: flush lru cache for evicted inode

Message ID 20240614131856.754-1-hdanton@sina.com (mailing list archive)
State New

Commit Message

Hillf Danton June 14, 2024, 1:18 p.m. UTC
Flush the LRU cache to avoid a folio->mapping use-after-free (UAF) during inode teardown.

Reported-and-tested-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com
Signed-off-by: Hillf Danton <hdanton@sina.com>
---
Posted for comments because lru_add_drain_all() is too heavy a hammer.

--

Comments

Matthew Wilcox (Oracle) June 14, 2024, 1:42 p.m. UTC | #1
On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> Flush lru cache to avoid folio->mapping uaf in case of inode teardown.

What?  inodes are supposed to have all their folios removed before
being freed.  Part of removing a folio sets the folio->mapping to NULL.
Where is the report?

> Reported-and-tested-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com
> Signed-off-by: Hillf Danton <hdanton@sina.com>
> ---
> Post for comments because lru_add_drain_all() is too heavy a hammer.
> 
> --- x/mm/truncate.c
> +++ y/mm/truncate.c
> @@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a
>  		truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
>  		folio_batch_release(&fbatch);
>  	}
> +
> +	if (mapping_exiting(mapping))
> +		lru_add_drain_all();
>  }
>  EXPORT_SYMBOL(truncate_inode_pages_range);
>  
> --
Hillf Danton June 14, 2024, 11:59 p.m. UTC | #2
On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> 
> What?  inodes are supposed to have all their folios removed before
> being freed.  Part of removing a folio sets the folio->mapping to NULL.
> Where is the report?
>
Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/
Matthew Wilcox (Oracle) June 15, 2024, 8:44 p.m. UTC | #3
On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote:
> On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> > 
> > What?  inodes are supposed to have all their folios removed before
> > being freed.  Part of removing a folio sets the folio->mapping to NULL.
> > Where is the report?
> >
> Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
> https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/

Thanks.  This fix is wrong.  Of course syzbot says it fixes the problem,
but you're just avoiding putting the folios into the situation where we
have debug that would detect the problem.

I suspect this would trigger:

+++ b/fs/inode.c
@@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
 void __destroy_inode(struct inode *inode)
 {
        BUG_ON(inode_has_buffers(inode));
+       BUG_ON(inode->i_data.nrpages);
        inode_detach_wb(inode);
        security_inode_free(inode);
        fsnotify_inode_delete(inode);

and what a real fix would look like would be calling clear_inode()
before calling iput() in nilfs_put_root().  But I'm not an expert
in this layer of the VFS, so I might well be wrong.
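
To illustrate the objection above, here is a minimal user-space sketch, not kernel code. It assumes a simplified model in which a folio sits in a deferred LRU batch while its mapping is torn down; all names (`model_mapping`, `model_folio`, `model_batch`, `batch_add`, `batch_drain`, `mapping_teardown`) are hypothetical stand-ins. Draining the batch early makes the stale access disappear, yet the mapping is still torn down with a folio attached, which is the underlying problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct model_mapping {
	bool freed;            /* set once the owning inode is torn down */
	unsigned long nrpages; /* folios still attached to this mapping */
};

struct model_folio {
	struct model_mapping *mapping; /* would be NULL after proper removal */
};

#define BATCH_MAX 8

struct model_batch {
	struct model_folio *folios[BATCH_MAX];
	size_t nr;
};

/* Queue a folio for deferred LRU insertion, as a per-CPU batch would. */
static bool batch_add(struct model_batch *b, struct model_folio *f)
{
	assert(b && f);
	if (b->nr == BATCH_MAX)
		return false;
	b->folios[b->nr++] = f;
	return true;
}

/*
 * Drain the batch. Returns how many queued folios still pointed at a
 * mapping that was already torn down, i.e. the accesses that would be
 * use-after-free in the real kernel.
 */
static size_t batch_drain(struct model_batch *b)
{
	size_t stale = 0;

	for (size_t i = 0; i < b->nr; i++) {
		struct model_mapping *m = b->folios[i]->mapping;

		if (m && m->freed)
			stale++;
	}
	b->nr = 0;
	return stale;
}

/* Tear down the mapping; returns the number of folios leaked on it. */
static unsigned long mapping_teardown(struct model_mapping *m)
{
	m->freed = true;
	return m->nrpages;
}
```

In this model, calling `batch_drain()` before `mapping_teardown()` reports zero stale accesses, but `mapping_teardown()` still returns a non-zero count. That non-zero count corresponds to the condition the proposed `BUG_ON(inode->i_data.nrpages)` is meant to catch.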
Hillf Danton June 15, 2024, 11:52 p.m. UTC | #4
On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote:
> > On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> > > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> > > 
> > > What?  inodes are supposed to have all their folios removed before
> > > being freed.  Part of removing a folio sets the folio->mapping to NULL.
> > > Where is the report?
> > >
> > Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
> > https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/
> 
> Thanks.  This fix is wrong.  Of course syzbot says it fixes the problem,
> but you're just avoiding putting the folios into the situation where we
> have debug that would detect the problem.
> 
> I suspect this would trigger:
> 
Happy to test your idea.

> +++ b/fs/inode.c
> @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
>  void __destroy_inode(struct inode *inode)
>  {
>         BUG_ON(inode_has_buffers(inode));
> +       BUG_ON(inode->i_data.nrpages);
>         inode_detach_wb(inode);
>         security_inode_free(inode);
>         fsnotify_inode_delete(inode);
> 
> and what a real fix would look like would be calling clear_inode()
> before calling iput() in nilfs_put_root().  But I'm not an expert

Hm... given that I_FREEING is checked in clear_inode(), a fix like this one could be
tried in mid-2026.

> in this layer of the VFS, so I might well be wrong.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  83a7eefedc9b

--- x/mm/truncate.c
+++ y/mm/truncate.c
@@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a
 		truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
 		folio_batch_release(&fbatch);
 	}
+
+	if (mapping_exiting(mapping))
+		lru_add_drain_all();
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
 
--- x/fs/inode.c
+++ y/fs/inode.c
@@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct
 void __destroy_inode(struct inode *inode)
 {
 	BUG_ON(inode_has_buffers(inode));
+	BUG_ON(inode->i_data.nrpages);
 	inode_detach_wb(inode);
 	security_inode_free(inode);
 	fsnotify_inode_delete(inode);
--
syzbot June 16, 2024, 12:10 a.m. UTC | #5
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in __destroy_inode

NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer
------------[ cut here ]------------
kernel BUG at fs/inode.c:285!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 2 PID: 5330 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285
Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90
RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439
RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980
R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0
FS:  0000555571e27480(0000) GS:ffff88806b200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f01cb366731 CR3: 0000000034ef4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 destroy_inode+0x91/0x1b0 fs/inode.c:310
 iput_final fs/inode.c:1742 [inline]
 iput.part.0+0x5a8/0x7f0 fs/inode.c:1768
 iput+0x5c/0x80 fs/inode.c:1758
 nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925
 nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline]
 nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850
 nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498
 generic_shutdown_super+0x159/0x3d0 fs/super.c:642
 kill_block_super+0x3b/0x90 fs/super.c:1676
 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
 deactivate_super+0xde/0x100 fs/super.c:506
 cleanup_mnt+0x222/0x450 fs/namespace.c:1267
 task_work_run+0x14e/0x250 kernel/task_work.c:180
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc203a7e217
Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007fffe9265ae8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007fc203a7e217
RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007fffe9266c90
RBP: 00007fc203ac8336 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000202 R12: 00007fffe9266c90
R13: 00007fc203ac8336 R14: 0000555571e27430 R15: 0000000000000005
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285
Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90
RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439
RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980
R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0
FS:  0000555571e27480(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0016fb000 CR3: 0000000034ef4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit:         83a7eefe Linux 6.10-rc3
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11bb8ada980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f
dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=16642012980000
Hillf Danton June 16, 2024, 2:39 a.m. UTC | #6
On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> 
> I suspect this would trigger:
> 
> +++ b/fs/inode.c
> @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
>  void __destroy_inode(struct inode *inode)
>  {
>         BUG_ON(inode_has_buffers(inode));
> +       BUG_ON(inode->i_data.nrpages);
>         inode_detach_wb(inode);
>         security_inode_free(inode);
>         fsnotify_inode_delete(inode);
> 
Yes, it was triggered [1]

[1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/

and given that it triggers after nrpages is checked in clear_inode(),

	iput(inode)
	evict(inode)
	truncate_inode_pages_final(&inode->i_data);
	clear_inode(inode);
	destroy_inode(inode);

why is a folio added to an exiting mapping?

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  83a7eefedc9b

--- x/mm/filemap.c
+++ y/mm/filemap.c
@@ -870,6 +870,7 @@ noinline int __filemap_add_folio(struct
 	folio_ref_add(folio, nr);
 	folio->mapping = mapping;
 	folio->index = xas.xa_index;
+	BUG_ON(mapping_exiting(mapping));
 
 	for (;;) {
 		int order = -1, split_order = 0;
--
syzbot June 16, 2024, 3:06 a.m. UTC | #7
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in __filemap_add_folio

NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer
------------[ cut here ]------------
kernel BUG at mm/filemap.c:873!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 5321 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873
Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48
RSP: 0018:ffffc900035773f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd
RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001
RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468
R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000
FS:  000055556c846480(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe311b9ff8 CR3: 000000001ae02000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 filemap_add_folio+0x110/0x220 mm/filemap.c:971
 __filemap_get_folio+0x455/0xa80 mm/filemap.c:1959
 filemap_grab_folio include/linux/pagemap.h:697 [inline]
 nilfs_grab_buffer+0xc3/0x370 fs/nilfs2/page.c:57
 nilfs_mdt_submit_block+0x9f/0x870 fs/nilfs2/mdt.c:121
 nilfs_mdt_read_block+0xa4/0x3b0 fs/nilfs2/mdt.c:176
 nilfs_mdt_get_block+0xdb/0xb90 fs/nilfs2/mdt.c:251
 nilfs_palloc_get_block+0xb5/0x300 fs/nilfs2/alloc.c:217
 nilfs_palloc_get_entry_block+0x165/0x1b0 fs/nilfs2/alloc.c:319
 nilfs_ifile_delete_inode+0x1e6/0x260 fs/nilfs2/ifile.c:109
 nilfs_evict_inode+0x294/0x550 fs/nilfs2/inode.c:950
 evict+0x2ed/0x6c0 fs/inode.c:667
 iput_final fs/inode.c:1741 [inline]
 iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
 iput+0x5c/0x80 fs/inode.c:1757
 nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925
 nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline]
 nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850
 nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498
 generic_shutdown_super+0x159/0x3d0 fs/super.c:642
 kill_block_super+0x3b/0x90 fs/super.c:1676
 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
 deactivate_super+0xde/0x100 fs/super.c:506
 cleanup_mnt+0x222/0x450 fs/namespace.c:1267
 task_work_run+0x14e/0x250 kernel/task_work.c:180
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f70d447e217
Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007ffe311ba288 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007f70d447e217
RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007ffe311bb430
RBP: 00007f70d44c8336 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000202 R12: 00007ffe311bb430
R13: 00007f70d44c8336 R14: 000055556c846430 R15: 0000000000000005
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873
Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48
RSP: 0018:ffffc900035773f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd
RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001
RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468
R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000
FS:  000055556c846480(0000) GS:ffff88806b000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f70d45a8000 CR3: 000000001ae02000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit:         83a7eefe Linux 6.10-rc3
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15608256980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f
dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=147bb012980000
Jan Kara June 17, 2024, 7:57 a.m. UTC | #8
On Sun 16-06-24 10:39:51, Hillf Danton wrote:
> On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> > 
> > I suspect this would trigger:
> > 
> > +++ b/fs/inode.c
> > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> >  void __destroy_inode(struct inode *inode)
> >  {
> >         BUG_ON(inode_has_buffers(inode));
> > +       BUG_ON(inode->i_data.nrpages);
> >         inode_detach_wb(inode);
> >         security_inode_free(inode);
> >         fsnotify_inode_delete(inode);
> > 
> Yes, it was triggered [1]
> 
> [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/
> 
> and given trigger after nrpages is checked in clear_inode(),
> 
> 	iput(inode)
> 	evict(inode)
> 	truncate_inode_pages_final(&inode->i_data);
> 	clear_inode(inode);
> 	destroy_inode(inode);
> 
> why is folio added to exiting mapping?
> 
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  83a7eefedc9b

OK, so based on the syzbot results this seems to be a bug in
nilfs_evict_inode() (likely caused by a corrupted filesystem in which the
root inode's link count was 0, so the inode was being deleted on iput()).
I guess the nilfs maintainers need to address these with more consistency
checks of metadata when loading them...

									Honza
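
The sequence described in this thread can be sketched as a small user-space model. This is an illustration under stated assumptions, not nilfs or VFS code: the `sim_*` names are hypothetical, and `sim_metadata_inode_valid()` is only one guess at what a load-time consistency check could look like. The model shows an inode with a zero link count taking the delete path after its pages were truncated and checked, which adds a folio to a mapping that is already exiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct sim_mapping {
	bool exiting;          /* set once final truncation has started */
	unsigned long nrpages; /* folios currently attached */
};

struct sim_inode {
	unsigned int nlink;    /* link count as loaded from disk */
	struct sim_mapping data;
};

/* Attach one folio; returns false if the mapping was already exiting. */
static bool sim_add_folio(struct sim_mapping *m)
{
	bool allowed = !m->exiting;

	m->nrpages++;
	return allowed;
}

/* Final truncation: mark the mapping as exiting and drop all folios. */
static void sim_truncate_final(struct sim_mapping *m)
{
	m->exiting = true;
	m->nrpages = 0;
}

/*
 * Model of the last iput() on an inode. Returns the number of folios
 * still attached when the inode would be destroyed.
 */
static unsigned long sim_final_iput(struct sim_inode *inode)
{
	sim_truncate_final(&inode->data);

	/* The point where clear_inode() checks nrpages; it passes here. */
	assert(inode->data.nrpages == 0);

	/* Zero link count: the delete path reads metadata afterwards. */
	if (inode->nlink == 0)
		sim_add_folio(&inode->data);

	return inode->data.nrpages;
}

/* Hypothetical load-time check: refuse an inode with no links. */
static bool sim_metadata_inode_valid(const struct sim_inode *inode)
{
	return inode->nlink != 0;
}
```

With `nlink == 0`, `sim_final_iput()` returns 1, matching the `__destroy_inode` report above; with a non-zero link count it returns 0. Rejecting the inode up front with a check along the lines of `sim_metadata_inode_valid()` would keep the delete path from running at all in this model.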
Ryusuke Konishi June 17, 2024, 11:24 a.m. UTC | #9
On Mon, Jun 17, 2024 at 4:57 PM Jan Kara wrote:
>
> On Sun 16-06-24 10:39:51, Hillf Danton wrote:
> > On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> > >
> > > I suspect this would trigger:
> > >
> > > +++ b/fs/inode.c
> > > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> > >  void __destroy_inode(struct inode *inode)
> > >  {
> > >         BUG_ON(inode_has_buffers(inode));
> > > +       BUG_ON(inode->i_data.nrpages);
> > >         inode_detach_wb(inode);
> > >         security_inode_free(inode);
> > >         fsnotify_inode_delete(inode);
> > >
> > Yes, it was triggered [1]
> >
> > [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/
> >
> > and given trigger after nrpages is checked in clear_inode(),
> >
> >       iput(inode)
> >       evict(inode)
> >       truncate_inode_pages_final(&inode->i_data);
> >       clear_inode(inode);
> >       destroy_inode(inode);
> >
> > why is folio added to exiting mapping?
> >
> > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  83a7eefedc9b
>
> OK, so based on syzbot results this seems to be a bug in
> nilfs_evict_inode() (likely caused by corrupted filesystem so that root
> inode's link count was 0 and hence was getting deleted on iput()). I guess
> nilfs maintainers need to address these with more consistency checks of
> metadata when loading them...
>
>                                                                         Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Sorry for my late response.

Also, thank you for pointing out that the problem seems to be caused
via nilfs_evict_inode() by a missing consistency check of the link
count.

I'll check it out and think about how to deal with it.

Thanks,
Ryusuke Konishi

Patch

--- x/mm/truncate.c
+++ y/mm/truncate.c
@@ -419,6 +419,9 @@  void truncate_inode_pages_range(struct a
 		truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
 		folio_batch_release(&fbatch);
 	}
+
+	if (mapping_exiting(mapping))
+		lru_add_drain_all();
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);