Message ID | 20200309043430.143206-1-ebiggers@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread |
On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit(). That can confuse things. For example, if BSD process
> accounting is enabled and the accounting file has FS_SYNC_FL set and is
> located on an ext4 filesystem without a journal, then do_exit() ends up
> calling ext4_write_inode(). That triggers the
> WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> (appropriately) that inodes aren't written when allocating memory.
>
> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
>
> This can be reproduced as follows in the kvm-xfstests test appliance
> modified to add the 'acct' Debian package, and with kvm-xfstests's
> recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
>
>         mkfs.ext2 -F /dev/vdb
>         mount /vdb -t ext4
>         touch /vdb/file
>         chattr +S /vdb/file
>         accton /vdb/file
>         mkfs.xfs -f /dev/vdc
>         mount /vdc
>         umount /vdc
>
> It causes:
>         WARNING: CPU: 0 PID: 332 at fs/ext4/inode.c:5097 ext4_write_inode+0x140/0x1a0
>         CPU: 0 PID: 332 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #5
>         [...]
>         RIP: 0010:ext4_write_inode+0x140/0x1a0 fs/ext4/inode.c:5097
>         [...]
>         Call Trace:
>          write_inode fs/fs-writeback.c:1312 [inline]
>          __writeback_single_inode+0x465/0x5f0 fs/fs-writeback.c:1511
>          writeback_single_inode+0xad/0x120 fs/fs-writeback.c:1565
>          sync_inode fs/fs-writeback.c:2602 [inline]
>          sync_inode_metadata+0x3d/0x57 fs/fs-writeback.c:2622
>          ext4_fsync_nojournal fs/ext4/fsync.c:94 [inline]
>          ext4_sync_file+0x243/0x4b0 fs/ext4/fsync.c:172
>          generic_write_sync include/linux/fs.h:2867 [inline]
>          ext4_buffered_write_iter+0xe1/0x130 fs/ext4/file.c:277
>          call_write_iter include/linux/fs.h:1901 [inline]
>          new_sync_write+0x130/0x1d0 fs/read_write.c:483
>          __kernel_write+0x54/0xe0 fs/read_write.c:515
>          do_acct_process+0x122/0x170 kernel/acct.c:522
>          slow_acct_process kernel/acct.c:581 [inline]
>          acct_process+0x1d4/0x27c kernel/acct.c:607
>          do_exit+0x83d/0xbc0 kernel/exit.c:791
>          kthread+0xf1/0x140 kernel/kthread.c:257
>          ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
>
> This case was originally reported by syzbot at
> https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.
>
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>
> v2: include more details in the commit message.
>
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be8..3bc570c90ad97 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }
>
> --
> 2.25.1
>
On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit(). That can confuse things. For example, if BSD process
> accounting is enabled and the accounting file has FS_SYNC_FL set and is
> located on an ext4 filesystem without a journal, then do_exit() ends up
> calling ext4_write_inode(). That triggers the
> WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> (appropriately) that inodes aren't written when allocating memory.
>
> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
>
> This can be reproduced as follows in the kvm-xfstests test appliance
> modified to add the 'acct' Debian package, and with kvm-xfstests's
> recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
>
>         mkfs.ext2 -F /dev/vdb
>         mount /vdb -t ext4
>         touch /vdb/file
>         chattr +S /vdb/file

Does this trip if the process accounting file is also on an xfs
filesystem?

>         accton /vdb/file
>         mkfs.xfs -f /dev/vdc
>         mount /vdc
>         umount /vdc

...and if so, can this be turned into an fstests case, please?

>
> It causes:
>         WARNING: CPU: 0 PID: 332 at fs/ext4/inode.c:5097 ext4_write_inode+0x140/0x1a0
>         CPU: 0 PID: 332 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #5
>         [...]
>         RIP: 0010:ext4_write_inode+0x140/0x1a0 fs/ext4/inode.c:5097
>         [...]
>         Call Trace:
>          write_inode fs/fs-writeback.c:1312 [inline]
>          __writeback_single_inode+0x465/0x5f0 fs/fs-writeback.c:1511
>          writeback_single_inode+0xad/0x120 fs/fs-writeback.c:1565
>          sync_inode fs/fs-writeback.c:2602 [inline]
>          sync_inode_metadata+0x3d/0x57 fs/fs-writeback.c:2622
>          ext4_fsync_nojournal fs/ext4/fsync.c:94 [inline]
>          ext4_sync_file+0x243/0x4b0 fs/ext4/fsync.c:172
>          generic_write_sync include/linux/fs.h:2867 [inline]
>          ext4_buffered_write_iter+0xe1/0x130 fs/ext4/file.c:277
>          call_write_iter include/linux/fs.h:1901 [inline]
>          new_sync_write+0x130/0x1d0 fs/read_write.c:483
>          __kernel_write+0x54/0xe0 fs/read_write.c:515
>          do_acct_process+0x122/0x170 kernel/acct.c:522
>          slow_acct_process kernel/acct.c:581 [inline]
>          acct_process+0x1d4/0x27c kernel/acct.c:607
>          do_exit+0x83d/0xbc0 kernel/exit.c:791
>          kthread+0xf1/0x140 kernel/kthread.c:257
>          ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
>
> This case was originally reported by syzbot at
> https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.
>
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>
> v2: include more details in the commit message.
>
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be8..3bc570c90ad97 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }
>
> --
> 2.25.1
>
On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote:
> On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> > during do_exit(). That can confuse things. For example, if BSD process
> > accounting is enabled and the accounting file has FS_SYNC_FL set and is
> > located on an ext4 filesystem without a journal, then do_exit() ends up
> > calling ext4_write_inode(). That triggers the
> > WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> > (appropriately) that inodes aren't written when allocating memory.
> >
> > Fix this in xfsaild() by using the helper functions to save and restore
> > PF_MEMALLOC.
> >
> > This can be reproduced as follows in the kvm-xfstests test appliance
> > modified to add the 'acct' Debian package, and with kvm-xfstests's
> > recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> >
> >         mkfs.ext2 -F /dev/vdb
> >         mount /vdb -t ext4
> >         touch /vdb/file
> >         chattr +S /vdb/file
>
> Does this trip if the process accounting file is also on an xfs
> filesystem?
>
> >         accton /vdb/file
> >         mkfs.xfs -f /dev/vdc
> >         mount /vdc
> >         umount /vdc
>
> ...and if so, can this be turned into an fstests case, please?

I wasn't expecting it, but it turns out it does actually trip a similar warning
in iomap_do_writepage():

        mkfs.xfs -f /dev/vdb
        mount /vdb
        touch /vdb/file
        chattr +S /vdb/file
        accton /vdb/file
        mkfs.xfs -f /dev/vdc
        mount /vdc
        umount /vdc

causes...

WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
[...]
Call Trace:
 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
 generic_write_sync include/linux/fs.h:2867 [inline]
 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
 call_write_iter include/linux/fs.h:1901 [inline]
 new_sync_write+0x130/0x1d0 fs/read_write.c:483
 __kernel_write+0x54/0xe0 fs/read_write.c:515
 do_acct_process+0x122/0x170 kernel/acct.c:522
 slow_acct_process kernel/acct.c:581 [inline]
 acct_process+0x1d4/0x27c kernel/acct.c:607
 do_exit+0x83d/0xbc0 kernel/exit.c:791
 kthread+0xf1/0x140 kernel/kthread.c:257
 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352

So sure, since it's not necessarily a multi-filesystem thing, I can try to turn
it into an xfstest. There's currently no way to enable BSD process accounting
in xfstests though, so we'll either need to make the test depend on the 'acct'
program or add a helper test program.

Also, do you want me to update the commit message again, to mention the above
case?

- Eric
On Mon, Mar 09, 2020 at 11:04:52AM -0700, Eric Biggers wrote:
> On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote:
> > On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> > > From: Eric Biggers <ebiggers@google.com>
> > >
> > > Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> > > during do_exit(). That can confuse things. For example, if BSD process
> > > accounting is enabled and the accounting file has FS_SYNC_FL set and is
> > > located on an ext4 filesystem without a journal, then do_exit() ends up
> > > calling ext4_write_inode(). That triggers the
> > > WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> > > (appropriately) that inodes aren't written when allocating memory.
> > >
> > > Fix this in xfsaild() by using the helper functions to save and restore
> > > PF_MEMALLOC.
> > >
> > > This can be reproduced as follows in the kvm-xfstests test appliance
> > > modified to add the 'acct' Debian package, and with kvm-xfstests's
> > > recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> > >
> > >         mkfs.ext2 -F /dev/vdb
> > >         mount /vdb -t ext4
> > >         touch /vdb/file
> > >         chattr +S /vdb/file
> >
> > Does this trip if the process accounting file is also on an xfs
> > filesystem?
> >
> > >         accton /vdb/file
> > >         mkfs.xfs -f /dev/vdc
> > >         mount /vdc
> > >         umount /vdc
> >
> > ...and if so, can this be turned into an fstests case, please?
>
> I wasn't expecting it, but it turns out it does actually trip a similar warning
> in iomap_do_writepage():
>
>         mkfs.xfs -f /dev/vdb
>         mount /vdb
>         touch /vdb/file
>         chattr +S /vdb/file
>         accton /vdb/file
>         mkfs.xfs -f /dev/vdc
>         mount /vdc
>         umount /vdc
>
> causes...
>
> WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
> CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
> RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
> [...]
> Call Trace:
>  write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
>  iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
>  xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
>  do_writepages+0x41/0xe0 mm/page-writeback.c:2344
>  __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
>  file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
>  xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
>  generic_write_sync include/linux/fs.h:2867 [inline]
>  xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
>  call_write_iter include/linux/fs.h:1901 [inline]
>  new_sync_write+0x130/0x1d0 fs/read_write.c:483
>  __kernel_write+0x54/0xe0 fs/read_write.c:515
>  do_acct_process+0x122/0x170 kernel/acct.c:522
>  slow_acct_process kernel/acct.c:581 [inline]
>  acct_process+0x1d4/0x27c kernel/acct.c:607
>  do_exit+0x83d/0xbc0 kernel/exit.c:791
>  kthread+0xf1/0x140 kernel/kthread.c:257
>  ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
>
> So sure, since it's not necessarily a multi-filesystem thing, I can try to turn
> it into an xfstest. There's currently no way to enable BSD process accounting
> in xfstests though, so we'll either need to make the test depend on the 'acct'
> program or add a helper test program.
>
> Also, do you want me to update the commit message again, to mention the above
> case?

I think it's worth mentioning that this is a general problem that
applies any time the process accounting file has that sync flag set,
since this problem isn't specific to ext4 + xfs.

(and now I wonder how many other places in the kernel suffer from these
kinds of file write surprises...)

--D

> - Eric
On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote:
> >
> >         mkfs.ext2 -F /dev/vdb
> >         mount /vdb -t ext4
> >         touch /vdb/file
> >         chattr +S /vdb/file
>
> Does this trip if the process accounting file is also on an xfs
> filesystem?
>
> >         accton /vdb/file
> >         mkfs.xfs -f /dev/vdc
> >         mount /vdc
> >         umount /vdc
>
> ...and if so, can this be turned into an fstests case, please?
>

Test sent out at https://lkml.kernel.org/fstests/20200312221437.141484-1-ebiggers@kernel.org/

- Eric
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00cc5b8734be8..3bc570c90ad97 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -529,8 +529,9 @@ xfsaild(
 {
 	struct xfs_ail	*ailp = data;
 	long		tout = 0;	/* milliseconds */
+	unsigned int	noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	set_freezable();
 
 	while (1) {
@@ -601,6 +602,7 @@ xfsaild(
 		tout = xfsaild_push(ailp);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	return 0;
 }
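
For readers unfamiliar with the save/restore pattern the patch switches to, here is a rough userspace sketch of the idea. It is not kernel code: struct task_model, model_noreclaim_save(), model_noreclaim_restore() and memalloc_set_at_exit() are hypothetical names made up for illustration, and the PF_MEMALLOC value below is assumed for the model. It only tries to show why "set the flag and never clear it" leaves PF_MEMALLOC visible on the exit path, while save/restore does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, used only by this model; see the kernel headers for the real one. */
#define PF_MEMALLOC 0x00000800u

/* Hypothetical stand-in for the per-task flags word (current->flags). */
struct task_model {
	unsigned int flags;
};

/* Set PF_MEMALLOC and return its previous state so it can be restored later. */
static unsigned int model_noreclaim_save(struct task_model *t)
{
	unsigned int old = t->flags & PF_MEMALLOC;

	t->flags |= PF_MEMALLOC;
	return old;
}

/* Put PF_MEMALLOC back to the state captured by model_noreclaim_save(). */
static void model_noreclaim_restore(struct task_model *t, unsigned int old)
{
	/* Only the PF_MEMALLOC bit may be carried in the saved value. */
	assert((old & ~PF_MEMALLOC) == 0);
	t->flags = (t->flags & ~PF_MEMALLOC) | old;
}

/*
 * Model a kthread body and report whether PF_MEMALLOC would still be set
 * when the exit path (do_exit() in the real kernel) runs afterwards.
 */
static bool memalloc_set_at_exit(bool use_helpers)
{
	struct task_model t = { .flags = 0 };
	unsigned int saved = 0;

	if (use_helpers)
		saved = model_noreclaim_save(&t);
	else
		t.flags |= PF_MEMALLOC;	/* pre-patch behaviour: set, never cleared */

	/* ... the thread's main loop would run here ... */

	if (use_helpers)
		model_noreclaim_restore(&t, saved);

	return (t.flags & PF_MEMALLOC) != 0;
}
```

In this model, memalloc_set_at_exit(false) corresponds to the pre-patch xfsaild() and memalloc_set_at_exit(true) to the patched one. Restoring the saved bit, rather than clearing it unconditionally, also keeps the flag intact if a caller had already set it before the save.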