Message ID | 447c5374eeee4ad7abb5320602be92bf5748c04c.1690816368.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: set_page_extent_mapped after read_folio in relocate_one_page | expand |
On Mon, Jul 31, 2023 at 4:43 PM Josef Bacik <josef@toxicpanda.com> wrote: > > One of the CI runs triggered the following panic > > assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229 > ------------[ cut here ]------------ > kernel BUG at fs/btrfs/subpage.c:229! > Internal error: Oops - BUG: 00000000f2000800 [#1] SMP > CPU: 0 PID: 923660 Comm: btrfs Not tainted 6.5.0-rc3+ #1 > pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > pc : btrfs_subpage_assert+0xbc/0xf0 > lr : btrfs_subpage_assert+0xbc/0xf0 > sp : ffff800093213720 > x29: ffff800093213720 x28: ffff8000932138b4 x27: 000000000c280000 > x26: 00000001b5d00000 x25: 000000000c281000 x24: 000000000c281fff > x23: 0000000000001000 x22: 0000000000000000 x21: ffffff42b95bf880 > x20: ffff42b9528e0000 x19: 0000000000001000 x18: ffffffffffffffff > x17: 667274622f736620 x16: 6e69202c65746176 x15: 0000000000000028 > x14: 0000000000000003 x13: 00000000002672d7 x12: 0000000000000000 > x11: ffffcd3f0ccd9204 x10: ffffcd3f0554ae50 x9 : ffffcd3f0379528c > x8 : ffff800093213428 x7 : 0000000000000000 x6 : ffffcd3f091771e8 > x5 : ffff42b97f333948 x4 : 0000000000000000 x3 : 0000000000000000 > x2 : 0000000000000000 x1 : ffff42b9556cde80 x0 : 000000000000004f > Call trace: > btrfs_subpage_assert+0xbc/0xf0 > btrfs_subpage_set_dirty+0x38/0xa0 > btrfs_page_set_dirty+0x58/0x88 > relocate_one_page+0x204/0x5f0 > relocate_file_extent_cluster+0x11c/0x180 > relocate_data_extent+0xd0/0xf8 > relocate_block_group+0x3d0/0x4e8 > btrfs_relocate_block_group+0x2d8/0x490 > btrfs_relocate_chunk+0x54/0x1a8 > btrfs_balance+0x7f4/0x1150 > btrfs_ioctl+0x10f0/0x20b8 > __arm64_sys_ioctl+0x120/0x11d8 > invoke_syscall.constprop.0+0x80/0xd8 > do_el0_svc+0x6c/0x158 > el0_svc+0x50/0x1b0 > el0t_64_sync_handler+0x120/0x130 > el0t_64_sync+0x194/0x198 > Code: 91098021 b0007fa0 91346000 97e9c6d2 (d4210000) > > This is the same problem outlined in "btrfs: set_page_extent_mapped > after read_folio in btrfs_cont_expand", and the fix is the same. Btw, that patch is already in Linus' tree, so it's convenient to use the git commit here as well (17b17fcd6d44). > I originally looked for the same pattern elsewhere in our code, but > mistakenly skipped over this code because I saw the page cache readahead > before we set_page_extent_mapped, not realizing that this was only in > the !page case, that we can still end up with a !uptodate page and then > do the btrfs_read_folio further down. > > The fix here is the same as the above mentioned patch, move the > set_page_extent_mapped call to after the btrfs_read_folio() block to > make sure that we have the subpage blocksize stuff setup properly before > using the page. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Anyway, it looks good to me, so: Reviewed-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/relocation.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 9db2e6fa2cb2..e01819f8de5b 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -2977,9 +2977,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, > if (!page) > return -ENOMEM; > } > - ret = set_page_extent_mapped(page); > - if (ret < 0) > - goto release_page; > > if (PageReadahead(page)) > page_cache_async_readahead(inode->i_mapping, ra, NULL, > @@ -2995,6 +2992,15 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, > } > } > > + /* > + * We could have lost page private when we dropped the lock to read the > + * page above, make sure we set_page_extent_mapped here so we have any > + * of the subpage blocksize stuff we need in place. > + */ > + ret = set_page_extent_mapped(page); > + if (ret < 0) > + goto release_page; > + > page_start = page_offset(page); > page_end = page_start + PAGE_SIZE - 1; > > -- > 2.41.0 >
On Mon, Jul 31, 2023 at 11:13:00AM -0400, Josef Bacik wrote: > One of the CI runs triggered the following panic > > assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229 > ------------[ cut here ]------------ > kernel BUG at fs/btrfs/subpage.c:229! > Internal error: Oops - BUG: 00000000f2000800 [#1] SMP > CPU: 0 PID: 923660 Comm: btrfs Not tainted 6.5.0-rc3+ #1 > pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > pc : btrfs_subpage_assert+0xbc/0xf0 > lr : btrfs_subpage_assert+0xbc/0xf0 > sp : ffff800093213720 > x29: ffff800093213720 x28: ffff8000932138b4 x27: 000000000c280000 > x26: 00000001b5d00000 x25: 000000000c281000 x24: 000000000c281fff > x23: 0000000000001000 x22: 0000000000000000 x21: ffffff42b95bf880 > x20: ffff42b9528e0000 x19: 0000000000001000 x18: ffffffffffffffff > x17: 667274622f736620 x16: 6e69202c65746176 x15: 0000000000000028 > x14: 0000000000000003 x13: 00000000002672d7 x12: 0000000000000000 > x11: ffffcd3f0ccd9204 x10: ffffcd3f0554ae50 x9 : ffffcd3f0379528c > x8 : ffff800093213428 x7 : 0000000000000000 x6 : ffffcd3f091771e8 > x5 : ffff42b97f333948 x4 : 0000000000000000 x3 : 0000000000000000 > x2 : 0000000000000000 x1 : ffff42b9556cde80 x0 : 000000000000004f > Call trace: > btrfs_subpage_assert+0xbc/0xf0 > btrfs_subpage_set_dirty+0x38/0xa0 > btrfs_page_set_dirty+0x58/0x88 > relocate_one_page+0x204/0x5f0 > relocate_file_extent_cluster+0x11c/0x180 > relocate_data_extent+0xd0/0xf8 > relocate_block_group+0x3d0/0x4e8 > btrfs_relocate_block_group+0x2d8/0x490 > btrfs_relocate_chunk+0x54/0x1a8 > btrfs_balance+0x7f4/0x1150 > btrfs_ioctl+0x10f0/0x20b8 > __arm64_sys_ioctl+0x120/0x11d8 > invoke_syscall.constprop.0+0x80/0xd8 > do_el0_svc+0x6c/0x158 > el0_svc+0x50/0x1b0 > el0t_64_sync_handler+0x120/0x130 > el0t_64_sync+0x194/0x198 > Code: 91098021 b0007fa0 91346000 97e9c6d2 (d4210000) > > This is the same problem outlined in "btrfs: set_page_extent_mapped > after read_folio in btrfs_cont_expand", and the fix is the same. I > originally looked for the same pattern elsewhere in our code, but > mistakenly skipped over this code because I saw the page cache readahead > before we set_page_extent_mapped, not realizing that this was only in > the !page case, that we can still end up with a !uptodate page and then > do the btrfs_read_folio further down. > > The fix here is the same as the above mentioned patch, move the > set_page_extent_mapped call to after the btrfs_read_folio() block to > make sure that we have the subpage blocksize stuff setup properly before > using the page. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> With the updated patch referece added to misc-next, thanks.
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9db2e6fa2cb2..e01819f8de5b 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2977,9 +2977,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, if (!page) return -ENOMEM; } - ret = set_page_extent_mapped(page); - if (ret < 0) - goto release_page; if (PageReadahead(page)) page_cache_async_readahead(inode->i_mapping, ra, NULL, @@ -2995,6 +2992,15 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, } } + /* + * We could have lost page private when we dropped the lock to read the + * page above, make sure we set_page_extent_mapped here so we have any + * of the subpage blocksize stuff we need in place. + */ + ret = set_page_extent_mapped(page); + if (ret < 0) + goto release_page; + page_start = page_offset(page); page_end = page_start + PAGE_SIZE - 1;
One of the CI runs triggered the following panic assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229 ------------[ cut here ]------------ kernel BUG at fs/btrfs/subpage.c:229! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 PID: 923660 Comm: btrfs Not tainted 6.5.0-rc3+ #1 pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : btrfs_subpage_assert+0xbc/0xf0 lr : btrfs_subpage_assert+0xbc/0xf0 sp : ffff800093213720 x29: ffff800093213720 x28: ffff8000932138b4 x27: 000000000c280000 x26: 00000001b5d00000 x25: 000000000c281000 x24: 000000000c281fff x23: 0000000000001000 x22: 0000000000000000 x21: ffffff42b95bf880 x20: ffff42b9528e0000 x19: 0000000000001000 x18: ffffffffffffffff x17: 667274622f736620 x16: 6e69202c65746176 x15: 0000000000000028 x14: 0000000000000003 x13: 00000000002672d7 x12: 0000000000000000 x11: ffffcd3f0ccd9204 x10: ffffcd3f0554ae50 x9 : ffffcd3f0379528c x8 : ffff800093213428 x7 : 0000000000000000 x6 : ffffcd3f091771e8 x5 : ffff42b97f333948 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff42b9556cde80 x0 : 000000000000004f Call trace: btrfs_subpage_assert+0xbc/0xf0 btrfs_subpage_set_dirty+0x38/0xa0 btrfs_page_set_dirty+0x58/0x88 relocate_one_page+0x204/0x5f0 relocate_file_extent_cluster+0x11c/0x180 relocate_data_extent+0xd0/0xf8 relocate_block_group+0x3d0/0x4e8 btrfs_relocate_block_group+0x2d8/0x490 btrfs_relocate_chunk+0x54/0x1a8 btrfs_balance+0x7f4/0x1150 btrfs_ioctl+0x10f0/0x20b8 __arm64_sys_ioctl+0x120/0x11d8 invoke_syscall.constprop.0+0x80/0xd8 do_el0_svc+0x6c/0x158 el0_svc+0x50/0x1b0 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Code: 91098021 b0007fa0 91346000 97e9c6d2 (d4210000) This is the same problem outlined in "btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand", and the fix is the same. I originally looked for the same pattern elsewhere in our code, but mistakenly skipped over this code because I saw the page cache readahead before we set_page_extent_mapped, not realizing that this was only in the !page case, that we can still end up with a !uptodate page and then do the btrfs_read_folio further down. The fix here is the same as the above mentioned patch, move the set_page_extent_mapped call to after the btrfs_read_folio() block to make sure that we have the subpage blocksize stuff setup properly before using the page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/relocation.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)