Message ID | cd4159ae5d32fdb87deba4bf6485819614016c11.1685088405.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix a crash in metadata repair path | expand |
On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote: > struct btrfs_fs_info *fs_info = eb->fs_info; > - u64 start = eb->start; > int i, num_pages = num_extent_pages(eb); > int ret = 0; > > @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, > > for (i = 0; i < num_pages; i++) { > struct page *p = eb->pages[i]; > + u64 start = max_t(u64, eb->start, page_offset(p)); > + u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE); > + u32 len = end - start; > > - ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, > - start, p, start - page_offset(p), mirror_num); > + ret = btrfs_repair_io_failure(fs_info, 0, start, len, > + start, p, offset_in_page(start), mirror_num); > if (ret) > break; > - start += PAGE_SIZE; I actually just noticed this a week or so ago, but didn't have a reproducer. My fix does this a little differeny by adding a branch for nodesize < PAGE_SIZE similar to write_one_eb or read_extent_buffer_pages, which feels a bit simpler and easier to read to me: diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 48ac9fccca06ae..3d498d08e61b65 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -182,11 +182,19 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, if (sb_rdonly(fs_info->sb)) return -EROFS; + if (fs_info->nodesize < PAGE_SIZE) { + struct page *p = eb->pages[0]; + + return btrfs_repair_io_failure(fs_info, 0, start, eb->len, + start, p, start - page_offset(p), + mirror_num); + } + for (i = 0; i < num_pages; i++) { struct page *p = eb->pages[i]; ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, - start, p, start - page_offset(p), mirror_num); + start, p, 0, mirror_num); if (ret) break; start += PAGE_SIZE;
On 2023/5/26 21:42, Christoph Hellwig wrote: > On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote: >> struct btrfs_fs_info *fs_info = eb->fs_info; >> - u64 start = eb->start; >> int i, num_pages = num_extent_pages(eb); >> int ret = 0; >> >> @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, >> >> for (i = 0; i < num_pages; i++) { >> struct page *p = eb->pages[i]; >> + u64 start = max_t(u64, eb->start, page_offset(p)); >> + u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE); >> + u32 len = end - start; >> >> - ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, >> - start, p, start - page_offset(p), mirror_num); >> + ret = btrfs_repair_io_failure(fs_info, 0, start, len, >> + start, p, offset_in_page(start), mirror_num); >> if (ret) >> break; >> - start += PAGE_SIZE; > > I actually just noticed this a week or so ago, but didn't have a > reproducer. My fix does this a little differeny by adding a branch > for nodesize < PAGE_SIZE similar to write_one_eb or > read_extent_buffer_pages, which feels a bit simpler and easier to read > to me: Sometimes the extra calculation is going to damage the readability indeed. But it's more like a preference, I'll leave David to do the final call. Thanks, Qu > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 48ac9fccca06ae..3d498d08e61b65 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -182,11 +182,19 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, > if (sb_rdonly(fs_info->sb)) > return -EROFS; > > + if (fs_info->nodesize < PAGE_SIZE) { > + struct page *p = eb->pages[0]; > + > + return btrfs_repair_io_failure(fs_info, 0, start, eb->len, > + start, p, start - page_offset(p), > + mirror_num); > + } > + > for (i = 0; i < num_pages; i++) { > struct page *p = eb->pages[i]; > > ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, > - start, p, start - page_offset(p), mirror_num); > + start, p, 0, mirror_num); > if (ret) > break; > start += PAGE_SIZE;
Hi David, I still don't see this fix merged for misc-next, any update on this? Thanks, Qu On 2023/5/26 20:30, Qu Wenruo wrote: > [BUG] > Test case btrfs/027 would crash with subpage (64K page size, 4K > sectorsize) with the following dying messages: > > debug: map_length=16384 length=65536 type=metadata|raid6(0x104) > assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093 > ------------[ cut here ]------------ > kernel BUG at fs/btrfs/messages.c:259! > Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 > Call trace: > btrfs_assertfail+0x28/0x2c [btrfs] > btrfs_map_repair_block+0x150/0x2b8 [btrfs] > btrfs_repair_io_failure+0xd4/0x31c [btrfs] > btrfs_read_extent_buffer+0x150/0x16c [btrfs] > read_tree_block+0x38/0xbc [btrfs] > read_tree_root_path+0xfc/0x1bc [btrfs] > btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs] > open_ctree+0xa30/0x172c [btrfs] > btrfs_mount_root+0x3c4/0x4a4 [btrfs] > legacy_get_tree+0x30/0x60 > vfs_get_tree+0x28/0xec > vfs_kern_mount.part.0+0x90/0xd4 > vfs_kern_mount+0x14/0x28 > btrfs_mount+0x114/0x418 [btrfs] > legacy_get_tree+0x30/0x60 > vfs_get_tree+0x28/0xec > path_mount+0x3e0/0xb64 > __arm64_sys_mount+0x200/0x2d8 > invoke_syscall+0x48/0x114 > el0_svc_common.constprop.0+0x60/0x11c > do_el0_svc+0x38/0x98 > el0_svc+0x40/0xa8 > el0t_64_sync_handler+0xf4/0x120 > el0t_64_sync+0x190/0x194 > Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000) > > [CAUSE] > In btrfs/027 we test RAID6 with missing devices, in this particular > case, we're repairing a metadata at the end of a data stripe. > > But at btrfs_repair_io_failure(), we always pass a full PAGE for repair, > and for subpage case this can cross stripe boundary and lead to the > above BUG_ON(). > > This metadata repair code is always there, since the introduction of > subpage support, but this can trigger BUG_ON() after the bio split > ability at btrfs_map_bio(). > > [FIX] > Instead of passing the old PAGE_SIZE, we calculate the correct length > based on the eb size and page size for both regular and subpage cases. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index c461a46ac6f2..65ac7b95d3a4 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -176,7 +176,6 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, > int mirror_num) > { > struct btrfs_fs_info *fs_info = eb->fs_info; > - u64 start = eb->start; > int i, num_pages = num_extent_pages(eb); > int ret = 0; > > @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, > > for (i = 0; i < num_pages; i++) { > struct page *p = eb->pages[i]; > + u64 start = max_t(u64, eb->start, page_offset(p)); > + u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE); > + u32 len = end - start; > > - ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, > - start, p, start - page_offset(p), mirror_num); > + ret = btrfs_repair_io_failure(fs_info, 0, start, len, > + start, p, offset_in_page(start), mirror_num); > if (ret) > break; > - start += PAGE_SIZE; > } > > return ret;
On Fri, Jun 02, 2023 at 03:15:39PM +0800, Qu Wenruo wrote: > Hi David, > > I still don't see this fix merged for misc-next, any update on this? FYI, while I prefer the explicit version, this one definitively fixes the bug, so: Reviewed-by: Christoph Hellwig <hch@lst.de>
On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote: > [BUG] > Test case btrfs/027 would crash with subpage (64K page size, 4K > sectorsize) with the following dying messages: > > debug: map_length=16384 length=65536 type=metadata|raid6(0x104) > assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093 > ------------[ cut here ]------------ > kernel BUG at fs/btrfs/messages.c:259! > Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 > Call trace: > btrfs_assertfail+0x28/0x2c [btrfs] > btrfs_map_repair_block+0x150/0x2b8 [btrfs] > btrfs_repair_io_failure+0xd4/0x31c [btrfs] > btrfs_read_extent_buffer+0x150/0x16c [btrfs] > read_tree_block+0x38/0xbc [btrfs] > read_tree_root_path+0xfc/0x1bc [btrfs] > btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs] > open_ctree+0xa30/0x172c [btrfs] > btrfs_mount_root+0x3c4/0x4a4 [btrfs] > legacy_get_tree+0x30/0x60 > vfs_get_tree+0x28/0xec > vfs_kern_mount.part.0+0x90/0xd4 > vfs_kern_mount+0x14/0x28 > btrfs_mount+0x114/0x418 [btrfs] > legacy_get_tree+0x30/0x60 > vfs_get_tree+0x28/0xec > path_mount+0x3e0/0xb64 > __arm64_sys_mount+0x200/0x2d8 > invoke_syscall+0x48/0x114 > el0_svc_common.constprop.0+0x60/0x11c > do_el0_svc+0x38/0x98 > el0_svc+0x40/0xa8 > el0t_64_sync_handler+0xf4/0x120 > el0t_64_sync+0x190/0x194 > Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000) > > [CAUSE] > In btrfs/027 we test RAID6 with missing devices, in this particular > case, we're repairing a metadata at the end of a data stripe. > > But at btrfs_repair_io_failure(), we always pass a full PAGE for repair, > and for subpage case this can cross stripe boundary and lead to the > above BUG_ON(). > > This metadata repair code is always there, since the introduction of > subpage support, but this can trigger BUG_ON() after the bio split > ability at btrfs_map_bio(). > > [FIX] > Instead of passing the old PAGE_SIZE, we calculate the correct length > based on the eb size and page size for both regular and subpage cases. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Added to misc-next, thanks.
On Sat, May 27, 2023 at 06:28:40AM +0800, Qu Wenruo wrote: > > > On 2023/5/26 21:42, Christoph Hellwig wrote: > > On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote: > >> struct btrfs_fs_info *fs_info = eb->fs_info; > >> - u64 start = eb->start; > >> int i, num_pages = num_extent_pages(eb); > >> int ret = 0; > >> > >> @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, > >> > >> for (i = 0; i < num_pages; i++) { > >> struct page *p = eb->pages[i]; > >> + u64 start = max_t(u64, eb->start, page_offset(p)); > >> + u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE); > >> + u32 len = end - start; > >> > >> - ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, > >> - start, p, start - page_offset(p), mirror_num); > >> + ret = btrfs_repair_io_failure(fs_info, 0, start, len, > >> + start, p, offset_in_page(start), mirror_num); > >> if (ret) > >> break; > >> - start += PAGE_SIZE; > > > > I actually just noticed this a week or so ago, but didn't have a > > reproducer. My fix does this a little differeny by adding a branch > > for nodesize < PAGE_SIZE similar to write_one_eb or > > read_extent_buffer_pages, which feels a bit simpler and easier to read > > to me: > > Sometimes the extra calculation is going to damage the readability indeed. I prefer your version as it has only one call to btrfs_repair_io_failure which takes several parameters. The 'iterate/prepare arguments/call' is a pattern I think we've been using often. The separate checks for subpage were ment as an intermediate step so we don't touch the non-subpage code but the goal is to merge them, meaning the logic would be pushed to 'prepare arguments'.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c461a46ac6f2..65ac7b95d3a4 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -176,7 +176,6 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num) { struct btrfs_fs_info *fs_info = eb->fs_info; - u64 start = eb->start; int i, num_pages = num_extent_pages(eb); int ret = 0; @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, for (i = 0; i < num_pages; i++) { struct page *p = eb->pages[i]; + u64 start = max_t(u64, eb->start, page_offset(p)); + u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE); + u32 len = end - start; - ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE, - start, p, start - page_offset(p), mirror_num); + ret = btrfs_repair_io_failure(fs_info, 0, start, len, + start, p, offset_in_page(start), mirror_num); if (ret) break; - start += PAGE_SIZE; } return ret;
[BUG] Test case btrfs/027 would crash with subpage (64K page size, 4K sectorsize) with the following dying messages: debug: map_length=16384 length=65536 type=metadata|raid6(0x104) assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093 ------------[ cut here ]------------ kernel BUG at fs/btrfs/messages.c:259! Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_assertfail+0x28/0x2c [btrfs] btrfs_map_repair_block+0x150/0x2b8 [btrfs] btrfs_repair_io_failure+0xd4/0x31c [btrfs] btrfs_read_extent_buffer+0x150/0x16c [btrfs] read_tree_block+0x38/0xbc [btrfs] read_tree_root_path+0xfc/0x1bc [btrfs] btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs] open_ctree+0xa30/0x172c [btrfs] btrfs_mount_root+0x3c4/0x4a4 [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xec vfs_kern_mount.part.0+0x90/0xd4 vfs_kern_mount+0x14/0x28 btrfs_mount+0x114/0x418 [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xec path_mount+0x3e0/0xb64 __arm64_sys_mount+0x200/0x2d8 invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x60/0x11c do_el0_svc+0x38/0x98 el0_svc+0x40/0xa8 el0t_64_sync_handler+0xf4/0x120 el0t_64_sync+0x190/0x194 Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000) [CAUSE] In btrfs/027 we test RAID6 with missing devices, in this particular case, we're repairing a metadata at the end of a data stripe. But at btrfs_repair_io_failure(), we always pass a full PAGE for repair, and for subpage case this can cross stripe boundary and lead to the above BUG_ON(). This metadata repair code is always there, since the introduction of subpage support, but this can trigger BUG_ON() after the bio split ability at btrfs_map_bio(). [FIX] Instead of passing the old PAGE_SIZE, we calculate the correct length based on the eb size and page size for both regular and subpage cases. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)