
btrfs: fix a crash in metadata repair path

Message ID cd4159ae5d32fdb87deba4bf6485819614016c11.1685088405.git.wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: fix a crash in metadata repair path

Commit Message

Qu Wenruo May 26, 2023, 12:30 p.m. UTC
[BUG]
Test case btrfs/027 would crash with subpage (64K page size, 4K
sectorsize) with the following dying messages:

 debug: map_length=16384 length=65536 type=metadata|raid6(0x104)
 assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/messages.c:259!
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  btrfs_assertfail+0x28/0x2c [btrfs]
  btrfs_map_repair_block+0x150/0x2b8 [btrfs]
  btrfs_repair_io_failure+0xd4/0x31c [btrfs]
  btrfs_read_extent_buffer+0x150/0x16c [btrfs]
  read_tree_block+0x38/0xbc [btrfs]
  read_tree_root_path+0xfc/0x1bc [btrfs]
  btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs]
  open_ctree+0xa30/0x172c [btrfs]
  btrfs_mount_root+0x3c4/0x4a4 [btrfs]
  legacy_get_tree+0x30/0x60
  vfs_get_tree+0x28/0xec
  vfs_kern_mount.part.0+0x90/0xd4
  vfs_kern_mount+0x14/0x28
  btrfs_mount+0x114/0x418 [btrfs]
  legacy_get_tree+0x30/0x60
  vfs_get_tree+0x28/0xec
  path_mount+0x3e0/0xb64
  __arm64_sys_mount+0x200/0x2d8
  invoke_syscall+0x48/0x114
  el0_svc_common.constprop.0+0x60/0x11c
  do_el0_svc+0x38/0x98
  el0_svc+0x40/0xa8
  el0t_64_sync_handler+0xf4/0x120
  el0t_64_sync+0x190/0x194
 Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000)

[CAUSE]
In btrfs/027 we test RAID6 with missing devices, and in this particular
case we're repairing a metadata block at the end of a data stripe.

But at btrfs_repair_io_failure(), we always pass a full PAGE for repair,
and for the subpage case this can cross the stripe boundary and lead to
the above BUG_ON().

This metadata repair code has been there since the introduction of
subpage support, but it can only trigger the BUG_ON() after the bio split
ability was added at btrfs_map_bio().

[FIX]
Instead of passing the fixed PAGE_SIZE, we calculate the correct length
based on the eb size and page size for both regular and subpage cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
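
To make the crash numbers concrete: the old code always asked
btrfs_repair_io_failure() for PAGE_SIZE (65536) bytes starting at eb->start,
while the debug line shows the mapping at the end of the RAID6 data stripe
could cover only 16384 bytes, hence the map_length < length assertion.
Below is a minimal standalone sketch of the per-page clamping the fix
introduces; the page offset and the 16K nodesize are assumptions for
illustration only, not btrfs code or values taken from the test:

/*
 * Standalone sketch (not btrfs code): the repair range for each page is
 * the intersection of the extent buffer range [eb_start, eb_start + eb_len)
 * with the page range [page_off, page_off + PAGE_SZ).
 */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SZ  (64 * 1024ULL)  /* 64K page size, as in the report */

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

int main(void)
{
	/* Hypothetical 16K extent buffer inside a 64K page. */
	uint64_t page_off = 0x30000000;
	uint64_t eb_start = page_off + 48 * 1024;
	uint64_t eb_len   = 16 * 1024;

	/* Same clamping as the fix. */
	uint64_t start = max_u64(eb_start, page_off);
	uint64_t end   = min_u64(eb_start + eb_len, page_off + PAGE_SZ);
	uint32_t len   = (uint32_t)(end - start);

	/*
	 * The old code always submitted PAGE_SZ (64K) starting at eb_start,
	 * which in the report ran past the remaining 16K of the data stripe;
	 * the clamped length stays within the extent buffer.
	 */
	printf("repair len=%u (old code passed %llu)\n",
	       len, (unsigned long long)PAGE_SZ);
	return 0;
}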

Comments

Christoph Hellwig May 26, 2023, 1:42 p.m. UTC | #1
On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote:
>  	struct btrfs_fs_info *fs_info = eb->fs_info;
> -	u64 start = eb->start;
>  	int i, num_pages = num_extent_pages(eb);
>  	int ret = 0;
>  
> @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
>  
>  	for (i = 0; i < num_pages; i++) {
>  		struct page *p = eb->pages[i];
> +		u64 start = max_t(u64, eb->start, page_offset(p));
> +		u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
> +		u32 len = end - start;
>  
> -		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
> -				start, p, start - page_offset(p), mirror_num);
> +		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
> +				start, p, offset_in_page(start), mirror_num);
>  		if (ret)
>  			break;
> -		start += PAGE_SIZE;

I actually just noticed this a week or so ago, but didn't have a
reproducer.  My fix does this a little differently by adding a branch
for nodesize < PAGE_SIZE, similar to write_one_eb or
read_extent_buffer_pages, which feels a bit simpler and easier to read
to me:


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 48ac9fccca06ae..3d498d08e61b65 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -182,11 +182,19 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
 	if (sb_rdonly(fs_info->sb))
 		return -EROFS;
 
+	if (fs_info->nodesize < PAGE_SIZE) {
+		struct page *p = eb->pages[0];
+
+		return btrfs_repair_io_failure(fs_info, 0, start, eb->len,
+					       start, p, start - page_offset(p),
+					       mirror_num);
+	}
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
 		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
-				start, p, start - page_offset(p), mirror_num);
+					      start, p, 0, mirror_num);
 		if (ret)
 			break;
 		start += PAGE_SIZE;
Qu Wenruo May 26, 2023, 10:28 p.m. UTC | #2
On 2023/5/26 21:42, Christoph Hellwig wrote:
> On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote:
>>   	struct btrfs_fs_info *fs_info = eb->fs_info;
>> -	u64 start = eb->start;
>>   	int i, num_pages = num_extent_pages(eb);
>>   	int ret = 0;
>>
>> @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
>>
>>   	for (i = 0; i < num_pages; i++) {
>>   		struct page *p = eb->pages[i];
>> +		u64 start = max_t(u64, eb->start, page_offset(p));
>> +		u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
>> +		u32 len = end - start;
>>
>> -		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
>> -				start, p, start - page_offset(p), mirror_num);
>> +		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
>> +				start, p, offset_in_page(start), mirror_num);
>>   		if (ret)
>>   			break;
>> -		start += PAGE_SIZE;
>
> I actually just noticed this a week or so ago, but didn't have a
> reproducer.  My fix does this a little differently by adding a branch
> for nodesize < PAGE_SIZE, similar to write_one_eb or
> read_extent_buffer_pages, which feels a bit simpler and easier to read
> to me:

Indeed, sometimes the extra calculation hurts readability.

But it's more of a preference; I'll leave the final call to David.

Thanks,
Qu

>
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 48ac9fccca06ae..3d498d08e61b65 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -182,11 +182,19 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
>   	if (sb_rdonly(fs_info->sb))
>   		return -EROFS;
>
> +	if (fs_info->nodesize < PAGE_SIZE) {
> +		struct page *p = eb->pages[0];
> +
> +		return btrfs_repair_io_failure(fs_info, 0, start, eb->len,
> +					       start, p, start - page_offset(p),
> +					       mirror_num);
> +	}
> +
>   	for (i = 0; i < num_pages; i++) {
>   		struct page *p = eb->pages[i];
>
>   		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
> -				start, p, start - page_offset(p), mirror_num);
> +					      start, p, 0, mirror_num);
>   		if (ret)
>   			break;
>   		start += PAGE_SIZE;
Qu Wenruo June 2, 2023, 7:15 a.m. UTC | #3
Hi David,

I still don't see this fix merged into misc-next; any update on this?

Thanks,
Qu

On 2023/5/26 20:30, Qu Wenruo wrote:
> [BUG]
> Test case btrfs/027 would crash with subpage (64K page size, 4K
> sectorsize) with the following dying messages:
>
>   debug: map_length=16384 length=65536 type=metadata|raid6(0x104)
>   assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093
>   ------------[ cut here ]------------
>   kernel BUG at fs/btrfs/messages.c:259!
>   Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>   Call trace:
>    btrfs_assertfail+0x28/0x2c [btrfs]
>    btrfs_map_repair_block+0x150/0x2b8 [btrfs]
>    btrfs_repair_io_failure+0xd4/0x31c [btrfs]
>    btrfs_read_extent_buffer+0x150/0x16c [btrfs]
>    read_tree_block+0x38/0xbc [btrfs]
>    read_tree_root_path+0xfc/0x1bc [btrfs]
>    btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs]
>    open_ctree+0xa30/0x172c [btrfs]
>    btrfs_mount_root+0x3c4/0x4a4 [btrfs]
>    legacy_get_tree+0x30/0x60
>    vfs_get_tree+0x28/0xec
>    vfs_kern_mount.part.0+0x90/0xd4
>    vfs_kern_mount+0x14/0x28
>    btrfs_mount+0x114/0x418 [btrfs]
>    legacy_get_tree+0x30/0x60
>    vfs_get_tree+0x28/0xec
>    path_mount+0x3e0/0xb64
>    __arm64_sys_mount+0x200/0x2d8
>    invoke_syscall+0x48/0x114
>    el0_svc_common.constprop.0+0x60/0x11c
>    do_el0_svc+0x38/0x98
>    el0_svc+0x40/0xa8
>    el0t_64_sync_handler+0xf4/0x120
>    el0t_64_sync+0x190/0x194
>   Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000)
>
> [CAUSE]
> In btrfs/027 we test RAID6 with missing devices, and in this particular
> case we're repairing a metadata block at the end of a data stripe.
>
> But at btrfs_repair_io_failure(), we always pass a full PAGE for repair,
> and for the subpage case this can cross the stripe boundary and lead to
> the above BUG_ON().
>
> This metadata repair code has been there since the introduction of
> subpage support, but it can only trigger the BUG_ON() after the bio split
> ability was added at btrfs_map_bio().
>
> [FIX]
> Instead of passing the fixed PAGE_SIZE, we calculate the correct length
> based on the eb size and page size for both regular and subpage cases.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/disk-io.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c461a46ac6f2..65ac7b95d3a4 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -176,7 +176,6 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
>   				      int mirror_num)
>   {
>   	struct btrfs_fs_info *fs_info = eb->fs_info;
> -	u64 start = eb->start;
>   	int i, num_pages = num_extent_pages(eb);
>   	int ret = 0;
>
> @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
>
>   	for (i = 0; i < num_pages; i++) {
>   		struct page *p = eb->pages[i];
> +		u64 start = max_t(u64, eb->start, page_offset(p));
> +		u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
> +		u32 len = end - start;
>
> -		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
> -				start, p, start - page_offset(p), mirror_num);
> +		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
> +				start, p, offset_in_page(start), mirror_num);
>   		if (ret)
>   			break;
> -		start += PAGE_SIZE;
>   	}
>
>   	return ret;
Christoph Hellwig June 5, 2023, 7:06 a.m. UTC | #4
On Fri, Jun 02, 2023 at 03:15:39PM +0800, Qu Wenruo wrote:
> Hi David,
> 
> I still don't see this fix merged into misc-next; any update on this?

FYI, while I prefer the explicit version, this one definitively fixes
the bug, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>
David Sterba June 5, 2023, 4:40 p.m. UTC | #5
On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote:
> [BUG]
> Test case btrfs/027 would crash with subpage (64K page size, 4K
> sectorsize) with the following dying messages:
> 
>  debug: map_length=16384 length=65536 type=metadata|raid6(0x104)
>  assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093
>  ------------[ cut here ]------------
>  kernel BUG at fs/btrfs/messages.c:259!
>  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>  Call trace:
>   btrfs_assertfail+0x28/0x2c [btrfs]
>   btrfs_map_repair_block+0x150/0x2b8 [btrfs]
>   btrfs_repair_io_failure+0xd4/0x31c [btrfs]
>   btrfs_read_extent_buffer+0x150/0x16c [btrfs]
>   read_tree_block+0x38/0xbc [btrfs]
>   read_tree_root_path+0xfc/0x1bc [btrfs]
>   btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs]
>   open_ctree+0xa30/0x172c [btrfs]
>   btrfs_mount_root+0x3c4/0x4a4 [btrfs]
>   legacy_get_tree+0x30/0x60
>   vfs_get_tree+0x28/0xec
>   vfs_kern_mount.part.0+0x90/0xd4
>   vfs_kern_mount+0x14/0x28
>   btrfs_mount+0x114/0x418 [btrfs]
>   legacy_get_tree+0x30/0x60
>   vfs_get_tree+0x28/0xec
>   path_mount+0x3e0/0xb64
>   __arm64_sys_mount+0x200/0x2d8
>   invoke_syscall+0x48/0x114
>   el0_svc_common.constprop.0+0x60/0x11c
>   do_el0_svc+0x38/0x98
>   el0_svc+0x40/0xa8
>   el0t_64_sync_handler+0xf4/0x120
>   el0t_64_sync+0x190/0x194
>  Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000)
> 
> [CAUSE]
> In btrfs/027 we test RAID6 with missing devices, and in this particular
> case we're repairing a metadata block at the end of a data stripe.
>
> But at btrfs_repair_io_failure(), we always pass a full PAGE for repair,
> and for the subpage case this can cross the stripe boundary and lead to
> the above BUG_ON().
>
> This metadata repair code has been there since the introduction of
> subpage support, but it can only trigger the BUG_ON() after the bio split
> ability was added at btrfs_map_bio().
>
> [FIX]
> Instead of passing the fixed PAGE_SIZE, we calculate the correct length
> based on the eb size and page size for both regular and subpage cases.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.
David Sterba June 5, 2023, 4:44 p.m. UTC | #6
On Sat, May 27, 2023 at 06:28:40AM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/5/26 21:42, Christoph Hellwig wrote:
> > On Fri, May 26, 2023 at 08:30:20PM +0800, Qu Wenruo wrote:
> >>   	struct btrfs_fs_info *fs_info = eb->fs_info;
> >> -	u64 start = eb->start;
> >>   	int i, num_pages = num_extent_pages(eb);
> >>   	int ret = 0;
> >>
> >> @@ -185,12 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
> >>
> >>   	for (i = 0; i < num_pages; i++) {
> >>   		struct page *p = eb->pages[i];
> >> +		u64 start = max_t(u64, eb->start, page_offset(p));
> >> +		u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
> >> +		u32 len = end - start;
> >>
> >> -		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
> >> -				start, p, start - page_offset(p), mirror_num);
> >> +		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
> >> +				start, p, offset_in_page(start), mirror_num);
> >>   		if (ret)
> >>   			break;
> >> -		start += PAGE_SIZE;
> >
> > I actually just noticed this a week or so ago, but didn't have a
> > reproducer.  My fix does this a little differently by adding a branch
> > for nodesize < PAGE_SIZE, similar to write_one_eb or
> > read_extent_buffer_pages, which feels a bit simpler and easier to read
> > to me:
> 
> Indeed, sometimes the extra calculation hurts readability.

I prefer your version as it has only one call to btrfs_repair_io_failure,
which takes several parameters. The 'iterate/prepare arguments/call'
structure is a pattern I think we've been using often. The separate
checks for subpage were meant as an intermediate step so we don't touch
the non-subpage code, but the goal is to merge them, meaning the logic
would be pushed to the 'prepare arguments' step.

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c461a46ac6f2..65ac7b95d3a4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -176,7 +176,6 @@  static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
 				      int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	u64 start = eb->start;
 	int i, num_pages = num_extent_pages(eb);
 	int ret = 0;
 
@@ -185,12 +184,14 @@  static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
 
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
+		u64 start = max_t(u64, eb->start, page_offset(p));
+		u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
+		u32 len = end - start;
 
-		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
-				start, p, start - page_offset(p), mirror_num);
+		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
+				start, p, offset_in_page(start), mirror_num);
 		if (ret)
 			break;
-		start += PAGE_SIZE;
 	}
 
 	return ret;