btrfs-progs: raid56: fix the wrong recovery condition for data and P case

Message ID 20211116131051.247977-1-wqu@suse.com (mailing list archive)
State New, archived

Commit Message

Qu Wenruo Nov. 16, 2021, 1:10 p.m. UTC
There is a bug in raid56_recov(): it doesn't properly repair corruption
in the data and P case:

	/* Data and P*/
	if (dest2 == nr_devs - 1)
		return raid6_recov_datap(nr_devs, stripe_len, dest1, data);

Note that dest1 and dest2 indicate which slots are corrupted.

For RAID6 cases:

[0, nr_devs - 2) is for data stripes,
@nr_devs - 2 is for P,
@nr_devs - 1 is for Q.

For the above code, the comment is correct but the condition is wrong:
nr_devs - 1 is the Q slot, not P. This makes btrfs-fuse, the only
project using this function, report RAID6 recovery errors when two
devices are missing.

Fix it by using the correct condition.
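
The slot layout and the corrected check can be illustrated with a
minimal sketch. This is not the btrfs-progs code: the enum, the
function name classify_two_failures() and the precondition
0 <= dest1 < dest2 < nr_devs are assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical labels for the two-failure RAID6 cases (not btrfs-progs names). */
enum raid6_case {
	CASE_DATA_DATA,
	CASE_DATA_P,
	CASE_DATA_Q,
	CASE_P_Q,
};

/*
 * Classify two corrupted slots.
 * Layout: [0, nr_devs - 2) is data, nr_devs - 2 is P, nr_devs - 1 is Q.
 * Assumption: the caller has already ordered the slots so dest1 < dest2.
 */
static enum raid6_case classify_two_failures(int nr_devs, int dest1, int dest2)
{
	const int p_slot = nr_devs - 2;
	const int q_slot = nr_devs - 1;

	assert(0 <= dest1 && dest1 < dest2 && dest2 < nr_devs);

	/* Both parity stripes are bad, all data stripes are intact. */
	if (dest1 == p_slot && dest2 == q_slot)
		return CASE_P_Q;
	/* The higher bad slot is still a data slot, so both are data. */
	if (dest2 < p_slot)
		return CASE_DATA_DATA;
	/* Corrected check: P lives at nr_devs - 2, not nr_devs - 1. */
	if (dest2 == p_slot)
		return CASE_DATA_P;
	/* Only remaining possibility: dest2 is the Q slot. */
	return CASE_DATA_Q;
}
```

With nr_devs = 4 (slots 0 and 1 are data, 2 is P, 3 is Q),
classify_two_failures(4, 0, 2) gives CASE_DATA_P and
classify_two_failures(4, 0, 3) gives CASE_DATA_Q. A check written as
dest2 == nr_devs - 1 would swap these two cases.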

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
But I'm more interested in why this function is still there, as there
seems to be no caller of this function in btrfs-progs anyway.
---
 kernel-lib/raid56.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Sterba Nov. 16, 2021, 1:52 p.m. UTC | #1
On Tue, Nov 16, 2021 at 09:10:51PM +0800, Qu Wenruo wrote:
> There is a bug in raid56_recov() which doesn't properly repair data and
> P case corruption:
> 
> 	/* Data and P*/
> 	if (dest2 == nr_devs - 1)
> 		return raid6_recov_datap(nr_devs, stripe_len, dest1, data);
> 
> Note that, dest1/2 is to indicate which slot has corruption.
> 
> For RAID6 cases:
> 
> [0, nr_devs - 2) is for data stripes,
> @nr_devs - 2 is for P,
> @nr_devs - 1 is for Q.
> 
> For above code, the comment is correct, but the check condition is
> wrong, and leads to the only project, btrfs-fuse, to report raid6
> recovery error for 2 devices missing case.
> 
> Fix it by using correct condition.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> But I'm more interested in why this function is still there, as there
> seems to be no caller of this function in btrfs-progs anyway.

The file dates from the old days when the raid56 implementation landed.
It was a copy of something in the lib/raid6 directory, but the sources
have since diverged.

The function was not used as there was no repair code in userspace, so
the question is whether we still want to keep it or remove it.

Qu Wenruo Nov. 16, 2021, 11:09 p.m. UTC | #2
On 2021/11/16 21:52, David Sterba wrote:
> On Tue, Nov 16, 2021 at 09:10:51PM +0800, Qu Wenruo wrote:
>> There is a bug in raid56_recov() which doesn't properly repair data and
>> P case corruption:
>>
>> 	/* Data and P*/
>> 	if (dest2 == nr_devs - 1)
>> 		return raid6_recov_datap(nr_devs, stripe_len, dest1, data);
>>
>> Note that, dest1/2 is to indicate which slot has corruption.
>>
>> For RAID6 cases:
>>
>> [0, nr_devs - 2) is for data stripes,
>> @nr_devs - 2 is for P,
>> @nr_devs - 1 is for Q.
>>
>> For above code, the comment is correct, but the check condition is
>> wrong, and leads to the only project, btrfs-fuse, to report raid6
>> recovery error for 2 devices missing case.
>>
>> Fix it by using correct condition.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> But I'm more interested in why this function is still there, as there
>> seems to be no caller of this function in btrfs-progs anyway.
>
> The file is there from old times when the raid56 implementation landed
> and the file was a copy of something in the lib/raid6 directory, but the
> sources have diverged.
>
> The function was not used as there was no repair code in userspace, so
> the question is if we still want it there or remove it.
>
But then the problem is, how can userspace not have any RAID56
recovery mechanism?

Things like btrfs check still need RAID56 recovery to read a filesystem
with missing devices, so this doesn't make any sense to me.

The only good news is now we have another project which would do a full
coverage test for btrfs-progs...

Thanks,
Qu

Patch

diff --git a/kernel-lib/raid56.c b/kernel-lib/raid56.c
index a94a60ed73d0..6f690484c810 100644
--- a/kernel-lib/raid56.c
+++ b/kernel-lib/raid56.c
@@ -343,7 +343,7 @@  int raid56_recov(int nr_devs, size_t stripe_len, u64 profile, int dest1,
 		return raid6_recov_data2(nr_devs, stripe_len, dest1, dest2,
 					 data);
 	/* Data and P*/
-	if (dest2 == nr_devs - 1)
+	if (dest2 == nr_devs - 2)
 		return raid6_recov_datap(nr_devs, stripe_len, dest1, data);
 
 	/*