diff mbox series

btrfs: verify the tranisd of the to-be-written dirty extent buffer

Message ID 1f011e6358e042e5ae1501e88377267a2a95c09d.1646183319.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: verify the tranisd of the to-be-written dirty extent buffer | expand

Commit Message

Qu Wenruo March 2, 2022, 1:10 a.m. UTC
[BUG]
There is a bug report that a bitflip in the transid part of an extent
buffer makes btrfs to reject certain tree blocks:

  BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22

[CAUSE]
Note the failed transid check, hex(262166) = 0x40016, while
hex(22) = 0x16.

It's an obvious bitflip.

Furthermore, the reporter also confirmed the bitflip is from the
hardware, so it's a real hardware caused bitflip, and such problem can
not be detected by the existing tree-checker framework.

As tree-checker can only verify the content inside one tree block, while
generation of a tree block can only be verified against its parent.

So such problem remain undetected.

[FIX]
Although tree-checker can not verify it at write-time, we still have a
quick (but not the most accurate) way to catch such obvious corruption.

Function csum_one_extent_buffer() is called before we submit metadata
write.

Thus it means, all the extent buffer passed in should be dirty tree
blocks, and should be newer than last committed transaction.

Using that we can catch the above bitflip.

Although it's not a perfect solution, as if the corrupted generation is
higher than the correct value, we have no way to catch it at all.

Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

Comments

David Sterba March 2, 2022, 6:56 p.m. UTC | #1
On Wed, Mar 02, 2022 at 09:10:21AM +0800, Qu Wenruo wrote:
> [BUG]
> There is a bug report that a bitflip in the transid part of an extent
> buffer makes btrfs to reject certain tree blocks:
> 
>   BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
> 
> [CAUSE]
> Note the failed transid check, hex(262166) = 0x40016, while
> hex(22) = 0x16.
> 
> It's an obvious bitflip.

Oh, quite. I think I've seen only bits flipped from 0 -> 1, this case is
an interesting evidence.

> Furthermore, the reporter also confirmed the bitflip is from the
> hardware, so it's a real hardware caused bitflip, and such problem can
> not be detected by the existing tree-checker framework.
> 
> As tree-checker can only verify the content inside one tree block, while
> generation of a tree block can only be verified against its parent.
> 
> So such problem remain undetected.
> 
> [FIX]
> Although tree-checker can not verify it at write-time, we still have a
> quick (but not the most accurate) way to catch such obvious corruption.
> 
> Function csum_one_extent_buffer() is called before we submit metadata
> write.
> 
> Thus it means, all the extent buffer passed in should be dirty tree
> blocks, and should be newer than last committed transaction.
> 
> Using that we can catch the above bitflip.
> 
> Although it's not a perfect solution, as if the corrupted generation is
> higher than the correct value, we have no way to catch it at all.
> 
> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
> Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
>  	else
>  		ret = btrfs_check_leaf_full(eb);
>  
> -	if (ret < 0) {
> -		btrfs_print_tree(eb, 0);
> +	if (ret < 0)
> +		goto error;
> +
> +	/*
> +	 * Also check the generation, the eb reached here must be newer than
> +	 * last committed. Or something seriously wrong happened.
> +	 */
> +	if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {

For any cases in regular code that leads to EUCLEAN you can use
unlikely() annotation.
Christoph Anton Mitterer March 3, 2022, 4:20 a.m. UTC | #2
On Wed, 2022-03-02 at 19:56 +0100, David Sterba wrote:
> Oh, quite. I think I've seen only bits flipped from 0 -> 1, this case
> is
> an interesting evidence.

I had actually both cases in that defective RAM module...

Cheers,
Chris.
Nikolay Borisov March 7, 2022, 10:51 a.m. UTC | #3
On 2.03.22 г. 3:10 ч., Qu Wenruo wrote:
> [BUG]
> There is a bug report that a bitflip in the transid part of an extent
> buffer makes btrfs to reject certain tree blocks:
> 
>    BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
> 
> [CAUSE]
> Note the failed transid check, hex(262166) = 0x40016, while
> hex(22) = 0x16.
> 
> It's an obvious bitflip.
> 
> Furthermore, the reporter also confirmed the bitflip is from the
> hardware, so it's a real hardware caused bitflip, and such problem can
> not be detected by the existing tree-checker framework.
> 
> As tree-checker can only verify the content inside one tree block, while
> generation of a tree block can only be verified against its parent.
> 
> So such problem remain undetected.
> 
> [FIX]
> Although tree-checker can not verify it at write-time, we still have a
> quick (but not the most accurate) way to catch such obvious corruption.
> 
> Function csum_one_extent_buffer() is called before we submit metadata
> write.
> 
> Thus it means, all the extent buffer passed in should be dirty tree
> blocks, and should be newer than last committed transaction.
> 
> Using that we can catch the above bitflip.
> 
> Although it's not a perfect solution, as if the corrupted generation is
> higher than the correct value, we have no way to catch it at all.
> 
> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
> Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------
>   1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b6a81c39d5f4..a89aa523413b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
>   	else
>   		ret = btrfs_check_leaf_full(eb);
>   
> -	if (ret < 0) {
> -		btrfs_print_tree(eb, 0);
> +	if (ret < 0)
> +		goto error;
> +
> +	/*
> +	 * Also check the generation, the eb reached here must be newer than
> +	 * last committed. Or something seriously wrong happened.
> +	 */
> +	if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {
> +		ret = -EUCLEAN;
>   		btrfs_err(fs_info,
> -			"block=%llu write time tree block corruption detected",
> -			eb->start);
> -		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> -		return ret;
> +			"block=%llu bad generation, have %llu expect > %llu",
> +			  eb->start, btrfs_header_generation(eb),
> +			  fs_info->last_trans_committed);
> +		goto error;

nit: I'd rather have this check in btrfs_check_node/check_leaf functions 
rather than having just this specific check in csum_one_extent_buffer. 
The only thing which is missing AFAICS is the fact the check function 
don't have a context whether we are checking for read or for write. It 
might make sense to extend them to get a boolean param whether the 
validation is for a write or not ?

>   	}
>   	write_extent_buffer(eb, result, 0, fs_info->csum_size);
>   
>   	return 0;
> +error:
> +	btrfs_print_tree(eb, 0);
> +	btrfs_err(fs_info,
> +		"block=%llu write time tree block corruption detected",
> +		eb->start);
> +	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +	return ret;
>   }
>   
>   /* Checksum all dirty extent buffers in one bio_vec */
Qu Wenruo March 7, 2022, 11:11 a.m. UTC | #4
On 2022/3/7 18:51, Nikolay Borisov wrote:
> 
> 
> On 2.03.22 г. 3:10 ч., Qu Wenruo wrote:
>> [BUG]
>> There is a bug report that a bitflip in the transid part of an extent
>> buffer makes btrfs to reject certain tree blocks:
>>
>>    BTRFS error (device dm-0): parent transid verify failed on 
>> 1382301696 wanted 262166 found 22
>>
>> [CAUSE]
>> Note the failed transid check, hex(262166) = 0x40016, while
>> hex(22) = 0x16.
>>
>> It's an obvious bitflip.
>>
>> Furthermore, the reporter also confirmed the bitflip is from the
>> hardware, so it's a real hardware caused bitflip, and such problem can
>> not be detected by the existing tree-checker framework.
>>
>> As tree-checker can only verify the content inside one tree block, while
>> generation of a tree block can only be verified against its parent.
>>
>> So such problem remain undetected.
>>
>> [FIX]
>> Although tree-checker can not verify it at write-time, we still have a
>> quick (but not the most accurate) way to catch such obvious corruption.
>>
>> Function csum_one_extent_buffer() is called before we submit metadata
>> write.
>>
>> Thus it means, all the extent buffer passed in should be dirty tree
>> blocks, and should be newer than last committed transaction.
>>
>> Using that we can catch the above bitflip.
>>
>> Although it's not a perfect solution, as if the corrupted generation is
>> higher than the correct value, we have no way to catch it at all.
>>
>> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
>> Link: 
>> https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------
>>   1 file changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b6a81c39d5f4..a89aa523413b 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct 
>> extent_buffer *eb)
>>       else
>>           ret = btrfs_check_leaf_full(eb);
>> -    if (ret < 0) {
>> -        btrfs_print_tree(eb, 0);
>> +    if (ret < 0)
>> +        goto error;
>> +
>> +    /*
>> +     * Also check the generation, the eb reached here must be newer than
>> +     * last committed. Or something seriously wrong happened.
>> +     */
>> +    if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {
>> +        ret = -EUCLEAN;
>>           btrfs_err(fs_info,
>> -            "block=%llu write time tree block corruption detected",
>> -            eb->start);
>> -        WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> -        return ret;
>> +            "block=%llu bad generation, have %llu expect > %llu",
>> +              eb->start, btrfs_header_generation(eb),
>> +              fs_info->last_trans_committed);
>> +        goto error;
> 
> nit: I'd rather have this check in btrfs_check_node/check_leaf functions 
> rather than having just this specific check in csum_one_extent_buffer. 
> The only thing which is missing AFAICS is the fact the check function 
> don't have a context whether we are checking for read or for write. It 
> might make sense to extend them to get a boolean param whether the 
> validation is for a write or not ?

Not only it lacks a bool, but also needs a u64 root_owner.


However, I really want to keep check_leaf/node() simple, just an eb, and 
all checks are based on the info from that eb, no extra ones.

Adding one u64 will not be a problem, but I doubt if we begin this 
trend, we may add more and more parameters for that simple function, and 
make it no longer that simple.

Thanks,
Qu

> 
>>       }
>>       write_extent_buffer(eb, result, 0, fs_info->csum_size);
>>       return 0;
>> +error:
>> +    btrfs_print_tree(eb, 0);
>> +    btrfs_err(fs_info,
>> +        "block=%llu write time tree block corruption detected",
>> +        eb->start);
>> +    WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +    return ret;
>>   }
>>   /* Checksum all dirty extent buffers in one bio_vec */
>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6a81c39d5f4..a89aa523413b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -441,17 +441,31 @@  static int csum_one_extent_buffer(struct extent_buffer *eb)
 	else
 		ret = btrfs_check_leaf_full(eb);
 
-	if (ret < 0) {
-		btrfs_print_tree(eb, 0);
+	if (ret < 0)
+		goto error;
+
+	/*
+	 * Also check the generation, the eb reached here must be newer than
+	 * last committed. Or something seriously wrong happened.
+	 */
+	if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {
+		ret = -EUCLEAN;
 		btrfs_err(fs_info,
-			"block=%llu write time tree block corruption detected",
-			eb->start);
-		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
-		return ret;
+			"block=%llu bad generation, have %llu expect > %llu",
+			  eb->start, btrfs_header_generation(eb),
+			  fs_info->last_trans_committed);
+		goto error;
 	}
 	write_extent_buffer(eb, result, 0, fs_info->csum_size);
 
 	return 0;
+error:
+	btrfs_print_tree(eb, 0);
+	btrfs_err(fs_info,
+		"block=%llu write time tree block corruption detected",
+		eb->start);
+	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+	return ret;
 }
 
 /* Checksum all dirty extent buffers in one bio_vec */