Message ID | 1f011e6358e042e5ae1501e88377267a2a95c09d.1646183319.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: verify the tranisd of the to-be-written dirty extent buffer | expand |
On Wed, Mar 02, 2022 at 09:10:21AM +0800, Qu Wenruo wrote: > [BUG] > There is a bug report that a bitflip in the transid part of an extent > buffer makes btrfs to reject certain tree blocks: > > BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22 > > [CAUSE] > Note the failed transid check, hex(262166) = 0x40016, while > hex(22) = 0x16. > > It's an obvious bitflip. Oh, quite. I think I've seen only bits flipped from 0 -> 1, this case is an interesting evidence. > Furthermore, the reporter also confirmed the bitflip is from the > hardware, so it's a real hardware caused bitflip, and such problem can > not be detected by the existing tree-checker framework. > > As tree-checker can only verify the content inside one tree block, while > generation of a tree block can only be verified against its parent. > > So such problem remain undetected. > > [FIX] > Although tree-checker can not verify it at write-time, we still have a > quick (but not the most accurate) way to catch such obvious corruption. > > Function csum_one_extent_buffer() is called before we submit metadata > write. > > Thus it means, all the extent buffer passed in should be dirty tree > blocks, and should be newer than last committed transaction. > > Using that we can catch the above bitflip. > > Although it's not a perfect solution, as if the corrupted generation is > higher than the correct value, we have no way to catch it at all. > > Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> > Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ > Signed-off-by: Qu Wenruo <wqu@suse.com> Added to misc-next, thanks. > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb) > else > ret = btrfs_check_leaf_full(eb); > > - if (ret < 0) { > - btrfs_print_tree(eb, 0); > + if (ret < 0) > + goto error; > + > + /* > + * Also check the generation, the eb reached here must be newer than > + * last committed. Or something seriously wrong happened. > + */ > + if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) { For any cases in regular code that leads to EUCLEAN you can use unlikely() annotation.
On Wed, 2022-03-02 at 19:56 +0100, David Sterba wrote: > Oh, quite. I think I've seen only bits flipped from 0 -> 1, this case > is > an interesting evidence. I had actually both cases in that defective RAM module... Cheers, Chris.
On 2.03.22 г. 3:10 ч., Qu Wenruo wrote: > [BUG] > There is a bug report that a bitflip in the transid part of an extent > buffer makes btrfs to reject certain tree blocks: > > BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22 > > [CAUSE] > Note the failed transid check, hex(262166) = 0x40016, while > hex(22) = 0x16. > > It's an obvious bitflip. > > Furthermore, the reporter also confirmed the bitflip is from the > hardware, so it's a real hardware caused bitflip, and such problem can > not be detected by the existing tree-checker framework. > > As tree-checker can only verify the content inside one tree block, while > generation of a tree block can only be verified against its parent. > > So such problem remain undetected. > > [FIX] > Although tree-checker can not verify it at write-time, we still have a > quick (but not the most accurate) way to catch such obvious corruption. > > Function csum_one_extent_buffer() is called before we submit metadata > write. > > Thus it means, all the extent buffer passed in should be dirty tree > blocks, and should be newer than last committed transaction. > > Using that we can catch the above bitflip. > > Although it's not a perfect solution, as if the corrupted generation is > higher than the correct value, we have no way to catch it at all. > > Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> > Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------ > 1 file changed, 20 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index b6a81c39d5f4..a89aa523413b 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb) > else > ret = btrfs_check_leaf_full(eb); > > - if (ret < 0) { > - btrfs_print_tree(eb, 0); > + if (ret < 0) > + goto error; > + > + /* > + * Also check the generation, the eb reached here must be newer than > + * last committed. Or something seriously wrong happened. > + */ > + if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) { > + ret = -EUCLEAN; > btrfs_err(fs_info, > - "block=%llu write time tree block corruption detected", > - eb->start); > - WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); > - return ret; > + "block=%llu bad generation, have %llu expect > %llu", > + eb->start, btrfs_header_generation(eb), > + fs_info->last_trans_committed); > + goto error; nit: I'd rather have this check in btrfs_check_node/check_leaf functions rather than having just this specific check in csum_one_extent_buffer. The only thing which is missing AFAICS is the fact the check function don't have a context whether we are checking for read or for write. It might make sense to extend them to get a boolean param whether the validation is for a write or not ? > } > write_extent_buffer(eb, result, 0, fs_info->csum_size); > > return 0; > +error: > + btrfs_print_tree(eb, 0); > + btrfs_err(fs_info, > + "block=%llu write time tree block corruption detected", > + eb->start); > + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); > + return ret; > } > > /* Checksum all dirty extent buffers in one bio_vec */
On 2022/3/7 18:51, Nikolay Borisov wrote: > > > On 2.03.22 г. 3:10 ч., Qu Wenruo wrote: >> [BUG] >> There is a bug report that a bitflip in the transid part of an extent >> buffer makes btrfs to reject certain tree blocks: >> >> BTRFS error (device dm-0): parent transid verify failed on >> 1382301696 wanted 262166 found 22 >> >> [CAUSE] >> Note the failed transid check, hex(262166) = 0x40016, while >> hex(22) = 0x16. >> >> It's an obvious bitflip. >> >> Furthermore, the reporter also confirmed the bitflip is from the >> hardware, so it's a real hardware caused bitflip, and such problem can >> not be detected by the existing tree-checker framework. >> >> As tree-checker can only verify the content inside one tree block, while >> generation of a tree block can only be verified against its parent. >> >> So such problem remain undetected. >> >> [FIX] >> Although tree-checker can not verify it at write-time, we still have a >> quick (but not the most accurate) way to catch such obvious corruption. >> >> Function csum_one_extent_buffer() is called before we submit metadata >> write. >> >> Thus it means, all the extent buffer passed in should be dirty tree >> blocks, and should be newer than last committed transaction. >> >> Using that we can catch the above bitflip. >> >> Although it's not a perfect solution, as if the corrupted generation is >> higher than the correct value, we have no way to catch it at all. >> >> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> >> Link: >> https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------ >> 1 file changed, 20 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index b6a81c39d5f4..a89aa523413b 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct >> extent_buffer *eb) >> else >> ret = btrfs_check_leaf_full(eb); >> - if (ret < 0) { >> - btrfs_print_tree(eb, 0); >> + if (ret < 0) >> + goto error; >> + >> + /* >> + * Also check the generation, the eb reached here must be newer than >> + * last committed. Or something seriously wrong happened. >> + */ >> + if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) { >> + ret = -EUCLEAN; >> btrfs_err(fs_info, >> - "block=%llu write time tree block corruption detected", >> - eb->start); >> - WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); >> - return ret; >> + "block=%llu bad generation, have %llu expect > %llu", >> + eb->start, btrfs_header_generation(eb), >> + fs_info->last_trans_committed); >> + goto error; > > nit: I'd rather have this check in btrfs_check_node/check_leaf functions > rather than having just this specific check in csum_one_extent_buffer. > The only thing which is missing AFAICS is the fact the check function > don't have a context whether we are checking for read or for write. It > might make sense to extend them to get a boolean param whether the > validation is for a write or not ? Not only it lacks a bool, but also needs a u64 root_owner. However, I really want to keep check_leaf/node() simple, just an eb, and all checks are based on the info from that eb, no extra ones. Adding one u64 will not be a problem, but I doubt if we begin this trend, we may add more and more parameters for that simple function, and make it no longer that simple. Thanks, Qu > >> } >> write_extent_buffer(eb, result, 0, fs_info->csum_size); >> return 0; >> +error: >> + btrfs_print_tree(eb, 0); >> + btrfs_err(fs_info, >> + "block=%llu write time tree block corruption detected", >> + eb->start); >> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); >> + return ret; >> } >> /* Checksum all dirty extent buffers in one bio_vec */ >
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b6a81c39d5f4..a89aa523413b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb) else ret = btrfs_check_leaf_full(eb); - if (ret < 0) { - btrfs_print_tree(eb, 0); + if (ret < 0) + goto error; + + /* + * Also check the generation, the eb reached here must be newer than + * last committed. Or something seriously wrong happened. + */ + if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) { + ret = -EUCLEAN; btrfs_err(fs_info, - "block=%llu write time tree block corruption detected", - eb->start); - WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); - return ret; + "block=%llu bad generation, have %llu expect > %llu", + eb->start, btrfs_header_generation(eb), + fs_info->last_trans_committed); + goto error; } write_extent_buffer(eb, result, 0, fs_info->csum_size); return 0; +error: + btrfs_print_tree(eb, 0); + btrfs_err(fs_info, + "block=%llu write time tree block corruption detected", + eb->start); + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); + return ret; } /* Checksum all dirty extent buffers in one bio_vec */
[BUG] There is a bug report that a bitflip in the transid part of an extent buffer makes btrfs to reject certain tree blocks: BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22 [CAUSE] Note the failed transid check, hex(262166) = 0x40016, while hex(22) = 0x16. It's an obvious bitflip. Furthermore, the reporter also confirmed the bitflip is from the hardware, so it's a real hardware caused bitflip, and such problem can not be detected by the existing tree-checker framework. As tree-checker can only verify the content inside one tree block, while generation of a tree block can only be verified against its parent. So such problem remain undetected. [FIX] Although tree-checker can not verify it at write-time, we still have a quick (but not the most accurate) way to catch such obvious corruption. Function csum_one_extent_buffer() is called before we submit metadata write. Thus it means, all the extent buffer passed in should be dirty tree blocks, and should be newer than last committed transaction. Using that we can catch the above bitflip. Although it's not a perfect solution, as if the corrupted generation is higher than the correct value, we have no way to catch it at all. Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)