Message ID | eb543cde2378cc111b0b8359ef94ff0dbd51ee58.1723355397.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: tree-checker: add btrfs dev extent checks |
On Sun, Aug 11, 2024 at 03:20:08PM +0930, Qu Wenruo wrote:
> [REPORT]
> There is a corruption report that btrfs refuses to mount a fs that has
> overlapping dev extents:
>
>   BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272
>   BTRFS error (device sdc): failed to verify dev extents against chunks: -117
>   BTRFS error (device sdc): open_ctree failed
>
> [CAUSE]
> The cause is very obvious: there is a bad dev extent item with an
> incorrect length.
> Although we are not 100% sure of the cause before getting the dev tree
> dump, I'm already surprised that we do not have any checks on dev tree.
>
> Currently we only do the dev-extent verification at mount time, but if the
> corruption is caused by memory bitflip, we really want to catch it before
> writing the corruption to the storage.
>
> Furthermore the dev extent items have the following key definition:
>
>   (<device id> DEV_EXTENT <physical offset>)
>
> Thus we cannot just rely on the generic key order check to make sure
> there is no overlapping.
>
> [ENHANCEMENT]
> Introduce dedicated dev extent checks, including:
>
> - Fixed member checks
>   * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
>   * chunk_objectid should always be
>     BTRFS_FIRST_CHUNK_TREE_OBJECTID (256)
>
> - Alignment checks
>   * chunk_offset should be aligned to sectorsize
>   * length should be aligned to sectorsize
>   * key.offset should be aligned to sectorsize
>
> - Overlap checks
>   If the previous key is also a dev-extent item, with the same
>   device id, make sure we do not overlap with the previous dev extent.
>
> Reported-by: Stefan N <stefannnau@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Looks like we missed some simple tree item checks indeed.

Reviewed-by: David Sterba <dsterba@suse.com>
On 2024/8/14 08:41, David Sterba wrote:
> On Sun, Aug 11, 2024 at 03:20:08PM +0930, Qu Wenruo wrote:
>> [...]
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Looks like we missed some simple tree item checks indeed.

The original idea is, we have btrfs_verify_dev_extents() at mount time,
thus it's enough to reject bad dev extents, and no need for tree-checker
for dev-extents.

But this method doesn't prevent bitflips from sneaking in during runtime.

So in the long run, our sanity checks should:

- Do cross-checks at mount time for critical infrastructure
  To prevent corruption sneaking in undetected.

- Do in-leaf checks at tree-checker
  To prevent corruption from reaching storage.

- Do extra read-time cross-checks
  Just like the dir item checks we did.

Thanks,
Qu

>
> Reviewed-by: David Sterba <dsterba@suse.com>
On Wed, Aug 14, 2024 at 08:47:40AM +0930, Qu Wenruo wrote:
> > Looks like we missed some simple tree item checks indeed.
>
> The original idea is, we have btrfs_verify_dev_extents() at mount time,
> thus it's enough to reject bad dev extents, and no need for tree-checker
> for dev-extents.
>
> But this method doesn't prevent bitflips from sneaking in during runtime.

Yeah, the tree-checker is run before commit too, so the mount checks
won't catch that.

> So in the long run, our sanity checks should:
>
> - Do cross-checks at mount time for critical infrastructure
>   To prevent corruption sneaking in undetected.

At mount we can afford to do more than the tree-checker can do in the
single leaf, like matching the dev extents and block groups.

> - Do in-leaf checks at tree-checker
>   To prevent corruption from reaching storage.
>
> - Do extra read-time cross-checks
>   Just like the dir item checks we did.

Some values have a clear range, or a constraint like alignment to a
sectorsize. Beyond that I think it's quite limited, we can't read other
leaves but maybe other checks against existing in-memory structures can
be done.

For example, a block group item is consistent but overlaps with one in
the tree. Similar to what you do in this patch, but the bg is already
handled because of how the data is tracked. The tree-checker coverage is
pretty good so it's hard to find what else to cross-check.
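To illustrate the point about alignment constraints and their limits, here is a minimal userspace sketch in C. It is not taken from the patch: `is_aligned()` is a stand-in for the kernel's `IS_ALIGNED()` macro, `flip_bit()` is a hypothetical helper that simulates a single memory bitflip, and the sectorsize is assumed to be a power of two such as 4096.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's IS_ALIGNED().
 * Assumption: align is a non-zero power of two (e.g. a 4096-byte sectorsize).
 */
static bool is_aligned(uint64_t value, uint32_t align)
{
	return (value & ((uint64_t)align - 1)) == 0;
}

/* Hypothetical helper: simulate a single bitflip in an on-disk field. */
static uint64_t flip_bit(uint64_t value, unsigned int bit)
{
	return value ^ (UINT64_C(1) << bit);
}
```

With a 4096-byte sectorsize, flipping bit 3 of a 1 GiB length breaks the alignment and would be rejected, while flipping bit 20 leaves the value aligned. A per-field alignment check cannot see the second case; only a comparison against a neighbouring item, such as the overlap check, has a chance of catching it.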
On 11/8/24 1:50 pm, Qu Wenruo wrote:
> [...]
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thx.
On Thu, Aug 15, 2024 at 01:17:00PM +0800, Anand Jain wrote:
> On 11/8/24 1:50 pm, Qu Wenruo wrote:
> > [...]
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Looks good.
>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>

I'm doing some updates to for-next so I've added the tag.
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..3e8fbedfe04d 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1763,6 +1763,72 @@ static int check_raid_stripe_extent(const struct extent_buffer *leaf,
 	return 0;
 }
 
+static int check_dev_extent_item(const struct extent_buffer *leaf,
+				 const struct btrfs_key *key,
+				 int slot,
+				 struct btrfs_key *prev_key)
+{
+	struct btrfs_dev_extent *de;
+	const u32 sectorsize = leaf->fs_info->sectorsize;
+
+	de = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
+	/* Basic fixed member checks. */
+	if (unlikely(btrfs_dev_extent_chunk_tree(leaf, de) !=
+		     BTRFS_CHUNK_TREE_OBJECTID)) {
+		generic_err(leaf, slot,
+			    "invalid dev extent chunk tree id, has %llu expect %llu",
+			    btrfs_dev_extent_chunk_tree(leaf, de),
+			    BTRFS_CHUNK_TREE_OBJECTID);
+		return -EUCLEAN;
+	}
+	if (unlikely(btrfs_dev_extent_chunk_objectid(leaf, de) !=
+		     BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
+		generic_err(leaf, slot,
+			    "invalid dev extent chunk objectid, has %llu expect %llu",
+			    btrfs_dev_extent_chunk_objectid(leaf, de),
+			    BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+		return -EUCLEAN;
+	}
+	/* Alignment check. */
+	if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) {
+		generic_err(leaf, slot,
+			    "invalid dev extent key.offset, has %llu not aligned to %u",
+			    key->offset, sectorsize);
+		return -EUCLEAN;
+	}
+	if (unlikely(!IS_ALIGNED(btrfs_dev_extent_chunk_offset(leaf, de),
+				 sectorsize))) {
+		generic_err(leaf, slot,
+			    "invalid dev extent chunk offset, has %llu not aligned to %u",
+			    btrfs_dev_extent_chunk_offset(leaf, de),
+			    sectorsize);
+		return -EUCLEAN;
+	}
+	if (unlikely(!IS_ALIGNED(btrfs_dev_extent_length(leaf, de),
+				 sectorsize))) {
+		generic_err(leaf, slot,
+			    "invalid dev extent length, has %llu not aligned to %u",
+			    btrfs_dev_extent_length(leaf, de), sectorsize);
+		return -EUCLEAN;
+	}
+	/* Overlap check with previous dev extent. */
+	if (slot && prev_key->objectid == key->objectid &&
+	    prev_key->type == key->type) {
+		struct btrfs_dev_extent *prev_de;
+		u64 prev_len;
+
+		prev_de = btrfs_item_ptr(leaf, slot - 1, struct btrfs_dev_extent);
+		prev_len = btrfs_dev_extent_length(leaf, prev_de);
+		if (unlikely(prev_key->offset + prev_len > key->offset)) {
+			generic_err(leaf, slot,
+				    "dev extent overlap, prev offset %llu len %llu current offset %llu",
+				    prev_key->offset, prev_len, key->offset);
+			return -EUCLEAN;
+		}
+	}
+	return 0;
+}
+
 /*
  * Common point to switch the item-specific validation.
  */
@@ -1799,6 +1865,9 @@ static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
 	case BTRFS_DEV_ITEM_KEY:
 		ret = check_dev_item(leaf, key, slot);
 		break;
+	case BTRFS_DEV_EXTENT_KEY:
+		ret = check_dev_extent_item(leaf, key, slot, prev_key);
+		break;
 	case BTRFS_INODE_ITEM_KEY:
 		ret = check_inode_item(leaf, key, slot);
 		break;
[REPORT]
There is a corruption report that btrfs refuses to mount a fs that has
overlapping dev extents:

  BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272
  BTRFS error (device sdc): failed to verify dev extents against chunks: -117
  BTRFS error (device sdc): open_ctree failed

[CAUSE]
The cause is very obvious: there is a bad dev extent item with an
incorrect length.
Although we are not 100% sure of the cause before getting the dev tree
dump, I'm already surprised that we do not have any checks on dev tree.

Currently we only do the dev-extent verification at mount time, but if the
corruption is caused by memory bitflip, we really want to catch it before
writing the corruption to the storage.

Furthermore the dev extent items have the following key definition:

  (<device id> DEV_EXTENT <physical offset>)

Thus we cannot just rely on the generic key order check to make sure
there is no overlapping.

[ENHANCEMENT]
Introduce dedicated dev extent checks, including:

- Fixed member checks
  * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
  * chunk_objectid should always be
    BTRFS_FIRST_CHUNK_TREE_OBJECTID (256)

- Alignment checks
  * chunk_offset should be aligned to sectorsize
  * length should be aligned to sectorsize
  * key.offset should be aligned to sectorsize

- Overlap checks
  If the previous key is also a dev-extent item, with the same
  device id, make sure we do not overlap with the previous dev extent.

Reported-by: Stefan N <stefannnau@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 69 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
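The following userspace sketch in C illustrates why the generic key order check is not enough for dev extents. It is an illustration only, not the kernel implementation: `struct dev_extent`, `keys_ordered()` and `first_overlap()` are hypothetical names, and the struct flattens the key fields and the item length into one record. Addition overflow is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one dev extent: key fields plus item length. */
struct dev_extent {
	uint64_t devid;   /* key.objectid: device id */
	uint64_t offset;  /* key.offset: physical offset on the device */
	uint64_t length;  /* from the item payload, invisible to key ordering */
};

/* Key ordering only compares (devid, offset); it never looks at length. */
static bool keys_ordered(const struct dev_extent *e, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (e[i - 1].devid > e[i].devid)
			return false;
		if (e[i - 1].devid == e[i].devid &&
		    e[i - 1].offset >= e[i].offset)
			return false;
	}
	return true;
}

/*
 * Return the index of the first extent that starts before the end of its
 * predecessor on the same device, or -1 if there is no such extent.
 */
static long first_overlap(const struct dev_extent *e, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (e[i - 1].devid != e[i].devid)
			continue;
		if (e[i - 1].offset + e[i - 1].length > e[i].offset)
			return (long)i;
	}
	return -1;
}
```

As a usage example, take two extents on devid 4 where the second starts at 14263979671552, as in the report. The previous extent's offset and length are not given in the report, so assume a start of 14262905929728: with a length of 1073741824 it ends exactly where the next one begins, while a length of 1075052544 makes it end at 14263980982272. In both cases `keys_ordered()` returns true, but `first_overlap()` returns 1 only for the second case.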