diff mbox series

btrfs: tree-checker: add btrfs dev extent checks

Message ID eb543cde2378cc111b0b8359ef94ff0dbd51ee58.1723355397.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: tree-checker: add btrfs dev extent checks | expand

Commit Message

Qu Wenruo Aug. 11, 2024, 5:50 a.m. UTC
[REPORT]
There is a corruption report that btrfs refuse to mount a fs that has
overlapping dev extents:

 BTRFS error (device sdc): dev extent devid 4 physical offset
14263979671552 overlap with previous dev extent end 14263980982272
 BTRFS error (device sdc): failed to verify dev extents against chunks: -117
 BTRFS error (device sdc): open_ctree failed

[CAUSE]
The cause is very obvious, there is a bad dev extent item with incorrect
length.
Although we are not 100% sure of the cause before getting the dev tree
dump, I'm already surprised that we do not have any checks on dev tree.

Currently we only do the dev-extent verification at mount time, but if the
corruption is caused by memory bitflip, we really want to catch it before
writing the corruption to the storage.

Furthermore the dev extent items has the following key definition:

	(<device id> DEV_EXTENT <physical offset>)

Thus we can not just rely on the generic key order check to make sure
there is no overlapping.

[ENHANCEMENT]
Introduce dedicated dev extent checks, including:

- Fixed member checks
  * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
  * chunk_objectid should always be
    BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)

- Alignment checks
  * chunk_offset should be aligned to sectorsize
  * length should be aligned to sectorsize
  * key.offset should be aligned to sectorsize

- Overlap checks
  If the previous key is also a dev-extent item, with the same
  device id, make sure we do not overlap with the previous dev extent.

Reported: Stefan N <stefannnau@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 69 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

Comments

David Sterba Aug. 13, 2024, 11:11 p.m. UTC | #1
On Sun, Aug 11, 2024 at 03:20:08PM +0930, Qu Wenruo wrote:
> [REPORT]
> There is a corruption report that btrfs refuse to mount a fs that has
> overlapping dev extents:
> 
>  BTRFS error (device sdc): dev extent devid 4 physical offset
> 14263979671552 overlap with previous dev extent end 14263980982272
>  BTRFS error (device sdc): failed to verify dev extents against chunks: -117
>  BTRFS error (device sdc): open_ctree failed
> 
> [CAUSE]
> The cause is very obvious, there is a bad dev extent item with incorrect
> length.
> Although we are not 100% sure of the cause before getting the dev tree
> dump, I'm already surprised that we do not have any checks on dev tree.
> 
> Currently we only do the dev-extent verification at mount time, but if the
> corruption is caused by memory bitflip, we really want to catch it before
> writing the corruption to the storage.
> 
> Furthermore the dev extent items has the following key definition:
> 
> 	(<device id> DEV_EXTENT <physical offset>)
> 
> Thus we can not just rely on the generic key order check to make sure
> there is no overlapping.
> 
> [ENHANCEMENT]
> Introduce dedicated dev extent checks, including:
> 
> - Fixed member checks
>   * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
>   * chunk_objectid should always be
>     BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
> 
> - Alignment checks
>   * chunk_offset should be aligned to sectorsize
>   * length should be aligned to sectorsize
>   * key.offset should be aligned to sectorsize
> 
> - Overlap checks
>   If the previous key is also a dev-extent item, with the same
>   device id, make sure we do not overlap with the previous dev extent.
> 
> Reported: Stefan N <stefannnau@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Looks like we missed some simple tree item checks indeed.

Reviewed-by: David Sterba <dsterba@suse.com>
Qu Wenruo Aug. 13, 2024, 11:17 p.m. UTC | #2
在 2024/8/14 08:41, David Sterba 写道:
> On Sun, Aug 11, 2024 at 03:20:08PM +0930, Qu Wenruo wrote:
>> [REPORT]
>> There is a corruption report that btrfs refuse to mount a fs that has
>> overlapping dev extents:
>>
>>   BTRFS error (device sdc): dev extent devid 4 physical offset
>> 14263979671552 overlap with previous dev extent end 14263980982272
>>   BTRFS error (device sdc): failed to verify dev extents against chunks: -117
>>   BTRFS error (device sdc): open_ctree failed
>>
>> [CAUSE]
>> The cause is very obvious, there is a bad dev extent item with incorrect
>> length.
>> Although we are not 100% sure of the cause before getting the dev tree
>> dump, I'm already surprised that we do not have any checks on dev tree.
>>
>> Currently we only do the dev-extent verification at mount time, but if the
>> corruption is caused by memory bitflip, we really want to catch it before
>> writing the corruption to the storage.
>>
>> Furthermore the dev extent items has the following key definition:
>>
>> 	(<device id> DEV_EXTENT <physical offset>)
>>
>> Thus we can not just rely on the generic key order check to make sure
>> there is no overlapping.
>>
>> [ENHANCEMENT]
>> Introduce dedicated dev extent checks, including:
>>
>> - Fixed member checks
>>    * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
>>    * chunk_objectid should always be
>>      BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
>>
>> - Alignment checks
>>    * chunk_offset should be aligned to sectorsize
>>    * length should be aligned to sectorsize
>>    * key.offset should be aligned to sectorsize
>>
>> - Overlap checks
>>    If the previous key is also a dev-extent item, with the same
>>    device id, make sure we do not overlap with the previous dev extent.
>>
>> Reported: Stefan N <stefannnau@gmail.com>
>> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Looks like we missed some simple tree item checks indeed.

The original idea is, we have btrfs_verify_dev_extents() at mount time, 
thus it's enough to reject bad dev extents, and no need for tree-checker 
for dev-extents.

But this method doesn't prevent bitflip from sneaking in during runtime.

So in the long run, our sanity checks should:

- Do cross-checks at mount time for critical infrastructure
   To prevent corruption sneaking in undetected.

- Do in-leaf checks at tree-checker
   To prevent corruption reach storage.

- Do extra read-time cross-checks
   Just like the dir item checks we did.

Thanks,
Qu
> 
> Reviewed-by: David Sterba <dsterba@suse.com>
David Sterba Aug. 13, 2024, 11:32 p.m. UTC | #3
On Wed, Aug 14, 2024 at 08:47:40AM +0930, Qu Wenruo wrote:
> > Looks like we missed some simple tree item checks indeed.
> 
> The original idea is, we have btrfs_verify_dev_extents() at mount time, 
> thus it's enough to reject bad dev extents, and no need for tree-checker 
> for dev-extents.
> 
> But this method doesn't prevent bitflip from sneaking in during runtime.

Yeah, the tree-checker is run before commit too, so the mount checks
won't catch that.

> So in the long run, our sanity checks should:
> 
> - Do cross-checks at mount time for critical infrastructure
>    To prevent corruption sneaking in undetected.

At mount we can afford to do more than the tree-checker can do in the
single leaf, like matching the dev extents and block groups.

> - Do in-leaf checks at tree-checker
>    To prevent corruption reach storage.
> 
> - Do extra read-time cross-checks
>    Just like the dir item checks we did.

Some values have a clear range, or a constraint like alignment to a
sectorsize. Beyond that I think it's quite limited, we can't read other
leaves but maybe other checks against existing in-memory structures can
be done. An example, a block group item is consistent but overlaps with
one in the tree. Similar to what you do in this patch, but the bg is
already handled because of how the data is tracked. The tree-checker
coverage is pretty good so it's hard to find what else to cross-check.
Anand Jain Aug. 15, 2024, 5:17 a.m. UTC | #4
On 11/8/24 1:50 pm, Qu Wenruo wrote:
> [REPORT]
> There is a corruption report that btrfs refuse to mount a fs that has
> overlapping dev extents:
> 
>   BTRFS error (device sdc): dev extent devid 4 physical offset
> 14263979671552 overlap with previous dev extent end 14263980982272
>   BTRFS error (device sdc): failed to verify dev extents against chunks: -117
>   BTRFS error (device sdc): open_ctree failed
> 
> [CAUSE]
> The cause is very obvious, there is a bad dev extent item with incorrect
> length.
> Although we are not 100% sure of the cause before getting the dev tree
> dump, I'm already surprised that we do not have any checks on dev tree.
> 
> Currently we only do the dev-extent verification at mount time, but if the
> corruption is caused by memory bitflip, we really want to catch it before
> writing the corruption to the storage.
> 
> Furthermore the dev extent items has the following key definition:
> 
> 	(<device id> DEV_EXTENT <physical offset>)
> 
> Thus we can not just rely on the generic key order check to make sure
> there is no overlapping.
> 
> [ENHANCEMENT]
> Introduce dedicated dev extent checks, including:
> 
> - Fixed member checks
>    * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
>    * chunk_objectid should always be
>      BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
> 
> - Alignment checks
>    * chunk_offset should be aligned to sectorsize
>    * length should be aligned to sectorsize
>    * key.offset should be aligned to sectorsize
> 
> - Overlap checks
>    If the previous key is also a dev-extent item, with the same
>    device id, make sure we do not overlap with the previous dev extent.
> 
> Reported: Stefan N <stefannnau@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thx.
David Sterba Aug. 15, 2024, 12:25 p.m. UTC | #5
On Thu, Aug 15, 2024 at 01:17:00PM +0800, Anand Jain wrote:
> On 11/8/24 1:50 pm, Qu Wenruo wrote:
> > [REPORT]
> > There is a corruption report that btrfs refuse to mount a fs that has
> > overlapping dev extents:
> > 
> >   BTRFS error (device sdc): dev extent devid 4 physical offset
> > 14263979671552 overlap with previous dev extent end 14263980982272
> >   BTRFS error (device sdc): failed to verify dev extents against chunks: -117
> >   BTRFS error (device sdc): open_ctree failed
> > 
> > [CAUSE]
> > The cause is very obvious, there is a bad dev extent item with incorrect
> > length.
> > Although we are not 100% sure of the cause before getting the dev tree
> > dump, I'm already surprised that we do not have any checks on dev tree.
> > 
> > Currently we only do the dev-extent verification at mount time, but if the
> > corruption is caused by memory bitflip, we really want to catch it before
> > writing the corruption to the storage.
> > 
> > Furthermore the dev extent items has the following key definition:
> > 
> > 	(<device id> DEV_EXTENT <physical offset>)
> > 
> > Thus we can not just rely on the generic key order check to make sure
> > there is no overlapping.
> > 
> > [ENHANCEMENT]
> > Introduce dedicated dev extent checks, including:
> > 
> > - Fixed member checks
> >    * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
> >    * chunk_objectid should always be
> >      BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
> > 
> > - Alignment checks
> >    * chunk_offset should be aligned to sectorsize
> >    * length should be aligned to sectorsize
> >    * key.offset should be aligned to sectorsize
> > 
> > - Overlap checks
> >    If the previous key is also a dev-extent item, with the same
> >    device id, make sure we do not overlap with the previous dev extent.
> > 
> > Reported: Stefan N <stefannnau@gmail.com>
> > Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Looks good.
> 
> Reviewed-by: Anand Jain <anand.jain@oracle.com>

I'm doing some updaets to for-next so I've added the tag.
diff mbox series

Patch

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..3e8fbedfe04d 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1763,6 +1763,72 @@  static int check_raid_stripe_extent(const struct extent_buffer *leaf,
 	return 0;
 }
 
+static int check_dev_extent_item(const struct extent_buffer *leaf,
+				 const struct btrfs_key *key,
+				 int slot,
+				 struct btrfs_key *prev_key)
+{
+	struct btrfs_dev_extent *de;
+	const u32 sectorsize = leaf->fs_info->sectorsize;
+
+	de = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
+	/* Basic fixed member checks. */
+	if (unlikely(btrfs_dev_extent_chunk_tree(leaf, de) !=
+		     BTRFS_CHUNK_TREE_OBJECTID)) {
+		generic_err(leaf, slot,
+			    "invalid dev extent chunk tree id, has %llu expect %llu",
+			    btrfs_dev_extent_chunk_tree(leaf, de),
+			    BTRFS_CHUNK_TREE_OBJECTID);
+		return -EUCLEAN;
+	}
+	if (unlikely(btrfs_dev_extent_chunk_objectid(leaf, de) !=
+		     BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
+		generic_err(leaf, slot,
+			    "invalid dev extent chunk objectid, has %llu expect %llu",
+			    btrfs_dev_extent_chunk_objectid(leaf, de),
+			    BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+		return -EUCLEAN;
+	}
+	/* Alignment check. */
+	if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) {
+		generic_err(leaf, slot,
+			    "invalid dev extent key.offset, has %llu not aligned to %u",
+			    key->offset, sectorsize);
+		return -EUCLEAN;
+	}
+	if (unlikely(!IS_ALIGNED(btrfs_dev_extent_chunk_offset(leaf, de),
+				 sectorsize))) {
+		generic_err(leaf, slot,
+			    "invalid dev extent chunk offset, has %llu not aligned to %u",
+			    btrfs_dev_extent_chunk_objectid(leaf, de),
+			    sectorsize);
+		return -EUCLEAN;
+	}
+	if (unlikely(!IS_ALIGNED(btrfs_dev_extent_length(leaf, de),
+				 sectorsize))) {
+		generic_err(leaf, slot,
+			    "invalid dev extent length, has %llu not aligned to %u",
+			    btrfs_dev_extent_length(leaf, de), sectorsize);
+		return -EUCLEAN;
+	}
+	/* Overlap check with previous dev extent. */
+	if (slot && prev_key->objectid == key->objectid &&
+	    prev_key->type == key->type) {
+		struct btrfs_dev_extent *prev_de;
+		u64 prev_len;
+
+		prev_de = btrfs_item_ptr(leaf, slot - 1, struct btrfs_dev_extent);
+		prev_len = btrfs_dev_extent_length(leaf, prev_de);
+		if (unlikely(prev_key->offset + prev_len > key->offset)) {
+			generic_err(leaf, slot,
+		"dev extent overlap, prev offset %llu len %llu current offset %llu",
+				    prev_key->objectid, prev_len, key->offset);
+			return -EUCLEAN;
+		}
+	}
+	return 0;
+}
+
 /*
  * Common point to switch the item-specific validation.
  */
@@ -1799,6 +1865,9 @@  static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
 	case BTRFS_DEV_ITEM_KEY:
 		ret = check_dev_item(leaf, key, slot);
 		break;
+	case BTRFS_DEV_EXTENT_KEY:
+		ret = check_dev_extent_item(leaf, key, slot, prev_key);
+		break;
 	case BTRFS_INODE_ITEM_KEY:
 		ret = check_inode_item(leaf, key, slot);
 		break;