diff mbox series

[v2,04/11] btrfs: introduce extra sanity checks for extent maps

Message ID 23eef0d1cc8c482121d6958b3c131ba51648cde6.1714707707.git.wqu@suse.com (mailing list archive)
State New
Headers show
Series btrfs: extent-map: unify the members with btrfs_ordered_extent | expand

Commit Message

Qu Wenruo May 3, 2024, 6:01 a.m. UTC
Since extent_map structure has the all the needed members to represent a
file extent directly, we can apply all the file extent sanity checks to an extent
map.

The new sanity checks would cross check both the old members
(block_start/block_len/orig_start) and the new members
(disk_bytenr/disk_num_bytes/offset).

There is a special case for offset/orig_start/start cross check, we only
do such sanity check for compressed extent:

- Only compressed read/encoded write really utilize orig_start
  This can be proved by the cleanup patch of orig_start.

- Merged data extents can lead to false alerts
  The problem is, with disk_bytenr/disk_num_bytes, if we're merging
  two extent maps like this:

    |<- data extent A -->|<-- data extent B -->|
              |<- em 1 ->|<- em 2 ->|

  Let's assume em2 has orig_offset of 0 and start of 0, and obvisouly
  offset 0.

  But after merging, the merged em would have offset of em1, screwing up
  whatever the @orig_start cross check against @start.

The checks happens at the following timing:

- add_extent_mapping()
  This is for newly added extent map

- replace_extent_mapping()
  This is for btrfs_drop_extent_map_range() and split_extent_map()

- try_merge_map()

Since the check is way more strict than before, the following code has
to be modified to pass the check:

- extent-map-tests
  Previously the test case never populate ram_bytes, not to mention the
  newly introduced disk_bytenr/disk_num_bytes.
  Populate the involved numbers mostly to follow the existing
  block_start/block_len values.

  There are two special cases worth mentioning:
  - test_case_3()
    The test case is already way too invalid that tree-checker will
    reject almost all extents.

    And there is a special unaligned regular extent which has mismatch
    disk_num_bytes (4096) and ram_bytes (4096 - 1).
    Fix it by all assigned the disk_num_bytes and ram_bytes to 4096 - 1.

  - test_case_7()
    An extent is inserted with 16K length, but on-disk extent size is
    only 4K.
    This means it must be a compressed extent, so set the compressed flag
    for it.

- setup_relocation_extent_mapping()
  This is mostly utilized by relocation code to read the chunk like an
  inode.
  So populate the extent map using a regular non-compressed extent.

In fact, the new cross checks already exposed a bug in
btrfs_drop_extent_map_range(), and caught tons of bugs in the new
members assignment.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_map.c             | 66 +++++++++++++++++++++++++++++++
 fs/btrfs/relocation.c             |  4 ++
 fs/btrfs/tests/extent-map-tests.c | 56 +++++++++++++++++++++++++-
 fs/btrfs/tests/inode-tests.c      |  2 +-
 4 files changed, 126 insertions(+), 2 deletions(-)

Comments

Filipe Manana May 13, 2024, 12:21 p.m. UTC | #1
On Fri, May 3, 2024 at 7:02 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Since extent_map structure has the all the needed members to represent a
> file extent directly, we can apply all the file extent sanity checks to an extent
> map.
>
> The new sanity checks would cross check both the old members
> (block_start/block_len/orig_start) and the new members
> (disk_bytenr/disk_num_bytes/offset).
>
> There is a special case for offset/orig_start/start cross check, we only
> do such sanity check for compressed extent:
>
> - Only compressed read/encoded write really utilize orig_start
>   This can be proved by the cleanup patch of orig_start.
>
> - Merged data extents can lead to false alerts
>   The problem is, with disk_bytenr/disk_num_bytes, if we're merging
>   two extent maps like this:
>
>     |<- data extent A -->|<-- data extent B -->|
>               |<- em 1 ->|<- em 2 ->|
>
>   Let's assume em2 has orig_offset of 0 and start of 0, and obvisouly
>   offset 0.

obvisouly -> obviously

I'm confused. How can em2 have a "start" of 0?
By "start" you mean file offset I suppose, since that's what it is
before this patchset. That would mean em 1 would have a negative
"start".

And by "offset" you mean extent offset?

>
>   But after merging, the merged em would have offset of em1, screwing up

But this suggests that by "offset" you mean file offset and not the
offset within an extent's range.
So did you mean "start" here?

>   whatever the @orig_start cross check against @start.
>
> The checks happens at the following timing:
>
> - add_extent_mapping()
>   This is for newly added extent map
>
> - replace_extent_mapping()
>   This is for btrfs_drop_extent_map_range() and split_extent_map()
>
> - try_merge_map()
>
> Since the check is way more strict than before, the following code has
> to be modified to pass the check:
>
> - extent-map-tests
>   Previously the test case never populate ram_bytes, not to mention the
>   newly introduced disk_bytenr/disk_num_bytes.
>   Populate the involved numbers mostly to follow the existing
>   block_start/block_len values.
>
>   There are two special cases worth mentioning:
>   - test_case_3()
>     The test case is already way too invalid that tree-checker will
>     reject almost all extents.
>
>     And there is a special unaligned regular extent which has mismatch
>     disk_num_bytes (4096) and ram_bytes (4096 - 1).
>     Fix it by all assigned the disk_num_bytes and ram_bytes to 4096 - 1.

Looking at the diff for test_case_3(), I don't see any assignment of
4096 - 1 anywhere.
All the assignments I see have a value of SZ_4K or SZ_16K.

>
>   - test_case_7()
>     An extent is inserted with 16K length, but on-disk extent size is
>     only 4K.
>     This means it must be a compressed extent, so set the compressed flag
>     for it.
>
> - setup_relocation_extent_mapping()
>   This is mostly utilized by relocation code to read the chunk like an
>   inode.

Not "mostly" - it's exclusively used by the relocation code.

>   So populate the extent map using a regular non-compressed extent.

Isn't that what we already do? I'm confused.
We are already creating a regular non-compressed extent, and all the
diff does is to initialize some fields unrelated to that.

>
> In fact, the new cross checks already exposed a bug in
> btrfs_drop_extent_map_range(), and caught tons of bugs in the new
> members assignment.

I remember one bug in btrfs_drop_extent_map_range(), which fortunately
didn't have any consequences.
Those tons of bugs you mention are only in the code added by the
patchset if I understand you correctly.

>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/extent_map.c             | 66 +++++++++++++++++++++++++++++++
>  fs/btrfs/relocation.c             |  4 ++
>  fs/btrfs/tests/extent-map-tests.c | 56 +++++++++++++++++++++++++-
>  fs/btrfs/tests/inode-tests.c      |  2 +-
>  4 files changed, 126 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index 4d4ac9fc43e2..8d0e257fc113 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -284,6 +284,66 @@ static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *nex
>         next->offset = new_offset;
>  }
>
> +static void dump_extent_map(const char *prefix, struct extent_map *em)
> +{
> +       if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
> +               return;
> +       pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu orig_start=%llu block_start=%llu block_len=%llu flags=0x%x\n",
> +               prefix, em->start, em->len, em->disk_bytenr, em->disk_num_bytes,
> +               em->ram_bytes, em->offset, em->orig_start, em->block_start,
> +               em->block_len, em->flags);

Instead of pr_crit() please use btrfs_crit(), in the call chain we
have access to the fs_info at try_merge_map().

> +       ASSERT(0);
> +}
> +
> +/* Internal sanity checks for btrfs debug builds. */
> +static void validate_extent_map(struct extent_map *em)
> +{
> +       if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
> +               return;
> +       if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
> +               if (em->disk_num_bytes == 0)
> +                       dump_extent_map("zero disk_num_bytes", em);
> +               if (em->offset + em->len > em->ram_bytes)
> +                       dump_extent_map("ram_bytes too small", em);
> +               if (em->offset + em->len > em->disk_num_bytes &&
> +                   !extent_map_is_compressed(em))
> +                       dump_extent_map("disk_num_bytes too small", em);
> +
> +               if (extent_map_is_compressed(em)) {
> +                       if (em->block_start != em->disk_bytenr)
> +                               dump_extent_map(
> +                               "mismatch block_start/disk_bytenr/offset", em);
> +                       if (em->disk_num_bytes != em->block_len)
> +                               dump_extent_map(
> +                               "mismatch disk_num_bytes/block_len", em);
> +                       /*
> +                        * Here we only check the start/orig_start/offset for
> +                        * compressed extents.
> +                        * This is because em::offset is always based on the
> +                        * referred data extent, which can be merged.
> +                        *
> +                        * In that case, @offset would no longer match
> +                        * em::start - em::orig_start, and cause false alert.
> +                        *
> +                        * Thankfully only compressed extent read/encoded write
> +                        * really bothers @orig_start, so we can skip

really bothers -> care about

> +                        * the check for non-compressed extents.
> +                        */
> +                       if (em->orig_start != em->start - em->offset)
> +                               dump_extent_map(
> +                               "mismatch orig_start/offset/start", em);
> +
> +               } else {
> +                       if (em->block_start != em->disk_bytenr + em->offset)

You can combine this as:

} else if (em->block_start != em->disk_bytenr + em->offset) {

Which is clear and helps reduce indentation.

> +                               dump_extent_map(
> +                               "mismatch block_start/disk_bytenr/offset", em);
> +               }
> +       } else {
> +               if (em->offset)

Same here, can be combined.

> +                       dump_extent_map("non-zero offset for hole/inline", em);
> +       }
> +}
> +
>  static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
>  {
>         struct extent_map_tree *tree = &inode->extent_tree;
> @@ -320,6 +380,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
>                                 merge_ondisk_extents(merge, em);
>                         em->flags |= EXTENT_FLAG_MERGED;
>
> +                       validate_extent_map(em);
>                         rb_erase_cached(&merge->rb_node, &tree->map);
>                         RB_CLEAR_NODE(&merge->rb_node);
>                         free_extent_map(merge);
> @@ -335,6 +396,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
>                 em->block_len += merge->block_len;
>                 if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
>                         merge_ondisk_extents(em, merge);
> +               validate_extent_map(em);
>                 rb_erase_cached(&merge->rb_node, &tree->map);
>                 RB_CLEAR_NODE(&merge->rb_node);
>                 em->generation = max(em->generation, merge->generation);
> @@ -446,6 +508,7 @@ static int add_extent_mapping(struct btrfs_inode *inode,
>
>         lockdep_assert_held_write(&tree->lock);
>
> +       validate_extent_map(em);
>         ret = tree_insert(&tree->map, em);
>         if (ret)
>                 return ret;
> @@ -553,6 +616,9 @@ static void replace_extent_mapping(struct btrfs_inode *inode,
>
>         lockdep_assert_held_write(&tree->lock);
>
> +       validate_extent_map(cur);

Why validate the extent map we are throwing away?
We have already validated it when we inserted it or after we merged it.

Thanks.

> +       validate_extent_map(new);
> +
>         WARN_ON(cur->flags & EXTENT_FLAG_PINNED);
>         ASSERT(extent_map_in_tree(cur));
>         if (!(cur->flags & EXTENT_FLAG_LOGGING))
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 8b24bb5a0aa1..0eb737507d12 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2911,9 +2911,13 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
>                 return -ENOMEM;
>
>         em->start = start;
> +       em->orig_start = start;
>         em->len = end + 1 - start;
>         em->block_len = em->len;
>         em->block_start = block_start;
> +       em->disk_bytenr = block_start;
> +       em->disk_num_bytes = em->len;
> +       em->ram_bytes = em->len;
>         em->flags |= EXTENT_FLAG_PINNED;
>
>         lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
> diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
> index ba36794ba2d5..8c683eed9f27 100644
> --- a/fs/btrfs/tests/extent-map-tests.c
> +++ b/fs/btrfs/tests/extent-map-tests.c
> @@ -78,6 +78,9 @@ static int test_case_1(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         em->len = SZ_16K;
>         em->block_start = 0;
>         em->block_len = SZ_16K;
> +       em->disk_bytenr = 0;
> +       em->disk_num_bytes = SZ_16K;
> +       em->ram_bytes = SZ_16K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -96,9 +99,13 @@ static int test_case_1(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         }
>
>         em->start = SZ_16K;
> +       em->orig_start = SZ_16K;
>         em->len = SZ_4K;
>         em->block_start = SZ_32K; /* avoid merging */
>         em->block_len = SZ_4K;
> +       em->disk_bytenr = SZ_32K; /* avoid merging */
> +       em->disk_num_bytes = SZ_4K;
> +       em->ram_bytes = SZ_4K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -117,9 +124,13 @@ static int test_case_1(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>
>         /* Add [0, 8K), should return [0, 16K) instead. */
>         em->start = start;
> +       em->orig_start = start;
>         em->len = len;
>         em->block_start = start;
>         em->block_len = len;
> +       em->disk_bytenr = start;
> +       em->disk_num_bytes = len;
> +       em->ram_bytes = len;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -174,6 +185,9 @@ static int test_case_2(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         em->len = SZ_1K;
>         em->block_start = EXTENT_MAP_INLINE;
>         em->block_len = (u64)-1;
> +       em->disk_bytenr = EXTENT_MAP_INLINE;
> +       em->disk_num_bytes = 0;
> +       em->ram_bytes = SZ_1K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -192,9 +206,13 @@ static int test_case_2(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         }
>
>         em->start = SZ_4K;
> +       em->orig_start = SZ_4K;
>         em->len = SZ_4K;
>         em->block_start = SZ_4K;
>         em->block_len = SZ_4K;
> +       em->disk_bytenr = SZ_4K;
> +       em->disk_num_bytes = SZ_4K;
> +       em->ram_bytes = SZ_4K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -216,6 +234,9 @@ static int test_case_2(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         em->len = SZ_1K;
>         em->block_start = EXTENT_MAP_INLINE;
>         em->block_len = (u64)-1;
> +       em->disk_bytenr = EXTENT_MAP_INLINE;
> +       em->disk_num_bytes = 0;
> +       em->ram_bytes = SZ_1K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -262,9 +283,13 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
>
>         /* Add [4K, 8K) */
>         em->start = SZ_4K;
> +       em->orig_start = SZ_4K;
>         em->len = SZ_4K;
>         em->block_start = SZ_4K;
>         em->block_len = SZ_4K;
> +       em->disk_bytenr = SZ_4K;
> +       em->disk_num_bytes = SZ_4K;
> +       em->ram_bytes = SZ_4K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -286,6 +311,9 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
>         em->len = SZ_16K;
>         em->block_start = 0;
>         em->block_len = SZ_16K;
> +       em->disk_bytenr = 0;
> +       em->disk_num_bytes = SZ_16K;
> +       em->ram_bytes = SZ_16K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, start, len);
>         write_unlock(&em_tree->lock);
> @@ -372,6 +400,9 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
>         em->len = SZ_8K;
>         em->block_start = 0;
>         em->block_len = SZ_8K;
> +       em->disk_bytenr = 0;
> +       em->disk_num_bytes = SZ_8K;
> +       em->ram_bytes = SZ_8K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -390,9 +421,13 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
>
>         /* Add [8K, 32K) */
>         em->start = SZ_8K;
> +       em->orig_start = SZ_8K;
>         em->len = 24 * SZ_1K;
>         em->block_start = SZ_16K; /* avoid merging */
>         em->block_len = 24 * SZ_1K;
> +       em->disk_bytenr = SZ_16K; /* avoid merging */
> +       em->disk_num_bytes = 24 * SZ_1K;
> +       em->ram_bytes = 24 * SZ_1K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -410,9 +445,13 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
>         }
>         /* Add [0K, 32K) */
>         em->start = 0;
> +       em->orig_start = 0;
>         em->len = SZ_32K;
>         em->block_start = 0;
>         em->block_len = SZ_32K;
> +       em->disk_bytenr = 0;
> +       em->disk_num_bytes = SZ_32K;
> +       em->ram_bytes = SZ_32K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, start, len);
>         write_unlock(&em_tree->lock);
> @@ -494,9 +533,13 @@ static int add_compressed_extent(struct btrfs_inode *inode,
>         }
>
>         em->start = start;
> +       em->orig_start = start;
>         em->len = len;
>         em->block_start = block_start;
>         em->block_len = SZ_4K;
> +       em->disk_bytenr = block_start;
> +       em->disk_num_bytes = SZ_4K;
> +       em->ram_bytes = len;
>         em->flags |= EXTENT_FLAG_COMPRESS_ZLIB;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
> @@ -715,9 +758,13 @@ static int test_case_6(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         }
>
>         em->start = SZ_4K;
> +       em->orig_start = SZ_4K;
>         em->len = SZ_4K;
>         em->block_start = SZ_16K;
>         em->block_len = SZ_16K;
> +       em->disk_bytenr = SZ_16K;
> +       em->disk_num_bytes = SZ_16K;
> +       em->ram_bytes = SZ_16K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, 0, SZ_8K);
>         write_unlock(&em_tree->lock);
> @@ -771,7 +818,10 @@ static int test_case_7(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>         em->len = SZ_16K;
>         em->block_start = 0;
>         em->block_len = SZ_4K;
> -       em->flags |= EXTENT_FLAG_PINNED;
> +       em->disk_bytenr = 0;
> +       em->disk_num_bytes = SZ_4K;
> +       em->ram_bytes = SZ_16K;
> +       em->flags |= (EXTENT_FLAG_PINNED | EXTENT_FLAG_COMPRESS_ZLIB);
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> @@ -790,9 +840,13 @@ static int test_case_7(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
>
>         /* [32K, 48K), not pinned */
>         em->start = SZ_32K;
> +       em->orig_start = SZ_32K;
>         em->len = SZ_16K;
>         em->block_start = SZ_32K;
>         em->block_len = SZ_16K;
> +       em->disk_bytenr = SZ_32K;
> +       em->disk_num_bytes = SZ_16K;
> +       em->ram_bytes = SZ_16K;
>         write_lock(&em_tree->lock);
>         ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
>         write_unlock(&em_tree->lock);
> diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
> index 99da9d34b77a..0895c6e06812 100644
> --- a/fs/btrfs/tests/inode-tests.c
> +++ b/fs/btrfs/tests/inode-tests.c
> @@ -117,7 +117,7 @@ static void setup_file_extents(struct btrfs_root *root, u32 sectorsize)
>
>         /* Now for a regular extent */
>         insert_extent(root, offset, sectorsize - 1, sectorsize - 1, 0,
> -                     disk_bytenr, sectorsize, BTRFS_FILE_EXTENT_REG, 0, slot);
> +                     disk_bytenr, sectorsize - 1, BTRFS_FILE_EXTENT_REG, 0, slot);
>         slot++;
>         disk_bytenr += sectorsize;
>         offset += sectorsize - 1;
> --
> 2.45.0
>
>
Qu Wenruo May 13, 2024, 10:34 p.m. UTC | #2
在 2024/5/13 21:51, Filipe Manana 写道:
> On Fri, May 3, 2024 at 7:02 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Since extent_map structure has the all the needed members to represent a
>> file extent directly, we can apply all the file extent sanity checks to an extent
>> map.
>>
>> The new sanity checks would cross check both the old members
>> (block_start/block_len/orig_start) and the new members
>> (disk_bytenr/disk_num_bytes/offset).
>>
>> There is a special case for offset/orig_start/start cross check, we only
>> do such sanity check for compressed extent:
>>
>> - Only compressed read/encoded write really utilize orig_start
>>    This can be proved by the cleanup patch of orig_start.
>>
>> - Merged data extents can lead to false alerts
>>    The problem is, with disk_bytenr/disk_num_bytes, if we're merging
>>    two extent maps like this:
>>
>>      |<- data extent A -->|<-- data extent B -->|
>>                |<- em 1 ->|<- em 2 ->|
>>
>>    Let's assume em2 has orig_offset of 0 and start of 0, and obvisouly
>>    offset 0.
>
> obvisouly -> obviously
>
> I'm confused. How can em2 have a "start" of 0?
> By "start" you mean file offset I suppose, since that's what it is
> before this patchset. That would mean em 1 would have a negative
> "start".

My bad, I screwed up the example numbers.

For "offset" I mean the the offset inside the data extent, aka,
"btrfs_file_extent_item::offset"
And for "start" I mean filepos, aka the key.offset.

And in above case, it's all dependent on the filepos of the em1/em2.

I would change the example to something like this:

em1: file pos 4k, len 4k, extent offset 4k, ram bytes 8k, disk_bytenr X,
disk_num_bytes 8K, compress none.

Then it's orig_start should be 0.

em2: file pos 8k, len 4k, extent offset 0, ram_bytes 8K,
disk_bytenr X + 8K, disk_num_bytes 8K, compress none.

And its orig_start should be 8K.

>
> And by "offset" you mean extent offset?
>
>>
>>    But after merging, the merged em would have offset of em1, screwing up
>
> But this suggests that by "offset" you mean file offset and not the
> offset within an extent's range.

No, I still mean "extent offset".

As the "merged" em would be:

em merged: filepos 4k, len 8K, extent offset 4K, ram_bytes 16K,
disk_bytenr X, disk_num_bytes 16K, compress none.

Thus the orig_offset would be still be 0, the same as the em1 before
merging.

> So did you mean "start" here?
>
>>    whatever the @orig_start cross check against @start.
>>
>> The checks happens at the following timing:
>>
>> - add_extent_mapping()
>>    This is for newly added extent map
>>
>> - replace_extent_mapping()
>>    This is for btrfs_drop_extent_map_range() and split_extent_map()
>>
>> - try_merge_map()
>>
>> Since the check is way more strict than before, the following code has
>> to be modified to pass the check:
>>
>> - extent-map-tests
>>    Previously the test case never populate ram_bytes, not to mention the
>>    newly introduced disk_bytenr/disk_num_bytes.
>>    Populate the involved numbers mostly to follow the existing
>>    block_start/block_len values.
>>
>>    There are two special cases worth mentioning:
>>    - test_case_3()
>>      The test case is already way too invalid that tree-checker will
>>      reject almost all extents.
>>
>>      And there is a special unaligned regular extent which has mismatch
>>      disk_num_bytes (4096) and ram_bytes (4096 - 1).
>>      Fix it by all assigned the disk_num_bytes and ram_bytes to 4096 - 1.
>
> Looking at the diff for test_case_3(), I don't see any assignment of
> 4096 - 1 anywhere.
> All the assignments I see have a value of SZ_4K or SZ_16K.

It's in fact setup_file_extents() from inode-tests.c, I got confused
with other fixes.

>
>>
>>    - test_case_7()
>>      An extent is inserted with 16K length, but on-disk extent size is
>>      only 4K.
>>      This means it must be a compressed extent, so set the compressed flag
>>      for it.
>>
>> - setup_relocation_extent_mapping()
>>    This is mostly utilized by relocation code to read the chunk like an
>>    inode.
>
> Not "mostly" - it's exclusively used by the relocation code.
>
>>    So populate the extent map using a regular non-compressed extent.
>
> Isn't that what we already do? I'm confused.
> We are already creating a regular non-compressed extent, and all the
> diff does is to initialize some fields unrelated to that.

It's more like initializing the remaining members, or those
uninitialized members would not pass cross-check.

I guess I should put the missing initialization where I introduced them?

>
>>
>> In fact, the new cross checks already exposed a bug in
>> btrfs_drop_extent_map_range(), and caught tons of bugs in the new
>> members assignment.
>
> I remember one bug in btrfs_drop_extent_map_range(), which fortunately
> didn't have any consequences.
> Those tons of bugs you mention are only in the code added by the
> patchset if I understand you correctly.

Isn't that why I mention is as "in the new members assignment"?

Or should I just drop the mention completely?

Thanks,
Qu
diff mbox series

Patch

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 4d4ac9fc43e2..8d0e257fc113 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -284,6 +284,66 @@  static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *nex
 	next->offset = new_offset;
 }
 
+static void dump_extent_map(const char *prefix, struct extent_map *em)
+{
+	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
+		return;
+	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu orig_start=%llu block_start=%llu block_len=%llu flags=0x%x\n",
+		prefix, em->start, em->len, em->disk_bytenr, em->disk_num_bytes,
+		em->ram_bytes, em->offset, em->orig_start, em->block_start,
+		em->block_len, em->flags);
+	ASSERT(0);
+}
+
+/* Internal sanity checks for btrfs debug builds. */
+static void validate_extent_map(struct extent_map *em)
+{
+	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
+		return;
+	if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
+		if (em->disk_num_bytes == 0)
+			dump_extent_map("zero disk_num_bytes", em);
+		if (em->offset + em->len > em->ram_bytes)
+			dump_extent_map("ram_bytes too small", em);
+		if (em->offset + em->len > em->disk_num_bytes &&
+		    !extent_map_is_compressed(em))
+			dump_extent_map("disk_num_bytes too small", em);
+
+		if (extent_map_is_compressed(em)) {
+			if (em->block_start != em->disk_bytenr)
+				dump_extent_map(
+				"mismatch block_start/disk_bytenr/offset", em);
+			if (em->disk_num_bytes != em->block_len)
+				dump_extent_map(
+				"mismatch disk_num_bytes/block_len", em);
+			/*
+			 * Here we only check the start/orig_start/offset for
+			 * compressed extents.
+			 * This is because em::offset is always based on the
+			 * referred data extent, which can be merged.
+			 *
+			 * In that case, @offset would no longer match
+			 * em::start - em::orig_start, and cause false alert.
+			 *
+			 * Thankfully only compressed extent read/encoded write
+			 * really bothers @orig_start, so we can skip
+			 * the check for non-compressed extents.
+			 */
+			if (em->orig_start != em->start - em->offset)
+				dump_extent_map(
+				"mismatch orig_start/offset/start", em);
+
+		} else {
+			if (em->block_start != em->disk_bytenr + em->offset)
+				dump_extent_map(
+				"mismatch block_start/disk_bytenr/offset", em);
+		}
+	} else {
+		if (em->offset)
+			dump_extent_map("non-zero offset for hole/inline", em);
+	}
+}
+
 static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
 {
 	struct extent_map_tree *tree = &inode->extent_tree;
@@ -320,6 +380,7 @@  static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
 				merge_ondisk_extents(merge, em);
 			em->flags |= EXTENT_FLAG_MERGED;
 
+			validate_extent_map(em);
 			rb_erase_cached(&merge->rb_node, &tree->map);
 			RB_CLEAR_NODE(&merge->rb_node);
 			free_extent_map(merge);
@@ -335,6 +396,7 @@  static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
 		em->block_len += merge->block_len;
 		if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
 			merge_ondisk_extents(em, merge);
+		validate_extent_map(em);
 		rb_erase_cached(&merge->rb_node, &tree->map);
 		RB_CLEAR_NODE(&merge->rb_node);
 		em->generation = max(em->generation, merge->generation);
@@ -446,6 +508,7 @@  static int add_extent_mapping(struct btrfs_inode *inode,
 
 	lockdep_assert_held_write(&tree->lock);
 
+	validate_extent_map(em);
 	ret = tree_insert(&tree->map, em);
 	if (ret)
 		return ret;
@@ -553,6 +616,9 @@  static void replace_extent_mapping(struct btrfs_inode *inode,
 
 	lockdep_assert_held_write(&tree->lock);
 
+	validate_extent_map(cur);
+	validate_extent_map(new);
+
 	WARN_ON(cur->flags & EXTENT_FLAG_PINNED);
 	ASSERT(extent_map_in_tree(cur));
 	if (!(cur->flags & EXTENT_FLAG_LOGGING))
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 8b24bb5a0aa1..0eb737507d12 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2911,9 +2911,13 @@  static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 		return -ENOMEM;
 
 	em->start = start;
+	em->orig_start = start;
 	em->len = end + 1 - start;
 	em->block_len = em->len;
 	em->block_start = block_start;
+	em->disk_bytenr = block_start;
+	em->disk_num_bytes = em->len;
+	em->ram_bytes = em->len;
 	em->flags |= EXTENT_FLAG_PINNED;
 
 	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index ba36794ba2d5..8c683eed9f27 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -78,6 +78,9 @@  static int test_case_1(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	em->len = SZ_16K;
 	em->block_start = 0;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -96,9 +99,13 @@  static int test_case_1(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	}
 
 	em->start = SZ_16K;
+	em->orig_start = SZ_16K;
 	em->len = SZ_4K;
 	em->block_start = SZ_32K; /* avoid merging */
 	em->block_len = SZ_4K;
+	em->disk_bytenr = SZ_32K; /* avoid merging */
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_4K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -117,9 +124,13 @@  static int test_case_1(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 
 	/* Add [0, 8K), should return [0, 16K) instead. */
 	em->start = start;
+	em->orig_start = start;
 	em->len = len;
 	em->block_start = start;
 	em->block_len = len;
+	em->disk_bytenr = start;
+	em->disk_num_bytes = len;
+	em->ram_bytes = len;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -174,6 +185,9 @@  static int test_case_2(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	em->len = SZ_1K;
 	em->block_start = EXTENT_MAP_INLINE;
 	em->block_len = (u64)-1;
+	em->disk_bytenr = EXTENT_MAP_INLINE;
+	em->disk_num_bytes = 0;
+	em->ram_bytes = SZ_1K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -192,9 +206,13 @@  static int test_case_2(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	}
 
 	em->start = SZ_4K;
+	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
 	em->block_len = SZ_4K;
+	em->disk_bytenr = SZ_4K;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_4K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -216,6 +234,9 @@  static int test_case_2(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	em->len = SZ_1K;
 	em->block_start = EXTENT_MAP_INLINE;
 	em->block_len = (u64)-1;
+	em->disk_bytenr = EXTENT_MAP_INLINE;
+	em->disk_num_bytes = 0;
+	em->ram_bytes = SZ_1K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -262,9 +283,13 @@  static int __test_case_3(struct btrfs_fs_info *fs_info,
 
 	/* Add [4K, 8K) */
 	em->start = SZ_4K;
+	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
 	em->block_len = SZ_4K;
+	em->disk_bytenr = SZ_4K;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_4K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -286,6 +311,9 @@  static int __test_case_3(struct btrfs_fs_info *fs_info,
 	em->len = SZ_16K;
 	em->block_start = 0;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, start, len);
 	write_unlock(&em_tree->lock);
@@ -372,6 +400,9 @@  static int __test_case_4(struct btrfs_fs_info *fs_info,
 	em->len = SZ_8K;
 	em->block_start = 0;
 	em->block_len = SZ_8K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_8K;
+	em->ram_bytes = SZ_8K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -390,9 +421,13 @@  static int __test_case_4(struct btrfs_fs_info *fs_info,
 
 	/* Add [8K, 32K) */
 	em->start = SZ_8K;
+	em->orig_start = SZ_8K;
 	em->len = 24 * SZ_1K;
 	em->block_start = SZ_16K; /* avoid merging */
 	em->block_len = 24 * SZ_1K;
+	em->disk_bytenr = SZ_16K; /* avoid merging */
+	em->disk_num_bytes = 24 * SZ_1K;
+	em->ram_bytes = 24 * SZ_1K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -410,9 +445,13 @@  static int __test_case_4(struct btrfs_fs_info *fs_info,
 	}
 	/* Add [0K, 32K) */
 	em->start = 0;
+	em->orig_start = 0;
 	em->len = SZ_32K;
 	em->block_start = 0;
 	em->block_len = SZ_32K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_32K;
+	em->ram_bytes = SZ_32K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, start, len);
 	write_unlock(&em_tree->lock);
@@ -494,9 +533,13 @@  static int add_compressed_extent(struct btrfs_inode *inode,
 	}
 
 	em->start = start;
+	em->orig_start = start;
 	em->len = len;
 	em->block_start = block_start;
 	em->block_len = SZ_4K;
+	em->disk_bytenr = block_start;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = len;
 	em->flags |= EXTENT_FLAG_COMPRESS_ZLIB;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
@@ -715,9 +758,13 @@  static int test_case_6(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	}
 
 	em->start = SZ_4K;
+	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_16K;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = SZ_16K;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, 0, SZ_8K);
 	write_unlock(&em_tree->lock);
@@ -771,7 +818,10 @@  static int test_case_7(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 	em->len = SZ_16K;
 	em->block_start = 0;
 	em->block_len = SZ_4K;
-	em->flags |= EXTENT_FLAG_PINNED;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_16K;
+	em->flags |= (EXTENT_FLAG_PINNED | EXTENT_FLAG_COMPRESS_ZLIB);
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -790,9 +840,13 @@  static int test_case_7(struct btrfs_fs_info *fs_info, struct btrfs_inode *inode)
 
 	/* [32K, 48K), not pinned */
 	em->start = SZ_32K;
+	em->orig_start = SZ_32K;
 	em->len = SZ_16K;
 	em->block_start = SZ_32K;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = SZ_32K;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(inode, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index 99da9d34b77a..0895c6e06812 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -117,7 +117,7 @@  static void setup_file_extents(struct btrfs_root *root, u32 sectorsize)
 
 	/* Now for a regular extent */
 	insert_extent(root, offset, sectorsize - 1, sectorsize - 1, 0,
-		      disk_bytenr, sectorsize, BTRFS_FILE_EXTENT_REG, 0, slot);
+		      disk_bytenr, sectorsize - 1, BTRFS_FILE_EXTENT_REG, 0, slot);
 	slot++;
 	disk_bytenr += sectorsize;
 	offset += sectorsize - 1;