mbox series

[v2,0/2] btrfs: fix btrfs_file_extent_item::ram_bytes of btrfs_split_ordered_extent()

Message ID cover.1713329516.git.wqu@suse.com (mailing list archive)
Headers show
Series btrfs: fix btrfs_file_extent_item::ram_bytes of btrfs_split_ordered_extent() | expand

Message

Qu Wenruo April 17, 2024, 4:54 a.m. UTC
[CHANGELOG]
v2:
- Update the comment on file extent item tree-checker 
  To be less confusing for future readers.

- Remove one fixes tag of the first patch
  The bug goes back to the introduction of zoned ordered extent
  splitting, thus that oldest commit should be the cause.

During my extent_map members rework, I added a sanity check to make sure
regular non-compressed extent_map would have its disk_num_bytes to match
ram_bytes.

But that extent_map sanity check always fail as we have on-disk file
extent items which has its ram_bytes much larger than the corresponding
disk_num_bytes, even if it's not compressed.

It turns out that, the ram_bytes > disk_num_bytes is caused by
btrfs_split_ordered_extent(), where it doesn't properly update
ram_bytes, resulting it larger than disk_num_bytes.

Thankfully everything is fine, as our code doesn't really bother
ram_bytes for non-compressed regular file extents, so no real damage.

Still I'd like to catch such problem in the future, so add another
tree-checker patch for this case.

And since the invalid ram_bytes is already in the wild for a while, we
do not want to bother the end users to fix their fs for nothing.
So the check is only behind DEBUG builds.

Furthermore, the tree-checker is only to make sure @ram_bytes <
@disk_num_bytes for non-compressed file extents.
As we still have other locations to make @ram_bytes < @disk_num_bytes.

And for btrfs-progs, I'm going to add extra check and repair support
soon.

Qu Wenruo (2):
  btrfs: set correct ram_bytes when splitting ordered extent
  btrfs: tree-checker: add one extra file extent item ram_bytes check

 fs/btrfs/ordered-data.c |  1 +
 fs/btrfs/tree-checker.c | 35 +++++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 6 deletions(-)

Comments

Johannes Thumshirn April 17, 2024, 9:06 a.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
David Sterba April 19, 2024, 5:29 p.m. UTC | #2
On Wed, Apr 17, 2024 at 02:24:37PM +0930, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Update the comment on file extent item tree-checker 
>   To be less confusing for future readers.
> 
> - Remove one fixes tag of the first patch
>   The bug goes back to the introduction of zoned ordered extent
>   splitting, thus that oldest commit should be the cause.
> 
> During my extent_map members rework, I added a sanity check to make sure
> regular non-compressed extent_map would have its disk_num_bytes to match
> ram_bytes.
> 
> But that extent_map sanity check always fail as we have on-disk file
> extent items which has its ram_bytes much larger than the corresponding
> disk_num_bytes, even if it's not compressed.
> 
> It turns out that, the ram_bytes > disk_num_bytes is caused by
> btrfs_split_ordered_extent(), where it doesn't properly update
> ram_bytes, resulting it larger than disk_num_bytes.
> 
> Thankfully everything is fine, as our code doesn't really bother
> ram_bytes for non-compressed regular file extents, so no real damage.
> 
> Still I'd like to catch such problem in the future, so add another
> tree-checker patch for this case.
> 
> And since the invalid ram_bytes is already in the wild for a while, we
> do not want to bother the end users to fix their fs for nothing.
> So the check is only behind DEBUG builds.

For testing this is OK, but would it make sense to fix the wrong values
automatically?
Qu Wenruo April 19, 2024, 10 p.m. UTC | #3
在 2024/4/20 02:59, David Sterba 写道:
> On Wed, Apr 17, 2024 at 02:24:37PM +0930, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Update the comment on file extent item tree-checker
>>    To be less confusing for future readers.
>>
>> - Remove one fixes tag of the first patch
>>    The bug goes back to the introduction of zoned ordered extent
>>    splitting, thus that oldest commit should be the cause.
>>
>> During my extent_map members rework, I added a sanity check to make sure
>> regular non-compressed extent_map would have its disk_num_bytes to match
>> ram_bytes.
>>
>> But that extent_map sanity check always fail as we have on-disk file
>> extent items which has its ram_bytes much larger than the corresponding
>> disk_num_bytes, even if it's not compressed.
>>
>> It turns out that, the ram_bytes > disk_num_bytes is caused by
>> btrfs_split_ordered_extent(), where it doesn't properly update
>> ram_bytes, resulting it larger than disk_num_bytes.
>>
>> Thankfully everything is fine, as our code doesn't really bother
>> ram_bytes for non-compressed regular file extents, so no real damage.
>>
>> Still I'd like to catch such problem in the future, so add another
>> tree-checker patch for this case.
>>
>> And since the invalid ram_bytes is already in the wild for a while, we
>> do not want to bother the end users to fix their fs for nothing.
>> So the check is only behind DEBUG builds.
>
> For testing this is OK, but would it make sense to fix the wrong values
> automatically?
>

I'm also thinking about this, two alternatives, but neither is perfect:

- Fix the value at eb level, aka, updates the ram_bytes directly at read
   time
   This should fix the problem forever, but it has its own problems
   related to read-only mounts.
   If we do not do the fix for RO mounts, then remounting to RW, and we
   would have cached result untouched.
   But if we do, we're modifying ebs, how to write them back for a full
   RO mounts (e.g. rescue=all)?

- Only fix the extent_map::ram_bytes value
   This is not going to change anything on-disk.

And considering this "problem" has no real chance for corruption, I do
not believe we need all those hassles.

My plan to repair the mismatch would all be inside btrfs-progs, after I
have pinned down other call sites resulting smaller ram_bytes than
disk_num_bytes.

Thanks,
Qu
David Sterba April 19, 2024, 10:44 p.m. UTC | #4
On Sat, Apr 20, 2024 at 07:30:39AM +0930, Qu Wenruo wrote:
> >> And since the invalid ram_bytes is already in the wild for a while, we
> >> do not want to bother the end users to fix their fs for nothing.
> >> So the check is only behind DEBUG builds.
> >
> > For testing this is OK, but would it make sense to fix the wrong values
> > automatically?
> >
> 
> I'm also thinking about this, two alternatives, but neither is perfect:
> 
> - Fix the value at eb level, aka, updates the ram_bytes directly at read
>    time
>    This should fix the problem forever, but it has its own problems
>    related to read-only mounts.
>    If we do not do the fix for RO mounts, then remounting to RW, and we
>    would have cached result untouched.
>    But if we do, we're modifying ebs, how to write them back for a full
>    RO mounts (e.g. rescue=all)?
> 
> - Only fix the extent_map::ram_bytes value
>    This is not going to change anything on-disk.
> 
> And considering this "problem" has no real chance for corruption, I do
> not believe we need all those hassles.
> 
> My plan to repair the mismatch would all be inside btrfs-progs, after I
> have pinned down other call sites resulting smaller ram_bytes than
> disk_num_bytes.

I see, so the only chance to fix it on disk is to change the extent for
some other reason. Otherwise we can only make sure that an in-memory
value is correct so it does not propagate further.
David Sterba April 23, 2024, 2:17 p.m. UTC | #5
On Wed, Apr 17, 2024 at 02:24:37PM +0930, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Update the comment on file extent item tree-checker 
>   To be less confusing for future readers.
> 
> - Remove one fixes tag of the first patch
>   The bug goes back to the introduction of zoned ordered extent
>   splitting, thus that oldest commit should be the cause.
> 
> During my extent_map members rework, I added a sanity check to make sure
> regular non-compressed extent_map would have its disk_num_bytes to match
> ram_bytes.
> 
> But that extent_map sanity check always fail as we have on-disk file
> extent items which has its ram_bytes much larger than the corresponding
> disk_num_bytes, even if it's not compressed.
> 
> It turns out that, the ram_bytes > disk_num_bytes is caused by
> btrfs_split_ordered_extent(), where it doesn't properly update
> ram_bytes, resulting it larger than disk_num_bytes.
> 
> Thankfully everything is fine, as our code doesn't really bother
> ram_bytes for non-compressed regular file extents, so no real damage.
> 
> Still I'd like to catch such problem in the future, so add another
> tree-checker patch for this case.
> 
> And since the invalid ram_bytes is already in the wild for a while, we
> do not want to bother the end users to fix their fs for nothing.
> So the check is only behind DEBUG builds.
> 
> Furthermore, the tree-checker is only to make sure @ram_bytes <
> @disk_num_bytes for non-compressed file extents.
> As we still have other locations to make @ram_bytes < @disk_num_bytes.
> 
> And for btrfs-progs, I'm going to add extra check and repair support
> soon.
> 
> Qu Wenruo (2):
>   btrfs: set correct ram_bytes when splitting ordered extent
>   btrfs: tree-checker: add one extra file extent item ram_bytes check

Reviewed-by: David Sterba <dsterba@suse.com>