[v2,0/3] btrfs-progs: mkfs: make sure we can clean up all temporary chunks

Message ID 20211011120650.179017-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo Oct. 11, 2021, 12:06 p.m. UTC
There is a bug report that with certain mkfs options, mkfs.btrfs may
fail to clean up some temporary chunks, leading to "btrfs filesystem df"
warning about multiple profiles:

  WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
  WARNING:   Metadata: single, raid1 

The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
-d dup".

It turns out that the old _recow_root() cannot handle tree levels > 0,
while with the newer free space tree creation timing, the free space
tree can reach level 1 or higher.

To fix the problem, patch 2 does a proper full tree re-CoW, with an
extra transaction commit to make sure all free space tree blocks get
re-CoWed.
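
As a rough illustration of why the tree level matters, here is a toy
model in Python. It is not the btrfs-progs code: the Block class, the
"temporary"/"final" chunk labels and both recow functions are
hypothetical, and the sketch assumes that one CoW search only relocates
the blocks on a single root-to-leaf path. Under that assumption, one
search is enough for a level 0 tree, but leaves blocks behind once the
tree has more than one leaf.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy tree block; 'chunk' says where the block currently lives."""
    level: int
    children: list = field(default_factory=list)
    chunk: str = "temporary"


def iter_paths(block, prefix=()):
    """Yield every root-to-leaf path as a tuple of blocks."""
    path = prefix + (block,)
    if block.level == 0:
        yield path
        return
    for child in block.children:
        yield from iter_paths(child, path)


def cow_path(path):
    """Model CoW of one path: every block on it moves to a final chunk."""
    for block in path:
        block.chunk = "final"


def recow_first_path_only(root):
    """Assumed old behavior: a single search, so a single path is CoWed."""
    cow_path(next(iter_paths(root)))


def recow_every_leaf(root):
    """Full re-CoW: visit each leaf in turn and CoW the path leading to it."""
    for path in iter_paths(root):
        cow_path(path)


def all_blocks(block):
    yield block
    for child in block.children:
        yield from all_blocks(child)


def count_temporary(root):
    """Number of blocks still sitting in a temporary chunk."""
    return sum(1 for b in all_blocks(root) if b.chunk == "temporary")


def make_tree(fanout):
    """Build a level 1 tree: one root node with 'fanout' leaves."""
    return Block(level=1, children=[Block(level=0) for _ in range(fanout)])
```

Calling recow_first_path_only() on make_tree(3) leaves two leaves in
temporary chunks, while recow_every_leaf() leaves none. Any block left
behind keeps its temporary chunk alive, which is what shows up as an
extra profile.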

The 3rd patch adds the extra verification to mkfs-tests.
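
The check itself is not shown in this cover letter. Purely as a sketch
of the idea, the Python snippet below scans "btrfs filesystem df" style
output and reports block group types that show up with more than one
profile. The function names, the assumed line shape and the sample text
are illustrative assumptions, not taken from the patch.

```python
import re

# Assumed line shape of "btrfs filesystem df" output, for example:
#   Metadata, DUP: total=256.00MiB, used=128.00KiB
LINE_RE = re.compile(r"^(\w+), (\w+): total=")


def profiles_by_type(df_output):
    """Map each block group type to the set of profiles reported for it."""
    profiles = {}
    for line in df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        bg_type, profile = match.groups()
        # GlobalReserve is a space reservation, not a chunk type.
        if bg_type == "GlobalReserve":
            continue
        profiles.setdefault(bg_type, set()).add(profile)
    return profiles


def mixed_profiles(df_output):
    """Return only the types that have more than one profile."""
    return {
        bg_type: sorted(found)
        for bg_type, found in profiles_by_type(df_output).items()
        if len(found) > 1
    }


# Made-up sample: a leftover temporary "single" metadata chunk
# next to the requested DUP one.
SAMPLE = """\
Data, DUP: total=8.00MiB, used=0.00B
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=128.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=3.25MiB, used=0.00B
"""

if __name__ == "__main__":
    print(mixed_profiles(SAMPLE))
```

An empty result means every type has exactly one profile, i.e. no
temporary chunk survived mkfs.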

The first patch just fixes a confusingly named parameter, which also
caused a u64 -> int width reduction that could become problematic in
the future.
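
To make the width reduction concrete, the sketch below uses Python's
ctypes to mimic what a C int keeps when it is handed a 64-bit value.
BLOCK_GROUP_DUP uses the on-disk flag value (1 << 5), but
HYPOTHETICAL_FUTURE_FLAG is invented for illustration; no such flag
exists today, which is why the issue is only a future concern.

```python
import ctypes

BLOCK_GROUP_DUP = 1 << 5             # existing profile bit, fits in an int
HYPOTHETICAL_FUTURE_FLAG = 1 << 40   # invented: a bit above the 32-bit range


def as_c_int(value):
    """Return what a C int holds after a u64 is implicitly narrowed to it.

    ctypes does no overflow checking, so the upper bits are silently dropped.
    """
    return ctypes.c_int(value).value


def survives_narrowing(flags):
    """True if passing 'flags' through an int parameter loses nothing."""
    return as_c_int(flags) == flags


if __name__ == "__main__":
    print(survives_narrowing(BLOCK_GROUP_DUP))
    print(survives_narrowing(BLOCK_GROUP_DUP | HYPOTHETICAL_FUTURE_FLAG))
```

Today's profile bits pass through unchanged, so the narrowing is
harmless for now; a flag above bit 31 would be dropped without any
warning, which is the risk the rename to a u64 @profile avoids.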

Changelog:
v2:
- Remove a duplicated recow_roots() call in create_raid_groups()
  This call makes no difference, as we later commit the transaction
  and manually call recow_roots() again.
  Remove the duplicated call to save some time.

- Replace btrfs_next_sibling_tree_block() with btrfs_next_leaf()
  Since we're always handling leaves, there is no need for
  btrfs_next_sibling_tree_block().

- Work around a kernel bug which may cause false alerts
  For single device RAID0, the btrfs kernel module does not respect the
  profile, and allocates new chunks using SINGLE instead.
  This can be very noisy and cause false alerts, and it is not always
  reproducible, depending on how fast the kernel creates new chunks.

  Work around it by mounting the filesystem RO before calling
  "btrfs fi df".

  The kernel bug needs to be investigated and fixed.


Qu Wenruo (3):
  btrfs-progs: rename @data parameter to @profile in extent allocation
    path
  btrfs-progs: mkfs: recow all tree blocks properly
  btrfs-progs: mfks-tests: make sure mkfs.btrfs cleans up temporary
    chunks

 kernel-shared/extent-tree.c                 | 26 +++---
 mkfs/main.c                                 | 90 ++++++++++++++++++---
 tests/mkfs-tests/001-basic-profiles/test.sh | 16 +++-
 3 files changed, 104 insertions(+), 28 deletions(-)

Comments

Nikolay Borisov Oct. 11, 2021, 12:10 p.m. UTC | #1
On 11.10.21 г. 15:06, Qu Wenruo wrote:
> There is a bug report that with certain mkfs options, mkfs.btrfs may
> fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
> warning about multiple profiles:
> 
>   WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
>   WARNING:   Metadata: single, raid1 
> 
> The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
> -d dup".
> 
> It turns out that, the old _recow_root() can not handle tree levels > 0,
> while with newer free space tree creation timing, the free space tree
> can reach level 1 or higher.
> 
> To fix the problem, Patch 2 will do the proper full tree re-CoW, with
> extra transaction commitment to make sure all free space tree get
> re-CoWed.
> 
> The 3rd patch will do the extra verification during mkfs-tests.
> 
> The first patch is just to fix a confusing parameter which also caused
> u64 -> int width reduction and can be problematic in the future.
> 
> Changelog:
> v2:
> - Remove a duplicated recow_roots() call in create_raid_groups()
>   This call makes no difference as we will later commit transaction
>   and manually call recow_roots() again.
>   Remove such duplicated call to save some time.
> 
> - Replace the btrfs_next_sibling_tree_block() with btrfs_next_leaf()
>   Since we're always handling leaves, there is no need for
>   btrfs_next_sibling_tree_block()
> 
> - Work around a kernel bug which may cause false alerts
>   For single device RAID0, btrfs kernel is not respecting it, and will
>   allocate new chunks using SINGLE instead.
>   This can be very noisy and cause false alerts, and not always
>   reproducible, depending on how fast kernel creates new chunks.
> 
>   Work around it by mounting the RO before calling "btrfs fi df".
> 
>   The kernel bug needs to be investigated and fixed.
It's better to see the kernel bug fixed rather than papering over it.

> 
> 
> Qu Wenruo (3):
>   btrfs-progs: rename @data parameter to @profile in extent allocation
>     path
>   btrfs-progs: mkfs: recow all tree blocks properly
>   btrfs-progs: mfks-tests: make sure mkfs.btrfs cleans up temporary
>     chunks
> 
>  kernel-shared/extent-tree.c                 | 26 +++---
>  mkfs/main.c                                 | 90 ++++++++++++++++++---
>  tests/mkfs-tests/001-basic-profiles/test.sh | 16 +++-
>  3 files changed, 104 insertions(+), 28 deletions(-)
>
Qu Wenruo Oct. 11, 2021, 12:14 p.m. UTC | #2
On 2021/10/11 20:10, Nikolay Borisov wrote:
>
>
> On 11.10.21 г. 15:06, Qu Wenruo wrote:
>> There is a bug report that with certain mkfs options, mkfs.btrfs may
>> fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
>> warning about multiple profiles:
>>
>>    WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
>>    WARNING:   Metadata: single, raid1
>>
>> The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
>> -d dup".
>>
>> It turns out that, the old _recow_root() can not handle tree levels > 0,
>> while with newer free space tree creation timing, the free space tree
>> can reach level 1 or higher.
>>
>> To fix the problem, Patch 2 will do the proper full tree re-CoW, with
>> extra transaction commitment to make sure all free space tree get
>> re-CoWed.
>>
>> The 3rd patch will do the extra verification during mkfs-tests.
>>
>> The first patch is just to fix a confusing parameter which also caused
>> u64 -> int width reduction and can be problematic in the future.
>>
>> Changelog:
>> v2:
>> - Remove a duplicated recow_roots() call in create_raid_groups()
>>    This call makes no difference as we will later commit transaction
>>    and manually call recow_roots() again.
>>    Remove such duplicated call to save some time.
>>
>> - Replace the btrfs_next_sibling_tree_block() with btrfs_next_leaf()
>>    Since we're always handling leaves, there is no need for
>>    btrfs_next_sibling_tree_block()
>>
>> - Work around a kernel bug which may cause false alerts
>>    For single device RAID0, btrfs kernel is not respecting it, and will
>>    allocate new chunks using SINGLE instead.
>>    This can be very noisy and cause false alerts, and not always
>>    reproducible, depending on how fast kernel creates new chunks.
>>
>>    Work around it by mounting the RO before calling "btrfs fi df".
>>
>>    The kernel bug needs to be investigated and fixed.
> It's better to see the kernel bug fixed rather than papering over it.

That's for sure.

I just got overloaded by so many small bugs in one day.

Will investigate and fix the bug soon.

For the test case itself, mounting RO in fact makes sense: we just
want to check the initial chunk layout created by mkfs.

If we later choose to compare the total chunk size against the reported
values, such an RO mount is a hard requirement to avoid chunk
preallocation.

Thanks,
Qu
>
>>
>>
>> Qu Wenruo (3):
>>    btrfs-progs: rename @data parameter to @profile in extent allocation
>>      path
>>    btrfs-progs: mkfs: recow all tree blocks properly
>>    btrfs-progs: mfks-tests: make sure mkfs.btrfs cleans up temporary
>>      chunks
>>
>>   kernel-shared/extent-tree.c                 | 26 +++---
>>   mkfs/main.c                                 | 90 ++++++++++++++++++---
>>   tests/mkfs-tests/001-basic-profiles/test.sh | 16 +++-
>>   3 files changed, 104 insertions(+), 28 deletions(-)
>>
David Sterba Oct. 11, 2021, 2:05 p.m. UTC | #3
On Mon, Oct 11, 2021 at 08:06:47PM +0800, Qu Wenruo wrote:
> There is a bug report that with certain mkfs options, mkfs.btrfs may
> fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
> warning about multiple profiles:
> 
>   WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
>   WARNING:   Metadata: single, raid1 
> 
> The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
> -d dup".
> 
> It turns out that, the old _recow_root() can not handle tree levels > 0,
> while with newer free space tree creation timing, the free space tree
> can reach level 1 or higher.
> 
> To fix the problem, Patch 2 will do the proper full tree re-CoW, with
> extra transaction commitment to make sure all free space tree get
> re-CoWed.

The extra commit breaks the assumptions of test misc/038, which looks up
the backup roots in particular slot(s). I already had to fix it once due
to the additional commit with the free space tree. Now it broke again;
the test is too fragile, and I'm not sure we want to keep playing
whack-a-mole.
David Sterba Oct. 11, 2021, 2:39 p.m. UTC | #4
On Mon, Oct 11, 2021 at 04:05:18PM +0200, David Sterba wrote:
> On Mon, Oct 11, 2021 at 08:06:47PM +0800, Qu Wenruo wrote:
> > There is a bug report that with certain mkfs options, mkfs.btrfs may
> > fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
> > warning about multiple profiles:
> > 
> >   WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
> >   WARNING:   Metadata: single, raid1 
> > 
> > The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
> > -d dup".
> > 
> > It turns out that, the old _recow_root() can not handle tree levels > 0,
> > while with newer free space tree creation timing, the free space tree
> > can reach level 1 or higher.
> > 
> > To fix the problem, Patch 2 will do the proper full tree re-CoW, with
> > extra transaction commitment to make sure all free space tree get
> > re-CoWed.
> 
> The extra commit breaks assumptions of test misc/038 that looks up the
> backup roots in particular slot(s). I already had to fix it once due to
> the additional commit with free space tree. Now it broke again, the test
> is too fragle, I'm not sure we want to keep doing the whack-a-mole.

I've checked it with v2, no change, so I'll disable the test.
David Sterba Oct. 11, 2021, 2:40 p.m. UTC | #5
On Mon, Oct 11, 2021 at 08:06:47PM +0800, Qu Wenruo wrote:
> There is a bug report that with certain mkfs options, mkfs.btrfs may
> fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
> warning about multiple profiles:
> 
>   WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
>   WARNING:   Metadata: single, raid1 
> 
> The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
> -d dup".
> 
> It turns out that, the old _recow_root() can not handle tree levels > 0,
> while with newer free space tree creation timing, the free space tree
> can reach level 1 or higher.
> 
> To fix the problem, Patch 2 will do the proper full tree re-CoW, with
> extra transaction commitment to make sure all free space tree get
> re-CoWed.
> 
> The 3rd patch will do the extra verification during mkfs-tests.
> 
> The first patch is just to fix a confusing parameter which also caused
> u64 -> int width reduction and can be problematic in the future.
> 
> Changelog:
> v2:
> - Remove a duplicated recow_roots() call in create_raid_groups()
>   This call makes no difference as we will later commit transaction
>   and manually call recow_roots() again.
>   Remove such duplicated call to save some time.
> 
> - Replace the btrfs_next_sibling_tree_block() with btrfs_next_leaf()
>   Since we're always handling leaves, there is no need for
>   btrfs_next_sibling_tree_block()
> 
> - Work around a kernel bug which may cause false alerts
>   For single device RAID0, btrfs kernel is not respecting it, and will
>   allocate new chunks using SINGLE instead.
>   This can be very noisy and cause false alerts, and not always
>   reproducible, depending on how fast kernel creates new chunks.
> 
>   Work around it by mounting the RO before calling "btrfs fi df".
> 
>   The kernel bug needs to be investigated and fixed.
> 
> 
> Qu Wenruo (3):
>   btrfs-progs: rename @data parameter to @profile in extent allocation
>     path
>   btrfs-progs: mkfs: recow all tree blocks properly
>   btrfs-progs: mfks-tests: make sure mkfs.btrfs cleans up temporary
>     chunks

I've replaced patches 2 and 3, thanks. No need to resend with the fixes
I mentioned, that's for future patches.
Qu Wenruo Oct. 11, 2021, 10:56 p.m. UTC | #6
On 2021/10/11 22:39, David Sterba wrote:
> On Mon, Oct 11, 2021 at 04:05:18PM +0200, David Sterba wrote:
>> On Mon, Oct 11, 2021 at 08:06:47PM +0800, Qu Wenruo wrote:
>>> There is a bug report that with certain mkfs options, mkfs.btrfs may
>>> fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
>>> warning about multiple profiles:
>>>
>>>    WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
>>>    WARNING:   Metadata: single, raid1
>>>
>>> The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
>>> -d dup".
>>>
>>> It turns out that, the old _recow_root() can not handle tree levels > 0,
>>> while with newer free space tree creation timing, the free space tree
>>> can reach level 1 or higher.
>>>
>>> To fix the problem, Patch 2 will do the proper full tree re-CoW, with
>>> extra transaction commitment to make sure all free space tree get
>>> re-CoWed.
>>
>> The extra commit breaks assumptions of test misc/038 that looks up the
>> backup roots in particular slot(s). I already had to fix it once due to
>> the additional commit with free space tree. Now it broke again, the test
>> is too fragle, I'm not sure we want to keep doing the whack-a-mole.
> 
> I've check it with the v2, no change, so I'll disable the test.
> 

I'll update the test case to make it flexible enough to handle the extra
commits.

The proper way to test the behavior is not to use a hardcoded slot
number, but to auto-detect the slot.
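
One possible shape for such auto-detection, sketched in Python: given
the tree root generation recorded in each backup root slot, pick the
slot with the highest generation instead of assuming a fixed index. How
the generations are read from the superblock is left out, and the
function name is hypothetical; this is not the updated test.

```python
def newest_backup_slot(generations):
    """Return the index of the backup root slot with the highest generation.

    'generations' holds one tree root generation per backup slot, in slot
    order. If several slots tie, the lowest index wins.
    """
    if not generations:
        raise ValueError("no backup root slots given")
    return max(range(len(generations)), key=lambda slot: generations[slot])


if __name__ == "__main__":
    # One more commit moves the newest slot, which breaks a hardcoded index.
    print(newest_backup_slot([5, 6, 7, 4]))
    print(newest_backup_slot([5, 6, 7, 8]))
```

With this approach an extra commit in mkfs only changes which slot is
found, not whether the lookup succeeds.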

Thanks,
Qu
Qu Wenruo Oct. 12, 2021, 7:07 a.m. UTC | #7
On 2021/10/11 20:14, Qu Wenruo wrote:
>
>
> On 2021/10/11 20:10, Nikolay Borisov wrote:
>>
>>
>> On 11.10.21 г. 15:06, Qu Wenruo wrote:
>>> There is a bug report that with certain mkfs options, mkfs.btrfs may
>>> fail to cleanup some temporary chunks, leading to "btrfs filesystem df"
>>> warning about multiple profiles:
>>>
>>>    WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
>>>    WARNING:   Metadata: single, raid1
>>>
>>> The easiest way to reproduce is "mkfs.btrfs -f -R free-space-tree -m dup
>>> -d dup".
>>>
>>> It turns out that, the old _recow_root() can not handle tree levels > 0,
>>> while with newer free space tree creation timing, the free space tree
>>> can reach level 1 or higher.
>>>
>>> To fix the problem, Patch 2 will do the proper full tree re-CoW, with
>>> extra transaction commitment to make sure all free space tree get
>>> re-CoWed.
>>>
>>> The 3rd patch will do the extra verification during mkfs-tests.
>>>
>>> The first patch is just to fix a confusing parameter which also caused
>>> u64 -> int width reduction and can be problematic in the future.
>>>
>>> Changelog:
>>> v2:
>>> - Remove a duplicated recow_roots() call in create_raid_groups()
>>>    This call makes no difference as we will later commit transaction
>>>    and manually call recow_roots() again.
>>>    Remove such duplicated call to save some time.
>>>
>>> - Replace the btrfs_next_sibling_tree_block() with btrfs_next_leaf()
>>>    Since we're always handling leaves, there is no need for
>>>    btrfs_next_sibling_tree_block()
>>>
>>> - Work around a kernel bug which may cause false alerts
>>>    For single device RAID0, btrfs kernel is not respecting it, and will
>>>    allocate new chunks using SINGLE instead.
>>>    This can be very noisy and cause false alerts, and not always
>>>    reproducible, depending on how fast kernel creates new chunks.
>>>
>>>    Work around it by mounting the RO before calling "btrfs fi df".
>>>
>>>    The kernel bug needs to be investigated and fixed.
>> It's better to see the kernel bug fixed rather than papering over it.

The truth is, this is more like a kernel behavior change.

Before commit b2f78e88052b ("btrfs: allow degenerate raid0/raid10"),
the kernel could only create RAID0 chunks with at least two devices.

Thus an older kernel (my test host is still on v5.14) will create a
SINGLE chunk, as it has no other choice.

So it's a false alert.
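
The behavior difference can be summarized with a small toy model in
Python. The function and its parameters are hypothetical and only
restate the rule above; they do not mirror the kernel's chunk allocator.

```python
def allocated_profile(requested, num_devices, allows_degenerate_raid0):
    """Toy model of which profile a new chunk ends up with.

    Without degenerate RAID0 support, RAID0 needs at least two devices,
    so a single-device filesystem falls back to SINGLE.
    """
    if requested == "RAID0" and num_devices < 2 and not allows_degenerate_raid0:
        return "SINGLE"
    return requested


if __name__ == "__main__":
    # Older kernel, single device: a new chunk comes out as SINGLE.
    print(allocated_profile("RAID0", 1, allows_degenerate_raid0=False))
    # Kernel with the commit above: the requested profile is kept.
    print(allocated_profile("RAID0", 1, allows_degenerate_raid0=True))
```

So on an older kernel, any chunk allocated after a RW mount adds a
SINGLE profile next to the RAID0 one created by mkfs.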

Thanks,
Qu

>
> That's for sure.
>
> Just get overloaded by so many small bugs in one day.
>
> Will investigate and fix the bug soon.
>
> For the test case itself, mounting with RO in fact makes sense, we just
> want to the initial chunk layout created by mkfs.
>
> If later we choose to compare the total chunk size against the reported
> values, such RO mount is a hard requirement to avoid chunk preallocation.
>
> Thanks,
> Qu
>>
>>>
>>>
>>> Qu Wenruo (3):
>>>    btrfs-progs: rename @data parameter to @profile in extent allocation
>>>      path
>>>    btrfs-progs: mkfs: recow all tree blocks properly
>>>    btrfs-progs: mfks-tests: make sure mkfs.btrfs cleans up temporary
>>>      chunks
>>>
>>>   kernel-shared/extent-tree.c                 | 26 +++---
>>>   mkfs/main.c                                 | 90 ++++++++++++++++++---
>>>   tests/mkfs-tests/001-basic-profiles/test.sh | 16 +++-
>>>   3 files changed, 104 insertions(+), 28 deletions(-)
>>>