diff mbox series

btrfs-progs: properly reserve metadata space for bgt conversion

Message ID 6c556ce0456303fb8ec23a454c3b5e18b874ae91.1696291742.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs-progs: properly reserve metadata space for bgt conversion | expand

Commit Message

Qu Wenruo Oct. 3, 2023, 12:09 a.m. UTC
There is a bug report that btrfstune failed to convert to block group
tree.
The bug report shows some very weird call trace, mostly fail at free
space tree code.

One of the concern is the failed conversion may be caused by exhausted
metadata space.
In that case, we're doing quite some metadata operations (one
transaction to convert 64 block groups in one go).

But for each transaction we have only reserved the metadata for one
block group conversion (1 block for extent tree and 1 block for block
group tree, this excludes free space and extent tree operations, as they
should go global rsv).

Although we're not 100% sure about the failed conversion, at least we
should handle the metadata reservation correctly, by properly reserve
the needed space for the batch, and reduce the batch size just in case
there isn't much metadata space left.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tune/convert-bgt.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

David Sterba Oct. 6, 2023, 3:35 p.m. UTC | #1
On Tue, Oct 03, 2023 at 10:39:18AM +1030, Qu Wenruo wrote:
> There is a bug report that btrfstune failed to convert to block group
> tree.
> The bug report shows some very weird call trace, mostly fail at free
> space tree code.

If you mention a bug report please either add a link or describe the key
details of the report.

> One of the concern is the failed conversion may be caused by exhausted
> metadata space.
> In that case, we're doing quite some metadata operations (one
> transaction to convert 64 block groups in one go).
> 
> But for each transaction we have only reserved the metadata for one
> block group conversion (1 block for extent tree and 1 block for block
> group tree, this excludes free space and extent tree operations, as they
> should go global rsv).
> 
> Although we're not 100% sure about the failed conversion, at least we
> should handle the metadata reservation correctly, by properly reserve
> the needed space for the batch, and reduce the batch size just in case
> there isn't much metadata space left.

This is probably reasonable.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to devel, thanks.
Qu Wenruo Oct. 8, 2023, 11:25 p.m. UTC | #2
On 2023/10/7 02:05, David Sterba wrote:
> On Tue, Oct 03, 2023 at 10:39:18AM +1030, Qu Wenruo wrote:
>> There is a bug report that btrfstune failed to convert to block group
>> tree.
>> The bug report shows some very weird call trace, mostly fail at free
>> space tree code.
> 
> If you mention a bug report please either add a link or describe the key
> details of the report.

Here it is:

https://lore.kernel.org/linux-btrfs/3c93d0b5-a8cb-ebe3-f8d6-76ea6340f23e@gmail.com/

I didn't put it in the first place is, it's really just an inspiration, 
not a concrete bug report.

But for sure, in the future I can also put those inspirational reports 
into "Link:" tag.
> 
>> One of the concern is the failed conversion may be caused by exhausted
>> metadata space.
>> In that case, we're doing quite some metadata operations (one
>> transaction to convert 64 block groups in one go).
>>
>> But for each transaction we have only reserved the metadata for one
>> block group conversion (1 block for extent tree and 1 block for block
>> group tree, this excludes free space and extent tree operations, as they
>> should go global rsv).
>>
>> Although we're not 100% sure about the failed conversion, at least we
>> should handle the metadata reservation correctly, by properly reserve
>> the needed space for the batch, and reduce the batch size just in case
>> there isn't much metadata space left.
> 
> This is probably reasonable.

The change itself is reasonable, but don't be surprised that, 
btrfs-progs has no metadata reservation at all.

It really goes allocate-as-we-can method, until it crashes at some 
critical path, and make the fs unable to be mounted/used by kernel due 
to -ENOSPC.

The extra safenet for a simple metadata rsv check is introduced in this 
patch:

https://lore.kernel.org/linux-btrfs/6081e57fe6f43b3ab44c883814c6a197513c66c0.1696489737.git.wqu@suse.com/

Thanks,
Qu
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Added to devel, thanks.
David Sterba Oct. 9, 2023, 5:12 p.m. UTC | #3
On Mon, Oct 09, 2023 at 09:55:58AM +1030, Qu Wenruo wrote:
> 
> 
> On 2023/10/7 02:05, David Sterba wrote:
> > On Tue, Oct 03, 2023 at 10:39:18AM +1030, Qu Wenruo wrote:
> >> There is a bug report that btrfstune failed to convert to block group
> >> tree.
> >> The bug report shows some very weird call trace, mostly fail at free
> >> space tree code.
> > 
> > If you mention a bug report please either add a link or describe the key
> > details of the report.
> 
> Here it is:
> 
> https://lore.kernel.org/linux-btrfs/3c93d0b5-a8cb-ebe3-f8d6-76ea6340f23e@gmail.com/
> 
> I didn't put it in the first place is, it's really just an inspiration, 
> not a concrete bug report.

That's also good reason to mention it in the changelog as it's relevant
for the patch and can add more context e.g. about the use case.

> But for sure, in the future I can also put those inspirational reports 
> into "Link:" tag.

Yeah, you can mention id like "This is based on discussion in [1]" or
similar.

> >> One of the concern is the failed conversion may be caused by exhausted
> >> metadata space.
> >> In that case, we're doing quite some metadata operations (one
> >> transaction to convert 64 block groups in one go).
> >>
> >> But for each transaction we have only reserved the metadata for one
> >> block group conversion (1 block for extent tree and 1 block for block
> >> group tree, this excludes free space and extent tree operations, as they
> >> should go global rsv).
> >>
> >> Although we're not 100% sure about the failed conversion, at least we
> >> should handle the metadata reservation correctly, by properly reserve
> >> the needed space for the batch, and reduce the batch size just in case
> >> there isn't much metadata space left.
> > 
> > This is probably reasonable.
> 
> The change itself is reasonable, but don't be surprised that, 
> btrfs-progs has no metadata reservation at all.
> 
> It really goes allocate-as-we-can method, until it crashes at some 
> critical path, and make the fs unable to be mounted/used by kernel due 
> to -ENOSPC.

I am not surprised.

> The extra safenet for a simple metadata rsv check is introduced in this 
> patch:
> 
> https://lore.kernel.org/linux-btrfs/6081e57fe6f43b3ab44c883814c6a197513c66c0.1696489737.git.wqu@suse.com/

Thanks. I think historically there was not much need for that as the
only offlline writing action was 'check' that relied on the single
transaction to do everything. Now we do more conversion changes so the
sanity checks will be needed.
diff mbox series

Patch

diff --git a/tune/convert-bgt.c b/tune/convert-bgt.c
index 77cba3930ae1..dd3a8c750604 100644
--- a/tune/convert-bgt.c
+++ b/tune/convert-bgt.c
@@ -28,7 +28,7 @@ 
 #include "tune/tune.h"
 
 /* After this many block groups we need to commit transaction. */
-#define BLOCK_GROUP_BATCH	64
+#define BLOCK_GROUP_BATCH	16
 
 int convert_to_bg_tree(struct btrfs_fs_info *fs_info)
 {
@@ -71,7 +71,8 @@  int convert_to_bg_tree(struct btrfs_fs_info *fs_info)
 		error_msg(ERROR_MSG_COMMIT_TRANS, "new bg root: %d", ret);
 		goto error;
 	}
-	trans = btrfs_start_transaction(fs_info->tree_root, 2);
+	trans = btrfs_start_transaction(fs_info->tree_root,
+					2 * BLOCK_GROUP_BATCH);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		errno = -ret;
@@ -119,7 +120,8 @@  iterate_bgs:
 				error_msg(ERROR_MSG_COMMIT_TRANS, "%m");
 				return ret;
 			}
-			trans = btrfs_start_transaction(fs_info->tree_root, 2);
+			trans = btrfs_start_transaction(fs_info->tree_root,
+							2 * BLOCK_GROUP_BATCH);
 			if (IS_ERR(trans)) {
 				ret = PTR_ERR(trans);
 				errno = -ret;
@@ -181,7 +183,8 @@  int convert_to_extent_tree(struct btrfs_fs_info *fs_info)
 		error_msg(ERROR_MSG_COMMIT_TRANS, "new extent tree root: %m");
 		goto error;
 	}
-	trans = btrfs_start_transaction(fs_info->tree_root, 2);
+	trans = btrfs_start_transaction(fs_info->tree_root,
+					2 * BLOCK_GROUP_BATCH);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		errno = -ret;
@@ -227,7 +230,8 @@  iterate_bgs:
 				error_msg(ERROR_MSG_COMMIT_TRANS, "%m");
 				return ret;
 			}
-			trans = btrfs_start_transaction(fs_info->tree_root, 2);
+			trans = btrfs_start_transaction(fs_info->tree_root,
+							2 * BLOCK_GROUP_BATCH);
 			if (IS_ERR(trans)) {
 				ret = PTR_ERR(trans);
 				errno = -ret;