[1/5] btrfs: change the minimum global reserve size
diff mbox series

Message ID 20190816152019.1962-2-josef@toxicpanda.com
State New
Headers show
Series
  • Fix global reserve size and can overcommit
Related show

Commit Message

Josef Bacik Aug. 16, 2019, 3:20 p.m. UTC
It made sense to have the global reserve set at 16M in the past, but
since it is used less nowadays set the minimum size to the number of
items we'll need to update the main trees we update during a transaction
commit, plus some slop area so we can do unlinks if we need to.

In practice this doesn't affect normal file systems, but for xfstests
where we do things like fill up a fs and then rm * it can fall over in
weird ways.  This enables us for more sane behavior at extremely small
file system sizes.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-rsv.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov Aug. 20, 2019, 1:45 p.m. UTC | #1
On 16.08.19 г. 18:20 ч., Josef Bacik wrote:
> It made sense to have the global reserve set at 16M in the past, but
> since it is used less nowadays set the minimum size to the number of
> items we'll need to update the main trees we update during a transaction
> commit, plus some slop area so we can do unlinks if we need to.
> 
> In practice this doesn't affect normal file systems, but for xfstests
> where we do things like fill up a fs and then rm * it can fall over in
> weird ways.  This enables us for more sane behavior at extremely small
> file system sizes.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/block-rsv.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
> index c64b460a4301..657675eef443 100644
> --- a/fs/btrfs/block-rsv.c
> +++ b/fs/btrfs/block-rsv.c
> @@ -258,6 +258,7 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
>  	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
>  	struct btrfs_space_info *sinfo = block_rsv->space_info;
>  	u64 num_bytes;
> +	unsigned min_items;
>  
>  	/*
>  	 * The global block rsv is based on the size of the extent tree, the
> @@ -267,7 +268,26 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
>  	num_bytes = btrfs_root_used(&fs_info->extent_root->root_item) +
>  		btrfs_root_used(&fs_info->csum_root->root_item) +
>  		btrfs_root_used(&fs_info->tree_root->root_item);
> -	num_bytes = max_t(u64, num_bytes, SZ_16M);
> +
> +	/*
> +	 * We at a minimum are going to modify the csum root, the tree root, and
> +	 * the extent root.
> +	 */
> +	min_items = 3;
> +
> +	/*
> +	 * But we also want to reserve enough space so we can do the fallback
> +	 * global reserve for an unlink, which is an additional 5 items (see the
> +	 * comment in __unlink_start_trans for what we're modifying.)
> +	 *
> +	 * But we also need space for the delayed ref updates from the unlink,
> +	 * so its 10, 5 for the actual operation, and 5 for the delayed ref
> +	 * updates.
> +	 */
> +	min_items += 10;
> +
> +	num_bytes = max_t(u64, num_bytes,
> +			  btrfs_calc_insert_metadata_size(fs_info, min_items));

For ordinary, 16k nodesize filesystem, btrfs_calc_insert_metadata_size
will really return 3.4m -> 16 * 8 * 2 * 13 * 1024 = 3407872 bytes . In
those cases I expect that the code will always be doing num_bytes =
num_bytes.

>  
>  	spin_lock(&sinfo->lock);
>  	spin_lock(&block_rsv->lock);
>
Josef Bacik Aug. 20, 2019, 7:12 p.m. UTC | #2
On Tue, Aug 20, 2019 at 04:45:15PM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.08.19 г. 18:20 ч., Josef Bacik wrote:
> > It made sense to have the global reserve set at 16M in the past, but
> > since it is used less nowadays set the minimum size to the number of
> > items we'll need to update the main trees we update during a transaction
> > commit, plus some slop area so we can do unlinks if we need to.
> > 
> > In practice this doesn't affect normal file systems, but for xfstests
> > where we do things like fill up a fs and then rm * it can fall over in
> > weird ways.  This enables us for more sane behavior at extremely small
> > file system sizes.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/block-rsv.c | 22 +++++++++++++++++++++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
> > index c64b460a4301..657675eef443 100644
> > --- a/fs/btrfs/block-rsv.c
> > +++ b/fs/btrfs/block-rsv.c
> > @@ -258,6 +258,7 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
> >  	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
> >  	struct btrfs_space_info *sinfo = block_rsv->space_info;
> >  	u64 num_bytes;
> > +	unsigned min_items;
> >  
> >  	/*
> >  	 * The global block rsv is based on the size of the extent tree, the
> > @@ -267,7 +268,26 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
> >  	num_bytes = btrfs_root_used(&fs_info->extent_root->root_item) +
> >  		btrfs_root_used(&fs_info->csum_root->root_item) +
> >  		btrfs_root_used(&fs_info->tree_root->root_item);
> > -	num_bytes = max_t(u64, num_bytes, SZ_16M);
> > +
> > +	/*
> > +	 * We at a minimum are going to modify the csum root, the tree root, and
> > +	 * the extent root.
> > +	 */
> > +	min_items = 3;
> > +
> > +	/*
> > +	 * But we also want to reserve enough space so we can do the fallback
> > +	 * global reserve for an unlink, which is an additional 5 items (see the
> > +	 * comment in __unlink_start_trans for what we're modifying.)
> > +	 *
> > +	 * But we also need space for the delayed ref updates from the unlink,
> > +	 * so its 10, 5 for the actual operation, and 5 for the delayed ref
> > +	 * updates.
> > +	 */
> > +	min_items += 10;
> > +
> > +	num_bytes = max_t(u64, num_bytes,
> > +			  btrfs_calc_insert_metadata_size(fs_info, min_items));
> 
> For ordinary, 16k nodesize filesystem, btrfs_calc_insert_metadata_size
> will really return 3.4m -> 16 * 8 * 2 * 13 * 1024 = 3407872 bytes . In
> those cases I expect that the code will always be doing num_bytes =
> num_bytes.
> 

Right, generally this will always be num_bytes = num_bytes.  This is mostly for
things like xfstests which start with empty fs's and then fill up the fs with
data, so you only have like 200kib of metadata ever, so it ends up being the
btrfs_calc_insert_metadat_size() amount.  Thanks,

Josef

Patch
diff mbox series

diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index c64b460a4301..657675eef443 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -258,6 +258,7 @@  void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	struct btrfs_space_info *sinfo = block_rsv->space_info;
 	u64 num_bytes;
+	unsigned min_items;
 
 	/*
 	 * The global block rsv is based on the size of the extent tree, the
@@ -267,7 +268,26 @@  void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
 	num_bytes = btrfs_root_used(&fs_info->extent_root->root_item) +
 		btrfs_root_used(&fs_info->csum_root->root_item) +
 		btrfs_root_used(&fs_info->tree_root->root_item);
-	num_bytes = max_t(u64, num_bytes, SZ_16M);
+
+	/*
+	 * We at a minimum are going to modify the csum root, the tree root, and
+	 * the extent root.
+	 */
+	min_items = 3;
+
+	/*
+	 * But we also want to reserve enough space so we can do the fallback
+	 * global reserve for an unlink, which is an additional 5 items (see the
+	 * comment in __unlink_start_trans for what we're modifying.)
+	 *
+	 * But we also need space for the delayed ref updates from the unlink,
+	 * so its 10, 5 for the actual operation, and 5 for the delayed ref
+	 * updates.
+	 */
+	min_items += 10;
+
+	num_bytes = max_t(u64, num_bytes,
+			  btrfs_calc_insert_metadata_size(fs_info, min_items));
 
 	spin_lock(&sinfo->lock);
 	spin_lock(&block_rsv->lock);