diff mbox

Btrfs: fix max chunk size on raid5/6

Message ID 20130220213229.GC21291@shiny.masoncoding.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Mason Feb. 20, 2013, 9:32 p.m. UTC
Hi everyone,

This spot in the chunk allocation code has seen a lot of little tweaks,
so I wanted to send this patch out for more eyes.

--

We try to limit the size of a chunk to 10GB, which keeps the unit of
work reasonable during balance and resize operations.  The limit checks
were taking into account the number of copies of the data we had but
what they really should be doing is comparing against the logical
size of the chunk we're creating.

This moves the code around a little to use the count of data stripes
from raid5/6.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Goffredo Baroncelli Feb. 21, 2013, 6:25 p.m. UTC | #1
Hi Chris,

my comments below
On 02/20/2013 10:32 PM, Chris Mason wrote:
> Hi everyone,
> 
> This spot in the chunk allocation code has seen a lot of little tweaks,
> so I wanted to send this patch out for more eyes.
> 
> --
> 
> We try to limit the size of a chunk to 10GB, which keeps the unit of
> work reasonable during balance and resize operations.  The limit checks
> were taking into account the number of copies of the data we had but
> what they really should be doing is comparing against the logical
> size of the chunk we're creating.
> 
> This moves the code around a little to use the count of data stripes
> from raid5/6.
> 
> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 5d6010b..538c5cf 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3837,10 +3837,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	 */
>  	data_stripes = num_stripes / ncopies;
>  
> -	if (stripe_size * ndevs > max_chunk_size * ncopies) {
> -		stripe_size = max_chunk_size * ncopies;
> -		do_div(stripe_size, ndevs);
> -	}
>  	if (type & BTRFS_BLOCK_GROUP_RAID5) {
>  		raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
>  				 btrfs_super_stripesize(info->super_copy));
> @@ -3851,6 +3847,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  				 btrfs_super_stripesize(info->super_copy));
>  		data_stripes = num_stripes - 2;
>  	}
> +
> +	/*
> +	 * Use the number of data stripes to figure out how big this chunk
> +	 * is really going to be in terms of logical address space,
> +	 * and compare that answer with the max chunk size
> +	 */
> +	if (stripe_size * data_stripes > max_chunk_size) {
> +		u64 mask = (1ULL << 24) - 1;

1<<24 should be a #define or better a parameter tunable via sysfs and/or
a superblock field.

May be we can use everywhere BTRFS_STRIPE_LEN ? (of course increasing
its value to 16MB or more ?) I am asking that because I don't know if it
makes sense to have BTRFS_STRIPE_LEN different from the "round up" value ...


> +		stripe_size = max_chunk_size;
> +		do_div(stripe_size, data_stripes);
> +
> +		/* bump the answer up to a 16MB boundary */
> +		stripe_size = (stripe_size + mask) & ~mask;
> +
> +		/* but don't go higher than the limits we found
> +		 * while searching for free extents
> +		 */
> +		if (stripe_size > devices_info[ndevs-1].max_avail)
> +			stripe_size = devices_info[ndevs-1].max_avail;
> +	}
> +
>  	do_div(stripe_size, dev_stripes);

The logic to me seems correct.
However my fear is that a "16MB" boundary is too low. I would felt
better with a round-up to 128MB. So the stripe size would vary from
128MB to 1GB in step of 128MB. The combination wouldn't be too high.
diff mbox

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5d6010b..538c5cf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3837,10 +3837,6 @@  static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 */
 	data_stripes = num_stripes / ncopies;
 
-	if (stripe_size * ndevs > max_chunk_size * ncopies) {
-		stripe_size = max_chunk_size * ncopies;
-		do_div(stripe_size, ndevs);
-	}
 	if (type & BTRFS_BLOCK_GROUP_RAID5) {
 		raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
 				 btrfs_super_stripesize(info->super_copy));
@@ -3851,6 +3847,27 @@  static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				 btrfs_super_stripesize(info->super_copy));
 		data_stripes = num_stripes - 2;
 	}
+
+	/*
+	 * Use the number of data stripes to figure out how big this chunk
+	 * is really going to be in terms of logical address space,
+	 * and compare that answer with the max chunk size
+	 */
+	if (stripe_size * data_stripes > max_chunk_size) {
+		u64 mask = (1ULL << 24) - 1;
+		stripe_size = max_chunk_size;
+		do_div(stripe_size, data_stripes);
+
+		/* bump the answer up to a 16MB boundary */
+		stripe_size = (stripe_size + mask) & ~mask;
+
+		/* but don't go higher than the limits we found
+		 * while searching for free extents
+		 */
+		if (stripe_size > devices_info[ndevs-1].max_avail)
+			stripe_size = devices_info[ndevs-1].max_avail;
+	}
+
 	do_div(stripe_size, dev_stripes);
 
 	/* align to BTRFS_STRIPE_LEN */