[v4,2/3] btrfs: Split remaining space to discard in chunks

Message ID 20240916101615.116164-3-luca.stefani.ge1@gmail.com (mailing list archive)
State New, archived
Series btrfs: Don't block system suspend during fstrim

Commit Message

Luca Stefani Sept. 16, 2024, 10:16 a.m. UTC
Per Qu Wenruo: on a very large, mostly empty disk (e.g. an 8TiB device),
we split the discard range around our super block locations, but the last
super block ends at 256G. We can therefore still submit one huge discard
for the range [256G, 8T), causing a very long delay.

Split the space left to discard into chunks of at most
BTRFS_MAX_DATA_CHUNK_SIZE, in preparation for introducing handling of
cancellation signals.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
---
 fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

Comments

Qu Wenruo Sept. 16, 2024, 10:39 a.m. UTC | #1
On 2024/9/16 19:46, Luca Stefani wrote:
> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
> mostly empty although we will do the split according to our super block
> locations, the last super block ends at 256G, we can submit a huge
> discard for the range [256G, 8T), causing a super large delay.
> 
> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
> in preparation of introduction of cancellation signals handling.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
> ---
>   fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a5966324607d..cbe66d0acff8 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>   			       u64 *discarded_bytes)
>   {
>   	int j, ret = 0;
> -	u64 bytes_left, end;
> +	u64 bytes_left, bytes_to_discard, end;
>   	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>   
>   	/* Adjust the range to be aligned to 512B sectors if necessary. */
> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>   		bytes_left = end - start;
>   	}
>   
> -	if (bytes_left) {
> +	while (bytes_left) {
> +		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
> +			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;

A chunk can only reach BTRFS_MAX_DATA_CHUNK_SIZE with RAID0/RAID10/RAID5/RAID6,
by spanning the device extents across multiple devices.

For each device, the maximum size is limited to 1G (check
init_alloc_chunk_ctl_policy_regular()).

So you can just limit it to 1G instead.
(If you want, you can also extract that into a macro as a cleanup.)

Furthermore, you can use min() instead of an if ().

So you only need:

		bytes_to_discard = min(SZ_1G, bytes_left);

Otherwise this looks good enough to me.
If the 1G size is not good enough, we can later tune it to smaller values.

Personally speaking I think 1G would be enough.

Thanks,
Qu
> +		else
> +			bytes_to_discard = bytes_left;
> +
>   		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> -					   bytes_left >> SECTOR_SHIFT,
> +					   bytes_to_discard >> SECTOR_SHIFT,
>   					   GFP_NOFS);
> -		if (!ret)
> -			*discarded_bytes += bytes_left;
> +
> +		if (ret) {
> +			if (ret != -EOPNOTSUPP)
> +				break;
> +			continue;
> +		}
> +
> +		start += bytes_to_discard;
> +		bytes_left -= bytes_to_discard;
> +		*discarded_bytes += bytes_to_discard;
>   	}
> +
>   	return ret;
>   }
>
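
For a sense of scale, the following user-space sketch counts how many
requests the [256G, 8T) range from the commit message turns into with a
1G cap. It is an illustration only, not kernel code: SKETCH_SZ_1G,
next_discard_len() and count_discard_requests() are hypothetical names,
and an open-coded comparison stands in for the kernel's min().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's SZ_1G. */
#define SKETCH_SZ_1G (1ULL << 30)

/* Size of the next request: at most 1 GiB, never more than what is left. */
static uint64_t next_discard_len(uint64_t bytes_left)
{
	return bytes_left < SKETCH_SZ_1G ? bytes_left : SKETCH_SZ_1G;
}

/* Count how many capped requests a range of 'len' bytes is split into. */
static uint64_t count_discard_requests(uint64_t len)
{
	uint64_t n = 0;

	while (len) {
		uint64_t chunk = next_discard_len(len);

		/* Each step must make progress, otherwise the loop never ends. */
		assert(chunk > 0 && chunk <= len);
		len -= chunk;
		n++;
	}
	return n;
}
```

Under these assumptions, count_discard_requests((8ULL << 40) - (256ULL << 30))
yields 7936, i.e. one request per GiB of the 7936 GiB range.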
Luca Stefani Sept. 16, 2024, 10:51 a.m. UTC | #2
On 16/09/24 12:39, Qu Wenruo wrote:
> 
> 
> On 2024/9/16 19:46, Luca Stefani wrote:
>> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
>> mostly empty although we will do the split according to our super block
>> locations, the last super block ends at 256G, we can submit a huge
>> discard for the range [256G, 8T), causing a super large delay.
>>
>> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
>> in preparation of introduction of cancellation signals handling.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
>> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
>> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
>> ---
>>   fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>>   1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a5966324607d..cbe66d0acff8 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>>                      u64 *discarded_bytes)
>>   {
>>       int j, ret = 0;
>> -    u64 bytes_left, end;
>> +    u64 bytes_left, bytes_to_discard, end;
>>       u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>>       /* Adjust the range to be aligned to 512B sectors if necessary. */
>> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>>           bytes_left = end - start;
>>       }
>> -    if (bytes_left) {
>> +    while (bytes_left) {
>> +        if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
>> +            bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
> 
> That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6, 
> by spanning the device extents across multiple devices.
> 
> For each device, the maximum size is limited to 1G (check 
> init_alloc_chunk_ctl_policy_regular()).
> 
> So you can just limit it to 1G instead.
> (If you want, you can also extract that into a macro as a cleanup).
I think SZ_1G is enough for now.
> 
> Furthermore, you can use min() instead of a if ().
> 
> So you only need:
> 
>          bytes_to_discard = min(SZ_1G, bytes_left);
> 
> Otherwise this looks good enough to me.
> If the 1G size is not good enough, we can later tune it to smaller values.
> 
> Personally speaking I think 1G would be enough.
> 
> Thanks,
> Qu
Ack, done in v5
>> +        else
>> +            bytes_to_discard = bytes_left;
>> +
>>           ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>> -                       bytes_left >> SECTOR_SHIFT,
>> +                       bytes_to_discard >> SECTOR_SHIFT,
>>                          GFP_NOFS);
>> -        if (!ret)
>> -            *discarded_bytes += bytes_left;
>> +
>> +        if (ret) {
>> +            if (ret != -EOPNOTSUPP)
>> +                break;
>> +            continue;
>> +        }
>> +
>> +        start += bytes_to_discard;
>> +        bytes_left -= bytes_to_discard;
>> +        *discarded_bytes += bytes_to_discard;
>>       }
>> +
>>       return ret;
>>   }
Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..cbe66d0acff8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1239,7 +1239,7 @@  static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			       u64 *discarded_bytes)
 {
 	int j, ret = 0;
-	u64 bytes_left, end;
+	u64 bytes_left, bytes_to_discard, end;
 	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
 
 	/* Adjust the range to be aligned to 512B sectors if necessary. */
@@ -1300,13 +1300,27 @@  static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 		bytes_left = end - start;
 	}
 
-	if (bytes_left) {
+	while (bytes_left) {
+		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
+			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
+		else
+			bytes_to_discard = bytes_left;
+
 		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-					   bytes_left >> SECTOR_SHIFT,
+					   bytes_to_discard >> SECTOR_SHIFT,
 					   GFP_NOFS);
-		if (!ret)
-			*discarded_bytes += bytes_left;
+
+		if (ret) {
+			if (ret != -EOPNOTSUPP)
+				break;
+			continue;
+		}
+
+		start += bytes_to_discard;
+		bytes_left -= bytes_to_discard;
+		*discarded_bytes += bytes_to_discard;
 	}
+
 	return ret;
 }
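
One thing worth noting in the loop as posted: on -EOPNOTSUPP it executes
continue before start and bytes_left are updated, so the same range would
be submitted again. The user-space sketch below shows one possible shape
of the loop that stops on any error instead, as the pre-patch code did by
returning ret. It assumes the 1G cap suggested in review;
sketch_issue_discard(), the sketch_discard_fn callback (standing in for
blkdev_issue_discard()) and the two mock devices are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9			/* 512-byte sectors */
#define SKETCH_MAX_DISCARD  (1ULL << 30)	/* assumed 1 GiB cap per request */

/* Hypothetical callback standing in for blkdev_issue_discard(). */
typedef int (*sketch_discard_fn)(uint64_t sector, uint64_t nr_sectors);

static int sketch_issue_discard(sketch_discard_fn discard, uint64_t start,
				uint64_t bytes_left, uint64_t *discarded_bytes)
{
	int ret = 0;

	assert(discard && discarded_bytes);

	while (bytes_left) {
		uint64_t len = bytes_left < SKETCH_MAX_DISCARD ?
			       bytes_left : SKETCH_MAX_DISCARD;

		ret = discard(start >> SKETCH_SECTOR_SHIFT,
			      len >> SKETCH_SECTOR_SHIFT);
		/*
		 * Stop on any error, including -EOPNOTSUPP, so the same
		 * range is never resubmitted; the caller sees ret.
		 */
		if (ret)
			break;

		start += len;
		bytes_left -= len;
		*discarded_bytes += len;
	}
	return ret;
}

/* Mock devices: one accepts every discard, one does not support discard. */
static unsigned int sketch_calls;

static int sketch_discard_ok(uint64_t sector, uint64_t nr_sectors)
{
	(void)sector;
	(void)nr_sectors;
	sketch_calls++;
	return 0;
}

static int sketch_discard_unsupported(uint64_t sector, uint64_t nr_sectors)
{
	(void)sector;
	(void)nr_sectors;
	sketch_calls++;
	return -EOPNOTSUPP;
}
```

With sketch_discard_ok, a range of 3G + 4K is covered in four requests;
with sketch_discard_unsupported, the loop returns -EOPNOTSUPP after a
single attempt and reports zero discarded bytes.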