diff mbox series

[v2,2/3] btrfs: Split remaining space to discard in chunks

Message ID 20240902205828.943155-3-luca.stefani.ge1@gmail.com (mailing list archive)
State New, archived
Headers show
Series btrfs: Don't block suspend during fstrim | expand

Commit Message

Luca Stefani Sept. 2, 2024, 8:56 p.m. UTC
Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
mostly empty although we will do the split according to our super block
locations, the last super block ends at 256G, we can submit a huge
discard for the range [256G, 8T), causing a super large delay.

We now split the space left to discard based the block discard limit
in preparation of introduction of cancellation signals handling.

Reported-by: Qu Wenruo <wqu@suse.com>
Closes: https://lore.kernel.org/lkml/2e15214b-7e95-4e64-a899-725de12c9037@gmail.com/T/#mdfef1d8b36334a15c54cd009f6aadf49e260e105
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
---
 fs/btrfs/extent-tree.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

Comments

Qu Wenruo Sept. 2, 2024, 10:31 p.m. UTC | #1
在 2024/9/3 06:26, Luca Stefani 写道:
> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
> mostly empty although we will do the split according to our super block
> locations, the last super block ends at 256G, we can submit a huge
> discard for the range [256G, 8T), causing a super large delay.
> 
> We now split the space left to discard based the block discard limit
> in preparation of introduction of cancellation signals handling.
> 
> Reported-by: Qu Wenruo <wqu@suse.com>

Well, I'd say who ever reported the original fstrim and suspension 
failure should be the reporter, not me.

And David's advice is indeed pretty good for the new split according to 
the discard limit.

> Closes: https://lore.kernel.org/lkml/2e15214b-7e95-4e64-a899-725de12c9037@gmail.com/T/#mdfef1d8b36334a15c54cd009f6aadf49e260e105
> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
> ---
>   fs/btrfs/extent-tree.c | 23 +++++++++++++++++------
>   1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index feec49e6f9c8..894684f4f497 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1239,8 +1239,9 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>   			       u64 *discarded_bytes)
>   {
>   	int j, ret = 0;
> -	u64 bytes_left, end;
> +	u64 bytes_left, bytes_to_discard, end;

You can calculate the discard limit here, no need to recalculate inside 
the loop.

For the other variables only utilized inside the loop, you can always 
declare them inside the loop.

Otherwise looks good to me.

Thanks,
Qu
>   	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
> +	sector_t sector, nr_sects, bio_sects;
>   
>   	/* Adjust the range to be aligned to 512B sectors if necessary. */
>   	if (start != aligned_start) {
> @@ -1300,13 +1301,23 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>   		bytes_left = end - start;
>   	}
>   
> -	if (bytes_left) {
> -		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> -					   bytes_left >> SECTOR_SHIFT,
> -					   GFP_NOFS);
> +	while (bytes_left) {
> +		sector = start >> SECTOR_SHIFT;
> +		nr_sects = bytes_left >> SECTOR_SHIFT;
> +		bio_sects = min(nr_sects, bio_discard_limit(bdev, sector));
> +		bytes_to_discard = bio_sects << SECTOR_SHIFT;
> +
> +		ret = blkdev_issue_discard(bdev, sector, bio_sects, GFP_NOFS);
> +
>   		if (!ret)
> -			*discarded_bytes += bytes_left;
> +			*discarded_bytes += bytes_to_discard;
> +		else if (ret != -EOPNOTSUPP)
> +			return ret;
> +
> +		start += bytes_to_discard;
> +		bytes_left -= bytes_to_discard;
>   	}
> +
>   	return ret;
>   }
>
David Sterba Sept. 2, 2024, 11:41 p.m. UTC | #2
On Tue, Sep 03, 2024 at 08:01:16AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/9/3 06:26, Luca Stefani 写道:
> > Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
> > mostly empty although we will do the split according to our super block
> > locations, the last super block ends at 256G, we can submit a huge
> > discard for the range [256G, 8T), causing a super large delay.
> > 
> > We now split the space left to discard based the block discard limit
> > in preparation of introduction of cancellation signals handling.
> > 
> > Reported-by: Qu Wenruo <wqu@suse.com>
> 
> Well, I'd say who ever reported the original fstrim and suspension 
> failure should be the reporter, not me.

Once the patch is finalized I'll add the links to reports but yeah if
they're added when the patches are submitted it's helpful.
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index feec49e6f9c8..894684f4f497 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1239,8 +1239,9 @@  static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			       u64 *discarded_bytes)
 {
 	int j, ret = 0;
-	u64 bytes_left, end;
+	u64 bytes_left, bytes_to_discard, end;
 	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
+	sector_t sector, nr_sects, bio_sects;
 
 	/* Adjust the range to be aligned to 512B sectors if necessary. */
 	if (start != aligned_start) {
@@ -1300,13 +1301,23 @@  static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 		bytes_left = end - start;
 	}
 
-	if (bytes_left) {
-		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-					   bytes_left >> SECTOR_SHIFT,
-					   GFP_NOFS);
+	while (bytes_left) {
+		sector = start >> SECTOR_SHIFT;
+		nr_sects = bytes_left >> SECTOR_SHIFT;
+		bio_sects = min(nr_sects, bio_discard_limit(bdev, sector));
+		bytes_to_discard = bio_sects << SECTOR_SHIFT;
+
+		ret = blkdev_issue_discard(bdev, sector, bio_sects, GFP_NOFS);
+
 		if (!ret)
-			*discarded_bytes += bytes_left;
+			*discarded_bytes += bytes_to_discard;
+		else if (ret != -EOPNOTSUPP)
+			return ret;
+
+		start += bytes_to_discard;
+		bytes_left -= bytes_to_discard;
 	}
+
 	return ret;
 }