Message ID | 20240916101615.116164-3-luca.stefani.ge1@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: Don't block system suspend during fstrim |
On 2024/9/16 19:46, Luca Stefani wrote:
> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
> mostly empty although we will do the split according to our super block
> locations, the last super block ends at 256G, we can submit a huge
> discard for the range [256G, 8T), causing a super large delay.
>
> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
> in preparation of introduction of cancellation signals handling.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
> ---
>  fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a5966324607d..cbe66d0acff8 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>  			       u64 *discarded_bytes)
>  {
>  	int j, ret = 0;
> -	u64 bytes_left, end;
> +	u64 bytes_left, bytes_to_discard, end;
>  	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>
>  	/* Adjust the range to be aligned to 512B sectors if necessary. */
> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>  		bytes_left = end - start;
>  	}
>
> -	if (bytes_left) {
> +	while (bytes_left) {
> +		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
> +			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;

That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6,
by spanning the device extents across multiple devices.

For each device, the maximum size is limited to 1G (check
init_alloc_chunk_ctl_policy_regular()).

So you can just limit it to 1G instead.
(If you want, you can also extract that into a macro as a cleanup).

Furthermore, you can use min() instead of an if ().

So you only need:

	bytes_to_discard = min(SZ_1G, bytes_left);

Otherwise this looks good enough to me.
If the 1G size is not good enough, we can later tune it to smaller values.

Personally speaking I think 1G would be enough.

Thanks,
Qu

> +		else
> +			bytes_to_discard = bytes_left;
> +
>  		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> -					   bytes_left >> SECTOR_SHIFT,
> +					   bytes_to_discard >> SECTOR_SHIFT,
>  					   GFP_NOFS);
> -		if (!ret)
> -			*discarded_bytes += bytes_left;
> +
> +		if (ret) {
> +			if (ret != -EOPNOTSUPP)
> +				break;
> +			continue;
> +		}
> +
> +		start += bytes_to_discard;
> +		bytes_left -= bytes_to_discard;
> +		*discarded_bytes += bytes_to_discard;
>  	}
> +
>  	return ret;
>  }
>
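As an illustration of the min() suggestion in the review above, here is a minimal userspace sketch showing that a single clamp yields the same length as the if/else pair in the patch. The names SKETCH_SZ_1G, next_discard_len() and next_discard_len_ifelse() are hypothetical stand-ins chosen for this sketch; they are not kernel identifiers, and the kernel's own min() macro and SZ_1G constant are not used here.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for a 1 GiB limit (assumption: mirrors the 1G cap discussed above). */
#define SKETCH_SZ_1G (UINT64_C(1) << 30)

/* Hypothetical helper: length of the next discard request, capped at max_chunk. */
static uint64_t next_discard_len(uint64_t bytes_left, uint64_t max_chunk)
{
	assert(max_chunk > 0);
	return bytes_left < max_chunk ? bytes_left : max_chunk;
}

/* The if/else form used in the patch under review, kept for comparison. */
static uint64_t next_discard_len_ifelse(uint64_t bytes_left, uint64_t max_chunk)
{
	uint64_t bytes_to_discard;

	if (bytes_left > max_chunk)
		bytes_to_discard = max_chunk;
	else
		bytes_to_discard = bytes_left;

	return bytes_to_discard;
}
```

Both helpers return bytes_left unchanged when it is at or below the cap and return the cap otherwise, so the clamp form is only a shorter spelling of the same logic.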
On 16/09/24 12:39, Qu Wenruo wrote:
>
>
> On 2024/9/16 19:46, Luca Stefani wrote:
>> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
>> mostly empty although we will do the split according to our super block
>> locations, the last super block ends at 256G, we can submit a huge
>> discard for the range [256G, 8T), causing a super large delay.
>>
>> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
>> in preparation of introduction of cancellation signals handling.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
>> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
>> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
>> ---
>>   fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>>   1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a5966324607d..cbe66d0acff8 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct
>> block_device *bdev, u64 start, u64 len,
>>                      u64 *discarded_bytes)
>>   {
>>       int j, ret = 0;
>> -    u64 bytes_left, end;
>> +    u64 bytes_left, bytes_to_discard, end;
>>       u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>>       /* Adjust the range to be aligned to 512B sectors if necessary. */
>> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct
>> block_device *bdev, u64 start, u64 len,
>>           bytes_left = end - start;
>>       }
>> -    if (bytes_left) {
>> +    while (bytes_left) {
>> +        if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
>> +            bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
>
> That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6,
> by spanning the device extents across multiple devices.
>
> For each device, the maximum size is limited to 1G (check
> init_alloc_chunk_ctl_policy_regular()).
>
> So you can just limit it to 1G instead.
> (If you want, you can also extract that into a macro as a cleanup).

I think SZ_1G is enough for now.

>
> Furthermore, you can use min() instead of a if ().
>
> So you only need:
>
>     bytes_to_discard = min(SZ_1G, bytes_left);
>
> Otherwise this looks good enough to me.
> If the 1G size is not good enough, we can later tune it to smaller values.
>
> Personally speaking I think 1G would be enough.
>
> Thanks,
> Qu

Ack, done in v5

>> +        else
>> +            bytes_to_discard = bytes_left;
>> +
>>           ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>> -                       bytes_left >> SECTOR_SHIFT,
>> +                       bytes_to_discard >> SECTOR_SHIFT,
>>                          GFP_NOFS);
>> -        if (!ret)
>> -            *discarded_bytes += bytes_left;
>> +
>> +        if (ret) {
>> +            if (ret != -EOPNOTSUPP)
>> +                break;
>> +            continue;
>> +        }
>> +
>> +        start += bytes_to_discard;
>> +        bytes_left -= bytes_to_discard;
>> +        *discarded_bytes += bytes_to_discard;
>>       }
>> +
>>       return ret;
>>   }
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..cbe66d0acff8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			       u64 *discarded_bytes)
 {
 	int j, ret = 0;
-	u64 bytes_left, end;
+	u64 bytes_left, bytes_to_discard, end;
 	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
 
 	/* Adjust the range to be aligned to 512B sectors if necessary. */
@@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 		bytes_left = end - start;
 	}
 
-	if (bytes_left) {
+	while (bytes_left) {
+		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
+			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
+		else
+			bytes_to_discard = bytes_left;
+
 		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-					   bytes_left >> SECTOR_SHIFT,
+					   bytes_to_discard >> SECTOR_SHIFT,
 					   GFP_NOFS);
-		if (!ret)
-			*discarded_bytes += bytes_left;
+
+		if (ret) {
+			if (ret != -EOPNOTSUPP)
+				break;
+			continue;
+		}
+
+		start += bytes_to_discard;
+		bytes_left -= bytes_to_discard;
+		*discarded_bytes += bytes_to_discard;
 	}
+
 	return ret;
 }
Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
mostly empty although we will do the split according to our super block
locations, the last super block ends at 256G, we can submit a huge
discard for the range [256G, 8T), causing a super large delay.

We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
in preparation of introduction of cancellation signals handling.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
---
 fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)
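
To make the effect of the split concrete, the following userspace sketch walks the [256G, 8T) range from the commit message in capped steps and counts the requests. It is only a model under stated assumptions: discard_fn, count_discard(), discard_in_chunks() and the SKETCH_* constants are hypothetical names invented here, the callback stands in for a block-layer discard call, and the loop stops on any error, which is a simplification that does not model the patch's -EOPNOTSUPP handling.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9			/* 512-byte sectors (assumption) */
#define SKETCH_SZ_1G (UINT64_C(1) << 30)	/* 1 GiB cap, as discussed in review */

/* Hypothetical stand-in for a discard call: returns 0 on success, negative on error. */
typedef int (*discard_fn)(uint64_t sector, uint64_t nr_sects, void *ctx);

struct discard_stats {
	uint64_t calls;		/* number of requests issued */
	uint64_t largest_bytes;	/* size of the largest single request */
};

/* Fake device: records each request instead of touching any hardware. */
static int count_discard(uint64_t sector, uint64_t nr_sects, void *ctx)
{
	struct discard_stats *st = ctx;
	uint64_t bytes = nr_sects << SKETCH_SECTOR_SHIFT;

	(void)sector;
	st->calls++;
	if (bytes > st->largest_bytes)
		st->largest_bytes = bytes;
	return 0;
}

/* Issue discards for [start, start + len) in pieces of at most max_chunk bytes. */
static int discard_in_chunks(uint64_t start, uint64_t len, uint64_t max_chunk,
			     discard_fn issue, void *ctx,
			     uint64_t *discarded_bytes)
{
	uint64_t bytes_left = len;
	int ret = 0;

	assert(max_chunk > 0);
	while (bytes_left) {
		uint64_t bytes_to_discard =
			bytes_left < max_chunk ? bytes_left : max_chunk;

		ret = issue(start >> SKETCH_SECTOR_SHIFT,
			    bytes_to_discard >> SKETCH_SECTOR_SHIFT, ctx);
		if (ret)
			break;	/* sketch choice: stop on any error */

		start += bytes_to_discard;
		bytes_left -= bytes_to_discard;
		*discarded_bytes += bytes_to_discard;
	}
	return ret;
}
```

Called with start = 256 * SKETCH_SZ_1G, len = (8192 - 256) * SKETCH_SZ_1G and a 1 GiB cap, the model issues 7936 requests of at most 1 GiB each instead of a single request covering the whole range. Each pass through the loop is a natural point at which a later change could check for a cancellation signal.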