Message ID | 31ce602fac88f25567a0b3e89037693ec962c1c7.1570479299.git.dennis@kernel.org |
---|---|
State | New, archived |
Series | btrfs: async discard support |
On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> Async discard doesn't remember the discard state of a block_group when
> unmounting or when we crash. So, any block_group that is not fully used
> may have undiscarded regions. However, free space caches are read in on
> demand. Let the discard worker read in the free space cache so we can
> proceed with discarding rather than wait for the block_group to be used.
> This prevents us from indefinitely deferring discards until that
> particular block_group is reused.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

What if we did completely discard the last time, now we're going back and
discarding again?  I think by default we just assume we discarded
everything.  If we didn't then the user can always initiate a fitrim
later.  Drop this one.  Thanks,

Josef
On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > Async discard doesn't remember the discard state of a block_group when
> > unmounting or when we crash. So, any block_group that is not fully used
> > may have undiscarded regions. However, free space caches are read in on
> > demand. Let the discard worker read in the free space cache so we can
> > proceed with discarding rather than wait for the block_group to be used.
> > This prevents us from indefinitely deferring discards until that
> > particular block_group is reused.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
>
> What if we did completely discard the last time, now we're going back and
> discarding again?  I think by default we just assume we discarded
> everything.  If we didn't then the user can always initiate a fitrim
> later.  Drop this one.  Thanks,
>

Yeah this is something I wasn't sure about.

It makes me a little uncomfortable to make the lack of persistence a
user problem. If in some extreme case where someone frees a large amount
of space and then unmounts. We can either make them wait on unmount to
discard everything or retrim the whole drive which in an ideal world
should just be a noop on already free lba space.

If others are in favor of just going the fitrim route for users, I'm
happy to drop this patch, but I do like the fact that this makes the
whole system consistent without user intervention. Does anyone else have
an opinion?

On a side note, the find_free_extent() allocator tries pretty hard
before allocating subsequent block groups. So maybe it's right to just
deprioritize these block groups instead of just not loading them.

Thanks,
Dennis
On Mon, Oct 14, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> > On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > > Async discard doesn't remember the discard state of a block_group when
> > > unmounting or when we crash. So, any block_group that is not fully used
> > > may have undiscarded regions. However, free space caches are read in on
> > > demand. Let the discard worker read in the free space cache so we can
> > > proceed with discarding rather than wait for the block_group to be used.
> > > This prevents us from indefinitely deferring discards until that
> > > particular block_group is reused.
> > >
> > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> >
> > What if we did completely discard the last time, now we're going back and
> > discarding again?  I think by default we just assume we discarded
> > everything.  If we didn't then the user can always initiate a fitrim
> > later.  Drop this one.  Thanks,
> >
>
> Yeah this is something I wasn't sure about.
>
> It makes me a little uncomfortable to make the lack of persistence a
> user problem. If in some extreme case where someone frees a large amount
> of space and then unmounts.

Based on past experience, umount should not be slowed down unless really
necessary.

> We can either make them wait on unmount to
> discard everything or retrim the whole drive which in an ideal world
> should just be a noop on already free lba space.

Without persistence of the state, we can't make it perfect and I think,
without any hard evidence, that trimming already trimmed blocks is no-op
on the device. We all know that we don't know what SSDs actually do, so
it's best effort and making it "device problem" is a good solution from
filesystem POV.
On Tue, Oct 15, 2019 at 01:38:25AM +0200, David Sterba wrote:
> On Mon, Oct 14, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> > > On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > > > Async discard doesn't remember the discard state of a block_group when
> > > > unmounting or when we crash. So, any block_group that is not fully used
> > > > may have undiscarded regions. However, free space caches are read in on
> > > > demand. Let the discard worker read in the free space cache so we can
> > > > proceed with discarding rather than wait for the block_group to be used.
> > > > This prevents us from indefinitely deferring discards until that
> > > > particular block_group is reused.
> > > >
> > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > >
> > > What if we did completely discard the last time, now we're going back and
> > > discarding again?  I think by default we just assume we discarded
> > > everything.  If we didn't then the user can always initiate a fitrim
> > > later.  Drop this one.  Thanks,
> > >
> >
> > Yeah this is something I wasn't sure about.
> >
> > It makes me a little uncomfortable to make the lack of persistence a
> > user problem. If in some extreme case where someone frees a large amount
> > of space and then unmounts.
>
> Based on past experience, umount should not be slowed down unless really
> necessary.
>
> > We can either make them wait on unmount to
> > discard everything or retrim the whole drive which in an ideal world
> > should just be a noop on already free lba space.
>
> Without persistence of the state, we can't make it perfect and I think,
> without any hard evidence, that trimming already trimmed blocks is no-op
> on the device. We all know that we don't know what SSDs actually do, so
> it's best effort and making it "device problem" is a good solution from
> filesystem POV.

That makes sense and sounds good to me. I've dropped this patch.

Thanks,
Dennis
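[Editor's note: the fitrim fallback agreed on above is the FITRIM ioctl, the same request the fstrim(8) utility issues against a mounted filesystem. A minimal userspace sketch follows; the mount point /mnt is a placeholder.]

```c
/*
 * Minimal sketch of the fitrim fallback: ask the filesystem to discard
 * all of its free space, as fstrim(8) does. "/mnt" is a placeholder
 * mount point.
 */
#include <fcntl.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct fstrim_range range = {
		.start = 0,
		.len = (__u64)-1,	/* cover the whole filesystem */
		.minlen = 0,		/* let the fs pick a minimum extent */
	};
	int fd = open("/mnt", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* On success the kernel writes back the number of bytes trimmed. */
	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("ioctl(FITRIM)");
		close(fd);
		return 1;
	}
	printf("trimmed %llu bytes\n", (unsigned long long)range.len);
	close(fd);
	return 0;
}
```

Running `fstrim /mnt` from util-linux performs the equivalent call.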
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 73e5a9384491..684959c96c3f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1859,6 +1859,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 					&info->discard_ctl, cache);
 			else
 				btrfs_mark_bg_unused(cache);
+		} else if (btrfs_test_opt(info, DISCARD_ASYNC)) {
+			btrfs_add_to_discard_list(&info->discard_ctl, cache);
 		}
 	}
 
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 0e4d5a22c661..d99ba31e6f3b 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -246,6 +246,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	int discard_index = 0;
 	u64 trimmed = 0;
 	u64 minlen = 0;
+	int ret;
 
 	discard_ctl = container_of(work, struct btrfs_discard_ctl,
 				   work.work);
@@ -254,6 +255,19 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	if (!cache || !btrfs_run_discard_work(discard_ctl))
 		return;
 
+	if (!btrfs_block_group_cache_done(cache)) {
+		ret = btrfs_cache_block_group(cache, 0);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+		ret = btrfs_wait_block_group_cache_done(cache);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+	}
+
 	minlen = discard_minlen[discard_index];
 
 	if (btrfs_discard_bitmaps(cache)) {
@@ -291,6 +305,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 		}
 	}
 
+out:
 	spin_lock(&discard_ctl->lock);
 	discard_ctl->cache = NULL;
 	spin_unlock(&discard_ctl->lock);
Async discard doesn't remember the discard state of a block_group when
unmounting or when we crash. So, any block_group that is not fully used
may have undiscarded regions. However, free space caches are read in on
demand. Let the discard worker read in the free space cache so we can
proceed with discarding rather than wait for the block_group to be used.
This prevents us from indefinitely deferring discards until that
particular block_group is reused.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/discard.c     | 15 +++++++++++++++
 2 files changed, 17 insertions(+)