diff mbox series

[15/19] btrfs: load block_groups into discard_list on mount

Message ID 31ce602fac88f25567a0b3e89037693ec962c1c7.1570479299.git.dennis@kernel.org (mailing list archive)
State New, archived
Headers show
Series btrfs: async discard support | expand

Commit Message

Dennis Zhou Oct. 7, 2019, 8:17 p.m. UTC
Async discard doesn't remember the discard state of a block_group when
unmounting or when we crash. So, any block_group that is not fully used
may have undiscarded regions. However, free space caches are read in on
demand. Let the discard worker read in the free space cache so we can
proceed with discarding rather than wait for the block_group to be used.
This prevents us from indefinitely deferring discards until that
particular block_group is reused.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/discard.c     | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

Comments

Josef Bacik Oct. 10, 2019, 5:11 p.m. UTC | #1
On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> Async discard doesn't remember the discard state of a block_group when
> unmounting or when we crash. So, any block_group that is not fully used
> may have undiscarded regions. However, free space caches are read in on
> demand. Let the discard worker read in the free space cache so we can
> proceed with discarding rather than wait for the block_group to be used.
> This prevents us from indefinitely deferring discards until that
> particular block_group is reused.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

What if we did completely discard the last time, now we're going back and
discarding again?  I think by default we just assume we discarded everything.
If we didn't then the user can always initiate a fitrim later.  Drop this one.
Thanks,

Josef
Dennis Zhou Oct. 14, 2019, 8:17 p.m. UTC | #2
On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > Async discard doesn't remember the discard state of a block_group when
> > unmounting or when we crash. So, any block_group that is not fully used
> > may have undiscarded regions. However, free space caches are read in on
> > demand. Let the discard worker read in the free space cache so we can
> > proceed with discarding rather than wait for the block_group to be used.
> > This prevents us from indefinitely deferring discards until that
> > particular block_group is reused.
> > 
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> 
> What if we did completely discard the last time, now we're going back and
> discarding again?  I think by default we just assume we discarded everything.
> If we didn't then the user can always initiate a fitrim later.  Drop this one.
> Thanks,
> 

Yeah this is something I wasn't sure about.

It makes me a little uncomfortable to make the lack of persistence a
user problem. If in some extreme case where someone frees a large amount
of space and then unmounts. We can either make them wait on unmount to
discard everything or retrim the whole drive which in an ideal world
should just be a noop on already free lba space. If others are in favor
of just going the fitrim route for users, I'm happy to drop this patch,
but I do like the fact that this makes the whole system consistent
without user intervention. Does anyone else have an opinion?

On a side note, the find_free_extent() allocator tries pretty hard
before allocating subsequent block groups. So maybe it's right to just
deprioritize these block groups instead of just not loading them.

Thanks, 
Dennis
David Sterba Oct. 14, 2019, 11:38 p.m. UTC | #3
On Mon, Oct 14, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> > On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > > Async discard doesn't remember the discard state of a block_group when
> > > unmounting or when we crash. So, any block_group that is not fully used
> > > may have undiscarded regions. However, free space caches are read in on
> > > demand. Let the discard worker read in the free space cache so we can
> > > proceed with discarding rather than wait for the block_group to be used.
> > > This prevents us from indefinitely deferring discards until that
> > > particular block_group is reused.
> > > 
> > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > 
> > What if we did completely discard the last time, now we're going back and
> > discarding again?  I think by default we just assume we discarded everything.
> > If we didn't then the user can always initiate a fitrim later.  Drop this one.
> > Thanks,
> > 
> 
> Yeah this is something I wasn't sure about.
> 
> It makes me a little uncomfortable to make the lack of persistence a
> user problem. If in some extreme case where someone frees a large amount
> of space and then unmounts.

Based on past experience, umount should not be slowed down unless really
necessary.

> We can either make them wait on unmount to
> discard everything or retrim the whole drive which in an ideal world
> should just be a noop on already free lba space.

Without persistence of the state, we can't make it perfect and I think,
without any hard evidence, that trimming already trimmed blocks is no-op
on the device. We all know that we don't know what SSDs actually do, so
it's best effort and making it "device problem" is a good solution from
filesystem POV.
Dennis Zhou Oct. 15, 2019, 3:42 p.m. UTC | #4
On Tue, Oct 15, 2019 at 01:38:25AM +0200, David Sterba wrote:
> On Mon, Oct 14, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> > > On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > > > Async discard doesn't remember the discard state of a block_group when
> > > > unmounting or when we crash. So, any block_group that is not fully used
> > > > may have undiscarded regions. However, free space caches are read in on
> > > > demand. Let the discard worker read in the free space cache so we can
> > > > proceed with discarding rather than wait for the block_group to be used.
> > > > This prevents us from indefinitely deferring discards until that
> > > > particular block_group is reused.
> > > > 
> > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > 
> > > What if we did completely discard the last time, now we're going back and
> > > discarding again?  I think by default we just assume we discarded everything.
> > > If we didn't then the user can always initiate a fitrim later.  Drop this one.
> > > Thanks,
> > > 
> > 
> > Yeah this is something I wasn't sure about.
> > 
> > It makes me a little uncomfortable to make the lack of persistence a
> > user problem. If in some extreme case where someone frees a large amount
> > of space and then unmounts.
> 
> Based on past experience, umount should not be slowed down unless really
> necessary.
> 
> > We can either make them wait on unmount to
> > discard everything or retrim the whole drive which in an ideal world
> > should just be a noop on already free lba space.
> 
> Without persistence of the state, we can't make it perfect and I think,
> without any hard evidence, that trimming already trimmed blocks is no-op
> on the device. We all know that we don't know what SSDs actually do, so
> it's best effort and making it "device problem" is a good solution from
> filesystem POV.

That makes sense and sounds good to me. I've dropped this patch.

Thanks,
Dennis
diff mbox series

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 73e5a9384491..684959c96c3f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1859,6 +1859,8 @@  int btrfs_read_block_groups(struct btrfs_fs_info *info)
 						&info->discard_ctl, cache);
 			else
 				btrfs_mark_bg_unused(cache);
+		} else if (btrfs_test_opt(info, DISCARD_ASYNC)) {
+			btrfs_add_to_discard_list(&info->discard_ctl, cache);
 		}
 	}
 
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 0e4d5a22c661..d99ba31e6f3b 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -246,6 +246,7 @@  static void btrfs_discard_workfn(struct work_struct *work)
 	int discard_index = 0;
 	u64 trimmed = 0;
 	u64 minlen = 0;
+	int ret;
 
 	discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work);
 
@@ -254,6 +255,19 @@  static void btrfs_discard_workfn(struct work_struct *work)
 	if (!cache || !btrfs_run_discard_work(discard_ctl))
 		return;
 
+	if (!btrfs_block_group_cache_done(cache)) {
+		ret = btrfs_cache_block_group(cache, 0);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+		ret = btrfs_wait_block_group_cache_done(cache);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+	}
+
 	minlen = discard_minlen[discard_index];
 
 	if (btrfs_discard_bitmaps(cache)) {
@@ -291,6 +305,7 @@  static void btrfs_discard_workfn(struct work_struct *work)
 		}
 	}
 
+out:
 	spin_lock(&discard_ctl->lock);
 	discard_ctl->cache = NULL;
 	spin_unlock(&discard_ctl->lock);