Message ID | 31ce602fac88f25567a0b3e89037693ec962c1c7.1570479299.git.dennis@kernel.org |
---|---|
State | New, archived |
Series | btrfs: async discard support |
On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> Async discard doesn't remember the discard state of a block_group when
> unmounting or when we crash. So, any block_group that is not fully used
> may have undiscarded regions. However, free space caches are read in on
> demand. Let the discard worker read in the free space cache so we can
> proceed with discarding rather than wait for the block_group to be used.
> This prevents us from indefinitely deferring discards until that
> particular block_group is reused.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

What if we did completely discard the last time, now we're going back and
discarding again?  I think by default we just assume we discarded
everything.  If we didn't then the user can always initiate a fitrim
later.  Drop this one.  Thanks,

Josef
On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > Async discard doesn't remember the discard state of a block_group when
> > unmounting or when we crash. So, any block_group that is not fully used
> > may have undiscarded regions. However, free space caches are read in on
> > demand. Let the discard worker read in the free space cache so we can
> > proceed with discarding rather than wait for the block_group to be used.
> > This prevents us from indefinitely deferring discards until that
> > particular block_group is reused.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
>
> What if we did completely discard the last time, now we're going back and
> discarding again?  I think by default we just assume we discarded
> everything.  If we didn't then the user can always initiate a fitrim
> later.  Drop this one.  Thanks,
>

Yeah this is something I wasn't sure about.

It makes me a little uncomfortable to make the lack of persistence a
user problem. If in some extreme case where someone frees a large amount
of space and then unmounts. We can either make them wait on unmount to
discard everything or retrim the whole drive which in an ideal world
should just be a noop on already free lba space.

If others are in favor of just going the fitrim route for users, I'm
happy to drop this patch, but I do like the fact that this makes the
whole system consistent without user intervention. Does anyone else have
an opinion?

On a side note, the find_free_extent() allocator tries pretty hard
before allocating subsequent block groups. So maybe it's right to just
deprioritize these block groups instead of just not loading them.

Thanks,
Dennis
On Mon, Oct 14, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> > On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > > Async discard doesn't remember the discard state of a block_group when
> > > unmounting or when we crash. So, any block_group that is not fully used
> > > may have undiscarded regions. However, free space caches are read in on
> > > demand. Let the discard worker read in the free space cache so we can
> > > proceed with discarding rather than wait for the block_group to be used.
> > > This prevents us from indefinitely deferring discards until that
> > > particular block_group is reused.
> > >
> > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> >
> > What if we did completely discard the last time, now we're going back and
> > discarding again?  I think by default we just assume we discarded
> > everything.  If we didn't then the user can always initiate a fitrim
> > later.  Drop this one.  Thanks,
> >
>
> Yeah this is something I wasn't sure about.
>
> It makes me a little uncomfortable to make the lack of persistence a
> user problem. If in some extreme case where someone frees a large amount
> of space and then unmounts.

Based on past experience, umount should not be slowed down unless really
necessary.

> We can either make them wait on unmount to
> discard everything or retrim the whole drive which in an ideal world
> should just be a noop on already free lba space.

Without persistence of the state, we can't make it perfect and I think,
without any hard evidence, that trimming already trimmed blocks is no-op
on the device. We all know that we don't know what SSDs actually do, so
it's best effort and making it "device problem" is a good solution from
filesystem POV.
On Tue, Oct 15, 2019 at 01:38:25AM +0200, David Sterba wrote:
> On Mon, Oct 14, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > On Thu, Oct 10, 2019 at 01:11:38PM -0400, Josef Bacik wrote:
> > > On Mon, Oct 07, 2019 at 04:17:46PM -0400, Dennis Zhou wrote:
> > > > Async discard doesn't remember the discard state of a block_group when
> > > > unmounting or when we crash. So, any block_group that is not fully used
> > > > may have undiscarded regions. However, free space caches are read in on
> > > > demand. Let the discard worker read in the free space cache so we can
> > > > proceed with discarding rather than wait for the block_group to be used.
> > > > This prevents us from indefinitely deferring discards until that
> > > > particular block_group is reused.
> > > >
> > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > >
> > > What if we did completely discard the last time, now we're going back and
> > > discarding again?  I think by default we just assume we discarded
> > > everything.  If we didn't then the user can always initiate a fitrim
> > > later.  Drop this one.  Thanks,
> > >
> >
> > Yeah this is something I wasn't sure about.
> >
> > It makes me a little uncomfortable to make the lack of persistence a
> > user problem. If in some extreme case where someone frees a large amount
> > of space and then unmounts.
>
> Based on past experience, umount should not be slowed down unless really
> necessary.
>
> > We can either make them wait on unmount to
> > discard everything or retrim the whole drive which in an ideal world
> > should just be a noop on already free lba space.
>
> Without persistence of the state, we can't make it perfect and I think,
> without any hard evidence, that trimming already trimmed blocks is no-op
> on the device. We all know that we don't know what SSDs actually do, so
> it's best effort and making it "device problem" is a good solution from
> filesystem POV.

That makes sense and sounds good to me. I've dropped this patch.

Thanks,
Dennis
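[Editor's note: the fitrim fallback agreed on above is the FITRIM ioctl, the same request the fstrim(8) utility issues against a mounted filesystem. A minimal userspace sketch follows; the mount point /mnt is a placeholder.]

```c
/*
 * Minimal sketch of the fitrim fallback: ask the filesystem to discard
 * all of its free space, as fstrim(8) does. "/mnt" is a placeholder
 * mount point.
 */
#include <fcntl.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct fstrim_range range = {
		.start = 0,
		.len = (__u64)-1,	/* cover the whole filesystem */
		.minlen = 0,		/* let the fs pick a minimum extent */
	};
	int fd = open("/mnt", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* On success the kernel writes back the number of bytes trimmed. */
	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("ioctl(FITRIM)");
		close(fd);
		return 1;
	}
	printf("trimmed %llu bytes\n", (unsigned long long)range.len);
	close(fd);
	return 0;
}
```

Running `fstrim /mnt` from util-linux performs the equivalent call.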
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 73e5a9384491..684959c96c3f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1859,6 +1859,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 					&info->discard_ctl, cache);
 			else
 				btrfs_mark_bg_unused(cache);
+		} else if (btrfs_test_opt(info, DISCARD_ASYNC)) {
+			btrfs_add_to_discard_list(&info->discard_ctl, cache);
 		}
 	}
 
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 0e4d5a22c661..d99ba31e6f3b 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -246,6 +246,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	int discard_index = 0;
 	u64 trimmed = 0;
 	u64 minlen = 0;
+	int ret;
 
 	discard_ctl = container_of(work, struct btrfs_discard_ctl,
 				   work.work);
@@ -254,6 +255,19 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	if (!cache || !btrfs_run_discard_work(discard_ctl))
 		return;
 
+	if (!btrfs_block_group_cache_done(cache)) {
+		ret = btrfs_cache_block_group(cache, 0);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+		ret = btrfs_wait_block_group_cache_done(cache);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+	}
+
 	minlen = discard_minlen[discard_index];
 
 	if (btrfs_discard_bitmaps(cache)) {
@@ -291,6 +305,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 		}
 	}
 
+out:
 	spin_lock(&discard_ctl->lock);
 	discard_ctl->cache = NULL;
 	spin_unlock(&discard_ctl->lock);
Async discard doesn't remember the discard state of a block_group when
unmounting or when we crash. So, any block_group that is not fully used
may have undiscarded regions. However, free space caches are read in on
demand. Let the discard worker read in the free space cache so we can
proceed with discarding rather than wait for the block_group to be used.
This prevents us from indefinitely deferring discards until that
particular block_group is reused.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/discard.c     | 15 +++++++++++++++
 2 files changed, 17 insertions(+)