
[v6,00/22] btrfs: async discard support

Message ID: cover.1576195673.git.dennis@kernel.org
Series: btrfs: async discard support

Message

Dennis Zhou Dec. 14, 2019, 12:22 a.m. UTC
Hello,

Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
it, but it obviously is right. I believe I fixed the issue by moving the
fully trimmed check outside of the block_group lock. I mistakenly
thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
tree_lock. This clearly isn't the case.

Changes in v6:
 - Move the fully trimmed check outside of the block_group lock.

v5 is available here: [2].

This series is on top of btrfs-devel#misc-next 7ee98bb808e2 + [3] and
[4].

[1] https://lore.kernel.org/linux-btrfs/20191210140438.GU2734@twin.jikos.cz/
[2] https://lore.kernel.org/linux-btrfs/cover.1575919745.git.dennis@kernel.org/
[3] https://lore.kernel.org/linux-btrfs/d934383ea528d920a95b6107daad6023b516f0f4.1576109087.git.dennis@kernel.org/
[4] https://lore.kernel.org/linux-btrfs/20191209193846.18162-1-dennis@kernel.org/

Dennis Zhou (22):
  bitmap: genericize percpu bitmap region iterators
  btrfs: rename DISCARD opt to DISCARD_SYNC
  btrfs: keep track of which extents have been discarded
  btrfs: keep track of cleanliness of the bitmap
  btrfs: add the beginning of async discard, discard workqueue
  btrfs: handle empty block_group removal
  btrfs: discard one region at a time in async discard
  btrfs: add removal calls for sysfs debug/
  btrfs: make UUID/debug have its own kobject
  btrfs: add discard sysfs directory
  btrfs: track discardable extents for async discard
  btrfs: keep track of discardable_bytes
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps

 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/block-group.c      |  87 ++++-
 fs/btrfs/block-group.h      |  30 ++
 fs/btrfs/ctree.h            |  52 ++-
 fs/btrfs/discard.c          | 684 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/discard.h          |  42 +++
 fs/btrfs/disk-io.c          |  15 +-
 fs/btrfs/extent-tree.c      |   8 +-
 fs/btrfs/free-space-cache.c | 611 +++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h |  41 ++-
 fs/btrfs/inode-map.c        |  13 +-
 fs/btrfs/inode.c            |   2 +-
 fs/btrfs/scrub.c            |   7 +-
 fs/btrfs/super.c            |  39 +-
 fs/btrfs/sysfs.c            | 205 ++++++++++-
 fs/btrfs/volumes.c          |   7 +
 include/linux/bitmap.h      |  35 ++
 mm/percpu.c                 |  61 +---
 18 files changed, 1789 insertions(+), 152 deletions(-)
 create mode 100644 fs/btrfs/discard.c
 create mode 100644 fs/btrfs/discard.h

Thanks,
Dennis

Comments

David Sterba Dec. 17, 2019, 2:55 p.m. UTC | #1
On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> Hello,
> 
> Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> it, but it obviously is right. I believe I fixed the issue by moving the
> fully trimmed check outside of the block_group lock. I mistakenly
> thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> tree_lock. This clearly isn't the case.
> 
> Changes in v6:
>  - Move the fully trimmed check outside of the block_group lock.

v6 passed fstests, apart from some weird test failures that don't seem
to be related to the patchset.

Meanwhile I did a manual test of how the discard behaves. The workload was
a series of linux git checkouts of various release tags (i.e. this should
provide some freed extents and coalesce them eventually to get larger
chunks to discard), then a simple large file copy, sync, remove, sync.

The discards going down to the device were following the maximum default
size (64M), but I observed that only one range was discarded per 10
seconds, while the other stats showed there were many more extents to
discard.

For the large file it took about 5-10 cycles to send all the trimmed
ranges; the discardable_extents count decreased by one each time until
it reached ... -1. At this point the discardable bytes were -16384, so
there's some accounting problem.

This happened also when I deleted everything from the filesystem and ran
full balance.

Regarding the slow io submission, I tried to increase the iops value,
default was 10, but 100 and 1000 made no change. Increasing the maximum
discard request size to 128M worked (when there was such a long extent
ready). I was expecting a burst of like 4 consecutive IOs after a 600MB
file is deleted.  I did not try to tweak bps_limit because there was
nothing to limit.

So this is something to fix but otherwise the patchset seems to be ok
for adding to misc-next. Due to the timing of the end of the year and
that we're already at rc2, this will be the main feature in 5.6.
Dennis Zhou Dec. 18, 2019, 12:06 a.m. UTC | #2
On Tue, Dec 17, 2019 at 03:55:41PM +0100, David Sterba wrote:
> On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> > Hello,
> > 
> > Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> > it, but it obviously is right. I believe I fixed the issue by moving the
> > fully trimmed check outside of the block_group lock. I mistakenly
> > thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> > tree_lock. This clearly isn't the case.
> > 
> > Changes in v6:
> >  - Move the fully trimmed check outside of the block_group lock.
> 
> v6 passed fstests, apart from some weird test failures that don't seem
> to be related to the patchset.

Yay!

> 
> Meanwhile I did a manual test of how the discard behaves. The workload was
> a series of linux git checkouts of various release tags (i.e. this should
> provide some freed extents and coalesce them eventually to get larger
> chunks to discard), then a simple large file copy, sync, remove, sync.
> 
> The discards going down to the device were following the maximum default
> size (64M), but I observed that only one range was discarded per 10
> seconds, while the other stats showed there were many more extents to
> discard.
> 
> For the large file it took about 5-10 cycles to send all the trimmed
> ranges; the discardable_extents count decreased by one each time until
> it reached ... -1. At this point the discardable bytes were -16384, so
> there's some accounting problem.
> 
> This happened also when I deleted everything from the filesystem and ran
> full balance.
> 

Oh no :(. I've been trying to repro with some limited checking out and
syncing, then subsequently removing everything and calling balance. It
is coming out to be 0 for me. I'll try harder to repro this and fix it.

> Regarding the slow io submission, I tried to increase the iops value,
> default was 10, but 100 and 1000 made no change. Increasing the maximum
> discard request size to 128M worked (when there was such a long extent
> ready). I was expecting a burst of like 4 consecutive IOs after a 600MB
> file is deleted.  I did not try to tweak bps_limit because there was
> nothing to limit.
> 

Ah, so there's actually a max time between discards, set to 10 seconds:
the delay is calculated so that everything queued would be discarded
over 6 hours. So if we only have 6 extents, we'd discard 1 per hour(ish,
given it decays), but the delay is clamped to at most 10 seconds.
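
Roughly, the delay calculation looks like this (a simplified sketch of
the idea only -- the names and constants below are made up for
illustration, not the actual btrfs code, and the decay is omitted):

/* Illustrative sketch only. */
#define DISCARD_TARGET_MSEC     (6 * 60 * 60 * 1000UL) /* spread over 6h */
#define DISCARD_MAX_DELAY_MSEC  (10 * 1000UL)  /* <= 10s between discards */

static unsigned long discard_calc_delay_msec(unsigned long nr_extents)
{
        unsigned long delay;

        if (!nr_extents)
                return DISCARD_MAX_DELAY_MSEC;

        /* One discard per (target / nr_extents): 6 extents -> 1 per hour. */
        delay = DISCARD_TARGET_MSEC / nr_extents;

        /* But never wait longer than 10 seconds between discards. */
        return min(delay, DISCARD_MAX_DELAY_MSEC);
}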

At least on our servers, we seem to discard at a reasonable rate to
prevent performance penalties during a large number of reads and writes
while keeping write amplification reasonable. Also,
metadata blocks aren't tracked, so on freeing of a whole metadata block
group (minus relocation), we'll trickle discards slightly slower than
expected.


> So this is something to fix but otherwise the patchset seems to be ok
> for adding to misc-next. Due to the timing of the end of the year and
> that we're already at rc2, this will be the main feature in 5.6.

I'll report back if I continue having trouble reproing it.

Thanks, v5.6 sounds good to me!
Dennis
Dennis Zhou Dec. 19, 2019, 2:03 a.m. UTC | #3
On Tue, Dec 17, 2019 at 07:06:00PM -0500, Dennis Zhou wrote:
> On Tue, Dec 17, 2019 at 03:55:41PM +0100, David Sterba wrote:
> > On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> > > Hello,
> > > 
> > > Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> > > it, but it obviously is right. I believe I fixed the issue by moving the
> > > fully trimmed check outside of the block_group lock. I mistakenly
> > > thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> > > tree_lock. This clearly isn't the case.
> > > 
> > > Changes in v6:
> > >  - Move the fully trimmed check outside of the block_group lock.
> > 
> > v6 passed fstests, apart from some weird test failures that don't seem
> > to be related to the patchset.
> 
> Yay!
> 
> > 
> > Meanwhile I did a manual test of how the discard behaves. The workload was
> > a series of linux git checkouts of various release tags (i.e. this should
> > provide some freed extents and coalesce them eventually to get larger
> > chunks to discard), then a simple large file copy, sync, remove, sync.
> > 
> > The discards going down to the device were following the maximum default
> > size (64M), but I observed that only one range was discarded per 10
> > seconds, while the other stats showed there were many more extents to
> > discard.
> > 
> > For the large file it took about 5-10 cycles to send all the trimmed
> > ranges; the discardable_extents count decreased by one each time until
> > it reached ... -1. At this point the discardable bytes were -16384, so
> > there's some accounting problem.
> > 
> > This happened also when I deleted everything from the filesystem and ran
> > full balance.
> > 

Also, were these both on fresh file systems? So it seems it's
reproducible for you?

> 
> Oh no :(. I've been trying to repro with some limited checking out and
> syncing, then subsequently removing everything and calling balance. It
> is coming out to be 0 for me. I'll try harder to repro this and fix it.
> 
> > Regarding the slow io submission, I tried to increase the iops value,
> > default was 10, but 100 and 1000 made no change. Increasing the maximum
> > discard request size to 128M worked (when there was such a long extent
> > ready). I was expecting a burst of like 4 consecutive IOs after a 600MB
> > file is deleted.  I did not try to tweak bps_limit because there was
> > nothing to limit.
> > 
> 
> Ah, so there's actually a max time between discards, set to 10 seconds:
> the delay is calculated so that everything queued would be discarded
> over 6 hours. So if we only have 6 extents, we'd discard 1 per hour(ish,
> given it decays), but the delay is clamped to at most 10 seconds.
> 
> At least on our servers, we seem to discard at a reasonable rate to
> prevent performance penalties during a large number of reads and writes
> while keeping write amplification reasonable. Also,
> metadata blocks aren't tracked, so on freeing of a whole metadata block
> group (minus relocation), we'll trickle discards slightly slower than
> expected.
> 
> 
> > So this is something to fix but otherwise the patchset seems to be ok
> > for adding to misc-next. Due to the timing of the end of the year and
> > that we're already at rc2, this will be the main feature in 5.6.
> 
> I'll report back if I continue having trouble reproing it.
> 

I spent the day trying to repro against ext/dzhou-async-discard-v6
without any luck... I've been running the following:

$ mkfs.btrfs -f /dev/nvme0n1
$ mount -t btrfs -o discard=async /dev/nvme0n1 mnt
$ cd mnt
$ bash ../age_btrfs.sh .

where age_btrfs.sh is from [1].

If I delete arbitrary subvolumes, sync, and then run balance:
$ btrfs balance start --full-balance .
It all seems to resolve to 0 after some time. I haven't seen a negative
case on either of my 2 boxes. I've also tried unmounting and then
remounting, deleting and removing more free space items.

I'm still considering how this can happen. Possibly bad load of free
space cache and then freeing of the block group? Because being off by
just 1 and it not accumulating seems to be a real corner case here.

Adding asserts in btrfs_discard_update_discardable() might give us
insight into which callsite is responsible for going below 0.
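
Something along these lines, maybe (a hypothetical sketch -- the struct
and names below are made up for illustration, not the actual btrfs
definitions):

/* Hypothetical sketch, not the actual patch. */
struct discard_stats {
        atomic64_t discardable_extents;
        atomic64_t discardable_bytes;
};

static void discard_update_stats(struct discard_stats *stats,
                                 s64 extents_delta, s64 bytes_delta)
{
        s64 extents = atomic64_add_return(extents_delta,
                                          &stats->discardable_extents);
        s64 bytes = atomic64_add_return(bytes_delta,
                                        &stats->discardable_bytes);

        /* Catch the callsite that first drives a counter below zero. */
        WARN_ON_ONCE(extents < 0);
        WARN_ON_ONCE(bytes < 0);
}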

[1] https://github.com/osandov/osandov-linux/blob/master/scripts/age_btrfs.sh

Thanks,
Dennis
David Sterba Dec. 19, 2019, 8:06 p.m. UTC | #4
On Wed, Dec 18, 2019 at 08:03:37PM -0600, Dennis Zhou wrote:
> > > This happened also when I deleted everything from the filesystem and ran
> > > full balance.
> 
> Also, were these both on fresh file systems? So it seems it's
> reproducible for you?

Yes the filesystem was freshly created before the test.

No luck reproducing it; I tried to repeat the steps as before, but the
timing must make a difference and the numbers always ended up as 0
(bytes), 0 (extents).

> > I'll report back if I continue having trouble reproing it.
> 
> I spent the day trying to repro against ext/dzhou-async-discard-v6
> without any luck... I've been running the following:
> 
> $ mkfs.btrfs -f /dev/nvme0n1
> $ mount -t btrfs -o discard=async /dev/nvme0n1 mnt
> $ cd mnt
> $ bash ../age_btrfs.sh .
> 
> where age_btrfs.sh is from [1].
> 
> If I delete arbitrary subvolumes, sync, and then run balance:
> $ btrfs balance start --full-balance .
> It all seems to resolve to 0 after some time. I haven't seen a negative
> case on either of my 2 boxes. I've also tried unmounting and then
> remounting, deleting and removing more free space items.
> 
> I'm still considering how this can happen. Possibly bad load of free
> space cache and then freeing of the block group? Because being off by
> just 1 and it not accumulating seems to be a real corner case here.
> 
> Adding asserts in btrfs_discard_update_discardable() might give us
> insight into which callsite is responsible for going below 0.

Yeah more asserts would be good.
David Sterba Dec. 19, 2019, 8:34 p.m. UTC | #5
On Tue, Dec 17, 2019 at 07:06:00PM -0500, Dennis Zhou wrote:
> > Regarding the slow io submission, I tried to increase the iops value,
> > default was 10, but 100 and 1000 made no change. Increasing the maximum
> > discard request size to 128M worked (when there was such a long extent
> > ready). I was expecting a burst of like 4 consecutive IOs after a 600MB
> > file is deleted.  I did not try to tweak bps_limit because there was
> > nothing to limit.
> 
> Ah, so there's actually a max time between discards, set to 10 seconds:
> the delay is calculated so that everything queued would be discarded
> over 6 hours. So if we only have 6 extents, we'd discard 1 per hour(ish,
> given it decays), but the delay is clamped to at most 10 seconds.
> 
> At least on our servers, we seem to discard at a reasonable rate to
> prevent performance penalties during a large number of reads and writes
> while keeping write amplification reasonable. Also,
> metadata blocks aren't tracked, so on freeing of a whole metadata block
> group (minus relocation), we'll trickle discards slightly slower than
> expected.

So after watching the sysfs numbers, my observation is that the overall
strategy of the async discard is to wait for larger ranges and discard
one range every 10 seconds. This is a slow process, but this makes sense
when there are reads or writes going on, so the discard IO penalty is
marginal.

Running full fstrim will flush all the discardable extents, so there's a
way to reset the discardable queue. What I still don't see as optimal is
the single discard request sent per period, namely because there's
the iops_limit knob.

My idea is that on each timeout, up to 'iops_limit' discards of up to
'max_discard_size' each are issued, so the discard batches are large in
total. However, this has an impact on reads and writes and also on the
device itself; I'm not sure whether too-frequent discards would make
things worse (as this is a known problem).
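
In other words, something like this per wakeup (a sketch of the
suggestion, not existing code -- all of the names here are
illustrative):

/* Sketch of the batching idea. */
struct discard_ctl_sketch {
        unsigned int iops_limit;        /* e.g. 10 */
        u64 max_discard_size;           /* e.g. 64M */
};

/* Assumed helpers: check for and issue the next queued range. */
bool peek_next_trim_range(struct discard_ctl_sketch *dctl);
void trim_one_range(struct discard_ctl_sketch *dctl, u64 max_size);

static void discard_workfn_batched(struct discard_ctl_sketch *dctl)
{
        unsigned int i;

        /* Issue up to iops_limit discards per wakeup instead of one... */
        for (i = 0; i < dctl->iops_limit; i++) {
                if (!peek_next_trim_range(dctl))
                        break;
                /* ...each capped at max_discard_size. */
                trim_one_range(dctl, dctl->max_discard_size);
        }
        /* Then sleep for the full period before the next batch. */
}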

I'm interested in more strategies that you may have tested in your
setups, either bps-based or rate-limited, etc. The current one seems OK
for a first implementation, but we might want to tune it once we get
feedback from more users.
Dennis Zhou Dec. 19, 2019, 9:17 p.m. UTC | #6
On Thu, Dec 19, 2019 at 09:34:38PM +0100, David Sterba wrote:
> On Tue, Dec 17, 2019 at 07:06:00PM -0500, Dennis Zhou wrote:
> > > Regarding the slow io submission, I tried to increase the iops value,
> > > default was 10, but 100 and 1000 made no change. Increasing the maximum
> > > discard request size to 128M worked (when there was such a long extent
> > > ready). I was expecting a burst of like 4 consecutive IOs after a 600MB
> > > file is deleted.  I did not try to tweak bps_limit because there was
> > > nothing to limit.
> > 
> > Ah, so there's actually a max time between discards, set to 10 seconds:
> > the delay is calculated so that everything queued would be discarded
> > over 6 hours. So if we only have 6 extents, we'd discard 1 per hour(ish,
> > given it decays), but the delay is clamped to at most 10 seconds.
> > 
> > At least on our servers, we seem to discard at a reasonable rate to
> > prevent performance penalties during a large number of reads and writes
> > while keeping write amplification reasonable. Also,
> > metadata blocks aren't tracked, so on freeing of a whole metadata block
> > group (minus relocation), we'll trickle discards slightly slower than
> > expected.
> 
> So after watching the sysfs numbers, my observation is that the overall
> strategy of the async discard is to wait for larger ranges and discard
> one range every 10 seconds. This is a slow process, but this makes sense
> when there are reads or writes going on so the discard IO penalty is
> marginal.
> 

Yeah, (un)fortunately on our systems we're running chef fairly
frequently, which results in a lot of IO in addition to package
deployment. This actually drives the system to a fairly high
steady-state number of untrimmed extents and results in a somewhat
faster-paced discard rate.

> Running full fstrim will flush all the discardable extents, so there's a
> way to reset the discardable queue. What I still don't see as optimal is
> the single discard request sent per period, namely because there's
> the iops_limit knob.
> 

Yeah, it's not really ideal at the moment for much slower paced systems
such as our own laptops. Adding persistence would also make a big
difference here.

> My idea is that on each timeout, up to 'iops_limit' discards of up to
> 'max_discard_size' each are issued, so the discard batches are large in
> total. However, this has an impact on reads and writes and also on the
> device itself; I'm not sure whether too-frequent discards would make
> things worse (as this is a known problem).
> 

I spent a bit of time looking at the impact of discard on some drives
and my conclusion was that the iops rate is more impactful than the size
of the discards (within reason, which is why there's the
max_discard_size). On a particular drive, I noticed that if I went over
10 iops of discards on a sustained simple read/write workload, the
latencies would double. That's kind of where the 10 iops limit comes
from. Given the latency impact, that's why this more or less trickles
discards down in pieces rather than issuing them as a larger batch.

> I'm interested in more strategies that you may have tested in your
> setups, either bps-based or rate-limited, etc. The current one seems OK
> for a first implementation, but we might want to tune it once we get
> feedback from more users.

Definitely, one of the things I want to do is experiment with different
limits and see how this all correlates with write amplification. I'm
sure there's some happy medium that we can identify that's a lot less
arbitrary than what's currently set forth. I imagine it should result in
some graph where we can correlate the delay and rate of discarding with
a particular write amp given a fixed workload.

Thanks,
Dennis
Dennis Zhou Dec. 19, 2019, 9:22 p.m. UTC | #7
On Thu, Dec 19, 2019 at 09:06:07PM +0100, David Sterba wrote:
> On Wed, Dec 18, 2019 at 08:03:37PM -0600, Dennis Zhou wrote:
> > > > This happened also when I deleted everything from the filesystem and ran
> > > > full balance.
> > 
> > Also, were these both on fresh file systems? So it seems it's
> > reproducible for you?
> 
> Yes the filesystem was freshly created before the test.
> 
> No luck reproducing it, I tried to repeat the steps as before but the
> timing must make a difference and the numbers always ended up as 0
> (bytes) 0 (extents).
> 
> > > I'll report back if I continue having trouble reproing it.
> > 
> > I spent the day trying to repro against ext/dzhou-async-discard-v6
> > without any luck... I've been running the following:
> > 
> > $ mkfs.btrfs -f /dev/nvme0n1
> > $ mount -t btrfs -o discard=async /dev/nvme0n1 mnt
> > $ cd mnt
> > $ bash ../age_btrfs.sh .
> > 
> > where age_btrfs.sh is from [1].
> > 
> > If I delete arbitrary subvolumes, sync, and then run balance:
> > $ btrfs balance start --full-balance .
> > It all seems to resolve to 0 after some time. I haven't seen a negative
> > case on either of my 2 boxes. I've also tried unmounting and then
> > remounting, deleting and removing more free space items.
> > 
> > I'm still considering how this can happen. Possibly bad load of free
> > space cache and then freeing of the block group? Because being off by
> > just 1 and it not accumulating seems to be a real corner case here.
> > 
> > Adding asserts in btrfs_discard_update_discardable() might give us
> > insight into which callsite is responsible for going below 0.
> 
> Yeah more asserts would be good.

I'll add a few assert patches and some code to ensure that life can
still move on properly if we do hit the -1 case. I think it probably has
something to do with free space cache removal, as it can't be a simple
corner case; otherwise we'd see the -1 accumulating much more easily.
What does puzzle me is that it's a single nodesize that I'm off by and
not some other random number.
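
For the "life can still move on" part, I'm thinking of something like
the following (hypothetical -- the helper and naming are made up):

/* Hypothetical underflow guard, not the actual patch. */
static s64 discard_sub_clamped(atomic64_t *counter, s64 delta)
{
        s64 new_val = atomic64_add_return(-delta, counter);

        /*
         * Complain loudly once, then clamp so the accounting keeps
         * working. (The reset is racy, but good enough as a debug aid.)
         */
        if (WARN_ON_ONCE(new_val < 0)) {
                atomic64_set(counter, 0);
                new_val = 0;
        }
        return new_val;
}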

Thanks,
Dennis
David Sterba Dec. 30, 2019, 6:13 p.m. UTC | #8
On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> Hello,
> 
> Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> it, but it obviously is right. I believe I fixed the issue by moving the
> fully trimmed check outside of the block_group lock. I mistakenly
> thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> tree_lock. This clearly isn't the case.
> 
> Changes in v6:
>  - Move the fully trimmed check outside of the block_group lock.
> 
> v5 is available here: [2].
> 
> This series is on top of btrfs-devel#misc-next 7ee98bb808e2 + [3] and
> [4].
> 
> [1] https://lore.kernel.org/linux-btrfs/20191210140438.GU2734@twin.jikos.cz/
> [2] https://lore.kernel.org/linux-btrfs/cover.1575919745.git.dennis@kernel.org/
> [3] https://lore.kernel.org/linux-btrfs/d934383ea528d920a95b6107daad6023b516f0f4.1576109087.git.dennis@kernel.org/
> [4] https://lore.kernel.org/linux-btrfs/20191209193846.18162-1-dennis@kernel.org/
> 
> Dennis Zhou (22):
>   bitmap: genericize percpu bitmap region iterators
>   btrfs: rename DISCARD opt to DISCARD_SYNC
>   btrfs: keep track of which extents have been discarded
>   btrfs: keep track of cleanliness of the bitmap
>   btrfs: add the beginning of async discard, discard workqueue
>   btrfs: handle empty block_group removal
>   btrfs: discard one region at a time in async discard
>   btrfs: add removal calls for sysfs debug/
>   btrfs: make UUID/debug have its own kobject
>   btrfs: add discard sysfs directory
>   btrfs: track discardable extents for async discard
>   btrfs: keep track of discardable_bytes
>   btrfs: calculate discard delay based on number of extents
>   btrfs: add bps discard rate limit
>   btrfs: limit max discard size for async discard
>   btrfs: make max async discard size tunable
>   btrfs: have multiple discard lists
>   btrfs: only keep track of data extents for async discard
>   btrfs: keep track of discard reuse stats
>   btrfs: add async discard header
>   btrfs: increase the metadata allowance for the free_space_cache
>   btrfs: make smaller extents more likely to go into bitmaps

Patches 1-12 are merged to a temporary misc-next but I haven't pushed it
as misc-next yet (it's misc-next-with-discard-v6 in my github repo).
There are some comments on patches 13 and up, so please send the updates
either as replies or as a shorter series. Thanks.
Dennis Zhou Dec. 30, 2019, 6:49 p.m. UTC | #9
On Mon, Dec 30, 2019 at 07:13:18PM +0100, David Sterba wrote:
> On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> > Hello,
> > 
> > Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> > it, but it obviously is right. I believe I fixed the issue by moving the
> > fully trimmed check outside of the block_group lock. I mistakenly
> > thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> > tree_lock. This clearly isn't the case.
> > 
> > Changes in v6:
> >  - Move the fully trimmed check outside of the block_group lock.
> > 
> > v5 is available here: [2].
> > 
> > This series is on top of btrfs-devel#misc-next 7ee98bb808e2 + [3] and
> > [4].
> > 
> > [1] https://lore.kernel.org/linux-btrfs/20191210140438.GU2734@twin.jikos.cz/
> > [2] https://lore.kernel.org/linux-btrfs/cover.1575919745.git.dennis@kernel.org/
> > [3] https://lore.kernel.org/linux-btrfs/d934383ea528d920a95b6107daad6023b516f0f4.1576109087.git.dennis@kernel.org/
> > [4] https://lore.kernel.org/linux-btrfs/20191209193846.18162-1-dennis@kernel.org/
> > 
> > Dennis Zhou (22):
> >   bitmap: genericize percpu bitmap region iterators
> >   btrfs: rename DISCARD opt to DISCARD_SYNC
> >   btrfs: keep track of which extents have been discarded
> >   btrfs: keep track of cleanliness of the bitmap
> >   btrfs: add the beginning of async discard, discard workqueue
> >   btrfs: handle empty block_group removal
> >   btrfs: discard one region at a time in async discard
> >   btrfs: add removal calls for sysfs debug/
> >   btrfs: make UUID/debug have its own kobject
> >   btrfs: add discard sysfs directory
> >   btrfs: track discardable extents for async discard
> >   btrfs: keep track of discardable_bytes
> >   btrfs: calculate discard delay based on number of extents
> >   btrfs: add bps discard rate limit
> >   btrfs: limit max discard size for async discard
> >   btrfs: make max async discard size tunable
> >   btrfs: have multiple discard lists
> >   btrfs: only keep track of data extents for async discard
> >   btrfs: keep track of discard reuse stats
> >   btrfs: add async discard header
> >   btrfs: increase the metadata allowance for the free_space_cache
> >   btrfs: make smaller extents more likely to go into bitmaps
> 
> Patches 1-12 are merged to a temporary misc-next but I haven't pushed it
> as misc-next yet (it's misc-next-with-discard-v6 in my github repo).
> There are some comments on patches 13 and up, so please send the updates
> either as replies or as a shorter series. Thanks.

Great! Thanks for taking another pass at it all. Would you prefer a pull
request or just another series? I'll throw a couple of patches on top to
hopefully address the -1 (I'm still not fully sure how it can happen).

Thanks,
Dennis
David Sterba Jan. 2, 2020, 1:22 p.m. UTC | #10
On Mon, Dec 30, 2019 at 01:49:30PM -0500, Dennis Zhou wrote:
> On Mon, Dec 30, 2019 at 07:13:18PM +0100, David Sterba wrote:
> > On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> > Patches 1-12 are merged to a temporary misc-next but I haven't pushed it
> > as misc-next yet (it's misc-next-with-discard-v6 in my github repo).
> > There are some comments on patches 13 and up, so please send the updates
> > either as replies or as a shorter series. Thanks.
> 
> Great! Thanks for taking another pass at it all. Would you prefer a pull
> request or just another series? I'll throw a couple of patches on top to
> hopefully address the -1 (I'm still not fully sure how it can happen).

Please send the patch series; you can add a link to the git repo/branch
but that's not necessary.