mbox series

[00/11] btrfs: zoned: split out data relocation space_info

Message ID cover.1733384171.git.naohiro.aota@wdc.com (mailing list archive)
Headers show
Series btrfs: zoned: split out data relocation space_info | expand

Message

Naohiro Aota Dec. 5, 2024, 7:48 a.m. UTC
As discussed in [1], there is a longstanding early ENOSPC issue on the
zoned mode even with simple fio script. This is also causing blktests
zbd/009 to fail [2].

[1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
[2] https://github.com/osandov/blktests/issues/150

This series is the second part to fix the ENOSPC issue. This series
introduces "space_info sub-space" and use it split a space_info for data
relocation block group.

Current code assumes we have only one space_info for each block group type
(DATA, METADATA, and SYSTEM). We sometime needs multiple space_info to
manage special block groups.

One example is handling the data relocation block group for the zoned mode.
That block group is dedicated for writing relocated data and we cannot
allocate any regular extent from that block group, which is implemented in
the zoned extent allocator. That block group still belongs to the normal
data space_info. So, when all the normal data block groups are full and
there are some free space in the dedicated block group, the space_info
looks to have some free space, while it cannot allocate normal extent
anymore. That results in a strange ENOSPC error. We need to have a
space_info for the relocation data block group to represent the situation
properly.

Naohiro Aota (11):
  btrfs: take btrfs_space_info in btrfs_reserve_data_bytes
  btrfs: take struct btrfs_inode in
    btrfs_free_reserved_data_space_noquota
  btrfs: factor out init_space_info()
  btrfs: spin out do_async_reclaim_data_space()
  btrfs: factor out check_removing_space_info()
  btrfs: introduce space_info argument to btrfs_chunk_alloc
  btrfs: pass space_info for block group creation
  btrfs: introduce btrfs_space_info sub-group
  btrfs: tweak extent/chunk allocation for space_info sub-space
  btrfs: use proper data space_info
  btrfs: reclaim from data sub-space space_info

 fs/btrfs/block-group.c    | 89 ++++++++++++++++++++++++---------------
 fs/btrfs/block-group.h    |  5 ++-
 fs/btrfs/delalloc-space.c | 26 ++++++++----
 fs/btrfs/delalloc-space.h |  3 +-
 fs/btrfs/extent-tree.c    |  5 ++-
 fs/btrfs/inode.c          |  4 +-
 fs/btrfs/relocation.c     |  3 +-
 fs/btrfs/space-info.c     | 84 ++++++++++++++++++++++++------------
 fs/btrfs/space-info.h     | 10 ++++-
 fs/btrfs/sysfs.c          | 16 +++++--
 fs/btrfs/transaction.c    |  2 +-
 fs/btrfs/volumes.c        | 16 ++++---
 fs/btrfs/volumes.h        |  2 +-
 13 files changed, 176 insertions(+), 89 deletions(-)

Comments

Johannes Thumshirn Dec. 7, 2024, 11:35 a.m. UTC | #1
On 05.12.24 08:50, Naohiro Aota wrote:
> As discussed in [1], there is a longstanding early ENOSPC issue on the
> zoned mode even with simple fio script. This is also causing blktests
> zbd/009 to fail [2].
> 
> [1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
> [2] https://github.com/osandov/blktests/issues/150
> 
> This series is the second part to fix the ENOSPC issue. This series
> introduces "space_info sub-space" and use it split a space_info for data
> relocation block group.
> 
> Current code assumes we have only one space_info for each block group type
> (DATA, METADATA, and SYSTEM). We sometime needs multiple space_info to
> manage special block groups.
> 
> One example is handling the data relocation block group for the zoned mode.
> That block group is dedicated for writing relocated data and we cannot
> allocate any regular extent from that block group, which is implemented in
> the zoned extent allocator. That block group still belongs to the normal
> data space_info. So, when all the normal data block groups are full and
> there are some free space in the dedicated block group, the space_info
> looks to have some free space, while it cannot allocate normal extent
> anymore. That results in a strange ENOSPC error. We need to have a
> space_info for the relocation data block group to represent the situation
> properly.

I like the idea and the patches, but I'm a bit concerned it diverges 
zoned and non-zoned btrfs quite a bit in handling relocation. I'd be 
interested what David and Josef think of it. If no one objects to have 
these sub-space_infos zoned specific I'm all good with it as it fixes 
real problems.

Would it be useful to also do the same for regular btrfs? And while 
we're at it, the treelog block-group for zoned mode could benefit form a 
own space-info as well, couldn't it? To not run into premature ENOSPC on 
frequent syncs, or is this unlikely to happen (I'm thinking out loud here).

Byte,
	Johannes
Naohiro Aota Dec. 10, 2024, 5:40 a.m. UTC | #2
On Sat, Dec 07, 2024 at 11:35:04AM +0000, Johannes Thumshirn wrote:
> On 05.12.24 08:50, Naohiro Aota wrote:
> > As discussed in [1], there is a longstanding early ENOSPC issue on the
> > zoned mode even with simple fio script. This is also causing blktests
> > zbd/009 to fail [2].
> > 
> > [1] https://lore.kernel.org/linux-btrfs/cover.1731571240.git.naohiro.aota@wdc.com/
> > [2] https://github.com/osandov/blktests/issues/150
> > 
> > This series is the second part to fix the ENOSPC issue. This series
> > introduces "space_info sub-space" and use it split a space_info for data
> > relocation block group.
> > 
> > Current code assumes we have only one space_info for each block group type
> > (DATA, METADATA, and SYSTEM). We sometime needs multiple space_info to
> > manage special block groups.
> > 
> > One example is handling the data relocation block group for the zoned mode.
> > That block group is dedicated for writing relocated data and we cannot
> > allocate any regular extent from that block group, which is implemented in
> > the zoned extent allocator. That block group still belongs to the normal
> > data space_info. So, when all the normal data block groups are full and
> > there are some free space in the dedicated block group, the space_info
> > looks to have some free space, while it cannot allocate normal extent
> > anymore. That results in a strange ENOSPC error. We need to have a
> > space_info for the relocation data block group to represent the situation
> > properly.
> 
> I like the idea and the patches, but I'm a bit concerned it diverges 
> zoned and non-zoned btrfs quite a bit in handling relocation. I'd be 
> interested what David and Josef think of it. If no one objects to have 
> these sub-space_infos zoned specific I'm all good with it as it fixes 
> real problems.

Hmm, for that point, we already do the relocation a bit different on zoned
vs non-zoned. On the zoned mode, the relocated data is always allocated
from a dedicated block group. This series just move that block group into
its own space_info (aka sub-space_info) to fix a space accounting issue.

I admit the concept of sub-space_info is new, so I'd like to hear David and
Josef's opinion on it.

> 
> Would it be useful to also do the same for regular btrfs? And while 
> we're at it, the treelog block-group for zoned mode could benefit form a 
> own space-info as well, couldn't it? To not run into premature ENOSPC on 
> frequent syncs, or is this unlikely to happen (I'm thinking out loud here).

On the regular mode, it can allocate space for relocation data from any
block group. So, it is not so useful to separate space_info for
that. However, it would be interesting to have a dedicated relocation data
block group as well on the regular mode, because it could reduce the
fragmentation of relocated data. This would be interesting topic to
explore.

Yes, adding treelog sub-space_info is useful too. Apparently, I sometime
see a test failure due to treelog space_info accounting mismatch. I didn't
implement that sub-space_info for now, because I'd like to know first that the
sub-space_info concept itself is the right way to go.

> 
> Byte,
> 	Johannes