Message ID | c1cfe98ea6c2610373d11d4df7c8855e6e98d3dc.1688658745.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs-progs: some zoned mkfs fixups |
On Thu, Jul 06, 2023 at 11:54:00AM -0400, Josef Bacik wrote:
> We currently limit the size of the file system to 5 * the zone size,
> however we actually want to limit it to 7 * the zone size. Fix up the
> comment and the math to match our actual minimum zoned file system size.

Hmm. IS this actually correct? Don't we also need at least a second
metadata and system block group in case the first one fills up and
metadata needs to go somewhere else to be able to reset the previous
ones?

Sorry, should have noticed that last time around.

On Fri, Jul 07, 2023 at 04:38:10AM -0700, Christoph Hellwig wrote:
> On Thu, Jul 06, 2023 at 11:54:00AM -0400, Josef Bacik wrote:
> > We currently limit the size of the file system to 5 * the zone size,
> > however we actually want to limit it to 7 * the zone size. Fix up the
> > comment and the math to match our actual minimum zoned file system size.
>
> Hmm. IS this actually correct? Don't we also need at least a second
> metadata and system block group in case the first one fills up and
> metadata needs to go somewhere else to be able to reset the previous
> ones?
>
> Sorry, should have noticed that last time around.

It depends on what we consider the "minimal" is.

Even with the 5 zones (2 SBs + 1 per BG type), we can start writing to the
file-system.

If you need to run a relocation, one more block group for it is needed.

The fsync block group might be optional because if the fsync node
allocation failed, it should fall back to the full sync. It will kill the
performance but still works...

If we say it is the "minimal" that we can infinitely write and delete a
file without ENOSPC, we need one (or two, depending on the metadata
profile) more BGs per META/SYSTEM.

On Thu, Jul 06, 2023 at 11:54:00AM -0400, Josef Bacik wrote:
> We currently limit the size of the file system to 5 * the zone size,
> however we actually want to limit it to 7 * the zone size. Fix up the
> comment and the math to match our actual minimum zoned file system size.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  mkfs/main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 8d94dac8..c7d7399f 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -84,10 +84,12 @@ struct prepare_device_progress {
>   * 1 zone for the system block group
>   * 1 zone for a metadata block group
>   * 1 zone for a data block group
> + * 1 zone for a relocation block group
> + * 1 zone for the tree log
>   */
>  static u64 min_zoned_fs_size(const char *filename)
>  {
> -	return 5 * zone_size(file);
> +	return 7 * zone_size(file);

When we use DUP profile for METADATA or SYSTEM, we need two zones for
METADATA, SYSTEM, and the tree log BG.

>  }
>
>  static int create_metadata_block_groups(struct btrfs_root *root, bool mixed,
> --
> 2.41.0
>
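
The DUP remark above can be made concrete with a small sketch. This is not btrfs-progs code: `min_zoned_zones` and its `meta_copies` parameter are hypothetical names, and the sketch assumes the patch's counts (2 superblock zones, 1 data zone, 1 relocation zone) stay fixed while the SYSTEM, METADATA, and tree-log block groups each need one zone per copy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of btrfs-progs: count the zones the patch
 * asks for, scaling the metadata-like block groups by the number of
 * copies (1 for single, 2 for DUP).
 */
static uint64_t min_zoned_zones(unsigned int meta_copies)
{
	/* At least one copy of every metadata-like block group is required. */
	assert(meta_copies >= 1);

	const uint64_t sb_zones = 2;      /* primary superblock */
	const uint64_t meta_like_bgs = 3; /* system + metadata + tree log */
	const uint64_t data_bgs = 2;      /* data + relocation */

	return sb_zones + (uint64_t)meta_copies * meta_like_bgs + data_bgs;
}
```

Under these assumptions `min_zoned_zones(1)` gives the 7 zones from the patch and `min_zoned_zones(2)` gives 10 for DUP, so a fixed `7 * zone_size` would be too small in the DUP case.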

On Mon, Jul 10, 2023 at 12:57:52AM +0000, Naohiro Aota wrote:
> It depends on what we consider the "minimal" is.

I think minimal means a file system that can actually be continuously
used.

> Even with the 5 zones (2
> SBs + 1 per BG type), we can start writing to the file-system.
>
> If you need to run a relocation, one more block group for it is needed.
>
> The fsync block group might be optional because if the fsync node
> allocation failed, it should fall back to the full sync. It will kill the
> performance but still works...
>
> If we say it is the "minimal" that we can infinitely write and delete a
> file without ENOSPC, we need one (or two, depending on the metadata
> profile) more BGs per META/SYSTEM.

Based on my above sentence we then need:

  2 zones for the primary superblock

  metadata replication factor * (
      2 zones for the system block group
      2 zone for a metadata block group
      2 zone for the tree log)

  data replication factor * (
      1 zone for a data block group
      1 zone for a relocation block group
  )

where the two for the non-sb, non-data blocks accounts for being able to
continue writing after filling up one bg and allowing gc. In fact even
just two might lead to deadlocks in that case depending on the exact
algorithm in other zoned storage systems, but I don't know enough about
btrfs metadata placement to understand how that works on zoned btrfs
right now.

On Sun, Jul 09, 2023 at 10:28:12PM -0700, hch@infradead.org wrote:
> On Mon, Jul 10, 2023 at 12:57:52AM +0000, Naohiro Aota wrote:
> > It depends on what we consider the "minimal" is.
>
> I think minimal means a file system that can actually be continuously
> used.
>
> > Even with the 5 zones (2
> > SBs + 1 per BG type), we can start writing to the file-system.
> >
> > If you need to run a relocation, one more block group for it is needed.
> >
> > The fsync block group might be optional because if the fsync node
> > allocation failed, it should fall back to the full sync. It will kill the
> > performance but still works...
> >
> > If we say it is the "minimal" that we can infinitely write and delete a
> > file without ENOSPC, we need one (or two, depending on the metadata
> > profile) more BGs per META/SYSTEM.
>
> Based on my above sentence we then need:
>
>   2 zones for the primary superblock
>
>   metadata replication factor * (
>       2 zones for the system block group
>       2 zone for a metadata block group
>       2 zone for the tree log)
>
>   data replication factor * (
>       1 zone for a data block group
>       1 zone for a relocation block group
>   )

I think the relocation should be taken separately, there can be only one
relocation process running per block group type, ie. data/metadata and
the replication depends on the respective factor. Otherwise yeah the
formula for minimal number of zones needs to take into account the
replication and all the normal usage case. In total this is still a low
number, say always below 20 with currently supported profiles. Devices
typically have more and for emulated ones we should scale the size or
zone size properly.

Setting up devices with small number of spare zones is also interesting,
or with small zones that will trigger the reclaim more often.

On Thu, Jul 13, 2023 at 08:19:22PM +0200, David Sterba wrote:
> On Sun, Jul 09, 2023 at 10:28:12PM -0700, hch@infradead.org wrote:
> > On Mon, Jul 10, 2023 at 12:57:52AM +0000, Naohiro Aota wrote:
> > > It depends on what we consider the "minimal" is.
> >
> > I think minimal means a file system that can actually be continuously
> > used.
> >
> > > Even with the 5 zones (2
> > > SBs + 1 per BG type), we can start writing to the file-system.
> > >
> > > If you need to run a relocation, one more block group for it is needed.
> > >
> > > The fsync block group might be optional because if the fsync node
> > > allocation failed, it should fall back to the full sync. It will kill the
> > > performance but still works...
> > >
> > > If we say it is the "minimal" that we can infinitely write and delete a
> > > file without ENOSPC, we need one (or two, depending on the metadata
> > > profile) more BGs per META/SYSTEM.
> >
> > Based on my above sentence we then need:
> >
> >   2 zones for the primary superblock
> >
> >   metadata replication factor * (
> >       2 zones for the system block group
> >       2 zone for a metadata block group
> >       2 zone for the tree log)
> >
> >   data replication factor * (
> >       1 zone for a data block group
> >       1 zone for a relocation block group
> >   )
>
> I think the relocation should be taken separately, there can be only one
> relocation process running per block group type, ie. data/metadata and

The relocation block group is only written with relocated data, and that
data must be written into a dedicated block group. The relocated metadata
can go into the same BG as normal metadata. So, the above calculation
looks good to me.

> the replication depends on the respective factor. Otherwise yeah the
> formula for minimal number of zones needs to take into account the
> replication and all the normal usage case. In total this is still a low
> number, say always below 20 with currently supported profiles. Devices
> typically have more and for emulated ones we should scale the size or
> zone size properly.
>
> Setting up devices with small number of spare zones is also interesting,
> or with small zones that will trigger the reclaim more often.
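
The formula discussed in the thread can be written down as a short sketch to check the "below 20" estimate. This is only an illustration under the thread's stated counts, not what mkfs does: `sustained_min_zones` and `sustained_min_bytes` are hypothetical names, the replication factors are passed in as plain integers, and the relocation block group is grouped with data as in the quoted formula.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not btrfs-progs code: minimum zone count for a
 * file system that can keep being written and garbage collected, using
 * the per-group counts quoted in the thread.
 */
static uint64_t sustained_min_zones(unsigned int meta_factor,
				    unsigned int data_factor)
{
	/* A replication factor below 1 makes no sense. */
	assert(meta_factor >= 1 && data_factor >= 1);

	const uint64_t sb_zones = 2;              /* primary superblock */
	const uint64_t per_meta_copy = 2 + 2 + 2; /* system, metadata, tree log */
	const uint64_t per_data_copy = 1 + 1;     /* data, relocation */

	return sb_zones + meta_factor * per_meta_copy +
	       data_factor * per_data_copy;
}

/* Same count expressed in bytes for a given zone size. */
static uint64_t sustained_min_bytes(uint64_t zone_size,
				    unsigned int meta_factor,
				    unsigned int data_factor)
{
	return sustained_min_zones(meta_factor, data_factor) * zone_size;
}
```

With these assumptions, single metadata and single data come to 10 zones, and DUP metadata with single data comes to 16, which fits the "below 20" estimate for factors of 1 and 2. A metadata factor of 3 would already give 22 under this sketch.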

diff --git a/mkfs/main.c b/mkfs/main.c
index 8d94dac8..c7d7399f 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -84,10 +84,12 @@ struct prepare_device_progress {
  * 1 zone for the system block group
  * 1 zone for a metadata block group
  * 1 zone for a data block group
+ * 1 zone for a relocation block group
+ * 1 zone for the tree log
  */
 static u64 min_zoned_fs_size(const char *filename)
 {
-	return 5 * zone_size(file);
+	return 7 * zone_size(file);
 }

 static int create_metadata_block_groups(struct btrfs_root *root, bool mixed,

We currently limit the size of the file system to 5 * the zone size,
however we actually want to limit it to 7 * the zone size. Fix up the
comment and the math to match our actual minimum zoned file system size.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 mkfs/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)