Message ID | 20220901074216.1849941-16-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/17] block: export bio_split_rw | expand |
On 01.09.22 09:43, Christoph Hellwig wrote: > To be able to split a write into properly sized zone append commands, > we need a queue_limits structure that contains the least common > denominator suitable for all devices. > This patch conflicts with Shinichiro's patch restoring functionality of the zone emulation mode.
On 9/1/22 16:42, Christoph Hellwig wrote: > To be able to split a write into properly sized zone append commands, > we need a queue_limits structure that contains the least common > denominator suitable for all devices. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/btrfs/ctree.h | 4 +++- > fs/btrfs/zoned.c | 36 ++++++++++++++++++------------------ > fs/btrfs/zoned.h | 1 - > 3 files changed, 21 insertions(+), 20 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 5e57e3c6a1fd6..a37129363e184 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1071,8 +1071,10 @@ struct btrfs_fs_info { > */ > u64 zone_size; > > - /* Max size to emit ZONE_APPEND write command */ > + /* Constraints for ZONE_APPEND commands: */ > + struct queue_limits limits; > u64 max_zone_append_size; Can't we get rid of this one and have the code directly use fs_info->limits.max_zone_append_sectors through a little helper doing a conversion to bytes (a 9 bit shift) ? [...] > /* Count zoned devices */ > list_for_each_entry(device, &fs_devices->devices, dev_list) { > enum blk_zoned_model model; > @@ -685,11 +677,9 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) > ret = -EINVAL; > goto out; > } > - if (!max_zone_append_size || > - (zone_info->max_zone_append_size && > - zone_info->max_zone_append_size < max_zone_append_size)) > - max_zone_append_size = > - zone_info->max_zone_append_size; > + blk_stack_limits(lim, > + &bdev_get_queue(device->bdev)->limits, > + 0); This does: t->max_zone_append_sectors = min(t->max_zone_append_sectors, b->max_zone_append_sectors); So if we are mixing zoned and non-zoned devices in a multi-dev volume, we'll end up with max_zone_append_sectors being 0. The previous code prevented that. Note that I am not sure if it is allowed to mix zoned and non-zoned drives in the same volume. Given that we have a fake zone emulation for non-zoned drives with zoned btrfs, I do not see why it would not work. But I may be wrong.
On 9/2/22 10:56, Damien Le Moal wrote: > On 9/1/22 16:42, Christoph Hellwig wrote: >> To be able to split a write into properly sized zone append commands, >> we need a queue_limits structure that contains the least common >> denominator suitable for all devices. >> >> Signed-off-by: Christoph Hellwig <hch@lst.de> >> --- >> fs/btrfs/ctree.h | 4 +++- >> fs/btrfs/zoned.c | 36 ++++++++++++++++++------------------ >> fs/btrfs/zoned.h | 1 - >> 3 files changed, 21 insertions(+), 20 deletions(-) >> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >> index 5e57e3c6a1fd6..a37129363e184 100644 >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -1071,8 +1071,10 @@ struct btrfs_fs_info { >> */ >> u64 zone_size; >> >> - /* Max size to emit ZONE_APPEND write command */ >> + /* Constraints for ZONE_APPEND commands: */ >> + struct queue_limits limits; >> u64 max_zone_append_size; > > Can't we get rid of this one and have the code directly use > fs_info->limits.max_zone_append_sectors through a little helper doing a > conversion to bytes (a 9 bit shift) ? Note: Only a suggestion, not sure that would be much of a cleanup. > > [...] >> /* Count zoned devices */ >> list_for_each_entry(device, &fs_devices->devices, dev_list) { >> enum blk_zoned_model model; >> @@ -685,11 +677,9 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) >> ret = -EINVAL; >> goto out; >> } >> - if (!max_zone_append_size || >> - (zone_info->max_zone_append_size && >> - zone_info->max_zone_append_size < max_zone_append_size)) >> - max_zone_append_size = >> - zone_info->max_zone_append_size; >> + blk_stack_limits(lim, >> + &bdev_get_queue(device->bdev)->limits, >> + 0); > > This does: > > t->max_zone_append_sectors = min(t->max_zone_append_sectors, > b->max_zone_append_sectors); > > So if we are mixing zoned and non-zoned devices in a multi-dev volume, > we'll end up with max_zone_append_sectors being 0. The previous code > prevented that. > > Note that I am not sure if it is allowed to mix zoned and non-zoned drives > in the same volume. Given that we have a fake zone emulation for non-zoned > drives with zoned btrfs, I do not see why it would not work. But I may be > wrong. >
On Thu, Sep 01, 2022 at 11:28:02AM +0000, Johannes Thumshirn wrote: > On 01.09.22 09:43, Christoph Hellwig wrote: > > To be able to split a write into properly sized zone append commands, > > we need a queue_limits structure that contains the least common > > denominator suitable for all devices. > > > > This patch conflicts with Shinichiro's patch restoring functionality > of the zone emulation mode. Looks like that got into misc-next in the meantime, so I'll rebase on top of that.
On Fri, Sep 02, 2022 at 10:56:40AM +0900, Damien Le Moal wrote: > > - /* Max size to emit ZONE_APPEND write command */ > > + /* Constraints for ZONE_APPEND commands: */ > > + struct queue_limits limits; > > u64 max_zone_append_size; > > Can't we get rid of this one and have the code directly use > fs_info->limits.max_zone_append_sectors through a little helper doing a > conversion to bytes (a 9 bit shift) ? Well, the helper would be a little more complicated, doing three different shifts, a max3 and and ALIGN_DOWN. That's why I thought I'd rather cache the value then recalculating it on every write. But either way would be entirely feasible. > This does: > > t->max_zone_append_sectors = min(t->max_zone_append_sectors, > b->max_zone_append_sectors); > > So if we are mixing zoned and non-zoned devices in a multi-dev volume, > we'll end up with max_zone_append_sectors being 0. The previous code > prevented that. > > Note that I am not sure if it is allowed to mix zoned and non-zoned drives > in the same volume. Given that we have a fake zone emulation for non-zoned > drives with zoned btrfs, I do not see why it would not work. But I may be > wrong. Yes, this could be problematic. I wonder if we need to initialize max_zone_append_sectors to max_hw_sectors by default and use a separate flag if it is actually supported in the block layer if we want to support that.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5e57e3c6a1fd6..a37129363e184 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1071,8 +1071,10 @@ struct btrfs_fs_info { */ u64 zone_size; - /* Max size to emit ZONE_APPEND write command */ + /* Constraints for ZONE_APPEND commands: */ + struct queue_limits limits; u64 max_zone_append_size; + struct mutex zoned_meta_io_lock; spinlock_t treelog_bg_lock; u64 treelog_bg; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 2638f71eec4b6..6e04fbbd76b92 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -415,16 +415,6 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache) nr_sectors = bdev_nr_sectors(bdev); zone_info->zone_size_shift = ilog2(zone_info->zone_size); zone_info->nr_zones = nr_sectors >> ilog2(zone_sectors); - /* - * We limit max_zone_append_size also by max_segments * - * PAGE_SIZE. Technically, we can have multiple pages per segment. But, - * since btrfs adds the pages one by one to a bio, and btrfs cannot - * increase the metadata reservation even if it increases the number of - * extents, it is safe to stick with the limit. - */ - zone_info->max_zone_append_size = - min_t(u64, (u64)bdev_max_zone_append_sectors(bdev) << SECTOR_SHIFT, - (u64)bdev_max_segments(bdev) << PAGE_SHIFT); if (!IS_ALIGNED(nr_sectors, zone_sectors)) zone_info->nr_zones++; @@ -646,14 +636,16 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) { struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct queue_limits *lim = &fs_info->limits; struct btrfs_device *device; u64 zoned_devices = 0; u64 nr_devices = 0; u64 zone_size = 0; - u64 max_zone_append_size = 0; const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); int ret = 0; + blk_set_stacking_limits(lim); + /* Count zoned devices */ list_for_each_entry(device, &fs_devices->devices, dev_list) { enum blk_zoned_model model; @@ -685,11 +677,9 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) ret = -EINVAL; goto out; } - if (!max_zone_append_size || - (zone_info->max_zone_append_size && - zone_info->max_zone_append_size < max_zone_append_size)) - max_zone_append_size = - zone_info->max_zone_append_size; + blk_stack_limits(lim, + &bdev_get_queue(device->bdev)->limits, + 0); } nr_devices++; } @@ -739,8 +729,18 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) } fs_info->zone_size = zone_size; - fs_info->max_zone_append_size = ALIGN_DOWN(max_zone_append_size, - fs_info->sectorsize); + /* + * Also limit max_zone_append_size by max_segments * PAGE_SIZE. + * Technically, we can have multiple pages per segment. But, + * since btrfs adds the pages one by one to a bio, and btrfs cannot + * increase the metadata reservation even if it increases the number of + * extents, it is safe to stick with the limit. + */ + fs_info->max_zone_append_size = ALIGN_DOWN( + min3((u64)lim->max_zone_append_sectors << SECTOR_SHIFT, + (u64)lim->max_sectors << SECTOR_SHIFT, + (u64)lim->max_segments << PAGE_SHIFT), + fs_info->sectorsize); fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; if (fs_info->max_zone_append_size < fs_info->max_extent_size) fs_info->max_extent_size = fs_info->max_zone_append_size; diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index cafa639927050..0f22b22fe359f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -19,7 +19,6 @@ struct btrfs_zoned_device_info { */ u64 zone_size; u8 zone_size_shift; - u64 max_zone_append_size; u32 nr_zones; unsigned int max_active_zones; atomic_t active_zones_left;
To be able to split a write into properly sized zone append commands, we need a queue_limits structure that contains the least common denominator suitable for all devices. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/btrfs/ctree.h | 4 +++- fs/btrfs/zoned.c | 36 ++++++++++++++++++------------------ fs/btrfs/zoned.h | 1 - 3 files changed, 21 insertions(+), 20 deletions(-)