[v10,05/41] btrfs: check and enable ZONED mode

Message ID	104218b8d66fec2e4121203b90e7673ddac19d6a.1605007036.git.naohiro.aota@wdc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> IronPort-SDR: RhopRcQ1Sp2NHa0Wdn2fY/n/4CSDzYTF6dcryj1JHov+37mXa6/WUNWOAd8hgTBrV6qkZ6MwnZ WRGW/8R/DYcK9j2PiWtNfWpqw7qJueIwYQ923RjIlwsMofjfN0TF8hlVT3qRMGWYtj4fGpuvkQ LdsUwXjiTlSBT9OLIfKrPRFqFRkpovry0ZDU+hUb/To2MKGtI7qNhebbXpIfL8GEAUJkXQE9k5 3/YinLJovUQGuI97I5W8vWiB4MCShm/NIo471iWY+/dVxTHmfYwwCmHSXRMKJ2qedConBiFTEq /cE= IronPort-SDR: BO4+psgAL28WnefEmOd9asBEATWMtUH6CLkx+OfPmY6+kW9CEo/GO/PrcKda8B5C5H8dMla3x5 mhfcQuXPwR/HxeVce9wBNizuiBP2+Xgzk+Qsu40qN1TB8zG7L4j0STc0wrRKavB6m+/ykRTsgD z/9X1GgE1ErX3yabhAC9iulN9Y01C0Op+LUHmb804p8qvrLSTbgp4ldRsqC+hSAjB+UmzPqGlE ecwlDUDI7QNM8kWxQ9Kvty6X3JOO1pWc23E0wtn8emWcv3ICDAQBqAUowJLQptMDm1l6mCx3cx cy89U421fy7P4S2x+nEc3IH8 IronPort-SDR: IYDyPkeoUEgIChjiqGpOUP7XriyRE6R2vxhpeD5nrn9FRfyXNIZkJFXrjx5qIGCo+zGUdKlv15 1HXzhdFr6rFXzxPrRcv9OEV3df0zwX6CFMd9lem19LVxoHf09kaFe3mAQi7ntTalztBqJyFXiz tpCPQK0fj5Z7Pl21AbeNEZR1OH0+ohzawajdUU/688NU9pe/uBOoi+jGasSqt3m5jXjoNaB9vb kLKHNmP0Pj6gKw9QBXXyB8xFP4UsNZajdWiXw4w3orcT7umvF9U6k/NvCa7tXOsmXN+dCV63LN mdw= WDCIronportException: Internal From: Naohiro Aota <naohiro.aota@wdc.com> To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@infradead.org>, "Darrick J. Wong" <darrick.wong@oracle.com>, Naohiro Aota <naohiro.aota@wdc.com>, Johannes Thumshirn <johannes.thumshirn@wdc.com>, Damien Le Moal <damien.lemoal@wdc.com>, Josef Bacik <josef@toxicpanda.com> Subject: [PATCH v10 05/41] btrfs: check and enable ZONED mode Date: Tue, 10 Nov 2020 20:26:08 +0900 Message-Id: <104218b8d66fec2e4121203b90e7673ddac19d6a.1605007036.git.naohiro.aota@wdc.com> In-Reply-To: <cover.1605007036.git.naohiro.aota@wdc.com> References: <cover.1605007036.git.naohiro.aota@wdc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	btrfs: zoned block device support \| expand [v10,00/41] btrfs: zoned block device support [v10,01/41] block: add bio_add_zone_append_page [v10,02/41] iomap: support REQ_OP_ZONE_APPEND [v10,03/41] btrfs: introduce ZONED feature flag [v10,04/41] btrfs: get zone information of zoned block devices [v10,05/41] btrfs: check and enable ZONED mode [v10,06/41] btrfs: introduce max_zone_append_size [v10,07/41] btrfs: disallow space_cache in ZONED mode [v10,08/41] btrfs: disallow NODATACOW in ZONED mode [v10,09/41] btrfs: disable fallocate in ZONED mode [v10,10/41] btrfs: disallow mixed-bg in ZONED mode [v10,11/41] btrfs: implement log-structured superblock for ZONED mode [v10,12/41] btrfs: implement zoned chunk allocator [v10,13/41] btrfs: verify device extent is aligned to zone [v10,14/41] btrfs: load zone's alloction offset [v10,15/41] btrfs: emulate write pointer for conventional zones [v10,16/41] btrfs: track unusable bytes for zones [v10,17/41] btrfs: do sequential extent allocation in ZONED mode [v10,18/41] btrfs: reset zones of unused block groups [v10,19/41] btrfs: redirty released extent buffers in ZONED mode [v10,20/41] btrfs: extract page adding function [v10,21/41] btrfs: use bio_add_zone_append_page for zoned btrfs [v10,22/41] btrfs: handle REQ_OP_ZONE_APPEND as writing [v10,23/41] btrfs: split ordered extent when bio is sent [v10,24/41] btrfs: extend btrfs_rmap_block for specifying a device [v10,25/41] btrfs: use ZONE_APPEND write for ZONED btrfs [v10,26/41] btrfs: enable zone append writing for direct IO [v10,27/41] btrfs: introduce dedicated data write path for ZONED mode [v10,28/41] btrfs: serialize meta IOs on ZONED mode [v10,29/41] btrfs: wait existing extents before truncating [v10,30/41] btrfs: avoid async metadata checksum on ZONED mode [v10,31/41] btrfs: mark block groups to copy for device-replace [v10,32/41] btrfs: implement cloning for ZONED device-replace [v10,33/41] btrfs: implement copying for ZONED device-replace [v10,34/41] btrfs: support dev-replace in ZONED mode [v10,35/41] btrfs: enable relocation in ZONED mode [v10,36/41] btrfs: relocate block group to repair IO failure in ZONED [v10,37/41] btrfs: split alloc_log_tree() [v10,38/41] btrfs: extend zoned allocator to use dedicated tree-log block group [v10,39/41] btrfs: serialize log transaction on ZONED mode [v10,40/41] btrfs: reorder log node allocation [v10,41/41] btrfs: enable to mount ZONED incompat flag

Naohiro Aota Nov. 10, 2020, 11:26 a.m. UTC

This commit introduces the function btrfs_check_zoned_mode() to check if
ZONED flag is enabled on the file system and if the file system consists of
zoned devices with equal zone size.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       | 11 ++++++
 fs/btrfs/dev-replace.c |  7 ++++
 fs/btrfs/disk-io.c     | 11 ++++++
 fs/btrfs/super.c       |  1 +
 fs/btrfs/volumes.c     |  5 +++
 fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       | 26 ++++++++++++++
 7 files changed, 142 insertions(+)

Anand Jain Nov. 18, 2020, 11:29 a.m. UTC | #1

On 10/11/20 7:26 pm, Naohiro Aota wrote:
> This commit introduces the function btrfs_check_zoned_mode() to check if
> ZONED flag is enabled on the file system and if the file system consists of
> zoned devices with equal zone size.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/ctree.h       | 11 ++++++
>   fs/btrfs/dev-replace.c |  7 ++++
>   fs/btrfs/disk-io.c     | 11 ++++++
>   fs/btrfs/super.c       |  1 +
>   fs/btrfs/volumes.c     |  5 +++
>   fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/zoned.h       | 26 ++++++++++++++
>   7 files changed, 142 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index aac3d6f4e35b..453f41ca024e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>   	/* Type of exclusive operation running */
>   	unsigned long exclusive_operation;
>   
> +	/* Zone size when in ZONED mode */
> +	union {
> +		u64 zone_size;
> +		u64 zoned;
> +	};
> +
>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>   	spinlock_t ref_verify_lock;
>   	struct rb_root block_tree;
> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>   }
>   #endif
>   
> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
> +{
> +	return fs_info->zoned != 0;
> +}
> +
>   #endif
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 6f6d77224c2b..db87f1aa604b 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>   		return PTR_ERR(bdev);
>   	}
>   
> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
> +		btrfs_err(fs_info,
> +			  "dev-replace: zoned type of target device mismatch with filesystem");
> +		ret = -EINVAL;
> +		goto error;
> +	}
> +
>   	sync_blockdev(bdev);
>   
>   	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {

  I am not sure if it is done in some other patch. But we still have to
  check for

  (model == BLK_ZONED_HA && incompat_zoned))

right? What if in a non-zoned FS, a zoned device is added through the
replace. No?


> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 764001609a15..e76ac4da208d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -42,6 +42,7 @@
>   #include "block-group.h"
>   #include "discard.h"
>   #include "space-info.h"
> +#include "zoned.h"
>   
>   #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
>   				 BTRFS_HEADER_FLAG_RELOC |\
> @@ -2976,6 +2977,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>   	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
>   		btrfs_info(fs_info, "has skinny extents");
>   
> +	fs_info->zoned = features & BTRFS_FEATURE_INCOMPAT_ZONED;
> +
>   	/*
>   	 * flag our filesystem as having big metadata blocks if
>   	 * they are bigger than the page size
> @@ -3130,7 +3133,15 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>   
>   	btrfs_free_extra_devids(fs_devices, 1);
>   
> +	ret = btrfs_check_zoned_mode(fs_info);
> +	if (ret) {
> +		btrfs_err(fs_info, "failed to inititialize zoned mode: %d",
> +			  ret);
> +		goto fail_block_groups;
> +	}
> +
>   	ret = btrfs_sysfs_add_fsid(fs_devices);
> +
>   	if (ret) {
>   		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
>   				ret);
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index ed55014fd1bd..3312fe08168f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -44,6 +44,7 @@
>   #include "backref.h"
>   #include "space-info.h"
>   #include "sysfs.h"
> +#include "zoned.h"
>   #include "tests/btrfs-tests.h"
>   #include "block-group.h"
>   #include "discard.h"
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e787bf89f761..10827892c086 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	if (IS_ERR(bdev))
>   		return PTR_ERR(bdev);
>   
> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
> +		ret = -EINVAL;
> +		goto error;
> +	}
> +
>   	if (fs_devices->seeding) {
>   		seeding_dev = 1;
>   		down_write(&sb->s_umount);


Same here too. It can also happen that a zone device is added to a non 
zoned fs.

Thanks, Anand


> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index b7ffe6670d3a..1223d5b0e411 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -180,3 +180,84 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
>   
>   	return 0;
>   }
> +
> +int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> +	struct btrfs_device *device;
> +	u64 zoned_devices = 0;
> +	u64 nr_devices = 0;
> +	u64 zone_size = 0;
> +	const bool incompat_zoned = btrfs_is_zoned(fs_info);
> +	int ret = 0;
> +
> +	/* Count zoned devices */
> +	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +		enum blk_zoned_model model;
> +
> +		if (!device->bdev)
> +			continue;
> +
> +		model = bdev_zoned_model(device->bdev);
> +		if (model == BLK_ZONED_HM ||
> +		    (model == BLK_ZONED_HA && incompat_zoned)) {
> +			zoned_devices++;
> +			if (!zone_size) {
> +				zone_size = device->zone_info->zone_size;
> +			} else if (device->zone_info->zone_size != zone_size) {
> +				btrfs_err(fs_info,
> +					  "zoned: unequal block device zone sizes: have %llu found %llu",
> +					  device->zone_info->zone_size,
> +					  zone_size);
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +		}
> +		nr_devices++;
> +	}
> +
> +	if (!zoned_devices && !incompat_zoned)
> +		goto out;
> +
> +	if (!zoned_devices && incompat_zoned) {
> +		/* No zoned block device found on ZONED FS */
> +		btrfs_err(fs_info,
> +			  "zoned: no zoned devices found on a zoned filesystem");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (zoned_devices && !incompat_zoned) {
> +		btrfs_err(fs_info,
> +			  "zoned: mode not enabled but zoned device found");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (zoned_devices != nr_devices) {
> +		btrfs_err(fs_info,
> +			  "zoned: cannot mix zoned and regular devices");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * stripe_size is always aligned to BTRFS_STRIPE_LEN in
> +	 * __btrfs_alloc_chunk(). Since we want stripe_len == zone_size,
> +	 * check the alignment here.
> +	 */
> +	if (!IS_ALIGNED(zone_size, BTRFS_STRIPE_LEN)) {
> +		btrfs_err(fs_info,
> +			  "zoned: zone size not aligned to stripe %u",
> +			  BTRFS_STRIPE_LEN);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	fs_info->zone_size = zone_size;
> +
> +	btrfs_info(fs_info, "zoned mode enabled with zone size %llu",
> +		   fs_info->zone_size);
> +out:
> +	return ret;
> +}
> diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
> index c9e69ff87ab9..bcb1cb99a4f3 100644
> --- a/fs/btrfs/zoned.h
> +++ b/fs/btrfs/zoned.h
> @@ -4,6 +4,7 @@
>   #define BTRFS_ZONED_H
>   
>   #include <linux/types.h>
> +#include <linux/blkdev.h>
>   
>   struct btrfs_zoned_device_info {
>   	/*
> @@ -22,6 +23,7 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
>   		       struct blk_zone *zone);
>   int btrfs_get_dev_zone_info(struct btrfs_device *device);
>   void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
> +int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
>   #else /* CONFIG_BLK_DEV_ZONED */
>   static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
>   				     struct blk_zone *zone)
> @@ -36,6 +38,15 @@ static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
>   
>   static inline void btrfs_destroy_dev_zone_info(struct btrfs_device *device) { }
>   
> +static inline int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
> +{
> +	if (!btrfs_is_zoned(fs_info))
> +		return 0;
> +
> +	btrfs_err(fs_info, "Zoned block devices support is not enabled");
> +	return -EOPNOTSUPP;
> +}
> +
>   #endif
>   
>   static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
> @@ -88,4 +99,19 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device,
>   	btrfs_dev_set_empty_zone_bit(device, pos, false);
>   }
>   
> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info *fs_info,
> +						struct block_device *bdev)
> +{
> +	u64 zone_size;
> +
> +	if (btrfs_is_zoned(fs_info)) {
> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
> +		/* Do not allow non-zoned device */
> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
> +	}
> +
> +	/* Do not allow Host Manged zoned device */
> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
> +}
> +
>   #endif
>

David Sterba Nov. 27, 2020, 6:44 p.m. UTC | #2

On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
> On 10/11/20 7:26 pm, Naohiro Aota wrote:
> > This commit introduces the function btrfs_check_zoned_mode() to check if
> > ZONED flag is enabled on the file system and if the file system consists of
> > zoned devices with equal zone size.
> > 
> > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >   fs/btrfs/ctree.h       | 11 ++++++
> >   fs/btrfs/dev-replace.c |  7 ++++
> >   fs/btrfs/disk-io.c     | 11 ++++++
> >   fs/btrfs/super.c       |  1 +
> >   fs/btrfs/volumes.c     |  5 +++
> >   fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
> >   fs/btrfs/zoned.h       | 26 ++++++++++++++
> >   7 files changed, 142 insertions(+)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index aac3d6f4e35b..453f41ca024e 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -948,6 +948,12 @@ struct btrfs_fs_info {
> >   	/* Type of exclusive operation running */
> >   	unsigned long exclusive_operation;
> >   
> > +	/* Zone size when in ZONED mode */
> > +	union {
> > +		u64 zone_size;
> > +		u64 zoned;
> > +	};
> > +
> >   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
> >   	spinlock_t ref_verify_lock;
> >   	struct rb_root block_tree;
> > @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
> >   }
> >   #endif
> >   
> > +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
> > +{
> > +	return fs_info->zoned != 0;
> > +}
> > +
> >   #endif
> > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> > index 6f6d77224c2b..db87f1aa604b 100644
> > --- a/fs/btrfs/dev-replace.c
> > +++ b/fs/btrfs/dev-replace.c
> > @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
> >   		return PTR_ERR(bdev);
> >   	}
> >   
> > +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
> > +		btrfs_err(fs_info,
> > +			  "dev-replace: zoned type of target device mismatch with filesystem");
> > +		ret = -EINVAL;
> > +		goto error;
> > +	}
> > +
> >   	sync_blockdev(bdev);
> >   
> >   	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
> 
>   I am not sure if it is done in some other patch. But we still have to
>   check for
> 
>   (model == BLK_ZONED_HA && incompat_zoned))

Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
btrfs_check_device_zone_type checks for _HM.

> right? What if in a non-zoned FS, a zoned device is added through the
> replace. No?

The types of devices cannot mix, yeah. So I'd like to know the answer as
well.

> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> >   	if (IS_ERR(bdev))
> >   		return PTR_ERR(bdev);
> >   
> > +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
> > +		ret = -EINVAL;
> > +		goto error;
> > +	}
> > +
> >   	if (fs_devices->seeding) {
> >   		seeding_dev = 1;
> >   		down_write(&sb->s_umount);
> 
> Same here too. It can also happen that a zone device is added to a non 
> zoned fs.

Anand Jain Nov. 30, 2020, 12:12 p.m. UTC | #3

On 28/11/20 2:44 am, David Sterba wrote:
> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>> ZONED flag is enabled on the file system and if the file system consists of
>>> zoned devices with equal zone size.
>>>
>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>    fs/btrfs/ctree.h       | 11 ++++++
>>>    fs/btrfs/dev-replace.c |  7 ++++
>>>    fs/btrfs/disk-io.c     | 11 ++++++
>>>    fs/btrfs/super.c       |  1 +
>>>    fs/btrfs/volumes.c     |  5 +++
>>>    fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>    fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>    7 files changed, 142 insertions(+)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index aac3d6f4e35b..453f41ca024e 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>    	/* Type of exclusive operation running */
>>>    	unsigned long exclusive_operation;
>>>    
>>> +	/* Zone size when in ZONED mode */
>>> +	union {
>>> +		u64 zone_size;
>>> +		u64 zoned;
>>> +	};
>>> +
>>>    #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>    	spinlock_t ref_verify_lock;
>>>    	struct rb_root block_tree;
>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>    }
>>>    #endif
>>>    
>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>> +{
>>> +	return fs_info->zoned != 0;
>>> +}
>>> +
>>>    #endif
>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>> index 6f6d77224c2b..db87f1aa604b 100644
>>> --- a/fs/btrfs/dev-replace.c
>>> +++ b/fs/btrfs/dev-replace.c
>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>    		return PTR_ERR(bdev);
>>>    	}
>>>    
>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>> +		btrfs_err(fs_info,
>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>> +		ret = -EINVAL;
>>> +		goto error;
>>> +	}
>>> +
>>>    	sync_blockdev(bdev);
>>>    
>>>    	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>
>>    I am not sure if it is done in some other patch. But we still have to
>>    check for
>>
>>    (model == BLK_ZONED_HA && incompat_zoned))
> 
> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
> btrfs_check_device_zone_type checks for _HM.


Still confusing to me. The below function, which is part of this
patch, says we don't support BLK_ZONED_HM. So does it mean we
allow BLK_ZONED_HA only?

+static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info 
*fs_info,
+						struct block_device *bdev)
+{
+	u64 zone_size;
+
+	if (btrfs_is_zoned(fs_info)) {
+		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
+		/* Do not allow non-zoned device */
+		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+	}
+
+	/* Do not allow Host Manged zoned device */
+	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
+}


Also, if there is a new type of zoned device in the future, the older 
kernel should be able to reject the newer zone device types.

And, if possible could you rename above function to 
btrfs_zone_type_is_valid(). Or better.


>> right? What if in a non-zoned FS, a zoned device is added through the
>> replace. No?
> 
> The types of devices cannot mix, yeah. So I'd like to know the answer as
> well.


>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>    	if (IS_ERR(bdev))
>>>    		return PTR_ERR(bdev);
>>>    
>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>> +		ret = -EINVAL;
>>> +		goto error;
>>> +	}
>>> +
>>>    	if (fs_devices->seeding) {
>>>    		seeding_dev = 1;
>>>    		down_write(&sb->s_umount);
>>
>> Same here too. It can also happen that a zone device is added to a non
>> zoned fs.


Thanks.

Damien Le Moal Nov. 30, 2020, 1:15 p.m. UTC | #4

On 2020/11/30 21:13, Anand Jain wrote:
> On 28/11/20 2:44 am, David Sterba wrote:
>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>> zoned devices with equal zone size.
>>>>
>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>>> ---
>>>>    fs/btrfs/ctree.h       | 11 ++++++
>>>>    fs/btrfs/dev-replace.c |  7 ++++
>>>>    fs/btrfs/disk-io.c     | 11 ++++++
>>>>    fs/btrfs/super.c       |  1 +
>>>>    fs/btrfs/volumes.c     |  5 +++
>>>>    fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>    fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>    7 files changed, 142 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>> --- a/fs/btrfs/ctree.h
>>>> +++ b/fs/btrfs/ctree.h
>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>    	/* Type of exclusive operation running */
>>>>    	unsigned long exclusive_operation;
>>>>    
>>>> +	/* Zone size when in ZONED mode */
>>>> +	union {
>>>> +		u64 zone_size;
>>>> +		u64 zoned;
>>>> +	};
>>>> +
>>>>    #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>    	spinlock_t ref_verify_lock;
>>>>    	struct rb_root block_tree;
>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>    }
>>>>    #endif
>>>>    
>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>> +{
>>>> +	return fs_info->zoned != 0;
>>>> +}
>>>> +
>>>>    #endif
>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>> --- a/fs/btrfs/dev-replace.c
>>>> +++ b/fs/btrfs/dev-replace.c
>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>    		return PTR_ERR(bdev);
>>>>    	}
>>>>    
>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>> +		btrfs_err(fs_info,
>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>> +		ret = -EINVAL;
>>>> +		goto error;
>>>> +	}
>>>> +
>>>>    	sync_blockdev(bdev);
>>>>    
>>>>    	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>
>>>    I am not sure if it is done in some other patch. But we still have to
>>>    check for
>>>
>>>    (model == BLK_ZONED_HA && incompat_zoned))
>>
>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>> btrfs_check_device_zone_type checks for _HM.
> 
> 
> Still confusing to me. The below function, which is part of this
> patch, says we don't support BLK_ZONED_HM. So does it mean we
> allow BLK_ZONED_HA only?
> 
> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info 
> *fs_info,
> +						struct block_device *bdev)
> +{
> +	u64 zone_size;
> +
> +	if (btrfs_is_zoned(fs_info)) {
> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
> +		/* Do not allow non-zoned device */

This comment does not make sense. It should be:

		/* Only allow zoned devices with the same zone size */

> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
> +	}
> +
> +	/* Do not allow Host Manged zoned device */
> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;

The comment is also wrong. It should read:

	/* Allow only host managed zoned devices */

This is because we decided to treat host aware devices in the same way as
regular block devices, since HA drives are backward compatible with regular
block devices.

> +}
> 
> 
> Also, if there is a new type of zoned device in the future, the older 
> kernel should be able to reject the newer zone device types.
> 
> And, if possible could you rename above function to 
> btrfs_zone_type_is_valid(). Or better.
> 
> 
>>> right? What if in a non-zoned FS, a zoned device is added through the
>>> replace. No?
>>
>> The types of devices cannot mix, yeah. So I'd like to know the answer as
>> well.
> 
> 
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>>    	if (IS_ERR(bdev))
>>>>    		return PTR_ERR(bdev);
>>>>    
>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>> +		ret = -EINVAL;
>>>> +		goto error;
>>>> +	}
>>>> +
>>>>    	if (fs_devices->seeding) {
>>>>    		seeding_dev = 1;
>>>>    		down_write(&sb->s_umount);
>>>
>>> Same here too. It can also happen that a zone device is added to a non
>>> zoned fs.
> 
> 
> Thanks.
>

Anand Jain Dec. 1, 2020, 2:19 a.m. UTC | #5

On 30/11/20 9:15 pm, Damien Le Moal wrote:
> On 2020/11/30 21:13, Anand Jain wrote:
>> On 28/11/20 2:44 am, David Sterba wrote:
>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>>> zoned devices with equal zone size.
>>>>>
>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>>>> ---
>>>>>     fs/btrfs/ctree.h       | 11 ++++++
>>>>>     fs/btrfs/dev-replace.c |  7 ++++
>>>>>     fs/btrfs/disk-io.c     | 11 ++++++
>>>>>     fs/btrfs/super.c       |  1 +
>>>>>     fs/btrfs/volumes.c     |  5 +++
>>>>>     fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>>     fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>>     7 files changed, 142 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>>> --- a/fs/btrfs/ctree.h
>>>>> +++ b/fs/btrfs/ctree.h
>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>>     	/* Type of exclusive operation running */
>>>>>     	unsigned long exclusive_operation;
>>>>>     
>>>>> +	/* Zone size when in ZONED mode */
>>>>> +	union {
>>>>> +		u64 zone_size;
>>>>> +		u64 zoned;
>>>>> +	};
>>>>> +
>>>>>     #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>>     	spinlock_t ref_verify_lock;
>>>>>     	struct rb_root block_tree;
>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>>     }
>>>>>     #endif
>>>>>     
>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>>> +{
>>>>> +	return fs_info->zoned != 0;
>>>>> +}
>>>>> +
>>>>>     #endif
>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>>> --- a/fs/btrfs/dev-replace.c
>>>>> +++ b/fs/btrfs/dev-replace.c
>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>>     		return PTR_ERR(bdev);
>>>>>     	}
>>>>>     
>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>> +		btrfs_err(fs_info,
>>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>>> +		ret = -EINVAL;
>>>>> +		goto error;
>>>>> +	}
>>>>> +
>>>>>     	sync_blockdev(bdev);
>>>>>     
>>>>>     	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>>
>>>>     I am not sure if it is done in some other patch. But we still have to
>>>>     check for
>>>>
>>>>     (model == BLK_ZONED_HA && incompat_zoned))
>>>
>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>>> btrfs_check_device_zone_type checks for _HM.
>>
>>
>> Still confusing to me. The below function, which is part of this
>> patch, says we don't support BLK_ZONED_HM. So does it mean we
>> allow BLK_ZONED_HA only?
>>
>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
>> *fs_info,
>> +						struct block_device *bdev)
>> +{
>> +	u64 zone_size;
>> +
>> +	if (btrfs_is_zoned(fs_info)) {
>> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
>> +		/* Do not allow non-zoned device */
> 
> This comment does not make sense. It should be:
> 
> 		/* Only allow zoned devices with the same zone size */
> 
>> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
>> +	}
>> +
>> +	/* Do not allow Host Manged zoned device */
>> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
> 
> The comment is also wrong. It should read:
> 
> 	/* Allow only host managed zoned devices */
> 
> This is because we decided to treat host aware devices in the same way as
> regular block devices, since HA drives are backward compatible with regular
> block devices.


Yeah, I read about them, but I have questions like do an FS work on top 
of a BLK_ZONED_HA without modification?
  Are we ok to replace an HM device with a HA device? Or add a HA device 
to a btrfs on an HM device.

Thanks.

> 
>> +}
>>
>>
>> Also, if there is a new type of zoned device in the future, the older
>> kernel should be able to reject the newer zone device types.
>>
>> And, if possible could you rename above function to
>> btrfs_zone_type_is_valid(). Or better.
>>
>>
>>>> right? What if in a non-zoned FS, a zoned device is added through the
>>>> replace. No?
>>>
>>> The types of devices cannot mix, yeah. So I'd like to know the answer as
>>> well.
>>
>>
>>>>> --- a/fs/btrfs/volumes.c
>>>>> +++ b/fs/btrfs/volumes.c
>>>>> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>>>     	if (IS_ERR(bdev))
>>>>>     		return PTR_ERR(bdev);
>>>>>     
>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>> +		ret = -EINVAL;
>>>>> +		goto error;
>>>>> +	}
>>>>> +
>>>>>     	if (fs_devices->seeding) {
>>>>>     		seeding_dev = 1;
>>>>>     		down_write(&sb->s_umount);
>>>>
>>>> Same here too. It can also happen that a zone device is added to a non
>>>> zoned fs.
>>
>>
>> Thanks.
>>
> 
>

Damien Le Moal Dec. 1, 2020, 2:29 a.m. UTC | #6

On 2020/12/01 11:20, Anand Jain wrote:
> On 30/11/20 9:15 pm, Damien Le Moal wrote:
>> On 2020/11/30 21:13, Anand Jain wrote:
>>> On 28/11/20 2:44 am, David Sterba wrote:
>>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>>>> zoned devices with equal zone size.
>>>>>>
>>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>>>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>>>>> ---
>>>>>>     fs/btrfs/ctree.h       | 11 ++++++
>>>>>>     fs/btrfs/dev-replace.c |  7 ++++
>>>>>>     fs/btrfs/disk-io.c     | 11 ++++++
>>>>>>     fs/btrfs/super.c       |  1 +
>>>>>>     fs/btrfs/volumes.c     |  5 +++
>>>>>>     fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>     fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>>>     7 files changed, 142 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>>>> --- a/fs/btrfs/ctree.h
>>>>>> +++ b/fs/btrfs/ctree.h
>>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>>>     	/* Type of exclusive operation running */
>>>>>>     	unsigned long exclusive_operation;
>>>>>>     
>>>>>> +	/* Zone size when in ZONED mode */
>>>>>> +	union {
>>>>>> +		u64 zone_size;
>>>>>> +		u64 zoned;
>>>>>> +	};
>>>>>> +
>>>>>>     #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>>>     	spinlock_t ref_verify_lock;
>>>>>>     	struct rb_root block_tree;
>>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>>>     }
>>>>>>     #endif
>>>>>>     
>>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>>>> +{
>>>>>> +	return fs_info->zoned != 0;
>>>>>> +}
>>>>>> +
>>>>>>     #endif
>>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>>>> --- a/fs/btrfs/dev-replace.c
>>>>>> +++ b/fs/btrfs/dev-replace.c
>>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>>>     		return PTR_ERR(bdev);
>>>>>>     	}
>>>>>>     
>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>> +		btrfs_err(fs_info,
>>>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto error;
>>>>>> +	}
>>>>>> +
>>>>>>     	sync_blockdev(bdev);
>>>>>>     
>>>>>>     	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>>>
>>>>>     I am not sure if it is done in some other patch. But we still have to
>>>>>     check for
>>>>>
>>>>>     (model == BLK_ZONED_HA && incompat_zoned))
>>>>
>>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>>>> btrfs_check_device_zone_type checks for _HM.
>>>
>>>
>>> Still confusing to me. The below function, which is part of this
>>> patch, says we don't support BLK_ZONED_HM. So does it mean we
>>> allow BLK_ZONED_HA only?
>>>
>>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
>>> *fs_info,
>>> +						struct block_device *bdev)
>>> +{
>>> +	u64 zone_size;
>>> +
>>> +	if (btrfs_is_zoned(fs_info)) {
>>> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
>>> +		/* Do not allow non-zoned device */
>>
>> This comment does not make sense. It should be:
>>
>> 		/* Only allow zoned devices with the same zone size */
>>
>>> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
>>> +	}
>>> +
>>> +	/* Do not allow Host Manged zoned device */
>>> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
>>
>> The comment is also wrong. It should read:
>>
>> 	/* Allow only host managed zoned devices */
>>
>> This is because we decided to treat host aware devices in the same way as
>> regular block devices, since HA drives are backward compatible with regular
>> block devices.
> 
> 
> Yeah, I read about them, but I have questions like do an FS work on top 
> of a BLK_ZONED_HA without modification?

Yes. These drives are fully backward compatible and accept random writes
anywhere. Performance however is potentially a different story as the drive will
eventually need to do internal garbage collection of some sort, exactly like an
SSD, but definitely not at SSD speeds :)

>   Are we ok to replace an HM device with a HA device? Or add a HA device 
> to a btrfs on an HM device.

We have a choice here: we can treat HA drives as regular devices or treat them
as HM devices. Anything in between does not make sense. I am fine either way,
the main reason being that there are no HA drive on the market today that I know
of (this model did not have a lot of success due to the potentially very
unpredictable performance depending on the use case).

So the simplest thing to do is, in my opinion, to ignore their "zone"
characteristics and treat them as regular disks. But treating them as HM drives
is a simple to do too.

Of note is that a host-aware drive will be reported by the block layer as
BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
then the block layer will treat the drive as a regular disk.

> 
> Thanks.
> 
>>
>>> +}
>>>
>>>
>>> Also, if there is a new type of zoned device in the future, the older
>>> kernel should be able to reject the newer zone device types.
>>>
>>> And, if possible could you rename above function to
>>> btrfs_zone_type_is_valid(). Or better.
>>>
>>>
>>>>> right? What if in a non-zoned FS, a zoned device is added through the
>>>>> replace. No?
>>>>
>>>> The types of devices cannot mix, yeah. So I'd like to know the answer as
>>>> well.
>>>
>>>
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>>>>     	if (IS_ERR(bdev))
>>>>>>     		return PTR_ERR(bdev);
>>>>>>     
>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto error;
>>>>>> +	}
>>>>>> +
>>>>>>     	if (fs_devices->seeding) {
>>>>>>     		seeding_dev = 1;
>>>>>>     		down_write(&sb->s_umount);
>>>>>
>>>>> Same here too. It can also happen that a zone device is added to a non
>>>>> zoned fs.
>>>
>>>
>>> Thanks.
>>>
>>
>>
> 
>

Anand Jain Dec. 1, 2020, 5:53 a.m. UTC | #7

On 1/12/20 10:29 am, Damien Le Moal wrote:
> On 2020/12/01 11:20, Anand Jain wrote:
>> On 30/11/20 9:15 pm, Damien Le Moal wrote:
>>> On 2020/11/30 21:13, Anand Jain wrote:
>>>> On 28/11/20 2:44 am, David Sterba wrote:
>>>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>>>>> zoned devices with equal zone size.
>>>>>>>
>>>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>>>>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>>>>>> ---
>>>>>>>      fs/btrfs/ctree.h       | 11 ++++++
>>>>>>>      fs/btrfs/dev-replace.c |  7 ++++
>>>>>>>      fs/btrfs/disk-io.c     | 11 ++++++
>>>>>>>      fs/btrfs/super.c       |  1 +
>>>>>>>      fs/btrfs/volumes.c     |  5 +++
>>>>>>>      fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>      fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>>>>      7 files changed, 142 insertions(+)
>>>>>>>
>>>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>>>>> --- a/fs/btrfs/ctree.h
>>>>>>> +++ b/fs/btrfs/ctree.h
>>>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>>>>      	/* Type of exclusive operation running */
>>>>>>>      	unsigned long exclusive_operation;
>>>>>>>      
>>>>>>> +	/* Zone size when in ZONED mode */
>>>>>>> +	union {
>>>>>>> +		u64 zone_size;
>>>>>>> +		u64 zoned;
>>>>>>> +	};
>>>>>>> +
>>>>>>>      #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>>>>      	spinlock_t ref_verify_lock;
>>>>>>>      	struct rb_root block_tree;
>>>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>>>>      }
>>>>>>>      #endif
>>>>>>>      
>>>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>>>>> +{
>>>>>>> +	return fs_info->zoned != 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>      #endif
>>>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>>>>> --- a/fs/btrfs/dev-replace.c
>>>>>>> +++ b/fs/btrfs/dev-replace.c
>>>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>>>>      		return PTR_ERR(bdev);
>>>>>>>      	}
>>>>>>>      
>>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>>> +		btrfs_err(fs_info,
>>>>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>>>>> +		ret = -EINVAL;
>>>>>>> +		goto error;
>>>>>>> +	}
>>>>>>> +
>>>>>>>      	sync_blockdev(bdev);
>>>>>>>      
>>>>>>>      	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>>>>
>>>>>>      I am not sure if it is done in some other patch. But we still have to
>>>>>>      check for
>>>>>>
>>>>>>      (model == BLK_ZONED_HA && incompat_zoned))
>>>>>
>>>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>>>>> btrfs_check_device_zone_type checks for _HM.
>>>>
>>>>
>>>> Still confusing to me. The below function, which is part of this
>>>> patch, says we don't support BLK_ZONED_HM. So does it mean we
>>>> allow BLK_ZONED_HA only?
>>>>
>>>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
>>>> *fs_info,
>>>> +						struct block_device *bdev)
>>>> +{
>>>> +	u64 zone_size;
>>>> +
>>>> +	if (btrfs_is_zoned(fs_info)) {
>>>> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
>>>> +		/* Do not allow non-zoned device */
>>>
>>> This comment does not make sense. It should be:
>>>
>>> 		/* Only allow zoned devices with the same zone size */
>>>
>>>> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
>>>> +	}
>>>> +
>>>> +	/* Do not allow Host Manged zoned device */
>>>> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
>>>
>>> The comment is also wrong. It should read:
>>>
>>> 	/* Allow only host managed zoned devices */
>>>
>>> This is because we decided to treat host aware devices in the same way as
>>> regular block devices, since HA drives are backward compatible with regular
>>> block devices.
>>
>>
>> Yeah, I read about them, but I have questions like do an FS work on top
>> of a BLK_ZONED_HA without modification?
> 
> Yes. These drives are fully backward compatible and accept random writes
> anywhere. Performance however is potentially a different story as the drive will
> eventually need to do internal garbage collection of some sort, exactly like an
> SSD, but definitely not at SSD speeds :)
> 
>>    Are we ok to replace an HM device with a HA device? Or add a HA device
>> to a btrfs on an HM device.
> 
> We have a choice here: we can treat HA drives as regular devices or treat them
> as HM devices. Anything in between does not make sense. I am fine either way,
> the main reason being that there are no HA drive on the market today that I know
> of (this model did not have a lot of success due to the potentially very
> unpredictable performance depending on the use case).
> 
> So the simplest thing to do is, in my opinion, to ignore their "zone"
> characteristics and treat them as regular disks. But treating them as HM drives
> is a simple to do too.
> > Of note is that a host-aware drive will be reported by the block layer as
> BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
> then the block layer will treat the drive as a regular disk.

IMO. For now, it is better to check for the BLK_ZONED_HA explicitly in a 
non-zoned-btrfs. And check for BLK_ZONED_HM explicitly in a zoned-btrfs. 
This way, if there is another type of BLK_ZONED_xx in the future, we 
have the opportunity to review to support it. As below [1]...

[1]
bool btrfs_check_device_type()
{
	if (bdev_is_zoned()) {
		if (btrfs_is_zoned())
			if (bdev_zoned_model == BLK_ZONED_HM)
			/* also check the zone_size. */
				return true;
		else
			if (bdev_zoned_model == BLK_ZONED_HA)
			/* a regular device and FS, no zone_size to check I think? */
				return true;
	} else {
		if (!btrfs_is_zoned())
			return true
	}

	return false;
}

Thanks.

> 
>>
>> Thanks.
>>
>>>
>>>> +}
>>>>
>>>>
>>>> Also, if there is a new type of zoned device in the future, the older
>>>> kernel should be able to reject the newer zone device types.
>>>>
>>>> And, if possible could you rename above function to
>>>> btrfs_zone_type_is_valid(). Or better.
>>>>
>>>>
>>>>>> right? What if in a non-zoned FS, a zoned device is added through the
>>>>>> replace. No?
>>>>>
>>>>> The types of devices cannot mix, yeah. So I'd like to know the answer as
>>>>> well.
>>>>
>>>>
>>>>>>> --- a/fs/btrfs/volumes.c
>>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>>> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>>>>>      	if (IS_ERR(bdev))
>>>>>>>      		return PTR_ERR(bdev);
>>>>>>>      
>>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>>> +		ret = -EINVAL;
>>>>>>> +		goto error;
>>>>>>> +	}
>>>>>>> +
>>>>>>>      	if (fs_devices->seeding) {
>>>>>>>      		seeding_dev = 1;
>>>>>>>      		down_write(&sb->s_umount);
>>>>>>
>>>>>> Same here too. It can also happen that a zone device is added to a non
>>>>>> zoned fs.
>>>>
>>>>
>>>> Thanks.

Damien Le Moal Dec. 1, 2020, 6:09 a.m. UTC | #8

On 2020/12/01 14:54, Anand Jain wrote:
> On 1/12/20 10:29 am, Damien Le Moal wrote:
>> On 2020/12/01 11:20, Anand Jain wrote:
>>> On 30/11/20 9:15 pm, Damien Le Moal wrote:
>>>> On 2020/11/30 21:13, Anand Jain wrote:
>>>>> On 28/11/20 2:44 am, David Sterba wrote:
>>>>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>>>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>>>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>>>>>> zoned devices with equal zone size.
>>>>>>>>
>>>>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>>>>>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>>>>>>> ---
>>>>>>>>      fs/btrfs/ctree.h       | 11 ++++++
>>>>>>>>      fs/btrfs/dev-replace.c |  7 ++++
>>>>>>>>      fs/btrfs/disk-io.c     | 11 ++++++
>>>>>>>>      fs/btrfs/super.c       |  1 +
>>>>>>>>      fs/btrfs/volumes.c     |  5 +++
>>>>>>>>      fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>      fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>>>>>      7 files changed, 142 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>>>>>> --- a/fs/btrfs/ctree.h
>>>>>>>> +++ b/fs/btrfs/ctree.h
>>>>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>>>>>      	/* Type of exclusive operation running */
>>>>>>>>      	unsigned long exclusive_operation;
>>>>>>>>      
>>>>>>>> +	/* Zone size when in ZONED mode */
>>>>>>>> +	union {
>>>>>>>> +		u64 zone_size;
>>>>>>>> +		u64 zoned;
>>>>>>>> +	};
>>>>>>>> +
>>>>>>>>      #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>>>>>      	spinlock_t ref_verify_lock;
>>>>>>>>      	struct rb_root block_tree;
>>>>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>>>>>      }
>>>>>>>>      #endif
>>>>>>>>      
>>>>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>>>>>> +{
>>>>>>>> +	return fs_info->zoned != 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      #endif
>>>>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>>>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>>>>>> --- a/fs/btrfs/dev-replace.c
>>>>>>>> +++ b/fs/btrfs/dev-replace.c
>>>>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>>>>>      		return PTR_ERR(bdev);
>>>>>>>>      	}
>>>>>>>>      
>>>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>>>> +		btrfs_err(fs_info,
>>>>>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>>>>>> +		ret = -EINVAL;
>>>>>>>> +		goto error;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>>      	sync_blockdev(bdev);
>>>>>>>>      
>>>>>>>>      	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>>>>>
>>>>>>>      I am not sure if it is done in some other patch. But we still have to
>>>>>>>      check for
>>>>>>>
>>>>>>>      (model == BLK_ZONED_HA && incompat_zoned))
>>>>>>
>>>>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>>>>>> btrfs_check_device_zone_type checks for _HM.
>>>>>
>>>>>
>>>>> Still confusing to me. The below function, which is part of this
>>>>> patch, says we don't support BLK_ZONED_HM. So does it mean we
>>>>> allow BLK_ZONED_HA only?
>>>>>
>>>>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
>>>>> *fs_info,
>>>>> +						struct block_device *bdev)
>>>>> +{
>>>>> +	u64 zone_size;
>>>>> +
>>>>> +	if (btrfs_is_zoned(fs_info)) {
>>>>> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
>>>>> +		/* Do not allow non-zoned device */
>>>>
>>>> This comment does not make sense. It should be:
>>>>
>>>> 		/* Only allow zoned devices with the same zone size */
>>>>
>>>>> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
>>>>> +	}
>>>>> +
>>>>> +	/* Do not allow Host Manged zoned device */
>>>>> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
>>>>
>>>> The comment is also wrong. It should read:
>>>>
>>>> 	/* Allow only host managed zoned devices */
>>>>
>>>> This is because we decided to treat host aware devices in the same way as
>>>> regular block devices, since HA drives are backward compatible with regular
>>>> block devices.
>>>
>>>
>>> Yeah, I read about them, but I have questions like do an FS work on top
>>> of a BLK_ZONED_HA without modification?
>>
>> Yes. These drives are fully backward compatible and accept random writes
>> anywhere. Performance however is potentially a different story as the drive will
>> eventually need to do internal garbage collection of some sort, exactly like an
>> SSD, but definitely not at SSD speeds :)
>>
>>>    Are we ok to replace an HM device with a HA device? Or add a HA device
>>> to a btrfs on an HM device.
>>
>> We have a choice here: we can treat HA drives as regular devices or treat them
>> as HM devices. Anything in between does not make sense. I am fine either way,
>> the main reason being that there are no HA drive on the market today that I know
>> of (this model did not have a lot of success due to the potentially very
>> unpredictable performance depending on the use case).
>>
>> So the simplest thing to do is, in my opinion, to ignore their "zone"
>> characteristics and treat them as regular disks. But treating them as HM drives
>> is a simple to do too.
>>> Of note is that a host-aware drive will be reported by the block layer as
>> BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
>> then the block layer will treat the drive as a regular disk.
> 
> IMO. For now, it is better to check for the BLK_ZONED_HA explicitly in a 
> non-zoned-btrfs. And check for BLK_ZONED_HM explicitly in a zoned-btrfs. 

Sure, we can. But since HA drives are backward compatible, not sure the HA check
for non-zoned make sense. As long as the zoned flag is not set, the drive can be
used like a regular disk. If the user really want to use it as a zoned drive,
then it can format with force selecting the zoned flag in btrfs super. Then the
HA drive will be used as a zoned disk, exactly like HM disks.

> This way, if there is another type of BLK_ZONED_xx in the future, we 
> have the opportunity to review to support it. As below [1]...

It is very unlikely that we will see any other zone model. ZNS adopted the HM
model in purpose, to avoid multiplying the possible models, making the ecosystem
effort a nightmare.

> 
> [1]
> bool btrfs_check_device_type()
> {
> 	if (bdev_is_zoned()) {
> 		if (btrfs_is_zoned())
> 			if (bdev_zoned_model == BLK_ZONED_HM)
> 			/* also check the zone_size. */
> 				return true;
> 		else
> 			if (bdev_zoned_model == BLK_ZONED_HA)
> 			/* a regular device and FS, no zone_size to check I think? */
> 				return true;
> 	} else {
> 		if (!btrfs_is_zoned())
> 			return true
> 	}
> 
> 	return false;
> }
> 
> Thanks.

Works for me. May be reverse the conditions to make things easier to read and
understand:

bool btrfs_check_device_type()
{
	if (btrfs_is_zoned()) {
		if (bdev_is_zoned()) {
			/* also check the zone_size. */
			return true;
		}

		/*
		 * Regular device: emulate zones with zone size equal
		 * to device extent size.
		 */
		return true;
	}

	if (bdev_zoned_model == BLK_ZONED_HM) {
		/* Zoned HM device require zoned btrfs */
		return false;
	}

	/* Regular device or zoned HA device used as a regular device */
	return true;
}

Anand Jain Dec. 1, 2020, 7:12 a.m. UTC | #9

On 1/12/20 2:09 pm, Damien Le Moal wrote:
> On 2020/12/01 14:54, Anand Jain wrote:
>> On 1/12/20 10:29 am, Damien Le Moal wrote:
>>> On 2020/12/01 11:20, Anand Jain wrote:
>>>> On 30/11/20 9:15 pm, Damien Le Moal wrote:
>>>>> On 2020/11/30 21:13, Anand Jain wrote:
>>>>>> On 28/11/20 2:44 am, David Sterba wrote:
>>>>>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>>>>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>>>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>>>>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>>>>>>> zoned devices with equal zone size.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>>>>>>>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>>>>>>>>> ---
>>>>>>>>>       fs/btrfs/ctree.h       | 11 ++++++
>>>>>>>>>       fs/btrfs/dev-replace.c |  7 ++++
>>>>>>>>>       fs/btrfs/disk-io.c     | 11 ++++++
>>>>>>>>>       fs/btrfs/super.c       |  1 +
>>>>>>>>>       fs/btrfs/volumes.c     |  5 +++
>>>>>>>>>       fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>       fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>>>>>>       7 files changed, 142 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>>>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>>>>>>> --- a/fs/btrfs/ctree.h
>>>>>>>>> +++ b/fs/btrfs/ctree.h
>>>>>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>>>>>>       	/* Type of exclusive operation running */
>>>>>>>>>       	unsigned long exclusive_operation;
>>>>>>>>>       
>>>>>>>>> +	/* Zone size when in ZONED mode */
>>>>>>>>> +	union {
>>>>>>>>> +		u64 zone_size;
>>>>>>>>> +		u64 zoned;
>>>>>>>>> +	};
>>>>>>>>> +
>>>>>>>>>       #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>>>>>>       	spinlock_t ref_verify_lock;
>>>>>>>>>       	struct rb_root block_tree;
>>>>>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>>>>>>       }
>>>>>>>>>       #endif
>>>>>>>>>       
>>>>>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>>>>>>> +{
>>>>>>>>> +	return fs_info->zoned != 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>       #endif
>>>>>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>>>>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>>>>>>> --- a/fs/btrfs/dev-replace.c
>>>>>>>>> +++ b/fs/btrfs/dev-replace.c
>>>>>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>>>>>>       		return PTR_ERR(bdev);
>>>>>>>>>       	}
>>>>>>>>>       
>>>>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>>>>> +		btrfs_err(fs_info,
>>>>>>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>>>>>>> +		ret = -EINVAL;
>>>>>>>>> +		goto error;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>>       	sync_blockdev(bdev);
>>>>>>>>>       
>>>>>>>>>       	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>>>>>>
>>>>>>>>       I am not sure if it is done in some other patch. But we still have to
>>>>>>>>       check for
>>>>>>>>
>>>>>>>>       (model == BLK_ZONED_HA && incompat_zoned))
>>>>>>>
>>>>>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>>>>>>> btrfs_check_device_zone_type checks for _HM.
>>>>>>
>>>>>>
>>>>>> Still confusing to me. The below function, which is part of this
>>>>>> patch, says we don't support BLK_ZONED_HM. So does it mean we
>>>>>> allow BLK_ZONED_HA only?
>>>>>>
>>>>>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
>>>>>> *fs_info,
>>>>>> +						struct block_device *bdev)
>>>>>> +{
>>>>>> +	u64 zone_size;
>>>>>> +
>>>>>> +	if (btrfs_is_zoned(fs_info)) {
>>>>>> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
>>>>>> +		/* Do not allow non-zoned device */
>>>>>
>>>>> This comment does not make sense. It should be:
>>>>>
>>>>> 		/* Only allow zoned devices with the same zone size */
>>>>>
>>>>>> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* Do not allow Host Manged zoned device */
>>>>>> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
>>>>>
>>>>> The comment is also wrong. It should read:
>>>>>
>>>>> 	/* Allow only host managed zoned devices */
>>>>>
>>>>> This is because we decided to treat host aware devices in the same way as
>>>>> regular block devices, since HA drives are backward compatible with regular
>>>>> block devices.
>>>>
>>>>
>>>> Yeah, I read about them, but I have questions like do an FS work on top
>>>> of a BLK_ZONED_HA without modification?
>>>
>>> Yes. These drives are fully backward compatible and accept random writes
>>> anywhere. Performance however is potentially a different story as the drive will
>>> eventually need to do internal garbage collection of some sort, exactly like an
>>> SSD, but definitely not at SSD speeds :)
>>>
>>>>     Are we ok to replace an HM device with a HA device? Or add a HA device
>>>> to a btrfs on an HM device.
>>>
>>> We have a choice here: we can treat HA drives as regular devices or treat them
>>> as HM devices. Anything in between does not make sense. I am fine either way,
>>> the main reason being that there are no HA drive on the market today that I know
>>> of (this model did not have a lot of success due to the potentially very
>>> unpredictable performance depending on the use case).
>>>
>>> So the simplest thing to do is, in my opinion, to ignore their "zone"
>>> characteristics and treat them as regular disks. But treating them as HM drives
>>> is a simple to do too.
>>>> Of note is that a host-aware drive will be reported by the block layer as
>>> BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
>>> then the block layer will treat the drive as a regular disk.
>>
>> IMO. For now, it is better to check for the BLK_ZONED_HA explicitly in a
>> non-zoned-btrfs. And check for BLK_ZONED_HM explicitly in a zoned-btrfs.
> 
> Sure, we can. But since HA drives are backward compatible, not sure the HA check
> for non-zoned make sense. As long as the zoned flag is not set, the drive can be
> used like a regular disk. If the user really want to use it as a zoned drive,
> then it can format with force selecting the zoned flag in btrfs super. Then the
> HA drive will be used as a zoned disk, exactly like HM disks.
> 
>> This way, if there is another type of BLK_ZONED_xx in the future, we
>> have the opportunity to review to support it. As below [1]...
> 
> It is very unlikely that we will see any other zone model. ZNS adopted the HM
> model in purpose, to avoid multiplying the possible models, making the ecosystem
> effort a nightmare.
> 
>>
>> [1]
>> bool btrfs_check_device_type()
>> {
>> 	if (bdev_is_zoned()) {
>> 		if (btrfs_is_zoned())
>> 			if (bdev_zoned_model == BLK_ZONED_HM)
>> 			/* also check the zone_size. */
>> 				return true;
>> 		else
>> 			if (bdev_zoned_model == BLK_ZONED_HA)
>> 			/* a regular device and FS, no zone_size to check I think? */
>> 				return true;
>> 	} else {
>> 		if (!btrfs_is_zoned())
>> 			return true
>> 	}
>>
>> 	return false;
>> }
>>
>> Thanks.
> 
> Works for me. May be reverse the conditions to make things easier to read and
> understand:
> 
> bool btrfs_check_device_type()
> {
> 	if (btrfs_is_zoned()) {
> 		if (bdev_is_zoned()) {
> 			/* also check the zone_size. */
> 			return true;
> 		}
> 
> 		/*
> 		 * Regular device: emulate zones with zone size equal
> 		 * to device extent size.
> 		 */
> 		return true;
> 	}
> 
> 	if (bdev_zoned_model == BLK_ZONED_HM) {
> 		/* Zoned HM device require zoned btrfs */
> 		return false;
> 	}
> 
> 	/* Regular device or zoned HA device used as a regular device */
> 	return true;
> }
> 
> 

Yeah. Makes sense.

Thanks.

Graham Cobb Dec. 1, 2020, 10:45 a.m. UTC | #10

On 01/12/2020 02:29, Damien Le Moal wrote:
> Yes. These drives are fully backward compatible and accept random writes
> anywhere. Performance however is potentially a different story as the drive will
> eventually need to do internal garbage collection of some sort, exactly like an
> SSD, but definitely not at SSD speeds :)
> 
>>   Are we ok to replace an HM device with a HA device? Or add a HA device 
>> to a btrfs on an HM device.
> 
> We have a choice here: we can treat HA drives as regular devices or treat them
> as HM devices. Anything in between does not make sense. I am fine either way,
> the main reason being that there are no HA drive on the market today that I know
> of (this model did not have a lot of success due to the potentially very
> unpredictable performance depending on the use case).

So there will be no testing against HA drives? And no btrfs developers
will have one? And they have very different timing and possibly failure
modes from "normal" disks when they do GC?

I think there is no option but to disallow them. If HA drives start to
appear in significant numbers then that would be easy enough to change,
after suitable testing.

> Of note is that a host-aware drive will be reported by the block layer as
> BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
> then the block layer will treat the drive as a regular disk.

That is a bit of a shame. With that unfortunate decision in the block
layer, system managers need to realise that partitioning an HA disk
means they may be entering territory untested by their filesystem.

Damien Le Moal Dec. 1, 2020, 11:03 a.m. UTC | #11

On 2020/12/01 19:45, Graham Cobb wrote:
> On 01/12/2020 02:29, Damien Le Moal wrote:
>> Yes. These drives are fully backward compatible and accept random writes
>> anywhere. Performance however is potentially a different story as the drive will
>> eventually need to do internal garbage collection of some sort, exactly like an
>> SSD, but definitely not at SSD speeds :)
>>
>>>   Are we ok to replace an HM device with a HA device? Or add a HA device 
>>> to a btrfs on an HM device.
>>
>> We have a choice here: we can treat HA drives as regular devices or treat them
>> as HM devices. Anything in between does not make sense. I am fine either way,
>> the main reason being that there are no HA drive on the market today that I know
>> of (this model did not have a lot of success due to the potentially very
>> unpredictable performance depending on the use case).
> 
> So there will be no testing against HA drives? And no btrfs developers
> will have one? And they have very different timing and possibly failure
> modes from "normal" disks when they do GC?
> 
> I think there is no option but to disallow them. If HA drives start to
> appear in significant numbers then that would be easy enough to change,
> after suitable testing.

Works for me. Even simpler :)

> 
>> Of note is that a host-aware drive will be reported by the block layer as
>> BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
>> then the block layer will treat the drive as a regular disk.
> 
> That is a bit of a shame. With that unfortunate decision in the block
> layer, system managers need to realise that partitioning an HA disk
> means they may be entering territory untested by their filesystem.

Well, not really. HA drives, per specifications, are backward compatible. If
they are partitioned, the block layer will force a regular drive mode use,
hiding their zoned interface, which is completely optional to use in the first
place.

If by "untested territory" you mean the possibility of hitting drive FW bugs
coming from the added complexity of internal GC, then I would argue that this is
a common territory for any FS on any drive, especially SSDs: device FW bugs do
exist and show up from time to time, even on the simplest of drives.

Christoph Hellwig Dec. 1, 2020, 11:11 a.m. UTC | #12

On Tue, Dec 01, 2020 at 11:03:35AM +0000, Damien Le Moal wrote:
> Well, not really. HA drives, per specifications, are backward compatible. If
> they are partitioned, the block layer will force a regular drive mode use,
> hiding their zoned interface, which is completely optional to use in the first
> place.
> 
> If by "untested territory" you mean the possibility of hitting drive FW bugs
> coming from the added complexity of internal GC, then I would argue that this is
> a common territory for any FS on any drive, especially SSDs: device FW bugs do
> exist and show up from time to time, even on the simplest of drives.

Also note that most cheaper hard drives are using SMR internally,
you'll run into the same GC "issues" as with a HA drive that is used for
random writes.

Damien Le Moal Dec. 1, 2020, 11:27 a.m. UTC | #13

On 2020/12/01 20:11, hch@infradead.org wrote:
> On Tue, Dec 01, 2020 at 11:03:35AM +0000, Damien Le Moal wrote:
>> Well, not really. HA drives, per specifications, are backward compatible. If
>> they are partitioned, the block layer will force a regular drive mode use,
>> hiding their zoned interface, which is completely optional to use in the first
>> place.
>>
>> If by "untested territory" you mean the possibility of hitting drive FW bugs
>> coming from the added complexity of internal GC, then I would argue that this is
>> a common territory for any FS on any drive, especially SSDs: device FW bugs do
>> exist and show up from time to time, even on the simplest of drives.
> 
> Also note that most cheaper hard drives are using SMR internally,
> you'll run into the same GC "issues" as with a HA drive that is used for
> random writes.

Indeed. Good point !

[v10,05/41] btrfs: check and enable ZONED mode

Commit Message

Comments

Patch