btrfs: zoned: clear SB magic on conventional zone


Message ID

98ef25697d52cd3e17b44a846e60eba9e5dfb39c.1726193590.git.naohiro.aota@wdc.com (mailing list archive)

State

New, archived

Headers

IronPort-SDR: 66e39394_q/ZQSO3sDLEWODcjTnF3mJHDQuSGMObl5VScXevBWEd9JUu
 8vRFmJZeulA2+FMVRwDdWEpAn+LU9Yzp2WZOSeA==
WDCIronportException: Internal
From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org
Cc: xuefer@gmail.com,
	Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH] btrfs: zoned: clear SB magic on conventional zone
Date: Fri, 13 Sep 2024 11:14:27 +0900
Message-ID: 
 <98ef25697d52cd3e17b44a846e60eba9e5dfb39c.1726193590.git.naohiro.aota@wdc.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

btrfs: zoned: clear SB magic on conventional zone

Commit Message

Naohiro Aota Sept. 13, 2024, 2:14 a.m. UTC

btrfs_reset_sb_log_zones() properly resets the SB log zones, and thereby
clears the SB magic, if the first zone of the SB log zones is not a
conventional zone. However, it leaves the SB magic on a conventional zone
intact. As a result, "btrfs device delete" cannot remove the SB magic from
a conventional zone, so re-adding the disk results in an error.

If the first zone is conventional, use the same logic as
btrfs_scratch_superblock() to remove the magic.

Reported-by: Xuefer <xuefer@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219170
Fixes: 12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c |  6 ++--
 fs/btrfs/volumes.h |  2 ++
 fs/btrfs/zoned.c   | 80 ++++++++++++++++++++++++++++++++++------------
 fs/btrfs/zoned.h   |  4 +--
 4 files changed, 67 insertions(+), 25 deletions(-)
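The conventional-zone path is a read-modify-write of one superblock copy in which only the magic changes. As a rough user-space illustration of the same idea (not the kernel code), the sketch below zeroes the magic of the copy at a given byte offset. It assumes the standard on-disk layout, where the 64-bit magic `_BHRfS_M` sits at offset 0x40 of the 4 KiB superblock (after the checksum, fsid, bytenr and flags fields). `scrub_sb_magic()` and the `SB_*` macros are hypothetical names, and checking for the magic before clearing it is a choice of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SB_SIZE         4096        /* one superblock copy */
#define SB_MAGIC_OFF    0x40        /* assumed offset of the magic field */
#define SB_MAGIC_LEN    8           /* the magic is a 64-bit field */
#define BTRFS_MAGIC_STR "_BHRfS_M"

/*
 * Hypothetical helper: clear the magic of the superblock copy that starts
 * at byte offset sb_bytenr of dev and leave every other byte untouched.
 * Returns 0 on success, -ENOENT if no btrfs magic is found there,
 * -EIO on a short read/write, or another negative errno.
 */
static int scrub_sb_magic(const char *dev, off_t sb_bytenr)
{
	static const unsigned char zeroes[SB_MAGIC_LEN];
	unsigned char sb[SB_SIZE];
	ssize_t n;
	int ret = 0;
	int fd;

	fd = open(dev, O_RDWR);
	if (fd < 0)
		return -errno;

	/* Read the whole copy first, as btrfs_read_disk_super() does. */
	n = pread(fd, sb, sizeof(sb), sb_bytenr);
	if (n != (ssize_t)sizeof(sb)) {
		ret = n < 0 ? -errno : -EIO;
		goto out;
	}
	if (memcmp(sb + SB_MAGIC_OFF, BTRFS_MAGIC_STR, SB_MAGIC_LEN) != 0) {
		ret = -ENOENT;  /* no signature to remove */
		goto out;
	}

	/* Overwrite only the magic, then flush it (cf. sync_blockdev_range()). */
	n = pwrite(fd, zeroes, SB_MAGIC_LEN, sb_bytenr + SB_MAGIC_OFF);
	if (n != SB_MAGIC_LEN)
		ret = n < 0 ? -errno : -EIO;
	else if (fsync(fd) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

On a regular device the primary copy is at 64 KiB; on a zoned device with a conventional first SB log zone, the patch uses the start of that zone (`zone->start << SECTOR_SHIFT`). Overwriting in place only works where random writes are allowed, which is why sequential-write-required zones are reset instead.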

Comments

Johannes Thumshirn Sept. 13, 2024, 6:36 a.m. UTC | #1

On 13.09.24 04:16, Naohiro Aota wrote:
> +	zone = &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror];
> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +		/*
> +		 * If the first zone is conventional, the SB is placed at the
> +		 * first zone.
> +		 */
> +
> +		u64 bytenr = zone->start << SECTOR_SHIFT;
> +		u64 bytenr_orig = btrfs_sb_offset(mirror);
> +		struct btrfs_super_block *disk_super;
> +		const size_t len = sizeof(disk_super->magic);
> +
> +		disk_super = btrfs_read_disk_super(device->bdev, bytenr, bytenr_orig);
> +		if (IS_ERR(disk_super))
> +			return PTR_ERR(disk_super);
> +
> +		memset(&disk_super->magic, 0, len);
> +		folio_mark_dirty(virt_to_folio(disk_super));
> +		btrfs_release_disk_super(disk_super);
> +
> +		ret = sync_blockdev_range(device->bdev, bytenr, bytenr + len - 1);
> +	} else {
> +		unsigned int nofs_flags;
> +
> +		/*
> +		 * For the other case, all zones must be a sequential required
> +		 * zone.
> +		 */
> +#ifdef CONFIG_BTRFS_ASSERT
> +		for (int i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
> +			ASSERT(zone->type != BLK_ZONE_TYPE_CONVENTIONAL);
> +			zone++;
> +		}
> +		zone = &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror];
> +#endif
> +
> +		nofs_flags = memalloc_nofs_save();
> +		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, zone->start,
> +				       zone->len * BTRFS_NR_SB_LOG_ZONES);
> +		memalloc_nofs_restore(nofs_flags);
> +
> +		if (!ret) {
> +			for (int i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
> +				zone->cond = BLK_ZONE_COND_EMPTY;
> +				zone->wp = zone->start;
> +				zone++;
> +			}
> +		}
> +	}
> +
> +	if (ret)
> +		btrfs_warn(device->fs_info, "error clearing superblock number %d (%d)", mirror,
> +			   ret);
> +

Is there a reason we can't go through the discard code for this? In the
sequential zone case we end up with REQ_OP_ZONE_RESET in both code
paths; in the conventional zone case, we can issue a REQ_OP_DISCARD or
REQ_OP_WRITE_ZEROES for the whole 4k of the superblock.
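Johannes's alternative can be approximated from user space with the `BLKZEROOUT` ioctl from `<linux/fs.h>`, which takes a `{start, length}` byte range (both multiples of 512) and zeroes it through the block layer, using write-zeroes where the device supports it. This is only a sketch with a hypothetical helper name; it needs a block device opened for writing, and unlike the patch it wipes the entire 4 KiB copy, so no metadata is left behind for rescue.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKZEROOUT */

#define SB_SIZE 4096    /* one superblock copy */

/*
 * Hypothetical helper: zero the whole superblock copy at byte offset
 * sb_bytenr of the block device bdev. Returns 0 or a negative errno
 * (a regular file, for example, rejects the ioctl).
 */
static int zero_sb_copy(const char *bdev, uint64_t sb_bytenr)
{
	uint64_t range[2] = { sb_bytenr, SB_SIZE };  /* start, length in bytes */
	int ret = 0;
	int fd;

	fd = open(bdev, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, BLKZEROOUT, range) < 0)
		ret = -errno;
	close(fd);
	return ret;
}
```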

Naohiro Aota Sept. 13, 2024, 7:46 a.m. UTC | #2

On Fri, Sep 13, 2024 at 06:36:47AM GMT, Johannes Thumshirn wrote:
> On 13.09.24 04:16, Naohiro Aota wrote:
> > [...]
>
> Is there a reason we can't go through the discard code for this? In the
> sequential zone case we end up with REQ_OP_ZONE_RESET in both code
> paths; in the conventional zone case, we can issue a REQ_OP_DISCARD or
> REQ_OP_WRITE_ZEROES for the whole 4k of the superblock.
> 

Yeah, we can do that. I agree it is simpler.

But I tried to keep the behavior compatible with the regular
mode. btrfs_scratch_superblock(), which handles the regular
mode, just overwrites the SB magic (an 8-byte field) and leaves the other
fields intact. I guess that is to allow a rescue?

That is not possible on a sequential-write-required zone, so I just
reset the zones entirely. (Well, reading the last SB, resetting the zone,
and writing the SB data back with the magic cleared might work, though.)
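From user space, resetting all superblock log zones of one mirror in a single request looks roughly like the sketch below. It uses the `BLKRESETZONE` ioctl and `struct blk_zone_range` from `<linux/blkzoned.h>`, whose fields are in 512-byte sectors. The helper name and its parameters are hypothetical; on a real device `zone_start` and `zone_sectors` would come from a zone report (for example `BLKREPORTZONE`).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>  /* struct blk_zone_range, BLKRESETZONE */

/*
 * Hypothetical helper: reset nr_zones consecutive zones of zone_sectors
 * sectors each, starting at sector zone_start, with one request, like the
 * single REQ_OP_ZONE_RESET over all SB log zones of a mirror in the patch.
 */
static int reset_sb_log_zones(const char *bdev, uint64_t zone_start,
			      uint64_t zone_sectors, unsigned int nr_zones)
{
	struct blk_zone_range range = {
		.sector = zone_start,
		.nr_sectors = zone_sectors * nr_zones,
	};
	int ret = 0;
	int fd;

	fd = open(bdev, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, BLKRESETZONE, &range) < 0)
		ret = -errno;
	close(fd);
	return ret;
}
```

A reset moves each zone's write pointer back to its start, which is why the patch then updates its cached zone state with `zone->cond = BLK_ZONE_COND_EMPTY` and `zone->wp = zone->start`.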

For a conventional zone, we can apply the same logic as in the regular
case, so I followed that.
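Taken together, the choice described here depends only on the type of the first SB log zone of a mirror and on whether all of its log zones fit on the device. The pure-C model below captures that decision. The enum and function names are hypothetical, and `NR_SB_LOG_ZONES` stands in for `BTRFS_NR_SB_LOG_ZONES`, assumed here to be 2 (consistent with the old `sb_zone + 1 >= nr_zones` check that the patch replaces).

```c
#include <stdint.h>

#define NR_SB_LOG_ZONES 2   /* assumed value of BTRFS_NR_SB_LOG_ZONES */

enum zone_kind { ZONE_CONVENTIONAL, ZONE_SEQ_REQUIRED };

enum sb_clear_action {
	SB_CLEAR_NONE,          /* log zones run past the end of the device */
	SB_CLEAR_SCRUB_MAGIC,   /* conventional: overwrite only the magic */
	SB_CLEAR_RESET_ZONES,   /* sequential: reset every SB log zone */
};

/*
 * Hypothetical model of the branch in btrfs_reset_sb_log_zones():
 * sb_zone is the first SB log zone of the mirror, nr_zones the number of
 * zones on the device, first_kind the type of that first zone.
 */
static enum sb_clear_action pick_sb_clear_action(uint32_t sb_zone, uint32_t nr_zones,
						 enum zone_kind first_kind)
{
	/* Same bound as the patch: all NR_SB_LOG_ZONES zones must exist. */
	if ((uint64_t)sb_zone + NR_SB_LOG_ZONES > nr_zones)
		return SB_CLEAR_NONE;
	if (first_kind == ZONE_CONVENTIONAL)
		return SB_CLEAR_SCRUB_MAGIC;
	return SB_CLEAR_RESET_ZONES;
}
```

`SB_CLEAR_NONE` corresponds to the early `-ENOENT` return in the patch.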

Xuefer Sept. 13, 2024, 2:45 p.m. UTC | #3

Thanks. Applied on 6.10.9-gentoo and it works for me:
# btrfs device add /dev/sdd /d/ -f
# btrfs de remove /dev/sdd /d/
# lsblk -f
NAME   FSTYPE FSVER LABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdd
sde    btrfs        downloads 8d4bcd03-3f85-4aae-a7dc-a302f1d6d8bb    5.9T    53% /d
sdf    btrfs        downloads 8d4bcd03-3f85-4aae-a7dc-a302f1d6d8bb

# dmesg | tail
[31061.373787] BTRFS info (device sde): host-managed zoned block device /dev/sdd, 52156 zones of 268435456 bytes
[31061.430513] BTRFS info (device sde): disk added /dev/sdd
[31068.855965] BTRFS info (device sde): device deleted: /dev/sdd

No more errors, and no need to clear the magic manually with dd.


David Sterba Sept. 17, 2024, 4:38 p.m. UTC | #4

On Fri, Sep 13, 2024 at 07:46:14AM +0000, Naohiro Aota wrote:
> On Fri, Sep 13, 2024 at 06:36:47AM GMT, Johannes Thumshirn wrote:
> > On 13.09.24 04:16, Naohiro Aota wrote:
> > > [...]
> >
> > Is there a reason we can't go through the discard code for this? In the
> > sequential zone case we end up with REQ_OP_ZONE_RESET in both code
> > paths; in the conventional zone case, we can issue a REQ_OP_DISCARD or
> > REQ_OP_WRITE_ZEROES for the whole 4k of the superblock.
> > 
> 
> Yeah, we can do that. I agree it is simpler.
> 
> But I tried to keep the behavior compatible with the regular
> mode. btrfs_scratch_superblock(), which handles the regular
> mode, just overwrites the SB magic (an 8-byte field) and leaves the other
> fields intact. I guess that is to allow a rescue?

I'd prefer to follow the same logic as btrfs_scratch_superblock() here,
deleting only the signature. Leaving metadata behind for rescue purposes
sometimes helps.

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4a259bdaa21c..140c4ca74d4f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1270,8 +1270,8 @@  void btrfs_release_disk_super(struct btrfs_super_block *super)
 	put_page(page);
 }
 
-static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
-						       u64 bytenr, u64 bytenr_orig)
+struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev, u64 bytenr,
+						u64 bytenr_orig)
 {
 	struct btrfs_super_block *disk_super;
 	struct page *page;
@@ -2101,7 +2101,7 @@  void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, struct btrfs_devic
 
 	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX; copy_num++) {
 		if (bdev_is_zoned(bdev))
-			btrfs_reset_sb_log_zones(bdev, copy_num);
+			btrfs_reset_sb_log_zones(device, copy_num);
 		else
 			btrfs_scratch_superblock(fs_info, bdev, copy_num);
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 03d2d60afe0c..176aa916fc05 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -758,6 +758,8 @@  struct btrfs_chunk_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
 					    u64 logical, u64 length);
 void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *map);
 void btrfs_release_disk_super(struct btrfs_super_block *super);
+struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev, u64 bytenr,
+						u64 bytenr_orig);
 
 static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 				      int index)
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 41ce252bb8fe..39d37a246b3e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -989,30 +989,70 @@  int btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
 	return -EIO;
 }
 
-int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror)
+int btrfs_reset_sb_log_zones(struct btrfs_device *device, int mirror)
 {
-	unsigned int nofs_flags;
-	sector_t zone_sectors;
-	sector_t nr_sectors;
-	u8 zone_sectors_shift;
-	u32 sb_zone;
-	u32 nr_zones;
-	int ret;
-
-	zone_sectors = bdev_zone_sectors(bdev);
-	zone_sectors_shift = ilog2(zone_sectors);
-	nr_sectors = bdev_nr_sectors(bdev);
-	nr_zones = nr_sectors >> zone_sectors_shift;
+	struct btrfs_zoned_device_info *zinfo = device->zone_info;
+	u32 sb_zone = sb_zone_number(zinfo->zone_size_shift, mirror);
+	struct blk_zone *zone;
+	int ret = 0;
 
-	sb_zone = sb_zone_number(zone_sectors_shift + SECTOR_SHIFT, mirror);
-	if (sb_zone + 1 >= nr_zones)
+	if (sb_zone + BTRFS_NR_SB_LOG_ZONES > zinfo->nr_zones)
 		return -ENOENT;
 
-	nofs_flags = memalloc_nofs_save();
-	ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
-			       zone_start_sector(sb_zone, bdev),
-			       zone_sectors * BTRFS_NR_SB_LOG_ZONES);
-	memalloc_nofs_restore(nofs_flags);
+	zone = &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror];
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		/*
+		 * If the first zone is conventional, the SB is placed at the
+		 * first zone.
+		 */
+
+		u64 bytenr = zone->start << SECTOR_SHIFT;
+		u64 bytenr_orig = btrfs_sb_offset(mirror);
+		struct btrfs_super_block *disk_super;
+		const size_t len = sizeof(disk_super->magic);
+
+		disk_super = btrfs_read_disk_super(device->bdev, bytenr, bytenr_orig);
+		if (IS_ERR(disk_super))
+			return PTR_ERR(disk_super);
+
+		memset(&disk_super->magic, 0, len);
+		folio_mark_dirty(virt_to_folio(disk_super));
+		btrfs_release_disk_super(disk_super);
+
+		ret = sync_blockdev_range(device->bdev, bytenr, bytenr + len - 1);
+	} else {
+		unsigned int nofs_flags;
+
+		/*
+		 * For the other case, all zones must be a sequential required
+		 * zone.
+		 */
+#ifdef CONFIG_BTRFS_ASSERT
+		for (int i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+			ASSERT(zone->type != BLK_ZONE_TYPE_CONVENTIONAL);
+			zone++;
+		}
+		zone = &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror];
+#endif
+
+		nofs_flags = memalloc_nofs_save();
+		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, zone->start,
+				       zone->len * BTRFS_NR_SB_LOG_ZONES);
+		memalloc_nofs_restore(nofs_flags);
+
+		if (!ret) {
+			for (int i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+				zone->cond = BLK_ZONE_COND_EMPTY;
+				zone->wp = zone->start;
+				zone++;
+			}
+		}
+	}
+
+	if (ret)
+		btrfs_warn(device->fs_info, "error clearing superblock number %d (%d)", mirror,
+			   ret);
+
 	return ret;
 }
 
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 7612e6572605..eef3272b087c 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -65,7 +65,7 @@  int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
 int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 			  u64 *bytenr_ret);
 int btrfs_advance_sb_log(struct btrfs_device *device, int mirror);
-int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror);
+int btrfs_reset_sb_log_zones(struct btrfs_device *device, int mirror);
 u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 				 u64 hole_end, u64 num_bytes);
 int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
@@ -155,7 +155,7 @@  static inline int btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
 	return 0;
 }
 
-static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror)
+static inline int btrfs_reset_sb_log_zones(struct btrfs_device *device, int mirror)
 {
 	return 0;
 }
