
[v3,0/3] implement zone-aware probing/wiping for zoned btrfs

Message ID 20210426055036.2103620-1-naohiro.aota@wdc.com

Message

Naohiro Aota April 26, 2021, 5:50 a.m. UTC
This series implements probing and wiping of the superblock of zoned btrfs.

Changes:
  - v3:
     - Implement and use blkdev_get_zonereport()
     - Also modify blkid_clone_probe() for completeness
     - Drop temporary btrfs magic from the table
     - Avoid aggressively copy-pasting the kernel-side code
     - Fix commit log
  - v2:
     - Fix zone alignment calculation
     - Fix the build without HAVE_LINUX_BLKZONED_H
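For readers unfamiliar with the kernel interface: a zone report comes from the BLKREPORTZONE ioctl declared in <linux/blkzoned.h>. The sketch below is a hypothetical helper. The name report_zones() and its signature are illustrative and are not those of blkdev_get_zonereport() from this series. It asks for up to nr_zones zones starting at a given 512-byte sector.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical helper: report up to nr_zones zones starting at `sector`
 * (512-byte units) of the block device open on fd. Returns a malloc'ed
 * report that the caller frees, or NULL with errno set.
 */
static struct blk_zone_report *report_zones(int fd, uint64_t sector,
                                            uint32_t nr_zones)
{
	size_t sz = sizeof(struct blk_zone_report) +
		    nr_zones * sizeof(struct blk_zone);
	struct blk_zone_report *rep = calloc(1, sz);

	if (!rep)
		return NULL;
	rep->sector = sector;
	rep->nr_zones = nr_zones;	/* in: capacity; out: zones filled in */
	if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
		int err = errno;	/* keep the ioctl error across free() */
		free(rep);
		errno = err;
		return NULL;
	}
	return rep;
}
```

On success, rep->nr_zones holds the number of entries actually filled in. Each rep->zones[i] carries start, len and wp in 512-byte sectors, plus the zone type and cond (condition).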

Zoned btrfs was merged with this series:
https://lore.kernel.org/linux-btrfs/20210222160049.GR1993@twin.jikos.cz/T/

The superblock locations were finalized with this patch:
https://lore.kernel.org/linux-btrfs/BL0PR04MB651442E6ACBF48342BD00FEBE7719@BL0PR04MB6514.namprd04.prod.outlook.com/T/

The corresponding btrfs-progs is available here:
https://github.com/naota/btrfs-progs/tree/btrfs-zoned

A zoned block device consists of a number of zones. Zones are either
conventional, accepting random writes, or sequential write required,
requiring that writes be issued in LBA order at each zone's write pointer
position.
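The following minimal in-memory model makes the two zone types concrete. All names are hypothetical and are not kernel or libblkid APIs. A conventional zone accepts any in-range write. A sequential write required zone accepts a write only when it starts at the write pointer, and only a zone reset rewinds that pointer. This is also why an in-place superblock overwrite cannot work on such a zone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory model of a zone; all offsets are in sectors. */
enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_REQUIRED };

struct zone {
	enum zone_type type;
	uint64_t start;	/* first sector of the zone */
	uint64_t len;	/* zone length in sectors */
	uint64_t wp;	/* write pointer (meaningful for sequential zones) */
};

/*
 * Try to write nr sectors at `sector` into zone z; return true if the
 * write is accepted. Conventional zones accept any in-range write.
 * Sequential write required zones only accept a write that begins exactly
 * at the write pointer, which then advances past the written data.
 */
static bool zone_write(struct zone *z, uint64_t sector, uint64_t nr)
{
	uint64_t end = z->start + z->len;

	if (sector < z->start || sector >= end || nr > end - sector)
		return false;		/* out of zone bounds */
	if (z->type == ZONE_CONVENTIONAL)
		return true;		/* random writes are fine */
	if (sector != z->wp)
		return false;		/* not at the write pointer: rejected */
	z->wp += nr;
	return true;
}

/* Resetting a sequential zone rewinds its write pointer to the zone start. */
static void zone_reset(struct zone *z)
{
	z->wp = z->start;
}
```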

The superblock (and its copies) is the only data structure in btrfs with a
fixed location on a device. Since data cannot be overwritten in place in a
sequential write required zone, the superblock cannot be placed at a fixed
location in such a zone.

Thus, zoned btrfs uses superblock log writing to update the superblock on
sequential write required zones. It uses two zones as a circular buffer to
write updated superblocks. Once the first zone is full, it starts writing
into the second zone. When both zones are full, the first zone is reset
before it is written to again.
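The write side of this circular buffer can be sketched under simplifying assumptions. Each log zone is modeled only by the number of superblock copies it holds, and a reset is simulated by clearing that count. struct sb_log and sb_log_append() are hypothetical names, not btrfs code.

```c
#include <stdint.h>

#define SB_LOG_ZONES 2

/* Hypothetical model: each log zone holds up to `slots` superblock copies. */
struct sb_log {
	uint64_t slots;			/* copies that fit in one zone */
	uint64_t used[SB_LOG_ZONES];	/* copies written in each zone */
	int active;			/* zone receiving the next write */
};

/*
 * Append one superblock copy and return the index of the zone it landed
 * in. When the active zone is full, switch to the other zone; if that zone
 * is full too, reset it first (simulated by clearing its counter).
 */
static int sb_log_append(struct sb_log *sl)
{
	if (sl->used[sl->active] == sl->slots) {
		sl->active = 1 - sl->active;
		if (sl->used[sl->active] == sl->slots)
			sl->used[sl->active] = 0;	/* zone reset */
	}
	sl->used[sl->active]++;
	return sl->active;
}
```

For example, with 4 MiB zones and, as an assumption, 4 KiB superblock copies, each zone holds 1024 copies before the writer moves to the other zone.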

This series first implements zone-based detection of the magic location.
Then, it adds the magics for zoned btrfs and implements a probing function
to detect the latest superblock. Finally, it implements zone-aware wiping
by resetting zones.
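On the read side, the latest superblock has to be located from the two zones' states. This sketch is an illustration, not the code in this series. It makes four assumptions: the writing policy described above, byte offsets, 4 KiB superblock copies, and that a mirror's log occupies the zone containing its nominal offset plus the following zone. When both zones are full, their conditions alone cannot tell which was written last, so the sketch compares the generation of each zone's last copy, supplied by the caller. All names are hypothetical.

```c
#include <stdint.h>

#define SB_SIZE 4096ULL		/* assumed size of one superblock copy, bytes */

enum zcond { Z_EMPTY, Z_IN_USE, Z_FULL };

/* Hypothetical per-zone summary, as a zone report could provide it. */
struct log_zone {
	enum zcond cond;
	uint64_t start;		/* zone start, bytes */
	uint64_t len;		/* zone length, bytes */
	uint64_t wp;		/* write pointer, bytes */
	uint64_t last_gen;	/* generation of the zone's last copy (used when both are full) */
};

/* Start of the zone containing byte offset `off`, for equal-size zones. */
static uint64_t zone_start_of(uint64_t off, uint64_t zone_size)
{
	return off - off % zone_size;
}

/*
 * Return the byte offset of the most recently written superblock in a
 * two-zone log, or -1 if nothing was written or the state is impossible
 * under the writing policy (e.g. zone 1 in use while zone 0 is not full).
 */
static int64_t sb_log_latest(const struct log_zone z[2])
{
	if (z[0].cond == Z_EMPTY && z[1].cond == Z_EMPTY)
		return -1;				/* no superblock yet */
	if (z[0].cond == Z_FULL && z[1].cond == Z_FULL) {
		/* Both full: the newer zone ends with the latest copy. */
		const struct log_zone *n =
			z[0].last_gen > z[1].last_gen ? &z[0] : &z[1];
		return (int64_t)(n->start + n->len - SB_SIZE);
	}
	if (z[0].cond == Z_IN_USE && z[1].cond != Z_IN_USE)
		return (int64_t)(z[0].wp - SB_SIZE);	/* zone 0 is active */
	if (z[0].cond == Z_FULL && z[1].cond == Z_IN_USE)
		return (int64_t)(z[1].wp - SB_SIZE);	/* zone 1 is active */
	if (z[0].cond == Z_FULL && z[1].cond == Z_EMPTY)
		return (int64_t)(z[0].start + z[0].len - SB_SIZE);
	if (z[0].cond == Z_EMPTY && z[1].cond == Z_FULL)	/* zone 0 just reset */
		return (int64_t)(z[1].start + z[1].len - SB_SIZE);
	return -1;					/* inconsistent state */
}
```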

* Testing device

Running zoned btrfs requires devices that support the zone append write
command.

Besides real devices, null_blk supports the zone append write command, so a
memory-backed null_blk device can be used to run the test. The following
script creates a 12800 MB /dev/nullb0 made up of 4 MB zones.

    sysfs=/sys/kernel/config/nullb/nullb0
    size=12800 # MB
    zone_size=4 # MB
    
    # drop nullb0
    if [[ -d $sysfs ]]; then
            echo 0 > "${sysfs}"/power
            rmdir "${sysfs}"
    fi
    lsmod | grep -q null_blk && rmmod null_blk
    modprobe null_blk nr_devices=0
    
    mkdir "${sysfs}"
    
    echo "${size}" > "${sysfs}"/size
    echo 1 > "${sysfs}"/zoned
    echo "${zone_size}" > "${sysfs}"/zone_size
    echo 0 > "${sysfs}"/zone_nr_conv
    echo 1 > "${sysfs}"/memory_backed
    
    echo 1 > "${sysfs}"/power
    udevadm settle
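A quick sanity check of the script's parameters follows. null_blk's size and zone_size attributes are given in MiB. The numbers match the mkfs.btrfs and dmesg output further down: 3200 zones of 4194304 bytes, 12.50GiB in total. The helper names are hypothetical.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)	/* null_blk sizes are expressed in MiB */

/* Number of whole zones in a device of size_mib MiB with zone_mib MiB zones. */
static uint64_t nr_zones(uint64_t size_mib, uint64_t zone_mib)
{
	return size_mib / zone_mib;
}

/* Zone size in bytes, as reported by the kernel (e.g. in dmesg). */
static uint64_t zone_bytes(uint64_t zone_mib)
{
	return zone_mib * MIB;
}
```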

Zoned SCSI devices such as SMR HDDs or scsi_debug also support the zone
append command, emulated within the SCSI sd driver. This emulation is
completely transparent to the user and provides the same semantics as
native NVMe ZNS drive support.
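With zone append, the host names only the target zone. The device writes at that zone's current write pointer and reports back where the data landed. The sketch below shows one way such a command can be emulated on top of regular writes, and is not the sd driver's implementation. It keeps a cached write pointer per zone and turns each append into an ordinary write at that position. Names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-zone state kept by an emulation layer; units are sectors. */
struct emu_zone {
	uint64_t start;
	uint64_t len;
	uint64_t wp;	/* cached write pointer */
};

/*
 * Emulate a zone append of nr sectors: the caller names only the zone,
 * and the append becomes a regular write at the cached write pointer.
 * Returns the sector where the data landed, or UINT64_MAX if the zone
 * lacks room. A real implementation must serialize appends to the same
 * zone (e.g. with a per-zone lock) so the cached pointer stays correct.
 */
static uint64_t emu_zone_append(struct emu_zone *z, uint64_t nr)
{
	uint64_t sector = z->wp;

	if (nr > z->start + z->len - z->wp)
		return UINT64_MAX;	/* would cross the end of the zone */
	/* ...issue an ordinary WRITE of nr sectors at `sector` here... */
	z->wp += nr;
	return sector;
}
```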

A QEMU patch is also available to emulate NVMe ZNS devices.

You can then create a zoned btrfs with the btrfs-progs tree mentioned above.

    $ mkfs.btrfs -d single -m single /dev/nullb0
    btrfs-progs v5.11
    See http://btrfs.wiki.kernel.org for more information.
    
    ERROR: superblock magic doesn't match
    /dev/nullb0: host-managed device detected, setting zoned feature
    Resetting device zones /dev/nullb0 (3200 zones) ...
    Label:              (null)
    UUID:               1e5912a2-b5c3-46fb-aa9a-ee3d073ff600
    Node size:          16384
    Sector size:        4096
    Filesystem size:    12.50GiB
    Block group profiles:
      Data:             single            4.00MiB
      Metadata:         single            4.00MiB
      System:           single            4.00MiB
    SSD detected:       yes
    Zoned device:       yes
    Zone size:          4.00MiB
    Incompat features:  extref, skinny-metadata, zoned
    Runtime features:   
    Checksum:           crc32c
    Number of devices:  1
    Devices:
       ID        SIZE  PATH
        1    12.50GiB  /dev/nullb0
    $ mount /dev/nullb0 /mnt/somewhere
    $ dmesg | tail
    ...
    [272816.682461] BTRFS: device fsid 1e5912a2-b5c3-46fb-aa9a-ee3d073ff600 devid 1 transid 5 /dev/nullb0 scanned by mkfs.btrfs (44367)
    [272883.678401] BTRFS info (device nullb0): has skinny extents
    [272883.686373] BTRFS info (device nullb0): flagging fs with big metadata feature
    [272883.699020] BTRFS info (device nullb0): host-managed zoned block device /dev/nullb0, 3200 zones of 4194304 bytes
    [272883.711736] BTRFS info (device nullb0): zoned mode enabled with zone size 4194304
    [272883.722388] BTRFS info (device nullb0): enabling ssd optimizations
    [272883.731332] BTRFS info (device nullb0): checking UUID tree

Naohiro Aota (3):
  blkid: implement zone-aware probing
  blkid: add magic and probing for zoned btrfs
  blkid: support zone reset for wipefs

 include/blkdev.h                 |   9 ++
 lib/blkdev.c                     |  29 ++++++
 libblkid/src/blkidP.h            |   5 +
 libblkid/src/probe.c             |  99 +++++++++++++++++--
 libblkid/src/superblocks/btrfs.c | 159 ++++++++++++++++++++++++++++++-
 5 files changed, 292 insertions(+), 9 deletions(-)
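On a sequential write required zone, a wipe cannot simply overwrite the superblock with zeros, so the third patch resets the zones instead. The underlying kernel interface is the BLKRESETZONE ioctl from <linux/blkzoned.h>. The helper below is a hypothetical illustration of that ioctl, not the function added by the series. It takes a zone-aligned byte range on a file descriptor opened for writing.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical helper: reset the zones covering [start, start + len) bytes
 * on fd. The range must be zone aligned; the kernel takes 512-byte
 * sectors. Returns 0 on success or a negative errno value.
 */
static int reset_zones(int fd, uint64_t start, uint64_t len)
{
	struct blk_zone_range range = {
		.sector = start >> 9,		/* bytes -> 512-byte sectors */
		.nr_sectors = len >> 9,
	};

	if (ioctl(fd, BLKRESETZONE, &range) < 0)
		return -errno;
	return 0;
}
```

After a reset, the affected zones are empty with their write pointers back at the zone start, so a zone-aware probe finds no superblock there.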

Comments

Karel Zak April 28, 2021, 11:36 a.m. UTC
On Mon, Apr 26, 2021 at 02:50:33PM +0900, Naohiro Aota wrote:
> Naohiro Aota (3):
>   blkid: implement zone-aware probing
>   blkid: add magic and probing for zoned btrfs
>   blkid: support zone reset for wipefs
> 
>  include/blkdev.h                 |   9 ++
>  lib/blkdev.c                     |  29 ++++++
>  libblkid/src/blkidP.h            |   5 +
>  libblkid/src/probe.c             |  99 +++++++++++++++++--
>  libblkid/src/superblocks/btrfs.c | 159 ++++++++++++++++++++++++++++++-
>  5 files changed, 292 insertions(+), 9 deletions(-)

Merged to the "next" branch (on GitHub); it will be merged to "master"
after the v2.37 release.

Thanks! (and extra thanks for the examples :-)

  Karel