Message ID | 20210417023323.852530-4-damien.lemoal@wdc.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | Fix dm-crypt zoned block device support |
On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
> Synchronous writes to sequential zone files cannot use zone append
> operations if the underlying zoned device queue limit
> max_zone_append_sectors is 0, indicating that the device does not
> support this operation. In this case, fall back to using regular write
> operations.

Zone append is a mandatory feature of the zoned device API.
On 2021/04/19 15:45, Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
>
> Zone append is a mandatory feature of the zoned device API.

Yes, I am well aware of that. All physical zoned devices and null_blk do
support zone append, but the logical device created by dm-crypt does not.
And we cannot simply disable zone support in dm-crypt, as there are use
cases in the field that I am aware of in the SMR space. So this series is
a compromise: preserve dm-crypt zone support for SMR by disabling zone
append (no one uses the zone append emulation yet, as far as I know).

For zonefs, we can:
1) refuse to mount if ZA is disabled, same as btrfs
2) Do as I did in the patch: fall back to regular writes, since that is
   easy to do (the zonefs file size already tracks the WP position).

I chose option (2) to allow SMR+dm-crypt to still work with zonefs.
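To make option (2) concrete, here is a minimal sketch of the write-path
decision described above. It reuses the names introduced by the patch at
the end of this thread (zi->i_wpoffset, sbi->s_max_zone_append_sectors);
it is an illustration, not the applied change:

	/*
	 * In zonefs_file_dio_write(): sequential zone files only accept
	 * writes starting exactly at the write pointer, so a regular
	 * write is a safe substitute when zone append is unavailable.
	 */
	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ) {
		if (iocb->ki_pos != zi->i_wpoffset)
			return -EINVAL;	/* not at the write pointer */
		/* Use zone append only if the device supports it. */
		append = sync && sbi->s_max_zone_append_sectors;
	}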
On Mon, Apr 19, 2021 at 07:08:46AM +0000, Damien Le Moal wrote:
> 1) refuse to mount if ZA is disabled, same as btrfs
Yes, please do that.
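For comparison, a hedged sketch of what option (1) might look like in
zonefs_fill_super(), mirroring the btrfs behavior mentioned above; this
is hypothetical and not the patch posted in this thread:

	/*
	 * Hypothetical mount-time check: reject devices without zone
	 * append support instead of falling back to regular writes.
	 */
	if (!queue_max_zone_append_sectors(bdev_get_queue(sb->s_bdev))) {
		zonefs_err(sb, "Device does not support zone append\n");
		return -EINVAL;
	}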
On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
>
> Zone append is a mandatory feature of the zoned device API.

So a hack required for ZNS and not needed by ZBC and ZAC becomes
a "mandatory feature" in a Linux API. Like many hacks, that one might
come back to bite you :-)

Doug Gilbert
On 2021/04/20 10:20, Douglas Gilbert wrote:
> On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
>> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>>> Synchronous writes to sequential zone files cannot use zone append
>>> operations if the underlying zoned device queue limit
>>> max_zone_append_sectors is 0, indicating that the device does not
>>> support this operation. In this case, fall back to using regular write
>>> operations.
>>
>> Zone append is a mandatory feature of the zoned device API.
>
> So a hack required for ZNS and not needed by ZBC and ZAC becomes
> a "mandatory feature" in a Linux API. Like many hacks, that one might
> come back to bite you :-)

Zone append is not a hack in ZNS. It is a write interface that fits very
well with the multi-queue nature of NVMe. The "hack" is the emulation in
scsi. We decided on making this feature mandatory for zoned devices (all
types) so that file systems do not have to implement different IO paths
for sequential writing to zones. Zone append simplifies a lot of things
and makes it possible to get the best performance out of ZNS drives.
Per-zone write locking/serialization of regular writes is much harder to
implement, makes a mess of the file system code, and would kill write
performance on ZNS.
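To illustrate why zone append removes the need for per-zone write
serialization, here is a minimal sketch of the block-layer contract (an
example following the in-kernel bio API of this time frame, not code
from the patch): the submitter targets the zone start sector, the device
chooses the actual write position, and the completed bio reports where
the data landed.

	static int example_zone_append(struct block_device *bdev,
				       sector_t zone_start_sector,
				       struct page *page, unsigned int len,
				       sector_t *written_sector)
	{
		struct bio *bio = bio_alloc(GFP_NOFS, 1);
		int ret;

		bio_set_dev(bio, bdev);
		/* Target the zone start, not the write pointer. */
		bio->bi_iter.bi_sector = zone_start_sector;
		bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC;
		__bio_add_page(bio, page, len, 0);

		ret = submit_bio_wait(bio);
		if (!ret)
			/* The device reports the actual write position. */
			*written_sector = bio->bi_iter.bi_sector;
		bio_put(bio);
		return ret;
	}

Multiple such bios can be in flight for the same zone with no locking in
the submitter, which is what makes the interface a good fit for
multi-queue NVMe.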
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 049e36c69ed7..b97566b9dff7 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -689,14 +689,15 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct zonefs_inode_info *zi = ZONEFS_I(inode);
-	struct block_device *bdev = inode->i_sb->s_bdev;
-	unsigned int max;
+	struct super_block *sb = inode->i_sb;
+	struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+	struct block_device *bdev = sb->s_bdev;
+	sector_t max = sbi->s_max_zone_append_sectors;
 	struct bio *bio;
 	ssize_t size;
 	int nr_pages;
 	ssize_t ret;
 
-	max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
 	max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
 	iov_iter_truncate(from, max);
 
@@ -853,6 +854,8 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 
 	/* Enforce sequential writes (append only) in sequential zones */
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ) {
+		struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+
 		mutex_lock(&zi->i_truncate_mutex);
 		if (iocb->ki_pos != zi->i_wpoffset) {
 			mutex_unlock(&zi->i_truncate_mutex);
@@ -860,7 +863,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 			goto inode_unlock;
 		}
 		mutex_unlock(&zi->i_truncate_mutex);
-		append = sync;
+		append = sync && sbi->s_max_zone_append_sectors;
 	}
 
 	if (append)
@@ -1683,6 +1686,11 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN;
 	}
 
+	sbi->s_max_zone_append_sectors =
+		queue_max_zone_append_sectors(bdev_get_queue(sb->s_bdev));
+	if (!sbi->s_max_zone_append_sectors)
+		zonefs_info(sb, "Zone append is not supported: falling back to using regular writes\n");
+
 	ret = zonefs_read_super(sb);
 	if (ret)
 		return ret;
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 51141907097c..2b8c3b1a32ea 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -185,6 +185,8 @@ struct zonefs_sb_info {
 
 	unsigned int s_max_open_zones;
 	atomic_t s_open_zones;
+
+	sector_t s_max_zone_append_sectors;
 };
 
 static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb)
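As a side note, the queue limit consulted by this patch is exposed to
userspace as /sys/block/<dev>/queue/zone_append_max_bytes, so whether a
given device will trigger the fallback can be checked before mounting.
A small userspace sketch (the device name is just an example):

#include <stdio.h>

int main(void)
{
	/* Example path; substitute the dm-crypt or SMR block device. */
	FILE *f = fopen("/sys/block/sda/queue/zone_append_max_bytes", "r");
	unsigned long long max_bytes;

	if (!f) {
		perror("zone_append_max_bytes");
		return 1;
	}
	if (fscanf(f, "%llu", &max_bytes) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);

	/*
	 * 0 means the device does not support zone append: with this
	 * patch applied, zonefs falls back to regular writes.
	 */
	printf("zone append max: %llu bytes\n", max_bytes);
	return 0;
}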