Message ID | 20240124113841.31824-15-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block atomic writes | expand |
On Wed, Jan 24, 2024 at 11:38:40AM +0000, John Garry wrote: > From: Alan Adamson <alan.adamson@oracle.com> > > Support reading atomic write registers to fill in request_queue > properties. > > Use following method to calculate limits: > atomic_write_max_bytes = flp2(NAWUPF ?: AWUPF) > atomic_write_unit_min = logical_block_size > atomic_write_unit_max = flp2(NAWUPF ?: AWUPF) > atomic_write_boundary = NABSPF Can you expand this to actually be a real commit log with full sentences, expanding the NVME field name acronyms and reference the relevant Sections and Figures in a specific version of the NVMe specification? Also some implementation comments: NVMe has a particularly nasty NABO field in Identify Namespace, which offsets the boundary. We probably need to reject atomic writes or severly limit them if this field is set. Please also read through TP4098(a) and look at the MAM field. As far as I can tell the patch as-is assumes it always is set to 1. > +static void nvme_update_atomic_write_disk_info(struct gendisk *disk, > + struct nvme_ctrl *ctrl, struct nvme_id_ns *id, u32 bs, u32 atomic_bs) Please avoid the overly long line here. > + nvme_update_atomic_write_disk_info(disk, ctrl, id, bs, atomic_bs); .. and here.
On 13/02/2024 06:42, Christoph Hellwig wrote: > On Wed, Jan 24, 2024 at 11:38:40AM +0000, John Garry wrote: >> From: Alan Adamson <alan.adamson@oracle.com> >> >> Support reading atomic write registers to fill in request_queue >> properties. >> >> Use following method to calculate limits: >> atomic_write_max_bytes = flp2(NAWUPF ?: AWUPF) >> atomic_write_unit_min = logical_block_size >> atomic_write_unit_max = flp2(NAWUPF ?: AWUPF) >> atomic_write_boundary = NABSPF > > Can you expand this to actually be a real commit log with full > sentences, expanding the NVME field name acronyms and reference > the relevant Sections and Figures in a specific version of the > NVMe specification? ok > > Also some implementation comments: > > NVMe has a particularly nasty NABO field in Identify Namespace, which > offsets the boundary. We probably need to reject atomic writes or > severly limit them if this field is set. ok, and initially we'll just not support atomic writes for NABO != 0 > > Please also read through TP4098(a) and look at the MAM field. It's not public, AFAIK. And I don't think a feature which allows us to straddle boundaries is too interesting really. > As far > as I can tell the patch as-is assumes it always is set to 1. > >> +static void nvme_update_atomic_write_disk_info(struct gendisk *disk, >> + struct nvme_ctrl *ctrl, struct nvme_id_ns *id, u32 bs, u32 atomic_bs) > > Please avoid the overly long line here. > >> + nvme_update_atomic_write_disk_info(disk, ctrl, id, bs, atomic_bs); > > .. and here. ok, but I think that this will get shorter anyway. > Thanks, John
On Tue, Feb 13, 2024 at 02:21:25PM +0000, John Garry wrote: >> Please also read through TP4098(a) and look at the MAM field. > > It's not public, AFAIK. Oracle is a member, so you can take a look at it easily. If we need it for Linux I can also work with the NVMe Board to release it. > And I don't think a feature which allows us to straddle boundaries is too > interesting really. Without MAM=1 NVMe can't support atomic writes larger than AWUPF/NAWUPF, which is typically set to the indirection table size. You're leaving a lot of potential unused with that.
On 14/02/2024 08:00, Christoph Hellwig wrote: > On Tue, Feb 13, 2024 at 02:21:25PM +0000, John Garry wrote: >>> Please also read through TP4098(a) and look at the MAM field. >> >> It's not public, AFAIK. > > Oracle is a member, so you can take a look at it easily. If we need > it for Linux I can also work with the NVMe Board to release it. What I really meant was that I prefer not to quote private TPs in public domain. I have the doc. > >> And I don't think a feature which allows us to straddle boundaries is too >> interesting really. > > Without MAM=1 NVMe can't support atomic writes larger than > AWUPF/NAWUPF, which is typically set to the indirection table > size. You're leaving a lot of potential unused with that. > atomic_write_unit_max would always be dictated by min of boundary and AWUPF/NAWUPF. With MAM=1, we could increase atomic_write_max_bytes - but does it really help us? I mean, atomic_write_max_bytes only comes into play for merging - do we get much merging for NVMe transports? I am not sure. Thanks, John
>Support reading atomic write registers to fill in request_queue >properties. >Use following method to calculate limits: >atomic_write_max_bytes = flp2(NAWUPF ?: AWUPF) >atomic_write_unit_min = logical_block_size >atomic_write_unit_max = flp2(NAWUPF ?: AWUPF) >atomic_write_boundary = NABSPF In case the device doesn't support namespace atomic boundary size (i.e. NABSPF is zero) then while merging atomic block-IO we should allow merge. For example, while front/back merging the atomic block IO, we check whether boundary is defined or not. In case if boundary is not-defined (i.e. it's zero) then we simply reject merging ateempt (as implemented in rq_straddles_atomic_write_boundary()). I am quoting this from NVMe spec (Command Set Specification, revision 1.0a, Section 2.1.4.3) : "To ensure backwards compatibility, the values reported for AWUN, AWUPF, and ACWU shall be set such that they are supported even if a write crosses an atomic boundary. If a controller does not guarantee atomicity across atomic boundaries, the controller shall set AWUN, AWUPF, and ACWU to 0h (1 LBA)." Thanks, --Nilay
On 14/02/2024 12:27, Nilay Shroff wrote: >> Support reading atomic write registers to fill in request_queue > >> properties. > > > >> Use following method to calculate limits: > >> atomic_write_max_bytes = flp2(NAWUPF ?: AWUPF) > You still need to fix that mail client to not add extra blank lines. >> atomic_write_unit_min = logical_block_size > >> atomic_write_unit_max = flp2(NAWUPF ?: AWUPF) > >> atomic_write_boundary = NABSPF > > > > In case the device doesn't support namespace atomic boundary size (i.e. NABSPF > > is zero) then while merging atomic block-IO we should allow merge. > > > > For example, while front/back merging the atomic block IO, we check whether > > boundary is defined or not. In case if boundary is not-defined (i.e. it's zero) > > then we simply reject merging ateempt (as implemented in > > rq_straddles_atomic_write_boundary()). Are you sure about that? In rq_straddles_atomic_write_boundary(), if boundary == 0, then we return false, i.e. there is no boundary, so we can never be crossing it. static bool rq_straddles_atomic_write_boundary(struct request *rq, unsigned int front, unsigned int back) { unsigned int boundary = queue_atomic_write_boundary_bytes(rq->q); unsigned int mask, imask; loff_t start, end; if (!boundary) return false; ... } And then will not reject a merge for that reason, like: int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs) { ... if (req->cmd_flags & REQ_ATOMIC) { if (rq_straddles_atomic_write_boundary(req, 0, bio->bi_iter.bi_size)) { return 0; } } return ll_new_hw_segment(req, bio, nr_segs); } > > > > I am quoting this from NVMe spec (Command Set Specification, revision 1.0a, > > Section 2.1.4.3) : "To ensure backwards compatibility, the values reported for > > AWUN, AWUPF, and ACWU shall be set such that they are supported even if a > > write crosses an atomic boundary. If a controller does not guarantee > > atomicity across atomic boundaries, the controller shall set AWUN, AWUPF, and > > ACWU to 0h (1 LBA)." > > > Thanks, John
On 2/14/24 18:32, John Garry wrote: > On 14/02/2024 12:27, Nilay Shroff wrote: >> >> >> >>> Use following method to calculate limits: >> >>> atomic_write_max_bytes = flp2(NAWUPF ?: AWUPF) >> > > You still need to fix that mail client to not add extra blank lines. Yes, I am working on it. I hope it's solved now. > >>> atomic_write_unit_min = logical_block_size >> >>> atomic_write_unit_max = flp2(NAWUPF ?: AWUPF) >> >>> atomic_write_boundary = NABSPF >> >> >> >> In case the device doesn't support namespace atomic boundary size (i.e. NABSPF >> >> is zero) then while merging atomic block-IO we should allow merge. >> >> >> For example, while front/back merging the atomic block IO, we check whether >> >> boundary is defined or not. In case if boundary is not-defined (i.e. it's zero) >> >> then we simply reject merging ateempt (as implemented in >> >> rq_straddles_atomic_write_boundary()). > > Are you sure about that? In rq_straddles_atomic_write_boundary(), if boundary == 0, then we return false, i.e. there is no boundary, so we can never be crossing it. > > static bool rq_straddles_atomic_write_boundary(struct request *rq, > unsigned int front, > unsigned int back) > { > unsigned int boundary = queue_atomic_write_boundary_bytes(rq->q); > unsigned int mask, imask; > loff_t start, end; > > if (!boundary) > return false; > > ... > } > > And then will not reject a merge for that reason, like: > > int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs) > { > ... > > if (req->cmd_flags & REQ_ATOMIC) { > if (rq_straddles_atomic_write_boundary(req, > 0, bio->bi_iter.bi_size)) { > return 0; > } > } > > return ll_new_hw_segment(req, bio, nr_segs); > } > > Aargh, you are right. I see that if rq_straddles_atomic_write_boundary() returns true then we avoid merge otherwise the merge is attempted. My bad... Thanks, --Nilay
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 85ab0fcf9e88..5045c84f2516 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1911,6 +1911,44 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl, blk_queue_write_cache(q, vwc, vwc); } +static void nvme_update_atomic_write_disk_info(struct gendisk *disk, + struct nvme_ctrl *ctrl, struct nvme_id_ns *id, u32 bs, u32 atomic_bs) +{ + unsigned int unit_min = 0, unit_max = 0, boundary = 0, max_bytes = 0; + struct request_queue *q = disk->queue; + + if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf) { + if (le16_to_cpu(id->nabspf)) + boundary = (le16_to_cpu(id->nabspf) + 1) * bs; + + /* + * The boundary size just needs to be a multiple of unit_max + * (and not necessarily a power-of-2), so this could be relaxed + * in the block layer in future. + * Furthermore, if needed, unit_max could be reduced so that the + * boundary size was compliant - but don't support yet. + */ + if (!boundary || is_power_of_2(boundary)) { + max_bytes = atomic_bs; + unit_min = bs; + unit_max = rounddown_pow_of_two(atomic_bs); + } else { + dev_notice(ctrl->device, "Unsupported atomic write boundary (%d)\n", + boundary); + boundary = 0; + } + } else if (ctrl->subsys->awupf) { + max_bytes = atomic_bs; + unit_min = bs; + unit_max = rounddown_pow_of_two(atomic_bs); + } + + blk_queue_atomic_write_max_bytes(q, max_bytes); + blk_queue_atomic_write_unit_min_sectors(q, unit_min >> SECTOR_SHIFT); + blk_queue_atomic_write_unit_max_sectors(q, unit_max >> SECTOR_SHIFT); + blk_queue_atomic_write_boundary_bytes(q, boundary); +} + static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk, struct nvme_ns_head *head, struct nvme_id_ns *id) { @@ -1941,6 +1979,8 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk, atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs; else atomic_bs = (1 + ctrl->subsys->awupf) * bs; + + nvme_update_atomic_write_disk_info(disk, ctrl, id, bs, atomic_bs); } if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {