Message ID | 20240620125359.2684798-11-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Benjamin Marzinski |
Series | block atomic writes |
On Thu, Jun 20, 2024 at 12:53:59PM +0000, John Garry wrote:
> From: Alan Adamson <alan.adamson@oracle.com>
>
> Add support to set block layer request_queue atomic write limits. The
> limits will be derived from either the namespace or controller atomic
> parameters.
>
> NVMe atomic-related parameters are grouped into "normal" and "power-fail"
> (or PF) class of parameter. For atomic write support, only PF parameters
> are of interest. The "normal" parameters are concerned with racing reads
> and writes (which also applies to PF). See NVM Command Set Specification
> Revision 1.0d section 2.1.4 for reference.

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

On 6/20/24 14:53, John Garry wrote:
> From: Alan Adamson <alan.adamson@oracle.com>
>
> Add support to set block layer request_queue atomic write limits. The
> limits will be derived from either the namespace or controller atomic
> parameters.
>
> NVMe atomic-related parameters are grouped into "normal" and "power-fail"
> (or PF) class of parameter. For atomic write support, only PF parameters
> are of interest. The "normal" parameters are concerned with racing reads
> and writes (which also applies to PF). See NVM Command Set Specification
> Revision 1.0d section 2.1.4 for reference.
>
> Whether to use per namespace or controller atomic parameters is decided by
> NSFEAT bit 1 - see Figure 97: Identify – Identify Namespace Data
> Structure, NVM Command Set.
>
> NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
> are provided for a write which straddles this per-lba space boundary. The
> block layer merging policy is such that no merges may occur in which the
> resultant request would straddle such a boundary.
>
> Unlike SCSI, NVMe specifies no granularity or alignment rules, apart from
> atomic boundary rule. In addition, again unlike SCSI, there is no
> dedicated atomic write command - a write which adheres to the atomic size
> limit and boundary is implicitly atomic.
>
> If NSFEAT bit 1 is set, the following parameters are of interest:
> - NAWUPF (Namespace Atomic Write Unit Power Fail)
> - NABSPF (Namespace Atomic Boundary Size Power Fail)
> - NABO (Namespace Atomic Boundary Offset)
>
> and we set request_queue limits as follows:
> - atomic_write_unit_max = rounddown_pow_of_two(NAWUPF)
> - atomic_write_max_bytes = NAWUPF
> - atomic_write_boundary = NABSPF
>
> In the unlikely scenario that NABO is non-zero, atomic writes will
> not be supported at all as dealing with this adds extra complexity. This
> policy may change in future.
>
> In all cases, atomic_write_unit_min is set to the logical block size.
>
> If NSFEAT bit 1 is unset, the following parameter is of interest:
> - AWUPF (Atomic Write Unit Power Fail)
>
> and we set request_queue limits as follows:
> - atomic_write_unit_max = rounddown_pow_of_two(AWUPF)
> - atomic_write_max_bytes = AWUPF
> - atomic_write_boundary = 0
>
> A new function, nvme_valid_atomic_write(), is also called from submission
> path to verify that a request which has been submitted to the driver will
> actually be executed atomically. As mentioned, there is no dedicated NVMe
> atomic write command (which may error for a command which exceeds the
> controller atomic write limits).
>
> Note on NABSPF:
> There seems to be some vagueness in the spec as to whether NABSPF applies
> for NSFEAT bit 1 being unset. Figure 97 does not explicitly mention NABSPF
> and how it is affected by bit 1. However Figure 4 does say to check Figure
> 97 for info about per-namespace parameters, which NABSPF is, so it is
> implied. However currently nvme_update_disk_info() does check namespace
> parameter NABO regardless of this bit.
>
> Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> jpg: total rewrite
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/nvme/host/core.c | 52 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
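
The limit derivation described in the commit message can be sketched in userspace C. This is an illustration only, not the driver code: `struct atomic_limits`, `rounddown_pow2()` and `derive_atomic_limits()` are hypothetical names, the Identify fields are passed in as already byte-swapped integers, and the fallback from NAWUPF to AWUPF when NAWUPF reads as zero is an assumption inferred from the patch. The `+ 1` terms assume the fields are 0's based block counts, as the patch's arithmetic suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the atomic write fields of struct queue_limits. */
struct atomic_limits {
	uint32_t hw_max;       /* largest atomic write, in bytes */
	uint32_t hw_boundary;  /* atomic boundary in bytes, 0 if none */
	uint32_t hw_unit_min;  /* smallest atomic write unit, in bytes */
	uint32_t hw_unit_max;  /* largest power-of-two atomic unit, in bytes */
};

/* Largest power of two that is <= v. v must be non-zero. */
static uint32_t rounddown_pow2(uint32_t v)
{
	uint32_t p = 1;

	assert(v != 0);
	while (p <= v / 2)
		p *= 2;
	return p;
}

/*
 * ns_atomics: NSFEAT bit 1 (per-namespace atomic parameters are valid).
 * nawupf, nabspf, awupf: 0's based counts of logical blocks (assumption).
 * bs: logical block size in bytes.
 */
static struct atomic_limits derive_atomic_limits(bool ns_atomics,
		uint16_t nawupf, uint16_t nabspf, uint16_t awupf, uint32_t bs)
{
	struct atomic_limits lim;
	bool use_ns = ns_atomics && nawupf;
	uint32_t atomic_bs = (1 + (uint32_t)(use_ns ? nawupf : awupf)) * bs;
	uint32_t boundary = 0;

	/* A boundary is only taken from the namespace parameters. */
	if (use_ns && nabspf)
		boundary = ((uint32_t)nabspf + 1) * bs;

	lim.hw_max = atomic_bs;
	lim.hw_boundary = boundary;
	lim.hw_unit_min = bs;
	lim.hw_unit_max = rounddown_pow2(atomic_bs);
	return lim;
}
```

For example, with 4096-byte blocks and NAWUPF reporting 24 blocks, the maximum atomic size is 98304 bytes while the maximum atomic unit is rounded down to 65536 bytes.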
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bf410d10b120..89ebfa89613e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -927,6 +927,36 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 	return BLK_STS_OK;
 }
 
+/*
+ * NVMe does not support a dedicated command to issue an atomic write. A write
+ * which does adhere to the device atomic limits will silently be executed
+ * non-atomically. The request issuer should ensure that the write is within
+ * the queue atomic writes limits, but just validate this in case it is not.
+ */
+static bool nvme_valid_atomic_write(struct request *req)
+{
+	struct request_queue *q = req->q;
+	u32 boundary_bytes = queue_atomic_write_boundary_bytes(q);
+
+	if (blk_rq_bytes(req) > queue_atomic_write_unit_max_bytes(q))
+		return false;
+
+	if (boundary_bytes) {
+		u64 mask = boundary_bytes - 1, imask = ~mask;
+		u64 start = blk_rq_pos(req) << SECTOR_SHIFT;
+		u64 end = start + blk_rq_bytes(req) - 1;
+
+		/* If greater then must be crossing a boundary */
+		if (blk_rq_bytes(req) > boundary_bytes)
+			return false;
+
+		if ((start & imask) != (end & imask))
+			return false;
+	}
+
+	return true;
+}
+
 static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 		struct request *req, struct nvme_command *cmnd,
 		enum nvme_opcode op)
@@ -942,6 +972,9 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	if (req->cmd_flags & REQ_RAHEAD)
 		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
 
+	if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
+		return BLK_STS_INVAL;
+
 	cmnd->rw.opcode = op;
 	cmnd->rw.flags = 0;
 	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
@@ -1920,6 +1953,23 @@ static void nvme_configure_metadata(struct nvme_ctrl *ctrl,
 	}
 }
 
+
+static void nvme_update_atomic_write_disk_info(struct nvme_ns *ns,
+			struct nvme_id_ns *id, struct queue_limits *lim,
+			u32 bs, u32 atomic_bs)
+{
+	unsigned int boundary = 0;
+
+	if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf) {
+		if (le16_to_cpu(id->nabspf))
+			boundary = (le16_to_cpu(id->nabspf) + 1) * bs;
+	}
+	lim->atomic_write_hw_max = atomic_bs;
+	lim->atomic_write_hw_boundary = boundary;
+	lim->atomic_write_hw_unit_min = bs;
+	lim->atomic_write_hw_unit_max = rounddown_pow_of_two(atomic_bs);
+}
+
 static u32 nvme_max_drv_segments(struct nvme_ctrl *ctrl)
 {
 	return ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> SECTOR_SHIFT) + 1;
@@ -1966,6 +2016,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
 			atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs;
 		else
 			atomic_bs = (1 + ns->ctrl->subsys->awupf) * bs;
+
+		nvme_update_atomic_write_disk_info(ns, id, lim, bs, atomic_bs);
 	}
 
 	if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {
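
The size and boundary check performed by nvme_valid_atomic_write() can be tried out in isolation with a small userspace sketch. The function name `atomic_write_fits()` and its plain-integer parameters are hypothetical stand-ins for the request and queue accessors used in the patch. The mask arithmetic assumes the boundary is a power of two and that the write length is non-zero; neither assumption is checked by the patch itself at this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * start:    byte offset of the write on the device
 * len:      write length in bytes (assumed non-zero)
 * unit_max: largest atomic write unit in bytes
 * boundary: atomic boundary in bytes, 0 if the device reports none
 *           (assumed to be a power of two when non-zero)
 */
static bool atomic_write_fits(uint64_t start, uint32_t len,
		uint32_t unit_max, uint32_t boundary)
{
	/* The mask trick below is only valid for power-of-two boundaries. */
	assert(boundary == 0 || (boundary & (boundary - 1)) == 0);

	if (len > unit_max)
		return false;

	if (boundary) {
		uint64_t imask = ~((uint64_t)boundary - 1);
		uint64_t end = start + len - 1;

		/* Longer than one boundary window: must cross a boundary. */
		if (len > boundary)
			return false;

		/* First and last byte must fall in the same window. */
		if ((start & imask) != (end & imask))
			return false;
	}

	return true;
}
```

With a 32768-byte boundary, an 8192-byte write starting at offset 28672 ends at byte 36863 and is rejected because it straddles the boundary, whereas the same write starting at offset 32768 is accepted.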