Message ID | 20250321184819.3847386-4-csander@purestorage.com (mailing list archive)
---|---
State | New
Series | Consistently look up fixed buffers before going async
On 3/21/25 18:48, Caleb Sander Mateos wrote:
> For uring_cmd operations with fixed buffers, the fixed buffer lookup
> happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
> implementation. A ->uring_cmd() implementation could return -EAGAIN on
> the initial issue for any reason before io_uring_cmd_import_fixed().
> For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
> which can return -EAGAIN if all tags in the tag set are in use.

That's up to the command when it resolves the buffer; you can just
move the call to io_import_reg_buf() earlier in the nvme cmd code
and not work around it on the io_uring side.

In general, it's a step back. It just got cleaned up from the mess
where node resolution and buffer imports were separate steps,
duplicated by every single request type that used it.

> This ordering difference is observable when using
> UBLK_U_IO_{,UN}REGISTER_IO_BUF SQEs to modify the fixed buffer table.
> If the uring_cmd is followed by a UBLK_U_IO_UNREGISTER_IO_BUF operation
> that unregisters the fixed buffer, the uring_cmd going async will cause
> the fixed buffer lookup to fail because it happens after the unregister.
>
> Move the fixed buffer lookup out of io_uring_cmd_import_fixed() and
> instead perform it in io_uring_cmd() before calling ->uring_cmd().
> io_uring_cmd_import_fixed() now only initializes an iov_iter from the
> existing fixed buffer node. This division of responsibilities makes
> sense as the fixed buffer lookup is an io_uring implementation detail
> and independent of the ->uring_cmd() implementation. It also cuts down
> on the need to pass around the io_uring issue_flags.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
> ---
>  drivers/nvme/host/ioctl.c    | 10 ++++------
>  include/linux/io_uring/cmd.h |  6 ++----
>  io_uring/rsrc.c              |  6 ++++++
>  io_uring/rsrc.h              |  2 ++
>  io_uring/uring_cmd.c         | 10 +++++++---
>  5 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index fe9fb80c6a14..3fad74563b9e 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -112,12 +112,11 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
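To make the alternative concrete, here is a hypothetical sketch (not from the posted patch) of resolving the buffer earlier in nvme_uring_cmd_io(), before nvme_alloc_user_request() can return -EAGAIN. It uses the pre-patch io_uring_cmd_import_fixed() signature and the local names (d, c, q, rq_flags, blk_flags) from drivers/nvme/host/ioctl.c; since no request exists yet, the data direction has to come from the command opcode rather than rq_data_dir():

	/*
	 * Hypothetical reordering inside nvme_uring_cmd_io(); unrelated
	 * setup and error paths elided.
	 */
	if ((ioucmd->flags & IORING_URING_CMD_FIXED) && d.data_len) {
		/*
		 * Import the registered buffer before allocating the
		 * request, so tag exhaustion (-EAGAIN) can no longer race
		 * with a later buffer unregistration.
		 */
		ret = io_uring_cmd_import_fixed(d.addr, d.data_len,
				nvme_is_write(&c) ? WRITE : READ,
				&iter, ioucmd, issue_flags);
		if (ret < 0)
			return ret;
	}

	req = nvme_alloc_user_request(q, &c, rq_flags, blk_flags);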
On Fri, Mar 21, 2025 at 1:34 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 3/21/25 18:48, Caleb Sander Mateos wrote:
> > For uring_cmd operations with fixed buffers, the fixed buffer lookup
> > happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
> > implementation. A ->uring_cmd() implementation could return -EAGAIN on
> > the initial issue for any reason before io_uring_cmd_import_fixed().
> > For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
> > which can return -EAGAIN if all tags in the tag set are in use.
>
> That's up to the command when it resolves the buffer; you can just
> move the call to io_import_reg_buf() earlier in the nvme cmd code
> and not work around it on the io_uring side.
>
> In general, it's a step back. It just got cleaned up from the mess
> where node resolution and buffer imports were separate steps,
> duplicated by every single request type that used it.

Yes, I considered just reordering the steps in nvme_uring_cmd_io().
But it seems easy for a future change to accidentally introduce
another point where the issue can go async before it has looked up
the fixed buffer. And I am imagining there will be more uring_cmd
fixed buffer users added (e.g. btrfs). This seems like a generic
problem rather than something specific to NVMe passthru.

My other feeling is that the fixed buffer lookup is an io_uring-layer
detail, whereas the use of the buffer is more a concern of the
->uring_cmd() implementation. If only the opcodes were consistent
about how a fixed buffer is requested, we could do the lookup in the
generic io_uring code like fixed files already do.

But I'm open to implementing a different fix here if Jens would prefer.

Best,
Caleb
On 3/21/25 21:38, Caleb Sander Mateos wrote:
> On Fri, Mar 21, 2025 at 1:34 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 3/21/25 18:48, Caleb Sander Mateos wrote:
>>> For uring_cmd operations with fixed buffers, the fixed buffer lookup
>>> happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
>>> implementation. A ->uring_cmd() implementation could return -EAGAIN on
>>> the initial issue for any reason before io_uring_cmd_import_fixed().
>>> For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
>>> which can return -EAGAIN if all tags in the tag set are in use.
>>
>> That's up to the command when it resolves the buffer; you can just
>> move the call to io_import_reg_buf() earlier in the nvme cmd code
>> and not work around it on the io_uring side.
>>
>> In general, it's a step back. It just got cleaned up from the mess
>> where node resolution and buffer imports were separate steps,
>> duplicated by every single request type that used it.
>
> Yes, I considered just reordering the steps in nvme_uring_cmd_io().
> But it seems easy for a future change to accidentally introduce
> another point where the issue can go async before it has looked up
> the fixed buffer. And I am imagining there will be more uring_cmd
> fixed buffer users added (e.g. btrfs). This seems like a generic
> problem rather than something specific to NVMe passthru.

That's working around the API for ordering requests; that's the reason
you have an ordering problem.

> My other feeling is that the fixed buffer lookup is an io_uring-layer
> detail, whereas the use of the buffer is more a concern of the
> ->uring_cmd() implementation. If only the opcodes were consistent
> about how a fixed buffer is requested, we could do the lookup in the
> generic io_uring code like fixed files already do.

That's one of the things I'd hoped was done better, and not only
specifically for registered buffers, but it's late for that.

> But I'm open to implementing a different fix here if Jens would prefer.

It's not a fix; the behaviour is well within the constraints of io_uring.
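For context, the ordering API Pavel is presumably referring to is SQE linking: userspace that needs the unregister to run strictly after the uring_cmd can link the two SQEs. A minimal hypothetical liburing-style sketch (ring setup, file descriptors, and the ublk/nvme command payloads in sqe->cmd are assumed):

	struct io_uring_sqe *cmd = io_uring_get_sqe(&ring);
	io_uring_prep_rw(IORING_OP_URING_CMD, cmd, nvme_fd, NULL, 0, 0);
	cmd->uring_cmd_flags = IORING_URING_CMD_FIXED;
	cmd->buf_index = 0;		/* registered buffer to use */
	cmd->flags |= IOSQE_IO_LINK;	/* start the next SQE only after this one completes */

	struct io_uring_sqe *unreg = io_uring_get_sqe(&ring);
	io_uring_prep_rw(IORING_OP_URING_CMD, unreg, ublk_fd, NULL, 0, 0);
	unreg->cmd_op = UBLK_U_IO_UNREGISTER_IO_BUF;
	/* ublk command payload (q_id, tag, buffer index) filled into sqe->cmd */

	io_uring_submit(&ring);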
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index fe9fb80c6a14..3fad74563b9e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -112,12 +112,11 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 	return req;
 }
 
 static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		struct io_uring_cmd *ioucmd, unsigned int flags,
-		unsigned int iou_issue_flags)
+		struct io_uring_cmd *ioucmd, unsigned int flags)
 {
 	struct request_queue *q = req->q;
 	struct nvme_ns *ns = q->queuedata;
 	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
 	bool supports_metadata = bdev && blk_get_integrity(bdev->bd_disk);
@@ -141,12 +140,11 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		/* fixedbufs is only for non-vectored io */
 		if (WARN_ON_ONCE(flags & NVME_IOCTL_VEC))
 			return -EINVAL;
 
 		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
-				rq_data_dir(req), &iter, ioucmd,
-				iou_issue_flags);
+				rq_data_dir(req), &iter, ioucmd);
 		if (ret < 0)
 			goto out;
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
 	} else {
 		ret = blk_rq_map_user_io(req, NULL, nvme_to_user_ptr(ubuffer),
@@ -194,11 +192,11 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 		return PTR_ERR(req);
 
 	req->timeout = timeout;
 	if (ubuffer && bufflen) {
 		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
-				meta_len, NULL, flags, 0);
+				meta_len, NULL, flags);
 		if (ret)
 			return ret;
 	}
 
 	bio = req->bio;
@@ -514,11 +512,11 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
 
 	if (d.data_len) {
 		ret = nvme_map_user_request(req, d.addr, d.data_len,
 			nvme_to_user_ptr(d.metadata),
-			d.metadata_len, ioucmd, vec, issue_flags);
+			d.metadata_len, ioucmd, vec);
 		if (ret)
 			return ret;
 	}
 
 	/* to free bio on completion, as req->bio will be null at that time */
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 598cacda4aa3..ea243bfab2a8 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -39,12 +39,11 @@ static inline void io_uring_cmd_private_sz_check(size_t cmd_sz)
 )
 
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter,
-			      struct io_uring_cmd *ioucmd,
-			      unsigned int issue_flags);
+			      struct io_uring_cmd *ioucmd);
 
 /*
  * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
  * and the corresponding io_uring request.
  *
@@ -69,12 +68,11 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
 void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
 
 #else
 static inline int
 io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			  struct iov_iter *iter, struct io_uring_cmd *ioucmd,
-			  unsigned int issue_flags)
+			  struct iov_iter *iter, struct io_uring_cmd *ioucmd)
 {
 	return -EOPNOTSUPP;
 }
 static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
 		u64 ret2, unsigned issue_flags)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 5fff6ba2b7c0..ad0dfe51acb1 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1099,10 +1099,16 @@ int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 	if (!node)
 		return -EFAULT;
 	return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
 }
 
+int io_import_buf_node(struct io_kiocb *req, struct iov_iter *iter,
+			u64 buf_addr, size_t len, int ddir)
+{
+	return io_import_fixed(ddir, iter, req->buf_node->buf, buf_addr, len);
+}
+
 /* Lock two rings at once. The rings must be different! */
 static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 {
 	if (ctx1 > ctx2)
 		swap(ctx1, ctx2);
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index f10a1252b3e9..bc0f8f0a2054 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -59,10 +59,12 @@ int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
 struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
 				      unsigned issue_flags);
 int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 			u64 buf_addr, size_t len, int ddir,
 			unsigned issue_flags);
+int io_import_buf_node(struct io_kiocb *req, struct iov_iter *iter,
+			u64 buf_addr, size_t len, int ddir);
 
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned int nr_args, u64 __user *tags);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index de39b602aa82..15a76fe48fe5 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -232,10 +232,15 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 			return -EOPNOTSUPP;
 		issue_flags |= IO_URING_F_IOPOLL;
 		req->iopoll_completed = 0;
 	}
 
+	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
+		if (!io_find_buf_node(req, issue_flags))
+			return -EFAULT;
+	}
+
 	ret = file->f_op->uring_cmd(ioucmd, issue_flags);
 	if (ret == -EAGAIN || ret == -EIOCBQUEUED)
 		return ret;
 	if (ret < 0)
 		req_set_fail(req);
@@ -244,16 +249,15 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter,
-			      struct io_uring_cmd *ioucmd,
-			      unsigned int issue_flags)
+			      struct io_uring_cmd *ioucmd)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
-	return io_import_reg_buf(req, iter, ubuf, len, rw, issue_flags);
+	return io_import_buf_node(req, iter, ubuf, len, rw);
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
 void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
 {
For uring_cmd operations with fixed buffers, the fixed buffer lookup
happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
implementation. A ->uring_cmd() implementation could return -EAGAIN on
the initial issue for any reason before io_uring_cmd_import_fixed().
For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
which can return -EAGAIN if all tags in the tag set are in use.

This ordering difference is observable when using
UBLK_U_IO_{,UN}REGISTER_IO_BUF SQEs to modify the fixed buffer table.
If the uring_cmd is followed by a UBLK_U_IO_UNREGISTER_IO_BUF operation
that unregisters the fixed buffer, the uring_cmd going async will cause
the fixed buffer lookup to fail because it happens after the unregister.

Move the fixed buffer lookup out of io_uring_cmd_import_fixed() and
instead perform it in io_uring_cmd() before calling ->uring_cmd().
io_uring_cmd_import_fixed() now only initializes an iov_iter from the
existing fixed buffer node. This division of responsibilities makes
sense as the fixed buffer lookup is an io_uring implementation detail
and independent of the ->uring_cmd() implementation. It also cuts down
on the need to pass around the io_uring issue_flags.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
---
 drivers/nvme/host/ioctl.c    | 10 ++++------
 include/linux/io_uring/cmd.h |  6 ++----
 io_uring/rsrc.c              |  6 ++++++
 io_uring/rsrc.h              |  2 ++
 io_uring/uring_cmd.c         | 10 +++++++---
 5 files changed, 21 insertions(+), 13 deletions(-)
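To illustrate the race the commit message describes, a hypothetical liburing-style reproduction sketch (ring and device setup elided); the two SQEs are submitted together, unlinked:

	/* SQE A: NVMe passthru uring_cmd using registered buffer index 0.
	 * SQE B: ublk UBLK_U_IO_UNREGISTER_IO_BUF for the same index. */
	struct io_uring_sqe *a = io_uring_get_sqe(&ring);
	io_uring_prep_rw(IORING_OP_URING_CMD, a, nvme_fd, NULL, 0, 0);
	a->uring_cmd_flags = IORING_URING_CMD_FIXED;
	a->buf_index = 0;

	struct io_uring_sqe *b = io_uring_get_sqe(&ring);
	io_uring_prep_rw(IORING_OP_URING_CMD, b, ublk_fd, NULL, 0, 0);
	b->cmd_op = UBLK_U_IO_UNREGISTER_IO_BUF;

	io_uring_submit(&ring);

	/*
	 * Before this patch: if SQE A hits -EAGAIN in
	 * nvme_alloc_user_request() (tag exhaustion), it goes async before
	 * io_uring_cmd_import_fixed() has looked up buffer 0. SQE B can then
	 * clear the table slot, and A's deferred lookup fails with -EFAULT.
	 * After this patch: io_uring_cmd() resolves the buffer node before
	 * calling ->uring_cmd(), so the async retry still sees the buffer.
	 */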