Message ID | 20220923092854.5116-4-joshi.k@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-next,v8,1/5] io_uring: add io_uring_cmd_import_fixed | expand |
> -static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, > - unsigned len, u32 seed, bool write) > +static void *nvme_add_user_metadata(struct request *req, void __user *ubuf, > + unsigned len, u32 seed) Please split out a separate patch just for the changes to nvme_add_user_metadata. > - if (ret == len) > + if (ret == len) { > + req->cmd_flags |= REQ_INTEGRITY; > return buf; > + } > ret = -ENOMEM; Nit: Please keep the successful path in line and branch out for the errors, i.e. if (ret != len) { ret = -ENOMEM; goto out_free_meta; } req->cmd_flags |= REQ_INTEGRITY; return buf; We should have done this already with the old code, but now that more is added to the success path it becomes even more important:. > + else { > + struct iovec fast_iov[UIO_FASTIOV]; > + struct iovec *iov = fast_iov; > + struct iov_iter iter; > + > + ret = import_iovec(rq_data_dir(req), ubuffer, bufflen, > + UIO_FASTIOV, &iov, &iter); > + if (ret < 0) > goto out; > + ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL); > + kfree(iov); > + } While you touch this: I think thi block of code would also be a good separate helper. Maybe even in the block layer given the the scsi ioctl code and sg duplicate it, and already missed the fast_iov treatment due to the duplication. Having this in a separate function is also nice to keep the fast_iov stack footprint isolated. > + if (ret) > + goto out; > + bio = req->bio; I think we can also do away with this bio local variable now. > + if (bdev) > + bio_set_dev(bio, bdev); We don't need the bio_set_dev here as mentioned last time, so I think we should remove it in a prep patch. > + ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer, > + meta_len, meta_seed, &meta, vec); Nit: Add an extra tab for indentation here please. > if (IS_ERR(req)) > return PTR_ERR(req); > + req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0; if (d.timeout_ms) req->timeout = msecs_to_jiffies(d.timeout_ms);
On Fri, Sep 23, 2022 at 05:38:19PM +0200, Christoph Hellwig wrote: > >> + else { >> + struct iovec fast_iov[UIO_FASTIOV]; >> + struct iovec *iov = fast_iov; >> + struct iov_iter iter; >> + >> + ret = import_iovec(rq_data_dir(req), ubuffer, bufflen, >> + UIO_FASTIOV, &iov, &iter); >> + if (ret < 0) >> goto out; >> + ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL); >> + kfree(iov); >> + } > >While you touch this: I think thi block of code would also be a good >separate helper. Maybe even in the block layer given the the scsi >ioctl code and sg duplicate it, and already missed the fast_iov >treatment due to the duplication. Having this in a separate function >is also nice to keep the fast_iov stack footprint isolated. Totally agree on goodness. I think instead of new helper this seems suited to go inside blk_rq_map_user_iov itself. That will make it symmetric to blk_rq_map_user which also combines import + mapping. But if I go that route now, I will have to alter parameters of blk_rq_map_user_iov, and that will make it mandatory to change the callers (scsi-ioctl, sg) too. Nothing hairy, but that means further growth of unrelated elements in this series. Hope you agree that separate series is much better, which I will post after this. Will fold all other changes you pointed.
>> + if (ret) >> + goto out; >> + bio = req->bio; > >I think we can also do away with this bio local variable now. > >> + if (bdev) >> + bio_set_dev(bio, bdev); > >We don't need the bio_set_dev here as mentioned last time, so I think >we should remove it in a prep patch. we miss completing polled io with this change. bdev needs to be put in bio to complete polled passthrough IO. nvme_ns_chr_uring_cmd_iopoll uses bio_poll and that in turn makes use of this.
On Mon, Sep 26, 2022 at 01:13:54AM +0530, Kanchan Joshi wrote: >>> + if (ret) >>> + goto out; >>> + bio = req->bio; >> >> I think we can also do away with this bio local variable now. >> >>> + if (bdev) >>> + bio_set_dev(bio, bdev); >> >> We don't need the bio_set_dev here as mentioned last time, so I think >> we should remove it in a prep patch. > > we miss completing polled io with this change. > bdev needs to be put in bio to complete polled passthrough IO. > nvme_ns_chr_uring_cmd_iopoll uses bio_poll and that in turn makes use of > this. Oh, indeed - polling is another and someone unexpected user in addition to the I/O accounting that does not apply to passthrough requests. That also means we can't poll admin commands at all.
On Mon, Sep 26, 2022 at 04:51:59PM +0200, Christoph Hellwig wrote: >On Mon, Sep 26, 2022 at 01:13:54AM +0530, Kanchan Joshi wrote: >>>> + if (ret) >>>> + goto out; >>>> + bio = req->bio; >>> >>> I think we can also do away with this bio local variable now. >>> >>>> + if (bdev) >>>> + bio_set_dev(bio, bdev); >>> >>> We don't need the bio_set_dev here as mentioned last time, so I think >>> we should remove it in a prep patch. >> >> we miss completing polled io with this change. >> bdev needs to be put in bio to complete polled passthrough IO. >> nvme_ns_chr_uring_cmd_iopoll uses bio_poll and that in turn makes use of >> this. > >Oh, indeed - polling is another and someone unexpected user in >addition to the I/O accounting that does not apply to passthrough >requests. That also means we can't poll admin commands at all. Yes. That falls back to IRQ completions. I think it should be possible to support if we use request-only interface. Most of the information in bio-poll interface comes from request. But I doubt if polling for admin command is a useful thing.
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 548aca8b5b9f..9537991deac9 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -20,19 +20,20 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) return (void __user *)ptrval; } -static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, - unsigned len, u32 seed, bool write) +static void *nvme_add_user_metadata(struct request *req, void __user *ubuf, + unsigned len, u32 seed) { struct bio_integrity_payload *bip; int ret = -ENOMEM; void *buf; + struct bio *bio = req->bio; buf = kmalloc(len, GFP_KERNEL); if (!buf) goto out; ret = -EFAULT; - if (write && copy_from_user(buf, ubuf, len)) + if ((req_op(req) == REQ_OP_DRV_OUT) && copy_from_user(buf, ubuf, len)) goto out_free_meta; bip = bio_integrity_alloc(bio, GFP_KERNEL, 1); @@ -45,8 +46,10 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, bip->bip_iter.bi_sector = seed; ret = bio_integrity_add_page(bio, virt_to_page(buf), len, offset_in_page(buf)); - if (ret == len) + if (ret == len) { + req->cmd_flags |= REQ_INTEGRITY; return buf; + } ret = -ENOMEM; out_free_meta: kfree(buf); @@ -65,70 +68,67 @@ static int nvme_finish_user_metadata(struct request *req, void __user *ubuf, } static struct request *nvme_alloc_user_request(struct request_queue *q, - struct nvme_command *cmd, void __user *ubuffer, - unsigned bufflen, void __user *meta_buffer, unsigned meta_len, - u32 meta_seed, void **metap, unsigned timeout, bool vec, - blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags) + struct nvme_command *cmd, blk_opf_t rq_flags, + blk_mq_req_flags_t blk_flags) { - bool write = nvme_is_write(cmd); - struct nvme_ns *ns = q->queuedata; - struct block_device *bdev = ns ? ns->disk->part0 : NULL; struct request *req; - struct bio *bio = NULL; - void *meta = NULL; - int ret; req = blk_mq_alloc_request(q, nvme_req_op(cmd) | rq_flags, blk_flags); if (IS_ERR(req)) return req; nvme_init_request(req, cmd); - - if (timeout) - req->timeout = timeout; nvme_req(req)->flags |= NVME_REQ_USERCMD; + return req; +} - if (ubuffer && bufflen) { - if (!vec) - ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, - GFP_KERNEL); - else { - struct iovec fast_iov[UIO_FASTIOV]; - struct iovec *iov = fast_iov; - struct iov_iter iter; - - ret = import_iovec(rq_data_dir(req), ubuffer, bufflen, - UIO_FASTIOV, &iov, &iter); - if (ret < 0) - goto out; - ret = blk_rq_map_user_iov(q, req, NULL, &iter, - GFP_KERNEL); - kfree(iov); - } - if (ret) +static int nvme_map_user_request(struct request *req, void __user *ubuffer, + unsigned bufflen, void __user *meta_buffer, unsigned meta_len, + u32 meta_seed, void **metap, bool vec) +{ + struct request_queue *q = req->q; + struct nvme_ns *ns = q->queuedata; + struct block_device *bdev = ns ? ns->disk->part0 : NULL; + struct bio *bio = NULL; + void *meta = NULL; + int ret; + + if (!vec) + ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, + GFP_KERNEL); + else { + struct iovec fast_iov[UIO_FASTIOV]; + struct iovec *iov = fast_iov; + struct iov_iter iter; + + ret = import_iovec(rq_data_dir(req), ubuffer, bufflen, + UIO_FASTIOV, &iov, &iter); + if (ret < 0) goto out; - bio = req->bio; - if (bdev) - bio_set_dev(bio, bdev); - if (bdev && meta_buffer && meta_len) { - meta = nvme_add_user_metadata(bio, meta_buffer, meta_len, - meta_seed, write); - if (IS_ERR(meta)) { - ret = PTR_ERR(meta); - goto out_unmap; - } - req->cmd_flags |= REQ_INTEGRITY; - *metap = meta; + ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL); + kfree(iov); + } + if (ret) + goto out; + bio = req->bio; + if (bdev) + bio_set_dev(bio, bdev); + if (bdev && meta_buffer && meta_len) { + meta = nvme_add_user_metadata(req, meta_buffer, meta_len, + meta_seed); + if (IS_ERR(meta)) { + ret = PTR_ERR(meta); + goto out_unmap; } + *metap = meta; } - return req; + return ret; out_unmap: if (bio) blk_rq_unmap_user(bio); out: - blk_mq_free_request(req); - return ERR_PTR(ret); + return ret; } static int nvme_submit_user_cmd(struct request_queue *q, @@ -141,13 +141,19 @@ static int nvme_submit_user_cmd(struct request_queue *q, struct bio *bio; int ret; - req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer, - meta_len, meta_seed, &meta, timeout, vec, 0, 0); + req = nvme_alloc_user_request(q, cmd, 0, 0); if (IS_ERR(req)) return PTR_ERR(req); - bio = req->bio; + req->timeout = timeout; + if (ubuffer && bufflen) { + ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer, + meta_len, meta_seed, &meta, vec); + if (ret) + goto out; + } + bio = req->bio; ret = nvme_execute_passthru_rq(req); if (result) @@ -157,6 +163,7 @@ static int nvme_submit_user_cmd(struct request_queue *q, meta_len, ret); if (bio) blk_rq_unmap_user(bio); +out: blk_mq_free_request(req); return ret; } @@ -418,6 +425,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns, blk_opf_t rq_flags = 0; blk_mq_req_flags_t blk_flags = 0; void *meta = NULL; + int ret; if (!capable(CAP_SYS_ADMIN)) return -EACCES; @@ -457,13 +465,18 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns, rq_flags |= REQ_POLLED; retry: - req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr), - d.data_len, nvme_to_user_ptr(d.metadata), - d.metadata_len, 0, &meta, d.timeout_ms ? - msecs_to_jiffies(d.timeout_ms) : 0, vec, rq_flags, - blk_flags); + req = nvme_alloc_user_request(q, &c, rq_flags, blk_flags); if (IS_ERR(req)) return PTR_ERR(req); + req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0; + + if (d.addr && d.data_len) { + ret = nvme_map_user_request(req, nvme_to_user_ptr(d.addr), + d.data_len, nvme_to_user_ptr(d.metadata), + d.metadata_len, 0, &meta, vec); + if (ret) + goto out_err; + } req->end_io = nvme_uring_cmd_end_io; req->end_io_data = ioucmd; @@ -486,6 +499,9 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns, blk_execute_rq_nowait(req, false); return -EIOCBQUEUED; +out_err: + blk_mq_free_request(req); + return ret; } static bool is_ctrl_ioctl(unsigned int cmd)