diff mbox series

[for-next,v8,3/5] nvme: refactor nvme_alloc_user_request

Message ID 20220923092854.5116-4-joshi.k@samsung.com (mailing list archive)
State New, archived
Headers show
Series [for-next,v8,1/5] io_uring: add io_uring_cmd_import_fixed | expand

Commit Message

Kanchan Joshi Sept. 23, 2022, 9:28 a.m. UTC
Separate this out to two functions with reduced number of arguments.
While at it, do bit of refactoring in nvme_add_user_metadata too.

Comments

Christoph Hellwig Sept. 23, 2022, 3:38 p.m. UTC | #1
> -static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
> -		unsigned len, u32 seed, bool write)
> +static void *nvme_add_user_metadata(struct request *req, void __user *ubuf,
> +		unsigned len, u32 seed)

Please split out a separate patch just for the changes to
nvme_add_user_metadata.

> -	if (ret == len)
> +	if (ret == len) {
> +		req->cmd_flags |= REQ_INTEGRITY;
>  		return buf;
> +	}
>  	ret = -ENOMEM;

Nit: Please keep the successful path in line and branch out for the
errors, i.e.


	if (ret != len) {
	 	ret = -ENOMEM;
		goto out_free_meta;
	}

	req->cmd_flags |= REQ_INTEGRITY;
	return buf;

We should have done this already with the old code, but now that more
is added to the success path it becomes even more important:.

> +	else {
> +		struct iovec fast_iov[UIO_FASTIOV];
> +		struct iovec *iov = fast_iov;
> +		struct iov_iter iter;
> +
> +		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
> +				UIO_FASTIOV, &iov, &iter);
> +		if (ret < 0)
>  			goto out;
> +		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
> +		kfree(iov);
> +	}

While you touch this:  I think thi block of code would also be a good
separate helper.  Maybe even in the block layer given the the scsi
ioctl code and sg duplicate it, and already missed the fast_iov
treatment due to the duplication.  Having this in a separate function
is also nice to keep the fast_iov stack footprint isolated.

> +	if (ret)
> +		goto out;
> +	bio = req->bio;

I think we can also do away with this bio local variable now.

> +	if (bdev)
> +		bio_set_dev(bio, bdev);

We don't need the bio_set_dev here as mentioned last time, so I think
we should remove it in a prep patch.

> +		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
> +			meta_len, meta_seed, &meta, vec);

Nit: Add an extra tab for indentation here please.

>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
> +	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;

	if (d.timeout_ms)
		req->timeout = msecs_to_jiffies(d.timeout_ms);
Kanchan Joshi Sept. 25, 2022, 5:39 p.m. UTC | #2
On Fri, Sep 23, 2022 at 05:38:19PM +0200, Christoph Hellwig wrote:
>
>> +	else {
>> +		struct iovec fast_iov[UIO_FASTIOV];
>> +		struct iovec *iov = fast_iov;
>> +		struct iov_iter iter;
>> +
>> +		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
>> +				UIO_FASTIOV, &iov, &iter);
>> +		if (ret < 0)
>>  			goto out;
>> +		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
>> +		kfree(iov);
>> +	}
>
>While you touch this:  I think thi block of code would also be a good
>separate helper.  Maybe even in the block layer given the the scsi
>ioctl code and sg duplicate it, and already missed the fast_iov
>treatment due to the duplication.  Having this in a separate function
>is also nice to keep the fast_iov stack footprint isolated.

Totally agree on goodness. 
I think instead of new helper this seems suited to go inside
blk_rq_map_user_iov itself. That will make it symmetric to
blk_rq_map_user which also combines import + mapping.

But if I go that route now, I will have to alter parameters of
blk_rq_map_user_iov, and that will make it mandatory to change the
callers (scsi-ioctl, sg) too. Nothing hairy, but that means further
growth of unrelated elements in this series. Hope you agree that
separate series is much better, which I will post after this.

Will fold all other changes you pointed.
Kanchan Joshi Sept. 25, 2022, 7:43 p.m. UTC | #3
>> +	if (ret)
>> +		goto out;
>> +	bio = req->bio;
>
>I think we can also do away with this bio local variable now.
>
>> +	if (bdev)
>> +		bio_set_dev(bio, bdev);
>
>We don't need the bio_set_dev here as mentioned last time, so I think
>we should remove it in a prep patch.

we miss completing polled io with this change.
bdev needs to be put in bio to complete polled passthrough IO.
nvme_ns_chr_uring_cmd_iopoll uses bio_poll and that in turn makes use of
this.
Christoph Hellwig Sept. 26, 2022, 2:51 p.m. UTC | #4
On Mon, Sep 26, 2022 at 01:13:54AM +0530, Kanchan Joshi wrote:
>>> +	if (ret)
>>> +		goto out;
>>> +	bio = req->bio;
>>
>> I think we can also do away with this bio local variable now.
>>
>>> +	if (bdev)
>>> +		bio_set_dev(bio, bdev);
>>
>> We don't need the bio_set_dev here as mentioned last time, so I think
>> we should remove it in a prep patch.
>
> we miss completing polled io with this change.
> bdev needs to be put in bio to complete polled passthrough IO.
> nvme_ns_chr_uring_cmd_iopoll uses bio_poll and that in turn makes use of
> this.

Oh, indeed - polling is another and someone unexpected user in
addition to the I/O accounting that does not apply to passthrough
requests.  That also means we can't poll admin commands at all.
Kanchan Joshi Sept. 27, 2022, 4:57 p.m. UTC | #5
On Mon, Sep 26, 2022 at 04:51:59PM +0200, Christoph Hellwig wrote:
>On Mon, Sep 26, 2022 at 01:13:54AM +0530, Kanchan Joshi wrote:
>>>> +	if (ret)
>>>> +		goto out;
>>>> +	bio = req->bio;
>>>
>>> I think we can also do away with this bio local variable now.
>>>
>>>> +	if (bdev)
>>>> +		bio_set_dev(bio, bdev);
>>>
>>> We don't need the bio_set_dev here as mentioned last time, so I think
>>> we should remove it in a prep patch.
>>
>> we miss completing polled io with this change.
>> bdev needs to be put in bio to complete polled passthrough IO.
>> nvme_ns_chr_uring_cmd_iopoll uses bio_poll and that in turn makes use of
>> this.
>
>Oh, indeed - polling is another and someone unexpected user in
>addition to the I/O accounting that does not apply to passthrough
>requests.  That also means we can't poll admin commands at all.

Yes. That falls back to IRQ completions.

I think it should be possible to support if we use request-only
interface. Most of the information in bio-poll interface comes from
request.
But I doubt if polling for admin command is a useful thing.
diff mbox series

Patch

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 548aca8b5b9f..9537991deac9 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -20,19 +20,20 @@  static void __user *nvme_to_user_ptr(uintptr_t ptrval)
 	return (void __user *)ptrval;
 }
 
-static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
-		unsigned len, u32 seed, bool write)
+static void *nvme_add_user_metadata(struct request *req, void __user *ubuf,
+		unsigned len, u32 seed)
 {
 	struct bio_integrity_payload *bip;
 	int ret = -ENOMEM;
 	void *buf;
+	struct bio *bio = req->bio;
 
 	buf = kmalloc(len, GFP_KERNEL);
 	if (!buf)
 		goto out;
 
 	ret = -EFAULT;
-	if (write && copy_from_user(buf, ubuf, len))
+	if ((req_op(req) == REQ_OP_DRV_OUT) && copy_from_user(buf, ubuf, len))
 		goto out_free_meta;
 
 	bip = bio_integrity_alloc(bio, GFP_KERNEL, 1);
@@ -45,8 +46,10 @@  static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 	bip->bip_iter.bi_sector = seed;
 	ret = bio_integrity_add_page(bio, virt_to_page(buf), len,
 			offset_in_page(buf));
-	if (ret == len)
+	if (ret == len) {
+		req->cmd_flags |= REQ_INTEGRITY;
 		return buf;
+	}
 	ret = -ENOMEM;
 out_free_meta:
 	kfree(buf);
@@ -65,70 +68,67 @@  static int nvme_finish_user_metadata(struct request *req, void __user *ubuf,
 }
 
 static struct request *nvme_alloc_user_request(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
-		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		u32 meta_seed, void **metap, unsigned timeout, bool vec,
-		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags)
+		struct nvme_command *cmd, blk_opf_t rq_flags,
+		blk_mq_req_flags_t blk_flags)
 {
-	bool write = nvme_is_write(cmd);
-	struct nvme_ns *ns = q->queuedata;
-	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
 	struct request *req;
-	struct bio *bio = NULL;
-	void *meta = NULL;
-	int ret;
 
 	req = blk_mq_alloc_request(q, nvme_req_op(cmd) | rq_flags, blk_flags);
 	if (IS_ERR(req))
 		return req;
 	nvme_init_request(req, cmd);
-
-	if (timeout)
-		req->timeout = timeout;
 	nvme_req(req)->flags |= NVME_REQ_USERCMD;
+	return req;
+}
 
-	if (ubuffer && bufflen) {
-		if (!vec)
-			ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
-				GFP_KERNEL);
-		else {
-			struct iovec fast_iov[UIO_FASTIOV];
-			struct iovec *iov = fast_iov;
-			struct iov_iter iter;
-
-			ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
-					UIO_FASTIOV, &iov, &iter);
-			if (ret < 0)
-				goto out;
-			ret = blk_rq_map_user_iov(q, req, NULL, &iter,
-					GFP_KERNEL);
-			kfree(iov);
-		}
-		if (ret)
+static int nvme_map_user_request(struct request *req, void __user *ubuffer,
+		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
+		u32 meta_seed, void **metap, bool vec)
+{
+	struct request_queue *q = req->q;
+	struct nvme_ns *ns = q->queuedata;
+	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
+	struct bio *bio = NULL;
+	void *meta = NULL;
+	int ret;
+
+	if (!vec)
+		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
+			GFP_KERNEL);
+	else {
+		struct iovec fast_iov[UIO_FASTIOV];
+		struct iovec *iov = fast_iov;
+		struct iov_iter iter;
+
+		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
+				UIO_FASTIOV, &iov, &iter);
+		if (ret < 0)
 			goto out;
-		bio = req->bio;
-		if (bdev)
-			bio_set_dev(bio, bdev);
-		if (bdev && meta_buffer && meta_len) {
-			meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
-					meta_seed, write);
-			if (IS_ERR(meta)) {
-				ret = PTR_ERR(meta);
-				goto out_unmap;
-			}
-			req->cmd_flags |= REQ_INTEGRITY;
-			*metap = meta;
+		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
+		kfree(iov);
+	}
+	if (ret)
+		goto out;
+	bio = req->bio;
+	if (bdev)
+		bio_set_dev(bio, bdev);
+	if (bdev && meta_buffer && meta_len) {
+		meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
+				meta_seed);
+		if (IS_ERR(meta)) {
+			ret = PTR_ERR(meta);
+			goto out_unmap;
 		}
+		*metap = meta;
 	}
 
-	return req;
+	return ret;
 
 out_unmap:
 	if (bio)
 		blk_rq_unmap_user(bio);
 out:
-	blk_mq_free_request(req);
-	return ERR_PTR(ret);
+	return ret;
 }
 
 static int nvme_submit_user_cmd(struct request_queue *q,
@@ -141,13 +141,19 @@  static int nvme_submit_user_cmd(struct request_queue *q,
 	struct bio *bio;
 	int ret;
 
-	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
-			meta_len, meta_seed, &meta, timeout, vec, 0, 0);
+	req = nvme_alloc_user_request(q, cmd, 0, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	bio = req->bio;
+	req->timeout = timeout;
+	if (ubuffer && bufflen) {
+		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
+			meta_len, meta_seed, &meta, vec);
+		if (ret)
+			goto out;
+	}
 
+	bio = req->bio;
 	ret = nvme_execute_passthru_rq(req);
 
 	if (result)
@@ -157,6 +163,7 @@  static int nvme_submit_user_cmd(struct request_queue *q,
 						meta_len, ret);
 	if (bio)
 		blk_rq_unmap_user(bio);
+out:
 	blk_mq_free_request(req);
 	return ret;
 }
@@ -418,6 +425,7 @@  static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	blk_opf_t rq_flags = 0;
 	blk_mq_req_flags_t blk_flags = 0;
 	void *meta = NULL;
+	int ret;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
@@ -457,13 +465,18 @@  static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		rq_flags |= REQ_POLLED;
 
 retry:
-	req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr),
-			d.data_len, nvme_to_user_ptr(d.metadata),
-			d.metadata_len, 0, &meta, d.timeout_ms ?
-			msecs_to_jiffies(d.timeout_ms) : 0, vec, rq_flags,
-			blk_flags);
+	req = nvme_alloc_user_request(q, &c, rq_flags, blk_flags);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
+	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
+
+	if (d.addr && d.data_len) {
+		ret = nvme_map_user_request(req, nvme_to_user_ptr(d.addr),
+			d.data_len, nvme_to_user_ptr(d.metadata),
+			d.metadata_len, 0, &meta, vec);
+		if (ret)
+			goto out_err;
+	}
 	req->end_io = nvme_uring_cmd_end_io;
 	req->end_io_data = ioucmd;
 
@@ -486,6 +499,9 @@  static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 
 	blk_execute_rq_nowait(req, false);
 	return -EIOCBQUEUED;
+out_err:
+	blk_mq_free_request(req);
+	return ret;
 }
 
 static bool is_ctrl_ioctl(unsigned int cmd)