ublk: remove io_cmds list in ublk_queue

Message ID: 20250318-ublk_io_cmds-v1-1-c1bb74798fef@purestorage.com


Commit Message

Uday Shankar March 18, 2025, 6:14 p.m. UTC
The current I/O dispatch mechanism - queueing I/O by adding it to the
io_cmds list (and poking task_work as needed), then dispatching it in
ublk server task context by reversing io_cmds and completing the
io_uring command associated with each one - was introduced by commit
7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
to ensure that the ublk server received I/O in the same order that the
block layer submitted it to ublk_drv. This mechanism was only needed for
the "raw" task_work submission mechanism, since the io_uring task work
wrapper maintains FIFO ordering (using quite a similar mechanism in
fact). The "raw" task_work submission mechanism is no longer supported
in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
task_work_add"), so the explicit llist/reversal is no longer needed - it
just duplicates logic already present in the underlying io_uring APIs.
Remove it.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 46 +++++++++++-----------------------------------
 1 file changed, 11 insertions(+), 35 deletions(-)
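
To make the ordering argument above concrete, here is a minimal sketch of
the llist queue/reverse idiom that this patch deletes. This is an
editorial illustration only - the demo_* names are invented, and the real
code is visible in the hunks removed from ublk_drv.c in the diff at the
bottom of this page.

#include <linux/llist.h>

struct demo_cmd {
	struct llist_node node;
	int tag;
};

static LLIST_HEAD(demo_list);

static void demo_poke_task_work(void);	/* stand-in for io_uring_cmd_complete_in_task() */
static void demo_complete(int tag);	/* stand-in for completing the io_uring cmd */

/*
 * Submission side: llist_add() pushes LIFO and returns true only when
 * the list was empty, so task work is poked at most once per batch.
 */
static void demo_queue(struct demo_cmd *cmd)
{
	if (llist_add(&cmd->node, &demo_list))
		demo_poke_task_work();
}

/*
 * Task-work side: drain the whole list, then reverse it so completions
 * run in the order the block layer submitted them (FIFO).
 */
static void demo_dispatch(void)
{
	struct llist_node *list = llist_reverse_order(llist_del_all(&demo_list));
	struct demo_cmd *cmd, *tmp;

	llist_for_each_entry_safe(cmd, tmp, list, node)
		demo_complete(cmd->tag);
}

As the commit message notes, io_uring's task_work wrapper batches and
reverses in much the same way internally, which is why the driver-side
copy of the pattern is redundant.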


---
base-commit: 37f0c10318273bec83f373bbdad14fdc4e8ce79e
change-id: 20250318-ublk_io_cmds-a421653cefc2

Best regards,

Comments

Jens Axboe March 18, 2025, 6:22 p.m. UTC | #1
On 3/18/25 12:14 PM, Uday Shankar wrote:
> The current I/O dispatch mechanism - queueing I/O by adding it to the
> io_cmds list (and poking task_work as needed), then dispatching it in
> ublk server task context by reversing io_cmds and completing the
> io_uring command associated with each one - was introduced by commit
> 7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
> to ensure that the ublk server received I/O in the same order that the
> block layer submitted it to ublk_drv. This mechanism was only needed for
> the "raw" task_work submission mechanism, since the io_uring task work
> wrapper maintains FIFO ordering (using quite a similar mechanism in
> fact). The "raw" task_work submission mechanism is no longer supported
> in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
> task_work_add"), so the explicit llist/reversal is no longer needed - it
> just duplicates logic already present in the underlying io_uring APIs.
> Remove it.

Patch looks good to me, just one followup:

> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 2955900ee713c5d8f3cbc2a69f6f6058348e5253..82c9d3d22f0ea5a0fad3f33837fa16146b5af7a9 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -77,8 +77,6 @@
>  	 UBLK_PARAM_TYPE_DMA_ALIGN)
>  
>  struct ublk_rq_data {
> -	struct llist_node node;
> -
>  	struct kref ref;
>  };

Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
we can find an atomic_t of space in struct request and avoid it. Not a
pressing thing, just tossing it out there...
Uday Shankar March 18, 2025, 6:43 p.m. UTC | #2
On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
> >  struct ublk_rq_data {
> > -	struct llist_node node;
> > -
> >  	struct kref ref;
> >  };
> 
> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
> we can find an atomic_t of space in struct request and avoid it. Not a
> pressing thing, just tossing it out there...

Yeah probably - we do need a ref since one could complete a request
concurrently with another code path which references it (user copy and
zero copy). I see that struct request has a refcount in it already,
though I don't see any examples of drivers using it. Would it be a bad
idea to try and reuse that?
Jens Axboe March 18, 2025, 6:48 p.m. UTC | #3
On 3/18/25 12:43 PM, Uday Shankar wrote:
> On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
>>>  struct ublk_rq_data {
>>> -	struct llist_node node;
>>> -
>>>  	struct kref ref;
>>>  };
>>
>> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
>> we can find an atomic_t of space in struct request and avoid it. Not a
>> pressing thing, just tossing it out there...
> 
> Yeah probably - we do need a ref since one could complete a request
> concurrently with another code path which references it (user copy and
> zero copy). I see that struct request has a refcount in it already,

Right, at least with the current usage, we still do need that kref, or
something similar. I would've probably made it just use refcount_t
though, rather than rely on the indirect calls. kref doesn't really
bring us anything here in terms of API.

> though I don't see any examples of drivers using it. Would it be a bad
> idea to try and reuse that?

We can't reuse that one, and it's not for driver use - purely internal.
But I _think_ you could easily grab space in the union that has the hash
and ipi_list for it. And then you could dump needing this extra data per
request.
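
To make the suggestion concrete, a hypothetical sketch of what grabbing
that union space might look like - this is not an actual kernel change,
and driver_ref plus the demo_* helpers are invented names:

struct request {
	/* ... */
	union {
		struct hlist_node hash;		/* merge hash: io scheduling only */
		struct llist_node ipi_list;	/* used only on the completion side */
		refcount_t driver_ref;		/* invented: per-request driver refcount */
	};
	/* ... */
};

/* The driver could then take and drop the ref with no per-request pdu,
 * using refcount_t directly (as suggested above) instead of a kref: */
static void demo_get_ref(struct request *req)
{
	refcount_inc(&req->driver_ref);
}

static void demo_put_ref(struct request *req)
{
	if (refcount_dec_and_test(&req->driver_ref))
		demo_complete_rq(req);	/* the driver's real completion path */
}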
Uday Shankar March 18, 2025, 9:58 p.m. UTC | #4
On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
> > though I don't see any examples of drivers using it. Would it be a bad
> > idea to try and reuse that?
> 
> We can't reuse that one, and it's not for driver use - purely internal.
> But I _think_ you could easily grab space in the union that has the hash
> and ipi_list for it. And then you could dump needing this extra data per
> request.

Another idea is to union the refcount with end_io_data, since that's
purely for the driver. But it might be a bit weird to tell drivers that
they can use either end_io/end_io_data or the refcount but not both...
not to mention it could be a nice exploit vector if drivers get it
wrong.

Either way, I think this work is follow-on material and shouldn't be
rolled into this patch.
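
For concreteness, a hypothetical fragment showing how the end_io_data
overlay floated above could look - invented names, and it carries exactly
the sharp edge described: it is only safe for a driver that never installs
an ->end_io handler.

#include <linux/blk-mq.h>
#include <linux/refcount.h>

/* Reuse the driver-owned end_io_data cookie as refcount storage. */
static inline refcount_t *demo_rq_ref(struct request *req)
{
	BUILD_BUG_ON(sizeof(refcount_t) > sizeof(req->end_io_data));
	return (refcount_t *)&req->end_io_data;
}

/* Usage would look like:
 *	refcount_set(demo_rq_ref(req), 1);		on dispatch
 *	refcount_inc(demo_rq_ref(req));			per extra reference
 *	refcount_dec_and_test(demo_rq_ref(req));	on put
 */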
Ming Lei March 19, 2025, 1:04 a.m. UTC | #5
On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
> On 3/18/25 12:43 PM, Uday Shankar wrote:
> > On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
> >>>  struct ublk_rq_data {
> >>> -	struct llist_node node;
> >>> -
> >>>  	struct kref ref;
> >>>  };
> >>
> >> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
> >> we can find an atomic_t of space in struct request and avoid it. Not a
> >> pressing thing, just tossing it out there...
> > 
> > Yeah probably - we do need a ref since one could complete a request
> > concurrently with another code path which references it (user copy and
> > zero copy). I see that struct request has a refcount in it already,
> 
> Right, at least with the current usage, we still do need that kref, or
> something similar. I would've probably made it just use refcount_t
> though, rather than rely on the indirect calls. kref doesn't really
> bring us anything here in terms of API.
> 
> > though I don't see any examples of drivers using it. Would it be a bad
> > idea to try and reuse that?
> 
> We can't reuse that one, and it's not for driver use - purely internal.
> But I _think_ you could easily grab space in the union that has the hash
> and ipi_list for it. And then you could dump needing this extra data per
> request.

It should be fine to reuse request->ref, since the payload shares the same
lifetime as the request.

But if it is exported, the interface is likely to be misused...


thanks,
Ming
Jens Axboe March 19, 2025, 1:54 a.m. UTC | #6
On 3/18/25 3:58 PM, Uday Shankar wrote:
> On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
>>> though I don't see any examples of drivers using it. Would it be a bad
>>> idea to try and reuse that?
>>
>> We can't reuse that one, and it's not for driver use - purely internal.
>> But I _think_ you could easily grab space in the union that has the hash
>> and ipi_list for it. And then you could dump needing this extra data per
>> request.
> 
> Another idea is to union the refcount with end_io_data, since that's
> purely for the driver. But it might be a bit weird to tell drivers that
> they can use either end_io/end_io_data or the refcount but not both...
> not to mention it could be a nice exploit vector if drivers get it
> wrong.
> 
> Either way, I think this work is follow-on material and shouldn't be
> rolled into this patch.

For sure, that's what I said in my first reply too. It's just an idea
for a further improvement, not gating this patch at all.
Jens Axboe March 19, 2025, 1:57 a.m. UTC | #7
On 3/18/25 7:04 PM, Ming Lei wrote:
> On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
>> On 3/18/25 12:43 PM, Uday Shankar wrote:
>>> On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
>>>>>  struct ublk_rq_data {
>>>>> -	struct llist_node node;
>>>>> -
>>>>>  	struct kref ref;
>>>>>  };
>>>>
>>>> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
>>>> we can find an atomic_t of space in struct request and avoid it. Not a
>>>> pressing thing, just tossing it out there...
>>>
>>> Yeah probably - we do need a ref since one could complete a request
>>> concurrently with another code path which references it (user copy and
>>> zero copy). I see that struct request has a refcount in it already,
>>
>> Right, at least with the current usage, we still do need that kref, or
>> something similar. I would've probably made it just use refcount_t
>> though, rather than rely on the indirect calls. kref doesn't really
>> bring us anything here in terms of API.
>>
>>> though I don't see any examples of drivers using it. Would it be a bad
>>> idea to try and reuse that?
>>
>> We can't reuse that one, and it's not for driver use - purely internal.
>> But I _think_ you could easily grab space in the union that has the hash
>> and ipi_list for it. And then you could dump needing this extra data per
>> request.
> 
> It should be fine to reuse request->ref, since the payload shares the same
> lifetime as the request.
> 
> But if it is exported, the interface is likely to be misused...

Exactly, that's why I don't want to have drivers mess with this.

And following up on the location, as I forgot to do that in the reply to
Uday - the end_io_data is not a bad idea either; drivers get to use that
as they wish. Then you can't use it with an end_io handler, but it's not
like we've had a need to do that before anyway... It is a bit odd,
however.

But the ipi_list is only used on the completion side, at which point the
driver is by definition done with the ref. And hash is just for io
scheduling, which we'd never do with requests like this. So I still think
that's the best option.
Ming Lei March 19, 2025, 2:14 a.m. UTC | #8
On Tue, Mar 18, 2025 at 12:14:17PM -0600, Uday Shankar wrote:
> The current I/O dispatch mechanism - queueing I/O by adding it to the
> io_cmds list (and poking task_work as needed), then dispatching it in
> ublk server task context by reversing io_cmds and completing the
> io_uring command associated with each one - was introduced by commit
> 7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
> to ensure that the ublk server received I/O in the same order that the
> block layer submitted it to ublk_drv. This mechanism was only needed for
> the "raw" task_work submission mechanism, since the io_uring task work
> wrapper maintains FIFO ordering (using quite a similar mechanism in
> fact). The "raw" task_work submission mechanism is no longer supported
> in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
> task_work_add"), so the explicit llist/reversal is no longer needed - it
> just duplicates logic already present in the underlying io_uring APIs.
> Remove it.
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming
Jens Axboe March 19, 2025, 12:32 p.m. UTC | #9
On Tue, 18 Mar 2025 12:14:17 -0600, Uday Shankar wrote:
> The current I/O dispatch mechanism - queueing I/O by adding it to the
> io_cmds list (and poking task_work as needed), then dispatching it in
> ublk server task context by reversing io_cmds and completing the
> io_uring command associated with each one - was introduced by commit
> 7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
> to ensure that the ublk server received I/O in the same order that the
> block layer submitted it to ublk_drv. This mechanism was only needed for
> the "raw" task_work submission mechanism, since the io_uring task work
> wrapper maintains FIFO ordering (using quite a similar mechanism in
> fact). The "raw" task_work submission mechanism is no longer supported
> in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
> task_work_add"), so the explicit llist/reversal is no longer needed - it
> just duplicates logic already present in the underlying io_uring APIs.
> Remove it.
> 
> [...]

Applied, thanks!

[1/1] ublk: remove io_cmds list in ublk_queue
      commit: 989bcd623a8b0c32b76d9258767d8b37e53419e6

Best regards,

Patch

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2955900ee713c5d8f3cbc2a69f6f6058348e5253..82c9d3d22f0ea5a0fad3f33837fa16146b5af7a9 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -77,8 +77,6 @@ 
 	 UBLK_PARAM_TYPE_DMA_ALIGN)
 
 struct ublk_rq_data {
-	struct llist_node node;
-
 	struct kref ref;
 };
 
@@ -145,8 +143,6 @@  struct ublk_queue {
 	struct task_struct	*ubq_daemon;
 	char *io_cmd_buf;
 
-	struct llist_head	io_cmds;
-
 	unsigned long io_addr;	/* mapped vm address */
 	unsigned int max_io_sz;
 	bool force_abort;
@@ -1113,7 +1109,7 @@  static void ublk_complete_rq(struct kref *ref)
 }
 
 /*
- * Since __ublk_rq_task_work always fails requests immediately during
+ * Since ublk_rq_task_work_cb always fails requests immediately during
  * exiting, __ublk_fail_req() is only called from abort context during
  * exiting. So lock is unnecessary.
  *
@@ -1159,11 +1155,14 @@  static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 		blk_mq_end_request(rq, BLK_STS_IOERR);
 }
 
-static inline void __ublk_rq_task_work(struct request *req,
-				       unsigned issue_flags)
+static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd,
+				 unsigned int issue_flags)
 {
-	struct ublk_queue *ubq = req->mq_hctx->driver_data;
-	int tag = req->tag;
+	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+	struct ublk_queue *ubq = pdu->ubq;
+	int tag = pdu->tag;
+	struct request *req = blk_mq_tag_to_rq(
+		ubq->dev->tag_set.tags[ubq->q_id], tag);
 	struct ublk_io *io = &ubq->ios[tag];
 	unsigned int mapped_bytes;
 
@@ -1238,34 +1237,11 @@  static inline void __ublk_rq_task_work(struct request *req,
 	ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
 }
 
-static inline void ublk_forward_io_cmds(struct ublk_queue *ubq,
-					unsigned issue_flags)
-{
-	struct llist_node *io_cmds = llist_del_all(&ubq->io_cmds);
-	struct ublk_rq_data *data, *tmp;
-
-	io_cmds = llist_reverse_order(io_cmds);
-	llist_for_each_entry_safe(data, tmp, io_cmds, node)
-		__ublk_rq_task_work(blk_mq_rq_from_pdu(data), issue_flags);
-}
-
-static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
-{
-	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
-	struct ublk_queue *ubq = pdu->ubq;
-
-	ublk_forward_io_cmds(ubq, issue_flags);
-}
-
 static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
 {
-	struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
+	struct ublk_io *io = &ubq->ios[rq->tag];
 
-	if (llist_add(&data->node, &ubq->io_cmds)) {
-		struct ublk_io *io = &ubq->ios[rq->tag];
-
-		io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
-	}
+	io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
 }
 
 static enum blk_eh_timer_return ublk_timeout(struct request *rq)
@@ -1458,7 +1434,7 @@  static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 			struct request *rq;
 
 			/*
-			 * Either we fail the request or ublk_rq_task_work_fn
+			 * Either we fail the request or ublk_rq_task_work_cb
 			 * will do it
 			 */
 			rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);