Message ID | 20250318-ublk_io_cmds-v1-1-c1bb74798fef@purestorage.com (mailing list archive)
---|---
State | New
Series | ublk: remove io_cmds list in ublk_queue
Context | Check | Description
---|---|---
shin/vmtest-linus-master-PR | success | PR summary
shin/vmtest-linus-master-VM_Test-0 | success | Logs for build-kernel
shin/vmtest-linus-master-VM_Test-1 | success | Logs for build-kernel
On 3/18/25 12:14 PM, Uday Shankar wrote:
> The current I/O dispatch mechanism - queueing I/O by adding it to the
> io_cmds list (and poking task_work as needed), then dispatching it in
> ublk server task context by reversing io_cmds and completing the
> io_uring command associated to each one - was introduced by commit
> 7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
> to ensure that the ublk server received I/O in the same order that the
> block layer submitted it to ublk_drv. This mechanism was only needed for
> the "raw" task_work submission mechanism, since the io_uring task work
> wrapper maintains FIFO ordering (using quite a similar mechanism in
> fact). The "raw" task_work submission mechanism is no longer supported
> in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
> task_work_add"), so the explicit llist/reversal is no longer needed - it
> just duplicates logic already present in the underlying io_uring APIs.
> Remove it.

Patch looks good to me, just one followup:

> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 2955900ee713c5d8f3cbc2a69f6f6058348e5253..82c9d3d22f0ea5a0fad3f33837fa16146b5af7a9 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -77,8 +77,6 @@
>  	 UBLK_PARAM_TYPE_DMA_ALIGN)
>
>  struct ublk_rq_data {
> -	struct llist_node node;
> -
>  	struct kref ref;
>  };

Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
we can find an atomic_t of space in struct request and avoid it. Not a
pressing thing, just tossing it out there...
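For readers less familiar with the idiom the quoted commit message describes, here is a minimal, self-contained sketch of the llist batching and reversal pattern; the `demo_*` names are illustrative, not the driver's actual code:

```c
#include <linux/llist.h>

/* One queued command; in ublk this role was played by ublk_rq_data. */
struct demo_cmd {
	struct llist_node node;
	int tag;
};

static LLIST_HEAD(demo_cmds);

/*
 * Producer side: llist_add() returns true only if the list was empty
 * beforehand, so task_work needs to be scheduled just once per batch.
 */
static bool demo_queue(struct demo_cmd *cmd)
{
	return llist_add(&cmd->node, &demo_cmds);
}

/*
 * Consumer side (task_work context): llist pushes build a LIFO chain,
 * so the chain must be reversed to recover block-layer submission order.
 */
static void demo_dispatch(void)
{
	struct llist_node *cmds = llist_del_all(&demo_cmds);
	struct demo_cmd *cmd, *tmp;

	cmds = llist_reverse_order(cmds);
	llist_for_each_entry_safe(cmd, tmp, cmds, node) {
		/* complete the io_uring command for cmd->tag here */
	}
}
```

Per the commit message, io_uring's task_work wrapper already maintains FIFO ordering with quite a similar mechanism internally, which is why the open-coded copy above became redundant.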
On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
> >  struct ublk_rq_data {
> > -	struct llist_node node;
> > -
> >  	struct kref ref;
> > };
>
> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
> we can find an atomic_t of space in struct request and avoid it. Not a
> pressing thing, just tossing it out there...

Yeah probably - we do need a ref since one could complete a request
concurrently with another code path which references it (user copy and
zero copy). I see that struct request has a refcount in it already,
though I don't see any examples of drivers using it. Would it be a bad
idea to try and reuse that?
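For context on the ref Uday mentions, the get/put pattern looks roughly like this hedged sketch; the `demo_*` helpers are illustrative rather than a verbatim excerpt, and ublk_complete_rq is the driver's kref release function visible in the diff:

```c
#include <linux/blk-mq.h>
#include <linux/kref.h>

struct ublk_rq_data {
	struct kref ref;	/* per-request payload, as in the driver */
};

/* The driver's kref release function (see ublk_complete_rq in the diff). */
void ublk_complete_rq(struct kref *ref);

/* Pin a request so a user-copy/zero-copy path can safely touch it. */
static bool demo_get_req_ref(struct request *req)
{
	struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);

	/*
	 * Fails if the last reference is already gone and the request is
	 * being completed by another path.
	 */
	return kref_get_unless_zero(&data->ref);
}

/* Drop the pin; the final put runs the completion. */
static void demo_put_req_ref(struct request *req)
{
	struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);

	kref_put(&data->ref, ublk_complete_rq);
}
```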
On 3/18/25 12:43 PM, Uday Shankar wrote:
> On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
>>> struct ublk_rq_data {
>>> -	struct llist_node node;
>>> -
>>> 	struct kref ref;
>>> };
>>
>> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
>> we can find an atomic_t of space in struct request and avoid it. Not a
>> pressing thing, just tossing it out there...
>
> Yeah probably - we do need a ref since one could complete a request
> concurrently with another code path which references it (user copy and
> zero copy). I see that struct request has a refcount in it already,

Right, at least with the current usage, we still do need that kref, or
something similar. I would've probably made it just use refcount_t
though, rather than rely on the indirect calls. kref doesn't really
bring us anything here in terms of API.

> though I don't see any examples of drivers using it. Would it be a bad
> idea to try and reuse that?

We can't reuse that one, and it's not for driver use - purely internal.
But I _think_ you could easily grab space in the union that has the hash
and ipi_list for it. And then you could dump needing this extra data per
request.
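Jens's refcount_t suggestion would look something like the sketch below - hypothetical follow-on code, not part of this patch; `demo_complete_rq` is an assumed completion helper:

```c
#include <linux/blk-mq.h>
#include <linux/refcount.h>

struct ublk_rq_data {
	refcount_t ref;		/* plain refcount_t instead of kref */
};

static void demo_complete_rq(struct request *req); /* hypothetical helper */

static void demo_put_req(struct request *req)
{
	struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);

	/* Direct call on the final put - no kref release indirection. */
	if (refcount_dec_and_test(&data->ref))
		demo_complete_rq(req);
}
```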
On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
> > though I don't see any examples of drivers using it. Would it be a bad
> > idea to try and reuse that?
>
> We can't reuse that one, and it's not for driver use - purely internal.
> But I _think_ you could easily grab space in the union that has the hash
> and ipi_list for it. And then you could dump needing this extra data per
> request.

Another idea is to union the refcount with end_io_data, since that's
purely for the driver. But it might be a bit weird to tell drivers that
they can use either end_io/end_io_data or the refcount but not both...
not to mention it could be a nice exploit vector if drivers get it
wrong.

Either way, I think this work is follow-on material and shouldn't be
rolled into this patch.
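The end_io_data idea Uday floats would amount to a layout like this purely hypothetical sketch, with struct request simplified to the two fields under discussion:

```c
#include <linux/refcount.h>

/*
 * Simplified stand-in for struct request; the real fields live in
 * include/linux/blk-mq.h. The refcount member is hypothetical.
 */
struct demo_request {
	void (*end_io)(struct demo_request *rq, int error);
	union {
		void *end_io_data;	/* traditional per-driver cookie */
		refcount_t ref;		/* hypothetical: reused as a refcount,
					 * valid only while end_io is unset */
	};
};
```

The hazard Uday points out falls straight out of the union: a driver that sets end_io_data while also using the refcount corrupts one with the other.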
On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
> On 3/18/25 12:43 PM, Uday Shankar wrote:
> > On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
> >>> struct ublk_rq_data {
> >>> -	struct llist_node node;
> >>> -
> >>> 	struct kref ref;
> >>> };
> >>
> >> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
> >> we can find an atomic_t of space in struct request and avoid it. Not a
> >> pressing thing, just tossing it out there...
> >
> > Yeah probably - we do need a ref since one could complete a request
> > concurrently with another code path which references it (user copy and
> > zero copy). I see that struct request has a refcount in it already,
>
> Right, at least with the current usage, we still do need that kref, or
> something similar. I would've probably made it just use refcount_t
> though, rather than rely on the indirect calls. kref doesn't really
> bring us anything here in terms of API.
>
> > though I don't see any examples of drivers using it. Would it be a bad
> > idea to try and reuse that?
>
> We can't reuse that one, and it's not for driver use - purely internal.
> But I _think_ you could easily grab space in the union that has the hash
> and ipi_list for it. And then you could dump needing this extra data per
> request.

It should be fine to reuse request->ref, since the payload shares the
same lifetime as the request.

But if it is exported, the interface is likely to be misused...

thanks,
Ming
On 3/18/25 3:58 PM, Uday Shankar wrote:
> On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
>>> though I don't see any examples of drivers using it. Would it be a bad
>>> idea to try and reuse that?
>>
>> We can't reuse that one, and it's not for driver use - purely internal.
>> But I _think_ you could easily grab space in the union that has the hash
>> and ipi_list for it. And then you could dump needing this extra data per
>> request.
>
> Another idea is to union the refcount with end_io_data, since that's
> purely for the driver. But it might be a bit weird to tell drivers that
> they can use either end_io/end_io_data or the refcount but not both...
> not to mention it could be a nice exploit vector if drivers get it
> wrong.
>
> Either way, I think this work is follow-on material and shouldn't be
> rolled into this patch.

For sure, that's what I said in my first reply too. It's just an idea
for a further improvement, not gating this patch at all.
On 3/18/25 7:04 PM, Ming Lei wrote:
> On Tue, Mar 18, 2025 at 12:48:44PM -0600, Jens Axboe wrote:
>> On 3/18/25 12:43 PM, Uday Shankar wrote:
>>> On Tue, Mar 18, 2025 at 12:22:57PM -0600, Jens Axboe wrote:
>>>>> struct ublk_rq_data {
>>>>> -	struct llist_node node;
>>>>> -
>>>>> 	struct kref ref;
>>>>> };
>>>>
>>>> Can we get rid of ublk_rq_data then? If it's just a ref thing, I'm sure
>>>> we can find an atomic_t of space in struct request and avoid it. Not a
>>>> pressing thing, just tossing it out there...
>>>
>>> Yeah probably - we do need a ref since one could complete a request
>>> concurrently with another code path which references it (user copy and
>>> zero copy). I see that struct request has a refcount in it already,
>>
>> Right, at least with the current usage, we still do need that kref, or
>> something similar. I would've probably made it just use refcount_t
>> though, rather than rely on the indirect calls. kref doesn't really
>> bring us anything here in terms of API.
>>
>>> though I don't see any examples of drivers using it. Would it be a bad
>>> idea to try and reuse that?
>>
>> We can't reuse that one, and it's not for driver use - purely internal.
>> But I _think_ you could easily grab space in the union that has the hash
>> and ipi_list for it. And then you could dump needing this extra data per
>> request.
>
> It should be fine to reuse request->ref, since the payload shares the
> same lifetime as the request.
>
> But if it is exported, the interface is likely to be misused...

Exactly, that's why I don't want to have drivers mess with this.

And following up on the location, as I forgot to do that in the reply to
Uday - the end_io_data is not a bad idea either, drivers get to use that
as they wish. Then you can't use it with an end_io handler, but it's not
like we've had a need to do that before anyway... It is a bit odd,
however. But the ipi_list is only used on the completion side, at which
point the driver is by definition done with the ref. And hash is just
for io scheduling, which we'd never do with requests like this. So I
still think that's the best option.
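The union Jens is referring to lives in include/linux/blk-mq.h; the sketch below paraphrases it with a hypothetical ref member added, summarizing his lifetime argument in the comments:

```c
#include <linux/llist.h>
#include <linux/refcount.h>
#include <linux/types.h>

/*
 * Simplified stand-in for struct request. hash is only used by the I/O
 * scheduler before dispatch, and ipi_list only for softirq completion
 * after the driver is done, so neither overlaps the window in which a
 * driver would hold a per-request reference.
 */
struct demo_request {
	union {
		struct hlist_node hash;		/* elevator merge hash */
		struct llist_node ipi_list;	/* softirq completion list */
		refcount_t ref;			/* hypothetical driver refcount */
	};
};
```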
On Tue, Mar 18, 2025 at 12:14:17PM -0600, Uday Shankar wrote:
> The current I/O dispatch mechanism - queueing I/O by adding it to the
> io_cmds list (and poking task_work as needed), then dispatching it in
> ublk server task context by reversing io_cmds and completing the
> io_uring command associated to each one - was introduced by commit
> 7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
> to ensure that the ublk server received I/O in the same order that the
> block layer submitted it to ublk_drv. This mechanism was only needed for
> the "raw" task_work submission mechanism, since the io_uring task work
> wrapper maintains FIFO ordering (using quite a similar mechanism in
> fact). The "raw" task_work submission mechanism is no longer supported
> in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
> task_work_add"), so the explicit llist/reversal is no longer needed - it
> just duplicates logic already present in the underlying io_uring APIs.
> Remove it.
>
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming
On Tue, 18 Mar 2025 12:14:17 -0600, Uday Shankar wrote:
> The current I/O dispatch mechanism - queueing I/O by adding it to the
> io_cmds list (and poking task_work as needed), then dispatching it in
> ublk server task context by reversing io_cmds and completing the
> io_uring command associated to each one - was introduced by commit
> 7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
> to ensure that the ublk server received I/O in the same order that the
> block layer submitted it to ublk_drv. This mechanism was only needed for
> the "raw" task_work submission mechanism, since the io_uring task work
> wrapper maintains FIFO ordering (using quite a similar mechanism in
> fact). The "raw" task_work submission mechanism is no longer supported
> in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
> task_work_add"), so the explicit llist/reversal is no longer needed - it
> just duplicates logic already present in the underlying io_uring APIs.
> Remove it.
>
> [...]

Applied, thanks!

[1/1] ublk: remove io_cmds list in ublk_queue
      commit: 989bcd623a8b0c32b76d9258767d8b37e53419e6

Best regards,
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2955900ee713c5d8f3cbc2a69f6f6058348e5253..82c9d3d22f0ea5a0fad3f33837fa16146b5af7a9 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -77,8 +77,6 @@
 	 UBLK_PARAM_TYPE_DMA_ALIGN)
 
 struct ublk_rq_data {
-	struct llist_node node;
-
 	struct kref ref;
 };
 
@@ -145,8 +143,6 @@ struct ublk_queue {
 	struct task_struct	*ubq_daemon;
 	char *io_cmd_buf;
 
-	struct llist_head	io_cmds;
-
 	unsigned long io_addr;	/* mapped vm address */
 	unsigned int max_io_sz;
 	bool force_abort;
@@ -1113,7 +1109,7 @@ static void ublk_complete_rq(struct kref *ref)
 }
 
 /*
- * Since __ublk_rq_task_work always fails requests immediately during
+ * Since ublk_rq_task_work_cb always fails requests immediately during
  * exiting, __ublk_fail_req() is only called from abort context during
  * exiting. So lock is unnecessary.
  *
@@ -1159,11 +1155,14 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 	blk_mq_end_request(rq, BLK_STS_IOERR);
 }
 
-static inline void __ublk_rq_task_work(struct request *req,
-				       unsigned issue_flags)
+static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd,
+				 unsigned int issue_flags)
 {
-	struct ublk_queue *ubq = req->mq_hctx->driver_data;
-	int tag = req->tag;
+	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+	struct ublk_queue *ubq = pdu->ubq;
+	int tag = pdu->tag;
+	struct request *req = blk_mq_tag_to_rq(
+		ubq->dev->tag_set.tags[ubq->q_id], tag);
 	struct ublk_io *io = &ubq->ios[tag];
 	unsigned int mapped_bytes;
 
@@ -1238,34 +1237,11 @@ static inline void __ublk_rq_task_work(struct request *req,
 	ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
 }
 
-static inline void ublk_forward_io_cmds(struct ublk_queue *ubq,
-					unsigned issue_flags)
-{
-	struct llist_node *io_cmds = llist_del_all(&ubq->io_cmds);
-	struct ublk_rq_data *data, *tmp;
-
-	io_cmds = llist_reverse_order(io_cmds);
-	llist_for_each_entry_safe(data, tmp, io_cmds, node)
-		__ublk_rq_task_work(blk_mq_rq_from_pdu(data), issue_flags);
-}
-
-static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
-{
-	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
-	struct ublk_queue *ubq = pdu->ubq;
-
-	ublk_forward_io_cmds(ubq, issue_flags);
-}
-
 static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
 {
-	struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
+	struct ublk_io *io = &ubq->ios[rq->tag];
 
-	if (llist_add(&data->node, &ubq->io_cmds)) {
-		struct ublk_io *io = &ubq->ios[rq->tag];
-
-		io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
-	}
+	io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
 }
 
 static enum blk_eh_timer_return ublk_timeout(struct request *rq)
@@ -1458,7 +1434,7 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 		struct request *rq;
 
 		/*
-		 * Either we fail the request or ublk_rq_task_work_fn
+		 * Either we fail the request or ublk_rq_task_work_cb
 		 * will do it
 		 */
 		rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
The current I/O dispatch mechanism - queueing I/O by adding it to the
io_cmds list (and poking task_work as needed), then dispatching it in
ublk server task context by reversing io_cmds and completing the
io_uring command associated to each one - was introduced by commit
7d4a93176e014 ("ublk_drv: don't forward io commands in reserve order")
to ensure that the ublk server received I/O in the same order that the
block layer submitted it to ublk_drv. This mechanism was only needed for
the "raw" task_work submission mechanism, since the io_uring task work
wrapper maintains FIFO ordering (using quite a similar mechanism in
fact). The "raw" task_work submission mechanism is no longer supported
in ublk_drv as of commit 29dc5d06613f2 ("ublk: kill queuing request by
task_work_add"), so the explicit llist/reversal is no longer needed - it
just duplicates logic already present in the underlying io_uring APIs.
Remove it.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 46 +++++++++++-----------------------------------
 1 file changed, 11 insertions(+), 35 deletions(-)
---
base-commit: 37f0c10318273bec83f373bbdad14fdc4e8ce79e
change-id: 20250318-ublk_io_cmds-a421653cefc2

Best regards,