Message ID: 20240529-fuse-uring-for-6-9-rfc2-out-v1-19-d149476b1d65@ddn.com (mailing list archive)
State: New
Series: fuse: fuse-over-io-uring
On 5/29/24 12:00 PM, Bernd Schubert wrote:
> This is to avoid using async completion tasks
> (i.e. context switches) when not needed.
>
> Cc: io-uring@vger.kernel.org
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>

This patch is very confusing, even after having pulled the other changes. In general, would be great if the io_uring list was CC'ed on the whole series, it's very hard to review just a single patch, when you don't have the full picture.

Outside of that, would be super useful to include a blurb on how you set things up for testing, and how you run the testing. That would really help in terms of being able to run and test it, and also to propose changes that might make a big difference.
On 5/31/24 18:24, Jens Axboe wrote:
> On 5/29/24 12:00 PM, Bernd Schubert wrote:
>> This is to avoid using async completion tasks
>> (i.e. context switches) when not needed.
>>
>> Cc: io-uring@vger.kernel.org
>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>
> This patch is very confusing, even after having pulled the other
> changes. In general, would be great if the io_uring list was CC'ed on

Hmm, let me try to explain. And yes, I definitely need to add these details to the commit message.

Without the patch:

<sending a struct fuse_req>

fuse_uring_queue_fuse_req
    fuse_uring_send_to_ring
        io_uring_cmd_complete_in_task

<async task runs>
    io_uring_cmd_done()

Now I would like to call io_uring_cmd_done() directly, without another task, whenever possible. I didn't benchmark it, but another task is in general against the entire concept. That is where the patch comes in: fuse_uring_queue_fuse_req() now adds the information whether io_uring_cmd_done() shall be called directly or via io_uring_cmd_complete_in_task(). Doing it directly requires knowledge of issue_flags - these are the conditions in fuse_uring_queue_fuse_req:

1) current == queue->server_task

fuse_uring_cmd (IORING_OP_URING_CMD) received a completion for a previous fuse_req; after completion it fetched the next fuse_req and wants to send it. For 'current == queue->server_task' the issue flags got stored in struct fuse_ring_queue::uring_cmd_issue_flags.

2) else if (current->io_uring)

(actually documented in the code)

2.1) This might be through IORING_OP_URING_CMD as well, but then the server side uses multiple threads to access the same ring - not nice. We only store issue_flags into the queue for 'current == queue->server_task', so we do not know issue_flags - sending through a task is needed.

2.2) This might be an application request through the mount point, through the io-uring interface. We do not know issue flags either. (That one was actually a surprise for me, when xfstests caught it. Initially I had a condition to send without the extra task, then lockdep caught that.)

In both cases it has to use a task.

My question here is if 'current->io_uring' is reliable.

3) everything else

3.1) For async requests, interesting here are cached reads and writes. At a minimum writes are holding a spin lock, and that lock conflicts with the mutex io-uring is taking - we need a task as well.

3.2) sync - no lock being held, it can send without the extra task.

> the whole series, it's very hard to review just a single patch, when you
> don't have the full picture.

Sorry, I will do that for the next version.

> Outside of that, would be super useful to include a blurb on how you set
> things up for testing, and how you run the testing. That would really
> help in terms of being able to run and test it, and also to propose
> changes that might make a big difference.

Will do in the next version. You basically need my libfuse uring branch (right now the commit history is not cleaned up) and follow the instructions in <libfuse>/xfstests/README.md on how to run xfstests. Missing is a slight patch for that dir to set extra daemon parameters, like direct-io (fuse's FOPEN_DIRECT_IO) and io-uring. Will add that to libfuse during the next days.

Thanks,
Bernd
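The three numbered cases above correspond to the branch the patch adds to fuse_uring_queue_fuse_req() (full diff at the end of this page). A condensed sketch of that decision follows; it uses the names from the patch (queue->server_task, queue->uring_cmd_issue_flags), but fuse_uring_need_task() is a made-up helper name and the code is illustrative only, not the literal kernel change:

	/*
	 * Sketch only, not the actual patch: decide whether io_uring_cmd_done()
	 * can be called directly or whether completion must be bounced to task
	 * work via io_uring_cmd_complete_in_task().
	 */
	static bool fuse_uring_need_task(struct fuse_ring_queue *queue,
					 struct fuse_req *req,
					 unsigned int *issue_flags)
	{
		if (current == queue->server_task) {
			/* case 1: ring owner, flags were stored when the
			 * IORING_OP_URING_CMD arrived - safe to complete inline */
			*issue_flags = queue->uring_cmd_issue_flags;
			return false;
		}
		if (current->io_uring) {
			/* case 2: another server thread or an io_uring-submitting
			 * application - issue_flags unknown, use a task */
			return true;
		}
		if (test_bit(FR_BACKGROUND, &req->flags)) {
			/* case 3.1: async, e.g. page-cache writeback holding a
			 * spinlock that conflicts with the io_uring mutex */
			return true;
		}
		/* case 3.2: sync, no locks held - complete inline */
		*issue_flags = IO_URING_F_UNLOCKED;
		return false;
	}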
On 5/31/24 11:36 AM, Bernd Schubert wrote:
> On 5/31/24 18:24, Jens Axboe wrote:
>> On 5/29/24 12:00 PM, Bernd Schubert wrote:
>>> This is to avoid using async completion tasks
>>> (i.e. context switches) when not needed.
>>>
>>> Cc: io-uring@vger.kernel.org
>>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>
>> This patch is very confusing, even after having pulled the other
>> changes. In general, would be great if the io_uring list was CC'ed on
>
> Hmm, let me try to explain. And yes, I definitely need to add these details
> to the commit message.
>
> Without the patch:
>
> <sending a struct fuse_req>
>
> fuse_uring_queue_fuse_req
>     fuse_uring_send_to_ring
>         io_uring_cmd_complete_in_task
>
> <async task runs>
>     io_uring_cmd_done()

And this is a worthwhile optimization, you always want to complete it inline if at all possible. But none of this logic or code belongs in fuse, it really should be provided by io_uring helpers.

I would just drop this patch for now and focus on the core functionality. Send out a version with that, and then we'll be happy to help make this as performant as it can be. This is where the ask on "how to reproduce your numbers" comes from - with that, it's usually trivial to spot areas where things could be improved. And I strongly suspect that will involve providing you with the right API to use here, and perhaps refactoring a bit on the fuse side. Making up issue_flags is _really_ not something a user should do.

> 1) (current == queue->server_task)
> fuse_uring_cmd (IORING_OP_URING_CMD) received a completion for a
> previous fuse_req, after completion it fetched the next fuse_req and
> wants to send it - for 'current == queue->server_task' issue flags
> got stored in struct fuse_ring_queue::uring_cmd_issue_flags

And queue->server_task is the owner of the ring? Then yes that is safe.

> 2) 'else if (current->io_uring)'
>
> (actually documented in the code)
>
> 2.1 This might be through IORING_OP_URING_CMD as well, but then server
> side uses multiple threads to access the same ring - not nice. We only
> store issue_flags into the queue for 'current == queue->server_task', so
> we do not know issue_flags - sending through a task is needed.

What's the path leading to you not having the issue_flags?

> 2.2 This might be an application request through the mount point, through
> the io-uring interface. We do not know issue flags either.
> (That one was actually a surprise for me, when xfstests caught it.
> Initially I had a condition to send without the extra task, then lockdep
> caught that.)

In general, if you don't know the context (eg you don't have issue_flags passed in), you should probably assume the only way to sanely proceed is to have it processed by the task itself.

> In both cases it has to use a task.
>
> My question here is if 'current->io_uring' is reliable.

Yes that will be reliable in the sense that it tells you that the current task has (at least) one io_uring context setup. But it doesn't tell you anything beyond that, like if it's the owner of this request.

> 3) everything else
>
> 3.1) For async requests, interesting here are cached reads and writes. At a minimum
> writes are holding a spin lock, and that lock conflicts with the mutex io-uring is
> taking - we need a task as well.
>
> 3.2) sync - no lock being held, it can send without the extra task.

As mentioned, let's drop this patch 19 for now. Send out what you have with instructions on how to test it, and I'll give it a spin and see what we can do about this.
>> Outside of that, would be super useful to include a blurb on how you set
>> things up for testing, and how you run the testing. That would really
>> help in terms of being able to run and test it, and also to propose
>> changes that might make a big difference.
>
> Will do in the next version.
> You basically need my libfuse uring branch
> (right now the commit history is not cleaned up) and follow the
> instructions in <libfuse>/xfstests/README.md on how to run xfstests.
> Missing is a slight patch for that dir to set extra daemon parameters,
> like direct-io (fuse's FOPEN_DIRECT_IO) and io-uring. Will add that to
> libfuse during the next days.

I'll leave the xfstests to you for now, but running some perf testing just to verify how it's being used would be useful and help improve it for sure.
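Jens's point that this logic belongs in io_uring rather than in fuse might end up looking roughly like the helper below. To be clear, io_uring_cmd_done_or_task() is a hypothetical name used only to illustrate hiding the inline-vs-task decision behind an io_uring API; no such helper exists at the time of this thread:

	/* Hypothetical helper, not an existing io_uring API: complete a
	 * uring_cmd inline when the caller has valid issue_flags, otherwise
	 * fall back to task work, so a driver never has to invent issue_flags. */
	static inline void io_uring_cmd_done_or_task(struct io_uring_cmd *cmd,
						     bool have_issue_flags,
						     unsigned int issue_flags,
						     void (*task_cb)(struct io_uring_cmd *,
								     unsigned int))
	{
		if (have_issue_flags)
			io_uring_cmd_done(cmd, 0, 0, issue_flags);
		else
			io_uring_cmd_complete_in_task(cmd, task_cb);
	}

Such a helper would still need the caller to know whether its issue_flags are valid, which is exactly the information fuse cannot reliably derive from current->io_uring alone, as discussed above.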
On 5/31/24 21:10, Jens Axboe wrote:
> On 5/31/24 11:36 AM, Bernd Schubert wrote:
>> On 5/31/24 18:24, Jens Axboe wrote:
>>> On 5/29/24 12:00 PM, Bernd Schubert wrote:
>>>> This is to avoid using async completion tasks
>>>> (i.e. context switches) when not needed.
>>>>
>>>> Cc: io-uring@vger.kernel.org
>>>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>>
>>> This patch is very confusing, even after having pulled the other
>>> changes. In general, would be great if the io_uring list was CC'ed on
>>
>> Hmm, let me try to explain. And yes, I definitely need to add these details
>> to the commit message.
>>
>> Without the patch:
>>
>> <sending a struct fuse_req>
>>
>> fuse_uring_queue_fuse_req
>>     fuse_uring_send_to_ring
>>         io_uring_cmd_complete_in_task
>>
>> <async task runs>
>>     io_uring_cmd_done()
>
> And this is a worthwhile optimization, you always want to complete it
> inline if at all possible. But none of this logic or code belongs in fuse,
> it really should be provided by io_uring helpers.
>
> I would just drop this patch for now and focus on the core
> functionality. Send out a version with that, and then we'll be happy to
> help make this as performant as it can be. This is where the ask on "how to
> reproduce your numbers" comes from - with that, it's usually trivial to
> spot areas where things could be improved. And I strongly suspect that
> will involve providing you with the right API to use here, and perhaps
> refactoring a bit on the fuse side. Making up issue_flags is _really_
> not something a user should do.

Great that you agree, I don't like the issue_flag handling in fuse code either. I will also follow your suggestion to drop this patch.

>> 1) (current == queue->server_task)
>> fuse_uring_cmd (IORING_OP_URING_CMD) received a completion for a
>> previous fuse_req, after completion it fetched the next fuse_req and
>> wants to send it - for 'current == queue->server_task' issue flags
>> got stored in struct fuse_ring_queue::uring_cmd_issue_flags
>
> And queue->server_task is the owner of the ring? Then yes that is safe.

Yeah, it is the thread that submits SQEs - it should be the owner of the ring, unless the daemon side does something wrong (given that there are several userspace implementations and not just libfuse, we need to expect and handle implementation errors, though).

>> 2) 'else if (current->io_uring)'
>>
>> (actually documented in the code)
>>
>> 2.1 This might be through IORING_OP_URING_CMD as well, but then server
>> side uses multiple threads to access the same ring - not nice. We only
>> store issue_flags into the queue for 'current == queue->server_task', so
>> we do not know issue_flags - sending through a task is needed.
>
> What's the path leading to you not having the issue_flags?

We get issue flags here, but I want to keep changes to libfuse small and want to avoid changing non-uring-related function signatures. That is the reason why we store issue_flags for the presumed ring-owner thread in the queue data structure, but we don't have them for other possible threads.

Example:

IORING_OP_URING_CMD
    fuse_uring_cmd
        fuse_uring_commit_and_release
            fuse_uring_req_end_and_get_next  --> until here issue_flags passed
                fuse_request_end  -> generic fuse function, issue_flags not passed
                    req->args->end() / fuse_writepage_end
                        fuse_simple_background
                            fuse_request_queue_background
                                fuse_request_queue_background_uring
                                    fuse_uring_queue_fuse_req
                                        fuse_uring_send_to_ring
                                            io_uring_cmd_done
I.e. we had issue_flags up to fuse_uring_req_end_and_get_next(), but then we call into generic fuse functions and stop passing through issue_flags. For the ring owner we take the issue flags stored by fuse_uring_cmd() into struct fuse_ring_queue, but if the daemon side uses multiple threads to access the ring we won't have that. Well, we could allow it and store it into an array or rb-tree, but I don't like that multiple threads access something that is optimized to have a thread per core already.

>> 2.2 This might be an application request through the mount point, through
>> the io-uring interface. We do not know issue flags either.
>> (That one was actually a surprise for me, when xfstests caught it.
>> Initially I had a condition to send without the extra task, then lockdep
>> caught that.)
>
> In general, if you don't know the context (eg you don't have issue_flags
> passed in), you should probably assume the only way to sanely proceed
> is to have it processed by the task itself.
>
>> In both cases it has to use a task.
>>
>> My question here is if 'current->io_uring' is reliable.
>
> Yes that will be reliable in the sense that it tells you that the
> current task has (at least) one io_uring context setup. But it doesn't
> tell you anything beyond that, like if it's the owner of this request.

Yeah, you can see that it just checks for current->io_uring and then uses a task.

>> 3) everything else
>>
>> 3.1) For async requests, interesting here are cached reads and writes. At a minimum
>> writes are holding a spin lock, and that lock conflicts with the mutex io-uring is
>> taking - we need a task as well.
>>
>> 3.2) sync - no lock being held, it can send without the extra task.
>
> As mentioned, let's drop this patch 19 for now. Send out what you have
> with instructions on how to test it, and I'll give it a spin and see
> what we can do about this.
>
>>> Outside of that, would be super useful to include a blurb on how you set
>>> things up for testing, and how you run the testing. That would really
>>> help in terms of being able to run and test it, and also to propose
>>> changes that might make a big difference.
>>
>> Will do in the next version.
>> You basically need my libfuse uring branch
>> (right now the commit history is not cleaned up) and follow the
>> instructions in <libfuse>/xfstests/README.md on how to run xfstests.
>> Missing is a slight patch for that dir to set extra daemon parameters,
>> like direct-io (fuse's FOPEN_DIRECT_IO) and io-uring. Will add that to
>> libfuse during the next days.
>
> I'll leave the xfstests to you for now, but running some perf testing
> just to verify how it's being used would be useful and help improve it
> for sure.

Ah, you meant performance tests. I used libfuse/example/passthrough_hp from my uring branch and then fio on top of that for reads/writes, and mdtest from the ior repo for metadata. Maybe I should upload my scripts somewhere.

Thanks,
Bernd
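For reference, the "stored only for the ring owner" arrangement Bernd describes maps to fields like the following. This is a sketch based on the names visible in the patch (server_task, uring_cmd_issue_flags), not the complete struct fuse_ring_queue definition:

	struct fuse_ring_queue {
		/* ... other members elided ... */

		/* thread that submitted FUSE_URING_REQ_FETCH for this queue;
		 * treated as the ring owner */
		struct task_struct *server_task;

		/* issue_flags passed to fuse_uring_cmd() by server_task; only
		 * valid while current == server_task, which is why any other
		 * submitter has to go through io_uring_cmd_complete_in_task() */
		unsigned int uring_cmd_issue_flags;
	};

Storing one copy per submitting thread (an array or rb-tree keyed by task) would lift the single-owner restriction, but as noted above, multiple threads sharing a per-core queue defeats the design anyway.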
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index cdc5836edb6e..74407e5e86fa 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -32,7 +32,8 @@
 #include <linux/io_uring/cmd.h>
 
 static void fuse_uring_req_end_and_get_next(struct fuse_ring_ent *ring_ent,
-					    bool set_err, int error);
+					    bool set_err, int error,
+					    unsigned int issue_flags);
 
 static void fuse_ring_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
 {
@@ -682,7 +683,9 @@ static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
  * userspace will read it
  * This is comparable with classical read(/dev/fuse)
  */
-static void fuse_uring_send_to_ring(struct fuse_ring_ent *ring_ent)
+static void fuse_uring_send_to_ring(struct fuse_ring_ent *ring_ent,
+				    unsigned int issue_flags,
+				    bool send_in_task)
 {
 	struct fuse_ring *ring = ring_ent->queue->ring;
 	struct fuse_ring_req *rreq = ring_ent->rreq;
@@ -723,13 +726,16 @@ static void fuse_uring_send_to_ring(struct fuse_ring_ent *ring_ent)
 		 __func__, ring_ent->queue->qid, ring_ent->tag,
 		 ring_ent->state, rreq->in.opcode, rreq->in.unique);
 
-	io_uring_cmd_complete_in_task(ring_ent->cmd,
-				      fuse_uring_async_send_to_ring);
+	if (send_in_task)
+		io_uring_cmd_complete_in_task(ring_ent->cmd,
+					      fuse_uring_async_send_to_ring);
+	else
+		io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
+
 	return;
 
 err:
-	fuse_uring_req_end_and_get_next(ring_ent, true, err);
+	fuse_uring_req_end_and_get_next(ring_ent, true, err, issue_flags);
 }
 
 /*
@@ -806,7 +812,8 @@ static bool fuse_uring_ent_release_and_fetch(struct fuse_ring_ent *ring_ent)
  * has lock/unlock/lock to avoid holding the lock on calling fuse_request_end
  */
 static void fuse_uring_req_end_and_get_next(struct fuse_ring_ent *ring_ent,
-					    bool set_err, int error)
+					    bool set_err, int error,
+					    unsigned int issue_flags)
 {
 	struct fuse_req *req = ring_ent->fuse_req;
 	int has_next;
@@ -822,7 +829,7 @@ static void fuse_uring_req_end_and_get_next(struct fuse_ring_ent *ring_ent,
 	has_next = fuse_uring_ent_release_and_fetch(ring_ent);
 	if (has_next) {
 		/* called within uring context - use provided flags */
-		fuse_uring_send_to_ring(ring_ent);
+		fuse_uring_send_to_ring(ring_ent, issue_flags, false);
 	}
 }
 
@@ -857,7 +864,7 @@ static void fuse_uring_commit_and_release(struct fuse_dev *fud,
 out:
 	pr_devel("%s:%d ret=%zd op=%d req-ret=%d\n", __func__, __LINE__, err,
 		 req->args->opcode, req->out.h.error);
-	fuse_uring_req_end_and_get_next(ring_ent, set_err, err);
+	fuse_uring_req_end_and_get_next(ring_ent, set_err, err, issue_flags);
 }
 
 /*
@@ -1156,10 +1163,12 @@ int fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
 	struct fuse_ring_queue *queue;
 	struct fuse_ring_ent *ring_ent = NULL;
 	int res;
-	int async = test_bit(FR_BACKGROUND, &req->flags) &&
-		    !req->args->async_blocking;
+	int async_req = test_bit(FR_BACKGROUND, &req->flags);
+	int async = async_req && !req->args->async_blocking;
 	struct list_head *ent_queue, *req_queue;
 	int qid;
+	bool send_in_task;
+	unsigned int issue_flags;
 
 	qid = fuse_uring_get_req_qid(req, ring, async);
 	queue = fuse_uring_get_queue(ring, qid);
@@ -1182,11 +1191,37 @@ int fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
 			list_first_entry(ent_queue, struct fuse_ring_ent, list);
 		list_del(&ring_ent->list);
 		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+		if (current == queue->server_task) {
+			issue_flags = queue->uring_cmd_issue_flags;
+		} else if (current->io_uring) {
+			/* There are two cases here
+			 * 1) fuse-server side uses multiple threads accessing
+			 *    the ring. We only have stored issue_flags for
+			 *    into the queue for one thread (the first one
+			 *    that submits FUSE_URING_REQ_FETCH)
+			 * 2) IO requests through io-uring, we do not have
+			 *    issue flags at all for these
+			 */
+			send_in_task = true;
+			issue_flags = 0;
+		} else {
+			if (async_req) {
+				/*
+				 * page cache writes might hold an upper
+				 * spinlock, which conflicts with the io-uring
+				 * mutex
+				 */
+				send_in_task = true;
+				issue_flags = 0;
+			} else {
+				issue_flags = IO_URING_F_UNLOCKED;
+			}
+		}
 	}
 	spin_unlock(&queue->lock);
 
 	if (ring_ent != NULL)
-		fuse_uring_send_to_ring(ring_ent);
+		fuse_uring_send_to_ring(ring_ent, issue_flags, send_in_task);
 
 	return 0;
This is to avoid using async completion tasks (i.e. context switches) when not needed.

Cc: io-uring@vger.kernel.org
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
This condition should be better verified by io-uring developers:

	} else if (current->io_uring) {
		/* There are two cases here
		 * 1) fuse-server side uses multiple threads accessing
		 *    the ring
		 * 2) IO requests through io-uring
		 */
		send_in_task = true;
		issue_flags = 0;
---
 fs/fuse/dev_uring.c | 57 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 11 deletions(-)