Message ID | 692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com (mailing list archive)
---|---
State | New
Series | cqe posting cleanups
On 6/19/22 5:26 AM, Pavel Begunkov wrote:
> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
> high level of concurrency that enables it at least for one CQE, but
> sometimes it doesn't save much because nobody waiting on the CQ.
>
> Remove ->flush_cqes flag and the optimisation, it should benefit the
> normal use case. Note, that there is no spurious eventfd problem with
> that as checks for spuriousness were incorporated into
> io_eventfd_signal().

Would be nice to quantify, which should be pretty easy. Eg run a nop
workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take it
to the extreme, and I do think it'd be nice to have an understanding of
how big the gap could potentially be.

With luck, it doesn't really matter. Always nice to kill stuff like this,
if it isn't that impactful.
On 6/19/22 14:31, Jens Axboe wrote:
> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>> [...]
>
> Would be nice to quantify, which should be pretty easy. Eg run a nop
> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
> it to the extreme, and I do think it'd be nice to have an understanding
> of how big the gap could potentially be.
>
> With luck, it doesn't really matter. Always nice to kill stuff like
> this, if it isn't that impactful.

Trying without this patch, nops32 (submit 32 nops, complete all, repeat):

1) all CQE_SKIP:
   ~51 Mreqs/s
2) all CQE_SKIP but the last, so it triggers locking + *ev_posted():
   ~49 Mreqs/s
3) same as 2) but another task waits on the CQ (so we call wake_up_all()):
   ~36 Mreqs/s

And that's more or less expected. What is more interesting to me is how
often, for those using CQE_SKIP, it helps to avoid this ev_posted()/etc.
They obviously can't just mark all requests with it, so it most probably
helps only in some quite niche cases.
On 6/19/22 8:52 AM, Pavel Begunkov wrote:
> On 6/19/22 14:31, Jens Axboe wrote:
>> [...]
>
> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>
> 1) all CQE_SKIP:
> ~51 Mreqs/s
> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
> ~49 Mreq/s
> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
> ~36 Mreq/s
>
> And that's more or less expected. What is more interesting for me
> is how often for those using CQE_SKIP it helps to avoid this
> ev_posted()/etc. They obviously can't just mark all requests
> with it, and most probably helping only some quite niche cases.

That's not too bad. But I think we disagree on CQE_SKIP being niche;
there are several standard cases where it makes sense. Provide buffers is
one, though we now have a better solution for that one. But also eg
OP_CLOSE is something that I'd personally always use CQE_SKIP with.

Hence I don't think it's fair or reasonable to call it "quite niche" in
terms of general usability.

But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
as we'll likely see more broad appeal from that.
On 6/19/22 16:52, Jens Axboe wrote:
> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>> [...]
>>
>> And that's more or less expected. What is more interesting for me
>> is how often for those using CQE_SKIP it helps to avoid this
>> ev_posted()/etc. They obviously can't just mark all requests
>> with it, and most probably helping only some quite niche cases.
>
> That's not too bad. But I think we disagree on CQE_SKIP being niche,
> there are several standard cases where it makes sense. Provide buffers
> is one, though that one we have a better solution for now. But also eg
> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>
> Hence I don't think it's fair or reasonable to call it "quite niche" in
> terms of general usability.
>
> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
> as we'll likely see more broad appeal from that.

I wasn't talking about CQE_SKIP but rather about the cases where
->flush_cqes actually does anything. Consider that when at least one of
the requests queued for inline completion is not CQE_SKIP, ->flush_cqes
is effectively disabled.

It neither conflicts with the SINGLE_ISSUER locking optimisations nor
with the mentioned mb() optimisation. So, if there is a good reason to
leave ->flush_cqes alone, we can drop the patch.
On 6/19/22 10:15 AM, Pavel Begunkov wrote:
> On 6/19/22 16:52, Jens Axboe wrote:
>> [...]
>
> I wasn't talking about CQE_SKIP but rather cases where that
> ->flush_cqes actually does anything. Consider that when at least
> one of the requests queued for inline completion is not CQE_SKIP
> ->flush_cqes is effectively disabled.
>
> [...]
>
> It neither conflicts with the SINGLE_ISSUER locking optimisations
> nor with the meantioned mb() optimisation. So, if there is a good
> reason to leave ->flush_cqes alone we can drop the patch.

Let me flip that around - is there a good reason NOT to leave the
optimization in there then?
On 6/19/22 17:17, Jens Axboe wrote:
> On 6/19/22 10:15 AM, Pavel Begunkov wrote:
>> [...]
>>
>> It neither conflicts with the SINGLE_ISSUER locking optimisations
>> nor with the meantioned mb() optimisation. So, if there is a good
>> reason to leave ->flush_cqes alone we can drop the patch.
>
> Let me flip that around - is there a good reason NOT to leave the
> optimization in there then?

Apart from the ifs in the hot path, with no understanding of whether they
help anything - no.
On 6/19/22 10:19 AM, Pavel Begunkov wrote:
> On 6/19/22 17:17, Jens Axboe wrote:
>> [...]
>>
>> Let me flip that around - is there a good reason NOT to leave the
>> optimization in there then?
>
> Apart from ifs in the hot path with no understanding whether
> it helps anything, no

Let's just keep the patch. The ratio of skip to non-skip should still be
very tiny.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0875cc649e23..57aef092ef38 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1253,22 +1253,19 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	struct io_wq_work_node *node, *prev;
 	struct io_submit_state *state = &ctx->submit_state;
 
-	if (state->flush_cqes) {
-		spin_lock(&ctx->completion_lock);
-		wq_list_for_each(node, prev, &state->compl_reqs) {
-			struct io_kiocb *req = container_of(node, struct io_kiocb,
-							    comp_list);
-
-			if (!(req->flags & REQ_F_CQE_SKIP))
-				__io_fill_cqe_req(ctx, req);
-		}
+	spin_lock(&ctx->completion_lock);
+	wq_list_for_each(node, prev, &state->compl_reqs) {
+		struct io_kiocb *req = container_of(node, struct io_kiocb,
+						    comp_list);
 
-		io_commit_cqring(ctx);
-		spin_unlock(&ctx->completion_lock);
-		io_cqring_ev_posted(ctx);
-		state->flush_cqes = false;
+		if (!(req->flags & REQ_F_CQE_SKIP))
+			__io_fill_cqe_req(ctx, req);
 	}
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+
 	io_free_batch_list(ctx, state->compl_reqs.first);
 	INIT_WQ_LIST(&state->compl_reqs);
 }
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 906749fa3415..7feef8c36db7 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -232,8 +232,6 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
 
 	lockdep_assert_held(&req->ctx->uring_lock);
 
-	if (!(req->flags & REQ_F_CQE_SKIP))
-		state->flush_cqes = true;
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }
It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
the ->flush_cqes flag prevents completions from being flushed. Sometimes
a high level of concurrency enables it for at least one CQE, but
sometimes it doesn't save much because nobody is waiting on the CQ.

Remove the ->flush_cqes flag and the optimisation; it should benefit the
normal use case. Note that there is no spurious eventfd problem with
that, as checks for spuriousness were incorporated into
io_eventfd_signal().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 23 ++++++++++-------------
 io_uring/io_uring.h |  2 --
 2 files changed, 10 insertions(+), 15 deletions(-)