[for-next,5/7] io_uring: remove ->flush_cqes optimisation

Message ID 692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com (mailing list archive)
State New
Series: cqe posting cleanups

Commit Message

Pavel Begunkov June 19, 2022, 11:26 a.m. UTC
It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
the ->flush_cqes flag prevents completions from being flushed. Sometimes
it's a high level of concurrency that enables it for at least one CQE,
and sometimes it doesn't save much because nobody is waiting on the CQ.

Remove the ->flush_cqes flag and the optimisation; it should benefit the
normal use case. Note that there is no spurious eventfd problem with
that, as the checks for spuriousness were incorporated into
io_eventfd_signal().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 23 ++++++++++-------------
 io_uring/io_uring.h |  2 --
 2 files changed, 10 insertions(+), 15 deletions(-)

Comments

Jens Axboe June 19, 2022, 1:31 p.m. UTC | #1
On 6/19/22 5:26 AM, Pavel Begunkov wrote:
> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
> the ->flush_cqes flag prevents completions from being flushed. Sometimes
> it's a high level of concurrency that enables it for at least one CQE,
> and sometimes it doesn't save much because nobody is waiting on the CQ.
>
> Remove the ->flush_cqes flag and the optimisation; it should benefit the
> normal use case. Note that there is no spurious eventfd problem with
> that, as the checks for spuriousness were incorporated into
> io_eventfd_signal().

Would be nice to quantify, which should be pretty easy. Eg run a nop
workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
it to the extreme, and I do think it'd be nice to have an understanding
of how big the gap could potentially be.

With luck, it doesn't really matter. Always nice to kill stuff like
this, if it isn't that impactful.
Pavel Begunkov June 19, 2022, 2:52 p.m. UTC | #2
On 6/19/22 14:31, Jens Axboe wrote:
> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
>> the ->flush_cqes flag prevents completions from being flushed. Sometimes
>> it's a high level of concurrency that enables it for at least one CQE,
>> and sometimes it doesn't save much because nobody is waiting on the CQ.
>>
>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>> normal use case. Note that there is no spurious eventfd problem with
>> that, as the checks for spuriousness were incorporated into
>> io_eventfd_signal().
> 
> Would be nice to quantify, which should be pretty easy. Eg run a nop
> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
> it to the extreme, and I do think it'd be nice to have an understanding
> of how big the gap could potentially be.
> 
> With luck, it doesn't really matter. Always nice to kill stuff like
> this, if it isn't that impactful.

Trying without this patch nops32 (submit 32 nops, complete all, repeat).

1) all CQE_SKIP:
	~51 Mreqs/s
2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
	~49 Mreq/s
3) same as 2) but another task waits on CQ (so we call wake_up_all)
	~36 Mreq/s

And that's more or less expected. What is more interesting to me
is how often, for those using CQE_SKIP, it helps to avoid this
ev_posted()/etc. They obviously can't just mark all requests
with it, and it's most probably helping only some quite niche cases.
Jens Axboe June 19, 2022, 3:52 p.m. UTC | #3
On 6/19/22 8:52 AM, Pavel Begunkov wrote:
> On 6/19/22 14:31, Jens Axboe wrote:
>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
>>> the ->flush_cqes flag prevents completions from being flushed. Sometimes
>>> it's a high level of concurrency that enables it for at least one CQE,
>>> and sometimes it doesn't save much because nobody is waiting on the CQ.
>>>
>>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>>> normal use case. Note that there is no spurious eventfd problem with
>>> that, as the checks for spuriousness were incorporated into
>>> io_eventfd_signal().
>>
>> Would be nice to quantify, which should be pretty easy. Eg run a nop
>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>> it to the extreme, and I do think it'd be nice to have an understanding
>> of how big the gap could potentially be.
>>
>> With luck, it doesn't really matter. Always nice to kill stuff like
>> this, if it isn't that impactful.
> 
> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
> 
> 1) all CQE_SKIP:
>     ~51 Mreqs/s
> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>     ~49 Mreq/s
> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>     ~36 Mreq/s
> 
> And that's more or less expected. What is more interesting to me
> is how often, for those using CQE_SKIP, it helps to avoid this
> ev_posted()/etc. They obviously can't just mark all requests
> with it, and it's most probably helping only some quite niche cases.

That's not too bad. But I think we disagree on CQE_SKIP being niche,
there are several standard cases where it makes sense. Provide buffers
is one, though that one we have a better solution for now. But also eg
OP_CLOSE is something that I'd personally use CQE_SKIP with always.

Hence I don't think it's fair or reasonable to call it "quite niche" in
terms of general usability.

But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
as we'll likely see more broad appeal from that.
Pavel Begunkov June 19, 2022, 4:15 p.m. UTC | #4
On 6/19/22 16:52, Jens Axboe wrote:
> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>> On 6/19/22 14:31, Jens Axboe wrote:
>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
>>>> the ->flush_cqes flag prevents completions from being flushed. Sometimes
>>>> it's a high level of concurrency that enables it for at least one CQE,
>>>> and sometimes it doesn't save much because nobody is waiting on the CQ.
>>>>
>>>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>>>> normal use case. Note that there is no spurious eventfd problem with
>>>> that, as the checks for spuriousness were incorporated into
>>>> io_eventfd_signal().
>>>
>>> Would be nice to quantify, which should be pretty easy. Eg run a nop
>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>> it to the extreme, and I do think it'd be nice to have an understanding
>>> of how big the gap could potentially be.
>>>
>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>> this, if it isn't that impactful.
>>
>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>
>> 1) all CQE_SKIP:
>>      ~51 Mreqs/s
>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>      ~49 Mreq/s
>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>      ~36 Mreq/s
>>
>> And that's more or less expected. What is more interesting to me
>> is how often, for those using CQE_SKIP, it helps to avoid this
>> ev_posted()/etc. They obviously can't just mark all requests
>> with it, and it's most probably helping only some quite niche cases.
> 
> That's not too bad. But I think we disagree on CQE_SKIP being niche,

I wasn't talking about CQE_SKIP but rather about cases where
->flush_cqes actually does anything. Consider that when at least
one of the requests queued for inline completion is not CQE_SKIP,
->flush_cqes is effectively disabled.

> there are several standard cases where it makes sense. Provide buffers
> is one, though that one we have a better solution for now. But also eg
> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
> 
> Hence I don't think it's fair or reasonable to call it "quite niche" in
> terms of general usability.
> 
> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
> as we'll likely see more broad appeal from that.

It neither conflicts with the SINGLE_ISSUER locking optimisations
nor with the mentioned mb() optimisation. So, if there is a good
reason to leave ->flush_cqes alone, we can drop the patch.
Jens Axboe June 19, 2022, 4:17 p.m. UTC | #5
On 6/19/22 10:15 AM, Pavel Begunkov wrote:
> On 6/19/22 16:52, Jens Axboe wrote:
>> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>>> On 6/19/22 14:31, Jens Axboe wrote:
>>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>>> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
>>>>> the ->flush_cqes flag prevents completions from being flushed. Sometimes
>>>>> it's a high level of concurrency that enables it for at least one CQE,
>>>>> and sometimes it doesn't save much because nobody is waiting on the CQ.
>>>>>
>>>>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>>>>> normal use case. Note that there is no spurious eventfd problem with
>>>>> that, as the checks for spuriousness were incorporated into
>>>>> io_eventfd_signal().
>>>>
>>>> Would be nice to quantify, which should be pretty easy. Eg run a nop
>>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>>> it to the extreme, and I do think it'd be nice to have an understanding
>>>> of how big the gap could potentially be.
>>>>
>>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>>> this, if it isn't that impactful.
>>>
>>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>>
>>> 1) all CQE_SKIP:
>>>      ~51 Mreqs/s
>>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>>      ~49 Mreq/s
>>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>>      ~36 Mreq/s
>>>
>>> And that's more or less expected. What is more interesting to me
>>> is how often, for those using CQE_SKIP, it helps to avoid this
>>> ev_posted()/etc. They obviously can't just mark all requests
>>> with it, and it's most probably helping only some quite niche cases.
>>
>> That's not too bad. But I think we disagree on CQE_SKIP being niche,
> 
> I wasn't talking about CQE_SKIP but rather about cases where
> ->flush_cqes actually does anything. Consider that when at least
> one of the requests queued for inline completion is not CQE_SKIP,
> ->flush_cqes is effectively disabled.
> 
>> there are several standard cases where it makes sense. Provide buffers
>> is one, though that one we have a better solution for now. But also eg
>> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>>
>> Hence I don't think it's fair or reasonable to call it "quite niche" in
>> terms of general usability.
>>
>> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
>> as we'll likely see more broad appeal from that.
> 
> It neither conflicts with the SINGLE_ISSUER locking optimisations
> nor with the mentioned mb() optimisation. So, if there is a good
> reason to leave ->flush_cqes alone, we can drop the patch.

Let me flip that around - is there a good reason NOT to leave the
optimization in there then?
Pavel Begunkov June 19, 2022, 4:19 p.m. UTC | #6
On 6/19/22 17:17, Jens Axboe wrote:
> On 6/19/22 10:15 AM, Pavel Begunkov wrote:
>> On 6/19/22 16:52, Jens Axboe wrote:
>>> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>>>> On 6/19/22 14:31, Jens Axboe wrote:
>>>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>>>> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
>>>>>> the ->flush_cqes flag prevents completions from being flushed. Sometimes
>>>>>> it's a high level of concurrency that enables it for at least one CQE,
>>>>>> and sometimes it doesn't save much because nobody is waiting on the CQ.
>>>>>>
>>>>>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>>>>>> normal use case. Note that there is no spurious eventfd problem with
>>>>>> that, as the checks for spuriousness were incorporated into
>>>>>> io_eventfd_signal().
>>>>>
>>>>> Would be nice to quantify, which should be pretty easy. Eg run a nop
>>>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>>>> it to the extreme, and I do think it'd be nice to have an understanding
>>>>> of how big the gap could potentially be.
>>>>>
>>>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>>>> this, if it isn't that impactful.
>>>>
>>>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>>>
>>>> 1) all CQE_SKIP:
>>>>       ~51 Mreqs/s
>>>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>>>       ~49 Mreq/s
>>>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>>>       ~36 Mreq/s
>>>>
>>>> And that's more or less expected. What is more interesting to me
>>>> is how often, for those using CQE_SKIP, it helps to avoid this
>>>> ev_posted()/etc. They obviously can't just mark all requests
>>>> with it, and it's most probably helping only some quite niche cases.
>>>
>>> That's not too bad. But I think we disagree on CQE_SKIP being niche,
>>
>> I wasn't talking about CQE_SKIP but rather about cases where
>> ->flush_cqes actually does anything. Consider that when at least
>> one of the requests queued for inline completion is not CQE_SKIP,
>> ->flush_cqes is effectively disabled.
>>
>>> there are several standard cases where it makes sense. Provide buffers
>>> is one, though that one we have a better solution for now. But also eg
>>> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>>>
>>> Hence I don't think it's fair or reasonable to call it "quite niche" in
>>> terms of general usability.
>>>
>>> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
>>> as we'll likely see more broad appeal from that.
>>
>> It neither conflicts with the SINGLE_ISSUER locking optimisations
>> nor with the mentioned mb() optimisation. So, if there is a good
>> reason to leave ->flush_cqes alone, we can drop the patch.
> 
> Let me flip that around - is there a good reason NOT to leave the
> optimization in there then?

Apart from having ifs in the hot path with no understanding of whether
they help anything, no
Jens Axboe June 19, 2022, 4:38 p.m. UTC | #7
On 6/19/22 10:19 AM, Pavel Begunkov wrote:
> On 6/19/22 17:17, Jens Axboe wrote:
>> On 6/19/22 10:15 AM, Pavel Begunkov wrote:
>>> On 6/19/22 16:52, Jens Axboe wrote:
>>>> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>>>>> On 6/19/22 14:31, Jens Axboe wrote:
>>>>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>>>>> It's not clear how widely IOSQE_CQE_SKIP_SUCCESS is used, and how often
>>>>>>> the ->flush_cqes flag prevents completions from being flushed. Sometimes
>>>>>>> it's a high level of concurrency that enables it for at least one CQE,
>>>>>>> and sometimes it doesn't save much because nobody is waiting on the CQ.
>>>>>>>
>>>>>>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>>>>>>> normal use case. Note that there is no spurious eventfd problem with
>>>>>>> that, as the checks for spuriousness were incorporated into
>>>>>>> io_eventfd_signal().
>>>>>>
>>>>>> Would be nice to quantify, which should be pretty easy. Eg run a nop
>>>>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>>>>> it to the extreme, and I do think it'd be nice to have an understanding
>>>>>> of how big the gap could potentially be.
>>>>>>
>>>>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>>>>> this, if it isn't that impactful.
>>>>>
>>>>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>>>>
>>>>> 1) all CQE_SKIP:
>>>>>       ~51 Mreqs/s
>>>>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>>>>       ~49 Mreq/s
>>>>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>>>>       ~36 Mreq/s
>>>>>
>>>>> And that's more or less expected. What is more interesting to me
>>>>> is how often, for those using CQE_SKIP, it helps to avoid this
>>>>> ev_posted()/etc. They obviously can't just mark all requests
>>>>> with it, and it's most probably helping only some quite niche cases.
>>>>
>>>> That's not too bad. But I think we disagree on CQE_SKIP being niche,
>>>
>>> I wasn't talking about CQE_SKIP but rather about cases where
>>> ->flush_cqes actually does anything. Consider that when at least
>>> one of the requests queued for inline completion is not CQE_SKIP,
>>> ->flush_cqes is effectively disabled.
>>>
>>>> there are several standard cases where it makes sense. Provide buffers
>>>> is one, though that one we have a better solution for now. But also eg
>>>> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>>>>
>>>> Hence I don't think it's fair or reasonable to call it "quite niche" in
>>>> terms of general usability.
>>>>
>>>> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
>>>> as we'll likely see more broad appeal from that.
>>>
>>> It neither conflicts with the SINGLE_ISSUER locking optimisations
>>> nor with the mentioned mb() optimisation. So, if there is a good
>>> reason to leave ->flush_cqes alone, we can drop the patch.
>>
>> Let me flip that around - is there a good reason NOT to leave the
>> optimization in there then?
> 
> Apart from having ifs in the hot path with no understanding of whether
> they help anything, no

Let's just keep the patch. Ratio of skip to non-skip should still be
very tiny.

Patch

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0875cc649e23..57aef092ef38 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1253,22 +1253,19 @@  static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	struct io_wq_work_node *node, *prev;
 	struct io_submit_state *state = &ctx->submit_state;
 
-	if (state->flush_cqes) {
-		spin_lock(&ctx->completion_lock);
-		wq_list_for_each(node, prev, &state->compl_reqs) {
-			struct io_kiocb *req = container_of(node, struct io_kiocb,
-						    comp_list);
-
-			if (!(req->flags & REQ_F_CQE_SKIP))
-				__io_fill_cqe_req(ctx, req);
-		}
+	spin_lock(&ctx->completion_lock);
+	wq_list_for_each(node, prev, &state->compl_reqs) {
+		struct io_kiocb *req = container_of(node, struct io_kiocb,
+					    comp_list);
 
-		io_commit_cqring(ctx);
-		spin_unlock(&ctx->completion_lock);
-		io_cqring_ev_posted(ctx);
-		state->flush_cqes = false;
+		if (!(req->flags & REQ_F_CQE_SKIP))
+			__io_fill_cqe_req(ctx, req);
 	}
 
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+
 	io_free_batch_list(ctx, state->compl_reqs.first);
 	INIT_WQ_LIST(&state->compl_reqs);
 }
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 906749fa3415..7feef8c36db7 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -232,8 +232,6 @@  static inline void io_req_complete_defer(struct io_kiocb *req)
 
 	lockdep_assert_held(&req->ctx->uring_lock);
 
-	if (!(req->flags & REQ_F_CQE_SKIP))
-		state->flush_cqes = true;
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }