Message ID | 692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com (mailing list archive)
---|---
State | New
Series | cqe posting cleanups
On 6/19/22 5:26 AM, Pavel Begunkov wrote:
> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
> high level of concurrency that enables it at least for one CQE, but
> sometimes it doesn't save much because nobody waiting on the CQ.
>
> Remove ->flush_cqes flag and the optimisation, it should benefit the
> normal use case. Note, that there is no spurious eventfd problem with
> that as checks for spuriousness were incorporated into
> io_eventfd_signal().

Would be nice to quantify, which should be pretty easy. Eg run a nop
workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take it
to the extreme, and I do think it'd be nice to have an understanding of
how big the gap could potentially be.

With luck, it doesn't really matter. Always nice to kill stuff like this,
if it isn't that impactful.
On 6/19/22 14:31, Jens Axboe wrote:
> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>> [...]
>
> Would be nice to quantify, which should be pretty easy. Eg run a nop
> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
> it to the extreme, and I do think it'd be nice to have an understanding
> of how big the gap could potentially be.
>
> With luck, it doesn't really matter. Always nice to kill stuff like
> this, if it isn't that impactful.

Trying without this patch, nops32 (submit 32 nops, complete all, repeat):

1) all CQE_SKIP:
   ~51 Mreqs/s
2) all CQE_SKIP but the last, so it triggers locking + *ev_posted():
   ~49 Mreqs/s
3) same as 2) but another task waits on the CQ (so we call wake_up_all()):
   ~36 Mreqs/s

And that's more or less expected. What is more interesting to me is how
often, for those using CQE_SKIP, it helps to avoid this ev_posted()/etc.
They obviously can't just mark all requests with it, so it most probably
helps only in some quite niche cases.
On 6/19/22 8:52 AM, Pavel Begunkov wrote:
> On 6/19/22 14:31, Jens Axboe wrote:
>> [...]
>
> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>
> 1) all CQE_SKIP:
> ~51 Mreqs/s
> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
> ~49 Mreq/s
> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
> ~36 Mreq/s
>
> And that's more or less expected. What is more interesting for me
> is how often for those using CQE_SKIP it helps to avoid this
> ev_posted()/etc. They obviously can't just mark all requests
> with it, and most probably helping only some quite niche cases.

That's not too bad. But I think we disagree on CQE_SKIP being niche;
there are several standard cases where it makes sense. Provide buffers is
one, though we now have a better solution for that one. But also eg
OP_CLOSE is something that I'd personally always use CQE_SKIP with.

Hence I don't think it's fair or reasonable to call it "quite niche" in
terms of general usability.

But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
as we'll likely see more broad appeal from that.
On 6/19/22 16:52, Jens Axboe wrote:
> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>> [...]
>>
>> And that's more or less expected. What is more interesting for me
>> is how often for those using CQE_SKIP it helps to avoid this
>> ev_posted()/etc. They obviously can't just mark all requests
>> with it, and most probably helping only some quite niche cases.
>
> That's not too bad. But I think we disagree on CQE_SKIP being niche,
> there are several standard cases where it makes sense. Provide buffers
> is one, though that one we have a better solution for now. But also eg
> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>
> Hence I don't think it's fair or reasonable to call it "quite niche" in
> terms of general usability.
>
> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
> as we'll likely see more broad appeal from that.

I wasn't talking about CQE_SKIP but rather about the cases where
->flush_cqes actually does anything. Consider that when at least one of
the requests queued for inline completion is not CQE_SKIP, ->flush_cqes
is effectively disabled.

It neither conflicts with the SINGLE_ISSUER locking optimisations nor
with the mentioned mb() optimisation. So, if there is a good reason to
leave ->flush_cqes alone, we can drop the patch.
On 6/19/22 10:15 AM, Pavel Begunkov wrote:
> On 6/19/22 16:52, Jens Axboe wrote:
>> [...]
>
> I wasn't talking about CQE_SKIP but rather cases where that
> ->flush_cqes actually does anything. Consider that when at least
> one of the requests queued for inline completion is not CQE_SKIP
> ->flush_cqes is effectively disabled.
>
> [...]
>
> It neither conflicts with the SINGLE_ISSUER locking optimisations
> nor with the meantioned mb() optimisation. So, if there is a good
> reason to leave ->flush_cqes alone we can drop the patch.

Let me flip that around - is there a good reason NOT to leave the
optimization in there then?
On 6/19/22 17:17, Jens Axboe wrote:
> On 6/19/22 10:15 AM, Pavel Begunkov wrote:
>> [...]
>>
>> It neither conflicts with the SINGLE_ISSUER locking optimisations
>> nor with the meantioned mb() optimisation. So, if there is a good
>> reason to leave ->flush_cqes alone we can drop the patch.
>
> Let me flip that around - is there a good reason NOT to leave the
> optimization in there then?

Apart from the ifs in the hot path, with no understanding of whether they
help anything - no.
On 6/19/22 10:19 AM, Pavel Begunkov wrote:
> On 6/19/22 17:17, Jens Axboe wrote:
>> [...]
>>
>> Let me flip that around - is there a good reason NOT to leave the
>> optimization in there then?
>
> Apart from ifs in the hot path with no understanding whether
> it helps anything, no

Let's just keep the patch. The ratio of skip to non-skip should still be
very tiny.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0875cc649e23..57aef092ef38 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1253,22 +1253,19 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	struct io_wq_work_node *node, *prev;
 	struct io_submit_state *state = &ctx->submit_state;
 
-	if (state->flush_cqes) {
-		spin_lock(&ctx->completion_lock);
-		wq_list_for_each(node, prev, &state->compl_reqs) {
-			struct io_kiocb *req = container_of(node, struct io_kiocb,
-							    comp_list);
-
-			if (!(req->flags & REQ_F_CQE_SKIP))
-				__io_fill_cqe_req(ctx, req);
-		}
+	spin_lock(&ctx->completion_lock);
+	wq_list_for_each(node, prev, &state->compl_reqs) {
+		struct io_kiocb *req = container_of(node, struct io_kiocb,
+						    comp_list);
 
-		io_commit_cqring(ctx);
-		spin_unlock(&ctx->completion_lock);
-		io_cqring_ev_posted(ctx);
-		state->flush_cqes = false;
+		if (!(req->flags & REQ_F_CQE_SKIP))
+			__io_fill_cqe_req(ctx, req);
 	}
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+
 	io_free_batch_list(ctx, state->compl_reqs.first);
 	INIT_WQ_LIST(&state->compl_reqs);
 }
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 906749fa3415..7feef8c36db7 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -232,8 +232,6 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
 
 	lockdep_assert_held(&req->ctx->uring_lock);
 
-	if (!(req->flags & REQ_F_CQE_SKIP))
-		state->flush_cqes = true;
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }
It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
the ->flush_cqes flag prevents completions from being flushed. Sometimes
a high level of concurrency enables it for at least one CQE, but
sometimes it doesn't save much because nobody is waiting on the CQ.

Remove the ->flush_cqes flag and the optimisation; it should benefit the
normal use case. Note that there is no spurious eventfd problem with
that, as checks for spuriousness were incorporated into
io_eventfd_signal().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 23 ++++++++++-------------
 io_uring/io_uring.h |  2 --
 2 files changed, 10 insertions(+), 15 deletions(-)