
[PATCHSET 0/6] Enable NO_OFFLOAD support

Message ID: 20230419162552.576489-1-axboe@kernel.dk

Message

Jens Axboe April 19, 2023, 4:25 p.m. UTC
Hi,

This series enables support for forcing no-offload for requests that
otherwise would have been punted to io-wq. In essence, it bypasses
the normal non-blocking issue in favor of just letting the issue block.
This is only done for requests that would've otherwise hit io-wq in
the offload path, anything pollable will still be doing non-blocking
issue. See patch 3 for details.

Patches 1-2 are just prep patches, patches 4-5 are reverts of async
cleanups, and patch 6 enables this for requests that don't support
nonblocking issue at all.

Comments

Pavel Begunkov April 20, 2023, 12:43 a.m. UTC | #1
On 4/19/23 17:25, Jens Axboe wrote:
> Hi,
> 
> This series enables support for forcing no-offload for requests that
> otherwise would have been punted to io-wq. In essence, it bypasses
> the normal non-blocking issue in favor of just letting the issue block.
> This is only done for requests that would've otherwise hit io-wq in
> the offload path, anything pollable will still be doing non-blocking
> issue. See patch 3 for details.

That's shooting ourselves in the leg.

1) It has never been easier to lock up userspace. They might be able
to deal with simple cases like read(pipe) + write(pipe), though even
that in a complex enough framework would cause debugging and associated
headaches.

Now let's assume that userspace submits nvme passthrough requests,
exhausts the tags, and a request is left waiting there. To make
progress one of the previous reqs has to complete, but completion only
queues tw on the task, which will never be run with DEFER_TASKRUN.

It's not enough for the userspace to be careful; with DEFER_TASKRUN
there will always be a chance of getting locked up.

2) It's not limited only to requests we're submitting, but also
already queued async requests. Inline submission holds uring_lock,
and so if io-wq thread needs to grab a registered file for the
request, it'll io_ring_submit_lock() and wait until the submission
ends. Same for provided buffers and some other cases.

Even task exit will actively try to grab the lock.
Jens Axboe April 20, 2023, 3:08 p.m. UTC | #2
On 4/19/23 6:43 PM, Pavel Begunkov wrote:
> On 4/19/23 17:25, Jens Axboe wrote:
>> Hi,
>>
>> This series enables support for forcing no-offload for requests that
>> otherwise would have been punted to io-wq. In essence, it bypasses
>> the normal non-blocking issue in favor of just letting the issue block.
>> This is only done for requests that would've otherwise hit io-wq in
>> the offload path, anything pollable will still be doing non-blocking
>> issue. See patch 3 for details.
> 
> That's shooting ourselves in the leg.
> 
> 1) It has never been easier to lock up userspace. They might be able
> to deal with simple cases like read(pipe) + write(pipe), though even
> that in a complex enough framework would cause debugging and associated
> headaches.
> 
> Now let's assume that userspace submits nvme passthrough requests,
> exhausts the tags, and a request is left waiting there. To make
> progress one of the previous reqs has to complete, but completion only
> queues tw on the task, which will never be run with DEFER_TASKRUN.
> 
> It's not enough for the userspace to be careful; with DEFER_TASKRUN
> there will always be a chance of getting locked up.
> 
> 2) It's not limited only to requests we're submitting, but also
> already queued async requests. Inline submission holds uring_lock,
> and so if io-wq thread needs to grab a registered file for the
> request, it'll io_ring_submit_lock() and wait until the submission
> ends. Same for provided buffers and some other cases.
> 
> Even task exit will actively try to grab the lock.

One thing I pondered was making the inline submissions similar to io-wq
submissions - eg don't hold uring_lock over them. To make this useful, I
suspect we'd want to prep all SQ entries upfront, and then drop the lock
for submission.

We'd also want to make this mutually exclusive with IOPOLL, obviously.
Doesn't make any sense to do anyway for IOPOLL, but it needs to be
explicitly disallowed.
Pavel Begunkov April 20, 2023, 3:28 p.m. UTC | #3
On 4/20/23 16:08, Jens Axboe wrote:
On 4/19/23 6:43 PM, Pavel Begunkov wrote:
>> On 4/19/23 17:25, Jens Axboe wrote:
>>> Hi,
>>>
>>> This series enables support for forcing no-offload for requests that
>>> otherwise would have been punted to io-wq. In essence, it bypasses
>>> the normal non-blocking issue in favor of just letting the issue block.
>>> This is only done for requests that would've otherwise hit io-wq in
>>> the offload path, anything pollable will still be doing non-blocking
>>> issue. See patch 3 for details.
>>
>> That's shooting ourselves in the leg.
>>
>> 1) It has never been easier to lock up userspace. They might be able
>> to deal with simple cases like read(pipe) + write(pipe), though even
>> that in a complex enough framework would cause debugging and associated
>> headaches.
>>
>> Now let's assume that userspace submits nvme passthrough requests,
>> exhausts the tags, and a request is left waiting there. To make
>> progress one of the previous reqs has to complete, but completion only
>> queues tw on the task, which will never be run with DEFER_TASKRUN.
>>
>> It's not enough for the userspace to be careful; with DEFER_TASKRUN
>> there will always be a chance of getting locked up.
>>
>> 2) It's not limited only to requests we're submitting, but also
>> already queued async requests. Inline submission holds uring_lock,
>> and so if io-wq thread needs to grab a registered file for the
>> request, it'll io_ring_submit_lock() and wait until the submission
>> ends. Same for provided buffers and some other cases.
>>
>> Even task exit will actively try to grab the lock.
> 
> One thing I pondered was making the inline submissions similar to io-wq
> submissions - eg don't hold uring_lock over them. To make this useful, I
> suspect we'd want to prep all SQ entries upfront, and then drop the lock
> for submission.

That would need completion caches (ctx->submit_state) to be changed,
either by allowing multiple of them or limiting them by some other
means to only 1 inline submitter. Also, that will probably bring
request refcounting back, and DEFER_TASKRUN would probably need
to retake the lock for execution unless there are magic tricks
around it. Not an easy task if we don't want to hurt performance.


> We'd also want to make this mutually exclusive with IOPOLL, obviously.
> Doesn't make any sense to do anyway for IOPOLL, but it needs to be
> explicitly disallowed.