Message ID | 20230419162552.576489-1-axboe@kernel.dk (mailing list archive)
---|---
Series | Enable NO_OFFLOAD support
On 4/19/23 17:25, Jens Axboe wrote:
> Hi,
>
> This series enables support for forcing no-offload for requests that
> otherwise would have been punted to io-wq. In essence, it bypasses
> the normal non-blocking issue in favor of just letting the issue block.
> This is only done for requests that would've otherwise hit io-wq in
> the offload path, anything pollable will still be doing non-blocking
> issue. See patch 3 for details.

That's shooting ourselves in the foot.

1) It has never been easier to lock up userspace. They might be able
to deal with simple cases like read(pipe) + write(pipe), though even
that in a complex enough framework would mean debugging and associated
headaches.

Now let's assume userspace submits nvme passthrough requests, exhausts
tags, and a request is left waiting there. To make progress, one of
the previous requests has to complete, but completion only puts the
task in tw, which will never be run with DEFER_TASKRUN.

It's not enough for userspace to be careful; with DEFER_TASKRUN there
will always be a chance of getting locked up.

2) It's not limited to the requests we're submitting; it also affects
already queued async requests. Inline submission holds uring_lock, and
so if an io-wq thread needs to grab a registered file for a request,
it will io_ring_submit_lock() and wait until the submission ends. Same
for provided buffers and some other cases.

Even task exit will actively try to grab the lock.
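For a concrete picture of the lockup class described above, the
read(pipe) + write(pipe) case reduces to the classic ordering deadlock
in the sketch below, with plain blocking syscalls standing in for
blocking inline issue. Note this is an illustration, not a literal
repro of the series: as posted, pollable files (pipes included) still
get non-blocking issue, so this exact pair would survive there, while
the nvme passthrough case has no such escape hatch.

#include <unistd.h>

int main(void)
{
	int fds[2];
	char buf[2];

	if (pipe(fds))
		return 1;

	/*
	 * With offload (or poll-based retry), an io_uring read SQE on
	 * the empty pipe would be parked and the following write SQE
	 * would eventually complete it. If the read is instead issued
	 * inline and allowed to block, in-order submission degenerates
	 * to exactly this sequence:
	 */
	read(fds[0], buf, sizeof(buf));   /* blocks forever: pipe is empty */
	write(fds[1], "hi", 2);           /* never reached */
	return 0;
}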
On 4/19/23 6:43 PM, Pavel Begunkov wrote:
> On 4/19/23 17:25, Jens Axboe wrote:
>> Hi,
>>
>> This series enables support for forcing no-offload for requests that
>> otherwise would have been punted to io-wq. In essence, it bypasses
>> the normal non-blocking issue in favor of just letting the issue block.
>> This is only done for requests that would've otherwise hit io-wq in
>> the offload path, anything pollable will still be doing non-blocking
>> issue. See patch 3 for details.
>
> That's shooting ourselves in the foot.
>
> 1) It has never been easier to lock up userspace. They might be able
> to deal with simple cases like read(pipe) + write(pipe), though even
> that in a complex enough framework would mean debugging and associated
> headaches.
>
> Now let's assume userspace submits nvme passthrough requests, exhausts
> tags, and a request is left waiting there. To make progress, one of
> the previous requests has to complete, but completion only puts the
> task in tw, which will never be run with DEFER_TASKRUN.
>
> It's not enough for userspace to be careful; with DEFER_TASKRUN there
> will always be a chance of getting locked up.
>
> 2) It's not limited to the requests we're submitting; it also affects
> already queued async requests. Inline submission holds uring_lock, and
> so if an io-wq thread needs to grab a registered file for a request,
> it will io_ring_submit_lock() and wait until the submission ends. Same
> for provided buffers and some other cases.
>
> Even task exit will actively try to grab the lock.

One thing I pondered was making the inline submissions similar to io-wq
submissions - e.g. don't hold uring_lock over them. To make that useful,
I suspect we'd want to prep all SQ entries upfront, and then drop the
lock for submission.

We'd also want to make this mutually exclusive with IOPOLL, obviously.
It doesn't make any sense for IOPOLL anyway, but it needs to be
explicitly disallowed.
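The two-phase submission idea above would look roughly like the
following. This is only a shape, not code from the series:
io_prep_sqes() and io_issue_prepped() are invented names for the two
phases, while struct io_ring_ctx and uring_lock are the real kernel
objects.

/*
 * Sketch only: split the current prep-and-issue loop, which runs
 * entirely under ctx->uring_lock, into a locked prep phase and an
 * unlocked issue phase.
 */
static int io_submit_sqes_unlocked(struct io_ring_ctx *ctx, unsigned int nr)
{
	LIST_HEAD(prepped);
	int ret;

	/*
	 * Phase 1: consume and prep all SQEs under uring_lock, without
	 * issuing anything that could block.
	 */
	mutex_lock(&ctx->uring_lock);
	ret = io_prep_sqes(ctx, nr, &prepped);		/* assumed helper */
	mutex_unlock(&ctx->uring_lock);

	/*
	 * Phase 2: issue with the lock dropped, so a blocking inline
	 * issue no longer holds up io-wq workers, provided-buffer users,
	 * or task exit waiting on uring_lock.
	 */
	io_issue_prepped(ctx, &prepped);		/* assumed helper */
	return ret;
}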
On 4/20/23 16:08, Jens Axboe wrote:
> On 4/19/23 6:43 PM, Pavel Begunkov wrote:
>> On 4/19/23 17:25, Jens Axboe wrote:
>>> Hi,
>>>
>>> This series enables support for forcing no-offload for requests that
>>> otherwise would have been punted to io-wq. In essence, it bypasses
>>> the normal non-blocking issue in favor of just letting the issue block.
>>> This is only done for requests that would've otherwise hit io-wq in
>>> the offload path, anything pollable will still be doing non-blocking
>>> issue. See patch 3 for details.
>>
>> That's shooting ourselves in the foot.
>>
>> 1) It has never been easier to lock up userspace. They might be able
>> to deal with simple cases like read(pipe) + write(pipe), though even
>> that in a complex enough framework would mean debugging and associated
>> headaches.
>>
>> Now let's assume userspace submits nvme passthrough requests, exhausts
>> tags, and a request is left waiting there. To make progress, one of
>> the previous requests has to complete, but completion only puts the
>> task in tw, which will never be run with DEFER_TASKRUN.
>>
>> It's not enough for userspace to be careful; with DEFER_TASKRUN there
>> will always be a chance of getting locked up.
>>
>> 2) It's not limited to the requests we're submitting; it also affects
>> already queued async requests. Inline submission holds uring_lock, and
>> so if an io-wq thread needs to grab a registered file for a request,
>> it will io_ring_submit_lock() and wait until the submission ends. Same
>> for provided buffers and some other cases.
>>
>> Even task exit will actively try to grab the lock.
>
> One thing I pondered was making the inline submissions similar to io-wq
> submissions - e.g. don't hold uring_lock over them. To make that useful,
> I suspect we'd want to prep all SQ entries upfront, and then drop the
> lock for submission.

That would need the completion caches (ctx->submit_state) to be changed,
either by allowing multiple of them or by otherwise limiting things to a
single inline submitter. It would also probably bring request refcounting
back, and DEFER_TASKRUN would probably need to retake the lock for
execution, unless there are magic tricks around it. Not an easy task if
we don't want to hurt performance.

> We'd also want to make this mutually exclusive with IOPOLL, obviously.
> It doesn't make any sense for IOPOLL anyway, but it needs to be
> explicitly disallowed.
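The submit_state objection comes down to the caches being per-ring and
serialized by uring_lock. An abridged, annotated view of the relevant
state (field names as in the kernel's io_uring_types.h; the comments
are the point, not the exact layout):

/*
 * Abridged view of the ring context; everything below is serialized by
 * uring_lock today, on the assumption of a single inline submitter.
 */
struct io_ring_ctx {
	/* ... */
	struct io_submit_state	submit_state;	/* batches free requests and
						 * pending completions under
						 * uring_lock */
	struct mutex		uring_lock;
	/* ... */
};

Issuing without the lock would mean one such cache per submitter, a hard
cap of one inline submitter, or per-request refcounts again, and tw
execution under DEFER_TASKRUN, which batches into the same cache, would
have to retake the lock.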