Message ID: 3e79156a106e8b5b3646672656f738ba157957ef.1684505086.git.asml.silence@gmail.com (mailing list archive)
State: New
Series: [1/1] io_uring: more graceful request alloc OOM
On Fri, 19 May 2023 15:05:14 +0100, Pavel Begunkov wrote:
> It's ok for io_uring request allocation to fail, however there are
> reports that it starts killing tasks instead of just returning back
> to the userspace. Add __GFP_NORETRY, so it doesn't trigger OOM killer.

Applied, thanks!

[1/1] io_uring: more graceful request alloc OOM
      (no commit info)

Best regards,
Hi,

Thanks for your response.

But I applied this patch to LTS kernel 5.10.180, and it can still trigger this bug.

--- io_uring/io_uring.c.back	2023-05-20 17:11:25.870550438 +0800
+++ io_uring/io_uring.c	2023-05-20 16:35:24.265846283 +0800
@@ -1970,7 +1970,7 @@
 static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_state *state = &ctx->submit_state;
-	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY;
 	int ret, i;

 	BUILD_BUG_ON(ARRAY_SIZE(state->reqs) < IO_REQ_ALLOC_BATCH);

The io_uring.c.back is the original file. Did I apply this patch wrong?

Regards,
Yang

Pavel Begunkov <asml.silence@gmail.com> wrote on Fri, 19 May 2023 at 22:06:
>
> It's ok for io_uring request allocation to fail, however there are
> reports that it starts killing tasks instead of just returning back
> to the userspace. Add __GFP_NORETRY, so it doesn't trigger OOM killer.
>
> Cc: stable@vger.kernel.org
> Fixes: 2b188cc1bb857 ("Add io_uring IO interface")
> Reported-by: yang lan <lanyang0908@gmail.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  io_uring/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index dab09f568294..ad34a4320dab 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1073,7 +1073,7 @@ static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
>  __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
>  	__must_hold(&ctx->uring_lock)
>  {
> -	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
> +	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY;
>  	void *reqs[IO_REQ_ALLOC_BATCH];
>  	int ret, i;
>
> --
> 2.40.0
On 5/20/23 10:38, yang lan wrote:
> Hi,
>
> Thanks for your response.
>
> But I applied this patch to LTS kernel 5.10.180, and it can still trigger this bug.
>
> --- io_uring/io_uring.c.back	2023-05-20 17:11:25.870550438 +0800
> +++ io_uring/io_uring.c	2023-05-20 16:35:24.265846283 +0800
> @@ -1970,7 +1970,7 @@
>  static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
>  	__must_hold(&ctx->uring_lock)
>  {
>  	struct io_submit_state *state = &ctx->submit_state;
> -	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
> +	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY;
>  	int ret, i;
>
>  	BUILD_BUG_ON(ARRAY_SIZE(state->reqs) < IO_REQ_ALLOC_BATCH);
>
> The io_uring.c.back is the original file.
> Did I apply this patch wrong?

The patch looks fine. I ran a self-written test with 6.4 before sending; it worked as expected. I need to run the syz test; maybe the failure shifted to another spot, e.g. in provided buffers.
Hi,

Thanks. I'm also analyzing the root cause of this bug.

By the way, can I apply for a CVE? And should I submit a request to some official organization, such as Openwall?

Regards,
Yang

Pavel Begunkov <asml.silence@gmail.com> wrote on Mon, 22 May 2023 at 08:45:
>
> On 5/20/23 10:38, yang lan wrote:
> > Hi,
> >
> > Thanks for your response.
> >
> > But I applied this patch to LTS kernel 5.10.180, and it can still trigger this bug.
> >
> > [...]
> >
> > The io_uring.c.back is the original file.
> > Did I apply this patch wrong?
>
> The patch looks fine. I ran a self-written test with 6.4 before
> sending; it worked as expected. I need to run the syz test; maybe
> the failure shifted to another spot, e.g. in provided buffers.
>
> --
> Pavel Begunkov
On 5/22/23 08:55, yang lan wrote:
> Hi,
>
> Thanks. I'm also analyzing the root cause of this bug.

The repro indeed triggers, this time in another place. Though when I patch all of them, it fails somewhere else, like in ext4 on a page fault.

We could add a couple more of those "don't OOM, just fail" allocations and some niceness around them, but I think it's fine as it is, as the allocations are under cgroup control. If the admin cares about collisions between users, there should be cgroups, so that allocations of one don't affect another. And if a user pushes it to the limit and OOMs itself, that should be fine.

> By the way, can I apply for a CVE? And should I submit a request to
> some official organization, such as Openwall?

Sorry, but we cannot help you here. We don't deal with CVEs.

That aside, I'm not even sure it's CVE'able, because it shouldn't be exploitable in a properly configured environment (unless it is). But I'm not an expert in that, so I might be wrong.

> Pavel Begunkov <asml.silence@gmail.com> wrote on Mon, 22 May 2023 at 08:45:
>>
>> [...]
>>
>> The patch looks fine. I ran a self-written test with 6.4 before
>> sending; it worked as expected. I need to run the syz test; maybe
>> the failure shifted to another spot, e.g. in provided buffers.
>>
>> --
>> Pavel Begunkov
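Pavel's point about cgroup isolation can be made concrete. A minimal sketch, assuming root privileges and a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the group name and the 512M limit are illustrative choices, not values from this thread:

```shell
# Create a memory-limited group (name "uring-user" is an assumption).
mkdir /sys/fs/cgroup/uring-user

# Hard-cap the group's memory. Kernel allocations accounted to the
# group (including io_uring request allocations) count against this,
# so pressure from this user cannot exhaust system-wide memory.
echo 512M > /sys/fs/cgroup/uring-user/memory.max

# Move the current shell, and therefore its future children, into
# the group.
echo $$ > /sys/fs/cgroup/uring-user/cgroup.procs
```

With `memory.max` set, a task that pushes allocations to the limit fails or OOMs within its own group, which is the containment Pavel describes: one user's allocations don't affect another's.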
Hi,

Thanks.

Regards,
Yang

Pavel Begunkov <asml.silence@gmail.com> wrote on Tue, 23 May 2023 at 20:15:
>
> On 5/22/23 08:55, yang lan wrote:
> > Hi,
> >
> > Thanks. I'm also analyzing the root cause of this bug.
>
> The repro indeed triggers, this time in another place. Though
> when I patch all of them, it fails somewhere else, like in
> ext4 on a page fault.
>
> We could add a couple more of those "don't OOM, just fail"
> allocations and some niceness around them, but I think it's fine
> as it is, as the allocations are under cgroup control. If the
> admin cares about collisions between users, there should be
> cgroups, so that allocations of one don't affect another. And if
> a user pushes it to the limit and OOMs itself, that should be fine.
>
> > By the way, can I apply for a CVE? And should I submit a request to
> > some official organization, such as Openwall?
>
> Sorry, but we cannot help you here. We don't deal with CVEs.
>
> That aside, I'm not even sure it's CVE'able, because it shouldn't
> be exploitable in a properly configured environment (unless it is).
> But I'm not an expert in that, so I might be wrong.
>
> [...]
>
> --
> Pavel Begunkov
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index dab09f568294..ad34a4320dab 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1073,7 +1073,7 @@ static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
 __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
 	__must_hold(&ctx->uring_lock)
 {
-	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY;
 	void *reqs[IO_REQ_ALLOC_BATCH];
 	int ret, i;
It's ok for io_uring request allocation to fail, however there are
reports that it starts killing tasks instead of just returning back
to the userspace. Add __GFP_NORETRY, so it doesn't trigger OOM killer.

Cc: stable@vger.kernel.org
Fixes: 2b188cc1bb857 ("Add io_uring IO interface")
Reported-by: yang lan <lanyang0908@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)