
soft lockup with io_uring

Message ID 2f87ee3f-d61f-e572-08f5-96a8ef8843b0@grimberg.me (mailing list archive)
State New, archived
Series soft lockup with io_uring

Commit Message

Sagi Grimberg Aug. 21, 2019, 10:48 p.m. UTC
Hey,

Just ran io_uring-bench on my VM against /dev/nullb0 and got the following
soft lockup [1]. The reproducer is as simple as:

modprobe null_blk
tools/io_uring/io_uring-bench /dev/nullb0
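
For reference, the polled read loop the bench drives boils down to roughly
the following (a liburing sketch for illustration only; io_uring-bench
itself issues the raw syscalls, so treat the exact calls, flags and file
names here as an approximation):
--
/*
 * Illustrative polled-read loop against /dev/nullb0, roughly what
 * io_uring-bench exercises.  Sketch only, not the actual bench code.
 * Build with: gcc -O2 -o nullb-poll nullb-poll.c -luring
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>
#include <liburing.h>

#define BS 4096

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct iovec iov;
	void *buf;
	int fd, ret;
	long i;

	/* O_DIRECT is needed to hit the polled (IOPOLL) completion path */
	fd = open("/dev/nullb0", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open /dev/nullb0");
		return 1;
	}

	if (posix_memalign(&buf, BS, BS))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = BS;

	/* IORING_SETUP_IOPOLL: io_uring_enter() busy-polls for completions */
	ret = io_uring_queue_init(64, &ring, IORING_SETUP_IOPOLL);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}

	for (i = 0; i < 1000000; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_readv(sqe, fd, &iov, 1, 0);
		io_uring_submit(&ring);

		/* with IOPOLL this ends up spinning in io_iopoll_getevents() */
		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret < 0) {
			fprintf(stderr, "wait_cqe: %s\n", strerror(-ret));
			break;
		}
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	free(buf);
	return 0;
}
--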

It looks like io_iopoll_getevents() can hog the CPU; however, I don't
yet really know what is preventing it from quickly exceeding min and
punting back...

Adding this makes the problem go away:
--
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8b9dbf3b2298..aba03eee5c81 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -779,6 +779,7 @@ static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events,
                         return ret;
                 if (!min || *nr_events >= min)
                         return 0;
+               cond_resched();
         }

         return 1;
--

I do not know if this is the correct way to fix this, or what exactly
the issue is, but I thought I'd send it out given it's so easy to
reproduce.


[1]:
--
[1032930.615999] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [io_uring-bench:21221]
[1032930.626182] RIP: 0010:io_iopoll_getevents+0xaf/0x260
[...]
[1032930.626942] Call Trace:
[1032930.627598]  io_iopoll_check+0x34/0x70
[1032930.627601]  __x64_sys_io_uring_enter+0x183/0x300
[1032930.627729]  do_syscall_64+0x55/0x130
[1032930.627911]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1032930.627927] RIP: 0033:0x7fa4641d7839
[...]
[1032958.615873] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [io_uring-bench:21221]
[1032958.720822] RIP: 0010:blkdev_iopoll+0x0/0x30
[...]
[1032958.721116] Call Trace:
[1032958.721624]  io_iopoll_getevents+0xaa/0x260
[1032958.721635]  io_iopoll_check+0x34/0x70
[1032958.721637]  __x64_sys_io_uring_enter+0x183/0x300
[1032958.721642]  do_syscall_64+0x55/0x130
[1032958.721649]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1032958.721654] RIP: 0033:0x7fa4641d7839
[...]
--

Comments

Jens Axboe Aug. 22, 2019, 12:55 a.m. UTC | #1
On 8/21/19 4:48 PM, Sagi Grimberg wrote:
> Hey,
> 
> Just ran io_uring-bench on my VM against /dev/nullb0 and got the following
> soft lockup [1]. The reproducer is as simple as:
> 
> modprobe null_blk
> tools/io_uring/io_uring-bench /dev/nullb0
> 
> It looks like io_iopoll_getevents() can hog the CPU; however, I don't
> yet really know what is preventing it from quickly exceeding min and
> punting back...
> 
> Adding this makes the problem go away:
> --
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 8b9dbf3b2298..aba03eee5c81 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -779,6 +779,7 @@ static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events,
>                           return ret;
>                   if (!min || *nr_events >= min)
>                           return 0;
> +               cond_resched();
>           }
> 
>           return 1;
> --
> 
> I do not know if this is the correct way to fix this, or what exactly
> the issue is, but I thought I'd send it out given it's so easy to
> reproduce.

I wonder what your .config is; can you attach it?

Also, please try my for-linus branch; it's got a few tweaks for how
we handle polling (and when we back off). It doesn't affect the inner
loop, so it might not change anything for you.

If not, might be better to have a need_resched() terminator in there,
like we have in the outer loop.
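
As a rough illustration of that kind of terminator (a userspace
simulation only; every function name below is invented, and the real
check is the need_resched() test in the patch further down the thread):
--
/*
 * Userspace simulation of the suggested terminator: the fake poll
 * below never produces completions (standing in for completions that
 * are stuck behind the spinning CPU), so without the deadline check
 * the loop would spin forever.  Not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* stand-in for io_do_iopoll(): completions never arrive */
static int fake_do_iopoll(unsigned int *nr_events)
{
	(void)nr_events;
	return 0;
}

/* stand-in for need_resched(): bail out after ~10ms of spinning */
static bool fake_need_resched(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000000000L +
	       (now.tv_nsec - start->tv_nsec) > 10 * 1000000L;
}

int main(void)
{
	unsigned int nr_events = 0, min = 1;
	unsigned long iters = 0;
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);

	/* inner-loop shape: poll until min events, or until we should yield */
	while (nr_events < min && !fake_need_resched(&start)) {
		fake_do_iopoll(&nr_events);
		iters++;
	}

	printf("gave up the CPU after %lu iterations, %u events\n",
	       iters, nr_events);
	return 0;
}
--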
Jens Axboe Aug. 22, 2019, 1:33 a.m. UTC | #2
On 8/21/19 7:18 PM, Sagi Grimberg wrote:
> 
>>> Hey,
>>>
>>> Just ran io_uring-bench on my VM against /dev/nullb0 and got the following
>>> soft lockup [1]. The reproducer is as simple as:
>>>
>>> modprobe null_blk
>>> tools/io_uring/io_uring-bench /dev/nullb0
>>>
>>> It looks like io_iopoll_getevents() can hog the CPU; however, I don't
>>> yet really know what is preventing it from quickly exceeding min and
>>> punting back...
>>>
>>> Adding this makes the problem go away:
>>> --
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 8b9dbf3b2298..aba03eee5c81 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -779,6 +779,7 @@ static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>                             return ret;
>>>                     if (!min || *nr_events >= min)
>>>                             return 0;
>>> +               cond_resched();
>>>             }
>>>
>>>             return 1;
>>> --
>>>
>>> I do not know if this is the correct way to fix this, or what exactly
>>> the issue is, but I thought I'd send it out given it's so easy to
>>> reproduce.
>>
>> I wonder what your .config is; can you attach it?
> 
> Attached.
> 
>>
>> Also, please try my for-linus branch; it's got a few tweaks for how
>> we handle polling (and when we back off). It doesn't affect the inner
>> loop, so it might not change anything for you.
> 
> This is your for-linus branch (or at least the one from when I sent you
> the nvme pull request this week).
> 
> The head commit on fs/io_uring.c:
> 2fc7323f1c4b io_uring: fix potential hang with polled IO
> 
> I'm only missing:
> a3a0e43fd770 io_uring: don't enter poll loop if we have CQEs pending
> 
> But that does not indicate that it addresses such an issue.
> 
> I can still give it a shot if you think it could resolve this...
> 
>> If not, might be better to have a need_resched() terminator in there,
>> like we have in the outer loop.
> 
> I can easily modify that, but I would like to understand what is
> preventing the stop condition from triggering...

I'm guessing it's because we need to free up that same CPU to process the
softirq that's actually completing the requests. null_blk is a bit special
in that regard. The key in your case is that you have voluntary preemption
set, so the softirq will never get to run unless we yield on our own.

Can you try this?

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e7a43a354d91..c6a722996d8a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -778,7 +778,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events,
 				long min)
 {
-	while (!list_empty(&ctx->poll_list)) {
+	while (!list_empty(&ctx->poll_list) && !need_resched()) {
 		int ret;
 
 		ret = io_do_iopoll(ctx, nr_events, min);

Patch

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8b9dbf3b2298..aba03eee5c81 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -779,6 +779,7 @@ static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events,
                         return ret;
                 if (!min || *nr_events >= min)
                         return 0;
+               cond_resched();
         }

         return 1;