[00/11] remove aux CQE caches

Message ID: cover.1710514702.git.asml.silence@gmail.com

Message

Pavel Begunkov March 15, 2024, 3:29 p.m. UTC
Patch 1 is a fix.

Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
misunderstandings of the flags and of the tw state. It'd be great to have
them even without the rest.

8-11 mandate ctx locking for task_work and finally remove the CQE
caches; instead we post directly into the CQ. Note that the cache is
used by multishot auxiliary completions.
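
Roughly, the end state is something like the sketch below. It's an
illustration only, with a made-up helper name (get_cqe_slot); patch 11
is the real thing. With task_work always holding ->uring_lock, an
auxiliary completion can be written straight into the CQ ring:

static bool post_aux_cqe_direct(struct io_ring_ctx *ctx, u64 user_data,
				s32 res, u32 cflags)
{
	struct io_uring_cqe *cqe;

	/* patch 8 makes task_work always run with the ring mutex held */
	lockdep_assert_held(&ctx->uring_lock);

	io_cq_lock(ctx);
	cqe = get_cqe_slot(ctx);	/* hypothetical "next free CQE" helper */
	if (cqe) {
		WRITE_ONCE(cqe->user_data, user_data);
		WRITE_ONCE(cqe->res, res);
		WRITE_ONCE(cqe->flags, cflags);
	}
	io_cq_unlock_post(ctx);
	return cqe != NULL;
}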

The nvme cmd change is tested with io_uring_passthrough.c; however,
there doesn't seem to be anything in liburing exercising the ublk paths.
How do we test it? It'd be great to have at least some basic tests
for it.

Pavel Begunkov (11):
  io_uring: fix poll_remove stalled req completion
  io_uring/cmd: kill one issue_flags to tw conversion
  io_uring/cmd: fix tw <-> issue_flags conversion
  io_uring/cmd: introduce io_uring_cmd_complete
  ublk: don't hard code IO_URING_F_UNLOCKED
  nvme/io_uring: don't hard code IO_URING_F_UNLOCKED
  io_uring/rw: avoid punting to io-wq directly
  io_uring: force tw ctx locking
  io_uring: remove struct io_tw_state::locked
  io_uring: refactor io_fill_cqe_req_aux
  io_uring: get rid of intermediate aux cqe caches

 drivers/block/ublk_drv.c       |  18 ++---
 drivers/nvme/host/ioctl.c      |   9 ++-
 include/linux/io_uring/cmd.h   |  24 ++++++
 include/linux/io_uring_types.h |   5 +-
 io_uring/io_uring.c            | 132 +++++++++------------------------
 io_uring/io_uring.h            |   8 +-
 io_uring/net.c                 |   6 +-
 io_uring/poll.c                |   7 +-
 io_uring/rw.c                  |  18 +----
 io_uring/timeout.c             |   8 +-
 io_uring/uring_cmd.c           |  14 ++--
 io_uring/waitid.c              |   2 +-
 12 files changed, 95 insertions(+), 156 deletions(-)

Comments

Jens Axboe March 15, 2024, 3:42 p.m. UTC | #1
On 3/15/24 9:29 AM, Pavel Begunkov wrote:
> Patch 1 is a fix.
> 
> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> misunderstandings of the flags and of the tw state. It'd be great to have
> them even without the rest.
> 
> 8-11 mandate ctx locking for task_work and finally remove the CQE
> caches; instead we post directly into the CQ. Note that the cache is
> used by multishot auxiliary completions.

I love this series! I'll push patch 1 for 6.9, and then run some testing
with the rest for 6.10.
Jens Axboe March 15, 2024, 4 p.m. UTC | #2
On 3/15/24 9:29 AM, Pavel Begunkov wrote:
> The nvme cmd change is tested with io_uring_passthrough.c; however,
> there doesn't seem to be anything in liburing exercising the ublk paths.
> How do we test it? It'd be great to have at least some basic tests
> for it.

Forgot to comment on this... Last time I tested it, I just pulled the
ublk repo, set up a device, and did some IO. It would indeed be nice to
have some basic ublk tests in liburing, however.
Jens Axboe March 15, 2024, 10:53 p.m. UTC | #3
On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
> Patch 1 is a fix.
> 
> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> misunderstandings of the flags and of the tw state. It'd be great to have
> them even without the rest.
> 
> 8-11 mandate ctx locking for task_work and finally remove the CQE
> caches; instead we post directly into the CQ. Note that the cache is
> used by multishot auxiliary completions.
> 
> [...]

Applied, thanks!

[02/11] io_uring/cmd: kill one issue_flags to tw conversion
        commit: 31ab0342cf6434e1e2879d12f0526830ce97365d
[03/11] io_uring/cmd: fix tw <-> issue_flags conversion
        commit: b48f3e29b89055894b3f50c657658c325b5b49fd
[04/11] io_uring/cmd: introduce io_uring_cmd_complete
        commit: c5b4c92ca69215c0af17e4e9d8c84c8942f3257d
[05/11] ublk: don't hard code IO_URING_F_UNLOCKED
        commit: c54cfb81fe1774231fca952eff928389bfc3b2e3
[06/11] nvme/io_uring: don't hard code IO_URING_F_UNLOCKED
        commit: 800a90681f3c3383660a8e3e2d279e0f056afaee
[07/11] io_uring/rw: avoid punting to io-wq directly
        commit: 56d565d54373c17b7620fc605c899c41968e48d0
[08/11] io_uring: force tw ctx locking
        commit: f087cdd065af0418ffc8a9ed39eadc93347efdd5
[09/11] io_uring: remove struct io_tw_state::locked
        commit: 339f8d66e996ec52b47221448ff4b3534cc9a58d
[10/11] io_uring: refactor io_fill_cqe_req_aux
        commit: 7b31c3964b769a6a16c4e414baa8094b441e498e
[11/11] io_uring: get rid of intermediate aux cqe caches
        commit: 5a475a1f47412a44ed184aac04b9ff0aeaa31d65

Best regards,
Ming Lei March 16, 2024, 2:03 a.m. UTC | #4
On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> 
> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
> > Patch 1 is a fix.
> > 
> > Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> > misunderstandings of the flags and of the tw state. It'd be great to have
> > them even without the rest.
> > 
> > 8-11 mandate ctx locking for task_work and finally remove the CQE
> > caches; instead we post directly into the CQ. Note that the cache is
> > used by multishot auxiliary completions.
> > 
> > [...]
> 
> Applied, thanks!

Hi Jens and Pavel,

Looks like this patch causes a hang when running './check ublk/002' in blktests.

Steps:

1) cargo install rublk

2) cd blktests

3) ./check ublk/002



Thanks,
Ming
Ming Lei March 16, 2024, 2:24 a.m. UTC | #5
On Sat, Mar 16, 2024 at 10:04 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> >
> > On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
> > > Patch 1 is a fix.
> > >
> > > Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> > > misunderstandings of the flags and of the tw state. It'd be great to have
> > > them even without the rest.
> > > 
> > > 8-11 mandate ctx locking for task_work and finally remove the CQE
> > > caches; instead we post directly into the CQ. Note that the cache is
> > > used by multishot auxiliary completions.
> > >
> > > [...]
> >
> > Applied, thanks!
>
> Hi Jens and Pavel,
>
> Looks like this patch causes a hang when running './check ublk/002' in blktests.

Haven't taken a close look, but I guess it hangs in

io_uring_cmd_del_cancelable() -> io_ring_submit_lock

[root@ktest-36 ~]# cat /proc/1420/stack
[<0>] io_uring_cmd_done+0x161/0x1c0
[<0>] ublk_stop_dev+0x10e/0x1b0 [ublk_drv]
[<0>] ublk_ctrl_uring_cmd+0xbc9/0x11e0 [ublk_drv]
[<0>] io_uring_cmd+0x9e/0x130
[<0>] io_issue_sqe+0x2d3/0x730
[<0>] io_wq_submit_work+0xd2/0x350
[<0>] io_worker_handle_work+0x12a/0x4b0
[<0>] io_wq_worker+0x101/0x390
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

(gdb) l *(io_uring_cmd_done+0x161)
0xffffffff817ed241 is in io_uring_cmd_done (./include/linux/list.h:985).
980 return !READ_ONCE(h->first);
981 }
982
983 static inline void __hlist_del(struct hlist_node *n)
984 {
985 struct hlist_node *next = n->next;
986 struct hlist_node **pprev = n->pprev;
987
988 WRITE_ONCE(*pprev, next);
989 if (next)
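
For context, io_ring_submit_lock() is roughly the helper below (a
sketch, not a verbatim copy): with IO_URING_F_UNLOCKED it takes
->uring_lock, a sleeping mutex, which is why calling it from the
completion path is a problem:

static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
				       unsigned int issue_flags)
{
	/* "unlocked" callers take the ring mutex and may sleep on it */
	if (issue_flags & IO_URING_F_UNLOCKED)
		mutex_lock(&ctx->uring_lock);
	lockdep_assert_held(&ctx->uring_lock);
}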


Thanks,
Ming
Pavel Begunkov March 16, 2024, 2:54 a.m. UTC | #6
On 3/16/24 02:24, Ming Lei wrote:
> On Sat, Mar 16, 2024 at 10:04 AM Ming Lei <ming.lei@redhat.com> wrote:
>>
>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>
>>> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
>>>> Patch 1 is a fix.
>>>>
>>>> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
>>>> misunderstandings of the flags and of the tw state. It'd be great to have
>>>> them even without the rest.
>>>>
>>>> 8-11 mandate ctx locking for task_work and finally remove the CQE
>>>> caches; instead we post directly into the CQ. Note that the cache is
>>>> used by multishot auxiliary completions.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>
>> Hi Jens and Pavel,
>>
>> Looks like this patch causes a hang when running './check ublk/002' in blktests.
> 
> Haven't taken a close look, but I guess it hangs in
> 
> io_uring_cmd_del_cancelable() -> io_ring_submit_lock

Thanks, the trace doesn't completely explain it, but my blind spot
was io_uring_cmd_done() potentially grabbing the mutex. They're
supposed to be irq safe, mimicking io_req_task_work_add(); that's how
nvme passthrough uses it as well (but at least it doesn't need the
cancellation bits).

One option is to replace it with a spinlock, the other is to delay
the io_uring_cmd_del_cancelable() call to the task_work callback.
The latter would be cleaner and more preferable, but I'm lacking
context to tell if that would be correct. Ming, what do you think?
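
For illustration, the first option would look roughly like below, with
cancel_lock being a hypothetical new io_ring_ctx field rather than
existing code:

static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
{
	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
	struct io_ring_ctx *ctx = req->ctx;
	unsigned long flags;

	if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
		return;

	cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
	/* irq-safe, so it stays callable from io_uring_cmd_done() */
	spin_lock_irqsave(&ctx->cancel_lock, flags);
	hlist_del(&req->hash_node);
	spin_unlock_irqrestore(&ctx->cancel_lock, flags);
}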
Ming Lei March 16, 2024, 3:54 a.m. UTC | #7
On Sat, Mar 16, 2024 at 02:54:19AM +0000, Pavel Begunkov wrote:
> On 3/16/24 02:24, Ming Lei wrote:
> > On Sat, Mar 16, 2024 at 10:04 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > 
> > > On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> > > > 
> > > > On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
> > > > > Patch 1 is a fix.
> > > > > 
> > > > > Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> > > > > misunderstandings of the flags and of the tw state. It'd be great to have
> > > > > them even without the rest.
> > > > > 
> > > > > 8-11 mandate ctx locking for task_work and finally remove the CQE
> > > > > caches; instead we post directly into the CQ. Note that the cache is
> > > > > used by multishot auxiliary completions.
> > > > > 
> > > > > [...]
> > > > 
> > > > Applied, thanks!
> > > 
> > > Hi Jens and Pavel,
> > > 
> > > Looks like this patch causes a hang when running './check ublk/002' in blktests.
> > 
> > Haven't taken a close look, but I guess it hangs in
> > 
> > io_uring_cmd_del_cancelable() -> io_ring_submit_lock
> 
> Thanks, the trace doesn't completely explain it, but my blind spot
> was io_uring_cmd_done() potentially grabbing the mutex. They're
> supposed to be irq safe, mimicking io_req_task_work_add(); that's how
> nvme passthrough uses it as well (but at least it doesn't need the
> cancellation bits).
> 
> One option is to replace it with a spinlock, the other is to delay
> the io_uring_cmd_del_cancelable() call to the task_work callback.
> The latter would be cleaner and more preferable, but I'm lacking
> context to tell if that would be correct. Ming, what do you think?

I prefer the latter approach because the two cancelable helpers
run in the fast path.

Looks like all the new io_uring_cmd_complete() calls in ublk have this issue, and the
following patch should avoid them all.

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 97dceecadab2..1f54da0e655c 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1417,6 +1417,12 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
 	return true;
 }
 
+static void ublk_cancel_cmd_cb(struct io_uring_cmd *cmd,
+		unsigned int issue_flags)
+{
+	io_uring_cmd_done(cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
+}
+
 static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
 {
 	bool done;
@@ -1431,7 +1437,7 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
 	spin_unlock(&ubq->cancel_lock);
 
 	if (!done)
-		io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
+		io_uring_cmd_complete_in_task(io->cmd, ublk_cancel_cmd_cb);
 }
 
 /*
@@ -1775,10 +1781,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	return -EIOCBQUEUED;
 
  out:
-	io_uring_cmd_complete(cmd, ret, 0);
 	pr_devel("%s: complete: cmd op %d, tag %d ret %x io_flags %x\n",
 			__func__, cmd_op, tag, ret, io->flags);
-	return -EIOCBQUEUED;
+	return ret;
 }
 
 static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
@@ -2928,10 +2933,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
 	if (ub)
 		ublk_put_device(ub);
  out:
-	io_uring_cmd_complete(cmd, ret, 0);
 	pr_devel("%s: cmd done ret %d cmd_op %x, dev id %d qid %d\n",
 			__func__, ret, cmd->cmd_op, header->dev_id, header->queue_id);
-	return -EIOCBQUEUED;
+	return ret;
 }
 
 static const struct file_operations ublk_ctl_fops = {



Thanks,
Ming
Pavel Begunkov March 16, 2024, 4:13 a.m. UTC | #8
On 3/16/24 03:54, Ming Lei wrote:
> On Sat, Mar 16, 2024 at 02:54:19AM +0000, Pavel Begunkov wrote:
>> On 3/16/24 02:24, Ming Lei wrote:
>>> On Sat, Mar 16, 2024 at 10:04 AM Ming Lei <ming.lei@redhat.com> wrote:
>>>>
>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>
>>>>> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
>>>>>> Patch 1 is a fix.
>>>>>>
>>>>>> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
>>>>>> misunderstandings of the flags and of the tw state. It'd be great to have
>>>>>> them even without the rest.
>>>>>>
>>>>>> 8-11 mandate ctx locking for task_work and finally remove the CQE
>>>>>> caches; instead we post directly into the CQ. Note that the cache is
>>>>>> used by multishot auxiliary completions.
>>>>>>
>>>>>> [...]
>>>>>
>>>>> Applied, thanks!
>>>>
>>>> Hi Jens and Pavel,
>>>>
>>>> Looks like this patch causes a hang when running './check ublk/002' in blktests.
>>>
>>> Haven't taken a close look, but I guess it hangs in
>>>
>>> io_uring_cmd_del_cancelable() -> io_ring_submit_lock
>>
>> Thanks, the trace doesn't completely explain it, but my blind spot
>> was io_uring_cmd_done() potentially grabbing the mutex. They're
>> supposed to be irq safe, mimicking io_req_task_work_add(); that's how
>> nvme passthrough uses it as well (but at least it doesn't need the
>> cancellation bits).
>>
>> One option is to replace it with a spinlock, the other is to delay
>> the io_uring_cmd_del_cancelable() call to the task_work callback.
>> The latter would be cleaner and more preferable, but I'm lacking
>> context to tell if that would be correct. Ming, what do you think?
> 
> I prefer the latter approach because the two cancelable helpers
> run in the fast path.
> 
> Looks like all the new io_uring_cmd_complete() calls in ublk have this issue, and the
> following patch should avoid them all.

The one I have in mind on top of the current tree would be like below.
Untested, and doesn't allow this cancellation thing for iopoll. I'll
prepare something tomorrow.


diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index e45d4cd5ef82..000ba435451c 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -14,19 +14,15 @@
  #include "rsrc.h"
  #include "uring_cmd.h"
  
-static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
-		unsigned int issue_flags)
+static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
  {
  	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
-	struct io_ring_ctx *ctx = req->ctx;
  
  	if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
  		return;
  
  	cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
-	io_ring_submit_lock(ctx, issue_flags);
  	hlist_del(&req->hash_node);
-	io_ring_submit_unlock(ctx, issue_flags);
  }
  
  /*
@@ -80,6 +76,15 @@ static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
  	req->big_cqe.extra2 = extra2;
  }
  
+static void io_req_task_cmd_complete(struct io_kiocb *req,
+				     struct io_tw_state *ts)
+{
+	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+
+	io_uring_cmd_del_cancelable(ioucmd);
+	io_req_task_complete(req, ts);
+}
+
  /*
   * Called by consumers of io_uring_cmd, if they originally returned
   * -EIOCBQUEUED upon receiving the command.
@@ -89,8 +94,6 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
  {
  	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
  
-	io_uring_cmd_del_cancelable(ioucmd, issue_flags);
-
  	if (ret < 0)
  		req_set_fail(req);
  
@@ -105,7 +108,7 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
  			return;
  		io_req_complete_defer(req);
  	} else {
-		req->io_task_work.func = io_req_task_complete;
+		req->io_task_work.func = io_req_task_cmd_complete;
  		io_req_task_work_add(req);
  	}
  }
Pavel Begunkov March 16, 2024, 4:20 a.m. UTC | #9
On 3/16/24 04:13, Pavel Begunkov wrote:
> On 3/16/24 03:54, Ming Lei wrote:
>> On Sat, Mar 16, 2024 at 02:54:19AM +0000, Pavel Begunkov wrote:
>>> On 3/16/24 02:24, Ming Lei wrote:
>>>> On Sat, Mar 16, 2024 at 10:04 AM Ming Lei <ming.lei@redhat.com> wrote:
>>>>>
>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>
>>>>>> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
>>>>>>> Patch 1 is a fix.
>>>>>>>
>>>>>>> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
>>>>>>> misunderstandings of the flags and of the tw state. It'd be great to have
>>>>>>> them even without the rest.
>>>>>>>
>>>>>>> 8-11 mandate ctx locking for task_work and finally remove the CQE
>>>>>>> caches; instead we post directly into the CQ. Note that the cache is
>>>>>>> used by multishot auxiliary completions.
>>>>>>>
>>>>>>> [...]
>>>>>>
>>>>>> Applied, thanks!
>>>>>
>>>>> Hi Jens and Pavel,
>>>>>
>>>>> Looks like this patch causes a hang when running './check ublk/002' in blktests.
>>>>
>>>> Haven't taken a close look, but I guess it hangs in
>>>>
>>>> io_uring_cmd_del_cancelable() -> io_ring_submit_lock
>>>
>>> Thanks, the trace doesn't completely explain it, but my blind spot
>>> was io_uring_cmd_done() potentially grabbing the mutex. They're
>>> supposed to be irq safe, mimicking io_req_task_work_add(); that's how
>>> nvme passthrough uses it as well (but at least it doesn't need the
>>> cancellation bits).
>>>
>>> One option is to replace it with a spinlock, the other is to delay
>>> the io_uring_cmd_del_cancelable() call to the task_work callback.
>>> The latter would be cleaner and more preferable, but I'm lacking
>>> context to tell if that would be correct. Ming, what do you think?
>>
>> I prefer the latter approach because the two cancelable helpers
>> run in the fast path.
>>
>> Looks like all the new io_uring_cmd_complete() calls in ublk have this issue, and the
>> following patch should avoid them all.
> 
> The one I have in mind on top of the current tree would be like below.
> Untested, and doesn't allow this cancellation thing for iopoll. I'll
> prepare something tomorrow.
> 
> 
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index e45d4cd5ef82..000ba435451c 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -14,19 +14,15 @@
>   #include "rsrc.h"
>   #include "uring_cmd.h"
> 
> -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
> -        unsigned int issue_flags)
> +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
>   {
>       struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> -    struct io_ring_ctx *ctx = req->ctx;
> 
>       if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
>           return;
> 
>       cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
> -    io_ring_submit_lock(ctx, issue_flags);
>       hlist_del(&req->hash_node);
> -    io_ring_submit_unlock(ctx, issue_flags);
>   }
> 
>   /*
> @@ -80,6 +76,15 @@ static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
>       req->big_cqe.extra2 = extra2;
>   }
> 
> +static void io_req_task_cmd_complete(struct io_kiocb *req,
> +                     struct io_tw_state *ts)
> +{
> +    struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
> +
> +    io_uring_cmd_del_cancelable(ioucmd);
> +    io_req_task_complete(req, ts);
> +}
> +
>   /*
>    * Called by consumers of io_uring_cmd, if they originally returned
>    * -EIOCBQUEUED upon receiving the command.
> @@ -89,8 +94,6 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>   {
>       struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
> 
> -    io_uring_cmd_del_cancelable(ioucmd, issue_flags);
> -
>       if (ret < 0)
>           req_set_fail(req);
> 
> @@ -105,7 +108,7 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>               return;

Not very well thought through... There should be a *del_cancelable call
here as well

>           io_req_complete_defer(req);
>       } else {
> -        req->io_task_work.func = io_req_task_complete;
> +        req->io_task_work.func = io_req_task_cmd_complete;
>           io_req_task_work_add(req);
>       }
>   }
>
Ming Lei March 16, 2024, 9:53 a.m. UTC | #10
On Sat, Mar 16, 2024 at 04:20:25AM +0000, Pavel Begunkov wrote:
> On 3/16/24 04:13, Pavel Begunkov wrote:
> > On 3/16/24 03:54, Ming Lei wrote:
> > > On Sat, Mar 16, 2024 at 02:54:19AM +0000, Pavel Begunkov wrote:
> > > > On 3/16/24 02:24, Ming Lei wrote:
> > > > > On Sat, Mar 16, 2024 at 10:04 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > > > 
> > > > > > On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> > > > > > > 
> > > > > > > On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
> > > > > > > > Patch 1 is a fix.
> > > > > > > > 
> > > > > > > > Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> > > > > > > > misunderstandings of the flags and of the tw state. It'd be great to have
> > > > > > > > them even without the rest.
> > > > > > > > 
> > > > > > > > 8-11 mandate ctx locking for task_work and finally remove the CQE
> > > > > > > > caches; instead we post directly into the CQ. Note that the cache is
> > > > > > > > used by multishot auxiliary completions.
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > 
> > > > > > > Applied, thanks!
> > > > > > 
> > > > > > Hi Jens and Pavel,
> > > > > > 
> > > > > > Looks like this patch causes a hang when running './check ublk/002' in blktests.
> > > > > 
> > > > > Haven't taken a close look, but I guess it hangs in
> > > > > 
> > > > > io_uring_cmd_del_cancelable() -> io_ring_submit_lock
> > > > 
> > > > Thanks, the trace doesn't completely explain it, but my blind spot
> > > > was io_uring_cmd_done() potentially grabbing the mutex. They're
> > > > supposed to be irq safe, mimicking io_req_task_work_add(); that's how
> > > > nvme passthrough uses it as well (but at least it doesn't need the
> > > > cancellation bits).
> > > > 
> > > > One option is to replace it with a spinlock, the other is to delay
> > > > the io_uring_cmd_del_cancelable() call to the task_work callback.
> > > > The latter would be cleaner and more preferable, but I'm lacking
> > > > context to tell if that would be correct. Ming, what do you think?
> > > 
> > > I prefer the latter approach because the two cancelable helpers
> > > run in the fast path.
> > > 
> > > Looks like all the new io_uring_cmd_complete() calls in ublk have this issue, and the
> > > following patch should avoid them all.
> > 
> > The one I have in mind on top of the current tree would be like below.
> > Untested, and doesn't allow this cancellation thing for iopoll. I'll
> > prepare something tomorrow.
> > 
> > 
> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > index e45d4cd5ef82..000ba435451c 100644
> > --- a/io_uring/uring_cmd.c
> > +++ b/io_uring/uring_cmd.c
> > @@ -14,19 +14,15 @@
> >   #include "rsrc.h"
> >   #include "uring_cmd.h"
> > 
> > -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
> > -        unsigned int issue_flags)
> > +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
> >   {
> >       struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> > -    struct io_ring_ctx *ctx = req->ctx;
> > 
> >       if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
> >           return;
> > 
> >       cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
> > -    io_ring_submit_lock(ctx, issue_flags);
> >       hlist_del(&req->hash_node);
> > -    io_ring_submit_unlock(ctx, issue_flags);
> >   }
> > 
> >   /*
> > @@ -80,6 +76,15 @@ static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
> >       req->big_cqe.extra2 = extra2;
> >   }
> > 
> > +static void io_req_task_cmd_complete(struct io_kiocb *req,
> > +                     struct io_tw_state *ts)
> > +{
> > +    struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
> > +
> > +    io_uring_cmd_del_cancelable(ioucmd);
> > +    io_req_task_complete(req, ts);
> > +}
> > +
> >   /*
> >    * Called by consumers of io_uring_cmd, if they originally returned
> >    * -EIOCBQUEUED upon receiving the command.
> > @@ -89,8 +94,6 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
> >   {
> >       struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
> > 
> > -    io_uring_cmd_del_cancelable(ioucmd, issue_flags);
> > -
> >       if (ret < 0)
> >           req_set_fail(req);
> > 
> > @@ -105,7 +108,7 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
> >               return;
> 
> Not very well thought through... There should be a *del_cancelable call
> here as well

Thanks for the fix!

The patch works after adding io_uring_cmd_del_cancelable() in the
`else if (issue_flags & IO_URING_F_COMPLETE_DEFER)' branch.
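
I.e., sketching the combined result of your patch plus that addition,
the tail of io_uring_cmd_done() ends up as:

	} else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
		io_uring_cmd_del_cancelable(ioucmd);
		io_req_complete_defer(req);
	} else {
		req->io_task_work.func = io_req_task_cmd_complete;
		io_req_task_work_add(req);
	}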


Thanks,
Ming
Ming Lei March 16, 2024, 11:52 a.m. UTC | #11
On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> 
> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
> > Patch 1 is a fix.
> > 
> > Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
> > misunderstandings of the flags and of the tw state. It'd be great to have
> > them even without the rest.
> > 
> > 8-11 mandate ctx locking for task_work and finally remove the CQE
> > caches; instead we post directly into the CQ. Note that the cache is
> > used by multishot auxiliary completions.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [02/11] io_uring/cmd: kill one issue_flags to tw conversion
>         commit: 31ab0342cf6434e1e2879d12f0526830ce97365d
> [03/11] io_uring/cmd: fix tw <-> issue_flags conversion
>         commit: b48f3e29b89055894b3f50c657658c325b5b49fd
> [04/11] io_uring/cmd: introduce io_uring_cmd_complete
>         commit: c5b4c92ca69215c0af17e4e9d8c84c8942f3257d
> [05/11] ublk: don't hard code IO_URING_F_UNLOCKED
>         commit: c54cfb81fe1774231fca952eff928389bfc3b2e3
> [06/11] nvme/io_uring: don't hard code IO_URING_F_UNLOCKED
>         commit: 800a90681f3c3383660a8e3e2d279e0f056afaee
> [07/11] io_uring/rw: avoid punting to io-wq directly
>         commit: 56d565d54373c17b7620fc605c899c41968e48d0
> [08/11] io_uring: force tw ctx locking
>         commit: f087cdd065af0418ffc8a9ed39eadc93347efdd5
> [09/11] io_uring: remove struct io_tw_state::locked
>         commit: 339f8d66e996ec52b47221448ff4b3534cc9a58d
> [10/11] io_uring: refactor io_fill_cqe_req_aux
>         commit: 7b31c3964b769a6a16c4e414baa8094b441e498e
> [11/11] io_uring: get rid of intermediate aux cqe caches
>         commit: 5a475a1f47412a44ed184aac04b9ff0aeaa31d65

Hi Jens and Pavel,

The following two errors can be triggered with this patchset
when running some ublk stress tests (io vs. deletion). I don't see
such failures after reverting the 11 patches.

1) error 1

[  318.843517] ------------[ cut here ]------------
[  318.843937] kernel BUG at mm/slub.c:553!
[  318.844235] invalid opcode: 0000 [#1] SMP NOPTI
[  318.844580] CPU: 7 PID: 1475 Comm: kworker/u48:13 Not tainted 6.8.0_io_uring_6.10+ #14
[  318.845133] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[  318.845732] Workqueue: events_unbound io_ring_exit_work
[  318.846104] RIP: 0010:__slab_free+0x152/0x2f0
[  318.846434] Code: 00 4c 89 ff e8 ef 41 bc 00 48 8b 14 24 48 8b 4c 24 20 48 89 44 24 08 48 8b 03 48 c1 e8 09 83 e0 01 88 44 24 13 e9 71 ff4
[  318.851192] RSP: 0018:ffffb490411abcb0 EFLAGS: 00010246
[  318.851574] RAX: ffff8b0e871e44f0 RBX: fffff113841c7900 RCX: 0000000000200010
[  318.852032] RDX: ffff8b0e871e4400 RSI: fffff113841c7900 RDI: ffffb490411abd20
[  318.852521] RBP: ffffb490411abd50 R08: 0000000000000001 R09: ffffffffa17e4deb
[  318.852981] R10: 0000000000200010 R11: 0000000000000024 R12: ffff8b0e80292c00
[  318.853472] R13: ffff8b0e871e4400 R14: ffff8b0e80292c00 R15: ffffffffa17e4deb
[  318.853911] FS:  0000000000000000(0000) GS:ffff8b13e7b80000(0000) knlGS:0000000000000000
[  318.854448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  318.854831] CR2: 00007fbce5249298 CR3: 0000000363020002 CR4: 0000000000770ef0
[  318.855291] PKRU: 55555554
[  318.855533] Call Trace:
[  318.855724]  <TASK>
[  318.855898]  ? die+0x36/0x90
[  318.856121]  ? do_trap+0xdd/0x100
[  318.856389]  ? __slab_free+0x152/0x2f0
[  318.856674]  ? do_error_trap+0x6a/0x90
[  318.856939]  ? __slab_free+0x152/0x2f0
[  318.857202]  ? exc_invalid_op+0x50/0x70
[  318.857505]  ? __slab_free+0x152/0x2f0
[  318.857770]  ? asm_exc_invalid_op+0x1a/0x20
[  318.858056]  ? io_req_caches_free+0x9b/0x100
[  318.858439]  ? io_req_caches_free+0x9b/0x100
[  318.858961]  ? __slab_free+0x152/0x2f0
[  318.859466]  ? __memcg_slab_free_hook+0xd9/0x130
[  318.859941]  ? io_req_caches_free+0x9b/0x100
[  318.860395]  kmem_cache_free+0x2eb/0x3b0
[  318.860826]  io_req_caches_free+0x9b/0x100
[  318.861190]  io_ring_exit_work+0x105/0x5c0
[  318.861496]  ? __schedule+0x3d4/0x1510
[  318.861761]  process_one_work+0x181/0x350
[  318.862042]  worker_thread+0x27e/0x390
[  318.862307]  ? __pfx_worker_thread+0x10/0x10
[  318.862621]  kthread+0xbb/0xf0
[  318.862854]  ? __pfx_kthread+0x10/0x10
[  318.863124]  ret_from_fork+0x31/0x50
[  318.863397]  ? __pfx_kthread+0x10/0x10
[  318.863665]  ret_from_fork_asm+0x1a/0x30
[  318.863943]  </TASK>
[  318.864122] Modules linked in: isofs binfmt_misc xfs vfat fat raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_net net_failover i2g
[  318.865638] ---[ end trace 0000000000000000 ]---
[  318.865966] RIP: 0010:__slab_free+0x152/0x2f0
[  318.866267] Code: 00 4c 89 ff e8 ef 41 bc 00 48 8b 14 24 48 8b 4c 24 20 48 89 44 24 08 48 8b 03 48 c1 e8 09 83 e0 01 88 44 24 13 e9 71 ff4
[  318.867622] RSP: 0018:ffffb490411abcb0 EFLAGS: 00010246
[  318.868103] RAX: ffff8b0e871e44f0 RBX: fffff113841c7900 RCX: 0000000000200010
[  318.868602] RDX: ffff8b0e871e4400 RSI: fffff113841c7900 RDI: ffffb490411abd20
[  318.869051] RBP: ffffb490411abd50 R08: 0000000000000001 R09: ffffffffa17e4deb
[  318.869544] R10: 0000000000200010 R11: 0000000000000024 R12: ffff8b0e80292c00
[  318.870028] R13: ffff8b0e871e4400 R14: ffff8b0e80292c00 R15: ffffffffa17e4deb
[  318.870550] FS:  0000000000000000(0000) GS:ffff8b13e7b80000(0000) knlGS:0000000000000000
[  318.871080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  318.871509] CR2: 00007fbce5249298 CR3: 0000000363020002 CR4: 0000000000770ef0
[  318.871974] PKRU: 55555554

2) error 2

[ 2833.161174] ------------[ cut here ]------------
[ 2833.161527] WARNING: CPU: 11 PID: 22867 at kernel/fork.c:969 __put_task_struct+0x10c/0x180
[ 2833.162114] Modules linked in: isofs binfmt_misc vfat fat xfs raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support i2c_i801 virtio_net i2c_smbus net_failover failover lpc_ich ublk_drv loop zram nvme nvme_core usb_storage crc32c_intel virtio_scsi virtio_blk fuse qemu_fw_cfg
[ 2833.163650] CPU: 11 PID: 22867 Comm: kworker/11:0 Tainted: G      D W          6.8.0_io_uring_6.10+ #14
[ 2833.164289] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 2833.164860] Workqueue: events io_fallback_req_func
[ 2833.165224] RIP: 0010:__put_task_struct+0x10c/0x180
[ 2833.165586] Code: 48 85 d2 74 05 f0 ff 0a 74 44 48 8b 3d d5 b6 c7 02 48 89 ee e8 65 b7 2d 00 eb ac be 03 00 00 00 48 89 ef e8 36 82 70 00 eb 9d <0f> 0b 8b 43 28 85 c0 0f 84 0e ff ff ff 0f 0b 65 48 3b 1d 5d d2 f2
[ 2833.166819] RSP: 0018:ffffb89da07a7df8 EFLAGS: 00010246
[ 2833.167210] RAX: 0000000000000000 RBX: ffff97d7d9332ec0 RCX: 0000000000000000
[ 2833.167685] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff97d7d9332ec0
[ 2833.168167] RBP: ffff97d6cd9cc000 R08: 0000000000000000 R09: 0000000000000000
[ 2833.168664] R10: ffffb89da07a7db0 R11: 0000000000000100 R12: ffff97d7dee497f0
[ 2833.169161] R13: ffff97d7dee497f0 R14: ffff97d7400e9d00 R15: ffff97d6cd9cc410
[ 2833.169621] FS:  0000000000000000(0000) GS:ffff97dc27d80000(0000) knlGS:0000000000000000
[ 2833.170196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2833.170578] CR2: 00007fa731cfe648 CR3: 0000000220b30004 CR4: 0000000000770ef0
[ 2833.171158] PKRU: 55555554
[ 2833.171394] Call Trace:
[ 2833.171596]  <TASK>
[ 2833.171779]  ? __warn+0x80/0x110
[ 2833.172044]  ? __put_task_struct+0x10c/0x180
[ 2833.172367]  ? report_bug+0x150/0x170
[ 2833.172637]  ? handle_bug+0x41/0x70
[ 2833.172899]  ? exc_invalid_op+0x17/0x70
[ 2833.173203]  ? asm_exc_invalid_op+0x1a/0x20
[ 2833.173522]  ? __put_task_struct+0x10c/0x180
[ 2833.173826]  ? io_put_task_remote+0x80/0x90
[ 2833.174153]  __io_submit_flush_completions+0x2bd/0x380
[ 2833.174509]  io_fallback_req_func+0xa3/0x130
[ 2833.174806]  process_one_work+0x181/0x350
[ 2833.175105]  worker_thread+0x27e/0x390
[ 2833.175394]  ? __pfx_worker_thread+0x10/0x10
[ 2833.175690]  kthread+0xbb/0xf0
[ 2833.175920]  ? __pfx_kthread+0x10/0x10
[ 2833.176226]  ret_from_fork+0x31/0x50
[ 2833.176485]  ? __pfx_kthread+0x10/0x10
[ 2833.176751]  ret_from_fork_asm+0x1a/0x30
[ 2833.177044]  </TASK>
[ 2833.177256] ---[ end trace 0000000000000000 ]---
[ 2833.177586] BUG: kernel NULL pointer dereference, address: 00000000000000e8
[ 2833.178054] #PF: supervisor read access in kernel mode
[ 2833.178424] #PF: error_code(0x0000) - not-present page
[ 2833.178776] PGD 21f4f9067 P4D 21f4f9067 PUD 21f4fa067 PMD 0
[ 2833.179182] Oops: 0000 [#3] SMP NOPTI
[ 2833.179464] CPU: 11 PID: 22867 Comm: kworker/11:0 Tainted: G      D W          6.8.0_io_uring_6.10+ #14
[ 2833.180110] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 2833.180692] Workqueue: events io_fallback_req_func
[ 2833.181042] RIP: 0010:percpu_counter_add_batch+0x19/0x80
[ 2833.181430] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 55 41 54 55 53 9c 58 0f 1f 40 00 49 89 c5 fa 0f 1f 44 00 00 <48> 8b 4f 20 48 63 d2 65 48 63 19 49 89 dc 48 01 f3 48 89 d8 48 f7
[ 2833.182623] RSP: 0018:ffffb89da07a7dd0 EFLAGS: 00010006
[ 2833.186362] RAX: 0000000000000206 RBX: ffff97d7d9332ec0 RCX: 0000000000000000
[ 2833.186825] RDX: 0000000000000020 RSI: ffffffffffffffff RDI: 00000000000000c8
[ 2833.187326] RBP: 0000000000000000 R08: 0000000000000246 R09: 0000000000020001
[ 2833.187783] R10: 0000000000020001 R11: 0000000000000032 R12: ffff97d7dee497f0
[ 2833.188284] R13: 0000000000000206 R14: ffff97d7400e9d00 R15: ffff97d6cd9cc410
[ 2833.188741] FS:  0000000000000000(0000) GS:ffff97dc27d80000(0000) knlGS:0000000000000000
[ 2833.189310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2833.189709] CR2: 00000000000000e8 CR3: 0000000220b30004 CR4: 0000000000770ef0
[ 2833.190205] PKRU: 55555554
[ 2833.190418] Call Trace:
[ 2833.190615]  <TASK>
[ 2833.190795]  ? __die+0x23/0x70
[ 2833.191053]  ? page_fault_oops+0x173/0x4f0
[ 2833.191362]  ? exc_page_fault+0x76/0x150
[ 2833.191654]  ? asm_exc_page_fault+0x26/0x30
[ 2833.191968]  ? percpu_counter_add_batch+0x19/0x80
[ 2833.192313]  io_put_task_remote+0x2a/0x90
[ 2833.192594]  __io_submit_flush_completions+0x2bd/0x380
[ 2833.192944]  io_fallback_req_func+0xa3/0x130
[ 2833.193273]  process_one_work+0x181/0x350
[ 2833.193550]  worker_thread+0x27e/0x390
[ 2833.193813]  ? __pfx_worker_thread+0x10/0x10
[ 2833.194123]  kthread+0xbb/0xf0
[ 2833.194369]  ? __pfx_kthread+0x10/0x10
[ 2833.194638]  ret_from_fork+0x31/0x50
[ 2833.194899]  ? __pfx_kthread+0x10/0x10
[ 2833.195213]  ret_from_fork_asm+0x1a/0x30
[ 2833.195484]  </TASK>
[ 2833.195661] Modules linked in: isofs binfmt_misc vfat fat xfs raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support i2c_i801 virtio_net i2c_smbus net_failover failover lpc_ich ublk_drv loop zram nvme nvme_core usb_storage crc32c_intel virtio_scsi virtio_blk fuse qemu_fw_cfg
[ 2833.197148] CR2: 00000000000000e8
[ 2833.197400] ---[ end trace 0000000000000000 ]---
[ 2833.197714] RIP: 0010:percpu_counter_add_batch+0x19/0x80
[ 2833.198078] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 55 41 54 55 53 9c 58 0f 1f 40 00 49 89 c5 fa 0f 1f 44 00 00 <48> 8b 4f 20 48 63 d2 65 48 63 19 49 89 dc 48 01 f3 48 89 d8 48 f7
[ 2833.199261] RSP: 0018:ffffb89d8b0f7dd0 EFLAGS: 00010006
[ 2833.199599] RAX: 0000000000000206 RBX: ffff97d77b830000 RCX: 0000000080020001
[ 2833.200051] RDX: 0000000000000020 RSI: ffffffffffffffff RDI: 00000000000000c8
[ 2833.200515] RBP: 0000000000000000 R08: ffff97d77b830000 R09: 0000000080020001
[ 2833.200956] R10: 0000000080020001 R11: 0000000000000016 R12: ffff97d75210c6c0
[ 2833.201439] R13: 0000000000000206 R14: ffff97d7518f3800 R15: ffff97d6c304bc10
[ 2833.201894] FS:  0000000000000000(0000) GS:ffff97dc27d80000(0000) knlGS:0000000000000000
[ 2833.202455] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2833.202832] CR2: 00000000000000e8 CR3: 0000000220b30004 CR4: 0000000000770ef0
[ 2833.203316] PKRU: 55555554
[ 2833.203524] note: kworker/11:0[22867] exited with irqs disabled


Thanks, 
Ming
Pavel Begunkov March 16, 2024, 1:27 p.m. UTC | #12
On 3/16/24 11:52, Ming Lei wrote:
> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>
>> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
>>> Patch 1 is a fix.
>>>
>>> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
>>> misunderstandings of the flags and of the tw state. It'd be great to have
>>> them even without the rest.
>>>
>>> 8-11 mandate ctx locking for task_work and finally remove the CQE
>>> caches; instead we post directly into the CQ. Note that the cache is
>>> used by multishot auxiliary completions.
>>>
>>> [...]
>>
>> Applied, thanks!
>>
>> [02/11] io_uring/cmd: kill one issue_flags to tw conversion
>>          commit: 31ab0342cf6434e1e2879d12f0526830ce97365d
>> [03/11] io_uring/cmd: fix tw <-> issue_flags conversion
>>          commit: b48f3e29b89055894b3f50c657658c325b5b49fd
>> [04/11] io_uring/cmd: introduce io_uring_cmd_complete
>>          commit: c5b4c92ca69215c0af17e4e9d8c84c8942f3257d
>> [05/11] ublk: don't hard code IO_URING_F_UNLOCKED
>>          commit: c54cfb81fe1774231fca952eff928389bfc3b2e3
>> [06/11] nvme/io_uring: don't hard code IO_URING_F_UNLOCKED
>>          commit: 800a90681f3c3383660a8e3e2d279e0f056afaee
>> [07/11] io_uring/rw: avoid punting to io-wq directly
>>          commit: 56d565d54373c17b7620fc605c899c41968e48d0
>> [08/11] io_uring: force tw ctx locking
>>          commit: f087cdd065af0418ffc8a9ed39eadc93347efdd5
>> [09/11] io_uring: remove struct io_tw_state::locked
>>          commit: 339f8d66e996ec52b47221448ff4b3534cc9a58d
>> [10/11] io_uring: refactor io_fill_cqe_req_aux
>>          commit: 7b31c3964b769a6a16c4e414baa8094b441e498e
>> [11/11] io_uring: get rid of intermediate aux cqe caches
>>          commit: 5a475a1f47412a44ed184aac04b9ff0aeaa31d65
> 
> Hi Jens and Pavel,

Jens, I hope you already dropped the series for now, right?

> 
> The following two errors can be triggered with this patchset
> when running some ublk stress tests (io vs. deletion). I don't see
> such failures after reverting the 11 patches.

I suppose it's with the fix from yesterday. How can I
reproduce it, blktests?



> 1) error 1
> 
> [  318.843517] ------------[ cut here ]------------
> [  318.843937] kernel BUG at mm/slub.c:553!
> [  318.844235] invalid opcode: 0000 [#1] SMP NOPTI
> [  318.844580] CPU: 7 PID: 1475 Comm: kworker/u48:13 Not tainted 6.8.0_io_uring_6.10+ #14
> [  318.845133] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> [  318.845732] Workqueue: events_unbound io_ring_exit_work
> [  318.846104] RIP: 0010:__slab_free+0x152/0x2f0
> [  318.846434] Code: 00 4c 89 ff e8 ef 41 bc 00 48 8b 14 24 48 8b 4c 24 20 48 89 44 24 08 48 8b 03 48 c1 e8 09 83 e0 01 88 44 24 13 e9 71 ff4
> [  318.851192] RSP: 0018:ffffb490411abcb0 EFLAGS: 00010246
> [  318.851574] RAX: ffff8b0e871e44f0 RBX: fffff113841c7900 RCX: 0000000000200010
> [  318.852032] RDX: ffff8b0e871e4400 RSI: fffff113841c7900 RDI: ffffb490411abd20
> [  318.852521] RBP: ffffb490411abd50 R08: 0000000000000001 R09: ffffffffa17e4deb
> [  318.852981] R10: 0000000000200010 R11: 0000000000000024 R12: ffff8b0e80292c00
> [  318.853472] R13: ffff8b0e871e4400 R14: ffff8b0e80292c00 R15: ffffffffa17e4deb
> [  318.853911] FS:  0000000000000000(0000) GS:ffff8b13e7b80000(0000) knlGS:0000000000000000
> [  318.854448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  318.854831] CR2: 00007fbce5249298 CR3: 0000000363020002 CR4: 0000000000770ef0
> [  318.855291] PKRU: 55555554
> [  318.855533] Call Trace:
> [  318.855724]  <TASK>
> [  318.855898]  ? die+0x36/0x90
> [  318.856121]  ? do_trap+0xdd/0x100
> [  318.856389]  ? __slab_free+0x152/0x2f0
> [  318.856674]  ? do_error_trap+0x6a/0x90
> [  318.856939]  ? __slab_free+0x152/0x2f0
> [  318.857202]  ? exc_invalid_op+0x50/0x70
> [  318.857505]  ? __slab_free+0x152/0x2f0
> [  318.857770]  ? asm_exc_invalid_op+0x1a/0x20
> [  318.858056]  ? io_req_caches_free+0x9b/0x100
> [  318.858439]  ? io_req_caches_free+0x9b/0x100
> [  318.858961]  ? __slab_free+0x152/0x2f0
> [  318.859466]  ? __memcg_slab_free_hook+0xd9/0x130
> [  318.859941]  ? io_req_caches_free+0x9b/0x100
> [  318.860395]  kmem_cache_free+0x2eb/0x3b0
> [  318.860826]  io_req_caches_free+0x9b/0x100
> [  318.861190]  io_ring_exit_work+0x105/0x5c0
> [  318.861496]  ? __schedule+0x3d4/0x1510
> [  318.861761]  process_one_work+0x181/0x350
> [  318.862042]  worker_thread+0x27e/0x390
> [  318.862307]  ? __pfx_worker_thread+0x10/0x10
> [  318.862621]  kthread+0xbb/0xf0
> [  318.862854]  ? __pfx_kthread+0x10/0x10
> [  318.863124]  ret_from_fork+0x31/0x50
> [  318.863397]  ? __pfx_kthread+0x10/0x10
> [  318.863665]  ret_from_fork_asm+0x1a/0x30
> [  318.863943]  </TASK>
> [  318.864122] Modules linked in: isofs binfmt_misc xfs vfat fat raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_net net_failover i2g
> [  318.865638] ---[ end trace 0000000000000000 ]---
> [  318.865966] RIP: 0010:__slab_free+0x152/0x2f0
> [  318.866267] Code: 00 4c 89 ff e8 ef 41 bc 00 48 8b 14 24 48 8b 4c 24 20 48 89 44 24 08 48 8b 03 48 c1 e8 09 83 e0 01 88 44 24 13 e9 71 ff4
> [  318.867622] RSP: 0018:ffffb490411abcb0 EFLAGS: 00010246
> [  318.868103] RAX: ffff8b0e871e44f0 RBX: fffff113841c7900 RCX: 0000000000200010
> [  318.868602] RDX: ffff8b0e871e4400 RSI: fffff113841c7900 RDI: ffffb490411abd20
> [  318.869051] RBP: ffffb490411abd50 R08: 0000000000000001 R09: ffffffffa17e4deb
> [  318.869544] R10: 0000000000200010 R11: 0000000000000024 R12: ffff8b0e80292c00
> [  318.870028] R13: ffff8b0e871e4400 R14: ffff8b0e80292c00 R15: ffffffffa17e4deb
> [  318.870550] FS:  0000000000000000(0000) GS:ffff8b13e7b80000(0000) knlGS:0000000000000000
> [  318.871080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  318.871509] CR2: 00007fbce5249298 CR3: 0000000363020002 CR4: 0000000000770ef0
> [  318.871974] PKRU: 55555554
> 
> 2) error 2
> 
> [ 2833.161174] ------------[ cut here ]------------
> [ 2833.161527] WARNING: CPU: 11 PID: 22867 at kernel/fork.c:969 __put_task_struct+0x10c/0x180
> [ 2833.162114] Modules linked in: isofs binfmt_misc vfat fat xfs raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support i2c_i801 virtio_net i2c_smbus net_failover failover lpc_ich ublk_drv loop zram nvme nvme_core usb_storage crc32c_intel virtio_scsi virtio_blk fuse qemu_fw_cfg
> [ 2833.163650] CPU: 11 PID: 22867 Comm: kworker/11:0 Tainted: G      D W          6.8.0_io_uring_6.10+ #14
> [ 2833.164289] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> [ 2833.164860] Workqueue: events io_fallback_req_func
> [ 2833.165224] RIP: 0010:__put_task_struct+0x10c/0x180
> [ 2833.165586] Code: 48 85 d2 74 05 f0 ff 0a 74 44 48 8b 3d d5 b6 c7 02 48 89 ee e8 65 b7 2d 00 eb ac be 03 00 00 00 48 89 ef e8 36 82 70 00 eb 9d <0f> 0b 8b 43 28 85 c0 0f 84 0e ff ff ff 0f 0b 65 48 3b 1d 5d d2 f2
> [ 2833.166819] RSP: 0018:ffffb89da07a7df8 EFLAGS: 00010246
> [ 2833.167210] RAX: 0000000000000000 RBX: ffff97d7d9332ec0 RCX: 0000000000000000
> [ 2833.167685] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff97d7d9332ec0
> [ 2833.168167] RBP: ffff97d6cd9cc000 R08: 0000000000000000 R09: 0000000000000000
> [ 2833.168664] R10: ffffb89da07a7db0 R11: 0000000000000100 R12: ffff97d7dee497f0
> [ 2833.169161] R13: ffff97d7dee497f0 R14: ffff97d7400e9d00 R15: ffff97d6cd9cc410
> [ 2833.169621] FS:  0000000000000000(0000) GS:ffff97dc27d80000(0000) knlGS:0000000000000000
> [ 2833.170196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2833.170578] CR2: 00007fa731cfe648 CR3: 0000000220b30004 CR4: 0000000000770ef0
> [ 2833.171158] PKRU: 55555554
> [ 2833.171394] Call Trace:
> [ 2833.171596]  <TASK>
> [ 2833.171779]  ? __warn+0x80/0x110
> [ 2833.172044]  ? __put_task_struct+0x10c/0x180
> [ 2833.172367]  ? report_bug+0x150/0x170
> [ 2833.172637]  ? handle_bug+0x41/0x70
> [ 2833.172899]  ? exc_invalid_op+0x17/0x70
> [ 2833.173203]  ? asm_exc_invalid_op+0x1a/0x20
> [ 2833.173522]  ? __put_task_struct+0x10c/0x180
> [ 2833.173826]  ? io_put_task_remote+0x80/0x90
> [ 2833.174153]  __io_submit_flush_completions+0x2bd/0x380
> [ 2833.174509]  io_fallback_req_func+0xa3/0x130
> [ 2833.174806]  process_one_work+0x181/0x350
> [ 2833.175105]  worker_thread+0x27e/0x390
> [ 2833.175394]  ? __pfx_worker_thread+0x10/0x10
> [ 2833.175690]  kthread+0xbb/0xf0
> [ 2833.175920]  ? __pfx_kthread+0x10/0x10
> [ 2833.176226]  ret_from_fork+0x31/0x50
> [ 2833.176485]  ? __pfx_kthread+0x10/0x10
> [ 2833.176751]  ret_from_fork_asm+0x1a/0x30
> [ 2833.177044]  </TASK>
> [ 2833.177256] ---[ end trace 0000000000000000 ]---
> [ 2833.177586] BUG: kernel NULL pointer dereference, address: 00000000000000e8
> [ 2833.178054] #PF: supervisor read access in kernel mode
> [ 2833.178424] #PF: error_code(0x0000) - not-present page
> [ 2833.178776] PGD 21f4f9067 P4D 21f4f9067 PUD 21f4fa067 PMD 0
> [ 2833.179182] Oops: 0000 [#3] SMP NOPTI
> [ 2833.179464] CPU: 11 PID: 22867 Comm: kworker/11:0 Tainted: G      D W          6.8.0_io_uring_6.10+ #14
> [ 2833.180110] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> [ 2833.180692] Workqueue: events io_fallback_req_func
> [ 2833.181042] RIP: 0010:percpu_counter_add_batch+0x19/0x80
> [ 2833.181430] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 55 41 54 55 53 9c 58 0f 1f 40 00 49 89 c5 fa 0f 1f 44 00 00 <48> 8b 4f 20 48 63 d2 65 48 63 19 49 89 dc 48 01 f3 48 89 d8 48 f7
> [ 2833.182623] RSP: 0018:ffffb89da07a7dd0 EFLAGS: 00010006
> [ 2833.186362] RAX: 0000000000000206 RBX: ffff97d7d9332ec0 RCX: 0000000000000000
> [ 2833.186825] RDX: 0000000000000020 RSI: ffffffffffffffff RDI: 00000000000000c8
> [ 2833.187326] RBP: 0000000000000000 R08: 0000000000000246 R09: 0000000000020001
> [ 2833.187783] R10: 0000000000020001 R11: 0000000000000032 R12: ffff97d7dee497f0
> [ 2833.188284] R13: 0000000000000206 R14: ffff97d7400e9d00 R15: ffff97d6cd9cc410
> [ 2833.188741] FS:  0000000000000000(0000) GS:ffff97dc27d80000(0000) knlGS:0000000000000000
> [ 2833.189310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2833.189709] CR2: 00000000000000e8 CR3: 0000000220b30004 CR4: 0000000000770ef0
> [ 2833.190205] PKRU: 55555554
> [ 2833.190418] Call Trace:
> [ 2833.190615]  <TASK>
> [ 2833.190795]  ? __die+0x23/0x70
> [ 2833.191053]  ? page_fault_oops+0x173/0x4f0
> [ 2833.191362]  ? exc_page_fault+0x76/0x150
> [ 2833.191654]  ? asm_exc_page_fault+0x26/0x30
> [ 2833.191968]  ? percpu_counter_add_batch+0x19/0x80
> [ 2833.192313]  io_put_task_remote+0x2a/0x90
> [ 2833.192594]  __io_submit_flush_completions+0x2bd/0x380
> [ 2833.192944]  io_fallback_req_func+0xa3/0x130
> [ 2833.193273]  process_one_work+0x181/0x350
> [ 2833.193550]  worker_thread+0x27e/0x390
> [ 2833.193813]  ? __pfx_worker_thread+0x10/0x10
> [ 2833.194123]  kthread+0xbb/0xf0
> [ 2833.194369]  ? __pfx_kthread+0x10/0x10
> [ 2833.194638]  ret_from_fork+0x31/0x50
> [ 2833.194899]  ? __pfx_kthread+0x10/0x10
> [ 2833.195213]  ret_from_fork_asm+0x1a/0x30
> [ 2833.195484]  </TASK>
> [ 2833.195661] Modules linked in: isofs binfmt_misc vfat fat xfs raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support i2c_i801 virtio_net i2c_smbus net_failover failover lpc_ich ublk_drv loop zram nvme nvme_core usb_storage crc32c_intel virtio_scsi virtio_blk fuse qemu_fw_cfg
> [ 2833.197148] CR2: 00000000000000e8
> [ 2833.197400] ---[ end trace 0000000000000000 ]---
> [ 2833.197714] RIP: 0010:percpu_counter_add_batch+0x19/0x80
> [ 2833.198078] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 55 41 54 55 53 9c 58 0f 1f 40 00 49 89 c5 fa 0f 1f 44 00 00 <48> 8b 4f 20 48 63 d2 65 48 63 19 49 89 dc 48 01 f3 48 89 d8 48 f7
> [ 2833.199261] RSP: 0018:ffffb89d8b0f7dd0 EFLAGS: 00010006
> [ 2833.199599] RAX: 0000000000000206 RBX: ffff97d77b830000 RCX: 0000000080020001
> [ 2833.200051] RDX: 0000000000000020 RSI: ffffffffffffffff RDI: 00000000000000c8
> [ 2833.200515] RBP: 0000000000000000 R08: ffff97d77b830000 R09: 0000000080020001
> [ 2833.200956] R10: 0000000080020001 R11: 0000000000000016 R12: ffff97d75210c6c0
> [ 2833.201439] R13: 0000000000000206 R14: ffff97d7518f3800 R15: ffff97d6c304bc10
> [ 2833.201894] FS:  0000000000000000(0000) GS:ffff97dc27d80000(0000) knlGS:0000000000000000
> [ 2833.202455] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2833.202832] CR2: 00000000000000e8 CR3: 0000000220b30004 CR4: 0000000000770ef0
> [ 2833.203316] PKRU: 55555554
> [ 2833.203524] note: kworker/11:0[22867] exited with irqs disabled
> 
> 
> Thanks,
> Ming
>
Ming Lei March 16, 2024, 1:56 p.m. UTC | #13
On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
> On 3/16/24 11:52, Ming Lei wrote:
> > On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:

...

> > The following two errors can be triggered with this patchset
> > when running some ublk stress tests (io vs. deletion). I don't see
> > such failures after reverting the 11 patches.
> 
> I suppose it's with the fix from yesterday. How can I
> reproduce it, blktests?

Yeah, it needs yesterday's fix.

You may need to run this test multiple times to trigger the problem:

1) git clone https://github.com/ublk-org/ublksrv.git

2) cd ublksrv

3) make test T=generic/004


Thanks,
Ming
Jens Axboe March 16, 2024, 2:39 p.m. UTC | #14
On 3/16/24 7:27 AM, Pavel Begunkov wrote:
> On 3/16/24 11:52, Ming Lei wrote:
>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>
>>> On Fri, 15 Mar 2024 15:29:50 +0000, Pavel Begunkov wrote:
>>>> Patch 1 is a fix.
>>>>
>>>> Patches 2-7 are cleanups mainly dealing with issue_flags conversions,
>>>> misunderstandings of the flags and of the tw state. It'd be great to have
>>>> them even without the rest.
>>>>
>>>> 8-11 mandate ctx locking for task_work and finally remove the CQE
>>>> caches; instead we post directly into the CQ. Note that the cache is
>>>> used by multishot auxiliary completions.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>>
>>> [02/11] io_uring/cmd: kill one issue_flags to tw conversion
>>>          commit: 31ab0342cf6434e1e2879d12f0526830ce97365d
>>> [03/11] io_uring/cmd: fix tw <-> issue_flags conversion
>>>          commit: b48f3e29b89055894b3f50c657658c325b5b49fd
>>> [04/11] io_uring/cmd: introduce io_uring_cmd_complete
>>>          commit: c5b4c92ca69215c0af17e4e9d8c84c8942f3257d
>>> [05/11] ublk: don't hard code IO_URING_F_UNLOCKED
>>>          commit: c54cfb81fe1774231fca952eff928389bfc3b2e3
>>> [06/11] nvme/io_uring: don't hard code IO_URING_F_UNLOCKED
>>>          commit: 800a90681f3c3383660a8e3e2d279e0f056afaee
>>> [07/11] io_uring/rw: avoid punting to io-wq directly
>>>          commit: 56d565d54373c17b7620fc605c899c41968e48d0
>>> [08/11] io_uring: force tw ctx locking
>>>          commit: f087cdd065af0418ffc8a9ed39eadc93347efdd5
>>> [09/11] io_uring: remove struct io_tw_state::locked
>>>          commit: 339f8d66e996ec52b47221448ff4b3534cc9a58d
>>> [10/11] io_uring: refactor io_fill_cqe_req_aux
>>>          commit: 7b31c3964b769a6a16c4e414baa8094b441e498e
>>> [11/11] io_uring: get rid of intermediate aux cqe caches
>>>          commit: 5a475a1f47412a44ed184aac04b9ff0aeaa31d65
>>
>> Hi Jens and Pavel,
> 
> Jens, I hope you already dropped the series for now, right?

It's just sitting in a branch for now; it's not even in linux-next. I'll
review and look at a v2 of the series. So it hasn't moved anywhere yet.
Pavel Begunkov March 17, 2024, 8:55 p.m. UTC | #15
On 3/16/24 13:56, Ming Lei wrote:
> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>> On 3/16/24 11:52, Ming Lei wrote:
>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> 
> ...
> 
>>> The following two errors can be triggered with this patchset
>>> when running some ublk stress tests (io vs. deletion). I don't see
>>> such failures after reverting the 11 patches.
>>
>> I suppose it's with the fix from yesterday. How can I
>> reproduce it, blktests?
> 
> Yeah, it needs yesterday's fix.
> 
> You may need to run this test multiple times to trigger the problem:

Thanks for all the testing. I've tried it; all ublk/generic tests hang
in userspace waiting for CQEs, but there are no complaints from the kernel.
However, it seems the branch is buggy even without my patches: I
consistently (5-15 minutes of running in a slow VM) hit a page underflow
by running liburing tests. Not sure what that is yet, but it might also
be the reason.

I'll repost it with the locking fix for reference; it would make more
sense to retest ublk after figuring out what's up with the branch.


> 1) git clone https://github.com/ublk-org/ublksrv.git
> 
> 2) cd ublksrv
> 
> 3) make test T=generic/004
Jens Axboe March 17, 2024, 9:24 p.m. UTC | #16
On 3/17/24 2:55 PM, Pavel Begunkov wrote:
> On 3/16/24 13:56, Ming Lei wrote:
>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>> On 3/16/24 11:52, Ming Lei wrote:
>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>
>> ...
>>
>>>> The following two errors can be triggered with this patchset
>>>> when running some ublk stress tests (io vs. deletion). I don't see
>>>> such failures after reverting the 11 patches.
>>>
>>> I suppose it's with the fix from yesterday. How can I
>>> reproduce it, blktests?
>>
>> Yeah, it needs yesterday's fix.
>>
>> You may need to run this test multiple times to trigger the problem:
> 
> Thanks for all the testing. I've tried it; all ublk/generic tests hang
> in userspace waiting for CQEs, but there are no complaints from the kernel.
> However, it seems the branch is buggy even without my patches: I
> consistently (5-15 minutes of running in a slow VM) hit a page underflow
> by running liburing tests. Not sure what that is yet, but it might also
> be the reason.

Hmm odd, there's nothing in there but your series and then the
io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
merge window -git cycle? Does it happen with io_uring-6.9 as well? I
haven't seen anything odd.

> I'll repost it with the locking fix for reference; it would make more
> sense to retest ublk after figuring out what's up with the branch.

Yep, if you repost it with the fix, I'll rebase for-6.10/io_uring.
Pavel Begunkov March 17, 2024, 9:29 p.m. UTC | #17
On 3/17/24 21:24, Jens Axboe wrote:
> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>> On 3/16/24 13:56, Ming Lei wrote:
>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>
>>> ...
>>>
>>>>> The following two errors can be triggered with this patchset
>>>>> when running some ublk stress tests (io vs. deletion). I don't see
>>>>> such failures after reverting the 11 patches.
>>>>
>>>> I suppose it's with the fix from yesterday. How can I
>>>> reproduce it, blktests?
>>>
>>> Yeah, it needs yesterday's fix.
>>>
>>> You may need to run this test multiple times to trigger the problem:
>>
>> Thanks for all the testing. I've tried it; all ublk/generic tests hang
>> in userspace waiting for CQEs, but there are no complaints from the kernel.
>> However, it seems the branch is buggy even without my patches: I
>> consistently (5-15 minutes of running in a slow VM) hit a page underflow
>> by running liburing tests. Not sure what that is yet, but it might also
>> be the reason.
> 
> Hmm odd, there's nothing in there but your series and then the
> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
> haven't seen anything odd.

I need to test io_uring-6.9. I actually checked the branch twice, both
with the issue, and from the full recompilation and config prompts I
assumed you had pulled something in between (maybe not).

And yeah, I can't confirm it's specifically an io_uring bug; the
stack trace is usually some unmap or task exit, and sometimes it only
shows when you try to shut down the VM after the tests.

>> I'll repost it with the locking fix for reference; it would make more
>> sense to retest ublk after figuring out what's up with the branch.
> 
> Yep, if you repost it with the fix, I'll rebase for-6.10/io_uring.
>
Jens Axboe March 17, 2024, 9:32 p.m. UTC | #18
On 3/17/24 3:29 PM, Pavel Begunkov wrote:
> On 3/17/24 21:24, Jens Axboe wrote:
>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>> On 3/16/24 13:56, Ming Lei wrote:
>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>
>>>> ...
>>>>
>>>>>> The following two error can be triggered with this patchset
>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>> such failures after reverting the 11 patches.
>>>>>
>>>>> I suppose it's with the fix from yesterday. How can I
>>>>> reproduce it, blktests?
>>>>
>>>> Yeah, it needs yesterday's fix.
>>>>
>>>> You may need to run this test multiple times for triggering the problem:
>>>
>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>> in userspace waiting for CQEs but no complaints from the kernel.
>>> However, it seems the branch is buggy even without my patches, I
>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>> by running liburing tests. Not sure what is that yet, but might also
>>> be the reason.
>>
>> Hmm odd, there's nothing in there but your series and then the
>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>> haven't seen anything odd.
> 
> Need to test io_uring-6.9. I actually checked the branch twice, both
> with the issue, and by full recompilation and config prompts I assumed
> you pulled something in between (maybe not).
> 
> And yeah, I can't confirm it's specifically an io_uring bug, the
> stack trace is usually some unmap or task exit, sometimes it only
> shows when you try to shutdown the VM after tests.

Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
test case as well on io_uring-6.9 and it all worked fine. Trying
liburing tests on for-6.10/io_uring as well now, but didn't see anything
the other times I ran it. In any case, once you repost I'll rebase and
then let's see if it hits again.

Did you run with KASAN enabled?
Pavel Begunkov March 17, 2024, 9:34 p.m. UTC | #19
On 3/17/24 21:32, Jens Axboe wrote:
> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>> On 3/17/24 21:24, Jens Axboe wrote:
>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>
>>>>> ...
>>>>>
>>>>>>> The following two error can be triggered with this patchset
>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>> such failures after reverting the 11 patches.
>>>>>>
>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>> reproduce it, blktests?
>>>>>
>>>>> Yeah, it needs yesterday's fix.
>>>>>
>>>>> You may need to run this test multiple times for triggering the problem:
>>>>
>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>> However, it seems the branch is buggy even without my patches, I
>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>> by running liburing tests. Not sure what is that yet, but might also
>>>> be the reason.
>>>
>>> Hmm odd, there's nothing in there but your series and then the
>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>> haven't seen anything odd.
>>
>> Need to test io_uring-6.9. I actually checked the branch twice, both
>> with the issue, and by full recompilation and config prompts I assumed
>> you pulled something in between (maybe not).
>>
>> And yeah, I can't confirm it's specifically an io_uring bug, the
>> stack trace is usually some unmap or task exit, sometimes it only
>> shows when you try to shutdown the VM after tests.
> 
> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
> test case as well on io_uring-6.9 and it all worked fine. Trying
> liburing tests on for-6.10/io_uring as well now, but didn't see anything
> the other times I ran it. In any case, once you repost I'll rebase and
> then let's see if it hits again.
> 
> Did you run with KASAN enabled

Yes, it's a debug kernel, full-on KASAN, lockdep and so on.
Pavel Begunkov March 17, 2024, 9:47 p.m. UTC | #20
On 3/17/24 21:34, Pavel Begunkov wrote:
> On 3/17/24 21:32, Jens Axboe wrote:
>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>
>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>> reproduce it, blktests?
>>>>>>
>>>>>> Yeah, it needs yesterday's fix.
>>>>>>
>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>
>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>> However, it seems the branch is buggy even without my patches, I
>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>> be the reason.
>>>>
>>>> Hmm odd, there's nothing in there but your series and then the
>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>> haven't seen anything odd.
>>>
>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>> with the issue, and by full recompilation and config prompts I assumed
>>> you pulled something in between (maybe not).
>>>
>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>> stack trace is usually some unmap or task exit, sometimes it only
>>> shows when you try to shutdown the VM after tests.
>>
>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>> test case as well on io_uring-6.9 and it all worked fine. Trying
>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>> the other times I ran it. In any case, once you repost I'll rebase and
>> then let's see if it hits again.
>>
>> Did you run with KASAN enabled
> 
> Yes, it's a debug kernel, full on KASANs, lockdeps and so

And another note, I triggered it once (IIRC on shutdown) with the ublk
tests only, w/o liburing/tests, which likely limits it to either the
core io_uring infra or a non-io_uring bug.
Jens Axboe March 17, 2024, 9:51 p.m. UTC | #21
On 3/17/24 3:47 PM, Pavel Begunkov wrote:
> On 3/17/24 21:34, Pavel Begunkov wrote:
>> On 3/17/24 21:32, Jens Axboe wrote:
>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>
>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>> reproduce it, blktests?
>>>>>>>
>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>
>>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>>
>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>>> be the reason.
>>>>>
>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>> haven't seen anything odd.
>>>>
>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>> with the issue, and by full recompilation and config prompts I assumed
>>>> you pulled something in between (maybe not).
>>>>
>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>> shows when you try to shutdown the VM after tests.
>>>
>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>> the other times I ran it. In any case, once you repost I'll rebase and
>>> then let's see if it hits again.
>>>
>>> Did you run with KASAN enabled
>>
>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
> 
> And another note, I triggered it once (IIRC on shutdown) with ublk
> tests only w/o liburing/tests, likely limits it to either the core
> io_uring infra or non-io_uring bugs.

Been running on for-6.10/io_uring, and the only odd thing I see is that
the test output tends to stall here:

Running test read-before-exit.t

which then either leads to a connection disconnect from my ssh into that
vm, or just a long delay and then it picks up again. This did not happen
with io_uring-6.9.

Maybe related? At least it's something new. Just checked again, and yeah
it seems to totally lock up the vm while that is running. Will try a
quick bisect of that series.
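For anyone following along without the liburing tree handy:
read-before-exit.t exercises exiting a task while a submitted read is
still in flight, which lands on exactly the exit/cancelation paths this
series reworks. A rough approximation of that shape in plain C with
liburing — a hypothetical sketch, not the actual test case:

/* Hypothetical sketch of a read-before-exit style reproducer: submit a
 * read that cannot complete yet, then exit without reaping the CQE so
 * the kernel must cancel the request during task exit. Not the actual
 * liburing test case. Build with: cc repro.c -o repro -luring */
#include <liburing.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	int fds[2];
	static char buf[32];

	if (pipe(fds) < 0)
		return 1;

	if (fork() == 0) {
		struct io_uring ring;
		struct io_uring_sqe *sqe;

		if (io_uring_queue_init(8, &ring, 0) < 0)
			exit(1);
		sqe = io_uring_get_sqe(&ring);
		/* read end of an empty pipe: stays pending forever */
		io_uring_prep_read(sqe, fds[0], buf, sizeof(buf), 0);
		io_uring_submit(&ring);
		exit(0);	/* leave the request in flight */
	}
	wait(NULL);
	return 0;
}

Looping a reproducer of that shape hammers exit-time cancelation, which
is roughly where the stall above would bite.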
Jens Axboe March 17, 2024, 10:07 p.m. UTC | #22
On 3/17/24 3:51 PM, Jens Axboe wrote:
> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
>> On 3/17/24 21:34, Pavel Begunkov wrote:
>>> On 3/17/24 21:32, Jens Axboe wrote:
>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>>
>>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>>> reproduce it, blktests?
>>>>>>>>
>>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>>
>>>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>>>
>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>>>> be the reason.
>>>>>>
>>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>>> haven't seen anything odd.
>>>>>
>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>>> with the issue, and by full recompilation and config prompts I assumed
>>>>> you pulled something in between (maybe not).
>>>>>
>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>>> shows when you try to shutdown the VM after tests.
>>>>
>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>>> the other times I ran it. In any case, once you repost I'll rebase and
>>>> then let's see if it hits again.
>>>>
>>>> Did you run with KASAN enabled
>>>
>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
>>
>> And another note, I triggered it once (IIRC on shutdown) with ublk
>> tests only w/o liburing/tests, likely limits it to either the core
>> io_uring infra or non-io_uring bugs.
> 
> Been running on for-6.10/io_uring, and the only odd thing I see is that
> the test output tends to stall here:
> 
> Running test read-before-exit.t
> 
> which then either leads to a connection disconnect from my ssh into that
> vm, or just a long delay and then it picks up again. This did not happen
> with io_uring-6.9.
> 
> Maybe related? At least it's something new. Just checked again, and yeah
> it seems to totally lock up the vm while that is running. Will try a
> quick bisect of that series.

Seems to be triggered by the top-of-branch patch in there, my poll and
timeout special-casing. While the above test case runs with that commit,
it'll freeze the host.
Jens Axboe March 17, 2024, 10:24 p.m. UTC | #23
On 3/17/24 4:07 PM, Jens Axboe wrote:
> On 3/17/24 3:51 PM, Jens Axboe wrote:
>> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
>>> On 3/17/24 21:34, Pavel Begunkov wrote:
>>>> On 3/17/24 21:32, Jens Axboe wrote:
>>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>>>
>>>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>>>> reproduce it, blktests?
>>>>>>>>>
>>>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>>>
>>>>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>>>>
>>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>>>>> be the reason.
>>>>>>>
>>>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>>>> haven't seen anything odd.
>>>>>>
>>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>>>> with the issue, and by full recompilation and config prompts I assumed
>>>>>> you pulled something in between (maybe not).
>>>>>>
>>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>>>> shows when you try to shutdown the VM after tests.
>>>>>
>>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>>>> the other times I ran it. In any case, once you repost I'll rebase and
>>>>> then let's see if it hits again.
>>>>>
>>>>> Did you run with KASAN enabled
>>>>
>>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
>>>
>>> And another note, I triggered it once (IIRC on shutdown) with ublk
>>> tests only w/o liburing/tests, likely limits it to either the core
>>> io_uring infra or non-io_uring bugs.
>>
>> Been running on for-6.10/io_uring, and the only odd thing I see is that
>> the test output tends to stall here:
>>
>> Running test read-before-exit.t
>>
>> which then either leads to a connection disconnect from my ssh into that
>> vm, or just a long delay and then it picks up again. This did not happen
>> with io_uring-6.9.
>>
>> Maybe related? At least it's something new. Just checked again, and yeah
>> it seems to totally lock up the vm while that is running. Will try a
>> quick bisect of that series.
> 
> Seems to be triggered by the top of branch patch in there, my poll and
> timeout special casing. While the above test case runs with that commit,
> it'll freeze the host.

Had a feeling this was the busy-looping off the cancelation path, and
flushing the fallback task_work seems to fix it. I'll check more tomorrow.


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a2cb8da3cc33..f1d3c5e065e9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3242,6 +3242,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 	ret |= io_kill_timeouts(ctx, task, cancel_all);
 	if (task)
 		ret |= io_run_task_work() > 0;
+	else if (ret)
+		flush_delayed_work(&ctx->fallback_work);
 	return ret;
 }
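The logic of the fix: with task == NULL there is no task left to run
task_work, so canceled requests get punted to the ctx's fallback work,
and unless that is flushed the caller keeps seeing progress (ret != 0)
and spins. A userspace analogue of the flush-before-returning pattern —
names and structure made up for illustration, pthreads standing in for
the kernel workqueue:

/* Userspace analogue of the flush above, with hypothetical names:
 * cancelation queues items for a fallback worker thread, and must
 * drain ("flush") that worker before declaring the cancel attempt
 * done, otherwise the caller keeps retrying forever. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int pending;	/* items queued for the fallback worker */
static bool stop;

static void *fallback_worker(void *arg)
{
	pthread_mutex_lock(&lock);
	while (!stop) {
		while (pending) {
			pending--;	/* "complete" one request */
			pthread_cond_broadcast(&cond);
		}
		pthread_cond_wait(&cond, &lock);
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Mirrors flush_delayed_work(&ctx->fallback_work): wait until the
 * worker has drained everything that was queued. */
static void flush_fallback_work(void)
{
	pthread_mutex_lock(&lock);
	while (pending)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	pthread_t thr;

	pthread_create(&thr, NULL, fallback_worker, NULL);

	pthread_mutex_lock(&lock);
	pending = 3;		/* cancelation punted three requests */
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);

	flush_fallback_work();	/* without this, callers busy-loop */
	printf("fallback work drained\n");

	pthread_mutex_lock(&lock);
	stop = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	pthread_join(thr, NULL);
	return 0;
}

The kernel helper goes one step further: flush_delayed_work() also fires
a still-pending timer immediately, which is what you want on cancelation
— run the fallback work now rather than waiting out its delay.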
Pavel Begunkov March 17, 2024, 11:16 p.m. UTC | #24
On 3/17/24 21:51, Jens Axboe wrote:
> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
>> On 3/17/24 21:34, Pavel Begunkov wrote:
>>> On 3/17/24 21:32, Jens Axboe wrote:
>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>>
>>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>>> reproduce it, blktests?
>>>>>>>>
>>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>>
>>>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>>>
>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>>>> be the reason.
>>>>>>
>>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>>> haven't seen anything odd.
>>>>>
>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>>> with the issue, and by full recompilation and config prompts I assumed
>>>>> you pulled something in between (maybe not).
>>>>>
>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>>> shows when you try to shutdown the VM after tests.
>>>>
>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>>> the other times I ran it. In any case, once you repost I'll rebase and
>>>> then let's see if it hits again.
>>>>
>>>> Did you run with KASAN enabled
>>>
>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
>>
>> And another note, I triggered it once (IIRC on shutdown) with ublk
>> tests only w/o liburing/tests, likely limits it to either the core
>> io_uring infra or non-io_uring bugs.
> 
> Been running on for-6.10/io_uring, and the only odd thing I see is that
> the test output tends to stall here:

Can't trigger with io_uring-6.9, which makes sense because the
patchset was done on top of it and tested w/o problems.
Ming Lei March 18, 2024, 12:15 a.m. UTC | #25
On Sun, Mar 17, 2024 at 04:24:07PM -0600, Jens Axboe wrote:
> On 3/17/24 4:07 PM, Jens Axboe wrote:
> > On 3/17/24 3:51 PM, Jens Axboe wrote:
> >> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
> >>> On 3/17/24 21:34, Pavel Begunkov wrote:
> >>>> On 3/17/24 21:32, Jens Axboe wrote:
> >>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
> >>>>>> On 3/17/24 21:24, Jens Axboe wrote:
> >>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
> >>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
> >>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
> >>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
> >>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> >>>>>>>>>
> >>>>>>>>> ...
> >>>>>>>>>
> >>>>>>>>>>> The following two error can be triggered with this patchset
> >>>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
> >>>>>>>>>>> such failures after reverting the 11 patches.
> >>>>>>>>>>
> >>>>>>>>>> I suppose it's with the fix from yesterday. How can I
> >>>>>>>>>> reproduce it, blktests?
> >>>>>>>>>
> >>>>>>>>> Yeah, it needs yesterday's fix.
> >>>>>>>>>
> >>>>>>>>> You may need to run this test multiple times for triggering the problem:
> >>>>>>>>
> >>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
> >>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
> >>>>>>>> However, it seems the branch is buggy even without my patches, I
> >>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
> >>>>>>>> by running liburing tests. Not sure what is that yet, but might also
> >>>>>>>> be the reason.
> >>>>>>>
> >>>>>>> Hmm odd, there's nothing in there but your series and then the
> >>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
> >>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
> >>>>>>> haven't seen anything odd.
> >>>>>>
> >>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
> >>>>>> with the issue, and by full recompilation and config prompts I assumed
> >>>>>> you pulled something in between (maybe not).
> >>>>>>
> >>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
> >>>>>> stack trace is usually some unmap or task exit, sometimes it only
> >>>>>> shows when you try to shutdown the VM after tests.
> >>>>>
> >>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
> >>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
> >>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
> >>>>> the other times I ran it. In any case, once you repost I'll rebase and
> >>>>> then let's see if it hits again.
> >>>>>
> >>>>> Did you run with KASAN enabled
> >>>>
> >>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
> >>>
> >>> And another note, I triggered it once (IIRC on shutdown) with ublk
> >>> tests only w/o liburing/tests, likely limits it to either the core
> >>> io_uring infra or non-io_uring bugs.
> >>
> >> Been running on for-6.10/io_uring, and the only odd thing I see is that
> >> the test output tends to stall here:
> >>
> >> Running test read-before-exit.t
> >>
> >> which then either leads to a connection disconnect from my ssh into that
> >> vm, or just a long delay and then it picks up again. This did not happen
> >> with io_uring-6.9.
> >>
> >> Maybe related? At least it's something new. Just checked again, and yeah
> >> it seems to totally lock up the vm while that is running. Will try a
> >> quick bisect of that series.
> > 
> > Seems to be triggered by the top of branch patch in there, my poll and
> > timeout special casing. While the above test case runs with that commit,
> > it'll freeze the host.
> 
> Had a feeling this was the busy looping off cancelations, and flushing
> the fallback task_work seems to fix it. I'll check more tomorrow.
> 
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index a2cb8da3cc33..f1d3c5e065e9 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3242,6 +3242,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>  	ret |= io_kill_timeouts(ctx, task, cancel_all);
>  	if (task)
>  		ret |= io_run_task_work() > 0;
> +	else if (ret)
> +		flush_delayed_work(&ctx->fallback_work);
>  	return ret;
>  }

Still can trigger the warning with the above patch:

[  446.275975] ------------[ cut here ]------------
[  446.276340] WARNING: CPU: 8 PID: 731 at kernel/fork.c:969 __put_task_struct+0x10c/0x180
[  446.276931] Modules linked in: isofs binfmt_misc xfs vfat fat raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_net i2c_i801 net_fag
[  446.278608] CPU: 8 PID: 731 Comm: kworker/8:2 Not tainted 6.8.0_io_uring_6.10+ #20
[  446.279535] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[  446.280440] Workqueue: events io_fallback_req_func
[  446.280971] RIP: 0010:__put_task_struct+0x10c/0x180
[  446.281485] Code: 48 85 d2 74 05 f0 ff 0a 74 44 48 8b 3d b5 83 c7 02 48 89 ee e8 a5 f6 2e 00 eb ac be 03 00 00 00 48 89 ef e8 26 9f 72 002
[  446.282727] RSP: 0018:ffffb325c06bfdf8 EFLAGS: 00010246
[  446.283099] RAX: 0000000000000000 RBX: ffff92717cabaf40 RCX: 0000000000000000
[  446.283578] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff92717cabaf40
[  446.284062] RBP: ffff92710cab4800 R08: 0000000000000000 R09: 0000000000000000
[  446.284545] R10: ffffb325c06bfdb0 R11: 0000000000000100 R12: ffff92717aedc580
[  446.285233] R13: ffff92717aedc580 R14: ffff927151ee5a00 R15: 0000000000000000
[  446.285840] FS:  0000000000000000(0000) GS:ffff927667c00000(0000) knlGS:0000000000000000
[  446.286412] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  446.286819] CR2: 00007feacac32000 CR3: 0000000618020002 CR4: 0000000000770ef0
[  446.287310] PKRU: 55555554
[  446.287534] Call Trace:
[  446.287752]  <TASK>
[  446.287941]  ? __warn+0x80/0x120
[  446.288206]  ? __put_task_struct+0x10c/0x180
[  446.288524]  ? report_bug+0x164/0x190
[  446.288816]  ? handle_bug+0x41/0x70
[  446.289098]  ? exc_invalid_op+0x17/0x70
[  446.289392]  ? asm_exc_invalid_op+0x1a/0x20
[  446.289715]  ? __put_task_struct+0x10c/0x180
[  446.290038]  ? io_put_task_remote+0x80/0x90
[  446.290372]  __io_submit_flush_completions+0x2d6/0x390
[  446.290761]  io_fallback_req_func+0xad/0x140
[  446.291088]  process_one_work+0x189/0x3b0
[  446.291403]  worker_thread+0x277/0x390
[  446.291700]  ? __pfx_worker_thread+0x10/0x10
[  446.292018]  kthread+0xcf/0x100
[  446.292278]  ? __pfx_kthread+0x10/0x10
[  446.292562]  ret_from_fork+0x31/0x50
[  446.292848]  ? __pfx_kthread+0x10/0x10
[  446.293143]  ret_from_fork_asm+0x1a/0x30
[  446.293576]  </TASK>
[  446.293919] ---[ end trace 0000000000000000 ]---
[  446.294460] ------------[ cut here ]------------
[  446.294808] WARNING: CPU: 8 PID: 731 at kernel/fork.c:601 free_task+0x61/0x70
[  446.295294] Modules linked in: isofs binfmt_misc xfs vfat fat raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_net i2c_i801 net_fag
[  446.296901] CPU: 8 PID: 731 Comm: kworker/8:2 Tainted: G        W          6.8.0_io_uring_6.10+ #20
[  446.297521] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[  446.298110] Workqueue: events io_fallback_req_func
[  446.298455] RIP: 0010:free_task+0x61/0x70
[  446.298756] Code: f3 ff f6 43 2e 20 75 18 48 89 df e8 49 7b 20 00 48 8b 3d e2 84 c7 02 48 89 de 5b e9 c9 f7 2e 00 48 89 df e8 61 70 03 000
[  446.303360] RSP: 0018:ffffb325c06bfe00 EFLAGS: 00010202
[  446.303745] RAX: 0000000000000001 RBX: ffff92717cabaf40 RCX: 0000000009a40008
[  446.304226] RDX: 0000000009a3e008 RSI: 000000000003b060 RDI: 6810a90e7192ffff
[  446.304763] RBP: ffff92710cab4800 R08: 0000000000000000 R09: 00000000820001df
[  446.305288] R10: 00000000820001df R11: 000000000000000d R12: ffff92717aedc580
[  446.305769] R13: ffff92717aedc580 R14: ffff927151ee5a00 R15: 0000000000000000
[  446.306251] FS:  0000000000000000(0000) GS:ffff927667c00000(0000) knlGS:0000000000000000
[  446.306815] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  446.307220] CR2: 00007feacac32000 CR3: 0000000618020002 CR4: 0000000000770ef0
[  446.307702] PKRU: 55555554
[  446.307924] Call Trace:
[  446.308135]  <TASK>
[  446.308322]  ? __warn+0x80/0x120
[  446.308637]  ? free_task+0x61/0x70
[  446.308926]  ? report_bug+0x164/0x190
[  446.309207]  ? handle_bug+0x41/0x70
[  446.309474]  ? exc_invalid_op+0x17/0x70
[  446.309767]  ? asm_exc_invalid_op+0x1a/0x20
[  446.310076]  ? free_task+0x61/0x70
[  446.310340]  __io_submit_flush_completions+0x2d6/0x390
[  446.310711]  io_fallback_req_func+0xad/0x140
[  446.311067]  process_one_work+0x189/0x3b0
[  446.311492]  worker_thread+0x277/0x390
[  446.311881]  ? __pfx_worker_thread+0x10/0x10
[  446.312205]  kthread+0xcf/0x100
[  446.312457]  ? __pfx_kthread+0x10/0x10
[  446.312750]  ret_from_fork+0x31/0x50
[  446.313028]  ? __pfx_kthread+0x10/0x10
[  446.313320]  ret_from_fork_asm+0x1a/0x30
[  446.313616]  </TASK>
[  446.313812] ---[ end trace 0000000000000000 ]---
[  446.314171] BUG: kernel NULL pointer dereference, address: 0000000000000098
[  446.314184] ------------[ cut here ]------------
[  446.314495] #PF: supervisor read access in kernel mode
[  446.314747] kernel BUG at mm/slub.c:553!
[  446.314986] #PF: error_code(0x0000) - not-present page
[  446.316032] PGD 0 P4D 0
[  446.316253] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  446.316573] CPU: 8 PID: 9914 Comm: ublk Tainted: G        W          6.8.0_io_uring_6.10+ #20
[  446.317167] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[  446.317763] RIP: 0010:release_task+0x3b/0x560
[  446.318089] Code: 55 53 48 89 fb 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 31 c0 e8 d1 17 0b 00 e8 cc 17 0b 00 48 8b 83 b8 0bf
[  446.319301] RSP: 0018:ffffb325cdb2bca8 EFLAGS: 00010202
[  446.319672] RAX: 0000000000000000 RBX: ffff92717cabaf40 RCX: ffffb325cdb2bd18
[  446.320151] RDX: ffff92717cabaf40 RSI: ffff92717cabb948 RDI: ffff92717cabaf40
[  446.320628] RBP: ffffb325cdb2bd18 R08: ffffb325cdb2bd18 R09: 0000000000000000
[  446.321122] R10: 0000000000000001 R11: 0000000000000100 R12: ffffb325cdb2bd18
[  446.321706] R13: ffffb325cdb2b310 R14: ffffb325cdb2b310 R15: 0000000000000000
[  446.322188] FS:  0000000000000000(0000) GS:ffff927667c00000(0000) knlGS:0000000000000000
[  446.322742] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  446.323169] CR2: 0000000000000098 CR3: 0000000618020002 CR4: 0000000000770ef0
[  446.324013] PKRU: 55555554
[  446.324267] Call Trace:
[  446.324466]  <TASK>
[  446.324646]  ? __die+0x23/0x70
[  446.324881]  ? page_fault_oops+0x173/0x4f0
[  446.325175]  ? _raw_spin_unlock_irq+0xe/0x30
[  446.325475]  ? exc_page_fault+0x76/0x170
[  446.325752]  ? asm_exc_page_fault+0x26/0x30
[  446.326048]  ? release_task+0x3b/0x560
[  446.326321]  ? release_task+0x34/0x560
[  446.326589]  do_exit+0x6fd/0xad0
[  446.326832]  do_group_exit+0x30/0x80
[  446.327100]  get_signal+0x8de/0x8e0
[  446.327355]  arch_do_signal_or_restart+0x3e/0x240
[  446.327680]  syscall_exit_to_user_mode+0x167/0x210
[  446.328008]  do_syscall_64+0x96/0x170
[  446.328361]  ? syscall_exit_to_user_mode+0x60/0x210
[  446.328879]  ? do_syscall_64+0x96/0x170
[  446.329173]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[  446.329524] RIP: 0033:0x7feadc57e445
[  446.329796] Code: Unable to access opcode bytes at 0x7feadc57e41b.
[  446.330208] RSP: 002b:00007feadb3ffd38 EFLAGS: 00000202 ORIG_RAX: 00000000000001aa
[  446.330717] RAX: 0000000000000078 RBX: 0000000000000000 RCX: 00007feadc57e445
[  446.331184] RDX: 0000000000000001 RSI: 0000000000000078 RDI: 0000000000000000
[  446.331642] RBP: 00007feacc002ff8 R08: 00007feadb3ffd70 R09: 0000000000000018
[  446.332107] R10: 0000000000000019 R11: 0000000000000202 R12: 00007feadb3ffd90
[  446.332564] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000001aa
[  446.333022]  </TASK>
[  446.333208] Modules linked in: isofs binfmt_misc xfs vfat fat raid0 iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_net i2c_i801 net_fag
[  446.334744] CR2: 0000000000000098
[  446.335008] ---[ end trace 0000000000000000 ]---



Thanks, 
Ming
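For context on the first trace: the WARN at kernel/fork.c:969 is one of
__put_task_struct()'s sanity checks firing (task still current, not yet
exited, or its usage refcount off), which usually means an unbalanced
task reference get/put. A minimal userspace sketch of that bug class,
using a hypothetical refcounted object rather than anything from the
kernel:

/* Sketch of the unbalanced-put bug class behind a __put_task_struct
 * warning. 'struct obj' and these helpers are hypothetical, purely
 * illustrative; they are not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
	atomic_int refs;
};

static struct obj *obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
	return o;
}

static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refs, 1);

	/* analogue of the kernel's refcount sanity WARN */
	assert(old > 0);
	if (old == 1)
		free(o);
}

int main(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	atomic_init(&o->refs, 1);	/* creation reference */
	obj_get(o);			/* e.g. a request pins the task */
	obj_put(o);			/* request completes */
	obj_put(o);			/* creation ref dropped: freed here */
	/* one more obj_put(o) here - say, a completion accounted twice,
	 * once inline and once via fallback work - would be the bug */
	return 0;
}

That shape is consistent with the traces above, where io_put_task_remote()
is being called from __io_submit_flush_completions() in the fallback
worker.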
Jens Axboe March 18, 2024, 1:34 a.m. UTC | #26
On 3/17/24 6:15 PM, Ming Lei wrote:
> On Sun, Mar 17, 2024 at 04:24:07PM -0600, Jens Axboe wrote:
>> On 3/17/24 4:07 PM, Jens Axboe wrote:
>>> On 3/17/24 3:51 PM, Jens Axboe wrote:
>>>> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
>>>>> On 3/17/24 21:34, Pavel Begunkov wrote:
>>>>>> On 3/17/24 21:32, Jens Axboe wrote:
>>>>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>>>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>>>>>
>>>>>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>>>>>> reproduce it, blktests?
>>>>>>>>>>>
>>>>>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>>>>>
>>>>>>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>>>>>>
>>>>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>>>>>>> be the reason.
>>>>>>>>>
>>>>>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>>>>>> haven't seen anything odd.
>>>>>>>>
>>>>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>>>>>> with the issue, and by full recompilation and config prompts I assumed
>>>>>>>> you pulled something in between (maybe not).
>>>>>>>>
>>>>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>>>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>>>>>> shows when you try to shutdown the VM after tests.
>>>>>>>
>>>>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>>>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>>>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>>>>>> the other times I ran it. In any case, once you repost I'll rebase and
>>>>>>> then let's see if it hits again.
>>>>>>>
>>>>>>> Did you run with KASAN enabled
>>>>>>
>>>>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
>>>>>
>>>>> And another note, I triggered it once (IIRC on shutdown) with ublk
>>>>> tests only w/o liburing/tests, likely limits it to either the core
>>>>> io_uring infra or non-io_uring bugs.
>>>>
>>>> Been running on for-6.10/io_uring, and the only odd thing I see is that
>>>> the test output tends to stall here:
>>>>
>>>> Running test read-before-exit.t
>>>>
>>>> which then either leads to a connection disconnect from my ssh into that
>>>> vm, or just a long delay and then it picks up again. This did not happen
>>>> with io_uring-6.9.
>>>>
>>>> Maybe related? At least it's something new. Just checked again, and yeah
>>>> it seems to totally lock up the vm while that is running. Will try a
>>>> quick bisect of that series.
>>>
>>> Seems to be triggered by the top of branch patch in there, my poll and
>>> timeout special casing. While the above test case runs with that commit,
>>> it'll freeze the host.
>>
>> Had a feeling this was the busy looping off cancelations, and flushing
>> the fallback task_work seems to fix it. I'll check more tomorrow.
>>
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index a2cb8da3cc33..f1d3c5e065e9 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3242,6 +3242,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>>  	ret |= io_kill_timeouts(ctx, task, cancel_all);
>>  	if (task)
>>  		ret |= io_run_task_work() > 0;
>> +	else if (ret)
>> +		flush_delayed_work(&ctx->fallback_work);
>>  	return ret;
>>  }
> 
> Still can trigger the warning with above patch:
> 
> [  446.275975] ------------[ cut here ]------------
> [  446.276340] WARNING: CPU: 8 PID: 731 at kernel/fork.c:969 __put_task_struct+0x10c/0x180

And this is running that test case you referenced? I'll take a look, as
it seems related to the poll kill rather than the other patchset.
Jens Axboe March 18, 2024, 1:44 a.m. UTC | #27
On 3/17/24 7:34 PM, Jens Axboe wrote:
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index a2cb8da3cc33..f1d3c5e065e9 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -3242,6 +3242,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>>>  	ret |= io_kill_timeouts(ctx, task, cancel_all);
>>>  	if (task)
>>>  		ret |= io_run_task_work() > 0;
>>> +	else if (ret)
>>> +		flush_delayed_work(&ctx->fallback_work);
>>>  	return ret;
>>>  }
>>
>> Still can trigger the warning with above patch:
>>
>> [  446.275975] ------------[ cut here ]------------
>> [  446.276340] WARNING: CPU: 8 PID: 731 at kernel/fork.c:969 __put_task_struct+0x10c/0x180
> 
> And this is running that test case you referenced? I'll take a look, as
> it seems related to the poll kill rather than the other patchset.

I can reproduce with that test too, and it triggers with v2 of the
patchset on top of io_uring-6.9, with and without that poll patch. Which
makes sense, as I doubt you're doing any poll or timeouts in there.

Pavel, passing that one to you then :)
Ming Lei March 18, 2024, 1:49 a.m. UTC | #28
On Sun, Mar 17, 2024 at 07:34:30PM -0600, Jens Axboe wrote:
> On 3/17/24 6:15 PM, Ming Lei wrote:
> > On Sun, Mar 17, 2024 at 04:24:07PM -0600, Jens Axboe wrote:
> >> On 3/17/24 4:07 PM, Jens Axboe wrote:
> >>> On 3/17/24 3:51 PM, Jens Axboe wrote:
> >>>> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
> >>>>> On 3/17/24 21:34, Pavel Begunkov wrote:
> >>>>>> On 3/17/24 21:32, Jens Axboe wrote:
> >>>>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
> >>>>>>>> On 3/17/24 21:24, Jens Axboe wrote:
> >>>>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
> >>>>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
> >>>>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
> >>>>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
> >>>>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> ...
> >>>>>>>>>>>
> >>>>>>>>>>>>> The following two error can be triggered with this patchset
> >>>>>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
> >>>>>>>>>>>>> such failures after reverting the 11 patches.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I suppose it's with the fix from yesterday. How can I
> >>>>>>>>>>>> reproduce it, blktests?
> >>>>>>>>>>>
> >>>>>>>>>>> Yeah, it needs yesterday's fix.
> >>>>>>>>>>>
> >>>>>>>>>>> You may need to run this test multiple times for triggering the problem:
> >>>>>>>>>>
> >>>>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
> >>>>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
> >>>>>>>>>> However, it seems the branch is buggy even without my patches, I
> >>>>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
> >>>>>>>>>> by running liburing tests. Not sure what is that yet, but might also
> >>>>>>>>>> be the reason.
> >>>>>>>>>
> >>>>>>>>> Hmm odd, there's nothing in there but your series and then the
> >>>>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
> >>>>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
> >>>>>>>>> haven't seen anything odd.
> >>>>>>>>
> >>>>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
> >>>>>>>> with the issue, and by full recompilation and config prompts I assumed
> >>>>>>>> you pulled something in between (maybe not).
> >>>>>>>>
> >>>>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
> >>>>>>>> stack trace is usually some unmap or task exit, sometimes it only
> >>>>>>>> shows when you try to shutdown the VM after tests.
> >>>>>>>
> >>>>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
> >>>>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
> >>>>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
> >>>>>>> the other times I ran it. In any case, once you repost I'll rebase and
> >>>>>>> then let's see if it hits again.
> >>>>>>>
> >>>>>>> Did you run with KASAN enabled
> >>>>>>
> >>>>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
> >>>>>
> >>>>> And another note, I triggered it once (IIRC on shutdown) with ublk
> >>>>> tests only w/o liburing/tests, likely limits it to either the core
> >>>>> io_uring infra or non-io_uring bugs.
> >>>>
> >>>> Been running on for-6.10/io_uring, and the only odd thing I see is that
> >>>> the test output tends to stall here:
> >>>>
> >>>> Running test read-before-exit.t
> >>>>
> >>>> which then either leads to a connection disconnect from my ssh into that
> >>>> vm, or just a long delay and then it picks up again. This did not happen
> >>>> with io_uring-6.9.
> >>>>
> >>>> Maybe related? At least it's something new. Just checked again, and yeah
> >>>> it seems to totally lock up the vm while that is running. Will try a
> >>>> quick bisect of that series.
> >>>
> >>> Seems to be triggered by the top of branch patch in there, my poll and
> >>> timeout special casing. While the above test case runs with that commit,
> >>> it'll freeze the host.
> >>
> >> Had a feeling this was the busy looping off cancelations, and flushing
> >> the fallback task_work seems to fix it. I'll check more tomorrow.
> >>
> >>
> >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >> index a2cb8da3cc33..f1d3c5e065e9 100644
> >> --- a/io_uring/io_uring.c
> >> +++ b/io_uring/io_uring.c
> >> @@ -3242,6 +3242,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
> >>  	ret |= io_kill_timeouts(ctx, task, cancel_all);
> >>  	if (task)
> >>  		ret |= io_run_task_work() > 0;
> >> +	else if (ret)
> >> +		flush_delayed_work(&ctx->fallback_work);
> >>  	return ret;
> >>  }
> > 
> > Still can trigger the warning with above patch:
> > 
> > [  446.275975] ------------[ cut here ]------------
> > [  446.276340] WARNING: CPU: 8 PID: 731 at kernel/fork.c:969 __put_task_struct+0x10c/0x180
> 
> And this is running that test case you referenced? I'll take a look, as
> it seems related to the poll kill rather than the other patchset.

Yeah, and now I am running 'git bisect' on Pavel's V2.

thanks,
Ming