diff mbox series

[for-next,02/18] RDMA/rtrs-srv: Occasionally flush ongoing session closing

Message ID 20201209164542.61387-3-jinpu.wang@cloud.ionos.com (mailing list archive)
State Superseded
Headers show
Series Misc update for rtrs | expand

Commit Message

Jinpu Wang Dec. 9, 2020, 4:45 p.m. UTC
If there are many establishments/teardowns, we need to make sure
we do not consume too much system memory. Thus let ongoing session
closing finish before accepting new connections.

Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jinpu Wang Dec. 10, 2020, 2:56 p.m. UTC | #1
On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
>
> If there are many establishments/teardowns, we need to make sure
> we do not consume too much system memory. Thus let on going
> session closing to finish before accepting new connection.
>
> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Please ignore this one, it could lead to a deadlock: cma_ib_req_handler
holds mutex_lock(&listen_id->handler_mutex) when calling into
rtrs_rdma_connect, where we now wait for close_work, which calls
rdma_destroy_id, which could try to take the same handler_mutex, so we
deadlock.
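In other words (rough sketch only, pieced together from this thread;
indentation shows call nesting, not verbatim kernel code):

	/*
	 * CM request path (the one holding the mutex):
	 *
	 *   cma_ib_req_handler()
	 *     mutex_lock(&listen_id->handler_mutex)
	 *       rtrs_rdma_connect()
	 *         flush_workqueue(rtrs_wq)      <- waits for close_work
	 *
	 * Session teardown path (what the flush waits for):
	 *
	 *   rtrs_srv_close_work()
	 *     rdma_destroy_id(cm_id)
	 *       mutex_lock(&id_priv->handler_mutex)  <- if this were the same
	 *                                               mutex, the flush above
	 *                                               could never finish
	 */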

Sorry & thanks!

Jack

> ---
>  drivers/infiniband/ulp/rtrs/rtrs-srv.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> index ed4628f032bb..0a2202c28b54 100644
> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> @@ -1791,6 +1791,10 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>                 err = -ENOMEM;
>                 goto reject_w_err;
>         }
> +       if (!cid) {
> +               /* Let inflight session teardown complete */
> +               flush_workqueue(rtrs_wq);
> +       }
>         mutex_lock(&srv->paths_mutex);
>         sess = __find_sess(srv, &msg->sess_uuid);
>         if (sess) {
> --
> 2.25.1
>
Guoqing Jiang Dec. 11, 2020, 2:33 a.m. UTC | #2
On 12/10/20 15:56, Jinpu Wang wrote:
> On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
>>
>> If there are many establishments/teardowns, we need to make sure
>> we do not consume too much system memory. Thus let on going
>> session closing to finish before accepting new connection.
>>
>> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
>> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
>> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> Please ignore this one, it could lead to deadlock, due to the fact
> cma_ib_req_handler is holding
> mutex_lock(&listen_id->handler_mutex) when calling into
> rtrs_rdma_connect, we call close_work which will call rdma_destroy_id,
> which
> could try to hold the same handler_mutex, so deadlock.
> 

I am wondering whether nvmet-rdma has a similar issue; if so, maybe
introduce a locked version of rdma_destroy_id.

Thanks,
Guoqing
Jinpu Wang Dec. 11, 2020, 6:50 a.m. UTC | #3
On Fri, Dec 11, 2020 at 3:35 AM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
>
>
> On 12/10/20 15:56, Jinpu Wang wrote:
> > On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
> >>
> >> If there are many establishments/teardowns, we need to make sure
> >> we do not consume too much system memory. Thus let on going
> >> session closing to finish before accepting new connection.
> >>
> >> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
> >> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> >> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> > Please ignore this one, it could lead to deadlock, due to the fact
> > cma_ib_req_handler is holding
> > mutex_lock(&listen_id->handler_mutex) when calling into
> > rtrs_rdma_connect, we call close_work which will call rdma_destroy_id,
> > which
> > could try to hold the same handler_mutex, so deadlock.
> >
>
> I am wondering if nvmet-rdma has the similar issue or not, if so, maybe
> introduce a locked version of rdma_destroy_id.
>
> Thanks,
> Guoqing

No, I was wrong. I rechecked the code and it's not a valid deadlock: in
cma_ib_req_handler, the conn_id is newly created in
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2185.

flush_workqueue() will only flush close_work for other cm_ids, not for
the newly created conn_id, which has not been associated with anything
yet.

The same applies to nvme-rdma, so it's a false alarm by lockdep.

Regards!
Jack
Leon Romanovsky Dec. 11, 2020, 7:26 a.m. UTC | #4
On Fri, Dec 11, 2020 at 07:50:13AM +0100, Jinpu Wang wrote:
> On Fri, Dec 11, 2020 at 3:35 AM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
> >
> >
> >
> > On 12/10/20 15:56, Jinpu Wang wrote:
> > > On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
> > >>
> > >> If there are many establishments/teardowns, we need to make sure
> > >> we do not consume too much system memory. Thus let on going
> > >> session closing to finish before accepting new connection.
> > >>
> > >> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
> > >> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> > >> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> > > Please ignore this one, it could lead to deadlock, due to the fact
> > > cma_ib_req_handler is holding
> > > mutex_lock(&listen_id->handler_mutex) when calling into
> > > rtrs_rdma_connect, we call close_work which will call rdma_destroy_id,
> > > which
> > > could try to hold the same handler_mutex, so deadlock.
> > >
> >
> > I am wondering if nvmet-rdma has the similar issue or not, if so, maybe
> > introduce a locked version of rdma_destroy_id.
> >
> > Thanks,
> > Guoqing
>
> No, I was wrong. I rechecked the code, it's not a valid deadlock, in
> cma_ib_req_handler, the conn_id is newly created in
> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2185.
>
> Flush_workqueue will only flush close_work for any other cm_id, but
> not the newly created one conn_id, it has not associated with anything
> yet.
>
> The same applies to nvme-rdma. so it's a false alarm by lockdep.

Leaving this without a fix (proper lock annotation) is not the right
thing to do, because everyone who runs the rtrs code with LOCKDEP on
will hit the same "false alarm".

So I recommend either not taking this patch or rewriting it so it does
not trigger the LOCKDEP warning.

Thanks

>
> Regards!
> Jack
Jinpu Wang Dec. 11, 2020, 7:53 a.m. UTC | #5
On Fri, Dec 11, 2020 at 8:26 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Fri, Dec 11, 2020 at 07:50:13AM +0100, Jinpu Wang wrote:
> > On Fri, Dec 11, 2020 at 3:35 AM Guoqing Jiang
> > <guoqing.jiang@cloud.ionos.com> wrote:
> > >
> > >
> > >
> > > On 12/10/20 15:56, Jinpu Wang wrote:
> > > > On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
> > > >>
> > > >> If there are many establishments/teardowns, we need to make sure
> > > >> we do not consume too much system memory. Thus let on going
> > > >> session closing to finish before accepting new connection.
> > > >>
> > > >> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
> > > >> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> > > >> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> > > > Please ignore this one, it could lead to deadlock, due to the fact
> > > > cma_ib_req_handler is holding
> > > > mutex_lock(&listen_id->handler_mutex) when calling into
> > > > rtrs_rdma_connect, we call close_work which will call rdma_destroy_id,
> > > > which
> > > > could try to hold the same handler_mutex, so deadlock.
> > > >
> > >
> > > I am wondering if nvmet-rdma has the similar issue or not, if so, maybe
> > > introduce a locked version of rdma_destroy_id.
> > >
> > > Thanks,
> > > Guoqing
> >
> > No, I was wrong. I rechecked the code, it's not a valid deadlock, in
> > cma_ib_req_handler, the conn_id is newly created in
> > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2185.
> >
> > Flush_workqueue will only flush close_work for any other cm_id, but
> > not the newly created one conn_id, it has not associated with anything
> > yet.
> >
> > The same applies to nvme-rdma. so it's a false alarm by lockdep.
>
> Leaving this without fix (proper lock annotation) is not right thing to
> do, because everyone who runs rtrs code with LOCKDEP on will have same
> "false alarm".
>
> So I recommend or not to take this patch or write it without LOCKDEP warning.
Hi Leon,

I'm thinking the same; do you have a suggestion on how to teach LOCKDEP
that this is not really a deadlock? I do not know LOCKDEP well.

Thanks
>
> Thanks
>
> >
> > Regards!
> > Jack
Jinpu Wang Dec. 11, 2020, 7:58 a.m. UTC | #6
On Fri, Dec 11, 2020 at 8:53 AM Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
>
> On Fri, Dec 11, 2020 at 8:26 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Dec 11, 2020 at 07:50:13AM +0100, Jinpu Wang wrote:
> > > On Fri, Dec 11, 2020 at 3:35 AM Guoqing Jiang
> > > <guoqing.jiang@cloud.ionos.com> wrote:
> > > >
> > > >
> > > >
> > > > On 12/10/20 15:56, Jinpu Wang wrote:
> > > > > On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
> > > > >>
> > > > >> If there are many establishments/teardowns, we need to make sure
> > > > >> we do not consume too much system memory. Thus let on going
> > > > >> session closing to finish before accepting new connection.
> > > > >>
> > > > >> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
> > > > >> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> > > > >> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> > > > > Please ignore this one, it could lead to deadlock, due to the fact
> > > > > cma_ib_req_handler is holding
> > > > > mutex_lock(&listen_id->handler_mutex) when calling into
> > > > > rtrs_rdma_connect, we call close_work which will call rdma_destroy_id,
> > > > > which
> > > > > could try to hold the same handler_mutex, so deadlock.
> > > > >
> > > >
> > > > I am wondering if nvmet-rdma has the similar issue or not, if so, maybe
> > > > introduce a locked version of rdma_destroy_id.
> > > >
> > > > Thanks,
> > > > Guoqing
> > >
> > > No, I was wrong. I rechecked the code, it's not a valid deadlock, in
> > > cma_ib_req_handler, the conn_id is newly created in
> > > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2185.
> > >
> > > Flush_workqueue will only flush close_work for any other cm_id, but
> > > not the newly created one conn_id, it has not associated with anything
> > > yet.
> > >
> > > The same applies to nvme-rdma. so it's a false alarm by lockdep.
> >
> > Leaving this without fix (proper lock annotation) is not right thing to
> > do, because everyone who runs rtrs code with LOCKDEP on will have same
> > "false alarm".
> >
> > So I recommend or not to take this patch or write it without LOCKDEP warning.
> Hi Leon,
>
> I'm thinking about the same, do you have a suggestion on how to teach
> LOCKDEP this is not really a deadlock,
> I do not know LOCKDEP well.
Found it myself: we can use lockdep_off(), as virtio_mem does:

https://elixir.bootlin.com/linux/latest/source/drivers/virtio/virtio_mem.c#L699
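Something like this (untested sketch, only to illustrate the idea on top
of the hunk in this patch; whether silencing lockdep here is acceptable
is a separate question):

	if (!cid) {
		/* Let inflight session teardown complete */
		lockdep_off();
		flush_workqueue(rtrs_wq);
		lockdep_on();
	}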

Thanks

>
> Thanks
> >
> > Thanks
> >
> > >
> > > Regards!
> > > Jack
Jason Gunthorpe Dec. 11, 2020, 1:45 p.m. UTC | #7
On Fri, Dec 11, 2020 at 08:58:09AM +0100, Jinpu Wang wrote:
> > > > No, I was wrong. I rechecked the code, it's not a valid deadlock, in
> > > > cma_ib_req_handler, the conn_id is newly created in
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2185.
> > > >
> > > > Flush_workqueue will only flush close_work for any other cm_id, but
> > > > not the newly created one conn_id, it has not associated with anything
> > > > yet.
> > > >
> > > > The same applies to nvme-rdma. so it's a false alarm by lockdep.
> > >
> > > Leaving this without fix (proper lock annotation) is not right thing to
> > > do, because everyone who runs rtrs code with LOCKDEP on will have same
> > > "false alarm".
> > >
> > > So I recommend or not to take this patch or write it without LOCKDEP warning.
> > Hi Leon,
> >
> > I'm thinking about the same, do you have a suggestion on how to teach
> > LOCKDEP this is not really a deadlock,
> > I do not know LOCKDEP well.
> Found it myself, we can use lockdep_off
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/virtio/virtio_mem.c#L699

Gah, that is horrible.

I'm not completely sure it is false either: at this point two
handler_mutexes are locked (the listening ID's and the new ID's), and
there may be locking cycles that end up back on the listening ID; it is
certainly not that simple.

Jason
Jason Gunthorpe Dec. 11, 2020, 4:29 p.m. UTC | #8
On Fri, Dec 11, 2020 at 05:17:36PM +0100, Jack Wang wrote:
>    En, the lockdep was complaining about the new conn_id, I will
>    post the full log if needed next week.  let’s skip this patch for
>    now, will double check!

That is even more worrisome as the new conn_id already has a different
lock class.

Jason
Leon Romanovsky Dec. 11, 2020, 8:49 p.m. UTC | #9
On Fri, Dec 11, 2020 at 08:58:09AM +0100, Jinpu Wang wrote:
> On Fri, Dec 11, 2020 at 8:53 AM Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
> >
> > On Fri, Dec 11, 2020 at 8:26 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Fri, Dec 11, 2020 at 07:50:13AM +0100, Jinpu Wang wrote:
> > > > On Fri, Dec 11, 2020 at 3:35 AM Guoqing Jiang
> > > > <guoqing.jiang@cloud.ionos.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 12/10/20 15:56, Jinpu Wang wrote:
> > > > > > On Wed, Dec 9, 2020 at 5:45 PM Jack Wang <jinpu.wang@cloud.ionos.com> wrote:
> > > > > >>
> > > > > >> If there are many establishments/teardowns, we need to make sure
> > > > > >> we do not consume too much system memory. Thus let on going
> > > > > >> session closing to finish before accepting new connection.
> > > > > >>
> > > > > >> Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
> > > > > >> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> > > > > >> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> > > > > > Please ignore this one, it could lead to deadlock, due to the fact
> > > > > > cma_ib_req_handler is holding
> > > > > > mutex_lock(&listen_id->handler_mutex) when calling into
> > > > > > rtrs_rdma_connect, we call close_work which will call rdma_destroy_id,
> > > > > > which
> > > > > > could try to hold the same handler_mutex, so deadlock.
> > > > > >
> > > > >
> > > > > I am wondering if nvmet-rdma has the similar issue or not, if so, maybe
> > > > > introduce a locked version of rdma_destroy_id.
> > > > >
> > > > > Thanks,
> > > > > Guoqing
> > > >
> > > > No, I was wrong. I rechecked the code, it's not a valid deadlock, in
> > > > cma_ib_req_handler, the conn_id is newly created in
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2185.
> > > >
> > > > Flush_workqueue will only flush close_work for any other cm_id, but
> > > > not the newly created one conn_id, it has not associated with anything
> > > > yet.
> > > >
> > > > The same applies to nvme-rdma. so it's a false alarm by lockdep.
> > >
> > > Leaving this without fix (proper lock annotation) is not right thing to
> > > do, because everyone who runs rtrs code with LOCKDEP on will have same
> > > "false alarm".
> > >
> > > So I recommend or not to take this patch or write it without LOCKDEP warning.
> > Hi Leon,
> >
> > I'm thinking about the same, do you have a suggestion on how to teach
> > LOCKDEP this is not really a deadlock,
> > I do not know LOCKDEP well.
> Found it myself, we can use lockdep_off
>
> https://elixir.bootlin.com/linux/latest/source/drivers/virtio/virtio_mem.c#L699

My personal experience from internal/external reviews shows that claims
about false alarms in LOCKDEP warnings are almost always wrong.

Thanks

>
> Thanks
>
> >
> > Thanks
> > >
> > > Thanks
> > >
> > > >
> > > > Regards!
> > > > Jack
Jinpu Wang Dec. 16, 2020, 4:42 p.m. UTC | #10
On Fri, Dec 11, 2020 at 5:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Dec 11, 2020 at 05:17:36PM +0100, Jack Wang wrote:
> >    En, the lockdep was complaining about the new conn_id, I will
> >    post the full log if needed next week.  let’s skip this patch for
> >    now, will double check!
>
> That is even more worrysome as the new conn_id already has a different
> lock class.
>
> Jason
This is the dmesg of the LOCKDEP warning; it's from kernel 5.4.77, but
the latest 5.10 behaves the same.

[  500.071552] ======================================================
[  500.071648] WARNING: possible circular locking dependency detected
[  500.071869] 5.4.77-storage+ #35 Tainted: G           O
[  500.071959] ------------------------------------------------------
[  500.072054] kworker/1:1/28 is trying to acquire lock:
[  500.072200] ffff99653a624390 (&id_priv->handler_mutex){+.+.}, at:
rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.072837]
               but task is already holding lock:
[  500.072938] ffff9d18800f7e80
((work_completion)(&sess->close_work)){+.+.}, at:
process_one_work+0x223/0x600
[  500.075642]
               which lock already depends on the new lock.

[  500.075759]
               the existing dependency chain (in reverse order) is:
[  500.075880]
               -> #3 ((work_completion)(&sess->close_work)){+.+.}:
[  500.076062]        process_one_work+0x278/0x600
[  500.076154]        worker_thread+0x2d/0x3d0
[  500.076225]        kthread+0x111/0x130
[  500.076290]        ret_from_fork+0x24/0x30
[  500.076370]
               -> #2 ((wq_completion)rtrs_server_wq){+.+.}:
[  500.076482]        flush_workqueue+0xab/0x4b0
[  500.076565]        rtrs_srv_rdma_cm_handler+0x71d/0x1500 [rtrs_server]
[  500.076674]        cma_ib_req_handler+0x8c4/0x14f0 [rdma_cm]
[  500.076770]        cm_process_work+0x22/0x140 [ib_cm]
[  500.076857]        cm_req_handler+0x900/0xde0 [ib_cm]
[  500.076944]        cm_work_handler+0x136/0x1af2 [ib_cm]
[  500.077025]        process_one_work+0x29f/0x600
[  500.077097]        worker_thread+0x2d/0x3d0
[  500.077164]        kthread+0x111/0x130
[  500.077224]        ret_from_fork+0x24/0x30
[  500.077294]
               -> #1 (&id_priv->handler_mutex/1){+.+.}:
[  500.077409]        __mutex_lock+0x7e/0x950
[  500.077488]        cma_ib_req_handler+0x787/0x14f0 [rdma_cm]
[  500.077582]        cm_process_work+0x22/0x140 [ib_cm]
[  500.077669]        cm_req_handler+0x900/0xde0 [ib_cm]
[  500.077755]        cm_work_handler+0x136/0x1af2 [ib_cm]
[  500.077835]        process_one_work+0x29f/0x600
[  500.077907]        worker_thread+0x2d/0x3d0
[  500.077973]        kthread+0x111/0x130
[  500.078034]        ret_from_fork+0x24/0x30
[  500.078095]
               -> #0 (&id_priv->handler_mutex){+.+.}:
[  500.078196]        __lock_acquire+0x1166/0x1440
[  500.078267]        lock_acquire+0x90/0x170
[  500.078335]        __mutex_lock+0x7e/0x950
[  500.078410]        rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.078498]        rtrs_srv_close_work+0xf2/0x2d0 [rtrs_server]
[  500.078586]        process_one_work+0x29f/0x600
[  500.078662]        worker_thread+0x2d/0x3d0
[  500.078732]        kthread+0x111/0x130
[  500.078793]        ret_from_fork+0x24/0x30
[  500.078859]
               other info that might help us debug this:

[  500.078984] Chain exists of:
                 &id_priv->handler_mutex --> (wq_completion)rtrs_server_wq --> (work_completion)(&sess->close_work)

[  500.079207]  Possible unsafe locking scenario:

[  500.079293]        CPU0                    CPU1
[  500.079358]        ----                    ----
[  500.079358]   lock((work_completion)(&sess->close_work));
[  500.079358]                                lock((wq_completion)rtrs_server_wq);
[  500.079358]                                lock((work_completion)(&sess->close_work));
[  500.079358]   lock(&id_priv->handler_mutex);
[  500.079358]
                *** DEADLOCK ***

[  500.079358] 2 locks held by kworker/1:1/28:
[  500.079358]  #0: ffff99652d281f28
((wq_completion)rtrs_server_wq){+.+.}, at:
process_one_work+0x223/0x600
[  500.079358]  #1: ffff9d18800f7e80
((work_completion)(&sess->close_work)){+.+.}, at:
process_one_work+0x223/0x600
[  500.079358]
               stack backtrace:
[  500.079358] CPU: 1 PID: 28 Comm: kworker/1:1 Tainted: G           O
     5.4.77-storage+ #35
[  500.079358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1ubuntu1 04/01/2014
[  500.079358] Workqueue: rtrs_server_wq rtrs_srv_close_work [rtrs_server]
[  500.079358] Call Trace:
[  500.079358]  dump_stack+0x71/0x9b
[  500.079358]  check_noncircular+0x17d/0x1a0
[  500.079358]  ? __lock_acquire+0x1166/0x1440
[  500.079358]  __lock_acquire+0x1166/0x1440
[  500.079358]  lock_acquire+0x90/0x170
[  500.079358]  ? rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.079358]  ? rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.079358]  __mutex_lock+0x7e/0x950
[  500.079358]  ? rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.079358]  ? find_held_lock+0x2d/0x90
[  500.079358]  ? mark_held_locks+0x49/0x70
[  500.079358]  ? rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.079358]  rdma_destroy_id+0x55/0x230 [rdma_cm]
[  500.079358]  rtrs_srv_close_work+0xf2/0x2d0 [rtrs_server]
[  500.079358]  process_one_work+0x29f/0x600
[  500.079358]  worker_thread+0x2d/0x3d0
[  500.079358]  ? process_one_work+0x600/0x600
[  500.079358]  kthread+0x111/0x130
[  500.079358]  ? kthread_park+0x90/0x90
[  500.079358]  ret_from_fork+0x24/0x30

According to my understanding, in cma_ib_req_handler the conn_id is
newly created in
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222,
and the rdma_cm_id associated with conn_id is passed to
rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.

In rtrs_rdma_connect, the flush_workqueue() will only flush close_work
for other cm_ids, not for the newly created conn_id, which has not been
associated with anything yet.

The same applies to nvme-rdma, so it's a false alarm by lockdep.

Regards!
Leon Romanovsky Dec. 27, 2020, 9:01 a.m. UTC | #11
On Wed, Dec 16, 2020 at 05:42:17PM +0100, Jinpu Wang wrote:
> On Fri, Dec 11, 2020 at 5:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Dec 11, 2020 at 05:17:36PM +0100, Jack Wang wrote:
> > >    En, the lockdep was complaining about the new conn_id, I will
> > >    post the full log if needed next week.  let’s skip this patch for
> > >    now, will double check!
> >
> > That is even more worrysome as the new conn_id already has a different
> > lock class.
> >
> > Jason
> This is the dmesg of the LOCKDEP warning, it's on kernel 5.4.77, but
> the latest 5.10 behaves the same.
>
> [lockdep report snipped; see the full report above]
>
> According to my understanding
> in cma_ib_req_handler, the conn_id is newly created in
> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222.
> And the rdma_cm_id associated with conn_id is passed to
> rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
>
> In rtrs_rdma_connect, we do flush_workqueue will only flush close_work
> for any other cm_id, but
> not the newly created one conn_id, it has not associated with anything yet.

How did you come to the conclusion that the rtrs handler was called
before cma_cm_event_handler()? I'm not so sure about that, and it would
explain the lockdep.

Thanks

>
> The same applies to nvme-rdma. so it's a false alarm by lockdep.
>
> Regards!
Jinpu Wang Jan. 4, 2021, 8:06 a.m. UTC | #12
On Sun, Dec 27, 2020 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Dec 16, 2020 at 05:42:17PM +0100, Jinpu Wang wrote:
> > On Fri, Dec 11, 2020 at 5:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Dec 11, 2020 at 05:17:36PM +0100, Jack Wang wrote:
> > > >    En, the lockdep was complaining about the new conn_id, I will
> > > >    post the full log if needed next week.  let’s skip this patch for
> > > >    now, will double check!
> > >
> > > That is even more worrysome as the new conn_id already has a different
> > > lock class.
> > >
> > > Jason
> > This is the dmesg of the LOCKDEP warning, it's on kernel 5.4.77, but
> > the latest 5.10 behaves the same.
> >
> > [lockdep report snipped; see the full report above]
> >
> > According to my understanding
> > in cma_ib_req_handler, the conn_id is newly created in
> > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222.
> > And the rdma_cm_id associated with conn_id is passed to
> > rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
> >
> > In rtrs_rdma_connect, we do flush_workqueue will only flush close_work
> > for any other cm_id, but
> > not the newly created one conn_id, it has not associated with anything yet.
>
> How did you come to this conclusion that rtrs handler was called before
> cma_cm_event_handler()? I'm not so sure about that and it will explain
> the lockdep.
>
> Thanks
Hi Leon,
I never said that; the call chain here is:
cma_ib_req_handler->cma_cm_event_handler->rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
To repeat myself from the last email:
in cma_ib_req_handler, the conn_id is newly created in
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222,
and the rdma_cm_id associated with conn_id is passed to
rtrs_rdma_connect.

In rtrs_rdma_connect, the flush_workqueue() will only flush close_work
for other cm_ids, not for the newly created conn_id; the rdma_cm_id
passed into rtrs_rdma_connect has not been associated with anything yet.

Hope this is now clear.

Happy New Year!
Leon Romanovsky Jan. 4, 2021, 8:25 a.m. UTC | #13
On Mon, Jan 04, 2021 at 09:06:13AM +0100, Jinpu Wang wrote:
> On Sun, Dec 27, 2020 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Dec 16, 2020 at 05:42:17PM +0100, Jinpu Wang wrote:
> > > On Fri, Dec 11, 2020 at 5:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Fri, Dec 11, 2020 at 05:17:36PM +0100, Jack Wang wrote:
> > > > >    En, the lockdep was complaining about the new conn_id, I will
> > > > >    post the full log if needed next week.  let’s skip this patch for
> > > > >    now, will double check!
> > > >
> > > > That is even more worrysome as the new conn_id already has a different
> > > > lock class.
> > > >
> > > > Jason
> > > This is the dmesg of the LOCKDEP warning, it's on kernel 5.4.77, but
> > > the latest 5.10 behaves the same.
> > >
> > > [lockdep report snipped; see the full report above]
> > >
> > > According to my understanding
> > > in cma_ib_req_handler, the conn_id is newly created in
> > > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222.
> > > And the rdma_cm_id associated with conn_id is passed to
> > > rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
> > >
> > > In rtrs_rdma_connect, we do flush_workqueue will only flush close_work
> > > for any other cm_id, but
> > > not the newly created one conn_id, it has not associated with anything yet.
> >
> > How did you come to this conclusion that rtrs handler was called before
> > cma_cm_event_handler()? I'm not so sure about that and it will explain
> > the lockdep.
> >
> > Thanks
> Hi Leon,
> I never said that, the call chain here is:
> cma_ib_req_handler->cma_cm_event_handler->rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
> Repeat myself in last email:
> in cma_ib_req_handler, the conn_id is newly created in
>  https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222.
> And the rdma_cm_id associated with conn_id is passed to
> rtrs_rdma_connect.
>
> In rtrs_rdma_connect, we do flush_workqueue will only flush close_work
> for any other cm_id, but
> not the newly created one conn_id, the rdma_cm_id passed in
> rtrs_rdma_connect has not associated with anything yet.

This is exactly why I'm not so sure: after the rdma_cm_id returns from
RDMA/core, it will be in that flush_workqueue queue.

>
> Hope this is now clear.
>
> Happy New Year!

Happy New Year too :)
Jinpu Wang Jan. 4, 2021, 11:04 a.m. UTC | #14
On Mon, Jan 4, 2021 at 9:25 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Jan 04, 2021 at 09:06:13AM +0100, Jinpu Wang wrote:
> > On Sun, Dec 27, 2020 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Wed, Dec 16, 2020 at 05:42:17PM +0100, Jinpu Wang wrote:
> > > > On Fri, Dec 11, 2020 at 5:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Fri, Dec 11, 2020 at 05:17:36PM +0100, Jack Wang wrote:
> > > > > >    En, the lockdep was complaining about the new conn_id, I will
> > > > > >    post the full log if needed next week.  let’s skip this patch for
> > > > > >    now, will double check!
> > > > >
> > > > > That is even more worrysome as the new conn_id already has a different
> > > > > lock class.
> > > > >
> > > > > Jason
> > > > This is the dmesg of the LOCKDEP warning, it's on kernel 5.4.77, but
> > > > the latest 5.10 behaves the same.
> > > >
> > > > [lockdep report snipped; see the full report above]
> > > >
> > > > According to my understanding
> > > > in cma_ib_req_handler, the conn_id is newly created in
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222.
> > > > And the rdma_cm_id associated with conn_id is passed to
> > > > rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
> > > >
> > > > In rtrs_rdma_connect, we do flush_workqueue will only flush close_work
> > > > for any other cm_id, but
> > > > not the newly created one conn_id, it has not associated with anything yet.
> > >
> > > How did you come to this conclusion that rtrs handler was called before
> > > cma_cm_event_handler()? I'm not so sure about that and it will explain
> > > the lockdep.
> > >
> > > Thanks
> > Hi Leon,
> > I never said that, the call chain here is:
> > cma_ib_req_handler->cma_cm_event_handler->rtrs_srv_rdma_cm_handler->rtrs_rdma_connect.
> > Repeat myself in last email:
> > in cma_ib_req_handler, the conn_id is newly created in
> >  https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/cma.c#L2222.
> > And the rdma_cm_id associated with conn_id is passed to
> > rtrs_rdma_connect.
> >
> > In rtrs_rdma_connect, we do flush_workqueue will only flush close_work
> > for any other cm_id, but
> > not the newly created one conn_id, the rdma_cm_id passed in
> > rtrs_rdma_connect has not associated with anything yet.
>
> This is exactly why I'm not so sure, after rdma_cm_id returns from
> RDMA/core, it will be in that flush_workqueue queue.

In rtrs_rdma_connect, we do flush_workqueue(rtrs_wq) at the very
beginning, before we associate the rdma_cm_id (conn_id) with an
rtrs_srv_con via rtrs_srv_rdma_cm_handler -> rtrs_rdma_connect ->
create_con -> con->c.cm_id = cm_id. And in rtrs_srv_close_work we do
rdma_destroy_id(con->c.cm_id);

so the rdma_cm_id is not in the flush_workqueue queue yet.
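In other words (heavily trimmed sketch of rtrs_rdma_connect(), keeping
only the lines relevant to this discussion; not a verbatim excerpt):

	static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, ...)
	{
		...
		if (!cid) {
			/* can only wait for teardown of *older* sessions */
			flush_workqueue(rtrs_wq);
		}
		...
		/* the new cm_id is only published after the flush,
		 * inside create_con(): */
		con->c.cm_id = cm_id;
		...
	}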

Thanks!
>
> >
> > Hope this is now clear.
> >
> > Happy New Year!
>
> Happy New Year too :)
diff mbox series

Patch

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index ed4628f032bb..0a2202c28b54 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1791,6 +1791,10 @@  static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
 		err = -ENOMEM;
 		goto reject_w_err;
 	}
+	if (!cid) {
+		/* Let inflight session teardown complete */
+		flush_workqueue(rtrs_wq);
+	}
 	mutex_lock(&srv->paths_mutex);
 	sess = __find_sess(srv, &msg->sess_uuid);
 	if (sess) {