Message ID | 1681108984-2-3-git-send-email-lizhijian@fujitsu.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | rtrs bugfix and cleanups | expand |
On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: > The warning occurs when destroying PD whose reference count is not zero. > > Precodition: clt_path->s.con_num is 2. > So 2 cm connection will be created as below: > CPU0 CPU1 > init_conns { | > create_cm() // a. con[0] created | > | a'. rtrs_clt_rdma_cm_handler() { > | rtrs_rdma_addr_resolved() > | create_con_cq_qp(con); << con[0] > | } > | in this moment, refcnt of PD was increased to 2+ > | > create_cm() // b. cid = 1, failed | > destroy_con_cq_qp() | > rtrs_ib_dev_put() | > dev_free() | > ib_dealloc_pd(dev->ib_pd) << PD | > is destroyed, but refcnt is | > still greater than 0 | > } > > Simply, Here we can avoid this warning by introducing conn own flag to > track if its cleanup should drop the PD. > > ----------------------------------------------- > rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] > Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ > vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod > CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] > Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff > RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 > RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff > RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 > R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 > R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 > FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > rxe_dealloc_pd+0x16/0x20 [rdma_rxe] > ib_dealloc_pd_user+0x4b/0x80 [ib_core] > rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] > destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] > init_path+0x1e7/0x9a0 [rtrs_client] > ? __pfx_autoremove_wake_function+0x10/0x10 > ? lock_is_held_type+0xd7/0x130 > ? rcu_read_lock_sched_held+0x43/0x80 > ? pcpu_alloc+0x3dd/0x7d0 > ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] > rtrs_clt_open+0x24f/0x5a0 [rtrs_client] > ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] > rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > --- > drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ > drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + > 2 files changed, 5 insertions(+) > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > index c2065fc33a56..4c8f42e46e2f 100644 > --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) > return -ENOMEM; > } > clt_path->s.dev_ref = 1; > + con->has_dev = true; > query_fast_reg_mode(clt_path); > wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; > /* > @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) > wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; > /* Shared between connections */ > clt_path->s.dev_ref++; Without looking in the code, I would expect dev_ref from the line above to perform PD protection. > + con->has_dev = true; > max_send_wr = min_t(int, wr_limit, > /* QD * (REQ + RSP + FR REGS or INVS) + drain */ > clt_path->queue_depth * 3 + 1); > @@ -1742,6 +1744,8 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) > con->rsp_ius = NULL; > con->queue_num = 0; > } > + if (!con->has_dev) > + return; > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > rtrs_ib_dev_put(clt_path->s.dev); > clt_path->s.dev = NULL; > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h > index f848c0392d98..970b75633594 100644 > --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h > @@ -75,6 +75,7 @@ struct rtrs_clt_con { > unsigned int cpu; > struct mutex con_mutex; > int cm_err; > + bool has_dev; > }; > > /** > -- > 2.29.2 >
On 4/10/23 20:08, Leon Romanovsky wrote: > On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >> The warning occurs when destroying PD whose reference count is not zero. >> >> Precodition: clt_path->s.con_num is 2. >> So 2 cm connection will be created as below: >> CPU0 CPU1 >> init_conns { | >> create_cm() // a. con[0] created | >> | a'. rtrs_clt_rdma_cm_handler() { >> | rtrs_rdma_addr_resolved() >> | create_con_cq_qp(con); << con[0] >> | } >> | in this moment, refcnt of PD was increased to 2+ >> | >> create_cm() // b. cid = 1, failed | >> destroy_con_cq_qp() | >> rtrs_ib_dev_put() | >> dev_free() | >> ib_dealloc_pd(dev->ib_pd) << PD | >> is destroyed, but refcnt is | >> still greater than 0 | >> } >> >> Simply, Here we can avoid this warning by introducing conn own flag to >> track if its cleanup should drop the PD. >> >> ----------------------------------------------- >> rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] >> Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ >> vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod >> CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >> RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] >> Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff >> RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 >> RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 >> RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff >> RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 >> R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 >> R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 >> FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Call Trace: >> <TASK> >> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] >> ib_dealloc_pd_user+0x4b/0x80 [ib_core] >> rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] >> destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] >> init_path+0x1e7/0x9a0 [rtrs_client] >> ? __pfx_autoremove_wake_function+0x10/0x10 >> ? lock_is_held_type+0xd7/0x130 >> ? rcu_read_lock_sched_held+0x43/0x80 >> ? pcpu_alloc+0x3dd/0x7d0 >> ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] >> rtrs_clt_open+0x24f/0x5a0 [rtrs_client] >> ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] >> rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] >> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> >> --- >> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ >> drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + >> 2 files changed, 5 insertions(+) >> >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >> index c2065fc33a56..4c8f42e46e2f 100644 >> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >> @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >> return -ENOMEM; >> } >> clt_path->s.dev_ref = 1; >> + con->has_dev = true; >> query_fast_reg_mode(clt_path); >> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >> /* >> @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >> /* Shared between connections */ >> clt_path->s.dev_ref++; > Without looking in the code, I would expect dev_ref from the line above > to perform PD protection. Agreed. Thanks, Guoqing
On 10/04/2023 21:10, Guoqing Jiang wrote: > > > On 4/10/23 20:08, Leon Romanovsky wrote: >> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>> The warning occurs when destroying PD whose reference count is not zero. >>> >>> Precodition: clt_path->s.con_num is 2. >>> So 2 cm connection will be created as below: >>> CPU0 CPU1 >>> init_conns { | >>> create_cm() // a. con[0] created | >>> | a'. rtrs_clt_rdma_cm_handler() { >>> | rtrs_rdma_addr_resolved() >>> | create_con_cq_qp(con); << con[0] >>> | } >>> | in this moment, refcnt of PD was increased to 2+ >>> | >>> create_cm() // b. cid = 1, failed | >>> destroy_con_cq_qp() | >>> rtrs_ib_dev_put() | >>> dev_free() | >>> ib_dealloc_pd(dev->ib_pd) << PD | >>> is destroyed, but refcnt is | >>> still greater than 0 | >>> } >>> >>> Simply, Here we can avoid this warning by introducing conn own flag to >>> track if its cleanup should drop the PD. >>> >>> ----------------------------------------------- >>> rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) >>> ------------[ cut here ]------------ >>> WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] >>> Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ >>> vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod >>> CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 >>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >>> RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] >>> Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff >>> RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 >>> RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 >>> RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff >>> RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 >>> R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 >>> R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 >>> FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Call Trace: >>> <TASK> >>> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] >>> ib_dealloc_pd_user+0x4b/0x80 [ib_core] >>> rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] >>> destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] >>> init_path+0x1e7/0x9a0 [rtrs_client] >>> ? __pfx_autoremove_wake_function+0x10/0x10 >>> ? lock_is_held_type+0xd7/0x130 >>> ? rcu_read_lock_sched_held+0x43/0x80 >>> ? pcpu_alloc+0x3dd/0x7d0 >>> ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] >>> rtrs_clt_open+0x24f/0x5a0 [rtrs_client] >>> ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] >>> rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] >>> >>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> >>> --- >>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ >>> drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + >>> 2 files changed, 5 insertions(+) >>> >>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>> index c2065fc33a56..4c8f42e46e2f 100644 >>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>> @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>> return -ENOMEM; >>> } >>> clt_path->s.dev_ref = 1; >>> + con->has_dev = true; >>> query_fast_reg_mode(clt_path); >>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>> /* >>> @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>> /* Shared between connections */ >>> clt_path->s.dev_ref++; >> Without looking in the code, I would expect dev_ref from the line above >> to perform PD protection. > > Agreed. Sorry, i didn't get your point. Do you mean something like this: + con->has_dev = true; clt_path->s.dev_ref++; Thanks Zhijian > > Thanks, > Guoqing
On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: > > > On 10/04/2023 21:10, Guoqing Jiang wrote: > > > > > > On 4/10/23 20:08, Leon Romanovsky wrote: > >> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: > >>> The warning occurs when destroying PD whose reference count is not zero. > >>> > >>> Precodition: clt_path->s.con_num is 2. > >>> So 2 cm connection will be created as below: > >>> CPU0 CPU1 > >>> init_conns { | > >>> create_cm() // a. con[0] created | > >>> | a'. rtrs_clt_rdma_cm_handler() { > >>> | rtrs_rdma_addr_resolved() > >>> | create_con_cq_qp(con); << con[0] > >>> | } > >>> | in this moment, refcnt of PD was increased to 2+ > >>> | > >>> create_cm() // b. cid = 1, failed | > >>> destroy_con_cq_qp() | > >>> rtrs_ib_dev_put() | > >>> dev_free() | > >>> ib_dealloc_pd(dev->ib_pd) << PD | > >>> is destroyed, but refcnt is | > >>> still greater than 0 | > >>> } > >>> > >>> Simply, Here we can avoid this warning by introducing conn own flag to > >>> track if its cleanup should drop the PD. > >>> > >>> ----------------------------------------------- > >>> rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) > >>> ------------[ cut here ]------------ > >>> WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] > >>> Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ > >>> vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod > >>> CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 > >>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > >>> RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] > >>> Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff > >>> RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 > >>> RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 > >>> RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff > >>> RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 > >>> R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 > >>> R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 > >>> FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 > >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>> Call Trace: > >>> <TASK> > >>> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] > >>> ib_dealloc_pd_user+0x4b/0x80 [ib_core] > >>> rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] > >>> destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] > >>> init_path+0x1e7/0x9a0 [rtrs_client] > >>> ? __pfx_autoremove_wake_function+0x10/0x10 > >>> ? lock_is_held_type+0xd7/0x130 > >>> ? rcu_read_lock_sched_held+0x43/0x80 > >>> ? pcpu_alloc+0x3dd/0x7d0 > >>> ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] > >>> rtrs_clt_open+0x24f/0x5a0 [rtrs_client] > >>> ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] > >>> rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] > >>> > >>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > >>> --- > >>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ > >>> drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + > >>> 2 files changed, 5 insertions(+) > >>> > >>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > >>> index c2065fc33a56..4c8f42e46e2f 100644 > >>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c > >>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > >>> @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) > >>> return -ENOMEM; > >>> } > >>> clt_path->s.dev_ref = 1; > >>> + con->has_dev = true; > >>> query_fast_reg_mode(clt_path); > >>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; > >>> /* > >>> @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) > >>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; > >>> /* Shared between connections */ > >>> clt_path->s.dev_ref++; > >> Without looking in the code, I would expect dev_ref from the line above > >> to perform PD protection. > > > > Agreed. > > Sorry, i didn't get your point. Do you mean something like this: > > + con->has_dev = true; > clt_path->s.dev_ref++; No, my point was that clt_path->s.dev_ref > 0 means that has_dev is equal to true, and dev_ref is supposed to protect from early PD destruction. Thanks > > > > Thanks > Zhijian > > > > > Thanks, > > Guoqing
On 11/04/2023 20:26, Leon Romanovsky wrote: > On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >> >> >> On 10/04/2023 21:10, Guoqing Jiang wrote: >>> >>> >>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>> >>>>> Precodition: clt_path->s.con_num is 2. >>>>> So 2 cm connection will be created as below: >>>>> CPU0 CPU1 >>>>> init_conns { | >>>>> create_cm() // a. con[0] created | >>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>> | rtrs_rdma_addr_resolved() >>>>> | create_con_cq_qp(con); << con[0] >>>>> | } >>>>> | in this moment, refcnt of PD was increased to 2+ >>>>> | >>>>> create_cm() // b. cid = 1, failed | >>>>> destroy_con_cq_qp() | >>>>> rtrs_ib_dev_put() | >>>>> dev_free() | >>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>> is destroyed, but refcnt is | >>>>> still greater than 0 | >>>>> } >>>>> >>>>> Simply, Here we can avoid this warning by introducing conn own flag to >>>>> track if its cleanup should drop the PD. >>>>> >>>>> ----------------------------------------------- >>>>> rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) >>>>> ------------[ cut here ]------------ >>>>> WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] >>>>> Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ >>>>> vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod >>>>> CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 >>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >>>>> RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] >>>>> Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff >>>>> RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 >>>>> RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 >>>>> RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff >>>>> RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 >>>>> R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 >>>>> R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 >>>>> FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 >>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Call Trace: >>>>> <TASK> >>>>> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] >>>>> ib_dealloc_pd_user+0x4b/0x80 [ib_core] >>>>> rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] >>>>> destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] >>>>> init_path+0x1e7/0x9a0 [rtrs_client] >>>>> ? __pfx_autoremove_wake_function+0x10/0x10 >>>>> ? lock_is_held_type+0xd7/0x130 >>>>> ? rcu_read_lock_sched_held+0x43/0x80 >>>>> ? pcpu_alloc+0x3dd/0x7d0 >>>>> ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] >>>>> rtrs_clt_open+0x24f/0x5a0 [rtrs_client] >>>>> ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] >>>>> rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] >>>>> >>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> >>>>> --- >>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ >>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + >>>>> 2 files changed, 5 insertions(+) >>>>> >>>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>> index c2065fc33a56..4c8f42e46e2f 100644 >>>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>> @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>>>> return -ENOMEM; >>>>> } >>>>> clt_path->s.dev_ref = 1; >>>>> + con->has_dev = true; >>>>> query_fast_reg_mode(clt_path); >>>>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>>>> /* >>>>> @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>>>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>>>> /* Shared between connections */ >>>>> clt_path->s.dev_ref++; >>>> Without looking in the code, I would expect dev_ref from the line above >>>> to perform PD protection. >>> >>> Agreed. >> >> Sorry, i didn't get your point. Do you mean something like this: >> >> + con->has_dev = true; >> clt_path->s.dev_ref++; > > No, my point was that clt_path->s.dev_ref > 0 means that has_dev is > equal to true, and dev_ref is supposed to protect from early PD > destruction. > > + if (!con->has_dev) > + return; We have already done such protection VVVV > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { <<< each cleanup will decrease clt_path->s.dev_ref > rtrs_ib_dev_put(clt_path->s.dev); <<< when it becomes to 0, PD will be destructed. > clt_path->s.dev = NULL; But they are not equal, clt_path->s.dev_ref could be shared by multiple connections. So in the case con[0] successed and con[1] failed(clt_path->s.dev_ref is 1), the con[1]'s cleanup path(destroy_con_cq_qp) will destroy PD while conn[0] still associates this PD. Thanks Zhijian > Thanks > >> >> >> >> Thanks >> Zhijian >> >>> >>> Thanks, >>> Guoqing
Hi, I take a closer look today. On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: > > On 11/04/2023 20:26, Leon Romanovsky wrote: >> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>> >>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>> >>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>> >>>>>> Precodition: clt_path->s.con_num is 2. >>>>>> So 2 cm connection will be created as below: >>>>>> CPU0 CPU1 >>>>>> init_conns { | >>>>>> create_cm() // a. con[0] created | >>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>> | rtrs_rdma_addr_resolved() >>>>>> | create_con_cq_qp(con); << con[0] >>>>>> | } >>>>>> | in this moment, refcnt of PD was increased to 2+ What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>>>>> | >>>>>> create_cm() // b. cid = 1, failed | >>>>>> destroy_con_cq_qp() | >>>>>> rtrs_ib_dev_put() | >>>>>> dev_free() | >>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>> is destroyed, but refcnt is | >>>>>> still greater than 0 | Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) can't be triggered. Is there other places could increase the refcnt? >>>>>> } >>>>>> >>>>>> Simply, Here we can avoid this warning by introducing conn own flag to >>>>>> track if its cleanup should drop the PD. >>>>>> >>>>>> ----------------------------------------------- >>>>>> rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) >>>>>> ------------[ cut here ]------------ >>>>>> WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] >>>>>> Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ >>>>>> vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod >>>>>> CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 >>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >>>>>> RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] >>>>>> Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff >>>>>> RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 >>>>>> RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 >>>>>> RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff >>>>>> RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 >>>>>> R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 >>>>>> R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 >>>>>> FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>> Call Trace: >>>>>> <TASK> >>>>>> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] >>>>>> ib_dealloc_pd_user+0x4b/0x80 [ib_core] >>>>>> rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] >>>>>> destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] >>>>>> init_path+0x1e7/0x9a0 [rtrs_client] >>>>>> ? __pfx_autoremove_wake_function+0x10/0x10 >>>>>> ? lock_is_held_type+0xd7/0x130 >>>>>> ? rcu_read_lock_sched_held+0x43/0x80 >>>>>> ? pcpu_alloc+0x3dd/0x7d0 >>>>>> ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] >>>>>> rtrs_clt_open+0x24f/0x5a0 [rtrs_client] >>>>>> ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] >>>>>> rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] >>>>>> >>>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> >>>>>> --- >>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ >>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + >>>>>> 2 files changed, 5 insertions(+) >>>>>> >>>>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>>> index c2065fc33a56..4c8f42e46e2f 100644 >>>>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>>> @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>>>>> return -ENOMEM; >>>>>> } >>>>>> clt_path->s.dev_ref = 1; >>>>>> + con->has_dev = true; >>>>>> query_fast_reg_mode(clt_path); >>>>>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>>>>> /* >>>>>> @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>>>>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>>>>> /* Shared between connections */ >>>>>> clt_path->s.dev_ref++; >>>>> Without looking in the code, I would expect dev_ref from the line above >>>>> to perform PD protection. >>>> Agreed. >>> Sorry, i didn't get your point. Do you mean something like this: >>> >>> + con->has_dev = true; >>> clt_path->s.dev_ref++; >> No, my point was that clt_path->s.dev_ref > 0 means that has_dev is >> equal to true, and dev_ref is supposed to protect from early PD >> destruction. >> > >> + if (!con->has_dev) >> + return; > We have already done such protection VVVV > >> if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { <<< each cleanup will decrease clt_path->s.dev_ref >> rtrs_ib_dev_put(clt_path->s.dev); <<< when it becomes to 0, PD will be destructed. >> clt_path->s.dev = NULL; > > But they are not equal, clt_path->s.dev_ref could be shared by multiple connections. > So in the case con[0] successed and con[1] failed(clt_path->s.dev_ref is 1), > the con[1]'s cleanup path(destroy_con_cq_qp) will destroy PD while conn[0] still associates this PD. Then what is the appropriate time to call destroy_con_cq_qp for this scenario? Otherwise there could be memory leak. Thanks, Guoqing
On 13/04/2023 15:35, Guoqing Jiang wrote: > Hi, > > I take a closer look today. > > On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >> >> On 11/04/2023 20:26, Leon Romanovsky wrote: >>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>> >>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>> >>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>> >>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>> So 2 cm connection will be created as below: >>>>>>> CPU0 CPU1 >>>>>>> init_conns { | >>>>>>> create_cm() // a. con[0] created | >>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>> | rtrs_rdma_addr_resolved() >>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>> | } >>>>>>> | in this moment, refcnt of PD was increased to 2+ > > What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. I mean usecnt in struct ib_pd > >>>>>>> | >>>>>>> create_cm() // b. cid = 1, failed | >>>>>>> destroy_con_cq_qp() | >>>>>>> rtrs_ib_dev_put() | >>>>>>> dev_free() | >>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>> is destroyed, but refcnt is | >>>>>>> still greater than 0 | > > Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, > if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > can't be triggered. Is there other places could increase the refcnt? Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its refcnt is still 1. > Then what is the appropriate time to call destroy_con_cq_qp for this scenario? > Otherwise there could be memory leak. we must ensure QP in con[0] is closed before destroying the PD. Currently destroy_con_cq_qp() subroutine will close the opened QP first. Thanks > >>>>>>> } >>>>>>> >>>>>>> Simply, Here we can avoid this warning by introducing conn own flag to >>>>>>> track if its cleanup should drop the PD. >>>>>>> >>>>>>> ----------------------------------------------- >>>>>>> rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) >>>>>>> ------------[ cut here ]------------ >>>>>>> WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] >>>>>>> Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ >>>>>>> vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod >>>>>>> CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 >>>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >>>>>>> RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] >>>>>>> Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff >>>>>>> RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 >>>>>>> RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 >>>>>>> RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff >>>>>>> RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 >>>>>>> R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 >>>>>>> R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 >>>>>>> FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 >>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 >>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>> Call Trace: >>>>>>> <TASK> >>>>>>> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] >>>>>>> ib_dealloc_pd_user+0x4b/0x80 [ib_core] >>>>>>> rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] >>>>>>> destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] >>>>>>> init_path+0x1e7/0x9a0 [rtrs_client] >>>>>>> ? __pfx_autoremove_wake_function+0x10/0x10 >>>>>>> ? lock_is_held_type+0xd7/0x130 >>>>>>> ? rcu_read_lock_sched_held+0x43/0x80 >>>>>>> ? pcpu_alloc+0x3dd/0x7d0 >>>>>>> ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] >>>>>>> rtrs_clt_open+0x24f/0x5a0 [rtrs_client] >>>>>>> ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] >>>>>>> rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] >>>>>>> >>>>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> >>>>>>> --- >>>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ >>>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + >>>>>>> 2 files changed, 5 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>>>> index c2065fc33a56..4c8f42e46e2f 100644 >>>>>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >>>>>>> @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>>>>>> return -ENOMEM; >>>>>>> } >>>>>>> clt_path->s.dev_ref = 1; >>>>>>> + con->has_dev = true; >>>>>>> query_fast_reg_mode(clt_path); >>>>>>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>>>>>> /* >>>>>>> @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) >>>>>>> wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; >>>>>>> /* Shared between connections */ >>>>>>> clt_path->s.dev_ref++; >>>>>> Without looking in the code, I would expect dev_ref from the line above >>>>>> to perform PD protection. >>>>> Agreed. >>>> Sorry, i didn't get your point. Do you mean something like this: >>>> >>>> + con->has_dev = true; >>>> clt_path->s.dev_ref++; >>> No, my point was that clt_path->s.dev_ref > 0 means that has_dev is >>> equal to true, and dev_ref is supposed to protect from early PD >>> destruction. >>> >> >>> + if (!con->has_dev) >>> + return; >> We have already done such protection VVVV >> >>> if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { <<< each cleanup will decrease clt_path->s.dev_ref >>> rtrs_ib_dev_put(clt_path->s.dev); <<< when it becomes to 0, PD will be destructed. >>> clt_path->s.dev = NULL; >> >> But they are not equal, clt_path->s.dev_ref could be shared by multiple connections. >> So in the case con[0] successed and con[1] failed(clt_path->s.dev_ref is 1), >> the con[1]'s cleanup path(destroy_con_cq_qp) will destroy PD while conn[0] still associates this PD. > > Then what is the appropriate time to call destroy_con_cq_qp for this scenario? > Otherwise there could be memory leak. > > Thanks, > Guoqing
On Thu, Apr 13, 2023 at 08:12:15AM +0000, Zhijian Li (Fujitsu) wrote: > > > On 13/04/2023 15:35, Guoqing Jiang wrote: > > Hi, > > > > I take a closer look today. > > > > On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: > >> > >> On 11/04/2023 20:26, Leon Romanovsky wrote: > >>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: > >>>> > >>>> On 10/04/2023 21:10, Guoqing Jiang wrote: > >>>>> > >>>>> On 4/10/23 20:08, Leon Romanovsky wrote: > >>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: > >>>>>>> The warning occurs when destroying PD whose reference count is not zero. > >>>>>>> > >>>>>>> Precodition: clt_path->s.con_num is 2. > >>>>>>> So 2 cm connection will be created as below: > >>>>>>> CPU0 CPU1 > >>>>>>> init_conns { | > >>>>>>> create_cm() // a. con[0] created | > >>>>>>> | a'. rtrs_clt_rdma_cm_handler() { > >>>>>>> | rtrs_rdma_addr_resolved() > >>>>>>> | create_con_cq_qp(con); << con[0] > >>>>>>> | } > >>>>>>> | in this moment, refcnt of PD was increased to 2+ > > > > What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. > > I mean usecnt in struct ib_pd > > > > > > >>>>>>> | > >>>>>>> create_cm() // b. cid = 1, failed | > >>>>>>> destroy_con_cq_qp() | > >>>>>>> rtrs_ib_dev_put() | > >>>>>>> dev_free() | > >>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | > >>>>>>> is destroyed, but refcnt is | > >>>>>>> still greater than 0 | > > > > Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, > > if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > > can't be triggered. Is there other places could increase the refcnt? > > > Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. > > When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since > QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its > refcnt is still 1. Why is refcnt 1 in con[1] destruction phase? It seems to me like a bug. Thanks
On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: > On 13/04/2023 15:35, Guoqing Jiang wrote: >> Hi, >> >> I take a closer look today. >> >> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>> >>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>> So 2 cm connection will be created as below: >>>>>>>> CPU0 CPU1 >>>>>>>> init_conns { | >>>>>>>> create_cm() // a. con[0] created | >>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>> | } >>>>>>>> | in this moment, refcnt of PD was increased to 2+ >> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. > I mean usecnt in struct ib_pd > > > >>>>>>>> | >>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>> destroy_con_cq_qp() | >>>>>>>> rtrs_ib_dev_put() | >>>>>>>> dev_free() | >>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>> is destroyed, but refcnt is | >>>>>>>> still greater than 0 | >> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) The above can't be invoked, right? >> can't be triggered. Is there other places could increase the refcnt? > Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. > > When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since > QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its > refcnt is still 1. I can see the path increase usecnt to 1. rtrs_cq_qp_create -> create_qp -> rdma_create_qp -> ib_create_qp -> create_qp -> ib_qp_usecnt_inc which increases pd->usecnt Where is another place to increase usecnt to 2? >> Then what is the appropriate time to call destroy_con_cq_qp for this scenario? >> Otherwise there could be memory leak. > we must ensure QP in con[0] is closed before destroying the PD. > Currently destroy_con_cq_qp() subroutine will close the opened QP first. Let me try another way, with below change, rtrs_ib_dev_put can't be called from destroy_con_cq_qp, right? + if (!con->has_dev) + return; if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { rtrs_ib_dev_put(clt_path->s.dev); clt_path->s.dev = NULL; Then when will you dealloc pd and free rtrs_ib_dev? Thanks, Guoqing
On 4/13/23 22:40, Guoqing Jiang wrote: > > > On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: >> On 13/04/2023 15:35, Guoqing Jiang wrote: >>> Hi, >>> >>> I take a closer look today. >>> >>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>> >>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>> CPU0 CPU1 >>>>>>>>> init_conns { | >>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>> | } >>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >> I mean usecnt in struct ib_pd >> >> >> >>>>>>>>> | >>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>> destroy_con_cq_qp() | >>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>> dev_free() | >>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>> is destroyed, but refcnt is | >>>>>>>>> still greater than 0 | >>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > > The above can't be invoked, right? > >>> can't be triggered. Is there other places could increase the refcnt? >> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >> >> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >> refcnt is still 1. > > I can see the path increase usecnt to 1. > > rtrs_cq_qp_create -> create_qp > -> rdma_create_qp > -> ib_create_qp > -> create_qp > -> ib_qp_usecnt_inc which increases pd->usecnt > > Where is another place to increase usecnt to 2? > >>> Then what is the appropriate time to call destroy_con_cq_qp for this scenario? >>> Otherwise there could be memory leak. >> we must ensure QP in con[0] is closed before destroying the PD. >> Currently destroy_con_cq_qp() subroutine will close the opened QP first. > > Let me try another way, with below change, rtrs_ib_dev_put can't be called > from destroy_con_cq_qp, right? > > + if (!con->has_dev) > + return; > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > rtrs_ib_dev_put(clt_path->s.dev); > clt_path->s.dev = NULL; > > Then when will you dealloc pd and free rtrs_ib_dev? > > Thanks, > Guoqing I think that wondering into Leon's reference counting is a really bad idea. Currently the assumed rule is that rdma-core keeps its ref counts and rxe keeps its. rxe defers the return from rxe_dealloc_pd() until the rxe ref count drops to zero for that pd, sleeping if necessary. (There is a timeout value set where rxe will return anyway but it will throw a WARN. If the timeout isn't long enough under heavy load we could extend it.) If it doesn't happen, or it happens too soon, then there is a ref count bug in rxe that needs to be fixed. Fixing rxe ref count bugs is hard enough without entangling rdma-core ref counts into the mix. Bob
On 14/04/2023 11:40, Guoqing Jiang wrote: > > > On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: >> On 13/04/2023 15:35, Guoqing Jiang wrote: >>> Hi, >>> >>> I take a closer look today. >>> >>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>> >>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>> CPU0 CPU1 >>>>>>>>> init_conns { | >>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>> | } >>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >> I mean usecnt in struct ib_pd >> >> >> >>>>>>>>> | >>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>> destroy_con_cq_qp() | >>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>> dev_free() | >>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>> is destroyed, but refcnt is | >>>>>>>>> still greater than 0 | >>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > > The above can't be invoked, right? > >>> can't be triggered. Is there other places could increase the refcnt? >> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >> >> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >> refcnt is still 1. > > I can see the path increase usecnt to 1. > > rtrs_cq_qp_create -> create_qp > -> rdma_create_qp > -> ib_create_qp > -> create_qp > -> ib_qp_usecnt_inc which increases pd->usecnt > > Where is another place to increase usecnt to 2? It should be ib_create_qp ... -> rxe_create_qp -> rxe_qp_from_init -> rxe_get(pd) <<< pd's refcnt will be increased. > >>> Then what is the appropriate time to call destroy_con_cq_qp for this scenario? >>> Otherwise there could be memory leak. >> we must ensure QP in con[0] is closed before destroying the PD. >> Currently destroy_con_cq_qp() subroutine will close the opened QP first. > > Let me try another way, with below change, rtrs_ib_dev_put can't be called > from destroy_con_cq_qp, right? Not really, con[0]->has_dev is true, so con[0]'s cleanup will call rtrs_ib_dev_put() Without this patch, when con[1] failed, con[1]'s cleanup will be called first. then call con[0]'s cleanup. After this change, con[1]'s cleanup will not call rtrs_ib_dev_put, but it will be called the later con[0]'s cleanup. Thanks Zhijian > > + if (!con->has_dev) > + return; > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > rtrs_ib_dev_put(clt_path->s.dev); > clt_path->s.dev = NULL; > > Then when will you dealloc pd and free rtrs_ib_dev? > > Thanks, > Guoqing
Hi Zhijian, Guoqing, Leon, Bob First of all, thanks for the patch and discussion. On Fri, Apr 14, 2023 at 7:37 AM Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> wrote: > > > > On 14/04/2023 11:40, Guoqing Jiang wrote: > > > > > > On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: > >> On 13/04/2023 15:35, Guoqing Jiang wrote: > >>> Hi, > >>> > >>> I take a closer look today. > >>> > >>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: > >>>> On 11/04/2023 20:26, Leon Romanovsky wrote: > >>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: > >>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: > >>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: > >>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: > >>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. > >>>>>>>>> > >>>>>>>>> Precodition: clt_path->s.con_num is 2. > >>>>>>>>> So 2 cm connection will be created as below: > >>>>>>>>> CPU0 CPU1 > >>>>>>>>> init_conns { | > >>>>>>>>> create_cm() // a. con[0] created | > >>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { > >>>>>>>>> | rtrs_rdma_addr_resolved() > >>>>>>>>> | create_con_cq_qp(con); << con[0] > >>>>>>>>> | } > >>>>>>>>> | in this moment, refcnt of PD was increased to 2+ > >>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. > >> I mean usecnt in struct ib_pd > >> > >> > >> > >>>>>>>>> | > >>>>>>>>> create_cm() // b. cid = 1, failed | > >>>>>>>>> destroy_con_cq_qp() | > >>>>>>>>> rtrs_ib_dev_put() | > >>>>>>>>> dev_free() | > >>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | > >>>>>>>>> is destroyed, but refcnt is | > >>>>>>>>> still greater than 0 | > >>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, > >>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > > > > The above can't be invoked, right? > > > >>> can't be triggered. Is there other places could increase the refcnt? > >> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. > >> > >> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since > >> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its > >> refcnt is still 1. > > > > I can see the path increase usecnt to 1. > > > > rtrs_cq_qp_create -> create_qp > > -> rdma_create_qp > > -> ib_create_qp > > -> create_qp > > -> ib_qp_usecnt_inc which increases pd->usecnt > > > > Where is another place to increase usecnt to 2? > > It should be > ib_create_qp ... > -> rxe_create_qp > -> rxe_qp_from_init > -> rxe_get(pd) <<< pd's refcnt will be increased. IIUC, this problem is rxe specific, because rxe manipulate refcnt itself? I checked mlx5/mlx4 they do not change the refcnt of pd when create_kernel_qp. So question is then if the bug is on rxe side or rtrs side? Zhijian how do you reproduce the warning? do you inject error explictly? Regards! > > > > > >>> Then what is the appropriate time to call destroy_con_cq_qp for this scenario? > >>> Otherwise there could be memory leak. > >> we must ensure QP in con[0] is closed before destroying the PD. > >> Currently destroy_con_cq_qp() subroutine will close the opened QP first. > > > > Let me try another way, with below change, rtrs_ib_dev_put can't be called > > from destroy_con_cq_qp, right? > > Not really, con[0]->has_dev is true, so con[0]'s cleanup will call rtrs_ib_dev_put() > > Without this patch, when con[1] failed, con[1]'s cleanup will be called first. then call con[0]'s cleanup. > After this change, con[1]'s cleanup will not call rtrs_ib_dev_put, but it will be called the later con[0]'s cleanup. > > > Thanks > Zhijian > > > > > + if (!con->has_dev) > > + return; > > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > > rtrs_ib_dev_put(clt_path->s.dev); > > clt_path->s.dev = NULL; > > > > Then when will you dealloc pd and free rtrs_ib_dev? > > > > Thanks, > > Guoqing
On 4/14/23 13:37, Zhijian Li (Fujitsu) wrote: > > On 14/04/2023 11:40, Guoqing Jiang wrote: >> >> On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: >>> On 13/04/2023 15:35, Guoqing Jiang wrote: >>>> Hi, >>>> >>>> I take a closer look today. >>>> >>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>>> >>>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>>> CPU0 CPU1 >>>>>>>>>> init_conns { | >>>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>>> | } >>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>> I mean usecnt in struct ib_pd >>> >>> >>> >>>>>>>>>> | >>>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>>> destroy_con_cq_qp() | >>>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>>> dev_free() | >>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>>> is destroyed, but refcnt is | >>>>>>>>>> still greater than 0 | >>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >> The above can't be invoked, right? >> >>>> can't be triggered. Is there other places could increase the refcnt? >>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >>> >>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >>> refcnt is still 1. >> I can see the path increase usecnt to 1. >> >> rtrs_cq_qp_create -> create_qp >> -> rdma_create_qp >> -> ib_create_qp >> -> create_qp >> -> ib_qp_usecnt_inc which increases pd->usecnt >> >> Where is another place to increase usecnt to 2? > It should be > ib_create_qp ... > -> rxe_create_qp > -> rxe_qp_from_init > -> rxe_get(pd) <<< pd's refcnt will be increased. Isn't rxe_get just increase elem->ref_cnt? https://elixir.bootlin.com/linux/v6.3-rc6/source/drivers/infiniband/sw/rxe/rxe_pool.c#L240 >>>> Then what is the appropriate time to call destroy_con_cq_qp for this scenario? >>>> Otherwise there could be memory leak. >>> we must ensure QP in con[0] is closed before destroying the PD. >>> Currently destroy_con_cq_qp() subroutine will close the opened QP first. >> Let me try another way, with below change, rtrs_ib_dev_put can't be called >> from destroy_con_cq_qp, right? > Not really, con[0]->has_dev is true, so con[0]'s cleanup will call rtrs_ib_dev_put() > > Without this patch, when con[1] failed, con[1]'s cleanup will be called first. then call con[0]'s cleanup. > After this change, con[1]'s cleanup will not call rtrs_ib_dev_put, but it will be called the later con[0]'s cleanup. But rtrs_ib_dev_put relies on dev_ref, if con[1] returns earlier without decrease dev_ref (it is shared among connections), how rtrs_ib_dev_put can be called? Thanks, Guoqing
On 14/04/2023 14:03, Jinpu Wang wrote: >>> I can see the path increase usecnt to 1. >>> >>> rtrs_cq_qp_create -> create_qp >>> -> rdma_create_qp >>> -> ib_create_qp >>> -> create_qp >>> -> ib_qp_usecnt_inc which increases pd->usecnt >>> >>> Where is another place to increase usecnt to 2? >> It should be >> ib_create_qp ... >> -> rxe_create_qp >> -> rxe_qp_from_init >> -> rxe_get(pd) <<< pd's refcnt will be increased. > IIUC, this problem is rxe specific, because rxe manipulate refcnt > itself? I checked mlx5/mlx4 they do not change the refcnt of pd when > create_kernel_qp. > > So question is then if the bug is on rxe side or rtrs side? > > Zhijian how do you reproduce the warning? do you inject error explictly? # cat rnbd-self.sh #!/bin/bash /root/rpma/tools/config_softroce.sh eth0 modprobe rnbd_server modprobe rnbd_client while true; do echo "sessname=xyz path=ip:<server-ip> device_path=/dev/nvme0n1" > /sys/devices/virtual/rnbd-client/ctl/map_device for i in /sys/block/rnbd*/rnbd/unmap_device do echo "normal" > $i done done
On 14/04/2023 14:04, Guoqing Jiang wrote: > > > On 4/14/23 13:37, Zhijian Li (Fujitsu) wrote: >> >> On 14/04/2023 11:40, Guoqing Jiang wrote: >>> >>> On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: >>>> On 13/04/2023 15:35, Guoqing Jiang wrote: >>>>> Hi, >>>>> >>>>> I take a closer look today. >>>>> >>>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>>>> >>>>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>>>> CPU0 CPU1 >>>>>>>>>>> init_conns { | >>>>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>>>> | } >>>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>>> I mean usecnt in struct ib_pd >>>> >>>> >>>> >>>>>>>>>>> | >>>>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>>>> destroy_con_cq_qp() | >>>>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>>>> dev_free() | >>>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>>>> is destroyed, but refcnt is | >>>>>>>>>>> still greater than 0 | >>>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >>> The above can't be invoked, right? >>> >>>>> can't be triggered. Is there other places could increase the refcnt? >>>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >>>> >>>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >>>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >>>> refcnt is still 1. >>> I can see the path increase usecnt to 1. >>> >>> rtrs_cq_qp_create -> create_qp >>> -> rdma_create_qp >>> -> ib_create_qp >>> -> create_qp >>> -> ib_qp_usecnt_inc which increases pd->usecnt >>> >>> Where is another place to increase usecnt to 2? >> It should be >> ib_create_qp ... >> -> rxe_create_qp >> -> rxe_qp_from_init >> -> rxe_get(pd) <<< pd's refcnt will be increased. > > Isn't rxe_get just increase elem->ref_cnt? Yes, that's true. > > https://elixir.bootlin.com/linux/v6.3-rc6/source/drivers/infiniband/sw/rxe/rxe_pool.c#L240 > >>>>> Then what is the appropriate time to call destroy_con_cq_qp for this scenario? >>>>> Otherwise there could be memory leak. >>>> we must ensure QP in con[0] is closed before destroying the PD. >>>> Currently destroy_con_cq_qp() subroutine will close the opened QP first. >>> Let me try another way, with below change, rtrs_ib_dev_put can't be called >>> from destroy_con_cq_qp, right? >> Not really, con[0]->has_dev is true, so con[0]'s cleanup will call rtrs_ib_dev_put() >> >> Without this patch, when con[1] failed, con[1]'s cleanup will be called first. then call con[0]'s cleanup. >> After this change, con[1]'s cleanup will not call rtrs_ib_dev_put, but it will be called the later con[0]'s cleanup. > > But rtrs_ib_dev_put relies on dev_ref, if con[1] returns earlier without decrease dev_ref > (it is shared among connections), how rtrs_ib_dev_put can be called? > we must ensure each connections that take dev_ref decrease dev_ref during its cleanup path. So the new flag con->has_dev added to track if the con has taken the dev_ref. > Thanks, > Guoqing
在 2023/4/13 21:24, Leon Romanovsky 写道: > On Thu, Apr 13, 2023 at 08:12:15AM +0000, Zhijian Li (Fujitsu) wrote: >> >> >> On 13/04/2023 15:35, Guoqing Jiang wrote: >>> Hi, >>> >>> I take a closer look today. >>> >>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>> >>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>> >>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>> >>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>> >>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>> CPU0 CPU1 >>>>>>>>> init_conns { | >>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>> | } >>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>> >>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >> >> I mean usecnt in struct ib_pd >> >> >> >>> >>>>>>>>> | >>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>> destroy_con_cq_qp() | >>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>> dev_free() | >>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>> is destroyed, but refcnt is | >>>>>>>>> still greater than 0 | >>> >>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >>> can't be triggered. Is there other places could increase the refcnt? >> >> >> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >> >> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >> refcnt is still 1. > > Why is refcnt 1 in con[1] destruction phase? It seems to me like a bug. Agree. We should find out why refcnt 1 and fix this problem. Zhu Yanjun > > Thanks
On 14/04/2023 23:58, Zhu Yanjun wrote: > 在 2023/4/13 21:24, Leon Romanovsky 写道: >> On Thu, Apr 13, 2023 at 08:12:15AM +0000, Zhijian Li (Fujitsu) wrote: >>> >>> >>> On 13/04/2023 15:35, Guoqing Jiang wrote: >>>> Hi, >>>> >>>> I take a closer look today. >>>> >>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>>> >>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>>> >>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>>> >>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>>> >>>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>>> CPU0 CPU1 >>>>>>>>>> init_conns { | >>>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>>> | } >>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>>> >>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>> >>> I mean usecnt in struct ib_pd >>> >>> >>> >>>> >>>>>>>>>> | >>>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>>> destroy_con_cq_qp() | >>>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>>> dev_free() | >>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>>> is destroyed, but refcnt is | >>>>>>>>>> still greater than 0 | >>>> >>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >>>> can't be triggered. Is there other places could increase the refcnt? >>> >>> >>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >>> >>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >>> refcnt is still 1. >> >> Why is refcnt 1 in con[1] destruction phase? It seems to me like a bug. > + if (!con->has_dev) > + return; > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > rtrs_ib_dev_put(clt_path->s.dev); > clt_path->s.dev = NULL; Currently, without this patch: 1. PD and clt_path->s.dev are shared among connections. 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when clt_path->s.dev become zero, it will destroy PD. 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. Thanks Zhijian > Agree. We should find out why refcnt 1 and fix this problem. > > Zhu Yanjun >> >> Thanks >
On 4/14/23 18:09, Zhijian Li (Fujitsu) wrote: > > On 14/04/2023 14:04, Guoqing Jiang wrote: >> >> On 4/14/23 13:37, Zhijian Li (Fujitsu) wrote: >>> On 14/04/2023 11:40, Guoqing Jiang wrote: >>>> On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: >>>>> On 13/04/2023 15:35, Guoqing Jiang wrote: >>>>>> Hi, >>>>>> >>>>>> I take a closer look today. >>>>>> >>>>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>>>>> >>>>>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>>>>> CPU0 CPU1 >>>>>>>>>>>> init_conns { | >>>>>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>>>>> | } >>>>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>>>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>>>> I mean usecnt in struct ib_pd >>>>> >>>>>>>>>>>> | >>>>>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>>>>> destroy_con_cq_qp() | >>>>>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>>>>> dev_free() | >>>>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>>>>> is destroyed, but refcnt is | >>>>>>>>>>>> still greater than 0 | >>>>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>>>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >>>> The above can't be invoked, right? >>>> >>>>>> can't be triggered. Is there other places could increase the refcnt? >>>>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >>>>> >>>>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >>>>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >>>>> refcnt is still 1. >>>> I can see the path increase usecnt to 1. >>>> >>>> rtrs_cq_qp_create -> create_qp >>>> -> rdma_create_qp >>>> -> ib_create_qp >>>> -> create_qp >>>> -> ib_qp_usecnt_inc which increases pd->usecnt >>>> >>>> Where is another place to increase usecnt to 2? >>> It should be >>> ib_create_qp ... >>> -> rxe_create_qp >>> -> rxe_qp_from_init >>> -> rxe_get(pd) <<< pd's refcnt will be increased. >> Isn't rxe_get just increase elem->ref_cnt? > Yes, that's true. I am confused, does increase ref_cnt equal to increase usecnt? If not, then where is another place to increase usecnt to 2? BTW, I traced with 6.3-rc5, seems pd's usecnt is only increase once after create one connection. [ 6941.525088] in init_conns 2353 con_num=3 [ 6941.525732] in create_con_cq_qp 1648 [ 6941.525944] in rtrs_cq_qp_create 311 con->cid=0 path->dev->ib_pd->usecnt=1 [ 6941.532460] in create_con_cq_qp 1648 [ 6941.532746] in rtrs_cq_qp_create 311 con->cid=1 path->dev->ib_pd->usecnt=2 [ 6941.533183] in create_con_cq_qp 1648 [ 6941.533464] in rtrs_cq_qp_create 311 con->cid=2 path->dev->ib_pd->usecnt=3 [ 6941.533685] in init_conns 2365, clt_path->s.dev->ib_pd->usecnt=3 [ 6941.535680] in init_conns 2371, clt_path->s.dev->ib_pd->usecnt=515 Thanks, Guoqing
On Mon, Apr 17, 2023 at 02:18:24AM +0000, Zhijian Li (Fujitsu) wrote: > > > On 14/04/2023 23:58, Zhu Yanjun wrote: > > 在 2023/4/13 21:24, Leon Romanovsky 写道: > >> On Thu, Apr 13, 2023 at 08:12:15AM +0000, Zhijian Li (Fujitsu) wrote: > >>> > >>> > >>> On 13/04/2023 15:35, Guoqing Jiang wrote: > >>>> Hi, > >>>> > >>>> I take a closer look today. > >>>> > >>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: > >>>>> > >>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: > >>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: > >>>>>>> > >>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: > >>>>>>>> > >>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: > >>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: > >>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. > >>>>>>>>>> > >>>>>>>>>> Precodition: clt_path->s.con_num is 2. > >>>>>>>>>> So 2 cm connection will be created as below: > >>>>>>>>>> CPU0 CPU1 > >>>>>>>>>> init_conns { | > >>>>>>>>>> create_cm() // a. con[0] created | > >>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { > >>>>>>>>>> | rtrs_rdma_addr_resolved() > >>>>>>>>>> | create_con_cq_qp(con); << con[0] > >>>>>>>>>> | } > >>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ > >>>> > >>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. > >>> > >>> I mean usecnt in struct ib_pd > >>> > >>> > >>> > >>>> > >>>>>>>>>> | > >>>>>>>>>> create_cm() // b. cid = 1, failed | > >>>>>>>>>> destroy_con_cq_qp() | > >>>>>>>>>> rtrs_ib_dev_put() | > >>>>>>>>>> dev_free() | > >>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | > >>>>>>>>>> is destroyed, but refcnt is | > >>>>>>>>>> still greater than 0 | > >>>> > >>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, > >>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > >>>> can't be triggered. Is there other places could increase the refcnt? > >>> > >>> > >>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. > >>> > >>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since > >>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its > >>> refcnt is still 1. > >> > >> Why is refcnt 1 in con[1] destruction phase? It seems to me like a bug. > > > > > + if (!con->has_dev) > > + return; > > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > > rtrs_ib_dev_put(clt_path->s.dev); > > clt_path->s.dev = NULL; > > Currently, without this patch: > 1. PD and clt_path->s.dev are shared among connections. > 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when > clt_path->s.dev become zero, it will destroy PD. > 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. So please fix it by making sure that failure to create con[1] will release resources which were allocated. If con[1] didn't increase s.dev_ref, it shouldn't decrease it either. Thanks > > > Thanks > Zhijian > > > Agree. We should find out why refcnt 1 and fix this problem. > > > > > > > > Zhu Yanjun > >> > >> Thanks > >
On 17/04/2023 11:08, Guoqing Jiang wrote: > > > On 4/14/23 18:09, Zhijian Li (Fujitsu) wrote: >> >> On 14/04/2023 14:04, Guoqing Jiang wrote: >>> >>> On 4/14/23 13:37, Zhijian Li (Fujitsu) wrote: >>>> On 14/04/2023 11:40, Guoqing Jiang wrote: >>>>> On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote: >>>>>> On 13/04/2023 15:35, Guoqing Jiang wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I take a closer look today. >>>>>>> >>>>>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>>>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>>>>>> >>>>>>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>>>>>> CPU0 CPU1 >>>>>>>>>>>>> init_conns { | >>>>>>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>>>>>> | } >>>>>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>>>>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>>>>> I mean usecnt in struct ib_pd >>>>>> >>>>>>>>>>>>> | >>>>>>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>>>>>> destroy_con_cq_qp() | >>>>>>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>>>>>> dev_free() | >>>>>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>>>>>> is destroyed, but refcnt is | >>>>>>>>>>>>> still greater than 0 | >>>>>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>>>>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >>>>> The above can't be invoked, right? >>>>> >>>>>>> can't be triggered. Is there other places could increase the refcnt? >>>>>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >>>>>> >>>>>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >>>>>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >>>>>> refcnt is still 1. >>>>> I can see the path increase usecnt to 1. >>>>> >>>>> rtrs_cq_qp_create -> create_qp >>>>> -> rdma_create_qp >>>>> -> ib_create_qp >>>>> -> create_qp >>>>> -> ib_qp_usecnt_inc which increases pd->usecnt >>>>> >>>>> Where is another place to increase usecnt to 2? >>>> It should be >>>> ib_create_qp ... >>>> -> rxe_create_qp >>>> -> rxe_qp_from_init >>>> -> rxe_get(pd) <<< pd's refcnt will be increased. >>> Isn't rxe_get just increase elem->ref_cnt? >> Yes, that's true. > > I am confused, does increase ref_cnt equal to increase usecnt? I need to apologize for my mistake. I have been referring to the elem.ref_cnt of the rxe driver as the refcnt of PD. > If not, then where is another place to increase usecnt to 2? > > BTW, I traced with 6.3-rc5, seems pd's usecnt is only increase once > after create one connection. And the warning mentioned above it also pointed to the PD's elem.ref_cnt. > > [ 6941.525088] in init_conns 2353 con_num=3 > [ 6941.525732] in create_con_cq_qp 1648 > [ 6941.525944] in rtrs_cq_qp_create 311 con->cid=0 path->dev->ib_pd->usecnt=1 > [ 6941.532460] in create_con_cq_qp 1648 > [ 6941.532746] in rtrs_cq_qp_create 311 con->cid=1 path->dev->ib_pd->usecnt=2 > [ 6941.533183] in create_con_cq_qp 1648 > [ 6941.533464] in rtrs_cq_qp_create 311 con->cid=2 path->dev->ib_pd->usecnt=3 > [ 6941.533685] in init_conns 2365, clt_path->s.dev->ib_pd->usecnt=3 > [ 6941.535680] in init_conns 2371, clt_path->s.dev->ib_pd->usecnt=515 Thanks Zhijian below is a piece of code that i used to debug this issue. --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1730,15 +1730,39 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) return err; } +struct rxe_pool; +struct rxe_pool_elem { + struct rxe_pool *pool; + void *obj; + struct kref ref_cnt; + struct list_head list; + struct completion complete; + u32 index; +}; + +struct rxe_pd { + struct ib_pd ibpd; + struct rxe_pool_elem elem; +}; + +static inline struct rxe_pd *to_rpd(struct ib_pd *pd) +{ + return pd ? container_of(pd, struct rxe_pd, ibpd) : NULL; +} + +#define rxe_read(obj) kref_read(&(obj)->elem.ref_cnt) static void destroy_con_cq_qp(struct rtrs_clt_con *con) { struct rtrs_clt_path *clt_path = to_clt_path(con->c.path); + struct rtrs_ib_dev *dev = clt_path->s.dev; + struct rxe_pd *pd = to_rpd(dev->ib_pd); /* * Be careful here: destroy_con_cq_qp() can be called even * create_con_cq_qp() failed, see comments there. */ lockdep_assert_held(&con->con_mutex); + rtrs_info(clt_path->clt, "%s: clt_path->s.dev_ref: %d, pd %px, ref: %d\n", __func__, clt_path->s.dev_ref, &pd->elem, rxe_read(pd)); rtrs_cq_qp_destroy(&con->c); if (con->rsp_ius) { rtrs_iu_free(con->rsp_ius, clt_path->s.dev->ib_dev, @@ -1746,7 +1770,8 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) con->rsp_ius = NULL; con->queue_num = 0; } + rtrs_info(clt_path->clt, "%s: clt_path->s.dev_ref: %d, pd %px, ref: %d\n", __func__, clt_path->s.dev_ref, &pd->elem, rxe_read(pd)); if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { rtrs_ib_dev_put(clt_path->s.dev); clt_path->s.dev = NULL; > > Thanks, > Guoqing
On 18/04/2023 02:04, Leon Romanovsky wrote: > On Mon, Apr 17, 2023 at 02:18:24AM +0000, Zhijian Li (Fujitsu) wrote: >> >> >> On 14/04/2023 23:58, Zhu Yanjun wrote: >>> 在 2023/4/13 21:24, Leon Romanovsky 写道: >>>> On Thu, Apr 13, 2023 at 08:12:15AM +0000, Zhijian Li (Fujitsu) wrote: >>>>> >>>>> >>>>> On 13/04/2023 15:35, Guoqing Jiang wrote: >>>>>> Hi, >>>>>> >>>>>> I take a closer look today. >>>>>> >>>>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: >>>>>>> >>>>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: >>>>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: >>>>>>>>> >>>>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: >>>>>>>>>> >>>>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: >>>>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: >>>>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. >>>>>>>>>>>> >>>>>>>>>>>> Precodition: clt_path->s.con_num is 2. >>>>>>>>>>>> So 2 cm connection will be created as below: >>>>>>>>>>>> CPU0 CPU1 >>>>>>>>>>>> init_conns { | >>>>>>>>>>>> create_cm() // a. con[0] created | >>>>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { >>>>>>>>>>>> | rtrs_rdma_addr_resolved() >>>>>>>>>>>> | create_con_cq_qp(con); << con[0] >>>>>>>>>>>> | } >>>>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ >>>>>> >>>>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. >>>>> >>>>> I mean usecnt in struct ib_pd >>>>> >>>>> >>>>> >>>>>> >>>>>>>>>>>> | >>>>>>>>>>>> create_cm() // b. cid = 1, failed | >>>>>>>>>>>> destroy_con_cq_qp() | >>>>>>>>>>>> rtrs_ib_dev_put() | >>>>>>>>>>>> dev_free() | >>>>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | >>>>>>>>>>>> is destroyed, but refcnt is | >>>>>>>>>>>> still greater than 0 | >>>>>> >>>>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, >>>>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) >>>>>> can't be triggered. Is there other places could increase the refcnt? >>>>> >>>>> >>>>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. >>>>> >>>>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since >>>>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its >>>>> refcnt is still 1. >>>> >>>> Why is refcnt 1 in con[1] destruction phase? It seems to me like a bug. >> >> >> >>> + if (!con->has_dev) >>> + return; >>> if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { >>> rtrs_ib_dev_put(clt_path->s.dev); >>> clt_path->s.dev = NULL; >> >> Currently, without this patch: >> 1. PD and clt_path->s.dev are shared among connections. >> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() >> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when >> clt_path->s.dev become zero, it will destroy PD. >> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. > > So please fix it by making sure that failure to create con[1] will > release resources which were allocated. If con[1] didn't increase > s.dev_ref, it shouldn't decrease it either. You are right, the current patch did exactly that. It introduced a con owning flag 'has_dev' to indicate whether this con has taken s.dev. so that its cleanup phase will only decrease its s.dev properly. Thanks Zhijian > > Thanks > >> >> >> Thanks >> Zhijian >> >>> Agree. We should find out why refcnt 1 and fix this problem. >> >> >> >> >>> >>> Zhu Yanjun >>>> >>>> Thanks >>>
On Tue, Apr 18, 2023 at 07:04:00AM +0000, Zhijian Li (Fujitsu) wrote: > > > On 18/04/2023 02:04, Leon Romanovsky wrote: > > On Mon, Apr 17, 2023 at 02:18:24AM +0000, Zhijian Li (Fujitsu) wrote: > >> > >> > >> On 14/04/2023 23:58, Zhu Yanjun wrote: > >>> 在 2023/4/13 21:24, Leon Romanovsky 写道: > >>>> On Thu, Apr 13, 2023 at 08:12:15AM +0000, Zhijian Li (Fujitsu) wrote: > >>>>> > >>>>> > >>>>> On 13/04/2023 15:35, Guoqing Jiang wrote: > >>>>>> Hi, > >>>>>> > >>>>>> I take a closer look today. > >>>>>> > >>>>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote: > >>>>>>> > >>>>>>> On 11/04/2023 20:26, Leon Romanovsky wrote: > >>>>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote: > >>>>>>>>> > >>>>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote: > >>>>>>>>>> > >>>>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote: > >>>>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote: > >>>>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero. > >>>>>>>>>>>> > >>>>>>>>>>>> Precodition: clt_path->s.con_num is 2. > >>>>>>>>>>>> So 2 cm connection will be created as below: > >>>>>>>>>>>> CPU0 CPU1 > >>>>>>>>>>>> init_conns { | > >>>>>>>>>>>> create_cm() // a. con[0] created | > >>>>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() { > >>>>>>>>>>>> | rtrs_rdma_addr_resolved() > >>>>>>>>>>>> | create_con_cq_qp(con); << con[0] > >>>>>>>>>>>> | } > >>>>>>>>>>>> | in this moment, refcnt of PD was increased to 2+ > >>>>>> > >>>>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref. > >>>>> > >>>>> I mean usecnt in struct ib_pd > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>>>>>>>> | > >>>>>>>>>>>> create_cm() // b. cid = 1, failed | > >>>>>>>>>>>> destroy_con_cq_qp() | > >>>>>>>>>>>> rtrs_ib_dev_put() | > >>>>>>>>>>>> dev_free() | > >>>>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD | > >>>>>>>>>>>> is destroyed, but refcnt is | > >>>>>>>>>>>> still greater than 0 | > >>>>>> > >>>>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add, > >>>>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt) > >>>>>> can't be triggered. Is there other places could increase the refcnt? > >>>>> > >>>>> > >>>>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased. > >>>>> > >>>>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since > >>>>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its > >>>>> refcnt is still 1. > >>>> > >>>> Why is refcnt 1 in con[1] destruction phase? It seems to me like a bug. > >> > >> > >> > >>> + if (!con->has_dev) > >>> + return; > >>> if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > >>> rtrs_ib_dev_put(clt_path->s.dev); > >>> clt_path->s.dev = NULL; > >> > >> Currently, without this patch: > >> 1. PD and clt_path->s.dev are shared among connections. > >> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > >> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when > >> clt_path->s.dev become zero, it will destroy PD. > >> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. > > > > So please fix it by making sure that failure to create con[1] will > > release resources which were allocated. If con[1] didn't increase > > s.dev_ref, it shouldn't decrease it either. > > You are right, the current patch did exactly that. > It introduced a con owning flag 'has_dev' to indicate whether this con has taken s.dev. > so that its cleanup phase will only decrease its s.dev properly. The has_dev is a workaround and not a solution. In proper error unwind sequence, you won't need extra flag. Thanks > > Thanks > Zhijian > > > > > > Thanks > > > >> > >> > >> Thanks > >> Zhijian > >> > >>> Agree. We should find out why refcnt 1 and fix this problem. > >> > >> > >> > >> > >>> > >>> Zhu Yanjun > >>>> > >>>> Thanks > >>>
Leon, Guoqing On 18/04/2023 15:57, Leon Romanovsky wrote: >>>> Currently, without this patch: >>>> 1. PD and clt_path->s.dev are shared among connections. >>>> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() >>>> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when >>>> clt_path->s.dev become zero, it will destroy PD. >>>> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. >>> So please fix it by making sure that failure to create con[1] will >>> release resources which were allocated. If con[1] didn't increase >>> s.dev_ref, it shouldn't decrease it either. >> You are right, the current patch did exactly that. >> It introduced a con owning flag 'has_dev' to indicate whether this con has taken s.dev. >> so that its cleanup phase will only decrease its s.dev properly. > The has_dev is a workaround and not a solution. In proper error unwind > sequence, you won't need extra flag. > > Thanks > how about below changes commit 61dba725384e226d472b8142d70d40d4103df87a Author: Li Zhijian <lizhijian@fujitsu.com> Date: Wed Apr 19 17:42:26 2023 +0800 RDMA/rtrs: Fix rxe_dealloc_pd warning con[0] always sets s.dev to 1, correspondingly, we should let it to release the last dev. Previously, 1. PD and clt_path->s.dev are shared among connections. 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when clt_path->s.dev become zero, it will destroy PD. 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. The warning occurs when destroying PD whose reference count is not zero. Precodition: clt_path->s.con_num is 2. So 2 cm connection will be created as below: CPU0 CPU1 init_conns { | create_cm() // a. con[0] created | | a'. rtrs_clt_rdma_cm_handler() { | rtrs_rdma_addr_resolved() | create_con_cq_qp(con); << con[0] | } | in this moment, refcnt of PD was increased to 2+ | create_cm() // b. cid = 1, failed | destroy_con_cq_qp() | rtrs_ib_dev_put() | dev_free() | ib_dealloc_pd(dev->ib_pd) << PD | is destroyed, but refcnt is | still greater than 0 | } diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 80abf45a197a..1eb652dedca3 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1743,6 +1743,15 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) con->rsp_ius = NULL; con->queue_num = 0; } + + /* + * Every con will try to decreased s.dev_ref, but we should + * reserve the last s.dev_ref for con[0]. In case con[1+]'s + * cleanup phase call rtrs_ib_dev_put(clt_path->s.dev) early. + */ + if (con->c.cid != 0 && clt_path->s.dev_ref == 1) + return; + if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { rtrs_ib_dev_put(clt_path->s.dev); clt_path->s.dev = NULL;
On Wed, Apr 19, 2023 at 11:53 AM Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> wrote: > > Leon, Guoqing > > > On 18/04/2023 15:57, Leon Romanovsky wrote: > >>>> Currently, without this patch: > >>>> 1. PD and clt_path->s.dev are shared among connections. > >>>> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > >>>> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when > >>>> clt_path->s.dev become zero, it will destroy PD. > >>>> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. > >>> So please fix it by making sure that failure to create con[1] will > >>> release resources which were allocated. If con[1] didn't increase > >>> s.dev_ref, it shouldn't decrease it either. > >> You are right, the current patch did exactly that. > >> It introduced a con owning flag 'has_dev' to indicate whether this con has taken s.dev. > >> so that its cleanup phase will only decrease its s.dev properly. > > The has_dev is a workaround and not a solution. In proper error unwind > > sequence, you won't need extra flag. > > > > Thanks > > > > how about below changes > > commit 61dba725384e226d472b8142d70d40d4103df87a > Author: Li Zhijian <lizhijian@fujitsu.com> > Date: Wed Apr 19 17:42:26 2023 +0800 > > RDMA/rtrs: Fix rxe_dealloc_pd warning > > con[0] always sets s.dev to 1, correspondingly, we should let it to > release the last dev. > > Previously, > 1. PD and clt_path->s.dev are shared among connections. > 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when > clt_path->s.dev become zero, it will destroy PD. > 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, > but it try to decreased clt_path->s.dev <<< it's wrong to do that. > > The warning occurs when destroying PD whose reference count is not zero. > Precodition: clt_path->s.con_num is 2. > So 2 cm connection will be created as below: > CPU0 CPU1 > init_conns { | > create_cm() // a. con[0] created | > | a'. rtrs_clt_rdma_cm_handler() { > | rtrs_rdma_addr_resolved() > | create_con_cq_qp(con); << con[0] > | } > | in this moment, refcnt of PD was increased to 2+ > | > create_cm() // b. cid = 1, failed | > destroy_con_cq_qp() | > rtrs_ib_dev_put() | > dev_free() | > ib_dealloc_pd(dev->ib_pd) << PD | > is destroyed, but refcnt is | > still greater than 0 | > } > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > index 80abf45a197a..1eb652dedca3 100644 > --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > @@ -1743,6 +1743,15 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) > con->rsp_ius = NULL; > con->queue_num = 0; > } > + > + /* > + * Every con will try to decreased s.dev_ref, but we should > + * reserve the last s.dev_ref for con[0]. In case con[1+]'s > + * cleanup phase call rtrs_ib_dev_put(clt_path->s.dev) early. > + */ > + if (con->c.cid != 0 && clt_path->s.dev_ref == 1) > + return; > + > if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { > rtrs_ib_dev_put(clt_path->s.dev); > clt_path->s.dev = NULL; I run a regression test in our test env, it triggers a warning on 1681 if (WARN_ON(clt_path->s.dev)) [ 1333.042633] ------------[ cut here ]------------ [ 1333.042650] WARNING: CPU: 8 PID: 559 at /root/kernel-test/ibnbd2/rtrs/rtrs-clt.c:1681 rtrs_clt_rdma_cm_handler+0x864/0x8a0 [rtrs_client] [ 1333.042651] Modules linked in: loop rnbd_client(O) rtrs_client(O) rtrs_core(O) kvm_amd kvm input_leds led_class irqbypass crc32_pclmul aesni_intel sp5100_tco evdev libaes watchdog sg k10temp crypto_simd fam15h_power ipmi_si serio_raw cryptd ipmi_devintf glue_helper ipmi_msghandler acpi_cpufreq button ib_ipoib ib_umad null_blk brd rdma_cm iw_cm ib_cm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear mlx4_ib md_mod ib_uverbs ib_core sd_mod t10_pi crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul crct10dif_common crc32c_intel igb libata usb_storage psmouse i2c_piix4 i2c_algo_bit mlx4_core dca scsi_mod i2c_core ptp pps_core [ 1333.042737] CPU: 8 PID: 559 Comm: kworker/u128:1 Tainted: G O 5.10.136-pserver-develop-5.10 #257 [ 1333.042738] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 [ 1333.042752] Workqueue: rdma_cm cma_work_handler [rdma_cm] [ 1333.042758] RIP: 0010:rtrs_clt_rdma_cm_handler+0x864/0x8a0 [rtrs_client] [ 1333.042761] Code: ff bb ea ff ff ff e8 db a5 24 fc 49 8d b4 24 10 01 00 00 89 da 48 c7 c7 40 93 5b c0 e8 4b 47 21 fc 4d 8b 65 00 e9 15 fe ff ff <0f> 0b 4c 89 ff bb ea ff ff ff e8 ad a5 24 fc eb d0 0f 0b 4c 89 ff [ 1333.042763] RSP: 0018:ffffaff68e57bdb0 EFLAGS: 00010286 [ 1333.042765] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9eddc0051420 [ 1333.042767] RDX: ffff9ee4ef716e40 RSI: ffff9f14ea288f30 RDI: ffff9eddc88db240 [ 1333.042768] RBP: ffffaff68e57be50 R08: 0000000000000000 R09: 006d635f616d6472 [ 1333.042769] R10: ffffaff68e57be68 R11: 0000000000000000 R12: ffff9edde1388000 [ 1333.042771] R13: ffff9eddc88db200 R14: ffff9edde1388000 R15: ffff9eddc88db240 [ 1333.042773] FS: 0000000000000000(0000) GS:ffff9eecc7c00000(0000) knlGS:0000000000000000 [ 1333.042774] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1333.042776] CR2: 00007f0ac4ed4004 CR3: 0000002b5040a000 CR4: 00000000000406e0 [ 1333.042777] Call Trace: [ 1333.042790] ? newidle_balance+0x25e/0x3c0 [ 1333.042795] ? psi_group_change+0x43/0x230 [ 1333.042801] ? cma_cm_event_handler+0x23/0xb0 [rdma_cm] [ 1333.042807] cma_cm_event_handler+0x23/0xb0 [rdma_cm] [ 1333.042814] cma_work_handler+0x5a/0xb0 [rdma_cm] [ 1333.042819] process_one_work+0x1f3/0x390 [ 1333.042822] worker_thread+0x2d/0x3c0
On 19/04/2023 21:20, Jinpu Wang wrote: > On Wed, Apr 19, 2023 at 11:53 AM Zhijian Li (Fujitsu) > <lizhijian@fujitsu.com> wrote: >> >> Leon, Guoqing >> >> >> On 18/04/2023 15:57, Leon Romanovsky wrote: >>>>>> Currently, without this patch: >>>>>> 1. PD and clt_path->s.dev are shared among connections. >>>>>> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() >>>>>> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when >>>>>> clt_path->s.dev become zero, it will destroy PD. >>>>>> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. >>>>> So please fix it by making sure that failure to create con[1] will >>>>> release resources which were allocated. If con[1] didn't increase >>>>> s.dev_ref, it shouldn't decrease it either. >>>> You are right, the current patch did exactly that. >>>> It introduced a con owning flag 'has_dev' to indicate whether this con has taken s.dev. >>>> so that its cleanup phase will only decrease its s.dev properly. >>> The has_dev is a workaround and not a solution. In proper error unwind >>> sequence, you won't need extra flag. >>> >>> Thanks >>> >> >> how about below changes >> >> commit 61dba725384e226d472b8142d70d40d4103df87a >> Author: Li Zhijian <lizhijian@fujitsu.com> >> Date: Wed Apr 19 17:42:26 2023 +0800 >> >> RDMA/rtrs: Fix rxe_dealloc_pd warning >> >> con[0] always sets s.dev to 1, correspondingly, we should let it to >> release the last dev. >> >> Previously, >> 1. PD and clt_path->s.dev are shared among connections. >> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() >> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when >> clt_path->s.dev become zero, it will destroy PD. >> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, >> but it try to decreased clt_path->s.dev <<< it's wrong to do that. >> >> The warning occurs when destroying PD whose reference count is not zero. >> Precodition: clt_path->s.con_num is 2. >> So 2 cm connection will be created as below: >> CPU0 CPU1 >> init_conns { | >> create_cm() // a. con[0] created | >> | a'. rtrs_clt_rdma_cm_handler() { >> | rtrs_rdma_addr_resolved() >> | create_con_cq_qp(con); << con[0] >> | } >> | in this moment, refcnt of PD was increased to 2+ >> | >> create_cm() // b. cid = 1, failed | >> destroy_con_cq_qp() | >> rtrs_ib_dev_put() | >> dev_free() | >> ib_dealloc_pd(dev->ib_pd) << PD | >> is destroyed, but refcnt is | >> still greater than 0 | >> } >> >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >> index 80abf45a197a..1eb652dedca3 100644 >> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c >> @@ -1743,6 +1743,15 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) >> con->rsp_ius = NULL; >> con->queue_num = 0; >> } >> + >> + /* >> + * Every con will try to decreased s.dev_ref, but we should >> + * reserve the last s.dev_ref for con[0]. In case con[1+]'s >> + * cleanup phase call rtrs_ib_dev_put(clt_path->s.dev) early. >> + */ >> + if (con->c.cid != 0 && clt_path->s.dev_ref == 1) >> + return; >> + >> if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { >> rtrs_ib_dev_put(clt_path->s.dev); >> clt_path->s.dev = NULL; > Jinpu, thanks for your testing. Indeed, above changes are not correct. it breaks the normal cleanup like below: for (i=0; i < N, i++) destroy_con_cq_qp(con[i]) Thanks Zhijian > I run a regression test in our test env, it triggers a warning on > > 1681 if (WARN_ON(clt_path->s.dev)) > > [ 1333.042633] ------------[ cut here ]------------ > [ 1333.042650] WARNING: CPU: 8 PID: 559 at > /root/kernel-test/ibnbd2/rtrs/rtrs-clt.c:1681 > rtrs_clt_rdma_cm_handler+0x864/0x8a0 [rtrs_client] > [ 1333.042651] Modules linked in: loop rnbd_client(O) rtrs_client(O) > rtrs_core(O) kvm_amd kvm input_leds led_class irqbypass crc32_pclmul > aesni_intel sp5100_tco evdev libaes watchdog sg k10temp crypto_simd > fam15h_power ipmi_si serio_raw cryptd ipmi_devintf glue_helper > ipmi_msghandler acpi_cpufreq button ib_ipoib ib_umad null_blk brd > rdma_cm iw_cm ib_cm ip_tables x_tables autofs4 raid10 raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 raid0 linear mlx4_ib md_mod ib_uverbs ib_core > sd_mod t10_pi crc_t10dif crct10dif_generic ahci libahci > crct10dif_pclmul crct10dif_common crc32c_intel igb libata usb_storage > psmouse i2c_piix4 i2c_algo_bit mlx4_core dca scsi_mod i2c_core ptp > pps_core > [ 1333.042737] CPU: 8 PID: 559 Comm: kworker/u128:1 Tainted: G > O 5.10.136-pserver-develop-5.10 #257 > [ 1333.042738] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 > [ 1333.042752] Workqueue: rdma_cm cma_work_handler [rdma_cm] > [ 1333.042758] RIP: 0010:rtrs_clt_rdma_cm_handler+0x864/0x8a0 [rtrs_client] > [ 1333.042761] Code: ff bb ea ff ff ff e8 db a5 24 fc 49 8d b4 24 10 > 01 00 00 89 da 48 c7 c7 40 93 5b c0 e8 4b 47 21 fc 4d 8b 65 00 e9 15 > fe ff ff <0f> 0b 4c 89 ff bb ea ff ff ff e8 ad a5 24 fc eb d0 0f 0b 4c > 89 ff > [ 1333.042763] RSP: 0018:ffffaff68e57bdb0 EFLAGS: 00010286 > [ 1333.042765] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9eddc0051420 > [ 1333.042767] RDX: ffff9ee4ef716e40 RSI: ffff9f14ea288f30 RDI: ffff9eddc88db240 > [ 1333.042768] RBP: ffffaff68e57be50 R08: 0000000000000000 R09: 006d635f616d6472 > [ 1333.042769] R10: ffffaff68e57be68 R11: 0000000000000000 R12: ffff9edde1388000 > [ 1333.042771] R13: ffff9eddc88db200 R14: ffff9edde1388000 R15: ffff9eddc88db240 > [ 1333.042773] FS: 0000000000000000(0000) GS:ffff9eecc7c00000(0000) > knlGS:0000000000000000 > [ 1333.042774] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1333.042776] CR2: 00007f0ac4ed4004 CR3: 0000002b5040a000 CR4: 00000000000406e0 > [ 1333.042777] Call Trace: > [ 1333.042790] ? newidle_balance+0x25e/0x3c0 > [ 1333.042795] ? psi_group_change+0x43/0x230 > [ 1333.042801] ? cma_cm_event_handler+0x23/0xb0 [rdma_cm] > [ 1333.042807] cma_cm_event_handler+0x23/0xb0 [rdma_cm] > [ 1333.042814] cma_work_handler+0x5a/0xb0 [rdma_cm] > [ 1333.042819] process_one_work+0x1f3/0x390 > [ 1333.042822] worker_thread+0x2d/0x3c0
Jinpu I updated the changes as below, and tested for thousand rounds. From d441c0e2496c1795b5af2b6b8ae4672203d6af3c Mon Sep 17 00:00:00 2001 From: Li Zhijian <lizhijian@fujitsu.com> Date: Thu, 20 Apr 2023 17:28:28 +0800 Subject: [PATCH] RDMA/rtrs: Fix rxe_dealloc_pd warning In current design: 1. PD and clt_path->s.dev are shared among connections. 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when clt_path->s.dev become zero, it will destroy PD. 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev So, in case create_cm(con[0]) succeeds but create_cm(con[1]) fails, destroy_con_cq_qp(con[1]) will be called first which will destory the PD while this PD is still taken by con[0]. Here, we refactor the error path of create_cm() and init_conns(), so that we do the cleanup in the order they are created. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 47 +++++++++++--------------- 1 file changed, 19 insertions(+), 28 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 80abf45a197a..5faf0ecb726b 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -2040,6 +2040,7 @@ static int rtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, return 0; } +/* The caller should the do the cleanup in case of error */ static int create_cm(struct rtrs_clt_con *con) { struct rtrs_path *s = con->c.path; @@ -2062,14 +2063,14 @@ static int create_cm(struct rtrs_clt_con *con) err = rdma_set_reuseaddr(cm_id, 1); if (err != 0) { rtrs_err(s, "Set address reuse failed, err: %d\n", err); - goto destroy_cm; + return err; } err = rdma_resolve_addr(cm_id, (struct sockaddr *)&clt_path->s.src_addr, (struct sockaddr *)&clt_path->s.dst_addr, RTRS_CONNECT_TIMEOUT_MS); if (err) { rtrs_err(s, "Failed to resolve address, err: %d\n", err); - goto destroy_cm; + return err; } /* * Combine connection status and session events. This is needed @@ -2084,29 +2085,17 @@ static int create_cm(struct rtrs_clt_con *con) if (err == 0) err = -ETIMEDOUT; /* Timedout or interrupted */ - goto errr; + return err; } if (con->cm_err < 0) { - err = con->cm_err; - goto errr; + return con->cm_err; } if (READ_ONCE(clt_path->state) != RTRS_CLT_CONNECTING) { /* Device removal */ - err = -ECONNABORTED; - goto errr; + return -ECONNABORTED; } return 0; - -errr: - stop_cm(con); - mutex_lock(&con->con_mutex); - destroy_con_cq_qp(con); - mutex_unlock(&con->con_mutex); -destroy_cm: - destroy_cm(con); - - return err; } static void rtrs_clt_path_up(struct rtrs_clt_path *clt_path) @@ -2334,7 +2323,7 @@ static void rtrs_clt_close_work(struct work_struct *work) static int init_conns(struct rtrs_clt_path *clt_path) { unsigned int cid; - int err; + int err, i; /* * On every new session connections increase reconnect counter @@ -2350,10 +2339,8 @@ static int init_conns(struct rtrs_clt_path *clt_path) goto destroy; err = create_cm(to_clt_con(clt_path->s.con[cid])); - if (err) { - destroy_con(to_clt_con(clt_path->s.con[cid])); + if (err) goto destroy; - } } err = alloc_path_reqs(clt_path); if (err) @@ -2364,15 +2351,19 @@ static int init_conns(struct rtrs_clt_path *clt_path) return 0; destroy: - while (cid--) { + /* Make sure we do the cleanup in the order they are created */ + for (i = 0; i <= cid; i++) { struct rtrs_clt_con *con = to_clt_con(clt_path->s.con[cid]); - stop_cm(con); - - mutex_lock(&con->con_mutex); - destroy_con_cq_qp(con); - mutex_unlock(&con->con_mutex); - destroy_cm(con); + if (!con) + break; + if (con->c.cm_id) { + stop_cm(con); + mutex_lock(&con->con_mutex); + destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); + destroy_cm(con); + } destroy_con(con); } /*
On 21/04/2023 09:38, Li Zhijian wrote: > Jinpu > > I updated the changes as below, and tested for thousand rounds. > > From d441c0e2496c1795b5af2b6b8ae4672203d6af3c Mon Sep 17 00:00:00 2001 > From: Li Zhijian <lizhijian@fujitsu.com> > Date: Thu, 20 Apr 2023 17:28:28 +0800 > Subject: [PATCH] RDMA/rtrs: Fix rxe_dealloc_pd warning > > In current design: > 1. PD and clt_path->s.dev are shared among connections. > 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and > when clt_path->s.dev become zero, it will destroy PD. > 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, > but it try to decreased clt_path->s.dev > > So, in case create_cm(con[0]) succeeds but create_cm(con[1]) > fails, destroy_con_cq_qp(con[1]) will be called first which will destory > the PD while this PD is still taken by con[0]. > > Here, we refactor the error path of create_cm() and init_conns(), so that > we do the cleanup in the order they are created. > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > --- > drivers/infiniband/ulp/rtrs/rtrs-clt.c | 47 +++++++++++--------------- > 1 file changed, 19 insertions(+), 28 deletions(-) > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > index 80abf45a197a..5faf0ecb726b 100644 > --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > @@ -2040,6 +2040,7 @@ static int rtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, > return 0; > } > > +/* The caller should the do the cleanup in case of error */ > static int create_cm(struct rtrs_clt_con *con) > { > struct rtrs_path *s = con->c.path; > @@ -2062,14 +2063,14 @@ static int create_cm(struct rtrs_clt_con *con) > err = rdma_set_reuseaddr(cm_id, 1); > if (err != 0) { > rtrs_err(s, "Set address reuse failed, err: %d\n", err); > - goto destroy_cm; > + return err; > } > err = rdma_resolve_addr(cm_id, (struct sockaddr *)&clt_path->s.src_addr, > (struct sockaddr *)&clt_path->s.dst_addr, > RTRS_CONNECT_TIMEOUT_MS); > if (err) { > rtrs_err(s, "Failed to resolve address, err: %d\n", err); > - goto destroy_cm; > + return err; > } > /* > * Combine connection status and session events. This is needed > @@ -2084,29 +2085,17 @@ static int create_cm(struct rtrs_clt_con *con) > if (err == 0) > err = -ETIMEDOUT; > /* Timedout or interrupted */ > - goto errr; > + return err; > } > if (con->cm_err < 0) { > - err = con->cm_err; > - goto errr; > + return con->cm_err; > } > if (READ_ONCE(clt_path->state) != RTRS_CLT_CONNECTING) { > /* Device removal */ > - err = -ECONNABORTED; > - goto errr; > + return -ECONNABORTED; > } > > return 0; > - > -errr: > - stop_cm(con); > - mutex_lock(&con->con_mutex); > - destroy_con_cq_qp(con); > - mutex_unlock(&con->con_mutex); > -destroy_cm: > - destroy_cm(con); > - > - return err; > } > > static void rtrs_clt_path_up(struct rtrs_clt_path *clt_path) > @@ -2334,7 +2323,7 @@ static void rtrs_clt_close_work(struct work_struct *work) > static int init_conns(struct rtrs_clt_path *clt_path) > { > unsigned int cid; > - int err; > + int err, i; > > /* > * On every new session connections increase reconnect counter > @@ -2350,10 +2339,8 @@ static int init_conns(struct rtrs_clt_path *clt_path) > goto destroy; > > err = create_cm(to_clt_con(clt_path->s.con[cid])); > - if (err) { > - destroy_con(to_clt_con(clt_path->s.con[cid])); > + if (err) > goto destroy; > - } > } > err = alloc_path_reqs(clt_path); > if (err) > @@ -2364,15 +2351,19 @@ static int init_conns(struct rtrs_clt_path *clt_path) > return 0; > > destroy: > - while (cid--) { > + /* Make sure we do the cleanup in the order they are created */ > + for (i = 0; i <= cid; i++) { > struct rtrs_clt_con *con = to_clt_con(clt_path->s.con[cid]); s/cid/i > > - stop_cm(con); > - > - mutex_lock(&con->con_mutex); > - destroy_con_cq_qp(con); > - mutex_unlock(&con->con_mutex); > - destroy_cm(con); > + if (!con) > + break; > + if (con->c.cm_id) { > + stop_cm(con); > + mutex_lock(&con->con_mutex); > + destroy_con_cq_qp(con); > + mutex_unlock(&con->con_mutex); > + destroy_cm(con); > + } > destroy_con(con); > } > /*
On Fri, Apr 21, 2023 at 3:38 AM Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> wrote: > > Jinpu > > I updated the changes as below, and tested for thousand rounds. > > From d441c0e2496c1795b5af2b6b8ae4672203d6af3c Mon Sep 17 00:00:00 2001 > From: Li Zhijian <lizhijian@fujitsu.com> > Date: Thu, 20 Apr 2023 17:28:28 +0800 > Subject: [PATCH] RDMA/rtrs: Fix rxe_dealloc_pd warning > > In current design: > 1. PD and clt_path->s.dev are shared among connections. > 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and > when clt_path->s.dev become zero, it will destroy PD. > 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, > but it try to decreased clt_path->s.dev > > So, in case create_cm(con[0]) succeeds but create_cm(con[1]) > fails, destroy_con_cq_qp(con[1]) will be called first which will destory > the PD while this PD is still taken by con[0]. > > Here, we refactor the error path of create_cm() and init_conns(), so that > we do the cleanup in the order they are created. > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > --- > drivers/infiniband/ulp/rtrs/rtrs-clt.c | 47 +++++++++++--------------- > 1 file changed, 19 insertions(+), 28 deletions(-) > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > index 80abf45a197a..5faf0ecb726b 100644 > --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > @@ -2040,6 +2040,7 @@ static int rtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, > return 0; > } > > +/* The caller should the do the cleanup in case of error */ > static int create_cm(struct rtrs_clt_con *con) > { > struct rtrs_path *s = con->c.path; > @@ -2062,14 +2063,14 @@ static int create_cm(struct rtrs_clt_con *con) > err = rdma_set_reuseaddr(cm_id, 1); > if (err != 0) { > rtrs_err(s, "Set address reuse failed, err: %d\n", err); > - goto destroy_cm; > + return err; > } > err = rdma_resolve_addr(cm_id, (struct sockaddr *)&clt_path->s.src_addr, > (struct sockaddr *)&clt_path->s.dst_addr, > RTRS_CONNECT_TIMEOUT_MS); > if (err) { > rtrs_err(s, "Failed to resolve address, err: %d\n", err); > - goto destroy_cm; > + return err; > } > /* > * Combine connection status and session events. This is needed > @@ -2084,29 +2085,17 @@ static int create_cm(struct rtrs_clt_con *con) > if (err == 0) > err = -ETIMEDOUT; > /* Timedout or interrupted */ > - goto errr; > + return err; > } > if (con->cm_err < 0) { > - err = con->cm_err; > - goto errr; > + return con->cm_err; > } The bracket can be removed too > if (READ_ONCE(clt_path->state) != RTRS_CLT_CONNECTING) { > /* Device removal */ > - err = -ECONNABORTED; > - goto errr; > + return -ECONNABORTED; > } same here. > > return 0; > - > -errr: > - stop_cm(con); > - mutex_lock(&con->con_mutex); > - destroy_con_cq_qp(con); > - mutex_unlock(&con->con_mutex); > -destroy_cm: > - destroy_cm(con); > - > - return err; > } > > static void rtrs_clt_path_up(struct rtrs_clt_path *clt_path) > @@ -2334,7 +2323,7 @@ static void rtrs_clt_close_work(struct work_struct *work) > static int init_conns(struct rtrs_clt_path *clt_path) > { > unsigned int cid; > - int err; > + int err, i; > > /* > * On every new session connections increase reconnect counter > @@ -2350,10 +2339,8 @@ static int init_conns(struct rtrs_clt_path *clt_path) > goto destroy; > > err = create_cm(to_clt_con(clt_path->s.con[cid])); > - if (err) { > - destroy_con(to_clt_con(clt_path->s.con[cid])); > + if (err) > goto destroy; > - } > } > err = alloc_path_reqs(clt_path); > if (err) > @@ -2364,15 +2351,19 @@ static int init_conns(struct rtrs_clt_path *clt_path) > return 0; > > destroy: > - while (cid--) { > + /* Make sure we do the cleanup in the order they are created */ > + for (i = 0; i <= cid; i++) { > struct rtrs_clt_con *con = to_clt_con(clt_path->s.con[cid]); Yes, this line has to be adapted. struct rtrs_clt_con *con = to_clt_con(clt_path->s.con[i]); > > - stop_cm(con); > - > - mutex_lock(&con->con_mutex); > - destroy_con_cq_qp(con); > - mutex_unlock(&con->con_mutex); > - destroy_cm(con); > + if (!con) > + break; > + if (con->c.cm_id) { > + stop_cm(con); > + mutex_lock(&con->con_mutex); > + destroy_con_cq_qp(con); > + mutex_unlock(&con->con_mutex); > + destroy_cm(con); > + } > destroy_con(con); > } > /* > -- > 2.29.2 > This version looks fine. I will run some tests. Thx! > > > On 20/04/2023 10:00, Li Zhijian wrote: > > On 19/04/2023 21:20, Jinpu Wang wrote: > >> On Wed, Apr 19, 2023 at 11:53 AM Zhijian Li (Fujitsu) > >> <lizhijian@fujitsu.com> wrote: > >>> > >>> Leon, Guoqing > >>> > >>> > >>> On 18/04/2023 15:57, Leon Romanovsky wrote: > >>>>>>> Currently, without this patch: > >>>>>>> 1. PD and clt_path->s.dev are shared among connections. > >>>>>>> 2. every con[n]'s cleanup phase will call destroy_con_cq_qp() > >>>>>>> 3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and when > >>>>>>> clt_path->s.dev become zero, it will destroy PD. > >>>>>>> 4. when con[1] failed to create, con[1] will not take clt_path->s.dev, but it try to decreased clt_path->s.dev <<< it's wrong to do that. > >>>>>> So please fix it by making sure that failure to create con[1] will > >>>>>> release resources which were allocated. If con[1] didn't increase > >>>>>> s.dev_ref, it shouldn't decrease it either. > >>>>> You are right, the current patch did exactly that. > >>>>> It introduced a con owning flag 'has_dev' to indicate whether this con has taken s.dev. > >>>>> so that its cleanup phase will only decrease its s.dev properly. > >>>> The has_dev is a workaround and not a solution. In proper error unwind > >>>> sequence, you won't need extra flag.
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index c2065fc33a56..4c8f42e46e2f 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1664,6 +1664,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) return -ENOMEM; } clt_path->s.dev_ref = 1; + con->has_dev = true; query_fast_reg_mode(clt_path); wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; /* @@ -1690,6 +1691,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) wr_limit = clt_path->s.dev->ib_dev->attrs.max_qp_wr; /* Shared between connections */ clt_path->s.dev_ref++; + con->has_dev = true; max_send_wr = min_t(int, wr_limit, /* QD * (REQ + RSP + FR REGS or INVS) + drain */ clt_path->queue_depth * 3 + 1); @@ -1742,6 +1744,8 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) con->rsp_ius = NULL; con->queue_num = 0; } + if (!con->has_dev) + return; if (clt_path->s.dev_ref && !--clt_path->s.dev_ref) { rtrs_ib_dev_put(clt_path->s.dev); clt_path->s.dev = NULL; diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h index f848c0392d98..970b75633594 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h @@ -75,6 +75,7 @@ struct rtrs_clt_con { unsigned int cpu; struct mutex con_mutex; int cm_err; + bool has_dev; }; /**
The warning occurs when destroying PD whose reference count is not zero. Precodition: clt_path->s.con_num is 2. So 2 cm connection will be created as below: CPU0 CPU1 init_conns { | create_cm() // a. con[0] created | | a'. rtrs_clt_rdma_cm_handler() { | rtrs_rdma_addr_resolved() | create_con_cq_qp(con); << con[0] | } | in this moment, refcnt of PD was increased to 2+ | create_cm() // b. cid = 1, failed | destroy_con_cq_qp() | rtrs_ib_dev_put() | dev_free() | ib_dealloc_pd(dev->ib_pd) << PD | is destroyed, but refcnt is | still greater than 0 | } Simply, Here we can avoid this warning by introducing conn own flag to track if its cleanup should drop the PD. ----------------------------------------------- rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe] Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_ vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe] Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001 R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00 FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_dealloc_pd+0x16/0x20 [rdma_rxe] ib_dealloc_pd_user+0x4b/0x80 [ib_core] rtrs_ib_dev_put+0x79/0xd0 [rtrs_core] destroy_con_cq_qp+0x8a/0xa0 [rtrs_client] init_path+0x1e7/0x9a0 [rtrs_client] ? __pfx_autoremove_wake_function+0x10/0x10 ? lock_is_held_type+0xd7/0x130 ? rcu_read_lock_sched_held+0x43/0x80 ? pcpu_alloc+0x3dd/0x7d0 ? rtrs_clt_init_stats+0x18/0x40 [rtrs_client] rtrs_clt_open+0x24f/0x5a0 [rtrs_client] ? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client] rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client] Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++++ drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + 2 files changed, 5 insertions(+)