diff mbox series

[rdma-next] Revert "IB/isert: Fix incorrect release of isert connection"

Message ID a27982d3235005c58f6d321f3fad5eb6e1beaf9e.1692604607.git.leonro@nvidia.com (mailing list archive)
State Accepted
Commit dfe261107c080709459c32695847eec96238852b
Headers show
Series [rdma-next] Revert "IB/isert: Fix incorrect release of isert connection" | expand

Commit Message

Leon Romanovsky Aug. 21, 2023, 7:57 a.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is
causing problems on OPA when DEVICE_REMOVAL is happening.

 ------------[ cut here ]------------
 WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359
ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
 Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc
scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs
rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod
opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm
ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core
x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt
ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma
intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter
acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul
crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci
ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse
 CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1
 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS
SE5C610.86B.01.01.0014.121820151719 12/18/2015
 RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
 Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83
c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f
 RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206
 RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d
 RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640
 RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d
 R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18
 R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38
 FS:  00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  ? __warn+0x80/0x130
  ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
  ? report_bug+0x195/0x1a0
  ? handle_bug+0x3c/0x70
  ? exc_invalid_op+0x14/0x70
  ? asm_exc_invalid_op+0x16/0x20
  ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
  disable_device+0x9d/0x160 [ib_core]
  __ib_unregister_device+0x42/0xb0 [ib_core]
  ib_unregister_device+0x22/0x30 [ib_core]
  rvt_unregister_device+0x20/0x90 [rdmavt]
  hfi1_unregister_ib_device+0x16/0xf0 [hfi1]
  remove_one+0x55/0x1a0 [hfi1]
  pci_device_remove+0x36/0xa0
  device_release_driver_internal+0x193/0x200
  driver_detach+0x44/0x90
  bus_remove_driver+0x69/0xf0
  pci_unregister_driver+0x2a/0xb0
  hfi1_mod_cleanup+0xc/0x3c [hfi1]
  __do_sys_delete_module.constprop.0+0x17a/0x2f0
  ? exit_to_user_mode_prepare+0xc4/0xd0
  ? syscall_trace_enter.constprop.0+0x126/0x1a0
  do_syscall_64+0x5c/0x90
  ? syscall_exit_to_user_mode+0x12/0x30
  ? do_syscall_64+0x69/0x90
  ? syscall_exit_work+0x103/0x130
  ? syscall_exit_to_user_mode+0x12/0x30
  ? do_syscall_64+0x69/0x90
  ? exc_page_fault+0x65/0x150
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 RIP: 0033:0x7ff1e643f5ab
 Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8
 RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000
 R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8
 R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8
  </TASK>
 ---[ end trace 0000000000000000 ]---

And...

 restrack: ------------[ cut here ]------------
 infiniband hfi1_0: BUG: RESTRACK detected leak of resources
 restrack: Kernel PD object allocated by ib_isert is not freed
 restrack: Kernel CQ object allocated by ib_core is not freed
 restrack: Kernel QP object allocated by rdma_cm is not freed
 restrack: ------------[ cut here ]------------

Fixes: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection")
Reported-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Mike Christie Aug. 21, 2023, 4:25 p.m. UTC | #1
On 8/21/23 2:57 AM, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com> Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is

Was the issue described in that commit just not correct? Or was it
supposed to be for some sort of race'y error path?

It looks like that analysis was wrong for the normal login/logout path where
we call:

1. isert_init_conn sets kref to 1.
2. If we are connected we set kref to 2 via isert_connected_handler -> kref_get
3. When we logout from there then isert_wait_conn -> queue_work release_work
and release_work does isert_put_conn so kref = 1.
4. Then we do isert_free_conn which does isert_put_conn to set the kref to 0
and then free the conn.

So the patch in this mail looks ok.

I checked most of the error paths, but I might have missed some. It looks ok
for them as well.
Dennis Dalessandro Aug. 22, 2023, 1:49 p.m. UTC | #2
On 8/21/23 3:57 AM, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is
> causing problems on OPA when DEVICE_REMOVAL is happening.
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359
> ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>  Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc
> scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file
> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs
> rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod
> opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm
> ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core
> x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt
> ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma
> intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter
> acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul
> crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci
> ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse
>  CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1
>  Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS
> SE5C610.86B.01.01.0014.121820151719 12/18/2015
>  RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>  Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83
> c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f
>  RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206
>  RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d
>  RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640
>  RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d
>  R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18
>  R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38
>  FS:  00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0
>  Call Trace:
>   <TASK>
>   ? __warn+0x80/0x130
>   ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>   ? report_bug+0x195/0x1a0
>   ? handle_bug+0x3c/0x70
>   ? exc_invalid_op+0x14/0x70
>   ? asm_exc_invalid_op+0x16/0x20
>   ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>   disable_device+0x9d/0x160 [ib_core]
>   __ib_unregister_device+0x42/0xb0 [ib_core]
>   ib_unregister_device+0x22/0x30 [ib_core]
>   rvt_unregister_device+0x20/0x90 [rdmavt]
>   hfi1_unregister_ib_device+0x16/0xf0 [hfi1]
>   remove_one+0x55/0x1a0 [hfi1]
>   pci_device_remove+0x36/0xa0
>   device_release_driver_internal+0x193/0x200
>   driver_detach+0x44/0x90
>   bus_remove_driver+0x69/0xf0
>   pci_unregister_driver+0x2a/0xb0
>   hfi1_mod_cleanup+0xc/0x3c [hfi1]
>   __do_sys_delete_module.constprop.0+0x17a/0x2f0
>   ? exit_to_user_mode_prepare+0xc4/0xd0
>   ? syscall_trace_enter.constprop.0+0x126/0x1a0
>   do_syscall_64+0x5c/0x90
>   ? syscall_exit_to_user_mode+0x12/0x30
>   ? do_syscall_64+0x69/0x90
>   ? syscall_exit_work+0x103/0x130
>   ? syscall_exit_to_user_mode+0x12/0x30
>   ? do_syscall_64+0x69/0x90
>   ? exc_page_fault+0x65/0x150
>   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>  RIP: 0033:0x7ff1e643f5ab
>  Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3
> 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0
> ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
>  RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>  RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab
>  RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8
>  RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000
>  R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8
>  R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8
>   </TASK>
>  ---[ end trace 0000000000000000 ]---
> 
> And...
> 
>  restrack: ------------[ cut here ]------------
>  infiniband hfi1_0: BUG: RESTRACK detected leak of resources
>  restrack: Kernel PD object allocated by ib_isert is not freed
>  restrack: Kernel CQ object allocated by ib_core is not freed
>  restrack: Kernel QP object allocated by rdma_cm is not freed
>  restrack: ------------[ cut here ]------------
> 
> Fixes: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection")
> Reported-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
> Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@cornelisnetworks.com
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/ulp/isert/ib_isert.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index 92e1e7587af8..00a7303c8cc6 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -2570,6 +2570,8 @@ static void isert_wait_conn(struct iscsit_conn *conn)
>  	isert_put_unsol_pending_cmds(conn);
>  	isert_wait4cmds(conn);
>  	isert_wait4logout(isert_conn);
> +
> +	queue_work(isert_release_wq, &isert_conn->release_work);
>  }
>  
>  static void isert_free_conn(struct iscsit_conn *conn)

Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Leon Romanovsky Aug. 22, 2023, 1:58 p.m. UTC | #3
On Mon, Aug 21, 2023 at 11:25:35AM -0500, Mike Christie wrote:
> On 8/21/23 2:57 AM, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com> Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is
> 
> Was the issue described in that commit just not correct? Or was it
> supposed to be for some sort of race'y error path?
> 
> It looks like that analysis was wrong for the normal login/logout path where
> we call:

I came to same conclusion when analyzed Dennis's report.

Thanks

> 
> 1. isert_init_conn sets kref to 1.
> 2. If we are connected we set kref to 2 via isert_connected_handler -> kref_get
> 3. When we logout from there then isert_wait_conn -> queue_work release_work
> and release_work does isert_put_conn so kref = 1.
> 4. Then we do isert_free_conn which does isert_put_conn to set the kref to 0
> and then free the conn.
> 
> So the patch in this mail looks ok.
> 
> I checked most of the error paths, but I might have missed some. It looks ok
> for them as well.
Leon Romanovsky Aug. 22, 2023, 2:01 p.m. UTC | #4
On Mon, 21 Aug 2023 10:57:14 +0300, Leon Romanovsky wrote:
> Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is
> causing problems on OPA when DEVICE_REMOVAL is happening.
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359
> ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>  Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc
> scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file
> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs
> rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod
> opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm
> ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core
> x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt
> ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma
> intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter
> acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul
> crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci
> ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse
>  CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1
>  Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS
> SE5C610.86B.01.01.0014.121820151719 12/18/2015
>  RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>  Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83
> c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f
>  RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206
>  RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d
>  RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640
>  RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d
>  R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18
>  R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38
>  FS:  00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0
>  Call Trace:
>   <TASK>
>   ? __warn+0x80/0x130
>   ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>   ? report_bug+0x195/0x1a0
>   ? handle_bug+0x3c/0x70
>   ? exc_invalid_op+0x14/0x70
>   ? asm_exc_invalid_op+0x16/0x20
>   ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
>   disable_device+0x9d/0x160 [ib_core]
>   __ib_unregister_device+0x42/0xb0 [ib_core]
>   ib_unregister_device+0x22/0x30 [ib_core]
>   rvt_unregister_device+0x20/0x90 [rdmavt]
>   hfi1_unregister_ib_device+0x16/0xf0 [hfi1]
>   remove_one+0x55/0x1a0 [hfi1]
>   pci_device_remove+0x36/0xa0
>   device_release_driver_internal+0x193/0x200
>   driver_detach+0x44/0x90
>   bus_remove_driver+0x69/0xf0
>   pci_unregister_driver+0x2a/0xb0
>   hfi1_mod_cleanup+0xc/0x3c [hfi1]
>   __do_sys_delete_module.constprop.0+0x17a/0x2f0
>   ? exit_to_user_mode_prepare+0xc4/0xd0
>   ? syscall_trace_enter.constprop.0+0x126/0x1a0
>   do_syscall_64+0x5c/0x90
>   ? syscall_exit_to_user_mode+0x12/0x30
>   ? do_syscall_64+0x69/0x90
>   ? syscall_exit_work+0x103/0x130
>   ? syscall_exit_to_user_mode+0x12/0x30
>   ? do_syscall_64+0x69/0x90
>   ? exc_page_fault+0x65/0x150
>   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>  RIP: 0033:0x7ff1e643f5ab
>  Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3
> 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0
> ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
>  RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>  RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab
>  RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8
>  RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000
>  R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8
>  R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8
>   </TASK>
>  ---[ end trace 0000000000000000 ]---
> 
> [...]

Applied, thanks!

[1/1] Revert "IB/isert: Fix incorrect release of isert connection"
      https://git.kernel.org/rdma/rdma/c/dfe261107c0807

Best regards,
Saravanan Vajravel Aug. 23, 2023, 4:23 a.m. UTC | #5
Agree with you Mike. I came to a wrong conclusion based on assumption that
kref_init() is initializing the refcount to zero.

Thanks & Regards,
Saravanan Vajravel
+91-80-46116256


-----Original Message-----
From: Mike Christie <michael.christie@oracle.com>
Sent: Monday, August 21, 2023 9:56 PM
To: Leon Romanovsky <leon@kernel.org>; Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>; Dennis Dalessandro
<dennis.dalessandro@cornelisnetworks.com>; linux-rdma@vger.kernel.org; Sagi
Grimberg <sagi@grimberg.me>; Saravanan Vajravel
<saravanan.vajravel@broadcom.com>; Selvin Xavier
<selvin.xavier@broadcom.com>; target-devel@vger.kernel.org
Subject: Re: [PATCH rdma-next] Revert "IB/isert: Fix incorrect release of
isert connection"

On 8/21/23 2:57 AM, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com> Commit: 699826f4e30a
> ("IB/isert: Fix incorrect release of isert connection") is

Was the issue described in that commit just not correct? Or was it supposed
to be for some sort of race'y error path?

It looks like that analysis was wrong for the normal login/logout path where
we call:

1. isert_init_conn sets kref to 1.
2. If we are connected we set kref to 2 via isert_connected_handler ->
kref_get 3. When we logout from there then isert_wait_conn -> queue_work
release_work and release_work does isert_put_conn so kref = 1.
4. Then we do isert_free_conn which does isert_put_conn to set the kref to 0
and then free the conn.

So the patch in this mail looks ok.

I checked most of the error paths, but I might have missed some. It looks ok
for them as well.
diff mbox series

Patch

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 92e1e7587af8..00a7303c8cc6 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2570,6 +2570,8 @@  static void isert_wait_conn(struct iscsit_conn *conn)
 	isert_put_unsol_pending_cmds(conn);
 	isert_wait4cmds(conn);
 	isert_wait4logout(isert_conn);
+
+	queue_work(isert_release_wq, &isert_conn->release_work);
 }
 
 static void isert_free_conn(struct iscsit_conn *conn)