Message ID | 20230612054237.1855292-1-shinichiro.kawasaki@wdc.com (mailing list archive)
---|---
State | Accepted
Series | [v2] RDMA/cma: prevent rdma id destroy during cma_iw_handler
On Mon, 12 Jun 2023 14:42:37 +0900, Shin'ichiro Kawasaki wrote: > When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private > *id_priv can be destroyed during cma_iw_handler call. This causes "BUG: > KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1]. > To prevent the destroy of id_priv, keep its reference count by calling > cma_id_get() and cma_id_put() at start and end of cma_iw_handler(). > > [1] > > [...] Applied, thanks! [1/1] RDMA/cma: prevent rdma id destroy during cma_iw_handler https://git.kernel.org/rdma/rdma/c/fd06a5925e4773 Best regards,
On Mon, Jun 12, 2023 at 02:42:37PM +0900, Shin'ichiro Kawasaki wrote: > When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private > *id_priv can be destroyed during cma_iw_handler call. This causes "BUG: > KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1]. > To prevent the destroy of id_priv, keep its reference count by calling > cma_id_get() and cma_id_put() at start and end of cma_iw_handler(). > > [1] > > ================================================================== > BUG: KASAN: slab-use-after-free in __mutex_lock+0x1324/0x18f0 > Read of size 8 at addr ffff888197b37418 by task kworker/u8:0/9 > > CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0 #62 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 > Workqueue: iw_cm_wq cm_work_handler [iw_cm] > Call Trace: > <TASK> > dump_stack_lvl+0x57/0x90 > print_report+0xcf/0x660 > ? __mutex_lock+0x1324/0x18f0 > kasan_report+0xa4/0xe0 > ? __mutex_lock+0x1324/0x18f0 > __mutex_lock+0x1324/0x18f0 > ? cma_iw_handler+0xac/0x4f0 [rdma_cm] > ? _raw_spin_unlock_irqrestore+0x30/0x60 > ? rcu_is_watching+0x11/0xb0 > ? _raw_spin_unlock_irqrestore+0x30/0x60 > ? trace_hardirqs_on+0x12/0x100 > ? __pfx___mutex_lock+0x10/0x10 > ? __percpu_counter_sum+0x147/0x1e0 > ? domain_dirty_limits+0x246/0x390 > ? wb_over_bg_thresh+0x4d5/0x610 > ? rcu_is_watching+0x11/0xb0 > ? cma_iw_handler+0xac/0x4f0 [rdma_cm] > cma_iw_handler+0xac/0x4f0 [rdma_cm] What is the full call chain here, eg with the static functions un-inlined? > > drivers/infiniband/core/cma.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 93a1c48d0c32..c5267d9bb184 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) > struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr; > struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr; > > + cma_id_get(id_priv); > mutex_lock(&id_priv->handler_mutex); > if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT) > goto out; > @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) > if (ret) { > /* Destroy the CM ID by returning a non-zero value. */ > id_priv->cm_id.iw = NULL; > + cma_id_put(id_priv); > destroy_id_handler_unlock(id_priv); > return ret; > } > > out: > mutex_unlock(&id_priv->handler_mutex); > + cma_id_put(id_priv); > return ret; > } cm_work_handler already has a ref on the iwcm_id_private I think there is likely some much larger issue with the IW CM if the cm_id can be destroyed while the iwcm_id is in use? It is weird that there are two id memories for this :\ Jason
On Jun 12, 2023 / 11:18, Jason Gunthorpe wrote: > On Mon, Jun 12, 2023 at 02:42:37PM +0900, Shin'ichiro Kawasaki wrote: > > When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private > > *id_priv can be destroyed during cma_iw_handler call. This causes "BUG: > > KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1]. > > To prevent the destroy of id_priv, keep its reference count by calling > > cma_id_get() and cma_id_put() at start and end of cma_iw_handler(). > > > > [1] > > > > ================================================================== > > BUG: KASAN: slab-use-after-free in __mutex_lock+0x1324/0x18f0 > > Read of size 8 at addr ffff888197b37418 by task kworker/u8:0/9 > > > > CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0 #62 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 > > Workqueue: iw_cm_wq cm_work_handler [iw_cm] > > Call Trace: > > <TASK> > > dump_stack_lvl+0x57/0x90 > > print_report+0xcf/0x660 > > ? __mutex_lock+0x1324/0x18f0 > > kasan_report+0xa4/0xe0 > > ? __mutex_lock+0x1324/0x18f0 > > __mutex_lock+0x1324/0x18f0 > > ? cma_iw_handler+0xac/0x4f0 [rdma_cm] > > ? _raw_spin_unlock_irqrestore+0x30/0x60 > > ? rcu_is_watching+0x11/0xb0 > > ? _raw_spin_unlock_irqrestore+0x30/0x60 > > ? trace_hardirqs_on+0x12/0x100 > > ? __pfx___mutex_lock+0x10/0x10 > > ? __percpu_counter_sum+0x147/0x1e0 > > ? domain_dirty_limits+0x246/0x390 > > ? wb_over_bg_thresh+0x4d5/0x610 > > ? rcu_is_watching+0x11/0xb0 > > ? cma_iw_handler+0xac/0x4f0 [rdma_cm] > > cma_iw_handler+0xac/0x4f0 [rdma_cm] > > What is the full call chain here, eg with the static functions > un-inlined? I checked the inlined func call chain from cm_work_handler to cma_iw_handler (I recreated the symptom using kernel v6.4-rc5, so, address numbers are different): $ ./scripts/faddr2line ./drivers/infiniband/core/iw_cm.ko cm_work_handler+0xb57/0x1c50 cm_work_handler+0xb57/0x1c50: cm_close_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:974 (inlined by) process_event at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:997 (inlined by) cm_work_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:1036 With this, my understanding of the full call chain from NVME driver to cma_iw_handler is as follows, including task switch to cm_work_handler: nvme_rdma_teardown_io_queue nvme_rdma_stop_io_queues nvme_rdma_stop_queue __nvme_rdma_stop_queue rdma_disconnect iw_cm_disconnect iwcm_modify_qp_sqd ib_modify_qp _ib_modify_qp ib_security_modify_qp siw_verbs_modify_qp siw_qp_modify siw_qp_cm_drop siw_cm_upcall(IW_CM_EVENT_CLOSE) cm_event_handler -> refcount_inc(&cm_id_priv->refoucnt) queue_work -> cm_work_handler process_event cm_close_handler cm_work_handler cma_iw_handler > > > > drivers/infiniband/core/cma.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > > index 93a1c48d0c32..c5267d9bb184 100644 > > --- a/drivers/infiniband/core/cma.c > > +++ b/drivers/infiniband/core/cma.c > > @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) > > struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr; > > struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr; > > > > + cma_id_get(id_priv); > > mutex_lock(&id_priv->handler_mutex); > > if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT) > > goto out; > > @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) > 
> if (ret) { > > /* Destroy the CM ID by returning a non-zero value. */ > > id_priv->cm_id.iw = NULL; > > + cma_id_put(id_priv); > > destroy_id_handler_unlock(id_priv); > > return ret; > > } > > > > out: > > mutex_unlock(&id_priv->handler_mutex); > > + cma_id_put(id_priv); > > return ret; > > } > > cm_work_handler already has a ref on the iwcm_id_private > > I think there is likely some much larger issue with the IW CM if the > cm_id can be destroyed while the iwcm_id is in use? It is weird that > there are two id memories for this :\ My understanding about the call chain to rdma id destroy is as follows. I guess _destroy_id calls iw_destroy_cm_id before destroying the rdma id, but I am not sure why it does not wait for the cm_id deref by cm_work_handler. nvme_rdma_teardown_io_queues nvme_rdma_stop_io_queues -> chained to cma_iw_handler nvme_rdma_free_io_queues nvme_rdma_free_queue rdma_destroy_id mutex_lock(&id_priv->handler_mutex) destroy_id_handler_unlock mutex_unlock(&id_priv->handler_mutex) _destroy_id iw_destroy_cm_id wait_for_completion(&id_priv->comp) kfree(id_priv)
On Jun 13, 2023 / 10:43, Shin'ichiro Kawasaki wrote: [...] > My understanding about the call chain to rdma id destroy is as follows. I guess > _destory_id calls iw_destory_cm_id before destroying the rdma id, but not sure > why it does not wait for cm_id deref by cm_work_handler. I found that commit 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref") looks related to the issue. Its commit message says that it "removed the block in iw_destroy_cm_id()" to avoid a potential deadlock. So it looks like the cm_id ref no longer guards the rdma id destroy. I'm not sure how to fix this issue at the IW CM level.
On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote: > > I think there is likely some much larger issue with the IW CM if the > > cm_id can be destroyed while the iwcm_id is in use? It is weird that > > there are two id memories for this :\ > > My understanding about the call chain to rdma id destroy is as follows. I guess > _destory_id calls iw_destory_cm_id before destroying the rdma id, but not sure > why it does not wait for cm_id deref by cm_work_handler. > > nvme_rdma_teardown_io_queueus > nvme_rdma_stop_io_queues -> chained to cma_iw_handler > nvme_rdma_free_io_queues > nvme_rdma_free_queue > rdma_destroy_id > mutex_lock(&id_priv->handler_mutex) > destroy_id_handler_unlock > mutex_unlock(&id_priv->handler_mutex) > _destory_id > iw_destroy_cm_id > wait_for_completiion(&id_priv->comp) > kfree(id_priv) Once a destroy_cm_id() has returned that layer is no longer permitted to run or be running in its handlers. The iw cm is broken if it allows this, and that is the cause of the bug. Taking more refs within handlers that are already not allowed to be running is just racy. Jason
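To make the contract Jason describes concrete, here is a rough, self-contained userspace sketch (a hedged analogue with invented names such as struct cm_obj and cm_destroy(); this is not the iw_cm code): the dispatcher refuses to start new handlers once destroy has begun, cm_destroy() waits until no handler invocation is in flight, and the memory itself is only freed on the final reference drop, so a handler can never run on freed memory after destroy has returned.

/*
 * Hedged userspace sketch only: all names here are invented for
 * illustration of the "destroy waits for running handlers" contract.
 * Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct cm_obj {
	atomic_int refs;         /* who may still touch this memory */
	pthread_mutex_t lock;
	pthread_cond_t idle;     /* signalled when in_flight drops to 0 */
	int in_flight;           /* handler invocations currently running */
	bool dying;              /* destroy has been requested */
	int events_handled;
};

static void cm_get(struct cm_obj *obj)
{
	atomic_fetch_add(&obj->refs, 1);
}

static void cm_put(struct cm_obj *obj)
{
	if (atomic_fetch_sub(&obj->refs, 1) == 1) {
		pthread_mutex_destroy(&obj->lock);
		pthread_cond_destroy(&obj->idle);
		free(obj);       /* last reference: safe to free now */
	}
}

/* Dispatcher side: run one "handler", unless destroy already started. */
static void dispatch_event(struct cm_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	if (obj->dying) {
		pthread_mutex_unlock(&obj->lock);
		return;
	}
	obj->in_flight++;
	pthread_mutex_unlock(&obj->lock);

	obj->events_handled++;   /* stand-in for the handler body */

	pthread_mutex_lock(&obj->lock);
	if (--obj->in_flight == 0)
		pthread_cond_broadcast(&obj->idle);
	pthread_mutex_unlock(&obj->lock);
}

/*
 * The contract: once cm_destroy() returns, no handler is running and none
 * will run again.  The memory itself lives until the last cm_put().
 */
static void cm_destroy(struct cm_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	obj->dying = true;
	while (obj->in_flight > 0)
		pthread_cond_wait(&obj->idle, &obj->lock);
	pthread_mutex_unlock(&obj->lock);
	cm_put(obj);             /* drop the owner's reference */
}

static void *event_thread(void *arg)
{
	struct cm_obj *obj = arg;        /* holds its own reference */
	int i;

	for (i = 0; i < 100000; i++)
		dispatch_event(obj);
	cm_put(obj);
	return NULL;
}

int main(void)
{
	struct cm_obj *obj = calloc(1, sizeof(*obj));
	pthread_t t;

	atomic_init(&obj->refs, 1);      /* the owner's reference */
	pthread_mutex_init(&obj->lock, NULL);
	pthread_cond_init(&obj->idle, NULL);

	cm_get(obj);                     /* reference for the dispatcher thread */
	pthread_create(&t, NULL, event_thread, obj);
	usleep(1000);
	cm_destroy(obj);                 /* blocks until handlers are drained */
	pthread_join(t, NULL);
	printf("destroy returned with no handler running\n");
	return 0;
}

The point of the sketch is only the ordering guarantee, not the mechanism the kernel would actually use; the dispatcher-held reference is what corresponds to "cm_work_handler already has a ref on the iwcm_id_private" above.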
On Tue, Jun 13, 2023 at 10:30:37AM -0300, Jason Gunthorpe wrote: > On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote: > > > I think there is likely some much larger issue with the IW CM if the > > > cm_id can be destroyed while the iwcm_id is in use? It is weird that > > > there are two id memories for this :\ > > > > My understanding about the call chain to rdma id destroy is as follows. I guess > > _destory_id calls iw_destory_cm_id before destroying the rdma id, but not sure > > why it does not wait for cm_id deref by cm_work_handler. > > > > nvme_rdma_teardown_io_queueus > > nvme_rdma_stop_io_queues -> chained to cma_iw_handler > > nvme_rdma_free_io_queues > > nvme_rdma_free_queue > > rdma_destroy_id > > mutex_lock(&id_priv->handler_mutex) > > destroy_id_handler_unlock > > mutex_unlock(&id_priv->handler_mutex) > > _destory_id > > iw_destroy_cm_id > > wait_for_completiion(&id_priv->comp) > > kfree(id_priv) > > Once a destroy_cm_id() has returned that layer is no longer > permitted to run or be running in its handlers. The iw cm is broken if > it allows this, and that is the cause of the bug. > > Taking more refs within handlers that are already not allowed to be > running is just racy. So we need to revert that patch from our rdma-rc. Thanks > > Jason
On Jun 13, 2023 / 21:07, Leon Romanovsky wrote: > On Tue, Jun 13, 2023 at 10:30:37AM -0300, Jason Gunthorpe wrote: > > On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote: > > > > I think there is likely some much larger issue with the IW CM if the > > > > cm_id can be destroyed while the iwcm_id is in use? It is weird that > > > > there are two id memories for this :\ > > > > > > My understanding about the call chain to rdma id destroy is as follows. I guess > > > _destory_id calls iw_destory_cm_id before destroying the rdma id, but not sure > > > why it does not wait for cm_id deref by cm_work_handler. > > > > > > nvme_rdma_teardown_io_queueus > > > nvme_rdma_stop_io_queues -> chained to cma_iw_handler > > > nvme_rdma_free_io_queues > > > nvme_rdma_free_queue > > > rdma_destroy_id > > > mutex_lock(&id_priv->handler_mutex) > > > destroy_id_handler_unlock > > > mutex_unlock(&id_priv->handler_mutex) > > > _destory_id > > > iw_destroy_cm_id > > > wait_for_completiion(&id_priv->comp) > > > kfree(id_priv) > > > > Once a destroy_cm_id() has returned that layer is no longer > > permitted to run or be running in its handlers. The iw cm is broken if > > it allows this, and that is the cause of the bug. > > > > Taking more refs within handlers that are already not allowed to be > > running is just racy. > > So we need to revert that patch from our rdma-rc. I see, thanks for the clarifications. As another fix approach, I reverted the commit 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref") so that iw_destroy_cm_id() waits for deref of cm_id. With that revert, the KASAN slab-use-after-free disappeared. Is this the right fix approach?
On Wed, Jun 14, 2023 at 07:53:49AM +0000, Shinichiro Kawasaki wrote: > On Jun 13, 2023 / 21:07, Leon Romanovsky wrote: > > On Tue, Jun 13, 2023 at 10:30:37AM -0300, Jason Gunthorpe wrote: > > > On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote: > > > > > I think there is likely some much larger issue with the IW CM if the > > > > > cm_id can be destroyed while the iwcm_id is in use? It is weird that > > > > > there are two id memories for this :\ > > > > > > > > My understanding about the call chain to rdma id destroy is as follows. I guess > > > > _destory_id calls iw_destory_cm_id before destroying the rdma id, but not sure > > > > why it does not wait for cm_id deref by cm_work_handler. > > > > > > > > nvme_rdma_teardown_io_queueus > > > > nvme_rdma_stop_io_queues -> chained to cma_iw_handler > > > > nvme_rdma_free_io_queues > > > > nvme_rdma_free_queue > > > > rdma_destroy_id > > > > mutex_lock(&id_priv->handler_mutex) > > > > destroy_id_handler_unlock > > > > mutex_unlock(&id_priv->handler_mutex) > > > > _destory_id > > > > iw_destroy_cm_id > > > > wait_for_completiion(&id_priv->comp) > > > > kfree(id_priv) > > > > > > Once a destroy_cm_id() has returned that layer is no longer > > > permitted to run or be running in its handlers. The iw cm is broken if > > > it allows this, and that is the cause of the bug. > > > > > > Taking more refs within handlers that are already not allowed to be > > > running is just racy. > > > > So we need to revert that patch from our rdma-rc. > > I see, thanks for the clarifications. > > As another fix approach, I reverted the commit 59c68ac31e15 ("iw_cm: free cm_id > resources on the last deref") so that iw_destroy_cm_id() waits for deref of > cm_id. With that revert, the KASAN slab-use-after-free disappeared. Is this > the right fix approach? That seems like it would bring back the bug it was fixing, though it isn't totally clear what that is There is something wrong with the iwarp cm if it is destroying IDs in handlers, IB cm avoids doing that to avoid the deadlock, the same solution will be needed for iwarp too. Also the code this patch removed is quite ugly, if we are going back to waiting it should be written in a more modern way without the test bit and so on. Jason
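As a concrete illustration of the shape Jason points at (deferring the destroy until after the handler has returned, rather than destroying from inside a running handler), here is a minimal hedged userspace sketch; struct conn_id, event_handler() and process_event() are invented names, not the ib_cm or iw_cm code.

#include <stdio.h>
#include <stdlib.h>

/*
 * Hedged sketch only: invented names, not kernel code.  A handler never
 * destroys the id itself; returning nonzero asks the dispatcher to do it
 * after the handler has returned.
 */
struct conn_id {
	int state;
};

static int event_handler(struct conn_id *id, int event)
{
	id->state = event;
	return event < 0;        /* "fatal" events request destruction */
}

static void destroy_id(struct conn_id *id)
{
	/* Safe to tear down here: the dispatcher only calls this after the
	 * handler has returned, so destroy never waits on the very handler
	 * that requested it. */
	free(id);
}

/* Dispatcher: destruction happens strictly after the handler returns. */
static void process_event(struct conn_id *id, int event)
{
	if (event_handler(id, event))
		destroy_id(id);
}

int main(void)
{
	struct conn_id *id = calloc(1, sizeof(*id));

	process_event(id, 42);   /* normal event, id survives */
	process_event(id, -1);   /* fatal event, id is destroyed afterwards */
	printf("done\n");
	return 0;
}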
On Jun 14, 2023 / 14:36, Jason Gunthorpe wrote: > On Wed, Jun 14, 2023 at 07:53:49AM +0000, Shinichiro Kawasaki wrote: [...] > > As another fix approach, I reverted the commit 59c68ac31e15 ("iw_cm: free cm_id > > resources on the last deref") so that iw_destroy_cm_id() waits for deref of > > cm_id. With that revert, the KASAN slab-use-after-free disappeared. Is this > > the right fix approach? > > That seems like it would bring back the bug it was fixing, though it > isn't totally clear what that is > > There is something wrong with the iwarp cm if it is destroying IDs in > handlers, IB cm avoids doing that to avoid the deadlock, the same > solution will be needed for iwarp too. > > Also the code this patch removed is quite ugly, if we are going back > to waiting it should be written in a more modern way without the test > bit and so on. I see, thanks for the clarifications. I took a look at ib_destroy_cm_id() and found that it works differently from iw_destroy_cm_id(). I guess a certain amount of change will be required for a proper fix in the iwarp code. I will stop my fix attempt here, since it looks beyond my bandwidth now. If anyone provides fix patches, I'm willing to test them.
On Jun 15, 2023 / 09:45, Shin'ichiro Kawasaki wrote: [...] > I stop my fix attempt here, since it looks beyond my bandwidth now. If anyone > provides fix patches, I'm willing to test them. FYI, I tried to recreate this failure on my test system this week, and found that it can no longer be recreated. In case your TODO list includes a work item for this issue, please give it a lower priority. Two months ago, the failure could be recreated by repeating the blktests test cases nvme/030 and nvme/031 with the SIW transport around 30 times. Now I do not see it when I repeat them 100 times. I suspected that some kernel change between v6.4 and v6.5 had fixed the issue, but the failure is no longer seen with either kernel version. I reverted nvme-cli and libnvme to older versions, but still do not see the failure. I can only guess that some change on my test system affected the symptom (Fedora 38 on QEMU).
On Tue, Sep 05, 2023 at 12:39:38AM +0000, Shinichiro Kawasaki wrote: > Two month ago, the failure had been recreated by repeating the blktests test > cases nvme/030 and nvme/031 with SIW tranposrt around 30 times. Now I do not see > it when I repeat them 100 times. I suspected some kernel changes between v6.4 > and v6.5 would have fixed the issue, but even when I use both kernel versions, > the failure is no longer seen. I reverted nvme-cli and libnvme to older version, > but still do not see the failure. I just guess some change on my test system > affected the symptom (Fedora 38 on QEMU). Maybe the refactoring in blktests could also be the reason the tests are not working? Just an idea.
On Sep 06, 2023 / 10:54, Daniel Wagner wrote: > On Tue, Sep 05, 2023 at 12:39:38AM +0000, Shinichiro Kawasaki wrote: > > Two month ago, the failure had been recreated by repeating the blktests test > > cases nvme/030 and nvme/031 with SIW tranposrt around 30 times. Now I do not see > > it when I repeat them 100 times. I suspected some kernel changes between v6.4 > > and v6.5 would have fixed the issue, but even when I use both kernel versions, > > the failure is no longer seen. I reverted nvme-cli and libnvme to older version, > > but still do not see the failure. I just guess some change on my test system > > affected the symptom (Fedora 38 on QEMU). > > Maybe the refactoring in blktests could also be the reason the tests are > not working? Just an idea. That is an idea. I rolled back blktests to the git revision of two months ago, but still the failure is not recreated. Hmm.
Seems I reproduced this issue[3] when I run [1] on the latest linux-block/for-next with kernel config[2]. [1]nvme_trtype=rdma ./check nvme/ [2] https://pastebin.com/u9fh7dbX [3] [ 387.880016] run blktests nvme/003 at 2023-09-07 03:31:15 [ 388.173979] (null): rxe_set_mtu: Set mtu to 1024 [ 388.310460] infiniband eno1_rxe: set active [ 388.310495] infiniband eno1_rxe: added eno1 [ 388.708380] loop0: detected capacity change from 0 to 2097152 [ 388.767800] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 388.839397] nvmet_rdma: enabling port 0 (10.16.221.26:4420) [ 389.053251] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. [ 389.061415] nvme nvme2: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.16.221.26:4420 [ 402.202057] nvme nvme2: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" [ 402.251605] ====================================================== [ 402.257782] WARNING: possible circular locking dependency detected [ 402.263963] 6.5.0+ #1 Not tainted [ 402.267280] ------------------------------------------------------ [ 402.273459] kworker/5:0/48 is trying to acquire lock: [ 402.278512] ffff88831a9b6450 (&id_priv->handler_mutex){+.+.}-{3:3}, at: rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.288279] but task is already holding lock: [ 402.294112] ffffc90000687da0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x83a/0x1660 [ 402.304729] which lock already depends on the new lock. [ 402.312903] the existing dependency chain (in reverse order) is: [ 402.320380] -> #3 ((work_completion)(&queue->release_work)){+.+.}-{0:0}: [ 402.328561] __lock_acquire+0xbd6/0x1c00 [ 402.333015] lock_acquire+0x1da/0x5e0 [ 402.337203] process_one_work+0x884/0x1660 [ 402.341822] worker_thread+0x5be/0xef0 [ 402.346093] kthread+0x2f4/0x3d0 [ 402.349845] ret_from_fork+0x30/0x70 [ 402.353947] ret_from_fork_asm+0x1b/0x30 [ 402.358391] -> #2 ((wq_completion)nvmet-wq){+.+.}-{0:0}: [ 402.365186] __lock_acquire+0xbd6/0x1c00 [ 402.369631] lock_acquire+0x1da/0x5e0 [ 402.373818] __flush_workqueue+0x101/0x1230 [ 402.378524] nvmet_rdma_queue_connect+0x106/0x4b0 [nvmet_rdma] [ 402.384885] cma_cm_event_handler+0xf2/0x430 [rdma_cm] [ 402.390562] cma_ib_req_handler+0x93a/0x1cd0 [rdma_cm] [ 402.396238] cm_process_work+0x48/0x3b0 [ib_cm] [ 402.401307] cm_req_handler+0xf9b/0x2610 [ib_cm] [ 402.406465] cm_work_handler+0x80a/0x1380 [ib_cm] [ 402.411708] process_one_work+0x955/0x1660 [ 402.416326] worker_thread+0x5be/0xef0 [ 402.420600] kthread+0x2f4/0x3d0 [ 402.424351] ret_from_fork+0x30/0x70 [ 402.428450] ret_from_fork_asm+0x1b/0x30 [ 402.432897] -> #1 (&id_priv->handler_mutex/1){+.+.}-{3:3}: [ 402.439874] __lock_acquire+0xbd6/0x1c00 [ 402.444320] lock_acquire+0x1da/0x5e0 [ 402.448504] __mutex_lock+0x154/0x1490 [ 402.452776] cma_ib_req_handler+0x4f8/0x1cd0 [rdma_cm] [ 402.458455] cm_process_work+0x48/0x3b0 [ib_cm] [ 402.463525] cm_req_handler+0xf9b/0x2610 [ib_cm] [ 402.468681] cm_work_handler+0x80a/0x1380 [ib_cm] [ 402.473925] process_one_work+0x955/0x1660 [ 402.478543] worker_thread+0x5be/0xef0 [ 402.482817] kthread+0x2f4/0x3d0 [ 402.486570] ret_from_fork+0x30/0x70 [ 402.490668] ret_from_fork_asm+0x1b/0x30 [ 402.495113] -> #0 (&id_priv->handler_mutex){+.+.}-{3:3}: [ 402.501908] check_prev_add+0x1bd/0x23a0 [ 402.506352] validate_chain+0xb02/0xf30 [ 402.510713] __lock_acquire+0xbd6/0x1c00 [ 402.515159] lock_acquire+0x1da/0x5e0 [ 402.519344] 
__mutex_lock+0x154/0x1490 [ 402.523617] rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.528774] nvmet_rdma_free_queue+0x7b/0x380 [nvmet_rdma] [ 402.534790] nvmet_rdma_release_queue_work+0x3e/0x90 [nvmet_rdma] [ 402.541408] process_one_work+0x955/0x1660 [ 402.546029] worker_thread+0x5be/0xef0 [ 402.550303] kthread+0x2f4/0x3d0 [ 402.554054] ret_from_fork+0x30/0x70 [ 402.558153] ret_from_fork_asm+0x1b/0x30 [ 402.562599] other info that might help us debug this: [ 402.570599] Chain exists of: &id_priv->handler_mutex --> (wq_completion)nvmet-wq --> (work_completion)(&queue->release_work) [ 402.584854] Possible unsafe locking scenario: [ 402.590775] CPU0 CPU1 [ 402.595307] ---- ---- [ 402.599839] lock((work_completion)(&queue->release_work)); [ 402.605508] lock((wq_completion)nvmet-wq); [ 402.612300] lock((work_completion)(&queue->release_work)); [ 402.620475] lock(&id_priv->handler_mutex); [ 402.624755] *** DEADLOCK *** [ 402.630675] 2 locks held by kworker/5:0/48: [ 402.634860] #0: ffff8883070c9d48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x80c/0x1660 [ 402.644524] #1: ffffc90000687da0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x83a/0x1660 [ 402.655572] stack backtrace: [ 402.659933] CPU: 5 PID: 48 Comm: kworker/5:0 Not tainted 6.5.0+ #1 [ 402.666110] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.11.4 03/22/2023 [ 402.673763] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma] [ 402.680646] Call Trace: [ 402.683099] <TASK> [ 402.685205] dump_stack_lvl+0x60/0xb0 [ 402.688879] check_noncircular+0x2f9/0x3e0 [ 402.692982] ? __pfx_check_noncircular+0x10/0x10 [ 402.697609] ? srso_return_thunk+0x5/0x10 [ 402.701635] ? srso_return_thunk+0x5/0x10 [ 402.705649] ? alloc_chain_hlocks+0x1de/0x520 [ 402.710010] check_prev_add+0x1bd/0x23a0 [ 402.713944] ? mark_lock.part.0+0xca/0xab0 [ 402.718054] validate_chain+0xb02/0xf30 [ 402.721909] ? __pfx_validate_chain+0x10/0x10 [ 402.726274] ? _raw_spin_unlock_irq+0x24/0x50 [ 402.730635] ? lockdep_hardirqs_on+0x79/0x100 [ 402.735000] ? srso_return_thunk+0x5/0x10 [ 402.739012] ? _raw_spin_unlock_irq+0x2f/0x50 [ 402.743380] __lock_acquire+0xbd6/0x1c00 [ 402.747321] lock_acquire+0x1da/0x5e0 [ 402.750991] ? rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.755805] ? __pfx_lock_acquire+0x10/0x10 [ 402.759997] ? do_raw_spin_trylock+0xb5/0x180 [ 402.764363] ? srso_return_thunk+0x5/0x10 [ 402.768381] ? srso_return_thunk+0x5/0x10 [ 402.772402] __mutex_lock+0x154/0x1490 [ 402.776159] ? rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.780973] ? rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.785782] ? __pfx___mutex_lock+0x10/0x10 [ 402.789981] ? srso_return_thunk+0x5/0x10 [ 402.793992] ? rcu_is_watching+0x11/0xb0 [ 402.797929] ? rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.802738] rdma_destroy_id+0x17/0x20 [rdma_cm] [ 402.807377] nvmet_rdma_free_queue+0x7b/0x380 [nvmet_rdma] [ 402.812875] nvmet_rdma_release_queue_work+0x3e/0x90 [nvmet_rdma] [ 402.818972] process_one_work+0x955/0x1660 [ 402.823078] ? __lock_acquired+0x207/0x7b0 [ 402.827177] ? __pfx_process_one_work+0x10/0x10 [ 402.831711] ? __pfx___lock_acquired+0x10/0x10 [ 402.836164] ? worker_thread+0x15a/0xef0 [ 402.840095] worker_thread+0x5be/0xef0 [ 402.843862] ? __pfx_worker_thread+0x10/0x10 [ 402.848134] kthread+0x2f4/0x3d0 [ 402.851366] ? __pfx_kthread+0x10/0x10 [ 402.855121] ret_from_fork+0x30/0x70 [ 402.858704] ? 
__pfx_kthread+0x10/0x10 [ 402.862459] ret_from_fork_asm+0x1b/0x30 [ 402.866398] </TASK> [ 403.865323] rdma_rxe: unloaded [ 405.242319] rdma_rxe: loaded [ 405.272491] loop: module loaded [ 405.352101] run blktests nvme/004 at 2023-09-07 03:31:32 [ 405.504367] (null): rxe_set_mtu: Set mtu to 1024 [ 405.587145] infiniband eno1_rxe: set active [ 405.595981] infiniband eno1_rxe: added eno1 [ 405.714913] loop0: detected capacity change from 0 to 2097152 [ 405.746602] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ 405.784747] nvmet_rdma: enabling port 0 (10.16.221.26:4420) [ 405.876322] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. [ 405.894750] nvme nvme2: creating 16 I/O queues. [ 406.181403] nvme nvme2: mapped 16/0/0 default/read/poll queues. [ 406.216348] nvme nvme2: new ctrl: NQN "blktests-subsystem-1", addr 10.16.221.26:4420 [ 406.552633] nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1" [ 407.879250] rdma_rxe: unloaded [ 409.146426] rdma_rxe: loaded [ 409.176282] loop: module loaded [ 409.460211] rdma_rxe: unloaded [ 410.938596] rdma_rxe: loaded [ 410.966908] loop: module loaded On Wed, Sep 6, 2023 at 7:33 PM Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote: > > On Sep 06, 2023 / 10:54, Daniel Wagner wrote: > > On Tue, Sep 05, 2023 at 12:39:38AM +0000, Shinichiro Kawasaki wrote: > > > Two month ago, the failure had been recreated by repeating the blktests test > > > cases nvme/030 and nvme/031 with SIW tranposrt around 30 times. Now I do not see > > > it when I repeat them 100 times. I suspected some kernel changes between v6.4 > > > and v6.5 would have fixed the issue, but even when I use both kernel versions, > > > the failure is no longer seen. I reverted nvme-cli and libnvme to older version, > > > but still do not see the failure. I just guess some change on my test system > > > affected the symptom (Fedora 38 on QEMU). > > > > Maybe the refactoring in blktests could also be the reason the tests are > > not working? Just an idea. > > That is an idea. I rolled back blktests to the git revision of two months ago, > but still the failure is not recreated. Hmm. >
On Thu, Sep 07, 2023 at 03:47:03PM +0800, Yi Zhang wrote: > Seems I reproduced this issue[3] when I run [1] on the latest > linux-block/for-next with kernel config[2]. https://lore.kernel.org/linux-rdma/13441b9b-cc13-f0e0-bd46-f14983dadd49@grimberg.me/ So far nobody has come up with a solution. Jason
On 9/7/23 09:47, Yi Zhang wrote: > Seems I reproduced this issue[3] when I run [1] on the latest > linux-block/for-next with kernel config[2]. > > [1]nvme_trtype=rdma ./check nvme/ > > [2] https://pastebin.com/u9fh7dbX > But isn't this the same issue as the circular locking discussed earlier? Can you check if the patch 'nvmet-tcp: avoid circular locking dependency on install_queue()' fixes this issue, too? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
On Fri, Sep 8, 2023 at 3:18 PM Hannes Reinecke <hare@suse.de> wrote: > > On 9/7/23 09:47, Yi Zhang wrote: > > Seems I reproduced this issue[3] when I run [1] on the latest > > linux-block/for-next with kernel config[2]. > > > > [1]nvme_trtype=rdma ./check nvme/ > > > > [2] https://pastebin.com/u9fh7dbX > > > But isn't this the same issue as the circular locking discussed earlier? Sorry, maybe I mixed up the issues here. There are three issues: 1. "BUG: KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler(), which Shin'ichiro mentioned in this patch and which is triggered by the nvme/rdma tests nvme/030 and nvme/031 using siw 2. nvme/tcp circular locking on install_queue, which could be fixed by: https://lore.kernel.org/linux-nvme/20230810132006.129365-1-hare@suse.de/ https://lore.kernel.org/linux-nvme/20230726142939.10062-1-guoqing.jiang@linux.dev/ 3. circular locking at rdma_destroy_id+0x17/0x20 [rdma_cm], which was already filed last year but still has no solution: https://lore.kernel.org/linux-rdma/CAHj4cs93BfTRgWF6PbuZcfq6AARHgYC2g=RQ-7Jgcf1-6h+2SQ@mail.gmail.com/ > > Can you check if the patch > 'nvmet-tcp: avoid circular locking dependency on install_queue()' > fixes this issue, too? > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Kernel Storage Architect > hare@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg > HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew > Myers, Andrew McDonald, Martje Boudien Moerman > -- Best Regards, Yi Zhang
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 93a1c48d0c32..c5267d9bb184 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr; struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr; + cma_id_get(id_priv); mutex_lock(&id_priv->handler_mutex); if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT) goto out; @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) if (ret) { /* Destroy the CM ID by returning a non-zero value. */ id_priv->cm_id.iw = NULL; + cma_id_put(id_priv); destroy_id_handler_unlock(id_priv); return ret; } out: mutex_unlock(&id_priv->handler_mutex); + cma_id_put(id_priv); return ret; }
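For readers following the refcount argument, here is a rough userspace analogue of the pattern the diff above applies, with invented names (struct toy_id, toy_id_get/toy_id_put) rather than the real cma.c symbols: the handler pins the object with a reference while it uses handler_mutex, and the destroy side drops its own reference and then waits for the count to reach zero before freeing, much like _destroy_id()'s wait_for_completion(&id_priv->comp). As Jason notes in the thread, this only helps if something already guarantees the object is alive at the moment the handler takes its reference.

/*
 * Hedged userspace sketch only: invented names mirroring the shape of
 * id_priv's refcount/completion, not the actual cma.c code.
 * Build with: cc -pthread toy_id.c
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_id {
	atomic_int refs;
	sem_t comp;                   /* posted when the last ref is dropped */
	pthread_mutex_t handler_mutex;
	int state;
};

static void toy_id_get(struct toy_id *id)
{
	atomic_fetch_add(&id->refs, 1);
}

static void toy_id_put(struct toy_id *id)
{
	if (atomic_fetch_sub(&id->refs, 1) == 1)
		sem_post(&id->comp); /* analogous to complete(&id_priv->comp) */
}

/* The handler keeps a reference for as long as it uses handler_mutex. */
static void *handler(void *arg)
{
	struct toy_id *id = arg;      /* reference taken by the creator below */

	pthread_mutex_lock(&id->handler_mutex);
	id->state++;                  /* stand-in for the handler's work */
	pthread_mutex_unlock(&id->handler_mutex);

	toy_id_put(id);               /* only now may the destroyer free it */
	return NULL;
}

/* The destroy path: drop our reference, wait for the rest, then free. */
static void *destroyer(void *arg)
{
	struct toy_id *id = arg;

	toy_id_put(id);
	sem_wait(&id->comp);          /* analogous to wait_for_completion() */
	pthread_mutex_destroy(&id->handler_mutex);
	sem_destroy(&id->comp);
	free(id);
	return NULL;
}

int main(void)
{
	struct toy_id *id = calloc(1, sizeof(*id));
	pthread_t h, d;

	pthread_mutex_init(&id->handler_mutex, NULL);
	sem_init(&id->comp, 0, 0);
	atomic_init(&id->refs, 1);    /* the destroyer's reference */
	toy_id_get(id);               /* the reference the handler will hold */

	pthread_create(&h, NULL, handler, id);
	pthread_create(&d, NULL, destroyer, id);
	pthread_join(h, NULL);
	pthread_join(d, NULL);
	printf("the id was freed only after the handler's reference drop\n");
	return 0;
}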