Message ID | 56030fcd-b8a0-fc0e-18e5-985ebf16a82e@grimberg.me (mailing list archive) |
---|---|
State | Not Applicable |
On Tue, Jun 27, 2017 at 12:37:51AM -0600, Sagi Grimberg wrote:
> > Hi Sagi/Christoph,
>
> Hi Shiraz,
>
> Please CC linux-nvme for nvme-rdma related stuff.

OK.

> > I am seeing a deadlock for a device removal event on an NVMeF target.
> >
> > The sequence of events leading to the deadlock is as follows:
> >
> > 1. i40iw posts IW_CM_EVENT_CLOSE events for all QPs, causing the corresponding
> >    NVMet RDMA queues to disconnect and schedule release of any pending work on the WQ
> > 2. i40iw triggers device removal
> >    ib_unregister_device
> >    [..]
> >    cma_remove_id_dev (takes a handler lock before calling the event handler)
> >    nvmet_rdma_cm_handler
> >    nvmet_rdma_device_removal (queue->state = NVMET_RDMA_Q_DISCONNECTING due to 1.)
> >    flush_scheduled_work (blocks till all scheduled work is drained from the WQ)
> >    nvmet_rdma_release_queue_work (queue->state = NVMET_RDMA_Q_DISCONNECTING)
> >    rdma_destroy_id (waits on the same handler lock as cma_remove_id_dev, causing the deadlock)
> >
> > So this problem can occur when there is a device removal event while the queue is in
> > the disconnecting state, with some outstanding work that hasn't been drained from the WQ at the
> > time flush_scheduled_work is called.
>
> This indeed looks like a bug (thanks for reporting!). We indeed don't
> have sufficient information on where the queue release procedure is by
> only looking at the queue state; we can't tell if rdma_destroy_id was
> invoked, and we can deadlock with rdma_destroy_id.
>
> How about the (untested) alternative below:
> --
> [PATCH] nvmet-rdma: register ib_client to not deadlock in device
> removal
>
> We can deadlock in case we get a device removal
> event on a queue which is already in the process of
> destroying the cm_id, as this blocks until all
> events on this cm_id have drained. On the other hand,
> we cannot guarantee that rdma_destroy_id was invoked,
> as we only have an indication that the queue disconnect
> flow has been queued (the queue state is updated before
> the release work has been queued).
>
> So, we leave all the queue removal to a separate ib_client
> to avoid this deadlock, as ib_client device removal is in
> a different context than the cm_id itself.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---

Yes. This patch fixes the problem I am seeing.

Shiraz
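The deadlock described above has a simple shape: one context takes the rdma_cm handler lock and then flushes the system workqueue, while an already-queued work item needs that same lock before it can finish. Below is a minimal, self-contained sketch of that shape, with a plain mutex standing in for the handler lock; the names are illustrative and it is not taken from the nvmet-rdma code.

/*
 * Minimal sketch of the deadlock shape, with a plain mutex standing in
 * for the rdma_cm handler lock (illustrative only).
 */
#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(handler_lock);      /* stands in for the cm_id handler lock */

static void release_work_fn(struct work_struct *work)
{
        /* rdma_destroy_id() ends up waiting here for the handler lock */
        mutex_lock(&handler_lock);
        mutex_unlock(&handler_lock);
}

static DECLARE_WORK(release_work, release_work_fn);

static void device_removal_path(void)
{
        mutex_lock(&handler_lock);      /* cma_remove_id_dev takes the handler lock */
        schedule_work(&release_work);   /* the queue release work is already pending */
        flush_scheduled_work();         /* waits for release_work -> never returns */
        mutex_unlock(&handler_lock);
}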
>> How about the (untested) alternative below:
>> --
>> [PATCH] nvmet-rdma: register ib_client to not deadlock in device
>> removal
>>
>> We can deadlock in case we get a device removal
>> event on a queue which is already in the process of
>> destroying the cm_id, as this blocks until all
>> events on this cm_id have drained. On the other hand,
>> we cannot guarantee that rdma_destroy_id was invoked,
>> as we only have an indication that the queue disconnect
>> flow has been queued (the queue state is updated before
>> the release work has been queued).
>>
>> So, we leave all the queue removal to a separate ib_client
>> to avoid this deadlock, as ib_client device removal is in
>> a different context than the cm_id itself.
>>
>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>> ---
>
> Yes. This patch fixes the problem I am seeing.

Awesome,

Adding your Tested-by tag.

Thanks!
Could something like this be causing the D state problem I was seeing
in iSER almost a year ago? I tried writing a patch for iSER based on
this, but it didn't help. Either the bug is not being triggered in
device removal, or I didn't line up the statuses correctly. But it
seems that things are getting stuck in the work queue and some sort of
deadlock is happening, so I was hopeful that something similar may be
in iSER.

Thanks,
Robert
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Wed, Jun 28, 2017 at 12:50 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>>> How about the (untested) alternative below:
>>> --
>>> [PATCH] nvmet-rdma: register ib_client to not deadlock in device
>>> removal
>>>
>>> We can deadlock in case we get a device removal
>>> event on a queue which is already in the process of
>>> destroying the cm_id, as this blocks until all
>>> events on this cm_id have drained. On the other hand,
>>> we cannot guarantee that rdma_destroy_id was invoked,
>>> as we only have an indication that the queue disconnect
>>> flow has been queued (the queue state is updated before
>>> the release work has been queued).
>>>
>>> So, we leave all the queue removal to a separate ib_client
>>> to avoid this deadlock, as ib_client device removal is in
>>> a different context than the cm_id itself.
>>>
>>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>>> ---
>>
>> Yes. This patch fixes the problem I am seeing.
>
> Awesome,
>
> Adding your Tested-by tag.
>
> Thanks!
Hey Robert,

> Could something like this be causing the D state problem I was seeing
> in iSER almost a year ago?

No, that is a bug in the mlx5 device as far as I'm concerned (although I
couldn't prove it). I've tried to track it down but without access to
the FW tools I can't understand what is going on. I've seen this same
phenomenon with nvmet-rdma before as well.

It looks like when we perform QP draining in the presence of rdma
operations it may not complete, meaning that the zero-length rdma write
never generates a completion. Maybe it has something to do with the qp
moving to error state when some rdma operations have not completed.

> I tried writing a patch for iSER based on
> this, but it didn't help. Either the bug is not being triggered in
> device removal,

It's 100% not related to device removal.

> or I didn't line up the statuses correctly. But it
> seems that things are getting stuck in the work queue and some sort of
> deadlock is happening so I was hopeful that something similar may be
> in iSER.

The hang is the ULP code waiting for QP drain.
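The drain being discussed follows the pattern below. This is a simplified, hedged sketch of what the kernel's ib_drain_sq() in drivers/infiniband/core/verbs.c does (error handling and CQ polling details omitted); the hang described above corresponds to the final wait never returning because the completion for the zero-length RDMA WRITE marker never arrives.

#include <linux/completion.h>
#include <rdma/ib_verbs.h>

struct drain_ctx {
        struct ib_cqe cqe;
        struct completion done;
};

static void drain_done(struct ib_cq *cq, struct ib_wc *wc)
{
        struct drain_ctx *ctx = container_of(wc->wr_cqe, struct drain_ctx, cqe);

        complete(&ctx->done);
}

/* Simplified send-queue drain: flush the QP, post a marker, wait for it. */
static void sketch_drain_sq(struct ib_qp *qp)
{
        struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
        struct drain_ctx ctx = { .cqe.done = drain_done };
        struct ib_send_wr swr = {}, *bad_swr;

        init_completion(&ctx.done);
        swr.wr_cqe = &ctx.cqe;
        swr.opcode = IB_WR_RDMA_WRITE;  /* the zero-length rdma write marker */

        if (ib_modify_qp(qp, &attr, IB_QP_STATE))       /* move the QP to error */
                return;
        if (ib_post_send(qp, &swr, &bad_swr))
                return;

        /* This is where the ULP sits forever if the marker never completes. */
        wait_for_completion(&ctx.done);
}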
Sagi,

Thanks for the update.

On Thu, Jun 29, 2017 at 8:32 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> Hey Robert,
>
>> Could something like this be causing the D state problem I was seeing
>> in iSER almost a year ago?
>
> No, that is a bug in the mlx5 device as far as I'm concerned (although I
> couldn't prove it). I've tried to track it down but without access to
> the FW tools I can't understand what is going on. I've seen this same
> phenomenon with nvmet-rdma before as well.

Do you know who I could contact about it? I can reproduce the problem
pretty easily with two hosts back to back, so it should be easy for
someone with mlx5 Eth devices to replicate.

> It looks like when we perform QP draining in the presence of rdma
> operations it may not complete, meaning that the zero-length rdma write
> never generates a completion. Maybe it has something to do with the qp
> moving to error state when some rdma operations have not completed.
>
>> I tried writing a patch for iSER based on
>> this, but it didn't help. Either the bug is not being triggered in
>> device removal,
>
> It's 100% not related to device removal.
>
>> or I didn't line up the statuses correctly. But it
>> seems that things are getting stuck in the work queue and some sort of
>> deadlock is happening so I was hopeful that something similar may be
>> in iSER.
>
> The hang is the ULP code waiting for QP drain.

Yeah, the patches I wrote did nothing to help the problem. The only
thing that kind of worked was forcing the queue to drop (maybe I was
just ignoring the old queue, I can't remember exactly), but it was
leaving some stale iSCSI session info around. Now that I've read more
of the iSCSI code, I wonder if I should revisit that. I think Bart
said that the sledgehammer approach I took should not be necessary.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 32aa10b521c8..56a4cba690b5 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1307,53 +1307,44 @@ static void nvmet_rdma_queue_connect_fail(struct rdma_cm_id *cm_id,
 /**
  * nvme_rdma_device_removal() - Handle RDMA device removal
+ * @cm_id: rdma_cm id, used for nvmet port
  * @queue: nvmet rdma queue (cm id qp_context)
- * @addr: nvmet address (cm_id context)
  *
  * DEVICE_REMOVAL event notifies us that the RDMA device is about
- * to unplug so we should take care of destroying our RDMA resources.
- * This event will be generated for each allocated cm_id.
+ * to unplug. Note that this event can be generated on a normal
+ * queue cm_id and/or a device bound listener cm_id (where in this
+ * case queue will be null).
  *
- * Note that this event can be generated on a normal queue cm_id
- * and/or a device bound listener cm_id (where in this case
- * queue will be null).
- *
- * we claim ownership on destroying the cm_id. For queues we move
- * the queue state to NVMET_RDMA_IN_DEVICE_REMOVAL and for port
+ * We registered an ib_client to handle device removal for queues,
+ * so we only need to handle the listening port cm_ids. In this case
  * we nullify the priv to prevent double cm_id destruction and destroying
  * the cm_id implicitely by returning a non-zero rc to the callout.
  */
 static int nvmet_rdma_device_removal(struct rdma_cm_id *cm_id,
                struct nvmet_rdma_queue *queue)
 {
-       unsigned long flags;
-
-       if (!queue) {
-               struct nvmet_port *port = cm_id->context;
+       struct nvmet_port *port;
 
+       if (queue) {
                /*
-                * This is a listener cm_id. Make sure that
-                * future remove_port won't invoke a double
-                * cm_id destroy. use atomic xchg to make sure
-                * we don't compete with remove_port.
-                */
-               if (xchg(&port->priv, NULL) != cm_id)
-                       return 0;
-       } else {
-               /*
-                * This is a queue cm_id. Make sure that
-                * release queue will not destroy the cm_id
-                * and schedule all ctrl queues removal (only
-                * if the queue is not disconnecting already).
+                * This is a queue cm_id. we have registered
+                * an ib_client to handle queues removal
+                * so don't interfear and just return.
                 */
-               spin_lock_irqsave(&queue->state_lock, flags);
-               if (queue->state != NVMET_RDMA_Q_DISCONNECTING)
-                       queue->state = NVMET_RDMA_IN_DEVICE_REMOVAL;
-               spin_unlock_irqrestore(&queue->state_lock, flags);
-               nvmet_rdma_queue_disconnect(queue);
-               flush_scheduled_work();
+               return 0;
        }
 
+       port = cm_id->context;
+
+       /*
+        * This is a listener cm_id. Make sure that
+        * future remove_port won't invoke a double
+        * cm_id destroy. use atomic xchg to make sure
+        * we don't compete with remove_port.
+        */
+       if (xchg(&port->priv, NULL) != cm_id)
+               return 0;
+
        /*
        * We need to return 1 so that the core will destroy
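The hunk above only shows the DEVICE_REMOVAL handler; the ib_client registration that the commit message refers to is not part of this excerpt. As a rough, hedged illustration of that mechanism (the names and bodies below are made up for illustration, not taken from the patch), an ib_client is registered like this, and its remove callback runs outside any rdma_cm event handler:

#include <linux/init.h>
#include <linux/workqueue.h>
#include <rdma/ib_verbs.h>

static void example_add_one(struct ib_device *ib_device)
{
        /* nothing to set up until queues are created on this device */
}

static void example_remove_one(struct ib_device *ib_device, void *client_data)
{
        /*
         * Tear down all queues allocated on this device and wait for their
         * release work (which calls rdma_destroy_id) to finish. This runs
         * in the ib_client removal context, not inside an rdma_cm event
         * handler, so no handler lock is held and the flush cannot deadlock.
         */
        flush_scheduled_work();
}

static struct ib_client example_ib_client = {
        .name   = "example-client",
        .add    = example_add_one,
        .remove = example_remove_one,
};

static int __init example_init(void)
{
        return ib_register_client(&example_ib_client);
}

static void __exit example_exit(void)
{
        ib_unregister_client(&example_ib_client);
}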