Message ID | 4cc25277-429a-4ab9-470c-b3af1428ce93@grimberg.me (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
> > >>> I just hit an nvmf target NULL pointer deref BUG after a few hours of > > keep-alive > >>> timeout testing. It appears that nvmet_rdma_cm_handler() was called with > >>> cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left as > >>> NULL. But then nvmet_rdma_queue_disconnect() is called with queue == NULL > >> which > >>> causes the crash. > >> > >> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even > >> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown > >> paths does not nullify cm_id->qp... > > > > rdma_destroy_qp() nulls out cm_id->qp. > > pphh, somehow managed to miss it... > > So we have a case where we can call rdma_destroy_qp and > then rdma_destroy_id but still get events on the cm_id... > Not very nice... > > So I think that the patch from Bart a few weeks ago was correct: > Not quite. It just guards against a null queue for TIMEWAIT_EXIT, which is only generated by the IB_CM. > --- > drivers/nvme/target/rdma.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index d1aea17..a61e47f 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct > rdma_cm_id *cm_id, > break; > case RDMA_CM_EVENT_ADDR_CHANGE: > case RDMA_CM_EVENT_DISCONNECTED: > - case RDMA_CM_EVENT_TIMEWAIT_EXIT: > nvmet_rdma_queue_disconnect(queue); > break; > + case RDMA_CM_EVENT_TIMEWAIT_EXIT: > + if (queue) > + nvmet_rdma_queue_disconnect(queue); > + break; > case RDMA_CM_EVENT_DEVICE_REMOVAL: > ret = nvmet_rdma_device_removal(cm_id, queue); > break; > --- > > In case this fixes the issue (as expected) I'll queue it up > with a change log and a code comment on why we need to do > this (and include all the relevant cases around it)... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index d1aea17..a61e47f 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, break; case RDMA_CM_EVENT_ADDR_CHANGE: case RDMA_CM_EVENT_DISCONNECTED: - case RDMA_CM_EVENT_TIMEWAIT_EXIT: nvmet_rdma_queue_disconnect(queue); break; + case RDMA_CM_EVENT_TIMEWAIT_EXIT: + if (queue) + nvmet_rdma_queue_disconnect(queue); + break; case RDMA_CM_EVENT_DEVICE_REMOVAL: ret = nvmet_rdma_device_removal(cm_id, queue);