Message ID | 62eea88b-caa4-5799-3d8f-8d8789879aa8@grimberg.me (mailing list archive)
---|---
State | Not Applicable
On 10/19/2017 02:55 PM, Sagi Grimberg wrote:
>
>>> Hi Yi,
>>>
>>> I was referring to the bug you reported where a simple create_ctrl failed:
>>> https://pastebin.com/7z0XSGSd
>>>
>>> Does it reproduce?
>>>
>> yes, this issue was reproduced during "git bisect" with the below patch
>
> OK, if this does not reproduce with the latest code, let's put it aside
> for now.
>
> So as for the error you see, can you please try the following patch?

Hi Sagi
With this patch, no such error log shows up on the host side, but I found
there is no nvme0n1 device node even after getting "nvme nvme0: Successfully
reconnected" on the host.

Host side:
[  98.181089] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[  98.329464] nvme nvme0: creating 40 I/O queues.
[  98.835409] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[ 107.873586] nvme nvme0: Reconnecting in 10 seconds...
[ 118.505937] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 118.513443] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 118.519875] nvme nvme0: Failed reconnect attempt 1
[ 118.525241] nvme nvme0: Reconnecting in 10 seconds...
[ 128.733311] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 128.740812] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 128.747247] nvme nvme0: Failed reconnect attempt 2
[ 128.752609] nvme nvme0: Reconnecting in 10 seconds...
[ 138.973404] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 138.980904] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 138.987329] nvme nvme0: Failed reconnect attempt 3
[ 138.992691] nvme nvme0: Reconnecting in 10 seconds...
[ 149.232610] nvme nvme0: creating 40 I/O queues.
[ 149.831443] nvme nvme0: Successfully reconnected
[ 149.831519] nvme nvme0: identifiers changed for nsid 1

[root@rdma-virt-01 linux ((dafb1b2...))]$ lsblk
NAME                             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                8:0    0 465.8G  0 disk
├─sda2                             8:2    0 464.8G  0 part
│ ├─rhelaa_rdma--virt--01-swap   253:1    0     4G  0 lvm  [SWAP]
│ ├─rhelaa_rdma--virt--01-home   253:2    0 410.8G  0 lvm  /home
│ └─rhelaa_rdma--virt--01-root   253:0    0    50G  0 lvm  /
└─sda1                             8:1    0     1G  0 part /boot

> --
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 405895b1dff2..916658e010ff 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -572,6 +572,11 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>  	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>  		return;
>
> +	if (nvme_rdma_queue_idx(queue) == 0)
> +		nvme_rdma_free_qe(queue->device->dev,
> +			&queue->ctrl->async_event_sqe,
> +			sizeof(struct nvme_command), DMA_TO_DEVICE);
> +
>  	nvme_rdma_destroy_queue_ib(queue);
>  	rdma_destroy_id(queue->cm_id);
>  }
> @@ -739,8 +744,6 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
>  static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
>  		bool remove)
>  {
> -	nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
> -		sizeof(struct nvme_command), DMA_TO_DEVICE);
>  	nvme_rdma_stop_queue(&ctrl->queues[0]);
>  	if (remove) {
>  		blk_cleanup_queue(ctrl->ctrl.admin_q);
> --
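The log sequence above (controller created, connect rejections while the target is unreachable, then a successful reconnect) corresponds to the nvmet target being torn down and set up again while the host keeps retrying. As a rough target-side sketch of how such a bounce can be driven, assuming nvmetcli with its clear/restore commands is installed and the configuration shown later in this thread is saved as /etc/rdma.json:

# Target-side sketch (assumptions: nvmetcli is available, config in /etc/rdma.json)
nvmetcli clear                    # tear the target down; the host starts logging
                                  # "Connect rejected ... Reconnecting in 10 seconds..."
sleep 30                          # keep it down across a few reconnect attempts
nvmetcli restore /etc/rdma.json   # bring it back; the host should then log
                                  # "Successfully reconnected"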
> Hi Sagi
> With this patch, no such error log shows up on the host side,

Awesome, that's the culprit...

> but I found there is no nvme0n1 device node even after getting
> "nvme nvme0: Successfully reconnected" on the host.

That is expected because you did not persist a namespace UUID,
which caused the kernel to generate a random one. That confused
the host, as it got the same namespace ID with a different UUID.

Can you please set a uuid when you rerun the test?
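For reference, one way to give the namespace a fixed UUID is directly through the nvmet configfs tree. The sketch below is an assumption based on the standard nvmet configfs layout (subsystem NQN "testnqn", nsid 1, device_uuid attribute); identifiers normally cannot be changed while the namespace is enabled:

# Target-side sketch, assuming the usual nvmet configfs layout
NS=/sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo 0 > "$NS/enable"                                          # disable before changing identifiers
echo 00000000-0000-0000-0000-000000000001 > "$NS/device_uuid"  # any fixed, persistent UUID
echo 1 > "$NS/enable"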
On 10/19/2017 05:44 PM, Sagi Grimberg wrote:
>
>> Hi Sagi
>> With this patch, no such error log shows up on the host side,
>
> Awesome, that's the culprit...
>
>> but I found there is no nvme0n1 device node even after getting
>> "nvme nvme0: Successfully reconnected" on the host.
>
> That is expected because you did not persist a namespace UUID,
> which caused the kernel to generate a random one. That confused
> the host, as it got the same namespace ID with a different UUID.
>
> Can you please set a uuid when you rerun the test?
>
I tried adding a uuid field to rdma.json, and it works well now.

[root@rdma-virt-00 ~]$ cat /etc/rdma.json
{
  "hosts": [
    {
      "nqn": "hostnqn"
    }
  ],
  "ports": [
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "172.31.0.90",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 2,
      "referrals": [],
      "subsystems": [
        "testnqn"
      ]
    }
  ],
  "subsystems": [
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "ef90689c-6c46-d44c-89c1-4067801309a8",
            "path": "/dev/nullb0",
            "uuid": "00000000-0000-0000-0000-000000000001"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "testnqn"
    }
  ]
}
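After editing the json, the target still has to be reloaded for the new identifier to take effect, and the value can then be checked on both ends. The commands below are a sketch; the nvmetcli restore step, the configfs path, and the host-side sysfs attribute are assumptions about this particular setup:

# Target side: reload the edited configuration and verify the persisted UUID
nvmetcli restore /etc/rdma.json
cat /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_uuid

# Host side: once nvme0n1 shows up again, the same value should be visible
cat /sys/block/nvme0n1/uuid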
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 405895b1dff2..916658e010ff 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -572,6 +572,11 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
 	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
 		return;
 
+	if (nvme_rdma_queue_idx(queue) == 0)
+		nvme_rdma_free_qe(queue->device->dev,
+			&queue->ctrl->async_event_sqe,
+			sizeof(struct nvme_command), DMA_TO_DEVICE);
+
 	nvme_rdma_destroy_queue_ib(queue);
 	rdma_destroy_id(queue->cm_id);