From patchwork Thu Mar 9 11:23:08 2017
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9613107
Subject: Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
To: Yi Zhang, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org
Cc: hch@lst.de
From: Sagi Grimberg
Date: Thu, 9 Mar 2017 13:23:08 +0200
Message-ID: <6ffda302-02f9-12f0-a112-ea7cd20b9ffa@grimberg.me>
In-Reply-To: <88ae146a-7510-9be0-c9b4-58e70f9d73b9@redhat.com>
References: <1053522223.31446389.1488721184925.JavaMail.zimbra@redhat.com>
 <644fc4ab-df6b-a337-1431-bad881ef56ee@grimberg.me>
 <88ae146a-7510-9be0-c9b4-58e70f9d73b9@redhat.com>

> Hi Sagi,
>
> With this patch, the NULL pointer is fixed now.
> But from the log below, we can see it keeps reconnecting every 10
> seconds and cannot be stopped.
>
> [36288.963890] Broke affinity for irq 16
> [36288.983090] Broke affinity for irq 28
> [36289.003104] Broke affinity for irq 90
> [36289.020488] Broke affinity for irq 93
> [36289.036911] Broke affinity for irq 97
> [36289.053344] Broke affinity for irq 100
> [36289.070166] Broke affinity for irq 104
> [36289.088076] smpboot: CPU 1 is now offline
> [36302.371160] nvme nvme0: reconnecting in 10 seconds
> [36312.953684] blk_mq_reinit_tagset: tag is null, continue
> [36312.983267] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36313.017290] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36313.044937] nvme nvme0: Failed reconnect attempt, requeueing...
> [36323.171983] blk_mq_reinit_tagset: tag is null, continue
> [36323.200733] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36323.233820] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36323.261027] nvme nvme0: Failed reconnect attempt, requeueing...
> [36333.412341] blk_mq_reinit_tagset: tag is null, continue
> [36333.441346] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36333.476139] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36333.502794] nvme nvme0: Failed reconnect attempt, requeueing...
> [36343.652755] blk_mq_reinit_tagset: tag is null, continue
> [36343.682103] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36343.716645] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36343.743581] nvme nvme0: Failed reconnect attempt, requeueing...
> [36353.893103] blk_mq_reinit_tagset: tag is null, continue
> [36353.921041] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36353.953541] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36353.983528] nvme nvme0: Failed reconnect attempt, requeueing...
> [36364.133544] blk_mq_reinit_tagset: tag is null, continue
> [36364.162012] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36364.195002] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36364.221671] nvme nvme0: Failed reconnect attempt, requeueing...

Yep... looks like we don't take into account that we can't use all the
queues now...
Does this patch help:

---
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 29ac8fcb8d2c..25af3f75f6f1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -337,8 +337,6 @@ static int __nvme_rdma_init_request(struct nvme_rdma_ctrl *ctrl,
 	struct ib_device *ibdev = dev->dev;
 	int ret;
 
-	BUG_ON(queue_idx >= ctrl->queue_count);
-
 	ret = nvme_rdma_alloc_qe(ibdev, &req->sqe, sizeof(struct nvme_command),
 			DMA_TO_DEVICE);
 	if (ret)
@@ -647,8 +645,22 @@ static int nvme_rdma_connect_io_queues(struct nvme_rdma_ctrl *ctrl)
 
 static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
+	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
+	unsigned int nr_io_queues;
 	int i, ret;
 
+	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
+	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
+	if (ret)
+		return ret;
+
+	ctrl->queue_count = nr_io_queues + 1;
+	if (ctrl->queue_count < 2)
+		return 0;
+
+	dev_info(ctrl->ctrl.device,
+		"creating %d I/O queues.\n", nr_io_queues);
+
 	for (i = 1; i < ctrl->queue_count; i++) {
 		ret = nvme_rdma_init_queue(ctrl, i,
 				ctrl->ctrl.opts->queue_size);
@@ -1793,20 +1805,8 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 
 static int nvme_rdma_create_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
-	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 	int ret;
 
-	ret = nvme_set_queue_count(&ctrl->ctrl, &opts->nr_io_queues);
-	if (ret)
-		return ret;
-
-	ctrl->queue_count = opts->nr_io_queues + 1;
-	if (ctrl->queue_count < 2)
-		return 0;
-
-	dev_info(ctrl->ctrl.device,
-		"creating %d I/O queues.\n", opts->nr_io_queues);
-
--
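
For illustration only, here is a minimal user-space sketch (not kernel code) of the
queue accounting the patch above moves into nvme_rdma_init_io_queues(): clamp the
requested I/O queue count to the CPUs that are currently online, then reserve one
extra slot for the admin queue. The requested value and the helper name are made up
for the example; in the driver the clamp uses num_online_cpus() and
nvme_set_queue_count().

/*
 * Sketch: clamp requested I/O queues to online CPUs, plus one admin queue.
 * clamp_io_queues() and requested_io_queues are illustrative names only.
 */
#include <stdio.h>
#include <unistd.h>

static unsigned int clamp_io_queues(unsigned int requested)
{
	/* stand-in for num_online_cpus(); CPUs may have been hot-unplugged */
	long online = sysconf(_SC_NPROCESSORS_ONLN);

	if (online < 1)
		online = 1;
	return requested < (unsigned int)online ?
		requested : (unsigned int)online;
}

int main(void)
{
	unsigned int requested_io_queues = 8;	/* e.g. opts->nr_io_queues */
	unsigned int nr_io_queues = clamp_io_queues(requested_io_queues);
	unsigned int queue_count = nr_io_queues + 1;	/* +1 for the admin queue */

	printf("creating %u I/O queues (%u total with admin queue)\n",
	       nr_io_queues, queue_count);
	return 0;
}

The point of the patch is that this accounting now runs in the (re)connect path
rather than only once in nvme_rdma_create_io_queues(), so after a CPU goes offline
the reconnect no longer tries to set up queues the controller can no longer use.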