From patchwork Mon Oct 2 12:51:43 2017
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9980801
Subject: Re: nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?)
From: Sagi Grimberg
To: Yi Zhang, Christoph Hellwig
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org,
    Leon Romanovsky, Tariq Toukan
Date: Mon, 2 Oct 2017 14:51:43 +0200
Message-ID: <729c512f-55ff-25e4-6dd9-8b4dcc31bb8d@grimberg.me>
List-ID: linux-rdma@vger.kernel.org

Hi,

>>> Panic after connection with below commits, detailed log here:
>>> https://pastebin.com/7z0XSGSd
>>> 31fdf18  nvme-rdma: reuse configure/destroy_admin_queue
>>> 3f02fff  nvme-rdma: don't free tagset on resets
>>> 18398af  nvme-rdma: disable the controller on resets
>>> b28a308  nvme-rdma: move tagset allocation to a dedicated routine
>>>
>>> good     34b6c23  nvme: Add admin_tagset pointer to nvme_ctrl
>>
>> Is that a reproducible panic? I'm not seeing this at all.
>>
>
> Yes, I can reproduce it every time. The target side was running kernel
> 4.14.0-rc1 when the panic occurred.
>
>> Can you run gdb on nvme-rdma.ko?
>> $ l *(nvme_rdma_create_ctrl+0x37d)
>>
> [root@rdma-virt-01 linux ((31fdf18...))]$ gdb /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko...done.
>
> (gdb) l *(nvme_rdma_create_ctrl+0x37d)
> 0x297d is in nvme_rdma_create_ctrl (drivers/nvme/host/rdma.c:656).
> 651          struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
> 652          struct blk_mq_tag_set *set = admin ?
> 653                  &ctrl->admin_tag_set : &ctrl->tag_set;
> 654
> 655          blk_mq_free_tag_set(set);
> 656          nvme_rdma_dev_put(ctrl->device);
> 657      }
> 658
> 659      static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
> 660              bool admin)
> (gdb)

Let's take this one step at a time, starting with this issue.

First, there is a reason why a simple create_ctrl fails; can we isolate
exactly which call fails? Was something else going on that might have made
the simple create_ctrl fail?

We don't see any "rdma_resolve_addr failed" or "failed to connect queue"
messages, but we do see "creating I/O queues", which means that we either
failed at IO tagset allocation or at initializing connect_q.
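For reference, nvme_rdma_alloc_tagset() and blk_mq_init_queue() report
failure by returning an errno value encoded in the pointer itself rather
than through a separate return code; the caller is expected to test the
result with IS_ERR() and decode it with PTR_ERR(). Below is a minimal
userspace sketch of that convention only; the macro definitions and the
alloc_tagset() helper are simplified stand-ins, not the kernel or nvme-rdma
code.

--
/*
 * Minimal stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers,
 * plus a toy allocator -- a sketch of the convention, not real driver code.
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_ERRNO       4095
#define ERR_PTR(err)    ((void *)(long)(err))
#define PTR_ERR(ptr)    ((long)(ptr))
#define IS_ERR(ptr)     ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical allocator that fails the way a tag set allocation can. */
static void *alloc_tagset(int fail)
{
        if (fail)
                return ERR_PTR(-ENOMEM);        /* errno encoded in the pointer */
        return malloc(64);                      /* stand-in for a real tag set */
}

int main(void)
{
        void *tagset = alloc_tagset(1);

        if (IS_ERR(tagset)) {
                /* The caller must decode the errno itself. */
                fprintf(stderr, "tagset allocation failed: %ld\n", PTR_ERR(tagset));
                return 1;
        }
        free(tagset);
        return 0;
}
--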
We have a missing error code assignment, so can you try the following patch:

--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..98dd51e630bd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -765,8 +765,10 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
         if (new) {
                 ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl,
                                 true);
-                if (IS_ERR(ctrl->ctrl.admin_tagset))
+                if (IS_ERR(ctrl->ctrl.admin_tagset)) {
+                        error = PTR_ERR(ctrl->ctrl.admin_tagset);
                         goto out_free_queue;
+                }
 
                 ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
                 if (IS_ERR(ctrl->ctrl.admin_q)) {
@@ -846,8 +848,10 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
         if (new) {
                 ctrl->ctrl.tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl,
                                 false);
-                if (IS_ERR(ctrl->ctrl.tagset))
+                if (IS_ERR(ctrl->ctrl.tagset)) {
+                        ret = PTR_ERR(ctrl->ctrl.tagset);
                         goto out_free_io_queues;
+                }
 
                 ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
                 if (IS_ERR(ctrl->ctrl.connect_q)) {
--

Also, can you add the following debug messages to find out what failed?

--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..e46475100eea 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -676,6 +676,12 @@ static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl, bool admin)
         struct blk_mq_tag_set *set = admin ?
                         &ctrl->admin_tag_set : &ctrl->tag_set;
 
+        if (set == &ctrl->tag_set) {
+                pr_err("%s: freeing IO tagset\n", __func__);
+        } else {
+                pr_err("%s: freeing ADMIN tagset\n", __func__);
+        }
+
         blk_mq_free_tag_set(set);
         nvme_rdma_dev_put(ctrl->device);
 }
--
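To illustrate why the missing assignment matters, here is a reduced
standalone sketch (hypothetical helpers, not the actual driver code): if the
goto is taken without storing PTR_ERR() first, the function returns whatever
the error variable already holds, quite possibly 0, so the caller cannot
tell that the tagset allocation failed.

--
#include <errno.h>
#include <stdio.h>

#define MAX_ERRNO       4095
#define ERR_PTR(err)    ((void *)(long)(err))
#define PTR_ERR(ptr)    ((long)(ptr))
#define IS_ERR(ptr)     ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical allocation that always fails, like a failing tag set alloc. */
static void *alloc_tagset(void)
{
        return ERR_PTR(-ENOMEM);
}

static int configure_io_queues(void)
{
        int ret = 0;                    /* stays 0 unless someone assigns it */
        void *tagset;

        tagset = alloc_tagset();
        if (IS_ERR(tagset)) {
                ret = PTR_ERR(tagset);  /* the assignment the patch adds */
                goto out_free_io_queues;
        }

        /* ... initialize connect_q, start the queues ... */

out_free_io_queues:
        /* ... unwind ... */
        return ret;     /* without the assignment above, this would return 0 */
}

int main(void)
{
        printf("configure_io_queues() = %d\n", configure_io_queues());
        return 0;
}
--

Compiled and run, this sketch reports -12 (-ENOMEM) only because of the
PTR_ERR() assignment; drop that line and it reports success despite the
failed allocation, which is the kind of silent failure the patch above is
meant to rule out.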