From patchwork Mon Feb 27 20:33:16 2017
Subject: Re: Unexpected issues with 2 NVME initiators using the same target
From: Sagi Grimberg
To: "Gruher, Joseph R", "shahar.salzman", Laurence Oberman, "Riches Jr, Robert M"
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
Date: Mon, 27 Feb 2017 22:33:16 +0200
Message-ID: <3eb5814f-14cb-2b94-adf8-335d4b2eb7e9@grimberg.me>
References: <08131a05-1f56-ef61-990a-7fff04eea095@gmail.com> <1848296658.37025722.1487782361271.JavaMail.zimbra@redhat.com> <1554c1d1-6bf4-9ca2-12d4-a0125d8c5715@gmail.com>
List-ID: linux-rdma@vger.kernel.org

Hey Joseph,

> In our lab we are dealing with an issue which has some of the same symptoms.
> Wanted to add to the thread in case it is useful here. We have a target
> system with 16 Intel P3520 disks and a Mellanox CX4 50Gb NIC directly
> connected (no switch) to a single initiator system with a matching
> Mellanox CX4 50Gb NIC. We are running Ubuntu 16.10 with the 4.10-rc8
> mainline kernel. All drivers are the kernel default drivers. I've
> attached our nvmetcli json, our FIO workload, and dmesg from both
> systems.
>
> We are able to provoke this problem with a variety of workloads, but a
> high-bandwidth read operation seems to cause it most reliably; it is
> harder to reproduce with smaller block sizes. For some reason the
> problem seems to be triggered when we stop and restart IO - I can run
> the FIO workload on the initiator system for 1-2 hours without any new
> events in dmesg, pushing about 5500MB/sec the whole time, then kill it,
> wait 10 seconds, and restart it, and the errors and reconnect events
> happen reliably at that point. We are working to characterize this
> further this week and also to see if we can reproduce it on a smaller
> configuration. Happy to provide any additional details that would be
> useful or to try any fixes!
>
> On the initiator we see events like this:
>
> [51390.065641] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
> [51390.065644] 00000000 00000000 00000000 00000000
> [51390.065645] 00000000 00000000 00000000 00000000
> [51390.065646] 00000000 00000000 00000000 00000000
> [51390.065648] 00000000 08007806 250003ab 02b9dcd2
> [51390.065666] nvme nvme3: MEMREG for CQE 0xffff9fc845039410 failed with status memory management operation error (6)
> [51390.079156] nvme nvme3: reconnecting in 10 seconds
> [51400.432782] nvme nvme3: Successfully reconnected

Seems to me this is a CX4 FW issue. Mellanox can elaborate on the
vendor-specific syndromes in this output.

> On the target we see events like this:
>
> [51370.394694] mlx5_0:dump_cqe:262:(pid 6623): dump error cqe
> [51370.394696] 00000000 00000000 00000000 00000000
> [51370.394697] 00000000 00000000 00000000 00000000
> [51370.394699] 00000000 00000000 00000000 00000000
> [51370.394701] 00000000 00008813 080003ea 00c3b1d2

If the host is failing on memory registration while the target is
initiating RDMA access, it makes sense that the target will see errors
as well.

> Sometimes, but less frequently, we also see events on the target like
> this as part of the problem:
>
> [21322.678571] nvmet: ctrl 1 fatal error occurred!

Again, this also makes sense, because for nvmet this is a fatal error
and we need to tear down the controller.

You can try out this patch to see if it makes the memreg issues go away:

---
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a2638e339..0f9a12570262 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3893,7 +3893,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 
 		case IB_WR_LOCAL_INV:
-			next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+			next_fence = MLX5_FENCE_MODE_STRONG_ORDERING;
 			qp->sq.wr_data[idx] = IB_WR_LOCAL_INV;
 			ctrl->imm = cpu_to_be32(wr->ex.invalidate_rkey);
 			set_linv_wr(qp, &seg, &size);
@@ -3901,7 +3901,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			break;
 
 		case IB_WR_REG_MR:
-			next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+			next_fence = MLX5_FENCE_MODE_STRONG_ORDERING;
 			qp->sq.wr_data[idx] = IB_WR_REG_MR;
 			ctrl->imm = cpu_to_be32(reg_wr(wr)->key);
 			err = set_reg_wr(qp, reg_wr(wr), &seg, &size);
--

Note that this will have a big (negative) performance impact on small
read workloads.
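For anyone wondering where these fence modes actually bite, below is a
minimal illustrative sketch (not the real nvme-rdma code; the function
and parameter names are made up for the example) of how a ULP typically
chains an IB_WR_REG_MR in front of the data-carrying work request on the
same QP. The next_fence value patched above decides how the HCA orders
that registration against an earlier IB_WR_LOCAL_INV that recycled the
same rkey; forcing strong ordering makes every registration wait on
prior work requests, which is why small reads pay the price:

/*
 * Illustrative sketch only - not the actual nvme-rdma code.  A ULP
 * registers an MR (assumed to be already mapped with ib_map_mr_sg())
 * and chains the data-carrying work request behind the registration on
 * the same QP.  A real ULP would also set wr_cqe/wr_id and send_flags.
 */
#include <rdma/ib_verbs.h>

static int example_post_reg_and_send(struct ib_qp *qp, struct ib_mr *mr,
				     struct ib_send_wr *data_wr)
{
	struct ib_send_wr *bad_wr;
	struct ib_reg_wr reg_wr = {
		.wr = {
			.opcode	= IB_WR_REG_MR,
			.next	= data_wr,	/* the actual I/O follows the registration */
		},
		.mr	= mr,
		.key	= mr->rkey,
		.access	= IB_ACCESS_LOCAL_WRITE |
			  IB_ACCESS_REMOTE_READ |
			  IB_ACCESS_REMOTE_WRITE,
	};

	/*
	 * mlx5_ib_post_send() picks next_fence for IB_WR_REG_MR internally,
	 * which is exactly what the diff above changes: with
	 * MLX5_FENCE_MODE_STRONG_ORDERING the registration does not execute
	 * until prior work requests have completed - safe, but it stalls the
	 * send queue on every registration.
	 */
	return ib_post_send(qp, &reg_wr.wr, &bad_wr);
}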