From patchwork Mon Feb 27 20:33:16 2017
Subject: Re: Unexpected issues with 2 NVME initiators using the same target
From: Sagi Grimberg
To: "Gruher, Joseph R", "shahar.salzman", Laurence Oberman, "Riches Jr, Robert M"
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
Date: Mon, 27 Feb 2017 22:33:16 +0200
Message-ID: <3eb5814f-14cb-2b94-adf8-335d4b2eb7e9@grimberg.me>
References: <08131a05-1f56-ef61-990a-7fff04eea095@gmail.com> <1848296658.37025722.1487782361271.JavaMail.zimbra@redhat.com> <1554c1d1-6bf4-9ca2-12d4-a0125d8c5715@gmail.com>
List-ID: linux-rdma@vger.kernel.org

Hey Joseph,

> In our lab we are dealing with an issue which has some of the same symptoms.
> Wanted to add to the thread in case it is useful here. We have a target
> system with 16 Intel P3520 disks and a Mellanox CX4 50Gb NIC directly
> connected (no switch) to a single initiator system with a matching
> Mellanox CX4 50Gb NIC. We are running Ubuntu 16.10 with the 4.10-rc8
> mainline kernel. All drivers are the kernel default drivers. I've
> attached our nvmetcli json, our FIO workload, and dmesg from both
> systems.
>
> We are able to provoke this problem with a variety of workloads, but a
> high-bandwidth read operation seems to cause it most reliably; it is
> harder to reproduce with smaller block sizes. For some reason the
> problem seems to be triggered when we stop and restart IO - I can run
> the FIO workload on the initiator system for 1-2 hours without any new
> events in dmesg, pushing about 5500MB/sec the whole time, then kill it,
> wait 10 seconds, and restart it, and the errors and reconnect events
> happen reliably at that point. We are working to characterize this
> further this week and also to see if we can reproduce it on a smaller
> configuration. Happy to provide any additional details that would be
> useful or to try any fixes!
>
> On the initiator we see events like this:
>
> [51390.065641] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
> [51390.065644] 00000000 00000000 00000000 00000000
> [51390.065645] 00000000 00000000 00000000 00000000
> [51390.065646] 00000000 00000000 00000000 00000000
> [51390.065648] 00000000 08007806 250003ab 02b9dcd2
> [51390.065666] nvme nvme3: MEMREG for CQE 0xffff9fc845039410 failed with status memory management operation error (6)
> [51390.079156] nvme nvme3: reconnecting in 10 seconds
> [51400.432782] nvme nvme3: Successfully reconnected

Seems to me this is a CX4 FW issue. Mellanox can elaborate on the
vendor-specific syndromes in this output.

> On the target we see events like this:
>
> [51370.394694] mlx5_0:dump_cqe:262:(pid 6623): dump error cqe
> [51370.394696] 00000000 00000000 00000000 00000000
> [51370.394697] 00000000 00000000 00000000 00000000
> [51370.394699] 00000000 00000000 00000000 00000000
> [51370.394701] 00000000 00008813 080003ea 00c3b1d2

If the host is failing on memory registration while the target is
initiating RDMA access, it makes sense that the target will see errors
as well.

> Sometimes, but less frequently, we also see events on the target like
> this as part of the problem:
>
> [21322.678571] nvmet: ctrl 1 fatal error occurred!

Again, this also makes sense, because for nvmet this is a fatal error
and we need to tear down the controller.

You can try out this patch to see if it makes the memreg issues go away:

---
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a2638e339..0f9a12570262 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3893,7 +3893,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 
 		case IB_WR_LOCAL_INV:
-			next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+			next_fence = MLX5_FENCE_MODE_STRONG_ORDERING;
 			qp->sq.wr_data[idx] = IB_WR_LOCAL_INV;
 			ctrl->imm = cpu_to_be32(wr->ex.invalidate_rkey);
 			set_linv_wr(qp, &seg, &size);
@@ -3901,7 +3901,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			break;
 
 		case IB_WR_REG_MR:
-			next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+			next_fence = MLX5_FENCE_MODE_STRONG_ORDERING;
 			qp->sq.wr_data[idx] = IB_WR_REG_MR;
 			ctrl->imm = cpu_to_be32(reg_wr(wr)->key);
 			err = set_reg_wr(qp, reg_wr(wr), &seg, &size);
--

Note that this will have a big (negative) performance impact on small
read workloads.
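For anyone wondering where these fence modes actually bite, below is a
minimal illustrative sketch (not the real nvme-rdma code; the function
and parameter names are made up for the example) of how a ULP typically
chains an IB_WR_REG_MR in front of the data-carrying work request on the
same QP. The next_fence value patched above decides how the HCA orders
that registration against an earlier IB_WR_LOCAL_INV that recycled the
same rkey; forcing strong ordering makes every registration wait on
prior work requests, which is why small reads pay the price:

/*
 * Illustrative sketch only - not the actual nvme-rdma code.  A ULP
 * registers an MR (assumed to be already mapped with ib_map_mr_sg())
 * and chains the data-carrying work request behind the registration on
 * the same QP.  A real ULP would also set wr_cqe/wr_id and send_flags.
 */
#include <rdma/ib_verbs.h>

static int example_post_reg_and_send(struct ib_qp *qp, struct ib_mr *mr,
				     struct ib_send_wr *data_wr)
{
	struct ib_send_wr *bad_wr;
	struct ib_reg_wr reg_wr = {
		.wr = {
			.opcode	= IB_WR_REG_MR,
			.next	= data_wr,	/* the actual I/O follows the registration */
		},
		.mr	= mr,
		.key	= mr->rkey,
		.access	= IB_ACCESS_LOCAL_WRITE |
			  IB_ACCESS_REMOTE_READ |
			  IB_ACCESS_REMOTE_WRITE,
	};

	/*
	 * mlx5_ib_post_send() picks next_fence for IB_WR_REG_MR internally,
	 * which is exactly what the diff above changes: with
	 * MLX5_FENCE_MODE_STRONG_ORDERING the registration does not execute
	 * until prior work requests have completed - safe, but it stalls the
	 * send queue on every registration.
	 */
	return ib_post_send(qp, &reg_wr.wr, &bad_wr);
}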