From patchwork Thu Apr 20 09:43:34 2017
X-Patchwork-Submitter: Marta Rybczynska
X-Patchwork-Id: 9689949
Date: Thu, 20 Apr 2017 11:43:34 +0200 (CEST)
From: Marta Rybczynska
To: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, leonro@mellanox.com,
	axboe@fb.com, maxg@mellanox.com, jgunthorpe@obsidianresearch.com, hch@lst.de,
	Keith Busch, Samuel Jones, dledford@redhat.com, Bart Van Assche
Message-ID: <1677617891.385461828.1492681414720.JavaMail.zimbra@kalray.eu>
In-Reply-To: <1491923426.2654.1.camel@sandisk.com>
References: <1519881025.363156294.1491837154312.JavaMail.zimbra@kalray.eu>
	<1491838338.4199.5.camel@sandisk.com>
	<1807354347.364485979.1491900728245.JavaMail.zimbra@kalray.eu>
	<1491923426.2654.1.camel@sandisk.com>
Subject: [PATCH v3] nvme-rdma: support devices with queue size < 32
MIME-Version: 1.0

In the case of a small NVMe-oF queue size (<32) we may enter a deadlock:
the IB send completions are only signalled once every 32 requests, so with
a queue smaller than that no completion is ever generated and the send
queue fills up. The error is seen as (using mlx5):

[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch makes the signalling depend on the queue depth instead, and the
hardcoded magic value of 32 is removed completely.

Signed-off-by: Marta Rybczynska
Signed-off-by: Samuel Jones
Reviewed-by: Sagi Grimberg
---
Changes in v3:
* avoid division in the fast path
* reverse sig_count logic to simplify the code: it now counts down from
  queue depth/2 to 0
* change sig_count to int to avoid overflows for big queues

Changes in v2:
* signal by queue size/2, remove hardcoded 32
* support queue depth of 1

 drivers/nvme/host/rdma.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3d25add..ee6f747 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,7 +88,7 @@ enum nvme_rdma_queue_flags {
 
 struct nvme_rdma_queue {
 	struct nvme_rdma_qe	*rsp_ring;
-	u8			sig_count;
+	int			sig_count;
 	int			queue_size;
 	size_t			cmnd_capsule_len;
 	struct nvme_rdma_ctrl	*ctrl;
@@ -251,6 +251,15 @@ static int nvme_rdma_wait_for_cm(struct nvme_rdma_queue *queue)
 	return queue->cm_error;
 }
 
+static inline int nvme_rdma_init_sig_count(int queue_size)
+{
+	/* We signal completion every queue depth/2 and also
+	 * handle the case of possible device with queue_depth=1,
+	 * where we would need to signal every message.
+	 */
+	return max(queue_size / 2, 1);
+}
+
 static int nvme_rdma_create_qp(struct nvme_rdma_queue *queue, const int factor)
 {
 	struct nvme_rdma_device *dev = queue->device;
@@ -556,6 +565,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 
 	queue->queue_size = queue_size;
 
+	queue->sig_count = nvme_rdma_init_sig_count(queue_size);
+
 	queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
 			RDMA_PS_TCP, IB_QPT_RC);
 	if (IS_ERR(queue->cm_id)) {
@@ -1011,6 +1022,16 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 		nvme_rdma_wr_error(cq, wc, "SEND");
 }
 
+static inline int nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
+{
+	queue->sig_count--;
+	if (queue->sig_count == 0) {
+		queue->sig_count = nvme_rdma_init_sig_count(queue->queue_size);
+		return 1;
+	}
+	return 0;
+}
+
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
 		struct ib_send_wr *first, bool flush)
@@ -1038,9 +1059,6 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 	 * Would have been way to obvious to handle this in hardware or
 	 * at least the RDMA stack..
 	 *
-	 * This messy and racy code sniplet is copy and pasted from the iSER
-	 * initiator, and the magic '32' comes from there as well.
-	 *
 	 * Always signal the flushes. The magic request used for the flush
 	 * sequencer is not allocated in our driver's tagset and it's
	 * triggered to be freed by blk_cleanup_queue().  So we need to
@@ -1048,7 +1066,7 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 	 * embeded in request's payload, is not freed when __ib_process_cq()
 	 * calls wr_cqe->done().
 	 */
-	if ((++queue->sig_count % 32) == 0 || flush)
+	if (nvme_rdma_queue_sig_limit(queue) || flush)
 		wr.send_flags |= IB_SEND_SIGNALED;
 
 	if (first)
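
For reviewers who want to see the countdown scheme in isolation, here is a
minimal user-space sketch (not part of the patch; the names sim_queue,
init_sig_count and sig_limit_hit are illustrative only). It mirrors the
logic above: the counter starts at max(queue_size / 2, 1), is decremented
on every post, and a send is signalled each time the counter reaches zero,
so a queue of depth 1 signals every message.

#include <stdio.h>

/* Illustrative stand-in for struct nvme_rdma_queue. */
struct sim_queue {
	int queue_size;
	int sig_count;
};

/* Same policy as nvme_rdma_init_sig_count(): queue_size/2, at least 1. */
static int init_sig_count(int queue_size)
{
	int half = queue_size / 2;

	return half >= 1 ? half : 1;
}

/* Same countdown as nvme_rdma_queue_sig_limit(). */
static int sig_limit_hit(struct sim_queue *q)
{
	q->sig_count--;
	if (q->sig_count == 0) {
		q->sig_count = init_sig_count(q->queue_size);
		return 1;	/* this send would get IB_SEND_SIGNALED */
	}
	return 0;
}

int main(void)
{
	struct sim_queue q = { .queue_size = 4 };
	int i;

	q.sig_count = init_sig_count(q.queue_size);
	/* With queue_size = 4, every 2nd post is signalled: 2, 4, 6, 8. */
	for (i = 1; i <= 8; i++)
		printf("post %d: signalled=%d\n", i, sig_limit_hit(&q));
	return 0;
}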