From patchwork Thu Mar 16 16:51:16 2017
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9629015
Subject: Re: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller
From: Sagi Grimberg <sagi@grimberg.me>
To: Yi Zhang, Max Gurtovoy, Leon Romanovsky
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig
Date: Thu, 16 Mar 2017 18:51:16 +0200
Message-ID: <31678a43-f76c-a921-e40c-470b0de1a86c@grimberg.me>
In-Reply-To: <7496c68a-15f3-d8cb-b17f-20f5a59a24d2@redhat.com>
List-ID: <linux-rdma.vger.kernel.org>

>>>>> Sagi,
>>>>> The release function is placed on the global workqueue. I'm not familiar
>>>>> with the NVMe design and I don't know all the details, but maybe the
>>>>> proper way would be to create a special workqueue with the MEM_RECLAIM
>>>>> flag to ensure forward progress?

Leon,

The release work does make progress, but it is inherently slower than the
establishment work, and when we are bombarded with establishments we have
no backpressure...

> I tried with 4.11.0-rc2, and can still reproduce it in fewer than 2000
> iterations.

Yi,

Can you try the below (untested) patch? I'm not at all convinced this is
the way to go, because it will slow down all connect requests, but I'm
curious to know whether it makes the issue go away.

Any other good ideas are welcome...

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe862561..f15fa6e6b640 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1199,6 +1199,9 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 	}
 	queue->port = cm_id->context;
 
+	/* Let inflight queue teardown complete */
+	flush_scheduled_work();
+
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 	if (ret)
 		goto release_queue;
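For concreteness, Leon's suggestion upthread (a dedicated workqueue created
with WQ_MEM_RECLAIM instead of the system workqueue) could look roughly like
the untested sketch below. The names nvmet_rdma_release_wq and
queue->release_work are illustrative assumptions, not taken from the patch;
note this addresses forward progress under memory pressure, not the
backpressure problem described above.

```c
#include <linux/workqueue.h>

/* Hypothetical: a dedicated workqueue for queue teardown. WQ_MEM_RECLAIM
 * guarantees a rescuer thread, so release work can make forward progress
 * even when worker threads cannot be allocated under memory pressure. */
static struct workqueue_struct *nvmet_rdma_release_wq;

static int __init nvmet_rdma_init(void)
{
	nvmet_rdma_release_wq = alloc_workqueue("nvmet-rdma-release",
						WQ_MEM_RECLAIM, 0);
	if (!nvmet_rdma_release_wq)
		return -ENOMEM;
	return 0;
}

/* At the teardown site, queue the release work on the dedicated
 * workqueue instead of calling schedule_work(): */
queue_work(nvmet_rdma_release_wq, &queue->release_work);
```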