From patchwork Thu Mar 16 16:51:16 2017
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9629015
Subject: Re: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller
From: Sagi Grimberg <sagi@grimberg.me>
To: Yi Zhang, Max Gurtovoy, Leon Romanovsky
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig
Date: Thu, 16 Mar 2017 18:51:16 +0200
Message-ID: <31678a43-f76c-a921-e40c-470b0de1a86c@grimberg.me>
In-Reply-To: <7496c68a-15f3-d8cb-b17f-20f5a59a24d2@redhat.com>
List-ID: <linux-rdma.vger.kernel.org>

>>>>> Sagi,
>>>>> The release function is placed on the global workqueue. I'm not familiar
>>>>> with the NVMe design and I don't know all the details, but maybe the
>>>>> proper way would be to create a special workqueue with the MEM_RECLAIM
>>>>> flag to ensure forward progress?

Leon,

The release work does make progress, but it is inherently slower than the
establishment work, and when we are bombarded with establishments we have
no backpressure...

> I tried with 4.11.0-rc2, and can still reproduce it in fewer than 2000
> iterations.

Yi,

Can you try the below (untested) patch? I'm not at all convinced this is
the way to go, because it will slow down all connect requests, but I'm
curious to know whether it makes the issue go away.

Any other good ideas are welcome...

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe862561..f15fa6e6b640 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1199,6 +1199,9 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 	}
 	queue->port = cm_id->context;
 
+	/* Let inflight queue teardown complete */
+	flush_scheduled_work();
+
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 	if (ret)
 		goto release_queue;
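For concreteness, Leon's suggestion upthread (a dedicated workqueue created
with WQ_MEM_RECLAIM instead of the system workqueue) could look roughly like
the untested sketch below. The names nvmet_rdma_release_wq and
queue->release_work are illustrative assumptions, not taken from the patch;
note this addresses forward progress under memory pressure, not the
backpressure problem described above.

```c
#include <linux/workqueue.h>

/* Hypothetical: a dedicated workqueue for queue teardown. WQ_MEM_RECLAIM
 * guarantees a rescuer thread, so release work can make forward progress
 * even when worker threads cannot be allocated under memory pressure. */
static struct workqueue_struct *nvmet_rdma_release_wq;

static int __init nvmet_rdma_init(void)
{
	nvmet_rdma_release_wq = alloc_workqueue("nvmet-rdma-release",
						WQ_MEM_RECLAIM, 0);
	if (!nvmet_rdma_release_wq)
		return -ENOMEM;
	return 0;
}

/* At the teardown site, queue the release work on the dedicated
 * workqueue instead of calling schedule_work(): */
queue_work(nvmet_rdma_release_wq, &queue->release_work);
```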