From patchwork Sat Mar 18 17:50:59 2017
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9632289
Subject: Re: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller
From: Sagi Grimberg
To: Yi Zhang
Cc: Max Gurtovoy, Leon Romanovsky, linux-rdma@vger.kernel.org, Christoph Hellwig, linux-nvme@lists.infradead.org
Date: Sat, 18 Mar 2017 19:50:59 +0200
Message-ID: <059299cc-7f45-e8eb-f1b1-7da2cf49cf5a@grimberg.me>
In-Reply-To: <1768681609.3995777.1489837916289.JavaMail.zimbra@redhat.com>
X-Mailing-List: linux-rdma@vger.kernel.org

> Hi Sagi
> With this patch, the OOM cannot be reproduced now.
>
> But there is another problem: the reset operation[1] failed at iteration 1007.
> [1]
> echo 1 >/sys/block/nvme0n1/device/reset_controller

We can relax this a bit by flushing scheduled work only when accepting
an admin queue connection, and also give the host more time to
establish a connection.

Does this help?
---
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 47a479f26e5d..e1db1736823f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -34,7 +34,7 @@
 #include "fabrics.h"
 
 
-#define NVME_RDMA_CONNECT_TIMEOUT_MS	1000		/* 1 second */
+#define NVME_RDMA_CONNECT_TIMEOUT_MS	5000		/* 5 seconds */
 
 #define NVME_RDMA_MAX_SEGMENT_SIZE	0xffffff	/* 24-bit SGL field */
 
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe862561..88bb5814c264 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1199,6 +1199,11 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 	}
 	queue->port = cm_id->context;
 
+	if (queue->host_qid == 0) {
+		/* Let inflight controller teardown complete */
+		flush_scheduled_work();
+	}
+
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 	if (ret)
 		goto release_queue;
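The target-side hunk makes only an admin-queue (qid 0) connect wait for any in-flight controller teardown, so resources released by the previous controller are freed before the new one starts allocating, while I/O queue connects proceed without waiting. A rough userspace model of that ordering, assuming pthreads, might look like the sketch below. It is not kernel code: `wq_pending`, `freed_queues`, `schedule_teardown()`, `flush_pending_work()` and `queue_connect()` are hypothetical names standing in for the system workqueue, the queued teardown work, `flush_scheduled_work()` and `nvmet_rdma_queue_connect()`.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical userspace model, not kernel code: a counter of queued
 * teardown work items stands in for work pending on the system workqueue.
 */
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq_idle = PTHREAD_COND_INITIALIZER;
static int wq_pending;   /* teardown items queued but not yet finished */
static int freed_queues; /* resources released by finished teardowns */

/* Simulated controller teardown: release resources, then mark itself done. */
static void *teardown_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wq_lock);
	assert(wq_pending > 0);
	freed_queues++;
	if (--wq_pending == 0)
		pthread_cond_broadcast(&wq_idle);
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

/* Queue one teardown; the pending count is raised before the worker starts. */
static int schedule_teardown(pthread_t *t)
{
	int err;

	pthread_mutex_lock(&wq_lock);
	wq_pending++;
	pthread_mutex_unlock(&wq_lock);

	err = pthread_create(t, NULL, teardown_work, NULL);
	if (err) {
		/* Undo the accounting so a later flush cannot block forever. */
		pthread_mutex_lock(&wq_lock);
		if (--wq_pending == 0)
			pthread_cond_broadcast(&wq_idle);
		pthread_mutex_unlock(&wq_lock);
	}
	return err;
}

/* Analog of flush_scheduled_work(): wait until no teardown is pending. */
static void flush_pending_work(void)
{
	pthread_mutex_lock(&wq_lock);
	while (wq_pending > 0)
		pthread_cond_wait(&wq_idle, &wq_lock);
	pthread_mutex_unlock(&wq_lock);
}

/*
 * Same policy as the target hunk: only an admin-queue (qid 0) connect
 * waits for in-flight teardown; I/O queue connects go straight through.
 * Returns how many queues' resources had been freed at accept time.
 */
static int queue_connect(int host_qid)
{
	int freed;

	if (host_qid == 0)
		flush_pending_work();

	/* ... accepting the connection would happen here ... */
	pthread_mutex_lock(&wq_lock);
	freed = freed_queues;
	pthread_mutex_unlock(&wq_lock);
	return freed;
}
```

For example, after four schedule_teardown() calls, queue_connect(0) returns 4 because it blocks until every teardown has run, whereas queue_connect(1) skips the flush and its result depends on thread scheduling. Limiting the wait to qid 0 means the cost is paid once per controller connect rather than once per I/O queue.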