From patchwork Fri Sep 2 16:01:54 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 9311411 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1C954607D2 for ; Fri, 2 Sep 2016 16:34:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0E8E6297AC for ; Fri, 2 Sep 2016 16:34:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 039662980C; Fri, 2 Sep 2016 16:34:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 76A2F297AC for ; Fri, 2 Sep 2016 16:34:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932750AbcIBQed (ORCPT ); Fri, 2 Sep 2016 12:34:33 -0400 Received: from smtp.opengridcomputing.com ([72.48.136.20]:58865 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932554AbcIBQed (ORCPT ); Fri, 2 Sep 2016 12:34:33 -0400 Received: from smtp.ogc.us (build2.ogc.int [10.10.0.32]) by smtp.opengridcomputing.com (Postfix) with ESMTP id 9BB6D2A033; Fri, 2 Sep 2016 11:34:32 -0500 (CDT) Received: by smtp.ogc.us (Postfix, from userid 503) id 9258EE08C5; Fri, 2 Sep 2016 11:34:32 -0500 (CDT) Message-Id: <3fee4ad93b127f1c0391e107e98034fecbc31725.1472833420.git.swise@opengridcomputing.com> In-Reply-To: References: From: Steve Wise Date: Fri, 2 Sep 2016 09:01:54 -0700 Subject: [PATCH v5 5/6] nvme-rdma: add DELETING queue flag To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org Cc: dledford@redhat.com, linux-rdma@vger.kernel.org Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Sagi Grimberg When we get a surprise disconnect from the target we queue a periodic reconnect (which is the sane thing to do...). We only move the queues out of CONNECTED when we retry to reconnect (after 10 seconds in the default case) but we stop the blk queues immediately so we are not bothered with traffic from now on. If delete() is kicking off in this period the queues are still in CONNECTED state. Part of the delete sequence is trying to issue ctrl shutdown if the admin queue is CONNECTED (which it is!). This request is issued but stuck in blk-mq waiting for the queues to start again. This might be the one preventing us from forward progress... The patch separates the queue flags to CONNECTED and DELETING. Now we will move out of CONNECTED as soon as error recovery kicks in (before stopping the queues) and DELETING is on when we start the queue deletion. Signed-off-by: Sagi Grimberg --- drivers/nvme/host/rdma.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index a9d43f0..9ea69cc 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -83,6 +83,7 @@ struct nvme_rdma_request { enum nvme_rdma_queue_flags { NVME_RDMA_Q_CONNECTED = (1 << 0), NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1), + NVME_RDMA_Q_DELETING = (1 << 2), }; struct nvme_rdma_queue { @@ -616,7 +617,7 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue) static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue) { - if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags)) + if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) return; nvme_rdma_stop_queue(queue); nvme_rdma_free_queue(queue); @@ -769,8 +770,13 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work) { struct nvme_rdma_ctrl *ctrl = container_of(work, struct nvme_rdma_ctrl, err_work); + int i; nvme_stop_keep_alive(&ctrl->ctrl); + + for (i = 0; i < ctrl->queue_count; i++) + clear_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[i].flags); + if (ctrl->queue_count > 1) nvme_stop_queues(&ctrl->ctrl); blk_mq_stop_hw_queues(ctrl->ctrl.admin_q); @@ -1350,7 +1356,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue) cancel_delayed_work_sync(&ctrl->reconnect_work); /* Disable the queue so ctrl delete won't free it */ - if (test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags)) { + if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) { /* Free this queue ourselves */ nvme_rdma_stop_queue(queue); nvme_rdma_destroy_queue_ib(queue);