From patchwork Thu Oct 20 14:52:24 2016
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 9386977
Date: Thu, 20 Oct 2016 10:52:24 -0400
From: Keith Busch
To: Bart Van Assche
Cc: Jens Axboe, Christoph Hellwig, James Bottomley, "Martin K. Petersen",
 Mike Snitzer, Doug Ledford, Ming Lin, Laurence Oberman,
 linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
 linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
Subject: Re: [PATCH v3 0/11] Fix race conditions related to stopping block layer queues
Message-ID: <20161020145224.GA2771@localhost.localdomain>
References: <20161019222454.GA1215@localhost.localdomain>
 <25418d7a-7e66-3b99-7532-669f7ebd58a6@sandisk.com>
In-Reply-To: <25418d7a-7e66-3b99-7532-669f7ebd58a6@sandisk.com>
User-Agent: Mutt/1.7.0 (2016-08-17)
List-ID: linux-rdma@vger.kernel.org

On Wed, Oct 19, 2016 at 04:51:18PM -0700, Bart Van Assche wrote:
> I assume that line 498 in blk-mq.c corresponds to BUG_ON(blk_queued_rq(rq))?
> Anyway, it seems to me like this is a bug in the NVMe code and also that
> this bug is completely unrelated to my patch series. In nvme_complete_rq() I
> see that blk_mq_requeue_request() is called. I don't think this is allowed
> from the context of nvme_cancel_request() because blk_mq_requeue_request()
> assumes that a request has already been removed from the request list.
> However, neither blk_mq_tagset_busy_iter() nor nvme_cancel_request() remove
> a request from the request list before nvme_complete_rq() is called. I think
> this is what triggers the BUG_ON() statement in blk_mq_requeue_request().
> Have you noticed that e.g. the scsi-mq code only calls
> blk_mq_requeue_request() after __blk_mq_end_request() has finished? Have you
> considered to follow the same approach in nvme_cancel_request()?

Both nvme and scsi requeue through their mq_ops 'complete' callback, so
nvme is similarly waiting for __blk_mq_end_request() before requesting a
requeue. The problem, I think, is that nvme's IO cancelling path is
observing active requests that it's requeuing from the queue_rq path.
Patch [11/11] kicks the requeue list unconditionally. This restarts queues
the driver had just quiesced a moment before, restarting those requests
before the driver is ready to handle them. When the driver ultimately
unbinds from the device, it requeues those requests a second time.

Either the requeueing can't kick the requeue work when quiesced, or the
shutdown needs to quiesce even when it hasn't restarted the queues. Either
patch below appears to fix the issue.

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ccd9cc5..078530c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,7 +201,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 
 void nvme_requeue_req(struct request *req)
 {
-	blk_mq_requeue_request(req, true);
+	blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);
 
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4b30fa2..a05da98 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1681,10 +1681,9 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	del_timer_sync(&dev->watchdog_timer);
 
 	mutex_lock(&dev->shutdown_lock);
-	if (pci_is_enabled(to_pci_dev(dev->dev))) {
-		nvme_stop_queues(&dev->ctrl);
+	nvme_stop_queues(&dev->ctrl);
+	if (pci_is_enabled(to_pci_dev(dev->dev)))
 		csts = readl(dev->bar + NVME_REG_CSTS);
-	}
 
 	queues = dev->online_queues - 1;
 	for (i = dev->queue_count - 1; i > 0; i--)