From patchwork Mon May 8 15:07:20 2017
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 9716295
Date: Mon, 8 May 2017 11:07:20 -0400
From: Keith Busch
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig,
	Sagi Grimberg, linux-nvme@lists.infradead.org, stable@vger.kernel.org
Subject: Re: [PATCH] nvme: remove disk after hw queue is started
Message-ID: <20170508150720.GB32736@localhost.localdomain>
References: <20170508112457.10236-1-ming.lei@redhat.com>
	<20170508124638.GD5696@ming.t460p>
In-Reply-To: <20170508124638.GD5696@ming.t460p>

On Mon, May 08, 2017 at 08:46:39PM +0800, Ming Lei wrote:
> On Mon, May 08, 2017 at 07:24:57PM +0800, Ming Lei wrote:
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index c8541c3dcd19..ebe13e157c00 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2185,8 +2185,8 @@ static void nvme_remove(struct pci_dev *pdev)
> >  	}
> >  
> >  	flush_work(&dev->reset_work);
> > -	nvme_uninit_ctrl(&dev->ctrl);
> >  	nvme_dev_disable(dev, true);
> > +	nvme_uninit_ctrl(&dev->ctrl);
> >  	nvme_dev_remove_admin(dev);
> >  	nvme_free_queues(dev, 0);
> >  	nvme_release_cmb(dev);
> 
> This patch should be wrong, and it looks like the correct fix should be
> flushing 'dev->remove_work' before calling nvme_uninit_ctrl().

Yeah, disabling the device before calling nvme_uninit_ctrl() shouldn't
be required. If you disable the device first, del_gendisk() can't flush
dirty data on an orderly removal request.
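To make that dependency concrete, here is a rough sketch -- not the
in-tree function, and the call chain is approximated from kernels of
this vintage, so treat the names as illustrative:

/*
 * Namespace teardown (reached from nvme_uninit_ctrl() via
 * nvme_remove_namespaces() -> nvme_ns_remove()) ends in del_gendisk(),
 * which writes back any dirty pages for the disk.  That writeback
 * needs the hardware queues to still be live, which is why the
 * in-tree nvme_remove() uninits the controller before disabling it.
 */
static void orderly_remove_sketch(struct nvme_dev *dev)
{
	flush_work(&dev->reset_work);	/* no reset racing with teardown */
	nvme_uninit_ctrl(&dev->ctrl);	/* del_gendisk() can still flush
					 * dirty data to the device here */
	nvme_dev_disable(dev, true);	/* only now shut the queues down */
}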
> But it might cause deadlock by calling flush_work(&dev->remove_work)
> here simply.

I'm almost certain the remove_work shouldn't even be running in this
case. If the reset work can't transition the controller state
correctly, it should assume something else is handling the controller:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 26a5fd0..d81104d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1792,7 +1797,7 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_dev_disable(dev, false);
 
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
-		goto out;
+		return;
 
 	result = nvme_pci_enable(dev);
 	if (result)
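For what it's worth, the reason the early return is safe is the
controller state machine: nvme_change_ctrl_state() only permits
entering RESETTING from a limited set of states, so once nvme_remove()
has moved the controller to DELETING, the reset worker's transition
fails and removal owns the teardown. A stripped-down illustration (the
enum values are real; the helper itself is made up for this sketch,
and the exact set of permitted source states varies by kernel version):

/*
 * Illustrative only -- a reduction of the check inside
 * nvme_change_ctrl_state() in drivers/nvme/host/core.c.
 */
static bool reset_may_proceed(enum nvme_ctrl_state old_state)
{
	switch (old_state) {
	case NVME_CTRL_NEW:
	case NVME_CTRL_LIVE:
		return true;	/* reset worker owns the controller */
	default:
		return false;	/* e.g. NVME_CTRL_DELETING: removal owns it */
	}
}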