[v3,8/8] nvdimm: Fix firmware activation deadlock scenarios

Message ID	165055523099.3745911.9091010720291846249.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State	New, archived
Headers	Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D956E1FC0 for <nvdimm@lists.linux.dev>; Thu, 21 Apr 2022 15:37:37 +0000 (UTC)
	Subject: [PATCH v3 8/8] nvdimm: Fix firmware activation deadlock scenarios
	From: Dan Williams <dan.j.williams@intel.com>
	To: linux-cxl@vger.kernel.org
	Cc: peterz@infradead.org, vishal.l.verma@intel.com, alison.schofield@intel.com, nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org
	Date: Thu, 21 Apr 2022 08:33:51 -0700
	Message-ID: <165055523099.3745911.9091010720291846249.stgit@dwillia2-desk3.amr.corp.intel.com>
	In-Reply-To: <165055518776.3745911.9346998911322224736.stgit@dwillia2-desk3.amr.corp.intel.com>
	References: <165055518776.3745911.9346998911322224736.stgit@dwillia2-desk3.amr.corp.intel.com>
	User-Agent: StGit/0.18-3-g996c
	Precedence: bulk
	MIME-Version: 1.0
	Content-Type: text/plain; charset="utf-8"
	Content-Transfer-Encoding: 7bit
Series	device-core: Enable device_lock() lockdep validation
	[v3,0/8] device-core: Enable device_lock() lockdep validation
	[v3,1/8] cxl: Replace lockdep_mutex with local lock classes
	[v3,2/8] cxl/acpi: Add root device lockdep validation
	[v3,3/8] cxl: Drop cxl_device_lock()
	[v3,4/8] nvdimm: Replace lockdep_mutex with local lock classes
	[v3,5/8] ACPI: NFIT: Drop nfit_device_lock()
	[v3,6/8] nvdimm: Drop nd_device_lock()
	[v3,7/8] device-core: Kill the lockdep_mutex
	[v3,8/8] nvdimm: Fix firmware activation deadlock scenarios


Commit Message

Dan Williams April 21, 2022, 3:33 p.m. UTC

Lockdep reports the following deadlock scenarios for CXL root device
power-management (device_prepare()) operations and for device_shutdown()
operations on 'nd_region' devices:

---
 Chain exists of:
   &nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(system_transition_mutex);
                                lock(&nvdimm_bus->reconfig_mutex);
                                lock(system_transition_mutex);
   lock(&nvdimm_region_key);

--

 Chain exists of:
   &cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&cxl_root_key);
                                lock(acpi_scan_lock);
                                lock(&cxl_root_key);
   lock(&cxl_nvdimm_bridge_key);

---

These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec(),
which walks the entire system device topology taking device_lock() along
the way. The nvdimm_bus_lock() protects against unregistration and
multiple simultaneous ops callers, and prevents activate_show() from
racing activate_store(). For the first two, the lock is redundant:
unregistration already flushes all ops users, and sysfs already prevents
multiple threads from being active in an ops handler at the same time. For
the last, userspace should already be waiting for its last
activate_store() to complete, and does not need activate_show() to flush
the write side, so this lock usage can be deleted in these attributes.
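
As a rough userspace illustration of the first chain (this is not kernel
code and not how lockdep is implemented), the sketch below records
"lock A was held while taking lock B" edges and refuses an edge that would
close a cycle. The enum names, record_order() and activate_order_ok() are
hypothetical and exist only for this example; the three orderings are an
assumed reading of the report above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the first lockdep report. */
enum lock_class {
	REGION_KEY,		/* &nvdimm_region_key (device_lock of a region) */
	RECONFIG_MUTEX,		/* &nvdimm_bus->reconfig_mutex (nvdimm_bus_lock) */
	SYSTEM_TRANSITION,	/* system_transition_mutex */
	NR_CLASSES
};

/* edge[a][b] is true when lock b was taken while lock a was held. */
struct order_graph {
	bool edge[NR_CLASSES][NR_CLASSES];
};

/* Depth-first search: can 'to' be reached from 'from' along recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "taken was acquired while held was held". Returns false, and
 * records nothing, when the new edge would close a cycle.
 */
bool record_order(struct order_graph *g, int held, int taken)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(taken >= 0 && taken < NR_CLASSES);
	if (reachable(g, taken, held, seen))
		return false;
	g->edge[held][taken] = true;
	return true;
}

/*
 * Replay the orderings assumed from the report. hold_bus_lock models
 * whether activate_store() keeps the bus lock across the step that takes
 * system_transition_mutex. Returns true when no ordering closes a cycle.
 */
bool activate_order_ok(bool hold_bus_lock)
{
	struct order_graph g = { 0 };
	bool ok = true;

	/* A region's device_lock is held while the bus lock is taken. */
	if (!record_order(&g, REGION_KEY, RECONFIG_MUTEX))
		ok = false;
	/* Before the patch: bus lock held over hibernate_quiet_exec(). */
	if (hold_bus_lock && !record_order(&g, RECONFIG_MUTEX, SYSTEM_TRANSITION))
		ok = false;
	/* The topology walk takes device_lock() under the transition mutex. */
	if (!record_order(&g, SYSTEM_TRANSITION, REGION_KEY))
		ok = false;
	return ok;
}
```

Under these assumptions, activate_order_ok(true) reports a cycle and
activate_order_ok(false) does not, which mirrors why dropping the bus lock
in these attributes removes the reported chain.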

Fixes: 48001ea50d17 ("PM, libnvdimm: Add runtime firmware activation support")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/core.c |    4 ----
 1 file changed, 4 deletions(-)

Comments

Ira Weiny April 23, 2022, 4:28 a.m. UTC | #1

On Thu, Apr 21, 2022 at 08:33:51AM -0700, Dan Williams wrote:
> Lockdep reports the following deadlock scenarios for CXL root device
> power-management (device_prepare()) operations and for device_shutdown()
> operations on 'nd_region' devices:
> 
> ---
>  Chain exists of:
>    &nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(system_transition_mutex);
>                                 lock(&nvdimm_bus->reconfig_mutex);
>                                 lock(system_transition_mutex);
>    lock(&nvdimm_region_key);
> 
> --
> 
>  Chain exists of:
>    &cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&cxl_root_key);
>                                 lock(acpi_scan_lock);
>                                 lock(&cxl_root_key);
>    lock(&cxl_nvdimm_bridge_key);
> 
> ---
> 
> These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec(),
> which walks the entire system device topology taking device_lock() along
> the way. The nvdimm_bus_lock() protects against unregistration and
> multiple simultaneous ops callers, and prevents activate_show() from
> racing activate_store(). For the first two, the lock is redundant:
> unregistration already flushes all ops users, and sysfs already prevents
> multiple threads from being active in an ops handler at the same time. For
> the last, userspace should already be waiting for its last
> activate_store() to complete, and does not need activate_show() to flush
> the write side, so this lock usage can be deleted in these attributes.
>

I'm sorry if this is obvious but why can't the locking be removed from
capability_show() and nvdimm_bus_firmware_visible() as well?

Effectively it sounds like we don't care if the cap read is racing any state
change?  And we know the device can't go away while sysfs is calling those
functions.

Ira

> 
> Fixes: 48001ea50d17 ("PM, libnvdimm: Add runtime firmware activation support")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/nvdimm/core.c |    4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
> index 144926b7451c..7c7f4a43fd4f 100644
> --- a/drivers/nvdimm/core.c
> +++ b/drivers/nvdimm/core.c
> @@ -395,10 +395,8 @@ static ssize_t activate_show(struct device *dev,
>  	if (!nd_desc->fw_ops)
>  		return -EOPNOTSUPP;
>  
> -	nvdimm_bus_lock(dev);
>  	cap = nd_desc->fw_ops->capability(nd_desc);
>  	state = nd_desc->fw_ops->activate_state(nd_desc);
> -	nvdimm_bus_unlock(dev);
>  
>  	if (cap < NVDIMM_FWA_CAP_QUIESCE)
>  		return -EOPNOTSUPP;
> @@ -443,7 +441,6 @@ static ssize_t activate_store(struct device *dev,
>  	else
>  		return -EINVAL;
>  
> -	nvdimm_bus_lock(dev);
>  	state = nd_desc->fw_ops->activate_state(nd_desc);
>  
>  	switch (state) {
> @@ -461,7 +458,6 @@ static ssize_t activate_store(struct device *dev,
>  	default:
>  		rc = -ENXIO;
>  	}
> -	nvdimm_bus_unlock(dev);
>  
>  	if (rc == 0)
>  		rc = len;
> 
>

Dan Williams April 23, 2022, 5:29 p.m. UTC | #2

On Fri, Apr 22, 2022 at 9:28 PM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Thu, Apr 21, 2022 at 08:33:51AM -0700, Dan Williams wrote:
> > Lockdep reports the following deadlock scenarios for CXL root device
> > power-management (device_prepare()) operations and for device_shutdown()
> > operations on 'nd_region' devices:
> >
> > ---
> >  Chain exists of:
> >    &nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex
> >
> >   Possible unsafe locking scenario:
> >
> >         CPU0                    CPU1
> >         ----                    ----
> >    lock(system_transition_mutex);
> >                                 lock(&nvdimm_bus->reconfig_mutex);
> >                                 lock(system_transition_mutex);
> >    lock(&nvdimm_region_key);
> >
> > --
> >
> >  Chain exists of:
> >    &cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key
> >
> >   Possible unsafe locking scenario:
> >
> >         CPU0                    CPU1
> >         ----                    ----
> >    lock(&cxl_root_key);
> >                                 lock(acpi_scan_lock);
> >                                 lock(&cxl_root_key);
> >    lock(&cxl_nvdimm_bridge_key);
> >
> > ---
> >
> > These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec(),
> > which walks the entire system device topology taking device_lock() along
> > the way. The nvdimm_bus_lock() protects against unregistration and
> > multiple simultaneous ops callers, and prevents activate_show() from
> > racing activate_store(). For the first two, the lock is redundant:
> > unregistration already flushes all ops users, and sysfs already prevents
> > multiple threads from being active in an ops handler at the same time. For
> > the last, userspace should already be waiting for its last
> > activate_store() to complete, and does not need activate_show() to flush
> > the write side, so this lock usage can be deleted in these attributes.
> >
>
> I'm sorry if this is obvious but why can't the locking be removed from
> capability_show() and nvdimm_bus_firmware_visible() as well?

It can, that's a good catch, thanks.

Patch

diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 144926b7451c..7c7f4a43fd4f 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -395,10 +395,8 @@  static ssize_t activate_show(struct device *dev,
 	if (!nd_desc->fw_ops)
 		return -EOPNOTSUPP;
 
-	nvdimm_bus_lock(dev);
 	cap = nd_desc->fw_ops->capability(nd_desc);
 	state = nd_desc->fw_ops->activate_state(nd_desc);
-	nvdimm_bus_unlock(dev);
 
 	if (cap < NVDIMM_FWA_CAP_QUIESCE)
 		return -EOPNOTSUPP;
@@ -443,7 +441,6 @@  static ssize_t activate_store(struct device *dev,
 	else
 		return -EINVAL;
 
-	nvdimm_bus_lock(dev);
 	state = nd_desc->fw_ops->activate_state(nd_desc);
 
 	switch (state) {
@@ -461,7 +458,6 @@  static ssize_t activate_store(struct device *dev,
 	default:
 		rc = -ENXIO;
 	}
-	nvdimm_bus_unlock(dev);
 
 	if (rc == 0)
 		rc = len;
