diff mbox series

[v3,2/8] cxl/acpi: Add root device lockdep validation

Message ID 165055519869.3745911.10162603933337340370.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show
Series device-core: Enable device_lock() lockdep validation | expand

Commit Message

Dan Williams April 21, 2022, 3:33 p.m. UTC
The CXL "root" device, ACPI0017, is an attach point for coordinating
platform level CXL resources and is the parent device for a CXL port
topology tree. As such it has distinct locking rules relative to other
CXL subsystem objects, but because it is an ACPI device the lock class
is established well before it is given to the cxl_acpi driver.

However, the lockdep API does support changing the lock class "live" for
situations like this. Add a device_lock_set_class() helper that a driver
can use in ->probe() to set a custom lock class, and
device_lock_reset_class() to return to the default "no validate" class
before the custom lock class key goes out of scope after ->remove().

Note the helpers are all macros to support dead code elimination in the
CONFIG_PROVE_LOCKING=n case.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c     |   15 +++++++++++++++
 include/linux/device.h |   25 +++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

Comments

Greg Kroah-Hartman April 21, 2022, 4:10 p.m. UTC | #1
On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote:
> The CXL "root" device, ACPI0017, is an attach point for coordinating
> platform level CXL resources and is the parent device for a CXL port
> topology tree. As such it has distinct locking rules relative to other
> CXL subsystem objects, but because it is an ACPI device the lock class
> is established well before it is given to the cxl_acpi driver.
> 
> However, the lockdep API does support changing the lock class "live" for
> situations like this. Add a device_lock_set_class() helper that a driver
> can use in ->probe() to set a custom lock class, and
> device_lock_reset_class() to return to the default "no validate" class
> before the custom lock class key goes out of scope after ->remove().
> 
> Note the helpers are all macros to support dead code elimination in the
> CONFIG_PROVE_LOCKING=n case.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Ben Widawsky <ben.widawsky@intel.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/acpi.c     |   15 +++++++++++++++
>  include/linux/device.h |   25 +++++++++++++++++++++++++
>  2 files changed, 40 insertions(+)

Much simpler, great work.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ira Weiny April 22, 2022, 11:58 p.m. UTC | #2
On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote:
> The CXL "root" device, ACPI0017, is an attach point for coordinating
> platform level CXL resources and is the parent device for a CXL port
> topology tree. As such it has distinct locking rules relative to other
> CXL subsystem objects, but because it is an ACPI device the lock class
> is established well before it is given to the cxl_acpi driver.
 
This final sentence gave me pause because it implied that the device lock class
was set to something other than no validate.  But I don't see that anywhere in
the acpi code.  So given that it looks to me like ACPI is just using the
default no validate class...

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> However, the lockdep API does support changing the lock class "live" for
> situations like this. Add a device_lock_set_class() helper that a driver
> can use in ->probe() to set a custom lock class, and
> device_lock_reset_class() to return to the default "no validate" class
> before the custom lock class key goes out of scope after ->remove().
> 
> Note the helpers are all macros to support dead code elimination in the
> CONFIG_PROVE_LOCKING=n case.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Ben Widawsky <ben.widawsky@intel.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/acpi.c     |   15 +++++++++++++++
>  include/linux/device.h |   25 +++++++++++++++++++++++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index d15a6aec0331..e19cea27387e 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -275,6 +275,15 @@ static int add_root_nvdimm_bridge(struct device *match, void *data)
>  	return 1;
>  }
>  
> +static struct lock_class_key cxl_root_key;
> +
> +static void cxl_acpi_lock_reset_class(void *_dev)
> +{
> +	struct device *dev = _dev;
> +
> +	device_lock_reset_class(dev);
> +}
> +
>  static int cxl_acpi_probe(struct platform_device *pdev)
>  {
>  	int rc;
> @@ -283,6 +292,12 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	struct acpi_device *adev = ACPI_COMPANION(host);
>  	struct cxl_cfmws_context ctx;
>  
> +	device_lock_set_class(&pdev->dev, &cxl_root_key);
> +	rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class,
> +				      &pdev->dev);
> +	if (rc)
> +		return rc;
> +
>  	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
>  	if (IS_ERR(root_port))
>  		return PTR_ERR(root_port);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 93459724dcde..82c9d307e7bd 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -850,6 +850,31 @@ static inline bool device_supports_offline(struct device *dev)
>  	return dev->bus && dev->bus->offline && dev->bus->online;
>  }
>  
> +#define __device_lock_set_class(dev, name, key) \
> +	lock_set_class(&(dev)->mutex.dep_map, name, key, 0, _THIS_IP_)
> +
> +/**
> + * device_lock_set_class - Specify a temporary lock class while a device
> + *			   is attached to a driver
> + * @dev: device to modify
> + * @key: lock class key data
> + *
> + * This must be called with the device_lock() already held, for example
> + * from driver ->probe().
> + */
> +#define device_lock_set_class(dev, key)				\
> +	__device_lock_set_class(dev, #key, key)
> +
> +/**
> + * device_lock_reset_class - Return a device to the default lockdep novalidate state
> + * @dev: device to modify
> + *
> + * This must be called with the device_lock() already held, for example
> + * from driver ->remove().
> + */
> +#define device_lock_reset_class(dev) \
> +	device_lock_set_class(dev, &__lockdep_no_validate__)
> +
>  void lock_device_hotplug(void);
>  void unlock_device_hotplug(void);
>  int lock_device_hotplug_sysfs(void);
>
Dan Williams April 23, 2022, 12:08 a.m. UTC | #3
On Fri, Apr 22, 2022 at 4:58 PM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote:
> > The CXL "root" device, ACPI0017, is an attach point for coordinating
> > platform level CXL resources and is the parent device for a CXL port
> > topology tree. As such it has distinct locking rules relative to other
> > CXL subsystem objects, but because it is an ACPI device the lock class
> > is established well before it is given to the cxl_acpi driver.
>
> This final sentence gave me pause because it implied that the device lock class
> was set to something other than no validate.  But I don't see that anywhere in
> the acpi code.  So given that it looks to me like ACPI is just using the
> default no validate class...

Oh, good observation. *If* ACPI had set a custom lock class then
cxl_acpi would need to be careful to restore that ACPI-specific class
and not reset it to "no validate" on exit, or skip setting its own
custom class. However, I think for generic buses like ACPI that feed
devices into other subsystems it likely has little reason to set its
own class. For safety, since device_lock_set_class() is general
purpose, I'll have it emit a debug message and fail if the class is
not "no validate" on entry.

Thanks Ira!
Dan Williams April 23, 2022, 5:27 p.m. UTC | #4
On Fri, Apr 22, 2022 at 5:08 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Apr 22, 2022 at 4:58 PM Ira Weiny <ira.weiny@intel.com> wrote:
> >
> > On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote:
> > > The CXL "root" device, ACPI0017, is an attach point for coordinating
> > > platform level CXL resources and is the parent device for a CXL port
> > > topology tree. As such it has distinct locking rules relative to other
> > > CXL subsystem objects, but because it is an ACPI device the lock class
> > > is established well before it is given to the cxl_acpi driver.
> >
> > This final sentence gave me pause because it implied that the device lock class
> > was set to something other than no validate.  But I don't see that anywhere in
> > the acpi code.  So given that it looks to me like ACPI is just using the
> > default no validate class...
>
> Oh, good observation. *If* ACPI had set a custom lock class then
> cxl_acpi would need to be careful to restore that ACPI-specific class
> and not reset it to "no validate" on exit, or skip setting its own
> custom class. However, I think for generic buses like ACPI that feed
> devices into other subsystems it likely has little reason to set its
> own class. For safety, since device_lock_set_class() is general
> purpose, I'll have it emit a debug message and fail if the class is
> not "no validate" on entry.
>

So this turned out way uglier than I expected:

 drivers/cxl/acpi.c     |    4 +++-
 include/linux/device.h |   33 +++++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 9 deletions(-)

...so I'm going to drop it and just add a comment about the
expectations. As Peter said there's already a multitude of ways to
cause false positive / negative results with lockdep so this is just
one more area where one needs to be careful and understand the lock
context they might be overriding.
Peter Zijlstra April 25, 2022, 10:33 a.m. UTC | #5
On Sat, Apr 23, 2022 at 10:27:52AM -0700, Dan Williams wrote:

> ...so I'm going to drop it and just add a comment about the
> expectations. As Peter said there's already a multitude of ways to
> cause false positive / negative results with lockdep so this is just
> one more area where one needs to be careful and understand the lock
> context they might be overriding.

One safe-guard might be to check the class you're overriding is indeed
__no_validate__, and WARN if not. Then the unconditional reset is
conistent.

Then, if/when, that WARN ever triggers you can revisit all this.
Dan Williams April 25, 2022, 4:05 p.m. UTC | #6
On Mon, Apr 25, 2022 at 3:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sat, Apr 23, 2022 at 10:27:52AM -0700, Dan Williams wrote:
>
> > ...so I'm going to drop it and just add a comment about the
> > expectations. As Peter said there's already a multitude of ways to
> > cause false positive / negative results with lockdep so this is just
> > one more area where one needs to be careful and understand the lock
> > context they might be overriding.
>
> One safe-guard might be to check the class you're overriding is indeed
> __no_validate__, and WARN if not. Then the unconditional reset is
> conistent.
>
> Then, if/when, that WARN ever triggers you can revisit all this.

Ok, that does seem to need a dummy definition of lockdep_match_class()
in the CONFIG_LOCKDEP=n case, but that seems worth it to me for the
sanity check.
Dan Williams April 25, 2022, 6:57 p.m. UTC | #7
On Mon, Apr 25, 2022 at 9:05 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Mon, Apr 25, 2022 at 3:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sat, Apr 23, 2022 at 10:27:52AM -0700, Dan Williams wrote:
> >
> > > ...so I'm going to drop it and just add a comment about the
> > > expectations. As Peter said there's already a multitude of ways to
> > > cause false positive / negative results with lockdep so this is just
> > > one more area where one needs to be careful and understand the lock
> > > context they might be overriding.
> >
> > One safe-guard might be to check the class you're overriding is indeed
> > __no_validate__, and WARN if not. Then the unconditional reset is
> > conistent.
> >
> > Then, if/when, that WARN ever triggers you can revisit all this.
>
> Ok, that does seem to need a dummy definition of lockdep_match_class()
> in the CONFIG_LOCKDEP=n case, but that seems worth it to me for the
> sanity check.

Thankfully the comment in lockdep.h to not define a
lockdep_match_class() for the CONFIG_LOCKDEP=n stopped me from going
that direction.
diff mbox series

Patch

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index d15a6aec0331..e19cea27387e 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -275,6 +275,15 @@  static int add_root_nvdimm_bridge(struct device *match, void *data)
 	return 1;
 }
 
+static struct lock_class_key cxl_root_key;
+
+static void cxl_acpi_lock_reset_class(void *_dev)
+{
+	struct device *dev = _dev;
+
+	device_lock_reset_class(dev);
+}
+
 static int cxl_acpi_probe(struct platform_device *pdev)
 {
 	int rc;
@@ -283,6 +292,12 @@  static int cxl_acpi_probe(struct platform_device *pdev)
 	struct acpi_device *adev = ACPI_COMPANION(host);
 	struct cxl_cfmws_context ctx;
 
+	device_lock_set_class(&pdev->dev, &cxl_root_key);
+	rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class,
+				      &pdev->dev);
+	if (rc)
+		return rc;
+
 	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
 	if (IS_ERR(root_port))
 		return PTR_ERR(root_port);
diff --git a/include/linux/device.h b/include/linux/device.h
index 93459724dcde..82c9d307e7bd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -850,6 +850,31 @@  static inline bool device_supports_offline(struct device *dev)
 	return dev->bus && dev->bus->offline && dev->bus->online;
 }
 
+#define __device_lock_set_class(dev, name, key) \
+	lock_set_class(&(dev)->mutex.dep_map, name, key, 0, _THIS_IP_)
+
+/**
+ * device_lock_set_class - Specify a temporary lock class while a device
+ *			   is attached to a driver
+ * @dev: device to modify
+ * @key: lock class key data
+ *
+ * This must be called with the device_lock() already held, for example
+ * from driver ->probe().
+ */
+#define device_lock_set_class(dev, key)				\
+	__device_lock_set_class(dev, #key, key)
+
+/**
+ * device_lock_reset_class - Return a device to the default lockdep novalidate state
+ * @dev: device to modify
+ *
+ * This must be called with the device_lock() already held, for example
+ * from driver ->remove().
+ */
+#define device_lock_reset_class(dev) \
+	device_lock_set_class(dev, &__lockdep_no_validate__)
+
 void lock_device_hotplug(void);
 void unlock_device_hotplug(void);
 int lock_device_hotplug_sysfs(void);