diff mbox series

[v2,02/12] device-core: Add dev->lock_class to enable device_lock() lockdep validation

Message ID 164982969858.684294.17819743973041389492.stgit@dwillia2-desk3.amr.corp.intel.com
State Superseded
Headers show
Series device-core: Enable device_lock() lockdep validation | expand

Commit Message

Dan Williams April 13, 2022, 6:01 a.m. UTC
The device_lock() is hidden from lockdep by default because, for
example, a device subsystem may do something like:

---
device_add(dev1);
...in driver core...
device_lock(dev1);
bus->probe(dev1); /* where bus->probe() calls driver1_probe() */

driver1_probe(struct device *dev)
{
	...do some enumeration...
	dev2->parent = dev;
	/* this triggers probe under device_lock(dev2); */
	device_add(dev2);
}
---

To lockdep, that device_lock(dev2) looks like a deadlock because lockdep
only sees lock classes, not individual lock instances. All device_lock()
instances across the entire kernel are the same class. However, this is
not a deadlock in practice because the locking is strictly hierarchical.
I.e. device_lock(dev1) is held over device_lock(dev2), but never the
reverse. In order for lockdep to be satisfied and see that it is
hierarchical in practice the mutex_lock() call in device_lock() needs to
be moved to mutex_lock_nested() where the @subclass argument to
mutex_lock_nested() represents the nesting level, i.e.:

s/device_lock(dev1)/mutex_lock_nested(&dev1->mutex, 1)/

s/device_lock(dev2)/mutex_lock_nested(&dev2->mutex, 2)/

Now, what if the internals of the device_lock() could be annotated with
the right @subclass argument to call mutex_lock_nested()?

With device_set_lock_class() a subsystem can optionally add that
metadata. The device_lock() still takes dev->mutex, but when
dev->lock_class is >= 0 it additionally takes dev->lockdep_mutex with
the proper nesting. Unlike dev->mutex, dev->lockdep_mutex is not marked
lockdep_set_novalidate_class() and lockdep will become useful... at
least for one subsystem at a time.

It is still the case that only one subsystem can be using lockdep with
lockdep_mutex at a time because different subsystems will collide class
numbers. You might say "well, how about subsystem1 gets class ids 0 to 9
and subsystem2 gets class ids 10 to 20?". MAX_LOCKDEP_SUBCLASSES is 8,
and 8 is just enough class ids for one subsystem of moderate complexity.

Fixing that problem needs deeper changes, but for now moving the ability
to set a lock class into the core lets the NVDIMM and CXL subsystems
drop their incomplete solutions which attempt to set the lock class and
take the lockdep mutex after the fact.

This approach has prevented at least one deadlock scenario from making
its way upstream that was not caught by the current "local /
after-the-fact" usage of dev->lockdep_mutex (commit 87a30e1f05d7
("driver-core, libnvdimm: Let device subsystems add local lockdep
coverage")).

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/device.h |   92 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 88 insertions(+), 4 deletions(-)

Comments

Peter Zijlstra April 13, 2022, 8:43 a.m. UTC | #1
On Tue, Apr 12, 2022 at 11:01:38PM -0700, Dan Williams wrote:
> The device_lock() is hidden from lockdep by default because, for
> example, a device subsystem may do something like:
> 
> ---
> device_add(dev1);
> ...in driver core...
> device_lock(dev1);
> bus->probe(dev1); /* where bus->probe() calls driver1_probe() */
> 
> driver1_probe(struct device *dev)
> {
> 	...do some enumeration...
> 	dev2->parent = dev;
> 	/* this triggers probe under device_lock(dev2); */
> 	device_add(dev2);
> }
> ---
> 
> To lockdep, that device_lock(dev2) looks like a deadlock because lockdep

Recursion, you're meaning to say it looks like same lock recursion.

> only sees lock classes, not individual lock instances. All device_lock()
> instances across the entire kernel are the same class. However, this is
> not a deadlock in practice because the locking is strictly hierarchical.
> I.e. device_lock(dev1) is held over device_lock(dev2), but never the
> reverse.

I have some very vague memories from a conversation with Alan Stern,
some maybe 10 years ago, where I think he was explaining to me this was
not in fact a simple hierarchy.

> In order for lockdep to be satisfied and see that it is
> hierarchical in practice the mutex_lock() call in device_lock() needs to
> be moved to mutex_lock_nested() where the @subclass argument to
> mutex_lock_nested() represents the nesting level, i.e.:

That's not an obvious conclusion; lockdep has lots of funny annotations,
subclasses is just one.

I think the big new development in lockdep since that time with Alan
Stern is that lockdep now has support for dynamic keys; that is lock
keys in heap memory (as opposed to static storage).

> s/device_lock(dev1)/mutex_lock_nested(&dev1->mutex, 1)/
> 
> s/device_lock(dev2)/mutex_lock_nested(&dev2->mutex, 2)/
> 
> Now, what if the internals of the device_lock() could be annotated with
> the right @subclass argument to call mutex_lock_nested()?
> 
> With device_set_lock_class() a subsystem can optionally add that
> metadata. The device_lock() still takes dev->mutex, but when
> dev->lock_class is >= 0 it additionally takes dev->lockdep_mutex with
> the proper nesting. Unlike dev->mutex, dev->lockdep_mutex is not marked
> lockdep_set_novalidate_class() and lockdep will become useful... at
> least for one subsystem at a time.
> 
> It is still the case that only one subsystem can be using lockdep with
> lockdep_mutex at a time because different subsystems will collide class
> numbers. You might say "well, how about subsystem1 gets class ids 0 to 9
> and subsystem2 gets class ids 10 to 20?". MAX_LOCKDEP_SUBCLASSES is 8,
> and 8 is just enough class ids for one subsystem of moderate complexity.

Again, that doesn't seem like an obvious suggestion at all. Why not give
each subsystem a different lock key?


> diff --git a/include/linux/device.h b/include/linux/device.h
> index af2576ace130..6083e757e804 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -402,6 +402,7 @@ struct dev_msi_info {
>   * @mutex:	Mutex to synchronize calls to its driver.
>   * @lockdep_mutex: An optional debug lock that a subsystem can use as a
>   * 		peer lock to gain localized lockdep coverage of the device_lock.
> + * @lock_class: per-subsystem annotated device lock class
>   * @bus:	Type of bus device is on.
>   * @driver:	Which driver has allocated this
>   * @platform_data: Platform data specific to the device.
> @@ -501,6 +502,7 @@ struct device {
>  					   dev_set_drvdata/dev_get_drvdata */
>  #ifdef CONFIG_PROVE_LOCKING
>  	struct mutex		lockdep_mutex;
> +	int			lock_class;
>  #endif
>  	struct mutex		mutex;	/* mutex to synchronize calls to
>  					 * its driver.
> @@ -762,18 +764,100 @@ static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
>  	return !!(dev->power.driver_flags & flags);
>  }
>  
> +static inline void device_lock_assert(struct device *dev)
> +{
> +	lockdep_assert_held(&dev->mutex);
> +}
> +
>  #ifdef CONFIG_PROVE_LOCKING
>  static inline void device_lockdep_init(struct device *dev)
>  {
>  	mutex_init(&dev->lockdep_mutex);
> +	dev->lock_class = -1;
>  	lockdep_set_novalidate_class(&dev->mutex);
>  }
> -#else
> +
> +static inline void device_lock(struct device *dev)
> +{
> +	/*
> +	 * For double-lock programming errors the kernel will hang
> +	 * trying to acquire @dev->mutex before lockdep can report the
> +	 * problem acquiring @dev->lockdep_mutex, so manually assert
> +	 * before that hang.
> +	 */
> +	lockdep_assert_not_held(&dev->lockdep_mutex);
> +
> +	mutex_lock(&dev->mutex);
> +	if (dev->lock_class >= 0)
> +		mutex_lock_nested(&dev->lockdep_mutex, dev->lock_class);
> +}
> +
> +static inline int device_lock_interruptible(struct device *dev)
> +{
> +	int rc;
> +
> +	lockdep_assert_not_held(&dev->lockdep_mutex);
> +
> +	rc = mutex_lock_interruptible(&dev->mutex);
> +	if (rc || dev->lock_class < 0)
> +		return rc;
> +
> +	return mutex_lock_interruptible_nested(&dev->lockdep_mutex,
> +					       dev->lock_class);
> +}
> +
> +static inline int device_trylock(struct device *dev)
> +{
> +	if (mutex_trylock(&dev->mutex)) {
> +		if (dev->lock_class >= 0)
> +			mutex_lock_nested(&dev->lockdep_mutex, dev->lock_class);

This must be the weirdest stuff I've seen in a while.

> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline void device_unlock(struct device *dev)
> +{
> +	if (dev->lock_class >= 0)
> +		mutex_unlock(&dev->lockdep_mutex);
> +	mutex_unlock(&dev->mutex);
> +}
> +
> +/*
> + * Note: this routine expects that the state of @dev->mutex is stable
> + * from entry to exit. There is no support for changing lockdep
> + * validation classes, only enabling and disabling validation.
> + */
> +static inline void device_set_lock_class(struct device *dev, int lock_class)
> +{
> +	/*
> +	 * Allow for setting or clearing the lock class while the
> +	 * device_lock() is held, in which case the paired nested lock
> +	 * might need to be acquired or released now to accommodate the
> +	 * next device_unlock().
> +	 */
> +	if (dev->lock_class < 0 && lock_class >= 0) {
> +		/* Enabling lockdep validation... */
> +		if (mutex_is_locked(&dev->mutex))
> +			mutex_lock_nested(&dev->lockdep_mutex, lock_class);
> +	} else if (dev->lock_class >= 0 && lock_class < 0) {
> +		/* Disabling lockdep validation... */
> +		if (mutex_is_locked(&dev->mutex))
> +			mutex_unlock(&dev->lockdep_mutex);
> +	} else {
> +		dev_warn(dev,
> +			 "%s: failed to change lock_class from: %d to %d\n",
> +			 __func__, dev->lock_class, lock_class);
> +		return;
> +	}
> +	dev->lock_class = lock_class;
> +}
> +#else /* !CONFIG_PROVE_LOCKING */

This all reads like something utterly surreal... *WHAT*!?!?

If you want lockdep validation for one (or more) dev->mutex instances,
why not pull them out of the no_validate class and use the normal
locking?

This is all quite insane.
Dan Williams April 13, 2022, 10:01 p.m. UTC | #2
On Wed, Apr 13, 2022 at 1:43 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Apr 12, 2022 at 11:01:38PM -0700, Dan Williams wrote:
> > The device_lock() is hidden from lockdep by default because, for
> > example, a device subsystem may do something like:
> >
> > ---
> > device_add(dev1);
> > ...in driver core...
> > device_lock(dev1);
> > bus->probe(dev1); /* where bus->probe() calls driver1_probe() */
> >
> > driver1_probe(struct device *dev)
> > {
> >       ...do some enumeration...
> >       dev2->parent = dev;
> >       /* this triggers probe under device_lock(dev2); */
> >       device_add(dev2);
> > }
> > ---
> >
> > To lockdep, that device_lock(dev2) looks like a deadlock because lockdep
>
> Recursion, you're meaning to say it looks like same lock recursion.

Yes, wrong terminology on my part.

>
> > only sees lock classes, not individual lock instances. All device_lock()
> > instances across the entire kernel are the same class. However, this is
> > not a deadlock in practice because the locking is strictly hierarchical.
> > I.e. device_lock(dev1) is held over device_lock(dev2), but never the
> > reverse.
>
> I have some very vague memories from a conversation with Alan Stern,
> some maybe 10 years ago, where I think he was explaining to me this was
> not in fact a simple hierarchy.
>
> > In order for lockdep to be satisfied and see that it is
> > hierarchical in practice the mutex_lock() call in device_lock() needs to
> > be moved to mutex_lock_nested() where the @subclass argument to
> > mutex_lock_nested() represents the nesting level, i.e.:
>
> That's not an obvious conclusion; lockdep has lots of funny annotations,
> subclasses is just one.
>
> I think the big new development in lockdep since that time with Alan
> Stern is that lockdep now has support for dynamic keys; that is lock
> keys in heap memory (as opposed to static storage).

Ah, I was not aware of that, that should allow for deep cleanups of
this proposal.

>
> > s/device_lock(dev1)/mutex_lock_nested(&dev1->mutex, 1)/
> >
> > s/device_lock(dev2)/mutex_lock_nested(&dev2->mutex, 2)/
> >
> > Now, what if the internals of the device_lock() could be annotated with
> > the right @subclass argument to call mutex_lock_nested()?
> >
> > With device_set_lock_class() a subsystem can optionally add that
> > metadata. The device_lock() still takes dev->mutex, but when
> > dev->lock_class is >= 0 it additionally takes dev->lockdep_mutex with
> > the proper nesting. Unlike dev->mutex, dev->lockdep_mutex is not marked
> > lockdep_set_novalidate_class() and lockdep will become useful... at
> > least for one subsystem at a time.
> >
> > It is still the case that only one subsystem can be using lockdep with
> > lockdep_mutex at a time because different subsystems will collide class
> > numbers. You might say "well, how about subsystem1 gets class ids 0 to 9
> > and subsystem2 gets class ids 10 to 20?". MAX_LOCKDEP_SUBCLASSES is 8,
> > and 8 is just enough class ids for one subsystem of moderate complexity.
>
> Again, that doesn't seem like an obvious suggestion at all. Why not give
> each subsystem a different lock key?
>

Yes, that would also save a source of merge conflicts if every
subsystem needed to add conditional extensions to 'struct device' for
an array of lock metadata.

>
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index af2576ace130..6083e757e804 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -402,6 +402,7 @@ struct dev_msi_info {
> >   * @mutex:   Mutex to synchronize calls to its driver.
> >   * @lockdep_mutex: An optional debug lock that a subsystem can use as a
> >   *           peer lock to gain localized lockdep coverage of the device_lock.
> > + * @lock_class: per-subsystem annotated device lock class
> >   * @bus:     Type of bus device is on.
> >   * @driver:  Which driver has allocated this
> >   * @platform_data: Platform data specific to the device.
> > @@ -501,6 +502,7 @@ struct device {
> >                                          dev_set_drvdata/dev_get_drvdata */
> >  #ifdef CONFIG_PROVE_LOCKING
> >       struct mutex            lockdep_mutex;
> > +     int                     lock_class;
> >  #endif
> >       struct mutex            mutex;  /* mutex to synchronize calls to
> >                                        * its driver.
> > @@ -762,18 +764,100 @@ static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
> >       return !!(dev->power.driver_flags & flags);
> >  }
> >
> > +static inline void device_lock_assert(struct device *dev)
> > +{
> > +     lockdep_assert_held(&dev->mutex);
> > +}
> > +
> >  #ifdef CONFIG_PROVE_LOCKING
> >  static inline void device_lockdep_init(struct device *dev)
> >  {
> >       mutex_init(&dev->lockdep_mutex);
> > +     dev->lock_class = -1;
> >       lockdep_set_novalidate_class(&dev->mutex);
> >  }
> > -#else
> > +
> > +static inline void device_lock(struct device *dev)
> > +{
> > +     /*
> > +      * For double-lock programming errors the kernel will hang
> > +      * trying to acquire @dev->mutex before lockdep can report the
> > +      * problem acquiring @dev->lockdep_mutex, so manually assert
> > +      * before that hang.
> > +      */
> > +     lockdep_assert_not_held(&dev->lockdep_mutex);
> > +
> > +     mutex_lock(&dev->mutex);
> > +     if (dev->lock_class >= 0)
> > +             mutex_lock_nested(&dev->lockdep_mutex, dev->lock_class);
> > +}
> > +
> > +static inline int device_lock_interruptible(struct device *dev)
> > +{
> > +     int rc;
> > +
> > +     lockdep_assert_not_held(&dev->lockdep_mutex);
> > +
> > +     rc = mutex_lock_interruptible(&dev->mutex);
> > +     if (rc || dev->lock_class < 0)
> > +             return rc;
> > +
> > +     return mutex_lock_interruptible_nested(&dev->lockdep_mutex,
> > +                                            dev->lock_class);
> > +}
> > +
> > +static inline int device_trylock(struct device *dev)
> > +{
> > +     if (mutex_trylock(&dev->mutex)) {
> > +             if (dev->lock_class >= 0)
> > +                     mutex_lock_nested(&dev->lockdep_mutex, dev->lock_class);
>
> This must be the weirdest stuff I've seen in a while.
>
> > +             return 1;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static inline void device_unlock(struct device *dev)
> > +{
> > +     if (dev->lock_class >= 0)
> > +             mutex_unlock(&dev->lockdep_mutex);
> > +     mutex_unlock(&dev->mutex);
> > +}
> > +
> > +/*
> > + * Note: this routine expects that the state of @dev->mutex is stable
> > + * from entry to exit. There is no support for changing lockdep
> > + * validation classes, only enabling and disabling validation.
> > + */
> > +static inline void device_set_lock_class(struct device *dev, int lock_class)
> > +{
> > +     /*
> > +      * Allow for setting or clearing the lock class while the
> > +      * device_lock() is held, in which case the paired nested lock
> > +      * might need to be acquired or released now to accommodate the
> > +      * next device_unlock().
> > +      */
> > +     if (dev->lock_class < 0 && lock_class >= 0) {
> > +             /* Enabling lockdep validation... */
> > +             if (mutex_is_locked(&dev->mutex))
> > +                     mutex_lock_nested(&dev->lockdep_mutex, lock_class);
> > +     } else if (dev->lock_class >= 0 && lock_class < 0) {
> > +             /* Disabling lockdep validation... */
> > +             if (mutex_is_locked(&dev->mutex))
> > +                     mutex_unlock(&dev->lockdep_mutex);
> > +     } else {
> > +             dev_warn(dev,
> > +                      "%s: failed to change lock_class from: %d to %d\n",
> > +                      __func__, dev->lock_class, lock_class);
> > +             return;
> > +     }
> > +     dev->lock_class = lock_class;
> > +}
> > +#else /* !CONFIG_PROVE_LOCKING */
>
> This all reads like something utterly surreal... *WHAT*!?!?

Pile of hacks to workaround cases where the lock_class needed to be
specified after the fact. For example, ACPI does not annotate its
locks, but CXL knows that an "ACPI0017" device will always be the root
of a CXL topology. So CXL subsystem can specify that as lock_class 0.

> If you want lockdep validation for one (or more) dev->mutex instances,
> why not pull them out of the no_validate class and use the normal
> locking?

Sounds perfect, just didn't know how to do that with my current
understanding of how to communicate this to lockdep.

>
> This is all quite insane.

Yes, certainly in comparison to your suggestion on the next patch.
That looks much more sane, and even better I think it allows for
optional lockdep validation without even needing to touch
include/linux/device.h.
Peter Zijlstra April 14, 2022, 10:16 a.m. UTC | #3
On Wed, Apr 13, 2022 at 03:01:21PM -0700, Dan Williams wrote:

> > That's not an obvious conclusion; lockdep has lots of funny annotations,
> > subclasses is just one.
> >
> > I think the big new development in lockdep since that time with Alan
> > Stern is that lockdep now has support for dynamic keys; that is lock
> > keys in heap memory (as opposed to static storage).
> 
> Ah, I was not aware of that, that should allow for deep cleanups of
> this proposal.

> > If you want lockdep validation for one (or more) dev->mutex instances,
> > why not pull them out of the no_validate class and use the normal
> > locking?
> 
> Sounds perfect, just didn't know how to do that with my current
> understanding of how to communicate this to lockdep.
> 
> >
> > This is all quite insane.
> 
> Yes, certainly in comparison to your suggestion on the next patch.
> That looks much more sane, and even better I think it allows for
> optional lockdep validation without even needing to touch
> include/linux/device.h.

Right, so lockdep has:

 - classes, based off of lock_class_key address;

   * lock_class_key should be static storage; except now we also have:

       lockdep_{,un}register_key()

     which allows dynamic memory (aka. heap) to be used for classes,
     important to note that lockdep memory usage is still static storage
     because the memory allocators use locks too. So if you register too
     many dynamic keys, you'll run out of lockdep memory etc.. so be
     careful.

   * things like mutex_init() have a static lock_class_key per site
     and hence every lock initialized by the same code ends up in the
     same class by default.

   * can be trivially changed at any time, assuming the lock isn't held,
     using lockdep_set_class*() family.

     (extensively used all over the kernel, for example by the vfs to
      give each filesystem type their own locking order rules)

   * lockdep_set_no_validate_class() is a magical variant of
     lockdep_set_class() that sets a magical lock_class_key.

   * can be changed while held using lock_set_class()

     ( from a lockdep pov it unlocks the held stack,
       changes the class of your lock, and re-locks the
       held stack, now with a different class nesting ).

     Be carefule! It doesn't forget the old nesting order, so you
     can trivally generate cycles.

 - subclasses, basically distinct addresses inside above mentioned
   lock_class_key object, limited to 8. Normally used with
   *lock_nested() family of functions. Typically used to lock multiple
   instances of a single lock class where there is defined order between
   instances (see for instance: double_rq_lock()).

 - nest_lock; eg. mutex_lock_nest_lock(), which allows expressing things
   like: multiple locks of class A can be taken in any order, provided
   we hold lock B.

With many of these, it's possible to get it wrong and annotate real
deadlocks away, so be careful :-)
Dan Williams April 14, 2022, 5:17 p.m. UTC | #4
On Thu, Apr 14, 2022 at 3:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 13, 2022 at 03:01:21PM -0700, Dan Williams wrote:
>
> > > That's not an obvious conclusion; lockdep has lots of funny annotations,
> > > subclasses is just one.
> > >
> > > I think the big new development in lockdep since that time with Alan
> > > Stern is that lockdep now has support for dynamic keys; that is lock
> > > keys in heap memory (as opposed to static storage).
> >
> > Ah, I was not aware of that, that should allow for deep cleanups of
> > this proposal.
>
> > > If you want lockdep validation for one (or more) dev->mutex instances,
> > > why not pull them out of the no_validate class and use the normal
> > > locking?
> >
> > Sounds perfect, just didn't know how to do that with my current
> > understanding of how to communicate this to lockdep.
> >
> > >
> > > This is all quite insane.
> >
> > Yes, certainly in comparison to your suggestion on the next patch.
> > That looks much more sane, and even better I think it allows for
> > optional lockdep validation without even needing to touch
> > include/linux/device.h.
>
> Right, so lockdep has:
>
>  - classes, based off of lock_class_key address;
>
>    * lock_class_key should be static storage; except now we also have:
>
>        lockdep_{,un}register_key()
>
>      which allows dynamic memory (aka. heap) to be used for classes,
>      important to note that lockdep memory usage is still static storage
>      because the memory allocators use locks too. So if you register too
>      many dynamic keys, you'll run out of lockdep memory etc.. so be
>      careful.
>
>    * things like mutex_init() have a static lock_class_key per site
>      and hence every lock initialized by the same code ends up in the
>      same class by default.
>
>    * can be trivially changed at any time, assuming the lock isn't held,
>      using lockdep_set_class*() family.
>
>      (extensively used all over the kernel, for example by the vfs to
>       give each filesystem type their own locking order rules)
>
>    * lockdep_set_no_validate_class() is a magical variant of
>      lockdep_set_class() that sets a magical lock_class_key.

Golden, thanks Peter!

>
>    * can be changed while held using lock_set_class()

One more sanity check... So in driver subsystems there are cases where
a device on busA hosts a topology on busB. When that happens there's a
need to set the lock class late in a driver since busA knows nothing
about the locking rules of busB. Since the device has a longer
lifetime than a driver when the driver exits it must set dev->mutex
back to the novalidate class, otherwise it sets up a use after free of
the static lock_class_key. I came up with this and it seems to work,
just want to make sure I'm properly using the lock_set_class() API and
it is ok to transition back and forth from the novalidate case:

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 990b6670222e..32673e1a736d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -405,6 +405,29 @@ struct cxl_nvdimm_bridge
*cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd);
 #define __mock static
 #endif

+#ifdef CONFIG_PROVE_LOCKING
+static inline void cxl_lock_reset_class(void *_dev)
+{
+       struct device *dev = _dev;
+
+       lock_set_class(&dev->mutex.dep_map, "__lockdep_no_validate__",
+                      &__lockdep_no_validate__, 0, _THIS_IP_);
+}
+
+static inline int cxl_lock_set_class(struct device *dev, const char *name,
+                                    struct lock_class_key *key)
+{
+       lock_set_class(&dev->mutex.dep_map, name, key, 0, _THIS_IP_);
+       return devm_add_action_or_reset(dev, cxl_lock_reset_class, dev);
+}
+#else
+static inline int cxl_lock_set_class(struct device *dev, const char *name,
+                                    struct lock_class_key *key)
+{
+       return 0;
+}
+#endif
+
 #ifdef CONFIG_PROVE_CXL_LOCKING
 enum cxl_lock_class {
        CXL_ANON_LOCK,

>      ( from a lockdep pov it unlocks the held stack,
>        changes the class of your lock, and re-locks the
>        held stack, now with a different class nesting ).
>
>      Be carefule! It doesn't forget the old nesting order, so you
>      can trivally generate cycles.
>
>  - subclasses, basically distinct addresses inside above mentioned
>    lock_class_key object, limited to 8. Normally used with
>    *lock_nested() family of functions. Typically used to lock multiple
>    instances of a single lock class where there is defined order between
>    instances (see for instance: double_rq_lock()).
>
>  - nest_lock; eg. mutex_lock_nest_lock(), which allows expressing things
>    like: multiple locks of class A can be taken in any order, provided
>    we hold lock B.
>
> With many of these, it's possible to get it wrong and annotate real
> deadlocks away, so be careful :-)

Noted.
Peter Zijlstra April 14, 2022, 7:33 p.m. UTC | #5
On Thu, Apr 14, 2022 at 10:17:13AM -0700, Dan Williams wrote:

> One more sanity check... So in driver subsystems there are cases where
> a device on busA hosts a topology on busB. When that happens there's a
> need to set the lock class late in a driver since busA knows nothing
> about the locking rules of busB.

I'll pretend I konw what you're talking about ;-)

> Since the device has a longer lifetime than a driver when the driver
> exits it must set dev->mutex back to the novalidate class, otherwise
> it sets up a use after free of the static lock_class_key.

I'm not following, static storage has infinite lifetime.

> I came up with this and it seems to work, just want to make sure I'm
> properly using the lock_set_class() API and it is ok to transition
> back and forth from the novalidate case:
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 990b6670222e..32673e1a736d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -405,6 +405,29 @@ struct cxl_nvdimm_bridge
> *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd);
>  #define __mock static
>  #endif
> 
> +#ifdef CONFIG_PROVE_LOCKING
> +static inline void cxl_lock_reset_class(void *_dev)
> +{
> +       struct device *dev = _dev;
> +
> +       lock_set_class(&dev->mutex.dep_map, "__lockdep_no_validate__",
> +                      &__lockdep_no_validate__, 0, _THIS_IP_);
> +}
> +
> +static inline int cxl_lock_set_class(struct device *dev, const char *name,
> +                                    struct lock_class_key *key)
> +{
> +       lock_set_class(&dev->mutex.dep_map, name, key, 0, _THIS_IP_);
> +       return devm_add_action_or_reset(dev, cxl_lock_reset_class, dev);
> +}
> +#else
> +static inline int cxl_lock_set_class(struct device *dev, const char *name,
> +                                    struct lock_class_key *key)
> +{
> +       return 0;
> +}
> +#endif

Under the assumption that the lock is held (lock_set_class() will
actually barf if @lock isn't held) this should indeed work as expected
(although I think you got the @name part 'wrong', I think that's
canonically something like "&dev->mutex" or something).
Dan Williams April 14, 2022, 7:43 p.m. UTC | #6
On Thu, Apr 14, 2022 at 12:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Apr 14, 2022 at 10:17:13AM -0700, Dan Williams wrote:
>
> > One more sanity check... So in driver subsystems there are cases where
> > a device on busA hosts a topology on busB. When that happens there's a
> > need to set the lock class late in a driver since busA knows nothing
> > about the locking rules of busB.
>
> I'll pretend I konw what you're talking about ;-)
>
> > Since the device has a longer lifetime than a driver when the driver
> > exits it must set dev->mutex back to the novalidate class, otherwise
> > it sets up a use after free of the static lock_class_key.
>
> I'm not following, static storage has infinite lifetime.

Not static storage in a driver module.

modprobe -r fancy_lockdep_using_driver.ko

Any use of device_lock() by the core on a device that a driver in this
module was driving will de-reference a now invalid pointer into
whatever memory was vmalloc'd for the module static data.

>
> > I came up with this and it seems to work, just want to make sure I'm
> > properly using the lock_set_class() API and it is ok to transition
> > back and forth from the novalidate case:
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 990b6670222e..32673e1a736d 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -405,6 +405,29 @@ struct cxl_nvdimm_bridge
> > *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd);
> >  #define __mock static
> >  #endif
> >
> > +#ifdef CONFIG_PROVE_LOCKING
> > +static inline void cxl_lock_reset_class(void *_dev)
> > +{
> > +       struct device *dev = _dev;
> > +
> > +       lock_set_class(&dev->mutex.dep_map, "__lockdep_no_validate__",
> > +                      &__lockdep_no_validate__, 0, _THIS_IP_);
> > +}
> > +
> > +static inline int cxl_lock_set_class(struct device *dev, const char *name,
> > +                                    struct lock_class_key *key)
> > +{
> > +       lock_set_class(&dev->mutex.dep_map, name, key, 0, _THIS_IP_);
> > +       return devm_add_action_or_reset(dev, cxl_lock_reset_class, dev);
> > +}
> > +#else
> > +static inline int cxl_lock_set_class(struct device *dev, const char *name,
> > +                                    struct lock_class_key *key)
> > +{
> > +       return 0;
> > +}
> > +#endif
>
> Under the assumption that the lock is held (lock_set_class() will
> actually barf if @lock isn't held) this should indeed work as expected

Nice.

> (although I think you got the @name part 'wrong', I think that's
> canonically something like "&dev->mutex" or something).

Ah, yes, I got it wrong for restoring the same name that results from
lockdep_set_novalidate_class(), will fix.
Peter Zijlstra April 15, 2022, 7:53 a.m. UTC | #7
On Thu, Apr 14, 2022 at 12:43:31PM -0700, Dan Williams wrote:
> On Thu, Apr 14, 2022 at 12:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Apr 14, 2022 at 10:17:13AM -0700, Dan Williams wrote:
> >
> > > One more sanity check... So in driver subsystems there are cases where
> > > a device on busA hosts a topology on busB. When that happens there's a
> > > need to set the lock class late in a driver since busA knows nothing
> > > about the locking rules of busB.
> >
> > I'll pretend I konw what you're talking about ;-)
> >
> > > Since the device has a longer lifetime than a driver when the driver
> > > exits it must set dev->mutex back to the novalidate class, otherwise
> > > it sets up a use after free of the static lock_class_key.
> >
> > I'm not following, static storage has infinite lifetime.
> 
> Not static storage in a driver module.
> 
> modprobe -r fancy_lockdep_using_driver.ko
> 
> Any use of device_lock() by the core on a device that a driver in this
> module was driving will de-reference a now invalid pointer into
> whatever memory was vmalloc'd for the module static data.

Ooh, modules (I always, conveniently, forget they exist). Yes, setting a
lock instance from the core kernel to a key that's inside a module and
then taking the module out will be 'interesting'.

Most likely you'll get a splat from lockdep when doing this.
diff mbox series

Patch

diff --git a/include/linux/device.h b/include/linux/device.h
index af2576ace130..6083e757e804 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -402,6 +402,7 @@  struct dev_msi_info {
  * @mutex:	Mutex to synchronize calls to its driver.
  * @lockdep_mutex: An optional debug lock that a subsystem can use as a
  * 		peer lock to gain localized lockdep coverage of the device_lock.
+ * @lock_class: per-subsystem annotated device lock class
  * @bus:	Type of bus device is on.
  * @driver:	Which driver has allocated this
  * @platform_data: Platform data specific to the device.
@@ -501,6 +502,7 @@  struct device {
 					   dev_set_drvdata/dev_get_drvdata */
 #ifdef CONFIG_PROVE_LOCKING
 	struct mutex		lockdep_mutex;
+	int			lock_class;
 #endif
 	struct mutex		mutex;	/* mutex to synchronize calls to
 					 * its driver.
@@ -762,18 +764,100 @@  static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
 	return !!(dev->power.driver_flags & flags);
 }
 
+static inline void device_lock_assert(struct device *dev)
+{
+	lockdep_assert_held(&dev->mutex);
+}
+
 #ifdef CONFIG_PROVE_LOCKING
 static inline void device_lockdep_init(struct device *dev)
 {
 	mutex_init(&dev->lockdep_mutex);
+	dev->lock_class = -1;
 	lockdep_set_novalidate_class(&dev->mutex);
 }
-#else
+
+static inline void device_lock(struct device *dev)
+{
+	/*
+	 * For double-lock programming errors the kernel will hang
+	 * trying to acquire @dev->mutex before lockdep can report the
+	 * problem acquiring @dev->lockdep_mutex, so manually assert
+	 * before that hang.
+	 */
+	lockdep_assert_not_held(&dev->lockdep_mutex);
+
+	mutex_lock(&dev->mutex);
+	if (dev->lock_class >= 0)
+		mutex_lock_nested(&dev->lockdep_mutex, dev->lock_class);
+}
+
+static inline int device_lock_interruptible(struct device *dev)
+{
+	int rc;
+
+	lockdep_assert_not_held(&dev->lockdep_mutex);
+
+	rc = mutex_lock_interruptible(&dev->mutex);
+	if (rc || dev->lock_class < 0)
+		return rc;
+
+	return mutex_lock_interruptible_nested(&dev->lockdep_mutex,
+					       dev->lock_class);
+}
+
+static inline int device_trylock(struct device *dev)
+{
+	if (mutex_trylock(&dev->mutex)) {
+		if (dev->lock_class >= 0)
+			mutex_lock_nested(&dev->lockdep_mutex, dev->lock_class);
+		return 1;
+	}
+
+	return 0;
+}
+
+static inline void device_unlock(struct device *dev)
+{
+	if (dev->lock_class >= 0)
+		mutex_unlock(&dev->lockdep_mutex);
+	mutex_unlock(&dev->mutex);
+}
+
+/*
+ * Note: this routine expects that the state of @dev->mutex is stable
+ * from entry to exit. There is no support for changing lockdep
+ * validation classes, only enabling and disabling validation.
+ */
+static inline void device_set_lock_class(struct device *dev, int lock_class)
+{
+	/*
+	 * Allow for setting or clearing the lock class while the
+	 * device_lock() is held, in which case the paired nested lock
+	 * might need to be acquired or released now to accommodate the
+	 * next device_unlock().
+	 */
+	if (dev->lock_class < 0 && lock_class >= 0) {
+		/* Enabling lockdep validation... */
+		if (mutex_is_locked(&dev->mutex))
+			mutex_lock_nested(&dev->lockdep_mutex, lock_class);
+	} else if (dev->lock_class >= 0 && lock_class < 0) {
+		/* Disabling lockdep validation... */
+		if (mutex_is_locked(&dev->mutex))
+			mutex_unlock(&dev->lockdep_mutex);
+	} else {
+		dev_warn(dev,
+			 "%s: failed to change lock_class from: %d to %d\n",
+			 __func__, dev->lock_class, lock_class);
+		return;
+	}
+	dev->lock_class = lock_class;
+}
+#else /* !CONFIG_PROVE_LOCKING */
 static inline void device_lockdep_init(struct device *dev)
 {
 	lockdep_set_novalidate_class(&dev->mutex);
 }
-#endif
 
 static inline void device_lock(struct device *dev)
 {
@@ -795,10 +879,10 @@  static inline void device_unlock(struct device *dev)
 	mutex_unlock(&dev->mutex);
 }
 
-static inline void device_lock_assert(struct device *dev)
+static inline void device_set_lock_class(struct device *dev, int lock_class)
 {
-	lockdep_assert_held(&dev->mutex);
 }
+#endif /* CONFIG_PROVE_LOCKING */
 
 static inline struct device_node *dev_of_node(struct device *dev)
 {