Message ID | 20220816080828.1218667-1-vincent.whitchurch@axis.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | iio: buffer: Silence lock nesting splat |
On Tue, Aug 16, 2022 at 1:30 PM Vincent Whitchurch
<vincent.whitchurch@axis.com> wrote:
>
> If an IIO driver uses callbacks from another IIO driver and calls
> iio_channel_start_all_cb() from one of its buffer setup ops, then
> lockdep complains due to the lock nesting, as in the below example with
> lmp91000. Since the locks are being taken on different IIO devices,
> there is no actual deadlock, so add lock nesting annotation to silence
> the spurious warning.
>
> ============================================
> WARNING: possible recursive locking detected
> 6.0.0-rc1+ #10 Not tainted
> --------------------------------------------
> python3/23 is trying to acquire lock:
> 0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180
>
> but task is already holding lock:
> 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&indio_dev->mlock);
> lock(&indio_dev->mlock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 5 locks held by python3/23:
> #0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
> #1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
> #2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
> #3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
> #4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180
>
> stack backtrace:
> CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
> Call Trace:
> dump_stack+0x1a/0x1c
> __lock_acquire.cold+0x407/0x42d
> lock_acquire+0x1ed/0x310
> __mutex_lock+0x72/0xde0
> mutex_lock_nested+0x1d/0x20
> iio_update_buffers+0x62/0x180
> iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
> lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
> __iio_update_buffers+0x50b/0xd80
> enable_store+0x81/0x100
> dev_attr_store+0xf/0x20
> sysfs_kf_write+0x4c/0x70
> kernfs_fop_write_iter+0x179/0x270
> new_sync_write+0x99/0x120
> vfs_write+0x2c1/0x470
> ksys_write+0x67/0x100
> sys_write+0x10/0x20

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-mesages

On top of that, Fixes tag?

> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

--
With Best Regards,
Andy Shevchenko
On Tue, 16 Aug 2022 10:08:28 +0200
Vincent Whitchurch <vincent.whitchurch@axis.com> wrote:

> If an IIO driver uses callbacks from another IIO driver and calls
> iio_channel_start_all_cb() from one of its buffer setup ops, then
> lockdep complains due to the lock nesting, as in the below example with
> lmp91000. Since the locks are being taken on different IIO devices,
> there is no actual deadlock, so add lock nesting annotation to silence
> the spurious warning.
>
> ============================================
> WARNING: possible recursive locking detected
> 6.0.0-rc1+ #10 Not tainted
> --------------------------------------------
> python3/23 is trying to acquire lock:
> 0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180
>
> but task is already holding lock:
> 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&indio_dev->mlock);
> lock(&indio_dev->mlock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 5 locks held by python3/23:
> #0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
> #1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
> #2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
> #3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
> #4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180
>
> stack backtrace:
> CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
> Call Trace:
> dump_stack+0x1a/0x1c
> __lock_acquire.cold+0x407/0x42d
> lock_acquire+0x1ed/0x310
> __mutex_lock+0x72/0xde0
> mutex_lock_nested+0x1d/0x20
> iio_update_buffers+0x62/0x180
> iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
> lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
> __iio_update_buffers+0x50b/0xd80
> enable_store+0x81/0x100
> dev_attr_store+0xf/0x20
> sysfs_kf_write+0x4c/0x70
> kernfs_fop_write_iter+0x179/0x270
> new_sync_write+0x99/0x120
> vfs_write+0x2c1/0x470
> ksys_write+0x67/0x100
> sys_write+0x10/0x20
>
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

I'm wondering if this is sufficient.
At first glance there are a whole bunch of other possible cases of this.
Any consumer driver that calls iio_device_claim_direct_mode() would be a
problem - though I'm not sure any do?

I'm not sure I properly understand lockdep notations, but I thought the
point was we needed to define a hierarchy? To do that here we need
an IIO driver that is a consumer to somehow let the IIO core know that
and mark all calls to the locks appropriately. This gets trickier
as we allow 3+ levels of IIO drivers calling into each other.

We should also think about how to prevent recursion if there are 3
IIO drivers involved.

+CC Peter as most of the fun cases of IIO consumers were from him.

Perhaps this notation is a step in the right direction and we can look
for other problem cases later.

One side note is that it's not immediately obvious that iio_update_buffers()
is called only from consumers (the other paths use __iio_update_buffers()
directly), so if we make this change we should consider renaming that
function or at the very least adding some documentation.

Jonathan

> ---
> drivers/iio/industrialio-buffer.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
> index acc2b6c05d57..27868ed092d0 100644
> --- a/drivers/iio/industrialio-buffer.c
> +++ b/drivers/iio/industrialio-buffer.c
> @@ -1255,7 +1255,7 @@ int iio_update_buffers(struct iio_dev *indio_dev,
>  		return -EINVAL;
>
>  	mutex_lock(&iio_dev_opaque->info_exist_lock);
> -	mutex_lock(&indio_dev->mlock);
> +	mutex_lock_nested(&indio_dev->mlock, SINGLE_DEPTH_NESTING);
>
>  	if (insert_buffer && iio_buffer_is_active(insert_buffer))
>  		insert_buffer = NULL;
On Fri, 19 Aug 2022 11:03:55 +0300
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Tue, Aug 16, 2022 at 1:30 PM Vincent Whitchurch
> <vincent.whitchurch@axis.com> wrote:
> >
> > If an IIO driver uses callbacks from another IIO driver and calls
> > iio_channel_start_all_cb() from one of its buffer setup ops, then
> > lockdep complains due to the lock nesting, as in the below example with
> > lmp91000. Since the locks are being taken on different IIO devices,
> > there is no actual deadlock, so add lock nesting annotation to silence
> > the spurious warning.
> >
> > ============================================
> > WARNING: possible recursive locking detected
> > 6.0.0-rc1+ #10 Not tainted
> > --------------------------------------------
> > python3/23 is trying to acquire lock:
> > 0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180
> >
> > but task is already holding lock:
> > 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&indio_dev->mlock);
> > lock(&indio_dev->mlock);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 5 locks held by python3/23:
> > #0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
> > #1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
> > #2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
> > #3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
> > #4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180
> >
> > stack backtrace:
> > CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
> > Call Trace:
> > dump_stack+0x1a/0x1c
> > __lock_acquire.cold+0x407/0x42d
> > lock_acquire+0x1ed/0x310
> > __mutex_lock+0x72/0xde0
> > mutex_lock_nested+0x1d/0x20
> > iio_update_buffers+0x62/0x180
> > iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
> > lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
> > __iio_update_buffers+0x50b/0xd80
> > enable_store+0x81/0x100
> > dev_attr_store+0xf/0x20
> > sysfs_kf_write+0x4c/0x70
> > kernfs_fop_write_iter+0x179/0x270
> > new_sync_write+0x99/0x120
> > vfs_write+0x2c1/0x470
> > ksys_write+0x67/0x100
> > sys_write+0x10/0x20
>
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-mesages
>
> On top of that, Fixes tag?

It's going to be tricky to identify - the interface predates usecases that were IIO
drivers by a long way. I guess introduction of first IIO driver that used
a callback buffer? No idea which one that was :(

>
> > Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
>
>
> --
> With Best Regards,
> Andy Shevchenko
On 8/20/22 13:06, Jonathan Cameron wrote:
> On Tue, 16 Aug 2022 10:08:28 +0200
> Vincent Whitchurch <vincent.whitchurch@axis.com> wrote:
>
>> If an IIO driver uses callbacks from another IIO driver and calls
>> iio_channel_start_all_cb() from one of its buffer setup ops, then
>> lockdep complains due to the lock nesting, as in the below example with
>> lmp91000. Since the locks are being taken on different IIO devices,
>> there is no actual deadlock, so add lock nesting annotation to silence
>> the spurious warning.
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 6.0.0-rc1+ #10 Not tainted
>> --------------------------------------------
>> python3/23 is trying to acquire lock:
>> 0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180
>>
>> but task is already holding lock:
>> 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(&indio_dev->mlock);
>> lock(&indio_dev->mlock);
>>
>> *** DEADLOCK ***
>>
>> May be due to missing lock nesting notation
>>
>> 5 locks held by python3/23:
>> #0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
>> #1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
>> #2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
>> #3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>> #4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180
>>
>> stack backtrace:
>> CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
>> Call Trace:
>> dump_stack+0x1a/0x1c
>> __lock_acquire.cold+0x407/0x42d
>> lock_acquire+0x1ed/0x310
>> __mutex_lock+0x72/0xde0
>> mutex_lock_nested+0x1d/0x20
>> iio_update_buffers+0x62/0x180
>> iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
>> lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
>> __iio_update_buffers+0x50b/0xd80
>> enable_store+0x81/0x100
>> dev_attr_store+0xf/0x20
>> sysfs_kf_write+0x4c/0x70
>> kernfs_fop_write_iter+0x179/0x270
>> new_sync_write+0x99/0x120
>> vfs_write+0x2c1/0x470
>> ksys_write+0x67/0x100
>> sys_write+0x10/0x20
>>
>> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
> I'm wondering if this is sufficient.
> At first glance there are a whole bunch of other possible cases of this.
> Any consumer driver that calls iio_device_claim_direct_mode() would be a
> problem - though I'm not sure any do?
>
> I'm not sure I properly understand lockdep notations, but I thought the
> point was we needed to define a hierarchy? To do that here we need
> an IIO driver that is a consumer to somehow let the IIO core know that
> and mark all calls to the locks appropriately. This gets trickier
> as we allow 3+ levels of IIO drivers calling into each other.
>
> We should also think about how to prevent recursion if there are 3
> IIO drivers involved.

There are two different approaches for this kind of nested locking. One
is to use mutex_lock_nested(). This works if there is a strict
hierarchy. The I2C framework for example has a function to determine the
position of an I2C mux in the hierarchy and uses that for locking. See
https://elixir.bootlin.com/linux/latest/source/drivers/i2c/i2c-core-base.c#L1151.

I'm not sure this directly translates to IIO since the
consumers/producers don't have to be in a strict hierarchy. And if it
is a complex graph it can be difficult to figure out the right level for
mutex_lock_nested().

The other method is to mark each mutex as its own class. lockdep does
the lock checking based on the lock class and by default the same mutex
of different instances is considered the same class to keep the resource
requirements for the checker lower.

Regmap for example does this. See
https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regmap.c#L795.

This could be a solution for IIO, with the downside of additional
work for the checker. But as long as there are only a few IIO devices
per system that should be OK. We could also set the per-device lock
class only if in-kernel consumers are enabled.
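The difference between the two approaches can be illustrated with a toy userspace model. This is only a sketch and not lockdep itself: it assumes the recursive-locking check reduces to "is a lock with the same (class, subclass) pair already held by this task?", and every toy_* name below is hypothetical. In the kernel, the per-instance variant is expressed with helpers such as lockdep_set_class().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock class key; only its address matters. */
struct toy_class_key { char unused; };

/* A lock instance points at the class it belongs to. */
struct toy_lock { const struct toy_class_key *key; };

struct toy_held {
	const struct toy_class_key *key;
	unsigned int subclass;
};

#define TOY_MAX_HELD 8

/* Locks currently held by one task, in acquisition order. */
struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	size_t depth;
};

/*
 * Simplified recursive-locking check: refuse the acquisition if a lock
 * with the same (class, subclass) pair is already held by this task.
 * Returns true if accepted, false where the toy checker would warn.
 */
bool toy_acquire(struct toy_task *t, const struct toy_lock *l,
		 unsigned int subclass)
{
	for (size_t i = 0; i < t->depth; i++)
		if (t->held[i].key == l->key && t->held[i].subclass == subclass)
			return false;

	assert(t->depth < TOY_MAX_HELD);
	t->held[t->depth].key = l->key;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}

/*
 * Take locks[0..n-1] in order on a fresh task, as a chain of devices
 * calling into each other would. Returns how many acquisitions were
 * accepted before the first warning (n means no warning at all).
 */
size_t toy_lock_chain(const struct toy_lock *locks,
		      const unsigned int *subclasses, size_t n)
{
	struct toy_task task = { .depth = 0 };
	size_t accepted = 0;

	while (accepted < n &&
	       toy_acquire(&task, &locks[accepted], subclasses[accepted]))
		accepted++;
	return accepted;
}
```

In this model, two devices sharing one class with subclass 0 reproduce the reported warning on the second acquisition. Giving the inner lock subclass 1 (the analogue of SINGLE_DEPTH_NESTING) accepts a two-device chain but still trips on a third device that reuses subclass 1. Giving each device its own key accepts a chain of any depth while still rejecting a second acquisition of the very same device lock.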
On Sat, Aug 20, 2022 at 01:08:00PM +0200, Jonathan Cameron wrote:
> On Fri, 19 Aug 2022 11:03:55 +0300
> Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > On top of that, Fixes tag?
> It's going to be tricky to identify - the interface predates usecases that were IIO
> drivers by a long way. I guess introduction of first IIO driver that used
> a callback buffer? No idea which one that was :(

AFAICS there's only one IIO driver upstream using a callback buffer, and
it's lmp91000, so I can point the fixes tag to the patch which added
that.

By the way, note that lmp91000 actually fails to probe in mainline
without extra patches, and it seems to have been that way for a while
now. I noticed this lockdep splat when working on a new driver which
also uses a callback buffer. I can post the fixes I used to get
lmp91000 to probe successfully (in roadtest) separately.
On Sat, Aug 20, 2022 at 01:08:28PM +0200, Lars-Peter Clausen wrote:
> There are two different approaches for this kind of nested locking. One
> is to use mutex_lock_nested(). This works if there is a strict
> hierarchy. The I2C framework for example has a function to determine the
> position of an I2C mux in the hierarchy and uses that for locking. See
> https://elixir.bootlin.com/linux/latest/source/drivers/i2c/i2c-core-base.c#L1151.
>
> I'm not sure this directly translates to IIO since the
> consumers/producers don't have to be in a strict hierarchy. And if it
> is a complex graph it can be difficult to figure out the right level for
> mutex_lock_nested().
>
> The other method is to mark each mutex as its own class. lockdep does
> the lock checking based on the lock class and by default the same mutex
> of different instances is considered the same class to keep the resource
> requirements for the checker lower.
>
> Regmap for example does this. See
> https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regmap.c#L795.
>
> This could be a solution for IIO, with the downside of additional
> work for the checker. But as long as there are only a few IIO devices
> per system that should be OK. We could also set the per-device lock
> class only if in-kernel consumers are enabled.

The second method certainly sounds like a better fix, since it also
still warns if one actually takes the same iio_dev mutex twice. I'll
respin the patch. Thanks.
On Tue, 23 Aug 2022 10:10:45 +0200
Vincent Whitchurch <vincent.whitchurch@axis.com> wrote:

> On Sat, Aug 20, 2022 at 01:08:00PM +0200, Jonathan Cameron wrote:
> > On Fri, 19 Aug 2022 11:03:55 +0300
> > Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > > On top of that, Fixes tag?
> > It's going to be tricky to identify - the interface predates usecases that were IIO
> > drivers by a long way. I guess introduction of first IIO driver that used
> > a callback buffer? No idea which one that was :(
>
> AFAICS there's only one IIO driver upstream using a callback buffer, and
> it's lmp91000, so I can point the fixes tag to the patch which added
> that.

Ah. That's handy.

>
> By the way, note that lmp91000 actually fails to probe in mainline
> without extra patches, and it seems to have been that way for a while
> now. I noticed this lockdep splat when working on a new driver which
> also uses a callback buffer. I can post the fixes I used to get
> lmp91000 to probe successfully (in roadtest) separately.

That would be great. Unfortunately drivers sometimes bit rot without testing.

Any plans to post updated roadtest soon? I'm keen to add more test cases
and use it to clean up the remaining staging drivers. Very helpful tool,
but I don't want to be developing test sets against an old version if
it's going to be costly to forward port it.

No great rush though as I've bitten off a bit more than I was aiming to in prep
for plumbers so not going to get anything significant done in IIO until mid September.

Thanks,

Jonathan
On Sun, Aug 28, 2022 at 05:32:47PM +0200, Jonathan Cameron wrote:
> Any plans to post updated roadtest soon? I'm keen to add more test cases
> and use it to clean up the remaining staging drivers. Very helpful tool,
> but I don't want to be developing test sets against an old version if
> it's going to be costly to forward port it.

I don't think it's ready for a new posting, but I pushed the version I
currently have here:

https://github.com/vwax/linux/commits/roadtest/devel

The tests only pass on v5.19 since there are a couple of regressions
(which affect real hardware too) in mainline:

https://lore.kernel.org/all/a48011b9-a551-3547-34b6-98b10e7ff2eb@redhat.com/
https://lore.kernel.org/all/YxBX4bXG02E4lSUW@axis.com/
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index acc2b6c05d57..27868ed092d0 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -1255,7 +1255,7 @@ int iio_update_buffers(struct iio_dev *indio_dev,
 		return -EINVAL;
 
 	mutex_lock(&iio_dev_opaque->info_exist_lock);
-	mutex_lock(&indio_dev->mlock);
+	mutex_lock_nested(&indio_dev->mlock, SINGLE_DEPTH_NESTING);
 
 	if (insert_buffer && iio_buffer_is_active(insert_buffer))
 		insert_buffer = NULL;
If an IIO driver uses callbacks from another IIO driver and calls
iio_channel_start_all_cb() from one of its buffer setup ops, then
lockdep complains due to the lock nesting, as in the below example with
lmp91000. Since the locks are being taken on different IIO devices,
there is no actual deadlock, so add lock nesting annotation to silence
the spurious warning.

============================================
WARNING: possible recursive locking detected
6.0.0-rc1+ #10 Not tainted
--------------------------------------------
python3/23 is trying to acquire lock:
0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180

but task is already holding lock:
00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&indio_dev->mlock);
lock(&indio_dev->mlock);

*** DEADLOCK ***

May be due to missing lock nesting notation

5 locks held by python3/23:
#0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
#1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
#2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
#3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
#4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180

stack backtrace:
CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
Call Trace:
dump_stack+0x1a/0x1c
__lock_acquire.cold+0x407/0x42d
lock_acquire+0x1ed/0x310
__mutex_lock+0x72/0xde0
mutex_lock_nested+0x1d/0x20
iio_update_buffers+0x62/0x180
iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
__iio_update_buffers+0x50b/0xd80
enable_store+0x81/0x100
dev_attr_store+0xf/0x20
sysfs_kf_write+0x4c/0x70
kernfs_fop_write_iter+0x179/0x270
new_sync_write+0x99/0x120
vfs_write+0x2c1/0x470
ksys_write+0x67/0x100
sys_write+0x10/0x20

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
---
 drivers/iio/industrialio-buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)