[v2] platform/chrome: cros_ec: Use per-device lockdep key

Message ID 20230111074146.2624496-1-wenst@chromium.org (mailing list archive)
State Accepted
Commit 961a325becd9a142ae5c8b258e5c2f221f8bfac8

Commit Message

Chen-Yu Tsai Jan. 11, 2023, 7:41 a.m. UTC
Lockdep reports a bogus possible deadlock on MT8192 Chromebooks due to
the following lock sequences:

1. lock(i2c_register_adapter) [1]; lock(&ec_dev->lock)
2. lock(&ec_dev->lock); lock(prepare_lock);

The actual dependency chains are much longer. The shortened version
looks somewhat like:

1. cros-ec-rpmsg on mtk-scp
   ec_dev->lock -> prepare_lock
2. In rt5682_i2c_probe() on native I2C bus:
   prepare_lock -> regmap->lock -> (possibly) i2c_adapter->bus_lock
3. In rt5682_i2c_probe() on native I2C bus:
   regmap->lock -> i2c_adapter->bus_lock
4. In sbs_probe() on i2c-cros-ec-tunnel I2C bus attached on cros-ec:
   i2c_adapter->bus_lock -> ec_dev->lock

While lockdep is correct that the shared lockdep classes have a circular
dependency, it is bogus because

  a) 2+3 happen on a native I2C bus
  b) 4 happens on the actual EC on ChromeOS devices
  c) 1 happens on the SCP coprocessor on MediaTek Chromebooks that just
     happens to expose a cros-ec interface, but does not have an
     i2c-cros-ec-tunnel I2C bus

In short, the "dependencies" are actually on different devices.

Set up a per-device lockdep key for cros_ec devices so lockdep can tell
the two instances apart. This helps with getting rid of the bogus
lockdep warning. For ChromeOS devices that only have one cros-ec
instance this doesn't change anything.
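
To illustrate why splitting the lock class helps, here is a minimal
userspace sketch that models lock classes as nodes in a dependency graph
and records the four shortened chains listed above. It is only a model,
not lockdep's actual implementation; the names dep_graph, add_dep,
has_cycle and record_chains are hypothetical, as is the choice of two EC
class slots (EC_LOCK_A, EC_LOCK_B).

```c
#include <assert.h>
#include <stdbool.h>

/* One node per lock class. EC_LOCK_B only exists with per-device keys. */
enum lock_class {
	PREPARE_LOCK, REGMAP_LOCK, I2C_BUS_LOCK, EC_LOCK_A, EC_LOCK_B,
	NUM_CLASSES
};

/* edge[a][b] means "a lock of class a was held while taking class b". */
struct dep_graph {
	bool edge[NUM_CLASSES][NUM_CLASSES];
};

static void add_dep(struct dep_graph *g, int held, int taken)
{
	assert(held >= 0 && held < NUM_CLASSES);
	assert(taken >= 0 && taken < NUM_CLASSES);
	g->edge[held][taken] = true;
}

/* Depth-first search along recorded edges from 'from' towards 'to'. */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* A cycle exists if some edge a -> b also has a path b -> ... -> a. */
static bool has_cycle(const struct dep_graph *g)
{
	for (int a = 0; a < NUM_CLASSES; a++)
		for (int b = 0; b < NUM_CLASSES; b++) {
			bool seen[NUM_CLASSES] = { false };

			if (g->edge[a][b] && reachable(g, b, a, seen))
				return true;
		}
	return false;
}

/*
 * Record the shortened chains from the commit message. 'ec_scp' is the
 * class of ec_dev->lock on the SCP, 'ec_real' the class on the actual EC.
 */
static void record_chains(struct dep_graph *g, int ec_scp, int ec_real)
{
	add_dep(g, ec_scp, PREPARE_LOCK);       /* chain 1 */
	add_dep(g, PREPARE_LOCK, REGMAP_LOCK);  /* chain 2 */
	add_dep(g, REGMAP_LOCK, I2C_BUS_LOCK);  /* chains 2 and 3 */
	add_dep(g, I2C_BUS_LOCK, ec_real);      /* chain 4 */
}
```

Calling record_chains(&g, EC_LOCK_A, EC_LOCK_A) models the shared class
and makes has_cycle() report a cycle; record_chains(&g, EC_LOCK_A,
EC_LOCK_B) models per-device keys and leaves the graph acyclic, even
though the I2C bus lock class is still shared.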

Also add a missing mutex_destroy, just to make the teardown complete.
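
The teardown ordering can be sketched in userspace as follows. This is a
simplified model of the goto-based unwinding, not the driver code:
fake_dev, fake_register, fake_unregister and the fail_point values are
hypothetical, and the boolean fields merely stand in for the real
lockdep_register_key()/mutex_init() state.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the pieces of device state that need unwinding. */
struct fake_dev {
	bool key_registered;  /* models lockdep_register_key() */
	bool lock_ready;      /* models mutex_init() */
	bool child_added;     /* models a registered child device */
};

/* Where the simulated registration should fail, if at all. */
enum fail_point { FAIL_NONE, FAIL_QUERY, FAIL_LATE };

static int fake_register(struct fake_dev *dev, enum fail_point fail)
{
	int err;

	dev->key_registered = true;
	dev->lock_ready = true;

	/* Early failure: only the lock and its key need undoing. */
	if (fail == FAIL_QUERY) {
		err = -EIO;
		goto destroy_lock;
	}

	dev->child_added = true;

	/* Late failure: undo the child first, then fall through. */
	if (fail == FAIL_LATE) {
		err = -ENOMEM;
		goto remove_child;
	}

	return 0;

remove_child:
	dev->child_added = false;
destroy_lock:
	/* Same order as the patch: destroy the lock, then drop its key. */
	dev->lock_ready = false;
	dev->key_registered = false;
	return err;
}

static void fake_unregister(struct fake_dev *dev)
{
	dev->child_added = false;
	dev->lock_ready = false;
	dev->key_registered = false;
}
```

Every exit after the lock is set up passes through the destroy_lock
label, so a failed registration leaves the lock and its key in the same
state as a full unregister.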

[1] This is likely the per I2C bus lock with shared lockdep class

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
Changes since v1:
- Changed subject prefix from "chromeos" to "chrome"
- Changed "passthrough I2C bus" to exact name, i2c-cros-ec-tunnel
- Added kerneldoc for new "lockdep_key" field

 drivers/platform/chrome/cros_ec.c           | 14 +++++++++++---
 include/linux/platform_data/cros_ec_proto.h |  4 ++++
 2 files changed, 15 insertions(+), 3 deletions(-)

Comments

Chen-Yu Tsai Jan. 11, 2023, 8:47 a.m. UTC | #1
On Wed, Jan 11, 2023 at 3:41 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> Lockdep reports a bogus possible deadlock on MT8192 Chromebooks due to
> the following lock sequences:
>
> 1. lock(i2c_register_adapter) [1]; lock(&ec_dev->lock)
> 2. lock(&ec_dev->lock); lock(prepare_lock);
>
> The actual dependency chains are much longer. The shortened version
> looks somewhat like:
>
> 1. cros-ec-rpmsg on mtk-scp
>    ec_dev->lock -> prepare_lock
> 2. In rt5682_i2c_probe() on native I2C bus:
>    prepare_lock -> regmap->lock -> (possibly) i2c_adapter->bus_lock
> 3. In rt5682_i2c_probe() on native I2C bus:
>    regmap->lock -> i2c_adapter->bus_lock
> 4. In sbs_probe() on i2c-cros-ec-tunnel I2C bus attached on cros-ec:
>    i2c_adapter->bus_lock -> ec_dev->lock
>
> While lockdep is correct that the shared lockdep classes have a circular
> dependency, it is bogus because
>
>   a) 2+3 happen on a native I2C bus
>   b) 4 happens on the actual EC on ChromeOS devices
>   c) 1 happens on the SCP coprocessor on MediaTek Chromebooks that just
>      happens to expose a cros-ec interface, but does not have an
>      i2c-cros-ec-tunnel I2C bus
>
> In short, the "dependencies" are actually on different devices.
>
> Setup a per-device lockdep key for cros_ec devices so lockdep can tell
> the two instances apart. This helps with getting rid of the bogus
> lockdep warning. For ChromeOS devices that only have one cros-ec
> instance this doesn't change anything.

Actually, hold off on this for a bit. I just realized this makes the
kernel give a big warning:

INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.

CPU: 0 PID: 99 Comm: kworker/u16:3 Not tainted
6.2.0-rc3-next-20230111-04021-g65853aed7123-dirty #472
8115f54190814e6abf2d53f6a2194c1af0b27040
Hardware name: Google juniper sku16 board (DT)
Workqueue: events_unbound async_run_entry_fn
Call trace:
 dump_backtrace.part.0+0xb4/0xf8
 show_stack+0x20/0x38
 dump_stack_lvl+0x88/0xb4
 dump_stack+0x18/0x34
 register_lock_class+0x16c/0x40c
 __lock_acquire+0xa0/0x1064
 lock_acquire+0x1f0/0x2f0
 down_write+0x5c/0x80
 __blocking_notifier_chain_register+0x64/0x84
 blocking_notifier_chain_register+0x1c/0x28
 cros_ec_debugfs_probe+0x218/0x3ac
 platform_probe+0x70/0xc4
 really_probe+0x158/0x290
 __driver_probe_device+0xc8/0xe0
 driver_probe_device+0x44/0x100
 __device_attach_driver+0x64/0xdc
 bus_for_each_drv+0xa0/0xc8
 __device_attach_async_helper+0x70/0xc4
 async_run_entry_fn+0x3c/0xe4
 process_one_work+0x2d0/0x48c
 worker_thread+0x204/0x274
 kthread+0xe8/0xf8
 ret_from_fork+0x10/0x20

Chen-Yu Tsai Jan. 11, 2023, 9:03 a.m. UTC | #2
On Wed, Jan 11, 2023 at 4:47 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> Actually, hold off on this for a bit. I just realized this makes the
> kernel give a big warning:
>
> INFO: trying to register non-static key.
> The code is fine but needs lockdep annotation, or maybe
> you didn't initialize this object before use?
> turning off the locking correctness validator.
>
> CPU: 0 PID: 99 Comm: kworker/u16:3 Not tainted
> 6.2.0-rc3-next-20230111-04021-g65853aed7123-dirty #472
> 8115f54190814e6abf2d53f6a2194c1af0b27040
> Hardware name: Google juniper sku16 board (DT)
> Workqueue: events_unbound async_run_entry_fn
> Call trace:
>  dump_backtrace.part.0+0xb4/0xf8
>  show_stack+0x20/0x38
>  dump_stack_lvl+0x88/0xb4
>  dump_stack+0x18/0x34
>  register_lock_class+0x16c/0x40c
>  __lock_acquire+0xa0/0x1064
>  lock_acquire+0x1f0/0x2f0
>  down_write+0x5c/0x80
>  __blocking_notifier_chain_register+0x64/0x84
>  blocking_notifier_chain_register+0x1c/0x28
>  cros_ec_debugfs_probe+0x218/0x3ac
>  platform_probe+0x70/0xc4
>  really_probe+0x158/0x290
>  __driver_probe_device+0xc8/0xe0
>  driver_probe_device+0x44/0x100
>  __device_attach_driver+0x64/0xdc
>  bus_for_each_drv+0xa0/0xc8
>  __device_attach_async_helper+0x70/0xc4
>  async_run_entry_fn+0x3c/0xe4
>  process_one_work+0x2d0/0x48c
>  worker_thread+0x204/0x274
>  kthread+0xe8/0xf8
>  ret_from_fork+0x10/0x20

I think this is caused by

    d90fa2c64d59 platform/chrome: cros_ec: Poll EC log on EC panic

That commit is missing a BLOCKING_INIT_NOTIFIER_HEAD() call.
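
For context on the warning text: as far as I understand it, lockdep will
only track a lock whose class key is either a static object or a
dynamically registered key, and a lock in heap memory that never went
through its init helper has neither. A rough userspace model of that
check is below. All names (lock_key, key_registry, key_register,
key_unregister, lock_can_be_tracked) are hypothetical and are not kernel
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DYNAMIC_KEYS 16

/* A class key; is_static models "lives in static kernel/module data". */
struct lock_key {
	bool is_static;
};

/* Models the set of keys added at runtime (cf. lockdep_register_key()). */
struct key_registry {
	const struct lock_key *keys[MAX_DYNAMIC_KEYS];
	size_t count;
};

static bool key_is_registered(const struct key_registry *reg,
			      const struct lock_key *key)
{
	for (size_t i = 0; i < reg->count; i++)
		if (reg->keys[i] == key)
			return true;
	return false;
}

/* Returns false if the key is already present or the table is full. */
static bool key_register(struct key_registry *reg, const struct lock_key *key)
{
	assert(reg && key);
	if (key_is_registered(reg, key) || reg->count == MAX_DYNAMIC_KEYS)
		return false;
	reg->keys[reg->count++] = key;
	return true;
}

/* Removes the key by moving the last entry into its slot. */
static bool key_unregister(struct key_registry *reg, const struct lock_key *key)
{
	assert(reg && key);
	for (size_t i = 0; i < reg->count; i++) {
		if (reg->keys[i] == key) {
			reg->keys[i] = reg->keys[--reg->count];
			return true;
		}
	}
	return false;
}

/*
 * A NULL key models a lock that was never initialized; a non-static,
 * unregistered key models the "non-static key" case from the warning.
 */
static bool lock_can_be_tracked(const struct key_registry *reg,
				const struct lock_key *key)
{
	if (!key)
		return false;
	return key->is_static || key_is_registered(reg, key);
}
```

In this model a dynamic key is rejected until key_register() has been
called on it, which is the role lockdep_register_key() plays for
ec_dev->lockdep_key in the patch, and an uninitialized lock is rejected
outright.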

ChenYu
Tzung-Bi Shih Jan. 11, 2023, 9:06 a.m. UTC | #3
On Wed, Jan 11, 2023 at 05:03:22PM +0800, Chen-Yu Tsai wrote:
> On Wed, Jan 11, 2023 at 4:47 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
> >
> > Actually, hold off on this for a bit. I just realized this makes the
> > kernel give a big warning:
> >
> > INFO: trying to register non-static key.
> > The code is fine but needs lockdep annotation, or maybe
> > you didn't initialize this object before use?
> > turning off the locking correctness validator.
> >
> > CPU: 0 PID: 99 Comm: kworker/u16:3 Not tainted
> > 6.2.0-rc3-next-20230111-04021-g65853aed7123-dirty #472
> > 8115f54190814e6abf2d53f6a2194c1af0b27040
> > Hardware name: Google juniper sku16 board (DT)
> > Workqueue: events_unbound async_run_entry_fn
> > Call trace:
> >  dump_backtrace.part.0+0xb4/0xf8
> >  show_stack+0x20/0x38
> >  dump_stack_lvl+0x88/0xb4
> >  dump_stack+0x18/0x34
> >  register_lock_class+0x16c/0x40c
> >  __lock_acquire+0xa0/0x1064
> >  lock_acquire+0x1f0/0x2f0
> >  down_write+0x5c/0x80
> >  __blocking_notifier_chain_register+0x64/0x84
> >  blocking_notifier_chain_register+0x1c/0x28
> >  cros_ec_debugfs_probe+0x218/0x3ac
> >  platform_probe+0x70/0xc4
> >  really_probe+0x158/0x290
> >  __driver_probe_device+0xc8/0xe0
> >  driver_probe_device+0x44/0x100
> >  __device_attach_driver+0x64/0xdc
> >  bus_for_each_drv+0xa0/0xc8
> >  __device_attach_async_helper+0x70/0xc4
> >  async_run_entry_fn+0x3c/0xe4
> >  process_one_work+0x2d0/0x48c
> >  worker_thread+0x204/0x274
> >  kthread+0xe8/0xf8
> >  ret_from_fork+0x10/0x20
> 
> I think this is caused by
> 
>     d90fa2c64d59 platform/chrome: cros_ec: Poll EC log on EC panic
> 
> That commit is missing a BLOCKING_INIT_NOTIFIER_HEAD() call.

Yes.  https://patchwork.kernel.org/project/chrome-platform/patch/20230110221033.7441-1-m.szyprowski@samsung.com/
patchwork-bot+chrome-platform@kernel.org Jan. 13, 2023, 9:20 p.m. UTC | #4
Hello:

This patch was applied to chrome-platform/linux.git (for-kernelci)
by Tzung-Bi Shih <tzungbi@kernel.org>:

On Wed, 11 Jan 2023 15:41:46 +0800 you wrote:
> Lockdep reports a bogus possible deadlock on MT8192 Chromebooks due to
> the following lock sequences:
> 
> 1. lock(i2c_register_adapter) [1]; lock(&ec_dev->lock)
> 2. lock(&ec_dev->lock); lock(prepare_lock);
> 
> The actual dependency chains are much longer. The shortened version
> looks somewhat like:
> 
> [...]

Here is the summary with links:
  - [v2] platform/chrome: cros_ec: Use per-device lockdep key
    https://git.kernel.org/chrome-platform/c/961a325becd9

You are awesome, thank you!
patchwork-bot+chrome-platform@kernel.org Jan. 15, 2023, 12:50 p.m. UTC | #5
Hello:

This patch was applied to chrome-platform/linux.git (for-next)
by Tzung-Bi Shih <tzungbi@kernel.org>:

On Wed, 11 Jan 2023 15:41:46 +0800 you wrote:
> Lockdep reports a bogus possible deadlock on MT8192 Chromebooks due to
> the following lock sequences:
> 
> 1. lock(i2c_register_adapter) [1]; lock(&ec_dev->lock)
> 2. lock(&ec_dev->lock); lock(prepare_lock);
> 
> The actual dependency chains are much longer. The shortened version
> looks somewhat like:
> 
> [...]

Here is the summary with links:
  - [v2] platform/chrome: cros_ec: Use per-device lockdep key
    https://git.kernel.org/chrome-platform/c/961a325becd9

You are awesome, thank you!
Patch

diff --git a/drivers/platform/chrome/cros_ec.c b/drivers/platform/chrome/cros_ec.c
index ec733f683f34..4ae57820afd5 100644
--- a/drivers/platform/chrome/cros_ec.c
+++ b/drivers/platform/chrome/cros_ec.c
@@ -198,12 +198,14 @@  int cros_ec_register(struct cros_ec_device *ec_dev)
 	if (!ec_dev->dout)
 		return -ENOMEM;
 
+	lockdep_register_key(&ec_dev->lockdep_key);
 	mutex_init(&ec_dev->lock);
+	lockdep_set_class(&ec_dev->lock, &ec_dev->lockdep_key);
 
 	err = cros_ec_query_all(ec_dev);
 	if (err) {
 		dev_err(dev, "Cannot identify the EC: error %d\n", err);
-		return err;
+		goto destroy_mutex;
 	}
 
 	if (ec_dev->irq > 0) {
@@ -215,7 +217,7 @@  int cros_ec_register(struct cros_ec_device *ec_dev)
 		if (err) {
 			dev_err(dev, "Failed to request IRQ %d: %d\n",
 				ec_dev->irq, err);
-			return err;
+			goto destroy_mutex;
 		}
 	}
 
@@ -226,7 +228,8 @@  int cros_ec_register(struct cros_ec_device *ec_dev)
 	if (IS_ERR(ec_dev->ec)) {
 		dev_err(ec_dev->dev,
 			"Failed to create CrOS EC platform device\n");
-		return PTR_ERR(ec_dev->ec);
+		err = PTR_ERR(ec_dev->ec);
+		goto destroy_mutex;
 	}
 
 	if (ec_dev->max_passthru) {
@@ -292,6 +295,9 @@  int cros_ec_register(struct cros_ec_device *ec_dev)
 exit:
 	platform_device_unregister(ec_dev->ec);
 	platform_device_unregister(ec_dev->pd);
+destroy_mutex:
+	mutex_destroy(&ec_dev->lock);
+	lockdep_unregister_key(&ec_dev->lockdep_key);
 	return err;
 }
 EXPORT_SYMBOL(cros_ec_register);
@@ -309,6 +315,8 @@  void cros_ec_unregister(struct cros_ec_device *ec_dev)
 	if (ec_dev->pd)
 		platform_device_unregister(ec_dev->pd);
 	platform_device_unregister(ec_dev->ec);
+	mutex_destroy(&ec_dev->lock);
+	lockdep_unregister_key(&ec_dev->lockdep_key);
 }
 EXPORT_SYMBOL(cros_ec_unregister);
 
diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h
index 017d502ed66e..3db26c891d5c 100644
--- a/include/linux/platform_data/cros_ec_proto.h
+++ b/include/linux/platform_data/cros_ec_proto.h
@@ -9,6 +9,7 @@ 
 #define __LINUX_CROS_EC_PROTO_H
 
 #include <linux/device.h>
+#include <linux/lockdep_types.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 
@@ -122,6 +123,8 @@  struct cros_ec_command {
  *            command. The caller should check msg.result for the EC's result
  *            code.
  * @pkt_xfer: Send packet to EC and get response.
+ * @lockdep_key: Lockdep class for each instance. Unused if CONFIG_LOCKDEP is
+ *		 not enabled.
  * @lock: One transaction at a time.
  * @mkbp_event_supported: 0 if MKBP not supported. Otherwise its value is
  *                        the maximum supported version of the MKBP host event
@@ -166,6 +169,7 @@  struct cros_ec_device {
 			struct cros_ec_command *msg);
 	int (*pkt_xfer)(struct cros_ec_device *ec,
 			struct cros_ec_command *msg);
+	struct lock_class_key lockdep_key;
 	struct mutex lock;
 	u8 mkbp_event_supported;
 	bool host_sleep_v1;