Message ID | 20220721130329.3787211-1-sudeep.holla@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | coresight: Fix possible deadlock with lock dependency | expand |
Hi Sudeep, On Thu, 21 Jul 2022 at 14:03, Sudeep Holla <sudeep.holla@arm.com> wrote: > > With lockdeps enabled, we get the following warning: > > ====================================================== > WARNING: possible circular locking dependency detected > ------------------------------------------------------ > kworker/u12:1/53 is trying to acquire lock: > ffff80000adce220 (coresight_mutex){+.+.}-{4:4}, at: coresight_set_assoc_ectdev_mutex+0x3c/0x5c > but task is already holding lock: > ffff80000add1f60 (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394 > > which lock already depends on the new lock. > the existing dependency chain (in reverse order) is: > > -> #1 (ect_mutex){+.+.}-{4:4}: > __mutex_lock_common+0xd8/0xe60 > mutex_lock_nested+0x44/0x50 > cti_add_assoc_to_csdev+0x4c/0x184 > coresight_register+0x2f0/0x314 > tmc_probe+0x33c/0x414 > > -> #0 (coresight_mutex){+.+.}-{4:4}: > __lock_acquire+0x1a20/0x32d0 > lock_acquire+0x160/0x308 > __mutex_lock_common+0xd8/0xe60 > mutex_lock_nested+0x44/0x50 > coresight_set_assoc_ectdev_mutex+0x3c/0x5c > cti_update_conn_xrefs+0x6c/0xf8 > cti_probe+0x33c/0x394 > > other info that might help us debug this: > Possible unsafe locking scenario: > CPU0 CPU1 > ---- ---- > lock(ect_mutex); > lock(coresight_mutex); > lock(ect_mutex); > lock(coresight_mutex); > *** DEADLOCK *** > > 4 locks held by kworker/u12:1/53: > #0: ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1fc/0x63c > #1: (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x228/0x63c > #2: (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a8 > #3: (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394 > > To fix the same, call cti_add_assoc_to_csdev without the holding > coresight_mutex and confine the locking while setting the associated > ect / cti device using coresight_set_assoc_ectdev_mutex(). > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Cc: Mike Leach <mike.leach@linaro.org> > Cc: Leo Yan <leo.yan@linaro.org> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> > --- > drivers/hwtracing/coresight/coresight-core.c | 7 ++++--- > drivers/hwtracing/coresight/coresight-cti-core.c | 5 +++-- > 2 files changed, 7 insertions(+), 5 deletions(-) > > Hi, > > I am not sure if I missed to consider something, but this is something > I think could be the fix. > > Regards, > Sudeep > I have been running with lockdep testing coresight for a number of weeks now, without encountering the same issue with CTIs. That said, I am on a DB410, and the code in question is designed to take different paths depending on whether the CTI or other device is discovered first, so this could well be something different in the discovery order between juno and DB410. The change looks perfectly fine to me. As such: Reviewed-by: Mike Leach <mike.leach@linaro.org> > diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c > index 1edfec1e9d18..275b4dce8401 100644 > --- a/drivers/hwtracing/coresight/coresight-core.c > +++ b/drivers/hwtracing/coresight/coresight-core.c > @@ -1659,14 +1659,15 @@ struct coresight_device *coresight_register(struct coresight_desc *desc) > ret = coresight_fixup_device_conns(csdev); > if (!ret) > ret = coresight_fixup_orphan_conns(csdev); > - if (!ret && cti_assoc_ops && cti_assoc_ops->add) > - cti_assoc_ops->add(csdev); > > out_unlock: > mutex_unlock(&coresight_mutex); > /* Success */ > - if (!ret) > + if (!ret) { > + if (cti_assoc_ops && cti_assoc_ops->add) > + cti_assoc_ops->add(csdev); > return csdev; > + } > > /* Unregister the device if needed */ > if (registered) { > diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c > index 8988b2ed2ea6..1be92342b5b9 100644 > --- a/drivers/hwtracing/coresight/coresight-cti-core.c > +++ b/drivers/hwtracing/coresight/coresight-cti-core.c > @@ -541,7 +541,7 @@ cti_match_fixup_csdev(struct cti_device *ctidev, const char *node_name, > /* > * Search the cti list to add an associated CTI into the supplied CS device > * This will set the association if CTI declared before the CS device. > - * (called from coresight_register() with coresight_mutex locked). > + * (called from coresight_register() without coresight_mutex locked). > */ > static void cti_add_assoc_to_csdev(struct coresight_device *csdev) > { > @@ -569,7 +569,8 @@ static void cti_add_assoc_to_csdev(struct coresight_device *csdev) > * if we found a matching csdev then update the ECT > * association pointer for the device with this CTI. > */ > - csdev->ect_dev = ect_item->csdev; > + coresight_set_assoc_ectdev_mutex(csdev->ect_dev, > + ect_item->csdev); > break; > } > } > -- > 2.37.1 >
On 21/07/2022 14:03, Sudeep Holla wrote: > With lockdeps enabled, we get the following warning: > > ====================================================== > WARNING: possible circular locking dependency detected > ------------------------------------------------------ > kworker/u12:1/53 is trying to acquire lock: > ffff80000adce220 (coresight_mutex){+.+.}-{4:4}, at: coresight_set_assoc_ectdev_mutex+0x3c/0x5c > but task is already holding lock: > ffff80000add1f60 (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394 > > which lock already depends on the new lock. > the existing dependency chain (in reverse order) is: > > -> #1 (ect_mutex){+.+.}-{4:4}: > __mutex_lock_common+0xd8/0xe60 > mutex_lock_nested+0x44/0x50 > cti_add_assoc_to_csdev+0x4c/0x184 > coresight_register+0x2f0/0x314 > tmc_probe+0x33c/0x414 > > -> #0 (coresight_mutex){+.+.}-{4:4}: > __lock_acquire+0x1a20/0x32d0 > lock_acquire+0x160/0x308 > __mutex_lock_common+0xd8/0xe60 > mutex_lock_nested+0x44/0x50 > coresight_set_assoc_ectdev_mutex+0x3c/0x5c > cti_update_conn_xrefs+0x6c/0xf8 > cti_probe+0x33c/0x394 > > other info that might help us debug this: > Possible unsafe locking scenario: > CPU0 CPU1 > ---- ---- > lock(ect_mutex); > lock(coresight_mutex); > lock(ect_mutex); > lock(coresight_mutex); > *** DEADLOCK *** > > 4 locks held by kworker/u12:1/53: > #0: ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1fc/0x63c > #1: (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x228/0x63c > #2: (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a8 > #3: (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394 > > To fix the same, call cti_add_assoc_to_csdev without the holding > coresight_mutex and confine the locking while setting the associated > ect / cti device using coresight_set_assoc_ectdev_mutex(). > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Cc: Mike Leach <mike.leach@linaro.org> > Cc: Leo Yan <leo.yan@linaro.org> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Thanks for the fix. I will push this out at rc1 as fixes. Thanks Suzuki
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 1edfec1e9d18..275b4dce8401 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -1659,14 +1659,15 @@ struct coresight_device *coresight_register(struct coresight_desc *desc) ret = coresight_fixup_device_conns(csdev); if (!ret) ret = coresight_fixup_orphan_conns(csdev); - if (!ret && cti_assoc_ops && cti_assoc_ops->add) - cti_assoc_ops->add(csdev); out_unlock: mutex_unlock(&coresight_mutex); /* Success */ - if (!ret) + if (!ret) { + if (cti_assoc_ops && cti_assoc_ops->add) + cti_assoc_ops->add(csdev); return csdev; + } /* Unregister the device if needed */ if (registered) { diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c index 8988b2ed2ea6..1be92342b5b9 100644 --- a/drivers/hwtracing/coresight/coresight-cti-core.c +++ b/drivers/hwtracing/coresight/coresight-cti-core.c @@ -541,7 +541,7 @@ cti_match_fixup_csdev(struct cti_device *ctidev, const char *node_name, /* * Search the cti list to add an associated CTI into the supplied CS device * This will set the association if CTI declared before the CS device. - * (called from coresight_register() with coresight_mutex locked). + * (called from coresight_register() without coresight_mutex locked). */ static void cti_add_assoc_to_csdev(struct coresight_device *csdev) { @@ -569,7 +569,8 @@ static void cti_add_assoc_to_csdev(struct coresight_device *csdev) * if we found a matching csdev then update the ECT * association pointer for the device with this CTI. */ - csdev->ect_dev = ect_item->csdev; + coresight_set_assoc_ectdev_mutex(csdev->ect_dev, + ect_item->csdev); break; } }
With lockdeps enabled, we get the following warning: ====================================================== WARNING: possible circular locking dependency detected ------------------------------------------------------ kworker/u12:1/53 is trying to acquire lock: ffff80000adce220 (coresight_mutex){+.+.}-{4:4}, at: coresight_set_assoc_ectdev_mutex+0x3c/0x5c but task is already holding lock: ffff80000add1f60 (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (ect_mutex){+.+.}-{4:4}: __mutex_lock_common+0xd8/0xe60 mutex_lock_nested+0x44/0x50 cti_add_assoc_to_csdev+0x4c/0x184 coresight_register+0x2f0/0x314 tmc_probe+0x33c/0x414 -> #0 (coresight_mutex){+.+.}-{4:4}: __lock_acquire+0x1a20/0x32d0 lock_acquire+0x160/0x308 __mutex_lock_common+0xd8/0xe60 mutex_lock_nested+0x44/0x50 coresight_set_assoc_ectdev_mutex+0x3c/0x5c cti_update_conn_xrefs+0x6c/0xf8 cti_probe+0x33c/0x394 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(ect_mutex); lock(coresight_mutex); lock(ect_mutex); lock(coresight_mutex); *** DEADLOCK *** 4 locks held by kworker/u12:1/53: #0: ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1fc/0x63c #1: (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x228/0x63c #2: (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a8 #3: (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394 To fix the same, call cti_add_assoc_to_csdev without the holding coresight_mutex and confine the locking while setting the associated ect / cti device using coresight_set_assoc_ectdev_mutex(). Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> --- drivers/hwtracing/coresight/coresight-core.c | 7 ++++--- drivers/hwtracing/coresight/coresight-cti-core.c | 5 +++-- 2 files changed, 7 insertions(+), 5 deletions(-) Hi, I am not sure if I missed to consider something, but this is something I think could be the fix. Regards, Sudeep