diff mbox

ASoC: rt5677: Avoid recursive locking in the DEADLOCK detector

Message ID 1433346797-1908-1-git-send-email-oder_chiou@realtek.com (mailing list archive)
State New, archived
Headers show

Commit Message

Oder Chiou June 3, 2015, 3:53 p.m. UTC
The patch is for to avoid recursive locking in the DEADLOCK detector.
In the driver, it encountered the warnning message of the recursive locking
in the function "regmap_lock_mutex".

Signed-off-by: Oder Chiou <oder_chiou@realtek.com>
---
 sound/soc/codecs/rt5677.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Mark Brown June 3, 2015, 4:44 p.m. UTC | #1
On Wed, Jun 03, 2015 at 11:53:17PM +0800, Oder Chiou wrote:

> The patch is for to avoid recursive locking in the DEADLOCK detector.
> In the driver, it encountered the warnning message of the recursive locking
> in the function "regmap_lock_mutex".

Could you be more specific about the deadlock you're seeing?  This isn't
really enough to understand either what the problem is or why this fixes
it.  Converting all the per-regmap mutexes into a single global mutex
isn't an immediately obvious step.
Nicolas Boichat June 5, 2015, 12:53 a.m. UTC | #2
Hi Mark,

On Thu, Jun 4, 2015 at 12:44 AM, Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Jun 03, 2015 at 11:53:17PM +0800, Oder Chiou wrote:
>
> > The patch is for to avoid recursive locking in the DEADLOCK detector.
> > In the driver, it encountered the warnning message of the recursive locking
> > in the function "regmap_lock_mutex".
>
> Could you be more specific about the deadlock you're seeing?  This isn't
> really enough to understand either what the problem is or why this fixes
> it.  Converting all the per-regmap mutexes into a single global mutex
> isn't an immediately obvious step.

We originally reported the issue to Realtek:

[    2.569449] =============================================
[    2.569451] [ INFO: possible recursive locking detected ]
[    2.569454] 3.18.0 #311 Tainted: G S
[    2.569456] ---------------------------------------------
[    2.569458] swapper/0/1 is trying to acquire lock:
[    2.569469]  (&map->mutex){+.+...}, at: [<ffffffc00037dba0>]
regmap_lock_mutex+0x10/0x18
[    2.569470]
[    2.569470] but task is already holding lock:
[    2.569476]  (&map->mutex){+.+...}, at: [<ffffffc00037dba0>]
regmap_lock_mutex+0x10/0x18
[    2.569478]
[    2.569478] other info that might help us debug this:
[    2.569479]  Possible unsafe locking scenario:
[    2.569479]
[    2.569480]        CPU0
[    2.569481]        ----
[    2.569484]   lock(&map->mutex);
[    2.569486]   lock(&map->mutex);
[    2.569487]
[    2.569487]  *** DEADLOCK ***
[    2.569487]
[    2.569489]  May be due to missing lock nesting notation
[    2.569489]
[    2.569491] 3 locks held by swapper/0/1:
[    2.569499]  #0:  (&dev->mutex){......}, at: [<ffffffc000369e80>]
__driver_attach+0x38/0x98
[    2.569505]  #1:  (&dev->mutex){......}, at: [<ffffffc000369ea0>]
__driver_attach+0x58/0x98
[    2.569512]  #2:  (&map->mutex){+.+...}, at: [<ffffffc00037dba0>]
regmap_lock_mutex+0x10/0x18
[    2.569513]
[    2.569513] stack backtrace:
[    2.569517] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G S
3.18.0 #311
[    2.569521] Call trace:
[    2.569527] [<ffffffc000089198>] dump_backtrace+0x0/0x108
[    2.569530] [<ffffffc0000892b0>] show_stack+0x10/0x1c
[    2.569535] [<ffffffc0005e09b4>] dump_stack+0x74/0x94
[    2.569540] [<ffffffc0000d5d1c>] __lock_acquire+0x73c/0x1910
[    2.569544] [<ffffffc0000d7674>] lock_acquire+0xec/0x128
[    2.569548] [<ffffffc0005e3870>] mutex_lock_nested+0x58/0x354
[    2.569551] [<ffffffc00037db9c>] regmap_lock_mutex+0xc/0x18
[    2.569554] [<ffffffc00037ef50>] regmap_read+0x34/0x68
[    2.569559] [<ffffffc0004ddc84>] rt5677_read+0x9c/0xb4
[    2.569562] [<ffffffc00037ee8c>] _regmap_read+0xa8/0x138
[    2.569565] [<ffffffc00037ef60>] regmap_read+0x44/0x68
[    2.569569] [<ffffffc0004dd6a4>] rt5677_i2c_probe+0x25c/0x4a4
[    2.569574] [<ffffffc00042984c>] i2c_device_probe+0xfc/0x130
[    2.569577] [<ffffffc000369c58>] driver_probe_device+0xd4/0x23c
[    2.569580] [<ffffffc000369eb0>] __driver_attach+0x68/0x98
[    2.569584] [<ffffffc000368dc8>] bus_for_each_dev+0x70/0x90
[    2.569588] [<ffffffc0003697e4>] driver_attach+0x1c/0x28
[    2.569591] [<ffffffc000369400>] bus_add_driver+0xd8/0x1e0
[    2.569594] [<ffffffc00036a7f0>] driver_register+0xbc/0x10c
[    2.569598] [<ffffffc00042a4bc>] i2c_register_driver+0x48/0xac
[    2.569601] [<ffffffc0008d523c>] rt5677_i2c_driver_init+0x14/0x20
[    2.569605] [<ffffffc0000828dc>] do_one_initcall+0xf4/0x18c
[    2.569609] [<ffffffc0008a4ae8>] kernel_init_freeable+0x144/0x1e4
[    2.569613] [<ffffffc0005de3a4>] kernel_init+0x10/0xd4

It's actually a false alarm, the warning is triggered because 2 locks
are acquired, from the same location:
 - rt5677_i2c_probe calls regmap_read(rt5677->regmap, RT5677_VENDOR_ID2, &val);
 - This locks rt5677->regmap->mutex (1st lock), then calls rt5677_read
 - Since rt5677->is_dsp_mode is false,
regmap_read(rt5677->regmap_physical, reg, val); is called
 - This locks rt5677->regmap_physical->mutex (2nd lock).
 - The value is read, then both locks are released, in reverse order.

AFAIK there is no code that would acquire the locks in the reverse
order, so I don't think this can deadlock.

I suggested reworking the register read/write calls in rt5677.c to
direct them to the correct regmap earlier on (rt5677->regmap or
rt5677->regmap_physical), before locks are acquired. But the patch
above also fixes the issue (that is, it removes the warning).

Thanks,

Nicolas
Mark Brown June 5, 2015, 10:31 a.m. UTC | #3
On Fri, Jun 05, 2015 at 08:53:54AM +0800, Nicolas Boichat wrote:

> > Could you be more specific about the deadlock you're seeing?  This isn't
> > really enough to understand either what the problem is or why this fixes
> > it.  Converting all the per-regmap mutexes into a single global mutex
> > isn't an immediately obvious step.

> We originally reported the issue to Realtek:

Any analysis needs to be in the changelog for the commit and...

> I suggested reworking the register read/write calls in rt5677.c to
> direct them to the correct regmap earlier on (rt5677->regmap or
> rt5677->regmap_physical), before locks are acquired. But the patch
> above also fixes the issue (that is, it removes the warning).

...the above sounds like there's a bug in the locking anyway which this
is just a bodge for?
Lars-Peter Clausen June 5, 2015, 10:53 a.m. UTC | #4
On 06/05/2015 12:31 PM, Mark Brown wrote:
> On Fri, Jun 05, 2015 at 08:53:54AM +0800, Nicolas Boichat wrote:
>
>>> Could you be more specific about the deadlock you're seeing?  This isn't
>>> really enough to understand either what the problem is or why this fixes
>>> it.  Converting all the per-regmap mutexes into a single global mutex
>>> isn't an immediately obvious step.
>
>> We originally reported the issue to Realtek:
>
> Any analysis needs to be in the changelog for the commit and...
>
>> I suggested reworking the register read/write calls in rt5677.c to
>> direct them to the correct regmap earlier on (rt5677->regmap or
>> rt5677->regmap_physical), before locks are acquired. But the patch
>> above also fixes the issue (that is, it removes the warning).
>
> ...the above sounds like there's a bug in the locking anyway which this
> is just a bodge for?

It's the same issue Antti reported a while ago. The issue is that lockdep 
for performance reasons does not look at each lock separately, but only at 
lock classes. By default all locks initialized by the same mutex_init() call 
end up in the same lock class. When one lock of a lock class is locked while 
already holding a lock from the same lock class lockdep complains about a 
potential deadlock because to lockdep those look like the same lock. For 
most locks this is OK since they do not recursively lock locks from the same 
lock class.

Now the issue here is that we have nested regmap instances, meaning one 
regmap instances uses another instance in its read/write implementation. 
This will lead to nested locking of the mutex of the regmap struct. Since 
both mutexes are in the same lock class lockdep generates a warning. Rather 
than silencing the warning with some per driver hacks the correct way to fix 
this is to allow to properly annotate the locks as being different locks.

I think Antti submitted some patches to attend to fix this, but there were 
still issues with the patches and they never got merged.

- Lars
Mark Brown June 5, 2015, 5:13 p.m. UTC | #5
On Fri, Jun 05, 2015 at 12:53:27PM +0200, Lars-Peter Clausen wrote:

> Now the issue here is that we have nested regmap instances, meaning one
> regmap instances uses another instance in its read/write implementation.

Ah, that sounds more familiar than unlocking in different orders.

> I think Antti submitted some patches to attend to fix this, but there were
> still issues with the patches and they never got merged.

IIRC his patch was adding the ability for drivers to override the lock
class which wasn't exactly fixing the problem but rather providing a
different way to try to work around it in drivers.
diff mbox

Patch

diff --git a/sound/soc/codecs/rt5677.c b/sound/soc/codecs/rt5677.c
index 31d969a..ee5b570 100644
--- a/sound/soc/codecs/rt5677.c
+++ b/sound/soc/codecs/rt5677.c
@@ -4998,6 +4998,22 @@  static const struct regmap_config rt5677_regmap_physical = {
 	.num_ranges = ARRAY_SIZE(rt5677_ranges),
 };
 
+static void rt5677_regmap_lock_mutex(void *__lock)
+{
+	struct mutex *lock = __lock;
+
+	mutex_lock(lock);
+}
+
+static void rt5677_regmap_unlock_mutex(void *__lock)
+{
+	struct mutex *lock = __lock;
+
+	mutex_unlock(lock);
+}
+
+static struct mutex rt5677_regmap_lock;
+
 static const struct regmap_config rt5677_regmap = {
 	.reg_bits = 8,
 	.val_bits = 16,
@@ -5009,6 +5025,9 @@  static const struct regmap_config rt5677_regmap = {
 	.readable_reg = rt5677_readable_register,
 	.reg_read = rt5677_read,
 	.reg_write = rt5677_write,
+	.lock = rt5677_regmap_lock_mutex,
+	.unlock = rt5677_regmap_unlock_mutex,
+	.lock_arg = &rt5677_regmap_lock,
 
 	.cache_type = REGCACHE_RBTREE,
 	.reg_defaults = rt5677_reg,
@@ -5206,6 +5225,8 @@  static int rt5677_i2c_probe(struct i2c_client *i2c,
 		return ret;
 	}
 
+	mutex_init(&rt5677_regmap_lock);
+
 	regmap_read(rt5677->regmap, RT5677_VENDOR_ID2, &val);
 	if (val != RT5677_DEVICE_ID) {
 		dev_err(&i2c->dev,