mmc: mxs: DEADLOCK

Message ID 4FFC367A.8020100@bluegiga.com (mailing list archive)
State New, archived

Commit Message

Lauri Hintsala July 10, 2012, 2:04 p.m. UTC
Hi,

I was able to trigger a deadlock with CONFIG_DEBUG_SPINLOCK enabled. I
also enabled CONFIG_PROVE_LOCKING to get more verbose output. I got the
following error message after the SDIO device was powered on.

I'm able to reproduce the issue with Linux next-20120710. The platform is imx28.

[   79.660000] =============================================
[   79.660000] [ INFO: possible recursive locking detected ]
[   79.660000] 3.4.0-00009-g3e96082-dirty #11 Not tainted
[   79.660000] ---------------------------------------------
[   79.660000] swapper/0 is trying to acquire lock:
[   79.660000]  (&(&host->lock)->rlock#2){-.....}, at: [<c026ea3c>] mxs_mmc_enable_sdio_irq+0x18/0xd4
[   79.660000]
[   79.660000] but task is already holding lock:
[   79.660000]  (&(&host->lock)->rlock#2){-.....}, at: [<c026f744>] mxs_mmc_irq_handler+0x1c/0xe8
[   79.660000]
[   79.660000] other info that might help us debug this:
[   79.660000]  Possible unsafe locking scenario:
[   79.660000]
[   79.660000]        CPU0
[   79.660000]        ----
[   79.660000]   lock(&(&host->lock)->rlock#2);
[   79.660000]   lock(&(&host->lock)->rlock#2);
[   79.660000]
[   79.660000]  *** DEADLOCK ***
[   79.660000]
[   79.660000]  May be due to missing lock nesting notation
[   79.660000]
[   79.660000] 1 lock held by swapper/0:
[   79.660000]  #0:  (&(&host->lock)->rlock#2){-.....}, at: [<c026f744>] mxs_mmc_irq_handler+0x1c/0xe8
[   79.660000]
[   79.660000] stack backtrace:
[   79.660000] [<c0014bd0>] (unwind_backtrace+0x0/0xf4) from [<c005f9c0>] (__lock_acquire+0x1948/0x1d48)
[   79.660000] [<c005f9c0>] (__lock_acquire+0x1948/0x1d48) from [<c005fea0>] (lock_acquire+0xe0/0xf8)
[   79.660000] [<c005fea0>] (lock_acquire+0xe0/0xf8) from [<c03a8460>] (_raw_spin_lock_irqsave+0x44/0x58)
[   79.660000] [<c03a8460>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4)
[   79.660000] [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4) from [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8)
[   79.660000] [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8) from [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254)
[   79.660000] [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254) from [<c006bff8>] (handle_irq_event+0x3c/0x5c)
[   79.660000] [<c006bff8>] (handle_irq_event+0x3c/0x5c) from [<c006e6d0>] (handle_level_irq+0x90/0x110)
[   79.660000] [<c006e6d0>] (handle_level_irq+0x90/0x110) from [<c006b930>] (generic_handle_irq+0x38/0x50)
[   79.660000] [<c006b930>] (generic_handle_irq+0x38/0x50) from [<c00102fc>] (handle_IRQ+0x30/0x84)
[   79.660000] [<c00102fc>] (handle_IRQ+0x30/0x84) from [<c000f058>] (__irq_svc+0x38/0x60)
[   79.660000] [<c000f058>] (__irq_svc+0x38/0x60) from [<c0010520>] (default_idle+0x2c/0x40)
[   79.660000] [<c0010520>] (default_idle+0x2c/0x40) from [<c0010a90>] (cpu_idle+0x64/0xcc)
[   79.660000] [<c0010a90>] (cpu_idle+0x64/0xcc) from [<c04ff858>] (start_kernel+0x244/0x2c8)
[   79.660000] BUG: spinlock lockup on CPU#0, swapper/0
[   79.660000]  lock: c398cb2c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[   79.660000] [<c0014bd0>] (unwind_backtrace+0x0/0xf4) from [<c01ddb1c>] (do_raw_spin_lock+0xf0/0x144)
[   79.660000] [<c01ddb1c>] (do_raw_spin_lock+0xf0/0x144) from [<c03a8468>] (_raw_spin_lock_irqsave+0x4c/0x58)
[   79.660000] [<c03a8468>] (_raw_spin_lock_irqsave+0x4c/0x58) from [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4)
[   79.660000] [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4) from [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8)
[   79.660000] [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8) from [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254)
[   79.660000] [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254) from [<c006bff8>] (handle_irq_event+0x3c/0x5c)
[   79.660000] [<c006bff8>] (handle_irq_event+0x3c/0x5c) from [<c006e6d0>] (handle_level_irq+0x90/0x110)
[   79.660000] [<c006e6d0>] (handle_level_irq+0x90/0x110) from [<c006b930>] (generic_handle_irq+0x38/0x50)
[   79.660000] [<c006b930>] (generic_handle_irq+0x38/0x50) from [<c00102fc>] (handle_IRQ+0x30/0x84)
[   79.660000] [<c00102fc>] (handle_IRQ+0x30/0x84) from [<c000f058>] (__irq_svc+0x38/0x60)
[   79.660000] [<c000f058>] (__irq_svc+0x38/0x60) from [<c0010520>] (default_idle+0x2c/0x40)
[   79.660000] [<c0010520>] (default_idle+0x2c/0x40) from [<c0010a90>] (cpu_idle+0x64/0xcc)
[   79.660000] [<c0010a90>] (cpu_idle+0x64/0xcc) from [<c04ff858>] (start_kernel+0x244/0x2c8)


I found a way to fix this issue:

  	else if (stat & BM_SSP_CTRL1_RESP_ERR_IRQ)


Is there any reason to keep mmc_signal_sdio_irq inside the spinlock?
mmc_signal_sdio_irq calls mxs_mmc_enable_sdio_irq, which tries to
acquire host->lock while the IRQ handler is already holding it.
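
The locking pattern behind the report can be reproduced in userspace. The following is only a sketch under stated assumptions, not driver code: it assumes a POSIX system and uses an error-checking pthread mutex as a stand-in for host->lock, so that the nested acquisition returns EDEADLK instead of spinning forever as a kernel spinlock would. All demo_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the driver's host structure. */
struct demo_host {
	pthread_mutex_t lock;	/* plays the role of host->lock */
	int sdio_irq_en;
	int last_lock_err;	/* result of the most recent callback lock attempt */
};

int demo_host_init(struct demo_host *h)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	/* ERRORCHECK: relocking by the owner fails with EDEADLK instead of hanging. */
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&h->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	h->sdio_irq_en = 1;
	h->last_lock_err = 0;
	return err;
}

/* Stand-in for mxs_mmc_enable_sdio_irq(): takes the host lock itself. */
void demo_enable_sdio_irq(struct demo_host *h, int enable)
{
	h->last_lock_err = pthread_mutex_lock(&h->lock);
	if (h->last_lock_err)
		return;		/* a real spinlock would lock up here */
	h->sdio_irq_en = enable;
	pthread_mutex_unlock(&h->lock);
}

/* Stand-in for mmc_signal_sdio_irq(): calls back into the host driver. */
void demo_signal_sdio_irq(struct demo_host *h)
{
	demo_enable_sdio_irq(h, 0);
}

/* Original ordering: signal while the handler still holds the lock. */
int demo_irq_handler_buggy(struct demo_host *h)
{
	int err = pthread_mutex_lock(&h->lock);

	assert(err == 0);
	(void)err;
	demo_signal_sdio_irq(h);
	pthread_mutex_unlock(&h->lock);
	return h->last_lock_err;
}

/* Proposed ordering: drop the lock first, then signal. */
int demo_irq_handler_fixed(struct demo_host *h)
{
	int err = pthread_mutex_lock(&h->lock);

	assert(err == 0);
	(void)err;
	/* status handling that needs the lock would go here */
	pthread_mutex_unlock(&h->lock);
	demo_signal_sdio_irq(h);
	return h->last_lock_err;
}
```

With this sketch, demo_irq_handler_buggy() reports EDEADLK from the nested lock attempt, while demo_irq_handler_fixed() lets the callback take the lock normally.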


Best regards,
Lauri Hintsala

Comments

Marek Vasut July 10, 2012, 3:02 p.m. UTC | #1
Dear Lauri Hintsala,

[...]

> --- a/drivers/mmc/host/mxs-mmc.c
> +++ b/drivers/mmc/host/mxs-mmc.c
> @@ -278,11 +278,11 @@ static irqreturn_t mxs_mmc_irq_handler(int irq,
> void *dev_id)
>   	writel(stat & MXS_MMC_IRQ_BITS,
>   	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);
> 
> +	spin_unlock(&host->lock);
> +
>   	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
>   		mmc_signal_sdio_irq(host->mmc);
> 
> -	spin_unlock(&host->lock);
> -

Spinlock in irq handler is interesting too ;-)

>   	if (stat & BM_SSP_CTRL1_RESP_TIMEOUT_IRQ)
>   		cmd->error = -ETIMEDOUT;
>   	else if (stat & BM_SSP_CTRL1_RESP_ERR_IRQ)
> 
> 
> Is there any reason to keep mmc_signal_sdio_irq inside the spinlock?
> mmc_signal_sdio_irq calls mxs_mmc_enable_sdio_irq and it tries to
> acquire lock while it is already acquired.
> 
> 
> Best regards,
> Lauri Hintsala

Best regards,
Marek Vasut

Shawn Guo July 11, 2012, 6:06 a.m. UTC | #2
On Tue, Jul 10, 2012 at 05:04:42PM +0300, Lauri Hintsala wrote:
> Hi,
> 
> I was able to trigger a deadlock with CONFIG_DEBUG_SPINLOCK enabled. I
> also enabled CONFIG_PROVE_LOCKING to get more verbose output. I got the
> following error message after the SDIO device was powered on.
> 
> I'm able to reproduce the issue with Linux next-20120710. The platform is imx28.
> 
The bug is probably there because the driver hasn't been widely tested
with SDIO cards.

> I found a way to fix this issue:
> 
> --- a/drivers/mmc/host/mxs-mmc.c
> +++ b/drivers/mmc/host/mxs-mmc.c
> @@ -278,11 +278,11 @@ static irqreturn_t mxs_mmc_irq_handler(int
> irq, void *dev_id)
>  	writel(stat & MXS_MMC_IRQ_BITS,
>  	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);
> 
> +	spin_unlock(&host->lock);
> +
>  	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
>  		mmc_signal_sdio_irq(host->mmc);
> 
> -	spin_unlock(&host->lock);
> -
>  	if (stat & BM_SSP_CTRL1_RESP_TIMEOUT_IRQ)
>  		cmd->error = -ETIMEDOUT;
>  	else if (stat & BM_SSP_CTRL1_RESP_ERR_IRQ)
> 
> 
> Is there any reason to keep mmc_signal_sdio_irq inside the spinlock?
> mmc_signal_sdio_irq calls mxs_mmc_enable_sdio_irq and it tries to
> acquire lock while it is already acquired.
> 
The fix looks right to me.  You can have my ack when you send a patch
for it.

Acked-by: Shawn Guo <shawn.guo@linaro.org>

Lauri Hintsala July 11, 2012, 6:08 a.m. UTC | #3
On 07/11/2012 09:06 AM, Shawn Guo wrote:
>> --- a/drivers/mmc/host/mxs-mmc.c
>> +++ b/drivers/mmc/host/mxs-mmc.c
>> @@ -278,11 +278,11 @@ static irqreturn_t mxs_mmc_irq_handler(int
>> irq, void *dev_id)
>>   	writel(stat & MXS_MMC_IRQ_BITS,
>>   	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);
>>
>> +	spin_unlock(&host->lock);
>> +
>>   	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
>>   		mmc_signal_sdio_irq(host->mmc);
>>
>> -	spin_unlock(&host->lock);
>> -
>>   	if (stat & BM_SSP_CTRL1_RESP_TIMEOUT_IRQ)
>>   		cmd->error = -ETIMEDOUT;
>>   	else if (stat & BM_SSP_CTRL1_RESP_ERR_IRQ)
>>
>>
>> Is there any reason to keep mmc_signal_sdio_irq inside the spinlock?
>> mmc_signal_sdio_irq calls mxs_mmc_enable_sdio_irq and it tries to
>> acquire lock while it is already acquired.
>>
> The fix looks right to me.  You can have my ack when you send a patch
> for it.
>
> Acked-by: Shawn Guo <shawn.guo@linaro.org>

OK, I'll send a patch. Thanks!

Lauri

Shawn Guo July 11, 2012, 6:10 a.m. UTC | #4
On Tue, Jul 10, 2012 at 05:02:52PM +0200, Marek Vasut wrote:
> Dear Lauri Hintsala,
> 
> [...]
> 
> > --- a/drivers/mmc/host/mxs-mmc.c
> > +++ b/drivers/mmc/host/mxs-mmc.c
> > @@ -278,11 +278,11 @@ static irqreturn_t mxs_mmc_irq_handler(int irq,
> > void *dev_id)
> >   	writel(stat & MXS_MMC_IRQ_BITS,
> >   	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);
> > 
> > +	spin_unlock(&host->lock);
> > +
> >   	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
> >   		mmc_signal_sdio_irq(host->mmc);
> > 
> > -	spin_unlock(&host->lock);
> > -
> 
> Spinlock in irq handler is interesting too ;-)
> 
For your information, the following is what I learnt from Arnd when I
was a beginner.

Regards,
Shawn

--- Quote Begins ---

A short form of the strict rules (there is better documentation
out there) is:

* If all users are outside of interrupt or tasklet context,
  use a bare spin_lock().

* If one user is in a tasklet context, use spin_lock() inside
  the tasklet, but spin_lock_bh() outside, to prevent the
  tasklet from interrupting the critical section. Code that
  can be called in either tasklet or regular context needs
  to use spin_lock_bh() as well.

* If one user is in interrupt context, use spin_lock() inside
  of the interrupt handler, but spin_lock_irq() in tasklet
  and normal context (not spin_lock_irqsave()), to prevent
  the interrupt from happening during the critical section.

* Use spin_lock_irqsave() only for functions that can be called
  in either interrupt or non-interrupt context. Most drivers
  don't need this at all.

The simplified rule would be to always use spin_lock_irqsave(),
because that does not require you to understand what you are doing.

My position is that it is better to use the stricter rules, because
that documents that you actually do understand what you are doing ;-)
It's also slightly more efficient, because it avoids having to
save the interrupt status in a variable.

--- Quote Ends ---
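
The quoted rules can be read as a small decision function. The following is a minimal sketch in plain C, assuming a simplified model in which a lock's users and a caller's possible contexts are given as bitmasks; the demo_* names and the enum are hypothetical and not part of any kernel API. It only encodes the four bullets above.

```c
#include <assert.h>

/* Contexts a piece of code can run in, usable as bit flags. */
enum demo_ctx {
	DEMO_CTX_PROCESS = 1,
	DEMO_CTX_TASKLET = 2,
	DEMO_CTX_IRQ     = 4
};

/* Which locking call the rules suggest. */
enum demo_lock_api {
	DEMO_SPIN_LOCK,
	DEMO_SPIN_LOCK_BH,
	DEMO_SPIN_LOCK_IRQ,
	DEMO_SPIN_LOCK_IRQSAVE
};

/*
 * lock_users: every context in which the lock is taken anywhere.
 * caller:     every context in which the calling function may run.
 */
enum demo_lock_api demo_pick_spin_lock(unsigned lock_users, unsigned caller)
{
	assert(caller != 0);	/* the caller must run in at least one context */

	if (lock_users & DEMO_CTX_IRQ) {
		if (caller == DEMO_CTX_IRQ)
			return DEMO_SPIN_LOCK;		/* inside the handler itself */
		if (caller & DEMO_CTX_IRQ)
			return DEMO_SPIN_LOCK_IRQSAVE;	/* IRQ or non-IRQ context */
		return DEMO_SPIN_LOCK_IRQ;		/* tasklet or normal context */
	}
	if (lock_users & DEMO_CTX_TASKLET) {
		if (caller == DEMO_CTX_TASKLET)
			return DEMO_SPIN_LOCK;		/* inside the tasklet */
		return DEMO_SPIN_LOCK_BH;		/* outside, or either context */
	}
	return DEMO_SPIN_LOCK;				/* no IRQ or tasklet users */
}

/* Map the result to the name of the kernel call, for printing. */
const char *demo_lock_api_name(enum demo_lock_api api)
{
	switch (api) {
	case DEMO_SPIN_LOCK_BH:      return "spin_lock_bh";
	case DEMO_SPIN_LOCK_IRQ:     return "spin_lock_irq";
	case DEMO_SPIN_LOCK_IRQSAVE: return "spin_lock_irqsave";
	default:                     return "spin_lock";
	}
}
```

Applied to this thread: a function that only runs as the interrupt handler maps to a bare spin_lock(), while a function reached both from the handler and from a kernel thread, as the two backtraces show for mxs_mmc_enable_sdio_irq, maps to spin_lock_irqsave().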

Attila Kinali July 12, 2012, 2 p.m. UTC | #5
On Wed, 11 Jul 2012 14:06:09 +0800
Shawn Guo <shawn.guo@linaro.org> wrote:


> > I found a way to fix this issue:
> > 
> > --- a/drivers/mmc/host/mxs-mmc.c
> > +++ b/drivers/mmc/host/mxs-mmc.c
> > @@ -278,11 +278,11 @@ static irqreturn_t mxs_mmc_irq_handler(int
> > irq, void *dev_id)
> >  	writel(stat & MXS_MMC_IRQ_BITS,
> >  	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);
> > 
> > +	spin_unlock(&host->lock);
> > +
> >  	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
> >  		mmc_signal_sdio_irq(host->mmc);
> > 
> > -	spin_unlock(&host->lock);
> > -
> >  	if (stat & BM_SSP_CTRL1_RESP_TIMEOUT_IRQ)
> >  		cmd->error = -ETIMEDOUT;
> >  	else if (stat & BM_SSP_CTRL1_RESP_ERR_IRQ)
> > 
> > 
> > Is there any reason to keep mmc_signal_sdio_irq inside the spinlock?
> > mmc_signal_sdio_irq calls mxs_mmc_enable_sdio_irq and it tries to
> > acquire lock while it is already acquired.
> > 
> The fix looks right to me.  You can have my ack when you send a patch
> for it.
> 
> Acked-by: Shawn Guo <shawn.guo@linaro.org>

I ran into the same problem today, but the proposed fix doesn't seem
to work for me:

---schnipp---
# modprobe libertas_sdio
[   59.200000] lib80211: common routines for IEEE802.11 drivers
[   59.240000] cfg80211: Calling CRDA to update world regulatory domain
[   59.320000] libertas_sdio: Libertas SDIO driver
[   59.330000] libertas_sdio: Copyright Pierre Ossman
# modprobe mxs-mmc
[   64.210000] mxs-mmc 80010000.ssp: initialized
[   64.260000] mxs-mmc 80034000.ssp: initialized
[   64.270000] mmc0: new SDIO card at address 0001
# [   65.440000] libertas_sdio mmc0:0001:1: (unregistered net_device): 00:13:04:80:00:3f, fw 9.70.3p24, cap 0x00000303
[   65.470000] 
[   65.470000] =============================================
[   65.470000] [ INFO: possible recursive locking detected ]
[   65.470000] 3.5.0-rc5 #2 Not tainted
[   65.470000] ---------------------------------------------
[   65.470000] ksdioirqd/mmc0/73 is trying to acquire lock:
[   65.470000]  (&(&host->lock)->rlock#2){-.-...}, at: [<bf054120>] mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]
[   65.470000] 
[   65.470000] but task is already holding lock:
[   65.470000]  (&(&host->lock)->rlock#2){-.-...}, at: [<bf054120>] mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]
[   65.470000] 
[   65.470000] other info that might help us debug this:
[   65.470000]  Possible unsafe locking scenario:
[   65.470000] 
[   65.470000]        CPU0
[   65.470000]        ----
[   65.470000]   lock(&(&host->lock)->rlock#2);
[   65.470000]   lock(&(&host->lock)->rlock#2);
[   65.470000] 
[   65.470000]  *** DEADLOCK ***
[   65.470000] 
[   65.470000]  May be due to missing lock nesting notation
[   65.470000] 
[   65.470000] 1 lock held by ksdioirqd/mmc0/73:
[   65.470000]  #0:  (&(&host->lock)->rlock#2){-.-...}, at: [<bf054120>] mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]
[   65.470000] 
[   65.470000] stack backtrace:
[   65.470000] [<c0014990>] (unwind_backtrace+0x0/0xf4) from [<c005ccb8>] (__lock_acquire+0x14f8/0x1b98)
[   65.470000] [<c005ccb8>] (__lock_acquire+0x14f8/0x1b98) from [<c005d3f8>] (lock_acquire+0xa0/0x108)
[   65.470000] [<c005d3f8>] (lock_acquire+0xa0/0x108) from [<c02f671c>] (_raw_spin_lock_irqsave+0x48/0x5c)
[   65.470000] [<c02f671c>] (_raw_spin_lock_irqsave+0x48/0x5c) from [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc])
[   65.470000] [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]) from [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc])
[   65.470000] [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc]) from [<c0219b38>] (sdio_irq_thread+0x1bc/0x274)
[   65.470000] [<c0219b38>] (sdio_irq_thread+0x1bc/0x274) from [<c003c324>] (kthread+0x8c/0x98)
[   65.470000] [<c003c324>] (kthread+0x8c/0x98) from [<c00101ac>] (kernel_thread_exit+0x0/0x8)
[   65.470000] BUG: spinlock lockup suspected on CPU#0, ksdioirqd/mmc0/73
[   65.470000]  lock: 0xc3358724, .magic: dead4ead, .owner: ksdioirqd/mmc0/73, .owner_cpu: 0
[   65.470000] [<c0014990>] (unwind_backtrace+0x0/0xf4) from [<c01b46b0>] (do_raw_spin_lock+0x100/0x144)
[   65.470000] [<c01b46b0>] (do_raw_spin_lock+0x100/0x144) from [<c02f6724>] (_raw_spin_lock_irqsave+0x50/0x5c)
[   65.470000] [<c02f6724>] (_raw_spin_lock_irqsave+0x50/0x5c) from [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc])
[   65.470000] [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]) from [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc])
[   65.470000] [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc]) from [<c0219b38>] (sdio_irq_thread+0x1bc/0x274)
[   65.470000] [<c0219b38>] (sdio_irq_thread+0x1bc/0x274) from [<c003c324>] (kthread+0x8c/0x98)
[   65.470000] [<c003c324>] (kthread+0x8c/0x98) from [<c00101ac>] (kernel_thread_exit+0x0/0x8)
---schnapp---

Any hints on how to work around or fix this would be appreciated.

			Attila Kinali

Shawn Guo July 12, 2012, 2:39 p.m. UTC | #6
On Thu, Jul 12, 2012 at 04:00:08PM +0200, Attila Kinali wrote:
> On Wed, 11 Jul 2012 14:06:09 +0800
> Shawn Guo <shawn.guo@linaro.org> wrote:
> 
> 
> > > I found a way to fix this issue:
> > > 
> > > --- a/drivers/mmc/host/mxs-mmc.c
> > > +++ b/drivers/mmc/host/mxs-mmc.c
> > > @@ -278,11 +278,11 @@ static irqreturn_t mxs_mmc_irq_handler(int
> > > irq, void *dev_id)
> > >  	writel(stat & MXS_MMC_IRQ_BITS,
> > >  	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);
> > > 
> > > +	spin_unlock(&host->lock);
> > > +
> > >  	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
> > >  		mmc_signal_sdio_irq(host->mmc);
> > > 
> > > -	spin_unlock(&host->lock);
> > > -
> > >  	if (stat & BM_SSP_CTRL1_RESP_TIMEOUT_IRQ)
> > >  		cmd->error = -ETIMEDOUT;
> > >  	else if (stat & BM_SSP_CTRL1_RESP_ERR_IRQ)
> > > 
> > > 
> > > Is there any reason to keep mmc_signal_sdio_irq inside the spinlock?
> > > mmc_signal_sdio_irq calls mxs_mmc_enable_sdio_irq and it tries to
> > > acquire lock while it is already acquired.
> > > 
> > The fix looks right to me.  You can have my ack when you send a patch
> > for it.
> > 
> > Acked-by: Shawn Guo <shawn.guo@linaro.org>
> 
> I ran into the same problem today, but the proposed fix doesn't seem
> to work for me:
> 
It's a different problem from what Lauri reported and fixed.  I haven't
played with SDIO cards that much, so I'm not completely clear about the
SDIO calling sequence, but is it reasonable that mxs_mmc_enable_sdio_irq
is being called recursively?

Regards,
Shawn


Attila Kinali July 12, 2012, 3:13 p.m. UTC | #7
On Thu, 12 Jul 2012 22:39:53 +0800
Shawn Guo <shawn.guo@linaro.org> wrote:

> > 
> > I ran into the same problem today, but the proposed fix doesn't seem
> > to work for me:
> > 
> It's a different problem from what Lauri reported and fixed.  

Ok... 

> I haven't
> played with SDIO cards that much, so I'm not completely clear about the
> SDIO calling sequence, but is it reasonable that mxs_mmc_enable_sdio_irq
> is being called recursively?

I don't know. I don't know the code at all, nor how the SDIO system
works. But a quick check shows that mxs_mmc_enable_sdio_irq does not
call any other function (besides readl and writel) and hence cannot call itself.

To me it rather looks like two consecutive IRQs get passed to
sdio_irq_thread, which then calls mxs_mmc_enable_sdio_irq.

But with my limited knowledge I cannot check this theory.
Can anyone give me some hints on how I could verify this?

			Attila Kinali
Patch

--- a/drivers/mmc/host/mxs-mmc.c
+++ b/drivers/mmc/host/mxs-mmc.c
@@ -278,11 +278,11 @@  static irqreturn_t mxs_mmc_irq_handler(int irq, void *dev_id)
  	writel(stat & MXS_MMC_IRQ_BITS,
  	       host->base + HW_SSP_CTRL1(host) + STMP_OFFSET_REG_CLR);

+	spin_unlock(&host->lock);
+
  	if ((stat & BM_SSP_CTRL1_SDIO_IRQ) && (stat & BM_SSP_CTRL1_SDIO_IRQ_EN))
  		mmc_signal_sdio_irq(host->mmc);

-	spin_unlock(&host->lock);
-
  	if (stat & BM_SSP_CTRL1_RESP_TIMEOUT_IRQ)
  		cmd->error = -ETIMEDOUT;