diff mbox

Panda ES board hang when using GPIO as interrupt

Message ID 4FECEE9A.5070300@ti.com (mailing list archive)
State New, archived
Headers show

Commit Message

Hunter, Jon June 28, 2012, 11:54 p.m. UTC
On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [    0.171844] dummy:
> [    0.172912] NET: Registered protocol family 16
> [    0.173431] GPMC revision 6.0
> [    0.173492] gpmc: irq-52 could not claim: err -22
> [    0.177551] ??????omap_gpio_runtime_resume
> [    0.178619] OMAP GPIO hardware version 0.1
> [    0.178649] !!!!!omap_gpio_runtime_suspend
> [    0.178771] ??????omap_gpio_runtime_resume
> [    0.179351] !!!!!omap_gpio_runtime_suspend
> [    0.179504] ??????omap_gpio_runtime_resume
> [    0.180023] !!!!!omap_gpio_runtime_suspend
> [    0.180145] ??????omap_gpio_runtime_resume
> [    0.180694] !!!!!omap_gpio_runtime_suspend
> [    0.180847] ??????omap_gpio_runtime_resume
> [    0.181365] !!!!!omap_gpio_runtime_suspend
> [    0.181518] ??????omap_gpio_runtime_resume
> [    0.182037] !!!!!omap_gpio_runtime_suspend
> [    0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [    0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [    0.186584] error setting wl12xx data: -38
> [    0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [    0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [    0.239501] ??????omap_gpio_runtime_resume
> [    0.239532] ??????omap_gpio_runtime_resume
> [    0.241058]  usbhs_omap: alias fck already exists
> [    0.244781] ??????omap_gpio_runtime_resume

I am wondering if this could be the bug ... on start-up I see that we do
a context restore on bank1 during the probe which is before we have done
the first suspend! In other words, we could restore a bad/uninitialised
context for bank1. In the case of bank1, the loss count starts at 1 and
not 0 and so we falsely think we need to perform a restore :-(

[    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
[    0.177276] omap_gpio_runtime_resume: count 0, now 1
[    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
[    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000

Can you try ...

        if (bank->irq_base < 0) {

Thanks
Jon

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Franky Lin June 29, 2012, 12:59 a.m. UTC | #1
On 06/28/2012 04:54 PM, Jon Hunter wrote:
> I am wondering if this could be the bug ... on start-up I see that we do
> a context restore on bank1 during the probe which is before we have done
> the first suspend! In other words, we could restore a bad/uninitialised
> context for bank1. In the case of bank1, the loss count starts at 1 and
> not 0 and so we falsely think we need to perform a restore :-(
>
> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>
> Can you try ...
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..9623408 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
> platform_device *pdev)
>   #ifdef CONFIG_OF_GPIO
>          bank->chip.of_node = of_node_get(node);
>   #endif
> +       if (bank->get_context_loss_count)
> +               bank->context_loss_count =
> +                               bank->get_context_loss_count(bank->dev);
>
>          bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>          if (bank->irq_base < 0) {
>

Looks like you found the culprit. :) It does fix the problem.

Franky

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tarun Kanti DebBarma June 29, 2012, 4:07 a.m. UTC | #2
On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin <frankyl@broadcom.com> wrote:
> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>
>> I am wondering if this could be the bug ... on start-up I see that we do
>> a context restore on bank1 during the probe which is before we have done
>> the first suspend! In other words, we could restore a bad/uninitialised
>> context for bank1. In the case of bank1, the loss count starts at 1 and
>> not 0 and so we falsely think we need to perform a restore :-(
>>
>> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
>> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>
>> Can you try ...
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..9623408 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
>> platform_device *pdev)
>>  #ifdef CONFIG_OF_GPIO
>>         bank->chip.of_node = of_node_get(node);
>>  #endif
>> +       if (bank->get_context_loss_count)
>> +               bank->context_loss_count =
>> +                               bank->get_context_loss_count(bank->dev);
>>
>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>         if (bank->irq_base < 0) {
>>
>
> Looks like you found the culprit. :) It does fix the problem.
So this looks similar to what NeilBrown <neilb@suse.de> reported in
another thread.
The reason was context_loss_count = 1 for GPIO BANK#0 which of course is in the
WKUP domain. In fact he tried out with the same fix. Anyways, we
should hear from
Kevin now whether it is feasible to fix the context_loss_count for the WKUP GPIO
bank or to put the workaround here in the gpio driver.
--
Tarun
>
> Franky
>
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hunter, Jon June 29, 2012, 3:53 p.m. UTC | #3
On 06/28/2012 11:07 PM, DebBarma, Tarun Kanti wrote:
> On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin <frankyl@broadcom.com> wrote:
>> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>>
>>> I am wondering if this could be the bug ... on start-up I see that we do
>>> a context restore on bank1 during the probe which is before we have done
>>> the first suspend! In other words, we could restore a bad/uninitialised
>>> context for bank1. In the case of bank1, the loss count starts at 1 and
>>> not 0 and so we falsely think we need to perform a restore :-(
>>>
>>> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>>> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
>>> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>>> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>>
>>> Can you try ...
>>>
>>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>>> index c4ed172..9623408 100644
>>> --- a/drivers/gpio/gpio-omap.c
>>> +++ b/drivers/gpio/gpio-omap.c
>>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
>>> platform_device *pdev)
>>>  #ifdef CONFIG_OF_GPIO
>>>         bank->chip.of_node = of_node_get(node);
>>>  #endif
>>> +       if (bank->get_context_loss_count)
>>> +               bank->context_loss_count =
>>> +                               bank->get_context_loss_count(bank->dev);
>>>
>>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>>         if (bank->irq_base < 0) {
>>>
>>
>> Looks like you found the culprit. :) It does fix the problem.
> So this looks similar to what NeilBrown <neilb@suse.de> reported in
> another thread.
> The reason was context_loss_count = 1 for GPIO BANK#0 which of course is in the
> WKUP domain. In fact he tried out with the same fix. Anyways, we
> should hear from
> Kevin now whether it is feasible to fix the context_loss_count for the WKUP GPIO
> bank or to put the workaround here in the gpio driver.

Ok, so I have been looking at this some more today. I believe that the
actual bug is that we are not checking to see if "loses_context" is true
before populating "get_context_loss_count" (see omap dmtimer driver).
For bank0 loses_context is false and so we should never be calling
"get_context_loss_count" in the first place.

I will send out a patch to fix this and will copy Kevin and Franky.

Franky, if you can test and confirm it works that would be great.

Kevin, if you can review that would be great too.

Cheers
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..9623408 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1086,6 +1086,9 @@  static int __devinit omap_gpio_probe(struct
platform_device *pdev)
 #ifdef CONFIG_OF_GPIO
        bank->chip.of_node = of_node_get(node);
 #endif
+       if (bank->get_context_loss_count)
+               bank->context_loss_count =
+                               bank->get_context_loss_count(bank->dev);

        bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);