gpio/omap: fix invalid context restore of gpio bank-0

Message ID 1340990551-19426-1-git-send-email-jon-hunter@ti.com (mailing list archive)
State New, archived

Commit Message

Hunter, Jon June 29, 2012, 5:22 p.m. UTC
Currently the gpio _runtime_resume/suspend functions are calling the
get_context_loss_count() platform function if the function is populated for
a gpio bank. This function is used to determine if the gpio bank logic state
needs to be restored due to a power transition. This function will be populated
for all banks, but it should only be called for banks that have the
"loses_context" variable set. It is pointless to call this if loses_context is
false as we know the context will never be lost and will not need restoring.

For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
never lose context. We found that the get_context_loss_count() was being called
for bank-0 during the probe and returning 1 instead of 0 indicating that the
context had been lost. This was causing the context restore function to be
called at probe time for this bank and because the context had never been saved,
was restoring an invalid state. This ultimately resulted in a crash [1].
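
As a rough illustration of the failure described above, here is a simplified
user-space model. It is not the driver code: the struct, the register fields
(saved_oe, hw_oe) and the helper names are hypothetical, and the resume check
is assumed to compare the hook's return value against the count recorded at
the last suspend.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-bank driver state. */
struct bank_model {
	bool loses_context;
	int (*get_context_loss_count)(void);
	int context_loss_count;	/* count recorded at the last suspend */
	unsigned int saved_oe;	/* saved register copy, 0 until a save happens */
	unsigned int hw_oe;	/* models the live hardware register */
};

/* Models the platform hook reporting 1 for the always-on domain. */
static int always_on_loss_count(void)
{
	return 1;
}

/*
 * Pre-patch style resume: the hook is trusted whenever it is populated,
 * regardless of loses_context. Returns true if a restore was performed.
 */
static bool resume_unpatched(struct bank_model *b)
{
	if (b->get_context_loss_count &&
	    b->get_context_loss_count() != b->context_loss_count) {
		/* Restore from a context that was never saved. */
		b->hw_oe = b->saved_oe;
		return true;
	}
	return false;
}
```

In this model, a bank with loses_context false, a stored count of 0 and a
live register value set up by the bootloader has that value overwritten with
the zeroed, never-saved copy on the first resume.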

There are multiple bugs here that need to be addressed ...

1. Why does the always-on power domain return a context loss count of 1? This
   needs to be fixed in the power domain code. However, the gpio driver should
   not assume the loss count is 0 to begin with.
2. The OMAP gpio driver should never be calling get_context_loss_count() for a
   gpio bank in an always-on domain. This is pointless and adds unnecessary
   overhead.
3. The OMAP gpio driver assumes that the initial power domain context loss count
   will be 0 at the time the gpio driver is probed. However, it could be
   possible that this is not the case and an invalid context restore could be
   performed during the probe. To avoid this otherwise only populated the
   get_context_loss_count() function pointer after the initial call to
   pm_runtime_get() has occurred. This will ensure that the first
   pm_runtime_put() initialises the loss count correctly.

This patch addresses issues 2 and 3 above.
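
The ordering described in issues 2 and 3 can be sketched with a small
user-space model. This is only a sketch under stated assumptions: the
model_* helpers and the restores counter are hypothetical, the runtime PM
callbacks are reduced to the loss-count handling, and pm_runtime_get() and
pm_runtime_put() are modelled as synchronous calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-bank driver state. */
struct bank_model {
	bool loses_context;
	int (*get_context_loss_count)(void);
	int context_loss_count;	/* count recorded at the last suspend */
	int restores;		/* number of context restores performed */
};

/* Models a platform hook whose count is already non-zero at probe time. */
static int platform_loss_count(void)
{
	return 1;
}

/* Restore only if the hook is populated and the count has changed. */
static void model_runtime_resume(struct bank_model *b)
{
	if (b->get_context_loss_count &&
	    b->get_context_loss_count() != b->context_loss_count)
		b->restores++;
}

/* Record the current count so the next resume has a valid baseline. */
static void model_runtime_suspend(struct bank_model *b)
{
	if (b->get_context_loss_count)
		b->context_loss_count = b->get_context_loss_count();
}

static void model_probe(struct bank_model *b, bool loses_context)
{
	b->loses_context = loses_context;
	b->get_context_loss_count = NULL;
	b->context_loss_count = 0;
	b->restores = 0;

	/* Stands in for the initial pm_runtime_get(): hook not yet set. */
	model_runtime_resume(b);

	/* Populate the hook only for banks that can lose context. */
	if (b->loses_context)
		b->get_context_loss_count = platform_loss_count;

	/* Stands in for pm_runtime_put(): records the initial count. */
	model_runtime_suspend(b);
}
```

In this model a bank probed with loses_context false never gets the hook, so
no later resume can trigger a restore. A bank with loses_context true records
the current count at the first put, so the first real resume sees no change.
The real pm_runtime_put() only queues an idle request, which the model does
not attempt to reproduce.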

[1] http://marc.info/?l=linux-omap&m=134065775323775&w=2

Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
Cc: Franky Lin <frankyl@broadcom.com>

Reported-by: Franky Lin <frankyl@broadcom.com>
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
 drivers/gpio/gpio-omap.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Franky Lin June 29, 2012, 8:27 p.m. UTC | #1
On 06/29/2012 10:22 AM, Jon Hunter wrote:
> Currently the gpio _runtime_resume/suspend functions are calling the
> get_context_loss_count() platform function if the function is populated for
> a gpio bank. This function is used to determine if the gpio bank logic state
> needs to be restored due to a power transition. This function will be populated
> for all banks, but it should only be called for banks that have the
> "loses_context" variable set. It is pointless to call this if loses_context is
> false as we know the context will never be lost and will not need restoring.
>
> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
> never lose context. We found that the get_context_loss_count() was being called
> for bank-0 during the probe and returning 1 instead of 0 indicating that the
> context had been lost. This was causing the context restore function to be
> called at probe time for this bank and because the context had never been saved,
> was restoring an invalid state. This ultimately resulted in a crash [1].
>
> There are multiple bugs here that need to be addressed ...
>
> 1. Why the always-on power domain returns a context loss count of 1? This needs
>     to be fixed in the power domain code. However, the gpio driver should not
>     assume the loss count is 0 to begin with.
> 2. The omap gpio driver should never be calling get_context_loss_count for a
>     gpio bank in a always-on domain. This is pointless and adds unneccessary
>     overhead.
> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>     will be 0 at the time the gpio driver is probed. However, it could be
>     possible that this is not the case and an invalid context restore could be
>     performed during the probe. To avoid this otherwise only populated the
>     get_context_loss_count() function pointer after the initial call to
>     pm_runtime_get() has occurred. This will ensure that the first
>     pm_runtime_put() initialised the loss count correctly.
>
> This patch addresses issues 2 and 3 above.
>
> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
>
> Cc: Grant Likely <grant.likely@secretlab.ca>
> Cc: Linus Walleij <linus.walleij@stericsson.com>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
> Cc: Franky Lin <frankyl@broadcom.com>
>
> Reported-by: Franky Lin <frankyl@broadcom.com>
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---

Tested-by: Franky Lin <frankyl@broadcom.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Santosh Shilimkar June 30, 2012, 4:18 a.m. UTC | #2
On Fri, Jun 29, 2012 at 10:52 PM, Jon Hunter <jon-hunter@ti.com> wrote:
> Currently the gpio _runtime_resume/suspend functions are calling the
> get_context_loss_count() platform function if the function is populated for
> a gpio bank. This function is used to determine if the gpio bank logic state
> needs to be restored due to a power transition. This function will be populated
> for all banks, but it should only be called for banks that have the
> "loses_context" variable set. It is pointless to call this if loses_context is
> false as we know the context will never be lost and will not need restoring.
>
> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
> never lose context. We found that the get_context_loss_count() was being called
> for bank-0 during the probe and returning 1 instead of 0 indicating that the
> context had been lost. This was causing the context restore function to be
> called at probe time for this bank and because the context had never been saved,
> was restoring an invalid state. This ultimately resulted in a crash [1].
>
> There are multiple bugs here that need to be addressed ...
>
> 1. Why the always-on power domain returns a context loss count of 1? This needs
>   to be fixed in the power domain code. However, the gpio driver should not
>   assume the loss count is 0 to begin with.
Indeed. GPIO driver should not assume the value.

> 2. The omap gpio driver should never be calling get_context_loss_count for a
>   gpio bank in a always-on domain. This is pointless and adds unneccessary
>   overhead.
Makes sense too.

> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>   will be 0 at the time the gpio driver is probed. However, it could be
>   possible that this is not the case and an invalid context restore could be
>   performed during the probe. To avoid this otherwise only populated the
>   get_context_loss_count() function pointer after the initial call to
>   pm_runtime_get() has occurred. This will ensure that the first
>   pm_runtime_put() initialised the loss count correctly.
>
> This patch addresses issues 2 and 3 above.
>
> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
>
> Cc: Grant Likely <grant.likely@secretlab.ca>
> Cc: Linus Walleij <linus.walleij@stericsson.com>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
> Cc: Franky Lin <frankyl@broadcom.com>
>
> Reported-by: Franky Lin <frankyl@broadcom.com>
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---
Thanks Jon for sorting this out. Patch looks good to me.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tony Lindgren July 1, 2012, 8:45 a.m. UTC | #3
* Shilimkar, Santosh <santosh.shilimkar@ti.com> [120629 21:23]:
> On Fri, Jun 29, 2012 at 10:52 PM, Jon Hunter <jon-hunter@ti.com> wrote:
> > Currently the gpio _runtime_resume/suspend functions are calling the
> > get_context_loss_count() platform function if the function is populated for
> > a gpio bank. This function is used to determine if the gpio bank logic state
> > needs to be restored due to a power transition. This function will be populated
> > for all banks, but it should only be called for banks that have the
> > "loses_context" variable set. It is pointless to call this if loses_context is
> > false as we know the context will never be lost and will not need restoring.
> >
> > For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
> > never lose context. We found that the get_context_loss_count() was being called
> > for bank-0 during the probe and returning 1 instead of 0 indicating that the
> > context had been lost. This was causing the context restore function to be
> > called at probe time for this bank and because the context had never been saved,
> > was restoring an invalid state. This ultimately resulted in a crash [1].
> >
> > There are multiple bugs here that need to be addressed ...
> >
> > 1. Why the always-on power domain returns a context loss count of 1? This needs
> >   to be fixed in the power domain code. However, the gpio driver should not
> >   assume the loss count is 0 to begin with.
> Indeed. GPIO driver should not assume the value.
> 
> > 2. The omap gpio driver should never be calling get_context_loss_count for a
> >   gpio bank in a always-on domain. This is pointless and adds unneccessary
> >   overhead.
> Make sense too.
> 
> > 3. The OMAP gpio driver assumes that the initial power domain context loss count
> >   will be 0 at the time the gpio driver is probed. However, it could be
> >   possible that this is not the case and an invalid context restore could be
> >   performed during the probe. To avoid this otherwise only populated the
> >   get_context_loss_count() function pointer after the initial call to
> >   pm_runtime_get() has occurred. This will ensure that the first
> >   pm_runtime_put() initialised the loss count correctly.
> >
> > This patch addresses issues 2 and 3 above.

Should this one be Cc: stable? If this is a regression, then the regression
causing commit should be mentioned.

Tony
Kevin Hilman July 2, 2012, 6:07 p.m. UTC | #4
+ Neil Brown

Hi Jon,

Jon Hunter <jon-hunter@ti.com> writes:

> Currently the gpio _runtime_resume/suspend functions are calling the
> get_context_loss_count() platform function if the function is populated for
> a gpio bank. This function is used to determine if the gpio bank logic state
> needs to be restored due to a power transition. This function will be populated
> for all banks, but it should only be called for banks that have the
> "loses_context" variable set. It is pointless to call this if loses_context is
> false as we know the context will never be lost and will not need restoring.
>
> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
> never lose context. We found that the get_context_loss_count() was being called
> for bank-0 during the probe and returning 1 instead of 0 indicating that the
> context had been lost. This was causing the context restore function to be
> called at probe time for this bank and because the context had never been saved,
> was restoring an invalid state. This ultimately resulted in a crash [1].
>
> There are multiple bugs here that need to be addressed ...
>
> 1. Why the always-on power domain returns a context loss count of 1? This needs
>    to be fixed in the power domain code. However, the gpio driver should not
>    assume the loss count is 0 to begin with.
> 2. The omap gpio driver should never be calling get_context_loss_count for a
>    gpio bank in a always-on domain. This is pointless and adds unneccessary
>    overhead.
> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>    will be 0 at the time the gpio driver is probed. However, it could be
>    possible that this is not the case and an invalid context restore could be
>    performed during the probe. To avoid this otherwise only populated the

The 'To avoid this...' sentence here doesn't read well.  Looks like you
need to:

s/otherwise//
s/populated/populate/

?

>    get_context_loss_count() function pointer after the initial call to
>    pm_runtime_get() has occurred. This will ensure that the first
>    pm_runtime_put() initialised the loss count correctly.
>
> This patch addresses issues 2 and 3 above.
> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
>
> Cc: Grant Likely <grant.likely@secretlab.ca>
> Cc: Linus Walleij <linus.walleij@stericsson.com>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
> Cc: Franky Lin <frankyl@broadcom.com>
>
> Reported-by: Franky Lin <frankyl@broadcom.com>
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>

Thanks for digging into this bug, Jon.  The same bug was brought up by
Neil Brown (Cc'd) in a different thread.

Neil, it looks to me that this fix will address the problems you were
seeing as well.  Care to test, and respond with your ack/tested-by if it
works for you?  Thanks.

Kevin

> ---
>  drivers/gpio/gpio-omap.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..f13fc9c 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1081,7 +1081,6 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>  	bank->is_mpuio = pdata->is_mpuio;
>  	bank->non_wakeup_gpios = pdata->non_wakeup_gpios;
>  	bank->loses_context = pdata->loses_context;
> -	bank->get_context_loss_count = pdata->get_context_loss_count;
>  	bank->regs = pdata->regs;
>  #ifdef CONFIG_OF_GPIO
>  	bank->chip.of_node = of_node_get(node);
> @@ -1135,6 +1134,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>  	omap_gpio_chip_init(bank);
>  	omap_gpio_show_rev(bank);
>  
> +	if (bank->loses_context)
> +		bank->get_context_loss_count = pdata->get_context_loss_count;
> +
>  	pm_runtime_put(bank->dev);
>  
>  	list_add_tail(&bank->node, &omap_gpio_list);
Hunter, Jon July 2, 2012, 6:22 p.m. UTC | #5
On 07/01/2012 03:45 AM, Tony Lindgren wrote:
> * Shilimkar, Santosh <santosh.shilimkar@ti.com> [120629 21:23]:
>> On Fri, Jun 29, 2012 at 10:52 PM, Jon Hunter <jon-hunter@ti.com> wrote:
>>> Currently the gpio _runtime_resume/suspend functions are calling the
>>> get_context_loss_count() platform function if the function is populated for
>>> a gpio bank. This function is used to determine if the gpio bank logic state
>>> needs to be restored due to a power transition. This function will be populated
>>> for all banks, but it should only be called for banks that have the
>>> "loses_context" variable set. It is pointless to call this if loses_context is
>>> false as we know the context will never be lost and will not need restoring.
>>>
>>> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
>>> never lose context. We found that the get_context_loss_count() was being called
>>> for bank-0 during the probe and returning 1 instead of 0 indicating that the
>>> context had been lost. This was causing the context restore function to be
>>> called at probe time for this bank and because the context had never been saved,
>>> was restoring an invalid state. This ultimately resulted in a crash [1].
>>>
>>> There are multiple bugs here that need to be addressed ...
>>>
>>> 1. Why the always-on power domain returns a context loss count of 1? This needs
>>>   to be fixed in the power domain code. However, the gpio driver should not
>>>   assume the loss count is 0 to begin with.
>> Indeed. GPIO driver should not assume the value.
>>
>>> 2. The omap gpio driver should never be calling get_context_loss_count for a
>>>   gpio bank in a always-on domain. This is pointless and adds unneccessary
>>>   overhead.
>> Make sense too.
>>
>>> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>>>   will be 0 at the time the gpio driver is probed. However, it could be
>>>   possible that this is not the case and an invalid context restore could be
>>>   performed during the probe. To avoid this otherwise only populated the
>>>   get_context_loss_count() function pointer after the initial call to
>>>   pm_runtime_get() has occurred. This will ensure that the first
>>>   pm_runtime_put() initialised the loss count correctly.
>>>
>>> This patch addresses issues 2 and 3 above.
> 
> Should this one be Cc: stable? If this is a regression, then the regression
> causing commit should be mentioned.

So that raises a good point. Looking at the stable branch (3.4.4), it is
missing 3 other fixes too [1][2][3]. So this particular problem would
not have been exposed there; however, I am wondering if there are other
problems lingering there.

This is a regression exposed by [2]. I should add that to the changelog.

Cheers
Jon

[1]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b3c64bc30af67ed328a8d919e41160942b870451
[2]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1b1287032df3a69d3ef9a486b444f4ffcca50d01
[3]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=22770de11cb13e7120f973bca6c800de371a6717
Hunter, Jon July 2, 2012, 6:26 p.m. UTC | #6
On 07/02/2012 01:07 PM, Kevin Hilman wrote:
> + Neil Brown
> 
> Hi Jon,
> 
> Jon Hunter <jon-hunter@ti.com> writes:
> 
>> Currently the gpio _runtime_resume/suspend functions are calling the
>> get_context_loss_count() platform function if the function is populated for
>> a gpio bank. This function is used to determine if the gpio bank logic state
>> needs to be restored due to a power transition. This function will be populated
>> for all banks, but it should only be called for banks that have the
>> "loses_context" variable set. It is pointless to call this if loses_context is
>> false as we know the context will never be lost and will not need restoring.
>>
>> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
>> never lose context. We found that the get_context_loss_count() was being called
>> for bank-0 during the probe and returning 1 instead of 0 indicating that the
>> context had been lost. This was causing the context restore function to be
>> called at probe time for this bank and because the context had never been saved,
>> was restoring an invalid state. This ultimately resulted in a crash [1].
>>
>> There are multiple bugs here that need to be addressed ...
>>
>> 1. Why the always-on power domain returns a context loss count of 1? This needs
>>    to be fixed in the power domain code. However, the gpio driver should not
>>    assume the loss count is 0 to begin with.
>> 2. The omap gpio driver should never be calling get_context_loss_count for a
>>    gpio bank in a always-on domain. This is pointless and adds unneccessary
>>    overhead.
>> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>>    will be 0 at the time the gpio driver is probed. However, it could be
>>    possible that this is not the case and an invalid context restore could be
>>    performed during the probe. To avoid this otherwise only populated the
> 
> The 'To avoid this...' sentence here doesn't read well.  Looks like you
> need to:
> 
> s/otherwise//

Yes, I meant to have dropped "otherwise" here. Thanks!

> s/populated/populate/

Yes that too! I must have re-worded and screwed it up royally :-(

> ?
> 
>>    get_context_loss_count() function pointer after the initial call to
>>    pm_runtime_get() has occurred. This will ensure that the first
>>    pm_runtime_put() initialised the loss count correctly.
>>
>> This patch addresses issues 2 and 3 above.
>> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
>>
>> Cc: Grant Likely <grant.likely@secretlab.ca>
>> Cc: Linus Walleij <linus.walleij@stericsson.com>
>> Cc: Kevin Hilman <khilman@ti.com>
>> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
>> Cc: Franky Lin <frankyl@broadcom.com>
>>
>> Reported-by: Franky Lin <frankyl@broadcom.com>
>> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> 
> Thanks for digging inot this bug Jon.  The same bug was brought up by
> Neil Brown (Cc'd) in a different thread.
> 
> Neil, it looks to me that this fix will address the problems you were
> seeing as well.  Care to test, and respond with your ack/tested-by if it
> works for you?  Thanks.

Neil, let me know your thoughts, and if you are OK with it, I can clean up
the changelog and resend.

Cheers
Jon
NeilBrown July 2, 2012, 11:34 p.m. UTC | #7
On Mon, 2 Jul 2012 13:26:38 -0500 Jon Hunter <jon-hunter@ti.com> wrote:

> 
> On 07/02/2012 01:07 PM, Kevin Hilman wrote:
> > + Neil Brown
> > 
> > Hi Jon,
> > 
> > Jon Hunter <jon-hunter@ti.com> writes:
> > 
> >> Currently the gpio _runtime_resume/suspend functions are calling the
> >> get_context_loss_count() platform function if the function is populated for
> >> a gpio bank. This function is used to determine if the gpio bank logic state
> >> needs to be restored due to a power transition. This function will be populated
> >> for all banks, but it should only be called for banks that have the
> >> "loses_context" variable set. It is pointless to call this if loses_context is
> >> false as we know the context will never be lost and will not need restoring.
> >>
> >> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
> >> never lose context. We found that the get_context_loss_count() was being called
> >> for bank-0 during the probe and returning 1 instead of 0 indicating that the
> >> context had been lost. This was causing the context restore function to be
> >> called at probe time for this bank and because the context had never been saved,
> >> was restoring an invalid state. This ultimately resulted in a crash [1].
> >>
> >> There are multiple bugs here that need to be addressed ...
> >>
> >> 1. Why the always-on power domain returns a context loss count of 1? This needs
> >>    to be fixed in the power domain code. However, the gpio driver should not
> >>    assume the loss count is 0 to begin with.
> >> 2. The omap gpio driver should never be calling get_context_loss_count for a
> >>    gpio bank in a always-on domain. This is pointless and adds unneccessary
> >>    overhead.
> >> 3. The OMAP gpio driver assumes that the initial power domain context loss count
> >>    will be 0 at the time the gpio driver is probed. However, it could be
> >>    possible that this is not the case and an invalid context restore could be
> >>    performed during the probe. To avoid this otherwise only populated the
> > 
> > The 'To avoid this...' sentence here doesn't read well.  Looks like you
> > need to:
> > 
> > s/otherwise//
> 
> Yes, I meant to have dropped "otherwise" here. Thanks!
> 
> > s/populated/populate/
> 
> Yes that too! I must have re-worded and screwed it up royally :-(
> 
> > ?
> > 
> >>    get_context_loss_count() function pointer after the initial call to
> >>    pm_runtime_get() has occurred. This will ensure that the first
> >>    pm_runtime_put() initialised the loss count correctly.
> >>
> >> This patch addresses issues 2 and 3 above.
> >> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
> >>
> >> Cc: Grant Likely <grant.likely@secretlab.ca>
> >> Cc: Linus Walleij <linus.walleij@stericsson.com>
> >> Cc: Kevin Hilman <khilman@ti.com>
> >> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
> >> Cc: Franky Lin <frankyl@broadcom.com>
> >>
> >> Reported-by: Franky Lin <frankyl@broadcom.com>
> >> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> > 
> > Thanks for digging inot this bug Jon.  The same bug was brought up by
> > Neil Brown (Cc'd) in a different thread.
> > 
> > Neil, it looks to me that this fix will address the problems you were
> > seeing as well.  Care to test, and respond with your ack/tested-by if it
> > works for you?  Thanks.
> 
> Neil let me know your thoughts and if you are ok, I can clean-up the
> changelog and re-send.

Yes, works for me and looks sensible.

 Tested-by: NeilBrown <neilb@suse.de>

Thanks,
NeilBrown
Kevin Hilman July 3, 2012, 12:05 a.m. UTC | #8
NeilBrown <neilb@suse.de> writes:

> On Mon, 2 Jul 2012 13:26:38 -0500 Jon Hunter <jon-hunter@ti.com> wrote:
>
>> 
>> On 07/02/2012 01:07 PM, Kevin Hilman wrote:
>> > + Neil Brown
>> > 
>> > Hi Jon,
>> > 
>> > Jon Hunter <jon-hunter@ti.com> writes:
>> > 
>> >> Currently the gpio _runtime_resume/suspend functions are calling the
>> >> get_context_loss_count() platform function if the function is populated for
>> >> a gpio bank. This function is used to determine if the gpio bank logic state
>> >> needs to be restored due to a power transition. This function will be populated
>> >> for all banks, but it should only be called for banks that have the
>> >> "loses_context" variable set. It is pointless to call this if loses_context is
>> >> false as we know the context will never be lost and will not need restoring.
>> >>
>> >> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
>> >> never lose context. We found that the get_context_loss_count() was being called
>> >> for bank-0 during the probe and returning 1 instead of 0 indicating that the
>> >> context had been lost. This was causing the context restore function to be
>> >> called at probe time for this bank and because the context had never been saved,
>> >> was restoring an invalid state. This ultimately resulted in a crash [1].
>> >>
>> >> There are multiple bugs here that need to be addressed ...
>> >>
>> >> 1. Why the always-on power domain returns a context loss count of 1? This needs
>> >>    to be fixed in the power domain code. However, the gpio driver should not
>> >>    assume the loss count is 0 to begin with.
>> >> 2. The omap gpio driver should never be calling get_context_loss_count for a
>> >>    gpio bank in a always-on domain. This is pointless and adds unneccessary
>> >>    overhead.
>> >> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>> >>    will be 0 at the time the gpio driver is probed. However, it could be
>> >>    possible that this is not the case and an invalid context restore could be
>> >>    performed during the probe. To avoid this otherwise only populated the
>> > 
>> > The 'To avoid this...' sentence here doesn't read well.  Looks like you
>> > need to:
>> > 
>> > s/otherwise//
>> 
>> Yes, I meant to have dropped "otherwise" here. Thanks!
>> 
>> > s/populated/populate/
>> 
>> Yes that too! I must have re-worded and screwed it up royally :-(
>> 
>> > ?
>> > 
>> >>    get_context_loss_count() function pointer after the initial call to
>> >>    pm_runtime_get() has occurred. This will ensure that the first
>> >>    pm_runtime_put() initialised the loss count correctly.
>> >>
>> >> This patch addresses issues 2 and 3 above.
>> >> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
>> >>
>> >> Cc: Grant Likely <grant.likely@secretlab.ca>
>> >> Cc: Linus Walleij <linus.walleij@stericsson.com>
>> >> Cc: Kevin Hilman <khilman@ti.com>
>> >> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
>> >> Cc: Franky Lin <frankyl@broadcom.com>
>> >>
>> >> Reported-by: Franky Lin <frankyl@broadcom.com>
>> >> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
>> > 
>> > Thanks for digging inot this bug Jon.  The same bug was brought up by
>> > Neil Brown (Cc'd) in a different thread.
>> > 
>> > Neil, it looks to me that this fix will address the problems you were
>> > seeing as well.  Care to test, and respond with your ack/tested-by if it
>> > works for you?  Thanks.
>> 
>> Neil let me know your thoughts and if you are ok, I can clean-up the
>> changelog and re-send.
>
> Yes, works for me and looks sensible.
>
>  Tested-by: NeilBrown <neilb@suse.de>
>

Great!  Thanks for testing.

Jon, please make the minor changelog edits, collect the reviewed-by and
tested-by tags and repost.  I'll then queue this up for Grant.

Based on your earlier comments, this only affects v3.5, so no
need to push it into stable, correct?

Kevin
Hunter, Jon July 3, 2012, 12:20 a.m. UTC | #9
On 07/02/2012 07:05 PM, Kevin Hilman wrote:
> NeilBrown <neilb@suse.de> writes:
> 
>> On Mon, 2 Jul 2012 13:26:38 -0500 Jon Hunter <jon-hunter@ti.com> wrote:
>>
>>>
>>> On 07/02/2012 01:07 PM, Kevin Hilman wrote:
>>>> + Neil Brown
>>>>
>>>> Hi Jon,
>>>>
>>>> Jon Hunter <jon-hunter@ti.com> writes:
>>>>
>>>>> Currently the gpio _runtime_resume/suspend functions are calling the
>>>>> get_context_loss_count() platform function if the function is populated for
>>>>> a gpio bank. This function is used to determine if the gpio bank logic state
>>>>> needs to be restored due to a power transition. This function will be populated
>>>>> for all banks, but it should only be called for banks that have the
>>>>> "loses_context" variable set. It is pointless to call this if loses_context is
>>>>> false as we know the context will never be lost and will not need restoring.
>>>>>
>>>>> For all OMAP2+ devices gpio bank-0 is in an always-on power domain and so will
>>>>> never lose context. We found that the get_context_loss_count() was being called
>>>>> for bank-0 during the probe and returning 1 instead of 0 indicating that the
>>>>> context had been lost. This was causing the context restore function to be
>>>>> called at probe time for this bank and because the context had never been saved,
>>>>> was restoring an invalid state. This ultimately resulted in a crash [1].
>>>>>
>>>>> There are multiple bugs here that need to be addressed ...
>>>>>
>>>>> 1. Why the always-on power domain returns a context loss count of 1? This needs
>>>>>    to be fixed in the power domain code. However, the gpio driver should not
>>>>>    assume the loss count is 0 to begin with.
>>>>> 2. The omap gpio driver should never be calling get_context_loss_count for a
>>>>>    gpio bank in a always-on domain. This is pointless and adds unneccessary
>>>>>    overhead.
>>>>> 3. The OMAP gpio driver assumes that the initial power domain context loss count
>>>>>    will be 0 at the time the gpio driver is probed. However, it could be
>>>>>    possible that this is not the case and an invalid context restore could be
>>>>>    performed during the probe. To avoid this otherwise only populated the
>>>>
>>>> The 'To avoid this...' sentence here doesn't read well.  Looks like you
>>>> need to:
>>>>
>>>> s/otherwise//
>>>
>>> Yes, I meant to have dropped "otherwise" here. Thanks!
>>>
>>>> s/populated/populate/
>>>
>>> Yes that too! I must have re-worded and screwed it up royally :-(
>>>
>>>> ?
>>>>
>>>>>    get_context_loss_count() function pointer after the initial call to
>>>>>    pm_runtime_get() has occurred. This will ensure that the first
>>>>>    pm_runtime_put() initialised the loss count correctly.
>>>>>
>>>>> This patch addresses issues 2 and 3 above.
>>>>> [1] http://marc.info/?l=linux-omap&m=134065775323775&w=2
>>>>>
>>>>> Cc: Grant Likely <grant.likely@secretlab.ca>
>>>>> Cc: Linus Walleij <linus.walleij@stericsson.com>
>>>>> Cc: Kevin Hilman <khilman@ti.com>
>>>>> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com>
>>>>> Cc: Franky Lin <frankyl@broadcom.com>
>>>>>
>>>>> Reported-by: Franky Lin <frankyl@broadcom.com>
>>>>> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
>>>>
>>>> Thanks for digging inot this bug Jon.  The same bug was brought up by
>>>> Neil Brown (Cc'd) in a different thread.
>>>>
>>>> Neil, it looks to me that this fix will address the problems you were
>>>> seeing as well.  Care to test, and respond with your ack/tested-by if it
>>>> works for you?  Thanks.
>>>
>>> Neil let me know your thoughts and if you are ok, I can clean-up the
>>> changelog and re-send.
>>
>> Yes, works for me and looks sensible.
>>
>>  Tested-by: NeilBrown <neilb@suse.de>
>>
> 
> Great!  Thanks for testing.
> 
> Jon, please make the minor changelog edits, collect the reviewed-by and
> tested-by tags and repost.  I'll then queue this up for Grant.

Ok, will do that tomorrow.

> Based on your earlier comments, this only affects v3.5, so no
> need to push it into stable, correct?

As far as I can tell, yes. However, I am not sure whether any of the other
fixes should be backported.

Cheers
Jon

Patch

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..f13fc9c 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1081,7 +1081,6 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
 	bank->is_mpuio = pdata->is_mpuio;
 	bank->non_wakeup_gpios = pdata->non_wakeup_gpios;
 	bank->loses_context = pdata->loses_context;
-	bank->get_context_loss_count = pdata->get_context_loss_count;
 	bank->regs = pdata->regs;
 #ifdef CONFIG_OF_GPIO
 	bank->chip.of_node = of_node_get(node);
@@ -1135,6 +1134,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
 	omap_gpio_chip_init(bank);
 	omap_gpio_show_rev(bank);
 
+	if (bank->loses_context)
+		bank->get_context_loss_count = pdata->get_context_loss_count;
+
 	pm_runtime_put(bank->dev);
 
 	list_add_tail(&bank->node, &omap_gpio_list);