diff mbox

[v3,2/8] PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device

Message ID 20170628145623.20716-3-krzk@kernel.org (mailing list archive)
State Mainlined
Delegated to: Rafael Wysocki
Headers show

Commit Message

Krzysztof Kozlowski June 28, 2017, 2:56 p.m. UTC
genpd_syscore_switch() had two problems:
1. It silently assumed that device, it is being called for, belongs to
   generic power domain and used container_of() on its power domain
   pointer.  Such assumption might not be true always.

2. It iterated over list of generic power domains without holding
   gpd_list_lock mutex thus list could have been modified in the same
   time.

Usage of genpd_lookup_dev() solves both problems as it is safe a call
for non-generic power domains and uses mutex when iterating.

Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>

---

Not tested on real hardware.
---
 drivers/base/power/domain.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Geert Uytterhoeven July 4, 2017, 1:01 p.m. UTC | #1
Hi Krzysztof, Rafael,

On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> genpd_syscore_switch() had two problems:
> 1. It silently assumed that device, it is being called for, belongs to
>    generic power domain and used container_of() on its power domain
>    pointer.  Such assumption might not be true always.
>
> 2. It iterated over list of generic power domains without holding
>    gpd_list_lock mutex thus list could have been modified in the same
>    time.
>
> Usage of genpd_lookup_dev() solves both problems as it is safe a call
> for non-generic power domains and uses mutex when iterating.
>
> Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>

This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
request "[GIT PULL] Power management updates for v4.13-rc1".

> Not tested on real hardware.

This causes the following BUG during s2ram on all my Renesas arm32 boards,
where the system timer is an IRQ safe device:

PM: Syncing filesystems ... done.
PM: Preparing system for sleep (mem)
Freezing user space processes ... (elapsed 0.001 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: Suspending system (mem)
PM: suspend of devices complete after 122.841 msecs
PM: suspend devices took 0.130 seconds
PM: late suspend of devices complete after 2.682 msecs
PM: noirq suspend of devices complete after 29.951 msecs
Disabling non-boot CPUs ...
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
in_atomic(): 0, irqs_disabled(): 128, pid: 1730, name: s2ram
CPU: 0 PID: 1730 Comm: s2ram Not tainted
4.12.0-koelsch-07061-g810fee9afeba15ef #3592
Hardware name: Generic R8A7791 (Flattened Device Tree)
[<c020e9f4>] (unwind_backtrace) from [<c020a484>] (show_stack+0x10/0x14)
[<c020a484>] (show_stack) from [<c04017e8>] (dump_stack+0x7c/0x9c)
[<c04017e8>] (dump_stack) from [<c0240284>] (___might_sleep+0x124/0x160)
[<c0240284>] (___might_sleep) from [<c0717cfc>] (mutex_lock+0x18/0x60)
[<c0717cfc>] (mutex_lock) from [<c04de11c>] (genpd_lookup_dev+0x38/0x94)
[<c04de11c>] (genpd_lookup_dev) from [<c04dfd34>]
(pm_genpd_syscore_poweroff+0x8/0x2c)
[<c04dfd34>] (pm_genpd_syscore_poweroff) from [<c05fcb70>]
(sh_cmt_clock_event_suspend+0x18/0x28)
[<c05fcb70>] (sh_cmt_clock_event_suspend) from [<c027f174>]
(clockevents_suspend+0x40/0x54)
[<c027f174>] (clockevents_suspend) from [<c02762d8>]
(timekeeping_suspend+0x23c/0x278)
[<c02762d8>] (timekeeping_suspend) from [<c04ce028>]
(syscore_suspend+0x88/0x138)
[<c04ce028>] (syscore_suspend) from [<c025d740>]
(suspend_devices_and_enter+0x308/0x4fc)
[<c025d740>] (suspend_devices_and_enter) from [<c025db5c>]
(pm_suspend+0x228/0x280)
[<c025db5c>] (pm_suspend) from [<c025c6b8>] (state_store+0xac/0xcc)
[<c025c6b8>] (state_store) from [<c0342f9c>] (kernfs_fop_write+0x170/0x1b0)
[<c0342f9c>] (kernfs_fop_write) from [<c02e47dc>] (__vfs_write+0x20/0x108)
[<c02e47dc>] (__vfs_write) from [<c02e5b80>] (vfs_write+0xb8/0x144)
[<c02e5b80>] (vfs_write) from [<c02e6804>] (SyS_write+0x40/0x80)
[<c02e6804>] (SyS_write) from [<c0206c40>] (ret_fast_syscall+0x0/0x34)

Reverting the commit fixes the issue.

> ---
>  drivers/base/power/domain.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 01e31d9f6c94..d31a4434b8b3 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1098,8 +1098,8 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
>  {
>         struct generic_pm_domain *genpd;
>
> -       genpd = dev_to_genpd(dev);
> -       if (!pm_genpd_present(genpd))
> +       genpd = genpd_lookup_dev(dev);
> +       if (!genpd)
>                 return;
>
>         if (suspend) {

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Krzysztof Kozlowski July 4, 2017, 6:10 p.m. UTC | #2
On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
> Hi Krzysztof, Rafael,
> 
> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> > genpd_syscore_switch() had two problems:
> > 1. It silently assumed that device, it is being called for, belongs to
> >    generic power domain and used container_of() on its power domain
> >    pointer.  Such assumption might not be true always.
> >
> > 2. It iterated over list of generic power domains without holding
> >    gpd_list_lock mutex thus list could have been modified in the same
> >    time.
> >
> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
> > for non-generic power domains and uses mutex when iterating.
> >
> > Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
> > Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
> 
> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
> request "[GIT PULL] Power management updates for v4.13-rc1".
> 
> > Not tested on real hardware.
> 
> This causes the following BUG during s2ram on all my Renesas arm32 boards,
> where the system timer is an IRQ safe device:
> 
> PM: Syncing filesystems ... done.
> PM: Preparing system for sleep (mem)
> Freezing user space processes ... (elapsed 0.001 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> PM: Suspending system (mem)
> PM: suspend of devices complete after 122.841 msecs
> PM: suspend devices took 0.130 seconds
> PM: late suspend of devices complete after 2.682 msecs
> PM: noirq suspend of devices complete after 29.951 msecs
> Disabling non-boot CPUs ...
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238

Hi,

Thanks for report!

Damn it, although I couldn't find this in the code, but I was fearing
that this ends up in atomic section. That would kind of explain why
mutex was not there [1].

Anyway, the buggy code was there already. Instead of "sleeping in atomic
section" there was no locking at all... In this context this was
probably safe because it was executed *after* disabling non-boot CPUs
but then the function cannot be called in other contexts.

I am not sure I will be capable of developing the proper fix as I do not
have the hardware and I do not know all stuff happening in sh suspend.
Probably reverting this and living with non-locked path would be the
safest choice.

[1] https://patchwork.kernel.org/patch/9778903/

Best regards,
Krzysztof


> in_atomic(): 0, irqs_disabled(): 128, pid: 1730, name: s2ram
> CPU: 0 PID: 1730 Comm: s2ram Not tainted
> 4.12.0-koelsch-07061-g810fee9afeba15ef #3592
> Hardware name: Generic R8A7791 (Flattened Device Tree)
> [<c020e9f4>] (unwind_backtrace) from [<c020a484>] (show_stack+0x10/0x14)
> [<c020a484>] (show_stack) from [<c04017e8>] (dump_stack+0x7c/0x9c)
> [<c04017e8>] (dump_stack) from [<c0240284>] (___might_sleep+0x124/0x160)
> [<c0240284>] (___might_sleep) from [<c0717cfc>] (mutex_lock+0x18/0x60)
> [<c0717cfc>] (mutex_lock) from [<c04de11c>] (genpd_lookup_dev+0x38/0x94)
> [<c04de11c>] (genpd_lookup_dev) from [<c04dfd34>]
> (pm_genpd_syscore_poweroff+0x8/0x2c)
> [<c04dfd34>] (pm_genpd_syscore_poweroff) from [<c05fcb70>]
> (sh_cmt_clock_event_suspend+0x18/0x28)
> [<c05fcb70>] (sh_cmt_clock_event_suspend) from [<c027f174>]
> (clockevents_suspend+0x40/0x54)
> [<c027f174>] (clockevents_suspend) from [<c02762d8>]
> (timekeeping_suspend+0x23c/0x278)
> [<c02762d8>] (timekeeping_suspend) from [<c04ce028>]
> (syscore_suspend+0x88/0x138)
> [<c04ce028>] (syscore_suspend) from [<c025d740>]
> (suspend_devices_and_enter+0x308/0x4fc)
> [<c025d740>] (suspend_devices_and_enter) from [<c025db5c>]
> (pm_suspend+0x228/0x280)
> [<c025db5c>] (pm_suspend) from [<c025c6b8>] (state_store+0xac/0xcc)
> [<c025c6b8>] (state_store) from [<c0342f9c>] (kernfs_fop_write+0x170/0x1b0)
> [<c0342f9c>] (kernfs_fop_write) from [<c02e47dc>] (__vfs_write+0x20/0x108)
> [<c02e47dc>] (__vfs_write) from [<c02e5b80>] (vfs_write+0xb8/0x144)
> [<c02e5b80>] (vfs_write) from [<c02e6804>] (SyS_write+0x40/0x80)
> [<c02e6804>] (SyS_write) from [<c0206c40>] (ret_fast_syscall+0x0/0x34)
> 
> Reverting the commit fixes the issue.
> 
> > ---
> >  drivers/base/power/domain.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 01e31d9f6c94..d31a4434b8b3 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -1098,8 +1098,8 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
> >  {
> >         struct generic_pm_domain *genpd;
> >
> > -       genpd = dev_to_genpd(dev);
> > -       if (!pm_genpd_present(genpd))
> > +       genpd = genpd_lookup_dev(dev);
> > +       if (!genpd)
> >                 return;
> >
> >         if (suspend) {
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Geert Uytterhoeven July 4, 2017, 6:19 p.m. UTC | #3
Hi Krzysztof,

On Tue, Jul 4, 2017 at 8:10 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
>> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> > genpd_syscore_switch() had two problems:
>> > 1. It silently assumed that device, it is being called for, belongs to
>> >    generic power domain and used container_of() on its power domain
>> >    pointer.  Such assumption might not be true always.
>> >
>> > 2. It iterated over list of generic power domains without holding
>> >    gpd_list_lock mutex thus list could have been modified in the same
>> >    time.
>> >
>> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
>> > for non-generic power domains and uses mutex when iterating.
>> >
>> > Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
>> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
>> > Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
>>
>> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
>> request "[GIT PULL] Power management updates for v4.13-rc1".
>>
>> > Not tested on real hardware.
>>
>> This causes the following BUG during s2ram on all my Renesas arm32 boards,
>> where the system timer is an IRQ safe device:
>>
>> PM: Syncing filesystems ... done.
>> PM: Preparing system for sleep (mem)
>> Freezing user space processes ... (elapsed 0.001 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> PM: Suspending system (mem)
>> PM: suspend of devices complete after 122.841 msecs
>> PM: suspend devices took 0.130 seconds
>> PM: late suspend of devices complete after 2.682 msecs
>> PM: noirq suspend of devices complete after 29.951 msecs
>> Disabling non-boot CPUs ...
>> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
>
> Thanks for report!
>
> Damn it, although I couldn't find this in the code, but I was fearing
> that this ends up in atomic section. That would kind of explain why
> mutex was not there [1].
>
> Anyway, the buggy code was there already. Instead of "sleeping in atomic
> section" there was no locking at all... In this context this was
> probably safe because it was executed *after* disabling non-boot CPUs
> but then the function cannot be called in other contexts.
>
> I am not sure I will be capable of developing the proper fix as I do not
> have the hardware and I do not know all stuff happening in sh suspend.
> Probably reverting this and living with non-locked path would be the
> safest choice.
>
> [1] https://patchwork.kernel.org/patch/9778903/

AFAIU, all syscore stuff runs in atomic context.

Don't worry, you're not the only one.
This bug report was almost 100% the same as an earlier one for a patch
from Ulf ;-)
(cfr. "[RESEND PATCH 0/2] PM / Domains: Fix asynchronous execution of
*noirq() callbacks")

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Krzysztof Kozlowski July 4, 2017, 6:36 p.m. UTC | #4
On Tue, Jul 04, 2017 at 08:19:47PM +0200, Geert Uytterhoeven wrote:
> Hi Krzysztof,
> 
> On Tue, Jul 4, 2017 at 8:10 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> > On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
> >> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> >> > genpd_syscore_switch() had two problems:
> >> > 1. It silently assumed that device, it is being called for, belongs to
> >> >    generic power domain and used container_of() on its power domain
> >> >    pointer.  Such assumption might not be true always.
> >> >
> >> > 2. It iterated over list of generic power domains without holding
> >> >    gpd_list_lock mutex thus list could have been modified in the same
> >> >    time.
> >> >
> >> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
> >> > for non-generic power domains and uses mutex when iterating.
> >> >
> >> > Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
> >> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
> >> > Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
> >>
> >> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
> >> request "[GIT PULL] Power management updates for v4.13-rc1".
> >>
> >> > Not tested on real hardware.
> >>
> >> This causes the following BUG during s2ram on all my Renesas arm32 boards,
> >> where the system timer is an IRQ safe device:
> >>
> >> PM: Syncing filesystems ... done.
> >> PM: Preparing system for sleep (mem)
> >> Freezing user space processes ... (elapsed 0.001 seconds) done.
> >> OOM killer disabled.
> >> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >> PM: Suspending system (mem)
> >> PM: suspend of devices complete after 122.841 msecs
> >> PM: suspend devices took 0.130 seconds
> >> PM: late suspend of devices complete after 2.682 msecs
> >> PM: noirq suspend of devices complete after 29.951 msecs
> >> Disabling non-boot CPUs ...
> >> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
> >
> > Thanks for report!
> >
> > Damn it, although I couldn't find this in the code, but I was fearing
> > that this ends up in atomic section. That would kind of explain why
> > mutex was not there [1].
> >
> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
> > section" there was no locking at all... In this context this was
> > probably safe because it was executed *after* disabling non-boot CPUs
> > but then the function cannot be called in other contexts.
> >
> > I am not sure I will be capable of developing the proper fix as I do not
> > have the hardware and I do not know all stuff happening in sh suspend.
> > Probably reverting this and living with non-locked path would be the
> > safest choice.
> >
> > [1] https://patchwork.kernel.org/patch/9778903/
> 
> AFAIU, all syscore stuff runs in atomic context.

Indeed... The confusing part is that this code is syscore only from
the name, it is not hooked in to syscore_ops. Although going by call
chain (through sh clocksource drivers) we end up in
timekeeping_suspend() which is a syscore op.

I wonder whether it would be useful - after reverting my commit - to add
an assert (which is a stronger API requirement than only documentation "may
only be called during the system core (syscore) suspend") like:
	WARN_ON(num_online_cpus() > 1));
as without mutexes this should not be executed with more than one online
CPU.

Best regards,
Krzysztof

> 
> Don't worry, you're not the only one.
> This bug report was almost 100% the same as an earlier one for a patch
> from Ulf ;-)
> (cfr. "[RESEND PATCH 0/2] PM / Domains: Fix asynchronous execution of
> *noirq() callbacks")
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Rafael J. Wysocki July 4, 2017, 7:54 p.m. UTC | #5
On Tue, Jul 4, 2017 at 8:36 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> On Tue, Jul 04, 2017 at 08:19:47PM +0200, Geert Uytterhoeven wrote:
>> Hi Krzysztof,
>>
>> On Tue, Jul 4, 2017 at 8:10 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> > On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
>> >> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> >> > genpd_syscore_switch() had two problems:
>> >> > 1. It silently assumed that device, it is being called for, belongs to
>> >> >    generic power domain and used container_of() on its power domain
>> >> >    pointer.  Such assumption might not be true always.
>> >> >
>> >> > 2. It iterated over list of generic power domains without holding
>> >> >    gpd_list_lock mutex thus list could have been modified in the same
>> >> >    time.
>> >> >
>> >> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
>> >> > for non-generic power domains and uses mutex when iterating.
>> >> >
>> >> > Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
>> >> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
>> >> > Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
>> >>
>> >> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
>> >> request "[GIT PULL] Power management updates for v4.13-rc1".
>> >>
>> >> > Not tested on real hardware.
>> >>
>> >> This causes the following BUG during s2ram on all my Renesas arm32 boards,
>> >> where the system timer is an IRQ safe device:
>> >>
>> >> PM: Syncing filesystems ... done.
>> >> PM: Preparing system for sleep (mem)
>> >> Freezing user space processes ... (elapsed 0.001 seconds) done.
>> >> OOM killer disabled.
>> >> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> >> PM: Suspending system (mem)
>> >> PM: suspend of devices complete after 122.841 msecs
>> >> PM: suspend devices took 0.130 seconds
>> >> PM: late suspend of devices complete after 2.682 msecs
>> >> PM: noirq suspend of devices complete after 29.951 msecs
>> >> Disabling non-boot CPUs ...
>> >> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
>> >
>> > Thanks for report!
>> >
>> > Damn it, although I couldn't find this in the code, but I was fearing
>> > that this ends up in atomic section. That would kind of explain why
>> > mutex was not there [1].
>> >
>> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
>> > section" there was no locking at all... In this context this was
>> > probably safe because it was executed *after* disabling non-boot CPUs
>> > but then the function cannot be called in other contexts.
>> >
>> > I am not sure I will be capable of developing the proper fix as I do not
>> > have the hardware and I do not know all stuff happening in sh suspend.
>> > Probably reverting this and living with non-locked path would be the
>> > safest choice.
>> >
>> > [1] https://patchwork.kernel.org/patch/9778903/
>>
>> AFAIU, all syscore stuff runs in atomic context.
>
> Indeed... The confusing part is that this code is syscore only from
> the name, it is not hooked in to syscore_ops. Although going by call
> chain (through sh clocksource drivers) we end up in
> timekeeping_suspend() which is a syscore op.
>
> I wonder whether it would be useful - after reverting my commit - to add
> an assert (which is a stronger API requirement than only documentation "may
> only be called during the system core (syscore) suspend") like:
>         WARN_ON(num_online_cpus() > 1));
> as without mutexes this should not be executed with more than one online
> CPU.

Or maybe WARN_ON_ONCE(!in_atomic())?

I'm queuing up a revert of the $subject commit.

Thanks,
Rafael
Krzysztof Kozlowski July 4, 2017, 8:05 p.m. UTC | #6
On Tue, Jul 04, 2017 at 09:54:10PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 4, 2017 at 8:36 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> > On Tue, Jul 04, 2017 at 08:19:47PM +0200, Geert Uytterhoeven wrote:
> >> Hi Krzysztof,
> >>
> >> On Tue, Jul 4, 2017 at 8:10 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> >> > On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
> >> >> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> >> >> > genpd_syscore_switch() had two problems:
> >> >> > 1. It silently assumed that device, it is being called for, belongs to
> >> >> >    generic power domain and used container_of() on its power domain
> >> >> >    pointer.  Such assumption might not be true always.
> >> >> >
> >> >> > 2. It iterated over list of generic power domains without holding
> >> >> >    gpd_list_lock mutex thus list could have been modified in the same
> >> >> >    time.
> >> >> >
> >> >> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
> >> >> > for non-generic power domains and uses mutex when iterating.
> >> >> >
> >> >> > Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
> >> >> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
> >> >> > Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
> >> >>
> >> >> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
> >> >> request "[GIT PULL] Power management updates for v4.13-rc1".
> >> >>
> >> >> > Not tested on real hardware.
> >> >>
> >> >> This causes the following BUG during s2ram on all my Renesas arm32 boards,
> >> >> where the system timer is an IRQ safe device:
> >> >>
> >> >> PM: Syncing filesystems ... done.
> >> >> PM: Preparing system for sleep (mem)
> >> >> Freezing user space processes ... (elapsed 0.001 seconds) done.
> >> >> OOM killer disabled.
> >> >> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >> >> PM: Suspending system (mem)
> >> >> PM: suspend of devices complete after 122.841 msecs
> >> >> PM: suspend devices took 0.130 seconds
> >> >> PM: late suspend of devices complete after 2.682 msecs
> >> >> PM: noirq suspend of devices complete after 29.951 msecs
> >> >> Disabling non-boot CPUs ...
> >> >> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
> >> >
> >> > Thanks for report!
> >> >
> >> > Damn it, although I couldn't find this in the code, but I was fearing
> >> > that this ends up in atomic section. That would kind of explain why
> >> > mutex was not there [1].
> >> >
> >> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
> >> > section" there was no locking at all... In this context this was
> >> > probably safe because it was executed *after* disabling non-boot CPUs
> >> > but then the function cannot be called in other contexts.
> >> >
> >> > I am not sure I will be capable of developing the proper fix as I do not
> >> > have the hardware and I do not know all stuff happening in sh suspend.
> >> > Probably reverting this and living with non-locked path would be the
> >> > safest choice.
> >> >
> >> > [1] https://patchwork.kernel.org/patch/9778903/
> >>
> >> AFAIU, all syscore stuff runs in atomic context.
> >
> > Indeed... The confusing part is that this code is syscore only from
> > the name, it is not hooked in to syscore_ops. Although going by call
> > chain (through sh clocksource drivers) we end up in
> > timekeeping_suspend() which is a syscore op.
> >
> > I wonder whether it would be useful - after reverting my commit - to add
> > an assert (which is a stronger API requirement than only documentation "may
> > only be called during the system core (syscore) suspend") like:
> >         WARN_ON(num_online_cpus() > 1));
> > as without mutexes this should not be executed with more than one online
> > CPU.
> 
> Or maybe WARN_ON_ONCE(!in_atomic())?

You could be in atomic section on this CPU and still have other CPUs
online playing with gpd_list (without any protection from locking).
This function is safe only on non-SMP case.

Best regards,
Krzysztof
 
> I'm queuing up a revert of the $subject commit.
> 
> Thanks,
> Rafael
Rafael J. Wysocki July 4, 2017, 8:12 p.m. UTC | #7
On Tue, Jul 4, 2017 at 10:05 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> On Tue, Jul 04, 2017 at 09:54:10PM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 4, 2017 at 8:36 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> > On Tue, Jul 04, 2017 at 08:19:47PM +0200, Geert Uytterhoeven wrote:
>> >> Hi Krzysztof,
>> >>
>> >> On Tue, Jul 4, 2017 at 8:10 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> >> > On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
>> >> >> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>> >> >> > genpd_syscore_switch() had two problems:
>> >> >> > 1. It silently assumed that device, it is being called for, belongs to
>> >> >> >    generic power domain and used container_of() on its power domain
>> >> >> >    pointer.  Such assumption might not be true always.
>> >> >> >
>> >> >> > 2. It iterated over list of generic power domains without holding
>> >> >> >    gpd_list_lock mutex thus list could have been modified in the same
>> >> >> >    time.
>> >> >> >
>> >> >> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
>> >> >> > for non-generic power domains and uses mutex when iterating.
>> >> >> >
>> >> >> > Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
>> >> >> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
>> >> >> > Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
>> >> >>
>> >> >> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
>> >> >> request "[GIT PULL] Power management updates for v4.13-rc1".
>> >> >>
>> >> >> > Not tested on real hardware.
>> >> >>
>> >> >> This causes the following BUG during s2ram on all my Renesas arm32 boards,
>> >> >> where the system timer is an IRQ safe device:
>> >> >>
>> >> >> PM: Syncing filesystems ... done.
>> >> >> PM: Preparing system for sleep (mem)
>> >> >> Freezing user space processes ... (elapsed 0.001 seconds) done.
>> >> >> OOM killer disabled.
>> >> >> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> >> >> PM: Suspending system (mem)
>> >> >> PM: suspend of devices complete after 122.841 msecs
>> >> >> PM: suspend devices took 0.130 seconds
>> >> >> PM: late suspend of devices complete after 2.682 msecs
>> >> >> PM: noirq suspend of devices complete after 29.951 msecs
>> >> >> Disabling non-boot CPUs ...
>> >> >> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
>> >> >
>> >> > Thanks for report!
>> >> >
>> >> > Damn it, although I couldn't find this in the code, but I was fearing
>> >> > that this ends up in atomic section. That would kind of explain why
>> >> > mutex was not there [1].
>> >> >
>> >> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
>> >> > section" there was no locking at all... In this context this was
>> >> > probably safe because it was executed *after* disabling non-boot CPUs
>> >> > but then the function cannot be called in other contexts.
>> >> >
>> >> > I am not sure I will be capable of developing the proper fix as I do not
>> >> > have the hardware and I do not know all stuff happening in sh suspend.
>> >> > Probably reverting this and living with non-locked path would be the
>> >> > safest choice.
>> >> >
>> >> > [1] https://patchwork.kernel.org/patch/9778903/
>> >>
>> >> AFAIU, all syscore stuff runs in atomic context.
>> >
>> > Indeed... The confusing part is that this code is syscore only from
>> > the name, it is not hooked in to syscore_ops. Although going by call
>> > chain (through sh clocksource drivers) we end up in
>> > timekeeping_suspend() which is a syscore op.
>> >
>> > I wonder whether it would be useful - after reverting my commit - to add
>> > an assert (which is a stronger API requirement than only documentation "may
>> > only be called during the system core (syscore) suspend") like:
>> >         WARN_ON(num_online_cpus() > 1));
>> > as without mutexes this should not be executed with more than one online
>> > CPU.
>>
>> Or maybe WARN_ON_ONCE(!in_atomic())?
>
> You could be in atomic section on this CPU and still have other CPUs
> online playing with gpd_list (without any protection from locking).
> This function is safe only on non-SMP case.

Well, not quite.

It is safe if you can guarantee that no other CPUs will touch the data
structure in question concurrently, which pretty much is the case for
timekeeping_suspend() even though it may be invoked without taking the
other CPUs offline (from the suspend-to-idle core path).

Thanks,
Rafael
Krzysztof Kozlowski July 4, 2017, 8:20 p.m. UTC | #8
On Tue, Jul 04, 2017 at 10:12:13PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 4, 2017 at 10:05 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
 >> >> > Thanks for report!
> >> >> >
> >> >> > Damn it, although I couldn't find this in the code, but I was fearing
> >> >> > that this ends up in atomic section. That would kind of explain why
> >> >> > mutex was not there [1].
> >> >> >
> >> >> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
> >> >> > section" there was no locking at all... In this context this was
> >> >> > probably safe because it was executed *after* disabling non-boot CPUs
> >> >> > but then the function cannot be called in other contexts.
> >> >> >
> >> >> > I am not sure I will be capable of developing the proper fix as I do not
> >> >> > have the hardware and I do not know all stuff happening in sh suspend.
> >> >> > Probably reverting this and living with non-locked path would be the
> >> >> > safest choice.
> >> >> >
> >> >> > [1] https://patchwork.kernel.org/patch/9778903/
> >> >>
> >> >> AFAIU, all syscore stuff runs in atomic context.
> >> >
> >> > Indeed... The confusing part is that this code is syscore only from
> >> > the name, it is not hooked in to syscore_ops. Although going by call
> >> > chain (through sh clocksource drivers) we end up in
> >> > timekeeping_suspend() which is a syscore op.
> >> >
> >> > I wonder whether it would be useful - after reverting my commit - to add
> >> > an assert (which is a stronger API requirement than only documentation "may
> >> > only be called during the system core (syscore) suspend") like:
> >> >         WARN_ON(num_online_cpus() > 1));
> >> > as without mutexes this should not be executed with more than one online
> >> > CPU.
> >>
> >> Or maybe WARN_ON_ONCE(!in_atomic())?
> >
> > You could be in atomic section on this CPU and still have other CPUs
> > online playing with gpd_list (without any protection from locking).
> > This function is safe only on non-SMP case.
> 
> Well, not quite.
> 
> It is safe if you can guarantee that no other CPUs will touch the data
> structure in question concurrently, which pretty much is the case for
> timekeeping_suspend() even though it may be invoked without taking the
> other CPUs offline (from the suspend-to-idle core path).

Right, that would work fine for that case.

However I was rather thinking that we have an in-kernel API (exported)
so someone might by mistake try to use it in different contexts. For
example in some atomic section but on a platform which offlines CPUs
later. Thus it would be called in some imaginary suspend path but with
CPUs still being online. Partially it is already mentioned in documentation
although I am not sure that on every possible architecture syscore ops
are called after disabling non-boot CPUs...

And anyway for in-kernel API having a explicit assert is a stronger way
of documenting requirement than a comment.

Best regards,
Krzysztof
Rafael J. Wysocki July 4, 2017, 8:28 p.m. UTC | #9
On Tue, Jul 4, 2017 at 10:20 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> On Tue, Jul 04, 2017 at 10:12:13PM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 4, 2017 at 10:05 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>  >> >> > Thanks for report!
>> >> >> >
>> >> >> > Damn it, although I couldn't find this in the code, but I was fearing
>> >> >> > that this ends up in atomic section. That would kind of explain why
>> >> >> > mutex was not there [1].
>> >> >> >
>> >> >> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
>> >> >> > section" there was no locking at all... In this context this was
>> >> >> > probably safe because it was executed *after* disabling non-boot CPUs
>> >> >> > but then the function cannot be called in other contexts.
>> >> >> >
>> >> >> > I am not sure I will be capable of developing the proper fix as I do not
>> >> >> > have the hardware and I do not know all stuff happening in sh suspend.
>> >> >> > Probably reverting this and living with non-locked path would be the
>> >> >> > safest choice.
>> >> >> >
>> >> >> > [1] https://patchwork.kernel.org/patch/9778903/
>> >> >>
>> >> >> AFAIU, all syscore stuff runs in atomic context.
>> >> >
>> >> > Indeed... The confusing part is that this code is syscore only from
>> >> > the name, it is not hooked in to syscore_ops. Although going by call
>> >> > chain (through sh clocksource drivers) we end up in
>> >> > timekeeping_suspend() which is a syscore op.
>> >> >
>> >> > I wonder whether it would be useful - after reverting my commit - to add
>> >> > an assert (which is a stronger API requirement than only documentation "may
>> >> > only be called during the system core (syscore) suspend") like:
>> >> >         WARN_ON(num_online_cpus() > 1));
>> >> > as without mutexes this should not be executed with more than one online
>> >> > CPU.
>> >>
>> >> Or maybe WARN_ON_ONCE(!in_atomic())?
>> >
>> > You could be in atomic section on this CPU and still have other CPUs
>> > online playing with gpd_list (without any protection from locking).
>> > This function is safe only on non-SMP case.
>>
>> Well, not quite.
>>
>> It is safe if you can guarantee that no other CPUs will touch the data
>> structure in question concurrently, which pretty much is the case for
>> timekeeping_suspend() even though it may be invoked without taking the
>> other CPUs offline (from the suspend-to-idle core path).
>
> Right, that would work fine for that case.
>
> However I was rather thinking that we have an in-kernel API (exported)
> so someone might by mistake try to use it in different contexts. For
> example in some atomic section but on a platform which offlines CPUs
> later. Thus it would be called in some imaginary suspend path but with
> CPUs still being online. Partially it is already mentioned in documentation
> although I am not sure that on every possible architecture syscore ops
> are called after disabling non-boot CPUs...

Yes, they are.  Nonboot CPUs are disabled by the core.

Anyway, while I see your point, it would be rather hard to find an assertion
that would also work for the suspend-to-idle timekeeping_suspend() invocation
case.

Thanks,
Rafael
diff mbox

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 01e31d9f6c94..d31a4434b8b3 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1098,8 +1098,8 @@  static void genpd_syscore_switch(struct device *dev, bool suspend)
 {
 	struct generic_pm_domain *genpd;
 
-	genpd = dev_to_genpd(dev);
-	if (!pm_genpd_present(genpd))
+	genpd = genpd_lookup_dev(dev);
+	if (!genpd)
 		return;
 
 	if (suspend) {