mbox series

[0/2] cpufreq/opp: rework regulator initialization

Message ID 20190207122227.19873-1-m.szyprowski@samsung.com (mailing list archive)
Headers show
Series cpufreq/opp: rework regulator initialization | expand

Message

Marek Szyprowski Feb. 7, 2019, 12:22 p.m. UTC
Dear All,

Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
i2c adapters") added a visible warning for an attempt to do i2c transfer
over a suspended i2c bus. This revealed a long standing issue in the
cpufreq-dt driver, which gives a following warning during system
suspend/resume cycle:

--->8---
Enabling non-boot CPUs ...
CPU1 is up
CPU2 is up
CPU3 is up
------------[ cut here ]------------
WARNING: CPU: 4 PID: 29 at drivers/i2c/i2c-core-base.c:1869 __i2c_transfer+0x6f8/0xa50
Modules linked in:
CPU: 4 PID: 29 Comm: cpuhp/4 Tainted: G        W         5.0.0-rc4-next-20190131-00024-g54b06b29cc65 #5324
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c01110e8>] (unwind_backtrace) from [<c010d11c>] (show_stack+0x10/0x14)
[<c010d11c>] (show_stack) from [<c09a2584>] (dump_stack+0x90/0xc8)
[<c09a2584>] (dump_stack) from [<c0120bd0>] (__warn+0xf8/0x124)
[<c0120bd0>] (__warn) from [<c0120c3c>] (warn_slowpath_null+0x40/0x48)
[<c0120c3c>] (warn_slowpath_null) from [<c065cda0>] (__i2c_transfer+0x6f8/0xa50)
[<c065cda0>] (__i2c_transfer) from [<c065d168>] (i2c_transfer+0x70/0xe4)
[<c065d168>] (i2c_transfer) from [<c053ce4c>] (regmap_i2c_read+0x48/0x64)
[<c053ce4c>] (regmap_i2c_read) from [<c0536f1c>] (_regmap_raw_read+0xf8/0x450)
[<c0536f1c>] (_regmap_raw_read) from [<c053772c>] (_regmap_bus_read+0x38/0x68)
[<c053772c>] (_regmap_bus_read) from [<c05365a0>] (_regmap_read+0x60/0x250)
[<c05365a0>] (_regmap_read) from [<c05367cc>] (regmap_read+0x3c/0x5c)
[<c05367cc>] (regmap_read) from [<c047cfc0>] (regulator_is_enabled_regmap+0x20/0x90)
[<c047cfc0>] (regulator_is_enabled_regmap) from [<c0477660>] (_regulator_is_enabled+0x34/0x40)
[<c0477660>] (_regulator_is_enabled) from [<c0478674>] (create_regulator+0x1a4/0x25c)
[<c0478674>] (create_regulator) from [<c047c818>] (_regulator_get+0xe4/0x278)
[<c047c818>] (_regulator_get) from [<c068f1dc>] (dev_pm_opp_set_regulators+0xa0/0x1c0)
[<c068f1dc>] (dev_pm_opp_set_regulators) from [<c0698cc8>] (cpufreq_init+0x98/0x2d0)
[<c0698cc8>] (cpufreq_init) from [<c06959e4>] (cpufreq_online+0xc8/0x71c)
[<c06959e4>] (cpufreq_online) from [<c06960fc>] (cpuhp_cpufreq_online+0x8/0x10)
[<c06960fc>] (cpuhp_cpufreq_online) from [<c01213d4>] (cpuhp_invoke_callback+0xf4/0xebc)
[<c01213d4>] (cpuhp_invoke_callback) from [<c0122e4c>] (cpuhp_thread_fun+0x1d8/0x320)
[<c0122e4c>] (cpuhp_thread_fun) from [<c0149858>] (smpboot_thread_fn+0x194/0x340)
[<c0149858>] (smpboot_thread_fn) from [<c014573c>] (kthread+0x124/0x160)
[<c014573c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xe897dfb0 to 0xe897dff8)
dfa0:                                     00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
irq event stamp: 3865
hardirqs last  enabled at (3873): [<c0186dec>] vprintk_emit+0x228/0x2a4
hardirqs last disabled at (3880): [<c0186cf0>] vprintk_emit+0x12c/0x2a4
softirqs last  enabled at (3052): [<c0102564>] __do_softirq+0x3a4/0x66c
softirqs last disabled at (3043): [<c0128464>] irq_exit+0x140/0x168
---[ end trace db48b455d924fec2 ]---
CPU4 is up
CPU5 is up
CPU6 is up
CPU7 is up
--->8---

This is a scenario that triggers the above issue:

1. system disables non-boot cpu's at the end of system suspend procedure,
2. this in turn deinitializes cpufreq drivers for the disabled cpus,
3. early in the system resume procedure all cpus are got back to online
   state,
4. this in turn causes cpufreq to be initialized for the newly onlined
   cpus,
5. cpufreq-dt acquires all its resources (clocks, regulators) during
   ->init() callback,
6. getting regulator require to check its state, what in turn requires
   i2c transfer,
7. during system early resume stage this is not really possible.

The issue is caused by cpufreq-dt driver not keeping its resources for
the whole driver lifetime and relying that they can be always acquired
at any system context. This problem has been observed on Samsung
Exynos based Odroid XU3/XU4 boards, but it happens on all boards, which
have separate regulators for different CPU clusters.

To fix this issue I've rewritten resources (clock and regulators)
handling in cpufreq-dt driver. Now the driver gathers them in its
->probe() and keeps using them until the ->remove() happens. This
required also some rework in common opp regulators handling code.
Namely, regulators are no longer assigned by name, instead the caller
has to pass proper regulator object. The side-effect of this work
is also a removal of the FIXME item and correct handling of deferred
probe caused by lack of non-CPU0 regulator.

Best regards
Marek Szyprowski
Samsung R&D Institute Poland


Patch summary:

Marek Szyprowski (2):
  cpufreq: dt/ti/opp: move regulators initialization to the drivers
  cpufreq: dt: rework resources initialization

 drivers/cpufreq/cpufreq-dt.c | 127 +++++++++++++++--------------------
 drivers/cpufreq/ti-cpufreq.c |  42 ++++++++++--
 drivers/opp/core.c           |  40 ++---------
 include/linux/pm_opp.h       |   2 +-
 4 files changed, 96 insertions(+), 115 deletions(-)

Comments

Viresh Kumar Feb. 8, 2019, 6:49 a.m. UTC | #1
On 07-02-19, 13:22, Marek Szyprowski wrote:
> Dear All,
> 
> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
> i2c adapters") added a visible warning for an attempt to do i2c transfer
> over a suspended i2c bus. This revealed a long standing issue in the
> cpufreq-dt driver, which gives a following warning during system
> suspend/resume cycle:
> 
> --->8---
> Enabling non-boot CPUs ...
> CPU1 is up
> CPU2 is up
> CPU3 is up
> ------------[ cut here ]------------
> WARNING: CPU: 4 PID: 29 at drivers/i2c/i2c-core-base.c:1869 __i2c_transfer+0x6f8/0xa50
> Modules linked in:
> CPU: 4 PID: 29 Comm: cpuhp/4 Tainted: G        W         5.0.0-rc4-next-20190131-00024-g54b06b29cc65 #5324
> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> [<c01110e8>] (unwind_backtrace) from [<c010d11c>] (show_stack+0x10/0x14)
> [<c010d11c>] (show_stack) from [<c09a2584>] (dump_stack+0x90/0xc8)
> [<c09a2584>] (dump_stack) from [<c0120bd0>] (__warn+0xf8/0x124)
> [<c0120bd0>] (__warn) from [<c0120c3c>] (warn_slowpath_null+0x40/0x48)
> [<c0120c3c>] (warn_slowpath_null) from [<c065cda0>] (__i2c_transfer+0x6f8/0xa50)
> [<c065cda0>] (__i2c_transfer) from [<c065d168>] (i2c_transfer+0x70/0xe4)
> [<c065d168>] (i2c_transfer) from [<c053ce4c>] (regmap_i2c_read+0x48/0x64)
> [<c053ce4c>] (regmap_i2c_read) from [<c0536f1c>] (_regmap_raw_read+0xf8/0x450)
> [<c0536f1c>] (_regmap_raw_read) from [<c053772c>] (_regmap_bus_read+0x38/0x68)
> [<c053772c>] (_regmap_bus_read) from [<c05365a0>] (_regmap_read+0x60/0x250)
> [<c05365a0>] (_regmap_read) from [<c05367cc>] (regmap_read+0x3c/0x5c)
> [<c05367cc>] (regmap_read) from [<c047cfc0>] (regulator_is_enabled_regmap+0x20/0x90)
> [<c047cfc0>] (regulator_is_enabled_regmap) from [<c0477660>] (_regulator_is_enabled+0x34/0x40)
> [<c0477660>] (_regulator_is_enabled) from [<c0478674>] (create_regulator+0x1a4/0x25c)
> [<c0478674>] (create_regulator) from [<c047c818>] (_regulator_get+0xe4/0x278)
> [<c047c818>] (_regulator_get) from [<c068f1dc>] (dev_pm_opp_set_regulators+0xa0/0x1c0)
> [<c068f1dc>] (dev_pm_opp_set_regulators) from [<c0698cc8>] (cpufreq_init+0x98/0x2d0)
> [<c0698cc8>] (cpufreq_init) from [<c06959e4>] (cpufreq_online+0xc8/0x71c)
> [<c06959e4>] (cpufreq_online) from [<c06960fc>] (cpuhp_cpufreq_online+0x8/0x10)
> [<c06960fc>] (cpuhp_cpufreq_online) from [<c01213d4>] (cpuhp_invoke_callback+0xf4/0xebc)
> [<c01213d4>] (cpuhp_invoke_callback) from [<c0122e4c>] (cpuhp_thread_fun+0x1d8/0x320)
> [<c0122e4c>] (cpuhp_thread_fun) from [<c0149858>] (smpboot_thread_fn+0x194/0x340)
> [<c0149858>] (smpboot_thread_fn) from [<c014573c>] (kthread+0x124/0x160)
> [<c014573c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> Exception stack(0xe897dfb0 to 0xe897dff8)
> dfa0:                                     00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> irq event stamp: 3865
> hardirqs last  enabled at (3873): [<c0186dec>] vprintk_emit+0x228/0x2a4
> hardirqs last disabled at (3880): [<c0186cf0>] vprintk_emit+0x12c/0x2a4
> softirqs last  enabled at (3052): [<c0102564>] __do_softirq+0x3a4/0x66c
> softirqs last disabled at (3043): [<c0128464>] irq_exit+0x140/0x168
> ---[ end trace db48b455d924fec2 ]---
> CPU4 is up
> CPU5 is up
> CPU6 is up
> CPU7 is up
> --->8---
> 
> This is a scenario that triggers the above issue:
> 
> 1. system disables non-boot cpu's at the end of system suspend procedure,
> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> 3. early in the system resume procedure all cpus are got back to online
>    state,
> 4. this in turn causes cpufreq to be initialized for the newly onlined
>    cpus,
> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>    ->init() callback,
> 6. getting regulator require to check its state, what in turn requires
>    i2c transfer,
> 7. during system early resume stage this is not really possible.
> 
> The issue is caused by cpufreq-dt driver not keeping its resources for
> the whole driver lifetime and relying that they can be always acquired
> at any system context. This problem has been observed on Samsung
> Exynos based Odroid XU3/XU4 boards, but it happens on all boards, which
> have separate regulators for different CPU clusters.

Why don't you get similar problem during suspend? I think you can get
it when the CPUs are offlined as I2C would have gone by then. The
cpufreq or OPP core can try and run some regulator or genpd or clk
calls to disable resources, etc. Even if doesn't happen, it certainly
can.

Also at resume the cpufreq core may try to change the frequency right
from ->init() on certain cases, though not everytime and so the
problem can come despite of this series.

We guarantee that the resources are available during probe but not
during resume, that's where the problem is.

@Rafael any suggestions on how to fix this ?
Marek Szyprowski Feb. 8, 2019, 8:12 a.m. UTC | #2
Hi Viresh,

On 2019-02-08 07:49, Viresh Kumar wrote:
> On 07-02-19, 13:22, Marek Szyprowski wrote:
>> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
>> i2c adapters") added a visible warning for an attempt to do i2c transfer
>> over a suspended i2c bus. This revealed a long standing issue in the
>> cpufreq-dt driver, which gives a following warning during system
>> suspend/resume cycle:
>>
>> --->8---
>> Enabling non-boot CPUs ...
>> CPU1 is up
>> CPU2 is up
>> CPU3 is up
>> ------------[ cut here ]------------
>> WARNING: CPU: 4 PID: 29 at drivers/i2c/i2c-core-base.c:1869 __i2c_transfer+0x6f8/0xa50
>> Modules linked in:
>> CPU: 4 PID: 29 Comm: cpuhp/4 Tainted: G        W         5.0.0-rc4-next-20190131-00024-g54b06b29cc65 #5324
>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [<c01110e8>] (unwind_backtrace) from [<c010d11c>] (show_stack+0x10/0x14)
>> [<c010d11c>] (show_stack) from [<c09a2584>] (dump_stack+0x90/0xc8)
>> [<c09a2584>] (dump_stack) from [<c0120bd0>] (__warn+0xf8/0x124)
>> [<c0120bd0>] (__warn) from [<c0120c3c>] (warn_slowpath_null+0x40/0x48)
>> [<c0120c3c>] (warn_slowpath_null) from [<c065cda0>] (__i2c_transfer+0x6f8/0xa50)
>> [<c065cda0>] (__i2c_transfer) from [<c065d168>] (i2c_transfer+0x70/0xe4)
>> [<c065d168>] (i2c_transfer) from [<c053ce4c>] (regmap_i2c_read+0x48/0x64)
>> [<c053ce4c>] (regmap_i2c_read) from [<c0536f1c>] (_regmap_raw_read+0xf8/0x450)
>> [<c0536f1c>] (_regmap_raw_read) from [<c053772c>] (_regmap_bus_read+0x38/0x68)
>> [<c053772c>] (_regmap_bus_read) from [<c05365a0>] (_regmap_read+0x60/0x250)
>> [<c05365a0>] (_regmap_read) from [<c05367cc>] (regmap_read+0x3c/0x5c)
>> [<c05367cc>] (regmap_read) from [<c047cfc0>] (regulator_is_enabled_regmap+0x20/0x90)
>> [<c047cfc0>] (regulator_is_enabled_regmap) from [<c0477660>] (_regulator_is_enabled+0x34/0x40)
>> [<c0477660>] (_regulator_is_enabled) from [<c0478674>] (create_regulator+0x1a4/0x25c)
>> [<c0478674>] (create_regulator) from [<c047c818>] (_regulator_get+0xe4/0x278)
>> [<c047c818>] (_regulator_get) from [<c068f1dc>] (dev_pm_opp_set_regulators+0xa0/0x1c0)
>> [<c068f1dc>] (dev_pm_opp_set_regulators) from [<c0698cc8>] (cpufreq_init+0x98/0x2d0)
>> [<c0698cc8>] (cpufreq_init) from [<c06959e4>] (cpufreq_online+0xc8/0x71c)
>> [<c06959e4>] (cpufreq_online) from [<c06960fc>] (cpuhp_cpufreq_online+0x8/0x10)
>> [<c06960fc>] (cpuhp_cpufreq_online) from [<c01213d4>] (cpuhp_invoke_callback+0xf4/0xebc)
>> [<c01213d4>] (cpuhp_invoke_callback) from [<c0122e4c>] (cpuhp_thread_fun+0x1d8/0x320)
>> [<c0122e4c>] (cpuhp_thread_fun) from [<c0149858>] (smpboot_thread_fn+0x194/0x340)
>> [<c0149858>] (smpboot_thread_fn) from [<c014573c>] (kthread+0x124/0x160)
>> [<c014573c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>> Exception stack(0xe897dfb0 to 0xe897dff8)
>> dfa0:                                     00000000 00000000 00000000 00000000
>> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> irq event stamp: 3865
>> hardirqs last  enabled at (3873): [<c0186dec>] vprintk_emit+0x228/0x2a4
>> hardirqs last disabled at (3880): [<c0186cf0>] vprintk_emit+0x12c/0x2a4
>> softirqs last  enabled at (3052): [<c0102564>] __do_softirq+0x3a4/0x66c
>> softirqs last disabled at (3043): [<c0128464>] irq_exit+0x140/0x168
>> ---[ end trace db48b455d924fec2 ]---
>> CPU4 is up
>> CPU5 is up
>> CPU6 is up
>> CPU7 is up
>> --->8---
>>
>> This is a scenario that triggers the above issue:
>>
>> 1. system disables non-boot cpu's at the end of system suspend procedure,
>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
>> 3. early in the system resume procedure all cpus are got back to online
>>    state,
>> 4. this in turn causes cpufreq to be initialized for the newly onlined
>>    cpus,
>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>>    ->init() callback,
>> 6. getting regulator require to check its state, what in turn requires
>>    i2c transfer,
>> 7. during system early resume stage this is not really possible.
>>
>> The issue is caused by cpufreq-dt driver not keeping its resources for
>> the whole driver lifetime and relying that they can be always acquired
>> at any system context. This problem has been observed on Samsung
>> Exynos based Odroid XU3/XU4 boards, but it happens on all boards, which
>> have separate regulators for different CPU clusters.
> Why don't you get similar problem during suspend? I think you can get
> it when the CPUs are offlined as I2C would have gone by then. The
> cpufreq or OPP core can try and run some regulator or genpd or clk
> calls to disable resources, etc. Even if doesn't happen, it certainly
> can.

CPUfreq is suspended very early during system suspend and thus it does
nothing when CPUs are being offlined.

> Also at resume the cpufreq core may try to change the frequency right
> from ->init() on certain cases, though not everytime and so the
> problem can come despite of this series.

cpufreq is still in suspended state (it is being 'resume' very late in
the system resume procedure), so if driver doesn't explicitly change any
opp in ->init(), then cpufreq core waits until everything is resumed. To
sum up, this seems to be fine, beside the issue with regulator
initialization I've addressed in this patchset.

> We guarantee that the resources are available during probe but not
> during resume, that's where the problem is.

Yes, so I've changed cpufreq-dt to the common approach, in which the
driver keeps all needed resources for the whole lifetime of the device.

> @Rafael any suggestions on how to fix this ?

Best regards
Viresh Kumar Feb. 8, 2019, 8:55 a.m. UTC | #3
On 08-02-19, 09:12, Marek Szyprowski wrote:
> On 2019-02-08 07:49, Viresh Kumar wrote:
> > Why don't you get similar problem during suspend? I think you can get
> > it when the CPUs are offlined as I2C would have gone by then. The
> > cpufreq or OPP core can try and run some regulator or genpd or clk
> > calls to disable resources, etc. Even if doesn't happen, it certainly
> > can.
> 
> CPUfreq is suspended very early during system suspend and thus it does
> nothing when CPUs are being offlined.

> > Also at resume the cpufreq core may try to change the frequency right
> > from ->init() on certain cases, though not everytime and so the
> > problem can come despite of this series.
> 
> cpufreq is still in suspended state (it is being 'resume' very late in
> the system resume procedure), so if driver doesn't explicitly change any
> opp in ->init(), then cpufreq core waits until everything is resumed. To
> sum up, this seems to be fine, beside the issue with regulator
> initialization I've addressed in this patchset.

Yeah, the governors are suspended very soon, but any frequency change
starting from cpufreq core can still happen. There are at least two
points in cpufreq_online() where we may end up changing the frequency,
but that is conditional and may not be getting hit.

> > We guarantee that the resources are available during probe but not
> > during resume, that's where the problem is.
> 
> Yes, so I've changed cpufreq-dt to the common approach, in which the
> driver keeps all needed resources for the whole lifetime of the device.

That's not what I was saying actually. I was saying that it should be
fine to do a I2C transfer during resume, else we will always have
problems and have to fix them with hacks like the one you proposed
where you acquire resources for all the possible CPUs. Maybe we can
fix it once and for all.
Marek Szyprowski Feb. 8, 2019, 9:15 a.m. UTC | #4
Hi Viresh,

On 2019-02-08 09:55, Viresh Kumar wrote:
> On 08-02-19, 09:12, Marek Szyprowski wrote:
>> On 2019-02-08 07:49, Viresh Kumar wrote:
>>> Why don't you get similar problem during suspend? I think you can get
>>> it when the CPUs are offlined as I2C would have gone by then. The
>>> cpufreq or OPP core can try and run some regulator or genpd or clk
>>> calls to disable resources, etc. Even if doesn't happen, it certainly
>>> can.
>> CPUfreq is suspended very early during system suspend and thus it does
>> nothing when CPUs are being offlined.
>>> Also at resume the cpufreq core may try to change the frequency right
>>> from ->init() on certain cases, though not everytime and so the
>>> problem can come despite of this series.
>> cpufreq is still in suspended state (it is being 'resume' very late in
>> the system resume procedure), so if driver doesn't explicitly change any
>> opp in ->init(), then cpufreq core waits until everything is resumed. To
>> sum up, this seems to be fine, beside the issue with regulator
>> initialization I've addressed in this patchset.
> Yeah, the governors are suspended very soon, but any frequency change
> starting from cpufreq core can still happen. There are at least two
> points in cpufreq_online() where we may end up changing the frequency,
> but that is conditional and may not be getting hit.

Then probably cpufreq core suspend should handle this.

>>> We guarantee that the resources are available during probe but not
>>> during resume, that's where the problem is.
>> Yes, so I've changed cpufreq-dt to the common approach, in which the
>> driver keeps all needed resources for the whole lifetime of the device.
> That's not what I was saying actually. I was saying that it should be
> fine to do a I2C transfer during resume, else we will always have
> problems and have to fix them with hacks like the one you proposed
> where you acquire resources for all the possible CPUs. Maybe we can
> fix it once and for all.

It is fine to do i2c transfers during cpufreq resume (see
drivers/base/power/main.c dpm_resume() function for exact call place).
The problem is that such calls are not allowed in ->init(), which might
be called very early from CPU hotplug path (CPUs are resumed in the
first step of system resume procedure).

What's wrong with my proposed fix? It is not that uncommon to gather all
resources in probe() and keep them until remove() happens.

Best regards
Viresh Kumar Feb. 8, 2019, 9:23 a.m. UTC | #5
On 08-02-19, 10:15, Marek Szyprowski wrote:
> On 2019-02-08 09:55, Viresh Kumar wrote:
> > On 08-02-19, 09:12, Marek Szyprowski wrote:
> >> On 2019-02-08 07:49, Viresh Kumar wrote:
> >>> Why don't you get similar problem during suspend? I think you can get
> >>> it when the CPUs are offlined as I2C would have gone by then. The
> >>> cpufreq or OPP core can try and run some regulator or genpd or clk
> >>> calls to disable resources, etc. Even if doesn't happen, it certainly
> >>> can.
> >> CPUfreq is suspended very early during system suspend and thus it does
> >> nothing when CPUs are being offlined.
> >>> Also at resume the cpufreq core may try to change the frequency right
> >>> from ->init() on certain cases, though not everytime and so the
> >>> problem can come despite of this series.
> >> cpufreq is still in suspended state (it is being 'resume' very late in
> >> the system resume procedure), so if driver doesn't explicitly change any
> >> opp in ->init(), then cpufreq core waits until everything is resumed. To
> >> sum up, this seems to be fine, beside the issue with regulator
> >> initialization I've addressed in this patchset.
> > Yeah, the governors are suspended very soon, but any frequency change
> > starting from cpufreq core can still happen. There are at least two
> > points in cpufreq_online() where we may end up changing the frequency,
> > but that is conditional and may not be getting hit.
> 
> Then probably cpufreq core suspend should handle this.

Handle what ? That code is part of cpufreq_online() and needs to be
there only.

> >>> We guarantee that the resources are available during probe but not
> >>> during resume, that's where the problem is.
> >> Yes, so I've changed cpufreq-dt to the common approach, in which the
> >> driver keeps all needed resources for the whole lifetime of the device.
> > That's not what I was saying actually. I was saying that it should be
> > fine to do a I2C transfer during resume, else we will always have
> > problems and have to fix them with hacks like the one you proposed
> > where you acquire resources for all the possible CPUs. Maybe we can
> > fix it once and for all.
> 
> It is fine to do i2c transfers during cpufreq resume (see

By resume I meant system resume and the whole onlining process of
non-boot CPUs.

> drivers/base/power/main.c dpm_resume() function for exact call place).
> The problem is that such calls are not allowed in ->init(), which might
> be called very early from CPU hotplug path (CPUs are resumed in the
> first step of system resume procedure).

Right and that's where I think we can do something to fix it in a
proper way.

> What's wrong with my proposed fix? It is not that uncommon to gather all
> resources in probe() and keep them until remove() happens.

For cpufreq drivers, we must be doing most of the stuff in init/exit
only as far as possible. I am not saying your patch is bad, that is
the best we can do in such situations. But I don't like that we have
to get the resources for all CPUs, despite the fact that many of them
would be part of the same policy and hence share resources. The
problem was that we need to get sharing-cpus detail as well in probe
then, etc.

I was thinking about doing disable_nonboot_cpus() much earlier as the
userspace is already frozen by then.

@Rafael: Will that slowdown the suspend/resume process? Maybe not as
we are doing everything from a single CPU/thread anyways ?
Marek Szyprowski Feb. 8, 2019, 10:02 a.m. UTC | #6
Hi Viresh,

On 2019-02-08 10:23, Viresh Kumar wrote:
> On 08-02-19, 10:15, Marek Szyprowski wrote:
>> On 2019-02-08 09:55, Viresh Kumar wrote:
>>> On 08-02-19, 09:12, Marek Szyprowski wrote:
>>>> On 2019-02-08 07:49, Viresh Kumar wrote:
>>>>> Why don't you get similar problem during suspend? I think you can get
>>>>> it when the CPUs are offlined as I2C would have gone by then. The
>>>>> cpufreq or OPP core can try and run some regulator or genpd or clk
>>>>> calls to disable resources, etc. Even if doesn't happen, it certainly
>>>>> can.
>>>> CPUfreq is suspended very early during system suspend and thus it does
>>>> nothing when CPUs are being offlined.
>>>>> Also at resume the cpufreq core may try to change the frequency right
>>>>> from ->init() on certain cases, though not everytime and so the
>>>>> problem can come despite of this series.
>>>> cpufreq is still in suspended state (it is being 'resume' very late in
>>>> the system resume procedure), so if driver doesn't explicitly change any
>>>> opp in ->init(), then cpufreq core waits until everything is resumed. To
>>>> sum up, this seems to be fine, beside the issue with regulator
>>>> initialization I've addressed in this patchset.
>>> Yeah, the governors are suspended very soon, but any frequency change
>>> starting from cpufreq core can still happen. There are at least two
>>> points in cpufreq_online() where we may end up changing the frequency,
>>> but that is conditional and may not be getting hit.
>> Then probably cpufreq core suspend should handle this.
> Handle what ? That code is part of cpufreq_online() and needs to be
> there only.

If got it right, it is a matter of handling
CPUFREQ_NEED_INITIAL_FREQ_CHECK flag. Maybe there should be additional
check if CPUfreq is not suspended?

>>>>> We guarantee that the resources are available during probe but not
>>>>> during resume, that's where the problem is.
>>>> Yes, so I've changed cpufreq-dt to the common approach, in which the
>>>> driver keeps all needed resources for the whole lifetime of the device.
>>> That's not what I was saying actually. I was saying that it should be
>>> fine to do a I2C transfer during resume, else we will always have
>>> problems and have to fix them with hacks like the one you proposed
>>> where you acquire resources for all the possible CPUs. Maybe we can
>>> fix it once and for all.
>> It is fine to do i2c transfers during cpufreq resume (see
> By resume I meant system resume and the whole onlining process of
> non-boot CPUs.

Right now those are 2 separate things in cpufreq core.

>> drivers/base/power/main.c dpm_resume() function for exact call place).
>> The problem is that such calls are not allowed in ->init(), which might
>> be called very early from CPU hotplug path (CPUs are resumed in the
>> first step of system resume procedure).
> Right and that's where I think we can do something to fix it in a
> proper way.
>
>> What's wrong with my proposed fix? It is not that uncommon to gather all
>> resources in probe() and keep them until remove() happens.
> For cpufreq drivers, we must be doing most of the stuff in init/exit
> only as far as possible. I am not saying your patch is bad, that is
> the best we can do in such situations. But I don't like that we have
> to get the resources for all CPUs, despite the fact that many of them
> would be part of the same policy and hence share resources. The
> problem was that we need to get sharing-cpus detail as well in probe
> then, etc.

Both resources in this case: clocks and regulators are refcounted by
their frameworks, so the cost of getting a few more references for them
is imho negligible.

> I was thinking about doing disable_nonboot_cpus() much earlier as the
> userspace is already frozen by then.
>
> @Rafael: Will that slowdown the suspend/resume process? Maybe not as
> we are doing everything from a single CPU/thread anyways ?

For some reasons drivers are handled partially asynchronously in
suspend/resume procedure and can do suspend and resume in parallel. I
don't think that limiting the whole suspend/resume process to a single
cpu is the best we can do.

Best regards
Rafael J. Wysocki Feb. 8, 2019, 10:08 a.m. UTC | #7
On Fri, Feb 8, 2019 at 10:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 08-02-19, 10:15, Marek Szyprowski wrote:
> > On 2019-02-08 09:55, Viresh Kumar wrote:
> > > On 08-02-19, 09:12, Marek Szyprowski wrote:
> > >> On 2019-02-08 07:49, Viresh Kumar wrote:
> > >>> Why don't you get similar problem during suspend? I think you can get
> > >>> it when the CPUs are offlined as I2C would have gone by then. The
> > >>> cpufreq or OPP core can try and run some regulator or genpd or clk
> > >>> calls to disable resources, etc. Even if doesn't happen, it certainly
> > >>> can.
> > >> CPUfreq is suspended very early during system suspend and thus it does
> > >> nothing when CPUs are being offlined.
> > >>> Also at resume the cpufreq core may try to change the frequency right
> > >>> from ->init() on certain cases, though not everytime and so the
> > >>> problem can come despite of this series.
> > >> cpufreq is still in suspended state (it is being 'resume' very late in
> > >> the system resume procedure), so if driver doesn't explicitly change any
> > >> opp in ->init(), then cpufreq core waits until everything is resumed. To
> > >> sum up, this seems to be fine, beside the issue with regulator
> > >> initialization I've addressed in this patchset.
> > > Yeah, the governors are suspended very soon, but any frequency change
> > > starting from cpufreq core can still happen. There are at least two
> > > points in cpufreq_online() where we may end up changing the frequency,
> > > but that is conditional and may not be getting hit.
> >
> > Then probably cpufreq core suspend should handle this.
>
> Handle what ? That code is part of cpufreq_online() and needs to be
> there only.
>
> > >>> We guarantee that the resources are available during probe but not
> > >>> during resume, that's where the problem is.
> > >> Yes, so I've changed cpufreq-dt to the common approach, in which the
> > >> driver keeps all needed resources for the whole lifetime of the device.
> > > That's not what I was saying actually. I was saying that it should be
> > > fine to do a I2C transfer during resume, else we will always have
> > > problems and have to fix them with hacks like the one you proposed
> > > where you acquire resources for all the possible CPUs. Maybe we can
> > > fix it once and for all.
> >
> > It is fine to do i2c transfers during cpufreq resume (see
>
> By resume I meant system resume and the whole onlining process of
> non-boot CPUs.
>
> > drivers/base/power/main.c dpm_resume() function for exact call place).
> > The problem is that such calls are not allowed in ->init(), which might
> > be called very early from CPU hotplug path (CPUs are resumed in the
> > first step of system resume procedure).
>
> Right and that's where I think we can do something to fix it in a
> proper way.
>
> > What's wrong with my proposed fix? It is not that uncommon to gather all
> > resources in probe() and keep them until remove() happens.
>
> For cpufreq drivers, we must be doing most of the stuff in init/exit
> only as far as possible. I am not saying your patch is bad, that is
> the best we can do in such situations. But I don't like that we have
> to get the resources for all CPUs, despite the fact that many of them
> would be part of the same policy and hence share resources. The
> problem was that we need to get sharing-cpus detail as well in probe
> then, etc.
>
> I was thinking about doing disable_nonboot_cpus() much earlier as the
> userspace is already frozen by then.
>
> @Rafael: Will that slowdown the suspend/resume process? Maybe not as
> we are doing everything from a single CPU/thread anyways ?

First, we used to do that and we switched over to what we have right
now several years ago, because it didn't work reliably then.

Arguably, CPU hotplug is in a much better shape now, so it might be
working better, but that would be a huge change and lots of
documentation would need to be revised. :-)

Also it is not true that everything is done on a single CPU, but I'm
not really sure about the possible slowdown.

Second, there are (many) systems for which that change is not really
necessary and it is risky because of possible regressions.

I guess that CPUs depending on I2C for online/offline could be taken
offline earlier and brought back online later during suspend/resume,
like before/after the _noirq suspend of devices, but doing that on all
systems altogether is almost guaranteed to introduce regressions.
Rafael J. Wysocki Feb. 8, 2019, 10:18 a.m. UTC | #8
On Fri, Feb 8, 2019 at 9:55 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 08-02-19, 09:12, Marek Szyprowski wrote:
> > On 2019-02-08 07:49, Viresh Kumar wrote:
> > > Why don't you get similar problem during suspend? I think you can get
> > > it when the CPUs are offlined as I2C would have gone by then. The
> > > cpufreq or OPP core can try and run some regulator or genpd or clk
> > > calls to disable resources, etc. Even if doesn't happen, it certainly
> > > can.
> >
> > CPUfreq is suspended very early during system suspend and thus it does
> > nothing when CPUs are being offlined.
>
> > > Also at resume the cpufreq core may try to change the frequency right
> > > from ->init() on certain cases, though not everytime and so the
> > > problem can come despite of this series.
> >
> > cpufreq is still in suspended state (it is being 'resume' very late in
> > the system resume procedure), so if driver doesn't explicitly change any
> > opp in ->init(), then cpufreq core waits until everything is resumed. To
> > sum up, this seems to be fine, beside the issue with regulator
> > initialization I've addressed in this patchset.
>
> Yeah, the governors are suspended very soon, but any frequency change
> starting from cpufreq core can still happen. There are at least two
> points in cpufreq_online() where we may end up changing the frequency,
> but that is conditional and may not be getting hit.
>
> > > We guarantee that the resources are available during probe but not
> > > during resume, that's where the problem is.
> >
> > Yes, so I've changed cpufreq-dt to the common approach, in which the
> > driver keeps all needed resources for the whole lifetime of the device.
>
> That's not what I was saying actually. I was saying that it should be
> fine to do a I2C transfer during resume,

Surely not before resuming the I2C controller involved?

> else we will always have
> problems and have to fix them with hacks like the one you proposed
> where you acquire resources for all the possible CPUs. Maybe we can
> fix it once and for all.

Obviously, all I2C transfers need to take place either before
suspending the I2C controller or after resuming it.
Rafael J. Wysocki Feb. 8, 2019, 10:22 a.m. UTC | #9
On Fri, Feb 8, 2019 at 7:50 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 07-02-19, 13:22, Marek Szyprowski wrote:
> > Dear All,
> >
> > Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
> > i2c adapters") added a visible warning for an attempt to do i2c transfer
> > over a suspended i2c bus. This revealed a long standing issue in the
> > cpufreq-dt driver, which gives a following warning during system
> > suspend/resume cycle:
> >
> > --->8---
> > Enabling non-boot CPUs ...
> > CPU1 is up
> > CPU2 is up
> > CPU3 is up
> > ------------[ cut here ]------------
> > WARNING: CPU: 4 PID: 29 at drivers/i2c/i2c-core-base.c:1869 __i2c_transfer+0x6f8/0xa50
> > Modules linked in:
> > CPU: 4 PID: 29 Comm: cpuhp/4 Tainted: G        W         5.0.0-rc4-next-20190131-00024-g54b06b29cc65 #5324
> > Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > [<c01110e8>] (unwind_backtrace) from [<c010d11c>] (show_stack+0x10/0x14)
> > [<c010d11c>] (show_stack) from [<c09a2584>] (dump_stack+0x90/0xc8)
> > [<c09a2584>] (dump_stack) from [<c0120bd0>] (__warn+0xf8/0x124)
> > [<c0120bd0>] (__warn) from [<c0120c3c>] (warn_slowpath_null+0x40/0x48)
> > [<c0120c3c>] (warn_slowpath_null) from [<c065cda0>] (__i2c_transfer+0x6f8/0xa50)
> > [<c065cda0>] (__i2c_transfer) from [<c065d168>] (i2c_transfer+0x70/0xe4)
> > [<c065d168>] (i2c_transfer) from [<c053ce4c>] (regmap_i2c_read+0x48/0x64)
> > [<c053ce4c>] (regmap_i2c_read) from [<c0536f1c>] (_regmap_raw_read+0xf8/0x450)
> > [<c0536f1c>] (_regmap_raw_read) from [<c053772c>] (_regmap_bus_read+0x38/0x68)
> > [<c053772c>] (_regmap_bus_read) from [<c05365a0>] (_regmap_read+0x60/0x250)
> > [<c05365a0>] (_regmap_read) from [<c05367cc>] (regmap_read+0x3c/0x5c)
> > [<c05367cc>] (regmap_read) from [<c047cfc0>] (regulator_is_enabled_regmap+0x20/0x90)
> > [<c047cfc0>] (regulator_is_enabled_regmap) from [<c0477660>] (_regulator_is_enabled+0x34/0x40)
> > [<c0477660>] (_regulator_is_enabled) from [<c0478674>] (create_regulator+0x1a4/0x25c)
> > [<c0478674>] (create_regulator) from [<c047c818>] (_regulator_get+0xe4/0x278)
> > [<c047c818>] (_regulator_get) from [<c068f1dc>] (dev_pm_opp_set_regulators+0xa0/0x1c0)
> > [<c068f1dc>] (dev_pm_opp_set_regulators) from [<c0698cc8>] (cpufreq_init+0x98/0x2d0)
> > [<c0698cc8>] (cpufreq_init) from [<c06959e4>] (cpufreq_online+0xc8/0x71c)
> > [<c06959e4>] (cpufreq_online) from [<c06960fc>] (cpuhp_cpufreq_online+0x8/0x10)
> > [<c06960fc>] (cpuhp_cpufreq_online) from [<c01213d4>] (cpuhp_invoke_callback+0xf4/0xebc)
> > [<c01213d4>] (cpuhp_invoke_callback) from [<c0122e4c>] (cpuhp_thread_fun+0x1d8/0x320)
> > [<c0122e4c>] (cpuhp_thread_fun) from [<c0149858>] (smpboot_thread_fn+0x194/0x340)
> > [<c0149858>] (smpboot_thread_fn) from [<c014573c>] (kthread+0x124/0x160)
> > [<c014573c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> > Exception stack(0xe897dfb0 to 0xe897dff8)
> > dfa0:                                     00000000 00000000 00000000 00000000
> > dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > irq event stamp: 3865
> > hardirqs last  enabled at (3873): [<c0186dec>] vprintk_emit+0x228/0x2a4
> > hardirqs last disabled at (3880): [<c0186cf0>] vprintk_emit+0x12c/0x2a4
> > softirqs last  enabled at (3052): [<c0102564>] __do_softirq+0x3a4/0x66c
> > softirqs last disabled at (3043): [<c0128464>] irq_exit+0x140/0x168
> > ---[ end trace db48b455d924fec2 ]---
> > CPU4 is up
> > CPU5 is up
> > CPU6 is up
> > CPU7 is up
> > --->8---
> >
> > This is a scenario that triggers the above issue:
> >
> > 1. system disables non-boot cpu's at the end of system suspend procedure,
> > 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> > 3. early in the system resume procedure all cpus are got back to online
> >    state,
> > 4. this in turn causes cpufreq to be initialized for the newly onlined
> >    cpus,
> > 5. cpufreq-dt acquires all its resources (clocks, regulators) during
> >    ->init() callback,
> > 6. getting regulator require to check its state, what in turn requires
> >    i2c transfer,
> > 7. during system early resume stage this is not really possible.
> >
> > The issue is caused by cpufreq-dt driver not keeping its resources for
> > the whole driver lifetime and relying that they can be always acquired
> > at any system context. This problem has been observed on Samsung
> > Exynos based Odroid XU3/XU4 boards, but it happens on all boards, which
> > have separate regulators for different CPU clusters.
>
> Why don't you get similar problem during suspend? I think you can get
> it when the CPUs are offlined as I2C would have gone by then. The
> cpufreq or OPP core can try and run some regulator or genpd or clk
> calls to disable resources, etc. Even if doesn't happen, it certainly
> can.
>
> Also at resume the cpufreq core may try to change the frequency right
> from ->init() on certain cases, though not everytime and so the
> problem can come despite of this series.
>
> We guarantee that the resources are available during probe but not
> during resume, that's where the problem is.
>
> @Rafael any suggestions on how to fix this ?

There are cpufreq driver suspend and resume callbacks, maybe use them?

The driver could do the I2C transactions in its suspend/resume
callbacks and do nothing in online/offline if those are part of
system-wide suspend/resume.
Viresh Kumar Feb. 8, 2019, 10:28 a.m. UTC | #10
On 08-02-19, 11:18, Rafael J. Wysocki wrote:
> Obviously, all I2C transfers need to take place either before
> suspending the I2C controller or after resuming it.

Right. But all I am saying is that it is the responsibility of the
cpufreq core/driver to make sure we call ->init() only when the
resources are available to be used. For example during driver load, we
defer probe if resources aren't available and that makes sure ->init()
gets called when we can actually change the frequency.

Shouldn't the same be true during resume? That is make sure all
resources are in place before ->init() gets called ?
Marek Szyprowski Feb. 8, 2019, 10:31 a.m. UTC | #11
Hi Rafael,

On 2019-02-08 11:22, Rafael J. Wysocki wrote:
> On Fri, Feb 8, 2019 at 7:50 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 07-02-19, 13:22, Marek Szyprowski wrote:
>>> Dear All,
>>>
>>> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
>>> i2c adapters") added a visible warning for an attempt to do i2c transfer
>>> over a suspended i2c bus. This revealed a long standing issue in the
>>> cpufreq-dt driver, which gives a following warning during system
>>> suspend/resume cycle:
>>>
>>> --->8---
>>> Enabling non-boot CPUs ...
>>> CPU1 is up
>>> CPU2 is up
>>> CPU3 is up
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 4 PID: 29 at drivers/i2c/i2c-core-base.c:1869 __i2c_transfer+0x6f8/0xa50
>>> Modules linked in:
>>> CPU: 4 PID: 29 Comm: cpuhp/4 Tainted: G        W         5.0.0-rc4-next-20190131-00024-g54b06b29cc65 #5324
>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>> [<c01110e8>] (unwind_backtrace) from [<c010d11c>] (show_stack+0x10/0x14)
>>> [<c010d11c>] (show_stack) from [<c09a2584>] (dump_stack+0x90/0xc8)
>>> [<c09a2584>] (dump_stack) from [<c0120bd0>] (__warn+0xf8/0x124)
>>> [<c0120bd0>] (__warn) from [<c0120c3c>] (warn_slowpath_null+0x40/0x48)
>>> [<c0120c3c>] (warn_slowpath_null) from [<c065cda0>] (__i2c_transfer+0x6f8/0xa50)
>>> [<c065cda0>] (__i2c_transfer) from [<c065d168>] (i2c_transfer+0x70/0xe4)
>>> [<c065d168>] (i2c_transfer) from [<c053ce4c>] (regmap_i2c_read+0x48/0x64)
>>> [<c053ce4c>] (regmap_i2c_read) from [<c0536f1c>] (_regmap_raw_read+0xf8/0x450)
>>> [<c0536f1c>] (_regmap_raw_read) from [<c053772c>] (_regmap_bus_read+0x38/0x68)
>>> [<c053772c>] (_regmap_bus_read) from [<c05365a0>] (_regmap_read+0x60/0x250)
>>> [<c05365a0>] (_regmap_read) from [<c05367cc>] (regmap_read+0x3c/0x5c)
>>> [<c05367cc>] (regmap_read) from [<c047cfc0>] (regulator_is_enabled_regmap+0x20/0x90)
>>> [<c047cfc0>] (regulator_is_enabled_regmap) from [<c0477660>] (_regulator_is_enabled+0x34/0x40)
>>> [<c0477660>] (_regulator_is_enabled) from [<c0478674>] (create_regulator+0x1a4/0x25c)
>>> [<c0478674>] (create_regulator) from [<c047c818>] (_regulator_get+0xe4/0x278)
>>> [<c047c818>] (_regulator_get) from [<c068f1dc>] (dev_pm_opp_set_regulators+0xa0/0x1c0)
>>> [<c068f1dc>] (dev_pm_opp_set_regulators) from [<c0698cc8>] (cpufreq_init+0x98/0x2d0)
>>> [<c0698cc8>] (cpufreq_init) from [<c06959e4>] (cpufreq_online+0xc8/0x71c)
>>> [<c06959e4>] (cpufreq_online) from [<c06960fc>] (cpuhp_cpufreq_online+0x8/0x10)
>>> [<c06960fc>] (cpuhp_cpufreq_online) from [<c01213d4>] (cpuhp_invoke_callback+0xf4/0xebc)
>>> [<c01213d4>] (cpuhp_invoke_callback) from [<c0122e4c>] (cpuhp_thread_fun+0x1d8/0x320)
>>> [<c0122e4c>] (cpuhp_thread_fun) from [<c0149858>] (smpboot_thread_fn+0x194/0x340)
>>> [<c0149858>] (smpboot_thread_fn) from [<c014573c>] (kthread+0x124/0x160)
>>> [<c014573c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>>> Exception stack(0xe897dfb0 to 0xe897dff8)
>>> dfa0:                                     00000000 00000000 00000000 00000000
>>> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>> irq event stamp: 3865
>>> hardirqs last  enabled at (3873): [<c0186dec>] vprintk_emit+0x228/0x2a4
>>> hardirqs last disabled at (3880): [<c0186cf0>] vprintk_emit+0x12c/0x2a4
>>> softirqs last  enabled at (3052): [<c0102564>] __do_softirq+0x3a4/0x66c
>>> softirqs last disabled at (3043): [<c0128464>] irq_exit+0x140/0x168
>>> ---[ end trace db48b455d924fec2 ]---
>>> CPU4 is up
>>> CPU5 is up
>>> CPU6 is up
>>> CPU7 is up
>>> --->8---
>>>
>>> This is a scenario that triggers the above issue:
>>>
>>> 1. system disables non-boot cpu's at the end of system suspend procedure,
>>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
>>> 3. early in the system resume procedure all cpus are got back to online
>>>    state,
>>> 4. this in turn causes cpufreq to be initialized for the newly onlined
>>>    cpus,
>>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>>>    ->init() callback,
>>> 6. getting regulator require to check its state, what in turn requires
>>>    i2c transfer,
>>> 7. during system early resume stage this is not really possible.
>>>
>>> The issue is caused by cpufreq-dt driver not keeping its resources for
>>> the whole driver lifetime and relying that they can be always acquired
>>> at any system context. This problem has been observed on Samsung
>>> Exynos based Odroid XU3/XU4 boards, but it happens on all boards, which
>>> have separate regulators for different CPU clusters.
>> Why don't you get similar problem during suspend? I think you can get
>> it when the CPUs are offlined as I2C would have gone by then. The
>> cpufreq or OPP core can try and run some regulator or genpd or clk
>> calls to disable resources, etc. Even if doesn't happen, it certainly
>> can.
>>
>> Also at resume the cpufreq core may try to change the frequency right
>> from ->init() on certain cases, though not everytime and so the
>> problem can come despite of this series.
>>
>> We guarantee that the resources are available during probe but not
>> during resume, that's where the problem is.
>>
>> @Rafael any suggestions on how to fix this ?
> There are cpufreq driver suspend and resume callbacks, maybe use them?
>
> The driver could do the I2C transactions in its suspend/resume
> callbacks and do nothing in online/offline if those are part of
> system-wide suspend/resume.

This is exactly what I suggested, when I wrote to handle it in cpufreq
suspend/resume.

Best regards
Viresh Kumar Feb. 8, 2019, 10:31 a.m. UTC | #12
On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> There are cpufreq driver suspend and resume callbacks, maybe use them?
> 
> The driver could do the I2C transactions in its suspend/resume
> callbacks and do nothing in online/offline if those are part of
> system-wide suspend/resume.

These are per-policy things that we need to do, not sure if driver
suspend/resume is a good place for that. It is more for a case where
CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
hot-unplugged during system suspend and hotplugged later on. This is
more like complete removal/addition of devices instead of
suspend/resume.
Rafael J. Wysocki Feb. 8, 2019, 10:42 a.m. UTC | #13
On Fri, Feb 8, 2019 at 11:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> > There are cpufreq driver suspend and resume callbacks, maybe use them?
> >
> > The driver could do the I2C transactions in its suspend/resume
> > callbacks and do nothing in online/offline if those are part of
> > system-wide suspend/resume.
>
> These are per-policy things that we need to do, not sure if driver
> suspend/resume is a good place for that. It is more for a case where
> CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
> hot-unplugged during system suspend and hotplugged later on. This is
> more like complete removal/addition of devices instead of
> suspend/resume.

No, it isn't.  We don't remove devices on offline.  We migrate stuff
away from them and (opportunistically) power them down.

If this is system suspend, the driver kind of knows that offline will
take place, so it can prepare for it.  Likewise, when online takes
place during system-wide resume, it generally is known that this is
system-wide resume (there is a flag to indicate that in CPU hotplug),
it can be "smart" and avoid accessing suspended devices.  Deferring
the frequency set up until the driver resume time should do the trick
I suppose.
Rafael J. Wysocki Feb. 8, 2019, 10:52 a.m. UTC | #14
On Fri, Feb 8, 2019 at 11:42 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Feb 8, 2019 at 11:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> > > There are cpufreq driver suspend and resume callbacks, maybe use them?
> > >
> > > The driver could do the I2C transactions in its suspend/resume
> > > callbacks and do nothing in online/offline if those are part of
> > > system-wide suspend/resume.
> >
> > These are per-policy things that we need to do, not sure if driver
> > suspend/resume is a good place for that. It is more for a case where
> > CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
> > hot-unplugged during system suspend and hotplugged later on. This is
> > more like complete removal/addition of devices instead of
> > suspend/resume.
>
> No, it isn't.  We don't remove devices on offline.  We migrate stuff
> away from them and (opportunistically) power them down.
>
> If this is system suspend, the driver kind of knows that offline will
> take place, so it can prepare for it.  Likewise, when online takes
> place during system-wide resume, it generally is known that this is
> system-wide resume (there is a flag to indicate that in CPU hotplug),
> it can be "smart" and avoid accessing suspended devices.  Deferring
> the frequency set up until the driver resume time should do the trick
> I suppose.

BTW, do transitions to idle states involve I2C transactions on the
systems in question?
Sudeep Holla Feb. 8, 2019, 11 a.m. UTC | #15
On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
> Dear All,
>
> This is a scenario that triggers the above issue:
>

[...]

> 1. system disables non-boot cpu's at the end of system suspend procedure,
> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> 3. early in the system resume procedure all cpus are got back to online
>    state,
> 4. this in turn causes cpufreq to be initialized for the newly onlined
>    cpus,
> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>    ->init() callback,

This is strictly not just restricted to cpufreq-dt, but to any driver
supporting multiple policies. So we need a generic fix not just
cpufreq-dt specific.

--
Regards,
Sudeep
Sudeep Holla Feb. 8, 2019, 11:39 a.m. UTC | #16
On Fri, Feb 08, 2019 at 11:42:20AM +0100, Rafael J. Wysocki wrote:
> On Fri, Feb 8, 2019 at 11:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> > > There are cpufreq driver suspend and resume callbacks, maybe use them?
> > >
> > > The driver could do the I2C transactions in its suspend/resume
> > > callbacks and do nothing in online/offline if those are part of
> > > system-wide suspend/resume.
> >
> > These are per-policy things that we need to do, not sure if driver
> > suspend/resume is a good place for that. It is more for a case where
> > CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
> > hot-unplugged during system suspend and hotplugged later on. This is
> > more like complete removal/addition of devices instead of
> > suspend/resume.
>
> No, it isn't.  We don't remove devices on offline.  We migrate stuff
> away from them and (opportunistically) power them down.
>
> If this is system suspend, the driver kind of knows that offline will
> take place, so it can prepare for it.  Likewise, when online takes
> place during system-wide resume, it generally is known that this is
> system-wide resume (there is a flag to indicate that in CPU hotplug),
> it can be "smart" and avoid accessing suspended devices.  Deferring
> the frequency set up until the driver resume time should do the trick
> I suppose.

I agree. The reason we don't see this generally on boot is because all
the CPUs are brought online before CPUfreq is initialised. While during
system suspend, we call cpufreq_online which in turn calls ->init in
the hotplug state machine.

So as Rafael suggests we need to do some trick, but can it be done in
the core itself ? I may be missing something, but how about the patch
below:

Regards,
Sudeep

--
diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
index e35a886e00bc..7d8b0b99f91d 100644
--- i/drivers/cpufreq/cpufreq.c
+++ w/drivers/cpufreq/cpufreq.c
@@ -1241,7 +1241,8 @@ static int cpufreq_online(unsigned int cpu)
                policy->max = policy->user_policy.max;
        }

-       if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
+       if (cpufreq_driver->get && !cpufreq_driver->setpolicy &&
+           !cpufreq_suspended) {
                policy->cur = cpufreq_driver->get(policy->cpu);
                if (!policy->cur) {
                        pr_err("%s: ->get() failed\n", __func__);
@@ -1702,6 +1703,11 @@ void cpufreq_resume(void)
                                pr_err("%s: Failed to start governor for policy: %p\n",
                                       __func__, policy);
                }
+               policy->cur = cpufreq_driver->get(policy->cpu);
+               if (!policy->cur) {
+                       pr_err("%s: ->get() failed\n", __func__);
+                       goto out_destroy_policy;
+               }
        }
 }
Marek Szyprowski Feb. 8, 2019, 11:47 a.m. UTC | #17
Hi Sudeep,

On 2019-02-08 12:00, Sudeep Holla wrote:
> On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
>> Dear All,
>>
>> This is a scenario that triggers the above issue:
> [...]
>> 1. system disables non-boot cpu's at the end of system suspend procedure,
>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
>> 3. early in the system resume procedure all cpus are got back to online
>>    state,
>> 4. this in turn causes cpufreq to be initialized for the newly onlined
>>    cpus,
>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>>    ->init() callback,
> This is strictly not just restricted to cpufreq-dt, but to any driver
> supporting multiple policies. So we need a generic fix not just
> cpufreq-dt specific.

Could you point which other driver needs similar fix? Here in cpufreq-dt
the problem was caused by using regulator api (indirectly) from
->init(). All other drivers, which have regulators support, are for old,
obsolete, uni-processor systems, which don't have the problem of
secondary cpu suspend during system suspend/resume cycle.

Best regards
Sudeep Holla Feb. 8, 2019, 11:51 a.m. UTC | #18
On Fri, Feb 08, 2019 at 12:47:06PM +0100, Marek Szyprowski wrote:
> Hi Sudeep,
>
> On 2019-02-08 12:00, Sudeep Holla wrote:
> > On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
> >> Dear All,
> >>
> >> This is a scenario that triggers the above issue:
> > [...]
> >> 1. system disables non-boot cpu's at the end of system suspend procedure,
> >> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> >> 3. early in the system resume procedure all cpus are got back to online
> >>    state,
> >> 4. this in turn causes cpufreq to be initialized for the newly onlined
> >>    cpus,
> >> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
> >>    ->init() callback,
> > This is strictly not just restricted to cpufreq-dt, but to any driver
> > supporting multiple policies. So we need a generic fix not just
> > cpufreq-dt specific.
>
> Could you point which other driver needs similar fix? Here in cpufreq-dt
> the problem was caused by using regulator api (indirectly) from
> ->init(). All other drivers, which have regulators support, are for old,
> obsolete, uni-processor systems, which don't have the problem of
> secondary cpu suspend during system suspend/resume cycle.
>

scmi_cpufreq for instance. We can fix that in driver my moving to polling
to get cpufreq_get_rate, but we support both polling and interrupt based.
We may wait for remote processor interrupt in get_rate.

--
Regards,
Sudeep
Rafael J. Wysocki Feb. 8, 2019, 12:03 p.m. UTC | #19
On Fri, Feb 8, 2019 at 12:39 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Fri, Feb 08, 2019 at 11:42:20AM +0100, Rafael J. Wysocki wrote:
> > On Fri, Feb 8, 2019 at 11:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> > > > There are cpufreq driver suspend and resume callbacks, maybe use them?
> > > >
> > > > The driver could do the I2C transactions in its suspend/resume
> > > > callbacks and do nothing in online/offline if those are part of
> > > > system-wide suspend/resume.
> > >
> > > These are per-policy things that we need to do, not sure if driver
> > > suspend/resume is a good place for that. It is more for a case where
> > > CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
> > > hot-unplugged during system suspend and hotplugged later on. This is
> > > more like complete removal/addition of devices instead of
> > > suspend/resume.
> >
> > No, it isn't.  We don't remove devices on offline.  We migrate stuff
> > away from them and (opportunistically) power them down.
> >
> > If this is system suspend, the driver kind of knows that offline will
> > take place, so it can prepare for it.  Likewise, when online takes
> > place during system-wide resume, it generally is known that this is
> > system-wide resume (there is a flag to indicate that in CPU hotplug),
> > it can be "smart" and avoid accessing suspended devices.  Deferring
> > the frequency set up until the driver resume time should do the trick
> > I suppose.
>
> I agree. The reason we don't see this generally on boot is because all
> the CPUs are brought online before CPUfreq is initialised. While during
> system suspend, we call cpufreq_online which in turn calls ->init in
> the hotplug state machine.
>
> So as Rafael suggests we need to do some trick, but can it be done in
> the core itself ? I may be missing something, but how about the patch
> below:
>
> Regards,
> Sudeep
>
> --
> diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
> index e35a886e00bc..7d8b0b99f91d 100644
> --- i/drivers/cpufreq/cpufreq.c
> +++ w/drivers/cpufreq/cpufreq.c
> @@ -1241,7 +1241,8 @@ static int cpufreq_online(unsigned int cpu)
>                 policy->max = policy->user_policy.max;
>         }
>
> -       if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
> +       if (cpufreq_driver->get && !cpufreq_driver->setpolicy &&
> +           !cpufreq_suspended) {
>                 policy->cur = cpufreq_driver->get(policy->cpu);
>                 if (!policy->cur) {
>                         pr_err("%s: ->get() failed\n", __func__);

It looks like we need to skip the "initial freq check" block below.

Also this doesn't really help the case when the driver ->init() messes
up with things.

> @@ -1702,6 +1703,11 @@ void cpufreq_resume(void)
>                                 pr_err("%s: Failed to start governor for policy: %p\n",
>                                        __func__, policy);
>                 }
> +               policy->cur = cpufreq_driver->get(policy->cpu);
> +               if (!policy->cur) {
> +                       pr_err("%s: ->get() failed\n", __func__);
> +                       goto out_destroy_policy;
> +               }
>         }
>  }
>
Marek Szyprowski Feb. 8, 2019, 12:04 p.m. UTC | #20
Hi Sudeep,

On 2019-02-08 12:51, Sudeep Holla wrote:
> On Fri, Feb 08, 2019 at 12:47:06PM +0100, Marek Szyprowski wrote:
>> On 2019-02-08 12:00, Sudeep Holla wrote:
>>> On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
>>>> Dear All,
>>>>
>>>> This is a scenario that triggers the above issue:
>>> [...]
>>>> 1. system disables non-boot cpu's at the end of system suspend procedure,
>>>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
>>>> 3. early in the system resume procedure all cpus are got back to online
>>>>    state,
>>>> 4. this in turn causes cpufreq to be initialized for the newly onlined
>>>>    cpus,
>>>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>>>>    ->init() callback,
>>> This is strictly not just restricted to cpufreq-dt, but to any driver
>>> supporting multiple policies. So we need a generic fix not just
>>> cpufreq-dt specific.
>> Could you point which other driver needs similar fix? Here in cpufreq-dt
>> the problem was caused by using regulator api (indirectly) from
>> ->init(). All other drivers, which have regulators support, are for old,
>> obsolete, uni-processor systems, which don't have the problem of
>> secondary cpu suspend during system suspend/resume cycle.
>>
> scmi_cpufreq for instance. We can fix that in driver my moving to polling
> to get cpufreq_get_rate, but we support both polling and interrupt based.
> We may wait for remote processor interrupt in get_rate.

Frankly, I don't feel I know enough to touch this driver and I don't
think that this can even be fixed in a generic way in the cpufreq core.
Maybe a comment somewhere is needed that ->init() might be called during
early system resume with irqs off and driver is responsible for handling
such case until the proper resume?

Best regards
Sudeep Holla Feb. 8, 2019, 12:09 p.m. UTC | #21
On Fri, Feb 08, 2019 at 01:03:10PM +0100, Rafael J. Wysocki wrote:
> On Fri, Feb 8, 2019 at 12:39 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Fri, Feb 08, 2019 at 11:42:20AM +0100, Rafael J. Wysocki wrote:
> > > On Fri, Feb 8, 2019 at 11:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> > > > > There are cpufreq driver suspend and resume callbacks, maybe use them?
> > > > >
> > > > > The driver could do the I2C transactions in its suspend/resume
> > > > > callbacks and do nothing in online/offline if those are part of
> > > > > system-wide suspend/resume.
> > > >
> > > > These are per-policy things that we need to do, not sure if driver
> > > > suspend/resume is a good place for that. It is more for a case where
> > > > CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
> > > > hot-unplugged during system suspend and hotplugged later on. This is
> > > > more like complete removal/addition of devices instead of
> > > > suspend/resume.
> > >
> > > No, it isn't.  We don't remove devices on offline.  We migrate stuff
> > > away from them and (opportunistically) power them down.
> > >
> > > If this is system suspend, the driver kind of knows that offline will
> > > take place, so it can prepare for it.  Likewise, when online takes
> > > place during system-wide resume, it generally is known that this is
> > > system-wide resume (there is a flag to indicate that in CPU hotplug),
> > > it can be "smart" and avoid accessing suspended devices.  Deferring
> > > the frequency set up until the driver resume time should do the trick
> > > I suppose.
> >
> > I agree. The reason we don't see this generally on boot is because all
> > the CPUs are brought online before CPUfreq is initialised. While during
> > system suspend, we call cpufreq_online which in turn calls ->init in
> > the hotplug state machine.
> >
> > So as Rafael suggests we need to do some trick, but can it be done in
> > the core itself ? I may be missing something, but how about the patch
> > below:
> >
> > Regards,
> > Sudeep
> >
> > --
> > diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
> > index e35a886e00bc..7d8b0b99f91d 100644
> > --- i/drivers/cpufreq/cpufreq.c
> > +++ w/drivers/cpufreq/cpufreq.c
> > @@ -1241,7 +1241,8 @@ static int cpufreq_online(unsigned int cpu)
> >                 policy->max = policy->user_policy.max;
> >         }
> >
> > -       if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
> > +       if (cpufreq_driver->get && !cpufreq_driver->setpolicy &&
> > +           !cpufreq_suspended) {
> >                 policy->cur = cpufreq_driver->get(policy->cpu);
> >                 if (!policy->cur) {
> >                         pr_err("%s: ->get() failed\n", __func__);
> 
> It looks like we need to skip the "initial freq check" block below.
> 

Indeed, copy pasted an earlier version of diff. I found that I even
used a goto label wrong which I fixed along with the additional check
in "initial freq check" when I tried to compile :).

> Also this doesn't really help the case when the driver ->init() messes
> up with things.
>

Yes, in that case additional logic in the driver also needed. I am fine
if we enforce driver to deal with this issue, but was thinking if we can
make it generic. Also I was just trying to avoid adding _suspend/resume
to driver just to avoid this issue.

--
Regards,
Sudeep
Rafael J. Wysocki Feb. 8, 2019, 12:11 p.m. UTC | #22
On Fri, Feb 8, 2019 at 1:04 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi Sudeep,
>
> On 2019-02-08 12:51, Sudeep Holla wrote:
> > On Fri, Feb 08, 2019 at 12:47:06PM +0100, Marek Szyprowski wrote:
> >> On 2019-02-08 12:00, Sudeep Holla wrote:
> >>> On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
> >>>> Dear All,
> >>>>
> >>>> This is a scenario that triggers the above issue:
> >>> [...]
> >>>> 1. system disables non-boot cpu's at the end of system suspend procedure,
> >>>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> >>>> 3. early in the system resume procedure all cpus are got back to online
> >>>>    state,
> >>>> 4. this in turn causes cpufreq to be initialized for the newly onlined
> >>>>    cpus,
> >>>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
> >>>>    ->init() callback,
> >>> This is strictly not just restricted to cpufreq-dt, but to any driver
> >>> supporting multiple policies. So we need a generic fix not just
> >>> cpufreq-dt specific.
> >> Could you point which other driver needs similar fix? Here in cpufreq-dt
> >> the problem was caused by using regulator api (indirectly) from
> >> ->init(). All other drivers, which have regulators support, are for old,
> >> obsolete, uni-processor systems, which don't have the problem of
> >> secondary cpu suspend during system suspend/resume cycle.
> >>
> > scmi_cpufreq for instance. We can fix that in driver my moving to polling
> > to get cpufreq_get_rate, but we support both polling and interrupt based.
> > We may wait for remote processor interrupt in get_rate.
>
> Frankly, I don't feel I know enough to touch this driver and I don't
> think that this can even be fixed in a generic way in the cpufreq core.
> Maybe a comment somewhere is needed that ->init() might be called during
> early system resume with irqs off and driver is responsible for handling
> such case until the proper resume?

Well, adding a comment to that effect certainly won't hurt as that's
how things work now.

However, it looks like something needs to be done in addition to that.
Sudeep Holla Feb. 8, 2019, 12:16 p.m. UTC | #23
On Fri, Feb 08, 2019 at 01:04:18PM +0100, Marek Szyprowski wrote:
> Hi Sudeep,
>
> On 2019-02-08 12:51, Sudeep Holla wrote:
> > On Fri, Feb 08, 2019 at 12:47:06PM +0100, Marek Szyprowski wrote:
> >> On 2019-02-08 12:00, Sudeep Holla wrote:
> >>> On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
> >>>> Dear All,
> >>>>
> >>>> This is a scenario that triggers the above issue:
> >>> [...]
> >>>> 1. system disables non-boot cpu's at the end of system suspend procedure,
> >>>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> >>>> 3. early in the system resume procedure all cpus are got back to online
> >>>>    state,
> >>>> 4. this in turn causes cpufreq to be initialized for the newly onlined
> >>>>    cpus,
> >>>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
> >>>>    ->init() callback,
> >>> This is strictly not just restricted to cpufreq-dt, but to any driver
> >>> supporting multiple policies. So we need a generic fix not just
> >>> cpufreq-dt specific.
> >> Could you point which other driver needs similar fix? Here in cpufreq-dt
> >> the problem was caused by using regulator api (indirectly) from
> >> ->init(). All other drivers, which have regulators support, are for old,
> >> obsolete, uni-processor systems, which don't have the problem of
> >> secondary cpu suspend during system suspend/resume cycle.
> >>
> > scmi_cpufreq for instance. We can fix that in driver my moving to polling
> > to get cpufreq_get_rate, but we support both polling and interrupt based.
> > We may wait for remote processor interrupt in get_rate.
>
> Frankly, I don't feel I know enough to touch this driver and I don't
> think that this can even be fixed in a generic way in the cpufreq core.

Why not ? IIUC it's only to get/set the frequency you would need to talk
to remote processor or external chip over I2C which can be deferred until
resume. Rafael has valid concerns on messed up init implementations, still
wondering if there's any chance to solve this in the core.

> Maybe a comment somewhere is needed that ->init() might be called during
> early system resume with irqs off and driver is responsible for handling
> such case until the proper resume?
>

I agree and +1 for comment, but every driver needs to deal with that,
which is fine. Just trying to see if we can avoid it.

--
Regards,
Sudeep
Rafael J. Wysocki Feb. 8, 2019, 12:23 p.m. UTC | #24
On Fri, Feb 8, 2019 at 1:09 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Fri, Feb 08, 2019 at 01:03:10PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Feb 8, 2019 at 12:39 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > On Fri, Feb 08, 2019 at 11:42:20AM +0100, Rafael J. Wysocki wrote:
> > > > On Fri, Feb 8, 2019 at 11:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > >
> > > > > On 08-02-19, 11:22, Rafael J. Wysocki wrote:
> > > > > > There are cpufreq driver suspend and resume callbacks, maybe use them?
> > > > > >
> > > > > > The driver could do the I2C transactions in its suspend/resume
> > > > > > callbacks and do nothing in online/offline if those are part of
> > > > > > system-wide suspend/resume.
> > > > >
> > > > > These are per-policy things that we need to do, not sure if driver
> > > > > suspend/resume is a good place for that. It is more for a case where
> > > > > CPU 0-3 are in one policy and 4-7 in another. Now 1-7 are
> > > > > hot-unplugged during system suspend and hotplugged later on. This is
> > > > > more like complete removal/addition of devices instead of
> > > > > suspend/resume.
> > > >
> > > > No, it isn't.  We don't remove devices on offline.  We migrate stuff
> > > > away from them and (opportunistically) power them down.
> > > >
> > > > If this is system suspend, the driver kind of knows that offline will
> > > > take place, so it can prepare for it.  Likewise, when online takes
> > > > place during system-wide resume, it generally is known that this is
> > > > system-wide resume (there is a flag to indicate that in CPU hotplug),
> > > > it can be "smart" and avoid accessing suspended devices.  Deferring
> > > > the frequency set up until the driver resume time should do the trick
> > > > I suppose.
> > >
> > > I agree. The reason we don't see this generally on boot is because all
> > > the CPUs are brought online before CPUfreq is initialised. While during
> > > system suspend, we call cpufreq_online which in turn calls ->init in
> > > the hotplug state machine.
> > >
> > > So as Rafael suggests we need to do some trick, but can it be done in
> > > the core itself ? I may be missing something, but how about the patch
> > > below:
> > >
> > > Regards,
> > > Sudeep
> > >
> > > --
> > > diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
> > > index e35a886e00bc..7d8b0b99f91d 100644
> > > --- i/drivers/cpufreq/cpufreq.c
> > > +++ w/drivers/cpufreq/cpufreq.c
> > > @@ -1241,7 +1241,8 @@ static int cpufreq_online(unsigned int cpu)
> > >                 policy->max = policy->user_policy.max;
> > >         }
> > >
> > > -       if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
> > > +       if (cpufreq_driver->get && !cpufreq_driver->setpolicy &&
> > > +           !cpufreq_suspended) {
> > >                 policy->cur = cpufreq_driver->get(policy->cpu);
> > >                 if (!policy->cur) {
> > >                         pr_err("%s: ->get() failed\n", __func__);
> >
> > It looks like we need to skip the "initial freq check" block below.
> >
>
> Indeed, copy pasted an earlier version of diff. I found that I even
> used a goto label wrong which I fixed along with the additional check
> in "initial freq check" when I tried to compile :).
>
> > Also this doesn't really help the case when the driver ->init() messes
> > up with things.
> >
>
> Yes, in that case additional logic in the driver also needed. I am fine
> if we enforce driver to deal with this issue, but was thinking if we can
> make it generic. Also I was just trying to avoid adding _suspend/resume
> to driver just to avoid this issue.

I was wondering if cpufreq_offline()/online() could be invoked from
cpufreq_suspend()/resume() for the nonboot CPUs - if the driver needs
it (there could be a driver flag to indicate that).

If they are made exit immediately when cpufreq_suspended is set (and
the requisite driver flag is set too), that might work AFAICS.
Sudeep Holla Feb. 8, 2019, 2:28 p.m. UTC | #25
On Fri, Feb 08, 2019 at 01:23:37PM +0100, Rafael J. Wysocki wrote:
> On Fri, Feb 8, 2019 at 1:09 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >

[...]

> > Yes, in that case additional logic in the driver also needed. I am fine
> > if we enforce driver to deal with this issue, but was thinking if we can
> > make it generic. Also I was just trying to avoid adding _suspend/resume
> > to driver just to avoid this issue.
>
> I was wondering if cpufreq_offline()/online() could be invoked from
> cpufreq_suspend()/resume() for the nonboot CPUs - if the driver needs
> it (there could be a driver flag to indicate that).
>
> If they are made exit immediately when cpufreq_suspended is set (and
> the requisite driver flag is set too), that might work AFAICS.

Yes that sounds feasible. It should be fine to assume it's safe to call
cpufreq_online on a CPU even for CPU that might have failed to come
online or didn't reached a state in CPUHP from where CPUFreq callback
is executed or am I missing something ?

--
Regards,
Sudeep
Sudeep Holla Feb. 8, 2019, 5:41 p.m. UTC | #26
On Fri, Feb 08, 2019 at 01:04:18PM +0100, Marek Szyprowski wrote:
> Hi Sudeep,
> 
> On 2019-02-08 12:51, Sudeep Holla wrote:
> > On Fri, Feb 08, 2019 at 12:47:06PM +0100, Marek Szyprowski wrote:
> >> On 2019-02-08 12:00, Sudeep Holla wrote:
> >>> On Thu, Feb 07, 2019 at 01:22:25PM +0100, Marek Szyprowski wrote:
> >>>> Dear All,
> >>>>
> >>>> This is a scenario that triggers the above issue:
> >>> [...]
> >>>> 1. system disables non-boot cpu's at the end of system suspend procedure,
> >>>> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> >>>> 3. early in the system resume procedure all cpus are got back to online
> >>>>    state,
> >>>> 4. this in turn causes cpufreq to be initialized for the newly onlined
> >>>>    cpus,
> >>>> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
> >>>>    ->init() callback,
> >>> This is strictly not just restricted to cpufreq-dt, but to any driver
> >>> supporting multiple policies. So we need a generic fix not just
> >>> cpufreq-dt specific.
> >> Could you point which other driver needs similar fix? Here in cpufreq-dt
> >> the problem was caused by using regulator api (indirectly) from
> >> ->init(). All other drivers, which have regulators support, are for old,
> >> obsolete, uni-processor systems, which don't have the problem of
> >> secondary cpu suspend during system suspend/resume cycle.
> >>
> > scmi_cpufreq for instance. We can fix that in driver my moving to polling
> > to get cpufreq_get_rate, but we support both polling and interrupt based.
> > We may wait for remote processor interrupt in get_rate.
> 
> Frankly, I don't feel I know enough to touch this driver and I don't
> think that this can even be fixed in a generic way in the cpufreq core.

Based on Rafael's suggestion, I cooked up something. See if this helps ?
The policy to cpu dance can be removed and we can just run through the
online cpumask I think.

Regards,
Sudeep

-->8

diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
index e35a886e00bc..03d65a02a542 100644
--- i/drivers/cpufreq/cpufreq.c
+++ w/drivers/cpufreq/cpufreq.c
@@ -1640,6 +1640,7 @@ EXPORT_SYMBOL(cpufreq_generic_suspend);
 void cpufreq_suspend(void)
 {
 	struct cpufreq_policy *policy;
+	int cpu;
 
 	if (!cpufreq_driver)
 		return;
@@ -1662,6 +1663,11 @@ void cpufreq_suspend(void)
 	}
 
 suspend:
+	if (cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME)
+		for_each_active_policy(policy)
+			for_each_cpu(cpu, policy->cpus)
+				cpufreq_offline(cpu);
+
 	cpufreq_suspended = true;
 }
 
@@ -1674,7 +1680,7 @@ void cpufreq_suspend(void)
 void cpufreq_resume(void)
 {
 	struct cpufreq_policy *policy;
-	int ret;
+	int ret, cpu;
 
 	if (!cpufreq_driver)
 		return;
@@ -1682,6 +1688,11 @@ void cpufreq_resume(void)
 	if (unlikely(!cpufreq_suspended))
 		return;
 
+	if (cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME)
+		for_each_active_policy(policy)
+			for_each_cpu(cpu, policy->cpus)
+				cpufreq_online(cpu);
+
 	cpufreq_suspended = false;
 
 	if (!has_target() && !cpufreq_driver->resume)
@@ -2444,14 +2455,16 @@ static enum cpuhp_state hp_online;
 
 static int cpuhp_cpufreq_online(unsigned int cpu)
 {
-	cpufreq_online(cpu);
+	if (!(cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME))
+		cpufreq_online(cpu);
 
 	return 0;
 }
 
 static int cpuhp_cpufreq_offline(unsigned int cpu)
 {
-	cpufreq_offline(cpu);
+	if (!(cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME))
+		cpufreq_offline(cpu);
 
 	return 0;
 }
diff --git i/drivers/cpufreq/scmi-cpufreq.c w/drivers/cpufreq/scmi-cpufreq.c
index 01871418ffde..0bfc96102739 100644
--- i/drivers/cpufreq/scmi-cpufreq.c
+++ w/drivers/cpufreq/scmi-cpufreq.c
@@ -203,7 +203,8 @@ static void scmi_cpufreq_ready(struct cpufreq_policy *policy)
 static struct cpufreq_driver scmi_cpufreq_driver = {
 	.name	= "scmi",
 	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
-		  CPUFREQ_NEED_INITIAL_FREQ_CHECK,
+		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+		  CPUFREQ_DEFER_INIT_DURING_RESUME,
 	.verify	= cpufreq_generic_frequency_table_verify,
 	.attr	= cpufreq_generic_attr,
 	.target_index	= scmi_cpufreq_set_target,
diff --git i/include/linux/cpufreq.h w/include/linux/cpufreq.h
index c86d6d8bdfed..9cf6b3ce063a 100644
--- i/include/linux/cpufreq.h
+++ w/include/linux/cpufreq.h
@@ -385,6 +385,12 @@ struct cpufreq_driver {
  */
 #define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING (1 << 6)
 
+/*
+ * Set by drivers to advance/defer the cpufreq online/offline operation during
+ * system suspend/resume.
+ */
+#define CPUFREQ_DEFER_INIT_DURING_RESUME (1 << 7)
+
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
Viresh Kumar Feb. 11, 2019, 8:44 a.m. UTC | #27
On 07-02-19, 13:22, Marek Szyprowski wrote:
> Dear All,
> 
> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
> i2c adapters") added a visible warning for an attempt to do i2c transfer
> over a suspended i2c bus. This revealed a long standing issue in the
> cpufreq-dt driver, which gives a following warning during system
> suspend/resume cycle:

Marek,

I have sent a patchset which is not directly related to the problem
you are facing, but it may solve your problem as a side effect. Can
you please see if that works to solve this issue as well ?

https://lore.kernel.org/lkml/cover.1549874368.git.viresh.kumar@linaro.org/T/#u

Thanks.
Viresh Kumar Feb. 11, 2019, 8:47 a.m. UTC | #28
On 08-02-19, 17:41, Sudeep Holla wrote:
> Based on Rafael's suggestion, I cooked up something. See if this helps ?
> The policy to cpu dance can be removed and we can just run through the
> online cpumask I think.
> 
> Regards,
> Sudeep
> 
> -->8
> 
> diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
> index e35a886e00bc..03d65a02a542 100644
> --- i/drivers/cpufreq/cpufreq.c
> +++ w/drivers/cpufreq/cpufreq.c
> @@ -1640,6 +1640,7 @@ EXPORT_SYMBOL(cpufreq_generic_suspend);
>  void cpufreq_suspend(void)
>  {
>  	struct cpufreq_policy *policy;
> +	int cpu;
>  
>  	if (!cpufreq_driver)
>  		return;
> @@ -1662,6 +1663,11 @@ void cpufreq_suspend(void)
>  	}
>  
>  suspend:
> +	if (cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME)
> +		for_each_active_policy(policy)
> +			for_each_cpu(cpu, policy->cpus)
> +				cpufreq_offline(cpu);

You will offline boot-cpu as well :)

> +
>  	cpufreq_suspended = true;
>  }
>  
> @@ -1674,7 +1680,7 @@ void cpufreq_suspend(void)
>  void cpufreq_resume(void)
>  {
>  	struct cpufreq_policy *policy;
> -	int ret;
> +	int ret, cpu;
>  
>  	if (!cpufreq_driver)
>  		return;
> @@ -1682,6 +1688,11 @@ void cpufreq_resume(void)
>  	if (unlikely(!cpufreq_suspended))
>  		return;
>  
> +	if (cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME)
> +		for_each_active_policy(policy)
> +			for_each_cpu(cpu, policy->cpus)
> +				cpufreq_online(cpu);
> +
>  	cpufreq_suspended = false;
>  
>  	if (!has_target() && !cpufreq_driver->resume)
> @@ -2444,14 +2455,16 @@ static enum cpuhp_state hp_online;
>  
>  static int cpuhp_cpufreq_online(unsigned int cpu)
>  {
> -	cpufreq_online(cpu);
> +	if (!(cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME))
> +		cpufreq_online(cpu);

This isn't correct as we can offline the CPUs without suspend as well
and cpufreq_online/offline should always be called in such cases.

Anyways, I have cc'd you on another series which may end up fixing
this problem as well.
Marek Szyprowski Feb. 11, 2019, 9:52 a.m. UTC | #29
Hi Viresh,

On 2019-02-11 09:44, Viresh Kumar wrote:
> On 07-02-19, 13:22, Marek Szyprowski wrote:
>> Dear All,
>>
>> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
>> i2c adapters") added a visible warning for an attempt to do i2c transfer
>> over a suspended i2c bus. This revealed a long standing issue in the
>> cpufreq-dt driver, which gives a following warning during system
>> suspend/resume cycle:
> Marek,
>
> I have sent a patchset which is not directly related to the problem
> you are facing, but it may solve your problem as a side effect. Can
> you please see if that works to solve this issue as well ?
>
> https://lore.kernel.org/lkml/cover.1549874368.git.viresh.kumar@linaro.org/T/#u

Thanks for the patch. It indeed solves the problem of the i2c transfer
in cpu hotplug procedure during system resume, although my resources
management rewrite is still valid as a way to fix remaining 'todos' in
the cpufreq-dt driver.

Best regards
Viresh Kumar Feb. 11, 2019, 9:55 a.m. UTC | #30
On 11-02-19, 10:52, Marek Szyprowski wrote:
> Hi Viresh,
> 
> On 2019-02-11 09:44, Viresh Kumar wrote:
> > On 07-02-19, 13:22, Marek Szyprowski wrote:
> >> Dear All,
> >>
> >> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
> >> i2c adapters") added a visible warning for an attempt to do i2c transfer
> >> over a suspended i2c bus. This revealed a long standing issue in the
> >> cpufreq-dt driver, which gives a following warning during system
> >> suspend/resume cycle:
> > Marek,
> >
> > I have sent a patchset which is not directly related to the problem
> > you are facing, but it may solve your problem as a side effect. Can
> > you please see if that works to solve this issue as well ?
> >
> > https://lore.kernel.org/lkml/cover.1549874368.git.viresh.kumar@linaro.org/T/#u
> 
> Thanks for the patch. It indeed solves the problem of the i2c transfer
> in cpu hotplug procedure during system resume, although my resources
> management rewrite is still valid as a way to fix remaining 'todos' in
> the cpufreq-dt driver.

Which remaining todos are you talking about ? Sorry I am not able to
recall :(
Marek Szyprowski Feb. 11, 2019, 12:22 p.m. UTC | #31
Hi Viresh,

On 2019-02-11 10:55, Viresh Kumar wrote:
> On 11-02-19, 10:52, Marek Szyprowski wrote:
>> On 2019-02-11 09:44, Viresh Kumar wrote:
>>> On 07-02-19, 13:22, Marek Szyprowski wrote:
>>>> Dear All,
>>>>
>>>> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
>>>> i2c adapters") added a visible warning for an attempt to do i2c transfer
>>>> over a suspended i2c bus. This revealed a long standing issue in the
>>>> cpufreq-dt driver, which gives a following warning during system
>>>> suspend/resume cycle:
>>> Marek,
>>>
>>> I have sent a patchset which is not directly related to the problem
>>> you are facing, but it may solve your problem as a side effect. Can
>>> you please see if that works to solve this issue as well ?
>>>
>>> https://lore.kernel.org/lkml/cover.1549874368.git.viresh.kumar@linaro.org/T/#u
>> Thanks for the patch. It indeed solves the problem of the i2c transfer
>> in cpu hotplug procedure during system resume, although my resources
>> management rewrite is still valid as a way to fix remaining 'todos' in
>> the cpufreq-dt driver.
> Which remaining todos are you talking about ? Sorry I am not able to
> recall :(

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/cpufreq-dt.c#n347

Best regards
Sudeep Holla Feb. 11, 2019, 2:08 p.m. UTC | #32
On Mon, Feb 11, 2019 at 02:17:14PM +0530, Viresh Kumar wrote:
> On 08-02-19, 17:41, Sudeep Holla wrote:
> > Based on Rafael's suggestion, I cooked up something. See if this helps ?
> > The policy to cpu dance can be removed and we can just run through the
> > online cpumask I think.
> >
> > Regards,
> > Sudeep
> >
> > -->8
> >
> > diff --git i/drivers/cpufreq/cpufreq.c w/drivers/cpufreq/cpufreq.c
> > index e35a886e00bc..03d65a02a542 100644
> > --- i/drivers/cpufreq/cpufreq.c
> > +++ w/drivers/cpufreq/cpufreq.c
> > @@ -1640,6 +1640,7 @@ EXPORT_SYMBOL(cpufreq_generic_suspend);
> >  void cpufreq_suspend(void)
> >  {
> >  	struct cpufreq_policy *policy;
> > +	int cpu;
> >
> >  	if (!cpufreq_driver)
> >  		return;
> > @@ -1662,6 +1663,11 @@ void cpufreq_suspend(void)
> >  	}
> >
> >  suspend:
> > +	if (cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME)
> > +		for_each_active_policy(policy)
> > +			for_each_cpu(cpu, policy->cpus)
> > +				cpufreq_offline(cpu);
>
> You will offline boot-cpu as well :)
>

Indeed, I was just trying to check the idea of flags and clearly missed
the boot cpu :(


[..]

> > @@ -2444,14 +2455,16 @@ static enum cpuhp_state hp_online;
> >
> >  static int cpuhp_cpufreq_online(unsigned int cpu)
> >  {
> > -	cpufreq_online(cpu);
> > +	if (!(cpufreq_driver->flags & CPUFREQ_DEFER_INIT_DURING_RESUME))
> > +		cpufreq_online(cpu);
>
> This isn't correct as we can offline the CPUs without suspend as well
> and cpufreq_online/offline should always be called in such cases.
>

Understood

> Anyways, I have cc'd you on another series which may end up fixing
> this problem as well.

Sure, will have a look.

--
Regards,
Sudeep
Viresh Kumar Feb. 12, 2019, 5:08 a.m. UTC | #33
On 11-02-19, 13:22, Marek Szyprowski wrote:
> Hi Viresh,
> 
> On 2019-02-11 10:55, Viresh Kumar wrote:
> > On 11-02-19, 10:52, Marek Szyprowski wrote:
> >> On 2019-02-11 09:44, Viresh Kumar wrote:
> >>> On 07-02-19, 13:22, Marek Szyprowski wrote:
> >>>> Dear All,
> >>>>
> >>>> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
> >>>> i2c adapters") added a visible warning for an attempt to do i2c transfer
> >>>> over a suspended i2c bus. This revealed a long standing issue in the
> >>>> cpufreq-dt driver, which gives a following warning during system
> >>>> suspend/resume cycle:
> >>> Marek,
> >>>
> >>> I have sent a patchset which is not directly related to the problem
> >>> you are facing, but it may solve your problem as a side effect. Can
> >>> you please see if that works to solve this issue as well ?
> >>>
> >>> https://lore.kernel.org/lkml/cover.1549874368.git.viresh.kumar@linaro.org/T/#u
> >> Thanks for the patch. It indeed solves the problem of the i2c transfer
> >> in cpu hotplug procedure during system resume, although my resources
> >> management rewrite is still valid as a way to fix remaining 'todos' in
> >> the cpufreq-dt driver.
> > Which remaining todos are you talking about ? Sorry I am not able to
> > recall :(
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/cpufreq-dt.c#n347

Ah, okay. Thanks for the pointer.

Ideally we shouldn't be doing anything in probe, like
resources_available(), but we are forced to do it as
subsys_interface_register() doesn't return the errors returned from
cpufreq_add_dev() currently. I tried to fix it once but there were
some complications I believe and then left it.

The main problem is that if cpufreq_online() doesn't find the required
resources and returns -EPROBE_DEFER, it never comes back to the
probe() routine which registers the cpufreq driver. And so we have to
add the duplicate checks in probe() itself before it registers the
cpufreq driver.

Now all we need to do there is to guarantee that the resources are
available when the cpufreq driver registers and so we do it only for
CPU0 currently. Logically speaking, if the resources are available for
CPU0, it will normally be available for any other CPU as well and so
there never was a requirement to test it for other CPUs. And so I left
a FIXME there, so that we know what's going on there in case a
platform comes up for which it doesn't work.

I won't fix it with something like what this patch series tried to do,
rather I would try to make sure that EPROBE_DEFER gets returned from
cpufreq_driver_register() instead.

Thanks.