[v10,00/27] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)

Message-Id: <20181129174700.16585-1-ulf.hansson@linaro.org>
From: Ulf Hansson <ulf.hansson@linaro.org>
To: "Rafael J . Wysocki" <rjw@rjwysocki.net>, Sudeep Holla <sudeep.holla@arm.com>, Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>, Mark Rutland <mark.rutland@arm.com>, Daniel Lezcano <daniel.lezcano@linaro.org>, linux-pm@vger.kernel.org
Cc: "Raju P . L . S . S . S . N" <rplsssn@codeaurora.org>, Stephen Boyd <sboyd@kernel.org>, Tony Lindgren <tony@atomide.com>, Kevin Hilman <khilman@kernel.org>, Lina Iyer <ilina@codeaurora.org>, Ulf Hansson <ulf.hansson@linaro.org>, Viresh Kumar <viresh.kumar@linaro.org>, Vincent Guittot <vincent.guittot@linaro.org>, Geert Uytterhoeven <geert+renesas@glider.be>, linux-arm-kernel@lists.infradead.org, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v10 00/27] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
Date: Thu, 29 Nov 2018 18:46:33 +0100

Ulf Hansson Nov. 29, 2018, 5:46 p.m. UTC

Over the years this series has been iterated and discussed at various Linux
conferences and on LKML. In this new v10, quite a significant number of changes
have been made to address comments on v8 and v9. A summary is available
below, although let's start with a brand new clarification of the motivation
behind this series.

On ARM64/ARM based platforms, CPUs are often arranged in a hierarchical manner.
From a CPU idle state perspective, this means some states may be shared among a
group of CPUs (aka CPU cluster).

To deal with idle management of a group of CPUs, the kernel sometimes needs to
be involved in managing the last-man standing algorithm, simply because,
depending on the platform, it can't rely solely on power management FW to deal
with this.

There are a couple of typical scenarios for when the kernel needs to be in
control, dealing with synchronization of when the last CPU in a cluster is about
to enter a deep idle state.

1)
The kernel needs to carry out so-called last-man activities before the
CPU cluster can enter a deep idle state. This may for example involve
configuring external logic for wakeups, as the GIC may no longer be functional
once a deep cluster idle state has been entered. Likewise, this configuration
may need to be restored when the first CPU wakes up.

2)
Other more generic I/O devices, such as an MMC controller for example, may be
part of the same power domain as the CPU cluster, due to a shared power-rail.
For these scenarios, when the MMC controller is in use dealing with an MMC
request, a deeper idle state of the CPU cluster may need to be temporarily
disabled. This is needed to retain the MMC controller in a functional state,
else it may lose its register-context in the middle of serving a request.
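
As a rough illustration of the two scenarios above, here is a minimal
user-space sketch of the last-man bookkeeping. It is not code from the series:
the struct and every function name are hypothetical, the model is
single-threaded (real code would need locking), and it ignores everything
genpd does beyond counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU cluster power domain; not kernel code. */
struct cluster_domain {
	int active_cpus;    /* CPUs in the cluster that are not idle */
	int busy_devices;   /* I/O devices on the same rail with work in flight */
	bool cluster_idle;  /* true while the deep cluster state is entered */
};

/* Scenario 2: a device (e.g. an MMC controller) starts serving a request. */
static void device_get(struct cluster_domain *d)
{
	d->busy_devices++;
}

/* The device has finished its request. */
static void device_put(struct cluster_domain *d)
{
	assert(d->busy_devices > 0);
	d->busy_devices--;
}

/*
 * Called when a CPU goes idle. Returns true only for the last man, and only
 * if no device keeps the domain powered; that is the point where last-man
 * activities (scenario 1) would run before the cluster state is entered.
 */
static bool cpu_enter_idle(struct cluster_domain *d)
{
	assert(d->active_cpus > 0);
	d->active_cpus--;
	d->cluster_idle = (d->active_cpus == 0 && d->busy_devices == 0);
	return d->cluster_idle;
}

/*
 * Called when a CPU wakes up. Returns true if the cluster had entered the
 * deep state, i.e. the first CPU up has to restore what the last man set up.
 */
static bool cpu_exit_idle(struct cluster_domain *d)
{
	bool restore = d->cluster_idle;

	d->cluster_idle = false;
	d->active_cpus++;
	return restore;
}
```

With two CPUs, only the second call to cpu_enter_idle() returns true, and it
returns false for as long as a device holds a reference through device_get().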

In this series, we are extending the generic PM domain (aka genpd) to also be
used for CPU devices. The goal is to re-use much of its current code to
help us manage the last-man standing synchronization. Moreover, as we already
use genpd to model power domains for generic I/O devices, both 1) and 2) can be
addressed with its help.

To address these problems for ARM64 DT based platforms, we are
deploying support for genpd and runtime PM to the PSCI FW driver. Finally,
we make some updates to two ARM64 DTBs, to deploy the new PSCI CPU topology
layout.

The series has been tested on the QCOM 410c dragonboard and the Hisilicon Hikey
board. You may also find the code at:

git.linaro.org/people/ulf.hansson/linux-pm.git next

Kind regards
Ulf Hansson


Changes in v10:
 - Quite significant changes have been made to the PSCI driver deployment.
   According to an agreement with Lorenzo, the hierarchical CPU layout for PSCI
   should be orthogonal to whether the PSCI FW supports OSI or not. This has
   been taken care of in this version.
 - Drop the generic attach/detach helpers of CPUs to genpd, instead make that
   related code internal to PSCI, for now.
 - Fix "BUG: sleeping for invalid context" for hotplug, as reported by Raju.
 - Addressed various comments from version 8 and 9.
 - Clarified changelogs and re-wrote the cover-letter to better explain the
   motivations behind these changes.

Changes in v9:
 - Collect only a subset from the changes in v8.
 - Patch 3 is new, documenting existing genpd flags. Going forward, this means
   that when a new genpd flag is invented, we must also properly document it.
 - No changes have been made to the patches picked from v8.
 - Dropped the text from v8 cover-letter[1], to avoid confusion. When posting
   v10 (or whatever the next version containing the rest becomes), I am going
   to re-write the cover-letter to clarify, more exactly, the problems this
   series intends to solve. The earlier text was simply too vague.

[1]
https://lwn.net/Articles/758091/

Changes in v8:
 - Added some tags for reviews and acks.
 - Cleanup timer patch (patch6) according to comments from Rafael.
 - Rebased series on top of v4.18-rc1 - it applied cleanly, except for patch 5.
 - While adapting patch 5 to new genpd changes, I took the opportunity to
   improve the new function description a bit.
 - Corrected malformed SPDX-License-Identifier in patch20.

Changes in v7:
 - Addressed comments concerning the PSCI changes from Mark Rutland: moved
   the psci firmware driver to a new firmware subdir and changed to force PSCI
   PC mode during boot, to cope with kexec'ed booted kernels.
 - Added some maintainers in cc for the timer/nohz patches.
 - Minor update to the new genpd governor, taking into account the state's
   poweroff latency while validating the sleep duration time.
 - Addressed a problem pointed out by Geert Uytterhoeven, around calling
   pm_runtime_get|put() for CPUs that have not been attached to a CPU PM domain.
 - Re-based on Linus' latest master.
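
The governor item in the v7 list (validating the sleep duration against the
state's power-off latency) can be illustrated with a small user-space sketch.
This is an assumption about the shape of the check, not the code from the
series: the struct, the field names and pick_domain_state() are made up for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one domain idle state, ordered shallow to deep. */
struct idle_state {
	uint64_t residency_ns;          /* minimum time the state must last to pay off */
	uint64_t power_off_latency_ns;  /* time it takes to enter the state */
};

/*
 * Return the index of the deepest state whose residency plus power-off
 * latency fits into the expected idle duration, or -1 if none fits.
 */
static int pick_domain_state(const struct idle_state *states, size_t n,
			     uint64_t idle_duration_ns)
{
	assert(states != NULL || n == 0);

	/* Walk from the deepest state towards the shallowest one. */
	for (size_t i = n; i-- > 0; ) {
		if (idle_duration_ns >=
		    states[i].residency_ns + states[i].power_off_latency_ns)
			return (int)i;
	}
	return -1;
}
```

The idle duration passed in would be the time until the next wakeup of any CPU
in the domain, which is what the exported per-CPU next wakeup time is for.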


Lina Iyer (5):
  timer: Export next wakeup time of a CPU
  dt: psci: Update DT bindings to support hierarchical PSCI states
  cpuidle: dt: Support hierarchical CPU idle states
  drivers: firmware: psci: Support hierarchical CPU idle states
  arm64: dts: Convert to the hierarchical CPU topology layout for
    MSM8916

Ulf Hansson (22):
  PM / Domains: Add generic data pointer to genpd_power_state struct
  PM / Domains: Add support for CPU devices to genpd
  PM / Domains: Add genpd governor for CPUs
  of: base: Add of_get_cpu_state_node() to get idle states for a CPU
    node
  ARM/ARM64: cpuidle: Let back-end init ops take the driver as input
  drivers: firmware: psci: Move psci to separate directory
  MAINTAINERS: Update files for PSCI
  drivers: firmware: psci: Split psci_dt_cpu_init_idle()
  drivers: firmware: psci: Simplify state node parsing
  drivers: firmware: psci: Simplify error path of psci_dt_init()
  drivers: firmware: psci: Announce support for OS initiated suspend
    mode
  drivers: firmware: psci: Prepare to use OS initiated suspend mode
  drivers: firmware: psci: Prepare to support PM domains
  drivers: firmware: psci: Add support for PM domains using genpd
  drivers: firmware: psci: Add hierarchical domain idle states converter
  drivers: firmware: psci: Introduce psci_dt_topology_init()
  drivers: firmware: psci: Add a helper to attach a CPU to its PM domain
  drivers: firmware: psci: Attach the CPU's device to its PM domain
  drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
  drivers: firmware: psci: Support CPU hotplug for the hierarchical
    model
  arm64: kernel: Respect the hierarchical CPU topology in DT for PSCI
  arm64: dts: hikey: Convert to the hierarchical CPU topology layout

 .../devicetree/bindings/arm/psci.txt          | 166 ++++++++
 MAINTAINERS                                   |   2 +-
 arch/arm/include/asm/cpuidle.h                |   4 +-
 arch/arm/kernel/cpuidle.c                     |   5 +-
 arch/arm64/boot/dts/hisilicon/hi6220.dtsi     |  87 +++-
 arch/arm64/boot/dts/qcom/msm8916.dtsi         |  57 ++-
 arch/arm64/include/asm/cpu_ops.h              |   4 +-
 arch/arm64/include/asm/cpuidle.h              |   6 +-
 arch/arm64/kernel/cpuidle.c                   |   6 +-
 arch/arm64/kernel/setup.c                     |   3 +
 drivers/base/power/domain.c                   |  74 +++-
 drivers/base/power/domain_governor.c          |  61 ++-
 drivers/cpuidle/cpuidle-arm.c                 |   2 +-
 drivers/cpuidle/dt_idle_states.c              |   5 +-
 drivers/firmware/Kconfig                      |  15 +-
 drivers/firmware/Makefile                     |   3 +-
 drivers/firmware/psci/Kconfig                 |  13 +
 drivers/firmware/psci/Makefile                |   4 +
 drivers/firmware/{ => psci}/psci.c            | 240 ++++++++---
 drivers/firmware/psci/psci.h                  |  23 ++
 drivers/firmware/{ => psci}/psci_checker.c    |   0
 drivers/firmware/psci/psci_pm_domain.c        | 389 ++++++++++++++++++
 drivers/of/base.c                             |  35 ++
 drivers/soc/qcom/spm.c                        |   3 +-
 include/linux/of.h                            |   8 +
 include/linux/pm_domain.h                     |  19 +-
 include/linux/psci.h                          |   6 +-
 include/linux/tick.h                          |   8 +
 include/uapi/linux/psci.h                     |   5 +
 kernel/time/tick-sched.c                      |  13 +
 30 files changed, 1163 insertions(+), 103 deletions(-)
 create mode 100644 drivers/firmware/psci/Kconfig
 create mode 100644 drivers/firmware/psci/Makefile
 rename drivers/firmware/{ => psci}/psci.c (76%)
 create mode 100644 drivers/firmware/psci/psci.h
 rename drivers/firmware/{ => psci}/psci_checker.c (100%)
 create mode 100644 drivers/firmware/psci/psci_pm_domain.c

Ulf Hansson Dec. 17, 2018, 4:12 p.m. UTC | #1

Rafael, Sudeep, Lorenzo, Mark,

On Thu, 29 Nov 2018 at 18:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> Over the years this series have been iterated and discussed at various Linux
> conferences and LKML. In this new v10, a quite significant amount of changes
> have been made to address comments from v8 and v9. A summary is available
> below, although let's start with a brand new clarification of the motivation
> behind this series.
>
> For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
> From a CPU idle state perspective, this means some states may be shared among a
> group of CPUs (aka CPU cluster).
>
> To deal with idle management of a group of CPUs, sometimes the kernel needs to
> be involved to manage the last-man standing algorithm, simply because it can't
> rely solely on power management FWs to deal with this. Depending on the
> platform, of course.
>
> There are a couple of typical scenarios for when the kernel needs to be in
> control, dealing with synchronization of when the last CPU in a cluster is about
> to enter a deep idle state.
>
> 1)
> The kernel needs to carry out so called last-man activities before the
> CPU cluster can enter a deep idle state. This may for example involve to
> configure external logics for wakeups, as the GIC may no longer be functional
> once a deep cluster idle state have been entered. Likewise, these operations
> may need to be restored, when the first CPU wakes up.
>
> 2)
> Other more generic I/O devices, such as an MMC controller for example, may be a
> part of the same power domain as the CPU cluster, due to a shared power-rail.
> For these scenarios, when the MMC controller is in use dealing with an MMC
> request, a deeper idle state of the CPU cluster may needs to be temporarily
> disabled. This is needed to retain the MMC controller in a functional state,
> else it may loose its register-context in the middle of serving a request.
>
> In this series, we are extending the generic PM domain (aka genpd) to be used
> for also CPU devices. Hence the goal is to re-use much of its current code to
> help us manage the last-man standing synchronization. Moreover, as we already
> use genpd to model power domains for generic I/O devices, both 1) and 2) can be
> address with its help.
>
> Moreover, to address these problems for ARM64 DT based platforms, we are
> deploying support for genpd and runtime PM to the PSCI FW driver - and finally
> we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
> layout.
>
> The series has been tested on the QCOM 410c dragonboard and the Hisilicon Hikey
> board. You may also find the code at:
>
> git.linaro.org/people/ulf.hansson/linux-pm.git next

It's soon been three weeks since I posted this and I would really
appreciate some feedback.

Rafael, I need your feedback on patches 1-4.

Mark, Sudeep, Lorenzo, please have a look at the PSCI related changes.

When it comes to the cpuidle related changes, I have pinged Daniel
offlist - and he is preparing some responses.

Kind regards
Uffe



Sudeep Holla Jan. 3, 2019, 12:06 p.m. UTC | #2

On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> Over the years this series have been iterated and discussed at various Linux
> conferences and LKML. In this new v10, a quite significant amount of changes
> have been made to address comments from v8 and v9. A summary is available
> below, although let's start with a brand new clarification of the motivation
> behind this series.

I would like to raise a few points, not blockers as such, but they need to be
discussed and resolved before proceeding further.
1. CPU Idle Retention states
	- How will we deal with flattening (which brings back the DT bindings,
	  i.e. do we have all we need)? Because today there are no users of
	  this binding yet. I know we all agreed and added it after LPC2017 but
	  I am not convinced about flattening with only valid states.
	- Will the domain governor ensure not to enter deeper idle states based
	  on its sub-domain states? E.g.: when CPUs are in retention, the
	  so-called container/cluster domain can enter retention or below and
	  not power off states.
	- Is the case of not calling cpu_pm_{enter,exit} handled now?

2. Now that we have SDM845, which may soon have platform co-ordinated idle
   support in mainline, I *really* would like to see some power comparison
   numbers (i.e. PC without cluster idle states). This has been the main theme
   for most of the discussion on this topic for years and now that we are
   close to having some platform, we need to try.

3. Also, after adding such complexity, we really need a platform with an
   option to build and upgrade firmware easily. This will help to prevent
   this from going unmaintained for long without a platform to test on, and
   also avoid adding lots of quirks to deal with broken firmware, so that
   newer platforms deal with those issues in the firmware correctly.

--
Regards,
Sudeep

Rafael J. Wysocki Jan. 11, 2019, 11:08 a.m. UTC | #3

On Monday, December 17, 2018 5:12:54 PM CET Ulf Hansson wrote:
> Rafael, Sudeep, Lorenzo, Mark,
> 
> On Thu, 29 Nov 2018 at 18:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > Over the years this series have been iterated and discussed at various Linux
> > conferences and LKML. In this new v10, a quite significant amount of changes
> > have been made to address comments from v8 and v9. A summary is available
> > below, although let's start with a brand new clarification of the motivation
> > behind this series.
> >
> > For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
> > From a CPU idle state perspective, this means some states may be shared among a
> > group of CPUs (aka CPU cluster).
> >
> > To deal with idle management of a group of CPUs, sometimes the kernel needs to
> > be involved to manage the last-man standing algorithm, simply because it can't
> > rely solely on power management FWs to deal with this. Depending on the
> > platform, of course.
> >
> > There are a couple of typical scenarios for when the kernel needs to be in
> > control, dealing with synchronization of when the last CPU in a cluster is about
> > to enter a deep idle state.
> >
> > 1)
> > The kernel needs to carry out so called last-man activities before the
> > CPU cluster can enter a deep idle state. This may for example involve to
> > configure external logics for wakeups, as the GIC may no longer be functional
> > once a deep cluster idle state have been entered. Likewise, these operations
> > may need to be restored, when the first CPU wakes up.
> >
> > 2)
> > Other more generic I/O devices, such as an MMC controller for example, may be a
> > part of the same power domain as the CPU cluster, due to a shared power-rail.
> > For these scenarios, when the MMC controller is in use dealing with an MMC
> > request, a deeper idle state of the CPU cluster may needs to be temporarily
> > disabled. This is needed to retain the MMC controller in a functional state,
> > else it may loose its register-context in the middle of serving a request.
> >
> > In this series, we are extending the generic PM domain (aka genpd) to be used
> > for also CPU devices. Hence the goal is to re-use much of its current code to
> > help us manage the last-man standing synchronization. Moreover, as we already
> > use genpd to model power domains for generic I/O devices, both 1) and 2) can be
> > address with its help.
> >
> > Moreover, to address these problems for ARM64 DT based platforms, we are
> > deploying support for genpd and runtime PM to the PSCI FW driver - and finally
> > we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
> > layout.
> >
> > The series has been tested on the QCOM 410c dragonboard and the Hisilicon Hikey
> > board. You may also find the code at:
> >
> > git.linaro.org/people/ulf.hansson/linux-pm.git next
> 
> It's soon been three weeks since I posted this and I would really
> appreciate some feedback.
> 
> Rafael, I need your feedback on patch 1->4.

Sorry for the delay, I've replied to the patches.

The bottom line is that the mechanism introduced in patch 3 and used
in patch 4 doesn't look particularly clean to me.

Cheers,
Rafael

Ulf Hansson Jan. 16, 2019, 9:10 a.m. UTC | #4

On Thu, 3 Jan 2019 at 13:06, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> > Over the years this series have been iterated and discussed at various Linux
> > conferences and LKML. In this new v10, a quite significant amount of changes
> > have been made to address comments from v8 and v9. A summary is available
> > below, although let's start with a brand new clarification of the motivation
> > behind this series.
>
> I would like to raise few points, not blockers as such but need to be
> discussed and resolved before proceeding further.
> 1. CPU Idle Retention states
>         - How will be deal with flattening (which brings back the DT bindings,
>           i.e. do we have all we need) ? Because today there are no users of
>           this binding yet. I know we all agreed and added after LPC2017 but
>           I am not convinced about flattening with only valid states.

Not exactly sure I understand what you are concerned about here. When
it comes to users of the new DT binding, I am converting two
platforms in this series to make use of it.

Note, the flattened model is still a valid option to describe the CPU
idle states after these changes. Especially when there are no last man
standing activities to be managed by Linux and no shared resource that
needs to prevent cluster idle states when it's active.

>         - Will domain governor ensure not to enter deeper idles states based
>           on its sub-domain states. E.g.: when CPUs are in retention, so
>           called container/cluster domain can enter retention or below and not
>           power off states.

I have tried to point this out as a known limitation in genpd of the
current series, possibly I have failed to communicate that clearly.
Anyway, I fully agree that this needs to be addressed in a future
step.

Note that this isn't a limitation specific to how idle states are
selected for CPUs and CPU clusters by genpd, but rather a
limitation of any hierarchical PM domain topology managed by genpd
that has multiple idle states.

Do note, I have already started hacking on this and intend to post patches
on top of this series, as these changes aren't needed for the two
ARM64 platforms I have deployed support for.
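
To make the constraint under discussion concrete, here is a minimal user-space
sketch of limiting a parent domain's state by the states of its sub-domains.
It is purely illustrative and not taken from genpd or from this series; the
enum and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle depths, ordered from shallowest to deepest. */
enum pd_state { PD_ON = 0, PD_RETENTION = 1, PD_POWER_OFF = 2 };

/* The parent may not go deeper than its shallowest sub-domain. */
static enum pd_state parent_state_limit(const enum pd_state *child, size_t n)
{
	enum pd_state limit = PD_POWER_OFF;

	assert(child != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (child[i] < limit)
			limit = child[i];
	}
	return limit;
}

/* Clamp the state a governor would like to enter to what the children allow. */
static enum pd_state clamp_parent_state(enum pd_state wanted,
					const enum pd_state *child, size_t n)
{
	enum pd_state limit = parent_state_limit(child, n);

	return wanted < limit ? wanted : limit;
}
```

With one CPU sub-domain in retention and another powered off, a request for
PD_POWER_OFF on the cluster is clamped to PD_RETENTION.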

>         - Is the case of not calling cpu_pm_{enter,exit} handled now ?

It is still called, so there are no changes in that regard as a part of this
series.

When it comes to actually managing the "last man activities" as part of
selecting an idle state of the cluster, that is going to be addressed
on top as "optimizations".

In principle we should not need to call cpu_pm_enter|exit() in the
idle path at all, but rather only cpu_cluster_pm_enter|exit() when a
cluster idle state is selected. That should improve latency when
selecting an idle state for a CPU. However, to reach that point
additional changes are needed in various drivers, such as the gic
driver for example.

>
> 2. Now that we have SDM845 which may soon have platform co-ordinated idle
>    support in mainline, I *really* would like to see some power comparison
>    numbers(i.e. PC without cluster idle states). This has been the main theme
>    for most of the discussion on this topic for years and now we are close
>    to have some platform, we need to try.

I have quite recently been talking to Qcom folks about this as well,
but no commitments have been made.

Although I fully agree that some comparison would be great, it still
doesn't matter much, as we need to support PSCI OSI mode in
Linux anyway. Lorenzo has agreed to this as well.

>
> 3. Also, after adding such complexity, we really need a platform with an
>    option to build and upgrade firmware easily. This will help to prevent
>    this being not maintained for long without a platform to test, also
>    avoid adding lots of quirks to deal with broken firmware so that newer
>    platforms deal those issues in the firmware correctly.

I don't see how this series changes anything from what we already have
today with the PSCI FW. No matter whether OSI or PC mode is used, there is
complexity involved.

Although, of course I agree with you that we should continue to try
to convince ARM vendors to move to the public version of ATF and
avoid proprietary FW binaries as much as possible.

Kind regards
Uffe

Sudeep Holla Jan. 17, 2019, 5:44 p.m. UTC | #5

On Wed, Jan 16, 2019 at 10:10:08AM +0100, Ulf Hansson wrote:
> On Thu, 3 Jan 2019 at 13:06, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> > > Over the years this series have been iterated and discussed at various Linux
> > > conferences and LKML. In this new v10, a quite significant amount of changes
> > > have been made to address comments from v8 and v9. A summary is available
> > > below, although let's start with a brand new clarification of the motivation
> > > behind this series.
> >
> > I would like to raise few points, not blockers as such but need to be
> > discussed and resolved before proceeding further.
> > 1. CPU Idle Retention states
> >         - How will be deal with flattening (which brings back the DT bindings,
> >           i.e. do we have all we need) ? Because today there are no users of
> >           this binding yet. I know we all agreed and added after LPC2017 but
> >           I am not convinced about flattening with only valid states.
>
> Not exactly sure I understand what you are concerned about here. When
> it comes to users of the new DT binding, I am converting two new
> platforms in this series to use of it.
>

Yes, that's exactly my concern. If someone updates the DT (since it's
still part of the kernel) but doesn't update the firmware (for complexity
reasons), the end result on those platforms is broken CPUIdle, which is a
regression/feature break, and that's what I am objecting to here.

> Note, the flattened model is still a valid option to describe the CPU
> idle states after these changes. Especially when there are no last man
> standing activities to manage by Linux and no shared resource that
> need to prevent cluster idle states, when it's active.

Since OSI vs PC is discoverable, we shouldn't tie it to DT in any way.
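
As a rough illustration of the discoverability point, here is a minimal
sketch. It assumes the PSCI 1.0 PSCI_FEATURES call has been issued for
CPU_SUSPEND, and that bit 0 of a non-negative return value reports
OS-initiated mode support. The helper name and macros are hypothetical and
not taken from the kernel's PSCI driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 0 of PSCI_FEATURES(CPU_SUSPEND) flags OSI support. */
#define SKETCH_PSCI_FEATURES_OSI	(1u << 0)

/*
 * Hypothetical helper: decide whether the firmware supports OSI mode,
 * given the raw return value of PSCI_FEATURES(CPU_SUSPEND).
 * A negative value is a PSCI error code (e.g. NOT_SUPPORTED), in which
 * case only platform-coordinated (PC) mode can be assumed.
 */
bool sketch_psci_fw_supports_osi(int32_t features_ret)
{
	if (features_ret < 0)
		return false;

	return ((uint32_t)features_ret & SKETCH_PSCI_FEATURES_OSI) != 0;
}
```

The decision depends only on what the firmware reports, not on how the idle
states are laid out in DT.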

>
> >         - Will domain governor ensure not to enter deeper idles states based
> >           on its sub-domain states. E.g.: when CPUs are in retention, so
> >           called container/cluster domain can enter retention or below and not
> >           power off states.
>
> I have tried to point this out as a known limitation in genpd of the
> current series, possibly I have failed to communicate that clearly.
> Anyway, I fully agree that this needs to be addressed in a future
> step.
>

Sorry, I might have missed that. The point is, if we are sacrificing a
few retention states with this new feature, I am sure PC would perform
better than OSI on platforms which have retention states. That's another
reason for having comparison data; otherwise we should simply assume, and
state clearly, that OSI may perform badly on such systems until the
support is added.

> Note that, this isn't a specific limitation to how idle states are
> selected for CPUs and CPU clusters by genpd, but is rather a
> limitation to any hierarchical PM domain topology managed by genpd
> that has multiple idle states.
>

Agreed, but with flattened mode we compile the list of valid states so
the limitation is automatically eliminated.

> Do note, I already started hacking on this and intend to post patches
> on top of this series, as these changes isn't needed for those two
> ARM64 platforms I have deployed support for.
>

Good to know.

> >         - Is the case of not calling cpu_pm_{enter,exit} handled now ?
>
> It is still called, so no changes in regards to that as apart of this series.
>

OK, so I assume we are not going to support retention states with OSI
for now?

> When it comes to actually manage the "last man activities" as part of
> selecting an idle state of the cluster, that is going to be addressed
> on top as "optimizations".
>

OK

> In principle we should not need to call cpu_pm_enter|exit() in the
> idle path at all,

Not sure if we can do that. We need to notify things like PMU, FP, GIC
which have per cpu context too and not just "cluster" context.

> but rather only cpu_cluster_pm_enter|exit() when a cluster idle state is
> selected.

We need to avoid relying on the concept of "cluster" and just think of power
domains and what's hanging off those domains. Sorry for the naive question, but
does genpd have a concept of notifiers? I do understand that it's more of a
bottom-up approach, where each entity in genpd saves its context and requests
to enter a particular state. But with CPU devices like GIC/VFP/PMU, it
needs to be a more top-down approach, where the CPU genpd has to enter a state,
so it notifies the devices attached to it to save their context. Not ideal,
but that's the current solution. Because with the new DT bindings, platforms
can express whether PMU/GIC is in a per-CPU domain or any pd in the hierarchy,
and we ideally need to honor that. But that's an optimisation, just mentioning it.

> That should improve latency when
> selecting an idle state for a CPU. However, to reach that point
> additional changes are needed in various drivers, such as the gic
> driver for example.
>

Agreed.

> >
> > 2. Now that we have SDM845 which may soon have platform co-ordinated idle
> >    support in mainline, I *really* would like to see some power comparison
> >    numbers(i.e. PC without cluster idle states). This has been the main theme
> >    for most of the discussion on this topic for years and now we are close
> >    to have some platform, we need to try.
>
> I have quite recently been talking to Qcom folkz about this as well,
> but no commitments are made.
>

Indeed, that's the worrying part. IMO, this has been requested since day #1
and not even simple interest has been shown, but that's another topic.

> Although I fully agree that some comparison would be great, it still
> doesn't matter much, as we anyway need to support PSCI OSI mode in
> Linux. Lorenzo have agreed to this as well.
>

OK, I am fine if others agree. Since we are sacrificing a few (retention)
states that might disappear with OSI, I am still very much interested,
as OSI might perform worse than PC, especially in such cases.

> >
> > 3. Also, after adding such complexity, we really need a platform with an
> >    option to build and upgrade firmware easily. This will help to prevent
> >    this being not maintained for long without a platform to test, also
> >    avoid adding lots of quirks to deal with broken firmware so that newer
> >    platforms deal those issues in the firmware correctly.
>
> I don't see how this series change anything from what we already have
> today with the PSCI FW. No matter of OSI or PC mode is used, there are
> complexity involved.
>

I agree, but PC is already merged, maintained and well tested regularly,
as it's the default mode that must be supported, and TF-A supports/maintains
it. OSI is new and is on platforms which may not have much commitment
and can be thrown away, and any bugs we find in future may need to be worked
around in the kernel. That's what I meant as worrying.

> Although, of course I agree with you, that we should continue to try
> to convince ARM vendors about moving to the public version of ATF and
> avoid proprietary FW binaries as much as possible.
>

Indeed.

--
Regards,
Sudeep

Ulf Hansson Jan. 18, 2019, 11:56 a.m. UTC | #6

On Thu, 17 Jan 2019 at 18:44, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Wed, Jan 16, 2019 at 10:10:08AM +0100, Ulf Hansson wrote:
> > On Thu, 3 Jan 2019 at 13:06, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> > > > Over the years this series have been iterated and discussed at various Linux
> > > > conferences and LKML. In this new v10, a quite significant amount of changes
> > > > have been made to address comments from v8 and v9. A summary is available
> > > > below, although let's start with a brand new clarification of the motivation
> > > > behind this series.
> > >
> > > I would like to raise few points, not blockers as such but need to be
> > > discussed and resolved before proceeding further.
> > > 1. CPU Idle Retention states
> > >         - How will be deal with flattening (which brings back the DT bindings,
> > >           i.e. do we have all we need) ? Because today there are no users of
> > >           this binding yet. I know we all agreed and added after LPC2017 but
> > >           I am not convinced about flattening with only valid states.
> >
> > Not exactly sure I understand what you are concerned about here. When
> > it comes to users of the new DT binding, I am converting two new
> > platforms in this series to use of it.
> >
>
> Yes that's exactly my concern. So if someone updates DT(since it's part
> of the kernel still), but don't update the firmware(for complexity reasons)
> the end result on those platform is broken CPUIdle which is a regression/
> feature break and that's what I am objecting here.

There is not going to be a regression if that happens, you have got
that wrong. Let me clarify why.

Take Hikey for example, which is one of the platforms I convert to
use the new hierarchical DT bindings for the CPUs. It still uses the
existing PSCI FW, which supports PSCI PC mode only.

In this case, the PSCI FW driver observes that there is no OSI mode
support in the FW, which triggers it to convert the hierarchically
described idle states into regular flattened cpuidle states. In this
way, the idle states can be managed by the cpuidle framework per CPU,
as they are currently.
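
To make the conversion a bit more concrete, here is a minimal sketch of
the idea. It is not the code from this series. It assumes that each
flattened composite state is built from the deepest CPU state combined
with one domain state, and that the two suspend parameters are simply
OR'ed together. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified idle state: only the PSCI suspend parameter. */
struct sketch_idle_state {
	uint32_t param;
};

/*
 * Flatten hierarchically described idle states into one per-CPU list:
 * first every CPU-level state as-is, then one composite state per domain
 * state. Assumption: a composite state pairs the deepest CPU state (the
 * last entry in cpu[]) with the domain state by OR'ing their parameters.
 * Returns the number of entries written to out[], never more than cap.
 */
size_t sketch_flatten_idle_states(const struct sketch_idle_state *cpu,
				  size_t n_cpu,
				  const struct sketch_idle_state *dom,
				  size_t n_dom,
				  struct sketch_idle_state *out, size_t cap)
{
	size_t i, n = 0;

	if (n_cpu == 0)
		return 0;

	for (i = 0; i < n_cpu && n < cap; i++)
		out[n++] = cpu[i];

	for (i = 0; i < n_dom && n < cap; i++)
		out[n++].param = cpu[n_cpu - 1].param | dom[i].param;

	return n;
}
```

With two CPU states and one domain state, the result is three flattened
states, which is the kind of list cpuidle manages per CPU today.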

So, why convert Hikey to the new DT bindings? It makes Linux aware of
the topology, thus it can monitor when the last CPU in the cluster
enters idle - and then take care of "last man activities".

>
> > Note, the flattened model is still a valid option to describe the CPU
> > idle states after these changes. Especially when there are no last man
> > standing activities to manage by Linux and no shared resource that
> > need to prevent cluster idle states, when it's active.
>
> Since OSI vs PC is discoverable, we shouldn't tie up with DT in anyway.

As stated above, we aren't. OSI and PC mode are orthogonal to the DT bindings.

>
> >
> > >         - Will domain governor ensure not to enter deeper idles states based
> > >           on its sub-domain states. E.g.: when CPUs are in retention, so
> > >           called container/cluster domain can enter retention or below and not
> > >           power off states.
> >
> > I have tried to point this out as a known limitation in genpd of the
> > current series, possibly I have failed to communicate that clearly.
> > Anyway, I fully agree that this needs to be addressed in a future
> > step.
> >
>
> Sorry, I might have missed that. The point is, if we are sacrificing a
> few retention states with this new feature, I am sure PC would perform
> better than OSI on platforms which have retention states. That's another
> reason for having comparison data; otherwise we should simply assume, and
> state clearly, that OSI may perform badly on such systems until the
> support is added.

I now understand that I misread your question. We are not sacrificing
any idle states at all. Not in PC mode and not in OSI mode.

>
> > Note that, this isn't a specific limitation to how idle states are
> > selected for CPUs and CPU clusters by genpd, but is rather a
> > limitation to any hierarchical PM domain topology managed by genpd
> > that has multiple idle states.
> >
>
> Agreed, but with flattened mode we compile the list of valid states so
> the limitation is automatically eliminated.

What I was trying to point out above was a limitation in genpd and
its governors. If the PM domains have multiple idle states and
also have multiple sub-domain levels, the selection of idle state may
not be correct. However, that scenario doesn't exist for Hikey/410c.
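
For illustration only, the constraint a governor would have to respect in
that scenario could be sketched as below. This is a hypothetical model with
three depth levels, not how genpd or its governors are implemented. The
assumption is that a parent domain may go no deeper than its shallowest
sub-domain.

```c
#include <stddef.h>

/* Hypothetical depth levels, ordered from shallowest to deepest. */
enum sketch_pd_depth {
	SKETCH_PD_ON = 0,
	SKETCH_PD_RETENTION = 1,
	SKETCH_PD_OFF = 2,
};

/*
 * Return the deepest state a parent domain may select, assuming it must
 * not go deeper than the shallowest of its sub-domains. For example, if
 * any sub-domain is only in retention, the parent must not power off.
 * With no sub-domains there is no constraint.
 */
enum sketch_pd_depth
sketch_parent_depth_limit(const enum sketch_pd_depth *children, size_t n)
{
	enum sketch_pd_depth limit = SKETCH_PD_OFF;
	size_t i;

	for (i = 0; i < n; i++) {
		if (children[i] < limit)
			limit = children[i];
	}

	return limit;
}
```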

Apologies for the noise, I simply thought it was this limitation you
referred to.

>
> > Do note, I already started hacking on this and intend to post patches
> > on top of this series, as these changes isn't needed for those two
> > ARM64 platforms I have deployed support for.
> >
>
> Good to know.
>
> > >         - Is the case of not calling cpu_pm_{enter,exit} handled now ?
> >
> > It is still called, so no changes in regards to that as apart of this series.
> >
>
> OK, so I assume for now we are not going to support retention states with OSI
> for now ?
>
> > When it comes to actually manage the "last man activities" as part of
> > selecting an idle state of the cluster, that is going to be addressed
> > on top as "optimizations".
> >
>
> OK
>
> > In principle we should not need to call cpu_pm_enter|exit() in the
> > idle path at all,
>
> Not sure if we can do that. We need to notify things like PMU, FP, GIC
> which have per cpu context too and not just "cluster" context.
>
> > but rather only cpu_cluster_pm_enter|exit() when a cluster idle state is
> > selected.
>
> We need to avoid relying on concept of "cluster" and just think of power
> domains and what's hanging on those domains.

I fully agree. I just wanted to use a well-known term to avoid confusion.

> Sorry for the naive question, but
> does genpd have a concept of notifiers? I do understand that it's more of a
> bottom-up approach, where each entity in genpd saves its context and requests
> to enter a particular state. But with CPU devices like GIC/VFP/PMU, it
> needs to be a more top-down approach, where the CPU genpd has to enter a state,
> so it notifies the devices attached to it to save their context.

No, genpd doesn't have on/off notifiers. There have been attempts to
add them, but those didn't make it.

Anyway, it's nice that you bring this up! The problem is well
described and the approach you suggest may very well be the right one.

In principle, I am also worried that the cpu_cluster_pm_enter|exit()
notifiers don't scale. We may fire them when we shouldn't and
consumers may get them when they don't need them.
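
To sketch what a more targeted, top-down notification could look like:
each power domain keeps its own listener list, so only devices attached
to the domain that changes state are called. This is a purely hypothetical
model. genpd has no such API today, and all names below are made up for
illustration.

```c
#include <stddef.h>

#define SKETCH_PD_MAX_LISTENERS	8

/* Callback invoked when the domain is about to power off (1) or is back on (0). */
typedef void (*sketch_pd_notify_fn)(void *ctx, int powering_off);

struct sketch_pd_listener {
	sketch_pd_notify_fn fn;
	void *ctx;
};

/* Hypothetical power domain with a fixed-size, per-domain listener list. */
struct sketch_power_domain {
	struct sketch_pd_listener listeners[SKETCH_PD_MAX_LISTENERS];
	size_t n_listeners;
};

/* Attach a listener to one specific domain; returns 0 on success, -1 if full. */
int sketch_pd_register(struct sketch_power_domain *pd,
		       sketch_pd_notify_fn fn, void *ctx)
{
	if (pd->n_listeners >= SKETCH_PD_MAX_LISTENERS)
		return -1;

	pd->listeners[pd->n_listeners].fn = fn;
	pd->listeners[pd->n_listeners].ctx = ctx;
	pd->n_listeners++;
	return 0;
}

/* Notify only the listeners attached to this domain. */
void sketch_pd_notify(const struct sketch_power_domain *pd, int powering_off)
{
	size_t i;

	for (i = 0; i < pd->n_listeners; i++)
		pd->listeners[i].fn(pd->listeners[i].ctx, powering_off);
}

/* Example listener: counts how often a context save would be needed. */
void sketch_count_power_off(void *ctx, int powering_off)
{
	if (powering_off)
		++*(int *)ctx;
}
```

In this model a PMU listener registered on a per-CPU domain is not called
when only the cluster-level domain changes state, and vice versa.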

> Not ideal
> but that's current solution. Because with the new DT bindings, platforms
> can express if PMU/GIC is in per cpu domain or any pd in the hierarchy and
> we ideally need to honor that. But that's optimisation, just mentioning.

Overall, it's great that you mention this, and I just want to confirm
that I have this in mind when I am thinking of the next steps.

In regards to the next steps, hopefully we can move forward with
$subject series soon, so we really can start discussing the next steps
for real. I even think we need some of them to be implemented, before
we can see the full benefits made to latency and energy efficiency.

>
> > That should improve latency when
> > selecting an idle state for a CPU. However, to reach that point
> > additional changes are needed in various drivers, such as the gic
> > driver for example.
> >
>
> Agreed.
>
> > >
> > > 2. Now that we have SDM845 which may soon have platform co-ordinated idle
> > >    support in mainline, I *really* would like to see some power comparison
> > >    numbers(i.e. PC without cluster idle states). This has been the main theme
> > >    for most of the discussion on this topic for years and now we are close
> > >    to have some platform, we need to try.
> >
> > I have quite recently been talking to Qcom folkz about this as well,
> > but no commitments are made.
> >
>
> Indeed that's the worrying. IMO, this is requested since day#1 and not
> even simple interest is shown, but that's another topic.

Well, at least we keep talking about it and I am sure we will be able
to compare at some point.

Another option is simply to implement support for OSI mode in the
public ARM Trusted Firmware, any of us could do that. That would open
up testing on a bunch of "open" platforms, like Hikey for example.

>
> > Although I fully agree that some comparison would be great, it still
> > doesn't matter much, as we anyway need to support PSCI OSI mode in
> > Linux. Lorenzo have agreed to this as well.
> >
>
> OK, I am fine if others agree. Since we are sacrificing on few (retention)
> states that might disappear with OSI, I am still very much still interested
> as OSI might perform bad that PC especially in such cases.
>
> > >
> > > 3. Also, after adding such complexity, we really need a platform with an
> > >    option to build and upgrade firmware easily. This will help to prevent
> > >    this being not maintained for long without a platform to test, also
> > >    avoid adding lots of quirks to deal with broken firmware so that newer
> > >    platforms deal those issues in the firmware correctly.
> >
> > I don't see how this series change anything from what we already have
> > today with the PSCI FW. No matter of OSI or PC mode is used, there are
> > complexity involved.
> >
>
> I agree, but PC is already merged, maintained and well tested regularly,
> as it's the default mode that must be supported, and TF-A supports/maintains
> it. OSI is new and is on platforms which may not have much commitment
> and can be thrown away, and any bugs we find in future may need to be worked
> around in the kernel. That's what I meant as worrying.

I see what you are saying. Hopefully my earlier answers above will
make you worry less. :-)

>
> > Although, of course I agree with you, that we should continue to try
> > to convince ARM vendors about moving to the public version of ATF and
> > avoid proprietary FW binaries as much as possible.
> >
>
> Indeed.
>
> --
> Regards,
> Sudeep

Kind regards
Uffe
