mbox series

[RFC/PATCH,0/5] DVFS in the OPP core

Message ID 20190129015547.213276-1-swboyd@chromium.org (mailing list archive)
Headers show
Series DVFS in the OPP core | expand

Message

Stephen Boyd Jan. 29, 2019, 1:55 a.m. UTC
This patch series is an RFC around how we can implement DVFS for devices
that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
strict set of frequencies that they have been tested at to derive some
operating performance point. Instead they have a coarser set of
frequency max or 'fmax' OPPs that describe the maiximum frequency the
device can operate at with a given voltage.

The approach we take is to let the devm_pm_opp_set_rate() API accept 0
as a valid frequency indicating the frequency isn't required anymore and
to make the same API use the clk framework to round the frequency passed
in instead of relying on the OPP table to specify each frequency that
can be used. Once we have these two patches in place, we can use the OPP
API to change clk rates instead of clk_set_rate() and use all the recent
OPP enhancements that have been made around required-opps and genpd
performance states to do DVFS for all devices.

One nice feature of this approach is that we don't need to change the
OPP binding to support this. We can specify only the max frequencies and
the voltage requirements in DT with the existing binding and slightly
tweak the OPP code to achieve these results. 

This series includes a conversion of the uart and spi drivers on
qcom sdm845 where these patches are being developed. It shows how a
driver is converted from the clk APIs to the OPP APIs and how
enable/disable state of the clk is communicated to the OPP layer.

Some open topics and initial points for discussion are:

1) The dev_pm_opp_set_rate() API changes may break something that's 
relying on the rate rounding that OPP provides. If those exist,
we may need to implement another API that is more explicit about using
the clk API instead of the OPP tables.

2) dev_pm_opp_set_rate(0) is an interesting solution to the problem of
dropping the rate requirement. Is there anything better than this?

3) How do we handle devices that already have power-domains specified in
DT? The opp binding for required-opps doesn't let us specify the power
domain to target, instead it assumes that whatever power domain is
attached to a device is the one that OPP needs to use to change the
genpd performance state. Do we need a
dev_pm_opp_set_required_opps_name() or something to be explicit about
this? Can we have some way for the power domain that required-opps
correspond to be expressed in the OPP tables themselves?

4) How do we achieve the "full constraint" state? i.e. what do we do
about some devices being active and others being inactive during boot
and making sure that the voltage for the shared power domains doesn't
drop until all devices requiring it have informed OPP about their
power requirements?

Rajendra Nayak (4):
  OPP: Make dev_pm_opp_set_rate() with freq=0 as valid
  tty: serial: qcom_geni_serial: Use OPP API to set clk/perf state
  spi: spi-geni-qcom: Use OPP API to set clk/perf state
  arm64: dts: sdm845: Add OPP table for all qup devices

Stephen Boyd (1):
  OPP: Don't overwrite rounded clk rate

 arch/arm64/boot/dts/qcom/sdm845.dtsi  | 115 ++++++++++++++++++++++++++
 drivers/opp/core.c                    |  26 ++++--
 drivers/spi/spi-geni-qcom.c           |  12 ++-
 drivers/tty/serial/qcom_geni_serial.c |  15 +++-
 4 files changed, 155 insertions(+), 13 deletions(-)

For the interested, these patches are located here:

   https://github.com/rrnayak/linux/ v5.0-rc3/opp-corners-wip

I've generated these patches by cutting off the top of that tree and
massaging the commit text a bit for the first patch.

base-commit: 49a57857aeea06ca831043acbb0fa5e0f50602fd
prerequisite-patch-id: 9c3ee728603596b8b0ba06ffd66084bdc21b1271
prerequisite-patch-id: f160e050bcd74f5de6fad66329381853869a6a97
prerequisite-patch-id: aa23548d2b486c29489b4304d85799d08762254e
prerequisite-patch-id: 40dd117c45fecb4308298352346546612db94b64
prerequisite-patch-id: cd102fa42bab19897c2295e8b990b2156626054a
prerequisite-patch-id: 3b9e5c8ed65ee96cc0f1c50166cf6bbb597ef582
prerequisite-patch-id: 7e71b957b90ad51d0619944d5ebc859380e8e3e4
prerequisite-patch-id: 5abd2bd6b3ae3e91551e2b8f9295169019ba82c7
prerequisite-patch-id: 68bb3e44cf4e5dbd363a1a1750e4d378971ed393
prerequisite-patch-id: 882b14ef9527b15d22cfddbb5fa2b9d43df1ff04
prerequisite-patch-id: 6fc14ddeb074fb0826f1f40031e9d161f1361666

Comments

Viresh Kumar Jan. 31, 2019, 9:23 a.m. UTC | #1
Adding few folks to the thread who might be interested in this stuff.

On 28-01-19, 17:55, Stephen Boyd wrote:
> This patch series is an RFC around how we can implement DVFS for devices
> that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> strict set of frequencies that they have been tested at to derive some
> operating performance point. Instead they have a coarser set of
> frequency max or 'fmax' OPPs that describe the maiximum frequency the
> device can operate at with a given voltage.
> 
> The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> as a valid frequency indicating the frequency isn't required anymore and
> to make the same API use the clk framework to round the frequency passed
> in instead of relying on the OPP table to specify each frequency that
> can be used. Once we have these two patches in place, we can use the OPP
> API to change clk rates instead of clk_set_rate() and use all the recent
> OPP enhancements that have been made around required-opps and genpd
> performance states to do DVFS for all devices.

Generally speaking I am fine with such an approach but I am not sure
about what others would say on this as they had objections to using
OPP core for setting the rate itself.

FWIW, I suggested exactly this solution sometime back [1]

- Drivers need to use two API sets to change clock rate (OPP helpers)
  and enable/disable them (CLK framework helpers) and this leads us to
  exactly that combination. Is that acceptable ? It doesn't look great
  to me as well..

- Do we expect the callers will disable clk before calling
  opp-set-rate with 0 ? We should remove the regulator requirements as
  well along with perf-state.

- What about enabling/disabling clock as well from OPP framework. We
  can enable it on the very first call to opp-set-rate and disable
  when freq is 0. That will simplify the drivers as well.

> One nice feature of this approach is that we don't need to change the
> OPP binding to support this. We can specify only the max frequencies and
> the voltage requirements in DT with the existing binding and slightly
> tweak the OPP code to achieve these results. 
> 
> This series includes a conversion of the uart and spi drivers on
> qcom sdm845 where these patches are being developed. It shows how a
> driver is converted from the clk APIs to the OPP APIs and how
> enable/disable state of the clk is communicated to the OPP layer.
> 
> Some open topics and initial points for discussion are:
> 
> 1) The dev_pm_opp_set_rate() API changes may break something that's 
> relying on the rate rounding that OPP provides. If those exist,
> we may need to implement another API that is more explicit about using
> the clk API instead of the OPP tables.
> 
> 2) dev_pm_opp_set_rate(0) is an interesting solution to the problem of
> dropping the rate requirement. Is there anything better than this?
> 
> 3) How do we handle devices that already have power-domains specified in
> DT? The opp binding for required-opps doesn't let us specify the power
> domain to target, instead it assumes that whatever power domain is
> attached to a device is the one that OPP needs to use to change the
> genpd performance state. Do we need a
> dev_pm_opp_set_required_opps_name() or something to be explicit about
> this? Can we have some way for the power domain that required-opps
> correspond to be expressed in the OPP tables themselves?
> 
> 4) How do we achieve the "full constraint" state? i.e. what do we do
> about some devices being active and others being inactive during boot
> and making sure that the voltage for the shared power domains doesn't
> drop until all devices requiring it have informed OPP about their
> power requirements?
> 
> Rajendra Nayak (4):
>   OPP: Make dev_pm_opp_set_rate() with freq=0 as valid
>   tty: serial: qcom_geni_serial: Use OPP API to set clk/perf state
>   spi: spi-geni-qcom: Use OPP API to set clk/perf state
>   arm64: dts: sdm845: Add OPP table for all qup devices
> 
> Stephen Boyd (1):
>   OPP: Don't overwrite rounded clk rate
> 
>  arch/arm64/boot/dts/qcom/sdm845.dtsi  | 115 ++++++++++++++++++++++++++
>  drivers/opp/core.c                    |  26 ++++--
>  drivers/spi/spi-geni-qcom.c           |  12 ++-
>  drivers/tty/serial/qcom_geni_serial.c |  15 +++-
>  4 files changed, 155 insertions(+), 13 deletions(-)
> 
> For the interested, these patches are located here:
> 
>    https://github.com/rrnayak/linux/ v5.0-rc3/opp-corners-wip
> 
> I've generated these patches by cutting off the top of that tree and
> massaging the commit text a bit for the first patch.
> 
> base-commit: 49a57857aeea06ca831043acbb0fa5e0f50602fd
> prerequisite-patch-id: 9c3ee728603596b8b0ba06ffd66084bdc21b1271
> prerequisite-patch-id: f160e050bcd74f5de6fad66329381853869a6a97
> prerequisite-patch-id: aa23548d2b486c29489b4304d85799d08762254e
> prerequisite-patch-id: 40dd117c45fecb4308298352346546612db94b64
> prerequisite-patch-id: cd102fa42bab19897c2295e8b990b2156626054a
> prerequisite-patch-id: 3b9e5c8ed65ee96cc0f1c50166cf6bbb597ef582
> prerequisite-patch-id: 7e71b957b90ad51d0619944d5ebc859380e8e3e4
> prerequisite-patch-id: 5abd2bd6b3ae3e91551e2b8f9295169019ba82c7
> prerequisite-patch-id: 68bb3e44cf4e5dbd363a1a1750e4d378971ed393
> prerequisite-patch-id: 882b14ef9527b15d22cfddbb5fa2b9d43df1ff04
> prerequisite-patch-id: 6fc14ddeb074fb0826f1f40031e9d161f1361666
> -- 
> Sent by a computer through tubes
Rafael J. Wysocki Jan. 31, 2019, 9:58 a.m. UTC | #2
On Thu, Jan 31, 2019 at 10:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> Adding few folks to the thread who might be interested in this stuff.
>
> On 28-01-19, 17:55, Stephen Boyd wrote:
> > This patch series is an RFC around how we can implement DVFS for devices
> > that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> > strict set of frequencies that they have been tested at to derive some
> > operating performance point. Instead they have a coarser set of
> > frequency max or 'fmax' OPPs that describe the maiximum frequency the
> > device can operate at with a given voltage.
> >
> > The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> > as a valid frequency indicating the frequency isn't required anymore and
> > to make the same API use the clk framework to round the frequency passed
> > in instead of relying on the OPP table to specify each frequency that
> > can be used. Once we have these two patches in place, we can use the OPP
> > API to change clk rates instead of clk_set_rate() and use all the recent
> > OPP enhancements that have been made around required-opps and genpd
> > performance states to do DVFS for all devices.
>
> Generally speaking I am fine with such an approach but I am not sure
> about what others would say on this as they had objections to using
> OPP core for setting the rate itself.
>
> FWIW, I suggested exactly this solution sometime back [1]
>
> - Drivers need to use two API sets to change clock rate (OPP helpers)
>   and enable/disable them (CLK framework helpers) and this leads us to
>   exactly that combination. Is that acceptable ? It doesn't look great
>   to me as well..

I agree here.

> - Do we expect the callers will disable clk before calling
>   opp-set-rate with 0 ? We should remove the regulator requirements as
>   well along with perf-state.

Well, disabling clk affects HW in general, doesn't it?

> - What about enabling/disabling clock as well from OPP framework. We
>   can enable it on the very first call to opp-set-rate and disable
>   when freq is 0. That will simplify the drivers as well.

That sounds compelling, but I guess there are cases in which you can
gate the clock regardless of the frequency setting.  How would that
work then?
Viresh Kumar Jan. 31, 2019, 10:06 a.m. UTC | #3
On 31-01-19, 10:58, Rafael J. Wysocki wrote:
> On Thu, Jan 31, 2019 at 10:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > Adding few folks to the thread who might be interested in this stuff.
> >
> > On 28-01-19, 17:55, Stephen Boyd wrote:
> > > This patch series is an RFC around how we can implement DVFS for devices
> > > that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> > > strict set of frequencies that they have been tested at to derive some
> > > operating performance point. Instead they have a coarser set of
> > > frequency max or 'fmax' OPPs that describe the maiximum frequency the
> > > device can operate at with a given voltage.
> > >
> > > The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> > > as a valid frequency indicating the frequency isn't required anymore and
> > > to make the same API use the clk framework to round the frequency passed
> > > in instead of relying on the OPP table to specify each frequency that
> > > can be used. Once we have these two patches in place, we can use the OPP
> > > API to change clk rates instead of clk_set_rate() and use all the recent
> > > OPP enhancements that have been made around required-opps and genpd
> > > performance states to do DVFS for all devices.
> >
> > Generally speaking I am fine with such an approach but I am not sure
> > about what others would say on this as they had objections to using
> > OPP core for setting the rate itself.
> >
> > FWIW, I suggested exactly this solution sometime back [1]
> >
> > - Drivers need to use two API sets to change clock rate (OPP helpers)
> >   and enable/disable them (CLK framework helpers) and this leads us to
> >   exactly that combination. Is that acceptable ? It doesn't look great
> >   to me as well..
> 
> I agree here.
> 
> > - Do we expect the callers will disable clk before calling
> >   opp-set-rate with 0 ? We should remove the regulator requirements as
> >   well along with perf-state.
> 
> Well, disabling clk affects HW in general, doesn't it?

Yeah, but the regulator may be shared and is running at higher
voltages just because of the clock requirement of the device getting
disabled here. Or did I misunderstood what you wanted to say ?

> > - What about enabling/disabling clock as well from OPP framework. We
> >   can enable it on the very first call to opp-set-rate and disable
> >   when freq is 0. That will simplify the drivers as well.
> 
> That sounds compelling, but I guess there are cases in which you can
> gate the clock regardless of the frequency setting.  How would that
> work then?

Can you give any example here ? I am not sure I understood the concern
here.
Rafael J. Wysocki Jan. 31, 2019, 10:36 a.m. UTC | #4
On Thu, Jan 31, 2019 at 11:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 31-01-19, 10:58, Rafael J. Wysocki wrote:
> > On Thu, Jan 31, 2019 at 10:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > Adding few folks to the thread who might be interested in this stuff.
> > >
> > > On 28-01-19, 17:55, Stephen Boyd wrote:
> > > > This patch series is an RFC around how we can implement DVFS for devices
> > > > that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> > > > strict set of frequencies that they have been tested at to derive some
> > > > operating performance point. Instead they have a coarser set of
> > > > frequency max or 'fmax' OPPs that describe the maiximum frequency the
> > > > device can operate at with a given voltage.
> > > >
> > > > The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> > > > as a valid frequency indicating the frequency isn't required anymore and
> > > > to make the same API use the clk framework to round the frequency passed
> > > > in instead of relying on the OPP table to specify each frequency that
> > > > can be used. Once we have these two patches in place, we can use the OPP
> > > > API to change clk rates instead of clk_set_rate() and use all the recent
> > > > OPP enhancements that have been made around required-opps and genpd
> > > > performance states to do DVFS for all devices.
> > >
> > > Generally speaking I am fine with such an approach but I am not sure
> > > about what others would say on this as they had objections to using
> > > OPP core for setting the rate itself.
> > >
> > > FWIW, I suggested exactly this solution sometime back [1]
> > >
> > > - Drivers need to use two API sets to change clock rate (OPP helpers)
> > >   and enable/disable them (CLK framework helpers) and this leads us to
> > >   exactly that combination. Is that acceptable ? It doesn't look great
> > >   to me as well..
> >
> > I agree here.
> >
> > > - Do we expect the callers will disable clk before calling
> > >   opp-set-rate with 0 ? We should remove the regulator requirements as
> > >   well along with perf-state.
> >
> > Well, disabling clk affects HW in general, doesn't it?
>
> Yeah, but the regulator may be shared and is running at higher
> voltages just because of the clock requirement of the device getting
> disabled here. Or did I misunderstood what you wanted to say ?

What I wanted to say is that if the caller is required to disable clk
beforehand, that may be as good as setting its rate to zero already.

> > > - What about enabling/disabling clock as well from OPP framework. We
> > >   can enable it on the very first call to opp-set-rate and disable
> > >   when freq is 0. That will simplify the drivers as well.
> >
> > That sounds compelling, but I guess there are cases in which you can
> > gate the clock regardless of the frequency setting.  How would that
> > work then?
>
> Can you give any example here ? I am not sure I understood the concern
> here.

It looks like I was confused somehow, never mind. :-)
Viresh Kumar Jan. 31, 2019, 10:41 a.m. UTC | #5
On 31-01-19, 11:36, Rafael J. Wysocki wrote:
> On Thu, Jan 31, 2019 at 11:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 31-01-19, 10:58, Rafael J. Wysocki wrote:
> > > On Thu, Jan 31, 2019 at 10:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > Adding few folks to the thread who might be interested in this stuff.
> > > >
> > > > On 28-01-19, 17:55, Stephen Boyd wrote:
> > > > > This patch series is an RFC around how we can implement DVFS for devices
> > > > > that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> > > > > strict set of frequencies that they have been tested at to derive some
> > > > > operating performance point. Instead they have a coarser set of
> > > > > frequency max or 'fmax' OPPs that describe the maiximum frequency the
> > > > > device can operate at with a given voltage.
> > > > >
> > > > > The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> > > > > as a valid frequency indicating the frequency isn't required anymore and
> > > > > to make the same API use the clk framework to round the frequency passed
> > > > > in instead of relying on the OPP table to specify each frequency that
> > > > > can be used. Once we have these two patches in place, we can use the OPP
> > > > > API to change clk rates instead of clk_set_rate() and use all the recent
> > > > > OPP enhancements that have been made around required-opps and genpd
> > > > > performance states to do DVFS for all devices.
> > > >
> > > > Generally speaking I am fine with such an approach but I am not sure
> > > > about what others would say on this as they had objections to using
> > > > OPP core for setting the rate itself.
> > > >
> > > > FWIW, I suggested exactly this solution sometime back [1]
> > > >
> > > > - Drivers need to use two API sets to change clock rate (OPP helpers)
> > > >   and enable/disable them (CLK framework helpers) and this leads us to
> > > >   exactly that combination. Is that acceptable ? It doesn't look great
> > > >   to me as well..
> > >
> > > I agree here.
> > >
> > > > - Do we expect the callers will disable clk before calling
> > > >   opp-set-rate with 0 ? We should remove the regulator requirements as
> > > >   well along with perf-state.
> > >
> > > Well, disabling clk affects HW in general, doesn't it?
> >
> > Yeah, but the regulator may be shared and is running at higher
> > voltages just because of the clock requirement of the device getting
> > disabled here. Or did I misunderstood what you wanted to say ?
> 
> What I wanted to say is that if the caller is required to disable clk
> beforehand, that may be as good as setting its rate to zero already.

Right, but that doesn't mean that the requirements put on the
regulator and genpd are gone as well and that is why I favor the OPP
core to handle all the parts otherwise write the rules explicitly. If
the OPP core doesn't disable the clock and the caller also hasn't done
the same, then disabling the genpd/regulator may break things up.
Rajendra Nayak Feb. 7, 2019, 6:57 a.m. UTC | #6
> 3) How do we handle devices that already have power-domains specified in
> DT? The opp binding for required-opps doesn't let us specify the power
> domain to target, instead it assumes that whatever power domain is
> attached to a device is the one that OPP needs to use to change the
> genpd performance state. Do we need a
> dev_pm_opp_set_required_opps_name() or something to be explicit about
> this? Can we have some way for the power domain that required-opps
> correspond to be expressed in the OPP tables themselves?

I was converting a few more drivers to use the proposed approach in this
RFC, in order to identify all outstanding issues we need to deal with,
and specifically for UFS, I end up with this exact scenario where UFS already
has an existing power domain (gdsc) and I need to add another one (rpmhpd) for
setting the performance state.

If I use dev_pm_opp_of_add_table() to add the opp table from DT, the opp
layer assumes its the same device on which it can do a dev_pm_genpd_set_performance_state()
with, however the device that's actually associated with the pm_domain when we
have multiple power domains is infact the one (dummy) that we create when
the driver makes a call to dev_pm_domain_attach_by_name/id().

Any thoughts on whats a good way to handle this?
Stephen Boyd Feb. 7, 2019, 7:58 a.m. UTC | #7
Quoting Viresh Kumar (2019-01-31 01:23:49)
> Adding few folks to the thread who might be interested in this stuff.
> 
> On 28-01-19, 17:55, Stephen Boyd wrote:
> > This patch series is an RFC around how we can implement DVFS for devices
> > that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> > strict set of frequencies that they have been tested at to derive some
> > operating performance point. Instead they have a coarser set of
> > frequency max or 'fmax' OPPs that describe the maiximum frequency the
> > device can operate at with a given voltage.
> > 
> > The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> > as a valid frequency indicating the frequency isn't required anymore and
> > to make the same API use the clk framework to round the frequency passed
> > in instead of relying on the OPP table to specify each frequency that
> > can be used. Once we have these two patches in place, we can use the OPP
> > API to change clk rates instead of clk_set_rate() and use all the recent
> > OPP enhancements that have been made around required-opps and genpd
> > performance states to do DVFS for all devices.
> 
> Generally speaking I am fine with such an approach but I am not sure
> about what others would say on this as they had objections to using
> OPP core for setting the rate itself.
> 
> FWIW, I suggested exactly this solution sometime back [1]
> 
> - Drivers need to use two API sets to change clock rate (OPP helpers)
>   and enable/disable them (CLK framework helpers) and this leads us to
>   exactly that combination. Is that acceptable ? It doesn't look great
>   to me as well..

Agreed. I don't think anyone thinks this looks great, but I'll argue
that it's improving OPP for devices that already use it so that we can
remove voltage requirements when their clk is off. Think about CPUs that
are in their own clk domain where we want to remove the voltage
requirement when those CPUs are offline, or a GPU that wants to remove
its voltage requirement when it turns off clks. The combination is
already out there, just OPP hasn't solved this problem.

The only other plan I had was to implement another API like
dev_pm_set_state() or something like that and have that do something
like what the OPP rate API does right now. The main difference being
that the argument to the function would be some opaque u64 that's
converted by the bus/class/genpd attached to the device into whatever
frequency/voltage/performance state is desired (and sequenced in the
right order too). And then I was thinking that runtime PM or explicit
dev_pm_set_state() calls would be used to tell this new layer that the
device was going to a lower power mode with some other number (sub-kHz
integer?) and have that be translated again into some
frequency/voltage/performance state.

Either way, each driver will have to change from using the clk APIs to
changes rates to something else like one of these APIs, so I don't see a
huge difference. Drivers will have to change.

> 
> - Do we expect the callers will disable clk before calling
>   opp-set-rate with 0 ? We should remove the regulator requirements as
>   well along with perf-state.

Yes, that's the plan. Problems come along with this though, like shared
resource constraints and actually knowing the clk on/off state,
frequency, voltage, etc. at boot time and making sure to keep those
constraints satisfied during normal operation.

> 
> - What about enabling/disabling clock as well from OPP framework. We
>   can enable it on the very first call to opp-set-rate and disable
>   when freq is 0. That will simplify the drivers as well.

It works when those drivers aren't calling clk_disable() directly from
some irq handler. I don't think that's very common, but in those cases
we would probably want to allow drivers to quickly gate and ungate their
clks but then defer the sleeping stuff (voltages and off chip clks) to
the schedulable contexts. We'll still be left with a small number of
drivers that want to only call clk_prepare() and clk_unprepare() from
within OPP and keep calling clk_enable() and clk_disable() from their
driver. So introduce different APIs for those drivers to indicate this
to OPP? And only do that when it becomes a requirement?

Otherwise I don't really see a problem with the OPP call handling the
enable state of the clk as well.

> 
> > One nice feature of this approach is that we don't need to change the
> > OPP binding to support this. We can specify only the max frequencies and
> > the voltage requirements in DT with the existing binding and slightly
> > tweak the OPP code to achieve these results. 
> > 
> > This series includes a conversion of the uart and spi drivers on
> > qcom sdm845 where these patches are being developed. It shows how a
> > driver is converted from the clk APIs to the OPP APIs and how
> > enable/disable state of the clk is communicated to the OPP layer.
> > 
> > Some open topics and initial points for discussion are:
> > 
> > 1) The dev_pm_opp_set_rate() API changes may break something that's 
> > relying on the rate rounding that OPP provides. If those exist,
> > we may need to implement another API that is more explicit about using
> > the clk API instead of the OPP tables.
> > 
> > 2) dev_pm_opp_set_rate(0) is an interesting solution to the problem of
> > dropping the rate requirement. Is there anything better than this?
> > 
> > 3) How do we handle devices that already have power-domains specified in
> > DT? The opp binding for required-opps doesn't let us specify the power
> > domain to target, instead it assumes that whatever power domain is
> > attached to a device is the one that OPP needs to use to change the
> > genpd performance state. Do we need a
> > dev_pm_opp_set_required_opps_name() or something to be explicit about
> > this? Can we have some way for the power domain that required-opps
> > correspond to be expressed in the OPP tables themselves?
> > 
> > 4) How do we achieve the "full constraint" state? i.e. what do we do
> > about some devices being active and others being inactive during boot
> > and making sure that the voltage for the shared power domains doesn't
> > drop until all devices requiring it have informed OPP about their
> > power requirements?
> > 

Any comments on these topics?
Ulf Hansson Feb. 7, 2019, 1:37 p.m. UTC | #8
On Thu, 7 Feb 2019 at 08:58, Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Viresh Kumar (2019-01-31 01:23:49)
> > Adding few folks to the thread who might be interested in this stuff.
> >
> > On 28-01-19, 17:55, Stephen Boyd wrote:
> > > This patch series is an RFC around how we can implement DVFS for devices
> > > that aren't your typical OPPish device (i.e. GPU/CPU). They don't have a
> > > strict set of frequencies that they have been tested at to derive some
> > > operating performance point. Instead they have a coarser set of
> > > frequency max or 'fmax' OPPs that describe the maiximum frequency the
> > > device can operate at with a given voltage.
> > >
> > > The approach we take is to let the devm_pm_opp_set_rate() API accept 0
> > > as a valid frequency indicating the frequency isn't required anymore and
> > > to make the same API use the clk framework to round the frequency passed
> > > in instead of relying on the OPP table to specify each frequency that
> > > can be used. Once we have these two patches in place, we can use the OPP
> > > API to change clk rates instead of clk_set_rate() and use all the recent
> > > OPP enhancements that have been made around required-opps and genpd
> > > performance states to do DVFS for all devices.
> >
> > Generally speaking I am fine with such an approach but I am not sure
> > about what others would say on this as they had objections to using
> > OPP core for setting the rate itself.
> >
> > FWIW, I suggested exactly this solution sometime back [1]
> >
> > - Drivers need to use two API sets to change clock rate (OPP helpers)
> >   and enable/disable them (CLK framework helpers) and this leads us to
> >   exactly that combination. Is that acceptable ? It doesn't look great
> >   to me as well..
>
> Agreed. I don't think anyone thinks this looks great, but I'll argue
> that it's improving OPP for devices that already use it so that we can
> remove voltage requirements when their clk is off. Think about CPUs that
> are in their own clk domain where we want to remove the voltage
> requirement when those CPUs are offline, or a GPU that wants to remove
> its voltage requirement when it turns off clks. The combination is
> already out there, just OPP hasn't solved this problem.
>
> The only other plan I had was to implement another API like
> dev_pm_set_state() or something like that and have that do something
> like what the OPP rate API does right now. The main difference being
> that the argument to the function would be some opaque u64 that's
> converted by the bus/class/genpd attached to the device into whatever
> frequency/voltage/performance state is desired (and sequenced in the
> right order too). And then I was thinking that runtime PM or explicit
> dev_pm_set_state() calls would be used to tell this new layer that the
> device was going to a lower power mode with some other number (sub-kHz
> integer?) and have that be translated again into some
> frequency/voltage/performance state.
>
> Either way, each driver will have to change from using the clk APIs to
> changes rates to something else like one of these APIs, so I don't see a
> huge difference. Drivers will have to change.
>
> >
> > - Do we expect the callers will disable clk before calling
> >   opp-set-rate with 0 ? We should remove the regulator requirements as
> >   well along with perf-state.
>
> Yes, that's the plan. Problems come along with this though, like shared
> resource constraints and actually knowing the clk on/off state,
> frequency, voltage, etc. at boot time and making sure to keep those
> constraints satisfied during normal operation.
>
> >
> > - What about enabling/disabling clock as well from OPP framework. We
> >   can enable it on the very first call to opp-set-rate and disable
> >   when freq is 0. That will simplify the drivers as well.
>
> It works when those drivers aren't calling clk_disable() directly from
> some irq handler. I don't think that's very common, but in those cases
> we would probably want to allow drivers to quickly gate and ungate their
> clks but then defer the sleeping stuff (voltages and off chip clks) to
> the schedulable contexts. We'll still be left with a small number of
> drivers that want to only call clk_prepare() and clk_unprepare() from
> within OPP and keep calling clk_enable() and clk_disable() from their
> driver. So introduce different APIs for those drivers to indicate this
> to OPP? And only do that when it becomes a requirement?
>
> Otherwise I don't really see a problem with the OPP call handling the
> enable state of the clk as well.

I think we also need to consider cross SoC drivers. One SoC may have
both clocks and OPPs to manage, while another may have only clocks.

Even it this may be fairly uncommon, we should consider it, before we
decide to fold in additional clock management, like
clk_prepare|unprepare() for example, behind the dev_pm_opp_set_rate()
API.

The point is, the driver may need to call clk_prepare|enable()
anyways, unless we make that conditional depending on a DT compatible
string, for example. Of course, because the clock prepare/enable is
reference counted, there may not be a problem in practice to have both
the OPP and driver to deal with it.

>
> >
> > > One nice feature of this approach is that we don't need to change the
> > > OPP binding to support this. We can specify only the max frequencies and
> > > the voltage requirements in DT with the existing binding and slightly
> > > tweak the OPP code to achieve these results.
> > >
> > > This series includes a conversion of the uart and spi drivers on
> > > qcom sdm845 where these patches are being developed. It shows how a
> > > driver is converted from the clk APIs to the OPP APIs and how
> > > enable/disable state of the clk is communicated to the OPP layer.
> > >
> > > Some open topics and initial points for discussion are:
> > >
> > > 1) The dev_pm_opp_set_rate() API changes may break something that's
> > > relying on the rate rounding that OPP provides. If those exist,
> > > we may need to implement another API that is more explicit about using
> > > the clk API instead of the OPP tables.
> > >
> > > 2) dev_pm_opp_set_rate(0) is an interesting solution to the problem of
> > > dropping the rate requirement. Is there anything better than this?

This seems like a reasonable approach to me.

At least it seams silly to invent a separate API and I can't think of
anything else that makes sense.

> > >
> > > 3) How do we handle devices that already have power-domains specified in
> > > DT? The opp binding for required-opps doesn't let us specify the power
> > > domain to target, instead it assumes that whatever power domain is
> > > attached to a device is the one that OPP needs to use to change the
> > > genpd performance state. Do we need a
> > > dev_pm_opp_set_required_opps_name() or something to be explicit about
> > > this? Can we have some way for the power domain that required-opps
> > > correspond to be expressed in the OPP tables themselves?

Is the about the multiple PM domains per device? I thought we have
already covered this case, or at least part of it [1].

Doesn't dev_pm_opp_set_genpd_virt_dev() and friends, help you with this, no?

> > >
> > > 4) How do we achieve the "full constraint" state? i.e. what do we do
> > > about some devices being active and others being inactive during boot
> > > and making sure that the voltage for the shared power domains doesn't
> > > drop until all devices requiring it have informed OPP about their
> > > power requirements?

Good question. What kind of problems do you see.

Will drivers fail to probe? Will the HW operate outside valid conditions?

> > >
>
> Any comments on these topics?
>

Kind regards
Uffe

[1]
https://lkml.org/lkml/2018/10/25/204
Stephen Boyd Feb. 7, 2019, 7:47 p.m. UTC | #9
Quoting Rajendra Nayak (2019-02-06 22:57:12)
> 
> > 3) How do we handle devices that already have power-domains specified in
> > DT? The opp binding for required-opps doesn't let us specify the power
> > domain to target, instead it assumes that whatever power domain is
> > attached to a device is the one that OPP needs to use to change the
> > genpd performance state. Do we need a
> > dev_pm_opp_set_required_opps_name() or something to be explicit about
> > this? Can we have some way for the power domain that required-opps
> > correspond to be expressed in the OPP tables themselves?
> 
> I was converting a few more drivers to use the proposed approach in this
> RFC, in order to identify all outstanding issues we need to deal with,
> and specifically for UFS, I end up with this exact scenario where UFS already
> has an existing power domain (gdsc) and I need to add another one (rpmhpd) for
> setting the performance state.
> 
> If I use dev_pm_opp_of_add_table() to add the opp table from DT, the opp
> layer assumes its the same device on which it can do a dev_pm_genpd_set_performance_state()
> with, however the device that's actually associated with the pm_domain when we
> have multiple power domains is infact the one (dummy) that we create when
> the driver makes a call to dev_pm_domain_attach_by_name/id().
> 
> Any thoughts on whats a good way to handle this?
> 

Ulf mentioned that we can use dev_pm_opp_set_genpd_virt_dev() for this.
Does that API work here?
Rajendra Nayak Feb. 8, 2019, 4:39 a.m. UTC | #10
On 2/8/2019 1:17 AM, Stephen Boyd wrote:
> Quoting Rajendra Nayak (2019-02-06 22:57:12)
>>
>>> 3) How do we handle devices that already have power-domains specified in
>>> DT? The opp binding for required-opps doesn't let us specify the power
>>> domain to target, instead it assumes that whatever power domain is
>>> attached to a device is the one that OPP needs to use to change the
>>> genpd performance state. Do we need a
>>> dev_pm_opp_set_required_opps_name() or something to be explicit about
>>> this? Can we have some way for the power domain that required-opps
>>> correspond to be expressed in the OPP tables themselves?
>>
>> I was converting a few more drivers to use the proposed approach in this
>> RFC, in order to identify all outstanding issues we need to deal with,
>> and specifically for UFS, I end up with this exact scenario where UFS already
>> has an existing power domain (gdsc) and I need to add another one (rpmhpd) for
>> setting the performance state.
>>
>> If I use dev_pm_opp_of_add_table() to add the opp table from DT, the opp
>> layer assumes its the same device on which it can do a dev_pm_genpd_set_performance_state()
>> with, however the device that's actually associated with the pm_domain when we
>> have multiple power domains is infact the one (dummy) that we create when
>> the driver makes a call to dev_pm_domain_attach_by_name/id().
>>
>> Any thoughts on whats a good way to handle this?
>>
> 
> Ulf mentioned that we can use dev_pm_opp_set_genpd_virt_dev() for this.
> Does that API work here?

Ah, yes, that should work, I hadn't noticed this API existed.
Viresh Kumar Feb. 8, 2019, 7:14 a.m. UTC | #11
On 06-02-19, 23:58, Stephen Boyd wrote:
> Quoting Viresh Kumar (2019-01-31 01:23:49)
> > FWIW, I suggested exactly this solution sometime back [1]
> > 
> > - Drivers need to use two API sets to change clock rate (OPP helpers)
> >   and enable/disable them (CLK framework helpers) and this leads us to
> >   exactly that combination. Is that acceptable ? It doesn't look great
> >   to me as well..
> 
> Agreed. I don't think anyone thinks this looks great, but I'll argue
> that it's improving OPP for devices that already use it so that we can
> remove voltage requirements when their clk is off. Think about CPUs that
> are in their own clk domain where we want to remove the voltage
> requirement when those CPUs are offline, or a GPU that wants to remove
> its voltage requirement when it turns off clks. The combination is
> already out there, just OPP hasn't solved this problem.
> 
> The only other plan I had was to implement another API like
> dev_pm_set_state() or something like that and have that do something
> like what the OPP rate API does right now. The main difference being
> that the argument to the function would be some opaque u64 that's
> converted by the bus/class/genpd attached to the device into whatever
> frequency/voltage/performance state is desired (and sequenced in the
> right order too). And then I was thinking that runtime PM or explicit
> dev_pm_set_state() calls would be used to tell this new layer that the
> device was going to a lower power mode with some other number (sub-kHz
> integer?) and have that be translated again into some
> frequency/voltage/performance state.
> 
> Either way, each driver will have to change from using the clk APIs to
> changes rates to something else like one of these APIs, so I don't see a
> huge difference. Drivers will have to change.

I agree, that's why I wrote the dev_pm_opp_set_rate() API initially.

> > 
> > - Do we expect the callers will disable clk before calling
> >   opp-set-rate with 0 ? We should remove the regulator requirements as
> >   well along with perf-state.
> 
> Yes, that's the plan. Problems come along with this though, like shared
> resource constraints and actually knowing the clk on/off state,
> frequency, voltage, etc. at boot time and making sure to keep those
> constraints satisfied during normal operation.

But that isn't any different from drivers doing clk_disable directly,
right ? So that shouldn't worry us.

> > - What about enabling/disabling clock as well from OPP framework. We
> >   can enable it on the very first call to opp-set-rate and disable
> >   when freq is 0. That will simplify the drivers as well.
> 
> It works when those drivers aren't calling clk_disable() directly from
> some irq handler. I don't think that's very common, but in those cases
> we would probably want to allow drivers to quickly gate and ungate their
> clks but then defer the sleeping stuff (voltages and off chip clks) to
> the schedulable contexts. We'll still be left with a small number of
> drivers that want to only call clk_prepare() and clk_unprepare() from
> within OPP and keep calling clk_enable() and clk_disable() from their
> driver. So introduce different APIs for those drivers to indicate this
> to OPP? And only do that when it becomes a requirement?

I am not sure I understood this well. The reference counting within
clk/regulator should let both the layers (driver and opp core) work
just fine. Why would a driver don't want OPP core to call
clk_prepare_enable() all the time ?

> Otherwise I don't really see a problem with the OPP call handling the
> enable state of the clk as well.

Right, so I would like that to be part of this series when this gets
implemented.

> > > One nice feature of this approach is that we don't need to change the
> > > OPP binding to support this. We can specify only the max frequencies and
> > > the voltage requirements in DT with the existing binding and slightly
> > > tweak the OPP code to achieve these results. 
> > > 
> > > This series includes a conversion of the uart and spi drivers on
> > > qcom sdm845 where these patches are being developed. It shows how a
> > > driver is converted from the clk APIs to the OPP APIs and how
> > > enable/disable state of the clk is communicated to the OPP layer.
> > > 
> > > Some open topics and initial points for discussion are:
> > > 
> > > 1) The dev_pm_opp_set_rate() API changes may break something that's 
> > > relying on the rate rounding that OPP provides. If those exist,
> > > we may need to implement another API that is more explicit about using
> > > the clk API instead of the OPP tables.

I don't remember any such cases, I may have forgotten about those
though.

> > > 2) dev_pm_opp_set_rate(0) is an interesting solution to the problem of
> > > dropping the rate requirement. Is there anything better than this?

I am okay with it. I don't want to invent another set of APIs to
enable / disable the resources.

> > > 3) How do we handle devices that already have power-domains specified in
> > > DT? The opp binding for required-opps doesn't let us specify the power
> > > domain to target, instead it assumes that whatever power domain is
> > > attached to a device is the one that OPP needs to use to change the
> > > genpd performance state. Do we need a
> > > dev_pm_opp_set_required_opps_name() or something to be explicit about

Yeah, we may need to come up with something like this eventually. I
had written something like that earlier, but then it wasn't required.

> > > this? Can we have some way for the power domain that required-opps
> > > correspond to be expressed in the OPP tables themselves?

Not sure I understood it. Can you explain with some example please ?

> > > 4) How do we achieve the "full constraint" state? i.e. what do we do
> > > about some devices being active and others being inactive during boot
> > > and making sure that the voltage for the shared power domains doesn't
> > > drop until all devices requiring it have informed OPP about their
> > > power requirements?

We need the boot-constraint framework for that. I think this is a
problem which we have currently as well. I am waiting for the
bus-scaling framework to get in, after that we will have lot of cases
where boot-constraints would be required and it won't be limited to
just clcd then.
Viresh Kumar Feb. 8, 2019, 7:17 a.m. UTC | #12
On 07-02-19, 14:37, Ulf Hansson wrote:
> I think we also need to consider cross SoC drivers. One SoC may have
> both clocks and OPPs to manage, while another may have only clocks.

We already have that case with CPUs as well and dev_pm_opp_set_rate()
takes care of it.

> Even it this may be fairly uncommon, we should consider it, before we
> decide to fold in additional clock management, like
> clk_prepare|unprepare() for example, behind the dev_pm_opp_set_rate()
> API.
> 
> The point is, the driver may need to call clk_prepare|enable()
> anyways, unless we make that conditional depending on a DT compatible
> string, for example. Of course, because the clock prepare/enable is
> reference counted, there may not be a problem in practice to have both
> the OPP and driver to deal with it.
Ulf Hansson Feb. 8, 2019, 9:45 a.m. UTC | #13
On Fri, 8 Feb 2019 at 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 07-02-19, 14:37, Ulf Hansson wrote:
> > I think we also need to consider cross SoC drivers. One SoC may have
> > both clocks and OPPs to manage, while another may have only clocks.
>
> We already have that case with CPUs as well and dev_pm_opp_set_rate()
> takes care of it.

I think you may have misunderstood my point. Or maybe I don't get yours. :-)

What if there is no OPP at all to use, then dev_pm_opp_set_rate() is
just a noop, right? In this scenario the driver still need to call
clk_set_rate().

How do we cope with these cases?

>
> > Even it this may be fairly uncommon, we should consider it, before we
> > decide to fold in additional clock management, like
> > clk_prepare|unprepare() for example, behind the dev_pm_opp_set_rate()
> > API.
> >
> > The point is, the driver may need to call clk_prepare|enable()
> > anyways, unless we make that conditional depending on a DT compatible
> > string, for example. Of course, because the clock prepare/enable is
> > reference counted, there may not be a problem in practice to have both
> > the OPP and driver to deal with it.
>

Kind regards
Uffe
Viresh Kumar Feb. 8, 2019, 10:05 a.m. UTC | #14
On 08-02-19, 10:45, Ulf Hansson wrote:
> On Fri, 8 Feb 2019 at 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 07-02-19, 14:37, Ulf Hansson wrote:
> > > I think we also need to consider cross SoC drivers. One SoC may have
> > > both clocks and OPPs to manage, while another may have only clocks.
> >
> > We already have that case with CPUs as well and dev_pm_opp_set_rate()
> > takes care of it.
> 
> I think you may have misunderstood my point. Or maybe I don't get yours. :-)

It was me. I thought you are talking about regulators and that is what
is already managed, i.e. to work with or without regulators.

> What if there is no OPP at all to use, then dev_pm_opp_set_rate() is
> just a noop, right? In this scenario the driver still need to call
> clk_set_rate().
> 
> How do we cope with these cases?

Yeah, that would be a problem and hacking the OPP core may not be the
right solution :(
Ulf Hansson Feb. 8, 2019, 10:31 a.m. UTC | #15
On Fri, 8 Feb 2019 at 11:05, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 08-02-19, 10:45, Ulf Hansson wrote:
> > On Fri, 8 Feb 2019 at 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 07-02-19, 14:37, Ulf Hansson wrote:
> > > > I think we also need to consider cross SoC drivers. One SoC may have
> > > > both clocks and OPPs to manage, while another may have only clocks.
> > >
> > > We already have that case with CPUs as well and dev_pm_opp_set_rate()
> > > takes care of it.
> >
> > I think you may have misunderstood my point. Or maybe I don't get yours. :-)
>
> It was me. I thought you are talking about regulators and that is what
> is already managed, i.e. to work with or without regulators.
>
> > What if there is no OPP at all to use, then dev_pm_opp_set_rate() is
> > just a noop, right? In this scenario the driver still need to call
> > clk_set_rate().
> >
> > How do we cope with these cases?
>
> Yeah, that would be a problem and hacking the OPP core may not be the
> right solution :(

I guess one simple way forward could just be to check if there is an
OPP handle/table available, then use dev_pm_opp_set_rate(). When no
OPP handle/table, use clk_set_rate() *instead*, not both.

That could work, don't you think?

Kind regards
Uffe
Viresh Kumar Feb. 8, 2019, 10:33 a.m. UTC | #16
On 08-02-19, 11:31, Ulf Hansson wrote:
> On Fri, 8 Feb 2019 at 11:05, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 08-02-19, 10:45, Ulf Hansson wrote:
> > > On Fri, 8 Feb 2019 at 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > On 07-02-19, 14:37, Ulf Hansson wrote:
> > > > > I think we also need to consider cross SoC drivers. One SoC may have
> > > > > both clocks and OPPs to manage, while another may have only clocks.
> > > >
> > > > We already have that case with CPUs as well and dev_pm_opp_set_rate()
> > > > takes care of it.
> > >
> > > I think you may have misunderstood my point. Or maybe I don't get yours. :-)
> >
> > It was me. I thought you are talking about regulators and that is what
> > is already managed, i.e. to work with or without regulators.
> >
> > > What if there is no OPP at all to use, then dev_pm_opp_set_rate() is
> > > just a noop, right? In this scenario the driver still need to call
> > > clk_set_rate().
> > >
> > > How do we cope with these cases?
> >
> > Yeah, that would be a problem and hacking the OPP core may not be the
> > right solution :(
> 
> I guess one simple way forward could just be to check if there is an
> OPP handle/table available, then use dev_pm_opp_set_rate(). When no
> OPP handle/table, use clk_set_rate() *instead*, not both.
> 
> That could work, don't you think?

Yeah, just that it adds more conditional code in drivers, while we
wanted to make them light-weight :)