[0/8] cpufreq: Auto-register with energy model

Message ID	cover.1628579170.git.viresh.kumar@linaro.org (mailing list archive)
Headers
Return-Path: <linux-omap-owner@kernel.org>
From: Viresh Kumar <viresh.kumar@linaro.org>
To: Rafael Wysocki <rjw@rjwysocki.net>, Vincent Donnefort <vincent.donnefort@arm.com>, lukasz.luba@arm.com, Andy Gross <agross@kernel.org>, Bjorn Andersson <bjorn.andersson@linaro.org>, Cristian Marussi <cristian.marussi@arm.com>, Fabio Estevam <festevam@gmail.com>, Kevin Hilman <khilman@kernel.org>, Matthias Brugger <matthias.bgg@gmail.com>, NXP Linux Team <linux-imx@nxp.com>, Pengutronix Kernel Team <kernel@pengutronix.de>, Sascha Hauer <s.hauer@pengutronix.de>, Shawn Guo <shawnguo@kernel.org>, Sudeep Holla <sudeep.holla@arm.com>, Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org, Vincent Guittot <vincent.guittot@linaro.org>, linux-arm-kernel@lists.infradead.org, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org, linux-omap@vger.kernel.org
Subject: [PATCH 0/8] cpufreq: Auto-register with energy model
Date: Tue, 10 Aug 2021 13:06:47 +0530
Message-Id: <cover.1628579170.git.viresh.kumar@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Viresh Kumar Aug. 10, 2021, 7:36 a.m. UTC

Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
with the EM core on their behalf. This allows us to get rid of duplicated code
in the drivers and fix the unregistration part as well, which none of the
drivers have done until now.

This also makes registration with the EM core happen only after the policy
is fully initialized, so the EM core can do other work from there, such as
marking frequencies as inefficient (WIP). This patchset is useful even without
that work and should be merged nevertheless.

This doesn't update the scmi cpufreq driver for now, as it is a special case and
needs to be handled differently. We can make it work with this if required.

This has been build- and boot-tested by the bot on a couple of boards.

https://gitlab.com/vireshk/pmko/-/pipelines/350674298

--
Viresh

Viresh Kumar (8):
  cpufreq: Auto-register with energy model if asked
  cpufreq: dt: Use auto-registration for energy model
  cpufreq: imx6q: Use auto-registration for energy model
  cpufreq: mediatek: Use auto-registration for energy model
  cpufreq: omap: Use auto-registration for energy model
  cpufreq: qcom-cpufreq-hw: Use auto-registration for energy model
  cpufreq: scpi: Use auto-registration for energy model
  cpufreq: vexpress: Use auto-registration for energy model

 drivers/cpufreq/cpufreq-dt.c           | 5 ++---
 drivers/cpufreq/cpufreq.c              | 9 +++++++++
 drivers/cpufreq/imx6q-cpufreq.c        | 4 ++--
 drivers/cpufreq/mediatek-cpufreq.c     | 5 ++---
 drivers/cpufreq/omap-cpufreq.c         | 4 ++--
 drivers/cpufreq/qcom-cpufreq-hw.c      | 5 ++---
 drivers/cpufreq/scpi-cpufreq.c         | 5 ++---
 drivers/cpufreq/vexpress-spc-cpufreq.c | 5 ++---
 include/linux/cpufreq.h                | 6 ++++++
 9 files changed, 29 insertions(+), 19 deletions(-)
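
The flag mechanism described in the cover letter can be sketched in
user-space C as follows. This is only an illustration, not the code from
the series: the flag name DRV_FLAG_REGISTER_EM, the mock_* types and the
mock_* functions are all hypothetical stand-ins, and the real cpufreq core
and EM interfaces differ. The point it shows is ordering: the core, not the
driver, performs the registration, and only once the policy is marked
initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real name and value come from the series. */
#define DRV_FLAG_REGISTER_EM (1u << 0)

struct mock_policy {
    bool initialized;   /* set once the core has finished policy setup */
    bool em_registered; /* set once an energy model has been registered */
};

struct mock_driver {
    unsigned int flags;
    int (*init)(struct mock_policy *policy); /* optional driver hook */
};

/* Hypothetical stand-in for an EM registration helper. */
static void mock_register_em(struct mock_policy *policy)
{
    /* Registration is only meaningful on a fully initialized policy. */
    assert(policy->initialized);
    policy->em_registered = true;
}

/* Hypothetical stand-in for the core's policy bring-up path. */
static int mock_core_online(const struct mock_driver *drv,
                            struct mock_policy *policy)
{
    int ret = drv->init ? drv->init(policy) : 0;

    if (ret)
        return ret;

    policy->initialized = true;

    /* The core registers on the driver's behalf only if asked to. */
    if (drv->flags & DRV_FLAG_REGISTER_EM)
        mock_register_em(policy);

    return 0;
}

/* Hypothetical teardown path: undo what the core itself registered. */
static void mock_core_offline(const struct mock_driver *drv,
                              struct mock_policy *policy)
{
    if (drv->flags & DRV_FLAG_REGISTER_EM)
        policy->em_registered = false;
    policy->initialized = false;
}
```

A driver that needs a custom registration would simply leave the flag
clear, in which case the mock core above never touches the EM state.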

Lukasz Luba Aug. 10, 2021, 9:17 a.m. UTC | #1

Hi Viresh,

I like the idea, only small comments here in the cover letter.

On 8/10/21 8:36 AM, Viresh Kumar wrote:
> Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
> with the EM core on their behalf. This allows us to get rid of duplicated code
> in the drivers and fix the unregistration part as well, which none of the
> drivers have done until now.

The EM is never freed for CPUs by design. The unregister function was
introduced for devfreq devices.

> 
> This would also make the registration with EM core to happen only after policy
> is fully initialized, and the EM core can do other stuff from in there, like
> marking frequencies as inefficient (WIP). Though this patchset is useful without
> that work being done and should be merged nevertheless.
> 
> This doesn't update scmi cpufreq driver for now as it is a special case and need
> to be handled differently. Though we can make it work with this if required.

The scmi cpufreq driver uses the EM API directly, which provides flexibility
and should stay as is.

Let me review the patches.

Regards,
Lukasz

Viresh Kumar Aug. 10, 2021, 9:27 a.m. UTC | #2

On 10-08-21, 10:17, Lukasz Luba wrote:
> Hi Viresh,
> 
> I like the idea, only small comments here in the cover letter.
> 
> On 8/10/21 8:36 AM, Viresh Kumar wrote:
> > Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
> > with the EM core on their behalf. This allows us to get rid of duplicated code
> > in the drivers and fix the unregistration part as well, which none of the
> > drivers have done until now.
> 
> The EM is never freed for CPUs by design. The unregister function was
> introduced for devfreq devices.

I see. So if a cpufreq driver unregisters and registers again, it will
be required to use the entries created by the earlier registration,
right? Technically speaking, it is better to unregister and free any
related resources and parse everything again.

Let's say, just for fun, I want to test two copies of a cpufreq driver
(providing different sets of freq-tables). I build both of them as
modules, insert the first version, remove it, insert the second one.
Ideally, this should just work as expected. But I don't think it will
in this case, as you never parse the EM stuff again.

Again, since the routine is there already, I think it is better/fine
to just use it.

> > This would also make the registration with EM core to happen only after policy
> > is fully initialized, and the EM core can do other stuff from in there, like
> > marking frequencies as inefficient (WIP). Though this patchset is useful without
> > that work being done and should be merged nevertheless.
> > 
> > This doesn't update scmi cpufreq driver for now as it is a special case and need
> > to be handled differently. Though we can make it work with this if required.
> 
> The scmi cpufreq driver uses direct EM API, which provides flexibility
> and should stay as is.

Right, so I left it as is for now.
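
The reload scenario above can be made concrete with a toy model. This is a
sketch under stated assumptions, not kernel code: struct toy_em and the
toy_em_* functions are hypothetical, and the assumption that a second
registration is rejected while an older table is still present is made
here purely for illustration. It shows why a table that is never
unregistered keeps describing the first driver version.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 4

/* Toy energy model: a frequency list that outlives the driver module. */
struct toy_em {
    int registered;
    size_t nr_states;
    unsigned long freq_khz[MAX_STATES];
};

/*
 * Register a table only if none exists yet.
 * Returns 0 on success, -1 if a table is already registered.
 */
static int toy_em_register(struct toy_em *em, const unsigned long *freqs,
                           size_t n)
{
    if (em->registered)
        return -1; /* the old table stays in place */

    assert(n <= MAX_STATES);
    for (size_t i = 0; i < n; i++)
        em->freq_khz[i] = freqs[i];
    em->nr_states = n;
    em->registered = 1;
    return 0;
}

/* Drop the table so that the next registration is parsed afresh. */
static void toy_em_unregister(struct toy_em *em)
{
    em->registered = 0;
    em->nr_states = 0;
}
```

In this model, inserting a second driver version with a different
frequency table only takes effect if toy_em_unregister() ran when the first
version was removed.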

Lukasz Luba Aug. 10, 2021, 9:35 a.m. UTC | #3

On 8/10/21 10:27 AM, Viresh Kumar wrote:
> On 10-08-21, 10:17, Lukasz Luba wrote:
>> Hi Viresh,
>>
>> I like the idea, only small comments here in the cover letter.
>>
>> On 8/10/21 8:36 AM, Viresh Kumar wrote:
>>> Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
>>> with the EM core on their behalf. This allows us to get rid of duplicated code
>>> in the drivers and fix the unregistration part as well, which none of the
>>> drivers have done until now.
>>
>> The EM is never freed for CPUs by design. The unregister function was
>> introduced for devfreq devices.
> 
> I see. So if a cpufreq driver unregisters and registers again, it will
> be required to use the entries created by the registration itself,
> right ? Technically speaking, it is better to unregister and free any
> related resources and parse everything again.
> 
> Lets say, just for fun, I want to test two copies of a cpufreq driver

It's good that it's just for fun ;)

> (providing different set of freq-tables). I build both of them as
> modules, insert the first version, remove it, insert the second one.
> Ideally, this should just work as expected. But I don't think it will
> in this case as you never parse the EM stuff again.

The EM is used directly by the scheduler in the hot path, and there are no
checks there, not even whether the EM is for CPUs. We are sure it is for
CPUs and that it is always there for all CPUs.

I'm currently working on an EM v2 which would have stronger mechanisms
and do a better job in this field. The patches are under internal review
and will hopefully be ready to post by the end of the month.

> 
> Again, since the routine is there already, I think it is better/fine
> to just use it.

True, it does no harm, so I commented on patch 1/8 that it
could stay.

> 
>>> This would also make the registration with EM core to happen only after policy
>>> is fully initialized, and the EM core can do other stuff from in there, like
>>> marking frequencies as inefficient (WIP). Though this patchset is useful without
>>> that work being done and should be merged nevertheless.
>>>
>>> This doesn't update scmi cpufreq driver for now as it is a special case and need
>>> to be handled differently. Though we can make it work with this if required.
>>
>> The scmi cpufreq driver uses direct EM API, which provides flexibility
>> and should stay as is.
> 
> Right, so I left it as is for now.
>

Quentin Perret Aug. 10, 2021, 12:35 p.m. UTC | #4

On Tuesday 10 Aug 2021 at 13:06:47 (+0530), Viresh Kumar wrote:
> Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
> with the EM core on their behalf.

Hmm, that's not quite what this does. This asks the cpufreq core to
use *PM_OPP* to register an EM, which I think is kinda wrong to do from
there IMO. The decision to use PM_OPP or another mechanism to register
an EM belongs to platform specific code (drivers), so it is odd for the
PM_OPP registration to have its own cpufreq flag but not the other ways.

As mentioned in another thread, the very reason to have PM_EM is to not
depend on PM_OPP, so I'm worried about the direction of travel with this
series TBH.

> This allows us to get rid of duplicated code
> in the drivers and fix the unregistration part as well, which none of the
> drivers have done until now.

This series adds more code than it removes, and the unregistration is
not a fix as we don't ever remove the EM tables by design, so not sure
either of these points are valid arguments.

> This would also make the registration with EM core to happen only after policy
> is fully initialized, and the EM core can do other stuff from in there, like
> marking frequencies as inefficient (WIP). Though this patchset is useful without
> that work being done and should be merged nevertheless.
> 
> This doesn't update scmi cpufreq driver for now as it is a special case and need
> to be handled differently. Though we can make it work with this if required.

Note that we'll have more 'special cases' if other architectures start
using PM_EM, which is what we have been trying to allow since the
beginning, so that's worth keeping in mind.

Thanks,
Quentin

Lukasz Luba Aug. 10, 2021, 1:25 p.m. UTC | #5

On 8/10/21 1:35 PM, Quentin Perret wrote:
> On Tuesday 10 Aug 2021 at 13:06:47 (+0530), Viresh Kumar wrote:
>> Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
>> with the EM core on their behalf.
> 
> Hmm, that's not quite what this does. This asks the cpufreq core to
> use *PM_OPP* to register an EM, which I think is kinda wrong to do from
> there IMO. The decision to use PM_OPP or another mechanism to register
> an EM belongs to platform specific code (drivers), so it is odd for the
> PM_OPP registration to have its own cpufreq flag but not the other ways.
> 
> As mentioned in another thread, the very reason to have PM_EM is to not
> depend on PM_OPP, so I'm worried about the direction of travel with this
> series TBH.
> 
>> This allows us to get rid of duplicated code
>> in the drivers and fix the unregistration part as well, which none of the
>> drivers have done until now.
> 
> This series adds more code than it removes, and the unregistration is
> not a fix as we don't ever remove the EM tables by design, so not sure
> either of these points are valid arguments.
> 
>> This would also make the registration with EM core to happen only after policy
>> is fully initialized, and the EM core can do other stuff from in there, like
>> marking frequencies as inefficient (WIP). Though this patchset is useful without
>> that work being done and should be merged nevertheless.
>>
>> This doesn't update scmi cpufreq driver for now as it is a special case and need
>> to be handled differently. Though we can make it work with this if required.
> 
> Note that we'll have more 'special cases' if other architectures start
> using PM_EM, which is what we have been trying to allow since the
> beginning, so that's worth keeping in mind.
> 

The way I see this is that the flag in cpufreq avoids
mistakes potentially made by driver developers. It will automatically
register the *simple* EM model via dev_pm_opp_of_register_em() on behalf
of drivers (which is already done manually by drivers). The developer
would just set the flag, similarly to CPUFREQ_IS_COOLING_DEV, and be sure
it will register at the right time. A well-tested flag approach should be
safer and easier to understand and maintain.

If there is a need for an *advanced* EM model, the driver developer would
have to care about all these things (order, setup-ready-structures,
fw channels, freeing, etc.) while developing custom registration.
The developer won't set this flag in such a case, so the core won't
try to auto-register the EM for that driver.

I don't see the dependency of PM_EM on PM_OPP in this series.

Quentin Perret Aug. 10, 2021, 1:53 p.m. UTC | #6

On Tuesday 10 Aug 2021 at 14:25:15 (+0100), Lukasz Luba wrote:
> The way I see this is that the flag in cpufreq avoids
> mistakes potentially made by driver developer. It will automaticaly
> register the *simple* EM model via dev_pm_opp_of_register_em() on behalf
> of drivers (which is already done manually by drivers). The developer
> would just set the flag similarly to CPUFREQ_IS_COOLING_DEV and be sure
> it will register at the right time. Well tested flag approach should be
> safer, easier to understand, maintain.

I would agree with all that if calling dev_pm_opp_of_register_em() was
complicated, but that is not really the case. I don't think we ever call
PM_OPP directly from cpufreq core ATM, which makes a lot of sense if you
consider PM_OPP arch-specific. I could understand that we might accept a
little 'violation' of the abstraction with this series if there were
real benefits, but I just don't see them.

Viresh Kumar Aug. 11, 2021, 5:18 a.m. UTC | #7

On 10-08-21, 13:35, Quentin Perret wrote:
> On Tuesday 10 Aug 2021 at 13:06:47 (+0530), Viresh Kumar wrote:
> > Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
> > with the EM core on their behalf.
> 
> Hmm, that's not quite what this does. This asks the cpufreq core to
> use *PM_OPP* to register an EM, which I think is kinda wrong to do from
> there IMO. The decision to use PM_OPP or another mechanism to register
> an EM belongs to platform specific code (drivers), so it is odd for the
> PM_OPP registration to have its own cpufreq flag but not the other ways.
> 
> As mentioned in another thread, the very reason to have PM_EM is to not
> depend on PM_OPP, so I'm worried about the direction of travel with this
> series TBH.

I had to use the pm-opp version, since almost everyone was using that.

On the other hand, there isn't a lot of OPP specific stuff in
dev_pm_opp_of_register_em(). It just uses dev_pm_opp_get_opp_count(),
that's all. This ended up in the OPP core, nothing else. Maybe we can
now move it back to the EM core and name it differently?

> > This allows us to get rid of duplicated code
> > in the drivers and fix the unregistration part as well, which none of the
> > drivers have done until now.
> 
> This series adds more code than it removes,

Sadly yes :(

> and the unregistration is
> not a fix as we don't ever remove the EM tables by design, so not sure
> either of these points are valid arguments.

I think that design needs to be looked over again; it looks broken to
me every time I land on this code. I wonder why we don't unregister
stuff.

Let's say I am working on the cpufreq driver and I want to test that
on my ARM machine. Rebooting a simpler board to test stuff out is
easy, but if I am working on an ARM server which is running lots of
other userspace stuff as well, I won't want to reboot the machine just
to test a different version of the driver. I would rather
build the driver as a module and insert/remove it again and again.

If the frequency table changes between versions, this just breaks,
as the EM won't be updated again.

This breaks one of the most basic rules of the Linux kernel. Inserting a
module should have exactly the same final behavior every single time.
This model doesn't guarantee it. It simply looks broken.

> > This would also make the registration with EM core to happen only after policy
> > is fully initialized, and the EM core can do other stuff from in there, like
> > marking frequencies as inefficient (WIP). Though this patchset is useful without
> > that work being done and should be merged nevertheless.
> > 
> > This doesn't update scmi cpufreq driver for now as it is a special case and need
> > to be handled differently. Though we can make it work with this if required.
> 
> Note that we'll have more 'special cases' if other architectures start
> using PM_EM, which is what we have been trying to allow since the
> beginning, so that's worth keeping in mind.

Yes, we need to take care of all such special cases as well.

Viresh Kumar Aug. 11, 2021, 5:34 a.m. UTC | #8

On 11-08-21, 10:48, Viresh Kumar wrote:
> On 10-08-21, 13:35, Quentin Perret wrote:
> > This series adds more code than it removes,
> 
> Sadly yes :(
> 
> > and the unregistration is
> > not a fix as we don't ever remove the EM tables by design, so not sure
> > either of these points are valid arguments.
> 
> I think that design needs to be looked over again, it looks broken to
> me everytime I land onto this code. I wonder why we don't unregister
> stuff.

Coming back to this series. We have two options, based on what I
proposed here:

https://lore.kernel.org/linux-pm/20210811050327.3yxrk4kqxjjwaztx@vireshk-i7/

1. Let cpufreq core register with EM on behalf of cpufreq drivers.

2. Update drivers to use ->ready() callback to do this stuff.

I am fine with both :)

Quentin Perret Aug. 11, 2021, 8:37 a.m. UTC | #9

On Wednesday 11 Aug 2021 at 10:48:59 (+0530), Viresh Kumar wrote:
> On 10-08-21, 13:35, Quentin Perret wrote:
> > On Tuesday 10 Aug 2021 at 13:06:47 (+0530), Viresh Kumar wrote:
> > > Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
> > > with the EM core on their behalf.
> > 
> > Hmm, that's not quite what this does. This asks the cpufreq core to
> > use *PM_OPP* to register an EM, which I think is kinda wrong to do from
> > there IMO. The decision to use PM_OPP or another mechanism to register
> > an EM belongs to platform specific code (drivers), so it is odd for the
> > PM_OPP registration to have its own cpufreq flag but not the other ways.
> > 
> > As mentioned in another thread, the very reason to have PM_EM is to not
> > depend on PM_OPP, so I'm worried about the direction of travel with this
> > series TBH.
> 
> I had to use the pm-opp version, since almost everyone was using that.
> 
> On the other hand, there isn't a lot of OPP specific stuff in
> dev_pm_opp_of_register_em(). It just uses dev_pm_opp_get_opp_count(),
> that's all. This ended up in the OPP core, nothing else. Maybe we can
> now move it back to the EM core and name it differently ?

Well it also uses dev_pm_opp_find_freq_ceil() and
dev_pm_opp_get_voltage(), so not sure how easy it will be to move, but
if it is possible no objection from me.

> > > This allows us to get rid of duplicated code
> > > in the drivers and fix the unregistration part as well, which none of the
> > > drivers have done until now.
> > 
> > This series adds more code than it removes,
> 
> Sadly yes :(
> 
> > and the unregistration is
> > not a fix as we don't ever remove the EM tables by design, so not sure
> > either of these points are valid arguments.
> 
> I think that design needs to be looked over again, it looks broken to
> me everytime I land onto this code. I wonder why we don't unregister
> stuff.
> 
> Lets say, I am working on the cpufreq driver and I want to test that
> on my ARM machine. Rebooting a simpler board to test stuff out is
> easy, but if I am working on an ARM server which is running lots of
> other userspace stuff as well, I won't want to reboot the machine just
> to test a different versions of the driver. I will rather want to
> build the driver as module and insert/remove it again and again.
> 
> If the frequency table changes in between versions, this just breaks
> as EM won't be updated again.
> 
> This breaks one of the most basic rules of Linux Kernel. Inserting a
> module should have exactly the same final behavior every single time.
> This model doesn't guarantee it. It simply looks broken.

Right but the EM is a description of the hardware, so it seemed fair
to assume this wouldn't change across the lifetime of the OS, similar
to the DT which we can't reload at run-time. Yes it can be a little odd
if you load/unload your driver module, but note that you generally can't
load two completely different drivers on a single system. You'll just
load the same one again and the hardware hasn't changed in the meantime,
so the previously loaded EM will still be correct. I hear your argument
about cpufreq driver development, but the locking involved to allow
'just' that is pretty involved, and nobody has complained about this
specific issue so far, so that didn't seem worth it. If we do have good
reasons to change the EM at runtime, then yes I think we should do it,
it just didn't seem like that was the case until now.

Viresh Kumar Aug. 11, 2021, 9:13 a.m. UTC | #10

On 11-08-21, 09:37, Quentin Perret wrote:
> On Wednesday 11 Aug 2021 at 10:48:59 (+0530), Viresh Kumar wrote:
> > I had to use the pm-opp version, since almost everyone was using that.
> > 
> > On the other hand, there isn't a lot of OPP specific stuff in
> > dev_pm_opp_of_register_em(). It just uses dev_pm_opp_get_opp_count(),
> > that's all. This ended up in the OPP core, nothing else. Maybe we can
> > now move it back to the EM core and name it differently ?
> 
> Well it also uses dev_pm_opp_find_freq_ceil() and
> dev_pm_opp_get_voltage(), so not sure how easy it will be to move, but
> if it is possible no objection from me.

What uses these routines? dev_pm_opp_of_register_em()? I am not able
to see that, at least :(

> Right but the EM is a description of the hardware, so it seemed fair
> to assume this wouldn't change across the lifetime of the OS, similar
> to the DT which we can't reload at run-time. Yes it can be a little odd
> if you load/unload your driver module, but note that you generally can't
> load two completely different drivers on a single system. You'll just
> load the same one again and the hardware hasn't changed in the meantime,
> so the previously loaded EM will still be correct.

Yeah, it will be the same driver but a different version of it, which
may have updated the freq table. For me the EM is attached to the
freq-table, and the freq-table is not available anymore after the
driver is gone.

Anyway, I will leave that for you guys to decide :)

> I hear your argument
> about cpufreq driver development, but the locking involved to allow
> 'just' that is pretty involved, and nobody has complained about this
> specific issue so far, so that didn't seem worth it. If we do have good
> reasons to change the EM at runtime, then yes I think we should do it,
> it just didn't seem like that was the case until now.

Quentin Perret Aug. 11, 2021, 9:34 a.m. UTC | #11

On Wednesday 11 Aug 2021 at 14:43:21 (+0530), Viresh Kumar wrote:
> On 11-08-21, 09:37, Quentin Perret wrote:
> > On Wednesday 11 Aug 2021 at 10:48:59 (+0530), Viresh Kumar wrote:
> > > I had to use the pm-opp version, since almost everyone was using that.
> > > 
> > > On the other hand, there isn't a lot of OPP specific stuff in
> > > dev_pm_opp_of_register_em(). It just uses dev_pm_opp_get_opp_count(),
> > > that's all. This ended up in the OPP core, nothing else. Maybe we can
> > > now move it back to the EM core and name it differently ?
> > 
> > Well it also uses dev_pm_opp_find_freq_ceil() and
> > dev_pm_opp_get_voltage(), so not sure how easy it will be to move, but
> > if it is possible no objection from me.
> 
> What uses these routines ? dev_pm_opp_of_register_em() ? I am not able
> to see that at least :(

Yep, it's not immediately obvious, but see how it sets the struct
em_data_callback to point at _get_power() where the actual energy
calculation is done. So strictly speaking _get_power() is what uses
these routines, but it goes in hand with dev_pm_opp_of_register_em() so
I guess the same reasoning applies.

> > Right but the EM is a description of the hardware, so it seemed fair
> > to assume this wouldn't change across the lifetime of the OS, similar
> > to the DT which we can't reload at run-time. Yes it can be a little odd
> > if you load/unload your driver module, but note that you generally can't
> > load two completely different drivers on a single system. You'll just
> > load the same one again and the hardware hasn't changed in the meantime,
> > so the previously loaded EM will still be correct.
> 
> Yeah, it will be the same driver but a different version of it, which
> may have updated the freq table. For me the EM is attached to the
> freq-table, and the freq-table is not available anymore after the
> driver is gone.
> 
> Anyway, I will leave that for you guys to decide :)

IIUC Lukasz is working on something that should allow changing the EM at
run-time, so hopefully it'll enable this use-case as well, but we'll see :)
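
The callback indirection discussed in this message (a registration helper
that stores a callback, with the frequency and voltage lookups happening
inside that callback) can be sketched as below. Everything here is a
hypothetical user-space model: the toy_* names are invented, the lookup
helper only mimics a "find the lowest entry at or above this frequency"
search, and the power estimate assumes the common dynamic-power relation
P = C * f * V^2 with illustrative units rather than the kernel's exact
arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy operating point: frequency in kHz and voltage in mV. */
struct toy_opp {
    unsigned long freq_khz;
    unsigned long volt_mv;
};

/* Toy device: a table sorted by ascending frequency plus a coefficient. */
struct toy_dev {
    const struct toy_opp *opps;
    size_t nr_opps;
    uint64_t coeff; /* capacitance-like coefficient, illustrative units */
};

/* Callback the toy EM core uses to ask for power at a frequency. */
typedef int (*toy_active_power_fn)(const struct toy_dev *dev,
                                   unsigned long *freq_khz, uint64_t *power);

struct toy_em_cb {
    toy_active_power_fn active_power;
};

/* Lowest operating point whose frequency is >= freq_khz, or NULL. */
static const struct toy_opp *toy_find_freq_ceil(const struct toy_dev *dev,
                                                unsigned long freq_khz)
{
    for (size_t i = 0; i < dev->nr_opps; i++)
        if (dev->opps[i].freq_khz >= freq_khz)
            return &dev->opps[i];
    return NULL;
}

/* power = coeff * f[MHz] * V[mV]^2 / 1e9; rounds *freq_khz up to the entry. */
static int toy_get_power(const struct toy_dev *dev, unsigned long *freq_khz,
                         uint64_t *power)
{
    const struct toy_opp *opp = toy_find_freq_ceil(dev, *freq_khz);

    if (!opp)
        return -1;

    uint64_t mhz = opp->freq_khz / 1000;

    *power = dev->coeff * mhz * opp->volt_mv * opp->volt_mv / 1000000000ULL;
    *freq_khz = opp->freq_khz;
    return 0;
}

/* The registration path itself only counts entries and stores the callback. */
static size_t toy_register_em(const struct toy_dev *dev, struct toy_em_cb *cb)
{
    cb->active_power = toy_get_power;
    return dev->nr_opps;
}
```

Reading toy_register_em() alone shows only the entry count; the dependency
on the table lookups becomes visible once the stored callback is followed.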

Viresh Kumar Aug. 11, 2021, 9:36 a.m. UTC | #12

On 11-08-21, 10:34, Quentin Perret wrote:
> Yep, it's not immediately obvious, but see how it sets the struct
> em_data_callback to point at _get_power() where the actual energy
> calculation is done. So strictly speaking _get_power() is what uses
> these routines, but it goes in hand with dev_pm_opp_of_register_em() so
> I guess the same reasoning applies.

My bad.

Quentin Perret Aug. 11, 2021, 9:48 a.m. UTC | #13

On Wednesday 11 Aug 2021 at 11:04:06 (+0530), Viresh Kumar wrote:
> On 11-08-21, 10:48, Viresh Kumar wrote:
> > On 10-08-21, 13:35, Quentin Perret wrote:
> > > This series adds more code than it removes,
> > 
> > Sadly yes :(
> > 
> > > and the unregistration is
> > > not a fix as we don't ever remove the EM tables by design, so not sure
> > > either of these points are valid arguments.
> > 
> > I think that design needs to be looked over again, it looks broken to
> > me everytime I land onto this code. I wonder why we don't unregister
> > stuff.
> 
> Coming back to this series. We have two options, based on what I
> proposed here:
> 
> https://lore.kernel.org/linux-pm/20210811050327.3yxrk4kqxjjwaztx@vireshk-i7/
> 
> 1. Let cpufreq core register with EM on behalf of cpufreq drivers.

If we're going that route, I think we should allow _all_ possible
EM registration methods (via PM_OPP or else) to be done that way.
Otherwise we're creating an inconsistency in how the EM is registered
(e.g. from the ->init() cpufreq callback for some, or from cpufreq core
for others) which is problematic as we risk building features that
assume loading is done at a certain time, which won't work for some
platforms.

> 2. Update drivers to use ->ready() callback to do this stuff.

I think this should work, but perhaps will be a bit tricky for cpufreq
driver developers as they need to have a pretty good understanding of
the stack to know that they should do the registration from here and not
->init() for instance. Suggested alternative: we introduce a ->register_em()
callback to cpufreq_driver, and turn dev_pm_opp_of_register_em() into a
valid handler for this callback. This should 'document' things a bit
better, avoid some of the problems your other series tried to address, and
allow us to call the EM registration in exactly the right place from
cpufreq core. On the plus side, we could easily make this work for e.g.
the SCMI driver which would only need to provide its own version of
->register_em().

Thoughts?
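
The ->register_em() alternative suggested above can be sketched in
user-space C. This is a hypothetical model only: the cb_* types, the enum,
and both handlers are invented for illustration, and the real callback
signature is for the eventual patches to define. It shows the core
invoking an optional driver-supplied callback at one fixed point after the
policy is ready, so a generic handler and a platform-specific one share the
same call site.

```c
#include <assert.h>
#include <stddef.h>

/* Which registration path populated the (toy) energy model. */
enum cb_em_source { CB_EM_NONE, CB_EM_FROM_OPP, CB_EM_FROM_FIRMWARE };

struct cb_policy {
    int ready;                /* nonzero once the core finished setup */
    enum cb_em_source em_src; /* set by whichever handler ran */
};

struct cb_driver {
    int (*init)(struct cb_policy *policy);         /* optional */
    void (*register_em)(struct cb_policy *policy); /* optional */
};

/* Hypothetical generic handler, standing in for an OPP-based helper. */
static void generic_opp_register_em(struct cb_policy *policy)
{
    assert(policy->ready); /* must not run before the policy is ready */
    policy->em_src = CB_EM_FROM_OPP;
}

/* Hypothetical platform handler, standing in for a firmware-backed driver. */
static void firmware_register_em(struct cb_policy *policy)
{
    assert(policy->ready);
    policy->em_src = CB_EM_FROM_FIRMWARE;
}

/* The core decides when registration happens; the driver decides how. */
static int cb_core_online(const struct cb_driver *drv,
                          struct cb_policy *policy)
{
    int ret = drv->init ? drv->init(policy) : 0;

    if (ret)
        return ret;

    policy->ready = 1;

    if (drv->register_em)
        drv->register_em(policy);

    return 0;
}
```

Compared with a flag, the callback keeps the timing in the core while
leaving the choice of registration method with the driver.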

Viresh Kumar Aug. 11, 2021, 9:53 a.m. UTC | #14

On 11-08-21, 10:48, Quentin Perret wrote:
> I think this should work, but perhaps will be a bit tricky for cpufreq
> driver developers as they need to have a pretty good understanding of
> the stack to know that they should do the registration from here and not
> ->init() for instance. Suggested alternative: we introduce a ->register_em()
> callback to cpufreq_driver, and turn dev_pm_opp_of_register_em() into a
> valid handler for this callback. This should 'document' things a bit
> better, avoid some of the problems your other series tried to achieve, and
> allow us to call the EM registration in exactly the right place from
> cpufreq core. On the plus side, we could easily make this work for e.g.
> the SCMI driver which would only need to provide its own version of
> ->register_em().
> 
> Thoughts?

I had exactly the same thing in mind, but was thinking of two
callbacks, to register and unregister. But yeah, we aren't going to
unregister for now at least :)

I wasn't sure if that should be done or not, since we also have
ready() callback. So was reluctant to suggest it earlier. But that can
work well as well.

Quentin Perret Aug. 11, 2021, 10:12 a.m. UTC | #15

On Wednesday 11 Aug 2021 at 15:23:11 (+0530), Viresh Kumar wrote:
> On 11-08-21, 10:48, Quentin Perret wrote:
> > I think this should work, but perhaps will be a bit tricky for cpufreq
> > driver developers as they need to have a pretty good understanding of
> > the stack to know that they should do the registration from here and not
> > ->init() for instance. Suggested alternative: we introduce a ->register_em()
> > callback to cpufreq_driver, and turn dev_pm_opp_of_register_em() into a
> > valid handler for this callback. This should 'document' things a bit
> > better, avoid some of the problems your other series tried to achieve, and
> > allow us to call the EM registration in exactly the right place from
> > cpufreq core. On the plus side, we could easily make this work for e.g.
> > the SCMI driver which would only need to provide its own version of
> > ->register_em().
> > 
> > Thoughts?
> 
> I had exactly the same thing in mind, but was thinking of two
> callbacks, to register and unregister. But yeah, we aren't going to
> register for now at least :)

Ack, we probably want both once we unregister things.

> I wasn't sure if that should be done or not, since we also have
> ready() callback. So was reluctant to suggest it earlier. But that can
> work well as well.

I think using the ready() callback can work just fine as long as we
document clearly it is important to register the EM from there and not
anywhere else. The dedicated register_em() callback makes that a bit
clearer and should avoid a bit of boilerplate in the driver, but it's
not a big deal really, so I'm happy either way ;)

Viresh Kumar Aug. 11, 2021, 10:14 a.m. UTC | #16

On 11-08-21, 11:12, Quentin Perret wrote:
> I think using the ready() callback can work just fine as long as we
> document clearly it is important to register the EM from there and not
> anywhere else. The dedicated em_register() callback makes that a bit
> clearer and should avoid a bit of boilerplate in the driver, but it's
> not a big deal really, so I'm happy either way ;)

Yeah, I think just the same. It is better to have register_em as a
separate call. I was just wondering if it is the right choice :)

Anyway, I think ready() will get removed pretty soon, so register_em()
will work well. I will redo this series and send it.
