diff mbox series

[v8,4/8] PM / EM: add support for other devices than CPUs in Energy Model

Message ID 20200527095854.21714-5-lukasz.luba@arm.com (mailing list archive)
State New, archived
Headers show
Series Add support for devices in the Energy Model | expand

Commit Message

Lukasz Luba May 27, 2020, 9:58 a.m. UTC
Add support for other devices than CPUs. The registration function
does not require a valid cpumask pointer and is ready to handle new
devices. Some of the internal structures has been reorganized in order to
keep consistent view (like removing per_cpu pd pointers).

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/device.h       |   5 +
 include/linux/energy_model.h |  29 ++++-
 kernel/power/energy_model.c  | 239 ++++++++++++++++++++++++-----------
 3 files changed, 189 insertions(+), 84 deletions(-)

Comments

Daniel Lezcano June 1, 2020, 9:44 p.m. UTC | #1
On 27/05/2020 11:58, Lukasz Luba wrote:
> Add support for other devices than CPUs. The registration function
> does not require a valid cpumask pointer and is ready to handle new
> devices. Some of the internal structures has been reorganized in order to
> keep consistent view (like removing per_cpu pd pointers).
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---

[ ... ]

>  }
>  EXPORT_SYMBOL_GPL(em_register_perf_domain);
> +
> +/**
> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
> + * @dev		: Device for which the EM is registered
> + *
> + * Try to unregister the EM for the specified device (but not a CPU).
> + */
> +void em_dev_unregister_perf_domain(struct device *dev)
> +{
> +	if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> +		return;
> +
> +	if (_is_cpu_device(dev))
> +		return;
> +
> +	mutex_lock(&em_pd_mutex);

Is the mutex really needed?

If this function is called that means there is no more user of the
em_pd, no?

> +	em_debug_remove_pd(dev);
> +
> +	kfree(dev->em_pd->table);
> +	kfree(dev->em_pd);
> +	dev->em_pd = NULL;
> +	mutex_unlock(&em_pd_mutex);
> +}
> +EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
>
Lukasz Luba June 2, 2020, 11:31 a.m. UTC | #2
Hi Daniel,

On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> On 27/05/2020 11:58, Lukasz Luba wrote:
>> Add support for other devices than CPUs. The registration function
>> does not require a valid cpumask pointer and is ready to handle new
>> devices. Some of the internal structures has been reorganized in order to
>> keep consistent view (like removing per_cpu pd pointers).
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
> 
> [ ... ]
> 
>>   }
>>   EXPORT_SYMBOL_GPL(em_register_perf_domain);
>> +
>> +/**
>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
>> + * @dev		: Device for which the EM is registered
>> + *
>> + * Try to unregister the EM for the specified device (but not a CPU).
>> + */
>> +void em_dev_unregister_perf_domain(struct device *dev)
>> +{
>> +	if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
>> +		return;
>> +
>> +	if (_is_cpu_device(dev))
>> +		return;
>> +
>> +	mutex_lock(&em_pd_mutex);
> 
> Is the mutex really needed?

I just wanted to align this unregister code with register. Since there
is debugfs dir lookup and the device's EM existence checks I thought it
wouldn't harm just to lock for a while and make sure the registration
path is not used. These two paths shouldn't affect each other, but with
modules loading/unloading I wanted to play safe.
I can change it maybe to just dmb() and the end of the function if it's
a big performance problem in this unloading path. What do you think?

> 
> If this function is called that means there is no more user of the
> em_pd, no?

True, that EM users should already be unregistered i.e. thermal cooling.

> 
>> +	em_debug_remove_pd(dev);
>> +
>> +	kfree(dev->em_pd->table);
>> +	kfree(dev->em_pd);
>> +	dev->em_pd = NULL;
>> +	mutex_unlock(&em_pd_mutex);
>> +}
>> +EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
>>
> 
> 

Thank you for reviewing this.

Regards,
Lukasz
Rafael J. Wysocki June 3, 2020, 3:13 p.m. UTC | #3
On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Daniel,
>
> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> > On 27/05/2020 11:58, Lukasz Luba wrote:
> >> Add support for other devices than CPUs. The registration function
> >> does not require a valid cpumask pointer and is ready to handle new
> >> devices. Some of the internal structures has been reorganized in order to
> >> keep consistent view (like removing per_cpu pd pointers).
> >>
> >> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >> ---
> >
> > [ ... ]
> >
> >>   }
> >>   EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >> +
> >> +/**
> >> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
> >> + * @dev             : Device for which the EM is registered
> >> + *
> >> + * Try to unregister the EM for the specified device (but not a CPU).
> >> + */
> >> +void em_dev_unregister_perf_domain(struct device *dev)
> >> +{
> >> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >> +            return;
> >> +
> >> +    if (_is_cpu_device(dev))
> >> +            return;
> >> +
> >> +    mutex_lock(&em_pd_mutex);
> >
> > Is the mutex really needed?
>
> I just wanted to align this unregister code with register. Since there
> is debugfs dir lookup and the device's EM existence checks I thought it
> wouldn't harm just to lock for a while and make sure the registration
> path is not used. These two paths shouldn't affect each other, but with
> modules loading/unloading I wanted to play safe.
>
> I can change it maybe to just dmb() and the end of the function if it's
> a big performance problem in this unloading path. What do you think?

I would rather leave the mutex locking as is.

However, the question to ask is what exactly may go wrong without that
locking in place?  Is there any specific race condition that you are
concerned about?
Lukasz Luba June 3, 2020, 3:25 p.m. UTC | #4
On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> Hi Daniel,
>>
>> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
>>> On 27/05/2020 11:58, Lukasz Luba wrote:
>>>> Add support for other devices than CPUs. The registration function
>>>> does not require a valid cpumask pointer and is ready to handle new
>>>> devices. Some of the internal structures has been reorganized in order to
>>>> keep consistent view (like removing per_cpu pd pointers).
>>>>
>>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>>>> ---
>>>
>>> [ ... ]
>>>
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(em_register_perf_domain);
>>>> +
>>>> +/**
>>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
>>>> + * @dev             : Device for which the EM is registered
>>>> + *
>>>> + * Try to unregister the EM for the specified device (but not a CPU).
>>>> + */
>>>> +void em_dev_unregister_perf_domain(struct device *dev)
>>>> +{
>>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
>>>> +            return;
>>>> +
>>>> +    if (_is_cpu_device(dev))
>>>> +            return;
>>>> +
>>>> +    mutex_lock(&em_pd_mutex);
>>>
>>> Is the mutex really needed?
>>
>> I just wanted to align this unregister code with register. Since there
>> is debugfs dir lookup and the device's EM existence checks I thought it
>> wouldn't harm just to lock for a while and make sure the registration
>> path is not used. These two paths shouldn't affect each other, but with
>> modules loading/unloading I wanted to play safe.
>>
>> I can change it maybe to just dmb() and the end of the function if it's
>> a big performance problem in this unloading path. What do you think?
> 
> I would rather leave the mutex locking as is.
> 
> However, the question to ask is what exactly may go wrong without that
> locking in place?  Is there any specific race condition that you are
> concerned about?
> 

I tried to test this with module loading & unloading with panfrost
driver and CPU hotplug (which should bail out quickly) and was OK.
I don't see any particular race. I don't too much about the
debugfs code, though. That's why I tried to protect from some
scripts/services which try to re-load the driver.

Apart from that, maybe just this dev->em = NULL to be populated to all
CPUs, which mutex_unlock synchronizes for free here.

Regards,
Lukasz
Rafael J. Wysocki June 3, 2020, 3:40 p.m. UTC | #5
On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> > On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> Hi Daniel,
> >>
> >> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> >>> On 27/05/2020 11:58, Lukasz Luba wrote:
> >>>> Add support for other devices than CPUs. The registration function
> >>>> does not require a valid cpumask pointer and is ready to handle new
> >>>> devices. Some of the internal structures has been reorganized in order to
> >>>> keep consistent view (like removing per_cpu pd pointers).
> >>>>
> >>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >>>> ---
> >>>
> >>> [ ... ]
> >>>
> >>>>    }
> >>>>    EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >>>> +
> >>>> +/**
> >>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
> >>>> + * @dev             : Device for which the EM is registered
> >>>> + *
> >>>> + * Try to unregister the EM for the specified device (but not a CPU).
> >>>> + */
> >>>> +void em_dev_unregister_perf_domain(struct device *dev)
> >>>> +{
> >>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >>>> +            return;
> >>>> +
> >>>> +    if (_is_cpu_device(dev))
> >>>> +            return;
> >>>> +
> >>>> +    mutex_lock(&em_pd_mutex);
> >>>
> >>> Is the mutex really needed?
> >>
> >> I just wanted to align this unregister code with register. Since there
> >> is debugfs dir lookup and the device's EM existence checks I thought it
> >> wouldn't harm just to lock for a while and make sure the registration
> >> path is not used. These two paths shouldn't affect each other, but with
> >> modules loading/unloading I wanted to play safe.
> >>
> >> I can change it maybe to just dmb() and the end of the function if it's
> >> a big performance problem in this unloading path. What do you think?
> >
> > I would rather leave the mutex locking as is.
> >
> > However, the question to ask is what exactly may go wrong without that
> > locking in place?  Is there any specific race condition that you are
> > concerned about?
> >
>
> I tried to test this with module loading & unloading with panfrost
> driver and CPU hotplug (which should bail out quickly) and was OK.
> I don't see any particular race. I don't too much about the
> debugfs code, though. That's why I tried to protect from some
> scripts/services which try to re-load the driver.
>
> Apart from that, maybe just this dev->em = NULL to be populated to all
> CPUs, which mutex_unlock synchronizes for free here.

If it may run concurrently with the registration for the same device,
the locking is necessary, but in that case the !dev->em_pd check needs
to go under the mutex too IMO, or you may end up leaking the pd if the
registration can run between that check and the point at which the
mutex is taken.

Apart from this your kerneldoc comments might me improved IMO.

First of all, you can use @dev inside of a kerneldoc (if @dev
represents an argument of the documented function) and that will
produce the right output automatically.

Second, it is better to avoid saying things like "Try to unregister
..." in kerneldoc comments (the "Try to" part is redundant).  Simply
say "Unregister ..." instead.

Thanks!
Lukasz Luba June 3, 2020, 4:12 p.m. UTC | #6
On 6/3/20 4:40 PM, Rafael J. Wysocki wrote:
> On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
>>> On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>> Hi Daniel,
>>>>
>>>> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
>>>>> On 27/05/2020 11:58, Lukasz Luba wrote:
>>>>>> Add support for other devices than CPUs. The registration function
>>>>>> does not require a valid cpumask pointer and is ready to handle new
>>>>>> devices. Some of the internal structures has been reorganized in order to
>>>>>> keep consistent view (like removing per_cpu pd pointers).
>>>>>>
>>>>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>>>>>> ---
>>>>>
>>>>> [ ... ]
>>>>>
>>>>>>     }
>>>>>>     EXPORT_SYMBOL_GPL(em_register_perf_domain);
>>>>>> +
>>>>>> +/**
>>>>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
>>>>>> + * @dev             : Device for which the EM is registered
>>>>>> + *
>>>>>> + * Try to unregister the EM for the specified device (but not a CPU).
>>>>>> + */
>>>>>> +void em_dev_unregister_perf_domain(struct device *dev)
>>>>>> +{
>>>>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
>>>>>> +            return;
>>>>>> +
>>>>>> +    if (_is_cpu_device(dev))
>>>>>> +            return;
>>>>>> +
>>>>>> +    mutex_lock(&em_pd_mutex);
>>>>>
>>>>> Is the mutex really needed?
>>>>
>>>> I just wanted to align this unregister code with register. Since there
>>>> is debugfs dir lookup and the device's EM existence checks I thought it
>>>> wouldn't harm just to lock for a while and make sure the registration
>>>> path is not used. These two paths shouldn't affect each other, but with
>>>> modules loading/unloading I wanted to play safe.
>>>>
>>>> I can change it maybe to just dmb() and the end of the function if it's
>>>> a big performance problem in this unloading path. What do you think?
>>>
>>> I would rather leave the mutex locking as is.
>>>
>>> However, the question to ask is what exactly may go wrong without that
>>> locking in place?  Is there any specific race condition that you are
>>> concerned about?
>>>
>>
>> I tried to test this with module loading & unloading with panfrost
>> driver and CPU hotplug (which should bail out quickly) and was OK.
>> I don't see any particular race. I don't too much about the
>> debugfs code, though. That's why I tried to protect from some
>> scripts/services which try to re-load the driver.
>>
>> Apart from that, maybe just this dev->em = NULL to be populated to all
>> CPUs, which mutex_unlock synchronizes for free here.
> 
> If it may run concurrently with the registration for the same device,
> the locking is necessary, but in that case the !dev->em_pd check needs
> to go under the mutex too IMO, or you may end up leaking the pd if the
> registration can run between that check and the point at which the
> mutex is taken.

They don't run concurrently for the same device and users of that EM are
already gone.
I just wanted to be sure that everything is cleaned and synced properly.
Here is some example of the directories under
/sys/kernel/debug/energy_model
cpu0, cpu4, gpu, dsp, etc

The only worry that I had was the debugfs dir name, which is a
string from dev_name() and will be the same for the next registration
if module is re-loaded. So the 'name' is reused and debugfs_create_dir()
and debugfs_remove_recursive() uses this fsnotify, but they are
operating under inode_lock/unlock() on the parent dir 'energy_model'.
Then there is also this debugfs_lookup() which is slightly different.

That's why I put a mutex to separate all registration and unregistration
for all devices.
It should work without the mutex in unregister path, but I think it does
not harm to take it just in case and also have the CPU variable sync for
free.


> 
> Apart from this your kerneldoc comments might me improved IMO.
> 
> First of all, you can use @dev inside of a kerneldoc (if @dev
> represents an argument of the documented function) and that will
> produce the right output automatically.

OK

> 
> Second, it is better to avoid saying things like "Try to unregister
> ..." in kerneldoc comments (the "Try to" part is redundant).  Simply
> say "Unregister ..." instead.

Good point, thanks, I will use "Unregister ..." then.

> 
> Thanks!
> 

Shell I send a 'resend patch' which changes these @dev and
'unregister' comments?
Or wait for you to finish reviewing the other patches and send v9?
Rafael J. Wysocki June 3, 2020, 4:22 p.m. UTC | #7
On Wed, Jun 3, 2020 at 6:12 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 6/3/20 4:40 PM, Rafael J. Wysocki wrote:
> > On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >>
> >>
> >> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> >>> On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>> Hi Daniel,
> >>>>
> >>>> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> >>>>> On 27/05/2020 11:58, Lukasz Luba wrote:
> >>>>>> Add support for other devices than CPUs. The registration function
> >>>>>> does not require a valid cpumask pointer and is ready to handle new
> >>>>>> devices. Some of the internal structures has been reorganized in order to
> >>>>>> keep consistent view (like removing per_cpu pd pointers).
> >>>>>>
> >>>>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >>>>>> ---
> >>>>>
> >>>>> [ ... ]
> >>>>>
> >>>>>>     }
> >>>>>>     EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >>>>>> +
> >>>>>> +/**
> >>>>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
> >>>>>> + * @dev             : Device for which the EM is registered
> >>>>>> + *
> >>>>>> + * Try to unregister the EM for the specified device (but not a CPU).
> >>>>>> + */
> >>>>>> +void em_dev_unregister_perf_domain(struct device *dev)
> >>>>>> +{
> >>>>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >>>>>> +            return;
> >>>>>> +
> >>>>>> +    if (_is_cpu_device(dev))
> >>>>>> +            return;
> >>>>>> +
> >>>>>> +    mutex_lock(&em_pd_mutex);
> >>>>>
> >>>>> Is the mutex really needed?
> >>>>
> >>>> I just wanted to align this unregister code with register. Since there
> >>>> is debugfs dir lookup and the device's EM existence checks I thought it
> >>>> wouldn't harm just to lock for a while and make sure the registration
> >>>> path is not used. These two paths shouldn't affect each other, but with
> >>>> modules loading/unloading I wanted to play safe.
> >>>>
> >>>> I can change it maybe to just dmb() and the end of the function if it's
> >>>> a big performance problem in this unloading path. What do you think?
> >>>
> >>> I would rather leave the mutex locking as is.
> >>>
> >>> However, the question to ask is what exactly may go wrong without that
> >>> locking in place?  Is there any specific race condition that you are
> >>> concerned about?
> >>>
> >>
> >> I tried to test this with module loading & unloading with panfrost
> >> driver and CPU hotplug (which should bail out quickly) and was OK.
> >> I don't see any particular race. I don't too much about the
> >> debugfs code, though. That's why I tried to protect from some
> >> scripts/services which try to re-load the driver.
> >>
> >> Apart from that, maybe just this dev->em = NULL to be populated to all
> >> CPUs, which mutex_unlock synchronizes for free here.
> >
> > If it may run concurrently with the registration for the same device,
> > the locking is necessary, but in that case the !dev->em_pd check needs
> > to go under the mutex too IMO, or you may end up leaking the pd if the
> > registration can run between that check and the point at which the
> > mutex is taken.
>
> They don't run concurrently for the same device and users of that EM are
> already gone.
> I just wanted to be sure that everything is cleaned and synced properly.
> Here is some example of the directories under
> /sys/kernel/debug/energy_model
> cpu0, cpu4, gpu, dsp, etc
>
> The only worry that I had was the debugfs dir name, which is a
> string from dev_name() and will be the same for the next registration
> if module is re-loaded.

OK, so that needs to be explained in a comment.

> So the 'name' is reused and debugfs_create_dir()
> and debugfs_remove_recursive() uses this fsnotify, but they are
> operating under inode_lock/unlock() on the parent dir 'energy_model'.
> Then there is also this debugfs_lookup() which is slightly different.
>
> That's why I put a mutex to separate all registration and unregistration
> for all devices.
> It should work without the mutex in unregister path, but I think it does
> not harm to take

Well, fair enough, but I still think that the !dev->em_pd check should
be done under the mutex or it will be confusing.

> it just in case and also have the CPU variable sync for free.

I'm not sure what you mean by the last part here?

> >
> > Apart from this your kerneldoc comments might me improved IMO.
> >
> > First of all, you can use @dev inside of a kerneldoc (if @dev
> > represents an argument of the documented function) and that will
> > produce the right output automatically.
>
> OK
>
> >
> > Second, it is better to avoid saying things like "Try to unregister
> > ..." in kerneldoc comments (the "Try to" part is redundant).  Simply
> > say "Unregister ..." instead.
>
> Good point, thanks, I will use "Unregister ..." then.
>
> >
> > Thanks!
> >
>
> Shell I send a 'resend patch' which changes these @dev and
> 'unregister' comments?

Yes, please, but see the comments above too.

> Or wait for you to finish reviewing the other patches and send v9?

That is not necessary, unless you want to make kerneldoc improvements
in the other patches.

Thanks!
Lukasz Luba June 3, 2020, 4:45 p.m. UTC | #8
On 6/3/20 5:22 PM, Rafael J. Wysocki wrote:
> On Wed, Jun 3, 2020 at 6:12 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 6/3/20 4:40 PM, Rafael J. Wysocki wrote:
>>> On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
>>>>> On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
>>>>>>> On 27/05/2020 11:58, Lukasz Luba wrote:
>>>>>>>> Add support for other devices than CPUs. The registration function
>>>>>>>> does not require a valid cpumask pointer and is ready to handle new
>>>>>>>> devices. Some of the internal structures has been reorganized in order to
>>>>>>>> keep consistent view (like removing per_cpu pd pointers).
>>>>>>>>
>>>>>>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> [ ... ]
>>>>>>>
>>>>>>>>      }
>>>>>>>>      EXPORT_SYMBOL_GPL(em_register_perf_domain);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
>>>>>>>> + * @dev             : Device for which the EM is registered
>>>>>>>> + *
>>>>>>>> + * Try to unregister the EM for the specified device (but not a CPU).
>>>>>>>> + */
>>>>>>>> +void em_dev_unregister_perf_domain(struct device *dev)
>>>>>>>> +{
>>>>>>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
>>>>>>>> +            return;
>>>>>>>> +
>>>>>>>> +    if (_is_cpu_device(dev))
>>>>>>>> +            return;
>>>>>>>> +
>>>>>>>> +    mutex_lock(&em_pd_mutex);
>>>>>>>
>>>>>>> Is the mutex really needed?
>>>>>>
>>>>>> I just wanted to align this unregister code with register. Since there
>>>>>> is debugfs dir lookup and the device's EM existence checks I thought it
>>>>>> wouldn't harm just to lock for a while and make sure the registration
>>>>>> path is not used. These two paths shouldn't affect each other, but with
>>>>>> modules loading/unloading I wanted to play safe.
>>>>>>
>>>>>> I can change it maybe to just dmb() and the end of the function if it's
>>>>>> a big performance problem in this unloading path. What do you think?
>>>>>
>>>>> I would rather leave the mutex locking as is.
>>>>>
>>>>> However, the question to ask is what exactly may go wrong without that
>>>>> locking in place?  Is there any specific race condition that you are
>>>>> concerned about?
>>>>>
>>>>
>>>> I tried to test this with module loading & unloading with panfrost
>>>> driver and CPU hotplug (which should bail out quickly) and was OK.
>>>> I don't see any particular race. I don't too much about the
>>>> debugfs code, though. That's why I tried to protect from some
>>>> scripts/services which try to re-load the driver.
>>>>
>>>> Apart from that, maybe just this dev->em = NULL to be populated to all
>>>> CPUs, which mutex_unlock synchronizes for free here.
>>>
>>> If it may run concurrently with the registration for the same device,
>>> the locking is necessary, but in that case the !dev->em_pd check needs
>>> to go under the mutex too IMO, or you may end up leaking the pd if the
>>> registration can run between that check and the point at which the
>>> mutex is taken.
>>
>> They don't run concurrently for the same device and users of that EM are
>> already gone.
>> I just wanted to be sure that everything is cleaned and synced properly.
>> Here is some example of the directories under
>> /sys/kernel/debug/energy_model
>> cpu0, cpu4, gpu, dsp, etc
>>
>> The only worry that I had was the debugfs dir name, which is a
>> string from dev_name() and will be the same for the next registration
>> if module is re-loaded.
> 
> OK, so that needs to be explained in a comment.

OK, I will add it.

> 
>> So the 'name' is reused and debugfs_create_dir()
>> and debugfs_remove_recursive() uses this fsnotify, but they are
>> operating under inode_lock/unlock() on the parent dir 'energy_model'.
>> Then there is also this debugfs_lookup() which is slightly different.
>>
>> That's why I put a mutex to separate all registration and unregistration
>> for all devices.
>> It should work without the mutex in unregister path, but I think it does
>> not harm to take
> 
> Well, fair enough, but I still think that the !dev->em_pd check should
> be done under the mutex or it will be confusing.
> 
>> it just in case and also have the CPU variable sync for free.
> 
> I'm not sure what you mean by the last part here?

The mutex_unlock for me also means dmb() took place. ARM has slightly
different memory model than x86 and I just wanted to be sure that
this new values reach memory and become visible to other cores.
mutex_unlock just guaranties this for me.

> 
>>>
>>> Apart from this your kerneldoc comments might me improved IMO.
>>>
>>> First of all, you can use @dev inside of a kerneldoc (if @dev
>>> represents an argument of the documented function) and that will
>>> produce the right output automatically.
>>
>> OK
>>
>>>
>>> Second, it is better to avoid saying things like "Try to unregister
>>> ..." in kerneldoc comments (the "Try to" part is redundant).  Simply
>>> say "Unregister ..." instead.
>>
>> Good point, thanks, I will use "Unregister ..." then.
>>
>>>
>>> Thanks!
>>>
>>
>> Shell I send a 'resend patch' which changes these @dev and
>> 'unregister' comments?
> 
> Yes, please, but see the comments above too.

Saw it

> 
>> Or wait for you to finish reviewing the other patches and send v9?
> 
> That is not necessary, unless you want to make kerneldoc improvements
> in the other patches.

I will check them, but if they are OKish then I will just resend this
one.


Thank you for the review.

Regards,
Lukasz
Dan Carpenter June 8, 2020, 11:51 a.m. UTC | #9
Hi Lukasz,

I love your patch! Perhaps something to improve:

url:    https://github.com/0day-ci/linux/commits/Lukasz-Luba/Add-support-for-devices-in-the-Energy-Model/20200527-180614
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

config: i386-randconfig-m021-20200605 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
kernel/power/energy_model.c:316 em_dev_register_perf_domain() error: we previously assumed 'dev->em_pd' could be null (see line 277)

# https://github.com/0day-ci/linux/commit/110d050cb7ba1c96e63ada498979d1fd99529be2
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 110d050cb7ba1c96e63ada498979d1fd99529be2
vim +316 kernel/power/energy_model.c

0e294e607adaf3 Lukasz Luba     2020-05-27  262  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
110d050cb7ba1c Lukasz Luba     2020-05-27  263  				struct em_data_callback *cb, cpumask_t *cpus)
27871f7a8a341e Quentin Perret  2018-12-03  264  {
27871f7a8a341e Quentin Perret  2018-12-03  265  	unsigned long cap, prev_cap = 0;
110d050cb7ba1c Lukasz Luba     2020-05-27  266  	int cpu, ret;
27871f7a8a341e Quentin Perret  2018-12-03  267  
110d050cb7ba1c Lukasz Luba     2020-05-27  268  	if (!dev || !nr_states || !cb)
27871f7a8a341e Quentin Perret  2018-12-03  269  		return -EINVAL;
27871f7a8a341e Quentin Perret  2018-12-03  270  
27871f7a8a341e Quentin Perret  2018-12-03  271  	/*
27871f7a8a341e Quentin Perret  2018-12-03  272  	 * Use a mutex to serialize the registration of performance domains and
27871f7a8a341e Quentin Perret  2018-12-03  273  	 * let the driver-defined callback functions sleep.
27871f7a8a341e Quentin Perret  2018-12-03  274  	 */
27871f7a8a341e Quentin Perret  2018-12-03  275  	mutex_lock(&em_pd_mutex);
27871f7a8a341e Quentin Perret  2018-12-03  276  
110d050cb7ba1c Lukasz Luba     2020-05-27 @277  	if (dev->em_pd) {
                                                            ^^^^^^^^^^
Check for NULL.

27871f7a8a341e Quentin Perret  2018-12-03  278  		ret = -EEXIST;
27871f7a8a341e Quentin Perret  2018-12-03  279  		goto unlock;
27871f7a8a341e Quentin Perret  2018-12-03  280  	}
27871f7a8a341e Quentin Perret  2018-12-03  281  
110d050cb7ba1c Lukasz Luba     2020-05-27  282  	if (_is_cpu_device(dev)) {
110d050cb7ba1c Lukasz Luba     2020-05-27  283  		if (!cpus) {
110d050cb7ba1c Lukasz Luba     2020-05-27  284  			dev_err(dev, "EM: invalid CPU mask\n");
110d050cb7ba1c Lukasz Luba     2020-05-27  285  			ret = -EINVAL;
110d050cb7ba1c Lukasz Luba     2020-05-27  286  			goto unlock;
110d050cb7ba1c Lukasz Luba     2020-05-27  287  		}
110d050cb7ba1c Lukasz Luba     2020-05-27  288  
110d050cb7ba1c Lukasz Luba     2020-05-27  289  		for_each_cpu(cpu, cpus) {
110d050cb7ba1c Lukasz Luba     2020-05-27  290  			if (em_cpu_get(cpu)) {
110d050cb7ba1c Lukasz Luba     2020-05-27  291  				dev_err(dev, "EM: exists for CPU%d\n", cpu);
110d050cb7ba1c Lukasz Luba     2020-05-27  292  				ret = -EEXIST;
110d050cb7ba1c Lukasz Luba     2020-05-27  293  				goto unlock;
110d050cb7ba1c Lukasz Luba     2020-05-27  294  			}
27871f7a8a341e Quentin Perret  2018-12-03  295  			/*
110d050cb7ba1c Lukasz Luba     2020-05-27  296  			 * All CPUs of a domain must have the same
110d050cb7ba1c Lukasz Luba     2020-05-27  297  			 * micro-architecture since they all share the same
110d050cb7ba1c Lukasz Luba     2020-05-27  298  			 * table.
27871f7a8a341e Quentin Perret  2018-12-03  299  			 */
8ec59c0f5f4966 Vincent Guittot 2019-06-17  300  			cap = arch_scale_cpu_capacity(cpu);
27871f7a8a341e Quentin Perret  2018-12-03  301  			if (prev_cap && prev_cap != cap) {
110d050cb7ba1c Lukasz Luba     2020-05-27  302  				dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
110d050cb7ba1c Lukasz Luba     2020-05-27  303  					cpumask_pr_args(cpus));
110d050cb7ba1c Lukasz Luba     2020-05-27  304  
27871f7a8a341e Quentin Perret  2018-12-03  305  				ret = -EINVAL;
27871f7a8a341e Quentin Perret  2018-12-03  306  				goto unlock;
27871f7a8a341e Quentin Perret  2018-12-03  307  			}
27871f7a8a341e Quentin Perret  2018-12-03  308  			prev_cap = cap;
27871f7a8a341e Quentin Perret  2018-12-03  309  		}
110d050cb7ba1c Lukasz Luba     2020-05-27  310  	}
27871f7a8a341e Quentin Perret  2018-12-03  311  
110d050cb7ba1c Lukasz Luba     2020-05-27  312  	ret = em_create_pd(dev, nr_states, cb, cpus);


If it's a _is_cpu_device() then it iterates through a bunch of devices
and sets up cpu_dev->em_pd for each.  Presumably one of the devices is
"dev" or this would crash pretty early on in testing?

110d050cb7ba1c Lukasz Luba     2020-05-27  313  	if (ret)
27871f7a8a341e Quentin Perret  2018-12-03  314  		goto unlock;
27871f7a8a341e Quentin Perret  2018-12-03  315  
110d050cb7ba1c Lukasz Luba     2020-05-27 @316  	em_debug_create_pd(dev);
                                                                           ^^^
Dereferences dev->em_pd.

110d050cb7ba1c Lukasz Luba     2020-05-27  317  	dev_info(dev, "EM: created perf domain\n");
27871f7a8a341e Quentin Perret  2018-12-03  318  
27871f7a8a341e Quentin Perret  2018-12-03  319  unlock:
27871f7a8a341e Quentin Perret  2018-12-03  320  	mutex_unlock(&em_pd_mutex);
27871f7a8a341e Quentin Perret  2018-12-03  321  	return ret;
27871f7a8a341e Quentin Perret  2018-12-03  322  }
0e294e607adaf3 Lukasz Luba     2020-05-27  323  EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Lukasz Luba June 8, 2020, 12:34 p.m. UTC | #10
Hi Dan,

Thank you for your analyzes.

On 6/8/20 12:51 PM, Dan Carpenter wrote:
> Hi Lukasz,
> 
> I love your patch! Perhaps something to improve:
> 
> url:    https://github.com/0day-ci/linux/commits/Lukasz-Luba/Add-support-for-devices-in-the-Energy-Model/20200527-180614
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> 
> config: i386-randconfig-m021-20200605 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> smatch warnings:
> kernel/power/energy_model.c:316 em_dev_register_perf_domain() error: we previously assumed 'dev->em_pd' could be null (see line 277)
> 
> # https://github.com/0day-ci/linux/commit/110d050cb7ba1c96e63ada498979d1fd99529be2
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 110d050cb7ba1c96e63ada498979d1fd99529be2
> vim +316 kernel/power/energy_model.c
> 
> 0e294e607adaf3 Lukasz Luba     2020-05-27  262  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> 110d050cb7ba1c Lukasz Luba     2020-05-27  263  				struct em_data_callback *cb, cpumask_t *cpus)
> 27871f7a8a341e Quentin Perret  2018-12-03  264  {
> 27871f7a8a341e Quentin Perret  2018-12-03  265  	unsigned long cap, prev_cap = 0;
> 110d050cb7ba1c Lukasz Luba     2020-05-27  266  	int cpu, ret;
> 27871f7a8a341e Quentin Perret  2018-12-03  267
> 110d050cb7ba1c Lukasz Luba     2020-05-27  268  	if (!dev || !nr_states || !cb)
> 27871f7a8a341e Quentin Perret  2018-12-03  269  		return -EINVAL;
> 27871f7a8a341e Quentin Perret  2018-12-03  270
> 27871f7a8a341e Quentin Perret  2018-12-03  271  	/*
> 27871f7a8a341e Quentin Perret  2018-12-03  272  	 * Use a mutex to serialize the registration of performance domains and
> 27871f7a8a341e Quentin Perret  2018-12-03  273  	 * let the driver-defined callback functions sleep.
> 27871f7a8a341e Quentin Perret  2018-12-03  274  	 */
> 27871f7a8a341e Quentin Perret  2018-12-03  275  	mutex_lock(&em_pd_mutex);
> 27871f7a8a341e Quentin Perret  2018-12-03  276
> 110d050cb7ba1c Lukasz Luba     2020-05-27 @277  	if (dev->em_pd) {
>                                                              ^^^^^^^^^^
> Check for NULL.
> 
> 27871f7a8a341e Quentin Perret  2018-12-03  278  		ret = -EEXIST;
> 27871f7a8a341e Quentin Perret  2018-12-03  279  		goto unlock;
> 27871f7a8a341e Quentin Perret  2018-12-03  280  	}
> 27871f7a8a341e Quentin Perret  2018-12-03  281
> 110d050cb7ba1c Lukasz Luba     2020-05-27  282  	if (_is_cpu_device(dev)) {
> 110d050cb7ba1c Lukasz Luba     2020-05-27  283  		if (!cpus) {
> 110d050cb7ba1c Lukasz Luba     2020-05-27  284  			dev_err(dev, "EM: invalid CPU mask\n");
> 110d050cb7ba1c Lukasz Luba     2020-05-27  285  			ret = -EINVAL;
> 110d050cb7ba1c Lukasz Luba     2020-05-27  286  			goto unlock;
> 110d050cb7ba1c Lukasz Luba     2020-05-27  287  		}
> 110d050cb7ba1c Lukasz Luba     2020-05-27  288
> 110d050cb7ba1c Lukasz Luba     2020-05-27  289  		for_each_cpu(cpu, cpus) {
> 110d050cb7ba1c Lukasz Luba     2020-05-27  290  			if (em_cpu_get(cpu)) {
> 110d050cb7ba1c Lukasz Luba     2020-05-27  291  				dev_err(dev, "EM: exists for CPU%d\n", cpu);
> 110d050cb7ba1c Lukasz Luba     2020-05-27  292  				ret = -EEXIST;
> 110d050cb7ba1c Lukasz Luba     2020-05-27  293  				goto unlock;
> 110d050cb7ba1c Lukasz Luba     2020-05-27  294  			}
> 27871f7a8a341e Quentin Perret  2018-12-03  295  			/*
> 110d050cb7ba1c Lukasz Luba     2020-05-27  296  			 * All CPUs of a domain must have the same
> 110d050cb7ba1c Lukasz Luba     2020-05-27  297  			 * micro-architecture since they all share the same
> 110d050cb7ba1c Lukasz Luba     2020-05-27  298  			 * table.
> 27871f7a8a341e Quentin Perret  2018-12-03  299  			 */
> 8ec59c0f5f4966 Vincent Guittot 2019-06-17  300  			cap = arch_scale_cpu_capacity(cpu);
> 27871f7a8a341e Quentin Perret  2018-12-03  301  			if (prev_cap && prev_cap != cap) {
> 110d050cb7ba1c Lukasz Luba     2020-05-27  302  				dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
> 110d050cb7ba1c Lukasz Luba     2020-05-27  303  					cpumask_pr_args(cpus));
> 110d050cb7ba1c Lukasz Luba     2020-05-27  304
> 27871f7a8a341e Quentin Perret  2018-12-03  305  				ret = -EINVAL;
> 27871f7a8a341e Quentin Perret  2018-12-03  306  				goto unlock;
> 27871f7a8a341e Quentin Perret  2018-12-03  307  			}
> 27871f7a8a341e Quentin Perret  2018-12-03  308  			prev_cap = cap;
> 27871f7a8a341e Quentin Perret  2018-12-03  309  		}
> 110d050cb7ba1c Lukasz Luba     2020-05-27  310  	}
> 27871f7a8a341e Quentin Perret  2018-12-03  311
> 110d050cb7ba1c Lukasz Luba     2020-05-27  312  	ret = em_create_pd(dev, nr_states, cb, cpus);
> 
> 
> If it's a _is_cpu_device() then it iterates through a bunch of devices
> and sets up cpu_dev->em_pd for each.  Presumably one of the devices is
> "dev" or this would crash pretty early on in testing?

Yes, all of the devices taken from 'cpus' mask will get the em_pd set
including the suspected @dev.
To calm down this static analyzer I can remove the 'else'
in line 204 to make 'dev->em_pd = pd' set always.
199         if (_is_cpu_device(dev))
200                 for_each_cpu(cpu, cpus) {
201                         cpu_dev = get_cpu_device(cpu);
202                         cpu_dev->em_pd = pd;
203                 }
204         else
205                 dev->em_pd = pd;


Do you think it's a good solution and will work for this tool?

Regards,
Lukasz

> 
> 110d050cb7ba1c Lukasz Luba     2020-05-27  313  	if (ret)
> 27871f7a8a341e Quentin Perret  2018-12-03  314  		goto unlock;
> 27871f7a8a341e Quentin Perret  2018-12-03  315
> 110d050cb7ba1c Lukasz Luba     2020-05-27 @316  	em_debug_create_pd(dev);
>                                                                             ^^^
> Dereferences dev->em_pd.
> 
> 110d050cb7ba1c Lukasz Luba     2020-05-27  317  	dev_info(dev, "EM: created perf domain\n");
> 27871f7a8a341e Quentin Perret  2018-12-03  318
> 27871f7a8a341e Quentin Perret  2018-12-03  319  unlock:
> 27871f7a8a341e Quentin Perret  2018-12-03  320  	mutex_unlock(&em_pd_mutex);
> 27871f7a8a341e Quentin Perret  2018-12-03  321  	return ret;
> 27871f7a8a341e Quentin Perret  2018-12-03  322  }
> 0e294e607adaf3 Lukasz Luba     2020-05-27  323  EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>
Dan Carpenter June 8, 2020, 12:51 p.m. UTC | #11
On Mon, Jun 08, 2020 at 01:34:37PM +0100, Lukasz Luba wrote:
> Hi Dan,
> 
> Thank you for your analyzes.
> 
> On 6/8/20 12:51 PM, Dan Carpenter wrote:
> > Hi Lukasz,
> > 
> > I love your patch! Perhaps something to improve:
> > 
> > url:    https://github.com/0day-ci/linux/commits/Lukasz-Luba/Add-support-for-devices-in-the-Energy-Model/20200527-180614
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> > 
> > config: i386-randconfig-m021-20200605 (attached as .config)
> > compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
> > 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > 
> > smatch warnings:
> > kernel/power/energy_model.c:316 em_dev_register_perf_domain() error: we previously assumed 'dev->em_pd' could be null (see line 277)
> > 
> > # https://github.com/0day-ci/linux/commit/110d050cb7ba1c96e63ada498979d1fd99529be2
> > git remote add linux-review https://github.com/0day-ci/linux
> > git remote update linux-review
> > git checkout 110d050cb7ba1c96e63ada498979d1fd99529be2
> > vim +316 kernel/power/energy_model.c
> > 
> > 0e294e607adaf3 Lukasz Luba     2020-05-27  262  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  263  				struct em_data_callback *cb, cpumask_t *cpus)
> > 27871f7a8a341e Quentin Perret  2018-12-03  264  {
> > 27871f7a8a341e Quentin Perret  2018-12-03  265  	unsigned long cap, prev_cap = 0;
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  266  	int cpu, ret;
> > 27871f7a8a341e Quentin Perret  2018-12-03  267
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  268  	if (!dev || !nr_states || !cb)
> > 27871f7a8a341e Quentin Perret  2018-12-03  269  		return -EINVAL;
> > 27871f7a8a341e Quentin Perret  2018-12-03  270
> > 27871f7a8a341e Quentin Perret  2018-12-03  271  	/*
> > 27871f7a8a341e Quentin Perret  2018-12-03  272  	 * Use a mutex to serialize the registration of performance domains and
> > 27871f7a8a341e Quentin Perret  2018-12-03  273  	 * let the driver-defined callback functions sleep.
> > 27871f7a8a341e Quentin Perret  2018-12-03  274  	 */
> > 27871f7a8a341e Quentin Perret  2018-12-03  275  	mutex_lock(&em_pd_mutex);
> > 27871f7a8a341e Quentin Perret  2018-12-03  276
> > 110d050cb7ba1c Lukasz Luba     2020-05-27 @277  	if (dev->em_pd) {
> >                                                              ^^^^^^^^^^
> > Check for NULL.
> > 
> > 27871f7a8a341e Quentin Perret  2018-12-03  278  		ret = -EEXIST;
> > 27871f7a8a341e Quentin Perret  2018-12-03  279  		goto unlock;
> > 27871f7a8a341e Quentin Perret  2018-12-03  280  	}
> > 27871f7a8a341e Quentin Perret  2018-12-03  281
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  282  	if (_is_cpu_device(dev)) {
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  283  		if (!cpus) {
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  284  			dev_err(dev, "EM: invalid CPU mask\n");
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  285  			ret = -EINVAL;
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  286  			goto unlock;
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  287  		}
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  288
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  289  		for_each_cpu(cpu, cpus) {
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  290  			if (em_cpu_get(cpu)) {
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  291  				dev_err(dev, "EM: exists for CPU%d\n", cpu);
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  292  				ret = -EEXIST;
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  293  				goto unlock;
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  294  			}
> > 27871f7a8a341e Quentin Perret  2018-12-03  295  			/*
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  296  			 * All CPUs of a domain must have the same
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  297  			 * micro-architecture since they all share the same
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  298  			 * table.
> > 27871f7a8a341e Quentin Perret  2018-12-03  299  			 */
> > 8ec59c0f5f4966 Vincent Guittot 2019-06-17  300  			cap = arch_scale_cpu_capacity(cpu);
> > 27871f7a8a341e Quentin Perret  2018-12-03  301  			if (prev_cap && prev_cap != cap) {
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  302  				dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  303  					cpumask_pr_args(cpus));
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  304
> > 27871f7a8a341e Quentin Perret  2018-12-03  305  				ret = -EINVAL;
> > 27871f7a8a341e Quentin Perret  2018-12-03  306  				goto unlock;
> > 27871f7a8a341e Quentin Perret  2018-12-03  307  			}
> > 27871f7a8a341e Quentin Perret  2018-12-03  308  			prev_cap = cap;
> > 27871f7a8a341e Quentin Perret  2018-12-03  309  		}
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  310  	}
> > 27871f7a8a341e Quentin Perret  2018-12-03  311
> > 110d050cb7ba1c Lukasz Luba     2020-05-27  312  	ret = em_create_pd(dev, nr_states, cb, cpus);
> > 
> > 
> > If it's a _is_cpu_device() then it iterates through a bunch of devices
> > and sets up cpu_dev->em_pd for each.  Presumably one of the devices is
> > "dev" or this would crash pretty early on in testing?
> 
> Yes, all of the devices taken from 'cpus' mask will get the em_pd set
> including the suspected @dev.
> To calm down this static analyzer I can remove the 'else'
> in line 204 to make 'dev->em_pd = pd' set always.
> 199         if (_is_cpu_device(dev))
> 200                 for_each_cpu(cpu, cpus) {
> 201                         cpu_dev = get_cpu_device(cpu);
> 202                         cpu_dev->em_pd = pd;
> 203                 }
> 204         else
> 205                 dev->em_pd = pd;
> 
> 
> Do you think it's a good solution and will work for this tool?

It's not about the tool...  Ignore the tool when it's wrong.  But I do
think the code is slightly more clear without the else statement.

Arguments could be made either way.  Removing the else statement means
we set dev->em_pd twice...  But I *personally* lean vaguely towards
removing the else statement.  :P

That would make the warning go away as well.

regards,
dan carpenter
Lukasz Luba June 8, 2020, 12:59 p.m. UTC | #12
On 6/8/20 1:51 PM, Dan Carpenter wrote:
> On Mon, Jun 08, 2020 at 01:34:37PM +0100, Lukasz Luba wrote:
>> Hi Dan,
>>
>> Thank you for your analyzes.
>>
>> On 6/8/20 12:51 PM, Dan Carpenter wrote:
>>> Hi Lukasz,
>>>
>>> I love your patch! Perhaps something to improve:
>>>
>>> url:    https://github.com/0day-ci/linux/commits/Lukasz-Luba/Add-support-for-devices-in-the-Energy-Model/20200527-180614
>>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>>>
>>> config: i386-randconfig-m021-20200605 (attached as .config)
>>> compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
>>>
>>> If you fix the issue, kindly add following tag as appropriate
>>> Reported-by: kernel test robot <lkp@intel.com>
>>> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>>>
>>> smatch warnings:
>>> kernel/power/energy_model.c:316 em_dev_register_perf_domain() error: we previously assumed 'dev->em_pd' could be null (see line 277)
>>>
>>> # https://github.com/0day-ci/linux/commit/110d050cb7ba1c96e63ada498979d1fd99529be2
>>> git remote add linux-review https://github.com/0day-ci/linux
>>> git remote update linux-review
>>> git checkout 110d050cb7ba1c96e63ada498979d1fd99529be2
>>> vim +316 kernel/power/energy_model.c
>>>
>>> 0e294e607adaf3 Lukasz Luba     2020-05-27  262  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  263  				struct em_data_callback *cb, cpumask_t *cpus)
>>> 27871f7a8a341e Quentin Perret  2018-12-03  264  {
>>> 27871f7a8a341e Quentin Perret  2018-12-03  265  	unsigned long cap, prev_cap = 0;
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  266  	int cpu, ret;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  267
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  268  	if (!dev || !nr_states || !cb)
>>> 27871f7a8a341e Quentin Perret  2018-12-03  269  		return -EINVAL;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  270
>>> 27871f7a8a341e Quentin Perret  2018-12-03  271  	/*
>>> 27871f7a8a341e Quentin Perret  2018-12-03  272  	 * Use a mutex to serialize the registration of performance domains and
>>> 27871f7a8a341e Quentin Perret  2018-12-03  273  	 * let the driver-defined callback functions sleep.
>>> 27871f7a8a341e Quentin Perret  2018-12-03  274  	 */
>>> 27871f7a8a341e Quentin Perret  2018-12-03  275  	mutex_lock(&em_pd_mutex);
>>> 27871f7a8a341e Quentin Perret  2018-12-03  276
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27 @277  	if (dev->em_pd) {
>>>                                                               ^^^^^^^^^^
>>> Check for NULL.
>>>
>>> 27871f7a8a341e Quentin Perret  2018-12-03  278  		ret = -EEXIST;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  279  		goto unlock;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  280  	}
>>> 27871f7a8a341e Quentin Perret  2018-12-03  281
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  282  	if (_is_cpu_device(dev)) {
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  283  		if (!cpus) {
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  284  			dev_err(dev, "EM: invalid CPU mask\n");
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  285  			ret = -EINVAL;
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  286  			goto unlock;
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  287  		}
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  288
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  289  		for_each_cpu(cpu, cpus) {
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  290  			if (em_cpu_get(cpu)) {
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  291  				dev_err(dev, "EM: exists for CPU%d\n", cpu);
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  292  				ret = -EEXIST;
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  293  				goto unlock;
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  294  			}
>>> 27871f7a8a341e Quentin Perret  2018-12-03  295  			/*
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  296  			 * All CPUs of a domain must have the same
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  297  			 * micro-architecture since they all share the same
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  298  			 * table.
>>> 27871f7a8a341e Quentin Perret  2018-12-03  299  			 */
>>> 8ec59c0f5f4966 Vincent Guittot 2019-06-17  300  			cap = arch_scale_cpu_capacity(cpu);
>>> 27871f7a8a341e Quentin Perret  2018-12-03  301  			if (prev_cap && prev_cap != cap) {
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  302  				dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  303  					cpumask_pr_args(cpus));
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  304
>>> 27871f7a8a341e Quentin Perret  2018-12-03  305  				ret = -EINVAL;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  306  				goto unlock;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  307  			}
>>> 27871f7a8a341e Quentin Perret  2018-12-03  308  			prev_cap = cap;
>>> 27871f7a8a341e Quentin Perret  2018-12-03  309  		}
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  310  	}
>>> 27871f7a8a341e Quentin Perret  2018-12-03  311
>>> 110d050cb7ba1c Lukasz Luba     2020-05-27  312  	ret = em_create_pd(dev, nr_states, cb, cpus);
>>>
>>>
>>> If it's a _is_cpu_device() then it iterates through a bunch of devices
>>> and sets up cpu_dev->em_pd for each.  Presumably one of the devices is
>>> "dev" or this would crash pretty early on in testing?
>>
>> Yes, all of the devices taken from 'cpus' mask will get the em_pd set
>> including the suspected @dev.
>> To calm down this static analyzer I can remove the 'else'
>> in line 204 to make 'dev->em_pd = pd' set always.
>> 199         if (_is_cpu_device(dev))
>> 200                 for_each_cpu(cpu, cpus) {
>> 201                         cpu_dev = get_cpu_device(cpu);
>> 202                         cpu_dev->em_pd = pd;
>> 203                 }
>> 204         else
>> 205                 dev->em_pd = pd;
>>
>>
>> Do you think it's a good solution and will work for this tool?
> 
> It's not about the tool...  Ignore the tool when it's wrong.  But I do
> think the code is slightly more clear without the else statement.
> 
> Arguments could be made either way.  Removing the else statement means
> we set dev->em_pd twice...  But I *personally* lean vaguely towards
> removing the else statement.  :P

Thanks, I will remove the else statement and add your 'Reported-by'

Regards,
Lukasz

> 
> That would make the warning go away as well.
> 
> regards,
> dan carpenter
>
Dan Carpenter June 8, 2020, 1:25 p.m. UTC | #13
It's not really a proper bug report so it doesn't deserve a reported-by.

Thanks, though!

regards,
dan carpenter
Lukasz Luba June 8, 2020, 1:49 p.m. UTC | #14
On 6/8/20 2:25 PM, Dan Carpenter wrote:
> It's not really a proper bug report so it doesn't deserve a reported-by.
> 
> Thanks, though!
> 
> regards,
> dan carpenter
> 

Thank you Dan for your work. This is very much appreciated!

Lukasz
diff mbox series

Patch

diff --git a/include/linux/device.h b/include/linux/device.h
index ac8e37cd716a..7023d3ea189b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -13,6 +13,7 @@ 
 #define _DEVICE_H_
 
 #include <linux/dev_printk.h>
+#include <linux/energy_model.h>
 #include <linux/ioport.h>
 #include <linux/kobject.h>
 #include <linux/klist.h>
@@ -559,6 +560,10 @@  struct device {
 	struct dev_pm_info	power;
 	struct dev_pm_domain	*pm_domain;
 
+#ifdef CONFIG_ENERGY_MODEL
+	struct em_perf_domain	*em_pd;
+#endif
+
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 	struct irq_domain	*msi_domain;
 #endif
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 7076cb22b247..0ebb083b15a0 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -12,8 +12,10 @@ 
 
 /**
  * em_perf_state - Performance state of a performance domain
- * @frequency:	The CPU frequency in KHz, for consistency with CPUFreq
- * @power:	The power consumed by 1 CPU at this level, in milli-watts
+ * @frequency:	The frequency in KHz, for consistency with CPUFreq
+ * @power:	The power consumed at this level, in milli-watts (by 1 CPU or
+		by a registered device). It can be a total power: static and
+		dynamic.
  * @cost:	The cost coefficient associated with this level, used during
  *		energy calculation. Equal to: power * max_frequency / frequency
  */
@@ -27,12 +29,16 @@  struct em_perf_state {
  * em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
  * @nr_perf_states:	Number of performance states
- * @cpus:		Cpumask covering the CPUs of the domain
+ * @cpus:		Cpumask covering the CPUs of the domain. It's here
+ *			for performance reasons to avoid potential cache
+ *			misses during energy calculations in the scheduler
+ *			and simplifies allocating/freeing that memory region.
  *
- * A "performance domain" represents a group of CPUs whose performance is
- * scaled together. All CPUs of a performance domain must have the same
- * micro-architecture. Performance domains often have a 1-to-1 mapping with
- * CPUFreq policies.
+ * In case of CPU device, a "performance domain" represents a group of CPUs
+ * whose performance is scaled together. All CPUs of a performance domain
+ * must have the same micro-architecture. Performance domains often have
+ * a 1-to-1 mapping with CPUFreq policies. In case of other devices the 'cpus'
+ * field is unused.
  */
 struct em_perf_domain {
 	struct em_perf_state *table;
@@ -71,10 +77,12 @@  struct em_data_callback {
 #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
 
 struct em_perf_domain *em_cpu_get(int cpu);
+struct em_perf_domain *em_pd_get(struct device *dev);
 int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 						struct em_data_callback *cb);
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span);
+void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
  * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain
@@ -184,10 +192,17 @@  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 {
 	return -EINVAL;
 }
+static inline void em_dev_unregister_perf_domain(struct device *dev)
+{
+}
 static inline struct em_perf_domain *em_cpu_get(int cpu)
 {
 	return NULL;
 }
+static inline struct em_perf_domain *em_pd_get(struct device *dev)
+{
+	return NULL;
+}
 static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
 			unsigned long max_util, unsigned long sum_util)
 {
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 5b8a1566526a..07e8307460bc 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -1,9 +1,10 @@ 
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Energy Model of CPUs
+ * Energy Model of devices
  *
- * Copyright (c) 2018, Arm ltd.
+ * Copyright (c) 2018-2020, Arm ltd.
  * Written by: Quentin Perret, Arm ltd.
+ * Improvements provided by: Lukasz Luba, Arm ltd.
  */
 
 #define pr_fmt(fmt) "energy_model: " fmt
@@ -15,15 +16,17 @@ 
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
 
-/* Mapping of each CPU to the performance domain to which it belongs. */
-static DEFINE_PER_CPU(struct em_perf_domain *, em_data);
-
 /*
  * Mutex serializing the registrations of performance domains and letting
  * callbacks defined by drivers sleep.
  */
 static DEFINE_MUTEX(em_pd_mutex);
 
+static bool _is_cpu_device(struct device *dev)
+{
+	return (dev->bus == &cpu_subsys);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static struct dentry *rootdir;
 
@@ -49,22 +52,30 @@  static int em_debug_cpus_show(struct seq_file *s, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
 
-static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
+static void em_debug_create_pd(struct device *dev)
 {
 	struct dentry *d;
-	char name[8];
 	int i;
 
-	snprintf(name, sizeof(name), "pd%d", cpu);
-
 	/* Create the directory of the performance domain */
-	d = debugfs_create_dir(name, rootdir);
+	d = debugfs_create_dir(dev_name(dev), rootdir);
 
-	debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
+	if (_is_cpu_device(dev))
+		debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
+				    &em_debug_cpus_fops);
 
 	/* Create a sub-directory for each performance state */
-	for (i = 0; i < pd->nr_perf_states; i++)
-		em_debug_create_ps(&pd->table[i], d);
+	for (i = 0; i < dev->em_pd->nr_perf_states; i++)
+		em_debug_create_ps(&dev->em_pd->table[i], d);
+
+}
+
+static void em_debug_remove_pd(struct device *dev)
+{
+	struct dentry *debug_dir;
+
+	debug_dir = debugfs_lookup(dev_name(dev), rootdir);
+	debugfs_remove_recursive(debug_dir);
 }
 
 static int __init em_debug_init(void)
@@ -76,40 +87,34 @@  static int __init em_debug_init(void)
 }
 core_initcall(em_debug_init);
 #else /* CONFIG_DEBUG_FS */
-static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
+static void em_debug_create_pd(struct device *dev) {}
+static void em_debug_remove_pd(struct device *dev) {}
 #endif
-static struct em_perf_domain *
-em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
-	     cpumask_t *span)
+
+static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
+				int nr_states, struct em_data_callback *cb)
 {
 	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
 	unsigned long power, freq, prev_freq = 0;
-	int i, ret, cpu = cpumask_first(span);
 	struct em_perf_state *table;
-	struct em_perf_domain *pd;
+	int i, ret;
 	u64 fmax;
 
-	if (!cb->active_power)
-		return NULL;
-
-	pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
-	if (!pd)
-		return NULL;
-
 	table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
 	if (!table)
-		goto free_pd;
+		return -ENOMEM;
 
 	/* Build the list of performance states for this performance domain */
 	for (i = 0, freq = 0; i < nr_states; i++, freq++) {
 		/*
 		 * active_power() is a driver callback which ceils 'freq' to
-		 * lowest performance state of 'cpu' above 'freq' and updates
+		 * lowest performance state of 'dev' above 'freq' and updates
 		 * 'power' and 'freq' accordingly.
 		 */
 		ret = cb->active_power(&power, &freq, dev);
 		if (ret) {
-			pr_err("pd%d: invalid perf. state: %d\n", cpu, ret);
+			dev_err(dev, "EM: invalid perf. state: %d\n",
+				ret);
 			goto free_ps_table;
 		}
 
@@ -118,7 +123,8 @@  em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 		 * higher performance states.
 		 */
 		if (freq <= prev_freq) {
-			pr_err("pd%d: non-increasing freq: %lu\n", cpu, freq);
+			dev_err(dev, "EM: non-increasing freq: %lu\n",
+				freq);
 			goto free_ps_table;
 		}
 
@@ -127,7 +133,8 @@  em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 		 * positive, in milli-watts and to fit into 16 bits.
 		 */
 		if (!power || power > EM_MAX_POWER) {
-			pr_err("pd%d: invalid power: %lu\n", cpu, power);
+			dev_err(dev, "EM: invalid power: %lu\n",
+				power);
 			goto free_ps_table;
 		}
 
@@ -142,8 +149,8 @@  em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 		 */
 		opp_eff = freq / power;
 		if (opp_eff >= prev_opp_eff)
-			pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
-					cpu, i, i - 1);
+			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
+					i, i - 1);
 		prev_opp_eff = opp_eff;
 	}
 
@@ -156,30 +163,82 @@  em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb,
 
 	pd->table = table;
 	pd->nr_perf_states = nr_states;
-	cpumask_copy(to_cpumask(pd->cpus), span);
-
-	em_debug_create_pd(pd, cpu);
 
-	return pd;
+	return 0;
 
 free_ps_table:
 	kfree(table);
-free_pd:
-	kfree(pd);
+	return -EINVAL;
+}
+
+static int em_create_pd(struct device *dev, int nr_states,
+			struct em_data_callback *cb, cpumask_t *cpus)
+{
+	struct em_perf_domain *pd;
+	struct device *cpu_dev;
+	int cpu, ret;
+
+	if (_is_cpu_device(dev)) {
+		pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
+		if (!pd)
+			return -ENOMEM;
+
+		cpumask_copy(em_span_cpus(pd), cpus);
+	} else {
+		pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+		if (!pd)
+			return -ENOMEM;
+	}
+
+	ret = em_create_perf_table(dev, pd, nr_states, cb);
+	if (ret) {
+		kfree(pd);
+		return ret;
+	}
 
-	return NULL;
+	if (_is_cpu_device(dev))
+		for_each_cpu(cpu, cpus) {
+			cpu_dev = get_cpu_device(cpu);
+			cpu_dev->em_pd = pd;
+		}
+	else
+		dev->em_pd = pd;
+
+	return 0;
 }
 
+/**
+ * em_pd_get() - Return the performance domain for a device
+ * @dev : Device to find the performance domain for
+ *
+ * Returns the performance domain to which 'dev' belongs, or NULL if it doesn't
+ * exist.
+ */
+struct em_perf_domain *em_pd_get(struct device *dev)
+{
+	if (IS_ERR_OR_NULL(dev))
+		return NULL;
+
+	return dev->em_pd;
+}
+EXPORT_SYMBOL_GPL(em_pd_get);
+
 /**
  * em_cpu_get() - Return the performance domain for a CPU
  * @cpu : CPU to find the performance domain for
  *
- * Return: the performance domain to which 'cpu' belongs, or NULL if it doesn't
+ * Returns the performance domain to which 'cpu' belongs, or NULL if it doesn't
  * exist.
  */
 struct em_perf_domain *em_cpu_get(int cpu)
 {
-	return READ_ONCE(per_cpu(em_data, cpu));
+	struct device *cpu_dev;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return NULL;
+
+	return em_pd_get(cpu_dev);
 }
 EXPORT_SYMBOL_GPL(em_cpu_get);
 
@@ -188,7 +247,7 @@  EXPORT_SYMBOL_GPL(em_cpu_get);
  * @dev		: Device for which the EM is to register
  * @nr_states	: Number of performance states to register
  * @cb		: Callback functions providing the data of the Energy Model
- * @span	: Pointer to cpumask_t, which in case of a CPU device is
+ * @cpus	: Pointer to cpumask_t, which in case of a CPU device is
  *		obligatory. It can be taken from i.e. 'policy->cpus'. For other
  *		type of devices this should be set to NULL.
  *
@@ -201,13 +260,12 @@  EXPORT_SYMBOL_GPL(em_cpu_get);
  * Return 0 on success
  */
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-				struct em_data_callback *cb, cpumask_t *span)
+				struct em_data_callback *cb, cpumask_t *cpus)
 {
 	unsigned long cap, prev_cap = 0;
-	struct em_perf_domain *pd;
-	int cpu, ret = 0;
+	int cpu, ret;
 
-	if (!dev || !span || !nr_states || !cb)
+	if (!dev || !nr_states || !cb)
 		return -EINVAL;
 
 	/*
@@ -216,47 +274,50 @@  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 	 */
 	mutex_lock(&em_pd_mutex);
 
-	for_each_cpu(cpu, span) {
-		/* Make sure we don't register again an existing domain. */
-		if (READ_ONCE(per_cpu(em_data, cpu))) {
-			ret = -EEXIST;
-			goto unlock;
-		}
+	if (dev->em_pd) {
+		ret = -EEXIST;
+		goto unlock;
+	}
 
-		/*
-		 * All CPUs of a domain must have the same micro-architecture
-		 * since they all share the same table.
-		 */
-		cap = arch_scale_cpu_capacity(cpu);
-		if (prev_cap && prev_cap != cap) {
-			pr_err("CPUs of %*pbl must have the same capacity\n",
-							cpumask_pr_args(span));
+	if (_is_cpu_device(dev)) {
+		if (!cpus) {
+			dev_err(dev, "EM: invalid CPU mask\n");
 			ret = -EINVAL;
 			goto unlock;
 		}
-		prev_cap = cap;
+
+		for_each_cpu(cpu, cpus) {
+			if (em_cpu_get(cpu)) {
+				dev_err(dev, "EM: exists for CPU%d\n", cpu);
+				ret = -EEXIST;
+				goto unlock;
+			}
+			/*
+			 * All CPUs of a domain must have the same
+			 * micro-architecture since they all share the same
+			 * table.
+			 */
+			cap = arch_scale_cpu_capacity(cpu);
+			if (prev_cap && prev_cap != cap) {
+				dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
+					cpumask_pr_args(cpus));
+
+				ret = -EINVAL;
+				goto unlock;
+			}
+			prev_cap = cap;
+		}
 	}
 
-	/* Create the performance domain and add it to the Energy Model. */
-	pd = em_create_pd(dev, nr_states, cb, span);
-	if (!pd) {
-		ret = -EINVAL;
+	ret = em_create_pd(dev, nr_states, cb, cpus);
+	if (ret)
 		goto unlock;
-	}
 
-	for_each_cpu(cpu, span) {
-		/*
-		 * The per-cpu array can be read concurrently from em_cpu_get().
-		 * The barrier enforces the ordering needed to make sure readers
-		 * can only access well formed em_perf_domain structs.
-		 */
-		smp_store_release(per_cpu_ptr(&em_data, cpu), pd);
-	}
+	em_debug_create_pd(dev);
+	dev_info(dev, "EM: created perf domain\n");
 
-	pr_debug("Created perf domain %*pbl\n", cpumask_pr_args(span));
 unlock:
 	mutex_unlock(&em_pd_mutex);
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
@@ -285,3 +346,27 @@  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
 	return em_dev_register_perf_domain(cpu_dev, nr_states, cb, span);
 }
 EXPORT_SYMBOL_GPL(em_register_perf_domain);
+
+/**
+ * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
+ * @dev		: Device for which the EM is registered
+ *
+ * Try to unregister the EM for the specified device (but not a CPU).
+ */
+void em_dev_unregister_perf_domain(struct device *dev)
+{
+	if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
+		return;
+
+	if (_is_cpu_device(dev))
+		return;
+
+	mutex_lock(&em_pd_mutex);
+	em_debug_remove_pd(dev);
+
+	kfree(dev->em_pd->table);
+	kfree(dev->em_pd);
+	dev->em_pd = NULL;
+	mutex_unlock(&em_pd_mutex);
+}
+EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);