diff mbox series

[V8,1/8] PM / devfreq: Add cpu based scaling support to passive_governor

Message ID 1616499241-4906-2-git-send-email-andrew-sh.cheng@mediatek.com (mailing list archive)
State New, archived
Headers show
Series Add cpufreq and cci devfreq for mt8183, and SVS support | expand

Commit Message

andrew-sh.cheng March 23, 2021, 11:33 a.m. UTC
From: Saravana Kannan <skannan@codeaurora.org>

Many CPU architectures have caches that can scale independent of the
CPUs. Frequency scaling of the caches is necessary to make sure that the
cache is not a performance bottleneck that leads to poor performance and
power. The same idea applies for RAM/DDR.

To achieve this, this patch adds support for cpu based scaling to the
passive governor. This is accomplished by taking the current frequency
of each CPU frequency domain and then adjust the frequency of the cache
(or any devfreq device) based on the frequency of the CPUs. It listens
to CPU frequency transition notifiers to keep itself up to date on the
current CPU frequency.

To decide the frequency of the device, the governor does one of the
following:
* Derives the optimal devfreq device opp from required-opps property of
  the parent cpu opp_table.

* Scales the device frequency in proportion to the CPU frequency. So, if
  the CPUs are running at their max frequency, the device runs at its
  max frequency. If the CPUs are running at their min frequency, the
  device runs at its min frequency. It is interpolated for frequencies
  in between.

Andrew-sh.Cheng change
dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
after kernel-5.7
Don't return -EINVAL in devfreq_passive_event_handler()
since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.

Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
[Sibi: Integrated cpu-freqmap governor into passive_governor]
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
---
 drivers/devfreq/Kconfig            |   2 +
 drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
 include/linux/devfreq.h            |  29 +++-
 3 files changed, 342 insertions(+), 18 deletions(-)

Comments

Chanwoo Choi March 25, 2021, 7:42 a.m. UTC | #1
Hi,

You are missing to add these patches to linux-pm mailing list.
Need to send them to linu-pm ML.

Also, before received this series, I tried to clean-up these patches
on testing branch[1]. So that I add my comment with my clean-up case.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov


On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> From: Saravana Kannan <skannan@codeaurora.org>
> 
> Many CPU architectures have caches that can scale independent of the
> CPUs. Frequency scaling of the caches is necessary to make sure that the
> cache is not a performance bottleneck that leads to poor performance and
> power. The same idea applies for RAM/DDR.
> 
> To achieve this, this patch adds support for cpu based scaling to the
> passive governor. This is accomplished by taking the current frequency
> of each CPU frequency domain and then adjust the frequency of the cache
> (or any devfreq device) based on the frequency of the CPUs. It listens
> to CPU frequency transition notifiers to keep itself up to date on the
> current CPU frequency.
> 
> To decide the frequency of the device, the governor does one of the
> following:
> * Derives the optimal devfreq device opp from required-opps property of
>   the parent cpu opp_table.
> 
> * Scales the device frequency in proportion to the CPU frequency. So, if
>   the CPUs are running at their max frequency, the device runs at its
>   max frequency. If the CPUs are running at their min frequency, the
>   device runs at its min frequency. It is interpolated for frequencies
>   in between.
> 
> Andrew-sh.Cheng change
> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> after kernel-5.7
> Don't return -EINVAL in devfreq_passive_event_handler()
> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
> 
> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> [Sibi: Integrated cpu-freqmap governor into passive_governor]
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/devfreq/Kconfig            |   2 +
>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>  include/linux/devfreq.h            |  29 +++-
>  3 files changed, 342 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index 00704efe6398..f56132b0ae64 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>  	  device. This governor does not change the frequency by itself
>  	  through sysfs entries. The passive governor recommends that
>  	  devfreq device uses the OPP table to get the frequency/voltage.
> +	  Alternatively the governor can also be chosen to scale based on
> +	  the online CPUs current frequency.
>  
>  comment "DEVFREQ Drivers"
>  
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index b094132bd20b..9cc57b083839 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -8,11 +8,103 @@
>   */
>  
>  #include <linux/module.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
>  #include <linux/device.h>
>  #include <linux/devfreq.h>
> +#include <linux/slab.h>
>  #include "governor.h"
>  
> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +struct devfreq_cpu_state {
> +	unsigned int curr_freq;
> +	unsigned int min_freq;
> +	unsigned int max_freq;
> +	unsigned int first_cpu;
> +	struct device *cpu_dev;
> +	struct opp_table *opp_table;
> +};

As I knew, the previous version has the description of structure
as following:  I wan to add the description like below.

And if you have no any objection, I'd like you to order
the variables as following and use 'dev' instead of 'cpu_dev'
because this patch use the 'cpu_state->cpu_dev' at the multiple points.
I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
Also, I prefer to use 'cur_freq' instead of 'curr_freq'
because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.

/**                                                                             
 * struct devfreq_cpu_state - Hold the per-cpu data                              
 * @dev:        reference to cpu device.                                        
 * @first_cpu:  the cpumask of the first cpu of a policy.                       
 * @opp_table:  reference to cpu opp table.                                     
 * @cur_freq:   the current frequency of the cpu.                               
 * @min_freq:   the min frequency of the cpu.                                   
 * @max_freq:   the max frequency of the cpu.                                   
 *                                                                              
 * This structure stores the required cpu_data of a cpu.                        
 * This is auto-populated by the governor.                                      
 */                                                                             
struct devfreq_cpu_state {                                                       
         struct device *dev;                                                     
         unsigned int first_cpu;                                                 

         struct opp_table *opp_table;                                            
         unsigned int cur_freq;                                                  
         unsigned int min_freq;                                                  
         unsigned int max_freq;                                                  
};               


> +
> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> +					      unsigned int cpu)
> +{
> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> +
> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> +	unsigned long cpu_curr_freq, freq;
> +
> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
> +	    !cpu_state->opp_table || !devfreq->opp_table)
> +		return 0;
> +
> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> +	if (IS_ERR(p_opp))
> +		return 0;
> +
> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> +					    devfreq->opp_table, p_opp);
> +	dev_pm_opp_put(p_opp);
> +
> +	if (!IS_ERR(opp)) {
> +		freq = dev_pm_opp_get_freq(opp);
> +		dev_pm_opp_put(opp);
> +		goto out;
> +	}
> +
> +	/* Use Interpolation if required opps is not available */
> +	cpu_min_freq = cpu_state->min_freq;
> +	cpu_max_freq = cpu_state->max_freq;
> +	cpu_curr_freq_khz = cpu_state->curr_freq;
> +
> +	if (dev_freq_table) {
> +		/* Get minimum frequency according to sorting order */
> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> +		if (dev_freq_table[0] < dev_max_state) {
> +			dev_min_freq = dev_freq_table[0];
> +			dev_max_freq = dev_max_state;
> +		} else {
> +			dev_min_freq = dev_max_state;
> +			dev_max_freq = dev_freq_table[0];
> +		}
> +	} else {
> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MIN_FREQUENCY);
> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MAX_FREQUENCY);
> +
> +		if (dev_max_freq <= dev_min_freq)
> +			return 0;
> +	}
> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> +
> +out:
> +	return freq;
> +}
> +
> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> +					unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +				(struct devfreq_passive_data *)devfreq->data;
> +	unsigned int cpu;
> +	unsigned long target_freq = 0;
> +
> +	for_each_online_cpu(cpu)
> +		target_freq = max(target_freq,
> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
> +
> +	*freq = target_freq;
> +
> +	return 0;
> +}

As you knew, governor_passive.c was already used 
both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
to get the target from OPP. So, I wan to make the common function
like 'get_taget_freq_by_required_opp' as following:
If define 'get_taget_freq_by_required_opp' as following,
it will be used for get_target_freq_with_devfreq().
After finisied the review of this patch, I'll send the patch[2].
[2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf

For example but this code is not tested,
static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
						struct opp_table *p_opp_table,
						struct opp_table *opp_table,
						unsigned long freq)
{
	struct dev_pm_opp *opp = NULL, *p_opp = NULL;

	if (!p_dev || !p_opp_table || !opp_table || !freq)
		return 0;

	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
	if (IS_ERR(p_opp))
		return 0;

	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
	dev_pm_opp_put(p_opp);

	if (IS_ERR(opp))
		return 0;

	freq = dev_pm_opp_get_freq(opp);
	dev_pm_opp_put(opp);

	return freq;
}

static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
					unsigned long *target_freq)
{
	struct devfreq_passive_data *p_data =
				(struct devfreq_passive_data *)devfreq->data;
	struct devfreq_cpu_data *cpu_data;
	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
	unsigned long dev_min, dev_max;
	unsigned long freq = 0;

	for_each_online_cpu(cpu) {
		cpu_data = p_data->cpu_data[cpu];
		if (!cpu_data || cpu_data->first_cpu != cpu)
			continue;

		/* Get target freq via required opps */
		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
		freq = get_taget_freq_by_required_opp(cpu_data->dev,
					cpu_data->opp_table,
					devfreq->opp_table, cpu_cur);
		if (freq) {
			*target_freq = max(freq, *target_freq);
			continue;
		}

		/* Use Interpolation if required opps is not available */
		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);

		cpu_min = cpu_data->min_freq;
		cpu_max = cpu_data->max_freq;
		cpu_cur = cpu_data->cur_freq;

		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);

		*target_freq = max(freq, *target_freq);
	}

	return 0;
}

> +
> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>  					unsigned long *freq)
>  {
>  	struct devfreq_passive_data *p_data
> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	int i, count;
>  
>  	/*
> -	 * If the devfreq device with passive governor has the specific method
> -	 * to determine the next frequency, should use the get_target_freq()
> -	 * of struct devfreq_passive_data.
> -	 */
> -	if (p_data->get_target_freq)
> -		return p_data->get_target_freq(devfreq, freq);
> -
> -	/*
>  	 * If the parent and passive devfreq device uses the OPP table,
>  	 * get the next frequency by using the OPP table.
>  	 */
> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	return 0;
>  }
>  
> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +					   unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +		(struct devfreq_passive_data *)devfreq->data;
> +	int ret;
> +
> +	/*
> +	 * If the devfreq device with passive governor has the specific method
> +	 * to determine the next frequency, should use the get_target_freq()
> +	 * of struct devfreq_passive_data.
> +	 */
> +	if (p_data->get_target_freq)
> +		return p_data->get_target_freq(devfreq, freq);
> +
> +	switch (p_data->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_devfreq(devfreq, freq);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		dev_err(&devfreq->dev, "Invalid parent type\n");
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  				unsigned long event, void *ptr)
>  {
> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> +					 unsigned long event, void *ptr)
> +{
> +	struct devfreq_passive_data *data =
> +			container_of(nb, struct devfreq_passive_data, nb);
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_freqs *cpu_freq = ptr;

Use 'freqs' variable name.  I prefer to use the same variable name
for both devfreq_freqs and cpufreq_freqs instance.

> +	unsigned int curr_freq;

As I commented above, better to use 'cur_frq' instead of 'curr_freq'
if there is no any special reason.

> +	int ret;
> +
> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> +	    !data->cpu_state[cpu_freq->policy->cpu])
> +		return 0;
> +
> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> +	if (cpu_state->curr_freq == cpu_freq->new)
> +		return 0;
> +
> +	/* Backup current freq and pre-update cpu state freq*/

I think that this commnet is not critial. So, please drop this comment.

> +	curr_freq = cpu_state->curr_freq;
> +	cpu_state->curr_freq = cpu_freq->new;
> +
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
as following:
	devfreq_update_target(devfreq, freqs->new);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret) {
> +		cpu_state->curr_freq = curr_freq;
> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)

In order to keep the consistent style of function name,
please change the name as following because devfreq defines
the function name as 'devfreq_regiter_notifier'
- cpufreq_passive_register -> cpufreq_passive_register_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct device *dev = devfreq->dev.parent;
> +	struct opp_table *opp_table = NULL;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_policy *policy;
> +	struct device *cpu_dev;
> +	unsigned int cpu;
> +	int ret;
> +
> +	get_online_cpus();
> +
> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> +	ret = cpufreq_register_notifier(&data->nb,
> +					CPUFREQ_TRANSITION_NOTIFIER);
> +	if (ret) {
> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
> +		data->nb.notifier_call = NULL;
> +		goto out;
> +	}
> +
> +	/* Populate devfreq_cpu_state */

Don't need this comment. Please drop it.

> +	for_each_online_cpu(cpu) {
> +		if (data->cpu_state[cpu])
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (!policy) {
> +			ret = -EINVAL;
> +			goto out;
> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> +			ret = -EPROBE_DEFER;
> +			goto out;
> +		} else if (IS_ERR(policy)) {
> +			ret = PTR_ERR(policy);
> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> +			goto out;
> +		}

Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
It make code more simple.

> +
> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> +		if (!cpu_state) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		cpu_dev = get_cpu_device(cpu);
> +		if (!cpu_dev) {
> +			dev_err(dev, "Couldn't get cpu device.\n");
> +			ret = -ENODEV;
> +			goto out;
> +		}
> +
> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> +		if (IS_ERR(devfreq->opp_table)) {
> +			ret = PTR_ERR(opp_table);
> +			goto out;
> +		}
> +
> +		cpu_state->cpu_dev = cpu_dev;
> +		cpu_state->opp_table = opp_table;
> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> +		cpu_state->curr_freq = policy->cur;
> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
> +		data->cpu_state[cpu] = cpu_state;
> +
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +out:
> +	put_online_cpus();
> +	if (ret)
> +		return ret;
> +
> +	/* Update devfreq */
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret)
> +		dev_err(dev, "Couldn't update the frequency.\n");
> +
> +	return ret;
> +}
> +
> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)

As I commented above, please change the name as following:
- cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq_cpu_state *cpu_state;
> +	int cpu;
> +
> +	if (data->nb.notifier_call)
> +		cpufreq_unregister_notifier(&data->nb,
> +					    CPUFREQ_TRANSITION_NOTIFIER);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_state = data->cpu_state[cpu];
> +		if (cpu_state) {
> +			if (cpu_state->opp_table)
> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
> +			kfree(cpu_state);
> +			cpu_state = NULL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	struct notifier_block *nb = &(*p_data)->nb;
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		nb->notifier_call = devfreq_passive_notifier_call;
> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> +						DEVFREQ_TRANSITION_NOTIFIER);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = cpufreq_passive_register(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> +						    &(*p_data)->nb,
> +						    DEVFREQ_TRANSITION_NOTIFIER));
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		cpufreq_passive_unregister(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}

I think that you don't need to define register_parent_dev_notifier
and unregister_parent_dev_notifier as the separate functions.

Instead of the separate functions, just add the code
into devfreq_passive_event_handler.


> +
>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  				unsigned int event, void *data)
>  {
>  	struct devfreq_passive_data *p_data
>  			= (struct devfreq_passive_data *)devfreq->data;
>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
> -	struct notifier_block *nb = &p_data->nb;
>  	int ret = 0;
>  
> -	if (!parent)
> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>  		return -EPROBE_DEFER;
>  
>  	switch (event) {
> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  		if (!p_data->this)
>  			p_data->this = devfreq;
>  
> -		nb->notifier_call = devfreq_passive_notifier_call;
> -		ret = devfreq_register_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER);
> +		ret = register_parent_dev_notifier(&p_data);
>  		break;
> +
>  	case DEVFREQ_GOV_STOP:
> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER));
> +		ret = unregister_parent_dev_notifier(&p_data);
>  		break;
>  	default:
>  		break;
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 26ea0850be9b..e0093b7c805c 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>  
>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>  /**
> + * struct devfreq_cpu_state - holds the per-cpu state
> + * @freq:	the current frequency of the cpu.
> + * @min_freq:	the min frequency of the cpu.
> + * @max_freq:	the max frequency of the cpu.
> + * @first_cpu:	the cpumask of the first cpu of a policy.
> + * @dev:	reference to cpu device.
> + * @opp_table:	reference to cpu opp table.
> + *
> + * This structure stores the required cpu_state of a cpu.
> + * This is auto-populated by the governor.
> + */
> +struct devfreq_cpu_state;
> +
> +enum devfreq_parent_dev_type {
> +	DEVFREQ_PARENT_DEV,
> +	CPUFREQ_PARENT_DEV,
> +};
> +
> +/**
>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>   *	and devfreq_add_device
>   * @parent:	the devfreq instance of parent device.
> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>   *			using governors except for passive governor.
>   *			If the devfreq device has the specific method to decide
>   *			the next frequency, should use this callback.
> + * @parent_type:	parent type of the device
>   * @this:	the devfreq instance of own device.
>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>   *
>   * The devfreq_passive_data have to set the devfreq instance of parent
>   * device with governors except for the passive governor. But, don't need to
> - * initialize the 'this' and 'nb' field because the devfreq core will handle
> - * them.
> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> + * will handle them.
>   */
>  struct devfreq_passive_data {
>  	/* Should set the devfreq instance of parent device */
> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>  	/* Optional callback to decide the next frequency of passvice device */
>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>  
> +	/* Should set the type of parent device */
> +	enum devfreq_parent_dev_type parent_type;
> +
>  	/* For passive governor's internal use. Don't need to set them */
>  	struct devfreq *this;
>  	struct notifier_block nb;
> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>  };
>  #endif
>  
>
Chanwoo Choi March 25, 2021, 8:14 a.m. UTC | #2
Hi,

You are missing to add these patches to linux-pm mailing list.
Need to send them to linu-pm ML.

Also, before received this series, I tried to clean-up these patches
on testing branch[1]. So that I add my comment with my clean-up case.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov

And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
Please update the email or drop this email.


On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> From: Saravana Kannan <skannan@codeaurora.org>
>
> Many CPU architectures have caches that can scale independent of the
> CPUs. Frequency scaling of the caches is necessary to make sure that the
> cache is not a performance bottleneck that leads to poor performance and
> power. The same idea applies for RAM/DDR.
>
> To achieve this, this patch adds support for cpu based scaling to the
> passive governor. This is accomplished by taking the current frequency
> of each CPU frequency domain and then adjust the frequency of the cache
> (or any devfreq device) based on the frequency of the CPUs. It listens
> to CPU frequency transition notifiers to keep itself up to date on the
> current CPU frequency.
>
> To decide the frequency of the device, the governor does one of the
> following:
> * Derives the optimal devfreq device opp from required-opps property of
>   the parent cpu opp_table.
>
> * Scales the device frequency in proportion to the CPU frequency. So, if
>   the CPUs are running at their max frequency, the device runs at its
>   max frequency. If the CPUs are running at their min frequency, the
>   device runs at its min frequency. It is interpolated for frequencies
>   in between.
>
> Andrew-sh.Cheng change
> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> after kernel-5.7
> Don't return -EINVAL in devfreq_passive_event_handler()
> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
>
> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> [Sibi: Integrated cpu-freqmap governor into passive_governor]
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/devfreq/Kconfig            |   2 +
>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>  include/linux/devfreq.h            |  29 +++-
>  3 files changed, 342 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index 00704efe6398..f56132b0ae64 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>  	  device. This governor does not change the frequency by itself
>  	  through sysfs entries. The passive governor recommends that
>  	  devfreq device uses the OPP table to get the frequency/voltage.
> +	  Alternatively the governor can also be chosen to scale based on
> +	  the online CPUs current frequency.
>  
>  comment "DEVFREQ Drivers"
>  
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index b094132bd20b..9cc57b083839 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -8,11 +8,103 @@
>   */
>  
>  #include <linux/module.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
>  #include <linux/device.h>
>  #include <linux/devfreq.h>
> +#include <linux/slab.h>
>  #include "governor.h"
>  
> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +struct devfreq_cpu_state {
> +	unsigned int curr_freq;
> +	unsigned int min_freq;
> +	unsigned int max_freq;
> +	unsigned int first_cpu;
> +	struct device *cpu_dev;
> +	struct opp_table *opp_table;
> +};

As I knew, the previous version has the description of structure
as following:  I wan to add the description like below.

And if you have no any objection, I'd like you to order
the variables as following and use 'dev' instead of 'cpu_dev'
because this patch use the 'cpu_state->cpu_dev' at the multiple points.
I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
Also, I prefer to use 'cur_freq' instead of 'curr_freq'
because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.

/**                                                                             
 * struct devfreq_cpu_state - Hold the per-cpu data                              
 * @dev:        reference to cpu device.                                        
 * @first_cpu:  the cpumask of the first cpu of a policy.                       
 * @opp_table:  reference to cpu opp table.                                     
 * @cur_freq:   the current frequency of the cpu.                               
 * @min_freq:   the min frequency of the cpu.                                   
 * @max_freq:   the max frequency of the cpu.                                   
 *                                                                              
 * This structure stores the required cpu_data of a cpu.                        
 * This is auto-populated by the governor.                                      
 */                                                                             
struct devfreq_cpu_state {                                                       
         struct device *dev;                                                     
         unsigned int first_cpu;                                                 

         struct opp_table *opp_table;                                            
         unsigned int cur_freq;                                                  
         unsigned int min_freq;                                                  
         unsigned int max_freq;                                                  
};               


> +
> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> +					      unsigned int cpu)
> +{
> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> +
> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> +	unsigned long cpu_curr_freq, freq;
> +
> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
> +	    !cpu_state->opp_table || !devfreq->opp_table)
> +		return 0;
> +
> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> +	if (IS_ERR(p_opp))
> +		return 0;
> +
> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> +					    devfreq->opp_table, p_opp);
> +	dev_pm_opp_put(p_opp);
> +
> +	if (!IS_ERR(opp)) {
> +		freq = dev_pm_opp_get_freq(opp);
> +		dev_pm_opp_put(opp);
> +		goto out;
> +	}
> +
> +	/* Use Interpolation if required opps is not available */
> +	cpu_min_freq = cpu_state->min_freq;
> +	cpu_max_freq = cpu_state->max_freq;
> +	cpu_curr_freq_khz = cpu_state->curr_freq;
> +
> +	if (dev_freq_table) {
> +		/* Get minimum frequency according to sorting order */
> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> +		if (dev_freq_table[0] < dev_max_state) {
> +			dev_min_freq = dev_freq_table[0];
> +			dev_max_freq = dev_max_state;
> +		} else {
> +			dev_min_freq = dev_max_state;
> +			dev_max_freq = dev_freq_table[0];
> +		}
> +	} else {
> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MIN_FREQUENCY);
> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MAX_FREQUENCY);
> +
> +		if (dev_max_freq <= dev_min_freq)
> +			return 0;
> +	}
> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> +
> +out:
> +	return freq;
> +}
> +
> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> +					unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +				(struct devfreq_passive_data *)devfreq->data;
> +	unsigned int cpu;
> +	unsigned long target_freq = 0;
> +
> +	for_each_online_cpu(cpu)
> +		target_freq = max(target_freq,
> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
> +
> +	*freq = target_freq;
> +
> +	return 0;
> +}

As you knew, governor_passive.c was already used 
both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
to get the target from OPP. So, I wan to make the common function
like 'get_taget_freq_by_required_opp' as following:
If define 'get_taget_freq_by_required_opp' as following,
it will be used for get_target_freq_with_devfreq().
After finisied the review of this patch, I'll send the patch[2].
[2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf

For example but this code is not tested,
static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
						struct opp_table *p_opp_table,
						struct opp_table *opp_table,
						unsigned long freq)
{
	struct dev_pm_opp *opp = NULL, *p_opp = NULL;

	if (!p_dev || !p_opp_table || !opp_table || !freq)
		return 0;

	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
	if (IS_ERR(p_opp))
		return 0;

	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
	dev_pm_opp_put(p_opp);

	if (IS_ERR(opp))
		return 0;

	freq = dev_pm_opp_get_freq(opp);
	dev_pm_opp_put(opp);

	return freq;
}

static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
					unsigned long *target_freq)
{
	struct devfreq_passive_data *p_data =
				(struct devfreq_passive_data *)devfreq->data;
	struct devfreq_cpu_data *cpu_data;
	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
	unsigned long dev_min, dev_max;
	unsigned long freq = 0;

	for_each_online_cpu(cpu) {
		cpu_data = p_data->cpu_data[cpu];
		if (!cpu_data || cpu_data->first_cpu != cpu)
			continue;

		/* Get target freq via required opps */
		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
		freq = get_taget_freq_by_required_opp(cpu_data->dev,
					cpu_data->opp_table,
					devfreq->opp_table, cpu_cur);
		if (freq) {
			*target_freq = max(freq, *target_freq);
			continue;
		}

		/* Use Interpolation if required opps is not available */
		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);

		cpu_min = cpu_data->min_freq;
		cpu_max = cpu_data->max_freq;
		cpu_cur = cpu_data->cur_freq;

		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);

		*target_freq = max(freq, *target_freq);
	}

	return 0;
}

> +
> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>  					unsigned long *freq)
>  {
>  	struct devfreq_passive_data *p_data
> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	int i, count;
>  
>  	/*
> -	 * If the devfreq device with passive governor has the specific method
> -	 * to determine the next frequency, should use the get_target_freq()
> -	 * of struct devfreq_passive_data.
> -	 */
> -	if (p_data->get_target_freq)
> -		return p_data->get_target_freq(devfreq, freq);
> -
> -	/*
>  	 * If the parent and passive devfreq device uses the OPP table,
>  	 * get the next frequency by using the OPP table.
>  	 */
> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	return 0;
>  }
>  
> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +					   unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +		(struct devfreq_passive_data *)devfreq->data;
> +	int ret;
> +
> +	/*
> +	 * If the devfreq device with passive governor has the specific method
> +	 * to determine the next frequency, should use the get_target_freq()
> +	 * of struct devfreq_passive_data.
> +	 */
> +	if (p_data->get_target_freq)
> +		return p_data->get_target_freq(devfreq, freq);
> +
> +	switch (p_data->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_devfreq(devfreq, freq);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		dev_err(&devfreq->dev, "Invalid parent type\n");
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  				unsigned long event, void *ptr)
>  {
> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> +					 unsigned long event, void *ptr)
> +{
> +	struct devfreq_passive_data *data =
> +			container_of(nb, struct devfreq_passive_data, nb);
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_freqs *cpu_freq = ptr;

Use 'freqs' variable name.  I prefer to use the same variable name
for both devfreq_freqs and cpufreq_freqs instance.

> +	unsigned int curr_freq;

As I commented above, better to use 'cur_frq' instead of 'curr_freq'
if there is no any special reason.

> +	int ret;
> +
> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> +	    !data->cpu_state[cpu_freq->policy->cpu])
> +		return 0;
> +
> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> +	if (cpu_state->curr_freq == cpu_freq->new)
> +		return 0;
> +
> +	/* Backup current freq and pre-update cpu state freq*/

I think that this commnet is not critial. So, please drop this comment.

> +	curr_freq = cpu_state->curr_freq;
> +	cpu_state->curr_freq = cpu_freq->new;
> +
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
as following:
	devfreq_update_target(devfreq, freqs->new);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret) {
> +		cpu_state->curr_freq = curr_freq;
> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)

In order to keep the consistent style of function name,
please change the name as following because devfreq defines
the function name as 'devfreq_regiter_notifier'
- cpufreq_passive_register -> cpufreq_passive_register_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct device *dev = devfreq->dev.parent;
> +	struct opp_table *opp_table = NULL;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_policy *policy;
> +	struct device *cpu_dev;
> +	unsigned int cpu;
> +	int ret;
> +
> +	get_online_cpus();
> +
> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> +	ret = cpufreq_register_notifier(&data->nb,
> +					CPUFREQ_TRANSITION_NOTIFIER);
> +	if (ret) {
> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
> +		data->nb.notifier_call = NULL;
> +		goto out;
> +	}
> +
> +	/* Populate devfreq_cpu_state */

Don't need this comment. Please drop it.

> +	for_each_online_cpu(cpu) {
> +		if (data->cpu_state[cpu])
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (!policy) {
> +			ret = -EINVAL;
> +			goto out;
> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> +			ret = -EPROBE_DEFER;
> +			goto out;
> +		} else if (IS_ERR(policy)) {
> +			ret = PTR_ERR(policy);
> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> +			goto out;
> +		}

Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
It make code more simple.

> +
> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> +		if (!cpu_state) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		cpu_dev = get_cpu_device(cpu);
> +		if (!cpu_dev) {
> +			dev_err(dev, "Couldn't get cpu device.\n");
> +			ret = -ENODEV;
> +			goto out;
> +		}
> +
> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> +		if (IS_ERR(devfreq->opp_table)) {
> +			ret = PTR_ERR(opp_table);
> +			goto out;
> +		}
> +
> +		cpu_state->cpu_dev = cpu_dev;
> +		cpu_state->opp_table = opp_table;
> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> +		cpu_state->curr_freq = policy->cur;
> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
> +		data->cpu_state[cpu] = cpu_state;
> +
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +out:
> +	put_online_cpus();
> +	if (ret)
> +		return ret;
> +
> +	/* Update devfreq */
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret)
> +		dev_err(dev, "Couldn't update the frequency.\n");
> +
> +	return ret;
> +}
> +
> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)

As I commented above, please change the name as following:
- cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq_cpu_state *cpu_state;
> +	int cpu;
> +
> +	if (data->nb.notifier_call)
> +		cpufreq_unregister_notifier(&data->nb,
> +					    CPUFREQ_TRANSITION_NOTIFIER);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_state = data->cpu_state[cpu];
> +		if (cpu_state) {
> +			if (cpu_state->opp_table)
> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
> +			kfree(cpu_state);
> +			cpu_state = NULL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	struct notifier_block *nb = &(*p_data)->nb;
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		nb->notifier_call = devfreq_passive_notifier_call;
> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> +						DEVFREQ_TRANSITION_NOTIFIER);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = cpufreq_passive_register(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> +						    &(*p_data)->nb,
> +						    DEVFREQ_TRANSITION_NOTIFIER));
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		cpufreq_passive_unregister(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}

I think that you don't need to define register_parent_dev_notifier
and unregister_parent_dev_notifier as the separate functions.

Instead of the separate functions, just add the code
into devfreq_passive_event_handler.


> +
>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  				unsigned int event, void *data)
>  {
>  	struct devfreq_passive_data *p_data
>  			= (struct devfreq_passive_data *)devfreq->data;
>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
> -	struct notifier_block *nb = &p_data->nb;
>  	int ret = 0;
>  
> -	if (!parent)
> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>  		return -EPROBE_DEFER;
>  
>  	switch (event) {
> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  		if (!p_data->this)
>  			p_data->this = devfreq;
>  
> -		nb->notifier_call = devfreq_passive_notifier_call;
> -		ret = devfreq_register_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER);
> +		ret = register_parent_dev_notifier(&p_data);
>  		break;
> +
>  	case DEVFREQ_GOV_STOP:
> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER));
> +		ret = unregister_parent_dev_notifier(&p_data);
>  		break;
>  	default:
>  		break;
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 26ea0850be9b..e0093b7c805c 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>  
>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>  /**
> + * struct devfreq_cpu_state - holds the per-cpu state
> + * @freq:	the current frequency of the cpu.
> + * @min_freq:	the min frequency of the cpu.
> + * @max_freq:	the max frequency of the cpu.
> + * @first_cpu:	the cpumask of the first cpu of a policy.
> + * @dev:	reference to cpu device.
> + * @opp_table:	reference to cpu opp table.
> + *
> + * This structure stores the required cpu_state of a cpu.
> + * This is auto-populated by the governor.
> + */
> +struct devfreq_cpu_state;
> +
> +enum devfreq_parent_dev_type {
> +	DEVFREQ_PARENT_DEV,
> +	CPUFREQ_PARENT_DEV,
> +};
> +
> +/**
>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>   *	and devfreq_add_device
>   * @parent:	the devfreq instance of parent device.
> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>   *			using governors except for passive governor.
>   *			If the devfreq device has the specific method to decide
>   *			the next frequency, should use this callback.
> + * @parent_type:	parent type of the device
>   * @this:	the devfreq instance of own device.
>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>   *
>   * The devfreq_passive_data have to set the devfreq instance of parent
>   * device with governors except for the passive governor. But, don't need to
> - * initialize the 'this' and 'nb' field because the devfreq core will handle
> - * them.
> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> + * will handle them.
>   */
>  struct devfreq_passive_data {
>  	/* Should set the devfreq instance of parent device */
> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>  	/* Optional callback to decide the next frequency of passvice device */
>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>  
> +	/* Should set the type of parent device */
> +	enum devfreq_parent_dev_type parent_type;
> +
>  	/* For passive governor's internal use. Don't need to set them */
>  	struct devfreq *this;
>  	struct notifier_block nb;
> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>  };
>  #endif
>  
>
andrew-sh.cheng March 31, 2021, 8:03 a.m. UTC | #3
On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
> Hi,
> 
> You are missing to add these patches to linux-pm mailing list.
> Need to send them to linu-pm ML.
> 
> Also, before received this series, I tried to clean-up these patches
> on testing branch[1]. So that I add my comment with my clean-up case.
> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
> 
> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
> Please update the email or drop this email.

Hi Chanwoo,

Thank you for the advices.
I will resend patch v9 (add to linux-pm ML), remove this patch, and note
that my patch set base on
https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov


> 
> 
> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> > From: Saravana Kannan <skannan@codeaurora.org>
> >
> > Many CPU architectures have caches that can scale independent of the
> > CPUs. Frequency scaling of the caches is necessary to make sure that the
> > cache is not a performance bottleneck that leads to poor performance and
> > power. The same idea applies for RAM/DDR.
> >
> > To achieve this, this patch adds support for cpu based scaling to the
> > passive governor. This is accomplished by taking the current frequency
> > of each CPU frequency domain and then adjust the frequency of the cache
> > (or any devfreq device) based on the frequency of the CPUs. It listens
> > to CPU frequency transition notifiers to keep itself up to date on the
> > current CPU frequency.
> >
> > To decide the frequency of the device, the governor does one of the
> > following:
> > * Derives the optimal devfreq device opp from required-opps property of
> >   the parent cpu opp_table.
> >
> > * Scales the device frequency in proportion to the CPU frequency. So, if
> >   the CPUs are running at their max frequency, the device runs at its
> >   max frequency. If the CPUs are running at their min frequency, the
> >   device runs at its min frequency. It is interpolated for frequencies
> >   in between.
> >
> > Andrew-sh.Cheng change
> > dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> > to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> > after kernel-5.7
> > Don't return -EINVAL in devfreq_passive_event_handler()
> > since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
> >
> > Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> > [Sibi: Integrated cpu-freqmap governor into passive_governor]
> > Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> > Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> > ---
> >  drivers/devfreq/Kconfig            |   2 +
> >  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
> >  include/linux/devfreq.h            |  29 +++-
> >  3 files changed, 342 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> > index 00704efe6398..f56132b0ae64 100644
> > --- a/drivers/devfreq/Kconfig
> > +++ b/drivers/devfreq/Kconfig
> > @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
> >  	  device. This governor does not change the frequency by itself
> >  	  through sysfs entries. The passive governor recommends that
> >  	  devfreq device uses the OPP table to get the frequency/voltage.
> > +	  Alternatively the governor can also be chosen to scale based on
> > +	  the online CPUs current frequency.
> >  
> >  comment "DEVFREQ Drivers"
> >  
> > diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> > index b094132bd20b..9cc57b083839 100644
> > --- a/drivers/devfreq/governor_passive.c
> > +++ b/drivers/devfreq/governor_passive.c
> > @@ -8,11 +8,103 @@
> >   */
> >  
> >  #include <linux/module.h>
> > +#include <linux/cpu.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/cpumask.h>
> >  #include <linux/device.h>
> >  #include <linux/devfreq.h>
> > +#include <linux/slab.h>
> >  #include "governor.h"
> >  
> > -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> > +struct devfreq_cpu_state {
> > +	unsigned int curr_freq;
> > +	unsigned int min_freq;
> > +	unsigned int max_freq;
> > +	unsigned int first_cpu;
> > +	struct device *cpu_dev;
> > +	struct opp_table *opp_table;
> > +};
> 
> As I knew, the previous version has the description of structure
> as following:  I wan to add the description like below.
> 
> And if you have no any objection, I'd like you to order
> the variables as following and use 'dev' instead of 'cpu_dev'
> because this patch use the 'cpu_state->cpu_dev' at the multiple points.
> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
> because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
> 
> /**                                                                             
>  * struct devfreq_cpu_state - Hold the per-cpu data                              
>  * @dev:        reference to cpu device.                                        
>  * @first_cpu:  the cpumask of the first cpu of a policy.                       
>  * @opp_table:  reference to cpu opp table.                                     
>  * @cur_freq:   the current frequency of the cpu.                               
>  * @min_freq:   the min frequency of the cpu.                                   
>  * @max_freq:   the max frequency of the cpu.                                   
>  *                                                                              
>  * This structure stores the required cpu_data of a cpu.                        
>  * This is auto-populated by the governor.                                      
>  */                                                                             
> struct devfreq_cpu_state {                                                       
>          struct device *dev;                                                     
>          unsigned int first_cpu;                                                 
> 
>          struct opp_table *opp_table;                                            
>          unsigned int cur_freq;                                                  
>          unsigned int min_freq;                                                  
>          unsigned int max_freq;                                                  
> };               
> 
> 
> > +
> > +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> > +					      unsigned int cpu)
> > +{
> > +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> > +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> > +
> > +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> > +	struct devfreq *devfreq = (struct devfreq *)data->this;
> > +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
> > +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> > +	unsigned long cpu_curr_freq, freq;
> > +
> > +	if (!cpu_state || cpu_state->first_cpu != cpu ||
> > +	    !cpu_state->opp_table || !devfreq->opp_table)
> > +		return 0;
> > +
> > +	cpu_curr_freq = cpu_state->curr_freq * 1000;
> > +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> > +	if (IS_ERR(p_opp))
> > +		return 0;
> > +
> > +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> > +					    devfreq->opp_table, p_opp);
> > +	dev_pm_opp_put(p_opp);
> > +
> > +	if (!IS_ERR(opp)) {
> > +		freq = dev_pm_opp_get_freq(opp);
> > +		dev_pm_opp_put(opp);
> > +		goto out;
> > +	}
> > +
> > +	/* Use Interpolation if required opps is not available */
> > +	cpu_min_freq = cpu_state->min_freq;
> > +	cpu_max_freq = cpu_state->max_freq;
> > +	cpu_curr_freq_khz = cpu_state->curr_freq;
> > +
> > +	if (dev_freq_table) {
> > +		/* Get minimum frequency according to sorting order */
> > +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> > +		if (dev_freq_table[0] < dev_max_state) {
> > +			dev_min_freq = dev_freq_table[0];
> > +			dev_max_freq = dev_max_state;
> > +		} else {
> > +			dev_min_freq = dev_max_state;
> > +			dev_max_freq = dev_freq_table[0];
> > +		}
> > +	} else {
> > +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> > +						     DEV_PM_QOS_MIN_FREQUENCY);
> > +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> > +						     DEV_PM_QOS_MAX_FREQUENCY);
> > +
> > +		if (dev_max_freq <= dev_min_freq)
> > +			return 0;
> > +	}
> > +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
> > +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> > +
> > +out:
> > +	return freq;
> > +}
> > +
> > +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> > +					unsigned long *freq)
> > +{
> > +	struct devfreq_passive_data *p_data =
> > +				(struct devfreq_passive_data *)devfreq->data;
> > +	unsigned int cpu;
> > +	unsigned long target_freq = 0;
> > +
> > +	for_each_online_cpu(cpu)
> > +		target_freq = max(target_freq,
> > +				  xlate_cpufreq_to_devfreq(p_data, cpu));
> > +
> > +	*freq = target_freq;
> > +
> > +	return 0;
> > +}
> 
> As you knew, governor_passive.c was already used 
> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
> to get the target from OPP. So, I wan to make the common function
> like 'get_taget_freq_by_required_opp' as following:
> If define 'get_taget_freq_by_required_opp' as following,
> it will be used for get_target_freq_with_devfreq().
> After finisied the review of this patch, I'll send the patch[2].
> [2] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgA6mp3Yqo$ 
> 
> For example but this code is not tested,
> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
> 						struct opp_table *p_opp_table,
> 						struct opp_table *opp_table,
> 						unsigned long freq)
> {
> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> 
> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
> 		return 0;
> 
> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
> 	if (IS_ERR(p_opp))
> 		return 0;
> 
> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
> 	dev_pm_opp_put(p_opp);
> 
> 	if (IS_ERR(opp))
> 		return 0;
> 
> 	freq = dev_pm_opp_get_freq(opp);
> 	dev_pm_opp_put(opp);
> 
> 	return freq;
> }
> 
> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> 					unsigned long *target_freq)
> {
> 	struct devfreq_passive_data *p_data =
> 				(struct devfreq_passive_data *)devfreq->data;
> 	struct devfreq_cpu_data *cpu_data;
> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
> 	unsigned long dev_min, dev_max;
> 	unsigned long freq = 0;
> 
> 	for_each_online_cpu(cpu) {
> 		cpu_data = p_data->cpu_data[cpu];
> 		if (!cpu_data || cpu_data->first_cpu != cpu)
> 			continue;
> 
> 		/* Get target freq via required opps */
> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
> 					cpu_data->opp_table,
> 					devfreq->opp_table, cpu_cur);
> 		if (freq) {
> 			*target_freq = max(freq, *target_freq);
> 			continue;
> 		}
> 
> 		/* Use Interpolation if required opps is not available */
> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
> 
> 		cpu_min = cpu_data->min_freq;
> 		cpu_max = cpu_data->max_freq;
> 		cpu_cur = cpu_data->cur_freq;
> 
> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
> 
> 		*target_freq = max(freq, *target_freq);
> 	}
> 
> 	return 0;
> }
> 
> > +
> > +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
> >  					unsigned long *freq)
> >  {
> >  	struct devfreq_passive_data *p_data
> > @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >  	int i, count;
> >  
> >  	/*
> > -	 * If the devfreq device with passive governor has the specific method
> > -	 * to determine the next frequency, should use the get_target_freq()
> > -	 * of struct devfreq_passive_data.
> > -	 */
> > -	if (p_data->get_target_freq)
> > -		return p_data->get_target_freq(devfreq, freq);
> > -
> > -	/*
> >  	 * If the parent and passive devfreq device uses the OPP table,
> >  	 * get the next frequency by using the OPP table.
> >  	 */
> > @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >  	return 0;
> >  }
> >  
> > +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> > +					   unsigned long *freq)
> > +{
> > +	struct devfreq_passive_data *p_data =
> > +		(struct devfreq_passive_data *)devfreq->data;
> > +	int ret;
> > +
> > +	/*
> > +	 * If the devfreq device with passive governor has the specific method
> > +	 * to determine the next frequency, should use the get_target_freq()
> > +	 * of struct devfreq_passive_data.
> > +	 */
> > +	if (p_data->get_target_freq)
> > +		return p_data->get_target_freq(devfreq, freq);
> > +
> > +	switch (p_data->parent_type) {
> > +	case DEVFREQ_PARENT_DEV:
> > +		ret = get_target_freq_with_devfreq(devfreq, freq);
> > +		break;
> > +	case CPUFREQ_PARENT_DEV:
> > +		ret = get_target_freq_with_cpufreq(devfreq, freq);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		dev_err(&devfreq->dev, "Invalid parent type\n");
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >  static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >  				unsigned long event, void *ptr)
> >  {
> > @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >  	return NOTIFY_DONE;
> >  }
> >  
> > +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> > +					 unsigned long event, void *ptr)
> > +{
> > +	struct devfreq_passive_data *data =
> > +			container_of(nb, struct devfreq_passive_data, nb);
> > +	struct devfreq *devfreq = (struct devfreq *)data->this;
> > +	struct devfreq_cpu_state *cpu_state;
> > +	struct cpufreq_freqs *cpu_freq = ptr;
> 
> Use 'freqs' variable name.  I prefer to use the same variable name
> for both devfreq_freqs and cpufreq_freqs instance.
> 
> > +	unsigned int curr_freq;
> 
> As I commented above, better to use 'cur_frq' instead of 'curr_freq'
> if there is no any special reason.
> 
> > +	int ret;
> > +
> > +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> > +	    !data->cpu_state[cpu_freq->policy->cpu])
> > +		return 0;
> > +
> > +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> > +	if (cpu_state->curr_freq == cpu_freq->new)
> > +		return 0;
> > +
> > +	/* Backup current freq and pre-update cpu state freq*/
> 
> I think that this commnet is not critial. So, please drop this comment.
> 
> > +	curr_freq = cpu_state->curr_freq;
> > +	cpu_state->curr_freq = cpu_freq->new;
> > +
> > +	mutex_lock(&devfreq->lock);
> > +	ret = update_devfreq(devfreq);
> 
> I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
> as following:
> 	devfreq_update_target(devfreq, freqs->new);
> 
> > +	mutex_unlock(&devfreq->lock);
> > +	if (ret) {
> > +		cpu_state->curr_freq = curr_freq;
> > +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
> 
> In order to keep the consistent style of function name,
> please change the name as following because devfreq defines
> the function name as 'devfreq_regiter_notifier'
> - cpufreq_passive_register -> cpufreq_passive_register_notifier
> 
> > +{
> > +	struct devfreq_passive_data *data = *p_data;
> > +	struct devfreq *devfreq = (struct devfreq *)data->this;
> > +	struct device *dev = devfreq->dev.parent;
> > +	struct opp_table *opp_table = NULL;
> > +	struct devfreq_cpu_state *cpu_state;
> > +	struct cpufreq_policy *policy;
> > +	struct device *cpu_dev;
> > +	unsigned int cpu;
> > +	int ret;
> > +
> > +	get_online_cpus();
> > +
> > +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> > +	ret = cpufreq_register_notifier(&data->nb,
> > +					CPUFREQ_TRANSITION_NOTIFIER);
> > +	if (ret) {
> > +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
> > +		data->nb.notifier_call = NULL;
> > +		goto out;
> > +	}
> > +
> > +	/* Populate devfreq_cpu_state */
> 
> Don't need this comment. Please drop it.
> 
> > +	for_each_online_cpu(cpu) {
> > +		if (data->cpu_state[cpu])
> > +			continue;
> > +
> > +		policy = cpufreq_cpu_get(cpu);
> > +		if (!policy) {
> > +			ret = -EINVAL;
> > +			goto out;
> > +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> > +			ret = -EPROBE_DEFER;
> > +			goto out;
> > +		} else if (IS_ERR(policy)) {
> > +			ret = PTR_ERR(policy);
> > +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> > +			goto out;
> > +		}
> 
> Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
> It make code more simple.
> 
> > +
> > +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> > +		if (!cpu_state) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +
> > +		cpu_dev = get_cpu_device(cpu);
> > +		if (!cpu_dev) {
> > +			dev_err(dev, "Couldn't get cpu device.\n");
> > +			ret = -ENODEV;
> > +			goto out;
> > +		}
> > +
> > +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> > +		if (IS_ERR(devfreq->opp_table)) {
> > +			ret = PTR_ERR(opp_table);
> > +			goto out;
> > +		}
> > +
> > +		cpu_state->cpu_dev = cpu_dev;
> > +		cpu_state->opp_table = opp_table;
> > +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> > +		cpu_state->curr_freq = policy->cur;
> > +		cpu_state->min_freq = policy->cpuinfo.min_freq;
> > +		cpu_state->max_freq = policy->cpuinfo.max_freq;
> > +		data->cpu_state[cpu] = cpu_state;
> > +
> > +		cpufreq_cpu_put(policy);
> > +	}
> > +
> > +out:
> > +	put_online_cpus();
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Update devfreq */
> > +	mutex_lock(&devfreq->lock);
> > +	ret = update_devfreq(devfreq);
> 
> > +	mutex_unlock(&devfreq->lock);
> > +	if (ret)
> > +		dev_err(dev, "Couldn't update the frequency.\n");
> > +
> > +	return ret;
> > +}
> > +
> > +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
> 
> As I commented above, please change the name as following:
> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
> 
> > +{
> > +	struct devfreq_passive_data *data = *p_data;
> > +	struct devfreq_cpu_state *cpu_state;
> > +	int cpu;
> > +
> > +	if (data->nb.notifier_call)
> > +		cpufreq_unregister_notifier(&data->nb,
> > +					    CPUFREQ_TRANSITION_NOTIFIER);
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		cpu_state = data->cpu_state[cpu];
> > +		if (cpu_state) {
> > +			if (cpu_state->opp_table)
> > +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
> > +			kfree(cpu_state);
> > +			cpu_state = NULL;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> > +{
> > +	struct notifier_block *nb = &(*p_data)->nb;
> > +	int ret = 0;
> > +
> > +	switch ((*p_data)->parent_type) {
> > +	case DEVFREQ_PARENT_DEV:
> > +		nb->notifier_call = devfreq_passive_notifier_call;
> > +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> > +						DEVFREQ_TRANSITION_NOTIFIER);
> > +		break;
> > +	case CPUFREQ_PARENT_DEV:
> > +		ret = cpufreq_passive_register(p_data);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		break;
> > +	}
> > +	return ret;
> > +}
> > +
> > +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> > +{
> > +	int ret = 0;
> > +
> > +	switch ((*p_data)->parent_type) {
> > +	case DEVFREQ_PARENT_DEV:
> > +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> > +						    &(*p_data)->nb,
> > +						    DEVFREQ_TRANSITION_NOTIFIER));
> > +		break;
> > +	case CPUFREQ_PARENT_DEV:
> > +		cpufreq_passive_unregister(p_data);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		break;
> > +	}
> > +	return ret;
> > +}
> 
> I think that you don't need to define register_parent_dev_notifier
> and unregister_parent_dev_notifier as the separate functions.
> 
> Instead of the separate functions, just add the code
> into devfreq_passive_event_handler.
> 
> 
> > +
> >  static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >  				unsigned int event, void *data)
> >  {
> >  	struct devfreq_passive_data *p_data
> >  			= (struct devfreq_passive_data *)devfreq->data;
> >  	struct devfreq *parent = (struct devfreq *)p_data->parent;
> > -	struct notifier_block *nb = &p_data->nb;
> >  	int ret = 0;
> >  
> > -	if (!parent)
> > +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
> >  		return -EPROBE_DEFER;
> >  
> >  	switch (event) {
> > @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >  		if (!p_data->this)
> >  			p_data->this = devfreq;
> >  
> > -		nb->notifier_call = devfreq_passive_notifier_call;
> > -		ret = devfreq_register_notifier(parent, nb,
> > -					DEVFREQ_TRANSITION_NOTIFIER);
> > +		ret = register_parent_dev_notifier(&p_data);
> >  		break;
> > +
> >  	case DEVFREQ_GOV_STOP:
> > -		WARN_ON(devfreq_unregister_notifier(parent, nb,
> > -					DEVFREQ_TRANSITION_NOTIFIER));
> > +		ret = unregister_parent_dev_notifier(&p_data);
> >  		break;
> >  	default:
> >  		break;
> > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> > index 26ea0850be9b..e0093b7c805c 100644
> > --- a/include/linux/devfreq.h
> > +++ b/include/linux/devfreq.h
> > @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
> >  
> >  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
> >  /**
> > + * struct devfreq_cpu_state - holds the per-cpu state
> > + * @freq:	the current frequency of the cpu.
> > + * @min_freq:	the min frequency of the cpu.
> > + * @max_freq:	the max frequency of the cpu.
> > + * @first_cpu:	the cpumask of the first cpu of a policy.
> > + * @dev:	reference to cpu device.
> > + * @opp_table:	reference to cpu opp table.
> > + *
> > + * This structure stores the required cpu_state of a cpu.
> > + * This is auto-populated by the governor.
> > + */
> > +struct devfreq_cpu_state;
> > +
> > +enum devfreq_parent_dev_type {
> > +	DEVFREQ_PARENT_DEV,
> > +	CPUFREQ_PARENT_DEV,
> > +};
> > +
> > +/**
> >   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
> >   *	and devfreq_add_device
> >   * @parent:	the devfreq instance of parent device.
> > @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
> >   *			using governors except for passive governor.
> >   *			If the devfreq device has the specific method to decide
> >   *			the next frequency, should use this callback.
> > + * @parent_type:	parent type of the device
> >   * @this:	the devfreq instance of own device.
> >   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> > + * @cpu_state:		the state min/max/current frequency of all online cpu's
> >   *
> >   * The devfreq_passive_data have to set the devfreq instance of parent
> >   * device with governors except for the passive governor. But, don't need to
> > - * initialize the 'this' and 'nb' field because the devfreq core will handle
> > - * them.
> > + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> > + * will handle them.
> >   */
> >  struct devfreq_passive_data {
> >  	/* Should set the devfreq instance of parent device */
> > @@ -305,9 +326,13 @@ struct devfreq_passive_data {
> >  	/* Optional callback to decide the next frequency of passvice device */
> >  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> >  
> > +	/* Should set the type of parent device */
> > +	enum devfreq_parent_dev_type parent_type;
> > +
> >  	/* For passive governor's internal use. Don't need to set them */
> >  	struct devfreq *this;
> >  	struct notifier_block nb;
> > +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
> >  };
> >  #endif
> >  
> >
>
Chanwoo Choi March 31, 2021, 8:27 a.m. UTC | #4
Hi,

On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>> Hi,
>>
>> You are missing to add these patches to linux-pm mailing list.
>> Need to send them to linu-pm ML.
>>
>> Also, before received this series, I tried to clean-up these patches
>> on testing branch[1]. So that I add my comment with my clean-up case.
>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
>>
>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>> Please update the email or drop this email.
> 
> Hi Chanwoo,
> 
> Thank you for the advices.
> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
> that my patch set base on
> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov

I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
So that if possible, I'd like you to test your patches with this patch[1] 
and then if there is no problem, could you send the next patches with patch[1]?

[1]https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454

> 
> 
>>
>>
>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>
>>> Many CPU architectures have caches that can scale independent of the
>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
>>> cache is not a performance bottleneck that leads to poor performance and
>>> power. The same idea applies for RAM/DDR.
>>>
>>> To achieve this, this patch adds support for cpu based scaling to the
>>> passive governor. This is accomplished by taking the current frequency
>>> of each CPU frequency domain and then adjust the frequency of the cache
>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>> to CPU frequency transition notifiers to keep itself up to date on the
>>> current CPU frequency.
>>>
>>> To decide the frequency of the device, the governor does one of the
>>> following:
>>> * Derives the optimal devfreq device opp from required-opps property of
>>>   the parent cpu opp_table.
>>>
>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>   the CPUs are running at their max frequency, the device runs at its
>>>   max frequency. If the CPUs are running at their min frequency, the
>>>   device runs at its min frequency. It is interpolated for frequencies
>>>   in between.
>>>
>>> Andrew-sh.Cheng change
>>> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
>>> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
>>> after kernel-5.7
>>> Don't return -EINVAL in devfreq_passive_event_handler()
>>> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
>>>
>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
>>> ---
>>>  drivers/devfreq/Kconfig            |   2 +
>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>>>  include/linux/devfreq.h            |  29 +++-
>>>  3 files changed, 342 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>> index 00704efe6398..f56132b0ae64 100644
>>> --- a/drivers/devfreq/Kconfig
>>> +++ b/drivers/devfreq/Kconfig
>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>>>  	  device. This governor does not change the frequency by itself
>>>  	  through sysfs entries. The passive governor recommends that
>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>>> +	  Alternatively the governor can also be chosen to scale based on
>>> +	  the online CPUs current frequency.
>>>  
>>>  comment "DEVFREQ Drivers"
>>>  
>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>> index b094132bd20b..9cc57b083839 100644
>>> --- a/drivers/devfreq/governor_passive.c
>>> +++ b/drivers/devfreq/governor_passive.c
>>> @@ -8,11 +8,103 @@
>>>   */
>>>  
>>>  #include <linux/module.h>
>>> +#include <linux/cpu.h>
>>> +#include <linux/cpufreq.h>
>>> +#include <linux/cpumask.h>
>>>  #include <linux/device.h>
>>>  #include <linux/devfreq.h>
>>> +#include <linux/slab.h>
>>>  #include "governor.h"
>>>  
>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> +struct devfreq_cpu_state {
>>> +	unsigned int curr_freq;
>>> +	unsigned int min_freq;
>>> +	unsigned int max_freq;
>>> +	unsigned int first_cpu;
>>> +	struct device *cpu_dev;
>>> +	struct opp_table *opp_table;
>>> +};
>>
>> As I knew, the previous version has the description of structure
>> as following:  I wan to add the description like below.
>>
>> And if you have no any objection, I'd like you to order
>> the variables as following and use 'dev' instead of 'cpu_dev'
>> because this patch use the 'cpu_state->cpu_dev' at the multiple points.
>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
>> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
>> because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
>>
>> /**                                                                             
>>  * struct devfreq_cpu_state - Hold the per-cpu data                              
>>  * @dev:        reference to cpu device.                                        
>>  * @first_cpu:  the cpumask of the first cpu of a policy.                       
>>  * @opp_table:  reference to cpu opp table.                                     
>>  * @cur_freq:   the current frequency of the cpu.                               
>>  * @min_freq:   the min frequency of the cpu.                                   
>>  * @max_freq:   the max frequency of the cpu.                                   
>>  *                                                                              
>>  * This structure stores the required cpu_data of a cpu.                        
>>  * This is auto-populated by the governor.                                      
>>  */                                                                             
>> struct devfreq_cpu_state {                                                       
>>          struct device *dev;                                                     
>>          unsigned int first_cpu;                                                 
>>
>>          struct opp_table *opp_table;                                            
>>          unsigned int cur_freq;                                                  
>>          unsigned int min_freq;                                                  
>>          unsigned int max_freq;                                                  
>> };               
>>
>>
>>> +
>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>> +					      unsigned int cpu)
>>> +{
>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
>>> +
>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>> +	unsigned long cpu_curr_freq, freq;
>>> +
>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
>>> +		return 0;
>>> +
>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
>>> +	if (IS_ERR(p_opp))
>>> +		return 0;
>>> +
>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
>>> +					    devfreq->opp_table, p_opp);
>>> +	dev_pm_opp_put(p_opp);
>>> +
>>> +	if (!IS_ERR(opp)) {
>>> +		freq = dev_pm_opp_get_freq(opp);
>>> +		dev_pm_opp_put(opp);
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Use Interpolation if required opps is not available */
>>> +	cpu_min_freq = cpu_state->min_freq;
>>> +	cpu_max_freq = cpu_state->max_freq;
>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
>>> +
>>> +	if (dev_freq_table) {
>>> +		/* Get minimum frequency according to sorting order */
>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
>>> +		if (dev_freq_table[0] < dev_max_state) {
>>> +			dev_min_freq = dev_freq_table[0];
>>> +			dev_max_freq = dev_max_state;
>>> +		} else {
>>> +			dev_min_freq = dev_max_state;
>>> +			dev_max_freq = dev_freq_table[0];
>>> +		}
>>> +	} else {
>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
>>> +
>>> +		if (dev_max_freq <= dev_min_freq)
>>> +			return 0;
>>> +	}
>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
>>> +
>>> +out:
>>> +	return freq;
>>> +}
>>> +
>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>> +					unsigned long *freq)
>>> +{
>>> +	struct devfreq_passive_data *p_data =
>>> +				(struct devfreq_passive_data *)devfreq->data;
>>> +	unsigned int cpu;
>>> +	unsigned long target_freq = 0;
>>> +
>>> +	for_each_online_cpu(cpu)
>>> +		target_freq = max(target_freq,
>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
>>> +
>>> +	*freq = target_freq;
>>> +
>>> +	return 0;
>>> +}
>>
>> As you knew, governor_passive.c was already used 
>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
>> to get the target from OPP. So, I wan to make the common function
>> like 'get_taget_freq_by_required_opp' as following:
>> If define 'get_taget_freq_by_required_opp' as following,
>> it will be used for get_target_freq_with_devfreq().
>> After finisied the review of this patch, I'll send the patch[2].
>> [2] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgA6mp3Yqo$ 
>>
>> For example but this code is not tested,
>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>> 						struct opp_table *p_opp_table,
>> 						struct opp_table *opp_table,
>> 						unsigned long freq)
>> {
>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>
>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
>> 		return 0;
>>
>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>> 	if (IS_ERR(p_opp))
>> 		return 0;
>>
>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>> 	dev_pm_opp_put(p_opp);
>>
>> 	if (IS_ERR(opp))
>> 		return 0;
>>
>> 	freq = dev_pm_opp_get_freq(opp);
>> 	dev_pm_opp_put(opp);
>>
>> 	return freq;
>> }
>>
>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>> 					unsigned long *target_freq)
>> {
>> 	struct devfreq_passive_data *p_data =
>> 				(struct devfreq_passive_data *)devfreq->data;
>> 	struct devfreq_cpu_data *cpu_data;
>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>> 	unsigned long dev_min, dev_max;
>> 	unsigned long freq = 0;
>>
>> 	for_each_online_cpu(cpu) {
>> 		cpu_data = p_data->cpu_data[cpu];
>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
>> 			continue;
>>
>> 		/* Get target freq via required opps */
>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>> 					cpu_data->opp_table,
>> 					devfreq->opp_table, cpu_cur);
>> 		if (freq) {
>> 			*target_freq = max(freq, *target_freq);
>> 			continue;
>> 		}
>>
>> 		/* Use Interpolation if required opps is not available */
>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>>
>> 		cpu_min = cpu_data->min_freq;
>> 		cpu_max = cpu_data->max_freq;
>> 		cpu_cur = cpu_data->cur_freq;
>>
>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>
>> 		*target_freq = max(freq, *target_freq);
>> 	}
>>
>> 	return 0;
>> }
>>
>>> +
>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>>>  					unsigned long *freq)
>>>  {
>>>  	struct devfreq_passive_data *p_data
>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>  	int i, count;
>>>  
>>>  	/*
>>> -	 * If the devfreq device with passive governor has the specific method
>>> -	 * to determine the next frequency, should use the get_target_freq()
>>> -	 * of struct devfreq_passive_data.
>>> -	 */
>>> -	if (p_data->get_target_freq)
>>> -		return p_data->get_target_freq(devfreq, freq);
>>> -
>>> -	/*
>>>  	 * If the parent and passive devfreq device uses the OPP table,
>>>  	 * get the next frequency by using the OPP table.
>>>  	 */
>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>  	return 0;
>>>  }
>>>  
>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> +					   unsigned long *freq)
>>> +{
>>> +	struct devfreq_passive_data *p_data =
>>> +		(struct devfreq_passive_data *)devfreq->data;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * If the devfreq device with passive governor has the specific method
>>> +	 * to determine the next frequency, should use the get_target_freq()
>>> +	 * of struct devfreq_passive_data.
>>> +	 */
>>> +	if (p_data->get_target_freq)
>>> +		return p_data->get_target_freq(devfreq, freq);
>>> +
>>> +	switch (p_data->parent_type) {
>>> +	case DEVFREQ_PARENT_DEV:
>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
>>> +		break;
>>> +	case CPUFREQ_PARENT_DEV:
>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
>>> +		break;
>>> +	default:
>>> +		ret = -EINVAL;
>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
>>> +		break;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>  				unsigned long event, void *ptr)
>>>  {
>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>  	return NOTIFY_DONE;
>>>  }
>>>  
>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>> +					 unsigned long event, void *ptr)
>>> +{
>>> +	struct devfreq_passive_data *data =
>>> +			container_of(nb, struct devfreq_passive_data, nb);
>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +	struct devfreq_cpu_state *cpu_state;
>>> +	struct cpufreq_freqs *cpu_freq = ptr;
>>
>> Use 'freqs' variable name.  I prefer to use the same variable name
>> for both devfreq_freqs and cpufreq_freqs instance.
>>
>>> +	unsigned int curr_freq;
>>
>> As I commented above, better to use 'cur_frq' instead of 'curr_freq'
>> if there is no any special reason.
>>
>>> +	int ret;
>>> +
>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
>>> +		return 0;
>>> +
>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
>>> +	if (cpu_state->curr_freq == cpu_freq->new)
>>> +		return 0;
>>> +
>>> +	/* Backup current freq and pre-update cpu state freq*/
>>
>> I think that this commnet is not critial. So, please drop this comment.
>>
>>> +	curr_freq = cpu_state->curr_freq;
>>> +	cpu_state->curr_freq = cpu_freq->new;
>>> +
>>> +	mutex_lock(&devfreq->lock);
>>> +	ret = update_devfreq(devfreq);
>>
>> I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
>> as following:
>> 	devfreq_update_target(devfreq, freqs->new);
>>
>>> +	mutex_unlock(&devfreq->lock);
>>> +	if (ret) {
>>> +		cpu_state->curr_freq = curr_freq;
>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
>>> +		return ret;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>
>> In order to keep the consistent style of function name,
>> please change the name as following because devfreq defines
>> the function name as 'devfreq_regiter_notifier'
>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>>
>>> +{
>>> +	struct devfreq_passive_data *data = *p_data;
>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +	struct device *dev = devfreq->dev.parent;
>>> +	struct opp_table *opp_table = NULL;
>>> +	struct devfreq_cpu_state *cpu_state;
>>> +	struct cpufreq_policy *policy;
>>> +	struct device *cpu_dev;
>>> +	unsigned int cpu;
>>> +	int ret;
>>> +
>>> +	get_online_cpus();
>>> +
>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>>> +	ret = cpufreq_register_notifier(&data->nb,
>>> +					CPUFREQ_TRANSITION_NOTIFIER);
>>> +	if (ret) {
>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
>>> +		data->nb.notifier_call = NULL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Populate devfreq_cpu_state */
>>
>> Don't need this comment. Please drop it.
>>
>>> +	for_each_online_cpu(cpu) {
>>> +		if (data->cpu_state[cpu])
>>> +			continue;
>>> +
>>> +		policy = cpufreq_cpu_get(cpu);
>>> +		if (!policy) {
>>> +			ret = -EINVAL;
>>> +			goto out;
>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
>>> +			ret = -EPROBE_DEFER;
>>> +			goto out;
>>> +		} else if (IS_ERR(policy)) {
>>> +			ret = PTR_ERR(policy);
>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
>>> +			goto out;
>>> +		}
>>
>> Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
>> It make code more simple.
>>
>>> +
>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
>>> +		if (!cpu_state) {
>>> +			ret = -ENOMEM;
>>> +			goto out;
>>> +		}
>>> +
>>> +		cpu_dev = get_cpu_device(cpu);
>>> +		if (!cpu_dev) {
>>> +			dev_err(dev, "Couldn't get cpu device.\n");
>>> +			ret = -ENODEV;
>>> +			goto out;
>>> +		}
>>> +
>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
>>> +		if (IS_ERR(devfreq->opp_table)) {
>>> +			ret = PTR_ERR(opp_table);
>>> +			goto out;
>>> +		}
>>> +
>>> +		cpu_state->cpu_dev = cpu_dev;
>>> +		cpu_state->opp_table = opp_table;
>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
>>> +		cpu_state->curr_freq = policy->cur;
>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
>>> +		data->cpu_state[cpu] = cpu_state;
>>> +
>>> +		cpufreq_cpu_put(policy);
>>> +	}
>>> +
>>> +out:
>>> +	put_online_cpus();
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* Update devfreq */
>>> +	mutex_lock(&devfreq->lock);
>>> +	ret = update_devfreq(devfreq);
>>
>>> +	mutex_unlock(&devfreq->lock);
>>> +	if (ret)
>>> +		dev_err(dev, "Couldn't update the frequency.\n");
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>
>> As I commented above, please change the name as following:
>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>>
>>> +{
>>> +	struct devfreq_passive_data *data = *p_data;
>>> +	struct devfreq_cpu_state *cpu_state;
>>> +	int cpu;
>>> +
>>> +	if (data->nb.notifier_call)
>>> +		cpufreq_unregister_notifier(&data->nb,
>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
>>> +
>>> +	for_each_possible_cpu(cpu) {
>>> +		cpu_state = data->cpu_state[cpu];
>>> +		if (cpu_state) {
>>> +			if (cpu_state->opp_table)
>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
>>> +			kfree(cpu_state);
>>> +			cpu_state = NULL;
>>> +		}
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>> +{
>>> +	struct notifier_block *nb = &(*p_data)->nb;
>>> +	int ret = 0;
>>> +
>>> +	switch ((*p_data)->parent_type) {
>>> +	case DEVFREQ_PARENT_DEV:
>>> +		nb->notifier_call = devfreq_passive_notifier_call;
>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
>>> +						DEVFREQ_TRANSITION_NOTIFIER);
>>> +		break;
>>> +	case CPUFREQ_PARENT_DEV:
>>> +		ret = cpufreq_passive_register(p_data);
>>> +		break;
>>> +	default:
>>> +		ret = -EINVAL;
>>> +		break;
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	switch ((*p_data)->parent_type) {
>>> +	case DEVFREQ_PARENT_DEV:
>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
>>> +						    &(*p_data)->nb,
>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
>>> +		break;
>>> +	case CPUFREQ_PARENT_DEV:
>>> +		cpufreq_passive_unregister(p_data);
>>> +		break;
>>> +	default:
>>> +		ret = -EINVAL;
>>> +		break;
>>> +	}
>>> +	return ret;
>>> +}
>>
>> I think that you don't need to define register_parent_dev_notifier
>> and unregister_parent_dev_notifier as the separate functions.
>>
>> Instead of the separate functions, just add the code
>> into devfreq_passive_event_handler.
>>
>>
>>> +
>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>  				unsigned int event, void *data)
>>>  {
>>>  	struct devfreq_passive_data *p_data
>>>  			= (struct devfreq_passive_data *)devfreq->data;
>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
>>> -	struct notifier_block *nb = &p_data->nb;
>>>  	int ret = 0;
>>>  
>>> -	if (!parent)
>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>>>  		return -EPROBE_DEFER;
>>>  
>>>  	switch (event) {
>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>  		if (!p_data->this)
>>>  			p_data->this = devfreq;
>>>  
>>> -		nb->notifier_call = devfreq_passive_notifier_call;
>>> -		ret = devfreq_register_notifier(parent, nb,
>>> -					DEVFREQ_TRANSITION_NOTIFIER);
>>> +		ret = register_parent_dev_notifier(&p_data);
>>>  		break;
>>> +
>>>  	case DEVFREQ_GOV_STOP:
>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
>>> -					DEVFREQ_TRANSITION_NOTIFIER));
>>> +		ret = unregister_parent_dev_notifier(&p_data);
>>>  		break;
>>>  	default:
>>>  		break;
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index 26ea0850be9b..e0093b7c805c 100644
>>> --- a/include/linux/devfreq.h
>>> +++ b/include/linux/devfreq.h
>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>>>  
>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>>  /**
>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>> + * @freq:	the current frequency of the cpu.
>>> + * @min_freq:	the min frequency of the cpu.
>>> + * @max_freq:	the max frequency of the cpu.
>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
>>> + * @dev:	reference to cpu device.
>>> + * @opp_table:	reference to cpu opp table.
>>> + *
>>> + * This structure stores the required cpu_state of a cpu.
>>> + * This is auto-populated by the governor.
>>> + */
>>> +struct devfreq_cpu_state;
>>> +
>>> +enum devfreq_parent_dev_type {
>>> +	DEVFREQ_PARENT_DEV,
>>> +	CPUFREQ_PARENT_DEV,
>>> +};
>>> +
>>> +/**
>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>>>   *	and devfreq_add_device
>>>   * @parent:	the devfreq instance of parent device.
>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>>>   *			using governors except for passive governor.
>>>   *			If the devfreq device has the specific method to decide
>>>   *			the next frequency, should use this callback.
>>> + * @parent_type:	parent type of the device
>>>   * @this:	the devfreq instance of own device.
>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>>>   *
>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>   * device with governors except for the passive governor. But, don't need to
>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>> - * them.
>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
>>> + * will handle them.
>>>   */
>>>  struct devfreq_passive_data {
>>>  	/* Should set the devfreq instance of parent device */
>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>>>  	/* Optional callback to decide the next frequency of passvice device */
>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>  
>>> +	/* Should set the type of parent device */
>>> +	enum devfreq_parent_dev_type parent_type;
>>> +
>>>  	/* For passive governor's internal use. Don't need to set them */
>>>  	struct devfreq *this;
>>>  	struct notifier_block nb;
>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>>>  };
>>>  #endif
>>>  
>>>
>>
>
Chanwoo Choi March 31, 2021, 8:35 a.m. UTC | #5
On 3/31/21 5:27 PM, Chanwoo Choi wrote:
> Hi,
> 
> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>> Hi,
>>>
>>> You are missing to add these patches to linux-pm mailing list.
>>> Need to send them to linu-pm ML.
>>>
>>> Also, before received this series, I tried to clean-up these patches
>>> on testing branch[1]. So that I add my comment with my clean-up case.
>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
>>>
>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>>> Please update the email or drop this email.
>>
>> Hi Chanwoo,
>>
>> Thank you for the advices.
>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>> that my patch set base on
>> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
> 
> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
> So that if possible, I'd like you to test your patches with this patch[1] 
> and then if there is no problem, could you send the next patches with patch[1]?
> 
> [1]https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454


Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
branch based on latest devfreq-next branch.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov

First of all, if possible, I want to test them[1] with your patches in this series.
And then if there are no any problem, please let me know. After confirmed from you,
I'll send the patches of devfreq-testing-passive-gov[1] branch.
How about that?


> 
>>
>>
>>>
>>>
>>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
>>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>>
>>>> Many CPU architectures have caches that can scale independent of the
>>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
>>>> cache is not a performance bottleneck that leads to poor performance and
>>>> power. The same idea applies for RAM/DDR.
>>>>
>>>> To achieve this, this patch adds support for cpu based scaling to the
>>>> passive governor. This is accomplished by taking the current frequency
>>>> of each CPU frequency domain and then adjust the frequency of the cache
>>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>>> to CPU frequency transition notifiers to keep itself up to date on the
>>>> current CPU frequency.
>>>>
>>>> To decide the frequency of the device, the governor does one of the
>>>> following:
>>>> * Derives the optimal devfreq device opp from required-opps property of
>>>>   the parent cpu opp_table.
>>>>
>>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>>   the CPUs are running at their max frequency, the device runs at its
>>>>   max frequency. If the CPUs are running at their min frequency, the
>>>>   device runs at its min frequency. It is interpolated for frequencies
>>>>   in between.
>>>>
>>>> Andrew-sh.Cheng change
>>>> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
>>>> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
>>>> after kernel-5.7
>>>> Don't return -EINVAL in devfreq_passive_event_handler()
>>>> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
>>>>
>>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
>>>> ---
>>>>  drivers/devfreq/Kconfig            |   2 +
>>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>>>>  include/linux/devfreq.h            |  29 +++-
>>>>  3 files changed, 342 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>>> index 00704efe6398..f56132b0ae64 100644
>>>> --- a/drivers/devfreq/Kconfig
>>>> +++ b/drivers/devfreq/Kconfig
>>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>>>>  	  device. This governor does not change the frequency by itself
>>>>  	  through sysfs entries. The passive governor recommends that
>>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>>>> +	  Alternatively the governor can also be chosen to scale based on
>>>> +	  the online CPUs current frequency.
>>>>  
>>>>  comment "DEVFREQ Drivers"
>>>>  
>>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>>> index b094132bd20b..9cc57b083839 100644
>>>> --- a/drivers/devfreq/governor_passive.c
>>>> +++ b/drivers/devfreq/governor_passive.c
>>>> @@ -8,11 +8,103 @@
>>>>   */
>>>>  
>>>>  #include <linux/module.h>
>>>> +#include <linux/cpu.h>
>>>> +#include <linux/cpufreq.h>
>>>> +#include <linux/cpumask.h>
>>>>  #include <linux/device.h>
>>>>  #include <linux/devfreq.h>
>>>> +#include <linux/slab.h>
>>>>  #include "governor.h"
>>>>  
>>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>> +struct devfreq_cpu_state {
>>>> +	unsigned int curr_freq;
>>>> +	unsigned int min_freq;
>>>> +	unsigned int max_freq;
>>>> +	unsigned int first_cpu;
>>>> +	struct device *cpu_dev;
>>>> +	struct opp_table *opp_table;
>>>> +};
>>>
>>> As I knew, the previous version has the description of structure
>>> as following:  I wan to add the description like below.
>>>
>>> And if you have no any objection, I'd like you to order
>>> the variables as following and use 'dev' instead of 'cpu_dev'
>>> because this patch use the 'cpu_state->cpu_dev' at the multiple points.
>>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
>>> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
>>> because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
>>>
>>> /**                                                                             
>>>  * struct devfreq_cpu_state - Hold the per-cpu data                              
>>>  * @dev:        reference to cpu device.                                        
>>>  * @first_cpu:  the cpumask of the first cpu of a policy.                       
>>>  * @opp_table:  reference to cpu opp table.                                     
>>>  * @cur_freq:   the current frequency of the cpu.                               
>>>  * @min_freq:   the min frequency of the cpu.                                   
>>>  * @max_freq:   the max frequency of the cpu.                                   
>>>  *                                                                              
>>>  * This structure stores the required cpu_data of a cpu.                        
>>>  * This is auto-populated by the governor.                                      
>>>  */                                                                             
>>> struct devfreq_cpu_state {                                                       
>>>          struct device *dev;                                                     
>>>          unsigned int first_cpu;                                                 
>>>
>>>          struct opp_table *opp_table;                                            
>>>          unsigned int cur_freq;                                                  
>>>          unsigned int min_freq;                                                  
>>>          unsigned int max_freq;                                                  
>>> };               
>>>
>>>
>>>> +
>>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>>> +					      unsigned int cpu)
>>>> +{
>>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
>>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
>>>> +
>>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
>>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>> +	unsigned long cpu_curr_freq, freq;
>>>> +
>>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
>>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
>>>> +		return 0;
>>>> +
>>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
>>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
>>>> +	if (IS_ERR(p_opp))
>>>> +		return 0;
>>>> +
>>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
>>>> +					    devfreq->opp_table, p_opp);
>>>> +	dev_pm_opp_put(p_opp);
>>>> +
>>>> +	if (!IS_ERR(opp)) {
>>>> +		freq = dev_pm_opp_get_freq(opp);
>>>> +		dev_pm_opp_put(opp);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* Use Interpolation if required opps is not available */
>>>> +	cpu_min_freq = cpu_state->min_freq;
>>>> +	cpu_max_freq = cpu_state->max_freq;
>>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
>>>> +
>>>> +	if (dev_freq_table) {
>>>> +		/* Get minimum frequency according to sorting order */
>>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
>>>> +		if (dev_freq_table[0] < dev_max_state) {
>>>> +			dev_min_freq = dev_freq_table[0];
>>>> +			dev_max_freq = dev_max_state;
>>>> +		} else {
>>>> +			dev_min_freq = dev_max_state;
>>>> +			dev_max_freq = dev_freq_table[0];
>>>> +		}
>>>> +	} else {
>>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
>>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
>>>> +
>>>> +		if (dev_max_freq <= dev_min_freq)
>>>> +			return 0;
>>>> +	}
>>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
>>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
>>>> +
>>>> +out:
>>>> +	return freq;
>>>> +}
>>>> +
>>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>>> +					unsigned long *freq)
>>>> +{
>>>> +	struct devfreq_passive_data *p_data =
>>>> +				(struct devfreq_passive_data *)devfreq->data;
>>>> +	unsigned int cpu;
>>>> +	unsigned long target_freq = 0;
>>>> +
>>>> +	for_each_online_cpu(cpu)
>>>> +		target_freq = max(target_freq,
>>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
>>>> +
>>>> +	*freq = target_freq;
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> As you knew, governor_passive.c was already used 
>>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
>>> to get the target from OPP. So, I wan to make the common function
>>> like 'get_taget_freq_by_required_opp' as following:
>>> If define 'get_taget_freq_by_required_opp' as following,
>>> it will be used for get_target_freq_with_devfreq().
>>> After finisied the review of this patch, I'll send the patch[2].
>>> [2] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgA6mp3Yqo$ 
>>>
>>> For example but this code is not tested,
>>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>>> 						struct opp_table *p_opp_table,
>>> 						struct opp_table *opp_table,
>>> 						unsigned long freq)
>>> {
>>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>
>>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
>>> 		return 0;
>>>
>>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>>> 	if (IS_ERR(p_opp))
>>> 		return 0;
>>>
>>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>>> 	dev_pm_opp_put(p_opp);
>>>
>>> 	if (IS_ERR(opp))
>>> 		return 0;
>>>
>>> 	freq = dev_pm_opp_get_freq(opp);
>>> 	dev_pm_opp_put(opp);
>>>
>>> 	return freq;
>>> }
>>>
>>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>> 					unsigned long *target_freq)
>>> {
>>> 	struct devfreq_passive_data *p_data =
>>> 				(struct devfreq_passive_data *)devfreq->data;
>>> 	struct devfreq_cpu_data *cpu_data;
>>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>>> 	unsigned long dev_min, dev_max;
>>> 	unsigned long freq = 0;
>>>
>>> 	for_each_online_cpu(cpu) {
>>> 		cpu_data = p_data->cpu_data[cpu];
>>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
>>> 			continue;
>>>
>>> 		/* Get target freq via required opps */
>>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>>> 					cpu_data->opp_table,
>>> 					devfreq->opp_table, cpu_cur);
>>> 		if (freq) {
>>> 			*target_freq = max(freq, *target_freq);
>>> 			continue;
>>> 		}
>>>
>>> 		/* Use Interpolation if required opps is not available */
>>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>>>
>>> 		cpu_min = cpu_data->min_freq;
>>> 		cpu_max = cpu_data->max_freq;
>>> 		cpu_cur = cpu_data->cur_freq;
>>>
>>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
>>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>>
>>> 		*target_freq = max(freq, *target_freq);
>>> 	}
>>>
>>> 	return 0;
>>> }
>>>
>>>> +
>>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>>>>  					unsigned long *freq)
>>>>  {
>>>>  	struct devfreq_passive_data *p_data
>>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>  	int i, count;
>>>>  
>>>>  	/*
>>>> -	 * If the devfreq device with passive governor has the specific method
>>>> -	 * to determine the next frequency, should use the get_target_freq()
>>>> -	 * of struct devfreq_passive_data.
>>>> -	 */
>>>> -	if (p_data->get_target_freq)
>>>> -		return p_data->get_target_freq(devfreq, freq);
>>>> -
>>>> -	/*
>>>>  	 * If the parent and passive devfreq device uses the OPP table,
>>>>  	 * get the next frequency by using the OPP table.
>>>>  	 */
>>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>> +					   unsigned long *freq)
>>>> +{
>>>> +	struct devfreq_passive_data *p_data =
>>>> +		(struct devfreq_passive_data *)devfreq->data;
>>>> +	int ret;
>>>> +
>>>> +	/*
>>>> +	 * If the devfreq device with passive governor has the specific method
>>>> +	 * to determine the next frequency, should use the get_target_freq()
>>>> +	 * of struct devfreq_passive_data.
>>>> +	 */
>>>> +	if (p_data->get_target_freq)
>>>> +		return p_data->get_target_freq(devfreq, freq);
>>>> +
>>>> +	switch (p_data->parent_type) {
>>>> +	case DEVFREQ_PARENT_DEV:
>>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
>>>> +		break;
>>>> +	case CPUFREQ_PARENT_DEV:
>>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
>>>> +		break;
>>>> +	default:
>>>> +		ret = -EINVAL;
>>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
>>>> +		break;
>>>> +	}
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>  				unsigned long event, void *ptr)
>>>>  {
>>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>  	return NOTIFY_DONE;
>>>>  }
>>>>  
>>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>>> +					 unsigned long event, void *ptr)
>>>> +{
>>>> +	struct devfreq_passive_data *data =
>>>> +			container_of(nb, struct devfreq_passive_data, nb);
>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>> +	struct devfreq_cpu_state *cpu_state;
>>>> +	struct cpufreq_freqs *cpu_freq = ptr;
>>>
>>> Use 'freqs' variable name.  I prefer to use the same variable name
>>> for both devfreq_freqs and cpufreq_freqs instance.
>>>
>>>> +	unsigned int curr_freq;
>>>
>>> As I commented above, better to use 'cur_frq' instead of 'curr_freq'
>>> if there is no any special reason.
>>>
>>>> +	int ret;
>>>> +
>>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
>>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
>>>> +		return 0;
>>>> +
>>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
>>>> +	if (cpu_state->curr_freq == cpu_freq->new)
>>>> +		return 0;
>>>> +
>>>> +	/* Backup current freq and pre-update cpu state freq*/
>>>
>>> I think that this commnet is not critial. So, please drop this comment.
>>>
>>>> +	curr_freq = cpu_state->curr_freq;
>>>> +	cpu_state->curr_freq = cpu_freq->new;
>>>> +
>>>> +	mutex_lock(&devfreq->lock);
>>>> +	ret = update_devfreq(devfreq);
>>>
>>> I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
>>> as following:
>>> 	devfreq_update_target(devfreq, freqs->new);
>>>
>>>> +	mutex_unlock(&devfreq->lock);
>>>> +	if (ret) {
>>>> +		cpu_state->curr_freq = curr_freq;
>>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>>
>>> In order to keep the consistent style of function name,
>>> please change the name as following because devfreq defines
>>> the function name as 'devfreq_regiter_notifier'
>>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>>>
>>>> +{
>>>> +	struct devfreq_passive_data *data = *p_data;
>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>> +	struct device *dev = devfreq->dev.parent;
>>>> +	struct opp_table *opp_table = NULL;
>>>> +	struct devfreq_cpu_state *cpu_state;
>>>> +	struct cpufreq_policy *policy;
>>>> +	struct device *cpu_dev;
>>>> +	unsigned int cpu;
>>>> +	int ret;
>>>> +
>>>> +	get_online_cpus();
>>>> +
>>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>>>> +	ret = cpufreq_register_notifier(&data->nb,
>>>> +					CPUFREQ_TRANSITION_NOTIFIER);
>>>> +	if (ret) {
>>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
>>>> +		data->nb.notifier_call = NULL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* Populate devfreq_cpu_state */
>>>
>>> Don't need this comment. Please drop it.
>>>
>>>> +	for_each_online_cpu(cpu) {
>>>> +		if (data->cpu_state[cpu])
>>>> +			continue;
>>>> +
>>>> +		policy = cpufreq_cpu_get(cpu);
>>>> +		if (!policy) {
>>>> +			ret = -EINVAL;
>>>> +			goto out;
>>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
>>>> +			ret = -EPROBE_DEFER;
>>>> +			goto out;
>>>> +		} else if (IS_ERR(policy)) {
>>>> +			ret = PTR_ERR(policy);
>>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
>>>> +			goto out;
>>>> +		}
>>>
>>> Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
>>> It make code more simple.
>>>
>>>> +
>>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
>>>> +		if (!cpu_state) {
>>>> +			ret = -ENOMEM;
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		cpu_dev = get_cpu_device(cpu);
>>>> +		if (!cpu_dev) {
>>>> +			dev_err(dev, "Couldn't get cpu device.\n");
>>>> +			ret = -ENODEV;
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
>>>> +		if (IS_ERR(devfreq->opp_table)) {
>>>> +			ret = PTR_ERR(opp_table);
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		cpu_state->cpu_dev = cpu_dev;
>>>> +		cpu_state->opp_table = opp_table;
>>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
>>>> +		cpu_state->curr_freq = policy->cur;
>>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
>>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
>>>> +		data->cpu_state[cpu] = cpu_state;
>>>> +
>>>> +		cpufreq_cpu_put(policy);
>>>> +	}
>>>> +
>>>> +out:
>>>> +	put_online_cpus();
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* Update devfreq */
>>>> +	mutex_lock(&devfreq->lock);
>>>> +	ret = update_devfreq(devfreq);
>>>
>>>> +	mutex_unlock(&devfreq->lock);
>>>> +	if (ret)
>>>> +		dev_err(dev, "Couldn't update the frequency.\n");
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>>
>>> As I commented above, please change the name as following:
>>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>>>
>>>> +{
>>>> +	struct devfreq_passive_data *data = *p_data;
>>>> +	struct devfreq_cpu_state *cpu_state;
>>>> +	int cpu;
>>>> +
>>>> +	if (data->nb.notifier_call)
>>>> +		cpufreq_unregister_notifier(&data->nb,
>>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
>>>> +
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		cpu_state = data->cpu_state[cpu];
>>>> +		if (cpu_state) {
>>>> +			if (cpu_state->opp_table)
>>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
>>>> +			kfree(cpu_state);
>>>> +			cpu_state = NULL;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>> +{
>>>> +	struct notifier_block *nb = &(*p_data)->nb;
>>>> +	int ret = 0;
>>>> +
>>>> +	switch ((*p_data)->parent_type) {
>>>> +	case DEVFREQ_PARENT_DEV:
>>>> +		nb->notifier_call = devfreq_passive_notifier_call;
>>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
>>>> +						DEVFREQ_TRANSITION_NOTIFIER);
>>>> +		break;
>>>> +	case CPUFREQ_PARENT_DEV:
>>>> +		ret = cpufreq_passive_register(p_data);
>>>> +		break;
>>>> +	default:
>>>> +		ret = -EINVAL;
>>>> +		break;
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>> +{
>>>> +	int ret = 0;
>>>> +
>>>> +	switch ((*p_data)->parent_type) {
>>>> +	case DEVFREQ_PARENT_DEV:
>>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
>>>> +						    &(*p_data)->nb,
>>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
>>>> +		break;
>>>> +	case CPUFREQ_PARENT_DEV:
>>>> +		cpufreq_passive_unregister(p_data);
>>>> +		break;
>>>> +	default:
>>>> +		ret = -EINVAL;
>>>> +		break;
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>
>>> I think that you don't need to define register_parent_dev_notifier
>>> and unregister_parent_dev_notifier as the separate functions.
>>>
>>> Instead of the separate functions, just add the code
>>> into devfreq_passive_event_handler.
>>>
>>>
>>>> +
>>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>  				unsigned int event, void *data)
>>>>  {
>>>>  	struct devfreq_passive_data *p_data
>>>>  			= (struct devfreq_passive_data *)devfreq->data;
>>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
>>>> -	struct notifier_block *nb = &p_data->nb;
>>>>  	int ret = 0;
>>>>  
>>>> -	if (!parent)
>>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>>>>  		return -EPROBE_DEFER;
>>>>  
>>>>  	switch (event) {
>>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>  		if (!p_data->this)
>>>>  			p_data->this = devfreq;
>>>>  
>>>> -		nb->notifier_call = devfreq_passive_notifier_call;
>>>> -		ret = devfreq_register_notifier(parent, nb,
>>>> -					DEVFREQ_TRANSITION_NOTIFIER);
>>>> +		ret = register_parent_dev_notifier(&p_data);
>>>>  		break;
>>>> +
>>>>  	case DEVFREQ_GOV_STOP:
>>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
>>>> -					DEVFREQ_TRANSITION_NOTIFIER));
>>>> +		ret = unregister_parent_dev_notifier(&p_data);
>>>>  		break;
>>>>  	default:
>>>>  		break;
>>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>>> index 26ea0850be9b..e0093b7c805c 100644
>>>> --- a/include/linux/devfreq.h
>>>> +++ b/include/linux/devfreq.h
>>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>>>>  
>>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>>>  /**
>>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>>> + * @freq:	the current frequency of the cpu.
>>>> + * @min_freq:	the min frequency of the cpu.
>>>> + * @max_freq:	the max frequency of the cpu.
>>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
>>>> + * @dev:	reference to cpu device.
>>>> + * @opp_table:	reference to cpu opp table.
>>>> + *
>>>> + * This structure stores the required cpu_state of a cpu.
>>>> + * This is auto-populated by the governor.
>>>> + */
>>>> +struct devfreq_cpu_state;
>>>> +
>>>> +enum devfreq_parent_dev_type {
>>>> +	DEVFREQ_PARENT_DEV,
>>>> +	CPUFREQ_PARENT_DEV,
>>>> +};
>>>> +
>>>> +/**
>>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>>>>   *	and devfreq_add_device
>>>>   * @parent:	the devfreq instance of parent device.
>>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>>>>   *			using governors except for passive governor.
>>>>   *			If the devfreq device has the specific method to decide
>>>>   *			the next frequency, should use this callback.
>>>> + * @parent_type:	parent type of the device
>>>>   * @this:	the devfreq instance of own device.
>>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>>>>   *
>>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>>   * device with governors except for the passive governor. But, don't need to
>>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>>> - * them.
>>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
>>>> + * will handle them.
>>>>   */
>>>>  struct devfreq_passive_data {
>>>>  	/* Should set the devfreq instance of parent device */
>>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>>>>  	/* Optional callback to decide the next frequency of passvice device */
>>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>>  
>>>> +	/* Should set the type of parent device */
>>>> +	enum devfreq_parent_dev_type parent_type;
>>>> +
>>>>  	/* For passive governor's internal use. Don't need to set them */
>>>>  	struct devfreq *this;
>>>>  	struct notifier_block nb;
>>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>>>>  };
>>>>  #endif
>>>>  
>>>>
>>>
>>
> 
>
Hsin-Yi Wang March 31, 2021, 10:46 a.m. UTC | #6
On Thu, Mar 25, 2021 at 3:58 PM Chanwoo Choi <cw00.choi@samsung.com> wrote:
>
> Hi,
>
> You are missing to add these patches to linux-pm mailing list.
> Need to send them to linu-pm ML.
>
> Also, before received this series, I tried to clean-up these patches
> on testing branch[1]. So that I add my comment with my clean-up case.
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>
> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
> Please update the email or drop this email.
>
>
> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> > From: Saravana Kannan <skannan@codeaurora.org>
> >
> > Many CPU architectures have caches that can scale independent of the
> > CPUs. Frequency scaling of the caches is necessary to make sure that the
> > cache is not a performance bottleneck that leads to poor performance and
> > power. The same idea applies for RAM/DDR.
> >
> > To achieve this, this patch adds support for cpu based scaling to the
> > passive governor. This is accomplished by taking the current frequency
> > of each CPU frequency domain and then adjust the frequency of the cache
> > (or any devfreq device) based on the frequency of the CPUs. It listens
> > to CPU frequency transition notifiers to keep itself up to date on the
> > current CPU frequency.
> >
> > To decide the frequency of the device, the governor does one of the
> > following:
> > * Derives the optimal devfreq device opp from required-opps property of
> >   the parent cpu opp_table.
> >
> > * Scales the device frequency in proportion to the CPU frequency. So, if
> >   the CPUs are running at their max frequency, the device runs at its
> >   max frequency. If the CPUs are running at their min frequency, the
> >   device runs at its min frequency. It is interpolated for frequencies
> >   in between.
> >
> > Andrew-sh.Cheng change
> > dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> > to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> > after kernel-5.7
> > Don't return -EINVAL in devfreq_passive_event_handler()
> > since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
> >
> > Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> > [Sibi: Integrated cpu-freqmap governor into passive_governor]
> > Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> > Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> > ---
> >  drivers/devfreq/Kconfig            |   2 +
> >  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
> >  include/linux/devfreq.h            |  29 +++-
> >  3 files changed, 342 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> > index 00704efe6398..f56132b0ae64 100644
> > --- a/drivers/devfreq/Kconfig
> > +++ b/drivers/devfreq/Kconfig
> > @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
> >         device. This governor does not change the frequency by itself
> >         through sysfs entries. The passive governor recommends that
> >         devfreq device uses the OPP table to get the frequency/voltage.
> > +       Alternatively the governor can also be chosen to scale based on
> > +       the online CPUs current frequency.
> >
> >  comment "DEVFREQ Drivers"
> >
> > diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> > index b094132bd20b..9cc57b083839 100644
> > --- a/drivers/devfreq/governor_passive.c
> > +++ b/drivers/devfreq/governor_passive.c
> > @@ -8,11 +8,103 @@
> >   */
> >
> >  #include <linux/module.h>
> > +#include <linux/cpu.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/cpumask.h>
> >  #include <linux/device.h>
> >  #include <linux/devfreq.h>
> > +#include <linux/slab.h>
> >  #include "governor.h"
> >
> > -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> > +struct devfreq_cpu_state {
> > +     unsigned int curr_freq;
> > +     unsigned int min_freq;
> > +     unsigned int max_freq;
> > +     unsigned int first_cpu;
> > +     struct device *cpu_dev;
> > +     struct opp_table *opp_table;
> > +};
>
> As I knew, the previous version has the description of structure
> as following:  I wan to add the description like below.
>
> And if you have no any objection, I'd like you to order
> the variables as following and use 'dev' instead of 'cpu_dev'
> because this patch use the 'cpu_state->cpu_dev' at the multiple points.
> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
> because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
>
> /**
>  * struct devfreq_cpu_state - Hold the per-cpu data
>  * @dev:        reference to cpu device.
>  * @first_cpu:  the cpumask of the first cpu of a policy.
>  * @opp_table:  reference to cpu opp table.
>  * @cur_freq:   the current frequency of the cpu.
>  * @min_freq:   the min frequency of the cpu.
>  * @max_freq:   the max frequency of the cpu.
>  *
>  * This structure stores the required cpu_data of a cpu.
>  * This is auto-populated by the governor.
>  */
> struct devfreq_cpu_state {
>          struct device *dev;
>          unsigned int first_cpu;
>
>          struct opp_table *opp_table;
>          unsigned int cur_freq;
>          unsigned int min_freq;
>          unsigned int max_freq;
> };
>
>
> > +
> > +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> > +                                           unsigned int cpu)
> > +{
> > +     unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> > +     unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> > +
> > +     struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> > +     struct devfreq *devfreq = (struct devfreq *)data->this;
> > +     unsigned long *dev_freq_table = devfreq->profile->freq_table;
> > +     struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> > +     unsigned long cpu_curr_freq, freq;
> > +
> > +     if (!cpu_state || cpu_state->first_cpu != cpu ||
> > +         !cpu_state->opp_table || !devfreq->opp_table)
> > +             return 0;
> > +
> > +     cpu_curr_freq = cpu_state->curr_freq * 1000;
> > +     p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> > +     if (IS_ERR(p_opp))
> > +             return 0;
> > +
> > +     opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> > +                                         devfreq->opp_table, p_opp);
> > +     dev_pm_opp_put(p_opp);
> > +
> > +     if (!IS_ERR(opp)) {
> > +             freq = dev_pm_opp_get_freq(opp);
> > +             dev_pm_opp_put(opp);
> > +             goto out;
> > +     }
> > +
> > +     /* Use Interpolation if required opps is not available */
> > +     cpu_min_freq = cpu_state->min_freq;
> > +     cpu_max_freq = cpu_state->max_freq;
> > +     cpu_curr_freq_khz = cpu_state->curr_freq;
> > +
> > +     if (dev_freq_table) {
> > +             /* Get minimum frequency according to sorting order */
> > +             dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> > +             if (dev_freq_table[0] < dev_max_state) {
> > +                     dev_min_freq = dev_freq_table[0];
> > +                     dev_max_freq = dev_max_state;
> > +             } else {
> > +                     dev_min_freq = dev_max_state;
> > +                     dev_max_freq = dev_freq_table[0];
> > +             }
> > +     } else {
> > +             dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> > +                                                  DEV_PM_QOS_MIN_FREQUENCY);
> > +             dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> > +                                                  DEV_PM_QOS_MAX_FREQUENCY);
> > +
> > +             if (dev_max_freq <= dev_min_freq)
> > +                     return 0;
> > +     }
> > +     cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;

() is missing for denominator?
cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) /
(cpu_max_freq - cpu_min_freq);


> > +     freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> > +
> > +out:
> > +     return freq;
> > +}
> > +
> > +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> > +                                     unsigned long *freq)
> > +{
> > +     struct devfreq_passive_data *p_data =
> > +                             (struct devfreq_passive_data *)devfreq->data;
> > +     unsigned int cpu;
> > +     unsigned long target_freq = 0;
> > +
> > +     for_each_online_cpu(cpu)
> > +             target_freq = max(target_freq,
> > +                               xlate_cpufreq_to_devfreq(p_data, cpu));
> > +
> > +     *freq = target_freq;
> > +
> > +     return 0;
> > +}
>
> As you knew, governor_passive.c was already used
> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
> to get the target from OPP. So, I wan to make the common function
> like 'get_taget_freq_by_required_opp' as following:
> If define 'get_taget_freq_by_required_opp' as following,
> it will be used for get_target_freq_with_devfreq().
> After finisied the review of this patch, I'll send the patch[2].
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf
>
> For example but this code is not tested,
> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>                                                 struct opp_table *p_opp_table,
>                                                 struct opp_table *opp_table,
>                                                 unsigned long freq)
> {
>         struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>
>         if (!p_dev || !p_opp_table || !opp_table || !freq)
>                 return 0;
>
>         p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>         if (IS_ERR(p_opp))
>                 return 0;
>
>         opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>         dev_pm_opp_put(p_opp);
>
>         if (IS_ERR(opp))
>                 return 0;
>
>         freq = dev_pm_opp_get_freq(opp);
>         dev_pm_opp_put(opp);
>
>         return freq;
> }
>
> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>                                         unsigned long *target_freq)
> {
>         struct devfreq_passive_data *p_data =
>                                 (struct devfreq_passive_data *)devfreq->data;
>         struct devfreq_cpu_data *cpu_data;
>         unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>         unsigned long dev_min, dev_max;
>         unsigned long freq = 0;
>
>         for_each_online_cpu(cpu) {
>                 cpu_data = p_data->cpu_data[cpu];
>                 if (!cpu_data || cpu_data->first_cpu != cpu)
>                         continue;
>
>                 /* Get target freq via required opps */
>                 cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>                 freq = get_taget_freq_by_required_opp(cpu_data->dev,
>                                         cpu_data->opp_table,
>                                         devfreq->opp_table, cpu_cur);
>                 if (freq) {
>                         *target_freq = max(freq, *target_freq);
>                         continue;
>                 }
>
>                 /* Use Interpolation if required opps is not available */
>                 devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>
>                 cpu_min = cpu_data->min_freq;
>                 cpu_max = cpu_data->max_freq;
>                 cpu_cur = cpu_data->cur_freq;
>
>                 cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
>                 freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>
>                 *target_freq = max(freq, *target_freq);
>         }
>
>         return 0;
> }
>
> > +
> > +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
> >                                       unsigned long *freq)
> >  {
> >       struct devfreq_passive_data *p_data
> > @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >       int i, count;
> >
> >       /*
> > -      * If the devfreq device with passive governor has the specific method
> > -      * to determine the next frequency, should use the get_target_freq()
> > -      * of struct devfreq_passive_data.
> > -      */
> > -     if (p_data->get_target_freq)
> > -             return p_data->get_target_freq(devfreq, freq);
> > -
> > -     /*
> >        * If the parent and passive devfreq device uses the OPP table,
> >        * get the next frequency by using the OPP table.
> >        */
> > @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >       return 0;
> >  }
> >
> > +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> > +                                        unsigned long *freq)
> > +{
> > +     struct devfreq_passive_data *p_data =
> > +             (struct devfreq_passive_data *)devfreq->data;
> > +     int ret;
> > +
> > +     /*
> > +      * If the devfreq device with passive governor has the specific method
> > +      * to determine the next frequency, should use the get_target_freq()
> > +      * of struct devfreq_passive_data.
> > +      */
> > +     if (p_data->get_target_freq)
> > +             return p_data->get_target_freq(devfreq, freq);
> > +
> > +     switch (p_data->parent_type) {
> > +     case DEVFREQ_PARENT_DEV:
> > +             ret = get_target_freq_with_devfreq(devfreq, freq);
> > +             break;
> > +     case CPUFREQ_PARENT_DEV:
> > +             ret = get_target_freq_with_cpufreq(devfreq, freq);
> > +             break;
> > +     default:
> > +             ret = -EINVAL;
> > +             dev_err(&devfreq->dev, "Invalid parent type\n");
> > +             break;
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> >  static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >                               unsigned long event, void *ptr)
> >  {
> > @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >       return NOTIFY_DONE;
> >  }
> >
> > +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> > +                                      unsigned long event, void *ptr)
> > +{
> > +     struct devfreq_passive_data *data =
> > +                     container_of(nb, struct devfreq_passive_data, nb);
> > +     struct devfreq *devfreq = (struct devfreq *)data->this;
> > +     struct devfreq_cpu_state *cpu_state;
> > +     struct cpufreq_freqs *cpu_freq = ptr;
>
> Use 'freqs' variable name.  I prefer to use the same variable name
> for both devfreq_freqs and cpufreq_freqs instance.
>
> > +     unsigned int curr_freq;
>
> As I commented above, better to use 'cur_frq' instead of 'curr_freq'
> if there is no any special reason.
>
> > +     int ret;
> > +
> > +     if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> > +         !data->cpu_state[cpu_freq->policy->cpu])
> > +             return 0;
> > +
> > +     cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> > +     if (cpu_state->curr_freq == cpu_freq->new)
> > +             return 0;
> > +
> > +     /* Backup current freq and pre-update cpu state freq*/
>
> I think that this commnet is not critial. So, please drop this comment.
>
> > +     curr_freq = cpu_state->curr_freq;
> > +     cpu_state->curr_freq = cpu_freq->new;
> > +
> > +     mutex_lock(&devfreq->lock);
> > +     ret = update_devfreq(devfreq);
>
> I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
> as following:
>         devfreq_update_target(devfreq, freqs->new);
>
> > +     mutex_unlock(&devfreq->lock);
> > +     if (ret) {
> > +             cpu_state->curr_freq = curr_freq;
> > +             dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> > +             return ret;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>
> In order to keep the consistent style of function name,
> please change the name as following because devfreq defines
> the function name as 'devfreq_regiter_notifier'
> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>
> > +{
> > +     struct devfreq_passive_data *data = *p_data;
> > +     struct devfreq *devfreq = (struct devfreq *)data->this;
> > +     struct device *dev = devfreq->dev.parent;
> > +     struct opp_table *opp_table = NULL;
> > +     struct devfreq_cpu_state *cpu_state;
> > +     struct cpufreq_policy *policy;
> > +     struct device *cpu_dev;
> > +     unsigned int cpu;
> > +     int ret;
> > +
> > +     get_online_cpus();
> > +
> > +     data->nb.notifier_call = cpufreq_passive_notifier_call;
> > +     ret = cpufreq_register_notifier(&data->nb,
> > +                                     CPUFREQ_TRANSITION_NOTIFIER);
> > +     if (ret) {
> > +             dev_err(dev, "Couldn't register cpufreq notifier.\n");
> > +             data->nb.notifier_call = NULL;
> > +             goto out;
> > +     }
> > +
> > +     /* Populate devfreq_cpu_state */
>
> Don't need this comment. Please drop it.
>
> > +     for_each_online_cpu(cpu) {
> > +             if (data->cpu_state[cpu])
> > +                     continue;
> > +
> > +             policy = cpufreq_cpu_get(cpu);
> > +             if (!policy) {
> > +                     ret = -EINVAL;
> > +                     goto out;
> > +             } else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> > +                     ret = -EPROBE_DEFER;
> > +                     goto out;
> > +             } else if (IS_ERR(policy)) {
> > +                     ret = PTR_ERR(policy);
> > +                     dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> > +                     goto out;
> > +             }
>
> Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
> It make code more simple.
>
> > +
> > +             cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> > +             if (!cpu_state) {
> > +                     ret = -ENOMEM;
> > +                     goto out;
> > +             }
> > +
> > +             cpu_dev = get_cpu_device(cpu);
> > +             if (!cpu_dev) {
> > +                     dev_err(dev, "Couldn't get cpu device.\n");
> > +                     ret = -ENODEV;
> > +                     goto out;
> > +             }
> > +
> > +             opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> > +             if (IS_ERR(devfreq->opp_table)) {
> > +                     ret = PTR_ERR(opp_table);
> > +                     goto out;
> > +             }
> > +
> > +             cpu_state->cpu_dev = cpu_dev;
> > +             cpu_state->opp_table = opp_table;
> > +             cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> > +             cpu_state->curr_freq = policy->cur;
> > +             cpu_state->min_freq = policy->cpuinfo.min_freq;
> > +             cpu_state->max_freq = policy->cpuinfo.max_freq;
> > +             data->cpu_state[cpu] = cpu_state;
> > +
> > +             cpufreq_cpu_put(policy);
> > +     }
> > +
> > +out:
> > +     put_online_cpus();
> > +     if (ret)
> > +             return ret;
> > +
> > +     /* Update devfreq */
> > +     mutex_lock(&devfreq->lock);
> > +     ret = update_devfreq(devfreq);
>
> > +     mutex_unlock(&devfreq->lock);
> > +     if (ret)
> > +             dev_err(dev, "Couldn't update the frequency.\n");
> > +
> > +     return ret;
> > +}
> > +
> > +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>
> As I commented above, please change the name as following:
> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>
> > +{
> > +     struct devfreq_passive_data *data = *p_data;
> > +     struct devfreq_cpu_state *cpu_state;
> > +     int cpu;
> > +
> > +     if (data->nb.notifier_call)
> > +             cpufreq_unregister_notifier(&data->nb,
> > +                                         CPUFREQ_TRANSITION_NOTIFIER);
> > +
> > +     for_each_possible_cpu(cpu) {
> > +             cpu_state = data->cpu_state[cpu];
> > +             if (cpu_state) {
> > +                     if (cpu_state->opp_table)
> > +                             dev_pm_opp_put_opp_table(cpu_state->opp_table);
> > +                     kfree(cpu_state);
> > +                     cpu_state = NULL;
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> > +{
> > +     struct notifier_block *nb = &(*p_data)->nb;
> > +     int ret = 0;
> > +
> > +     switch ((*p_data)->parent_type) {
> > +     case DEVFREQ_PARENT_DEV:
> > +             nb->notifier_call = devfreq_passive_notifier_call;
> > +             ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> > +                                             DEVFREQ_TRANSITION_NOTIFIER);
> > +             break;
> > +     case CPUFREQ_PARENT_DEV:
> > +             ret = cpufreq_passive_register(p_data);
> > +             break;
> > +     default:
> > +             ret = -EINVAL;
> > +             break;
> > +     }
> > +     return ret;
> > +}
> > +
> > +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> > +{
> > +     int ret = 0;
> > +
> > +     switch ((*p_data)->parent_type) {
> > +     case DEVFREQ_PARENT_DEV:
> > +             WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> > +                                                 &(*p_data)->nb,
> > +                                                 DEVFREQ_TRANSITION_NOTIFIER));
> > +             break;
> > +     case CPUFREQ_PARENT_DEV:
> > +             cpufreq_passive_unregister(p_data);
> > +             break;
> > +     default:
> > +             ret = -EINVAL;
> > +             break;
> > +     }
> > +     return ret;
> > +}
>
> I think that you don't need to define register_parent_dev_notifier
> and unregister_parent_dev_notifier as the separate functions.
>
> Instead of the separate functions, just add the code
> into devfreq_passive_event_handler.
>
>
> > +
> >  static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >                               unsigned int event, void *data)
> >  {
> >       struct devfreq_passive_data *p_data
> >                       = (struct devfreq_passive_data *)devfreq->data;
> >       struct devfreq *parent = (struct devfreq *)p_data->parent;
> > -     struct notifier_block *nb = &p_data->nb;
> >       int ret = 0;
> >
> > -     if (!parent)
> > +     if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
> >               return -EPROBE_DEFER;
> >
> >       switch (event) {
> > @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >               if (!p_data->this)
> >                       p_data->this = devfreq;
> >
> > -             nb->notifier_call = devfreq_passive_notifier_call;
> > -             ret = devfreq_register_notifier(parent, nb,
> > -                                     DEVFREQ_TRANSITION_NOTIFIER);
> > +             ret = register_parent_dev_notifier(&p_data);
> >               break;
> > +
> >       case DEVFREQ_GOV_STOP:
> > -             WARN_ON(devfreq_unregister_notifier(parent, nb,
> > -                                     DEVFREQ_TRANSITION_NOTIFIER));
> > +             ret = unregister_parent_dev_notifier(&p_data);
> >               break;
> >       default:
> >               break;
> > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> > index 26ea0850be9b..e0093b7c805c 100644
> > --- a/include/linux/devfreq.h
> > +++ b/include/linux/devfreq.h
> > @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
> >
> >  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
> >  /**
> > + * struct devfreq_cpu_state - holds the per-cpu state
> > + * @freq:    the current frequency of the cpu.
> > + * @min_freq:        the min frequency of the cpu.
> > + * @max_freq:        the max frequency of the cpu.
> > + * @first_cpu:       the cpumask of the first cpu of a policy.
> > + * @dev:     reference to cpu device.
> > + * @opp_table:       reference to cpu opp table.
> > + *
> > + * This structure stores the required cpu_state of a cpu.
> > + * This is auto-populated by the governor.
> > + */
> > +struct devfreq_cpu_state;
> > +
> > +enum devfreq_parent_dev_type {
> > +     DEVFREQ_PARENT_DEV,
> > +     CPUFREQ_PARENT_DEV,
> > +};
> > +
> > +/**
> >   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
> >   *   and devfreq_add_device
> >   * @parent:  the devfreq instance of parent device.
> > @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
> >   *                   using governors except for passive governor.
> >   *                   If the devfreq device has the specific method to decide
> >   *                   the next frequency, should use this callback.
> > + * @parent_type:     parent type of the device
> >   * @this:    the devfreq instance of own device.
> >   * @nb:              the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> > + * @cpu_state:               the state min/max/current frequency of all online cpu's
> >   *
> >   * The devfreq_passive_data have to set the devfreq instance of parent
> >   * device with governors except for the passive governor. But, don't need to
> > - * initialize the 'this' and 'nb' field because the devfreq core will handle
> > - * them.
> > + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> > + * will handle them.
> >   */
> >  struct devfreq_passive_data {
> >       /* Should set the devfreq instance of parent device */
> > @@ -305,9 +326,13 @@ struct devfreq_passive_data {
> >       /* Optional callback to decide the next frequency of passvice device */
> >       int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> >
> > +     /* Should set the type of parent device */
> > +     enum devfreq_parent_dev_type parent_type;
> > +
> >       /* For passive governor's internal use. Don't need to set them */
> >       struct devfreq *this;
> >       struct notifier_block nb;
> > +     struct devfreq_cpu_state *cpu_state[NR_CPUS];
> >  };
> >  #endif
> >
> >
>
andrew-sh.cheng March 31, 2021, 1:03 p.m. UTC | #7
On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
> > Hi,
> > 
> > On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
> >> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
> >>> Hi,
> >>>
> >>> You are missing to add these patches to linux-pm mailing list.
> >>> Need to send them to linu-pm ML.
> >>>
> >>> Also, before received this series, I tried to clean-up these patches
> >>> on testing branch[1]. So that I add my comment with my clean-up case.
> >>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
> >>>
> >>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
> >>> Please update the email or drop this email.
> >>
> >> Hi Chanwoo,
> >>
> >> Thank you for the advices.
> >> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
> >> that my patch set base on
> >> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> > 
> > I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
> > So that if possible, I'd like you to test your patches with this patch[1] 
> > and then if there is no problem, could you send the next patches with patch[1]?
> > 
> > [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
> 
> 
> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
> branch based on latest devfreq-next branch.
> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> 
> First of all, if possible, I want to test them[1] with your patches in this series.
> And then if there are no any problem, please let me know. After confirmed from you,
> I'll send the patches of devfreq-testing-passive-gov[1] branch.
> How about that?
> 
Hi Chanwoo~

We will use this on Google Chrome project.
Google Hsin-Yi has test your patch + my patch set v8 [2~8]

    make sure cci devfreqs runs with cpufreq.
    suspend resume
    speedometer2 benchmark
It is okay.

Please send the patches of devfreq-testing-passive-gov[1] branch.

I will send patch v9 base on yours latter.


> 
> > 
> >>
> >>
> >>>
> >>>
> >>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> >>>> From: Saravana Kannan <skannan@codeaurora.org>
> >>>>
> >>>> Many CPU architectures have caches that can scale independent of the
> >>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
> >>>> cache is not a performance bottleneck that leads to poor performance and
> >>>> power. The same idea applies for RAM/DDR.
> >>>>
> >>>> To achieve this, this patch adds support for cpu based scaling to the
> >>>> passive governor. This is accomplished by taking the current frequency
> >>>> of each CPU frequency domain and then adjust the frequency of the cache
> >>>> (or any devfreq device) based on the frequency of the CPUs. It listens
> >>>> to CPU frequency transition notifiers to keep itself up to date on the
> >>>> current CPU frequency.
> >>>>
> >>>> To decide the frequency of the device, the governor does one of the
> >>>> following:
> >>>> * Derives the optimal devfreq device opp from required-opps property of
> >>>>   the parent cpu opp_table.
> >>>>
> >>>> * Scales the device frequency in proportion to the CPU frequency. So, if
> >>>>   the CPUs are running at their max frequency, the device runs at its
> >>>>   max frequency. If the CPUs are running at their min frequency, the
> >>>>   device runs at its min frequency. It is interpolated for frequencies
> >>>>   in between.
> >>>>
> >>>> Andrew-sh.Cheng change
> >>>> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> >>>> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> >>>> after kernel-5.7
> >>>> Don't return -EINVAL in devfreq_passive_event_handler()
> >>>> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
> >>>>
> >>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> >>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
> >>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> >>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> >>>> ---
> >>>>  drivers/devfreq/Kconfig            |   2 +
> >>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
> >>>>  include/linux/devfreq.h            |  29 +++-
> >>>>  3 files changed, 342 insertions(+), 18 deletions(-)
> >>>>
> >>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> >>>> index 00704efe6398..f56132b0ae64 100644
> >>>> --- a/drivers/devfreq/Kconfig
> >>>> +++ b/drivers/devfreq/Kconfig
> >>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
> >>>>  	  device. This governor does not change the frequency by itself
> >>>>  	  through sysfs entries. The passive governor recommends that
> >>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
> >>>> +	  Alternatively the governor can also be chosen to scale based on
> >>>> +	  the online CPUs current frequency.
> >>>>  
> >>>>  comment "DEVFREQ Drivers"
> >>>>  
> >>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> >>>> index b094132bd20b..9cc57b083839 100644
> >>>> --- a/drivers/devfreq/governor_passive.c
> >>>> +++ b/drivers/devfreq/governor_passive.c
> >>>> @@ -8,11 +8,103 @@
> >>>>   */
> >>>>  
> >>>>  #include <linux/module.h>
> >>>> +#include <linux/cpu.h>
> >>>> +#include <linux/cpufreq.h>
> >>>> +#include <linux/cpumask.h>
> >>>>  #include <linux/device.h>
> >>>>  #include <linux/devfreq.h>
> >>>> +#include <linux/slab.h>
> >>>>  #include "governor.h"
> >>>>  
> >>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >>>> +struct devfreq_cpu_state {
> >>>> +	unsigned int curr_freq;
> >>>> +	unsigned int min_freq;
> >>>> +	unsigned int max_freq;
> >>>> +	unsigned int first_cpu;
> >>>> +	struct device *cpu_dev;
> >>>> +	struct opp_table *opp_table;
> >>>> +};
> >>>
> >>> As I knew, the previous version has the description of structure
> >>> as following:  I wan to add the description like below.
> >>>
> >>> And if you have no any objection, I'd like you to order
> >>> the variables as following and use 'dev' instead of 'cpu_dev'
> >>> because this patch use the 'cpu_state->cpu_dev' at the multiple points.
> >>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
> >>> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
> >>> because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
> >>>
> >>> /**                                                                             
> >>>  * struct devfreq_cpu_state - Hold the per-cpu data                              
> >>>  * @dev:        reference to cpu device.                                        
> >>>  * @first_cpu:  the cpumask of the first cpu of a policy.                       
> >>>  * @opp_table:  reference to cpu opp table.                                     
> >>>  * @cur_freq:   the current frequency of the cpu.                               
> >>>  * @min_freq:   the min frequency of the cpu.                                   
> >>>  * @max_freq:   the max frequency of the cpu.                                   
> >>>  *                                                                              
> >>>  * This structure stores the required cpu_data of a cpu.                        
> >>>  * This is auto-populated by the governor.                                      
> >>>  */                                                                             
> >>> struct devfreq_cpu_state {                                                       
> >>>          struct device *dev;                                                     
> >>>          unsigned int first_cpu;                                                 
> >>>
> >>>          struct opp_table *opp_table;                                            
> >>>          unsigned int cur_freq;                                                  
> >>>          unsigned int min_freq;                                                  
> >>>          unsigned int max_freq;                                                  
> >>> };               
> >>>
> >>>
> >>>> +
> >>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> >>>> +					      unsigned int cpu)
> >>>> +{
> >>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> >>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> >>>> +
> >>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> >>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> >>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
> >>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> >>>> +	unsigned long cpu_curr_freq, freq;
> >>>> +
> >>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
> >>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
> >>>> +		return 0;
> >>>> +
> >>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
> >>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> >>>> +	if (IS_ERR(p_opp))
> >>>> +		return 0;
> >>>> +
> >>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> >>>> +					    devfreq->opp_table, p_opp);
> >>>> +	dev_pm_opp_put(p_opp);
> >>>> +
> >>>> +	if (!IS_ERR(opp)) {
> >>>> +		freq = dev_pm_opp_get_freq(opp);
> >>>> +		dev_pm_opp_put(opp);
> >>>> +		goto out;
> >>>> +	}
> >>>> +
> >>>> +	/* Use Interpolation if required opps is not available */
> >>>> +	cpu_min_freq = cpu_state->min_freq;
> >>>> +	cpu_max_freq = cpu_state->max_freq;
> >>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
> >>>> +
> >>>> +	if (dev_freq_table) {
> >>>> +		/* Get minimum frequency according to sorting order */
> >>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> >>>> +		if (dev_freq_table[0] < dev_max_state) {
> >>>> +			dev_min_freq = dev_freq_table[0];
> >>>> +			dev_max_freq = dev_max_state;
> >>>> +		} else {
> >>>> +			dev_min_freq = dev_max_state;
> >>>> +			dev_max_freq = dev_freq_table[0];
> >>>> +		}
> >>>> +	} else {
> >>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> >>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
> >>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> >>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
> >>>> +
> >>>> +		if (dev_max_freq <= dev_min_freq)
> >>>> +			return 0;
> >>>> +	}
> >>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
> >>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> >>>> +
> >>>> +out:
> >>>> +	return freq;
> >>>> +}
> >>>> +
> >>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> >>>> +					unsigned long *freq)
> >>>> +{
> >>>> +	struct devfreq_passive_data *p_data =
> >>>> +				(struct devfreq_passive_data *)devfreq->data;
> >>>> +	unsigned int cpu;
> >>>> +	unsigned long target_freq = 0;
> >>>> +
> >>>> +	for_each_online_cpu(cpu)
> >>>> +		target_freq = max(target_freq,
> >>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
> >>>> +
> >>>> +	*freq = target_freq;
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>
> >>> As you knew, governor_passive.c was already used 
> >>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
> >>> to get the target from OPP. So, I wan to make the common function
> >>> like 'get_taget_freq_by_required_opp' as following:
> >>> If define 'get_taget_freq_by_required_opp' as following,
> >>> it will be used for get_target_freq_with_devfreq().
> >>> After finisied the review of this patch, I'll send the patch[2].
> >>> [2] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgA6mp3Yqo$ 
> >>>
> >>> For example but this code is not tested,
> >>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
> >>> 						struct opp_table *p_opp_table,
> >>> 						struct opp_table *opp_table,
> >>> 						unsigned long freq)
> >>> {
> >>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> >>>
> >>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
> >>> 		return 0;
> >>>
> >>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
> >>> 	if (IS_ERR(p_opp))
> >>> 		return 0;
> >>>
> >>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
> >>> 	dev_pm_opp_put(p_opp);
> >>>
> >>> 	if (IS_ERR(opp))
> >>> 		return 0;
> >>>
> >>> 	freq = dev_pm_opp_get_freq(opp);
> >>> 	dev_pm_opp_put(opp);
> >>>
> >>> 	return freq;
> >>> }
> >>>
> >>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> >>> 					unsigned long *target_freq)
> >>> {
> >>> 	struct devfreq_passive_data *p_data =
> >>> 				(struct devfreq_passive_data *)devfreq->data;
> >>> 	struct devfreq_cpu_data *cpu_data;
> >>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
> >>> 	unsigned long dev_min, dev_max;
> >>> 	unsigned long freq = 0;
> >>>
> >>> 	for_each_online_cpu(cpu) {
> >>> 		cpu_data = p_data->cpu_data[cpu];
> >>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
> >>> 			continue;
> >>>
> >>> 		/* Get target freq via required opps */
> >>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
> >>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
> >>> 					cpu_data->opp_table,
> >>> 					devfreq->opp_table, cpu_cur);
> >>> 		if (freq) {
> >>> 			*target_freq = max(freq, *target_freq);
> >>> 			continue;
> >>> 		}
> >>>
> >>> 		/* Use Interpolation if required opps is not available */
> >>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
> >>>
> >>> 		cpu_min = cpu_data->min_freq;
> >>> 		cpu_max = cpu_data->max_freq;
> >>> 		cpu_cur = cpu_data->cur_freq;
> >>>
> >>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
> >>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
> >>>
> >>> 		*target_freq = max(freq, *target_freq);
> >>> 	}
> >>>
> >>> 	return 0;
> >>> }
> >>>
> >>>> +
> >>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
> >>>>  					unsigned long *freq)
> >>>>  {
> >>>>  	struct devfreq_passive_data *p_data
> >>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >>>>  	int i, count;
> >>>>  
> >>>>  	/*
> >>>> -	 * If the devfreq device with passive governor has the specific method
> >>>> -	 * to determine the next frequency, should use the get_target_freq()
> >>>> -	 * of struct devfreq_passive_data.
> >>>> -	 */
> >>>> -	if (p_data->get_target_freq)
> >>>> -		return p_data->get_target_freq(devfreq, freq);
> >>>> -
> >>>> -	/*
> >>>>  	 * If the parent and passive devfreq device uses the OPP table,
> >>>>  	 * get the next frequency by using the OPP table.
> >>>>  	 */
> >>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >>>>  	return 0;
> >>>>  }
> >>>>  
> >>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >>>> +					   unsigned long *freq)
> >>>> +{
> >>>> +	struct devfreq_passive_data *p_data =
> >>>> +		(struct devfreq_passive_data *)devfreq->data;
> >>>> +	int ret;
> >>>> +
> >>>> +	/*
> >>>> +	 * If the devfreq device with passive governor has the specific method
> >>>> +	 * to determine the next frequency, should use the get_target_freq()
> >>>> +	 * of struct devfreq_passive_data.
> >>>> +	 */
> >>>> +	if (p_data->get_target_freq)
> >>>> +		return p_data->get_target_freq(devfreq, freq);
> >>>> +
> >>>> +	switch (p_data->parent_type) {
> >>>> +	case DEVFREQ_PARENT_DEV:
> >>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
> >>>> +		break;
> >>>> +	case CPUFREQ_PARENT_DEV:
> >>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
> >>>> +		break;
> >>>> +	default:
> >>>> +		ret = -EINVAL;
> >>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
> >>>> +		break;
> >>>> +	}
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >>>>  				unsigned long event, void *ptr)
> >>>>  {
> >>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >>>>  	return NOTIFY_DONE;
> >>>>  }
> >>>>  
> >>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> >>>> +					 unsigned long event, void *ptr)
> >>>> +{
> >>>> +	struct devfreq_passive_data *data =
> >>>> +			container_of(nb, struct devfreq_passive_data, nb);
> >>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> >>>> +	struct devfreq_cpu_state *cpu_state;
> >>>> +	struct cpufreq_freqs *cpu_freq = ptr;
> >>>
> >>> Use 'freqs' variable name.  I prefer to use the same variable name
> >>> for both devfreq_freqs and cpufreq_freqs instance.
> >>>
> >>>> +	unsigned int curr_freq;
> >>>
> >>> As I commented above, better to use 'cur_frq' instead of 'curr_freq'
> >>> if there is no any special reason.
> >>>
> >>>> +	int ret;
> >>>> +
> >>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> >>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
> >>>> +		return 0;
> >>>> +
> >>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> >>>> +	if (cpu_state->curr_freq == cpu_freq->new)
> >>>> +		return 0;
> >>>> +
> >>>> +	/* Backup current freq and pre-update cpu state freq*/
> >>>
> >>> I think that this commnet is not critial. So, please drop this comment.
> >>>
> >>>> +	curr_freq = cpu_state->curr_freq;
> >>>> +	cpu_state->curr_freq = cpu_freq->new;
> >>>> +
> >>>> +	mutex_lock(&devfreq->lock);
> >>>> +	ret = update_devfreq(devfreq);
> >>>
> >>> I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
> >>> as following:
> >>> 	devfreq_update_target(devfreq, freqs->new);
> >>>
> >>>> +	mutex_unlock(&devfreq->lock);
> >>>> +	if (ret) {
> >>>> +		cpu_state->curr_freq = curr_freq;
> >>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> >>>> +		return ret;
> >>>> +	}
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>> +
> >>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
> >>>
> >>> In order to keep the consistent style of function name,
> >>> please change the name as following because devfreq defines
> >>> the function name as 'devfreq_regiter_notifier'
> >>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
> >>>
> >>>> +{
> >>>> +	struct devfreq_passive_data *data = *p_data;
> >>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> >>>> +	struct device *dev = devfreq->dev.parent;
> >>>> +	struct opp_table *opp_table = NULL;
> >>>> +	struct devfreq_cpu_state *cpu_state;
> >>>> +	struct cpufreq_policy *policy;
> >>>> +	struct device *cpu_dev;
> >>>> +	unsigned int cpu;
> >>>> +	int ret;
> >>>> +
> >>>> +	get_online_cpus();
> >>>> +
> >>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> >>>> +	ret = cpufreq_register_notifier(&data->nb,
> >>>> +					CPUFREQ_TRANSITION_NOTIFIER);
> >>>> +	if (ret) {
> >>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
> >>>> +		data->nb.notifier_call = NULL;
> >>>> +		goto out;
> >>>> +	}
> >>>> +
> >>>> +	/* Populate devfreq_cpu_state */
> >>>
> >>> Don't need this comment. Please drop it.
> >>>
> >>>> +	for_each_online_cpu(cpu) {
> >>>> +		if (data->cpu_state[cpu])
> >>>> +			continue;
> >>>> +
> >>>> +		policy = cpufreq_cpu_get(cpu);
> >>>> +		if (!policy) {
> >>>> +			ret = -EINVAL;
> >>>> +			goto out;
> >>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> >>>> +			ret = -EPROBE_DEFER;
> >>>> +			goto out;
> >>>> +		} else if (IS_ERR(policy)) {
> >>>> +			ret = PTR_ERR(policy);
> >>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> >>>> +			goto out;
> >>>> +		}
> >>>
> >>> Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
> >>> It make code more simple.
> >>>
> >>>> +
> >>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> >>>> +		if (!cpu_state) {
> >>>> +			ret = -ENOMEM;
> >>>> +			goto out;
> >>>> +		}
> >>>> +
> >>>> +		cpu_dev = get_cpu_device(cpu);
> >>>> +		if (!cpu_dev) {
> >>>> +			dev_err(dev, "Couldn't get cpu device.\n");
> >>>> +			ret = -ENODEV;
> >>>> +			goto out;
> >>>> +		}
> >>>> +
> >>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> >>>> +		if (IS_ERR(devfreq->opp_table)) {
> >>>> +			ret = PTR_ERR(opp_table);
> >>>> +			goto out;
> >>>> +		}
> >>>> +
> >>>> +		cpu_state->cpu_dev = cpu_dev;
> >>>> +		cpu_state->opp_table = opp_table;
> >>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> >>>> +		cpu_state->curr_freq = policy->cur;
> >>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
> >>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
> >>>> +		data->cpu_state[cpu] = cpu_state;
> >>>> +
> >>>> +		cpufreq_cpu_put(policy);
> >>>> +	}
> >>>> +
> >>>> +out:
> >>>> +	put_online_cpus();
> >>>> +	if (ret)
> >>>> +		return ret;
> >>>> +
> >>>> +	/* Update devfreq */
> >>>> +	mutex_lock(&devfreq->lock);
> >>>> +	ret = update_devfreq(devfreq);
> >>>
> >>>> +	mutex_unlock(&devfreq->lock);
> >>>> +	if (ret)
> >>>> +		dev_err(dev, "Couldn't update the frequency.\n");
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
> >>>
> >>> As I commented above, please change the name as following:
> >>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
> >>>
> >>>> +{
> >>>> +	struct devfreq_passive_data *data = *p_data;
> >>>> +	struct devfreq_cpu_state *cpu_state;
> >>>> +	int cpu;
> >>>> +
> >>>> +	if (data->nb.notifier_call)
> >>>> +		cpufreq_unregister_notifier(&data->nb,
> >>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
> >>>> +
> >>>> +	for_each_possible_cpu(cpu) {
> >>>> +		cpu_state = data->cpu_state[cpu];
> >>>> +		if (cpu_state) {
> >>>> +			if (cpu_state->opp_table)
> >>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
> >>>> +			kfree(cpu_state);
> >>>> +			cpu_state = NULL;
> >>>> +		}
> >>>> +	}
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>> +
> >>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> >>>> +{
> >>>> +	struct notifier_block *nb = &(*p_data)->nb;
> >>>> +	int ret = 0;
> >>>> +
> >>>> +	switch ((*p_data)->parent_type) {
> >>>> +	case DEVFREQ_PARENT_DEV:
> >>>> +		nb->notifier_call = devfreq_passive_notifier_call;
> >>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> >>>> +						DEVFREQ_TRANSITION_NOTIFIER);
> >>>> +		break;
> >>>> +	case CPUFREQ_PARENT_DEV:
> >>>> +		ret = cpufreq_passive_register(p_data);
> >>>> +		break;
> >>>> +	default:
> >>>> +		ret = -EINVAL;
> >>>> +		break;
> >>>> +	}
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> >>>> +{
> >>>> +	int ret = 0;
> >>>> +
> >>>> +	switch ((*p_data)->parent_type) {
> >>>> +	case DEVFREQ_PARENT_DEV:
> >>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> >>>> +						    &(*p_data)->nb,
> >>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
> >>>> +		break;
> >>>> +	case CPUFREQ_PARENT_DEV:
> >>>> +		cpufreq_passive_unregister(p_data);
> >>>> +		break;
> >>>> +	default:
> >>>> +		ret = -EINVAL;
> >>>> +		break;
> >>>> +	}
> >>>> +	return ret;
> >>>> +}
> >>>
> >>> I think that you don't need to define register_parent_dev_notifier
> >>> and unregister_parent_dev_notifier as the separate functions.
> >>>
> >>> Instead of the separate functions, just add the code
> >>> into devfreq_passive_event_handler.
> >>>
> >>>
> >>>> +
> >>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >>>>  				unsigned int event, void *data)
> >>>>  {
> >>>>  	struct devfreq_passive_data *p_data
> >>>>  			= (struct devfreq_passive_data *)devfreq->data;
> >>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
> >>>> -	struct notifier_block *nb = &p_data->nb;
> >>>>  	int ret = 0;
> >>>>  
> >>>> -	if (!parent)
> >>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
> >>>>  		return -EPROBE_DEFER;
> >>>>  
> >>>>  	switch (event) {
> >>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >>>>  		if (!p_data->this)
> >>>>  			p_data->this = devfreq;
> >>>>  
> >>>> -		nb->notifier_call = devfreq_passive_notifier_call;
> >>>> -		ret = devfreq_register_notifier(parent, nb,
> >>>> -					DEVFREQ_TRANSITION_NOTIFIER);
> >>>> +		ret = register_parent_dev_notifier(&p_data);
> >>>>  		break;
> >>>> +
> >>>>  	case DEVFREQ_GOV_STOP:
> >>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
> >>>> -					DEVFREQ_TRANSITION_NOTIFIER));
> >>>> +		ret = unregister_parent_dev_notifier(&p_data);
> >>>>  		break;
> >>>>  	default:
> >>>>  		break;
> >>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> >>>> index 26ea0850be9b..e0093b7c805c 100644
> >>>> --- a/include/linux/devfreq.h
> >>>> +++ b/include/linux/devfreq.h
> >>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
> >>>>  
> >>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
> >>>>  /**
> >>>> + * struct devfreq_cpu_state - holds the per-cpu state
> >>>> + * @freq:	the current frequency of the cpu.
> >>>> + * @min_freq:	the min frequency of the cpu.
> >>>> + * @max_freq:	the max frequency of the cpu.
> >>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
> >>>> + * @dev:	reference to cpu device.
> >>>> + * @opp_table:	reference to cpu opp table.
> >>>> + *
> >>>> + * This structure stores the required cpu_state of a cpu.
> >>>> + * This is auto-populated by the governor.
> >>>> + */
> >>>> +struct devfreq_cpu_state;
> >>>> +
> >>>> +enum devfreq_parent_dev_type {
> >>>> +	DEVFREQ_PARENT_DEV,
> >>>> +	CPUFREQ_PARENT_DEV,
> >>>> +};
> >>>> +
> >>>> +/**
> >>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
> >>>>   *	and devfreq_add_device
> >>>>   * @parent:	the devfreq instance of parent device.
> >>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
> >>>>   *			using governors except for passive governor.
> >>>>   *			If the devfreq device has the specific method to decide
> >>>>   *			the next frequency, should use this callback.
> >>>> + * @parent_type:	parent type of the device
> >>>>   * @this:	the devfreq instance of own device.
> >>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> >>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
> >>>>   *
> >>>>   * The devfreq_passive_data have to set the devfreq instance of parent
> >>>>   * device with governors except for the passive governor. But, don't need to
> >>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
> >>>> - * them.
> >>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> >>>> + * will handle them.
> >>>>   */
> >>>>  struct devfreq_passive_data {
> >>>>  	/* Should set the devfreq instance of parent device */
> >>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
> >>>>  	/* Optional callback to decide the next frequency of passvice device */
> >>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> >>>>  
> >>>> +	/* Should set the type of parent device */
> >>>> +	enum devfreq_parent_dev_type parent_type;
> >>>> +
> >>>>  	/* For passive governor's internal use. Don't need to set them */
> >>>>  	struct devfreq *this;
> >>>>  	struct notifier_block nb;
> >>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
> >>>>  };
> >>>>  #endif
> >>>>  
> >>>>
> >>>
> >>
> > 
> > 
> 
>
Chanwoo Choi April 1, 2021, 12:16 a.m. UTC | #8
On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>>> Hi,
>>>
>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>>>> Hi,
>>>>>
>>>>> You are missing to add these patches to linux-pm mailing list.
>>>>> Need to send them to linu-pm ML.
>>>>>
>>>>> Also, before received this series, I tried to clean-up these patches
>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
>>>>>
>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>>>>> Please update the email or drop this email.
>>>>
>>>> Hi Chanwoo,
>>>>
>>>> Thank you for the advices.
>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>>>> that my patch set base on
>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>
>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
>>> So that if possible, I'd like you to test your patches with this patch[1] 
>>> and then if there is no problem, could you send the next patches with patch[1]?
>>>
>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
>>
>>
>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
>> branch based on latest devfreq-next branch.
>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>
>> First of all, if possible, I want to test them[1] with your patches in this series.
>> And then if there are no any problem, please let me know. After confirmed from you,
>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
>> How about that?
>>
> Hi Chanwoo~
> 
> We will use this on Google Chrome project.
> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
> 
>     make sure cci devfreqs runs with cpufreq.
>     suspend resume
>     speedometer2 benchmark
> It is okay.
> 
> Please send the patches of devfreq-testing-passive-gov[1] branch.
> 
> I will send patch v9 base on yours latter.

Thanks for your test. I'll send the patches today.

> 
> 
>>
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
>>>>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>>>>
>>>>>> Many CPU architectures have caches that can scale independent of the
>>>>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
>>>>>> cache is not a performance bottleneck that leads to poor performance and
>>>>>> power. The same idea applies for RAM/DDR.
>>>>>>
>>>>>> To achieve this, this patch adds support for cpu based scaling to the
>>>>>> passive governor. This is accomplished by taking the current frequency
>>>>>> of each CPU frequency domain and then adjust the frequency of the cache
>>>>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>>>>> to CPU frequency transition notifiers to keep itself up to date on the
>>>>>> current CPU frequency.
>>>>>>
>>>>>> To decide the frequency of the device, the governor does one of the
>>>>>> following:
>>>>>> * Derives the optimal devfreq device opp from required-opps property of
>>>>>>   the parent cpu opp_table.
>>>>>>
>>>>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>>>>   the CPUs are running at their max frequency, the device runs at its
>>>>>>   max frequency. If the CPUs are running at their min frequency, the
>>>>>>   device runs at its min frequency. It is interpolated for frequencies
>>>>>>   in between.
>>>>>>
>>>>>> Andrew-sh.Cheng change
>>>>>> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
>>>>>> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
>>>>>> after kernel-5.7
>>>>>> Don't return -EINVAL in devfreq_passive_event_handler()
>>>>>> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
>>>>>>
>>>>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>>>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>>>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>>>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
>>>>>> ---
>>>>>>  drivers/devfreq/Kconfig            |   2 +
>>>>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>>>>>>  include/linux/devfreq.h            |  29 +++-
>>>>>>  3 files changed, 342 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>>>>> index 00704efe6398..f56132b0ae64 100644
>>>>>> --- a/drivers/devfreq/Kconfig
>>>>>> +++ b/drivers/devfreq/Kconfig
>>>>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>>>>>>  	  device. This governor does not change the frequency by itself
>>>>>>  	  through sysfs entries. The passive governor recommends that
>>>>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>>>>>> +	  Alternatively the governor can also be chosen to scale based on
>>>>>> +	  the online CPUs current frequency.
>>>>>>  
>>>>>>  comment "DEVFREQ Drivers"
>>>>>>  
>>>>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>>>>> index b094132bd20b..9cc57b083839 100644
>>>>>> --- a/drivers/devfreq/governor_passive.c
>>>>>> +++ b/drivers/devfreq/governor_passive.c
>>>>>> @@ -8,11 +8,103 @@
>>>>>>   */
>>>>>>  
>>>>>>  #include <linux/module.h>
>>>>>> +#include <linux/cpu.h>
>>>>>> +#include <linux/cpufreq.h>
>>>>>> +#include <linux/cpumask.h>
>>>>>>  #include <linux/device.h>
>>>>>>  #include <linux/devfreq.h>
>>>>>> +#include <linux/slab.h>
>>>>>>  #include "governor.h"
>>>>>>  
>>>>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>> +struct devfreq_cpu_state {
>>>>>> +	unsigned int curr_freq;
>>>>>> +	unsigned int min_freq;
>>>>>> +	unsigned int max_freq;
>>>>>> +	unsigned int first_cpu;
>>>>>> +	struct device *cpu_dev;
>>>>>> +	struct opp_table *opp_table;
>>>>>> +};
>>>>>
>>>>> As I knew, the previous version has the description of structure
>>>>> as following:  I wan to add the description like below.
>>>>>
>>>>> And if you have no any objection, I'd like you to order
>>>>> the variables as following and use 'dev' instead of 'cpu_dev'
>>>>> because this patch use the 'cpu_state->cpu_dev' at the multiple points.
>>>>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
>>>>> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
>>>>> because devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
>>>>>
>>>>> /**                                                                             
>>>>>  * struct devfreq_cpu_state - Hold the per-cpu data                              
>>>>>  * @dev:        reference to cpu device.                                        
>>>>>  * @first_cpu:  the cpumask of the first cpu of a policy.                       
>>>>>  * @opp_table:  reference to cpu opp table.                                     
>>>>>  * @cur_freq:   the current frequency of the cpu.                               
>>>>>  * @min_freq:   the min frequency of the cpu.                                   
>>>>>  * @max_freq:   the max frequency of the cpu.                                   
>>>>>  *                                                                              
>>>>>  * This structure stores the required cpu_data of a cpu.                        
>>>>>  * This is auto-populated by the governor.                                      
>>>>>  */                                                                             
>>>>> struct devfreq_cpu_state {                                                       
>>>>>          struct device *dev;                                                     
>>>>>          unsigned int first_cpu;                                                 
>>>>>
>>>>>          struct opp_table *opp_table;                                            
>>>>>          unsigned int cur_freq;                                                  
>>>>>          unsigned int min_freq;                                                  
>>>>>          unsigned int max_freq;                                                  
>>>>> };               
>>>>>
>>>>>
>>>>>> +
>>>>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>>>>> +					      unsigned int cpu)
>>>>>> +{
>>>>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
>>>>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
>>>>>> +
>>>>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
>>>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
>>>>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>>>> +	unsigned long cpu_curr_freq, freq;
>>>>>> +
>>>>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
>>>>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
>>>>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
>>>>>> +	if (IS_ERR(p_opp))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
>>>>>> +					    devfreq->opp_table, p_opp);
>>>>>> +	dev_pm_opp_put(p_opp);
>>>>>> +
>>>>>> +	if (!IS_ERR(opp)) {
>>>>>> +		freq = dev_pm_opp_get_freq(opp);
>>>>>> +		dev_pm_opp_put(opp);
>>>>>> +		goto out;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* Use Interpolation if required opps is not available */
>>>>>> +	cpu_min_freq = cpu_state->min_freq;
>>>>>> +	cpu_max_freq = cpu_state->max_freq;
>>>>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
>>>>>> +
>>>>>> +	if (dev_freq_table) {
>>>>>> +		/* Get minimum frequency according to sorting order */
>>>>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
>>>>>> +		if (dev_freq_table[0] < dev_max_state) {
>>>>>> +			dev_min_freq = dev_freq_table[0];
>>>>>> +			dev_max_freq = dev_max_state;
>>>>>> +		} else {
>>>>>> +			dev_min_freq = dev_max_state;
>>>>>> +			dev_max_freq = dev_freq_table[0];
>>>>>> +		}
>>>>>> +	} else {
>>>>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
>>>>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
>>>>>> +
>>>>>> +		if (dev_max_freq <= dev_min_freq)
>>>>>> +			return 0;
>>>>>> +	}
>>>>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
>>>>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
>>>>>> +
>>>>>> +out:
>>>>>> +	return freq;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>>>>> +					unsigned long *freq)
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *p_data =
>>>>>> +				(struct devfreq_passive_data *)devfreq->data;
>>>>>> +	unsigned int cpu;
>>>>>> +	unsigned long target_freq = 0;
>>>>>> +
>>>>>> +	for_each_online_cpu(cpu)
>>>>>> +		target_freq = max(target_freq,
>>>>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
>>>>>> +
>>>>>> +	*freq = target_freq;
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>> As you knew, governor_passive.c was already used 
>>>>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
>>>>> to get the target from OPP. So, I wan to make the common function
>>>>> like 'get_taget_freq_by_required_opp' as following:
>>>>> If define 'get_taget_freq_by_required_opp' as following,
>>>>> it will be used for get_target_freq_with_devfreq().
>>>>> After finisied the review of this patch, I'll send the patch[2].
>>>>> [2] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgA6mp3Yqo$ 
>>>>>
>>>>> For example but this code is not tested,
>>>>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>>>>> 						struct opp_table *p_opp_table,
>>>>> 						struct opp_table *opp_table,
>>>>> 						unsigned long freq)
>>>>> {
>>>>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>>>
>>>>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
>>>>> 		return 0;
>>>>>
>>>>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>>>>> 	if (IS_ERR(p_opp))
>>>>> 		return 0;
>>>>>
>>>>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>>>>> 	dev_pm_opp_put(p_opp);
>>>>>
>>>>> 	if (IS_ERR(opp))
>>>>> 		return 0;
>>>>>
>>>>> 	freq = dev_pm_opp_get_freq(opp);
>>>>> 	dev_pm_opp_put(opp);
>>>>>
>>>>> 	return freq;
>>>>> }
>>>>>
>>>>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>>>> 					unsigned long *target_freq)
>>>>> {
>>>>> 	struct devfreq_passive_data *p_data =
>>>>> 				(struct devfreq_passive_data *)devfreq->data;
>>>>> 	struct devfreq_cpu_data *cpu_data;
>>>>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>>>>> 	unsigned long dev_min, dev_max;
>>>>> 	unsigned long freq = 0;
>>>>>
>>>>> 	for_each_online_cpu(cpu) {
>>>>> 		cpu_data = p_data->cpu_data[cpu];
>>>>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
>>>>> 			continue;
>>>>>
>>>>> 		/* Get target freq via required opps */
>>>>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>>>>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>>>>> 					cpu_data->opp_table,
>>>>> 					devfreq->opp_table, cpu_cur);
>>>>> 		if (freq) {
>>>>> 			*target_freq = max(freq, *target_freq);
>>>>> 			continue;
>>>>> 		}
>>>>>
>>>>> 		/* Use Interpolation if required opps is not available */
>>>>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>>>>>
>>>>> 		cpu_min = cpu_data->min_freq;
>>>>> 		cpu_max = cpu_data->max_freq;
>>>>> 		cpu_cur = cpu_data->cur_freq;
>>>>>
>>>>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / cpu_max - cpu_min;
>>>>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>>>>
>>>>> 		*target_freq = max(freq, *target_freq);
>>>>> 	}
>>>>>
>>>>> 	return 0;
>>>>> }
>>>>>
>>>>>> +
>>>>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>>>>>>  					unsigned long *freq)
>>>>>>  {
>>>>>>  	struct devfreq_passive_data *p_data
>>>>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>>  	int i, count;
>>>>>>  
>>>>>>  	/*
>>>>>> -	 * If the devfreq device with passive governor has the specific method
>>>>>> -	 * to determine the next frequency, should use the get_target_freq()
>>>>>> -	 * of struct devfreq_passive_data.
>>>>>> -	 */
>>>>>> -	if (p_data->get_target_freq)
>>>>>> -		return p_data->get_target_freq(devfreq, freq);
>>>>>> -
>>>>>> -	/*
>>>>>>  	 * If the parent and passive devfreq device uses the OPP table,
>>>>>>  	 * get the next frequency by using the OPP table.
>>>>>>  	 */
>>>>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>> +					   unsigned long *freq)
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *p_data =
>>>>>> +		(struct devfreq_passive_data *)devfreq->data;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * If the devfreq device with passive governor has the specific method
>>>>>> +	 * to determine the next frequency, should use the get_target_freq()
>>>>>> +	 * of struct devfreq_passive_data.
>>>>>> +	 */
>>>>>> +	if (p_data->get_target_freq)
>>>>>> +		return p_data->get_target_freq(devfreq, freq);
>>>>>> +
>>>>>> +	switch (p_data->parent_type) {
>>>>>> +	case DEVFREQ_PARENT_DEV:
>>>>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
>>>>>> +		break;
>>>>>> +	case CPUFREQ_PARENT_DEV:
>>>>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		ret = -EINVAL;
>>>>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
>>>>>> +		break;
>>>>>> +	}
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>>>  				unsigned long event, void *ptr)
>>>>>>  {
>>>>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>>>  	return NOTIFY_DONE;
>>>>>>  }
>>>>>>  
>>>>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>>>>> +					 unsigned long event, void *ptr)
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *data =
>>>>>> +			container_of(nb, struct devfreq_passive_data, nb);
>>>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>>>> +	struct devfreq_cpu_state *cpu_state;
>>>>>> +	struct cpufreq_freqs *cpu_freq = ptr;
>>>>>
>>>>> Use 'freqs' variable name.  I prefer to use the same variable name
>>>>> for both devfreq_freqs and cpufreq_freqs instance.
>>>>>
>>>>>> +	unsigned int curr_freq;
>>>>>
>>>>> As I commented above, better to use 'cur_frq' instead of 'curr_freq'
>>>>> if there is no any special reason.
>>>>>
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
>>>>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
>>>>>> +	if (cpu_state->curr_freq == cpu_freq->new)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	/* Backup current freq and pre-update cpu state freq*/
>>>>>
>>>>> I think that this commnet is not critial. So, please drop this comment.
>>>>>
>>>>>> +	curr_freq = cpu_state->curr_freq;
>>>>>> +	cpu_state->curr_freq = cpu_freq->new;
>>>>>> +
>>>>>> +	mutex_lock(&devfreq->lock);
>>>>>> +	ret = update_devfreq(devfreq);
>>>>>
>>>>> I recommend to use 'devfreq_update_target' instead of 'update_devfreq'
>>>>> as following:
>>>>> 	devfreq_update_target(devfreq, freqs->new);
>>>>>
>>>>>> +	mutex_unlock(&devfreq->lock);
>>>>>> +	if (ret) {
>>>>>> +		cpu_state->curr_freq = curr_freq;
>>>>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
>>>>>> +		return ret;
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>>>>
>>>>> In order to keep the consistent style of function name,
>>>>> please change the name as following because devfreq defines
>>>>> the function name as 'devfreq_regiter_notifier'
>>>>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>>>>>
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *data = *p_data;
>>>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>>>> +	struct device *dev = devfreq->dev.parent;
>>>>>> +	struct opp_table *opp_table = NULL;
>>>>>> +	struct devfreq_cpu_state *cpu_state;
>>>>>> +	struct cpufreq_policy *policy;
>>>>>> +	struct device *cpu_dev;
>>>>>> +	unsigned int cpu;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	get_online_cpus();
>>>>>> +
>>>>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>>>>>> +	ret = cpufreq_register_notifier(&data->nb,
>>>>>> +					CPUFREQ_TRANSITION_NOTIFIER);
>>>>>> +	if (ret) {
>>>>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
>>>>>> +		data->nb.notifier_call = NULL;
>>>>>> +		goto out;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* Populate devfreq_cpu_state */
>>>>>
>>>>> Don't need this comment. Please drop it.
>>>>>
>>>>>> +	for_each_online_cpu(cpu) {
>>>>>> +		if (data->cpu_state[cpu])
>>>>>> +			continue;
>>>>>> +
>>>>>> +		policy = cpufreq_cpu_get(cpu);
>>>>>> +		if (!policy) {
>>>>>> +			ret = -EINVAL;
>>>>>> +			goto out;
>>>>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
>>>>>> +			ret = -EPROBE_DEFER;
>>>>>> +			goto out;
>>>>>> +		} else if (IS_ERR(policy)) {
>>>>>> +			ret = PTR_ERR(policy);
>>>>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
>>>>>> +			goto out;
>>>>>> +		}
>>>>>
>>>>> Use dev_err_probe() funciotn to handle hte EPROBE_DEFER.
>>>>> It make code more simple.
>>>>>
>>>>>> +
>>>>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
>>>>>> +		if (!cpu_state) {
>>>>>> +			ret = -ENOMEM;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		cpu_dev = get_cpu_device(cpu);
>>>>>> +		if (!cpu_dev) {
>>>>>> +			dev_err(dev, "Couldn't get cpu device.\n");
>>>>>> +			ret = -ENODEV;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
>>>>>> +		if (IS_ERR(devfreq->opp_table)) {
>>>>>> +			ret = PTR_ERR(opp_table);
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		cpu_state->cpu_dev = cpu_dev;
>>>>>> +		cpu_state->opp_table = opp_table;
>>>>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
>>>>>> +		cpu_state->curr_freq = policy->cur;
>>>>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
>>>>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
>>>>>> +		data->cpu_state[cpu] = cpu_state;
>>>>>> +
>>>>>> +		cpufreq_cpu_put(policy);
>>>>>> +	}
>>>>>> +
>>>>>> +out:
>>>>>> +	put_online_cpus();
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>>> +
>>>>>> +	/* Update devfreq */
>>>>>> +	mutex_lock(&devfreq->lock);
>>>>>> +	ret = update_devfreq(devfreq);
>>>>>
>>>>>> +	mutex_unlock(&devfreq->lock);
>>>>>> +	if (ret)
>>>>>> +		dev_err(dev, "Couldn't update the frequency.\n");
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>>>>
>>>>> As I commented above, please change the name as following:
>>>>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>>>>>
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *data = *p_data;
>>>>>> +	struct devfreq_cpu_state *cpu_state;
>>>>>> +	int cpu;
>>>>>> +
>>>>>> +	if (data->nb.notifier_call)
>>>>>> +		cpufreq_unregister_notifier(&data->nb,
>>>>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
>>>>>> +
>>>>>> +	for_each_possible_cpu(cpu) {
>>>>>> +		cpu_state = data->cpu_state[cpu];
>>>>>> +		if (cpu_state) {
>>>>>> +			if (cpu_state->opp_table)
>>>>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
>>>>>> +			kfree(cpu_state);
>>>>>> +			cpu_state = NULL;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>>>> +{
>>>>>> +	struct notifier_block *nb = &(*p_data)->nb;
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	switch ((*p_data)->parent_type) {
>>>>>> +	case DEVFREQ_PARENT_DEV:
>>>>>> +		nb->notifier_call = devfreq_passive_notifier_call;
>>>>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
>>>>>> +						DEVFREQ_TRANSITION_NOTIFIER);
>>>>>> +		break;
>>>>>> +	case CPUFREQ_PARENT_DEV:
>>>>>> +		ret = cpufreq_passive_register(p_data);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		ret = -EINVAL;
>>>>>> +		break;
>>>>>> +	}
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>>>> +{
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	switch ((*p_data)->parent_type) {
>>>>>> +	case DEVFREQ_PARENT_DEV:
>>>>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
>>>>>> +						    &(*p_data)->nb,
>>>>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
>>>>>> +		break;
>>>>>> +	case CPUFREQ_PARENT_DEV:
>>>>>> +		cpufreq_passive_unregister(p_data);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		ret = -EINVAL;
>>>>>> +		break;
>>>>>> +	}
>>>>>> +	return ret;
>>>>>> +}
>>>>>
>>>>> I think that you don't need to define register_parent_dev_notifier
>>>>> and unregister_parent_dev_notifier as the separate functions.
>>>>>
>>>>> Instead of the separate functions, just add the code
>>>>> into devfreq_passive_event_handler.
>>>>>
>>>>>
>>>>>> +
>>>>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>>>  				unsigned int event, void *data)
>>>>>>  {
>>>>>>  	struct devfreq_passive_data *p_data
>>>>>>  			= (struct devfreq_passive_data *)devfreq->data;
>>>>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
>>>>>> -	struct notifier_block *nb = &p_data->nb;
>>>>>>  	int ret = 0;
>>>>>>  
>>>>>> -	if (!parent)
>>>>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>>>>>>  		return -EPROBE_DEFER;
>>>>>>  
>>>>>>  	switch (event) {
>>>>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>>>  		if (!p_data->this)
>>>>>>  			p_data->this = devfreq;
>>>>>>  
>>>>>> -		nb->notifier_call = devfreq_passive_notifier_call;
>>>>>> -		ret = devfreq_register_notifier(parent, nb,
>>>>>> -					DEVFREQ_TRANSITION_NOTIFIER);
>>>>>> +		ret = register_parent_dev_notifier(&p_data);
>>>>>>  		break;
>>>>>> +
>>>>>>  	case DEVFREQ_GOV_STOP:
>>>>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
>>>>>> -					DEVFREQ_TRANSITION_NOTIFIER));
>>>>>> +		ret = unregister_parent_dev_notifier(&p_data);
>>>>>>  		break;
>>>>>>  	default:
>>>>>>  		break;
>>>>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>>>>> index 26ea0850be9b..e0093b7c805c 100644
>>>>>> --- a/include/linux/devfreq.h
>>>>>> +++ b/include/linux/devfreq.h
>>>>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>>>>>>  
>>>>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>>>>>  /**
>>>>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>>>>> + * @freq:	the current frequency of the cpu.
>>>>>> + * @min_freq:	the min frequency of the cpu.
>>>>>> + * @max_freq:	the max frequency of the cpu.
>>>>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
>>>>>> + * @dev:	reference to cpu device.
>>>>>> + * @opp_table:	reference to cpu opp table.
>>>>>> + *
>>>>>> + * This structure stores the required cpu_state of a cpu.
>>>>>> + * This is auto-populated by the governor.
>>>>>> + */
>>>>>> +struct devfreq_cpu_state;
>>>>>> +
>>>>>> +enum devfreq_parent_dev_type {
>>>>>> +	DEVFREQ_PARENT_DEV,
>>>>>> +	CPUFREQ_PARENT_DEV,
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>>>>>>   *	and devfreq_add_device
>>>>>>   * @parent:	the devfreq instance of parent device.
>>>>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>>>>>>   *			using governors except for passive governor.
>>>>>>   *			If the devfreq device has the specific method to decide
>>>>>>   *			the next frequency, should use this callback.
>>>>>> + * @parent_type:	parent type of the device
>>>>>>   * @this:	the devfreq instance of own device.
>>>>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>>>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>>>>>>   *
>>>>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>>>>   * device with governors except for the passive governor. But, don't need to
>>>>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>>>>> - * them.
>>>>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
>>>>>> + * will handle them.
>>>>>>   */
>>>>>>  struct devfreq_passive_data {
>>>>>>  	/* Should set the devfreq instance of parent device */
>>>>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>>>>>>  	/* Optional callback to decide the next frequency of passvice device */
>>>>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>>>>  
>>>>>> +	/* Should set the type of parent device */
>>>>>> +	enum devfreq_parent_dev_type parent_type;
>>>>>> +
>>>>>>  	/* For passive governor's internal use. Don't need to set them */
>>>>>>  	struct devfreq *this;
>>>>>>  	struct notifier_block nb;
>>>>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>>>>>>  };
>>>>>>  #endif
>>>>>>  
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
Chanwoo Choi April 8, 2021, 2:47 a.m. UTC | #9
On 4/1/21 9:16 AM, Chanwoo Choi wrote:
> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>>>> Hi,
>>>>
>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> You are missing to add these patches to linux-pm mailing list.
>>>>>> Need to send them to linu-pm ML.
>>>>>>
>>>>>> Also, before received this series, I tried to clean-up these patches
>>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
>>>>>>
>>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>>>>>> Please update the email or drop this email.
>>>>>
>>>>> Hi Chanwoo,
>>>>>
>>>>> Thank you for the advices.
>>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>>>>> that my patch set base on
>>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>>
>>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
>>>> So that if possible, I'd like you to test your patches with this patch[1] 
>>>> and then if there is no problem, could you send the next patches with patch[1]?
>>>>
>>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
>>>
>>>
>>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
>>> branch based on latest devfreq-next branch.
>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>
>>> First of all, if possible, I want to test them[1] with your patches in this series.
>>> And then if there are no any problem, please let me know. After confirmed from you,
>>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
>>> How about that?
>>>
>> Hi Chanwoo~
>>
>> We will use this on Google Chrome project.
>> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
>>
>>     make sure cci devfreqs runs with cpufreq.
>>     suspend resume
>>     speedometer2 benchmark
>> It is okay.
>>
>> Please send the patches of devfreq-testing-passive-gov[1] branch.
>>
>> I will send patch v9 base on yours latter.
> 
> Thanks for your test. I'll send the patches today.

I'm sorry for delay because when I tested the patches
for devfreq parent type on Odroid-xu3, there are some problem
related to lazy linking of OPP. So I'm trying to analyze them.
Unfortunately, we need to postpone these patches to next linux
version.


[snip]
andrew-sh.cheng April 22, 2021, 1:34 p.m. UTC | #10
On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
> > On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
> >> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
> >>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
> >>>> Hi,
> >>>>
> >>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
> >>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> You are missing to add these patches to linux-pm mailing list.
> >>>>>> Need to send them to linu-pm ML.
> >>>>>>
> >>>>>> Also, before received this series, I tried to clean-up these patches
> >>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
> >>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
> >>>>>>
> >>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
> >>>>>> Please update the email or drop this email.
> >>>>>
> >>>>> Hi Chanwoo,
> >>>>>
> >>>>> Thank you for the advices.
> >>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
> >>>>> that my patch set base on
> >>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> >>>>
> >>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
> >>>> So that if possible, I'd like you to test your patches with this patch[1] 
> >>>> and then if there is no problem, could you send the next patches with patch[1]?
> >>>>
> >>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
> >>>
> >>>
> >>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
> >>> branch based on latest devfreq-next branch.
> >>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> >>>
> >>> First of all, if possible, I want to test them[1] with your patches in this series.
> >>> And then if there are no any problem, please let me know. After confirmed from you,
> >>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
> >>> How about that?
> >>>
> >> Hi Chanwoo~
> >>
> >> We will use this on Google Chrome project.
> >> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
> >>
> >>     make sure cci devfreqs runs with cpufreq.
> >>     suspend resume
> >>     speedometer2 benchmark
> >> It is okay.
> >>
> >> Please send the patches of devfreq-testing-passive-gov[1] branch.
> >>
> >> I will send patch v9 base on yours latter.
> > 
> > Thanks for your test. I'll send the patches today.
> 
> I'm sorry for delay because when I tested the patches
> for devfreq parent type on Odroid-xu3, there are some problem
> related to lazy linking of OPP. So I'm trying to analyze them.
> Unfortunately, we need to postpone these patches to next linux
> version.
> 
> 
Hi Chanwoo,
Sorry to bother you.
Do you work on this patch now?
Is there any thing that we can do?


> [snip]
>
andrew-sh.cheng May 26, 2021, 2:22 a.m. UTC | #11
On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
> > On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
> >> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
> >>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
> >>>> Hi,
> >>>>
> >>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
> >>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> You are missing to add these patches to linux-pm mailing list.
> >>>>>> Need to send them to linu-pm ML.
> >>>>>>
> >>>>>> Also, before received this series, I tried to clean-up these patches
> >>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
> >>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
> >>>>>>
> >>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
> >>>>>> Please update the email or drop this email.
> >>>>>
> >>>>> Hi Chanwoo,
> >>>>>
> >>>>> Thank you for the advices.
> >>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
> >>>>> that my patch set base on
> >>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> >>>>
> >>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
> >>>> So that if possible, I'd like you to test your patches with this patch[1] 
> >>>> and then if there is no problem, could you send the next patches with patch[1]?
> >>>>
> >>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
> >>>
> >>>
> >>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
> >>> branch based on latest devfreq-next branch.
> >>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> >>>
> >>> First of all, if possible, I want to test them[1] with your patches in this series.
> >>> And then if there are no any problem, please let me know. After confirmed from you,
> >>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
> >>> How about that?
> >>>
> >> Hi Chanwoo~
> >>
> >> We will use this on Google Chrome project.
> >> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
> >>
> >>     make sure cci devfreqs runs with cpufreq.
> >>     suspend resume
> >>     speedometer2 benchmark
> >> It is okay.
> >>
> >> Please send the patches of devfreq-testing-passive-gov[1] branch.
> >>
> >> I will send patch v9 base on yours latter.
> > 
> > Thanks for your test. I'll send the patches today.
> 
> I'm sorry for delay because when I tested the patches
> for devfreq parent type on Odroid-xu3, there are some problem
> related to lazy linking of OPP. So I'm trying to analyze them.
> Unfortunately, we need to postpone these patches to next linux
> version.
> 
Hi Chanwoo Choi~

It is said that you are busy on another task recently.
May I know your plan on this patch?
Thank you.

> 
> [snip]
>
Chanwoo Choi May 26, 2021, 3:08 a.m. UTC | #12
Hi,
On 5/26/21 11:22 AM, andrew-sh.cheng wrote:
> On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
>> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
>>> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
>>>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>>>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>>>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> You are missing to add these patches to linux-pm mailing list.
>>>>>>>> Need to send them to linu-pm ML.
>>>>>>>>
>>>>>>>> Also, before received this series, I tried to clean-up these patches
>>>>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
>>>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
>>>>>>>>
>>>>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>>>>>>>> Please update the email or drop this email.
>>>>>>>
>>>>>>> Hi Chanwoo,
>>>>>>>
>>>>>>> Thank you for the advices.
>>>>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>>>>>>> that my patch set base on
>>>>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>>>>
>>>>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
>>>>>> So that if possible, I'd like you to test your patches with this patch[1] 
>>>>>> and then if there is no problem, could you send the next patches with patch[1]?
>>>>>>
>>>>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
>>>>>
>>>>>
>>>>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
>>>>> branch based on latest devfreq-next branch.
>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>>>
>>>>> First of all, if possible, I want to test them[1] with your patches in this series.
>>>>> And then if there are no any problem, please let me know. After confirmed from you,
>>>>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
>>>>> How about that?
>>>>>
>>>> Hi Chanwoo~
>>>>
>>>> We will use this on Google Chrome project.
>>>> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
>>>>
>>>>     make sure cci devfreqs runs with cpufreq.
>>>>     suspend resume
>>>>     speedometer2 benchmark
>>>> It is okay.
>>>>
>>>> Please send the patches of devfreq-testing-passive-gov[1] branch.
>>>>
>>>> I will send patch v9 base on yours latter.
>>>
>>> Thanks for your test. I'll send the patches today.
>>
>> I'm sorry for delay because when I tested the patches
>> for devfreq parent type on Odroid-xu3, there are some problem
>> related to lazy linking of OPP. So I'm trying to analyze them.
>> Unfortunately, we need to postpone these patches to next linux
>> version.
>>
> Hi Chanwoo Choi~
> 
> It is said that you are busy on another task recently.
> May I know your plan on this patch?
> Thank you.

Sorry for late work. I have a question.
When I tested exynos-bus.c with adding the 'required-opp' property
on odroid-xu3 board. I got some fail about 

When calling _set_required_opps(), always _set_required_opp() returns
-EBUSY error because of following lazy linking case[1].

[1] https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/core.c#L896

/* required-opps not fully initialized yet */
if (lazy_linking_pending(opp_table))
	return -EBUSY;  


For calling dev_pm_opp_of_add_table(), lazy_link_required_opp_table() function
will be called. But, there is constraint[2]. If is_genpd of opp_table is false,
driver/opp/of.c cannot resolve the lazy linking issue.

[2]  https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/of.c#L386

/* Link required OPPs for all OPPs of the newly added OPP table */
static void lazy_link_required_opp_table(struct opp_table *new_table)
{
	struct opp_table *opp_table, *temp, **required_opp_tables;
	struct device_node *required_np, *opp_np, *required_table_np;
	struct dev_pm_opp *opp;
	int i, ret;

	/*
	 * We only support genpd's OPPs in the "required-opps" for now,
	 * as we don't know much about other cases.
	 */
	if (!new_table->is_genpd)
		return;

Even if this case, there are no problem on your test case?
andrew-sh.cheng May 31, 2021, 3:22 a.m. UTC | #13
On Wed, 2021-05-26 at 12:08 +0900, Chanwoo Choi wrote:
> Hi,
> On 5/26/21 11:22 AM, andrew-sh.cheng wrote:
> > On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
> >> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
> >>> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
> >>>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
> >>>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
> >>>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> You are missing to add these patches to linux-pm mailing list.
> >>>>>>>> Need to send them to linu-pm ML.
> >>>>>>>>
> >>>>>>>> Also, before received this series, I tried to clean-up these patches
> >>>>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
> >>>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
> >>>>>>>>
> >>>>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
> >>>>>>>> Please update the email or drop this email.
> >>>>>>>
> >>>>>>> Hi Chanwoo,
> >>>>>>>
> >>>>>>> Thank you for the advices.
> >>>>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
> >>>>>>> that my patch set base on
> >>>>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> >>>>>>
> >>>>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
> >>>>>> So that if possible, I'd like you to test your patches with this patch[1] 
> >>>>>> and then if there is no problem, could you send the next patches with patch[1]?
> >>>>>>
> >>>>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
> >>>>>
> >>>>>
> >>>>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
> >>>>> branch based on latest devfreq-next branch.
> >>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
> >>>>>
> >>>>> First of all, if possible, I want to test them[1] with your patches in this series.
> >>>>> And then if there are no any problem, please let me know. After confirmed from you,
> >>>>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
> >>>>> How about that?
> >>>>>
> >>>> Hi Chanwoo~
> >>>>
> >>>> We will use this on Google Chrome project.
> >>>> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
> >>>>
> >>>>     make sure cci devfreqs runs with cpufreq.
> >>>>     suspend resume
> >>>>     speedometer2 benchmark
> >>>> It is okay.
> >>>>
> >>>> Please send the patches of devfreq-testing-passive-gov[1] branch.
> >>>>
> >>>> I will send patch v9 base on yours latter.
> >>>
> >>> Thanks for your test. I'll send the patches today.
> >>
> >> I'm sorry for delay because when I tested the patches
> >> for devfreq parent type on Odroid-xu3, there are some problem
> >> related to lazy linking of OPP. So I'm trying to analyze them.
> >> Unfortunately, we need to postpone these patches to next linux
> >> version.
> >>
> > Hi Chanwoo Choi~
> > 
> > It is said that you are busy on another task recently.
> > May I know your plan on this patch?
> > Thank you.
> 
> Sorry for late work. I have a question.
> When I tested exynos-bus.c with adding the 'required-opp' property
> on odroid-xu3 board. I got some fail about 
> 
> When calling _set_required_opps(), always _set_required_opp() returns
> -EBUSY error because of following lazy linking case[1].
> 
> [1] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/core.c*L896__;Iw!!CTRNKA9wMg0ARbw!3eNxwDZRy-Ev5BHGxT-BxCz4qrNy0NZohQuBGW36krkwOkl_WX8yBmxlqSk9hxp_kxspMJI$ 
> 
> /* required-opps not fully initialized yet */
> if (lazy_linking_pending(opp_table))
> 	return -EBUSY;  
> 
> 
> For calling dev_pm_opp_of_add_table(), lazy_link_required_opp_table() function
> will be called. But, there is constraint[2]. If is_genpd of opp_table is false,
> driver/opp/of.c cannot resolve the lazy linking issue.
> 
> [2]  https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/of.c*L386__;Iw!!CTRNKA9wMg0ARbw!3eNxwDZRy-Ev5BHGxT-BxCz4qrNy0NZohQuBGW36krkwOkl_WX8yBmxlqSk9hxp_QFUVY9E$ 
> 
> /* Link required OPPs for all OPPs of the newly added OPP table */
> static void lazy_link_required_opp_table(struct opp_table *new_table)
> {
> 	struct opp_table *opp_table, *temp, **required_opp_tables;
> 	struct device_node *required_np, *opp_np, *required_table_np;
> 	struct dev_pm_opp *opp;
> 	int i, ret;
> 
> 	/*
> 	 * We only support genpd's OPPs in the "required-opps" for now,
> 	 * as we don't know much about other cases.
> 	 */
> 	if (!new_table->is_genpd)
> 		return;
> 
> Even if this case, there are no problem on your test case?
> 

Hi Chanwoo~
Sorry for late reply.
Yes, we meet similar issue.
Google member Hsin-Yi had helped deal with this issue on Chrome project.

Patch segment:
@ /drivers/opp/of.c

/* Link required OPPs for all OPPs of the newly added OPP table */
static void lazy_link_required_opp_table(struct opp_table *new_table)
{
	struct opp_table *opp_table, *temp, **required_opp_tables;
	struct device_node *required_np, *opp_np, *required_table_np;
	struct dev_pm_opp *opp;
	int i, ret;

+	/*
+	 * We only support genpd's OPPs in the "required-opps" for now,
+	 * as we don't know much about other cases.
+	 */
+	if (!new_table->is_genpd)
+		return;


Hsin-Yi replied this issue in the discussion list in the original lazy
link thread:
https://patchwork.kernel.org/project/linux-pm/patch/20190717222340.137578-4-saravanak@google.com/#23932203

Loop Hsin-YI here.
You can discuss with her if needing more detail.

Thank you both.
Chanwoo Choi May 31, 2021, 7:56 a.m. UTC | #14
Hi,

On 5/31/21 12:22 PM, andrew-sh.cheng wrote:
> On Wed, 2021-05-26 at 12:08 +0900, Chanwoo Choi wrote:
>> Hi,
>> On 5/26/21 11:22 AM, andrew-sh.cheng wrote:
>>> On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
>>>> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
>>>>> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
>>>>>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>>>>>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>>>>>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> You are missing to add these patches to linux-pm mailing list.
>>>>>>>>>> Need to send them to linu-pm ML.
>>>>>>>>>>
>>>>>>>>>> Also, before received this series, I tried to clean-up these patches
>>>>>>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
>>>>>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$ 
>>>>>>>>>>
>>>>>>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>>>>>>>>>> Please update the email or drop this email.
>>>>>>>>>
>>>>>>>>> Hi Chanwoo,
>>>>>>>>>
>>>>>>>>> Thank you for the advices.
>>>>>>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>>>>>>>>> that my patch set base on
>>>>>>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>>>>>>
>>>>>>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
>>>>>>>> So that if possible, I'd like you to test your patches with this patch[1] 
>>>>>>>> and then if there is no problem, could you send the next patches with patch[1]?
>>>>>>>>
>>>>>>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$ 
>>>>>>>
>>>>>>>
>>>>>>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
>>>>>>> branch based on latest devfreq-next branch.
>>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$ 
>>>>>>>
>>>>>>> First of all, if possible, I want to test them[1] with your patches in this series.
>>>>>>> And then if there are no any problem, please let me know. After confirmed from you,
>>>>>>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
>>>>>>> How about that?
>>>>>>>
>>>>>> Hi Chanwoo~
>>>>>>
>>>>>> We will use this on Google Chrome project.
>>>>>> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
>>>>>>
>>>>>>     make sure cci devfreqs runs with cpufreq.
>>>>>>     suspend resume
>>>>>>     speedometer2 benchmark
>>>>>> It is okay.
>>>>>>
>>>>>> Please send the patches of devfreq-testing-passive-gov[1] branch.
>>>>>>
>>>>>> I will send patch v9 base on yours latter.
>>>>>
>>>>> Thanks for your test. I'll send the patches today.
>>>>
>>>> I'm sorry for delay because when I tested the patches
>>>> for devfreq parent type on Odroid-xu3, there are some problem
>>>> related to lazy linking of OPP. So I'm trying to analyze them.
>>>> Unfortunately, we need to postpone these patches to next linux
>>>> version.
>>>>
>>> Hi Chanwoo Choi~
>>>
>>> It is said that you are busy on another task recently.
>>> May I know your plan on this patch?
>>> Thank you.
>>
>> Sorry for late work. I have a question.
>> When I tested exynos-bus.c with adding the 'required-opp' property
>> on odroid-xu3 board. I got some fail about 
>>
>> When calling _set_required_opps(), always _set_required_opp() returns
>> -EBUSY error because of following lazy linking case[1].
>>
>> [1] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/core.c*L896__;Iw!!CTRNKA9wMg0ARbw!3eNxwDZRy-Ev5BHGxT-BxCz4qrNy0NZohQuBGW36krkwOkl_WX8yBmxlqSk9hxp_kxspMJI$ 
>>
>> /* required-opps not fully initialized yet */
>> if (lazy_linking_pending(opp_table))
>> 	return -EBUSY;  
>>
>>
>> For calling dev_pm_opp_of_add_table(), lazy_link_required_opp_table() function
>> will be called. But, there is constraint[2]. If is_genpd of opp_table is false,
>> driver/opp/of.c cannot resolve the lazy linking issue.
>>
>> [2]  https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/of.c*L386__;Iw!!CTRNKA9wMg0ARbw!3eNxwDZRy-Ev5BHGxT-BxCz4qrNy0NZohQuBGW36krkwOkl_WX8yBmxlqSk9hxp_QFUVY9E$ 
>>
>> /* Link required OPPs for all OPPs of the newly added OPP table */
>> static void lazy_link_required_opp_table(struct opp_table *new_table)
>> {
>> 	struct opp_table *opp_table, *temp, **required_opp_tables;
>> 	struct device_node *required_np, *opp_np, *required_table_np;
>> 	struct dev_pm_opp *opp;
>> 	int i, ret;
>>
>> 	/*
>> 	 * We only support genpd's OPPs in the "required-opps" for now,
>> 	 * as we don't know much about other cases.
>> 	 */
>> 	if (!new_table->is_genpd)
>> 		return;
>>
>> Even if this case, there are no problem on your test case?
>>
> 
> Hi Chanwoo~
> Sorry for late reply.
> Yes, we meet similar issue.
> Google member Hsin-Yi had helped deal with this issue on Chrome project.
> 
> Patch segment:
> @ /drivers/opp/of.c
> 
> /* Link required OPPs for all OPPs of the newly added OPP table */
> static void lazy_link_required_opp_table(struct opp_table *new_table)
> {
> 	struct opp_table *opp_table, *temp, **required_opp_tables;
> 	struct device_node *required_np, *opp_np, *required_table_np;
> 	struct dev_pm_opp *opp;
> 	int i, ret;
> 
> +	/*
> +	 * We only support genpd's OPPs in the "required-opps" for now,
> +	 * as we don't know much about other cases.
> +	 */
> +	if (!new_table->is_genpd)
> +		return;
> 
> 
> Hsin-Yi replied this issue in the discussion list in the original lazy
> link thread:
> https://patchwork.kernel.org/project/linux-pm/patch/20190717222340.137578-4-saravanak@google.com/#23932203
> 
> Loop Hsin-YI here.
> You can discuss with her if needing more detail.
> 
> Thank you both.
> 

Thanks. First of all, we need to resolve and discuss this issue.
Chanwoo Choi May 31, 2021, 8:13 a.m. UTC | #15
On 5/31/21 4:42 PM, Hsin-Yi Wang wrote:
> 
> 
> On Mon, May 31, 2021 at 3:37 PM Chanwoo Choi <cw00.choi@samsung.com <mailto:cw00.choi@samsung.com>> wrote:
> 
>     Hi,
> 
>     On 5/31/21 12:22 PM, andrew-sh.cheng wrote:
>     > On Wed, 2021-05-26 at 12:08 +0900, Chanwoo Choi wrote:
>     >> Hi,
>     >> On 5/26/21 11:22 AM, andrew-sh.cheng wrote:
>     >>> On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
>     >>>> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
>     >>>>> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
>     >>>>>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>     >>>>>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>     >>>>>>>> Hi,
>     >>>>>>>>
>     >>>>>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>     >>>>>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>     >>>>>>>>>> Hi,
>     >>>>>>>>>>
>     >>>>>>>>>> You are missing to add these patches to linux-pm mailing list.
>     >>>>>>>>>> Need to send them to linu-pm ML.
>     >>>>>>>>>>
>     >>>>>>>>>> Also, before received this series, I tried to clean-up these patches
>     >>>>>>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
>     >>>>>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!zIrzeDp9vPnm1_SDzVPuzqdHn3zWie9DnfBXaA-j9-CSrVc6aR9_rJQQiw81_CgAPh9XRRs$
>     >>>>>>>>>>
>     >>>>>>>>>> And 'Saravana Kannan <skannan@codeaurora.org <mailto:skannan@codeaurora.org>>' is wrong email address.
>     >>>>>>>>>> Please update the email or drop this email.
>     >>>>>>>>>
>     >>>>>>>>> Hi Chanwoo,
>     >>>>>>>>>
>     >>>>>>>>> Thank you for the advices.
>     >>>>>>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>     >>>>>>>>> that my patch set base on
>     >>>>>>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$
>     >>>>>>>>
>     >>>>>>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
>     >>>>>>>> So that if possible, I'd like you to test your patches with this patch[1]
>     >>>>>>>> and then if there is no problem, could you send the next patches with patch[1]?
>     >>>>>>>>
>     >>>>>>>> [1]https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJR2cQqZs$
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
>     >>>>>>> branch based on latest devfreq-next branch.
>     >>>>>>> [1] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov__;!!CTRNKA9wMg0ARbw!yUlsuxrL5PcbF7o6A9DlCfvoA6w8V8VXKjYIybYyiJg3D0HM-lI2xRuxLUV6b3UJ8WFhg_g$
>     >>>>>>>
>     >>>>>>> First of all, if possible, I want to test them[1] with your patches in this series.
>     >>>>>>> And then if there are no any problem, please let me know. After confirmed from you,
>     >>>>>>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
>     >>>>>>> How about that?
>     >>>>>>>
>     >>>>>> Hi Chanwoo~
>     >>>>>>
>     >>>>>> We will use this on Google Chrome project.
>     >>>>>> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
>     >>>>>>
>     >>>>>>     make sure cci devfreqs runs with cpufreq.
>     >>>>>>     suspend resume
>     >>>>>>     speedometer2 benchmark
>     >>>>>> It is okay.
>     >>>>>>
>     >>>>>> Please send the patches of devfreq-testing-passive-gov[1] branch.
>     >>>>>>
>     >>>>>> I will send patch v9 base on yours latter.
>     >>>>>
>     >>>>> Thanks for your test. I'll send the patches today.
>     >>>>
>     >>>> I'm sorry for delay because when I tested the patches
>     >>>> for devfreq parent type on Odroid-xu3, there are some problem
>     >>>> related to lazy linking of OPP. So I'm trying to analyze them.
>     >>>> Unfortunately, we need to postpone these patches to next linux
>     >>>> version.
>     >>>>
>     >>> Hi Chanwoo Choi~
>     >>>
>     >>> It is said that you are busy on another task recently.
>     >>> May I know your plan on this patch?
>     >>> Thank you.
>     >>
>     >> Sorry for late work. I have a question.
>     >> When I tested exynos-bus.c with adding the 'required-opp' property
>     >> on odroid-xu3 board. I got some fail about
>     >>
>     >> When calling _set_required_opps(), always _set_required_opp() returns
>     >> -EBUSY error because of following lazy linking case[1].
>     >>
>     >> [1] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/core.c*L896__;Iw!!CTRNKA9wMg0ARbw!3eNxwDZRy-Ev5BHGxT-BxCz4qrNy0NZohQuBGW36krkwOkl_WX8yBmxlqSk9hxp_kxspMJI$
>     >>
>     >> /* required-opps not fully initialized yet */
>     >> if (lazy_linking_pending(opp_table))
>     >>      return -EBUSY; 
>     >>
>     >>
>     >> For calling dev_pm_opp_of_add_table(), lazy_link_required_opp_table() function
>     >> will be called. But, there is constraint[2]. If is_genpd of opp_table is false,
>     >> driver/opp/of.c cannot resolve the lazy linking issue.
>     >>
>     >> [2]  https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/of.c*L386__;Iw!!CTRNKA9wMg0ARbw!3eNxwDZRy-Ev5BHGxT-BxCz4qrNy0NZohQuBGW36krkwOkl_WX8yBmxlqSk9hxp_QFUVY9E$
>     >>
>     >> /* Link required OPPs for all OPPs of the newly added OPP table */
>     >> static void lazy_link_required_opp_table(struct opp_table *new_table)
>     >> {
>     >>      struct opp_table *opp_table, *temp, **required_opp_tables;
>     >>      struct device_node *required_np, *opp_np, *required_table_np;
>     >>      struct dev_pm_opp *opp;
>     >>      int i, ret;
>     >>
>     >>      /*
>     >>       * We only support genpd's OPPs in the "required-opps" for now,
>     >>       * as we don't know much about other cases.
>     >>       */
>     >>      if (!new_table->is_genpd)
>     >>              return;
>     >>
>     >> Even if this case, there are no problem on your test case?
>     >>
>     >
>     > Hi Chanwoo~
>     > Sorry for late reply.
>     > Yes, we meet similar issue.
>     > Google member Hsin-Yi had helped deal with this issue on Chrome project.
>     >
>     > Patch segment:
>     > @ /drivers/opp/of.c
>     >
>     > /* Link required OPPs for all OPPs of the newly added OPP table */
>     > static void lazy_link_required_opp_table(struct opp_table *new_table)
>     > {
>     >       struct opp_table *opp_table, *temp, **required_opp_tables;
>     >       struct device_node *required_np, *opp_np, *required_table_np;
>     >       struct dev_pm_opp *opp;
>     >       int i, ret;
>     >
>     > +     /*
>     > +      * We only support genpd's OPPs in the "required-opps" for now,
>     > +      * as we don't know much about other cases.
>     > +      */
>     > +     if (!new_table->is_genpd)
>     > +             return;
>     >
>     >
>     > Hsin-Yi replied this issue in the discussion list in the original lazy
>     > link thread:
>     > https://patchwork.kernel.org/project/linux-pm/patch/20190717222340.137578-4-saravanak@google.com/#23932203
>     >
>     > Loop Hsin-YI here.
>     > You can discuss with her if needing more detail.
>     >
>     > Thank you both.
>     >
> 
>     Thanks. First of all, we need to resolve and discuss this issue.
> 
> 
> Hi Chanwoo, 
> 
> We think removing the genpd check is sufficient for our use case since we only use the lazy link for opp table translation.

Hi Hsin-Yi,

IMHO, I think 'is_genpd' checking should be removed for devices except for genpd
like as following:

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index c582a9ca397b..b54d3a985515 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -201,17 +201,6 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
                        lazy = true;
                        continue;
                }
-
-               /*
-                * We only support genpd's OPPs in the "required-opps" for now,
-                * as we don't know how much about other cases. Error out if the
-                * required OPP doesn't belong to a genpd.
-                */
-               if (!required_opp_tables[i]->is_genpd) {
-                       dev_err(dev, "required-opp doesn't belong to genpd: %pOF\n",
-                               required_np);
-                       goto free_required_tables;
-               }
        }
 
        /* Let's do the linking later on */
@@ -379,13 +368,6 @@ static void lazy_link_required_opp_table(struct opp_table *new_table)
        struct dev_pm_opp *opp;
        int i, ret;
 
-       /*
-        * We only support genpd's OPPs in the "required-opps" for now,
-        * as we don't know much about other cases.
-        */
-       if (!new_table->is_genpd)
-               return;
-
        mutex_lock(&opp_table_lock);
 
        list_for_each_entry_safe(opp_table, temp, &lazy_opp_tables, lazy) {
@@ -874,7 +856,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
                return ERR_PTR(-ENOMEM);
 
        ret = _read_opp_key(new_opp, opp_table, np, &rate_not_available);
-       if (ret < 0 && !opp_table->is_genpd) {
+       if (ret < 0) {
                dev_err(dev, "%s: opp key field not found\n", __func__);
                goto free_opp;
        }
diff mbox series

Patch

diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 00704efe6398..f56132b0ae64 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -73,6 +73,8 @@  config DEVFREQ_GOV_PASSIVE
 	  device. This governor does not change the frequency by itself
 	  through sysfs entries. The passive governor recommends that
 	  devfreq device uses the OPP table to get the frequency/voltage.
+	  Alternatively the governor can also be chosen to scale based on
+	  the online CPUs current frequency.
 
 comment "DEVFREQ Drivers"
 
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index b094132bd20b..9cc57b083839 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -8,11 +8,103 @@ 
  */
 
 #include <linux/module.h>
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
 #include <linux/device.h>
 #include <linux/devfreq.h>
+#include <linux/slab.h>
 #include "governor.h"
 
-static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
+struct devfreq_cpu_state {
+	unsigned int curr_freq;
+	unsigned int min_freq;
+	unsigned int max_freq;
+	unsigned int first_cpu;
+	struct device *cpu_dev;
+	struct opp_table *opp_table;
+};
+
+static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
+					      unsigned int cpu)
+{
+	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
+	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
+
+	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	unsigned long *dev_freq_table = devfreq->profile->freq_table;
+	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
+	unsigned long cpu_curr_freq, freq;
+
+	if (!cpu_state || cpu_state->first_cpu != cpu ||
+	    !cpu_state->opp_table || !devfreq->opp_table)
+		return 0;
+
+	cpu_curr_freq = cpu_state->curr_freq * 1000;
+	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
+	if (IS_ERR(p_opp))
+		return 0;
+
+	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
+					    devfreq->opp_table, p_opp);
+	dev_pm_opp_put(p_opp);
+
+	if (!IS_ERR(opp)) {
+		freq = dev_pm_opp_get_freq(opp);
+		dev_pm_opp_put(opp);
+		goto out;
+	}
+
+	/* Use Interpolation if required opps is not available */
+	cpu_min_freq = cpu_state->min_freq;
+	cpu_max_freq = cpu_state->max_freq;
+	cpu_curr_freq_khz = cpu_state->curr_freq;
+
+	if (dev_freq_table) {
+		/* Get minimum frequency according to sorting order */
+		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
+		if (dev_freq_table[0] < dev_max_state) {
+			dev_min_freq = dev_freq_table[0];
+			dev_max_freq = dev_max_state;
+		} else {
+			dev_min_freq = dev_max_state;
+			dev_max_freq = dev_freq_table[0];
+		}
+	} else {
+		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
+						     DEV_PM_QOS_MIN_FREQUENCY);
+		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
+						     DEV_PM_QOS_MAX_FREQUENCY);
+
+		if (dev_max_freq <= dev_min_freq)
+			return 0;
+	}
+	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
+	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
+
+out:
+	return freq;
+}
+
+static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
+					unsigned long *freq)
+{
+	struct devfreq_passive_data *p_data =
+				(struct devfreq_passive_data *)devfreq->data;
+	unsigned int cpu;
+	unsigned long target_freq = 0;
+
+	for_each_online_cpu(cpu)
+		target_freq = max(target_freq,
+				  xlate_cpufreq_to_devfreq(p_data, cpu));
+
+	*freq = target_freq;
+
+	return 0;
+}
+
+static int get_target_freq_with_devfreq(struct devfreq *devfreq,
 					unsigned long *freq)
 {
 	struct devfreq_passive_data *p_data
@@ -23,14 +115,6 @@  static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
 	int i, count;
 
 	/*
-	 * If the devfreq device with passive governor has the specific method
-	 * to determine the next frequency, should use the get_target_freq()
-	 * of struct devfreq_passive_data.
-	 */
-	if (p_data->get_target_freq)
-		return p_data->get_target_freq(devfreq, freq);
-
-	/*
 	 * If the parent and passive devfreq device uses the OPP table,
 	 * get the next frequency by using the OPP table.
 	 */
@@ -98,6 +182,37 @@  static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
 	return 0;
 }
 
+static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
+					   unsigned long *freq)
+{
+	struct devfreq_passive_data *p_data =
+		(struct devfreq_passive_data *)devfreq->data;
+	int ret;
+
+	/*
+	 * If the devfreq device with passive governor has the specific method
+	 * to determine the next frequency, should use the get_target_freq()
+	 * of struct devfreq_passive_data.
+	 */
+	if (p_data->get_target_freq)
+		return p_data->get_target_freq(devfreq, freq);
+
+	switch (p_data->parent_type) {
+	case DEVFREQ_PARENT_DEV:
+		ret = get_target_freq_with_devfreq(devfreq, freq);
+		break;
+	case CPUFREQ_PARENT_DEV:
+		ret = get_target_freq_with_cpufreq(devfreq, freq);
+		break;
+	default:
+		ret = -EINVAL;
+		dev_err(&devfreq->dev, "Invalid parent type\n");
+		break;
+	}
+
+	return ret;
+}
+
 static int devfreq_passive_notifier_call(struct notifier_block *nb,
 				unsigned long event, void *ptr)
 {
@@ -130,16 +245,200 @@  static int devfreq_passive_notifier_call(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+static int cpufreq_passive_notifier_call(struct notifier_block *nb,
+					 unsigned long event, void *ptr)
+{
+	struct devfreq_passive_data *data =
+			container_of(nb, struct devfreq_passive_data, nb);
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	struct devfreq_cpu_state *cpu_state;
+	struct cpufreq_freqs *cpu_freq = ptr;
+	unsigned int curr_freq;
+	int ret;
+
+	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
+	    !data->cpu_state[cpu_freq->policy->cpu])
+		return 0;
+
+	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
+	if (cpu_state->curr_freq == cpu_freq->new)
+		return 0;
+
+	/* Backup current freq and pre-update cpu state freq*/
+	curr_freq = cpu_state->curr_freq;
+	cpu_state->curr_freq = cpu_freq->new;
+
+	mutex_lock(&devfreq->lock);
+	ret = update_devfreq(devfreq);
+	mutex_unlock(&devfreq->lock);
+	if (ret) {
+		cpu_state->curr_freq = curr_freq;
+		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
+{
+	struct devfreq_passive_data *data = *p_data;
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	struct device *dev = devfreq->dev.parent;
+	struct opp_table *opp_table = NULL;
+	struct devfreq_cpu_state *cpu_state;
+	struct cpufreq_policy *policy;
+	struct device *cpu_dev;
+	unsigned int cpu;
+	int ret;
+
+	get_online_cpus();
+
+	data->nb.notifier_call = cpufreq_passive_notifier_call;
+	ret = cpufreq_register_notifier(&data->nb,
+					CPUFREQ_TRANSITION_NOTIFIER);
+	if (ret) {
+		dev_err(dev, "Couldn't register cpufreq notifier.\n");
+		data->nb.notifier_call = NULL;
+		goto out;
+	}
+
+	/* Populate devfreq_cpu_state */
+	for_each_online_cpu(cpu) {
+		if (data->cpu_state[cpu])
+			continue;
+
+		policy = cpufreq_cpu_get(cpu);
+		if (!policy) {
+			ret = -EINVAL;
+			goto out;
+		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
+			ret = -EPROBE_DEFER;
+			goto out;
+		} else if (IS_ERR(policy)) {
+			ret = PTR_ERR(policy);
+			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
+			goto out;
+		}
+
+		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
+		if (!cpu_state) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		cpu_dev = get_cpu_device(cpu);
+		if (!cpu_dev) {
+			dev_err(dev, "Couldn't get cpu device.\n");
+			ret = -ENODEV;
+			goto out;
+		}
+
+		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
+		if (IS_ERR(devfreq->opp_table)) {
+			ret = PTR_ERR(opp_table);
+			goto out;
+		}
+
+		cpu_state->cpu_dev = cpu_dev;
+		cpu_state->opp_table = opp_table;
+		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
+		cpu_state->curr_freq = policy->cur;
+		cpu_state->min_freq = policy->cpuinfo.min_freq;
+		cpu_state->max_freq = policy->cpuinfo.max_freq;
+		data->cpu_state[cpu] = cpu_state;
+
+		cpufreq_cpu_put(policy);
+	}
+
+out:
+	put_online_cpus();
+	if (ret)
+		return ret;
+
+	/* Update devfreq */
+	mutex_lock(&devfreq->lock);
+	ret = update_devfreq(devfreq);
+	mutex_unlock(&devfreq->lock);
+	if (ret)
+		dev_err(dev, "Couldn't update the frequency.\n");
+
+	return ret;
+}
+
+static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
+{
+	struct devfreq_passive_data *data = *p_data;
+	struct devfreq_cpu_state *cpu_state;
+	int cpu;
+
+	if (data->nb.notifier_call)
+		cpufreq_unregister_notifier(&data->nb,
+					    CPUFREQ_TRANSITION_NOTIFIER);
+
+	for_each_possible_cpu(cpu) {
+		cpu_state = data->cpu_state[cpu];
+		if (cpu_state) {
+			if (cpu_state->opp_table)
+				dev_pm_opp_put_opp_table(cpu_state->opp_table);
+			kfree(cpu_state);
+			cpu_state = NULL;
+		}
+	}
+
+	return 0;
+}
+
+int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
+{
+	struct notifier_block *nb = &(*p_data)->nb;
+	int ret = 0;
+
+	switch ((*p_data)->parent_type) {
+	case DEVFREQ_PARENT_DEV:
+		nb->notifier_call = devfreq_passive_notifier_call;
+		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
+						DEVFREQ_TRANSITION_NOTIFIER);
+		break;
+	case CPUFREQ_PARENT_DEV:
+		ret = cpufreq_passive_register(p_data);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
+{
+	int ret = 0;
+
+	switch ((*p_data)->parent_type) {
+	case DEVFREQ_PARENT_DEV:
+		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
+						    &(*p_data)->nb,
+						    DEVFREQ_TRANSITION_NOTIFIER));
+		break;
+	case CPUFREQ_PARENT_DEV:
+		cpufreq_passive_unregister(p_data);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
 static int devfreq_passive_event_handler(struct devfreq *devfreq,
 				unsigned int event, void *data)
 {
 	struct devfreq_passive_data *p_data
 			= (struct devfreq_passive_data *)devfreq->data;
 	struct devfreq *parent = (struct devfreq *)p_data->parent;
-	struct notifier_block *nb = &p_data->nb;
 	int ret = 0;
 
-	if (!parent)
+	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
 		return -EPROBE_DEFER;
 
 	switch (event) {
@@ -147,13 +446,11 @@  static int devfreq_passive_event_handler(struct devfreq *devfreq,
 		if (!p_data->this)
 			p_data->this = devfreq;
 
-		nb->notifier_call = devfreq_passive_notifier_call;
-		ret = devfreq_register_notifier(parent, nb,
-					DEVFREQ_TRANSITION_NOTIFIER);
+		ret = register_parent_dev_notifier(&p_data);
 		break;
+
 	case DEVFREQ_GOV_STOP:
-		WARN_ON(devfreq_unregister_notifier(parent, nb,
-					DEVFREQ_TRANSITION_NOTIFIER));
+		ret = unregister_parent_dev_notifier(&p_data);
 		break;
 	default:
 		break;
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 26ea0850be9b..e0093b7c805c 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -280,6 +280,25 @@  struct devfreq_simple_ondemand_data {
 
 #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
 /**
+ * struct devfreq_cpu_state - holds the per-cpu state
+ * @freq:	the current frequency of the cpu.
+ * @min_freq:	the min frequency of the cpu.
+ * @max_freq:	the max frequency of the cpu.
+ * @first_cpu:	the cpumask of the first cpu of a policy.
+ * @dev:	reference to cpu device.
+ * @opp_table:	reference to cpu opp table.
+ *
+ * This structure stores the required cpu_state of a cpu.
+ * This is auto-populated by the governor.
+ */
+struct devfreq_cpu_state;
+
+enum devfreq_parent_dev_type {
+	DEVFREQ_PARENT_DEV,
+	CPUFREQ_PARENT_DEV,
+};
+
+/**
  * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
  *	and devfreq_add_device
  * @parent:	the devfreq instance of parent device.
@@ -290,13 +309,15 @@  struct devfreq_simple_ondemand_data {
  *			using governors except for passive governor.
  *			If the devfreq device has the specific method to decide
  *			the next frequency, should use this callback.
+ * @parent_type:	parent type of the device
  * @this:	the devfreq instance of own device.
  * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
+ * @cpu_state:		the state min/max/current frequency of all online cpu's
  *
  * The devfreq_passive_data have to set the devfreq instance of parent
  * device with governors except for the passive governor. But, don't need to
- * initialize the 'this' and 'nb' field because the devfreq core will handle
- * them.
+ * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
+ * will handle them.
  */
 struct devfreq_passive_data {
 	/* Should set the devfreq instance of parent device */
@@ -305,9 +326,13 @@  struct devfreq_passive_data {
 	/* Optional callback to decide the next frequency of passvice device */
 	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
 
+	/* Should set the type of parent device */
+	enum devfreq_parent_dev_type parent_type;
+
 	/* For passive governor's internal use. Don't need to set them */
 	struct devfreq *this;
 	struct notifier_block nb;
+	struct devfreq_cpu_state *cpu_state[NR_CPUS];
 };
 #endif