From patchwork Wed Nov 29 11:08:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472620 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B941510F0; Wed, 29 Nov 2023 03:08:04 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AF04BC15; Wed, 29 Nov 2023 03:08:51 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 149003F5A1; Wed, 29 Nov 2023 03:08:01 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 01/23] PM: EM: Add missing newline for the message log Date: Wed, 29 Nov 2023 11:08:31 +0000 Message-Id: <20231129110853.94344-2-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Fix missing newline for the string long in the error code path. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 7b44f5b89fa1..8b9dd4a39f63 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -250,7 +250,7 @@ static void em_cpufreq_update_efficiencies(struct device *dev) policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd))); if (!policy) { - dev_warn(dev, "EM: Access to CPUFreq policy failed"); + dev_warn(dev, "EM: Access to CPUFreq policy failed\n"); return; } From patchwork Wed Nov 29 11:08:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472621 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7531810F0; Wed, 29 Nov 2023 03:08:07 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 667552F4; Wed, 29 Nov 2023 03:08:54 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id C95F03F5A1; Wed, 29 Nov 2023 03:08:04 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 02/23] PM: EM: Refactor em_cpufreq_update_efficiencies() arguments Date: Wed, 29 Nov 2023 11:08:32 +0000 Message-Id: <20231129110853.94344-3-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In order to prepare the code for the modifiable EM perf_state table, refactor existing function em_cpufreq_update_efficiencies(). Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 8b9dd4a39f63..42486674b834 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -237,10 +237,10 @@ static int em_create_pd(struct device *dev, int nr_states, return 0; } -static void em_cpufreq_update_efficiencies(struct device *dev) +static void +em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table) { struct em_perf_domain *pd = dev->em_pd; - struct em_perf_state *table; struct cpufreq_policy *policy; int found = 0; int i; @@ -254,8 +254,6 @@ static void em_cpufreq_update_efficiencies(struct device *dev) return; } - table = pd->table; - for (i = 0; i < pd->nr_perf_states; i++) { if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT)) continue; @@ -397,7 +395,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, dev->em_pd->flags |= flags; - em_cpufreq_update_efficiencies(dev); + em_cpufreq_update_efficiencies(dev, dev->em_pd->table); em_debug_create_pd(dev); dev_info(dev, "EM: created perf domain\n"); From patchwork Wed Nov 29 11:08:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472622 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 240BA1998; Wed, 29 Nov 2023 03:08:10 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1BADA1424; Wed, 29 Nov 2023 03:08:57 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 7EEDF3F5A1; Wed, 29 Nov 2023 03:08:07 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 03/23] PM: EM: Find first CPU active while updating OPP efficiency Date: Wed, 29 Nov 2023 11:08:33 +0000 Message-Id: <20231129110853.94344-4-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The Energy Model might be updated at runtime and the energy efficiency for each OPP may change. Thus, there is a need to update also the cpufreq framework and make it aligned to the new values. In order to do that, use a first active CPU from the Performance Domain. This is needed since the first CPU in the cpumask might be offline when we run this code path. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 42486674b834..aa7c89f9e115 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -243,12 +243,19 @@ em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table) struct em_perf_domain *pd = dev->em_pd; struct cpufreq_policy *policy; int found = 0; - int i; + int i, cpu; if (!_is_cpu_device(dev) || !pd) return; - policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd))); + /* Try to get a CPU which is active and in this PD */ + cpu = cpumask_first_and(em_span_cpus(pd), cpu_active_mask); + if (cpu >= nr_cpu_ids) { + dev_warn(dev, "EM: No online CPU for CPUFreq policy\n"); + return; + } + + policy = cpufreq_cpu_get(cpu); if (!policy) { dev_warn(dev, "EM: Access to CPUFreq policy failed\n"); return; From patchwork Wed Nov 29 11:08:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472623 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D708310FC; Wed, 29 Nov 2023 03:08:12 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C3B712F4; Wed, 29 Nov 2023 03:08:59 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 343DB3F5A1; Wed, 29 Nov 2023 03:08:10 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 04/23] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible Date: Wed, 29 Nov 2023 11:08:34 +0000 Message-Id: <20231129110853.94344-5-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The Energy Model (EM) is going to support runtime modification. There are going to be 2 EM tables which store information. This patch aims to prepare the code to be generic and use one of the tables. The function will no longer get a pointer to 'struct em_perf_domain' (the EM) but instead a pointer to 'struct em_perf_state' (which is one of the EM's tables). Prepare em_pd_get_efficient_state() for the upcoming changes and make it possible to re-use. Return an index for the best performance state for a given EM table. The function arguments that are introduced should allow to work on different performance state arrays. The caller of em_pd_get_efficient_state() should be able to use the index either on the default or the modifiable EM table. Signed-off-by: Lukasz Luba Reviewed-by: Daniel Lezcano --- include/linux/energy_model.h | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index b9caa01dfac4..8069f526c9d8 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -175,33 +175,35 @@ void em_dev_unregister_perf_domain(struct device *dev); /** * em_pd_get_efficient_state() - Get an efficient performance state from the EM - * @pd : Performance domain for which we want an efficient frequency - * @freq : Frequency to map with the EM + * @state: List of performance states, in ascending order + * @nr_perf_states: Number of performance states + * @freq: Frequency to map with the EM + * @pd_flags: Performance Domain flags * * It is called from the scheduler code quite frequently and as a consequence * doesn't implement any check. * - * Return: An efficient performance state, high enough to meet @freq + * Return: An efficient performance state id, high enough to meet @freq * requirement. */ -static inline -struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd, - unsigned long freq) +static inline int +em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states, + unsigned long freq, unsigned long pd_flags) { struct em_perf_state *ps; int i; - for (i = 0; i < pd->nr_perf_states; i++) { - ps = &pd->table[i]; + for (i = 0; i < nr_perf_states; i++) { + ps = &table[i]; if (ps->frequency >= freq) { - if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES && + if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES && ps->flags & EM_PERF_STATE_INEFFICIENT) continue; - break; + return i; } } - return ps; + return nr_perf_states - 1; } /** @@ -226,7 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, { unsigned long freq, scale_cpu; struct em_perf_state *ps; - int cpu; + int cpu, i; if (!sum_util) return 0; @@ -251,7 +253,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. */ - ps = em_pd_get_efficient_state(pd, freq); + i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq, + pd->flags); + ps = &pd->table[i]; /* * The capacity of a CPU in the domain at the performance state (ps) From patchwork Wed Nov 29 11:08:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472624 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B7D671BD3; Wed, 29 Nov 2023 03:08:15 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 789412F4; Wed, 29 Nov 2023 03:09:02 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id DC6E53F5A1; Wed, 29 Nov 2023 03:08:12 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 05/23] PM: EM: Refactor a new function em_compute_costs() Date: Wed, 29 Nov 2023 11:08:35 +0000 Message-Id: <20231129110853.94344-6-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Refactor a dedicated function which will be easier to maintain and re-use in future. The upcoming changes for the modifiable EM perf_state table will use it (instead of duplicating the code). This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 72 ++++++++++++++++++++++--------------- 1 file changed, 43 insertions(+), 29 deletions(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index aa7c89f9e115..3bea930410c6 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -103,14 +103,52 @@ static void em_debug_create_pd(struct device *dev) {} static void em_debug_remove_pd(struct device *dev) {} #endif +static int em_compute_costs(struct device *dev, struct em_perf_state *table, + struct em_data_callback *cb, int nr_states, + unsigned long flags) +{ + unsigned long prev_cost = ULONG_MAX; + u64 fmax; + int i, ret; + + /* Compute the cost of each performance state. */ + fmax = (u64) table[nr_states - 1].frequency; + for (i = nr_states - 1; i >= 0; i--) { + unsigned long power_res, cost; + + if (flags & EM_PERF_DOMAIN_ARTIFICIAL) { + ret = cb->get_cost(dev, table[i].frequency, &cost); + if (ret || !cost || cost > EM_MAX_POWER) { + dev_err(dev, "EM: invalid cost %lu %d\n", + cost, ret); + return -EINVAL; + } + } else { + power_res = table[i].power; + cost = div64_u64(fmax * power_res, table[i].frequency); + } + + table[i].cost = cost; + + if (table[i].cost >= prev_cost) { + table[i].flags = EM_PERF_STATE_INEFFICIENT; + dev_dbg(dev, "EM: OPP:%lu is inefficient\n", + table[i].frequency); + } else { + prev_cost = table[i].cost; + } + } + + return 0; +} + static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, int nr_states, struct em_data_callback *cb, unsigned long flags) { - unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX; + unsigned long power, freq, prev_freq = 0; struct em_perf_state *table; int i, ret; - u64 fmax; table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL); if (!table) @@ -154,33 +192,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, table[i].frequency = prev_freq = freq; } - /* Compute the cost of each performance state. */ - fmax = (u64) table[nr_states - 1].frequency; - for (i = nr_states - 1; i >= 0; i--) { - unsigned long power_res, cost; - - if (flags & EM_PERF_DOMAIN_ARTIFICIAL) { - ret = cb->get_cost(dev, table[i].frequency, &cost); - if (ret || !cost || cost > EM_MAX_POWER) { - dev_err(dev, "EM: invalid cost %lu %d\n", - cost, ret); - goto free_ps_table; - } - } else { - power_res = table[i].power; - cost = div64_u64(fmax * power_res, table[i].frequency); - } - - table[i].cost = cost; - - if (table[i].cost >= prev_cost) { - table[i].flags = EM_PERF_STATE_INEFFICIENT; - dev_dbg(dev, "EM: OPP:%lu is inefficient\n", - table[i].frequency); - } else { - prev_cost = table[i].cost; - } - } + ret = em_compute_costs(dev, table, cb, nr_states, flags); + if (ret) + goto free_ps_table; pd->table = table; pd->nr_perf_states = nr_states; From patchwork Wed Nov 29 11:08:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472625 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4D3E71BEA; Wed, 29 Nov 2023 03:08:18 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 36D912F4; Wed, 29 Nov 2023 03:09:05 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 917C13F5A1; Wed, 29 Nov 2023 03:08:15 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 06/23] PM: EM: Check if the get_cost() callback is present in em_compute_costs() Date: Wed, 29 Nov 2023 11:08:36 +0000 Message-Id: <20231129110853.94344-7-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subsequent changes will introduce a case in which 'cb->get_cost' may not be set in em_compute_costs(), so add a check to ensure that it is not NULL before attempting to dereference it. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 3bea930410c6..3c8542443dd4 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -116,7 +116,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, for (i = nr_states - 1; i >= 0; i--) { unsigned long power_res, cost; - if (flags & EM_PERF_DOMAIN_ARTIFICIAL) { + if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { ret = cb->get_cost(dev, table[i].frequency, &cost); if (ret || !cost || cost > EM_MAX_POWER) { dev_err(dev, "EM: invalid cost %lu %d\n", From patchwork Wed Nov 29 11:08:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472626 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 093E61FC3; Wed, 29 Nov 2023 03:08:21 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DF2262F4; Wed, 29 Nov 2023 03:09:07 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 506623F5A1; Wed, 29 Nov 2023 03:08:18 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 07/23] PM: EM: Refactor how the EM table is allocated and populated Date: Wed, 29 Nov 2023 11:08:37 +0000 Message-Id: <20231129110853.94344-8-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Split the process of allocation and data initialization for the EM table. The upcoming changes for modifiable EM will use it. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 52 ++++++++++++++++++++++--------------- 1 file changed, 31 insertions(+), 21 deletions(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 3c8542443dd4..99426b5eedb6 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -142,18 +142,25 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, return 0; } +static int em_allocate_perf_table(struct em_perf_domain *pd, + int nr_states) +{ + pd->table = kcalloc(nr_states, sizeof(struct em_perf_state), + GFP_KERNEL); + if (!pd->table) + return -ENOMEM; + + return 0; +} + static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, + struct em_perf_state *table, int nr_states, struct em_data_callback *cb, unsigned long flags) { unsigned long power, freq, prev_freq = 0; - struct em_perf_state *table; int i, ret; - table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL); - if (!table) - return -ENOMEM; - /* Build the list of performance states for this performance domain */ for (i = 0, freq = 0; i < nr_states; i++, freq++) { /* @@ -165,7 +172,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, if (ret) { dev_err(dev, "EM: invalid perf. state: %d\n", ret); - goto free_ps_table; + return -EINVAL; } /* @@ -175,7 +182,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, if (freq <= prev_freq) { dev_err(dev, "EM: non-increasing freq: %lu\n", freq); - goto free_ps_table; + return -EINVAL; } /* @@ -185,7 +192,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, if (!power || power > EM_MAX_POWER) { dev_err(dev, "EM: invalid power: %lu\n", power); - goto free_ps_table; + return -EINVAL; } table[i].power = power; @@ -194,16 +201,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, ret = em_compute_costs(dev, table, cb, nr_states, flags); if (ret) - goto free_ps_table; - - pd->table = table; - pd->nr_perf_states = nr_states; + return -EINVAL; return 0; - -free_ps_table: - kfree(table); - return -EINVAL; } static int em_create_pd(struct device *dev, int nr_states, @@ -234,11 +234,15 @@ static int em_create_pd(struct device *dev, int nr_states, return -ENOMEM; } - ret = em_create_perf_table(dev, pd, nr_states, cb, flags); - if (ret) { - kfree(pd); - return ret; - } + pd->nr_perf_states = nr_states; + + ret = em_allocate_perf_table(pd, nr_states); + if (ret) + goto free_pd; + + ret = em_create_perf_table(dev, pd, pd->table, nr_states, cb, flags); + if (ret) + goto free_pd_table; if (_is_cpu_device(dev)) for_each_cpu(cpu, cpus) { @@ -249,6 +253,12 @@ static int em_create_pd(struct device *dev, int nr_states, dev->em_pd = pd; return 0; + +free_pd_table: + kfree(pd->table); +free_pd: + kfree(pd); + return -EINVAL; } static void From patchwork Wed Nov 29 11:08:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472627 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 070641FCE; Wed, 29 Nov 2023 03:08:23 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 91017C15; Wed, 29 Nov 2023 03:09:10 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 02EED3F5A1; Wed, 29 Nov 2023 03:08:20 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 08/23] PM: EM: Introduce runtime modifiable table Date: Wed, 29 Nov 2023 11:08:38 +0000 Message-Id: <20231129110853.94344-9-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The new runtime table can be populated with a new power data to better reflect the actual efficiency of the device e.g. CPU. The power can vary over time e.g. due to the SoC temperature change. Higher temperature can increase power values. For longer running scenarios, such as game or camera, when also other devices are used (e.g. GPU, ISP) the CPU power can change. The new EM framework is able to addresses this issue and change the EM data at runtime safely. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 12 ++++++++ kernel/power/energy_model.c | 53 ++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 8069f526c9d8..1e618e431cac 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -36,9 +36,20 @@ struct em_perf_state { */ #define EM_PERF_STATE_INEFFICIENT BIT(0) +/** + * struct em_perf_table - Performance states table + * @rcu: RCU used for safe access and destruction + * @state: List of performance states, in ascending order + */ +struct em_perf_table { + struct rcu_head rcu; + struct em_perf_state state[]; +}; + /** * struct em_perf_domain - Performance domain * @table: List of performance states, in ascending order + * @runtime_table: Pointer to the runtime modifiable em_perf_table * @nr_perf_states: Number of performance states * @flags: See "em_perf_domain flags" * @cpus: Cpumask covering the CPUs of the domain. It's here @@ -54,6 +65,7 @@ struct em_perf_state { */ struct em_perf_domain { struct em_perf_state *table; + struct em_perf_table __rcu *runtime_table; int nr_perf_states; unsigned long flags; unsigned long cpus[]; diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 99426b5eedb6..489287666705 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -23,6 +23,9 @@ */ static DEFINE_MUTEX(em_pd_mutex); +static void em_cpufreq_update_efficiencies(struct device *dev, + struct em_perf_state *table); + static bool _is_cpu_device(struct device *dev) { return (dev->bus == &cpu_subsys); @@ -103,6 +106,31 @@ static void em_debug_create_pd(struct device *dev) {} static void em_debug_remove_pd(struct device *dev) {} #endif +static void em_destroy_table_rcu(struct rcu_head *rp) +{ + struct em_perf_table __rcu *runtime_table; + + runtime_table = container_of(rp, struct em_perf_table, rcu); + kfree(runtime_table); +} + +static void em_free_table(struct em_perf_table __rcu *table) +{ + call_rcu(&table->rcu, em_destroy_table_rcu); +} + +static struct em_perf_table __rcu * +em_allocate_table(struct em_perf_domain *pd) +{ + struct em_perf_table __rcu *table; + int table_size; + + table_size = sizeof(struct em_perf_state) * pd->nr_perf_states; + + table = kzalloc(sizeof(*table) + table_size, GFP_KERNEL); + return table; +} + static int em_compute_costs(struct device *dev, struct em_perf_state *table, struct em_data_callback *cb, int nr_states, unsigned long flags) @@ -153,6 +181,24 @@ static int em_allocate_perf_table(struct em_perf_domain *pd, return 0; } +static int em_create_runtime_table(struct em_perf_domain *pd) +{ + struct em_perf_table __rcu *runtime_table; + int table_size; + + runtime_table = em_allocate_table(pd); + if (!runtime_table) + return -ENOMEM; + + /* Initialize runtime table with existing data */ + table_size = sizeof(struct em_perf_state) * pd->nr_perf_states; + memcpy(runtime_table->state, pd->table, table_size); + + rcu_assign_pointer(pd->runtime_table, runtime_table); + + return 0; +} + static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, struct em_perf_state *table, int nr_states, struct em_data_callback *cb, @@ -244,6 +290,10 @@ static int em_create_pd(struct device *dev, int nr_states, if (ret) goto free_pd_table; + ret = em_create_runtime_table(pd); + if (ret) + goto free_pd_table; + if (_is_cpu_device(dev)) for_each_cpu(cpu, cpus) { cpu_dev = get_cpu_device(cpu); @@ -460,6 +510,9 @@ void em_dev_unregister_perf_domain(struct device *dev) em_debug_remove_pd(dev); kfree(dev->em_pd->table); + + em_free_table(dev->em_pd->runtime_table); + kfree(dev->em_pd); dev->em_pd = NULL; mutex_unlock(&em_pd_mutex); From patchwork Wed Nov 29 11:08:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472628 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 6AAA61FEB; Wed, 29 Nov 2023 03:08:26 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 422322F4; Wed, 29 Nov 2023 03:09:13 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id A925C3F5A1; Wed, 29 Nov 2023 03:08:23 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 09/23] PM: EM: Use runtime modified EM for CPUs energy estimation in EAS Date: Wed, 29 Nov 2023 11:08:39 +0000 Message-Id: <20231129110853.94344-10-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The new Energy Model (EM) supports runtime modification of the performance state table to better model the power used by the SoC. Use this new feature to improve energy estimation and therefore task placement in Energy Aware Scheduler (EAS). Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 1e618e431cac..94a77a813724 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -238,6 +238,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long max_util, unsigned long sum_util, unsigned long allowed_cpu_cap) { + struct em_perf_table *runtime_table; unsigned long freq, scale_cpu; struct em_perf_state *ps; int cpu, i; @@ -255,7 +256,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, */ cpu = cpumask_first(to_cpumask(pd->cpus)); scale_cpu = arch_scale_cpu_capacity(cpu); - ps = &pd->table[pd->nr_perf_states - 1]; + + /* + * No rcu_read_lock() since it's already called by task scheduler. + * The runtime_table is always there for CPUs, so we don't check. + */ + runtime_table = rcu_dereference(pd->runtime_table); + + ps = &runtime_table->state[pd->nr_perf_states - 1]; max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); @@ -265,9 +273,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. */ - i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq, - pd->flags); - ps = &pd->table[i]; + i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, + freq, pd->flags); + ps = &runtime_table->state[i]; /* * The capacity of a CPU in the domain at the performance state (ps) From patchwork Wed Nov 29 11:08:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472629 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0DAD01BC5; Wed, 29 Nov 2023 03:08:29 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E51B8C15; Wed, 29 Nov 2023 03:09:15 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5A01B3F5A1; Wed, 29 Nov 2023 03:08:26 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 10/23] PM: EM: Add API for memory allocations for new tables Date: Wed, 29 Nov 2023 11:08:40 +0000 Message-Id: <20231129110853.94344-11-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The runtime modified EM table can be provided from drivers. Create mechanism which allows safely allocate and free the table for device drivers. The same table can be used by the EAS in task scheduler code paths, so make sure the memory is not freed when the device driver module is unloaded. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 11 +++++++++ kernel/power/energy_model.c | 44 ++++++++++++++++++++++++++++++++++-- 2 files changed, 53 insertions(+), 2 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 94a77a813724..e785211828fe 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -39,10 +40,12 @@ struct em_perf_state { /** * struct em_perf_table - Performance states table * @rcu: RCU used for safe access and destruction + * @refcount: Reference count to track the owners * @state: List of performance states, in ascending order */ struct em_perf_table { struct rcu_head rcu; + struct kref refcount; struct em_perf_state state[]; }; @@ -184,6 +187,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, struct em_data_callback *cb, cpumask_t *span, bool microwatts); void em_dev_unregister_perf_domain(struct device *dev); +struct em_perf_table __rcu *em_allocate_table(struct em_perf_domain *pd); +void em_free_table(struct em_perf_table __rcu *table); /** * em_pd_get_efficient_state() - Get an efficient performance state from the EM @@ -368,6 +373,12 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd) { return 0; } +static inline +struct em_perf_table __rcu *em_allocate_table(struct em_perf_domain *pd) +{ + return NULL; +} +static inline void em_free_table(struct em_perf_table __rcu *table) {} #endif #endif diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 489287666705..489a358b9a00 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -114,12 +114,46 @@ static void em_destroy_table_rcu(struct rcu_head *rp) kfree(runtime_table); } -static void em_free_table(struct em_perf_table __rcu *table) +static void em_release_table_kref(struct kref *kref) { + struct em_perf_table __rcu *table; + + /* It was the last owner of this table so we can free */ + table = container_of(kref, struct em_perf_table, refcount); + call_rcu(&table->rcu, em_destroy_table_rcu); } -static struct em_perf_table __rcu * +static inline void em_inc_usage(struct em_perf_table __rcu *table) +{ + kref_get(&table->refcount); +} + +static void em_dec_usage(struct em_perf_table __rcu *table) +{ + kref_put(&table->refcount, em_release_table_kref); +} + +/** + * em_free_table() - Handles safe free of the EM table when needed + * @table : EM memory which is going to be freed + * + * No return values. + */ +void em_free_table(struct em_perf_table __rcu *table) +{ + em_dec_usage(table); +} + +/** + * em_allocate_table() - Handles safe allocation of the new EM table + * @table : EM memory which is going to be freed + * + * Increments the reference counter to mark that there is an owner of that + * EM table. That might be a device driver module or EAS. + * Returns allocated table or error. + */ +struct em_perf_table __rcu * em_allocate_table(struct em_perf_domain *pd) { struct em_perf_table __rcu *table; @@ -128,6 +162,12 @@ em_allocate_table(struct em_perf_domain *pd) table_size = sizeof(struct em_perf_state) * pd->nr_perf_states; table = kzalloc(sizeof(*table) + table_size, GFP_KERNEL); + if (!table) + return table; + + kref_init(&table->refcount); + em_inc_usage(table); + return table; } From patchwork Wed Nov 29 11:08:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472630 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A89A4210D; Wed, 29 Nov 2023 03:08:31 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9616C2F4; Wed, 29 Nov 2023 03:09:18 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 088573F5A1; Wed, 29 Nov 2023 03:08:28 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 11/23] PM: EM: Add API for updating the runtime modifiable EM Date: Wed, 29 Nov 2023 11:08:41 +0000 Message-Id: <20231129110853.94344-12-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add API function em_dev_update_perf_domain() which allows to safely change the EM. The concurrent modifiers are protected by the mutex to serialize them. Removal of the old memory is asynchronous and handled by the RCU mechanisms. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 8 +++++++ kernel/power/energy_model.c | 46 ++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index e785211828fe..520a8c8ad849 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -183,6 +183,8 @@ struct em_data_callback { struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); +int em_dev_update_perf_domain(struct device *dev, + struct em_perf_table __rcu *new_table); int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, struct em_data_callback *cb, cpumask_t *span, bool microwatts); @@ -379,6 +381,12 @@ struct em_perf_table __rcu *em_allocate_table(struct em_perf_domain *pd) return NULL; } static inline void em_free_table(struct em_perf_table __rcu *table) {} +static inline +int em_dev_update_perf_domain(struct device *dev, + struct em_perf_table __rcu *new_table) +{ + return -EINVAL; +} #endif #endif diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 489a358b9a00..614891fde8df 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -221,6 +221,52 @@ static int em_allocate_perf_table(struct em_perf_domain *pd, return 0; } +/** + * em_dev_update_perf_domain() - Update runtime EM table for a device + * @dev : Device for which the EM is to be updated + * @table : The new EM table that is going to used from now + * + * Update EM runtime modifiable table for the @dev using the privided @table. + * + * This function uses mutex to serialize writers, so it must not be called + * from non-sleeping context. + * + * Return 0 on success or a proper error in case of failure. + */ +int em_dev_update_perf_domain(struct device *dev, + struct em_perf_table __rcu *new_table) +{ + struct em_perf_table __rcu *old_table; + struct em_perf_domain *pd; + + /* + * The lock serializes update and unregister code paths. When the + * EM has been unregistered in the meantime, we should capture that + * when entering this critical section. It also makes sure that + * two concurrent updates will be serialized. + */ + mutex_lock(&em_pd_mutex); + + if (!dev || !dev->em_pd) { + mutex_unlock(&em_pd_mutex); + return -EINVAL; + } + pd = dev->em_pd; + + em_inc_usage(new_table); + + old_table = pd->runtime_table; + rcu_assign_pointer(pd->runtime_table, new_table); + + em_cpufreq_update_efficiencies(dev, new_table->state); + + em_dec_usage(old_table); + + mutex_unlock(&em_pd_mutex); + return 0; +} +EXPORT_SYMBOL_GPL(em_dev_update_perf_domain); + static int em_create_runtime_table(struct em_perf_domain *pd) { struct em_perf_table __rcu *runtime_table; From patchwork Wed Nov 29 11:08:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472631 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 59BA02127; Wed, 29 Nov 2023 03:08:34 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 45CA22F4; Wed, 29 Nov 2023 03:09:21 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id ADF103F5A1; Wed, 29 Nov 2023 03:08:31 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 12/23] PM: EM: Add helpers to read under RCU lock the EM table Date: Wed, 29 Nov 2023 11:08:42 +0000 Message-Id: <20231129110853.94344-13-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To use the runtime modifiable EM table there is a need to use RCU read locking properly. Add helper functions for the device drivers and frameworks to make sure it's done properly. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 520a8c8ad849..ae3ccc8b9f44 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -341,6 +341,20 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd) return pd->nr_perf_states; } +static inline struct em_perf_state *em_get_table(struct em_perf_domain *pd) +{ + struct em_perf_table __rcu *runtime_table; + + rcu_read_lock(); + runtime_table = rcu_dereference(pd->runtime_table); + return runtime_table->state; +} + +static inline void em_put_table(void) +{ + rcu_read_unlock(); +} + #else struct em_data_callback {}; #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { } @@ -387,6 +401,11 @@ int em_dev_update_perf_domain(struct device *dev, { return -EINVAL; } +static inline struct em_perf_state *em_get_table(struct em_perf_domain *pd) +{ + return NULL; +} +static inline void em_put_table(void) {} #endif #endif From patchwork Wed Nov 29 11:08:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472632 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 268602688; Wed, 29 Nov 2023 03:08:37 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id EA6802F4; Wed, 29 Nov 2023 03:09:23 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5DF803F5A1; Wed, 29 Nov 2023 03:08:34 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 13/23] PM: EM: Add performance field to struct em_perf_state Date: Wed, 29 Nov 2023 11:08:43 +0000 Message-Id: <20231129110853.94344-14-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The performance doesn't scale linearly with the frequency. Also, it may be different in different workloads. Some CPUs are designed to be particularly good at some applications e.g. images or video processing and other CPUs in different. When those different types of CPUs are combined in one SoC they should be properly modeled to get max of the HW in Energy Aware Scheduler (EAS). The Energy Model (EM) provides the power vs. performance curves to the EAS, but assumes the CPUs capacity is fixed and scales linearly with the frequency. This patch allows to adjust the curve on the 'performance' axis as well. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 11 ++++++----- kernel/power/energy_model.c | 27 +++++++++++++++++++++++++++ 2 files changed, 33 insertions(+), 5 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index ae3ccc8b9f44..e30750500b10 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -13,6 +13,7 @@ /** * struct em_perf_state - Performance state of a performance domain + * @performance: Non-linear CPU performance at a given frequency * @frequency: The frequency in KHz, for consistency with CPUFreq * @power: The power consumed at this level (by 1 CPU or by a registered * device). It can be a total power: static and dynamic. @@ -21,6 +22,7 @@ * @flags: see "em_perf_state flags" description below. */ struct em_perf_state { + unsigned long performance; unsigned long frequency; unsigned long power; unsigned long cost; @@ -207,14 +209,14 @@ void em_free_table(struct em_perf_table __rcu *table); */ static inline int em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states, - unsigned long freq, unsigned long pd_flags) + unsigned long max_util, unsigned long pd_flags) { struct em_perf_state *ps; int i; for (i = 0; i < nr_perf_states; i++) { ps = &table[i]; - if (ps->frequency >= freq) { + if (ps->performance >= max_util) { if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES && ps->flags & EM_PERF_STATE_INEFFICIENT) continue; @@ -246,8 +248,8 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long allowed_cpu_cap) { struct em_perf_table *runtime_table; - unsigned long freq, scale_cpu; struct em_perf_state *ps; + unsigned long scale_cpu; int cpu, i; if (!sum_util) @@ -274,14 +276,13 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); - freq = map_util_freq(max_util, ps->frequency, scale_cpu); /* * Find the lowest performance state of the Energy Model above the * requested frequency. */ i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, - freq, pd->flags); + max_util, pd->flags); ps = &runtime_table->state[i]; /* diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 614891fde8df..b5016afe6a19 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -46,6 +46,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd) debugfs_create_ulong("frequency", 0444, d, &ps->frequency); debugfs_create_ulong("power", 0444, d, &ps->power); debugfs_create_ulong("cost", 0444, d, &ps->cost); + debugfs_create_ulong("performance", 0444, d, &ps->performance); debugfs_create_ulong("inefficient", 0444, d, &ps->flags); } @@ -171,6 +172,30 @@ em_allocate_table(struct em_perf_domain *pd) return table; } +static void em_init_performance(struct device *dev, struct em_perf_domain *pd, + struct em_perf_state *table, int nr_states) +{ + u64 fmax, max_cap; + int i, cpu; + + /* This is needed only for CPUs and EAS skip other devices */ + if (!_is_cpu_device(dev)) + return; + + cpu = cpumask_first(em_span_cpus(pd)); + + /* + * Calculate the performance value for each frequency with + * linear relationship. The final CPU capacity might not be ready at + * boot time, but the EM will be updated a bit later with correct one. + */ + fmax = (u64) table[nr_states - 1].frequency; + max_cap = (u64) arch_scale_cpu_capacity(cpu); + for (i = 0; i < nr_states; i++) + table[i].performance = div64_u64(max_cap * table[i].frequency, + fmax); +} + static int em_compute_costs(struct device *dev, struct em_perf_state *table, struct em_data_callback *cb, int nr_states, unsigned long flags) @@ -331,6 +356,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, table[i].frequency = prev_freq = freq; } + em_init_performance(dev, pd, table, nr_states); + ret = em_compute_costs(dev, table, cb, nr_states, flags); if (ret) return -EINVAL; From patchwork Wed Nov 29 11:08:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472633 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B5EFC26A2; Wed, 29 Nov 2023 03:08:39 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9B56EC15; Wed, 29 Nov 2023 03:09:26 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 0DE443F5A1; Wed, 29 Nov 2023 03:08:36 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 14/23] PM: EM: Support late CPUs booting and capacity adjustment Date: Wed, 29 Nov 2023 11:08:44 +0000 Message-Id: <20231129110853.94344-15-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The patch adds needed infrastructure to handle the late CPUs boot, which might change the previous CPUs capacity values. With this changes the new CPUs which try to register EM will trigger the needed re-calculations for other CPUs EMs. Thanks to that the em_per_state::performance values will be aligned with the CPU capacity information after all CPUs finish the boot and EM registrations. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 121 ++++++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index b5016afe6a19..d3fa5a77de80 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -25,6 +25,9 @@ static DEFINE_MUTEX(em_pd_mutex); static void em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table); +static void em_check_capacity_update(void); +static void em_update_workfn(struct work_struct *work); +static DECLARE_DELAYED_WORK(em_update_work, em_update_workfn); static bool _is_cpu_device(struct device *dev) { @@ -596,6 +599,10 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, unlock: mutex_unlock(&em_pd_mutex); + + if (_is_cpu_device(dev)) + em_check_capacity_update(); + return ret; } EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); @@ -631,3 +638,117 @@ void em_dev_unregister_perf_domain(struct device *dev) mutex_unlock(&em_pd_mutex); } EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain); + +/* + * Adjustment of CPU performance values after boot, when all CPUs capacites + * are correctly calculated. + */ +static void em_adjust_new_capacity(struct device *dev, + struct em_perf_domain *pd, + u64 max_cap) +{ + struct em_perf_table __rcu *runtime_table; + struct em_perf_state *table, *new_table; + int ret, table_size; + + runtime_table = em_allocate_table(pd); + if (!runtime_table) { + dev_warn(dev, "EM: allocation failed\n"); + return; + } + + new_table = runtime_table->state; + + table = em_get_table(pd); + /* Initialize data based on older runtime table */ + table_size = sizeof(struct em_perf_state) * pd->nr_perf_states; + memcpy(new_table, table, table_size); + + em_put_table(); + + em_init_performance(dev, pd, new_table, pd->nr_perf_states); + ret = em_compute_costs(dev, new_table, NULL, pd->nr_perf_states, + pd->flags); + if (ret) { + em_free_table(runtime_table); + return; + } + + ret = em_dev_update_perf_domain(dev, runtime_table); + if (ret) + dev_warn(dev, "EM: update failed %d\n", ret); + + /* + * This is one-time-update, so give up the ownership in this updater. + * The EM fwk will keep the reference and free the memory when needed. + */ + em_free_table(runtime_table); +} + +static void em_check_capacity_update(void) +{ + cpumask_var_t cpu_done_mask; + struct em_perf_state *table; + struct em_perf_domain *pd; + unsigned long cpu_capacity; + int cpu; + + if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) { + pr_warn("no free memory\n"); + return; + } + + /* Check if CPUs capacity has changed than update EM */ + for_each_possible_cpu(cpu) { + struct cpufreq_policy *policy; + unsigned long em_max_perf; + struct device *dev; + int nr_states; + + if (cpumask_test_cpu(cpu, cpu_done_mask)) + continue; + + policy = cpufreq_cpu_get(cpu); + if (!policy) { + pr_debug("Accessing cpu%d policy failed\n", cpu); + schedule_delayed_work(&em_update_work, + msecs_to_jiffies(1000)); + break; + } + cpufreq_cpu_put(policy); + + pd = em_cpu_get(cpu); + if (!pd || em_is_artificial(pd)) + continue; + + cpumask_or(cpu_done_mask, cpu_done_mask, + em_span_cpus(pd)); + + nr_states = pd->nr_perf_states; + cpu_capacity = arch_scale_cpu_capacity(cpu); + + table = em_get_table(pd); + em_max_perf = table[pd->nr_perf_states - 1].performance; + em_put_table(); + + /* + * Check if the CPU capacity has been adjusted during boot + * and trigger the update for new performance values. + */ + if (em_max_perf == cpu_capacity) + continue; + + pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n", + cpu, cpu_capacity, em_max_perf); + + dev = get_cpu_device(cpu); + em_adjust_new_capacity(dev, pd, cpu_capacity); + } + + free_cpumask_var(cpu_done_mask); +} + +static void em_update_workfn(struct work_struct *work) +{ + em_check_capacity_update(); +} From patchwork Wed Nov 29 11:08:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472634 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 584061BEF; Wed, 29 Nov 2023 03:08:42 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4B5812F4; Wed, 29 Nov 2023 03:09:29 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B2AEA3F5A1; Wed, 29 Nov 2023 03:08:39 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 15/23] PM: EM: Optimize em_cpu_energy() and remove division Date: Wed, 29 Nov 2023 11:08:45 +0000 Message-Id: <20231129110853.94344-16-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The Energy Model (EM) can be modified at runtime which brings new possibilities. The em_cpu_energy() is called by the Energy Aware Scheduler (EAS) in it's hot path. The energy calculation uses power value for a given performance state (ps) and the CPU busy time as percentage for that given frequency, which effectively is: pd_nrg = ps->power * busy_time_pct (1) cpu_util busy_time_pct = ----------------- (2) ps->performance The 'ps->performance' is the CPU capacity (performance) at that given ps. Thus, in a situation when the OS is not overloaded and we have EAS working, the busy time is lower than 'ps->performance' that the CPU is running at. Therefore, in longer scheduling period we can treat the power value calculated above as the energy. We can optimize the last arithmetic operation in em_cpu_energy() and remove the division. This can be done because em_perf_state::cost, which is a special coefficient, can now hold the pre-calculated value including the 'ps->performance' information for a performance state (ps): ps->power ps->cost = --------------- (3) ps->performance In the past the 'ps->performance' had to be calculated at runtime every time the em_cpu_energy() was called. Thus there was this formula involved: ps->freq ps->performance = ------------- * scale_cpu (4) cpu_max_freq When we inject (4) into (2) than we can have this equation: cpu_util * cpu_max_freq busy_time_pct = ------------------------ (5) ps->freq * scale_cpu Because the right 'scale_cpu' value wasn't ready during the boot time and EM initialization, we had to perform the division by 'scale_cpu' at runtime. There was not safe mechanism to update EM at runtime. It has changed thanks to EM runtime modification feature. It is possible to avoid the division by 'scale_cpu' at runtime, because EM is updated whenever new max capacity CPU is set in the system or after the boot has finished and proper CPU capacity is ready. Use that feature and do the needed division during the calculation of the coefficient 'ps->cost'. That enhanced 'ps->cost' value can be then just multiplied simply by utilization: pd_nrg = ps->cost * \Sum cpu_util (6) to get the needed energy for whole Performance Domain (PD). With this optimization, the em_cpu_energy() should run faster on the Big CPU by 1.43x and on the Little CPU by 1.69x. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 68 +++++------------------------------- kernel/power/energy_model.c | 7 ++-- 2 files changed, 12 insertions(+), 63 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index e30750500b10..0f5621898a81 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -115,27 +115,6 @@ struct em_perf_domain { #define EM_MAX_NUM_CPUS 16 #endif -/* - * To avoid an overflow on 32bit machines while calculating the energy - * use a different order in the operation. First divide by the 'cpu_scale' - * which would reduce big value stored in the 'cost' field, then multiply by - * the 'sum_util'. This would allow to handle existing platforms, which have - * e.g. power ~1.3 Watt at max freq, so the 'cost' value > 1mln micro-Watts. - * In such scenario, where there are 4 CPUs in the Perf. Domain the 'sum_util' - * could be 4096, then multiplication: 'cost' * 'sum_util' would overflow. - * This reordering of operations has some limitations, we lose small - * precision in the estimation (comparing to 64bit platform w/o reordering). - * - * We are safe on 64bit machine. - */ -#ifdef CONFIG_64BIT -#define em_estimate_energy(cost, sum_util, scale_cpu) \ - (((cost) * (sum_util)) / (scale_cpu)) -#else -#define em_estimate_energy(cost, sum_util, scale_cpu) \ - (((cost) / (scale_cpu)) * (sum_util)) -#endif - struct em_data_callback { /** * active_power() - Provide power at the next performance state of @@ -249,29 +228,16 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, { struct em_perf_table *runtime_table; struct em_perf_state *ps; - unsigned long scale_cpu; - int cpu, i; + int i; if (!sum_util) return 0; - /* - * In order to predict the performance state, map the utilization of - * the most utilized CPU of the performance domain to a requested - * frequency, like schedutil. Take also into account that the real - * frequency might be set lower (due to thermal capping). Thus, clamp - * max utilization to the allowed CPU capacity before calculating - * effective frequency. - */ - cpu = cpumask_first(to_cpumask(pd->cpus)); - scale_cpu = arch_scale_cpu_capacity(cpu); - /* * No rcu_read_lock() since it's already called by task scheduler. * The runtime_table is always there for CPUs, so we don't check. */ runtime_table = rcu_dereference(pd->runtime_table); - ps = &runtime_table->state[pd->nr_perf_states - 1]; max_util = map_util_perf(max_util); @@ -286,35 +252,21 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, ps = &runtime_table->state[i]; /* - * The capacity of a CPU in the domain at the performance state (ps) - * can be computed as: - * - * ps->freq * scale_cpu - * ps->cap = -------------------- (1) - * cpu_max_freq - * - * So, ignoring the costs of idle states (which are not available in - * the EM), the energy consumed by this CPU at that performance state + * The energy consumed by the CPU at the given performance state (ps) * is estimated as: * - * ps->power * cpu_util - * cpu_nrg = -------------------- (2) - * ps->cap + * ps->power + * cpu_nrg = --------------- * cpu_util (1) + * ps->performance * - * since 'cpu_util / ps->cap' represents its percentage of busy time. + * The 'cpu_util / ps->performance' represents its percentage of + * busy time. The idle cost is ignored (it's not available in the EM). * * NOTE: Although the result of this computation actually is in * units of power, it can be manipulated as an energy value * over a scheduling period, since it is assumed to be * constant during that interval. * - * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product - * of two terms: - * - * ps->power * cpu_max_freq cpu_util - * cpu_nrg = ------------------------ * --------- (3) - * ps->freq scale_cpu - * * The first term is static, and is stored in the em_perf_state struct * as 'ps->cost'. * @@ -323,11 +275,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * total energy of the domain (which is the simple sum of the energy of * all of its CPUs) can be factorized as: * - * ps->cost * \Sum cpu_util - * pd_nrg = ------------------------ (4) - * scale_cpu + * pd_nrg = ps->cost * \Sum cpu_util (2) */ - return em_estimate_energy(ps->cost, sum_util, scale_cpu); + return ps->cost * sum_util; } /** diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index d3fa5a77de80..c6e5f35a5129 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -204,11 +204,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, unsigned long flags) { unsigned long prev_cost = ULONG_MAX; - u64 fmax; int i, ret; /* Compute the cost of each performance state. */ - fmax = (u64) table[nr_states - 1].frequency; for (i = nr_states - 1; i >= 0; i--) { unsigned long power_res, cost; @@ -220,8 +218,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, return -EINVAL; } } else { - power_res = table[i].power; - cost = div64_u64(fmax * power_res, table[i].frequency); + /* increase resolution of 'cost' precision */ + power_res = table[i].power * 10; + cost = power_res / table[i].performance; } table[i].cost = cost; From patchwork Wed Nov 29 11:08:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472635 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 185EE170B; Wed, 29 Nov 2023 03:08:45 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F0F9AC15; Wed, 29 Nov 2023 03:09:31 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 63D073F5A1; Wed, 29 Nov 2023 03:08:42 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 16/23] powercap/dtpm_cpu: Use new Energy Model interface to get table Date: Wed, 29 Nov 2023 11:08:46 +0000 Message-Id: <20231129110853.94344-17-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Energy Model framework support modifications at runtime of the power values. Use the new EM table API which is protected with RCU. Align the code so that this RCU read section is short. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba --- drivers/powercap/dtpm_cpu.c | 35 +++++++++++++++++++++++++---------- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c index 8a2f18fa3faf..45bb7e2849d7 100644 --- a/drivers/powercap/dtpm_cpu.c +++ b/drivers/powercap/dtpm_cpu.c @@ -42,6 +42,7 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit) { struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu); + struct em_perf_state *table; struct cpumask cpus; unsigned long freq; u64 power; @@ -50,20 +51,21 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit) cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus)); nr_cpus = cpumask_weight(&cpus); + table = em_get_table(pd); for (i = 0; i < pd->nr_perf_states; i++) { - power = pd->table[i].power * nr_cpus; + power = table[i].power * nr_cpus; if (power > power_limit) break; } - freq = pd->table[i - 1].frequency; + freq = table[i - 1].frequency; + power_limit = table[i - 1].power * nr_cpus; + em_put_table(); freq_qos_update_request(&dtpm_cpu->qos_req, freq); - power_limit = pd->table[i - 1].power * nr_cpus; - return power_limit; } @@ -87,9 +89,11 @@ static u64 scale_pd_power_uw(struct cpumask *pd_mask, u64 power) static u64 get_pd_power_uw(struct dtpm *dtpm) { struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); + struct em_perf_state *table; struct em_perf_domain *pd; struct cpumask *pd_mask; unsigned long freq; + u64 power = 0; int i; pd = em_cpu_get(dtpm_cpu->cpu); @@ -98,33 +102,41 @@ static u64 get_pd_power_uw(struct dtpm *dtpm) freq = cpufreq_quick_get(dtpm_cpu->cpu); + table = em_get_table(pd); for (i = 0; i < pd->nr_perf_states; i++) { - if (pd->table[i].frequency < freq) + if (table[i].frequency < freq) continue; - return scale_pd_power_uw(pd_mask, pd->table[i].power); + power = scale_pd_power_uw(pd_mask, table[i].power); + break; } + em_put_table(); - return 0; + return power; } static int update_pd_power_uw(struct dtpm *dtpm) { struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); struct em_perf_domain *em = em_cpu_get(dtpm_cpu->cpu); + struct em_perf_state *table; struct cpumask cpus; int nr_cpus; cpumask_and(&cpus, cpu_online_mask, to_cpumask(em->cpus)); nr_cpus = cpumask_weight(&cpus); - dtpm->power_min = em->table[0].power; + table = em_get_table(em); + + dtpm->power_min = table[0].power; dtpm->power_min *= nr_cpus; - dtpm->power_max = em->table[em->nr_perf_states - 1].power; + dtpm->power_max = table[em->nr_perf_states - 1].power; dtpm->power_max *= nr_cpus; + em_put_table(); + return 0; } @@ -178,6 +190,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent) { struct dtpm_cpu *dtpm_cpu; struct cpufreq_policy *policy; + struct em_perf_state *table; struct em_perf_domain *pd; char name[CPUFREQ_NAME_LEN]; int ret = -ENOMEM; @@ -210,9 +223,11 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent) if (ret) goto out_kfree_dtpm_cpu; + table = em_get_table(pd); ret = freq_qos_add_request(&policy->constraints, &dtpm_cpu->qos_req, FREQ_QOS_MAX, - pd->table[pd->nr_perf_states - 1].frequency); + table[pd->nr_perf_states - 1].frequency); + em_put_table(); if (ret) goto out_dtpm_unregister; From patchwork Wed Nov 29 11:08:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472636 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C52AA1BF9; Wed, 29 Nov 2023 03:08:47 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A1B982F4; Wed, 29 Nov 2023 03:09:34 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 14FD73F5A1; Wed, 29 Nov 2023 03:08:44 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 17/23] powercap/dtpm_devfreq: Use new Energy Model interface to get table Date: Wed, 29 Nov 2023 11:08:47 +0000 Message-Id: <20231129110853.94344-18-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Energy Model framework support modifications at runtime of the power values. Use the new EM table API which is protected with RCU. Align the code so that this RCU read section is short. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba --- drivers/powercap/dtpm_devfreq.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/powercap/dtpm_devfreq.c b/drivers/powercap/dtpm_devfreq.c index 612c3b59dd5b..514aa0d9d9c2 100644 --- a/drivers/powercap/dtpm_devfreq.c +++ b/drivers/powercap/dtpm_devfreq.c @@ -37,11 +37,15 @@ static int update_pd_power_uw(struct dtpm *dtpm) struct devfreq *devfreq = dtpm_devfreq->devfreq; struct device *dev = devfreq->dev.parent; struct em_perf_domain *pd = em_pd_get(dev); + struct em_perf_state *table; - dtpm->power_min = pd->table[0].power; + table = em_get_table(pd); - dtpm->power_max = pd->table[pd->nr_perf_states - 1].power; + dtpm->power_min = table[0].power; + dtpm->power_max = table[pd->nr_perf_states - 1].power; + + em_put_table(); return 0; } @@ -51,20 +55,22 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit) struct devfreq *devfreq = dtpm_devfreq->devfreq; struct device *dev = devfreq->dev.parent; struct em_perf_domain *pd = em_pd_get(dev); + struct em_perf_state *table; unsigned long freq; int i; + table = em_get_table(pd); for (i = 0; i < pd->nr_perf_states; i++) { - if (pd->table[i].power > power_limit) + if (table[i].power > power_limit) break; } - freq = pd->table[i - 1].frequency; + freq = table[i - 1].frequency; + power_limit = table[i - 1].power; + em_put_table(); dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq); - power_limit = pd->table[i - 1].power; - return power_limit; } @@ -89,8 +95,9 @@ static u64 get_pd_power_uw(struct dtpm *dtpm) struct device *dev = devfreq->dev.parent; struct em_perf_domain *pd = em_pd_get(dev); struct devfreq_dev_status status; + struct em_perf_state *table; unsigned long freq; - u64 power; + u64 power = 0; int i; mutex_lock(&devfreq->lock); @@ -100,19 +107,21 @@ static u64 get_pd_power_uw(struct dtpm *dtpm) freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ); _normalize_load(&status); + table = em_get_table(pd); for (i = 0; i < pd->nr_perf_states; i++) { - if (pd->table[i].frequency < freq) + if (table[i].frequency < freq) continue; - power = pd->table[i].power; + power = table[i].power; power *= status.busy_time; power >>= 10; - return power; + break; } + em_put_table(); - return 0; + return power; } static void pd_release(struct dtpm *dtpm) From patchwork Wed Nov 29 11:08:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472637 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 95E121FCE; Wed, 29 Nov 2023 03:08:50 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7D4CFC15; Wed, 29 Nov 2023 03:09:37 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id BA4E83F5A1; Wed, 29 Nov 2023 03:08:47 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 18/23] drivers/thermal/cpufreq_cooling: Use new Energy Model interface Date: Wed, 29 Nov 2023 11:08:48 +0000 Message-Id: <20231129110853.94344-19-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Energy Model framework support modifications at runtime of the power values. Use the new EM table API which is protected with RCU. Align the code so that this RCU read section is short. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba --- drivers/thermal/cpufreq_cooling.c | 40 ++++++++++++++++++++++++------- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index e2cc7bd30862..c32d8dfa4fff 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -91,12 +91,15 @@ struct cpufreq_cooling_device { static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev, unsigned int freq) { + struct em_perf_state *table; int i; + table = em_get_table(cpufreq_cdev->em); for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) { - if (freq > cpufreq_cdev->em->table[i].frequency) + if (freq > table[i].frequency) break; } + em_put_table(); return cpufreq_cdev->max_level - i - 1; } @@ -104,16 +107,19 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev, static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev, u32 freq) { + struct em_perf_state *table; unsigned long power_mw; int i; + table = em_get_table(cpufreq_cdev->em); for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) { - if (freq > cpufreq_cdev->em->table[i].frequency) + if (freq > table[i].frequency) break; } - power_mw = cpufreq_cdev->em->table[i + 1].power; + power_mw = table[i + 1].power; power_mw /= MICROWATT_PER_MILLIWATT; + em_put_table(); return power_mw; } @@ -121,18 +127,23 @@ static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev, static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev, u32 power) { + struct em_perf_state *table; unsigned long em_power_mw; + u32 freq; int i; + table = em_get_table(cpufreq_cdev->em); for (i = cpufreq_cdev->max_level; i > 0; i--) { /* Convert EM power to milli-Watts to make safe comparison */ - em_power_mw = cpufreq_cdev->em->table[i].power; + em_power_mw = table[i].power; em_power_mw /= MICROWATT_PER_MILLIWATT; if (power >= em_power_mw) break; } + freq = table[i].frequency; + em_put_table(); - return cpufreq_cdev->em->table[i].frequency; + return freq; } /** @@ -262,8 +273,9 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev, static int cpufreq_state2power(struct thermal_cooling_device *cdev, unsigned long state, u32 *power) { - unsigned int freq, num_cpus, idx; struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata; + unsigned int freq, num_cpus, idx; + struct em_perf_state *table; /* Request state should be less than max_level */ if (state > cpufreq_cdev->max_level) @@ -272,7 +284,11 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev, num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus); idx = cpufreq_cdev->max_level - state; - freq = cpufreq_cdev->em->table[idx].frequency; + + table = em_get_table(cpufreq_cdev->em); + freq = table[idx].frequency; + em_put_table(); + *power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus; return 0; @@ -378,8 +394,16 @@ static unsigned int get_state_freq(struct cpufreq_cooling_device *cpufreq_cdev, #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR /* Use the Energy Model table if available */ if (cpufreq_cdev->em) { + struct em_perf_state *table; + unsigned int freq; + idx = cpufreq_cdev->max_level - state; - return cpufreq_cdev->em->table[idx].frequency; + + table = em_get_table(cpufreq_cdev->em); + freq = table[idx].frequency; + em_put_table(); + + return freq; } #endif From patchwork Wed Nov 29 11:08:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472638 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4611B2712; Wed, 29 Nov 2023 03:08:53 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 32BE62F4; Wed, 29 Nov 2023 03:09:40 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 976813F5A1; Wed, 29 Nov 2023 03:08:50 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 19/23] drivers/thermal/devfreq_cooling: Use new Energy Model interface Date: Wed, 29 Nov 2023 11:08:49 +0000 Message-Id: <20231129110853.94344-20-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Energy Model framework support modifications at runtime of the power values. Use the new EM table API which is protected with RCU. Align the code so that this RCU read section is short. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba --- drivers/thermal/devfreq_cooling.c | 43 ++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 9 deletions(-) diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c index 262e62ab6cf2..b7aed5a3810e 100644 --- a/drivers/thermal/devfreq_cooling.c +++ b/drivers/thermal/devfreq_cooling.c @@ -87,6 +87,7 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev, struct devfreq_cooling_device *dfc = cdev->devdata; struct devfreq *df = dfc->devfreq; struct device *dev = df->dev.parent; + struct em_perf_state *table; unsigned long freq; int perf_idx; @@ -100,7 +101,10 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev, if (dfc->em_pd) { perf_idx = dfc->max_state - state; - freq = dfc->em_pd->table[perf_idx].frequency * 1000; + + table = em_get_table(dfc->em_pd); + freq = table[perf_idx].frequency * 1000; + em_put_table(); } else { freq = dfc->freq_table[state]; } @@ -123,14 +127,20 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev, */ static int get_perf_idx(struct em_perf_domain *em_pd, unsigned long freq) { - int i; + struct em_perf_state *table; + int i, idx = -EINVAL; + table = em_get_table(em_pd); for (i = 0; i < em_pd->nr_perf_states; i++) { - if (em_pd->table[i].frequency == freq) - return i; + if (table[i].frequency != freq) + continue; + + idx = i; + break; } + em_put_table(); - return -EINVAL; + return idx; } static unsigned long get_voltage(struct devfreq *df, unsigned long freq) @@ -181,6 +191,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd struct devfreq_cooling_device *dfc = cdev->devdata; struct devfreq *df = dfc->devfreq; struct devfreq_dev_status status; + struct em_perf_state *table; unsigned long state; unsigned long freq; unsigned long voltage; @@ -204,7 +215,10 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd state = dfc->capped_state; /* Convert EM power into milli-Watts first */ - dfc->res_util = dfc->em_pd->table[state].power; + table = em_get_table(dfc->em_pd); + dfc->res_util = table[state].power; + em_put_table(); + dfc->res_util /= MICROWATT_PER_MILLIWATT; dfc->res_util *= SCALE_ERROR_MITIGATION; @@ -225,7 +239,10 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd _normalize_load(&status); /* Convert EM power into milli-Watts first */ - *power = dfc->em_pd->table[perf_idx].power; + table = em_get_table(dfc->em_pd); + *power = table[perf_idx].power; + em_put_table(); + *power /= MICROWATT_PER_MILLIWATT; /* Scale power for utilization */ *power *= status.busy_time; @@ -245,13 +262,18 @@ static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev, unsigned long state, u32 *power) { struct devfreq_cooling_device *dfc = cdev->devdata; + struct em_perf_state *table; int perf_idx; if (state > dfc->max_state) return -EINVAL; perf_idx = dfc->max_state - state; - *power = dfc->em_pd->table[perf_idx].power; + + table = em_get_table(dfc->em_pd); + *power = table[perf_idx].power; + em_put_table(); + *power /= MICROWATT_PER_MILLIWATT; return 0; @@ -264,6 +286,7 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev, struct devfreq *df = dfc->devfreq; struct devfreq_dev_status status; unsigned long freq, em_power_mw; + struct em_perf_state *table; s32 est_power; int i; @@ -288,13 +311,15 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev, * Find the first cooling state that is within the power * budget. The EM power table is sorted ascending. */ + table = em_get_table(dfc->em_pd); for (i = dfc->max_state; i > 0; i--) { /* Convert EM power to milli-Watts to make safe comparison */ - em_power_mw = dfc->em_pd->table[i].power; + em_power_mw = table[i].power; em_power_mw /= MICROWATT_PER_MILLIWATT; if (est_power >= em_power_mw) break; } + em_put_table(); *state = dfc->max_state - i; dfc->capped_state = *state; From patchwork Wed Nov 29 11:08:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472639 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id F272E272C; Wed, 29 Nov 2023 03:08:55 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D610E2F4; Wed, 29 Nov 2023 03:09:42 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4A6CE3F5A1; Wed, 29 Nov 2023 03:08:53 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 20/23] PM: EM: Change debugfs configuration to use runtime EM table data Date: Wed, 29 Nov 2023 11:08:50 +0000 Message-Id: <20231129110853.94344-21-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Dump the runtime EM table values which can be modified in time. In order to do that allocate chunk of debug memory which can be later freed automatically thanks to devm_kcalloc(). This design can handle the fact that the EM table memory can change after EM update, so debug code cannot use the pointer from initialization phase. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 65 ++++++++++++++++++++++++++++++++----- 1 file changed, 57 insertions(+), 8 deletions(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index c6e5f35a5129..cc47993b4d64 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -37,20 +37,63 @@ static bool _is_cpu_device(struct device *dev) #ifdef CONFIG_DEBUG_FS static struct dentry *rootdir; -static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd) +struct em_dbg_info { + struct em_perf_domain *pd; + int ps_id; +}; + +#define DEFINE_EM_DBG_SHOW(name, fname) \ +static int em_debug_##fname##_show(struct seq_file *s, void *unused) \ +{ \ + struct em_dbg_info *em_dbg = s->private; \ + struct em_perf_state *table; \ + unsigned long val; \ + \ + table = em_get_table(em_dbg->pd); \ + val = table[em_dbg->ps_id].name; \ + em_put_table(); \ + \ + seq_printf(s, "%lu\n", val); \ + return 0; \ +} \ +DEFINE_SHOW_ATTRIBUTE(em_debug_##fname) + +DEFINE_EM_DBG_SHOW(frequency, frequency); +DEFINE_EM_DBG_SHOW(power, power); +DEFINE_EM_DBG_SHOW(cost, cost); +DEFINE_EM_DBG_SHOW(performance, performance); +DEFINE_EM_DBG_SHOW(flags, inefficiency); + +static void em_debug_create_ps(struct em_perf_domain *em_pd, + struct em_dbg_info *em_dbg, int i, + struct dentry *pd) { + struct em_perf_state *table; + unsigned long freq; struct dentry *d; char name[24]; - snprintf(name, sizeof(name), "ps:%lu", ps->frequency); + em_dbg[i].pd = em_pd; + em_dbg[i].ps_id = i; + + table = em_get_table(em_pd); + freq = table[i].frequency; + em_put_table(); + + snprintf(name, sizeof(name), "ps:%lu", freq); /* Create per-ps directory */ d = debugfs_create_dir(name, pd); - debugfs_create_ulong("frequency", 0444, d, &ps->frequency); - debugfs_create_ulong("power", 0444, d, &ps->power); - debugfs_create_ulong("cost", 0444, d, &ps->cost); - debugfs_create_ulong("performance", 0444, d, &ps->performance); - debugfs_create_ulong("inefficient", 0444, d, &ps->flags); + debugfs_create_file("frequency", 0444, d, &em_dbg[i], + &em_debug_frequency_fops); + debugfs_create_file("power", 0444, d, &em_dbg[i], + &em_debug_power_fops); + debugfs_create_file("cost", 0444, d, &em_dbg[i], + &em_debug_cost_fops); + debugfs_create_file("performance", 0444, d, &em_dbg[i], + &em_debug_performance_fops); + debugfs_create_file("inefficient", 0444, d, &em_dbg[i], + &em_debug_inefficiency_fops); } static int em_debug_cpus_show(struct seq_file *s, void *unused) @@ -73,6 +116,7 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_flags); static void em_debug_create_pd(struct device *dev) { + struct em_dbg_info *em_dbg; struct dentry *d; int i; @@ -86,9 +130,14 @@ static void em_debug_create_pd(struct device *dev) debugfs_create_file("flags", 0444, d, dev->em_pd, &em_debug_flags_fops); + em_dbg = devm_kcalloc(dev, dev->em_pd->nr_perf_states, + sizeof(*em_dbg), GFP_KERNEL); + if (!em_dbg) + return; + /* Create a sub-directory for each performance state */ for (i = 0; i < dev->em_pd->nr_perf_states; i++) - em_debug_create_ps(&dev->em_pd->table[i], d); + em_debug_create_ps(dev->em_pd, em_dbg, i, d); } From patchwork Wed Nov 29 11:08:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472640 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id ABC512D46; Wed, 29 Nov 2023 03:08:58 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 86858C15; Wed, 29 Nov 2023 03:09:45 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id EDC433F5A1; Wed, 29 Nov 2023 03:08:55 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 21/23] PM: EM: Remove old table Date: Wed, 29 Nov 2023 11:08:51 +0000 Message-Id: <20231129110853.94344-22-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Remove the old EM table which wasn't able to modify the data. Clean the unneeded function and refactor the code a bit. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 2 -- kernel/power/energy_model.c | 47 ++++++------------------------------ 2 files changed, 8 insertions(+), 41 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 0f5621898a81..9c47388482a0 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -53,7 +53,6 @@ struct em_perf_table { /** * struct em_perf_domain - Performance domain - * @table: List of performance states, in ascending order * @runtime_table: Pointer to the runtime modifiable em_perf_table * @nr_perf_states: Number of performance states * @flags: See "em_perf_domain flags" @@ -69,7 +68,6 @@ struct em_perf_table { * field is unused. */ struct em_perf_domain { - struct em_perf_state *table; struct em_perf_table __rcu *runtime_table; int nr_perf_states; unsigned long flags; diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index cc47993b4d64..234823c0e59d 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -286,17 +286,6 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, return 0; } -static int em_allocate_perf_table(struct em_perf_domain *pd, - int nr_states) -{ - pd->table = kcalloc(nr_states, sizeof(struct em_perf_state), - GFP_KERNEL); - if (!pd->table) - return -ENOMEM; - - return 0; -} - /** * em_dev_update_perf_domain() - Update runtime EM table for a device * @dev : Device for which the EM is to be updated @@ -343,24 +332,6 @@ int em_dev_update_perf_domain(struct device *dev, } EXPORT_SYMBOL_GPL(em_dev_update_perf_domain); -static int em_create_runtime_table(struct em_perf_domain *pd) -{ - struct em_perf_table __rcu *runtime_table; - int table_size; - - runtime_table = em_allocate_table(pd); - if (!runtime_table) - return -ENOMEM; - - /* Initialize runtime table with existing data */ - table_size = sizeof(struct em_perf_state) * pd->nr_perf_states; - memcpy(runtime_table->state, pd->table, table_size); - - rcu_assign_pointer(pd->runtime_table, runtime_table); - - return 0; -} - static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, struct em_perf_state *table, int nr_states, struct em_data_callback *cb, @@ -420,6 +391,7 @@ static int em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb, cpumask_t *cpus, unsigned long flags) { + struct em_perf_table __rcu *runtime_table; struct em_perf_domain *pd; struct device *cpu_dev; int cpu, ret, num_cpus; @@ -446,17 +418,16 @@ static int em_create_pd(struct device *dev, int nr_states, pd->nr_perf_states = nr_states; - ret = em_allocate_perf_table(pd, nr_states); - if (ret) + runtime_table = em_allocate_table(pd); + if (!runtime_table) goto free_pd; - ret = em_create_perf_table(dev, pd, pd->table, nr_states, cb, flags); + ret = em_create_perf_table(dev, pd, runtime_table->state, + nr_states, cb, flags); if (ret) goto free_pd_table; - ret = em_create_runtime_table(pd); - if (ret) - goto free_pd_table; + rcu_assign_pointer(pd->runtime_table, runtime_table); if (_is_cpu_device(dev)) for_each_cpu(cpu, cpus) { @@ -469,7 +440,7 @@ static int em_create_pd(struct device *dev, int nr_states, return 0; free_pd_table: - kfree(pd->table); + kfree(runtime_table); free_pd: kfree(pd); return -EINVAL; @@ -640,7 +611,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, dev->em_pd->flags |= flags; - em_cpufreq_update_efficiencies(dev, dev->em_pd->table); + em_cpufreq_update_efficiencies(dev, dev->em_pd->runtime_table->state); em_debug_create_pd(dev); dev_info(dev, "EM: created perf domain\n"); @@ -677,8 +648,6 @@ void em_dev_unregister_perf_domain(struct device *dev) mutex_lock(&em_pd_mutex); em_debug_remove_pd(dev); - kfree(dev->em_pd->table); - em_free_table(dev->em_pd->runtime_table); kfree(dev->em_pd); From patchwork Wed Nov 29 11:08:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472641 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5031E1FE1; Wed, 29 Nov 2023 03:09:01 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 35F6E2F4; Wed, 29 Nov 2023 03:09:48 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 9E0913F5A1; Wed, 29 Nov 2023 03:08:58 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 22/23] PM: EM: Add em_dev_compute_costs() as API for device drivers Date: Wed, 29 Nov 2023 11:08:52 +0000 Message-Id: <20231129110853.94344-23-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The device drivers can modify EM at runtime by providing a new EM table. The EM is used by the EAS and the em_perf_state::cost stores pre-calculated value to avoid overhead. This patch provides the API for device drivers to calculate the cost values properly (and not duplicate the same code). Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 8 ++++++++ kernel/power/energy_model.c | 18 ++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 9c47388482a0..836622b1a0a1 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -170,6 +170,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, void em_dev_unregister_perf_domain(struct device *dev); struct em_perf_table __rcu *em_allocate_table(struct em_perf_domain *pd); void em_free_table(struct em_perf_table __rcu *table); +int em_dev_compute_costs(struct device *dev, struct em_perf_state *table, + int nr_states); /** * em_pd_get_efficient_state() - Get an efficient performance state from the EM @@ -355,6 +357,12 @@ static inline struct em_perf_state *em_get_table(struct em_perf_domain *pd) return NULL; } static inline void em_put_table(void) {} +static inline +int em_dev_compute_costs(struct device *dev, struct em_perf_state *table, + int nr_states) +{ + return -EINVAL; +} #endif #endif diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 234823c0e59d..fadfdefbe5f0 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -286,6 +286,24 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, return 0; } +/** + * em_dev_compute_costs() - Calculate cost values for new runtime EM table + * @dev : Device for which the EM table is to be updated + * @table : The new EM table that is going to get the costs calculated + * + * Calculate the em_perf_state::cost values for new runtime EM table. The + * values are used for EAS during task placement. It also calculates and sets + * the efficiency flag for each performance state. When the function finish + * successfully the EM table is ready to be updated and used by EAS. + * + * Return 0 on success or a proper error in case of failure. + */ +int em_dev_compute_costs(struct device *dev, struct em_perf_state *table, + int nr_states) +{ + return em_compute_costs(dev, table, NULL, nr_states, 0); +} + /** * em_dev_update_perf_domain() - Update runtime EM table for a device * @dev : Device for which the EM is to be updated From patchwork Wed Nov 29 11:08:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 13472642 Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id EDDDC1FD2; Wed, 29 Nov 2023 03:09:03 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D8F0FC15; Wed, 29 Nov 2023 03:09:50 -0800 (PST) Received: from e129166.arm.com (unknown [10.57.4.241]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4DDB73F5A1; Wed, 29 Nov 2023 03:09:01 -0800 (PST) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v5 23/23] Documentation: EM: Update with runtime modification design Date: Wed, 29 Nov 2023 11:08:53 +0000 Message-Id: <20231129110853.94344-24-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231129110853.94344-1-lukasz.luba@arm.com> References: <20231129110853.94344-1-lukasz.luba@arm.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add a new section 'Design' which covers the information about Energy Model. It contains the design decisions, describes models and how they reflect the reality. Remove description of the default EM. Change the other section IDs. Add documentation bit for the new feature which allows to modify the EM in runtime. Signed-off-by: Lukasz Luba --- Documentation/power/energy-model.rst | 206 +++++++++++++++++++++++++-- 1 file changed, 196 insertions(+), 10 deletions(-) diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index 13225965c9a4..1f8cf36914b1 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -72,16 +72,48 @@ required to have the same micro-architecture. CPUs in different performance domains can have different micro-architectures. -2. Core APIs +2. Design +----------------- + +2.1 Runtime modifiable EM +^^^^^^^^^^^^^^^^^^^^^^^^^ + +To better reflect power variation due to static power (leakage) the EM +supports runtime modifications of the power values. The mechanism relies on +RCU to free the modifiable EM perf_state table memory. Its user, the task +scheduler, also uses RCU to access this memory. The EM framework provides +API for allocating/freeing the new memory for the modifiable EM table. +The old memory is freed automatically using RCU callback mechanism when there +are no owners anymore for the given EM runtime table instance. This is tracked +using kref mechanism. The device driver which provided the new EM at runtime, +should call EM API to free it safely when it's no longer needed. The EM +framework will handle the clean-up when it's possible. + +The kernel code which want to modify the EM values is protected from concurrent +access using a mutex. Therefore, the device driver code must run in sleeping +context when it tries to modify the EM. + +With the runtime modifiable EM we switch from a 'single and during the entire +runtime static EM' (system property) design to a 'single EM which can be +changed during runtime according e.g. to the workload' (system and workload +property) design. + +It is possible also to modify the CPU performance values for each EM's +performance state. Thus, the full power and performance profile (which +is an exponential curve) can be changed according e.g. to the workload +or system property. + + +3. Core APIs ------------ -2.1 Config options +3.1 Config options ^^^^^^^^^^^^^^^^^^ CONFIG_ENERGY_MODEL must be enabled to use the EM framework. -2.2 Registration of performance domains +3.2 Registration of performance domains ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Registration of 'advanced' EM @@ -110,8 +142,8 @@ The last argument 'microwatts' is important to set with correct value. Kernel subsystems which use EM might rely on this flag to check if all EM devices use the same scale. If there are different scales, these subsystems might decide to return warning/error, stop working or panic. -See Section 3. for an example of driver implementing this -callback, or Section 2.4 for further documentation on this API +See Section 4. for an example of driver implementing this +callback, or Section 3.4 for further documentation on this API Registration of EM using DT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -156,7 +188,7 @@ The EM which is registered using this method might not reflect correctly the physics of a real device, e.g. when static power (leakage) is important. -2.3 Accessing performance domains +3.3 Accessing performance domains ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There are two API functions which provide the access to the energy model: @@ -175,10 +207,83 @@ CPUfreq governor is in use in case of CPU device. Currently this calculation is not provided for other type of devices. More details about the above APIs can be found in ```` -or in Section 2.4 +or in Section 3.5 + + +3.4 Runtime modifications +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Drivers willing to update the EM at runtime should use the following dedicated +function to allocate a new instance of the modified EM. The API is listed +below:: + + struct em_perf_table __rcu *em_allocate_table(struct em_perf_domain *pd); + +This allows to allocate a structure which contains the new EM table with +also RCU and kref needed by the EM framework. The 'struct em_perf_table' +contains array 'struct em_perf_state state[]' which is a list of performance +states in ascending order. That list must be populated by the device driver +which wants to update the EM. The list of frequencies can be taken from +existing EM (created during boot). The content in the 'struct em_perf_state' +must be populated by the driver as well. + +This is the API which does the EM update, using RCU pointers swap:: + + int em_dev_update_perf_domain(struct device *dev, + struct em_perf_table __rcu *new_table); + +Drivers must provide a pointer to the allocated and initialized new EM +'struct em_perf_table'. That new EM will be safely used inside the EM framework +and will be visible to other sub-systems in the kernel (thermal, powercap). +The main design goal for this API is to be fast and avoid extra calculations +or memory allocations at runtime. When pre-computed EMs are available in the +device driver, than it should be possible to simply re-use them with low +performance overhead. + +In order to free the EM, provided earlier by the driver (e.g. when the module +is unloaded), there is a need to call the API:: + + void em_free_table(struct em_perf_table __rcu *table); + +It will allow the EM framework to safely remove the memory, when there is +no other sub-system using it, e.g. EAS. + +To use the power values in other sub-systems (like thermal, powercap) there is +a need to call API which protects the reader and provide consistency of the EM +table data:: + struct em_perf_state *em_get_table(struct em_perf_domain *pd); -2.4 Description details of this API +It returns the 'struct em_perf_state' pointer which is an array of performance +states in ascending order. + +When the EM table is not needed anymore there is a need to call dedicated API:: + + void em_put_table(void); + +In this way the EM safely uses the RCU read section and protects the users. +It also allows the EM framework to manage the memory and free it. + +There is dedicated API for device drivers to calculate em_perf_state::cost +values:: + + int em_dev_compute_costs(struct device *dev, struct em_perf_state *table, + int nr_states); + +These 'cost' values from EM are used in EAS. The new EM table should be passed +together with the number of entries and device pointer. When the computation +of the cost values is done properly the return value from the function is 0. +The function takes care for right setting of inefficiency for each performance +state as well. It updates em_perf_state::flags accordingly. +Then such prepared new EM can be passed to the em_dev_update_perf_domain() +function, which will allow to use it. + +More details about the above APIs can be found in ```` +or in Section 4.2 with an example code showing simple implementation of the +updating mechanism in a device driver. + + +3.5 Description details of this API ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. kernel-doc:: include/linux/energy_model.h :internal: @@ -187,8 +292,11 @@ or in Section 2.4 :export: -3. Example driver ------------------ +4. Examples +----------- + +4.1 Example driver with EM registration +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CPUFreq framework supports dedicated callback for registering the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em(). @@ -242,3 +350,81 @@ EM framework:: 39 static struct cpufreq_driver foo_cpufreq_driver = { 40 .register_em = foo_cpufreq_register_em, 41 }; + + +4.2 Example driver with EM modification +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This section provides a simple example of a thermal driver modifying the EM. +The driver implements a foo_thermal_em_update() function. The driver is woken +up periodically to check the temperature and modify the EM data:: + + -> drivers/soc/example/example_em_mod.c + + 01 static void foo_get_new_em(struct device *dev) + 02 { + 03 struct em_perf_table __rcu *runtime_table; + 04 struct em_perf_state *table, *new_table; + 05 struct em_perf_domain *pd; + 06 unsigned long freq; + 07 int i, ret; + 08 + 09 pd = em_pd_get(dev); + 10 if (!pd) + 11 return; + 12 + 13 runtime_table = em_allocate_table(pd); + 14 if (!runtime_table) + 15 return; + 16 + 17 new_table = runtime_table->state; + 18 + 19 table = em_get_table(pd); + 20 for (i = 0; i < pd->nr_perf_states; i++) { + 21 freq = table[i].frequency; + 22 foo_get_power_perf_values(dev, freq, &new_table[i]); + 23 } + 24 em_put_table(); + 25 + 26 /* Calculate 'cost' values for EAS */ + 27 ret = em_dev_compute_costs(dev, table, pd->nr_perf_states); + 28 if (ret) { + 29 dev_warn(dev, "EM: compute costs failed %d\n", ret); + 30 em_free_table(runtime_table); + 31 return; + 32 } + 33 + 34 ret = em_dev_update_perf_domain(dev, runtime_table); + 35 if (ret) { + 36 dev_warn(dev, "EM: update failed %d\n", ret); + 37 em_free_table(runtime_table); + 38 return; + 39 } + 40 + 41 ctx->runtime_table = runtime_table; + 42 } + 43 + 44 /* + 45 * Function called periodically to check the temperature and + 46 * update the EM if needed + 47 */ + 48 static void foo_thermal_em_update(struct foo_context *ctx) + 49 { + 50 struct device *dev = ctx->dev; + 51 int cpu; + 52 + 53 ctx->temperature = foo_get_temp(dev, ctx); + 54 if (ctx->temperature < FOO_EM_UPDATE_TEMP_THRESHOLD) + 55 return; + 56 + 57 foo_get_new_em(dev); + 58 } + 59 + 60 static void foo_exit(void) + 61 { + 62 struct foo_context *ctx = glob_ctx; + 63 + 64 em_free_table(ctx->runtime_table); + 65 } + 66 + 67 module_exit(foo_exit);