From patchwork Tue Nov 8 03:03:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: srinivas pandruvada X-Patchwork-Id: 13035742 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B3A1C43217 for ; Tue, 8 Nov 2022 03:04:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233274AbiKHDEK (ORCPT ); Mon, 7 Nov 2022 22:04:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233128AbiKHDEI (ORCPT ); Mon, 7 Nov 2022 22:04:08 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9E2A2FFDD; Mon, 7 Nov 2022 19:04:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667876647; x=1699412647; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Gx12/V9+4eqZkr9V3cg1u7gq6FKczYaDMmSB1jBBM2g=; b=DCqoVApWPnv0VFbwOFvUvH+LyoSMY+jwmsJLdy9FNkO7EFcgv/0d/Zgj XEY7Wa6gazQhh6o6cz1llWHKPMmmZHA2CRy1VtbttqoVKDp2NpnZFkgR9 8PP9u6783wNCKCcBEE7N/ble4w1C3UK/4b/mMwlB7FBF8BY7pFBRYAU3F L/XuOKz+5y25sRNbifJYdwTU7M+JGNCiesCQhjkLHGx30eDlosqjHxqRt egYfHIT1ESaVbJpicg4e5Vc1LRjj1TLVu/tc2vj/n5YPl02ofprGubyRS P/iq9ubX4tNpAkkiKfEWAJCmgJNX29y8ehk7oPOwwDDYW3cbsH+egHowt g==; X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="337310747" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="337310747" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Nov 2022 19:04:07 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="638612673" X-IronPort-AV: 
E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="638612673" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by fmsmga007.fm.intel.com with ESMTP; 07 Nov 2022 19:04:06 -0800 From: Srinivas Pandruvada To: rui.zhang@intel.com, rafael@kernel.org, daniel.lezcano@linaro.org Cc: amitk@kernel.org, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Srinivas Pandruvada Subject: [PATCH 1/4] powercap: idle_inject: Export symbols Date: Mon, 7 Nov 2022 19:03:39 -0800 Message-Id: <20221108030342.1127216-2-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com> References: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Export symbols for external interfaces, so that they can be used in other loadable modules. Export is done under name space IDLE_INJECT. Signed-off-by: Srinivas Pandruvada --- drivers/powercap/idle_inject.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c index 999e218d7793..e73885bd9065 100644 --- a/drivers/powercap/idle_inject.c +++ b/drivers/powercap/idle_inject.c @@ -159,6 +159,7 @@ void idle_inject_set_duration(struct idle_inject_device *ii_dev, WRITE_ONCE(ii_dev->idle_duration_us, idle_duration_us); } } +EXPORT_SYMBOL_NS_GPL(idle_inject_set_duration, IDLE_INJECT); /** * idle_inject_get_duration - idle and run duration retrieval helper @@ -172,6 +173,7 @@ void idle_inject_get_duration(struct idle_inject_device *ii_dev, *run_duration_us = READ_ONCE(ii_dev->run_duration_us); *idle_duration_us = READ_ONCE(ii_dev->idle_duration_us); } +EXPORT_SYMBOL_NS_GPL(idle_inject_get_duration, IDLE_INJECT); /** * idle_inject_set_latency - set the maximum latency allowed @@ -182,6 +184,7 @@ void idle_inject_set_latency(struct idle_inject_device *ii_dev, { WRITE_ONCE(ii_dev->latency_us, latency_us); } 
+EXPORT_SYMBOL_NS_GPL(idle_inject_set_latency, IDLE_INJECT); /** * idle_inject_start - start idle injections @@ -213,6 +216,7 @@ int idle_inject_start(struct idle_inject_device *ii_dev) return 0; } +EXPORT_SYMBOL_NS_GPL(idle_inject_start, IDLE_INJECT); /** * idle_inject_stop - stops idle injections @@ -259,6 +263,7 @@ void idle_inject_stop(struct idle_inject_device *ii_dev) cpu_hotplug_enable(); } +EXPORT_SYMBOL_NS_GPL(idle_inject_stop, IDLE_INJECT); /** * idle_inject_setup - prepare the current task for idle injection @@ -334,6 +339,7 @@ struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) return NULL; } +EXPORT_SYMBOL_NS_GPL(idle_inject_register, IDLE_INJECT); /** * idle_inject_unregister - unregister idle injection control device @@ -354,6 +360,7 @@ void idle_inject_unregister(struct idle_inject_device *ii_dev) kfree(ii_dev); } +EXPORT_SYMBOL_NS_GPL(idle_inject_unregister, IDLE_INJECT); static struct smp_hotplug_thread idle_inject_threads = { .store = &idle_inject_thread.tsk, From patchwork Tue Nov 8 03:03:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: srinivas pandruvada X-Patchwork-Id: 13035743 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE345C4332F for ; Tue, 8 Nov 2022 03:04:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230046AbiKHDEL (ORCPT ); Mon, 7 Nov 2022 22:04:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233168AbiKHDEJ (ORCPT ); Mon, 7 Nov 2022 22:04:09 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 202CB2FFDF; Mon, 7 Nov 2022 19:04:08 
-0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667876648; x=1699412648; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bZKUfgswrQzZt1xpgqXVm99HvJ16L7TugknFYChFNOY=; b=kod4ktCazBOaxYEIesnWtCk1WqxovrgDvX+SCBoDw9WjjcnGyNbp213t sP4r1U0p3hlQbcBK8CzKxrVfvVPHMg+oTEWCmCm6kP1zKYj+V7WR/a9XU j3SI5U+GopD2Tlf1hiXfmiSy+db9qmLpWDqd4RJrTlexMFpH3qXBVTYTG 5sZY5Qu30l+W7r0fFJq6ARpR15aAQtbaqfguZDSfPP5j591Z1klyb0jks i2KoGrGmuZQznixNpQxN5PAlqiexX+wXbiNYdxsl+OJDyu2mBvj1OM4aD Pzg2D5wNqXgdAauuwnc/5ApXTnpd42P1uHmLAm5Eb4hMMofLoN0KwBpi6 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="337310748" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="337310748" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Nov 2022 19:04:07 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="638612678" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="638612678" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by fmsmga007.fm.intel.com with ESMTP; 07 Nov 2022 19:04:07 -0800 From: Srinivas Pandruvada To: rui.zhang@intel.com, rafael@kernel.org, daniel.lezcano@linaro.org Cc: amitk@kernel.org, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Srinivas Pandruvada Subject: [PATCH 2/4] powercap: idle_inject: Add begin/end callbacks Date: Mon, 7 Nov 2022 19:03:40 -0800 Message-Id: <20221108030342.1127216-3-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com> References: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The actual CPU Idle percent can be different than what can be observed from the hardware. 
Since the objective for CPU Idle injection is for thermal control, the idle percent observed by the hardware is more relevant. To account for hardware feedback the actual runtime/idle time should be adjusted. Add a capability to register a begin and end callback during call to idle_inject_register(). If they are not NULL, then begin callback is called before calling play_idle_precise() and end callback is called after play_idle_precise(). If begin callback is present and returns non 0 value then play_idle_precise() is not called as it means there is some over compensation. Signed-off-by: Srinivas Pandruvada --- drivers/powercap/idle_inject.c | 19 ++++++++++++++++++- drivers/thermal/cpuidle_cooling.c | 2 +- include/linux/idle_inject.h | 4 +++- 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c index e73885bd9065..14968b0ff133 100644 --- a/drivers/powercap/idle_inject.c +++ b/drivers/powercap/idle_inject.c @@ -70,6 +70,8 @@ struct idle_inject_device { unsigned int idle_duration_us; unsigned int run_duration_us; unsigned int latency_us; + int (*idle_inject_begin)(unsigned int cpu); + void (*idle_inject_end)(unsigned int cpu); unsigned long cpumask[]; }; @@ -132,6 +134,7 @@ static void idle_inject_fn(unsigned int cpu) { struct idle_inject_device *ii_dev; struct idle_inject_thread *iit; + int ret; ii_dev = per_cpu(idle_inject_device, cpu); iit = per_cpu_ptr(&idle_inject_thread, cpu); @@ -141,8 +144,18 @@ static void idle_inject_fn(unsigned int cpu) */ iit->should_run = 0; + if (ii_dev->idle_inject_begin) { + ret = ii_dev->idle_inject_begin(cpu); + if (ret) + goto skip; + } + play_idle_precise(READ_ONCE(ii_dev->idle_duration_us) * NSEC_PER_USEC, READ_ONCE(ii_dev->latency_us) * NSEC_PER_USEC); + +skip: + if (ii_dev->idle_inject_end) + ii_dev->idle_inject_end(cpu); } /** @@ -302,7 +315,9 @@ static int idle_inject_should_run(unsigned int cpu) * Return: NULL if memory allocation fails, idle injection 
control device * pointer on success. */ -struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) +struct idle_inject_device *idle_inject_register(struct cpumask *cpumask, + int (*idle_inject_begin)(unsigned int cpu), + void (*idle_inject_end)(unsigned int cpu)) { struct idle_inject_device *ii_dev; int cpu, cpu_rb; @@ -315,6 +330,8 @@ struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); ii_dev->timer.function = idle_inject_timer_fn; ii_dev->latency_us = UINT_MAX; + ii_dev->idle_inject_begin = idle_inject_begin; + ii_dev->idle_inject_end = idle_inject_end; for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) { diff --git a/drivers/thermal/cpuidle_cooling.c b/drivers/thermal/cpuidle_cooling.c index 4f41102e8b16..e8b35b3b5767 100644 --- a/drivers/thermal/cpuidle_cooling.c +++ b/drivers/thermal/cpuidle_cooling.c @@ -184,7 +184,7 @@ static int __cpuidle_cooling_register(struct device_node *np, goto out; } - ii_dev = idle_inject_register(drv->cpumask); + ii_dev = idle_inject_register(drv->cpumask, NULL, NULL); if (!ii_dev) { ret = -EINVAL; goto out_kfree; diff --git a/include/linux/idle_inject.h b/include/linux/idle_inject.h index fb88e23a99d3..73f3414fafe2 100644 --- a/include/linux/idle_inject.h +++ b/include/linux/idle_inject.h @@ -11,7 +11,9 @@ /* private idle injection device structure */ struct idle_inject_device; -struct idle_inject_device *idle_inject_register(struct cpumask *cpumask); +struct idle_inject_device *idle_inject_register(struct cpumask *cpumask, + int (*idle_inject_begin)(unsigned int cpu), + void (*idle_inject_end)(unsigned int cpu)); void idle_inject_unregister(struct idle_inject_device *ii_dev); From patchwork Tue Nov 8 03:03:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: srinivas pandruvada X-Patchwork-Id: 13035745 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 
(2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1129FC4332F for ; Tue, 8 Nov 2022 03:04:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232939AbiKHDEP (ORCPT ); Mon, 7 Nov 2022 22:04:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233278AbiKHDEL (ORCPT ); Mon, 7 Nov 2022 22:04:11 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1103C2FFE8; Mon, 7 Nov 2022 19:04:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667876649; x=1699412649; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tN3FZAv/jGEV+pgvJ7qU98ogWm2HC2WDE/hCylU57Qk=; b=L2TFqOtZdqv5ZOX1iyPkGKXcKxKcuvF1QAS/peBOoPC+ip883uZvBTfH qQx63xjAqx2ml+tYQ+I5olJKuFTHXxDrUWBRa152G1ipRO3toqheVVTEz eVThGHVwx1vvVQupel1WVFYBVTKn47PPnRMRizfSArbB3QHvyDlH0CjbX 7ULpmShPtCmoqFIwtP34K9HjuQP72rFjX7CdyGkihHDitoflX8rZMPwM2 HJUENNnT9R+ECLBHDUs1c9R9QFEsGdmL0FvS0+ssZ+De/rZIEi6tQwHMB eq6R1CnsmNrW/dE96NBoGwAS3j06PMtXpqgqIgjZ48oZJ/BXoXgTgtyTZ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="337310749" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="337310749" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Nov 2022 19:04:07 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="638612687" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="638612687" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by fmsmga007.fm.intel.com with ESMTP; 07 Nov 2022 19:04:07 -0800 From: Srinivas Pandruvada To: rui.zhang@intel.com, rafael@kernel.org, 
daniel.lezcano@linaro.org Cc: amitk@kernel.org, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Srinivas Pandruvada Subject: [PATCH 3/4] thermal/drivers/intel_powerclamp: Use powercap idle-inject framework Date: Mon, 7 Nov 2022 19:03:41 -0800 Message-Id: <20221108030342.1127216-4-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com> References: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org There are two idle injection implementations in the Linux kernel: one via intel_powerclamp and the other using powercap/idle_inject. Both implementations end up calling a play_idle* function from a FIFO priority thread. Both can't be used at the same time. Currently per-core idle injection (cpuidle_cooling) uses powercap/idle_inject, which is not used on platforms where intel_powerclamp is used for system-wide idle injection. So there is no conflict. But there are some use cases where per-core idle injection is beneficial on the same system where system-wide idle injection is also used via intel_powerclamp. To avoid conflict, only one idle injection type may be in use at a time. This requires a common framework which both per-core and system-wide idle injection can use. Here powercap/idle_inject can be used for both per-core and system-wide idle injection. This framework has a well-defined interface which allows registration for individual CPUs or for all CPUs (system wide). If a particular CPU is already participating in idle injection, the registration call fails. Here the registration can be done when user space changes the current cooling device state. A single framework for idle injection is also easier to maintain, as there is one loop calling play_idle* instead of several. So, reuse powercap/idle_inject calls in intel_powerclamp.
This simplifies the code as all per-CPU kthreads that call play_idle* can be removed. The changes include: - Remove unneeded include files - Remove per-CPU kthread workers: balancing_work and idle_injection_work - Reuse the compensation related code by moving it from the previous worker thread to the idle_injection callbacks - Adjust the idle_duration and runtime by using the powercap/idle_inject interface - Remove all variables that are not required once powercap/idle_inject is used - Add a mutex to avoid a race between removal of idle injection during module unload and a user action to change the idle inject percent - Use READ_ONCE and WRITE_ONCE for data accessed from multiple CPUs Signed-off-by: Srinivas Pandruvada --- drivers/thermal/intel/Kconfig | 1 + drivers/thermal/intel/intel_powerclamp.c | 293 ++++++++++------------- 2 files changed, 126 insertions(+), 168 deletions(-) diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig index f0c845679250..0883740bf70d 100644 --- a/drivers/thermal/intel/Kconfig +++ b/drivers/thermal/intel/Kconfig @@ -3,6 +3,7 @@ config INTEL_POWERCLAMP tristate "Intel PowerClamp idle injection driver" depends on X86 depends on CPU_SUP_INTEL + select IDLE_INJECT help Enable this to enable Intel PowerClamp idle injection driver. This enforce idle time which results in more package C-state residency. The diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c index b80e25ec1261..17154686827d 100644 --- a/drivers/thermal/intel/intel_powerclamp.c +++ b/drivers/thermal/intel/intel_powerclamp.c @@ -2,7 +2,7 @@ /* * intel_powerclamp.c - package c-state idle injection * - * Copyright (c) 2012, Intel Corporation. + * Copyright (c) 2022, Intel Corporation.
* * Authors: * Arjan van de Ven @@ -27,21 +27,15 @@ #include #include #include -#include #include #include -#include -#include #include #include -#include -#include +#include -#include #include #include #include -#include #define MAX_TARGET_RATIO (50U) /* For each undisturbed clamping period (no extra wake ups during idle time), @@ -60,6 +54,7 @@ static struct dentry *debug_dir; /* user selected target */ static unsigned int set_target_ratio; +static bool target_ratio_updated; static unsigned int current_ratio; static bool should_skip; @@ -67,26 +62,20 @@ static unsigned int control_cpu; /* The cpu assigned to collect stat and update * control parameters. default to BSP but BSP * can be offlined. */ -static bool clamping; - -struct powerclamp_worker_data { - struct kthread_worker *worker; - struct kthread_work balancing_work; - struct kthread_delayed_work idle_injection_work; +struct powerclamp_data { unsigned int cpu; unsigned int count; unsigned int guard; unsigned int window_size_now; unsigned int target_ratio; - unsigned int duration_jiffies; bool clamping; }; -static struct powerclamp_worker_data __percpu *worker_data; +static struct powerclamp_data powerclamp_data; + static struct thermal_cooling_device *cooling_dev; -static unsigned long *cpu_clamping_mask; /* bit map for tracking per cpu - * clamping kthread worker - */ + +static DEFINE_MUTEX(powerclamp_lock); static unsigned int duration; static unsigned int pkg_cstate_ratio_cur; @@ -344,79 +333,33 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio, return set_target_ratio + guard <= current_ratio; } -static void clamp_balancing_func(struct kthread_work *work) +static unsigned int get_run_time(void) { - struct powerclamp_worker_data *w_data; - int sleeptime; - unsigned long target_jiffies; unsigned int compensated_ratio; - int interval; /* jiffies to sleep for each attempt */ - - w_data = container_of(work, struct powerclamp_worker_data, - balancing_work); + unsigned int runtime; /* * make 
sure user selected ratio does not take effect until * the next round. adjust target_ratio if user has changed * target such that we can converge quickly. */ - w_data->target_ratio = READ_ONCE(set_target_ratio); - w_data->guard = 1 + w_data->target_ratio / 20; - w_data->window_size_now = window_size; - w_data->duration_jiffies = msecs_to_jiffies(duration); - w_data->count++; + powerclamp_data.target_ratio = READ_ONCE(set_target_ratio); + powerclamp_data.guard = 1 + powerclamp_data.target_ratio / 20; + powerclamp_data.window_size_now = window_size; /* * systems may have different ability to enter package level * c-states, thus we need to compensate the injected idle ratio * to achieve the actual target reported by the HW. */ - compensated_ratio = w_data->target_ratio + - get_compensation(w_data->target_ratio); + compensated_ratio = powerclamp_data.target_ratio + + get_compensation(powerclamp_data.target_ratio); if (compensated_ratio <= 0) compensated_ratio = 1; - interval = w_data->duration_jiffies * 100 / compensated_ratio; - - /* align idle time */ - target_jiffies = roundup(jiffies, interval); - sleeptime = target_jiffies - jiffies; - if (sleeptime <= 0) - sleeptime = 1; - - if (clamping && w_data->clamping && cpu_online(w_data->cpu)) - kthread_queue_delayed_work(w_data->worker, - &w_data->idle_injection_work, - sleeptime); -} -static void clamp_idle_injection_func(struct kthread_work *work) -{ - struct powerclamp_worker_data *w_data; - - w_data = container_of(work, struct powerclamp_worker_data, - idle_injection_work.work); + runtime = duration * 100 / compensated_ratio - duration; - /* - * only elected controlling cpu can collect stats and update - * control parameters. 
- */ - if (w_data->cpu == control_cpu && - !(w_data->count % w_data->window_size_now)) { - should_skip = - powerclamp_adjust_controls(w_data->target_ratio, - w_data->guard, - w_data->window_size_now); - smp_mb(); - } - - if (should_skip) - goto balance; - - play_idle(jiffies_to_usecs(w_data->duration_jiffies)); - -balance: - if (clamping && w_data->clamping && cpu_online(w_data->cpu)) - kthread_queue_work(w_data->worker, &w_data->balancing_work); + return runtime; } /* @@ -452,104 +395,128 @@ static void poll_pkg_cstate(struct work_struct *dummy) msr_last = msr_now; tsc_last = tsc_now; - if (true == clamping) + if (powerclamp_data.clamping) schedule_delayed_work(&poll_pkg_cstate_work, HZ); } -static void start_power_clamp_worker(unsigned long cpu) +static struct idle_inject_device *ii_dev; + +static int idle_inject_begin(unsigned int cpu) { - struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu); - struct kthread_worker *worker; + /* + * only elected controlling cpu can collect stats and update + * control parameters. 
+ */ + if (cpu == control_cpu) { + bool update = READ_ONCE(target_ratio_updated); + + if (!(powerclamp_data.count % powerclamp_data.window_size_now)) { + bool skip = powerclamp_adjust_controls(powerclamp_data.target_ratio, + powerclamp_data.guard, + powerclamp_data.window_size_now); + WRITE_ONCE(should_skip, skip); + update = true; + } - worker = kthread_create_worker_on_cpu(cpu, 0, "kidle_inj/%ld", cpu); - if (IS_ERR(worker)) - return; + if (update) { + unsigned int runtime; + + runtime = get_run_time(); + idle_inject_set_duration(ii_dev, runtime, duration); + WRITE_ONCE(target_ratio_updated, false); + } + powerclamp_data.count++; + } + + if (READ_ONCE(should_skip)) + return -EAGAIN; - w_data->worker = worker; - w_data->count = 0; - w_data->cpu = cpu; - w_data->clamping = true; - set_bit(cpu, cpu_clamping_mask); - sched_set_fifo(worker->task); - kthread_init_work(&w_data->balancing_work, clamp_balancing_func); - kthread_init_delayed_work(&w_data->idle_injection_work, - clamp_idle_injection_func); - kthread_queue_work(w_data->worker, &w_data->balancing_work); + return 0; } -static void stop_power_clamp_worker(unsigned long cpu) +static void trigger_idle_injection(void) { - struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu); + unsigned int runtime = get_run_time(); - if (!w_data->worker) - return; + idle_inject_set_duration(ii_dev, runtime, duration); + idle_inject_start(ii_dev); + powerclamp_data.clamping = true; +} + +static int powerclamp_idle_injection_register(void) +{ + static cpumask_t idle_injection_cpu_mask; + unsigned long cpu; - w_data->clamping = false; - /* - * Make sure that all works that get queued after this point see - * the clamping disabled. The counter part is not needed because - * there is an implicit memory barrier when the queued work - * is proceed. 
- */ - smp_wmb(); - kthread_cancel_work_sync(&w_data->balancing_work); - kthread_cancel_delayed_work_sync(&w_data->idle_injection_work); /* - * The balancing work still might be queued here because - * the handling of the "clapming" variable, cancel, and queue - * operations are not synchronized via a lock. But it is not - * a big deal. The balancing work is fast and destroy kthread - * will wait for it. + * The idle inject core will only inject for online CPUs, + * So we can register for all present CPUs. In this way + * if some CPU goes online/offline while idle inject + * is registered, nothing additional calls are required. + * The same runtime and idle time is applicable for + * newly onlined CPUs if any. */ - clear_bit(w_data->cpu, cpu_clamping_mask); - kthread_destroy_worker(w_data->worker); + for_each_present_cpu(cpu) { + cpumask_set_cpu(cpu, &idle_injection_cpu_mask); + } + + ii_dev = idle_inject_register(&idle_injection_cpu_mask, + idle_inject_begin, + NULL); + if (!ii_dev) { + pr_err("powerclamp: idle_inject_register failed\n"); + return -EAGAIN; + } - w_data->worker = NULL; + idle_inject_set_duration(ii_dev, TICK_USEC, duration); + idle_inject_set_latency(ii_dev, UINT_MAX); + + return 0; +} + +static void remove_idle_injection(void) +{ + if (!powerclamp_data.clamping) + return; + + powerclamp_data.clamping = false; + idle_inject_stop(ii_dev); } static int start_power_clamp(void) { - unsigned long cpu; + int ret; - set_target_ratio = clamp(set_target_ratio, 0U, MAX_TARGET_RATIO - 1); /* prevent cpu hotplug */ cpus_read_lock(); /* prefer BSP */ control_cpu = cpumask_first(cpu_online_mask); - clamping = true; - schedule_delayed_work(&poll_pkg_cstate_work, 0); - - /* start one kthread worker per online cpu */ - for_each_online_cpu(cpu) { - start_power_clamp_worker(cpu); + ret = powerclamp_idle_injection_register(); + if (!ret) { + trigger_idle_injection(); + schedule_delayed_work(&poll_pkg_cstate_work, 0); } + cpus_read_unlock(); - return 0; + return ret; } 
static void end_power_clamp(void) { - int i; - - /* - * Block requeuing in all the kthread workers. They will flush and - * stop faster. - */ - clamping = false; - for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) { - pr_debug("clamping worker for cpu %d alive, destroy\n", i); - stop_power_clamp_worker(i); + if (powerclamp_data.clamping) { + remove_idle_injection(); + idle_inject_unregister(ii_dev); } } static int powerclamp_cpu_online(unsigned int cpu) { - if (clamping == false) + if (!powerclamp_data.clamping) return 0; - start_power_clamp_worker(cpu); + /* prefer BSP as controlling CPU */ if (cpu == 0) { control_cpu = 0; @@ -560,10 +527,6 @@ static int powerclamp_cpu_online(unsigned int cpu) static int powerclamp_cpu_predown(unsigned int cpu) { - if (clamping == false) - return 0; - - stop_power_clamp_worker(cpu); if (cpu != control_cpu) return 0; @@ -585,7 +548,7 @@ static int powerclamp_get_max_state(struct thermal_cooling_device *cdev, static int powerclamp_get_cur_state(struct thermal_cooling_device *cdev, unsigned long *state) { - if (true == clamping) + if (powerclamp_data.clamping) *state = pkg_cstate_ratio_cur; else /* to save power, do not poll idle ratio while not clamping */ @@ -599,24 +562,30 @@ static int powerclamp_set_cur_state(struct thermal_cooling_device *cdev, { int ret = 0; + mutex_lock(&powerclamp_lock); + new_target_ratio = clamp(new_target_ratio, 0UL, - (unsigned long) (MAX_TARGET_RATIO-1)); - if (set_target_ratio == 0 && new_target_ratio > 0) { + (unsigned long) (MAX_TARGET_RATIO - 1)); + if (READ_ONCE(set_target_ratio) == 0 && new_target_ratio > 0) { pr_info("Start idle injection to reduce power\n"); - set_target_ratio = new_target_ratio; + WRITE_ONCE(set_target_ratio, new_target_ratio); ret = start_power_clamp(); + if (ret) + WRITE_ONCE(set_target_ratio, 0); goto exit_set; - } else if (set_target_ratio > 0 && new_target_ratio == 0) { + } else if (READ_ONCE(set_target_ratio) > 0 && new_target_ratio == 0) { pr_info("Stop forced 
idle injection\n");
 		end_power_clamp();
-		set_target_ratio = 0;
+		WRITE_ONCE(set_target_ratio, 0);
+		WRITE_ONCE(target_ratio_updated, false);
 	} else	/* adjust currently running */ {
-		set_target_ratio = new_target_ratio;
-		/* make new set_target_ratio visible to other cpus */
-		smp_mb();
+		WRITE_ONCE(set_target_ratio, new_target_ratio);
+		WRITE_ONCE(target_ratio_updated, true);
 	}
 
 exit_set:
+	mutex_unlock(&powerclamp_lock);
+
 	return ret;
 }
 
@@ -686,14 +655,10 @@ static int __init powerclamp_init(void)
 {
 	int retval;
 
-	cpu_clamping_mask = bitmap_zalloc(num_possible_cpus(), GFP_KERNEL);
-	if (!cpu_clamping_mask)
-		return -ENOMEM;
-
 	/* probe cpu features and ids here */
 	retval = powerclamp_probe();
 	if (retval)
-		goto exit_free;
+		return retval;
 
 	/* set default limit, maybe adjusted during runtime based on feedback */
 	window_size = 2;
@@ -702,53 +667,45 @@ static int __init powerclamp_init(void)
 					   powerclamp_cpu_online,
 					   powerclamp_cpu_predown);
 	if (retval < 0)
-		goto exit_free;
+		return retval;
 
 	hp_state = retval;
 
-	worker_data = alloc_percpu(struct powerclamp_worker_data);
-	if (!worker_data) {
-		retval = -ENOMEM;
-		goto exit_unregister;
-	}
-
 	cooling_dev = thermal_cooling_device_register("intel_powerclamp", NULL,
-						&powerclamp_cooling_ops);
+						      &powerclamp_cooling_ops);
 	if (IS_ERR(cooling_dev)) {
 		retval = -ENODEV;
-		goto exit_free_thread;
+		goto exit_unregister;
 	}
 
 	if (!duration)
-		duration = jiffies_to_msecs(DEFAULT_DURATION_JIFFIES);
+		duration = jiffies_to_usecs(DEFAULT_DURATION_JIFFIES);
 
 	powerclamp_create_debug_files();
 
 	return 0;
 
-exit_free_thread:
-	free_percpu(worker_data);
 exit_unregister:
 	cpuhp_remove_state_nocalls(hp_state);
-exit_free:
-	bitmap_free(cpu_clamping_mask);
 	return retval;
 }
 module_init(powerclamp_init);
 
 static void __exit powerclamp_exit(void)
 {
+	mutex_lock(&powerclamp_lock);
 	end_power_clamp();
+	mutex_unlock(&powerclamp_lock);
 	cpuhp_remove_state_nocalls(hp_state);
-	free_percpu(worker_data);
 	thermal_cooling_device_unregister(cooling_dev);
-	bitmap_free(cpu_clamping_mask);
 
 	cancel_delayed_work_sync(&poll_pkg_cstate_work);
 	debugfs_remove_recursive(debug_dir);
 }
 module_exit(powerclamp_exit);
 
+MODULE_IMPORT_NS(IDLE_INJECT);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Arjan van de Ven ");
 MODULE_AUTHOR("Jacob Pan ");

From patchwork Tue Nov 8 03:03:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: srinivas pandruvada
X-Patchwork-Id: 13035744
From: Srinivas Pandruvada
To: rui.zhang@intel.com, rafael@kernel.org, daniel.lezcano@linaro.org
Cc: amitk@kernel.org, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
 Srinivas Pandruvada
Subject: [PATCH 4/4] thermal/drivers/intel_cpu_idle_cooling: Introduce Intel
 cpu idle cooling driver
Date: Mon, 7 Nov 2022 19:03:42 -0800
Message-Id: <20221108030342.1127216-5-srinivas.pandruvada@linux.intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com>
References: <20221108030342.1127216-1-srinivas.pandruvada@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-pm@vger.kernel.org

The CPU idle cooling driver cools down a CPU by injecting idle cycles
at runtime. The objective is similar to that of the intel_powerclamp
driver, which provides system-wide cooling by injecting idle on every
CPU.

This driver is modeled after drivers/thermal/cpuidle_cooling.c and
reuses the powercap/idle_inject framework.

A thermal cooling device is registered for each CPU as it comes online.
The minimum state of the cooling device is 0 and the maximum is 100.
When user space changes the current state to a non-zero value, the
driver registers with the idle inject framework and starts idle
injection.

The default idle duration is 24 milliseconds, matching intel_powerclamp,
and does not change with the current state of the cooling device. Only
the runtime is adjusted based on the current state.
Signed-off-by: Srinivas Pandruvada
---
 drivers/thermal/intel/Kconfig                 |  10 +
 drivers/thermal/intel/Makefile                |   1 +
 .../thermal/intel/intel_cpu_idle_cooling.c    | 262 ++++++++++++++++++
 3 files changed, 273 insertions(+)
 create mode 100644 drivers/thermal/intel/intel_cpu_idle_cooling.c

diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
index 0883740bf70d..c93daa7c83eb 100644
--- a/drivers/thermal/intel/Kconfig
+++ b/drivers/thermal/intel/Kconfig
@@ -114,3 +114,13 @@ config INTEL_HFI_THERMAL
 	  These capabilities may change as a result of changes in the operating
 	  conditions of the system such power and thermal limits. If selected,
 	  the kernel relays updates in CPUs' capabilities to userspace.
+
+config INTEL_CPU_IDLE_COOLING
+	tristate "Intel CPU idle cooling device"
+	depends on IDLE_INJECT
+	help
+	  This implements the CPU cooling mechanism through
+	  idle injection. This will throttle the CPU by injecting
+	  idle cycles.
+	  Unlike the Intel Power clamp driver, this driver provides
+	  idle injection for each CPU.
diff --git a/drivers/thermal/intel/Makefile b/drivers/thermal/intel/Makefile
index 9a8d8054f316..8d5f7b5cf9b7 100644
--- a/drivers/thermal/intel/Makefile
+++ b/drivers/thermal/intel/Makefile
@@ -14,3 +14,4 @@ obj-$(CONFIG_INTEL_TCC_COOLING) += intel_tcc_cooling.o
 obj-$(CONFIG_X86_THERMAL_VECTOR) += therm_throt.o
 obj-$(CONFIG_INTEL_MENLOW) += intel_menlow.o
 obj-$(CONFIG_INTEL_HFI_THERMAL) += intel_hfi.o
+obj-$(CONFIG_INTEL_CPU_IDLE_COOLING) += intel_cpu_idle_cooling.o
diff --git a/drivers/thermal/intel/intel_cpu_idle_cooling.c b/drivers/thermal/intel/intel_cpu_idle_cooling.c
new file mode 100644
index 000000000000..5df79f38d9fb
--- /dev/null
+++ b/drivers/thermal/intel/intel_cpu_idle_cooling.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Per CPU Idle injection cooling device implementation
+ *
+ * Copyright (c) 2022, Intel Corporation.
+ * All rights reserved.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+/* Duration matches the intel_powerclamp driver */
+#define IDLE_DURATION 24000
+#define IDLE_LATENCY UINT_MAX
+
+static int idle_duration_us = IDLE_DURATION;
+static int idle_latency_us = IDLE_LATENCY;
+
+module_param(idle_duration_us, int, 0644);
+MODULE_PARM_DESC(idle_duration_us,
+		 "Idle duration in us.");
+
+module_param(idle_latency_us, int, 0644);
+MODULE_PARM_DESC(idle_latency_us,
+		 "Idle latency in us.");
+
+/**
+ * struct cpuidle_cooling - Per instance data for cooling device
+ * @cpu: CPU number for this cooling device
+ * @ii_dev: Idle inject core instance pointer
+ * @cdev: Thermal core cooling device instance
+ * @state: Current cooling device state
+ *
+ * Stores per instance cooling device state.
+ */
+struct cpuidle_cooling {
+	int cpu;
+	struct idle_inject_device *ii_dev;
+	struct thermal_cooling_device *cdev;
+	unsigned long state;
+};
+
+static DEFINE_PER_CPU(struct cpuidle_cooling, cooling_devs);
+static cpumask_t cpuidle_cpu_mask;
+
+/* Used for module unload protection with idle injection operations */
+static DEFINE_MUTEX(idle_cooling_lock);
+
+static unsigned int cpuidle_cooling_runtime(unsigned int idle_duration_us,
+					    unsigned long state)
+{
+	if (!state)
+		return 0;
+
+	return ((idle_duration_us * 100) / state) - idle_duration_us;
+}
+
+static int cpuidle_idle_injection_register(struct cpuidle_cooling *cooling_dev)
+{
+	struct idle_inject_device *ii_dev;
+
+	ii_dev = idle_inject_register((struct cpumask *)cpumask_of(cooling_dev->cpu),
+				      NULL, NULL);
+	if (!ii_dev) {
+		/*
+		 * Either some other device has already claimed idle injection for
+		 * this CPU, or a memory allocation failed.
+		 */
+		pr_err("idle_inject_register failed for cpu:%d\n", cooling_dev->cpu);
+		return -EAGAIN;
+	}
+
+	idle_inject_set_duration(ii_dev, TICK_USEC, idle_duration_us);
+	idle_inject_set_latency(ii_dev, idle_latency_us);
+
+	cooling_dev->ii_dev = ii_dev;
+
+	return 0;
+}
+
+static void cpuidle_idle_injection_unregister(struct cpuidle_cooling *cooling_dev)
+{
+	idle_inject_unregister(cooling_dev->ii_dev);
+}
+
+static int cpuidle_cooling_get_max_state(struct thermal_cooling_device *cdev,
+					 unsigned long *state)
+{
+	*state = 100;
+
+	return 0;
+}
+
+static int cpuidle_cooling_get_cur_state(struct thermal_cooling_device *cdev,
+					 unsigned long *state)
+{
+	struct cpuidle_cooling *cooling_dev = cdev->devdata;
+
+	*state = READ_ONCE(cooling_dev->state);
+
+	return 0;
+}
+
+static int cpuidle_cooling_set_cur_state(struct thermal_cooling_device *cdev,
+					 unsigned long state)
+{
+	struct cpuidle_cooling *cooling_dev = cdev->devdata;
+	unsigned int runtime_us;
+	unsigned long curr_state;
+	int ret = 0;
+
+	mutex_lock(&idle_cooling_lock);
+
+	curr_state = READ_ONCE(cooling_dev->state);
+
+	if (!curr_state && state > 0) {
+		/*
+		 * This is the first time to start cooling, so register with
+		 * idle injection framework.
+		 */
+		if (!cooling_dev->ii_dev) {
+			ret = cpuidle_idle_injection_register(cooling_dev);
+			if (ret)
+				goto unlock_set_state;
+		}
+
+		runtime_us = cpuidle_cooling_runtime(idle_duration_us, state);
+
+		idle_inject_set_duration(cooling_dev->ii_dev, runtime_us, idle_duration_us);
+		idle_inject_start(cooling_dev->ii_dev);
+	} else if (curr_state > 0 && state) {
+		/* Simply update runtime */
+		runtime_us = cpuidle_cooling_runtime(idle_duration_us, state);
+		idle_inject_set_duration(cooling_dev->ii_dev, runtime_us, idle_duration_us);
+	} else if (curr_state > 0 && !state) {
+		idle_inject_stop(cooling_dev->ii_dev);
+		cpuidle_idle_injection_unregister(cooling_dev);
+		cooling_dev->ii_dev = NULL;
+	}
+
+	WRITE_ONCE(cooling_dev->state, state);
+
+unlock_set_state:
+	mutex_unlock(&idle_cooling_lock);
+
+	return ret;
+}
+
+/**
+ * cpuidle_cooling_ops - thermal cooling device ops
+ */
+static struct thermal_cooling_device_ops cpuidle_cooling_ops = {
+	.get_max_state = cpuidle_cooling_get_max_state,
+	.get_cur_state = cpuidle_cooling_get_cur_state,
+	.set_cur_state = cpuidle_cooling_set_cur_state,
+};
+
+static int cpuidle_cooling_register(int cpu)
+{
+	struct cpuidle_cooling *cooling_dev = &per_cpu(cooling_devs, cpu);
+	struct thermal_cooling_device *cdev;
+	char name[14]; /* storage for cpuidle-XXXX */
+	int ret = 0;
+
+	mutex_lock(&idle_cooling_lock);
+
+	snprintf(name, sizeof(name), "cpuidle-%d", cpu);
+	cdev = thermal_cooling_device_register(name, cooling_dev, &cpuidle_cooling_ops);
+	if (IS_ERR(cdev)) {
+		ret = PTR_ERR(cdev);
+		goto unlock_register;
+	}
+
+	cooling_dev->cdev = cdev;
+	cpumask_set_cpu(cpu, &cpuidle_cpu_mask);
+	cooling_dev->cpu = cpu;
+
+unlock_register:
+	mutex_unlock(&idle_cooling_lock);
+
+	return ret;
+}
+
+static void cpuidle_cooling_unregister(int cpu)
+{
+	struct cpuidle_cooling *cooling_dev = &per_cpu(cooling_devs, cpu);
+
+	mutex_lock(&idle_cooling_lock);
+
+	if (cooling_dev->state) {
+		idle_inject_stop(cooling_dev->ii_dev);
+		cpuidle_idle_injection_unregister(cooling_dev);
+	}
+
+	thermal_cooling_device_unregister(cooling_dev->cdev);
+	cooling_dev->state = 0;
+
+	mutex_unlock(&idle_cooling_lock);
+}
+
+static int cpuidle_cooling_cpu_online(unsigned int cpu)
+{
+	cpuidle_cooling_register(cpu);
+
+	return 0;
+}
+
+static int cpuidle_cooling_cpu_offline(unsigned int cpu)
+{
+	cpuidle_cooling_unregister(cpu);
+
+	return 0;
+}
+
+static enum cpuhp_state cpuidle_cooling_hp_state __read_mostly;
+
+static const struct x86_cpu_id intel_cpuidle_cooling_ids[] __initconst = {
+	X86_MATCH_VENDOR_FEATURE(INTEL, X86_FEATURE_MWAIT, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, intel_cpuidle_cooling_ids);
+
+static int __init cpuidle_cooling_init(void)
+{
+	int ret;
+
+	if (!x86_match_cpu(intel_cpuidle_cooling_ids))
+		return -ENODEV;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				"thermal/cpuidle_cooling:online",
+				cpuidle_cooling_cpu_online,
+				cpuidle_cooling_cpu_offline);
+	if (ret < 0)
+		return ret;
+
+	cpuidle_cooling_hp_state = ret;
+
+	return 0;
+}
+module_init(cpuidle_cooling_init)
+
+static void __exit cpuidle_cooling_exit(void)
+{
+	cpuhp_remove_state(cpuidle_cooling_hp_state);
+}
+module_exit(cpuidle_cooling_exit)
+
+MODULE_IMPORT_NS(IDLE_INJECT);
+
+MODULE_LICENSE("GPL");