
cpufreq: intel_pstate: Force intel_pstate to load when HWP disabled in firmware

Message ID 20210513075930.22657-1-ggherdovich@suse.cz (mailing list archive)
State Superseded, archived
Series cpufreq: intel_pstate: Force intel_pstate to load when HWP disabled in firmware

Commit Message

Giovanni Gherdovich May 13, 2021, 7:59 a.m. UTC
On CPUs succeeding SKX, e.g. ICELAKE_X, intel_pstate doesn't load unless
CPUID advertises support for the HWP feature. Some OEMs, however, may offer
users the possibility to disable HWP from the BIOS config utility by
altering the output of CPUID.
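As a rough illustration of what "CPUID advertises support for the HWP feature" means: Intel documents HWP support as bit 7 of EAX returned by CPUID leaf 06H (Thermal and Power Management), and Linux exposes the same bit as the "hwp" flag in /proc/cpuinfo. The user-space sketch below assumes GCC or Clang on x86 (for <cpuid.h> and __get_cpuid()); the helper names are hypothetical, and on other architectures it simply reports that the bit cannot be queried.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 06H: Thermal and Power Management. */
#define CPUID_LEAF_THERMAL_POWER 0x06u
/* EAX bit 7 of that leaf: HWP base registers supported. */
#define CPUID6_EAX_HWP (1u << 7)

/* Pure helper: does a leaf-6 EAX value advertise HWP? */
static bool hwp_bit_set(uint32_t leaf6_eax)
{
    return (leaf6_eax & CPUID6_EAX_HWP) != 0;
}

/*
 * Hypothetical helper. Returns 1 if CPUID advertises HWP, 0 if it does
 * not (for example because firmware hides it), or -1 if leaf 6 cannot
 * be queried on this machine or architecture.
 */
static int cpu_advertises_hwp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the leaf exceeds the maximum supported. */
    if (!__get_cpuid(CPUID_LEAF_THERMAL_POWER, &eax, &ebx, &ecx, &edx))
        return -1;
    return hwp_bit_set(eax) ? 1 : 0;
#else
    return -1;
#endif
}
```

On a machine where HWP has been unchecked in the BIOS config utility, cpu_advertises_hwp() would presumably return 0, which is the situation this patch tries to handle.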

Add the command line option "intel_pstate=hwp_broken_firmware" so that
intel_pstate still loads in that case, providing OS-driven frequency
scaling.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
 Documentation/admin-guide/pm/intel_pstate.rst   | 7 +++++++
 drivers/cpufreq/intel_pstate.c                  | 7 ++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

Comments

Srinivas Pandruvada May 13, 2021, 9:24 a.m. UTC | #1
On Thu, 2021-05-13 at 09:59 +0200, Giovanni Gherdovich wrote:
> On CPUs succeeding SKX, e.g. ICELAKE_X, intel_pstate doesn't load unless
> CPUID advertises support for the HWP feature. Some OEMs, however, may offer
> users the possibility to disable HWP from the BIOS config utility by
> altering the output of CPUID.
Is someone providing a utility? What is the case for broken HWP?

It is possible that some users don't want to use HWP because their
workloads work better without HWP. But that doesn't mean HWP is
broken.

Thanks,
Srinivas

Giovanni Gherdovich May 13, 2021, 10:10 a.m. UTC | #2
On Thu, 2021-05-13 at 02:24 -0700, Srinivas Pandruvada wrote:
> On Thu, 2021-05-13 at 09:59 +0200, Giovanni Gherdovich wrote:
> > On CPUs succeeding SKX, e.g. ICELAKE_X, intel_pstate doesn't load unless
> > CPUID advertises support for the HWP feature. Some OEMs, however, may offer
> > users the possibility to disable HWP from the BIOS config utility by
> > altering the output of CPUID.
>
> Is someone providing a utility? What is the case for broken HWP?

Yes, I know of at least one server manufacturer that ships a BIOS config
utility where the user can disable HWP.

On such a server machine, which has an ICELAKE_X CPU, if the user unchecks HWP
via BIOS then intel_pstate will refuse to load saying:

    intel_pstate: CPU model not supported

because ICELAKE_X is not in the list intel_pstate_cpu_ids (defined in
intel_pstate.c) of CPUs that intel_pstate supports when HWP is absent from
CPUID; that list ends at SKYLAKE_X.

An alternative approach to register intel_pstate in the case I'm describing
would be to add ICELAKE_X (and every CPU model after that, forever?) to the
list intel_pstate_cpu_ids.
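The failure mode described above can be modelled as a table lookup that returns NULL. The sketch below is a heavily simplified, hypothetical stand-in for intel_pstate_cpu_ids and x86_match_cpu(): entries match on the family 6 model number only (0x55 for SKYLAKE_X, 0x6A for ICELAKE_X), the real table's older entries are omitted, and driver_data points at a placeholder callback table rather than one of the driver's real struct pstate_funcs instances.

```c
#include <stddef.h>

/* Placeholder for the driver's per-model callback table (hypothetical). */
struct pstate_funcs_sketch {
    const char *name;
};

static const struct pstate_funcs_sketch core_funcs_sketch = { "core" };

/* Simplified x86_cpu_id: family 6 model number plus opaque driver_data. */
struct cpu_id_sketch {
    unsigned int model;
    const void *driver_data;
};

#define MODEL_SKYLAKE_X 0x55u
#define MODEL_ICELAKE_X 0x6Au

/* No-HWP table as described in the thread, reduced to its last entry. */
static const struct cpu_id_sketch no_hwp_ids_today[] = {
    { MODEL_SKYLAKE_X, &core_funcs_sketch },
    { 0, NULL }, /* terminator */
};

/* The alternative: list ICELAKE_X explicitly (callbacks are a placeholder). */
static const struct cpu_id_sketch no_hwp_ids_with_icx[] = {
    { MODEL_SKYLAKE_X, &core_funcs_sketch },
    { MODEL_ICELAKE_X, &core_funcs_sketch },
    { 0, NULL }, /* terminator */
};

/* Return the matching entry, or NULL if the model is not listed. */
static const struct cpu_id_sketch *
match_model(const struct cpu_id_sketch *table, unsigned int model)
{
    for (; table->model != 0; table++)
        if (table->model == model)
            return table;
    return NULL;
}
```

With the first table, looking up 0x6A returns NULL, which corresponds to the "CPU model not supported" path; with the second, it returns an entry whose driver_data can be used.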

> It is possible that some users don't want to use HWP because their
> workloads work better without HWP. But that doesn't mean HWP is
> broken.

That's true, a user may legitimately want to disable HWP, and we have the
intel_pstate=no_hwp option for that. But for that option to work, CPUID must
still show that the CPU is HWP-capable; when disablement happens in BIOS, that
is not the case.
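The difference between intel_pstate=no_hwp and a firmware-hidden HWP bit can be summarised as a small decision function. This is a simplified, hypothetical model of the behaviour described in this thread, not the driver's actual code: it ignores details such as HWP already being enabled by the BIOS or EPP support, and all names are made up for illustration.

```c
#include <stdbool.h>

enum pstate_mode_sketch {
    MODE_HWP,        /* intel_pstate loads and uses HWP */
    MODE_NO_HWP,     /* intel_pstate loads with OS-driven P-state selection */
    MODE_NOT_LOADED, /* intel_pstate refuses to load (-ENODEV) */
};

/*
 * cpuid_hwp:     CPUID advertises HWP (false if firmware hides it)
 * no_hwp_param:  intel_pstate=no_hwp was passed on the command line
 * model_listed:  the CPU model is in the no-HWP model table
 */
static enum pstate_mode_sketch
choose_mode(bool cpuid_hwp, bool no_hwp_param, bool model_listed)
{
    if (cpuid_hwp)
        /* HWP-capable CPU: no_hwp opts out without needing the model table. */
        return no_hwp_param ? MODE_NO_HWP : MODE_HWP;

    /* HWP hidden by firmware: only explicitly listed models are accepted. */
    return model_listed ? MODE_NO_HWP : MODE_NOT_LOADED;
}
```

For the ICELAKE_X case, choose_mode(false, false, false) and choose_mode(false, true, false) both yield MODE_NOT_LOADED, which is why no_hwp cannot help once the firmware has hidden HWP.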

The wording "hwp_broken_firmware" deliberately has a negative connotation (the
intended meaning is: "firmware is broken, regarding HWP"), carrying the
not-so-subtle message "OEM folks, please don't do this". My understanding is
that the preferred way to disable HWP is with intel_pstate=no_hwp; the
firmware should stay out of it.

I hope this clarifies the problem (there is an ICELAKE_X somewhere out there
that can't load intel_pstate, which is not nice) and the intention
(discouraging disablement of HWP via firmware).


Giovanni
Srinivas Pandruvada May 13, 2021, 11:03 a.m. UTC | #3
On Thu, 2021-05-13 at 12:10 +0200, Giovanni Gherdovich wrote:
> On Thu, 2021-05-13 at 02:24 -0700, Srinivas Pandruvada wrote:
> > On Thu, 2021-05-13 at 09:59 +0200, Giovanni Gherdovich wrote:
> > > On CPUs succeeding SKX, e.g. ICELAKE_X, intel_pstate doesn't load unless
> > > CPUID advertises support for the HWP feature. Some OEMs, however, may
> > > offer users the possibility to disable HWP from the BIOS config utility
> > > by altering the output of CPUID.
> > 
> > Is someone providing a utility? What is the case for broken HWP?
> 
> Yes, I know of at least one server manufacturer that ships a BIOS config
> utility where the user can disable HWP.
> 
> On such a server machine, which has an ICELAKE_X CPU, if the user unchecks
> HWP via BIOS then intel_pstate will refuse to load saying:
> 
>     intel_pstate: CPU model not supported
> 
> because ICELAKE_X is not in the list intel_pstate_cpu_ids (defined in
> intel_pstate.c) of CPUs that intel_pstate supports when HWP is absent from
> CPUID; that list ends at SKYLAKE_X.
> 
> An alternative approach to register intel_pstate in the case I'm describing
> would be to add ICELAKE_X (and every CPU model after that, forever?) to the
> list intel_pstate_cpu_ids.
This is not nice, but unlike client CPUs, server CPUs don't get released
often. There are a couple of years in between.

> 
> > It is possible that some users don't want to use HWP because their
> > workloads work better without HWP. But that doesn't mean HWP is
> > broken.
> 
> That's true, a user may legitimately want to disable HWP, and we have
> the intel_pstate=no_hwp option for that. But for that option to work,
> CPUID must still show that the CPU is HWP-capable; when disablement
> happens in BIOS, that is not the case.
Correct.

> 
> The wording "hwp_broken_firmware" deliberately has a negative
> connotation (the intended meaning is: "firmware is broken, regarding
> HWP"), carrying the not-so-subtle message "OEM folks, please don't do
> this". My understanding is that the preferred way to disable HWP is
> with intel_pstate=no_hwp; the firmware should stay out of it.
For me "broken" means that Intel has some bug, which is not the case,
even if the intention is to carry a message to OEMs.

no_hwp is for disabling HWP even if the HWP is supported.

The problem is that if we override the supported CPU list using a
kernel command line option, some users may crash the system when running
on old hardware where some of the MSRs we rely on are not present. We
don't read MSRs in failsafe mode, so those reads will fault. We check
some MSRs, but not all. Also, what will be the default
(struct pstate_funcs *)id->driver_data if the CPU model doesn't match?
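The MSR concern is about reads of registers that may not exist on older hardware. From user space, the Linux msr driver (/dev/cpu/N/msr, provided by the msr module and normally requiring root) performs such reads in a fault-tolerant way and should report a missing MSR as an EIO error rather than crashing. The sketch below only illustrates that "probe before trusting" idea; MSR_PLATFORM_INFO (0xCE) is merely an example register, and read_msr_safe() is a hypothetical helper, not the driver's own validity check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Example register index only: MSR_PLATFORM_INFO. */
#define MSR_PLATFORM_INFO 0xCEu

/*
 * Hypothetical helper: read one MSR on one CPU through the msr driver.
 * Returns 0 and stores the value on success, or a negative errno value
 * (e.g. -ENOENT without the msr module, -EACCES without privileges,
 * -EIO if the register does not exist on this CPU).
 */
static int read_msr_safe(int cpu, uint32_t msr, uint64_t *val)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* The file offset selects the MSR index; each read returns 8 bytes. */
    ssize_t n = pread(fd, val, sizeof(*val), (off_t)msr);
    /* Capture errno before close() can overwrite it. */
    int err = (n == (ssize_t)sizeof(*val)) ? 0 : (n < 0 ? -errno : -EIO);
    close(fd);
    return err;
}
```

A caller would treat any non-zero return as "do not rely on this register", which is the kind of check Srinivas notes is only done for some MSRs.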

I think it is better to add the CPU model instead. We did that for SKX on
user requests.

Thanks,
Srinivas

Giovanni Gherdovich May 13, 2021, 12:10 p.m. UTC | #4
On Thu, 2021-05-13 at 04:03 -0700, Srinivas Pandruvada wrote:
> On Thu, 2021-05-13 at 12:10 +0200, Giovanni Gherdovich wrote:
> > [...]
> > An alternative approach to register intel_pstate in the case I'm
> > describing
> > would be to add ICELAKE_X (and every CPU model after that, forever?)
> > to the
> > list intel_pstate_cpu_ids.
>
> This is not nice, but unlike client CPUs, server CPUs don't get released
> often. There are a couple of years in between.

True.

> > [...]
> > The wording "hwp_broken_firmware" deliberately has a negative
> > connotation (the intended meaning is: "firmware is broken, regarding
> > HWP"), carrying the not-so-subtle message "OEM folks, please don't do
> > this". My understanding is that the preferred way to disable HWP is
> > with intel_pstate=no_hwp; the firmware should stay out of it.
>
> For me "broken" means that Intel has some bug, which is not the case,
> even if the intention is to carry a message to OEMs.
> 
> no_hwp is for disabling HWP even if the HWP is supported.
> 
> The problem is that if we override the supported CPU list using a
> kernel command line option, some users may crash the system when running
> on old hardware where some of the MSRs we rely on are not present. We
> don't read MSRs in failsafe mode, so those reads will fault. We check
> some MSRs, but not all.

Fair enough.

> Also, what will be the default
> (struct pstate_funcs *)id->driver_data if the CPU model doesn't match?

Whoops... You're totally right, the patch I sent is broken! "id" must point to
a valid entry whose driver_data is a pstate_funcs* pointer, or some other
default methods must be provided.
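The bug acknowledged here is that the patch lets execution continue with id == NULL, while the code that follows the check still uses id->driver_data as the callback table. The hypothetical sketch below shows the two ways out mentioned in this message: use the matched entry, or fall back to an explicitly provided default table, and otherwise refuse with -ENODEV. The types are made-up stand-ins, and a fallback table on its own would not address the MSR concern raised above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pstate_funcs. */
struct funcs_sketch {
    int (*get_max_pstate)(void);
};

/* Hypothetical stand-in for struct x86_cpu_id: only driver_data matters here. */
struct id_sketch {
    const void *driver_data;
};

/*
 * Choose the callback table for the no-HWP path without ever touching
 * id->driver_data when id is NULL.
 */
static int pick_funcs(const struct id_sketch *id, bool override,
                      const struct funcs_sketch *fallback,
                      const struct funcs_sketch **out)
{
    if (id) {
        /* Model matched: its driver_data is the callback table. */
        *out = id->driver_data;
        return 0;
    }
    if (override && fallback) {
        /* No match, but the user forced loading and defaults exist. */
        *out = fallback;
        return 0;
    }
    /* No match and nothing safe to use: refuse, as the driver does today. */
    return -ENODEV;
}
```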

And besides...

> I think it is better to add the CPU model instead. We did that for SKX on
> user requests.

... I agree. Let's just add ICX to the list of explicitly supported CPUs.
I'll send a new patch doing that, please discard this one.


Giovanni

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..278ec0718dc9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1951,6 +1951,13 @@ 
 			per_cpu_perf_limits
 			  Allow per-logical-CPU P-State performance control limits using
 			  cpufreq sysfs interface
+			hwp_broken_firmware
+			  Register intel_pstate as the scaling driver despite the
+			  hardware-managed P-states (HWP) feature being disabled in
+			  firmware. On CPU models succeeding SKX, intel_pstate expects
+			  HWP to be supported. Some OEMs may use firmware that hides the
+			  feature from the OS. With this option intel_pstate will
+			  load regardless.
 
 	intremap=	[X86-64, Intel-IOMMU]
 			on	enable Interrupt Remapping (default)
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index df29b4f1f219..1e6f139d5b05 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -689,6 +689,13 @@  of them have to be prepended with the ``intel_pstate=`` prefix.
 	Use per-logical-CPU P-State limits (see `Coordination of P-state
 	Limits`_ for details).
 
+``hwp_broken_firmware``
+	Register ``intel_pstate`` as the scaling driver despite the
+	hardware-managed P-states (HWP) feature being disabled in firmware.
+
+	On CPU models succeeding SKX, ``intel_pstate`` expects HWP to be
+	supported. Some OEMs may use firmware that hides the feature from the
+	OS. With this option ``intel_pstate`` will load regardless.
 
 Diagnostics and Tuning
 ======================
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f0401064d7aa..8635251f86f2 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2856,6 +2856,7 @@  static int intel_pstate_update_status(const char *buf, size_t size)
 static int no_load __initdata;
 static int no_hwp __initdata;
 static int hwp_only __initdata;
+static int hwp_broken_firmware __initdata;
 static unsigned int force_load __initdata;
 
 static int __init intel_pstate_msrs_not_valid(void)
@@ -3066,7 +3067,7 @@  static int __init intel_pstate_init(void)
 		}
 	} else {
 		id = x86_match_cpu(intel_pstate_cpu_ids);
-		if (!id) {
+		if (!id && !hwp_broken_firmware) {
 			pr_info("CPU model not supported\n");
 			return -ENODEV;
 		}
@@ -3149,6 +3150,10 @@  static int __init intel_pstate_setup(char *str)
 		force_load = 1;
 	if (!strcmp(str, "hwp_only"))
 		hwp_only = 1;
+	if (!strcmp(str, "hwp_broken_firmware")) {
+		pr_info("HWP disabled by firmware\n");
+		hwp_broken_firmware = 1;
+	}
 	if (!strcmp(str, "per_cpu_perf_limits"))
 		per_cpu_limits = true;