diff mbox series

[RFC] therm_throt: test bits as we build therm_intr_core_clear_mask

Message ID Zg3GhhTZotBNvlRR@merlin.infradead.org (mailing list archive)
State RFC, archived
Headers show
Series [RFC] therm_throt: test bits as we build therm_intr_core_clear_mask | expand

Commit Message

Kyle McMartin April 3, 2024, 9:13 p.m. UTC
X86_FEATURE_HWP appears to be insufficient on some platforms, and
writing 1 to BIT(13)|BIT(15) results in the wrmsrl trapping and failing
to clear PROCHOT_LOG.

Instead, also try to wrmsrl_safe with those bits set in the mask, and
check if we get -EIO back. If so, those bits will trap and prevent us
from writing to THERM_STATUS.

Signed-off-by: Kyle McMartin <jkkm@fb.com>

---

We noticed a problem on some of our production hosts while rolling out
6.4 kernels where we were seeing PROCHOT_LOG set but never cleared,
along with a warn on once in wrmsr.

I tracked this down to:

commit 930d06bf071aa746db11d68d2d75660b449deff3
Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date:   Tue Nov 15 18:54:17 2022 -0800

    thermal: intel: Protect clearing of thermal status bits

which started setting some unexpected bits in THERM_STATUS on our
platform. Previously, the mask had these bits set, but we were masking
with the MSR which was resulting in them not being written to 1.

Starting with 117e4e5bd9d47b89777dbf6b37a709dcfe59520f, these bits were
protected by the HWP CPUID flag, but on some of our platforms, this
doesn't seem sufficient.

On Broadwell and Broadwell-DE, the HWP flag is not set, but writing
these bits does not trap.

On our Skylake-DE, Skylake, and Cooper Lake platforms, the HWP flag is
set in CPUID, and writing 1 to these bits traps attempting to write
0xAAA8 to MSR 0x19C (THERM_STATUS). Writing 0xAA8 from userspace works
as expected to un-stick PROCHOT_LOG.

On our Sapphire Rapids platforms, the HWP flag is set, and writing 1 to
these bits is successful.

 drivers/thermal/intel/therm_throt.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

Comments

srinivas pandruvada April 4, 2024, 1:15 a.m. UTC | #1
Hi Kyle,

On Wed, 2024-04-03 at 17:13 -0400, Kyle McMartin wrote:
> X86_FEATURE_HWP appears to be insufficient on some platforms, and
> writing 1 to BIT(13)|BIT(15) results in the wrmsrl trapping and
> failing
> to clear PROCHOT_LOG.
> 
> Instead, also try to wrmsrl_safe with those bits set in the mask, and
> check if we get -EIO back. If so, those bits will trap and prevent us
> from writing to THERM_STATUS.
> 
> Signed-off-by: Kyle McMartin <jkkm@fb.com>
> 
> ---
> 
> We noticed a problem on some of our production hosts while rolling
> out
> 6.4 kernels where we were seeing PROCHOT_LOG set but never cleared,
> along with a warn on once in wrmsr.
> 
> I tracked this down to:
> 
> commit 930d06bf071aa746db11d68d2d75660b449deff3
> Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Date:   Tue Nov 15 18:54:17 2022 -0800
> 
>     thermal: intel: Protect clearing of thermal status bits
> 
> which started setting some unexpected bits in THERM_STATUS on our
> platform. Previously, the mask had these bits set, but we were
> masking
> with the MSR which was resulting in them not being written to 1.
> 
> Starting with 117e4e5bd9d47b89777dbf6b37a709dcfe59520f, these bits
> were
> protected by the HWP CPUID flag, but on some of our platforms, this
> doesn't seem sufficient.
> 
> On Broadwell and Broadwell-DE, the HWP flag is not set, but writing
> these bits does not trap.
> 
> On our Skylake-DE, Skylake, and Cooper Lake platforms, the HWP flag
> is
> set in CPUID, and writing 1 to these bits traps attempting to write
> 0xAAA8 to MSR 0x19C (THERM_STATUS). Writing 0xAA8 from userspace
> works
> as expected to un-stick PROCHOT_LOG.

I think this issue happens only on Skylake, Cascade Lake, Cooper Lake
and not on any other systems.

Please verify:
GP# happens only when bit13 (Current Limit Log) or bit15 (Cross Domain
Limit Log) is 1.

Basically writing 0x2000 or 0x8000  or A000 will cause this issue.
Are you using the latest BIOS with microcode?
Please confirm your microcode version, I can check internally.

Thanks,
Srinivas


> 
> On our Sapphire Rapids platforms, the HWP flag is set, and writing 1
> to
> these bits is successful.
> 
>  drivers/thermal/intel/therm_throt.c | 29 ++++++++++++++++++++++-----
> --
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/thermal/intel/therm_throt.c
> b/drivers/thermal/intel/therm_throt.c
> index e69868e868eb..3058d8fcfcef 100644
> --- a/drivers/thermal/intel/therm_throt.c
> +++ b/drivers/thermal/intel/therm_throt.c
> @@ -196,8 +196,14 @@ static const struct attribute_group
> thermal_attr_group = {
>  static u64 therm_intr_core_clear_mask;
>  static u64 therm_intr_pkg_clear_mask;
>  
> +/* Probe each addition to the mask to ensure that our wrmsrl
> + * won't fail to clear bits.
> + */
>  static void thermal_intr_init_core_clear_mask(void)
>  {
> +       u64 bits = 0;
> +       u64 mask = 0;
> +
>         if (therm_intr_core_clear_mask)
>                 return;
>  
> @@ -211,25 +217,34 @@ static void
> thermal_intr_init_core_clear_mask(void)
>          * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
>          * enable interrupts, when 0 as it checks for
> X86_FEATURE_ACPI.
>          */
> -       therm_intr_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
> +       mask = (BIT(1) | BIT(3) | BIT(5));
>  
>         /*
>          * Bit 7 and 9: Thermal Threshold #1 and #2 log
>          * If CPUID.01H:ECX[8] = 1
>          */
> -       if (boot_cpu_has(X86_FEATURE_TM2))
> -               therm_intr_core_clear_mask |= (BIT(7) | BIT(9));
> +       bits = BIT(7) | BIT(9);
> +       if (boot_cpu_has(X86_FEATURE_TM2) &&
> +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> +               mask |= bits;
> +
>  
>         /* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4] =
> 1 */
> -       if (boot_cpu_has(X86_FEATURE_PLN))
> -               therm_intr_core_clear_mask |= BIT(11);
> +       bits = BIT(11);
> +       if (boot_cpu_has(X86_FEATURE_PLN) &&
> +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> +               mask |= bits;
>  
>         /*
>          * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
>          * Bit 15: Cross Domain Limit log (R/WC0) If CPUID.06H:EAX[7]
> = 1
>          */
> -       if (boot_cpu_has(X86_FEATURE_HWP))
> -               therm_intr_core_clear_mask |= (BIT(13) | BIT(15));
> +       bits = BIT(13) | BIT(15);
> +       if (boot_cpu_has(X86_FEATURE_HWP) &&
> +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> +               mask |= bits;
> +
> +       therm_intr_core_clear_mask = mask;
>  }
>  
>  static void thermal_intr_init_pkg_clear_mask(void)
Kyle McMartin April 4, 2024, 5:17 p.m. UTC | #2
On Wed, Apr 03, 2024 at 06:15:47PM -0700, srinivas pandruvada wrote:
> > On Broadwell and Broadwell-DE, the HWP flag is not set, but writing
> > these bits does not trap.
> > 
> > On our Skylake-DE, Skylake, and Cooper Lake platforms, the HWP flag
> > is
> > set in CPUID, and writing 1 to these bits traps attempting to write
> > 0xAAA8 to MSR 0x19C (THERM_STATUS). Writing 0xAA8 from userspace
> > works
> > as expected to un-stick PROCHOT_LOG.
> 
> I think this issue happens only on Skylake, Cascade Lake, Cooper Lake
> and not on any other systems.
> 
> Please verify:
> GP# happens only when bit13 (Current Limit Log) or bit15 (Cross Domain
> Limit Log) is 1.
> 

Yeah, if either of the bits are set, we'll trap and fail the WRMSRL.

> Basically writing 0x2000 or 0x8000  or A000 will cause this issue.
> Are you using the latest BIOS with microcode?
> Please confirm your microcode version, I can check internally.
> 

On SkylakeDE, 6-85-4 we've got 0x2006e08 and 0x2006e05 as the most commonly
deployed microcodes. On Skylake, 6-85-4 we've got 0x2006e05 and 0x2000065.
Finally, on Cooper Lake, 6-85-11, we have 0x700001f and are in the process
of rolling out 0x7002503.

Rolling out new firmware is a pretty slow process... Since we're not
clearing those bits anywhere in the kernel we're deploying, I just
stubbed out setting BIT(13) and BIT(15) on those platforms for now while
we discuss a more durable fix.

Thanks for following up! --kyle

> Thanks,
> Srinivas
> 
> 
> > 
> > On our Sapphire Rapids platforms, the HWP flag is set, and writing 1
> > to
> > these bits is successful.
> > 
> >  drivers/thermal/intel/therm_throt.c | 29 ++++++++++++++++++++++-----
> > --
> >  1 file changed, 22 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/thermal/intel/therm_throt.c
> > b/drivers/thermal/intel/therm_throt.c
> > index e69868e868eb..3058d8fcfcef 100644
> > --- a/drivers/thermal/intel/therm_throt.c
> > +++ b/drivers/thermal/intel/therm_throt.c
> > @@ -196,8 +196,14 @@ static const struct attribute_group
> > thermal_attr_group = {
> >  static u64 therm_intr_core_clear_mask;
> >  static u64 therm_intr_pkg_clear_mask;
> >  
> > +/* Probe each addition to the mask to ensure that our wrmsrl
> > + * won't fail to clear bits.
> > + */
> >  static void thermal_intr_init_core_clear_mask(void)
> >  {
> > +       u64 bits = 0;
> > +       u64 mask = 0;
> > +
> >         if (therm_intr_core_clear_mask)
> >                 return;
> >  
> > @@ -211,25 +217,34 @@ static void
> > thermal_intr_init_core_clear_mask(void)
> >          * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
> >          * enable interrupts, when 0 as it checks for
> > X86_FEATURE_ACPI.
> >          */
> > -       therm_intr_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
> > +       mask = (BIT(1) | BIT(3) | BIT(5));
> >  
> >         /*
> >          * Bit 7 and 9: Thermal Threshold #1 and #2 log
> >          * If CPUID.01H:ECX[8] = 1
> >          */
> > -       if (boot_cpu_has(X86_FEATURE_TM2))
> > -               therm_intr_core_clear_mask |= (BIT(7) | BIT(9));
> > +       bits = BIT(7) | BIT(9);
> > +       if (boot_cpu_has(X86_FEATURE_TM2) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> > +
> >  
> >         /* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4] =
> > 1 */
> > -       if (boot_cpu_has(X86_FEATURE_PLN))
> > -               therm_intr_core_clear_mask |= BIT(11);
> > +       bits = BIT(11);
> > +       if (boot_cpu_has(X86_FEATURE_PLN) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> >  
> >         /*
> >          * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
> >          * Bit 15: Cross Domain Limit log (R/WC0) If CPUID.06H:EAX[7]
> > = 1
> >          */
> > -       if (boot_cpu_has(X86_FEATURE_HWP))
> > -               therm_intr_core_clear_mask |= (BIT(13) | BIT(15));
> > +       bits = BIT(13) | BIT(15);
> > +       if (boot_cpu_has(X86_FEATURE_HWP) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> > +
> > +       therm_intr_core_clear_mask = mask;
> >  }
> >  
> >  static void thermal_intr_init_pkg_clear_mask(void)
>
diff mbox series

Patch

diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
index e69868e868eb..3058d8fcfcef 100644
--- a/drivers/thermal/intel/therm_throt.c
+++ b/drivers/thermal/intel/therm_throt.c
@@ -196,8 +196,14 @@  static const struct attribute_group thermal_attr_group = {
 static u64 therm_intr_core_clear_mask;
 static u64 therm_intr_pkg_clear_mask;
 
+/* Probe each addition to the mask to ensure that our wrmsrl
+ * won't fail to clear bits.
+ */
 static void thermal_intr_init_core_clear_mask(void)
 {
+	u64 bits = 0;
+	u64 mask = 0;
+
 	if (therm_intr_core_clear_mask)
 		return;
 
@@ -211,25 +217,34 @@  static void thermal_intr_init_core_clear_mask(void)
 	 * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
 	 * enable interrupts, when 0 as it checks for X86_FEATURE_ACPI.
 	 */
-	therm_intr_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
+	mask = (BIT(1) | BIT(3) | BIT(5));
 
 	/*
 	 * Bit 7 and 9: Thermal Threshold #1 and #2 log
 	 * If CPUID.01H:ECX[8] = 1
 	 */
-	if (boot_cpu_has(X86_FEATURE_TM2))
-		therm_intr_core_clear_mask |= (BIT(7) | BIT(9));
+	bits = BIT(7) | BIT(9);
+	if (boot_cpu_has(X86_FEATURE_TM2) &&
+	    wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
+		mask |= bits;
+
 
 	/* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4] = 1 */
-	if (boot_cpu_has(X86_FEATURE_PLN))
-		therm_intr_core_clear_mask |= BIT(11);
+	bits = BIT(11);
+	if (boot_cpu_has(X86_FEATURE_PLN) &&
+	    wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
+		mask |= bits;
 
 	/*
 	 * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
 	 * Bit 15: Cross Domain Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
 	 */
-	if (boot_cpu_has(X86_FEATURE_HWP))
-		therm_intr_core_clear_mask |= (BIT(13) | BIT(15));
+	bits = BIT(13) | BIT(15);
+	if (boot_cpu_has(X86_FEATURE_HWP) &&
+	    wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
+		mask |= bits;
+
+	therm_intr_core_clear_mask = mask;
 }
 
 static void thermal_intr_init_pkg_clear_mask(void)