
x86/nmi: ensure Global Performance Counter Control is setup correctly

Message ID 20240110153400.64017-1-roger.pau@citrix.com (mailing list archive)
State New
Series x86/nmi: ensure Global Performance Counter Control is setup correctly

Commit Message

Roger Pau Monne Jan. 10, 2024, 3:34 p.m. UTC
When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
MSR contains per-counter enable bits that are ANDed with the enable bit in the
counter's EVNTSEL MSR in order for a PMC to be enabled.

So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
bits being set by default, but at least on some Intel Sapphire Rapids and
Emerald Rapids systems this is no longer the case, and Xen reports:

Testing NMI watchdog on all CPUs: 0 40 stuck

The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
relevant enable bit in PERF_GLOBAL_CTRL not being set.

Fix by detecting when Architectural Performance Monitoring is available and
making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
The fact that it's only the first CPU on each socket that's started with
PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
sure PERF_GLOBAL_CTRL is properly set up should be done regardless.
---
 xen/arch/x86/nmi.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Andrew Cooper Jan. 10, 2024, 3:52 p.m. UTC | #1
On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> MSR contains per-counter enable bits that is ANDed with the enable bit in the
> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>
> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> bits being set by default, but at least on some Intel Sapphire and Emerald
> Rapids this is no longer the case, and Xen reports:
>
> Testing NMI watchdog on all CPUs: 0 40 stuck
>
> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>
> Fix by detecting when Architectural Performance Monitoring is available and
> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> The fact that it's only the first CPU on each socket that's started with
> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.

It's each package-BSP, and yes, this is clearly a firmware bug.  It's
probably worth saying that we're raising it with Intel, but this bug is
out in production firmware for SPR and EMR.

> ---
>  xen/arch/x86/nmi.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> index dc79c25e3ffd..7a6601c4fd31 100644
> --- a/xen/arch/x86/nmi.c
> +++ b/xen/arch/x86/nmi.c
> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
>           nmi_p6_event_width > BITS_PER_LONG )
>          return;
>  
> +    if ( cpu_has_arch_perfmon )
> +    {
> +        uint64_t global_ctrl;
> +
> +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> +        /*
> +         * Make sure PMC0 is enabled in global control, as the enable bit in
> +         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
> +         */
> +        if ( !(global_ctrl & 1) )
> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);

My gut feeling is that we ought to reinstate all bits, not just bit 0.
If nothing else, that will make debugging using other counters more
reliable too.

vPMU (although mutually exclusive with watchdog) does context switch
this register as a whole.

See how global_ctrl_mask gets set up, although I'm not sure how much of
that infrastructure we really want to reuse here.

~Andrew
Roger Pau Monne Jan. 10, 2024, 4:58 p.m. UTC | #2
On Wed, Jan 10, 2024 at 03:52:49PM +0000, Andrew Cooper wrote:
> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
> > When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> > MSR contains per-counter enable bits that is ANDed with the enable bit in the
> > counter EVNTSEL MSR in order for a PMC counter to be enabled.
> >
> > So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> > bits being set by default, but at least on some Intel Sapphire and Emerald
> > Rapids this is no longer the case, and Xen reports:
> >
> > Testing NMI watchdog on all CPUs: 0 40 stuck
> >
> > The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> > doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> > relevant enable bit in PERF_GLOBAL_CTRL not being set.
> >
> > Fix by detecting when Architectural Performance Monitoring is available and
> > making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > The fact that it's only the first CPU on each socket that's started with
> > PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
> > sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
> 
> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
> probably worth saying that we're raising it with Intel, but this bug is
> out in production firmware for SPR and EMR.
> 
> > ---
> >  xen/arch/x86/nmi.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> > index dc79c25e3ffd..7a6601c4fd31 100644
> > --- a/xen/arch/x86/nmi.c
> > +++ b/xen/arch/x86/nmi.c
> > @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
> >           nmi_p6_event_width > BITS_PER_LONG )
> >          return;
> >  
> > +    if ( cpu_has_arch_perfmon )
> > +    {
> > +        uint64_t global_ctrl;
> > +
> > +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> > +        /*
> > +         * Make sure PMC0 is enabled in global control, as the enable bit in
> > +         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
> > +         */
> > +        if ( !(global_ctrl & 1) )
> > +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
> 
> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
> If nothing else because that will make debugging using other counters
> more reliable too.

Hm, yes, I was borderline on enabling all possible counters in
PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].

But then wondered if it was going too far, as for the purposes here we
just care about PMC0.

My reasoning for not doing it is that such a wide setup of
PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled, and
hence usage of counters other than PMC0 would also depend on the
watchdog.  It seems more reliable to me to either do the setting of
PERF_GLOBAL_CTRL as part of CPU initialization, or to defer to each
user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.

> vPMU (although mutually exclusive with watchdog) does context switch
> this register as a whole.
> 
> See how global_ctrl_mask gets set up, although I'm not sure how much of
> that infrastructure we really want to reuse here.

Yes, if we want to enable all possible counters we would need to use
something similar to what's done there, albeit without the fixed
counter part.

Thanks, Roger.
Andrew Cooper Jan. 10, 2024, 5:41 p.m. UTC | #3
On 10/01/2024 4:58 pm, Roger Pau Monné wrote:
> On Wed, Jan 10, 2024 at 03:52:49PM +0000, Andrew Cooper wrote:
>> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
>>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
>>> MSR contains per-counter enable bits that is ANDed with the enable bit in the
>>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>>>
>>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
>>> bits being set by default, but at least on some Intel Sapphire and Emerald
>>> Rapids this is no longer the case, and Xen reports:
>>>
>>> Testing NMI watchdog on all CPUs: 0 40 stuck
>>>
>>> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
>>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
>>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>>>
>>> Fix by detecting when Architectural Performance Monitoring is available and
>>> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> The fact that it's only the first CPU on each socket that's started with
>>> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
>>> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
>> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
>> probably worth saying that we're raising it with Intel, but this bug is
>> out in production firmware for SPR and EMR.
>>
>>> ---
>>>  xen/arch/x86/nmi.c | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
>>> index dc79c25e3ffd..7a6601c4fd31 100644
>>> --- a/xen/arch/x86/nmi.c
>>> +++ b/xen/arch/x86/nmi.c
>>> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
>>>           nmi_p6_event_width > BITS_PER_LONG )
>>>          return;
>>>  
>>> +    if ( cpu_has_arch_perfmon )
>>> +    {
>>> +        uint64_t global_ctrl;
>>> +
>>> +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
>>> +        /*
>>> +         * Make sure PMC0 is enabled in global control, as the enable bit in
>>> +         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
>>> +         */
>>> +        if ( !(global_ctrl & 1) )
>>> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
>> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
>> If nothing else because that will make debugging using other counters
>> more reliable too.
> Hm, yes, I was borderline on enabling all possible counters in
> PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].
>
> But then wondered if it was going too far, as for the purposes here we
> just care about PMC1.
>
> My reasoning for not doing it would be that such wide setup of
> PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled,
> usages of other counters apart from PMC0 will be gated on the watchdog
> being enabled.  It seems more reliable to me to either do the setting
> of PERF_GLOBAL_CTRL as part of CPU initialization, or defer to each
> user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.

It is buggy that each socket-BSP is handed over with ctl=0 rather than 0xff.

But we're exacerbating the bug by not returning each socket-BSP to the
default behaviour.


It makes a practical difference if a developer wants to hand-code up
PCR2.  It also makes a practical difference to what a guest sees when
it executes RDPMC, because right now the perf counter values leak in
(there's another outstanding patch series of mine trying to stem this
leak).

The fixup we're performing here isn't "because we're using one
counter".  It's to get state back to default.

~Andrew
Roger Pau Monne Jan. 11, 2024, 8:28 a.m. UTC | #4
On Wed, Jan 10, 2024 at 05:41:41PM +0000, Andrew Cooper wrote:
> On 10/01/2024 4:58 pm, Roger Pau Monné wrote:
> > On Wed, Jan 10, 2024 at 03:52:49PM +0000, Andrew Cooper wrote:
> >> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
> >>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> >>> MSR contains per-counter enable bits that is ANDed with the enable bit in the
> >>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
> >>>
> >>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> >>> bits being set by default, but at least on some Intel Sapphire and Emerald
> >>> Rapids this is no longer the case, and Xen reports:
> >>>
> >>> Testing NMI watchdog on all CPUs: 0 40 stuck
> >>>
> >>> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> >>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> >>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
> >>>
> >>> Fix by detecting when Architectural Performance Monitoring is available and
> >>> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
> >>>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> The fact that it's only the first CPU on each socket that's started with
> >>> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
> >>> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
> >> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
> >> probably worth saying that we're raising it with Intel, but this bug is
> >> out in production firmware for SPR and EMR.
> >>
> >>> ---
> >>>  xen/arch/x86/nmi.c | 13 +++++++++++++
> >>>  1 file changed, 13 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> >>> index dc79c25e3ffd..7a6601c4fd31 100644
> >>> --- a/xen/arch/x86/nmi.c
> >>> +++ b/xen/arch/x86/nmi.c
> >>> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
> >>>           nmi_p6_event_width > BITS_PER_LONG )
> >>>          return;
> >>>  
> >>> +    if ( cpu_has_arch_perfmon )
> >>> +    {
> >>> +        uint64_t global_ctrl;
> >>> +
> >>> +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> >>> +        /*
> >>> +         * Make sure PMC0 is enabled in global control, as the enable bit in
> >>> +         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
> >>> +         */
> >>> +        if ( !(global_ctrl & 1) )
> >>> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
> >> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
> >> If nothing else because that will make debugging using other counters
> >> more reliable too.
> > Hm, yes, I was borderline on enabling all possible counters in
> > PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].
> >
> > But then wondered if it was going too far, as for the purposes here we
> > just care about PMC1.
> >
> > My reasoning for not doing it would be that such wide setup of
> > PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled,
> > usages of other counters apart from PMC0 will be gated on the watchdog
> > being enabled.  It seems more reliable to me to either do the setting
> > of PERF_GLOBAL_CTRL as part of CPU initialization, or defer to each
> > user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.
> 
> It is buggy that each socket-BSP is handed over with ctl=0 rather than 0xff.
> 
> But we're exasperating the bug by not returning each socket-BSP to the
> default behaviour.
> 
> 
> It makes a practical difference if a developer wants to hand-code up
> PCR2.

I'm afraid I'm lost as to what the PCR2 acronym refers to here, as I
cannot find any instance of it in SDM vols. 2, 3 or 4.

> It also makes a practical difference to what a guest sees when it
> executes RDPMC in guests, because right now the perf counter values leak
> in (there's another oustanding patch series of mine trying to stem this
> leak).

But RDPMC just fetches the contents of the counters, so it has no
visibility into the value of PERF_GLOBAL_CTRL.  While the settings in
PERF_GLOBAL_CTRL affect whether the counters are enabled or not, I'm
not sure a guest without access to vPMU should expect to get any kind
of consistent results out of RDPMC.

> 
> The fixup we're performing here isn't "because we're using one
> counter".  It's to get state back to default.

I'm certainly not opposed to that, but as said in my previous reply,
the adjustment should then be done somewhere else, not in
setup_p6_watchdog().  Unless there are further objections I will send
a patch to enable all general-purpose PMCs in PERF_GLOBAL_CTRL in
init_intel().

Thanks, Roger.
Jan Beulich Jan. 11, 2024, 8:29 a.m. UTC | #5
On 10.01.2024 17:58, Roger Pau Monné wrote:
> On Wed, Jan 10, 2024 at 03:52:49PM +0000, Andrew Cooper wrote:
>> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
>>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
>>> MSR contains per-counter enable bits that is ANDed with the enable bit in the
>>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>>>
>>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
>>> bits being set by default, but at least on some Intel Sapphire and Emerald
>>> Rapids this is no longer the case, and Xen reports:
>>>
>>> Testing NMI watchdog on all CPUs: 0 40 stuck
>>>
>>> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
>>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
>>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>>>
>>> Fix by detecting when Architectural Performance Monitoring is available and
>>> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> The fact that it's only the first CPU on each socket that's started with
>>> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
>>> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
>>
>> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
>> probably worth saying that we're raising it with Intel, but this bug is
>> out in production firmware for SPR and EMR.
>>
>>> ---
>>>  xen/arch/x86/nmi.c | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
>>> index dc79c25e3ffd..7a6601c4fd31 100644
>>> --- a/xen/arch/x86/nmi.c
>>> +++ b/xen/arch/x86/nmi.c
>>> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
>>>           nmi_p6_event_width > BITS_PER_LONG )
>>>          return;
>>>  
>>> +    if ( cpu_has_arch_perfmon )
>>> +    {
>>> +        uint64_t global_ctrl;
>>> +
>>> +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
>>> +        /*
>>> +         * Make sure PMC0 is enabled in global control, as the enable bit in
>>> +         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
>>> +         */
>>> +        if ( !(global_ctrl & 1) )
>>> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
>>
>> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
>> If nothing else because that will make debugging using other counters
>> more reliable too.
> 
> Hm, yes, I was borderline on enabling all possible counters in
> PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].
> 
> But then wondered if it was going too far, as for the purposes here we
> just care about PMC1.
> 
> My reasoning for not doing it would be that such wide setup of
> PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled,
> usages of other counters apart from PMC0 will be gated on the watchdog
> being enabled.

Since Andrew didn't explicitly say so in his reply - imo this then means
the adjustment wants moving out of setup_p6_watchdog().

Jan

>  It seems more reliable to me to either do the setting
> of PERF_GLOBAL_CTRL as part of CPU initialization, or defer to each
> user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.
> 
>> vPMU (although mutually exclusive with watchdog) does context switch
>> this register as a whole.
>>
>> See how global_ctrl_mask gets set up, although I'm not sure how much of
>> that infrastructure we really want to reuse here.
> 
> Yes, if we want to enable all possible counters we would need to use
> something similar to what's done there, albeit without the fixed
> counter part.
> 
> Thanks, Roger.

Patch

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index dc79c25e3ffd..7a6601c4fd31 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -335,6 +335,19 @@  static void setup_p6_watchdog(unsigned counter)
          nmi_p6_event_width > BITS_PER_LONG )
         return;
 
+    if ( cpu_has_arch_perfmon )
+    {
+        uint64_t global_ctrl;
+
+        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
+        /*
+         * Make sure PMC0 is enabled in global control, as the enable bit in
+         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
+         */
+        if ( !(global_ctrl & 1) )
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
+    }
+
     clear_msr_range(MSR_P6_EVNTSEL(0), 2);
     clear_msr_range(MSR_P6_PERFCTR(0), 2);