diff mbox series

[3/5] x86/AMD: make C-state handling independent of Dom0

Message ID 5CE68F830200007800231B3B@prv1-mh.provo.novell.com (mailing list archive)
State New, archived
Headers show
Series x86: CPU idle management adjustments | expand

Commit Message

Jan Beulich May 23, 2019, 12:18 p.m. UTC
At least for more recent CPUs, following what BKDG / PPR suggest for the
BIOS to surface via ACPI we can make ourselves independent of Dom0
uploading respective data.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
     statement in the BKDG / PPR as to whether the LAPIC timer continues
     running in CC6.
TBD: We may want to verify that HLT indeed is configured to enter CC6.
TBD: Brian's series specifies .target_residency as 1000 for CC6; may
     want to do so here as well. Question then is whether this value is
     also suitable for the older families.
TBD: I guess we could extend this to families older then Fam15 as well.

Comments

Andrew Cooper June 10, 2019, 4:28 p.m. UTC | #1
On 23/05/2019 13:18, Jan Beulich wrote:
> At least for more recent CPUs, following what BKDG / PPR suggest for the
> BIOS to surface via ACPI we can make ourselves independent of Dom0
> uploading respective data.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
>      statement in the BKDG / PPR as to whether the LAPIC timer continues
>      running in CC6.

This ought to be easy to determine.  Given the description of CC6
flushing the cache and power gating the core, I'd say there is a
reasonable chance that the LAPIC timer stops in CC6.

> TBD: We may want to verify that HLT indeed is configured to enter CC6.

I can't actually spot anything which talks about HLT directly.  The
closest I can post is CFOH (cache flush on halt) which is an
auto-transition from CC1 to CC6 after a specific timeout, but the
wording suggests that mwait would also take this path.

> TBD: Brian's series specifies .target_residency as 1000 for CC6; may
>      want to do so here as well. Question then is whether this value is
>      also suitable for the older families.

Well - the PPR does say 400.

> TBD: I guess we could extend this to families older then Fam15 as well.
>
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -1283,6 +1288,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
>      return 0;
>  }
>  
> +static void amd_cpuidle_init(struct acpi_processor_power *power)
> +{
> +    unsigned int i, nr = 0;
> +    const struct cpuinfo_x86 *c = &current_cpu_data;
> +    const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
> +                                 CPUID5_ECX_INTERRUPT_BREAK;
> +    const struct acpi_processor_cx *cx = NULL;
> +    static const struct acpi_processor_cx fam17[] = {
> +        {
> +            .type = ACPI_STATE_C1,
> +            .entry_method = ACPI_CSTATE_EM_FFH,
> +            .address = 0,
> +            .latency = 1,
> +        },
> +        {
> +            .type = ACPI_STATE_C2,
> +            .entry_method = ACPI_CSTATE_EM_HALT,
> +            .latency = 400,
> +        },
> +    };
> +
> +    if ( pm_idle_save && pm_idle != acpi_processor_idle )
> +        return;
> +
> +    if ( vendor_override < 0 )
> +        return;
> +
> +    switch ( c->x86 )
> +    {
> +    case 0x17:

With Hygon in the mix, this should be expanded to Fam18h.

~Andrew
Jan Beulich June 11, 2019, 12:42 p.m. UTC | #2
>>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
> On 23/05/2019 13:18, Jan Beulich wrote:
>> At least for more recent CPUs, following what BKDG / PPR suggest for the
>> BIOS to surface via ACPI we can make ourselves independent of Dom0
>> uploading respective data.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
>>      statement in the BKDG / PPR as to whether the LAPIC timer continues
>>      running in CC6.
> 
> This ought to be easy to determine.  Given the description of CC6
> flushing the cache and power gating the core, I'd say there is a
> reasonable chance that the LAPIC timer stops in CC6.

But "reasonable chance" isn't enough for my taste here. And from
what you deduce, the answer to the question would be "no", and
hence simply no change to be made anywhere. (I do think though
that it's more complicated than this, because iirc much also depends
on what the firmware actually does.)

>> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> 
> I can't actually spot anything which talks about HLT directly.  The
> closest I can post is CFOH (cache flush on halt) which is an
> auto-transition from CC1 to CC6 after a specific timeout, but the
> wording suggests that mwait would also take this path.

Well, I had come across a section describing how HLT can be
configured to be the same action as the I/O port read from one
of the three ports involved in C-state management
(CStateBaseAddr+0...2). But I can't seem to find this again.

As to MWAIT behaving the same, I don't think I can spot proof
of your interpretation or proof of Brian's.

>> --- a/xen/arch/x86/acpi/cpu_idle.c
>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>> @@ -1283,6 +1288,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
>>      return 0;
>>  }
>>  
>> +static void amd_cpuidle_init(struct acpi_processor_power *power)
>> +{
>> +    unsigned int i, nr = 0;
>> +    const struct cpuinfo_x86 *c = &current_cpu_data;
>> +    const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
>> +                                 CPUID5_ECX_INTERRUPT_BREAK;
>> +    const struct acpi_processor_cx *cx = NULL;
>> +    static const struct acpi_processor_cx fam17[] = {
>> +        {
>> +            .type = ACPI_STATE_C1,
>> +            .entry_method = ACPI_CSTATE_EM_FFH,
>> +            .address = 0,
>> +            .latency = 1,
>> +        },
>> +        {
>> +            .type = ACPI_STATE_C2,
>> +            .entry_method = ACPI_CSTATE_EM_HALT,
>> +            .latency = 400,
>> +        },
>> +    };
>> +
>> +    if ( pm_idle_save && pm_idle != acpi_processor_idle )
>> +        return;
>> +
>> +    if ( vendor_override < 0 )
>> +        return;
>> +
>> +    switch ( c->x86 )
>> +    {
>> +    case 0x17:
> 
> With Hygon in the mix, this should be expanded to Fam18h.

But only once we get a guarantee from AMD that they won't use
family 18h. Otherwise we'd have to use vendor checks here.
Anyway this series predates the merging of the Hygon one. But
yes, I can easily do this for v2.

Jan
Woods, Brian June 18, 2019, 5:22 p.m. UTC | #3
On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> At least for more recent CPUs, following what BKDG / PPR suggest for the
> >> BIOS to surface via ACPI we can make ourselves independent of Dom0
> >> uploading respective data.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
> >>      statement in the BKDG / PPR as to whether the LAPIC timer continues
> >>      running in CC6.
> > 
> > This ought to be easy to determine.  Given the description of CC6
> > flushing the cache and power gating the core, I'd say there is a
> > reasonable chance that the LAPIC timer stops in CC6.
> 
> But "reasonable chance" isn't enough for my taste here. And from
> what you deduce, the answer to the question would be "no", and
> hence simply no change to be made anywhere. (I do think though
> that it's more complicated than this, because iirc much also depends
> on what the firmware actually does.)

The LAPIC timer never stops on the currently platforms (Naples and
Rome).  This is a knowledgable HW engineer so.

> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> > 
> > I can't actually spot anything which talks about HLT directly.  The
> > closest I can post is CFOH (cache flush on halt) which is an
> > auto-transition from CC1 to CC6 after a specific timeout, but the
> > wording suggests that mwait would also take this path.
> 
> Well, I had come across a section describing how HLT can be
> configured to be the same action as the I/O port read from one
> of the three ports involved in C-state management
> (CStateBaseAddr+0...2). But I can't seem to find this again.
> 
> As to MWAIT behaving the same, I don't think I can spot proof
> of your interpretation or proof of Brian's.

It's not really documented clearly.  I got my information from the HW
engineers.  I've already posted what information I know so I won't
repeat it.

> >> --- a/xen/arch/x86/acpi/cpu_idle.c
> >> +++ b/xen/arch/x86/acpi/cpu_idle.c
> >> @@ -1283,6 +1288,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
> >>      return 0;
> >>  }
> >>  
> >> +static void amd_cpuidle_init(struct acpi_processor_power *power)
> >> +{
> >> +    unsigned int i, nr = 0;
> >> +    const struct cpuinfo_x86 *c = &current_cpu_data;
> >> +    const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
> >> +                                 CPUID5_ECX_INTERRUPT_BREAK;
> >> +    const struct acpi_processor_cx *cx = NULL;
> >> +    static const struct acpi_processor_cx fam17[] = {
> >> +        {
> >> +            .type = ACPI_STATE_C1,
> >> +            .entry_method = ACPI_CSTATE_EM_FFH,
> >> +            .address = 0,
> >> +            .latency = 1,
> >> +        },
> >> +        {
> >> +            .type = ACPI_STATE_C2,
> >> +            .entry_method = ACPI_CSTATE_EM_HALT,
> >> +            .latency = 400,
> >> +        },
> >> +    };
> >> +
> >> +    if ( pm_idle_save && pm_idle != acpi_processor_idle )
> >> +        return;
> >> +
> >> +    if ( vendor_override < 0 )
> >> +        return;
> >> +
> >> +    switch ( c->x86 )
> >> +    {
> >> +    case 0x17:
> > 
> > With Hygon in the mix, this should be expanded to Fam18h.
> 
> But only once we get a guarantee from AMD that they won't use
> family 18h. Otherwise we'd have to use vendor checks here.
> Anyway this series predates the merging of the Hygon one. But
> yes, I can easily do this for v2.
> 
> Jan
> 
>
Jan Beulich June 19, 2019, 6:20 a.m. UTC | #4
>>> On 18.06.19 at 19:22, <Brian.Woods@amd.com> wrote:
> On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
>> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
>> > On 23/05/2019 13:18, Jan Beulich wrote:
>> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
>> >>      statement in the BKDG / PPR as to whether the LAPIC timer continues
>> >>      running in CC6.
>> > 
>> > This ought to be easy to determine.  Given the description of CC6
>> > flushing the cache and power gating the core, I'd say there is a
>> > reasonable chance that the LAPIC timer stops in CC6.
>> 
>> But "reasonable chance" isn't enough for my taste here. And from
>> what you deduce, the answer to the question would be "no", and
>> hence simply no change to be made anywhere. (I do think though
>> that it's more complicated than this, because iirc much also depends
>> on what the firmware actually does.)
> 
> The LAPIC timer never stops on the currently platforms (Naples and
> Rome).  This is a knowledgable HW engineer so.

Thanks - I've taken note to set the variable accordingly then.

>> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
>> > 
>> > I can't actually spot anything which talks about HLT directly.  The
>> > closest I can post is CFOH (cache flush on halt) which is an
>> > auto-transition from CC1 to CC6 after a specific timeout, but the
>> > wording suggests that mwait would also take this path.
>> 
>> Well, I had come across a section describing how HLT can be
>> configured to be the same action as the I/O port read from one
>> of the three ports involved in C-state management
>> (CStateBaseAddr+0...2). But I can't seem to find this again.
>> 
>> As to MWAIT behaving the same, I don't think I can spot proof
>> of your interpretation or proof of Brian's.
> 
> It's not really documented clearly.  I got my information from the HW
> engineers.  I've already posted what information I know so I won't
> repeat it.

At least a pointer to where you had stated this would have been
nice. Iirc there's no promotion into CC6 in that case, in contrast
to Andrew's reading of the doc.

Jan
Woods, Brian June 19, 2019, 3:54 p.m. UTC | #5
On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
> >>> On 18.06.19 at 19:22, <Brian.Woods@amd.com> wrote:
> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
> >> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
> >> >>      statement in the BKDG / PPR as to whether the LAPIC timer continues
> >> >>      running in CC6.
> >> > 
> >> > This ought to be easy to determine.  Given the description of CC6
> >> > flushing the cache and power gating the core, I'd say there is a
> >> > reasonable chance that the LAPIC timer stops in CC6.
> >> 
> >> But "reasonable chance" isn't enough for my taste here. And from
> >> what you deduce, the answer to the question would be "no", and
> >> hence simply no change to be made anywhere. (I do think though
> >> that it's more complicated than this, because iirc much also depends
> >> on what the firmware actually does.)
> > 
> > The LAPIC timer never stops on the currently platforms (Naples and
> > Rome).  This is a knowledgable HW engineer so.
> 
> Thanks - I've taken note to set the variable accordingly then.
> 
> >> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> >> > 
> >> > I can't actually spot anything which talks about HLT directly.  The
> >> > closest I can post is CFOH (cache flush on halt) which is an
> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
> >> > wording suggests that mwait would also take this path.
> >> 
> >> Well, I had come across a section describing how HLT can be
> >> configured to be the same action as the I/O port read from one
> >> of the three ports involved in C-state management
> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
> >> 
> >> As to MWAIT behaving the same, I don't think I can spot proof
> >> of your interpretation or proof of Brian's.
> > 
> > It's not really documented clearly.  I got my information from the HW
> > engineers.  I've already posted what information I know so I won't
> > repeat it.
> 
> At least a pointer to where you had stated this would have been
> nice. Iirc there's no promotion into CC6 in that case, in contrast
> to Andrew's reading of the doc.
> 
> Jan
> 

&mwait_v1_patchset
Jan Beulich June 21, 2019, 6:37 a.m. UTC | #6
>>> On 19.06.19 at 17:54, <Brian.Woods@amd.com> wrote:
> On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
>> >>> On 18.06.19 at 19:22, <Brian.Woods@amd.com> wrote:
>> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
>> >> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
>> >> > On 23/05/2019 13:18, Jan Beulich wrote:
>> >> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
>> >> >>      statement in the BKDG / PPR as to whether the LAPIC timer continues
>> >> >>      running in CC6.
>> >> > 
>> >> > This ought to be easy to determine.  Given the description of CC6
>> >> > flushing the cache and power gating the core, I'd say there is a
>> >> > reasonable chance that the LAPIC timer stops in CC6.
>> >> 
>> >> But "reasonable chance" isn't enough for my taste here. And from
>> >> what you deduce, the answer to the question would be "no", and
>> >> hence simply no change to be made anywhere. (I do think though
>> >> that it's more complicated than this, because iirc much also depends
>> >> on what the firmware actually does.)
>> > 
>> > The LAPIC timer never stops on the currently platforms (Naples and
>> > Rome).  This is a knowledgable HW engineer so.
>> 
>> Thanks - I've taken note to set the variable accordingly then.
>> 
>> >> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
>> >> > 
>> >> > I can't actually spot anything which talks about HLT directly.  The
>> >> > closest I can post is CFOH (cache flush on halt) which is an
>> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
>> >> > wording suggests that mwait would also take this path.
>> >> 
>> >> Well, I had come across a section describing how HLT can be
>> >> configured to be the same action as the I/O port read from one
>> >> of the three ports involved in C-state management
>> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
>> >> 
>> >> As to MWAIT behaving the same, I don't think I can spot proof
>> >> of your interpretation or proof of Brian's.
>> > 
>> > It's not really documented clearly.  I got my information from the HW
>> > engineers.  I've already posted what information I know so I won't
>> > repeat it.
>> 
>> At least a pointer to where you had stated this would have been
>> nice. Iirc there's no promotion into CC6 in that case, in contrast
>> to Andrew's reading of the doc.
> 
> &mwait_v1_patchset

Hmm, I've looked through the patch descriptions there again, but I
can't find any explicit statement to the effect of there being no
promotion into deeper states when using MWAIT.

Jan
Woods, Brian June 21, 2019, 2:29 p.m. UTC | #7
On Fri, Jun 21, 2019 at 12:37:47AM -0600, Jan Beulich wrote:
> >>> On 19.06.19 at 17:54, <Brian.Woods@amd.com> wrote:
> > On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
> >> >>> On 18.06.19 at 19:22, <Brian.Woods@amd.com> wrote:
> >> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >> >> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
> >> >> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> >> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
> >> >> >>      statement in the BKDG / PPR as to whether the LAPIC timer continues
> >> >> >>      running in CC6.
> >> >> > 
> >> >> > This ought to be easy to determine.  Given the description of CC6
> >> >> > flushing the cache and power gating the core, I'd say there is a
> >> >> > reasonable chance that the LAPIC timer stops in CC6.
> >> >> 
> >> >> But "reasonable chance" isn't enough for my taste here. And from
> >> >> what you deduce, the answer to the question would be "no", and
> >> >> hence simply no change to be made anywhere. (I do think though
> >> >> that it's more complicated than this, because iirc much also depends
> >> >> on what the firmware actually does.)
> >> > 
> >> > The LAPIC timer never stops on the currently platforms (Naples and
> >> > Rome).  This is a knowledgable HW engineer so.
> >> 
> >> Thanks - I've taken note to set the variable accordingly then.
> >> 
> >> >> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> >> >> > 
> >> >> > I can't actually spot anything which talks about HLT directly.  The
> >> >> > closest I can post is CFOH (cache flush on halt) which is an
> >> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
> >> >> > wording suggests that mwait would also take this path.
> >> >> 
> >> >> Well, I had come across a section describing how HLT can be
> >> >> configured to be the same action as the I/O port read from one
> >> >> of the three ports involved in C-state management
> >> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
> >> >> 
> >> >> As to MWAIT behaving the same, I don't think I can spot proof
> >> >> of your interpretation or proof of Brian's.
> >> > 
> >> > It's not really documented clearly.  I got my information from the HW
> >> > engineers.  I've already posted what information I know so I won't
> >> > repeat it.
> >> 
> >> At least a pointer to where you had stated this would have been
> >> nice. Iirc there's no promotion into CC6 in that case, in contrast
> >> to Andrew's reading of the doc.
> > 
> > &mwait_v1_patchset
> 
> Hmm, I've looked through the patch descriptions there again, but I
> can't find any explicit statement to the effect of there being no
> promotion into deeper states when using MWAIT.
> 
> Jan

https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg02007.html

Since you're under NDA, I can send you the email I received from the HW
engineering but as a basic recap:

If the HW is configured to use CC6 for HLT (CC6 is enabled and some
other NDA bits which gets OR'd with firmware so you can only
functionally CC6 on HLT off, but can't make sure it's on), then the
flow is:
1) HLT
2) timer
3) flush the caches etc
4) CC6

This can be interrupted though.  The HW engineer said that while they
aren't the same (as IO based C-states), they end up at the same place.

The whole reason HLT was selected to be used in my patches is because
we can't look in the CST table from Xen and it's always safe to use,
even if CC6 is disabled in BIOS (which we can't tell).  At this point,
I'm repeating our conversion we had in my v1 patch set.  If you need
any further info, let me know.
Jan Beulich June 21, 2019, 2:56 p.m. UTC | #8
>>> On 21.06.19 at 16:29, <Brian.Woods@amd.com> wrote:
> On Fri, Jun 21, 2019 at 12:37:47AM -0600, Jan Beulich wrote:
>> >>> On 19.06.19 at 17:54, <Brian.Woods@amd.com> wrote:
>> > On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
>> >> >>> On 18.06.19 at 19:22, <Brian.Woods@amd.com> wrote:
>> >> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
>> >> >> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
>> >> >> > On 23/05/2019 13:18, Jan Beulich wrote:
>> >> >> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
>> >> >> > 
>> >> >> > I can't actually spot anything which talks about HLT directly.  The
>> >> >> > closest I can post is CFOH (cache flush on halt) which is an
>> >> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
>> >> >> > wording suggests that mwait would also take this path.
>> >> >> 
>> >> >> Well, I had come across a section describing how HLT can be
>> >> >> configured to be the same action as the I/O port read from one
>> >> >> of the three ports involved in C-state management
>> >> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
>> >> >> 
>> >> >> As to MWAIT behaving the same, I don't think I can spot proof
>> >> >> of your interpretation or proof of Brian's.
>> >> > 
>> >> > It's not really documented clearly.  I got my information from the HW
>> >> > engineers.  I've already posted what information I know so I won't
>> >> > repeat it.
>> >> 
>> >> At least a pointer to where you had stated this would have been
>> >> nice. Iirc there's no promotion into CC6 in that case, in contrast
>> >> to Andrew's reading of the doc.
>> > 
>> > &mwait_v1_patchset
>> 
>> Hmm, I've looked through the patch descriptions there again, but I
>> can't find any explicit statement to the effect of there being no
>> promotion into deeper states when using MWAIT.
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg02007.html 

Thanks. Yes, it may be implied from there, but to me it's still not
explicit. Also recall that it was Andrew originally asking if any
promotion from CC1 is possible. I'm fine with you telling me it's
not, but Andrew may still want you pointing him at where this
is written down.

> Since you're under NDA, I can send you the email I received from the HW
> engineering but as a basic recap:
> 
> If the HW is configured to use CC6 for HLT (CC6 is enabled and some
> other NDA bits which gets OR'd with firmware so you can only
> functionally CC6 on HLT off, but can't make sure it's on), then the
> flow is:
> 1) HLT
> 2) timer
> 3) flush the caches etc
> 4) CC6
> 
> This can be interrupted though.  The HW engineer said that while they
> aren't the same (as IO based C-states), they end up at the same place.
> 
> The whole reason HLT was selected to be used in my patches is because
> we can't look in the CST table from Xen and it's always safe to use,
> even if CC6 is disabled in BIOS (which we can't tell).  At this point,
> I'm repeating our conversion we had in my v1 patch set.  If you need
> any further info, let me know.

Thanks, I recall all of this. I don't see though how it's related to the
question of whether the CPU would really remain in C1 when using
MWAIT (i.e. going back to Andrew's original finding of promotion from
CC1 to CC6). Now I do realize that C1 != CC1, but this doesn't help
clarifying things in any way.

Jan
Woods, Brian June 21, 2019, 3:26 p.m. UTC | #9
On Fri, Jun 21, 2019 at 08:56:22AM -0600, Jan Beulich wrote:
> >>> On 21.06.19 at 16:29, <Brian.Woods@amd.com> wrote:
> > On Fri, Jun 21, 2019 at 12:37:47AM -0600, Jan Beulich wrote:
> >> >>> On 19.06.19 at 17:54, <Brian.Woods@amd.com> wrote:
> >> > On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
> >> >> >>> On 18.06.19 at 19:22, <Brian.Woods@amd.com> wrote:
> >> >> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >> >> >> >>> On 10.06.19 at 18:28, <andrew.cooper3@citrix.com> wrote:
> >> >> >> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> >> >> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> >> >> >> > 
> >> >> >> > I can't actually spot anything which talks about HLT directly.  The
> >> >> >> > closest I can post is CFOH (cache flush on halt) which is an
> >> >> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
> >> >> >> > wording suggests that mwait would also take this path.
> >> >> >> 
> >> >> >> Well, I had come across a section describing how HLT can be
> >> >> >> configured to be the same action as the I/O port read from one
> >> >> >> of the three ports involved in C-state management
> >> >> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
> >> >> >> 
> >> >> >> As to MWAIT behaving the same, I don't think I can spot proof
> >> >> >> of your interpretation or proof of Brian's.
> >> >> > 
> >> >> > It's not really documented clearly.  I got my information from the HW
> >> >> > engineers.  I've already posted what information I know so I won't
> >> >> > repeat it.
> >> >> 
> >> >> At least a pointer to where you had stated this would have been
> >> >> nice. Iirc there's no promotion into CC6 in that case, in contrast
> >> >> to Andrew's reading of the doc.
> >> > 
> >> > &mwait_v1_patchset
> >> 
> >> Hmm, I've looked through the patch descriptions there again, but I
> >> can't find any explicit statement to the effect of there being no
> >> promotion into deeper states when using MWAIT.
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg02007.html 
> 
> Thanks. Yes, it may be implied from there, but to me it's still not
> explicit. Also recall that it was Andrew originally asking if any
> promotion from CC1 is possible. I'm fine with you telling me it's
> not, but Andrew may still want you pointing him at where this
> is written down.
> 
> > Since you're under NDA, I can send you the email I received from the HW
> > engineering but as a basic recap:
> > 
> > If the HW is configured to use CC6 for HLT (CC6 is enabled and some
> > other NDA bits which gets OR'd with firmware so you can only
> > functionally CC6 on HLT off, but can't make sure it's on), then the
> > flow is:
> > 1) HLT
> > 2) timer
> > 3) flush the caches etc
> > 4) CC6
> > 
> > This can be interrupted though.  The HW engineer said that while they
> > aren't the same (as IO based C-states), they end up at the same place.
> > 
> > The whole reason HLT was selected to be used in my patches is because
> > we can't look in the CST table from Xen and it's always safe to use,
> > even if CC6 is disabled in BIOS (which we can't tell).  At this point,
> > I'm repeating our conversion we had in my v1 patch set.  If you need
> > any further info, let me know.
> 
> Thanks, I recall all of this. I don't see though how it's related to the
> question of whether the CPU would really remain in C1 when using
> MWAIT (i.e. going back to Andrew's original finding of promotion from
> CC1 to CC6). Now I do realize that C1 != CC1, but this doesn't help
> clarifying things in any way.
> 
> Jan
> 
> 

Note: this is for Naples and Rome only.

I was answering the HLT question.  But mwait can ONLY be used for
CC1/C1 since we don't support using mwait for CC6/C2 since it shuts
down the cache and mwait monitors that.  There is no promotion from
C1/CC1 to C2/CC6 with mwait since it would lose it's method of waking
up.  When you entry C1/CC1 using mwait, it stays in C1/CC1.  I will
email a HW engineer confirming this but I'd be extremely surprised it
you could be promoted from C1/CC6 to C2/CC6 when using mwait.
diff mbox series

Patch

--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -110,6 +110,8 @@  boolean_param("lapic_timer_c2_ok", local
 
 struct acpi_processor_power *__read_mostly processor_powers[NR_CPUS];
 
+static int8_t __read_mostly vendor_override;
+
 struct hw_residencies
 {
     uint64_t mc0;
@@ -1211,6 +1213,9 @@  long set_cx_pminfo(uint32_t acpi_id, str
     if ( pm_idle_save && pm_idle != acpi_processor_idle )
         return 0;
 
+    if ( vendor_override > 0 )
+        return 0;
+
     print_cx_pminfo(acpi_id, power);
 
     cpu_id = get_cpu_id(acpi_id);
@@ -1283,6 +1288,98 @@  long set_cx_pminfo(uint32_t acpi_id, str
     return 0;
 }
 
+static void amd_cpuidle_init(struct acpi_processor_power *power)
+{
+    unsigned int i, nr = 0;
+    const struct cpuinfo_x86 *c = &current_cpu_data;
+    const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
+                                 CPUID5_ECX_INTERRUPT_BREAK;
+    const struct acpi_processor_cx *cx = NULL;
+    static const struct acpi_processor_cx fam17[] = {
+        {
+            .type = ACPI_STATE_C1,
+            .entry_method = ACPI_CSTATE_EM_FFH,
+            .address = 0,
+            .latency = 1,
+        },
+        {
+            .type = ACPI_STATE_C2,
+            .entry_method = ACPI_CSTATE_EM_HALT,
+            .latency = 400,
+        },
+    };
+
+    if ( pm_idle_save && pm_idle != acpi_processor_idle )
+        return;
+
+    if ( vendor_override < 0 )
+        return;
+
+    switch ( c->x86 )
+    {
+    case 0x17:
+        if ( cpu_has_monitor && c->cpuid_level >= CPUID_MWAIT_LEAF &&
+             (cpuid_ecx(CPUID_MWAIT_LEAF) & ecx_req) == ecx_req )
+        {
+            cx = fam17;
+            nr = ARRAY_SIZE(fam17);
+            break;
+        }
+        /* fall through */
+    case 0x15:
+    case 0x16:
+        cx = &fam17[1];
+        nr = ARRAY_SIZE(fam17) - 1;
+        break;
+
+    default:
+        vendor_override = -1;
+        return;
+    }
+
+    power->flags.has_cst = true;
+
+    for ( i = 0; i < nr; ++i )
+    {
+        if ( cx[i].type > max_cstate )
+            break;
+        power->states[i + 1] = cx[i];
+        power->states[i + 1].idx = i + 1;
+        power->states[i + 1].target_residency = cx[i].latency * latency_factor;
+    }
+
+    if ( i )
+    {
+        power->count = i + 1;
+        power->safe_state = &power->states[i];
+
+        if ( !vendor_override )
+        {
+            if ( !boot_cpu_has(X86_FEATURE_ARAT) )
+                hpet_broadcast_init();
+
+            if ( !lapic_timer_init() )
+            {
+                vendor_override = -1;
+                cpuidle_init_cpu(power->cpu);
+                return;
+            }
+
+            if ( !pm_idle_save )
+            {
+                pm_idle_save = pm_idle;
+                pm_idle = acpi_processor_idle;
+            }
+
+            dead_idle = acpi_dead_idle;
+
+            vendor_override = 1;
+        }
+    }
+    else
+        vendor_override = -1;
+}
+
 uint32_t pmstat_get_cx_nr(uint32_t cpuid)
 {
     return processor_powers[cpuid] ? processor_powers[cpuid]->count : 0;
@@ -1429,8 +1526,8 @@  static int cpu_callback(
     int rc = 0;
 
     /*
-     * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info
-     * to enter deep C-state.
+     * Only hook on CPU_UP_PREPARE / CPU_ONLINE because a dead cpu may utilize
+     * the info to enter deep C-state.
      */
     switch ( action )
     {
@@ -1439,6 +1536,12 @@  static int cpu_callback(
         if ( !rc && cpuidle_current_governor->enable )
             rc = cpuidle_current_governor->enable(processor_powers[cpu]);
         break;
+
+    case CPU_ONLINE:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+             processor_powers[cpu] )
+            amd_cpuidle_init(processor_powers[cpu]);
+        break;
     }
 
     return !rc ? NOTIFY_DONE : notifier_from_errno(rc);