diff mbox series

x86/x2apic: introduce a mixed physical/cluster mode

Message ID 20231024135150.49232-1-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86/x2apic: introduce a mixed physical/cluster mode | expand

Commit Message

Roger Pau Monné Oct. 24, 2023, 1:51 p.m. UTC
The current implementation of x2APIC requires to either use Cluster Logical or
Physical mode for all interrupts.  However the selection of Physical vs Logical
is not done at APIC setup, an APIC can be addressed both in Physical or Logical
destination modes concurrently.

Introduce a new x2APIC mode called mixed, which uses Logical Cluster mode for
IPIs, and Physical mode for external interrupts, thus attempting to use the
best method for each interrupt type.

Using Physical mode for external interrupts allows more vectors to be used, and
interrupt balancing to be more accurate.

Using Logical Cluster mode for IPIs allows less accesses to the ICR register
when sending those, as multiple CPUs can be targeted with a single ICR register
write.

A simple test calling flush_tlb_all() 10000 times in a tight loop on a 96 CPU
box gives the following average figures:

Physical mode: 26617931ns
Mixed mode:    23865337ns

So ~10% improvement versus plain Physical mode.  Note that Xen uses Cluster
mode by default, and hence is already using the fastest way for IPI delivery at
the cost of reducing the amount of vectors available system-wide.

Make the newly introduced mode the default one.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Do we want to keep a full Logical Cluster mode available?  I don't see a reason
to target external interrupts in Logical Cluster mode, but maybe there's
something I'm missing.

It's not clear to me whether the ACPI FADT flags are meant to apply only to
external interrupt delivery mode, or also to IPI delivery.  If
ACPI_FADT_APIC_PHYSICAL is only meant to apply to external interrupts and not
IPIs then we could possibly get rid of physical mode IPI delivery.

Still need to put this under XenServer extensive testing, but wanted to get
some feedback before in case approach greatly changes.

Based on top of 'x86/x2apic: remove usage of ACPI_FADT_APIC_CLUSTER'
---
 docs/misc/xen-command-line.pandoc | 11 ++++
 xen/arch/x86/Kconfig              | 37 ++++++++++---
 xen/arch/x86/genapic/x2apic.c     | 87 ++++++++++++++++++++++---------
 3 files changed, 104 insertions(+), 31 deletions(-)

Comments

Jan Beulich Oct. 30, 2023, 2:32 p.m. UTC | #1
On 24.10.2023 15:51, Roger Pau Monne wrote:
> The current implementation of x2APIC requires to either use Cluster Logical or
> Physical mode for all interrupts.  However the selection of Physical vs Logical
> is not done at APIC setup, an APIC can be addressed both in Physical or Logical
> destination modes concurrently.
> 
> Introduce a new x2APIC mode called mixed, which uses Logical Cluster mode for
> IPIs, and Physical mode for external interrupts, thus attempting to use the
> best method for each interrupt type.
> 
> Using Physical mode for external interrupts allows more vectors to be used, and
> interrupt balancing to be more accurate.
> 
> Using Logical Cluster mode for IPIs allows less accesses to the ICR register
> when sending those, as multiple CPUs can be targeted with a single ICR register
> write.
> 
> A simple test calling flush_tlb_all() 10000 times in a tight loop on a 96 CPU
> box gives the following average figures:
> 
> Physical mode: 26617931ns
> Mixed mode:    23865337ns
> 
> So ~10% improvement versus plain Physical mode.

Nice.

>  Note that Xen uses Cluster
> mode by default, and hence is already using the fastest way for IPI delivery at
> the cost of reducing the amount of vectors available system-wide.
> 
> Make the newly introduced mode the default one.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Do we want to keep a full Logical Cluster mode available?  I don't see a reason
> to target external interrupts in Logical Cluster mode, but maybe there's
> something I'm missing.
> 
> It's not clear to me whether the ACPI FADT flags are meant to apply only to
> external interrupt delivery mode, or also to IPI delivery.  If
> ACPI_FADT_APIC_PHYSICAL is only meant to apply to external interrupts and not
> IPIs then we could possibly get rid of physical mode IPI delivery.
> 
> Still need to put this under XenServer extensive testing, but wanted to get
> some feedback before in case approach greatly changes.

Looks quite okay, just a couple of minor remarks:

> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2802,6 +2802,14 @@ the watchdog.
>  
>  Permit use of x2apic setup for SMP environments.
>  
> +### x2apic-mode (x86)
> +> `= physical | cluster | mixed`
> +
> +> Default: `physical` if **FADT** mandates physical mode, `mixed` otherwise.
> +
> +In the case that x2apic is in use, this option switches between modes to
> +address APICs in the system as interrupt destinations.
> +
>  ### x2apic_phys (x86)
>  > `= <boolean>`
>  
> @@ -2812,6 +2820,9 @@ In the case that x2apic is in use, this option switches between physical and
>  clustered mode.  The default, given no hint from the **FADT**, is cluster
>  mode.
>  
> +**WARNING: `x2apic_phys` is deprecated and superseded by `x2apic-mode`.
> +The later takes precedence if both are set.**

s/later/latter/ ?

This may further want a CHANGELOG.md entry.

> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -228,11 +228,18 @@ config XEN_ALIGN_2M
>  
>  endchoice
>  
> -config X2APIC_PHYSICAL
> -	bool "x2APIC Physical Destination mode"
> -	help
> -	  Use x2APIC Physical Destination mode by default when available.
> +choice
> +	prompt "x2APIC Destination mode"
> +	default X2APIC_MIXED
> +	---help---

No new ---help--- please (also below); it ought to be just help going forward.

> +	  Select APIC addressing when x2APIC is enabled.
>  
> +	  The default mode is mixed which should provide the best aspects
> +	  of both physical and cluster modes.
> +
> +config X2APIC_PHYSICAL
> +       tristate "Physical Destination mode"
> +	---help---

Something's odd with indentation here. But first of all - why tristate? We
don't have modules in Xen.

> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -180,6 +180,25 @@ static const struct genapic __initconstrel apic_x2apic_cluster = {
>      .send_IPI_self = send_IPI_self_x2apic
>  };
>  
> +/*
> + * Mixed x2APIC mode: use physical for external (device) interrupts, and
> + * cluster for inter processor interrupt.  Such mode has the benefits of not
> + * sharing the vector space with all CPUs on the cluster, while still allows
> + * IPIs to be more efficiently delivered by not having to perform an ICR
> + * write for each target CPU.
> + */
> +static const struct genapic __initconstrel apic_x2apic_mixed = {
> +    APIC_INIT("x2apic_mixed", NULL),
> +    /* NB: int_{delivery,dest}_mode are only used by non-IPI callers. */
> +    .int_delivery_mode = dest_Fixed,
> +    .int_dest_mode = 0 /* physical delivery */,
> +    .init_apic_ldr = init_apic_ldr_x2apic_cluster,
> +    .vector_allocation_cpumask = vector_allocation_cpumask_phys,
> +    .cpu_mask_to_apicid = cpu_mask_to_apicid_phys,

You have a non-IPI-only comment further up, but that - if in fact 
applicable here - would need to extend to these two hook functions as
well.

Jan
Elliott Mitchell Oct. 30, 2023, 2:50 p.m. UTC | #2
On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
> index 707deef98c27..15632cc7332e 100644
> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
>  static int8_t __initdata x2apic_phys = -1;
>  boolean_param("x2apic_phys", x2apic_phys);
>  
> +enum {
> +   unset, physical, cluster, mixed
> +} static __initdata x2apic_mode = unset;
> +
> +static int __init parse_x2apic_mode(const char *s)
> +{
> +    if ( !cmdline_strcmp(s, "physical") )
> +        x2apic_mode = physical;
> +    else if ( !cmdline_strcmp(s, "cluster") )
> +        x2apic_mode = cluster;
> +    else if ( !cmdline_strcmp(s, "mixed") )
> +        x2apic_mode = mixed;
> +    else
> +        return EINVAL;
> +
> +    return 0;
> +}
> +custom_param("x2apic-mode", parse_x2apic_mode);
> +
>  const struct genapic *__init apic_x2apic_probe(void)
>  {
> -    if ( x2apic_phys < 0 )
> +    /* x2apic-mode option has preference over x2apic_phys. */
> +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> +        x2apic_mode = x2apic_phys ? physical : cluster;
> +
> +    if ( x2apic_mode == unset )
>      {
> -        /*
> -         * Force physical mode if there's no (full) interrupt remapping support:
> -         * The ID in clustered mode requires a 32 bit destination field due to
> -         * the usage of the high 16 bits to hold the cluster ID.
> -         */
> -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
> -                      IS_ENABLED(CONFIG_X2APIC_PHYSICAL);
> -    }
> -    else if ( !x2apic_phys )
> -        switch ( iommu_intremap )
> +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
>          {

Could this explain the issues with recent AMD processors/motherboards?

Mainly the firmware had been setting this flag, but Xen was previously
ignoring it?  As such Xen had been attempting to use cluster mode on an
x2APIC where that mode was broken for physical interrupts?

This may warrant a Fixes line.  This may warrant a backport to 4.17 due
to the seriousness of the bug.
Roger Pau Monné Oct. 30, 2023, 3:27 p.m. UTC | #3
On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
> On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
> > index 707deef98c27..15632cc7332e 100644
> > --- a/xen/arch/x86/genapic/x2apic.c
> > +++ b/xen/arch/x86/genapic/x2apic.c
> > @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
> >  static int8_t __initdata x2apic_phys = -1;
> >  boolean_param("x2apic_phys", x2apic_phys);
> >  
> > +enum {
> > +   unset, physical, cluster, mixed
> > +} static __initdata x2apic_mode = unset;
> > +
> > +static int __init parse_x2apic_mode(const char *s)
> > +{
> > +    if ( !cmdline_strcmp(s, "physical") )
> > +        x2apic_mode = physical;
> > +    else if ( !cmdline_strcmp(s, "cluster") )
> > +        x2apic_mode = cluster;
> > +    else if ( !cmdline_strcmp(s, "mixed") )
> > +        x2apic_mode = mixed;
> > +    else
> > +        return EINVAL;
> > +
> > +    return 0;
> > +}
> > +custom_param("x2apic-mode", parse_x2apic_mode);
> > +
> >  const struct genapic *__init apic_x2apic_probe(void)
> >  {
> > -    if ( x2apic_phys < 0 )
> > +    /* x2apic-mode option has preference over x2apic_phys. */
> > +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> > +        x2apic_mode = x2apic_phys ? physical : cluster;
> > +
> > +    if ( x2apic_mode == unset )
> >      {
> > -        /*
> > -         * Force physical mode if there's no (full) interrupt remapping support:
> > -         * The ID in clustered mode requires a 32 bit destination field due to
> > -         * the usage of the high 16 bits to hold the cluster ID.
> > -         */
> > -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> > -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
> > -                      IS_ENABLED(CONFIG_X2APIC_PHYSICAL);
> > -    }
> > -    else if ( !x2apic_phys )
> > -        switch ( iommu_intremap )
> > +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
> >          {
> 
> Could this explain the issues with recent AMD processors/motherboards?
> 
> Mainly the firmware had been setting this flag, but Xen was previously
> ignoring it?

No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
line to force logical (clustered) destination mode.

> As such Xen had been attempting to use cluster mode on an
> x2APIC where that mode was broken for physical interrupts?

No, not realy, x2apic_phys was already forced to true if
acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
just delete that line in this same chunk and move it here).

Thanks, Roger.
Roger Pau Monné Oct. 30, 2023, 4:10 p.m. UTC | #4
On Mon, Oct 30, 2023 at 03:32:56PM +0100, Jan Beulich wrote:
> On 24.10.2023 15:51, Roger Pau Monne wrote:
> > The current implementation of x2APIC requires to either use Cluster Logical or
> > Physical mode for all interrupts.  However the selection of Physical vs Logical
> > is not done at APIC setup, an APIC can be addressed both in Physical or Logical
> > destination modes concurrently.
> > 
> > Introduce a new x2APIC mode called mixed, which uses Logical Cluster mode for
> > IPIs, and Physical mode for external interrupts, thus attempting to use the
> > best method for each interrupt type.
> > 
> > Using Physical mode for external interrupts allows more vectors to be used, and
> > interrupt balancing to be more accurate.
> > 
> > Using Logical Cluster mode for IPIs allows less accesses to the ICR register
> > when sending those, as multiple CPUs can be targeted with a single ICR register
> > write.
> > 
> > A simple test calling flush_tlb_all() 10000 times in a tight loop on a 96 CPU
> > box gives the following average figures:
> > 
> > Physical mode: 26617931ns
> > Mixed mode:    23865337ns
> > 
> > So ~10% improvement versus plain Physical mode.
> 
> Nice.
> 
> >  Note that Xen uses Cluster
> > mode by default, and hence is already using the fastest way for IPI delivery at
> > the cost of reducing the amount of vectors available system-wide.
> > 
> > Make the newly introduced mode the default one.
> > 
> > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Do we want to keep a full Logical Cluster mode available?  I don't see a reason
> > to target external interrupts in Logical Cluster mode, but maybe there's
> > something I'm missing.
> > 
> > It's not clear to me whether the ACPI FADT flags are meant to apply only to
> > external interrupt delivery mode, or also to IPI delivery.  If
> > ACPI_FADT_APIC_PHYSICAL is only meant to apply to external interrupts and not
> > IPIs then we could possibly get rid of physical mode IPI delivery.
> > 
> > Still need to put this under XenServer extensive testing, but wanted to get
> > some feedback before in case approach greatly changes.
> 
> Looks quite okay, just a couple of minor remarks:

Thanks.

Do we still want to keep the pure cluster mode?

We do need to keep the pure Physical mode in case the FADT flags force
us into using it, but there's no flag to force the usage of Logical
destination mode only.

> 
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -2802,6 +2802,14 @@ the watchdog.
> >  
> >  Permit use of x2apic setup for SMP environments.
> >  
> > +### x2apic-mode (x86)
> > +> `= physical | cluster | mixed`
> > +
> > +> Default: `physical` if **FADT** mandates physical mode, `mixed` otherwise.
> > +
> > +In the case that x2apic is in use, this option switches between modes to
> > +address APICs in the system as interrupt destinations.
> > +
> >  ### x2apic_phys (x86)
> >  > `= <boolean>`
> >  
> > @@ -2812,6 +2820,9 @@ In the case that x2apic is in use, this option switches between physical and
> >  clustered mode.  The default, given no hint from the **FADT**, is cluster
> >  mode.
> >  
> > +**WARNING: `x2apic_phys` is deprecated and superseded by `x2apic-mode`.
> > +The later takes precedence if both are set.**
> 
> s/later/latter/ ?
> 
> This may further want a CHANGELOG.md entry.

Yes, indeed.

> > --- a/xen/arch/x86/Kconfig
> > +++ b/xen/arch/x86/Kconfig
> > @@ -228,11 +228,18 @@ config XEN_ALIGN_2M
> >  
> >  endchoice
> >  
> > -config X2APIC_PHYSICAL
> > -	bool "x2APIC Physical Destination mode"
> > -	help
> > -	  Use x2APIC Physical Destination mode by default when available.
> > +choice
> > +	prompt "x2APIC Destination mode"
> > +	default X2APIC_MIXED
> > +	---help---
> 
> No new ---help--- please (also below); it ought to be just help going forward.

In this case we should replace existing instances of ---help---
otherwise I copy from one of those and forgot to adjust.

Kconfig is usually done by copy&paste (at least for me), and hence
having proper format everywhere will make that less error prone.

> > +	  Select APIC addressing when x2APIC is enabled.
> >  
> > +	  The default mode is mixed which should provide the best aspects
> > +	  of both physical and cluster modes.
> > +
> > +config X2APIC_PHYSICAL
> > +       tristate "Physical Destination mode"
> > +	---help---
> 
> Something's odd with indentation here. But first of all - why tristate? We
> don't have modules in Xen.

Ah, yes, got confused.

> 
> > --- a/xen/arch/x86/genapic/x2apic.c
> > +++ b/xen/arch/x86/genapic/x2apic.c
> > @@ -180,6 +180,25 @@ static const struct genapic __initconstrel apic_x2apic_cluster = {
> >      .send_IPI_self = send_IPI_self_x2apic
> >  };
> >  
> > +/*
> > + * Mixed x2APIC mode: use physical for external (device) interrupts, and
> > + * cluster for inter processor interrupt.  Such mode has the benefits of not
> > + * sharing the vector space with all CPUs on the cluster, while still allows
> > + * IPIs to be more efficiently delivered by not having to perform an ICR
> > + * write for each target CPU.
> > + */
> > +static const struct genapic __initconstrel apic_x2apic_mixed = {
> > +    APIC_INIT("x2apic_mixed", NULL),
> > +    /* NB: int_{delivery,dest}_mode are only used by non-IPI callers. */
> > +    .int_delivery_mode = dest_Fixed,
> > +    .int_dest_mode = 0 /* physical delivery */,
> > +    .init_apic_ldr = init_apic_ldr_x2apic_cluster,
> > +    .vector_allocation_cpumask = vector_allocation_cpumask_phys,
> > +    .cpu_mask_to_apicid = cpu_mask_to_apicid_phys,
> 
> You have a non-IPI-only comment further up, but that - if in fact 
> applicable here - would need to extend to these two hook functions as
> well.

Yes, all fields except for send_IPI_{mask,self} only apply to external
interrupts, not IPIs. And init_apic_ldr is kind of weird because it
just inits the data related fields in order to do the cluster
delivery, but doesn't clobber any data used by physical mode anyway.

Thanks, Roger.
Jan Beulich Oct. 30, 2023, 4:34 p.m. UTC | #5
On 30.10.2023 17:10, Roger Pau Monné wrote:
> On Mon, Oct 30, 2023 at 03:32:56PM +0100, Jan Beulich wrote:
>> On 24.10.2023 15:51, Roger Pau Monne wrote:
>>> The current implementation of x2APIC requires to either use Cluster Logical or
>>> Physical mode for all interrupts.  However the selection of Physical vs Logical
>>> is not done at APIC setup, an APIC can be addressed both in Physical or Logical
>>> destination modes concurrently.
>>>
>>> Introduce a new x2APIC mode called mixed, which uses Logical Cluster mode for
>>> IPIs, and Physical mode for external interrupts, thus attempting to use the
>>> best method for each interrupt type.
>>>
>>> Using Physical mode for external interrupts allows more vectors to be used, and
>>> interrupt balancing to be more accurate.
>>>
>>> Using Logical Cluster mode for IPIs allows less accesses to the ICR register
>>> when sending those, as multiple CPUs can be targeted with a single ICR register
>>> write.
>>>
>>> A simple test calling flush_tlb_all() 10000 times in a tight loop on a 96 CPU
>>> box gives the following average figures:
>>>
>>> Physical mode: 26617931ns
>>> Mixed mode:    23865337ns
>>>
>>> So ~10% improvement versus plain Physical mode.
>>
>> Nice.
>>
>>>  Note that Xen uses Cluster
>>> mode by default, and hence is already using the fastest way for IPI delivery at
>>> the cost of reducing the amount of vectors available system-wide.
>>>
>>> Make the newly introduced mode the default one.
>>>
>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Do we want to keep a full Logical Cluster mode available?  I don't see a reason
>>> to target external interrupts in Logical Cluster mode, but maybe there's
>>> something I'm missing.
>>>
>>> It's not clear to me whether the ACPI FADT flags are meant to apply only to
>>> external interrupt delivery mode, or also to IPI delivery.  If
>>> ACPI_FADT_APIC_PHYSICAL is only meant to apply to external interrupts and not
>>> IPIs then we could possibly get rid of physical mode IPI delivery.
>>>
>>> Still need to put this under XenServer extensive testing, but wanted to get
>>> some feedback before in case approach greatly changes.
>>
>> Looks quite okay, just a couple of minor remarks:
> 
> Thanks.
> 
> Do we still want to keep the pure cluster mode?

I think we should keep it for a full release cycle or two.

Jan
Roger Pau Monné Oct. 30, 2023, 4:38 p.m. UTC | #6
On Mon, Oct 30, 2023 at 05:34:48PM +0100, Jan Beulich wrote:
> On 30.10.2023 17:10, Roger Pau Monné wrote:
> > On Mon, Oct 30, 2023 at 03:32:56PM +0100, Jan Beulich wrote:
> >> On 24.10.2023 15:51, Roger Pau Monne wrote:
> >>> The current implementation of x2APIC requires to either use Cluster Logical or
> >>> Physical mode for all interrupts.  However the selection of Physical vs Logical
> >>> is not done at APIC setup, an APIC can be addressed both in Physical or Logical
> >>> destination modes concurrently.
> >>>
> >>> Introduce a new x2APIC mode called mixed, which uses Logical Cluster mode for
> >>> IPIs, and Physical mode for external interrupts, thus attempting to use the
> >>> best method for each interrupt type.
> >>>
> >>> Using Physical mode for external interrupts allows more vectors to be used, and
> >>> interrupt balancing to be more accurate.
> >>>
> >>> Using Logical Cluster mode for IPIs allows less accesses to the ICR register
> >>> when sending those, as multiple CPUs can be targeted with a single ICR register
> >>> write.
> >>>
> >>> A simple test calling flush_tlb_all() 10000 times in a tight loop on a 96 CPU
> >>> box gives the following average figures:
> >>>
> >>> Physical mode: 26617931ns
> >>> Mixed mode:    23865337ns
> >>>
> >>> So ~10% improvement versus plain Physical mode.
> >>
> >> Nice.
> >>
> >>>  Note that Xen uses Cluster
> >>> mode by default, and hence is already using the fastest way for IPI delivery at
> >>> the cost of reducing the amount of vectors available system-wide.
> >>>
> >>> Make the newly introduced mode the default one.
> >>>
> >>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> Do we want to keep a full Logical Cluster mode available?  I don't see a reason
> >>> to target external interrupts in Logical Cluster mode, but maybe there's
> >>> something I'm missing.
> >>>
> >>> It's not clear to me whether the ACPI FADT flags are meant to apply only to
> >>> external interrupt delivery mode, or also to IPI delivery.  If
> >>> ACPI_FADT_APIC_PHYSICAL is only meant to apply to external interrupts and not
> >>> IPIs then we could possibly get rid of physical mode IPI delivery.
> >>>
> >>> Still need to put this under XenServer extensive testing, but wanted to get
> >>> some feedback before in case approach greatly changes.
> >>
> >> Looks quite okay, just a couple of minor remarks:
> > 
> > Thanks.
> > 
> > Do we still want to keep the pure cluster mode?
> 
> I think we should keep it for a full release cycle or two.

OK, will make the requested changes and send v2 then.

Thanks, Roger.
Elliott Mitchell Nov. 1, 2023, 1:05 a.m. UTC | #7
On Mon, Oct 30, 2023 at 04:27:22PM +0100, Roger Pau Monné wrote:
> On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
> > On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> > > diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
> > > index 707deef98c27..15632cc7332e 100644
> > > --- a/xen/arch/x86/genapic/x2apic.c
> > > +++ b/xen/arch/x86/genapic/x2apic.c
> > > @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
> > >  static int8_t __initdata x2apic_phys = -1;
> > >  boolean_param("x2apic_phys", x2apic_phys);
> > >  
> > > +enum {
> > > +   unset, physical, cluster, mixed
> > > +} static __initdata x2apic_mode = unset;
> > > +
> > > +static int __init parse_x2apic_mode(const char *s)
> > > +{
> > > +    if ( !cmdline_strcmp(s, "physical") )
> > > +        x2apic_mode = physical;
> > > +    else if ( !cmdline_strcmp(s, "cluster") )
> > > +        x2apic_mode = cluster;
> > > +    else if ( !cmdline_strcmp(s, "mixed") )
> > > +        x2apic_mode = mixed;
> > > +    else
> > > +        return EINVAL;
> > > +
> > > +    return 0;
> > > +}
> > > +custom_param("x2apic-mode", parse_x2apic_mode);
> > > +
> > >  const struct genapic *__init apic_x2apic_probe(void)
> > >  {
> > > -    if ( x2apic_phys < 0 )
> > > +    /* x2apic-mode option has preference over x2apic_phys. */
> > > +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> > > +        x2apic_mode = x2apic_phys ? physical : cluster;
> > > +
> > > +    if ( x2apic_mode == unset )
> > >      {
> > > -        /*
> > > -         * Force physical mode if there's no (full) interrupt remapping support:
> > > -         * The ID in clustered mode requires a 32 bit destination field due to
> > > -         * the usage of the high 16 bits to hold the cluster ID.
> > > -         */
> > > -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> > > -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
> > > -                      IS_ENABLED(CONFIG_X2APIC_PHYSICAL);
> > > -    }
> > > -    else if ( !x2apic_phys )
> > > -        switch ( iommu_intremap )
> > > +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
> > >          {
> > 
> > Could this explain the issues with recent AMD processors/motherboards?
> > 
> > Mainly the firmware had been setting this flag, but Xen was previously
> > ignoring it?
> 
> No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
> line to force logical (clustered) destination mode.
> 
> > As such Xen had been attempting to use cluster mode on an
> > x2APIC where that mode was broken for physical interrupts?
> 
> No, not realy, x2apic_phys was already forced to true if
> acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
> just delete that line in this same chunk and move it here).

I think I inverted the case in my speculation.  Perhaps AMD's firmware
SHOULD have been setting ACPI_FADT_APIC_PHYSICAL, but failed to?

Given the rough launch with AMD's latest, I could readily see there
being a goof like this.  Question will then be, will mixed mode work on
these processors?
Elliott Mitchell Nov. 7, 2023, 7:35 p.m. UTC | #8
On Mon, Oct 30, 2023 at 04:27:22PM +0100, Roger Pau Monné wrote:
> On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
> > On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> > > diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
> > > index 707deef98c27..15632cc7332e 100644
> > > --- a/xen/arch/x86/genapic/x2apic.c
> > > +++ b/xen/arch/x86/genapic/x2apic.c
> > > @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
> > >  static int8_t __initdata x2apic_phys = -1;
> > >  boolean_param("x2apic_phys", x2apic_phys);
> > >  
> > > +enum {
> > > +   unset, physical, cluster, mixed
> > > +} static __initdata x2apic_mode = unset;
> > > +
> > > +static int __init parse_x2apic_mode(const char *s)
> > > +{
> > > +    if ( !cmdline_strcmp(s, "physical") )
> > > +        x2apic_mode = physical;
> > > +    else if ( !cmdline_strcmp(s, "cluster") )
> > > +        x2apic_mode = cluster;
> > > +    else if ( !cmdline_strcmp(s, "mixed") )
> > > +        x2apic_mode = mixed;
> > > +    else
> > > +        return EINVAL;
> > > +
> > > +    return 0;
> > > +}
> > > +custom_param("x2apic-mode", parse_x2apic_mode);
> > > +
> > >  const struct genapic *__init apic_x2apic_probe(void)
> > >  {
> > > -    if ( x2apic_phys < 0 )
> > > +    /* x2apic-mode option has preference over x2apic_phys. */
> > > +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> > > +        x2apic_mode = x2apic_phys ? physical : cluster;
> > > +
> > > +    if ( x2apic_mode == unset )
> > >      {
> > > -        /*
> > > -         * Force physical mode if there's no (full) interrupt remapping support:
> > > -         * The ID in clustered mode requires a 32 bit destination field due to
> > > -         * the usage of the high 16 bits to hold the cluster ID.
> > > -         */
> > > -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> > > -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
> > > -                      IS_ENABLED(CONFIG_X2APIC_PHYSICAL);
> > > -    }
> > > -    else if ( !x2apic_phys )
> > > -        switch ( iommu_intremap )
> > > +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
> > >          {
> > 
> > Could this explain the issues with recent AMD processors/motherboards?
> > 
> > Mainly the firmware had been setting this flag, but Xen was previously
> > ignoring it?
> 
> No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
> line to force logical (clustered) destination mode.
> 
> > As such Xen had been attempting to use cluster mode on an
> > x2APIC where that mode was broken for physical interrupts?
> 
> No, not realy, x2apic_phys was already forced to true if
> acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
> just delete that line in this same chunk and move it here).

Okay, that was from a quick look at the patch.  Given the symptoms and
workaround with recent AMD motherboards, this looked suspicious.

In that case it might be a bug in what AMD is providing to motherboard
manufacturers.  Mainly this bit MUST be set, but AMD's implementation
leaves it unset.

Could also be if the setup is done correctly the bit can be cleared, but
multiple motherboard manufacturers are breaking this.  Perhaps the steps
are fragile and AMD needed to provide better guidance.


Neowutran, are you still setup to and interested in doing
experimentation/testing with Xen on your AMD computer?  Would you be up
for trying the patch here:

https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw

I have a suspicion this *might* fix the x2APIC issue everyone has been
seeing.

How plausible would it be to release this as a bugfix/workaround on 4.17?
I'm expecting a "no", but figured I should ask given how widespread the
issue is.
Neowutran Nov. 17, 2023, 10:12 a.m. UTC | #9
On 2023-11-07 11:11, Elliott Mitchell wrote:
> On Mon, Oct 30, 2023 at 04:27:22PM +01
Elliott Mitchell Nov. 18, 2023, 3:04 a.m. UTC | #10
On Fri, Nov 17, 2023 at 11:12:37AM +0100, Neowutran wrote:
> On 2023-11-07 11:11, Elliott Mitchell wrote:
> > On Mon, Oct 30, 2023 at 04:27:22PM +01
> > > On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
> > > > On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> > > > > diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
> > > > > index 707deef98c27..15632cc7332e 100644
> > > > > --- a/xen/arch/x86/genapic/x2apic.c
> > > > > +++ b/xen/arch/x86/genapic/x2apic.c
> > > > > @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
> > > > >  static int8_t __initdata x2apic_phys = -1;
> > > > >  boolean_param("x2apic_phys", x2apic_phys);
> > > > >  
> > > > > +enum {
> > > > > +   unset, physical, cluster, mixed
> > > > > +} static __initdata x2apic_mode = unset;
> > > > > +
> > > > > +static int __init parse_x2apic_mode(const char *s)
> > > > > +{
> > > > > +    if ( !cmdline_strcmp(s, "physical") )
> > > > > +        x2apic_mode = physical;
> > > > > +    else if ( !cmdline_strcmp(s, "cluster") )
> > > > > +        x2apic_mode = cluster;
> > > > > +    else if ( !cmdline_strcmp(s, "mixed") )
> > > > > +   
> > > > > +    else
> > > > > +        return EINVAL;
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > +custom_param("x2apic-mode", parse_x2apic_mode);
> > > > > +
> > > > >  const struct genapic *__init apic_x2apic_probe(void)
> > > > >  {
> > > > > -    if ( x2apic_phys < 0 )
> > > > > +    /* x2apic-mode option has preference over x2apic_phys. */
> > > > > +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> > > > > +        x2apic_mode = x2apic_phys ? physical : cluster;
> > > > > +
> > > > > +    if ( x2apic_mode == unset )
> > > > >      {
> > > > > -        /*
> > > > > -         * Force physical mode if there's no (full) interrupt remapping support:
> > > > > -         * The ID in clustered mode requires a 32 bit destination field due to
> > > > > -         * the usage of the high 16 bits to hold the cluster ID.
> > > > > -         */
> > > > > -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> > > > > -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
> > > > > -        
> > > > > -    }
> > > > > -    else if ( !x2apic_phys )
> > > > > -        switch ( iommu_intremap )
> > > > > +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
> > > > >          {
> > > > 
> > > > Could this explain the issues with recent AMD processors/motherboards?
> > > > 
> > > > Mainly the firmware had been setting this flag, but Xen was previously
> > > > ignoring it?
> > > 
> > > No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
> > > line to force logical (clustered) destination mode.
> > > 
> > > > As such Xen had been attempting to use cluster mode on an
> > > > x2APIC where that mode was broken for physical interrupts?
> > > 
> > > No, not realy, x2apic_phys was already forced to true if
> > > acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
> > > just delete that line in this same chunk and move it here).
> > 
> > Okay, that was from a quick look at the patch.  Given the symptoms and
> > workaround with recent AMD motherboards, this looked
> > 
> > In that case it might be a bug in what AMD is providing to motherboard
> > manufacturers.  Mainly this bit MUST be set, but AMD's implementation
> > leaves it unset.
> > 
> > Could also be if the setup is done correctly the bit can be cleared, but
> > multiple motherboard manufacturers are breaking this.  Perhaps the steps
> > are fragile and AMD needed to provide better guidance.
> > 
> > 
> > Neowutran, are you still setup to and interested in doing
> > experimentation/testing with Xen on your AMD computer?  Would you be up
> > for trying the patch here:
> > 
> > https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw
> > 
> > I have a suspicion this *might* fix the x2APIC issue everyone has been
> > seeing.
> > 
> > How plausible would it be to release this as a bugfix/workaround on 4.17?
> > I'm expecting a "no", but figured I should ask given how widespread the
> > issue is.
> 
> I just applied the patch on my setup ( https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw ) 
> It seems to fix the x2APIC issue I was having. 
> 
> I only did some quick tests: 
> 
> I tried all the differents values in my bios for the X2APIC settings. 
> Now the system successfully boot in all the cases, without needing
> manual override of the x2apic_phys/x2apic_mode parameter in boot commandline .

In light of this issue effecting a large number of people with recent
hardware, I formally request the patch
"x86/x2apic: introduce a mixed physical/cluster mode" be considered for
backport release on the 4.17 and 4.18 branches.

(I'm unsure whether it is realistic for a 4.17 update, but I figure I
should ask)
Andrew Cooper Nov. 18, 2023, 11:33 a.m. UTC | #11
On 18/11/2023 3:04 am, Elliott Mitchell wrote:
> On Fri, Nov 17, 2023 at 11:12:37AM +0100, Neowutran wrote:
>> On 2023-11-07 11:11, Elliott Mitchell wrote:
>>> On Mon, Oct 30, 2023 at 04:27:22PM +01
>>>> On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
>>>>> On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
>>>>>> diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
>>>>>> index 707deef98c27..15632cc7332e 100644
>>>>>> --- a/xen/arch/x86/genapic/x2apic.c
>>>>>> +++ b/xen/arch/x86/genapic/x2apic.c
>>>>>> @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
>>>>>>  static int8_t __initdata x2apic_phys = -1;
>>>>>>  boolean_param("x2apic_phys", x2apic_phys);
>>>>>>  
>>>>>> +enum {
>>>>>> +   unset, physical, cluster, mixed
>>>>>> +} static __initdata x2apic_mode = unset;
>>>>>> +
>>>>>> +static int __init parse_x2apic_mode(const char *s)
>>>>>> +{
>>>>>> +    if ( !cmdline_strcmp(s, "physical") )
>>>>>> +        x2apic_mode = physical;
>>>>>> +    else if ( !cmdline_strcmp(s, "cluster") )
>>>>>> +        x2apic_mode = cluster;
>>>>>> +    else if ( !cmdline_strcmp(s, "mixed") )
>>>>>> +   
>>>>>> +    else
>>>>>> +        return EINVAL;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +custom_param("x2apic-mode", parse_x2apic_mode);
>>>>>> +
>>>>>>  const struct genapic *__init apic_x2apic_probe(void)
>>>>>>  {
>>>>>> -    if ( x2apic_phys < 0 )
>>>>>> +    /* x2apic-mode option has preference over x2apic_phys. */
>>>>>> +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
>>>>>> +        x2apic_mode = x2apic_phys ? physical : cluster;
>>>>>> +
>>>>>> +    if ( x2apic_mode == unset )
>>>>>>      {
>>>>>> -        /*
>>>>>> -         * Force physical mode if there's no (full) interrupt remapping support:
>>>>>> -         * The ID in clustered mode requires a 32 bit destination field due to
>>>>>> -         * the usage of the high 16 bits to hold the cluster ID.
>>>>>> -         */
>>>>>> -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
>>>>>> -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
>>>>>> -        
>>>>>> -    }
>>>>>> -    else if ( !x2apic_phys )
>>>>>> -        switch ( iommu_intremap )
>>>>>> +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
>>>>>>          {
>>>>> Could this explain the issues with recent AMD processors/motherboards?
>>>>>
>>>>> Mainly the firmware had been setting this flag, but Xen was previously
>>>>> ignoring it?
>>>> No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
>>>> line to force logical (clustered) destination mode.
>>>>
>>>>> As such Xen had been attempting to use cluster mode on an
>>>>> x2APIC where that mode was broken for physical interrupts?
>>>> No, not realy, x2apic_phys was already forced to true if
>>>> acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
>>>> just delete that line in this same chunk and move it here).
>>> Okay, that was from a quick look at the patch.  Given the symptoms and
>>> workaround with recent AMD motherboards, this looked
>>>
>>> In that case it might be a bug in what AMD is providing to motherboard
>>> manufacturers.  Mainly this bit MUST be set, but AMD's implementation
>>> leaves it unset.
>>>
>>> Could also be if the setup is done correctly the bit can be cleared, but
>>> multiple motherboard manufacturers are breaking this.  Perhaps the steps
>>> are fragile and AMD needed to provide better guidance.
>>>
>>>
>>> Neowutran, are you still setup to and interested in doing
>>> experimentation/testing with Xen on your AMD computer?  Would you be up
>>> for trying the patch here:
>>>
>>> https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw
>>>
>>> I have a suspicion this *might* fix the x2APIC issue everyone has been
>>> seeing.
>>>
>>> How plausible would it be to release this as a bugfix/workaround on 4.17?
>>> I'm expecting a "no", but figured I should ask given how widespread the
>>> issue is.
>> I just applied the patch on my setup ( https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw ) 
>> It seems to fix the x2APIC issue I was having. 
>>
>> I only did some quick tests: 
>>
>> I tried all the differents values in my bios for the X2APIC settings. 
>> Now the system successfully boot in all the cases, without needing
>> manual override of the x2apic_phys/x2apic_mode parameter in boot commandline .
> In light of this issue effecting a large number of people with recent
> hardware, I formally request the patch
> "x86/x2apic: introduce a mixed physical/cluster mode" be considered for
> backport release on the 4.17 and 4.18 branches.
>
> (I'm unsure whether it is realistic for a 4.17 update, but I figure I
> should ask)

This is an unreasonable ask.

I believe you when you say there is (or at least was) an x2apic bug (or
bugs), but not once did you provide the logging requested, nor engage
usefully with us in debugging.

And despite this, we (Roger, Jan and myself) found, fixed and backported
3 x2apic bugs.

Now you come along guessing alone at x2apic in a patch name that it
fixes your problem, on a patch which is not a bugfix - it's a
performance optimisation.


Neowutran, thankyou for looking into the patch, but I'm afraid that
doesn't confirm that this patch fixed an issue either.  If it really did
make a difference, then you'll see a difference in behaviour using each
of the 3 new x2apic-mode= options.

Please could you take your single up-to-date build of Xen, put the BIOS
settings back to whatever was causing you problems originally, and
describe what happens when booting each of
x2apic-mode={physical,cluster,mixed}?

Thankyou,

~Andrew
Elliott Mitchell Nov. 22, 2023, 12:56 a.m. UTC | #12
On Sat, Nov 18, 2023 at 11:33:55AM +0000, Andrew Cooper wrote:
> On 18/11/2023 3:04 am, Elliott Mitchell wrote:
> > On Fri, Nov 17, 2023 at 11:12:37AM +0100, Neowutran wrote:
> >> On 2023-11-07 11:11, Elliott Mitchell wrote:
> >>> On Mon, Oct 30, 2023 at 04:27:22PM +01
> >>>> On Mon, Oct 30, 2023 at 07:50:27AM -0700, Elliott Mitchell wrote:
> >>>>> On Tue, Oct 24, 2023 at 03:51:50PM +0200, Roger Pau Monne wrote:
> >>>>>> diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
> >>>>>> index 707deef98c27..15632cc7332e 100644
> >>>>>> --- a/xen/arch/x86/genapic/x2apic.c
> >>>>>> +++ b/xen/arch/x86/genapic/x2apic.c
> >>>>>> @@ -220,38 +239,56 @@ static struct notifier_block x2apic_cpu_nfb = {
> >>>>>>  static int8_t __initdata x2apic_phys = -1;
> >>>>>>  boolean_param("x2apic_phys", x2apic_phys);
> >>>>>>  
> >>>>>> +enum {
> >>>>>> +   unset, physical, cluster, mixed
> >>>>>> +} static __initdata x2apic_mode = unset;
> >>>>>> +
> >>>>>> +static int __init parse_x2apic_mode(const char *s)
> >>>>>> +{
> >>>>>> +    if ( !cmdline_strcmp(s, "physical") )
> >>>>>> +        x2apic_mode = physical;
> >>>>>> +    else if ( !cmdline_strcmp(s, "cluster") )
> >>>>>> +        x2apic_mode = cluster;
> >>>>>> +    else if ( !cmdline_strcmp(s, "mixed") )
> >>>>>> +   
> >>>>>> +    else
> >>>>>> +        return EINVAL;
> >>>>>> +
> >>>>>> +    return 0;
> >>>>>> +}
> >>>>>> +custom_param("x2apic-mode", parse_x2apic_mode);
> >>>>>> +
> >>>>>>  const struct genapic *__init apic_x2apic_probe(void)
> >>>>>>  {
> >>>>>> -    if ( x2apic_phys < 0 )
> >>>>>> +    /* x2apic-mode option has preference over x2apic_phys. */
> >>>>>> +    if ( x2apic_phys >= 0 && x2apic_mode == unset )
> >>>>>> +        x2apic_mode = x2apic_phys ? physical : cluster;
> >>>>>> +
> >>>>>> +    if ( x2apic_mode == unset )
> >>>>>>      {
> >>>>>> -        /*
> >>>>>> -         * Force physical mode if there's no (full) interrupt remapping support:
> >>>>>> -         * The ID in clustered mode requires a 32 bit destination field due to
> >>>>>> -         * the usage of the high 16 bits to hold the cluster ID.
> >>>>>> -         */
> >>>>>> -        x2apic_phys = iommu_intremap != iommu_intremap_full ||
> >>>>>> -                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
> >>>>>> -        
> >>>>>> -    }
> >>>>>> -    else if ( !x2apic_phys )
> >>>>>> -        switch ( iommu_intremap )
> >>>>>> +        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
> >>>>>>          {
> >>>>> Could this explain the issues with recent AMD processors/motherboards?
> >>>>>
> >>>>> Mainly the firmware had been setting this flag, but Xen was previously
> >>>>> ignoring it?
> >>>> No, not unless you pass {no-}x2apic_phys={false,0} on the Xen command
> >>>> line to force logical (clustered) destination mode.
> >>>>
> >>>>> As such Xen had been attempting to use cluster mode on an
> >>>>> x2APIC where that mode was broken for physical interrupts?
> >>>> No, not realy, x2apic_phys was already forced to true if
> >>>> acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL is set on the FADT (I
> >>>> just delete that line in this same chunk and move it here).
> >>> Okay, that was from a quick look at the patch.  Given the symptoms and
> >>> workaround with recent AMD motherboards, this looked
> >>>
> >>> In that case it might be a bug in what AMD is providing to motherboard
> >>> manufacturers.  Mainly this bit MUST be set, but AMD's implementation
> >>> leaves it unset.
> >>>
> >>> Could also be if the setup is done correctly the bit can be cleared, but
> >>> multiple motherboard manufacturers are breaking this.  Perhaps the steps
> >>> are fragile and AMD needed to provide better guidance.
> >>>
> >>>
> >>> Neowutran, are you still setup to and interested in doing
> >>> experimentation/testing with Xen on your AMD computer?  Would you be up
> >>> for trying the patch here:
> >>>
> >>> https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw
> >>>
> >>> I have a suspicion this *might* fix the x2APIC issue everyone has been
> >>> seeing.
> >>>
> >>> How plausible would it be to release this as a bugfix/workaround on 4.17?
> >>> I'm expecting a "no", but figured I should ask given how widespread the
> >>> issue is.
> >> I just applied the patch on my setup ( https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw ) 
> >> It seems to fix the x2APIC issue I was having. 
> >>
> >> I only did some quick tests: 
> >>
> >> I tried all the differents values in my bios for the X2APIC settings. 
> >> Now the system successfully boot in all the cases, without needing
> >> manual override of the x2apic_phys/x2apic_mode parameter in boot commandline .
> > In light of this issue effecting a large number of people with recent
> > hardware, I formally request the patch
> > "x86/x2apic: introduce a mixed physical/cluster mode" be considered for
> > backport release on the 4.17 and 4.18 branches.
> >
> > (I'm unsure whether it is realistic for a 4.17 update, but I figure I
> > should ask)
> 
> This is an unreasonable ask.
> 
> I believe you when you say there is (or at least was) an x2apic bug (or
> bugs), but not once did you provide the logging requested, nor engage
> usefully with us in debugging.

It was insisted that full logs be sent to xen-devel.  Perhaps I am
paranoid, but I doubt I would have been successful at scrubbing all
hardware serial numbers.  As such, I was willing to post substantial
sections, but not everything.

Since the two were at loggerheads, there was nothing I could do.

Problem is the x86 machine is trying to remain close to a Linux
distribution and that distribution has been sluggish with that update.
I plan to double-check later when things are ready.

> Now you come along guessing alone at x2apic in a patch name that it
> fixes your problem, on a patch which is not a bugfix - it's a
> performance optimisation.

It was created as a performance optimization, but I had hopes it might
also workaround the problem which was occurring.


Has it been given proper consideration by the Release Manager?  As long
as the answer to that is "yes", then I will accept the decision of the
Release Manager.  I was under no illusion it was likely to be accepted
for backport.
Roger Pau Monné Nov. 23, 2023, 9:39 a.m. UTC | #13
On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> It was insisted that full logs be sent to xen-devel.  Perhaps I am
> paranoid, but I doubt I would have been successful at scrubbing all
> hardware serial numbers.  As such, I was willing to post substantial
> sections, but not everything.

Can you please point out which hardware serial numbers are you
referring to, and where are they printed in Xen dmesg?

Thanks, Roger.
Neowutran Nov. 24, 2023, 7:54 p.m. UTC | #14
Content-D
Elliott Mitchell Nov. 25, 2023, 1:15 a.m. UTC | #15
On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monné wrote:
> On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> > It was insisted that full logs be sent to xen-devel.  Perhaps I am
> > paranoid, but I doubt I would have been successful at scrubbing all
> > hardware serial numbers.  As such, I was willing to post substantial
> > sections, but not everything.
> 
> Can you please point out which hardware serial numbers are you
> referring to, and where are they printed in Xen dmesg?

I didn't confirm the presence of hardware serial numbers getting into
Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
With 4000 lines of output, checking for concerning data is troublesome.

There was enough for alarms to trigger.
Roger Pau Monné Nov. 27, 2023, 8:27 a.m. UTC | #16
On Fri, Nov 24, 2023 at 05:15:34PM -0800, Elliott Mitchell wrote:
> On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monné wrote:
> > On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> > > It was insisted that full logs be sent to xen-devel.  Perhaps I am
> > > paranoid, but I doubt I would have been successful at scrubbing all
> > > hardware serial numbers.  As such, I was willing to post substantial
> > > sections, but not everything.
> > 
> > Can you please point out which hardware serial numbers are you
> > referring to, and where are they printed in Xen dmesg?
> 
> I didn't confirm the presence of hardware serial numbers getting into
> Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
> With 4000 lines of output, checking for concerning data is troublesome.

4000 lines of output from Xen dmesg seems quite suspicious.  Xen dmesg
on a fully booted box is ~400 lines on my local box.  That might get
quite longer if you have a box with a lot of memory region, or a lot
of CPUs.

> There was enough for alarms to trigger.

That's fine, but without pointing out which information is sensitive,
it's very unlikely such concerns will ever get fixed.

Could you at least go over the log and point out the first instance of
such line that you consider contains sensitive information?

The primary way we have to diagnose and solve issues reported by users
is based on Xen dmesg, so if it's impossible to provide it for privacy
reasons that's an issue that we need to address.

Regards, Roger.
Andrew Cooper Nov. 27, 2023, 11:49 a.m. UTC | #17
On 24/11/2023 7:54 pm, Neowutran wrote:
> Hi, 
> I did some more tests and research, indeed this patch improved/solved my specific case. 
>
> Starting point: 
>
> I am using Xen version 4.17.2 (exactly this source https://github.com/QubesOS/qubes-vmm-xen).
> In the bios (a Asus motherboard), I configured the "local apic" parameter to "X2APIC".
> For Xen, I did not set the parameter "x2apic-mode" nor the parameter "x2apic_phys". 
>
> Case 1:
> I tryied to boot just like that, result: system is unusuably slow
>
> Case 2:
> Then, I applied a backport of the patch  
> https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw 
> to the original Xen version of QubesOS and I recompiled. 
> (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic3/X2APIC.patch)
> Result: it work, the system is usable. 
>
> Case 3:
> Then, I applied the patch https://github.com/xen-project/xen/commit/26a449ce32cef33f2cb50602be19fcc0c4223ba9
> to the original Xen version of QubesOS and I recompiled.
> (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic4/X2APIC.patch)
> Result: system is  
> unusuably slow. 
>
>
> In "Case 2", the value returned by the function "apic_x2apic_probe" is "&apic_x2apic_mixed". 
> In "Case 3", the value returned by the function "apic_x2apic_probe" is "&apic_x2apic_cluster". 
>
>
> -------------------
> If you want / need, details for the function "apic_x2apic_probe":
>
> Known "input" value:
>
> "CONFIG_X2APIC_PHYSICAL" is not defined
> "iommu_intremap == iommu_intremap_off" = false
> "acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL" -> 0
> "acpi_gbl_FADT.flags" = 247205 (in decimal)
> "CONFIG_X2APIC_PHYSICAL" is not defined
> "CONFIG_X2APIC_MIXED" is defined, because it is the default choice
> "x2apic_mode" = 0
> "x2apic_phys" = -1
>
>
>
> Trace log (I did some call "printk" to trace what was going on)
> Case 2:
> (XEN) NEOWUTRAN: X2APIC_MODE: 0 
> (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
> (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
> (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
> (XEN) Neowutran: PASSE 2 
> (XEN) Neowutran: PASSE 4 
> (XEN) NEOWUTRAN: X2APIC_MODE: 3 
> (XEN) Neowutran: PASSE 7 
> (XEN) NEOWUTRAN: X2APIC_MODE: 3 
>  
> (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
> (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
> (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
>
> Case 3:
> (XEN) NEOWUTRAN2: X2APIC_PHYS: -1 
> (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
> (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
> (XEN) Neowutran2: Passe 1 
> (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
> (XEN) Neowutran2: Passe 6 
> (XEN) Neowutran2: Passe 7 
> (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
> (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
> (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
> (XEN) Neowutran2: Passe 2 
> (XEN) Neowutran2: Passe 4 
> (XEN) Neowutran2: Passe 7
>
>
>
> If you require the full logs, I could publish the full logs somewhere.
> ----------------------
>
> ( However I do not understand if the root issue is a buggy motherboard, a
> bug in xen, or if the parameter "X2APIC_PHYSICAL" should have been set
> by the QubesOS project, or something else)

Hello,

Thankyou for the analysis.

For your base version of QubeOS Xen, was that 4.13.2-5 ?   I can't see
any APIC changes in the patchqueue, and I believe all relevant bugfixes
are in 4.17.2, but I'd just like to confirm.

First, by "unusable slow", other than the speed, did everything else
appear to operate adequately?  Any chance you could guess the slowdown. 
i.e. was it half the speed, or "seconds per log console line during
boot" levels of slow?


Having re-reviewed 26a449ce32, the patch is correct but the reasoning is
wrong.

ACPI_FADT_APIC_CLUSTER predates x2APIC by almost a decade (it appeared
in ACPI 3.0), and is not relevant outside of xAPIC mode.  xAPIC has 2
different logical destination modes, cluster and flat, and their
applicability is dependent on whether you have fewer or more than 8
local APICs, hence that property being called out in the ACPI spec.

x2APIC does not have this property.  DFR was removed from the
architecture, and logical mode is strictly cluster.  So the bit should
never have been interpreted on an x2APIC code path.

Not that it matters in your case - the bit isn't set in your FADT, hence
why case 1 and 3 have the same behaviour.


This brings us to case 2, where mixed mode does seem to resolve the perf
problem.

Since that patch was written, I've learnt how cluster delivery mode
works for external interrupts, and Xen should never ever have been using
it (Xen appears to be alone in OS software here).  For an external
interrupt in Logical cluster mode, it always sends to the lowest ID in
the cluster.  If that APIC decides that the local processor is too busy
to handle the interrupt now, it forwards the interrupt to the next APIC
in the cluster, and this cycle continues until one APIC accepts the message.

You get most interrupts hitting the lowest APIC in the cluster, but the
interrupt can be forwarded between APICs for an unbounded quantity of
time depending on system utilisation.


Could you please take case 2 and confirm what happens when booting with
x2apic-mode={physical,cluster}?  If the pattern holds, the physical
should be fine, and cluster should see the same problems as case 1 and 3.

Thanks,

~Andrew
Roger Pau Monné Nov. 27, 2023, 2:53 p.m. UTC | #18
On Mon, Nov 27, 2023 at 11:49:03AM +0000, Andrew Cooper wrote:
> On 24/11/2023 7:54 pm, Neowutran wrote:
> > Hi, 
> > I did some more tests and research, indeed this patch improved/solved my specific case. 
> >
> > Starting point: 
> >
> > I am using Xen version 4.17.2 (exactly this source https://github.com/QubesOS/qubes-vmm-xen).
> > In the bios (a Asus motherboard), I configured the "local apic" parameter to "X2APIC".
> > For Xen, I did not set the parameter "x2apic-mode" nor the parameter "x2apic_phys". 
> >
> > Case 1:
> > I tryied to boot just like that, result: system is unusuably slow
> >
> > Case 2:
> > Then, I applied a backport of the patch  
> > https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw 
> > to the original Xen version of QubesOS and I recompiled. 
> > (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic3/X2APIC.patch)
> > Result: it work, the system is usable. 
> >
> > Case 3:
> > Then, I applied the patch https://github.com/xen-project/xen/commit/26a449ce32cef33f2cb50602be19fcc0c4223ba9
> > to the original Xen version of QubesOS and I recompiled.
> > (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic4/X2APIC.patch)
> > Result: system is  
> > unusuably slow. 
> >
> >
> > In "Case 2", the value returned by the function "apic_x2apic_probe" is "&apic_x2apic_mixed". 
> > In "Case 3", the value returned by the function "apic_x2apic_probe" is "&apic_x2apic_cluster". 
> >
> >
> > -------------------
> > If you want / need, details for the function "apic_x2apic_probe":
> >
> > Known "input" value:
> >
> > "CONFIG_X2APIC_PHYSICAL" is not defined
> > "iommu_intremap == iommu_intremap_off" = false
> > "acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL" -> 0
> > "acpi_gbl_FADT.flags" = 247205 (in decimal)
> > "CONFIG_X2APIC_PHYSICAL" is not defined
> > "CONFIG_X2APIC_MIXED" is defined, because it is the default choice
> > "x2apic_mode" = 0
> > "x2apic_phys" = -1
> >
> >
> >
> > Trace log (I did some call "printk" to trace what was going on)
> > Case 2:
> > (XEN) NEOWUTRAN: X2APIC_MODE: 0 
> > (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
> > (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
> > (XEN) Neowutran: PASSE 2 
> > (XEN) Neowutran: PASSE 4 
> > (XEN) NEOWUTRAN: X2APIC_MODE: 3 
> > (XEN) Neowutran: PASSE 7 
> > (XEN) NEOWUTRAN: X2APIC_MODE: 3 
> >  
> > (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
> > (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
> >
> > Case 3:
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: -1 
> > (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
> > (XEN) Neowutran2: Passe 1 
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
> > (XEN) Neowutran2: Passe 6 
> > (XEN) Neowutran2: Passe 7 
> > (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
> > (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
> > (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
> > (XEN) Neowutran2: Passe 2 
> > (XEN) Neowutran2: Passe 4 
> > (XEN) Neowutran2: Passe 7
> >
> >
> >
> > If you require the full logs, I could publish the full logs somewhere.
> > ----------------------
> >
> > ( However I do not understand if the root issue is a buggy motherboard, a
> > bug in xen, or if the parameter "X2APIC_PHYSICAL" should have been set
> > by the QubesOS project, or something else)
> 
> Hello,
> 
> Thankyou for the analysis.
> 
> For your base version of QubeOS Xen, was that 4.13.2-5 ?   I can't see
> any APIC changes in the patchqueue, and I believe all relevant bugfixes
> are in 4.17.2, but I'd just like to confirm.
> 
> First, by "unusable slow", other than the speed, did everything else
> appear to operate adequately?  Any chance you could guess the slowdown. 
> i.e. was it half the speed, or "seconds per log console line during
> boot" levels of slow?
> 
> 
> Having re-reviewed 26a449ce32, the patch is correct but the reasoning is
> wrong.
> 
> ACPI_FADT_APIC_CLUSTER predates x2APIC by almost a decade (it appeared
> in ACPI 3.0), and is not relevant outside of xAPIC mode.  xAPIC has 2
> different logical destination modes, cluster and flat, and their
> applicability is dependent on whether you have fewer or more than 8
> local APICs, hence that property being called out in the ACPI spec.
> 
> x2APIC does not have this property.  DFR was removed from the
> architecture, and logical mode is strictly cluster.  So the bit should
> never have been interpreted on an x2APIC code path.

FWIW, Jan also pointed out that the ACPI spec mentions xAPIC strictly,
even for ACPI_FADT_APIC_PHYSICAL.  It's possible APIC_PHYSICAL should
also be enforced only in xAPIC mode.  Or it's also possible the ACPI
spec was not updated to mention both xAPIC and x2APIC modes.

Roger.
Neowutran Nov. 27, 2023, 3:18 p.m. UTC | #19
cab@citrix.com>
 <x4qzfuqkkebjkdfmhw6rvdhrn2ewa6ghjtjqd7xevnuylfahh7@pjratinsg76a>
 <a4b4546a-60b8-4d0e-bdf4-9af6699fb925@citrix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <a4b4546a-60b8-4d0e-bdf4-9af6699fb925@citrix.com>


Hello, 
Thanks a lot for all the details and explainations ! :)

On 2023-11-27 11:11, Andrew Cooper wrote:
> On 24/11/2023 7:54 pm, Neowutran wrote:
> > Hi, 
> > I did some more tests and research, indeed this patch improved/solved my specific case. 
> >
> > Starting point: 
> >
> > I am using Xen version 4.17.2 (exactly this source https://github.com/QubesOS/qubes-vmm-xen).
> > In the bios (a Asus motherboard), I configured the "local apic" parameter to "X2APIC".
> > For Xen, I did not set the parameter "x2apic-mode" nor the parameter "x2apic_phys". 
> >
> > Case 1:
> > I tryied to boot just like that, result: system is unusuably slow
> >
> > Case 2:
> > Then, I applied a backport of the patch  

Andrew Cooper Nov. 27, 2023, 3:56 p.m. UTC | #20
On 27/11/2023 2:53 pm, Roger Pau Monné wrote:
> On Mon, Nov 27, 2023 at 11:49:03AM +0000, Andrew Cooper wrote:
>> On 24/11/2023 7:54 pm, Neowutran wrote:
>>> Hi, 
>>> I did some more tests and research, indeed this patch improved/solved my specific case. 
>>>
>>> Starting point: 
>>>
>>> I am using Xen version 4.17.2 (exactly this source https://github.com/QubesOS/qubes-vmm-xen).
>>> In the bios (a Asus motherboard), I configured the "local apic" parameter to "X2APIC".
>>> For Xen, I did not set the parameter "x2apic-mode" nor the parameter "x2apic_phys". 
>>>
>>> Case 1:
>>> I tryied to boot just like that, result: system is unusuably slow
>>>
>>> Case 2:
>>> Then, I applied a backport of the patch  
>>> https://lore.kernel.org/xen-devel/20231106142739.19650-1-roger.pau@citrix.com/raw 
>>> to the original Xen version of QubesOS and I recompiled. 
>>> (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic3/X2APIC.patch)
>>> Result: it work, the system is usable. 
>>>
>>> Case 3:
>>> Then, I applied the patch https://github.com/xen-project/xen/commit/26a449ce32cef33f2cb50602be19fcc0c4223ba9
>>> to the original Xen version of QubesOS and I recompiled.
>>> (https://github.com/neowutran/qubes-vmm-xen/blob/x2apic4/X2APIC.patch)
>>> Result: system is  
>>> unusuably slow. 
>>>
>>>
>>> In "Case 2", the value returned by the function "apic_x2apic_probe" is "&apic_x2apic_mixed". 
>>> In "Case 3", the value returned by the function "apic_x2apic_probe" is "&apic_x2apic_cluster". 
>>>
>>>
>>> -------------------
>>> If you want / need, details for the function "apic_x2apic_probe":
>>>
>>> Known "input" value:
>>>
>>> "CONFIG_X2APIC_PHYSICAL" is not defined
>>> "iommu_intremap == iommu_intremap_off" = false
>>> "acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL" -> 0
>>> "acpi_gbl_FADT.flags" = 247205 (in decimal)
>>> "CONFIG_X2APIC_PHYSICAL" is not defined
>>> "CONFIG_X2APIC_MIXED" is defined, because it is the default choice
>>> "x2apic_mode" = 0
>>> "x2apic_phys" = -1
>>>
>>>
>>>
>>> Trace log (I did some call "printk" to trace what was going on)
>>> Case 2:
>>> (XEN) NEOWUTRAN: X2APIC_MODE: 0 
>>> (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
>>> (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
>>> (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
>>> (XEN) Neowutran: PASSE 2 
>>> (XEN) Neowutran: PASSE 4 
>>> (XEN) NEOWUTRAN: X2APIC_MODE: 3 
>>> (XEN) Neowutran: PASSE 7 
>>> (XEN) NEOWUTRAN: X2APIC_MODE: 3 
>>>  
>>> (XEN) NEOWUTRAN: X2APIC_PHYS: -1 
>>> (XEN) NEOWUTRAN: acpi_gbl_FADT.flags: 247205 
>>> (XEN) NEOWUTRAN IOMMU_INTREMAP: different 
>>>
>>> Case 3:
>>> (XEN) NEOWUTRAN2: X2APIC_PHYS: -1 
>>> (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
>>> (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
>>> (XEN) Neowutran2: Passe 1 
>>> (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
>>> (XEN) Neowutran2: Passe 6 
>>> (XEN) Neowutran2: Passe 7 
>>> (XEN) NEOWUTRAN2: X2APIC_PHYS: 0 
>>> (XEN) NEOWUTRAN2: acpi_gbl_FADT.flags: 247205 
>>> (XEN) NEOWUTRAN2 IOMMU_INTREMAP: different 
>>> (XEN) Neowutran2: Passe 2 
>>> (XEN) Neowutran2: Passe 4 
>>> (XEN) Neowutran2: Passe 7
>>>
>>>
>>>
>>> If you require the full logs, I could publish the full logs somewhere.
>>> ----------------------
>>>
>>> ( However I do not understand if the root issue is a buggy motherboard, a
>>> bug in xen, or if the parameter "X2APIC_PHYSICAL" should have been set
>>> by the QubesOS project, or something else)
>> Hello,
>>
>> Thankyou for the analysis.
>>
>> For your base version of QubeOS Xen, was that 4.13.2-5 ?   I can't see
>> any APIC changes in the patchqueue, and I believe all relevant bugfixes
>> are in 4.17.2, but I'd just like to confirm.
>>
>> First, by "unusable slow", other than the speed, did everything else
>> appear to operate adequately?  Any chance you could guess the slowdown. 
>> i.e. was it half the speed, or "seconds per log console line during
>> boot" levels of slow?
>>
>>
>> Having re-reviewed 26a449ce32, the patch is correct but the reasoning is
>> wrong.
>>
>> ACPI_FADT_APIC_CLUSTER predates x2APIC by almost a decade (it appeared
>> in ACPI 3.0), and is not relevant outside of xAPIC mode.  xAPIC has 2
>> different logical destination modes, cluster and flat, and their
>> applicability is dependent on whether you have fewer or more than 8
>> local APICs, hence that property being called out in the ACPI spec.
>>
>> x2APIC does not have this property.  DFR was removed from the
>> architecture, and logical mode is strictly cluster.  So the bit should
>> never have been interpreted on an x2APIC code path.
> FWIW, Jan also pointed out that the ACPI spec mentions xAPIC strictly,
> even for ACPI_FADT_APIC_PHYSICAL.  It's possible APIC_PHYSICAL should
> also be enforced only in xAPIC mode.  Or it's also possible the ACPI
> spec was not updated to mention both xAPIC and x2APIC modes.

ACPI_FADT_APIC_PHYSICAL is similarly old.

In 2004, xAPIC used strictly in this way distinguished the P4/Xeon APIC
architecture (named xAPIC) from prior generations (simply APIC,
including the original external APIC implementation).

But x2APIC is different still, so properties which were necessary to
state for xAPIC don't necessarily hold for x2APIC.

ACPI_FADT_APIC_CLUSTER is definitely obsolete for x2APIC.  Given the
wording of ACPI_FADT_APIC_PHYSICAL refers to DFR and the 8-APIC
boundary, I'd say its equally obsolete.

~Andrew
Andrew Cooper Nov. 27, 2023, 4:17 p.m. UTC | #21
On 27/11/2023 3:18 pm, Neowutran wrote:
>> Hello,
>>
>> Thankyou for the analysis.
>>
>> For your base version of QubeOS Xen, was that 4.13.2-5 ?   I can't see
>> any APIC changes in the patchqueue, and I believe all relevant bugfixes
>> are in 4.17.2, but I'd just like to confirm.
> I was using the qubes-vmm-xen release "4.17.2-5" that use xen version
> "4.17.2" . I don't see custom modification for APIC in the patchs
> applied t 
> o Xen by QubesOS
>
>> First, by "unusable slow", other than the speed, did everything else
>> appear to operate adequately?  Any chance you could guess the slowdown. 
>> i.e. was it half the speed, or "seconds per log console line during
>> boot" levels of slow?
> Once I was logged in, it took me around 10 minutes to type the command
> "sudo dmesg > log"

Ok.  Glacial it is.


>
> There was also graphical instabilities (screen display something, then it is black,
> few seconds later it display things again. 
> Sometime it completly crash and I need to reboot to try to finish the boot+login process),
> and unable to start guests due to the system being too slow. 
>
> Some of the logs gathered from "sudo dmesg" that only appear for case 1 and
> case 3: 
>
> "
>  nvme nvme1: I/O 998 QID 1 timeout, completion polled
>  nvme nvme1: I/O 854 QID 5 timeout, completion polled
>  ...
>  [drm] Fence fallback timer expired on ring sdma0
>  [drm] Fence fallback timer expired on ring sdma0
>  ...
>  [drm] Fence fallback timer expired on ring sdma0
>  [drm] Fence fallback timer ex 
> pired on ring gfx_0.0.0
>  [drm] Fence fallback timer expired on ring gfx_0.0.0
>  [drm] Fence fallback timer expired on ring sdma0
>  ...
> " 
> things like that repeated hundreds of times.

Both the disk and GPU had timeouts.  Both indicative of interrupts not
working correctly.
>> Having re-reviewed 26a449ce32, the patch is correct but the reasoning is
>> wrong.
>>
>> ACPI_FADT_APIC_CLUSTER predates x2APIC by almost a decade (it appeared
>> in ACPI 3.0), and is not relevant outside of xAPIC mode.  xAPIC has 2
>> different logical destination modes, cluster and flat, and their
>> applicability is dependent on whether you have fewer or more than 8
>> local APICs, hence that property being called out in the ACPI spec.
>>
>> x2APIC does not have this property.  DFR was removed from the
>> architecture, and logical mode is strictly cluster.  So the bit should
>> never have been interpreted on an x2APIC code path.
>>
>> Not that it matters in your case - the bit isn't set in your FADT, hence
>> why case 1 and 3 have the same behaviour.
>>
>>
>> This brings us to case 2, where mixed mode does seem to resolve the per 
> f
>> problem.
>>
>> Since that patch was written, I've learnt how cluster delivery mode
>> works for external interrupts, and Xen should never ever have been using
>> it (Xen appears to be alone in OS software here).  For an external
>> interrupt in Logical cluster mode, it always sends to the lowest ID in
>> the cluster.  If that APIC decides that the local processor is too busy
>> to handle the interrupt now, it forwards the interrupt to the next APIC
>> in the cluster, and this cycle continues until one APIC accepts the message.
>>
>> You get most interrupts hitting the lowest APIC in the cluster, but the
>> interrupt can be forwarded between APICs for an unbounded quantity of
>> time depending on system utilisation.
>>
>>
>> Could you please take case 2 and confirm what happens when booting with
>> x2apic-mode={physical,cluster}?  If the pattern holds, the physical
>> should be fine, and cluster should see the same problems as case 1 and 3.
> I confirm that the pattern holds. "physical" is fine and "cluster"
> have th 
> e same issue as case 1 and case 3.

Ok, so something in your system *really* doesn't like having external
interrupts programmed in Logical cluster mode.

And knowing how logical cluster mode works for external interrupts, we
should never have done that in the first place.

I think this is enough justification to backport mixed mode.  Against a
default upstream Xen, that swaps external interrupts to being programmed
in physical mode, while keeping IPIs in cluster mode.

FWIW, we've already backported mixed mode to XenServer's Xen 4.13 (where
we forced physical mode for other reasons), and that's come with nice
perf improvements from swapping IPIs to using cluster mode.

~Andrew
Elliott Mitchell Dec. 4, 2023, 1:02 a.m. UTC | #22
On Mon, Nov 27, 2023 at 09:27:18AM +0100, Roger Pau Monn
> On Fri, Nov 24, 2023 at 05:15:34PM -0800, Elliott Mitchell wrote:
> > On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monn
> > > On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> > > > It was insisted that full logs be sent to xen-devel.  Perhaps I am
> > > > paranoid, but I doubt I would have been successful at scrubbing all
> > > > hardware serial numbers.  As such, I was willing to post substantial
> > > > sections, but not everything.
> > > 
> > > Can you please point out which hardware serial numbers are you
> > > referring to, and where are they printed in Xen dmesg?
> > 
> > I didn't confirm the presence of hardware serial numbers getting into
> > Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
> > With 4000 lines of output, checking for concerning data is troublesome.
> 
> 4000 lines of output from Xen dmesg seems quite suspicious.  Xen dmesg
> on a fully booted box is ~400 lines on my local box.  That might get
> quite longer if you have a box with a lot of memory region, or a lot
> of CPUs.

That was from 4 boots with differing settings.  Since it was full console
log, it also had the initial Linux kernel boot messages.  Each log was
~1000 lines.

> > There was enough for alarms to trigger.
> 
> That's fine, but without pointing out which information is sensitive,
> it's very unlikely such concerns will ever get fixed.
> 
> Could you at least go over the log and point out the first instance of
> such line that you consider contains sensitive information?

I would have been more comfortable with getting some guidance on which
portions were desired or which could be left out.  No need for Linux's
boot messages, what would cut things down by half.  No need for the
memory map, lots more goes.

It is easier to be comfortable with 40 line sections than 1000 line
sections.
Jan Beulich Dec. 4, 2023, 8:03 a.m. UTC | #23
On 04.12.2023 02:02, Elliott Mitchell wrote:
> On Mon, Nov 27, 2023 at 09:27:18AM +0100, Roger Pau Monn
>> On Fri, Nov 24, 2023 at 05:15:34PM -0800, Elliott Mitchell wrote:
>>> On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monn
>>>> On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
>>>>> It was insisted that full logs be sent to xen-devel.  Perhaps I am
>>>>> paranoid, but I doubt I would have been successful at scrubbing all
>>>>> hardware serial numbers.  As such, I was willing to post substantial
>>>>> sections, but not everything.
>>>>
>>>> Can you please point out which hardware serial numbers are you
>>>> referring to, and where are they printed in Xen dmesg?
>>>
>>> I didn't confirm the presence of hardware serial numbers getting into
>>> Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
>>> With 4000 lines of output, checking for concerning data is troublesome.
>>
>> 4000 lines of output from Xen dmesg seems quite suspicious.  Xen dmesg
>> on a fully booted box is ~400 lines on my local box.  That might get
>> quite longer if you have a box with a lot of memory region, or a lot
>> of CPUs.
> 
> That was from 4 boots with differing settings.  Since it was full console
> log, it also had the initial Linux kernel boot messages.  Each log was
> ~1000 lines.
> 
>>> There was enough for alarms to trigger.
>>
>> That's fine, but without pointing out which information is sensitive,
>> it's very unlikely such concerns will ever get fixed.
>>
>> Could you at least go over the log and point out the first instance of
>> such line that you consider contains sensitive information?
> 
> I would have been more comfortable with getting some guidance on which
> portions were desired or which could be left out.  No need for Linux's
> boot messages, what would cut things down by half.  No need for the
> memory map, lots more goes.

I didn't think "xl dmesg" spit out any Dom0 kernel messages by default.

Jan

> It is easier to be comfortable with 40 line sections than 1000 line
> sections.
> 
>
Roger Pau Monné Dec. 4, 2023, 11:07 a.m. UTC | #24
On Sun, Dec 03, 2023 at 05:02:04PM -0800, Elliott Mitchell wrote:
> On Mon, Nov 27, 2023 at 09:27:18AM +0100, Roger Pau Monn
> > On Fri, Nov 24, 2023 at 05:15:34PM -0800, Elliott Mitchell wrote:
> > > On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monn
> > > > On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> > > > > It was insisted that full logs be sent to xen-devel.  Perhaps I am
> > > > > paranoid, but I doubt I would have been successful at scrubbing all
> > > > > hardware serial numbers.  As such, I was willing to post substantial
> > > > > sections, but not everything.
> > > > 
> > > > Can you please point out which hardware serial numbers are you
> > > > referring to, and where are they printed in Xen dmesg?
> > > 
> > > I didn't confirm the presence of hardware serial numbers getting into
> > > Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
> > > With 4000 lines of output, checking for concerning data is troublesome.
> > 
> > 4000 lines of output from Xen dmesg seems quite suspicious.  Xen dmesg
> > on a fully booted box is ~400 lines on my local box.  That might get
> > quite longer if you have a box with a lot of memory region, or a lot
> > of CPUs.
> 
> That was from 4 boots with differing settings.  Since it was full console
> log, it also had the initial Linux kernel boot messages.  Each log was
> ~1000 lines.

I think this is unfair.  People on the list was willing to go over
your 1000 lines of log in order to help solve your issue, and you
couldn't go over them to assert your claims about it containing
sensitive information?

> > > There was enough for alarms to trigger.
> > 
> > That's fine, but without pointing out which information is sensitive,
> > it's very unlikely such concerns will ever get fixed.
> > 
> > Could you at least go over the log and point out the first instance of
> > such line that you consider contains sensitive information?
> 
> I would have been more comfortable with getting some guidance on which
> portions were desired or which could be left out.  No need for Linux's
> boot messages, what would cut things down by half.

IIRC the specific request was for Xen logs, so yes, you could have
left out the Linux side, at least until asked for.

> memory map, lots more goes.
> 
> It is easier to be comfortable with 40 line sections than 1000 line
> sections.

I don't think that's really feasible, it's unclear how much you would
trim, and whether such trimming could remove important information to
help debug the issue.

Again, it would be very helpful for us to get pointed out at what
sensitive information is being printed, so that it can be removed.

Thanks, Roger.
diff mbox series

Patch

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 6b07d0f3a17f..09c6f17b4b02 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2802,6 +2802,14 @@  the watchdog.
 
 Permit use of x2apic setup for SMP environments.
 
+### x2apic-mode (x86)
+> `= physical | cluster | mixed`
+
+> Default: `physical` if **FADT** mandates physical mode, `mixed` otherwise.
+
+In the case that x2apic is in use, this option switches between modes to
+address APICs in the system as interrupt destinations.
+
 ### x2apic_phys (x86)
 > `= <boolean>`
 
@@ -2812,6 +2820,9 @@  In the case that x2apic is in use, this option switches between physical and
 clustered mode.  The default, given no hint from the **FADT**, is cluster
 mode.
 
+**WARNING: `x2apic_phys` is deprecated and superseded by `x2apic-mode`.
+The later takes precedence if both are set.**
+
 ### xenheap_megabytes (arm32)
 > `= <size>`
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index eac77573bd75..a44d63b5c8dc 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -228,11 +228,18 @@  config XEN_ALIGN_2M
 
 endchoice
 
-config X2APIC_PHYSICAL
-	bool "x2APIC Physical Destination mode"
-	help
-	  Use x2APIC Physical Destination mode by default when available.
+choice
+	prompt "x2APIC Destination mode"
+	default X2APIC_MIXED
+	---help---
+	  Select APIC addressing when x2APIC is enabled.
 
+	  The default mode is mixed which should provide the best aspects
+	  of both physical and cluster modes.
+
+config X2APIC_PHYSICAL
+       tristate "Physical Destination mode"
+	---help---
 	  When using this mode APICs are addressed using the Physical
 	  Destination mode, which allows using all dynamic vectors on each
 	  CPU independently.
@@ -242,9 +249,27 @@  config X2APIC_PHYSICAL
 	  destination inter processor interrupts (IPIs) slightly slower than
 	  Logical Destination mode.
 
-	  The mode when this option is not selected is Logical Destination.
+config X2APIC_CLUSTER
+	tristate "Cluster Destination mode"
+	---help---
+	  When using this mode APICs are addressed using the Cluster Logical
+	  Destination mode.
 
-	  If unsure, say N.
+	  Cluster Destination has the benefit of sending IPIs faster since
+	  multiple APICs can be targeted as destinations of a single IPI.
+	  However the vector space is shared between all CPUs on the cluster,
+	  and hence using this mode reduces the number of available vectors
+	  when compared to Physical mode.
+
+config X2APIC_MIXED
+	tristate "Mixed Destination mode"
+	---help---
+	  When using this mode APICs are addressed using the Cluster Logical
+	  Destination mode for IPIs and Physical mode for external interrupts.
+
+	  Should provide the best of both modes.
+
+endchoice
 
 config GUEST
 	bool
diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
index 707deef98c27..15632cc7332e 100644
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -180,6 +180,25 @@  static const struct genapic __initconstrel apic_x2apic_cluster = {
     .send_IPI_self = send_IPI_self_x2apic
 };
 
+/*
+ * Mixed x2APIC mode: use physical for external (device) interrupts, and
+ * cluster for inter processor interrupt.  Such mode has the benefits of not
+ * sharing the vector space with all CPUs on the cluster, while still allows
+ * IPIs to be more efficiently delivered by not having to perform an ICR
+ * write for each target CPU.
+ */
+static const struct genapic __initconstrel apic_x2apic_mixed = {
+    APIC_INIT("x2apic_mixed", NULL),
+    /* NB: int_{delivery,dest}_mode are only used by non-IPI callers. */
+    .int_delivery_mode = dest_Fixed,
+    .int_dest_mode = 0 /* physical delivery */,
+    .init_apic_ldr = init_apic_ldr_x2apic_cluster,
+    .vector_allocation_cpumask = vector_allocation_cpumask_phys,
+    .cpu_mask_to_apicid = cpu_mask_to_apicid_phys,
+    .send_IPI_mask = send_IPI_mask_x2apic_cluster,
+    .send_IPI_self = send_IPI_self_x2apic
+};
+
 static int cf_check update_clusterinfo(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -220,38 +239,56 @@  static struct notifier_block x2apic_cpu_nfb = {
 static int8_t __initdata x2apic_phys = -1;
 boolean_param("x2apic_phys", x2apic_phys);
 
+enum {
+   unset, physical, cluster, mixed
+} static __initdata x2apic_mode = unset;
+
+static int __init parse_x2apic_mode(const char *s)
+{
+    if ( !cmdline_strcmp(s, "physical") )
+        x2apic_mode = physical;
+    else if ( !cmdline_strcmp(s, "cluster") )
+        x2apic_mode = cluster;
+    else if ( !cmdline_strcmp(s, "mixed") )
+        x2apic_mode = mixed;
+    else
+        return EINVAL;
+
+    return 0;
+}
+custom_param("x2apic-mode", parse_x2apic_mode);
+
 const struct genapic *__init apic_x2apic_probe(void)
 {
-    if ( x2apic_phys < 0 )
+    /* x2apic-mode option has preference over x2apic_phys. */
+    if ( x2apic_phys >= 0 && x2apic_mode == unset )
+        x2apic_mode = x2apic_phys ? physical : cluster;
+
+    if ( x2apic_mode == unset )
     {
-        /*
-         * Force physical mode if there's no (full) interrupt remapping support:
-         * The ID in clustered mode requires a 32 bit destination field due to
-         * the usage of the high 16 bits to hold the cluster ID.
-         */
-        x2apic_phys = iommu_intremap != iommu_intremap_full ||
-                      (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
-                      IS_ENABLED(CONFIG_X2APIC_PHYSICAL);
-    }
-    else if ( !x2apic_phys )
-        switch ( iommu_intremap )
+        if ( acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL )
         {
-        case iommu_intremap_off:
-        case iommu_intremap_restricted:
-            printk("WARNING: x2APIC cluster mode is not supported %s interrupt remapping -"
-                   " forcing phys mode\n",
-                   iommu_intremap == iommu_intremap_off ? "without"
-                                                        : "with restricted");
-            x2apic_phys = true;
-            break;
-
-        case iommu_intremap_full:
-            break;
+            printk(XENLOG_INFO "ACPI FADT forcing x2APIC physical mode\n");
+            x2apic_mode = physical;
         }
+        else
+            x2apic_mode = IS_ENABLED(CONFIG_X2APIC_MIXED) ? mixed
+                          : (IS_ENABLED(CONFIG_X2APIC_PHYSICAL) ? physical
+                                                                : cluster);
+    }
 
-    if ( x2apic_phys )
+    if ( x2apic_mode == physical )
         return &apic_x2apic_phys;
 
+    if ( x2apic_mode == cluster && iommu_intremap != iommu_intremap_full )
+    {
+        printk("WARNING: x2APIC cluster mode is not supported %s interrupt remapping -"
+               " forcing mixed mode\n",
+               iommu_intremap == iommu_intremap_off ? "without"
+                                                    : "with restricted");
+        x2apic_mode = mixed;
+    }
+
     if ( !this_cpu(cluster_cpus) )
     {
         update_clusterinfo(NULL, CPU_UP_PREPARE,
@@ -260,7 +297,7 @@  const struct genapic *__init apic_x2apic_probe(void)
         register_cpu_notifier(&x2apic_cpu_nfb);
     }
 
-    return &apic_x2apic_cluster;
+    return x2apic_mode == cluster ? &apic_x2apic_cluster : &apic_x2apic_mixed;
 }
 
 void __init check_x2apic_preenabled(void)