diff mbox series

[XEN] tools: add x2APIC entries in MADT based on APIC ID

Message ID cd1a3ce14790af8c1bb4372ef0be5a6cbbb50b1c.1710338145.git.matthew.barnes@cloud.com (mailing list archive)
State New
Headers show
Series [XEN] tools: add x2APIC entries in MADT based on APIC ID | expand

Commit Message

Matthew Barnes March 13, 2024, 3:35 p.m. UTC
libacpi is a tool that is used by libxl (for PVH guests) and hvmloader
(for HVM guests) to construct ACPI tables for guests.

Currently, libacpi only uses APIC entries to enumerate processors for
guests in the MADT.

The APIC ID field in APIC entries is an octet big, which is fine for
xAPIC IDs, but not so for sufficiently large x2APIC IDs.

This patch scans each APIC ID before constructing the MADT, and uses the
x2APIC entry for each vCPU whose APIC ID exceeds the size limit imposed
by regular APIC entries.

Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
---
 tools/libacpi/acpi2_0.h | 13 +++++++
 tools/libacpi/build.c   | 75 ++++++++++++++++++++++++++++++-----------
 2 files changed, 68 insertions(+), 20 deletions(-)

Comments

Jan Beulich March 14, 2024, 1:50 p.m. UTC | #1
On 13.03.2024 16:35, Matthew Barnes wrote:
> libacpi is a tool that is used by libxl (for PVH guests) and hvmloader
> (for HVM guests) to construct ACPI tables for guests.
> 
> Currently, libacpi only uses APIC entries to enumerate processors for
> guests in the MADT.
> 
> The APIC ID field in APIC entries is an octet big, which is fine for
> xAPIC IDs, but not so for sufficiently large x2APIC IDs.

Yet where would those come from? I can see that down the road we will
have such, but right now I don't think we do. Without saying so, this
change could be mistaken for a fix of an active bug.

> This patch scans each APIC ID before constructing the MADT, and uses the
> x2APIC entry for each vCPU whose APIC ID exceeds the size limit imposed
> by regular APIC entries.

It is my understanding that if you use any x2APIC entry, every CPU needs
to have one.

> @@ -134,27 +151,45 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
>          io_apic->ioapic_id   = config->ioapic_id;
>          io_apic->ioapic_addr = config->ioapic_base_address;
>  
> -        lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
> +        apicid_entry = io_apic + 1;
>      }
>      else
> -        lapic = (struct acpi_20_madt_lapic *)(madt + 1);
> +        apicid_entry = madt + 1;
>  
>      info->nr_cpus = hvminfo->nr_vcpus;
> -    info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
> +    info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, apicid_entry);
>      for ( i = 0; i < hvminfo->nr_vcpus; i++ )
>      {
> -        memset(lapic, 0, sizeof(*lapic));
> -        lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
> -        lapic->length  = sizeof(*lapic);
> -        /* Processor ID must match processor-object IDs in the DSDT. */
> -        lapic->acpi_processor_id = i;
> -        lapic->apic_id = config->lapic_id(i);
> -        lapic->flags = (test_bit(i, hvminfo->vcpu_online)
> -                        ? ACPI_LOCAL_APIC_ENABLED : 0);
> -        lapic++;
> +        uint32_t apic_id = config->lapic_id(i);
> +        if ( apic_id < 255 )

Nit (here and below): This file uses hypervisor coding style. and hence a
blank line is wanted between declaration(s) and statement(s).

> +        {
> +            struct acpi_20_madt_lapic *lapic = apicid_entry;
> +            memset(lapic, 0, sizeof(*lapic));
> +            lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
> +            lapic->length  = sizeof(*lapic);
> +            /* Processor ID must match processor-object IDs in the DSDT. */
> +            lapic->acpi_processor_id = i;
> +            lapic->apic_id = apic_id;
> +            lapic->flags = (test_bit(i, hvminfo->vcpu_online)
> +                            ? ACPI_LOCAL_APIC_ENABLED : 0);
> +            apicid_entry = lapic + 1;
> +        }
> +        else
> +        {
> +            struct acpi_20_madt_x2apic *x2apic = apicid_entry;
> +            memset(x2apic, 0, sizeof(*x2apic));
> +            x2apic->type    = ACPI_PROCESSOR_LOCAL_X2APIC;
> +            x2apic->length  = sizeof(*x2apic);
> +            x2apic->apic_id = apic_id;
> +            x2apic->flags   = (test_bit(i, hvminfo->vcpu_online)
> +                                ? ACPI_LOCAL_APIC_ENABLED : 0);

Nit: Indentation off by 1.

Jan
Alejandro Vallejo March 25, 2024, 2:30 p.m. UTC | #2
Hi,

On 14/03/2024 13:50, Jan Beulich wrote:
> On 13.03.2024 16:35, Matthew Barnes wrote:
>> libacpi is a tool that is used by libxl (for PVH guests) and hvmloader
>> (for HVM guests) to construct ACPI tables for guests.
>>
>> Currently, libacpi only uses APIC entries to enumerate processors for
>> guests in the MADT.
>>
>> The APIC ID field in APIC entries is an octet big, which is fine for
>> xAPIC IDs, but not so for sufficiently large x2APIC IDs.
> 
> Yet where would those come from? I can see that down the road we will
> have such, but right now I don't think we do. Without saying so, this
> change could be mistaken for a fix of an active bug.

It's worth adding some context here.

You're quite right in that it's not immediately needed now, but with the
work done on improving the state of CPU topologies exposed to guests[1]
the strict binding between APIC ID and vCPU ID breaks. It's not hard to
imagine sparsity in the APIC ID space forcing the maximum one to peak
beyond 254. The generator present in that series tries to be
conservative and avoid it, but general topologies can theoretically
waste copious amounts of APIC ID space (i.e: by reserving more width
than strictly required to represent IDs of a certain level), and
exposing the host topology sensibly becomes difficult if we're subject
to limitations the host does not have.

[1]
https://lore.kernel.org/xen-devel/20240109153834.4192-1-alejandro.vallejo@cloud.com/

Cheers,
Alejandro
Matthew Barnes March 26, 2024, 3:57 p.m. UTC | #3
> > This patch scans each APIC ID before constructing the MADT, and uses the
> > x2APIC entry for each vCPU whose APIC ID exceeds the size limit imposed
> > by regular APIC entries.
> 
> It is my understanding that if you use any x2APIC entry, every CPU needs
> to have one.

In the ACPI 6.4 specification, section 5.2.12.12, the note says the following:

[Compatibility note] On some legacy OSes, Logical processors with APIC ID
values less than 255 (whether in XAPIC or X2APIC mode) must use the Processor
Local APIC structure to convey their APIC information to OSPM, and those
processors must be declared in the DSDT using the Processor() keyword.

Therefore, even in X2APIC mode, it's better to represent processors with APIC
ID values less than 255 with APIC entries for legacy reasons.

> > @@ -134,27 +151,45 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
> >          io_apic->ioapic_id   = config->ioapic_id;
> >          io_apic->ioapic_addr = config->ioapic_base_address;
> >  
> > -        lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
> > +        apicid_entry = io_apic + 1;
> >      }
> >      else
> > -        lapic = (struct acpi_20_madt_lapic *)(madt + 1);
> > +        apicid_entry = madt + 1;
> >  
> >      info->nr_cpus = hvminfo->nr_vcpus;
> > -    info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
> > +    info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, apicid_entry);
> >      for ( i = 0; i < hvminfo->nr_vcpus; i++ )
> >      {
> > -        memset(lapic, 0, sizeof(*lapic));
> > -        lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
> > -        lapic->length  = sizeof(*lapic);
> > -        /* Processor ID must match processor-object IDs in the DSDT. */
> > -        lapic->acpi_processor_id = i;
> > -        lapic->apic_id = config->lapic_id(i);
> > -        lapic->flags = (test_bit(i, hvminfo->vcpu_online)
> > -                        ? ACPI_LOCAL_APIC_ENABLED : 0);
> > -        lapic++;
> > +        uint32_t apic_id = config->lapic_id(i);
> > +        if ( apic_id < 255 )
> 
> Nit (here and below): This file uses hypervisor coding style. and hence a
> blank line is wanted between declaration(s) and statement(s).

I will add this in patch V2

> > +        {
> > +            struct acpi_20_madt_lapic *lapic = apicid_entry;
> > +            memset(lapic, 0, sizeof(*lapic));
> > +            lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
> > +            lapic->length  = sizeof(*lapic);
> > +            /* Processor ID must match processor-object IDs in the DSDT. */
> > +            lapic->acpi_processor_id = i;
> > +            lapic->apic_id = apic_id;
> > +            lapic->flags = (test_bit(i, hvminfo->vcpu_online)
> > +                            ? ACPI_LOCAL_APIC_ENABLED : 0);
> > +            apicid_entry = lapic + 1;
> > +        }
> > +        else
> > +        {
> > +            struct acpi_20_madt_x2apic *x2apic = apicid_entry;
> > +            memset(x2apic, 0, sizeof(*x2apic));
> > +            x2apic->type    = ACPI_PROCESSOR_LOCAL_X2APIC;
> > +            x2apic->length  = sizeof(*x2apic);
> > +            x2apic->apic_id = apic_id;
> > +            x2apic->flags   = (test_bit(i, hvminfo->vcpu_online)
> > +                                ? ACPI_LOCAL_APIC_ENABLED : 0);
> 
> Nit: Indentation off by 1.

I will add this in patch V2

Matt
Jan Beulich March 26, 2024, 4:15 p.m. UTC | #4
On 26.03.2024 16:57, Matthew Barnes wrote:
>>> This patch scans each APIC ID before constructing the MADT, and uses the
>>> x2APIC entry for each vCPU whose APIC ID exceeds the size limit imposed
>>> by regular APIC entries.
>>
>> It is my understanding that if you use any x2APIC entry, every CPU needs
>> to have one.
> 
> In the ACPI 6.4 specification, section 5.2.12.12, the note says the following:
> 
> [Compatibility note] On some legacy OSes, Logical processors with APIC ID
> values less than 255 (whether in XAPIC or X2APIC mode) must use the Processor
> Local APIC structure to convey their APIC information to OSPM, and those
> processors must be declared in the DSDT using the Processor() keyword.
> 
> Therefore, even in X2APIC mode, it's better to represent processors with APIC
> ID values less than 255 with APIC entries for legacy reasons.

Well, my reading of that is different: All CPUs need to have x2APIC entries
if one has, and CPUs with small enough APIC IDs _additionally_ need xAPIC
entries. That's what I've been observing on real hardware, too.

Jan
Matthew Barnes March 26, 2024, 4:38 p.m. UTC | #5
On Tue, Mar 26, 2024 at 05:15:46PM +0100, Jan Beulich wrote:
> On 26.03.2024 16:57, Matthew Barnes wrote:
> >>> This patch scans each APIC ID before constructing the MADT, and uses the
> >>> x2APIC entry for each vCPU whose APIC ID exceeds the size limit imposed
> >>> by regular APIC entries.
> >>
> >> It is my understanding that if you use any x2APIC entry, every CPU needs
> >> to have one.
> > 
> > In the ACPI 6.4 specification, section 5.2.12.12, the note says the following:
> > 
> > [Compatibility note] On some legacy OSes, Logical processors with APIC ID
> > values less than 255 (whether in XAPIC or X2APIC mode) must use the Processor
> > Local APIC structure to convey their APIC information to OSPM, and those
> > processors must be declared in the DSDT using the Processor() keyword.
> > 
> > Therefore, even in X2APIC mode, it's better to represent processors with APIC
> > ID values less than 255 with APIC entries for legacy reasons.
> 
> Well, my reading of that is different: All CPUs need to have x2APIC entries
> if one has, and CPUs with small enough APIC IDs _additionally_ need xAPIC
> entries. That's what I've been observing on real hardware, too.

Understood. Patch v2 will reflect this

Matt
Roger Pau Monne March 26, 2024, 4:51 p.m. UTC | #6
On Mon, Mar 25, 2024 at 02:30:35PM +0000, Alejandro Vallejo wrote:
> Hi,
> 
> On 14/03/2024 13:50, Jan Beulich wrote:
> > On 13.03.2024 16:35, Matthew Barnes wrote:
> >> libacpi is a tool that is used by libxl (for PVH guests) and hvmloader
> >> (for HVM guests) to construct ACPI tables for guests.
> >>
> >> Currently, libacpi only uses APIC entries to enumerate processors for
> >> guests in the MADT.
> >>
> >> The APIC ID field in APIC entries is an octet big, which is fine for
> >> xAPIC IDs, but not so for sufficiently large x2APIC IDs.
> > 
> > Yet where would those come from? I can see that down the road we will
> > have such, but right now I don't think we do. Without saying so, this
> > change could be mistaken for a fix of an active bug.
> 
> It's worth adding some context here.
> 
> You're quite right in that it's not immediately needed now, but with the
> work done on improving the state of CPU topologies exposed to guests[1]
> the strict binding between APIC ID and vCPU ID breaks. It's not hard to
> imagine sparsity in the APIC ID space forcing the maximum one to peak
> beyond 254. The generator present in that series tries to be
> conservative and avoid it, but general topologies can theoretically
> waste copious amounts of APIC ID space (i.e: by reserving more width
> than strictly required to represent IDs of a certain level), and
> exposing the host topology sensibly becomes difficult if we're subject
> to limitations the host does not have.

Most guest OSes will refuse to bring up APs with APIC ID > 254, unless
there's support for interrupt remapping or Extended Destination ID
[0].  So while adding the entries in the MADT is fine, I wonder how we
can sensibly test this.

IMO, we should first add support for Extended Destination ID, and then
expose APs with APIC ID > 254.

Have you been able to test this with APIC IDs > 254, and can confirm
that an OS is capable of bringing those APs up with the ID in the
x2APIC MADT entry?

I think at a minimum you need some changes to the vlapic code in Xen,
so that when a vLAPIC gets assigned an ID > 254 it automatically
switches to x2APIC mode, as vlapic_init() unconditionally inits the
lapic in xAPIC mode.  Otherwise the INTI-SIPI-SIPI sequence won't find
the intended destination.

Thanks, Roger.

[0] https://gitlab.com/xen-project/xen/-/issues/173
diff mbox series

Patch

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 6dfa939a8c0c..10e567686fe6 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -344,6 +344,7 @@  struct acpi_20_waet {
 #define ACPI_IO_SAPIC                       0x06
 #define ACPI_PROCESSOR_LOCAL_SAPIC          0x07
 #define ACPI_PLATFORM_INTERRUPT_SOURCES     0x08
+#define ACPI_PROCESSOR_LOCAL_X2APIC         0x09
 
 /*
  * APIC Structure Definitions.
@@ -360,6 +361,18 @@  struct acpi_20_madt_lapic {
     uint32_t flags;
 };
 
+/*
+ * Processor Local x2APIC Structure Definition.
+ */
+struct acpi_20_madt_x2apic {
+    uint8_t  type;              /* Must refer to x2APIC type (0x09) */
+    uint8_t  length;            /* Must be length of x2APIC struct in bytes (0x10) */
+    uint16_t reserved;          /* Must be zero */
+    uint32_t apic_id;           /* Processor's local x2APIC ID */
+    uint32_t flags;             /* Same as Local APIC flags */
+    uint32_t acpi_processor_id; /* Refers to a processor device used to associate the X2APIC structure with */
+};
+
 /*
  * Local APIC Flags.  All other bits are reserved and must be 0.
  */
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 2f29863db154..5b0fd6584b30 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -63,6 +63,27 @@  static void set_checksum(
     p[checksum_offset] = -sum;
 }
 
+static unsigned calculate_madt_size(const struct acpi_config *config)
+{
+    uint32_t apic_id;
+    unsigned i, size;
+
+    size  = sizeof(struct acpi_20_madt);
+    size += sizeof(struct acpi_20_madt_intsrcovr) * 16;
+    size += sizeof(struct acpi_20_madt_ioapic);
+
+    for ( i = 0; i < config->hvminfo->nr_vcpus; i++ )
+    {
+        apic_id = config->lapic_id(i);
+        if ( apic_id < 255 )
+            size += sizeof(struct acpi_20_madt_lapic);
+        else
+            size += sizeof(struct acpi_20_madt_x2apic);
+    }
+
+    return size;
+}
+
 static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
                                            const struct acpi_config *config,
                                            struct acpi_info *info)
@@ -70,18 +91,14 @@  static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
     struct acpi_20_madt           *madt;
     struct acpi_20_madt_intsrcovr *intsrcovr;
     struct acpi_20_madt_ioapic    *io_apic;
-    struct acpi_20_madt_lapic     *lapic;
+    void                          *apicid_entry;
     const struct hvm_info_table   *hvminfo = config->hvminfo;
-    int i, sz;
+    unsigned i, sz;
 
     if ( config->lapic_id == NULL )
         return NULL;
 
-    sz  = sizeof(struct acpi_20_madt);
-    sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
-    sz += sizeof(struct acpi_20_madt_ioapic);
-    sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
-
+    sz = calculate_madt_size(config);
     madt = ctxt->mem_ops.alloc(ctxt, sz, 16);
     if (!madt) return NULL;
 
@@ -134,27 +151,45 @@  static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
         io_apic->ioapic_id   = config->ioapic_id;
         io_apic->ioapic_addr = config->ioapic_base_address;
 
-        lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
+        apicid_entry = io_apic + 1;
     }
     else
-        lapic = (struct acpi_20_madt_lapic *)(madt + 1);
+        apicid_entry = madt + 1;
 
     info->nr_cpus = hvminfo->nr_vcpus;
-    info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
+    info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, apicid_entry);
     for ( i = 0; i < hvminfo->nr_vcpus; i++ )
     {
-        memset(lapic, 0, sizeof(*lapic));
-        lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
-        lapic->length  = sizeof(*lapic);
-        /* Processor ID must match processor-object IDs in the DSDT. */
-        lapic->acpi_processor_id = i;
-        lapic->apic_id = config->lapic_id(i);
-        lapic->flags = (test_bit(i, hvminfo->vcpu_online)
-                        ? ACPI_LOCAL_APIC_ENABLED : 0);
-        lapic++;
+        uint32_t apic_id = config->lapic_id(i);
+        if ( apic_id < 255 )
+        {
+            struct acpi_20_madt_lapic *lapic = apicid_entry;
+            memset(lapic, 0, sizeof(*lapic));
+            lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
+            lapic->length  = sizeof(*lapic);
+            /* Processor ID must match processor-object IDs in the DSDT. */
+            lapic->acpi_processor_id = i;
+            lapic->apic_id = apic_id;
+            lapic->flags = (test_bit(i, hvminfo->vcpu_online)
+                            ? ACPI_LOCAL_APIC_ENABLED : 0);
+            apicid_entry = lapic + 1;
+        }
+        else
+        {
+            struct acpi_20_madt_x2apic *x2apic = apicid_entry;
+            memset(x2apic, 0, sizeof(*x2apic));
+            x2apic->type    = ACPI_PROCESSOR_LOCAL_X2APIC;
+            x2apic->length  = sizeof(*x2apic);
+            x2apic->apic_id = apic_id;
+            x2apic->flags   = (test_bit(i, hvminfo->vcpu_online)
+                                ? ACPI_LOCAL_APIC_ENABLED : 0);
+            /* Processor ID must match processor-object IDs in the DSDT. */
+            x2apic->acpi_processor_id = i;
+            apicid_entry = x2apic + 1;
+        }
     }
 
-    madt->header.length = (unsigned char *)lapic - (unsigned char *)madt;
+    madt->header.length = (unsigned char *)apicid_entry - (unsigned char *)madt;
     set_checksum(madt, offsetof(struct acpi_header, checksum),
                  madt->header.length);
     info->madt_csum_addr =