diff mbox series

[v8,2/2] hw/acpi: Implement the SRAT GI affinity structure

Message ID 20240306123317.4691-3-ankita@nvidia.com (mailing list archive)
State New, archived
Headers show
Series acpi: report NUMA nodes for device memory using GI | expand

Commit Message

Ankit Agrawal March 6, 2024, 12:33 p.m. UTC
From: Ankit Agrawal <ankita@nvidia.com>

ACPI spec provides a scheme to associate "Generic Initiators" [1]
(e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines GPUs) with Proximity Domains. This is
achieved using Generic Initiator Affinity Structure in SRAT. During bootup,
Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA
node for each unique PXM ID encountered. Qemu currently do not implement
these structures while building SRAT.

Add GI structures while building VM ACPI SRAT. The association between
device and node are stored using acpi-generic-initiator object. Lookup
presence of all such objects and use them to build these structures.

The structure needs a PCI device handle [2] that consists of the device BDF.
The vfio-pci device corresponding to the acpi-generic-initiator object is
located to determine the BDF.

[1] ACPI Spec 6.3, Section 5.2.16.6
[2] ACPI Spec 6.3, Table 5.80

Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cedric Le Goater <clg@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 hw/acpi/acpi_generic_initiator.c         | 73 ++++++++++++++++++++++++
 hw/arm/virt-acpi-build.c                 |  3 +
 include/hw/acpi/acpi_generic_initiator.h | 25 ++++++++
 3 files changed, 101 insertions(+)

Comments

Jonathan Cameron March 6, 2024, 1:58 p.m. UTC | #1
On Wed, 6 Mar 2024 12:33:17 +0000
<ankita@nvidia.com> wrote:

> From: Ankit Agrawal <ankita@nvidia.com>
> 
> ACPI spec provides a scheme to associate "Generic Initiators" [1]
> (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines GPUs) with Proximity Domains. This is
> achieved using Generic Initiator Affinity Structure in SRAT. During bootup,
> Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA
> node for each unique PXM ID encountered. Qemu currently do not implement
> these structures while building SRAT.
> 
> Add GI structures while building VM ACPI SRAT. The association between
> device and node are stored using acpi-generic-initiator object. Lookup
> presence of all such objects and use them to build these structures.
> 
> The structure needs a PCI device handle [2] that consists of the device BDF.
> The vfio-pci device corresponding to the acpi-generic-initiator object is
> located to determine the BDF.
> 
> [1] ACPI Spec 6.3, Section 5.2.16.6
> [2] ACPI Spec 6.3, Table 5.80
> 
> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Cedric Le Goater <clg@redhat.com>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>

I guess we gloss over the bisection breakage due to being able to add
these nodes and have them used in HMAT as initiators before we have
added SRAT support.  Linux will moan about it and not use such an HMAT
but meh, it will boot.

You could drag the HMAT change after this but perhaps it's not worth bothering.

Otherwise LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Could add x86 support (posted in reply to v7 this morning)
and sounds like you have the test nearly ready which is great.

Jonathan
Ankit Agrawal March 7, 2024, 3:03 a.m. UTC | #2
>>
>> [1] ACPI Spec 6.3, Section 5.2.16.6
>> [2] ACPI Spec 6.3, Table 5.80
>>
>> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Cedric Le Goater <clg@redhat.com>
>> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
>
> I guess we gloss over the bisection breakage due to being able to add
> these nodes and have them used in HMAT as initiators before we have
> added SRAT support.  Linux will moan about it and not use such an HMAT
> but meh, it will boot.
>
> You could drag the HMAT change after this but perhaps it's not worth bothering.

Sorry this part isn't clear to me. Are you suggesting we keep the HMAT
changes out from this patch?

> Otherwise LGTM
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks!

> Could add x86 support (posted in reply to v7 this morning)
> and sounds like you have the test nearly ready which is great.

Ok, will add the x86 part as well. I could reuse what you shared
earlier.

https://gitlab.com/jic23/qemu/-/commit/ccfb4fe22167e035173390cf147d9c226951b9b6
Jonathan Cameron March 7, 2024, 9:35 a.m. UTC | #3
On Thu, 7 Mar 2024 03:03:02 +0000
Ankit Agrawal <ankita@nvidia.com> wrote:

> >>
> >> [1] ACPI Spec 6.3, Section 5.2.16.6
> >> [2] ACPI Spec 6.3, Table 5.80
> >>
> >> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Cc: Cedric Le Goater <clg@redhat.com>
> >> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>  
> >
> > I guess we gloss over the bisection breakage due to being able to add
> > these nodes and have them used in HMAT as initiators before we have
> > added SRAT support.  Linux will moan about it and not use such an HMAT
> > but meh, it will boot.
> >
> > You could drag the HMAT change after this but perhaps it's not worth bothering.  
> 
> Sorry this part isn't clear to me. Are you suggesting we keep the HMAT
> changes out from this patch?

No - don't drop them. Move them from patch 1 to either patch 2, or to a
patch 3 if that ends up looking clearer.  I think patch 2 is the
right choice though as that enables everything at once.

It's valid to have SRAT containing GI entries without the same in HMAT
(as HMAT doesn't have to be complete), it's not valid to have HMAT refer
to entries that aren't in SRAT.

Another thing we may need to do add in the long run is the _OSC support.
That's needed for DSDT entries with _PXM associated with a GI only node
so that we can make them move node depending on whether or not the
Guest OS supports GIs and so will create the nodes.  Requires a bit of
magic AML to make that work.

It used to crash linux if you didn't do that, but that's been fixed
for a while I believe.

For now we aren't adding any such _PXM entries though so this is just
one for the TODO list :)


> 
> > Otherwise LGTM
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>  
> 
> Thanks!
> 
> > Could add x86 support (posted in reply to v7 this morning)
> > and sounds like you have the test nearly ready which is great.  
> 
> Ok, will add the x86 part as well. I could reuse what you shared
> earlier.
> 
> https://gitlab.com/jic23/qemu/-/commit/ccfb4fe22167e035173390cf147d9c226951b9b6
Excellent - thanks!

Jonathan

> 
> 
>
diff mbox series

Patch

diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index f57b3c8984..5a7d15363e 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -71,3 +71,76 @@  static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
     object_class_property_add(oc, "node", "int", NULL,
         acpi_generic_initiator_set_node, NULL, NULL);
 }
+
+/*
+ * ACPI 6.3:
+ * Table 5-78 Generic Initiator Affinity Structure
+ */
+static void
+build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
+                                          PCIDeviceHandle *handle)
+{
+    uint8_t index;
+
+    build_append_int_noprefix(table_data, 5, 1);  /* Type */
+    build_append_int_noprefix(table_data, 32, 1); /* Length */
+    build_append_int_noprefix(table_data, 0, 1);  /* Reserved */
+    build_append_int_noprefix(table_data, 1, 1);  /* Device Handle Type: PCI */
+    build_append_int_noprefix(table_data, node, 4);  /* Proximity Domain */
+
+    /* Device Handle - PCI */
+    build_append_int_noprefix(table_data, handle->segment, 2);
+    build_append_int_noprefix(table_data, handle->bdf, 2);
+    for (index = 0; index < 12; index++) {
+        build_append_int_noprefix(table_data, 0, 1);
+    }
+
+    build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
+    build_append_int_noprefix(table_data, 0, 4);     /* Reserved */
+}
+
+static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    AcpiGenericInitiator *gi;
+    GArray *table_data = opaque;
+    PCIDeviceHandle dev_handle;
+    PCIDevice *pci_dev;
+    Object *o;
+
+    if (!object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+        return 0;
+    }
+
+    gi = ACPI_GENERIC_INITIATOR(obj);
+    if (gi->node >= ms->numa_state->num_nodes) {
+        error_printf("%s: Specified node %d is invalid.\n",
+                     TYPE_ACPI_GENERIC_INITIATOR, gi->node);
+        exit(1);
+    }
+
+    o = object_resolve_path_type(gi->pci_dev, TYPE_PCI_DEVICE, NULL);
+    if (!o) {
+        error_printf("%s: Specified device must be a PCI device.\n",
+                     TYPE_ACPI_GENERIC_INITIATOR);
+        exit(1);
+    }
+
+    pci_dev = PCI_DEVICE(o);
+
+    dev_handle.segment = 0;
+    dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                               pci_dev->devfn);
+
+    build_srat_generic_pci_initiator_affinity(table_data,
+                                              gi->node, &dev_handle);
+
+    return 0;
+}
+
+void build_srat_generic_pci_initiator(GArray *table_data)
+{
+    object_child_foreach_recursive(object_get_root(),
+                                   build_all_acpi_generic_initiators,
+                                   table_data);
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 8bc35a483c..a2b3a2eb25 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -58,6 +58,7 @@ 
 #include "migration/vmstate.h"
 #include "hw/acpi/ghes.h"
 #include "hw/acpi/viot.h"
+#include "hw/acpi/acpi_generic_initiator.h"
 
 #define ARM_SPI_BASE 32
 
@@ -558,6 +559,8 @@  build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         }
     }
 
+    build_srat_generic_pci_initiator(table_data);
+
     if (ms->nvdimms_state->is_enabled) {
         nvdimm_build_srat(table_data);
     }
diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
index 23d0b591c6..4bebe119ee 100644
--- a/include/hw/acpi/acpi_generic_initiator.h
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -29,4 +29,29 @@  typedef struct AcpiGenericInitiatorClass {
     ObjectClass parent_class;
 } AcpiGenericInitiatorClass;
 
+/*
+ * ACPI 6.3:
+ * Table 5-81 Flags – Generic Initiator Affinity Structure
+ */
+typedef enum {
+    /*
+     * If clear, the OSPM ignores the contents of the Generic
+     * Initiator/Port Affinity Structure. This allows system firmware
+     * to populate the SRAT with a static number of structures, but only
+     * enable them as necessary.
+     */
+    GEN_AFFINITY_ENABLED = (1 << 0),
+} GenericAffinityFlags;
+
+/*
+ * ACPI 6.3:
+ * Table 5-80 Device Handle - PCI
+ */
+typedef struct PCIDeviceHandle {
+    uint16_t segment;
+    uint16_t bdf;
+} PCIDeviceHandle;
+
+void build_srat_generic_pci_initiator(GArray *table_data);
+
 #endif