Message ID: 20240403102927.31263-4-Jonathan.Cameron@huawei.com
State: New, archived
Series: acpi: NUMA nodes for CXL HB as GP + complex NUMA test.
Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: > These are very similar to the recently added Generic Initiators > but instead of representing an initiator of memory traffic they > represent an edge point beyond which may lie either targets or > initiators. Here we add these ports such that they may > be targets of hmat_lb records to describe the latency and > bandwidth from host side initiators to the port. A descoverable > mechanism such as UEFI CDAT read from CXL devices and switches > is used to discover the remainder fo the path and the OS can build > up full latency and bandwidth numbers as need for work and data > placement decisions. > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > qapi/qom.json | 18 +++ > include/hw/acpi/acpi_generic_initiator.h | 18 ++- > include/hw/pci/pci_bridge.h | 1 + > hw/acpi/acpi_generic_initiator.c | 141 +++++++++++++++++------ > hw/pci-bridge/pci_expander_bridge.c | 1 - > 5 files changed, 141 insertions(+), 38 deletions(-) > > diff --git a/qapi/qom.json b/qapi/qom.json > index 85e6b4f84a..5480d9ca24 100644 > --- a/qapi/qom.json > +++ b/qapi/qom.json > @@ -826,6 +826,22 @@ > 'data': { 'pci-dev': 'str', > 'node': 'uint32' } } > > + > +## > +# @AcpiGenericPortProperties: > +# > +# Properties for acpi-generic-port objects. > +# > +# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry What's this exactly? A QOM path? A qdev ID? Something else? > +# > +# @node: numa node associated with the PCI device NUMA Is this a NUMA node ID? > +# > +# Since: 9.1 > +## > +{ 'struct': 'AcpiGenericPortProperties', > + 'data': { 'pci-bus': 'str', > + 'node': 'uint32' } } > + > ## > # @RngProperties: > # > @@ -944,6 +960,7 @@ > { 'enum': 'ObjectType', > 'data': [ > 'acpi-generic-initiator', > + 'acpi-generic-port', > 'authz-list', > 'authz-listfile', > 'authz-pam', > @@ -1016,6 +1033,7 @@ > 'discriminator': 'qom-type', > 'data': { > 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties', > + 'acpi-generic-port': 'AcpiGenericPortProperties', > 'authz-list': 'AuthZListProperties', > 'authz-listfile': 'AuthZListFileProperties', > 'authz-pam': 'AuthZPAMProperties', [...]
On Tue, 23 Apr 2024 12:56:21 +0200 Markus Armbruster <armbru@redhat.com> wrote: > Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: > > > These are very similar to the recently added Generic Initiators > > but instead of representing an initiator of memory traffic they > > represent an edge point beyond which may lie either targets or > > initiators. Here we add these ports such that they may > > be targets of hmat_lb records to describe the latency and > > bandwidth from host side initiators to the port. A descoverable > > mechanism such as UEFI CDAT read from CXL devices and switches > > is used to discover the remainder fo the path and the OS can build > > up full latency and bandwidth numbers as need for work and data > > placement decisions. > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Hi Markus, I've again managed a bad job of defining an interface - thanks for your help! > > --- > > qapi/qom.json | 18 +++ > > include/hw/acpi/acpi_generic_initiator.h | 18 ++- > > include/hw/pci/pci_bridge.h | 1 + > > hw/acpi/acpi_generic_initiator.c | 141 +++++++++++++++++------ > > hw/pci-bridge/pci_expander_bridge.c | 1 - > > 5 files changed, 141 insertions(+), 38 deletions(-) > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > index 85e6b4f84a..5480d9ca24 100644 > > --- a/qapi/qom.json > > +++ b/qapi/qom.json > > @@ -826,6 +826,22 @@ > > 'data': { 'pci-dev': 'str', > > 'node': 'uint32' } } > > > > + > > +## > > +# @AcpiGenericPortProperties: > > +# > > +# Properties for acpi-generic-port objects. > > +# > > +# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry > > What's this exactly? A QOM path? A qdev ID? Something else? QOM path I believe as going to call object_resolve_path_type() on it. Oddity is it's defined for the bus, not the host bridge that we care about as the host bridge doesn't have a convenient id to let us identify it. e.g. It is specified via --device pxb-cxl,id=XXXX of TYPE_PXB_CXL_HOST in the command line but ends up on the TYPE_PCI_BUS with parent set to the PXB_CXL_HOST. Normally we just want this bus for hanging root ports of it. I can clarify it's the QOM path but I'm struggling a bit to explain the relationship without resorting to an example. This should also not mention SRAT as at some stage I'd expect DT bindings to provide similar functionality. > > > +# > > +# @node: numa node associated with the PCI device > > NUMA > > Is this a NUMA node ID? Fair question with a non obvious answer. ACPI wise it's a proximity domain. In every other SRAT entry (which define proximity domains) this does map to a NUMA node in an operating system as they contain at least either some form of memory access initiator (CPU, Generic Initiator etc) or a target (memory). A Generic Port is subtly different in that it defines a proximity domain that in of itself is not what we'd think of as a NUMA node but rather an entity that exists to provide the info to the OS to stitch together non discoverable and discoverable buses. So I should have gone with something more specific. Could add this to the parameter docs, or is it too much? @node: Similar to a NUMA node ID, but instead of providing a reference point used for defining NUMA distances and access characteristics to memory or from an initiator (e.g. CPU), this node defines the boundary point between non discoverable system buses which must be discovered from firmware, and a discoverable bus. 
NUMA distances and access characteristics are defined to and from that point, but for system software to establish full initiator to target characteristics this information must be combined with information retrieved from the discoverable part of the path. An example would use CDAT information read from devices and switches in conjunction with link characteristics read from PCIe Configuration space. > > > +# > > > +# Since: 9.1 > > > +## > > > +{ 'struct': 'AcpiGenericPortProperties', > > > + 'data': { 'pci-bus': 'str', > > > + 'node': 'uint32' } } > > > + > > > ## > > > # @RngProperties: > > > # > > > @@ -944,6 +960,7 @@ > > > { 'enum': 'ObjectType', > > > 'data': [ > > > 'acpi-generic-initiator', > > > + 'acpi-generic-port', > > > 'authz-list', > > > 'authz-listfile', > > > 'authz-pam', > > > @@ -1016,6 +1033,7 @@ > > > 'discriminator': 'qom-type', > > > 'data': { > > > 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties', > > > + 'acpi-generic-port': 'AcpiGenericPortProperties', > > > 'authz-list': 'AuthZListProperties', > > > 'authz-listfile': 'AuthZListFileProperties', > > > 'authz-pam': 'AuthZPAMProperties', > > > > [...] > > > >
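(For reference, a minimal sketch of what that resolution looks like, based only on the object_resolve_path_type() call mentioned above; the helper name and error handling are illustrative, not code from the patch.)

    #include "qemu/osdep.h"
    #include "qemu/error-report.h"
    #include "qom/object.h"
    #include "hw/pci/pci_bus.h"

    /*
     * Illustrative only: resolve the user-supplied "pci-bus" string
     * (e.g. "cxl.1") to the root bus object.  object_resolve_path_type()
     * accepts a partial QOM path and restricts matches to the given
     * type, which is what distinguishes the pxb-cxl-bus named "cxl.1"
     * from the pxb-cxl device carrying the same id elsewhere in the
     * composition tree.
     */
    static PCIBus *resolve_generic_port_bus(const char *pci_bus)
    {
        Object *o = object_resolve_path_type(pci_bus, TYPE_PCI_BUS, NULL);

        if (!o) {
            error_report("pci-bus '%s' does not resolve to a PCI bus",
                         pci_bus);
            return NULL;
        }
        return PCI_BUS(o);
    }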
Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes: > On Tue, 23 Apr 2024 12:56:21 +0200 > Markus Armbruster <armbru@redhat.com> wrote: > >> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: >> >> > These are very similar to the recently added Generic Initiators >> > but instead of representing an initiator of memory traffic they >> > represent an edge point beyond which may lie either targets or >> > initiators. Here we add these ports such that they may >> > be targets of hmat_lb records to describe the latency and >> > bandwidth from host side initiators to the port. A descoverable >> > mechanism such as UEFI CDAT read from CXL devices and switches >> > is used to discover the remainder fo the path and the OS can build >> > up full latency and bandwidth numbers as need for work and data >> > placement decisions. >> > >> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > Hi Markus, > > I've again managed a bad job of defining an interface - thanks for > your help! Good interfaces are hard! >> > --- >> > qapi/qom.json | 18 +++ >> > include/hw/acpi/acpi_generic_initiator.h | 18 ++- >> > include/hw/pci/pci_bridge.h | 1 + >> > hw/acpi/acpi_generic_initiator.c | 141 +++++++++++++++++------ >> > hw/pci-bridge/pci_expander_bridge.c | 1 - >> > 5 files changed, 141 insertions(+), 38 deletions(-) >> > >> > diff --git a/qapi/qom.json b/qapi/qom.json >> > index 85e6b4f84a..5480d9ca24 100644 >> > --- a/qapi/qom.json >> > +++ b/qapi/qom.json >> > @@ -826,6 +826,22 @@ >> > 'data': { 'pci-dev': 'str', >> > 'node': 'uint32' } } >> > >> > + >> > +## >> > +# @AcpiGenericPortProperties: >> > +# >> > +# Properties for acpi-generic-port objects. >> > +# >> > +# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry >> >> What's this exactly? A QOM path? A qdev ID? Something else? > > QOM path I believe as going to call object_resolve_path_type() on it. QOM path then. > Oddity is it's defined for the bus, not the host bridge that > we care about as the host bridge doesn't have a convenient id to let > us identify it. > > e.g. It is specified via --device pxb-cxl,id=XXXX > of TYPE_PXB_CXL_HOST in the command line but ends up on the > TYPE_PCI_BUS with parent set to the PXB_CXL_HOST. > Normally we just want this bus for hanging root ports of it. > > I can clarify it's the QOM path but I'm struggling a bit to explain > the relationship without resorting to an example. > This should also not mention SRAT as at some stage I'd expect DT > bindings to provide similar functionality. Let's start with an example. Not to put it into the doc comment, only to help me understand what you need. Hopefully I can then assist with improving the interface and/or its documentation. >> > +# >> > +# @node: numa node associated with the PCI device >> >> NUMA >> >> Is this a NUMA node ID? > > Fair question with a non obvious answer. ACPI wise it's a proximity domain. > In every other SRAT entry (which define proximity domains) this does map > to a NUMA node in an operating system as they contain at least either some > form of memory access initiator (CPU, Generic Initiator etc) or a target (memory). > > A Generic Port is subtly different in that it defines a proximity domain > that in of itself is not what we'd think of as a NUMA node but > rather an entity that exists to provide the info to the OS to stitch > together non discoverable and discoverable buses. > > So I should have gone with something more specific. Could add this to > the parameter docs, or is it too much? 
> > @node: Similar to a NUMA node ID, but instead of providing a reference > point used for defining NUMA distances and access characteristics > to memory or from an initiator (e.g. CPU), this node defines the > boundary point between non discoverable system buses which must be > discovered from firmware, and a discoverable bus. NUMA distances > and access characteristics are defined to and from that point, > but for system software to establish full initiator to target > characteristics this information must be combined with information > retrieved form the discoverable part of the path. An example would > use CDAT information read from devices and switches in conjunction > with link characteristics read from PCIe Configuration space. This is mostly greek to me :) Bit I don't think it's too much. >> > +# >> > +# Since: 9.1 >> > +## >> > +{ 'struct': 'AcpiGenericPortProperties', >> > + 'data': { 'pci-bus': 'str', >> > + 'node': 'uint32' } } >> > + >> > ## >> > # @RngProperties: >> > # >> > @@ -944,6 +960,7 @@ >> > { 'enum': 'ObjectType', >> > 'data': [ >> > 'acpi-generic-initiator', >> > + 'acpi-generic-port', >> > 'authz-list', >> > 'authz-listfile', >> > 'authz-pam', >> > @@ -1016,6 +1033,7 @@ >> > 'discriminator': 'qom-type', >> > 'data': { >> > 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties', >> > + 'acpi-generic-port': 'AcpiGenericPortProperties', >> > 'authz-list': 'AuthZListProperties', >> > 'authz-listfile': 'AuthZListFileProperties', >> > 'authz-pam': 'AuthZPAMProperties', >> >> [...] >> >>
On Tue, 30 Apr 2024 08:55:12 +0200 Markus Armbruster <armbru@redhat.com> wrote: > Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes: > > > On Tue, 23 Apr 2024 12:56:21 +0200 > > Markus Armbruster <armbru@redhat.com> wrote: > > > >> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: > >> > >> > These are very similar to the recently added Generic Initiators > >> > but instead of representing an initiator of memory traffic they > >> > represent an edge point beyond which may lie either targets or > >> > initiators. Here we add these ports such that they may > >> > be targets of hmat_lb records to describe the latency and > >> > bandwidth from host side initiators to the port. A descoverable > >> > mechanism such as UEFI CDAT read from CXL devices and switches > >> > is used to discover the remainder fo the path and the OS can build > >> > up full latency and bandwidth numbers as need for work and data > >> > placement decisions. > >> > > >> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > > Hi Markus, > > > > I've again managed a bad job of defining an interface - thanks for > > your help! > > Good interfaces are hard! > > >> > --- > >> > qapi/qom.json | 18 +++ > >> > include/hw/acpi/acpi_generic_initiator.h | 18 ++- > >> > include/hw/pci/pci_bridge.h | 1 + > >> > hw/acpi/acpi_generic_initiator.c | 141 +++++++++++++++++------ > >> > hw/pci-bridge/pci_expander_bridge.c | 1 - > >> > 5 files changed, 141 insertions(+), 38 deletions(-) > >> > > >> > diff --git a/qapi/qom.json b/qapi/qom.json > >> > index 85e6b4f84a..5480d9ca24 100644 > >> > --- a/qapi/qom.json > >> > +++ b/qapi/qom.json > >> > @@ -826,6 +826,22 @@ > >> > 'data': { 'pci-dev': 'str', > >> > 'node': 'uint32' } } > >> > > >> > + > >> > +## > >> > +# @AcpiGenericPortProperties: > >> > +# > >> > +# Properties for acpi-generic-port objects. > >> > +# > >> > +# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry > >> > >> What's this exactly? A QOM path? A qdev ID? Something else? > > > > QOM path I believe as going to call object_resolve_path_type() on it. > > QOM path then. > > > Oddity is it's defined for the bus, not the host bridge that > > we care about as the host bridge doesn't have a convenient id to let > > us identify it. > > > > e.g. It is specified via --device pxb-cxl,id=XXXX > > of TYPE_PXB_CXL_HOST in the command line but ends up on the > > TYPE_PCI_BUS with parent set to the PXB_CXL_HOST. > > Normally we just want this bus for hanging root ports of it. > > > > I can clarify it's the QOM path but I'm struggling a bit to explain > > the relationship without resorting to an example. > > This should also not mention SRAT as at some stage I'd expect DT > > bindings to provide similar functionality. > > Let's start with an example. Not to put it into the doc comment, only > to help me understand what you need. Hopefully I can then assist with > improving the interface and/or its documentation. Stripping out some relevant bits from a test setup and editing it down - most of this is about creating relevant SLIT/HMAT tables. # First a CXL root bridge, root port and direct attached device plus fixed # memory window. Linux currently builds a NUMA node per fixed memory window # but that's a simplification that may change over time. 
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
-device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
-device cxl-type3,bus=cxl_rp_port0,volatile-memdev=cxl-mem1,persistent-memdev=cxl-mem2,id=cxl-pmem1,lsa=cxl-lsa1,sn=3 \
-machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=1k \

# Next line is the port definition - see that the pci-bus refers to the one in the id parameter for
# the PXB CXL, but the ACPI table that is generated refers to the DSDT entry via an ACPI0016 entry.
# So to get to that we use the PCI bus ID of the root bus that forms part of the root bridge (but
# is a child object in qemu).
-numa node,nodeid=2 \
-object acpi-generic-port,id=bob11,pci-bus=cxl.1,node=2 \

# The rest is the setup for the hmat and slit tables. I hid most of the config, but
# left this here as the key is that we specify values to and from the generic port 'node'
# but it's not really a node as such, but just a point along the path to one.

-numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=21 -numa dist,src=0,dst=2,val=21 -numa dist,src=0,dst=3,val=21 -numa dist,src=0,dst=4,val=21 -numa dist,src=0,dst=5,val=21 \
-numa dist,src=1,dst=0,val=21 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=21 -numa dist,src=1,dst=3,val=21 -numa dist,src=1,dst=4,val=21 -numa dist,src=1,dst=5,val=21 \
-numa dist,src=2,dst=0,val=21 -numa dist,src=2,dst=1,val=21 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=21 -numa dist,src=2,dst=4,val=21 -numa dist,src=2,dst=5,val=21 \
-numa dist,src=3,dst=0,val=21 -numa dist,src=3,dst=1,val=21 -numa dist,src=3,dst=2,val=21 -numa dist,src=3,dst=3,val=10 -numa dist,src=3,dst=4,val=21 -numa dist,src=3,dst=5,val=21 \
-numa dist,src=4,dst=0,val=21 -numa dist,src=4,dst=1,val=21 -numa dist,src=4,dst=2,val=21 -numa dist,src=4,dst=3,val=21 -numa dist,src=4,dst=4,val=10 -numa dist,src=4,dst=5,val=21 \
-numa dist,src=5,dst=0,val=21 -numa dist,src=5,dst=1,val=21 -numa dist,src=5,dst=2,val=21 -numa dist,src=5,dst=3,val=21 -numa dist,src=5,dst=4,val=21 -numa dist,src=5,dst=5,val=10 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M \
-numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=100 \
-numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=0,target=4,hierarchy=memory,data-type=access-latency,latency=100 \
-numa hmat-lb,initiator=0,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=0,target=5,hierarchy=memory,data-type=access-latency,latency=200 \
-numa hmat-lb,initiator=0,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=500 \
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
-numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=50 \
-numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \
-numa hmat-lb,initiator=1,target=4,hierarchy=memory,data-type=access-latency,latency=50 \
-numa hmat-lb,initiator=1,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M \
-numa hmat-lb,initiator=1,target=5,hierarchy=memory,data-type=access-latency,latency=500 \
-numa hmat-lb,initiator=1,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
-numa hmat-lb,initiator=3,target=0,hierarchy=memory,data-type=access-latency,latency=20 \
-numa hmat-lb,initiator=3,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \
-numa hmat-lb,initiator=3,target=2,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=3,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=3,target=4,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=3,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=3,target=5,hierarchy=memory,data-type=access-latency,latency=20 \
-numa hmat-lb,initiator=3,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \
-numa hmat-lb,initiator=5,target=0,hierarchy=memory,data-type=access-latency,latency=20 \
-numa hmat-lb,initiator=5,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \
-numa hmat-lb,initiator=5,target=2,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=5,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=5,target=4,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=5,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=5,target=5,hierarchy=memory,data-type=access-latency,latency=10 \
-numa hmat-lb,initiator=5,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M

> > >> > +# > >> > +# @node: numa node associated with the PCI device > >> >> NUMA >> >> Is this a NUMA node ID? > > Fair question with a non obvious answer. ACPI wise it's a proximity domain. > > In every other SRAT entry (which define proximity domains) this does map > > to a NUMA node in an operating system as they contain at least either some > > form of memory access initiator (CPU, Generic Initiator etc) or a target (memory). > > > > A Generic Port is subtly different in that it defines a proximity domain > > that in of itself is not what we'd think of as a NUMA node but > > rather an entity that exists to provide the info to the OS to stitch > > together non discoverable and discoverable buses. > > > > So I should have gone with something more specific. Could add this to > > the parameter docs, or is it too much? > > > > @node: Similar to a NUMA node ID, but instead of providing a reference > > point used for defining NUMA distances and access characteristics > > to memory or from an initiator (e.g. CPU), this node defines the > > boundary point between non discoverable system buses which must be > > discovered from firmware, and a discoverable bus. NUMA distances > > and access characteristics are defined to and from that point, > > but for system software to establish full initiator to target > > characteristics this information must be combined with information > > retrieved form the discoverable part of the path. An example would > > use CDAT information read from devices and switches in conjunction > > with link characteristics read from PCIe Configuration space. > > This is mostly greek to me :) Bit I don't think it's too much.
> > >> > +# > >> > +# Since: 9.1 > >> > +## > >> > +{ 'struct': 'AcpiGenericPortProperties', > >> > + 'data': { 'pci-bus': 'str', > >> > + 'node': 'uint32' } } > >> > + > >> > ## > >> > # @RngProperties: > >> > # > >> > @@ -944,6 +960,7 @@ > >> > { 'enum': 'ObjectType', > >> > 'data': [ > >> > 'acpi-generic-initiator', > >> > + 'acpi-generic-port', > >> > 'authz-list', > >> > 'authz-listfile', > >> > 'authz-pam', > >> > @@ -1016,6 +1033,7 @@ > >> > 'discriminator': 'qom-type', > >> > 'data': { > >> > 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties', > >> > + 'acpi-generic-port': 'AcpiGenericPortProperties', > >> > 'authz-list': 'AuthZListProperties', > >> > 'authz-listfile': 'AuthZListFileProperties', > >> > 'authz-pam': 'AuthZPAMProperties', > >> > >> [...] > >> > >> >
Sorry for the late reply. I had to put this aside, and I'm coming back to it only now. Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: > On Tue, 30 Apr 2024 08:55:12 +0200 > Markus Armbruster <armbru@redhat.com> wrote: > >> Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes: >> >> > On Tue, 23 Apr 2024 12:56:21 +0200 >> > Markus Armbruster <armbru@redhat.com> wrote: >> > >> >> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: >> >> >> >> > These are very similar to the recently added Generic Initiators >> >> > but instead of representing an initiator of memory traffic they >> >> > represent an edge point beyond which may lie either targets or >> >> > initiators. Here we add these ports such that they may >> >> > be targets of hmat_lb records to describe the latency and >> >> > bandwidth from host side initiators to the port. A descoverable >> >> > mechanism such as UEFI CDAT read from CXL devices and switches >> >> > is used to discover the remainder fo the path and the OS can build >> >> > up full latency and bandwidth numbers as need for work and data >> >> > placement decisions. >> >> > >> >> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> > >> > Hi Markus, >> > >> > I've again managed a bad job of defining an interface - thanks for >> > your help! >> >> Good interfaces are hard! >> >> >> > --- >> >> > qapi/qom.json | 18 +++ >> >> > include/hw/acpi/acpi_generic_initiator.h | 18 ++- >> >> > include/hw/pci/pci_bridge.h | 1 + >> >> > hw/acpi/acpi_generic_initiator.c | 141 +++++++++++++++++------ >> >> > hw/pci-bridge/pci_expander_bridge.c | 1 - >> >> > 5 files changed, 141 insertions(+), 38 deletions(-) >> >> > >> >> > diff --git a/qapi/qom.json b/qapi/qom.json >> >> > index 85e6b4f84a..5480d9ca24 100644 >> >> > --- a/qapi/qom.json >> >> > +++ b/qapi/qom.json >> >> > @@ -826,6 +826,22 @@ >> >> > 'data': { 'pci-dev': 'str', >> >> > 'node': 'uint32' } } >> >> > >> >> > + >> >> > +## >> >> > +# @AcpiGenericPortProperties: >> >> > +# >> >> > +# Properties for acpi-generic-port objects. >> >> > +# >> >> > +# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry >> >> >> >> What's this exactly? A QOM path? A qdev ID? Something else? >> > >> > QOM path I believe as going to call object_resolve_path_type() on it. >> >> QOM path then. >> >> > Oddity is it's defined for the bus, not the host bridge that >> > we care about as the host bridge doesn't have a convenient id to let >> > us identify it. Copied from the example below: -object acpi-generic-port,id=bob11,pci-bus=cxl.1,node=2 where pci-bus=cxl.1 refers to -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true What is "the host bridge that we care about" in that example? Is pci-bus=cxl.1 a surrogate for a reference to "the host bridge that we care about"? I.e. something that enables the C code to find the host bridge? >> > e.g. It is specified via --device pxb-cxl,id=XXXX >> > of TYPE_PXB_CXL_HOST in the command line but ends up on the >> > TYPE_PCI_BUS with parent set to the PXB_CXL_HOST. >> > Normally we just want this bus for hanging root ports of it. >> > >> > I can clarify it's the QOM path but I'm struggling a bit to explain >> > the relationship without resorting to an example. >> > This should also not mention SRAT as at some stage I'd expect DT >> > bindings to provide similar functionality. >> >> Let's start with an example. Not to put it into the doc comment, only >> to help me understand what you need. 
Hopefully I can then assist with >> improving the interface and/or its documentation. > > Stripping out some relevant bits from a test setup and editing it down > - most of this is about creating relevant SLIT/HMAT tables. > > # First a CXL root bridge, root port and direct attached device plus fixed > # memory window. Linux currently builds a NUMA node per fixed memory window > # but that's a simplification that may change over time. > -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \ > -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \ > -device cxl-type3,bus=cxl_rp_port0,volatile-memdev=cxl-mem1,persistent-memdev=cxl-mem2,id=cxl-pmem1,lsa=cxl-lsa1,sn=3 \ > -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=1k \ > > # Next line is the port defintion - see that the pci-bus refers to the one in the id parameter for > # the PXB CXL, but the ACPI table that is generated refers to the DSDT entry via a ACPI0016 entry. > # So to get to that we use the PCI bus ID of the root bus that forms part of the root bridge (but > # is a child object in qemu. > -numa node,nodeid=2 \ > -object acpi-generic-port,id=bob11,pci-bus=cxl.1,node=2 \ > > # The rest is a the setup for the hmat and slit tables. I hid most of the config, but > # left this here as key is that we specify values to and from the generic port 'node' > # but it's not really a node as such, but just a point along the path to one. > > -numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=21 -numa dist,src=0,dst=2,val=21 -numa dist,src=0,dst=3,val=21 -numa dist,src=0,dst=4,val=21 -numa dist,src=0,dst=5,val=21 \ > -numa dist,src=1,dst=0,val=21 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=21 -numa dist,src=1,dst=3,val=21 -numa dist,src=1,dst=4,val=21 -numa dist,src=1,dst=5,val=21 \ > -numa dist,src=2,dst=0,val=21 -numa dist,src=2,dst=1,val=21 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=21 -numa dist,src=2,dst=4,val=21 -numa dist,src=2,dst=5,val=21 \ > -numa dist,src=3,dst=0,val=21 -numa dist,src=3,dst=1,val=21 -numa dist,src=3,dst=2,val=21 -numa dist,src=3,dst=3,val=10 -numa dist,src=3,dst=4,val=21 -numa dist,src=3,dst=5,val=21 \ > -numa dist,src=4,dst=0,val=21 -numa dist,src=4,dst=1,val=21 -numa dist,src=4,dst=2,val=21 -numa dist,src=4,dst=3,val=21 -numa dist,src=4,dst=4,val=10 -numa dist,src=4,dst=5,val=21 \ > -numa dist,src=5,dst=0,val=21 -numa dist,src=5,dst=1,val=21 -numa dist,src=5,dst=2,val=21 -numa dist,src=5,dst=3,val=21 -numa dist,src=5,dst=4,val=21 -numa dist,src=5,dst=5,val=10 \ > -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \ > -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M \ > -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=100 \ > -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > -numa hmat-lb,initiator=0,target=4,hierarchy=memory,data-type=access-latency,latency=100 \ > -numa hmat-lb,initiator=0,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > -numa hmat-lb,initiator=0,target=5,hierarchy=memory,data-type=access-latency,latency=200 \ > -numa hmat-lb,initiator=0,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=500 \ > -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \ > -numa 
hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=50 \ > -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > -numa hmat-lb,initiator=1,target=4,hierarchy=memory,data-type=access-latency,latency=50 \ > -numa hmat-lb,initiator=1,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M \ > -numa hmat-lb,initiator=1,target=5,hierarchy=memory,data-type=access-latency,latency=500 \ > -numa hmat-lb,initiator=1,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \ > -numa hmat-lb,initiator=3,target=0,hierarchy=memory,data-type=access-latency,latency=20 \ > -numa hmat-lb,initiator=3,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > -numa hmat-lb,initiator=3,target=2,hierarchy=memory,data-type=access-latency,latency=80 \ > -numa hmat-lb,initiator=3,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > -numa hmat-lb,initiator=3,target=4,hierarchy=memory,data-type=access-latency,latency=80 \ > -numa hmat-lb,initiator=3,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > -numa hmat-lb,initiator=3,target=5,hierarchy=memory,data-type=access-latency,latency=20 \ > -numa hmat-lb,initiator=3,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > -numa hmat-lb,initiator=5,target=0,hierarchy=memory,data-type=access-latency,latency=20 \ > -numa hmat-lb,initiator=5,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > -numa hmat-lb,initiator=5,target=2,hierarchy=memory,data-type=access-latency,latency=80 \ > -numa hmat-lb,initiator=5,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > -numa hmat-lb,initiator=5,target=4,hierarchy=memory,data-type=access-latency,latency=80 \ > -numa hmat-lb,initiator=5,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > -numa hmat-lb,initiator=5,target=5,hierarchy=memory,data-type=access-latency,latency=10 \ > -numa hmat-lb,initiator=5,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M Uff! Could you throw in output of "info qom-tree /"? [...]
On Tue, 04 Jun 2024 14:06:53 +0200 Markus Armbruster <armbru@redhat.com> wrote: > Sorry for the late reply. I had to put this aside, and I'm coming back > to it only now. No problem. Thanks for looking again! > > Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: > > > On Tue, 30 Apr 2024 08:55:12 +0200 > > Markus Armbruster <armbru@redhat.com> wrote: > > > >> Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes: > >> > >> > On Tue, 23 Apr 2024 12:56:21 +0200 > >> > Markus Armbruster <armbru@redhat.com> wrote: > >> > > >> >> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes: > >> >> > >> >> > These are very similar to the recently added Generic Initiators > >> >> > but instead of representing an initiator of memory traffic they > >> >> > represent an edge point beyond which may lie either targets or > >> >> > initiators. Here we add these ports such that they may > >> >> > be targets of hmat_lb records to describe the latency and > >> >> > bandwidth from host side initiators to the port. A descoverable > >> >> > mechanism such as UEFI CDAT read from CXL devices and switches > >> >> > is used to discover the remainder fo the path and the OS can build > >> >> > up full latency and bandwidth numbers as need for work and data > >> >> > placement decisions. > >> >> > > >> >> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > >> > > >> > Hi Markus, > >> > > >> > I've again managed a bad job of defining an interface - thanks for > >> > your help! > >> > >> Good interfaces are hard! > >> > >> >> > --- > >> >> > qapi/qom.json | 18 +++ > >> >> > include/hw/acpi/acpi_generic_initiator.h | 18 ++- > >> >> > include/hw/pci/pci_bridge.h | 1 + > >> >> > hw/acpi/acpi_generic_initiator.c | 141 +++++++++++++++++------ > >> >> > hw/pci-bridge/pci_expander_bridge.c | 1 - > >> >> > 5 files changed, 141 insertions(+), 38 deletions(-) > >> >> > > >> >> > diff --git a/qapi/qom.json b/qapi/qom.json > >> >> > index 85e6b4f84a..5480d9ca24 100644 > >> >> > --- a/qapi/qom.json > >> >> > +++ b/qapi/qom.json > >> >> > @@ -826,6 +826,22 @@ > >> >> > 'data': { 'pci-dev': 'str', > >> >> > 'node': 'uint32' } } > >> >> > > >> >> > + > >> >> > +## > >> >> > +# @AcpiGenericPortProperties: > >> >> > +# > >> >> > +# Properties for acpi-generic-port objects. > >> >> > +# > >> >> > +# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry > >> >> > >> >> What's this exactly? A QOM path? A qdev ID? Something else? > >> > > >> > QOM path I believe as going to call object_resolve_path_type() on it. > >> > >> QOM path then. > >> > >> > Oddity is it's defined for the bus, not the host bridge that > >> > we care about as the host bridge doesn't have a convenient id to let > >> > us identify it. > > Copied from the example below: > > -object acpi-generic-port,id=bob11,pci-bus=cxl.1,node=2 > > where pci-bus=cxl.1 refers to > > -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true > > What is "the host bridge that we care about" in that example? > > Is pci-bus=cxl.1 a surrogate for a reference to "the host bridge that we > care about"? I.e. something that enables the C code to find the host > bridge? > Yes, That pci-bridge line results in the creation of two objects. A host bridge and a bus below it (to which root ports etc are attached). ID is associated with the bus. As you say we are using it as a surrogate to get to the host bridge as they always have a 1:1 relationship. 
Why the ACPI spec defines the generic port as not refering to a port but to a host bridge is beyond me, but unfortunately it does have an explicit reference to that being the right thing to do in this case. > >> > e.g. It is specified via --device pxb-cxl,id=XXXX > >> > of TYPE_PXB_CXL_HOST in the command line but ends up on the > >> > TYPE_PCI_BUS with parent set to the PXB_CXL_HOST. > >> > Normally we just want this bus for hanging root ports of it. > >> > > >> > I can clarify it's the QOM path but I'm struggling a bit to explain > >> > the relationship without resorting to an example. > >> > This should also not mention SRAT as at some stage I'd expect DT > >> > bindings to provide similar functionality. > >> > >> Let's start with an example. Not to put it into the doc comment, only > >> to help me understand what you need. Hopefully I can then assist with > >> improving the interface and/or its documentation. > > > > Stripping out some relevant bits from a test setup and editing it down > > - most of this is about creating relevant SLIT/HMAT tables. > > > > # First a CXL root bridge, root port and direct attached device plus fixed > > # memory window. Linux currently builds a NUMA node per fixed memory window > > # but that's a simplification that may change over time. > > -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \ > > -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \ > > -device cxl-type3,bus=cxl_rp_port0,volatile-memdev=cxl-mem1,persistent-memdev=cxl-mem2,id=cxl-pmem1,lsa=cxl-lsa1,sn=3 \ > > -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=1k \ > > > > # Next line is the port defintion - see that the pci-bus refers to the one in the id parameter for > > # the PXB CXL, but the ACPI table that is generated refers to the DSDT entry via a ACPI0016 entry. > > # So to get to that we use the PCI bus ID of the root bus that forms part of the root bridge (but > > # is a child object in qemu. > > -numa node,nodeid=2 \ > > -object acpi-generic-port,id=bob11,pci-bus=cxl.1,node=2 \ > > > > # The rest is a the setup for the hmat and slit tables. I hid most of the config, but > > # left this here as key is that we specify values to and from the generic port 'node' > > # but it's not really a node as such, but just a point along the path to one. 
> > > > -numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=21 -numa dist,src=0,dst=2,val=21 -numa dist,src=0,dst=3,val=21 -numa dist,src=0,dst=4,val=21 -numa dist,src=0,dst=5,val=21 \ > > -numa dist,src=1,dst=0,val=21 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=21 -numa dist,src=1,dst=3,val=21 -numa dist,src=1,dst=4,val=21 -numa dist,src=1,dst=5,val=21 \ > > -numa dist,src=2,dst=0,val=21 -numa dist,src=2,dst=1,val=21 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=21 -numa dist,src=2,dst=4,val=21 -numa dist,src=2,dst=5,val=21 \ > > -numa dist,src=3,dst=0,val=21 -numa dist,src=3,dst=1,val=21 -numa dist,src=3,dst=2,val=21 -numa dist,src=3,dst=3,val=10 -numa dist,src=3,dst=4,val=21 -numa dist,src=3,dst=5,val=21 \ > > -numa dist,src=4,dst=0,val=21 -numa dist,src=4,dst=1,val=21 -numa dist,src=4,dst=2,val=21 -numa dist,src=4,dst=3,val=21 -numa dist,src=4,dst=4,val=10 -numa dist,src=4,dst=5,val=21 \ > > -numa dist,src=5,dst=0,val=21 -numa dist,src=5,dst=1,val=21 -numa dist,src=5,dst=2,val=21 -numa dist,src=5,dst=3,val=21 -numa dist,src=5,dst=4,val=21 -numa dist,src=5,dst=5,val=10 \ > > -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \ > > -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M \ > > -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=100 \ > > -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > > -numa hmat-lb,initiator=0,target=4,hierarchy=memory,data-type=access-latency,latency=100 \ > > -numa hmat-lb,initiator=0,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > > -numa hmat-lb,initiator=0,target=5,hierarchy=memory,data-type=access-latency,latency=200 \ > > -numa hmat-lb,initiator=0,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > > -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=500 \ > > -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \ > > -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=50 \ > > -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > > -numa hmat-lb,initiator=1,target=4,hierarchy=memory,data-type=access-latency,latency=50 \ > > -numa hmat-lb,initiator=1,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M \ > > -numa hmat-lb,initiator=1,target=5,hierarchy=memory,data-type=access-latency,latency=500 \ > > -numa hmat-lb,initiator=1,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \ > > -numa hmat-lb,initiator=3,target=0,hierarchy=memory,data-type=access-latency,latency=20 \ > > -numa hmat-lb,initiator=3,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > > -numa hmat-lb,initiator=3,target=2,hierarchy=memory,data-type=access-latency,latency=80 \ > > -numa hmat-lb,initiator=3,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > > -numa hmat-lb,initiator=3,target=4,hierarchy=memory,data-type=access-latency,latency=80 \ > > -numa hmat-lb,initiator=3,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > > -numa hmat-lb,initiator=3,target=5,hierarchy=memory,data-type=access-latency,latency=20 \ > > -numa hmat-lb,initiator=3,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > > -numa hmat-lb,initiator=5,target=0,hierarchy=memory,data-type=access-latency,latency=20 \ > > 
-numa hmat-lb,initiator=5,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=400M \ > > -numa hmat-lb,initiator=5,target=2,hierarchy=memory,data-type=access-latency,latency=80 \ > > -numa hmat-lb,initiator=5,target=4,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > > -numa hmat-lb,initiator=5,target=4,hierarchy=memory,data-type=access-latency,latency=80 \ > > -numa hmat-lb,initiator=5,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ > > -numa hmat-lb,initiator=5,target=5,hierarchy=memory,data-type=access-latency,latency=10 \ > > -numa hmat-lb,initiator=5,target=5,hierarchy=memory,data-type=access-bandwidth,bandwidth=800M > > Uff! > > Could you throw in output of "info qom-tree /"? > This isn't exactly the setup above, but I happen to have it running and it's hopefully close enough. I've tried to crop only the definitely irrelevant parts out. I'm having one of those days where for some reason my terminal is messing up whitespace. Hopefully reconstructed correctly enough. Deep in here you'll see /device[64] (pxb-cxl-host) /.cache_mem[0] (memory-region) /.io[0] (memory-region) /cxl.1 (pxb-cxl-bus) /pxb-cxl-host[0] (memory-region) which is what I want to get and /cxl.1 (pxb-cxl) /bus master container[0] (memory-region) /bus master[0] (memory-region) which is what I can get. Anyhow lots more follows. /(container) ... /machine (virt-9.1-machine) /cxl-fixed-memory-region[0] (memory-region) /cxl-fixed-memory-region[1] (memory-region) /cxl_host_reg[0] (memory-region) ... /peripheral (container) ... /cxl-pmem1 (cxl-type3) /.cache_mem[0] (memory-region) /.io[0] (memory-region) /bus master container[0] (memory-region) /bus master[0] (memory-region) /cap-array[0] (memory-region) /cpmu0-registers[0] (memory-region) /cpmu1-registers[0] (memory-region) /cxl-type3-msix[0] (memory-region) /cxl-type3[0] (memory-region) /device-registers[0] (memory-region) /device-status[0] (memory-region) /mailbox[0] (memory-region) /memory device caps[0] (memory-region) /msix-pba[0] (memory-region) /msix-table[0] (memory-region) /cxl-pmem2 (cxl-type3) /.cache_mem[0] (memory-region) /.io[0] (memory-region) /bus master container[0] (memory-region) /bus master[0] (memory-region) /cap-array[0] (memory-region) /cpmu0-registers[0] (memory-region) /cpmu1-registers[0] (memory-region) /cxl-type3-msix[0] (memory-region) /cxl-type3[0] (memory-region) /device-registers[0] (memory-region) /device-status[0] (memory-region) /mailbox[0] (memory-region) /memory device caps[0] (memory-region) /msix-pba[0] (memory-region) /msix-table[0] (memory-region) /cxl.1 (pxb-cxl) /bus master container[0] (memory-region) /bus master[0] (memory-region) /cxl_rp_port0 (cxl-rp) /.cache_mem[0] (memory-region) /.io[0] (memory-region) /bus master container[0] (memory-region) /bus master[0] (memory-region) /cpmu0-registers[0] (memory-region) /cxl-rp[0] (memory-region) /cxl_rp_port0 (CXL) /pci_bridge_io[0] (memory-region) /pci_bridge_io[1] (memory-region) /pci_bridge_mem[0] (memory-region) /pci_bridge_pci[0] (memory-region) /pci_bridge_pref_mem[0] (memory-region) /pci_bridge_vga_io_hi[0] (memory-region) /pci_bridge_vga_io_lo[0] (memory-region) /pci_bridge_vga_mem[0] (memory-region) /registers[0] (memory-region) /cxl_rp_port1 (cxl-rp) /.cache_mem[0] (memory-region) /.io[0] (memory-region) /bus master container[0] (memory-region) /bus master[0] (memory-region) /cpmu0-registers[0] (memory-region) /cxl-rp[0] (memory-region) /cxl_rp_port1 (CXL) /pci_bridge_io[0] (memory-region) /pci_bridge_io[1] (memory-region) 
/pci_bridge_mem[0] (memory-region) /pci_bridge_pci[0] (memory-region) /pci_bridge_pref_mem[0] (memory-region) /pci_bridge_vga_io_hi[0] (memory-region) /pci_bridge_vga_io_lo[0] (memory-region) /pci_bridge_vga_mem[0] (memory-region) /registers[0] (memory-region) ... /unattached (container) .... /device[64] (pxb-cxl-host) /.cache_mem[0] (memory-region) /.io[0] (memory-region) /cxl.1 (pxb-cxl-bus) /pxb-cxl-host[0] (memory-region) ... /device[8] (gpex-pcihost) /gpex_ioport[0] (memory-region) /gpex_ioport_window[0] (memory-region) /gpex_mmio[0] (memory-region) /gpex_mmio_window[0] (memory-region) /gpex_root (gpex-root) /bus master container[0] (memory-region) /bus master[0] (memory-region) /pcie-ecam[0] (memory-region) /pcie-mmcfg-mmio[0] (memory-region) /pcie-mmio-high[0] (memory-region) /pcie-mmio[0] (memory-region) /pcie.0 (PCIE) ... /objects (container) /bob11 (acpi-generic-port) /bob2 (acpi-generic-initiator /cxl-lsa1 (memory-backend-file) /cxl-lsa1[0] (memory-region) /cxl-lsa2 (memory-backend-file) /cxl-lsa2[0] (memory-region) /cxl-mem1 (memory-backend-file) /cxl-mem1[0] (memory-region) /cxl-mem2 (memory-backend-file) /cxl-mem2[0] (memory-region) /cxl-mem3 (memory-backend-file) /cxl-mem3[0] (memory-region) /cxl-mem4 (memory-backend-file) /cxl-mem4[0] (memory-region)
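(For reference, the "want to get"/"can get" relationship above comes down to one pointer chase; a minimal sketch assuming the BusState layout in hw/qdev-core.h, illustrative and not code from the patch.)

    #include "qemu/osdep.h"
    #include "hw/qdev-core.h"
    #include "hw/pci/pci_bus.h"

    /*
     * Illustrative only: the "pci-bus" property resolves to the
     * pxb-cxl-bus ("what I can get" above); its qdev parent is the
     * pxb-cxl-host device ("what I want to get"), the two being
     * created together with a 1:1 relationship.
     */
    static DeviceState *host_bridge_for_bus(PCIBus *bus)
    {
        return BUS(bus)->parent;
    }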
diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5480d9ca24 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -826,6 +826,22 @@
   'data': { 'pci-dev': 'str',
             'node': 'uint32' } }
 
+
+##
+# @AcpiGenericPortProperties:
+#
+# Properties for acpi-generic-port objects.
+#
+# @pci-bus: PCI bus of the hostbridge associated with this SRAT entry
+#
+# @node: numa node associated with the PCI device
+#
+# Since: 9.1
+##
+{ 'struct': 'AcpiGenericPortProperties',
+  'data': { 'pci-bus': 'str',
+            'node': 'uint32' } }
+
 ##
 # @RngProperties:
 #
@@ -944,6 +960,7 @@
 { 'enum': 'ObjectType',
   'data': [
     'acpi-generic-initiator',
+    'acpi-generic-port',
     'authz-list',
     'authz-listfile',
     'authz-pam',
@@ -1016,6 +1033,7 @@
   'discriminator': 'qom-type',
   'data': {
       'acpi-generic-initiator': 'AcpiGenericInitiatorProperties',
+      'acpi-generic-port': 'AcpiGenericPortProperties',
       'authz-list': 'AuthZListProperties',
       'authz-listfile': 'AuthZListFileProperties',
       'authz-pam': 'AuthZPAMProperties',
diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
index 26e2bd92d4..49ac448034 100644
--- a/include/hw/acpi/acpi_generic_initiator.h
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -30,6 +30,12 @@ typedef struct AcpiGenericInitiator {
     AcpiGenericNode parent;
 } AcpiGenericInitiator;
 
+#define TYPE_ACPI_GENERIC_PORT "acpi-generic-port"
+
+typedef struct AcpiGenericPort {
+    AcpiGenericInitiator parent;
+} AcpiGenericPort;
+
 /*
  * ACPI 6.3:
  * Table 5-81 Flags – Generic Initiator Affinity Structure
@@ -49,8 +55,16 @@ typedef enum {
  * Table 5-80 Device Handle - PCI
  */
 typedef struct PCIDeviceHandle {
-    uint16_t segment;
-    uint16_t bdf;
+    union {
+        struct {
+            uint16_t segment;
+            uint16_t bdf;
+        };
+        struct {
+            uint64_t hid;
+            uint32_t uid;
+        };
+    };
 } PCIDeviceHandle;
 
 void build_srat_generic_pci_initiator(GArray *table_data);
diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index 5cd452115a..5456e24883 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -102,6 +102,7 @@ typedef struct PXBPCIEDev {
     PXBDev parent_obj;
 } PXBPCIEDev;
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
 #define TYPE_PXB_DEV "pxb"
 
 OBJECT_DECLARE_SIMPLE_TYPE(PXBDev, PXB_DEV)
diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index c054e0e27d..85191e90ab 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -7,6 +7,7 @@
 #include "hw/acpi/acpi_generic_initiator.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/boards.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/pci/pci_device.h"
 #include "qemu/error-report.h"
 
@@ -18,6 +19,10 @@ typedef struct AcpiGenericInitiatorClass {
     AcpiGenericNodeClass parent_class;
 } AcpiGenericInitiatorClass;
 
+typedef struct AcpiGenericPortClass {
+    AcpiGenericInitiatorClass parent;
+} AcpiGenericPortClass;
+
 OBJECT_DEFINE_ABSTRACT_TYPE(AcpiGenericNode, acpi_generic_node,
                             ACPI_GENERIC_NODE, OBJECT)
 
@@ -30,6 +35,13 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
 
 OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
 
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericPort, acpi_generic_port,
+                                   ACPI_GENERIC_PORT, ACPI_GENERIC_NODE,
+                                   { TYPE_USER_CREATABLE },
+                                   { NULL })
+
+OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericPort, ACPI_GENERIC_PORT)
+
 static void acpi_generic_node_init(Object *obj)
 {
     AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
@@ -53,6 +65,14 @@ static void acpi_generic_initiator_finalize(Object *obj)
 {
 }
 
+static void acpi_generic_port_init(Object *obj)
+{
+}
+
+static void acpi_generic_port_finalize(Object *obj)
+{
+}
+
 static void acpi_generic_node_set_pci_device(Object *obj, const char *val,
                                              Error **errp)
 {
@@ -79,42 +99,61 @@ static void acpi_generic_node_set_node(Object *obj, Visitor *v,
     }
 
     gn->node = value;
-    ms->numa_state->nodes[gn->node].has_gi = true;
+    if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+        ms->numa_state->nodes[gn->node].has_gi = true;
+    }
 }
 
 static void acpi_generic_node_class_init(ObjectClass *oc, void *data)
 {
-    object_class_property_add_str(oc, "pci-dev", NULL,
-                                  acpi_generic_node_set_pci_device);
     object_class_property_add(oc, "node", "int", NULL,
                               acpi_generic_node_set_node, NULL, NULL);
 }
 
 static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
 {
+    object_class_property_add_str(oc, "pci-dev", NULL,
+                                  acpi_generic_node_set_pci_device);
+}
+
+static void acpi_generic_port_class_init(ObjectClass *oc, void *data)
+{
+    /*
+     * Despite the ID representing a root bridge bus, same storage
+     * can be used.
+     */
+    object_class_property_add_str(oc, "pci-bus", NULL,
+                                  acpi_generic_node_set_pci_device);
 }
 
 /*
  * ACPI 6.3:
  * Table 5-78 Generic Initiator Affinity Structure
+ * ACPI 6.5:
+ * Table 5-67 Generic Port Affinity Structure
  */
 static void
-build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
-                                          PCIDeviceHandle *handle)
+build_srat_generic_node_affinity(GArray *table_data, int node,
+                                 PCIDeviceHandle *handle, bool gp, bool pci)
 {
-    uint8_t index;
-
-    build_append_int_noprefix(table_data, 5, 1); /* Type */
+    build_append_int_noprefix(table_data, gp ? 6 : 5, 1); /* Type */
     build_append_int_noprefix(table_data, 32, 1); /* Length */
     build_append_int_noprefix(table_data, 0, 1); /* Reserved */
-    build_append_int_noprefix(table_data, 1, 1); /* Device Handle Type: PCI */
+    /* Device Handle Type: PCI / ACPI */
+    build_append_int_noprefix(table_data, pci ? 1 : 0, 1);
     build_append_int_noprefix(table_data, node, 4); /* Proximity Domain */
 
     /* Device Handle - PCI */
-    build_append_int_noprefix(table_data, handle->segment, 2);
-    build_append_int_noprefix(table_data, handle->bdf, 2);
-    for (index = 0; index < 12; index++) {
-        build_append_int_noprefix(table_data, 0, 1);
+    if (pci) {
+        /* Device Handle - PCI */
+        build_append_int_noprefix(table_data, handle->segment, 2);
+        build_append_int_noprefix(table_data, handle->bdf, 2);
+        build_append_int_noprefix(table_data, 0, 12);
+    } else {
+        /* Device Handle - ACPI */
+        build_append_int_noprefix(table_data, handle->hid, 8);
+        build_append_int_noprefix(table_data, handle->uid, 4);
+        build_append_int_noprefix(table_data, 0, 4);
     }
 
     build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
@@ -127,37 +166,69 @@ static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
     GArray *table_data = opaque;
     PCIDeviceHandle dev_handle;
     AcpiGenericNode *gn;
-    PCIDevice *pci_dev;
     Object *o;
 
-    if (!object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+    if (!object_dynamic_cast(obj, TYPE_ACPI_GENERIC_NODE)) {
         return 0;
     }
 
     gn = ACPI_GENERIC_NODE(obj);
 
-    if (gn->node >= ms->numa_state->num_nodes) {
-        error_printf("%s: Specified node %d is invalid.\n",
-                     TYPE_ACPI_GENERIC_INITIATOR, gn->node);
-        exit(1);
-    }
-
-    o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_DEVICE, NULL);
-    if (!o) {
-        error_printf("%s: Specified device must be a PCI device.\n",
-                     TYPE_ACPI_GENERIC_INITIATOR);
-        exit(1);
-    }
-
-    pci_dev = PCI_DEVICE(o);
-
-    dev_handle.segment = 0;
-    dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
-                                   pci_dev->devfn);
+    if (object_dynamic_cast(OBJECT(gn), TYPE_ACPI_GENERIC_INITIATOR)) {
+        PCIDevice *pci_dev;
+
+        if (gn->node >= ms->numa_state->num_nodes) {
+            error_printf("%s: Specified node %d is invalid.\n",
+                         TYPE_ACPI_GENERIC_INITIATOR, gn->node);
+            exit(1);
+        }
+
+        o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_DEVICE, NULL);
+        if (!o) {
+            error_printf("%s: Specified device must be a PCI device.\n",
+                         TYPE_ACPI_GENERIC_INITIATOR);
+            exit(1);
+        }
+        pci_dev = PCI_DEVICE(o);
+
+        dev_handle.segment = 0;
+        dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                       pci_dev->devfn);
+        build_srat_generic_node_affinity(table_data,
+                                         gn->node, &dev_handle, false, true);
 
-    build_srat_generic_pci_initiator_affinity(table_data,
-                                              gn->node, &dev_handle);
+        return 0;
+    } else { /* TYPE_ACPI_GENERIC_PORT */
+        PCIBus *bus;
+        const char *hid = "ACPI0016";
+
+        if (gn->node >= ms->numa_state->num_nodes) {
+            error_printf("%s: Specified node %d is invalid.\n",
+                         TYPE_ACPI_GENERIC_PORT, gn->node);
+            exit(1);
+        }
+
+        o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_BUS, NULL);
+        if (!o) {
+            error_printf("%s: Specified device must be a PCI Host Bridge.\n",
+                         TYPE_ACPI_GENERIC_PORT);
+            exit(1);
+        }
+        bus = PCI_BUS(o);
+        /* Need to know if this is a PXB Bus so below an expander bridge */
+        if (!object_dynamic_cast(OBJECT(bus), TYPE_PXB_CXL_BUS)) {
+            error_printf("%s: Specified device is not a bus below a host bridge.\n",
+                         TYPE_ACPI_GENERIC_PORT);
+            exit(1);
+        }
+        /* Copy without trailing NULL */
+        memcpy(&dev_handle.hid, hid, sizeof(dev_handle.hid));
+        dev_handle.uid = pci_bus_num(bus);
+        build_srat_generic_node_affinity(table_data,
+                                         gn->node, &dev_handle, true, false);
 
-    return 0;
+        return 0;
+    }
 }
 
 void build_srat_generic_pci_initiator(GArray *table_data)
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 0411ad31ea..f5431443b9 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -38,7 +38,6 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
                          TYPE_PXB_PCIE_BUS)
 
-#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
                          TYPE_PXB_CXL_BUS)
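(For readers mapping the table-building code above to the spec: the bytes emitted for a generic port correspond to the layout below, ACPI 6.5 Table 5-67, shown as a packed struct purely for illustration; the patch appends each field with build_append_int_noprefix() and defines no such struct.)

    #include <stdint.h>

    /* Illustrative only: SRAT Generic Port Affinity Structure as emitted
     * by build_srat_generic_node_affinity() with gp=true, pci=false. */
    struct srat_generic_port_affinity {
        uint8_t  type;                /* 6; a Generic Initiator uses 5 */
        uint8_t  length;              /* 32 */
        uint8_t  reserved0;
        uint8_t  device_handle_type;  /* 0: ACPI device handle, 1: PCI */
        uint32_t proximity_domain;    /* the object's "node" property */
        struct {
            uint64_t hid;             /* "ACPI0016", no trailing NUL */
            uint32_t uid;             /* root bus number, per this patch */
            uint32_t reserved;
        } device_handle;              /* 16-byte device handle */
        uint32_t flags;               /* GEN_AFFINITY_ENABLED */
        uint32_t reserved1;
    } __attribute__((packed));        /* 32 bytes total, matching Length */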
These are very similar to the recently added Generic Initiators but instead of
representing an initiator of memory traffic they represent an edge point beyond
which may lie either targets or initiators. Here we add these ports such that
they may be targets of hmat_lb records to describe the latency and bandwidth
from host side initiators to the port. A discoverable mechanism such as UEFI
CDAT read from CXL devices and switches is used to discover the remainder of
the path, and the OS can build up full latency and bandwidth numbers as needed
for work and data placement decisions.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 qapi/qom.json                            |  18 +++
 include/hw/acpi/acpi_generic_initiator.h |  18 ++-
 include/hw/pci/pci_bridge.h              |   1 +
 hw/acpi/acpi_generic_initiator.c         | 141 +++++++++++++++++------
 hw/pci-bridge/pci_expander_bridge.c      |   1 -
 5 files changed, 141 insertions(+), 38 deletions(-)