[RFC,16/25] hw/pxb/cxl: Add "windows" for host bridges

Message ID 20201111054724.794888-17-ben.widawsky@intel.com (mailing list archive)
State New, archived
Series Introduce CXL 2.0 Emulation

Commit Message

Ben Widawsky Nov. 11, 2020, 5:47 a.m. UTC
In a bare metal CXL capable system, system firmware will program
physical address ranges on the host. This is done by programming
internal registers that aren't typically known to the OS. These address
ranges might be contiguous or interleaved across host bridges.

For a QEMU guest, a new construct is introduced that allows a memory
backend to be passed to the host bridge for this same purpose. Each memory
backend must be passed both to the host bridge and to any device that will
be emulating that memory (not implemented here).

I'm hopeful the interleaving work in the link can be re-purposed here
(see Link).

An example that creates a host bridge with a 512M window at 0x4c0000000:
 -object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M
 -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,window-base\[0\]=0x4c0000000,memdev\[0\]=cxl-mem1

Link: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg03680.html
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 hw/pci-bridge/pci_expander_bridge.c | 65 +++++++++++++++++++++++++++--
 include/hw/cxl/cxl.h                |  1 +
 2 files changed, 62 insertions(+), 4 deletions(-)
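
To make the window arithmetic in the example above concrete, here is a
minimal standalone sketch of the range a window covers. It is not QEMU code:
struct cxl_window and both helpers are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one fixed memory window; not a QEMU type. */
struct cxl_window {
    uint64_t base; /* guest physical base address */
    uint64_t size; /* window length in bytes */
};

/* Return true if addr lies in [base, base + size). */
static bool cxl_window_contains(const struct cxl_window *w, uint64_t addr)
{
    assert(w->size > 0);
    /* Written as a subtraction so that base + size cannot overflow. */
    return addr >= w->base && addr - w->base < w->size;
}

/* Last byte address covered by the window. */
static uint64_t cxl_window_last(const struct cxl_window *w)
{
    assert(w->size > 0);
    return w->base + (w->size - 1);
}
```

With base 0x4c0000000 and a size of 512M, cxl_window_last() yields
0x4dfffffff, so 0x4e0000000 is the first address past the window.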

Comments

Ben Widawsky Nov. 13, 2020, 12:49 a.m. UTC | #1
On 20-11-10 21:47:15, Ben Widawsky wrote:
> In a bare metal CXL capable system, system firmware will program
> physical address ranges on the host. This is done by programming
> internal registers that aren't typically known to OS. These address
> ranges might be contiguous or interleaved across host bridges.
> 
> For a QEMU guest a new construct is introduced allowing passing a memory
> backend to the host bridge for this same purpose. Each memory backend
> needs to be passed to the host bridge as well as any device that will be
> emulating that memory (not implemented here).
> 
> I'm hopeful the interleaving work in the link can be re-purposed here
> (see Link).
> 
> An example to create a host bridges with a 512M window at 0x4c0000000
>  -object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M
>  -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-memory-base=1,memory-base\[0\]=0x4c0000000,memory\[0\]=cxl-mem1
> 
> Link: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg03680.html
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

Hi Phil, wanted to call you out specifically on this one.

The newly released CXL 2.0 specification (which from a topology perspective can
be thought of as very PCIe-like) allows for interleaving of memory access.

Below is an example of two host bridges, each with two root ports, and 5 devices
(two of which are behind a switch).

RP: Root Port
USP: Upstream Port
DSP: Downstream Port
Type 3 device: Memory Device with persistent or volatile memory.

+-------------------------+      +-------------------------+
|                         |      |                         |
|   CXL 2.0 Host Bridge   |      |   CXL 2.0 Host Bridge   |
|                         |      |                         |
|  +------+     +------+  |      |  +------+     +------+  |
|  |  RP  |     |  RP  |  |      |  |  RP  |     |  RP  |  |
+--+------+-----+------+--+      +--+------+-----+------+--+
      |            |                   |               \--
      |            |                   |        +-------+-\--+------+
   +------+    +-------+            +-------+   |       |USP |      |
   |Type 3|    |Type 3 |            |Type 3 |   |       +----+      |
   |Device|    |Device |            |Device |   |                   |
   +------+    +-------+            +-------+   | +----+     +----+ |
                                                | |DSP |     |DSP | |
                                                +-+----+-----+----+-+
                                                    |          |
                                                +------+    +-------+
                                                |Type 3|    |Type 3 |
                                                |Device|    |Device |
                                                +------+    +-------+

Considering this picture... interleaving of memory access can happen in all 3
layers in the topology.

- Memory access can be interleaved across host bridges. This is accomplished
  through the physical addresses chosen for the devices; those address ranges
  are platform specific and, for now, not part of the 2.0 spec.

- Memory access can be interleaved across root ports in a host bridge.

- Finally, memory access can be interleaved across downstream ports.
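
As a rough illustration of interleaving at all three layers, the sketch below
picks a target at each layer from an offset into an interleaved range. The
mixed-radix modulo scheme, the single shared granularity, and every name here
are assumptions made for illustration; this is not the decode defined by the
CXL 2.0 spec.

```c
#include <assert.h>
#include <stdint.h>

#define LEVELS 3 /* host bridge, root port, downstream port */

/*
 * Decide which target is selected at each layer for a given offset.
 * granularity is the chunk size in bytes; ways[i] is the number of
 * targets at layer i (1 means no interleaving at that layer).
 * Assumed scheme: each layer consumes one "digit" of the chunk index.
 */
static void interleave_decode(uint64_t offset, uint64_t granularity,
                              const unsigned ways[LEVELS],
                              unsigned target[LEVELS])
{
    uint64_t chunk;

    assert(granularity > 0);
    chunk = offset / granularity;
    for (int i = 0; i < LEVELS; i++) {
        assert(ways[i] > 0);
        target[i] = (unsigned)(chunk % ways[i]);
        chunk /= ways[i];
    }
}
```

With 2 ways at every layer and a 256 byte granularity, consecutive 256 byte
chunks alternate between host bridges first, then between root ports, then
between downstream ports.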

I'd like to start the discussion about how this might overlap with the patch
series you've last been working on to interleave memory. Do you have any
thoughts or ideas on how I should go about doing this?

Philippe Mathieu-Daudé Nov. 23, 2020, 7:12 p.m. UTC | #2
On 11/13/20 1:49 AM, Ben Widawsky wrote:
> On 20-11-10 21:47:15, Ben Widawsky wrote:
>> In a bare metal CXL capable system, system firmware will program
>> physical address ranges on the host. This is done by programming
>> internal registers that aren't typically known to OS. These address
>> ranges might be contiguous or interleaved across host bridges.
>>
>> For a QEMU guest a new construct is introduced allowing passing a memory
>> backend to the host bridge for this same purpose. Each memory backend
>> needs to be passed to the host bridge as well as any device that will be
>> emulating that memory (not implemented here).
>>
>> I'm hopeful the interleaving work in the link can be re-purposed here
>> (see Link).
>>
>> An example to create a host bridges with a 512M window at 0x4c0000000
>>  -object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M
>>  -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-memory-base=1,memory-base\[0\]=0x4c0000000,memory\[0\]=cxl-mem1
>>
>> Link: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg03680.html
>> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> 
> Hi Phil, wanted to call you out specifically on this one.
> 
> The newly released CXL 2.0 specification (which from a topology perspective can
> be thought of as very PCIe-like) allows for interleaving of memory access.
> 
> Below is an example of two host bridges, each with two root ports, and 5 devices
> (two of which are behind a switch).
> 
> RP: Root Port
> USP: Upstream Port
> DSP: Downstream Port
> Type 3 device: Memory Device with persistent or volatile memory.
> 
> +-------------------------+      +-------------------------+
> |                         |      |                         |
> |   CXL 2.0 Host Bridge   |      |   CXL 2.0 Host Bridge   |
> |                         |      |                         |
> |  +------+     +------+  |      |  +------+     +------+  |
> |  |  RP  |     |  RP  |  |      |  |  RP  |     |  RP  |  |
> +--+------+-----+------+--+      +--+------+-----+------+--+
>       |            |                   |               \--
>       |            |                   |        +-------+-\--+------+
>    +------+    +-------+            +-------+   |       |USP |      |
>    |Type 3|    |Type 3 |            |Type 3 |   |       +----+      |
>    |Device|    |Device |            |Device |   |                   |
>    +------+    +-------+            +-------+   | +----+     +----+ |
>                                                 | |DSP |     |DSP | |
>                                                 +-+----+-----+----+-+
>                                                     |          |
>                                                 +------+    +-------+
>                                                 |Type 3|    |Type 3 |
>                                                 |Device|    |Device |
>                                                 +------+    +-------+
> 
> Considering this picture... interleaving of memory access can happen in all 3
> layers in the topology.
> 
> - Memory access can be interleaved across host bridges (this is accomplished
>   based on the physical address chosen for the devices, those address ranges are
>   platform specific and not part of the 2.0 spec, for now).
> 
> - Memory access can be interleaved across root ports in a host bridge.
> 
> - Finally, memory access can be interleaved across downstream ports.
> 
> I'd like to start the discussion about how this might overlap with the patch
> series you've last been working on to interleave memory. Do you have any
> thoughts or ideas on how I should go about doing this?

Main case:

 +-------------------------+
 |                         |
 |   CXL 2.0 Host Bridge   |
 |                         |
 |  +------+     +------+  |
 |  |  RP  |     |  RP  |  |
 +--+------+-----+------+--+
       |            |
       |            |
    +------+    +-------+
    |Type 3|    |Type 3 |
    |Device|    |Device |
    +------+    +-------+

// cxl device state
s = qdev_create(TYPE_CXL20_HB_DEV);

cxl_memsize = 2 * memsize(Type3Dev);

// container for cxl
memory_region_init(&s->container, OBJECT(s),
                   "container", cxl_memsize);

// create 2 slots, interleaved every 2k
s->interleaver = qdev_create(INTERLEAVER_DEV,
                             slotsize=2k,
                             max_slots=2);
interleaver = s->interleaver; // shorthand used below
qdev_prop_set_uint64(interleaver, "size",
                     cxl_memsize);

// connect each device to the interleaver
object_property_set_link(OBJECT(interleaver),
                         "mr0", OBJECT(RP0));
object_property_set_link(OBJECT(interleaver),
                         "mr1", OBJECT(RP1));
sysbus_realize_and_unref(SYS_BUS_DEVICE(interleaver));

// we can probably avoid this container
memory_region_add_subregion(&s->container, 0,
                            sysbus_mmio_get_region(interleaver, 0));
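
For reference, the address translation such an interleaver would perform can
be sketched as below. This is a standalone illustration assuming plain
round-robin slots; struct interleave_target and interleave_translate() are
hypothetical names, not part of QEMU or of the interleaver series.

```c
#include <assert.h>
#include <stdint.h>

/* Result of translating an address in the interleaved container. */
struct interleave_target {
    unsigned slot;   /* which backing region ("mr0", "mr1", ...) */
    uint64_t offset; /* offset inside that backing region */
};

/*
 * Map an address in the container to a backing region and an offset
 * within it, assuming round-robin interleaving of slot_size chunks.
 */
static struct interleave_target
interleave_translate(uint64_t addr, uint64_t slot_size, unsigned num_slots)
{
    struct interleave_target t;
    uint64_t chunk;

    assert(slot_size > 0 && num_slots > 0);
    chunk = addr / slot_size;
    t.slot = (unsigned)(chunk % num_slots);
    /* Chunks already placed on this slot, plus the offset in the chunk. */
    t.offset = (chunk / num_slots) * slot_size + addr % slot_size;
    return t;
}
```

With slot_size 2048 and 2 slots, container address 4096 lands on slot 0 at
offset 2048, and address 6148 lands on slot 1 at offset 2052.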


For the 2nd case, the USP can be created the same way as in case 1
(as a 2nd interleaver); the main CXL host bridge is then created with
the minor difference that mr1 is now the USP:

object_property_set_link(OBJECT(interleaver),
                         "mr1", OBJECT(USP))
sysbus_realize_and_unref(SYS_BUS_DEVICE(interleaver))

Regards,

Phil.

Patch

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index eca5c71d45..75910f5870 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -69,12 +69,19 @@  struct PXBDev {
     uint8_t bus_nr;
     uint16_t numa_node;
     int32_t uid;
+    struct cxl_dev {
+        HostMemoryBackend *memory_window[CXL_WINDOW_MAX];
+
+        uint32_t num_windows;
+        hwaddr *window_base[CXL_WINDOW_MAX];
+    } cxl;
 };
 
 typedef struct CXLHost {
     PCIHostState parent_obj;
 
     CXLComponentState cxl_cstate;
+    PXBDev *dev;
 } CXLHost;
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
@@ -213,16 +220,31 @@  static void pxb_cxl_realize(DeviceState *dev, Error **errp)
     SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
     PCIHostState *phb = PCI_HOST_BRIDGE(dev);
     CXLHost *cxl = PXB_CXL_HOST(dev);
+    struct cxl_dev *cxl_dev = &cxl->dev->cxl;
     CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
     struct MemoryRegion *mr = &cxl_cstate->crb.component_registers;
+    int uid = pci_bus_uid(phb->bus);
 
     cxl_component_register_block_init(OBJECT(dev), cxl_cstate,
                                       TYPE_PXB_CXL_HOST);
     sysbus_init_mmio(sbd, mr);
 
-    /* FIXME: support multiple host bridges. */
-    sysbus_mmio_map(sbd, 0, CXL_HOST_BASE +
-                            memory_region_size(mr) * pci_bus_uid(phb->bus));
+    sysbus_mmio_map(sbd, 0, CXL_HOST_BASE + memory_region_size(mr) * uid);
+
+    /*
+     * A CXL host bridge can exist without a fixed memory window, but it would
+     * only operate in legacy PCIe mode.
+     */
+    if (!cxl_dev->memory_window[uid]) {
+        warn_report(
+            "CXL expander bridge created without window. Consider using %s",
+            "memdev[0]=<memory_backend>");
+        return;
+    }
+
+    mr = host_memory_backend_get_memory(cxl_dev->memory_window[uid]);
+    sysbus_init_mmio(sbd, mr);
+    sysbus_mmio_map(sbd, 1 + uid, *cxl_dev->window_base[uid]);
 }
 
 static void pxb_cxl_host_class_init(ObjectClass *class, void *data)
@@ -328,6 +350,7 @@  static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
     } else if (type == CXL) {
         bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
         bus->flags |= PCI_BUS_CXL;
+        PXB_CXL_HOST(ds)->dev = PXB_CXL_DEV(dev);
     } else {
         bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
         bds = qdev_new("pci-bridge");
@@ -389,6 +412,8 @@  static Property pxb_dev_properties[] = {
     DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
     DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
     DEFINE_PROP_INT32("uid", PXBDev, uid, -1),
+    DEFINE_PROP_ARRAY("window-base", PXBDev, cxl.num_windows, cxl.window_base,
+                      qdev_prop_uint64, hwaddr),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -460,7 +485,9 @@  static const TypeInfo pxb_pcie_dev_info = {
 
 static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
 {
-    PXBDev *pxb = convert_to_pxb(dev);
+    PXBDev *pxb = PXB_CXL_DEV(dev);
+    struct cxl_dev *cxl = &pxb->cxl;
+    int count = 0;
 
     /* A CXL PXB's parent bus is still PCIe */
     if (!pci_bus_is_express(pci_get_bus(dev))) {
@@ -476,6 +503,23 @@  static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
     /* FIXME: Check that uid doesn't collide with UIDs of other host bridges */
 
     pxb_dev_realize_common(dev, CXL, errp);
+
+    for (unsigned i = 0; i < CXL_WINDOW_MAX; i++) {
+        if (!cxl->memory_window[i]) {
+            continue;
+        }
+
+        count++;
+    }
+
+    if (!count) {
+        warn_report("memory-windows should be set when creating CXL host bridges");
+    }
+
+    if (count != cxl->num_windows) {
+        error_setg(errp, "window bases count (%d) must match window count (%d)",
+                   cxl->num_windows, count);
+    }
 }
 
 static void pxb_cxl_dev_class_init(ObjectClass *klass, void *data)
@@ -496,6 +540,19 @@  static void pxb_cxl_dev_class_init(ObjectClass *klass, void *data)
 
     /* Host bridges aren't hotpluggable. FIXME: spec reference */
     dc->hotpluggable = false;
+
+    /*
+     * Below is moral equivalent of:
+     *   DEFINE_PROP_ARRAY("memdev", PXBDev, window_count, windows,
+     *                     qdev_prop_memory_device, HostMemoryBackend)
+     */
+    for (unsigned i = 0; i < CXL_WINDOW_MAX; i++) {
+        g_autofree char *name = g_strdup_printf("memdev[%u]", i);
+        object_class_property_add_link(klass, name, TYPE_MEMORY_BACKEND,
+                offsetof(PXBDev, cxl.memory_window[i]),
+                qdev_prop_allow_set_link_before_realize,
+                OBJ_PROP_LINK_STRONG);
+    }
 }
 
 static const TypeInfo pxb_cxl_dev_info = {
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 6bc344f205..b1e5f4a8fa 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -18,6 +18,7 @@ 
 #define DEVICE_REG_BAR_IDX 2
 
 #define CXL_HOST_BASE 0xD0000000
+#define CXL_WINDOW_MAX 10
 
 #endif
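
The consistency check added to pxb_cxl_dev_realize() can be exercised outside
QEMU with a sketch like the one below. It assumes the windows are modelled as
a plain array of pointers; WINDOW_MAX, count_windows() and check_windows() are
hypothetical stand-ins, not identifiers from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_MAX 10 /* mirrors CXL_WINDOW_MAX from the patch */

/* Count populated slots, as the loop in pxb_cxl_dev_realize() does. */
static unsigned count_windows(void *const windows[WINDOW_MAX])
{
    unsigned count = 0;

    for (unsigned i = 0; i < WINDOW_MAX; i++) {
        if (windows[i]) {
            count++;
        }
    }
    return count;
}

/* Return 0 if every populated window has a base, -1 on a mismatch. */
static int check_windows(void *const windows[WINDOW_MAX], unsigned num_bases)
{
    return count_windows(windows) == num_bases ? 0 : -1;
}
```

A host bridge with one backend linked and one base supplied passes the check;
supplying two bases for the same single backend does not.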