diff mbox series

[v7,09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

Message ID 20240418232902.583744-10-fan.ni@samsung.com (mailing list archive)
State New, archived
Headers show
Series Enabling DCD emulation support in Qemu | expand

Commit Message

Fan Ni April 18, 2024, 11:11 p.m. UTC
From: Fan Ni <fan.ni@samsung.com>

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we allow to release an extent only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two continuous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "selection-policy": 2,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "flags": 1,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
 hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  20 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  69 ++++++++
 6 files changed, 489 insertions(+), 13 deletions(-)

Comments

Gregory Price April 19, 2024, 6:13 p.m. UTC | #1
On Thu, Apr 18, 2024 at 04:11:00PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
...
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Jonathan Cameron April 22, 2024, 12:01 p.m. UTC | #2
On Thu, 18 Apr 2024 16:11:00 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
>

Hi Fan,

Please expand CC list to include QAPI maintainers.
+CC Markus and Micheal.

Also, for future versions +CC Michael Tsirkin.

I'm find rolling these up as a series with the precursors but
if it is already some Michael has seen it may speed things up.

Jonathan

p.s. Today I'm just building a tree, but will circle back around
later in the week with a final review of the last few changes.

 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "selection-policy": 2,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "flags": 1,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9d54e10cd4..3569902e9e 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>   * Check whether any bit between addr[nr, nr+size) is set,
>   * return true if any bit is set, otherwise return false
>   */
> -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
>                                unsigned long size)
>  {
>      unsigned long res = find_next_bit(addr, size + nr, nr);
> @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>      return NULL;
>  }
>  
> -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
>                                               uint64_t dpa,
>                                               uint64_t len,
>                                               uint8_t *tag,
> @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>      g_free(extent);
>  }
>  
> +/*
> + * Add a new extent to the extent "group" if group exists;
> + * otherwise, create a new group
> + * Return value: return the group where the extent is inserted.
> + */
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq)
> +{
> +    if (!group) {
> +        group = g_new0(CXLDCExtentGroup, 1);
> +        QTAILQ_INIT(&group->list);
> +    }
> +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> +                                     tag, shared_seq);
> +    return group;
> +}
> +
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group)
> +{
> +    QTAILQ_INSERT_TAIL(list, group, node);
> +}
> +
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> +
> +    QTAILQ_REMOVE(list, group, node);
> +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&group->list, ent);
> +    }
> +    g_free(group);
> +}
> +
>  /*
>   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
>   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>  {
>      uint32_t i;
>      CXLDCExtent *ent;
> +    CXLDCExtentGroup *ext_group;
>      uint64_t dpa, len;
>      Range range1, range2;
>  
> @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>          range_init_nofail(&range1, dpa, len);
>  
>          /*
> -         * TODO: once the pending extent list is added, check against
> -         * the list will be added here.
> +         * The host-accepted DPA range must be contained by the first extent
> +         * group in the pending list
>           */
> +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
>  
>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> -        /*
> -         * TODO: once the pending list is introduced, extents in the beginning
> -         * will get wiped out.
> -         */
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list accordingly here.
> -         */
>      }
> +    /* Remove the first extent group in the pending list*/
> +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>  
>      return CXL_MBOX_SUCCESS;
>  }
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c2cdd6d506..e892b3de7b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>          ct3d->dc.total_capacity += region->len;
>      }
>      QTAILQ_INIT(&ct3d->dc.extents);
> +    QTAILQ_INIT(&ct3d->dc.extents_pending);
>  
>      return true;
>  }
> @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
>  {
>      CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group, *group_next;
>  
>      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
>          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>      }
> +
> +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +            cxl_remove_extent_from_extent_list(&group->list, ent);
> +        }
> +        g_free(group);
> +    }
>  }
>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */
>      default:
>          return -EINVAL;
>      }
> @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> +static const QemuUUID dynamic_capacity_uuid = {
> +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +typedef enum CXLDCEventType {
> +    DC_EVENT_ADD_CAPACITY = 0x0,
> +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> +} CXLDCEventType;
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> +                                                uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtentGroup *group;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    QTAILQ_FOREACH(group, list, node) {
> +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event with extent list.
> + * Currently DC extents add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> +        CXLDCExtentRecordList *records, Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    CXLDCExtentGroup *group = NULL;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> +    uint64_t dpa, offset, len, block_size;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    int i;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA again while still pending");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +        num_extents++;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            group = cxl_insert_extent_to_extent_group(group,
> +                                                      extents[i].start_dpa,
> +                                                      extents[i].len,
> +                                                      extents[i].tag,
> +                                                      extents[i].shared_seq);
> +        }
> +
> +        list = list->next;
> +        i++;
> +    }
> +    if (group) {
> +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (i < num_extents - 1) {
> +            /* Set "More" flag */
> +            dCap.flags |= BIT(0);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> +                                  uint8_t sel_policy, uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    enum {
> +        CXL_SEL_POLICY_FREE,
> +        CXL_SEL_POLICY_CONTIGUOUS,
> +        CXL_SEL_POLICY_PRESCRIPTIVE,
> +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> +    };
> +    switch (sel_policy) {
> +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> +                                                      DC_EVENT_ADD_CAPACITY,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Selection policy not supported");
> +        return;
> +    }
> +}
> +
> +#define REMOVAL_POLICY_MASK 0xf
> +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> +#define FORCED_REMOVAL_BIT BIT(4)
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> +
> +    if (flags & FORCED_REMOVAL_BIT) {
> +        /* TODO: enable forced removal in the future */
> +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> +        error_setg(errp, "Forced removal not supported yet");
> +        return;
> +    }
> +
> +    switch (flags & REMOVAL_POLICY_MASK) {
> +    case REMOVAL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Removal policy not supported");
> +        return;
> +    }
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> index 3e1851e32b..810685e0d5 100644
> --- a/hw/mem/cxl_type3_stubs.c
> +++ b/hw/mem/cxl_type3_stubs.c
> @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
>  {
>      error_setg(errp, "CXL Type 3 support is not compiled in");
>  }
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path,
> +                                  uint16_t hid,
> +                                  uint8_t sel_policy,
> +                                  uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..c69ff6b5de 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
>  } CXLDCExtent;
>  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
>  
> +typedef struct CXLDCExtentGroup {
> +    CXLDCExtentList list;
> +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> +} CXLDCExtentGroup;
> +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> +
>  typedef struct CXLDCRegion {
>      uint64_t base;       /* aligned to 256*MiB */
>      uint64_t decode_len; /* aligned to 256*MiB */
> @@ -494,6 +500,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentGroupList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;
>  
> @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
>  
>  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>                                          CXLDCExtent *extent);
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> +                                      uint64_t len, uint8_t *tag,
> +                                      uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +                       unsigned long size);
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len);
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq);
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group);
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 5170b8dbf8..38cadaa0f3 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +    CXLEventRecordHdr hdr;
> +    uint8_t type;
> +    uint8_t validity_flags;
> +    uint16_t host_id;
> +    uint8_t updated_region_id;
> +    uint8_t flags;
> +    uint8_t reserved2[2];
> +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> +    uint8_t reserved[0x18];
> +    uint32_t extents_avail;
> +    uint32_t tags_avail;
> +} QEMU_PACKED CXLEventDynamicCapacity;
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..2dcf03d973 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,72 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:
> +#
> +# Record of a single extent to add/release
> +#
> +# @offset: offset to the start of the region where the extent to be operated
> +# @len: length of the extent
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
> +# have to acknowledged the acceptance of the extents before they are usable.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @hid: host id
> +# @selection-policy: policy to use for selecting extents for adding capacity
> +# @region-id: id of the region where the extent to add
> +# @tag: Context field
> +# @extents: Extents to add
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'selection-policy': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @hid: host id
> +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> +#     sanitize on release, bit[7:6] reserved
> +# @region-id: id of the region where the extent to release
> +# @tag: Context field
> +# @extents: Extents to release
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'flags': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
Markus Armbruster April 26, 2024, 9:12 a.m. UTC | #3
nifan.cxl@gmail.com writes:

> From: Fan Ni <fan.ni@samsung.com>
>
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
>
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
>
> 1. Add dynamic capacity extents:
>
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
>
> { "execute": "qmp_capabilities" }
>
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "selection-policy": 2,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
>
> 2. Release dynamic capacity extents:
>
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
>
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "flags": 1,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
>
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9d54e10cd4..3569902e9e 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>   * Check whether any bit between addr[nr, nr+size) is set,
>   * return true if any bit is set, otherwise return false
>   */
> -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
>                                unsigned long size)
>  {
>      unsigned long res = find_next_bit(addr, size + nr, nr);
> @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>      return NULL;
>  }
>  
> -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
>                                               uint64_t dpa,
>                                               uint64_t len,
>                                               uint8_t *tag,
> @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>      g_free(extent);
>  }
>  
> +/*
> + * Add a new extent to the extent "group" if group exists;
> + * otherwise, create a new group
> + * Return value: return the group where the extent is inserted.
> + */
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq)
> +{
> +    if (!group) {
> +        group = g_new0(CXLDCExtentGroup, 1);
> +        QTAILQ_INIT(&group->list);
> +    }
> +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> +                                     tag, shared_seq);
> +    return group;
> +}
> +
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group)
> +{
> +    QTAILQ_INSERT_TAIL(list, group, node);
> +}
> +
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> +
> +    QTAILQ_REMOVE(list, group, node);
> +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&group->list, ent);
> +    }
> +    g_free(group);
> +}
> +
>  /*
>   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
>   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>  {
>      uint32_t i;
>      CXLDCExtent *ent;
> +    CXLDCExtentGroup *ext_group;
>      uint64_t dpa, len;
>      Range range1, range2;
>  
> @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>          range_init_nofail(&range1, dpa, len);
>  
>          /*
> -         * TODO: once the pending extent list is added, check against
> -         * the list will be added here.
> +         * The host-accepted DPA range must be contained by the first extent
> +         * group in the pending list
>           */
> +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
>  
>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> -        /*
> -         * TODO: once the pending list is introduced, extents in the beginning
> -         * will get wiped out.
> -         */
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list accordingly here.
> -         */
>      }
> +    /* Remove the first extent group in the pending list*/
> +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>  
>      return CXL_MBOX_SUCCESS;
>  }
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c2cdd6d506..e892b3de7b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>          ct3d->dc.total_capacity += region->len;
>      }
>      QTAILQ_INIT(&ct3d->dc.extents);
> +    QTAILQ_INIT(&ct3d->dc.extents_pending);
>  
>      return true;
>  }
> @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
>  {
>      CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group, *group_next;
>  
>      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
>          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>      }
> +
> +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +            cxl_remove_extent_from_extent_list(&group->list, ent);
> +        }
> +        g_free(group);
> +    }
>  }
>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */
>      default:
>          return -EINVAL;
>      }
> @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> +static const QemuUUID dynamic_capacity_uuid = {
> +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +typedef enum CXLDCEventType {
> +    DC_EVENT_ADD_CAPACITY = 0x0,
> +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> +} CXLDCEventType;
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> +                                                uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtentGroup *group;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    QTAILQ_FOREACH(group, list, node) {
> +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event with extent list.
> + * Currently DC extents add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> +        CXLDCExtentRecordList *records, Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    CXLDCExtentGroup *group = NULL;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> +    uint64_t dpa, offset, len, block_size;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    int i;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA again while still pending");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +        num_extents++;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            group = cxl_insert_extent_to_extent_group(group,
> +                                                      extents[i].start_dpa,
> +                                                      extents[i].len,
> +                                                      extents[i].tag,
> +                                                      extents[i].shared_seq);
> +        }
> +
> +        list = list->next;
> +        i++;
> +    }
> +    if (group) {
> +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (i < num_extents - 1) {
> +            /* Set "More" flag */
> +            dCap.flags |= BIT(0);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> +                                  uint8_t sel_policy, uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    enum {
> +        CXL_SEL_POLICY_FREE,
> +        CXL_SEL_POLICY_CONTIGUOUS,
> +        CXL_SEL_POLICY_PRESCRIPTIVE,
> +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> +    };
> +    switch (sel_policy) {
> +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> +                                                      DC_EVENT_ADD_CAPACITY,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Selection policy not supported");
> +        return;
> +    }
> +}
> +
> +#define REMOVAL_POLICY_MASK 0xf
> +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> +#define FORCED_REMOVAL_BIT BIT(4)
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> +
> +    if (flags & FORCED_REMOVAL_BIT) {
> +        /* TODO: enable forced removal in the future */
> +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> +        error_setg(errp, "Forced removal not supported yet");
> +        return;
> +    }
> +
> +    switch (flags & REMOVAL_POLICY_MASK) {
> +    case REMOVAL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Removal policy not supported");
> +        return;
> +    }
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> index 3e1851e32b..810685e0d5 100644
> --- a/hw/mem/cxl_type3_stubs.c
> +++ b/hw/mem/cxl_type3_stubs.c
> @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
>  {
>      error_setg(errp, "CXL Type 3 support is not compiled in");
>  }
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path,
> +                                  uint16_t hid,
> +                                  uint8_t sel_policy,
> +                                  uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..c69ff6b5de 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
>  } CXLDCExtent;
>  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
>  
> +typedef struct CXLDCExtentGroup {
> +    CXLDCExtentList list;
> +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> +} CXLDCExtentGroup;
> +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> +
>  typedef struct CXLDCRegion {
>      uint64_t base;       /* aligned to 256*MiB */
>      uint64_t decode_len; /* aligned to 256*MiB */
> @@ -494,6 +500,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentGroupList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;
>  
> @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
>  
>  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>                                          CXLDCExtent *extent);
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> +                                      uint64_t len, uint8_t *tag,
> +                                      uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +                       unsigned long size);
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len);
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq);
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group);
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 5170b8dbf8..38cadaa0f3 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +    CXLEventRecordHdr hdr;
> +    uint8_t type;
> +    uint8_t validity_flags;
> +    uint16_t host_id;
> +    uint8_t updated_region_id;
> +    uint8_t flags;
> +    uint8_t reserved2[2];
> +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> +    uint8_t reserved[0x18];
> +    uint32_t extents_avail;
> +    uint32_t tags_avail;
> +} QEMU_PACKED CXLEventDynamicCapacity;
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..2dcf03d973 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,72 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:

Such traffic jams of capital letters are hard to read.  What about
CxlDynamicCapacityExtent?

> +#
> +# Record of a single extent to add/release

Suggest "A dynamic capacity extent."

> +#
> +# @offset: offset to the start of the region where the extent to be operated

Blank line here, please.



> +# @len: length of the extent
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
> +# have to acknowledged the acceptance of the extents before they are usable.

This text needs work.  More on that at the end of my review.

docs/devel/qapi-code-gen.rst:

    For legibility, wrap text paragraphs so every line is at most 70
    characters long.

    Separate sentences with two spaces.

More elsewhere.

> +#
> +# @path: CXL DCD canonical QOM path

I'd prefer @qom-path, unless you can make a consistency argument for
@path.

Sure the QOM path needs to be canonical?

If not, what about "path to the CXL dynamic capacity device in the QOM
tree".  Intentionally close to existing descriptions of @qom-path
elsewhere.

> +# @hid: host id

@host-id, unless "HID" is established terminology in CXL DCD land.

What is a host ID?

> +# @selection-policy: policy to use for selecting extents for adding capacity

Where are selection policies defined?

> +# @region-id: id of the region where the extent to add

Is "region ID" the established terminology in CXL DCD land?  Or is
"region number" also used?  I'm asking because "ID" in this QEMU device
context suggests a connection to a qdev ID.

If region number is fine, I'd rename to just @region, and rephrase the
description to avoid "ID".  Perhaps "number of the region the extent is
to be added to".  Not entirely happy with the phrasing, doesn't exactly
roll off the tongue, but "where the extent to add" sounds worse to my
ears.  Mind, I'm not a native speaker.

> +# @tag: Context field

What is this about?

> +# @extents: Extents to add

Blank lines between argument descriptions, please.

> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'selection-policy': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.

This text needs work.  More on that at the end of my review.

> +#
> +# @path: CXL DCD canonical QOM path

My comment on cxl-add-dynamic-capacity applies.

> +# @hid: host id

Likewise.

> +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> +#     sanitize on release, bit[7:6] reserved

Where are these flags defined?

> +# @region-id: id of the region where the extent to release

My comment on cxl-add-dynamic-capacity applies.

> +# @tag: Context field

Likewise.

> +# @extents: Extents to release
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'flags': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}

During review of v5, you wrote:

    For add command, the host will send a mailbox command to response to
    the add request to the device to indicate whether it accepts the add
    capacity offer or not.
    
    For release command, the host send a mailbox command (not always a
    response since the host can proactively release capacity if it does
    not need it any more) to device to ask device release the capacity.

Can you briefly sketch the protocol?  Peers and messages involved.
Possibly as a state diagram.
Fan Ni April 26, 2024, 5:31 p.m. UTC | #4
On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
> nifan.cxl@gmail.com writes:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> >
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "selection-policy": 2,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "flags": 1,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
> >  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3_stubs.c    |  20 +++
> >  include/hw/cxl/cxl_device.h |  22 +++
> >  include/hw/cxl/cxl_events.h |  18 +++
> >  qapi/cxl.json               |  69 ++++++++
> >  6 files changed, 489 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 9d54e10cd4..3569902e9e 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> >   * Check whether any bit between addr[nr, nr+size) is set,
> >   * return true if any bit is set, otherwise return false
> >   */
> > -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> >                                unsigned long size)
> >  {
> >      unsigned long res = find_next_bit(addr, size + nr, nr);
> > @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> >      return NULL;
> >  }
> >  
> > -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> > +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> >                                               uint64_t dpa,
> >                                               uint64_t len,
> >                                               uint8_t *tag,
> > @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> >      g_free(extent);
> >  }
> >  
> > +/*
> > + * Add a new extent to the extent "group" if group exists;
> > + * otherwise, create a new group
> > + * Return value: return the group where the extent is inserted.
> > + */
> > +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> > +                                                    uint64_t dpa,
> > +                                                    uint64_t len,
> > +                                                    uint8_t *tag,
> > +                                                    uint16_t shared_seq)
> > +{
> > +    if (!group) {
> > +        group = g_new0(CXLDCExtentGroup, 1);
> > +        QTAILQ_INIT(&group->list);
> > +    }
> > +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> > +                                     tag, shared_seq);
> > +    return group;
> > +}
> > +
> > +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> > +                                       CXLDCExtentGroup *group)
> > +{
> > +    QTAILQ_INSERT_TAIL(list, group, node);
> > +}
> > +
> > +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> > +
> > +    QTAILQ_REMOVE(list, group, node);
> > +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&group->list, ent);
> > +    }
> > +    g_free(group);
> > +}
> > +
> >  /*
> >   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> >   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >  {
> >      uint32_t i;
> >      CXLDCExtent *ent;
> > +    CXLDCExtentGroup *ext_group;
> >      uint64_t dpa, len;
> >      Range range1, range2;
> >  
> > @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >          range_init_nofail(&range1, dpa, len);
> >  
> >          /*
> > -         * TODO: once the pending extent list is added, check against
> > -         * the list will be added here.
> > +         * The host-accepted DPA range must be contained by the first extent
> > +         * group in the pending list
> >           */
> > +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> >  
> >          /* to-be-added range should not overlap with range already accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > -        /*
> > -         * TODO: once the pending list is introduced, extents in the beginning
> > -         * will get wiped out.
> > -         */
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> > @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> > -        /*
> > -         * TODO: we will add a pending extent list based on event log record
> > -         * and process the list accordingly here.
> > -         */
> >      }
> > +    /* Remove the first extent group in the pending list*/
> > +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >  
> >      return CXL_MBOX_SUCCESS;
> >  }
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index c2cdd6d506..e892b3de7b 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >          ct3d->dc.total_capacity += region->len;
> >      }
> >      QTAILQ_INIT(&ct3d->dc.extents);
> > +    QTAILQ_INIT(&ct3d->dc.extents_pending);
> >  
> >      return true;
> >  }
> > @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> >  {
> >      CXLDCExtent *ent, *ent_next;
> > +    CXLDCExtentGroup *group, *group_next;
> >  
> >      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> >      }
> > +
> > +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> > +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> > +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> > +            cxl_remove_extent_from_extent_list(&group->list, ent);
> > +        }
> > +        g_free(group);
> > +    }
> >  }
> >  
> >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> >          return CXL_EVENT_TYPE_FAIL;
> >      case CXL_EVENT_LOG_FATAL:
> >          return CXL_EVENT_TYPE_FATAL;
> > -/* DCD not yet supported */
> >      default:
> >          return -EINVAL;
> >      }
> > @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> >      }
> >  }
> >  
> > +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> > +static const QemuUUID dynamic_capacity_uuid = {
> > +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> > +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> > +};
> > +
> > +typedef enum CXLDCEventType {
> > +    DC_EVENT_ADD_CAPACITY = 0x0,
> > +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> > +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> > +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> > +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> > +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> > +} CXLDCEventType;
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> > + * the list.
> > + * Return value: return true if has overlaps; otherwise, return false
> > + */
> > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > +                                           uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_overlaps_range(&range1, &range2)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> > + * the list.
> > + * Will check multiple extents containment once superset release is added.
> > + * Return value: return true if range is contained; otherwise, return false
> > + */
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_contains_range(&range2, &range1)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> > +                                                uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtentGroup *group;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    QTAILQ_FOREACH(group, list, node) {
> > +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * The main function to process dynamic capacity event with extent list.
> > + * Currently DC extents add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> > +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> > +        CXLDCExtentRecordList *records, Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    CXLDCExtentGroup *group = NULL;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> > +    uint64_t dpa, offset, len, block_size;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +    int i;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                                     dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot add DPA already accessible  to the same LD");
> > +                return;
> > +            }
> > +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                                     dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot add DPA again while still pending");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +        num_extents++;
> > +    }
> > +
> > +    /* Create extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = dcd->dc.regions[rid].base + offset;
> > +
> > +        extents[i].start_dpa = dpa;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > +            group = cxl_insert_extent_to_extent_group(group,
> > +                                                      extents[i].start_dpa,
> > +                                                      extents[i].len,
> > +                                                      extents[i].tag,
> > +                                                      extents[i].shared_seq);
> > +        }
> > +
> > +        list = list->next;
> > +        i++;
> > +    }
> > +    if (group) {
> > +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    /* FIXME: for now, validity flag is cleared */
> > +    dCap.validity_flags = 0;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    dCap.flags = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (i < num_extents - 1) {
> > +            /* Set "More" flag */
> > +            dCap.flags |= BIT(0);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {
> > +            cxl_event_irq_assert(dcd);
> > +        }
> > +    }
> > +}
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> > +                                  uint8_t sel_policy, uint8_t region_id,
> > +                                  const char *tag,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +    enum {
> > +        CXL_SEL_POLICY_FREE,
> > +        CXL_SEL_POLICY_CONTIGUOUS,
> > +        CXL_SEL_POLICY_PRESCRIPTIVE,
> > +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> > +    };
> > +    switch (sel_policy) {
> > +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> > +                                                      DC_EVENT_ADD_CAPACITY,
> > +                                                      region_id, records, errp);
> > +        return;
> > +    default:
> > +        error_setg(errp, "Selection policy not supported");
> > +        return;
> > +    }
> > +}
> > +
> > +#define REMOVAL_POLICY_MASK 0xf
> > +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> > +#define FORCED_REMOVAL_BIT BIT(4)
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > +                                      uint8_t flags, uint8_t region_id,
> > +                                      const char *tag,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> > +
> > +    if (flags & FORCED_REMOVAL_BIT) {
> > +        /* TODO: enable forced removal in the future */
> > +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> > +        error_setg(errp, "Forced removal not supported yet");
> > +        return;
> > +    }
> > +
> > +    switch (flags & REMOVAL_POLICY_MASK) {
> > +    case REMOVAL_POLICY_PRESCRIPTIVE:
> > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> > +                                                      region_id, records, errp);
> > +        return;
> > +    default:
> > +        error_setg(errp, "Removal policy not supported");
> > +        return;
> > +    }
> > +}
> > +
> >  static void ct3_class_init(ObjectClass *oc, void *data)
> >  {
> >      DeviceClass *dc = DEVICE_CLASS(oc);
> > diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> > index 3e1851e32b..810685e0d5 100644
> > --- a/hw/mem/cxl_type3_stubs.c
> > +++ b/hw/mem/cxl_type3_stubs.c
> > @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
> >  {
> >      error_setg(errp, "CXL Type 3 support is not compiled in");
> >  }
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path,
> > +                                  uint16_t hid,
> > +                                  uint8_t sel_policy,
> > +                                  uint8_t region_id,
> > +                                  const char *tag,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +    error_setg(errp, "CXL Type 3 support is not compiled in");
> > +}
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > +                                      uint8_t flags, uint8_t region_id,
> > +                                      const char *tag,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    error_setg(errp, "CXL Type 3 support is not compiled in");
> > +}
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index df3511e91b..c69ff6b5de 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
> >  } CXLDCExtent;
> >  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
> >  
> > +typedef struct CXLDCExtentGroup {
> > +    CXLDCExtentList list;
> > +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> > +} CXLDCExtentGroup;
> > +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> > +
> >  typedef struct CXLDCRegion {
> >      uint64_t base;       /* aligned to 256*MiB */
> >      uint64_t decode_len; /* aligned to 256*MiB */
> > @@ -494,6 +500,7 @@ struct CXLType3Dev {
> >           */
> >          uint64_t total_capacity; /* 256M aligned */
> >          CXLDCExtentList extents;
> > +        CXLDCExtentGroupList extents_pending;
> >          uint32_t total_extent_count;
> >          uint32_t ext_list_gen_seq;
> >  
> > @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> >  
> >  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> >                                          CXLDCExtent *extent);
> > +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> > +                                      uint64_t len, uint8_t *tag,
> > +                                      uint16_t shared_seq);
> > +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > +                       unsigned long size);
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len);
> > +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> > +                                                    uint64_t dpa,
> > +                                                    uint64_t len,
> > +                                                    uint8_t *tag,
> > +                                                    uint16_t shared_seq);
> > +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> > +                                       CXLDCExtentGroup *group);
> > +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
> >  #endif
> > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > index 5170b8dbf8..38cadaa0f3 100644
> > --- a/include/hw/cxl/cxl_events.h
> > +++ b/include/hw/cxl/cxl_events.h
> > @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
> >      uint8_t reserved[0x3d];
> >  } QEMU_PACKED CXLEventMemoryModule;
> >  
> > +/*
> > + * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
> > + * All fields little endian.
> > + */
> > +typedef struct CXLEventDynamicCapacity {
> > +    CXLEventRecordHdr hdr;
> > +    uint8_t type;
> > +    uint8_t validity_flags;
> > +    uint16_t host_id;
> > +    uint8_t updated_region_id;
> > +    uint8_t flags;
> > +    uint8_t reserved2[2];
> > +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> > +    uint8_t reserved[0x18];
> > +    uint32_t extents_avail;
> > +    uint32_t tags_avail;
> > +} QEMU_PACKED CXLEventDynamicCapacity;
> > +
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 4281726dec..2dcf03d973 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -361,3 +361,72 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDCExtentRecord:
> 
> Such traffic jams of capital letters are hard to read.  What about
> CxlDynamicCapacityExtent?
> 
> > +#
> > +# Record of a single extent to add/release
> 
> Suggest "A dynamic capacity extent."
> 
> > +#
> > +# @offset: offset to the start of the region where the extent to be operated
> 
> Blank line here, please.
> 
> 
> 
> > +# @len: length of the extent
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'CXLDCExtentRecord',
> > +  'data': {
> > +      'offset':'uint64',
> > +      'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
> > +# have to acknowledged the acceptance of the extents before they are usable.
> 
> This text needs work.  More on that at the end of my review.

Yes. I will work on it for the next version once all the feedbacks
are collected and comments are resolved.

See below.

> 
> docs/devel/qapi-code-gen.rst:
> 
>     For legibility, wrap text paragraphs so every line is at most 70
>     characters long.
> 
>     Separate sentences with two spaces.
> 
> More elsewhere.
> 
> > +#
> > +# @path: CXL DCD canonical QOM path
> 
> I'd prefer @qom-path, unless you can make a consistency argument for
> @path.
> 
> Sure the QOM path needs to be canonical?
> 
> If not, what about "path to the CXL dynamic capacity device in the QOM
> tree".  Intentionally close to existing descriptions of @qom-path
> elsewhere.

From the same file, I saw "path" was used for other commands, like
"cxl-inject-memory-module-event", so I followed it.
DCD is nothing different from "type 3 device" expect it can dynamically
change capacity. 
Renaming it to "qom-path" is no problem for me, just want to make sure it
will not break the naming consistency.

> 
> > +# @hid: host id
> 
> @host-id, unless "HID" is established terminology in CXL DCD land.

host-id works.
> 
> What is a host ID?

It is an id identifying the host to which the capacity is being added.

> 
> > +# @selection-policy: policy to use for selecting extents for adding capacity
> 
> Where are selection policies defined?

It is defined in CXL specification: Specifies the policy to use for selecting
which extents comprise the added capacity

> 
> > +# @region-id: id of the region where the extent to add
> 
> Is "region ID" the established terminology in CXL DCD land?  Or is
> "region number" also used?  I'm asking because "ID" in this QEMU device
> context suggests a connection to a qdev ID.
> 
> If region number is fine, I'd rename to just @region, and rephrase the
> description to avoid "ID".  Perhaps "number of the region the extent is
> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> roll off the tongue, but "where the extent to add" sounds worse to my
> ears.  Mind, I'm not a native speaker.

Yes. region number is fine. Will rename it as "region"

> 
> > +# @tag: Context field
> 
> What is this about?

Based on the specification, it is "Context field utilized by implementations
that make use of the Dynamic Capacity feature.". Basically, it is a
string (label) attached to an dynamic capacity extent so we can achieve
specific purpose, like identifying or grouping extents.

> 
> > +# @extents: Extents to add
> 
> Blank lines between argument descriptions, please.
> 
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'hid': 'uint16',
> > +            'selection-policy': 'uint8',
> > +            'region-id': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
> 
> This text needs work.  More on that at the end of my review.

Will do.

> 
> > +#
> > +# @path: CXL DCD canonical QOM path
> 
> My comment on cxl-add-dynamic-capacity applies.
> 
> > +# @hid: host id
> 
> Likewise.
> 
> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> > +#     sanitize on release, bit[7:6] reserved
> 
> Where are these flags defined?

Defined in the CXL specification, it defines the release behaviour.

> 
> > +# @region-id: id of the region where the extent to release
> 
> My comment on cxl-add-dynamic-capacity applies.
> 
> > +# @tag: Context field
> 
> Likewise.
> 
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'hid': 'uint16',
> > +            'flags': 'uint8',
> > +            'region-id': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> 
> During review of v5, you wrote:
> 
>     For add command, the host will send a mailbox command to response to
>     the add request to the device to indicate whether it accepts the add
>     capacity offer or not.
>     
>     For release command, the host send a mailbox command (not always a
>     response since the host can proactively release capacity if it does
>     not need it any more) to device to ask device release the capacity.
> 
> Can you briefly sketch the protocol?  Peers and messages involved.
> Possibly as a state diagram.

Need to think about it. If we can polish the text nicely, maybe the
sketch is not needed. My concern is that the sketch may
introduce unwanted complexity as we expose too much details. The two
commands provide ways to add/release dynamic capacity to/from a host,
that is all. All the other information, like what the host will do, or
how the device will react, are consequence of the command, not sure
whether we want to include here.

@Jonathan, Any thoughts on this?

Fan

>
Markus Armbruster April 29, 2024, 7:58 a.m. UTC | #5
fan <nifan.cxl@gmail.com> writes:

> On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
>> nifan.cxl@gmail.com writes:

[...]

>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > index 4281726dec..2dcf03d973 100644
>> > --- a/qapi/cxl.json
>> > +++ b/qapi/cxl.json
>> > @@ -361,3 +361,72 @@
>> >  ##
>> >  {'command': 'cxl-inject-correctable-error',
>> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
>> > +
>> > +##
>> > +# @CXLDCExtentRecord:
>> 
>> Such traffic jams of capital letters are hard to read.  What about
>> CxlDynamicCapacityExtent?
>> 
>> > +#
>> > +# Record of a single extent to add/release
>> 
>> Suggest "A dynamic capacity extent."
>> 
>> > +#
>> > +# @offset: offset to the start of the region where the extent to be operated
>> 
>> Blank line here, please.
>> 
>> 
>> 
>> > +# @len: length of the extent
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'struct': 'CXLDCExtentRecord',
>> > +  'data': {
>> > +      'offset':'uint64',
>> > +      'len': 'uint64'
>> > +  }
>> > +}
>> > +
>> > +##
>> > +# @cxl-add-dynamic-capacity:
>> > +#
>> > +# Command to start add dynamic capacity extents flow. The device will
>> > +# have to acknowledged the acceptance of the extents before they are usable.
>> 
>> This text needs work.  More on that at the end of my review.
>
> Yes. I will work on it for the next version once all the feedbacks
> are collected and comments are resolved.
>
> See below.
>
>> 
>> docs/devel/qapi-code-gen.rst:
>> 
>>     For legibility, wrap text paragraphs so every line is at most 70
>>     characters long.
>> 
>>     Separate sentences with two spaces.
>> 
>> More elsewhere.
>> 
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>> 
>> I'd prefer @qom-path, unless you can make a consistency argument for
>> @path.
>> 
>> Sure the QOM path needs to be canonical?
>> 
>> If not, what about "path to the CXL dynamic capacity device in the QOM
>> tree".  Intentionally close to existing descriptions of @qom-path
>> elsewhere.
>
> From the same file, I saw "path" was used for other commands, like
> "cxl-inject-memory-module-event", so I followed it.
> DCD is nothing different from "type 3 device" expect it can dynamically
> change capacity. 
> Renaming it to "qom-path" is no problem for me, just want to make sure it
> will not break the naming consistency.

Both @path and @qom-path are used (sadly).  @path is used for all kinds
of paths, whereas @qom-path is only used for QOM paths.  That's why I
prefer it.

However, you're making a compelling local consistency argument: cxl.json
uses only @path.  Sticking to that makes sense.

>> > +# @hid: host id
>> 
>> @host-id, unless "HID" is established terminology in CXL DCD land.
>
> host-id works.
>> 
>> What is a host ID?
>
> It is an id identifying the host to which the capacity is being added.

How are these IDs assigned?

>> > +# @selection-policy: policy to use for selecting extents for adding capacity
>> 
>> Where are selection policies defined?
>
> It is defined in CXL specification: Specifies the policy to use for selecting
> which extents comprise the added capacity

Include a reference to the spec here?

>> > +# @region-id: id of the region where the extent to add
>> 
>> Is "region ID" the established terminology in CXL DCD land?  Or is
>> "region number" also used?  I'm asking because "ID" in this QEMU device
>> context suggests a connection to a qdev ID.
>> 
>> If region number is fine, I'd rename to just @region, and rephrase the
>> description to avoid "ID".  Perhaps "number of the region the extent is
>> to be added to".  Not entirely happy with the phrasing, doesn't exactly
>> roll off the tongue, but "where the extent to add" sounds worse to my
>> ears.  Mind, I'm not a native speaker.
>
> Yes. region number is fine. Will rename it as "region"
>
>> 
>> > +# @tag: Context field
>> 
>> What is this about?
>
> Based on the specification, it is "Context field utilized by implementations
> that make use of the Dynamic Capacity feature.". Basically, it is a
> string (label) attached to an dynamic capacity extent so we can achieve
> specific purpose, like identifying or grouping extents.

Include a reference to the spec here?

>> > +# @extents: Extents to add
>> 
>> Blank lines between argument descriptions, please.
>> 
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-add-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +            'hid': 'uint16',
>> > +            'selection-policy': 'uint8',
>> > +            'region-id': 'uint8',
>> > +            'tag': 'str',
>> > +            'extents': [ 'CXLDCExtentRecord' ]
>> > +           }
>> > +}
>> > +
>> > +##
>> > +# @cxl-release-dynamic-capacity:
>> > +#
>> > +# Command to start release dynamic capacity extents flow. The host will
>> > +# need to respond to indicate that it has released the capacity before it
>> > +# is made unavailable for read and write and can be re-added.
>> 
>> This text needs work.  More on that at the end of my review.
>
> Will do.
>
>> 
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>> 
>> My comment on cxl-add-dynamic-capacity applies.
>> 
>> > +# @hid: host id
>> 
>> Likewise.
>> 
>> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
>> > +#     sanitize on release, bit[7:6] reserved
>> 
>> Where are these flags defined?
>
> Defined in the CXL specification, it defines the release behaviour.

Include a reference to the spec here?

Is the numeric encoding of flags appropriate?

In general, we prefer symbolic encodings.  Numeric encodings can make
sense when

• the encoding is stable, and

• QEMU doesn't need to decode it, only pass it on to something else, and

• both the QMP client and the "something else" prefer a numeric
  encoding.

>> > +# @region-id: id of the region where the extent to release
>> 
>> My comment on cxl-add-dynamic-capacity applies.
>> 
>> > +# @tag: Context field
>> 
>> Likewise.
>> 
>> > +# @extents: Extents to release
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-release-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +            'hid': 'uint16',
>> > +            'flags': 'uint8',
>> > +            'region-id': 'uint8',
>> > +            'tag': 'str',
>> > +            'extents': [ 'CXLDCExtentRecord' ]
>> > +           }
>> > +}
>> 
>> During review of v5, you wrote:
>> 
>>     For add command, the host will send a mailbox command to response to
>>     the add request to the device to indicate whether it accepts the add
>>     capacity offer or not.
>>     
>>     For release command, the host send a mailbox command (not always a
>>     response since the host can proactively release capacity if it does
>>     not need it any more) to device to ask device release the capacity.
>> 
>> Can you briefly sketch the protocol?  Peers and messages involved.
>> Possibly as a state diagram.
>
> Need to think about it. If we can polish the text nicely, maybe the
> sketch is not needed. My concern is that the sketch may
> introduce unwanted complexity as we expose too much details. The two
> commands provide ways to add/release dynamic capacity to/from a host,
> that is all. All the other information, like what the host will do, or
> how the device will react, are consequence of the command, not sure
> whether we want to include here.

The protocol sketch is for me, not necessarily the doc comment.  I'd
like to understand at high level how this stuff works, because only then
can I meaningfully review the docs.

> @Jonathan, Any thoughts on this?

Thanks!
Fan Ni April 30, 2024, 5:17 p.m. UTC | #6
On Mon, Apr 29, 2024 at 09:58:42AM +0200, Markus Armbruster wrote:
> fan <nifan.cxl@gmail.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
> >> nifan.cxl@gmail.com writes:
> 
> [...]
> 
> >> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> >> > index 4281726dec..2dcf03d973 100644
> >> > --- a/qapi/cxl.json
> >> > +++ b/qapi/cxl.json
> >> > @@ -361,3 +361,72 @@
> >> >  ##
> >> >  {'command': 'cxl-inject-correctable-error',
> >> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> >> > +
> >> > +##
> >> > +# @CXLDCExtentRecord:
> >> 
> >> Such traffic jams of capital letters are hard to read.  What about
> >> CxlDynamicCapacityExtent?
> >> 
> >> > +#
> >> > +# Record of a single extent to add/release
> >> 
> >> Suggest "A dynamic capacity extent."
> >> 
> >> > +#
> >> > +# @offset: offset to the start of the region where the extent to be operated
> >> 
> >> Blank line here, please.
> >> 
> >> 
> >> 
> >> > +# @len: length of the extent
> >> > +#
> >> > +# Since: 9.1
> >> > +##
> >> > +{ 'struct': 'CXLDCExtentRecord',
> >> > +  'data': {
> >> > +      'offset':'uint64',
> >> > +      'len': 'uint64'
> >> > +  }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-add-dynamic-capacity:
> >> > +#
> >> > +# Command to start add dynamic capacity extents flow. The device will
> >> > +# have to acknowledged the acceptance of the extents before they are usable.
> >> 
> >> This text needs work.  More on that at the end of my review.
> >
> > Yes. I will work on it for the next version once all the feedbacks
> > are collected and comments are resolved.
> >
> > See below.
> >
> >> 
> >> docs/devel/qapi-code-gen.rst:
> >> 
> >>     For legibility, wrap text paragraphs so every line is at most 70
> >>     characters long.
> >> 
> >>     Separate sentences with two spaces.
> >> 
> >> More elsewhere.
> >> 
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path
> >> 
> >> I'd prefer @qom-path, unless you can make a consistency argument for
> >> @path.
> >> 
> >> Sure the QOM path needs to be canonical?
> >> 
> >> If not, what about "path to the CXL dynamic capacity device in the QOM
> >> tree".  Intentionally close to existing descriptions of @qom-path
> >> elsewhere.
> >
> > From the same file, I saw "path" was used for other commands, like
> > "cxl-inject-memory-module-event", so I followed it.
> > DCD is nothing different from "type 3 device" expect it can dynamically
> > change capacity. 
> > Renaming it to "qom-path" is no problem for me, just want to make sure it
> > will not break the naming consistency.
> 
> Both @path and @qom-path are used (sadly).  @path is used for all kinds
> of paths, whereas @qom-path is only used for QOM paths.  That's why I
> prefer it.
> 
> However, you're making a compelling local consistency argument: cxl.json
> uses only @path.  Sticking to that makes sense.
> 
> >> > +# @hid: host id
> >> 
> >> @host-id, unless "HID" is established terminology in CXL DCD land.
> >
> > host-id works.
> >> 
> >> What is a host ID?
> >
> > It is an id identifying the host to which the capacity is being added.
> 
> How are these IDs assigned?

All the arguments passed to the command here are defined in CXL spec. I
will add reference to the spec.

Based on the spec, for LD-FAM (Fabric attached memory represented as
logical device), host id is the LD-ID of the host interface to which
the capacity is being added. LD-ID is a unique number (16-bit) assigned
to a host interface.

> 
> >> > +# @selection-policy: policy to use for selecting extents for adding capacity
> >> 
> >> Where are selection policies defined?
> >
> > It is defined in CXL specification: Specifies the policy to use for selecting
> > which extents comprise the added capacity
> 
> Include a reference to the spec here?
Wil do.
> 
> >> > +# @region-id: id of the region where the extent to add
> >> 
> >> Is "region ID" the established terminology in CXL DCD land?  Or is
> >> "region number" also used?  I'm asking because "ID" in this QEMU device
> >> context suggests a connection to a qdev ID.
> >> 
> >> If region number is fine, I'd rename to just @region, and rephrase the
> >> description to avoid "ID".  Perhaps "number of the region the extent is
> >> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> >> roll off the tongue, but "where the extent to add" sounds worse to my
> >> ears.  Mind, I'm not a native speaker.
> >
> > Yes. region number is fine. Will rename it as "region"
> >
> >> 
> >> > +# @tag: Context field
> >> 
> >> What is this about?
> >
> > Based on the specification, it is "Context field utilized by implementations
> > that make use of the Dynamic Capacity feature.". Basically, it is a
> > string (label) attached to an dynamic capacity extent so we can achieve
> > specific purpose, like identifying or grouping extents.
> 
> Include a reference to the spec here?
Will do.
> 
> >> > +# @extents: Extents to add
> >> 
> >> Blank lines between argument descriptions, please.
> >> 
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-add-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'selection-policy': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-release-dynamic-capacity:
> >> > +#
> >> > +# Command to start release dynamic capacity extents flow. The host will
> >> > +# need to respond to indicate that it has released the capacity before it
> >> > +# is made unavailable for read and write and can be re-added.
> >> 
> >> This text needs work.  More on that at the end of my review.
> >
> > Will do.
> >
> >> 
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >> 
> >> > +# @hid: host id
> >> 
> >> Likewise.
> >> 
> >> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> >> > +#     sanitize on release, bit[7:6] reserved
> >> 
> >> Where are these flags defined?
> >
> > Defined in the CXL specification, it defines the release behaviour.
> 
> Include a reference to the spec here?
Will do.
> 
> Is the numeric encoding of flags appropriate?
> 
> In general, we prefer symbolic encodings.  Numeric encodings can make
> sense when
> 
> • the encoding is stable, and
> 
> • QEMU doesn't need to decode it, only pass it on to something else, and
> 
> • both the QMP client and the "something else" prefer a numeric
>   encoding.
The encoding is from the specification, and we do not invent anything
here. It is stable and all the updates to the spec need to be backward
compatible.
> 
> >> > +# @region-id: id of the region where the extent to release
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >> 
> >> > +# @tag: Context field
> >> 
> >> Likewise.
> >> 
> >> > +# @extents: Extents to release
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-release-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'flags': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> 
> >> During review of v5, you wrote:
> >> 
> >>     For add command, the host will send a mailbox command to response to
> >>     the add request to the device to indicate whether it accepts the add
> >>     capacity offer or not.
> >>     
> >>     For release command, the host send a mailbox command (not always a
> >>     response since the host can proactively release capacity if it does
> >>     not need it any more) to device to ask device release the capacity.
> >> 
> >> Can you briefly sketch the protocol?  Peers and messages involved.
> >> Possibly as a state diagram.
> >
> > Need to think about it. If we can polish the text nicely, maybe the
> > sketch is not needed. My concern is that the sketch may
> > introduce unwanted complexity as we expose too much details. The two
> > commands provide ways to add/release dynamic capacity to/from a host,
> > that is all. All the other information, like what the host will do, or
> > how the device will react, are consequence of the command, not sure
> > whether we want to include here.
> 
> The protocol sketch is for me, not necessarily the doc comment.  I'd
> like to understand at high level how this stuff works, because only then
> can I meaningfully review the docs.

--------------------------------
For add command, saying a user sends a request to FM to ask to add
extent A of the device (managed by FM) to host 0.
The function cxl-add-dynamic-capacity simulates what FM needs to do.
1. Verify extent A is valid (behaviour defined by the spec), return
error if not; otherwise,
2. Add a record to the device's event log (indicating the intent to
add extent A to host 0), update device internal extent tracking status,
signal an interrupt to host 0;
(The above step 1 & 2 are performed in the QMP interface, following
operations are QMP irrelevant, only host and device involved.)
3. Once the interrupt is received, host 0 fetch the event record from
the device's event log through some mailbox command (out of scope
of this patch series).
4. Host 0 decides whether it accepts extent A or not. Whether accept or
reject, host needs to send a response (add-response mailbox command) to
the device so the device can update its internal extent tracking
status accordingly.
The device return a value to the host showing whether the response is
successful or failed.
5. Based on the mailbox command return value, the host process
accordingly.
6. The host sends a mailbox command to the device to clear the event
record in the device's event log. 

---------------------------------
For release command, saying a user sends a request to FM to ask host 0
to release extent A and return it back to the device (managed by FM).

The function cxl-release-dynamic-capacity simulates what FM needs to do.
1. Verify extent A is valid (defined by the spec), return error if not;
otherwise,
2. Add a record to the event log (indicating the intent to
release extent A from host 0), signal an interrupt to host 0;
(The above step 1 & 2 are performed in the QMP interface, following
operations are QMP irrelevant, only host and device involved.
3. Once the interrupt is received, host 0 fetch the event record from
the device's event log through some mailbox command (out of scope
of this patch series).
4. Host 0 decides whether it can release extent A or not. Whether can or
cannot release, host needs to send a release (mailbox command) to the device
so the device can update its internal extent tracking status accordingly.
The device returns a value to host 0 showing whether the release is
successful or failed.
5. Based on the returned value, the host process accordingly.
6. The host sends mailbox command to clear the event record in the
device's event log. 

For release command, it is more complicated. Based on the release flag
passed to FM, FM can behaviour differently. For example, if the
forced-removal flag is set, FM can directly get the extent back from a
host for other uses without waiting for the host to send command to the
device. For the above step 2, their may be not event record to the event
log (no supported in this patch series yet).

Also, for the release interface here, it simulates FM initializes the
release request.
There is another case where the host can proactively release extents it
do not need any more back to device. However, this case is out of the
scope of this release interface.

Hope the above text helps a little for the context here.
Let me know if further clarification is needed.

Thanks,
Fan



> 
> > @Jonathan, Any thoughts on this?
> 
> Thanks!
>
Jonathan Cameron April 30, 2024, 5:21 p.m. UTC | #7
On Mon, 29 Apr 2024 09:58:42 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> fan <nifan.cxl@gmail.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:  
> >> nifan.cxl@gmail.com writes:  
> 
> [...]
> 
> >> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> >> > index 4281726dec..2dcf03d973 100644
> >> > --- a/qapi/cxl.json
> >> > +++ b/qapi/cxl.json
> >> > @@ -361,3 +361,72 @@
> >> >  ##
> >> >  {'command': 'cxl-inject-correctable-error',
> >> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> >> > +
> >> > +##
> >> > +# @CXLDCExtentRecord:  
> >> 
> >> Such traffic jams of capital letters are hard to read.  What about
> >> CxlDynamicCapacityExtent?
> >>   
> >> > +#
> >> > +# Record of a single extent to add/release  
> >> 
> >> Suggest "A dynamic capacity extent."
> >>   
> >> > +#
> >> > +# @offset: offset to the start of the region where the extent to be operated  
> >> 
> >> Blank line here, please.
> >> 
> >> 
> >>   
> >> > +# @len: length of the extent
> >> > +#
> >> > +# Since: 9.1
> >> > +##
> >> > +{ 'struct': 'CXLDCExtentRecord',
> >> > +  'data': {
> >> > +      'offset':'uint64',
> >> > +      'len': 'uint64'
> >> > +  }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-add-dynamic-capacity:
> >> > +#
> >> > +# Command to start add dynamic capacity extents flow. The device will
> >> > +# have to acknowledged the acceptance of the extents before they are usable.  
> >> 
> >> This text needs work.  More on that at the end of my review.  
> >
> > Yes. I will work on it for the next version once all the feedbacks
> > are collected and comments are resolved.
> >
> > See below.
> >  
> >> 
> >> docs/devel/qapi-code-gen.rst:
> >> 
> >>     For legibility, wrap text paragraphs so every line is at most 70
> >>     characters long.
> >> 
> >>     Separate sentences with two spaces.
> >> 
> >> More elsewhere.
> >>   
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path  
> >> 
> >> I'd prefer @qom-path, unless you can make a consistency argument for
> >> @path.
> >> 
> >> Sure the QOM path needs to be canonical?
> >> 
> >> If not, what about "path to the CXL dynamic capacity device in the QOM
> >> tree".  Intentionally close to existing descriptions of @qom-path
> >> elsewhere.  
> >
> > From the same file, I saw "path" was used for other commands, like
> > "cxl-inject-memory-module-event", so I followed it.
> > DCD is nothing different from "type 3 device" expect it can dynamically
> > change capacity. 
> > Renaming it to "qom-path" is no problem for me, just want to make sure it
> > will not break the naming consistency.  
> 
> Both @path and @qom-path are used (sadly).  @path is used for all kinds
> of paths, whereas @qom-path is only used for QOM paths.  That's why I
> prefer it.
> 
> However, you're making a compelling local consistency argument: cxl.json
> uses only @path.  Sticking to that makes sense.
> 
> >> > +# @hid: host id  
> >> 
> >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> >
> > host-id works.  
> >> 
> >> What is a host ID?  
> >
> > It is an id identifying the host to which the capacity is being added.  
> 
> How are these IDs assigned?

Right now there is only 1 option.  We can drop this for now and introduce
it when needed (Default of 0 will be fine).  Multi head device patches
that will need this are on list though I haven't read them yet :(

> 
> >> > +# @selection-policy: policy to use for selecting extents for adding capacity  
> >> 
> >> Where are selection policies defined?  
> >
> > It is defined in CXL specification: Specifies the policy to use for selecting
> > which extents comprise the added capacity  
> 
> Include a reference to the spec here?
> 
> >> > +# @region-id: id of the region where the extent to add  
> >> 
> >> Is "region ID" the established terminology in CXL DCD land?  Or is
> >> "region number" also used?  I'm asking because "ID" in this QEMU device
> >> context suggests a connection to a qdev ID.
> >> 
> >> If region number is fine, I'd rename to just @region, and rephrase the
> >> description to avoid "ID".  Perhaps "number of the region the extent is
> >> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> >> roll off the tongue, but "where the extent to add" sounds worse to my
> >> ears.  Mind, I'm not a native speaker.  
> >
> > Yes. region number is fine. Will rename it as "region"
> >  
> >>   
> >> > +# @tag: Context field  
> >> 
> >> What is this about?  
> >
> > Based on the specification, it is "Context field utilized by implementations
> > that make use of the Dynamic Capacity feature.". Basically, it is a
> > string (label) attached to an dynamic capacity extent so we can achieve
> > specific purpose, like identifying or grouping extents.  
> 
> Include a reference to the spec here?

Agreed - that is the best we can do. It'sa  magic value.

> 
> >> > +# @extents: Extents to add  
> >> 
> >> Blank lines between argument descriptions, please.
> >>   
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-add-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'selection-policy': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-release-dynamic-capacity:
> >> > +#
> >> > +# Command to start release dynamic capacity extents flow. The host will
> >> > +# need to respond to indicate that it has released the capacity before it
> >> > +# is made unavailable for read and write and can be re-added.  
> >> 
> >> This text needs work.  More on that at the end of my review.  
> >
> > Will do.
> >  
> >>   
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path  
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >>   
> >> > +# @hid: host id  
> >> 
> >> Likewise.
> >>   
> >> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> >> > +#     sanitize on release, bit[7:6] reserved  
> >> 
> >> Where are these flags defined?  
> >
> > Defined in the CXL specification, it defines the release behaviour.  
> 
> Include a reference to the spec here?
> 
> Is the numeric encoding of flags appropriate?

Could definitely break them out as a bunch of flags / symbolic for
the policy.

> 
> In general, we prefer symbolic encodings.  Numeric encodings can make
> sense when
> 
> • the encoding is stable, and
> 
> • QEMU doesn't need to decode it, only pass it on to something else, and
> 
> • both the QMP client and the "something else" prefer a numeric
>   encoding.

I don't think that really applies here - though Gregory's shim from
MCTP to this will have to go through a simple dance to fill them in.

> 
> >> > +# @region-id: id of the region where the extent to release  
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >>   
> >> > +# @tag: Context field  
> >> 
> >> Likewise.
> >>   
> >> > +# @extents: Extents to release
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-release-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'flags': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}  
> >> 
> >> During review of v5, you wrote:
> >> 
> >>     For add command, the host will send a mailbox command to response to
> >>     the add request to the device to indicate whether it accepts the add
> >>     capacity offer or not.
> >>     
> >>     For release command, the host send a mailbox command (not always a
> >>     response since the host can proactively release capacity if it does
> >>     not need it any more) to device to ask device release the capacity.
> >> 
> >> Can you briefly sketch the protocol?  Peers and messages involved.
> >> Possibly as a state diagram.  
> >
> > Need to think about it. If we can polish the text nicely, maybe the
> > sketch is not needed. My concern is that the sketch may
> > introduce unwanted complexity as we expose too much details. The two
> > commands provide ways to add/release dynamic capacity to/from a host,
> > that is all. All the other information, like what the host will do, or
> > how the device will react, are consequence of the command, not sure
> > whether we want to include here.  
> 
> The protocol sketch is for me, not necessarily the doc comment.  I'd
> like to understand at high level how this stuff works, because only then
> can I meaningfully review the docs.
> 
> > @Jonathan, Any thoughts on this? 
Makes sense to have a bit of artwork to explain what is going on.
Suitable stuff for the cover letter or patch description for v8
as well as in reply here.  Simple flows should be enough, we don't
need to worry on the messy corner cases (hopefully)

1) Offer extents to a host  + it accepts.
2) Ask for it back, it gives it back.

I can put my non existent artistic talents on it later in the
week if Fan doesn't get there first.

> 
> Thanks!
>
Jonathan Cameron May 1, 2024, 2:58 p.m. UTC | #8
> > >> > +# @hid: host id  
> > >> 
> > >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> > >
> > > host-id works.  
> > >> 
> > >> What is a host ID?  
> > >
> > > It is an id identifying the host to which the capacity is being added.  
> > 
> > How are these IDs assigned?  
> 
> All the arguments passed to the command here are defined in CXL spec. I
> will add reference to the spec.
> 
> Based on the spec, for LD-FAM (Fabric attached memory represented as
> logical device), host id is the LD-ID of the host interface to which
> the capacity is being added. LD-ID is a unique number (16-bit) assigned
> to a host interface.

Key here is the host doesn't know it.  This ID exists purely for rooting
to the appropriate host interface either via choosing a port on a
multihead Single Logical Device (SLD) (so today it's always 0 as we only
have one head) or if we ever implement a switch capable of handling MLDs
then the switch will handle routing of host PCIe accesses so it lands
on the interface defined by this ID (and the event turns up in that event log.

            Host A         Host B - could in theory be a RP on host A ;)
              |              |  Doesn't exist (yet, but there are partial.
             _|______________|_ patches for this on list.
            | LD 0         LD 1|
            |                  |
            |   Multi Head     |
            |   Single Logical |
            |  Device (MH-SLD) |
            |__________________|
Host view similar to the switch case, but just two direct
connected devices.

Or Switch and MLD case - we aren't emulating this yet at all

     Wiring / real topology                 Host View 
         
      Host A     Host B              Host A       Host B
        |          |                   |            |
     ___|__________|___               _|_          _|_
    |   \  SWITCH /    |             |SW0|        | | |
    |    \       /     |             | | |        | | |
    |    LD0   LD1     |             | | |        | | |
    |      \   /       |             | | |        | | |
    |        |         |             | | |        | | |
    |________|_________|             |_|_|        |_|_|
             |                         |            |
      Traffic tagged with LD           |            |
             |                         |            |
     ________|________________     ____|___     ____|___
    | Multilogical Device MLD |   |        |   |        |
    |        |                |   | Simple |   | Another|
    |       / \               |   | CXL    |   | CXL    |
    |      /   \              |   | Memory |   | Memory |
    |    Interfaces           |   | Device |   | Device |
    |   LD0     LD1           |   |        |   |        |
    |_________________________|   |________|   |________|

Note the hosts just see separate devices and switches with the fun exception that the
memory may actually be available to both at the same time.

Control plane for the switches and MLD see what is actually going on.

At this stage upshot is we could just default this to zero and add an optional
parameter to set it later.



...

> > >> > +# @extents: Extents to release
> > >> > +#
> > >> > +# Since : 9.1
> > >> > +##
> > >> > +{ 'command': 'cxl-release-dynamic-capacity',
> > >> > +  'data': { 'path': 'str',
> > >> > +            'hid': 'uint16',
> > >> > +            'flags': 'uint8',
> > >> > +            'region-id': 'uint8',
> > >> > +            'tag': 'str',
> > >> > +            'extents': [ 'CXLDCExtentRecord' ]
> > >> > +           }
> > >> > +}  
> > >> 
> > >> During review of v5, you wrote:
> > >> 
> > >>     For add command, the host will send a mailbox command to response to
> > >>     the add request to the device to indicate whether it accepts the add
> > >>     capacity offer or not.
> > >>     
> > >>     For release command, the host send a mailbox command (not always a
> > >>     response since the host can proactively release capacity if it does
> > >>     not need it any more) to device to ask device release the capacity.
> > >> 
> > >> Can you briefly sketch the protocol?  Peers and messages involved.
> > >> Possibly as a state diagram.  
> > >
> > > Need to think about it. If we can polish the text nicely, maybe the
> > > sketch is not needed. My concern is that the sketch may
> > > introduce unwanted complexity as we expose too much details. The two
> > > commands provide ways to add/release dynamic capacity to/from a host,
> > > that is all. All the other information, like what the host will do, or
> > > how the device will react, are consequence of the command, not sure
> > > whether we want to include here.  
> > 
> > The protocol sketch is for me, not necessarily the doc comment.  I'd
> > like to understand at high level how this stuff works, because only then
> > can I meaningfully review the docs.  
> 
> --------------------------------
> For add command, saying a user sends a request to FM to ask to add
> extent A of the device (managed by FM) to host 0.
> The function cxl-add-dynamic-capacity simulates what FM needs to do.

This gets a little fiddly as an explanation.  I'd argue this is more or
less at the level of the FM to device command flow so it's the device
verifying etc. (you could explain this interface as talking to an FM
that is talking to the device, but that just feels complicated to me).

> 1. Verify extent A is valid (behaviour defined by the spec), return
> error if not; otherwise,
> 2. Add a record to the device's event log (indicating the intent to
> add extent A to host 0), update device internal extent tracking status,
> signal an interrupt to host 0;
> (The above step 1 & 2 are performed in the QMP interface, following
> operations are QMP irrelevant, only host and device involved.)

In this patch.

> 3. Once the interrupt is received, host 0 fetch the event record from
> the device's event log through some mailbox command (out of scope
> of this patch series).

It's in patch 8.

> 4. Host 0 decides whether it accepts extent A or not. Whether accept or
> reject, host needs to send a response (add-response mailbox command) to
> the device so the device can update its internal extent tracking
> status accordingly.
> The device return a value to the host showing whether the response is
> successful or failed.

(assuming the host isn't buggy this always succeeds)

> 5. Based on the mailbox command return value, the host process
> accordingly.

Memory now useable by host if it accepted it successfully.

> 6. The host sends a mailbox command to the device to clear the event
> record in the device's event log. 
> 
> ---------------------------------
> For release command, saying a user sends a request to FM to ask host 0
> to release extent A and return it back to the device (managed by FM).
> 
> The function cxl-release-dynamic-capacity simulates what FM needs to do.
> 1. Verify extent A is valid (defined by the spec), return error if not;
> otherwise,
> 2. Add a record to the event log (indicating the intent to
> release extent A from host 0), signal an interrupt to host 0;
> (The above step 1 & 2 are performed in the QMP interface, following
> operations are QMP irrelevant, only host and device involved.
> 3. Once the interrupt is received, host 0 fetch the event record from
> the device's event log through some mailbox command (out of scope
> of this patch series).
> 4. Host 0 decides whether it can release extent A or not. Whether can or
> cannot release, host needs to send a release (mailbox command) to the device
> so the device can update its internal extent tracking status accordingly.
> The device returns a value to host 0 showing whether the release is
> successful or failed.
> 5. Based on the returned value, the host process accordingly.
> 6. The host sends mailbox command to clear the event record in the
> device's event log. 
> 
> For release command, it is more complicated. Based on the release flag
> passed to FM, FM can behaviour differently. For example, if the
> forced-removal flag is set, FM can directly get the extent back from a
> host for other uses without waiting for the host to send command to the
> device. For the above step 2, their may be not event record to the event
> log (no supported in this patch series yet).
I thought we weren't doing force remove yet?  So for that we could
set default value as normal release until we add that support perhaps.

> 
> Also, for the release interface here, it simulates FM initializes the
> release request.
> There is another case where the host can proactively release extents it
> do not need any more back to device. However, this case is out of the
> scope of this release interface.
> 
> Hope the above text helps a little for the context here.
> Let me know if further clarification is needed.

Only thing I'd add is that for now (because we don't need it for testing
the kernel flows) is that this does not provide any way for external
agents (e.g. our 'fabric manager' to find out what the state is - i.e.
if the extents have been accepted by the host etc). That stuff is all
defined by the spec, but not yet in the QMP interface.  At somepoint
we may want to add that as a state query type interface.

Jonathan

p.s. Our emails raced yesterday, so great you put together this explanation
of the flows before I got to it :)

> 
> Thanks,
> Fan
> 
> 
> 
> >   
> > > @Jonathan, Any thoughts on this?  
> > 
> > Thanks!
> >
Fan Ni May 1, 2024, 10:29 p.m. UTC | #9
From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
From: Fan Ni <fan.ni@samsung.com>
Date: Tue, 20 Feb 2024 09:48:31 -0800
Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
 capacity extents
Status: RO
Content-Length: 25172
Lines: 731

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we allow to release an extent only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two continuous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "selection-policy": 2,
      "region": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "flags": 1,
      "region": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
 hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  20 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  90 +++++++++++
 6 files changed, 510 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9d54e10cd4..3569902e9e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: return the group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list*/
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c2cdd6d506..1bae3711a0 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDynamicCapacityExtentList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDynamicCapacityExtentList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible  to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
+                                  uint8_t sel_policy, uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList  *extents,
+                                  Error **errp)
+{
+    enum {
+        CXL_SEL_POLICY_FREE,
+        CXL_SEL_POLICY_CONTIGUOUS,
+        CXL_SEL_POLICY_PRESCRIPTIVE,
+        CXL_SEL_POLICY_ENABLESHAREDACCESS,
+    };
+    switch (sel_policy) {
+    case CXL_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+#define REMOVAL_POLICY_MASK 0xf
+#define REMOVAL_POLICY_PRESCRIPTIVE 1
+#define FORCED_REMOVAL_BIT BIT(4)
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      uint8_t flags, uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList  *extents,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (flags & FORCED_REMOVAL_BIT) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (flags & REMOVAL_POLICY_MASK) {
+    case REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id, type,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..9df530ceec 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t host_id,
+                                  uint8_t sel_policy,
+                                  uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList *extents,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      uint8_t flags, uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList *extents,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..27cf39f448 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,93 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDynamicCapacityExtent:
+#
+# A single dynamic capacity extent
+#
+# @offset: The offset (in bytes) to the start of the region
+#     where the extent belongs to
+#
+# @len: The length of the extent in bytes
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDynamicCapacityExtent',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to initiate to add dynamic capacity extents to a host.  It
+# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
+#
+# @path: CXL DCD canonical QOM path
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# @selection-policy: The "Selection Policy" bits as defined in
+#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
+#     selecting which extents comprise the added capacity.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-70.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'selection-policy': 'uint8',
+            'region': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to initiate to release dynamic capacity extents from a
+# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
+#
+# @path: CXL DCD canonical QOM path
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# @flags: The "Flags" field as defined in cxl spec r3.1 Table 7-71,
+#     with bit[3:0] for removal policy, bit[4] for forced removal,
+#     bit[5] for sanitize on release, bit[7:6] reserved.
+#
+# @region: The dynamic capacity region where the extents will be
+#     released.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'flags': 'uint8',
+            'region': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
Fan Ni May 1, 2024, 10:36 p.m. UTC | #10
Hi Markus, Michael and Jonathan,

FYI. I have updated this patch based on the feedbacks so far, and posted here:
https://lore.kernel.org/linux-cxl/20240418232902.583744-1-fan.ni@samsung.com/T/#ma25b6657597d39df23341dc43c22a8c49818e5f9

Comments are welcomed and appreciated.

Fan


On Wed, May 01, 2024 at 03:58:12PM +0100, Jonathan Cameron wrote:
> 
> 
> 
> > > >> > +# @hid: host id  
> > > >> 
> > > >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> > > >
> > > > host-id works.  
> > > >> 
> > > >> What is a host ID?  
> > > >
> > > > It is an id identifying the host to which the capacity is being added.  
> > > 
> > > How are these IDs assigned?  
> > 
> > All the arguments passed to the command here are defined in CXL spec. I
> > will add reference to the spec.
> > 
> > Based on the spec, for LD-FAM (Fabric attached memory represented as
> > logical device), host id is the LD-ID of the host interface to which
> > the capacity is being added. LD-ID is a unique number (16-bit) assigned
> > to a host interface.
> 
> Key here is the host doesn't know it.  This ID exists purely for rooting
> to the appropriate host interface either via choosing a port on a
> multihead Single Logical Device (SLD) (so today it's always 0 as we only
> have one head) or if we ever implement a switch capable of handling MLDs
> then the switch will handle routing of host PCIe accesses so it lands
> on the interface defined by this ID (and the event turns up in that event log.
> 
>             Host A         Host B - could in theory be a RP on host A ;)
>               |              |  Doesn't exist (yet, but there are partial.
>              _|______________|_ patches for this on list.
>             | LD 0         LD 1|
>             |                  |
>             |   Multi Head     |
>             |   Single Logical |
>             |  Device (MH-SLD) |
>             |__________________|
> Host view similar to the switch case, but just two direct
> connected devices.
> 
> Or Switch and MLD case - we aren't emulating this yet at all
> 
>      Wiring / real topology                 Host View 
>          
>       Host A     Host B              Host A       Host B
>         |          |                   |            |
>      ___|__________|___               _|_          _|_
>     |   \  SWITCH /    |             |SW0|        | | |
>     |    \       /     |             | | |        | | |
>     |    LD0   LD1     |             | | |        | | |
>     |      \   /       |             | | |        | | |
>     |        |         |             | | |        | | |
>     |________|_________|             |_|_|        |_|_|
>              |                         |            |
>       Traffic tagged with LD           |            |
>              |                         |            |
>      ________|________________     ____|___     ____|___
>     | Multilogical Device MLD |   |        |   |        |
>     |        |                |   | Simple |   | Another|
>     |       / \               |   | CXL    |   | CXL    |
>     |      /   \              |   | Memory |   | Memory |
>     |    Interfaces           |   | Device |   | Device |
>     |   LD0     LD1           |   |        |   |        |
>     |_________________________|   |________|   |________|
> 
> Note the hosts just see separate devices and switches with the fun exception that the
> memory may actually be available to both at the same time.
> 
> Control plane for the switches and MLD see what is actually going on.
> 
> At this stage upshot is we could just default this to zero and add an optional
> parameter to set it later.
> 
> 
> 
> ...
> 
> > > >> > +# @extents: Extents to release
> > > >> > +#
> > > >> > +# Since : 9.1
> > > >> > +##
> > > >> > +{ 'command': 'cxl-release-dynamic-capacity',
> > > >> > +  'data': { 'path': 'str',
> > > >> > +            'hid': 'uint16',
> > > >> > +            'flags': 'uint8',
> > > >> > +            'region-id': 'uint8',
> > > >> > +            'tag': 'str',
> > > >> > +            'extents': [ 'CXLDCExtentRecord' ]
> > > >> > +           }
> > > >> > +}  
> > > >> 
> > > >> During review of v5, you wrote:
> > > >> 
> > > >>     For add command, the host will send a mailbox command to response to
> > > >>     the add request to the device to indicate whether it accepts the add
> > > >>     capacity offer or not.
> > > >>     
> > > >>     For release command, the host send a mailbox command (not always a
> > > >>     response since the host can proactively release capacity if it does
> > > >>     not need it any more) to device to ask device release the capacity.
> > > >> 
> > > >> Can you briefly sketch the protocol?  Peers and messages involved.
> > > >> Possibly as a state diagram.  
> > > >
> > > > Need to think about it. If we can polish the text nicely, maybe the
> > > > sketch is not needed. My concern is that the sketch may
> > > > introduce unwanted complexity as we expose too much details. The two
> > > > commands provide ways to add/release dynamic capacity to/from a host,
> > > > that is all. All the other information, like what the host will do, or
> > > > how the device will react, are consequence of the command, not sure
> > > > whether we want to include here.  
> > > 
> > > The protocol sketch is for me, not necessarily the doc comment.  I'd
> > > like to understand at high level how this stuff works, because only then
> > > can I meaningfully review the docs.  
> > 
> > --------------------------------
> > For add command, saying a user sends a request to FM to ask to add
> > extent A of the device (managed by FM) to host 0.
> > The function cxl-add-dynamic-capacity simulates what FM needs to do.
> 
> This gets a little fiddly as an explanation.  I'd argue this is more or
> less at the level of the FM to device command flow so it's the device
> verifying etc. (you could explain this interface as talking to an FM
> that is talking to the device, but that just feels complicated to me).
> 
> > 1. Verify extent A is valid (behaviour defined by the spec), return
> > error if not; otherwise,
> > 2. Add a record to the device's event log (indicating the intent to
> > add extent A to host 0), update device internal extent tracking status,
> > signal an interrupt to host 0;
> > (The above step 1 & 2 are performed in the QMP interface, following
> > operations are QMP irrelevant, only host and device involved.)
> 
> In this patch.
> 
> > 3. Once the interrupt is received, host 0 fetch the event record from
> > the device's event log through some mailbox command (out of scope
> > of this patch series).
> 
> It's in patch 8.
> 
> > 4. Host 0 decides whether it accepts extent A or not. Whether accept or
> > reject, host needs to send a response (add-response mailbox command) to
> > the device so the device can update its internal extent tracking
> > status accordingly.
> > The device return a value to the host showing whether the response is
> > successful or failed.
> 
> (assuming the host isn't buggy this always succeeds)
> 
> > 5. Based on the mailbox command return value, the host process
> > accordingly.
> 
> Memory now useable by host if it accepted it successfully.
> 
> > 6. The host sends a mailbox command to the device to clear the event
> > record in the device's event log. 
> > 
> > ---------------------------------
> > For release command, saying a user sends a request to FM to ask host 0
> > to release extent A and return it back to the device (managed by FM).
> > 
> > The function cxl-release-dynamic-capacity simulates what FM needs to do.
> > 1. Verify extent A is valid (defined by the spec), return error if not;
> > otherwise,
> > 2. Add a record to the event log (indicating the intent to
> > release extent A from host 0), signal an interrupt to host 0;
> > (The above step 1 & 2 are performed in the QMP interface, following
> > operations are QMP irrelevant, only host and device involved.
> > 3. Once the interrupt is received, host 0 fetch the event record from
> > the device's event log through some mailbox command (out of scope
> > of this patch series).
> > 4. Host 0 decides whether it can release extent A or not. Whether can or
> > cannot release, host needs to send a release (mailbox command) to the device
> > so the device can update its internal extent tracking status accordingly.
> > The device returns a value to host 0 showing whether the release is
> > successful or failed.
> > 5. Based on the returned value, the host process accordingly.
> > 6. The host sends mailbox command to clear the event record in the
> > device's event log. 
> > 
> > For release command, it is more complicated. Based on the release flag
> > passed to FM, FM can behaviour differently. For example, if the
> > forced-removal flag is set, FM can directly get the extent back from a
> > host for other uses without waiting for the host to send command to the
> > device. For the above step 2, their may be not event record to the event
> > log (no supported in this patch series yet).
> I thought we weren't doing force remove yet?  So for that we could
> set default value as normal release until we add that support perhaps.
> 
> > 
> > Also, for the release interface here, it simulates FM initializes the
> > release request.
> > There is another case where the host can proactively release extents it
> > do not need any more back to device. However, this case is out of the
> > scope of this release interface.
> > 
> > Hope the above text helps a little for the context here.
> > Let me know if further clarification is needed.
> 
> Only thing I'd add is that for now (because we don't need it for testing
> the kernel flows) is that this does not provide any way for external
> agents (e.g. our 'fabric manager' to find out what the state is - i.e.
> if the extents have been accepted by the host etc). That stuff is all
> defined by the spec, but not yet in the QMP interface.  At somepoint
> we may want to add that as a state query type interface.
> 
> Jonathan
> 
> p.s. Our emails raced yesterday, so great you put together this explanation
> of the flows before I got to it :)
> 
> > 
> > Thanks,
> > Fan
> > 
> > 
> > 
> > >   
> > > > @Jonathan, Any thoughts on this?  
> > > 
> > > Thanks!
> > >   
>
Zhijian Li (Fujitsu)" via May 14, 2024, 2:35 a.m. UTC | #11
On 19/04/2024 07:11, nifan.cxl@gmail.com wrote:
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }


Double *space* before 'to'
Jonathan Cameron May 20, 2024, 4:50 p.m. UTC | #12
On Wed, 1 May 2024 15:29:31 -0700
fan <nifan.cxl@gmail.com> wrote:

> From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
> From: Fan Ni <fan.ni@samsung.com>
> Date: Tue, 20 Feb 2024 09:48:31 -0800
> Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
>  capacity extents
> Status: RO
> Content-Length: 25172
> Lines: 731
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "host-id": 0,
>       "selection-policy": 2,
>       "region": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "host-id": 0,
>       "flags": 1,
>       "region": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Hi Fan,

A few trivial questions inline.  I don't feel particularly strongly
about breaking up the flags fields, but I'd like to understand your
reasoning for keeping them as single fields?

Is it mainly to keep aligned with the specification or something else?

Thanks,

Jonathan


>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..27cf39f448 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,93 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDynamicCapacityExtent:
> +#
> +# A single dynamic capacity extent
> +#
> +# @offset: The offset (in bytes) to the start of the region
> +#     where the extent belongs to
> +#
> +# @len: The length of the extent in bytes
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDynamicCapacityExtent',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to initiate to add dynamic capacity extents to a host.  It
> +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
> +#
> +# @path: CXL DCD canonical QOM path
> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +#     Table 7-70.
> +#
> +# @selection-policy: The "Selection Policy" bits as defined in
> +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> +#     selecting which extents comprise the added capacity.

Hmm. This one is defined as a selection of nameable choices.  Perhaps
worth an enum?  If we did do that, we'd also need to break the flags
on in the release flags below.


> +#
> +# @region: The "Region Number" field as defined in cxl spec r3.1
> +#     Table 7-70.  The dynamic capacity region where the capacity
> +#     is being added.  Valid range is from 0-7.
> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +#     Table 7-70.
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'host-id': 'uint16',
> +            'selection-policy': 'uint8',
> +            'region': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDynamicCapacityExtent' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to initiate to release dynamic capacity extents from a
> +# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
> +#
> +# @path: CXL DCD canonical QOM path
> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +#     Table 7-71.
> +#
> +# @flags: The "Flags" field as defined in cxl spec r3.1 Table 7-71,
> +#     with bit[3:0] for removal policy, bit[4] for forced removal,
> +#     bit[5] for sanitize on release, bit[7:6] reserved.

This can be nicely broken up into removal policy enum plus two flags.
It might be worth doing so to give a nicer interface?

> +#
> +# @region: The dynamic capacity region where the extents will be
> +#     released.

This has a better definition in the add dynamic capacity entry above.

> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +#     Table 7-71.
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'host-id': 'uint16',
> +            'flags': 'uint8',
> +            'region': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDynamicCapacityExtent' ]
> +           }
> +}
Fan Ni May 20, 2024, 5:55 p.m. UTC | #13
On Mon, May 20, 2024 at 05:50:12PM +0100, Jonathan Cameron wrote:
> On Wed, 1 May 2024 15:29:31 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
> >  capacity extents
> > Status: RO
> > Content-Length: 25172
> > Lines: 731
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "selection-policy": 2,
> >       "region": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "flags": 1,
> >       "region": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Hi Fan,
> 
> A few trivial questions inline.  I don't feel particularly strongly
> about breaking up the flags fields, but I'd like to understand your
> reasoning for keeping them as single fields?
> 
> Is it mainly to keep aligned with the specification or something else?
> 
> Thanks,
> 
> Jonathan
> 
> 
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 4281726dec..27cf39f448 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -361,3 +361,93 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDynamicCapacityExtent:
> > +#
> > +# A single dynamic capacity extent
> > +#
> > +# @offset: The offset (in bytes) to the start of the region
> > +#     where the extent belongs to
> > +#
> > +# @len: The length of the extent in bytes
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'CXLDynamicCapacityExtent',
> > +  'data': {
> > +      'offset':'uint64',
> > +      'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to initiate to add dynamic capacity extents to a host.  It
> > +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +#
> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> > +#     Table 7-70.
> > +#
> > +# @selection-policy: The "Selection Policy" bits as defined in
> > +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> > +#     selecting which extents comprise the added capacity.
> 
> Hmm. This one is defined as a selection of nameable choices.  Perhaps
> worth an enum?  If we did do that, we'd also need to break the flags
> on in the release flags below.

Initially, I defined a enum for selection policy. But for users who are
not familiar with CXL spec, I think the enum definition is not very clear to
to them without reading the spec, so I removed it. Also, there are some
reserved bits there, let it as uint8 may help keep the interface unchanged
if some of the bits are used in the future?

> 
> 
> > +#
> > +# @region: The "Region Number" field as defined in cxl spec r3.1
> > +#     Table 7-70.  The dynamic capacity region where the capacity
> > +#     is being added.  Valid range is from 0-7.
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +#     Table 7-70.
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'host-id': 'uint16',
> > +            'selection-policy': 'uint8',
> > +            'region': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to initiate to release dynamic capacity extents from a
> > +# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +#
> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> > +#     Table 7-71.
> > +#
> > +# @flags: The "Flags" field as defined in cxl spec r3.1 Table 7-71,
> > +#     with bit[3:0] for removal policy, bit[4] for forced removal,
> > +#     bit[5] for sanitize on release, bit[7:6] reserved.
> 
> This can be nicely broken up into removal policy enum plus two flags.
> It might be worth doing so to give a nicer interface?

So if we choose to do it this way, maybe we use enum for the above add
command also?

> 
> > +#
> > +# @region: The dynamic capacity region where the extents will be
> > +#     released.
> 
> This has a better definition in the add dynamic capacity entry above.

Yeah, will fix. 

Fan
> 
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +#     Table 7-71.
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'host-id': 'uint16',
> > +            'flags': 'uint8',
> > +            'region': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
> > +           }
> > +}
>
Fan Ni May 21, 2024, 11:32 p.m. UTC | #14
From 9d6d774ec973d22c0f662b32385345a88b14cc55 Mon Sep 17 00:00:00 2001
From: Fan Ni <fan.ni@samsung.com>
Date: Tue, 20 Feb 2024 09:48:31 -0800
Subject: [PATCH 11/14] hw/cxl/events: Add qmp interfaces to add/release
 dynamic capacity extents

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we allow to release an extent only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two continuous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "selection-policy": "prescriptive",
      "region": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "removal-policy":"prescriptive",
      "forced-removal": false,
      "sanitize-on-release": false,
      "region": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  62 ++++++--
 hw/mem/cxl_type3.c          | 304 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  23 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               | 143 +++++++++++++++++
 6 files changed, 559 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9d54e10cd4..ab71492697 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: the extent group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list */
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 7c9038938f..d4f73952c9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -673,6 +673,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -680,10 +681,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1448,7 +1458,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1699,6 +1708,299 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                 uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDynamicCapacityExtentList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDynamicCapacityExtentList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
+                                  CXLExtSelPolicy sel_policy, uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList  *extents,
+                                  Error **errp)
+{
+    switch (sel_policy) {
+    case CXL_EXT_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      CXLExtRemovalPolicy removal_policy,
+                                      bool forced_removal,
+                                      bool sanitize_on_release,
+                                      uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList  *extents,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (forced_removal) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (removal_policy) {
+    case CXL_EXT_REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id, type,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..02d15bc256 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,26 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t host_id,
+                                  CXLExtSelPolicy sel_policy,
+                                  uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList *extents,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      CXLExtRemovalPolicy removal_policy,
+                                      bool forced_removal,
+                                      bool sanitize_on_release,
+                                      uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList *extents,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..a3a900eb2e 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,146 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDynamicCapacityExtent:
+#
+# A single dynamic capacity extent
+#
+# @offset: The offset (in bytes) to the start of the region
+#     where the extent belongs to.
+#
+# @len: The length of the extent in bytes.
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDynamicCapacityExtent',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @CXLExtSelPolicy:
+#
+# The policy to use for selecting which extents comprise the added
+# capacity, as defined in cxl spec r3.1 Table 7-70.
+#
+# @free: 0h = Free
+#
+# @contiguous: 1h = Continuous
+#
+# @prescriptive: 2h = Prescriptive
+#
+# @enable-shared-access: 3h = Enable Shared Access
+#
+# Since: 9.1
+##
+{ 'enum': 'CXLExtSelPolicy',
+  'data': ['free',
+           'contiguous',
+           'prescriptive',
+           'enable-shared-access']
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to initiate to add dynamic capacity extents to a host.  It
+# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
+#
+# @path: CXL DCD canonical QOM path.
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# @selection-policy: The "Selection Policy" bits as defined in
+#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
+#     selecting which extents comprise the added capacity.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-70.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'selection-policy': 'CXLExtSelPolicy',
+            'region': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
+
+##
+# @CXLExtRemovalPolicy:
+#
+# The policy to use for selecting which extents comprise the released
+# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
+#
+# @tag-based: value = 0h.  Extents are selected by the device based
+#     on tag, with no requirement for contiguous extents.
+#
+# @prescriptive: value = 1h.  Extent list of capacity to release is
+#     included in the request payload.
+#
+# Since: 9.1
+##
+{ 'enum': 'CXLExtRemovalPolicy',
+  'data': ['tag-based',
+           'prescriptive']
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to initiate to release dynamic capacity extents from a
+# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
+#
+# @path: CXL DCD canonical QOM path.
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# @removal-policy: Bit[3:0] of the "Flags" field as defined in cxl
+#     spec r3.1 Table 7-71.
+#
+# @forced-removal: Bit[4] of the "Flags" field in cxl spec r3.1
+#     Table 7-71.  When set, device does not wait for a Release
+#     Dynamic Capacity command from the host.  Host immediately
+#     loses access to released capacity.
+#
+# @sanitize-on-release: Bit[5] of the "Flags" field in cxl spec r3.1
+#     Table 7-71.  When set, device should sanitize all released
+#     capacity as a result of this request.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-71.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'removal-policy': 'CXLExtRemovalPolicy',
+            'forced-removal': 'bool',
+            'sanitize-on-release': 'bool',
+            'region': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
Fan Ni May 21, 2024, 11:38 p.m. UTC | #15
On Mon, May 20, 2024 at 05:50:12PM +0100, Jonathan Cameron wrote:
> On Wed, 1 May 2024 15:29:31 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
> >  capacity extents
> > Status: RO
> > Content-Length: 25172
> > Lines: 731
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "selection-policy": 2,
> >       "region": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "flags": 1,
> >       "region": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Hi Fan,
> 
> A few trivial questions inline.  I don't feel particularly strongly
> about breaking up the flags fields, but I'd like to understand your
> reasoning for keeping them as single fields?
> 
> Is it mainly to keep aligned with the specification or something else?
> 
> Thanks,
> 
> Jonathan

Hi Jonathan,
FYI.
I just sent out the updated QAPI patch with selection policy defined as
enum and removal policy split out in this thread,
https://lore.kernel.org/linux-cxl/5856b7a4-4082-465f-9f61-b1ec6c35ef0f@fujitsu.com/T/#m9838d6afda49fb26eb90526eae5550256f5d0f56

Planning to send out v8 on Thursday.

Fan

> 
> 
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 4281726dec..27cf39f448 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -361,3 +361,93 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDynamicCapacityExtent:
> > +#
> > +# A single dynamic capacity extent
> > +#
> > +# @offset: The offset (in bytes) to the start of the region
> > +#     where the extent belongs to
> > +#
> > +# @len: The length of the extent in bytes
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'CXLDynamicCapacityExtent',
> > +  'data': {
> > +      'offset':'uint64',
> > +      'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to initiate to add dynamic capacity extents to a host.  It
> > +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +#
> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> > +#     Table 7-70.
> > +#
> > +# @selection-policy: The "Selection Policy" bits as defined in
> > +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> > +#     selecting which extents comprise the added capacity.
> 
> Hmm. This one is defined as a selection of nameable choices.  Perhaps
> worth an enum?  If we did do that, we'd also need to break the flags
> on in the release flags below.
> 
> 
> > +#
> > +# @region: The "Region Number" field as defined in cxl spec r3.1
> > +#     Table 7-70.  The dynamic capacity region where the capacity
> > +#     is being added.  Valid range is from 0-7.
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +#     Table 7-70.
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'host-id': 'uint16',
> > +            'selection-policy': 'uint8',
> > +            'region': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to initiate to release dynamic capacity extents from a
> > +# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +#
> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> > +#     Table 7-71.
> > +#
> > +# @flags: The "Flags" field as defined in cxl spec r3.1 Table 7-71,
> > +#     with bit[3:0] for removal policy, bit[4] for forced removal,
> > +#     bit[5] for sanitize on release, bit[7:6] reserved.
> 
> This can be nicely broken up into removal policy enum plus two flags.
> It might be worth doing so to give a nicer interface?
> 
> > +#
> > +# @region: The dynamic capacity region where the extents will be
> > +#     released.
> 
> This has a better definition in the add dynamic capacity entry above.
> 
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +#     Table 7-71.
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'host-id': 'uint16',
> > +            'flags': 'uint8',
> > +            'region': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
> > +           }
> > +}
>
Jonathan Cameron May 23, 2024, 3:31 p.m. UTC | #16
On Tue, 21 May 2024 16:32:52 -0700
fan <nifan.cxl@gmail.com> wrote:

> From 9d6d774ec973d22c0f662b32385345a88b14cc55 Mon Sep 17 00:00:00 2001
> From: Fan Ni <fan.ni@samsung.com>
> Date: Tue, 20 Feb 2024 09:48:31 -0800
> Subject: [PATCH 11/14] hw/cxl/events: Add qmp interfaces to add/release
>  dynamic capacity extents
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "host-id": 0,
>       "selection-policy": "prescriptive",
>       "region": 0,
>       "tag": "",
As below, should we make the tag optional? (prefix it with * and
the callback will gain a has_x bool.)
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "host-id": 0,
>       "removal-policy":"prescriptive",
>       "forced-removal": false,
>       "sanitize-on-release": false,
>       "region": 0,
>       "tag": "",
This suggests to me we should use some optional parameters with
sensible defaults (off and not present) for
tag, sanitize-on-release and forced-removal.


>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }


> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..a3a900eb2e 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,146 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDynamicCapacityExtent:
> +#
> +# A single dynamic capacity extent
> +#
> +# @offset: The offset (in bytes) to the start of the region
> +#     where the extent belongs to.
> +#
> +# @len: The length of the extent in bytes.
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDynamicCapacityExtent',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @CXLExtSelPolicy:
> +#
> +# The policy to use for selecting which extents comprise the added
> +# capacity, as defined in cxl spec r3.1 Table 7-70.
> +#
> +# @free: 0h = Free
> +#
> +# @contiguous: 1h = Continuous
> +#
> +# @prescriptive: 2h = Prescriptive
> +#
> +# @enable-shared-access: 3h = Enable Shared Access
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'CXLExtSelPolicy',
> +  'data': ['free',
> +           'contiguous',
> +           'prescriptive',
> +           'enable-shared-access']
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to initiate to add dynamic capacity extents to a host.  It
> +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
> +#
> +# @path: CXL DCD canonical QOM path.
> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +#     Table 7-70.
> +#
> +# @selection-policy: The "Selection Policy" bits as defined in
> +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> +#     selecting which extents comprise the added capacity.
> +#
> +# @region: The "Region Number" field as defined in cxl spec r3.1
> +#     Table 7-70.  The dynamic capacity region where the capacity
> +#     is being added.  Valid range is from 0-7.
> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +#     Table 7-70.
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'host-id': 'uint16',
> +            'selection-policy': 'CXLExtSelPolicy',
> +            'region': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDynamicCapacityExtent' ]
> +           }
> +}
> +
> +##
> +# @CXLExtRemovalPolicy:
> +#
> +# The policy to use for selecting which extents comprise the released
> +# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
> +#
> +# @tag-based: value = 0h.  Extents are selected by the device based
> +#     on tag, with no requirement for contiguous extents.
> +#
> +# @prescriptive: value = 1h.  Extent list of capacity to release is
> +#     included in the request payload.
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'CXLExtRemovalPolicy',
> +  'data': ['tag-based',
> +           'prescriptive']
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to initiate to release dynamic capacity extents from a
> +# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
> +#
> +# @path: CXL DCD canonical QOM path.
> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +#     Table 7-71.
> +#
> +# @removal-policy: Bit[3:0] of the "Flags" field as defined in cxl
> +#     spec r3.1 Table 7-71.
> +#
> +# @forced-removal: Bit[4] of the "Flags" field in cxl spec r3.1
> +#     Table 7-71.  When set, device does not wait for a Release
> +#     Dynamic Capacity command from the host.  Host immediately
> +#     loses access to released capacity.
> +#
> +# @sanitize-on-release: Bit[5] of the "Flags" field in cxl spec r3.1
> +#     Table 7-71.  When set, device should sanitize all released
> +#     capacity as a result of this request.
> +#
> +# @region: The "Region Number" field as defined in cxl spec r3.1
> +#     Table 7-71.  The dynamic capacity region where the capacity
> +#     is being added.  Valid range is from 0-7.
> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +#     Table 7-71.
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'host-id': 'uint16',
> +            'removal-policy': 'CXLExtRemovalPolicy',
> +            'forced-removal': 'bool',
> +            'sanitize-on-release': 'bool',
> +            'region': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDynamicCapacityExtent' ]
> +           }
> +}
Jonathan Cameron May 23, 2024, 3:32 p.m. UTC | #17
On Tue, 21 May 2024 16:38:53 -0700
fan <nifan.cxl@gmail.com> wrote:

> On Mon, May 20, 2024 at 05:50:12PM +0100, Jonathan Cameron wrote:
> > On Wed, 1 May 2024 15:29:31 -0700
> > fan <nifan.cxl@gmail.com> wrote:
> >   
> > > From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
> > > From: Fan Ni <fan.ni@samsung.com>
> > > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > > Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
> > >  capacity extents
> > > Status: RO
> > > Content-Length: 25172
> > > Lines: 731
> > > 
> > > To simulate FM functionalities for initiating Dynamic Capacity Add
> > > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > > add/release dynamic capacity extents requests.
> > > 
> > > With the change, we allow to release an extent only when its DPA range
> > > is contained by a single accepted extent in the device. That is to say,
> > > extent superset release is not supported yet.
> > > 
> > > 1. Add dynamic capacity extents:
> > > 
> > > For example, the command to add two continuous extents (each 128MiB long)
> > > to region 0 (starting at DPA offset 0) looks like below:
> > > 
> > > { "execute": "qmp_capabilities" }
> > > 
> > > { "execute": "cxl-add-dynamic-capacity",
> > >   "arguments": {
> > >       "path": "/machine/peripheral/cxl-dcd0",
> > >       "host-id": 0,
> > >       "selection-policy": 2,
> > >       "region": 0,
> > >       "tag": "",
> > >       "extents": [
> > >       {
> > >           "offset": 0,
> > >           "len": 134217728
> > >       },
> > >       {
> > >           "offset": 134217728,
> > >           "len": 134217728
> > >       }
> > >       ]
> > >   }
> > > }
> > > 
> > > 2. Release dynamic capacity extents:
> > > 
> > > For example, the command to release an extent of size 128MiB from region 0
> > > (DPA offset 128MiB) looks like below:
> > > 
> > > { "execute": "cxl-release-dynamic-capacity",
> > >   "arguments": {
> > >       "path": "/machine/peripheral/cxl-dcd0",
> > >       "host-id": 0,
> > >       "flags": 1,
> > >       "region": 0,
> > >       "tag": "",
> > >       "extents": [
> > >       {
> > >           "offset": 134217728,
> > >           "len": 134217728
> > >       }
> > >       ]
> > >   }
> > > }
> > > 
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> > 
> > Hi Fan,
> > 
> > A few trivial questions inline.  I don't feel particularly strongly
> > about breaking up the flags fields, but I'd like to understand your
> > reasoning for keeping them as single fields?
> > 
> > Is it mainly to keep aligned with the specification or something else?
> > 
> > Thanks,
> > 
> > Jonathan  
> 
> Hi Jonathan,
> FYI.
> I just sent out the updated QAPI patch with selection policy defined as
> enum and removal policy split out in this thread,
> https://lore.kernel.org/linux-cxl/5856b7a4-4082-465f-9f61-b1ec6c35ef0f@fujitsu.com/T/#m9838d6afda49fb26eb90526eae5550256f5d0f56
Looks good in general, but I have more questions :(
Now we have separate fields, some of them have natural defaults. Maybe
we should provide those and reduce what needs to be passed in?

We will need to do that anyway for any future editions so perhaps
makes sense to do so now?

> 
> Planning to send out v8 on Thursday.
Great.

Jonathan
Markus Armbruster June 4, 2024, 9:18 a.m. UTC | #18
I finally got around to read this slowly.  Thank you, Fan and Jonathan!

I'm getting some "incomplete" vibes: "if we ever implement", "patches
for this on list", "we aren't emulating this yet at all", and ...

Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:

[...]

> Only thing I'd add is that for now (because we don't need it for testing
> the kernel flows) is that this does not provide any way for external
> agents (e.g. our 'fabric manager' to find out what the state is - i.e.
> if the extents have been accepted by the host etc). That stuff is all
> defined by the spec, but not yet in the QMP interface.  At somepoint
> we may want to add that as a state query type interface.

... this, too.

In review of v5, I asked whether this interface needs to be stable.

"Not stable" doesn't mean we change an interface without thought.  It
merely means we can change it much, much faster, and with much less
overhead.

I understand you want it chiefly for CXL development.  Development aids
commonly don't need to be stable.

If you're aiming for stable, you need to convince us the interface can
support the foreseeable purposes without incompatible changes.  In
particular, I'd like to see how you intend to support "finding out what
the state is".  I suspect that's related to my question in review of v8:
how to detect completion, and maybe track progress.

There's infrastructure for background jobs: job.json.  We might be
better off using it, unless completion is trivial and progress tracking
unnecessary.

[...]
Jonathan Cameron June 4, 2024, 11:54 a.m. UTC | #19
On Tue, 04 Jun 2024 11:18:18 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> I finally got around to read this slowly.  Thank you, Fan and Jonathan!
> 
> I'm getting some "incomplete" vibes: "if we ever implement", "patches
> for this on list", "we aren't emulating this yet at all", and ...

Absolutely.  There is a bunch of stuff that we reject today but
the interfaces allow you to specify it and align with the CXL specification
Fabric Management API definition which is the spec used to control this
stuff from a BMC etc.  If that doesn't work we have a hardware errata
problem, so hopefully that is fairly stable.

I think I can publicly confirm there are some related errata in flight,
seeing as we said we'd raise questions on some aspects in the kernel and
QEMU series preceding this one (so no IP secrecy issues). These are as a
result of this work from Fan, but we have carefully avoided implementing
anything that 'may' change.


> 
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:
> 
> [...]
> 
> > Only thing I'd add is that for now (because we don't need it for testing
> > the kernel flows) is that this does not provide any way for external
> > agents (e.g. our 'fabric manager' to find out what the state is - i.e.
> > if the extents have been accepted by the host etc). That stuff is all
> > defined by the spec, but not yet in the QMP interface.  At somepoint
> > we may want to add that as a state query type interface.  
> 
> ... this, too.
> 
> In review of v5, I asked whether this interface needs to be stable.
> 
> "Not stable" doesn't mean we change an interface without thought.  It
> merely means we can change it much, much faster, and with much less
> overhead.
> 
> I understand you want it chiefly for CXL development.  Development aids
> commonly don't need to be stable.

Ok. If it makes sense to make this unstable for now I'm fine with that.
I don't see why it would change beyond in backwards compatible fashion
with new optional fields to cover future specification updates allowing
for more information. However I've been wrong on such things before.

So I'm fine adding a patch on top of v8 marking them unstable for now.

> 
> If you're aiming for stable, you need to convince us the interface can
> support the foreseeable purposes without incompatible changes.  In
> particular, I'd like to see how you intend to support "finding out what
> the state is".  I suspect that's related to my question in review of v8:
> how to detect completion, and maybe track progress.

There is a need for completion information but given it's completely
asynchronous to the commands defined here (Can be out of order, lots of
ongoing capacity add / remove flows in flight etc) and for the hardware
definition the feedback occurs via an event record log I'm not expecting it
to affect the interfaces added so far.

> 
> There's infrastructure for background jobs: job.json.  We might be
> better off using it, unless completion is trivial and progress tracking
> unnecessary.

I'll take a look, but these are not conventional background commands
(We do have those as well, but that's a whole different set of future
problems!)

The actual command itself completes synchronously but not the flow
it kicked off which is not background as such because it may never
finish and involves lots of moving parts.  This is similar to any
form of error injection.  We inject the error synchronously and that
creates a bunch of records in various registers / firmware etc but
the actions the host OS takes are asynchronous and may only happen
when it decides to poll some register or a driver loads much later.

So I'm not sure if job.json approach fits.

> 
> [...]
> 
>
Jonathan Cameron June 4, 2024, 12:13 p.m. UTC | #20
On Tue, 4 Jun 2024 12:54:28 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Tue, 04 Jun 2024 11:18:18 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
> 
> > I finally got around to read this slowly.  Thank you, Fan and Jonathan!
> > 
> > I'm getting some "incomplete" vibes: "if we ever implement", "patches
> > for this on list", "we aren't emulating this yet at all", and ...  
> 
> Absolutely.  There is a bunch of stuff that we reject today but
> the interfaces allow you to specify it and align with the CXL specification
> Fabric Management API definition which is the spec used to control this
> stuff from a BMC etc.  If that doesn't work we have a hardware errata
> problem, so hopefully that is fairly stable.
> 
> I think I can publicly confirm there are some related errata in flight,
> seeing as we said we'd raise questions on some aspects in the kernel and
> QEMU series preceding this one (so no IP secrecy issues). These are as a
> result of this work from Fan, but we have carefully avoided implementing
> anything that 'may' change.
> 
> 
> > 
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:
> > 
> > [...]
> >   
> > > Only thing I'd add is that for now (because we don't need it for testing
> > > the kernel flows) is that this does not provide any way for external
> > > agents (e.g. our 'fabric manager' to find out what the state is - i.e.
> > > if the extents have been accepted by the host etc). That stuff is all
> > > defined by the spec, but not yet in the QMP interface.  At somepoint
> > > we may want to add that as a state query type interface.    
> > 
> > ... this, too.
> > 
> > In review of v5, I asked whether this interface needs to be stable.
> > 
> > "Not stable" doesn't mean we change an interface without thought.  It
> > merely means we can change it much, much faster, and with much less
> > overhead.
> > 
> > I understand you want it chiefly for CXL development.  Development aids
> > commonly don't need to be stable.  
> 
> Ok. If it makes sense to make this unstable for now I'm fine with that.
> I don't see why it would change beyond in backwards compatible fashion
> with new optional fields to cover future specification updates allowing
> for more information. However I've been wrong on such things before.
> 
> So I'm fine adding a patch on top of v8 marking them unstable for now.

Does this look correct?  Applied on top of the docs update patch just
posted in response to v8.  In my view neither is a blocker on Michael
Tsirkin sending a pull request but +CC so he is aware of discussion.

[PATCH] hw/cxl/events: Mark cxl-add-dynamic-capacity and cxl-release-dynamic-capcity unstable

Markus suggested that we make the unstable. I don't expect these
interfaces to change because of their tight coupling to the Compute
Express Link (CXL) Specification, Revision 3.1 Fabric Management API
definitions which can only be extended in backwards compatible way.
However, there seems little disadvantage in taking a cautious path
for now and marking them as unstable interfaces.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 qapi/cxl.json | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/qapi/cxl.json b/qapi/cxl.json
index a38622a0d1..bdfac67c47 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -453,6 +453,10 @@
 # @extents: The "Extent List" field as defined in Compute Express Link
 #     (CXL) Specification, Revision 3.1, Table 7-70.
 #
+# Features:
+#
+# @unstable: For now this command is subject to change.
+#
 # Since : 9.1
 ##
 { 'command': 'cxl-add-dynamic-capacity',
@@ -462,7 +466,8 @@
             'region': 'uint8',
             '*tag': 'str',
             'extents': [ 'CxlDynamicCapacityExtent' ]
-           }
+           },
+  'features': [ 'unstable' ]
 }
 
 ##
@@ -527,6 +532,10 @@
 # @extents: The "Extent List" field as defined in Compute Express
 #     Link (CXL) Specification, Revision 3.1, Table 7-71.
 #
+# Features:
+#
+# @unstable: For now this command is subject to change.
+#
 # Since : 9.1
 ##
 { 'command': 'cxl-release-dynamic-capacity',
@@ -538,5 +547,6 @@
             'region': 'uint8',
             '*tag': 'str',
             'extents': [ 'CxlDynamicCapacityExtent' ]
-           }
+           },
+  'features': [ 'unstable' ]
 }
Markus Armbruster June 4, 2024, 12:28 p.m. UTC | #21
Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:

> On Tue, 04 Jun 2024 11:18:18 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> I finally got around to read this slowly.  Thank you, Fan and Jonathan!
>> 
>> I'm getting some "incomplete" vibes: "if we ever implement", "patches
>> for this on list", "we aren't emulating this yet at all", and ...
>
> Absolutely.  There is a bunch of stuff that we reject today but
> the interfaces allow you to specify it and align with the CXL specification
> Fabric Management API definition which is the spec used to control this
> stuff from a BMC etc.  If that doesn't work we have a hardware errata
> problem, so hopefully that is fairly stable.
>
> I think I can publicly confirm there are some related errata in flight,
> seeing as we said we'd raise questions on some aspects in the kernel and
> QEMU series preceding this one (so no IP secrecy issues). These are as a
> result of this work from Fan, but we have carefully avoided implementing
> anything that 'may' change.
>
>
>> 
>> Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:
>> 
>> [...]
>> 
>> > Only thing I'd add is that for now (because we don't need it for testing
>> > the kernel flows) is that this does not provide any way for external
>> > agents (e.g. our 'fabric manager' to find out what the state is - i.e.
>> > if the extents have been accepted by the host etc). That stuff is all
>> > defined by the spec, but not yet in the QMP interface.  At somepoint
>> > we may want to add that as a state query type interface.  
>> 
>> ... this, too.
>> 
>> In review of v5, I asked whether this interface needs to be stable.
>> 
>> "Not stable" doesn't mean we change an interface without thought.  It
>> merely means we can change it much, much faster, and with much less
>> overhead.
>> 
>> I understand you want it chiefly for CXL development.  Development aids
>> commonly don't need to be stable.
>
> Ok. If it makes sense to make this unstable for now I'm fine with that.
> I don't see why it would change beyond in backwards compatible fashion
> with new optional fields to cover future specification updates allowing
> for more information. However I've been wrong on such things before.
>
> So I'm fine adding a patch on top of v8 marking them unstable for now.

I'd squash it into v8, but follow-up patch works for me, too.

>> If you're aiming for stable, you need to convince us the interface can
>> support the foreseeable purposes without incompatible changes.  In
>> particular, I'd like to see how you intend to support "finding out what
>> the state is".  I suspect that's related to my question in review of v8:
>> how to detect completion, and maybe track progress.
>
> There is a need for completion information but given it's completely
> asynchronous to the commands defined here (Can be out of order, lots of
> ongoing capacity add / remove flows in flight etc) and for the hardware
> definition the feedback occurs via an event record log I'm not expecting it
> to affect the interfaces added so far.
>
>> 
>> There's infrastructure for background jobs: job.json.  We might be
>> better off using it, unless completion is trivial and progress tracking
>> unnecessary.
>
> I'll take a look, but these are not conventional background commands
> (We do have those as well, but that's a whole different set of future
> problems!)
>
> The actual command itself completes synchronously but not the flow
> it kicked off which is not background as such because it may never
> finish and involves lots of moving parts.  This is similar to any
> form of error injection.  We inject the error synchronously and that
> creates a bunch of records in various registers / firmware etc but
> the actions the host OS takes are asynchronous and may only happen
> when it decides to poll some register or a driver loads much later.
>
> So I'm not sure if job.json approach fits.

Neither am I, but I want you to be aware of it, so you can make an
informed decision :)

>> [...]
diff mbox series

Patch

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9d54e10cd4..3569902e9e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@  static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@  CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: return the group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@  static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@  static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list*/
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c2cdd6d506..e892b3de7b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -667,6 +667,7 @@  static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -674,10 +675,19 @@  static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1443,7 +1453,6 @@  static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1694,6 +1703,306 @@  void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDCExtentRecordList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible  to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
+                                  uint8_t sel_policy, uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    enum {
+        CXL_SEL_POLICY_FREE,
+        CXL_SEL_POLICY_CONTIGUOUS,
+        CXL_SEL_POLICY_PRESCRIPTIVE,
+        CXL_SEL_POLICY_ENABLESHAREDACCESS,
+    };
+    switch (sel_policy) {
+    case CXL_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region_id, records, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+#define REMOVAL_POLICY_MASK 0xf
+#define REMOVAL_POLICY_PRESCRIPTIVE 1
+#define FORCED_REMOVAL_BIT BIT(4)
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (flags & FORCED_REMOVAL_BIT) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (flags & REMOVAL_POLICY_MASK) {
+    case REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
+                                                      region_id, records, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..810685e0d5 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,23 @@  void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t hid,
+                                  uint8_t sel_policy,
+                                  uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@  typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@  struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@  CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@  typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..2dcf03d973 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,72 @@ 
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset to the start of the region where the extent to be operated
+# @len: length of the extent
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to start add dynamic capacity extents flow. The device will
+# have to acknowledged the acceptance of the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @selection-policy: policy to use for selecting extents for adding capacity
+# @region-id: id of the region where the extent to add
+# @tag: Context field
+# @extents: Extents to add
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'selection-policy': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to start release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
+#     sanitize on release, bit[7:6] reserved
+# @region-id: id of the region where the extent to release
+# @tag: Context field
+# @extents: Extents to release
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'flags': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}