[v8,11/14] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

Message ID 20240523174651.1089554-12-nifan.cxl@gmail.com (mailing list archive)
State New, archived
Series Enabling DCD emulation support in Qemu

Commit Message

Fan Ni May 23, 2024, 5:44 p.m. UTC
From: Fan Ni <fan.ni@samsung.com>

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With this change, an extent can be released only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like this:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "selection-policy": "prescriptive",
      "region": 0,
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}
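
For illustration, the same request can be generated programmatically.
The Python sketch below only builds the JSON text of the command shown
above. The helper name build_add_command and the MiB constant are
hypothetical (they are not part of QEMU), and sending the result over a
QMP socket is left out.

```python
import json

MiB = 1024 * 1024  # 128 MiB == 134217728 bytes, as in the example above


def build_add_command(path, region, extent_len, count, host_id=0):
    """Build a cxl-add-dynamic-capacity request (hypothetical helper).

    Produces `count` back-to-back extents of `extent_len` bytes each,
    starting at DPA offset 0 of the given region.
    """
    # The patch documentation gives 0-7 as the valid region range.
    if not 0 <= region <= 7:
        raise ValueError("region must be in the range 0-7")
    extents = [{"offset": i * extent_len, "len": extent_len}
               for i in range(count)]
    return {
        "execute": "cxl-add-dynamic-capacity",
        "arguments": {
            "path": path,
            "host-id": host_id,
            "selection-policy": "prescriptive",
            "region": region,
            "extents": extents,
        },
    }


cmd = build_add_command("/machine/peripheral/cxl-dcd0", 0, 128 * MiB, 2)
print(json.dumps(cmd, indent=2))
```

Computing the offsets from the extent length avoids hand-typing byte
counts such as 134217728.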

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "removal-policy":"prescriptive",
      "region": 0,
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}
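
The release restriction stated earlier in this message (the released DPA
range must be contained by a single accepted extent) can be sketched as
follows. This is a minimal Python illustration of the rule only; the
helper is_releasable and the tuple representation are assumptions and do
not mirror the actual QEMU implementation.

```python
MiB = 1024 * 1024


def is_releasable(release, accepted):
    """Return True if `release` lies inside a single accepted extent.

    Extents are (offset, length) tuples in bytes.  Hypothetical helper
    illustrating the rule from the commit message, not the QEMU code.
    """
    r_start, r_len = release
    r_end = r_start + r_len
    return any(a_start <= r_start and r_end <= a_start + a_len
               for a_start, a_len in accepted)


# Two accepted 128 MiB extents, as added in the first example.
accepted = [(0, 128 * MiB), (128 * MiB, 128 * MiB)]

whole_second = is_releasable((128 * MiB, 128 * MiB), accepted)
spans_both = is_releasable((64 * MiB, 128 * MiB), accepted)
print(whole_second, spans_both)
```

The second request spans two accepted extents, which is the "superset"
style of release that this patch does not support yet.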

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  62 ++++++--
 hw/mem/cxl_type3.c          | 306 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  25 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               | 143 +++++++++++++++++
 6 files changed, 563 insertions(+), 13 deletions(-)

Comments

Markus Armbruster June 4, 2024, 7:12 a.m. UTC | #1
nifan.cxl@gmail.com writes:

> From: Fan Ni <fan.ni@samsung.com>
>
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
>
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
>
> 1. Add dynamic capacity extents:
>
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
>
> { "execute": "qmp_capabilities" }
>
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "host-id": 0,
>       "selection-policy": "prescriptive",
>       "region": 0,
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
>
> 2. Release dynamic capacity extents:
>
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
>
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "host-id": 0,
>       "removal-policy":"prescriptive",
>       "region": 0,
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
>
> Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
> Reviewed-by: Gregory Price <gregory.price@memverge.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

[...]

> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..57d9f82014 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,146 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDynamicCapacityExtent:

Three existing type names start with Cxl, and only one starts with CXL.
Please make your new ones start with Cxl, not CXL:
CxlDynamicCapacityExtent.

> +#
> +# A single dynamic capacity extent
> +#
> +# @offset: The offset (in bytes) to the start of the region
> +#     where the extent belongs to.
> +#
> +# @len: The length of the extent in bytes.

What is this?  Memory?

> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDynamicCapacityExtent',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @CXLExtSelPolicy:

CxlExtentSelectionPolicy

> +#
> +# The policy to use for selecting which extents comprise the added
> +# capacity, as defined in cxl spec r3.1 Table 7-70.

Use the official title: "as defined in the CXL Specification 3.1" (I
think, the actual document is behind a click-through agreement).

> +#
> +# @free: 0h = Free
> +#
> +# @contiguous: 1h = Continuous

What does "1h =" mean?  The numeric encoding?

What exactly is "contiguous" / "continuous"?  I figure it's clear enough
if you have the CXL spec open in another window.  Can we condense it
into one phrase for use here?

> +#
> +# @prescriptive: 2h = Prescriptive
> +#
> +# @enable-shared-access: 3h = Enable Shared Access

Similar questions.

> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'CXLExtSelPolicy',
> +  'data': ['free',
> +           'contiguous',
> +           'prescriptive',
> +           'enable-shared-access']
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to initiate to add dynamic capacity extents to a host.  It

"Initiate adding dynamic capacity extents"

When a command initiates something, we commonly need a way to detect
completion, and sometimes need a way to track progress.

How can we detect completion, and if we can't, why's that okay?

Can adding capacity fail after the command succeeded?  If yes, how can
we detect that?

How long until completion after the command succeeded?  Unbounded time?

> +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.

"defined in the CXL Specification 3.1 section 7.6.7.6.5"

More of the same below, not noting it again.

> +#
> +# @path: CXL DCD canonical QOM path.

Sure the QOM path needs to be canonical?

If not, what about "path to the CXL dynamic capacity device in the QOM
tree".  Intentionally close to existing descriptions of @qom-path
elsewhere.

> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +#     Table 7-70.
> +#
> +# @selection-policy: The "Selection Policy" bits as defined in
> +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> +#     selecting which extents comprise the added capacity.
> +#
> +# @region: The "Region Number" field as defined in cxl spec r3.1
> +#     Table 7-70.  The dynamic capacity region where the capacity
> +#     is being added.  Valid range is from 0-7.

Scratch the second sentence?

> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +#     Table 7-70.
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'host-id': 'uint16',
> +            'selection-policy': 'CXLExtSelPolicy',
> +            'region': 'uint8',
> +            '*tag': 'str',
> +            'extents': [ 'CXLDynamicCapacityExtent' ]
> +           }
> +}
> +
> +##
> +# @CXLExtRemovalPolicy:

CxlExtentRemovalPolicy

> +#
> +# The policy to use for selecting which extents comprise the released
> +# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
> +#
> +# @tag-based: value = 0h.  Extents are selected by the device based
> +#     on tag, with no requirement for contiguous extents.
> +#
> +# @prescriptive: value = 1h.  Extent list of capacity to release is
> +#     included in the request payload.

I guess "value = ..." documents the numeric value.  Sure that's useful
here?

> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'CXLExtRemovalPolicy',
> +  'data': ['tag-based',
> +           'prescriptive']
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to initiate to release dynamic capacity extents from a

"Initiate releasing dynamic capacity extents"

When a command initiates something, we commonly need a way to detect
completion, and sometimes need a way to track progress.  See
cxl-add-dynamic-capacity above.

> +# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
> +#
> +# @path: CXL DCD canonical QOM path.

My comment on cxl-add-dynamic-capacity argument @path applies.

> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +#     Table 7-71.
> +#
> +# @removal-policy: Bit[3:0] of the "Flags" field as defined in cxl
> +#     spec r3.1 Table 7-71.
> +#
> +# @forced-removal: Bit[4] of the "Flags" field in cxl spec r3.1
> +#     Table 7-71.  When set, device does not wait for a Release

"the device"

> +#     Dynamic Capacity command from the host.  Host immediately
> +#     loses access to released capacity.

"Instead, the host immediately loses"

> +#
> +# @sanitize-on-release: Bit[5] of the "Flags" field in cxl spec r3.1
> +#     Table 7-71.  When set, device should sanitize all released

"the device"

> +#     capacity as a result of this request.

What does it mean "to sanitize capacity"?  Is this about scrubbing the
memory?

> +#
> +# @region: The "Region Number" field as defined in cxl spec r3.1
> +#     Table 7-71.  The dynamic capacity region where the capacity
> +#     is being added.  Valid range is from 0-7.

My comment on cxl-add-dynamic-capacity argument @region applies.

> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +#     Table 7-71.
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'host-id': 'uint16',
> +            'removal-policy': 'CXLExtRemovalPolicy',
> +            '*forced-removal': 'bool',
> +            '*sanitize-on-release': 'bool',
> +            'region': 'uint8',
> +            '*tag': 'str',
> +            'extents': [ 'CXLDynamicCapacityExtent' ]
> +           }
> +}
Jonathan Cameron June 4, 2024, 11:55 a.m. UTC | #2
On Tue, 04 Jun 2024 09:12:09 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> nifan.cxl@gmail.com writes:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> >
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "selection-policy": "prescriptive",
> >       "region": 0,
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "removal-policy":"prescriptive",
> >       "region": 0,
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> >
> > Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
> > Reviewed-by: Gregory Price <gregory.price@memverge.com>
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> 

Hi Markus,

Thanks for the detailed review.

Fan is traveling for a few weeks and may have intermittent internet.
He asked me to help with any feedback that came in during this period.

Perhaps at this stage (as Michael has this queued) the best bet is a
follow-on patch tweaking things.  The blast radius is more or less
contained to the qmp file, subject to a few parameter type changes.  I'd
be keen on this approach if possible because that lets me start attacking
the annoyingly large queue of stuff dependent on this series in parallel
with improving this aspect.

Proposed draft patch at end of this email and responses to individual
comments inline.

I'll do a separate patch in response to your suggestion to mark the
two interfaces unstable.  For now there seems to be little disadvantage
in doing so, as I assume nothing stops us from removing
that marking in a cycle or two if things look stable.

> [...]
> 
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 4281726dec..57d9f82014 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -361,3 +361,146 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDynamicCapacityExtent:  
> 
> Three existing type names start with Cxl, and only one starts with CXL.
> Please make your new ones start with Cxl, not CXL:
> CxlDynamicCapacityExtent.
Ok. 
> 
> > +#
> > +# A single dynamic capacity extent
> > +#
> > +# @offset: The offset (in bytes) to the start of the region
> > +#     where the extent belongs to.
> > +#
> > +# @len: The length of the extent in bytes.  
> 
> What is this?  Memory?

Yes.  Probably makes more sense to add to the initial description rather
than down here.

# A single dynamic capacity extent.  This is a contiguous allocation
# of memory by Device Physical Address within a single Dynamic Capacity
# Region on a CXL Type 3 device.

This is all a bit of a balance between not quoting large chunks of
the specification and providing enough detail here.
Reality is that people who don't know what this is, won't use this
interface.  We can add some additional documentation to introduce
all the concepts but it probably doesn't make sense to do so here.


> 
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'CXLDynamicCapacityExtent',
> > +  'data': {
> > +      'offset':'uint64',
> > +      'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @CXLExtSelPolicy:  
> 
> CxlExtentSelectionPolicy
> 
> > +#
> > +# The policy to use for selecting which extents comprise the added
> > +# capacity, as defined in cxl spec r3.1 Table 7-70.  
> 
> Use the official title: "as defined in the CXL Specification 3.1" (I
> think, the actual document is behind a click-through agreement).

Sadly not that simple, hence the desire for an abbreviation. Should be

Compute Express Link (CXL) Specification, Revision 3.1, Version 1.0

Can drop the Version 1.0 (as there have never been other versions and
probably won't be) but the Revision part matters (unfortunately)
hence the r in the above.

Note that we've used CXL r3.0 etc. in previous QMP docs for this. Perhaps
we should just stick to that and rely on the reference in
docs/system/devices/cxl.rst as the canonical reference.

For now I'll go with the (almost) full form here as it's never wrong to
spell it out.  So all the new references will be to
Compute Express Link (CXL) Specification, Revision 3.1, Section xxxx


> 
> > +#
> > +# @free: 0h = Free
> > +#
> > +# @contiguous: 1h = Continuous  
> 
> What does "1h =" mean?  The numeric encoding?
Alignment with spec, but doesn't need to be here so removed.

> 
> What exactly is "contiguous" / "continuous"?  I figure it's clear enough
> if you have the CXL spec open in another window.  Can we condense it
> into one phrase for use here?

@free: Device is responsible for allocating the requested memory
     capacity and is free to do this using any combination of
     supported extents.

@contiguous: Device is responsible for allocating the requested
     memory capacity but must do so as a single contiguous
     extent.

@prescriptive: The precise set of extents to be allocated is specified
     by the command.  Thus allocation is being managed by the
     issuer of the allocation command, not the device.

@enable-shared-access: Capacity has already been allocated to a
     different host using free, contiguous or prescriptive methods with
     a known tag. This policy then instructs the device to make the
     capacity with the specified tag available to an additional host.
     Capacity is implicit as it matches that already associated with the
     tag. Note that the extent list (and hence DPAs)
     used are per host, so a device may use different representations
     on each host. The ordering of the extents provided to each host
     is indicated to the host using per extent sequence numbers generated
     by the device. Has a similar
     meaning for temporal sharing but in that case there may be only
     one host involved.
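
As a rough illustration of how the four policy names relate to the
numeric encodings that the original patch comments listed (0h to 3h),
here is a minimal Python sketch. The class SelectionPolicy and the
helper policy_from_qmp are hypothetical names used only for this
example.

```python
from enum import IntEnum


class SelectionPolicy(IntEnum):
    """Encodings as listed in the original patch comments (0h..3h)."""
    FREE = 0x0
    CONTIGUOUS = 0x1
    PRESCRIPTIVE = 0x2
    ENABLE_SHARED_ACCESS = 0x3


def policy_from_qmp(name):
    """Map a QMP enum string such as 'enable-shared-access' to its value."""
    return SelectionPolicy[name.upper().replace("-", "_")]


for qmp_name in ("free", "contiguous", "prescriptive",
                 "enable-shared-access"):
    print(qmp_name, int(policy_from_qmp(qmp_name)))
```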

> 
> > +#
> > +# @prescriptive: 2h = Prescriptive
> > +#
> > +# @enable-shared-access: 3h = Enable Shared Access  
> 
> Similar questions.
> 
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'CXLExtSelPolicy',
> > +  'data': ['free',
> > +           'contiguous',
> > +           'prescriptive',
> > +           'enable-shared-access']
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to initiate to add dynamic capacity extents to a host.  It  
> 
> "Initiate adding dynamic capacity extents"
Done.
> 
> When a command initiates something, we commonly need a way to detect
> completion, and sometimes need a way to track progress.
> 
> How can we detect completion, and if we can't, why's that okay?
> 
> Can adding capacity fail after the command succeeded?  If yes, how can
> we detect that?

The full flow can fail, in the sense that the host can reject the offered
capacity.
This command just initiates the flow.

Today we can't detect it via QMP. There are a couple of options but I
think they are out of scope for this document (for now).
There are a lot more DCD features to come and I'd include a
resolution to this aspect as one of those.  Aim today is just
to get to the point where we can test the OS handling - other
cases like virtualization of this require a lot more infrastructure
on top of what we have here.

So likely options:

* The 'fabric manager' will have an out of band path to the OS as it
  doesn't spontaneously decide to offer capacity - that happens
  because an orchestrator (think kubernetes or similar) has told a
  host to bring up an application that needs this extra capacity.
  That path would typically include an acknowledgment that the capacity
  has turned up and the host can run what it was asked to run.

  There is an inband path for a real fabric manager interface that
  we don't yet have an equivalent of in QEMU. An earlier version
  of this patch set provided a hacky equivalent so was dropped.
  That path is the Fabric Manager side Dynamic Capacity Event Log,
  which has events for this:
  0x4 Add Capacity Response:
  "The host has responded to the Add Capacity event and the Dynamic
  Capacity Extent field in this structure specifies the capacity
  accepted by the host.  This event shall only be reported
  to the FM"
  0x5 is the similar one for release.

So long term there is probably a need for a reporting interface
but lots more to do in general and I think this is functional
without that.  For now I think all we can do is document that
discovering success must be done via an out of band interface.

I've added:
" Note that, currently, establishing success or failure of the full Add Dynamic
  Capacity flow requires out of band communication with the OS of
  the CXL host."

Does that work for now?  We will have to remember to update if/when
we add a way to query this.

It's also clear we could benefit from some additional documentation
in cxl.rst.  That's a job for another day however - for now, to
get the details, users will have to read the CXL specification or
at least watch a bunch of conference videos and webinars.
 

> 
> How long until completion after the command succeeded?  Unbounded time?

Depends on the host, and indeed unbounded - ultimately there is an abort
path (forced removal later in this doc) but it is sometimes fatal for the
OS running and only meant for the case where the host OS crashed.
Not many operating systems play well with forced removal of memory and due
to a race condition it may look like that to the host.  So basically
it's a 'don't use this' kind of hardware feature.

However it's not that QEMU is waiting for it beyond having some tracking
structures allocated that are not freed until the flow has finished.
This is very much an asynchronous flow.

> 
> > +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.  
> 
> "defined in the CXL Specification 3.1 section 7.6.7.6.5"
> 
> More of the same below, not noting it again.
Sure. Hopefully fixed throughout the new text. I've not taken
on the existing cases today.

> 
> > +#
> > +# @path: CXL DCD canonical QOM path.  
> 
> Sure the QOM path needs to be canonical?
> 
> If not, what about "path to the CXL dynamic capacity device in the QOM
> tree".  Intentionally close to existing descriptions of @qom-path
> elsewhere.

That text LGTM. I'll focus only on new cases of this for an initial
patch but there are a load of other cases of this text that will
want updating separately.

> 
> > +#
> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> > +#     Table 7-70.
> > +#
> > +# @selection-policy: The "Selection Policy" bits as defined in
> > +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> > +#     selecting which extents comprise the added capacity.
> > +#
> > +# @region: The "Region Number" field as defined in cxl spec r3.1
> > +#     Table 7-70.  The dynamic capacity region where the capacity
> > +#     is being added.  Valid range is from 0-7.  
> 
> Scratch the second sentence?

Sure, I guess because nearly everything else is just a spec reference
and this isn't adding enough info to be useful?

> 
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +#     Table 7-70.
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'host-id': 'uint16',
> > +            'selection-policy': 'CXLExtSelPolicy',
> > +            'region': 'uint8',
> > +            '*tag': 'str',
> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
> > +           }
> > +}
> > +
> > +##
> > +# @CXLExtRemovalPolicy:  
> 
> CxlExtentRemovalPolicy
Done this and similar.
> 
> > +#
> > +# The policy to use for selecting which extents comprise the released
> > +# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
> > +#
> > +# @tag-based: value = 0h.  Extents are selected by the device based
> > +#     on tag, with no requirement for contiguous extents.
> > +#
> > +# @prescriptive: value = 1h.  Extent list of capacity to release is
> > +#     included in the request payload.  
> 
> I guess "value = ..." documents the numeric value.  Sure that's useful
> here?

Dropped as not useful here.

> 
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'CXLExtRemovalPolicy',
> > +  'data': ['tag-based',
> > +           'prescriptive']
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to initiate to release dynamic capacity extents from a  
> 
> "Initiate releasing dynamic capacity extents"
> 
> When a command initiates something, we commonly need a way to detect
> completion, and sometimes need a way to track progress.  See
> cxl-add-dynamic-capacity above.
> 

Effectively same reply.  Today you can only do this via out of band
comms with the host.  We have quite a lot more to add before we
can report this via QMP. This is very much part 1 of DCD support,
I'd expect us to be still adding features in a year or more.

I'll add text similar to that proposed for the add path.

...

> 
> > +#     capacity as a result of this request.  
> 
> What does it mean "to sanitize capacity"?  Is this about scrubbing the
> memory?

For one meaning of scrubbing.  Not the one that is normally applied to
memory which is patrol scrub / ECC error detection and correction and
subject to a long kernel mailing list thread at the moment and another
QEMU patch set on my queue.

Why can't we have a dictionary of canonical terms?  Ah well.
Added a slightly shortened quote from the CXL spec.
"This ensures that all user data and metadata is made permanently
 unavailable by whatever means is appropriate for the media type.
 Note that changing encryption keys is not sufficient."

The last bit is because we will shortly have secure erase support
via another patch set and in that case changing encryption keys is
sufficient.
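
To make the "Flags" layout described in the patch documentation concrete
(removal policy in bits [3:0], forced removal in bit 4, sanitize on
release in bit 5), here is a minimal Python sketch. The helper
pack_release_flags is hypothetical; the policy values 0h and 1h are the
ones given in the original patch comments.

```python
# Bits [3:0]: removal policy, values as listed in the original patch.
REMOVAL_POLICY = {"tag-based": 0x0, "prescriptive": 0x1}
FORCED_REMOVAL = 1 << 4        # bit 4
SANITIZE_ON_RELEASE = 1 << 5   # bit 5


def pack_release_flags(removal_policy, forced_removal=False,
                       sanitize_on_release=False):
    """Pack the QMP arguments into one flags value (hypothetical helper)."""
    flags = REMOVAL_POLICY[removal_policy] & 0xF
    if forced_removal:
        flags |= FORCED_REMOVAL
    if sanitize_on_release:
        flags |= SANITIZE_ON_RELEASE
    return flags


print(hex(pack_release_flags("prescriptive", sanitize_on_release=True)))
```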

  

> 
> > +#
> > +# @region: The "Region Number" field as defined in cxl spec r3.1
> > +#     Table 7-71.  The dynamic capacity region where the capacity
> > +#     is being added.  Valid range is from 0-7.  
> 
> My comment on cxl-add-dynamic-capacity argument @region applies.
"The dynamic capacity region where the capacity is being added."
sentence dropped.

> 
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +#     Table 7-71.
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'host-id': 'uint16',
> > +            'removal-policy': 'CXLExtRemovalPolicy',
> > +            '*forced-removal': 'bool',
> > +            '*sanitize-on-release': 'bool',
> > +            'region': 'uint8',
> > +            '*tag': 'str',
> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
> > +           }
> > +}  
> 

So with all that incorporated, what I currently have is:



[PATCH] hw/cxl/events: Improve QMP interfaces and documentation for add/release dynamic capacity.

New DCD command definitions updated in response to review comments
from Markus.

- Used CxlXXXX instead of CXLXXXXX for newly added types.
- Expanded some abbreviations in type names to be easier to read.
- Additional documentation for some fields.
- Replace slightly vague cxl r3.1 references with
  "Compute Express Link (CXL) Specification, Revision 3.1, XXXX"
  to bring them in line with what it says on the specification cover.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
I can break this up into separate patches, but that's going to be
quite a lot of churn as often several of the above affect the same
paragraph.

---
 qapi/cxl.json            | 152 ++++++++++++++++++++++++---------------
 hw/mem/cxl_type3.c       |  18 ++---
 hw/mem/cxl_type3_stubs.c |   8 +--
 3 files changed, 107 insertions(+), 71 deletions(-)

diff --git a/qapi/cxl.json b/qapi/cxl.json
index 57d9f82014..a38622a0d1 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -363,9 +363,11 @@
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
 
 ##
-# @CXLDynamicCapacityExtent:
+# @CxlDynamicCapacityExtent:
 #
-# A single dynamic capacity extent
+# A single dynamic capacity extent.  This is a contiguous allocation
+# of memory by Device Physical Address within a single Dynamic
+# Capacity Region on a CXL Type 3 Device.
 #
 # @offset: The offset (in bytes) to the start of the region
 #     where the extent belongs to.
@@ -374,7 +376,7 @@
 #
 # Since: 9.1
 ##
-{ 'struct': 'CXLDynamicCapacityExtent',
+{ 'struct': 'CxlDynamicCapacityExtent',
   'data': {
       'offset':'uint64',
       'len': 'uint64'
@@ -382,22 +384,40 @@
 }
 
 ##
-# @CXLExtSelPolicy:
+# @CxlExtentSelectionPolicy:
 #
 # The policy to use for selecting which extents comprise the added
-# capacity, as defined in cxl spec r3.1 Table 7-70.
-#
-# @free: 0h = Free
-#
-# @contiguous: 1h = Continuous
-#
-# @prescriptive: 2h = Prescriptive
-#
-# @enable-shared-access: 3h = Enable Shared Access
+# capacity, as defined in Compute Express Link (CXL) Specification,
+# Revision 3.1, Table 7-70.
+#
+# @free: Device is responsible for allocating the requested memory
+#     capacity and is free to do this using any combination of
+#     supported extents.
+#
+# @contiguous: Device is responsible for allocating the requested
+#     memory capacity but must do so as a single contiguous
+#     extent.
+#
+# @prescriptive: The precise set of extents to be allocated is
+#     specified by the command.  Thus allocation is being managed
+#     by the issuer of the allocation command, not the device.
+#
+# @enable-shared-access: Capacity has already been allocated to a
+#     different host using free, contiguous or prescriptive policy
+#     with a known tag.  This policy then instructs the device to
+#     make the capacity with the specified tag available to an
+#     additional host.  Capacity is implicit as it matches that
+#     already associated with the tag.  Note that the extent list
+#     (and hence Device Physical Addresses) used are per host, so
+#     a device may use different representations on each host.
+#     The ordering of the extents provided to each host is indicated
+#     to the host using per extent sequence numbers generated by
+#     the device.  Has a similar meaning for temporal sharing, but
+#     in that case there may be only one host involved.
 #
 # Since: 9.1
 ##
-{ 'enum': 'CXLExtSelPolicy',
+{ 'enum': 'CxlExtentSelectionPolicy',
   'data': ['free',
            'contiguous',
            'prescriptive',
@@ -407,54 +427,60 @@
 ##
 # @cxl-add-dynamic-capacity:
 #
-# Command to initiate to add dynamic capacity extents to a host.  It
-# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
+# Initiate adding dynamic capacity extents to a host.  This simulates
+# operations defined in Compute Express Link (CXL) Specification,
+# Revision 3.1, Section 7.6.7.6.5. Note that, currently, establishing
+# success or failure of the full Add Dynamic Capacity flow requires
+# out of band communication with the OS of the CXL host.
 #
-# @path: CXL DCD canonical QOM path.
+# @path: path to the CXL Dynamic Capacity Device in the QOM tree.
 #
-# @host-id: The "Host ID" field as defined in cxl spec r3.1
-#     Table 7-70.
+# @host-id: The "Host ID" field as defined in Compute Express Link
+#     (CXL) Specification, Revision 3.1, Table 7-70.
 #
 # @selection-policy: The "Selection Policy" bits as defined in
-#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
-#     selecting which extents comprise the added capacity.
+#     Compute Express Link (CXL) Specification, Revision 3.1,
+#     Table 7-70.  It specifies the policy to use for selecting
+#     which extents comprise the added capacity.
 #
-# @region: The "Region Number" field as defined in cxl spec r3.1
-#     Table 7-70.  The dynamic capacity region where the capacity
-#     is being added.  Valid range is from 0-7.
+# @region: The "Region Number" field as defined in Compute Express
+#     Link (CXL) Specification, Revision 3.1, Table 7-70.  Valid
+#     range is from 0-7.
 #
-# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
+# @tag: The "Tag" field as defined in Compute Express Link (CXL)
+#     Specification, Revision 3.1, Table 7-70.
 #
-# @extents: The "Extent List" field as defined in cxl spec r3.1
-#     Table 7-70.
+# @extents: The "Extent List" field as defined in Compute Express Link
+#     (CXL) Specification, Revision 3.1, Table 7-70.
 #
 # Since : 9.1
 ##
 { 'command': 'cxl-add-dynamic-capacity',
   'data': { 'path': 'str',
             'host-id': 'uint16',
-            'selection-policy': 'CXLExtSelPolicy',
+            'selection-policy': 'CxlExtentSelectionPolicy',
             'region': 'uint8',
             '*tag': 'str',
-            'extents': [ 'CXLDynamicCapacityExtent' ]
+            'extents': [ 'CxlDynamicCapacityExtent' ]
            }
 }
 
 ##
-# @CXLExtRemovalPolicy:
+# @CxlExtentRemovalPolicy:
 #
 # The policy to use for selecting which extents comprise the released
-# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
+# capacity, defined in the "Flags" field in Compute Express Link (CXL)
+# Specification, Revision 3.1, Table 7-71.
 #
-# @tag-based: value = 0h.  Extents are selected by the device based
-#     on tag, with no requirement for contiguous extents.
+# @tag-based: Extents are selected by the device based on tag, with
+#     no requirement for contiguous extents.
 #
-# @prescriptive: value = 1h.  Extent list of capacity to release is
-#     included in the request payload.
+# @prescriptive: Extent list of capacity to release is included in
+#     the request payload.
 #
 # Since: 9.1
 ##
-{ 'enum': 'CXLExtRemovalPolicy',
+{ 'enum': 'CxlExtentRemovalPolicy',
   'data': ['tag-based',
            'prescriptive']
 }
@@ -462,45 +488,55 @@
 ##
 # @cxl-release-dynamic-capacity:
 #
-# Command to initiate to release dynamic capacity extents from a
-# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
+# Initiate release of dynamic capacity extents from a host.  This
+# simulates operations defined in Compute Express Link (CXL)
+# Specification, Revision 3.1, Section 7.6.7.6.6. Note that,
+# currently, success or failure of the full Release Dynamic Capacity
+# flow requires out of band communication with the OS of the CXL host.
 #
-# @path: CXL DCD canonical QOM path.
+# @path: path to the CXL Dynamic Capacity Device in the QOM tree.
 #
-# @host-id: The "Host ID" field as defined in cxl spec r3.1
-#     Table 7-71.
+# @host-id: The "Host ID" field as defined in Compute Express Link
+#     (CXL) Specification, Revision 3.1, Table 7-71.
 #
-# @removal-policy: Bit[3:0] of the "Flags" field as defined in cxl
-#     spec r3.1 Table 7-71.
+# @removal-policy: Bit[3:0] of the "Flags" field as defined in
+#     Compute Express Link (CXL) Specification, Revision 3.1,
+#     Table 7-71.
 #
-# @forced-removal: Bit[4] of the "Flags" field in cxl spec r3.1
-#     Table 7-71.  When set, device does not wait for a Release
-#     Dynamic Capacity command from the host.  Host immediately
-#     loses access to released capacity.
+# @forced-removal: Bit[4] of the "Flags" field in Compute Express
+#     Link (CXL) Specification, Revision 3.1, Table 7-71.  When set,
+#     the device does not wait for a Release Dynamic Capacity command
+#     from the host.  Instead, the host immediately loses access to
+#     the released capacity.
 #
-# @sanitize-on-release: Bit[5] of the "Flags" field in cxl spec r3.1
-#     Table 7-71.  When set, device should sanitize all released
-#     capacity as a result of this request.
+# @sanitize-on-release: Bit[5] of the "Flags" field in Compute
+#     Express Link (CXL) Specification, Revision 3.1, Table 7-71.
+#     When set, the device should sanitize all released capacity as
+#     a result of this request. This ensures that all user data
+#     and metadata is made permanently unavailable by whatever
+#     means is appropriate for the media type. Note that changing
+#     encryption keys is not sufficient.
 #
-# @region: The "Region Number" field as defined in cxl spec r3.1
-#     Table 7-71.  The dynamic capacity region where the capacity
-#     is being added.  Valid range is from 0-7.
+# @region: The "Region Number" field as defined in Compute Express
+#     Link Specification, Revision 3.1, Table 7-71.  Valid range
+#     is from 0-7.
 #
-# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
+# @tag: The "Tag" field as defined in Compute Express Link (CXL)
+#     Specification, Revision 3.1, Table 7-71.
 #
-# @extents: The "Extent List" field as defined in cxl spec r3.1
-#     Table 7-71.
+# @extents: The "Extent List" field as defined in Compute Express
+#     Link (CXL) Specification, Revision 3.1, Table 7-71.
 #
 # Since : 9.1
 ##
 { 'command': 'cxl-release-dynamic-capacity',
   'data': { 'path': 'str',
             'host-id': 'uint16',
-            'removal-policy': 'CXLExtRemovalPolicy',
+            'removal-policy': 'CxlExtentRemovalPolicy',
             '*forced-removal': 'bool',
             '*sanitize-on-release': 'bool',
             'region': 'uint8',
             '*tag': 'str',
-            'extents': [ 'CXLDynamicCapacityExtent' ]
+            'extents': [ 'CxlDynamicCapacityExtent' ]
            }
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 284db94182..2242986d8b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1873,7 +1873,7 @@ static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
  */
 static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
         uint16_t hid, CXLDCEventType type, uint8_t rid,
-        CXLDynamicCapacityExtentList *records, Error **errp)
+        CxlDynamicCapacityExtentList *records, Error **errp)
 {
     Object *obj;
     CXLEventDynamicCapacity dCap = {};
@@ -1881,7 +1881,7 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
     CXLType3Dev *dcd;
     uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
     uint32_t num_extents = 0;
-    CXLDynamicCapacityExtentList *list;
+    CxlDynamicCapacityExtentList *list;
     CXLDCExtentGroup *group = NULL;
     g_autofree CXLDCExtentRaw *extents = NULL;
     uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
@@ -2031,13 +2031,13 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
 }
 
 void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
-                                  CXLExtSelPolicy sel_policy, uint8_t region,
-                                  const char *tag,
-                                  CXLDynamicCapacityExtentList  *extents,
+                                  CxlExtentSelectionPolicy sel_policy,
+                                  uint8_t region, const char *tag,
+                                  CxlDynamicCapacityExtentList  *extents,
                                   Error **errp)
 {
     switch (sel_policy) {
-    case CXL_EXT_SEL_POLICY_PRESCRIPTIVE:
+    case CXL_EXTENT_SELECTION_POLICY_PRESCRIPTIVE:
         qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id,
                                                       DC_EVENT_ADD_CAPACITY,
                                                       region, extents, errp);
@@ -2049,14 +2049,14 @@ void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
 }
 
 void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
-                                      CXLExtRemovalPolicy removal_policy,
+                                      CxlExtentRemovalPolicy removal_policy,
                                       bool has_forced_removal,
                                       bool forced_removal,
                                       bool has_sanitize_on_release,
                                       bool sanitize_on_release,
                                       uint8_t region,
                                       const char *tag,
-                                      CXLDynamicCapacityExtentList  *extents,
+                                      CxlDynamicCapacityExtentList  *extents,
                                       Error **errp)
 {
     CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
@@ -2069,7 +2069,7 @@ void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
     }
 
     switch (removal_policy) {
-    case CXL_EXT_REMOVAL_POLICY_PRESCRIPTIVE:
+    case CXL_EXTENT_REMOVAL_POLICY_PRESCRIPTIVE:
         qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id, type,
                                                       region, extents, errp);
         return;
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 45419bbefe..c1a5e4a7c1 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -70,24 +70,24 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 
 void qmp_cxl_add_dynamic_capacity(const char *path,
                                   uint16_t host_id,
-                                  CXLExtSelPolicy sel_policy,
+                                  CxlExtentSelectionPolicy sel_policy,
                                   uint8_t region,
                                   const char *tag,
-                                  CXLDynamicCapacityExtentList *extents,
+                                  CxlDynamicCapacityExtentList *extents,
                                   Error **errp)
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
 
 void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
-                                      CXLExtRemovalPolicy removal_policy,
+                                      CxlExtentRemovalPolicy removal_policy,
                                       bool has_forced_removal,
                                       bool forced_removal,
                                       bool has_sanitize_on_release,
                                       bool sanitize_on_release,
                                       uint8_t region,
                                       const char *tag,
-                                      CXLDynamicCapacityExtentList *extents,
+                                      CxlDynamicCapacityExtentList *extents,
                                       Error **errp)
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
Markus Armbruster June 4, 2024, 2:49 p.m. UTC | #3
Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:

> On Tue, 04 Jun 2024 09:12:09 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> nifan.cxl@gmail.com writes:
>> 
>> > From: Fan Ni <fan.ni@samsung.com>
>> >
>> > To simulate FM functionalities for initiating Dynamic Capacity Add
>> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
>> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
>> > add/release dynamic capacity extents requests.
>> >
>> > With the change, we allow to release an extent only when its DPA range
>> > is contained by a single accepted extent in the device. That is to say,
>> > extent superset release is not supported yet.
>> >
>> > 1. Add dynamic capacity extents:
>> >
>> > For example, the command to add two continuous extents (each 128MiB long)
>> > to region 0 (starting at DPA offset 0) looks like below:
>> >
>> > { "execute": "qmp_capabilities" }
>> >
>> > { "execute": "cxl-add-dynamic-capacity",
>> >   "arguments": {
>> >       "path": "/machine/peripheral/cxl-dcd0",
>> >       "host-id": 0,
>> >       "selection-policy": "prescriptive",
>> >       "region": 0,
>> >       "extents": [
>> >       {
>> >           "offset": 0,
>> >           "len": 134217728
>> >       },
>> >       {
>> >           "offset": 134217728,
>> >           "len": 134217728
>> >       }
>> >       ]
>> >   }
>> > }
>> >
>> > 2. Release dynamic capacity extents:
>> >
>> > For example, the command to release an extent of size 128MiB from region 0
>> > (DPA offset 128MiB) looks like below:
>> >
>> > { "execute": "cxl-release-dynamic-capacity",
>> >   "arguments": {
>> >       "path": "/machine/peripheral/cxl-dcd0",
>> >       "host-id": 0,
>> >       "removal-policy":"prescriptive",
>> >       "region": 0,
>> >       "extents": [
>> >       {
>> >           "offset": 134217728,
>> >           "len": 134217728
>> >       }
>> >       ]
>> >   }
>> > }
>> >
>> > Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
>> > Reviewed-by: Gregory Price <gregory.price@memverge.com>
>> > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
>> 
>
> Hi Markus,
>
> Thanks for the detailed review.
>
> Fan is traveling for a few weeks and may have intermittent internet.
> He asked me to help with any feedback that came in during this period.
>
> Perhaps at this stage (as Michael has this queued) best bet is a follow on patch
> tweaking things.  The blast radius is more or less contained to the
> qmp file subject to a few parameter type changes.  I'd be keen on this
> approach if possible because that lets me start attacking the annoyingly
> large queue of stuff dependent on this series in parallel with
> improving this aspect.

Sacrifices git history tidiness for development velocity.  Judgement
call.

> Proposed draft patch at end of this email and responses to individual
> comments inline.
>
> I'll do a separate patch in response to your suggestion to mark the
> two interfaces unstable.  For now seems there is little disadvantage
> in doing so as I assume there is nothing stopping us removing
> that marking in a cycle or two if things look stable.

We can certainly make things stable when we're reasonably convinced they
are, and have a need for it.

>> [...]
>> 
>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > index 4281726dec..57d9f82014 100644
>> > --- a/qapi/cxl.json
>> > +++ b/qapi/cxl.json
>> > @@ -361,3 +361,146 @@
>> >  ##
>> >  {'command': 'cxl-inject-correctable-error',
>> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
>> > +
>> > +##
>> > +# @CXLDynamicCapacityExtent:  
>> 
>> Three existing type names start with Cxl, and only one starts with CXL.
>> Please make your new ones start with Cxl, not CXL:
>> CxlDynamicCapacityExtent.
> Ok. 
>> 
>> > +#
>> > +# A single dynamic capacity extent
>> > +#
>> > +# @offset: The offset (in bytes) to the start of the region
>> > +#     where the extent belongs to.
>> > +#
>> > +# @len: The length of the extent in bytes.  
>> 
>> What is this?  Memory?
>
> Yes.  Probably makes more sense to add to the initial description rather
> than down here.
>
> # A single dynamic capacity extent.  This is a contiguous allocation
> # of memory by Device Physical Address within a single Dynamic Capacity
> # Region on a CXL Type 3 device.

Yes, that's better.

> This is all a bit of a balance between not quoting large chunks of
> the specification and providing enough detail here.

Yes.

> Reality is that people who don't know what this is, won't use this
> interface.  We can add some additional documentation to introduce
> all the concepts but it probably doesn't make sense to do so here.

I suggest to try combining references to the spec with just enough
explanation to serve as reminders for the people familiar with this
stuff, and maybe even as terse overview for the rest of us.

>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'struct': 'CXLDynamicCapacityExtent',
>> > +  'data': {
>> > +      'offset':'uint64',
>> > +      'len': 'uint64'
>> > +  }
>> > +}
>> > +
>> > +##
>> > +# @CXLExtSelPolicy:  
>> 
>> CxlExtentSelectionPolicy
>> 
>> > +#
>> > +# The policy to use for selecting which extents comprise the added
>> > +# capacity, as defined in cxl spec r3.1 Table 7-70.  
>> 
>> Use the official title: "as defined in the CXL Specification 3.1" (I
>> think, the actual document is behind a click-through agreement).
>
> Sadly not that simple, hence the desire for an abbreviation. Should be
>
> Compute Express Link (CXL) Specification, Revision 3.1, Version 1.0
>
> Can drop the Version 1.0 (as there have never been other versions and
> probably won't be) but the Revision part matters (unfortunately)
> hence the r in the above.
>
> Note that we've used CXL r3.0 etc in previous QMP docs for this. Perhaps
> just sticking to that and relying on the reference in
> docs/system/devices/cxl.rst for the canonical reference.
>
> For now I'll go with the (almost) full form here as it's never wrong to
> spell it out.  So all the new references will be to
> Compute Express Link (CXL) Specification, Revision 3.1, Section xxxx

Abbreviating a long title is okay as long as the full title is still easy
enough to find.  But always abbreviate the same way, please.

>> > +#
>> > +# @free: 0h = Free
>> > +#
>> > +# @contiguous: 1h = Continuous  
>> 
>> What does "1h =" mean?  The numeric encoding?
> Alignment with spec, but doesn't need to be here so removed.
>
>> 
>> What exactly is "contiguous" / "continuous"?  I figure it's clear enough
>> if you have the CXL spec open in another window.  Can we condense it
>> into one phrase for use here?
>
> @free: Device is responsible for allocating the requested memory
>      capacity and is free to do this using any combination of
>      supported extents.
>
> @contiguous: Device is responsible for allocating the requested
>      memory capacity but must do so as a single contiguous
>      extent.
>
> @prescriptive: The precise set of extents to be allocated is specified
>      by the command.  Thus allocation is being managed by the
>      issuer of the allocation command, not the device.
>
> @enable-shared-access: Capacity has already been allocated to a
>      different host using free, contiguous or prescriptive methods with
>      a known tag. This policy then instructs the device to make the
>      capacity with the specified tag available to an additional host.
>      Capacity is implicit as it matches that already associated with the
>      tag. Note that the extent list (and hence DPAs)
>      used are per host, so a device may use different representations
>      on each host. The ordering of the extents provided to each host
>      is indicated to the host using per extent sequence numbers generated
>      by the device. Has a similar
>      meaning for temporal sharing but in that case there may be only
>      one host involved.

Better.

Feel free to omit some of the detail from the last one.
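
A minimal sketch of how the four policies described above relate to
the numeric encodings that the original doc comments quoted (0h to
3h).  The type and function names here are hypothetical and are not
the identifiers QAPI generates; the only facts taken from the thread
are the policy names, their order, and that only "prescriptive" has
the issuer supply the extent list.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical names.  The numeric values are the ones quoted in the
 * original doc comments (0h = Free ... 3h = Enable Shared Access).
 */
typedef enum {
    SEL_POLICY_FREE = 0x0,
    SEL_POLICY_CONTIGUOUS = 0x1,
    SEL_POLICY_PRESCRIPTIVE = 0x2,
    SEL_POLICY_ENABLE_SHARED_ACCESS = 0x3,
} SelPolicy;

/* Map a QMP-style policy name to its encoding; returns -1 if unknown. */
static int sel_policy_from_name(const char *name, SelPolicy *out)
{
    /* Index in this table equals the numeric encoding. */
    static const char *const names[] = {
        "free", "contiguous", "prescriptive", "enable-shared-access",
    };

    assert(name != NULL && out != NULL);
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strcmp(name, names[i]) == 0) {
            *out = (SelPolicy)i;
            return 0;
        }
    }
    return -1;
}

/*
 * Per the descriptions above, only the prescriptive policy has the
 * issuer of the command choose the exact extents; for the others the
 * device (or the existing tag) determines them.
 */
static int sel_policy_issuer_picks_extents(SelPolicy policy)
{
    return policy == SEL_POLICY_PRESCRIPTIVE;
}
```

Keeping the name table in encoding order makes the lookup index double
as the encoding, which is why the doc comments themselves no longer
need to repeat the "Nh =" values.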

>> > +#
>> > +# @prescriptive: 2h = Prescriptive
>> > +#
>> > +# @enable-shared-access: 3h = Enable Shared Access  
>> 
>> Similar questions.
>> 
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'enum': 'CXLExtSelPolicy',
>> > +  'data': ['free',
>> > +           'contiguous',
>> > +           'prescriptive',
>> > +           'enable-shared-access']
>> > +}
>> > +
>> > +##
>> > +# @cxl-add-dynamic-capacity:
>> > +#
>> > +# Command to initiate to add dynamic capacity extents to a host.  It  
>> 
>> "Initiate adding dynamic capacity extents"
> Done.
>> 
>> When a command initiates something, we commonly need a way to detect
>> completion, and sometimes need a way to track progress.
>> 
>> How can we detect completion, and if we can't, why's that okay?
>> 
>> Can adding capacity fail after the command succeeded?  If yes, how can
>> we detect that?
>
> The full flow can fail, in the sense that the host can reject the offered
> capacity.
> This command just initiates the flow.
>
> Today we can't detect it via QMP. There are a couple of options but I
> think they are out of scope for this document (for now).
> There are a lot more DCD features to come and I'd include a
> resolution to this aspect as one of those.  Aim today is just
> to get to the point where we can test the OS handling - other
> cases like virtualization of this require a lot more infrastructure
> on top of what we have here.

Sounds scary...

> So likely options:
>
> * The 'fabric manager' will have an out of band path to the OS as it
>   doesn't spontaneously decide to offer capacity - that happens
>   because an orchestrator (think kubernetes or similar) has told a
>   host to bring up an application that needs this extra capacity.
>   That path would typically include an acknowledgment that the capacity
>   has turned up and the host can run what it was asked to run.
>
>   There is an inband path for a real fabric manager interface that
>   we don't yet have an equivalent of in QEMU. An earlier version
>   of this patch set provided a hacky equivalent so was dropped.
>   That path is the Fabric Manager side Dynamic Capacity Event Log
>   which has events for this
>  0x4 Add Capacity Response:
> " The host has responded to the Add Capacity event and the Dynamic
>   Capacity Extent field in this structure specifies the capacity
>   accepted by the host.  This event shall only be reported
>   to the FM"
> 0x5 is the similar one for release.
>
> So long term there is probably a need for a reporting interface
> but lots more to do in general and I think this is functional
> without that.  For now I think all we can do is document that
> discovering success must be done via an out of band interface.
>
> I've added:
> " Note that, currently, establishing success or failure of the full Add Dynamic
>   Capacity flow requires out of band communication with the OS of
>   the CXL host."
>
> Does that work for now?  We will have to remember to update if/when
> we add a way to query this.

Good enough for an unstable interface.

> Also clear we could benefit from some additional documentation
> in cxl.rst.  That's a job for another day however - for now to
> get the details users will have to read the CXL specification or
> may watch a bunch of conference videos and webinars at least.

Would it make sense to add a short paragraph on what's missing there?

>> How long until completion after the command succeeded?  Unbounded time?
>
> Depends on the host, and indeed unbounded - ultimately there is an abort
> path (forced removal later in this doc) but it is sometimes fatal for the
> OS running and only meant for the case where the host OS crashed.
> Not many operating systems play well with force removal of memory and due
> to a race condition it may look like that to the host.  So basically
> it's a 'don't use this' kind of hardware feature.
>
> However it's not that QEMU is waiting for it beyond having some tracking
> structures allocated that are not freed until the flow has finished.
> This is very much an asynchronous flow.

Vaguely similar: when device_del merely initiates hot unplug, and
completion requires guest cooperation.  This puts management
applications into an awkward position.  What if they don't get
DEVICE_DELETED event within a reasonable time?  What is a reasonable
time?  We later added DEVICE_UNPLUG_GUEST_ERROR to avoid this for the
common case of well-behaved guests.

I'm not asking you to do anything about this now.  Spelling it out in
documentation seems advisable, though.

>> > +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.  
>> 
>> "defined in the CXL Specification 3.1 section 7.6.7.6.5"
>> 
>> More of the same below, not noting it again.
> Sure. Hopefully fixed throughout the new text. I've not taken
> on the existing cases today.
>
>> 
>> > +#
>> > +# @path: CXL DCD canonical QOM path.  
>> 
>> Sure the QOM path needs to be canonical?
>> 
>> If not, what about "path to the CXL dynamic capacity device in the QOM
>> tree".  Intentionally close to existing descriptions of @qom-path
>> elsewhere.
>
> That text LGTM. I'll focus only on new cases of this for an initial
> patch but there are a load of other cases of this text that will
> want updating separately.

Okay.

>> > +#
>> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
>> > +#     Table 7-70.
>> > +#
>> > +# @selection-policy: The "Selection Policy" bits as defined in
>> > +#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
>> > +#     selecting which extents comprise the added capacity.
>> > +#
>> > +# @region: The "Region Number" field as defined in cxl spec r3.1
>> > +#     Table 7-70.  The dynamic capacity region where the capacity
>> > +#     is being added.  Valid range is from 0-7.  
>> 
>> Scratch the second sentence?
>
> Sure, I guess because nearly everything else is just a spec reference
> and this isn't adding enough info to be useful?

One, it adds relatively little over "region number", and two, it's not
actually a sentence ;)

>> 
>> > +#
>> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
>> > +#
>> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
>> > +#     Table 7-70.
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-add-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +            'host-id': 'uint16',
>> > +            'selection-policy': 'CXLExtSelPolicy',
>> > +            'region': 'uint8',
>> > +            '*tag': 'str',
>> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
>> > +           }
>> > +}
>> > +
>> > +##
>> > +# @CXLExtRemovalPolicy:  
>> 
>> CxlExtentRemovalPolicy
> Done this and similar.
>> 
>> > +#
>> > +# The policy to use for selecting which extents comprise the released
>> > +# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
>> > +#
>> > +# @tag-based: value = 0h.  Extents are selected by the device based
>> > +#     on tag, with no requirement for contiguous extents.
>> > +#
>> > +# @prescriptive: value = 1h.  Extent list of capacity to release is
>> > +#     included in the request payload.  
>> 
>> I guess "value = ..." documents the numeric value.  Sure that's useful
>> here?
>
> Dropped as not useful here.
>
>> 
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'enum': 'CXLExtRemovalPolicy',
>> > +  'data': ['tag-based',
>> > +           'prescriptive']
>> > +}
>> > +
>> > +##
>> > +# @cxl-release-dynamic-capacity:
>> > +#
>> > +# Command to initiate to release dynamic capacity extents from a  
>> 
>> "Initiate releasing dynamic capacity extents"
>> 
>> When a command initiates something, we commonly need a way to detect
>> completion, and sometimes need a way to track progress.  See
>> cxl-add-dynamic-capacity above.
>> 
>
> Effectively same reply.  Today you can only do this via out of band
> comms with the host.  We have quite a lot more to add before we
> can report this via QMP. This is very much part 1 of DCD support,
> I'd expect us to be still adding features in a year or more.
>
> I'll add similar text to proposed for the add path.
>
> ...
>
>> 
>> > +#     capacity as a result of this request.  
>> 
>> What does it mean "to sanitize capacity"?  Is this about scrubbing the
>> memory?
>
> For one meaning of scrubbing.  Not the one that is normally applied to
> memory which is patrol scrub / ECC error detection and correction and
> subject to a long kernel mailing list thread at the moment and another
> QEMU patch set on my queue..  
>
> Why can't we have a dictionary of canonical terms? Ah well.
> Added a slightly shortened quote from the CXL spec.
> "This Ensures that all user data and metadata is made permanently

ensures

>  unavailable by whatever means is appropriate for the media type.
>  Note that changing encryption keys is not sufficient."
>
> The last bit is because we will shortly have secure erase support
> via another patch set and in that case changing encryption keys is
> sufficient.

Works for me.
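
For readers without the spec at hand, a minimal sketch of how the
three release arguments line up in the "Flags" field as the doc
comments describe it: Bit[3:0] removal policy, Bit[4] forced removal,
Bit[5] sanitize on release.  The macro and function names are
hypothetical, and holding the field in a single byte is an assumption
of this sketch; the policy values 0h and 1h are the ones quoted in
the original doc comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as described in the doc comments; names are hypothetical. */
#define DC_REL_POLICY_MASK     0x0Fu        /* Bit[3:0]: removal policy */
#define DC_REL_FORCED_REMOVAL  (1u << 4)    /* Bit[4]: forced removal */
#define DC_REL_SANITIZE        (1u << 5)    /* Bit[5]: sanitize on release */

enum {
    DC_REL_POLICY_TAG_BASED = 0x0,
    DC_REL_POLICY_PRESCRIPTIVE = 0x1,
};

/* Pack the three QMP arguments into one flags byte. */
static uint8_t dc_release_flags_pack(unsigned policy, bool forced,
                                     bool sanitize)
{
    unsigned flags;

    assert(policy <= DC_REL_POLICY_MASK);
    flags = policy & DC_REL_POLICY_MASK;
    if (forced) {
        flags |= DC_REL_FORCED_REMOVAL;
    }
    if (sanitize) {
        flags |= DC_REL_SANITIZE;
    }
    return (uint8_t)flags;
}

/* Extract the removal policy from a packed flags byte. */
static unsigned dc_release_flags_policy(uint8_t flags)
{
    return flags & DC_REL_POLICY_MASK;
}
```

For example, a prescriptive release with both optional booleans set
packs to 0x31, and a plain prescriptive release packs to 0x01.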

>> > +#
>> > +# @region: The "Region Number" field as defined in cxl spec r3.1
>> > +#     Table 7-71.  The dynamic capacity region where the capacity
>> > +#     is being added.  Valid range is from 0-7.  
>> 
>> My comment on cxl-add-dynamic-capacity argument @region applies.
> "The dynamic capacity region where the capacity is being added."
> sentence dropped.
>
>> 
>> > +#
>> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
>> > +#
>> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
>> > +#     Table 7-71.
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-release-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +            'host-id': 'uint16',
>> > +            'removal-policy': 'CXLExtRemovalPolicy',
>> > +            '*forced-removal': 'bool',
>> > +            '*sanitize-on-release': 'bool',
>> > +            'region': 'uint8',
>> > +            '*tag': 'str',
>> > +            'extents': [ 'CXLDynamicCapacityExtent' ]
>> > +           }
>> > +}  
>> 
>
> So with all that incorporated, what I currently have is:
>
>
>
> [PATCH] hw/cxl/events: Improve QMP interfaces and documentation for add/release dynamic capacity.
>
> New DCD command definitions updated in response to review comments
> from Markus.
>
> - Used CxlXXXX instead of CXLXXXXX for newly added types.
> - Expanded some abbreviations in type names to be easier to read.
> - Additional documentation for some fields.
> - Replace slightly vague cxl r3.1 references with
>   "Compute Express Link (CXL) Specification, Revision 3.1, XXXX"
>   to bring them in line with what it says on the specification cover.
>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> ---
> I can break this up into separate patches, but that's going to be
> quite a lot of churn as often multiple of the above affect the same
> paragraph.

I don't think breaking it up is worth your while or mine :)

Patch looks good to me at a glance.  There are a few instances of

    For legibility, wrap text paragraphs so every line is at most 70
    characters long.

    Separate sentences with two spaces.

[...]
Patch

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9d54e10cd4..ab71492697 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@  static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@  CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: the extent group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@  static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@  static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list */
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 7c9038938f..2161766b14 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -673,6 +673,7 @@  static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -680,10 +681,19 @@  static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1448,7 +1458,6 @@  static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1699,6 +1708,301 @@  void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] overlaps any extent in
+ * the list.
+ * Return value: true if there is an overlap; otherwise, false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by a single
+ * extent in the list. Containment across multiple extents will be checked
+ * once superset release is supported.
+ * Return value: true if the range is contained; otherwise, false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                 uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Process a dynamic capacity event that carries an extent list.
+ * Currently, only DC extent add and release requests are handled.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDynamicCapacityExtentList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDynamicCapacityExtentList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        /*
+         * Set the "More" flag on every record except the last one.
+         */
+        dCap.flags = (i < num_extents - 1) ? BIT(0) : 0;
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
+                                  CXLExtSelPolicy sel_policy, uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList *extents,
+                                  Error **errp)
+{
+    switch (sel_policy) {
+    case CXL_EXT_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      CXLExtRemovalPolicy removal_policy,
+                                      bool has_forced_removal,
+                                      bool forced_removal,
+                                      bool has_sanitize_on_release,
+                                      bool sanitize_on_release,
+                                      uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList *extents,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (has_forced_removal && forced_removal) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (removal_policy) {
+    case CXL_EXT_REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id, type,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..45419bbefe 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,28 @@  void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t host_id,
+                                  CXLExtSelPolicy sel_policy,
+                                  uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList *extents,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      CXLExtRemovalPolicy removal_policy,
+                                      bool has_forced_removal,
+                                      bool forced_removal,
+                                      bool has_sanitize_on_release,
+                                      bool sanitize_on_release,
+                                      uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList *extents,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@  typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@  struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@  CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@  typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..57d9f82014 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,146 @@ 
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDynamicCapacityExtent:
+#
+# A single dynamic capacity extent
+#
+# @offset: The offset (in bytes) from the start of the region
+#     that the extent belongs to.
+#
+# @len: The length of the extent in bytes.
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDynamicCapacityExtent',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @CXLExtSelPolicy:
+#
+# The policy to use for selecting which extents comprise the added
+# capacity, as defined in cxl spec r3.1 Table 7-70.
+#
+# @free: 0h = Free
+#
+# @contiguous: 1h = Contiguous
+#
+# @prescriptive: 2h = Prescriptive
+#
+# @enable-shared-access: 3h = Enable Shared Access
+#
+# Since: 9.1
+##
+{ 'enum': 'CXLExtSelPolicy',
+  'data': ['free',
+           'contiguous',
+           'prescriptive',
+           'enable-shared-access']
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Initiate adding dynamic capacity extents to a host.  It simulates
+# operations defined in cxl spec r3.1 7.6.7.6.5.
+#
+# @path: CXL DCD canonical QOM path.
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# @selection-policy: The "Selection Policy" bits as defined in
+#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
+#     selecting which extents comprise the added capacity.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-70.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'selection-policy': 'CXLExtSelPolicy',
+            'region': 'uint8',
+            '*tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
+
+##
+# @CXLExtRemovalPolicy:
+#
+# The policy to use for selecting which extents comprise the released
+# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
+#
+# @tag-based: value = 0h.  Extents are selected by the device based
+#     on tag, with no requirement for contiguous extents.
+#
+# @prescriptive: value = 1h.  Extent list of capacity to release is
+#     included in the request payload.
+#
+# Since: 9.1
+##
+{ 'enum': 'CXLExtRemovalPolicy',
+  'data': ['tag-based',
+           'prescriptive']
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Initiate releasing dynamic capacity extents from a host.  It
+# simulates operations defined in cxl spec r3.1 7.6.7.6.6.
+#
+# @path: CXL DCD canonical QOM path.
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# @removal-policy: Bit[3:0] of the "Flags" field as defined in cxl
+#     spec r3.1 Table 7-71.
+#
+# @forced-removal: Bit[4] of the "Flags" field in cxl spec r3.1
+#     Table 7-71.  When set, device does not wait for a Release
+#     Dynamic Capacity command from the host.  Host immediately
+#     loses access to released capacity.
+#
+# @sanitize-on-release: Bit[5] of the "Flags" field in cxl spec r3.1
+#     Table 7-71.  When set, device should sanitize all released
+#     capacity as a result of this request.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-71.  The dynamic capacity region where the capacity
+#     is being released.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'removal-policy': 'CXLExtRemovalPolicy',
+            '*forced-removal': 'bool',
+            '*sanitize-on-release': 'bool',
+            'region': 'uint8',
+            '*tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
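
For readers following the per-extent checks in
qmp_cxl_process_dynamic_capacity_prescriptive(), here is a minimal
standalone sketch of the same ordering: zero length, block alignment,
region bounds, then overlap with earlier extents in the same request.
It is not the code from the patch. Extent, ExtentCheck and
check_extents() are hypothetical names used only for illustration, and a
pairwise comparison stands in for the block bitmap and
test_any_bits_set() that the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one QMP extent; offset is region-relative. */
typedef struct {
    uint64_t offset;
    uint64_t len;
} Extent;

/* Hypothetical result codes, one per check done in the patch. */
typedef enum {
    EXT_OK = 0,
    EXT_ZERO_LEN,
    EXT_UNALIGNED,
    EXT_OUT_OF_REGION,
    EXT_OVERLAP,
} ExtentCheck;

/*
 * Validate n extents against a region of region_len bytes that is
 * divided into block_size-byte blocks.
 * Return value: EXT_OK, or the first failure found.
 */
static ExtentCheck check_extents(const Extent *ext, size_t n,
                                 uint64_t region_len, uint64_t block_size)
{
    assert(block_size != 0);

    for (size_t i = 0; i < n; i++) {
        if (ext[i].len == 0) {
            return EXT_ZERO_LEN;
        }
        if (ext[i].offset % block_size || ext[i].len % block_size) {
            return EXT_UNALIGNED;
        }
        /* Written this way so that offset + len cannot overflow */
        if (ext[i].offset > region_len ||
            ext[i].len > region_len - ext[i].offset) {
            return EXT_OUT_OF_REGION;
        }
        /*
         * Assumption: pairwise comparison against earlier extents
         * instead of the per-block bitmap used in the patch.
         */
        for (size_t j = 0; j < i; j++) {
            if (ext[i].offset < ext[j].offset + ext[j].len &&
                ext[j].offset < ext[i].offset + ext[i].len) {
                return EXT_OVERLAP;
            }
        }
    }
    return EXT_OK;
}
```

With the two 128MiB extents from the commit message (offsets 0 and
128MiB), and assuming a 256MiB region with 2MiB blocks, check_extents()
returns EXT_OK. Passing the same offset twice returns EXT_OVERLAP.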