[RFC,v2,00/18] DCD: Add support for Dynamic Capacity Devices (DCD)

Message ID 20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com

Ira Weiny Aug. 29, 2023, 5:20 a.m. UTC
A Dynamic Capacity Device (DCD) (CXL 3.0 spec 9.13.3) is a CXL memory
device that implements dynamic capacity.  The dynamic capacity feature
allows memory capacity to change dynamically, without the need to
reset the device.

Even though this is marked v2 by b4, this is effectively a whole new
series for DCD support.  Quite a bit of the core support was completed
by Navneet in [4].  However, the architecture through the CXL region,
DAX region, and DAX Device layers is completely different.  Particular
attention was paid to:

	1) managing skip resources in the hardware device
	2) ensuring the host OS only sends a release memory mailbox
	   response when all DAX devices are done using an extent
	3) allowing dax devices to span extents
	4) allowing dax devices to use parts of extents

I could say all of the review comments from v1 are addressed but frankly
the series has changed so much that I can't guarantee anything.

The series continues to be based on the type-2 work posted by Dan.[2]
However, my branch with that work is a bit dated.  Therefore I have
posted this series on GitHub here.[5]

Testing was sped up with cxl-test and ndctl DCD support.  A preview of
that work is on GitHub.[6]  In addition, Fan Ni's QEMU DCD series was
used part of the time.[3]

The major parts of this series are:

- Get the dynamic capacity (DC) region information from the CXL device
- Configure device DC regions reported by hardware
- Enhance CXL and DAX regions for DC
	a. maintain separation between the hardware extents and the CXL
	   region extents to provide for the addition of interleaving in
	   the future.
- Get and maintain the hardware extent lists for each device via an
  initial extent list and DC event records
	a. Add capacity events
	b. Add capacity response
	c. Release capacity events
	d. Release capacity response
- Notify region layers of extent changes
- Allow for DAX devices to be created on extents which are surfaced
- Maintain references on extents which are in use
	a. Send Release capacity Response only when DAX devices are not
	   using memory
- Allow DAX region extent labels to change to allow for flexibility in
  DAX device creation in the future (further enhancements are required
  to ndctl for this)
- Trace Dynamic Capacity events
- Add cxl-test infrastructure to allow for faster unit testing
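
The reference-holding rule in the list above (send the release capacity
response only when no DAX device is using the memory) can be sketched in
plain C.  This is a minimal userspace model, not code from the series:
struct dc_extent and every function name here are hypothetical, and
deferring the response until the last user drops its reference is an
assumption about how the rule is enforced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one device extent; not the series' real struct. */
struct dc_extent {
	uint64_t dpa_start;
	uint64_t length;
	unsigned int users;      /* DAX devices currently using this extent */
	bool release_requested;  /* device sent a release capacity event */
	bool release_sent;       /* host sent the release capacity response */
};

/* Send the response only once a release was asked for and nobody uses it. */
static void extent_try_release(struct dc_extent *ext)
{
	if (ext->release_requested && ext->users == 0)
		ext->release_sent = true; /* stand-in for the mailbox command */
}

/* A DAX device starts using (part of) the extent. */
static void extent_get(struct dc_extent *ext)
{
	ext->users++;
}

/* A DAX device is done with the extent. */
static void extent_put(struct dc_extent *ext)
{
	assert(ext->users > 0);
	ext->users--;
	extent_try_release(ext);
}

/* The device asks for the capacity back. */
static void extent_release_event(struct dc_extent *ext)
{
	ext->release_requested = true;
	extent_try_release(ext);
}
```

With two users holding the extent, a release event leaves release_sent
false until the second extent_put() call.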

To: Dan Williams <dan.j.williams@intel.com>
Cc: Navneet Singh <navneet.singh@intel.com>
Cc: Fan Ni <fan.ni@samsung.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: linux-cxl@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

[1] https://lore.kernel.org/all/64326437c1496_934b2949f@dwillia2-mobl3.amr.corp.intel.com.notmuch/
[2] https://lore.kernel.org/all/168592149709.1948938.8663425987110396027.stgit@dwillia2-xfh.jf.intel.com/
[3] https://lore.kernel.org/all/6483946e8152f_f1132294a2@iweiny-mobl.notmuch/
[4] https://lore.kernel.org/r/20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com
[5] https://github.com/weiny2/linux-kernel/commits/dcd-v2-2023-08-28
[6] https://github.com/weiny2/ndctl/tree/dcd-region2

---
Changes in v2:
- iweiny: Complete rework of the entire series
- Link to v1: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com

---
Ira Weiny (15):
      cxl/hdm: Debug, use decoder name function
      cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
      cxl/region: Add Dynamic Capacity decoder and region modes
      cxl/port: Add Dynamic Capacity mode support to endpoint decoders
      cxl/port: Add Dynamic Capacity size support to endpoint decoders
      cxl/region: Add Dynamic Capacity CXL region support
      cxl/mem: Read extents on memory device discovery
      cxl/mem: Handle DCD add and release capacity events.
      cxl/region: Expose DC extents on region driver load
      cxl/region: Notify regions of DC changes
      dax/bus: Factor out dev dax resize logic
      dax/region: Support DAX device creation on dynamic DAX regions
      tools/testing/cxl: Make event logs dynamic
      tools/testing/cxl: Add DC Regions to mock mem data
      tools/testing/cxl: Add Dynamic Capacity events

Navneet Singh (3):
      cxl/mem: Read Dynamic capacity configuration from the device
      cxl/mem: Expose device dynamic capacity configuration
      cxl/mem: Trace Dynamic capacity Event Record

 Documentation/ABI/testing/sysfs-bus-cxl |  56 ++-
 drivers/cxl/core/core.h                 |   1 +
 drivers/cxl/core/hdm.c                  | 215 ++++++++-
 drivers/cxl/core/mbox.c                 | 646 +++++++++++++++++++++++++-
 drivers/cxl/core/memdev.c               |  77 ++++
 drivers/cxl/core/port.c                 |  19 +
 drivers/cxl/core/region.c               | 418 +++++++++++++++--
 drivers/cxl/core/trace.h                |  65 +++
 drivers/cxl/cxl.h                       |  99 +++-
 drivers/cxl/cxlmem.h                    | 138 +++++-
 drivers/cxl/mem.c                       |  50 ++
 drivers/cxl/pci.c                       |   8 +
 drivers/dax/Makefile                    |   1 +
 drivers/dax/bus.c                       | 263 ++++++++---
 drivers/dax/bus.h                       |   1 +
 drivers/dax/cxl.c                       | 213 ++++++++-
 drivers/dax/dax-private.h               |  61 +++
 drivers/dax/extent.c                    | 133 ++++++
 tools/testing/cxl/test/mem.c            | 782 +++++++++++++++++++++++++++-----
 19 files changed, 3005 insertions(+), 241 deletions(-)
---
base-commit: c76cce37fb6f3796e8e146677ba98d3cca30a488
change-id: 20230604-dcd-type2-upstream-0cd15f6216fd

Best regards,
--
Ira Weiny <ira.weiny@intel.com>

Fan Ni Sept. 7, 2023, 9:01 p.m. UTC | #1
On Mon, Aug 28, 2023 at 10:20:51PM -0700, Ira Weiny wrote:
> A Dynamic Capacity Device (DCD) (CXL 3.0 spec 9.13.3) is a CXL memory
> device that implements dynamic capacity.  Dynamic capacity feature
> allows memory capacity to change dynamically, without the need for
> resetting the device.
>
> Even though this is marked v2 by b4, this is effectively a whole new
> series for DCD support.  Quite a bit of the core support was completed
> by Navneet in [4].  However, the architecture through the CXL region,
> DAX region, and DAX Device layers is completely different.  Particular
> attention was paid to:
>
> 	1) managing skip resources in the hardware device
> 	2) ensuring the host OS only sent a release memory mailbox
> 	   response when all DAX devices are done using an extent
> 	3) allowing dax devices to span extents
> 	4) allowing dax devices to use parts of extents
>
> I could say all of the review comments from v1 are addressed but frankly
> the series has changed so much that I can't guarantee anything.
>
> The series continues to be based on the type-2 work posted from Dan.[2]
> However, my branch with that work is a bit dated.  Therefore I have
> posted this series on github here.[5]
>
> Testing was sped up with cxl-test and ndctl dcd support.  A preview of
> that work is on github.[6]  In addition Fan Ni's Qemu DCD series was
> used part of the time.[3]
>
> The major parts of this series are:
>
> - Get the dynamic capacity (DC) region information from cxl device
> - Configure device DC regions reported by hardware
> - Enhance CXL and DAX regions for DC
> 	a. maintain separation between the hardware extents and the CXL
> 	   region extents to provide for the addition of interleaving in
> 	   the future.
> - Get and maintain the hardware extent lists for each device via an
>   initial extent list and DC event records
>         a. Add capacity Events
> 	b. Add capacity response
> 	b. Release capacity events
> 	d. Release capacity response
> - Notify region layers of extent changes
> - Allow for DAX devices to be created on extents which are surfaced
> - Maintain references on extents which are in use
> 	a. Send Release capacity Response only when DAX devices are not
> 	   using memory
> - Allow DAX region extent labels to change to allow for flexibility in
>   DAX device creation in the future (further enhancements are required
>   to ndctl for this)
> - Trace Dynamic Capacity events
> - Add cxl-test infrastructure to allow for faster unit testing
>
> To: Dan Williams <dan.j.williams@intel.com>
> Cc: Navneet Singh <navneet.singh@intel.com>
> Cc: Fan Ni <fan.ni@samsung.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: linux-cxl@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
>
> [1] https://lore.kernel.org/all/64326437c1496_934b2949f@dwillia2-mobl3.amr.corp.intel.com.notmuch/
> [2] https://lore.kernel.org/all/168592149709.1948938.8663425987110396027.stgit@dwillia2-xfh.jf.intel.com/
> [3] https://lore.kernel.org/all/6483946e8152f_f1132294a2@iweiny-mobl.notmuch/
> [4] https://lore.kernel.org/r/20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com
> [5] https://github.com/weiny2/linux-kernel/commits/dcd-v2-2023-08-28
> [6] https://github.com/weiny2/ndctl/tree/dcd-region2
>

Hi Ira,

I tried to test the patch series with the QEMU DCD patches.  However, I
hit some issues and would like to check the following with you.

1. After we create a region for DC, before any extents are added, a dax
device shows up under /dev. Is that what we want? If I remember
correctly, the dax device used to show up only after a DC extent was added.


2. add/release extent does not work correctly for me. The code path is
not called, and I made the following changes to make it pass.
---
 drivers/cxl/cxl.h    | 3 ++-
 drivers/cxl/cxlmem.h | 1 +
 drivers/cxl/pci.c    | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 2c73a30980b6..0d132c1739ce 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -168,7 +168,8 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO |	\
 				 CXLDEV_EVENT_STATUS_WARN |	\
 				 CXLDEV_EVENT_STATUS_FAIL |	\
-				 CXLDEV_EVENT_STATUS_FATAL)
+				 CXLDEV_EVENT_STATUS_FATAL |	\
+				 CXLDEV_EVENT_STATUS_DCD)

 /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
 #define CXLDEV_EVENT_INT_MODE_MASK	GENMASK(1, 0)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 8ca81fd067c2..ae9dcb291c75 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -235,6 +235,7 @@ struct cxl_event_interrupt_policy {
 	u8 warn_settings;
 	u8 failure_settings;
 	u8 fatal_settings;
+	u8 dyncap_settings;
 } __packed;

 /**
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 10c1a583113c..e30fe0304514 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -686,6 +686,7 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
 		.warn_settings = CXL_INT_MSI_MSIX,
 		.failure_settings = CXL_INT_MSI_MSIX,
 		.fatal_settings = CXL_INT_MSI_MSIX,
+		.dyncap_settings = CXL_INT_MSI_MSIX,
 	};

 	mbox_cmd = (struct cxl_mbox_cmd) {
@@ -739,6 +740,12 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds)
 		return rc;
 	}

+	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings);
+	if (rc) {
+		dev_err(cxlds->dev, "Failed to get interrupt for event dyncap log\n");
+		return rc;
+	}
+
 	return 0;
 }

--
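
The effect of the cxl.h hunk above can be illustrated with a small
standalone sketch: a status bit that is missing from the "all events" mask
is filtered out before any handler sees it, which would explain why the
add/release path was never called.  The EVT_* names, the bit positions,
and pending_logs() are assumptions made for this illustration, not the
kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical status bits; the positions are assumed for illustration. */
#define EVT_STATUS_INFO  (1u << 0)
#define EVT_STATUS_WARN  (1u << 1)
#define EVT_STATUS_FAIL  (1u << 2)
#define EVT_STATUS_FATAL (1u << 3)
#define EVT_STATUS_DCD   (1u << 4)

/* Mask without the dynamic capacity bit, as before the change above. */
#define EVT_STATUS_ALL_OLD (EVT_STATUS_INFO | EVT_STATUS_WARN | \
			    EVT_STATUS_FAIL | EVT_STATUS_FATAL)

/* Mask with the dynamic capacity bit included. */
#define EVT_STATUS_ALL_NEW (EVT_STATUS_ALL_OLD | EVT_STATUS_DCD)

/* Return the subset of pending event logs the host will actually process. */
static uint32_t pending_logs(uint32_t status_reg, uint32_t mask)
{
	return status_reg & mask;
}
```

With only the DCD bit set in the status value, the old mask yields 0 and
the new mask yields EVT_STATUS_DCD.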

3. With the changes made in 2, the code for add/release DC extent is called.
However, the system behaviour seems different from before. Previously, after a
DC extent was added, it would show up in lsmem output, listed as offline.
Now, nothing shows up. Is that expected? What should we do to make it usable
as system RAM?

Please let me know if I missed something or did something wrong. Thanks.

Fan



Ira Weiny Sept. 12, 2023, 1:44 a.m. UTC | #2
Fan Ni wrote:
> On Mon, Aug 28, 2023 at 10:20:51PM -0700, Ira Weiny wrote:

Sorry for the delay, I've been walking through the responses and just saw
this.

> 
> Hi Ira,
> 
> I tried to test the patch series with the qemu dcd patches, however, I
> hit some issues, and would like to check the following with you.
> 
> 1. After we create a region for DC before any extents are added, a dax
> device will show under /dev. Is that what we want?

Yes, see

cxl/region: Add Dynamic Capacity CXL region support

	"Special case DC capable CXL regions to create a 0 sized seed DAX
	device until others can be created on dynamic space later."

The seed device is required but is left empty.  It can be resized when
extents are added later.

> If I remember it
> correctly, the dax device used to show up after a dc extent is added.
> 
> 
> 2. add/release extent does not work correctly for me. The code path is
> not called, and I made the following changes to make it pass.

:-(

This is the problem with cxl_test...  I've just realized this after seeing
Jorgen's email regarding the interrupt configuration code.  I've added it
back in.  I'm not sure where it got lost along the way but it was
completely gone from this RFC v2.  Sorry about that.

> ---
>  drivers/cxl/cxl.h    | 3 ++-
>  drivers/cxl/cxlmem.h | 1 +
>  drivers/cxl/pci.c    | 7 +++++++
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 2c73a30980b6..0d132c1739ce 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -168,7 +168,8 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>  #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO |	\
>  				 CXLDEV_EVENT_STATUS_WARN |	\
>  				 CXLDEV_EVENT_STATUS_FAIL |	\
> -				 CXLDEV_EVENT_STATUS_FATAL)
> +				 CXLDEV_EVENT_STATUS_FATAL| \
> +				 CXLDEV_EVENT_STATUS_DCD)
> 
>  /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
>  #define CXLDEV_EVENT_INT_MODE_MASK	GENMASK(1, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 8ca81fd067c2..ae9dcb291c75 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -235,6 +235,7 @@ struct cxl_event_interrupt_policy {
>  	u8 warn_settings;
>  	u8 failure_settings;
>  	u8 fatal_settings;
> +	u8 dyncap_settings;
>  } __packed;
> 
>  /**
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 10c1a583113c..e30fe0304514 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -686,6 +686,7 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
>  		.warn_settings = CXL_INT_MSI_MSIX,
>  		.failure_settings = CXL_INT_MSI_MSIX,
>  		.fatal_settings = CXL_INT_MSI_MSIX,
> +		.dyncap_settings = CXL_INT_MSI_MSIX,
>  	};
> 
>  	mbox_cmd = (struct cxl_mbox_cmd) {
> @@ -739,6 +740,12 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds)
>  		return rc;
>  	}
> 
> +	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings);
> +	if (rc) {
> +		dev_err(cxlds->dev, "Failed to get interrupt for event dyncap log\n");
> +		return rc;
> +	}
> +
>  	return 0;
>  }
> 
> --
> 
> 3. With changes made in 2, the code for add/release dc extent can be called,
> however, the system behaviour seems different from before. Previously, after a
> dc extent is added, it will show up with lsmem command and listed as offline.
> Now, nothing is showing. Is it expected? What should we do to make it usable
> as system ram?

Yes, the previous behavior was not correct.  DAX devices should be
flexible enough to be created anywhere in the region, either within
extents or across extents.  Dave Jiang mentioned to me internally it might
help to add some ASCII art documentation regarding how this works.
Generally, the dax region available size will increase when extents are
added and new dax devices can be created to utilize that space.
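
As a rough illustration of that model, here is a standalone C sketch in
which the available size of a region grows as extents are added and one
allocation may consume part of an extent or span several.  All names
(struct dax_region_model, region_add_extent(), region_alloc()) are
hypothetical, and the first-fit walk is an assumption made for the
example, not the allocation policy of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 8

/* Hypothetical extent as seen by the region: total size and bytes in use. */
struct region_extent {
	uint64_t size;
	uint64_t used;
};

/* Hypothetical DAX region that starts with no extents and no space. */
struct dax_region_model {
	struct region_extent extents[MAX_EXTENTS];
	size_t nr_extents;
};

/* Available size is the sum of the unused space in every extent. */
static uint64_t region_available(const struct dax_region_model *r)
{
	uint64_t avail = 0;

	for (size_t i = 0; i < r->nr_extents; i++) {
		assert(r->extents[i].used <= r->extents[i].size);
		avail += r->extents[i].size - r->extents[i].used;
	}
	return avail;
}

/* A new extent surfaces; returns 0 on success, -1 if the table is full. */
static int region_add_extent(struct dax_region_model *r, uint64_t size)
{
	if (r->nr_extents == MAX_EXTENTS)
		return -1;
	r->extents[r->nr_extents++] = (struct region_extent){ .size = size };
	return 0;
}

/* Take `size` bytes for a DAX device, spanning extents when needed. */
static int region_alloc(struct dax_region_model *r, uint64_t size)
{
	if (size > region_available(r))
		return -1;
	for (size_t i = 0; i < r->nr_extents && size; i++) {
		uint64_t room = r->extents[i].size - r->extents[i].used;
		uint64_t take = room < size ? room : size;

		r->extents[i].used += take;
		size -= take;
	}
	return 0;
}
```

An empty region rejects any non-zero allocation, which mirrors the 0 sized
seed device: space only becomes usable after extents are added.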

Check out the dcd-test.sh in ndctl at this link for the commands to create
a dax device in the new architecture.

https://github.com/weiny2/ndctl/tree/dcd-region2

Hope this helps.

> 
> Please let me know if I miss something or did something wrong. Thanks.

You did not.  I thought the new dax code would explain this new dax device
operation.

Some new documentation is in order.

Ira