diff mbox series

[RFC,v2,08/18] cxl/region: Add Dynamic Capacity CXL region support

Message ID 20230604-dcd-type2-upstream-v2-8-f740c47e7916@intel.com
State New, archived
Headers show
Series DCD: Add support for Dynamic Capacity Devices (DCD) | expand

Commit Message

Ira Weiny Aug. 29, 2023, 5:20 a.m. UTC
CXL devices optionally support dynamic capacity.  CXL Regions must be
configured correctly to access this capacity.  Similar to ram and pmem
partitions, DC Regions represent different partitions of the DPA space.

Interleaving is deferred due to the complexity of managing extents on
multiple devices at the same time.  However, there is nothing which
directly prevents interleave support at this time.  The check allows
for early rejection.

To maintain backwards compatibility with older software, CXL regions
need a default DAX device to hold the reference for the region until it
is deleted.

Add create_dc_region sysfs entry to create DC regions.  Share the logic
of devm_cxl_add_dax_region() and region_is_system_ram().  Special case
DC capable CXL regions to create a 0 sized seed DAX device until others
can be created on dynamic space later.

Flag dax_regions to indicate 0 capacity available until dax_region
extents are supported by the region.

Co-developed-by: Navneet Singh <navneet.singh@intel.com>
Signed-off-by: Navneet Singh <navneet.singh@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
changes for v2:
[iweiny: flag empty dax regions]
[iweiny: Split out anything not directly related to creating a DC CXL
	 region]
[iweiny: Separate out dev dax stuff]
[iweiny/navneet: create 0 sized DAX device by default]
[iweiny: use new DC region mode]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 20 +++++-----
 drivers/cxl/core/core.h                 |  1 +
 drivers/cxl/core/port.c                 |  1 +
 drivers/cxl/core/region.c               | 71 ++++++++++++++++++++++++++++-----
 drivers/dax/bus.c                       |  8 ++++
 drivers/dax/bus.h                       |  1 +
 drivers/dax/cxl.c                       | 15 ++++++-
 7 files changed, 96 insertions(+), 21 deletions(-)

Comments

Jonathan Cameron Aug. 29, 2023, 3:19 p.m. UTC | #1
On Mon, 28 Aug 2023 22:20:59 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> CXL devices optionally support dynamic capacity.  CXL Regions must be
> configured correctly to access this capacity.  Similar to ram and pmem
> partitions, DC Regions represent different partitions of the DPA space.
> 
> Interleaving is deferred due to the complexity of managing extents on
> multiple devices at the same time.  However, there is nothing which
> directly prevents interleave support at this time.  The check allows
> for early rejection.
> 
> To maintain backwards compatibility with older software, CXL regions
> need a default DAX device to hold the reference for the region until it
> is deleted.
> 
> Add create_dc_region sysfs entry to create DC regions.  Share the logic
> of devm_cxl_add_dax_region() and region_is_system_ram().  Special case
> DC capable CXL regions to create a 0 sized seed DAX device until others
> can be created on dynamic space later.
> 
> Flag dax_regions to indicate 0 capacity available until dax_region
> extents are supported by the region.
> 
> Co-developed-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Dave Jiang Aug. 30, 2023, 11:27 p.m. UTC | #2
On 8/28/23 22:20, Ira Weiny wrote:
> CXL devices optionally support dynamic capacity.  CXL Regions must be
> configured correctly to access this capacity.  Similar to ram and pmem
> partitions, DC Regions represent different partitions of the DPA space.
> 
> Interleaving is deferred due to the complexity of managing extents on
> multiple devices at the same time.  However, there is nothing which
> directly prevents interleave support at this time.  The check allows
> for early rejection.
> 
> To maintain backwards compatibility with older software, CXL regions
> need a default DAX device to hold the reference for the region until it
> is deleted.
> 
> Add create_dc_region sysfs entry to create DC regions.  Share the logic
> of devm_cxl_add_dax_region() and region_is_system_ram().  Special case
> DC capable CXL regions to create a 0 sized seed DAX device until others
> can be created on dynamic space later.
> 
> Flag dax_regions to indicate 0 capacity available until dax_region
> extents are supported by the region.
> 
> Co-developed-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

You probably should update kernel version to v6.7. Otherwise
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> changes for v2:
> [iweiny: flag empty dax regions]
> [iweiny: Split out anything not directly related to creating a DC CXL
> 	 region]
> [iweiny: Separate out dev dax stuff]
> [iweiny/navneet: create 0 sized DAX device by default]
> [iweiny: use new DC region mode]
> ---
>   Documentation/ABI/testing/sysfs-bus-cxl | 20 +++++-----
>   drivers/cxl/core/core.h                 |  1 +
>   drivers/cxl/core/port.c                 |  1 +
>   drivers/cxl/core/region.c               | 71 ++++++++++++++++++++++++++++-----
>   drivers/dax/bus.c                       |  8 ++++
>   drivers/dax/bus.h                       |  1 +
>   drivers/dax/cxl.c                       | 15 ++++++-
>   7 files changed, 96 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index aa65dc5b4e13..a0562938ecac 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -351,20 +351,20 @@ Description:
>   		interleave_granularity).
>   
>   
> -What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> +What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region
>   Date:		May, 2022, January, 2023
> -KernelVersion:	v6.0 (pmem), v6.3 (ram)
> +KernelVersion:	v6.0 (pmem), v6.3 (ram), v6.6 (dc)
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Write a string in the form 'regionZ' to start the process
> -		of defining a new persistent, or volatile memory region
> -		(interleave-set) within the decode range bounded by root decoder
> -		'decoderX.Y'. The value written must match the current value
> -		returned from reading this attribute. An atomic compare exchange
> -		operation is done on write to assign the requested id to a
> -		region and allocate the region-id for the next creation attempt.
> -		EBUSY is returned if the region name written does not match the
> -		current cached value.
> +		of defining a new persistent, volatile, or Dynamic Capacity
> +		(DC) memory region (interleave-set) within the decode range
> +		bounded by root decoder 'decoderX.Y'. The value written must
> +		match the current value returned from reading this attribute.
> +		An atomic compare exchange operation is done on write to assign
> +		the requested id to a region and allocate the region-id for the
> +		next creation attempt.  EBUSY is returned if the region name
> +		written does not match the current cached value.
>   
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 45e7e044cf4a..cf3cf01cb95d 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -13,6 +13,7 @@ extern struct attribute_group cxl_base_attribute_group;
>   #ifdef CONFIG_CXL_REGION
>   extern struct device_attribute dev_attr_create_pmem_region;
>   extern struct device_attribute dev_attr_create_ram_region;
> +extern struct device_attribute dev_attr_create_dc_region;
>   extern struct device_attribute dev_attr_delete_region;
>   extern struct device_attribute dev_attr_region;
>   extern const struct device_type cxl_pmem_region_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index a5db710a63bc..608901bb7d91 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -314,6 +314,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>   	&dev_attr_target_list.attr,
>   	SET_CXL_REGION_ATTR(create_pmem_region)
>   	SET_CXL_REGION_ATTR(create_ram_region)
> +	SET_CXL_REGION_ATTR(create_dc_region)
>   	SET_CXL_REGION_ATTR(delete_region)
>   	NULL,
>   };
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 69af1354bc5b..fc8dee469244 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2271,6 +2271,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>   	switch (mode) {
>   	case CXL_REGION_RAM:
>   	case CXL_REGION_PMEM:
> +	case CXL_REGION_DC:
>   		break;
>   	default:
>   		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
> @@ -2383,6 +2384,33 @@ static ssize_t create_ram_region_store(struct device *dev,
>   }
>   DEVICE_ATTR_RW(create_ram_region);
>   
> +static ssize_t create_dc_region_show(struct device *dev,
> +				     struct device_attribute *attr, char *buf)
> +{
> +	return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static ssize_t create_dc_region_store(struct device *dev,
> +				      struct device_attribute *attr,
> +				      const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	struct cxl_region *cxlr;
> +	int rc, id;
> +
> +	rc = sscanf(buf, "region%d\n", &id);
> +	if (rc != 1)
> +		return -EINVAL;
> +
> +	cxlr = __create_region(cxlrd, id, CXL_REGION_DC,
> +			       CXL_DECODER_HOSTONLYMEM);
> +	if (IS_ERR(cxlr))
> +		return PTR_ERR(cxlr);
> +
> +	return len;
> +}
> +DEVICE_ATTR_RW(create_dc_region);
> +
>   static ssize_t region_show(struct device *dev, struct device_attribute *attr,
>   			   char *buf)
>   {
> @@ -2834,7 +2862,7 @@ static void cxlr_dax_unregister(void *_cxlr_dax)
>   	device_unregister(&cxlr_dax->dev);
>   }
>   
> -static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> +static int __devm_cxl_add_dax_region(struct cxl_region *cxlr)
>   {
>   	struct cxl_dax_region *cxlr_dax;
>   	struct device *dev;
> @@ -2863,6 +2891,21 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
>   	return rc;
>   }
>   
> +static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> +{
> +	return __devm_cxl_add_dax_region(cxlr);
> +}
> +
> +static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr)
> +{
> +	if (cxlr->params.interleave_ways != 1) {
> +		dev_err(&cxlr->dev, "Interleaving DC not supported\n");
> +		return -EINVAL;
> +	}
> +
> +	return __devm_cxl_add_dax_region(cxlr);
> +}
> +
>   static int match_decoder_by_range(struct device *dev, void *data)
>   {
>   	struct range *r1, *r2 = data;
> @@ -3203,6 +3246,19 @@ static int is_system_ram(struct resource *res, void *arg)
>   	return 1;
>   }
>   
> +/*
> + * The region can not be manged by CXL if any portion of
> + * it is already online as 'System RAM'
> + */
> +static bool region_is_system_ram(struct cxl_region *cxlr,
> +				 struct cxl_region_params *p)
> +{
> +	return (walk_iomem_res_desc(IORES_DESC_NONE,
> +				    IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> +				    p->res->start, p->res->end, cxlr,
> +				    is_system_ram) > 0);
> +}
> +
>   static int cxl_region_probe(struct device *dev)
>   {
>   	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -3242,14 +3298,7 @@ static int cxl_region_probe(struct device *dev)
>   	case CXL_REGION_PMEM:
>   		return devm_cxl_add_pmem_region(cxlr);
>   	case CXL_REGION_RAM:
> -		/*
> -		 * The region can not be manged by CXL if any portion of
> -		 * it is already online as 'System RAM'
> -		 */
> -		if (walk_iomem_res_desc(IORES_DESC_NONE,
> -					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> -					p->res->start, p->res->end, cxlr,
> -					is_system_ram) > 0)
> +		if (region_is_system_ram(cxlr, p))
>   			return 0;
>   
>   		/*
> @@ -3261,6 +3310,10 @@ static int cxl_region_probe(struct device *dev)
>   
>   		/* HDM-H routes to device-dax */
>   		return devm_cxl_add_dax_region(cxlr);
> +	case CXL_REGION_DC:
> +		if (region_is_system_ram(cxlr, p))
> +			return 0;
> +		return devm_cxl_add_dc_dax_region(cxlr);
>   	default:
>   		dev_dbg(&cxlr->dev, "unsupported region mode: %s\n",
>   			cxl_region_mode_name(cxlr->mode));
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 0ee96e6fc426..b76e49813a39 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -169,6 +169,11 @@ static bool is_static(struct dax_region *dax_region)
>   	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
>   }
>   
> +static bool is_dynamic(struct dax_region *dax_region)
> +{
> +	return (dax_region->res.flags & IORESOURCE_DAX_DYNAMIC_CAP) != 0;
> +}
> +
>   bool static_dev_dax(struct dev_dax *dev_dax)
>   {
>   	return is_static(dev_dax->region);
> @@ -285,6 +290,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
>   
>   	device_lock_assert(dax_region->dev);
>   
> +	if (is_dynamic(dax_region))
> +		return 0;
> +
>   	for_each_dax_region_resource(dax_region, res)
>   		size -= resource_size(res);
>   	return size;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 1ccd23360124..74d8fe4a5532 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -13,6 +13,7 @@ struct dax_region;
>   /* dax bus specific ioresource flags */
>   #define IORESOURCE_DAX_STATIC BIT(0)
>   #define IORESOURCE_DAX_KMEM BIT(1)
> +#define IORESOURCE_DAX_DYNAMIC_CAP BIT(2)
>   
>   struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>   		struct range *range, int target_node, unsigned int align,
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 8bc9d04034d6..147c8c69782b 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -13,19 +13,30 @@ static int cxl_dax_region_probe(struct device *dev)
>   	struct cxl_region *cxlr = cxlr_dax->cxlr;
>   	struct dax_region *dax_region;
>   	struct dev_dax_data data;
> +	resource_size_t dev_size;
> +	unsigned long flags;
>   
>   	if (nid == NUMA_NO_NODE)
>   		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
>   
> +	dev_size = range_len(&cxlr_dax->hpa_range);
> +
> +	flags = IORESOURCE_DAX_KMEM;
> +	if (cxlr->mode == CXL_REGION_DC) {
> +		/* Add empty seed dax device */
> +		dev_size = 0;
> +		flags |= IORESOURCE_DAX_DYNAMIC_CAP;
> +	}
> +
>   	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
> -				      PMD_SIZE, IORESOURCE_DAX_KMEM);
> +				      PMD_SIZE, flags);
>   	if (!dax_region)
>   		return -ENOMEM;
>   
>   	data = (struct dev_dax_data) {
>   		.dax_region = dax_region,
>   		.id = -1,
> -		.size = range_len(&cxlr_dax->hpa_range),
> +		.size = dev_size,
>   	};
>   
>   	return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
>
Fan Ni Sept. 5, 2023, 9:09 p.m. UTC | #3
On Mon, Aug 28, 2023 at 10:20:59PM -0700, Ira Weiny wrote:
> CXL devices optionally support dynamic capacity.  CXL Regions must be
> configured correctly to access this capacity.  Similar to ram and pmem
> partitions, DC Regions represent different partitions of the DPA space.
>
> Interleaving is deferred due to the complexity of managing extents on
> multiple devices at the same time.  However, there is nothing which
> directly prevents interleave support at this time.  The check allows
> for early rejection.
>
> To maintain backwards compatibility with older software, CXL regions
> need a default DAX device to hold the reference for the region until it
> is deleted.
>
> Add create_dc_region sysfs entry to create DC regions.  Share the logic
> of devm_cxl_add_dax_region() and region_is_system_ram().  Special case
> DC capable CXL regions to create a 0 sized seed DAX device until others
> can be created on dynamic space later.
>
> Flag dax_regions to indicate 0 capacity available until dax_region
> extents are supported by the region.
>
> Co-developed-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>

Reviewed-by: Fan Ni <fan.ni@samsung.com>

> ---
> changes for v2:
> [iweiny: flag empty dax regions]
> [iweiny: Split out anything not directly related to creating a DC CXL
> 	 region]
> [iweiny: Separate out dev dax stuff]
> [iweiny/navneet: create 0 sized DAX device by default]
> [iweiny: use new DC region mode]
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 20 +++++-----
>  drivers/cxl/core/core.h                 |  1 +
>  drivers/cxl/core/port.c                 |  1 +
>  drivers/cxl/core/region.c               | 71 ++++++++++++++++++++++++++++-----
>  drivers/dax/bus.c                       |  8 ++++
>  drivers/dax/bus.h                       |  1 +
>  drivers/dax/cxl.c                       | 15 ++++++-
>  7 files changed, 96 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index aa65dc5b4e13..a0562938ecac 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -351,20 +351,20 @@ Description:
>  		interleave_granularity).
>
>
> -What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> +What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region
>  Date:		May, 2022, January, 2023
> -KernelVersion:	v6.0 (pmem), v6.3 (ram)
> +KernelVersion:	v6.0 (pmem), v6.3 (ram), v6.6 (dc)
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a string in the form 'regionZ' to start the process
> -		of defining a new persistent, or volatile memory region
> -		(interleave-set) within the decode range bounded by root decoder
> -		'decoderX.Y'. The value written must match the current value
> -		returned from reading this attribute. An atomic compare exchange
> -		operation is done on write to assign the requested id to a
> -		region and allocate the region-id for the next creation attempt.
> -		EBUSY is returned if the region name written does not match the
> -		current cached value.
> +		of defining a new persistent, volatile, or Dynamic Capacity
> +		(DC) memory region (interleave-set) within the decode range
> +		bounded by root decoder 'decoderX.Y'. The value written must
> +		match the current value returned from reading this attribute.
> +		An atomic compare exchange operation is done on write to assign
> +		the requested id to a region and allocate the region-id for the
> +		next creation attempt.  EBUSY is returned if the region name
> +		written does not match the current cached value.
>
>
>  What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 45e7e044cf4a..cf3cf01cb95d 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -13,6 +13,7 @@ extern struct attribute_group cxl_base_attribute_group;
>  #ifdef CONFIG_CXL_REGION
>  extern struct device_attribute dev_attr_create_pmem_region;
>  extern struct device_attribute dev_attr_create_ram_region;
> +extern struct device_attribute dev_attr_create_dc_region;
>  extern struct device_attribute dev_attr_delete_region;
>  extern struct device_attribute dev_attr_region;
>  extern const struct device_type cxl_pmem_region_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index a5db710a63bc..608901bb7d91 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -314,6 +314,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>  	&dev_attr_target_list.attr,
>  	SET_CXL_REGION_ATTR(create_pmem_region)
>  	SET_CXL_REGION_ATTR(create_ram_region)
> +	SET_CXL_REGION_ATTR(create_dc_region)
>  	SET_CXL_REGION_ATTR(delete_region)
>  	NULL,
>  };
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 69af1354bc5b..fc8dee469244 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2271,6 +2271,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>  	switch (mode) {
>  	case CXL_REGION_RAM:
>  	case CXL_REGION_PMEM:
> +	case CXL_REGION_DC:
>  		break;
>  	default:
>  		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
> @@ -2383,6 +2384,33 @@ static ssize_t create_ram_region_store(struct device *dev,
>  }
>  DEVICE_ATTR_RW(create_ram_region);
>
> +static ssize_t create_dc_region_show(struct device *dev,
> +				     struct device_attribute *attr, char *buf)
> +{
> +	return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static ssize_t create_dc_region_store(struct device *dev,
> +				      struct device_attribute *attr,
> +				      const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	struct cxl_region *cxlr;
> +	int rc, id;
> +
> +	rc = sscanf(buf, "region%d\n", &id);
> +	if (rc != 1)
> +		return -EINVAL;
> +
> +	cxlr = __create_region(cxlrd, id, CXL_REGION_DC,
> +			       CXL_DECODER_HOSTONLYMEM);
> +	if (IS_ERR(cxlr))
> +		return PTR_ERR(cxlr);
> +
> +	return len;
> +}
> +DEVICE_ATTR_RW(create_dc_region);
> +
>  static ssize_t region_show(struct device *dev, struct device_attribute *attr,
>  			   char *buf)
>  {
> @@ -2834,7 +2862,7 @@ static void cxlr_dax_unregister(void *_cxlr_dax)
>  	device_unregister(&cxlr_dax->dev);
>  }
>
> -static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> +static int __devm_cxl_add_dax_region(struct cxl_region *cxlr)
>  {
>  	struct cxl_dax_region *cxlr_dax;
>  	struct device *dev;
> @@ -2863,6 +2891,21 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
>  	return rc;
>  }
>
> +static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> +{
> +	return __devm_cxl_add_dax_region(cxlr);
> +}
> +
> +static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr)
> +{
> +	if (cxlr->params.interleave_ways != 1) {
> +		dev_err(&cxlr->dev, "Interleaving DC not supported\n");
> +		return -EINVAL;
> +	}
> +
> +	return __devm_cxl_add_dax_region(cxlr);
> +}
> +
>  static int match_decoder_by_range(struct device *dev, void *data)
>  {
>  	struct range *r1, *r2 = data;
> @@ -3203,6 +3246,19 @@ static int is_system_ram(struct resource *res, void *arg)
>  	return 1;
>  }
>
> +/*
> + * The region can not be manged by CXL if any portion of
> + * it is already online as 'System RAM'
> + */
> +static bool region_is_system_ram(struct cxl_region *cxlr,
> +				 struct cxl_region_params *p)
> +{
> +	return (walk_iomem_res_desc(IORES_DESC_NONE,
> +				    IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> +				    p->res->start, p->res->end, cxlr,
> +				    is_system_ram) > 0);
> +}
> +
>  static int cxl_region_probe(struct device *dev)
>  {
>  	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -3242,14 +3298,7 @@ static int cxl_region_probe(struct device *dev)
>  	case CXL_REGION_PMEM:
>  		return devm_cxl_add_pmem_region(cxlr);
>  	case CXL_REGION_RAM:
> -		/*
> -		 * The region can not be manged by CXL if any portion of
> -		 * it is already online as 'System RAM'
> -		 */
> -		if (walk_iomem_res_desc(IORES_DESC_NONE,
> -					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> -					p->res->start, p->res->end, cxlr,
> -					is_system_ram) > 0)
> +		if (region_is_system_ram(cxlr, p))
>  			return 0;
>
>  		/*
> @@ -3261,6 +3310,10 @@ static int cxl_region_probe(struct device *dev)
>
>  		/* HDM-H routes to device-dax */
>  		return devm_cxl_add_dax_region(cxlr);
> +	case CXL_REGION_DC:
> +		if (region_is_system_ram(cxlr, p))
> +			return 0;
> +		return devm_cxl_add_dc_dax_region(cxlr);
>  	default:
>  		dev_dbg(&cxlr->dev, "unsupported region mode: %s\n",
>  			cxl_region_mode_name(cxlr->mode));
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 0ee96e6fc426..b76e49813a39 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -169,6 +169,11 @@ static bool is_static(struct dax_region *dax_region)
>  	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
>  }
>
> +static bool is_dynamic(struct dax_region *dax_region)
> +{
> +	return (dax_region->res.flags & IORESOURCE_DAX_DYNAMIC_CAP) != 0;
> +}
> +
>  bool static_dev_dax(struct dev_dax *dev_dax)
>  {
>  	return is_static(dev_dax->region);
> @@ -285,6 +290,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
>
>  	device_lock_assert(dax_region->dev);
>
> +	if (is_dynamic(dax_region))
> +		return 0;
> +
>  	for_each_dax_region_resource(dax_region, res)
>  		size -= resource_size(res);
>  	return size;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 1ccd23360124..74d8fe4a5532 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -13,6 +13,7 @@ struct dax_region;
>  /* dax bus specific ioresource flags */
>  #define IORESOURCE_DAX_STATIC BIT(0)
>  #define IORESOURCE_DAX_KMEM BIT(1)
> +#define IORESOURCE_DAX_DYNAMIC_CAP BIT(2)
>
>  struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		struct range *range, int target_node, unsigned int align,
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 8bc9d04034d6..147c8c69782b 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -13,19 +13,30 @@ static int cxl_dax_region_probe(struct device *dev)
>  	struct cxl_region *cxlr = cxlr_dax->cxlr;
>  	struct dax_region *dax_region;
>  	struct dev_dax_data data;
> +	resource_size_t dev_size;
> +	unsigned long flags;
>
>  	if (nid == NUMA_NO_NODE)
>  		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
>
> +	dev_size = range_len(&cxlr_dax->hpa_range);
> +
> +	flags = IORESOURCE_DAX_KMEM;
> +	if (cxlr->mode == CXL_REGION_DC) {
> +		/* Add empty seed dax device */
> +		dev_size = 0;
> +		flags |= IORESOURCE_DAX_DYNAMIC_CAP;
> +	}
> +
>  	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
> -				      PMD_SIZE, IORESOURCE_DAX_KMEM);
> +				      PMD_SIZE, flags);
>  	if (!dax_region)
>  		return -ENOMEM;
>
>  	data = (struct dev_dax_data) {
>  		.dax_region = dax_region,
>  		.id = -1,
> -		.size = range_len(&cxlr_dax->hpa_range),
> +		.size = dev_size,
>  	};
>
>  	return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
>
> --
> 2.41.0
>
Ira Weiny Sept. 6, 2023, 4:36 a.m. UTC | #4
Dave Jiang wrote:
> 
> 
> On 8/28/23 22:20, Ira Weiny wrote:
> > CXL devices optionally support dynamic capacity.  CXL Regions must be
> > configured correctly to access this capacity.  Similar to ram and pmem
> > partitions, DC Regions represent different partitions of the DPA space.
> > 
> > Interleaving is deferred due to the complexity of managing extents on
> > multiple devices at the same time.  However, there is nothing which
> > directly prevents interleave support at this time.  The check allows
> > for early rejection.
> > 
> > To maintain backwards compatibility with older software, CXL regions
> > need a default DAX device to hold the reference for the region until it
> > is deleted.
> > 
> > Add create_dc_region sysfs entry to create DC regions.  Share the logic
> > of devm_cxl_add_dax_region() and region_is_system_ram().  Special case
> > DC capable CXL regions to create a 0 sized seed DAX device until others
> > can be created on dynamic space later.
> > 
> > Flag dax_regions to indicate 0 capacity available until dax_region
> > extents are supported by the region.
> > 
> > Co-developed-by: Navneet Singh <navneet.singh@intel.com>
> > Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> You probably should update kernel version to v6.7. Otherwise

Done.

> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index aa65dc5b4e13..a0562938ecac 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -351,20 +351,20 @@  Description:
 		interleave_granularity).
 
 
-What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
+What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region
 Date:		May, 2022, January, 2023
-KernelVersion:	v6.0 (pmem), v6.3 (ram)
+KernelVersion:	v6.0 (pmem), v6.3 (ram), v6.6 (dc)
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a string in the form 'regionZ' to start the process
-		of defining a new persistent, or volatile memory region
-		(interleave-set) within the decode range bounded by root decoder
-		'decoderX.Y'. The value written must match the current value
-		returned from reading this attribute. An atomic compare exchange
-		operation is done on write to assign the requested id to a
-		region and allocate the region-id for the next creation attempt.
-		EBUSY is returned if the region name written does not match the
-		current cached value.
+		of defining a new persistent, volatile, or Dynamic Capacity
+		(DC) memory region (interleave-set) within the decode range
+		bounded by root decoder 'decoderX.Y'. The value written must
+		match the current value returned from reading this attribute.
+		An atomic compare exchange operation is done on write to assign
+		the requested id to a region and allocate the region-id for the
+		next creation attempt.  EBUSY is returned if the region name
+		written does not match the current cached value.
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 45e7e044cf4a..cf3cf01cb95d 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -13,6 +13,7 @@  extern struct attribute_group cxl_base_attribute_group;
 #ifdef CONFIG_CXL_REGION
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_create_ram_region;
+extern struct device_attribute dev_attr_create_dc_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
 extern const struct device_type cxl_pmem_region_type;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index a5db710a63bc..608901bb7d91 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -314,6 +314,7 @@  static struct attribute *cxl_decoder_root_attrs[] = {
 	&dev_attr_target_list.attr,
 	SET_CXL_REGION_ATTR(create_pmem_region)
 	SET_CXL_REGION_ATTR(create_ram_region)
+	SET_CXL_REGION_ATTR(create_dc_region)
 	SET_CXL_REGION_ATTR(delete_region)
 	NULL,
 };
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 69af1354bc5b..fc8dee469244 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2271,6 +2271,7 @@  static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	switch (mode) {
 	case CXL_REGION_RAM:
 	case CXL_REGION_PMEM:
+	case CXL_REGION_DC:
 		break;
 	default:
 		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
@@ -2383,6 +2384,33 @@  static ssize_t create_ram_region_store(struct device *dev,
 }
 DEVICE_ATTR_RW(create_ram_region);
 
+static ssize_t create_dc_region_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static ssize_t create_dc_region_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t len)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+	struct cxl_region *cxlr;
+	int rc, id;
+
+	rc = sscanf(buf, "region%d\n", &id);
+	if (rc != 1)
+		return -EINVAL;
+
+	cxlr = __create_region(cxlrd, id, CXL_REGION_DC,
+			       CXL_DECODER_HOSTONLYMEM);
+	if (IS_ERR(cxlr))
+		return PTR_ERR(cxlr);
+
+	return len;
+}
+DEVICE_ATTR_RW(create_dc_region);
+
 static ssize_t region_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -2834,7 +2862,7 @@  static void cxlr_dax_unregister(void *_cxlr_dax)
 	device_unregister(&cxlr_dax->dev);
 }
 
-static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
+static int __devm_cxl_add_dax_region(struct cxl_region *cxlr)
 {
 	struct cxl_dax_region *cxlr_dax;
 	struct device *dev;
@@ -2863,6 +2891,21 @@  static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
 	return rc;
 }
 
+static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
+{
+	return __devm_cxl_add_dax_region(cxlr);
+}
+
+static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr)
+{
+	if (cxlr->params.interleave_ways != 1) {
+		dev_err(&cxlr->dev, "Interleaving DC not supported\n");
+		return -EINVAL;
+	}
+
+	return __devm_cxl_add_dax_region(cxlr);
+}
+
 static int match_decoder_by_range(struct device *dev, void *data)
 {
 	struct range *r1, *r2 = data;
@@ -3203,6 +3246,19 @@  static int is_system_ram(struct resource *res, void *arg)
 	return 1;
 }
 
+/*
+ * The region can not be manged by CXL if any portion of
+ * it is already online as 'System RAM'
+ */
+static bool region_is_system_ram(struct cxl_region *cxlr,
+				 struct cxl_region_params *p)
+{
+	return (walk_iomem_res_desc(IORES_DESC_NONE,
+				    IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+				    p->res->start, p->res->end, cxlr,
+				    is_system_ram) > 0);
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -3242,14 +3298,7 @@  static int cxl_region_probe(struct device *dev)
 	case CXL_REGION_PMEM:
 		return devm_cxl_add_pmem_region(cxlr);
 	case CXL_REGION_RAM:
-		/*
-		 * The region can not be manged by CXL if any portion of
-		 * it is already online as 'System RAM'
-		 */
-		if (walk_iomem_res_desc(IORES_DESC_NONE,
-					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-					p->res->start, p->res->end, cxlr,
-					is_system_ram) > 0)
+		if (region_is_system_ram(cxlr, p))
 			return 0;
 
 		/*
@@ -3261,6 +3310,10 @@  static int cxl_region_probe(struct device *dev)
 
 		/* HDM-H routes to device-dax */
 		return devm_cxl_add_dax_region(cxlr);
+	case CXL_REGION_DC:
+		if (region_is_system_ram(cxlr, p))
+			return 0;
+		return devm_cxl_add_dc_dax_region(cxlr);
 	default:
 		dev_dbg(&cxlr->dev, "unsupported region mode: %s\n",
 			cxl_region_mode_name(cxlr->mode));
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 0ee96e6fc426..b76e49813a39 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -169,6 +169,11 @@  static bool is_static(struct dax_region *dax_region)
 	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
 }
 
+static bool is_dynamic(struct dax_region *dax_region)
+{
+	return (dax_region->res.flags & IORESOURCE_DAX_DYNAMIC_CAP) != 0;
+}
+
 bool static_dev_dax(struct dev_dax *dev_dax)
 {
 	return is_static(dev_dax->region);
@@ -285,6 +290,9 @@  static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
 
 	device_lock_assert(dax_region->dev);
 
+	if (is_dynamic(dax_region))
+		return 0;
+
 	for_each_dax_region_resource(dax_region, res)
 		size -= resource_size(res);
 	return size;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 1ccd23360124..74d8fe4a5532 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -13,6 +13,7 @@  struct dax_region;
 /* dax bus specific ioresource flags */
 #define IORESOURCE_DAX_STATIC BIT(0)
 #define IORESOURCE_DAX_KMEM BIT(1)
+#define IORESOURCE_DAX_DYNAMIC_CAP BIT(2)
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct range *range, int target_node, unsigned int align,
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 8bc9d04034d6..147c8c69782b 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -13,19 +13,30 @@  static int cxl_dax_region_probe(struct device *dev)
 	struct cxl_region *cxlr = cxlr_dax->cxlr;
 	struct dax_region *dax_region;
 	struct dev_dax_data data;
+	resource_size_t dev_size;
+	unsigned long flags;
 
 	if (nid == NUMA_NO_NODE)
 		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
 
+	dev_size = range_len(&cxlr_dax->hpa_range);
+
+	flags = IORESOURCE_DAX_KMEM;
+	if (cxlr->mode == CXL_REGION_DC) {
+		/* Add empty seed dax device */
+		dev_size = 0;
+		flags |= IORESOURCE_DAX_DYNAMIC_CAP;
+	}
+
 	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
-				      PMD_SIZE, IORESOURCE_DAX_KMEM);
+				      PMD_SIZE, flags);
 	if (!dax_region)
 		return -ENOMEM;
 
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = -1,
-		.size = range_len(&cxlr_dax->hpa_range),
+		.size = dev_size,
 	};
 
 	return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));