diff mbox series

[12/23] cxl/region: Add region creation ABI

Message ID 20210723210623.114073-13-ben.widawsky@intel.com
State New, archived
Headers show
Series cxl_region and cxl_memdev drivers | expand

Commit Message

Ben Widawsky July 23, 2021, 9:06 p.m. UTC
Regions are created as a child of the decoder that encompasses an
address space with constraints. Regions only exist for persistent
capacities.

When regions are created, the number of desired interleave ways must be
known. To enable this, the sysfs attribute will take the desired ways as
input. This interface intentionally allows creation of
impossible-to-enable regions based on interleave constraints in the
topology. The reasoning is to create new regions through the kernel
interfaces which may become possible on reboot under a variety of
circumstances.

As an example, creating a x1 region with:
echo 1 > /sys/bus/cxl/devices/decoder0.0/create_region

Will yield /sys/bus/cxl/devices/decoder0.0/region0.0:0

That region may then be deleted with:
echo region0.0:0 > /sys/bus/cxl/devices/decoder0.0/delete_region

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl       |  21 +++
 .../driver-api/cxl/memory-devices.rst         |  11 ++
 drivers/cxl/Makefile                          |   1 +
 drivers/cxl/core/Makefile                     |   2 +-
 drivers/cxl/core/bus.c                        |  70 ++++++++
 drivers/cxl/core/core.h                       |   1 +
 drivers/cxl/core/region.c                     | 158 ++++++++++++++++++
 drivers/cxl/cxl.h                             |  14 ++
 drivers/cxl/region.h                          |  30 ++++
 9 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cxl/core/region.c
 create mode 100644 drivers/cxl/region.h

Comments

Dan Williams Aug. 14, 2021, 2:19 a.m. UTC | #1
On Fri, Jul 23, 2021 at 2:06 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Regions are created as a child of the decoder that encompasses an
> address space with constraints. Regions only exist for persistent
> capacities.

Maybe I am misinterpreting, but regions need to exist for volatile
capacities too, otherwise what's the interface for enumerating and
reconfiguring volatile interleaves? The only difference is one is
recorded in labels.

Perhaps before creating regions we should write the code to enumerate
volatile regions that might be present since boot. We'll need that
anyway to determine how much CFMWS space is available for dynamic
creation. Once happy with the read side then proceed to the write
side. Thoughts?

>
> When regions are created, the number of desired interleave ways must be
> known. To enable this, the sysfs attribute will take the desired ways as
> input. This interface intentionally allows creation of
> impossible-to-enable regions based on interleave constraints in the
> topology. The reasoning is to create new regions through the kernel
> interfaces which may become possible on reboot under a variety of
> circumstances.
>
> As an example, creating a x1 region with:
> echo 1 > /sys/bus/cxl/devices/decoder0.0/create_region

This proposal has lost its luster for me since I last saw it, and I
must belatedly apologize to you and Jonathan for not internalizing his
critique of this. My previous concern [1] was that tooling would be
surprised by sysfs_update_group() causing attributes to pop into
existence. However, I think the asymmetry is more surprising,
especially when it comes to reconfiguring an existing interleave to be
a different width. There should just be one sysfs attribute that
controls the width of a region.

This allows for additional symmetry where userspace must explicitly
request the next region device name, like so:

# region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
# echo $region
region0.0:0
# echo $region > /sys/bus/cxl/devices/decoder0.0/create_region

If userspace races itself 2 threads will attempt
# echo region0.0:0 > /sys/bus/cxl/devices/decoder0.0/create_region
...and one thread will win. This solves the long standing "next seed"
problem (in libnvdimm and device-dax) with a way to atomically get the
next region, and send colliding threads to create another region.


[1]: https://lore.kernel.org/r/CAPcyv4iF1m+xGMaph+K8VJ0+tCvUML9-pUuAWymXoOrvY1jV1w@mail.gmail.com


>
> Will yield /sys/bus/cxl/devices/decoder0.0/region0.0:0
>
> That region may then be deleted with:
> echo region0.0:0 > /sys/bus/cxl/devices/decoder0.0/delete_region
>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl       |  21 +++
>  .../driver-api/cxl/memory-devices.rst         |  11 ++
>  drivers/cxl/Makefile                          |   1 +
>  drivers/cxl/core/Makefile                     |   2 +-
>  drivers/cxl/core/bus.c                        |  70 ++++++++
>  drivers/cxl/core/core.h                       |   1 +
>  drivers/cxl/core/region.c                     | 158 ++++++++++++++++++
>  drivers/cxl/cxl.h                             |  14 ++
>  drivers/cxl/region.h                          |  30 ++++
>  9 files changed, 307 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cxl/core/region.c
>  create mode 100644 drivers/cxl/region.h
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 0b6a2e6e8fbb..115a25d2899d 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -127,3 +127,24 @@ Description:
>                 memory (type-3). The 'target_type' attribute indicates the
>                 current setting which may dynamically change based on what
>                 memory regions are activated in this decode hierarchy.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/create_region
> +Date:          June, 2021
> +KernelVersion: v5.14
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Creates a new CXL region of N interleaved ways. Writing a value
> +               of '2' will create a new uninitialized region with 2x interleave
> +               that will be mapped by the CXL decoderX.Y. Reading from this
> +               node will return the last created region. Regions must be
> +               subsequently configured and bound to a region driver before they
> +               can be used.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
> +Date:          June, 2021
> +KernelVersion: v5.14
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Deletes the named region. A region must be unbound from the
> +               region driver before being deleted. The attributes expects a
> +               region in the form "regionX.Y:Z".
> diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> index 46847d8c70a0..96a1f8be7940 100644
> --- a/Documentation/driver-api/cxl/memory-devices.rst
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -45,6 +45,17 @@ CXL Core
>  .. kernel-doc:: drivers/cxl/core/regs.c
>     :internal:
>
> +CXL Regions
> +-----------
> +.. kernel-doc:: drivers/cxl/region.h
> +   :identifiers:
> +
> +.. kernel-doc:: drivers/cxl/core/region.c
> +   :doc: cxl core region
> +
> +.. kernel-doc:: drivers/cxl/core/region.c
> +   :identifiers:
> +
>  External Interfaces
>  ===================
>
> diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> index d1aaabc940f3..d19d22a19966 100644
> --- a/drivers/cxl/Makefile
> +++ b/drivers/cxl/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_CXL_MEM) += cxl_pci.o
>  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
>  obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
>
> +cxl_acpi-y := acpi.o
>  cxl_pci-y := pci.o
>  cxl_acpi-y := acpi.o
>  cxl_pmem-y := pmem.o
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 1503d8bf5e9a..4d034900e22c 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -2,4 +2,4 @@
>  obj-$(CONFIG_CXL_BUS) += cxl_core.o
>
>  ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL -I./drivers/cxl
> -cxl_core-y := bus.o memdev.o pmem.o regs.o
> +cxl_core-y := bus.o memdev.o pmem.o region.o regs.o
> diff --git a/drivers/cxl/core/bus.c b/drivers/cxl/core/bus.c
> index e2166f43aefc..3b2bcc091523 100644
> --- a/drivers/cxl/core/bus.c
> +++ b/drivers/cxl/core/bus.c
> @@ -131,7 +131,68 @@ static ssize_t target_list_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(target_list);
>
> +static ssize_t create_region_show(struct device *dev,
> +                                 struct device_attribute *attr, char *buf)
> +{
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +       int rc;
> +
> +       device_lock(dev);
> +       rc = sprintf(buf, "%s\n",
> +                    cxld->youngest ? dev_name(&cxld->youngest->dev) : "");

Youngest goes away if userspace explicitly knows which region it
atomically created.

> +       device_unlock(dev);
> +
> +       return rc;
> +}
> +
> +static ssize_t create_region_store(struct device *dev,
> +                                  struct device_attribute *attr,
> +                                  const char *buf, size_t len)
> +{
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +       struct cxl_region *region;
> +       ssize_t rc;
> +       int val;
> +
> +       rc = kstrtoint(buf, 0, &val);
> +       if (rc)
> +               return rc;
> +       if (val < 0 || val > 16)
> +               return -EINVAL;
> +
> +       region = cxl_alloc_region(cxld, val);
> +       if (IS_ERR(region))
> +               return PTR_ERR(region);
> +
> +       rc = cxl_add_region(cxld, region);
> +       if (rc) {
> +               cxl_free_region(cxld, region);
> +               return rc;
> +       }
> +
> +       cxld->youngest = region;
> +       return len;
> +}
> +static DEVICE_ATTR_RW(create_region);
> +
> +static ssize_t delete_region_store(struct device *dev,
> +                                  struct device_attribute *attr,
> +                                  const char *buf, size_t len)
> +{
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +       int rc;
> +
> +       rc = cxl_delete_region(cxld, buf);
> +       if (rc)
> +               return rc;
> +
> +       return len;
> +}
> +static DEVICE_ATTR_WO(delete_region);
> +
>  static struct attribute *cxl_decoder_base_attrs[] = {
> +       &dev_attr_create_region.attr,
> +       &dev_attr_delete_region.attr,
>         &dev_attr_start.attr,
>         &dev_attr_size.attr,
>         &dev_attr_locked.attr,
> @@ -188,7 +249,13 @@ static void cxl_decoder_release(struct device *dev)
>  {
>         struct cxl_decoder *cxld = to_cxl_decoder(dev);
>         struct cxl_port *port = dev->parent ? to_cxl_port(dev->parent) : NULL;
> +       struct cxl_region *region;
>
> +       list_for_each_entry(region, &cxld->regions, list)
> +               cxl_delete_region(cxld, dev_name(&region->dev));

The decoder already has a list of child devices, no need to duplicate
that, i.e. do something like:

device_for_each_child(&cxld->dev, cxld, cxl_delete_region);

> +
> +       dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
> +                     "Lost track of a region");
>         if (port)
>                 ida_free(&port->decoder_ida, cxld->id);
>         else
> @@ -516,8 +583,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
>                 .interleave_ways = interleave_ways,
>                 .interleave_granularity = interleave_granularity,
>                 .target_type = type,
> +               .regions = LIST_HEAD_INIT(cxld->regions),

This goes away...

>         };
>
> +       ida_init(&cxld->region_ida);
> +
>         /* handle implied target_list */
>         if (port)
>                 if (interleave_ways == 1)
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 5e5862e4d6af..eb1a17103e5d 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -6,6 +6,7 @@
>
>  #include <cxl.h>
>  #include <mem.h>
> +#include <region.h>
>
>  extern const struct device_type cxl_nvdimm_bridge_type;
>  extern const struct device_type cxl_nvdimm_type;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> new file mode 100644
> index 000000000000..caa34f59a62d
> --- /dev/null
> +++ b/drivers/cxl/core/region.c
> @@ -0,0 +1,158 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include "core.h"
> +
> +/**
> + * DOC: cxl core region
> + *
> + * Regions are managed through the Linux device model. Each region instance is a
> + * unique struct device. CXL core provides functionality to create, destroy, and
> + * configure regions. This is all implemented here. Binding a region
> + * (programming the hardware) is handled by a separate region driver.
> + */
> +
> +static void cxl_region_release(struct device *dev);
> +
> +static const struct device_type cxl_region_type = {
> +       .name = "cxl_region",
> +       .release = cxl_region_release,
> +};
> +
> +bool is_cxl_region(struct device *dev)
> +{
> +       return dev->type == &cxl_region_type;
> +}
> +EXPORT_SYMBOL_GPL(is_cxl_region);
> +
> +struct cxl_region *to_cxl_region(struct device *dev)
> +{
> +       if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> +                         "not a cxl_region device\n"))
> +               return NULL;
> +
> +       return container_of(dev, struct cxl_region, dev);
> +}
> +EXPORT_SYMBOL_GPL(to_cxl_region);
> +
> +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> +{
> +       ida_free(&cxld->region_ida, region->id);
> +       kfree(region);
> +}
> +
> +static void cxl_region_release(struct device *dev)
> +{
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> +       struct cxl_region *region = to_cxl_region(dev);
> +
> +       cxl_free_region(cxld, region);
> +}
> +
> +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> +                                   int interleave_ways)
> +{
> +       struct cxl_region *region;
> +       int rc;
> +
> +       region = kzalloc(struct_size(region, targets, interleave_ways),
> +                        GFP_KERNEL);
> +       if (!region)
> +               return ERR_PTR(-ENOMEM);
> +
> +       region->eniw = interleave_ways;
> +
> +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> +       if (rc < 0) {
> +               dev_err(&cxld->dev, "Couldn't get a new id\n");
> +               kfree(region);
> +               return ERR_PTR(rc);
> +       }
> +       region->id = rc;
> +
> +       return region;
> +}
> +
> +/**
> + * cxl_add_region - Adds a region to a decoder
> + * @cxld: Parent decoder.
> + * @region: Region to be added to the decoder.
> + *
> + * This is the second step of region initialization. Regions exist within an
> + * address space which is mapped by a @cxld, and that @cxld enforces constraints
> + * upon the region as it is configured. Regions may be added to a @cxld but not
> + * activated and therefore it is possible to have more regions in a @cxld than
> + * there are interleave ways in the @cxld. Regions exist only for persistent
> + * capacities.
> + *
> + * Return: zero if the region was added to the @cxld, else returns negative
> + * error code.
> + */
> +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
> +{
> +       struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +       struct device *dev = &region->dev;
> +       int rc;
> +
> +       device_initialize(dev);
> +       dev->parent = &cxld->dev;
> +       device_set_pm_not_required(dev);
> +       dev->bus = &cxl_bus_type;
> +       dev->type = &cxl_region_type;
> +       rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
> +       if (rc)
> +               goto err;
> +
> +       rc = device_add(dev);
> +       if (rc)
> +               goto err;
> +
> +       dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
> +
> +       return 0;
> +
> +err:
> +       put_device(dev);
> +       return rc;
> +}
> +
> +static struct cxl_region *
> +cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
> +{
> +       struct device *region_dev;
> +
> +       region_dev = device_find_child_by_name(&cxld->dev, name);
> +       if (!region_dev)
> +               return ERR_PTR(-ENOENT);
> +
> +       return to_cxl_region(region_dev);
> +}
> +
> +int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
> +{
> +       struct cxl_region *region;
> +
> +       device_lock(&cxld->dev);
> +
> +       region = cxl_find_region_by_name(cxld, region_name);
> +       if (IS_ERR(region)) {
> +               device_unlock(&cxld->dev);
> +               return PTR_ERR(region);
> +       }
> +
> +       dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
> +               dev_name(&region->dev), dev_name(&cxld->dev));
> +
> +       cmpxchg(&cxld->youngest, region, NULL);
> +
> +       device_unregister(&region->dev);
> +       device_unlock(&cxld->dev);
> +
> +       put_device(&region->dev);

What get_device() does this pair with?

> +
> +       return 0;
> +}
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index dcf2d1a59271..448abc5c7918 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -191,6 +191,9 @@ enum cxl_decoder_type {
>   * @interleave_granularity: data stride per dport
>   * @target_type: accelerator vs expander (type2 vs type3) selector
>   * @flags: memory type capabilities and locking
> + * @region_ida: allocator for region ids.
> + * @regions: List of regions mapped (may be disabled) by this decoder.
> + * @youngest: Last region created for this decoder.
>   * @target: active ordered target list in current decoder configuration
>   */
>  struct cxl_decoder {
> @@ -201,6 +204,9 @@ struct cxl_decoder {
>         int interleave_granularity;
>         enum cxl_decoder_type target_type;
>         unsigned long flags;
> +       struct ida region_ida;
> +       struct list_head regions;
> +       struct cxl_region *youngest;
>         struct cxl_dport *target[];
>  };
>
> @@ -263,6 +269,14 @@ struct cxl_dport {
>         struct list_head list;
>  };
>
> +bool is_cxl_region(struct device *dev);
> +struct cxl_region *to_cxl_region(struct device *dev);
> +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> +                                   int interleave_ways);
> +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
> +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
> +int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
> +
>  struct cxl_port *to_cxl_port(struct device *dev);
>  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
>                                    resource_size_t component_reg_phys,
> diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> new file mode 100644
> index 000000000000..c8ed3a8bd1e0
> --- /dev/null
> +++ b/drivers/cxl/region.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2021 Intel Corporation. */
> +#ifndef __CXL_REGION_H__
> +#define __CXL_REGION_H__
> +
> +#include <linux/uuid.h>
> +
> +/**
> + * struct cxl_region - CXL region
> + * @dev: This region's device.
> + * @id: This regions id. Id is globally unique across all regions.
> + * @res: Address space consumed by this region.
> + * @requested_size: Size of the region determined from LSA or userspace.
> + * @uuid: The UUID for this region.
> + * @list: Node in decoders region list.
> + * @eniw: Number of interleave ways this region is configured for.
> + * @targets: The memory devices comprising the region.
> + */
> +struct cxl_region {
> +       struct device dev;
> +       int id;
> +       struct resource *res;
> +       u64 requested_size;

Why requested_size, and not size? Perhaps this instead should be a
pointer to a 'struct cxl_region_label' where the LSA data is cached.

I think we'll want struct cxl_volatile_region and struct
cxl_persistent_region wrapping a common struct cxl_region.

> +       uuid_t uuid;
> +       struct list_head list;
> +       int eniw;
> +       struct cxl_memdev *targets[];
> +};
> +
> +#endif
> --
> 2.32.0
>
Ben Widawsky Aug. 26, 2021, 9:01 p.m. UTC | #2
On 21-08-13 19:19:28, Dan Williams wrote:
> On Fri, Jul 23, 2021 at 2:06 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > Regions are created as a child of the decoder that encompasses an
> > address space with constraints. Regions only exist for persistent
> > capacities.
> 
> Maybe I am misinterpreting, but regions need to exist for volatile
> capacities too, otherwise what's the interface for enumerating and
> reconfiguring volatile interleaves? The only difference is one is
> recorded in labels.

You mean for hotplugged volatile regions? I thought for static volatile regions,
BIOS will always set it up and we'll get just a chunk of address space via EFI
memory map. What would we be reconfiguring?

> 
> Perhaps before creating regions we should write the code to enumerate
> volatile regions that might be present since boot. We'll need that
> anyway to determine how much CFMWS space is available for dynamic
> creation. Once happy with the read side then proceed to the write
> side. Thoughts?

I'm fine either way. I started writing a tool to generate fake labels for this,
but then we switched course. I think since you sent this email, we've kind of
decided to do both at the same time.

> 
> >
> > When regions are created, the number of desired interleave ways must be
> > known. To enable this, the sysfs attribute will take the desired ways as
> > input. This interface intentionally allows creation of
> > impossible-to-enable regions based on interleave constraints in the
> > topology. The reasoning is to create new regions through the kernel
> > interfaces which may become possible on reboot under a variety of
> > circumstances.
> >
> > As an example, creating a x1 region with:
> > echo 1 > /sys/bus/cxl/devices/decoder0.0/create_region
> 
> This proposal has lost its luster for me since I last saw it, and I
> must belatedly apologize to you and Jonathan for not internalizing his
> critique of this. My previous concern [1] was that tooling would be
> surprised by sysfs_update_group() causing attributes to pop into
> existence. However, I think the asymmetry is more surprising,
> especially when it comes to reconfiguring an existing interleave to be
> a different width. There should just be one sysfs attribute that
> controls the width of a region.
> 
> This allows for additional symmetry where userspace must explicitly
> request the next region device name, like so:
> 
> # region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
> # echo $region
> region0.0:0
> # echo $region > /sys/bus/cxl/devices/decoder0.0/create_region
> 
> If userspace races itself 2 threads will attempt
> # echo region0.0:0 > /sys/bus/cxl/devices/decoder0.0/create_region
> ...and one thread will win. This solves the long standing "next seed"
> problem (in libnvdimm and device-dax) with a way to atomically get the
> next region, and send colliding threads to create another region.
> 

Apology accepted.

> 
> [1]: https://lore.kernel.org/r/CAPcyv4iF1m+xGMaph+K8VJ0+tCvUML9-pUuAWymXoOrvY1jV1w@mail.gmail.com
> 
> 
> >
> > Will yield /sys/bus/cxl/devices/decoder0.0/region0.0:0
> >
> > That region may then be deleted with:
> > echo region0.0:0 > /sys/bus/cxl/devices/decoder0.0/delete_region
> >
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl       |  21 +++
> >  .../driver-api/cxl/memory-devices.rst         |  11 ++
> >  drivers/cxl/Makefile                          |   1 +
> >  drivers/cxl/core/Makefile                     |   2 +-
> >  drivers/cxl/core/bus.c                        |  70 ++++++++
> >  drivers/cxl/core/core.h                       |   1 +
> >  drivers/cxl/core/region.c                     | 158 ++++++++++++++++++
> >  drivers/cxl/cxl.h                             |  14 ++
> >  drivers/cxl/region.h                          |  30 ++++
> >  9 files changed, 307 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/cxl/core/region.c
> >  create mode 100644 drivers/cxl/region.h
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 0b6a2e6e8fbb..115a25d2899d 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -127,3 +127,24 @@ Description:
> >                 memory (type-3). The 'target_type' attribute indicates the
> >                 current setting which may dynamically change based on what
> >                 memory regions are activated in this decode hierarchy.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/create_region
> > +Date:          June, 2021
> > +KernelVersion: v5.14
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Creates a new CXL region of N interleaved ways. Writing a value
> > +               of '2' will create a new uninitialized region with 2x interleave
> > +               that will be mapped by the CXL decoderX.Y. Reading from this
> > +               node will return the last created region. Regions must be
> > +               subsequently configured and bound to a region driver before they
> > +               can be used.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
> > +Date:          June, 2021
> > +KernelVersion: v5.14
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Deletes the named region. A region must be unbound from the
> > +               region driver before being deleted. The attributes expects a
> > +               region in the form "regionX.Y:Z".
> > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > index 46847d8c70a0..96a1f8be7940 100644
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -45,6 +45,17 @@ CXL Core
> >  .. kernel-doc:: drivers/cxl/core/regs.c
> >     :internal:
> >
> > +CXL Regions
> > +-----------
> > +.. kernel-doc:: drivers/cxl/region.h
> > +   :identifiers:
> > +
> > +.. kernel-doc:: drivers/cxl/core/region.c
> > +   :doc: cxl core region
> > +
> > +.. kernel-doc:: drivers/cxl/core/region.c
> > +   :identifiers:
> > +
> >  External Interfaces
> >  ===================
> >
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index d1aaabc940f3..d19d22a19966 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -4,6 +4,7 @@ obj-$(CONFIG_CXL_MEM) += cxl_pci.o
> >  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> >  obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> >
> > +cxl_acpi-y := acpi.o
> >  cxl_pci-y := pci.o
> >  cxl_acpi-y := acpi.o
> >  cxl_pmem-y := pmem.o
> > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> > index 1503d8bf5e9a..4d034900e22c 100644
> > --- a/drivers/cxl/core/Makefile
> > +++ b/drivers/cxl/core/Makefile
> > @@ -2,4 +2,4 @@
> >  obj-$(CONFIG_CXL_BUS) += cxl_core.o
> >
> >  ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL -I./drivers/cxl
> > -cxl_core-y := bus.o memdev.o pmem.o regs.o
> > +cxl_core-y := bus.o memdev.o pmem.o region.o regs.o
> > diff --git a/drivers/cxl/core/bus.c b/drivers/cxl/core/bus.c
> > index e2166f43aefc..3b2bcc091523 100644
> > --- a/drivers/cxl/core/bus.c
> > +++ b/drivers/cxl/core/bus.c
> > @@ -131,7 +131,68 @@ static ssize_t target_list_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RO(target_list);
> >
> > +static ssize_t create_region_show(struct device *dev,
> > +                                 struct device_attribute *attr, char *buf)
> > +{
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +       int rc;
> > +
> > +       device_lock(dev);
> > +       rc = sprintf(buf, "%s\n",
> > +                    cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
> 
> Youngest goes away if userspace explicitly knows which region it
> atomically created.
> 
> > +       device_unlock(dev);
> > +
> > +       return rc;
> > +}
> > +
> > +static ssize_t create_region_store(struct device *dev,
> > +                                  struct device_attribute *attr,
> > +                                  const char *buf, size_t len)
> > +{
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +       struct cxl_region *region;
> > +       ssize_t rc;
> > +       int val;
> > +
> > +       rc = kstrtoint(buf, 0, &val);
> > +       if (rc)
> > +               return rc;
> > +       if (val < 0 || val > 16)
> > +               return -EINVAL;
> > +
> > +       region = cxl_alloc_region(cxld, val);
> > +       if (IS_ERR(region))
> > +               return PTR_ERR(region);
> > +
> > +       rc = cxl_add_region(cxld, region);
> > +       if (rc) {
> > +               cxl_free_region(cxld, region);
> > +               return rc;
> > +       }
> > +
> > +       cxld->youngest = region;
> > +       return len;
> > +}
> > +static DEVICE_ATTR_RW(create_region);
> > +
> > +static ssize_t delete_region_store(struct device *dev,
> > +                                  struct device_attribute *attr,
> > +                                  const char *buf, size_t len)
> > +{
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +       int rc;
> > +
> > +       rc = cxl_delete_region(cxld, buf);
> > +       if (rc)
> > +               return rc;
> > +
> > +       return len;
> > +}
> > +static DEVICE_ATTR_WO(delete_region);
> > +
> >  static struct attribute *cxl_decoder_base_attrs[] = {
> > +       &dev_attr_create_region.attr,
> > +       &dev_attr_delete_region.attr,
> >         &dev_attr_start.attr,
> >         &dev_attr_size.attr,
> >         &dev_attr_locked.attr,
> > @@ -188,7 +249,13 @@ static void cxl_decoder_release(struct device *dev)
> >  {
> >         struct cxl_decoder *cxld = to_cxl_decoder(dev);
> >         struct cxl_port *port = dev->parent ? to_cxl_port(dev->parent) : NULL;
> > +       struct cxl_region *region;
> >
> > +       list_for_each_entry(region, &cxld->regions, list)
> > +               cxl_delete_region(cxld, dev_name(&region->dev));
> 
> The decoder already has a list of child devices, no need to duplicate
> that, i.e. do something like:
> 
> device_for_each_child(&cxld->dev, cxld, cxl_delete_region);
> 
> > +
> > +       dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
> > +                     "Lost track of a region");
> >         if (port)
> >                 ida_free(&port->decoder_ida, cxld->id);
> >         else
> > @@ -516,8 +583,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
> >                 .interleave_ways = interleave_ways,
> >                 .interleave_granularity = interleave_granularity,
> >                 .target_type = type,
> > +               .regions = LIST_HEAD_INIT(cxld->regions),
> 
> This goes away...
> 
> >         };
> >
> > +       ida_init(&cxld->region_ida);
> > +
> >         /* handle implied target_list */
> >         if (port)
> >                 if (interleave_ways == 1)
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 5e5862e4d6af..eb1a17103e5d 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -6,6 +6,7 @@
> >
> >  #include <cxl.h>
> >  #include <mem.h>
> > +#include <region.h>
> >
> >  extern const struct device_type cxl_nvdimm_bridge_type;
> >  extern const struct device_type cxl_nvdimm_type;
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > new file mode 100644
> > index 000000000000..caa34f59a62d
> > --- /dev/null
> > +++ b/drivers/cxl/core/region.c
> > @@ -0,0 +1,158 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/idr.h>
> > +#include "core.h"
> > +
> > +/**
> > + * DOC: cxl core region
> > + *
> > + * Regions are managed through the Linux device model. Each region instance is a
> > + * unique struct device. CXL core provides functionality to create, destroy, and
> > + * configure regions. This is all implemented here. Binding a region
> > + * (programming the hardware) is handled by a separate region driver.
> > + */
> > +
> > +static void cxl_region_release(struct device *dev);
> > +
> > +static const struct device_type cxl_region_type = {
> > +       .name = "cxl_region",
> > +       .release = cxl_region_release,
> > +};
> > +
> > +bool is_cxl_region(struct device *dev)
> > +{
> > +       return dev->type == &cxl_region_type;
> > +}
> > +EXPORT_SYMBOL_GPL(is_cxl_region);
> > +
> > +struct cxl_region *to_cxl_region(struct device *dev)
> > +{
> > +       if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > +                         "not a cxl_region device\n"))
> > +               return NULL;
> > +
> > +       return container_of(dev, struct cxl_region, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(to_cxl_region);
> > +
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > +       ida_free(&cxld->region_ida, region->id);
> > +       kfree(region);
> > +}
> > +
> > +static void cxl_region_release(struct device *dev)
> > +{
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > +       struct cxl_region *region = to_cxl_region(dev);
> > +
> > +       cxl_free_region(cxld, region);
> > +}
> > +
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> > +                                   int interleave_ways)
> > +{
> > +       struct cxl_region *region;
> > +       int rc;
> > +
> > +       region = kzalloc(struct_size(region, targets, interleave_ways),
> > +                        GFP_KERNEL);
> > +       if (!region)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       region->eniw = interleave_ways;
> > +
> > +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > +       if (rc < 0) {
> > +               dev_err(&cxld->dev, "Couldn't get a new id\n");
> > +               kfree(region);
> > +               return ERR_PTR(rc);
> > +       }
> > +       region->id = rc;
> > +
> > +       return region;
> > +}
> > +
> > +/**
> > + * cxl_add_region - Adds a region to a decoder
> > + * @cxld: Parent decoder.
> > + * @region: Region to be added to the decoder.
> > + *
> > + * This is the second step of region initialization. Regions exist within an
> > + * address space which is mapped by a @cxld, and that @cxld enforces constraints
> > + * upon the region as it is configured. Regions may be added to a @cxld but not
> > + * activated and therefore it is possible to have more regions in a @cxld than
> > + * there are interleave ways in the @cxld. Regions exist only for persistent
> > + * capacities.
> > + *
> > + * Return: zero if the region was added to the @cxld, else returns negative
> > + * error code.
> > + */
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > +       struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +       struct device *dev = &region->dev;
> > +       int rc;
> > +
> > +       device_initialize(dev);
> > +       dev->parent = &cxld->dev;
> > +       device_set_pm_not_required(dev);
> > +       dev->bus = &cxl_bus_type;
> > +       dev->type = &cxl_region_type;
> > +       rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
> > +       if (rc)
> > +               goto err;
> > +
> > +       rc = device_add(dev);
> > +       if (rc)
> > +               goto err;
> > +
> > +       dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
> > +
> > +       return 0;
> > +
> > +err:
> > +       put_device(dev);
> > +       return rc;
> > +}
> > +
> > +static struct cxl_region *
> > +cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
> > +{
> > +       struct device *region_dev;
> > +
> > +       region_dev = device_find_child_by_name(&cxld->dev, name);
> > +       if (!region_dev)
> > +               return ERR_PTR(-ENOENT);
> > +
> > +       return to_cxl_region(region_dev);
> > +}
> > +
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
> > +{
> > +       struct cxl_region *region;
> > +
> > +       device_lock(&cxld->dev);
> > +
> > +       region = cxl_find_region_by_name(cxld, region_name);
> > +       if (IS_ERR(region)) {
> > +               device_unlock(&cxld->dev);
> > +               return PTR_ERR(region);
> > +       }
> > +
> > +       dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
> > +               dev_name(&region->dev), dev_name(&cxld->dev));
> > +
> > +       cmpxchg(&cxld->youngest, region, NULL);
> > +
> > +       device_unregister(&region->dev);
> > +       device_unlock(&cxld->dev);
> > +
> > +       put_device(&region->dev);
> 
> What get_device() does this pair with?
> 
> > +
> > +       return 0;
> > +}
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index dcf2d1a59271..448abc5c7918 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -191,6 +191,9 @@ enum cxl_decoder_type {
> >   * @interleave_granularity: data stride per dport
> >   * @target_type: accelerator vs expander (type2 vs type3) selector
> >   * @flags: memory type capabilities and locking
> > + * @region_ida: allocator for region ids.
> > + * @regions: List of regions mapped (may be disabled) by this decoder.
> > + * @youngest: Last region created for this decoder.
> >   * @target: active ordered target list in current decoder configuration
> >   */
> >  struct cxl_decoder {
> > @@ -201,6 +204,9 @@ struct cxl_decoder {
> >         int interleave_granularity;
> >         enum cxl_decoder_type target_type;
> >         unsigned long flags;
> > +       struct ida region_ida;
> > +       struct list_head regions;
> > +       struct cxl_region *youngest;
> >         struct cxl_dport *target[];
> >  };
> >
> > @@ -263,6 +269,14 @@ struct cxl_dport {
> >         struct list_head list;
> >  };
> >
> > +bool is_cxl_region(struct device *dev);
> > +struct cxl_region *to_cxl_region(struct device *dev);
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> > +                                   int interleave_ways);
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
> > +
> >  struct cxl_port *to_cxl_port(struct device *dev);
> >  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> >                                    resource_size_t component_reg_phys,
> > diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> > new file mode 100644
> > index 000000000000..c8ed3a8bd1e0
> > --- /dev/null
> > +++ b/drivers/cxl/region.h
> > @@ -0,0 +1,30 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright(c) 2021 Intel Corporation. */
> > +#ifndef __CXL_REGION_H__
> > +#define __CXL_REGION_H__
> > +
> > +#include <linux/uuid.h>
> > +
> > +/**
> > + * struct cxl_region - CXL region
> > + * @dev: This region's device.
> > + * @id: This regions id. Id is globally unique across all regions.
> > + * @res: Address space consumed by this region.
> > + * @requested_size: Size of the region determined from LSA or userspace.
> > + * @uuid: The UUID for this region.
> > + * @list: Node in decoders region list.
> > + * @eniw: Number of interleave ways this region is configured for.
> > + * @targets: The memory devices comprising the region.
> > + */
> > +struct cxl_region {
> > +       struct device dev;
> > +       int id;
> > +       struct resource *res;
> > +       u64 requested_size;
> 
> Why requested_size, and not size? Perhaps this instead should be a
> pointer to a 'struct cxl_region_label' where the LSA data is cached.

I'd been thinking that regions could be rounded up in size by the driver. It
could point to a region label too. I hadn't been touching any label creation yet
because I thought you're bringing that up from the other side. If there's some
pointer I should use for this, I'm game. Which one?

> 
> I think we'll want struct cxl_volatile_region and struct
> cxl_persistent_region wrapping a common struct cxl_region.
> 

Reserving the right to change my mind, I think one region struct should be fine,
it's just one needs to be serialized to the LSA, and the other is ephemeral.

> > +       uuid_t uuid;
> > +       struct list_head list;
> > +       int eniw;
> > +       struct cxl_memdev *targets[];
> > +};
> > +
> > +#endif
> > --
> > 2.32.0
> >
Dan Williams Aug. 26, 2021, 9:44 p.m. UTC | #3
On Thu, Aug 26, 2021 at 2:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 21-08-13 19:19:28, Dan Williams wrote:
> > On Fri, Jul 23, 2021 at 2:06 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > Regions are created as a child of the decoder that encompasses an
> > > address space with constraints. Regions only exist for persistent
> > > capacities.
> >
> > Maybe I am misinterpreting, but regions need to exist for volatile
> > capacities too, otherwise what's the interface for enumerating and
> > reconfiguring volatile interleaves? The only difference is one is
> > recorded in labels.
>
> You mean for hotplugged volatile regions? I thought for static volatile regions,
> BIOS will always set it up and we'll get just a chunk of address space via EFI
> memory map. What would we be reconfiguring?

While the BIOS must set up CXL memory that it advertises as
Conventional Memory to the OS, there is no requirement that the BIOS
sets up *all* volatile memory present at boot. I expect that the BIOS
might do things like setup direct host-bridge attached CXL, or
whatever the OEM expects to be present. I also expect there will be
BIOS implementations that punt on configuring after-market add-on
devices, or ones that are deeper in a switch topology. Regardless
there will be CXL regions that the BIOS sets up that consume CFMWS
space and the driver needs to account for those and provide address
translation. The mechanism for kernel services and resource management
for CXL to operate is to establish CXL regions for them.



> > Perhaps before creating regions we should write the code to enumerate
> > volatile regions that might be present since boot. We'll need that
> > anyway to determine how much CFMWS space is available for dynamic
> > creation. Once happy with the read side then proceed to the write
> > side. Thoughts?
>
> I'm fine either way. I started writing a tool to generate fake labels for this,
> but then we switched course. I think since you sent this email, we've kind of
> decided to do both at the same time.

Yeah, both at the same time works for me.

[..]
> > > +/**
> > > + * struct cxl_region - CXL region
> > > + * @dev: This region's device.
> > > + * @id: This regions id. Id is globally unique across all regions.
> > > + * @res: Address space consumed by this region.
> > > + * @requested_size: Size of the region determined from LSA or userspace.
> > > + * @uuid: The UUID for this region.
> > > + * @list: Node in decoders region list.
> > > + * @eniw: Number of interleave ways this region is configured for.
> > > + * @targets: The memory devices comprising the region.
> > > + */
> > > +struct cxl_region {
> > > +       struct device dev;
> > > +       int id;
> > > +       struct resource *res;
> > > +       u64 requested_size;
> >
> > Why requested_size, and not size? Perhaps this instead should be a
> > pointer to a 'struct cxl_region_label' where the LSA data is cached.
>
> I'd been thinking that regions could be rounded up in size by the driver. It
> could point to a region label too. I hadn't been touching any label creation yet
> because I thought you're bringing that up from the other side. If there's some
> pointer I should use for this, I'm game. Which one?

'struct cxl_region_label' is currently in drivers/nvdimm/label.h it
would need to be moved to common location in include/. I am roughly
expecting that drivers/cxl/ calls into drivers/nvdimm/ to manipulate
the label area.

>
> >
> > I think we'll want struct cxl_volatile_region and struct
> > cxl_persistent_region wrapping a common struct cxl_region.
> >
>
> Reserving the right to change my mind, I think one region struct should be fine,
> it's just one needs to be serialized to the LSA, and the other is ephemeral.

It can be like nvdimm nd_region's where it's one data-structure object
with varying 'struct device_type' types.
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 0b6a2e6e8fbb..115a25d2899d 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -127,3 +127,24 @@  Description:
 		memory (type-3). The 'target_type' attribute indicates the
 		current setting which may dynamically change based on what
 		memory regions are activated in this decode hierarchy.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/create_region
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Creates a new CXL region of N interleaved ways. Writing a value
+		of '2' will create a new uninitialized region with 2x interleave
+		that will be mapped by the CXL decoderX.Y. Reading from this
+		node will return the last created region. Regions must be
+		subsequently configured and bound to a region driver before they
+		can be used.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Deletes the named region. A region must be unbound from the
+		region driver before being deleted. The attributes expects a
+		region in the form "regionX.Y:Z".
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index 46847d8c70a0..96a1f8be7940 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -45,6 +45,17 @@  CXL Core
 .. kernel-doc:: drivers/cxl/core/regs.c
    :internal:
 
+CXL Regions
+-----------
+.. kernel-doc:: drivers/cxl/region.h
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :doc: cxl core region
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :identifiers:
+
 External Interfaces
 ===================
 
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index d1aaabc940f3..d19d22a19966 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -4,6 +4,7 @@  obj-$(CONFIG_CXL_MEM) += cxl_pci.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
 obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
 
+cxl_acpi-y := acpi.o
 cxl_pci-y := pci.o
 cxl_acpi-y := acpi.o
 cxl_pmem-y := pmem.o
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 1503d8bf5e9a..4d034900e22c 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -2,4 +2,4 @@ 
 obj-$(CONFIG_CXL_BUS) += cxl_core.o
 
 ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL -I./drivers/cxl
-cxl_core-y := bus.o memdev.o pmem.o regs.o
+cxl_core-y := bus.o memdev.o pmem.o region.o regs.o
diff --git a/drivers/cxl/core/bus.c b/drivers/cxl/core/bus.c
index e2166f43aefc..3b2bcc091523 100644
--- a/drivers/cxl/core/bus.c
+++ b/drivers/cxl/core/bus.c
@@ -131,7 +131,68 @@  static ssize_t target_list_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(target_list);
 
+static ssize_t create_region_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	int rc;
+
+	device_lock(dev);
+	rc = sprintf(buf, "%s\n",
+		     cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
+	device_unlock(dev);
+
+	return rc;
+}
+
+static ssize_t create_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_region *region;
+	ssize_t rc;
+	int val;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		return rc;
+	if (val < 0 || val > 16)
+		return -EINVAL;
+
+	region = cxl_alloc_region(cxld, val);
+	if (IS_ERR(region))
+		return PTR_ERR(region);
+
+	rc = cxl_add_region(cxld, region);
+	if (rc) {
+		cxl_free_region(cxld, region);
+		return rc;
+	}
+
+	cxld->youngest = region;
+	return len;
+}
+static DEVICE_ATTR_RW(create_region);
+
+static ssize_t delete_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	int rc;
+
+	rc = cxl_delete_region(cxld, buf);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(delete_region);
+
 static struct attribute *cxl_decoder_base_attrs[] = {
+	&dev_attr_create_region.attr,
+	&dev_attr_delete_region.attr,
 	&dev_attr_start.attr,
 	&dev_attr_size.attr,
 	&dev_attr_locked.attr,
@@ -188,7 +249,13 @@  static void cxl_decoder_release(struct device *dev)
 {
 	struct cxl_decoder *cxld = to_cxl_decoder(dev);
 	struct cxl_port *port = dev->parent ? to_cxl_port(dev->parent) : NULL;
+	struct cxl_region *region;
 
+	list_for_each_entry(region, &cxld->regions, list)
+		cxl_delete_region(cxld, dev_name(&region->dev));
+
+	dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
+		      "Lost track of a region");
 	if (port)
 		ida_free(&port->decoder_ida, cxld->id);
 	else
@@ -516,8 +583,11 @@  cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
 		.interleave_ways = interleave_ways,
 		.interleave_granularity = interleave_granularity,
 		.target_type = type,
+		.regions = LIST_HEAD_INIT(cxld->regions),
 	};
 
+	ida_init(&cxld->region_ida);
+
 	/* handle implied target_list */
 	if (port)
 		if (interleave_ways == 1)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 5e5862e4d6af..eb1a17103e5d 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -6,6 +6,7 @@ 
 
 #include <cxl.h>
 #include <mem.h>
+#include <region.h>
 
 extern const struct device_type cxl_nvdimm_bridge_type;
 extern const struct device_type cxl_nvdimm_type;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
new file mode 100644
index 000000000000..caa34f59a62d
--- /dev/null
+++ b/drivers/cxl/core/region.c
@@ -0,0 +1,158 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include "core.h"
+
+/**
+ * DOC: cxl core region
+ *
+ * Regions are managed through the Linux device model. Each region instance is a
+ * unique struct device. CXL core provides functionality to create, destroy, and
+ * configure regions. This is all implemented here. Binding a region
+ * (programming the hardware) is handled by a separate region driver.
+ */
+
+static void cxl_region_release(struct device *dev);
+
+static const struct device_type cxl_region_type = {
+	.name = "cxl_region",
+	.release = cxl_region_release,
+};
+
+bool is_cxl_region(struct device *dev)
+{
+	return dev->type == &cxl_region_type;
+}
+EXPORT_SYMBOL_GPL(is_cxl_region);
+
+struct cxl_region *to_cxl_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
+			  "not a cxl_region device\n"))
+		return NULL;
+
+	return container_of(dev, struct cxl_region, dev);
+}
+EXPORT_SYMBOL_GPL(to_cxl_region);
+
+void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
+{
+	ida_free(&cxld->region_ida, region->id);
+	kfree(region);
+}
+
+static void cxl_region_release(struct device *dev)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
+	struct cxl_region *region = to_cxl_region(dev);
+
+	cxl_free_region(cxld, region);
+}
+
+struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
+				    int interleave_ways)
+{
+	struct cxl_region *region;
+	int rc;
+
+	region = kzalloc(struct_size(region, targets, interleave_ways),
+			 GFP_KERNEL);
+	if (!region)
+		return ERR_PTR(-ENOMEM);
+
+	region->eniw = interleave_ways;
+
+	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
+	if (rc < 0) {
+		dev_err(&cxld->dev, "Couldn't get a new id\n");
+		kfree(region);
+		return ERR_PTR(rc);
+	}
+	region->id = rc;
+
+	return region;
+}
+
+/**
+ * cxl_add_region - Adds a region to a decoder
+ * @cxld: Parent decoder.
+ * @region: Region to be added to the decoder.
+ *
+ * This is the second step of region initialization. Regions exist within an
+ * address space which is mapped by a @cxld, and that @cxld enforces constraints
+ * upon the region as it is configured. Regions may be added to a @cxld but not
+ * activated and therefore it is possible to have more regions in a @cxld than
+ * there are interleave ways in the @cxld. Regions exist only for persistent
+ * capacities.
+ *
+ * Return: zero if the region was added to the @cxld, else returns negative
+ * error code.
+ */
+int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	struct device *dev = &region->dev;
+	int rc;
+
+	device_initialize(dev);
+	dev->parent = &cxld->dev;
+	device_set_pm_not_required(dev);
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_region_type;
+	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
+
+	return 0;
+
+err:
+	put_device(dev);
+	return rc;
+}
+
+static struct cxl_region *
+cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
+{
+	struct device *region_dev;
+
+	region_dev = device_find_child_by_name(&cxld->dev, name);
+	if (!region_dev)
+		return ERR_PTR(-ENOENT);
+
+	return to_cxl_region(region_dev);
+}
+
+int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
+{
+	struct cxl_region *region;
+
+	device_lock(&cxld->dev);
+
+	region = cxl_find_region_by_name(cxld, region_name);
+	if (IS_ERR(region)) {
+		device_unlock(&cxld->dev);
+		return PTR_ERR(region);
+	}
+
+	dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
+		dev_name(&region->dev), dev_name(&cxld->dev));
+
+	cmpxchg(&cxld->youngest, region, NULL);
+
+	device_unregister(&region->dev);
+	device_unlock(&cxld->dev);
+
+	put_device(&region->dev);
+
+	return 0;
+}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index dcf2d1a59271..448abc5c7918 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -191,6 +191,9 @@  enum cxl_decoder_type {
  * @interleave_granularity: data stride per dport
  * @target_type: accelerator vs expander (type2 vs type3) selector
  * @flags: memory type capabilities and locking
+ * @region_ida: allocator for region ids.
+ * @regions: List of regions mapped (may be disabled) by this decoder.
+ * @youngest: Last region created for this decoder.
  * @target: active ordered target list in current decoder configuration
  */
 struct cxl_decoder {
@@ -201,6 +204,9 @@  struct cxl_decoder {
 	int interleave_granularity;
 	enum cxl_decoder_type target_type;
 	unsigned long flags;
+	struct ida region_ida;
+	struct list_head regions;
+	struct cxl_region *youngest;
 	struct cxl_dport *target[];
 };
 
@@ -263,6 +269,14 @@  struct cxl_dport {
 	struct list_head list;
 };
 
+bool is_cxl_region(struct device *dev);
+struct cxl_region *to_cxl_region(struct device *dev);
+struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
+				    int interleave_ways);
+void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
+int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
+int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
+
 struct cxl_port *to_cxl_port(struct device *dev);
 struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 				   resource_size_t component_reg_phys,
diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
new file mode 100644
index 000000000000..c8ed3a8bd1e0
--- /dev/null
+++ b/drivers/cxl/region.h
@@ -0,0 +1,30 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2021 Intel Corporation. */
+#ifndef __CXL_REGION_H__
+#define __CXL_REGION_H__
+
+#include <linux/uuid.h>
+
+/**
+ * struct cxl_region - CXL region
+ * @dev: This region's device.
+ * @id: This regions id. Id is globally unique across all regions.
+ * @res: Address space consumed by this region.
+ * @requested_size: Size of the region determined from LSA or userspace.
+ * @uuid: The UUID for this region.
+ * @list: Node in decoders region list.
+ * @eniw: Number of interleave ways this region is configured for.
+ * @targets: The memory devices comprising the region.
+ */
+struct cxl_region {
+	struct device dev;
+	int id;
+	struct resource *res;
+	u64 requested_size;
+	uuid_t uuid;
+	struct list_head list;
+	int eniw;
+	struct cxl_memdev *targets[];
+};
+
+#endif