Message ID | 165603883814.551046.17226119386543525679.stgit@dwillia2-xfh (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | CXL PMEM Region Provisioning | expand |
On Thu, 23 Jun 2022 19:47:18 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > The region provisioning flow will roughly follow a sequence of: > > 1/ Allocate DPA to a set of decoders > > 2/ Allocate HPA to a region > > 3/ Associate decoders with a region and validate that the DPA allocations > and topologies match the parameters of the region. > > For now, this change (step 1) arranges for DPA capacity to be allocated > and deleted from non-committed decoders based on the decoder's mode / > partition selection. Capacity is allocated from the lowest DPA in the > partition and any 'pmem' allocation blocks out all remaining ram > capacity in its 'skip' setting. DPA allocations are enforced in decoder > instance order. I.e. decoder N + 1 always starts at a higher DPA than > instance N, and deleting allocations must proceed from the > highest-instance allocated decoder to the lowest. > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> The error value setting in here might save a few lines, but to me it is less readable than setting rc in each error path. > --- > Documentation/ABI/testing/sysfs-bus-cxl | 37 +++++++ > drivers/cxl/core/core.h | 7 + > drivers/cxl/core/hdm.c | 160 +++++++++++++++++++++++++++++++ > drivers/cxl/core/port.c | 73 ++++++++++++++ > 4 files changed, 275 insertions(+), 2 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl > index 091459216e11..85844f9bc00b 100644 > --- a/Documentation/ABI/testing/sysfs-bus-cxl > +++ b/Documentation/ABI/testing/sysfs-bus-cxl > @@ -171,7 +171,7 @@ Date: May, 2022 > KernelVersion: v5.20 > Contact: linux-cxl@vger.kernel.org > Description: > - (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > translates from a host physical address range, to a device local > address range. Device-local address ranges are further split > into a 'ram' (volatile memory) range and 'pmem' (persistent > @@ -180,3 +180,38 @@ Description: > when a decoder straddles the volatile/persistent partition > boundary, and 'none' indicates the decoder is not actively > decoding, or no DPA allocation policy has been set. > + > + 'mode' can be written, when the decoder is in the 'disabled' > + state, with either 'ram' or 'pmem' to set the boundaries for the > + next allocation. > + As before, documentation above this in the file only uses single line break between entries. > + > +What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource > +Date: May, 2022 > +KernelVersion: v5.20 > +Contact: linux-cxl@vger.kernel.org > +Description: > + (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint", > + and its 'dpa_size' attribute is non-zero, this attribute > + indicates the device physical address (DPA) base address of the > + allocation. Why _resource rather than _base or _start? > + > + > +What: /sys/bus/cxl/devices/decoderX.Y/dpa_size > +Date: May, 2022 > +KernelVersion: v5.20 > +Contact: linux-cxl@vger.kernel.org > +Description: > + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > + translates from a host physical address range, to a device local > + address range. The range, base address plus length in bytes, of > + DPA allocated to this decoder is conveyed in these 2 attributes. > + Allocations can be mutated as long as the decoder is in the > + disabled state. A write to 'size' releases the previous DPA 'dpa_size' ? > + allocation and then attempts to allocate from the free capacity > + in the device partition referred to by 'decoderX.Y/mode'. > + Allocate and free requests can only be performed on the highest > + instance number disabled decoder with non-zero size. I.e. > + allocations are enforced to occur in increasing 'decoderX.Y/id' > + order and frees are enforced to occur in decreasing > + 'decoderX.Y/id' order. > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h > index 1a50c0fc399c..47cf0c286fc3 100644 > --- a/drivers/cxl/core/core.h > +++ b/drivers/cxl/core/core.h > @@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s); > void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > resource_size_t length); > > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, > + enum cxl_decoder_mode mode); > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled); > + > int cxl_memdev_init(void); > void cxl_memdev_exit(void); > void cxl_mbox_init(void); > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index 8805afe63ebf..ceb4c28abc1b 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); > } > > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) > +{ > + resource_size_t size = 0; > + > + down_read(&cxl_dpa_rwsem); > + if (cxled->dpa_res) > + size = resource_size(cxled->dpa_res); > + up_read(&cxl_dpa_rwsem); > + > + return size; > +} > + > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) Instinct would be to expect this to return the resource, not the start. Rename? > +{ > + resource_size_t base = -1; > + > + down_read(&cxl_dpa_rwsem); > + if (cxled->dpa_res) > + base = cxled->dpa_res->start; > + up_read(&cxl_dpa_rwsem); > + > + return base; > +} > + > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) > +{ > + int rc = -EBUSY; > + struct device *dev = &cxled->cxld.dev; > + struct cxl_port *port = to_cxl_port(dev->parent); > + > + down_write(&cxl_dpa_rwsem); > + if (!cxled->dpa_res) { > + rc = 0; > + goto out; > + } > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { > + dev_dbg(dev, "decoder enabled\n"); I'd prefer explicit setting of rc = -EBUSY in the two 'error' paths to make it really clear when looking at these that they are treated as errors. > + goto out; > + } > + if (cxled->cxld.id != port->dpa_end) { > + dev_dbg(dev, "expected decoder%d.%d\n", port->id, > + port->dpa_end); > + goto out; > + } > + __cxl_dpa_release(cxled, true); > + rc = 0; > +out: > + up_write(&cxl_dpa_rwsem); > + return rc; > +} > + > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, > + enum cxl_decoder_mode mode) > +{ > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > + struct device *dev = &cxled->cxld.dev; > + int rc = -EBUSY; As above, I'd prefer seeing error set in each error path rther than it being set in a few locations and having to go look for which value it currently has. To me having the error code next to the condition is much easier to follow. > + > + switch (mode) { > + case CXL_DECODER_RAM: > + case CXL_DECODER_PMEM: > + break; > + default: > + dev_dbg(dev, "unsupported mode: %d\n", mode); > + return -EINVAL; > + } > + > + down_write(&cxl_dpa_rwsem); > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) > + goto out; > + /* > + * Only allow modes that are supported by the current partition > + * configuration > + */ > + rc = -ENXIO; > + if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { > + dev_dbg(dev, "no available pmem capacity\n"); > + goto out; > + } > + if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { > + dev_dbg(dev, "no available ram capacity\n"); > + goto out; > + } > + > + cxled->mode = mode; > + rc = 0; > +out: > + up_write(&cxl_dpa_rwsem); > + > + return rc; > +} > + > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) > +{ > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > + resource_size_t free_ram_start, free_pmem_start; > + struct cxl_port *port = cxled_to_port(cxled); > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > + struct device *dev = &cxled->cxld.dev; > + resource_size_t start, avail, skip; > + struct resource *p, *last; > + int rc = -EBUSY; > + > + down_write(&cxl_dpa_rwsem); > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { > + dev_dbg(dev, "decoder enabled\n"); > + goto out; -EBUSY only used in this path, so clearer to me to push that setting down to in this error path. > + } > + > + for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling) > + last = p; > + if (last) > + free_ram_start = last->end + 1; > + else > + free_ram_start = cxlds->ram_res.start; > + > + for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling) > + last = p; > + if (last) > + free_pmem_start = last->end + 1; > + else > + free_pmem_start = cxlds->pmem_res.start; > + > + if (cxled->mode == CXL_DECODER_RAM) { > + start = free_ram_start; > + avail = cxlds->ram_res.end - start + 1; > + skip = 0; > + } else if (cxled->mode == CXL_DECODER_PMEM) { > + resource_size_t skip_start, skip_end; > + > + start = free_pmem_start; > + avail = cxlds->pmem_res.end - start + 1; > + skip_start = free_ram_start; > + skip_end = start - 1; > + skip = skip_end - skip_start + 1; > + } else { > + dev_dbg(dev, "mode not set\n"); > + rc = -EINVAL; > + goto out; > + } > + > + if (size > avail) { > + dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, > + cxled->mode == CXL_DECODER_RAM ? "ram" : "pmem", > + &avail); > + rc = -ENOSPC; > + goto out; > + } > + > + rc = __cxl_dpa_reserve(cxled, start, size, skip); > +out: > + up_write(&cxl_dpa_rwsem); > + > + if (rc) > + return rc; > + > + return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); > +} > + > static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > int *target_map, void __iomem *hdm, int which, > u64 *dpa_base) > >
Jonathan Cameron wrote: > On Thu, 23 Jun 2022 19:47:18 -0700 > Dan Williams <dan.j.williams@intel.com> wrote: > > > The region provisioning flow will roughly follow a sequence of: > > > > 1/ Allocate DPA to a set of decoders > > > > 2/ Allocate HPA to a region > > > > 3/ Associate decoders with a region and validate that the DPA allocations > > and topologies match the parameters of the region. > > > > For now, this change (step 1) arranges for DPA capacity to be allocated > > and deleted from non-committed decoders based on the decoder's mode / > > partition selection. Capacity is allocated from the lowest DPA in the > > partition and any 'pmem' allocation blocks out all remaining ram > > capacity in its 'skip' setting. DPA allocations are enforced in decoder > > instance order. I.e. decoder N + 1 always starts at a higher DPA than > > instance N, and deleting allocations must proceed from the > > highest-instance allocated decoder to the lowest. > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > The error value setting in here might save a few lines, but to me it > is less readable than setting rc in each error path. > > > --- > > Documentation/ABI/testing/sysfs-bus-cxl | 37 +++++++ > > drivers/cxl/core/core.h | 7 + > > drivers/cxl/core/hdm.c | 160 +++++++++++++++++++++++++++++++ > > drivers/cxl/core/port.c | 73 ++++++++++++++ > > 4 files changed, 275 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl > > index 091459216e11..85844f9bc00b 100644 > > --- a/Documentation/ABI/testing/sysfs-bus-cxl > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl > > @@ -171,7 +171,7 @@ Date: May, 2022 > > KernelVersion: v5.20 > > Contact: linux-cxl@vger.kernel.org > > Description: > > - (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > > + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > > translates from a host physical address range, to a device local > > address range. Device-local address ranges are further split > > into a 'ram' (volatile memory) range and 'pmem' (persistent > > @@ -180,3 +180,38 @@ Description: > > when a decoder straddles the volatile/persistent partition > > boundary, and 'none' indicates the decoder is not actively > > decoding, or no DPA allocation policy has been set. > > + > > + 'mode' can be written, when the decoder is in the 'disabled' > > + state, with either 'ram' or 'pmem' to set the boundaries for the > > + next allocation. > > + > > As before, documentation above this in the file only uses single line break between > entries. Yeah, I'll go fix those up as a precursor patch. > > > + > > +What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource > > +Date: May, 2022 > > +KernelVersion: v5.20 > > +Contact: linux-cxl@vger.kernel.org > > +Description: > > + (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint", > > + and its 'dpa_size' attribute is non-zero, this attribute > > + indicates the device physical address (DPA) base address of the > > + allocation. > > Why _resource rather than _base or _start? To mimic PCI and NVDIMM sysfs that calls it a 'resource' address. > > > + > > + > > +What: /sys/bus/cxl/devices/decoderX.Y/dpa_size > > +Date: May, 2022 > > +KernelVersion: v5.20 > > +Contact: linux-cxl@vger.kernel.org > > +Description: > > + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > > + translates from a host physical address range, to a device local > > + address range. The range, base address plus length in bytes, of > > + DPA allocated to this decoder is conveyed in these 2 attributes. > > + Allocations can be mutated as long as the decoder is in the > > + disabled state. A write to 'size' releases the previous DPA > > 'dpa_size' ? Yes. > > > + allocation and then attempts to allocate from the free capacity > > + in the device partition referred to by 'decoderX.Y/mode'. > > + Allocate and free requests can only be performed on the highest > > + instance number disabled decoder with non-zero size. I.e. > > + allocations are enforced to occur in increasing 'decoderX.Y/id' > > + order and frees are enforced to occur in decreasing > > + 'decoderX.Y/id' order. > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h > > index 1a50c0fc399c..47cf0c286fc3 100644 > > --- a/drivers/cxl/core/core.h > > +++ b/drivers/cxl/core/core.h > > @@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s); > > void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > > resource_size_t length); > > > > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, > > + enum cxl_decoder_mode mode); > > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); > > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); > > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); > > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled); > > + > > int cxl_memdev_init(void); > > void cxl_memdev_exit(void); > > void cxl_mbox_init(void); > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > > index 8805afe63ebf..ceb4c28abc1b 100644 > > --- a/drivers/cxl/core/hdm.c > > +++ b/drivers/cxl/core/hdm.c > > @@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > > return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); > > } > > > > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) > > +{ > > + resource_size_t size = 0; > > + > > + down_read(&cxl_dpa_rwsem); > > + if (cxled->dpa_res) > > + size = resource_size(cxled->dpa_res); > > + up_read(&cxl_dpa_rwsem); > > + > > + return size; > > +} > > + > > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) > > Instinct would be to expect this to return the resource, not the start. > Rename? I can add _start, but this is only servicing the dpa_resource_show() sysfs operation, and there 'resource' as a the base address has ABI history. > > > > +{ > > + resource_size_t base = -1; > > + > > + down_read(&cxl_dpa_rwsem); > > + if (cxled->dpa_res) > > + base = cxled->dpa_res->start; > > + up_read(&cxl_dpa_rwsem); > > + > > + return base; > > +} > > + > > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) > > +{ > > + int rc = -EBUSY; > > + struct device *dev = &cxled->cxld.dev; > > + struct cxl_port *port = to_cxl_port(dev->parent); > > + > > + down_write(&cxl_dpa_rwsem); > > + if (!cxled->dpa_res) { > > + rc = 0; > > + goto out; > > + } > > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { > > + dev_dbg(dev, "decoder enabled\n"); > > I'd prefer explicit setting of rc = -EBUSY in the two > 'error' paths to make it really clear when looking at these > that they are treated as errors. Ok. > > > + goto out; > > + } > > + if (cxled->cxld.id != port->dpa_end) { > > + dev_dbg(dev, "expected decoder%d.%d\n", port->id, > > + port->dpa_end); > > + goto out; > > + } > > + __cxl_dpa_release(cxled, true); > > + rc = 0; > > +out: > > + up_write(&cxl_dpa_rwsem); > > + return rc; > > +} > > + > > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, > > + enum cxl_decoder_mode mode) > > +{ > > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > > + struct device *dev = &cxled->cxld.dev; > > + int rc = -EBUSY; > > As above, I'd prefer seeing error set in each error path rther > than it being set in a few locations and having to go look > for which value it currently has. To me having the > error code next to the condition is much easier to follow. Fair enough. > > > + > > + switch (mode) { > > + case CXL_DECODER_RAM: > > + case CXL_DECODER_PMEM: > > + break; > > + default: > > + dev_dbg(dev, "unsupported mode: %d\n", mode); > > + return -EINVAL; > > + } > > + > > + down_write(&cxl_dpa_rwsem); > > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) > > + goto out; > > + /* > > + * Only allow modes that are supported by the current partition > > + * configuration > > + */ > > + rc = -ENXIO; > > + if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { > > + dev_dbg(dev, "no available pmem capacity\n"); > > + goto out; > > + } > > + if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { > > + dev_dbg(dev, "no available ram capacity\n"); > > + goto out; > > + } > > + > > + cxled->mode = mode; > > + rc = 0; > > +out: > > + up_write(&cxl_dpa_rwsem); > > + > > + return rc; > > +} > > + > > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) > > +{ > > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > + resource_size_t free_ram_start, free_pmem_start; > > + struct cxl_port *port = cxled_to_port(cxled); > > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > > + struct device *dev = &cxled->cxld.dev; > > + resource_size_t start, avail, skip; > > + struct resource *p, *last; > > + int rc = -EBUSY; > > + > > + down_write(&cxl_dpa_rwsem); > > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { > > + dev_dbg(dev, "decoder enabled\n"); > > + goto out; > > > -EBUSY only used in this path, so clearer to me to push that setting down > to in this error path. Ok. Folded in these changes: diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 85844f9bc00b..3fa6da73751e 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -207,7 +207,7 @@ Description: address range. The range, base address plus length in bytes, of DPA allocated to this decoder is conveyed in these 2 attributes. Allocations can be mutated as long as the decoder is in the - disabled state. A write to 'size' releases the previous DPA + disabled state. A write to 'dpa_size' releases the previous DPA allocation and then attempts to allocate from the free capacity in the device partition referred to by 'decoderX.Y/mode'. Allocate and free requests can only be performed on the highest diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 47cf0c286fc3..65bcaecec405 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -22,7 +22,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); -resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled); +resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); int cxl_memdev_init(void); void cxl_memdev_exit(void); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 0c1ff3c0142f..e9281557781d 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -270,7 +270,7 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) return size; } -resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) +resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) { resource_size_t base = -1; @@ -284,9 +284,9 @@ resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) { - int rc = -EBUSY; + struct cxl_port *port = cxled_to_port(cxled); struct device *dev = &cxled->cxld.dev; - struct cxl_port *port = to_cxl_port(dev->parent); + int rc; down_write(&cxl_dpa_rwsem); if (!cxled->dpa_res) { @@ -295,11 +295,13 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) } if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { dev_dbg(dev, "decoder enabled\n"); + rc = -EBUSY; goto out; } - if (cxled->cxld.id != port->dpa_end) { + if (cxled->cxld.id != port->hdm_end) { dev_dbg(dev, "expected decoder%d.%d\n", port->id, - port->dpa_end); + port->hdm_end); + rc = -EBUSY; goto out; } devm_cxl_dpa_release(cxled); @@ -315,7 +317,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &cxled->cxld.dev; - int rc = -EBUSY; + int rc; switch (mode) { case CXL_DECODER_RAM: @@ -327,19 +329,23 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, } down_write(&cxl_dpa_rwsem); - if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + rc = -EBUSY; goto out; + } + /* * Only allow modes that are supported by the current partition * configuration */ - rc = -ENXIO; if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { dev_dbg(dev, "no available pmem capacity\n"); + rc = -ENXIO; goto out; } if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { dev_dbg(dev, "no available ram capacity\n"); + rc = -ENXIO; goto out; } @@ -360,11 +366,12 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) struct device *dev = &cxled->cxld.dev; resource_size_t start, avail, skip; struct resource *p, *last; - int rc = -EBUSY; + int rc; down_write(&cxl_dpa_rwsem); if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { dev_dbg(dev, "decoder enabled\n"); + rc = -EBUSY; goto out; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 130989846cce..feed86737202 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -215,7 +215,7 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at char *buf) { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - u64 base = cxl_dpa_resource(cxled); + u64 base = cxl_dpa_resource_start(cxled); return sysfs_emit(buf, "%#llx\n", base); }
diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 091459216e11..85844f9bc00b 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -171,7 +171,7 @@ Date: May, 2022 KernelVersion: v5.20 Contact: linux-cxl@vger.kernel.org Description: - (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it translates from a host physical address range, to a device local address range. Device-local address ranges are further split into a 'ram' (volatile memory) range and 'pmem' (persistent @@ -180,3 +180,38 @@ Description: when a decoder straddles the volatile/persistent partition boundary, and 'none' indicates the decoder is not actively decoding, or no DPA allocation policy has been set. + + 'mode' can be written, when the decoder is in the 'disabled' + state, with either 'ram' or 'pmem' to set the boundaries for the + next allocation. + + +What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint", + and its 'dpa_size' attribute is non-zero, this attribute + indicates the device physical address (DPA) base address of the + allocation. + + +What: /sys/bus/cxl/devices/decoderX.Y/dpa_size +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it + translates from a host physical address range, to a device local + address range. The range, base address plus length in bytes, of + DPA allocated to this decoder is conveyed in these 2 attributes. + Allocations can be mutated as long as the decoder is in the + disabled state. A write to 'size' releases the previous DPA + allocation and then attempts to allocate from the free capacity + in the device partition referred to by 'decoderX.Y/mode'. + Allocate and free requests can only be performed on the highest + instance number disabled decoder with non-zero size. I.e. + allocations are enforced to occur in increasing 'decoderX.Y/id' + order and frees are enforced to occur in decreasing + 'decoderX.Y/id' order. diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 1a50c0fc399c..47cf0c286fc3 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s); void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, resource_size_t length); +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, + enum cxl_decoder_mode mode); +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled); + int cxl_memdev_init(void); void cxl_memdev_exit(void); void cxl_mbox_init(void); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 8805afe63ebf..ceb4c28abc1b 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); } +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) +{ + resource_size_t size = 0; + + down_read(&cxl_dpa_rwsem); + if (cxled->dpa_res) + size = resource_size(cxled->dpa_res); + up_read(&cxl_dpa_rwsem); + + return size; +} + +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) +{ + resource_size_t base = -1; + + down_read(&cxl_dpa_rwsem); + if (cxled->dpa_res) + base = cxled->dpa_res->start; + up_read(&cxl_dpa_rwsem); + + return base; +} + +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) +{ + int rc = -EBUSY; + struct device *dev = &cxled->cxld.dev; + struct cxl_port *port = to_cxl_port(dev->parent); + + down_write(&cxl_dpa_rwsem); + if (!cxled->dpa_res) { + rc = 0; + goto out; + } + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + dev_dbg(dev, "decoder enabled\n"); + goto out; + } + if (cxled->cxld.id != port->dpa_end) { + dev_dbg(dev, "expected decoder%d.%d\n", port->id, + port->dpa_end); + goto out; + } + __cxl_dpa_release(cxled, true); + rc = 0; +out: + up_write(&cxl_dpa_rwsem); + return rc; +} + +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, + enum cxl_decoder_mode mode) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &cxled->cxld.dev; + int rc = -EBUSY; + + switch (mode) { + case CXL_DECODER_RAM: + case CXL_DECODER_PMEM: + break; + default: + dev_dbg(dev, "unsupported mode: %d\n", mode); + return -EINVAL; + } + + down_write(&cxl_dpa_rwsem); + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) + goto out; + /* + * Only allow modes that are supported by the current partition + * configuration + */ + rc = -ENXIO; + if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { + dev_dbg(dev, "no available pmem capacity\n"); + goto out; + } + if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { + dev_dbg(dev, "no available ram capacity\n"); + goto out; + } + + cxled->mode = mode; + rc = 0; +out: + up_write(&cxl_dpa_rwsem); + + return rc; +} + +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + resource_size_t free_ram_start, free_pmem_start; + struct cxl_port *port = cxled_to_port(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &cxled->cxld.dev; + resource_size_t start, avail, skip; + struct resource *p, *last; + int rc = -EBUSY; + + down_write(&cxl_dpa_rwsem); + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + dev_dbg(dev, "decoder enabled\n"); + goto out; + } + + for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling) + last = p; + if (last) + free_ram_start = last->end + 1; + else + free_ram_start = cxlds->ram_res.start; + + for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling) + last = p; + if (last) + free_pmem_start = last->end + 1; + else + free_pmem_start = cxlds->pmem_res.start; + + if (cxled->mode == CXL_DECODER_RAM) { + start = free_ram_start; + avail = cxlds->ram_res.end - start + 1; + skip = 0; + } else if (cxled->mode == CXL_DECODER_PMEM) { + resource_size_t skip_start, skip_end; + + start = free_pmem_start; + avail = cxlds->pmem_res.end - start + 1; + skip_start = free_ram_start; + skip_end = start - 1; + skip = skip_end - skip_start + 1; + } else { + dev_dbg(dev, "mode not set\n"); + rc = -EINVAL; + goto out; + } + + if (size > avail) { + dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, + cxled->mode == CXL_DECODER_RAM ? "ram" : "pmem", + &avail); + rc = -ENOSPC; + goto out; + } + + rc = __cxl_dpa_reserve(cxled, start, size, skip); +out: + up_write(&cxl_dpa_rwsem); + + if (rc) + return rc; + + return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); +} + static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, int *target_map, void __iomem *hdm, int which, u64 *dpa_base) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 54bf032cbcb7..08851357b364 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -188,7 +188,76 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, return sysfs_emit(buf, "mixed\n"); } } -static DEVICE_ATTR_RO(mode); + +static ssize_t mode_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + enum cxl_decoder_mode mode; + ssize_t rc; + + if (sysfs_streq(buf, "pmem")) + mode = CXL_DECODER_PMEM; + else if (sysfs_streq(buf, "ram")) + mode = CXL_DECODER_RAM; + else + return -EINVAL; + + rc = cxl_dpa_set_mode(cxled, mode); + if (rc) + return rc; + + return len; +} +static DEVICE_ATTR_RW(mode); + +static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + u64 base = cxl_dpa_resource(cxled); + + return sysfs_emit(buf, "%#llx\n", base); +} +static DEVICE_ATTR_RO(dpa_resource); + +static ssize_t dpa_size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + resource_size_t size = cxl_dpa_size(cxled); + + return sysfs_emit(buf, "%pa\n", &size); +} + +static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + unsigned long long size; + ssize_t rc; + + rc = kstrtoull(buf, 0, &size); + if (rc) + return rc; + + if (!IS_ALIGNED(size, SZ_256M)) + return -EINVAL; + + rc = cxl_dpa_free(cxled); + if (rc) + return rc; + + if (size == 0) + return len; + + rc = cxl_dpa_alloc(cxled, size); + if (rc) + return rc; + + return len; +} +static DEVICE_ATTR_RW(dpa_size); static struct attribute *cxl_decoder_base_attrs[] = { &dev_attr_start.attr, @@ -241,6 +310,8 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = { static struct attribute *cxl_decoder_endpoint_attrs[] = { &dev_attr_target_type.attr, &dev_attr_mode.attr, + &dev_attr_dpa_size.attr, + &dev_attr_dpa_resource.attr, NULL, };
The region provisioning flow will roughly follow a sequence of: 1/ Allocate DPA to a set of decoders 2/ Allocate HPA to a region 3/ Associate decoders with a region and validate that the DPA allocations and topologies match the parameters of the region. For now, this change (step 1) arranges for DPA capacity to be allocated and deleted from non-committed decoders based on the decoder's mode / partition selection. Capacity is allocated from the lowest DPA in the partition and any 'pmem' allocation blocks out all remaining ram capacity in its 'skip' setting. DPA allocations are enforced in decoder instance order. I.e. decoder N + 1 always starts at a higher DPA than instance N, and deleting allocations must proceed from the highest-instance allocated decoder to the lowest. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- Documentation/ABI/testing/sysfs-bus-cxl | 37 +++++++ drivers/cxl/core/core.h | 7 + drivers/cxl/core/hdm.c | 160 +++++++++++++++++++++++++++++++ drivers/cxl/core/port.c | 73 ++++++++++++++ 4 files changed, 275 insertions(+), 2 deletions(-)