Message ID | 173709424415.753996.10761098712604763500.stgit@dwillia2-xfh.jf.intel.com |
---|---|
State | New |
Headers | show |
Series | cxl: DPA partition metadata is a mess... | expand |
On Thu, 16 Jan 2025 22:10:44 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > The pending efforts to add CXL Accelerator (type-2) device [1], and > Dynamic Capacity (DCD) support [2], tripped on the > no-longer-fit-for-purpose design in the CXL subsystem for tracking > device-physical-address (DPA) metadata. Trip hazards include: > > - CXL Memory Devices need to consider a PMEM partition, but Accelerator > devices with CXL.mem likely do not in the common case. > > - CXL Memory Devices enumerate DPA through Memory Device mailbox > commands like Partition Info, Accelerators devices do not. > > - CXL Memory Devices that support DCD support more than 2 partitions. > Some of the driver algorithms are awkward to expand to > 2 partition > cases. > > - DPA performance data is a general capability that can be shared with > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer > suitable. > > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a > memory property, it should be phased in favor of a partition id and > the memory property comes from the partition info. > > Towards cleaning up those issues and allowing a smoother landing for the > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared > way for Memory Devices and Accelerators to initialize the DPA information > in 'struct cxl_dev_state'. > > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to > get the new data structure initialized, and cleanup some qos_class init. > Follow on patches will go further to use the new data structure to > cleanup algorithms that are better suited to loop over all possible > partitions. > > cxl_dpa_setup() follows the locking expectations of mutating the device > DPA map, and is suitable for Accelerator drivers to use. Accelerators > likely only have one hardcoded 'ram' partition to convey to the > cxl_core. > > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Alejandro Lucero <alucerop@amd.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> Hi Dan, In basic form this seems fine, but I find the nr_paritions variable usage very counter intuitive. It's just how many we configured not how many there are, potentially with 0 size (so not a partition). I'd be happier if we can avoid that by just prefilling the lot with zero size and filling in the ones we want. So zero size means doesn't exist and use an iterator where appropriate to skip the zero size ones. Without that tidied up, to me this is more confusing than the previous code. Jonathan > --- > drivers/cxl/core/cdat.c | 15 ++----- > drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++ > drivers/cxl/core/mbox.c | 86 ++++++++++++++++++------------------------ > drivers/cxl/cxlmem.h | 79 +++++++++++++++++++++++++-------------- > drivers/cxl/pci.c | 7 +++ > tools/testing/cxl/test/cxl.c | 15 ++----- > tools/testing/cxl/test/mem.c | 7 +++ > 7 files changed, 176 insertions(+), 102 deletions(-) > > diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c > index b177a488e29b..5400a421ad30 100644 > --- a/drivers/cxl/core/cdat.c > +++ b/drivers/cxl/core/cdat.c > @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, > struct device *dev = cxlds->dev; > struct dsmas_entry *dent; > unsigned long index; > - const struct resource *partition[] = { > - to_ram_res(cxlds), > - to_pmem_res(cxlds), > - }; > - struct cxl_dpa_perf *perf[] = { > - to_ram_perf(cxlds), > - to_pmem_perf(cxlds), > - }; Ok. This removes some of the concerns from previous patch. > > xa_for_each(dsmas_xa, index, dent) { > - for (int i = 0; i < ARRAY_SIZE(partition); i++) { > - const struct resource *res = partition[i]; > + for (int i = 0; i < cxlds->nr_partitions; i++) { > + struct resource *res = &cxlds->part[i].res; > struct range range = { > .start = res->start, > .end = res->end, > }; > > if (range_contains(&range, &dent->dpa_range)) > - update_perf_entry(dev, dent, perf[i]); > + update_perf_entry(dev, dent, > + &cxlds->part[i].perf); > else > dev_dbg(dev, > "no partition for dsmas dpa: %pra\n", > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index 7a85522294ad..7e1559b3ed88 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -342,6 +342,75 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > return 0; > } > > +static int add_dpa_res(struct device *dev, struct resource *parent, > + struct resource *res, resource_size_t start, > + resource_size_t size, const char *type) > +{ > + int rc; > + > + *res = (struct resource) { > + .name = type, > + .start = start, > + .end = start + size - 1, > + .flags = IORESOURCE_MEM, > + }; > + if (resource_size(res) == 0) { > + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); > + return 0; > + } > + rc = request_resource(parent, res); > + if (rc) { > + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, > + res, rc); > + return rc; > + } > + > + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); > + > + return 0; > +} > + > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > +{ > + struct device *dev = cxlds->dev; > + > + guard(rwsem_write)(&cxl_dpa_rwsem); > + > + if (cxlds->nr_partitions) > + return -EBUSY; > + > + if (!info->size || !info->nr_partitions) { > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > + cxlds->nr_partitions = 0; > + return 0; > + } > + > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); > + > + for (int i = 0; i < info->nr_partitions; i++) { > + const char *desc; > + int rc; > + > + if (i == CXL_PARTITION_RAM) > + desc = "ram"; > + else if (i == CXL_PARTITION_PMEM) > + desc = "pmem"; > + else > + desc = ""; > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, > + info->range[i].start, > + range_len(&info->range[i]), desc); > + if (rc) > + return rc; > + cxlds->nr_partitions++; I'd just initialize the rest to 0 length similar to what is happening if we have pmem only anyway. Then this nr_patitions goes away and stops being a possible source of confusion. > + } > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(cxl_dpa_setup); > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c > index 3502f1633ad2..7dca5c8c3494 100644 > --- a/drivers/cxl/core/mbox.c > +++ b/drivers/cxl/core/mbox.c > @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) > { > struct cxl_dev_state *cxlds = &mds->cxlds; > - struct resource *ram_res = to_ram_res(cxlds); > - struct resource *pmem_res = to_pmem_res(cxlds); > struct device *dev = cxlds->dev; > int rc; > > if (!cxlds->media_ready) { > - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > - *ram_res = DEFINE_RES_MEM(0, 0); > - *pmem_res = DEFINE_RES_MEM(0, 0); > + info->size = 0; > return 0; > } > > - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); > + info->size = mds->total_bytes; > > if (mds->partition_align_bytes == 0) { Obviously nothing to do with your patch as such, but maybe tidy this up by making active values == fixed values when we don't have partition control. That seems logical anyway to me and means we only end up with one lot of range setup in here. I can't immediately see any side effects of doing this. if (mds->partition_align_bytes != 0) { rc = cxl_mem_get_partition_info(mds); if (rc) return rc; } else { mds->active_volatile_bytes = mds->volatile_only_bytes; mds->active_persistent_bytes = mds->persistent_only_bytes; } info->range[CXL_PARTITION_RAM] = (struct range) { .start = 0, .end = mds->active_volatile_bytes - 1, }; info->nr_partitions++; if (!mds->active_persistent_bytes) return 0; info->range[CXL_PARTITION_PMEM] = (struct range) { .start = mds->active_volatile_bytes, .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, }; info->nr_partitions++; return 0; } > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > - mds->volatile_only_bytes, "ram"); > - if (rc) > - return rc; > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > - mds->volatile_only_bytes, > - mds->persistent_only_bytes, "pmem"); > + info->range[CXL_PARTITION_RAM] = (struct range) { > + .start = 0, > + .end = mds->volatile_only_bytes - 1, > + }; > + info->nr_partitions++; > + > + if (!mds->persistent_only_bytes) > + return 0; > + > + info->range[CXL_PARTITION_PMEM] = (struct range) { > + .start = mds->volatile_only_bytes, > + .end = mds->volatile_only_bytes + > + mds->persistent_only_bytes - 1, > + }; > + info->nr_partitions++; This nr partitions makes some sense though I'd be tempted to add a type array to info so that we can just not pass empty ones if we don't want to. Makes this code a little more complex, but not a lot and means nr->partitions becomes the ones that actually exist. > + return 0; > } > > rc = cxl_mem_get_partition_info(mds); > @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > return rc; > } > > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > - mds->active_volatile_bytes, "ram"); > - if (rc) > - return rc; > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > - mds->active_volatile_bytes, > - mds->active_persistent_bytes, "pmem"); > + info->range[CXL_PARTITION_RAM] = (struct range) { > + .start = 0, > + .end = mds->active_volatile_bytes - 1, > + }; > + info->nr_partitions++; > + > + if (!mds->active_persistent_bytes) > + return 0; > + > + info->range[CXL_PARTITION_PMEM] = (struct range) { > + .start = mds->active_volatile_bytes, > + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, > + }; > + info->nr_partitions++; > + > + return 0; > } > -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); > +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > index 78e92e24d7b5..2e728d4b7327 100644 > --- a/drivers/cxl/cxlmem.h > +++ b/drivers/cxl/cxlmem.h > @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > resource_size_t base, resource_size_t len, > resource_size_t skipped); > > +/* Well known, spec defined partition indices */ > +enum cxl_partition { > + CXL_PARTITION_RAM, > + CXL_PARTITION_PMEM, > + CXL_PARTITION_MAX, > +}; > + > +struct cxl_dpa_info { > + u64 size; > + struct range range[CXL_PARTITION_MAX]; > + int nr_partitions; > +}; blank line seems appropriate here. > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); > + > static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, > struct cxl_memdev *cxlmd) > { > @@ -408,6 +422,16 @@ struct cxl_dpa_perf { > int qos_class; > }; > > /** > * struct cxl_dev_state - The driver device state > * > @@ -423,8 +447,8 @@ struct cxl_dpa_perf { > * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) > * @media_ready: Indicate whether the device media is usable > * @dpa_res: Overall DPA resource tree for the device > - * @_pmem_res: Active Persistent memory capacity configuration > - * @_ram_res: Active Volatile memory capacity configuration > + * @part: DPA partition array > + * @nr_partitions: Number of DPA partitions This needs more. It is not the number of partitions present I think, it is the number that a particular driver is potentially interested in. > * @serial: PCIe Device Serial Number > * @type: Generic Memory Class device or Vendor Specific Memory device > * @cxl_mbox: CXL mailbox context > @@ -438,21 +462,39 @@ struct cxl_dev_state { > bool rcd; > bool media_ready; > struct resource dpa_res; > - struct resource _pmem_res; > - struct resource _ram_res; > + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; > + unsigned int nr_partitions; > u64 serial; > enum cxl_devtype type; > struct cxl_mailbox cxl_mbox; > }; > > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) > +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) > { > - return &cxlds->_ram_res; > + if (cxlds->nr_partitions > 0) > + return &cxlds->part[CXL_PARTITION_RAM].res; > + return NULL; > } > > -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > { > - return &cxlds->_pmem_res; > + if (cxlds->nr_partitions > 1) This is very confusing as nr_partitions is being used not to indicate number of partitions but whether a driver has filled in the data for them (which may well be empty). I'd rather see that as a bitmap, or a 'not set' value initialized by the core that is then replaced when they are set. > + return &cxlds->part[CXL_PARTITION_PMEM].res; > + return NULL; > +} > + > +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) > +{ > + if (cxlds->nr_partitions > 0) > + return &cxlds->part[CXL_PARTITION_RAM].perf; > + return NULL; > +} > + > +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) > +{ > + if (cxlds->nr_partitions > 1) > + return &cxlds->part[CXL_PARTITION_PMEM].perf; > + return NULL; > } > @@ -860,7 +883,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, > int cxl_dev_state_identify(struct cxl_memdev_state *mds); > int cxl_await_media_ready(struct cxl_dev_state *cxlds); > int cxl_enumerate_cmds(struct cxl_memdev_state *mds); > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info); > struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); > void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, > unsigned long *cmds); > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c > index 7f1c5061307b..ba3d48b37de3 100644 > --- a/tools/testing/cxl/test/cxl.c > +++ b/tools/testing/cxl/test/cxl.c > @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port) > struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); > struct cxl_dev_state *cxlds = cxlmd->cxlds; > struct access_coordinate ep_c[ACCESS_COORDINATE_MAX]; > - const struct resource *partition[] = { > - to_ram_res(cxlds), > - to_pmem_res(cxlds), > - }; > - struct cxl_dpa_perf *perf[] = { > - to_ram_perf(cxlds), > - to_pmem_perf(cxlds), > - }; Ok. This gets rid of some of the earlier concerns. > > if (!cxl_root) > return; > > - for (int i = 0; i < ARRAY_SIZE(partition); i++) { > - const struct resource *res = partition[i]; > + for (int i = 0; i < cxlds->nr_partitions; i++) { > + struct resource *res = &cxlds->part[i].res; > + struct cxl_dpa_perf *perf = &cxlds->part[i].perf; > struct range range = { > .start = res->start, > .end = res->end, > }; > > - dpa_perf_setup(port, &range, perf[i]); > + dpa_perf_setup(port, &range, perf); > } > > cxl_memdev_update_perf(cxlmd);
On 1/17/25 10:52, Jonathan Cameron wrote: > On Thu, 16 Jan 2025 22:10:44 -0800 > Dan Williams <dan.j.williams@intel.com> wrote: > >> The pending efforts to add CXL Accelerator (type-2) device [1], and >> Dynamic Capacity (DCD) support [2], tripped on the >> no-longer-fit-for-purpose design in the CXL subsystem for tracking >> device-physical-address (DPA) metadata. Trip hazards include: >> >> - CXL Memory Devices need to consider a PMEM partition, but Accelerator >> devices with CXL.mem likely do not in the common case. >> >> - CXL Memory Devices enumerate DPA through Memory Device mailbox >> commands like Partition Info, Accelerators devices do not. >> >> - CXL Memory Devices that support DCD support more than 2 partitions. >> Some of the driver algorithms are awkward to expand to > 2 partition >> cases. >> >> - DPA performance data is a general capability that can be shared with >> accelerators, so tracking it in 'struct cxl_memdev_state' is no longer >> suitable. >> >> - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a >> memory property, it should be phased in favor of a partition id and >> the memory property comes from the partition info. >> >> Towards cleaning up those issues and allowing a smoother landing for the >> aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' >> array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared >> way for Memory Devices and Accelerators to initialize the DPA information >> in 'struct cxl_dev_state'. >> >> For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to >> get the new data structure initialized, and cleanup some qos_class init. >> Follow on patches will go further to use the new data structure to >> cleanup algorithms that are better suited to loop over all possible >> partitions. >> >> cxl_dpa_setup() follows the locking expectations of mutating the device >> DPA map, and is suitable for Accelerator drivers to use. Accelerators >> likely only have one hardcoded 'ram' partition to convey to the >> cxl_core. >> >> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] >> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] >> Cc: Dave Jiang <dave.jiang@intel.com> >> Cc: Alejandro Lucero <alucerop@amd.com> >> Cc: Ira Weiny <ira.weiny@intel.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> > Hi Dan, > > In basic form this seems fine, but I find the nr_paritions variable usage very > counter intuitive. It's just how many we configured not how many there > are, potentially with 0 size (so not a partition). I'd be happier if we > can avoid that by just prefilling the lot with zero size and filling in > the ones we want. So zero size means doesn't exist and use an iterator where > appropriate to skip the zero size ones. > > Without that tidied up, to me this is more confusing than the previous code. > > Jonathan > >> --- >> drivers/cxl/core/cdat.c | 15 ++----- >> drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++ >> drivers/cxl/core/mbox.c | 86 ++++++++++++++++++------------------------ >> drivers/cxl/cxlmem.h | 79 +++++++++++++++++++++++++-------------- >> drivers/cxl/pci.c | 7 +++ >> tools/testing/cxl/test/cxl.c | 15 ++----- >> tools/testing/cxl/test/mem.c | 7 +++ >> 7 files changed, 176 insertions(+), 102 deletions(-) >> >> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c >> index b177a488e29b..5400a421ad30 100644 >> --- a/drivers/cxl/core/cdat.c >> +++ b/drivers/cxl/core/cdat.c >> @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, >> struct device *dev = cxlds->dev; >> struct dsmas_entry *dent; >> unsigned long index; >> - const struct resource *partition[] = { >> - to_ram_res(cxlds), >> - to_pmem_res(cxlds), >> - }; >> - struct cxl_dpa_perf *perf[] = { >> - to_ram_perf(cxlds), >> - to_pmem_perf(cxlds), >> - }; > Ok. This removes some of the concerns from previous patch. > >> >> xa_for_each(dsmas_xa, index, dent) { >> - for (int i = 0; i < ARRAY_SIZE(partition); i++) { >> - const struct resource *res = partition[i]; >> + for (int i = 0; i < cxlds->nr_partitions; i++) { >> + struct resource *res = &cxlds->part[i].res; >> struct range range = { >> .start = res->start, >> .end = res->end, >> }; >> >> if (range_contains(&range, &dent->dpa_range)) >> - update_perf_entry(dev, dent, perf[i]); >> + update_perf_entry(dev, dent, >> + &cxlds->part[i].perf); >> else >> dev_dbg(dev, >> "no partition for dsmas dpa: %pra\n", >> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c >> index 7a85522294ad..7e1559b3ed88 100644 >> --- a/drivers/cxl/core/hdm.c >> +++ b/drivers/cxl/core/hdm.c >> @@ -342,6 +342,75 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, >> return 0; >> } >> >> +static int add_dpa_res(struct device *dev, struct resource *parent, >> + struct resource *res, resource_size_t start, >> + resource_size_t size, const char *type) >> +{ >> + int rc; >> + >> + *res = (struct resource) { >> + .name = type, >> + .start = start, >> + .end = start + size - 1, >> + .flags = IORESOURCE_MEM, >> + }; >> + if (resource_size(res) == 0) { >> + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); >> + return 0; >> + } >> + rc = request_resource(parent, res); >> + if (rc) { >> + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, >> + res, rc); >> + return rc; >> + } >> + >> + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); >> + >> + return 0; >> +} >> + >> +/* if this fails the caller must destroy @cxlds, there is no recovery */ >> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) >> +{ >> + struct device *dev = cxlds->dev; >> + >> + guard(rwsem_write)(&cxl_dpa_rwsem); >> + >> + if (cxlds->nr_partitions) >> + return -EBUSY; >> + >> + if (!info->size || !info->nr_partitions) { >> + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); >> + cxlds->nr_partitions = 0; >> + return 0; >> + } >> + >> + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); >> + >> + for (int i = 0; i < info->nr_partitions; i++) { >> + const char *desc; >> + int rc; >> + >> + if (i == CXL_PARTITION_RAM) >> + desc = "ram"; >> + else if (i == CXL_PARTITION_PMEM) >> + desc = "pmem"; >> + else >> + desc = ""; >> + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; >> + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, >> + info->range[i].start, >> + range_len(&info->range[i]), desc); >> + if (rc) >> + return rc; >> + cxlds->nr_partitions++; > I'd just initialize the rest to 0 length similar to what is happening > if we have pmem only anyway. Then this nr_patitions goes away and > stops being a possible source of confusion. > >> + } >> + >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(cxl_dpa_setup); >> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c >> index 3502f1633ad2..7dca5c8c3494 100644 >> --- a/drivers/cxl/core/mbox.c >> +++ b/drivers/cxl/core/mbox.c >> @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) >> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) >> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) >> { >> struct cxl_dev_state *cxlds = &mds->cxlds; >> - struct resource *ram_res = to_ram_res(cxlds); >> - struct resource *pmem_res = to_pmem_res(cxlds); >> struct device *dev = cxlds->dev; >> int rc; >> >> if (!cxlds->media_ready) { >> - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); >> - *ram_res = DEFINE_RES_MEM(0, 0); >> - *pmem_res = DEFINE_RES_MEM(0, 0); >> + info->size = 0; >> return 0; >> } >> >> - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); >> + info->size = mds->total_bytes; >> >> if (mds->partition_align_bytes == 0) { > Obviously nothing to do with your patch as such, but maybe tidy this up > by making active values == fixed values when we don't have partition control. > That seems logical anyway to me and means we only end up with one lot of > range setup in here. I can't immediately see any side effects of doing this. > > > > if (mds->partition_align_bytes != 0) { > rc = cxl_mem_get_partition_info(mds); > if (rc) > return rc; > } else { > mds->active_volatile_bytes = mds->volatile_only_bytes; > mds->active_persistent_bytes = mds->persistent_only_bytes; > } > info->range[CXL_PARTITION_RAM] = (struct range) { > .start = 0, > .end = mds->active_volatile_bytes - 1, > }; > info->nr_partitions++; > > if (!mds->active_persistent_bytes) > return 0; > > info->range[CXL_PARTITION_PMEM] = (struct range) { > .start = mds->active_volatile_bytes, > .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, > }; > info->nr_partitions++; > > return 0; > } > >> - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, >> - mds->volatile_only_bytes, "ram"); >> - if (rc) >> - return rc; >> - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, >> - mds->volatile_only_bytes, >> - mds->persistent_only_bytes, "pmem"); >> + info->range[CXL_PARTITION_RAM] = (struct range) { >> + .start = 0, >> + .end = mds->volatile_only_bytes - 1, >> + }; >> + info->nr_partitions++; >> + >> + if (!mds->persistent_only_bytes) >> + return 0; >> + >> + info->range[CXL_PARTITION_PMEM] = (struct range) { >> + .start = mds->volatile_only_bytes, >> + .end = mds->volatile_only_bytes + >> + mds->persistent_only_bytes - 1, >> + }; >> + info->nr_partitions++; > This nr partitions makes some sense though I'd be tempted to add a type > array to info so that we can just not pass empty ones if we don't want to. > Makes this code a little more complex, but not a lot and means > nr->partitions becomes the ones that actually exist. > >> + return 0; >> } >> >> rc = cxl_mem_get_partition_info(mds); >> @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) >> return rc; >> } >> >> - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, >> - mds->active_volatile_bytes, "ram"); >> - if (rc) >> - return rc; >> - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, >> - mds->active_volatile_bytes, >> - mds->active_persistent_bytes, "pmem"); >> + info->range[CXL_PARTITION_RAM] = (struct range) { >> + .start = 0, >> + .end = mds->active_volatile_bytes - 1, >> + }; >> + info->nr_partitions++; >> + >> + if (!mds->active_persistent_bytes) >> + return 0; >> + >> + info->range[CXL_PARTITION_PMEM] = (struct range) { >> + .start = mds->active_volatile_bytes, >> + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, >> + }; >> + info->nr_partitions++; >> + >> + return 0; >> } >> -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); >> +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); >> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h >> index 78e92e24d7b5..2e728d4b7327 100644 >> --- a/drivers/cxl/cxlmem.h >> +++ b/drivers/cxl/cxlmem.h >> @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, >> resource_size_t base, resource_size_t len, >> resource_size_t skipped); >> >> +/* Well known, spec defined partition indices */ >> +enum cxl_partition { >> + CXL_PARTITION_RAM, >> + CXL_PARTITION_PMEM, >> + CXL_PARTITION_MAX, >> +}; >> + >> +struct cxl_dpa_info { >> + u64 size; >> + struct range range[CXL_PARTITION_MAX]; >> + int nr_partitions; >> +}; > blank line seems appropriate here. > >> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); >> + >> static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, >> struct cxl_memdev *cxlmd) >> { >> @@ -408,6 +422,16 @@ struct cxl_dpa_perf { >> int qos_class; >> }; >> >> /** >> * struct cxl_dev_state - The driver device state >> * >> @@ -423,8 +447,8 @@ struct cxl_dpa_perf { >> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) >> * @media_ready: Indicate whether the device media is usable >> * @dpa_res: Overall DPA resource tree for the device >> - * @_pmem_res: Active Persistent memory capacity configuration >> - * @_ram_res: Active Volatile memory capacity configuration >> + * @part: DPA partition array >> + * @nr_partitions: Number of DPA partitions > This needs more. It is not the number of partitions present I think, it > is the number that a particular driver is potentially interested in. > >> * @serial: PCIe Device Serial Number >> * @type: Generic Memory Class device or Vendor Specific Memory device >> * @cxl_mbox: CXL mailbox context >> @@ -438,21 +462,39 @@ struct cxl_dev_state { >> bool rcd; >> bool media_ready; >> struct resource dpa_res; >> - struct resource _pmem_res; >> - struct resource _ram_res; >> + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; >> + unsigned int nr_partitions; >> u64 serial; >> enum cxl_devtype type; >> struct cxl_mailbox cxl_mbox; >> }; >> >> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) >> +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) >> { >> - return &cxlds->_ram_res; >> + if (cxlds->nr_partitions > 0) >> + return &cxlds->part[CXL_PARTITION_RAM].res; >> + return NULL; >> } >> >> -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) >> +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) >> { >> - return &cxlds->_pmem_res; >> + if (cxlds->nr_partitions > 1) > This is very confusing as nr_partitions is being used not to indicate > number of partitions but whether a driver has filled in the data for them > (which may well be empty). I would say this is more than confusing: it is broken. What if a device only has pmem? Number of partitions would be 1 ... > > I'd rather see that as a bitmap, or a 'not set' value initialized by > the core that is then replaced when they are set. Repeating my doubt I expressed in the previous patch but with other words: Is it not enough to give the reference to the ram/pmem resource and then work based on the size? >> + return &cxlds->part[CXL_PARTITION_PMEM].res; >> + return NULL; >> +} >> + >> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) >> +{ >> + if (cxlds->nr_partitions > 0) >> + return &cxlds->part[CXL_PARTITION_RAM].perf; >> + return NULL; >> +} >> + >> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) >> +{ >> + if (cxlds->nr_partitions > 1) >> + return &cxlds->part[CXL_PARTITION_PMEM].perf; >> + return NULL; >> } > >> @@ -860,7 +883,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, >> int cxl_dev_state_identify(struct cxl_memdev_state *mds); >> int cxl_await_media_ready(struct cxl_dev_state *cxlds); >> int cxl_enumerate_cmds(struct cxl_memdev_state *mds); >> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); >> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info); >> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); >> void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, >> unsigned long *cmds); >> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c >> index 7f1c5061307b..ba3d48b37de3 100644 >> --- a/tools/testing/cxl/test/cxl.c >> +++ b/tools/testing/cxl/test/cxl.c >> @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port) >> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); >> struct cxl_dev_state *cxlds = cxlmd->cxlds; >> struct access_coordinate ep_c[ACCESS_COORDINATE_MAX]; >> - const struct resource *partition[] = { >> - to_ram_res(cxlds), >> - to_pmem_res(cxlds), >> - }; >> - struct cxl_dpa_perf *perf[] = { >> - to_ram_perf(cxlds), >> - to_pmem_perf(cxlds), >> - }; > Ok. This gets rid of some of the earlier concerns. > >> >> if (!cxl_root) >> return; >> >> - for (int i = 0; i < ARRAY_SIZE(partition); i++) { >> - const struct resource *res = partition[i]; >> + for (int i = 0; i < cxlds->nr_partitions; i++) { >> + struct resource *res = &cxlds->part[i].res; >> + struct cxl_dpa_perf *perf = &cxlds->part[i].perf; >> struct range range = { >> .start = res->start, >> .end = res->end, >> }; >> >> - dpa_perf_setup(port, &range, perf[i]); >> + dpa_perf_setup(port, &range, perf); >> } >> >> cxl_memdev_update_perf(cxlmd); >
On 1/17/25 06:10, Dan Williams wrote: > The pending efforts to add CXL Accelerator (type-2) device [1], and > Dynamic Capacity (DCD) support [2], tripped on the > no-longer-fit-for-purpose design in the CXL subsystem for tracking > device-physical-address (DPA) metadata. Trip hazards include: > > - CXL Memory Devices need to consider a PMEM partition, but Accelerator > devices with CXL.mem likely do not in the common case. > > - CXL Memory Devices enumerate DPA through Memory Device mailbox > commands like Partition Info, Accelerators devices do not. > > - CXL Memory Devices that support DCD support more than 2 partitions. > Some of the driver algorithms are awkward to expand to > 2 partition > cases. > > - DPA performance data is a general capability that can be shared with > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer > suitable. > > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a > memory property, it should be phased in favor of a partition id and > the memory property comes from the partition info. > > Towards cleaning up those issues and allowing a smoother landing for the > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared > way for Memory Devices and Accelerators to initialize the DPA information > in 'struct cxl_dev_state'. > > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to > get the new data structure initialized, and cleanup some qos_class init. > Follow on patches will go further to use the new data structure to > cleanup algorithms that are better suited to loop over all possible > partitions. > > cxl_dpa_setup() follows the locking expectations of mutating the device > DPA map, and is suitable for Accelerator drivers to use. Accelerators > likely only have one hardcoded 'ram' partition to convey to the > cxl_core. <snip> > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > +{ > + struct device *dev = cxlds->dev; > + > + guard(rwsem_write)(&cxl_dpa_rwsem); > + This explains to me what you meant about locking when setting the resources for Type2. However, I think this is no necessary because there is no user space, or that is my idea, involved when creating CXL regions for a Type2. It is all up to the accel driver to do so, therefore no locking needed because none is going to traverse the child resource list while initialising/updating it. It does not harm to have it for current Type2 case, and always a good idea to have it for potential future cases.
Jonathan Cameron wrote: > On Thu, 16 Jan 2025 22:10:44 -0800 > Dan Williams <dan.j.williams@intel.com> wrote: > > > The pending efforts to add CXL Accelerator (type-2) device [1], and > > Dynamic Capacity (DCD) support [2], tripped on the > > no-longer-fit-for-purpose design in the CXL subsystem for tracking > > device-physical-address (DPA) metadata. Trip hazards include: > > > > - CXL Memory Devices need to consider a PMEM partition, but Accelerator > > devices with CXL.mem likely do not in the common case. > > > > - CXL Memory Devices enumerate DPA through Memory Device mailbox > > commands like Partition Info, Accelerators devices do not. > > > > - CXL Memory Devices that support DCD support more than 2 partitions. > > Some of the driver algorithms are awkward to expand to > 2 partition > > cases. > > > > - DPA performance data is a general capability that can be shared with > > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer > > suitable. > > > > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a > > memory property, it should be phased in favor of a partition id and > > the memory property comes from the partition info. > > > > Towards cleaning up those issues and allowing a smoother landing for the > > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' > > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared > > way for Memory Devices and Accelerators to initialize the DPA information > > in 'struct cxl_dev_state'. > > > > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to > > get the new data structure initialized, and cleanup some qos_class init. > > Follow on patches will go further to use the new data structure to > > cleanup algorithms that are better suited to loop over all possible > > partitions. > > > > cxl_dpa_setup() follows the locking expectations of mutating the device > > DPA map, and is suitable for Accelerator drivers to use. Accelerators > > likely only have one hardcoded 'ram' partition to convey to the > > cxl_core. > > > > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] > > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] > > Cc: Dave Jiang <dave.jiang@intel.com> > > Cc: Alejandro Lucero <alucerop@amd.com> > > Cc: Ira Weiny <ira.weiny@intel.com> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > Hi Dan, > > In basic form this seems fine, but I find the nr_paritions variable usage very > counter intuitive. It's just how many we configured not how many there > are, potentially with 0 size (so not a partition). I'd be happier if we > can avoid that by just prefilling the lot with zero size and filling in > the ones we want. So zero size means doesn't exist and use an iterator where > appropriate to skip the zero size ones. The PMEM-only device case did give me pause. Is that 2 partitions with a zero-sized first partition, or is that just 1 partition? Ultimately I do think the code should further evolve to treat that as 1-PMEM-partition, but as far as I can see that depends on 'enum cxl_decoder_mode' being eliminated and teaching all code paths to search for the position of the PMEM partition. > Without that tidied up, to me this is more confusing than the previous code. I was going to save PMEM at a partition other than 1 for the DCD series, but let me take another pass at adding that to this series. [..] > > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > > +{ > > + struct device *dev = cxlds->dev; > > + > > + guard(rwsem_write)(&cxl_dpa_rwsem); > > + > > + if (cxlds->nr_partitions) > > + return -EBUSY; > > + > > + if (!info->size || !info->nr_partitions) { > > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > > + cxlds->nr_partitions = 0; > > + return 0; > > + } > > + > > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); > > + > > + for (int i = 0; i < info->nr_partitions; i++) { > > + const char *desc; > > + int rc; > > + > > + if (i == CXL_PARTITION_RAM) > > + desc = "ram"; > > + else if (i == CXL_PARTITION_PMEM) > > + desc = "pmem"; > > + else > > + desc = ""; > > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; > > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, > > + info->range[i].start, > > + range_len(&info->range[i]), desc); > > + if (rc) > > + return rc; > > + cxlds->nr_partitions++; > I'd just initialize the rest to 0 length similar to what is happening > if we have pmem only anyway. Then this nr_patitions goes away and > stops being a possible source of confusion. Modulo teaching other code that wants to ask "what is the size of the PMEM partition" to use a helper that hides the "find the device's PMEM partition". > > > + } > > + > > + return 0; > > +} > > +EXPORT_SYMBOL_GPL(cxl_dpa_setup); > > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c > > index 3502f1633ad2..7dca5c8c3494 100644 > > --- a/drivers/cxl/core/mbox.c > > +++ b/drivers/cxl/core/mbox.c > > @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) > > > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) > > { > > struct cxl_dev_state *cxlds = &mds->cxlds; > > - struct resource *ram_res = to_ram_res(cxlds); > > - struct resource *pmem_res = to_pmem_res(cxlds); > > struct device *dev = cxlds->dev; > > int rc; > > > > if (!cxlds->media_ready) { > > - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > > - *ram_res = DEFINE_RES_MEM(0, 0); > > - *pmem_res = DEFINE_RES_MEM(0, 0); > > + info->size = 0; > > return 0; > > } > > > > - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); > > + info->size = mds->total_bytes; > > > > if (mds->partition_align_bytes == 0) { > Obviously nothing to do with your patch as such, but maybe tidy this up > by making active values == fixed values when we don't have partition control. > That seems logical anyway to me and means we only end up with one lot of > range setup in here. I can't immediately see any side effects of doing this. Yeah, I mentioned this in another thread. There is no reason for 'struct cxl_memdev_state' to carry these values at all. They are just temporary init-data. So, cxl_dev_state_identify() becomes cxl_mem_identify(), since it is a memory-device command. Move it inside of cxl_mem_dpa_fetch() since it is just temporary init-data for 'struct cxl_dpa_info'. [..] > > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > > - mds->volatile_only_bytes, "ram"); > > - if (rc) > > - return rc; > > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > > - mds->volatile_only_bytes, > > - mds->persistent_only_bytes, "pmem"); > > + info->range[CXL_PARTITION_RAM] = (struct range) { > > + .start = 0, > > + .end = mds->volatile_only_bytes - 1, > > + }; > > + info->nr_partitions++; > > + > > + if (!mds->persistent_only_bytes) > > + return 0; > > + > > + info->range[CXL_PARTITION_PMEM] = (struct range) { > > + .start = mds->volatile_only_bytes, > > + .end = mds->volatile_only_bytes + > > + mds->persistent_only_bytes - 1, > > + }; > > + info->nr_partitions++; > > This nr partitions makes some sense though I'd be tempted to add a type > array to info so that we can just not pass empty ones if we don't want to. > Makes this code a little more complex, but not a lot and means > nr->partitions becomes the ones that actually exist. Agree, that's the end goal. > > > + return 0; > > } > > > > rc = cxl_mem_get_partition_info(mds); > > @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > > return rc; > > } > > > > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > > - mds->active_volatile_bytes, "ram"); > > - if (rc) > > - return rc; > > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > > - mds->active_volatile_bytes, > > - mds->active_persistent_bytes, "pmem"); > > + info->range[CXL_PARTITION_RAM] = (struct range) { > > + .start = 0, > > + .end = mds->active_volatile_bytes - 1, > > + }; > > + info->nr_partitions++; > > + > > + if (!mds->active_persistent_bytes) > > + return 0; > > + > > + info->range[CXL_PARTITION_PMEM] = (struct range) { > > + .start = mds->active_volatile_bytes, > > + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, > > + }; > > + info->nr_partitions++; > > + > > + return 0; > > } > > -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); > > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > > index 78e92e24d7b5..2e728d4b7327 100644 > > --- a/drivers/cxl/cxlmem.h > > +++ b/drivers/cxl/cxlmem.h > > @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > > resource_size_t base, resource_size_t len, > > resource_size_t skipped); > > > > +/* Well known, spec defined partition indices */ > > +enum cxl_partition { > > + CXL_PARTITION_RAM, > > + CXL_PARTITION_PMEM, > > + CXL_PARTITION_MAX, > > +}; > > + > > +struct cxl_dpa_info { > > + u64 size; > > + struct range range[CXL_PARTITION_MAX]; > > + int nr_partitions; > > +}; > > blank line seems appropriate here. Added. > > > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); > > + > > static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, > > struct cxl_memdev *cxlmd) > > { > > @@ -408,6 +422,16 @@ struct cxl_dpa_perf { > > int qos_class; > > }; > > > > > /** > > * struct cxl_dev_state - The driver device state > > * > > @@ -423,8 +447,8 @@ struct cxl_dpa_perf { > > * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) > > * @media_ready: Indicate whether the device media is usable > > * @dpa_res: Overall DPA resource tree for the device > > - * @_pmem_res: Active Persistent memory capacity configuration > > - * @_ram_res: Active Volatile memory capacity configuration > > + * @part: DPA partition array > > + * @nr_partitions: Number of DPA partitions > > This needs more. It is not the number of partitions present I think, it > is the number that a particular driver is potentially interested in. > > > * @serial: PCIe Device Serial Number > > * @type: Generic Memory Class device or Vendor Specific Memory device > > * @cxl_mbox: CXL mailbox context > > @@ -438,21 +462,39 @@ struct cxl_dev_state { > > bool rcd; > > bool media_ready; > > struct resource dpa_res; > > - struct resource _pmem_res; > > - struct resource _ram_res; > > + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; > > + unsigned int nr_partitions; > > u64 serial; > > enum cxl_devtype type; > > struct cxl_mailbox cxl_mbox; > > }; > > > > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) > > +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) > > { > > - return &cxlds->_ram_res; > > + if (cxlds->nr_partitions > 0) > > + return &cxlds->part[CXL_PARTITION_RAM].res; > > + return NULL; > > } > > > > -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > > +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > > { > > - return &cxlds->_pmem_res; > > + if (cxlds->nr_partitions > 1) > > This is very confusing as nr_partitions is being used not to indicate > number of partitions but whether a driver has filled in the data for them > (which may well be empty). > > I'd rather see that as a bitmap, or a 'not set' value initialized by > the core that is then replaced when they are set. ...or even better, not require PMEM to be at partition1. [..]
Dan Williams wrote: > Jonathan Cameron wrote: > > On Thu, 16 Jan 2025 22:10:44 -0800 > > Dan Williams <dan.j.williams@intel.com> wrote: > > > > > The pending efforts to add CXL Accelerator (type-2) device [1], and > > > Dynamic Capacity (DCD) support [2], tripped on the > > > no-longer-fit-for-purpose design in the CXL subsystem for tracking > > > device-physical-address (DPA) metadata. Trip hazards include: > > > > > > - CXL Memory Devices need to consider a PMEM partition, but Accelerator > > > devices with CXL.mem likely do not in the common case. > > > > > > - CXL Memory Devices enumerate DPA through Memory Device mailbox > > > commands like Partition Info, Accelerators devices do not. > > > > > > - CXL Memory Devices that support DCD support more than 2 partitions. > > > Some of the driver algorithms are awkward to expand to > 2 partition > > > cases. > > > > > > - DPA performance data is a general capability that can be shared with > > > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer > > > suitable. > > > > > > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a > > > memory property, it should be phased in favor of a partition id and > > > the memory property comes from the partition info. > > > > > > Towards cleaning up those issues and allowing a smoother landing for the > > > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' > > > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared > > > way for Memory Devices and Accelerators to initialize the DPA information > > > in 'struct cxl_dev_state'. > > > > > > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to > > > get the new data structure initialized, and cleanup some qos_class init. > > > Follow on patches will go further to use the new data structure to > > > cleanup algorithms that are better suited to loop over all possible > > > partitions. > > > > > > cxl_dpa_setup() follows the locking expectations of mutating the device > > > DPA map, and is suitable for Accelerator drivers to use. Accelerators > > > likely only have one hardcoded 'ram' partition to convey to the > > > cxl_core. > > > > > > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] > > > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] > > > Cc: Dave Jiang <dave.jiang@intel.com> > > > Cc: Alejandro Lucero <alucerop@amd.com> > > > Cc: Ira Weiny <ira.weiny@intel.com> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > > > Hi Dan, > > > > In basic form this seems fine, but I find the nr_paritions variable usage very > > counter intuitive. It's just how many we configured not how many there > > are, potentially with 0 size (so not a partition). I'd be happier if we > > can avoid that by just prefilling the lot with zero size and filling in > > the ones we want. So zero size means doesn't exist and use an iterator where > > appropriate to skip the zero size ones. > > The PMEM-only device case did give me pause. Is that 2 partitions with a > zero-sized first partition, or is that just 1 partition? > > Ultimately I do think the code should further evolve to treat that as > 1-PMEM-partition, but as far as I can see that depends on 'enum > cxl_decoder_mode' being eliminated and teaching all code paths to search > for the position of the PMEM partition. I was of the same mind that the decoder names could be used to index the array. For ram/pmem this is baked into the user API but for DCD one could imagine not just specifying partition dc0 but rather a 'ram' dcd partition. > > > Without that tidied up, to me this is more confusing than the previous code. > > I was going to save PMEM at a partition other than 1 for the DCD series, > but let me take another pass at adding that to this series. > > [..] > > > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > > > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > > > +{ > > > + struct device *dev = cxlds->dev; > > > + > > > + guard(rwsem_write)(&cxl_dpa_rwsem); > > > + > > > + if (cxlds->nr_partitions) > > > + return -EBUSY; > > > + > > > + if (!info->size || !info->nr_partitions) { > > > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > > > + cxlds->nr_partitions = 0; > > > + return 0; > > > + } > > > + > > > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); > > > + > > > + for (int i = 0; i < info->nr_partitions; i++) { > > > + const char *desc; > > > + int rc; > > > + > > > + if (i == CXL_PARTITION_RAM) > > > + desc = "ram"; > > > + else if (i == CXL_PARTITION_PMEM) > > > + desc = "pmem"; > > > + else > > > + desc = ""; > > > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; > > > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, > > > + info->range[i].start, > > > + range_len(&info->range[i]), desc); > > > + if (rc) > > > + return rc; > > > + cxlds->nr_partitions++; > > I'd just initialize the rest to 0 length similar to what is happening > > if we have pmem only anyway. Then this nr_patitions goes away and > > stops being a possible source of confusion. > > Modulo teaching other code that wants to ask "what is the size of the > PMEM partition" to use a helper that hides the "find the device's PMEM > partition". > > > > > > > + } > > > + > > > + return 0; > > > +} > > > +EXPORT_SYMBOL_GPL(cxl_dpa_setup); > > > > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c > > > index 3502f1633ad2..7dca5c8c3494 100644 > > > --- a/drivers/cxl/core/mbox.c > > > +++ b/drivers/cxl/core/mbox.c > > > @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) > > > > > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > > > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) > > > { > > > struct cxl_dev_state *cxlds = &mds->cxlds; > > > - struct resource *ram_res = to_ram_res(cxlds); > > > - struct resource *pmem_res = to_pmem_res(cxlds); > > > struct device *dev = cxlds->dev; > > > int rc; > > > > > > if (!cxlds->media_ready) { > > > - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > > > - *ram_res = DEFINE_RES_MEM(0, 0); > > > - *pmem_res = DEFINE_RES_MEM(0, 0); > > > + info->size = 0; > > > return 0; > > > } > > > > > > - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); > > > + info->size = mds->total_bytes; > > > > > > if (mds->partition_align_bytes == 0) { > > Obviously nothing to do with your patch as such, but maybe tidy this up > > by making active values == fixed values when we don't have partition control. > > That seems logical anyway to me and means we only end up with one lot of > > range setup in here. I can't immediately see any side effects of doing this. > > Yeah, I mentioned this in another thread. There is no reason > for 'struct cxl_memdev_state' to carry these values at all. They are > just temporary init-data. > > So, cxl_dev_state_identify() becomes cxl_mem_identify(), since > it is a memory-device command. Move it inside of cxl_mem_dpa_fetch() > since it is just temporary init-data for 'struct cxl_dpa_info'. I took a different direction and removed all these temporary variables which also had the side effect of separating the mbox command processing from the resource creation. I do prefer cxl_dpa_info to the way I coded it but I did not anticipate my structure living long I'm also not a fan of 'cxl_byte_layout': diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 6d63c29eb0e1..9646465e2cbe 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -463,12 +463,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) * @firmware_version: Firmware version for the memory device. * @enabled_cmds: Hardware commands found enabled in CEL. * @exclusive_cmds: Commands that are kernel-internal only - * @total_bytes: sum of all possible capacities - * @volatile_only_bytes: hard volatile capacity - * @persistent_only_bytes: hard persistent capacity - * @partition_align_bytes: alignment size for partition-able capacity - * @active_volatile_bytes: sum of hard + soft volatile - * @active_persistent_bytes: sum of hard + soft persistent * @ram_perf: performance data entry matched to RAM partition * @pmem_perf: performance data entry matched to PMEM partition * @event: event log driver state @@ -485,12 +479,6 @@ struct cxl_memdev_state { char firmware_version[0x10]; DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX); DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX); - u64 total_bytes; - u64 volatile_only_bytes; - u64 persistent_only_bytes; - u64 partition_align_bytes; - u64 active_volatile_bytes; - u64 active_persistent_bytes; struct cxl_dpa_perf ram_perf; struct cxl_dpa_perf pmem_perf; @@ -811,10 +799,19 @@ enum { int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd); -int cxl_dev_state_identify(struct cxl_memdev_state *mds); + +struct cxl_mem_byte_layout { + u64 total_bytes; + u64 volatile_bytes; + u64 persistent_bytes; +}; + +int cxl_dev_state_identify(struct cxl_memdev_state *mds, + struct cxl_mem_byte_layout *byte_layout); int cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); +int cxl_create_range_info(struct cxl_dev_state *cxlds, + struct cxl_mem_byte_layout *byte_layout); struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, unsigned long *cmds); > > [..] > > > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > > > - mds->volatile_only_bytes, "ram"); > > > - if (rc) > > > - return rc; > > > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > > > - mds->volatile_only_bytes, > > > - mds->persistent_only_bytes, "pmem"); > > > + info->range[CXL_PARTITION_RAM] = (struct range) { > > > + .start = 0, > > > + .end = mds->volatile_only_bytes - 1, > > > + }; > > > + info->nr_partitions++; > > > + > > > + if (!mds->persistent_only_bytes) > > > + return 0; > > > + > > > + info->range[CXL_PARTITION_PMEM] = (struct range) { > > > + .start = mds->volatile_only_bytes, > > > + .end = mds->volatile_only_bytes + > > > + mds->persistent_only_bytes - 1, > > > + }; > > > + info->nr_partitions++; > > > > This nr partitions makes some sense though I'd be tempted to add a type > > array to info so that we can just not pass empty ones if we don't want to. > > Makes this code a little more complex, but not a lot and means > > nr->partitions becomes the ones that actually exist. > > Agree, that's the end goal. > > > > > > + return 0; > > > } > > > > > > rc = cxl_mem_get_partition_info(mds); > > > @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > > > return rc; > > > } > > > > > > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > > > - mds->active_volatile_bytes, "ram"); > > > - if (rc) > > > - return rc; > > > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > > > - mds->active_volatile_bytes, > > > - mds->active_persistent_bytes, "pmem"); > > > + info->range[CXL_PARTITION_RAM] = (struct range) { > > > + .start = 0, > > > + .end = mds->active_volatile_bytes - 1, > > > + }; > > > + info->nr_partitions++; > > > + > > > + if (!mds->active_persistent_bytes) > > > + return 0; > > > + > > > + info->range[CXL_PARTITION_PMEM] = (struct range) { > > > + .start = mds->active_volatile_bytes, > > > + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, > > > + }; > > > + info->nr_partitions++; > > > + > > > + return 0; > > > } > > > -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); > > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); > > > > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > > > index 78e92e24d7b5..2e728d4b7327 100644 > > > --- a/drivers/cxl/cxlmem.h > > > +++ b/drivers/cxl/cxlmem.h > > > @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > > > resource_size_t base, resource_size_t len, > > > resource_size_t skipped); > > > > > > +/* Well known, spec defined partition indices */ > > > +enum cxl_partition { > > > + CXL_PARTITION_RAM, > > > + CXL_PARTITION_PMEM, > > > + CXL_PARTITION_MAX, > > > +}; > > > + > > > +struct cxl_dpa_info { > > > + u64 size; > > > + struct range range[CXL_PARTITION_MAX]; > > > + int nr_partitions; > > > +}; > > > > blank line seems appropriate here. > > Added. > > > > > > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); > > > + > > > static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, > > > struct cxl_memdev *cxlmd) > > > { > > > @@ -408,6 +422,16 @@ struct cxl_dpa_perf { > > > int qos_class; > > > }; > > > > > > > > /** > > > * struct cxl_dev_state - The driver device state > > > * > > > @@ -423,8 +447,8 @@ struct cxl_dpa_perf { > > > * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) > > > * @media_ready: Indicate whether the device media is usable > > > * @dpa_res: Overall DPA resource tree for the device > > > - * @_pmem_res: Active Persistent memory capacity configuration > > > - * @_ram_res: Active Volatile memory capacity configuration > > > + * @part: DPA partition array > > > + * @nr_partitions: Number of DPA partitions > > > > This needs more. It is not the number of partitions present I think, it > > is the number that a particular driver is potentially interested in. > > > > > * @serial: PCIe Device Serial Number > > > * @type: Generic Memory Class device or Vendor Specific Memory device > > > * @cxl_mbox: CXL mailbox context > > > @@ -438,21 +462,39 @@ struct cxl_dev_state { > > > bool rcd; > > > bool media_ready; > > > struct resource dpa_res; > > > - struct resource _pmem_res; > > > - struct resource _ram_res; > > > + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; > > > + unsigned int nr_partitions; > > > u64 serial; > > > enum cxl_devtype type; > > > struct cxl_mailbox cxl_mbox; > > > }; > > > > > > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) > > > +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) > > > { > > > - return &cxlds->_ram_res; > > > + if (cxlds->nr_partitions > 0) > > > + return &cxlds->part[CXL_PARTITION_RAM].res; > > > + return NULL; > > > } > > > > > > -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > > > +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > > > { > > > - return &cxlds->_pmem_res; > > > + if (cxlds->nr_partitions > 1) > > > > This is very confusing as nr_partitions is being used not to indicate > > number of partitions but whether a driver has filled in the data for them > > (which may well be empty). > > > > I'd rather see that as a bitmap, or a 'not set' value initialized by > > the core that is then replaced when they are set. > > ...or even better, not require PMEM to be at partition1. FWIW I think that is a step to far to be rushing into 6.14. Ira
Dan Williams wrote: [snip] > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index 7a85522294ad..7e1559b3ed88 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -342,6 +342,75 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > return 0; > } > > +static int add_dpa_res(struct device *dev, struct resource *parent, > + struct resource *res, resource_size_t start, > + resource_size_t size, const char *type) > +{ > + int rc; > + > + *res = (struct resource) { > + .name = type, > + .start = start, > + .end = start + size - 1, > + .flags = IORESOURCE_MEM, > + }; > + if (resource_size(res) == 0) { > + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); > + return 0; > + } > + rc = request_resource(parent, res); > + if (rc) { > + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, > + res, rc); > + return rc; > + } > + > + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); > + > + return 0; > +} > + > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > +{ > + struct device *dev = cxlds->dev; > + > + guard(rwsem_write)(&cxl_dpa_rwsem); > + > + if (cxlds->nr_partitions) > + return -EBUSY; > + > + if (!info->size || !info->nr_partitions) { > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > + cxlds->nr_partitions = 0; > + return 0; > + } > + > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); > + > + for (int i = 0; i < info->nr_partitions; i++) { > + const char *desc; > + int rc; > + > + if (i == CXL_PARTITION_RAM) > + desc = "ram"; > + else if (i == CXL_PARTITION_PMEM) > + desc = "pmem"; > + else > + desc = ""; > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, > + info->range[i].start, > + range_len(&info->range[i]), desc); > + if (rc) > + return rc; > + cxlds->nr_partitions++; > + } > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(cxl_dpa_setup); Why put this in the middle of hdm.c where it splits up devm_cxl_dpa_reserve() and __cxl_dpa_reserve()? Ira > + > int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > resource_size_t base, resource_size_t len, > resource_size_t skipped) [snip]
Dan Williams wrote: > The pending efforts to add CXL Accelerator (type-2) device [1], and > Dynamic Capacity (DCD) support [2], tripped on the > no-longer-fit-for-purpose design in the CXL subsystem for tracking > device-physical-address (DPA) metadata. Trip hazards include: > > - CXL Memory Devices need to consider a PMEM partition, but Accelerator > devices with CXL.mem likely do not in the common case. > > - CXL Memory Devices enumerate DPA through Memory Device mailbox > commands like Partition Info, Accelerators devices do not. > > - CXL Memory Devices that support DCD support more than 2 partitions. > Some of the driver algorithms are awkward to expand to > 2 partition > cases. > > - DPA performance data is a general capability that can be shared with > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer > suitable. > > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a > memory property, it should be phased in favor of a partition id and > the memory property comes from the partition info. > > Towards cleaning up those issues and allowing a smoother landing for the > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared > way for Memory Devices and Accelerators to initialize the DPA information > in 'struct cxl_dev_state'. > > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to > get the new data structure initialized, and cleanup some qos_class init. > Follow on patches will go further to use the new data structure to > cleanup algorithms that are better suited to loop over all possible > partitions. > > cxl_dpa_setup() follows the locking expectations of mutating the device > DPA map, and is suitable for Accelerator drivers to use. Accelerators > likely only have one hardcoded 'ram' partition to convey to the > cxl_core. > > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Alejandro Lucero <alucerop@amd.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/core/cdat.c | 15 ++----- > drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++ > drivers/cxl/core/mbox.c | 86 ++++++++++++++++++------------------------ > drivers/cxl/cxlmem.h | 79 +++++++++++++++++++++++++-------------- > drivers/cxl/pci.c | 7 +++ > tools/testing/cxl/test/cxl.c | 15 ++----- > tools/testing/cxl/test/mem.c | 7 +++ > 7 files changed, 176 insertions(+), 102 deletions(-) > > diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c > index b177a488e29b..5400a421ad30 100644 > --- a/drivers/cxl/core/cdat.c > +++ b/drivers/cxl/core/cdat.c > @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, > struct device *dev = cxlds->dev; > struct dsmas_entry *dent; > unsigned long index; > - const struct resource *partition[] = { > - to_ram_res(cxlds), > - to_pmem_res(cxlds), > - }; > - struct cxl_dpa_perf *perf[] = { > - to_ram_perf(cxlds), > - to_pmem_perf(cxlds), > - }; > > xa_for_each(dsmas_xa, index, dent) { > - for (int i = 0; i < ARRAY_SIZE(partition); i++) { > - const struct resource *res = partition[i]; > + for (int i = 0; i < cxlds->nr_partitions; i++) { > + struct resource *res = &cxlds->part[i].res; > struct range range = { > .start = res->start, > .end = res->end, > }; > > if (range_contains(&range, &dent->dpa_range)) > - update_perf_entry(dev, dent, perf[i]); > + update_perf_entry(dev, dent, > + &cxlds->part[i].perf); > else > dev_dbg(dev, > "no partition for dsmas dpa: %pra\n", > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index 7a85522294ad..7e1559b3ed88 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -342,6 +342,75 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > return 0; > } > > +static int add_dpa_res(struct device *dev, struct resource *parent, > + struct resource *res, resource_size_t start, > + resource_size_t size, const char *type) > +{ > + int rc; > + > + *res = (struct resource) { > + .name = type, > + .start = start, > + .end = start + size - 1, > + .flags = IORESOURCE_MEM, > + }; > + if (resource_size(res) == 0) { > + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); > + return 0; > + } > + rc = request_resource(parent, res); > + if (rc) { > + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, > + res, rc); > + return rc; > + } > + > + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); > + > + return 0; > +} > + > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > +{ > + struct device *dev = cxlds->dev; > + > + guard(rwsem_write)(&cxl_dpa_rwsem); Why is this semaphore required now? Ira > + > + if (cxlds->nr_partitions) > + return -EBUSY; > + > + if (!info->size || !info->nr_partitions) { > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > + cxlds->nr_partitions = 0; > + return 0; > + } > + > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); > + > + for (int i = 0; i < info->nr_partitions; i++) { > + const char *desc; > + int rc; > + > + if (i == CXL_PARTITION_RAM) > + desc = "ram"; > + else if (i == CXL_PARTITION_PMEM) > + desc = "pmem"; > + else > + desc = ""; > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, > + info->range[i].start, > + range_len(&info->range[i]), desc); > + if (rc) > + return rc; > + cxlds->nr_partitions++; > + } > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(cxl_dpa_setup); > + > int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > resource_size_t base, resource_size_t len, > resource_size_t skipped) > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c > index 3502f1633ad2..7dca5c8c3494 100644 > --- a/drivers/cxl/core/mbox.c > +++ b/drivers/cxl/core/mbox.c > @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) > return rc; > } > > -static int add_dpa_res(struct device *dev, struct resource *parent, > - struct resource *res, resource_size_t start, > - resource_size_t size, const char *type) > -{ > - int rc; > - > - res->name = type; > - res->start = start; > - res->end = start + size - 1; > - res->flags = IORESOURCE_MEM; > - if (resource_size(res) == 0) { > - dev_dbg(dev, "DPA(%s): no capacity\n", res->name); > - return 0; > - } > - rc = request_resource(parent, res); > - if (rc) { > - dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, > - res, rc); > - return rc; > - } > - > - dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); > - > - return 0; > -} > - > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) > { > struct cxl_dev_state *cxlds = &mds->cxlds; > - struct resource *ram_res = to_ram_res(cxlds); > - struct resource *pmem_res = to_pmem_res(cxlds); > struct device *dev = cxlds->dev; > int rc; > > if (!cxlds->media_ready) { > - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); > - *ram_res = DEFINE_RES_MEM(0, 0); > - *pmem_res = DEFINE_RES_MEM(0, 0); > + info->size = 0; > return 0; > } > > - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); > + info->size = mds->total_bytes; > > if (mds->partition_align_bytes == 0) { > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > - mds->volatile_only_bytes, "ram"); > - if (rc) > - return rc; > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > - mds->volatile_only_bytes, > - mds->persistent_only_bytes, "pmem"); > + info->range[CXL_PARTITION_RAM] = (struct range) { > + .start = 0, > + .end = mds->volatile_only_bytes - 1, > + }; > + info->nr_partitions++; > + > + if (!mds->persistent_only_bytes) > + return 0; > + > + info->range[CXL_PARTITION_PMEM] = (struct range) { > + .start = mds->volatile_only_bytes, > + .end = mds->volatile_only_bytes + > + mds->persistent_only_bytes - 1, > + }; > + info->nr_partitions++; > + return 0; > } > > rc = cxl_mem_get_partition_info(mds); > @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) > return rc; > } > > - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, > - mds->active_volatile_bytes, "ram"); > - if (rc) > - return rc; > - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, > - mds->active_volatile_bytes, > - mds->active_persistent_bytes, "pmem"); > + info->range[CXL_PARTITION_RAM] = (struct range) { > + .start = 0, > + .end = mds->active_volatile_bytes - 1, > + }; > + info->nr_partitions++; > + > + if (!mds->active_persistent_bytes) > + return 0; > + > + info->range[CXL_PARTITION_PMEM] = (struct range) { > + .start = mds->active_volatile_bytes, > + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, > + }; > + info->nr_partitions++; > + > + return 0; > } > -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); > +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); > > int cxl_set_timestamp(struct cxl_memdev_state *mds) > { > @@ -1452,8 +1440,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) > mds->cxlds.reg_map.host = dev; > mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE; > mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; > - to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; > - to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; > > return mds; > } > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > index 78e92e24d7b5..2e728d4b7327 100644 > --- a/drivers/cxl/cxlmem.h > +++ b/drivers/cxl/cxlmem.h > @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > resource_size_t base, resource_size_t len, > resource_size_t skipped); > > +/* Well known, spec defined partition indices */ > +enum cxl_partition { > + CXL_PARTITION_RAM, > + CXL_PARTITION_PMEM, > + CXL_PARTITION_MAX, > +}; > + > +struct cxl_dpa_info { > + u64 size; > + struct range range[CXL_PARTITION_MAX]; > + int nr_partitions; > +}; > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); > + > static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, > struct cxl_memdev *cxlmd) > { > @@ -408,6 +422,16 @@ struct cxl_dpa_perf { > int qos_class; > }; > > +/** > + * struct cxl_dpa_partition - DPA partition descriptor > + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res) > + * @perf: performance attributes of the partition from CDAT > + */ > +struct cxl_dpa_partition { > + struct resource res; > + struct cxl_dpa_perf perf; > +}; > + > /** > * struct cxl_dev_state - The driver device state > * > @@ -423,8 +447,8 @@ struct cxl_dpa_perf { > * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) > * @media_ready: Indicate whether the device media is usable > * @dpa_res: Overall DPA resource tree for the device > - * @_pmem_res: Active Persistent memory capacity configuration > - * @_ram_res: Active Volatile memory capacity configuration > + * @part: DPA partition array > + * @nr_partitions: Number of DPA partitions > * @serial: PCIe Device Serial Number > * @type: Generic Memory Class device or Vendor Specific Memory device > * @cxl_mbox: CXL mailbox context > @@ -438,21 +462,39 @@ struct cxl_dev_state { > bool rcd; > bool media_ready; > struct resource dpa_res; > - struct resource _pmem_res; > - struct resource _ram_res; > + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; > + unsigned int nr_partitions; > u64 serial; > enum cxl_devtype type; > struct cxl_mailbox cxl_mbox; > }; > > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) > +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) > { > - return &cxlds->_ram_res; > + if (cxlds->nr_partitions > 0) > + return &cxlds->part[CXL_PARTITION_RAM].res; > + return NULL; > } > > -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) > { > - return &cxlds->_pmem_res; > + if (cxlds->nr_partitions > 1) > + return &cxlds->part[CXL_PARTITION_PMEM].res; > + return NULL; > +} > + > +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) > +{ > + if (cxlds->nr_partitions > 0) > + return &cxlds->part[CXL_PARTITION_RAM].perf; > + return NULL; > +} > + > +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) > +{ > + if (cxlds->nr_partitions > 1) > + return &cxlds->part[CXL_PARTITION_PMEM].perf; > + return NULL; > } > > static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds) > @@ -499,8 +541,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) > * @active_persistent_bytes: sum of hard + soft persistent > * @next_volatile_bytes: volatile capacity change pending device reset > * @next_persistent_bytes: persistent capacity change pending device reset > - * @_ram_perf: performance data entry matched to RAM partition > - * @_pmem_perf: performance data entry matched to PMEM partition > * @event: event log driver state > * @poison: poison driver state info > * @security: security driver state info > @@ -524,29 +564,12 @@ struct cxl_memdev_state { > u64 next_volatile_bytes; > u64 next_persistent_bytes; > > - struct cxl_dpa_perf _ram_perf; > - struct cxl_dpa_perf _pmem_perf; > - > struct cxl_event_state event; > struct cxl_poison_state poison; > struct cxl_security_state security; > struct cxl_fw_state fw; > }; > > -static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) > -{ > - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); > - > - return &mds->_ram_perf; > -} > - > -static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) > -{ > - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); > - > - return &mds->_pmem_perf; > -} > - > static inline struct cxl_memdev_state * > to_cxl_memdev_state(struct cxl_dev_state *cxlds) > { > @@ -860,7 +883,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, > int cxl_dev_state_identify(struct cxl_memdev_state *mds); > int cxl_await_media_ready(struct cxl_dev_state *cxlds); > int cxl_enumerate_cmds(struct cxl_memdev_state *mds); > -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); > +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info); > struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); > void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, > unsigned long *cmds); > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 0241d1d7133a..47dbfe406236 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -900,6 +900,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd); > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > { > struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus); > + struct cxl_dpa_info range_info = { 0 }; > struct cxl_memdev_state *mds; > struct cxl_dev_state *cxlds; > struct cxl_register_map map; > @@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > if (rc) > return rc; > > - rc = cxl_mem_create_range_info(mds); > + rc = cxl_mem_dpa_fetch(mds, &range_info); > + if (rc) > + return rc; > + > + rc = cxl_dpa_setup(cxlds, &range_info); > if (rc) > return rc; > > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c > index 7f1c5061307b..ba3d48b37de3 100644 > --- a/tools/testing/cxl/test/cxl.c > +++ b/tools/testing/cxl/test/cxl.c > @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port) > struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); > struct cxl_dev_state *cxlds = cxlmd->cxlds; > struct access_coordinate ep_c[ACCESS_COORDINATE_MAX]; > - const struct resource *partition[] = { > - to_ram_res(cxlds), > - to_pmem_res(cxlds), > - }; > - struct cxl_dpa_perf *perf[] = { > - to_ram_perf(cxlds), > - to_pmem_perf(cxlds), > - }; > > if (!cxl_root) > return; > > - for (int i = 0; i < ARRAY_SIZE(partition); i++) { > - const struct resource *res = partition[i]; > + for (int i = 0; i < cxlds->nr_partitions; i++) { > + struct resource *res = &cxlds->part[i].res; > + struct cxl_dpa_perf *perf = &cxlds->part[i].perf; > struct range range = { > .start = res->start, > .end = res->end, > }; > > - dpa_perf_setup(port, &range, perf[i]); > + dpa_perf_setup(port, &range, perf); > } > > cxl_memdev_update_perf(cxlmd); > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c > index 347c1e7b37bd..ed365e083c8f 100644 > --- a/tools/testing/cxl/test/mem.c > +++ b/tools/testing/cxl/test/mem.c > @@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) > struct cxl_dev_state *cxlds; > struct cxl_mockmem_data *mdata; > struct cxl_mailbox *cxl_mbox; > + struct cxl_dpa_info range_info = { 0 }; > int rc; > > mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL); > @@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) > if (rc) > return rc; > > - rc = cxl_mem_create_range_info(mds); > + rc = cxl_mem_dpa_fetch(mds, &range_info); > + if (rc) > + return rc; > + > + rc = cxl_dpa_setup(cxlds, &range_info); > if (rc) > return rc; > > >
Alejandro Lucero Palau wrote: [..] > > +/* if this fails the caller must destroy @cxlds, there is no recovery */ > > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) > > +{ > > + struct device *dev = cxlds->dev; > > + > > + guard(rwsem_write)(&cxl_dpa_rwsem); > > + > > > This explains to me what you meant about locking when setting the > resources for Type2. > > > However, I think this is no necessary because there is no user space, or > that is my idea, involved when creating CXL regions for a Type2. It is > all up to the accel driver to do so, therefore no locking needed because > none is going to traverse the child resource list while > initialising/updating it. Yes, no locking is needed, and that was the status quo for cxl_pci since it was simple to audit the single user. Going forward, with multiple users, and the fact that cxl_dev_state is not strictly private to the cxl_core, some safety is reasonable. > It does not harm to have it for current Type2 case, and always a good > idea to have it for potential future cases. My main motivation is to help protect against someone thinking that calling cxl_dpa_setup() twice is a workable model.
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index b177a488e29b..5400a421ad30 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, struct device *dev = cxlds->dev; struct dsmas_entry *dent; unsigned long index; - const struct resource *partition[] = { - to_ram_res(cxlds), - to_pmem_res(cxlds), - }; - struct cxl_dpa_perf *perf[] = { - to_ram_perf(cxlds), - to_pmem_perf(cxlds), - }; xa_for_each(dsmas_xa, index, dent) { - for (int i = 0; i < ARRAY_SIZE(partition); i++) { - const struct resource *res = partition[i]; + for (int i = 0; i < cxlds->nr_partitions; i++) { + struct resource *res = &cxlds->part[i].res; struct range range = { .start = res->start, .end = res->end, }; if (range_contains(&range, &dent->dpa_range)) - update_perf_entry(dev, dent, perf[i]); + update_perf_entry(dev, dent, + &cxlds->part[i].perf); else dev_dbg(dev, "no partition for dsmas dpa: %pra\n", diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 7a85522294ad..7e1559b3ed88 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -342,6 +342,75 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, return 0; } +static int add_dpa_res(struct device *dev, struct resource *parent, + struct resource *res, resource_size_t start, + resource_size_t size, const char *type) +{ + int rc; + + *res = (struct resource) { + .name = type, + .start = start, + .end = start + size - 1, + .flags = IORESOURCE_MEM, + }; + if (resource_size(res) == 0) { + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); + return 0; + } + rc = request_resource(parent, res); + if (rc) { + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, + res, rc); + return rc; + } + + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); + + return 0; +} + +/* if this fails the caller must destroy @cxlds, there is no recovery */ +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) +{ + struct device *dev = cxlds->dev; + + guard(rwsem_write)(&cxl_dpa_rwsem); + + if (cxlds->nr_partitions) + return -EBUSY; + + if (!info->size || !info->nr_partitions) { + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); + cxlds->nr_partitions = 0; + return 0; + } + + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); + + for (int i = 0; i < info->nr_partitions; i++) { + const char *desc; + int rc; + + if (i == CXL_PARTITION_RAM) + desc = "ram"; + else if (i == CXL_PARTITION_PMEM) + desc = "pmem"; + else + desc = ""; + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, + info->range[i].start, + range_len(&info->range[i]), desc); + if (rc) + return rc; + cxlds->nr_partitions++; + } + + return 0; +} +EXPORT_SYMBOL_GPL(cxl_dpa_setup); + int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 3502f1633ad2..7dca5c8c3494 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) return rc; } -static int add_dpa_res(struct device *dev, struct resource *parent, - struct resource *res, resource_size_t start, - resource_size_t size, const char *type) -{ - int rc; - - res->name = type; - res->start = start; - res->end = start + size - 1; - res->flags = IORESOURCE_MEM; - if (resource_size(res) == 0) { - dev_dbg(dev, "DPA(%s): no capacity\n", res->name); - return 0; - } - rc = request_resource(parent, res); - if (rc) { - dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, - res, rc); - return rc; - } - - dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); - - return 0; -} - -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) { struct cxl_dev_state *cxlds = &mds->cxlds; - struct resource *ram_res = to_ram_res(cxlds); - struct resource *pmem_res = to_pmem_res(cxlds); struct device *dev = cxlds->dev; int rc; if (!cxlds->media_ready) { - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); - *ram_res = DEFINE_RES_MEM(0, 0); - *pmem_res = DEFINE_RES_MEM(0, 0); + info->size = 0; return 0; } - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); + info->size = mds->total_bytes; if (mds->partition_align_bytes == 0) { - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, - mds->volatile_only_bytes, "ram"); - if (rc) - return rc; - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, - mds->volatile_only_bytes, - mds->persistent_only_bytes, "pmem"); + info->range[CXL_PARTITION_RAM] = (struct range) { + .start = 0, + .end = mds->volatile_only_bytes - 1, + }; + info->nr_partitions++; + + if (!mds->persistent_only_bytes) + return 0; + + info->range[CXL_PARTITION_PMEM] = (struct range) { + .start = mds->volatile_only_bytes, + .end = mds->volatile_only_bytes + + mds->persistent_only_bytes - 1, + }; + info->nr_partitions++; + return 0; } rc = cxl_mem_get_partition_info(mds); @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) return rc; } - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, - mds->active_volatile_bytes, "ram"); - if (rc) - return rc; - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, - mds->active_volatile_bytes, - mds->active_persistent_bytes, "pmem"); + info->range[CXL_PARTITION_RAM] = (struct range) { + .start = 0, + .end = mds->active_volatile_bytes - 1, + }; + info->nr_partitions++; + + if (!mds->active_persistent_bytes) + return 0; + + info->range[CXL_PARTITION_PMEM] = (struct range) { + .start = mds->active_volatile_bytes, + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, + }; + info->nr_partitions++; + + return 0; } -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); int cxl_set_timestamp(struct cxl_memdev_state *mds) { @@ -1452,8 +1440,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mds->cxlds.reg_map.host = dev; mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE; mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; - to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; - to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; return mds; } diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 78e92e24d7b5..2e728d4b7327 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped); +/* Well known, spec defined partition indices */ +enum cxl_partition { + CXL_PARTITION_RAM, + CXL_PARTITION_PMEM, + CXL_PARTITION_MAX, +}; + +struct cxl_dpa_info { + u64 size; + struct range range[CXL_PARTITION_MAX]; + int nr_partitions; +}; +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); + static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, struct cxl_memdev *cxlmd) { @@ -408,6 +422,16 @@ struct cxl_dpa_perf { int qos_class; }; +/** + * struct cxl_dpa_partition - DPA partition descriptor + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res) + * @perf: performance attributes of the partition from CDAT + */ +struct cxl_dpa_partition { + struct resource res; + struct cxl_dpa_perf perf; +}; + /** * struct cxl_dev_state - The driver device state * @@ -423,8 +447,8 @@ struct cxl_dpa_perf { * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) * @media_ready: Indicate whether the device media is usable * @dpa_res: Overall DPA resource tree for the device - * @_pmem_res: Active Persistent memory capacity configuration - * @_ram_res: Active Volatile memory capacity configuration + * @part: DPA partition array + * @nr_partitions: Number of DPA partitions * @serial: PCIe Device Serial Number * @type: Generic Memory Class device or Vendor Specific Memory device * @cxl_mbox: CXL mailbox context @@ -438,21 +462,39 @@ struct cxl_dev_state { bool rcd; bool media_ready; struct resource dpa_res; - struct resource _pmem_res; - struct resource _ram_res; + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; + unsigned int nr_partitions; u64 serial; enum cxl_devtype type; struct cxl_mailbox cxl_mbox; }; -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) { - return &cxlds->_ram_res; + if (cxlds->nr_partitions > 0) + return &cxlds->part[CXL_PARTITION_RAM].res; + return NULL; } -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) { - return &cxlds->_pmem_res; + if (cxlds->nr_partitions > 1) + return &cxlds->part[CXL_PARTITION_PMEM].res; + return NULL; +} + +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) +{ + if (cxlds->nr_partitions > 0) + return &cxlds->part[CXL_PARTITION_RAM].perf; + return NULL; +} + +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) +{ + if (cxlds->nr_partitions > 1) + return &cxlds->part[CXL_PARTITION_PMEM].perf; + return NULL; } static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds) @@ -499,8 +541,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) * @active_persistent_bytes: sum of hard + soft persistent * @next_volatile_bytes: volatile capacity change pending device reset * @next_persistent_bytes: persistent capacity change pending device reset - * @_ram_perf: performance data entry matched to RAM partition - * @_pmem_perf: performance data entry matched to PMEM partition * @event: event log driver state * @poison: poison driver state info * @security: security driver state info @@ -524,29 +564,12 @@ struct cxl_memdev_state { u64 next_volatile_bytes; u64 next_persistent_bytes; - struct cxl_dpa_perf _ram_perf; - struct cxl_dpa_perf _pmem_perf; - struct cxl_event_state event; struct cxl_poison_state poison; struct cxl_security_state security; struct cxl_fw_state fw; }; -static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) -{ - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); - - return &mds->_ram_perf; -} - -static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) -{ - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); - - return &mds->_pmem_perf; -} - static inline struct cxl_memdev_state * to_cxl_memdev_state(struct cxl_dev_state *cxlds) { @@ -860,7 +883,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, int cxl_dev_state_identify(struct cxl_memdev_state *mds); int cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info); struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, unsigned long *cmds); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 0241d1d7133a..47dbfe406236 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -900,6 +900,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd); static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus); + struct cxl_dpa_info range_info = { 0 }; struct cxl_memdev_state *mds; struct cxl_dev_state *cxlds; struct cxl_register_map map; @@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - rc = cxl_mem_create_range_info(mds); + rc = cxl_mem_dpa_fetch(mds, &range_info); + if (rc) + return rc; + + rc = cxl_dpa_setup(cxlds, &range_info); if (rc) return rc; diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 7f1c5061307b..ba3d48b37de3 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port) struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct access_coordinate ep_c[ACCESS_COORDINATE_MAX]; - const struct resource *partition[] = { - to_ram_res(cxlds), - to_pmem_res(cxlds), - }; - struct cxl_dpa_perf *perf[] = { - to_ram_perf(cxlds), - to_pmem_perf(cxlds), - }; if (!cxl_root) return; - for (int i = 0; i < ARRAY_SIZE(partition); i++) { - const struct resource *res = partition[i]; + for (int i = 0; i < cxlds->nr_partitions; i++) { + struct resource *res = &cxlds->part[i].res; + struct cxl_dpa_perf *perf = &cxlds->part[i].perf; struct range range = { .start = res->start, .end = res->end, }; - dpa_perf_setup(port, &range, perf[i]); + dpa_perf_setup(port, &range, perf); } cxl_memdev_update_perf(cxlmd); diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 347c1e7b37bd..ed365e083c8f 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) struct cxl_dev_state *cxlds; struct cxl_mockmem_data *mdata; struct cxl_mailbox *cxl_mbox; + struct cxl_dpa_info range_info = { 0 }; int rc; mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL); @@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) if (rc) return rc; - rc = cxl_mem_create_range_info(mds); + rc = cxl_mem_dpa_fetch(mds, &range_info); + if (rc) + return rc; + + rc = cxl_dpa_setup(cxlds, &range_info); if (rc) return rc;
The pending efforts to add CXL Accelerator (type-2) device [1], and Dynamic Capacity (DCD) support [2], tripped on the no-longer-fit-for-purpose design in the CXL subsystem for tracking device-physical-address (DPA) metadata. Trip hazards include: - CXL Memory Devices need to consider a PMEM partition, but Accelerator devices with CXL.mem likely do not in the common case. - CXL Memory Devices enumerate DPA through Memory Device mailbox commands like Partition Info, Accelerators devices do not. - CXL Memory Devices that support DCD support more than 2 partitions. Some of the driver algorithms are awkward to expand to > 2 partition cases. - DPA performance data is a general capability that can be shared with accelerators, so tracking it in 'struct cxl_memdev_state' is no longer suitable. - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a memory property, it should be phased in favor of a partition id and the memory property comes from the partition info. Towards cleaning up those issues and allowing a smoother landing for the aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared way for Memory Devices and Accelerators to initialize the DPA information in 'struct cxl_dev_state'. For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to get the new data structure initialized, and cleanup some qos_class init. Follow on patches will go further to use the new data structure to cleanup algorithms that are better suited to loop over all possible partitions. cxl_dpa_setup() follows the locking expectations of mutating the device DPA map, and is suitable for Accelerator drivers to use. Accelerators likely only have one hardcoded 'ram' partition to convey to the cxl_core. Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alejandro Lucero <alucerop@amd.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/core/cdat.c | 15 ++----- drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++ drivers/cxl/core/mbox.c | 86 ++++++++++++++++++------------------------ drivers/cxl/cxlmem.h | 79 +++++++++++++++++++++++++-------------- drivers/cxl/pci.c | 7 +++ tools/testing/cxl/test/cxl.c | 15 ++----- tools/testing/cxl/test/mem.c | 7 +++ 7 files changed, 176 insertions(+), 102 deletions(-)