Message ID | 170268217159.1381493.10875292326564731198.stgit@djiang5-mobl3 |
---|---|
State | Superseded |
Headers | show |
Series | cxl: Add support to report region access coordinates to numa nodes | expand |
On Fri, 15 Dec 2023 16:16:11 -0700 Dave Jiang <dave.jiang@intel.com> wrote: > When the CXL region is formed, the driver would computed the performance > data for the region. However this data is not available at the node data > collection that has been populated by the HMAT during kernel > initialization. Add a memory hotplug notifier to update the performance > data to the node hmem_attrs to expose the newly calculated region > performance data. The CXL region is created under specific CFMWS. The > node for the CFMWS is created during SRAT parsing by acpi_parse_cfmws(). > The notifier will run once only and turn itself off after the initial > run. Additional regions may overwrite the initial data, but since this is > for the same poximity domain it's a don't care for now. proximity > > node_set_perf_attrs() is exported to allow update of perf attribs for a > node. Given that only CXL is using this, export only to CXL namespace. > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Rafael J. Wysocki <rafael@kernel.org> > Reviewed-by: "Huang, Ying" <ying.huang@intel.com> > Signed-off-by: Dave Jiang <dave.jiang@intel.com> What is end result of this? /sys/devices/system/node/node/access0/ /sys/devices/system/node/node/access1/ With just the bandwidths and latencies? No targets or initiators under accessX/targets or accessX/initiators? Or have those been set up earlier? In which case do we handle the worse bandwidth being inside the host CPU? > --- > v2: > - Fix notifier return values (Dan) > - Use devm_add_action_or_reset() instead of adding a remove callback (Dan) > - Add Ying review tag > --- > drivers/base/node.c | 1 + > drivers/cxl/core/region.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > drivers/cxl/cxl.h | 3 +++ > 3 files changed, 46 insertions(+) > > diff --git a/drivers/base/node.c b/drivers/base/node.c > index cb2b6cc7f6e6..f5b5a3f11894 100644 > --- a/drivers/base/node.c > +++ b/drivers/base/node.c > @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, > } > } > } > +EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL); This feels ugly as namespaces usually about what is providing the facility not a 'who can use it' control. Also, I'm aware of at least one other user who will want this in the not too distant future. So if we want to namespace it, I'd prefer a NODE namespace or something along those lines. > > /** > * struct node_cache_info - Internal tracking for memory node caches > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index d97fa5f32e86..1765bf716484 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -4,6 +4,7 @@ > #include <linux/genalloc.h> > #include <linux/device.h> > #include <linux/module.h> > +#include <linux/memory.h> > #include <linux/slab.h> > #include <linux/uuid.h> > #include <linux/sort.h> > @@ -2960,6 +2961,42 @@ static int is_system_ram(struct resource *res, void *arg) > return 1; > } > > +static int cxl_region_perf_attrs_callback(struct notifier_block *nb, > + unsigned long action, void *arg) > +{ > + struct cxl_region *cxlr = container_of(nb, struct cxl_region, > + memory_notifier); > + struct cxl_region_params *p = &cxlr->params; > + struct cxl_endpoint_decoder *cxled = p->targets[0]; > + struct cxl_decoder *cxld = &cxled->cxld; > + struct memory_notify *mnb = arg; > + int nid = mnb->status_change_nid; > + int region_nid; > + > + if (nid == NUMA_NO_NODE || action != MEM_ONLINE) > + return NOTIFY_DONE; > + > + region_nid = phys_to_target_node(cxld->hpa_range.start); > + if (nid != region_nid) > + return NOTIFY_DONE; > + > + /* Don't set if there's no coordinate information */ > + if (!cxlr->coord.write_bandwidth) > + return NOTIFY_DONE; Could future proof a bit to allow for RO memory by using read_bandwith here. > + > + node_set_perf_attrs(nid, &cxlr->coord, 0); > + node_set_perf_attrs(nid, &cxlr->coord, 1); Hmm. Assumption that the access attributes from no CPU requesters is the same as the CPU bothers me a little. > + > + return NOTIFY_OK; > +} > + > +static void remove_coord_notifier(void *data) > +{ > + struct cxl_region *cxlr = data; > + > + unregister_memory_notifier(&cxlr->memory_notifier); > +} > + > static int cxl_region_probe(struct device *dev) > { > struct cxl_region *cxlr = to_cxl_region(dev); > @@ -2985,6 +3022,11 @@ static int cxl_region_probe(struct device *dev) > goto out; > } > > + cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback; > + cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI; > + register_memory_notifier(&cxlr->memory_notifier); > + rc = devm_add_action_or_reset(&cxlr->dev, remove_coord_notifier, cxlr); > + > /* > * From this point on any path that changes the region's state away from > * CXL_CONFIG_COMMIT is also responsible for releasing the driver. >
On 12/19/23 08:15, Jonathan Cameron wrote: > On Fri, 15 Dec 2023 16:16:11 -0700 > Dave Jiang <dave.jiang@intel.com> wrote: > >> When the CXL region is formed, the driver would computed the performance >> data for the region. However this data is not available at the node data >> collection that has been populated by the HMAT during kernel >> initialization. Add a memory hotplug notifier to update the performance >> data to the node hmem_attrs to expose the newly calculated region >> performance data. The CXL region is created under specific CFMWS. The >> node for the CFMWS is created during SRAT parsing by acpi_parse_cfmws(). >> The notifier will run once only and turn itself off after the initial >> run. Additional regions may overwrite the initial data, but since this is >> for the same poximity domain it's a don't care for now. > > proximity > >> >> node_set_perf_attrs() is exported to allow update of perf attribs for a >> node. Given that only CXL is using this, export only to CXL namespace. >> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >> Cc: Rafael J. Wysocki <rafael@kernel.org> >> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > What is end result of this? > > /sys/devices/system/node/node/access0/ > /sys/devices/system/node/node/access1/ > With just the bandwidths and latencies? > No targets or initiators under accessX/targets or accessX/initiators? # tree ./devices/system/node/node2/access0 ./devices/system/node/node2/access0 ├── initiators │ ├── node1 -> ../../../node1 │ ├── read_bandwidth │ ├── read_latency │ ├── write_bandwidth │ └── write_latency ├── power │ ├── async │ ├── runtime_active_kids │ ├── runtime_enabled │ ├── runtime_status │ └── runtime_usage ├── targets └── uevent > > Or have those been set up earlier? In which case do we handle > the worse bandwidth being inside the host CPU? I think it gets setup via the memory online callback notifier the region driver registered. > >> --- >> v2: >> - Fix notifier return values (Dan) >> - Use devm_add_action_or_reset() instead of adding a remove callback (Dan) >> - Add Ying review tag >> --- >> drivers/base/node.c | 1 + >> drivers/cxl/core/region.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> drivers/cxl/cxl.h | 3 +++ >> 3 files changed, 46 insertions(+) >> >> diff --git a/drivers/base/node.c b/drivers/base/node.c >> index cb2b6cc7f6e6..f5b5a3f11894 100644 >> --- a/drivers/base/node.c >> +++ b/drivers/base/node.c >> @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, >> } >> } >> } >> +EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL); > This feels ugly as namespaces usually about what is providing the facility not > a 'who can use it' control. > > Also, I'm aware of at least one other user who will want this in the not > too distant future. So if we want to namespace it, I'd prefer a NODE namespace > or something along those lines. I'll just make it normal export if we are anticipating another user. > >> >> /** >> * struct node_cache_info - Internal tracking for memory node caches >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c >> index d97fa5f32e86..1765bf716484 100644 >> --- a/drivers/cxl/core/region.c >> +++ b/drivers/cxl/core/region.c >> @@ -4,6 +4,7 @@ >> #include <linux/genalloc.h> >> #include <linux/device.h> >> #include <linux/module.h> >> +#include <linux/memory.h> >> #include <linux/slab.h> >> #include <linux/uuid.h> >> #include <linux/sort.h> >> @@ -2960,6 +2961,42 @@ static int is_system_ram(struct resource *res, void *arg) >> return 1; >> } >> >> +static int cxl_region_perf_attrs_callback(struct notifier_block *nb, >> + unsigned long action, void *arg) >> +{ >> + struct cxl_region *cxlr = container_of(nb, struct cxl_region, >> + memory_notifier); >> + struct cxl_region_params *p = &cxlr->params; >> + struct cxl_endpoint_decoder *cxled = p->targets[0]; >> + struct cxl_decoder *cxld = &cxled->cxld; >> + struct memory_notify *mnb = arg; >> + int nid = mnb->status_change_nid; >> + int region_nid; >> + >> + if (nid == NUMA_NO_NODE || action != MEM_ONLINE) >> + return NOTIFY_DONE; >> + >> + region_nid = phys_to_target_node(cxld->hpa_range.start); >> + if (nid != region_nid) >> + return NOTIFY_DONE; >> + >> + /* Don't set if there's no coordinate information */ >> + if (!cxlr->coord.write_bandwidth) >> + return NOTIFY_DONE; > > Could future proof a bit to allow for RO memory by using read_bandwith here. Yes. I didn't realize there will be RO memory. I just assumed that bandwidth would always be > 0 for a valid set of data. > >> + >> + node_set_perf_attrs(nid, &cxlr->coord, 0); >> + node_set_perf_attrs(nid, &cxlr->coord, 1); > > Hmm. Assumption that the access attributes from no CPU requesters is the same > as the CPU bothers me a little. I wasn't too sure about updating this. Should I only update access 0? > >> + >> + return NOTIFY_OK; >> +} >> + >> +static void remove_coord_notifier(void *data) >> +{ >> + struct cxl_region *cxlr = data; >> + >> + unregister_memory_notifier(&cxlr->memory_notifier); >> +} >> + >> static int cxl_region_probe(struct device *dev) >> { >> struct cxl_region *cxlr = to_cxl_region(dev); >> @@ -2985,6 +3022,11 @@ static int cxl_region_probe(struct device *dev) >> goto out; >> } >> >> + cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback; >> + cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI; >> + register_memory_notifier(&cxlr->memory_notifier); >> + rc = devm_add_action_or_reset(&cxlr->dev, remove_coord_notifier, cxlr); >> + >> /* >> * From this point on any path that changes the region's state away from >> * CXL_CONFIG_COMMIT is also responsible for releasing the driver. > >> >
On Fri, 22 Dec 2023 11:17:15 -0700 Dave Jiang <dave.jiang@intel.com> wrote: > On 12/19/23 08:15, Jonathan Cameron wrote: > > On Fri, 15 Dec 2023 16:16:11 -0700 > > Dave Jiang <dave.jiang@intel.com> wrote: > > > >> When the CXL region is formed, the driver would computed the performance > >> data for the region. However this data is not available at the node data > >> collection that has been populated by the HMAT during kernel > >> initialization. Add a memory hotplug notifier to update the performance > >> data to the node hmem_attrs to expose the newly calculated region > >> performance data. The CXL region is created under specific CFMWS. The > >> node for the CFMWS is created during SRAT parsing by acpi_parse_cfmws(). > >> The notifier will run once only and turn itself off after the initial > >> run. Additional regions may overwrite the initial data, but since this is > >> for the same poximity domain it's a don't care for now. > > > > proximity > > > >> > >> node_set_perf_attrs() is exported to allow update of perf attribs for a > >> node. Given that only CXL is using this, export only to CXL namespace. > >> > >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > >> Cc: Rafael J. Wysocki <rafael@kernel.org> > >> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> > >> Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > > > What is end result of this? > > > > /sys/devices/system/node/node/access0/ > > /sys/devices/system/node/node/access1/ > > With just the bandwidths and latencies? > > No targets or initiators under accessX/targets or accessX/initiators? > > # tree ./devices/system/node/node2/access0 > ./devices/system/node/node2/access0 > ├── initiators > │ ├── node1 -> ../../../node1 > │ ├── read_bandwidth > │ ├── read_latency > │ ├── write_bandwidth > │ └── write_latency > ├── power > │ ├── async > │ ├── runtime_active_kids > │ ├── runtime_enabled > │ ├── runtime_status > │ └── runtime_usage > ├── targets > └── uevent > > > > > > Or have those been set up earlier? In which case do we handle > > the worse bandwidth being inside the host CPU? > > I think it gets setup via the memory online callback notifier the region driver registered. Hmm. For access0 this is a problem in the long term as the nearest initiator might be below the port. Still lots of stuff to do before that works anyway. access1 is fine in general as CPUs won't be below the port unless CXL gains a lot of new functionality in CXL rev X :) > > > > >> --- > >> v2: > >> - Fix notifier return values (Dan) > >> - Use devm_add_action_or_reset() instead of adding a remove callback (Dan) > >> - Add Ying review tag > >> --- > >> drivers/base/node.c | 1 + > >> drivers/cxl/core/region.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > >> drivers/cxl/cxl.h | 3 +++ > >> 3 files changed, 46 insertions(+) > >> > >> diff --git a/drivers/base/node.c b/drivers/base/node.c > >> index cb2b6cc7f6e6..f5b5a3f11894 100644 > >> --- a/drivers/base/node.c > >> +++ b/drivers/base/node.c > >> @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, > >> } > >> } > >> } > >> +EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL); > > This feels ugly as namespaces usually about what is providing the facility not > > a 'who can use it' control. > > > > Also, I'm aware of at least one other user who will want this in the not > > too distant future. So if we want to namespace it, I'd prefer a NODE namespace > > or something along those lines. > > I'll just make it normal export if we are anticipating another user. > > > > >> > >> /** > >> * struct node_cache_info - Internal tracking for memory node caches > >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > >> index d97fa5f32e86..1765bf716484 100644 > >> --- a/drivers/cxl/core/region.c > >> +++ b/drivers/cxl/core/region.c > >> @@ -4,6 +4,7 @@ > >> #include <linux/genalloc.h> > >> #include <linux/device.h> > >> #include <linux/module.h> > >> +#include <linux/memory.h> > >> #include <linux/slab.h> > >> #include <linux/uuid.h> > >> #include <linux/sort.h> > >> @@ -2960,6 +2961,42 @@ static int is_system_ram(struct resource *res, void *arg) > >> return 1; > >> } > >> > >> +static int cxl_region_perf_attrs_callback(struct notifier_block *nb, > >> + unsigned long action, void *arg) > >> +{ > >> + struct cxl_region *cxlr = container_of(nb, struct cxl_region, > >> + memory_notifier); > >> + struct cxl_region_params *p = &cxlr->params; > >> + struct cxl_endpoint_decoder *cxled = p->targets[0]; > >> + struct cxl_decoder *cxld = &cxled->cxld; > >> + struct memory_notify *mnb = arg; > >> + int nid = mnb->status_change_nid; > >> + int region_nid; > >> + > >> + if (nid == NUMA_NO_NODE || action != MEM_ONLINE) > >> + return NOTIFY_DONE; > >> + > >> + region_nid = phys_to_target_node(cxld->hpa_range.start); > >> + if (nid != region_nid) > >> + return NOTIFY_DONE; > >> + > >> + /* Don't set if there's no coordinate information */ > >> + if (!cxlr->coord.write_bandwidth) > >> + return NOTIFY_DONE; > > > > Could future proof a bit to allow for RO memory by using read_bandwith here. > > Yes. I didn't realize there will be RO memory. I just assumed that bandwidth would always be > 0 for a valid set of data. > > > > >> + > >> + node_set_perf_attrs(nid, &cxlr->coord, 0); > >> + node_set_perf_attrs(nid, &cxlr->coord, 1); > > > > Hmm. Assumption that the access attributes from no CPU requesters is the same > > as the CPU bothers me a little. > > I wasn't too sure about updating this. Should I only update access 0? They shouldn't be updated to the same thing. It should depend on which node is in their initiators directory. It can be different between access0 and access1 > > > >> + > >> + return NOTIFY_OK; > >> +} > >> + > >> +static void remove_coord_notifier(void *data) > >> +{ > >> + struct cxl_region *cxlr = data; > >> + > >> + unregister_memory_notifier(&cxlr->memory_notifier); > >> +} > >> + > >> static int cxl_region_probe(struct device *dev) > >> { > >> struct cxl_region *cxlr = to_cxl_region(dev); > >> @@ -2985,6 +3022,11 @@ static int cxl_region_probe(struct device *dev) > >> goto out; > >> } > >> > >> + cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback; > >> + cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI; > >> + register_memory_notifier(&cxlr->memory_notifier); > >> + rc = devm_add_action_or_reset(&cxlr->dev, remove_coord_notifier, cxlr); > >> + > >> /* > >> * From this point on any path that changes the region's state away from > >> * CXL_CONFIG_COMMIT is also responsible for releasing the driver. > > > >> > >
diff --git a/drivers/base/node.c b/drivers/base/node.c index cb2b6cc7f6e6..f5b5a3f11894 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, } } } +EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL); /** * struct node_cache_info - Internal tracking for memory node caches diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index d97fa5f32e86..1765bf716484 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -4,6 +4,7 @@ #include <linux/genalloc.h> #include <linux/device.h> #include <linux/module.h> +#include <linux/memory.h> #include <linux/slab.h> #include <linux/uuid.h> #include <linux/sort.h> @@ -2960,6 +2961,42 @@ static int is_system_ram(struct resource *res, void *arg) return 1; } +static int cxl_region_perf_attrs_callback(struct notifier_block *nb, + unsigned long action, void *arg) +{ + struct cxl_region *cxlr = container_of(nb, struct cxl_region, + memory_notifier); + struct cxl_region_params *p = &cxlr->params; + struct cxl_endpoint_decoder *cxled = p->targets[0]; + struct cxl_decoder *cxld = &cxled->cxld; + struct memory_notify *mnb = arg; + int nid = mnb->status_change_nid; + int region_nid; + + if (nid == NUMA_NO_NODE || action != MEM_ONLINE) + return NOTIFY_DONE; + + region_nid = phys_to_target_node(cxld->hpa_range.start); + if (nid != region_nid) + return NOTIFY_DONE; + + /* Don't set if there's no coordinate information */ + if (!cxlr->coord.write_bandwidth) + return NOTIFY_DONE; + + node_set_perf_attrs(nid, &cxlr->coord, 0); + node_set_perf_attrs(nid, &cxlr->coord, 1); + + return NOTIFY_OK; +} + +static void remove_coord_notifier(void *data) +{ + struct cxl_region *cxlr = data; + + unregister_memory_notifier(&cxlr->memory_notifier); +} + static int cxl_region_probe(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); @@ -2985,6 +3022,11 @@ static int cxl_region_probe(struct device *dev) goto out; } + cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback; + cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI; + register_memory_notifier(&cxlr->memory_notifier); + rc = devm_add_action_or_reset(&cxlr->dev, remove_coord_notifier, cxlr); + /* * From this point on any path that changes the region's state away from * CXL_CONFIG_COMMIT is also responsible for releasing the driver. diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 4639d0d6ef54..2498086c8edc 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -6,6 +6,7 @@ #include <linux/libnvdimm.h> #include <linux/bitfield.h> +#include <linux/notifier.h> #include <linux/bitops.h> #include <linux/log2.h> #include <linux/node.h> @@ -520,6 +521,7 @@ struct cxl_region_params { * @flags: Region state flags * @params: active + config params for the region * @coord: QoS access coordinates for the region + * @memory_notifier: notifier for setting the access coordinates to node */ struct cxl_region { struct device dev; @@ -531,6 +533,7 @@ struct cxl_region { unsigned long flags; struct cxl_region_params params; struct access_coordinate coord; + struct notifier_block memory_notifier; }; struct cxl_nvdimm_bridge {