Message ID | 20220603134237.131362-3-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Series | mm/demotion: Memory tiers and demotion |
On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
>
>
> +static struct memory_tier *__node_get_memory_tier(int node)
> +{
> +	struct memory_tier *memtier;
> +
> +	list_for_each_entry(memtier, &memory_tiers, list) {

We may need to map node to mem_tier quite often, if we need
to account memory usage at the tier level.  It will be more efficient
to have a pointer from node (pgdat) to memtier rather than doing a
search through the list.

> +		if (node_isset(node, memtier->nodelist))
> +			return memtier;
> +	}
> +	return NULL;
> +}
> +
>

Tim
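For readers following the thread, here is a minimal sketch of the two lookup strategies being compared. The trimmed-down struct memory_tier and the node_memtier[] cache are illustrative stand-ins only (the thread suggests a pointer in pgdat, which amounts to the same thing); none of this code is from the posted patch.

#include <linux/list.h>
#include <linux/nodemask.h>

/* Trimmed stand-in for the struct defined in mm/memory-tiers.c. */
struct memory_tier {
	struct list_head list;
	nodemask_t nodelist;
};

static LIST_HEAD(memory_tiers);
/* Hypothetical per-node cache; the thread suggests a pointer in pgdat instead. */
static struct memory_tier *node_memtier[MAX_NUMNODES];

/* Approach in the patch: O(number of tiers) walk of the global list. */
static struct memory_tier *node_get_memory_tier_search(int node)
{
	struct memory_tier *memtier;

	list_for_each_entry(memtier, &memory_tiers, list) {
		if (node_isset(node, memtier->nodelist))
			return memtier;
	}
	return NULL;
}

/* Suggested alternative: O(1) lookup via a per-node back-pointer that is
 * updated whenever a node is added to or removed from a tier. */
static struct memory_tier *node_get_memory_tier_cached(int node)
{
	return node_memtier[node];
}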
On 6/8/22 1:45 AM, Tim Chen wrote:
> On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
>>
>>
>> +static struct memory_tier *__node_get_memory_tier(int node)
>> +{
>> +	struct memory_tier *memtier;
>> +
>> +	list_for_each_entry(memtier, &memory_tiers, list) {
>
> We may need to map node to mem_tier quite often, if we need
> to account memory usage at the tier level.  It will be more efficient
> to have a pointer from node (pgdat) to memtier rather than doing a
> search through the list.
>

That is something I was actively trying to avoid. Currently all struct
memory_tier references are with the memory_tier_lock mutex held. That
simplifies the locking and reference counting.

As of now we are able to implement all the required interfaces without
pgdat having pointers to struct memory_tier. We can update pgdat with
memtier details when we are implementing changes requiring those. We
could keep an additional memtier->dev reference to make sure memory tiers
are not destroyed while other parts of the kernel are referencing the
same. But IMHO such changes should wait till we have users for the same.

>> +		if (node_isset(node, memtier->nodelist))
>> +			return memtier;
>> +	}
>> +	return NULL;
>> +}
>> +
>>
>
> Tim
>

-aneesh
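For context, a small sketch of the "hold a reference on memtier->dev" idea mentioned above, assuming a struct memory_tier that embeds a struct device named dev as in the series; the memtier_get()/memtier_put() helpers are hypothetical, not proposed code.

#include <linux/device.h>
#include <linux/list.h>
#include <linux/nodemask.h>

/* Trimmed stand-in matching the fields the series uses (list, dev, nodelist). */
struct memory_tier {
	struct list_head list;
	struct device dev;
	nodemask_t nodelist;
};

/* Pin the tier via its embedded device so it cannot be freed while in use. */
static struct memory_tier *memtier_get(struct memory_tier *memtier)
{
	if (memtier)
		get_device(&memtier->dev);
	return memtier;
}

/* Drop the reference taken by memtier_get(). */
static void memtier_put(struct memory_tier *memtier)
{
	if (memtier)
		put_device(&memtier->dev);
}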
On Wed, 2022-06-08 at 10:25 +0530, Aneesh Kumar K V wrote:
> On 6/8/22 1:45 AM, Tim Chen wrote:
> > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> > >
> > >
> > > +static struct memory_tier *__node_get_memory_tier(int node)
> > > +{
> > > +	struct memory_tier *memtier;
> > > +
> > > +	list_for_each_entry(memtier, &memory_tiers, list) {
> >
> > We may need to map node to mem_tier quite often, if we need
> > to account memory usage at the tier level.  It will be more efficient
> > to have a pointer from node (pgdat) to memtier rather than doing a
> > search through the list.
> >
>
> That is something I was actively trying to avoid. Currently all struct
> memory_tier references are with the memory_tier_lock mutex held. That
> simplifies the locking and reference counting.
>
> As of now we are able to implement all the required interfaces without
> pgdat having pointers to struct memory_tier. We can update pgdat with
> memtier details when we are implementing changes requiring those. We
> could keep an additional memtier->dev reference to make sure memory tiers
> are not destroyed while other parts of the kernel are referencing the
> same. But IMHO such changes should wait till we have users for the same.

No.  We need a convenient way to access memory tier information from
inside the kernel.  For example, mapping from nid to memory tier rank is
needed by migrate_misplaced_page() to do statistics, and we also need to
iterate all nodes of a memory tier, etc.

And, the "allowed" field of struct demotion_nodes (introduced in [7/9])
is per-memory tier instead of per-node.  Please move it to struct
memory_tier.  And we just need a convenient way to access it.

All these are not complex, unless you insist on using memory_tier_lock
and the device lifecycle to manage this in-kernel data structure.

Best Regards,
Huang, Ying

> > > +		if (node_isset(node, memtier->nodelist))
> > > +			return memtier;
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> >
> > Tim
> >
> -aneesh
>
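To make the request concrete, here is a sketch of the kind of per-tier, conveniently accessible data being described: a tier rank, the tier's nodes, and a per-tier "allowed" demotion nodemask instead of the per-node demotion_nodes.allowed. All names here (memory_tier_info, node_tier_idx, for_each_node_in_tier) are hypothetical and not from the series.

#include <linux/nodemask.h>

#define NR_TIERS	3	/* mirrors MAX_MEMORY_TIERS in the series */

struct memory_tier_info {
	int		rank;		/* ordering used by demotion decisions */
	nodemask_t	nodes;		/* all nodes in this tier */
	nodemask_t	allowed;	/* per-tier demotion targets */
};

static struct memory_tier_info tiers[NR_TIERS];
static int node_tier_idx[MAX_NUMNODES];	/* nid -> tier index, O(1) */

/* e.g. what statistics code such as migrate_misplaced_page() would need */
static inline int node_memtier_rank(int nid)
{
	return tiers[node_tier_idx[nid]].rank;
}

/* iterate all nodes that belong to a given tier */
#define for_each_node_in_tier(nid, t) \
	for_each_node_mask(nid, tiers[(t)].nodes)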
On Wed, 2022-06-08 at 10:25 +0530, Aneesh Kumar K V wrote:
> On 6/8/22 1:45 AM, Tim Chen wrote:
> > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> > >
> > > +static struct memory_tier *__node_get_memory_tier(int node)
> > > +{
> > > +	struct memory_tier *memtier;
> > > +
> > > +	list_for_each_entry(memtier, &memory_tiers, list) {
> >
> > We may need to map node to mem_tier quite often, if we need
> > to account memory usage at the tier level.  It will be more efficient
> > to have a pointer from node (pgdat) to memtier rather than doing a
> > search through the list.
> >
>
> That is something I was actively trying to avoid. Currently all struct
> memory_tier references are with the memory_tier_lock mutex held. That
> simplifies the locking and reference counting.
>
> As of now we are able to implement all the required interfaces without
> pgdat having pointers to struct memory_tier. We can update pgdat with
> memtier details when we are implementing changes requiring those. We
> could keep an additional memtier->dev reference to make sure memory tiers
> are not destroyed while other parts of the kernel are referencing the
> same. But IMHO such changes should wait till we have users for the same.
>

I think we should have an efficient mapping from node to memtier from
the get-go.  There are many easily envisioned scenarios where we need to
map from node to memtier, which Ying pointed out.

Tim
On 6/8/22 9:36 PM, Tim Chen wrote:
> On Wed, 2022-06-08 at 10:25 +0530, Aneesh Kumar K V wrote:
>> On 6/8/22 1:45 AM, Tim Chen wrote:
>>> On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
>>>>
>>>> +static struct memory_tier *__node_get_memory_tier(int node)
>>>> +{
>>>> +	struct memory_tier *memtier;
>>>> +
>>>> +	list_for_each_entry(memtier, &memory_tiers, list) {
>>>
>>> We may need to map node to mem_tier quite often, if we need
>>> to account memory usage at the tier level.  It will be more efficient
>>> to have a pointer from node (pgdat) to memtier rather than doing a
>>> search through the list.
>>>
>>
>> That is something I was actively trying to avoid. Currently all struct
>> memory_tier references are with the memory_tier_lock mutex held. That
>> simplifies the locking and reference counting.
>>
>> As of now we are able to implement all the required interfaces without
>> pgdat having pointers to struct memory_tier. We can update pgdat with
>> memtier details when we are implementing changes requiring those. We
>> could keep an additional memtier->dev reference to make sure memory tiers
>> are not destroyed while other parts of the kernel are referencing the
>> same. But IMHO such changes should wait till we have users for the same.
>>
>
> I think we should have an efficient mapping from node to memtier from
> the get-go.  There are many easily envisioned scenarios where we need to
> map from node to memtier, which Ying pointed out.
>

I did an initial implementation here. We need to make sure we can access
NODE_DATA()->memtier lockless. Can you review the changes here?
https://lore.kernel.org/linux-mm/87sfoffcfz.fsf@linux.ibm.com

-aneesh
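The linked patch is not reproduced here. As a rough illustration only, one common way to get lockless reads is to publish the node-to-tier mapping through an RCU-protected pointer; the sketch below uses a hypothetical global node_memtier[] array (rather than a field inside NODE_DATA()) to stay self-contained, and does not claim to match the linked implementation.

#include <linux/device.h>
#include <linux/list.h>
#include <linux/nodemask.h>
#include <linux/rcupdate.h>

/* Trimmed stand-in for the series' struct (list, dev, nodelist). */
struct memory_tier {
	struct list_head list;
	struct device dev;
	nodemask_t nodelist;
};

/* Hypothetical RCU-published node -> tier map. */
static struct memory_tier __rcu *node_memtier[MAX_NUMNODES];

/* Reader side: no memory_tier_lock, just an RCU read-side section. */
int node_get_memory_tier_id_lockless(int node)
{
	struct memory_tier *memtier;
	int tier = -1;

	rcu_read_lock();
	memtier = rcu_dereference(node_memtier[node]);
	if (memtier)
		tier = memtier->dev.id;
	rcu_read_unlock();

	return tier;
}

/* Writer side, still under memory_tier_lock: publish the new mapping and
 * wait for readers before the old tier may be unregistered and freed. */
static void node_update_memtier(int node, struct memory_tier *new)
{
	rcu_assign_pointer(node_memtier[node], new);
	synchronize_rcu();
}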
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ac6376ef7a1..599ed64d910f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -20,6 +20,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
+#include <linux/memory-tiers.h>
 
 static struct bus_type node_subsys = {
 	.name = "node",
@@ -560,11 +561,49 @@ static ssize_t node_read_distance(struct device *dev,
 }
 static DEVICE_ATTR(distance, 0444, node_read_distance, NULL);
 
+#ifdef CONFIG_TIERED_MEMORY
+static ssize_t memtier_show(struct device *dev,
+			    struct device_attribute *attr,
+			    char *buf)
+{
+	int node = dev->id;
+	int tier_index = node_get_memory_tier_id(node);
+
+	if (tier_index != -1)
+		return sysfs_emit(buf, "%d\n", tier_index);
+	return 0;
+}
+
+static ssize_t memtier_store(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t count)
+{
+	unsigned long tier;
+	int node = dev->id;
+	int ret;
+
+	ret = kstrtoul(buf, 10, &tier);
+	if (ret)
+		return ret;
+
+	ret = node_reset_memory_tier(node, tier);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(memtier);
+#endif
+
 static struct attribute *node_dev_attrs[] = {
 	&dev_attr_meminfo.attr,
 	&dev_attr_numastat.attr,
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
+#ifdef CONFIG_TIERED_MEMORY
+	&dev_attr_memtier.attr,
+#endif
 	NULL
 };
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index e17f6b4ee177..91f071804476 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -15,6 +15,9 @@
 #define DEFAULT_MEMORY_TIER	MEMORY_TIER_DRAM
 #define MAX_MEMORY_TIERS	3
 
+int node_get_memory_tier_id(int node);
+int node_set_memory_tier(int node, int tier);
+int node_reset_memory_tier(int node, int tier);
 #endif	/* CONFIG_TIERED_MEMORY */
 
 #endif
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 7de18d94a08d..9c78c47ad030 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -126,7 +126,6 @@ static struct memory_tier *register_memory_tier(unsigned int tier)
 	return memtier;
 }
 
-__maybe_unused // temporay to prevent warnings during bisects
 static void unregister_memory_tier(struct memory_tier *memtier)
 {
 	list_del(&memtier->list);
@@ -162,6 +161,128 @@ static const struct attribute_group *memory_tier_attr_groups[] = {
 	NULL,
 };
 
+static struct memory_tier *__node_get_memory_tier(int node)
+{
+	struct memory_tier *memtier;
+
+	list_for_each_entry(memtier, &memory_tiers, list) {
+		if (node_isset(node, memtier->nodelist))
+			return memtier;
+	}
+	return NULL;
+}
+
+static struct memory_tier *__get_memory_tier_from_id(int id)
+{
+	struct memory_tier *memtier;
+
+	list_for_each_entry(memtier, &memory_tiers, list) {
+		if (memtier->dev.id == id)
+			return memtier;
+	}
+	return NULL;
+}
+
+__maybe_unused // temporay to prevent warnings during bisects
+static void node_remove_from_memory_tier(int node)
+{
+	struct memory_tier *memtier;
+
+	mutex_lock(&memory_tier_lock);
+
+	memtier = __node_get_memory_tier(node);
+	if (!memtier)
+		goto out;
+	/*
+	 * Remove node from tier, if tier becomes
+	 * empty then unregister it to make it invisible
+	 * in sysfs.
+	 */
+	node_clear(node, memtier->nodelist);
+	if (nodes_empty(memtier->nodelist))
+		unregister_memory_tier(memtier);
+
+out:
+	mutex_unlock(&memory_tier_lock);
+}
+
+int node_get_memory_tier_id(int node)
+{
+	int tier = -1;
+	struct memory_tier *memtier;
+	/*
+	 * Make sure memory tier is not unregistered
+	 * while it is being read.
+	 */
+	mutex_lock(&memory_tier_lock);
+	memtier = __node_get_memory_tier(node);
+	if (memtier)
+		tier = memtier->dev.id;
+	mutex_unlock(&memory_tier_lock);
+
+	return tier;
+}
+
+static int __node_set_memory_tier(int node, int tier)
+{
+	int ret = 0;
+	struct memory_tier *memtier;
+
+	memtier = __get_memory_tier_from_id(tier);
+	if (!memtier) {
+		memtier = register_memory_tier(tier);
+		if (!memtier) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+	node_set(node, memtier->nodelist);
+out:
+	return ret;
+}
+
+int node_reset_memory_tier(int node, int tier)
+{
+	struct memory_tier *current_tier;
+	int ret = 0;
+
+	mutex_lock(&memory_tier_lock);
+
+	current_tier = __node_get_memory_tier(node);
+	if (!current_tier || current_tier->dev.id == tier)
+		goto out;
+
+	node_clear(node, current_tier->nodelist);
+
+	ret = __node_set_memory_tier(node, tier);
+	if (ret) {
+		/* reset it back to older tier */
+		node_set(node, current_tier->nodelist);
+		goto out;
+	}
+
+	if (nodes_empty(current_tier->nodelist))
+		unregister_memory_tier(current_tier);
+out:
+	mutex_unlock(&memory_tier_lock);
+
+	return ret;
+}
+
+int node_set_memory_tier(int node, int tier)
+{
+	struct memory_tier *memtier;
+	int ret = 0;
+
+	mutex_lock(&memory_tier_lock);
+	memtier = __node_get_memory_tier(node);
+	if (!memtier)
+		ret = __node_set_memory_tier(node, tier);
+	mutex_unlock(&memory_tier_lock);
+
+	return ret;
+}
+
 static int __init memory_tier_init(void)
 {
 	int ret;
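For illustration, here is a hypothetical in-kernel caller of the interfaces added by the patch above; place_new_node() is not part of the series, but node_get_memory_tier_id(), node_set_memory_tier(), node_reset_memory_tier(), DEFAULT_MEMORY_TIER and MAX_MEMORY_TIERS all come from the hunks shown.

#include <linux/memory-tiers.h>

/* Hypothetical helper: place a newly onlined memory node into a tier. */
static int place_new_node(int nid)
{
	int tier = node_get_memory_tier_id(nid);

	if (tier == -1)
		/* not in any tier yet: put it in the default (DRAM) tier */
		return node_set_memory_tier(nid, DEFAULT_MEMORY_TIER);

	/* already placed: move it to another tier (valid ids are 0..MAX_MEMORY_TIERS-1) */
	return node_reset_memory_tier(nid, MAX_MEMORY_TIERS - 1);
}

From user space, the drivers/base/node.c hunk exposes the same assignment as a read-write "memtier" attribute on each node device, so /sys/devices/system/node/nodeN/memtier can be read to get the node's tier id and written to move the node to another tier.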