[v5,6/9] mm/demotion: Add support for removing node from demotion memory tiers

Message ID: 20220603134237.131362-7-aneesh.kumar@linux.ibm.com
State: New
Series: mm/demotion: Memory tiers and demotion

Commit Message

Aneesh Kumar K.V June 3, 2022, 1:42 p.m. UTC
This patch adds the special string "none" as a supported memtier value
that we can use to remove a specific node from being used as a demotion target.

For example:
:/sys/devices/system/node/node1# cat memtier
1
:/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
1-3
:/sys/devices/system/node/node1# echo none > memtier
:/sys/devices/system/node/node1#
:/sys/devices/system/node/node1# cat memtier
:/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
2-3
:/sys/devices/system/node/node1#

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/node.c          |  4 ++++
 include/linux/memory-tiers.h |  1 +
 mm/memory-tiers.c            | 18 +++++++++++++++---
 3 files changed, 20 insertions(+), 3 deletions(-)
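
To illustrate the input handling described in the commit message, here is a minimal userspace sketch of how a memtier write could be interpreted: either the literal "none" or a base-10 tier index. It is not the kernel code. The names parse_memtier() and MEMTIER_NONE are hypothetical, and the sketch matches the whole token, which is stricter than a prefix comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel meaning "node is not in any memory tier". */
#define MEMTIER_NONE (-1)

/*
 * Userspace model of the parsing step behind a memtier write: accept the
 * literal "none" (optionally followed by the newline that echo appends),
 * otherwise parse a non-negative base-10 tier index.
 * Returns 0 and stores the result in *tier, or -EINVAL on malformed input.
 */
static int parse_memtier(const char *buf, long *tier)
{
	char *end;
	long val;

	assert(buf && tier);

	if (!strcmp(buf, "none") || !strcmp(buf, "none\n")) {
		*tier = MEMTIER_NONE;
		return 0;
	}

	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE || val < 0)
		return -EINVAL;
	if (*end == '\n')	/* tolerate one trailing newline */
		end++;
	if (*end != '\0')	/* reject trailing garbage such as "1x" */
		return -EINVAL;

	*tier = val;
	return 0;
}
```

With this sketch, "none\n" yields MEMTIER_NONE, "1\n" yields tier 1, and a string such as "nonesuch" is rejected with -EINVAL rather than being treated as "none".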

Comments

Tim Chen June 7, 2022, 11:40 p.m. UTC | #1
On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> This patch adds the special string "none" as a supported memtier value
> that we can use to remove a specific node from being using as demotion target.

Such a node will also not participate in promotion.  That is, hot memory on it will
not be promoted to other nodes.

> 
> For ex:
> :/sys/devices/system/node/node1# cat memtier
> 1
> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> 1-3
> :/sys/devices/system/node/node1# echo none > memtier
> :/sys/devices/system/node/node1#
> :/sys/devices/system/node/node1# cat memtier
> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> 2-3
> :/sys/devices/system/node/node1#
> 
>
Huang, Ying June 8, 2022, 6:59 a.m. UTC | #2
On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> This patch adds the special string "none" as a supported memtier value
> that we can use to remove a specific node from being using as demotion target.
> 
> For ex:
> :/sys/devices/system/node/node1# cat memtier
> 1
> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> 1-3
> :/sys/devices/system/node/node1# echo none > memtier
> :/sys/devices/system/node/node1#
> :/sys/devices/system/node/node1# cat memtier
> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> 2-3
> :/sys/devices/system/node/node1#

Do you have a practical use case for this?  What kind of memory node
needs to be removed from memory tiers demotion/promotion?

Best Regards,
Huang, Ying


[snip]
Aneesh Kumar K.V June 8, 2022, 8:20 a.m. UTC | #3
On 6/8/22 12:29 PM, Ying Huang wrote:
> On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
>> This patch adds the special string "none" as a supported memtier value
>> that we can use to remove a specific node from being using as demotion target.
>>
>> For ex:
>> :/sys/devices/system/node/node1# cat memtier
>> 1
>> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
>> 1-3
>> :/sys/devices/system/node/node1# echo none > memtier
>> :/sys/devices/system/node/node1#
>> :/sys/devices/system/node/node1# cat memtier
>> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
>> 2-3
>> :/sys/devices/system/node/node1#
> 
> Do you have a practical use case for this?  What kind of memory node
> needs to be removed from memory tiers demotion/promotion?
> 

This came up in our internal discussion. It was mentioned that there is
a need to exclude some slow memory nodes from participating in demotion.

-aneesh
Huang, Ying June 8, 2022, 8:23 a.m. UTC | #4
On Wed, 2022-06-08 at 13:50 +0530, Aneesh Kumar K V wrote:
> On 6/8/22 12:29 PM, Ying Huang wrote:
> > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> > > This patch adds the special string "none" as a supported memtier value
> > > that we can use to remove a specific node from being using as demotion target.
> > > 
> > > For ex:
> > > :/sys/devices/system/node/node1# cat memtier
> > > 1
> > > :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> > > 1-3
> > > :/sys/devices/system/node/node1# echo none > memtier
> > > :/sys/devices/system/node/node1#
> > > :/sys/devices/system/node/node1# cat memtier
> > > :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> > > 2-3
> > > :/sys/devices/system/node/node1#
> > 
> > Do you have a practical use case for this?  What kind of memory node
> > needs to be removed from memory tiers demotion/promotion?
> > 
> 
> This came up in our internal discussion. It was mentioned that there is 
> a need to skip some slow memory nodes from participating in demotion.

Again, can you provide a practical use case?  Why shouldn't we demote
cold pages to these slow memory nodes?  How do we use these slow memory
nodes?  Are these slow memory nodes slower than disk?

Best Regards,
Huang, Ying
Aneesh Kumar K.V June 8, 2022, 8:29 a.m. UTC | #5
On 6/8/22 1:53 PM, Ying Huang wrote:
> On Wed, 2022-06-08 at 13:50 +0530, Aneesh Kumar K V wrote:
>> On 6/8/22 12:29 PM, Ying Huang wrote:
>>> On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
>>>> This patch adds the special string "none" as a supported memtier value
>>>> that we can use to remove a specific node from being using as demotion target.
>>>>
>>>> For ex:
>>>> :/sys/devices/system/node/node1# cat memtier
>>>> 1
>>>> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
>>>> 1-3
>>>> :/sys/devices/system/node/node1# echo none > memtier
>>>> :/sys/devices/system/node/node1#
>>>> :/sys/devices/system/node/node1# cat memtier
>>>> :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
>>>> 2-3
>>>> :/sys/devices/system/node/node1#
>>>
>>> Do you have a practical use case for this?  What kind of memory node
>>> needs to be removed from memory tiers demotion/promotion?
>>>
>>
>> This came up in our internal discussion. It was mentioned that there is
>> a need to skip some slow memory nodes from participating in demotion.
> 
> Again, can you provide a practical use case?  Why we shouldn't demote
> cold pages to these slow memory nodes?  How do we use these slow memory
> node?  These slow memory node is slower than disk?
> 

This was discussed in the context of memory borrowed from a remote machine
(aka OpenCAPI memory). In such a case, we would have a memory-only NUMA
node which we want to avoid using for demotion.

-aneesh
Huang, Ying June 8, 2022, 8:34 a.m. UTC | #6
On Wed, 2022-06-08 at 13:59 +0530, Aneesh Kumar K V wrote:
> On 6/8/22 1:53 PM, Ying Huang wrote:
> > On Wed, 2022-06-08 at 13:50 +0530, Aneesh Kumar K V wrote:
> > > On 6/8/22 12:29 PM, Ying Huang wrote:
> > > > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> > > > > This patch adds the special string "none" as a supported memtier value
> > > > > that we can use to remove a specific node from being using as demotion target.
> > > > > 
> > > > > For ex:
> > > > > :/sys/devices/system/node/node1# cat memtier
> > > > > 1
> > > > > :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> > > > > 1-3
> > > > > :/sys/devices/system/node/node1# echo none > memtier
> > > > > :/sys/devices/system/node/node1#
> > > > > :/sys/devices/system/node/node1# cat memtier
> > > > > :/sys/devices/system/node/node1# cat ../../memtier/memtier1/nodelist
> > > > > 2-3
> > > > > :/sys/devices/system/node/node1#
> > > > 
> > > > Do you have a practical use case for this?  What kind of memory node
> > > > needs to be removed from memory tiers demotion/promotion?
> > > > 
> > > 
> > > This came up in our internal discussion. It was mentioned that there is
> > > a need to skip some slow memory nodes from participating in demotion.
> > 
> > Again, can you provide a practical use case?  Why we shouldn't demote
> > cold pages to these slow memory nodes?  How do we use these slow memory
> > node?  These slow memory node is slower than disk?
> > 
> 
> This was discussed in the context of memory borrowed from remote machine 
> (aka OpenCAPI memory). In such case, we would have a memory only NUMA 
> node which we want to avoid using for demotion.

Thanks for the information.  But why shouldn't we use them for
demotion?  Because they are too slow?  Even slower than disks?  Or for
some other reason?

Best Regards,
Huang, Ying

Patch

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 599ed64d910f..344786290149 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -582,6 +582,10 @@  static ssize_t memtier_store(struct device *dev,
 	int node = dev->id;
 	int ret;
 
+	if (sysfs_streq(buf, "none")) {
+		node_remove_from_memory_tier(node);
+		return count;
+	}
 	ret = kstrtoul(buf, 10, &tier);
 	if (ret)
 		return ret;
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index adc2cb3bf5f8..79bd8d26feb2 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -17,6 +17,7 @@ 
 
 extern bool numa_demotion_enabled;
 int next_demotion_node(int node);
+void node_remove_from_memory_tier(int node);
 int node_get_memory_tier_id(int node);
 int node_set_memory_tier(int node, int tier);
 int node_reset_memory_tier(int node, int tier);
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 9c82cf4c4bca..b4e72b672d4d 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -253,8 +253,7 @@  static struct memory_tier *__get_memory_tier_from_id(int id)
 	return NULL;
 }
 
-__maybe_unused // temporay to prevent warnings during bisects
-static void node_remove_from_memory_tier(int node)
+void node_remove_from_memory_tier(int node)
 {
 	struct memory_tier *memtier;
 
@@ -320,7 +319,20 @@  int node_reset_memory_tier(int node, int tier)
 	mutex_lock(&memory_tier_lock);
 
 	current_tier = __node_get_memory_tier(node);
-	if (!current_tier || current_tier->dev.id == tier)
+	if (!current_tier) {
+		/*
+		 * If an N_MEMORY node doesn't have a tier index, then
+		 * we removed it from demotion earlier and we are trying
+		 * to add it back. Just add the node to the requested tier.
+		 */
+		if (node_state(node, N_MEMORY)) {
+			ret = __node_set_memory_tier(node, tier);
+			establish_migration_targets();
+		}
+		goto out;
+	}
+
+	if (current_tier->dev.id == tier)
 		goto out;
 
 	node_clear(node, current_tier->nodelist);
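
The remove and re-add behaviour in the mm/memory-tiers.c hunks can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct tier_model, the bitmask nodelists, and the model_* helpers are hypothetical stand-ins for the kernel's memory_tier structures, locking and migration-target rebuilding are omitted, and the n_memory mask stands in for the N_MEMORY node state.

```c
#include <assert.h>

#define MAX_NODES 8
#define MAX_TIERS 3
#define NO_TIER   (-1)

/* Hypothetical model: bit n of nodelist[t] is set when node n is in tier t. */
struct tier_model {
	unsigned int nodelist[MAX_TIERS];
	unsigned int n_memory;	/* nodes with memory (stands in for N_MEMORY) */
};

/* Return the tier that contains the node, or NO_TIER if it is in none. */
static int model_get_tier(const struct tier_model *m, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	for (int t = 0; t < MAX_TIERS; t++)
		if (m->nodelist[t] & (1u << node))
			return t;
	return NO_TIER;
}

/* Models writing "none": drop the node from whichever tier holds it. */
static void model_remove_node(struct tier_model *m, int node)
{
	int t = model_get_tier(m, node);

	if (t != NO_TIER)
		m->nodelist[t] &= ~(1u << node);
}

/* Models writing a tier index: move the node, or re-add a removed one. */
static int model_reset_tier(struct tier_model *m, int node, int tier)
{
	int cur;

	if (tier < 0 || tier >= MAX_TIERS)
		return -1;

	cur = model_get_tier(m, node);
	if (cur == NO_TIER) {
		/* Previously removed: re-add only if the node has memory. */
		if (m->n_memory & (1u << node))
			m->nodelist[tier] |= 1u << node;
		return 0;
	}
	if (cur == tier)
		return 0;

	m->nodelist[cur] &= ~(1u << node);
	m->nodelist[tier] |= 1u << node;
	return 0;
}
```

Starting from a tier 1 nodelist of nodes 1-3, model_remove_node() on node 1 leaves nodes 2-3 in the tier, matching the example in the commit message, and a later model_reset_tier() for node 1 puts it back because the node has memory.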