Message ID: 20250401092716.537512-1-osalvador@suse.de (mailing list archive)
Series: Implement numa node notifier
On 4/1/25 11:27, Oscar Salvador wrote:
> Memory notifier is a tool that allows consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of that, but 5 out of those 10 consumers
> are only interested in getting notifications when a numa node has changed its
> state.
> That means going from memoryless to memory-aware or vice versa.
>
> Which means that for every {online,offline}_pages operation they get
> notified even though the numa node might not have changed its state.
>
> The first patch implements a numa node notifier that does just that, and has
> those consumers register there, so they get notified only when they are
> interested.

What if we had two chains:

register_node_notifier()
register_node_normal_notifier()

I think they could have shared the state #defines and struct node_notify
would have just one nid and be always >= 0.

Or would it add too much extra boilerplate and only slab cares?

> The second patch replaces the 'status_change_nid{_normal}' fields within
> memory_notify with a 'nid', as that is all we need for the memory
> notifier, and updates the only user of it (page_ext).
>
> Consumers that are only interested in numa node state changes are:
>
> - memory-tier
> - slub
> - cpuset
> - hmat
> - cxl
>
>
> Oscar Salvador (2):
>   mm,memory_hotplug: Implement numa node notifier
>   mm,memory_hotplug: Replace status_change_nid parameter in
>     memory_notify
>
>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++---
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 37 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
>  mm/page_ext.c             | 12 +----
>  mm/slub.c                 | 22 +++++------
>  10 files changed, 146 insertions(+), 60 deletions(-)
On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> What if we had two chains:
>
> register_node_notifier()
> register_node_normal_notifier()
>
> I think they could have shared the state #defines and struct node_notify
> would have just one nid and be always >= 0.
>
> Or would it add too much extra boilerplate and only slab cares?

We could indeed go in that direction to try to decouple
status_change_nid from status_change_nid_normal.

Although, as you said, slub is the only user of status_change_nid_normal
for the time being, so I am not sure about adding a second chain for only
one user.

It might look cleaner though, and the advantage is that slub would not get
notified for nodes acquiring only ZONE_MOVABLE.

Let us see what David thinks about it.

Thanks for the suggestion ;-)
On Tue, 1 Apr 2025 11:27:14 +0200 Oscar Salvador <osalvador@suse.de> wrote:
> Memory notifier is a tool that allows consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of that, but 5 out of those 10 consumers
> are only interested in getting notifications when a numa node has changed its
> state.
> That means going from memoryless to memory-aware or vice versa.
>
> Which means that for every {online,offline}_pages operation they get
> notified even though the numa node might not have changed its state.
>
> The first patch implements a numa node notifier that does just that, and has
> those consumers register there, so they get notified only when they are
> interested.
>
> The second patch replaces the 'status_change_nid{_normal}' fields within
> memory_notify with a 'nid', as that is all we need for the memory
> notifier, and updates the only user of it (page_ext).
>
> Consumers that are only interested in numa node state changes are:
>
> - memory-tier
> - slub
> - cpuset
> - hmat
> - cxl
>

Hi Oscar,

Idea seems good to me.

+CC linux-cxl for information of others not on the thread.

> Oscar Salvador (2):
>   mm,memory_hotplug: Implement numa node notifier
>   mm,memory_hotplug: Replace status_change_nid parameter in
>     memory_notify
>
>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++---
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 37 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
>  mm/page_ext.c             | 12 +----
>  mm/slub.c                 | 22 +++++------
>  10 files changed, 146 insertions(+), 60 deletions(-)
On 02.04.25 19:03, Oscar Salvador wrote:
> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>> What if we had two chains:
>>
>> register_node_notifier()
>> register_node_normal_notifier()
>>
>> I think they could have shared the state #defines and struct node_notify
>> would have just one nid and be always >= 0.
>>
>> Or would it add too much extra boilerplate and only slab cares?
>
> We could indeed go in that direction to try to decouple
> status_change_nid from status_change_nid_normal.
>
> Although, as you said, slub is the only user of status_change_nid_normal
> for the time being, so I am not sure about adding a second chain for only
> one user.
>
> It might look cleaner though, and the advantage is that slub would not get
> notified for nodes acquiring only ZONE_MOVABLE.
>
> Let us see what David thinks about it.

I'd hope we'd be able to get rid of the _normal stuff completely; it seems
way too specialized.

We added that in

commit b9d5ab2562eceeada5e4837a621b6260574dd11d
Author: Lai Jiangshan <laijs@cn.fujitsu.com>
Date:   Tue Dec 11 16:01:05 2012 -0800

    slub, hotplug: ignore unrelated node's hot-adding and hot-removing

    SLUB only focuses on the nodes which have normal memory and it ignores the
    other node's hot-adding and hot-removing.

    Aka: if some memory of a node which has no onlined memory is online, but
    this new memory onlined is not normal memory (for example, highmem), we
    should not allocate kmem_cache_node for SLUB.

    And if the last normal memory is offlined, but the node still has memory,
    we should remove kmem_cache_node for that node.  (The current code delays
    it when all of the memory is offlined)

    So we only do something when marg->status_change_nid_normal > 0.
    marg->status_change_nid is not suitable here.

    The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
    for every node even the node don't have normal memory, SLAB tolerates
    kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
    normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
    SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.

How "bad" would it be if we did the slab_mem_going_online_callback() etc. even
for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.

slab_mem_going_offline_callback() only does shrinking, #dontcare

Looking at slab_mem_offline_callback(), we never even free the caches either
way when offlining. So the implication would be that we would have movable-only
nodes set in slab_nodes.

We don't expect many such nodes, so ... do we care?
On 03.04.25 15:02, David Hildenbrand wrote:
> On 02.04.25 19:03, Oscar Salvador wrote:
>> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>>> What if we had two chains:
>>>
>>> register_node_notifier()
>>> register_node_normal_notifier()
>>>
>>> I think they could have shared the state #defines and struct node_notify
>>> would have just one nid and be always >= 0.
>>>
>>> Or would it add too much extra boilerplate and only slab cares?
>>
>> We could indeed go in that direction to try to decouple
>> status_change_nid from status_change_nid_normal.
>>
>> Although, as you said, slub is the only user of status_change_nid_normal
>> for the time being, so I am not sure about adding a second chain for only
>> one user.
>>
>> It might look cleaner though, and the advantage is that slub would not get
>> notified for nodes acquiring only ZONE_MOVABLE.
>>
>> Let us see what David thinks about it.
>
> I'd hope we'd be able to get rid of the _normal stuff completely; it seems
> way too specialized.
>
> We added that in
>
> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date:   Tue Dec 11 16:01:05 2012 -0800
>
>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>
>     SLUB only focuses on the nodes which have normal memory and it ignores the
>     other node's hot-adding and hot-removing.
>
>     Aka: if some memory of a node which has no onlined memory is online, but
>     this new memory onlined is not normal memory (for example, highmem), we
>     should not allocate kmem_cache_node for SLUB.
>
>     And if the last normal memory is offlined, but the node still has memory,
>     we should remove kmem_cache_node for that node.  (The current code delays
>     it when all of the memory is offlined)
>
>     So we only do something when marg->status_change_nid_normal > 0.
>     marg->status_change_nid is not suitable here.
>
>     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>     for every node even the node don't have normal memory, SLAB tolerates
>     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
>
> How "bad" would it be if we did the slab_mem_going_online_callback() etc. even
> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
>
> slab_mem_going_offline_callback() only does shrinking, #dontcare
>
> Looking at slab_mem_offline_callback(), we never even free the caches either
> way when offlining. So the implication would be that we would have movable-only
> nodes set in slab_nodes.
>
> We don't expect many such nodes, so ... do we care?

BTW, isn't the description of slab_nodes wrong?

"Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but as
there is no freeing done in slab_mem_offline_callback(), isn't it always
kept allocated?

(probably I am missing something)
On Thu, Apr 03, 2025 at 03:08:18PM +0200, David Hildenbrand wrote:
> On 03.04.25 15:02, David Hildenbrand wrote:
> > On 02.04.25 19:03, Oscar Salvador wrote:
> > > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> > > > What if we had two chains:
> > > >
> > > > register_node_notifier()
> > > > register_node_normal_notifier()
> > > >
> > > > I think they could have shared the state #defines and struct node_notify
> > > > would have just one nid and be always >= 0.
> > > >
> > > > Or would it add too much extra boilerplate and only slab cares?
> > >
> > > We could indeed go in that direction to try to decouple
> > > status_change_nid from status_change_nid_normal.
> > >
> > > Although, as you said, slub is the only user of status_change_nid_normal
> > > for the time being, so I am not sure about adding a second chain for only
> > > one user.
> > >
> > > It might look cleaner though, and the advantage is that slub would not get
> > > notified for nodes acquiring only ZONE_MOVABLE.
> > >
> > > Let us see what David thinks about it.
> >
> > I'd hope we'd be able to get rid of the _normal stuff completely; it seems
> > way too specialized.
> >
> > We added that in
> >
> > commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> > Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Date:   Tue Dec 11 16:01:05 2012 -0800
> >
> >     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
> >
> >     SLUB only focuses on the nodes which have normal memory and it ignores the
> >     other node's hot-adding and hot-removing.
> >
> >     Aka: if some memory of a node which has no onlined memory is online, but
> >     this new memory onlined is not normal memory (for example, highmem), we
> >     should not allocate kmem_cache_node for SLUB.
> >
> >     And if the last normal memory is offlined, but the node still has memory,
> >     we should remove kmem_cache_node for that node.  (The current code delays
> >     it when all of the memory is offlined)
> >
> >     So we only do something when marg->status_change_nid_normal > 0.
> >     marg->status_change_nid is not suitable here.
> >
> >     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
> >     for every node even the node don't have normal memory, SLAB tolerates
> >     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
> >     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
> >     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> >
> > How "bad" would it be if we did the slab_mem_going_online_callback() etc. even
> > for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
> >
> > slab_mem_going_offline_callback() only does shrinking, #dontcare
> >
> > Looking at slab_mem_offline_callback(), we never even free the caches either
> > way when offlining. So the implication would be that we would have movable-only
> > nodes set in slab_nodes.
> >
> > We don't expect many such nodes, so ... do we care?
>
> BTW, isn't the description of slab_nodes wrong?
>
> "Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but as
> there is no freeing done in slab_mem_offline_callback(), isn't it always
> kept allocated?

It was, but not anymore :)

I think this patch series [1] forgot that it changed the meaning from
'NUMA nodes that have kmem_cache_node' to 'NUMA nodes that have normal
memory (that can be allocated as slab memory)'.

[1] https://lore.kernel.org/all/20210113131634.3671-1-vbabka@suse.cz

> (probably I am missing something)
>
> --
> Cheers,
>
> David / dhildenb