[0/2] Implement numa node notifier

Message ID: 20250401092716.537512-1-osalvador@suse.de

Message
Oscar Salvador April 1, 2025, 9:27 a.m. UTC
The memory notifier is a tool that allows consumers to get notified whenever
memory gets onlined or offlined in the system.
Currently, there are 10 consumers of it, but 5 out of those 10 are only
interested in getting notified when a numa node has changed its state,
that is, when it goes from memoryless to memory-aware or vice versa.

This means that they get notified on every {online,offline}_pages operation,
even though the numa node might not have changed its state.

The first patch implements a numa node notifier that does just that, and has
those consumers register there, so they only get notified when a node changes
its state (a consumer-side sketch follows the list below).

The second patch replaces the 'status_change_nid{_normal}' fields within
memory_notify with a 'nid', as that is all we need for the memory
notifier, and updates its only user (page_ext).

Consumers that are only interested in numa node state changes are:

 - memory-tier
 - slub
 - cpuset
 - hmat
 - cxl
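
A rough consumer-side sketch of what patch 1 provides; the event names and
the node_notify layout here are illustrative assumptions, not necessarily
what the patch uses:

  /* Hypothetical consumer, modelled on the existing memory notifier. */
  static int foo_node_callback(struct notifier_block *nb,
                               unsigned long action, void *arg)
  {
          struct node_notify *nn = arg;   /* assumed: carries just a nid */

          switch (action) {
          case NODE_BECOMING_MEM_AWARE:   /* first memory being onlined */
                  /* allocate per-node state for nn->nid */
                  break;
          case NODE_BECAME_MEMORYLESS:    /* last memory went offline */
                  /* tear down per-node state for nn->nid */
                  break;
          }
          return NOTIFY_OK;
  }

  static struct notifier_block foo_node_nb = {
          .notifier_call = foo_node_callback,
  };

  register_node_notifier(&foo_node_nb);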


Oscar Salvador (2):
  mm,memory_hotplug: Implement numa node notifier
  mm,memory_hotplug: Replace status_change_nid parameter in
    memory_notify

 drivers/acpi/numa/hmat.c  |  6 +--
 drivers/base/node.c       | 19 +++++++++
 drivers/cxl/core/region.c | 14 +++----
 drivers/cxl/cxl.h         |  4 +-
 include/linux/memory.h    | 37 ++++++++++++++++++
 kernel/cgroup/cpuset.c    |  2 +-
 mm/memory-tiers.c         |  8 ++--
 mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
 mm/page_ext.c             | 12 +-----
 mm/slub.c                 | 22 +++++------
 10 files changed, 146 insertions(+), 60 deletions(-)

Comments

Vlastimil Babka April 2, 2025, 4:06 p.m. UTC | #1
On 4/1/25 11:27, Oscar Salvador wrote:
> The memory notifier is a tool that allows consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of it, but 5 out of those 10 are only
> interested in getting notified when a numa node has changed its state,
> that is, when it goes from memoryless to memory-aware or vice versa.
> 
> This means that they get notified on every {online,offline}_pages operation,
> even though the numa node might not have changed its state.
> 
> The first patch implements a numa node notifier that does just that, and has
> those consumers register there, so they only get notified when a node changes
> its state.

What if we had two chains:

register_node_notifier()
register_node_normal_notifier()

I think they could have shared the state #defines and struct node_notify
would have just one nid and be always >= 0.

Or would it add too much extra boilerplate and only slab cares?
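
Something like this sketch of the idea (declarations assumed, not patch
code), with both chains sharing the state #defines:

  /* Fires on any node memory state change ... */
  int register_node_notifier(struct notifier_block *nb);
  /* ... and only when a node gains/loses ZONE_NORMAL memory. */
  int register_node_normal_notifier(struct notifier_block *nb);

  struct node_notify {
          int nid;        /* always >= 0 on both chains */
  };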

> The second patch replaces the 'status_change_nid{_normal}' fields within
> memory_notify with a 'nid', as that is all we need for the memory
> notifier, and updates its only user (page_ext).
> 
> Consumers that are only interested in numa node state changes are:
> 
>  - memory-tier
>  - slub
>  - cpuset
>  - hmat
>  - cxl
> 
> 
> Oscar Salvador (2):
>   mm,memory_hotplug: Implement numa node notifier
>   mm,memory_hotplug: Replace status_change_nid parameter in
>     memory_notify
> 
>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++----
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 37 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
>  mm/page_ext.c             | 12 +-----
>  mm/slub.c                 | 22 +++++------
>  10 files changed, 146 insertions(+), 60 deletions(-)
>
Oscar Salvador April 2, 2025, 5:03 p.m. UTC | #2
On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> What if we had two chains:
> 
> register_node_notifier()
> register_node_normal_notifier()
> 
> I think they could have shared the state #defines and struct node_notify
> would have just one nid and be always >= 0.
> 
> Or would it add too much extra boilerplate and only slab cares?

We could indeed go in that direction to try to decouple
status_change_nid from status_change_nid_normal.

Although, as you said, slub is the only user of status_change_nid_normal
for the time being, so I am not sure about adding a second chain for only
one user.

It might look cleaner though, and the advantage is that slub would not get
notified for nodes acquiring only ZONE_MOVABLE.

Let us see what David thinks about it.

thanks for the suggestion ;-)
Jonathan Cameron April 3, 2025, 12:29 p.m. UTC | #3
On Tue,  1 Apr 2025 11:27:14 +0200
Oscar Salvador <osalvador@suse.de> wrote:

> The memory notifier is a tool that allows consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of it, but 5 out of those 10 are only
> interested in getting notified when a numa node has changed its state,
> that is, when it goes from memoryless to memory-aware or vice versa.
> 
> This means that they get notified on every {online,offline}_pages operation,
> even though the numa node might not have changed its state.
> 
> The first patch implements a numa node notifier that does just that, and has
> those consumers register there, so they only get notified when a node changes
> its state.
> 
> The second patch replaces the 'status_change_nid{_normal}' fields within
> memory_notify with a 'nid', as that is all we need for the memory
> notifier, and updates its only user (page_ext).
> 
> Consumers that are only interested in numa node state changes are:
> 
>  - memory-tier
>  - slub
>  - cpuset
>  - hmat
>  - cxl
> 
Hi Oscar,

Idea seems good to me.

+CC linux-cxl for the information of others not on the thread.

> 
> Oscar Salvador (2):
>   mm,memory_hotplug: Implement numa node notifier
>   mm,memory_hotplug: Replace status_change_nid parameter in
>     memory_notify
> 
>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++----
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 37 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
>  mm/page_ext.c             | 12 +-----
>  mm/slub.c                 | 22 +++++------
>  10 files changed, 146 insertions(+), 60 deletions(-)
>
David Hildenbrand April 3, 2025, 1:02 p.m. UTC | #4
On 02.04.25 19:03, Oscar Salvador wrote:
> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>> What if we had two chains:
>>
>> register_node_notifier()
>> register_node_normal_notifier()
>>
>> I think they could have shared the state #defines and struct node_notify
>> would have just one nid and be always >= 0.
>>
>> Or would it add too much extra boilerplate and only slab cares?
> 
> We could indeed go in that direction to try to decouple
> status_change_nid from status_change_nid_normal.
> 
> Although, as you said, slub is the only user of status_change_nid_normal
> for the time being, so I am not sure about adding a second chain for only
> one user.
> 
> It might look cleaner though, and the advantage is that slub would not get
> notified for nodes acquiring only ZONE_MOVABLE.
> 
> Let us see what David thinks about it.

I'd hope we'd be able to get rid of the _normal stuff completely, it seems
way too specialized.

We added that in

commit b9d5ab2562eceeada5e4837a621b6260574dd11d
Author: Lai Jiangshan <laijs@cn.fujitsu.com>
Date:   Tue Dec 11 16:01:05 2012 -0800

     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
     
     SLUB only focuses on the nodes which have normal memory and it ignores the
     other node's hot-adding and hot-removing.
     
     Aka: if some memory of a node which has no onlined memory is online, but
     this new memory onlined is not normal memory (for example, highmem), we
     should not allocate kmem_cache_node for SLUB.
     
     And if the last normal memory is offlined, but the node still has memory,
     we should remove kmem_cache_node for that node.  (The current code delays
     it when all of the memory is offlined)
     
     So we only do something when marg->status_change_nid_normal > 0.
     marg->status_change_nid is not suitable here.
     
     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
     for every node even the node don't have normal memory, SLAB tolerates
     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.


How "bad" would it be if we do the slab_mem_going_online_callback() etc even
for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.

slab_mem_going_offline_callback() only does shrinking, #dontcare

Looking at slab_mem_offline_callback(), we never even free the caches either
way when offlining. So the implication would be that we would have movable-only nodes
set in slab_nodes.


We don't expect many such nodes, so ... do we care?
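
(For reference, a condensed paraphrase of what onlining would then do for a
movable-only node; details and locking context trimmed from mm/slub.c, not
verbatim patch code:)

  /* one kmem_cache_node allocation per cache in slab_caches */
  struct kmem_cache *s;
  int ret = 0;

  mutex_lock(&slab_mutex);
  list_for_each_entry(s, &slab_caches, list) {
          struct kmem_cache_node *n;

          if (get_node(s, nid))   /* already populated for this node */
                  continue;
          n = kmem_cache_alloc(kmem_cache_node, GFP_KERNEL);
          if (!n) {
                  ret = -ENOMEM;
                  break;
          }
          init_kmem_cache_node(n);
          s->node[nid] = n;
  }
  node_set(nid, slab_nodes);
  mutex_unlock(&slab_mutex);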
David Hildenbrand April 3, 2025, 1:08 p.m. UTC | #5
On 03.04.25 15:02, David Hildenbrand wrote:
> On 02.04.25 19:03, Oscar Salvador wrote:
>> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>>> What if we had two chains:
>>>
>>> register_node_notifier()
>>> register_node_normal_notifier()
>>>
>>> I think they could have shared the state #defines and struct node_notify
>>> would have just one nid and be always >= 0.
>>>
>>> Or would it add too much extra boilerplate and only slab cares?
>>
>> We could indeed go in that direction to try to decouple
>> status_change_nid from status_change_nid_normal.
>>
>> Although, as you said, slub is the only user of status_change_nid_normal
>> for the time being, so I am not sure about adding a second chain for only
>> one user.
>>
>> It might look cleaner though, and the advantage is that slub would not get
>> notified for nodes acquiring only ZONE_MOVABLE.
>>
>> Let us see what David thinks about it.
> 
> I'd hope we'd be able to get rid of the _normal stuff completely, it seems
> way too specialized.
> 
> We added that in
> 
> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date:   Tue Dec 11 16:01:05 2012 -0800
> 
>       slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>       
>       SLUB only focuses on the nodes which have normal memory and it ignores the
>       other node's hot-adding and hot-removing.
>       
>       Aka: if some memory of a node which has no onlined memory is online, but
>       this new memory onlined is not normal memory (for example, highmem), we
>       should not allocate kmem_cache_node for SLUB.
>       
>       And if the last normal memory is offlined, but the node still has memory,
>       we should remove kmem_cache_node for that node.  (The current code delays
>       it when all of the memory is offlined)
>       
>       So we only do something when marg->status_change_nid_normal > 0.
>       marg->status_change_nid is not suitable here.
>       
>       The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>       for every node even the node don't have normal memory, SLAB tolerates
>       kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>       normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>       SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> 
> 
> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
> 
> slab_mem_going_offline_callback() only does shrinking, #dontcare
> 
> Looking at slab_mem_offline_callback(), we never even free the caches either
> way when offlining. So the implication would be that we would have movable-only nodes
> set in slab_nodes.
> 
> 
> We don't expect many such nodes, so ... do we care?

BTW, isn't the description of slab_nodes wrong?

"Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but 
as there is no freeing done in slab_mem_offline_callback(), isn't it 
always kept allocated?

(probably I am missing something)
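
(For context: slab_nodes gates which nodes newly created caches allocate a
kmem_cache_node for; condensed from mm/slub.c, body trimmed:)

  static int init_kmem_cache_nodes(struct kmem_cache *s)
  {
          int node;

          /* only nodes set in slab_nodes get s->node[node] allocated */
          for_each_node_mask(node, slab_nodes) {
                  ...
          }
          ...
  }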
Harry Yoo April 3, 2025, 1:57 p.m. UTC | #6
On Thu, Apr 03, 2025 at 03:08:18PM +0200, David Hildenbrand wrote:
> On 03.04.25 15:02, David Hildenbrand wrote:
> > On 02.04.25 19:03, Oscar Salvador wrote:
> > > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> > > > What if we had two chains:
> > > > 
> > > > register_node_notifier()
> > > > register_node_normal_notifier()
> > > > 
> > > > I think they could have shared the state #defines and struct node_notify
> > > > would have just one nid and be always >= 0.
> > > > 
> > > > Or would it add too much extra boilerplate and only slab cares?
> > > 
> > > We could indeed go in that direction to try to decouple
> > > status_change_nid from status_change_nid_normal.
> > > 
> > > Although, as you said, slub is the only user of status_change_nid_normal
> > > for the time being, so I am not sure about adding a second chain for only
> > > one user.
> > > 
> > > It might look cleaner though, and the advantage is that slub would not get
> > > notified for nodes acquiring only ZONE_MOVABLE.
> > > 
> > > Let us see what David thinks about it.
> > 
> > I'd hope we'd be able to get rid of the _normal stuff completely, it seems
> > way too specialized.
> > 
> > We added that in
> > 
> > commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> > Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Date:   Tue Dec 11 16:01:05 2012 -0800
> > 
> >       slub, hotplug: ignore unrelated node's hot-adding and hot-removing
> >       SLUB only focuses on the nodes which have normal memory and it ignores the
> >       other node's hot-adding and hot-removing.
> >       Aka: if some memory of a node which has no onlined memory is online, but
> >       this new memory onlined is not normal memory (for example, highmem), we
> >       should not allocate kmem_cache_node for SLUB.
> >       And if the last normal memory is offlined, but the node still has memory,
> >       we should remove kmem_cache_node for that node.  (The current code delays
> >       it when all of the memory is offlined)
> >       So we only do something when marg->status_change_nid_normal > 0.
> >       marg->status_change_nid is not suitable here.
> >       The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
> >       for every node even the node don't have normal memory, SLAB tolerates
> >       kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
> >       normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
> >       SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> > 
> > 
> > How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> > for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
> > 
> > slab_mem_going_offline_callback() only does shrinking, #dontcare
> > 
> > Looking at slab_mem_offline_callback(), we never even free the caches either
> > way when offlining. So the implication would be that we would have movable-only nodes
> > set in slab_nodes.
> > 
> > 
> > We don't expect many such nodes, so ... do we care?
> 
> BTW, isn't the description of slab_nodes wrong?
> 
> "Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but as
> there is no freeing done in slab_mem_offline_callback(), isn't it always
> kept allocated?

It was, but not anymore :)

I think this patch series [1] overlooked the fact that it changed the meaning
from 'NUMA nodes that have a kmem_cache_node' to 'NUMA nodes that have normal
memory (from which slab memory can be allocated)'?

[1] https://lore.kernel.org/all/20210113131634.3671-1-vbabka@suse.cz

> 
> (probably I am missing something)
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
>
Harry Yoo April 3, 2025, 10:06 p.m. UTC | #7
On Thu, Apr 03, 2025 at 03:02:25PM +0200, David Hildenbrand wrote:
> On 02.04.25 19:03, Oscar Salvador wrote:
> > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> > > What if we had two chains:
> > > 
> > > register_node_notifier()
> > > register_node_normal_notifier()
> > > 
> > > I think they could have shared the state #defines and struct node_notify
> > > would have just one nid and be always >= 0.
> > > 
> > > Or would it add too much extra boilerplate and only slab cares?
> > 
> > We could indeed go in that direction to try to decouple
> > status_change_nid from status_change_nid_normal.
> > 
> > Although, as you said, slub is the only user of status_change_nid_normal
> > for the time being, so I am not sure about adding a second chain for only
> > one user.
> > 
> > It might look cleaner though, and the advantage is that slub would not get
> > notified for nodes acquiring only ZONE_MOVABLE.
> > 
> > Let us see what David thinks about it.
> 
> I'd hope we'd be able to get rid of the _normal stuff completely, it seems
> way too specialized.

Hmm, perhaps we can remove it as part of this patch series?

status_change_nid_normal has been used to indicate both 'There is a
status change' AND 'The node id when the NUMA node has normal memory'.

But since the NUMA node notifier triggers only when there is a state change,
it can simply pass nid, like patch 2 does. SLUB can then check whether the
node has normal memory.

Or am I missing something?

> We added that in
> 
> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date:   Tue Dec 11 16:01:05 2012 -0800
> 
>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>     SLUB only focuses on the nodes which have normal memory and it ignores the
>     other node's hot-adding and hot-removing.
>     Aka: if some memory of a node which has no onlined memory is online, but
>     this new memory onlined is not normal memory (for example, highmem), we
>     should not allocate kmem_cache_node for SLUB.
>     And if the last normal memory is offlined, but the node still has memory,
>     we should remove kmem_cache_node for that node.  (The current code delays
>     it when all of the memory is offlined)
>     So we only do something when marg->status_change_nid_normal > 0.
>     marg->status_change_nid is not suitable here.
>     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>     for every node even the node don't have normal memory, SLAB tolerates
>     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> 
> 
> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
>
> slab_mem_going_offline_callback() only does shrinking, #dontcare
> 
> Looking at slab_mem_offline_callback(), we never even free the caches either
> way when offlining. So the implication would be that we would have movable-only nodes
> set in slab_nodes.
> 
> 
> We don't expect many such nodes, so ... do we care?
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
>
Vlastimil Babka April 4, 2025, 8:47 a.m. UTC | #8
On 4/3/25 15:08, David Hildenbrand wrote:
> On 03.04.25 15:02, David Hildenbrand wrote:
>> On 02.04.25 19:03, Oscar Salvador wrote:
>>> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>>>> What if we had two chains:
>>>>
>>>> register_node_notifier()
>>>> register_node_normal_notifier()
>>>>
>>>> I think they could have shared the state #defines and struct node_notify
>>>> would have just one nid and be always >= 0.
>>>>
>>>> Or would it add too much extra boilerplate and only slab cares?
>>>
>>> We could indeed go in that direction to try to decouple
>>> status_change_nid from status_change_nid_normal.
>>>
>>> Although, as you said, slub is the only user of status_change_nid_normal
>>> for the time being, so I am not sure about adding a second chain for only
>>> one user.
>>>
>>> It might look cleaner though, and the advantage is that slub would not get
>>> notified for nodes acquiring only ZONE_MOVABLE.
>>>
>>> Let us see what David thinks about it.
>> 
>> I'd hope we'd be able to get rid of the _normal stuff completely, it seems
>> way too specialized.
>> 
>> We added that in
>> 
>> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
>> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date:   Tue Dec 11 16:01:05 2012 -0800
>> 
>>       slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>>       
>>       SLUB only focuses on the nodes which have normal memory and it ignores the
>>       other node's hot-adding and hot-removing.
>>       
>>       Aka: if some memory of a node which has no onlined memory is online, but
>>       this new memory onlined is not normal memory (for example, highmem), we
>>       should not allocate kmem_cache_node for SLUB.
>>       
>>       And if the last normal memory is offlined, but the node still has memory,
>>       we should remove kmem_cache_node for that node.  (The current code delays
>>       it when all of the memory is offlined)
>>       
>>       So we only do something when marg->status_change_nid_normal > 0.
>>       marg->status_change_nid is not suitable here.
>>       
>>       The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>>       for every node even the node don't have normal memory, SLAB tolerates
>>       kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>>       normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>>       SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
>> 
>> 
>> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
>> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.

Yes.

>> slab_mem_going_offline_callback() only does shrinking, #dontcare
>> 
>> Looking at slab_mem_offline_callback(), we never even free the caches either
>> way when offlining. So the implication would be that we would have movable-only nodes
>> set in slab_nodes.

Yes.

>> We don't expect many such nodes, so ... do we care?

Right, the memory waste should be negligible in the big picture. So Oscar,
can you change this as part of your series?

> BTW, isn't the description of slab_nodes wrong?
> 
> "Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but 
> as there is no freeing done in slab_mem_offline_callback(), isn't it 
> always kept allocated?

Hm yes, right now it's "it's in slab_nodes => it's allocated" and not an
equivalence.

I guess we could remove slab_mem_offline_callback() completely and thus stop
clearing bits from slab_nodes, and the comment would be true again?

That would mean new caches created after node hotremove would allocate for
that node, but that's even more negligible than the waste we're doing already.
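
A sketch of the resulting dispatcher, assuming the offline handling is simply
dropped (condensed paraphrase of mm/slub.c, not verbatim):

  static int slab_memory_callback(struct notifier_block *self,
                                  unsigned long action, void *arg)
  {
          int ret = 0;

          switch (action) {
          case MEM_GOING_ONLINE:
                  ret = slab_mem_going_online_callback(arg);
                  break;
          case MEM_GOING_OFFLINE:
                  ret = slab_mem_going_offline_callback(arg);
                  break;
          /*
           * No MEM_OFFLINE/MEM_CANCEL_ONLINE case anymore: the nid is
           * never cleared from slab_nodes, so "in slab_nodes => its
           * kmem_cache_node is allocated" holds again.
           */
          }
          return notifier_from_errno(ret);
  }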

> (probably I am missing something)
>
Vlastimil Babka April 4, 2025, 8:50 a.m. UTC | #9
On 4/4/25 00:06, Harry Yoo wrote:
> On Thu, Apr 03, 2025 at 03:02:25PM +0200, David Hildenbrand wrote:
>> On 02.04.25 19:03, Oscar Salvador wrote:
>> > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>> > > What if we had two chains:
>> > > 
>> > > register_node_notifier()
>> > > register_node_normal_notifier()
>> > > 
>> > > I think they could have shared the state #defines and struct node_notify
>> > > would have just one nid and be always >= 0.
>> > > 
>> > > Or would it add too much extra boilerplate and only slab cares?
>> > 
>> > We could indeed go in that direction to try to decouple
>> > status_change_nid from status_change_nid_normal.
>> > 
>> > Although, as you said, slub is the only user of status_change_nid_normal
>> > for the time being, so I am not sure about adding a second chain for only
>> > one user.
>> > 
>> > It might look cleaner though, and the advantage is that slub would not get
>> > notified for nodes acquiring only ZONE_MOVABLE.
>> > 
>> > Let us see what David thinks about it.
>> 
>> I'd hope we'd be able to get rid of the _normal stuff completely, it seems
>> way too specialized.
> 
> Hmm, perhaps we can remove it with as part of this patch series?
> 
> status_change_nid_normal has been used to indicate both 'There is a
> status change' AND 'The node id when the NUMA node has normal memory'.
> 
> But since NUMA node notifier triggers only when there is a state change,
> it can simply pass nid, like patch 2 does. SLUB can then check whether the
> node has normal memory.

Well, the state change could be adding movable memory: SLUB checks that
there's no normal memory and thus does nothing. Then normal memory is added
to the node, but there's no new notification, and SLUB is left without
supporting the node forever.

But with David's suggestion we avoid that problem.
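
(An illustrative timeline of that gap; node_state() is the check the
nid-only variant would rely on:)

  /*
   * 1. Movable-only memory is onlined to node N:
   *      - N goes memoryless -> memory-aware, the node notifier fires.
   *      - SLUB sees !node_state(N, N_NORMAL_MEMORY) and skips the
   *        kmem_cache_node allocation.
   * 2. ZONE_NORMAL memory is later onlined to N:
   *      - N's state did not change (still memory-aware), so no new
   *        notification fires and SLUB never allocates for N.
   */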

> Or am I missing something?
> 
>> We added that in
>> 
>> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
>> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date:   Tue Dec 11 16:01:05 2012 -0800
>> 
>>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>>     SLUB only focuses on the nodes which have normal memory and it ignores the
>>     other node's hot-adding and hot-removing.
>>     Aka: if some memory of a node which has no onlined memory is online, but
>>     this new memory onlined is not normal memory (for example, highmem), we
>>     should not allocate kmem_cache_node for SLUB.
>>     And if the last normal memory is offlined, but the node still has memory,
>>     we should remove kmem_cache_node for that node.  (The current code delays
>>     it when all of the memory is offlined)
>>     So we only do something when marg->status_change_nid_normal > 0.
>>     marg->status_change_nid is not suitable here.
>>     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>>     for every node even the node don't have normal memory, SLAB tolerates
>>     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>>     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>>     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
>> 
>> 
>> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
>> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
>>
>> slab_mem_going_offline_callback() only does shrinking, #dontcare
>> 
>> Looking at slab_mem_offline_callback(), we never even free the caches either
>> way when offlining. So the implication would be that we would have movable-only nodes
>> set in slab_nodes.
>> 
>> 
>> We don't expect many such nodes, so ... do we care?
>> 
>> -- 
>> Cheers,
>> 
>> David / dhildenb
>> 
>> 
>