diff mbox series

[2/4] mm: not __SetPageReserved on initializing hot-plugged memory

Message ID 20240629013322.12364-2-richard.weiyang@gmail.com (mailing list archive)
State New
Headers show
Series [1/4] mm: use zonelist_zone() to get zone | expand

Commit Message

Wei Yang June 29, 2024, 1:33 a.m. UTC
Initialize all pages reserved is an ancient behavior.

Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
memblock region"), SetPageReserved is removed from
__init_single_page(). Only those reserved pages are marked PG_reserved.

But we still set PG_reserved on offline and check it on online.

Following two commits removed both of them:

* Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
  when offlining") removed the set on offline.
* Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
  online_pages_range()") removed the check on online.

This means we set PG_reserved for hot-plugged memory at initialization
is not helpful and a little different from bootmem initialization path.
Now we can remove it.

Memory hot-add and hot-remove have been tested.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Nathan Zimmer <nzimmer@sgi.com>
CC: David Hildenbrand <david@redhat.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
---
 mm/mm_init.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

David Hildenbrand June 29, 2024, 6:19 a.m. UTC | #1
On 29.06.24 03:33, Wei Yang wrote:
> Initialize all pages reserved is an ancient behavior.
> 
> Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
> memblock region"), SetPageReserved is removed from
> __init_single_page(). Only those reserved pages are marked PG_reserved.
> 
> But we still set PG_reserved on offline and check it on online.
> 
> Following two commits removed both of them:
> 
> * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>    when offlining") removed the set on offline.
> * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>    online_pages_range()") removed the check on online.
> 
> This means we set PG_reserved for hot-plugged memory at initialization
> is not helpful and a little different from bootmem initialization path.
> Now we can remove it.

It's not that easy for ZONE_DEVICE.

Also, see mm/mm-stable

commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Jun 7 11:09:36 2024 +0200

     mm: pass meminit_context to __free_pages_core()

     Patch series "mm/memory_hotplug: use PageOffline() instead of
     PageReserved() for !ZONE_DEVICE".


commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Jun 7 11:09:37 2024 +0200

     mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with 
PageOffline() instead of PageReserved()


If you want to work on removing it for ZONE_DEVICE, one idea
is to replace all relevant PageReserved() checks by a
more generic function that would check PageReserved() and the zone
Wei Yang June 29, 2024, 8:32 a.m. UTC | #2
On Sat, Jun 29, 2024 at 08:19:49AM +0200, David Hildenbrand wrote:
>On 29.06.24 03:33, Wei Yang wrote:
>> Initialize all pages reserved is an ancient behavior.
>> 
>> Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
>> memblock region"), SetPageReserved is removed from
>> __init_single_page(). Only those reserved pages are marked PG_reserved.
>> 
>> But we still set PG_reserved on offline and check it on online.
>> 
>> Following two commits removed both of them:
>> 
>> * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>>    when offlining") removed the set on offline.
>> * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>>    online_pages_range()") removed the check on online.
>> 
>> This means we set PG_reserved for hot-plugged memory at initialization
>> is not helpful and a little different from bootmem initialization path.
>> Now we can remove it.
>
>It's not that easy for ZONE_DEVICE.
>
>Also, see mm/mm-stable
>
>commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
>Author: David Hildenbrand <david@redhat.com>
>Date:   Fri Jun 7 11:09:36 2024 +0200
>
>    mm: pass meminit_context to __free_pages_core()
>
>    Patch series "mm/memory_hotplug: use PageOffline() instead of
>    PageReserved() for !ZONE_DEVICE".
>
>
>commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
>Author: David Hildenbrand <david@redhat.com>
>Date:   Fri Jun 7 11:09:37 2024 +0200
>
>    mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline()
>instead of PageReserved()
>

Let me try to understand this.

You also tries to get rid of PG_reserved but you want PG_offline instead,
because this benefit virtio-mem, right?

But I don't get why PG_offline is wrong for ZONE_DEVICE. I may miss some
knowledge for it.

>
>If you want to work on removing it for ZONE_DEVICE, one idea
>is to replace all relevant PageReserved() checks by a
>more generic function that would check PageReserved() and the zone
>
>
>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 29, 2024, 2:38 p.m. UTC | #3
On 29.06.24 10:32, Wei Yang wrote:
> On Sat, Jun 29, 2024 at 08:19:49AM +0200, David Hildenbrand wrote:
>> On 29.06.24 03:33, Wei Yang wrote:
>>> Initialize all pages reserved is an ancient behavior.
>>>
>>> Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
>>> memblock region"), SetPageReserved is removed from
>>> __init_single_page(). Only those reserved pages are marked PG_reserved.
>>>
>>> But we still set PG_reserved on offline and check it on online.
>>>
>>> Following two commits removed both of them:
>>>
>>> * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>>>     when offlining") removed the set on offline.
>>> * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>>>     online_pages_range()") removed the check on online.
>>>
>>> This means we set PG_reserved for hot-plugged memory at initialization
>>> is not helpful and a little different from bootmem initialization path.
>>> Now we can remove it.
>>
>> It's not that easy for ZONE_DEVICE.
>>
>> Also, see mm/mm-stable
>>
>> commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Fri Jun 7 11:09:36 2024 +0200
>>
>>     mm: pass meminit_context to __free_pages_core()
>>
>>     Patch series "mm/memory_hotplug: use PageOffline() instead of
>>     PageReserved() for !ZONE_DEVICE".
>>
>>
>> commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Fri Jun 7 11:09:37 2024 +0200
>>
>>     mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline()
>> instead of PageReserved()
>>
> 
> Let me try to understand this.
> 
> You also tries to get rid of PG_reserved but you want PG_offline instead,
> because this benefit virtio-mem, right?

We now make proper use of PG_offline. All hotplugged pages start out 
PG_offline once we turn the section online. Only the ones that actually 
get exposed to the buddy -- actually get onlined -- get PG_offline 
cleared. A side effect of that is less hacks for virtio-mem, and more 
natural handling for the other ballooning drivers that hotplug memory.

In the future, I'm planning on moving more fake-offlining code from 
virtio-mem the core, making use of more PG_offline in memory.

For now, it's stops the PG_reserved use while maintaining the same 
semantics as before: the page content and "struct page" is not to be 
touched by anybody except the "owner".

> 
> But I don't get why PG_offline is wrong for ZONE_DEVICE. I may miss some
> knowledge for it.

I suggest you take a look at the PG_offline documentation. ZONE_DEVICE 
are certainly not logically offline pages. They will never be considered 
online as part of online sections. But they will never be handed to the 
buddy.

Maybe we want a dedicate page type for them in the future, not sure. We 
can right now identify them reliably using the zone idx.

Using a page type right now is very likely not possible, because we 
might be using the page->_mapcount in rmap code when mapping some of 
them to user space.

If we want to get rid of the PG_reserved for them right now, we'll have 
to make sure all existing PageReserved checks won't be degraded. For
example, drivers/vfio/vfio_iommu_type1.c might need some work (no sure).

The KVM one in kvm_pfn_to_refcounted_page() should already be fine, 
because they really want to refcount them.

A lot of other ones like can_gather_numa_stats(), already refuse 
is_zone_device_page() manually, and maybe we want to factor both checks 
out into a separate function like "is_special_reserved_page()" or sth 
like that.
Wei Yang June 30, 2024, 7:32 a.m. UTC | #4
On Sat, Jun 29, 2024 at 04:38:47PM +0200, David Hildenbrand wrote:
>On 29.06.24 10:32, Wei Yang wrote:
>> On Sat, Jun 29, 2024 at 08:19:49AM +0200, David Hildenbrand wrote:
>> > On 29.06.24 03:33, Wei Yang wrote:
>> > > Initialize all pages reserved is an ancient behavior.
>> > > 
>> > > Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
>> > > memblock region"), SetPageReserved is removed from
>> > > __init_single_page(). Only those reserved pages are marked PG_reserved.
>> > > 
>> > > But we still set PG_reserved on offline and check it on online.
>> > > 
>> > > Following two commits removed both of them:
>> > > 
>> > > * Commit 0ee5f4f31d36 ("mm/page_alloc.c: don't set pages PageReserved()
>> > >     when offlining") removed the set on offline.
>> > > * Commit 5ecae6359e3a ("mm/memory_hotplug: drop PageReserved() check in
>> > >     online_pages_range()") removed the check on online.
>> > > 
>> > > This means we set PG_reserved for hot-plugged memory at initialization
>> > > is not helpful and a little different from bootmem initialization path.
>> > > Now we can remove it.
>> > 
>> > It's not that easy for ZONE_DEVICE.
>> > 
>> > Also, see mm/mm-stable
>> > 
>> > commit 3dadec1babf9eee0c67c967df931d6f0cb124a04
>> > Author: David Hildenbrand <david@redhat.com>
>> > Date:   Fri Jun 7 11:09:36 2024 +0200
>> > 
>> >     mm: pass meminit_context to __free_pages_core()
>> > 
>> >     Patch series "mm/memory_hotplug: use PageOffline() instead of
>> >     PageReserved() for !ZONE_DEVICE".
>> > 
>> > 
>> > commit b873faaa609ab44c223b2327f55d2b6a2ba4ca9c
>> > Author: David Hildenbrand <david@redhat.com>
>> > Date:   Fri Jun 7 11:09:37 2024 +0200
>> > 
>> >     mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline()
>> > instead of PageReserved()
>> > 
>> 
>> Let me try to understand this.
>> 
>> You also tries to get rid of PG_reserved but you want PG_offline instead,
>> because this benefit virtio-mem, right?
>
>We now make proper use of PG_offline. All hotplugged pages start out
>PG_offline once we turn the section online. Only the ones that actually get
>exposed to the buddy -- actually get onlined -- get PG_offline cleared. A
>side effect of that is less hacks for virtio-mem, and more natural handling
>for the other ballooning drivers that hotplug memory.
>
>In the future, I'm planning on moving more fake-offlining code from
>virtio-mem the core, making use of more PG_offline in memory.
>
>For now, it's stops the PG_reserved use while maintaining the same semantics
>as before: the page content and "struct page" is not to be touched by anybody
>except the "owner".
>
>> 
>> But I don't get why PG_offline is wrong for ZONE_DEVICE. I may miss some
>> knowledge for it.
>
>I suggest you take a look at the PG_offline documentation. ZONE_DEVICE are
>certainly not logically offline pages. They will never be considered online
>as part of online sections. But they will never be handed to the buddy.
>
>Maybe we want a dedicate page type for them in the future, not sure. We can
>right now identify them reliably using the zone idx.
>
>Using a page type right now is very likely not possible, because we might be
>using the page->_mapcount in rmap code when mapping some of them to user
>space.
>
>If we want to get rid of the PG_reserved for them right now, we'll have to
>make sure all existing PageReserved checks won't be degraded. For
>example, drivers/vfio/vfio_iommu_type1.c might need some work (no sure).
>
>The KVM one in kvm_pfn_to_refcounted_page() should already be fine, because
>they really want to refcount them.
>
>A lot of other ones like can_gather_numa_stats(), already refuse
>is_zone_device_page() manually, and maybe we want to factor both checks out
>into a separate function like "is_special_reserved_page()" or sth like that.
>

Thanks for the information, I need sometime to digest it :-)

>-- 
>Cheers,
>
>David / dhildenb
diff mbox series

Patch

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3ec04933f7fd..362ac4334b99 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -839,9 +839,8 @@  static void __init init_unavailable_range(unsigned long spfn,
 }
 
 /*
- * Initially all pages are reserved - free ones are freed
- * up by memblock_free_all() once the early boot process is
- * done. Non-atomic initialization, single-pass.
+ * Free ones are freed up by memblock_free_all() once the early boot process
+ * is done. Non-atomic initialization, single-pass.
  *
  * All aligned pageblocks are initialized to the specified migratetype
  * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
@@ -892,8 +891,6 @@  void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone
 
 		page = pfn_to_page(pfn);
 		__init_single_page(page, pfn, zone, nid);
-		if (context == MEMINIT_HOTPLUG)
-			__SetPageReserved(page);
 
 		/*
 		 * Usually, we want to mark the pageblock MIGRATE_MOVABLE,