diff mbox series

[v1,01/10] mm/memory_hotplug: Don't allow to online/offline memory blocks with holes

Message ID 20191024120938.11237-2-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE) | expand

Commit Message

David Hildenbrand Oct. 24, 2019, 12:09 p.m. UTC
Our onlining/offlining code is unnecessarily complicated. Only memory
blocks added during boot can have holes (a range that is not
IORESOURCE_SYSTEM_RAM). Hotplugged memory never has holes (e.g., see
add_memory_resource()). All boot memory is alread online.

Therefore, when we stop allowing to offline memory blocks with holes, we
implicitly no longer have to deal with onlining memory blocks with holes.

This allows to simplify the code. For example, we no longer have to
worry about marking pages that fall into memory holes PG_reserved when
onlining memory. We can stop setting pages PG_reserved.

Offlining memory blocks added during boot is usually not guranteed to work
either way (unmovable data might have easily ended up on that memory during
boot). So stopping to do that should not really hurt (+ people are not
even aware of a setup where that used to work and that the existing code
still works correctly with memory holes). For the use case of offlining
memory to unplug DIMMs, we should see no change. (holes on DIMMs would be
weird).

Please note that hardware errors (PG_hwpoison) are not memory holes and
not affected by this change when offlining.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Dan Williams Nov. 5, 2019, 1:30 a.m. UTC | #1
On Thu, Oct 24, 2019 at 5:10 AM David Hildenbrand <david@redhat.com> wrote:
>
> Our onlining/offlining code is unnecessarily complicated. Only memory
> blocks added during boot can have holes (a range that is not
> IORESOURCE_SYSTEM_RAM). Hotplugged memory never has holes (e.g., see
> add_memory_resource()). All boot memory is alread online.

s/alread/already/

...also perhaps clarify "already online" by what point in time and why
that is relevant. For example a description of the difference between
the SetPageReserved() in the bootmem path and the one in the hotplug
path.

> Therefore, when we stop allowing to offline memory blocks with holes, we
> implicitly no longer have to deal with onlining memory blocks with holes.

Maybe an explicit reference of the code areas that deal with holes
would help to back up that assertion. Certainly it would have saved me
some time for the review.

> This allows to simplify the code. For example, we no longer have to
> worry about marking pages that fall into memory holes PG_reserved when
> onlining memory. We can stop setting pages PG_reserved.

...but not for bootmem, right?

>
> Offlining memory blocks added during boot is usually not guranteed to work

s/guranteed/guaranteed/

> either way (unmovable data might have easily ended up on that memory during
> boot). So stopping to do that should not really hurt (+ people are not
> even aware of a setup where that used to work

Maybe put a "Link: https://lkml.kernel.org/r/$msg_id" to that discussion?

> and that the existing code
> still works correctly with memory holes). For the use case of offlining
> memory to unplug DIMMs, we should see no change. (holes on DIMMs would be
> weird).

However, less memory can be offlined than was theoretically allowed
previously, so I don't understand the "we should see no change"
comment. I still agree that's a price worth paying to get the code
cleanups and if someone screams we can look at adding it back, but the
fact that it was already fragile seems decent enough protection.

>
> Please note that hardware errors (PG_hwpoison) are not memory holes and
> not affected by this change when offlining.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 561371ead39a..8d81730cf036 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1447,10 +1447,19 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>                 node_clear_state(node, N_MEMORY);
>  }
>
> +static int count_system_ram_pages_cb(unsigned long start_pfn,
> +                                    unsigned long nr_pages, void *data)
> +{
> +       unsigned long *nr_system_ram_pages = data;
> +
> +       *nr_system_ram_pages += nr_pages;
> +       return 0;
> +}
> +
>  static int __ref __offline_pages(unsigned long start_pfn,
>                   unsigned long end_pfn)
>  {
> -       unsigned long pfn, nr_pages;
> +       unsigned long pfn, nr_pages = 0;
>         unsigned long offlined_pages = 0;
>         int ret, node, nr_isolate_pageblock;
>         unsigned long flags;
> @@ -1461,6 +1470,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
>
>         mem_hotplug_begin();
>
> +       /*
> +        * Don't allow to offline memory blocks that contain holes.
> +        * Consecuently, memory blocks with holes can never get onlined

s/Consecuently/Consequently/

> +        * (hotplugged memory has no holes and all boot memory is online).
> +        * This allows to simplify the onlining/offlining code quite a lot.
> +        */

The last sentence of this comment makes sense in the context of this
patch, but I don't think it stands by itself in the code base after
the fact. The person reading the comment can't see the simplifications
because the code is already gone. I'd clarify it to talk about why it
is safe to not mess around with PG_Reserved in the hotplug path
because of this check.

After those clarifications you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
David Hildenbrand Nov. 5, 2019, 9:31 a.m. UTC | #2
On 05.11.19 02:30, Dan Williams wrote:
> On Thu, Oct 24, 2019 at 5:10 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Our onlining/offlining code is unnecessarily complicated. Only memory
>> blocks added during boot can have holes (a range that is not
>> IORESOURCE_SYSTEM_RAM). Hotplugged memory never has holes (e.g., see
>> add_memory_resource()). All boot memory is alread online.
> 
> s/alread/already/
> 

Thanks.

> ...also perhaps clarify "already online" by what point in time and why
> that is relevant. For example a description of the difference between
> the SetPageReserved() in the bootmem path and the one in the hotplug
> path.

Will add.

> 
>> Therefore, when we stop allowing to offline memory blocks with holes, we
>> implicitly no longer have to deal with onlining memory blocks with holes.
> 
> Maybe an explicit reference of the code areas that deal with holes
> would help to back up that assertion. Certainly it would have saved me
> some time for the review.

I can add a reference to the onlining code that will only online pages 
that don't fall into memory holes.

> 
>> This allows to simplify the code. For example, we no longer have to
>> worry about marking pages that fall into memory holes PG_reserved when
>> onlining memory. We can stop setting pages PG_reserved.
> 
> ...but not for bootmem, right?

Yes, bootmem is not changed. (especially, early allocations and memory 
holes are marked PG_reserved - basically everything not given to the 
buddy after boot)

> 
>>
>> Offlining memory blocks added during boot is usually not guranteed to work
> 
> s/guranteed/guaranteed/

Thanks.

> 
>> either way (unmovable data might have easily ended up on that memory during
>> boot). So stopping to do that should not really hurt (+ people are not
>> even aware of a setup where that used to work
> 
> Maybe put a "Link: https://lkml.kernel.org/r/$msg_id" to that discussion?
> 
>> and that the existing code
>> still works correctly with memory holes). For the use case of offlining
>> memory to unplug DIMMs, we should see no change. (holes on DIMMs would be
>> weird).
> 
> However, less memory can be offlined than was theoretically allowed
> previously, so I don't understand the "we should see no change"
> comment. I still agree that's a price worth paying to get the code
> cleanups and if someone screams we can look at adding it back, but the
> fact that it was already fragile seems decent enough protection.

Hotplugged DIMMs (add_memory()) have no holes and will therefore see no 
change.

>>
>> Please note that hardware errors (PG_hwpoison) are not memory holes and
>> not affected by this change when offlining.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   mm/memory_hotplug.c | 26 ++++++++++++++++++++++++--
>>   1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 561371ead39a..8d81730cf036 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1447,10 +1447,19 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>>                  node_clear_state(node, N_MEMORY);
>>   }
>>
>> +static int count_system_ram_pages_cb(unsigned long start_pfn,
>> +                                    unsigned long nr_pages, void *data)
>> +{
>> +       unsigned long *nr_system_ram_pages = data;
>> +
>> +       *nr_system_ram_pages += nr_pages;
>> +       return 0;
>> +}
>> +
>>   static int __ref __offline_pages(unsigned long start_pfn,
>>                    unsigned long end_pfn)
>>   {
>> -       unsigned long pfn, nr_pages;
>> +       unsigned long pfn, nr_pages = 0;
>>          unsigned long offlined_pages = 0;
>>          int ret, node, nr_isolate_pageblock;
>>          unsigned long flags;
>> @@ -1461,6 +1470,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>
>>          mem_hotplug_begin();
>>
>> +       /*
>> +        * Don't allow to offline memory blocks that contain holes.
>> +        * Consecuently, memory blocks with holes can never get onlined
> 
> s/Consecuently/Consequently/

Thanks.

> 
>> +        * (hotplugged memory has no holes and all boot memory is online).
>> +        * This allows to simplify the onlining/offlining code quite a lot.
>> +        */
> 
> The last sentence of this comment makes sense in the context of this
> patch, but I don't think it stands by itself in the code base after
> the fact. The person reading the comment can't see the simplifications
> because the code is already gone. I'd clarify it to talk about why it
> is safe to not mess around with PG_Reserved in the hotplug path
> because of this check.

I'll think of something. It's not just the PG_reserved handling but the 
whole pfn_valid()/walk_system_ram_range() handling that can be simplified.

> 
> After those clarifications you can add:
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> 

Thanks!
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 561371ead39a..8d81730cf036 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1447,10 +1447,19 @@  static void node_states_clear_node(int node, struct memory_notify *arg)
 		node_clear_state(node, N_MEMORY);
 }
 
+static int count_system_ram_pages_cb(unsigned long start_pfn,
+				     unsigned long nr_pages, void *data)
+{
+	unsigned long *nr_system_ram_pages = data;
+
+	*nr_system_ram_pages += nr_pages;
+	return 0;
+}
+
 static int __ref __offline_pages(unsigned long start_pfn,
 		  unsigned long end_pfn)
 {
-	unsigned long pfn, nr_pages;
+	unsigned long pfn, nr_pages = 0;
 	unsigned long offlined_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
@@ -1461,6 +1470,20 @@  static int __ref __offline_pages(unsigned long start_pfn,
 
 	mem_hotplug_begin();
 
+	/*
+	 * Don't allow to offline memory blocks that contain holes.
+	 * Consecuently, memory blocks with holes can never get onlined
+	 * (hotplugged memory has no holes and all boot memory is online).
+	 * This allows to simplify the onlining/offlining code quite a lot.
+	 */
+	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &nr_pages,
+			      count_system_ram_pages_cb);
+	if (nr_pages != end_pfn - start_pfn) {
+		ret = -EINVAL;
+		reason = "memory holes";
+		goto failed_removal;
+	}
+
 	/* This makes hotplug much easier...and readable.
 	   we assume this for now. .*/
 	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
@@ -1472,7 +1495,6 @@  static int __ref __offline_pages(unsigned long start_pfn,
 
 	zone = page_zone(pfn_to_page(valid_start));
 	node = zone_to_nid(zone);
-	nr_pages = end_pfn - start_pfn;
 
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,