diff mbox series

[RFC,v1,01/12] mm/memory_hotplug: Don't allow to online/offline memory blocks with holes

Message ID 20191022171239.21487-2-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE) | expand

Commit Message

David Hildenbrand Oct. 22, 2019, 5:12 p.m. UTC
Our onlining/offlining code is unnecessarily complicated. Only memory
blocks added during boot can have holes. Hotplugged memory never has
holes. That memory is already online.

When we stop allowing to offline memory blocks with holes, we implicitly
stop to online memory blocks with holes.

This allows to simplify the code. For example, we no longer have to
worry about marking pages that fall into memory holes PG_reserved when
onlining memory. We can stop setting pages PG_reserved.

Offlining memory blocks added during boot is usually not guranteed to work
either way. So stopping to do that (if anybody really used and tested
this over the years) should not really hurt. For the use case of
offlining memory to unplug DIMMs, we should see no change. (holes on
DIMMs would be weird)

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Anshuman Khandual Oct. 24, 2019, 3:53 a.m. UTC | #1
On 10/22/2019 10:42 PM, David Hildenbrand wrote:
> Our onlining/offlining code is unnecessarily complicated. Only memory
> blocks added during boot can have holes. Hotplugged memory never has
> holes. That memory is already online.

Why hot plugged memory at runtime cannot have holes (e.g a semi bad DIMM).
Currently, do we just abort adding that memory block if there are holes ?

> 
> When we stop allowing to offline memory blocks with holes, we implicitly
> stop to online memory blocks with holes.

Reducing hotplug support for memory blocks with holes just to simplify
the code. Is it worth ?

> 
> This allows to simplify the code. For example, we no longer have to
> worry about marking pages that fall into memory holes PG_reserved when
> onlining memory. We can stop setting pages PG_reserved.

Could not there be any other way of tracking these holes if not the page
reserved bit. In the memory section itself and corresponding struct pages
just remained poisoned ? Just wondering, might be all wrong here.

> 
> Offlining memory blocks added during boot is usually not guranteed to work
> either way. So stopping to do that (if anybody really used and tested

That guarantee does not exist right now because how boot memory could have
been used after boot not from a limitation of the memory hot remove itself.

> this over the years) should not really hurt. For the use case of
> offlining memory to unplug DIMMs, we should see no change. (holes on
> DIMMs would be weird)

Holes on DIMM could be due to HW errors affecting only parts of it. By not
allowing such DIMM's hot add and remove, we are definitely reducing the
scope of overall hotplug functionality. Is code simplification in itself
is worth this reduction in functionality ?

> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 561371ead39a..7210f4375279 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1447,10 +1447,19 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>  		node_clear_state(node, N_MEMORY);
>  }
>  
> +static int count_system_ram_pages_cb(unsigned long start_pfn,
> +				     unsigned long nr_pages, void *data)
> +{
> +	unsigned long *nr_system_ram_pages = data;
> +
> +	*nr_system_ram_pages += nr_pages;
> +	return 0;
> +}
> +
>  static int __ref __offline_pages(unsigned long start_pfn,
>  		  unsigned long end_pfn)
>  {
> -	unsigned long pfn, nr_pages;
> +	unsigned long pfn, nr_pages = 0;
>  	unsigned long offlined_pages = 0;
>  	int ret, node, nr_isolate_pageblock;
>  	unsigned long flags;
> @@ -1461,6 +1470,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  
>  	mem_hotplug_begin();
>  
> +	/*
> +	 * We don't allow to offline memory blocks that contain holes
> +	 * and consecuently don't allow to online memory blocks that contain
> +	 * holes. This allows to simplify the code quite a lot and we don't
> +	 * have to mess with PG_reserved pages for memory holes.
> +	 */
> +	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &nr_pages,
> +			      count_system_ram_pages_cb);
> +	if (nr_pages != end_pfn - start_pfn) {
> +		ret = -EINVAL;
> +		reason = "memory holes";
> +		goto failed_removal;
> +	}
> +
>  	/* This makes hotplug much easier...and readable.
>  	   we assume this for now. .*/
>  	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
> @@ -1472,7 +1495,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  
>  	zone = page_zone(pfn_to_page(valid_start));
>  	node = zone_to_nid(zone);
> -	nr_pages = end_pfn - start_pfn;
>  
>  	/* set above range as isolated */
>  	ret = start_isolate_page_range(start_pfn, end_pfn,
>
David Hildenbrand Oct. 24, 2019, 7:55 a.m. UTC | #2
On 24.10.19 05:53, Anshuman Khandual wrote:
> 
> On 10/22/2019 10:42 PM, David Hildenbrand wrote:
>> Our onlining/offlining code is unnecessarily complicated. Only memory
>> blocks added during boot can have holes. Hotplugged memory never has
>> holes. That memory is already online.
> 
> Why hot plugged memory at runtime cannot have holes (e.g a semi bad DIMM).

Important: HWPoison != memory hole

A memory hole is memory that is not "IORESOURCE_SYSRAM". These pages are 
currently marked PG_reserved. Such holes are sometimes used for mapping 
something into kernel space. Some archs use the PG_reserved to detect 
the memory hole ("not ram") and ignore the memmap.

Poisoned pages are marked PG_hwpoison.

> Currently, do we just abort adding that memory block if there are holes ?

There is no interface to do that.

E.g., have a look at add_memory() add_memory_resource(). You can only 
pass one memory resource (that is all IORESOURCE_SYSRAM | IORESOURCE_BUSY)

Hotplugging memory with holes is not supported (nor can I imagine a use 
case for that).

>>
>> When we stop allowing to offline memory blocks with holes, we implicitly
>> stop to online memory blocks with holes.
> 
> Reducing hotplug support for memory blocks with holes just to simplify
> the code. Is it worth ?

Me and Michal are not aware of a users, not even aware of a use case. 
Keeping code around that nobody really needs that limits cleanups, no 
thanks. Similar to us not supporting to offline memory blocks that span 
multiple nodes/zones.

E.g., have a look at the isolation code. It is full of code that jumps 
over memory holes (start_isolate_page_range() -> __first_valid_page()). 
That made sense for our complicated memory offlining code, but it is 
actually harmful when dealing with alloc_contig_range(). Allocation 
never wants to jump over memory holes. After this patch, we can just 
fail hard on any memory hole we detect, instead of ignoring it (or 
special-casing it).

> 
>>
>> This allows to simplify the code. For example, we no longer have to
>> worry about marking pages that fall into memory holes PG_reserved when
>> onlining memory. We can stop setting pages PG_reserved.
> 
> Could not there be any other way of tracking these holes if not the page
> reserved bit. In the memory section itself and corresponding struct pages
> just remained poisoned ? Just wondering, might be all wrong here.

Of course there could be ways (e.g., using PG_offline eventually), but 
it boils down to us having to deal with it in onlining/offlining code. 
And that is some handling nobody really seems to need.

> 
>>
>> Offlining memory blocks added during boot is usually not guranteed to work
>> either way. So stopping to do that (if anybody really used and tested
> 
> That guarantee does not exist right now because how boot memory could have
> been used after boot not from a limitation of the memory hot remove itself.

Yep. However, Michal and I are not even aware of a setup that would made 
this work and guarantee that the existing code actually still is able to 
deal with holes. Are you?

> 
>> this over the years) should not really hurt. For the use case of
>> offlining memory to unplug DIMMs, we should see no change. (holes on
>> DIMMs would be weird)
> 
> Holes on DIMM could be due to HW errors affecting only parts of it. By not

Again, HW errors != holes. We have PG_hwpoison for that.

> allowing such DIMM's hot add and remove, we are definitely reducing the
> scope of overall hotplug functionality. Is code simplification in itself
> is worth this reduction in functionality ?

What you describe is not affected.

Thanks!
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 561371ead39a..7210f4375279 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1447,10 +1447,19 @@  static void node_states_clear_node(int node, struct memory_notify *arg)
 		node_clear_state(node, N_MEMORY);
 }
 
+static int count_system_ram_pages_cb(unsigned long start_pfn,
+				     unsigned long nr_pages, void *data)
+{
+	unsigned long *nr_system_ram_pages = data;
+
+	*nr_system_ram_pages += nr_pages;
+	return 0;
+}
+
 static int __ref __offline_pages(unsigned long start_pfn,
 		  unsigned long end_pfn)
 {
-	unsigned long pfn, nr_pages;
+	unsigned long pfn, nr_pages = 0;
 	unsigned long offlined_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
@@ -1461,6 +1470,20 @@  static int __ref __offline_pages(unsigned long start_pfn,
 
 	mem_hotplug_begin();
 
+	/*
+	 * We don't allow to offline memory blocks that contain holes
+	 * and consecuently don't allow to online memory blocks that contain
+	 * holes. This allows to simplify the code quite a lot and we don't
+	 * have to mess with PG_reserved pages for memory holes.
+	 */
+	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &nr_pages,
+			      count_system_ram_pages_cb);
+	if (nr_pages != end_pfn - start_pfn) {
+		ret = -EINVAL;
+		reason = "memory holes";
+		goto failed_removal;
+	}
+
 	/* This makes hotplug much easier...and readable.
 	   we assume this for now. .*/
 	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
@@ -1472,7 +1495,6 @@  static int __ref __offline_pages(unsigned long start_pfn,
 
 	zone = page_zone(pfn_to_page(valid_start));
 	node = zone_to_nid(zone);
-	nr_pages = end_pfn - start_pfn;
 
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,