[v2,0/8] mm/memory_hotplug: Factor out memory block device handling

Message ID 20190507183804.5512-1-david@redhat.com (mailing list archive)

Message

David Hildenbrand May 7, 2019, 6:37 p.m. UTC
We only want memory block devices for memory that is to be onlined/offlined
(added to/removed from the buddy allocator). This is required so user space
can online/offline memory and kdump gets notified about newly onlined memory.

Only such memory has the requirement of having to span whole memory blocks.
Let's factor out creation/removal of memory block devices. This helps
to further clean up arch_add_memory()/arch_remove_memory() and to make
implementation of new features easier. E.g. supplying a driver for
memory block devices becomes way easier (so user space is able to
distinguish different types of added memory to properly online it).

Patch 1 makes sure the memory block size granularity is always respected.
Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares
arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE.
Patches 4, 5 and 6 factor out creation/removal of memory block devices.
Patch 7 gets rid of some unlikely errors that could have happened, not
removing links between memory block devices and nodes, previously brought
up by Oscar.
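
As a rough illustration of the granularity rule that patch 1 is about, here
is a userspace sketch. It is not the kernel code: BLOCK_SIZE_BYTES is an
assumed stand-in for whatever memory_block_size_bytes() returns (128 MiB is
picked arbitrarily), and range_is_block_aligned() is a hypothetical helper
name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for memory_block_size_bytes(); must be a power of two. */
#define BLOCK_SIZE_BYTES (128ULL << 20)

/*
 * Hypothetical helper: true if [start, start + size) is non-empty and
 * both its start and its size are multiples of the memory block size.
 */
static bool range_is_block_aligned(uint64_t start, uint64_t size)
{
	const uint64_t mask = BLOCK_SIZE_BYTES - 1;

	if (size == 0)
		return false;
	return (start & mask) == 0 && (size & mask) == 0;
}
```

With this sketch, a range starting at 0 and covering one full block passes,
while a range that starts 4 KiB into a block or covers only half a block is
rejected.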

Did a quick sanity test with DIMM plug/unplug, making sure all devices
and sysfs links properly get added/removed. Compile tested on s390x and
x86-64.

Based on git://git.cmpxchg.org/linux-mmots.git

Next refactoring on my list will be making sure that remove_memory()
will never deal with zones / access "struct pages". Any kind of zone
handling will have to be done when offlining system memory / before
removing device memory. I am thinking about "remove_pfn_range_from_zone()",
to undo everything "move_pfn_range_to_zone()" did.

v1 -> v2:
- s390x/mm: Implement arch_remove_memory()
-- remove mapping after "__remove_pages"


David Hildenbrand (8):
  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  s390x/mm: Implement arch_remove_memory()
  mm/memory_hotplug: arch_remove_memory() and __remove_pages() with
    CONFIG_MEMORY_HOTPLUG
  mm/memory_hotplug: Create memory block devices after arch_add_memory()
  mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  mm/memory_hotplug: Remove memory block devices before
    arch_remove_memory()
  mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
    fail
  mm/memory_hotplug: Remove "zone" parameter from
    sparse_remove_one_section

 arch/ia64/mm/init.c            |   2 -
 arch/powerpc/mm/mem.c          |   2 -
 arch/s390/mm/init.c            |  15 +++--
 arch/sh/mm/init.c              |   2 -
 arch/x86/mm/init_32.c          |   2 -
 arch/x86/mm/init_64.c          |   2 -
 drivers/base/memory.c          | 109 +++++++++++++++++++--------------
 drivers/base/node.c            |  27 +++-----
 include/linux/memory.h         |   6 +-
 include/linux/memory_hotplug.h |  12 +---
 include/linux/node.h           |   7 +--
 mm/memory_hotplug.c            |  44 ++++++-------
 mm/sparse.c                    |  10 +--
 13 files changed, 104 insertions(+), 136 deletions(-)

Comments

Dan Williams May 7, 2019, 7:04 p.m. UTC | #1
On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> [...]
>   mm/memory_hotplug: Drop MHP_MEMBLOCK_API

So at a minimum we need a bit of patch staging guidance because this
obviously collides with the subsection bits that are built on top of
the existence of MHP_MEMBLOCK_API. What trigger do you envision as a
replacement that arch_add_memory() can use to determine that subsection
operations should be disallowed?
David Hildenbrand May 7, 2019, 7:21 p.m. UTC | #2
On 07.05.19 21:04, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> [...]
>>   mm/memory_hotplug: Drop MHP_MEMBLOCK_API
> 
> So at a minimum we need a bit of patch staging guidance because this
> obviously collides with the subsection bits that are built on top of
> the existence of MHP_MEMBLOCK_API. What trigger do you envision as a
> replacement that arch_add_memory() can use to determine that subsection
> operations should be disallowed?
> 

Looks like we now have time to sort it out :)


Looking at your series

[PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges

is effectively the "single" place using MHP_MEMBLOCK_API, namely
"subsection_check()". It is used when adding/removing memory.


+static int subsection_check(unsigned long pfn, unsigned long nr_pages,
+		unsigned long flags, const char *reason)
+{
+	/*
+	 * Only allow partial section hotplug for !memblock ranges,
+	 * since register_new_memory() requires section alignment, and
+	 * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
+	 * populated.
+	 */
+	if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+				|| (flags & MHP_MEMBLOCK_API))
+			&& ((pfn & ~PAGE_SECTION_MASK)
+				|| (nr_pages & ~PAGE_SECTION_MASK))) {
+		WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
+				(flags & MHP_MEMBLOCK_API)
+				? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
+		return -EINVAL;
+	}
+	return 0;
 }


The MHP_MEMBLOCK_API part of that condition,

(flags & MHP_MEMBLOCK_API) && ((pfn & ~PAGE_SECTION_MASK) ||
(nr_pages & ~PAGE_SECTION_MASK))

sounds like something the caller (add_memory()) always has to take care
of. No need to check. The one imposing this restriction is the only caller.
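
For readers not familiar with the mask arithmetic, the alignment half of the
condition can be tried out in userspace with the sketch below. The value of
PAGES_PER_SECTION is an assumption (32768 pages, i.e. 128 MiB sections with
4 KiB pages), and is_subsection_range() is a hypothetical name, not something
from either series.

```c
#include <stdbool.h>

/* Assumption: 32768 pages per section (128 MiB sections, 4 KiB pages). */
#define PAGES_PER_SECTION (1UL << 15)
#define PAGE_SECTION_MASK (~(PAGES_PER_SECTION - 1))

/*
 * Hypothetical helper: true if the pfn range does not start on a section
 * boundary or does not cover a whole number of sections.
 */
static bool is_subsection_range(unsigned long pfn, unsigned long nr_pages)
{
	return (pfn & ~PAGE_SECTION_MASK) || (nr_pages & ~PAGE_SECTION_MASK);
}
```

A caller that only ever passes section-aligned ranges never sees this return
true, which is the reason the separate check looks redundant for such a
caller.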

In my opinion, that check/function can go away completely.

Am I missing something / missing another user?
David Hildenbrand May 7, 2019, 7:37 p.m. UTC | #3
On 07.05.19 21:21, David Hildenbrand wrote:
> [...]
>
> In my opinion, that check/function can go away completely.
> 
> Am I missing something / missing another user?
> 

In other words, this series moves the restriction out of
arch_add_memory() and therefore you don't need subsection_check() at all
anymore. At least if I am not missing something :)
Dan Williams May 7, 2019, 8:36 p.m. UTC | #4
On Tue, May 7, 2019 at 12:38 PM David Hildenbrand <david@redhat.com> wrote:
>
> [...]
>
> In other words, this series moves the restriction out of
> arch_add_memory() and therefore you don't need subsection_check() at all
> anymore. At least if I am not missing something :)

Ah, ok. Only direct arch_add_memory() users need to be worried about
subsection hotplug and the add_memory_resource() + __remove_memory()
paths are already protected by check_hotplug_memory_range(). Ok, I can
get on board with the removal.

Let me go ahead and review this series so Andrew can get it pulled in
and I can rebase.