mbox series

[v3,0/5] implement "memmap on memory" feature on s390

Message ID 20231127082023.2079810-1-sumanthk@linux.ibm.com (mailing list archive)
Headers show
Series implement "memmap on memory" feature on s390 | expand

Message

Sumanth Korikkar Nov. 27, 2023, 8:20 a.m. UTC
Hi All,

The patch series implements "memmap on memory" feature on s390.

Patch 1 introduces  MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory
notifiers to prepare the transition of memory to and from a physically
accessible state. New mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced to
ensure altmap cannot be written when addidng memory - before it is set
online. This enhancement is crucial for implementing the "memmap on
memory" feature for s390 in a subsequent patch.

Patches 2 allocates vmemmap pages from self-contained memory range for
s390. It allocates memory map (struct pages array) from the hotplugged
memory range, rather than using system memory by passing altmap to
vmemmap functions.

Patch 3 removes unhandled memory notifier types on s390.

Patch 4 implements MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory
notifiers on s390. MEM_PREPARE_ONLINE memory notifier makes memory block
physical accessible via sclp assign command. The notifier ensures
self-contained memory maps are accessible and hence enabling the "memmap
on memory" on s390. MEM_FINISH_OFFLINE memory notifier shifts the memory
block to an inaccessible state via sclp unassign command

Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390

Note:
These patches are rebased on top of three fixes:
mm: use vmem_altmap code without CONFIG_ZONE_DEVICE
mm/memory_hotplug: fix error handling in add_memory_resource()
mm/memory_hotplug: add missing mem_hotplug_lock

v3:
* added comments to MHP_OFFLINE_ACCESSIBLE as suggested by David.
* Squashed three commits related to new memory notifier.

v2:
* Fixes are integrated and hence removed from this patch series
Suggestions from David:
* Add new flag MHP_OFFLINE_INACCESSIBLE to avoid accessing memory
  during memory hotplug addition phase.
* Avoid page_init_poison() on memmap during mhp addition phase, when
  MHP_OFFLINE_INACCESSIBLE mhp_flag is passed in add_memory().
* Do not skip add_pages() in arch_add_memory(). Similarly, remove
  similar hacks in arch_remove_memory(). 
* Use MHP_PREPARE_ONLINE/MHP_FINISH_OFFLINE naming convention for
  new memory notifiers.
* Rearrange removal of unused s390 memory notifier.
* Necessary commit messages changes.

Thank you

Sumanth Korikkar (5):
  mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE
    notifiers
  s390/mm: allocate vmemmap pages from self-contained memory range
  s390/sclp: remove unhandled memory notifier type
  s390/mm: implement MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
  s390: enable MHP_MEMMAP_ON_MEMORY

 arch/s390/Kconfig              |  1 +
 arch/s390/mm/init.c            |  3 --
 arch/s390/mm/vmem.c            | 62 +++++++++++++++++++---------------
 drivers/base/memory.c          | 21 ++++++++++--
 drivers/s390/char/sclp_cmd.c   | 31 ++++++++++++-----
 include/linux/memory.h         |  2 ++
 include/linux/memory_hotplug.h | 18 +++++++++-
 include/linux/memremap.h       |  1 +
 mm/memory_hotplug.c            | 30 ++++++++++++++--
 mm/sparse.c                    |  3 +-
 10 files changed, 127 insertions(+), 45 deletions(-)

Comments

David Hildenbrand Nov. 27, 2023, 3:11 p.m. UTC | #1
On 27.11.23 09:20, Sumanth Korikkar wrote:
> MEM_PREPARE_ONLINE memory notifier makes memory block physical
> accessible via sclp assign command. The notifier ensures self-contained
> memory maps are accessible and hence enabling the "memmap on memory" on
> s390.
> 
> MEM_FINISH_OFFLINE memory notifier shifts the memory block to an
> inaccessible state via sclp unassign command.
> 
> Implementation considerations:
> * When MHP_MEMMAP_ON_MEMORY is disabled, the system retains the old
>    behavior. This means the memory map is allocated from default memory.
> * If MACHINE_HAS_EDAT1 is unavailable, MHP_MEMMAP_ON_MEMORY is
>    automatically disabled. This ensures that vmemmap pagetables do not
>    consume additional memory from the default memory allocator.
> * The MEM_GOING_ONLINE notifier has been modified to perform no
>    operation, as MEM_PREPARE_ONLINE already executes the sclp assign
>    command.
> * The MEM_CANCEL_ONLINE/MEM_OFFLINE notifier now performs no operation, as
>    MEM_FINISH_OFFLINE already executes the sclp unassign command.
> 
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> ---
>   drivers/s390/char/sclp_cmd.c | 28 ++++++++++++++++++++++------
>   1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
> index 355e63e44e95..30b829e4c052 100644
> --- a/drivers/s390/char/sclp_cmd.c
> +++ b/drivers/s390/char/sclp_cmd.c
> @@ -18,6 +18,7 @@
>   #include <linux/mm.h>
>   #include <linux/mmzone.h>
>   #include <linux/memory.h>
> +#include <linux/memory_hotplug.h>
>   #include <linux/module.h>
>   #include <asm/ctlreg.h>
>   #include <asm/chpid.h>
> @@ -26,6 +27,7 @@
>   #include <asm/sclp.h>
>   #include <asm/numa.h>
>   #include <asm/facility.h>
> +#include <asm/page-states.h>
>   
>   #include "sclp.h"
>   
> @@ -319,6 +321,7 @@ static bool contains_standby_increment(unsigned long start, unsigned long end)
>   static int sclp_mem_notifier(struct notifier_block *nb,
>   			     unsigned long action, void *data)
>   {
> +	struct memory_block *memory_block;
>   	unsigned long start, size;
>   	struct memory_notify *arg;
>   	unsigned char id;
> @@ -340,18 +343,29 @@ static int sclp_mem_notifier(struct notifier_block *nb,
>   		if (contains_standby_increment(start, start + size))
>   			rc = -EPERM;
>   		break;
> -	case MEM_GOING_ONLINE:
> +	case MEM_PREPARE_ONLINE:
> +		memory_block = find_memory_block(pfn_to_section_nr(arg->start_pfn));
> +		if (!memory_block) {
> +			rc = -EINVAL;
> +			goto out;
> +		}
>   		rc = sclp_mem_change_state(start, size, 1);
> +		if (rc || !memory_block->altmap)
> +			goto out;
> +		/*
> +		 * Set CMMA state to nodat here, since the struct page memory
> +		 * at the beginning of the memory block will not go through the
> +		 * buddy allocator later.
> +		 */
> +		__arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);

Looking up the memory block and grabbing the altmap from there is a bit 
unfortunate.

Why can't we do that when adding the altmap? Will the hypervisor scream 
at us?

... would we want to communicate any altmap start+size via the memory 
notifier instead?
Sumanth Korikkar Nov. 27, 2023, 4:12 p.m. UTC | #2
On Mon, Nov 27, 2023 at 04:11:05PM +0100, David Hildenbrand wrote:
> > diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
> > index 355e63e44e95..30b829e4c052 100644
> > --- a/drivers/s390/char/sclp_cmd.c
> > +++ b/drivers/s390/char/sclp_cmd.c
> > @@ -18,6 +18,7 @@
> >   #include <linux/mm.h>
> >   #include <linux/mmzone.h>
> >   #include <linux/memory.h>
> > +#include <linux/memory_hotplug.h>
> >   #include <linux/module.h>
> >   #include <asm/ctlreg.h>
> >   #include <asm/chpid.h>
> > @@ -26,6 +27,7 @@
> >   #include <asm/sclp.h>
> >   #include <asm/numa.h>
> >   #include <asm/facility.h>
> > +#include <asm/page-states.h>
> >   #include "sclp.h"
> > @@ -319,6 +321,7 @@ static bool contains_standby_increment(unsigned long start, unsigned long end)
> >   static int sclp_mem_notifier(struct notifier_block *nb,
> >   			     unsigned long action, void *data)
> >   {
> > +	struct memory_block *memory_block;
> >   	unsigned long start, size;
> >   	struct memory_notify *arg;
> >   	unsigned char id;
> > @@ -340,18 +343,29 @@ static int sclp_mem_notifier(struct notifier_block *nb,
> >   		if (contains_standby_increment(start, start + size))
> >   			rc = -EPERM;
> >   		break;
> > -	case MEM_GOING_ONLINE:
> > +	case MEM_PREPARE_ONLINE:
> > +		memory_block = find_memory_block(pfn_to_section_nr(arg->start_pfn));
> > +		if (!memory_block) {
> > +			rc = -EINVAL;
> > +			goto out;
> > +		}
> >   		rc = sclp_mem_change_state(start, size, 1);
> > +		if (rc || !memory_block->altmap)
> > +			goto out;
> > +		/*
> > +		 * Set CMMA state to nodat here, since the struct page memory
> > +		 * at the beginning of the memory block will not go through the
> > +		 * buddy allocator later.
> > +		 */
> > +		__arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
> 
> Looking up the memory block and grabbing the altmap from there is a bit
> unfortunate.
> 
> Why can't we do that when adding the altmap? Will the hypervisor scream at
> us?
> 
calling __arch_set_page_nodat() before making memory block accessible
will lead to crash. Hence, we think this is the only safe location to
place it.

> ... would we want to communicate any altmap start+size via the memory
> notifier instead?

Passing start, size  of memory range via memory notifier looks correct
approach to me, as we try to make the specified range accessible.

If we want to pass altmap size (nr_vmemmap_pages), then we might need a
new field in struct memory_notify, which would prevent access of
memory_block->altmap->free in the notifier.

Do you want to take this approach instead?

If yes, Then I could add a new field nr_vmemmap_pages in struct
memory_notify and place it in PATCH : introduce
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers.


Thanks
David Hildenbrand Nov. 27, 2023, 4:58 p.m. UTC | #3
On 27.11.23 17:12, Sumanth Korikkar wrote:
> On Mon, Nov 27, 2023 at 04:11:05PM +0100, David Hildenbrand wrote:
>>> diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
>>> index 355e63e44e95..30b829e4c052 100644
>>> --- a/drivers/s390/char/sclp_cmd.c
>>> +++ b/drivers/s390/char/sclp_cmd.c
>>> @@ -18,6 +18,7 @@
>>>    #include <linux/mm.h>
>>>    #include <linux/mmzone.h>
>>>    #include <linux/memory.h>
>>> +#include <linux/memory_hotplug.h>
>>>    #include <linux/module.h>
>>>    #include <asm/ctlreg.h>
>>>    #include <asm/chpid.h>
>>> @@ -26,6 +27,7 @@
>>>    #include <asm/sclp.h>
>>>    #include <asm/numa.h>
>>>    #include <asm/facility.h>
>>> +#include <asm/page-states.h>
>>>    #include "sclp.h"
>>> @@ -319,6 +321,7 @@ static bool contains_standby_increment(unsigned long start, unsigned long end)
>>>    static int sclp_mem_notifier(struct notifier_block *nb,
>>>    			     unsigned long action, void *data)
>>>    {
>>> +	struct memory_block *memory_block;
>>>    	unsigned long start, size;
>>>    	struct memory_notify *arg;
>>>    	unsigned char id;
>>> @@ -340,18 +343,29 @@ static int sclp_mem_notifier(struct notifier_block *nb,
>>>    		if (contains_standby_increment(start, start + size))
>>>    			rc = -EPERM;
>>>    		break;
>>> -	case MEM_GOING_ONLINE:
>>> +	case MEM_PREPARE_ONLINE:
>>> +		memory_block = find_memory_block(pfn_to_section_nr(arg->start_pfn));
>>> +		if (!memory_block) {
>>> +			rc = -EINVAL;
>>> +			goto out;
>>> +		}
>>>    		rc = sclp_mem_change_state(start, size, 1);
>>> +		if (rc || !memory_block->altmap)
>>> +			goto out;
>>> +		/*
>>> +		 * Set CMMA state to nodat here, since the struct page memory
>>> +		 * at the beginning of the memory block will not go through the
>>> +		 * buddy allocator later.
>>> +		 */
>>> +		__arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
>>
>> Looking up the memory block and grabbing the altmap from there is a bit
>> unfortunate.
>>
>> Why can't we do that when adding the altmap? Will the hypervisor scream at
>> us?
>>
> calling __arch_set_page_nodat() before making memory block accessible
> will lead to crash. Hence, we think this is the only safe location to
> place it.
> 
>> ... would we want to communicate any altmap start+size via the memory
>> notifier instead?
> 
> Passing start, size  of memory range via memory notifier looks correct
> approach to me, as we try to make the specified range accessible.
> 
> If we want to pass altmap size (nr_vmemmap_pages), then we might need a
> new field in struct memory_notify, which would prevent access of
> memory_block->altmap->free in the notifier.
> 
> Do you want to take this approach instead?
> 
> If yes, Then I could add a new field nr_vmemmap_pages in struct
> memory_notify and place it in PATCH : introduce
> MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers.

Yes, see my other mail. That's probably cleanest!