diff mbox series

[v1,1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

Message ID 20200407135416.24093-2-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm/memory_hotplug: remove is_mem_section_removable() | expand

Commit Message

David Hildenbrand April 7, 2020, 1:54 p.m. UTC
In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
blocks as removable"), the user space interface to compute whether a memory
block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) has effectively been
deprecated. We want to remove the leftovers of the kernel implementation.

When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
we'll start by:
1. Testing if it contains any holes, and reject if so
2. Testing if pages belong to different zones, and reject if so
3. Isolating the page range, checking if it contains any unmovable pages

Using is_mem_section_removable() before trying to offline is not only racy,
it can easily result in false positives/negatives. Let's stop manually
checking is_mem_section_removable(), and let device_offline() handle it
completely instead. We can remove the racy is_mem_section_removable()
implementation next.

We now take more locks (e.g., memory hotplug lock when offlining and the
zone lock when isolating), but maybe we should optimize that
implementation instead if this ever becomes a real problem (after all,
memory unplug is already an expensive operation). We started using
is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
Implement memory hotplug remove in the kernel"), with the initial
hotremove support of lmbs.

Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
 1 file changed, 3 insertions(+), 23 deletions(-)

Comments

Michal Hocko April 7, 2020, 1:58 p.m. UTC | #1
On Tue 07-04-20 15:54:15, David Hildenbrand wrote:
> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> blocks as removable"), the user space interface to compute whether a memory
> block can be offlined (exposed via
> /sys/devices/system/memory/memoryX/removable) has effectively been
> deprecated. We want to remove the leftovers of the kernel implementation.
> 
> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> we'll start by:
> 1. Testing if it contains any holes, and reject if so
> 2. Testing if pages belong to different zones, and reject if so
> 3. Isolating the page range, checking if it contains any unmovable pages
> 
> Using is_mem_section_removable() before trying to offline is not only racy,
> it can easily result in false positives/negatives. Let's stop manually
> checking is_mem_section_removable(), and let device_offline() handle it
> completely instead. We can remove the racy is_mem_section_removable()
> implementation next.
> 
> We now take more locks (e.g., memory hotplug lock when offlining and the
> zone lock when isolating), but maybe we should optimize that
> implementation instead if this ever becomes a real problem (after all,
> memory unplug is already an expensive operation). We started using
> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> Implement memory hotplug remove in the kernel"), with the initial
> hotremove support of lmbs.

I am not familiar with this code but it makes sense to make it sync with
the global behavior.

> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>  1 file changed, 3 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b2cde1732301..5ace2f9a277e 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>  
>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>  {
> -	int i, scns_per_block;
> -	bool rc = true;
> -	unsigned long pfn, block_sz;
> -	u64 phys_addr;
> -
>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>  		return false;
>  
> -	block_sz = memory_block_size_bytes();
> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> -	phys_addr = lmb->base_addr;
> -
>  #ifdef CONFIG_FA_DUMP
>  	/*
>  	 * Don't hot-remove memory that falls in fadump boot memory area
>  	 * and memory that is reserved for capturing old kernel memory.
>  	 */
> -	if (is_fadump_memory_area(phys_addr, block_sz))
> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>  		return false;
>  #endif
> -
> -	for (i = 0; i < scns_per_block; i++) {
> -		pfn = PFN_DOWN(phys_addr);
> -		if (!pfn_in_present_section(pfn)) {
> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
> -			continue;
> -		}
> -
> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
> -	}
> -
> -	return rc;
> +	/* device_offline() will determine if we can actually remove this lmb */
> +	return true;
>  }
>  
>  static int dlpar_add_lmb(struct drmem_lmb *);
> -- 
> 2.25.1
Baoquan He April 8, 2020, 2:46 a.m. UTC | #2
Add Pingfan to CC since he usually handles ppc related bugs for RHEL.

On 04/07/20 at 03:54pm, David Hildenbrand wrote:
> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> blocks as removable"), the user space interface to compute whether a memory
> block can be offlined (exposed via
> /sys/devices/system/memory/memoryX/removable) has effectively been
> deprecated. We want to remove the leftovers of the kernel implementation.

Pingfan, can you have a look at this change on PPC?  Please feel free to
give comments if any concern, or offer ack if it's OK to you.

> 
> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> we'll start by:
> 1. Testing if it contains any holes, and reject if so
> 2. Testing if pages belong to different zones, and reject if so
> 3. Isolating the page range, checking if it contains any unmovable pages
> 
> Using is_mem_section_removable() before trying to offline is not only racy,
> it can easily result in false positives/negatives. Let's stop manually
> checking is_mem_section_removable(), and let device_offline() handle it
> completely instead. We can remove the racy is_mem_section_removable()
> implementation next.
> 
> We now take more locks (e.g., memory hotplug lock when offlining and the
> zone lock when isolating), but maybe we should optimize that
> implementation instead if this ever becomes a real problem (after all,
> memory unplug is already an expensive operation). We started using
> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> Implement memory hotplug remove in the kernel"), with the initial
> hotremove support of lmbs.
> 
> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>  1 file changed, 3 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b2cde1732301..5ace2f9a277e 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>  
>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>  {
> -	int i, scns_per_block;
> -	bool rc = true;
> -	unsigned long pfn, block_sz;
> -	u64 phys_addr;
> -
>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>  		return false;
>  
> -	block_sz = memory_block_size_bytes();
> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> -	phys_addr = lmb->base_addr;
> -
>  #ifdef CONFIG_FA_DUMP
>  	/*
>  	 * Don't hot-remove memory that falls in fadump boot memory area
>  	 * and memory that is reserved for capturing old kernel memory.
>  	 */
> -	if (is_fadump_memory_area(phys_addr, block_sz))
> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>  		return false;
>  #endif
> -
> -	for (i = 0; i < scns_per_block; i++) {
> -		pfn = PFN_DOWN(phys_addr);
> -		if (!pfn_in_present_section(pfn)) {
> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
> -			continue;
> -		}
> -
> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
> -	}
> -
> -	return rc;
> +	/* device_offline() will determine if we can actually remove this lmb */
> +	return true;
>  }
>  
>  static int dlpar_add_lmb(struct drmem_lmb *);
> -- 
> 2.25.1
>
Pingfan Liu April 9, 2020, 2:59 a.m. UTC | #3
On 04/08/2020 10:46 AM, Baoquan He wrote:
> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
> 
> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>> blocks as removable"), the user space interface to compute whether a memory
>> block can be offlined (exposed via
>> /sys/devices/system/memory/memoryX/removable) has effectively been
>> deprecated. We want to remove the leftovers of the kernel implementation.
> 
> Pingfan, can you have a look at this change on PPC?  Please feel free to
> give comments if any concern, or offer ack if it's OK to you.
> 
>>
>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>> we'll start by:
>> 1. Testing if it contains any holes, and reject if so
>> 2. Testing if pages belong to different zones, and reject if so
>> 3. Isolating the page range, checking if it contains any unmovable pages
>>
>> Using is_mem_section_removable() before trying to offline is not only racy,
>> it can easily result in false positives/negatives. Let's stop manually
>> checking is_mem_section_removable(), and let device_offline() handle it
>> completely instead. We can remove the racy is_mem_section_removable()
>> implementation next.
>>
>> We now take more locks (e.g., memory hotplug lock when offlining and the
>> zone lock when isolating), but maybe we should optimize that
>> implementation instead if this ever becomes a real problem (after all,
>> memory unplug is already an expensive operation). We started using
>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>> Implement memory hotplug remove in the kernel"), with the initial
>> hotremove support of lmbs.
>>
>> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index b2cde1732301..5ace2f9a277e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>  
>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>  {
>> -	int i, scns_per_block;
>> -	bool rc = true;
>> -	unsigned long pfn, block_sz;
>> -	u64 phys_addr;
>> -
>>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>  		return false;
>>  
>> -	block_sz = memory_block_size_bytes();
>> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> -	phys_addr = lmb->base_addr;
>> -
>>  #ifdef CONFIG_FA_DUMP
>>  	/*
>>  	 * Don't hot-remove memory that falls in fadump boot memory area
>>  	 * and memory that is reserved for capturing old kernel memory.
>>  	 */
>> -	if (is_fadump_memory_area(phys_addr, block_sz))
>> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>  		return false;
>>  #endif
>> -
>> -	for (i = 0; i < scns_per_block; i++) {
>> -		pfn = PFN_DOWN(phys_addr);
>> -		if (!pfn_in_present_section(pfn)) {
>> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> -			continue;
>> -		}
>> -
>> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> -	}
>> -
>> -	return rc;
>> +	/* device_offline() will determine if we can actually remove this lmb */
>> +	return true;
So I think here swaps the check and do sequence. At least it breaks
dlpar_memory_remove_by_count(). It is doable to remove
is_mem_section_removable(), but here should be more effort to re-arrange
the code.

Thanks,
Pingfan
>>  }
>>  
>>  static int dlpar_add_lmb(struct drmem_lmb *);
>> -- 
>> 2.25.1
>>
Michael Ellerman April 9, 2020, 7:26 a.m. UTC | #4
David Hildenbrand <david@redhat.com> writes:

> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> blocks as removable"), the user space interface to compute whether a memory
> block can be offlined (exposed via
> /sys/devices/system/memory/memoryX/removable) has effectively been
> deprecated. We want to remove the leftovers of the kernel implementation.
>
> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> we'll start by:
> 1. Testing if it contains any holes, and reject if so
> 2. Testing if pages belong to different zones, and reject if so
> 3. Isolating the page range, checking if it contains any unmovable pages
>
> Using is_mem_section_removable() before trying to offline is not only racy,
> it can easily result in false positives/negatives. Let's stop manually
> checking is_mem_section_removable(), and let device_offline() handle it
> completely instead. We can remove the racy is_mem_section_removable()
> implementation next.
>
> We now take more locks (e.g., memory hotplug lock when offlining and the
> zone lock when isolating), but maybe we should optimize that
> implementation instead if this ever becomes a real problem (after all,
> memory unplug is already an expensive operation). We started using
> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> Implement memory hotplug remove in the kernel"), with the initial
> hotremove support of lmbs.

It's also not very pretty in dmesg.

Before:

  pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
  pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
  dlpar: Could not handle DLPAR request "memory add count 10"

After:

  pseries-hotplug-mem: Attempting to hot-remove 10 LMB(s)
  page:c00c000001ca8200 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc00000072a080180
  flags: 0x7ffff000000200(slab)
  raw: 007ffff000000200 c00c000001cffd48 c000000781c51410 c000000793327580
  raw: c00000072a080180 0000000001550001 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001cc4a80 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc00000079b110080
  flags: 0x7ffff000000000()
  raw: 007ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
  raw: c00000079b110080 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001d08200 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc00000074208ff00
  flags: 0x7ffff000000200(slab)
  raw: 007ffff000000200 c000000781c5f150 c00c000001d37f88 c000000798a24880
  raw: c00000074208ff00 0000000001550002 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001d40140 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000750059c00
  flags: 0x7ffff000000200(slab)
  raw: 007ffff000000200 c00c000001dfcfc8 c00c000001d3c288 c0000007851c2d00
  raw: c000000750059c00 0000000001000003 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001d9c000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000002370080
  flags: 0x7ffff000000000()
  raw: 007ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
  raw: c000000002370080 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001dc0200 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000002370080
  flags: 0x7ffff000000000()
  raw: 007ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
  raw: c000000002370080 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001e00000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0x0
  flags: 0x7ffff000000200(slab)
  raw: 007ffff000000200 5deadbeef0000100 5deadbeef0000122 c0000007a801f500
  raw: 0000000000000000 0000000008000800 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001e40440 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0x0
  flags: 0x7ffff000000200(slab)
  raw: 007ffff000000200 5deadbeef0000100 5deadbeef0000122 c0000007a801e380
  raw: 0000000000000000 0000000000400040 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000001e80000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc0000007a0000640
  flags: 0x7ffff000000200(slab)
  raw: 007ffff000000200 c00c000001e5af48 c00c000001e80408 c000000f42d00a00
  raw: c0000007a0000640 00000000066600a2 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000003c89d40 refcount:2 mapcount:1 mapping:0000000018c4a547 index:0x10a41
  anon flags: 0x17ffff000080024(uptodate|active|swapbacked)
  raw: 017ffff000080024 5deadbeef0000100 5deadbeef0000122 c0000007986b03c9
  raw: 0000000000010a41 0000000000000000 0000000200000000 c00000000340b000
  page dumped because: unmovable page
  page->mem_cgroup:c00000000340b000
  page:c00c000003cc0000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000f3000fd38
  flags: 0x17ffff000000200(slab)
  raw: 017ffff000000200 c000000f3c911890 c000000f3c911890 c00000079fffd980
  raw: c000000f3000fd38 0000000000700003 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000003d00000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc0000007a2ec0000
  flags: 0x17ffff000000000()
  raw: 017ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
  raw: c0000007a2ec0000 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000003e2c000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000f8b008400
  flags: 0x27ffff000000200(slab)
  raw: 027ffff000000200 c000000f8e000190 c000000f8e000190 c0000007a801e380
  raw: c000000f8b008400 0000000000400038 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:c00c000003fec000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0x0
  flags: 0x37ffff000000000()
  raw: 037ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  pseries-hotplug-mem: Memory hot-remove failed, adding LMB's back
  dlpar: Could not handle DLPAR request "memory remove count 10"



It looks like set_migratetype_isolate() and start_isolate_page_range()
can be told not to report those warnings, but we're just calling
device_offline() which doesn't let us specify that.

cheers
David Hildenbrand April 9, 2020, 7:26 a.m. UTC | #5
On 09.04.20 04:59, piliu wrote:
> 
> 
> On 04/08/2020 10:46 AM, Baoquan He wrote:
>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>
>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>> blocks as removable"), the user space interface to compute whether a memory
>>> block can be offlined (exposed via
>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>
>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>> give comments if any concern, or offer ack if it's OK to you.
>>
>>>
>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>> we'll start by:
>>> 1. Testing if it contains any holes, and reject if so
>>> 2. Testing if pages belong to different zones, and reject if so
>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>
>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>> it can easily result in false positives/negatives. Let's stop manually
>>> checking is_mem_section_removable(), and let device_offline() handle it
>>> completely instead. We can remove the racy is_mem_section_removable()
>>> implementation next.
>>>
>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>> zone lock when isolating), but maybe we should optimize that
>>> implementation instead if this ever becomes a real problem (after all,
>>> memory unplug is already an expensive operation). We started using
>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>> Implement memory hotplug remove in the kernel"), with the initial
>>> hotremove support of lmbs.
>>>
>>> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> Cc: Paul Mackerras <paulus@samba.org>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Baoquan He <bhe@redhat.com>
>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> index b2cde1732301..5ace2f9a277e 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>>  
>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>  {
>>> -	int i, scns_per_block;
>>> -	bool rc = true;
>>> -	unsigned long pfn, block_sz;
>>> -	u64 phys_addr;
>>> -
>>>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>  		return false;
>>>  
>>> -	block_sz = memory_block_size_bytes();
>>> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>> -	phys_addr = lmb->base_addr;
>>> -
>>>  #ifdef CONFIG_FA_DUMP
>>>  	/*
>>>  	 * Don't hot-remove memory that falls in fadump boot memory area
>>>  	 * and memory that is reserved for capturing old kernel memory.
>>>  	 */
>>> -	if (is_fadump_memory_area(phys_addr, block_sz))
>>> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>  		return false;
>>>  #endif
>>> -
>>> -	for (i = 0; i < scns_per_block; i++) {
>>> -		pfn = PFN_DOWN(phys_addr);
>>> -		if (!pfn_in_present_section(pfn)) {
>>> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>> -			continue;
>>> -		}
>>> -
>>> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>> -	}
>>> -
>>> -	return rc;
>>> +	/* device_offline() will determine if we can actually remove this lmb */
>>> +	return true;
> So I think here swaps the check and do sequence. At least it breaks
> dlpar_memory_remove_by_count(). It is doable to remove
> is_mem_section_removable(), but here should be more effort to re-arrange
> the code.
> 

Thanks Pingfan,

1. "swaps the check and do sequence":

Partially. Any caller of dlpar_remove_lmb() already has to deal with
false positives. device_offline() can easily fail after
dlpar_remove_lmb() == true. It's inherently racy.

2. "breaks dlpar_memory_remove_by_count()"

Can you elaborate why it "breaks" it? It will simply try to
offline+remove lmbs, detect that it wasn't able to offline+remove as
much as it wanted (which could happen before as well easily), and re-add
the already offlined+removed ones.

3. "more effort to re-arrange the code"

What would be your suggestion?

We would rip out that racy check if we can remove as much memory as
requested in dlpar_memory_remove_by_count() and simply always try to
remove + recover.
David Hildenbrand April 9, 2020, 7:32 a.m. UTC | #6
> It's also not very pretty in dmesg.
> 
> Before:
> 
>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
>   dlpar: Could not handle DLPAR request "memory add count 10"
> 

Thanks for running it through the mill.

Here you test "hotadd", below you test "hot-remove". But yeah, there is
a change in the amount of dmesg.

> After:
> 
>   pseries-hotplug-mem: Attempting to hot-remove 10 LMB(s)
>   page:c00c000001ca8200 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc00000072a080180
>   flags: 0x7ffff000000200(slab)
>   raw: 007ffff000000200 c00c000001cffd48 c000000781c51410 c000000793327580
>   raw: c00000072a080180 0000000001550001 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001cc4a80 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc00000079b110080
>   flags: 0x7ffff000000000()
>   raw: 007ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
>   raw: c00000079b110080 0000000000000000 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001d08200 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc00000074208ff00
>   flags: 0x7ffff000000200(slab)
>   raw: 007ffff000000200 c000000781c5f150 c00c000001d37f88 c000000798a24880
>   raw: c00000074208ff00 0000000001550002 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001d40140 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000750059c00
>   flags: 0x7ffff000000200(slab)
>   raw: 007ffff000000200 c00c000001dfcfc8 c00c000001d3c288 c0000007851c2d00
>   raw: c000000750059c00 0000000001000003 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001d9c000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000002370080
>   flags: 0x7ffff000000000()
>   raw: 007ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
>   raw: c000000002370080 0000000000000000 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001dc0200 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000002370080
>   flags: 0x7ffff000000000()
>   raw: 007ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
>   raw: c000000002370080 0000000000000000 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001e00000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0x0
>   flags: 0x7ffff000000200(slab)
>   raw: 007ffff000000200 5deadbeef0000100 5deadbeef0000122 c0000007a801f500
>   raw: 0000000000000000 0000000008000800 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001e40440 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0x0
>   flags: 0x7ffff000000200(slab)
>   raw: 007ffff000000200 5deadbeef0000100 5deadbeef0000122 c0000007a801e380
>   raw: 0000000000000000 0000000000400040 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000001e80000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc0000007a0000640
>   flags: 0x7ffff000000200(slab)
>   raw: 007ffff000000200 c00c000001e5af48 c00c000001e80408 c000000f42d00a00
>   raw: c0000007a0000640 00000000066600a2 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000003c89d40 refcount:2 mapcount:1 mapping:0000000018c4a547 index:0x10a41
>   anon flags: 0x17ffff000080024(uptodate|active|swapbacked)
>   raw: 017ffff000080024 5deadbeef0000100 5deadbeef0000122 c0000007986b03c9
>   raw: 0000000000010a41 0000000000000000 0000000200000000 c00000000340b000
>   page dumped because: unmovable page
>   page->mem_cgroup:c00000000340b000
>   page:c00c000003cc0000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000f3000fd38
>   flags: 0x17ffff000000200(slab)
>   raw: 017ffff000000200 c000000f3c911890 c000000f3c911890 c00000079fffd980
>   raw: c000000f3000fd38 0000000000700003 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000003d00000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc0000007a2ec0000
>   flags: 0x17ffff000000000()
>   raw: 017ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
>   raw: c0000007a2ec0000 0000000000000000 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000003e2c000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0xc000000f8b008400
>   flags: 0x27ffff000000200(slab)
>   raw: 027ffff000000200 c000000f8e000190 c000000f8e000190 c0000007a801e380
>   raw: c000000f8b008400 0000000000400038 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:c00c000003fec000 refcount:1 mapcount:0 mapping:0000000018c4a547 index:0x0
>   flags: 0x37ffff000000000()
>   raw: 037ffff000000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
>   raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   pseries-hotplug-mem: Memory hot-remove failed, adding LMB's back
>   dlpar: Could not handle DLPAR request "memory remove count 10"
> 
> 
> 
> It looks like set_migratetype_isolate() and start_isolate_page_range()
> can be told not to report those warnings, but we're just calling
> device_offline() which doesn't let us specify that.

Yeah, but these messages can easily pop up (to a more limited degree)
with the current code as well, as checking for movable pages without
isolating the page range gives you no guarantees that no unmovable data
will end up on the lmb until you offline it. It's simply racy.

I discussed this output with Michal when we changed the
/sys/devices/system/memory/memoryX/removable behavior, and he had the
opinion that dmesg (debug) output should not really be an issue.

We could make this output

a) configurable at runtime and let powerpc code disable it while calling
device_offline(). So user space attempts will still trigger the messages

b) configurable at compile time
Michal Hocko April 9, 2020, 7:59 a.m. UTC | #7
On Thu 09-04-20 17:26:01, Michael Ellerman wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
> > In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> > blocks as removable"), the user space interface to compute whether a memory
> > block can be offlined (exposed via
> > /sys/devices/system/memory/memoryX/removable) has effectively been
> > deprecated. We want to remove the leftovers of the kernel implementation.
> >
> > When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> > we'll start by:
> > 1. Testing if it contains any holes, and reject if so
> > 2. Testing if pages belong to different zones, and reject if so
> > 3. Isolating the page range, checking if it contains any unmovable pages
> >
> > Using is_mem_section_removable() before trying to offline is not only racy,
> > it can easily result in false positives/negatives. Let's stop manually
> > checking is_mem_section_removable(), and let device_offline() handle it
> > completely instead. We can remove the racy is_mem_section_removable()
> > implementation next.
> >
> > We now take more locks (e.g., memory hotplug lock when offlining and the
> > zone lock when isolating), but maybe we should optimize that
> > implementation instead if this ever becomes a real problem (after all,
> > memory unplug is already an expensive operation). We started using
> > is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> > Implement memory hotplug remove in the kernel"), with the initial
> > hotremove support of lmbs.
> 
> It's also not very pretty in dmesg.
> 
> Before:
> 
>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
>   dlpar: Could not handle DLPAR request "memory add count 10"

Yeah, there is more output but isn't that useful? Or put it differently
what is the actual problem from having those messages in the kernel log?

From the below you can clearly tell that there are kernel allocations
which prevent hot remove from happening.

If the overall size of the debugging output is a concern then we can
think of a way to reduce it. E.g. once you have a couple of pages
reported then all others from the same block are likely not interesting
much.
David Hildenbrand April 9, 2020, 8:12 a.m. UTC | #8
On 09.04.20 09:59, Michal Hocko wrote:
> On Thu 09-04-20 17:26:01, Michael Ellerman wrote:
>> David Hildenbrand <david@redhat.com> writes:
>>
>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>> blocks as removable"), the user space interface to compute whether a memory
>>> block can be offlined (exposed via
>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>> we'll start by:
>>> 1. Testing if it contains any holes, and reject if so
>>> 2. Testing if pages belong to different zones, and reject if so
>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>
>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>> it can easily result in false positives/negatives. Let's stop manually
>>> checking is_mem_section_removable(), and let device_offline() handle it
>>> completely instead. We can remove the racy is_mem_section_removable()
>>> implementation next.
>>>
>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>> zone lock when isolating), but maybe we should optimize that
>>> implementation instead if this ever becomes a real problem (after all,
>>> memory unplug is already an expensive operation). We started using
>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>> Implement memory hotplug remove in the kernel"), with the initial
>>> hotremove support of lmbs.
>>
>> It's also not very pretty in dmesg.
>>
>> Before:
>>
>>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
>>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
>>   dlpar: Could not handle DLPAR request "memory add count 10"
> 
> Yeah, there is more output but isn't that useful? Or put it differently
> what is the actual problem from having those messages in the kernel log?
> 
> From the below you can clearly tell that there are kernel allocations
> which prevent hot remove from happening.
> 
> If the overall size of the debugging output is a concern then we can
> think of a way to reduce it. E.g. once you have a couple of pages
> reported then all others from the same block are likely not interesting
> much.
> 

IIRC, we only report one page per block already. (and stop, as we
detected something unmovable)
Michal Hocko April 9, 2020, 8:49 a.m. UTC | #9
On Thu 09-04-20 10:12:20, David Hildenbrand wrote:
> On 09.04.20 09:59, Michal Hocko wrote:
> > On Thu 09-04-20 17:26:01, Michael Ellerman wrote:
> >> David Hildenbrand <david@redhat.com> writes:
> >>
> >>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> >>> blocks as removable"), the user space interface to compute whether a memory
> >>> block can be offlined (exposed via
> >>> /sys/devices/system/memory/memoryX/removable) has effectively been
> >>> deprecated. We want to remove the leftovers of the kernel implementation.
> >>>
> >>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> >>> we'll start by:
> >>> 1. Testing if it contains any holes, and reject if so
> >>> 2. Testing if pages belong to different zones, and reject if so
> >>> 3. Isolating the page range, checking if it contains any unmovable pages
> >>>
> >>> Using is_mem_section_removable() before trying to offline is not only racy,
> >>> it can easily result in false positives/negatives. Let's stop manually
> >>> checking is_mem_section_removable(), and let device_offline() handle it
> >>> completely instead. We can remove the racy is_mem_section_removable()
> >>> implementation next.
> >>>
> >>> We now take more locks (e.g., memory hotplug lock when offlining and the
> >>> zone lock when isolating), but maybe we should optimize that
> >>> implementation instead if this ever becomes a real problem (after all,
> >>> memory unplug is already an expensive operation). We started using
> >>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> >>> Implement memory hotplug remove in the kernel"), with the initial
> >>> hotremove support of lmbs.
> >>
> >> It's also not very pretty in dmesg.
> >>
> >> Before:
> >>
> >>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
> >>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
> >>   dlpar: Could not handle DLPAR request "memory add count 10"
> > 
> > Yeah, there is more output but isn't that useful? Or put it differently
> > what is the actual problem from having those messages in the kernel log?
> > 
> > From the below you can clearly tell that there are kernel allocations
> > which prevent hot remove from happening.
> > 
> > If the overall size of the debugging output is a concern then we can
> > think of a way to reduce it. E.g. once you have a couple of pages
> > reported then all others from the same block are likely not interesting
> > much.
> > 
> 
> IIRC, we only report one page per block already. (and stop, as we
> detected something unmovable)

You are right.
David Hildenbrand April 9, 2020, 8:56 a.m. UTC | #10
On 09.04.20 09:26, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>>> blocks as removable"), the user space interface to compute whether a memory
>>>> block can be offlined (exposed via
>>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>
>>>>
>>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>>> we'll start by:
>>>> 1. Testing if it contains any holes, and reject if so
>>>> 2. Testing if pages belong to different zones, and reject if so
>>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>>
>>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>>> it can easily result in false positives/negatives. Let's stop manually
>>>> checking is_mem_section_removable(), and let device_offline() handle it
>>>> completely instead. We can remove the racy is_mem_section_removable()
>>>> implementation next.
>>>>
>>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>>> zone lock when isolating), but maybe we should optimize that
>>>> implementation instead if this ever becomes a real problem (after all,
>>>> memory unplug is already an expensive operation). We started using
>>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>>> Implement memory hotplug remove in the kernel"), with the initial
>>>> hotremove support of lmbs.
>>>>
>>>> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>> Cc: Paul Mackerras <paulus@samba.org>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Baoquan He <bhe@redhat.com>
>>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> index b2cde1732301..5ace2f9a277e 100644
>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>>>  
>>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>>  {
>>>> -	int i, scns_per_block;
>>>> -	bool rc = true;
>>>> -	unsigned long pfn, block_sz;
>>>> -	u64 phys_addr;
>>>> -
>>>>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>>  		return false;
>>>>  
>>>> -	block_sz = memory_block_size_bytes();
>>>> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>>> -	phys_addr = lmb->base_addr;
>>>> -
>>>>  #ifdef CONFIG_FA_DUMP
>>>>  	/*
>>>>  	 * Don't hot-remove memory that falls in fadump boot memory area
>>>>  	 * and memory that is reserved for capturing old kernel memory.
>>>>  	 */
>>>> -	if (is_fadump_memory_area(phys_addr, block_sz))
>>>> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>>  		return false;
>>>>  #endif
>>>> -
>>>> -	for (i = 0; i < scns_per_block; i++) {
>>>> -		pfn = PFN_DOWN(phys_addr);
>>>> -		if (!pfn_in_present_section(pfn)) {
>>>> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -			continue;
>>>> -		}
>>>> -
>>>> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>>> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -	}
>>>> -
>>>> -	return rc;
>>>> +	/* device_offline() will determine if we can actually remove this lmb */
>>>> +	return true;
>> So I think here swaps the check and do sequence. At least it breaks
>> dlpar_memory_remove_by_count(). It is doable to remove
>> is_mem_section_removable(), but here should be more effort to re-arrange
>> the code.
>>
> 
> Thanks Pingfan,
> 
> 1. "swaps the check and do sequence":
> 
> Partially. Any caller of dlpar_remove_lmb() already has to deal with
> false positives. device_offline() can easily fail after
> dlpar_remove_lmb() == true. It's inherently racy.

Sorry, s/dlpar_remove_lmb/lmb_is_removable/
Pingfan Liu April 9, 2020, 2:01 p.m. UTC | #11
On 04/09/2020 03:26 PM, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>>> blocks as removable"), the user space interface to compute whether a memory
>>>> block can be offlined (exposed via
>>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>
>>>>
>>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>>> we'll start by:
>>>> 1. Testing if it contains any holes, and reject if so
>>>> 2. Testing if pages belong to different zones, and reject if so
>>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>>
>>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>>> it can easily result in false positives/negatives. Let's stop manually
>>>> checking is_mem_section_removable(), and let device_offline() handle it
>>>> completely instead. We can remove the racy is_mem_section_removable()
>>>> implementation next.
>>>>
>>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>>> zone lock when isolating), but maybe we should optimize that
>>>> implementation instead if this ever becomes a real problem (after all,
>>>> memory unplug is already an expensive operation). We started using
>>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>>> Implement memory hotplug remove in the kernel"), with the initial
>>>> hotremove support of lmbs.
>>>>
>>>> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>> Cc: Paul Mackerras <paulus@samba.org>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Baoquan He <bhe@redhat.com>
>>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> index b2cde1732301..5ace2f9a277e 100644
>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>>>  
>>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>>  {
>>>> -	int i, scns_per_block;
>>>> -	bool rc = true;
>>>> -	unsigned long pfn, block_sz;
>>>> -	u64 phys_addr;
>>>> -
>>>>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>>  		return false;
>>>>  
>>>> -	block_sz = memory_block_size_bytes();
>>>> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>>> -	phys_addr = lmb->base_addr;
>>>> -
>>>>  #ifdef CONFIG_FA_DUMP
>>>>  	/*
>>>>  	 * Don't hot-remove memory that falls in fadump boot memory area
>>>>  	 * and memory that is reserved for capturing old kernel memory.
>>>>  	 */
>>>> -	if (is_fadump_memory_area(phys_addr, block_sz))
>>>> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>>  		return false;
>>>>  #endif
>>>> -
>>>> -	for (i = 0; i < scns_per_block; i++) {
>>>> -		pfn = PFN_DOWN(phys_addr);
>>>> -		if (!pfn_in_present_section(pfn)) {
>>>> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -			continue;
>>>> -		}
>>>> -
>>>> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>>> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -	}
>>>> -
>>>> -	return rc;
>>>> +	/* device_offline() will determine if we can actually remove this lmb */
>>>> +	return true;
>> So I think here swaps the check and do sequence. At least it breaks
>> dlpar_memory_remove_by_count(). It is doable to remove
>> is_mem_section_removable(), but here should be more effort to re-arrange
>> the code.
>>
> 
> Thanks Pingfan,
> 
> 1. "swaps the check and do sequence":
> 
> Partially. Any caller of dlpar_remove_lmb() already has to deal with
> false positives. device_offline() can easily fail after
> dlpar_remove_lmb() == true. It's inherently racy.
> 
> 2. "breaks dlpar_memory_remove_by_count()"
> 
> Can you elaborate why it "breaks" it? It will simply try to
> offline+remove lmbs, detect that it wasn't able to offline+remove as
> much as it wanted (which could happen before as well easily), and re-add
> the already offlined+removed ones.
> 
I overlooked the re-add logic. Then I think
dlpar_memory_remove_by_count() is OK with this patch.
> 3. "more effort to re-arrange the code"
> 
> What would be your suggestion?
> 
I had thought about merging the two loop "for_each_drmem_lmb()", and do
check inside the loop. But now it is needless.

The only concerned left is "if (lmbs_available < lmbs_to_remove)" fails
to alarm due to the weaken checking in lmb_is_removable(). Then after
heavy migration in offline_pages, we encounters this limit, and need to
re-add them back.

But I think it is a rare case plus hot-remove is also not a quite
frequent event. So it is worth to simplify the code by this patch.

Thanks for your classification.

For [1/2]
Reviewed-by: Pingfan Liu <piliu@redhat.com>
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b2cde1732301..5ace2f9a277e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -337,39 +337,19 @@  static int pseries_remove_mem_node(struct device_node *np)
 
 static bool lmb_is_removable(struct drmem_lmb *lmb)
 {
-	int i, scns_per_block;
-	bool rc = true;
-	unsigned long pfn, block_sz;
-	u64 phys_addr;
-
 	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
 		return false;
 
-	block_sz = memory_block_size_bytes();
-	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
-	phys_addr = lmb->base_addr;
-
 #ifdef CONFIG_FA_DUMP
 	/*
 	 * Don't hot-remove memory that falls in fadump boot memory area
 	 * and memory that is reserved for capturing old kernel memory.
 	 */
-	if (is_fadump_memory_area(phys_addr, block_sz))
+	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
 		return false;
 #endif
-
-	for (i = 0; i < scns_per_block; i++) {
-		pfn = PFN_DOWN(phys_addr);
-		if (!pfn_in_present_section(pfn)) {
-			phys_addr += MIN_MEMORY_BLOCK_SIZE;
-			continue;
-		}
-
-		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
-		phys_addr += MIN_MEMORY_BLOCK_SIZE;
-	}
-
-	return rc;
+	/* device_offline() will determine if we can actually remove this lmb */
+	return true;
 }
 
 static int dlpar_add_lmb(struct drmem_lmb *);