diff mbox series

[V11,1/5] mm/hotplug: Introduce arch callback validating the hot remove range

Message ID 1578625755-11792-2-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64/mm: Enable memory hot remove | expand

Commit Message

Anshuman Khandual Jan. 10, 2020, 3:09 a.m. UTC
Currently there are two interfaces to initiate memory range hot removal i.e
remove_memory() and __remove_memory() which then calls try_remove_memory().
Platform gets called with arch_remove_memory() to tear down required kernel
page tables and other arch specific procedures. But there are platforms
like arm64 which might want to prevent removal of certain specific memory
ranges irrespective of their present usage or movability properties.

Current arch call back arch_remove_memory() is too late in the process to
abort memory hot removal as memory block devices and firmware memory map
entries would have already been removed. Platforms should be able to abort
the process before taking the mem_hotplug_lock with mem_hotplug_begin().
This essentially requires a new arch callback for memory range validation.

This differentiates memory range validation between memory hot add and hot
remove paths before carving out a new helper check_hotremove_memory_range()
which incorporates a new arch callback. This call back provides platforms
an opportunity to refuse memory removal at the very onset. In future the
same principle can be extended for memory hot add path if required.

Platforms can choose to override this callback in order to reject specific
memory ranges from removal or can just fallback to a default implementation
which allows removal of all memory ranges.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 include/linux/memory_hotplug.h |  7 +++++++
 mm/memory_hotplug.c            | 21 ++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

David Hildenbrand Jan. 10, 2020, 8:42 a.m. UTC | #1
On 10.01.20 04:09, Anshuman Khandual wrote:
> Currently there are two interfaces to initiate memory range hot removal i.e
> remove_memory() and __remove_memory() which then calls try_remove_memory().
> Platform gets called with arch_remove_memory() to tear down required kernel
> page tables and other arch specific procedures. But there are platforms
> like arm64 which might want to prevent removal of certain specific memory
> ranges irrespective of their present usage or movability properties.

Why? Is this only relevant for boot memory? I hope so, otherwise the
arch code needs fixing IMHO.

If it's only boot memory, we should disallow offlining instead via a
memory notifier - much cleaner.

> 
> Current arch call back arch_remove_memory() is too late in the process to
> abort memory hot removal as memory block devices and firmware memory map
> entries would have already been removed. Platforms should be able to abort
> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
> This essentially requires a new arch callback for memory range validation.

I somewhat dislike this very much. Memory removal should never fail if
used sanely. See e.g., __remove_memory(), it will BUG() whenever
something like that would strike.

> 
> This differentiates memory range validation between memory hot add and hot
> remove paths before carving out a new helper check_hotremove_memory_range()
> which incorporates a new arch callback. This call back provides platforms
> an opportunity to refuse memory removal at the very onset. In future the
> same principle can be extended for memory hot add path if required.
> 
> Platforms can choose to override this callback in order to reject specific
> memory ranges from removal or can just fallback to a default implementation
> which allows removal of all memory ranges.

I suspect we want really want to disallow offlining instead. E.g., I
remember s390x does that with certain areas needed for dumping/kexec.

Somebody who added memory via add_memory() should always be able to
remove the memory via remove_memory() again. Only boot memory can be
treated in a special way, but boot memory is initially always online.
kernel test robot Jan. 11, 2020, 2:11 p.m. UTC | #2
Hi Anshuman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.5-rc5 next-20200110]
[cannot apply to arm64/for-next/core robh/for-next linux/master]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/arm64-mm-Enable-memory-hot-remove/20200111-003854
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4a3033ef6e6bb4c566bd1d556de69b494d76976c
config: arm64-randconfig-a001-20200109 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/memory_hotplug.c: In function 'check_hotremove_memory_range':
>> mm/memory_hotplug.c:1027:7: error: implicit declaration of function 'arch_memory_removable'; did you mean 'add_memory_resource'? [-Werror=implicit-function-declaration]
     rc = arch_memory_removable(start, size);
          ^~~~~~~~~~~~~~~~~~~~~
          add_memory_resource
   At top level:
   mm/memory_hotplug.c:1017:12: warning: 'check_hotremove_memory_range' defined but not used [-Wunused-function]
    static int check_hotremove_memory_range(u64 start, u64 size)
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +1027 mm/memory_hotplug.c

  1016	
  1017	static int check_hotremove_memory_range(u64 start, u64 size)
  1018	{
  1019		int rc;
  1020	
  1021		BUG_ON(check_hotplug_memory_range(start, size));
  1022	
  1023		/*
  1024		 * First check if the platform is willing to have this
  1025		 * memory range removed else just abort.
  1026		 */
> 1027		rc = arch_memory_removable(start, size);
  1028		if (!rc)
  1029			return -EINVAL;
  1030	
  1031		return 0;
  1032	}
  1033	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
kernel test robot Jan. 11, 2020, 7:49 p.m. UTC | #3
Hi Anshuman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.5-rc5 next-20200109]
[cannot apply to arm64/for-next/core robh/for-next linux/master]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/arm64-mm-Enable-memory-hot-remove/20200111-003854
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4a3033ef6e6bb4c566bd1d556de69b494d76976c
config: x86_64-randconfig-s1-20200111 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.2-10+deb8u1) 4.9.2
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/memory_hotplug.c: In function 'check_hotremove_memory_range':
>> mm/memory_hotplug.c:1027:2: error: implicit declaration of function 'arch_memory_removable' [-Werror=implicit-function-declaration]
     rc = arch_memory_removable(start, size);
     ^
   mm/memory_hotplug.c: At top level:
   mm/memory_hotplug.c:1017:12: warning: 'check_hotremove_memory_range' defined but not used [-Wunused-function]
    static int check_hotremove_memory_range(u64 start, u64 size)
               ^
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:arch_set_bit
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:arch_clear_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 1 include/linux/percpu-defs.h:__this_cpu_preempt_check
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_set
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_inc
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_add_return
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_sub_return
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_add
   Cyclomatic Complexity 2 include/linux/jump_label.h:static_key_false
   Cyclomatic Complexity 3 include/linux/string.h:memset
   Cyclomatic Complexity 1 include/linux/err.h:ERR_PTR
   Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR
   Cyclomatic Complexity 5 arch/x86/include/asm/preempt.h:__preempt_count_add
   Cyclomatic Complexity 5 arch/x86/include/asm/preempt.h:__preempt_count_sub
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
   Cyclomatic Complexity 1 include/linux/seqlock.h:raw_write_seqcount_begin
   Cyclomatic Complexity 1 include/linux/seqlock.h:raw_write_seqcount_end
   Cyclomatic Complexity 1 include/linux/nodemask.h:node_state
   Cyclomatic Complexity 1 include/linux/nodemask.h:node_set_state
   Cyclomatic Complexity 2 include/linux/notifier.h:notifier_to_errno
   Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPageReserved
   Cyclomatic Complexity 1 include/linux/page-flags.h:SetPagePrivate
   Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPagePrivate
   Cyclomatic Complexity 1 include/linux/mmzone.h:zone_end_pfn
   Cyclomatic Complexity 1 include/linux/mmzone.h:zone_is_empty
   Cyclomatic Complexity 4 include/linux/mmzone.h:zone_intersects
   Cyclomatic Complexity 1 include/linux/mmzone.h:pgdat_end_pfn
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:generic_free_nodedata
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:arch_refresh_nodedata
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_unlock
   Cyclomatic Complexity 1 include/linux/mmzone.h:populated_zone
   Cyclomatic Complexity 1 include/linux/mmzone.h:zone_to_nid
   Cyclomatic Complexity 1 include/linux/mmzone.h:pfn_to_section_nr
   Cyclomatic Complexity 3 include/linux/mmzone.h:__nr_to_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:__section_mem_map_addr
   Cyclomatic Complexity 3 include/linux/mmzone.h:valid_section
   Cyclomatic Complexity 3 include/linux/mmzone.h:online_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:online_section_nr
   Cyclomatic Complexity 1 include/linux/mmzone.h:__pfn_to_section
   Cyclomatic Complexity 1 include/linux/ioport.h:resource_size
   Cyclomatic Complexity 1 include/linux/memremap.h:vmem_altmap_offset
   Cyclomatic Complexity 1 include/linux/mm.h:page_zonenum
   Cyclomatic Complexity 1 include/linux/mm.h:page_zone
   Cyclomatic Complexity 1 include/linux/node.h:link_mem_sections
   Cyclomatic Complexity 1 include/linux/node.h:__register_one_node
   Cyclomatic Complexity 1 include/linux/node.h:register_one_node
   Cyclomatic Complexity 1 include/linux/cpu.h:cpus_read_lock
   Cyclomatic Complexity 1 include/linux/cpu.h:cpus_read_unlock
   Cyclomatic Complexity 1 include/linux/compaction.h:kcompactd_run
   Cyclomatic Complexity 6 mm/memory_hotplug.c:update_pgdat_span
   Cyclomatic Complexity 4 mm/memory_hotplug.c:node_states_check_changes_online
   Cyclomatic Complexity 4 mm/memory_hotplug.c:node_states_set_node
   Cyclomatic Complexity 3 mm/memory_hotplug.c:resize_zone_range
   Cyclomatic Complexity 3 mm/memory_hotplug.c:resize_pgdat_range
   Cyclomatic Complexity 3 mm/memory_hotplug.c:default_kernel_zone_for_pfn
   Cyclomatic Complexity 4 mm/memory_hotplug.c:default_zone_for_pfn
   Cyclomatic Complexity 2 mm/memory_hotplug.c:reset_node_present_pages
   Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqcount_begin_nested
   Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqcount_begin
   Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqlock
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:zone_span_writelock
   Cyclomatic Complexity 8 mm/memory_hotplug.c:find_smallest_section_pfn
   Cyclomatic Complexity 8 mm/memory_hotplug.c:find_biggest_section_pfn
   Cyclomatic Complexity 1 include/linux/err.h:IS_ERR
   Cyclomatic Complexity 69 include/asm-generic/getorder.h:get_order
   Cyclomatic Complexity 4 include/linux/rcu_sync.h:rcu_sync_is_idle
   Cyclomatic Complexity 2 include/linux/percpu-rwsem.h:percpu_down_read
   Cyclomatic Complexity 2 include/linux/percpu-rwsem.h:percpu_up_read
   Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqcount_end
   Cyclomatic Complexity 1 include/linux/seqlock.h:write_sequnlock
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:zone_span_writeunlock
   Cyclomatic Complexity 11 mm/memory_hotplug.c:shrink_zone_span
   Cyclomatic Complexity 3 mm/memory_hotplug.c:setup_memhp_default_state
   Cyclomatic Complexity 1 include/asm-generic/bitops/instrumented-atomic.h:set_bit
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_inc
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_sub_return
   Cyclomatic Complexity 1 include/linux/atomic-fallback.h:atomic_dec_return
   Cyclomatic Complexity 1 include/asm-generic/bitops/instrumented-atomic.h:clear_bit
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_set
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_add
   Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_add
   Cyclomatic Complexity 1 include/linux/mm.h:totalram_pages_add
   Cyclomatic Complexity 2 include/linux/page_ref.h:page_ref_inc
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_read
   Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_count

vim +/arch_memory_removable +1027 mm/memory_hotplug.c

  1016	
  1017	static int check_hotremove_memory_range(u64 start, u64 size)
  1018	{
  1019		int rc;
  1020	
  1021		BUG_ON(check_hotplug_memory_range(start, size));
  1022	
  1023		/*
  1024		 * First check if the platform is willing to have this
  1025		 * memory range removed else just abort.
  1026		 */
> 1027		rc = arch_memory_removable(start, size);
  1028		if (!rc)
  1029			return -EINVAL;
  1030	
  1031		return 0;
  1032	}
  1033	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
Anshuman Khandual Jan. 13, 2020, 4:06 a.m. UTC | #4
On 01/11/2020 07:41 PM, kbuild test robot wrote:
>    mm/memory_hotplug.c: In function 'check_hotremove_memory_range':
>>> mm/memory_hotplug.c:1027:7: error: implicit declaration of function 'arch_memory_removable'; did you mean 'add_memory_resource'? [-Werror=implicit-function-declaration]
>      rc = arch_memory_removable(start, size);
>           ^~~~~~~~~~~~~~~~~~~~~
>           add_memory_resource
>    At top level:
>    mm/memory_hotplug.c:1017:12: warning: 'check_hotremove_memory_range' defined but not used [-Wunused-function]
>     static int check_hotremove_memory_range(u64 start, u64 size)
>                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    cc1: some warnings being treated as errors
> 
> vim +1027 mm/memory_hotplug.c
> 
>   1016	
>   1017	static int check_hotremove_memory_range(u64 start, u64 size)
>   1018	{
>   1019		int rc;
>   1020	
>   1021		BUG_ON(check_hotplug_memory_range(start, size));
>   1022	
>   1023		/*
>   1024		 * First check if the platform is willing to have this
>   1025		 * memory range removed else just abort.
>   1026		 */
>> 1027		rc = arch_memory_removable(start, size);
>   1028		if (!rc)
>   1029			return -EINVAL;
>   1030	
>   1031		return 0;
>   1032	}
>   1033	


Both the build failures reported here could be solved by moving
check_hotremove_memory_range() inside CONFIG_MEMORY_HOTREMOVE
wrappers, will fix it.

- Anshuman
Anshuman Khandual Jan. 13, 2020, 9:11 a.m. UTC | #5
On 01/10/2020 02:12 PM, David Hildenbrand wrote:
> On 10.01.20 04:09, Anshuman Khandual wrote:
>> Currently there are two interfaces to initiate memory range hot removal i.e
>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>> Platform gets called with arch_remove_memory() to tear down required kernel
>> page tables and other arch specific procedures. But there are platforms
>> like arm64 which might want to prevent removal of certain specific memory
>> ranges irrespective of their present usage or movability properties.
> 
> Why? Is this only relevant for boot memory? I hope so, otherwise the
> arch code needs fixing IMHO.

Right, it is relevant only for the boot memory on arm64 platform. But this
new arch callback makes it flexible to reject any given memory range.

> 
> If it's only boot memory, we should disallow offlining instead via a
> memory notifier - much cleaner.

Dont have much detail understanding of MMU notifier mechanism but from some
initial reading, it seems like we need to have a mm_struct for a notifier
to monitor various events on the page table. Just wondering how a physical
memory range like boot memory can be monitored because it can be used both
for for kernel (init_mm) or user space process at same time. Is there some
mechanism we could do this ?

> 
>>
>> Current arch call back arch_remove_memory() is too late in the process to
>> abort memory hot removal as memory block devices and firmware memory map
>> entries would have already been removed. Platforms should be able to abort
>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>> This essentially requires a new arch callback for memory range validation.
> 
> I somewhat dislike this very much. Memory removal should never fail if
> used sanely. See e.g., __remove_memory(), it will BUG() whenever
> something like that would strike.
> 
>>
>> This differentiates memory range validation between memory hot add and hot
>> remove paths before carving out a new helper check_hotremove_memory_range()
>> which incorporates a new arch callback. This call back provides platforms
>> an opportunity to refuse memory removal at the very onset. In future the
>> same principle can be extended for memory hot add path if required.
>>
>> Platforms can choose to override this callback in order to reject specific
>> memory ranges from removal or can just fallback to a default implementation
>> which allows removal of all memory ranges.
> 
> I suspect we want really want to disallow offlining instead. E.g., I

If boot memory pages can be prevented from being offlined for sure, then it
would indirectly definitely prevent hot remove process as well.

> remember s390x does that with certain areas needed for dumping/kexec.

Could not find any references to mmu_notifier in arch/s390 or any other arch
for that matter apart from KVM (which has an user space component), could you
please give some pointers ?

> 
> Somebody who added memory via add_memory() should always be able to
> remove the memory via remove_memory() again. Only boot memory can be
> treated in a special way, but boot memory is initially always online.
>
David Hildenbrand Jan. 13, 2020, 9:14 a.m. UTC | #6
> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>:
> 
> 
> 
>> On 01/10/2020 02:12 PM, David Hildenbrand wrote:
>>> On 10.01.20 04:09, Anshuman Khandual wrote:
>>> Currently there are two interfaces to initiate memory range hot removal i.e
>>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>>> Platform gets called with arch_remove_memory() to tear down required kernel
>>> page tables and other arch specific procedures. But there are platforms
>>> like arm64 which might want to prevent removal of certain specific memory
>>> ranges irrespective of their present usage or movability properties.
>> 
>> Why? Is this only relevant for boot memory? I hope so, otherwise the
>> arch code needs fixing IMHO.
> 
> Right, it is relevant only for the boot memory on arm64 platform. But this
> new arch callback makes it flexible to reject any given memory range.
> 
>> 
>> If it's only boot memory, we should disallow offlining instead via a
>> memory notifier - much cleaner.
> 
> Dont have much detail understanding of MMU notifier mechanism but from some
> initial reading, it seems like we need to have a mm_struct for a notifier
> to monitor various events on the page table. Just wondering how a physical
> memory range like boot memory can be monitored because it can be used both
> for for kernel (init_mm) or user space process at same time. Is there some
> mechanism we could do this ?
> 
>> 
>>> 
>>> Current arch call back arch_remove_memory() is too late in the process to
>>> abort memory hot removal as memory block devices and firmware memory map
>>> entries would have already been removed. Platforms should be able to abort
>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>>> This essentially requires a new arch callback for memory range validation.
>> 
>> I somewhat dislike this very much. Memory removal should never fail if
>> used sanely. See e.g., __remove_memory(), it will BUG() whenever
>> something like that would strike.
>> 
>>> 
>>> This differentiates memory range validation between memory hot add and hot
>>> remove paths before carving out a new helper check_hotremove_memory_range()
>>> which incorporates a new arch callback. This call back provides platforms
>>> an opportunity to refuse memory removal at the very onset. In future the
>>> same principle can be extended for memory hot add path if required.
>>> 
>>> Platforms can choose to override this callback in order to reject specific
>>> memory ranges from removal or can just fallback to a default implementation
>>> which allows removal of all memory ranges.
>> 
>> I suspect we want really want to disallow offlining instead. E.g., I
> 
> If boot memory pages can be prevented from being offlined for sure, then it
> would indirectly definitely prevent hot remove process as well.
> 
>> remember s390x does that with certain areas needed for dumping/kexec.
> 
> Could not find any references to mmu_notifier in arch/s390 or any other arch
> for that matter apart from KVM (which has an user space component), could you
> please give some pointers ?

Memory (hotplug) notifier, not MMU notifier :)

Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
Anshuman Khandual Jan. 13, 2020, 9:50 a.m. UTC | #7
On 01/13/2020 02:44 PM, David Hildenbrand wrote:
> 
> 
>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>:
>>
>> 
>>
>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote:
>>>> On 10.01.20 04:09, Anshuman Khandual wrote:
>>>> Currently there are two interfaces to initiate memory range hot removal i.e
>>>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>>>> Platform gets called with arch_remove_memory() to tear down required kernel
>>>> page tables and other arch specific procedures. But there are platforms
>>>> like arm64 which might want to prevent removal of certain specific memory
>>>> ranges irrespective of their present usage or movability properties.
>>>
>>> Why? Is this only relevant for boot memory? I hope so, otherwise the
>>> arch code needs fixing IMHO.
>>
>> Right, it is relevant only for the boot memory on arm64 platform. But this
>> new arch callback makes it flexible to reject any given memory range.
>>
>>>
>>> If it's only boot memory, we should disallow offlining instead via a
>>> memory notifier - much cleaner.
>>
>> Dont have much detail understanding of MMU notifier mechanism but from some
>> initial reading, it seems like we need to have a mm_struct for a notifier
>> to monitor various events on the page table. Just wondering how a physical
>> memory range like boot memory can be monitored because it can be used both
>> for for kernel (init_mm) or user space process at same time. Is there some
>> mechanism we could do this ?
>>
>>>
>>>>
>>>> Current arch call back arch_remove_memory() is too late in the process to
>>>> abort memory hot removal as memory block devices and firmware memory map
>>>> entries would have already been removed. Platforms should be able to abort
>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>>>> This essentially requires a new arch callback for memory range validation.
>>>
>>> I somewhat dislike this very much. Memory removal should never fail if
>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever
>>> something like that would strike.
>>>
>>>>
>>>> This differentiates memory range validation between memory hot add and hot
>>>> remove paths before carving out a new helper check_hotremove_memory_range()
>>>> which incorporates a new arch callback. This call back provides platforms
>>>> an opportunity to refuse memory removal at the very onset. In future the
>>>> same principle can be extended for memory hot add path if required.
>>>>
>>>> Platforms can choose to override this callback in order to reject specific
>>>> memory ranges from removal or can just fallback to a default implementation
>>>> which allows removal of all memory ranges.
>>>
>>> I suspect we want really want to disallow offlining instead. E.g., I
>>
>> If boot memory pages can be prevented from being offlined for sure, then it
>> would indirectly definitely prevent hot remove process as well.
>>
>>> remember s390x does that with certain areas needed for dumping/kexec.
>>
>> Could not find any references to mmu_notifier in arch/s390 or any other arch
>> for that matter apart from KVM (which has an user space component), could you
>> please give some pointers ?
> 
> Memory (hotplug) notifier, not MMU notifier :)

They are so similarly named :)

> 
> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
> 

Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT
to reject affected offline requests in the callback.
David Hildenbrand Jan. 13, 2020, 10:37 a.m. UTC | #8
On 13.01.20 10:50, Anshuman Khandual wrote:
> 
> 
> On 01/13/2020 02:44 PM, David Hildenbrand wrote:
>>
>>
>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>:
>>>
>>> 
>>>
>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote:
>>>>> On 10.01.20 04:09, Anshuman Khandual wrote:
>>>>> Currently there are two interfaces to initiate memory range hot removal i.e
>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>>>>> Platform gets called with arch_remove_memory() to tear down required kernel
>>>>> page tables and other arch specific procedures. But there are platforms
>>>>> like arm64 which might want to prevent removal of certain specific memory
>>>>> ranges irrespective of their present usage or movability properties.
>>>>
>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the
>>>> arch code needs fixing IMHO.
>>>
>>> Right, it is relevant only for the boot memory on arm64 platform. But this
>>> new arch callback makes it flexible to reject any given memory range.
>>>
>>>>
>>>> If it's only boot memory, we should disallow offlining instead via a
>>>> memory notifier - much cleaner.
>>>
>>> Dont have much detail understanding of MMU notifier mechanism but from some
>>> initial reading, it seems like we need to have a mm_struct for a notifier
>>> to monitor various events on the page table. Just wondering how a physical
>>> memory range like boot memory can be monitored because it can be used both
>>> for for kernel (init_mm) or user space process at same time. Is there some
>>> mechanism we could do this ?
>>>
>>>>
>>>>>
>>>>> Current arch call back arch_remove_memory() is too late in the process to
>>>>> abort memory hot removal as memory block devices and firmware memory map
>>>>> entries would have already been removed. Platforms should be able to abort
>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>>>>> This essentially requires a new arch callback for memory range validation.
>>>>
>>>> I somewhat dislike this very much. Memory removal should never fail if
>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever
>>>> something like that would strike.
>>>>
>>>>>
>>>>> This differentiates memory range validation between memory hot add and hot
>>>>> remove paths before carving out a new helper check_hotremove_memory_range()
>>>>> which incorporates a new arch callback. This call back provides platforms
>>>>> an opportunity to refuse memory removal at the very onset. In future the
>>>>> same principle can be extended for memory hot add path if required.
>>>>>
>>>>> Platforms can choose to override this callback in order to reject specific
>>>>> memory ranges from removal or can just fallback to a default implementation
>>>>> which allows removal of all memory ranges.
>>>>
>>>> I suspect we want really want to disallow offlining instead. E.g., I
>>>
>>> If boot memory pages can be prevented from being offlined for sure, then it
>>> would indirectly definitely prevent hot remove process as well.
>>>
>>>> remember s390x does that with certain areas needed for dumping/kexec.
>>>
>>> Could not find any references to mmu_notifier in arch/s390 or any other arch
>>> for that matter apart from KVM (which has an user space component), could you
>>> please give some pointers ?
>>
>> Memory (hotplug) notifier, not MMU notifier :)
> 
> They are so similarly named :)
> 
>>
>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
>>
> 
> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT
> to reject affected offline requests in the callback.

Do you really need that?

We have SECTION_IS_EARLY. You could iterate all involved sections (for
which you are getting notified) and check if any one of these is marked
SECTION_IS_EARLY. then, it was added during boot and not via add_memory().
Anshuman Khandual Jan. 14, 2020, 2:13 a.m. UTC | #9
On 01/13/2020 04:07 PM, David Hildenbrand wrote:
> On 13.01.20 10:50, Anshuman Khandual wrote:
>>
>>
>> On 01/13/2020 02:44 PM, David Hildenbrand wrote:
>>>
>>>
>>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>:
>>>>
>>>> 
>>>>
>>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote:
>>>>>> On 10.01.20 04:09, Anshuman Khandual wrote:
>>>>>> Currently there are two interfaces to initiate memory range hot removal i.e
>>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>>>>>> Platform gets called with arch_remove_memory() to tear down required kernel
>>>>>> page tables and other arch specific procedures. But there are platforms
>>>>>> like arm64 which might want to prevent removal of certain specific memory
>>>>>> ranges irrespective of their present usage or movability properties.
>>>>>
>>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the
>>>>> arch code needs fixing IMHO.
>>>>
>>>> Right, it is relevant only for the boot memory on arm64 platform. But this
>>>> new arch callback makes it flexible to reject any given memory range.
>>>>
>>>>>
>>>>> If it's only boot memory, we should disallow offlining instead via a
>>>>> memory notifier - much cleaner.
>>>>
>>>> Dont have much detail understanding of MMU notifier mechanism but from some
>>>> initial reading, it seems like we need to have a mm_struct for a notifier
>>>> to monitor various events on the page table. Just wondering how a physical
>>>> memory range like boot memory can be monitored because it can be used both
>>>> for for kernel (init_mm) or user space process at same time. Is there some
>>>> mechanism we could do this ?
>>>>
>>>>>
>>>>>>
>>>>>> Current arch call back arch_remove_memory() is too late in the process to
>>>>>> abort memory hot removal as memory block devices and firmware memory map
>>>>>> entries would have already been removed. Platforms should be able to abort
>>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>>>>>> This essentially requires a new arch callback for memory range validation.
>>>>>
>>>>> I somewhat dislike this very much. Memory removal should never fail if
>>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever
>>>>> something like that would strike.
>>>>>
>>>>>>
>>>>>> This differentiates memory range validation between memory hot add and hot
>>>>>> remove paths before carving out a new helper check_hotremove_memory_range()
>>>>>> which incorporates a new arch callback. This call back provides platforms
>>>>>> an opportunity to refuse memory removal at the very onset. In future the
>>>>>> same principle can be extended for memory hot add path if required.
>>>>>>
>>>>>> Platforms can choose to override this callback in order to reject specific
>>>>>> memory ranges from removal or can just fallback to a default implementation
>>>>>> which allows removal of all memory ranges.
>>>>>
>>>>> I suspect we want really want to disallow offlining instead. E.g., I
>>>>
>>>> If boot memory pages can be prevented from being offlined for sure, then it
>>>> would indirectly definitely prevent hot remove process as well.
>>>>
>>>>> remember s390x does that with certain areas needed for dumping/kexec.
>>>>
>>>> Could not find any references to mmu_notifier in arch/s390 or any other arch
>>>> for that matter apart from KVM (which has an user space component), could you
>>>> please give some pointers ?
>>>
>>> Memory (hotplug) notifier, not MMU notifier :)
>>
>> They are so similarly named :)
>>
>>>
>>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
>>>
>>
>> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT
>> to reject affected offline requests in the callback.
> 
> Do you really need that?
> 
> We have SECTION_IS_EARLY. You could iterate all involved sections (for
> which you are getting notified) and check if any one of these is marked
> SECTION_IS_EARLY. then, it was added during boot and not via add_memory().

Seems to be a better approach than adding a new memblock flag.

> 
>
Anshuman Khandual Jan. 14, 2020, 11:09 a.m. UTC | #10
On 01/14/2020 07:43 AM, Anshuman Khandual wrote:
> 
> 
> On 01/13/2020 04:07 PM, David Hildenbrand wrote:
>> On 13.01.20 10:50, Anshuman Khandual wrote:
>>>
>>>
>>> On 01/13/2020 02:44 PM, David Hildenbrand wrote:
>>>>
>>>>
>>>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>:
>>>>>
>>>>> 
>>>>>
>>>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote:
>>>>>>> On 10.01.20 04:09, Anshuman Khandual wrote:
>>>>>>> Currently there are two interfaces to initiate memory range hot removal i.e
>>>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>>>>>>> Platform gets called with arch_remove_memory() to tear down required kernel
>>>>>>> page tables and other arch specific procedures. But there are platforms
>>>>>>> like arm64 which might want to prevent removal of certain specific memory
>>>>>>> ranges irrespective of their present usage or movability properties.
>>>>>>
>>>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the
>>>>>> arch code needs fixing IMHO.
>>>>>
>>>>> Right, it is relevant only for the boot memory on arm64 platform. But this
>>>>> new arch callback makes it flexible to reject any given memory range.
>>>>>
>>>>>>
>>>>>> If it's only boot memory, we should disallow offlining instead via a
>>>>>> memory notifier - much cleaner.
>>>>>
>>>>> Dont have much detail understanding of MMU notifier mechanism but from some
>>>>> initial reading, it seems like we need to have a mm_struct for a notifier
>>>>> to monitor various events on the page table. Just wondering how a physical
>>>>> memory range like boot memory can be monitored because it can be used both
>>>>> for for kernel (init_mm) or user space process at same time. Is there some
>>>>> mechanism we could do this ?
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Current arch call back arch_remove_memory() is too late in the process to
>>>>>>> abort memory hot removal as memory block devices and firmware memory map
>>>>>>> entries would have already been removed. Platforms should be able to abort
>>>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>>>>>>> This essentially requires a new arch callback for memory range validation.
>>>>>>
>>>>>> I somewhat dislike this very much. Memory removal should never fail if
>>>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever
>>>>>> something like that would strike.
>>>>>>
>>>>>>>
>>>>>>> This differentiates memory range validation between memory hot add and hot
>>>>>>> remove paths before carving out a new helper check_hotremove_memory_range()
>>>>>>> which incorporates a new arch callback. This call back provides platforms
>>>>>>> an opportunity to refuse memory removal at the very onset. In future the
>>>>>>> same principle can be extended for memory hot add path if required.
>>>>>>>
>>>>>>> Platforms can choose to override this callback in order to reject specific
>>>>>>> memory ranges from removal or can just fallback to a default implementation
>>>>>>> which allows removal of all memory ranges.
>>>>>>
>>>>>> I suspect we want really want to disallow offlining instead. E.g., I
>>>>>
>>>>> If boot memory pages can be prevented from being offlined for sure, then it
>>>>> would indirectly definitely prevent hot remove process as well.
>>>>>
>>>>>> remember s390x does that with certain areas needed for dumping/kexec.
>>>>>
>>>>> Could not find any references to mmu_notifier in arch/s390 or any other arch
>>>>> for that matter apart from KVM (which has an user space component), could you
>>>>> please give some pointers ?
>>>>
>>>> Memory (hotplug) notifier, not MMU notifier :)
>>>
>>> They are so similarly named :)
>>>
>>>>
>>>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
>>>>
>>>
>>> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT
>>> to reject affected offline requests in the callback.
>>
>> Do you really need that?
>>
>> We have SECTION_IS_EARLY. You could iterate all involved sections (for
>> which you are getting notified) and check if any one of these is marked
>> SECTION_IS_EARLY. then, it was added during boot and not via add_memory().
> 
> Seems to be a better approach than adding a new memblock flag.

These additional changes do the trick and prevent boot memory removal.
Hope this is in line with your earlier suggestion.

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 00f3e1836558..3b59e6a29dea 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -17,6 +17,7 @@
+++ b/arch/arm64/mm/mmu.c
@@ -17,6 +17,7 @@
 #include <linux/mman.h>
 #include <linux/nodemask.h>
 #include <linux/memblock.h>
+#include <linux/memory.h>
 #include <linux/fs.h>
 #include <linux/io.h>
 #include <linux/mm.h>
@@ -1365,4 +1366,37 @@ void arch_remove_memory(int nid, u64 start, u64 size,
        __remove_pages(start_pfn, nr_pages, altmap);
        __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
+
+static int boot_mem_remove_notifier(struct notifier_block *nb,
+                                   unsigned long action, void *data)
+{
+       unsigned long start_pfn, end_pfn, pfn, section_nr;
+       struct mem_section *ms;
+       struct memory_notify *arg = data;
+
+       start_pfn = arg->start_pfn;
+       end_pfn = start_pfn + arg->nr_pages;
+
+       if (action != MEM_GOING_OFFLINE)
+               return NOTIFY_OK;
+
+       for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+               section_nr = pfn_to_section_nr(pfn);
+               ms = __nr_to_section(section_nr);
+
+               if (early_section(ms))
+                       return NOTIFY_BAD;
+       }
+       return NOTIFY_OK;
+}
+
+static struct notifier_block boot_mem_remove_nb = {
+       .notifier_call = boot_mem_remove_notifier,
+};
+
+static int __init boot_mem_remove_init(void)
+{
+       return register_memory_notifier(&boot_mem_remove_nb);
+}
+device_initcall(boot_mem_remove_init);
 #endif

> 
>>
>>
> 
>
David Hildenbrand Jan. 14, 2020, 12:30 p.m. UTC | #11
On 14.01.20 12:09, Anshuman Khandual wrote:
> 
> 
> On 01/14/2020 07:43 AM, Anshuman Khandual wrote:
>>
>>
>> On 01/13/2020 04:07 PM, David Hildenbrand wrote:
>>> On 13.01.20 10:50, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 01/13/2020 02:44 PM, David Hildenbrand wrote:
>>>>>
>>>>>
>>>>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>:
>>>>>>
>>>>>> 
>>>>>>
>>>>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote:
>>>>>>>> On 10.01.20 04:09, Anshuman Khandual wrote:
>>>>>>>> Currently there are two interfaces to initiate memory range hot removal i.e
>>>>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory().
>>>>>>>> Platform gets called with arch_remove_memory() to tear down required kernel
>>>>>>>> page tables and other arch specific procedures. But there are platforms
>>>>>>>> like arm64 which might want to prevent removal of certain specific memory
>>>>>>>> ranges irrespective of their present usage or movability properties.
>>>>>>>
>>>>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the
>>>>>>> arch code needs fixing IMHO.
>>>>>>
>>>>>> Right, it is relevant only for the boot memory on arm64 platform. But this
>>>>>> new arch callback makes it flexible to reject any given memory range.
>>>>>>
>>>>>>>
>>>>>>> If it's only boot memory, we should disallow offlining instead via a
>>>>>>> memory notifier - much cleaner.
>>>>>>
>>>>>> Dont have much detail understanding of MMU notifier mechanism but from some
>>>>>> initial reading, it seems like we need to have a mm_struct for a notifier
>>>>>> to monitor various events on the page table. Just wondering how a physical
>>>>>> memory range like boot memory can be monitored because it can be used both
>>>>>> for for kernel (init_mm) or user space process at same time. Is there some
>>>>>> mechanism we could do this ?
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Current arch call back arch_remove_memory() is too late in the process to
>>>>>>>> abort memory hot removal as memory block devices and firmware memory map
>>>>>>>> entries would have already been removed. Platforms should be able to abort
>>>>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin().
>>>>>>>> This essentially requires a new arch callback for memory range validation.
>>>>>>>
>>>>>>> I somewhat dislike this very much. Memory removal should never fail if
>>>>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever
>>>>>>> something like that would strike.
>>>>>>>
>>>>>>>>
>>>>>>>> This differentiates memory range validation between memory hot add and hot
>>>>>>>> remove paths before carving out a new helper check_hotremove_memory_range()
>>>>>>>> which incorporates a new arch callback. This call back provides platforms
>>>>>>>> an opportunity to refuse memory removal at the very onset. In future the
>>>>>>>> same principle can be extended for memory hot add path if required.
>>>>>>>>
>>>>>>>> Platforms can choose to override this callback in order to reject specific
>>>>>>>> memory ranges from removal or can just fallback to a default implementation
>>>>>>>> which allows removal of all memory ranges.
>>>>>>>
>>>>>>> I suspect we want really want to disallow offlining instead. E.g., I
>>>>>>
>>>>>> If boot memory pages can be prevented from being offlined for sure, then it
>>>>>> would indirectly definitely prevent hot remove process as well.
>>>>>>
>>>>>>> remember s390x does that with certain areas needed for dumping/kexec.
>>>>>>
>>>>>> Could not find any references to mmu_notifier in arch/s390 or any other arch
>>>>>> for that matter apart from KVM (which has an user space component), could you
>>>>>> please give some pointers ?
>>>>>
>>>>> Memory (hotplug) notifier, not MMU notifier :)
>>>>
>>>> They are so similarly named :)
>>>>
>>>>>
>>>>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
>>>>>
>>>>
>>>> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT
>>>> to reject affected offline requests in the callback.
>>>
>>> Do you really need that?
>>>
>>> We have SECTION_IS_EARLY. You could iterate all involved sections (for
>>> which you are getting notified) and check if any one of these is marked
>>> SECTION_IS_EARLY. then, it was added during boot and not via add_memory().
>>
>> Seems to be a better approach than adding a new memblock flag.
> 
> These additional changes do the trick and prevent boot memory removal.
> Hope this is in line with your earlier suggestion.
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 00f3e1836558..3b59e6a29dea 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -17,6 +17,7 @@
> +++ b/arch/arm64/mm/mmu.c
> @@ -17,6 +17,7 @@
>  #include <linux/mman.h>
>  #include <linux/nodemask.h>
>  #include <linux/memblock.h>
> +#include <linux/memory.h>
>  #include <linux/fs.h>
>  #include <linux/io.h>
>  #include <linux/mm.h>
> @@ -1365,4 +1366,37 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(start_pfn, nr_pages, altmap);
>         __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
>  }
> +
> +static int boot_mem_remove_notifier(struct notifier_block *nb,
> +                                   unsigned long action, void *data)
> +{
> +       unsigned long start_pfn, end_pfn, pfn, section_nr;
> +       struct mem_section *ms;
> +       struct memory_notify *arg = data;
> +
> +       start_pfn = 
> +       end_pfn = start_pfn + arg->nr_pages;

You can initialize some of these directly

struct memory_notify *arg = data;
const unsigned long end_pfn = arg->start_pfn; + arg->nr_pages;
unsigned long pfn = arg->start_pfn;

and avoid start_pfn.

> +
> +       if (action != MEM_GOING_OFFLINE)
> +               return NOTIFY_OK;
> +
> +       for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +               section_nr = ;
> +               ms = __nr_to_section(section_nr);

Also, I think you can avoid section_nr.

		ms = __nr_to_section(pfn_to_section_nr(pfn));

> +
> +               if (early_section(ms))
> +                       return NOTIFY_BAD;
> +       }
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block boot_mem_remove_nb = {
> +       .notifier_call = boot_mem_remove_notifier,
> +};
> +
> +static int __init boot_mem_remove_init(void)
> +{
> +       return register_memory_notifier(&boot_mem_remove_nb);
> +}
> +device_initcall(boot_mem_remove_init);
>  #endif

Exactly what I was suggesting :)

If we ever need to offline+re-online boot memory (e.g., to a different
zone), we can think of something else.
diff mbox series

Patch

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ba0dca6..f661bd5 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -305,6 +305,13 @@  static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
+#ifndef arch_memory_removable
+static inline bool arch_memory_removable(u64 base, u64 size)
+{
+	return true;
+}
+#endif
+
 extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a91a072..7cdf800 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1014,6 +1014,23 @@  static int check_hotplug_memory_range(u64 start, u64 size)
 	return 0;
 }
 
+static int check_hotremove_memory_range(u64 start, u64 size)
+{
+	int rc;
+
+	BUG_ON(check_hotplug_memory_range(start, size));
+
+	/*
+	 * First check if the platform is willing to have this
+	 * memory range removed else just abort.
+	 */
+	rc = arch_memory_removable(start, size);
+	if (!rc)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
 	return device_online(&mem->dev);
@@ -1762,7 +1779,9 @@  static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
 
-	BUG_ON(check_hotplug_memory_range(start, size));
+	rc = check_hotremove_memory_range(start, size);
+	if (rc)
+		return rc;
 
 	mem_hotplug_begin();