Message ID | 1578625755-11792-2-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64/mm: Enable memory hot remove | expand |
On 10.01.20 04:09, Anshuman Khandual wrote: > Currently there are two interfaces to initiate memory range hot removal i.e > remove_memory() and __remove_memory() which then calls try_remove_memory(). > Platform gets called with arch_remove_memory() to tear down required kernel > page tables and other arch specific procedures. But there are platforms > like arm64 which might want to prevent removal of certain specific memory > ranges irrespective of their present usage or movability properties. Why? Is this only relevant for boot memory? I hope so, otherwise the arch code needs fixing IMHO. If it's only boot memory, we should disallow offlining instead via a memory notifier - much cleaner. > > Current arch call back arch_remove_memory() is too late in the process to > abort memory hot removal as memory block devices and firmware memory map > entries would have already been removed. Platforms should be able to abort > the process before taking the mem_hotplug_lock with mem_hotplug_begin(). > This essentially requires a new arch callback for memory range validation. I somewhat dislike this very much. Memory removal should never fail if used sanely. See e.g., __remove_memory(), it will BUG() whenever something like that would strike. > > This differentiates memory range validation between memory hot add and hot > remove paths before carving out a new helper check_hotremove_memory_range() > which incorporates a new arch callback. This call back provides platforms > an opportunity to refuse memory removal at the very onset. In future the > same principle can be extended for memory hot add path if required. > > Platforms can choose to override this callback in order to reject specific > memory ranges from removal or can just fallback to a default implementation > which allows removal of all memory ranges. I suspect we want really want to disallow offlining instead. E.g., I remember s390x does that with certain areas needed for dumping/kexec. Somebody who added memory via add_memory() should always be able to remove the memory via remove_memory() again. Only boot memory can be treated in a special way, but boot memory is initially always online.
Hi Anshuman, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.5-rc5 next-20200110] [cannot apply to arm64/for-next/core robh/for-next linux/master] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/arm64-mm-Enable-memory-hot-remove/20200111-003854 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4a3033ef6e6bb4c566bd1d556de69b494d76976c config: arm64-randconfig-a001-20200109 (attached as .config) compiler: aarch64-linux-gcc (GCC) 7.5.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.5.0 make.cross ARCH=arm64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> All errors (new ones prefixed by >>): mm/memory_hotplug.c: In function 'check_hotremove_memory_range': >> mm/memory_hotplug.c:1027:7: error: implicit declaration of function 'arch_memory_removable'; did you mean 'add_memory_resource'? [-Werror=implicit-function-declaration] rc = arch_memory_removable(start, size); ^~~~~~~~~~~~~~~~~~~~~ add_memory_resource At top level: mm/memory_hotplug.c:1017:12: warning: 'check_hotremove_memory_range' defined but not used [-Wunused-function] static int check_hotremove_memory_range(u64 start, u64 size) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +1027 mm/memory_hotplug.c 1016 1017 static int check_hotremove_memory_range(u64 start, u64 size) 1018 { 1019 int rc; 1020 1021 BUG_ON(check_hotplug_memory_range(start, size)); 1022 1023 /* 1024 * First check if the platform is willing to have this 1025 * memory range removed else just abort. 1026 */ > 1027 rc = arch_memory_removable(start, size); 1028 if (!rc) 1029 return -EINVAL; 1030 1031 return 0; 1032 } 1033 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
Hi Anshuman, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.5-rc5 next-20200109] [cannot apply to arm64/for-next/core robh/for-next linux/master] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/arm64-mm-Enable-memory-hot-remove/20200111-003854 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4a3033ef6e6bb4c566bd1d556de69b494d76976c config: x86_64-randconfig-s1-20200111 (attached as .config) compiler: gcc-4.9 (Debian 4.9.2-10+deb8u1) 4.9.2 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> All errors (new ones prefixed by >>): mm/memory_hotplug.c: In function 'check_hotremove_memory_range': >> mm/memory_hotplug.c:1027:2: error: implicit declaration of function 'arch_memory_removable' [-Werror=implicit-function-declaration] rc = arch_memory_removable(start, size); ^ mm/memory_hotplug.c: At top level: mm/memory_hotplug.c:1017:12: warning: 'check_hotremove_memory_range' defined but not used [-Wunused-function] static int check_hotremove_memory_range(u64 start, u64 size) ^ Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:arch_set_bit Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:arch_clear_bit Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64 Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64 Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD Cyclomatic Complexity 1 include/linux/percpu-defs.h:__this_cpu_preempt_check Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_read Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_set Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_inc Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_add_return Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_sub_return Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_add Cyclomatic Complexity 2 include/linux/jump_label.h:static_key_false Cyclomatic Complexity 3 include/linux/string.h:memset Cyclomatic Complexity 1 include/linux/err.h:ERR_PTR Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR Cyclomatic Complexity 5 arch/x86/include/asm/preempt.h:__preempt_count_add Cyclomatic Complexity 5 arch/x86/include/asm/preempt.h:__preempt_count_sub Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore Cyclomatic Complexity 1 include/linux/seqlock.h:raw_write_seqcount_begin Cyclomatic Complexity 1 include/linux/seqlock.h:raw_write_seqcount_end Cyclomatic Complexity 1 include/linux/nodemask.h:node_state Cyclomatic Complexity 1 include/linux/nodemask.h:node_set_state Cyclomatic Complexity 2 include/linux/notifier.h:notifier_to_errno Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPageReserved Cyclomatic Complexity 1 include/linux/page-flags.h:SetPagePrivate Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPagePrivate Cyclomatic Complexity 1 include/linux/mmzone.h:zone_end_pfn Cyclomatic Complexity 1 include/linux/mmzone.h:zone_is_empty Cyclomatic Complexity 4 include/linux/mmzone.h:zone_intersects Cyclomatic Complexity 1 include/linux/mmzone.h:pgdat_end_pfn Cyclomatic Complexity 1 include/linux/memory_hotplug.h:generic_free_nodedata Cyclomatic Complexity 1 include/linux/memory_hotplug.h:arch_refresh_nodedata Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_unlock Cyclomatic Complexity 1 include/linux/mmzone.h:populated_zone Cyclomatic Complexity 1 include/linux/mmzone.h:zone_to_nid Cyclomatic Complexity 1 include/linux/mmzone.h:pfn_to_section_nr Cyclomatic Complexity 3 include/linux/mmzone.h:__nr_to_section Cyclomatic Complexity 1 include/linux/mmzone.h:__section_mem_map_addr Cyclomatic Complexity 3 include/linux/mmzone.h:valid_section Cyclomatic Complexity 3 include/linux/mmzone.h:online_section Cyclomatic Complexity 1 include/linux/mmzone.h:online_section_nr Cyclomatic Complexity 1 include/linux/mmzone.h:__pfn_to_section Cyclomatic Complexity 1 include/linux/ioport.h:resource_size Cyclomatic Complexity 1 include/linux/memremap.h:vmem_altmap_offset Cyclomatic Complexity 1 include/linux/mm.h:page_zonenum Cyclomatic Complexity 1 include/linux/mm.h:page_zone Cyclomatic Complexity 1 include/linux/node.h:link_mem_sections Cyclomatic Complexity 1 include/linux/node.h:__register_one_node Cyclomatic Complexity 1 include/linux/node.h:register_one_node Cyclomatic Complexity 1 include/linux/cpu.h:cpus_read_lock Cyclomatic Complexity 1 include/linux/cpu.h:cpus_read_unlock Cyclomatic Complexity 1 include/linux/compaction.h:kcompactd_run Cyclomatic Complexity 6 mm/memory_hotplug.c:update_pgdat_span Cyclomatic Complexity 4 mm/memory_hotplug.c:node_states_check_changes_online Cyclomatic Complexity 4 mm/memory_hotplug.c:node_states_set_node Cyclomatic Complexity 3 mm/memory_hotplug.c:resize_zone_range Cyclomatic Complexity 3 mm/memory_hotplug.c:resize_pgdat_range Cyclomatic Complexity 3 mm/memory_hotplug.c:default_kernel_zone_for_pfn Cyclomatic Complexity 4 mm/memory_hotplug.c:default_zone_for_pfn Cyclomatic Complexity 2 mm/memory_hotplug.c:reset_node_present_pages Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqcount_begin_nested Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqcount_begin Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqlock Cyclomatic Complexity 1 include/linux/memory_hotplug.h:zone_span_writelock Cyclomatic Complexity 8 mm/memory_hotplug.c:find_smallest_section_pfn Cyclomatic Complexity 8 mm/memory_hotplug.c:find_biggest_section_pfn Cyclomatic Complexity 1 include/linux/err.h:IS_ERR Cyclomatic Complexity 69 include/asm-generic/getorder.h:get_order Cyclomatic Complexity 4 include/linux/rcu_sync.h:rcu_sync_is_idle Cyclomatic Complexity 2 include/linux/percpu-rwsem.h:percpu_down_read Cyclomatic Complexity 2 include/linux/percpu-rwsem.h:percpu_up_read Cyclomatic Complexity 1 include/linux/seqlock.h:write_seqcount_end Cyclomatic Complexity 1 include/linux/seqlock.h:write_sequnlock Cyclomatic Complexity 1 include/linux/memory_hotplug.h:zone_span_writeunlock Cyclomatic Complexity 11 mm/memory_hotplug.c:shrink_zone_span Cyclomatic Complexity 3 mm/memory_hotplug.c:setup_memhp_default_state Cyclomatic Complexity 1 include/asm-generic/bitops/instrumented-atomic.h:set_bit Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_inc Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_sub_return Cyclomatic Complexity 1 include/linux/atomic-fallback.h:atomic_dec_return Cyclomatic Complexity 1 include/asm-generic/bitops/instrumented-atomic.h:clear_bit Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_set Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_add Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_add Cyclomatic Complexity 1 include/linux/mm.h:totalram_pages_add Cyclomatic Complexity 2 include/linux/page_ref.h:page_ref_inc Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_read Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_count vim +/arch_memory_removable +1027 mm/memory_hotplug.c 1016 1017 static int check_hotremove_memory_range(u64 start, u64 size) 1018 { 1019 int rc; 1020 1021 BUG_ON(check_hotplug_memory_range(start, size)); 1022 1023 /* 1024 * First check if the platform is willing to have this 1025 * memory range removed else just abort. 1026 */ > 1027 rc = arch_memory_removable(start, size); 1028 if (!rc) 1029 return -EINVAL; 1030 1031 return 0; 1032 } 1033 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
On 01/11/2020 07:41 PM, kbuild test robot wrote: > mm/memory_hotplug.c: In function 'check_hotremove_memory_range': >>> mm/memory_hotplug.c:1027:7: error: implicit declaration of function 'arch_memory_removable'; did you mean 'add_memory_resource'? [-Werror=implicit-function-declaration] > rc = arch_memory_removable(start, size); > ^~~~~~~~~~~~~~~~~~~~~ > add_memory_resource > At top level: > mm/memory_hotplug.c:1017:12: warning: 'check_hotremove_memory_range' defined but not used [-Wunused-function] > static int check_hotremove_memory_range(u64 start, u64 size) > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ > cc1: some warnings being treated as errors > > vim +1027 mm/memory_hotplug.c > > 1016 > 1017 static int check_hotremove_memory_range(u64 start, u64 size) > 1018 { > 1019 int rc; > 1020 > 1021 BUG_ON(check_hotplug_memory_range(start, size)); > 1022 > 1023 /* > 1024 * First check if the platform is willing to have this > 1025 * memory range removed else just abort. > 1026 */ >> 1027 rc = arch_memory_removable(start, size); > 1028 if (!rc) > 1029 return -EINVAL; > 1030 > 1031 return 0; > 1032 } > 1033 Both the build failures reported here could be solved by moving check_hotremove_memory_range() inside CONFIG_MEMORY_HOTREMOVE wrappers, will fix it. - Anshuman
On 01/10/2020 02:12 PM, David Hildenbrand wrote: > On 10.01.20 04:09, Anshuman Khandual wrote: >> Currently there are two interfaces to initiate memory range hot removal i.e >> remove_memory() and __remove_memory() which then calls try_remove_memory(). >> Platform gets called with arch_remove_memory() to tear down required kernel >> page tables and other arch specific procedures. But there are platforms >> like arm64 which might want to prevent removal of certain specific memory >> ranges irrespective of their present usage or movability properties. > > Why? Is this only relevant for boot memory? I hope so, otherwise the > arch code needs fixing IMHO. Right, it is relevant only for the boot memory on arm64 platform. But this new arch callback makes it flexible to reject any given memory range. > > If it's only boot memory, we should disallow offlining instead via a > memory notifier - much cleaner. Dont have much detail understanding of MMU notifier mechanism but from some initial reading, it seems like we need to have a mm_struct for a notifier to monitor various events on the page table. Just wondering how a physical memory range like boot memory can be monitored because it can be used both for for kernel (init_mm) or user space process at same time. Is there some mechanism we could do this ? > >> >> Current arch call back arch_remove_memory() is too late in the process to >> abort memory hot removal as memory block devices and firmware memory map >> entries would have already been removed. Platforms should be able to abort >> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >> This essentially requires a new arch callback for memory range validation. > > I somewhat dislike this very much. Memory removal should never fail if > used sanely. See e.g., __remove_memory(), it will BUG() whenever > something like that would strike. > >> >> This differentiates memory range validation between memory hot add and hot >> remove paths before carving out a new helper check_hotremove_memory_range() >> which incorporates a new arch callback. This call back provides platforms >> an opportunity to refuse memory removal at the very onset. In future the >> same principle can be extended for memory hot add path if required. >> >> Platforms can choose to override this callback in order to reject specific >> memory ranges from removal or can just fallback to a default implementation >> which allows removal of all memory ranges. > > I suspect we want really want to disallow offlining instead. E.g., I If boot memory pages can be prevented from being offlined for sure, then it would indirectly definitely prevent hot remove process as well. > remember s390x does that with certain areas needed for dumping/kexec. Could not find any references to mmu_notifier in arch/s390 or any other arch for that matter apart from KVM (which has an user space component), could you please give some pointers ? > > Somebody who added memory via add_memory() should always be able to > remove the memory via remove_memory() again. Only boot memory can be > treated in a special way, but boot memory is initially always online. >
> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>: > > > >> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>> On 10.01.20 04:09, Anshuman Khandual wrote: >>> Currently there are two interfaces to initiate memory range hot removal i.e >>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>> Platform gets called with arch_remove_memory() to tear down required kernel >>> page tables and other arch specific procedures. But there are platforms >>> like arm64 which might want to prevent removal of certain specific memory >>> ranges irrespective of their present usage or movability properties. >> >> Why? Is this only relevant for boot memory? I hope so, otherwise the >> arch code needs fixing IMHO. > > Right, it is relevant only for the boot memory on arm64 platform. But this > new arch callback makes it flexible to reject any given memory range. > >> >> If it's only boot memory, we should disallow offlining instead via a >> memory notifier - much cleaner. > > Dont have much detail understanding of MMU notifier mechanism but from some > initial reading, it seems like we need to have a mm_struct for a notifier > to monitor various events on the page table. Just wondering how a physical > memory range like boot memory can be monitored because it can be used both > for for kernel (init_mm) or user space process at same time. Is there some > mechanism we could do this ? > >> >>> >>> Current arch call back arch_remove_memory() is too late in the process to >>> abort memory hot removal as memory block devices and firmware memory map >>> entries would have already been removed. Platforms should be able to abort >>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>> This essentially requires a new arch callback for memory range validation. >> >> I somewhat dislike this very much. Memory removal should never fail if >> used sanely. See e.g., __remove_memory(), it will BUG() whenever >> something like that would strike. >> >>> >>> This differentiates memory range validation between memory hot add and hot >>> remove paths before carving out a new helper check_hotremove_memory_range() >>> which incorporates a new arch callback. This call back provides platforms >>> an opportunity to refuse memory removal at the very onset. In future the >>> same principle can be extended for memory hot add path if required. >>> >>> Platforms can choose to override this callback in order to reject specific >>> memory ranges from removal or can just fallback to a default implementation >>> which allows removal of all memory ranges. >> >> I suspect we want really want to disallow offlining instead. E.g., I > > If boot memory pages can be prevented from being offlined for sure, then it > would indirectly definitely prevent hot remove process as well. > >> remember s390x does that with certain areas needed for dumping/kexec. > > Could not find any references to mmu_notifier in arch/s390 or any other arch > for that matter apart from KVM (which has an user space component), could you > please give some pointers ? Memory (hotplug) notifier, not MMU notifier :) Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it.
On 01/13/2020 02:44 PM, David Hildenbrand wrote: > > >> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>: >> >> >> >>> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>>> On 10.01.20 04:09, Anshuman Khandual wrote: >>>> Currently there are two interfaces to initiate memory range hot removal i.e >>>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>>> Platform gets called with arch_remove_memory() to tear down required kernel >>>> page tables and other arch specific procedures. But there are platforms >>>> like arm64 which might want to prevent removal of certain specific memory >>>> ranges irrespective of their present usage or movability properties. >>> >>> Why? Is this only relevant for boot memory? I hope so, otherwise the >>> arch code needs fixing IMHO. >> >> Right, it is relevant only for the boot memory on arm64 platform. But this >> new arch callback makes it flexible to reject any given memory range. >> >>> >>> If it's only boot memory, we should disallow offlining instead via a >>> memory notifier - much cleaner. >> >> Dont have much detail understanding of MMU notifier mechanism but from some >> initial reading, it seems like we need to have a mm_struct for a notifier >> to monitor various events on the page table. Just wondering how a physical >> memory range like boot memory can be monitored because it can be used both >> for for kernel (init_mm) or user space process at same time. Is there some >> mechanism we could do this ? >> >>> >>>> >>>> Current arch call back arch_remove_memory() is too late in the process to >>>> abort memory hot removal as memory block devices and firmware memory map >>>> entries would have already been removed. Platforms should be able to abort >>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>>> This essentially requires a new arch callback for memory range validation. >>> >>> I somewhat dislike this very much. Memory removal should never fail if >>> used sanely. See e.g., __remove_memory(), it will BUG() whenever >>> something like that would strike. >>> >>>> >>>> This differentiates memory range validation between memory hot add and hot >>>> remove paths before carving out a new helper check_hotremove_memory_range() >>>> which incorporates a new arch callback. This call back provides platforms >>>> an opportunity to refuse memory removal at the very onset. In future the >>>> same principle can be extended for memory hot add path if required. >>>> >>>> Platforms can choose to override this callback in order to reject specific >>>> memory ranges from removal or can just fallback to a default implementation >>>> which allows removal of all memory ranges. >>> >>> I suspect we want really want to disallow offlining instead. E.g., I >> >> If boot memory pages can be prevented from being offlined for sure, then it >> would indirectly definitely prevent hot remove process as well. >> >>> remember s390x does that with certain areas needed for dumping/kexec. >> >> Could not find any references to mmu_notifier in arch/s390 or any other arch >> for that matter apart from KVM (which has an user space component), could you >> please give some pointers ? > > Memory (hotplug) notifier, not MMU notifier :) They are so similarly named :) > > Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it. > Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT to reject affected offline requests in the callback.
On 13.01.20 10:50, Anshuman Khandual wrote: > > > On 01/13/2020 02:44 PM, David Hildenbrand wrote: >> >> >>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>: >>> >>> >>> >>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>>>> On 10.01.20 04:09, Anshuman Khandual wrote: >>>>> Currently there are two interfaces to initiate memory range hot removal i.e >>>>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>>>> Platform gets called with arch_remove_memory() to tear down required kernel >>>>> page tables and other arch specific procedures. But there are platforms >>>>> like arm64 which might want to prevent removal of certain specific memory >>>>> ranges irrespective of their present usage or movability properties. >>>> >>>> Why? Is this only relevant for boot memory? I hope so, otherwise the >>>> arch code needs fixing IMHO. >>> >>> Right, it is relevant only for the boot memory on arm64 platform. But this >>> new arch callback makes it flexible to reject any given memory range. >>> >>>> >>>> If it's only boot memory, we should disallow offlining instead via a >>>> memory notifier - much cleaner. >>> >>> Dont have much detail understanding of MMU notifier mechanism but from some >>> initial reading, it seems like we need to have a mm_struct for a notifier >>> to monitor various events on the page table. Just wondering how a physical >>> memory range like boot memory can be monitored because it can be used both >>> for for kernel (init_mm) or user space process at same time. Is there some >>> mechanism we could do this ? >>> >>>> >>>>> >>>>> Current arch call back arch_remove_memory() is too late in the process to >>>>> abort memory hot removal as memory block devices and firmware memory map >>>>> entries would have already been removed. Platforms should be able to abort >>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>>>> This essentially requires a new arch callback for memory range validation. >>>> >>>> I somewhat dislike this very much. Memory removal should never fail if >>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever >>>> something like that would strike. >>>> >>>>> >>>>> This differentiates memory range validation between memory hot add and hot >>>>> remove paths before carving out a new helper check_hotremove_memory_range() >>>>> which incorporates a new arch callback. This call back provides platforms >>>>> an opportunity to refuse memory removal at the very onset. In future the >>>>> same principle can be extended for memory hot add path if required. >>>>> >>>>> Platforms can choose to override this callback in order to reject specific >>>>> memory ranges from removal or can just fallback to a default implementation >>>>> which allows removal of all memory ranges. >>>> >>>> I suspect we want really want to disallow offlining instead. E.g., I >>> >>> If boot memory pages can be prevented from being offlined for sure, then it >>> would indirectly definitely prevent hot remove process as well. >>> >>>> remember s390x does that with certain areas needed for dumping/kexec. >>> >>> Could not find any references to mmu_notifier in arch/s390 or any other arch >>> for that matter apart from KVM (which has an user space component), could you >>> please give some pointers ? >> >> Memory (hotplug) notifier, not MMU notifier :) > > They are so similarly named :) > >> >> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it. >> > > Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT > to reject affected offline requests in the callback. Do you really need that? We have SECTION_IS_EARLY. You could iterate all involved sections (for which you are getting notified) and check if any one of these is marked SECTION_IS_EARLY. then, it was added during boot and not via add_memory().
On 01/13/2020 04:07 PM, David Hildenbrand wrote: > On 13.01.20 10:50, Anshuman Khandual wrote: >> >> >> On 01/13/2020 02:44 PM, David Hildenbrand wrote: >>> >>> >>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>: >>>> >>>> >>>> >>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>>>>> On 10.01.20 04:09, Anshuman Khandual wrote: >>>>>> Currently there are two interfaces to initiate memory range hot removal i.e >>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>>>>> Platform gets called with arch_remove_memory() to tear down required kernel >>>>>> page tables and other arch specific procedures. But there are platforms >>>>>> like arm64 which might want to prevent removal of certain specific memory >>>>>> ranges irrespective of their present usage or movability properties. >>>>> >>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the >>>>> arch code needs fixing IMHO. >>>> >>>> Right, it is relevant only for the boot memory on arm64 platform. But this >>>> new arch callback makes it flexible to reject any given memory range. >>>> >>>>> >>>>> If it's only boot memory, we should disallow offlining instead via a >>>>> memory notifier - much cleaner. >>>> >>>> Dont have much detail understanding of MMU notifier mechanism but from some >>>> initial reading, it seems like we need to have a mm_struct for a notifier >>>> to monitor various events on the page table. Just wondering how a physical >>>> memory range like boot memory can be monitored because it can be used both >>>> for for kernel (init_mm) or user space process at same time. Is there some >>>> mechanism we could do this ? >>>> >>>>> >>>>>> >>>>>> Current arch call back arch_remove_memory() is too late in the process to >>>>>> abort memory hot removal as memory block devices and firmware memory map >>>>>> entries would have already been removed. Platforms should be able to abort >>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>>>>> This essentially requires a new arch callback for memory range validation. >>>>> >>>>> I somewhat dislike this very much. Memory removal should never fail if >>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever >>>>> something like that would strike. >>>>> >>>>>> >>>>>> This differentiates memory range validation between memory hot add and hot >>>>>> remove paths before carving out a new helper check_hotremove_memory_range() >>>>>> which incorporates a new arch callback. This call back provides platforms >>>>>> an opportunity to refuse memory removal at the very onset. In future the >>>>>> same principle can be extended for memory hot add path if required. >>>>>> >>>>>> Platforms can choose to override this callback in order to reject specific >>>>>> memory ranges from removal or can just fallback to a default implementation >>>>>> which allows removal of all memory ranges. >>>>> >>>>> I suspect we want really want to disallow offlining instead. E.g., I >>>> >>>> If boot memory pages can be prevented from being offlined for sure, then it >>>> would indirectly definitely prevent hot remove process as well. >>>> >>>>> remember s390x does that with certain areas needed for dumping/kexec. >>>> >>>> Could not find any references to mmu_notifier in arch/s390 or any other arch >>>> for that matter apart from KVM (which has an user space component), could you >>>> please give some pointers ? >>> >>> Memory (hotplug) notifier, not MMU notifier :) >> >> They are so similarly named :) >> >>> >>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it. >>> >> >> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT >> to reject affected offline requests in the callback. > > Do you really need that? > > We have SECTION_IS_EARLY. You could iterate all involved sections (for > which you are getting notified) and check if any one of these is marked > SECTION_IS_EARLY. then, it was added during boot and not via add_memory(). Seems to be a better approach than adding a new memblock flag. > >
On 01/14/2020 07:43 AM, Anshuman Khandual wrote: > > > On 01/13/2020 04:07 PM, David Hildenbrand wrote: >> On 13.01.20 10:50, Anshuman Khandual wrote: >>> >>> >>> On 01/13/2020 02:44 PM, David Hildenbrand wrote: >>>> >>>> >>>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>: >>>>> >>>>> >>>>> >>>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>>>>>> On 10.01.20 04:09, Anshuman Khandual wrote: >>>>>>> Currently there are two interfaces to initiate memory range hot removal i.e >>>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>>>>>> Platform gets called with arch_remove_memory() to tear down required kernel >>>>>>> page tables and other arch specific procedures. But there are platforms >>>>>>> like arm64 which might want to prevent removal of certain specific memory >>>>>>> ranges irrespective of their present usage or movability properties. >>>>>> >>>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the >>>>>> arch code needs fixing IMHO. >>>>> >>>>> Right, it is relevant only for the boot memory on arm64 platform. But this >>>>> new arch callback makes it flexible to reject any given memory range. >>>>> >>>>>> >>>>>> If it's only boot memory, we should disallow offlining instead via a >>>>>> memory notifier - much cleaner. >>>>> >>>>> Dont have much detail understanding of MMU notifier mechanism but from some >>>>> initial reading, it seems like we need to have a mm_struct for a notifier >>>>> to monitor various events on the page table. Just wondering how a physical >>>>> memory range like boot memory can be monitored because it can be used both >>>>> for for kernel (init_mm) or user space process at same time. Is there some >>>>> mechanism we could do this ? >>>>> >>>>>> >>>>>>> >>>>>>> Current arch call back arch_remove_memory() is too late in the process to >>>>>>> abort memory hot removal as memory block devices and firmware memory map >>>>>>> entries would have already been removed. Platforms should be able to abort >>>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>>>>>> This essentially requires a new arch callback for memory range validation. >>>>>> >>>>>> I somewhat dislike this very much. Memory removal should never fail if >>>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever >>>>>> something like that would strike. >>>>>> >>>>>>> >>>>>>> This differentiates memory range validation between memory hot add and hot >>>>>>> remove paths before carving out a new helper check_hotremove_memory_range() >>>>>>> which incorporates a new arch callback. This call back provides platforms >>>>>>> an opportunity to refuse memory removal at the very onset. In future the >>>>>>> same principle can be extended for memory hot add path if required. >>>>>>> >>>>>>> Platforms can choose to override this callback in order to reject specific >>>>>>> memory ranges from removal or can just fallback to a default implementation >>>>>>> which allows removal of all memory ranges. >>>>>> >>>>>> I suspect we want really want to disallow offlining instead. E.g., I >>>>> >>>>> If boot memory pages can be prevented from being offlined for sure, then it >>>>> would indirectly definitely prevent hot remove process as well. >>>>> >>>>>> remember s390x does that with certain areas needed for dumping/kexec. >>>>> >>>>> Could not find any references to mmu_notifier in arch/s390 or any other arch >>>>> for that matter apart from KVM (which has an user space component), could you >>>>> please give some pointers ? >>>> >>>> Memory (hotplug) notifier, not MMU notifier :) >>> >>> They are so similarly named :) >>> >>>> >>>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it. >>>> >>> >>> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT >>> to reject affected offline requests in the callback. >> >> Do you really need that? >> >> We have SECTION_IS_EARLY. You could iterate all involved sections (for >> which you are getting notified) and check if any one of these is marked >> SECTION_IS_EARLY. then, it was added during boot and not via add_memory(). > > Seems to be a better approach than adding a new memblock flag. These additional changes do the trick and prevent boot memory removal. Hope this is in line with your earlier suggestion. diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 00f3e1836558..3b59e6a29dea 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -17,6 +17,7 @@ +++ b/arch/arm64/mm/mmu.c @@ -17,6 +17,7 @@ #include <linux/mman.h> #include <linux/nodemask.h> #include <linux/memblock.h> +#include <linux/memory.h> #include <linux/fs.h> #include <linux/io.h> #include <linux/mm.h> @@ -1365,4 +1366,37 @@ void arch_remove_memory(int nid, u64 start, u64 size, __remove_pages(start_pfn, nr_pages, altmap); __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size); } + +static int boot_mem_remove_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + unsigned long start_pfn, end_pfn, pfn, section_nr; + struct mem_section *ms; + struct memory_notify *arg = data; + + start_pfn = arg->start_pfn; + end_pfn = start_pfn + arg->nr_pages; + + if (action != MEM_GOING_OFFLINE) + return NOTIFY_OK; + + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { + section_nr = pfn_to_section_nr(pfn); + ms = __nr_to_section(section_nr); + + if (early_section(ms)) + return NOTIFY_BAD; + } + return NOTIFY_OK; +} + +static struct notifier_block boot_mem_remove_nb = { + .notifier_call = boot_mem_remove_notifier, +}; + +static int __init boot_mem_remove_init(void) +{ + return register_memory_notifier(&boot_mem_remove_nb); +} +device_initcall(boot_mem_remove_init); #endif > >> >> > >
On 14.01.20 12:09, Anshuman Khandual wrote: > > > On 01/14/2020 07:43 AM, Anshuman Khandual wrote: >> >> >> On 01/13/2020 04:07 PM, David Hildenbrand wrote: >>> On 13.01.20 10:50, Anshuman Khandual wrote: >>>> >>>> >>>> On 01/13/2020 02:44 PM, David Hildenbrand wrote: >>>>> >>>>> >>>>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@arm.com>: >>>>>> >>>>>> >>>>>> >>>>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>>>>>>> On 10.01.20 04:09, Anshuman Khandual wrote: >>>>>>>> Currently there are two interfaces to initiate memory range hot removal i.e >>>>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>>>>>>> Platform gets called with arch_remove_memory() to tear down required kernel >>>>>>>> page tables and other arch specific procedures. But there are platforms >>>>>>>> like arm64 which might want to prevent removal of certain specific memory >>>>>>>> ranges irrespective of their present usage or movability properties. >>>>>>> >>>>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the >>>>>>> arch code needs fixing IMHO. >>>>>> >>>>>> Right, it is relevant only for the boot memory on arm64 platform. But this >>>>>> new arch callback makes it flexible to reject any given memory range. >>>>>> >>>>>>> >>>>>>> If it's only boot memory, we should disallow offlining instead via a >>>>>>> memory notifier - much cleaner. >>>>>> >>>>>> Dont have much detail understanding of MMU notifier mechanism but from some >>>>>> initial reading, it seems like we need to have a mm_struct for a notifier >>>>>> to monitor various events on the page table. Just wondering how a physical >>>>>> memory range like boot memory can be monitored because it can be used both >>>>>> for for kernel (init_mm) or user space process at same time. Is there some >>>>>> mechanism we could do this ? >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Current arch call back arch_remove_memory() is too late in the process to >>>>>>>> abort memory hot removal as memory block devices and firmware memory map >>>>>>>> entries would have already been removed. Platforms should be able to abort >>>>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>>>>>>> This essentially requires a new arch callback for memory range validation. >>>>>>> >>>>>>> I somewhat dislike this very much. Memory removal should never fail if >>>>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever >>>>>>> something like that would strike. >>>>>>> >>>>>>>> >>>>>>>> This differentiates memory range validation between memory hot add and hot >>>>>>>> remove paths before carving out a new helper check_hotremove_memory_range() >>>>>>>> which incorporates a new arch callback. This call back provides platforms >>>>>>>> an opportunity to refuse memory removal at the very onset. In future the >>>>>>>> same principle can be extended for memory hot add path if required. >>>>>>>> >>>>>>>> Platforms can choose to override this callback in order to reject specific >>>>>>>> memory ranges from removal or can just fallback to a default implementation >>>>>>>> which allows removal of all memory ranges. >>>>>>> >>>>>>> I suspect we want really want to disallow offlining instead. E.g., I >>>>>> >>>>>> If boot memory pages can be prevented from being offlined for sure, then it >>>>>> would indirectly definitely prevent hot remove process as well. >>>>>> >>>>>>> remember s390x does that with certain areas needed for dumping/kexec. >>>>>> >>>>>> Could not find any references to mmu_notifier in arch/s390 or any other arch >>>>>> for that matter apart from KVM (which has an user space component), could you >>>>>> please give some pointers ? >>>>> >>>>> Memory (hotplug) notifier, not MMU notifier :) >>>> >>>> They are so similarly named :) >>>> >>>>> >>>>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it. >>>>> >>>> >>>> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT >>>> to reject affected offline requests in the callback. >>> >>> Do you really need that? >>> >>> We have SECTION_IS_EARLY. You could iterate all involved sections (for >>> which you are getting notified) and check if any one of these is marked >>> SECTION_IS_EARLY. then, it was added during boot and not via add_memory(). >> >> Seems to be a better approach than adding a new memblock flag. > > These additional changes do the trick and prevent boot memory removal. > Hope this is in line with your earlier suggestion. > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > index 00f3e1836558..3b59e6a29dea 100644 > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -17,6 +17,7 @@ > +++ b/arch/arm64/mm/mmu.c > @@ -17,6 +17,7 @@ > #include <linux/mman.h> > #include <linux/nodemask.h> > #include <linux/memblock.h> > +#include <linux/memory.h> > #include <linux/fs.h> > #include <linux/io.h> > #include <linux/mm.h> > @@ -1365,4 +1366,37 @@ void arch_remove_memory(int nid, u64 start, u64 size, > __remove_pages(start_pfn, nr_pages, altmap); > __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size); > } > + > +static int boot_mem_remove_notifier(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + unsigned long start_pfn, end_pfn, pfn, section_nr; > + struct mem_section *ms; > + struct memory_notify *arg = data; > + > + start_pfn = > + end_pfn = start_pfn + arg->nr_pages; You can initialize some of these directly struct memory_notify *arg = data; const unsigned long end_pfn = arg->start_pfn; + arg->nr_pages; unsigned long pfn = arg->start_pfn; and avoid start_pfn. > + > + if (action != MEM_GOING_OFFLINE) > + return NOTIFY_OK; > + > + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { > + section_nr = ; > + ms = __nr_to_section(section_nr); Also, I think you can avoid section_nr. ms = __nr_to_section(pfn_to_section_nr(pfn)); > + > + if (early_section(ms)) > + return NOTIFY_BAD; > + } > + return NOTIFY_OK; > +} > + > +static struct notifier_block boot_mem_remove_nb = { > + .notifier_call = boot_mem_remove_notifier, > +}; > + > +static int __init boot_mem_remove_init(void) > +{ > + return register_memory_notifier(&boot_mem_remove_nb); > +} > +device_initcall(boot_mem_remove_init); > #endif Exactly what I was suggesting :) If we ever need to offline+re-online boot memory (e.g., to a different zone), we can think of something else.
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index ba0dca6..f661bd5 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -305,6 +305,13 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {} #ifdef CONFIG_MEMORY_HOTREMOVE +#ifndef arch_memory_removable +static inline bool arch_memory_removable(u64 base, u64 size) +{ + return true; +} +#endif + extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); extern void try_offline_node(int nid); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a91a072..7cdf800 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1014,6 +1014,23 @@ static int check_hotplug_memory_range(u64 start, u64 size) return 0; } +static int check_hotremove_memory_range(u64 start, u64 size) +{ + int rc; + + BUG_ON(check_hotplug_memory_range(start, size)); + + /* + * First check if the platform is willing to have this + * memory range removed else just abort. + */ + rc = arch_memory_removable(start, size); + if (!rc) + return -EINVAL; + + return 0; +} + static int online_memory_block(struct memory_block *mem, void *arg) { return device_online(&mem->dev); @@ -1762,7 +1779,9 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) { int rc = 0; - BUG_ON(check_hotplug_memory_range(start, size)); + rc = check_hotremove_memory_range(start, size); + if (rc) + return rc; mem_hotplug_begin();
Currently there are two interfaces to initiate memory range hot removal i.e remove_memory() and __remove_memory() which then calls try_remove_memory(). Platform gets called with arch_remove_memory() to tear down required kernel page tables and other arch specific procedures. But there are platforms like arm64 which might want to prevent removal of certain specific memory ranges irrespective of their present usage or movability properties. Current arch call back arch_remove_memory() is too late in the process to abort memory hot removal as memory block devices and firmware memory map entries would have already been removed. Platforms should be able to abort the process before taking the mem_hotplug_lock with mem_hotplug_begin(). This essentially requires a new arch callback for memory range validation. This differentiates memory range validation between memory hot add and hot remove paths before carving out a new helper check_hotremove_memory_range() which incorporates a new arch callback. This call back provides platforms an opportunity to refuse memory removal at the very onset. In future the same principle can be extended for memory hot add path if required. Platforms can choose to override this callback in order to reject specific memory ranges from removal or can just fallback to a default implementation which allows removal of all memory ranges. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> --- include/linux/memory_hotplug.h | 7 +++++++ mm/memory_hotplug.c | 21 ++++++++++++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-)