Message ID | 20200707055917.143653-2-justin.he@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix and enable pmem as RAM device on arm64 | expand |
On 07.07.20 07:59, Jia He wrote: > This exports memory_add_physaddr_to_nid() for module driver to use. > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > NUMA_NO_NID is detected. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > arch/arm64/mm/numa.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > index aafcee3e3f7e..7eeb31740248 100644 > --- a/arch/arm64/mm/numa.c > +++ b/arch/arm64/mm/numa.c > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > /* > * We hope that we will be hotplugging memory on nodes we already know about, > - * such that acpi_get_node() succeeds and we never fall back to this... > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > */ > int memory_add_physaddr_to_nid(u64 addr) > { > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > return 0; > } > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > We could turn that into a pr_info() instead, but the effect is visible to user space (e.g., which memory blocks belong to which node in sysfs), so this can be debugged easily on demand. Reviewed-by: David Hildenbrand <david@redhat.com>
On Tue 07-07-20 13:59:15, Jia He wrote: > This exports memory_add_physaddr_to_nid() for module driver to use. > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > NUMA_NO_NID is detected. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > arch/arm64/mm/numa.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > index aafcee3e3f7e..7eeb31740248 100644 > --- a/arch/arm64/mm/numa.c > +++ b/arch/arm64/mm/numa.c > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > /* > * We hope that we will be hotplugging memory on nodes we already know about, > - * such that acpi_get_node() succeeds and we never fall back to this... > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > */ > int memory_add_physaddr_to_nid(u64 addr) > { > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > return 0; > } > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); Does it make sense to export a noop function? Wouldn't make more sense to simply make it static inline somewhere in a header? I haven't checked whether there is an easy way to do that sanely bu this just hit my eyes.
On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > On Tue 07-07-20 13:59:15, Jia He wrote: > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > NUMA_NO_NID is detected. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Jia He <justin.he@arm.com> > > --- > > arch/arm64/mm/numa.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > index aafcee3e3f7e..7eeb31740248 100644 > > --- a/arch/arm64/mm/numa.c > > +++ b/arch/arm64/mm/numa.c > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > /* > > * We hope that we will be hotplugging memory on nodes we already know about, > > - * such that acpi_get_node() succeeds and we never fall back to this... > > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > > */ > > int memory_add_physaddr_to_nid(u64 addr) > > { > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > > return 0; > > } > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > Does it make sense to export a noop function? Wouldn't make more sense > to simply make it static inline somewhere in a header? I haven't checked > whether there is an easy way to do that sanely bu this just hit my eyes. We'll need to either add a CONFIG_ option or arch specific callback to make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) implementations coexist ... > -- > Michal Hocko > SUSE Labs
On 07.07.20 14:13, Mike Rapoport wrote: > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >> On Tue 07-07-20 13:59:15, Jia He wrote: >>> This exports memory_add_physaddr_to_nid() for module driver to use. >>> >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>> NUMA_NO_NID is detected. >>> >>> Suggested-by: David Hildenbrand <david@redhat.com> >>> Signed-off-by: Jia He <justin.he@arm.com> >>> --- >>> arch/arm64/mm/numa.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>> index aafcee3e3f7e..7eeb31740248 100644 >>> --- a/arch/arm64/mm/numa.c >>> +++ b/arch/arm64/mm/numa.c >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>> >>> /* >>> * We hope that we will be hotplugging memory on nodes we already know about, >>> - * such that acpi_get_node() succeeds and we never fall back to this... >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>> */ >>> int memory_add_physaddr_to_nid(u64 addr) >>> { >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>> return 0; >>> } >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >> >> Does it make sense to export a noop function? Wouldn't make more sense >> to simply make it static inline somewhere in a header? I haven't checked >> whether there is an easy way to do that sanely bu this just hit my eyes. > > We'll need to either add a CONFIG_ option or arch specific callback to > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > implementations coexist ... Note: I have a similar dummy (return 0) patch for s390x lying around here.
On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > On 07.07.20 14:13, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > >> On Tue 07-07-20 13:59:15, Jia He wrote: > >>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>> > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>> NUMA_NO_NID is detected. > >>> > >>> Suggested-by: David Hildenbrand <david@redhat.com> > >>> Signed-off-by: Jia He <justin.he@arm.com> > >>> --- > >>> arch/arm64/mm/numa.c | 5 +++-- > >>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>> index aafcee3e3f7e..7eeb31740248 100644 > >>> --- a/arch/arm64/mm/numa.c > >>> +++ b/arch/arm64/mm/numa.c > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>> > >>> /* > >>> * We hope that we will be hotplugging memory on nodes we already know about, > >>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>> */ > >>> int memory_add_physaddr_to_nid(u64 addr) > >>> { > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>> return 0; > >>> } > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >> > >> Does it make sense to export a noop function? Wouldn't make more sense > >> to simply make it static inline somewhere in a header? I haven't checked > >> whether there is an easy way to do that sanely bu this just hit my eyes. > > > > We'll need to either add a CONFIG_ option or arch specific callback to > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > > implementations coexist ... > > Note: I have a similar dummy (return 0) patch for s390x lying around here. Then we'll call it a tie - 3:3 ;-) > -- > Thanks, > > David / dhildenb >
On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > > On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > > On 07.07.20 14:13, Mike Rapoport wrote: > > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > > >> On Tue 07-07-20 13:59:15, Jia He wrote: > > >>> This exports memory_add_physaddr_to_nid() for module driver to use. > > >>> > > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > >>> NUMA_NO_NID is detected. > > >>> > > >>> Suggested-by: David Hildenbrand <david@redhat.com> > > >>> Signed-off-by: Jia He <justin.he@arm.com> > > >>> --- > > >>> arch/arm64/mm/numa.c | 5 +++-- > > >>> 1 file changed, 3 insertions(+), 2 deletions(-) > > >>> > > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > >>> index aafcee3e3f7e..7eeb31740248 100644 > > >>> --- a/arch/arm64/mm/numa.c > > >>> +++ b/arch/arm64/mm/numa.c > > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > >>> > > >>> /* > > >>> * We hope that we will be hotplugging memory on nodes we already know about, > > >>> - * such that acpi_get_node() succeeds and we never fall back to this... > > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > > >>> */ > > >>> int memory_add_physaddr_to_nid(u64 addr) > > >>> { > > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > > >>> return 0; > > >>> } > > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > >> > > >> Does it make sense to export a noop function? Wouldn't make more sense > > >> to simply make it static inline somewhere in a header? I haven't checked > > >> whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > We'll need to either add a CONFIG_ option or arch specific callback to > > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > > > implementations coexist ... > > > > Note: I have a similar dummy (return 0) patch for s390x lying around here. > > Then we'll call it a tie - 3:3 ;-) So I'd be happy to jump on the train of people wanting to export the ARM stub for this (and add a new ARM stub for phys_to_target_node()), but Will did have a plausibly better idea that I have been meaning to circle back to: http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck ...i.e. iterate over node data to do the lookup. This would seem to work generically for multiple archs unless I am missing something?
Hi Michal and David > -----Original Message----- > From: Michal Hocko <mhocko@kernel.org> > Sent: Tuesday, July 7, 2020 7:55 PM > To: Justin He <Justin.He@arm.com> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > as EXPORT_SYMBOL_GPL > > On Tue 07-07-20 13:59:15, Jia He wrote: > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > NUMA_NO_NID is detected. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Jia He <justin.he@arm.com> > > --- > > arch/arm64/mm/numa.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > index aafcee3e3f7e..7eeb31740248 100644 > > --- a/arch/arm64/mm/numa.c > > +++ b/arch/arm64/mm/numa.c > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > /* > > * We hope that we will be hotplugging memory on nodes we already know > about, > > - * such that acpi_get_node() succeeds and we never fall back to this... > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > the node > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > option. > > */ > > int memory_add_physaddr_to_nid(u64 addr) > > { > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > addr); > > return 0; > > } > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > Does it make sense to export a noop function? Wouldn't make more sense > to simply make it static inline somewhere in a header? I haven't checked > whether there is an easy way to do that sanely bu this just hit my eyes. Okay, I can make a change in memory_hotplug.h, sth like: --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, struct mhp_params *params); #endif /* ARCH_HAS_ADD_PAGES */ -#ifdef CONFIG_NUMA -extern int memory_add_physaddr_to_nid(u64 start); -#else +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) static inline int memory_add_physaddr_to_nid(u64 start) { return 0; } +#else +extern int memory_add_physaddr_to_nid(u64 start); #endif And then check the memory_add_physaddr_to_nid() helper on all arches, if it is noop(return 0), I can simply remove it. if it is not noop, after the helper, #define memory_add_physaddr_to_nid What do you think of this proposal? -- Cheers, Justin (Jia He)
On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > Hi Michal and David > > > -----Original Message----- > > From: Michal Hocko <mhocko@kernel.org> > > Sent: Tuesday, July 7, 2020 7:55 PM > > To: Justin He <Justin.He@arm.com> > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > as EXPORT_SYMBOL_GPL > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > NUMA_NO_NID is detected. > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > Signed-off-by: Jia He <justin.he@arm.com> > > > --- > > > arch/arm64/mm/numa.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > index aafcee3e3f7e..7eeb31740248 100644 > > > --- a/arch/arm64/mm/numa.c > > > +++ b/arch/arm64/mm/numa.c > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > /* > > > * We hope that we will be hotplugging memory on nodes we already know > > about, > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > the node > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > option. > > > */ > > > int memory_add_physaddr_to_nid(u64 addr) > > > { > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > addr); > > > return 0; > > > } > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > Does it make sense to export a noop function? Wouldn't make more sense > > to simply make it static inline somewhere in a header? I haven't checked > > whether there is an easy way to do that sanely bu this just hit my eyes. > > Okay, I can make a change in memory_hotplug.h, sth like: > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > struct mhp_params *params); > #endif /* ARCH_HAS_ADD_PAGES */ > > -#ifdef CONFIG_NUMA > -extern int memory_add_physaddr_to_nid(u64 start); > -#else > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > static inline int memory_add_physaddr_to_nid(u64 start) > { > return 0; > } > +#else > +extern int memory_add_physaddr_to_nid(u64 start); > #endif > > And then check the memory_add_physaddr_to_nid() helper on all arches, > if it is noop(return 0), I can simply remove it. > if it is not noop, after the helper, > #define memory_add_physaddr_to_nid > > What do you think of this proposal? Especially for architectures that use memblock info for numa info (which seems to be everyone except x86) why not implement a generic memory_add_physaddr_to_nid() that does: int memory_add_physaddr_to_nid(u64 addr) { unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); int nid; for_each_online_node(nid) { get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); if (pfn >= start_pfn && pfn <= end_pfn) return nid; } return NUMA_NO_NODE; }
Hi Dan > -----Original Message----- > From: Dan Williams <dan.j.williams@intel.com> > Sent: Wednesday, July 8, 2020 11:57 AM > To: Justin He <Justin.He@arm.com> > Cc: Michal Hocko <mhocko@kernel.org>; David Hildenbrand <david@redhat.com>; > Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; > Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; > Andrew Morton <akpm@linux-foundation.org>; Mike Rapoport > <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan > <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org; > Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > as EXPORT_SYMBOL_GPL > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > Hi Michal and David > > > > > -----Original Message----- > > > From: Michal Hocko <mhocko@kernel.org> > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > To: Justin He <Justin.He@arm.com> > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal > Verma > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; > linux- > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export > memory_add_physaddr_to_nid > > > as EXPORT_SYMBOL_GPL > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in > case > > > > NUMA_NO_NID is detected. > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > --- > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > --- a/arch/arm64/mm/numa.c > > > > +++ b/arch/arm64/mm/numa.c > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > /* > > > > * We hope that we will be hotplugging memory on nodes we already > know > > > about, > > > > - * such that acpi_get_node() succeeds and we never fall back to > this... > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > the node > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a > fallback > > > option. > > > > */ > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > { > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > addr); > > > > return 0; > > > > } > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > to simply make it static inline somewhere in a header? I haven't > checked > > > whether there is an easy way to do that sanely bu this just hit my > eyes. > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > --- a/include/linux/memory_hotplug.h > > +++ b/include/linux/memory_hotplug.h > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, > unsigned long nr_pages, > > struct mhp_params *params); > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > -#ifdef CONFIG_NUMA > > -extern int memory_add_physaddr_to_nid(u64 start); > > -#else > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > static inline int memory_add_physaddr_to_nid(u64 start) > > { > > return 0; > > } > > +#else > > +extern int memory_add_physaddr_to_nid(u64 start); > > #endif > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > if it is noop(return 0), I can simply remove it. > > if it is not noop, after the helper, > > #define memory_add_physaddr_to_nid > > > > What do you think of this proposal? > > Especially for architectures that use memblock info for numa info > (which seems to be everyone except x86) why not implement a generic > memory_add_physaddr_to_nid() that does: > > int memory_add_physaddr_to_nid(u64 addr) > { > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > int nid; > > for_each_online_node(nid) { > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > if (pfn >= start_pfn && pfn <= end_pfn) > return nid; > } > return NUMA_NO_NODE; > } Thanks for your suggestion, Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke phys_to_target_node()? -- Cheers, Justin (Jia He)
On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: [..] > > Especially for architectures that use memblock info for numa info > > (which seems to be everyone except x86) why not implement a generic > > memory_add_physaddr_to_nid() that does: > > > > int memory_add_physaddr_to_nid(u64 addr) > > { > > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > > int nid; > > > > for_each_online_node(nid) { > > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > > if (pfn >= start_pfn && pfn <= end_pfn) > > return nid; > > } > > return NUMA_NO_NODE; > > } > > Thanks for your suggestion, > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > phys_to_target_node()? I think it needs to be the reverse. phys_to_target_node() should call memory_add_physaddr_to_nid() by default, but fall back to searching reserved memory address ranges in memblock. See phys_to_target_node() in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, but the principle is the same i.e. that a target node may not be represented in memblock.memory, but memblock.reserved. I'm working on a patch to provide a function similar to get_pfn_range_for_nid() that operates on reserved memory.
On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > > > On 07.07.20 14:13, Mike Rapoport wrote: > > > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > > > >> On Tue 07-07-20 13:59:15, Jia He wrote: > > > >>> This exports memory_add_physaddr_to_nid() for module driver to use. > > > >>> > > > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > >>> NUMA_NO_NID is detected. > > > >>> > > > >>> Suggested-by: David Hildenbrand <david@redhat.com> > > > >>> Signed-off-by: Jia He <justin.he@arm.com> > > > >>> --- > > > >>> arch/arm64/mm/numa.c | 5 +++-- > > > >>> 1 file changed, 3 insertions(+), 2 deletions(-) > > > >>> > > > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > >>> index aafcee3e3f7e..7eeb31740248 100644 > > > >>> --- a/arch/arm64/mm/numa.c > > > >>> +++ b/arch/arm64/mm/numa.c > > > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > >>> > > > >>> /* > > > >>> * We hope that we will be hotplugging memory on nodes we already know about, > > > >>> - * such that acpi_get_node() succeeds and we never fall back to this... > > > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > > > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > > > >>> */ > > > >>> int memory_add_physaddr_to_nid(u64 addr) > > > >>> { > > > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > > > >>> return 0; > > > >>> } > > > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > >> > > > >> Does it make sense to export a noop function? Wouldn't make more sense > > > >> to simply make it static inline somewhere in a header? I haven't checked > > > >> whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > > > We'll need to either add a CONFIG_ option or arch specific callback to > > > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > > > > implementations coexist ... > > > > > > Note: I have a similar dummy (return 0) patch for s390x lying around here. > > > > Then we'll call it a tie - 3:3 ;-) > > So I'd be happy to jump on the train of people wanting to export the > ARM stub for this (and add a new ARM stub for phys_to_target_node()), > but Will did have a plausibly better idea that I have been meaning to > circle back to: > > http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck > > ...i.e. iterate over node data to do the lookup. This would seem to > work generically for multiple archs unless I am missing something? I think it would work on arm64, power and, most propbably on s390 (David?), but not on x86. x86 does not have reserved memory in pgdat, it's never memblock_add()'ed (see e820__memblock_setup()). I've suggested to add E820_*_RESERVED to memblock.memory a while ago [1], but apparently there are systems that cannot tolerate OS mappings of the BIOS reserved areas. [1] https://lore.kernel.org/lkml/20200522142053.GW1059226@linux.ibm.com/
On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > Hi Michal and David > > > > > -----Original Message----- > > > From: Michal Hocko <mhocko@kernel.org> > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > To: Justin He <Justin.He@arm.com> > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > > as EXPORT_SYMBOL_GPL > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > > NUMA_NO_NID is detected. > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > --- > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > --- a/arch/arm64/mm/numa.c > > > > +++ b/arch/arm64/mm/numa.c > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > /* > > > > * We hope that we will be hotplugging memory on nodes we already know > > > about, > > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > the node > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > > option. > > > > */ > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > { > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > addr); > > > > return 0; > > > > } > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > to simply make it static inline somewhere in a header? I haven't checked > > > whether there is an easy way to do that sanely bu this just hit my eyes. > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > --- a/include/linux/memory_hotplug.h > > +++ b/include/linux/memory_hotplug.h > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > > struct mhp_params *params); > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > -#ifdef CONFIG_NUMA > > -extern int memory_add_physaddr_to_nid(u64 start); > > -#else > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > static inline int memory_add_physaddr_to_nid(u64 start) > > { > > return 0; > > } > > +#else > > +extern int memory_add_physaddr_to_nid(u64 start); > > #endif > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > if it is noop(return 0), I can simply remove it. > > if it is not noop, after the helper, > > #define memory_add_physaddr_to_nid > > > > What do you think of this proposal? > > Especially for architectures that use memblock info for numa info > (which seems to be everyone except x86) why not implement a generic > memory_add_physaddr_to_nid() that does: That would be only arm64. > int memory_add_physaddr_to_nid(u64 addr) > { > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > int nid; > > for_each_online_node(nid) { > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > if (pfn >= start_pfn && pfn <= end_pfn) > return nid; > } > return NUMA_NO_NODE; > }
On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > > > Hi Michal and David > > > > > > > -----Original Message----- > > > > From: Michal Hocko <mhocko@kernel.org> > > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > > To: Justin He <Justin.He@arm.com> > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > > > as EXPORT_SYMBOL_GPL > > > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > > > NUMA_NO_NID is detected. > > > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > > --- > > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > > --- a/arch/arm64/mm/numa.c > > > > > +++ b/arch/arm64/mm/numa.c > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > > > /* > > > > > * We hope that we will be hotplugging memory on nodes we already know > > > > about, > > > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > > the node > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > > > option. > > > > > */ > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > > { > > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > > addr); > > > > > return 0; > > > > > } > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > > to simply make it static inline somewhere in a header? I haven't checked > > > > whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > > --- a/include/linux/memory_hotplug.h > > > +++ b/include/linux/memory_hotplug.h > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > > > struct mhp_params *params); > > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > > > -#ifdef CONFIG_NUMA > > > -extern int memory_add_physaddr_to_nid(u64 start); > > > -#else > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > > static inline int memory_add_physaddr_to_nid(u64 start) > > > { > > > return 0; > > > } > > > +#else > > > +extern int memory_add_physaddr_to_nid(u64 start); > > > #endif > > > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > > if it is noop(return 0), I can simply remove it. > > > if it is not noop, after the helper, > > > #define memory_add_physaddr_to_nid > > > > > > What do you think of this proposal? > > > > Especially for architectures that use memblock info for numa info > > (which seems to be everyone except x86) why not implement a generic > > memory_add_physaddr_to_nid() that does: > > That would be only arm64. > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it could solve my numa api woes. At least for x86 the problem is already solved with reserved numa_meminfo, but now I'm trying to write generic drivers that use those apis and finding these gaps on other archs.
On Tue, Jul 07, 2020 at 10:48:19PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > > > > > Hi Michal and David > > > > > > > > > -----Original Message----- > > > > > From: Michal Hocko <mhocko@kernel.org> > > > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > > > To: Justin He <Justin.He@arm.com> > > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > > > > as EXPORT_SYMBOL_GPL > > > > > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > > > > NUMA_NO_NID is detected. > > > > > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > > > --- > > > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > > > --- a/arch/arm64/mm/numa.c > > > > > > +++ b/arch/arm64/mm/numa.c > > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > > > > > /* > > > > > > * We hope that we will be hotplugging memory on nodes we already know > > > > > about, > > > > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > > > the node > > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > > > > option. > > > > > > */ > > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > > > { > > > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > > > addr); > > > > > > return 0; > > > > > > } > > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > > > to simply make it static inline somewhere in a header? I haven't checked > > > > > whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > > > --- a/include/linux/memory_hotplug.h > > > > +++ b/include/linux/memory_hotplug.h > > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > > > > struct mhp_params *params); > > > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > > > > > -#ifdef CONFIG_NUMA > > > > -extern int memory_add_physaddr_to_nid(u64 start); > > > > -#else > > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > > > static inline int memory_add_physaddr_to_nid(u64 start) > > > > { > > > > return 0; > > > > } > > > > +#else > > > > +extern int memory_add_physaddr_to_nid(u64 start); > > > > #endif > > > > > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > > > if it is noop(return 0), I can simply remove it. > > > > if it is not noop, after the helper, > > > > #define memory_add_physaddr_to_nid > > > > > > > > What do you think of this proposal? > > > > > > Especially for architectures that use memblock info for numa info > > > (which seems to be everyone except x86) why not implement a generic > > > memory_add_physaddr_to_nid() that does: > > > > That would be only arm64. > > > > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it > could solve my numa api woes. At least for x86 the problem is already > solved with reserved numa_meminfo, but now I'm trying to write generic > drivers that use those apis and finding these gaps on other archs. I'm not sure if x86's numa_meminfo is a part of the solution or a part of the problem ;-) Anyway, this all indeed messy and there is a lot to improve there.
On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: > [..] > > > Especially for architectures that use memblock info for numa info > > > (which seems to be everyone except x86) why not implement a generic > > > memory_add_physaddr_to_nid() that does: > > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > { > > > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > > > int nid; > > > > > > for_each_online_node(nid) { > > > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > > > if (pfn >= start_pfn && pfn <= end_pfn) > > > return nid; > > > } > > > return NUMA_NO_NODE; > > > } > > > > Thanks for your suggestion, > > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > > phys_to_target_node()? > > I think it needs to be the reverse. phys_to_target_node() should call > memory_add_physaddr_to_nid() by default, but fall back to searching > reserved memory address ranges in memblock. See phys_to_target_node() > in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > but the principle is the same i.e. that a target node may not be > represented in memblock.memory, but memblock.reserved. I'm working on > a patch to provide a function similar to get_pfn_range_for_nid() that > operates on reserved memory. Do we really need yet another memblock iterator? I think only x86 has memory that is not in memblock.memory but only in memblock.reserved.
On Tue, Jul 7, 2020 at 11:20 PM Mike Rapoport <rppt@linux.ibm.com> wrote: [..] > > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it > > could solve my numa api woes. At least for x86 the problem is already > > solved with reserved numa_meminfo, but now I'm trying to write generic > > drivers that use those apis and finding these gaps on other archs. > > I'm not sure if x86's numa_meminfo is a part of the solution or a part > of the problem ;-) More the latter, but hopefully it can remain an exception and not the rule.
On Tue, Jul 7, 2020 at 11:22 PM Mike Rapoport <rppt@linux.ibm.com> wrote: [..] > > > Thanks for your suggestion, > > > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > > > phys_to_target_node()? > > > > I think it needs to be the reverse. phys_to_target_node() should call > > memory_add_physaddr_to_nid() by default, but fall back to searching > > reserved memory address ranges in memblock. See phys_to_target_node() > > in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > > but the principle is the same i.e. that a target node may not be > > represented in memblock.memory, but memblock.reserved. I'm working on > > a patch to provide a function similar to get_pfn_range_for_nid() that > > operates on reserved memory. > > Do we really need yet another memblock iterator? > I think only x86 has memory that is not in memblock.memory but only in > memblock.reserved. Well, that's what led me here. EFI has introduced a memory attribute called "EFI Special Purpose Memory". I mapped it to a new Linux concept called Soft Reserved memory (commit b617c5266eed "efi: Common enable/disable infrastructure for EFI soft reservation"). The driver I want to claim that memory, device-dax, wants to be able to look up numa information for an address range that is marked reserved in memblock. The device-dax facility has the ability to either let userspace map a device, or assign the memory backing that device to the page allocator. In both scenarios the driver needs numa info to either populate the 'numa_node' property of the device in sysfs, or to pass an node-id to add_memory_resource() when it is hot-plugged. I was thwarted by the lack of phys_to_target_node() on arm64, and rather than add another stub like memory_add_physaddr_to_nid() I wanted to see if it could be solved properly / generically with memblock data.
Hi Dan > -----Original Message----- > From: Dan Williams <dan.j.williams@intel.com> > Sent: Wednesday, July 8, 2020 1:48 PM > To: Mike Rapoport <rppt@linux.ibm.com> > Cc: Justin He <Justin.He@arm.com>; Michal Hocko <mhocko@kernel.org>; David > Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>; > Will Deacon <will@kernel.org>; Vishal Verma <vishal.l.verma@intel.com>; > Dave Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux- > foundation.org>; Baoquan He <bhe@redhat.com>; Chuhong Yuan > <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org; > Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > as EXPORT_SYMBOL_GPL > > On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > > > > > Hi Michal and David > > > > > > > > > -----Original Message----- > > > > > From: Michal Hocko <mhocko@kernel.org> > > > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > > > To: Justin He <Justin.He@arm.com> > > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal > Verma > > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; > Andrew > > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport > <rppt@linux.ibm.com>; > > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; > linux- > > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; > linux- > > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin > <Kaly.Xin@arm.com> > > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export > memory_add_physaddr_to_nid > > > > > as EXPORT_SYMBOL_GPL > > > > > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > > > This exports memory_add_physaddr_to_nid() for module driver to > use. > > > > > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid > in case > > > > > > NUMA_NO_NID is detected. > > > > > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > > > --- > > > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > > > --- a/arch/arm64/mm/numa.c > > > > > > +++ b/arch/arm64/mm/numa.c > > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > > > > > /* > > > > > > * We hope that we will be hotplugging memory on nodes we > already know > > > > > about, > > > > > > - * such that acpi_get_node() succeeds and we never fall back to > this... > > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not > present, > > > > > the node > > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a > fallback > > > > > option. > > > > > > */ > > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > > > { > > > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node > 0\n", > > > > > addr); > > > > > > return 0; > > > > > > } > > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > > > > > Does it make sense to export a noop function? Wouldn't make more > sense > > > > > to simply make it static inline somewhere in a header? I haven't > checked > > > > > whether there is an easy way to do that sanely bu this just hit my > eyes. > > > > > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > > > --- a/include/linux/memory_hotplug.h > > > > +++ b/include/linux/memory_hotplug.h > > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, > unsigned long nr_pages, > > > > struct mhp_params *params); > > > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > > > > > -#ifdef CONFIG_NUMA > > > > -extern int memory_add_physaddr_to_nid(u64 start); > > > > -#else > > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > > > static inline int memory_add_physaddr_to_nid(u64 start) > > > > { > > > > return 0; > > > > } > > > > +#else > > > > +extern int memory_add_physaddr_to_nid(u64 start); > > > > #endif > > > > > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > > > if it is noop(return 0), I can simply remove it. > > > > if it is not noop, after the helper, > > > > #define memory_add_physaddr_to_nid > > > > > > > > What do you think of this proposal? > > > > > > Especially for architectures that use memblock info for numa info > > > (which seems to be everyone except x86) why not implement a generic > > > memory_add_physaddr_to_nid() that does: > > > > That would be only arm64. > > > > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it > could solve my numa api woes. At least for x86 the problem is already > solved with reserved numa_meminfo, but now I'm trying to write generic > drivers that use those apis and finding these gaps on other archs. Even on arm64, there is a dependency issue in dax_pmem kmem case. If dax pmem uses memory_add_physaddr_to_nid() to decide which node that memblock should add into, get_pfn_range_for_nid() might not have the correct memblock info at that time. That is, get_pfn_range_for_nid() can't get the correct memblock info before add_memory() So IMO, memory_add_physaddr_to_nid() still have to implement as noop on arm64 (return 0) together with sh,s390x? Powerpc, x86,ia64 can use their own implementation. And phys_to_target_node() can use your suggested( for_each_online_node() ...) What do you think of it? Thanks -- Cheers, Justin (Jia He)
On 08.07.20 08:22, Mike Rapoport wrote: > On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: >> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: >> [..] >>>> Especially for architectures that use memblock info for numa info >>>> (which seems to be everyone except x86) why not implement a generic >>>> memory_add_physaddr_to_nid() that does: >>>> >>>> int memory_add_physaddr_to_nid(u64 addr) >>>> { >>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); >>>> int nid; >>>> >>>> for_each_online_node(nid) { >>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); >>>> if (pfn >= start_pfn && pfn <= end_pfn) >>>> return nid; >>>> } >>>> return NUMA_NO_NODE; >>>> } >>> >>> Thanks for your suggestion, >>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke >>> phys_to_target_node()? >> >> I think it needs to be the reverse. phys_to_target_node() should call >> memory_add_physaddr_to_nid() by default, but fall back to searching >> reserved memory address ranges in memblock. See phys_to_target_node() >> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, >> but the principle is the same i.e. that a target node may not be >> represented in memblock.memory, but memblock.reserved. I'm working on >> a patch to provide a function similar to get_pfn_range_for_nid() that >> operates on reserved memory. > > Do we really need yet another memblock iterator? > I think only x86 has memory that is not in memblock.memory but only in > memblock.reserved. Reading about abusing the memblock allcoator once again in memory hotplug paths makes me shiver.
On 08.07.20 08:56, Justin He wrote: > Hi Dan > >> -----Original Message----- >> From: Dan Williams <dan.j.williams@intel.com> >> Sent: Wednesday, July 8, 2020 1:48 PM >> To: Mike Rapoport <rppt@linux.ibm.com> >> Cc: Justin He <Justin.He@arm.com>; Michal Hocko <mhocko@kernel.org>; David >> Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>; >> Will Deacon <will@kernel.org>; Vishal Verma <vishal.l.verma@intel.com>; >> Dave Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux- >> foundation.org>; Baoquan He <bhe@redhat.com>; Chuhong Yuan >> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux- >> kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org; >> Kaly Xin <Kaly.Xin@arm.com> >> Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid >> as EXPORT_SYMBOL_GPL >> >> On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: >>> >>> On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: >>>>> >>>>> Hi Michal and David >>>>> >>>>>> -----Original Message----- >>>>>> From: Michal Hocko <mhocko@kernel.org> >>>>>> Sent: Tuesday, July 7, 2020 7:55 PM >>>>>> To: Justin He <Justin.He@arm.com> >>>>>> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon >>>>>> <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal >> Verma >>>>>> <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; >> Andrew >>>>>> Morton <akpm@linux-foundation.org>; Mike Rapoport >> <rppt@linux.ibm.com>; >>>>>> Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; >> linux- >>>>>> arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; >> linux- >>>>>> mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin >> <Kaly.Xin@arm.com> >>>>>> Subject: Re: [PATCH v2 1/3] arm64/numa: export >> memory_add_physaddr_to_nid >>>>>> as EXPORT_SYMBOL_GPL >>>>>> >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to >> use. >>>>>>> >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid >> in case >>>>>>> NUMA_NO_NID is detected. >>>>>>> >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>> --- >>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>> >>>>>>> /* >>>>>>> * We hope that we will be hotplugging memory on nodes we >> already know >>>>>> about, >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to >> this... >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not >> present, >>>>>> the node >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a >> fallback >>>>>> option. >>>>>>> */ >>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>> { >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node >> 0\n", >>>>>> addr); >>>>>>> return 0; >>>>>>> } >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>> >>>>>> Does it make sense to export a noop function? Wouldn't make more >> sense >>>>>> to simply make it static inline somewhere in a header? I haven't >> checked >>>>>> whether there is an easy way to do that sanely bu this just hit my >> eyes. >>>>> >>>>> Okay, I can make a change in memory_hotplug.h, sth like: >>>>> --- a/include/linux/memory_hotplug.h >>>>> +++ b/include/linux/memory_hotplug.h >>>>> @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, >> unsigned long nr_pages, >>>>> struct mhp_params *params); >>>>> #endif /* ARCH_HAS_ADD_PAGES */ >>>>> >>>>> -#ifdef CONFIG_NUMA >>>>> -extern int memory_add_physaddr_to_nid(u64 start); >>>>> -#else >>>>> +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) >>>>> static inline int memory_add_physaddr_to_nid(u64 start) >>>>> { >>>>> return 0; >>>>> } >>>>> +#else >>>>> +extern int memory_add_physaddr_to_nid(u64 start); >>>>> #endif >>>>> >>>>> And then check the memory_add_physaddr_to_nid() helper on all arches, >>>>> if it is noop(return 0), I can simply remove it. >>>>> if it is not noop, after the helper, >>>>> #define memory_add_physaddr_to_nid >>>>> >>>>> What do you think of this proposal? >>>> >>>> Especially for architectures that use memblock info for numa info >>>> (which seems to be everyone except x86) why not implement a generic >>>> memory_add_physaddr_to_nid() that does: >>> >>> That would be only arm64. >>> >> >> Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it >> could solve my numa api woes. At least for x86 the problem is already >> solved with reserved numa_meminfo, but now I'm trying to write generic >> drivers that use those apis and finding these gaps on other archs. > > Even on arm64, there is a dependency issue in dax_pmem kmem case. > If dax pmem uses memory_add_physaddr_to_nid() to decide which node that > memblock should add into, get_pfn_range_for_nid() might not have > the correct memblock info at that time. That is, get_pfn_range_for_nid() > can't get the correct memblock info before add_memory() > > So IMO, memory_add_physaddr_to_nid() still have to implement as noop on > arm64 (return 0) together with sh,s390x? Powerpc, x86,ia64 can use their > own implementation. And phys_to_target_node() can use your suggested( > for_each_online_node() ...) > > What do you think of it? Thanks You are trying to fix the "we only have one dummy node" AFAIU, so what you propose here is certainly good enough for now.
On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote: > > On 08.07.20 08:22, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: > >> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: > >> [..] > >>>> Especially for architectures that use memblock info for numa info > >>>> (which seems to be everyone except x86) why not implement a generic > >>>> memory_add_physaddr_to_nid() that does: > >>>> > >>>> int memory_add_physaddr_to_nid(u64 addr) > >>>> { > >>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > >>>> int nid; > >>>> > >>>> for_each_online_node(nid) { > >>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > >>>> if (pfn >= start_pfn && pfn <= end_pfn) > >>>> return nid; > >>>> } > >>>> return NUMA_NO_NODE; > >>>> } > >>> > >>> Thanks for your suggestion, > >>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > >>> phys_to_target_node()? > >> > >> I think it needs to be the reverse. phys_to_target_node() should call > >> memory_add_physaddr_to_nid() by default, but fall back to searching > >> reserved memory address ranges in memblock. See phys_to_target_node() > >> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > >> but the principle is the same i.e. that a target node may not be > >> represented in memblock.memory, but memblock.reserved. I'm working on > >> a patch to provide a function similar to get_pfn_range_for_nid() that > >> operates on reserved memory. > > > > Do we really need yet another memblock iterator? > > I think only x86 has memory that is not in memblock.memory but only in > > memblock.reserved. > > Reading about abusing the memblock allcoator once again in memory > hotplug paths makes me shiver. Technical reasoning please? arm64 numa information is established from memblock data. It seems counterproductive to ignore that fact if we're already touching memory_add_physaddr_to_nid() and have a use case for a driver to call it.
On 08.07.20 09:04, Dan Williams wrote: > On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 08.07.20 08:22, Mike Rapoport wrote: >>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: >>>> [..] >>>>>> Especially for architectures that use memblock info for numa info >>>>>> (which seems to be everyone except x86) why not implement a generic >>>>>> memory_add_physaddr_to_nid() that does: >>>>>> >>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>> { >>>>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); >>>>>> int nid; >>>>>> >>>>>> for_each_online_node(nid) { >>>>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); >>>>>> if (pfn >= start_pfn && pfn <= end_pfn) >>>>>> return nid; >>>>>> } >>>>>> return NUMA_NO_NODE; >>>>>> } >>>>> >>>>> Thanks for your suggestion, >>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke >>>>> phys_to_target_node()? >>>> >>>> I think it needs to be the reverse. phys_to_target_node() should call >>>> memory_add_physaddr_to_nid() by default, but fall back to searching >>>> reserved memory address ranges in memblock. See phys_to_target_node() >>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, >>>> but the principle is the same i.e. that a target node may not be >>>> represented in memblock.memory, but memblock.reserved. I'm working on >>>> a patch to provide a function similar to get_pfn_range_for_nid() that >>>> operates on reserved memory. >>> >>> Do we really need yet another memblock iterator? >>> I think only x86 has memory that is not in memblock.memory but only in >>> memblock.reserved. >> >> Reading about abusing the memblock allcoator once again in memory >> hotplug paths makes me shiver. > > Technical reasoning please? ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement pfn_valid(), because they zap out individual pages corresponding to memory holes of full sections. I am not a friend of adding more post-init code to rely on memblock data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK. > > arm64 numa information is established from memblock data. It seems > counterproductive to ignore that fact if we're already touching > memory_add_physaddr_to_nid() and have a use case for a driver to call > it. ... and we are trying to handle the "only a single dummy node" case (patch #2), or what am I missing? What is there to optimize currently?
On 08.07.20 07:27, Mike Rapoport wrote: > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: >>> >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: >>>> On 07.07.20 14:13, Mike Rapoport wrote: >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>> >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>> NUMA_NO_NID is detected. >>>>>>> >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>> --- >>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>> >>>>>>> /* >>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>> */ >>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>> { >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>> return 0; >>>>>>> } >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>> >>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>>>> >>>>> We'll need to either add a CONFIG_ option or arch specific callback to >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) >>>>> implementations coexist ... >>>> >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. >>> >>> Then we'll call it a tie - 3:3 ;-) >> >> So I'd be happy to jump on the train of people wanting to export the >> ARM stub for this (and add a new ARM stub for phys_to_target_node()), >> but Will did have a plausibly better idea that I have been meaning to >> circle back to: >> >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck >> >> ...i.e. iterate over node data to do the lookup. This would seem to >> work generically for multiple archs unless I am missing something? IIRC, only memory assigned to/onlined to a ZONE is represented in the pgdat node span. E.g., not offline memory blocks. Esp., when hotplugging + onlining consecutive memory, there won't really be any intersections in most cases if I am not wrong. It would not be "intersection" but rather "closest fit". With overlapping nodes it's even more unclear. Which one to pick? > > I think it would work on arm64, power and, most propbably on s390 With only a single dummy node I guess it should work (searching when there is only a single node does not make too much sense). > (David?), but not on x86. x86 does not have reserved memory in pgdat, > it's never memblock_add()'ed (see e820__memblock_setup()). Can you enlighten me why that is relevant for the memory hotplug path? (or is it just a general comment to make the function as accurate as possible for all addresses?)
On Wed, Jul 08, 2020 at 09:21:25AM +0200, David Hildenbrand wrote: > On 08.07.20 07:27, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: > >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > >>> > >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > >>>> On 07.07.20 14:13, Mike Rapoport wrote: > >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>> > >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>> NUMA_NO_NID is detected. > >>>>>>> > >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>> --- > >>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>> > >>>>>>> /* > >>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>> */ > >>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>> { > >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>> return 0; > >>>>>>> } > >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>> > >>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > >>>>> > >>>>> We'll need to either add a CONFIG_ option or arch specific callback to > >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > >>>>> implementations coexist ... > >>>> > >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. > >>> > >>> Then we'll call it a tie - 3:3 ;-) > >> > >> So I'd be happy to jump on the train of people wanting to export the > >> ARM stub for this (and add a new ARM stub for phys_to_target_node()), > >> but Will did have a plausibly better idea that I have been meaning to > >> circle back to: > >> > >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck > >> > >> ...i.e. iterate over node data to do the lookup. This would seem to > >> work generically for multiple archs unless I am missing something? > > IIRC, only memory assigned to/onlined to a ZONE is represented in the > pgdat node span. E.g., not offline memory blocks. > > Esp., when hotplugging + onlining consecutive memory, there won't really > be any intersections in most cases if I am not wrong. It would not be > "intersection" but rather "closest fit". > > With overlapping nodes it's even more unclear. Which one to pick? > > > > > I think it would work on arm64, power and, most propbably on s390 > > With only a single dummy node I guess it should work (searching when > there is only a single node does not make too much sense). > > > (David?), but not on x86. x86 does not have reserved memory in pgdat, > > it's never memblock_add()'ed (see e820__memblock_setup()). > > Can you enlighten me why that is relevant for the memory hotplug path? > (or is it just a general comment to make the function as accurate as > possible for all addresses?) phys_to_target_node() on x86 falls back to numa_reserved_meminfo which holds memory that is never listed in a node. > -- > Thanks, > > David / dhildenb >
On 08.07.20 09:38, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 09:21:25AM +0200, David Hildenbrand wrote: >> On 08.07.20 07:27, Mike Rapoport wrote: >>> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: >>>>> >>>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: >>>>>> On 07.07.20 14:13, Mike Rapoport wrote: >>>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>> >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>> >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>> --- >>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>> >>>>>>>>> /* >>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>> */ >>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>> { >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>> return 0; >>>>>>>>> } >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>> >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>>>>>> >>>>>>> We'll need to either add a CONFIG_ option or arch specific callback to >>>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) >>>>>>> implementations coexist ... >>>>>> >>>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. >>>>> >>>>> Then we'll call it a tie - 3:3 ;-) >>>> >>>> So I'd be happy to jump on the train of people wanting to export the >>>> ARM stub for this (and add a new ARM stub for phys_to_target_node()), >>>> but Will did have a plausibly better idea that I have been meaning to >>>> circle back to: >>>> >>>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck >>>> >>>> ...i.e. iterate over node data to do the lookup. This would seem to >>>> work generically for multiple archs unless I am missing something? >> >> IIRC, only memory assigned to/onlined to a ZONE is represented in the >> pgdat node span. E.g., not offline memory blocks. >> >> Esp., when hotplugging + onlining consecutive memory, there won't really >> be any intersections in most cases if I am not wrong. It would not be >> "intersection" but rather "closest fit". >> >> With overlapping nodes it's even more unclear. Which one to pick? >> >>> >>> I think it would work on arm64, power and, most propbably on s390 >> >> With only a single dummy node I guess it should work (searching when >> there is only a single node does not make too much sense). >> >>> (David?), but not on x86. x86 does not have reserved memory in pgdat, >>> it's never memblock_add()'ed (see e820__memblock_setup()). >> >> Can you enlighten me why that is relevant for the memory hotplug path? >> (or is it just a general comment to make the function as accurate as >> possible for all addresses?) > > phys_to_target_node() on x86 falls back to numa_reserved_meminfo which > holds memory that is never listed in a node. > Ah, I see - thanks.
On Wed, Jul 08, 2020 at 09:16:01AM +0200, David Hildenbrand wrote: > On 08.07.20 09:04, Dan Williams wrote: > > On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote: > >> > >> On 08.07.20 08:22, Mike Rapoport wrote: > >>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: > >>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: > >>>> [..] > >>>>>> Especially for architectures that use memblock info for numa info > >>>>>> (which seems to be everyone except x86) why not implement a generic > >>>>>> memory_add_physaddr_to_nid() that does: > >>>>>> > >>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>> { > >>>>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > >>>>>> int nid; > >>>>>> > >>>>>> for_each_online_node(nid) { > >>>>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > >>>>>> if (pfn >= start_pfn && pfn <= end_pfn) > >>>>>> return nid; > >>>>>> } > >>>>>> return NUMA_NO_NODE; > >>>>>> } > >>>>> > >>>>> Thanks for your suggestion, > >>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > >>>>> phys_to_target_node()? > >>>> > >>>> I think it needs to be the reverse. phys_to_target_node() should call > >>>> memory_add_physaddr_to_nid() by default, but fall back to searching > >>>> reserved memory address ranges in memblock. See phys_to_target_node() > >>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > >>>> but the principle is the same i.e. that a target node may not be > >>>> represented in memblock.memory, but memblock.reserved. I'm working on > >>>> a patch to provide a function similar to get_pfn_range_for_nid() that > >>>> operates on reserved memory. > >>> > >>> Do we really need yet another memblock iterator? > >>> I think only x86 has memory that is not in memblock.memory but only in > >>> memblock.reserved. > >> > >> Reading about abusing the memblock allcoator once again in memory > >> hotplug paths makes me shiver. > > > > Technical reasoning please? > > ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement > pfn_valid(), because they zap out individual pages corresponding to > memory holes of full sections. > > I am not a friend of adding more post-init code to rely on memblock > data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK. The most heavy user of memblock in post-init code is powerpc. It won't be easy to get rid of it there. > > arm64 numa information is established from memblock data. It seems > > counterproductive to ignore that fact if we're already touching > > memory_add_physaddr_to_nid() and have a use case for a driver to call > > it. > > ... and we are trying to handle the "only a single dummy node" case > (patch #2), or what am I missing? What is there to optimize currently? > > -- > Thanks, > > David / dhildenb >
On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: > > On 08.07.20 07:27, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: > >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > >>> > >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > >>>> On 07.07.20 14:13, Mike Rapoport wrote: > >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>> > >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>> NUMA_NO_NID is detected. > >>>>>>> > >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>> --- > >>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>> > >>>>>>> /* > >>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>> */ > >>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>> { > >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>> return 0; > >>>>>>> } > >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>> > >>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > >>>>> > >>>>> We'll need to either add a CONFIG_ option or arch specific callback to > >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > >>>>> implementations coexist ... > >>>> > >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. > >>> > >>> Then we'll call it a tie - 3:3 ;-) > >> > >> So I'd be happy to jump on the train of people wanting to export the > >> ARM stub for this (and add a new ARM stub for phys_to_target_node()), > >> but Will did have a plausibly better idea that I have been meaning to > >> circle back to: > >> > >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck > >> > >> ...i.e. iterate over node data to do the lookup. This would seem to > >> work generically for multiple archs unless I am missing something? > > IIRC, only memory assigned to/onlined to a ZONE is represented in the > pgdat node span. E.g., not offline memory blocks. So this dovetails somewhat with Will's idea. What if we populated node_data for "offline" ranges? I started there, but then saw ARCH_KEEP_MEMBLOCK and thought it would be safer to just teach phys_to_target_node() to use that rather than update other code paths to expect node_data might not always reflect online data. > Esp., when hotplugging + onlining consecutive memory, there won't really > be any intersections in most cases if I am not wrong. It would not be > "intersection" but rather "closest fit". > > With overlapping nodes it's even more unclear. Which one to pick? In the overlap case you get what you get. Some signal is better than the noise of a dummy function. The consequences of picking the wrong node might be that the kernel can't properly associate a memory range to its performance data tables in firmware, but then again firmware messed up with an overlapping node definition in the first instance.
On 08.07.20 09:50, Dan Williams wrote: > On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 08.07.20 07:27, Mike Rapoport wrote: >>> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: >>>>> >>>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: >>>>>> On 07.07.20 14:13, Mike Rapoport wrote: >>>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>> >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>> >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>> --- >>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>> >>>>>>>>> /* >>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>> */ >>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>> { >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>> return 0; >>>>>>>>> } >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>> >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>>>>>> >>>>>>> We'll need to either add a CONFIG_ option or arch specific callback to >>>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) >>>>>>> implementations coexist ... >>>>>> >>>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. >>>>> >>>>> Then we'll call it a tie - 3:3 ;-) >>>> >>>> So I'd be happy to jump on the train of people wanting to export the >>>> ARM stub for this (and add a new ARM stub for phys_to_target_node()), >>>> but Will did have a plausibly better idea that I have been meaning to >>>> circle back to: >>>> >>>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck >>>> >>>> ...i.e. iterate over node data to do the lookup. This would seem to >>>> work generically for multiple archs unless I am missing something? >> >> IIRC, only memory assigned to/onlined to a ZONE is represented in the >> pgdat node span. E.g., not offline memory blocks. > > So this dovetails somewhat with Will's idea. What if we populated > node_data for "offline" ranges? I started there, but then saw > ARCH_KEEP_MEMBLOCK and thought it would be safer to just teach > phys_to_target_node() to use that rather than update other code paths > to expect node_data might not always reflect online data. We currently need a somewhat-accurate pgdat node span to detect when to offline a node. See try_offline_node(). This works fairly reliable. Shrinking the node span is currently fairly easy for !ZONE_DEVICE memory, because we can rely on pfn_to_online_page() + pfn_to_nid(pfn). See e.g., find_biggest_section_pfn(). If we glue growing/shrinking the node span to adding/removing of memory (instead of e.g., onlining/offlining), we can no longer base shrinking on memmap data. We would have to get the information ("how far can I shrink the node span, is it empty?") from somewhere else. E.g., for_each_memory_block() - but that one does not cover ZONE_DEVICE. And there are memory blocks which cover multiple nodes, in which case we only store one of them ... unreliable. This certainly needs more thought :/ > >> Esp., when hotplugging + onlining consecutive memory, there won't really >> be any intersections in most cases if I am not wrong. It would not be >> "intersection" but rather "closest fit". >> >> With overlapping nodes it's even more unclear. Which one to pick? > > In the overlap case you get what you get. Some signal is better than > the noise of a dummy function. The consequences of picking the wrong > node might be that the kernel can't properly associate a memory range > to its performance data tables in firmware, but then again firmware > messed up with an overlapping node definition in the first instance. I'd be curious if what we are trying to optimize here is actually worth optimizing. IOW, is there a well-known scenario where the dummy value on arm64 would be problematic and is worth the effort? I mean, in all performance relevant setups (ignoring hv_balloon/xen-balloon/prove_store(), which also use memory_add_physaddr_to_nid()), we should have a proper PXM/node specified by the hardware on memory hotadd. The fallback of memory_add_physaddr_to_nid() is not relevant in these scenarios.
On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: > On 08.07.20 09:50, Dan Williams wrote: > > On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: > >> > >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>>>> > >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>>>> NUMA_NO_NID is detected. > >>>>>>>>> > >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>>>> --- > >>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>>>> > >>>>>>>>> /* > >>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>>>> */ > >>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>>>> { > >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>>>> return 0; > >>>>>>>>> } > >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>>>> > >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > I'd be curious if what we are trying to optimize here is actually worth > optimizing. IOW, is there a well-known scenario where the dummy value on > arm64 would be problematic and is worth the effort? Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() for a stub might be an overkill. I think Jia's suggestion [1] with addition of a comment that explains why and when the stub will be used, can work for both memory_add_physaddr_to_nid() and phys_to_target_node(). But on more theoretical/fundmanetal level, I think we lack a generic abstraction similar to e.g. x86 'struct numa_meminfo' that serves as translaton of firmware supplied information into data that can be used by the generic mm without need to reimplement it for each and every arch. [1] https://lore.kernel.org/lkml/AM6PR08MB406907F9F2B13DA6DC893AD9F7670@AM6PR08MB4069.eurprd08.prod.outlook.com > I mean, in all performance relevant setups (ignoring > hv_balloon/xen-balloon/prove_store(), which also use > memory_add_physaddr_to_nid()), we should have a proper PXM/node > specified by the hardware on memory hotadd. The fallback of > memory_add_physaddr_to_nid() is not relevant in these scenarios. > > -- > Thanks, > > David / dhildenb >
On 08.07.20 10:39, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: >> On 08.07.20 09:50, Dan Williams wrote: >>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: >>>> >>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>>>> >>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>>>> >>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>>>> --- >>>>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>>>> >>>>>>>>>>> /* >>>>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>>>> */ >>>>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>>>> { >>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>>>> return 0; >>>>>>>>>>> } >>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>>>> >>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > >> I'd be curious if what we are trying to optimize here is actually worth >> optimizing. IOW, is there a well-known scenario where the dummy value on >> arm64 would be problematic and is worth the effort? > > Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() > for a stub might be an overkill. > > I think Jia's suggestion [1] with addition of a comment that explains > why and when the stub will be used, can work for both > memory_add_physaddr_to_nid() and phys_to_target_node(). Agreed. > > But on more theoretical/fundmanetal level, I think we lack a generic > abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > translaton of firmware supplied information into data that can be used > by the generic mm without need to reimplement it for each and every > arch. Right. As I expressed, I am not a friend of using memblock for that, and the pgdat node span is tricky. Maybe abstracting that x86 concept is possible in some way (and we could restrict the information to boot-time properties, so we don't have to mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
On Wed, Jul 08, 2020 at 10:45:17AM +0200, David Hildenbrand wrote: > On 08.07.20 10:39, Mike Rapoport wrote: > > On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: > >> On 08.07.20 09:50, Dan Williams wrote: > >>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: > >>>> > >>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>>>>>> > >>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>>>>>> NUMA_NO_NID is detected. > >>>>>>>>>>> > >>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>>>>>> --- > >>>>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>>>>>> > >>>>>>>>>>> /* > >>>>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>>>>>> */ > >>>>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>>>>>> { > >>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>>>>>> return 0; > >>>>>>>>>>> } > >>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>>>>>> > >>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > > > >> I'd be curious if what we are trying to optimize here is actually worth > >> optimizing. IOW, is there a well-known scenario where the dummy value on > >> arm64 would be problematic and is worth the effort? > > > > Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() > > for a stub might be an overkill. > > > > I think Jia's suggestion [1] with addition of a comment that explains > > why and when the stub will be used, can work for both > > memory_add_physaddr_to_nid() and phys_to_target_node(). > > Agreed. > > > > > But on more theoretical/fundmanetal level, I think we lack a generic > > abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > > translaton of firmware supplied information into data that can be used > > by the generic mm without need to reimplement it for each and every > > arch. > > Right. As I expressed, I am not a friend of using memblock for that, and > the pgdat node span is tricky. > > Maybe abstracting that x86 concept is possible in some way (and we could > restrict the information to boot-time properties, so we don't have to > mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). I agree with pgdat part and disagree about memblock. It already has non-init physmap, why won't we add memblock.memory to the mix? ;-) Now, seriously, memblock already has all the necessary information about the coldplug memory for several architectures. x86 being an exception because for some reason the reserved memory is not considered memory there. The infrastructure for quiering and iterating memory regions is already there. We just need to leave out the irrelevant parts, like memblock.reserved and allocation funcions. Otherwise we'll add yet another 'struct { start, end }', a horde of covnersion and re-initialization functions that will do more or less the same things as current memblock APIs. > -- > Thanks, > > David / dhildenb > >
On 08.07.20 11:15, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 10:45:17AM +0200, David Hildenbrand wrote: >> On 08.07.20 10:39, Mike Rapoport wrote: >>> On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: >>>> On 08.07.20 09:50, Dan Williams wrote: >>>>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: >>>>>> >>>>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>>>>>> >>>>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>>>>>> >>>>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>>>>>> --- >>>>>>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>>>>>> >>>>>>>>>>>>> /* >>>>>>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>>>>>> */ >>>>>>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>>>>>> { >>>>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>>>>>> return 0; >>>>>>>>>>>>> } >>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>>>>>> >>>>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>> >>>> I'd be curious if what we are trying to optimize here is actually worth >>>> optimizing. IOW, is there a well-known scenario where the dummy value on >>>> arm64 would be problematic and is worth the effort? >>> >>> Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() >>> for a stub might be an overkill. >>> >>> I think Jia's suggestion [1] with addition of a comment that explains >>> why and when the stub will be used, can work for both >>> memory_add_physaddr_to_nid() and phys_to_target_node(). >> >> Agreed. >> >>> >>> But on more theoretical/fundmanetal level, I think we lack a generic >>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as >>> translaton of firmware supplied information into data that can be used >>> by the generic mm without need to reimplement it for each and every >>> arch. >> >> Right. As I expressed, I am not a friend of using memblock for that, and >> the pgdat node span is tricky. >> >> Maybe abstracting that x86 concept is possible in some way (and we could >> restrict the information to boot-time properties, so we don't have to >> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). > > I agree with pgdat part and disagree about memblock. It already has > non-init physmap, why won't we add memblock.memory to the mix? ;-) Can we generalize and tweak physmap to contain node info? That's all we need, no? (the special mem= parameter handling should not matter for our use case, where "physmap" and "memory" would differ) > > Now, seriously, memblock already has all the necessary information about > the coldplug memory for several architectures. x86 being an exception > because for some reason the reserved memory is not considered memory > there. The infrastructure for quiering and iterating memory regions is > already there. We just need to leave out the irrelevant parts, like > memblock.reserved and allocation funcions. I *really* don't want to mess with memblocks on memory hot(un)plug on x86 and s390x (+other architectures in the future). I also thought about stopping to create memblocks for hotplugged memory on arm64, by tweaking pfn_valid() to query memblocks only for early sections. If "physmem" is not an option, can we at least introduce something like ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for now (and later maybe for others)?
On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: > On 08.07.20 11:15, Mike Rapoport wrote: > >>>>>> > >>> But on more theoretical/fundmanetal level, I think we lack a generic > >>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > >>> translaton of firmware supplied information into data that can be used > >>> by the generic mm without need to reimplement it for each and every > >>> arch. > >> > >> Right. As I expressed, I am not a friend of using memblock for that, and > >> the pgdat node span is tricky. > >> > >> Maybe abstracting that x86 concept is possible in some way (and we could > >> restrict the information to boot-time properties, so we don't have to > >> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). > > > > I agree with pgdat part and disagree about memblock. It already has > > non-init physmap, why won't we add memblock.memory to the mix? ;-) > > Can we generalize and tweak physmap to contain node info? That's all we > need, no? (the special mem= parameter handling should not matter for our > use case, where "physmap" and "memory" would differ) TBH, I have only random vague thoughts at the moment. This might be an option. But then we need to enable physmap on !s390, right? > > Now, seriously, memblock already has all the necessary information about > > the coldplug memory for several architectures. x86 being an exception > > because for some reason the reserved memory is not considered memory > > there. The infrastructure for quiering and iterating memory regions is > > already there. We just need to leave out the irrelevant parts, like > > memblock.reserved and allocation funcions. > > I *really* don't want to mess with memblocks on memory hot(un)plug on > x86 and s390x (+other architectures in the future). I also thought about > stopping to create memblocks for hotplugged memory on arm64, by tweaking > pfn_valid() to query memblocks only for early sections. > > If "physmem" is not an option, can we at least introduce something like > ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for > now (and later maybe for others)? I have to do more memory hotplug howework to answer that ;-) My general point is that we don't have to reinvent the wheel to have coldplug memory representation, it's already there. We just need a way to use it properly. > -- > Thanks, > > David / dhildenb >
On 08.07.20 11:45, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: >> On 08.07.20 11:15, Mike Rapoport wrote: >>>>>>>> >>>>> But on more theoretical/fundmanetal level, I think we lack a generic >>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as >>>>> translaton of firmware supplied information into data that can be used >>>>> by the generic mm without need to reimplement it for each and every >>>>> arch. >>>> >>>> Right. As I expressed, I am not a friend of using memblock for that, and >>>> the pgdat node span is tricky. >>>> >>>> Maybe abstracting that x86 concept is possible in some way (and we could >>>> restrict the information to boot-time properties, so we don't have to >>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). >>> >>> I agree with pgdat part and disagree about memblock. It already has >>> non-init physmap, why won't we add memblock.memory to the mix? ;-) >> >> Can we generalize and tweak physmap to contain node info? That's all we >> need, no? (the special mem= parameter handling should not matter for our >> use case, where "physmap" and "memory" would differ) > > TBH, I have only random vague thoughts at the moment. This might be an > option. But then we need to enable physmap on !s390, right? Yes, looks like it. > >>> Now, seriously, memblock already has all the necessary information about >>> the coldplug memory for several architectures. x86 being an exception >>> because for some reason the reserved memory is not considered memory >>> there. The infrastructure for quiering and iterating memory regions is >>> already there. We just need to leave out the irrelevant parts, like >>> memblock.reserved and allocation funcions. >> >> I *really* don't want to mess with memblocks on memory hot(un)plug on >> x86 and s390x (+other architectures in the future). I also thought about >> stopping to create memblocks for hotplugged memory on arm64, by tweaking >> pfn_valid() to query memblocks only for early sections. >> >> If "physmem" is not an option, can we at least introduce something like >> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for >> now (and later maybe for others)? > > I have to do more memory hotplug howework to answer that ;-) > > My general point is that we don't have to reinvent the wheel to have > coldplug memory representation, it's already there. We just need a way > to use it properly. Yes, I tend to agree. Details to be clarified :)
On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote: > > On 08.07.20 11:45, Mike Rapoport wrote: > > On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: > >> On 08.07.20 11:15, Mike Rapoport wrote: > >>>>>>>> > >>>>> But on more theoretical/fundmanetal level, I think we lack a generic > >>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > >>>>> translaton of firmware supplied information into data that can be used > >>>>> by the generic mm without need to reimplement it for each and every > >>>>> arch. > >>>> > >>>> Right. As I expressed, I am not a friend of using memblock for that, and > >>>> the pgdat node span is tricky. > >>>> > >>>> Maybe abstracting that x86 concept is possible in some way (and we could > >>>> restrict the information to boot-time properties, so we don't have to > >>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). > >>> > >>> I agree with pgdat part and disagree about memblock. It already has > >>> non-init physmap, why won't we add memblock.memory to the mix? ;-) > >> > >> Can we generalize and tweak physmap to contain node info? That's all we > >> need, no? (the special mem= parameter handling should not matter for our > >> use case, where "physmap" and "memory" would differ) > > > > TBH, I have only random vague thoughts at the moment. This might be an > > option. But then we need to enable physmap on !s390, right? > > Yes, looks like it. > > > > >>> Now, seriously, memblock already has all the necessary information about > >>> the coldplug memory for several architectures. x86 being an exception > >>> because for some reason the reserved memory is not considered memory > >>> there. The infrastructure for quiering and iterating memory regions is > >>> already there. We just need to leave out the irrelevant parts, like > >>> memblock.reserved and allocation funcions. > >> > >> I *really* don't want to mess with memblocks on memory hot(un)plug on > >> x86 and s390x (+other architectures in the future). I also thought about > >> stopping to create memblocks for hotplugged memory on arm64, by tweaking > >> pfn_valid() to query memblocks only for early sections. > >> > >> If "physmem" is not an option, can we at least introduce something like > >> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for > >> now (and later maybe for others)? > > > > I have to do more memory hotplug howework to answer that ;-) > > > > My general point is that we don't have to reinvent the wheel to have > > coldplug memory representation, it's already there. We just need a way > > to use it properly. > > Yes, I tend to agree. Details to be clarified :) I'm not quite understanding the concern, or requirement about "updating memblock" in the hotplug path. The routines memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to interrogate platform-firmware numa info through a common abstraction. They place no burden on the memory hotplug code they're just used to see if a hot-added range lies within an existing node span when platform-firmware otherwise fails to communicate a node. x86 can continue to back those helpers with numa_meminfo, arm64 can use a generic memblock implementation and other archs can follow the arm64 example if they want better numa answers for drivers.
On 08.07.20 17:50, Dan Williams wrote: > On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 08.07.20 11:45, Mike Rapoport wrote: >>> On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: >>>> On 08.07.20 11:15, Mike Rapoport wrote: >>>>>>>>>> >>>>>>> But on more theoretical/fundmanetal level, I think we lack a generic >>>>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as >>>>>>> translaton of firmware supplied information into data that can be used >>>>>>> by the generic mm without need to reimplement it for each and every >>>>>>> arch. >>>>>> >>>>>> Right. As I expressed, I am not a friend of using memblock for that, and >>>>>> the pgdat node span is tricky. >>>>>> >>>>>> Maybe abstracting that x86 concept is possible in some way (and we could >>>>>> restrict the information to boot-time properties, so we don't have to >>>>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). >>>>> >>>>> I agree with pgdat part and disagree about memblock. It already has >>>>> non-init physmap, why won't we add memblock.memory to the mix? ;-) >>>> >>>> Can we generalize and tweak physmap to contain node info? That's all we >>>> need, no? (the special mem= parameter handling should not matter for our >>>> use case, where "physmap" and "memory" would differ) >>> >>> TBH, I have only random vague thoughts at the moment. This might be an >>> option. But then we need to enable physmap on !s390, right? >> >> Yes, looks like it. >> >>> >>>>> Now, seriously, memblock already has all the necessary information about >>>>> the coldplug memory for several architectures. x86 being an exception >>>>> because for some reason the reserved memory is not considered memory >>>>> there. The infrastructure for quiering and iterating memory regions is >>>>> already there. We just need to leave out the irrelevant parts, like >>>>> memblock.reserved and allocation funcions. >>>> >>>> I *really* don't want to mess with memblocks on memory hot(un)plug on >>>> x86 and s390x (+other architectures in the future). I also thought about >>>> stopping to create memblocks for hotplugged memory on arm64, by tweaking >>>> pfn_valid() to query memblocks only for early sections. >>>> >>>> If "physmem" is not an option, can we at least introduce something like >>>> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for >>>> now (and later maybe for others)? >>> >>> I have to do more memory hotplug howework to answer that ;-) >>> >>> My general point is that we don't have to reinvent the wheel to have >>> coldplug memory representation, it's already there. We just need a way >>> to use it properly. >> >> Yes, I tend to agree. Details to be clarified :) > > I'm not quite understanding the concern, or requirement about > "updating memblock" in the hotplug path. The routines > memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to > interrogate platform-firmware numa info through a common abstraction. > They place no burden on the memory hotplug code they're just used to > see if a hot-added range lies within an existing node span when > platform-firmware otherwise fails to communicate a node. x86 can > continue to back those helpers with numa_meminfo, arm64 can use a > generic memblock implementation and other archs can follow the arm64 > example if they want better numa answers for drivers. > See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I don't want that code be reactivated for x86/s390x. That's all I am saying.
On Wed, Jul 08, 2020 at 06:10:19PM +0200, David Hildenbrand wrote: > On 08.07.20 17:50, Dan Williams wrote: > > On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote: > >> > > > > I'm not quite understanding the concern, or requirement about > > "updating memblock" in the hotplug path. The routines > > memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to > > interrogate platform-firmware numa info through a common abstraction. > > They place no burden on the memory hotplug code they're just used to > > see if a hot-added range lies within an existing node span when > > platform-firmware otherwise fails to communicate a node. x86 can > > continue to back those helpers with numa_meminfo, arm64 can use a > > generic memblock implementation and other archs can follow the arm64 > > example if they want better numa answers for drivers. > > > > See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I > don't want that code be reactivated for x86/s390x. That's all I am saying. And these have actual meaning only on arm64 because powerpc does not rely on memblock for memory hot(un)plug, AFAIU. Anyway, at the moment we can use memblock on hotplug path only on arm64 and I don't think its the path worth exploring. > -- > Thanks, > > David / dhildenb >
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c index aafcee3e3f7e..7eeb31740248 100644 --- a/arch/arm64/mm/numa.c +++ b/arch/arm64/mm/numa.c @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) /* * We hope that we will be hotplugging memory on nodes we already know about, - * such that acpi_get_node() succeeds and we never fall back to this... + * such that acpi_get_node() succeeds. But when SRAT is not present, the node + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. */ int memory_add_physaddr_to_nid(u64 addr) { - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); return 0; } +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
This exports memory_add_physaddr_to_nid() for module driver to use. memory_add_physaddr_to_nid() is a fallback option to get the nid in case NUMA_NO_NID is detected. Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> --- arch/arm64/mm/numa.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)