| Message ID | 1441859269-25831-4-git-send-email-tangchen@cn.fujitsu.com (mailing list archive) |
|---|---|
| State | Not Applicable, archived |
Hello,

On Thu, Sep 10, 2015 at 12:27:45PM +0800, Tang Chen wrote:
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index ad35f30..1a1324f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -307,13 +307,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> 	if (nid < 0)
> 		nid = numa_node_id();
>
> +	if (!node_online(nid))
> +		nid = get_near_online_node(nid);
> +
> 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> }

Why not just update node_data[]->node_zonelist in the first place?
Also, what's the synchronization rule here?  How are allocators
synchronized against node hot [un]plugs?

Thanks.
(cc'ing Christoph Lameter)

On Thu, Sep 10, 2015 at 03:29:35PM -0400, Tejun Heo wrote:
> Hello,
>
> On Thu, Sep 10, 2015 at 12:27:45PM +0800, Tang Chen wrote:
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index ad35f30..1a1324f 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -307,13 +307,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> > 	if (nid < 0)
> > 		nid = numa_node_id();
> >
> > +	if (!node_online(nid))
> > +		nid = get_near_online_node(nid);
> > +
> > 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> > }
>
> Why not just update node_data[]->node_zonelist in the first place?
> Also, what's the synchronization rule here?  How are allocators
> synchronized against node hot [un]plugs?

Also, shouldn't kmalloc_node() or any public allocator fall back
automatically to a near node w/o GFP_THISNODE?  Why is this failing at
all?  I get that cpu id -> node id mapping changing messes up the
locality but allocations shouldn't fail, right?

Thanks.
On Thu, 10 Sep 2015, Tejun Heo wrote:

> > Why not just update node_data[]->node_zonelist in the first place?
> > Also, what's the synchronization rule here?  How are allocators
> > synchronized against node hot [un]plugs?
>
> Also, shouldn't kmalloc_node() or any public allocator fall back
> automatically to a near node w/o GFP_THISNODE?  Why is this failing at
> all?  I get that cpu id -> node id mapping changing messes up the
> locality but allocations shouldn't fail, right?

Without a node specification allocations are subject to various
constraints and memory policies. It is not simply going to the next node.
The memory load may require spreading out the allocations over multiple
nodes, the app may have specified which nodes are to be used etc etc.

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,

On Thu, Sep 10, 2015 at 05:02:31PM -0500, Christoph Lameter wrote:
> > Also, shouldn't kmalloc_node() or any public allocator fall back
> > automatically to a near node w/o GFP_THISNODE?  Why is this failing at
> > all?  I get that cpu id -> node id mapping changing messes up the
> > locality but allocations shouldn't fail, right?
>
> Without a node specification allocations are subject to various
> constraints and memory policies. It is not simply going to the next node.
> The memory load may require spreading out the allocations over multiple
> nodes, the app may have specified which nodes are to be used etc etc.

Yeah, sure, but even w/ node specified, it shouldn't fail unless
THISNODE, right?

Thanks.
On Thu, 10 Sep 2015, Tejun Heo wrote:

> > Why not just update node_data[]->node_zonelist in the first place?
> > Also, what's the synchronization rule here?  How are allocators
> > synchronized against node hot [un]plugs?
>
> Also, shouldn't kmalloc_node() or any public allocator fall back
> automatically to a near node w/o GFP_THISNODE?  Why is this failing at
> all?  I get that cpu id -> node id mapping changing messes up the
> locality but allocations shouldn't fail, right?

Yes that should occur in the absence of other constraints (mempolicies,
cpusets, cgroups, allocation type). If the constraints do not allow an
allocation then the allocation will fail.

Also: Are the zonelists setup the right way?
Hi, tj

On 09/11/2015 03:29 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Sep 10, 2015 at 12:27:45PM +0800, Tang Chen wrote:
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index ad35f30..1a1324f 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -307,13 +307,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>> 	if (nid < 0)
>> 		nid = numa_node_id();
>>
>> +	if (!node_online(nid))
>> +		nid = get_near_online_node(nid);
>> +
>> 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>> }
> Why not just update node_data[]->node_zonelist in the first place?

The zonelist will be rebuilt in __offline_pages() when the zone is not
populated any more.

Here, getting the best near online node is for those cpus on memory-less
nodes.

In the original code, if nid is NUMA_NO_NODE, the node the current cpu
resides in will be chosen. And if that node is a memory-less node, the
cpu will be mapped to its best near online node.

But this patch-set will map the cpu to its original node, so
numa_node_id() may return a memory-less node to the allocator. And then
memory allocation may fail.

> Also, what's the synchronization rule here?  How are allocators
> synchronized against node hot [un]plugs?

The rule is: the node_to_near_node_map[] array will be updated each time
a node hot-[un]plug happens. For now it is not protected by a lock, as I
think acquiring a lock could cause a performance regression in the memory
allocator.

When rebuilding the zonelist, stop_machine is used. So I think updating
the node_to_near_node_map[] array at the same time the zonelist is
rebuilt could be a good idea.

Thanks.
Hi, Christoph, tj,

On 09/11/2015 08:14 AM, Christoph Lameter wrote:
> On Thu, 10 Sep 2015, Tejun Heo wrote:
>
>>> Why not just update node_data[]->node_zonelist in the first place?
>>> Also, what's the synchronization rule here?  How are allocators
>>> synchronized against node hot [un]plugs?
>> Also, shouldn't kmalloc_node() or any public allocator fall back
>> automatically to a near node w/o GFP_THISNODE?  Why is this failing at
>> all?  I get that cpu id -> node id mapping changing messes up the
>> locality but allocations shouldn't fail, right?

Yes. That is the reason we are getting the near online node here.

> Yes that should occur in the absence of other constraints (mempolicies,
> cpusets, cgroups, allocation type). If the constraints do not allow an
> allocation then the allocation will fail.
>
> Also: Are the zonelists setup the right way?

The zonelist will be rebuilt in __offline_pages() when the zone is not
populated any more.

Thanks.
Hello, Tang.

On Sat, Sep 26, 2015 at 05:31:07PM +0800, Tang Chen wrote:
> >> @@ -307,13 +307,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> >> 	if (nid < 0)
> >> 		nid = numa_node_id();
> >> +	if (!node_online(nid))
> >> +		nid = get_near_online_node(nid);
> >> +
> >> 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> >> }
> > Why not just update node_data[]->node_zonelist in the first place?
>
> zonelist will be rebuilt in __offline_pages() when the zone is not populated
> any more.
>
> Here, getting the best near online node is for those cpus on memory-less
> nodes.
>
> In the original code, if nid is NUMA_NO_NODE, the node the current cpu
> resides in will be chosen. And if the node is memory-less node, the cpu
> will be mapped to its best near online node.
>
> But this patch-set will map the cpu to its original node, so numa_node_id()
> may return a memory-less node to allocator. And then memory allocation may
> fail.

Correct me if I'm wrong but the zonelist dictates which memory areas the
page allocator is gonna try to allocate from, right?  What I'm wondering
is why we aren't handling memory-less nodes by simply updating their
zonelists.  I mean, if, say, node 2 is memory-less, its zonelist can
simply point to zones from other nodes, right?  What am I missing here?

Thanks.
Hi, tj,

On 09/27/2015 01:53 AM, Tejun Heo wrote:
> Hello, Tang.
>
> On Sat, Sep 26, 2015 at 05:31:07PM +0800, Tang Chen wrote:
>>>> @@ -307,13 +307,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>>> 	if (nid < 0)
>>>> 		nid = numa_node_id();
>>>> +	if (!node_online(nid))
>>>> +		nid = get_near_online_node(nid);
>>>> +
>>>> 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>>> }
>>> Why not just update node_data[]->node_zonelist in the first place?
>> zonelist will be rebuilt in __offline_pages() when the zone is not populated
>> any more.
>>
>> Here, getting the best near online node is for those cpus on memory-less
>> nodes.
>>
>> In the original code, if nid is NUMA_NO_NODE, the node the current cpu
>> resides in will be chosen. And if the node is memory-less node, the cpu
>> will be mapped to its best near online node.
>>
>> But this patch-set will map the cpu to its original node, so numa_node_id()
>> may return a memory-less node to allocator. And then memory allocation may
>> fail.
> Correct me if I'm wrong but the zonelist dictates which memory areas
> the page allocator is gonna try to allocate from, right?  What I'm
> wondering is why we aren't handling memory-less nodes by simply
> updating their zonelists.  I mean, if, say, node 2 is memory-less, its
> zonelist can simply point to zones from other nodes, right?  What am I
> missing here?

Oh, yes, you are right. But I remember that some time ago, Liu Jiang
handled, or was going to handle, memory-less nodes like this in his
patch:

https://lkml.org/lkml/2015/8/16/130

BTW, to Liu Jiang, how are your patches going?

Thanks.
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 8bd7661..e89b9fb 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -151,6 +151,7 @@ void numa_set_node(int cpu, int node)
 		return;
 	}
 #endif
+	per_cpu(x86_cpu_to_node_map, cpu) = node;
 	set_near_online_node(node);
@@ -787,8 +788,6 @@ void __init init_cpu_to_node(void)
 		if (node == NUMA_NO_NODE)
 			continue;
 
-		if (!node_online(node))
-			node = find_near_online_node(node);
 		numa_set_node(cpu, node);
 	}
 }
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ad35f30..1a1324f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -307,13 +307,19 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	if (nid < 0)
 		nid = numa_node_id();
 
+	if (!node_online(nid))
+		nid = get_near_online_node(nid);
+
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
 static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+
+	if (!node_online(nid))
+		nid = get_near_online_node(nid);
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }