Message ID | 1567231103-13237-3-git-send-email-linyunsheng@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | check the node id consistently across different arches |
On Sat, Aug 31, 2019 at 01:58:16PM +0800, Yunsheng Lin wrote:
> According to Section 6.2.14 from ACPI spec 6.3 [1], the setting
> of proximity domain is optional, as below:
>
> This optional object is used to describe proximity domain
> associations within a machine. _PXM evaluates to an integer
> that identifies a device as belonging to a Proximity Domain
> defined in the System Resource Affinity Table (SRAT).

That's just words.. what does it actually mean?

> This patch checks node id with the below case before returning
> node_to_cpumask_map[node]:
> 1. if node_id >= nr_node_ids, return cpu_none_mask
> 2. if node_id < 0, return cpu_online_mask
> 3. if node_to_cpumask_map[node_id] is NULL, return cpu_online_mask
>
> [1] https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  arch/x86/include/asm/topology.h | 6 ++++++
>  arch/x86/mm/numa.c              | 2 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 4b14d23..f36e9c8 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -69,6 +69,12 @@ extern const struct cpumask *cpumask_of_node(int node);
>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node >= nr_node_ids)
> +		return cpu_none_mask;
> +
> +	if (node < 0 || !node_to_cpumask_map[node])
> +		return cpu_online_mask;
> +
>  	return node_to_cpumask_map[node];
>  }
>  #endif

I _reallly_ hate this. Users are expected to use valid numa ids. Now
we're adding all this checking to all users. Why do we want to do that?

Using '(unsigned)node >= nr_nods_ids' is an error.

> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index e6dad60..5e393d2 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -868,7 +868,7 @@ const struct cpumask *cpumask_of_node(int node)
>  		dump_stack();
>  		return cpu_none_mask;
>  	}
> -	if (node_to_cpumask_map[node] == NULL) {
> +	if (node < 0 || !node_to_cpumask_map[node]) {
>  		printk(KERN_WARNING
>  			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
>  			node);
> --
> 2.8.1
>
On 2019/8/31 16:55, Peter Zijlstra wrote:
> On Sat, Aug 31, 2019 at 01:58:16PM +0800, Yunsheng Lin wrote:
>> According to Section 6.2.14 from ACPI spec 6.3 [1], the setting
>> of proximity domain is optional, as below:
>>
>> This optional object is used to describe proximity domain
>> associations within a machine. _PXM evaluates to an integer
>> that identifies a device as belonging to a Proximity Domain
>> defined in the System Resource Affinity Table (SRAT).
>
> That's just words.. what does it actually mean?

It means the dev_to_node(dev) may return -1 if the bios does not
implement the proximity domain feature, user may use that value
to call cpumask_of_node and cpumask_of_node does not protect itself
from node id being -1, which causes out of bound access.

>
>> This patch checks node id with the below case before returning
>> node_to_cpumask_map[node]:
>> 1. if node_id >= nr_node_ids, return cpu_none_mask
>> 2. if node_id < 0, return cpu_online_mask
>> 3. if node_to_cpumask_map[node_id] is NULL, return cpu_online_mask
>>
>> [1] https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>  arch/x86/include/asm/topology.h | 6 ++++++
>>  arch/x86/mm/numa.c              | 2 +-
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
>> index 4b14d23..f36e9c8 100644
>> --- a/arch/x86/include/asm/topology.h
>> +++ b/arch/x86/include/asm/topology.h
>> @@ -69,6 +69,12 @@ extern const struct cpumask *cpumask_of_node(int node);
>>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>>  static inline const struct cpumask *cpumask_of_node(int node)
>>  {
>> +	if (node >= nr_node_ids)
>> +		return cpu_none_mask;
>> +
>> +	if (node < 0 || !node_to_cpumask_map[node])
>> +		return cpu_online_mask;
>> +
>>  	return node_to_cpumask_map[node];
>>  }
>>  #endif
>
> I _reallly_ hate this. Users are expected to use valid numa ids. Now
> we're adding all this checking to all users. Why do we want to do that?

As above, the dev_to_node(dev) may return -1.

>
> Using '(unsigned)node >= nr_nods_ids' is an error.

'node >= nr_node_ids' can be dropped if all user is expected to not call
cpumask_of_node with node id greater or equal to nr_nods_ids.

From what I can see, the problem can be fixed in three place:
1. Make user dev_to_node return a valid node id even when proximity
   domain is not set by bios(or node id set by buggy bios is not valid),
   which may need info from the numa system to make sure it will return
   a valid node.

2. User that call cpumask_of_node should ensure the node id is valid
   before calling cpumask_of_node, and user also need some info to
   make ensure node id is valid.

3. Make sure cpumask_of_node deal with invalid node id as this patchset.

Which one do you prefer to make sure node id is valid, or do you
have any better idea?

Any detail advice and suggestion will be very helpful, thanks.

>
>> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>> index e6dad60..5e393d2 100644
>> --- a/arch/x86/mm/numa.c
>> +++ b/arch/x86/mm/numa.c
>> @@ -868,7 +868,7 @@ const struct cpumask *cpumask_of_node(int node)
>>  		dump_stack();
>>  		return cpu_none_mask;
>>  	}
>> -	if (node_to_cpumask_map[node] == NULL) {
>> +	if (node < 0 || !node_to_cpumask_map[node]) {
>>  		printk(KERN_WARNING
>>  			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
>>  			node);
>> --
>> 2.8.1
>>
On Sat, Aug 31, 2019 at 06:09:39PM +0800, Yunsheng Lin wrote:
>
>
> On 2019/8/31 16:55, Peter Zijlstra wrote:
> > On Sat, Aug 31, 2019 at 01:58:16PM +0800, Yunsheng Lin wrote:
> >> According to Section 6.2.14 from ACPI spec 6.3 [1], the setting
> >> of proximity domain is optional, as below:
> >>
> >> This optional object is used to describe proximity domain
> >> associations within a machine. _PXM evaluates to an integer
> >> that identifies a device as belonging to a Proximity Domain
> >> defined in the System Resource Affinity Table (SRAT).
> >
> > That's just words.. what does it actually mean?
>
> It means the dev_to_node(dev) may return -1 if the bios does not
> implement the proximity domain feature, user may use that value
> to call cpumask_of_node and cpumask_of_node does not protect itself
> from node id being -1, which causes out of bound access.

> >> @@ -69,6 +69,12 @@ extern const struct cpumask *cpumask_of_node(int node);
> >>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> >>  static inline const struct cpumask *cpumask_of_node(int node)
> >>  {
> >> +	if (node >= nr_node_ids)
> >> +		return cpu_none_mask;
> >> +
> >> +	if (node < 0 || !node_to_cpumask_map[node])
> >> +		return cpu_online_mask;
> >> +
> >>  	return node_to_cpumask_map[node];
> >>  }
> >>  #endif
> >
> > I _reallly_ hate this. Users are expected to use valid numa ids. Now
> > we're adding all this checking to all users. Why do we want to do that?
>
> As above, the dev_to_node(dev) may return -1.
>
> >
> > Using '(unsigned)node >= nr_nods_ids' is an error.
>
> 'node >= nr_node_ids' can be dropped if all user is expected to not call
> cpumask_of_node with node id greater or equal to nr_nods_ids.

you copied my typo :-)

> From what I can see, the problem can be fixed in three place:
> 1. Make user dev_to_node return a valid node id even when proximity
>    domain is not set by bios(or node id set by buggy bios is not valid),
>    which may need info from the numa system to make sure it will return
>    a valid node.
>
> 2. User that call cpumask_of_node should ensure the node id is valid
>    before calling cpumask_of_node, and user also need some info to
>    make ensure node id is valid.
>
> 3. Make sure cpumask_of_node deal with invalid node id as this patchset.
>
> Which one do you prefer to make sure node id is valid, or do you
> have any better idea?
>
> Any detail advice and suggestion will be very helpful, thanks.

1) because even it is not set, the device really does belong to a node.
It is impossible a device will have magic uniform access to memory when
CPUs cannot.

2) is already true today, cpumask_of_node() requires a valid node_id.

3) is just wrong and increases overhead for everyone.
On 2019/9/1 0:12, Peter Zijlstra wrote:
> On Sat, Aug 31, 2019 at 06:09:39PM +0800, Yunsheng Lin wrote:
>>
>>
>> On 2019/8/31 16:55, Peter Zijlstra wrote:
>>> On Sat, Aug 31, 2019 at 01:58:16PM +0800, Yunsheng Lin wrote:
>>>> According to Section 6.2.14 from ACPI spec 6.3 [1], the setting
>>>> of proximity domain is optional, as below:
>>>>
>>>> This optional object is used to describe proximity domain
>>>> associations within a machine. _PXM evaluates to an integer
>>>> that identifies a device as belonging to a Proximity Domain
>>>> defined in the System Resource Affinity Table (SRAT).
>>>
>>> That's just words.. what does it actually mean?
>>
>> It means the dev_to_node(dev) may return -1 if the bios does not
>> implement the proximity domain feature, user may use that value
>> to call cpumask_of_node and cpumask_of_node does not protect itself
>> from node id being -1, which causes out of bound access.
>
>>>> @@ -69,6 +69,12 @@ extern const struct cpumask *cpumask_of_node(int node);
>>>>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>>>>  static inline const struct cpumask *cpumask_of_node(int node)
>>>>  {
>>>> +	if (node >= nr_node_ids)
>>>> +		return cpu_none_mask;
>>>> +
>>>> +	if (node < 0 || !node_to_cpumask_map[node])
>>>> +		return cpu_online_mask;
>>>> +
>>>>  	return node_to_cpumask_map[node];
>>>>  }
>>>>  #endif
>>>
>>> I _reallly_ hate this. Users are expected to use valid numa ids. Now
>>> we're adding all this checking to all users. Why do we want to do that?
>>
>> As above, the dev_to_node(dev) may return -1.
>>
>>>
>>> Using '(unsigned)node >= nr_nods_ids' is an error.
>>
>> 'node >= nr_node_ids' can be dropped if all user is expected to not call
>> cpumask_of_node with node id greater or equal to nr_nods_ids.
>
> you copied my typo :-)

I did note the typo, corrected the first one, but missed the second one :)

>
>> From what I can see, the problem can be fixed in three place:
>> 1. Make user dev_to_node return a valid node id even when proximity
>>    domain is not set by bios(or node id set by buggy bios is not valid),
>>    which may need info from the numa system to make sure it will return
>>    a valid node.
>>
>> 2. User that call cpumask_of_node should ensure the node id is valid
>>    before calling cpumask_of_node, and user also need some info to
>>    make ensure node id is valid.
>>
>> 3. Make sure cpumask_of_node deal with invalid node id as this patchset.
>>
>> Which one do you prefer to make sure node id is valid, or do you
>> have any better idea?
>>
>> Any detail advice and suggestion will be very helpful, thanks.
>
> 1) because even it is not set, the device really does belong to a node.
> It is impossible a device will have magic uniform access to memory when
> CPUs cannot.

So it means dev_to_node() will return either NUMA_NO_NODE or a
valid node id?

>
> 2) is already true today, cpumask_of_node() requires a valid node_id.

Ok, most of the user does check node_id before calling
cpumask_of_node(), but does a little different type of checking:

1) some does " < 0" check;
2) some does "== NUMA_NO_NODE" check;
3) some does ">= MAX_NUMNODES" check;
4) some does "< 0 || >= MAX_NUMNODES || !node_online(node)" check.

>
> 3) is just wrong and increases overhead for everyone.

Ok, cpumask_of_node() is also used in some critical path such
as scheduling, which may not need those checking, the overhead
is unnecessary.

But for non-critical path such as setup or configuration path,
it better to have consistent checking, and also simplify the
user code that calls cpumask_of_node().

Do you think it is worth the trouble to add a new function
such as cpumask_of_node_check(maybe some other name) to do
consistent checking?

Or caller just simply check if dev_to_node()'s return value is
NUMA_NO_NODE before calling cpumask_of_node()?
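The caller-side alternative raised at the end of this message (test the return value of dev_to_node() against NUMA_NO_NODE before calling cpumask_of_node()) might look roughly like the userspace sketch below. Everything prefixed `toy_` is a hypothetical stand-in and not a kernel API: the masks are plain bit patterns, the node count is arbitrary, and the fallback to the "online" mask simply mirrors the choice made in the patch under review rather than an agreed policy.

```c
#define NUMA_NO_NODE (-1)          /* same value the kernel uses */
#define TOY_NR_NODES 2             /* hypothetical node count */

typedef unsigned long toy_cpumask_t;   /* toy mask: one bit per CPU */

/* Hypothetical per-node masks: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const toy_cpumask_t toy_node_to_cpumask_map[TOY_NR_NODES] = {
	0x0fUL, 0xf0UL
};
static const toy_cpumask_t toy_cpu_online_mask = 0xffUL;

/* Like the non-debug cpumask_of_node(): trusts the caller, no checks. */
static const toy_cpumask_t *toy_cpumask_of_node(int node)
{
	return &toy_node_to_cpumask_map[node];
}

/* Caller-side check: handle "node unknown" before the unchecked lookup. */
static const toy_cpumask_t *toy_cpumask_for_dev_node(int dev_node)
{
	if (dev_node == NUMA_NO_NODE)
		return &toy_cpu_online_mask;   /* assumption: fall back to all online CPUs */
	return toy_cpumask_of_node(dev_node);
}
```

With this shape the hot-path lookup stays free of checks, and only callers that can actually see NUMA_NO_NODE pay for the test.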
On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
> On 2019/9/1 0:12, Peter Zijlstra wrote:

> > 1) because even it is not set, the device really does belong to a node.
> > It is impossible a device will have magic uniform access to memory when
> > CPUs cannot.
>
> So it means dev_to_node() will return either NUMA_NO_NODE or a
> valid node id?

NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
said, not a valid device location on a NUMA system.

Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
node association. It just means we don't know and might have to guess.

> > 2) is already true today, cpumask_of_node() requires a valid node_id.
>
> Ok, most of the user does check node_id before calling
> cpumask_of_node(), but does a little different type of checking:
>
> 1) some does " < 0" check;
> 2) some does "== NUMA_NO_NODE" check;
> 3) some does ">= MAX_NUMNODES" check;
> 4) some does "< 0 || >= MAX_NUMNODES || !node_online(node)" check.

The one true way is:

	'(unsigned)node_id >= nr_node_ids'

> > 3) is just wrong and increases overhead for everyone.
>
> Ok, cpumask_of_node() is also used in some critical path such
> as scheduling, which may not need those checking, the overhead
> is unnecessary.
>
> But for non-critical path such as setup or configuration path,
> it better to have consistent checking, and also simplify the
> user code that calls cpumask_of_node().
>
> Do you think it is worth the trouble to add a new function
> such as cpumask_of_node_check(maybe some other name) to do
> consistent checking?
>
> Or caller just simply check if dev_to_node()'s return value is
> NUMA_NO_NODE before calling cpumask_of_node()?

It is not a matter of convenience. The function is called
cpumask_of_node(), when node < 0 || node >= nr_node_ids, it is not a
valid node, therefore the function shouldn't return anything except an
error.
Also note that the CONFIG_DEBUG_PER_CPU_MAPS version of
cpumask_of_node() already does this (although it wants the below fix).

---
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e6dad600614c..5f49c10201c7 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -861,7 +861,7 @@ void numa_remove_cpu(int cpu)
  */
 const struct cpumask *cpumask_of_node(int node)
 {
-	if (node >= nr_node_ids) {
+	if ((unsigned)node >= nr_node_ids) {
 		printk(KERN_WARNING
 			"cpumask_of_node(%d): node > nr_node_ids(%u)\n",
 			node, nr_node_ids);
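The single-comparison range check Peter calls "the one true way" in this message can be illustrated with a small userspace sketch. The variable `nr_node_ids` here is a hypothetical local stand-in (with an arbitrary value of 4) and `node_id_is_valid()` is an illustrative helper name, not a kernel function.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the kernel's nr_node_ids; 4 is arbitrary. */
static unsigned int nr_node_ids = 4;

/*
 * Converting a negative int to unsigned int yields a very large value
 * (UINT_MAX for -1), so this one comparison rejects both node < 0 and
 * node >= nr_node_ids.
 */
static bool node_id_is_valid(int node)
{
	return (unsigned int)node < nr_node_ids;
}
```

Writing the cast explicitly documents that negative ids, including NUMA_NO_NODE, are meant to fail the same test as ids that are too large.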
On 2019/9/2 15:25, Peter Zijlstra wrote:
> On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
>> On 2019/9/1 0:12, Peter Zijlstra wrote:
>
>>> 1) because even it is not set, the device really does belong to a node.
>>> It is impossible a device will have magic uniform access to memory when
>>> CPUs cannot.
>>
>> So it means dev_to_node() will return either NUMA_NO_NODE or a
>> valid node id?
>
> NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
> said, not a valid device location on a NUMA system.
>
> Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
> node association. It just means we don't know and might have to guess.

How do we guess the device's location when ACPI/BIOS does not set it?

It seems dev_to_node() does not do anything about that and leave the
job to the caller or whatever function that get called with its return
value, such as cpumask_of_node().

>
>>> 2) is already true today, cpumask_of_node() requires a valid node_id.
>>
>> Ok, most of the user does check node_id before calling
>> cpumask_of_node(), but does a little different type of checking:
>>
>> 1) some does " < 0" check;
>> 2) some does "== NUMA_NO_NODE" check;
>> 3) some does ">= MAX_NUMNODES" check;
>> 4) some does "< 0 || >= MAX_NUMNODES || !node_online(node)" check.
>
> The one true way is:
>
> 	'(unsigned)node_id >= nr_node_ids'

I missed the magic of the "unsigned" in your previous reply.

>
>>> 3) is just wrong and increases overhead for everyone.
>>
>> Ok, cpumask_of_node() is also used in some critical path such
>> as scheduling, which may not need those checking, the overhead
>> is unnecessary.
>>
>> But for non-critical path such as setup or configuration path,
>> it better to have consistent checking, and also simplify the
>> user code that calls cpumask_of_node().
>>
>> Do you think it is worth the trouble to add a new function
>> such as cpumask_of_node_check(maybe some other name) to do
>> consistent checking?
>>
>> Or caller just simply check if dev_to_node()'s return value is
>> NUMA_NO_NODE before calling cpumask_of_node()?
>
> It is not a matter of convenience. The function is called
> cpumask_of_node(), when node < 0 || node >= nr_node_ids, it is not a
> valid node, therefore the function shouldn't return anything except an
> error.

what do you mean by error? What I can think is three type of errors:
1) return NULL, this way it seems cpumask_of_node() also leave the
   job to the function that calls it.
2) cpu_none_mask, I am not sure what this means, maybe it means there
   is no cpu on the same node with the device?
3) give a warning, stack dump, or even a BUG_ON?

I would prefer the second one, and implement the third one when the
CONFIG_DEBUG_PER_CPU_MAPS is selected.

Any suggestion?

>
> Also note that the CONFIG_DEBUG_PER_CPU_MAPS version of
> cpumask_of_node() already does this (although it wants the below fix).

Thanks for the note and example.
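The preference stated in this message (return an empty mask for an invalid node, and additionally warn when a debug option is selected) could be sketched in userspace as follows. This is only an illustration under stated assumptions: all `toy_`/`TOY_` names are hypothetical, the masks are plain bit patterns, and `TOY_DEBUG_PER_CPU_MAPS` is a made-up macro standing in for the role CONFIG_DEBUG_PER_CPU_MAPS plays in the discussion.

```c
#include <stdio.h>

#define TOY_NR_NODE_IDS 2u             /* hypothetical node count */

typedef unsigned long toy_cpumask_t;   /* toy mask: one bit per CPU */

static const toy_cpumask_t toy_node_to_cpumask_map[TOY_NR_NODE_IDS] = {
	0x0fUL, 0xf0UL
};
static const toy_cpumask_t toy_cpu_none_mask = 0UL;   /* no CPUs set */

static const toy_cpumask_t *toy_cpumask_of_node_checked(int node)
{
	/* One unsigned comparison covers node < 0 and node >= TOY_NR_NODE_IDS. */
	if ((unsigned int)node >= TOY_NR_NODE_IDS) {
#ifdef TOY_DEBUG_PER_CPU_MAPS
		/* Debug builds only: report the invalid id. */
		fprintf(stderr, "toy_cpumask_of_node_checked(%d): invalid node\n",
			node);
#endif
		return &toy_cpu_none_mask;
	}
	return &toy_node_to_cpumask_map[node];
}
```

Whether an empty mask is the right "error" value is exactly the open question in the thread; the sketch only shows what that option would mean mechanically.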
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 4b14d23..f36e9c8 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -69,6 +69,12 @@ extern const struct cpumask *cpumask_of_node(int node);
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
 static inline const struct cpumask *cpumask_of_node(int node)
 {
+	if (node >= nr_node_ids)
+		return cpu_none_mask;
+
+	if (node < 0 || !node_to_cpumask_map[node])
+		return cpu_online_mask;
+
 	return node_to_cpumask_map[node];
 }
 #endif
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e6dad60..5e393d2 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -868,7 +868,7 @@ const struct cpumask *cpumask_of_node(int node)
 		dump_stack();
 		return cpu_none_mask;
 	}
-	if (node_to_cpumask_map[node] == NULL) {
+	if (node < 0 || !node_to_cpumask_map[node]) {
 		printk(KERN_WARNING
 			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
 			node);
According to Section 6.2.14 from ACPI spec 6.3 [1], the setting
of proximity domain is optional, as below:

This optional object is used to describe proximity domain
associations within a machine. _PXM evaluates to an integer
that identifies a device as belonging to a Proximity Domain
defined in the System Resource Affinity Table (SRAT).

This patch checks node id with the below case before returning
node_to_cpumask_map[node]:
1. if node_id >= nr_node_ids, return cpu_none_mask
2. if node_id < 0, return cpu_online_mask
3. if node_to_cpumask_map[node_id] is NULL, return cpu_online_mask

[1] https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 arch/x86/include/asm/topology.h | 6 ++++++
 arch/x86/mm/numa.c              | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)