diff mbox series

arm64: numa: check the node id before accessing node_to_cpumask_map

Message ID 1567131991-189761-1-git-send-email-linyunsheng@huawei.com (mailing list archive)
State New, archived
Headers show
Series arm64: numa: check the node id before accessing node_to_cpumask_map | expand

Commit Message

Yunsheng Lin Aug. 30, 2019, 2:26 a.m. UTC
Some buggy bios may not set the device' numa id, and dev_to_node
will return -1, which may cause global-out-of-bounds error
detected by KASAN.

This patch changes cpumask_of_node to return cpu_none_mask if the
node is not valid, and sync the cpumask_of_node between the
cpumask_of_node function in numa.h and numa.c.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 arch/arm64/include/asm/numa.h | 6 ++++++
 arch/arm64/mm/numa.c          | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Michal Hocko Aug. 30, 2019, 5:55 a.m. UTC | #1
On Fri 30-08-19 10:26:31, Yunsheng Lin wrote:
> Some buggy bios may not set the device' numa id, and dev_to_node
> will return -1, which may cause global-out-of-bounds error
> detected by KASAN.

Why should we workaround a buggy bios like that? Is it so widespread and
no BIOS update available? Also, why is this arm64 specific?

> This patch changes cpumask_of_node to return cpu_none_mask if the
> node is not valid, and sync the cpumask_of_node between the
> cpumask_of_node function in numa.h and numa.c.

Why?

> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  arch/arm64/include/asm/numa.h | 6 ++++++
>  arch/arm64/mm/numa.c          | 2 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> index 626ad01..da891ed 100644
> --- a/arch/arm64/include/asm/numa.h
> +++ b/arch/arm64/include/asm/numa.h
> @@ -25,6 +25,12 @@ const struct cpumask *cpumask_of_node(int node);
>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node >= nr_node_ids || node < 0)
> +		return cpu_none_mask;
> +
> +	if (!node_to_cpumask_map[node])
> +		return cpu_online_mask;
> +
>  	return node_to_cpumask_map[node];
>  }
>  #endif
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 4f241cc..3846313 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -46,7 +46,7 @@ EXPORT_SYMBOL(node_to_cpumask_map);
>   */
>  const struct cpumask *cpumask_of_node(int node)
>  {
> -	if (WARN_ON(node >= nr_node_ids))
> +	if (WARN_ON(node >= nr_node_ids || node < 0))
>  		return cpu_none_mask;
>  
>  	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> -- 
> 2.8.1
Yunsheng Lin Aug. 30, 2019, 6:35 a.m. UTC | #2
On 2019/8/30 13:55, Michal Hocko wrote:
> On Fri 30-08-19 10:26:31, Yunsheng Lin wrote:
>> Some buggy bios may not set the device' numa id, and dev_to_node
>> will return -1, which may cause global-out-of-bounds error
>> detected by KASAN.
> 
> Why should we workaround a buggy bios like that? Is it so widespread and
> no BIOS update available? Also, why is this arm64 specific?

For our case, there is BIOS update available. I just thought it might
be better to protect from this case when BIOS has not implemented the
device' numa id setting feature or the feature from BIOS has some bug.

It is not arm64 specific, right now I only have arm64 board. If it is
ok to protect this from the buggy BIOS, maybe all other arch can be
changed too.

> 
>> This patch changes cpumask_of_node to return cpu_none_mask if the
>> node is not valid, and sync the cpumask_of_node between the
>> cpumask_of_node function in numa.h and numa.c.
> 
> Why?

When CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node() in
numa.c is used, if not, the cpumask_of_node() in numa.h is used.

I am not sure why there is difference between them, and it is there
when since the below commit:
1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")

I synced them to keep them consistent whether CONFIG_DEBUG_PER_CPU_MAPS
is defined.

> 
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>  arch/arm64/include/asm/numa.h | 6 ++++++
>>  arch/arm64/mm/numa.c          | 2 +-
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> index 626ad01..da891ed 100644
>> --- a/arch/arm64/include/asm/numa.h
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -25,6 +25,12 @@ const struct cpumask *cpumask_of_node(int node);
>>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>>  static inline const struct cpumask *cpumask_of_node(int node)
>>  {
>> +	if (node >= nr_node_ids || node < 0)
>> +		return cpu_none_mask;
>> +
>> +	if (!node_to_cpumask_map[node])
>> +		return cpu_online_mask;
>> +
>>  	return node_to_cpumask_map[node];
>>  }
>>  #endif
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 4f241cc..3846313 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -46,7 +46,7 @@ EXPORT_SYMBOL(node_to_cpumask_map);
>>   */
>>  const struct cpumask *cpumask_of_node(int node)
>>  {
>> -	if (WARN_ON(node >= nr_node_ids))
>> +	if (WARN_ON(node >= nr_node_ids || node < 0))
>>  		return cpu_none_mask;
>>  
>>  	if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> -- 
>> 2.8.1
>
Michal Hocko Aug. 30, 2019, 6:44 a.m. UTC | #3
On Fri 30-08-19 14:35:26, Yunsheng Lin wrote:
> On 2019/8/30 13:55, Michal Hocko wrote:
> > On Fri 30-08-19 10:26:31, Yunsheng Lin wrote:
> >> Some buggy bios may not set the device' numa id, and dev_to_node
> >> will return -1, which may cause global-out-of-bounds error
> >> detected by KASAN.
> > 
> > Why should we workaround a buggy bios like that? Is it so widespread and
> > no BIOS update available? Also, why is this arm64 specific?
> 
> For our case, there is BIOS update available. I just thought it might
> be better to protect from this case when BIOS has not implemented the
> device' numa id setting feature or the feature from BIOS has some bug.
> 
> It is not arm64 specific, right now I only have arm64 board. If it is
> ok to protect this from the buggy BIOS, maybe all other arch can be
> changed too.

If we are to really care then this should be consistent among
architectures IMHO. But I am not really sure this is really worth it.
The code is quite old and I do not really remember any reports. 

> >> This patch changes cpumask_of_node to return cpu_none_mask if the
> >> node is not valid, and sync the cpumask_of_node between the
> >> cpumask_of_node function in numa.h and numa.c.
> > 
> > Why?
> 
> When CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node() in
> numa.c is used, if not, the cpumask_of_node() in numa.h is used.
> 
> I am not sure why there is difference between them, and it is there
> when since the below commit:
> 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
> 
> I synced them to keep them consistent whether CONFIG_DEBUG_PER_CPU_MAPS
> is defined.

Such a change should be made in a separate patch with a full
clarification/justification. From the above it is still not clear why
this is needed though.

> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >> ---
> >>  arch/arm64/include/asm/numa.h | 6 ++++++
> >>  arch/arm64/mm/numa.c          | 2 +-
> >>  2 files changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> >> index 626ad01..da891ed 100644
> >> --- a/arch/arm64/include/asm/numa.h
> >> +++ b/arch/arm64/include/asm/numa.h
> >> @@ -25,6 +25,12 @@ const struct cpumask *cpumask_of_node(int node);
> >>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> >>  static inline const struct cpumask *cpumask_of_node(int node)
> >>  {
> >> +	if (node >= nr_node_ids || node < 0)
> >> +		return cpu_none_mask;
> >> +
> >> +	if (!node_to_cpumask_map[node])
> >> +		return cpu_online_mask;
> >> +
> >>  	return node_to_cpumask_map[node];
> >>  }
> >>  #endif
> >> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >> index 4f241cc..3846313 100644
> >> --- a/arch/arm64/mm/numa.c
> >> +++ b/arch/arm64/mm/numa.c
> >> @@ -46,7 +46,7 @@ EXPORT_SYMBOL(node_to_cpumask_map);
> >>   */
> >>  const struct cpumask *cpumask_of_node(int node)
> >>  {
> >> -	if (WARN_ON(node >= nr_node_ids))
> >> +	if (WARN_ON(node >= nr_node_ids || node < 0))
> >>  		return cpu_none_mask;
> >>  
> >>  	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> >> -- 
> >> 2.8.1
> >
Yunsheng Lin Aug. 30, 2019, 8:08 a.m. UTC | #4
On 2019/8/30 14:44, Michal Hocko wrote:
> On Fri 30-08-19 14:35:26, Yunsheng Lin wrote:
>> On 2019/8/30 13:55, Michal Hocko wrote:
>>> On Fri 30-08-19 10:26:31, Yunsheng Lin wrote:
>>>> Some buggy bios may not set the device' numa id, and dev_to_node
>>>> will return -1, which may cause global-out-of-bounds error
>>>> detected by KASAN.
>>>
>>> Why should we workaround a buggy bios like that? Is it so widespread and
>>> no BIOS update available? Also, why is this arm64 specific?
>>
>> For our case, there is BIOS update available. I just thought it might
>> be better to protect from this case when BIOS has not implemented the
>> device' numa id setting feature or the feature from BIOS has some bug.
>>
>> It is not arm64 specific, right now I only have arm64 board. If it is
>> ok to protect this from the buggy BIOS, maybe all other arch can be
>> changed too.
> 
> If we are to really care then this should be consistent among
> architectures IMHO. But I am not really sure this is really worth it.
> The code is quite old and I do not really remember any reports. 

It is only detected by enabling KASAN, the system seems to run fine without
any visible error if KASAN is disabled. Maybe there is why no report has
been seen?

Also according to Section 6.2.14 from ACPI spec 6.3 [1], the setting of proximity
domain is optional, as below:

This optional object is used to describe proximity domain
associations within a machine. _PXM evaluates to an integer
that identifies a device as belonging to a Proximity Domain
defined in the System Resource Affinity Table (SRAT).


Do you think it is ok to resend the fix with above clarification and below
log:

[   42.970381] ==================================================================
[   42.977595] BUG: KASAN: global-out-of-bounds in __bitmap_weight+0x48/0xb0
[   42.984370] Read of size 8 at addr ffff20008cdf8790 by task kworker/0:1/13
[   42.991230]
[   42.992712] CPU: 0 PID: 13 Comm: kworker/0:1 Tainted: G           O      5.2.0-rc4-g8bde06a-dirty #3
[   43.001830] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS 2280-A CS V2.B050.01 08/08/2019
[   43.011298] Workqueue: events work_for_cpu_fn
[   43.015643] Call trace:
[   43.018078]  dump_backtrace+0x0/0x1e8
[   43.021727]  show_stack+0x14/0x20
[   43.025031]  dump_stack+0xc4/0xfc
[   43.028335]  print_address_description+0x178/0x270
[   43.033113]  __kasan_report+0x164/0x1b8
[   43.036936]  kasan_report+0xc/0x18
[   43.040325]  __asan_load8+0x84/0xa8
[   43.043801]  __bitmap_weight+0x48/0xb0
[   43.047552]  hclge_init_ae_dev+0x988/0x1e78 [hclge]
[   43.052418]  hnae3_register_ae_dev+0xcc/0x278 [hnae3]
[   43.057467]  hns3_probe+0xe0/0x120 [hns3]
[   43.061464]  local_pci_probe+0x74/0xf0
[   43.065200]  work_for_cpu_fn+0x2c/0x48
[   43.068937]  process_one_work+0x3c0/0x878
[   43.072934]  worker_thread+0x400/0x670
[   43.076670]  kthread+0x1b0/0x1b8
[   43.079885]  ret_from_fork+0x10/0x18
[   43.083446]
[   43.084925] The buggy address belongs to the variable:
[   43.090052]  numa_distance+0x30/0x40
[   43.093613]
[   43.095091] Memory state around the buggy address:
[   43.099870]  ffff20008cdf8680: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 fa fa
[   43.107078]  ffff20008cdf8700: fa fa fa fa 04 fa fa fa fa fa fa fa 00 fa fa fa
[   43.114286] >ffff20008cdf8780: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
[   43.121494]                          ^
[   43.125230]  ffff20008cdf8800: 01 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
[   43.132439]  ffff20008cdf8880: fa fa fa fa fa fa fa fa 00 00 fa fa fa fa fa fa
[   43.139646] ==================================================================


> 
>>>> This patch changes cpumask_of_node to return cpu_none_mask if the
>>>> node is not valid, and sync the cpumask_of_node between the
>>>> cpumask_of_node function in numa.h and numa.c.
>>>
>>> Why?
>>
>> When CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node() in
>> numa.c is used, if not, the cpumask_of_node() in numa.h is used.
>>
>> I am not sure why there is difference between them, and it is there
>> when since the below commit:
>> 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
>>
>> I synced them to keep them consistent whether CONFIG_DEBUG_PER_CPU_MAPS
>> is defined.
> 
> Such a change should be made in a separate patch with a full
> clarification/justification. From the above it is still not clear why
> this is needed though.

Ok.

How about:

Currently there are different implementations of cpumask_of_node() depend
on the arch, for example:

ia64:
#define cpumask_of_node(node) ((node) == -1 ?				\
			       cpu_all_mask :				\
			       &node_to_cpu_mask[node])


alpha:
static const struct cpumask *cpumask_of_node(int node)
{
	int cpu;

	if (node == NUMA_NO_NODE)
		return cpu_all_mask;

	cpumask_clear(&node_to_cpumask_map[node]);

	for_each_online_cpu(cpu) {
		if (cpu_to_node(cpu) == node)
			cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
	}

	return &node_to_cpumask_map[node];
}

Even for the same arch, there are two implementations depend on the
CONFIG_DEBUG_PER_CPU_MAPS configuration.

arm64/x86 without CONFIG_DEBUG_PER_CPU_MAPS:
static inline const struct cpumask *cpumask_of_node(int node)
{
	return node_to_cpumask_map[node];
}

arm64/x86 with CONFIG_DEBUG_PER_CPU_MAPS:
const struct cpumask *cpumask_of_node(int node)
{
	if (WARN_ON(node >= nr_node_ids))
		return cpu_none_mask;

	if (WARN_ON(node_to_cpumask_map[node] == NULL))
		return cpu_online_mask;

	return node_to_cpumask_map[node];
}

It seems the cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS is used
to catch the erorr case and give a warning to user when node id is not
valid.

So in order to be consistent between different arch and with or
without CONFIG_DEBUG_PER_CPU_MAPS case:
node id has to be checked with the below case before returning
node_to_cpumask_map[node]:
1. if nr_node_ids < 0, return cpu_all_mask
2. if nr_node_ids >= nr_node_ids, return cpu_none_mask
3. if node_to_cpumask_map[node] is NULL, return cpu_online_mask


[1] https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

> 
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>>> ---
>>>>  arch/arm64/include/asm/numa.h | 6 ++++++
>>>>  arch/arm64/mm/numa.c          | 2 +-
>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>>>> index 626ad01..da891ed 100644
>>>> --- a/arch/arm64/include/asm/numa.h
>>>> +++ b/arch/arm64/include/asm/numa.h
>>>> @@ -25,6 +25,12 @@ const struct cpumask *cpumask_of_node(int node);
>>>>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>>>>  static inline const struct cpumask *cpumask_of_node(int node)
>>>>  {
>>>> +	if (node >= nr_node_ids || node < 0)
>>>> +		return cpu_none_mask;
>>>> +
>>>> +	if (!node_to_cpumask_map[node])
>>>> +		return cpu_online_mask;
>>>> +
>>>>  	return node_to_cpumask_map[node];
>>>>  }
>>>>  #endif
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index 4f241cc..3846313 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -46,7 +46,7 @@ EXPORT_SYMBOL(node_to_cpumask_map);
>>>>   */
>>>>  const struct cpumask *cpumask_of_node(int node)
>>>>  {
>>>> -	if (WARN_ON(node >= nr_node_ids))
>>>> +	if (WARN_ON(node >= nr_node_ids || node < 0))
>>>>  		return cpu_none_mask;
>>>>  
>>>>  	if (WARN_ON(node_to_cpumask_map[node] == NULL))
>>>> -- 
>>>> 2.8.1
>>>
>
Michal Hocko Aug. 30, 2019, 8:39 a.m. UTC | #5
On Fri 30-08-19 16:08:14, Yunsheng Lin wrote:
> On 2019/8/30 14:44, Michal Hocko wrote:
> > On Fri 30-08-19 14:35:26, Yunsheng Lin wrote:
> >> On 2019/8/30 13:55, Michal Hocko wrote:
> >>> On Fri 30-08-19 10:26:31, Yunsheng Lin wrote:
> >>>> Some buggy bios may not set the device' numa id, and dev_to_node
> >>>> will return -1, which may cause global-out-of-bounds error
> >>>> detected by KASAN.
> >>>
> >>> Why should we workaround a buggy bios like that? Is it so widespread and
> >>> no BIOS update available? Also, why is this arm64 specific?
> >>
> >> For our case, there is BIOS update available. I just thought it might
> >> be better to protect from this case when BIOS has not implemented the
> >> device' numa id setting feature or the feature from BIOS has some bug.
> >>
> >> It is not arm64 specific, right now I only have arm64 board. If it is
> >> ok to protect this from the buggy BIOS, maybe all other arch can be
> >> changed too.
> > 
> > If we are to really care then this should be consistent among
> > architectures IMHO. But I am not really sure this is really worth it.
> > The code is quite old and I do not really remember any reports. 
> 
> It is only detected by enabling KASAN, the system seems to run fine without
> any visible error if KASAN is disabled. Maybe there is why no report has
> been seen?
> 
> Also according to Section 6.2.14 from ACPI spec 6.3 [1], the setting of proximity
> domain is optional, as below:
> 
> This optional object is used to describe proximity domain
> associations within a machine. _PXM evaluates to an integer
> that identifies a device as belonging to a Proximity Domain
> defined in the System Resource Affinity Table (SRAT).
> 
> 
> Do you think it is ok to resend the fix with above clarification and below
> log:

OK, if the specification really allows to provide NUMA_NO_NODE (-1) then
the code really has to be prepared for that. And ideally all arches
should deal with that.

> [   42.970381] ==================================================================
> [   42.977595] BUG: KASAN: global-out-of-bounds in __bitmap_weight+0x48/0xb0
> [   42.984370] Read of size 8 at addr ffff20008cdf8790 by task kworker/0:1/13
> [   42.991230]
> [   42.992712] CPU: 0 PID: 13 Comm: kworker/0:1 Tainted: G           O      5.2.0-rc4-g8bde06a-dirty #3
> [   43.001830] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS 2280-A CS V2.B050.01 08/08/2019
> [   43.011298] Workqueue: events work_for_cpu_fn
> [   43.015643] Call trace:
> [   43.018078]  dump_backtrace+0x0/0x1e8
> [   43.021727]  show_stack+0x14/0x20
> [   43.025031]  dump_stack+0xc4/0xfc
> [   43.028335]  print_address_description+0x178/0x270
> [   43.033113]  __kasan_report+0x164/0x1b8
> [   43.036936]  kasan_report+0xc/0x18
> [   43.040325]  __asan_load8+0x84/0xa8
> [   43.043801]  __bitmap_weight+0x48/0xb0
> [   43.047552]  hclge_init_ae_dev+0x988/0x1e78 [hclge]
> [   43.052418]  hnae3_register_ae_dev+0xcc/0x278 [hnae3]
> [   43.057467]  hns3_probe+0xe0/0x120 [hns3]
> [   43.061464]  local_pci_probe+0x74/0xf0
> [   43.065200]  work_for_cpu_fn+0x2c/0x48
> [   43.068937]  process_one_work+0x3c0/0x878
> [   43.072934]  worker_thread+0x400/0x670
> [   43.076670]  kthread+0x1b0/0x1b8
> [   43.079885]  ret_from_fork+0x10/0x18
> [   43.083446]
> [   43.084925] The buggy address belongs to the variable:
> [   43.090052]  numa_distance+0x30/0x40
> [   43.093613]
> [   43.095091] Memory state around the buggy address:
> [   43.099870]  ffff20008cdf8680: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 fa fa
> [   43.107078]  ffff20008cdf8700: fa fa fa fa 04 fa fa fa fa fa fa fa 00 fa fa fa
> [   43.114286] >ffff20008cdf8780: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
> [   43.121494]                          ^
> [   43.125230]  ffff20008cdf8800: 01 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
> [   43.132439]  ffff20008cdf8880: fa fa fa fa fa fa fa fa 00 00 fa fa fa fa fa fa
> [   43.139646] ==================================================================
> 
> 
> > 
> >>>> This patch changes cpumask_of_node to return cpu_none_mask if the
> >>>> node is not valid, and sync the cpumask_of_node between the
> >>>> cpumask_of_node function in numa.h and numa.c.
> >>>
> >>> Why?
> >>
> >> When CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node() in
> >> numa.c is used, if not, the cpumask_of_node() in numa.h is used.
> >>
> >> I am not sure why there is difference between them, and it is there
> >> when since the below commit:
> >> 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
> >>
> >> I synced them to keep them consistent whether CONFIG_DEBUG_PER_CPU_MAPS
> >> is defined.
> > 
> > Such a change should be made in a separate patch with a full
> > clarification/justification. From the above it is still not clear why
> > this is needed though.
> 
> Ok.
> 
> How about:
> 
> Currently there are different implementations of cpumask_of_node() depend
> on the arch, for example:
> 
> ia64:
> #define cpumask_of_node(node) ((node) == -1 ?				\
> 			       cpu_all_mask :				\
> 			       &node_to_cpu_mask[node])
> 
> 
> alpha:
> static const struct cpumask *cpumask_of_node(int node)
> {
> 	int cpu;
> 
> 	if (node == NUMA_NO_NODE)
> 		return cpu_all_mask;
> 
> 	cpumask_clear(&node_to_cpumask_map[node]);
> 
> 	for_each_online_cpu(cpu) {
> 		if (cpu_to_node(cpu) == node)
> 			cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
> 	}
> 
> 	return &node_to_cpumask_map[node];
> }
> 
> Even for the same arch, there are two implementations depend on the
> CONFIG_DEBUG_PER_CPU_MAPS configuration.
> 
> arm64/x86 without CONFIG_DEBUG_PER_CPU_MAPS:
> static inline const struct cpumask *cpumask_of_node(int node)
> {
> 	return node_to_cpumask_map[node];
> }
> 
> arm64/x86 with CONFIG_DEBUG_PER_CPU_MAPS:
> const struct cpumask *cpumask_of_node(int node)
> {
> 	if (WARN_ON(node >= nr_node_ids))
> 		return cpu_none_mask;
> 
> 	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> 		return cpu_online_mask;
> 
> 	return node_to_cpumask_map[node];
> }
> 
> It seems the cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS is used
> to catch the erorr case and give a warning to user when node id is not
> valid.

Yeah the config help text
config DEBUG_PER_CPU_MAPS
        bool "Debug access to per_cpu maps"
        depends on DEBUG_KERNEL
        depends on SMP
        help
          Say Y to verify that the per_cpu map being accessed has
          been set up. This adds a fair amount of code to kernel memory
          and decreases performance.

          Say N if unsure.

suggests that this is intentionally hidden behind a config so a normal
path shouldn't really duplicate it. If those checks make sense in
general then the config option should be dropped I think.
Yunsheng Lin Aug. 30, 2019, 9:49 a.m. UTC | #6
On 2019/8/30 16:39, Michal Hocko wrote:
> On Fri 30-08-19 16:08:14, Yunsheng Lin wrote:
>> On 2019/8/30 14:44, Michal Hocko wrote:
>>> On Fri 30-08-19 14:35:26, Yunsheng Lin wrote:
>>>> On 2019/8/30 13:55, Michal Hocko wrote:
>>>>> On Fri 30-08-19 10:26:31, Yunsheng Lin wrote:
>>>>>> Some buggy bios may not set the device' numa id, and dev_to_node
>>>>>> will return -1, which may cause global-out-of-bounds error
>>>>>> detected by KASAN.
>>>>>
>>>>> Why should we workaround a buggy bios like that? Is it so widespread and
>>>>> no BIOS update available? Also, why is this arm64 specific?
>>>>
>>>> For our case, there is BIOS update available. I just thought it might
>>>> be better to protect from this case when BIOS has not implemented the
>>>> device' numa id setting feature or the feature from BIOS has some bug.
>>>>
>>>> It is not arm64 specific, right now I only have arm64 board. If it is
>>>> ok to protect this from the buggy BIOS, maybe all other arch can be
>>>> changed too.
>>>
>>> If we are to really care then this should be consistent among
>>> architectures IMHO. But I am not really sure this is really worth it.
>>> The code is quite old and I do not really remember any reports. 
>>
>> It is only detected by enabling KASAN, the system seems to run fine without
>> any visible error if KASAN is disabled. Maybe there is why no report has
>> been seen?
>>
>> Also according to Section 6.2.14 from ACPI spec 6.3 [1], the setting of proximity
>> domain is optional, as below:
>>
>> This optional object is used to describe proximity domain
>> associations within a machine. _PXM evaluates to an integer
>> that identifies a device as belonging to a Proximity Domain
>> defined in the System Resource Affinity Table (SRAT).
>>
>>
>> Do you think it is ok to resend the fix with above clarification and below
>> log:
> 
> OK, if the specification really allows to provide NUMA_NO_NODE (-1) then
> the code really has to be prepared for that. And ideally all arches
> should deal with that.
> 
>> [   42.970381] ==================================================================
>> [   42.977595] BUG: KASAN: global-out-of-bounds in __bitmap_weight+0x48/0xb0
>> [   42.984370] Read of size 8 at addr ffff20008cdf8790 by task kworker/0:1/13
>> [   42.991230]
>> [   42.992712] CPU: 0 PID: 13 Comm: kworker/0:1 Tainted: G           O      5.2.0-rc4-g8bde06a-dirty #3
>> [   43.001830] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS 2280-A CS V2.B050.01 08/08/2019
>> [   43.011298] Workqueue: events work_for_cpu_fn
>> [   43.015643] Call trace:
>> [   43.018078]  dump_backtrace+0x0/0x1e8
>> [   43.021727]  show_stack+0x14/0x20
>> [   43.025031]  dump_stack+0xc4/0xfc
>> [   43.028335]  print_address_description+0x178/0x270
>> [   43.033113]  __kasan_report+0x164/0x1b8
>> [   43.036936]  kasan_report+0xc/0x18
>> [   43.040325]  __asan_load8+0x84/0xa8
>> [   43.043801]  __bitmap_weight+0x48/0xb0
>> [   43.047552]  hclge_init_ae_dev+0x988/0x1e78 [hclge]
>> [   43.052418]  hnae3_register_ae_dev+0xcc/0x278 [hnae3]
>> [   43.057467]  hns3_probe+0xe0/0x120 [hns3]
>> [   43.061464]  local_pci_probe+0x74/0xf0
>> [   43.065200]  work_for_cpu_fn+0x2c/0x48
>> [   43.068937]  process_one_work+0x3c0/0x878
>> [   43.072934]  worker_thread+0x400/0x670
>> [   43.076670]  kthread+0x1b0/0x1b8
>> [   43.079885]  ret_from_fork+0x10/0x18
>> [   43.083446]
>> [   43.084925] The buggy address belongs to the variable:
>> [   43.090052]  numa_distance+0x30/0x40
>> [   43.093613]
>> [   43.095091] Memory state around the buggy address:
>> [   43.099870]  ffff20008cdf8680: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 fa fa
>> [   43.107078]  ffff20008cdf8700: fa fa fa fa 04 fa fa fa fa fa fa fa 00 fa fa fa
>> [   43.114286] >ffff20008cdf8780: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
>> [   43.121494]                          ^
>> [   43.125230]  ffff20008cdf8800: 01 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
>> [   43.132439]  ffff20008cdf8880: fa fa fa fa fa fa fa fa 00 00 fa fa fa fa fa fa
>> [   43.139646] ==================================================================
>>
>>
>>>
>>>>>> This patch changes cpumask_of_node to return cpu_none_mask if the
>>>>>> node is not valid, and sync the cpumask_of_node between the
>>>>>> cpumask_of_node function in numa.h and numa.c.
>>>>>
>>>>> Why?
>>>>
>>>> When CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node() in
>>>> numa.c is used, if not, the cpumask_of_node() in numa.h is used.
>>>>
>>>> I am not sure why there is difference between them, and it is there
>>>> when since the below commit:
>>>> 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
>>>>
>>>> I synced them to keep them consistent whether CONFIG_DEBUG_PER_CPU_MAPS
>>>> is defined.
>>>
>>> Such a change should be made in a separate patch with a full
>>> clarification/justification. From the above it is still not clear why
>>> this is needed though.
>>
>> Ok.
>>
>> How about:
>>
>> Currently there are different implementations of cpumask_of_node() depend
>> on the arch, for example:
>>
>> ia64:
>> #define cpumask_of_node(node) ((node) == -1 ?				\
>> 			       cpu_all_mask :				\
>> 			       &node_to_cpu_mask[node])
>>
>>
>> alpha:
>> static const struct cpumask *cpumask_of_node(int node)
>> {
>> 	int cpu;
>>
>> 	if (node == NUMA_NO_NODE)
>> 		return cpu_all_mask;
>>
>> 	cpumask_clear(&node_to_cpumask_map[node]);
>>
>> 	for_each_online_cpu(cpu) {
>> 		if (cpu_to_node(cpu) == node)
>> 			cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
>> 	}
>>
>> 	return &node_to_cpumask_map[node];
>> }
>>
>> Even for the same arch, there are two implementations depend on the
>> CONFIG_DEBUG_PER_CPU_MAPS configuration.
>>
>> arm64/x86 without CONFIG_DEBUG_PER_CPU_MAPS:
>> static inline const struct cpumask *cpumask_of_node(int node)
>> {
>> 	return node_to_cpumask_map[node];
>> }
>>
>> arm64/x86 with CONFIG_DEBUG_PER_CPU_MAPS:
>> const struct cpumask *cpumask_of_node(int node)
>> {
>> 	if (WARN_ON(node >= nr_node_ids))
>> 		return cpu_none_mask;
>>
>> 	if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> 		return cpu_online_mask;
>>
>> 	return node_to_cpumask_map[node];
>> }
>>
>> It seems the cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS is used
>> to catch the erorr case and give a warning to user when node id is not
>> valid.
> 
> Yeah the config help text
> config DEBUG_PER_CPU_MAPS
>         bool "Debug access to per_cpu maps"
>         depends on DEBUG_KERNEL
>         depends on SMP
>         help
>           Say Y to verify that the per_cpu map being accessed has
>           been set up. This adds a fair amount of code to kernel memory
>           and decreases performance.
> 
>           Say N if unsure.
> 
> suggests that this is intentionally hidden behind a config so a normal
> path shouldn't really duplicate it. If those checks make sense in
> general then the config option should be dropped I think.

It seems cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS on may be used to
debug some early use of cpumask_of_node problem, see below:

/*
 * Allocate node_to_cpumask_map based on number of available nodes
 * Requires node_possible_map to be valid.
 *
 * Note: cpumask_of_node() is not valid until after this is done.
 * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
 */
static void __init setup_node_to_cpumask_map(void)
{
	int node;

	/* setup nr_node_ids if not done yet */
	if (nr_node_ids == MAX_NUMNODES)
		setup_nr_node_ids();

	/* allocate and clear the mapping */
	for (node = 0; node < nr_node_ids; node++) {
		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
		cpumask_clear(node_to_cpumask_map[node]);
	}

	/* cpumask_of_node() will now work */
	pr_debug("Node to cpumask map for %u nodes\n", nr_node_ids);
}

So I prefer to keep it as two implementations for arm64 and x86, but
ensure the two implementations be consistent. It can be cleaned up later
when there is no use at all.

Is it ok with you?

>
Michal Hocko Aug. 30, 2019, 11:13 a.m. UTC | #7
On Fri 30-08-19 17:49:46, Yunsheng Lin wrote:
> On 2019/8/30 16:39, Michal Hocko wrote:
> > On Fri 30-08-19 16:08:14, Yunsheng Lin wrote:
[...]
> >> It seems the cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS is used
> >> to catch the erorr case and give a warning to user when node id is not
> >> valid.
> > 
> > Yeah the config help text
> > config DEBUG_PER_CPU_MAPS
> >         bool "Debug access to per_cpu maps"
> >         depends on DEBUG_KERNEL
> >         depends on SMP
> >         help
> >           Say Y to verify that the per_cpu map being accessed has
> >           been set up. This adds a fair amount of code to kernel memory
> >           and decreases performance.
> > 
> >           Say N if unsure.
> > 
> > suggests that this is intentionally hidden behind a config so a normal
> > path shouldn't really duplicate it. If those checks make sense in
> > general then the config option should be dropped I think.
> 
> It seems cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS on may be used to
> debug some early use of cpumask_of_node problem, see below:
> 
> /*
>  * Allocate node_to_cpumask_map based on number of available nodes
>  * Requires node_possible_map to be valid.
>  *
>  * Note: cpumask_of_node() is not valid until after this is done.
>  * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>  */
> static void __init setup_node_to_cpumask_map(void)
> {
> 	int node;
> 
> 	/* setup nr_node_ids if not done yet */
> 	if (nr_node_ids == MAX_NUMNODES)
> 		setup_nr_node_ids();
> 
> 	/* allocate and clear the mapping */
> 	for (node = 0; node < nr_node_ids; node++) {
> 		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> 		cpumask_clear(node_to_cpumask_map[node]);
> 	}
> 
> 	/* cpumask_of_node() will now work */
> 	pr_debug("Node to cpumask map for %u nodes\n", nr_node_ids);
> }
> 
> So I prefer to keep it as two implementations for arm64 and x86, but
> ensure the two implementations be consistent. It can be cleaned up later
> when there is no use at all.
> 
> Is it ok with you?

I am not really sure what you are asking here TBH. You want both
CONFIG_DEBUG_PER_CPU_MAPS implementations to use the same (duplicated) code?
If that is the case then no objections from me. I would obviously
preferred a shared code but that might be a larger change indeed and can
be done later.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 626ad01..da891ed 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -25,6 +25,12 @@  const struct cpumask *cpumask_of_node(int node);
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
 static inline const struct cpumask *cpumask_of_node(int node)
 {
+	if (node >= nr_node_ids || node < 0)
+		return cpu_none_mask;
+
+	if (!node_to_cpumask_map[node])
+		return cpu_online_mask;
+
 	return node_to_cpumask_map[node];
 }
 #endif
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 4f241cc..3846313 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -46,7 +46,7 @@  EXPORT_SYMBOL(node_to_cpumask_map);
  */
 const struct cpumask *cpumask_of_node(int node)
 {
-	if (WARN_ON(node >= nr_node_ids))
+	if (WARN_ON(node >= nr_node_ids || node < 0))
 		return cpu_none_mask;
 
 	if (WARN_ON(node_to_cpumask_map[node] == NULL))