Message ID | 20250321023602.2609614-3-wangyuquan1236@phytium.com.cn (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
Series | ACPI: NUMA: debug invalid unused PXM value for CFMWs | expand |
On Fri, 21 Mar 2025 10:36:02 +0800 Yuquan Wang <wangyuquan1236@phytium.com.cn> wrote: > The absence of SRAT would cause the fake_pxm to be -1 and increment > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory > ranges that are defined in the CFMWS and not already defined in the > SRAT, the new node (node0) for the CXL memory would be invalid, as > node0 is already in "used", and all CXL memory might be online on > node0. > > This utilizes node_set(0, nodes_found_map) to set pxm&node map. With > this setting, acpi_map_pxm_to_node() could return the expected node > value even if no SRAT. > > If SRAT is valid, the numa_memblks_init() would then utilize > numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to > numa_reserved_meminfo in CFMWs fake node situation. I would call out that numa_move_tail_memblk() is called in numa_cleanup_meminfo() which is indeed called by num_memblks_init() > > If SRAT is missing or bad, the numa_memblks_init() would fail since > init_func() would fail. And it causes that no numa_memblk in > numa_reserved_meminfo list and the following dax_cxl driver could > find the expected fake node. > > Use numa_add_reserved_memblk() to replace numa_add_memblk(), since > the cxl numa_memblk added by numa_add_memblk() would finally be moved > to numa_reserved_meminfo, and numa_add_reserved_memblk() here could > add cxl numa_memblk into reserved list directly. Hence, no matter > SRAT is good or not, cxl numa_memblk could be allocated to reserved > list. > > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> This definitely wants input from Mike Rapoport. Looks fine to me, but there may be some subtle corners I'm missing. > --- > drivers/acpi/numa/srat.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c > index 00ac0d7bb8c9..50bfecfb9c16 100644 > --- a/drivers/acpi/numa/srat.c > +++ b/drivers/acpi/numa/srat.c > @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > return -EINVAL; > } > > - if (numa_add_memblk(node, start, end) < 0) { > + if (numa_add_reserved_memblk(node, start, end) < 0) { > /* CXL driver must handle the NUMA_NO_NODE case */ > pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", > node, start, end); > } > + Unrelated change. Always give patches a final look through to spot things like this. Trivial, but they all add noise to what we are focusing on. > node_set(node, numa_nodes_parsed); > > /* Set the next available fake_pxm value */ > @@ -646,8 +647,12 @@ int __init acpi_numa_init(void) > if (node_to_pxm_map[i] > fake_pxm) > fake_pxm = node_to_pxm_map[i]; > } > - last_real_pxm = fake_pxm; > - fake_pxm++; > + > + /* Make sure CFMWs fake node >= 1 */ > + fake_pxm = max(fake_pxm, 0); > + last_real_pxm = fake_pxm++; > + node_set(0, nodes_found_map); > + > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, > &fake_pxm); >
Hi, On Fri, Mar 21, 2025 at 11:51:16AM +0000, Jonathan Cameron wrote: > On Fri, 21 Mar 2025 10:36:02 +0800 > Yuquan Wang <wangyuquan1236@phytium.com.cn> wrote: > > > The absence of SRAT would cause the fake_pxm to be -1 and increment > > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory > > ranges that are defined in the CFMWS and not already defined in the > > SRAT, the new node (node0) for the CXL memory would be invalid, as > > node0 is already in "used", and all CXL memory might be online on > > node0. > > > > This utilizes node_set(0, nodes_found_map) to set pxm&node map. With > > this setting, acpi_map_pxm_to_node() could return the expected node > > value even if no SRAT. > > > > If SRAT is valid, the numa_memblks_init() would then utilize > > numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to > > numa_reserved_meminfo in CFMWs fake node situation. > > I would call out that numa_move_tail_memblk() is called in > numa_cleanup_meminfo() which is indeed called by num_memblks_init() > > > > > If SRAT is missing or bad, the numa_memblks_init() would fail since > > init_func() would fail. And it causes that no numa_memblk in > > numa_reserved_meminfo list and the following dax_cxl driver could > > find the expected fake node. > > > > Use numa_add_reserved_memblk() to replace numa_add_memblk(), since > > the cxl numa_memblk added by numa_add_memblk() would finally be moved > > to numa_reserved_meminfo, and numa_add_reserved_memblk() here could > > add cxl numa_memblk into reserved list directly. Hence, no matter > > SRAT is good or not, cxl numa_memblk could be allocated to reserved > > list. > > > > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> > > This definitely wants input from Mike Rapoport. > Looks fine to me, but there may be some subtle corners I'm missing. I'm fine with exposing numa_add_reserved_memblk(), but I don't understand CXL discovery enough to say if adding CXL ranges directly to numa_reserved_meminfo. If this is always the case that CFMW regions end up on numa_reserved_meminfo, adding them there in the first place does make sense. > > --- > > drivers/acpi/numa/srat.c | 11 ++++++++--- > > 1 file changed, 8 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c > > index 00ac0d7bb8c9..50bfecfb9c16 100644 > > --- a/drivers/acpi/numa/srat.c > > +++ b/drivers/acpi/numa/srat.c > > @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > > return -EINVAL; > > } > > > > - if (numa_add_memblk(node, start, end) < 0) { > > + if (numa_add_reserved_memblk(node, start, end) < 0) { > > /* CXL driver must handle the NUMA_NO_NODE case */ > > pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", > > node, start, end); > > } > > + > > Unrelated change. Always give patches a final look through to spot > things like this. Trivial, but they all add noise to what we are focusing on. > > > node_set(node, numa_nodes_parsed); > > > > /* Set the next available fake_pxm value */ > > @@ -646,8 +647,12 @@ int __init acpi_numa_init(void) > > if (node_to_pxm_map[i] > fake_pxm) > > fake_pxm = node_to_pxm_map[i]; > > } > > - last_real_pxm = fake_pxm; > > - fake_pxm++; > > + > > + /* Make sure CFMWs fake node >= 1 */ > > + fake_pxm = max(fake_pxm, 0); > > + last_real_pxm = fake_pxm++; I'd make it more explicit: /* * Make sure CFMWs fake nodes follow last_real_pxm, even when SRAT * is invalid */ last_real_pxm = max(fake_pxm, 0); fake_pxm = last_real_pxm + 1; > > + node_set(0, nodes_found_map); > > + > > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, > > &fake_pxm); > > >
Yuquan Wang wrote: > The absence of SRAT would cause the fake_pxm to be -1 and increment > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory > ranges that are defined in the CFMWS and not already defined in the > SRAT, the new node (node0) for the CXL memory would be invalid, as > node0 is already in "used", and all CXL memory might be online on > node0. It is still not clear to me why this is a problem. If there is no SRAT and CXL is the first memory proximity domain in the system then it should be 0. In other words, if it is a problem that the kernel is picking node0 for CXL memory when there is no SRAT, the problem is that there is no SRAT. > This utilizes node_set(0, nodes_found_map) to set pxm&node map. With > this setting, acpi_map_pxm_to_node() could return the expected node > value even if no SRAT. > > If SRAT is valid, the numa_memblks_init() would then utilize > numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to > numa_reserved_meminfo in CFMWs fake node situation. > > If SRAT is missing or bad, the numa_memblks_init() would fail since > init_func() would fail. And it causes that no numa_memblk in > numa_reserved_meminfo list and the following dax_cxl driver could > find the expected fake node. > > Use numa_add_reserved_memblk() to replace numa_add_memblk(), since > the cxl numa_memblk added by numa_add_memblk() would finally be moved > to numa_reserved_meminfo, and numa_add_reserved_memblk() here could > add cxl numa_memblk into reserved list directly. Hence, no matter > SRAT is good or not, cxl numa_memblk could be allocated to reserved > list. Do you not have other problems due to numa_register_meminfo() not being called? I would really like to say that the platform is buggy without an SRAT and you should not expect anything useful from a NUMA perspective on such a platform. Everything showing up in node0 in that case sounds right. > > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> > --- > drivers/acpi/numa/srat.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c > index 00ac0d7bb8c9..50bfecfb9c16 100644 > --- a/drivers/acpi/numa/srat.c > +++ b/drivers/acpi/numa/srat.c > @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > return -EINVAL; > } > > - if (numa_add_memblk(node, start, end) < 0) { > + if (numa_add_reserved_memblk(node, start, end) < 0) { This change can move to patch1 with the new justification I suggested. ...then we can have the pxm fixup discussion separately. > /* CXL driver must handle the NUMA_NO_NODE case */ > pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", > node, start, end); > } > + > node_set(node, numa_nodes_parsed); > > /* Set the next available fake_pxm value */ > @@ -646,8 +647,12 @@ int __init acpi_numa_init(void) > if (node_to_pxm_map[i] > fake_pxm) > fake_pxm = node_to_pxm_map[i]; > } > - last_real_pxm = fake_pxm; > - fake_pxm++; > + > + /* Make sure CFMWs fake node >= 1 */ > + fake_pxm = max(fake_pxm, 0); > + last_real_pxm = fake_pxm++; > + node_set(0, nodes_found_map); > + > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, > &fake_pxm); > > -- > 2.34.1 >
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c index 00ac0d7bb8c9..50bfecfb9c16 100644 --- a/drivers/acpi/numa/srat.c +++ b/drivers/acpi/numa/srat.c @@ -458,11 +458,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, return -EINVAL; } - if (numa_add_memblk(node, start, end) < 0) { + if (numa_add_reserved_memblk(node, start, end) < 0) { /* CXL driver must handle the NUMA_NO_NODE case */ pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", node, start, end); } + node_set(node, numa_nodes_parsed); /* Set the next available fake_pxm value */ @@ -646,8 +647,12 @@ int __init acpi_numa_init(void) if (node_to_pxm_map[i] > fake_pxm) fake_pxm = node_to_pxm_map[i]; } - last_real_pxm = fake_pxm; - fake_pxm++; + + /* Make sure CFMWs fake node >= 1 */ + fake_pxm = max(fake_pxm, 0); + last_real_pxm = fake_pxm++; + node_set(0, nodes_found_map); + acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, &fake_pxm);
The absence of SRAT would cause the fake_pxm to be -1 and increment to 0, then send to acpi_parse_cfmws(). If there exists CXL memory ranges that are defined in the CFMWS and not already defined in the SRAT, the new node (node0) for the CXL memory would be invalid, as node0 is already in "used", and all CXL memory might be online on node0. This utilizes node_set(0, nodes_found_map) to set pxm&node map. With this setting, acpi_map_pxm_to_node() could return the expected node value even if no SRAT. If SRAT is valid, the numa_memblks_init() would then utilize numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to numa_reserved_meminfo in CFMWs fake node situation. If SRAT is missing or bad, the numa_memblks_init() would fail since init_func() would fail. And it causes that no numa_memblk in numa_reserved_meminfo list and the following dax_cxl driver could find the expected fake node. Use numa_add_reserved_memblk() to replace numa_add_memblk(), since the cxl numa_memblk added by numa_add_memblk() would finally be moved to numa_reserved_meminfo, and numa_add_reserved_memblk() here could add cxl numa_memblk into reserved list directly. Hence, no matter SRAT is good or not, cxl numa_memblk could be allocated to reserved list. Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> --- drivers/acpi/numa/srat.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)