Message ID | 20250313060907.2381416-1-wangyuquan1236@phytium.com.cn (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | [v2] ACPI: NUMA: debug invalid unused PXM value for CFMWs | expand |
On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > start = cfmws->base_hpa; > end = cfmws->base_hpa + cfmws->window_size; > > + if (srat_disabled()) { > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > + return -EINVAL; > + } > + I thought the srat was optional regardless of the presence of a CFMWS. Is this not the case? ~Gregory
On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > The absence of SRAT would cause the fake_pxm to be -1 and increment > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory > ranges that are defined in the CFMWS and not already defined in the > SRAT, the new node (node0) for the CXL memory would be invalid, as > node0 is already in "used". > > This utilizes disable_srat() & srat_disabled() to fail CXL init. Seems like this fixup has drifted from adjusting the fake_pxm to shutting down CXL parsing. More below - > > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> > --- > > Changes in v2: > - Add disable_srat() when fake_pxm is invalid > - Add srat_disabled() check in cxl_acpi_probe() and acpi_parse_cfmws() > > > drivers/acpi/numa/srat.c | 10 ++++++++++ > drivers/cxl/acpi.c | 4 ++++ > 2 files changed, 14 insertions(+) > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c > index 00ac0d7bb8c9..2dac25c9258a 100644 > --- a/drivers/acpi/numa/srat.c > +++ b/drivers/acpi/numa/srat.c > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > start = cfmws->base_hpa; > end = cfmws->base_hpa + cfmws->window_size; > > + if (srat_disabled()) { > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > + return -EINVAL; > + } > + This goes too far by shutting down cfmws parsing for lack of SRAT. > /* > * The SRAT may have already described NUMA details for all, > * or a portion of, this CFMWS HPA range. Extend the memblks > @@ -646,6 +651,11 @@ int __init acpi_numa_init(void) > if (node_to_pxm_map[i] > fake_pxm) > fake_pxm = node_to_pxm_map[i]; > } > + > + /* Make sure CFMWs fake nodes start at node[1] */ > + if (fake_pxm < 0) > + disable_srat(); > + How does the code above make sure fake node starts at node[1]? Would an explicit adjustment like this work? - last_real_pxm = fake_pxm; - fake_pxm++; + fake_pxm = max(fake_pxm, 1); + last_real_pxm = fake_pxm--; > last_real_pxm = fake_pxm; > fake_pxm++; > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index cb14829bb9be..e75a8ead99f6 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -829,6 +829,10 @@ static int cxl_acpi_probe(struct platform_device *pdev) > if (rc) > return rc; > > + /* CXL must be in a NUMA system */ > + if (srat_disabled()) > + return -EINVAL; > + > cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL); > if (!cxl_res) > return -ENOMEM; > -- > 2.34.1 >
On Thu, Mar 13, 2025 at 09:28:46AM -0700, Alison Schofield wrote: > On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > > The absence of SRAT would cause the fake_pxm to be -1 and increment > > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory > > ranges that are defined in the CFMWS and not already defined in the > > SRAT, the new node (node0) for the CXL memory would be invalid, as > > node0 is already in "used". > > > > This utilizes disable_srat() & srat_disabled() to fail CXL init. > > Seems like this fixup has drifted from adjusting the fake_pxm to > shutting down CXL parsing. More below - > > > > > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> > > --- > > > > Changes in v2: > > - Add disable_srat() when fake_pxm is invalid > > - Add srat_disabled() check in cxl_acpi_probe() and acpi_parse_cfmws() > > > > > > drivers/acpi/numa/srat.c | 10 ++++++++++ > > drivers/cxl/acpi.c | 4 ++++ > > 2 files changed, 14 insertions(+) > > > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c > > index 00ac0d7bb8c9..2dac25c9258a 100644 > > --- a/drivers/acpi/numa/srat.c > > +++ b/drivers/acpi/numa/srat.c > > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > > start = cfmws->base_hpa; > > end = cfmws->base_hpa + cfmws->window_size; > > > > + if (srat_disabled()) { > > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > > + return -EINVAL; > > + } > > + > > This goes too far by shutting down cfmws parsing for lack of SRAT. > Actually, I thought there need another patch to fix the follow problem that the fake node bring when no SRAT. Detailed description below. > > /* > > * The SRAT may have already described NUMA details for all, > > * or a portion of, this CFMWS HPA range. Extend the memblks > > @@ -646,6 +651,11 @@ int __init acpi_numa_init(void) > > if (node_to_pxm_map[i] > fake_pxm) > > fake_pxm = node_to_pxm_map[i]; > > } > > + > > + /* Make sure CFMWs fake nodes start at node[1] */ > > + if (fake_pxm < 0) > > + disable_srat(); > > + > > How does the code above make sure fake node starts at node[1]? > Would an explicit adjustment like this work? Thanks for your correction :) Yes, the way I used here is too implicit. > > - last_real_pxm = fake_pxm; > - fake_pxm++; > + fake_pxm = max(fake_pxm, 1); > + last_real_pxm = fake_pxm--; I tried the adjustment below: fake_pxm = max(fake_pxm, 0); // 0 because it will increment to 1 last_real_pxm = fake_pxm++; This works but it might only control the parameter sent to acpi_parse_cfmws(). According to acpi_map_pxm_to_node(), altough the input fake_pxm is 1 when no SRAT, the returned node would still be 0 and the following nodes are aslo incorrect. Hence, I tried add a new line below: fake_pxm = max(fake_pxm, 0); last_real_pxm = fake_pxm++; node_set(0, nodes_found_map); As no matter what situation, node[0] would be found and set. With this setting, acpi_map_pxm_to_node() could return the expected node value even if no SRAT. :( Unfortunately, when we use "cxl create-region" to enable our cxl memory, it would still be assigned to node[0], because the "numa_add_memblk()" can only add numa_memblk to numa_meminfo list. If our SRAT is OK, the numa_memblks_init() would then utilize numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to numa_reserved_meminfo in CFMWs fake node situation. If SRAT is missing or bad, the numa_memblks_init() would fail since init_func() would fail. And it causes that no numa_memblk in numa_reserved_meminfo list and the following dax&memory_hotplug drivers could not online the expected fake node. Based on the above problem, I have a new patch idea that introduce a new function in mm/numa_memblks.c: numa_add_reserved_memblk(). It could add one numa_memblk to nuam_reserved_meminfo directly. Maybe we could call it in acpi_parse_cfmws() if srat is missing. In mm/numa_memblks.c: int __init numa_add_reserved_memblk(int nid, u64 start, u64 end) { return numa_add_memblk_to(nid, start, end, &numa_reserved_meminfo); } In drivers/acpi/numa/srat.c: if (srat_disabled()) { if (numa_add_reserved_memblk(node, start, end) < 0) { pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", node, start, end); } } :( But..., the dax_kmem driver will fail because something wrong in memory_group_register_static(). The good result is our cxl memory would not be assigned to node[0] anymore! BTW, as papering these things looks like not easily, I chose to aggressively fail the acpi_parse_cfmws() in srat.c since it mainly works for building cxl fake nodes and also fail the CXL init in cxl_acpi_probe per Jonathan. Link: https://lists.nongnu.org/archive/html/qemu-devel/2025-03/msg03668.html Hopes more comments to guide me! I'm a really rookie in kernel community :P > > last_real_pxm = fake_pxm; > > fake_pxm++; > > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > > index cb14829bb9be..e75a8ead99f6 100644 > > --- a/drivers/cxl/acpi.c > > +++ b/drivers/cxl/acpi.c > > @@ -829,6 +829,10 @@ static int cxl_acpi_probe(struct platform_device *pdev) > > if (rc) > > return rc; > > > > + /* CXL must be in a NUMA system */ > > + if (srat_disabled()) > > + return -EINVAL; > > + > > cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL); > > if (!cxl_res) > > return -ENOMEM; > > -- > > 2.34.1 > >
On Thu, 13 Mar 2025 11:02:37 -0400 Gregory Price <gourry@gourry.net> wrote: > On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > > start = cfmws->base_hpa; > > end = cfmws->base_hpa + cfmws->window_size; > > > > + if (srat_disabled()) { > > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > > + return -EINVAL; > > + } > > + > > I thought the srat was optional regardless of the presence of a CFMWS. > Is this not the case? True in theory, but do we want to support it? I'd vote no unless someone is shipping such a system and can't fix it up. Jonathan > > ~Gregory >
On Fri, Mar 14, 2025 at 10:12:26AM +0000, Jonathan Cameron wrote: > On Thu, 13 Mar 2025 11:02:37 -0400 > Gregory Price <gourry@gourry.net> wrote: > > > On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > > > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > > > start = cfmws->base_hpa; > > > end = cfmws->base_hpa + cfmws->window_size; > > > > > > + if (srat_disabled()) { > > > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > > > + return -EINVAL; > > > + } > > > + > > > > I thought the srat was optional regardless of the presence of a CFMWS. > > Is this not the case? > > True in theory, but do we want to support it? > > I'd vote no unless someone is shipping such a system and can't fix it up. > > Jonathan > Well, this is really the patch trying to deal with that I suppose. The code here already states its creating 1 node per CFMWS in the absense of srat - but this patch just changes that and says "no nodes 4 u". I don't think that's what we want either. ~Gregory
On Fri, 14 Mar 2025 09:38:24 -0400 Gregory Price <gourry@gourry.net> wrote: > On Fri, Mar 14, 2025 at 10:12:26AM +0000, Jonathan Cameron wrote: > > On Thu, 13 Mar 2025 11:02:37 -0400 > > Gregory Price <gourry@gourry.net> wrote: > > > > > On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > > > > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > > > > start = cfmws->base_hpa; > > > > end = cfmws->base_hpa + cfmws->window_size; > > > > > > > > + if (srat_disabled()) { > > > > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > > > > + return -EINVAL; > > > > + } > > > > + > > > > > > I thought the srat was optional regardless of the presence of a CFMWS. > > > Is this not the case? > > > > True in theory, but do we want to support it? > > > > I'd vote no unless someone is shipping such a system and can't fix it up. > > > > Jonathan > > > > Well, this is really the patch trying to deal with that I suppose. The > code here already states its creating 1 node per CFMWS in the absense of > srat - but this patch just changes that and says "no nodes 4 u". I > don't think that's what we want either. Under this specific set of circumstances, "no nodes 4 u" is to me a perfectly valid answer. Jonathan > > ~Gregory
On Fri, Mar 14, 2025 at 03:55:44PM +0800, Yuquan Wang wrote: > On Thu, Mar 13, 2025 at 09:28:46AM -0700, Alison Schofield wrote: > > On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote: > > > The absence of SRAT would cause the fake_pxm to be -1 and increment > > > to 0, then send to acpi_parse_cfmws(). If there exists CXL memory > > > ranges that are defined in the CFMWS and not already defined in the > > > SRAT, the new node (node0) for the CXL memory would be invalid, as > > > node0 is already in "used". > > > > > > This utilizes disable_srat() & srat_disabled() to fail CXL init. > > > > Seems like this fixup has drifted from adjusting the fake_pxm to > > shutting down CXL parsing. More below - > > > > > > > > Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> > > > --- > > > > > > Changes in v2: > > > - Add disable_srat() when fake_pxm is invalid > > > - Add srat_disabled() check in cxl_acpi_probe() and acpi_parse_cfmws() > > > > > > > > > drivers/acpi/numa/srat.c | 10 ++++++++++ > > > drivers/cxl/acpi.c | 4 ++++ > > > 2 files changed, 14 insertions(+) > > > > > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c > > > index 00ac0d7bb8c9..2dac25c9258a 100644 > > > --- a/drivers/acpi/numa/srat.c > > > +++ b/drivers/acpi/numa/srat.c > > > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, > > > start = cfmws->base_hpa; > > > end = cfmws->base_hpa + cfmws->window_size; > > > > > > + if (srat_disabled()) { > > > + pr_err("SRAT is missing or bad while processing CFMWS.\n"); > > > + return -EINVAL; > > > + } > > > + > > > > This goes too far by shutting down cfmws parsing for lack of SRAT. > > > > Actually, I thought there need another patch to fix the follow problem > that the fake node bring when no SRAT. > > Detailed description below. > > > > /* > > > * The SRAT may have already described NUMA details for all, > > > * or a portion of, this CFMWS HPA range. Extend the memblks > > > @@ -646,6 +651,11 @@ int __init acpi_numa_init(void) > > > if (node_to_pxm_map[i] > fake_pxm) > > > fake_pxm = node_to_pxm_map[i]; > > > } > > > + > > > + /* Make sure CFMWs fake nodes start at node[1] */ > > > + if (fake_pxm < 0) > > > + disable_srat(); > > > + > > > > How does the code above make sure fake node starts at node[1]? > > Would an explicit adjustment like this work? > > Thanks for your correction :) Yes, the way I used here is too implicit. > > > > > - last_real_pxm = fake_pxm; > > - fake_pxm++; > > + fake_pxm = max(fake_pxm, 1); > > + last_real_pxm = fake_pxm--; > > I tried the adjustment below: > > fake_pxm = max(fake_pxm, 0); // 0 because it will increment to 1 > last_real_pxm = fake_pxm++; > > This works but it might only control the parameter sent to acpi_parse_cfmws(). > According to acpi_map_pxm_to_node(), altough the input fake_pxm is 1 when no > SRAT, the returned node would still be 0 and the following nodes are aslo > incorrect. > > Hence, I tried add a new line below: > > fake_pxm = max(fake_pxm, 0); > last_real_pxm = fake_pxm++; > node_set(0, nodes_found_map); > > As no matter what situation, node[0] would be found and set. With this > setting, acpi_map_pxm_to_node() could return the expected node value > even if no SRAT. :( > > Unfortunately, when we use "cxl create-region" to enable our cxl memory, > it would still be assigned to node[0], because the "numa_add_memblk()" > can only add numa_memblk to numa_meminfo list. > > If our SRAT is OK, the numa_memblks_init() would then utilize > numa_move_tail_memblk() to move the numa_memblk from numa_meminfo to > numa_reserved_meminfo in CFMWs fake node situation. If SRAT is missing > or bad, the numa_memblks_init() would fail since init_func() would fail. > And it causes that no numa_memblk in numa_reserved_meminfo list and the > following dax&memory_hotplug drivers could not online the expected fake > node. > > Based on the above problem, I have a new patch idea that introduce a new > function in mm/numa_memblks.c: numa_add_reserved_memblk(). It could add > one numa_memblk to nuam_reserved_meminfo directly. Maybe we could call > it in acpi_parse_cfmws() if srat is missing. > > In mm/numa_memblks.c: > > int __init numa_add_reserved_memblk(int nid, u64 start, u64 end) > { > return numa_add_memblk_to(nid, start, end, &numa_reserved_meminfo); > } > > In drivers/acpi/numa/srat.c: > > if (srat_disabled()) { > if (numa_add_reserved_memblk(node, start, end) < 0) { > pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", > node, start, end); > } > } > > :( But..., the dax_kmem driver will fail because something wrong in > memory_group_register_static(). The good result is our cxl memory would > not be assigned to node[0] anymore! > > BTW, as papering these things looks like not easily, I chose to aggressively > fail the acpi_parse_cfmws() in srat.c since it mainly works for building > cxl fake nodes and also fail the CXL init in cxl_acpi_probe per Jonathan. > > Link: https://lists.nongnu.org/archive/html/qemu-devel/2025-03/msg03668.html > > Hopes more comments to guide me! I'm a really rookie in kernel community :P > No worries Yuquan. This is how we do it. Keep iterating! I see you, Jonathan, and Gregory chatting about this a bit on the LSFMM thread, and they are probably following along better than I. I'm getting a bit lost in the diffs here in this thread. At this point, I suggest sending the next revision of this patch with an updated commit log describing the failing scenario, how the changes solve 'your' problem, and why it doesn't break anything. > > > last_real_pxm = fake_pxm; > > > fake_pxm++; > > > acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, > > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > > > index cb14829bb9be..e75a8ead99f6 100644 > > > --- a/drivers/cxl/acpi.c > > > +++ b/drivers/cxl/acpi.c > > > @@ -829,6 +829,10 @@ static int cxl_acpi_probe(struct platform_device *pdev) > > > if (rc) > > > return rc; > > > > > > + /* CXL must be in a NUMA system */ > > > + if (srat_disabled()) > > > + return -EINVAL; > > > + > > > cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL); > > > if (!cxl_res) > > > return -ENOMEM; > > > -- > > > 2.34.1 > > > >
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c index 00ac0d7bb8c9..2dac25c9258a 100644 --- a/drivers/acpi/numa/srat.c +++ b/drivers/acpi/numa/srat.c @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header, start = cfmws->base_hpa; end = cfmws->base_hpa + cfmws->window_size; + if (srat_disabled()) { + pr_err("SRAT is missing or bad while processing CFMWS.\n"); + return -EINVAL; + } + /* * The SRAT may have already described NUMA details for all, * or a portion of, this CFMWS HPA range. Extend the memblks @@ -646,6 +651,11 @@ int __init acpi_numa_init(void) if (node_to_pxm_map[i] > fake_pxm) fake_pxm = node_to_pxm_map[i]; } + + /* Make sure CFMWs fake nodes start at node[1] */ + if (fake_pxm < 0) + disable_srat(); + last_real_pxm = fake_pxm; fake_pxm++; acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws, diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index cb14829bb9be..e75a8ead99f6 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -829,6 +829,10 @@ static int cxl_acpi_probe(struct platform_device *pdev) if (rc) return rc; + /* CXL must be in a NUMA system */ + if (srat_disabled()) + return -EINVAL; + cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL); if (!cxl_res) return -ENOMEM;
The absence of SRAT would cause the fake_pxm to be -1 and increment to 0, then send to acpi_parse_cfmws(). If there exists CXL memory ranges that are defined in the CFMWS and not already defined in the SRAT, the new node (node0) for the CXL memory would be invalid, as node0 is already in "used". This utilizes disable_srat() & srat_disabled() to fail CXL init. Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> --- Changes in v2: - Add disable_srat() when fake_pxm is invalid - Add srat_disabled() check in cxl_acpi_probe() and acpi_parse_cfmws() drivers/acpi/numa/srat.c | 10 ++++++++++ drivers/cxl/acpi.c | 4 ++++ 2 files changed, 14 insertions(+)