Message ID | 20241202111941.2636613-1-raghavendra.kt@amd.com |
---|---|
State | New |
Headers | show |
Series | [RFC] resource: Fix CXL node not populated issue | expand |
On Mon, Dec 02, 2024 at 11:19:41AM +0000, Raghavendra K T wrote: > Before: > ~]$ numastat -m > ... > Node 0 Node 1 Total > --------------- --------------- --------------- > MemTotal 128096.18 128838.48 256934.65 > > After: > $ numastat -m > ..... > Node 0 Node 1 Node 2 Total > --------------- --------------- --------------- --------------- > MemTotal 128054.16 128880.51 129024.00 385958.67 > > Current patch reverts the effect of first commit where the issue is seen. > > git bisect had led to below commit Missed blank line here. > Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") > Cc: Huang Ying <ying.huang@intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: David Hildenbrand <david@redhat.com> > Cc: Davidlohr Bueso <dave@stgolabs.net> > Cc: Jonathan Cameron <jonathan.cameron@huawei.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Alison Schofield <alison.schofield@intel.com> > Cc: Vishal Verma <vishal.l.verma@intel.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: Baoquan He <bhe@redhat.com> > Cc: <ilpo.jarvinen@linux.intel.com> > Cc: Mika Westerberg <mika.westerberg@linux.intel.com> > Cc: Fontenot Nathan <Nathan.Fontenot@amd.com> > Cc: Wei Huang <wei.huang2@amd.com> Isn't it too many to be included in the commit message? Note you may use the same list with --cc in the command line with almost the same effect (almost -- no noise in the commit message). > Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com> > --- ... > + bool is_type = (((p->flags & flags) == flags) && > + ((desc == IORES_DESC_NONE) || > + (desc == p->desc))); > + > + if (resource_overlaps(p, &res)) > + is_type ? type++ : other++; Instead (if you will end up with this approach) please still use is_type_match().
++ Huang new address Thank you for looking into patch Andy. On 12/2/2024 9:04 PM, Andy Shevchenko wrote: > On Mon, Dec 02, 2024 at 11:19:41AM +0000, Raghavendra K T wrote: >> Before: >> ~]$ numastat -m >> ... >> Node 0 Node 1 Total >> --------------- --------------- --------------- >> MemTotal 128096.18 128838.48 256934.65 >> >> After: >> $ numastat -m >> ..... >> Node 0 Node 1 Node 2 Total >> --------------- --------------- --------------- --------------- >> MemTotal 128054.16 128880.51 129024.00 385958.67 >> >> Current patch reverts the effect of first commit where the issue is seen. >> >> git bisect had led to below commit > > Missed blank line here. > Will add. >> Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") > >> Cc: Huang Ying <ying.huang@intel.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: David Hildenbrand <david@redhat.com> >> Cc: Davidlohr Bueso <dave@stgolabs.net> >> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> >> Cc: Dave Jiang <dave.jiang@intel.com> >> Cc: Alison Schofield <alison.schofield@intel.com> >> Cc: Vishal Verma <vishal.l.verma@intel.com> >> Cc: Ira Weiny <ira.weiny@intel.com> >> Cc: Alistair Popple <apopple@nvidia.com> >> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: Baoquan He <bhe@redhat.com> >> Cc: <ilpo.jarvinen@linux.intel.com> >> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> >> Cc: Fontenot Nathan <Nathan.Fontenot@amd.com> >> Cc: Wei Huang <wei.huang2@amd.com> > > Isn't it too many to be included in the commit message? Note you may use the > same list with --cc in the command line with almost the same effect (almost -- > no noise in the commit message). Agree. Do not want to add in the final commit message. I took the names from the bisected message. will Cc directly next time. > >> Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com> >> --- > > ... > >> + bool is_type = (((p->flags & flags) == flags) && >> + ((desc == IORES_DESC_NONE) || >> + (desc == p->desc))); >> + >> + if (resource_overlaps(p, &res)) >> + is_type ? type++ : other++; > > Instead (if you will end up with this approach) please still use is_type_match(). > Sure will do in next iteration unless someone comes with expected fix. I bumped into this because I was not able to evaluate my patches on 6.13 with CXL card. Thought of reporting what worked for me... Link: : https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/ I may need some time to go back and understand some history and code. Thanks and Regards - Raghu
Raghavendra K T <raghavendra.kt@amd.com> writes: > Before: > ~]$ numastat -m > ... > Node 0 Node 1 Total > --------------- --------------- --------------- > MemTotal 128096.18 128838.48 256934.65 > > After: > $ numastat -m > ..... > Node 0 Node 1 Node 2 Total > --------------- --------------- --------------- --------------- > MemTotal 128054.16 128880.51 129024.00 385958.67 > > Current patch reverts the effect of first commit where the issue is seen. This doesn't root cause the issue. Please trace the code path (e.g. return value of region_intersects(), in which code path) at least. Without these information, we cannot come up with a proper fix to your issue. > git bisect had led to below commit > Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") This breaks you case, sorry about that. But this also fixed a real bug too. So, it's not appropriate just to revert it blindly. [snip] --- Best Regards, Huang, Ying
On Tue, Dec 03, 2024 at 02:26:52PM +0800, Huang, Ying wrote: > Raghavendra K T <raghavendra.kt@amd.com> writes: ... > > git bisect had led to below commit > > Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") > > This breaks you case, sorry about that. But this also fixed a real bug > too. So, it's not appropriate just to revert it blindly. Linus was clear about this recently. Even if it fixes a bug, regression is still regression and might (*) lead to a revert. https://lwn.net/Articles/990599/ (*) in general fixes are better than reverts, but depends on the timing in the release cycle the revert may be the only option.
On Tue, Dec 03, 2024 at 11:56:35AM -0800, Gregory Price wrote: > On Tue, Dec 3, 2024, 6:09 AM Andy Shevchenko < > andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Dec 03, 2024 at 02:26:52PM +0800, Huang, Ying wrote: > > > Raghavendra K T <raghavendra.kt@amd.com> writes: ... > > > > git bisect had led to below commit > > > > Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs > > add_memory_driver_managed()") > > > > > > This breaks you case, sorry about that. But this also fixed a real bug > > > too. So, it's not appropriate just to revert it blindly. > > > > Linus was clear about this recently. Even if it fixes a bug, regression is > > still regression > > In my experience, region intersection issues often stem from bad CEDT > entries. Yeah, sometimes fixes may reveal the hidden issues. > Have you dumped the ACPI tables on your system and validated the content? I believe this is the Q to the original reporter. > It would be a shame to revert a patch because the ACPI tables were wrong. I agree with you and as I said in my reply "fixes are better than reverts". I.o.w. I'm on the same page with Linus on this.
Hi, Andy, Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes: > On Tue, Dec 03, 2024 at 02:26:52PM +0800, Huang, Ying wrote: >> Raghavendra K T <raghavendra.kt@amd.com> writes: > > ... > >> > git bisect had led to below commit >> > Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") >> >> This breaks you case, sorry about that. But this also fixed a real bug >> too. So, it's not appropriate just to revert it blindly. > > Linus was clear about this recently. Even if it fixes a bug, regression is > still regression and might (*) lead to a revert. > https://lwn.net/Articles/990599/ > > (*) in general fixes are better than reverts, but depends on the timing in > the release cycle the revert may be the only option. I don't think that the timing is so tight that we should not work on proper fix firstly. I'm trying to work with the reporter on this. BTW, the commit b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") fixed a security related bug. The bug weakened the protection to prevent users read/write system memory via /dev/mem. So, IMO, we need to be more careful about this. --- Best Regards, Huang, Ying
On Wed, Dec 04, 2024 at 10:07:16AM +0800, Huang, Ying wrote: > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes: > > On Tue, Dec 03, 2024 at 02:26:52PM +0800, Huang, Ying wrote: > >> Raghavendra K T <raghavendra.kt@amd.com> writes: ... > >> > git bisect had led to below commit > >> > Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") > >> > >> This breaks you case, sorry about that. But this also fixed a real bug > >> too. So, it's not appropriate just to revert it blindly. > > > > Linus was clear about this recently. Even if it fixes a bug, regression is > > still regression and might (*) lead to a revert. > > https://lwn.net/Articles/990599/ > > > > (*) in general fixes are better than reverts, but depends on the timing in > > the release cycle the revert may be the only option. > > I don't think that the timing is so tight that we should not work on > proper fix firstly. I'm trying to work with the reporter on this. I agree on this, please do. > BTW, the commit b4afe4183ec7 ("resource: fix region_intersects() vs > add_memory_driver_managed()") fixed a security related bug. The bug > weakened the protection to prevent users read/write system memory via > /dev/mem. So, IMO, we need to be more careful about this. My point was that the regression is obvious and it needs to be fixed. That's all. Revert is a last resort in this sense.
[ add regressions@lists.linux.dev ] Next time make the subject of the patch: Revert "resource: fix region_intersects() vs add_memory_driver_managed()" ...to make it clear that this is a revert, not a fix. The revert should be applied if a fix does not materialize in the next few weeks. Raghavendra K T wrote: > Before: > ~]$ numastat -m > ... > Node 0 Node 1 Total > --------------- --------------- --------------- > MemTotal 128096.18 128838.48 256934.65 > > After: > $ numastat -m > ..... > Node 0 Node 1 Node 2 Total > --------------- --------------- --------------- --------------- > MemTotal 128054.16 128880.51 129024.00 385958.67 > > Current patch reverts the effect of first commit where the issue is seen. Might you be able to dig a bit further into the details like memory map for this platform and ACPI SRAT tables? A dmesg comparison of the good and bad cases would be useful (those can be shared via a github gist). Even better would be some debug instrumentation to identify which call to __region_intersects() started behaving differently resulting in a whole node disappearing. In terms of the urgency of fixing this it would also help to know how prevalent the system this was found on is in the wild.
On 12/4/2024 9:25 AM, Dan Williams wrote: > [ add regressions@lists.linux.dev ] > > Next time make the subject of the patch: > > Revert "resource: fix region_intersects() vs add_memory_driver_managed()" > > ...to make it clear that this is a revert, not a fix. > > The revert should be applied if a fix does not materialize in the next few weeks. > Agreed regarding fix. one thing to note is it is not exact revert. > Raghavendra K T wrote: >> Before: >> ~]$ numastat -m >> ... >> Node 0 Node 1 Total >> --------------- --------------- --------------- >> MemTotal 128096.18 128838.48 256934.65 >> >> After: >> $ numastat -m >> ..... >> Node 0 Node 1 Node 2 Total >> --------------- --------------- --------------- --------------- >> MemTotal 128054.16 128880.51 129024.00 385958.67 >> >> Current patch reverts the effect of first commit where the issue is seen. > > Might you be able to dig a bit further into the details like memory map > for this platform and ACPI SRAT tables? A dmesg comparison of the good > and bad cases would be useful (those can be shared via a github gist). > Even better would be some debug instrumentation to identify which call > to __region_intersects() started behaving differently resulting in a > whole node disappearing. > > In terms of the urgency of fixing this it would also help to know how > prevalent the system this was found on is in the wild. I have compared dmesg, proc/iomem of both success and fail case. A. dmesg: 1. Address ranges is different 2. extra message about printing Demotion target Fallback order for Node 0: 0 1 2 Fallback order for Node 1: 1 0 2 Fallback order for Node 2: 2 0 1 Built 3 zonelists, mobility grouping on. Total pages: 66145521 Policy zone: Normal .... Demotion targets for Node 0: preferred: 2, fallback: 2 Demotion targets for Node 1: preferred: 2, fallback: 2 Demotion targets for Node 2: null B. /proc/iomem $ vimdiff success fail 4050000000-604fffffff : Soft Reserved | 164 4050000000-604fffffff : Soft Reserved 165 4050000000-604fffffff : CXL Window 0 | 165 4050000000-604fffffff : CXL Window 0 166 4080000000-5fffffffff : dax1.0 | ------------------------------------------------------------------------ 167 4080000000-5fffffffff : System RAM (kmem) | -------------------------------------------------------------------- I will get more detail from ACPI SRAT table etc.. Thanks and Regards Raghu
On 12/4/2024 8:31 AM, Andy Shevchenko wrote: > On Wed, Dec 04, 2024 at 10:07:16AM +0800, Huang, Ying wrote: >> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes: >>> On Tue, Dec 03, 2024 at 02:26:52PM +0800, Huang, Ying wrote: >>>> Raghavendra K T <raghavendra.kt@amd.com> writes: > > ... > >>>>> git bisect had led to below commit >>>>> Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") >>>> >>>> This breaks you case, sorry about that. But this also fixed a real bug >>>> too. So, it's not appropriate just to revert it blindly. >>> >>> Linus was clear about this recently. Even if it fixes a bug, regression is >>> still regression and might (*) lead to a revert. >>> https://lwn.net/Articles/990599/ >>> >>> (*) in general fixes are better than reverts, but depends on the timing in >>> the release cycle the revert may be the only option. >> >> I don't think that the timing is so tight that we should not work on >> proper fix firstly. I'm trying to work with the reporter on this. > > I agree on this, please do. > >> BTW, the commit b4afe4183ec7 ("resource: fix region_intersects() vs >> add_memory_driver_managed()") fixed a security related bug. The bug >> weakened the protection to prevent users read/write system memory via >> /dev/mem. So, IMO, we need to be more careful about this. > > My point was that the regression is obvious and it needs to be fixed. > That's all. Revert is a last resort in this sense. > I agree in general to both of your comment. (i.e. since this bisected commit had security fix, we shall try to get better fix than a close to revert). I am trying to work on this, but it is a bit slow on my side. Thanks and Regards - Raghu
On 12/4/2024 10:14 AM, Raghavendra K T wrote: > > >>>> >>>> (*) in general fixes are better than reverts, but depends on the >>>> timing in >>>> the release cycle the revert may be the only option. >>> >>> I don't think that the timing is so tight that we should not work on >>> proper fix firstly. I'm trying to work with the reporter on this. >> >> I agree on this, please do. >> >>> BTW, the commit b4afe4183ec7 ("resource: fix region_intersects() vs >>> add_memory_driver_managed()") fixed a security related bug. The bug >>> weakened the protection to prevent users read/write system memory via >>> /dev/mem. So, IMO, we need to be more careful about this. >> >> My point was that the regression is obvious and it needs to be fixed. >> That's all. Revert is a last resort in this sense. >> > > I agree in general to both of your comment. (i.e. since this bisected > commit had security fix, we shall try to get better fix than a close to > revert). > > I am trying to work on this, but it is a bit slow on my side. > I will try to get a fix that retains old bugfix and works well for me too. Since it is reproduced only on this shared system, I will get hold of this system next week, and work towards a fix. Thanks and Regards - Raghu
Raghavendra K T wrote: > > > On 12/4/2024 9:25 AM, Dan Williams wrote: > > [ add regressions@lists.linux.dev ] > > > > Next time make the subject of the patch: > > > > Revert "resource: fix region_intersects() vs add_memory_driver_managed()" > > > > ...to make it clear that this is a revert, not a fix. > > > > The revert should be applied if a fix does not materialize in the next few weeks. > > > > Agreed regarding fix. > one thing to note is it is not exact revert. > > > Raghavendra K T wrote: > >> Before: > >> ~]$ numastat -m > >> ... > >> Node 0 Node 1 Total > >> --------------- --------------- --------------- > >> MemTotal 128096.18 128838.48 256934.65 > >> > >> After: > >> $ numastat -m > >> ..... > >> Node 0 Node 1 Node 2 Total > >> --------------- --------------- --------------- --------------- > >> MemTotal 128054.16 128880.51 129024.00 385958.67 > >> > >> Current patch reverts the effect of first commit where the issue is seen. > > > > Might you be able to dig a bit further into the details like memory map > > for this platform and ACPI SRAT tables? A dmesg comparison of the good > > and bad cases would be useful (those can be shared via a github gist). > > Even better would be some debug instrumentation to identify which call > > to __region_intersects() started behaving differently resulting in a > > whole node disappearing. > > > > In terms of the urgency of fixing this it would also help to know how > > prevalent the system this was found on is in the wild. > > I have compared dmesg, proc/iomem of both success and fail case. > > A. dmesg: > > 1. Address ranges is different > 2. extra message about printing Demotion target > > Fallback order for Node 0: 0 1 2 > Fallback order for Node 1: 1 0 2 > Fallback order for Node 2: 2 0 1 > Built 3 zonelists, mobility grouping on. Total pages: 66145521 > Policy zone: Normal > .... > Demotion targets for Node 0: preferred: 2, fallback: 2 > Demotion targets for Node 1: preferred: 2, fallback: 2 > Demotion targets for Node 2: null > > B. /proc/iomem > > $ vimdiff success fail > > 4050000000-604fffffff : Soft Reserved > | 164 4050000000-604fffffff : Soft Reserved > 165 4050000000-604fffffff : CXL Window 0 > | 165 4050000000-604fffffff : CXL Window 0 > 166 4080000000-5fffffffff : dax1.0 > | > ------------------------------------------------------------------------ > 167 4080000000-5fffffffff : System RAM (kmem) > | > -------------------------------------------------------------------- My eyes only know how to read unified diff (diff -u) format. Is this saying that in the failure case the System RAM range for dax1.0 is missing?
On 12/6/2024 1:20 PM, Dan Williams wrote: > Raghavendra K T wrote: >> [...] >> B. /proc/iomem >> >> $ vimdiff success fail >> >> 4050000000-604fffffff : Soft Reserved >> | 164 4050000000-604fffffff : Soft Reserved >> 165 4050000000-604fffffff : CXL Window 0 >> | 165 4050000000-604fffffff : CXL Window 0 >> 166 4080000000-5fffffffff : dax1.0 >> | >> ------------------------------------------------------------------------ >> 167 4080000000-5fffffffff : System RAM (kmem) >> | >> -------------------------------------------------------------------- > > My eyes only know how to read unified diff (diff -u) format. Is this > saying that in the failure case the System RAM range for dax1.0 is > missing? Sorry for that formatting. I realized later. Yes. dax1.0 missing in the failure case. Looks like problem is that, 4050000000-604fffffff : Soft Reserved 4050000000-604fffffff : CXL Window 0 4080000000-5fffffffff : dax1.0 4080000000-5fffffffff : System RAM (kmem) this use case where, |A<------------->D| CXL window |..|B<-->C|.......| kmem B->C range that falls within A->D as in above somehow not covered after the patch. Thanks and Regards - Raghu
Raghavendra K T wrote: > > > On 12/6/2024 1:20 PM, Dan Williams wrote: > > Raghavendra K T wrote: > >> > [...] > >> B. /proc/iomem > >> > >> $ vimdiff success fail > >> > >> 4050000000-604fffffff : Soft Reserved > >> | 164 4050000000-604fffffff : Soft Reserved > >> 165 4050000000-604fffffff : CXL Window 0 > >> | 165 4050000000-604fffffff : CXL Window 0 > >> 166 4080000000-5fffffffff : dax1.0 > >> | > >> ------------------------------------------------------------------------ > >> 167 4080000000-5fffffffff : System RAM (kmem) > >> | > >> -------------------------------------------------------------------- > > > > My eyes only know how to read unified diff (diff -u) format. Is this > > saying that in the failure case the System RAM range for dax1.0 is > > missing? > > > Sorry for that formatting. I realized later. > Yes. dax1.0 missing in the failure case. > > Looks like problem is that, > > 4050000000-604fffffff : Soft Reserved > 4050000000-604fffffff : CXL Window 0 > 4080000000-5fffffffff : dax1.0 > 4080000000-5fffffffff : System RAM (kmem) > > this use case where, > > |A<------------->D| CXL window > |..|B<-->C|.......| kmem > > B->C range that falls within A->D as in above somehow > not covered after the patch. I was able to reproduce a similar set of conditions with the cxl_test environment: f010000000-f04fffffff : Soft Reserved f010000000-f04fffffff : CXL Window 0 f020000000-f03fffffff : region3 f020000000-f03fffffff : dax3.0 f020000000-f03fffffff : System RAM (kmem) ...but that did not result in the bug. So there are some other details missing. Can you proceed with providing the dmesg from the good and the bad cases? gist.github.com is useful for this.
On 11-Dec-24 10:14 AM, Dan Williams wrote: > > I was able to reproduce a similar set of conditions with the cxl_test > environment: > > f010000000-f04fffffff : Soft Reserved > f010000000-f04fffffff : CXL Window 0 > f020000000-f03fffffff : region3 > f020000000-f03fffffff : dax3.0 > f020000000-f03fffffff : System RAM (kmem) > > > ...but that did not result in the bug. So there are some other details > missing. Can you proceed with providing the dmesg from the good and the > bad cases? > > gist.github.com is useful for this. Sorry for the delay in providing the data. The system wasn't available for a while. Now I have put the good(6.11.0-rc6) and bad(6.13.0-rc1) dmesg and iomem logs at https://gist.github.com/bharata/4a57db11e044fd1d313035f3dd5f763b What I see is that in the bad case, we hit the below shown dev_dbg message (not seen in the logs that are provided) static int hmem_register_device(struct device *host, int target_nid, const struct resource *res) { struct platform_device *pdev; struct memregion_info info; long id; int rc; if (IS_ENABLED(CONFIG_CXL_REGION) && region_intersects(res->start, resource_size(res), IORESOURCE_MEM, IORES_DESC_CXL) != REGION_DISJOINT) { dev_dbg(host, "deferring range to CXL: %pr\n", res); <-- return 0; } With this, it appears that dev_dax_kmem_probe() won't happen for the CXL range and hence CXL memory doesn't get detected. Let us know if you need more debug data. Regards, Bharata.
Bharata B Rao wrote: > On 11-Dec-24 10:14 AM, Dan Williams wrote: > > > > I was able to reproduce a similar set of conditions with the cxl_test > > environment: > > > > f010000000-f04fffffff : Soft Reserved > > f010000000-f04fffffff : CXL Window 0 > > f020000000-f03fffffff : region3 > > f020000000-f03fffffff : dax3.0 > > f020000000-f03fffffff : System RAM (kmem) > > > > > > ...but that did not result in the bug. So there are some other details > > missing. Can you proceed with providing the dmesg from the good and the > > bad cases? > > > > gist.github.com is useful for this. > > Sorry for the delay in providing the data. The system wasn't available > for a while. Now I have put the good(6.11.0-rc6) and bad(6.13.0-rc1) > dmesg and iomem logs at > > https://gist.github.com/bharata/4a57db11e044fd1d313035f3dd5f763b > > What I see is that in the bad case, we hit the below shown dev_dbg > message (not seen in the logs that are provided) > > static int hmem_register_device(struct device *host, int target_nid, > const struct resource *res) > { > struct platform_device *pdev; > struct memregion_info info; > long id; > int rc; > > if (IS_ENABLED(CONFIG_CXL_REGION) && > region_intersects(res->start, resource_size(res), > IORESOURCE_MEM, > IORES_DESC_CXL) != REGION_DISJOINT) { > dev_dbg(host, "deferring range to CXL: %pr\n", res); <-- > return 0; > } > > With this, it appears that dev_dax_kmem_probe() won't happen for the CXL > range and hence CXL memory doesn't get detected. > > Let us know if you need more debug data. Ah, thanks for that! So, it turns out Ying's patch is working as advertised. It is traversing past the top-level entry of the the iomem_resource topology to discover that a Soft Reserved range is described by CXL. Then the expectation is that the CXL subsystem parses the topology and registers a dax device. I missed that detail in my repro because I was not testing the HMEM handoff. Now, the dmesg shows that the CXL subsystem gives up early on the CXL devices as they do not appear to be meeting the expecations of a "CXL Memory Device" as described by the "PCI Header - Class Code Register (Offset 09h)" of the CXL spec. Specifically these messages: [ 4.449072] cxl_pci 0000:9f:00.0: registers not found: status mbox memdev I think this situation is increasingly showing that the pace of non-standard quirks being deployed is higher than CXL subsystem is able to keep pace. The immediate workaround to this problem that Linux discovered is to disable the cxl_acpi driver. Can you confirm that preventing the cxl_acpi driver from loading restores that missing node? Longer term the urgency of Nathan's patch [1] needs to be escalated. [1]: http://lore.kernel.org/20241202155542.22111-1-nathan.fontenot@amd.com
On 11-Dec-24 2:08 PM, Dan Williams wrote: > Ah, thanks for that! > > So, it turns out Ying's patch is working as advertised. It is traversing > past the top-level entry of the the iomem_resource topology to discover that a > Soft Reserved range is described by CXL. Then the expectation is that > the CXL subsystem parses the topology and registers a dax device. > > I missed that detail in my repro because I was not testing the HMEM > handoff. > > Now, the dmesg shows that the CXL subsystem gives up early on the > CXL devices as they do not appear to be meeting the expecations of a "CXL > Memory Device" as described by the "PCI Header - Class Code Register > (Offset 09h)" of the CXL spec. > > Specifically these messages: > > [ 4.449072] cxl_pci 0000:9f:00.0: registers not found: status mbox memdev > > I think this situation is increasingly showing that the pace of > non-standard quirks being deployed is higher than CXL subsystem is able > to keep pace. > > The immediate workaround to this problem that Linux discovered is to > disable the cxl_acpi driver. Can you confirm that preventing the > cxl_acpi driver from loading restores that missing node? Yes, disabling CONFIG_CXL_ACPI gets the CXL node up with the memory. Thanks for this workaround. > > Longer term the urgency of Nathan's patch [1] needs to be escalated. > > [1]: http://lore.kernel.org/20241202155542.22111-1-nathan.fontenot@amd.com I didn't find that patch helping though with CONFIG_CXL_ACPI kept enabled. Regards, Bharata.
On 12/11/2024 10:17 PM, Bharata B Rao wrote: > On 11-Dec-24 2:08 PM, Dan Williams wrote: >> Ah, thanks for that! >> >> So, it turns out Ying's patch is working as advertised. It is traversing >> past the top-level entry of the the iomem_resource topology to >> discover that a >> Soft Reserved range is described by CXL. Then the expectation is that >> the CXL subsystem parses the topology and registers a dax device. >> >> I missed that detail in my repro because I was not testing the HMEM >> handoff. >> >> Now, the dmesg shows that the CXL subsystem gives up early on the >> CXL devices as they do not appear to be meeting the expecations of a "CXL >> Memory Device" as described by the "PCI Header - Class Code Register >> (Offset 09h)" of the CXL spec. >> >> Specifically these messages: >> >> [ 4.449072] cxl_pci 0000:9f:00.0: registers not found: status >> mbox memdev >> >> I think this situation is increasingly showing that the pace of >> non-standard quirks being deployed is higher than CXL subsystem is able >> to keep pace. >> >> The immediate workaround to this problem that Linux discovered is to >> disable the cxl_acpi driver. Can you confirm that preventing the >> cxl_acpi driver from loading restores that missing node? > > Yes, disabling CONFIG_CXL_ACPI gets the CXL node up with the memory. > Thanks for this workaround. > Thank you Bharata for filling in. Thank you Dan, Andy, Gregory, Ying for your input. (sorry for the delay because machine was not available as pointed by Bharata). This helps to proceed with CONFIG_CXL_ACPI=n for now in my case. - Raghu
diff --git a/kernel/resource.c b/kernel/resource.c index c9fd26c06345..d4dcaa1831cd 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -556,55 +556,20 @@ static int __region_intersects(struct resource *parent, resource_size_t start, size_t size, unsigned long flags, unsigned long desc) { + struct resource res; int type = 0; int other = 0; - struct resource *p, *dp; - struct resource res, o; - bool covered; + struct resource *p; res.start = start; res.end = start + size - 1; for (p = parent->child; p ; p = p->sibling) { - if (!resource_intersection(p, &res, &o)) - continue; - if (is_type_match(p, flags, desc)) { - type++; - continue; - } - /* - * Continue to search in descendant resources as if the - * matched descendant resources cover some ranges of 'p'. - * - * |------------- "CXL Window 0" ------------| - * |-- "System RAM" --| - * - * will behave similar as the following fake resource - * tree when searching "System RAM". - * - * |-- "System RAM" --||-- "CXL Window 0a" --| - */ - covered = false; - for_each_resource(p, dp, false) { - if (!resource_overlaps(dp, &res)) - continue; - if (is_type_match(dp, flags, desc)) { - type++; - /* - * Range from 'o.start' to 'dp->start' - * isn't covered by matched resource. - */ - if (dp->start > o.start) - break; - if (dp->end >= o.end) { - covered = true; - break; - } - /* Remove covered range */ - o.start = max(o.start, dp->end + 1); - } - } - if (!covered) - other++; + bool is_type = (((p->flags & flags) == flags) && + ((desc == IORES_DESC_NONE) || + (desc == p->desc))); + + if (resource_overlaps(p, &res)) + is_type ? type++ : other++; } if (type == 0)
Before: ~]$ numastat -m ... Node 0 Node 1 Total --------------- --------------- --------------- MemTotal 128096.18 128838.48 256934.65 After: $ numastat -m ..... Node 0 Node 1 Node 2 Total --------------- --------------- --------------- --------------- MemTotal 128054.16 128880.51 129024.00 385958.67 Current patch reverts the effect of first commit where the issue is seen. git bisect had led to below commit Fixes: b4afe4183ec7 ("resource: fix region_intersects() vs add_memory_driver_managed()") Cc: Huang Ying <ying.huang@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: <ilpo.jarvinen@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Fontenot Nathan <Nathan.Fontenot@amd.com> Cc: Wei Huang <wei.huang2@amd.com> Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com> --- kernel/resource.c | 51 ++++++++--------------------------------------- 1 file changed, 8 insertions(+), 43 deletions(-) Note: Posting the fix that works for me. But looks like an exact fix might be different. Since I am not much familiar with the below code, I will have to go back and look into more details. Please let me know if more detail is needed. sorry if I had missed something obvious. Git bisec log looked like this: # good: [f7feea289f9ae3a8fb56e9daa3832949bf742c53] mm: numa_memblks: use memblock_{start,end}_of_DRAM() when sanitizing meminfo git bisect good f7feea289f9ae3a8fb56e9daa3832949bf742c53 # bad: [9852d85ec9d492ebef56dc5f229416c925758edc] Linux 6.12-rc1 git bisect bad 9852d85ec9d492ebef56dc5f229416c925758edc # good: [a65b3c3ed49a3b8068c002e98c90f8594927ff25] Merge tag 'hid-for-linus-2024091602' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid git bisect good a65b3c3ed49a3b8068c002e98c90f8594927ff25 # bad: [486fd58af7ac1098b68370b1d4d9f94a2a1c7124] zram: don't free statically defined names git bisect bad 486fd58af7ac1098b68370b1d4d9f94a2a1c7124 # bad: [7856a565416e0cf091f825b0e25c7a1b7abb650e] Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm git bisect bad 7856a565416e0cf091f825b0e25c7a1b7abb650e # bad: [1868f9d0260e9afaf7c6436d14923ae12eaea465] Merge tag 'for-linux-6.12-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux git bisect bad 1868f9d0260e9afaf7c6436d14923ae12eaea465 # good: [e55ef65510a401862b902dc979441ea10ae25c61] Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next git bisect good e55ef65510a401862b902dc979441ea10ae25c61 # good: [f1a4dceeb2bd4b4478e4f0c77dac55569d153fb3] drm/xe: Fix missing conversion to xe_display_pm_runtime_resume git bisect good f1a4dceeb2bd4b4478e4f0c77dac55569d153fb3 # bad: [839c4f596f898edc424070dc8b517381572f8502] Merge tag 'mm-hotfixes-stable-2024-09-19-00-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm git bisect bad 839c4f596f898edc424070dc8b517381572f8502 # good: [726e2d0cf2bbc14e3bf38491cddda1a56fe18663] Merge tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping git bisect good 726e2d0cf2bbc14e3bf38491cddda1a56fe18663 # good: [992f9884626a0e6ab73a98ca4eb166d17675cae6] Merge patch series "NCR5380: Bug fixes and other improvements" git bisect good 992f9884626a0e6ab73a98ca4eb166d17675cae6 # good: [adedd0f46c923f8d63aeb42d504c82431febed31] scsi: bnx2i: Remove unused declarations git bisect good adedd0f46c923f8d63aeb42d504c82431febed31 # good: [a1d1eb2f57501b2e7e2076ce89b3f3a666ddbfdd] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi git bisect good a1d1eb2f57501b2e7e2076ce89b3f3a666ddbfdd # good: [fb497d6db7c19c797cbd694b52d1af87c4eebcc6] mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock git bisect good fb497d6db7c19c797cbd694b52d1af87c4eebcc6 # bad: [2a058ab3286d6475b2082b90c2d2182d2fea4b39] mm: change vmf_anon_prepare() to __vmf_anon_prepare() git bisect bad 2a058ab3286d6475b2082b90c2d2182d2fea4b39 # bad: [b4afe4183ec77f230851ea139d91e5cf2644c68b] resource: fix region_intersects() vs add_memory_driver_managed() git bisect bad b4afe4183ec77f230851ea139d91e5cf2644c68b # good: [6040f650c56862a4ac40b00c37ef6ab1ddfcebb5] zsmalloc: use unique zsmalloc caches names git bisect good 6040f650c56862a4ac40b00c37ef6ab1ddfcebb5 # first bad commit: [b4afe4183ec77f230851ea139d91e5cf2644c68b] resource: fix region_intersects() vs add_memory_driver_managed() # # # git bisect good b4afe4183ec77f230851ea139d91e5cf2644c68b is the first bad commit commit b4afe4183ec77f230851ea139d91e5cf2644c68b base-commit: e70140ba0d2b1a30467d4af6bcfe761327b9ec95