Message ID | 20240913183234.17302-1-gourry@gourry.net |
---|---|
State | New |
Headers | show |
Series | cxl/core/port: defer probe when memdev fails to find correct port | expand |
Gregory Price wrote: > Depending on device/hierarchy readiness, it can be possible for the > async probe process to attempt to register an endpoint before the > entire port hierarchy is ready. This currently fails with -ENXIO. > > Return -EPROBE_DEFER to try again later automatically (which is > what the local comments already say we should do anyway). I want to make sure this is not papering over some other issue. Can you post the final topology when this works (cxl list -BPET)? My working theory is that you have 2 devices that share an intermediate port. Otherwise, I am having a hard time understanding why the cxl_bus_rescan() in cxl_acpi_probe() does not obviate the explicit EPROBE_DEFER. So, devA is dependendent on devB to create a common port, but devA loses that race after cxl_bus_rescan() has already run. Then EBPROBE_DEFER is the right answer to trigger devA to try again.
On Fri, Sep 13, 2024 at 09:32:48PM -0700, Dan Williams wrote: > Gregory Price wrote: > > Depending on device/hierarchy readiness, it can be possible for the > > async probe process to attempt to register an endpoint before the > > entire port hierarchy is ready. This currently fails with -ENXIO. > > > > Return -EPROBE_DEFER to try again later automatically (which is > > what the local comments already say we should do anyway). > > I want to make sure this is not papering over some other issue. Can you > post the final topology when this works (cxl list -BPET)? My working > theory is that you have 2 devices that share an intermediate port. > Otherwise, I am having a hard time understanding why the > cxl_bus_rescan() in cxl_acpi_probe() does not obviate the explicit > EPROBE_DEFER. > Sorry for the delay [ { "bus":"root0", "provider":"ACPI.CXL", "nr_dports":4, "dports":[ { "dport":"pci0000:e0", "alias":"ACPI0016:00", "id":7 }, { "dport":"pci0000:00", "alias":"ACPI0016:01", "id":0 }, { "dport":"pci0000:c0", "alias":"ACPI0016:02", "id":6 }, { "dport":"pci0000:20", "alias":"ACPI0016:03", "id":1 } ], "ports:root0":[ { "port":"port1", "host":"pci0000:e0", "depth":1, "decoders_committed":2, "nr_dports":4, "dports":[ { "dport":"0000:e0:07.2", "alias":"device:16", "id":114 }, { "dport":"0000:e0:01.1", "alias":"device:02", "id":0 }, { "dport":"0000:e0:01.3", "alias":"device:05", "id":2 }, { "dport":"0000:e0:07.1", "alias":"device:0d", "id":113 } ], "endpoints:port1":[ { "endpoint":"endpoint5", "host":"mem0", "parent_dport":"0000:e0:01.1", "depth":2, "decoders_committed":1 } ] }, { "port":"port3", "host":"pci0000:c0", "depth":1, "decoders_committed":2, "nr_dports":1, "dports":[ { "dport":"0000:c0:01.1", "alias":"device:c3", "id":0 } ], "endpoints:port3":[ { "endpoint":"endpoint6", "host":"mem1", "parent_dport":"0000:c0:01.1", "depth":2, "decoders_committed":1 } ] }, { "port":"port2", "host":"pci0000:00", "depth":1, "decoders_committed":0, "nr_dports":2, "dports":[ { "dport":"0000:00:01.3", "alias":"device:55", "id":2 }, { "dport":"0000:00:07.1", "alias":"device:5d", "id":113 } ] }, { "port":"port4", "host":"pci0000:20", "depth":1, "decoders_committed":0, "nr_dports":1, "dports":[ { "dport":"0000:20:01.1", "alias":"device:d0", "id":0 } ] } ] } ] > So, devA is dependendent on devB to create a common port, but devA loses > that race after cxl_bus_rescan() has already run. Then EBPROBE_DEFER is > the right answer to trigger devA to try again.
On Wed, Sep 18, 2024 at 01:59:07PM +0200, Gregory Price wrote: > On Fri, Sep 13, 2024 at 09:32:48PM -0700, Dan Williams wrote: > > Gregory Price wrote: > > > Depending on device/hierarchy readiness, it can be possible for the > > > async probe process to attempt to register an endpoint before the > > > entire port hierarchy is ready. This currently fails with -ENXIO. > > > > > > Return -EPROBE_DEFER to try again later automatically (which is > > > what the local comments already say we should do anyway). > > > > I want to make sure this is not papering over some other issue. Can you > > post the final topology when this works (cxl list -BPET)? My working > > theory is that you have 2 devices that share an intermediate port. > > Otherwise, I am having a hard time understanding why the > > cxl_bus_rescan() in cxl_acpi_probe() does not obviate the explicit > > EPROBE_DEFER. > > just reporting back with a fully functional layout - they do not appear to share an intermediate port, unless you consider the root a shared port. I see your concern about this papering over another issue, but it's not clear what to look for or how to look for it at this point. For what it's worth - another group observed the same issue with different hardware and produced the same patch. "ports:root0":[ { "port":"port1", "host":"pci0000:e0", "depth":1, "decoders_committed":1, "nr_dports":4, "dports":[ { "dport":"0000:e0:07.2", "alias":"device:16", "id":114 }, { "dport":"0000:e0:01.1", "alias":"device:02", "id":0 }, { "dport":"0000:e0:01.3", "alias":"device:05", "id":2 }, { "dport":"0000:e0:07.1", "alias":"device:0d", "id":113 } ], "endpoints:port1":[ { "endpoint":"endpoint5", "host":"mem0", "parent_dport":"0000:e0:01.1", "depth":2, "decoders_committed":1 } ] }, { "port":"port3", "host":"pci0000:c0", "depth":1, "decoders_committed":2, "nr_dports":1, "dports":[ { "dport":"0000:c0:01.1", "alias":"device:c3", "id":0 } ], "endpoints:port3":[ { "endpoint":"endpoint6", "host":"mem1", "parent_dport":"0000:c0:01.1", "depth":2, "decoders_committed":2 } ] }, ] > > So, devA is dependendent on devB to create a common port, but devA loses > > that race after cxl_bus_rescan() has already run. Then EBPROBE_DEFER is > > the right answer to trigger devA to try again.
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 1d5007e3795a..d6bebf70d142 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1553,7 +1553,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, */ dev_dbg(&cxlmd->dev, "%s is a root dport\n", dev_name(dport_dev)); - return -ENXIO; + return -EPROBE_DEFER; } parent_port = find_cxl_port(dparent, &parent_dport);
Depending on device/hierarchy readiness, it can be possible for the async probe process to attempt to register an endpoint before the entire port hierarchy is ready. This currently fails with -ENXIO. Return -EPROBE_DEFER to try again later automatically (which is what the local comments already say we should do anyway). Signed-off-by: Gregory Price <gourry@gourry.net> --- drivers/cxl/core/port.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)