Message ID | ZIs5Ra+9SGGbUZoZ@memverge.com |
---|---|
State | New, archived |
Headers | show |
Series | [BUG] Root port fails to match with port driver on non-RCH topology | expand |
On Thu, Jun 15, 2023 at 12:16:05PM -0400, Gregory Price wrote: > Unfortunately, I added this as a hack, but it did not resolve the issue. > > I'm a bit lost in the device-driver core trying to track down the exact > path that is being taken, it's possible another subsequent error is > occuring that subsequently fails as well. > > > ~Gregory > > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 4d1f9c5b5029..7f99e4f790d8 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1853,7 +1853,9 @@ static int cxl_bus_uevent(const struct device *dev, struct kobj_uevent_env *env) > > static int cxl_bus_match(struct device *dev, struct device_driver *drv) > { > - return cxl_device_id(dev) == to_cxl_drv(drv)->id; > + int devid = cxl_device_id(dev); > + int drvid = to_cxl_drv(drv)->id > + return (devid == drvid) || (devid == CXL_DEVICE_ROOT && drvid == CXL_DEVICE_PORT); > } follow up with more context, there is a subsequent error on cxl_port_probe which causes dev->driver to remain NULL. [ 15.561313] cxl_mem mem0: CXL port topology root0 not enabled [ 15.561315] cxl_mem mem0: call_driver_probe failed drivers/cxl/port.c static int cxl_port_probe(struct device *dev) { struct cxl_port *port = to_cxl_port(dev); if (is_cxl_endpoint(port)) return cxl_endpoint_port_probe(port); return cxl_switch_port_probe(port); } since root is not an endpoint, we probe root as a switch port. static int cxl_switch_port_probe(struct cxl_port *port) { struct cxl_hdm *cxlhdm; int rc; rc = devm_cxl_port_enumerate_dports(port); if (rc < 0) return rc; if (rc == 1) return devm_cxl_add_passthrough_decoder(port); cxlhdm = devm_cxl_setup_hdm(port, NULL); if (IS_ERR(cxlhdm)) return PTR_ERR(cxlhdm); return devm_cxl_enumerate_decoders(cxlhdm, NULL); } It is likely that dev_cxl_port_enumate_dports is failing, though more investigation is needed at this point. What I don't know is whether cxl_port_probe is intended to probe the root port, or if we should simply be attaching the driver to the root port device explicitly rather than going through probe. Any guidance would be appreciated. ~Gregory
On Thu, Jun 15, 2023 at 02:46:13PM -0400, Gregory Price wrote: > On Thu, Jun 15, 2023 at 12:16:05PM -0400, Gregory Price wrote: > follow up with more context, there is a subsequent error on > cxl_port_probe which causes dev->driver to remain NULL. > > > [ 15.561313] cxl_mem mem0: CXL port topology root0 not enabled > [ 15.561315] cxl_mem mem0: call_driver_probe failed wrong prints here, this should have read: [ 15.549316] cxl root0: driver set to cxl_port [ 15.549732] cxl_port root0: call_driver_probe failed I confirmed that call_driver_probe sinks down to a call to cxl_port_probe and that is failing on cxl_port_probe(root0). Rest of the commentary below is accurate. > > > drivers/cxl/port.c > static int cxl_port_probe(struct device *dev) > { > struct cxl_port *port = to_cxl_port(dev); > > if (is_cxl_endpoint(port)) > return cxl_endpoint_port_probe(port); > return cxl_switch_port_probe(port); > } > > > since root is not an endpoint, we probe root as a switch port. > > static int cxl_switch_port_probe(struct cxl_port *port) > { > struct cxl_hdm *cxlhdm; > int rc; > > rc = devm_cxl_port_enumerate_dports(port); > if (rc < 0) > return rc; > > if (rc == 1) > return devm_cxl_add_passthrough_decoder(port); > > cxlhdm = devm_cxl_setup_hdm(port, NULL); > if (IS_ERR(cxlhdm)) > return PTR_ERR(cxlhdm); > > return devm_cxl_enumerate_decoders(cxlhdm, NULL); > } > > > It is likely that dev_cxl_port_enumate_dports is failing, though more > investigation is needed at this point. > > What I don't know is whether cxl_port_probe is intended to probe the > root port, or if we should simply be attaching the driver to the root > port device explicitly rather than going through probe. > > Any guidance would be appreciated. > > ~Gregory
On Thu, Jun 15, 2023 at 02:51:13PM -0400, Gregory Price wrote: > On Thu, Jun 15, 2023 at 02:46:13PM -0400, Gregory Price wrote: > > On Thu, Jun 15, 2023 at 12:16:05PM -0400, Gregory Price wrote: > > follow up with more context, there is a subsequent error on > > cxl_port_probe which causes dev->driver to remain NULL. > > > > > > [ 15.561313] cxl_mem mem0: CXL port topology root0 not enabled > > [ 15.561315] cxl_mem mem0: call_driver_probe failed > > wrong prints here, this should have read: > > [ 15.549316] cxl root0: driver set to cxl_port > [ 15.549732] cxl_port root0: call_driver_probe failed > > I confirmed that call_driver_probe sinks down to a call to > cxl_port_probe and that is failing on cxl_port_probe(root0). > > Rest of the commentary below is accurate. > so cxl_switch_port_probe fails on this stack: - cxl_switch_port_probe - devm_cxl_port_enumerate_dports - cxl_port_to_pci_bus int devm_cxl_port_enumerate_dports(struct cxl_port *port) { struct pci_bus *bus = cxl_port_to_pci_bus(port); struct cxl_walk_context ctx; int type; if (!bus) return -ENXIO; ... } struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port) { /* There is no pci_bus associated with a CXL platform-root port */ if (is_cxl_root(port)) return NULL; ... } I presume then that for the root port, we should just return 0 to note success? The rest of the devices should register themselves correctly with the root as they're iterated over - i think? What I can't figure out is why this doesn't happen on QEMU, which also presents a similar topology. Still not sure what is correct vs incorrect here, will need to dig around in the spec and system settings unless someone has an idea. ~Gregory
On Thu, 15 Jun 2023 18:43:59 -0400 Gregory Price <gregory.price@memverge.com> wrote: > On Thu, Jun 15, 2023 at 02:51:13PM -0400, Gregory Price wrote: > > On Thu, Jun 15, 2023 at 02:46:13PM -0400, Gregory Price wrote: > > > On Thu, Jun 15, 2023 at 12:16:05PM -0400, Gregory Price wrote: > > > follow up with more context, there is a subsequent error on > > > cxl_port_probe which causes dev->driver to remain NULL. > > > > > > > > > [ 15.561313] cxl_mem mem0: CXL port topology root0 not enabled > > > [ 15.561315] cxl_mem mem0: call_driver_probe failed > > > > wrong prints here, this should have read: > > > > [ 15.549316] cxl root0: driver set to cxl_port > > [ 15.549732] cxl_port root0: call_driver_probe failed > > > > I confirmed that call_driver_probe sinks down to a call to > > cxl_port_probe and that is failing on cxl_port_probe(root0). > > > > Rest of the commentary below is accurate. > > > > > so cxl_switch_port_probe fails on this stack: > > - cxl_switch_port_probe > - devm_cxl_port_enumerate_dports > - cxl_port_to_pci_bus > > int devm_cxl_port_enumerate_dports(struct cxl_port *port) > { > struct pci_bus *bus = cxl_port_to_pci_bus(port); > struct cxl_walk_context ctx; > int type; > > if (!bus) > return -ENXIO; > ... > } > > struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port) > { > /* There is no pci_bus associated with a CXL platform-root port */ > if (is_cxl_root(port)) > return NULL; > ... > } > > > I presume then that for the root port, we should just return 0 to note > success? The rest of the devices should register themselves correctly > with the root as they're iterated over - i think? > > What I can't figure out is why this doesn't happen on QEMU, which also > presents a similar topology. > > Still not sure what is correct vs incorrect here, will need to dig > around in the spec and system settings unless someone has an idea. Interestingly on QEMU we never get near the path that's failing on root0 as no driver is associated with that particular device. There are plenty of child devices with drivers though - just not the top level one. I wonder if a rescan or something similar is causing the probe on your system but not qemu? Dan, should we be seeing a driver binding to root0 or not? Jonathan > > ~Gregory
On Thu, Jun 22, 2023 at 10:48:20AM +0100, Jonathan Cameron wrote: > On Thu, 15 Jun 2023 18:43:59 -0400 > Gregory Price <gregory.price@memverge.com> wrote: > > > On Thu, Jun 15, 2023 at 02:51:13PM -0400, Gregory Price wrote: > > > On Thu, Jun 15, 2023 at 02:46:13PM -0400, Gregory Price wrote: > > > > On Thu, Jun 15, 2023 at 12:16:05PM -0400, Gregory Price wrote: > > > > follow up with more context, there is a subsequent error on > > > > cxl_port_probe which causes dev->driver to remain NULL. > > > > > > > > > > > > [ 15.561313] cxl_mem mem0: CXL port topology root0 not enabled > > > > [ 15.561315] cxl_mem mem0: call_driver_probe failed > > > > > > wrong prints here, this should have read: > > > > > > [ 15.549316] cxl root0: driver set to cxl_port > > > [ 15.549732] cxl_port root0: call_driver_probe failed > > > > > > I confirmed that call_driver_probe sinks down to a call to > > > cxl_port_probe and that is failing on cxl_port_probe(root0). > > > > > > Rest of the commentary below is accurate. > > > > > > > > > so cxl_switch_port_probe fails on this stack: > > > > - cxl_switch_port_probe > > - devm_cxl_port_enumerate_dports > > - cxl_port_to_pci_bus > > > > int devm_cxl_port_enumerate_dports(struct cxl_port *port) > > { > > struct pci_bus *bus = cxl_port_to_pci_bus(port); > > struct cxl_walk_context ctx; > > int type; > > > > if (!bus) > > return -ENXIO; > > ... > > } > > > > struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port) > > { > > /* There is no pci_bus associated with a CXL platform-root port */ > > if (is_cxl_root(port)) > > return NULL; > > ... > > } > > > > > > I presume then that for the root port, we should just return 0 to note > > success? The rest of the devices should register themselves correctly > > with the root as they're iterated over - i think? > > > > What I can't figure out is why this doesn't happen on QEMU, which also > > presents a similar topology. > > > > Still not sure what is correct vs incorrect here, will need to dig > > around in the spec and system settings unless someone has an idea. > > Interestingly on QEMU we never get near the path that's failing on root0 > as no driver is associated with that particular device. There are plenty of > child devices with drivers though - just not the top level one. > I wonder if a rescan or something similar is causing the probe on your > system but not qemu? > > Dan, should we be seeing a driver binding to root0 or not? I added a local patch to associate the root port with the port driver after observing that the association was failing: diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 4d1f9c5b5029..7f99e4f790d8 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1853,7 +1853,9 @@ static int cxl_bus_uevent(const struct device *dev, struct kobj_uevent_env *env) static int cxl_bus_match(struct device *dev, struct device_driver *drv) { - return cxl_device_id(dev) == to_cxl_drv(drv)->id; + int devid = cxl_device_id(dev); + int drvid = to_cxl_drv(drv)->id + return (devid == drvid) || (devid == CXL_DEVICE_ROOT && drvid == + CXL_DEVICE_PORT); } This is what is causing the probe, that's not what i'm mostly concerned with. The core issue here appears to be that the driver is seeing the root port as the endpoint parent: drivers/cxl/mem.c static int cxl_mem_probe(struct device *dev) { ... if (dport->rch) endpoint_parent = parent_port->uport; else endpoint_parent = &parent_port->dev; device_lock(endpoint_parent); if (!endpoint_parent->driver) { <---- endpoint here is root0 dev_err(dev, "CXL port topology %s not enabled\n", dev_name(endpoint_parent)); rc = -ENXIO; goto unlock; } ... } I suppose the root question may be whether an endpoint_parent should be a root port, or if there is topological error occurring somewhere (missing port in the topology). > > Jonathan > > > > > ~Gregory >
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 4d1f9c5b5029..7f99e4f790d8 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1853,7 +1853,9 @@ static int cxl_bus_uevent(const struct device *dev, struct kobj_uevent_env *env) static int cxl_bus_match(struct device *dev, struct device_driver *drv) { - return cxl_device_id(dev) == to_cxl_drv(drv)->id; + int devid = cxl_device_id(dev); + int drvid = to_cxl_drv(drv)->id + return (devid == drvid) || (devid == CXL_DEVICE_ROOT && drvid == CXL_DEVICE_PORT); }