Message ID | 20250305100123.3077031-2-rrichter@amd.com |
---|---|
State | New |
Headers | show |
Series | cxl/pci: Inactive downstream port handling | expand |
Robert Richter wrote: > If a link is inactive, the port ID in the PCIe Link Capability > Register of a downstream port may not be assigned yet. Another > downstream port with an inactive link on the same Downstream Switch > Port may have the same port ID. Is it possible that an active link would have the same ID? I'm not clear why failing with a duplicate port ID is a bad thing. > > In this case the port enumeration of > the root or downstream port fails due to duplicate port IDs > (devm_cxl_port_enumerate_dports()/add_dport()). > > Relax the check and just ignore downstream ports with duplicate port > IDs. Ah. So do not add the dport... It may not matter but I __think__ this adds a subtle memory leak where the dport object is allocated, not added to the xarray, and upon the port being probed later a new dport object is allocated in it's place. That might be ok as the memory will be recovered when the switch device is destroyed (via devm). But could a series of unplug/hotplugs cause issues? Ira > > Do not fail and continue to enumerate all downstream ports of a > CXL Root Port or CXL Switch. Turn the related dev_err() messages into > a dev_dbg(). > > Signed-off-by: Robert Richter <rrichter@amd.com> > --- > drivers/cxl/core/pci.c | 10 ++++++++-- > drivers/cxl/core/port.c | 2 +- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index fbc50b1156b8..524b8749cc0b 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -59,8 +59,14 @@ static int match_add_dports(struct pci_dev *pdev, void *data) > port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); > dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); > if (IS_ERR(dport)) { > - ctx->error = PTR_ERR(dport); > - return PTR_ERR(dport); > + rc = PTR_ERR(dport); > + if (rc == -EBUSY) { > + dev_dbg(&port->dev, "failed to add dport %s, continuing\n", > + dev_name(&pdev->dev)); > + return 0; > + } > + ctx->error = rc; > + return rc; > } > ctx->count++; > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 33607301c5d3..8038cbeffbf7 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1071,7 +1071,7 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport) > device_lock_assert(&port->dev); > dup = find_dport(port, dport->port_id); > if (dup) { > - dev_err(&port->dev, > + dev_dbg(&port->dev, > "unable to add dport%d-%s non-unique port id (%s)\n", > dport->port_id, dev_name(dport->dport_dev), > dev_name(dup->dport_dev)); > -- > 2.39.5 >
Hi Ira, thanks for your review. On 05.03.25 09:09:52, Ira Weiny wrote: > Robert Richter wrote: > > If a link is inactive, the port ID in the PCIe Link Capability > > Register of a downstream port may not be assigned yet. Another > > downstream port with an inactive link on the same Downstream Switch > > Port may have the same port ID. > > Is it possible that an active link would have the same ID? > > I'm not clear why failing with a duplicate port ID is a bad thing. > > > > > In this case the port enumeration of > > the root or downstream port fails due to duplicate port IDs > > (devm_cxl_port_enumerate_dports()/add_dport()). > > > > Relax the check and just ignore downstream ports with duplicate port > > IDs. > > Ah. So do not add the dport... > > It may not matter but I __think__ this adds a subtle memory leak where the > dport object is allocated, not added to the xarray, and upon the port > being probed later a new dport object is allocated in it's place. That > might be ok as the memory will be recovered when the switch device is > destroyed (via devm). But could a series of unplug/hotplugs cause issues? No, this patches do not change anything regarding devm allocation. I have looked into __devm_cxl_add_dport(). If the function returns an error, there might be allocated memory left which is not released before the host device is released. This is current implementation and these patches do not change anything here. If at all, it is a general issue with the devm implementation in that function (and probably not only there). -Robert
On Wed, 5 Mar 2025 11:01:22 +0100 Robert Richter <rrichter@amd.com> wrote: > If a link is inactive, the port ID in the PCIe Link Capability > Register of a downstream port may not be assigned yet. Another > downstream port with an inactive link on the same Downstream Switch > Port may have the same port ID. In this case the port enumeration of > the root or downstream port fails due to duplicate port IDs > (devm_cxl_port_enumerate_dports()/add_dport()). Obviously it's a bit irrelevant if there is hardware doing that, but can you add a PCI spec reference that says the port ID may not be configured. In my (often wrong) mental model, these are fixed values. It's marked HwInit and text for that talks about conventional reset (which I'm not sure applies here). I can't find any other references to when you are allowed to read this. Anyhow I'd like to know if this is a quirk or a bug fix for compliant hardware. Jonathan > > Relax the check and just ignore downstream ports with duplicate port > IDs. Do not fail and continue to enumerate all downstream ports of a > CXL Root Port or CXL Switch. Turn the related dev_err() messages into > a dev_dbg(). > > Signed-off-by: Robert Richter <rrichter@amd.com> > --- > drivers/cxl/core/pci.c | 10 ++++++++-- > drivers/cxl/core/port.c | 2 +- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index fbc50b1156b8..524b8749cc0b 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -59,8 +59,14 @@ static int match_add_dports(struct pci_dev *pdev, void *data) > port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); > dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); > if (IS_ERR(dport)) { > - ctx->error = PTR_ERR(dport); > - return PTR_ERR(dport); > + rc = PTR_ERR(dport); > + if (rc == -EBUSY) { > + dev_dbg(&port->dev, "failed to add dport %s, continuing\n", > + dev_name(&pdev->dev)); > + return 0; > + } > + ctx->error = rc; > + return rc; > } > ctx->count++; > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 33607301c5d3..8038cbeffbf7 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1071,7 +1071,7 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport) > device_lock_assert(&port->dev); > dup = find_dport(port, dport->port_id); > if (dup) { > - dev_err(&port->dev, > + dev_dbg(&port->dev, > "unable to add dport%d-%s non-unique port id (%s)\n", > dport->port_id, dev_name(dport->dport_dev), > dev_name(dup->dport_dev));
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index fbc50b1156b8..524b8749cc0b 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -59,8 +59,14 @@ static int match_add_dports(struct pci_dev *pdev, void *data) port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); if (IS_ERR(dport)) { - ctx->error = PTR_ERR(dport); - return PTR_ERR(dport); + rc = PTR_ERR(dport); + if (rc == -EBUSY) { + dev_dbg(&port->dev, "failed to add dport %s, continuing\n", + dev_name(&pdev->dev)); + return 0; + } + ctx->error = rc; + return rc; } ctx->count++; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 33607301c5d3..8038cbeffbf7 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1071,7 +1071,7 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport) device_lock_assert(&port->dev); dup = find_dport(port, dport->port_id); if (dup) { - dev_err(&port->dev, + dev_dbg(&port->dev, "unable to add dport%d-%s non-unique port id (%s)\n", dport->port_id, dev_name(dport->dport_dev), dev_name(dup->dport_dev));
If a link is inactive, the port ID in the PCIe Link Capability Register of a downstream port may not be assigned yet. Another downstream port with an inactive link on the same Downstream Switch Port may have the same port ID. In this case the port enumeration of the root or downstream port fails due to duplicate port IDs (devm_cxl_port_enumerate_dports()/add_dport()). Relax the check and just ignore downstream ports with duplicate port IDs. Do not fail and continue to enumerate all downstream ports of a CXL Root Port or CXL Switch. Turn the related dev_err() messages into a dev_dbg(). Signed-off-by: Robert Richter <rrichter@amd.com> --- drivers/cxl/core/pci.c | 10 ++++++++-- drivers/cxl/core/port.c | 2 +- 2 files changed, 9 insertions(+), 3 deletions(-)