diff mbox series

[1/2] cxl/pci: Ignore downstream ports with duplicate port IDs

Message ID 20250305100123.3077031-2-rrichter@amd.com
State New
Headers show
Series cxl/pci: Inactive downstream port handling | expand

Commit Message

Robert Richter March 5, 2025, 10:01 a.m. UTC
If a link is inactive, the port ID in the PCIe Link Capability
Register of a downstream port may not be assigned yet. Another
downstream port with an inactive link on the same Downstream Switch
Port may have the same port ID. In this case the port enumeration of
the root or downstream port fails due to duplicate port IDs
(devm_cxl_port_enumerate_dports()/add_dport()).

Relax the check and just ignore downstream ports with duplicate port
IDs. Do not fail and continue to enumerate all downstream ports of a
CXL Root Port or CXL Switch. Turn the related dev_err() messages into
a dev_dbg().

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/pci.c  | 10 ++++++++--
 drivers/cxl/core/port.c |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Ira Weiny March 5, 2025, 3:09 p.m. UTC | #1
Robert Richter wrote:
> If a link is inactive, the port ID in the PCIe Link Capability
> Register of a downstream port may not be assigned yet. Another
> downstream port with an inactive link on the same Downstream Switch
> Port may have the same port ID.

Is it possible that an active link would have the same ID?

I'm not clear why failing with a duplicate port ID is a bad thing.

>
> In this case the port enumeration of
> the root or downstream port fails due to duplicate port IDs
> (devm_cxl_port_enumerate_dports()/add_dport()).
> 
> Relax the check and just ignore downstream ports with duplicate port
> IDs.

Ah.  So do not add the dport...

It may not matter but I __think__ this adds a subtle memory leak where the
dport object is allocated, not added to the xarray, and upon the port
being probed later a new dport object is allocated in it's place.  That
might be ok as the memory will be recovered when the switch device is
destroyed (via devm).  But could a series of unplug/hotplugs cause issues?

Ira

>
> Do not fail and continue to enumerate all downstream ports of a
> CXL Root Port or CXL Switch. Turn the related dev_err() messages into
> a dev_dbg().
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/pci.c  | 10 ++++++++--
>  drivers/cxl/core/port.c |  2 +-
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index fbc50b1156b8..524b8749cc0b 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -59,8 +59,14 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
>  	port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap);
>  	dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource);
>  	if (IS_ERR(dport)) {
> -		ctx->error = PTR_ERR(dport);
> -		return PTR_ERR(dport);
> +		rc = PTR_ERR(dport);
> +		if (rc == -EBUSY) {
> +			dev_dbg(&port->dev, "failed to add dport %s, continuing\n",
> +				dev_name(&pdev->dev));
> +			return 0;
> +		}
> +		ctx->error = rc;
> +		return rc;
>  	}
>  	ctx->count++;
>  
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 33607301c5d3..8038cbeffbf7 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1071,7 +1071,7 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
>  	device_lock_assert(&port->dev);
>  	dup = find_dport(port, dport->port_id);
>  	if (dup) {
> -		dev_err(&port->dev,
> +		dev_dbg(&port->dev,
>  			"unable to add dport%d-%s non-unique port id (%s)\n",
>  			dport->port_id, dev_name(dport->dport_dev),
>  			dev_name(dup->dport_dev));
> -- 
> 2.39.5
>
Robert Richter March 7, 2025, 3:28 p.m. UTC | #2
Hi Ira,

thanks for your review.

On 05.03.25 09:09:52, Ira Weiny wrote:
> Robert Richter wrote:
> > If a link is inactive, the port ID in the PCIe Link Capability
> > Register of a downstream port may not be assigned yet. Another
> > downstream port with an inactive link on the same Downstream Switch
> > Port may have the same port ID.
> 
> Is it possible that an active link would have the same ID?
> 
> I'm not clear why failing with a duplicate port ID is a bad thing.
> 
> >
> > In this case the port enumeration of
> > the root or downstream port fails due to duplicate port IDs
> > (devm_cxl_port_enumerate_dports()/add_dport()).
> > 
> > Relax the check and just ignore downstream ports with duplicate port
> > IDs.
> 
> Ah.  So do not add the dport...
> 
> It may not matter but I __think__ this adds a subtle memory leak where the
> dport object is allocated, not added to the xarray, and upon the port
> being probed later a new dport object is allocated in it's place.  That
> might be ok as the memory will be recovered when the switch device is
> destroyed (via devm).  But could a series of unplug/hotplugs cause issues?

No, this patches do not change anything regarding devm allocation.

I have looked into __devm_cxl_add_dport(). If the function returns an
error, there might be allocated memory left which is not released
before the host device is released. This is current implementation and
these patches do not change anything here. If at all, it is a general
issue with the devm implementation in that function (and probably not
only there).

-Robert
Jonathan Cameron March 14, 2025, 12:11 p.m. UTC | #3
On Wed, 5 Mar 2025 11:01:22 +0100
Robert Richter <rrichter@amd.com> wrote:

> If a link is inactive, the port ID in the PCIe Link Capability
> Register of a downstream port may not be assigned yet. Another
> downstream port with an inactive link on the same Downstream Switch
> Port may have the same port ID. In this case the port enumeration of
> the root or downstream port fails due to duplicate port IDs
> (devm_cxl_port_enumerate_dports()/add_dport()).

Obviously it's a bit irrelevant if there is hardware doing that, but
can you add a PCI spec reference that says the port ID may not be configured.

In my (often wrong) mental model, these are fixed values. It's marked
HwInit and text for that talks about conventional reset (which I'm not
sure applies here).  I can't find any other references to when you
are allowed to read this.

Anyhow I'd like to know if this is a quirk or a bug fix for compliant
hardware.

Jonathan


> 
> Relax the check and just ignore downstream ports with duplicate port
> IDs. Do not fail and continue to enumerate all downstream ports of a
> CXL Root Port or CXL Switch. Turn the related dev_err() messages into
> a dev_dbg().
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/pci.c  | 10 ++++++++--
>  drivers/cxl/core/port.c |  2 +-
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index fbc50b1156b8..524b8749cc0b 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -59,8 +59,14 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
>  	port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap);
>  	dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource);
>  	if (IS_ERR(dport)) {
> -		ctx->error = PTR_ERR(dport);
> -		return PTR_ERR(dport);
> +		rc = PTR_ERR(dport);
> +		if (rc == -EBUSY) {
> +			dev_dbg(&port->dev, "failed to add dport %s, continuing\n",
> +				dev_name(&pdev->dev));
> +			return 0;
> +		}
> +		ctx->error = rc;
> +		return rc;
>  	}
>  	ctx->count++;
>  
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 33607301c5d3..8038cbeffbf7 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1071,7 +1071,7 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
>  	device_lock_assert(&port->dev);
>  	dup = find_dport(port, dport->port_id);
>  	if (dup) {
> -		dev_err(&port->dev,
> +		dev_dbg(&port->dev,
>  			"unable to add dport%d-%s non-unique port id (%s)\n",
>  			dport->port_id, dev_name(dport->dport_dev),
>  			dev_name(dup->dport_dev));
diff mbox series

Patch

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index fbc50b1156b8..524b8749cc0b 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -59,8 +59,14 @@  static int match_add_dports(struct pci_dev *pdev, void *data)
 	port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap);
 	dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource);
 	if (IS_ERR(dport)) {
-		ctx->error = PTR_ERR(dport);
-		return PTR_ERR(dport);
+		rc = PTR_ERR(dport);
+		if (rc == -EBUSY) {
+			dev_dbg(&port->dev, "failed to add dport %s, continuing\n",
+				dev_name(&pdev->dev));
+			return 0;
+		}
+		ctx->error = rc;
+		return rc;
 	}
 	ctx->count++;
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 33607301c5d3..8038cbeffbf7 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1071,7 +1071,7 @@  static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
 	device_lock_assert(&port->dev);
 	dup = find_dport(port, dport->port_id);
 	if (dup) {
-		dev_err(&port->dev,
+		dev_dbg(&port->dev,
 			"unable to add dport%d-%s non-unique port id (%s)\n",
 			dport->port_id, dev_name(dport->dport_dev),
 			dev_name(dup->dport_dev));