Message ID | 20250305100123.3077031-3-rrichter@amd.com |
---|---|
State | New |
Headers | show |
Series | cxl/pci: Inactive downstream port handling | expand |
Robert Richter wrote: > When downstream ports are enumerated, some of them may not be > connected to a corresponding endpoint or upstream switch port. The > dport is inactive and its link status is down then. For permanently > disabled ports a HwInit configuration mechanism (set by hardware or > firmware) may assign a (further unused) default port number. The port > number may be set to the same value accross other inactive links. > Those duplicate port numbers cause the downstream port enumeration to > fail including the root or switch port initialization > (cxl_switch_port_probe()) and all its active downstream ports. > > Prevent a port initialization failure by checking the link status and > only enabling active dports. If a dport is inactive, there is no > matching component (endpoint or switch) connected to and thus, it must > not be enumerated and added to the kernel's CXL device hierarchy. > There is no device that will connect to an inactive dport. This makes much more sense. Wouldn't it be better to use this patch and leave the old error messages/failures on duplicate port ids? IOW drop patch 1? What happens if a link is having issues (perhaps going up and down) and RAS events fire for this dport? Does the lack of a dport object cause issues? Ira > > Signed-off-by: Robert Richter <rrichter@amd.com> > --- > drivers/cxl/core/pci.c | 30 ++++++++++++++++++++++++++---- > 1 file changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 524b8749cc0b..72683e1b41e3 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -32,6 +32,26 @@ struct cxl_walk_context { > int count; > }; > > +static int get_port_num(struct pci_dev *pdev) > +{ > + u32 lnkcap, port_num; > + u16 lnksta; > + > + if (pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap)) > + return -ENXIO; > + > + /* Skip inactive links */ > + if (lnkcap & PCI_EXP_LNKCAP_DLLLARC) { > + pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta); > + if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) > + return -ENOENT; > + } > + > + port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); > + > + return port_num; > +} > + > static int match_add_dports(struct pci_dev *pdev, void *data) > { > struct cxl_walk_context *ctx = data; > @@ -39,7 +59,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data) > int type = pci_pcie_type(pdev); > struct cxl_register_map map; > struct cxl_dport *dport; > - u32 lnkcap, port_num; > + int port_num; > int rc; > > if (pdev->bus != ctx->bus) > @@ -48,15 +68,17 @@ static int match_add_dports(struct pci_dev *pdev, void *data) > return 0; > if (type != ctx->type) > return 0; > - if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP, > - &lnkcap)) > + > + port_num = get_port_num(pdev); > + if (port_num == -ENOENT) > + pci_dbg(pdev, "Skipping dport, link is down\n"); > + if (port_num < 0) > return 0; > > rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map); > if (rc) > dev_dbg(&port->dev, "failed to find component registers\n"); > > - port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); > dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); > if (IS_ERR(dport)) { > rc = PTR_ERR(dport); > -- > 2.39.5 >
On 05.03.25 09:19:59, Ira Weiny wrote: > Robert Richter wrote: > > When downstream ports are enumerated, some of them may not be > > connected to a corresponding endpoint or upstream switch port. The > > dport is inactive and its link status is down then. For permanently > > disabled ports a HwInit configuration mechanism (set by hardware or > > firmware) may assign a (further unused) default port number. The port > > number may be set to the same value accross other inactive links. > > Those duplicate port numbers cause the downstream port enumeration to > > fail including the root or switch port initialization > > (cxl_switch_port_probe()) and all its active downstream ports. > > > > Prevent a port initialization failure by checking the link status and > > only enabling active dports. If a dport is inactive, there is no > > matching component (endpoint or switch) connected to and thus, it must > > not be enumerated and added to the kernel's CXL device hierarchy. > > There is no device that will connect to an inactive dport. > > This makes much more sense. > > Wouldn't it be better to use this patch and leave the old error > messages/failures on duplicate port ids? IOW drop patch 1? The link check only works if Data Link Layer Link Active Reporting is supported. That is why there is the patch 1. That patch also handles cases where dups are seen that are unrelated to the link status. > What happens if a link is having issues (perhaps going up and down) and > RAS events fire for this dport? Does the lack of a dport object cause > issues? Once a dport was enumerated it exists as long as the parent exists. -Robert
Robert Richter wrote: > On 05.03.25 09:19:59, Ira Weiny wrote: > > Robert Richter wrote: > > > When downstream ports are enumerated, some of them may not be > > > connected to a corresponding endpoint or upstream switch port. The > > > dport is inactive and its link status is down then. For permanently > > > disabled ports a HwInit configuration mechanism (set by hardware or > > > firmware) may assign a (further unused) default port number. The port > > > number may be set to the same value accross other inactive links. > > > Those duplicate port numbers cause the downstream port enumeration to > > > fail including the root or switch port initialization > > > (cxl_switch_port_probe()) and all its active downstream ports. > > > > > > Prevent a port initialization failure by checking the link status and > > > only enabling active dports. If a dport is inactive, there is no > > > matching component (endpoint or switch) connected to and thus, it must > > > not be enumerated and added to the kernel's CXL device hierarchy. > > > There is no device that will connect to an inactive dport. > > > > This makes much more sense. > > > > Wouldn't it be better to use this patch and leave the old error > > messages/failures on duplicate port ids? IOW drop patch 1? > > The link check only works if Data Link Layer Link Active Reporting is > supported. That is why there is the patch 1. That patch also handles > cases where dups are seen that are unrelated to the link status. Isn't it a bug in the hardware to have 2 active dports with the same ID? > > > What happens if a link is having issues (perhaps going up and down) and > > RAS events fire for this dport? Does the lack of a dport object cause > > issues? > > Once a dport was enumerated it exists as long as the parent exists. Yes but do we care enough about this unused memory being held until the destruction of the device object? Currently I think it all get's torn down which means we are not keeping memory around. Ira
On Wed, 5 Mar 2025 11:01:23 +0100 Robert Richter <rrichter@amd.com> wrote: > When downstream ports are enumerated, some of them may not be > connected to a corresponding endpoint or upstream switch port. The > dport is inactive and its link status is down then. For permanently > disabled ports a HwInit configuration mechanism (set by hardware or > firmware) may assign a (further unused) default port number. The port > number may be set to the same value accross other inactive links. Spec reference needed. This sounds like it came from the spec but searching for it isn't finding anything. > Those duplicate port numbers cause the downstream port enumeration to > fail including the root or switch port initialization > (cxl_switch_port_probe()) and all its active downstream ports. > > Prevent a port initialization failure by checking the link status and > only enabling active dports. If a dport is inactive, there is no > matching component (endpoint or switch) connected to and thus, it must > not be enumerated and added to the kernel's CXL device hierarchy. > There is no device that will connect to an inactive dport. > > Signed-off-by: Robert Richter <rrichter@amd.com> > --- > drivers/cxl/core/pci.c | 30 ++++++++++++++++++++++++++---- > 1 file changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 524b8749cc0b..72683e1b41e3 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -32,6 +32,26 @@ struct cxl_walk_context { > int count; > }; > > +static int get_port_num(struct pci_dev *pdev) > +{ > + u32 lnkcap, port_num; > + u16 lnksta; > + > + if (pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap)) > + return -ENXIO; > + > + /* Skip inactive links */ > + if (lnkcap & PCI_EXP_LNKCAP_DLLLARC) { > + pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta); > + if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) > + return -ENOENT; > + } > + > + port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); > + > + return port_num; > +} > + > static int match_add_dports(struct pci_dev *pdev, void *data) > { > struct cxl_walk_context *ctx = data; > @@ -39,7 +59,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data) > int type = pci_pcie_type(pdev); > struct cxl_register_map map; > struct cxl_dport *dport; > - u32 lnkcap, port_num; > + int port_num; > int rc; > > if (pdev->bus != ctx->bus) > @@ -48,15 +68,17 @@ static int match_add_dports(struct pci_dev *pdev, void *data) > return 0; > if (type != ctx->type) > return 0; > - if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP, > - &lnkcap)) > + > + port_num = get_port_num(pdev); > + if (port_num == -ENOENT) > + pci_dbg(pdev, "Skipping dport, link is down\n"); > + if (port_num < 0) > return 0; > > rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map); > if (rc) > dev_dbg(&port->dev, "failed to find component registers\n"); > > - port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); > dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); > if (IS_ERR(dport)) { > rc = PTR_ERR(dport);
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 524b8749cc0b..72683e1b41e3 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -32,6 +32,26 @@ struct cxl_walk_context { int count; }; +static int get_port_num(struct pci_dev *pdev) +{ + u32 lnkcap, port_num; + u16 lnksta; + + if (pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap)) + return -ENXIO; + + /* Skip inactive links */ + if (lnkcap & PCI_EXP_LNKCAP_DLLLARC) { + pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta); + if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) + return -ENOENT; + } + + port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); + + return port_num; +} + static int match_add_dports(struct pci_dev *pdev, void *data) { struct cxl_walk_context *ctx = data; @@ -39,7 +59,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data) int type = pci_pcie_type(pdev); struct cxl_register_map map; struct cxl_dport *dport; - u32 lnkcap, port_num; + int port_num; int rc; if (pdev->bus != ctx->bus) @@ -48,15 +68,17 @@ static int match_add_dports(struct pci_dev *pdev, void *data) return 0; if (type != ctx->type) return 0; - if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP, - &lnkcap)) + + port_num = get_port_num(pdev); + if (port_num == -ENOENT) + pci_dbg(pdev, "Skipping dport, link is down\n"); + if (port_num < 0) return 0; rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map); if (rc) dev_dbg(&port->dev, "failed to find component registers\n"); - port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); dport = devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); if (IS_ERR(dport)) { rc = PTR_ERR(dport);
When downstream ports are enumerated, some of them may not be connected to a corresponding endpoint or upstream switch port. The dport is inactive and its link status is down then. For permanently disabled ports a HwInit configuration mechanism (set by hardware or firmware) may assign a (further unused) default port number. The port number may be set to the same value accross other inactive links. Those duplicate port numbers cause the downstream port enumeration to fail including the root or switch port initialization (cxl_switch_port_probe()) and all its active downstream ports. Prevent a port initialization failure by checking the link status and only enabling active dports. If a dport is inactive, there is no matching component (endpoint or switch) connected to and thus, it must not be enumerated and added to the kernel's CXL device hierarchy. There is no device that will connect to an inactive dport. Signed-off-by: Robert Richter <rrichter@amd.com> --- drivers/cxl/core/pci.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-)