Message ID | 20230413101642.8724-1-mika.westerberg@linux.intel.com (mailing list archive)
---|---
State | Superseded
Series | [v3] PCI/PM: Bail out early in pci_bridge_wait_for_secondary_bus() if link is not trained
On 4/13/23 3:16 AM, Mika Westerberg wrote:
> If the Root/Downstream Port supports active link reporting we can check
> if the link is trained before waiting for the device to respond. If the
> link is not trained, there is no point waiting for the whole ~60s so
> bail out early in that case.
>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> As discussed in the email thread of the previous version here:
>
> https://lore.kernel.org/linux-pci/20230404052714.51315-1-mika.westerberg@linux.intel.com/
>
> This adds the last change on top of
>
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=reset
>
>  drivers/pci/pci.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 0b4f3b08f780..61bf8a4b2099 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5037,6 +5037,22 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
>  		}
>  	}
>
> +	/*
> +	 * Everything above is handling the delays mandated by the PCIe r6.0
> +	 * sec 6.6.1.
> +	 *
> +	 * If the port supports active link reporting we now check one more
> +	 * time if the link is active and if not bail out early with the
> +	 * assumption that the device is not present anymore.
> +	 */
> +	if (dev->link_active_reporting) {
> +		u16 status;
> +
> +		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> +		if (!(status & PCI_EXP_LNKSTA_DLLLA))
> +			return -ENOTTY;
> +	}
> +
>  	return pci_dev_wait(child, reset_type,
>  			    PCIE_RESET_READY_POLL_MS - delay);
>  }
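For context: the `dev->link_active_reporting` flag the patch checks is set once at enumeration, from the Data Link Layer Link Active Reporting Capable (DLLLARC) bit in the Link Capabilities register. A minimal sketch of that detection, paraphrased from the logic in drivers/pci/probe.c (the wrapper function name here is hypothetical; only the capability check mirrors the kernel):

```c
#include <linux/pci.h>

/*
 * Sketch: how pdev->link_active_reporting gets populated at enumeration
 * time, paraphrased from set_pcie_port_type() in drivers/pci/probe.c.
 * Not verbatim kernel source; details vary by kernel version.
 */
static void pcie_detect_link_active_reporting(struct pci_dev *pdev)
{
	u32 linkcap;

	/* DLLLARC = Data Link Layer Link Active Reporting Capable */
	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
	if (linkcap & PCI_EXP_LNKCAP_DLLLARC)
		pdev->link_active_reporting = 1;
}
```

Ports running above 5 GT/s are required to support this capability, which is why the Gen1/Gen2 case needs separate handling in the discussion below.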
On Thu, Apr 13, 2023 at 01:16:42PM +0300, Mika Westerberg wrote:
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5037,6 +5037,22 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
>  		}
>  	}
>
> +	/*
> +	 * Everything above is handling the delays mandated by the PCIe r6.0
> +	 * sec 6.6.1.
> +	 *
> +	 * If the port supports active link reporting we now check one more
> +	 * time if the link is active and if not bail out early with the
> +	 * assumption that the device is not present anymore.
> +	 */
> +	if (dev->link_active_reporting) {
> +		u16 status;
> +
> +		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> +		if (!(status & PCI_EXP_LNKSTA_DLLLA))
> +			return -ENOTTY;
> +	}
> +
>  	return pci_dev_wait(child, reset_type,
>  			    PCIE_RESET_READY_POLL_MS - delay);
>  }

Hm, shouldn't the added code live in the

	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT)

branch? For the else branch (Gen3+ devices with > 5 GT/s), we've already
waited for the link to become active, so the additional check seems
superfluous. (But maybe I'm missing something.)

I also note that this documentation change has been dropped vis-à-vis v1
of the patch, not sure if that's intentional:

-	 * However, 100 ms is the minimum and the PCIe spec says the
-	 * software must allow at least 1s before it can determine that the
-	 * device that did not respond is a broken device. There is
-	 * evidence that 100 ms is not always enough, for example certain
-	 * Titan Ridge xHCI controller does not always respond to
-	 * configuration requests if we only wait for 100 ms (see
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
+	 * However, 100 ms is the minimum and the PCIe spec says the software
+	 * must allow at least 1s before it can determine that the device that
+	 * did not respond is a broken device. Also device can take longer than
+	 * that to respond if it indicates so through Request Retry Status
+	 * completions.

Thanks,

Lukas
Hi,

On Fri, Apr 14, 2023 at 09:42:38AM +0200, Lukas Wunner wrote:
> On Thu, Apr 13, 2023 at 01:16:42PM +0300, Mika Westerberg wrote:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5037,6 +5037,22 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
> >  		}
> >  	}
> >
> > +	/*
> > +	 * Everything above is handling the delays mandated by the PCIe r6.0
> > +	 * sec 6.6.1.
> > +	 *
> > +	 * If the port supports active link reporting we now check one more
> > +	 * time if the link is active and if not bail out early with the
> > +	 * assumption that the device is not present anymore.
> > +	 */
> > +	if (dev->link_active_reporting) {
> > +		u16 status;
> > +
> > +		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> > +		if (!(status & PCI_EXP_LNKSTA_DLLLA))
> > +			return -ENOTTY;
> > +	}
> > +
> >  	return pci_dev_wait(child, reset_type,
> >  			    PCIE_RESET_READY_POLL_MS - delay);
> >  }
>
> Hm, shouldn't the added code live in the
>
> 	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT)
>
> branch? For the else branch (Gen3+ devices with > 5 GT/s), we've already
> waited for the link to become active, so the additional check seems
> superfluous. (But maybe I'm missing something.)

You are not missing anything ;-) Indeed it should belong there, and I
think we are also now missing the "optimization" for devices behind a
slow link without active link reporting capabilities. That is, we wait
for the 1s instead of the whole 60s.

> I also note that this documentation change has been dropped vis-à-vis v1
> of the patch, not sure if that's intentional:
>
> -	 * However, 100 ms is the minimum and the PCIe spec says the
> -	 * software must allow at least 1s before it can determine that the
> -	 * device that did not respond is a broken device. There is
> -	 * evidence that 100 ms is not always enough, for example certain
> -	 * Titan Ridge xHCI controller does not always respond to
> -	 * configuration requests if we only wait for 100 ms (see
> -	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> +	 * However, 100 ms is the minimum and the PCIe spec says the software
> +	 * must allow at least 1s before it can determine that the device that
> +	 * did not respond is a broken device. Also device can take longer than
> +	 * that to respond if it indicates so through Request Retry Status
> +	 * completions.

This is not intentional. I will add it back in the next version.

To summarize, the v4 patch would look something like below. Only compile
tested but I will run real testing later today. I think it now includes
the 1s optimization and also checking of the active link reporting
support for the devices behind slow links. Let me know if I missed
something.

It is getting rather complex unfortunately :(

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 61bf8a4b2099..f81a9e6aff84 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -64,6 +64,13 @@ struct pci_pme_device {

 #define PME_TIMEOUT 1000 /* How long between PME checks */

+/*
+ * Following exit from Conventional Reset, devices must be ready within 1 sec
+ * (PCIe r6.0 sec 6.6.1). A D3cold to D0 transition implies a Conventional
+ * Reset (PCIe r6.0 sec 5.8).
+ */
+#define PCI_RESET_WAIT 1000 /* msec */
+
 /*
  * Devices may extend the 1 sec period through Request Retry Status
  * completions (PCIe r6.0 sec 2.3.1). The spec does not provide an upper
@@ -5010,13 +5017,11 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	 * speeds (gen3) we need to wait first for the data link layer to
 	 * become active.
 	 *
-	 * However, 100 ms is the minimum and the PCIe spec says the
-	 * software must allow at least 1s before it can determine that the
-	 * device that did not respond is a broken device. There is
-	 * evidence that 100 ms is not always enough, for example certain
-	 * Titan Ridge xHCI controller does not always respond to
-	 * configuration requests if we only wait for 100 ms (see
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
+	 * However, 100 ms is the minimum and the PCIe spec says the software
+	 * must allow at least 1s before it can determine that the device that
+	 * did not respond is a broken device. Also device can take longer than
+	 * that to respond if it indicates so through Request Retry Status
+	 * completions.
 	 *
 	 * Therefore we wait for 100 ms and check for the device presence
 	 * until the timeout expires.
@@ -5027,30 +5032,29 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
-	} else {
-		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
-			delay);
-		if (!pcie_wait_for_link_delay(dev, true, delay)) {
-			/* Did not train, no need to wait any further */
-			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
-			return -ENOTTY;
+
+		/*
+		 * If the port supports active link reporting we now check one
+		 * more time if the link is active and if not bail out early
+		 * with the assumption that the device is not present anymore.
+		 */
+		if (dev->link_active_reporting) {
+			u16 status;
+
+			pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
+			if (!(status & PCI_EXP_LNKSTA_DLLLA))
+				return -ENOTTY;
 		}
-	}

-	/*
-	 * Everything above is handling the delays mandated by the PCIe r6.0
-	 * sec 6.6.1.
-	 *
-	 * If the port supports active link reporting we now check one more
-	 * time if the link is active and if not bail out early with the
-	 * assumption that the device is not present anymore.
-	 */
-	if (dev->link_active_reporting) {
-		u16 status;
+		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
+	}

-		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
-		if (!(status & PCI_EXP_LNKSTA_DLLLA))
-			return -ENOTTY;
+	pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
+		delay);
+	if (!pcie_wait_for_link_delay(dev, true, delay)) {
+		/* Did not train, no need to wait any further */
+		pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
+		return -ENOTTY;
 	}

 	return pci_dev_wait(child, reset_type,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 022da58afb33..f2d3aeab91f4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -64,13 +64,6 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
 #define PCI_PM_D3HOT_WAIT	10	/* msec */
 #define PCI_PM_D3COLD_WAIT	100	/* msec */

-/*
- * Following exit from Conventional Reset, devices must be ready within 1 sec
- * (PCIe r6.0 sec 6.6.1). A D3cold to D0 transition implies a Conventional
- * Reset (PCIe r6.0 sec 5.8).
- */
-#define PCI_RESET_WAIT		1000	/* msec */
-
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
 void pci_refresh_power_state(struct pci_dev *dev);
 int pci_power_up(struct pci_dev *dev);
On Fri, Apr 14, 2023 at 01:11:47PM +0300, Mika Westerberg wrote:
> To summarize, the v4 patch would look something like below. Only compile
> tested but I will run real testing later today. I think it now includes
> the 1s optimization and also checking of the active link reporting
> support for the devices behind slow links. Let me know if I missed
> something.

The patch seems to be based on a branch which has the v3 patch applied
instead of on pci.git/reset, and that makes it slightly more difficult
to review, but from a first glance it LGTM.

> It is getting rather complex unfortunately :(

I disagree. :) Basically the Gen1/Gen2 situation becomes a special case
because it has specific timing requirements (need to observe a delay
before accessing the Secondary Bus, instead of waiting for the link)
and it doesn't necessarily support link active reporting. So special
casing it seems fair to me.

> -	 * However, 100 ms is the minimum and the PCIe spec says the
> -	 * software must allow at least 1s before it can determine that the
> -	 * device that did not respond is a broken device. There is
> -	 * evidence that 100 ms is not always enough, for example certain
> -	 * Titan Ridge xHCI controller does not always respond to
> -	 * configuration requests if we only wait for 100 ms (see
> -	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> +	 * However, 100 ms is the minimum and the PCIe spec says the software
> +	 * must allow at least 1s before it can determine that the device that
> +	 * did not respond is a broken device. Also device can take longer than
> +	 * that to respond if it indicates so through Request Retry Status
> +	 * completions.

It *might* be worth avoiding the rewrapping of the first 3 lines to
make the patch smaller, your choice.

> +
> +		/*
> +		 * If the port supports active link reporting we now check one
> +		 * more time if the link is active and if not bail out early
> +		 * with the assumption that the device is not present anymore.
> +		 */

Nit: Drop the "one more time" because it seems this is actually the
first time the link is checked.

Somewhat tangentially, I note that pcie_wait_for_link_delay() has a
"if (!pdev->link_active_reporting)" branch right at its top, however
pci_bridge_wait_for_secondary_bus() only calls the function in the
Gen3+ (> 5 GT/s) case, which always supports link active reporting.

Thus the branch is never taken when pcie_wait_for_link_delay() is called
from pci_bridge_wait_for_secondary_bus(). There's only one other caller,
pcie_wait_for_link(). So moving the "if (!pdev->link_active_reporting)"
branch to pcie_wait_for_link() *might* make the code more readable.
Just a thought.

Thanks,

Lukas
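For reference, the branch Lukas points at sits right at the top of pcie_wait_for_link_delay() in drivers/pci/pci.c. A paraphrased sketch (not verbatim kernel source; the polling body is elided):

```c
#include <linux/delay.h>
#include <linux/pci.h>

/*
 * Paraphrased sketch of the branch under discussion. When the port
 * cannot report link status, the only option is to sleep for the
 * worst case and assume the link came up. This path is dead code for
 * the Gen3+ callers Lukas describes, because ports running above
 * 5 GT/s must support link active reporting.
 */
static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
				     int delay)
{
	int timeout = 1000;

	if (!pdev->link_active_reporting) {
		msleep(timeout + delay);
		return true;
	}

	/* ... otherwise poll PCI_EXP_LNKSTA for PCI_EXP_LNKSTA_DLLLA ... */
	return true;
}
```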
Hi,

On Sun, Apr 16, 2023 at 09:48:46AM +0200, Lukas Wunner wrote:
> On Fri, Apr 14, 2023 at 01:11:47PM +0300, Mika Westerberg wrote:
> > To summarize, the v4 patch would look something like below. Only compile
> > tested but I will run real testing later today. I think it now includes
> > the 1s optimization and also checking of the active link reporting
> > support for the devices behind slow links. Let me know if I missed
> > something.
>
> The patch seems to be based on a branch which has the v3 patch applied
> instead of on pci.git/reset, and that makes it slightly more difficult
> to review, but from a first glance it LGTM.

Oops :( I forgot to rebase it. Sorry about that.

> > It is getting rather complex unfortunately :(
>
> I disagree. :) Basically the Gen1/Gen2 situation becomes a special case
> because it has specific timing requirements (need to observe a delay
> before accessing the Secondary Bus, instead of waiting for the link)
> and it doesn't necessarily support link active reporting. So special
> casing it seems fair to me.

OK.

> > -	 * However, 100 ms is the minimum and the PCIe spec says the
> > -	 * software must allow at least 1s before it can determine that the
> > -	 * device that did not respond is a broken device. There is
> > -	 * evidence that 100 ms is not always enough, for example certain
> > -	 * Titan Ridge xHCI controller does not always respond to
> > -	 * configuration requests if we only wait for 100 ms (see
> > -	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> > +	 * However, 100 ms is the minimum and the PCIe spec says the software
> > +	 * must allow at least 1s before it can determine that the device that
> > +	 * did not respond is a broken device. Also device can take longer than
> > +	 * that to respond if it indicates so through Request Retry Status
> > +	 * completions.
>
> It *might* be worth avoiding the rewrapping of the first 3 lines to
> make the patch smaller, your choice.

Makes sense, will do.

> > +
> > +		/*
> > +		 * If the port supports active link reporting we now check one
> > +		 * more time if the link is active and if not bail out early
> > +		 * with the assumption that the device is not present anymore.
> > +		 */
>
> Nit: Drop the "one more time" because it seems this is actually the
> first time the link is checked.

Sure.

> Somewhat tangentially, I note that pcie_wait_for_link_delay() has a
> "if (!pdev->link_active_reporting)" branch right at its top, however
> pci_bridge_wait_for_secondary_bus() only calls the function in the
> Gen3+ (> 5 GT/s) case, which always supports link active reporting.
>
> Thus the branch is never taken when pcie_wait_for_link_delay() is called
> from pci_bridge_wait_for_secondary_bus(). There's only one other caller,
> pcie_wait_for_link(). So moving the "if (!pdev->link_active_reporting)"
> branch to pcie_wait_for_link() *might* make the code more readable.
> Just a thought.

Indeed, however moving it would make the two functions behave
differently, whereas before pcie_wait_for_link() is just a common
wrapper on top of pcie_wait_for_link_delay(). At least if we do this
change, the documentation should be updated too, and
pcie_wait_for_link_delay() possibly renamed to
__pcie_wait_for_link_delay() or so to make it clear it is internal to
that file.
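A rough sketch of the refactor being discussed, with the fallback moved into the public wrapper and the helper renamed as Mika suggests. This is hypothetical; the structure, names, and timeouts are illustrative only, not a finished implementation:

```c
#include <linux/delay.h>
#include <linux/pci.h>

/*
 * Hypothetical sketch: after the refactor, callers of the helper must
 * guarantee that link active reporting is supported, so the helper can
 * poll PCI_EXP_LNKSTA for PCI_EXP_LNKSTA_DLLLA unconditionally
 * (polling body elided).
 */
static bool __pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
				       int delay)
{
	/* ... poll link status up to the timeout ... */
	return true;
}

bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
{
	/*
	 * The !link_active_reporting fallback now lives only in the
	 * wrapper: without reporting there is nothing to poll, so sleep
	 * for the worst case and assume success.
	 */
	if (!pdev->link_active_reporting) {
		msleep(1100);	/* 1000 ms timeout + 100 ms settle delay */
		return true;
	}

	return __pcie_wait_for_link_delay(pdev, active, 100);
}
```

This addresses Mika's caveat by making the behavioral split explicit: the wrapper keeps the old semantics for all existing callers, while the double-underscore name signals that the helper is file-internal with the stricter precondition.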
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0b4f3b08f780..61bf8a4b2099 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5037,6 +5037,22 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		}
 	}

+	/*
+	 * Everything above is handling the delays mandated by the PCIe r6.0
+	 * sec 6.6.1.
+	 *
+	 * If the port supports active link reporting we now check one more
+	 * time if the link is active and if not bail out early with the
+	 * assumption that the device is not present anymore.
+	 */
+	if (dev->link_active_reporting) {
+		u16 status;
+
+		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
+		if (!(status & PCI_EXP_LNKSTA_DLLLA))
+			return -ENOTTY;
+	}
+
 	return pci_dev_wait(child, reset_type,
 			    PCIE_RESET_READY_POLL_MS - delay);
 }
If the Root/Downstream Port supports active link reporting we can check
if the link is trained before waiting for the device to respond. If the
link is not trained, there is no point waiting for the whole ~60s so
bail out early in that case.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
As discussed in the email thread of the previous version here:

https://lore.kernel.org/linux-pci/20230404052714.51315-1-mika.westerberg@linux.intel.com/

This adds the last change on top of

https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=reset

 drivers/pci/pci.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
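For reference, the "~60s" in the changelog corresponds to the polling upper bound used by pci_dev_wait(), defined in drivers/pci/pci.c around this time (value may differ in other kernel versions):

```c
/*
 * Devices may extend the 1 sec readiness period through Request Retry
 * Status completions (PCIe r6.0 sec 2.3.1). The spec gives no upper
 * limit, so the kernel caps the wait at 60 seconds; this is the wait
 * the patch short-circuits when the link is known to be down.
 */
#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
```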