Message ID | 1499375234-23928-1-git-send-email-okaya@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Thu, Jul 06, 2017 at 05:07:14PM -0400, Sinan Kaya wrote: > An endpoint is allowed to issue Configuration Request Retry Status (CRS) > following a Function Level Reset (FLR) request to indicate that it is not > ready to accept new requests. > > Seen a timeout message with Intel 750 NVMe drive and FLR reset. > > Kernel enables CRS visibility in pci_enable_crs() function for each bridge > it discovers. The OS observes a special vendor ID read value of 0xFFFF0001 > in this case. We need to keep polling until this special read value > disappears. pci_bus_read_dev_vendor_id() takes care of CRS handling for a > given vendor id read request under the covers. > > Adding a vendor ID read if this is a physical function before attempting > to read any other registers on the endpoint. A CRS indication will only > be given if the address to be read is vendor ID register. > > Note that virtual functions report their vendor ID through another > mechanism. > > The spec is calling to wait up to 1 seconds if the device is sending CRS. > The NVMe device seems to be requiring more. Relax this up to 60 seconds. Can you add a pointer to the "1 second" requirement in the spec here? We use 60 seconds in pci_scan_device() and acpiphp_add_context(). Is there a basis in the spec for the 60 second timeout? What's the NVMe excuse for requiring more time than the spec allows? Is this a hardware erratum? Is there some PCIe ECN pending to address this? I try to avoid adding generic changes based on one specific piece of hardware because it can penalize everybody else who actually bothered to follow the spec. For example, if FLR fails because a non-NVMe device is broken, it will now take 60 seconds to notice that instead of 1 second. > Signed-off-by: Sinan Kaya <okaya@codeaurora.org> > --- > drivers/pci/pci.c | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index aab9d51..83a9784 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -3723,10 +3723,16 @@ static void pci_flr_wait(struct pci_dev *dev) > int i = 0; > u32 id; > > - do { > - msleep(100); > - pci_read_config_dword(dev, PCI_COMMAND, &id); > - } while (i++ < 10 && id == ~0); > + if (dev->is_virtfn) { > + do { > + msleep(100); > + pci_read_config_dword(dev, PCI_COMMAND, &id); > + } while (i++ < 10 && id == ~0); > + } else { > + if (!pci_bus_read_dev_vendor_id(dev->bus, dev->devfn, &id, > + 60*1000)) > + id = ~0; > + } > > if (id == ~0) > dev_warn(&dev->dev, "Failed to return from FLR\n"); > -- > 1.9.1 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 7/13/2017 8:17 AM, Bjorn Helgaas wrote: >> he spec is calling to wait up to 1 seconds if the device is sending CRS. >> The NVMe device seems to be requiring more. Relax this up to 60 seconds. > Can you add a pointer to the "1 second" requirement in the spec here? > We use 60 seconds in pci_scan_device() and acpiphp_add_context(). Is > there a basis in the spec for the 60 second timeout? This does not specify a hard limit above on how long SW need to wait. "6.6.2 Function Level Reset After an FLR has been initiated by writing a 1b to the Initiate Function Level Reset bit, the Function must complete the FLR within 100 ms. While a Function is required to complete the FLR operation within the time limit described above, the subsequent Function-specific initialization sequence may require additional time. If additional time is required, the Function must return a Configuration Request Retry Status (CRS) Completion Status when a Configuration Request is received 15 after the time limit above. After the Function responds to a Configuration Request with a Completion status other than CRS, it is not permitted to return CRS until it is reset again." However, another indirect reference here tells us it is capped by 1 second below. "6.23. Readiness Notifications (RN) Readiness Notifications (RN) is intended to reduce the time software needs to wait before issuing Configuration Requests to a Device or Function following DRS Events or FRS Events. RN includes both the Device Readiness Status (DRS) and Function Readiness Status (FRS) mechanisms. These mechanisms provide a direct indication of Configuration-Readiness (see 5 Terms and Acronyms entry for “Configuration-Ready”). When used, DRS and FRS allow an improved behavior over the CRS mechanism, and eliminate its associated periodic polling time of up to 1 second following a reset." If I remember it right from CRS commit messages, 60 seconds was coming from some PCIe switch taking too long to boot. > > What's the NVMe excuse for requiring more time than the spec allows? > Is this a hardware erratum? Is there some PCIe ECN pending to address > this? We have seen the issue with Intel 750 and Intel P3600 NVMe drives. I don't have access to the errata document for either of the drives. > > I try to avoid adding generic changes based on one specific piece of > hardware because it can penalize everybody else who actually bothered > to follow the spec. For example, if FLR fails because a non-NVMe > device is broken, it will now take 60 seconds to notice that instead > of 1 second. > We can look for a better number like 3-4 seconds and put some nice warning that HW might be broken (violating the spec) and could be in need of a FW/BIOS update. What do you think?
On Thu, Jul 13, 2017 at 07:17:58AM -0500, Bjorn Helgaas wrote: > On Thu, Jul 06, 2017 at 05:07:14PM -0400, Sinan Kaya wrote: > > An endpoint is allowed to issue Configuration Request Retry Status (CRS) > > following a Function Level Reset (FLR) request to indicate that it is not > > ready to accept new requests. > > > > Seen a timeout message with Intel 750 NVMe drive and FLR reset. > > > > Kernel enables CRS visibility in pci_enable_crs() function for each bridge > > it discovers. The OS observes a special vendor ID read value of 0xFFFF0001 > > in this case. We need to keep polling until this special read value > > disappears. pci_bus_read_dev_vendor_id() takes care of CRS handling for a > > given vendor id read request under the covers. > > > > Adding a vendor ID read if this is a physical function before attempting > > to read any other registers on the endpoint. A CRS indication will only > > be given if the address to be read is vendor ID register. > > > > Note that virtual functions report their vendor ID through another > > mechanism. > > > > The spec is calling to wait up to 1 seconds if the device is sending CRS. > > The NVMe device seems to be requiring more. Relax this up to 60 seconds. > > Can you add a pointer to the "1 second" requirement in the spec here? > We use 60 seconds in pci_scan_device() and acpiphp_add_context(). Is > there a basis in the spec for the 60 second timeout? I also don't see anywhere that says CRS is limited to only 1 second. It looks to me that the spec allows a device to return CRS for as long as it takes to complete initialization. From PCIe Base Spec, Section 2.3.1 CRS Implementation note: A device in receipt of a Configuration Request following a valid reset condition may respond with a CRS Completion Status to terminate the Request, and thus effectively stall the Configuration Request until such time that the subsystem has completed local initialization and is ready to communicate with the host. No time limit specified here, or anywhere else for that matter AFAICT. Where is 1 second requirement coming from?
On Thu, Jul 13, 2017 at 11:44:12AM -0400, Sinan Kaya wrote: > On 7/13/2017 8:17 AM, Bjorn Helgaas wrote: > >> he spec is calling to wait up to 1 seconds if the device is sending CRS. > >> The NVMe device seems to be requiring more. Relax this up to 60 seconds. > > Can you add a pointer to the "1 second" requirement in the spec here? > > We use 60 seconds in pci_scan_device() and acpiphp_add_context(). Is > > there a basis in the spec for the 60 second timeout? > > This does not specify a hard limit above on how long SW need to wait. > > "6.6.2 Function Level Reset > After an FLR has been initiated by writing a 1b to the Initiate Function Level Reset bit, > the Function must complete the FLR within 100 ms. > > While a Function is required to complete the FLR operation within the time limit described above, > the subsequent Function-specific initialization sequence may require additional time. > If additional time is required, the Function must return a Configuration Request Retry Status (CRS) > Completion Status when a Configuration Request is received 15 after the time limit above. > After the Function responds to a Configuration Request with a Completion status other than CRS, > it is not permitted to return CRS until it is reset again." > > However, another indirect reference here tells us it is capped by 1 second below. > > "6.23. Readiness Notifications (RN) > Readiness Notifications (RN) is intended to reduce the time software needs to > wait before issuing Configuration Requests to a Device or Function following DRS > Events or FRS Events. RN includes both the Device Readiness Status (DRS) and > Function Readiness Status (FRS) mechanisms. These mechanisms provide a direct > indication of Configuration-Readiness (see 5 Terms and Acronyms entry for “Configuration-Ready”). > > When used, DRS and FRS allow an improved behavior over the CRS mechanism, and eliminate > its associated periodic polling time of up to 1 second following a reset." That wording is just confusing. It looks to me the 1 second polling is to be used following a reset if CRS is not implemented. https://pcisig.com/sites/default/files/specification_documents/ECN_RN_29_Aug_2013.pdf " Through the mechanisms defined by this ECR, we can avoid the long, architected, fixed delays following various forms of reset before software is permitted to perform its first Configuration Request. These delays are very large: 1 second if Configuration Retry Status (CRS) is not used " It goes on to say CRS is usually much lower, but doesn't specify an upper bound either.
On 7/13/2017 12:29 PM, Keith Busch wrote: >> When used, DRS and FRS allow an improved behavior over the CRS mechanism, and eliminate >> its associated periodic polling time of up to 1 second following a reset." > That wording is just confusing. It looks to me the 1 second polling is > to be used following a reset if CRS is not implemented. > > https://pcisig.com/sites/default/files/specification_documents/ECN_RN_29_Aug_2013.pdf > > " > Through the mechanisms defined by this ECR, we can avoid the long, > architected, fixed delays following various forms of reset before > software is permitted to perform its first Configuration Request. These > delays are very large: > > 1 second if Configuration Retry Status (CRS) is not used > " > > It goes on to say CRS is usually much lower, but doesn't specify an > upper bound either. > I see, we got caught on spec language where we don't know what 'its' is. Bjorn, Since there is no upper cap on how long, what is your preference (stick to 60), give incremental warning updates every 5 seconds? I can certainly rewrite the commit message. Sinan
On Thu, Jul 13, 2017 at 12:42:44PM -0400, Sinan Kaya wrote: > On 7/13/2017 12:29 PM, Keith Busch wrote: > > That wording is just confusing. It looks to me the 1 second polling is > > to be used following a reset if CRS is not implemented. > > > > https://pcisig.com/sites/default/files/specification_documents/ECN_RN_29_Aug_2013.pdf > > > > " > > Through the mechanisms defined by this ECR, we can avoid the long, > > architected, fixed delays following various forms of reset before > > software is permitted to perform its first Configuration Request. These > > delays are very large: > > > > 1 second if Configuration Retry Status (CRS) is not used > > " > > > > It goes on to say CRS is usually much lower, but doesn't specify an > > upper bound either. > > > > I see, we got caught on spec language where we don't know what 'its' is. Well, I don't know for certain if your original interpretation is incorrect. Just saying the CRS intention doesn't explicitly stand out to me. On a side note, I'll also see if I can get clarification on what expectations the hardware people have for this particular product. Your observation seems a little high to me, but I don't know if that's outside the product's limits.
On Thu, Jul 13, 2017 at 11:44:12AM -0400, Sinan Kaya wrote: > On 7/13/2017 8:17 AM, Bjorn Helgaas wrote: > >> he spec is calling to wait up to 1 seconds if the device is sending CRS. > >> The NVMe device seems to be requiring more. Relax this up to 60 seconds. > > Can you add a pointer to the "1 second" requirement in the spec here? > > We use 60 seconds in pci_scan_device() and acpiphp_add_context(). Is > > there a basis in the spec for the 60 second timeout? > > This does not specify a hard limit above on how long SW need to wait. I wouldn't expect a *maximum* time we can wait. I'm looking for the minimum times the spec requires. If you're claiming "the spec is calling to wait up to 1 second", I just want to know where in the spec it says that. That helps in the future when we need to maintain code like this. > If I remember it right from CRS commit messages, 60 seconds was coming from > some PCIe switch taking too long to boot. If you have a pointer to this, please include it. The earliest thing I can find is when Linux was imported into git (1da177e4c3f4 ("Linux-2.6.12-rc2")), which includes a 60 second timeout: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/probe.c?id=1da177e4c3f4#n686
On Thu, Jul 06, 2017 at 05:07:14PM -0400, Sinan Kaya wrote: > An endpoint is allowed to issue Configuration Request Retry Status (CRS) > following a Function Level Reset (FLR) request to indicate that it is not > ready to accept new requests. > > Seen a timeout message with Intel 750 NVMe drive and FLR reset. > > Kernel enables CRS visibility in pci_enable_crs() function for each bridge > it discovers. The OS observes a special vendor ID read value of 0xFFFF0001 > in this case. We need to keep polling until this special read value > disappears. pci_bus_read_dev_vendor_id() takes care of CRS handling for a > given vendor id read request under the covers. This patch isn't about how CRS works; we already have support for that. So this paragraph is mostly extraneous and can be replaced by a simple reference to CRS in the spec. > Adding a vendor ID read if this is a physical function before attempting > to read any other registers on the endpoint. A CRS indication will only > be given if the address to be read is vendor ID register. > > Note that virtual functions report their vendor ID through another > mechanism. How VFs report vendor ID is irrelevant. What *is* relevant is how FLR affects VFs. The SR-IOV spec, r1.1, sec 2.2.2, says FLR doesn't affect a VF's existence in config space. That suggests to me that reading a VF's PCI_COMMAND register after an FLR should always return valid data (never ~0). I suppose an FLR VF reset isn't instantaneous, though, so I suppose we do need the 100ms delay. But maybe we should just return immediately after that, without reading any VF config space? pci_flr_wait() was added by 5adecf817dd6 ("PCI: Wait for up to 1000ms after FLR reset"); maybe Alex has more insight into this. > The spec is calling to wait up to 1 seconds if the device is sending CRS. > The NVMe device seems to be requiring more. Relax this up to 60 seconds. > > Signed-off-by: Sinan Kaya <okaya@codeaurora.org> > --- > drivers/pci/pci.c | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index aab9d51..83a9784 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -3723,10 +3723,16 @@ static void pci_flr_wait(struct pci_dev *dev) > int i = 0; > u32 id; > > - do { > - msleep(100); > - pci_read_config_dword(dev, PCI_COMMAND, &id); > - } while (i++ < 10 && id == ~0); > + if (dev->is_virtfn) { > + do { > + msleep(100); > + pci_read_config_dword(dev, PCI_COMMAND, &id); > + } while (i++ < 10 && id == ~0); > + } else { > + if (!pci_bus_read_dev_vendor_id(dev->bus, dev->devfn, &id, > + 60*1000)) > + id = ~0; > + } > > if (id == ~0) > dev_warn(&dev->dev, "Failed to return from FLR\n"); > -- > 1.9.1 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 7/13/2017 7:38 PM, Bjorn Helgaas wrote: >> This does not specify a hard limit above on how long SW need to wait. > I wouldn't expect a *maximum* time we can wait. I'm looking for the > minimum times the spec requires. My understanding is FLR needs to finish in 100ms max under normal circumstances. If an endpoint needs more, it needs to issue CRS. "After an FLR has been initiated by writing a 1b to the Initiate Function Level Reset bit, the Function must complete the FLR within 100 ms. ... it is recommended that software allow as much time as provided by the pre-FLR value for Completion Timeout on the device. If Completion Timeouts were disabled on the Function when FLR was issued, then the delay is system dependent but must be no less than 100 ms." The only minimum I found is in the last paragraph where somebody actually disables completion timeout. I don't know why anyone would do that. > > If you're claiming "the spec is calling to wait up to 1 second", I > just want to know where in the spec it says that. That helps in the > future when we need to maintain code like this. > Keith and I discussed this here. https://www.spinics.net/lists/arm-kernel/msg593493.html We have a spec language problem. My interpretation of this is 1 seconds max for CRS. "When used, DRS and FRS allow an improved behavior over the CRS mechanism, and eliminate its associated periodic polling time of up to 1 second following a reset." The ECN is referring to conventional reset as 1 second max rather than CRS. https://www.spinics.net/lists/arm-kernel/msg593500.html
Hi Bjorn, On 7/13/2017 7:49 PM, Bjorn Helgaas wrote: > How VFs report vendor ID is irrelevant. I was trying to highlight this. "SR-IOV spec rev 1.1, 3.4.1.1 & 3.4.1.2, Vendor ID and Device ID fields for the VF return 0xFFFF when read. The "Virtualization Intermediary" is supposed to use the vendor ID from the PF and the device ID defined in the PF SR-IOV capability." https://www.spinics.net/lists/linux-pci/msg58530.html The point is we can't use pci_bus_read_dev_vendor_id() for both virtual and physical functions as in my V3 patch. > > What *is* relevant is how FLR affects VFs. The SR-IOV spec, r1.1, > sec 2.2.2, says FLR doesn't affect a VF's existence in config space. > I see. > That suggests to me that reading a VF's PCI_COMMAND register after an > FLR should always return valid data (never ~0). I suppose an FLR VF > reset isn't instantaneous, though, so I suppose we do need the 100ms > delay. But maybe we should just return immediately after that, > without reading any VF config space? make sense. > > pci_flr_wait() was added by 5adecf817dd6 ("PCI: Wait for up to 1000ms > after FLR reset"); maybe Alex has more insight into this. I think code is handling both virtual and physical functions. 1 second is nice to have for physical functions. See this comment at the top. /* * We should only need to wait 100ms after FLR, but some devices take longer. * Wait for up to 1000ms for config space to return something other than -1. * Intel IGD requires this when an LCD panel is attached. We read the 2nd * dword because VFs don't implement the 1st dword. */ Since we are separating the handling of physical and virtual functions, we could go back to 100ms for virtual functions and handle CRS/wait for physical functions. Sinan
Hi Bjorn, On 7/13/2017 7:49 PM, Bjorn Helgaas wrote: > On Thu, Jul 06, 2017 at 05:07:14PM -0400, Sinan Kaya wrote: >> An endpoint is allowed to issue Configuration Request Retry Status (CRS) >> following a Function Level Reset (FLR) request to indicate that it is not >> ready to accept new requests. >> >> Seen a timeout message with Intel 750 NVMe drive and FLR reset. >> >> Kernel enables CRS visibility in pci_enable_crs() function for each bridge >> it discovers. The OS observes a special vendor ID read value of 0xFFFF0001 >> in this case. We need to keep polling until this special read value >> disappears. pci_bus_read_dev_vendor_id() takes care of CRS handling for a >> given vendor id read request under the covers. > > This patch isn't about how CRS works; we already have support for > that. So this paragraph is mostly extraneous and can be replaced by a > simple reference to CRS in the spec. > >> Adding a vendor ID read if this is a physical function before attempting >> to read any other registers on the endpoint. A CRS indication will only >> be given if the address to be read is vendor ID register. >> >> Note that virtual functions report their vendor ID through another >> mechanism. > > How VFs report vendor ID is irrelevant. > > What *is* relevant is how FLR affects VFs. The SR-IOV spec, r1.1, > sec 2.2.2, says FLR doesn't affect a VF's existence in config space. > > That suggests to me that reading a VF's PCI_COMMAND register after an > FLR should always return valid data (never ~0). I suppose an FLR VF > reset isn't instantaneous, though, so I suppose we do need the 100ms > delay. But maybe we should just return immediately after that, > without reading any VF config space? > > pci_flr_wait() was added by 5adecf817dd6 ("PCI: Wait for up to 1000ms > after FLR reset"); maybe Alex has more insight into this. > >> The spec is calling to wait up to 1 seconds if the device is sending CRS. >> The NVMe device seems to be requiring more. Relax this up to 60 seconds. >> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org> >> --- >> drivers/pci/pci.c | 14 ++++++++++---- >> 1 file changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index aab9d51..83a9784 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -3723,10 +3723,16 @@ static void pci_flr_wait(struct pci_dev *dev) >> int i = 0; >> u32 id; >> >> - do { >> - msleep(100); >> - pci_read_config_dword(dev, PCI_COMMAND, &id); >> - } while (i++ < 10 && id == ~0); >> + if (dev->is_virtfn) { >> + do { >> + msleep(100); >> + pci_read_config_dword(dev, PCI_COMMAND, &id); >> + } while (i++ < 10 && id == ~0); >> + } else { >> + if (!pci_bus_read_dev_vendor_id(dev->bus, dev->devfn, &id, >> + 60*1000)) >> + id = ~0; >> + } >> >> if (id == ~0) >> dev_warn(&dev->dev, "Failed to return from FLR\n"); >> -- >> 1.9.1 >> >> Welcome back from vacation. Let's pick up from where we left. Based on our email conversation, here are things I noted: 1. You want to go back to 100ms for virtual functions. 2. You want to print some user visible information when CRS is taking longer. I can call the vendor ID function 60 times and print a message on each loop if it helps. 3. You are looking for maximum CRS time reference from the spec. The reference depends on how you interpret it. I interpreted it as 1 second maximum CRS time. Keith interpreted it as 1 second being a reference to conventional reset maximum time rather than CRS time. The fact that no time limit specified in the CRS chapter also makes me lean towards Keith's interpretation. 4. I'll clean up the commit message based on your feedback. Please let me know if I missed anything. Sinan
On Mon, Jul 31, 2017 at 05:45:46PM -0400, Sinan Kaya wrote: > Hi Bjorn, > > On 7/13/2017 7:49 PM, Bjorn Helgaas wrote: > > On Thu, Jul 06, 2017 at 05:07:14PM -0400, Sinan Kaya wrote: > >> An endpoint is allowed to issue Configuration Request Retry Status (CRS) > >> following a Function Level Reset (FLR) request to indicate that it is not > >> ready to accept new requests. > >> > >> Seen a timeout message with Intel 750 NVMe drive and FLR reset. > >> > >> Kernel enables CRS visibility in pci_enable_crs() function for each bridge > >> it discovers. The OS observes a special vendor ID read value of 0xFFFF0001 > >> in this case. We need to keep polling until this special read value > >> disappears. pci_bus_read_dev_vendor_id() takes care of CRS handling for a > >> given vendor id read request under the covers. > > > > This patch isn't about how CRS works; we already have support for > > that. So this paragraph is mostly extraneous and can be replaced by a > > simple reference to CRS in the spec. > > > >> Adding a vendor ID read if this is a physical function before attempting > >> to read any other registers on the endpoint. A CRS indication will only > >> be given if the address to be read is vendor ID register. > >> > >> Note that virtual functions report their vendor ID through another > >> mechanism. > > > > How VFs report vendor ID is irrelevant. > > > > What *is* relevant is how FLR affects VFs. The SR-IOV spec, r1.1, > > sec 2.2.2, says FLR doesn't affect a VF's existence in config space. > > > > That suggests to me that reading a VF's PCI_COMMAND register after an > > FLR should always return valid data (never ~0). I suppose an FLR VF > > reset isn't instantaneous, though, so I suppose we do need the 100ms > > delay. But maybe we should just return immediately after that, > > without reading any VF config space? > > > > pci_flr_wait() was added by 5adecf817dd6 ("PCI: Wait for up to 1000ms > > after FLR reset"); maybe Alex has more insight into this. > > > >> The spec is calling to wait up to 1 seconds if the device is sending CRS. > >> The NVMe device seems to be requiring more. Relax this up to 60 seconds. > >> > >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org> > >> --- > >> drivers/pci/pci.c | 14 ++++++++++---- > >> 1 file changed, 10 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > >> index aab9d51..83a9784 100644 > >> --- a/drivers/pci/pci.c > >> +++ b/drivers/pci/pci.c > >> @@ -3723,10 +3723,16 @@ static void pci_flr_wait(struct pci_dev *dev) > >> int i = 0; > >> u32 id; > >> > >> - do { > >> - msleep(100); > >> - pci_read_config_dword(dev, PCI_COMMAND, &id); > >> - } while (i++ < 10 && id == ~0); > >> + if (dev->is_virtfn) { > >> + do { > >> + msleep(100); > >> + pci_read_config_dword(dev, PCI_COMMAND, &id); > >> + } while (i++ < 10 && id == ~0); > >> + } else { > >> + if (!pci_bus_read_dev_vendor_id(dev->bus, dev->devfn, &id, > >> + 60*1000)) > >> + id = ~0; > >> + } > >> > >> if (id == ~0) > >> dev_warn(&dev->dev, "Failed to return from FLR\n"); > >> -- > >> 1.9.1 > >> > >> > > Welcome back from vacation. Let's pick up from where we left. > > Based on our email conversation, here are things I noted: > > 1. You want to go back to 100ms for virtual functions. This should be a separate patch with a changelog and a comment that references the spec, e.g., SR-IOV r1.1, sec 2.2.2 or whatever seems most relevant. > 2. You want to print some user visible information when CRS is > taking longer. I can call the vendor ID function 60 times and print > a message on each loop if it helps. I don't remember the context of this. 60 copies of a message sounds like too much. > 3. You are looking for maximum CRS time reference from the spec. The > reference depends on how you interpret it. I interpreted it as 1 > second maximum CRS time. Keith interpreted it as 1 second being a > reference to conventional reset maximum time rather than CRS time. > The fact that no time limit specified in the CRS chapter also makes > me lean towards Keith's interpretation. I'm looking for the *minimum* time we're required to wait per the spec, with pointers to the spec sections you used to derive it. In most cases the spec does not specify maximum times for software events because that would be like saying "your software must be at least this fast" and that sort of thing is impossible to enforce. If you find a maximum, great -- use it and include the spec pointer. But I suspect the max time we wait for a device to become ready will be a policy question of how long Linux wants to wait. Bjorn
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index aab9d51..83a9784 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3723,10 +3723,16 @@ static void pci_flr_wait(struct pci_dev *dev) int i = 0; u32 id; - do { - msleep(100); - pci_read_config_dword(dev, PCI_COMMAND, &id); - } while (i++ < 10 && id == ~0); + if (dev->is_virtfn) { + do { + msleep(100); + pci_read_config_dword(dev, PCI_COMMAND, &id); + } while (i++ < 10 && id == ~0); + } else { + if (!pci_bus_read_dev_vendor_id(dev->bus, dev->devfn, &id, + 60*1000)) + id = ~0; + } if (id == ~0) dev_warn(&dev->dev, "Failed to return from FLR\n");
An endpoint is allowed to issue Configuration Request Retry Status (CRS) following a Function Level Reset (FLR) request to indicate that it is not ready to accept new requests. Seen a timeout message with Intel 750 NVMe drive and FLR reset. Kernel enables CRS visibility in pci_enable_crs() function for each bridge it discovers. The OS observes a special vendor ID read value of 0xFFFF0001 in this case. We need to keep polling until this special read value disappears. pci_bus_read_dev_vendor_id() takes care of CRS handling for a given vendor id read request under the covers. Adding a vendor ID read if this is a physical function before attempting to read any other registers on the endpoint. A CRS indication will only be given if the address to be read is vendor ID register. Note that virtual functions report their vendor ID through another mechanism. The spec is calling to wait up to 1 seconds if the device is sending CRS. The NVMe device seems to be requiring more. Relax this up to 60 seconds. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> --- drivers/pci/pci.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)