Message ID | 20201211121507.28166-1-daniel.thompson@linaro.org (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Lorenzo Pieralisi |
Headers | show |
Series | [RFC,HACK] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K | expand |
On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson <daniel.thompson@linaro.org> wrote: > > I have been chasing down a problem enumerating an NVMe drive on a > Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate > successfully if the we are emitting lots of console messages via a UART. > If the system is booted with `quiet` set then enumeration fails. We really need to get away from work-arounds for device X on host Y. I suspect they are not limited to that combination only... How exactly does it fail to enumerate? There's a (racy) linkup check on config accesses, is it reporting link down and skipping config accesses? > I guessed this would be due to the timing impact of printk-to-UART and > tried to find out where a delay could be added to provoke a successful > enumeration. > > This patch contains the results. The delay length (1ms) was selected by > binary searching downwards until the delay was not effective for these > devices (Honeycomb LX2K and a Western Digital WD Blue SN550). > > I have also included the workaround twice (conditionally compiled). The > first change is the *latest* possible code path that we can deploy a > delay whilst the second is the earliest place I could find. > > The summary is that the critical window were we are currently relying on > a call to the console UART code can "mend" the driver runs from calling > dw_pcie_setup_rc() in host init to just before we read the state in the > link up callback. > > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> > --- > > Notes: > This patch is RFC (and HACK) because I don't have much clue *why* this > patch works... merely that this is the smallest possible change I need > to replicate the current accidental printk() workaround. Perhaps one > could argue that RFC here stands for request-for-clue. All my > observations and changes here are empirical and I don't know how best to > turn them into something that is not a hack! > > BTW I noticed many other pcie-designware drivers take advantage > of a function called dw_pcie_wait_for_link() in their init paths... > but my naive attempts to add it to the layerscape driver results > in non-booting systems so I haven't embarrassed myself by including > that in the patch! You need to look at what's pending for v5.11, because I reworked this to be more unified. The ordering of init is also possibly changed. The sequence is now like this: dw_pcie_setup_rc(pp); dw_pcie_msi_init(pp); if (!dw_pcie_link_up(pci) && pci->ops->start_link) { ret = pci->ops->start_link(pci); if (ret) goto err_free_msi; } /* Ignore errors, the link may come up later */ dw_pcie_wait_for_link(pci); > drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++ > 1 file changed, 35 insertions(+) > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c > index f24f79a70d9a..c354904b90ef 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape.c > +++ b/drivers/pci/controller/dwc/pci-layerscape.c > @@ -22,6 +22,8 @@ > > #include "pcie-designware.h" > > +#define WORKAROUND_LATEST_POSSIBLE > + > /* PEX1/2 Misc Ports Status Register */ > #define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4) > #define LTSSM_STATE_SHIFT 20 > @@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci) > struct ls_pcie *pcie = to_ls_pcie(pci); > u32 state; > > + /* > + * Strictly speaking *this* (before the ioread32) is the latest > + * point a simple delay can be effective. If we move the delay > + * after the ioread32 then the NVMe does not enumerate. > + * > + * However this function appears to be frequently called so an > + * unconditional delay here causes noticeable delay at boot > + * time. Hence we implement the workaround by retrying the read > + * after a short delay if we think we might need to return false. > + */ > + > state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >> > pcie->drvdata->ltssm_shift) & > LTSSM_STATE_MASK; > > +#ifdef WORKAROUND_LATEST_POSSIBLE > + if (state < LTSSM_PCIE_L0) { > + /* see comment above */ > + mdelay(1); > + state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >> > + pcie->drvdata->ltssm_shift) & > + LTSSM_STATE_MASK; Side note, I really wonder about the variety of ways in DWC drivers we have to check link state. The default is a debug register... Are the standard PCIe registers for this really broken in these cases? > + } > +#endif > + > if (state < LTSSM_PCIE_L0) > return 0; > > @@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp) > > dw_pcie_setup_rc(pp); Might be related to the PORT_LOGIC_SPEED_CHANGE we do in here. > +#ifdef WORKAROUND_EARLIEST_POSSIBLE > + /* > + * This is the earliest point the delay is effective. > + * If we move it before dw_pcie_setup_rc() then the > + * NVMe does not enumerate. > + * > + * 500us is too short to reliably work around the issue > + * hence adopting 1000us here. > + */ > + mdelay(1); > +#endif > + > return 0; > } > > > base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f > -- > 2.29.2 >
On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote: > On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson > <daniel.thompson@linaro.org> wrote: > > > > I have been chasing down a problem enumerating an NVMe drive on a > > Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate > > successfully if the we are emitting lots of console messages via a UART. > > If the system is booted with `quiet` set then enumeration fails. > > We really need to get away from work-arounds for device X on host Y. I > suspect they are not limited to that combination only... No objection on that. This patch was essentially sharing the result of an investigation where I got stuck at the "now fix it properly" stage! > How exactly does it fail to enumerate? There's a (racy) linkup check > on config accesses, is it reporting link down and skipping config > accesses? In dmesg terms it looked like this: --- nvme_borked_gpu_working.linted.dmesg 2020-11-18 15:28:35.544118213 +0000 +++ nvme_working_gpu_working.linted.dmesg 2020-11-18 15:28:56.180076946 +0000 @@ -241,11 +241,19 @@ pci 0000:00:00.0: reg 0x38: [mem 0x9048000000-0x9048ffffff pref] pci 0000:00:00.0: supports D1 D2 pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot +pci 0000:01:00.0: [15b7:5009] type 00 class 0x010802 +pci 0000:01:00.0: reg 0x10: [mem 0x9049000000-0x9049003fff 64bit] +pci 0000:01:00.0: reg 0x20: [mem 0x9049004000-0x90490040ff 64bit] pci 0000:00:00.0: BAR 1: assigned [mem 0x9040000000-0x9043ffffff] pci 0000:00:00.0: BAR 0: assigned [mem 0x9044000000-0x9044ffffff] pci 0000:00:00.0: BAR 6: assigned [mem 0x9045000000-0x9045ffffff pref] +pci 0000:00:00.0: BAR 14: assigned [mem 0x9046000000-0x90460fffff] +pci 0000:01:00.0: BAR 0: assigned [mem 0x9046000000-0x9046003fff 64bit] +pci 0000:01:00.0: BAR 4: assigned [mem 0x9046004000-0x90460040ff 64bit] pci 0000:00:00.0: PCI bridge to [bus 01-ff] +pci 0000:00:00.0: bridge window [mem 0x9046000000-0x90460fffff] pci 0000:00:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 256 +pci 0000:01:00.0: Max Payload Size set to 256/ 512 (was 128), Max Read Rq 256 layerscape-pcie 3800000.pcie: host bridge /soc/pcie@3800000 ranges: layerscape-pcie 3800000.pcie: MEM 0xa040000000..0xa07fffffff -> 0x0040000000 layerscape-pcie 3800000.pcie: PCI host bridge to bus 0001:00 ... and be aware that the last three lines here are another PCIe controller coming up just fine and it is about to detect the GPU (which like the NVMe is also gen3) just fine whether or not we add a short delay. > > I guessed this would be due to the timing impact of printk-to-UART and > > tried to find out where a delay could be added to provoke a successful > > enumeration. > > > > This patch contains the results. The delay length (1ms) was selected by > > binary searching downwards until the delay was not effective for these > > devices (Honeycomb LX2K and a Western Digital WD Blue SN550). > > > > I have also included the workaround twice (conditionally compiled). The > > first change is the *latest* possible code path that we can deploy a > > delay whilst the second is the earliest place I could find. > > > > The summary is that the critical window were we are currently relying on > > a call to the console UART code can "mend" the driver runs from calling > > dw_pcie_setup_rc() in host init to just before we read the state in the > > link up callback. > > > > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> > > --- > > > > Notes: > > This patch is RFC (and HACK) because I don't have much clue *why* this > > patch works... merely that this is the smallest possible change I need > > to replicate the current accidental printk() workaround. Perhaps one > > could argue that RFC here stands for request-for-clue. All my > > observations and changes here are empirical and I don't know how best to > > turn them into something that is not a hack! > > > > BTW I noticed many other pcie-designware drivers take advantage > > of a function called dw_pcie_wait_for_link() in their init paths... > > but my naive attempts to add it to the layerscape driver results > > in non-booting systems so I haven't embarrassed myself by including > > that in the patch! > > You need to look at what's pending for v5.11, because I reworked this > to be more unified. The ordering of init is also possibly changed. The > sequence is now like this: > > dw_pcie_setup_rc(pp); > dw_pcie_msi_init(pp); > > if (!dw_pcie_link_up(pci) && pci->ops->start_link) { > ret = pci->ops->start_link(pci); > if (ret) > goto err_free_msi; > } > > /* Ignore errors, the link may come up later */ > dw_pcie_wait_for_link(pci); Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link() will end up waiting somewhat like the double check I added to ls_pcie_link_up(). I'll take a look at let you know. Daniel.
On Fri, Dec 11, 2020 at 05:05:58PM +0000, Daniel Thompson wrote: > On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote: > > On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson > > > BTW I noticed many other pcie-designware drivers take advantage > > > of a function called dw_pcie_wait_for_link() in their init paths... > > > but my naive attempts to add it to the layerscape driver results > > > in non-booting systems so I haven't embarrassed myself by including > > > that in the patch! > > > > You need to look at what's pending for v5.11, because I reworked this > > to be more unified. The ordering of init is also possibly changed. The > > sequence is now like this: > > > > dw_pcie_setup_rc(pp); > > dw_pcie_msi_init(pp); > > > > if (!dw_pcie_link_up(pci) && pci->ops->start_link) { > > ret = pci->ops->start_link(pci); > > if (ret) > > goto err_free_msi; > > } > > > > /* Ignore errors, the link may come up later */ > > dw_pcie_wait_for_link(pci); > > Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link() > will end up waiting somewhat like the double check I added to > ls_pcie_link_up(). > > I'll take a look at let you know. Yes. These changes have fixed the enumeration problems for me. I tested pci/next and I cherry picked your patch series onto v5.10 and both are working well. Given this fixes a bug for me, do you think there is any scope for me to whittle down your series into patches for the stable kernels or am I likely to find too many extra bits being pulled in? Daniel.
On Mon, Dec 14, 2020 at 4:43 AM Daniel Thompson <daniel.thompson@linaro.org> wrote: > > On Fri, Dec 11, 2020 at 05:05:58PM +0000, Daniel Thompson wrote: > > On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote: > > > On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson > > > > BTW I noticed many other pcie-designware drivers take advantage > > > > of a function called dw_pcie_wait_for_link() in their init paths... > > > > but my naive attempts to add it to the layerscape driver results > > > > in non-booting systems so I haven't embarrassed myself by including > > > > that in the patch! > > > > > > You need to look at what's pending for v5.11, because I reworked this > > > to be more unified. The ordering of init is also possibly changed. The > > > sequence is now like this: > > > > > > dw_pcie_setup_rc(pp); > > > dw_pcie_msi_init(pp); > > > > > > if (!dw_pcie_link_up(pci) && pci->ops->start_link) { > > > ret = pci->ops->start_link(pci); > > > if (ret) > > > goto err_free_msi; > > > } > > > > > > /* Ignore errors, the link may come up later */ > > > dw_pcie_wait_for_link(pci); > > > > Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link() > > will end up waiting somewhat like the double check I added to > > ls_pcie_link_up(). > > > > I'll take a look at let you know. > > Yes. These changes have fixed the enumeration problems for me. > > I tested pci/next and I cherry picked your patch series onto v5.10 and > both are working well. > > Given this fixes a bug for me, do you think there is any scope for me > to whittle down your series into patches for the stable kernels or am > I likely to find too many extra bits being pulled in? I think I'd just go the adding a delay route. It's a fairly big series and depends on my other clean-up done in 5.10. And there's at least some possibility it regresses some platform given the limited testing linux-next gets. Rob
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index f24f79a70d9a..c354904b90ef 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -22,6 +22,8 @@ #include "pcie-designware.h" +#define WORKAROUND_LATEST_POSSIBLE + /* PEX1/2 Misc Ports Status Register */ #define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4) #define LTSSM_STATE_SHIFT 20 @@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci) struct ls_pcie *pcie = to_ls_pcie(pci); u32 state; + /* + * Strictly speaking *this* (before the ioread32) is the latest + * point a simple delay can be effective. If we move the delay + * after the ioread32 then the NVMe does not enumerate. + * + * However this function appears to be frequently called so an + * unconditional delay here causes noticeable delay at boot + * time. Hence we implement the workaround by retrying the read + * after a short delay if we think we might need to return false. + */ + state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >> pcie->drvdata->ltssm_shift) & LTSSM_STATE_MASK; +#ifdef WORKAROUND_LATEST_POSSIBLE + if (state < LTSSM_PCIE_L0) { + /* see comment above */ + mdelay(1); + state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >> + pcie->drvdata->ltssm_shift) & + LTSSM_STATE_MASK; + } +#endif + if (state < LTSSM_PCIE_L0) return 0; @@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp) dw_pcie_setup_rc(pp); +#ifdef WORKAROUND_EARLIEST_POSSIBLE + /* + * This is the earliest point the delay is effective. + * If we move it before dw_pcie_setup_rc() then the + * NVMe does not enumerate. + * + * 500us is too short to reliably work around the issue + * hence adopting 1000us here. + */ + mdelay(1); +#endif + return 0; }
I have been chasing down a problem enumerating an NVMe drive on a Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate successfully if the we are emitting lots of console messages via a UART. If the system is booted with `quiet` set then enumeration fails. I guessed this would be due to the timing impact of printk-to-UART and tried to find out where a delay could be added to provoke a successful enumeration. This patch contains the results. The delay length (1ms) was selected by binary searching downwards until the delay was not effective for these devices (Honeycomb LX2K and a Western Digital WD Blue SN550). I have also included the workaround twice (conditionally compiled). The first change is the *latest* possible code path that we can deploy a delay whilst the second is the earliest place I could find. The summary is that the critical window were we are currently relying on a call to the console UART code can "mend" the driver runs from calling dw_pcie_setup_rc() in host init to just before we read the state in the link up callback. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> --- Notes: This patch is RFC (and HACK) because I don't have much clue *why* this patch works... merely that this is the smallest possible change I need to replicate the current accidental printk() workaround. Perhaps one could argue that RFC here stands for request-for-clue. All my observations and changes here are empirical and I don't know how best to turn them into something that is not a hack! BTW I noticed many other pcie-designware drivers take advantage of a function called dw_pcie_wait_for_link() in their init paths... but my naive attempts to add it to the layerscape driver results in non-booting systems so I haven't embarrassed myself by including that in the patch! drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++ 1 file changed, 35 insertions(+) base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f -- 2.29.2