Message ID | 20250409144930.10402-1-mike.looijmans@topic.nl (mailing list archive) |
---|---|
State | New |
Series | [v3,1/2] pcie-xilinx: Wait for link-up status during initialization |
Please make the subject line match previous changes to this driver.
See "git log --oneline drivers/pci/controller/pcie-xilinx.c"

On Wed, Apr 09, 2025 at 04:49:24PM +0200, Mike Looijmans wrote:
> When the driver loads, the transceiver may still be in the state of
> setting up a link. Wait for that to complete before continuing. This
> fixes that the PCIe core does not work when loading the PL bitstream
> from userspace. There's only milliseconds between the FPGA boot and the
> core initializing in that case, and the link won't be up yet. The design
> only worked when the FPGA was programmed in the bootloader, as that will
> give the system hundreds of milliseconds to boot.
>
> As the PCIe spec allows up to 100 ms time to establish a link, we'll
> allow up to 200ms before giving up.

This sounds like there's still a race between userspace loading the PL
bitstream and the driver waiting for link up, but we're just waiting
longer in the kernel so userspace has more chance of winning the race.
Is that true?

> @@ -126,6 +127,19 @@ static inline bool xilinx_pcie_link_up(struct xilinx_pcie *pcie)
>  		XILINX_PCIE_REG_PSCR_LNKUP) ? 1 : 0;
>  }
>
> +static int xilinx_pci_wait_link_up(struct xilinx_pcie *pcie)
> +{
> +	u32 val;
> +
> +	/*
> +	 * PCIe r6.0, sec 6.6.1 provides 100ms timeout. Since this is FPGA
> +	 * fabric, we're more lenient and allow 200 ms for link training.
> +	 */
> +	return readl_poll_timeout(pcie->reg_base + XILINX_PCIE_REG_PSCR, val,
> +			(val & XILINX_PCIE_REG_PSCR_LNKUP), 2 * USEC_PER_MSEC,
> +			200 * USEC_PER_MSEC);

There should be a #define in drivers/pci/pci.h for this 100ms value
that you can use here to connect this more closely with the spec.

Maybe there's a way to use read_poll_timeout(), readx_poll_timeout(),
or something similar so we can use xilinx_pcie_link_up() directly
instead of reimplementing it here?

> +}
> +
>  /**
>   * xilinx_pcie_clear_err_interrupts - Clear Error Interrupts
>   * @pcie: PCIe port information
> @@ -493,7 +507,7 @@ static void xilinx_pcie_init_port(struct xilinx_pcie *pcie)
>  {
>  	struct device *dev = pcie->dev;
>
> -	if (xilinx_pcie_link_up(pcie))
> +	if (!xilinx_pci_wait_link_up(pcie))
>  		dev_info(dev, "PCIe Link is UP\n");
>  	else
>  		dev_info(dev, "PCIe Link is DOWN\n");
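The suggestion to poll through the existing link-up helper can be illustrated outside the kernel. In the kernel, read_poll_timeout() from linux/iopoll.h takes an accessor and its arguments (op, val, cond, sleep_us, timeout_us, sleep_before_read, args...), so it could probably be pointed at xilinx_pcie_link_up() directly. The sketch below is only a userspace model of that idea, not driver code: struct fake_pcie, fake_link_up(), fake_wait_link_up() and the two LINK_* constants are hypothetical names, and time is simulated by counting poll intervals.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in: the real driver reads a status register over MMIO. */
struct fake_pcie {
	int polls_until_up;	/* simulated: link reports up after this many reads */
};

/* Hypothetical accessor playing the role of xilinx_pcie_link_up(). */
static bool fake_link_up(struct fake_pcie *pcie)
{
	if (pcie->polls_until_up > 0) {
		pcie->polls_until_up--;
		return false;
	}
	return true;
}

/* Assumed budget, mirroring the 200 ms / 2 ms values used in the patch. */
#define LINK_WAIT_MS	200
#define LINK_POLL_MS	2

/*
 * Poll the accessor itself instead of re-reading the register by hand.
 * Each iteration counts as LINK_POLL_MS of simulated time; nothing sleeps.
 * Returns 0 once the link is up, -ETIMEDOUT if the budget runs out.
 */
static int fake_wait_link_up(struct fake_pcie *pcie)
{
	int elapsed_ms;

	for (elapsed_ms = 0; elapsed_ms <= LINK_WAIT_MS; elapsed_ms += LINK_POLL_MS) {
		if (fake_link_up(pcie))
			return 0;
	}
	return -ETIMEDOUT;
}
```

With this shape the register mask lives in one place (the accessor), and the wait helper only decides how long to keep asking.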
Kind regards,
Mike Looijmans
System Expert

TOPIC Embedded Products B.V.
Materiaalweg 4, 5681 RJ Best
The Netherlands
T: +31 (0) 499 33 69 69
E: mike.looijmans@topic.nl
W: www.topic.nl

On 09-04-2025 17:17, Bjorn Helgaas wrote:
> Please make the subject line match previous changes to this driver.
> See "git log --oneline drivers/pci/controller/pcie-xilinx.c"
>
> On Wed, Apr 09, 2025 at 04:49:24PM +0200, Mike Looijmans wrote:
>> When the driver loads, the transceiver may still be in the state of
>> setting up a link. Wait for that to complete before continuing. This
>> fixes that the PCIe core does not work when loading the PL bitstream
>> from userspace. There's only milliseconds between the FPGA boot and the
>> core initializing in that case, and the link won't be up yet. The design
>> only worked when the FPGA was programmed in the bootloader, as that will
>> give the system hundreds of milliseconds to boot.
>>
>> As the PCIe spec allows up to 100 ms time to establish a link, we'll
>> allow up to 200ms before giving up.
> This sounds like there's still a race between userspace loading the PL
> bitstream and the driver waiting for link up, but we're just waiting
> longer in the kernel so userspace has more chance of winning the race.
> Is that true?

No, that's not the case here. The PCIe (host) core is what is in the PL
bitstream. Devicetree overlay and FPGA support take care of that part, so
the PL is programmed and the PCIe core is available when this driver
probes.

The issue is with the endpoint on the other side of the PL, most likely
an NVMe drive, WiFi card or PCIe switch (nothing inside the FPGA; the
purpose of the PCIe core here is to communicate with something external
over PCIe). The endpoint gets reset by what amounts to black magic
(external circuit, something in the PL, or just nothing at all). This
driver assumes that this has already happened, and immediately starts
training. This works on Xilinx reference designs that program the PL in
their proprietary bootloader. If the PL was programmed from within Linux,
this driver will probe within milliseconds after programming the PL, and
in that case, the link won't be up yet.

The second patch in this series adds PERST# GPIO support, so that the
"black magic" can also be replaced with a proper reset.

>
>> @@ -126,6 +127,19 @@ static inline bool xilinx_pcie_link_up(struct xilinx_pcie *pcie)
>>  		XILINX_PCIE_REG_PSCR_LNKUP) ? 1 : 0;
>>  }
>>
>> +static int xilinx_pci_wait_link_up(struct xilinx_pcie *pcie)
>> +{
>> +	u32 val;
>> +
>> +	/*
>> +	 * PCIe r6.0, sec 6.6.1 provides 100ms timeout. Since this is FPGA
>> +	 * fabric, we're more lenient and allow 200 ms for link training.
>> +	 */
>> +	return readl_poll_timeout(pcie->reg_base + XILINX_PCIE_REG_PSCR, val,
>> +			(val & XILINX_PCIE_REG_PSCR_LNKUP), 2 * USEC_PER_MSEC,
>> +			200 * USEC_PER_MSEC);
> There should be a #define in drivers/pci/pci.h for this 100ms value
> that you can use here to connect this more closely with the spec.

That'd be "PCIE_T_RRS_READY_MS".

Experience shows that adhering too closely to this spec is a good way to
make your system fail though, so most host controllers are more lenient,
e.g. rockchip uses 500ms.

>
> Maybe there's a way to use read_poll_timeout(), readx_poll_timeout(),
> or something similar so we can use xilinx_pcie_link_up() directly
> instead of reimplementing it here?

The other way around would be easier: just call this again when it wants
to know if the link is up, maybe with a 0 timeout (which allows the
compiler to remove the loop).

>
>> +}
>> +
>>  /**
>>   * xilinx_pcie_clear_err_interrupts - Clear Error Interrupts
>>   * @pcie: PCIe port information
>> @@ -493,7 +507,7 @@ static void xilinx_pcie_init_port(struct xilinx_pcie *pcie)
>>  {
>>  	struct device *dev = pcie->dev;
>>
>> -	if (xilinx_pcie_link_up(pcie))
>> +	if (!xilinx_pci_wait_link_up(pcie))
>>  		dev_info(dev, "PCIe Link is UP\n");
>>  	else
>>  		dev_info(dev, "PCIe Link is DOWN\n");
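One detail worth checking before reusing the wait helper with a 0 timeout: the kernel-doc in linux/iopoll.h describes timeout_us as "0 means never timeout", so a zero timeout would not turn the poll into a one-shot check. The following is a minimal userspace model of that convention, written only to illustrate the point; struct fake_status, fake_read_up() and model_poll() are hypothetical names, and each loop iteration stands in for one sleep interval.

```c
#include <errno.h>
#include <stdbool.h>

/* Simulated status source: reports "up" once reads_left reaches zero. */
struct fake_status {
	long reads_left;
	long reads_done;
};

static bool fake_read_up(struct fake_status *s)
{
	s->reads_done++;
	if (s->reads_left > 0) {
		s->reads_left--;
		return false;
	}
	return true;
}

/*
 * Model of the iopoll convention: timeout_us == 0 disables the deadline,
 * so the loop only ends when the condition becomes true. Each iteration
 * advances simulated time by sleep_us (or 1 us when sleep_us is 0).
 */
static int model_poll(struct fake_status *s, unsigned long sleep_us,
		      unsigned long timeout_us)
{
	unsigned long elapsed_us = 0;

	for (;;) {
		if (fake_read_up(s))
			return 0;
		if (timeout_us && elapsed_us >= timeout_us)
			return -ETIMEDOUT;
		elapsed_us += sleep_us ? sleep_us : 1;
	}
}
```

In this model a bounded call gives up after the budget, while a call with timeout 0 keeps reading until the status changes. A one-shot query is probably better served by the plain accessor.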
diff --git a/drivers/pci/controller/pcie-xilinx.c b/drivers/pci/controller/pcie-xilinx.c
index 0b534f73a942..2e59b91f43e0 100644
--- a/drivers/pci/controller/pcie-xilinx.c
+++ b/drivers/pci/controller/pcie-xilinx.c
@@ -15,6 +15,7 @@
 #include <linux/irqdomain.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/iopoll.h>
 #include <linux/msi.h>
 #include <linux/of_address.h>
 #include <linux/of_pci.h>
@@ -126,6 +127,19 @@ static inline bool xilinx_pcie_link_up(struct xilinx_pcie *pcie)
 		XILINX_PCIE_REG_PSCR_LNKUP) ? 1 : 0;
 }
 
+static int xilinx_pci_wait_link_up(struct xilinx_pcie *pcie)
+{
+	u32 val;
+
+	/*
+	 * PCIe r6.0, sec 6.6.1 provides 100ms timeout. Since this is FPGA
+	 * fabric, we're more lenient and allow 200 ms for link training.
+	 */
+	return readl_poll_timeout(pcie->reg_base + XILINX_PCIE_REG_PSCR, val,
+			(val & XILINX_PCIE_REG_PSCR_LNKUP), 2 * USEC_PER_MSEC,
+			200 * USEC_PER_MSEC);
+}
+
 /**
  * xilinx_pcie_clear_err_interrupts - Clear Error Interrupts
  * @pcie: PCIe port information
@@ -493,7 +507,7 @@ static void xilinx_pcie_init_port(struct xilinx_pcie *pcie)
 {
 	struct device *dev = pcie->dev;
 
-	if (xilinx_pcie_link_up(pcie))
+	if (!xilinx_pci_wait_link_up(pcie))
 		dev_info(dev, "PCIe Link is UP\n");
 	else
 		dev_info(dev, "PCIe Link is DOWN\n");
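The negated test in the last hunk follows from the return convention of readl_poll_timeout(): it returns 0 when the condition was met and -ETIMEDOUT otherwise, so "not the return value" means "link is up". A tiny illustration of that mapping, using a hypothetical helper name (link_state_msg) that is not part of the patch:

```c
#include <errno.h>

/* Hypothetical helper: maps a poll-style return code to the log text. */
static const char *link_state_msg(int wait_ret)
{
	/* 0 means the poll condition was met; any error means it was not. */
	return wait_ret == 0 ? "PCIe Link is UP" : "PCIe Link is DOWN";
}
```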
When the driver loads, the transceiver may still be in the state of
setting up a link. Wait for that to complete before continuing. This
fixes that the PCIe core does not work when loading the PL bitstream
from userspace. There's only milliseconds between the FPGA boot and the
core initializing in that case, and the link won't be up yet. The design
only worked when the FPGA was programmed in the bootloader, as that will
give the system hundreds of milliseconds to boot.

As the PCIe spec allows up to 100 ms time to establish a link, we'll
allow up to 200ms before giving up.

Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
---

(no changes since v2)

Changes in v2:
Split into "reset GPIO" and "wait for link" patches
Add timeout explanation

 drivers/pci/controller/pcie-xilinx.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)