diff mbox series

[v4,3/5] PCI: brcmstb: Set PCIe transaction completion timeout

Message ID 20230428223500.23337-4-jim2101024@gmail.com (mailing list archive)
State Superseded
Headers show
Series PCI: brcmstb: Configure appropriate HW CLKREQ# mode | expand

Commit Message

Jim Quinlan April 28, 2023, 10:34 p.m. UTC
Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
completion timeout abort, we might as well extend the default timeout
limit.  Further, different devices and systems may requires a larger or
smaller amount commensurate with their L1SS exit time, so the property
"brcm,completion-timeout-us" may be used to set a custom timeout value.

Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
---
 drivers/pci/controller/pcie-brcmstb.c | 30 +++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Bjorn Helgaas April 30, 2023, 7:13 p.m. UTC | #1
On Fri, Apr 28, 2023 at 06:34:57PM -0400, Jim Quinlan wrote:
> Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> completion timeout abort, we might as well extend the default timeout
> limit.  Further, different devices and systems may requires a larger or
> smaller amount commensurate with their L1SS exit time, so the property
> "brcm,completion-timeout-us" may be used to set a custom timeout value.

s/requires/require/

AFAIK, other platforms do not tweak Configuration Timeout values based
on L1SS exit time.  Why is brcm different?

Bjorn
Jim Quinlan April 30, 2023, 9:24 p.m. UTC | #2
On Sun, Apr 30, 2023 at 3:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Apr 28, 2023 at 06:34:57PM -0400, Jim Quinlan wrote:
> > Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> > completion timeout abort, we might as well extend the default timeout
> > limit.  Further, different devices and systems may requires a larger or
> > smaller amount commensurate with their L1SS exit time, so the property
> > "brcm,completion-timeout-us" may be used to set a custom timeout value.
>
> s/requires/require/
>
> AFAIK, other platforms do not tweak Configuration Timeout values based
> on L1SS exit time.  Why is brcm different?

Keep in mind that our Brcm PCIe HW signals a CPU abort on a PCIe
completion timeout.  Other PCIe HW just returns 0xffffffff.

I've been maintaining this driver for over eight years or so and we've
done fine with the HW default completion timeout value.
Only recently has a major customer requested that this timeout value
be changed, and their reason was so they could
avoid a CPU abort when using L1SS.

Now we could set this value to a big number for all cases and not
require "brcm,completion-timeout-us".  I cannot see any
downsides, other than another customer coming along asking us to
double the default or lessen it.

But I'm certainly willing to do that -- would that  be acceptable?

Regards,
Jim

>
> Bjorn
Bjorn Helgaas April 30, 2023, 10:38 p.m. UTC | #3
On Sun, Apr 30, 2023 at 05:24:26PM -0400, Jim Quinlan wrote:
> On Sun, Apr 30, 2023 at 3:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Apr 28, 2023 at 06:34:57PM -0400, Jim Quinlan wrote:
> > > Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> > > completion timeout abort, we might as well extend the default timeout
> > > limit.  Further, different devices and systems may requires a larger or
> > > smaller amount commensurate with their L1SS exit time, so the property
> > > "brcm,completion-timeout-us" may be used to set a custom timeout value.
> >
> > s/requires/require/
> >
> > AFAIK, other platforms do not tweak Configuration Timeout values based
> > on L1SS exit time.  Why is brcm different?
> 
> Keep in mind that our Brcm PCIe HW signals a CPU abort on a PCIe
> completion timeout.  Other PCIe HW just returns 0xffffffff.

Most does, but I'm pretty sure there are other controllers used on
arm64 that signal CPU aborts, e.g., imx6q_pcie_abort_handler() seems
similar.

> I've been maintaining this driver for over eight years or so and we've
> done fine with the HW default completion timeout value.
> Only recently has a major customer requested that this timeout value
> be changed, and their reason was so they could
> avoid a CPU abort when using L1SS.
> 
> Now we could set this value to a big number for all cases and not
> require "brcm,completion-timeout-us".  I cannot see any
> downsides, other than another customer coming along asking us to
> double the default or lessen it.
> 
> But I'm certainly willing to do that -- would that  be acceptable?

That would be fine with me.

Bjorn
Lukas Wunner May 1, 2023, 8:55 p.m. UTC | #4
On Sun, Apr 30, 2023 at 05:24:26PM -0400, Jim Quinlan wrote:
> I've been maintaining this driver for over eight years or so and we've
> done fine with the HW default completion timeout value.
> Only recently has a major customer requested that this timeout value
> be changed, and their reason was so they could
> avoid a CPU abort when using L1SS.
> 
> Now we could set this value to a big number for all cases and not
> require "brcm,completion-timeout-us".  I cannot see any
> downsides, other than another customer coming along asking us to
> double the default or lessen it.

The Completion Timeout is configurable in the Device Control 2 Register
(PCIe r2.1 sec 7.8.16).  Your controller does expose that register:

  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
           AtomicOpsCtl: ReqEn- EgressBlck-

Why does your controller allow configuration of the timeout in a
proprietary register in addition to DevCtl2?

If you make the Completion Timeout configurable, please do so in
a spec-compliant way, i.e. via DevCtl2, so that it works for
other products as well.

If the proprietary register has advantages over DevCtl2 (e.g.
finer granularity), perhaps you can divert accesses to the
Completion Timeout Value in DevCtl2 to your proprietary register.

Thanks,

Lukas
Stefan Wahren May 3, 2023, 6:06 a.m. UTC | #5
Hi Jim,

Am 29.04.23 um 00:34 schrieb Jim Quinlan:
> Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> completion timeout abort, we might as well extend the default timeout
> limit.  Further, different devices and systems may requires a larger or
> smaller amount commensurate with their L1SS exit time, so the property
> "brcm,completion-timeout-us" may be used to set a custom timeout value.
> 
> Tested-by: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>   drivers/pci/controller/pcie-brcmstb.c | 30 +++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index c4b076ea5180..c2cb683447ac 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -1080,6 +1080,35 @@ static void brcm_config_clkreq(struct brcm_pcie *pcie)
>   	writel(clkreq_set, pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
>   }
>   
> +static void brcm_config_completion_timeout(struct brcm_pcie *pcie)
> +{
> +	/* TIMEOUT register is two registers before RGR1_SW_INIT_1 */
> +	const char *fmt = "brcm,completion-timeout-us clamped to region [%u..%u]\n";
> +	const unsigned int REG_OFFSET = PCIE_RGR1_SW_INIT_1(pcie) - 8;
> +	const u32 timeout_us_min = 16;
> +	const u32 timeout_us_max = 19884107;
> +	u32 timeout_us = 1000000; /* Our default, 1 second */
> +	int rval, ret;
> +
> +	ret = of_property_read_u32(pcie->np, "brcm,completion-timeout-us",
> +				   &timeout_us);
> +	if (ret && ret != -EINVAL)
> +		dev_err(pcie->dev, "malformed/invalid 'brcm,completion-timeout-us'\n");
> +
> +	/* If needed, clamp the requested timeout value and issue a warning */
> +	if (timeout_us < timeout_us_min) {
> +		timeout_us = timeout_us_min;
> +		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
> +	} else if (timeout_us > timeout_us_max) {
> +		timeout_us = timeout_us_max;
> +		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
> +	}
> +
> +	/* Each unit in timeout register is 1/216,000,000 seconds */
> +	rval = 216 * timeout_us;
> +	writel(rval, pcie->base + REG_OFFSET);

i don't think "int" is the proper type for rval here

> +}
> +
>   static int brcm_pcie_start_link(struct brcm_pcie *pcie)
>   {
>   	struct device *dev = pcie->dev;
> @@ -1110,6 +1139,7 @@ static int brcm_pcie_start_link(struct brcm_pcie *pcie)
>   		return -ENODEV;
>   	}
>   
> +	brcm_config_completion_timeout(pcie);
>   	brcm_config_clkreq(pcie);
>   
>   	if (pcie->gen)
Jim Quinlan May 3, 2023, 2:06 p.m. UTC | #6
On Mon, May 1, 2023 at 4:55 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Sun, Apr 30, 2023 at 05:24:26PM -0400, Jim Quinlan wrote:
> > I've been maintaining this driver for over eight years or so and we've
> > done fine with the HW default completion timeout value.
> > Only recently has a major customer requested that this timeout value
> > be changed, and their reason was so they could
> > avoid a CPU abort when using L1SS.
> >
> > Now we could set this value to a big number for all cases and not
> > require "brcm,completion-timeout-us".  I cannot see any
> > downsides, other than another customer coming along asking us to
> > double the default or lessen it.
>
> The Completion Timeout is configurable in the Device Control 2 Register
> (PCIe r2.1 sec 7.8.16).  Your controller does expose that register:
>
>   DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
>            AtomicOpsCtl: ReqEn- EgressBlck-
>
> Why does your controller allow configuration of the timeout in a
> proprietary register in addition to DevCtl2?
>
> If you make the Completion Timeout configurable, please do so in
> a spec-compliant way, i.e. via DevCtl2, so that it works for
> other products as well.
>
> If the proprietary register has advantages over DevCtl2 (e.g.
> finer granularity), perhaps you can divert accesses to the
> Completion Timeout Value in DevCtl2 to your proprietary register.

Hello Lukas & Bjorn,

Yes, you are right.  I was under the (mis)understanding that writing
this register is (a) necessary for long L1SS periods and (b) overrides
or updates the value in the CFG register you mentioned.  It turns out
that only (a) is true.

After communicating with the HW engineer, it appears this register is
for an internal bus timeout.  Further, this bus timeout can occur when
there is no PCIe access involved.  However, the error appears as a
completion timeout,  which I also am investigating.

At any rate, I am dropping the "brcm,completion-timeout-usec" property
completely and V5 will just set this internal timeout to a higher
default if  the "brcm,enable-l1ss" is present.

Thanks &  regards,
Jim




>
> Thanks,
>
> Lukas
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index c4b076ea5180..c2cb683447ac 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -1080,6 +1080,35 @@  static void brcm_config_clkreq(struct brcm_pcie *pcie)
 	writel(clkreq_set, pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
 }
 
+static void brcm_config_completion_timeout(struct brcm_pcie *pcie)
+{
+	/* TIMEOUT register is two registers before RGR1_SW_INIT_1 */
+	const char *fmt = "brcm,completion-timeout-us clamped to region [%u..%u]\n";
+	const unsigned int REG_OFFSET = PCIE_RGR1_SW_INIT_1(pcie) - 8;
+	const u32 timeout_us_min = 16;
+	const u32 timeout_us_max = 19884107;
+	u32 timeout_us = 1000000; /* Our default, 1 second */
+	int rval, ret;
+
+	ret = of_property_read_u32(pcie->np, "brcm,completion-timeout-us",
+				   &timeout_us);
+	if (ret && ret != -EINVAL)
+		dev_err(pcie->dev, "malformed/invalid 'brcm,completion-timeout-us'\n");
+
+	/* If needed, clamp the requested timeout value and issue a warning */
+	if (timeout_us < timeout_us_min) {
+		timeout_us = timeout_us_min;
+		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
+	} else if (timeout_us > timeout_us_max) {
+		timeout_us = timeout_us_max;
+		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
+	}
+
+	/* Each unit in timeout register is 1/216,000,000 seconds */
+	rval = 216 * timeout_us;
+	writel(rval, pcie->base + REG_OFFSET);
+}
+
 static int brcm_pcie_start_link(struct brcm_pcie *pcie)
 {
 	struct device *dev = pcie->dev;
@@ -1110,6 +1139,7 @@  static int brcm_pcie_start_link(struct brcm_pcie *pcie)
 		return -ENODEV;
 	}
 
+	brcm_config_completion_timeout(pcie);
 	brcm_config_clkreq(pcie);
 
 	if (pcie->gen)