[V4] PCI: rcar: Add L1 link state fix into data abort hook

Message ID	20201016120416.7008-1-marek.vasut@gmail.com (mailing list archive)
State	Superseded
Delegated to:	Geert Uytterhoeven
Headers	Return-Path: <SRS0=L9bH=DX=vger.kernel.org=linux-renesas-soc-owner@kernel.org>
	From: marek.vasut@gmail.com
	To: linux-pci@vger.kernel.org
	Cc: Marek Vasut <marek.vasut+renesas@gmail.com>, Bjorn Helgaas <bhelgaas@google.com>, Geert Uytterhoeven <geert+renesas@glider.be>, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>, Wolfram Sang <wsa@the-dreams.de>, Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>, linux-renesas-soc@vger.kernel.org
	Subject: [PATCH V4] PCI: rcar: Add L1 link state fix into data abort hook
	Date: Fri, 16 Oct 2020 14:04:16 +0200
	Message-Id: <20201016120416.7008-1-marek.vasut@gmail.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	[V4] PCI: rcar: Add L1 link state fix into data abort hook

Marek Vasut Oct. 16, 2020, 12:04 p.m. UTC

From: Marek Vasut <marek.vasut+renesas@gmail.com>

The R-Car PCIe controller is capable of handling L0s/L1 link states.
While the controller can enter and exit L0s link state, and exit L1
link state, without any additional action from the driver, to enter
L1 link state, the driver must complete the link state transition by
issuing additional commands to the controller.

The problem is, this transition is not atomic. The controller sets
PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
the PCIe card, but then the controller enters some sort of in-between
state. The driver must detect this condition and complete the link
state transition, by setting L1IATN bit in PMCTLR and waiting for
the link state transition to complete.
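
For illustration only (this is not part of the patch): the condition the
driver has to detect can be written as a small predicate over the PMSR
value. The bit positions are taken from the pcie-rcar.h hunk of this patch;
the PMSR_ prefixes and the helper name rcar_l1_fixup_needed() are
hypothetical and exist only in this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as defined in the pcie-rcar.h hunk of this patch. */
#define PMSR_L1FAEG      (UINT32_C(1) << 31)
#define PMSR_PMEL1RX     (UINT32_C(1) << 23)
#define PMSR_PMSTATE     (UINT32_C(0x7) << 16)	/* GENMASK(18, 16) */
#define PMSR_PMSTATE_L1  (UINT32_C(3) << 16)

/*
 * Hypothetical helper: true when PM_ENTER_L1 DLLP was received
 * (PMEL1RX set) but the controller has not reached L1 yet.
 */
static bool rcar_l1_fixup_needed(uint32_t pmsr)
{
	return (pmsr & PMSR_PMEL1RX) &&
	       (pmsr & PMSR_PMSTATE) != PMSR_PMSTATE_L1;
}
```

The predicate takes a plain value rather than reading a register, so it can
be exercised outside the kernel.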

If a PCIe access happens inside this window, where the controller
is between L0 and L1 link states, the access generates a fault and
the ARM 'imprecise external abort' handler is invoked.

Just like other PCI controller drivers, here we hook the fault handler,
perform the fixup to help the controller enter L1 link state, and then
restart the instruction which triggered the fault. Since the controller
is in L1 link state now, the link can exit from L1 link state to L0 and
successfully complete the access.
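
For illustration only (not part of the patch): the fixup sequence described
above can be modelled against a toy register pair. The struct fake_pcie and
the fake_* helpers are hypothetical. The model assumes that setting L1IATN
completes the transition at once and that the PMSR flags are
write-1-to-clear; neither assumption is confirmed by this thread.

```c
#include <stdint.h>

#define L1FAEG      (UINT32_C(1) << 31)	/* PMSR */
#define PMEL1RX     (UINT32_C(1) << 23)	/* PMSR */
#define PMSTATE     (UINT32_C(0x7) << 16)	/* PMSR */
#define PMSTATE_L1  (UINT32_C(3) << 16)
#define L1IATN      (UINT32_C(1) << 31)	/* PMCTLR */

/* Toy model of the two registers, not the real hardware behaviour. */
struct fake_pcie {
	uint32_t pmsr;
	uint32_t pmctlr;
};

/* Assumed model: setting L1IATN completes the L1 transition at once. */
static void fake_write_pmctlr(struct fake_pcie *p, uint32_t val)
{
	p->pmctlr = val;
	if (val & L1IATN)
		p->pmsr = (p->pmsr & ~PMSTATE) | PMSTATE_L1 | L1FAEG;
}

/* Assumed model: the L1FAEG and PMEL1RX flags are write-1-to-clear. */
static void fake_write_pmsr(struct fake_pcie *p, uint32_t val)
{
	p->pmsr &= ~(val & (L1FAEG | PMEL1RX));
}

/* Returns 0 if the fixup ran (fault handled), 1 otherwise. */
static int fake_abort_handler(struct fake_pcie *p)
{
	uint32_t pmsr = p->pmsr;

	if ((pmsr & PMEL1RX) && (pmsr & PMSTATE) != PMSTATE_L1) {
		fake_write_pmctlr(p, L1IATN);
		while (!(p->pmsr & L1FAEG))
			;	/* the model sets L1FAEG at once, so no spin */
		fake_write_pmsr(p, L1FAEG | PMEL1RX);
		return 0;
	}

	return 1;
}
```

A return value of 0 corresponds to "fault handled, restart the access", and
1 to "not ours, fall through to the default handling".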

Note that this fixup is applicable only to AArch32 R-Car controllers.
The AArch64 R-Car controllers perform the same fixup in TFA, see TFA commit
0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access") [1].
[1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf

Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-renesas-soc@vger.kernel.org
---
V2: - Update commit message, add link to TFA repository commit
    - Handle the LPAE case as in ARM fault.c and fsr-{2,3}level.c
    - Cache the clock and check whether it is enabled before register
      access
V3: - Fix commit message according to spellchecker
    - Use of_find_matching_node() to apply hook only on Gen1 and Gen2 R-Car
      (in case the kernel is multiplatform)
V4: - Mark rcar_pcie_abort_handler_of_match with __initconst
---
 drivers/pci/controller/pcie-rcar-host.c | 76 +++++++++++++++++++++++++
 drivers/pci/controller/pcie-rcar.h      |  7 +++
 2 files changed, 83 insertions(+)

Geert Uytterhoeven Oct. 17, 2020, 2:03 p.m. UTC | #1

On Fri, Oct 16, 2020 at 2:04 PM <marek.vasut@gmail.com> wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>
> The R-Car PCIe controller is capable of handling L0s/L1 link states.
> While the controller can enter and exit L0s link state, and exit L1
> link state, without any additional action from the driver, to enter
> L1 link state, the driver must complete the link state transition by
> issuing additional commands to the controller.
>
> The problem is, this transition is not atomic. The controller sets
> PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
> the PCIe card, but then the controller enters some sort of inbetween
> state. The driver must detect this condition and complete the link
> state transition, by setting L1IATN bit in PMCTLR and waiting for
> the link state transition to complete.
>
> If a PCIe access happens inside this window, where the controller
> is between L0 and L1 link states, the access generates a fault and
> the ARM 'imprecise external abort' handler is invoked.
>
> Just like other PCI controller drivers, here we hook the fault handler,
> perform the fixup to help the controller enter L1 link state, and then
> restart the instruction which triggered the fault. Since the controller
> is in L1 link state now, the link can exit from L1 link state to L0 and
> successfully complete the access.
>
> Note that this fixup is applicable only to Aarch32 R-Car controllers,
> the Aarch64 R-Car perform the same fixup in TFA, see TFA commit [1]
> 0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
> [1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf
>
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: - Update commit message, add link to TFA repository commit
>     - Handle the LPAE case as in ARM fault.c and fsr-{2,3}level.c
>     - Cache clock and check whether they are enabled before register
>       access
> V3: - Fix commit message according to spellchecker
>     - Use of_find_matching_node() to apply hook only on Gen1 and Gen2 RCar
>       (in case the kernel is multiplatform)
> V4: - Mark rcar_pcie_abort_handler_of_match with __initconst

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Please add tags given to the previous version:
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

Lorenzo Pieralisi Nov. 19, 2020, 5:35 p.m. UTC | #2

On Fri, Oct 16, 2020 at 02:04:16PM +0200, marek.vasut@gmail.com wrote:

[...]

> +#ifdef CONFIG_ARM
> +/*
> + * Here we keep a static copy of the remapped PCIe controller address.
> + * This is only used on aarch32 systems, all of which have one single
> + * PCIe controller, to provide quick access to the PCIe controller in
> + * the L1 link state fixup function, called from the ARM fault handler.
> + */
> +static void __iomem *pcie_base;
> +/*
> + * Static copy of bus clock pointer, so we can check whether the clock
> + * is enabled or not.
> + */
> +static struct clk *pcie_bus_clk;
> +#endif

I don't think you can have multiple host bridges on a given platform;
if that is a possible configuration, this won't work.
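
For illustration only (not something proposed in this thread): if more than
one controller ever had to be handled, the single static pointer could in
principle be replaced by a small registry that the handler walks. Every name
below (struct bridge, bridge_register(), bridge_needing_fixup(),
MAX_BRIDGES) is hypothetical, and the pmsr field stands in for a real MMIO
read.

```c
#include <stddef.h>
#include <stdint.h>

#define PMEL1RX     (UINT32_C(1) << 23)
#define PMSTATE     (UINT32_C(0x7) << 16)
#define PMSTATE_L1  (UINT32_C(3) << 16)

#define MAX_BRIDGES 4	/* arbitrary bound for this sketch */

/* Stand-in for one controller; `pmsr` replaces a real register read. */
struct bridge {
	uint32_t pmsr;
};

static struct bridge *bridges[MAX_BRIDGES];
static size_t nr_bridges;

/* Would be called once per probed controller. Returns 0 on success. */
static int bridge_register(struct bridge *b)
{
	if (nr_bridges == MAX_BRIDGES)
		return -1;
	bridges[nr_bridges++] = b;
	return 0;
}

/* Find the first bridge stuck between L0 and L1, or NULL if none. */
static struct bridge *bridge_needing_fixup(void)
{
	for (size_t i = 0; i < nr_bridges; i++) {
		uint32_t pmsr = bridges[i]->pmsr;

		if ((pmsr & PMEL1RX) && (pmsr & PMSTATE) != PMSTATE_L1)
			return bridges[i];
	}

	return NULL;
}
```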

>  static inline struct rcar_msi *to_rcar_msi(struct msi_controller *chip)
>  {
>  	return container_of(chip, struct rcar_msi, chip);
> @@ -804,6 +820,12 @@ static int rcar_pcie_get_resources(struct rcar_pcie_host *host)
>  	}
>  	host->msi.irq2 = i;
>  
> +#ifdef CONFIG_ARM
> +	/* Cache static copy for L1 link state fixup hook on aarch32 */
> +	pcie_base = pcie->base;
> +	pcie_bus_clk = host->bus_clk;
> +#endif
> +
>  	return 0;
>  
>  err_irq2:
> @@ -1050,4 +1072,58 @@ static struct platform_driver rcar_pcie_driver = {
>  	},
>  	.probe = rcar_pcie_probe,
>  };
> +
> +#ifdef CONFIG_ARM
> +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
> +		unsigned int fsr, struct pt_regs *regs)
> +{
> +	u32 pmsr;
> +
> +	if (!pcie_base || !__clk_is_enabled(pcie_bus_clk))
> +		return 1;
> +
> +	pmsr = readl(pcie_base + PMSR);
> +
> +	/*
> +	 * Test if the PCIe controller received PM_ENTER_L1 DLLP and
> +	 * the PCIe controller is not in L1 link state. If true, apply
> +	 * fix, which will put the controller into L1 link state, from
> +	 * which it can return to L0s/L0 on its own.
> +	 */
> +	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
> +		writel(L1IATN, pcie_base + PMCTLR);
> +		while (!(readl(pcie_base + PMSR) & L1FAEG))
> +			;
> +		writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
> +		return 0;
> +	}

I suppose a fault on multiple cores can happen simultaneously, if it
does this may not work well either - I assume all config/io/mem would
trigger a fault.
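
For illustration only (not code from this patch): one way to picture the
concurrency concern is a lock around the check-and-fix step, with the
condition re-checked once the lock is held, so that a second core arriving
late finds nothing left to do. This is a user-space sketch using C11
atomics; l1_entry_pending is a hypothetical stand-in for "PMEL1RX set and
PMSTATE not L1", and a kernel version would presumably use a spinlock
instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag fixup_lock = ATOMIC_FLAG_INIT;
static bool l1_entry_pending;	/* stand-in for the PMSR condition */
static int fixups_run;

static void fixup_lock_acquire(void)
{
	/* test_and_set returns the previous value; spin while it was set. */
	while (atomic_flag_test_and_set_explicit(&fixup_lock,
						 memory_order_acquire))
		;
}

static void fixup_lock_release(void)
{
	atomic_flag_clear_explicit(&fixup_lock, memory_order_release);
}

/* Returns true if this caller performed the fixup. */
static bool fixup_serialized(void)
{
	bool did_fixup = false;

	fixup_lock_acquire();
	if (l1_entry_pending) {		/* re-check under the lock */
		l1_entry_pending = false;	/* placeholder for the register sequence */
		fixups_run++;
		did_fixup = true;
	}
	fixup_lock_release();

	return did_fixup;
}
```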

As I mentioned in my reply to v1, is there a chance we can move
this quirk into config accessors (if the PM_ENTER_L1_DLLP is
subsequent to a write into PMCSR to programme a D state) ?

Config access is serialized but I suspect as I said above that this
triggers on config/io/mem alike.

Just asking to try to avoid a fault handler if possible.

Thanks,
Lorenzo

> +
> +	return 1;
> +}
> +
> +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> +	{ .compatible = "renesas,pcie-r8a7779" },
> +	{ .compatible = "renesas,pcie-r8a7790" },
> +	{ .compatible = "renesas,pcie-r8a7791" },
> +	{ .compatible = "renesas,pcie-rcar-gen2" },
> +	{},
> +};
> +
> +static int __init rcar_pcie_init(void)
> +{
> +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> +#ifdef CONFIG_ARM_LPAE
> +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +				"asynchronous external abort");
> +#else
> +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +				"imprecise external abort");
> +#endif
> +	}
> +
> +	return platform_driver_register(&rcar_pcie_driver);
> +}
> +device_initcall(rcar_pcie_init);
> +#else
>  builtin_platform_driver(rcar_pcie_driver);
> +#endif
> diff --git a/drivers/pci/controller/pcie-rcar.h b/drivers/pci/controller/pcie-rcar.h
> index d4c698b5f821..9bb125db85c6 100644
> --- a/drivers/pci/controller/pcie-rcar.h
> +++ b/drivers/pci/controller/pcie-rcar.h
> @@ -85,6 +85,13 @@
>  #define  LTSMDIS		BIT(31)
>  #define  MACCTLR_INIT_VAL	(LTSMDIS | MACCTLR_NFTS_MASK)
>  #define PMSR			0x01105c
> +#define  L1FAEG			BIT(31)
> +#define  PMEL1RX		BIT(23)
> +#define  PMSTATE		GENMASK(18, 16)
> +#define  PMSTATE_L1		(3 << 16)
> +#define PMCTLR			0x011060
> +#define  L1IATN			BIT(31)
> +
>  #define MACS2R			0x011078
>  #define MACCGSPSETR		0x011084
>  #define  SPCNGRSN		BIT(31)
> -- 
> 2.28.0
>

Marek Vasut Nov. 29, 2020, 1:05 p.m. UTC | #3

On 11/19/20 6:35 PM, Lorenzo Pieralisi wrote:
>> +#ifdef CONFIG_ARM
>> +/*
>> + * Here we keep a static copy of the remapped PCIe controller address.
>> + * This is only used on aarch32 systems, all of which have one single
>> + * PCIe controller, to provide quick access to the PCIe controller in
>> + * the L1 link state fixup function, called from the ARM fault handler.
>> + */
>> +static void __iomem *pcie_base;
>> +/*
>> + * Static copy of bus clock pointer, so we can check whether the clock
>> + * is enabled or not.
>> + */
>> +static struct clk *pcie_bus_clk;
>> +#endif
> 
> Don't think you can have multiple host bridges in a given platform,
> if it is a possible configuration this won't work.

Correct, all the affected platforms have only one host bridge.

>>   static inline struct rcar_msi *to_rcar_msi(struct msi_controller *chip)
>>   {
>>   	return container_of(chip, struct rcar_msi, chip);
>> @@ -804,6 +820,12 @@ static int rcar_pcie_get_resources(struct rcar_pcie_host *host)
>>   	}
>>   	host->msi.irq2 = i;
>>   
>> +#ifdef CONFIG_ARM
>> +	/* Cache static copy for L1 link state fixup hook on aarch32 */
>> +	pcie_base = pcie->base;
>> +	pcie_bus_clk = host->bus_clk;
>> +#endif
>> +
>>   	return 0;
>>   
>>   err_irq2:
>> @@ -1050,4 +1072,58 @@ static struct platform_driver rcar_pcie_driver = {
>>   	},
>>   	.probe = rcar_pcie_probe,
>>   };
>> +
>> +#ifdef CONFIG_ARM
>> +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
>> +		unsigned int fsr, struct pt_regs *regs)
>> +{
>> +	u32 pmsr;
>> +
>> +	if (!pcie_base || !__clk_is_enabled(pcie_bus_clk))
>> +		return 1;
>> +
>> +	pmsr = readl(pcie_base + PMSR);
>> +
>> +	/*
>> +	 * Test if the PCIe controller received PM_ENTER_L1 DLLP and
>> +	 * the PCIe controller is not in L1 link state. If true, apply
>> +	 * fix, which will put the controller into L1 link state, from
>> +	 * which it can return to L0s/L0 on its own.
>> +	 */
>> +	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
>> +		writel(L1IATN, pcie_base + PMCTLR);
>> +		while (!(readl(pcie_base + PMSR) & L1FAEG))
>> +			;
>> +		writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
>> +		return 0;
>> +	}
> 
> I suppose a fault on multiple cores can happen simultaneously, if it
> does this may not work well either - I assume all config/io/mem would
> trigger a fault.
> 
> As I mentioned in my reply to v1, is there a chance we can move
> this quirk into config accessors (if the PM_ENTER_L1_DLLP is
> subsequent to a write into PMCSR to programme a D state) ?

I don't think we can, since the userspace can do such a config space 
write with e.g. setpci and then this fixup is still needed.

> Config access is serialized but I suspect as I said above that this
> triggers on config/io/mem alike.
> 
> Just asking to try to avoid a fault handler if possible.

See above, I doubt we can fully avoid this workaround.

[...]

Lorenzo Pieralisi Dec. 8, 2020, 10:18 a.m. UTC | #4

On Sun, Nov 29, 2020 at 02:05:08PM +0100, Marek Vasut wrote:
> On 11/19/20 6:35 PM, Lorenzo Pieralisi wrote:
> > > +#ifdef CONFIG_ARM
> > > +/*
> > > + * Here we keep a static copy of the remapped PCIe controller address.
> > > + * This is only used on aarch32 systems, all of which have one single
> > > + * PCIe controller, to provide quick access to the PCIe controller in
> > > + * the L1 link state fixup function, called from the ARM fault handler.
> > > + */
> > > +static void __iomem *pcie_base;
> > > +/*
> > > + * Static copy of bus clock pointer, so we can check whether the clock
> > > + * is enabled or not.
> > > + */
> > > +static struct clk *pcie_bus_clk;
> > > +#endif
> > 
> > Don't think you can have multiple host bridges in a given platform,
> > if it is a possible configuration this won't work.
> 
> Correct, all the affected platforms have only one host bridge.
> 
> > >   static inline struct rcar_msi *to_rcar_msi(struct msi_controller *chip)
> > >   {
> > >   	return container_of(chip, struct rcar_msi, chip);
> > > @@ -804,6 +820,12 @@ static int rcar_pcie_get_resources(struct rcar_pcie_host *host)
> > >   	}
> > >   	host->msi.irq2 = i;
> > > +#ifdef CONFIG_ARM
> > > +	/* Cache static copy for L1 link state fixup hook on aarch32 */
> > > +	pcie_base = pcie->base;
> > > +	pcie_bus_clk = host->bus_clk;
> > > +#endif
> > > +
> > >   	return 0;
> > >   err_irq2:
> > > @@ -1050,4 +1072,58 @@ static struct platform_driver rcar_pcie_driver = {
> > >   	},
> > >   	.probe = rcar_pcie_probe,
> > >   };
> > > +
> > > +#ifdef CONFIG_ARM
> > > +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
> > > +		unsigned int fsr, struct pt_regs *regs)
> > > +{
> > > +	u32 pmsr;
> > > +
> > > +	if (!pcie_base || !__clk_is_enabled(pcie_bus_clk))
> > > +		return 1;
> > > +
> > > +	pmsr = readl(pcie_base + PMSR);
> > > +
> > > +	/*
> > > +	 * Test if the PCIe controller received PM_ENTER_L1 DLLP and
> > > +	 * the PCIe controller is not in L1 link state. If true, apply
> > > +	 * fix, which will put the controller into L1 link state, from
> > > +	 * which it can return to L0s/L0 on its own.
> > > +	 */
> > > +	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
> > > +		writel(L1IATN, pcie_base + PMCTLR);
> > > +		while (!(readl(pcie_base + PMSR) & L1FAEG))
> > > +			;
> > > +		writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
> > > +		return 0;
> > > +	}
> > 
> > I suppose a fault on multiple cores can happen simultaneously, if it
> > does this may not work well either - I assume all config/io/mem would
> > trigger a fault.
> > 
> > As I mentioned in my reply to v1, is there a chance we can move
> > this quirk into config accessors (if the PM_ENTER_L1_DLLP is
> > subsequent to a write into PMCSR to programme a D state) ?
> 
> I don't think we can, since the userspace can do such a config space write
> with e.g. setpci and then this fixup is still needed.


Userspace goes via the kernel config accessors anyway, right ?

I would like to avoid having arch specific hooks in PCI drivers so
if we can work around it somehow it is much better.

I can still merge this patch this week but I would like to explore
alternatives before committing it.

Lorenzo
> 
> > Config access is serialized but I suspect as I said above that this
> > triggers on config/io/mem alike.
> > 
> > Just asking to try to avoid a fault handler if possible.
> 
> See above, I doubt we can fully avoid this workaround.
> 
> [...]

Bjorn Helgaas Dec. 8, 2020, 4:40 p.m. UTC | #5

On Fri, Oct 16, 2020 at 02:04:16PM +0200, marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> The R-Car PCIe controller is capable of handling L0s/L1 link states.

Minor wording nit: L0s seems irrelevant to this patch.

All PCIe functions are required to support the Power Management
Capability (PCIe r5.0, sec 7.5.2), and that in turn requires D0,
D3hot, and D3cold support, and D3hot requires L1 (sec 5.2).

So saying this device "is capable of handling L1" really doesn't tell
us anything, and it glosses over the fact that it doesn't do it
*correctly* and requires help from the driver to work around this
hardware defect.

Does this problem occur in both these cases?

  1) When ASPM enters L1, and

  2) When software writes PCI_PM_CTRL to put the device in D3hot?

IIUC both cases require the link to go to L1.  I guess the same
software workaround applies to both cases?
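
For illustration only: the two triggers listed above correspond to two
different register writes on the endpoint side. The sketch below computes
the new register values; the macro values follow the PowerState field
(bits 1:0) of the PM Control/Status register and the ASPM Control field of
Link Control as I understand the PCIe spec, and the helper names are
hypothetical.

```c
#include <stdint.h>

#define PM_CTRL_STATE_MASK  0x0003u	/* PMCSR PowerState, bits 1:0 */
#define PM_STATE_D3HOT      0x0003u
#define LNKCTL_ASPM_L1      0x0002u	/* Link Control: ASPM L1 entry enable */

/* Case 2: software requests D3hot through the PM Control/Status register. */
static uint16_t pmcsr_set_d3hot(uint16_t pmcsr)
{
	return (uint16_t)((pmcsr & ~PM_CTRL_STATE_MASK) | PM_STATE_D3HOT);
}

/* Case 1: ASPM L1 entry is enabled in the Link Control register. */
static uint16_t lnkctl_enable_aspm_l1(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl | LNKCTL_ASPM_L1);
}
```

In both cases the device ends up asking the link to enter L1, which is the
point at which the controller needs the driver's help.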

> While the controller can enter and exit L0s link state, and exit L1
> link state, without any additional action from the driver, to enter
> L1 link state, the driver must complete the link state transition by
> issuing additional commands to the controller.
> 
> The problem is, this transition is not atomic. The controller sets
> PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
> the PCIe card, but then the controller enters some sort of inbetween
> state. The driver must detect this condition and complete the link
> state transition, by setting L1IATN bit in PMCTLR and waiting for
> the link state transition to complete.
> 
> If a PCIe access happens inside this window, where the controller
> is between L0 and L1 link states, the access generates a fault and
> the ARM 'imprecise external abort' handler is invoked.
> 
> Just like other PCI controller drivers, here we hook the fault handler,
> perform the fixup to help the controller enter L1 link state, and then
> restart the instruction which triggered the fault. Since the controller
> is in L1 link state now, the link can exit from L1 link state to L0 and
> successfully complete the access.
> 
> Note that this fixup is applicable only to Aarch32 R-Car controllers,
> the Aarch64 R-Car perform the same fixup in TFA, see TFA commit [1]
> 0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
> [1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf
> 
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: - Update commit message, add link to TFA repository commit
>     - Handle the LPAE case as in ARM fault.c and fsr-{2,3}level.c
>     - Cache clock and check whether they are enabled before register
>       access
> V3: - Fix commit message according to spellchecker
>     - Use of_find_matching_node() to apply hook only on Gen1 and Gen2 RCar
>       (in case the kernel is multiplatform)
> V4: - Mark rcar_pcie_abort_handler_of_match with __initconst
> ---
>  drivers/pci/controller/pcie-rcar-host.c | 76 +++++++++++++++++++++++++
>  drivers/pci/controller/pcie-rcar.h      |  7 +++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
> index cdc0963f154e..1194d5f3341b 100644
> --- a/drivers/pci/controller/pcie-rcar-host.c
> +++ b/drivers/pci/controller/pcie-rcar-host.c
> @@ -13,6 +13,7 @@
>  
>  #include <linux/bitops.h>
>  #include <linux/clk.h>
> +#include <linux/clk-provider.h>
>  #include <linux/delay.h>
>  #include <linux/interrupt.h>
>  #include <linux/irq.h>
> @@ -42,6 +43,21 @@ struct rcar_msi {
>  	int irq2;
>  };
>  
> +#ifdef CONFIG_ARM
> +/*
> + * Here we keep a static copy of the remapped PCIe controller address.
> + * This is only used on aarch32 systems, all of which have one single
> + * PCIe controller, to provide quick access to the PCIe controller in
> + * the L1 link state fixup function, called from the ARM fault handler.
> + */
> +static void __iomem *pcie_base;
> +/*
> + * Static copy of bus clock pointer, so we can check whether the clock
> + * is enabled or not.
> + */
> +static struct clk *pcie_bus_clk;
> +#endif
> +
>  static inline struct rcar_msi *to_rcar_msi(struct msi_controller *chip)
>  {
>  	return container_of(chip, struct rcar_msi, chip);
> @@ -804,6 +820,12 @@ static int rcar_pcie_get_resources(struct rcar_pcie_host *host)
>  	}
>  	host->msi.irq2 = i;
>  
> +#ifdef CONFIG_ARM
> +	/* Cache static copy for L1 link state fixup hook on aarch32 */
> +	pcie_base = pcie->base;
> +	pcie_bus_clk = host->bus_clk;
> +#endif
> +
>  	return 0;
>  
>  err_irq2:
> @@ -1050,4 +1072,58 @@ static struct platform_driver rcar_pcie_driver = {
>  	},
>  	.probe = rcar_pcie_probe,
>  };
> +
> +#ifdef CONFIG_ARM
> +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
> +		unsigned int fsr, struct pt_regs *regs)
> +{
> +	u32 pmsr;
> +
> +	if (!pcie_base || !__clk_is_enabled(pcie_bus_clk))
> +		return 1;
> +
> +	pmsr = readl(pcie_base + PMSR);
> +
> +	/*
> +	 * Test if the PCIe controller received PM_ENTER_L1 DLLP and
> +	 * the PCIe controller is not in L1 link state. If true, apply
> +	 * fix, which will put the controller into L1 link state, from
> +	 * which it can return to L0s/L0 on its own.
> +	 */
> +	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
> +		writel(L1IATN, pcie_base + PMCTLR);
> +		while (!(readl(pcie_base + PMSR) & L1FAEG))
> +			;
> +		writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
> +		return 0;
> +	}
> +
> +	return 1;

I have no insight into how these abort handlers work.  Looks awfully
kludgy to me, but if it's the only way and the ARM folks are on board
with it, I can't object.

I guess the other alternative would be to have a quirk to stop
advertising ASPM L1 support and D1/D2/D3hot support.  Obviously that
may give up some power savings.
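
For illustration only: the "stop advertising ASPM L1" half of that
alternative amounts to clearing one bit in the Link Capabilities value that
software sees. The bit positions below follow the ASPM Support field
(bits 11:10) as I understand the PCIe spec; the helper name is hypothetical,
and how such a quirk would actually be wired into this driver is not
discussed in the thread.

```c
#include <stdint.h>

#define LNKCAP_ASPM_L0S  (UINT32_C(1) << 10)	/* ASPM Support: L0s */
#define LNKCAP_ASPM_L1   (UINT32_C(1) << 11)	/* ASPM Support: L1 */

/* Return the Link Capabilities value with ASPM L1 support hidden. */
static uint32_t lnkcap_hide_aspm_l1(uint32_t lnkcap)
{
	return lnkcap & ~LNKCAP_ASPM_L1;
}
```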

If people aren't comfortable with the reliability or maintainability
of this approach in the upstream kernel, there's always the option of
the users who need it carrying this as an out-of-tree patch.

> +}
> +
> +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> +	{ .compatible = "renesas,pcie-r8a7779" },
> +	{ .compatible = "renesas,pcie-r8a7790" },
> +	{ .compatible = "renesas,pcie-r8a7791" },
> +	{ .compatible = "renesas,pcie-rcar-gen2" },
> +	{},
> +};

Why do we need another copy of these, as opposed to doing something
with of_device_get_match_data(), e.g., like brcm_pcie_probe() does?

> +static int __init rcar_pcie_init(void)
> +{
> +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> +#ifdef CONFIG_ARM_LPAE
> +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +				"asynchronous external abort");
> +#else
> +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +				"imprecise external abort");
> +#endif
> +	}
> +
> +	return platform_driver_register(&rcar_pcie_driver);
> +}
> +device_initcall(rcar_pcie_init);
> +#else
>  builtin_platform_driver(rcar_pcie_driver);
> +#endif

Is the device_initcall() vs builtin_platform_driver() something
related to the hook_fault_code()?  What would break if this were
always builtin_platform_driver()?

> diff --git a/drivers/pci/controller/pcie-rcar.h b/drivers/pci/controller/pcie-rcar.h
> index d4c698b5f821..9bb125db85c6 100644
> --- a/drivers/pci/controller/pcie-rcar.h
> +++ b/drivers/pci/controller/pcie-rcar.h
> @@ -85,6 +85,13 @@
>  #define  LTSMDIS		BIT(31)
>  #define  MACCTLR_INIT_VAL	(LTSMDIS | MACCTLR_NFTS_MASK)
>  #define PMSR			0x01105c
> +#define  L1FAEG			BIT(31)
> +#define  PMEL1RX		BIT(23)
> +#define  PMSTATE		GENMASK(18, 16)
> +#define  PMSTATE_L1		(3 << 16)
> +#define PMCTLR			0x011060
> +#define  L1IATN			BIT(31)
> +
>  #define MACS2R			0x011078
>  #define MACCGSPSETR		0x011084
>  #define  SPCNGRSN		BIT(31)
> -- 
> 2.28.0
>

Marek Vasut Dec. 8, 2020, 5:45 p.m. UTC | #6

On 12/8/20 11:18 AM, Lorenzo Pieralisi wrote:
[...]
>>> I suppose a fault on multiple cores can happen simultaneously, if it
>>> does this may not work well either - I assume all config/io/mem would
>>> trigger a fault.
>>>
>>> As I mentioned in my reply to v1, is there a chance we can move
>>> this quirk into config accessors (if the PM_ENTER_L1_DLLP is
>>> subsequent to a write into PMCSR to programme a D state) ?
>>
>> I don't think we can, since the userspace can do such a config space write
>> with e.g. setpci and then this fixup is still needed.
> 
> 
> Userspace goes via the kernel config accessors anyway, right ?

As far as I can tell, you can just write the register with devmem, so 
no. You cannot assume everything will go through the accessors. I don't 
think setpci does either.

> I would like to avoid having arch specific hooks in PCI drivers so
> if we can work around it somehow it is much better.

I think we had this discussion before, which ultimately led to hiding
the workaround in ATF on Gen3. On Gen2, there is no ATF, so the
workaround must be in Linux.

> I can still merge this patch this week but I would like to explore
> alternatives before committing it.

Please merge it as-is.

Geert Uytterhoeven Dec. 8, 2020, 5:52 p.m. UTC | #7

On Tue, Dec 8, 2020 at 6:45 PM Marek Vasut <marek.vasut@gmail.com> wrote:
> On 12/8/20 11:18 AM, Lorenzo Pieralisi wrote:
> [...]
> >>> I suppose a fault on multiple cores can happen simultaneously, if it
> >>> does this may not work well either - I assume all config/io/mem would
> >>> trigger a fault.
> >>>
> >>> As I mentioned in my reply to v1, is there a chance we can move
> >>> this quirk into config accessors (if the PM_ENTER_L1_DLLP is
> >>> subsequent to a write into PMCSR to programme a D state) ?
> >>
> >> I don't think we can, since the userspace can do such a config space write
> >> with e.g. setpci and then this fixup is still needed.
> >
> > Userspace goes via the kernel config accessors anyway, right ?
>
> As far as I can tell, you can just write the register with devmem, so
> no. You cannot assume everything will go through the accessors. I don't
> think setpci does either.
>
> > I would like to avoid having arch specific hooks in PCI drivers so
> > if we can work around it somehow it is much better.
>
> I think we had this discussion before, which ultimately led to hiding
> the workaround in ATF on Gen3. On Gen2, there is no ATF, so the work
> around must be in Linux.
>
> > I can still merge this patch this week but I would like to explore
> > alternatives before committing it.
>
> Please merge it as-is.

+1

This can be triggered easily, just insert a stock Intel Ethernet card,
s2ram, and boom.

Gr{oetje,eeting}s,

                        Geert

Marek Vasut Dec. 8, 2020, 6:05 p.m. UTC | #8

On 12/8/20 5:40 PM, Bjorn Helgaas wrote:

[...]

>> The R-Car PCIe controller is capable of handling L0s/L1 link states.
> 
> Minor wording nit: L0s seems irrelevant to this patch.

Of course.

> All PCIe functions are required to support the Power Management
> Capability (PCIe r5.0, sec 7.5.2), and that in turn requires D0,
> D3hot, and D3cold support, and D3hot requires L1 (sec 5.2).
> 
> So saying this device "is capable of handling L1" really doesn't tell
> us anything, and it glosses over the fact that it doesn't do it
> *correctly* and requires help from the driver to work around this
> hardware defect.

I see.

> Does this problem occur in both these cases?
> 
>    1) When ASPM enters L1, and
> 
>    2) When software writes PCI_PM_CTRL to put the device in D3hot?
> 
> IIUC both cases require the link to go to L1.  I guess the same
> software workaround applies to both cases?

Yes

[...]

>> +#ifdef CONFIG_ARM
>> +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
>> +		unsigned int fsr, struct pt_regs *regs)
>> +{
>> +	u32 pmsr;
>> +
>> +	if (!pcie_base || !__clk_is_enabled(pcie_bus_clk))
>> +		return 1;
>> +
>> +	pmsr = readl(pcie_base + PMSR);
>> +
>> +	/*
>> +	 * Test if the PCIe controller received PM_ENTER_L1 DLLP and
>> +	 * the PCIe controller is not in L1 link state. If true, apply
>> +	 * fix, which will put the controller into L1 link state, from
>> +	 * which it can return to L0s/L0 on its own.
>> +	 */
>> +	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
>> +		writel(L1IATN, pcie_base + PMCTLR);
>> +		while (!(readl(pcie_base + PMSR) & L1FAEG))
>> +			;
>> +		writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
>> +		return 0;
>> +	}
>> +
>> +	return 1;
> 
> I have no insight into how these abort handlers work.  Looks awfully
> kludgy to me, but if it's the only way and the ARM folks are on board
> with it, I can't object.
> 
> I guess the other alternative would be to have a quirk to stop
> advertising ASPM L1 support and D1/D2/D3hot support.  Obviously that
> may give up some power savings.
> 
> If people aren't comfortable with the reliability or maintainability
> of this approach in the upstream kernel, there's always the option of
> the users who need it carrying this as an out-of-tree patch.

I would highly prefer to be able to use mainline Linux as-is, without 
carrying any extra patches, so BSP is not an option.

>> +}
>> +
>> +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
>> +	{ .compatible = "renesas,pcie-r8a7779" },
>> +	{ .compatible = "renesas,pcie-r8a7790" },
>> +	{ .compatible = "renesas,pcie-r8a7791" },
>> +	{ .compatible = "renesas,pcie-rcar-gen2" },
>> +	{},
>> +};
> 
> Why do we need another copy of these, as opposed to doing something
> with of_device_get_match_data(), e.g., like brcm_pcie_probe() does?

This is not a copy, but a subset of SoCs which are affected by this
problem.

>> +static int __init rcar_pcie_init(void)
>> +{
>> +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
>> +#ifdef CONFIG_ARM_LPAE
>> +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>> +				"asynchronous external abort");
>> +#else
>> +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>> +				"imprecise external abort");
>> +#endif
>> +	}
>> +
>> +	return platform_driver_register(&rcar_pcie_driver);
>> +}
>> +device_initcall(rcar_pcie_init);
>> +#else
>>   builtin_platform_driver(rcar_pcie_driver);
>> +#endif
> 
> Is the device_initcall() vs builtin_platform_driver() something
> related to the hook_fault_code()?  What would break if this were
> always builtin_platform_driver()?

rcar_pcie_init() would not be called before probe.
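
For illustration only (a user-space model, not kernel code): the ordering
in question is "install the fault hook first, then register the driver".
The model_* names and the table size are hypothetical; the fault codes 22
(non-LPAE) and 17 (LPAE) are the ones used in the patch. As far as I know,
hook_fault_code() on ARM is itself an __init function, which would be one
more reason to call it from an initcall.

```c
#include <stddef.h>

#define NR_FAULT_CODES 32	/* arbitrary size for this model */

typedef int (*fault_fn)(unsigned long addr, unsigned int fsr);

static fault_fn fault_table[NR_FAULT_CODES];
static int driver_registered;

/* Toy stand-in for the kernel's hook_fault_code(). */
static void model_hook_fault_code(int nr, fault_fn fn)
{
	if (nr >= 0 && nr < NR_FAULT_CODES)
		fault_table[nr] = fn;
}

static int model_abort_handler(unsigned long addr, unsigned int fsr)
{
	(void)addr;
	(void)fsr;
	return 0;	/* pretend the fixup always succeeds */
}

/* Mirrors the order in rcar_pcie_init(): hook first, then register. */
static int model_init(int lpae)
{
	model_hook_fault_code(lpae ? 17 : 22, model_abort_handler);
	driver_registered = 1;	/* stand-in for platform_driver_register() */
	return 0;
}

/* Returns 0 if a hooked handler dealt with the fault, 1 otherwise. */
static int model_dispatch(int nr, unsigned long addr, unsigned int fsr)
{
	if (nr < 0 || nr >= NR_FAULT_CODES || !fault_table[nr])
		return 1;
	return fault_table[nr](addr, fsr);
}
```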

Bjorn Helgaas Dec. 8, 2020, 6:46 p.m. UTC | #9

On Tue, Dec 08, 2020 at 07:05:09PM +0100, Marek Vasut wrote:
> On 12/8/20 5:40 PM, Bjorn Helgaas wrote:

> > > +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> > > +	{ .compatible = "renesas,pcie-r8a7779" },
> > > +	{ .compatible = "renesas,pcie-r8a7790" },
> > > +	{ .compatible = "renesas,pcie-r8a7791" },
> > > +	{ .compatible = "renesas,pcie-rcar-gen2" },
> > > +	{},
> > > +};
> > 
> > Why do we need another copy of these, as opposed to doing something
> > with of_device_get_match_data(), e.g., like brcm_pcie_probe() does?
> 
> This is not a copy, but as subset of SoCs which are affected by this
> problem.

I know it's not a complete copy.  Many systems include flags like
"broken_l1" in their match_data.  Something like this:

  struct rcar_pcie_drvdata {
    int            (*phy_init_fn)(struct rcar_pcie_host *host);
    unsigned int   broken_l1:1;
  };

  static const struct rcar_pcie_drvdata rcar_init_h1_drvdata = {
    .phy_init_fn = rcar_pcie_phy_init_h1,
    .broken_l1 = 1,
  };

  static const struct rcar_pcie_drvdata rcar_init_gen2_drvdata = {
    .phy_init_fn = rcar_pcie_phy_init_gen2,
    .broken_l1 = 1,
  };

  static const struct rcar_pcie_drvdata rcar_init_gen3_drvdata = {
    .phy_init_fn = rcar_pcie_phy_init_gen3,
  };

  static const struct of_device_id rcar_pcie_of_match[] = {
    { .compatible = "renesas,pcie-r8a7779", .data = &rcar_init_h1_drvdata },
    { .compatible = "renesas,pcie-r8a7790", .data = &rcar_init_gen2_drvdata },
    { .compatible = "renesas,pcie-r8a7791", .data = &rcar_init_gen2_drvdata },
    ...
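
For illustration only, a minimal user-space model of the single-table approach sketched above: the quirk flag travels with each match entry, so no second compatible list is needed. Every `model_*` name is hypothetical; in the driver the lookup would presumably be of_device_get_match_data(), and the gen3 entry is included here only to show an unaffected SoC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-SoC data: the quirk flag lives next to the other data. */
struct model_drvdata {
	const char *name;
	unsigned int broken_l1:1;
};

/* Stand-in for struct of_device_id (assumption, not the kernel type). */
struct model_match {
	const char *compatible;
	const void *data;
};

static const struct model_drvdata model_h1 = { .name = "h1", .broken_l1 = 1 };
static const struct model_drvdata model_gen2 = { .name = "gen2", .broken_l1 = 1 };
static const struct model_drvdata model_gen3 = { .name = "gen3" };

/* Note the '&': .data holds a pointer to the per-SoC structure. */
static const struct model_match model_of_match[] = {
	{ .compatible = "renesas,pcie-r8a7779", .data = &model_h1 },
	{ .compatible = "renesas,pcie-r8a7790", .data = &model_gen2 },
	{ .compatible = "renesas,pcie-r8a7791", .data = &model_gen2 },
	{ .compatible = "renesas,pcie-rcar-gen3", .data = &model_gen3 },
	{ NULL, NULL },
};

/* Walk the table until the sentinel and return the matching entry's data. */
static const struct model_drvdata *model_get_match_data(const char *compatible)
{
	const struct model_match *m;

	assert(compatible != NULL);
	for (m = model_of_match; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}

/* 1 if the matched SoC carries the broken_l1 flag, 0 otherwise. */
static int model_needs_abort_hook(const char *compatible)
{
	const struct model_drvdata *d = model_get_match_data(compatible);

	return d && d->broken_l1;
}
```

With this layout, model_needs_abort_hook("renesas,pcie-r8a7790") reports the quirk while the gen3 entry and unknown compatibles do not.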

> > > +static int __init rcar_pcie_init(void)
> > > +{
> > > +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> > > +#ifdef CONFIG_ARM_LPAE
> > > +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > +				"asynchronous external abort");
> > > +#else
> > > +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > +				"imprecise external abort");
> > > +#endif
> > > +	}
> > > +
> > > +	return platform_driver_register(&rcar_pcie_driver);
> > > +}
> > > +device_initcall(rcar_pcie_init);
> > > +#else
> > >   builtin_platform_driver(rcar_pcie_driver);
> > > +#endif
> > 
> > Is the device_initcall() vs builtin_platform_driver() something
> > related to the hook_fault_code()?  What would break if this were
> > always builtin_platform_driver()?
> 
> rcar_pcie_init() would not be called before probe.

Sorry to be slow, but why does it need to be called before probe?
Obviously software isn't putting the controller in D3 or enabling ASPM
before probe.

Bjorn

Lorenzo Pieralisi Dec. 10, 2020, 12:12 p.m. UTC | #10

On Tue, Dec 08, 2020 at 12:46:27PM -0600, Bjorn Helgaas wrote:
> On Tue, Dec 08, 2020 at 07:05:09PM +0100, Marek Vasut wrote:
> > On 12/8/20 5:40 PM, Bjorn Helgaas wrote:
> 
> > > > +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> > > > +	{ .compatible = "renesas,pcie-r8a7779" },
> > > > +	{ .compatible = "renesas,pcie-r8a7790" },
> > > > +	{ .compatible = "renesas,pcie-r8a7791" },
> > > > +	{ .compatible = "renesas,pcie-rcar-gen2" },
> > > > +	{},
> > > > +};
> > > 
> > > Why do we need another copy of these, as opposed to doing something
> > > with of_device_get_match_data(), e.g., like brcm_pcie_probe() does?
> > 
> > This is not a copy, but as subset of SoCs which are affected by this
> > problem.
> 
> I know it's not a complete copy.  Many systems include flags like
> "broken_l1" in their match_data.  Something like this:
> 
>   struct rcar_pcie_drvdata {
>     int            (*phy_init_fn)(struct rcar_pcie_host *host);
>     unsigned int   broken_l1:1;
>   };
> 
>   static const struct rcar_pcie_drvdata rcar_init_h1_drvdata = {
>     .phy_init_fn = rcar_pcie_phy_init_h1,
>     .broken_l1 = 1,
>   };
> 
>   static const struct rcar_pcie_drvdata rcar_init_gen2_drvdata = {
>     .phy_init_fn = rcar_pcie_phy_init_gen2,
>     .broken_l1 = 1,
>   };
> 
>   static const struct rcar_pcie_drvdata rcar_init_gen3_drvdata = {
>     .phy_init_fn = rcar_pcie_phy_init_gen3,
>   };
> 
>   static const struct of_device_id rcar_pcie_of_match[] = {
>     { .compatible = "renesas,pcie-r8a7779", .data = rcar_init_h1_drvdata },
>     { .compatible = "renesas,pcie-r8a7790", .data = rcar_init_gen2_drvdata },
>     { .compatible = "renesas,pcie-r8a7791", .data = rcar_init_gen2_drvdata },
>     ...

+1

> > > > +static int __init rcar_pcie_init(void)
> > > > +{
> > > > +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> > > > +#ifdef CONFIG_ARM_LPAE
> > > > +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > +				"asynchronous external abort");
> > > > +#else
> > > > +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > +				"imprecise external abort");
> > > > +#endif
> > > > +	}
> > > > +
> > > > +	return platform_driver_register(&rcar_pcie_driver);
> > > > +}
> > > > +device_initcall(rcar_pcie_init);
> > > > +#else
> > > >   builtin_platform_driver(rcar_pcie_driver);
> > > > +#endif
> > > 
> > > Is the device_initcall() vs builtin_platform_driver() something
> > > related to the hook_fault_code()?  What would break if this were
> > > always builtin_platform_driver()?
> > 
> > rcar_pcie_init() would not be called before probe.
> 
> Sorry to be slow, but why does it need to be called before probe?
> Obviously software isn't putting the controller in D3 or enabling ASPM
> before probe.

I don't understand it either so it would be good to clarify.

Also, some of these platforms are SMP systems. I don't understand
what prevents multiple cores from faulting at once, given that the
faults can happen for config/io/mem accesses alike.

I understand that the immediate fix is for S2R, that is single
threaded but I would like to understand how comprehensive this fix
is.

Thanks,
Lorenzo

Marek Vasut Dec. 12, 2020, 7:10 p.m. UTC | #11

On 12/8/20 7:46 PM, Bjorn Helgaas wrote:

[...]

>>>> +static int __init rcar_pcie_init(void)
>>>> +{
>>>> +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
>>>> +#ifdef CONFIG_ARM_LPAE
>>>> +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>>>> +				"asynchronous external abort");
>>>> +#else
>>>> +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>>>> +				"imprecise external abort");
>>>> +#endif
>>>> +	}
>>>> +
>>>> +	return platform_driver_register(&rcar_pcie_driver);
>>>> +}
>>>> +device_initcall(rcar_pcie_init);
>>>> +#else
>>>>    builtin_platform_driver(rcar_pcie_driver);
>>>> +#endif
>>>
>>> Is the device_initcall() vs builtin_platform_driver() something
>>> related to the hook_fault_code()?  What would break if this were
>>> always builtin_platform_driver()?
>>
>> rcar_pcie_init() would not be called before probe.
> 
> Sorry to be slow, but why does it need to be called before probe?
> Obviously software isn't putting the controller in D3 or enabling ASPM
> before probe.

hook_fault_code() is marked __init, so if probe() were deferred and 
the kernel __init memory had already been freed, an attempt to call 
hook_fault_code() from probe would lead to a crash.

Marek Vasut Dec. 12, 2020, 7:12 p.m. UTC | #12

On 12/10/20 1:12 PM, Lorenzo Pieralisi wrote:

[...]

>>>>> +static int __init rcar_pcie_init(void)
>>>>> +{
>>>>> +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
>>>>> +#ifdef CONFIG_ARM_LPAE
>>>>> +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>>>>> +				"asynchronous external abort");
>>>>> +#else
>>>>> +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>>>>> +				"imprecise external abort");
>>>>> +#endif
>>>>> +	}
>>>>> +
>>>>> +	return platform_driver_register(&rcar_pcie_driver);
>>>>> +}
>>>>> +device_initcall(rcar_pcie_init);
>>>>> +#else
>>>>>    builtin_platform_driver(rcar_pcie_driver);
>>>>> +#endif
>>>>
>>>> Is the device_initcall() vs builtin_platform_driver() something
>>>> related to the hook_fault_code()?  What would break if this were
>>>> always builtin_platform_driver()?
>>>
>>> rcar_pcie_init() would not be called before probe.
>>
>> Sorry to be slow, but why does it need to be called before probe?
>> Obviously software isn't putting the controller in D3 or enabling ASPM
>> before probe.
> 
> I don't understand it either so it would be good to clarify.

hook_fault_code() is marked __init, so if probe() were deferred and 
the kernel __init memory had already been freed, an attempt to call 
hook_fault_code() from probe would lead to a crash.

> Also, some of these platforms are SMP systems, I don't understand
> what prevents multiple cores to fault at once given that the faults
> can happen for config/io/mem accesses alike.
> 
> I understand that the immediate fix is for S2R, that is single
> threaded but I would like to understand how comprehensive this fix
> is.

Are you suggesting to add some sort of locking ?

Lorenzo Pieralisi Dec. 14, 2020, 5:13 p.m. UTC | #13

On Sat, Dec 12, 2020 at 08:12:16PM +0100, Marek Vasut wrote:
> On 12/10/20 1:12 PM, Lorenzo Pieralisi wrote:
> 
> [...]
> 
> > > > > > +static int __init rcar_pcie_init(void)
> > > > > > +{
> > > > > > +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> > > > > > +#ifdef CONFIG_ARM_LPAE
> > > > > > +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > > > +				"asynchronous external abort");
> > > > > > +#else
> > > > > > +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > > > +				"imprecise external abort");
> > > > > > +#endif
> > > > > > +	}
> > > > > > +
> > > > > > +	return platform_driver_register(&rcar_pcie_driver);
> > > > > > +}
> > > > > > +device_initcall(rcar_pcie_init);
> > > > > > +#else
> > > > > >    builtin_platform_driver(rcar_pcie_driver);
> > > > > > +#endif
> > > > > 
> > > > > Is the device_initcall() vs builtin_platform_driver() something
> > > > > related to the hook_fault_code()?  What would break if this were
> > > > > always builtin_platform_driver()?
> > > > 
> > > > rcar_pcie_init() would not be called before probe.
> > > 
> > > Sorry to be slow, but why does it need to be called before probe?
> > > Obviously software isn't putting the controller in D3 or enabling ASPM
> > > before probe.
> > 
> > I don't understand it either so it would be good to clarify.
> 
> The hook_fault_code() is marked __init, so if probe() was deferred and the
> kernel __init memory was free'd, attempt to call hook_fault_code() from
> probe would lead to a crash.

Understood - I don't think there is a point though in keeping
the builtin_platform_driver() call then, something like:

#ifdef CONFIG_ARM
...
static void __init init_platform_hook_fault(void)
{
	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
#ifdef CONFIG_ARM_LPAE
		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
				"asynchronous external abort");
#else
		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
				"imprecise external abort");
#endif
	}
}
#else
static inline void init_platform_hook_fault(void)
{}
#endif

static int __init rcar_pcie_init(void)
{
	init_platform_hook_fault();
	return platform_driver_register(&rcar_pcie_driver);
}
device_initcall(rcar_pcie_init);

Or we remove the __init marker from hook_fault_code().

> > Also, some of these platforms are SMP systems, I don't understand
> > what prevents multiple cores to fault at once given that the faults
> > can happen for config/io/mem accesses alike.
> > 
> > I understand that the immediate fix is for S2R, that is single
> > threaded but I would like to understand how comprehensive this fix
> > is.
> 
> Are you suggesting to add some sort of locking ?

If we merge a fix, the fix has to work. From reading the code, this fix
seems to have an issue if multiple cores fault at once, which is why I
asked: you may still end up with an unhandled fault.
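
A minimal user-space sketch of how such an unhandled fault could arise. The handler body is not quoted in this part of the thread, so the shape below is an assumption: it treats the fault as handled only while the pending PMEL1RX condition is still set. All `model_*` names are hypothetical, and the two faulting cores are modelled as two back-to-back calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the controller state, not the driver's real data. */
struct model_link {
	bool pmel1rx;	/* PM_ENTER_L1 received, transition not yet completed */
	bool in_l0;	/* link usable for accesses again */
};

/*
 * Assumed handler shape, following the ARM fault hook convention:
 * return 0 when the fault was handled, non-zero when it was not.
 */
static int model_abort_handler(struct model_link *link)
{
	assert(link != NULL);

	if (!link->pmel1rx)
		return 1;	/* nothing pending: reported as unhandled */

	/* Complete the transition and bring the link back. */
	link->pmel1rx = false;
	link->in_l0 = true;
	return 0;
}
```

If two cores hit the same window, the first call clears the pending state and returns 0; the second call then finds nothing pending and returns 1, even though its fault had the same cause.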

Lorenzo

Bjorn Helgaas Dec. 14, 2020, 8:38 p.m. UTC | #14

On Tue, Dec 08, 2020 at 07:05:09PM +0100, Marek Vasut wrote:
> On 12/8/20 5:40 PM, Bjorn Helgaas wrote:

> > Does this problem occur in both these cases?
> > 
> >    1) When ASPM enters L1, and
> > 
> >    2) When software writes PCI_PM_CTRL to put the device in D3hot?
> > 
> > IIUC both cases require the link to go to L1.  I guess the same
> > software workaround applies to both cases?
> 
> Yes

If ASPM puts the Link in L1 and the device needs to DMA, how does the
Link get back to L0?  Do we use the same data abort hook?  If getting
back to L0 requires help from software, it seems like that would
invalidate the L1 exit latency advertised by the devices.  Wouldn't
that mean we couldn't safely enable L1 at all unless the endpoint
could tolerate unlimited exit latency?

Bjorn

Marek Vasut Dec. 16, 2020, 5:52 p.m. UTC | #15

On 12/14/20 6:13 PM, Lorenzo Pieralisi wrote:
[...]

>>>>>> Is the device_initcall() vs builtin_platform_driver() something
>>>>>> related to the hook_fault_code()?  What would break if this were
>>>>>> always builtin_platform_driver()?
>>>>>
>>>>> rcar_pcie_init() would not be called before probe.
>>>>
>>>> Sorry to be slow, but why does it need to be called before probe?
>>>> Obviously software isn't putting the controller in D3 or enabling ASPM
>>>> before probe.
>>>
>>> I don't understand it either so it would be good to clarify.
>>
>> The hook_fault_code() is marked __init, so if probe() was deferred and the
>> kernel __init memory was free'd, attempt to call hook_fault_code() from
>> probe would lead to a crash.
> 
> Understood - I don't think there is a point though in keeping
> the builtin_platform_driver() call then, something like:
> 
> #ifdef CONFIG_ARM
> ...
> static __init void init_platform_hook_fault(void) {
> 	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> 		#ifdef CONFIG_ARM_LPAE
> 			hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> 					"asynchronous external abort");
> 		#else
> 			hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> 					"imprecise external abort");
> 		#endif
> 	}
> }
> #else
> static inline void init_platform_hook_fault(void)
> {}
> #endif
> 
> static int __init rcar_pcie_init(void)
> {
> 	init_platform_hook_fault();
> 	return platform_driver_register(&rcar_pcie_driver);
> }
> device_initcall(rcar_pcie_init);

Does this look simpler than the code in this patch ?

> Or we remove the __init marker from hook_fault_code().

This is a bugfix, it should be possible to backport this through the 
stable tree easily, without changing core architecture code.

>>> Also, some of these platforms are SMP systems, I don't understand
>>> what prevents multiple cores to fault at once given that the faults
>>> can happen for config/io/mem accesses alike.
>>>
>>> I understand that the immediate fix is for S2R, that is single
>>> threaded but I would like to understand how comprehensive this fix
>>> is.
>>
>> Are you suggesting to add some sort of locking ?
> 
> If we merge a fix the fix has to work, by reading the code if multiple
> cores fault at once this fix seems to have an issue that's why I asked,
> you may still end up with an unhandled fault by reading the code.

So, are you suggesting the hook needs some locking ?

Marek Vasut Dec. 16, 2020, 5:56 p.m. UTC | #16

On 12/14/20 9:38 PM, Bjorn Helgaas wrote:
> On Tue, Dec 08, 2020 at 07:05:09PM +0100, Marek Vasut wrote:
>> On 12/8/20 5:40 PM, Bjorn Helgaas wrote:
> 
>>> Does this problem occur in both these cases?
>>>
>>>     1) When ASPM enters L1, and
>>>
>>>     2) When software writes PCI_PM_CTRL to put the device in D3hot?
>>>
>>> IIUC both cases require the link to go to L1.  I guess the same
>>> software workaround applies to both cases?
>>
>> Yes
> 
> If ASPM puts the Link in L1 and the device needs to DMA, how does the
> Link get back to L0?

It cannot, so I would expect the DMA access would fail.

> Do we use the same data abort hook?  If getting
> back to L0 requires help from software, it seems like that would
> invalidate the L1 exit latency advertised by the devices.  Wouldn't
> that mean we couldn't safely enable L1 at all unless the endpoint
> could tolerate unlimited exit latency?

Possibly, there could be limitations to the L1 support in some corner 
cases. Does that mean the L1 support should be disabled completely ?

Bjorn Helgaas Dec. 16, 2020, 6:20 p.m. UTC | #17

On Wed, Dec 16, 2020 at 06:56:11PM +0100, Marek Vasut wrote:
> On 12/14/20 9:38 PM, Bjorn Helgaas wrote:
> > On Tue, Dec 08, 2020 at 07:05:09PM +0100, Marek Vasut wrote:
> > > On 12/8/20 5:40 PM, Bjorn Helgaas wrote:
> > 
> > > > Does this problem occur in both these cases?
> > > > 
> > > >     1) When ASPM enters L1, and
> > > > 
> > > >     2) When software writes PCI_PM_CTRL to put the device in D3hot?
> > > > 
> > > > IIUC both cases require the link to go to L1.  I guess the same
> > > > software workaround applies to both cases?
> > > 
> > > Yes
> > 
> > If ASPM puts the Link in L1 and the device needs to DMA, how does the
> > Link get back to L0?
> 
> It cannot, so I would expect the DMA access would fail.

I think that means we cannot enable ASPM L1 at all on this device.  I
don't think devices or drivers are prepared to deal with this sort of
DMA failure.  At least, if there is a mechanism for dealing with it, I
don't know what it is.

Preventing use of ASPM L1 probably means some sort of quirk to
override whatever the controller advertises in its Link Capabilities
register.
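
As a rough illustration of what such a quirk would have to achieve, the sketch below masks the L1 bit out of a Link Capabilities value. The bit positions follow the ASPM Support field (bits 11:10) of the PCIe Link Capabilities register; the `model_*` helpers are hypothetical, and how the R-Car controller would let software override the advertised value is not covered in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* ASPM Support field of the PCIe Link Capabilities register, bits 11:10. */
#define MODEL_LNKCAP_ASPM_L0S	0x00000400u
#define MODEL_LNKCAP_ASPM_L1	0x00000800u

/* Hypothetical helper: report whether ASPM L1 is advertised. */
static int model_lnkcap_has_l1(uint32_t lnkcap)
{
	return (lnkcap & MODEL_LNKCAP_ASPM_L1) != 0;
}

/* Hypothetical helper: hide ASPM L1 while leaving every other bit alone. */
static uint32_t model_lnkcap_hide_l1(uint32_t lnkcap)
{
	return lnkcap & ~MODEL_LNKCAP_ASPM_L1;
}
```

A value advertising both L0s and L1 (0x00000c00 in that field) would come out advertising L0s only, so ASPM code consulting the capability would not attempt L1.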

The software-controlled PCI-PM model (where software writes to the
PCI_PM_CTRL register) is different, and it may still be possible to
use L1 then.  If software puts the device in D1, D2, or D3hot, the
device cannot initiate DMA.  If it needs to return to D0, it would
have to use the PME mechanism, so there is an opportunity for the
software workaround.

> > Do we use the same data abort hook?  If getting
> > back to L0 requires help from software, it seems like that would
> > invalidate the L1 exit latency advertised by the devices.  Wouldn't
> > that mean we couldn't safely enable L1 at all unless the endpoint
> > could tolerate unlimited exit latency?
> 
> Possibly, there could be limitations to the L1 support in some corner cases.
> Does that mean the L1 support should be disabled completely ?

The L1 exit latency only applies to the ASPM case.  It sounds like we
will have to disable L1 for ASPM.  But the exit latency doesn't apply
to the PCI-PM model where software will explicitly return the device
to D0, and the device should not initiate a transaction until it sees
the link back in L0.

Bjorn
