diff mbox series

[v3,2/2] PCI: rcar: Return all Fs from read which triggered an exception

Message ID 20220122221554.196311-2-marek.vasut@gmail.com (mailing list archive)
State Superseded
Delegated to: Lorenzo Pieralisi
Headers show
Series [v3,1/2] PCI: rcar: Finish transition to L1 state in rcar_pcie_config_access() | expand

Commit Message

Marek Vasut Jan. 22, 2022, 10:15 p.m. UTC
From: Marek Vasut <marek.vasut+renesas@gmail.com>

In case the controller is transitioning to L1 in rcar_pcie_config_access(),
any read/write access to PCIECDR triggers asynchronous external abort. This
is because the transition to L1 link state must be manually finished by the
driver. The PCIe IP can transition back from L1 state to L0 on its own.

The current asynchronous external abort hook implementation restarts
the instruction which finally triggered the fault, which can be a
different instruction than the read/write instruction which started
the faulting access. Usually the instruction which finally triggers
the fault is one which has some data dependency on the result of the
read/write. In case of read, the read value after fixup is undefined,
while a read value of faulting read should be all Fs.

It is possible to enforce the fault using 'isb' instruction placed
right after the read/write instruction which started the faulting
access. Add custom register accessors which perform the read/write
followed immediately by 'isb'.

This way, the fault always happens on the 'isb' and in case of read,
which is located one instruction before the 'isb', it is now possible
to fix up the return value of the read in the asynchronous external
abort hook and make that read return all Fs.

Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Krzysztof Wilczyński <kw@linux.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-renesas-soc@vger.kernel.org
---
V2: Rebase on 1/2
V3: - Add .text.fixup on all three ldr/str/isb instructions and call
      fixup_exception() in the abort handler to trigger the fixup.
    - Propagate return value from read/write accessors, in case the
      access fails, return PCIBIOS_SET_FAILED, else PCIBIOS_SUCCESSFUL.
---
 drivers/pci/controller/pcie-rcar-host.c | 53 +++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 4 deletions(-)

Comments

Arnd Bergmann Jan. 23, 2022, 2:12 p.m. UTC | #1
On Sat, Jan 22, 2022 at 11:15 PM <marek.vasut@gmail.com> wrote:
>
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>
> In case the controller is transitioning to L1 in rcar_pcie_config_access(),
> any read/write access to PCIECDR triggers asynchronous external abort. This
> is because the transition to L1 link state must be manually finished by the
> driver. The PCIe IP can transition back from L1 state to L0 on its own.
>
> The current asynchronous external abort hook implementation restarts
> the instruction which finally triggered the fault, which can be a
> different instruction than the read/write instruction which started
> the faulting access. Usually the instruction which finally triggers
> the fault is one which has some data dependency on the result of the
> read/write. In case of read, the read value after fixup is undefined,
> while a read value of faulting read should be all Fs.
>
> It is possible to enforce the fault using 'isb' instruction placed
> right after the read/write instruction which started the faulting
> access. Add custom register accessors which perform the read/write
> followed immediately by 'isb'.
>
> This way, the fault always happens on the 'isb' and in case of read,
> which is located one instruction before the 'isb', it is now possible
> to fix up the return value of the read in the asynchronous external
> abort hook and make that read return all Fs.
>
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Krzysztof Wilczyński <kw@linux.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

provided once concern gets resolved:

> +#ifdef CONFIG_ARM
> +#define __rcar_pci_rw_reg_workaround(instr)                            \
> +               "1:     " instr " %1, [%2]\n"                           \
> +               "2:     isb\n"                                          \
> +               "3:     .pushsection .text.fixup,\"ax\"\n"              \
> +               "       .align  2\n"                                    \
> +               "4:     mov     %0, #" __stringify(PCIBIOS_SET_FAILED) "\n" \
> +               "       b       3b\n"                                   \
> +               "       .popsection\n"                                  \
> +               "       .pushsection __ex_table,\"a\"\n"                \
> +               "       .align  3\n"                                    \
> +               "       .long   1b, 4b\n"                               \
> +               "       .long   1b, 4b\n"                               \
> +               "       .popsection\n"
> +#endif

You list the fixup for the ldr/str instruction here twice, (.long 1b,4b), but
no fixup for the isb instruction (.long 2b, 4b). Your description says that
the fault happens on the isb, not the ldr, so I don't understand what is
going on here.

       Arnd
Pali Rohár Jan. 23, 2022, 3:31 p.m. UTC | #2
On Saturday 22 January 2022 23:15:54 marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> In case the controller is transitioning to L1 in rcar_pcie_config_access(),
> any read/write access to PCIECDR triggers asynchronous external abort. This
> is because the transition to L1 link state must be manually finished by the
> driver. The PCIe IP can transition back from L1 state to L0 on its own.

Hello!

I must admit that this patch from its initial version evolved into giant hack...
https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/

During review of the previous patch I have asked some important
questions but I have not got any answer to them. So I'm reminding it:
https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/

So could please answer what happens when PCIe controller is in some
non-L* state and either MMIO happen or config read happens or config
write happens?

It is really important to know this fact.

I'm in impression that this patch still is not enough as similar issues
are also in other PCIe controllers which I know...

> The current asynchronous external abort hook implementation restarts
> the instruction which finally triggered the fault, which can be a
> different instruction than the read/write instruction which started
> the faulting access. Usually the instruction which finally triggers
> the fault is one which has some data dependency on the result of the
> read/write. In case of read, the read value after fixup is undefined,
> while a read value of faulting read should be all Fs.
> 
> It is possible to enforce the fault using 'isb' instruction placed
> right after the read/write instruction which started the faulting
> access. Add custom register accessors which perform the read/write
> followed immediately by 'isb'.
> 
> This way, the fault always happens on the 'isb' and in case of read,
> which is located one instruction before the 'isb', it is now possible
> to fix up the return value of the read in the asynchronous external
> abort hook and make that read return all Fs.
> 
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Krzysztof Wilczyński <kw@linux.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: Rebase on 1/2
> V3: - Add .text.fixup on all three ldr/str/isb instructions and call
>       fixup_exception() in the abort handler to trigger the fixup.
>     - Propagate return value from read/write accessors, in case the
>       access fails, return PCIBIOS_SET_FAILED, else PCIBIOS_SUCCESSFUL.
> ---
>  drivers/pci/controller/pcie-rcar-host.c | 53 +++++++++++++++++++++++--
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
> index 7d38a9c50093..b2e521ee95eb 100644
> --- a/drivers/pci/controller/pcie-rcar-host.c
> +++ b/drivers/pci/controller/pcie-rcar-host.c
> @@ -114,6 +114,51 @@ static u32 rcar_read_conf(struct rcar_pcie *pcie, int where)
>  	return val >> shift;
>  }
>  
> +#ifdef CONFIG_ARM
> +#define __rcar_pci_rw_reg_workaround(instr)				\
> +		"1:	" instr " %1, [%2]\n"				\
> +		"2:	isb\n"						\
> +		"3:	.pushsection .text.fixup,\"ax\"\n"		\
> +		"	.align	2\n"					\
> +		"4:	mov	%0, #" __stringify(PCIBIOS_SET_FAILED) "\n" \
> +		"	b	3b\n"					\
> +		"	.popsection\n"					\
> +		"	.pushsection __ex_table,\"a\"\n"		\
> +		"	.align	3\n"					\
> +		"	.long	1b, 4b\n"				\
> +		"	.long	1b, 4b\n"				\
> +		"	.popsection\n"
> +#endif
> +
> +int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
> +{
> +	int error = PCIBIOS_SUCCESSFUL;
> +#ifdef CONFIG_ARM
> +	asm volatile(
> +		__rcar_pci_rw_reg_workaround("str")
> +	: "+r"(error):"r"(val), "r"(pcie->base + reg) : "memory");
> +#else
> +	rcar_pci_write_reg(pcie, val, reg);
> +#endif
> +	return error;
> +}
> +
> +int rcar_pci_read_reg_workaround(struct rcar_pcie *pcie, u32 *val, unsigned int reg)
> +{
> +	int error = PCIBIOS_SUCCESSFUL;
> +#ifdef CONFIG_ARM
> +	asm volatile(
> +		__rcar_pci_rw_reg_workaround("ldr")
> +	: "+r"(error), "=r"(*val) : "r"(pcie->base + reg) : "memory");
> +
> +	if (error != PCIBIOS_SUCCESSFUL)
> +		*val = 0xffffffff;
> +#else
> +	*val = rcar_pci_read_reg(pcie, reg);
> +#endif
> +	return error;
> +}
> +
>  /* Serialization is provided by 'pci_lock' in drivers/pci/access.c */
>  static int rcar_pcie_config_access(struct rcar_pcie_host *host,
>  		unsigned char access_type, struct pci_bus *bus,
> @@ -185,14 +230,14 @@ static int rcar_pcie_config_access(struct rcar_pcie_host *host,
>  		return PCIBIOS_DEVICE_NOT_FOUND;
>  
>  	if (access_type == RCAR_PCI_ACCESS_READ)
> -		*data = rcar_pci_read_reg(pcie, PCIECDR);
> +		ret = rcar_pci_read_reg_workaround(pcie, data, PCIECDR);
>  	else
> -		rcar_pci_write_reg(pcie, *data, PCIECDR);
> +		ret = rcar_pci_write_reg_workaround(pcie, *data, PCIECDR);
>  
>  	/* Disable the configuration access */
>  	rcar_pci_write_reg(pcie, 0, PCIECCTLR);
>  
> -	return PCIBIOS_SUCCESSFUL;
> +	return ret;
>  }
>  
>  static int rcar_pcie_read_conf(struct pci_bus *bus, unsigned int devfn,
> @@ -1097,7 +1142,7 @@ static struct platform_driver rcar_pcie_driver = {
>  static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
>  		unsigned int fsr, struct pt_regs *regs)
>  {
> -	return !!rcar_pcie_wakeup(pcie_dev, pcie_base);
> +	return !fixup_exception(regs);
>  }
>  
>  static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> -- 
> 2.34.1
>
Bjorn Helgaas Jan. 23, 2022, 3:39 p.m. UTC | #3
On Sat, Jan 22, 2022 at 11:15:54PM +0100, marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> In case the controller is transitioning to L1 in rcar_pcie_config_access(),
> any read/write access to PCIECDR triggers asynchronous external abort. This
> is because the transition to L1 link state must be manually finished by the
> driver. The PCIe IP can transition back from L1 state to L0 on its own.
> 
> The current asynchronous external abort hook implementation restarts
> the instruction which finally triggered the fault, which can be a
> different instruction than the read/write instruction which started
> the faulting access. Usually the instruction which finally triggers
> the fault is one which has some data dependency on the result of the
> read/write. In case of read, the read value after fixup is undefined,
> while a read value of faulting read should be all Fs.
> 
> It is possible to enforce the fault using 'isb' instruction placed
> right after the read/write instruction which started the faulting
> access. Add custom register accessors which perform the read/write
> followed immediately by 'isb'.
> 
> This way, the fault always happens on the 'isb' and in case of read,
> which is located one instruction before the 'isb', it is now possible
> to fix up the return value of the read in the asynchronous external
> abort hook and make that read return all Fs.
> 
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Krzysztof Wilczyński <kw@linux.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: Rebase on 1/2
> V3: - Add .text.fixup on all three ldr/str/isb instructions and call
>       fixup_exception() in the abort handler to trigger the fixup.
>     - Propagate return value from read/write accessors, in case the
>       access fails, return PCIBIOS_SET_FAILED, else PCIBIOS_SUCCESSFUL.
> ---
>  drivers/pci/controller/pcie-rcar-host.c | 53 +++++++++++++++++++++++--
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
> index 7d38a9c50093..b2e521ee95eb 100644
> --- a/drivers/pci/controller/pcie-rcar-host.c
> +++ b/drivers/pci/controller/pcie-rcar-host.c
> @@ -114,6 +114,51 @@ static u32 rcar_read_conf(struct rcar_pcie *pcie, int where)
>  	return val >> shift;
>  }
>  
> +#ifdef CONFIG_ARM
> +#define __rcar_pci_rw_reg_workaround(instr)				\
> +		"1:	" instr " %1, [%2]\n"				\
> +		"2:	isb\n"						\
> +		"3:	.pushsection .text.fixup,\"ax\"\n"		\
> +		"	.align	2\n"					\
> +		"4:	mov	%0, #" __stringify(PCIBIOS_SET_FAILED) "\n" \
> +		"	b	3b\n"					\
> +		"	.popsection\n"					\
> +		"	.pushsection __ex_table,\"a\"\n"		\
> +		"	.align	3\n"					\
> +		"	.long	1b, 4b\n"				\
> +		"	.long	1b, 4b\n"				\
> +		"	.popsection\n"
> +#endif
> +
> +int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
> +{
> +	int error = PCIBIOS_SUCCESSFUL;
> +#ifdef CONFIG_ARM
> +	asm volatile(
> +		__rcar_pci_rw_reg_workaround("str")
> +	: "+r"(error):"r"(val), "r"(pcie->base + reg) : "memory");
> +#else
> +	rcar_pci_write_reg(pcie, val, reg);
> +#endif
> +	return error;
> +}
> +
> +int rcar_pci_read_reg_workaround(struct rcar_pcie *pcie, u32 *val, unsigned int reg)
> +{
> +	int error = PCIBIOS_SUCCESSFUL;
> +#ifdef CONFIG_ARM
> +	asm volatile(
> +		__rcar_pci_rw_reg_workaround("ldr")
> +	: "+r"(error), "=r"(*val) : "r"(pcie->base + reg) : "memory");
> +
> +	if (error != PCIBIOS_SUCCESSFUL)
> +		*val = 0xffffffff;

PCI_SET_ERROR_RESPONSE()?

Maybe also use PCI_ERROR_RESPONSE in the subject and commit log
instead of "all Fs" to make it more greppable.

> +#else
> +	*val = rcar_pci_read_reg(pcie, reg);
> +#endif
> +	return error;
> +}
> +
>  /* Serialization is provided by 'pci_lock' in drivers/pci/access.c */
>  static int rcar_pcie_config_access(struct rcar_pcie_host *host,
>  		unsigned char access_type, struct pci_bus *bus,
> @@ -185,14 +230,14 @@ static int rcar_pcie_config_access(struct rcar_pcie_host *host,
>  		return PCIBIOS_DEVICE_NOT_FOUND;
>  
>  	if (access_type == RCAR_PCI_ACCESS_READ)
> -		*data = rcar_pci_read_reg(pcie, PCIECDR);
> +		ret = rcar_pci_read_reg_workaround(pcie, data, PCIECDR);
>  	else
> -		rcar_pci_write_reg(pcie, *data, PCIECDR);
> +		ret = rcar_pci_write_reg_workaround(pcie, *data, PCIECDR);
>  
>  	/* Disable the configuration access */
>  	rcar_pci_write_reg(pcie, 0, PCIECCTLR);
>  
> -	return PCIBIOS_SUCCESSFUL;
> +	return ret;
>  }
>  
>  static int rcar_pcie_read_conf(struct pci_bus *bus, unsigned int devfn,
> @@ -1097,7 +1142,7 @@ static struct platform_driver rcar_pcie_driver = {
>  static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
>  		unsigned int fsr, struct pt_regs *regs)
>  {
> -	return !!rcar_pcie_wakeup(pcie_dev, pcie_base);
> +	return !fixup_exception(regs);
>  }
>  
>  static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> -- 
> 2.34.1
>
Marek Vasut Jan. 23, 2022, 4:02 p.m. UTC | #4
On 1/23/22 15:12, Arnd Bergmann wrote:

[...]

>> +#ifdef CONFIG_ARM
>> +#define __rcar_pci_rw_reg_workaround(instr)                            \
>> +               "1:     " instr " %1, [%2]\n"                           \
>> +               "2:     isb\n"                                          \
>> +               "3:     .pushsection .text.fixup,\"ax\"\n"              \
>> +               "       .align  2\n"                                    \
>> +               "4:     mov     %0, #" __stringify(PCIBIOS_SET_FAILED) "\n" \
>> +               "       b       3b\n"                                   \
>> +               "       .popsection\n"                                  \
>> +               "       .pushsection __ex_table,\"a\"\n"                \
>> +               "       .align  3\n"                                    \
>> +               "       .long   1b, 4b\n"                               \
>> +               "       .long   1b, 4b\n"                               \
>> +               "       .popsection\n"
>> +#endif
> 
> You list the fixup for the ldr/str instruction here twice, (.long 1b,4b), but
> no fixup for the isb instruction (.long 2b, 4b). Your description says that
> the fault happens on the isb, not the ldr, so I don't understand what is
> going on here.

Copy-paste error when deduplicating the patch content, should be

.long   1b, 4b
.long   2b, 4b

to cover both ldr/str and isb.

I can imagine on CA7, the abort would trigger on the ldr/str already.
Marek Vasut Jan. 23, 2022, 4:31 p.m. UTC | #5
On 1/23/22 16:31, Pali Rohár wrote:
> On Saturday 22 January 2022 23:15:54 marek.vasut@gmail.com wrote:
>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>
>> In case the controller is transitioning to L1 in rcar_pcie_config_access(),
>> any read/write access to PCIECDR triggers asynchronous external abort. This
>> is because the transition to L1 link state must be manually finished by the
>> driver. The PCIe IP can transition back from L1 state to L0 on its own.
> 
> Hello!
> 
> I must admit that this patch from its initial version evolved into giant hack...
> https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
> 
> During review of the previous patch I have asked some important
> questions but I have not got any answer to them. So I'm reminding it:
> https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> 
> So could please answer what happens when PCIe controller is in some
> non-L* state and either MMIO happen or config read happens or config
> write happens?

What kind of non-L state ?

Do you have some specific test which fails ?

This patch addresses the case where the link transition to L1 state has 
to be completed manually. If the CPU accesses the config space before 
that happened, you get an imprecise data abort.

> It is really important to know this fact.
> 
> I'm in impression that this patch still is not enough as similar issues
> are also in other PCIe controllers which I know...

Do you have a suggestion for a patch which would be enough on this 
hardware ?
Pali Rohár Jan. 23, 2022, 4:49 p.m. UTC | #6
On Sunday 23 January 2022 17:31:28 Marek Vasut wrote:
> On 1/23/22 16:31, Pali Rohár wrote:
> > On Saturday 22 January 2022 23:15:54 marek.vasut@gmail.com wrote:
> > > From: Marek Vasut <marek.vasut+renesas@gmail.com>
> > > 
> > > In case the controller is transitioning to L1 in rcar_pcie_config_access(),
> > > any read/write access to PCIECDR triggers asynchronous external abort. This
> > > is because the transition to L1 link state must be manually finished by the
> > > driver. The PCIe IP can transition back from L1 state to L0 on its own.
> > 
> > Hello!
> > 
> > I must admit that this patch from its initial version evolved into giant hack...
> > https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
> > 
> > During review of the previous patch I have asked some important
> > questions but I have not got any answer to them. So I'm reminding it:
> > https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> > 
> > So could please answer what happens when PCIe controller is in some
> > non-L* state and either MMIO happen or config read happens or config
> > write happens?
> 
> What kind of non-L state ?

E.g. Hot Reset, Detect, Polling, Configuration or Recovery.

> Do you have some specific test which fails ?

Yes, by putting PCIe controller into one of those states. I have already
wrote you in some previous email to trigger hot reset as this is the
easiest test and can be done also by userspace (setpci).

Link goes to Recovery state automatically when doing link retraining
(e.g. by setting RT bit in PCIe Root Port config space) and from
Recovery to Configuration or directly back to L0. So testing this path
needs precise timing and repeating it more times to trigger.

So the easiest test is really via PCIe Hot Reset by setting Secondary
Bus Reset bit in Bridge Control register of PCIe Root Port. After this
is link in Hot Reset and does not go back to L0 until you clear that
bit. So in this state you can do all these operations which cause
aborts, like calling that kernel function which is reading from config
space which belongs to device on the other end of the PCIe link or doing
MMIO read / write operation of mapped memory which again belongs to
other end of PCIe link.

Or instead of Hot Reset, you can set link disable bit in config space of
PCIe Root Port. Then link also would not be in L0 state (until you clear
that bit), so again you have lot of time to do same tests.

> This patch addresses the case where the link transition to L1 state has to
> be completed manually. If the CPU accesses the config space before that
> happened, you get an imprecise data abort.

Yes, I see. But it does not have to complete and the question how is
handled this case... And that is why is needed to know what happens in
such cases.

And IIRC you cannot go from L1 state directly to L0, but only via
Recovery state. And from Recovery you may end up in Detect state.
(e.g. after hot unplug or if some buggy card with kernel quirk is used)

> > It is really important to know this fact.
> > 
> > I'm in impression that this patch still is not enough as similar issues
> > are also in other PCIe controllers which I know...
> 
> Do you have a suggestion for a patch which would be enough on this hardware
> ?

I do not have enough information.
Marek Vasut Jan. 24, 2022, 5:46 a.m. UTC | #7
On 1/23/22 17:49, Pali Rohár wrote:

Hi,

[...]

>>> I must admit that this patch from its initial version evolved into giant hack...
>>> https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
>>>
>>> During review of the previous patch I have asked some important
>>> questions but I have not got any answer to them. So I'm reminding it:
>>> https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
>>>
>>> So could please answer what happens when PCIe controller is in some
>>> non-L* state and either MMIO happen or config read happens or config
>>> write happens?
>>
>> What kind of non-L state ?
> 
> E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
> 
>> Do you have some specific test which fails ?
> 
> Yes, by putting PCIe controller into one of those states. I have already
> wrote you in some previous email to trigger hot reset as this is the
> easiest test and can be done also by userspace (setpci).
> 
> Link goes to Recovery state automatically when doing link retraining
> (e.g. by setting RT bit in PCIe Root Port config space) and from
> Recovery to Configuration or directly back to L0. So testing this path
> needs precise timing and repeating it more times to trigger.
> 
> So the easiest test is really via PCIe Hot Reset by setting Secondary
> Bus Reset bit in Bridge Control register of PCIe Root Port. After this
> is link in Hot Reset and does not go back to L0 until you clear that
> bit. So in this state you can do all these operations which cause
> aborts, like calling that kernel function which is reading from config
> space which belongs to device on the other end of the PCIe link or doing
> MMIO read / write operation of mapped memory which again belongs to
> other end of PCIe link.
> 
> Or instead of Hot Reset, you can set link disable bit in config space of
> PCIe Root Port. Then link also would not be in L0 state (until you clear
> that bit), so again you have lot of time to do same tests.

Can you give me the exact setpci invocation ? If so, then I can test 
this for you on the hardware.

>> This patch addresses the case where the link transition to L1 state has to
>> be completed manually. If the CPU accesses the config space before that
>> happened, you get an imprecise data abort.
> 
> Yes, I see. But it does not have to complete and the question how is
> handled this case... And that is why is needed to know what happens in
> such cases.
> 
> And IIRC you cannot go from L1 state directly to L0, but only via
> Recovery state. And from Recovery you may end up in Detect state.
> (e.g. after hot unplug or if some buggy card with kernel quirk is used)
> 
>>> It is really important to know this fact.
>>>
>>> I'm in impression that this patch still is not enough as similar issues
>>> are also in other PCIe controllers which I know...
>>
>> Do you have a suggestion for a patch which would be enough on this hardware
>> ?
> 
> I do not have enough information.

I see
Pali Rohár Jan. 24, 2022, 9:37 a.m. UTC | #8
On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
> On 1/23/22 17:49, Pali Rohár wrote:
> 
> Hi,
> 
> [...]
> 
> > > > I must admit that this patch from its initial version evolved into giant hack...
> > > > https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
> > > > 
> > > > During review of the previous patch I have asked some important
> > > > questions but I have not got any answer to them. So I'm reminding it:
> > > > https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> > > > 
> > > > So could please answer what happens when PCIe controller is in some
> > > > non-L* state and either MMIO happen or config read happens or config
> > > > write happens?
> > > 
> > > What kind of non-L state ?
> > 
> > E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
> > 
> > > Do you have some specific test which fails ?
> > 
> > Yes, by putting PCIe controller into one of those states. I have already
> > wrote you in some previous email to trigger hot reset as this is the
> > easiest test and can be done also by userspace (setpci).
> > 
> > Link goes to Recovery state automatically when doing link retraining
> > (e.g. by setting RT bit in PCIe Root Port config space) and from
> > Recovery to Configuration or directly back to L0. So testing this path
> > needs precise timing and repeating it more times to trigger.
> > 
> > So the easiest test is really via PCIe Hot Reset by setting Secondary
> > Bus Reset bit in Bridge Control register of PCIe Root Port. After this
> > is link in Hot Reset and does not go back to L0 until you clear that
> > bit. So in this state you can do all these operations which cause
> > aborts, like calling that kernel function which is reading from config
> > space which belongs to device on the other end of the PCIe link or doing
> > MMIO read / write operation of mapped memory which again belongs to
> > other end of PCIe link.
> > 
> > Or instead of Hot Reset, you can set link disable bit in config space of
> > PCIe Root Port. Then link also would not be in L0 state (until you clear
> > that bit), so again you have lot of time to do same tests.
> 
> Can you give me the exact setpci invocation ? If so, then I can test this
> for you on the hardware.

Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
Root Port device (parent of selected device). This will print value of
bridge control register. Logical OR it with value 0x20 (Secondary Bus
Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
After this call is link in the Hot Reset state and you can do any test.
To bring link back, call setpci again with cleared 0x20 bit mask.

Similar test you can done also with setting Link Disable bit (bit 4) in
PCIe Link Control register. Offset to this register is not static and
you can figure it out from lspci -s $bdf_root_port -vv output.
Retrain Link is bit 5 in the same register.
kernel test robot Jan. 28, 2022, 2:47 a.m. UTC | #9
Hi,

I love your patch! Yet something to improve:

[auto build test ERROR on helgaas-pci/next]
[also build test ERROR on v5.17-rc1 next-20220127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/marek-vasut-gmail-com/PCI-rcar-Finish-transition-to-L1-state-in-rcar_pcie_config_access/20220123-061742
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: arm-buildonly-randconfig-r002-20220127 (https://download.01.org/0day-ci/archive/20220128/202201281037.40i8elct-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project f400a6012c668dfaa73462caf067ceb074e66c47)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/1eeb0cd756244c956c8b0fa809f2d4e5bee530b4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review marek-vasut-gmail-com/PCI-rcar-Finish-transition-to-L1-state-in-rcar_pcie_config_access/20220123-061742
        git checkout 1eeb0cd756244c956c8b0fa809f2d4e5bee530b4
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/pci/controller/pcie-rcar-host.c:133:5: warning: no previous prototype for function 'rcar_pci_write_reg_workaround' [-Wmissing-prototypes]
   int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
       ^
   drivers/pci/controller/pcie-rcar-host.c:133:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
   ^
   static 
   drivers/pci/controller/pcie-rcar-host.c:146:5: warning: no previous prototype for function 'rcar_pci_read_reg_workaround' [-Wmissing-prototypes]
   int rcar_pci_read_reg_workaround(struct rcar_pcie *pcie, u32 *val, unsigned int reg)
       ^
   drivers/pci/controller/pcie-rcar-host.c:146:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int rcar_pci_read_reg_workaround(struct rcar_pcie *pcie, u32 *val, unsigned int reg)
   ^
   static 
>> drivers/pci/controller/pcie-rcar-host.c:138:3: error: instruction requires: data-barriers
                   __rcar_pci_rw_reg_workaround("str")
                   ^
   drivers/pci/controller/pcie-rcar-host.c:120:4: note: expanded from macro '__rcar_pci_rw_reg_workaround'
                   "2:     isb\n"                                          \
                    ^
   <inline asm>:2:4: note: instantiated into assembly here
   2:      isb
           ^
   drivers/pci/controller/pcie-rcar-host.c:151:3: error: instruction requires: data-barriers
                   __rcar_pci_rw_reg_workaround("ldr")
                   ^
   drivers/pci/controller/pcie-rcar-host.c:120:4: note: expanded from macro '__rcar_pci_rw_reg_workaround'
                   "2:     isb\n"                                          \
                    ^
   <inline asm>:2:4: note: instantiated into assembly here
   2:      isb
           ^
>> drivers/pci/controller/pcie-rcar-host.c:138:3: error: instruction requires: data-barriers
                   __rcar_pci_rw_reg_workaround("str")
                   ^
   drivers/pci/controller/pcie-rcar-host.c:120:4: note: expanded from macro '__rcar_pci_rw_reg_workaround'
                   "2:     isb\n"                                          \
                    ^
   <inline asm>:2:4: note: instantiated into assembly here
   2:      isb
           ^
   drivers/pci/controller/pcie-rcar-host.c:151:3: error: instruction requires: data-barriers
                   __rcar_pci_rw_reg_workaround("ldr")
                   ^
   drivers/pci/controller/pcie-rcar-host.c:120:4: note: expanded from macro '__rcar_pci_rw_reg_workaround'
                   "2:     isb\n"                                          \
                    ^
   <inline asm>:2:4: note: instantiated into assembly here
   2:      isb
           ^
   2 warnings and 4 errors generated.


vim +138 drivers/pci/controller/pcie-rcar-host.c

   132	
   133	int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
   134	{
   135		int error = PCIBIOS_SUCCESSFUL;
   136	#ifdef CONFIG_ARM
   137		asm volatile(
 > 138			__rcar_pci_rw_reg_workaround("str")
   139		: "+r"(error):"r"(val), "r"(pcie->base + reg) : "memory");
   140	#else
   141		rcar_pci_write_reg(pcie, val, reg);
   142	#endif
   143		return error;
   144	}
   145	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Marek Vasut Jan. 29, 2022, 4:39 a.m. UTC | #10
On 1/24/22 10:37, Pali Rohár wrote:
> On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
>> On 1/23/22 17:49, Pali Rohár wrote:
>>
>> Hi,
>>
>> [...]
>>
>>>>> I must admit that this patch from its initial version evolved into giant hack...
>>>>> https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
>>>>>
>>>>> During review of the previous patch I have asked some important
>>>>> questions but I have not got any answer to them. So I'm reminding it:
>>>>> https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
>>>>>
>>>>> So could please answer what happens when PCIe controller is in some
>>>>> non-L* state and either MMIO happen or config read happens or config
>>>>> write happens?
>>>>
>>>> What kind of non-L state ?
>>>
>>> E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
>>>
>>>> Do you have some specific test which fails ?
>>>
>>> Yes, by putting PCIe controller into one of those states. I have already
>>> wrote you in some previous email to trigger hot reset as this is the
>>> easiest test and can be done also by userspace (setpci).
>>>
>>> Link goes to Recovery state automatically when doing link retraining
>>> (e.g. by setting RT bit in PCIe Root Port config space) and from
>>> Recovery to Configuration or directly back to L0. So testing this path
>>> needs precise timing and repeating it more times to trigger.
>>>
>>> So the easiest test is really via PCIe Hot Reset by setting Secondary
>>> Bus Reset bit in Bridge Control register of PCIe Root Port. After this
>>> is link in Hot Reset and does not go back to L0 until you clear that
>>> bit. So in this state you can do all these operations which cause
>>> aborts, like calling that kernel function which is reading from config
>>> space which belongs to device on the other end of the PCIe link or doing
>>> MMIO read / write operation of mapped memory which again belongs to
>>> other end of PCIe link.
>>>
>>> Or instead of Hot Reset, you can set link disable bit in config space of
>>> PCIe Root Port. Then link also would not be in L0 state (until you clear
>>> that bit), so again you have lot of time to do same tests.
>>
>> Can you give me the exact setpci invocation ? If so, then I can test this
>> for you on the hardware.
> 
> Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
> Root Port device (parent of selected device). This will print value of
> bridge control register. Logical OR it with value 0x20 (Secondary Bus
> Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
> After this call is link in the Hot Reset state and you can do any test.
> To bring link back, call setpci again with cleared 0x20 bit mask.
> 
> Similar test you can done also with setting Link Disable bit (bit 4) in
> PCIe Link Control register. Offset to this register is not static and
> you can figure it out from lspci -s $bdf_root_port -vv output.
> Retrain Link is bit 5 in the same register.

Flipping either bit makes no difference, suspend/resume behaves the same 
and the link always recovers.
Pali Rohár Jan. 31, 2022, 12:53 p.m. UTC | #11
On Saturday 29 January 2022 05:39:40 Marek Vasut wrote:
> On 1/24/22 10:37, Pali Rohár wrote:
> > On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
> > > On 1/23/22 17:49, Pali Rohár wrote:
> > > 
> > > Hi,
> > > 
> > > [...]
> > > 
> > > > > > I must admit that this patch from its initial version evolved into giant hack...
> > > > > > https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
> > > > > > 
> > > > > > During review of the previous patch I have asked some important
> > > > > > questions but I have not got any answer to them. So I'm reminding it:
> > > > > > https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> > > > > > 
> > > > > > So could please answer what happens when PCIe controller is in some
> > > > > > non-L* state and either MMIO happen or config read happens or config
> > > > > > write happens?
> > > > > 
> > > > > What kind of non-L state ?
> > > > 
> > > > E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
> > > > 
> > > > > Do you have some specific test which fails ?
> > > > 
> > > > Yes, by putting PCIe controller into one of those states. I have already
> > > > wrote you in some previous email to trigger hot reset as this is the
> > > > easiest test and can be done also by userspace (setpci).
> > > > 
> > > > Link goes to Recovery state automatically when doing link retraining
> > > > (e.g. by setting RT bit in PCIe Root Port config space) and from
> > > > Recovery to Configuration or directly back to L0. So testing this path
> > > > needs precise timing and repeating it more times to trigger.
> > > > 
> > > > So the easiest test is really via PCIe Hot Reset by setting Secondary
> > > > Bus Reset bit in Bridge Control register of PCIe Root Port. After this
> > > > is link in Hot Reset and does not go back to L0 until you clear that
> > > > bit. So in this state you can do all these operations which cause
> > > > aborts, like calling that kernel function which is reading from config
> > > > space which belongs to device on the other end of the PCIe link or doing
> > > > MMIO read / write operation of mapped memory which again belongs to
> > > > other end of PCIe link.
> > > > 
> > > > Or instead of Hot Reset, you can set link disable bit in config space of
> > > > PCIe Root Port. Then link also would not be in L0 state (until you clear
> > > > that bit), so again you have lot of time to do same tests.
> > > 
> > > Can you give me the exact setpci invocation ? If so, then I can test this
> > > for you on the hardware.
> > 
> > Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
> > Root Port device (parent of selected device). This will print value of
> > bridge control register. Logical OR it with value 0x20 (Secondary Bus
> > Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
> > After this call is link in the Hot Reset state and you can do any test.
> > To bring link back, call setpci again with cleared 0x20 bit mask.
> > 
> > Similar test you can done also with setting Link Disable bit (bit 4) in
> > PCIe Link Control register. Offset to this register is not static and
> > you can figure it out from lspci -s $bdf_root_port -vv output.
> > Retrain Link is bit 5 in the same register.
> 
> Flipping either bit makes no difference, suspend/resume behaves the same and
> the link always recovers.

Ok, perfect! And what happens without suspend/resume (just in normal
conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).
Pali Rohár Feb. 16, 2022, 11:50 a.m. UTC | #12
On Saturday 22 January 2022 23:15:54 marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> In case the controller is transitioning to L1 in rcar_pcie_config_access(),
> any read/write access to PCIECDR triggers asynchronous external abort. This
> is because the transition to L1 link state must be manually finished by the
> driver. The PCIe IP can transition back from L1 state to L0 on its own.
> 
> The current asynchronous external abort hook implementation restarts
> the instruction which finally triggered the fault, which can be a
> different instruction than the read/write instruction which started
> the faulting access. Usually the instruction which finally triggers
> the fault is one which has some data dependency on the result of the
> read/write. In case of read, the read value after fixup is undefined,
> while a read value of faulting read should be all Fs.
> 
> It is possible to enforce the fault using 'isb' instruction placed
> right after the read/write instruction which started the faulting
> access. Add custom register accessors which perform the read/write
> followed immediately by 'isb'.
> 
> This way, the fault always happens on the 'isb' and in case of read,
> which is located one instruction before the 'isb', it is now possible
> to fix up the return value of the read in the asynchronous external
> abort hook and make that read return all Fs.

I'm looking at this again and I do not think that this is reliable.
Asynchronous aborts are by definition asynchronous. Placing isb looks
like a hack to decrease probability that asynchronous abort would be
triggered at wrong time.

Marek: Cannot you change the code to trigger proper synchronous abort
for this operation? If this is ARMv7 system, what about trying to change
memory mapping where is the accessing address to strongly-ordered?
Writing to strongly-ordered ARMv7 mapping could not report asynchronous
aborts anymore, but I'm not sure.

Marek: Are you sure that also ldr instruction is causing asynchronous
abort? This is strange as normally load from memory mapped config space
could not finish earlier than data from config space are fetched.
Normally these load instructions should cause synchronous abort on data
errors.

Bjorn, Lorenzo, Krzysztof: These kind of aborts are common for lot of
ARM SoCs and I think that some generic/core PCI code would be useful.
I talked with more people and these aborts really affects more drivers,
just mainline kernel does not have fixes/workarounds. Common suggestions
(as workarounds) in vendors kernels are to disable ASPM or other PCIe
feature (which has probability to trigger this error) or stop using any
PCIe switches, etc... This is really ridiculous but everything comes to
the issue that some device in the PCIe topology sends UR or CA response,
or that PCIe controller itself send this response. Lot of PCIe
controllers on ARM SoCs converts PCIe UR or CA as fatal aborts to CPU
(AXI slave error) and crash is there. Note that at least during PCIe
device enumeration is UR or CA fully valid response which should never
cause CPU/SoC to crash.

Lorenzo: Do you have an idea if it is possible to completely mute AXI
slave errors which comes from PCIe controllers?

> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Krzysztof Wilczyński <kw@linux.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: Rebase on 1/2
> V3: - Add .text.fixup on all three ldr/str/isb instructions and call
>       fixup_exception() in the abort handler to trigger the fixup.
>     - Propagate return value from read/write accessors, in case the
>       access fails, return PCIBIOS_SET_FAILED, else PCIBIOS_SUCCESSFUL.
> ---
>  drivers/pci/controller/pcie-rcar-host.c | 53 +++++++++++++++++++++++--
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
> index 7d38a9c50093..b2e521ee95eb 100644
> --- a/drivers/pci/controller/pcie-rcar-host.c
> +++ b/drivers/pci/controller/pcie-rcar-host.c
> @@ -114,6 +114,51 @@ static u32 rcar_read_conf(struct rcar_pcie *pcie, int where)
>  	return val >> shift;
>  }
>  
> +#ifdef CONFIG_ARM
> +#define __rcar_pci_rw_reg_workaround(instr)				\
> +		"1:	" instr " %1, [%2]\n"				\
> +		"2:	isb\n"						\
> +		"3:	.pushsection .text.fixup,\"ax\"\n"		\
> +		"	.align	2\n"					\
> +		"4:	mov	%0, #" __stringify(PCIBIOS_SET_FAILED) "\n" \
> +		"	b	3b\n"					\
> +		"	.popsection\n"					\
> +		"	.pushsection __ex_table,\"a\"\n"		\
> +		"	.align	3\n"					\
> +		"	.long	1b, 4b\n"				\
> +		"	.long	1b, 4b\n"				\
> +		"	.popsection\n"
> +#endif
> +
> +int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
> +{
> +	int error = PCIBIOS_SUCCESSFUL;
> +#ifdef CONFIG_ARM
> +	asm volatile(
> +		__rcar_pci_rw_reg_workaround("str")
> +	: "+r"(error):"r"(val), "r"(pcie->base + reg) : "memory");
> +#else
> +	rcar_pci_write_reg(pcie, val, reg);
> +#endif
> +	return error;
> +}
> +
> +int rcar_pci_read_reg_workaround(struct rcar_pcie *pcie, u32 *val, unsigned int reg)
> +{
> +	int error = PCIBIOS_SUCCESSFUL;
> +#ifdef CONFIG_ARM
> +	asm volatile(
> +		__rcar_pci_rw_reg_workaround("ldr")
> +	: "+r"(error), "=r"(*val) : "r"(pcie->base + reg) : "memory");
> +
> +	if (error != PCIBIOS_SUCCESSFUL)
> +		*val = 0xffffffff;
> +#else
> +	*val = rcar_pci_read_reg(pcie, reg);
> +#endif
> +	return error;
> +}
> +
>  /* Serialization is provided by 'pci_lock' in drivers/pci/access.c */
>  static int rcar_pcie_config_access(struct rcar_pcie_host *host,
>  		unsigned char access_type, struct pci_bus *bus,
> @@ -185,14 +230,14 @@ static int rcar_pcie_config_access(struct rcar_pcie_host *host,
>  		return PCIBIOS_DEVICE_NOT_FOUND;
>  
>  	if (access_type == RCAR_PCI_ACCESS_READ)
> -		*data = rcar_pci_read_reg(pcie, PCIECDR);
> +		ret = rcar_pci_read_reg_workaround(pcie, data, PCIECDR);
>  	else
> -		rcar_pci_write_reg(pcie, *data, PCIECDR);
> +		ret = rcar_pci_write_reg_workaround(pcie, *data, PCIECDR);
>  
>  	/* Disable the configuration access */
>  	rcar_pci_write_reg(pcie, 0, PCIECCTLR);
>  
> -	return PCIBIOS_SUCCESSFUL;
> +	return ret;
>  }
>  
>  static int rcar_pcie_read_conf(struct pci_bus *bus, unsigned int devfn,
> @@ -1097,7 +1142,7 @@ static struct platform_driver rcar_pcie_driver = {
>  static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
>  		unsigned int fsr, struct pt_regs *regs)
>  {
> -	return !!rcar_pcie_wakeup(pcie_dev, pcie_base);
> +	return !fixup_exception(regs);
>  }
>  
>  static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> -- 
> 2.34.1
>
Marek Vasut Feb. 17, 2022, 3:24 a.m. UTC | #13
On 2/16/22 12:50, Pali Rohár wrote:

Hi,

>> In case the controller is transitioning to L1 in rcar_pcie_config_access(),
>> any read/write access to PCIECDR triggers asynchronous external abort. This
>> is because the transition to L1 link state must be manually finished by the
>> driver. The PCIe IP can transition back from L1 state to L0 on its own.
>>
>> The current asynchronous external abort hook implementation restarts
>> the instruction which finally triggered the fault, which can be a
>> different instruction than the read/write instruction which started
>> the faulting access. Usually the instruction which finally triggers
>> the fault is one which has some data dependency on the result of the
>> read/write. In case of read, the read value after fixup is undefined,
>> while a read value of faulting read should be all Fs.
>>
>> It is possible to enforce the fault using 'isb' instruction placed
>> right after the read/write instruction which started the faulting
>> access. Add custom register accessors which perform the read/write
>> followed immediately by 'isb'.
>>
>> This way, the fault always happens on the 'isb' and in case of read,
>> which is located one instruction before the 'isb', it is now possible
>> to fix up the return value of the read in the asynchronous external
>> abort hook and make that read return all Fs.
> 
> I'm looking at this again and I do not think that this is reliable.

Are you still running into any problems on your hardware with these 
patches applied ?

> Asynchronous aborts are by definition asynchronous. Placing isb looks
> like a hack to decrease probability that asynchronous abort would be
> triggered at wrong time.

That is exactly what this patch fixes, the ISB enforces the async 
exception at the right moment so it can be fixed up in the fixup handler 
(thanks Arnd).

> Marek: Cannot you change the code to trigger proper synchronous abort
> for this operation? If this is ARMv7 system, what about trying to change
> memory mapping where is the accessing address to strongly-ordered?
> Writing to strongly-ordered ARMv7 mapping could not report asynchronous
> aborts anymore, but I'm not sure.

No, last time I tried tweaking the mapping, that didn't lead to sync aborts.

> Marek: Are you sure that also ldr instruction is causing asynchronous
> abort? This is strange as normally load from memory mapped config space
> could not finish earlier than data from config space are fetched.
> Normally these load instructions should cause synchronous abort on data
> errors.

I am positive LDR triggers async abort on this hardware, since that's 
how this problem was triggered by Geert.

[...]
Pali Rohár Feb. 17, 2022, 11:29 a.m. UTC | #14
On Monday 31 January 2022 13:53:41 Pali Rohár wrote:
> On Saturday 29 January 2022 05:39:40 Marek Vasut wrote:
> > On 1/24/22 10:37, Pali Rohár wrote:
> > > On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
> > > > On 1/23/22 17:49, Pali Rohár wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > [...]
> > > > 
> > > > > > > I must admit that this patch from its initial version evolved into giant hack...
> > > > > > > https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
> > > > > > > 
> > > > > > > During review of the previous patch I have asked some important
> > > > > > > questions but I have not got any answer to them. So I'm reminding it:
> > > > > > > https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> > > > > > > 
> > > > > > > So could please answer what happens when PCIe controller is in some
> > > > > > > non-L* state and either MMIO happen or config read happens or config
> > > > > > > write happens?
> > > > > > 
> > > > > > What kind of non-L state ?
> > > > > 
> > > > > E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
> > > > > 
> > > > > > Do you have some specific test which fails ?
> > > > > 
> > > > > Yes, by putting PCIe controller into one of those states. I have already
> > > > > wrote you in some previous email to trigger hot reset as this is the
> > > > > easiest test and can be done also by userspace (setpci).
> > > > > 
> > > > > Link goes to Recovery state automatically when doing link retraining
> > > > > (e.g. by setting RT bit in PCIe Root Port config space) and from
> > > > > Recovery to Configuration or directly back to L0. So testing this path
> > > > > needs precise timing and repeating it more times to trigger.
> > > > > 
> > > > > So the easiest test is really via PCIe Hot Reset by setting Secondary
> > > > > Bus Reset bit in Bridge Control register of PCIe Root Port. After this
> > > > > is link in Hot Reset and does not go back to L0 until you clear that
> > > > > bit. So in this state you can do all these operations which cause
> > > > > aborts, like calling that kernel function which is reading from config
> > > > > space which belongs to device on the other end of the PCIe link or doing
> > > > > MMIO read / write operation of mapped memory which again belongs to
> > > > > other end of PCIe link.
> > > > > 
> > > > > Or instead of Hot Reset, you can set link disable bit in config space of
> > > > > PCIe Root Port. Then link also would not be in L0 state (until you clear
> > > > > that bit), so again you have lot of time to do same tests.
> > > > 
> > > > Can you give me the exact setpci invocation ? If so, then I can test this
> > > > for you on the hardware.
> > > 
> > > Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
> > > Root Port device (parent of selected device). This will print value of
> > > bridge control register. Logical OR it with value 0x20 (Secondary Bus
> > > Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
> > > After this call is link in the Hot Reset state and you can do any test.
> > > To bring link back, call setpci again with cleared 0x20 bit mask.
> > > 
> > > Similar test you can done also with setting Link Disable bit (bit 4) in
> > > PCIe Link Control register. Offset to this register is not static and
> > > you can figure it out from lspci -s $bdf_root_port -vv output.
> > > Retrain Link is bit 5 in the same register.
> > 
> > Flipping either bit makes no difference, suspend/resume behaves the same and
> > the link always recovers.
> 
> Ok, perfect! And what happens without suspend/resume (just in normal
> conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).

PING? Also what lspci see for the root port and card itself during hot reset?
Marek Vasut Feb. 17, 2022, 12:59 p.m. UTC | #15
On 2/17/22 12:29, Pali Rohár wrote:
> On Monday 31 January 2022 13:53:41 Pali Rohár wrote:
>> On Saturday 29 January 2022 05:39:40 Marek Vasut wrote:
>>> On 1/24/22 10:37, Pali Rohár wrote:
>>>> On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
>>>>> On 1/23/22 17:49, Pali Rohár wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> [...]
>>>>>
>>>>>>>> I must admit that this patch from its initial version evolved into giant hack...
>>>>>>>> https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
>>>>>>>>
>>>>>>>> During review of the previous patch I have asked some important
>>>>>>>> questions but I have not got any answer to them. So I'm reminding it:
>>>>>>>> https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
>>>>>>>>
>>>>>>>> So could please answer what happens when PCIe controller is in some
>>>>>>>> non-L* state and either MMIO happen or config read happens or config
>>>>>>>> write happens?
>>>>>>>
>>>>>>> What kind of non-L state ?
>>>>>>
>>>>>> E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
>>>>>>
>>>>>>> Do you have some specific test which fails ?
>>>>>>
>>>>>> Yes, by putting PCIe controller into one of those states. I have already
>>>>>> wrote you in some previous email to trigger hot reset as this is the
>>>>>> easiest test and can be done also by userspace (setpci).
>>>>>>
>>>>>> Link goes to Recovery state automatically when doing link retraining
>>>>>> (e.g. by setting RT bit in PCIe Root Port config space) and from
>>>>>> Recovery to Configuration or directly back to L0. So testing this path
>>>>>> needs precise timing and repeating it more times to trigger.
>>>>>>
>>>>>> So the easiest test is really via PCIe Hot Reset by setting Secondary
>>>>>> Bus Reset bit in Bridge Control register of PCIe Root Port. After this
>>>>>> is link in Hot Reset and does not go back to L0 until you clear that
>>>>>> bit. So in this state you can do all these operations which cause
>>>>>> aborts, like calling that kernel function which is reading from config
>>>>>> space which belongs to device on the other end of the PCIe link or doing
>>>>>> MMIO read / write operation of mapped memory which again belongs to
>>>>>> other end of PCIe link.
>>>>>>
>>>>>> Or instead of Hot Reset, you can set link disable bit in config space of
>>>>>> PCIe Root Port. Then link also would not be in L0 state (until you clear
>>>>>> that bit), so again you have lot of time to do same tests.
>>>>>
>>>>> Can you give me the exact setpci invocation ? If so, then I can test this
>>>>> for you on the hardware.
>>>>
>>>> Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
>>>> Root Port device (parent of selected device). This will print value of
>>>> bridge control register. Logical OR it with value 0x20 (Secondary Bus
>>>> Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
>>>> After this call is link in the Hot Reset state and you can do any test.
>>>> To bring link back, call setpci again with cleared 0x20 bit mask.
>>>>
>>>> Similar test you can done also with setting Link Disable bit (bit 4) in
>>>> PCIe Link Control register. Offset to this register is not static and
>>>> you can figure it out from lspci -s $bdf_root_port -vv output.
>>>> Retrain Link is bit 5 in the same register.
>>>
>>> Flipping either bit makes no difference, suspend/resume behaves the same and
>>> the link always recovers.
>>
>> Ok, perfect! And what happens without suspend/resume (just in normal
>> conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).
> 
> PING? Also what lspci see for the root port and card itself during hot reset?

If I recall, lspci showed the root port and card.
Pali Rohár Feb. 17, 2022, 1:04 p.m. UTC | #16
On Thursday 17 February 2022 13:59:38 Marek Vasut wrote:
> On 2/17/22 12:29, Pali Rohár wrote:
> > On Monday 31 January 2022 13:53:41 Pali Rohár wrote:
> > > On Saturday 29 January 2022 05:39:40 Marek Vasut wrote:
> > > > On 1/24/22 10:37, Pali Rohár wrote:
> > > > > On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
> > > > > > On 1/23/22 17:49, Pali Rohár wrote:
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > > > I must admit that this patch from its initial version evolved into giant hack...
> > > > > > > > > https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@gmail.com/
> > > > > > > > > 
> > > > > > > > > During review of the previous patch I have asked some important
> > > > > > > > > questions but I have not got any answer to them. So I'm reminding it:
> > > > > > > > > https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> > > > > > > > > 
> > > > > > > > > So could please answer what happens when PCIe controller is in some
> > > > > > > > > non-L* state and either MMIO happen or config read happens or config
> > > > > > > > > write happens?
> > > > > > > > 
> > > > > > > > What kind of non-L state ?
> > > > > > > 
> > > > > > > E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
> > > > > > > 
> > > > > > > > Do you have some specific test which fails ?
> > > > > > > 
> > > > > > > Yes, by putting PCIe controller into one of those states. I have already
> > > > > > > wrote you in some previous email to trigger hot reset as this is the
> > > > > > > easiest test and can be done also by userspace (setpci).
> > > > > > > 
> > > > > > > Link goes to Recovery state automatically when doing link retraining
> > > > > > > (e.g. by setting RT bit in PCIe Root Port config space) and from
> > > > > > > Recovery to Configuration or directly back to L0. So testing this path
> > > > > > > needs precise timing and repeating it more times to trigger.
> > > > > > > 
> > > > > > > So the easiest test is really via PCIe Hot Reset by setting Secondary
> > > > > > > Bus Reset bit in Bridge Control register of PCIe Root Port. After this
> > > > > > > is link in Hot Reset and does not go back to L0 until you clear that
> > > > > > > bit. So in this state you can do all these operations which cause
> > > > > > > aborts, like calling that kernel function which is reading from config
> > > > > > > space which belongs to device on the other end of the PCIe link or doing
> > > > > > > MMIO read / write operation of mapped memory which again belongs to
> > > > > > > other end of PCIe link.
> > > > > > > 
> > > > > > > Or instead of Hot Reset, you can set link disable bit in config space of
> > > > > > > PCIe Root Port. Then link also would not be in L0 state (until you clear
> > > > > > > that bit), so again you have lot of time to do same tests.
> > > > > > 
> > > > > > Can you give me the exact setpci invocation ? If so, then I can test this
> > > > > > for you on the hardware.
> > > > > 
> > > > > Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
> > > > > Root Port device (parent of selected device). This will print value of
> > > > > bridge control register. Logical OR it with value 0x20 (Secondary Bus
> > > > > Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
> > > > > After this call is link in the Hot Reset state and you can do any test.
> > > > > To bring link back, call setpci again with cleared 0x20 bit mask.
> > > > > 
> > > > > Similar test you can done also with setting Link Disable bit (bit 4) in
> > > > > PCIe Link Control register. Offset to this register is not static and
> > > > > you can figure it out from lspci -s $bdf_root_port -vv output.
> > > > > Retrain Link is bit 5 in the same register.
> > > > 
> > > > Flipping either bit makes no difference, suspend/resume behaves the same and
> > > > the link always recovers.
> > > 
> > > Ok, perfect! And what happens without suspend/resume (just in normal
> > > conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).
> > 
> > PING? Also what lspci see for the root port and card itself during hot reset?
> 
> If I recall, lspci showed the root port and card.

This is suspicious. Card should not respond to config read requests when
is in hot reset state. Could you send output of lspci -vvxx of the root
port and also card during this test? Maybe it is possible that root port
has broken BRIDGE_CONTROL register and did not put card into Hot Reset
state?
Marek Vasut Feb. 18, 2022, 1:53 a.m. UTC | #17
On 2/17/22 14:04, Pali Rohár wrote:

[...]

>>>>> Flipping either bit makes no difference, suspend/resume behaves the same and
>>>>> the link always recovers.
>>>>
>>>> Ok, perfect! And what happens without suspend/resume (just in normal
>>>> conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).
>>>
>>> PING? Also what lspci see for the root port and card itself during hot reset?
>>
>> If I recall, lspci showed the root port and card.
> 
> This is suspicious. Card should not respond to config read requests when
> is in hot reset state. Could you send output of lspci -vvxx of the root
> port and also card during this test? Maybe it is possible that root port
> has broken BRIDGE_CONTROL register and did not put card into Hot Reset
> state?

Yes, I could set the hardware up again and run more tests, it will take 
some time, but I can still do that.

But before I spend any more time running tests for you here, I have to 
admit, it seems to me running all those tests is completely off-topic in 
context of these two bugfixes here.

So maybe it would make sense to stop the discussion here and move it to 
separate thread ?

I have to admit, I also don't quite understand what it is that you're 
trying to find out with all those tests.
Pali Rohár Feb. 18, 2022, 4:17 p.m. UTC | #18
On Friday 18 February 2022 02:53:39 Marek Vasut wrote:
> On 2/17/22 14:04, Pali Rohár wrote:
> 
> [...]
> 
> > > > > > Flipping either bit makes no difference, suspend/resume behaves the same and
> > > > > > the link always recovers.
> > > > > 
> > > > > Ok, perfect! And what happens without suspend/resume (just in normal
> > > > > conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).
> > > > 
> > > > PING? Also what lspci see for the root port and card itself during hot reset?
> > > 
> > > If I recall, lspci showed the root port and card.
> > 
> > This is suspicious. Card should not respond to config read requests when
> > is in hot reset state. Could you send output of lspci -vvxx of the root
> > port and also card during this test? Maybe it is possible that root port
> > has broken BRIDGE_CONTROL register and did not put card into Hot Reset
> > state?
> 
> Yes, I could set the hardware up again and run more tests, it will take some
> time, but I can still do that.
> 
> But before I spend any more time running tests for you here, I have to
> admit, it seems to me running all those tests is completely off-topic in
> context of these two bugfixes here.

I do not think this is off-topic. Issue here is caused when controller
is not in L0 state and this test is something which deterministically
put controller into non-L0 state (Hot Reset). The best verification of
all race conditions and similar timing problems is to to setup scenario
in which timing windows can be under full control. Which this can can
do.

I saw more issues related to PCIe slave errors and I'm feeling that this
patch is just hacking one or two consequences and not fixing the source
of the problem globally.

In most cases slave errors are (incorrectly) reported to CPU when PCIe
controller receive UR/CA response from the bus or if controller itself
generate UR/CA response for request from CPU.

> So maybe it would make sense to stop the discussion here and move it to
> separate thread ?
> 
> I have to admit, I also don't quite understand what it is that you're trying
> to find out with all those tests.

Moreover if this test shows that PCI Bridge registers do not work
properly then it is something which must be fixed too.

There were more discussions about catching and recovering from ARM CPU
aborts and all patches for catching asynchronous exceptions were
rejected because they cannot work by their _imprecise_ nature.

And also there were discussions (not sure if on ML or IRC) if the PCI
core / drivers are the correct place for ARMv7 exceptions / data aborts.
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
index 7d38a9c50093..b2e521ee95eb 100644
--- a/drivers/pci/controller/pcie-rcar-host.c
+++ b/drivers/pci/controller/pcie-rcar-host.c
@@ -114,6 +114,51 @@  static u32 rcar_read_conf(struct rcar_pcie *pcie, int where)
 	return val >> shift;
 }
 
+#ifdef CONFIG_ARM
+#define __rcar_pci_rw_reg_workaround(instr)				\
+		"1:	" instr " %1, [%2]\n"				\
+		"2:	isb\n"						\
+		"3:	.pushsection .text.fixup,\"ax\"\n"		\
+		"	.align	2\n"					\
+		"4:	mov	%0, #" __stringify(PCIBIOS_SET_FAILED) "\n" \
+		"	b	3b\n"					\
+		"	.popsection\n"					\
+		"	.pushsection __ex_table,\"a\"\n"		\
+		"	.align	3\n"					\
+		"	.long	1b, 4b\n"				\
+		"	.long	1b, 4b\n"				\
+		"	.popsection\n"
+#endif
+
+int rcar_pci_write_reg_workaround(struct rcar_pcie *pcie, u32 val, unsigned int reg)
+{
+	int error = PCIBIOS_SUCCESSFUL;
+#ifdef CONFIG_ARM
+	asm volatile(
+		__rcar_pci_rw_reg_workaround("str")
+	: "+r"(error):"r"(val), "r"(pcie->base + reg) : "memory");
+#else
+	rcar_pci_write_reg(pcie, val, reg);
+#endif
+	return error;
+}
+
+int rcar_pci_read_reg_workaround(struct rcar_pcie *pcie, u32 *val, unsigned int reg)
+{
+	int error = PCIBIOS_SUCCESSFUL;
+#ifdef CONFIG_ARM
+	asm volatile(
+		__rcar_pci_rw_reg_workaround("ldr")
+	: "+r"(error), "=r"(*val) : "r"(pcie->base + reg) : "memory");
+
+	if (error != PCIBIOS_SUCCESSFUL)
+		*val = 0xffffffff;
+#else
+	*val = rcar_pci_read_reg(pcie, reg);
+#endif
+	return error;
+}
+
 /* Serialization is provided by 'pci_lock' in drivers/pci/access.c */
 static int rcar_pcie_config_access(struct rcar_pcie_host *host,
 		unsigned char access_type, struct pci_bus *bus,
@@ -185,14 +230,14 @@  static int rcar_pcie_config_access(struct rcar_pcie_host *host,
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
 	if (access_type == RCAR_PCI_ACCESS_READ)
-		*data = rcar_pci_read_reg(pcie, PCIECDR);
+		ret = rcar_pci_read_reg_workaround(pcie, data, PCIECDR);
 	else
-		rcar_pci_write_reg(pcie, *data, PCIECDR);
+		ret = rcar_pci_write_reg_workaround(pcie, *data, PCIECDR);
 
 	/* Disable the configuration access */
 	rcar_pci_write_reg(pcie, 0, PCIECCTLR);
 
-	return PCIBIOS_SUCCESSFUL;
+	return ret;
 }
 
 static int rcar_pcie_read_conf(struct pci_bus *bus, unsigned int devfn,
@@ -1097,7 +1142,7 @@  static struct platform_driver rcar_pcie_driver = {
 static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
 		unsigned int fsr, struct pt_regs *regs)
 {
-	return !!rcar_pcie_wakeup(pcie_dev, pcie_base);
+	return !fixup_exception(regs);
 }
 
 static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {