diff mbox series

[4/5] PCI: mediatek-gen3: Don't reply AXI slave error

Message ID 20250103060035.30688-5-jianjun.wang@mediatek.com (mailing list archive)
State New
Headers show
Series PCI: mediatek-gen3: Add MT8196 support | expand

Commit Message

Jianjun Wang Jan. 3, 2025, 6 a.m. UTC
There are some circumstances where the EP device will not respond to
non-posted access from the root port (e.g., MMIO read). In such cases,
the root port will reply with an AXI slave error, which will be treated
as a System Error (SError), causing a kernel panic and preventing us
from obtaining any useful information for further debugging.

We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to prevent
PCIe AXI0 from replying with a slave error. Setting this bit on an older
platform that does not support this feature will have no effect.

By preventing AXI0 from replying with a slave error, we can keep the
kernel alive and debug using the information from AER.

Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
---
 drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

AngeloGioacchino Del Regno Jan. 3, 2025, 9:29 a.m. UTC | #1
Il 03/01/25 07:00, Jianjun Wang ha scritto:
> There are some circumstances where the EP device will not respond to
> non-posted access from the root port (e.g., MMIO read). In such cases,
> the root port will reply with an AXI slave error, which will be treated
> as a System Error (SError), causing a kernel panic and preventing us
> from obtaining any useful information for further debugging.
> 
> We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to prevent
> PCIe AXI0 from replying with a slave error. Setting this bit on an older
> platform that does not support this feature will have no effect.
> 
> By preventing AXI0 from replying with a slave error, we can keep the
> kernel alive and debug using the information from AER.
> 
> Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> ---
>   drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> index 4bd3b39eebe2..48f83c2d91f7 100644
> --- a/drivers/pci/controller/pcie-mediatek-gen3.c
> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> @@ -87,6 +87,9 @@
>   #define PCIE_LOW_POWER_CTRL_REG		0x194
>   #define PCIE_FORCE_DIS_L0S		BIT(8)
>   
> +#define PCIE_AXI_IF_CTRL_REG		0x1a8
> +#define PCIE_AXI0_SLV_RESP_MASK		BIT(12)
> +
>   #define PCIE_PIPE4_PIE8_REG		0x338
>   #define PCIE_K_FINETUNE_MAX		GENMASK(5, 0)
>   #define PCIE_K_FINETUNE_ERR		GENMASK(7, 6)
> @@ -469,6 +472,15 @@ static int mtk_pcie_startup_port(struct mtk_gen3_pcie *pcie)
>   	val |= PCIE_FORCE_DIS_L0S;
>   	writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
>   
> +	/*
> +	 * Prevent PCIe AXI0 from replying a slave error, as it will cause kernel panic
> +	 * and prevent us from getting useful information.
> +	 * Keep the kernel alive and debug using the information from AER.
> +	 */

Isn't it safer if we set this bit at the beginning of mtk_pcie_startup_port()
instead?

Cheers,
Angelo

> +	val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
> +	val |= PCIE_AXI0_SLV_RESP_MASK;
> +	writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
> +
>   	/* Disable DVFSRC voltage request */
>   	val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
>   	val |= PCIE_DISABLE_DVFSRC_VLT_REQ;
Bjorn Helgaas Jan. 3, 2025, 7:19 p.m. UTC | #2
On Fri, Jan 03, 2025 at 02:00:14PM +0800, Jianjun Wang wrote:
> There are some circumstances where the EP device will not respond to
> non-posted access from the root port (e.g., MMIO read). In such cases,
> the root port will reply with an AXI slave error, which will be treated
> as a System Error (SError), causing a kernel panic and preventing us
> from obtaining any useful information for further debugging.
> 
> We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to prevent
> PCIe AXI0 from replying with a slave error. Setting this bit on an older
> platform that does not support this feature will have no effect.
> 
> By preventing AXI0 from replying with a slave error, we can keep the
> kernel alive and debug using the information from AER.
> 
> Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> ---
>  drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> index 4bd3b39eebe2..48f83c2d91f7 100644
> --- a/drivers/pci/controller/pcie-mediatek-gen3.c
> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> @@ -87,6 +87,9 @@
>  #define PCIE_LOW_POWER_CTRL_REG		0x194
>  #define PCIE_FORCE_DIS_L0S		BIT(8)
>  
> +#define PCIE_AXI_IF_CTRL_REG		0x1a8
> +#define PCIE_AXI0_SLV_RESP_MASK		BIT(12)
> +
>  #define PCIE_PIPE4_PIE8_REG		0x338
>  #define PCIE_K_FINETUNE_MAX		GENMASK(5, 0)
>  #define PCIE_K_FINETUNE_ERR		GENMASK(7, 6)
> @@ -469,6 +472,15 @@ static int mtk_pcie_startup_port(struct mtk_gen3_pcie *pcie)
>  	val |= PCIE_FORCE_DIS_L0S;
>  	writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
>  
> +	/*
> +	 * Prevent PCIe AXI0 from replying a slave error, as it will cause kernel panic
> +	 * and prevent us from getting useful information.
> +	 * Keep the kernel alive and debug using the information from AER.

Wrap to fit in 80 columns like the rest of the file

Add blank lines between paragraphs.

AER is an asynchronous mechanism, so if you disable the SError,
whoever issued the MMIO read to the PCIe device will receive some kind
of data.

I hope/assume that data is ~0 as on other platforms?  If so, please
confirm this in the comment and commit log.  Otherwise, the caller
will received corrupted data with no way to know that it's corrupted.

> +	 */
> +	val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
> +	val |= PCIE_AXI0_SLV_RESP_MASK;
> +	writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
> +
>  	/* Disable DVFSRC voltage request */
>  	val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
>  	val |= PCIE_DISABLE_DVFSRC_VLT_REQ;
> -- 
> 2.46.0
>
Jianjun Wang Jan. 6, 2025, 9:27 a.m. UTC | #3
On Fri, 2025-01-03 at 10:29 +0100, AngeloGioacchino Del Regno wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> Il 03/01/25 07:00, Jianjun Wang ha scritto:
> > There are some circumstances where the EP device will not respond
> > to
> > non-posted access from the root port (e.g., MMIO read). In such
> > cases,
> > the root port will reply with an AXI slave error, which will be
> > treated
> > as a System Error (SError), causing a kernel panic and preventing
> > us
> > from obtaining any useful information for further debugging.
> > 
> > We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to
> > prevent
> > PCIe AXI0 from replying with a slave error. Setting this bit on an
> > older
> > platform that does not support this feature will have no effect.
> > 
> > By preventing AXI0 from replying with a slave error, we can keep
> > the
> > kernel alive and debug using the information from AER.
> > 
> > Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> > ---
> >   drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c
> > b/drivers/pci/controller/pcie-mediatek-gen3.c
> > index 4bd3b39eebe2..48f83c2d91f7 100644
> > --- a/drivers/pci/controller/pcie-mediatek-gen3.c
> > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> > @@ -87,6 +87,9 @@
> >   #define PCIE_LOW_POWER_CTRL_REG             0x194
> >   #define PCIE_FORCE_DIS_L0S          BIT(8)
> > 
> > +#define PCIE_AXI_IF_CTRL_REG         0x1a8
> > +#define PCIE_AXI0_SLV_RESP_MASK              BIT(12)
> > +
> >   #define PCIE_PIPE4_PIE8_REG         0x338
> >   #define PCIE_K_FINETUNE_MAX         GENMASK(5, 0)
> >   #define PCIE_K_FINETUNE_ERR         GENMASK(7, 6)
> > @@ -469,6 +472,15 @@ static int mtk_pcie_startup_port(struct
> > mtk_gen3_pcie *pcie)
> >       val |= PCIE_FORCE_DIS_L0S;
> >       writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
> > 
> > +     /*
> > +      * Prevent PCIe AXI0 from replying a slave error, as it will
> > cause kernel panic
> > +      * and prevent us from getting useful information.
> > +      * Keep the kernel alive and debug using the information from
> > AER.
> > +      */
> 
> Isn't it safer if we set this bit at the beginning of
> mtk_pcie_startup_port()
> instead?

Agree, I'll move it to the beginning of mtk_pcie_startup_port().

Thanks.

> 
> Cheers,
> Angelo
> 
> > +     val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
> > +     val |= PCIE_AXI0_SLV_RESP_MASK;
> > +     writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
> > +
> >       /* Disable DVFSRC voltage request */
> >       val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
> >       val |= PCIE_DISABLE_DVFSRC_VLT_REQ;
> 
> 
>
Jianjun Wang Jan. 6, 2025, 9:31 a.m. UTC | #4
On Fri, 2025-01-03 at 13:19 -0600, Bjorn Helgaas wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> On Fri, Jan 03, 2025 at 02:00:14PM +0800, Jianjun Wang wrote:
> > There are some circumstances where the EP device will not respond
> > to
> > non-posted access from the root port (e.g., MMIO read). In such
> > cases,
> > the root port will reply with an AXI slave error, which will be
> > treated
> > as a System Error (SError), causing a kernel panic and preventing
> > us
> > from obtaining any useful information for further debugging.
> > 
> > We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to
> > prevent
> > PCIe AXI0 from replying with a slave error. Setting this bit on an
> > older
> > platform that does not support this feature will have no effect.
> > 
> > By preventing AXI0 from replying with a slave error, we can keep
> > the
> > kernel alive and debug using the information from AER.
> > 
> > Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> > ---
> >  drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c
> > b/drivers/pci/controller/pcie-mediatek-gen3.c
> > index 4bd3b39eebe2..48f83c2d91f7 100644
> > --- a/drivers/pci/controller/pcie-mediatek-gen3.c
> > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> > @@ -87,6 +87,9 @@
> >  #define PCIE_LOW_POWER_CTRL_REG              0x194
> >  #define PCIE_FORCE_DIS_L0S           BIT(8)
> > 
> > +#define PCIE_AXI_IF_CTRL_REG         0x1a8
> > +#define PCIE_AXI0_SLV_RESP_MASK              BIT(12)
> > +
> >  #define PCIE_PIPE4_PIE8_REG          0x338
> >  #define PCIE_K_FINETUNE_MAX          GENMASK(5, 0)
> >  #define PCIE_K_FINETUNE_ERR          GENMASK(7, 6)
> > @@ -469,6 +472,15 @@ static int mtk_pcie_startup_port(struct
> > mtk_gen3_pcie *pcie)
> >       val |= PCIE_FORCE_DIS_L0S;
> >       writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
> > 
> > +     /*
> > +      * Prevent PCIe AXI0 from replying a slave error, as it will
> > cause kernel panic
> > +      * and prevent us from getting useful information.
> > +      * Keep the kernel alive and debug using the information from
> > AER.
> 
> Wrap to fit in 80 columns like the rest of the file
> 
> Add blank lines between paragraphs.
> 
> AER is an asynchronous mechanism, so if you disable the SError,
> whoever issued the MMIO read to the PCIe device will receive some
> kind
> of data.
> 
> I hope/assume that data is ~0 as on other platforms?  If so, please
> confirm this in the comment and commit log.  Otherwise, the caller
> will received corrupted data with no way to know that it's corrupted.

Yes, with this bit set, the caller will receive ~0 data if the EP does
not respond. I'll add this to the comment and commit log.

Thanks.

> 
> > +      */
> > +     val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
> > +     val |= PCIE_AXI0_SLV_RESP_MASK;
> > +     writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
> > +
> >       /* Disable DVFSRC voltage request */
> >       val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
> >       val |= PCIE_DISABLE_DVFSRC_VLT_REQ;
> > --
> > 2.46.0
> >
Manivannan Sadhasivam Jan. 6, 2025, 4:16 p.m. UTC | #5
On Fri, Jan 03, 2025 at 02:00:14PM +0800, Jianjun Wang wrote:
> There are some circumstances where the EP device will not respond to
> non-posted access from the root port (e.g., MMIO read). In such cases,
> the root port will reply with an AXI slave error, which will be treated

By 'reply with an AXI slave error', you meant that the root port responds to the
MMIO read by the CPU with AXI slave error? If so, please reword it as such to
avoid confusion.

> as a System Error (SError), causing a kernel panic and preventing us
> from obtaining any useful information for further debugging.
> 
> We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to prevent
> PCIe AXI0 from replying with a slave error. Setting this bit on an older
> platform that does not support this feature will have no effect.
> 

But the issue is still present on the older SoCs, isn't it? If so, please add
this info to the comments below.

- Mani

> By preventing AXI0 from replying with a slave error, we can keep the
> kernel alive and debug using the information from AER.
> 
> Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> ---
>  drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> index 4bd3b39eebe2..48f83c2d91f7 100644
> --- a/drivers/pci/controller/pcie-mediatek-gen3.c
> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> @@ -87,6 +87,9 @@
>  #define PCIE_LOW_POWER_CTRL_REG		0x194
>  #define PCIE_FORCE_DIS_L0S		BIT(8)
>  
> +#define PCIE_AXI_IF_CTRL_REG		0x1a8
> +#define PCIE_AXI0_SLV_RESP_MASK		BIT(12)
> +
>  #define PCIE_PIPE4_PIE8_REG		0x338
>  #define PCIE_K_FINETUNE_MAX		GENMASK(5, 0)
>  #define PCIE_K_FINETUNE_ERR		GENMASK(7, 6)
> @@ -469,6 +472,15 @@ static int mtk_pcie_startup_port(struct mtk_gen3_pcie *pcie)
>  	val |= PCIE_FORCE_DIS_L0S;
>  	writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
>  
> +	/*
> +	 * Prevent PCIe AXI0 from replying a slave error, as it will cause kernel panic
> +	 * and prevent us from getting useful information.
> +	 * Keep the kernel alive and debug using the information from AER.
> +	 */
> +	val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
> +	val |= PCIE_AXI0_SLV_RESP_MASK;
> +	writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
> +
>  	/* Disable DVFSRC voltage request */
>  	val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
>  	val |= PCIE_DISABLE_DVFSRC_VLT_REQ;
> -- 
> 2.46.0
>
Jianjun Wang Jan. 7, 2025, 3:21 a.m. UTC | #6
On Mon, 2025-01-06 at 21:46 +0530, Manivannan Sadhasivam wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> On Fri, Jan 03, 2025 at 02:00:14PM +0800, Jianjun Wang wrote:
> > There are some circumstances where the EP device will not respond
> > to
> > non-posted access from the root port (e.g., MMIO read). In such
> > cases,
> > the root port will reply with an AXI slave error, which will be
> > treated
> 
> By 'reply with an AXI slave error', you meant that the root port
> responds to the
> MMIO read by the CPU with AXI slave error? If so, please reword it as
> such to
> avoid confusion.

Yes, I'll fix it in the next version, thanks.

> 
> > as a System Error (SError), causing a kernel panic and preventing
> > us
> > from obtaining any useful information for further debugging.
> > 
> > We have added a new bit in the PCIE_AXI_IF_CTRL_REG register to
> > prevent
> > PCIe AXI0 from replying with a slave error. Setting this bit on an
> > older
> > platform that does not support this feature will have no effect.
> > 
> 
> But the issue is still present on the older SoCs, isn't it? If so,
> please add
> this info to the comments below.

Yes, older SoCs don't have this feature, I'll add this info in the next
version.

Thanks.

> 
> - Mani
> 
> > By preventing AXI0 from replying with a slave error, we can keep
> > the
> > kernel alive and debug using the information from AER.
> > 
> > Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> > ---
> >  drivers/pci/controller/pcie-mediatek-gen3.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c
> > b/drivers/pci/controller/pcie-mediatek-gen3.c
> > index 4bd3b39eebe2..48f83c2d91f7 100644
> > --- a/drivers/pci/controller/pcie-mediatek-gen3.c
> > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> > @@ -87,6 +87,9 @@
> >  #define PCIE_LOW_POWER_CTRL_REG              0x194
> >  #define PCIE_FORCE_DIS_L0S           BIT(8)
> > 
> > +#define PCIE_AXI_IF_CTRL_REG         0x1a8
> > +#define PCIE_AXI0_SLV_RESP_MASK              BIT(12)
> > +
> >  #define PCIE_PIPE4_PIE8_REG          0x338
> >  #define PCIE_K_FINETUNE_MAX          GENMASK(5, 0)
> >  #define PCIE_K_FINETUNE_ERR          GENMASK(7, 6)
> > @@ -469,6 +472,15 @@ static int mtk_pcie_startup_port(struct
> > mtk_gen3_pcie *pcie)
> >       val |= PCIE_FORCE_DIS_L0S;
> >       writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
> > 
> > +     /*
> > +      * Prevent PCIe AXI0 from replying a slave error, as it will
> > cause kernel panic
> > +      * and prevent us from getting useful information.
> > +      * Keep the kernel alive and debug using the information from
> > AER.
> > +      */
> > +     val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
> > +     val |= PCIE_AXI0_SLV_RESP_MASK;
> > +     writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
> > +
> >       /* Disable DVFSRC voltage request */
> >       val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
> >       val |= PCIE_DISABLE_DVFSRC_VLT_REQ;
> > --
> > 2.46.0
> > 
> 
> --
> மணிவண்ணன் சதாசிவம்
>
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
index 4bd3b39eebe2..48f83c2d91f7 100644
--- a/drivers/pci/controller/pcie-mediatek-gen3.c
+++ b/drivers/pci/controller/pcie-mediatek-gen3.c
@@ -87,6 +87,9 @@ 
 #define PCIE_LOW_POWER_CTRL_REG		0x194
 #define PCIE_FORCE_DIS_L0S		BIT(8)
 
+#define PCIE_AXI_IF_CTRL_REG		0x1a8
+#define PCIE_AXI0_SLV_RESP_MASK		BIT(12)
+
 #define PCIE_PIPE4_PIE8_REG		0x338
 #define PCIE_K_FINETUNE_MAX		GENMASK(5, 0)
 #define PCIE_K_FINETUNE_ERR		GENMASK(7, 6)
@@ -469,6 +472,15 @@  static int mtk_pcie_startup_port(struct mtk_gen3_pcie *pcie)
 	val |= PCIE_FORCE_DIS_L0S;
 	writel_relaxed(val, pcie->base + PCIE_LOW_POWER_CTRL_REG);
 
+	/*
+	 * Prevent PCIe AXI0 from replying a slave error, as it will cause kernel panic
+	 * and prevent us from getting useful information.
+	 * Keep the kernel alive and debug using the information from AER.
+	 */
+	val = readl_relaxed(pcie->base + PCIE_AXI_IF_CTRL_REG);
+	val |= PCIE_AXI0_SLV_RESP_MASK;
+	writel_relaxed(val, pcie->base + PCIE_AXI_IF_CTRL_REG);
+
 	/* Disable DVFSRC voltage request */
 	val = readl_relaxed(pcie->base + PCIE_MISC_CTRL_REG);
 	val |= PCIE_DISABLE_DVFSRC_VLT_REQ;