[v2,00/13] Qcom: LLCC/EDAC: Fix base address used for LLCC banks

Message ID 20221212123311.146261-1-manivannan.sadhasivam@linaro.org (mailing list archive)

Message

Manivannan Sadhasivam Dec. 12, 2022, 12:32 p.m. UTC
The Qualcomm LLCC/EDAC drivers were using a fixed register stride for
accessing the Control and Status Registers (CSRs) of each LLCC bank.
This stride only works for some SoCs, like SDM845, for which driver support
was initially added.

Later SoCs use register strides that vary between the banks, with holes
in between, so a single register stride cannot be used for accessing the
CSRs of each bank. Doing so could result in a crash with the current
drivers. So far this crash has not been reported, since the EDAC_QCOM driver
is not enabled in the ARM64 defconfig and no one has tested the driver
extensively by triggering the EDAC IRQ (that's where each bank's CSRs are
accessed).

To fix this issue, let's obtain the base address of each LLCC bank from
devicetree and get rid of the fixed stride.
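
The difference between the two addressing schemes can be sketched in plain
C. This is only an illustrative userspace model, not the driver code: the
function names, the bank count, and every address below are made-up
assumptions, chosen to show a hole between banks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; not taken from any real SoC. */
#define NUM_BANKS 4

/* Old scheme: one base address plus a constant stride per bank. */
uintptr_t bank_addr_fixed_stride(uintptr_t base, uintptr_t stride,
				 unsigned int bank, uintptr_t reg)
{
	return base + (uintptr_t)bank * stride + reg;
}

/* New scheme: every bank has its own base, e.g. one per devicetree entry. */
uintptr_t bank_addr_per_bank(const uintptr_t *bank_bases, size_t num_banks,
			     unsigned int bank, uintptr_t reg)
{
	assert(bank_bases != NULL);
	if (bank >= num_banks)
		return 0; /* caller treats 0 as "no such bank" */
	return bank_bases[bank] + reg;
}

/* Made-up bases with a hole between bank 1 and bank 2. */
const uintptr_t example_bank_bases[NUM_BANKS] = {
	0x01000000, 0x01080000, 0x01200000, 0x01280000,
};
```

With this made-up table, bank 2 lives at 0x01200000, while the fixed-stride
formula with a stride of 0x80000 would compute 0x01100000, which falls into
the hole.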

This series affects multiple platforms, but I have only tested it on
SM8250 and SM8450. Testing on other platforms is welcome.

Thanks,
Mani

Changes in v2:

* Removed reg-names property and used index of reg property to parse LLCC
  bank base address (Bjorn)
* Collected Ack from Sai for binding
* Added a new patch for polling mode (Luca)
* Renamed subject of patches targeting SC7180 and SM6350

Manivannan Sadhasivam (13):
  dt-bindings: arm: msm: Update the maintainers for LLCC
  dt-bindings: arm: msm: Fix register regions used for LLCC banks
  arm64: dts: qcom: sdm845: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sc7180: Remove reg-names property from LLCC node
  arm64: dts: qcom: sc7280: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sc8280xp: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sm8150: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sm8250: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sm8350: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sm8450: Fix the base addresses of LLCC banks
  arm64: dts: qcom: sm6350: Remove reg-names property from LLCC node
  qcom: llcc/edac: Fix the base address used for accessing LLCC banks
  qcom: llcc/edac: Support polling mode for ECC handling

 .../bindings/arm/msm/qcom,llcc.yaml           | 100 +++++++++++++++---
 arch/arm64/boot/dts/qcom/sc7180.dtsi          |   1 -
 arch/arm64/boot/dts/qcom/sc7280.dtsi          |   4 +-
 arch/arm64/boot/dts/qcom/sc8280xp.dtsi        |   7 +-
 arch/arm64/boot/dts/qcom/sdm845.dtsi          |   5 +-
 arch/arm64/boot/dts/qcom/sm6350.dtsi          |   1 -
 arch/arm64/boot/dts/qcom/sm8150.dtsi          |   5 +-
 arch/arm64/boot/dts/qcom/sm8250.dtsi          |   5 +-
 arch/arm64/boot/dts/qcom/sm8350.dtsi          |   5 +-
 arch/arm64/boot/dts/qcom/sm8450.dtsi          |   5 +-
 drivers/edac/qcom_edac.c                      |  51 +++++----
 drivers/soc/qcom/llcc-qcom.c                  |  85 ++++++++-------
 include/linux/soc/qcom/llcc-qcom.h            |   6 +-
 13 files changed, 186 insertions(+), 94 deletions(-)

Comments

Andrew Halaney Dec. 12, 2022, 7:23 p.m. UTC | #1
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride

I took this for a quick spin on the qdrive3 I've got access to without
any issue:

    [root@localhost ~]# modprobe qcom_edac
    [root@localhost ~]# dmesg | grep -i edac
    [    0.620723] EDAC MC: Ver: 3.0.0
    [    1.165417] ghes_edac: GHES probing device list is empty
    [  594.688103] EDAC DEVICE0: Giving out device to module qcom_llcc_edac controller llcc: DEV qcom_llcc_edac (INTERRUPT)
    [root@localhost ~]# cat /proc/interrupts | grep ecc
    174:          0          0          0          0          0          0          0          0     GICv3 614 Level     llcc_ecc
    [root@localhost ~]#

Potentially stupid question, but are users expected to manually load the
driver as I did? I don't see how it would be loaded automatically in the
current state, but thought it was funny that I needed to modprobe
myself.

Please let me know if you want me to do any further testing!

Thanks,
Andrew
Manivannan Sadhasivam Dec. 13, 2022, 5:28 a.m. UTC | #2
On Mon, Dec 12, 2022 at 01:23:40PM -0600, Andrew Halaney wrote:
> Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
> 

Thanks!

> I took this for a quick spin on the qdrive3 I've got access to without
> any issue:
> 
>     [root@localhost ~]# modprobe qcom_edac
>     [root@localhost ~]# dmesg | grep -i edac
>     [    0.620723] EDAC MC: Ver: 3.0.0
>     [    1.165417] ghes_edac: GHES probing device list is empty
>     [  594.688103] EDAC DEVICE0: Giving out device to module qcom_llcc_edac controller llcc: DEV qcom_llcc_edac (INTERRUPT)
>     [root@localhost ~]# cat /proc/interrupts | grep ecc
>     174:          0          0          0          0          0          0          0          0     GICv3 614 Level     llcc_ecc
>     [root@localhost ~]#
> 
> Potentially stupid question, but are users expected to manually load the
> driver as I did? I don't see how it would be loaded automatically in the
> current state, but thought it was funny that I needed to modprobe
> myself.
> 
> Please let me know if you want me to do any more further testing!
> 

Well, I always ended up using the driver as a built-in. I do build it as a
module for build testing but never really used it as one, so I didn't catch
this issue.

This is because the qcom_edac driver does not export a module alias. The diff
below allows the kernel to autoload it:

diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c
index f7afb5375293..13919d01c22d 100644
--- a/drivers/edac/qcom_edac.c
+++ b/drivers/edac/qcom_edac.c
@@ -419,3 +419,4 @@ module_platform_driver(qcom_llcc_edac_driver);
 
 MODULE_DESCRIPTION("QCOM EDAC driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:qcom_llcc_edac");

Please test and let me know. I will add this as a new patch in next version.
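
Why the alias matters can be modelled in userspace. This is a rough sketch
under two assumptions: that the platform bus announces the device with a
modalias string of the form platform:<device name>, and that module tooling
picks a module whose alias pattern matches that string. The structure,
function, and array names below are hypothetical; only fnmatch() is a real
POSIX call.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Simplified stand-in for what module tooling knows about one module. */
struct module_info {
	const char *name;
	const char *const *aliases; /* NULL-terminated list, may be empty */
};

/* Return the first module with an alias pattern matching the modalias. */
const char *module_for_modalias(const struct module_info *mods, size_t count,
				const char *modalias)
{
	assert(mods != NULL && modalias != NULL);
	for (size_t i = 0; i < count; i++) {
		for (const char *const *a = mods[i].aliases; *a != NULL; a++) {
			if (fnmatch(*a, modalias, 0) == 0)
				return mods[i].name;
		}
	}
	return NULL; /* nothing to autoload */
}

/* Before the diff: the module exports no alias. */
const char *const no_aliases[] = { NULL };
const struct module_info before_fix[] = { { "qcom_edac", no_aliases } };

/* After the diff: the alias from MODULE_ALIAS() is recorded. */
const char *const with_alias[] = { "platform:qcom_llcc_edac", NULL };
const struct module_info after_fix[] = { { "qcom_edac", with_alias } };
```

In this model, looking up "platform:qcom_llcc_edac" finds nothing in
before_fix and resolves to qcom_edac in after_fix, which mirrors the manual
modprobe being needed without the alias.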

Thanks,
Mani

> Thanks,
> Andrew
>
Andrew Halaney Dec. 13, 2022, 4:17 p.m. UTC | #3
On Tue, Dec 13, 2022 at 10:58:02AM +0530, Manivannan Sadhasivam wrote:
> Well, I always ended up using the driver as a built-in. I do make it module for
> build test but never really used it as a module, so didn't catch this issue.
> 
> This is due to the module alias not exported by the qcom_edac driver. Below
> diff allows kernel to autoload it:
> 
> diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c
> index f7afb5375293..13919d01c22d 100644
> --- a/drivers/edac/qcom_edac.c
> +++ b/drivers/edac/qcom_edac.c
> @@ -419,3 +419,4 @@ module_platform_driver(qcom_llcc_edac_driver);
>  
>  MODULE_DESCRIPTION("QCOM EDAC driver");
>  MODULE_LICENSE("GPL v2");
> +MODULE_ALIAS("platform:qcom_llcc_edac");
> 
> Please test and let me know. I will add this as a new patch in next version.
> 

Thanks Mani, that gets things working for me. For that patch:

Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>

My personal opinion, but that probably deserves a Fixes: tag too!
Krzysztof Kozlowski Dec. 13, 2022, 4:54 p.m. UTC | #4
On 13/12/2022 06:28, Manivannan Sadhasivam wrote:
> Well, I always ended up using the driver as a built-in. I do make it module for
> build test but never really used it as a module, so didn't catch this issue.
> 
> This is due to the module alias not exported by the qcom_edac driver. Below
> diff allows kernel to autoload it:
> 
> diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c
> index f7afb5375293..13919d01c22d 100644
> --- a/drivers/edac/qcom_edac.c
> +++ b/drivers/edac/qcom_edac.c
> @@ -419,3 +419,4 @@ module_platform_driver(qcom_llcc_edac_driver);
>  
>  MODULE_DESCRIPTION("QCOM EDAC driver");
>  MODULE_LICENSE("GPL v2");
> +MODULE_ALIAS("platform:qcom_llcc_edac");

While this is a way to fix it, instead of creating aliases for wrong
names, either a correct name should be used or the driver should receive
an ID table.

Best regards,
Krzysztof
Manivannan Sadhasivam Dec. 13, 2022, 5:57 p.m. UTC | #5
On Tue, Dec 13, 2022 at 05:54:56PM +0100, Krzysztof Kozlowski wrote:
> While this is a way to fix it, but instead of creating aliases for wrong
> names, either a correct name should be used or driver should receive ID
> table.
> 

I'm not sure how you'd fix it with a _correct_ name here. Also, an ID table
is overkill since only one driver makes use of it. Moreover, there is no
definite ID to use.

Thanks,
Mani

> Best regards,
> Krzysztof
>
Krzysztof Kozlowski Dec. 13, 2022, 6:47 p.m. UTC | #6
On 13/12/2022 18:57, Manivannan Sadhasivam wrote:
> On Tue, Dec 13, 2022 at 05:54:56PM +0100, Krzysztof Kozlowski wrote:
>> While this is a way to fix it, but instead of creating aliases for wrong
>> names, either a correct name should be used or driver should receive ID
>> table.
>>
> 
> I'm not sure how you'd fix it with a _correct_ name here. 

Hm, I assumed that it would be enough if the driver name matched the device
name. Currently these two are not in sync. Maybe it's not enough when
built as a module?

> Also, the id table is
> an overkill since there is only one driver that is making use of it. And
> moreover, there is no definite ID to use.

Every driver supporting a single device usually has an ID table, and it's
not a problem...

Best regards,
Krzysztof
Manivannan Sadhasivam Dec. 19, 2022, 1:50 p.m. UTC | #7
On Tue, Dec 13, 2022 at 07:47:17PM +0100, Krzysztof Kozlowski wrote:
> On 13/12/2022 18:57, Manivannan Sadhasivam wrote:
> > On Tue, Dec 13, 2022 at 05:54:56PM +0100, Krzysztof Kozlowski wrote:
> >> While this is a way to fix it, but instead of creating aliases for wrong
> >> names, either a correct name should be used or driver should receive ID
> >> table.
> >>
> > 
> > I'm not sure how you'd fix it with a _correct_ name here. 
> 
> Hm, I assumed that it would be enough if driver name would match device
> name. Currently these two are not in sync. Maybe it's not enough when
> built as module?
> 

Right, for a module it is not enough, and that's why we need an id_table/alias.

> > Also, the id table is
> > an overkill since there is only one driver that is making use of it. And
> > moreover, there is no definite ID to use.
> 
> Every driver with a single device support has usually ID table and it's
> not a problem...
> 

Are you referring to OF/ACPI ID table? Or something else?

Thanks,
Mani

> Best regards,
> Krzysztof
>
Krzysztof Kozlowski Dec. 19, 2022, 2:11 p.m. UTC | #8
On 19/12/2022 14:50, Manivannan Sadhasivam wrote:
> 
>>> Also, the id table is
>>> an overkill since there is only one driver that is making use of it. And
>>> moreover, there is no definite ID to use.
>>
>> Every driver with a single device support has usually ID table and it's
>> not a problem...
>>
> 
> Are you referring to OF/ACPI ID table? Or something else?

No, I refer to the driver ID table (I2C, platform, whatever the driver is).

Best regards,
Krzysztof
Manivannan Sadhasivam Dec. 19, 2022, 2:16 p.m. UTC | #9
On Mon, Dec 19, 2022 at 03:11:36PM +0100, Krzysztof Kozlowski wrote:
> On 19/12/2022 14:50, Manivannan Sadhasivam wrote:
> > 
> >>> Also, the id table is
> >>> an overkill since there is only one driver that is making use of it. And
> >>> moreover, there is no definite ID to use.
> >>
> >> Every driver with a single device support has usually ID table and it's
> >> not a problem...
> >>
> > 
> > Are you referring to OF/ACPI ID table? Or something else?
> 
> No, I refer to the driver ID table (I2C, platform whatever the driver is).
> 

Yeah, that's what I wanted to avoid here. An ID table makes sense if you have
a bus like I2C or a separate subsystem, but here LLCC is an individual driver.
So creating a separate ID table is overkill IMO.

Thanks,
Mani

> Best regards,
> Krzysztof
>
Krzysztof Kozlowski Dec. 19, 2022, 2:21 p.m. UTC | #10
On 19/12/2022 15:16, Manivannan Sadhasivam wrote:
> On Mon, Dec 19, 2022 at 03:11:36PM +0100, Krzysztof Kozlowski wrote:
>> On 19/12/2022 14:50, Manivannan Sadhasivam wrote:
>>>
>>>>> Also, the id table is
>>>>> an overkill since there is only one driver that is making use of it. And
>>>>> moreover, there is no definite ID to use.
>>>>
>>>> Every driver with a single device support has usually ID table and it's
>>>> not a problem...
>>>>
>>>
>>> Are you referring to OF/ACPI ID table? Or something else?
>>
>> No, I refer to the driver ID table (I2C, platform whatever the driver is).
>>
> 
> Yeah, that's what I wanted to avoid here. The ID table makes sense if you have
> a bus like I2C or a separate subsystem but here LLCC is an individual driver.
> So creating a separate ID table is an overkill IMO.

Why is this overkill? It's just a few lines, and many, many drivers have it,
even duplicated (for legacy reasons) with OF tables.

An ALIAS is not the way to get around an ID table, because essentially you
are re-implementing it.

Best regards,
Krzysztof
Dmitry Baryshkov Dec. 19, 2022, 4:49 p.m. UTC | #11
On Mon, 19 Dec 2022 at 16:17, Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Mon, Dec 19, 2022 at 03:11:36PM +0100, Krzysztof Kozlowski wrote:
> > On 19/12/2022 14:50, Manivannan Sadhasivam wrote:
> > >
> > >>> Also, the id table is
> > >>> an overkill since there is only one driver that is making use of it. And
> > >>> moreover, there is no definite ID to use.
> > >>
> > >> Every driver with a single device support has usually ID table and it's
> > >> not a problem...
> > >>
> > >
> > > Are you referring to OF/ACPI ID table? Or something else?
> >
> > No, I refer to the driver ID table (I2C, platform whatever the driver is).
> >
>
> Yeah, that's what I wanted to avoid here. The ID table makes sense if you have
> a bus like I2C or a separate subsystem but here LLCC is an individual driver.
> So creating a separate ID table is an overkill IMO.

Well, struct platform_device_id is used quite a lot together with the
MODULE_DEVICE_TABLE(platform, _ids);

On the other hand:

$ git grep MODULE_ALIAS.*platform: | wc -l
1308
$ git grep MODULE_DEVICE_TABLE.*platform | wc -l
236
Manivannan Sadhasivam Dec. 19, 2022, 5:31 p.m. UTC | #12
On Mon, Dec 19, 2022 at 06:49:39PM +0200, Dmitry Baryshkov wrote:
> On Mon, 19 Dec 2022 at 16:17, Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > On Mon, Dec 19, 2022 at 03:11:36PM +0100, Krzysztof Kozlowski wrote:
> > > On 19/12/2022 14:50, Manivannan Sadhasivam wrote:
> > > >
> > > >>> Also, the id table is
> > > >>> an overkill since there is only one driver that is making use of it. And
> > > >>> moreover, there is no definite ID to use.
> > > >>
> > > >> Every driver with a single device support has usually ID table and it's
> > > >> not a problem...
> > > >>
> > > >
> > > > Are you referring to OF/ACPI ID table? Or something else?
> > >
> > > No, I refer to the driver ID table (I2C, platform whatever the driver is).
> > >
> >
> > Yeah, that's what I wanted to avoid here. The ID table makes sense if you have
> > a bus like I2C or a separate subsystem but here LLCC is an individual driver.
> > So creating a separate ID table is an overkill IMO.
> 
> Well, struct platform_device_id is used quite a lot together with the
> MODULE_DEVICE_TABLE(platform, _ids);
> 
> On the other hand:
> 
> $ git grep MODULE_ALIAS.*platform: | wc -l
> 1308
> $ git grep MODULE_DEVICE_TABLE.*platform | wc -l
> 236
> 

Hmm. I think I will just go with platform_device_id in the next version.
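
For reference, the ID table approach can be modelled in userspace as below.
This is a hedged sketch, not the patch that will be posted: the struct is a
stand-in for the kernel's struct platform_device_id, the table and function
names are hypothetical, and in the kernel the table would additionally be
passed to MODULE_DEVICE_TABLE(platform, ...) as mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct platform_device_id. */
struct platform_device_id_model {
	const char *name; /* the kernel struct uses a fixed-size char array */
};

/* Hypothetical table, terminated by an empty entry like kernel ID tables. */
const struct platform_device_id_model qcom_llcc_edac_ids[] = {
	{ "qcom_llcc_edac" },
	{ "" },
};

/* Walk the table until the empty sentinel and compare device names. */
const struct platform_device_id_model *
match_id(const struct platform_device_id_model *ids, const char *dev_name)
{
	assert(ids != NULL && dev_name != NULL);
	for (; ids->name[0] != '\0'; ids++) {
		if (strcmp(ids->name, dev_name) == 0)
			return ids;
	}
	return NULL; /* device name is not listed in the table */
}
```

The table carries the same name as the hand-written alias, but the driver
states it once as data instead of spelling out the "platform:" string itself.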

Thanks,
Mani

> -- 
> With best wishes
> Dmitry
Manivannan Sadhasivam Dec. 19, 2022, 6:31 p.m. UTC | #13
Hi Andrew,

On Mon, Dec 12, 2022 at 01:23:40PM -0600, Andrew Halaney wrote:
> Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
> 

I dropped your Tested-by tag in v3 as some of the patch content has
changed. Please test v3 and share your feedback.

Thanks,
Mani
