Message ID | 0101016ed57a3259-eee09e9e-e99a-40f1-ab1c-63e58a42615c-000000@us-west-2.amazonses.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add EDAC support for Kryo CPU core caches | expand |
On Thu, Dec 05, 2019 at 09:53:05AM +0000, Sai Prakash Ranjan wrote: > This adds DT bindings for Kryo EDAC implemented with RAS > extensions on KRYO{3,4}XX CPU cores for reporting of cache > errors. > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> > --- > .../bindings/edac/qcom-kryo-edac.yaml | 67 +++++++++++++++++++ > 1 file changed, 67 insertions(+) > create mode 100644 Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > > diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > new file mode 100644 > index 000000000000..1a39429a73b4 > --- /dev/null > +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > @@ -0,0 +1,67 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Kryo Error Detection and Correction(EDAC) > + > +maintainers: > + - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> > + > +description: | > + Kryo EDAC is defined to describe on-chip error detection and correction > + for the Kryo CPU cores which implement RAS extensions. It will report > + all Single Bit Errors and Double Bit Errors found in L1/L2 caches in > + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors > + are reported in ERR1STATUS and ERR1MISC0 registers. > + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1 > + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1 > + ERR1STATUS - Error Record Primary Status Register > + ERR1MISC0 - Error Record Miscellaneous Register 0 > + Current implementation of Kryo ECC(Error Correcting Code) mechanism is > + based on interrupts. > + > +properties: > + compatible: > + enum: > + - qcom,kryo-edac > + > + interrupts: > + minItems: 1 > + maxItems: 4 > + items: > + - description: l1-l2 cache faultirq interrupt > + - description: l1-l2 cache errirq interrupt > + - description: l3-scu cache errirq interrupt > + - description: l3-scu cache faultirq interrupt > + > + interrupt-names: > + minItems: 1 > + maxItems: 4 You are saying only these combinations are valid: l1-l2-faultirq l1-l2-faultirq l1-l2-errirq l1-l2-faultirq l1-l2-errirq l3-scu-errirq l1-l2-faultirq l1-l2-errirq l3-scu-errirq l3-scu-faultirq Is that your intent? > + items: > + - const: l1-l2-faultirq > + - const: l1-l2-errirq > + - const: l3-scu-errirq > + - const: l3-scu-faultirq > + > +required: > + - compatible > + - interrupts > + - interrupt-names > + > +examples: > + - | > + #include <dt-bindings/interrupt-controller/arm-gic.h> > + > + kryo_edac { > + compatible = "qcom,kryo-edac"; > + interrupts = <GIC_PPI 6 IRQ_TYPE_LEVEL_HIGH>, > + <GIC_PPI 7 IRQ_TYPE_LEVEL_HIGH>, > + <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>, > + <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>; > + interrupt-names = "l1-l2-faultirq", > + "l1-l2-errirq", > + "l3-scu-errirq", > + "l3-scu-faultirq"; > + }; > -- > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > of Code Aurora Forum, hosted by The Linux Foundation >
Hi Rob, On 2019-12-19 05:07, Rob Herring wrote: > On Thu, Dec 05, 2019 at 09:53:05AM +0000, Sai Prakash Ranjan wrote: >> This adds DT bindings for Kryo EDAC implemented with RAS >> extensions on KRYO{3,4}XX CPU cores for reporting of cache >> errors. >> >> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> >> --- >> .../bindings/edac/qcom-kryo-edac.yaml | 67 >> +++++++++++++++++++ >> 1 file changed, 67 insertions(+) >> create mode 100644 >> Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> >> diff --git >> a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> new file mode 100644 >> index 000000000000..1a39429a73b4 >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> @@ -0,0 +1,67 @@ >> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >> +%YAML 1.2 >> +--- >> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml# >> +$schema: http://devicetree.org/meta-schemas/core.yaml# >> + >> +title: Kryo Error Detection and Correction(EDAC) >> + >> +maintainers: >> + - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> >> + >> +description: | >> + Kryo EDAC is defined to describe on-chip error detection and >> correction >> + for the Kryo CPU cores which implement RAS extensions. It will >> report >> + all Single Bit Errors and Double Bit Errors found in L1/L2 caches >> in >> + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache >> errors >> + are reported in ERR1STATUS and ERR1MISC0 registers. >> + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, >> EL1 >> + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, >> EL1 >> + ERR1STATUS - Error Record Primary Status Register >> + ERR1MISC0 - Error Record Miscellaneous Register 0 >> + Current implementation of Kryo ECC(Error Correcting Code) mechanism >> is >> + based on interrupts. >> + >> +properties: >> + compatible: >> + enum: >> + - qcom,kryo-edac >> + >> + interrupts: >> + minItems: 1 >> + maxItems: 4 >> + items: >> + - description: l1-l2 cache faultirq interrupt >> + - description: l1-l2 cache errirq interrupt >> + - description: l3-scu cache errirq interrupt >> + - description: l3-scu cache faultirq interrupt >> + >> + interrupt-names: >> + minItems: 1 >> + maxItems: 4 > > You are saying only these combinations are valid: > > l1-l2-faultirq > > l1-l2-faultirq > l1-l2-errirq > > l1-l2-faultirq > l1-l2-errirq > l3-scu-errirq > > l1-l2-faultirq > l1-l2-errirq > l3-scu-errirq > l3-scu-faultirq > > Is that your intent? > No, I want any combination of interrupts to be valid with atleast one interrupt as mandatory. I thought specifying minItems as 1 and maxItems as 4 will take care of this, am I doing something wrong? Thanks, Sai
On Thu, Dec 19, 2019 at 12:50 AM Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote: > > Hi Rob, > > On 2019-12-19 05:07, Rob Herring wrote: > > On Thu, Dec 05, 2019 at 09:53:05AM +0000, Sai Prakash Ranjan wrote: > >> This adds DT bindings for Kryo EDAC implemented with RAS > >> extensions on KRYO{3,4}XX CPU cores for reporting of cache > >> errors. > >> > >> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> > >> --- > >> .../bindings/edac/qcom-kryo-edac.yaml | 67 > >> +++++++++++++++++++ > >> 1 file changed, 67 insertions(+) > >> create mode 100644 > >> Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > >> > >> diff --git > >> a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > >> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > >> new file mode 100644 > >> index 000000000000..1a39429a73b4 > >> --- /dev/null > >> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > >> @@ -0,0 +1,67 @@ > >> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > >> +%YAML 1.2 > >> +--- > >> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml# > >> +$schema: http://devicetree.org/meta-schemas/core.yaml# > >> + > >> +title: Kryo Error Detection and Correction(EDAC) > >> + > >> +maintainers: > >> + - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> > >> + > >> +description: | > >> + Kryo EDAC is defined to describe on-chip error detection and > >> correction > >> + for the Kryo CPU cores which implement RAS extensions. It will > >> report > >> + all Single Bit Errors and Double Bit Errors found in L1/L2 caches > >> in > >> + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache > >> errors > >> + are reported in ERR1STATUS and ERR1MISC0 registers. > >> + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, > >> EL1 > >> + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, > >> EL1 > >> + ERR1STATUS - Error Record Primary Status Register > >> + ERR1MISC0 - Error Record Miscellaneous Register 0 > >> + Current implementation of Kryo ECC(Error Correcting Code) mechanism > >> is > >> + based on interrupts. > >> + > >> +properties: > >> + compatible: > >> + enum: > >> + - qcom,kryo-edac > >> + > >> + interrupts: > >> + minItems: 1 > >> + maxItems: 4 > >> + items: > >> + - description: l1-l2 cache faultirq interrupt > >> + - description: l1-l2 cache errirq interrupt > >> + - description: l3-scu cache errirq interrupt > >> + - description: l3-scu cache faultirq interrupt > >> + > >> + interrupt-names: > >> + minItems: 1 > >> + maxItems: 4 > > > > You are saying only these combinations are valid: > > > > l1-l2-faultirq > > > > l1-l2-faultirq > > l1-l2-errirq > > > > l1-l2-faultirq > > l1-l2-errirq > > l3-scu-errirq > > > > l1-l2-faultirq > > l1-l2-errirq > > l3-scu-errirq > > l3-scu-faultirq > > > > Is that your intent? > > > > No, I want any combination of interrupts to be valid with atleast one > interrupt as mandatory. > I thought specifying minItems as 1 and maxItems as 4 will take care of > this, am I doing something wrong? Interrupts (really all properties) have a defined order in DT and an 'items' list defines both the order and index. You'll need to use oneOf and list out the possibilities. Stick to ones you actually need. Rob
On 2019-12-19 19:28, Rob Herring wrote: >> > Is that your intent? >> > >> >> No, I want any combination of interrupts to be valid with atleast one >> interrupt as mandatory. >> I thought specifying minItems as 1 and maxItems as 4 will take care of >> this, am I doing something wrong? > > Interrupts (really all properties) have a defined order in DT and an > 'items' list defines both the order and index. You'll need to use > oneOf and list out the possibilities. Stick to ones you actually need. > Thanks, I will make the change in the next spin. -Sai
Hi Sai, (CC: +Tyler) On 05/12/2019 09:53, Sai Prakash Ranjan wrote: > This adds DT bindings for Kryo EDAC implemented with RAS > extensions on KRYO{3,4}XX CPU cores for reporting of cache > errors. KRYO{3,4}XX isn't the only SoC with the RAS extensions. The DT needs to convey the range of ways this armv8 RAS extensions stuff can be wired up. The folk who look after the ACPI specs have made a start: https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf (I suspect that isn't the latest version, I'll try and find out) I'd like the ACPI table and DT to convey the same information so that we don't need to convert or infer things in the driver. If something is missing, we should get it added! > diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > new file mode 100644 > index 000000000000..1a39429a73b4 > --- /dev/null > +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml > @@ -0,0 +1,67 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Kryo Error Detection and Correction(EDAC) > + > +maintainers: > + - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> > + > +description: | > + Kryo EDAC is defined to describe on-chip error detection and correction > + for the Kryo CPU cores which implement RAS extensions. Please don't make this Kryo specific, otherwise this binding becomes an extra thing we need to support with a 'v8.2 RAS' driver. What I'd like is a single 'armv82_ras' edac driver that handles faults and errors reported by interrupts, and interacts with the arch code's handling of 'external aborts'. This should work for all platforms using v8.2 RAS and later. > + It will report > + all Single Bit Errors and Double Bit Errors found in L1/L2 caches in > + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors > + are reported in ERR1STATUS and ERR1MISC0 registers. > + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1 > + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1 > + ERR1STATUS - Error Record Primary Status Register > + ERR1MISC0 - Error Record Miscellaneous Register 0 > + Current implementation of Kryo ECC(Error Correcting Code) mechanism is > + based on interrupts. Your SoC picked the system registers as the interface to these component's registers. The binding would need to specify which index the 'l1-l2' records start at, and how many there are. The same for the 'l3-scu'. You can't hard code these, they are different on other platforms. There is also an MMIO interface which needs a base address, along with the index and ranges. (which may be different). The same component may use both the system register and the MMIO interface. This stuff is likely to vary on big/little systems, so you need a way of describing which CPUs the settings refer to. This probably isn't something the ACPI tables capture as ACPI machines are typically homogenous. Thanks, James
Hi James, On 2020-01-16 00:18, James Morse wrote: > Hi Sai, > > (CC: +Tyler) > > On 05/12/2019 09:53, Sai Prakash Ranjan wrote: >> This adds DT bindings for Kryo EDAC implemented with RAS >> extensions on KRYO{3,4}XX CPU cores for reporting of cache >> errors. > > KRYO{3,4}XX isn't the only SoC with the RAS extensions. The DT needs > to convey the range > of ways this armv8 RAS extensions stuff can be wired up. > Right, but I was going for Kryo specific implementation and hence the binding as such. > The folk who look after the ACPI specs have made a start: > https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf > > (I suspect that isn't the latest version, I'll try and find out) > That would be helpful, thanks. > I'd like the ACPI table and DT to convey the same information so that > we don't need to > convert or infer things in the driver. If something is missing, we > should get it added! > Sure, I think it is decided now that kernel first RAS implementation will be generic. > >> diff --git >> a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> new file mode 100644 >> index 000000000000..1a39429a73b4 >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> @@ -0,0 +1,67 @@ >> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >> +%YAML 1.2 >> +--- >> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml# >> +$schema: http://devicetree.org/meta-schemas/core.yaml# >> + >> +title: Kryo Error Detection and Correction(EDAC) >> + >> +maintainers: >> + - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> >> + >> +description: | >> + Kryo EDAC is defined to describe on-chip error detection and >> correction >> + for the Kryo CPU cores which implement RAS extensions. > > Please don't make this Kryo specific, otherwise this binding becomes > an extra thing we > need to support with a 'v8.2 RAS' driver. > > What I'd like is a single 'armv82_ras' edac driver that handles faults > and errors reported > by interrupts, and interacts with the arch code's handling of > 'external aborts'. This > should work for all platforms using v8.2 RAS and later. > > Ok sure. >> + It will report >> + all Single Bit Errors and Double Bit Errors found in L1/L2 caches >> in >> + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache >> errors >> + are reported in ERR1STATUS and ERR1MISC0 registers. >> + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, >> EL1 >> + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, >> EL1 >> + ERR1STATUS - Error Record Primary Status Register >> + ERR1MISC0 - Error Record Miscellaneous Register 0 >> + Current implementation of Kryo ECC(Error Correcting Code) mechanism >> is >> + based on interrupts. > > Your SoC picked the system registers as the interface to these > component's registers. > The binding would need to specify which index the 'l1-l2' records > start at, and how many > there are. The same for the 'l3-scu'. You can't hard code these, they > are different on > other platforms. > Ok will keep this in mind for the next version. > There is also an MMIO interface which needs a base address, along with > the index and > ranges. (which may be different). The same component may use both the > system register and > the MMIO interface. > I have some doubts here, Where do I get this info? Will this be implementation specific? > This stuff is likely to vary on big/little systems, so you need a way > of describing which > CPUs the settings refer to. This probably isn't something the ACPI > tables capture as ACPI > machines are typically homogenous. > Our SoCs are based on big.LITTLE arch, so this will be needed. Thanks, Sai
Hi Sai, On 24/01/2020 14:21, Sai Prakash Ranjan wrote: > On 2020-01-16 00:18, James Morse wrote: >> On 05/12/2019 09:53, Sai Prakash Ranjan wrote: >>> This adds DT bindings for Kryo EDAC implemented with RAS >>> extensions on KRYO{3,4}XX CPU cores for reporting of cache >>> errors. >>> diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >>> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >>> new file mode 100644 >>> index 000000000000..1a39429a73b4 >>> --- /dev/null >>> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml >> There is also an MMIO interface which needs a base address, along with >> the index and >> ranges. (which may be different). The same component may use both the >> system register and the MMIO interface. > I have some doubts here, Where do I get this info? Will this be implementation specific? It will be implementation specific. The ACPI spec folk have gathered some of the range of ways people are putting this together. We should take that into account with the binding, otherwise we end up with a 'v1' and 'v2' of the binding and have to support both. There is a 'Beta 2' of that ACPI document. It should appear on the website at some point. Qualcomm should have this somewhere, its called 'DEN0085_RAS_ACPI_1.0_RELEASE_BETA2.pdf. Thanks, James
diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml new file mode 100644 index 000000000000..1a39429a73b4 --- /dev/null +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml @@ -0,0 +1,67 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Kryo Error Detection and Correction(EDAC) + +maintainers: + - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> + +description: | + Kryo EDAC is defined to describe on-chip error detection and correction + for the Kryo CPU cores which implement RAS extensions. It will report + all Single Bit Errors and Double Bit Errors found in L1/L2 caches in + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors + are reported in ERR1STATUS and ERR1MISC0 registers. + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1 + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1 + ERR1STATUS - Error Record Primary Status Register + ERR1MISC0 - Error Record Miscellaneous Register 0 + Current implementation of Kryo ECC(Error Correcting Code) mechanism is + based on interrupts. + +properties: + compatible: + enum: + - qcom,kryo-edac + + interrupts: + minItems: 1 + maxItems: 4 + items: + - description: l1-l2 cache faultirq interrupt + - description: l1-l2 cache errirq interrupt + - description: l3-scu cache errirq interrupt + - description: l3-scu cache faultirq interrupt + + interrupt-names: + minItems: 1 + maxItems: 4 + items: + - const: l1-l2-faultirq + - const: l1-l2-errirq + - const: l3-scu-errirq + - const: l3-scu-faultirq + +required: + - compatible + - interrupts + - interrupt-names + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + + kryo_edac { + compatible = "qcom,kryo-edac"; + interrupts = <GIC_PPI 6 IRQ_TYPE_LEVEL_HIGH>, + <GIC_PPI 7 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "l1-l2-faultirq", + "l1-l2-errirq", + "l3-scu-errirq", + "l3-scu-faultirq"; + };
This adds DT bindings for Kryo EDAC implemented with RAS extensions on KRYO{3,4}XX CPU cores for reporting of cache errors. Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> --- .../bindings/edac/qcom-kryo-edac.yaml | 67 +++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml