Message ID | 20220307143014.22758-1-lcherian@marvell.com (mailing list archive) |
---|---|
State | New, archived |
Series | [V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR |
On Mon, 07 Mar 2022 14:30:14 +0000,
Linu Cherian <lcherian@marvell.com> wrote:
>
> When a IAR register read races with a GIC interrupt RELEASE event,
> GIC-CPU interface could wrongly return a valid INTID to the CPU
> for an interrupt that is already released(non activated) instead of 0x3ff.
>
> As a side effect, an interrupt handler could run twice, once with
> interrupt priority and then with idle priority.
>
> As a workaround, gic_read_iar is updated so that it will return a
> valid interrupt ID only if there is a change in the active priority list
> after the IAR read on all the affected Silicons.
>
> Since there are silicon variants where both 23154 and 38545 are applicable,
> workaround for erratum 23154 has been extended to address both of them.
>
> Signed-off-by: Linu Cherian <lcherian@marvell.com>
> ---
> Changes since V2:
> - Changed masked part number to individual part numbers
> - Added additional comment to clarify on priority groups
>
> Changes since V1:
> - IIDR based quirk management done for 23154 has been reverted
> - Extended existing 23154 errata to address 38545 as well,
>   so that existing static keys are reused.
> - Added MIDR based support macros to cover all the affected parts
> - Changed the unlikely construct to likely construct in the workaround
>   function.
>
>  Documentation/arm64/silicon-errata.rst |  2 +-
>  arch/arm64/Kconfig                     |  8 ++++++--
>  arch/arm64/include/asm/arch_gicv3.h    | 23 +++++++++++++++++++++--
>  arch/arm64/include/asm/cputype.h       | 13 +++++++++++++
>  arch/arm64/kernel/cpu_errata.c         | 20 +++++++++++++++++---
>  5 files changed, 58 insertions(+), 8 deletions(-)

Looks good to me this time.

Catalin, Will: happy to take this into the irqchip tree for 5.18 with
your Ack, or you can take it into the arm64 tree with my

Reviewed-by: Marc Zyngier <maz@kernel.org>

Thanks,

	M.
On Mon, Mar 07, 2022 at 02:39:25PM +0000, Marc Zyngier wrote:
> On Mon, 07 Mar 2022 14:30:14 +0000,
> Linu Cherian <lcherian@marvell.com> wrote:
> >
> > When a IAR register read races with a GIC interrupt RELEASE event,
> > GIC-CPU interface could wrongly return a valid INTID to the CPU
> > for an interrupt that is already released(non activated) instead of 0x3ff.
> >
> > As a side effect, an interrupt handler could run twice, once with
> > interrupt priority and then with idle priority.
> >
> > As a workaround, gic_read_iar is updated so that it will return a
> > valid interrupt ID only if there is a change in the active priority list
> > after the IAR read on all the affected Silicons.
> >
> > Since there are silicon variants where both 23154 and 38545 are applicable,
> > workaround for erratum 23154 has been extended to address both of them.
> >
> > Signed-off-by: Linu Cherian <lcherian@marvell.com>
> > ---
> > Changes since V2:
> > - Changed masked part number to individual part numbers
> > - Added additional comment to clarify on priority groups
> >
> > Changes since V1:
> > - IIDR based quirk management done for 23154 has been reverted
> > - Extended existing 23154 errata to address 38545 as well,
> >   so that existing static keys are reused.
> > - Added MIDR based support macros to cover all the affected parts
> > - Changed the unlikely construct to likely construct in the workaround
> >   function.
> >
> >  Documentation/arm64/silicon-errata.rst |  2 +-
> >  arch/arm64/Kconfig                     |  8 ++++++--
> >  arch/arm64/include/asm/arch_gicv3.h    | 23 +++++++++++++++++++++--
> >  arch/arm64/include/asm/cputype.h       | 13 +++++++++++++
> >  arch/arm64/kernel/cpu_errata.c         | 20 +++++++++++++++++---
> >  5 files changed, 58 insertions(+), 8 deletions(-)
>
> Looks good to me this time.
>
> Catalin, Will: happy to take this into the irqchip tree for 5.18 with
> your Ack, or you can take it into the arm64 tree with my
>
> Reviewed-by: Marc Zyngier <maz@kernel.org>

Fine by me to take it into irqchip but do a quick check for conflicts
with other arm64 changes in for-next/core.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
On Mon, 7 Mar 2022 20:00:14 +0530, Linu Cherian wrote:
> When a IAR register read races with a GIC interrupt RELEASE event,
> GIC-CPU interface could wrongly return a valid INTID to the CPU
> for an interrupt that is already released(non activated) instead of 0x3ff.
>
> As a side effect, an interrupt handler could run twice, once with
> interrupt priority and then with idle priority.
>
> [...]

Applied to arm64 (for-next/errata), thanks!

[1/1] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR
      https://git.kernel.org/arm64/c/24a147bcef8c

Cheers,
On Mon, Mar 07, 2022 at 08:00:14PM +0530, Linu Cherian wrote:
> When a IAR register read races with a GIC interrupt RELEASE event,
> GIC-CPU interface could wrongly return a valid INTID to the CPU
> for an interrupt that is already released(non activated) instead of 0x3ff.
>
> As a side effect, an interrupt handler could run twice, once with
> interrupt priority and then with idle priority.
>
> As a workaround, gic_read_iar is updated so that it will return a
> valid interrupt ID only if there is a change in the active priority list
> after the IAR read on all the affected Silicons.
>
> Since there are silicon variants where both 23154 and 38545 are applicable,
> workaround for erratum 23154 has been extended to address both of them.
>
> Signed-off-by: Linu Cherian <lcherian@marvell.com>

Reverting this commit from today's linux-next fixed global-out-of-bounds
accesses running CPU hotplug workloads on a non-ThunderX server.

 psci: CPU88 killed (polled 0 ms)
 ==================================================================
 BUG: KASAN: global-out-of-bounds in is_affected_midr_range_list
 Read of size 4 at addr ffffa0ec80ddcc6c by task swapper/88/0

 CPU: 88 PID: 0 Comm: swapper/88 Not tainted 5.17.0-rc7-next-20220309-dirty #25
 Call trace:
  dump_backtrace
  show_stack
  dump_stack_lvl
  print_address_description.constprop.0
  print_report
  kasan_report
  __asan_report_load4_noabort
  is_affected_midr_range_list
  is_midr_in_range_list at ./arch/arm64/include/asm/cputype.h:221
  (inlined by) is_affected_midr_range_list at arch/arm64/kernel/cpu_errata.c:41
  verify_local_cpu_caps
  verify_local_cpu_caps at arch/arm64/kernel/cpufeature.c:2787
  check_local_cpu_capabilities
  verify_local_elf_hwcaps at arch/arm64/kernel/cpufeature.c:2852
  (inlined by) verify_local_cpu_capabilities at arch/arm64/kernel/cpufeature.c:2922
  (inlined by) check_local_cpu_capabilities at arch/arm64/kernel/cpufeature.c:2948
  secondary_start_kernel
  __secondary_switched

 The buggy address belongs to the variable:
  cavium_erratum_23154_cpus

 The buggy address belongs to the virtual mapping at
  [ffffa0ec80dd0000, ffffa0ec82140000) created by:
  map_kernel
On 2022-03-09 17:40, Qian Cai wrote:
> On Mon, Mar 07, 2022 at 08:00:14PM +0530, Linu Cherian wrote:
>> When a IAR register read races with a GIC interrupt RELEASE event,
>> GIC-CPU interface could wrongly return a valid INTID to the CPU
>> for an interrupt that is already released(non activated) instead of
>> 0x3ff.
>>
>> As a side effect, an interrupt handler could run twice, once with
>> interrupt priority and then with idle priority.
>>
>> As a workaround, gic_read_iar is updated so that it will return a
>> valid interrupt ID only if there is a change in the active priority
>> list
>> after the IAR read on all the affected Silicons.
>>
>> Since there are silicon variants where both 23154 and 38545 are
>> applicable,
>> workaround for erratum 23154 has been extended to address both of
>> them.
>>
>> Signed-off-by: Linu Cherian <lcherian@marvell.com>
>
> Reverting this commit from today's linux-next fixed
> global-out-of-bounds
> accesses running CPU hotplug workloads on a non-ThunderX server.
>
>  psci: CPU88 killed (polled 0 ms)
>  ==================================================================
>  BUG: KASAN: global-out-of-bounds in is_affected_midr_range_list
>  Read of size 4 at addr ffffa0ec80ddcc6c by task swapper/88/0
>
>  CPU: 88 PID: 0 Comm: swapper/88 Not tainted
> 5.17.0-rc7-next-20220309-dirty #25
>  Call trace:
>   dump_backtrace
>   show_stack
>   dump_stack_lvl
>   print_address_description.constprop.0
>   print_report
>   kasan_report
>   __asan_report_load4_noabort
>   is_affected_midr_range_list
>   is_midr_in_range_list at ./arch/arm64/include/asm/cputype.h:221
>   (inlined by) is_affected_midr_range_list at
> arch/arm64/kernel/cpu_errata.c:41
>   verify_local_cpu_caps
>   verify_local_cpu_caps at arch/arm64/kernel/cpufeature.c:2787
>   check_local_cpu_capabilities
>   verify_local_elf_hwcaps at arch/arm64/kernel/cpufeature.c:2852
>   (inlined by) verify_local_cpu_capabilities at
> arch/arm64/kernel/cpufeature.c:2922
>   (inlined by) check_local_cpu_capabilities at
> arch/arm64/kernel/cpufeature.c:2948
>   secondary_start_kernel
>   __secondary_switched
>
>  The buggy address belongs to the variable:
>   cavium_erratum_23154_cpus
>
>  The buggy address belongs to the virtual mapping at
>   [ffffa0ec80dd0000, ffffa0ec82140000) created by:
>   map_kernel

Urgh... Thanks for reporting this.

Will, can you either drop this patch, or squash the following diff in?

Thanks,

	M.

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1d9d4f910de7..400a1c9cac90 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -225,6 +225,7 @@ const struct midr_range cavium_erratum_23154_cpus[] = {
 	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXN),
 	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXMM),
 	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXO),
+	{},
 };
 #endif
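The one-line fix matters because, judging by the backtrace and the `{}` entry Marc adds, the range-list walk takes no length argument and stops only when it reaches a zeroed entry. The following is a minimal user-space sketch of that pattern, not the kernel code: `struct range`, `in_range_list` and `example_cpus` are hypothetical stand-ins, and `{ 0 }` is used in place of the kernel's `{}` for portability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct midr_range. */
struct range {
	uint32_t model;   /* 0 marks the end of the list */
	uint32_t rv_min;  /* lowest matching revision */
	uint32_t rv_max;  /* highest matching revision */
};

/* Walk the list until the zeroed sentinel entry; there is no length argument. */
bool in_range_list(uint32_t model, uint32_t rev, const struct range *ranges)
{
	assert(ranges != NULL);
	for (; ranges->model != 0; ranges++) {
		if (ranges->model == model &&
		    rev >= ranges->rv_min && rev <= ranges->rv_max)
			return true;
	}
	return false;
}

/* Example table; the part numbers come from the patch, the layout is assumed. */
const struct range example_cpus[] = {
	{ 0x0b5, 0x00, 0xff },
	{ 0x0b6, 0x00, 0xff },
	{ 0 },	/* sentinel: without it the loop reads past the array */
};
```

On a CPU whose model matches none of the entries, the loop always reaches the end of the table, which would explain why the report came from a non-ThunderX server: a matching CPU returns early, a non-matching one runs off the end.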
Hi all,

On Mon, Mar 7, 2022 at 11:15 PM Will Deacon <will@kernel.org> wrote:
> On Mon, 7 Mar 2022 20:00:14 +0530, Linu Cherian wrote:
> > When a IAR register read races with a GIC interrupt RELEASE event,
> > GIC-CPU interface could wrongly return a valid INTID to the CPU
> > for an interrupt that is already released(non activated) instead of 0x3ff.
> >
> > As a side effect, an interrupt handler could run twice, once with
> > interrupt priority and then with idle priority.
> >
> > [...]
>
> Applied to arm64 (for-next/errata), thanks!
>
> [1/1] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR
>       https://git.kernel.org/arm64/c/24a147bcef8c

This workaround is now enabled on R-Car V4H:

    GIC: enabling workaround for GICv3: Cavium erratum 38539

which is not a Cavium SoC. Is this expected?

Thanks!

Gr{oetje,eeting}s,

                        Geert
On Tue, May 30, 2023 at 10:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Mar 7, 2022 at 11:15 PM Will Deacon <will@kernel.org> wrote:
> > On Mon, 7 Mar 2022 20:00:14 +0530, Linu Cherian wrote:
> > > When a IAR register read races with a GIC interrupt RELEASE event,
> > > GIC-CPU interface could wrongly return a valid INTID to the CPU
> > > for an interrupt that is already released(non activated) instead of 0x3ff.
> > >
> > > As a side effect, an interrupt handler could run twice, once with
> > > interrupt priority and then with idle priority.
> > >
> > > [...]
> >
> > Applied to arm64 (for-next/errata), thanks!
> >
> > [1/1] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR
> >       https://git.kernel.org/arm64/c/24a147bcef8c
>
> This workaround is now enabled on R-Car V4H:
>
>     GIC: enabling workaround for GICv3: Cavium erratum 38539
>
> which is not a Cavium SoC. Is this expected?
>
> Thanks!

Please ignore, wrong thread. Sorry for the fuzz.
(note to myself: do not trust Gmail search to match on all search parameters)

Gr{oetje,eeting}s,

                        Geert
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ea281dd75517..466cb9e89047 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -136,7 +136,7 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144        |
 +----------------+-----------------+-----------------+-----------------------------+
-| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
+| Cavium         | ThunderX GICv3  | #23154,38545    | CAVIUM_ERRATUM_23154        |
 +----------------+-----------------+-----------------+-----------------------------+
 | Cavium         | ThunderX GICv3  | #38539          | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 09b885cc4db5..778cc2e22c21 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -891,13 +891,17 @@ config CAVIUM_ERRATUM_23144
 	  If unsure, say Y.

 config CAVIUM_ERRATUM_23154
-	bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
+	bool "Cavium errata 23154 and 38545: GICv3 lacks HW synchronisation"
 	default y
 	help
-	  The gicv3 of ThunderX requires a modified version for
+	  The ThunderX GICv3 implementation requires a modified version for
 	  reading the IAR status to ensure data synchronization
 	  (access to icc_iar1_el1 is not sync'ed before and after).

+	  It also suffers from erratum 38545 (also present on Marvell's
+	  OcteonTX and OcteonTX2), resulting in deactivated interrupts being
+	  spuriously presented to the CPU interface.
+
 	  If unsure, say Y.

 config CAVIUM_ERRATUM_27456
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 4ad22c3135db..8bd5afc7b692 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -53,17 +53,36 @@ static inline u64 gic_read_iar_common(void)
  * The gicv3 of ThunderX requires a modified version for reading the
  * IAR status to ensure data synchronization (access to icc_iar1_el1
  * is not sync'ed before and after).
+ *
+ * Erratum 38545
+ *
+ * When a IAR register read races with a GIC interrupt RELEASE event,
+ * GIC-CPU interface could wrongly return a valid INTID to the CPU
+ * for an interrupt that is already released(non activated) instead of 0x3ff.
+ *
+ * To workaround this, return a valid interrupt ID only if there is a change
+ * in the active priority list after the IAR read.
+ *
+ * Common function used for both the workarounds since,
+ * 1. On Thunderx 88xx 1.x both erratas are applicable.
+ * 2. Having extra nops doesn't add any side effects for Silicons where
+ *    erratum 23154 is not applicable.
  */
 static inline u64 gic_read_iar_cavium_thunderx(void)
 {
-	u64 irqstat;
+	u64 irqstat, apr;

+	apr = read_sysreg_s(SYS_ICC_AP1R0_EL1);
 	nops(8);
 	irqstat = read_sysreg_s(SYS_ICC_IAR1_EL1);
 	nops(4);
 	mb();

-	return irqstat;
+	/* Max priority groups implemented is only 32 */
+	if (likely(apr != read_sysreg_s(SYS_ICC_AP1R0_EL1)))
+		return irqstat;
+
+	return 0x3ff;
 }

 static inline void gic_write_ctlr(u32 val)
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 999b9149f856..4596e7ca29a3 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -84,6 +84,13 @@
 #define CAVIUM_CPU_PART_THUNDERX_81XX	0x0A2
 #define CAVIUM_CPU_PART_THUNDERX_83XX	0x0A3
 #define CAVIUM_CPU_PART_THUNDERX2	0x0AF
+/* OcteonTx2 series */
+#define CAVIUM_CPU_PART_OCTX2_98XX	0x0B1
+#define CAVIUM_CPU_PART_OCTX2_96XX	0x0B2
+#define CAVIUM_CPU_PART_OCTX2_95XX	0x0B3
+#define CAVIUM_CPU_PART_OCTX2_95XXN	0x0B4
+#define CAVIUM_CPU_PART_OCTX2_95XXMM	0x0B5
+#define CAVIUM_CPU_PART_OCTX2_95XXO	0x0B6

 #define BRCM_CPU_PART_BRAHMA_B53	0x100
 #define BRCM_CPU_PART_VULCAN		0x516
@@ -124,6 +131,12 @@
 #define MIDR_THUNDERX	MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
+#define MIDR_OCTX2_98XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_98XX)
+#define MIDR_OCTX2_96XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_96XX)
+#define MIDR_OCTX2_95XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XX)
+#define MIDR_OCTX2_95XXN MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXN)
+#define MIDR_OCTX2_95XXMM MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXMM)
+#define MIDR_OCTX2_95XXO MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXO)
 #define MIDR_CAVIUM_THUNDERX2 MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX2)
 #define MIDR_BRAHMA_B53 MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_BRAHMA_B53)
 #define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_VULCAN)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index b217941713a8..510f47055b91 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -214,6 +214,20 @@ static const struct arm64_cpu_capabilities arm64_repeat_tlbi_list[] = {
 };
 #endif

+#ifdef CONFIG_CAVIUM_ERRATUM_23154
+const struct midr_range cavium_erratum_23154_cpus[] = {
+	MIDR_ALL_VERSIONS(MIDR_THUNDERX),
+	MIDR_ALL_VERSIONS(MIDR_THUNDERX_81XX),
+	MIDR_ALL_VERSIONS(MIDR_THUNDERX_83XX),
+	MIDR_ALL_VERSIONS(MIDR_OCTX2_98XX),
+	MIDR_ALL_VERSIONS(MIDR_OCTX2_96XX),
+	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XX),
+	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXN),
+	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXMM),
+	MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXO),
+};
+#endif
+
 #ifdef CONFIG_CAVIUM_ERRATUM_27456
 const struct midr_range cavium_erratum_27456_cpus[] = {
 	/* Cavium ThunderX, T88 pass 1.x - 2.1 */
@@ -425,10 +439,10 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 #endif
 #ifdef CONFIG_CAVIUM_ERRATUM_23154
 	{
-	/* Cavium ThunderX, pass 1.x */
-		.desc = "Cavium erratum 23154",
+		.desc = "Cavium errata 23154 and 38545",
 		.capability = ARM64_WORKAROUND_CAVIUM_23154,
-		ERRATA_MIDR_REV_RANGE(MIDR_THUNDERX, 0, 0, 1),
+		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+		ERRATA_MIDR_RANGE_LIST(cavium_erratum_23154_cpus),
 	},
 #endif
 #ifdef CONFIG_CAVIUM_ERRATUM_27456
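For readers unfamiliar with the MIDR macros used in the cputype.h hunk, here is a rough user-space sketch of how a part number such as 0x0B1 could end up in a MIDR model value. It assumes the architectural MIDR_EL1 layout (implementer in bits [31:24], variant [23:20], architecture [19:16], part number [15:4], revision [3:0]), an architecture field of 0xf, and 0x43 as the Cavium implementer code. The function names are hypothetical and are not the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in MIDR_EL1 (assumed from the architectural layout). */
#define IMP_SHIFT   24	/* implementer, 8 bits */
#define ARCH_SHIFT  16	/* architecture, 4 bits */
#define PART_SHIFT  4	/* part number, 12 bits */

/* Build a model value: implementer + architecture 0xf + part number. */
uint32_t midr_cpu_model(uint32_t imp, uint32_t partnum)
{
	assert(imp <= 0xff && partnum <= 0xfff);
	return (imp << IMP_SHIFT) | (0xfu << ARCH_SHIFT) | (partnum << PART_SHIFT);
}

/* Strip variant and revision so any stepping compares equal to its model. */
uint32_t midr_model_of(uint32_t midr)
{
	return midr & 0xff0ffff0u;
}
```

Under these assumptions, matching "all versions" of a part amounts to comparing `midr_model_of()` of the running CPU against the model value and accepting any variant/revision, which is why V3 of the patch can list each OcteonTX2 part number individually instead of using a masked part number.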
When a IAR register read races with a GIC interrupt RELEASE event,
GIC-CPU interface could wrongly return a valid INTID to the CPU
for an interrupt that is already released(non activated) instead of 0x3ff.

As a side effect, an interrupt handler could run twice, once with
interrupt priority and then with idle priority.

As a workaround, gic_read_iar is updated so that it will return a
valid interrupt ID only if there is a change in the active priority list
after the IAR read on all the affected Silicons.

Since there are silicon variants where both 23154 and 38545 are applicable,
workaround for erratum 23154 has been extended to address both of them.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
---
Changes since V2:
- Changed masked part number to individual part numbers
- Added additional comment to clarify on priority groups

Changes since V1:
- IIDR based quirk management done for 23154 has been reverted
- Extended existing 23154 errata to address 38545 as well,
  so that existing static keys are reused.
- Added MIDR based support macros to cover all the affected parts
- Changed the unlikely construct to likely construct in the workaround
  function.

 Documentation/arm64/silicon-errata.rst |  2 +-
 arch/arm64/Kconfig                     |  8 ++++++--
 arch/arm64/include/asm/arch_gicv3.h    | 23 +++++++++++++++++++++--
 arch/arm64/include/asm/cputype.h       | 13 +++++++++++++
 arch/arm64/kernel/cpu_errata.c         | 20 +++++++++++++++++---
 5 files changed, 58 insertions(+), 8 deletions(-)
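The workaround described above boils down to "snapshot the active priorities, read IAR, and trust the INTID only if the active priorities changed". The following user-space model illustrates that decision logic only. `struct fake_gic` and `fake_read_iar` are hypothetical stand-ins for the system registers, and the model omits the nops and the barrier the real function keeps for erratum 23154.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPURIOUS_INTID 0x3ffULL

/* Hypothetical model of the GIC CPU interface state; not a kernel structure. */
struct fake_gic {
	uint64_t ap1r0;         /* models ICC_AP1R0_EL1 (active priorities) */
	uint64_t iar;           /* INTID the next IAR read will return */
	uint64_t ack_prio_bit;  /* bit set in ap1r0 by a genuine acknowledge, 0 if racy */
};

/* Models an ICC_IAR1_EL1 read: a genuine acknowledge activates a priority. */
static uint64_t fake_read_iar(struct fake_gic *gic)
{
	gic->ap1r0 |= gic->ack_prio_bit;
	return gic->iar;
}

/* Return the INTID only if the IAR read changed the active priorities. */
uint64_t read_iar_with_workaround(struct fake_gic *gic)
{
	uint64_t apr, irqstat;

	assert(gic != NULL);
	apr = gic->ap1r0;              /* snapshot before the acknowledge */
	irqstat = fake_read_iar(gic);
	if (apr != gic->ap1r0)         /* something was really activated */
		return irqstat;
	return SPURIOUS_INTID;         /* treat the read as spurious */
}
```

In this model a racy read is one where `ack_prio_bit` is 0: the interface hands back a stale INTID without activating anything, so the comparison fails and the caller sees 0x3ff instead of running the handler a second time.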