Message ID | 20240426135126.12802-12-Jonathan.Cameron@huawei.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Series | ACPI/arm64: add support for virtual cpu hotplug |
On Fri, 26 Apr 2024 14:51:21 +0100,
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> To support virtual CPU hotplug, ACPI has added an 'online capable' bit
> to the MADT GICC entries. This indicates a disabled CPU entry may not
> be possible to online via PSCI until firmware has set enabled bit in
> _STA.
>
> This means that a "usable" GIC is one that is marked as either enabled,

nit: "GIC" usually designates the whole HW infrastructure (distributor,
redistributors, and ITSs). My understanding is that you are only
referring to the redistributors.

> or online capable. Therefore, change acpi_gicc_is_usable() to check both
> bits. However, we need to change the test in gic_acpi_match_gicc() back
> to testing just the enabled bit so the count of enabled distributors is
> correct.
>
> What about the redistributor in the GICC entry? ACPI doesn't want to say.
> Assume the worst: When a redistributor is described in the GICC entry,
> but the entry is marked as disabled at boot, assume the redistributor
> is inaccessible.
>
> The GICv3 driver doesn't support late online of redistributors, so this
> means the corresponding CPU can't be brought online either.
> Rather than modifying cpu masks that may already have been used,
> register a new cpuhp callback to fail this case. This must run earlier
> than the main gic_starting_cpu() so that this case can be rejected
> before the section of cpuhp that runs on the CPU that is coming up as
> that is not allowed to fail. This solution keeps the handling of this
> broken firmware corner case local to the GIC driver. As precise ordering
> of this callback doesn't need to be controlled as long as it is
> in that initial prepare phase, use CPUHP_BP_PREPARE_DYN.
>
> Systems that want CPU hotplug in a VM can ensure their redistributors
> are always-on, and describe them that way with a GICR entry in the MADT.
>
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> ---
> Thanks to Marc for review and suggestions!
> v8: Change the handling of broken rdists to fail cpuhp rather than
>     modifying the cpu_present and cpu_possible masks.
>     Updated commit text to reflect that.
>     Added a sb tag for Marc given this is more or less what he put
>     in his review comment.
> ---
>  drivers/irqchip/irq-gic-v3.c | 38 ++++++++++++++++++++++++++++++++++--
>  include/linux/acpi.h         |  3 ++-
>  2 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 10af15f93d4d..b4685991953e 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -44,6 +44,8 @@
>
>  #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
>
> +static struct cpumask broken_rdists __read_mostly;
> +
>  struct redist_region {
>  	void __iomem		*redist_base;
>  	phys_addr_t		phys_base;
> @@ -1293,6 +1295,18 @@ static void gic_cpu_init(void)
>  #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
>  #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
>
> +/*
> + * gic_starting_cpu() is called after the last point where cpuhp is allowed
> + * to fail. So pre check for problems earlier.
> + */
> +static int gic_check_rdist(unsigned int cpu)
> +{
> +	if (cpumask_test_cpu(cpu, &broken_rdists))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int gic_starting_cpu(unsigned int cpu)
>  {
>  	gic_cpu_init();
> @@ -1384,6 +1398,10 @@ static void __init gic_smp_init(void)
>  	};
>  	int base_sgi;
>
> +	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
> +				  "irqchip/arm/gicv3:checkrdist",
> +				  gic_check_rdist, NULL);
> +
>  	cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
>  				  "irqchip/arm/gicv3:starting",
>  				  gic_starting_cpu, NULL);
> @@ -2363,11 +2381,24 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
>  				(struct acpi_madt_generic_interrupt *)header;
>  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
>  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> +	int cpu = get_cpu_for_acpi_id(gicc->uid);
>  	void __iomem *redist_base;
>
>  	if (!acpi_gicc_is_usable(gicc))
>  		return 0;
>
> +	/*
> +	 * Capable but disabled CPUs can be brought online later. What about
> +	 * the redistributor? ACPI doesn't want to say!
> +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> +	 * Otherwise, prevent such CPUs from being brought online.
> +	 */
> +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {

Now this makes the above acpi_gicc_is_usable() very odd. It checks for
MADT_ENABLED *or* GICC_ONLINE_CAPABLE. But we definitely don't want to
deal with the lack of MADT_ENABLED.

So why don't we explicitly check for individual flags and get rid of
acpi_gicc_is_usable(), as its new definition doesn't tell you anything
useful?

> +		pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> +		cpumask_set_cpu(cpu, &broken_rdists);

Given that get_cpu_for_acpi_id() can return -EINVAL, you'd want to
check that. Also, I'd like to drop the _once on the warning. Indicating
all the broken CPUs is useful information, and only happens once per
boot.

> +		return 0;
> +	}
> +
>  	redist_base = ioremap(gicc->gicr_base_address, size);
>  	if (!redist_base)
>  		return -ENOMEM;
> @@ -2413,9 +2444,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
>
>  	/*
>  	 * If GICC is enabled and has valid gicr base address, then it means
> -	 * GICR base is presented via GICC
> +	 * GICR base is presented via GICC. The redistributor is only known to
> +	 * be accessible if the GICC is marked as enabled. If this bit is not
> +	 * set, we'd need to add the redistributor at runtime, which isn't
> +	 * supported.
>  	 */
> -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
> +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
>  		acpi_data.enabled_rdists++;
>
>  	return 0;
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 9844a3f9c4e5..fcfb7bb6789e 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
>
>  static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
>  {
> -	return gicc->flags & ACPI_MADT_ENABLED;
> +	return gicc->flags & (ACPI_MADT_ENABLED |
> +			      ACPI_MADT_GICC_ONLINE_CAPABLE);
>  }
>
>  /* the following numa functions are architecture-dependent */

Thanks,

	M.
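For illustration, the explicit per-flag checks suggested in the review above can be modelled in plain userspace C. This is only a sketch, not kernel code: the two flag macros are stand-ins defined locally with assumed bit positions (the kernel takes them from its ACPI headers), and `enum gicc_state` and `classify_gicc()` are hypothetical names that do not exist in the driver.

```c
#include <stdint.h>

/* Stand-in definitions for this sketch; bit positions are assumed here. */
#define ACPI_MADT_ENABLED		(1u << 0)
#define ACPI_MADT_GICC_ONLINE_CAPABLE	(1u << 3)

/* Hypothetical three-way view of a GICC entry that describes a redistributor. */
enum gicc_state {
	GICC_ABSENT,		/* neither bit set: the entry is skipped */
	GICC_RDIST_USABLE,	/* enabled at boot: redistributor can be mapped */
	GICC_RDIST_BROKEN,	/* online capable only: CPU must not be onlined */
};

static enum gicc_state classify_gicc(uint32_t flags)
{
	/* Neither enabled nor online capable: treat the CPU as non-existent. */
	if (!(flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
		return GICC_ABSENT;

	/* Online capable but not enabled: redistributor assumed inaccessible. */
	if (!(flags & ACPI_MADT_ENABLED))
		return GICC_RDIST_BROKEN;

	return GICC_RDIST_USABLE;
}
```

Written this way, each call site states which of the two bits it cares about, which is the point of dropping a helper whose name no longer says what it tests.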
> > @@ -2363,11 +2381,24 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> >  				(struct acpi_madt_generic_interrupt *)header;
> >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > +	int cpu = get_cpu_for_acpi_id(gicc->uid);
> >  	void __iomem *redist_base;
> >
> >  	if (!acpi_gicc_is_usable(gicc))
> >  		return 0;
> >
> > +	/*
> > +	 * Capable but disabled CPUs can be brought online later. What about
> > +	 * the redistributor? ACPI doesn't want to say!
> > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > +	 * Otherwise, prevent such CPUs from being brought online.
> > +	 */
> > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
>
> Now this makes the above acpi_gicc_is_usable() very odd. It checks for
> MADT_ENABLED *or* GICC_ONLINE_CAPABLE. But we definitely don't want to
> deal with the lack of MADT_ENABLED.
>
> So why don't we explicitly check for individual flags and get rid of
> acpi_gicc_is_usable(), as its new definition doesn't tell you anything
> useful?

That does seem to have evolved to something rather odd.
I messed around with various reorganizations of the boolean logic
and ended up with the same 2 conditions as here, as otherwise
the indent gets deep and the code becomes fiddlier to reason about
(see below for result).

>
> > +		return 0;
> > +	}
> > +
> >  	redist_base = ioremap(gicc->gicr_base_address, size);
> >  	if (!redist_base)
> >  		return -ENOMEM;
> > @@ -2413,9 +2444,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
> >
> >  	/*
> >  	 * If GICC is enabled and has valid gicr base address, then it means
> > -	 * GICR base is presented via GICC
> > +	 * GICR base is presented via GICC. The redistributor is only known to
> > +	 * be accessible if the GICC is marked as enabled. If this bit is not
> > +	 * set, we'd need to add the redistributor at runtime, which isn't
> > +	 * supported.
> >  	 */
> > -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
> > +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
> >  		acpi_data.enabled_rdists++;
> >
> >  	return 0;
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> > index 9844a3f9c4e5..fcfb7bb6789e 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
> >
> >  static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> >  {
> > -	return gicc->flags & ACPI_MADT_ENABLED;
> > +	return gicc->flags & (ACPI_MADT_ENABLED |
> > +			      ACPI_MADT_GICC_ONLINE_CAPABLE);
> >  }
> >
> >  /* the following numa functions are architecture-dependent */
>
> Thanks,

I'll not send a formal v9 until early next week, so here is the current state
if you have time to take another look before then.

From a8a54cfbadccf1782b7cc04b93eb875dedbee7a9 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Thu, 18 Apr 2024 14:54:07 +0100
Subject: [PATCH] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs

To support virtual CPU hotplug, ACPI has added an 'online capable' bit
to the MADT GICC entries. This indicates a disabled CPU entry may not
be possible to online via PSCI until firmware has set enabled bit in
_STA.

This means that a "usable" GIC redistributor is one that is marked as
either enabled, or online capable. The meaning of the
acpi_gicc_is_usable() would become less clear than just checking the
pair of flags at call sites. As such, drop that helper function.
The test in gic_acpi_match_gicc() remains as testing just the
enabled bit so the count of enabled distributors is correct.

What about the redistributor in the GICC entry? ACPI doesn't want to say.
Assume the worst: When a redistributor is described in the GICC entry,
but the entry is marked as disabled at boot, assume the redistributor
is inaccessible.

The GICv3 driver doesn't support late online of redistributors, so this
means the corresponding CPU can't be brought online either.
Rather than modifying cpu masks that may already have been used,
register a new cpuhp callback to fail this case. This must run earlier
than the main gic_starting_cpu() so that this case can be rejected
before the section of cpuhp that runs on the CPU that is coming up as
that is not allowed to fail. This solution keeps the handling of this
broken firmware corner case local to the GIC driver. As precise ordering
of this callback doesn't need to be controlled as long as it is
in that initial prepare phase, use CPUHP_BP_PREPARE_DYN.

Systems that want CPU hotplug in a VM can ensure their redistributors
are always-on, and describe them that way with a GICR entry in the MADT.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v9: Thanks to Marc for quick follow up.
    Fix up description and drop the acpi_gicc_is_usable() check
    given that now doesn't actually mean they are usable.

Thanks to Marc for review and suggestions!
v8: Change the handling of broken rdists to fail cpuhp rather than
    modifying the cpu_present and cpu_possible masks.
    Updated commit text to reflect that.
    Added a sb tag for Marc given this is more or less what he put
    in his review comment.
---
 arch/arm64/kernel/smp.c       |  3 ++-
 drivers/acpi/processor_core.c |  3 ++-
 drivers/irqchip/irq-gic-v3.c  | 44 +++++++++++++++++++++++++++++++----
 include/linux/acpi.h          |  5 ----
 4 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4ced34f62dab..afe835c1cbe2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -523,7 +523,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
 {
 	u64 hwid = processor->arm_mpidr;
 
-	if (!acpi_gicc_is_usable(processor)) {
+	if (!(processor->flags &
+	      (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE))) {
 		pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid);
 		return;
 	}
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index b203cfe28550..b04b684f3190 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -90,7 +90,8 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 	struct acpi_madt_generic_interrupt *gicc =
 	    container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-	if (!acpi_gicc_is_usable(gicc))
+	if (!(gicc->flags &
+	      (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
 		return -ENODEV;
 
 	/* device_declaration means Device object in DSDT, in the
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 10af15f93d4d..45272316d155 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -44,6 +44,8 @@
 
 #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
 
+static struct cpumask broken_rdists __read_mostly;
+
 struct redist_region {
 	void __iomem		*redist_base;
 	phys_addr_t		phys_base;
@@ -1293,6 +1295,18 @@ static void gic_cpu_init(void)
 #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
 #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
 
+/*
+ * gic_starting_cpu() is called after the last point where cpuhp is allowed
+ * to fail. So pre check for problems earlier.
+ */
+static int gic_check_rdist(unsigned int cpu)
+{
+	if (cpumask_test_cpu(cpu, &broken_rdists))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int gic_starting_cpu(unsigned int cpu)
 {
 	gic_cpu_init();
@@ -1384,6 +1398,10 @@ static void __init gic_smp_init(void)
 	};
 	int base_sgi;
 
+	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+				  "irqchip/arm/gicv3:checkrdist",
+				  gic_check_rdist, NULL);
+
 	cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
 				  "irqchip/arm/gicv3:starting",
 				  gic_starting_cpu, NULL);
@@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
 				(struct acpi_madt_generic_interrupt *)header;
 	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
 	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+	int cpu = get_cpu_for_acpi_id(gicc->uid);
 	void __iomem *redist_base;
 
-	if (!acpi_gicc_is_usable(gicc))
+	/* Neither enabled or online capable means it doesn't exist, skip it */
+	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
 		return 0;
 
+	/*
+	 * Capable but disabled CPUs can be brought online later. What about
+	 * the redistributor? ACPI doesn't want to say!
+	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
+	 * Otherwise, prevent such CPUs from being brought online.
+	 */
+	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
+		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
+		cpumask_set_cpu(cpu, &broken_rdists);
+		return 0;
+	}
+
 	redist_base = ioremap(gicc->gicr_base_address, size);
 	if (!redist_base)
 		return -ENOMEM;
@@ -2413,9 +2445,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 
 	/*
 	 * If GICC is enabled and has valid gicr base address, then it means
-	 * GICR base is presented via GICC
+	 * GICR base is presented via GICC. The redistributor is only known to
+	 * be accessible if the GICC is marked as enabled. If this bit is not
+	 * set, we'd need to add the redistributor at runtime, which isn't
+	 * supported.
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
+	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
 
 	return 0;
@@ -2474,7 +2509,8 @@ static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *hea
 	int maint_irq_mode;
 	static int first_madt = true;
 
-	if (!acpi_gicc_is_usable(gicc))
+	if (!(gicc->flags &
+	      (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
 		return 0;
 
 	maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ?
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9844a3f9c4e5..cf5d2a6950ec 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -237,11 +237,6 @@ acpi_table_parse_cedt(enum acpi_cedt_type id,
 int acpi_parse_mcfg (struct acpi_table_header *header);
 void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
-static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
-{
-	return gicc->flags & ACPI_MADT_ENABLED;
-}
-
 /* the following numa functions are architecture-dependent */
 void acpi_numa_slit_init (struct acpi_table_slit *slit);
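The ordering argument in the commit message (reject the CPU in the prepare phase, because the later starting step is not allowed to fail) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's cpuhp machinery: `MAX_CPUS`, the boolean arrays, and `model_bring_up()` are all hypothetical, and the real state machine has many more steps.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_CPUS 8	/* hypothetical limit for this model */

static bool broken_rdists[MAX_CPUS];	/* stands in for the cpumask */
static bool cpu_online[MAX_CPUS];

/* Prepare-phase check: runs on the control CPU and is allowed to fail. */
static int model_check_rdist(unsigned int cpu)
{
	return broken_rdists[cpu] ? -EINVAL : 0;
}

/* Starting-phase step: runs on the incoming CPU and has no way to fail. */
static void model_starting_cpu(unsigned int cpu)
{
	cpu_online[cpu] = true;
}

/* Bring-up aborts before the starting step if any prepare check fails. */
static int model_bring_up(unsigned int cpu)
{
	int ret;

	if (cpu >= MAX_CPUS)
		return -EINVAL;

	ret = model_check_rdist(cpu);
	if (ret)
		return ret;

	model_starting_cpu(cpu);
	return 0;
}
```

In this model a CPU flagged in `broken_rdists` never reaches `model_starting_cpu()`, so nothing that cannot be undone has happened by the time the error is returned.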
On Fri, 26 Apr 2024 19:28:58 +0100,
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
>
>
> I'll not send a formal v9 until early next week, so here is the current state
> if you have time to take another look before then.

Don't bother resending this on my account -- you only sent it on
Friday and there hasn't been much response to it yet. There is still a
problem (see below), but looks otherwise OK.

[...]

> @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
>  				(struct acpi_madt_generic_interrupt *)header;
>  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
>  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> +	int cpu = get_cpu_for_acpi_id(gicc->uid);

I already commented that get_cpu_for_acpi_id() can...

>  	void __iomem *redist_base;
>
> -	if (!acpi_gicc_is_usable(gicc))
> +	/* Neither enabled or online capable means it doesn't exist, skip it */
> +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
>  		return 0;
>
> +	/*
> +	 * Capable but disabled CPUs can be brought online later. What about
> +	 * the redistributor? ACPI doesn't want to say!
> +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> +	 * Otherwise, prevent such CPUs from being brought online.
> +	 */
> +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> +		cpumask_set_cpu(cpu, &broken_rdists);

... return -EINVAL, and then be passed to cpumask_set_cpu(), with
interesting effects. It shouldn't happen, but I trust anything that
comes from firmware tables as much as I trust a campaigning
politician's promises. This should really result in the RD being
considered unusable, but without affecting any CPU (there is no valid
CPU in the first place).

Another question is what get_cpu_for_acpi_id() returns for a disabled
CPU. A valid CPU number? Or -EINVAL?

Thanks,

	M.
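The concern raised above, a negative error code being used as a bit index, can be shown with a minimal userspace sketch. Everything here is an assumption made for illustration: `NR_CPUS_MODEL`, `broken_mask`, and `mark_rdist_broken()` are hypothetical stand-ins, and the kernel's cpumask helpers are not used.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for this model */

static unsigned long broken_mask;	/* stands in for the broken_rdists cpumask */

/*
 * Record a CPU as having an inaccessible redistributor, but only if the
 * lookup produced a real CPU number. A negative errno such as -EINVAL is
 * rejected instead of being turned into a wild bit index.
 */
static bool mark_rdist_broken(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS_MODEL)
		return false;

	broken_mask |= 1UL << cpu;
	return true;
}
```

With the guard in place, a failed lookup leaves the mask untouched, which matches the review comment that no CPU should be affected when there is no valid CPU to begin with.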
On Sun, 28 Apr 2024 12:28:03 +0100
Marc Zyngier <maz@kernel.org> wrote:

> On Fri, 26 Apr 2024 19:28:58 +0100,
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >
> >
> > I'll not send a formal v9 until early next week, so here is the current state
> > if you have time to take another look before then.
>
> Don't bother resending this on my account -- you only sent it on
> Friday and there hasn't been much response to it yet. There is still a
> problem (see below), but looks otherwise OK.
>
> [...]
>
> > @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> >  				(struct acpi_madt_generic_interrupt *)header;
> >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > +	int cpu = get_cpu_for_acpi_id(gicc->uid);
>
> I already commented that get_cpu_for_acpi_id() can...

Indeed sorry - I blame Friday syndrome for me failing to address that.

>
> >  	void __iomem *redist_base;
> >
> > -	if (!acpi_gicc_is_usable(gicc))
> > +	/* Neither enabled or online capable means it doesn't exist, skip it */
> > +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> >  		return 0;
> >
> > +	/*
> > +	 * Capable but disabled CPUs can be brought online later. What about
> > +	 * the redistributor? ACPI doesn't want to say!
> > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > +	 * Otherwise, prevent such CPUs from being brought online.
> > +	 */
> > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> > +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> > +		cpumask_set_cpu(cpu, &broken_rdists);
>
> ... return -EINVAL, and then be passed to cpumask_set_cpu(), with
> interesting effects. It shouldn't happen, but I trust anything that
> comes from firmware tables as much as I trust a campaigning
> politician's promises. This should really result in the RD being
> considered unusable, but without affecting any CPU (there is no valid
> CPU in the first place).
>
> Another question is what get_cpu_for_acpi_id() returns for a disabled
> CPU. A valid CPU number? Or -EINVAL?

It's a match function that works by iterating over 0 to nr_cpu_ids and
testing

if (uid == get_acpi_id_for_cpu(cpu))

So the question becomes whether get_acpi_id_for_cpu() returns a valid
ID for a disabled CPU.

That uses acpi_cpu_get_madt_gicc(cpu)->uid so this all gets a bit circular.
That looks it up via cpu_madt_gicc[cpu] which, after the proposed updated
patch, is set if enabled or online capable. There are however a few other
error checks in acpi_map_gic_cpu_interface() that could lead to it
not being set (MPIDR validity checks). I suspect all of these end up being
fatal elsewhere, which is why this hasn't blown up before.

If any of those cases are possible we could get a null pointer
dereference.

Easy to harden this case via the following (which will leave us with
-EINVAL). There are other call sites that might trip over this.
I'm inclined to harden them as a separate issue though so as not
to get in the way of this patch set.


diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index bc9a6656fc0c..a407f9cd549e 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -124,7 +124,8 @@ static inline int get_cpu_for_acpi_id(u32 uid)
 	int cpu;
 
 	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-		if (uid == get_acpi_id_for_cpu(cpu))
+		if (acpi_cpu_get_madt_gicc(cpu) &&
+		    uid == get_acpi_id_for_cpu(cpu))
 			return cpu;
 
 	return -EINVAL;

I'll spin an additional patch to make that change after testing I haven't
messed it up.

At the call site in gic_acpi_parse_madt_gicc() I'm not sure we can do better
than just skipping setting broken_rdists. I'll also pull the declaration of
that cpu variable down into this condition so it's more obvious we only
care about it in this error path.

Jonathan

>
> Thanks,
>
> 	M.
>
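The hardening proposed above can be sketched outside the kernel as a lookup over a per-CPU table in which some slots may never have been populated. This is an illustrative model only: `struct gicc_entry`, `cpu_gicc[]`, `NR_CPUS_MODEL`, and `cpu_for_uid()` are hypothetical stand-ins for the per-CPU MADT GICC pointers and get_cpu_for_acpi_id() discussed in the thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for this model */

/* Minimal stand-in for a MADT GICC entry; only the UID matters here. */
struct gicc_entry {
	uint32_t uid;
};

/* Per-CPU table; a NULL slot models a CPU whose entry was never recorded. */
static const struct gicc_entry *cpu_gicc[NR_CPUS_MODEL];

/*
 * Map a UID back to a logical CPU number. The NULL test runs first, so an
 * unpopulated slot is skipped instead of being dereferenced; a UID with no
 * match yields -EINVAL.
 */
static int cpu_for_uid(uint32_t uid)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (cpu_gicc[cpu] && cpu_gicc[cpu]->uid == uid)
			return cpu;

	return -EINVAL;
}
```

Callers still have to handle the -EINVAL result themselves, which is the separate call-site check requested earlier in the review.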
On Mon, 29 Apr 2024 10:21:31 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Sun, 28 Apr 2024 12:28:03 +0100
> Marc Zyngier <maz@kernel.org> wrote:
>
> > On Fri, 26 Apr 2024 19:28:58 +0100,
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> > >
> > >
> > > I'll not send a formal v9 until early next week, so here is the current state
> > > if you have time to take another look before then.
> >
> > Don't bother resending this on my account -- you only sent it on
> > Friday and there hasn't been much response to it yet. There is still a
> > problem (see below), but looks otherwise OK.
> >
> > [...]
> >
> > > @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> > >  				(struct acpi_madt_generic_interrupt *)header;
> > >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> > >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > > +	int cpu = get_cpu_for_acpi_id(gicc->uid);
> >
> > I already commented that get_cpu_for_acpi_id() can...
>
> Indeed sorry - I blame Friday syndrome for me failing to address that.
>
> >
> > >  	void __iomem *redist_base;
> > >
> > > -	if (!acpi_gicc_is_usable(gicc))
> > > +	/* Neither enabled or online capable means it doesn't exist, skip it */
> > > +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> > >  		return 0;
> > >
> > > +	/*
> > > +	 * Capable but disabled CPUs can be brought online later. What about
> > > +	 * the redistributor? ACPI doesn't want to say!
> > > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > > +	 * Otherwise, prevent such CPUs from being brought online.
> > > +	 */
> > > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> > > +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> > > +		cpumask_set_cpu(cpu, &broken_rdists);
> >
> > ... return -EINVAL, and then be passed to cpumask_set_cpu(), with
> > interesting effects. It shouldn't happen, but I trust anything that
> > comes from firmware tables as much as I trust a campaigning
> > politician's promises. This should really result in the RD being
> > considered unusable, but without affecting any CPU (there is no valid
> > CPU in the first place).
> >
> > Another question is what get_cpu_for_acpi_id() returns for a disabled
> > CPU. A valid CPU number? Or -EINVAL?
> It's a match function that works by iterating over 0 to nr_cpu_ids and
> testing
>
> if (uid == get_acpi_id_for_cpu(cpu))
>
> So the question becomes whether get_acpi_id_for_cpu() returns a valid
> ID for a disabled CPU.
>
> That uses acpi_cpu_get_madt_gicc(cpu)->uid so this all gets a bit circular.
> That looks it up via cpu_madt_gicc[cpu] which, after the proposed updated
> patch, is set if enabled or online capable. There are however a few other
> error checks in acpi_map_gic_cpu_interface() that could lead to it
> not being set (MPIDR validity checks). I suspect all of these end up being
> fatal elsewhere, which is why this hasn't blown up before.
>
> If any of those cases are possible we could get a null pointer
> dereference.
>
> Easy to harden this case via the following (which will leave us with
> -EINVAL). There are other call sites that might trip over this.
> I'm inclined to harden them as a separate issue though so as not
> to get in the way of this patch set.
>
>
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index bc9a6656fc0c..a407f9cd549e 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -124,7 +124,8 @@ static inline int get_cpu_for_acpi_id(u32 uid)
>  	int cpu;
>
>  	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> -		if (uid == get_acpi_id_for_cpu(cpu))
> +		if (acpi_cpu_get_madt_gicc(cpu) &&
> +		    uid == get_acpi_id_for_cpu(cpu))
>  			return cpu;
>
>  	return -EINVAL;
>
> I'll spin an additional patch to make that change after testing I haven't
> messed it up.
>
> At the call site in gic_acpi_parse_madt_gicc() I'm not sure we can do better
> than just skipping setting broken_rdists. I'll also pull the declaration of
> that cpu variable down into this condition so it's more obvious we only
> care about it in this error path.

Just for the record, for my deliberately broken test case it seems that
get_cpu_for_acpi_id() returns a valid CPU ID anyway. That's what I'd
expect, given acpi_parse_and_init_cpus() doesn't check whether the GICC
entries are enabled or not.

Jonathan

>
> Jonathan
>
> >
> > Thanks,
> >
> > 	M.
> >
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 10af15f93d4d..b4685991953e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -44,6 +44,8 @@
 
 #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
 
+static struct cpumask broken_rdists __read_mostly;
+
 struct redist_region {
 	void __iomem		*redist_base;
 	phys_addr_t		phys_base;
@@ -1293,6 +1295,18 @@ static void gic_cpu_init(void)
 #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
 #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
 
+/*
+ * gic_starting_cpu() is called after the last point where cpuhp is allowed
+ * to fail. So pre check for problems earlier.
+ */
+static int gic_check_rdist(unsigned int cpu)
+{
+	if (cpumask_test_cpu(cpu, &broken_rdists))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int gic_starting_cpu(unsigned int cpu)
 {
 	gic_cpu_init();
@@ -1384,6 +1398,10 @@ static void __init gic_smp_init(void)
 	};
 	int base_sgi;
 
+	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+				  "irqchip/arm/gicv3:checkrdist",
+				  gic_check_rdist, NULL);
+
 	cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
 				  "irqchip/arm/gicv3:starting",
 				  gic_starting_cpu, NULL);
@@ -2363,11 +2381,24 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
 				(struct acpi_madt_generic_interrupt *)header;
 	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
 	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+	int cpu = get_cpu_for_acpi_id(gicc->uid);
 	void __iomem *redist_base;
 
 	if (!acpi_gicc_is_usable(gicc))
 		return 0;
 
+	/*
+	 * Capable but disabled CPUs can be brought online later. What about
+	 * the redistributor? ACPI doesn't want to say!
+	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
+	 * Otherwise, prevent such CPUs from being brought online.
+	 */
+	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
+		pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
+		cpumask_set_cpu(cpu, &broken_rdists);
+		return 0;
+	}
+
 	redist_base = ioremap(gicc->gicr_base_address, size);
 	if (!redist_base)
 		return -ENOMEM;
@@ -2413,9 +2444,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 
 	/*
 	 * If GICC is enabled and has valid gicr base address, then it means
-	 * GICR base is presented via GICC
+	 * GICR base is presented via GICC. The redistributor is only known to
+	 * be accessible if the GICC is marked as enabled. If this bit is not
+	 * set, we'd need to add the redistributor at runtime, which isn't
+	 * supported.
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
+	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
 
 	return 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9844a3f9c4e5..fcfb7bb6789e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
 static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
 {
-	return gicc->flags & ACPI_MADT_ENABLED;
+	return gicc->flags & (ACPI_MADT_ENABLED |
+			      ACPI_MADT_GICC_ONLINE_CAPABLE);
 }
 
 /* the following numa functions are architecture-dependent */