Message ID | 20230601143109.v9.7.I2ef26d1b3bfbed2d10a281942b0da7d9854de05e@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Add debug IPI for backtraces / kgdb; try to use NMI for it | expand |
On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: > From: Sumit Garg <sumit.garg@linaro.org> > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug > IPI is backed by an NMI (or pseudo NMI) then this will let us debug > even hard locked CPUs. When the debug IPI isn't backed by an NMI then > this won't really have any huge benefit but it will still work. > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > Changes in v9: > - Remove fallback for when debug IPI isn't available. > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI. > > arch/arm64/kernel/ipi_debug.c | 5 +++++ > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ > 2 files changed, 19 insertions(+) This looks fine to me, but I'd feel a bit happier if we had separate SGIs for the backtrace and the KGDB callback as they're logically unrelated. Thanks, Mark. > > diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c > index 6984ed507e1f..5794894d94f1 100644 > --- a/arch/arm64/kernel/ipi_debug.c > +++ b/arch/arm64/kernel/ipi_debug.c > @@ -8,6 +8,7 @@ > > #include <linux/interrupt.h> > #include <linux/irq.h> > +#include <linux/kgdb.h> > #include <linux/nmi.h> > #include <linux/smp.h> > > @@ -40,6 +41,7 @@ void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) > static irqreturn_t ipi_debug_handler(int irq, void *data) > { > irqreturn_t ret = IRQ_NONE; > + unsigned int cpu = smp_processor_id(); > > /* > * NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling > @@ -49,6 +51,9 @@ static irqreturn_t ipi_debug_handler(int irq, void *data) > if (nmi_cpu_backtrace(get_irq_regs())) > ret = IRQ_HANDLED; > > + if (!kgdb_nmicallback(cpu, get_irq_regs())) > + ret = IRQ_HANDLED; > + > return ret; > } > > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c > index 4e1f983df3d1..9c4c47507cd4 100644 > --- a/arch/arm64/kernel/kgdb.c > +++ b/arch/arm64/kernel/kgdb.c > @@ -20,6 +20,8 @@ > #include <asm/patching.h> > #include <asm/traps.h> > > +#include "ipi_debug.h" > + > struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { > { "x0", 8, offsetof(struct pt_regs, regs[0])}, > { "x1", 8, offsetof(struct pt_regs, regs[1])}, > @@ -356,3 +358,15 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) > return aarch64_insn_write((void *)bpt->bpt_addr, > *(u32 *)bpt->saved_instr); > } > + > +void kgdb_roundup_cpus(void) > +{ > + struct cpumask mask; > + > + cpumask_copy(&mask, cpu_online_mask); > + cpumask_clear_cpu(raw_smp_processor_id(), &mask); > + if (cpumask_empty(&mask)) > + return; > + > + arm64_debug_ipi(&mask); > +} > -- > 2.41.0.rc2.161.g9c6817b8e7-goog >
On 2023-08-07 11:28, Mark Rutland wrote: > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: >> From: Sumit Garg <sumit.garg@linaro.org> >> >> Let's use the debug IPI for rounding up CPUs in kgdb. When the debug >> IPI is backed by an NMI (or pseudo NMI) then this will let us debug >> even hard locked CPUs. When the debug IPI isn't backed by an NMI then >> this won't really have any huge benefit but it will still work. >> >> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> >> Signed-off-by: Douglas Anderson <dianders@chromium.org> >> --- >> >> Changes in v9: >> - Remove fallback for when debug IPI isn't available. >> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by >> NMI. >> >> arch/arm64/kernel/ipi_debug.c | 5 +++++ >> arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ >> 2 files changed, 19 insertions(+) > > This looks fine to me, but I'd feel a bit happier if we had separate > SGIs for > the backtrace and the KGDB callback as they're logically unrelated. Well, we're a bit stuck here. We have exactly *one* spare SGI with GICv3, as we lose 8 of them to the secure side. One possibility would be to mux some of the lesser used IPIs onto two SGIs (one with standard priority, and one with NMI priority). M.
On Mon, Aug 07, 2023 at 11:47:04AM +0100, Marc Zyngier wrote: > On 2023-08-07 11:28, Mark Rutland wrote: > > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: > > > From: Sumit Garg <sumit.garg@linaro.org> > > > > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug > > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug > > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then > > > this won't really have any huge benefit but it will still work. > > > > > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > > > > Changes in v9: > > > - Remove fallback for when debug IPI isn't available. > > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by > > > NMI. > > > > > > arch/arm64/kernel/ipi_debug.c | 5 +++++ > > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ > > > 2 files changed, 19 insertions(+) > > > > This looks fine to me, but I'd feel a bit happier if we had separate > > SGIs for > > the backtrace and the KGDB callback as they're logically unrelated. > > Well, we're a bit stuck here. > > We have exactly *one* spare SGI with GICv3, as we lose 8 of them > to the secure side. One possibility would be to mux some of the > lesser used IPIs onto two SGIs (one with standard priority, and > one with NMI priority). Understood; Doug and I suggested two options for that: 1) Unify/mux the IPI_CPU_STOP and IPI_CPU_CRASH_STOP IPIs The only *intended* difference between the two is that IPI_CPU_CRASH_STOP calls crash_save_cpu() before trying to stop the CPU, but the implementations have diverged significantly for unrelated reasons. 2) Remove IPI_WAKEUP We only use IPI_WAKEUP for the ACPI parking protocol, and we could reuse another IPI (e.g. IPI_RESCHEDULE) to achieve the same thing witout a dedicated IPI. Thanks, Mark.
On 2023-08-07 11:54, Mark Rutland wrote: > On Mon, Aug 07, 2023 at 11:47:04AM +0100, Marc Zyngier wrote: >> On 2023-08-07 11:28, Mark Rutland wrote: >> > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: >> > > From: Sumit Garg <sumit.garg@linaro.org> >> > > >> > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug >> > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug >> > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then >> > > this won't really have any huge benefit but it will still work. >> > > >> > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> >> > > Signed-off-by: Douglas Anderson <dianders@chromium.org> >> > > --- >> > > >> > > Changes in v9: >> > > - Remove fallback for when debug IPI isn't available. >> > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by >> > > NMI. >> > > >> > > arch/arm64/kernel/ipi_debug.c | 5 +++++ >> > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ >> > > 2 files changed, 19 insertions(+) >> > >> > This looks fine to me, but I'd feel a bit happier if we had separate >> > SGIs for >> > the backtrace and the KGDB callback as they're logically unrelated. >> >> Well, we're a bit stuck here. >> >> We have exactly *one* spare SGI with GICv3, as we lose 8 of them >> to the secure side. One possibility would be to mux some of the >> lesser used IPIs onto two SGIs (one with standard priority, and >> one with NMI priority). > > Understood; Doug and I suggested two options for that: > > 1) Unify/mux the IPI_CPU_STOP and IPI_CPU_CRASH_STOP IPIs > > The only *intended* difference between the two is that > IPI_CPU_CRASH_STOP > calls crash_save_cpu() before trying to stop the CPU, but the > implementations have diverged significantly for unrelated reasons. > > 2) Remove IPI_WAKEUP > > We only use IPI_WAKEUP for the ACPI parking protocol, and we could > reuse > another IPI (e.g. IPI_RESCHEDULE) to achieve the same thing witout a > dedicated IPI. Sure. My concern is that we're papering over the fundamental problem, which is that IPIs are limited resource, and that we're bound to pile more stuff on them. I'm all for reclaiming the ones that can be merged, but we may ultimately need a real fix for this. M.
On Mon, Aug 07, 2023 at 12:08:06PM +0100, Marc Zyngier wrote: > On 2023-08-07 11:54, Mark Rutland wrote: > > On Mon, Aug 07, 2023 at 11:47:04AM +0100, Marc Zyngier wrote: > > > On 2023-08-07 11:28, Mark Rutland wrote: > > > > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: > > > > > From: Sumit Garg <sumit.garg@linaro.org> > > > > > > > > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug > > > > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug > > > > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then > > > > > this won't really have any huge benefit but it will still work. > > > > > > > > > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > > > --- > > > > > > > > > > Changes in v9: > > > > > - Remove fallback for when debug IPI isn't available. > > > > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by > > > > > NMI. > > > > > > > > > > arch/arm64/kernel/ipi_debug.c | 5 +++++ > > > > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ > > > > > 2 files changed, 19 insertions(+) > > > > > > > > This looks fine to me, but I'd feel a bit happier if we had separate > > > > SGIs for > > > > the backtrace and the KGDB callback as they're logically unrelated. > > > > > > Well, we're a bit stuck here. > > > > > > We have exactly *one* spare SGI with GICv3, as we lose 8 of them > > > to the secure side. One possibility would be to mux some of the > > > lesser used IPIs onto two SGIs (one with standard priority, and > > > one with NMI priority). > > > > Understood; Doug and I suggested two options for that: > > > > 1) Unify/mux the IPI_CPU_STOP and IPI_CPU_CRASH_STOP IPIs > > > > The only *intended* difference between the two is that > > IPI_CPU_CRASH_STOP > > calls crash_save_cpu() before trying to stop the CPU, but the > > implementations have diverged significantly for unrelated reasons. > > > > 2) Remove IPI_WAKEUP > > > > We only use IPI_WAKEUP for the ACPI parking protocol, and we could > > reuse > > another IPI (e.g. IPI_RESCHEDULE) to achieve the same thing witout a > > dedicated IPI. > > Sure. My concern is that we're papering over the fundamental problem, > which is that IPIs are limited resource, and that we're bound to pile > more stuff on them. > > I'm all for reclaiming the ones that can be merged, but we may ultimately > need a real fix for this. Sure; I will bear that in mind. Thanks, Mark.
On Mon, Aug 07, 2023 at 11:28:52AM +0100, Mark Rutland wrote: > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: > > From: Sumit Garg <sumit.garg@linaro.org> > > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then > > this won't really have any huge benefit but it will still work. > > > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > Changes in v9: > > - Remove fallback for when debug IPI isn't available. > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI. > > > > arch/arm64/kernel/ipi_debug.c | 5 +++++ > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ > > 2 files changed, 19 insertions(+) > > This looks fine to me, but I'd feel a bit happier if we had separate SGIs for > the backtrace and the KGDB callback as they're logically unrelated. I've no objections to seperate SGIs (if one can be found) but I'm curious what benefit emerges from giving them seperate IPIs. Both interfaces are already designed to share and NMI-like IPI nicely (and IIUC they must share one on x86), neither is performance critical[1] and the content of /proc/interrupts for the IPI is seldom going to be of much interest. As mentioned it is perfectly OK to separate them if there is space in the SGI allocations. However if any two functions are good candidates to share a scarce resource such as an SGI then it is these! Daniel. [1] In both cases their results are only required at human-scale and the work of the both handlers is hugely more expensive than either up front quick-check.
On Mon, Aug 07, 2023 at 04:24:44PM +0100, Daniel Thompson wrote: > On Mon, Aug 07, 2023 at 11:28:52AM +0100, Mark Rutland wrote: > > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: > > > From: Sumit Garg <sumit.garg@linaro.org> > > > > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug > > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug > > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then > > > this won't really have any huge benefit but it will still work. > > > > > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > > > > Changes in v9: > > > - Remove fallback for when debug IPI isn't available. > > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI. > > > > > > arch/arm64/kernel/ipi_debug.c | 5 +++++ > > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ > > > 2 files changed, 19 insertions(+) > > > > This looks fine to me, but I'd feel a bit happier if we had separate SGIs for > > the backtrace and the KGDB callback as they're logically unrelated. > > I've no objections to seperate SGIs (if one can be found) but I'm curious > what benefit emerges from giving them seperate IPIs. Mostly an "I'd feel happier"; they're logically unrelated and having them separate avoids the risk of them unintentionally getting in the way of the other. > Both interfaces are already designed to share and NMI-like IPI nicely > (and IIUC they must share one on x86), neither is performance > critical[1] and the content of /proc/interrupts for the IPI is seldom > going to be of much interest. Sure; I understand that. The flip side of "neither is performance critical" is that they're seldom tested in terms of interaction with one another, and hence for robustness I'd prefer they're separate. I agree it's not strictly necessary, but given we can easily free up an SGI slot, I think it'd be worthwhile. We can always decide to fold them together in future if we have to. I realise a similar argument could be applied to IPI_WAKEUP and IPI_RESCHEDULE, but given that IPI_RESCHEDULE happens *all the time* and the wakeup handler does literally nothing, I think the risk there is substantially lower. Thanks, Mark. > As mentioned it is perfectly OK to separate them if there is space in > the SGI allocations. However if any two functions are good candidates to > share a scarce resource such as an SGI then it is these! > > > Daniel. > > > [1] In both cases their results are only required at human-scale > and the work of the both handlers is hugely more expensive than > either up front quick-check.
On Mon, Aug 07, 2023 at 05:04:14PM +0100, Mark Rutland wrote: > On Mon, Aug 07, 2023 at 04:24:44PM +0100, Daniel Thompson wrote: > > On Mon, Aug 07, 2023 at 11:28:52AM +0100, Mark Rutland wrote: > > > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote: > > > > From: Sumit Garg <sumit.garg@linaro.org> > > > > > > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug > > > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug > > > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then > > > > this won't really have any huge benefit but it will still work. > > > > > > > > Signed-off-by: Sumit Garg <sumit.garg@linaro.org> > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > > --- > > > > > > > > Changes in v9: > > > > - Remove fallback for when debug IPI isn't available. > > > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI. > > > > > > > > arch/arm64/kernel/ipi_debug.c | 5 +++++ > > > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++ > > > > 2 files changed, 19 insertions(+) > > > > > > This looks fine to me, but I'd feel a bit happier if we had separate SGIs for > > > the backtrace and the KGDB callback as they're logically unrelated. > > > > I've no objections to seperate SGIs (if one can be found) but I'm curious > > what benefit emerges from giving them seperate IPIs. > > Mostly an "I'd feel happier"; they're logically unrelated and having them > separate avoids the risk of them unintentionally getting in the way of the > other. > > > Both interfaces are already designed to share and NMI-like IPI nicely > > (and IIUC they must share one on x86), neither is performance > > critical[1] and the content of /proc/interrupts for the IPI is seldom > > going to be of much interest. > > Sure; I understand that. The flip side of "neither is performance critical" is > that they're seldom tested in terms of interaction with one another, and hence > for robustness I'd prefer they're separate. We can't really stop them interacting when kdb is enabled: both activate special logic to intercept console messages and this is much more complex than the "is this my IPI?" tests. > I agree it's not strictly necessary, but given we can easily free up an SGI > slot, I think it'd be worthwhile. We can always decide to fold them together in > future if we have to. Agreed. It should be very little bother to "reclaim" an IPI from the diagnostics machinery if that is ever needed. So seperate IPIs is no problem from my point of view. Daniel.
diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c index 6984ed507e1f..5794894d94f1 100644 --- a/arch/arm64/kernel/ipi_debug.c +++ b/arch/arm64/kernel/ipi_debug.c @@ -8,6 +8,7 @@ #include <linux/interrupt.h> #include <linux/irq.h> +#include <linux/kgdb.h> #include <linux/nmi.h> #include <linux/smp.h> @@ -40,6 +41,7 @@ void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) static irqreturn_t ipi_debug_handler(int irq, void *data) { irqreturn_t ret = IRQ_NONE; + unsigned int cpu = smp_processor_id(); /* * NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling @@ -49,6 +51,9 @@ static irqreturn_t ipi_debug_handler(int irq, void *data) if (nmi_cpu_backtrace(get_irq_regs())) ret = IRQ_HANDLED; + if (!kgdb_nmicallback(cpu, get_irq_regs())) + ret = IRQ_HANDLED; + return ret; } diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c index 4e1f983df3d1..9c4c47507cd4 100644 --- a/arch/arm64/kernel/kgdb.c +++ b/arch/arm64/kernel/kgdb.c @@ -20,6 +20,8 @@ #include <asm/patching.h> #include <asm/traps.h> +#include "ipi_debug.h" + struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { { "x0", 8, offsetof(struct pt_regs, regs[0])}, { "x1", 8, offsetof(struct pt_regs, regs[1])}, @@ -356,3 +358,15 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) return aarch64_insn_write((void *)bpt->bpt_addr, *(u32 *)bpt->saved_instr); } + +void kgdb_roundup_cpus(void) +{ + struct cpumask mask; + + cpumask_copy(&mask, cpu_online_mask); + cpumask_clear_cpu(raw_smp_processor_id(), &mask); + if (cpumask_empty(&mask)) + return; + + arm64_debug_ipi(&mask); +}