
[v2] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

Message ID 20240625160718.v2.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid (mailing list archive)
State Handled Elsewhere
Series [v2] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

Commit Message

Doug Anderson June 25, 2024, 11:07 p.m. UTC
When testing hard lockup handling on my sc7180-trogdor-lazor device
with pseudo-NMI enabled, with serial console enabled and with kgdb
disabled, I found that the stack crawls printed to the serial console
ended up as a jumbled mess. After rebooting, the pstore-based console
looked fine though. Also, enabling kgdb to trap the panic made the
console look fine and avoided the mess.

After a bit of tracking down, I came to the conclusion that this was
what was happening:
1. The panic path was stopping all other CPUs with
   panic_other_cpus_shutdown().
2. At least one of those other CPUs was in the middle of printing to
   the serial console and holding the console port's lock, which is
   grabbed with "irqsave". ...but since we were stopping with an NMI
   we didn't care about the "irqsave" and interrupted anyway.
3. Since we stopped the CPU while it was holding the lock it would
   never release it.
4. All future calls to output to the console would end up failing to
   get the lock in qcom_geni_serial_console_write(). This isn't
   _totally_ unexpected at panic time but it's a code path that's not
   well tested, hard to get right, and apparently doesn't work
   terribly well on the Qualcomm geni serial driver.

The Qualcomm geni serial driver was fixed to be a bit better in commit
9e957a155005 ("serial: qcom-geni: Don't cancel/abort if we can't get
the port lock") but it's nice not to get into this situation in the
first place.

Taking a page from what x86 appears to do in native_stop_other_cpus(),
do this:
1. First, try to stop other CPUs with a normal IPI and wait a second.
   This gives them a chance to leave critical sections.
2. If CPUs fail to stop then retry with an NMI, but give a much lower
   timeout since there's no good reason for a CPU not to react quickly
   to an NMI.

This works well and avoids the corrupted console and (presumably)
could help avoid other similar issues.
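
In rough outline, the stop sequence becomes (simplified from the patch
below; the names and timeouts are taken from it):

	/* Phase 1: a normal IPI, waiting up to one second. */
	smp_cross_call(&mask, IPI_CPU_STOP);
	timeout = USEC_PER_SEC;
	while (num_other_online_cpus() && timeout--)
		udelay(1);

	/* Phase 2: NMI fallback with a much shorter (10 ms) wait. */
	if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {
		smp_cross_call(&mask, IPI_CPU_STOP_NMI);
		timeout = USEC_PER_MSEC * 10;
		while (num_other_online_cpus() && timeout--)
			udelay(1);
	}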

In order to do this, we need to do a little re-organization of our
IPIs since we don't have any more free IDs. Do what was suggested in
previous conversations and combine "stop" and "crash stop". That frees
up an IPI so now we can have a "stop" and "stop NMI".
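
The IPI list after the change looks like this (fragment; see the patch
below for the full enum):

	enum ipi_msg_type {
		...
		IPI_CPU_STOP,		/* stop via normal IPI */
		IPI_CPU_STOP_NMI,	/* NMI retry; replaces IPI_CPU_CRASH_STOP */
		...
	};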

In order to do this, we also need a slight change in the way we keep
track of which CPUs still need to be stopped. We need to know
specifically which CPUs haven't stopped yet when we fall back to NMI,
but in the "crash stop" case the "cpu_online_mask" isn't updated as
CPUs go down. This is why that code path used an atomic counter for
the number of CPUs left. Solve this by also updating the
"cpu_online_mask" for crash stops.
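
Concretely, the crash-stop IPI handler now clears its own online bit
(see ipi_cpu_crash_stop() in the patch below), so the shared wait loop
can use num_other_online_cpus() for both flavors:

	crash_save_cpu(regs, cpu);	/* save the regs first... */
	set_cpu_online(cpu, false);	/* ...then drop out of the online mask */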

All of the above lets us combine the logic for "stop" and "crash stop"
code, which appeared to have a bunch of arbitrary implementation
differences.

Aside from the above change where we try a normal IPI and then an NMI,
the combined function has a few subtle differences:
* In the normal smp_send_stop(), if we fail to stop one or more CPUs
  then we won't include the current CPU (the one running
  smp_send_stop()) in the error message.
* In crash_smp_send_stop(), if we fail to stop some CPUs we'll print
  the CPUs that we failed to stop instead of printing all _but_ the
  current running CPU.
* In crash_smp_send_stop(), we will now only print "SMP: stopping
  secondary CPUs" if (system_state <= SYSTEM_RUNNING).

Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I'm not set up to test the crash_smp_send_stop() path. I made sure it
compiled and hacked the panic() method to call it, but I haven't
actually run kexec. Hopefully others can confirm that it's working for
them.

v1: https://lore.kernel.org/r/20231207170251.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid

Changes in v2:
- Update commit message to point to Qualcomm serial driver fix.
- Use a test-and-set to prevent stop code from running twice.
- Move mask clearing to after crash_save_cpu().
- Use local_daif_mask() in ipi_cpu_crash_stop().
- Don't use a new mask, just have crash case update online CPUs.

 arch/arm64/kernel/smp.c | 138 ++++++++++++++++++++++------------------
 1 file changed, 75 insertions(+), 63 deletions(-)

Comments

Will Deacon Aug. 5, 2024, 4:53 p.m. UTC | #1
Hi Doug,

On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
>  
>  void smp_send_stop(void)
>  {
> +	static unsigned long stop_in_progress;
> +	cpumask_t mask;
>  	unsigned long timeout;
>  
> -	if (num_other_online_cpus()) {
> -		cpumask_t mask;
> +	/*
> +	 * If this cpu is the only one alive at this point in time, online or
> +	 * not, there are no stop messages to be sent around, so just back out.
> +	 */
> +	if (num_other_online_cpus() == 0)
> +		goto skip_ipi;
>  
> -		cpumask_copy(&mask, cpu_online_mask);
> -		cpumask_clear_cpu(smp_processor_id(), &mask);
> +	/* Only proceed if this is the first CPU to reach this code */
> +	if (test_and_set_bit(0, &stop_in_progress))
> +		return;
>  
> -		if (system_state <= SYSTEM_RUNNING)
> -			pr_crit("SMP: stopping secondary CPUs\n");
> -		smp_cross_call(&mask, IPI_CPU_STOP);
> -	}
> +	cpumask_copy(&mask, cpu_online_mask);
> +	cpumask_clear_cpu(smp_processor_id(), &mask);
>  
> -	/* Wait up to one second for other CPUs to stop */
> +	if (system_state <= SYSTEM_RUNNING)
> +		pr_crit("SMP: stopping secondary CPUs\n");
> +
> +	/*
> +	 * Start with a normal IPI and wait up to one second for other CPUs to
> +	 * stop. We do this first because it gives other processors a chance
> +	 * to exit critical sections / drop locks and makes the rest of the
> +	 * stop process (especially console flush) more robust.
> +	 */
> +	smp_cross_call(&mask, IPI_CPU_STOP);

I realise you've moved this out of crash_smp_send_stop() and it looks
like we inherited the code from x86, but do you know how this serialises
against CPU hotplug operations? I've spent the last 20m looking at the
code and I can't see what prevents other CPUs from coming and going
while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.

Will
Doug Anderson Aug. 5, 2024, 5:13 p.m. UTC | #2
Hi,

On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote:
>
> Hi Doug,
>
> On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
> >
> >  void smp_send_stop(void)
> >  {
> > +     static unsigned long stop_in_progress;
> > +     cpumask_t mask;
> >       unsigned long timeout;
> >
> > -     if (num_other_online_cpus()) {
> > -             cpumask_t mask;
> > +     /*
> > +      * If this cpu is the only one alive at this point in time, online or
> > +      * not, there are no stop messages to be sent around, so just back out.
> > +      */
> > +     if (num_other_online_cpus() == 0)
> > +             goto skip_ipi;
> >
> > -             cpumask_copy(&mask, cpu_online_mask);
> > -             cpumask_clear_cpu(smp_processor_id(), &mask);
> > +     /* Only proceed if this is the first CPU to reach this code */
> > +     if (test_and_set_bit(0, &stop_in_progress))
> > +             return;
> >
> > -             if (system_state <= SYSTEM_RUNNING)
> > -                     pr_crit("SMP: stopping secondary CPUs\n");
> > -             smp_cross_call(&mask, IPI_CPU_STOP);
> > -     }
> > +     cpumask_copy(&mask, cpu_online_mask);
> > +     cpumask_clear_cpu(smp_processor_id(), &mask);
> >
> > -     /* Wait up to one second for other CPUs to stop */
> > +     if (system_state <= SYSTEM_RUNNING)
> > +             pr_crit("SMP: stopping secondary CPUs\n");
> > +
> > +     /*
> > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > +      * stop. We do this first because it gives other processors a chance
> > +      * to exit critical sections / drop locks and makes the rest of the
> > +      * stop process (especially console flush) more robust.
> > +      */
> > +     smp_cross_call(&mask, IPI_CPU_STOP);
>
> I realise you've moved this out of crash_smp_send_stop() and it looks
> like we inherited the code from x86, but do you know how this serialises
> against CPU hotplug operations? I've spent the last 20m looking at the
> code and I can't see what prevents other CPUs from coming and going
> while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.

I don't think there is anything. ...and it's not just this code
either. It sure looks like nmi_trigger_cpumask_backtrace() has the
same problem.

I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not
such a big deal because:
1. If a CPU goes away then we'll just time out
2. If a CPU shows up then we'll skip backtracing it, but we were
sending backtraces at an instant in time anyway.

In the case of smp_send_stop() it's probably fine if a CPU goes away
because, again, we'll just time out. ...but if a CPU shows up then
that's not super ideal. Maybe it doesn't cause problems in practice
but it does feel like it should be fixed.

-Doug
Will Deacon Aug. 5, 2024, 5:24 p.m. UTC | #3
On Mon, Aug 05, 2024 at 10:13:11AM -0700, Doug Anderson wrote:
> On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote:
> > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
> > >
> > >  void smp_send_stop(void)
> > >  {
> > > +     static unsigned long stop_in_progress;
> > > +     cpumask_t mask;
> > >       unsigned long timeout;
> > >
> > > -     if (num_other_online_cpus()) {
> > > -             cpumask_t mask;
> > > +     /*
> > > +      * If this cpu is the only one alive at this point in time, online or
> > > +      * not, there are no stop messages to be sent around, so just back out.
> > > +      */
> > > +     if (num_other_online_cpus() == 0)
> > > +             goto skip_ipi;
> > >
> > > -             cpumask_copy(&mask, cpu_online_mask);
> > > -             cpumask_clear_cpu(smp_processor_id(), &mask);
> > > +     /* Only proceed if this is the first CPU to reach this code */
> > > +     if (test_and_set_bit(0, &stop_in_progress))
> > > +             return;
> > >
> > > -             if (system_state <= SYSTEM_RUNNING)
> > > -                     pr_crit("SMP: stopping secondary CPUs\n");
> > > -             smp_cross_call(&mask, IPI_CPU_STOP);
> > > -     }
> > > +     cpumask_copy(&mask, cpu_online_mask);
> > > +     cpumask_clear_cpu(smp_processor_id(), &mask);
> > >
> > > -     /* Wait up to one second for other CPUs to stop */
> > > +     if (system_state <= SYSTEM_RUNNING)
> > > +             pr_crit("SMP: stopping secondary CPUs\n");
> > > +
> > > +     /*
> > > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > > +      * stop. We do this first because it gives other processors a chance
> > > +      * to exit critical sections / drop locks and makes the rest of the
> > > +      * stop process (especially console flush) more robust.
> > > +      */
> > > +     smp_cross_call(&mask, IPI_CPU_STOP);
> >
> > I realise you've moved this out of crash_smp_send_stop() and it looks
> > like we inherited the code from x86, but do you know how this serialises
> > against CPU hotplug operations? I've spent the last 20m looking at the
> > code and I can't see what prevents other CPUs from coming and going
> > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
> 
> I don't think there is anything. ...and it's not just this code
> either. It sure looks like nmi_trigger_cpumask_backtrace() has the
> same problem.
> 
> I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not
> such a big deal because:
> 1. If a CPU goes away then we'll just time out
> 2. If a CPU shows up then we'll skip backtracing it, but we were
> sending backtraces at an instant in time anyway.
> 
> In the case of smp_send_stop() it's probably fine if a CPU goes away
> because, again, we'll just time out. ...but if a CPU shows up then
> that's not super ideal. Maybe it doesn't cause problems in practice
> but it does feel like it should be fixed.

On the other hand, I really don't fancy taking a CPU hotplug mutex on
the panic() path, so it's hard to know what's best.

I suppose one thing we could do is recompute the mask before sending the
NMI, which should make it a little more robust in that case. It still
feels fragile, but it's no worse than the existing code, I suppose.

Will
Yu Zhao Aug. 5, 2024, 5:26 p.m. UTC | #4
On Mon, Aug 5, 2024 at 11:13 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote:
> >
> > Hi Doug,
> >
> > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
> > >
> > >  void smp_send_stop(void)
> > >  {
> > > +     static unsigned long stop_in_progress;
> > > +     cpumask_t mask;
> > >       unsigned long timeout;
> > >
> > > -     if (num_other_online_cpus()) {
> > > -             cpumask_t mask;
> > > +     /*
> > > +      * If this cpu is the only one alive at this point in time, online or
> > > +      * not, there are no stop messages to be sent around, so just back out.
> > > +      */
> > > +     if (num_other_online_cpus() == 0)
> > > +             goto skip_ipi;
> > >
> > > -             cpumask_copy(&mask, cpu_online_mask);
> > > -             cpumask_clear_cpu(smp_processor_id(), &mask);
> > > +     /* Only proceed if this is the first CPU to reach this code */
> > > +     if (test_and_set_bit(0, &stop_in_progress))
> > > +             return;
> > >
> > > -             if (system_state <= SYSTEM_RUNNING)
> > > -                     pr_crit("SMP: stopping secondary CPUs\n");
> > > -             smp_cross_call(&mask, IPI_CPU_STOP);
> > > -     }
> > > +     cpumask_copy(&mask, cpu_online_mask);
> > > +     cpumask_clear_cpu(smp_processor_id(), &mask);
> > >
> > > -     /* Wait up to one second for other CPUs to stop */
> > > +     if (system_state <= SYSTEM_RUNNING)
> > > +             pr_crit("SMP: stopping secondary CPUs\n");
> > > +
> > > +     /*
> > > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > > +      * stop. We do this first because it gives other processors a chance
> > > +      * to exit critical sections / drop locks and makes the rest of the
> > > +      * stop process (especially console flush) more robust.
> > > +      */
> > > +     smp_cross_call(&mask, IPI_CPU_STOP);
> >
> > I realise you've moved this out of crash_smp_send_stop() and it looks
> > like we inherited the code from x86, but do you know how this serialises
> > against CPU hotplug operations? I've spent the last 20m looking at the
> > code and I can't see what prevents other CPUs from coming and going
> > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
>
> I don't think there is anything. ...and it's not just this code
> either.

I agree -- I noticed this a while ago [1], and I added
cpus_read_lock/unlock() on the caller side, because having the locks
inside callees can be a problem for callers who can't sleep.

[1] https://lore.kernel.org/linux-mm/ZnkGps-vQbiynNwP@google.com/

Also, do the handlers always see `crash_stop` set by the sender? If
not, would it be a problem? (In the above link, it has to do extra
work to make sure that handlers eventually see any updated values.)

> It sure looks like nmi_trigger_cpumask_backtrace() has the
> same problem.
>
> I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not
> such a big deal because:
> 1. If a CPU goes away then we'll just time out
> 2. If a CPU shows up then we'll skip backtracing it, but we were
> sending backtraces at an instant in time anyway.
>
> In the case of smp_send_stop() it's probably fine if a CPU goes away
> because, again, we'll just time out. ...but if a CPU shows up then
> that's not super ideal. Maybe it doesn't cause problems in practice
> but it does feel like it should be fixed.
>
> -Doug
Doug Anderson Aug. 5, 2024, 8:14 p.m. UTC | #5
Hi,

On Mon, Aug 5, 2024 at 10:24 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 05, 2024 at 10:13:11AM -0700, Doug Anderson wrote:
> > On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote:
> > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> > > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
> > > >
> > > >  void smp_send_stop(void)
> > > >  {
> > > > +     static unsigned long stop_in_progress;
> > > > +     cpumask_t mask;
> > > >       unsigned long timeout;
> > > >
> > > > -     if (num_other_online_cpus()) {
> > > > -             cpumask_t mask;
> > > > +     /*
> > > > +      * If this cpu is the only one alive at this point in time, online or
> > > > +      * not, there are no stop messages to be sent around, so just back out.
> > > > +      */
> > > > +     if (num_other_online_cpus() == 0)
> > > > +             goto skip_ipi;
> > > >
> > > > -             cpumask_copy(&mask, cpu_online_mask);
> > > > -             cpumask_clear_cpu(smp_processor_id(), &mask);
> > > > +     /* Only proceed if this is the first CPU to reach this code */
> > > > +     if (test_and_set_bit(0, &stop_in_progress))
> > > > +             return;
> > > >
> > > > -             if (system_state <= SYSTEM_RUNNING)
> > > > -                     pr_crit("SMP: stopping secondary CPUs\n");
> > > > -             smp_cross_call(&mask, IPI_CPU_STOP);
> > > > -     }
> > > > +     cpumask_copy(&mask, cpu_online_mask);
> > > > +     cpumask_clear_cpu(smp_processor_id(), &mask);
> > > >
> > > > -     /* Wait up to one second for other CPUs to stop */
> > > > +     if (system_state <= SYSTEM_RUNNING)
> > > > +             pr_crit("SMP: stopping secondary CPUs\n");
> > > > +
> > > > +     /*
> > > > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > > > +      * stop. We do this first because it gives other processors a chance
> > > > +      * to exit critical sections / drop locks and makes the rest of the
> > > > +      * stop process (especially console flush) more robust.
> > > > +      */
> > > > +     smp_cross_call(&mask, IPI_CPU_STOP);
> > >
> > > I realise you've moved this out of crash_smp_send_stop() and it looks
> > > like we inherited the code from x86, but do you know how this serialises
> > > against CPU hotplug operations? I've spent the last 20m looking at the
> > > code and I can't see what prevents other CPUs from coming and going
> > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
> >
> > I don't think there is anything. ...and it's not just this code
> > either. It sure looks like nmi_trigger_cpumask_backtrace() has the
> > same problem.
> >
> > I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not
> > such a big deal because:
> > 1. If a CPU goes away then we'll just time out
> > 2. If a CPU shows up then we'll skip backtracing it, but we were
> > sending backtraces at an instant in time anyway.
> >
> > In the case of smp_send_stop() it's probably fine if a CPU goes away
> > because, again, we'll just time out. ...but if a CPU shows up then
> > that's not super ideal. Maybe it doesn't cause problems in practice
> > but it does feel like it should be fixed.
>
> On the other hand, I really don't fancy taking a CPU hotplug mutex on
> the panic() path, so it's hard to know what's best.
>
> I suppose one thing we could do is recompute the mask before sending the
> NMI, which should make it a little more robust in that case. It still
> feels fragile, but it's no worse than the existing code, I suppose.

Happy to do this and send a new version if you want. Just let me know.
Basically it's just replacing `cpumask_and(&mask, &mask,
cpu_online_mask)`  with `cpumask_copy(&mask, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), &mask);` in the case where we
fallback to NMI. If you're happy with the patch I'm also happy if you
make this change while applying.

BTW: if you want to dig into what's (possibly) another preexisting
race condition that I think I noticed while coding this up, I suspect
that "kcrash" will end up lacking info on a CPU any time we end up in
the panic_smp_self_stop() case. Trying to do something about that felt
a bit far afield for me, though.

-Doug
Doug Anderson Aug. 5, 2024, 8:21 p.m. UTC | #6
Hi,

On Mon, Aug 5, 2024 at 10:26 AM Yu Zhao <yuzhao@google.com> wrote:
>
> > > > +     /*
> > > > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > > > +      * stop. We do this first because it gives other processors a chance
> > > > +      * to exit critical sections / drop locks and makes the rest of the
> > > > +      * stop process (especially console flush) more robust.
> > > > +      */
> > > > +     smp_cross_call(&mask, IPI_CPU_STOP);
> > >
> > > I realise you've moved this out of crash_smp_send_stop() and it looks
> > > like we inherited the code from x86, but do you know how this serialises
> > > against CPU hotplug operations? I've spent the last 20m looking at the
> > > code and I can't see what prevents other CPUs from coming and going
> > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
> >
> > I don't think there is anything. ...and it's not just this code
> > either.
>
> I agree -- I noticed this a while ago [1], and I added
> cpus_read_lock/unlock() on the caller side, because having the locks
> inside callees can be a problem for callers who can't sleep.
>
> [1] https://lore.kernel.org/linux-mm/ZnkGps-vQbiynNwP@google.com/

Sounds reasonable. I'm happy to review a patch doing that.


> Also, do the handlers always see `crash_stop` set by the sender? If
> not, would it be a problem? (In the above link, it has to do extra
> work to make sure that handlers eventually see any updated values.)

I _think_ this is OK. It's been a while since I wrote the original
patch but I seem to remember thinking that I didn't need to do
anything special. Tracing through the code again, I see in
gic_ipi_send_mask() the comment:

/*
 * Ensure that stores to Normal memory are visible to the
 * other CPUs before they observe us issuing the IPI.
 */
dmb(ishst);

...so I think that means we're fine, right?
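
For reference, the pattern in question is roughly (a sketch based on the
patch below):

	/* Sender (crash path): plain store, then the cross call. */
	crash_stop = 1;
	smp_send_stop();	/* -> smp_cross_call() -> gic_ipi_send_mask(),
				 * which issues dmb(ishst) before the IPI */

	/* Receiver: do_handle_IPI() checks crash_stop only after taking
	 * the IPI, so it should observe the store -- that's the guarantee
	 * being discussed here. */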
Yu Zhao Aug. 5, 2024, 8:43 p.m. UTC | #7
On Mon, Aug 5, 2024 at 2:29 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Aug 5, 2024 at 10:26 AM Yu Zhao <yuzhao@google.com> wrote:
> >
> > > > > +     /*
> > > > > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > > > > +      * stop. We do this first because it gives other processors a chance
> > > > > +      * to exit critical sections / drop locks and makes the rest of the
> > > > > +      * stop process (especially console flush) more robust.
> > > > > +      */
> > > > > +     smp_cross_call(&mask, IPI_CPU_STOP);
> > > >
> > > > I realise you've moved this out of crash_smp_send_stop() and it looks
> > > > like we inherited the code from x86, but do you know how this serialises
> > > > against CPU hotplug operations? I've spent the last 20m looking at the
> > > > code and I can't see what prevents other CPUs from coming and going
> > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
> > >
> > > I don't think there is anything. ...and it's not just this code
> > > either.
> >
> > I agree -- I noticed this a while ago [1], and I added
> > cpus_read_lock/unlock() on the caller side, because having the locks
> > inside callees can be a problem for callers who can't sleep.
> >
> > [1] https://lore.kernel.org/linux-mm/ZnkGps-vQbiynNwP@google.com/
>
> Sounds reasonable. I'm happy to review a patch doing that.

Thanks.

> > Also, do the handlers always see `crash_stop` set by the sender? If
> > not, would it be a problem? (In the above link, it has to do extra
> > work to make sure that handlers eventually see any updated values.)
>
> I _think_ this is OK. It's been a while since I wrote the original
> patch but I seem to remember thinking that I didn't need to do
> anything special. Tracing through the code again, I see in
> gic_ipi_send_mask() the comment:
>
> /*
>  * Ensure that stores to Normal memory are visible to the
>  * other CPUs before they observe us issuing the IPI.
>  */
> dmb(ishst);
>
> ...so I think that means we're fine, right?

Nice -- I didn't know that.
Will Deacon Aug. 20, 2024, 3:53 p.m. UTC | #8
On Mon, Aug 05, 2024 at 01:14:12PM -0700, Doug Anderson wrote:
> On Mon, Aug 5, 2024 at 10:24 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Aug 05, 2024 at 10:13:11AM -0700, Doug Anderson wrote:
> > > On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote:
> > > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> > > > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
> > > > >
> > > > >  void smp_send_stop(void)
> > > > >  {
> > > > > +     static unsigned long stop_in_progress;
> > > > > +     cpumask_t mask;
> > > > >       unsigned long timeout;
> > > > >
> > > > > -     if (num_other_online_cpus()) {
> > > > > -             cpumask_t mask;
> > > > > +     /*
> > > > > +      * If this cpu is the only one alive at this point in time, online or
> > > > > +      * not, there are no stop messages to be sent around, so just back out.
> > > > > +      */
> > > > > +     if (num_other_online_cpus() == 0)
> > > > > +             goto skip_ipi;
> > > > >
> > > > > -             cpumask_copy(&mask, cpu_online_mask);
> > > > > -             cpumask_clear_cpu(smp_processor_id(), &mask);
> > > > > +     /* Only proceed if this is the first CPU to reach this code */
> > > > > +     if (test_and_set_bit(0, &stop_in_progress))
> > > > > +             return;
> > > > >
> > > > > -             if (system_state <= SYSTEM_RUNNING)
> > > > > -                     pr_crit("SMP: stopping secondary CPUs\n");
> > > > > -             smp_cross_call(&mask, IPI_CPU_STOP);
> > > > > -     }
> > > > > +     cpumask_copy(&mask, cpu_online_mask);
> > > > > +     cpumask_clear_cpu(smp_processor_id(), &mask);
> > > > >
> > > > > -     /* Wait up to one second for other CPUs to stop */
> > > > > +     if (system_state <= SYSTEM_RUNNING)
> > > > > +             pr_crit("SMP: stopping secondary CPUs\n");
> > > > > +
> > > > > +     /*
> > > > > +      * Start with a normal IPI and wait up to one second for other CPUs to
> > > > > +      * stop. We do this first because it gives other processors a chance
> > > > > +      * to exit critical sections / drop locks and makes the rest of the
> > > > > +      * stop process (especially console flush) more robust.
> > > > > +      */
> > > > > +     smp_cross_call(&mask, IPI_CPU_STOP);
> > > >
> > > > I realise you've moved this out of crash_smp_send_stop() and it looks
> > > > like we inherited the code from x86, but do you know how this serialises
> > > > against CPU hotplug operations? I've spent the last 20m looking at the
> > > > code and I can't see what prevents other CPUs from coming and going
> > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
> > >
> > > I don't think there is anything. ...and it's not just this code
> > > either. It sure looks like nmi_trigger_cpumask_backtrace() has the
> > > same problem.
> > >
> > > I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not
> > > such a big deal because:
> > > 1. If a CPU goes away then we'll just time out
> > > 2. If a CPU shows up then we'll skip backtracing it, but we were
> > > sending backtraces at an instant in time anyway.
> > >
> > > In the case of smp_send_stop() it's probably fine if a CPU goes away
> > > because, again, we'll just time out. ...but if a CPU shows up then
> > > that's not super ideal. Maybe it doesn't cause problems in practice
> > > but it does feel like it should be fixed.
> >
> > On the other hand, I really don't fancy taking a CPU hotplug mutex on
> > the panic() path, so it's hard to know what's best.
> >
> > I suppose one thing we could do is recompute the mask before sending the
> > NMI, which should make it a little more robust in that case. It still
> > feels fragile, but it's no worse than the existing code, I suppose.
> 
> Happy to do this and send a new version if you want. Just let me know.
> Basically it's just replacing `cpumask_and(&mask, &mask,
> cpu_online_mask)`  with `cpumask_copy(&mask, cpu_online_mask);
> cpumask_clear_cpu(smp_processor_id(), &mask);` in the case where we
> fallback to NMI. If you're happy with the patch I'm also happy if you
> make this change while applying.

Please send a v3 with that along with a little comment summarising this
discussion so I don't confuse myself again when I look at it next time :)

Thanks,

Will

Patch

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 31c8b3094dd7..254619245f97 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -71,7 +71,7 @@  enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
-	IPI_CPU_CRASH_STOP,
+	IPI_CPU_STOP_NMI,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	NR_IPI,
@@ -88,6 +88,8 @@  static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
 
+static bool crash_stop;
+
 static void ipi_setup(int cpu);
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -771,7 +773,7 @@  static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	[IPI_RESCHEDULE]	= "Rescheduling interrupts",
 	[IPI_CALL_FUNC]		= "Function call interrupts",
 	[IPI_CPU_STOP]		= "CPU stop interrupts",
-	[IPI_CPU_CRASH_STOP]	= "CPU stop (for crash dump) interrupts",
+	[IPI_CPU_STOP_NMI]	= "CPU stop NMIs",
 	[IPI_TIMER]		= "Timer broadcast interrupts",
 	[IPI_IRQ_WORK]		= "IRQ work interrupts",
 };
@@ -813,9 +815,9 @@  void arch_irq_work_raise(void)
 }
 #endif
 
-static void __noreturn local_cpu_stop(void)
+static void __noreturn local_cpu_stop(unsigned int cpu)
 {
-	set_cpu_online(smp_processor_id(), false);
+	set_cpu_online(cpu, false);
 
 	local_daif_mask();
 	sdei_mask_local_cpu();
@@ -829,21 +831,26 @@  static void __noreturn local_cpu_stop(void)
  */
 void __noreturn panic_smp_self_stop(void)
 {
-	local_cpu_stop();
+	local_cpu_stop(smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_CORE
-static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
-#endif
-
 static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
 {
 #ifdef CONFIG_KEXEC_CORE
+	/*
+	 * Use local_daif_mask() instead of local_irq_disable() to make sure
+	 * that pseudo-NMIs are disabled. The "crash stop" code starts with
+	 * an IRQ and falls back to NMI (which might be pseudo). If the IRQ
+	 * finally goes through right as we're timing out then the NMI could
+	 * interrupt us. It's better to prevent the NMI and let the IRQ
+	 * finish since the pt_regs will be better.
+	 */
+	local_daif_mask();
+
 	crash_save_cpu(regs, cpu);
 
-	atomic_dec(&waiting_for_crash_ipi);
+	set_cpu_online(cpu, false);
 
-	local_irq_disable();
 	sdei_mask_local_cpu();
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
@@ -908,14 +915,12 @@  static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_STOP:
-		local_cpu_stop();
-		break;
-
-	case IPI_CPU_CRASH_STOP:
-		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+	case IPI_CPU_STOP_NMI:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
-
 			unreachable();
+		} else {
+			local_cpu_stop(cpu);
 		}
 		break;
 
@@ -970,8 +975,7 @@  static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
 		return false;
 
 	switch (ipi) {
-	case IPI_CPU_STOP:
-	case IPI_CPU_CRASH_STOP:
+	case IPI_CPU_STOP_NMI:
 	case IPI_CPU_BACKTRACE:
 	case IPI_KGDB_ROUNDUP:
 		return true;
@@ -1084,79 +1088,87 @@  static inline unsigned int num_other_online_cpus(void)
 
 void smp_send_stop(void)
 {
+	static unsigned long stop_in_progress;
+	cpumask_t mask;
 	unsigned long timeout;
 
-	if (num_other_online_cpus()) {
-		cpumask_t mask;
+	/*
+	 * If this cpu is the only one alive at this point in time, online or
+	 * not, there are no stop messages to be sent around, so just back out.
+	 */
+	if (num_other_online_cpus() == 0)
+		goto skip_ipi;
 
-		cpumask_copy(&mask, cpu_online_mask);
-		cpumask_clear_cpu(smp_processor_id(), &mask);
+	/* Only proceed if this is the first CPU to reach this code */
+	if (test_and_set_bit(0, &stop_in_progress))
+		return;
 
-		if (system_state <= SYSTEM_RUNNING)
-			pr_crit("SMP: stopping secondary CPUs\n");
-		smp_cross_call(&mask, IPI_CPU_STOP);
-	}
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
 
-	/* Wait up to one second for other CPUs to stop */
+	if (system_state <= SYSTEM_RUNNING)
+		pr_crit("SMP: stopping secondary CPUs\n");
+
+	/*
+	 * Start with a normal IPI and wait up to one second for other CPUs to
+	 * stop. We do this first because it gives other processors a chance
+	 * to exit critical sections / drop locks and makes the rest of the
+	 * stop process (especially console flush) more robust.
+	 */
+	smp_cross_call(&mask, IPI_CPU_STOP);
 	timeout = USEC_PER_SEC;
 	while (num_other_online_cpus() && timeout--)
 		udelay(1);
 
-	if (num_other_online_cpus())
+	/*
+	 * If CPUs are still online, try an NMI. There's no excuse for this to
+	 * be slow, so we only give them an extra 10 ms to respond.
+	 */
+	if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {
+		cpumask_and(&mask, &mask, cpu_online_mask);
+
+		pr_info("SMP: retry stop with NMI for CPUs %*pbl\n",
+			cpumask_pr_args(&mask));
+
+		smp_cross_call(&mask, IPI_CPU_STOP_NMI);
+		timeout = USEC_PER_MSEC * 10;
+		while (num_other_online_cpus() && timeout--)
+			udelay(1);
+	}
+
+	if (num_other_online_cpus()) {
+		cpumask_and(&mask, &mask, cpu_online_mask);
+
 		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
-			cpumask_pr_args(cpu_online_mask));
+			cpumask_pr_args(&mask));
+	}
 
+skip_ipi:
 	sdei_mask_local_cpu();
 }
 
 #ifdef CONFIG_KEXEC_CORE
 void crash_smp_send_stop(void)
 {
-	static int cpus_stopped;
-	cpumask_t mask;
-	unsigned long timeout;
-
 	/*
 	 * This function can be called twice in panic path, but obviously
 	 * we execute this only once.
+	 *
+	 * We use this same boolean to tell whether the IPI we send was a
+	 * stop or a "crash stop".
 	 */
-	if (cpus_stopped)
+	if (crash_stop)
 		return;
+	crash_stop = 1;
 
-	cpus_stopped = 1;
+	smp_send_stop();
 
-	/*
-	 * If this cpu is the only one alive at this point in time, online or
-	 * not, there are no stop messages to be sent around, so just back out.
-	 */
-	if (num_other_online_cpus() == 0)
-		goto skip_ipi;
-
-	cpumask_copy(&mask, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), &mask);
-
-	atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());
-
-	pr_crit("SMP: stopping secondary CPUs\n");
-	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
-
-	/* Wait up to one second for other CPUs to stop */
-	timeout = USEC_PER_SEC;
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
-		udelay(1);
-
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
-			cpumask_pr_args(&mask));
-
-skip_ipi:
-	sdei_mask_local_cpu();
 	sdei_handler_abort();
 }
 
 bool smp_crash_stop_failed(void)
 {
-	return (atomic_read(&waiting_for_crash_ipi) > 0);
+	return num_other_online_cpus() != 0;
 }
 #endif