Message ID: 20231207170251.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid
State: Handled Elsewhere
Series: arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
Hi, On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > When testing hard lockup handling on my sc7180-trogdor-lazor device > with pseudo-NMI enabled, with serial console enabled and with kgdb > disabled, I found that the stack crawls printed to the serial console > ended up as a jumbled mess. After rebooting, the pstore-based console > looked fine though. Also, enabling kgdb to trap the panic made the > console look fine and avoided the mess. > > After a bit of tracking down, I came to the conclusion that this was > what was happening: > 1. The panic path was stopping all other CPUs with > panic_other_cpus_shutdown(). > 2. At least one of those other CPUs was in the middle of printing to > the serial console and holding the console port's lock, which is > grabbed with "irqsave". ...but since we were stopping with an NMI > we didn't care about the "irqsave" and interrupted anyway. > 3. Since we stopped the CPU while it was holding the lock it would > never release it. > 4. All future calls to output to the console would end up failing to > get the lock in qcom_geni_serial_console_write(). This isn't > _totally_ unexpected at panic time but it's a code path that's not > well tested, hard to get right, and apparently doesn't work > terribly well on the Qualcomm geni serial driver. > > It would probably be a reasonable idea to try to make the Qualcomm > geni serial driver work better, but also it's nice not to get into > this situation in the first place. > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > let's do this: > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > second. This gives them a chance to leave critical sections. > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > lower timeout since there's no good reason for a CPU not to react > quickly to a NMI. > > This works well and avoids the corrupted console and (presumably) > could help avoid other similar issues. > > In order to do this, we need to do a little re-organization of our > IPIs since we don't have any more free IDs. We'll do what was > suggested in previous conversations and combine "stop" and "crash > stop". That frees up an IPI so now we can have a "stop" and "stop > NMI". > > In order to do this we also need a slight change in the way we keep > track of which CPUs still need to be stopped. We need to know > specifically which CPUs haven't stopped yet when we fall back to NMI > but in the "crash stop" case the "cpu_online_mask" isn't updated as > CPUs go down. This is why that code path had an atomic of the number > of CPUs left. We'll solve this by making the cpumask into a > global. This has a potential memory implication--with NR_CPUs = 4096 > this is 4096/8 = 512 bytes of globals. On the upside in that same case > we take 512 bytes off the stack which could potentially have made the > stop code less reliable. It can be noted that the NMI backtrace code > (lib/nmi_backtrace.c) uses the same approach and that use also > confirms that updating the mask is safe from NMI. > > All of the above lets us combine the logic for "stop" and "crash stop" > code, which appeared to have a bunch of arbitrary implementation > differences. Possibly this could make up for some of the 512 wasted > bytes. 
;-) > > Aside from the above change where we try a normal IPI and then an NMI, > the combined function has a few subtle differences: > * In the normal smp_send_stop(), if we fail to stop one or more CPUs > then we won't include the current CPU (the one running > smp_send_stop()) in the error message. > * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print > the CPUs that we failed to stop instead of printing all _but_ the > current running CPU. > * In crash_smp_send_stop(), we will now only print "SMP: stopping > secondary CPUs" if (system_state <= SYSTEM_RUNNING). > > Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > I'm not setup to test the crash_smp_send_stop(). I made sure it > compiled and hacked the panic() method to call it, but I haven't > actually run kexec. Hopefully others can confirm that it's working for > them. > > arch/arm64/kernel/smp.c | 115 +++++++++++++++++++--------------------- > 1 file changed, 54 insertions(+), 61 deletions(-) The sound of crickets is overwhelming. ;-) Does anyone have any comments here? Is this a terrible idea? Is this the best idea you've heard all year (it's only been 8 days, so maybe)? Is this great but the implementation is lacking (at best)? Do you hate that this waits for 1 second and wish it waited for 1 ms? 10 ms? 100 ms? 8192 ms? Aside from the weirdness of a processor being killed while holding the console lock, it does seem beneficial to give IRQs at least a little time to finish before killing a processor. I don't have any other explicit examples, but I could just imagine that things might be a little more orderly in such a case... -Doug
Hi, On Mon, Jan 8, 2024 at 4:54 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > > > When testing hard lockup handling on my sc7180-trogdor-lazor device > > with pseudo-NMI enabled, with serial console enabled and with kgdb > > disabled, I found that the stack crawls printed to the serial console > > ended up as a jumbled mess. After rebooting, the pstore-based console > > looked fine though. Also, enabling kgdb to trap the panic made the > > console look fine and avoided the mess. > > > > After a bit of tracking down, I came to the conclusion that this was > > what was happening: > > 1. The panic path was stopping all other CPUs with > > panic_other_cpus_shutdown(). > > 2. At least one of those other CPUs was in the middle of printing to > > the serial console and holding the console port's lock, which is > > grabbed with "irqsave". ...but since we were stopping with an NMI > > we didn't care about the "irqsave" and interrupted anyway. > > 3. Since we stopped the CPU while it was holding the lock it would > > never release it. > > 4. All future calls to output to the console would end up failing to > > get the lock in qcom_geni_serial_console_write(). This isn't > > _totally_ unexpected at panic time but it's a code path that's not > > well tested, hard to get right, and apparently doesn't work > > terribly well on the Qualcomm geni serial driver. > > > > It would probably be a reasonable idea to try to make the Qualcomm > > geni serial driver work better, but also it's nice not to get into > > this situation in the first place. > > > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > > let's do this: > > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > > second. This gives them a chance to leave critical sections. > > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > > lower timeout since there's no good reason for a CPU not to react > > quickly to a NMI. > > > > This works well and avoids the corrupted console and (presumably) > > could help avoid other similar issues. > > > > In order to do this, we need to do a little re-organization of our > > IPIs since we don't have any more free IDs. We'll do what was > > suggested in previous conversations and combine "stop" and "crash > > stop". That frees up an IPI so now we can have a "stop" and "stop > > NMI". > > > > In order to do this we also need a slight change in the way we keep > > track of which CPUs still need to be stopped. We need to know > > specifically which CPUs haven't stopped yet when we fall back to NMI > > but in the "crash stop" case the "cpu_online_mask" isn't updated as > > CPUs go down. This is why that code path had an atomic of the number > > of CPUs left. We'll solve this by making the cpumask into a > > global. This has a potential memory implication--with NR_CPUs = 4096 > > this is 4096/8 = 512 bytes of globals. On the upside in that same case > > we take 512 bytes off the stack which could potentially have made the > > stop code less reliable. It can be noted that the NMI backtrace code > > (lib/nmi_backtrace.c) uses the same approach and that use also > > confirms that updating the mask is safe from NMI. > > > > All of the above lets us combine the logic for "stop" and "crash stop" > > code, which appeared to have a bunch of arbitrary implementation > > differences. Possibly this could make up for some of the 512 wasted > > bytes. 
;-) > > > > Aside from the above change where we try a normal IPI and then an NMI, > > the combined function has a few subtle differences: > > * In the normal smp_send_stop(), if we fail to stop one or more CPUs > > then we won't include the current CPU (the one running > > smp_send_stop()) in the error message. > > * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print > > the CPUs that we failed to stop instead of printing all _but_ the > > current running CPU. > > * In crash_smp_send_stop(), we will now only print "SMP: stopping > > secondary CPUs" if (system_state <= SYSTEM_RUNNING). > > > > Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI") > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > I'm not setup to test the crash_smp_send_stop(). I made sure it > > compiled and hacked the panic() method to call it, but I haven't > > actually run kexec. Hopefully others can confirm that it's working for > > them. > > > > arch/arm64/kernel/smp.c | 115 +++++++++++++++++++--------------------- > > 1 file changed, 54 insertions(+), 61 deletions(-) > > The sound of crickets is overwhelming. ;-) Does anyone have any > comments here? Is this a terrible idea? Is this the best idea you've > heard all year (it's only been 8 days, so maybe)? Is this great but > the implementation is lacking (at best)? Do you hate that this waits > for 1 second and wish it waited for 1 ms? 10 ms? 100 ms? 8192 ms? > > Aside from the weirdness of a processor being killed while holding the > console lock, it does seem beneficial to give IRQs at least a little > time to finish before killing a processor. I don't have any other > explicit examples, but I could just imagine that things might be a > little more orderly in such a case... I'm still hoping to get some sort of feedback here. If people think this is a terrible idea then I'll shut up now and leave well enough alone, but it would be nice to actively decide and get the patch out of limbo. FWIW the serial console dumping issue that originally inspired me to track this down has been worked around at least well enough to not spew garbage in my console. See commit 9e957a155005 ("serial: qcom-geni: Don't cancel/abort if we can't get the port lock"). It's still a little awkward because we'll be running fully lockless during panic time, but it seems to work... -Doug
On Tue, Feb 27, 2024 at 04:57:31PM -0800, Doug Anderson wrote: > Hi, > > On Mon, Jan 8, 2024 at 4:54 PM Doug Anderson <dianders@chromium.org> wrote: > > > > Hi, > > > > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > > > > > When testing hard lockup handling on my sc7180-trogdor-lazor device > > > with pseudo-NMI enabled, with serial console enabled and with kgdb > > > disabled, I found that the stack crawls printed to the serial console > > > ended up as a jumbled mess. After rebooting, the pstore-based console > > > looked fine though. Also, enabling kgdb to trap the panic made the > > > console look fine and avoided the mess. > > > > > > After a bit of tracking down, I came to the conclusion that this was > > > what was happening: > > > 1. The panic path was stopping all other CPUs with > > > panic_other_cpus_shutdown(). > > > 2. At least one of those other CPUs was in the middle of printing to > > > the serial console and holding the console port's lock, which is > > > grabbed with "irqsave". ...but since we were stopping with an NMI > > > we didn't care about the "irqsave" and interrupted anyway. > > > 3. Since we stopped the CPU while it was holding the lock it would > > > never release it. > > > 4. All future calls to output to the console would end up failing to > > > get the lock in qcom_geni_serial_console_write(). This isn't > > > _totally_ unexpected at panic time but it's a code path that's not > > > well tested, hard to get right, and apparently doesn't work > > > terribly well on the Qualcomm geni serial driver. > > > > > > It would probably be a reasonable idea to try to make the Qualcomm > > > geni serial driver work better, but also it's nice not to get into > > > this situation in the first place. > > > > > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > > > let's do this: > > > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > > > second. This gives them a chance to leave critical sections. > > > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > > > lower timeout since there's no good reason for a CPU not to react > > > quickly to a NMI. > > > > > > This works well and avoids the corrupted console and (presumably) > > > could help avoid other similar issues. > > > > > > In order to do this, we need to do a little re-organization of our > > > IPIs since we don't have any more free IDs. We'll do what was > > > suggested in previous conversations and combine "stop" and "crash > > > stop". That frees up an IPI so now we can have a "stop" and "stop > > > NMI". > > > > > > In order to do this we also need a slight change in the way we keep > > > track of which CPUs still need to be stopped. We need to know > > > specifically which CPUs haven't stopped yet when we fall back to NMI > > > but in the "crash stop" case the "cpu_online_mask" isn't updated as > > > CPUs go down. This is why that code path had an atomic of the number > > > of CPUs left. We'll solve this by making the cpumask into a > > > global. This has a potential memory implication--with NR_CPUs = 4096 > > > this is 4096/8 = 512 bytes of globals. On the upside in that same case > > > we take 512 bytes off the stack which could potentially have made the > > > stop code less reliable. It can be noted that the NMI backtrace code > > > (lib/nmi_backtrace.c) uses the same approach and that use also > > > confirms that updating the mask is safe from NMI. 
> > > > > > All of the above lets us combine the logic for "stop" and "crash stop" > > > code, which appeared to have a bunch of arbitrary implementation > > > differences. Possibly this could make up for some of the 512 wasted > > > bytes. ;-) > > > > > > Aside from the above change where we try a normal IPI and then an NMI, > > > the combined function has a few subtle differences: > > > * In the normal smp_send_stop(), if we fail to stop one or more CPUs > > > then we won't include the current CPU (the one running > > > smp_send_stop()) in the error message. > > > * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print > > > the CPUs that we failed to stop instead of printing all _but_ the > > > current running CPU. > > > * In crash_smp_send_stop(), we will now only print "SMP: stopping > > > secondary CPUs" if (system_state <= SYSTEM_RUNNING). > > > > > > Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI") > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > I'm not setup to test the crash_smp_send_stop(). I made sure it > > > compiled and hacked the panic() method to call it, but I haven't > > > actually run kexec. Hopefully others can confirm that it's working for > > > them. > > > > > > arch/arm64/kernel/smp.c | 115 +++++++++++++++++++--------------------- > > > 1 file changed, 54 insertions(+), 61 deletions(-) > > > > The sound of crickets is overwhelming. ;-) Does anyone have any > > comments here? Is this a terrible idea? Is this the best idea you've > > heard all year (it's only been 8 days, so maybe)? Is this great but > > the implementation is lacking (at best)? Do you hate that this waits > > for 1 second and wish it waited for 1 ms? 10 ms? 100 ms? 8192 ms? > > > > Aside from the weirdness of a processor being killed while holding the > > console lock, it does seem beneficial to give IRQs at least a little > > time to finish before killing a processor. I don't have any other > > explicit examples, but I could just imagine that things might be a > > little more orderly in such a case... > > I'm still hoping to get some sort of feedback here. If people think > this is a terrible idea then I'll shut up now and leave well enough > alone, but it would be nice to actively decide and get the patch out > of limbo. I've read patch through a couple of times and was generally convinced by the "do what x86 does" argument. However until now I've always held my council since I wasn't familiar with these code paths and I figured it was OK for me to have no opinion because the first line of the description says that kgdb/kdb is 100% not involved in causing the problem ;-) . However today I also took a look at the HAVE_NMI architectures and there is no consensus between them about how to implement this: PowerPC uses NMI and most of the others use IRQ only, s390 special cases for the panic code path and acts differently compared to a normal SMP shutdown. FWIW the x86 route was irq-only and then switching to irq-plus-nmi (after a short trial with NMI-only that had problems with pstore reliability[1]) and that approach has been in place for over a decade now! However, if we talking ourselves into copying x86 then perhaps we should more accurately copy x86! Assuming I read the x86 code correctly then crash_smp_send_stop() will (mostly) go staight to NMI rather than trialling an IRQ first! That is not what is currently implemented in the patch for arm64. Daniel. 
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7d007d21e539dbecb6942c5734e6649f720982cf
Hi, On Wed, Feb 28, 2024 at 5:11 AM Daniel Thompson <daniel.thompson@linaro.org> wrote: > > > I'm still hoping to get some sort of feedback here. If people think > > this is a terrible idea then I'll shut up now and leave well enough > > alone, but it would be nice to actively decide and get the patch out > > of limbo. > > I've read patch through a couple of times and was generally convinced by > the "do what x86 does" argument. > > However until now I've always held my council since I wasn't familiar > with these code paths and I figured it was OK for me to have no opinion > because the first line of the description says that kgdb/kdb is 100% not > involved in causing the problem ;-) . > > However today I also took a look at the HAVE_NMI architectures and there > is no consensus between them about how to implement this: PowerPC uses > NMI and most of the others use IRQ only, s390 special cases for the > panic code path and acts differently compared to a normal SMP shutdown. Thanks for taking a look! I think I just included you since long ago you were involved in the pseudo-NMI patches. ;-) > FWIW the x86 route was irq-only and then switching to irq-plus-nmi > (after a short trial with NMI-only that had problems with pstore > reliability[1]) and that approach has been in place for over > a decade now! Ah, interesting. I guess this isn't a problem for me at the moment since we're not using any alternate pstore backends (ChromeOS just does pstore to RAM), but it's good to confirm that people were facing real issues. This matches what my gut told me: that it's nice to give CPUs a little chance to shut down more cleanly before jamming an NMI down their throats. > However, if we talking ourselves into copying x86 then perhaps we should > more accurately copy x86! Assuming I read the x86 code correctly then > crash_smp_send_stop() will (mostly) go staight to NMI rather > than trialling an IRQ first! That is not what is currently implemented > in the patch for arm64. Sure, I'm happy to change the patch to work that way, though I might wait to get some confirmation from a maintainer that they think this idea is worth pursuing before spending more time on it. I don't think it would be hard to have the "crash stop" code jump straight to NMI if that's what people want. Matching x86 here seems reasonable, though I'd also say that my gut still says that even for crash stop we should try to stop things cleanly before jumping to NMI. I guess I could imagine that the code we're kexec-ing to generate the core file might be more likely to find the hardware in a funny state if we stopped CPUs w/ NMI vs IRQ. -Doug
On Thu, Feb 29, 2024 at 10:34:26AM -0800, Doug Anderson wrote: > Hi, > > On Wed, Feb 28, 2024 at 5:11 AM Daniel Thompson > <daniel.thompson@linaro.org> wrote: > > > > > I'm still hoping to get some sort of feedback here. If people think > > > this is a terrible idea then I'll shut up now and leave well enough > > > alone, but it would be nice to actively decide and get the patch out > > > of limbo. > > > > I've read patch through a couple of times and was generally convinced by > > the "do what x86 does" argument. > > > > However until now I've always held my council since I wasn't familiar > > with these code paths and I figured it was OK for me to have no opinion > > because the first line of the description says that kgdb/kdb is 100% not > > involved in causing the problem ;-) . > > > > However today I also took a look at the HAVE_NMI architectures and there > > is no consensus between them about how to implement this: PowerPC uses > > NMI and most of the others use IRQ only, s390 special cases for the > > panic code path and acts differently compared to a normal SMP shutdown. > > > > <snip> > > > > However, if we talking ourselves into copying x86 then perhaps we should > > more accurately copy x86! Assuming I read the x86 code correctly then > > crash_smp_send_stop() will (mostly) go staight to NMI rather > > than trialling an IRQ first! That is not what is currently implemented > > in the patch for arm64. > > Sure, I'm happy to change the patch to work that way, though I might > wait to get some confirmation from a maintainer that they think this > idea is worth pursuing before spending more time on it. 100%. Don't respin on my account. > I don't think it would be hard to have the "crash stop" code jump > straight to NMI if that's what people want. Matching x86 here seems > reasonable, though I'd also say that my gut still says that even for > crash stop we should try to stop things cleanly before jumping to NMI. > I guess I could imagine that the code we're kexec-ing to generate the > core file might be more likely to find the hardware in a funny state > if we stopped CPUs w/ NMI vs IRQ. In terms of the "right thing to do" for kdump then reviewing the s390 might be a good idea. Unfortunately it's a bit different to the other arches and I can't offer a 95% answer about what that arch does. Daniel.
Hi Doug, On Tue, Feb 27, 2024 at 04:57:31PM -0800, Doug Anderson wrote: > On Mon, Jan 8, 2024 at 4:54 PM Doug Anderson <dianders@chromium.org> wrote: > > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > The sound of crickets is overwhelming. ;-) Does anyone have any > > comments here? Is this a terrible idea? Is this the best idea you've > > heard all year (it's only been 8 days, so maybe)? Is this great but > > the implementation is lacking (at best)? Do you hate that this waits > > for 1 second and wish it waited for 1 ms? 10 ms? 100 ms? 8192 ms? > > > > Aside from the weirdness of a processor being killed while holding the > > console lock, it does seem beneficial to give IRQs at least a little > > time to finish before killing a processor. I don't have any other > > explicit examples, but I could just imagine that things might be a > > little more orderly in such a case... > > I'm still hoping to get some sort of feedback here. If people think > this is a terrible idea then I'll shut up now and leave well enough > alone, but it would be nice to actively decide and get the patch out > of limbo. > > FWIW the serial console dumping issue that originally inspired me to > track this down has been worked around at least well enough to not > spew garbage in my console. See commit 9e957a155005 ("serial: > qcom-geni: Don't cancel/abort if we can't get the port lock"). It's > still a little awkward because we'll be running fully lockless during > panic time, but it seems to work... This is on my list of things to look into, but I haven't had the chance to go through it in detail. From a high level, I think this sounds reasonable; I just want to make sure this doesn't lead to any new surprises... Mark.
Hi, On Fri, Mar 1, 2024 at 8:06 AM Mark Rutland <mark.rutland@arm.com> wrote: > > Hi Doug, > > On Tue, Feb 27, 2024 at 04:57:31PM -0800, Doug Anderson wrote: > > On Mon, Jan 8, 2024 at 4:54 PM Doug Anderson <dianders@chromium.org> wrote: > > > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > > > The sound of crickets is overwhelming. ;-) Does anyone have any > > > comments here? Is this a terrible idea? Is this the best idea you've > > > heard all year (it's only been 8 days, so maybe)? Is this great but > > > the implementation is lacking (at best)? Do you hate that this waits > > > for 1 second and wish it waited for 1 ms? 10 ms? 100 ms? 8192 ms? > > > > > > Aside from the weirdness of a processor being killed while holding the > > > console lock, it does seem beneficial to give IRQs at least a little > > > time to finish before killing a processor. I don't have any other > > > explicit examples, but I could just imagine that things might be a > > > little more orderly in such a case... > > > > I'm still hoping to get some sort of feedback here. If people think > > this is a terrible idea then I'll shut up now and leave well enough > > alone, but it would be nice to actively decide and get the patch out > > of limbo. > > > > FWIW the serial console dumping issue that originally inspired me to > > track this down has been worked around at least well enough to not > > spew garbage in my console. See commit 9e957a155005 ("serial: > > qcom-geni: Don't cancel/abort if we can't get the port lock"). It's > > still a little awkward because we'll be running fully lockless during > > panic time, but it seems to work... > > This is on my list of things to look into, but I haven't had the chance to go > through it in detail. > > From a high level, I think this sounds reasonable; I just want to make sure > this doesn't lead to any new surprises... Sounds good. For now I'll snooze this for another 2 months and if I haven't heard from you then I'll pester you again. There's no crazy hurry, I was just hoping for some sort of decision one way or the other. For now I'll hold off on changing things to match x86 exactly until I get your take on it too. -Doug
Hi Doug, I'm doing some inbox Spring cleaning! On Thu, Dec 07, 2023 at 05:02:56PM -0800, Douglas Anderson wrote: > When testing hard lockup handling on my sc7180-trogdor-lazor device > with pseudo-NMI enabled, with serial console enabled and with kgdb > disabled, I found that the stack crawls printed to the serial console > ended up as a jumbled mess. After rebooting, the pstore-based console > looked fine though. Also, enabling kgdb to trap the panic made the > console look fine and avoided the mess. > > After a bit of tracking down, I came to the conclusion that this was > what was happening: > 1. The panic path was stopping all other CPUs with > panic_other_cpus_shutdown(). > 2. At least one of those other CPUs was in the middle of printing to > the serial console and holding the console port's lock, which is > grabbed with "irqsave". ...but since we were stopping with an NMI > we didn't care about the "irqsave" and interrupted anyway. > 3. Since we stopped the CPU while it was holding the lock it would > never release it. > 4. All future calls to output to the console would end up failing to > get the lock in qcom_geni_serial_console_write(). This isn't > _totally_ unexpected at panic time but it's a code path that's not > well tested, hard to get right, and apparently doesn't work > terribly well on the Qualcomm geni serial driver. > > It would probably be a reasonable idea to try to make the Qualcomm > geni serial driver work better, but also it's nice not to get into > this situation in the first place. > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > let's do this: > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > second. This gives them a chance to leave critical sections. > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > lower timeout since there's no good reason for a CPU not to react > quickly to a NMI. > > This works well and avoids the corrupted console and (presumably) > could help avoid other similar issues. > > In order to do this, we need to do a little re-organization of our > IPIs since we don't have any more free IDs. We'll do what was > suggested in previous conversations and combine "stop" and "crash > stop". That frees up an IPI so now we can have a "stop" and "stop > NMI". > > In order to do this we also need a slight change in the way we keep > track of which CPUs still need to be stopped. We need to know > specifically which CPUs haven't stopped yet when we fall back to NMI > but in the "crash stop" case the "cpu_online_mask" isn't updated as > CPUs go down. This is why that code path had an atomic of the number > of CPUs left. We'll solve this by making the cpumask into a > global. This has a potential memory implication--with NR_CPUs = 4096 > this is 4096/8 = 512 bytes of globals. On the upside in that same case > we take 512 bytes off the stack which could potentially have made the > stop code less reliable. It can be noted that the NMI backtrace code > (lib/nmi_backtrace.c) uses the same approach and that use also > confirms that updating the mask is safe from NMI. Updating the global masks without any synchronisation feels broken though: > @@ -1085,77 +1080,75 @@ void smp_send_stop(void) > { > unsigned long timeout; > > - if (num_other_online_cpus()) { > - cpumask_t mask; > + /* > + * If this cpu is the only one alive at this point in time, online or > + * not, there are no stop messages to be sent around, so just back out. 
> + */ > + if (num_other_online_cpus() == 0) > + goto skip_ipi; > > - cpumask_copy(&mask, cpu_online_mask); > - cpumask_clear_cpu(smp_processor_id(), &mask); > + cpumask_copy(to_cpumask(stop_mask), cpu_online_mask); > + cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask)); I don't see what prevents multiple CPUs getting in here concurrently and tripping over the masks. x86 seems to avoid that with an atomic 'stopping_cpu' variable in native_stop_other_cpus(). Do we need something similar? Apart from that, I'm fine with the gist of the patch. Will
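For reference, the x86 guard Will mentions looks roughly like the sketch below (paraphrased from arch/x86/kernel/smp.c rather than quoted verbatim): whichever CPU wins the cmpxchg runs the stop sequence, and any later caller simply returns.

static atomic_t stopping_cpu = ATOMIC_INIT(-1);

static void native_stop_other_cpus(int wait)
{
	int old_cpu = -1, this_cpu = smp_processor_id();

	/* Only one CPU may run the stop sequence; later callers bail out. */
	if (!atomic_try_cmpxchg(&stopping_cpu, &old_cpu, this_cpu))
		return;

	/* ... send the reboot vector / NMI and wait for the other CPUs ... */
}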
Hi, On Fri, Apr 12, 2024 at 6:55 AM Will Deacon <will@kernel.org> wrote: > > Hi Doug, > > I'm doing some inbox Spring cleaning! No worries. I got your reply while I was on a bunch of business travel and finally cleared stuff out enough to take a look again. ;-) > On Thu, Dec 07, 2023 at 05:02:56PM -0800, Douglas Anderson wrote: > > When testing hard lockup handling on my sc7180-trogdor-lazor device > > with pseudo-NMI enabled, with serial console enabled and with kgdb > > disabled, I found that the stack crawls printed to the serial console > > ended up as a jumbled mess. After rebooting, the pstore-based console > > looked fine though. Also, enabling kgdb to trap the panic made the > > console look fine and avoided the mess. > > > > After a bit of tracking down, I came to the conclusion that this was > > what was happening: > > 1. The panic path was stopping all other CPUs with > > panic_other_cpus_shutdown(). > > 2. At least one of those other CPUs was in the middle of printing to > > the serial console and holding the console port's lock, which is > > grabbed with "irqsave". ...but since we were stopping with an NMI > > we didn't care about the "irqsave" and interrupted anyway. > > 3. Since we stopped the CPU while it was holding the lock it would > > never release it. > > 4. All future calls to output to the console would end up failing to > > get the lock in qcom_geni_serial_console_write(). This isn't > > _totally_ unexpected at panic time but it's a code path that's not > > well tested, hard to get right, and apparently doesn't work > > terribly well on the Qualcomm geni serial driver. > > > > It would probably be a reasonable idea to try to make the Qualcomm > > geni serial driver work better, but also it's nice not to get into > > this situation in the first place. > > > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > > let's do this: > > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > > second. This gives them a chance to leave critical sections. > > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > > lower timeout since there's no good reason for a CPU not to react > > quickly to a NMI. > > > > This works well and avoids the corrupted console and (presumably) > > could help avoid other similar issues. > > > > In order to do this, we need to do a little re-organization of our > > IPIs since we don't have any more free IDs. We'll do what was > > suggested in previous conversations and combine "stop" and "crash > > stop". That frees up an IPI so now we can have a "stop" and "stop > > NMI". > > > > In order to do this we also need a slight change in the way we keep > > track of which CPUs still need to be stopped. We need to know > > specifically which CPUs haven't stopped yet when we fall back to NMI > > but in the "crash stop" case the "cpu_online_mask" isn't updated as > > CPUs go down. This is why that code path had an atomic of the number > > of CPUs left. We'll solve this by making the cpumask into a > > global. This has a potential memory implication--with NR_CPUs = 4096 > > this is 4096/8 = 512 bytes of globals. On the upside in that same case > > we take 512 bytes off the stack which could potentially have made the > > stop code less reliable. It can be noted that the NMI backtrace code > > (lib/nmi_backtrace.c) uses the same approach and that use also > > confirms that updating the mask is safe from NMI. 
> > Updating the global masks without any synchronisation feels broken though: > > > @@ -1085,77 +1080,75 @@ void smp_send_stop(void) > > { > > unsigned long timeout; > > > > - if (num_other_online_cpus()) { > > - cpumask_t mask; > > + /* > > + * If this cpu is the only one alive at this point in time, online or > > + * not, there are no stop messages to be sent around, so just back out. > > + */ > > + if (num_other_online_cpus() == 0) > > + goto skip_ipi; > > > > - cpumask_copy(&mask, cpu_online_mask); > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > + cpumask_copy(to_cpumask(stop_mask), cpu_online_mask); > > + cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask)); > > I don't see what prevents multiple CPUs getting in here concurrently and > tripping over the masks. x86 seems to avoid that with an atomic > 'stopping_cpu' variable in native_stop_other_cpus(). Do we need something > similar? Good point. nmi_trigger_cpumask_backtrace(), which my code was based on, has a test_and_set() for this and that seems simpler than the atomic_try_cmpxchg() from the x86 code. If we run into that case, what do you think we should do? I guess x86 just does a "return", though it feels like at least a warning should be printed since we're not doing what the function asked us to do. When we return there will be other CPUs running. In theory, we could try to help the other processor along? I don't know how much complexity to handle here and I could imagine that testing some of the corner cases would be extremely hard. I could imagine that this might work but maybe it's too complex? -- void smp_send_stop(void) { unsigned long timeout; static unsigned long stop_in_progress; /* * If this cpu is the only one alive at this point in time, online or * not, there are no stop messages to be sent around, so just back out. */ if (num_other_online_cpus() == 0) goto skip_ipi; /* * If another is already trying to stop and we're here then either the * other CPU hasn't sent us the IPI yet or we have interrupts disabled. * Let's help the other CPU by stopping ourselves. */ if (test_and_set_bit(0, &stop_in_progress)) { /* Wait until the other inits stop_mask */ while (!test_bit(1, &stop_in_progress)) { cpu_relax(); smp_rmb(); } do_handle_IPI(IPI_CPU_STOP); } cpumask_copy(to_cpumask(stop_mask), cpu_online_mask); cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask)); /* Indicate that we've initted stop_mask */ set_bit(1, &stop_in_progress); smp_wmb(); ... ... -- Opinions? > Apart from that, I'm fine with the gist of the patch. Great. Ironically as I reviewed this patch with fresh eyes and looking at the things you brought up, I also found a few issues, I'll respond to my post myself so I have context to respond to. One other question: what did you think about Daniel's suggestion to go straight to NMI for crash_stop? I don't feel like I have enough experience with crash_stop to have intuition here, but it feels like trying IRQ first is still better in that case, but I'm happy to go either way. -Doug
Hi, On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > When testing hard lockup handling on my sc7180-trogdor-lazor device > with pseudo-NMI enabled, with serial console enabled and with kgdb > disabled, I found that the stack crawls printed to the serial console > ended up as a jumbled mess. After rebooting, the pstore-based console > looked fine though. Also, enabling kgdb to trap the panic made the > console look fine and avoided the mess. > > After a bit of tracking down, I came to the conclusion that this was > what was happening: > 1. The panic path was stopping all other CPUs with > panic_other_cpus_shutdown(). > 2. At least one of those other CPUs was in the middle of printing to > the serial console and holding the console port's lock, which is > grabbed with "irqsave". ...but since we were stopping with an NMI > we didn't care about the "irqsave" and interrupted anyway. > 3. Since we stopped the CPU while it was holding the lock it would > never release it. > 4. All future calls to output to the console would end up failing to > get the lock in qcom_geni_serial_console_write(). This isn't > _totally_ unexpected at panic time but it's a code path that's not > well tested, hard to get right, and apparently doesn't work > terribly well on the Qualcomm geni serial driver. > > It would probably be a reasonable idea to try to make the Qualcomm > geni serial driver work better, but also it's nice not to get into > this situation in the first place. > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > let's do this: > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > second. This gives them a chance to leave critical sections. > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > lower timeout since there's no good reason for a CPU not to react > quickly to a NMI. > > This works well and avoids the corrupted console and (presumably) > could help avoid other similar issues. > > In order to do this, we need to do a little re-organization of our > IPIs since we don't have any more free IDs. We'll do what was > suggested in previous conversations and combine "stop" and "crash > stop". That frees up an IPI so now we can have a "stop" and "stop > NMI". > > In order to do this we also need a slight change in the way we keep > track of which CPUs still need to be stopped. We need to know > specifically which CPUs haven't stopped yet when we fall back to NMI > but in the "crash stop" case the "cpu_online_mask" isn't updated as > CPUs go down. This is why that code path had an atomic of the number > of CPUs left. We'll solve this by making the cpumask into a > global. This has a potential memory implication--with NR_CPUs = 4096 > this is 4096/8 = 512 bytes of globals. On the upside in that same case > we take 512 bytes off the stack which could potentially have made the > stop code less reliable. It can be noted that the NMI backtrace code > (lib/nmi_backtrace.c) uses the same approach and that use also > confirms that updating the mask is safe from NMI. > > All of the above lets us combine the logic for "stop" and "crash stop" > code, which appeared to have a bunch of arbitrary implementation > differences. Possibly this could make up for some of the 512 wasted > bytes. 
;-) > > Aside from the above change where we try a normal IPI and then an NMI, > the combined function has a few subtle differences: > * In the normal smp_send_stop(), if we fail to stop one or more CPUs > then we won't include the current CPU (the one running > smp_send_stop()) in the error message. > * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print > the CPUs that we failed to stop instead of printing all _but_ the > current running CPU. > * In crash_smp_send_stop(), we will now only print "SMP: stopping > secondary CPUs" if (system_state <= SYSTEM_RUNNING). > > Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > I'm not setup to test the crash_smp_send_stop(). I made sure it > compiled and hacked the panic() method to call it, but I haven't > actually run kexec. Hopefully others can confirm that it's working for > them. > > arch/arm64/kernel/smp.c | 115 +++++++++++++++++++--------------------- > 1 file changed, 54 insertions(+), 61 deletions(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index defbab84e9e5..9fe9d4342517 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -71,7 +71,7 @@ enum ipi_msg_type { > IPI_RESCHEDULE, > IPI_CALL_FUNC, > IPI_CPU_STOP, > - IPI_CPU_CRASH_STOP, > + IPI_CPU_STOP_NMI, > IPI_TIMER, > IPI_IRQ_WORK, > NR_IPI, > @@ -88,6 +88,9 @@ static int ipi_irq_base __ro_after_init; > static int nr_ipi __ro_after_init = NR_IPI; > static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init; > > +static DECLARE_BITMAP(stop_mask, NR_CPUS) __read_mostly; > +static bool crash_stop; > + > static void ipi_setup(int cpu); > > #ifdef CONFIG_HOTPLUG_CPU > @@ -770,7 +773,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = { > [IPI_RESCHEDULE] = "Rescheduling interrupts", > [IPI_CALL_FUNC] = "Function call interrupts", > [IPI_CPU_STOP] = "CPU stop interrupts", > - [IPI_CPU_CRASH_STOP] = "CPU stop (for crash dump) interrupts", > + [IPI_CPU_STOP_NMI] = "CPU stop NMIs", > [IPI_TIMER] = "Timer broadcast interrupts", > [IPI_IRQ_WORK] = "IRQ work interrupts", > }; > @@ -831,17 +834,11 @@ void __noreturn panic_smp_self_stop(void) > local_cpu_stop(); > } > > -#ifdef CONFIG_KEXEC_CORE > -static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0); > -#endif > - > static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs) > { > #ifdef CONFIG_KEXEC_CORE > crash_save_cpu(regs, cpu); > > - atomic_dec(&waiting_for_crash_ipi); Upon reading the patch with fresh eyes, I think I actually need to move the "cpumask_clear_cpu(cpu, to_cpumask(stop_mask))" here. Specifically I think it's important that it happens _after_ the call to crash_save_cpu(). > local_irq_disable(); The above local_irq_disable() is not new for my patch but it seems wonky for two reasons: 1. It feels like it should have been the first thing in the function. 2. It feels like it should be local_daif_mask() instead. I _think_ it doesn't actually matter because, with the current code, we're only ever called from do_handle_IPI() and thus local IRQs will be off (and local NMIs will be off if we're called from NMI context). However, once we have the IRQ + NMI fallback it _might_ matter if we were midway through finally handling the IRQ-based IPI when we decided to try the NMI-based one. For the next spin of the patch I'll plan to get rid of the local_irq_disable() and instead have local_daif_mask() be the first thing that this function does.
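A rough sketch of the reordering described above, i.e. the crash-stop handler from this patch with local_daif_mask() first and the stop_mask update moved after crash_save_cpu() (illustrative only, not the actual respin):

static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
{
#ifdef CONFIG_KEXEC_CORE
	/* Mask DAIF so neither IRQs nor pseudo-NMIs can interrupt us mid-save. */
	local_daif_mask();

	crash_save_cpu(regs, cpu);

	/* Only report this CPU as stopped once its register state is saved. */
	cpumask_clear_cpu(cpu, to_cpumask(stop_mask));

	sdei_mask_local_cpu();

	/* ... try to die / park the CPU as before ... */
	cpu_park_loop();
#else
	BUG();
#endif
}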
On Fri, May 17, 2024 at 01:01:51PM -0700, Doug Anderson wrote: > On Fri, Apr 12, 2024 at 6:55 AM Will Deacon <will@kernel.org> wrote: > > On Thu, Dec 07, 2023 at 05:02:56PM -0800, Douglas Anderson wrote: > > > In order to do this we also need a slight change in the way we keep > > > track of which CPUs still need to be stopped. We need to know > > > specifically which CPUs haven't stopped yet when we fall back to NMI > > > but in the "crash stop" case the "cpu_online_mask" isn't updated as > > > CPUs go down. This is why that code path had an atomic of the number > > > of CPUs left. We'll solve this by making the cpumask into a > > > global. This has a potential memory implication--with NR_CPUs = 4096 > > > this is 4096/8 = 512 bytes of globals. On the upside in that same case > > > we take 512 bytes off the stack which could potentially have made the > > > stop code less reliable. It can be noted that the NMI backtrace code > > > (lib/nmi_backtrace.c) uses the same approach and that use also > > > confirms that updating the mask is safe from NMI. > > > > Updating the global masks without any synchronisation feels broken though: > > > > > @@ -1085,77 +1080,75 @@ void smp_send_stop(void) > > > { > > > unsigned long timeout; > > > > > > - if (num_other_online_cpus()) { > > > - cpumask_t mask; > > > + /* > > > + * If this cpu is the only one alive at this point in time, online or > > > + * not, there are no stop messages to be sent around, so just back out. > > > + */ > > > + if (num_other_online_cpus() == 0) > > > + goto skip_ipi; > > > > > > - cpumask_copy(&mask, cpu_online_mask); > > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > > + cpumask_copy(to_cpumask(stop_mask), cpu_online_mask); > > > + cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask)); > > > > I don't see what prevents multiple CPUs getting in here concurrently and > > tripping over the masks. x86 seems to avoid that with an atomic > > 'stopping_cpu' variable in native_stop_other_cpus(). Do we need something > > similar? > > Good point. nmi_trigger_cpumask_backtrace(), which my code was based > on, has a test_and_set() for this and that seems simpler than the > atomic_try_cmpxchg() from the x86 code. > > If we run into that case, what do you think we should do? I guess x86 > just does a "return", though it feels like at least a warning should > be printed since we're not doing what the function asked us to do. > When we return there will be other CPUs running. > > In theory, we could try to help the other processor along? I don't > know how much complexity to handle here and I could imagine that > testing some of the corner cases would be extremely hard. I could > imagine that this might work but maybe it's too complex? > > -- > > void smp_send_stop(void) > { > unsigned long timeout; > static unsigned long stop_in_progress; > > /* > * If this cpu is the only one alive at this point in time, online or > * not, there are no stop messages to be sent around, so just back out. > */ > if (num_other_online_cpus() == 0) > goto skip_ipi; > > /* > * If another is already trying to stop and we're here then either the > * other CPU hasn't sent us the IPI yet or we have interrupts disabled. > * Let's help the other CPU by stopping ourselves. 
> */ > if (test_and_set_bit(0, &stop_in_progress)) { > /* Wait until the other inits stop_mask */ > while (!test_bit(1, &stop_in_progress)) { > cpu_relax(); > smp_rmb(); > } > do_handle_IPI(IPI_CPU_STOP); > } > > cpumask_copy(to_cpumask(stop_mask), cpu_online_mask); > cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask)); > > /* Indicate that we've initted stop_mask */ > set_bit(1, &stop_in_progress); > smp_wmb(); > ... > ... > > -- > > Opinions? I think following the x86 behaviour is probably the best place to start: the CPU that ends up doing the IPI will spin anyway, so I don't think there's much to gain by being pro-active on the recipient. > > Apart from that, I'm fine with the gist of the patch. > > Great. Ironically as I reviewed this patch with fresh eyes and looking > at the things you brought up, I also found a few issues, I'll respond > to my post myself so I have context to respond to. Ok. > One other question: what did you think about Daniel's suggestion to go > straight to NMI for crash_stop? I don't feel like I have enough > experience with crash_stop to have intuition here, but it feels like > trying IRQ first is still better in that case, but I'm happy to go > either way. I don't have a strong preference, but I think I'd prefer a non-NMI first as it feels like things ought to be more robust in that case and it should work most of the time. Will
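A minimal sketch of that x86-style early return, using the test_and_set_bit() guard mentioned earlier in the thread (the stop_in_progress name is illustrative):

void smp_send_stop(void)
{
	static unsigned long stop_in_progress;

	/*
	 * Only the first caller runs the stop sequence. A later caller is
	 * either about to receive our stop IPI or has interrupts masked, so
	 * just return, as x86 does.
	 */
	if (test_and_set_bit(0, &stop_in_progress))
		return;

	if (num_other_online_cpus() == 0)
		goto skip_ipi;

	cpumask_copy(to_cpumask(stop_mask), cpu_online_mask);
	cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask));

	/* ... normal IPI, then NMI fallback, as in the patch ... */

skip_ipi:
	sdei_mask_local_cpu();
}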
On Fri, May 17, 2024 at 01:01:58PM -0700, Doug Anderson wrote: > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > local_irq_disable(); > > The above local_irq_disable() is not new for my patch but it seems > wonky for two reasons: > > 1. It feels like it should have been the first thing in the function. > > 2. It feels like it should be local_daif_mask() instead. Is that to ensure we don't take a pNMI? I think that makes sense, but let's please add a comment to say why local_irq_disable() is not sufficient. Will
Hi, On Mon, Jun 24, 2024 at 6:55 AM Will Deacon <will@kernel.org> wrote: > > On Fri, May 17, 2024 at 01:01:58PM -0700, Doug Anderson wrote: > > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders@chromium.org> wrote: > > > local_irq_disable(); > > > > The above local_irq_disable() is not new for my patch but it seems > > wonky for two reasons: > > > > 1. It feels like it should have been the first thing in the function. > > > > 2. It feels like it should be local_daif_mask() instead. > > Is that to ensure we don't take a pNMI? I think that makes sense, but > let's please add a comment to say why local_irq_disable() is not > sufficient. Right, that was my thought. Mostly I realized it was right because the normal (non-crash) stop case calls local_cpu_stop() which calls local_daif_mask(). I was comparing the two and trying to figure out if the difference was on purpose or an oversight. Looks like an oversight to me. Sure, I'll add a comment. Ironically, looking at the code again I found _yet another_ corner case I missed: panic_smp_self_stop(). If a CPU hits that case then we could end up waiting for it when it's already stopped itself. I tried to figure out how to solve that properly and it dawned on me that maybe I should rethink part of my patch. Specifically, I had added a new `stop_mask` in this patch because the panic case didn't update `cpu_online_mask`. ...but that's easy enough to fix: just add a call to `set_cpu_online(cpu, false)` in ipi_cpu_crash_stop(). ...so I'll do that and avoid adding a new mask. If there's some reason why crash stop shouldn't be marking a CPU offline then let me know and I'll go back... -Doug
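A rough sketch of that direction, where the crash-stop handler marks itself offline so the panic CPU can keep waiting on cpu_online_mask instead of a separate stop_mask (illustrative only; the details may differ in the respin):

static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
{
#ifdef CONFIG_KEXEC_CORE
	/* Mask DAIF, including pseudo-NMI, before doing anything else. */
	local_daif_mask();

	crash_save_cpu(regs, cpu);

	/*
	 * Mark ourselves offline, the same way local_cpu_stop() already does
	 * for the regular stop path, so the waiter can use cpu_online_mask
	 * and a CPU that already stopped itself via panic_smp_self_stop()
	 * is accounted for as well.
	 */
	set_cpu_online(cpu, false);

	sdei_mask_local_cpu();
	cpu_park_loop();
#else
	BUG();
#endif
}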
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index defbab84e9e5..9fe9d4342517 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -71,7 +71,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
-	IPI_CPU_CRASH_STOP,
+	IPI_CPU_STOP_NMI,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	NR_IPI,
@@ -88,6 +88,9 @@ static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
 
+static DECLARE_BITMAP(stop_mask, NR_CPUS) __read_mostly;
+static bool crash_stop;
+
 static void ipi_setup(int cpu);
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -770,7 +773,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	[IPI_RESCHEDULE]	= "Rescheduling interrupts",
 	[IPI_CALL_FUNC]		= "Function call interrupts",
 	[IPI_CPU_STOP]		= "CPU stop interrupts",
-	[IPI_CPU_CRASH_STOP]	= "CPU stop (for crash dump) interrupts",
+	[IPI_CPU_STOP_NMI]	= "CPU stop NMIs",
 	[IPI_TIMER]		= "Timer broadcast interrupts",
 	[IPI_IRQ_WORK]		= "IRQ work interrupts",
 };
@@ -831,17 +834,11 @@ void __noreturn panic_smp_self_stop(void)
 	local_cpu_stop();
 }
 
-#ifdef CONFIG_KEXEC_CORE
-static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
-#endif
-
 static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
 {
 #ifdef CONFIG_KEXEC_CORE
 	crash_save_cpu(regs, cpu);
 
-	atomic_dec(&waiting_for_crash_ipi);
-
 	local_irq_disable();
 	sdei_mask_local_cpu();
 
@@ -907,14 +904,13 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_STOP:
-		local_cpu_stop();
-		break;
-
-	case IPI_CPU_CRASH_STOP:
-		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+	case IPI_CPU_STOP_NMI:
+		cpumask_clear_cpu(cpu, to_cpumask(stop_mask));
+		if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
-			unreachable();
+		} else {
+			local_cpu_stop();
 		}
 		break;
 
@@ -969,8 +965,7 @@ static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
 		return false;
 
 	switch (ipi) {
-	case IPI_CPU_STOP:
-	case IPI_CPU_CRASH_STOP:
+	case IPI_CPU_STOP_NMI:
 	case IPI_CPU_BACKTRACE:
 	case IPI_KGDB_ROUNDUP:
 		return true;
@@ -1085,77 +1080,75 @@ void smp_send_stop(void)
 {
 	unsigned long timeout;
 
-	if (num_other_online_cpus()) {
-		cpumask_t mask;
+	/*
+	 * If this cpu is the only one alive at this point in time, online or
+	 * not, there are no stop messages to be sent around, so just back out.
+	 */
+	if (num_other_online_cpus() == 0)
+		goto skip_ipi;
 
-		cpumask_copy(&mask, cpu_online_mask);
-		cpumask_clear_cpu(smp_processor_id(), &mask);
+	cpumask_copy(to_cpumask(stop_mask), cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask));
 
-		if (system_state <= SYSTEM_RUNNING)
-			pr_crit("SMP: stopping secondary CPUs\n");
-		smp_cross_call(&mask, IPI_CPU_STOP);
-	}
+	if (system_state <= SYSTEM_RUNNING)
+		pr_crit("SMP: stopping secondary CPUs\n");
 
-	/* Wait up to one second for other CPUs to stop */
+	/*
+	 * Start with a normal IPI and wait up to one second for other CPUs to
+	 * stop. We do this first because it gives other processors a chance
+	 * to exit critical sections / drop locks and makes the rest of the
+	 * stop process (especially console flush) more robust.
+	 */
+	smp_cross_call(to_cpumask(stop_mask), IPI_CPU_STOP);
 	timeout = USEC_PER_SEC;
-	while (num_other_online_cpus() && timeout--)
+	while (!cpumask_empty(to_cpumask(stop_mask)) && timeout--)
 		udelay(1);
 
-	if (num_other_online_cpus())
+	/*
+	 * If CPUs are still online, try an NMI. There's no excuse for this to
+	 * be slow, so we only give them an extra 10 ms to respond.
+	 */
+	if (!cpumask_empty(to_cpumask(stop_mask)) &&
+	    ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {
+		pr_info("SMP: retry stop with NMI for CPUs %*pbl\n",
+			cpumask_pr_args(to_cpumask(stop_mask)));
+
+		smp_cross_call(to_cpumask(stop_mask), IPI_CPU_STOP_NMI);
+		timeout = USEC_PER_MSEC * 10;
+		while (!cpumask_empty(to_cpumask(stop_mask)) && timeout--)
+			udelay(1);
+	}
+
+	if (!cpumask_empty(to_cpumask(stop_mask)))
 		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
-			cpumask_pr_args(cpu_online_mask));
+			cpumask_pr_args(to_cpumask(stop_mask)));
 
+skip_ipi:
 	sdei_mask_local_cpu();
 }
 
 #ifdef CONFIG_KEXEC_CORE
 void crash_smp_send_stop(void)
 {
-	static int cpus_stopped;
-	cpumask_t mask;
-	unsigned long timeout;
-
 	/*
 	 * This function can be called twice in panic path, but obviously
 	 * we execute this only once.
+	 *
+	 * We use this same boolean to tell whether the IPI we send was a
	 * stop or a "crash stop".
 	 */
-	if (cpus_stopped)
+	if (crash_stop)
 		return;
+	crash_stop = 1;
 
-	cpus_stopped = 1;
-
-	/*
-	 * If this cpu is the only one alive at this point in time, online or
-	 * not, there are no stop messages to be sent around, so just back out.
-	 */
-	if (num_other_online_cpus() == 0)
-		goto skip_ipi;
-
-	cpumask_copy(&mask, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), &mask);
-
-	atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());
+	smp_send_stop();
 
-	pr_crit("SMP: stopping secondary CPUs\n");
-	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
-
-	/* Wait up to one second for other CPUs to stop */
-	timeout = USEC_PER_SEC;
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
-		udelay(1);
-
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
-			cpumask_pr_args(&mask));
-
-skip_ipi:
-	sdei_mask_local_cpu();
 	sdei_handler_abort();
 }
 
 bool smp_crash_stop_failed(void)
 {
-	return (atomic_read(&waiting_for_crash_ipi) > 0);
+	return !cpumask_empty(to_cpumask(stop_mask));
 }
 #endif
When testing hard lockup handling on my sc7180-trogdor-lazor device with pseudo-NMI enabled, with serial console enabled and with kgdb disabled, I found that the stack crawls printed to the serial console ended up as a jumbled mess. After rebooting, the pstore-based console looked fine though. Also, enabling kgdb to trap the panic made the console look fine and avoided the mess.

After a bit of tracking down, I came to the conclusion that this was what was happening:
1. The panic path was stopping all other CPUs with panic_other_cpus_shutdown().
2. At least one of those other CPUs was in the middle of printing to the serial console and holding the console port's lock, which is grabbed with "irqsave". ...but since we were stopping with an NMI we didn't care about the "irqsave" and interrupted anyway.
3. Since we stopped the CPU while it was holding the lock it would never release it.
4. All future calls to output to the console would end up failing to get the lock in qcom_geni_serial_console_write(). This isn't _totally_ unexpected at panic time but it's a code path that's not well tested, hard to get right, and apparently doesn't work terribly well on the Qualcomm geni serial driver.

It would probably be a reasonable idea to try to make the Qualcomm geni serial driver work better, but also it's nice not to get into this situation in the first place.

Taking a page from what x86 appears to do in native_stop_other_cpus(), let's do this:
1. First, we'll try to stop other CPUs with a normal IPI and wait a second. This gives them a chance to leave critical sections.
2. If CPUs fail to stop then we'll retry with an NMI, but give a much lower timeout since there's no good reason for a CPU not to react quickly to an NMI.

This works well and avoids the corrupted console and (presumably) could help avoid other similar issues.

In order to do this, we need to do a little re-organization of our IPIs since we don't have any more free IDs. We'll do what was suggested in previous conversations and combine "stop" and "crash stop". That frees up an IPI so now we can have a "stop" and "stop NMI".

In order to do this we also need a slight change in the way we keep track of which CPUs still need to be stopped. We need to know specifically which CPUs haven't stopped yet when we fall back to NMI, but in the "crash stop" case the "cpu_online_mask" isn't updated as CPUs go down. This is why that code path had an atomic count of the number of CPUs left. We'll solve this by making the cpumask into a global. This has a potential memory implication--with NR_CPUS = 4096 this is 4096/8 = 512 bytes of globals. On the upside, in that same case we take 512 bytes off the stack, which could potentially have made the stop code less reliable. It can be noted that the NMI backtrace code (lib/nmi_backtrace.c) uses the same approach and that use also confirms that updating the mask is safe from NMI.

All of the above lets us combine the logic for the "stop" and "crash stop" code, which appeared to have a bunch of arbitrary implementation differences. Possibly this could make up for some of the 512 wasted bytes. ;-)

Aside from the above change where we try a normal IPI and then an NMI, the combined function has a few subtle differences:
* In the normal smp_send_stop(), if we fail to stop one or more CPUs then we won't include the current CPU (the one running smp_send_stop()) in the error message.
* In crash_smp_send_stop(), if we fail to stop some CPUs we'll print the CPUs that we failed to stop instead of printing all _but_ the current running CPU.
* In crash_smp_send_stop(), we will now only print "SMP: stopping secondary CPUs" if (system_state <= SYSTEM_RUNNING).

Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I'm not set up to test crash_smp_send_stop(). I made sure it compiled and hacked the panic() function to call it, but I haven't actually run kexec. Hopefully others can confirm that it's working for them.

 arch/arm64/kernel/smp.c | 115 +++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 61 deletions(-)