Message ID | 33183CC9F5247A488A2544077AF19020B02B7A73@SZXEMA503-MBS.china.huawei.com (mailing list archive) |
---|---|
State | New, archived |
On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
> Maybe the root cause is not NMI but INTR, so yield() can open hardware interrupt,
> And then execute interrupt handler, but the interrupt handler make the SeaBIOS
> stack broken, so that the BSP can't execute the instruction and occur exception,
> VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs except
> the surface phenomenon.

I can't see any reason why allowing interrupts at this location would
be a problem.

> Kevin, can we drop yield() in smp_setup() ?

It's possible to eliminate this instance of yield, but I think it
would just push the crash to the next time interrupts are enabled.

> Is it really useful and allowable for SeaBIOS? Maybe for other components?
> I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
> NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
> the current problem.

If you apply the patches you had to prevent that NMI crash problem,
does it also prevent the above crash?

-Kevin

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message-----
> From: Kevin O'Connor [mailto:kevin@koconnor.net]
> Sent: Saturday, December 19, 2015 11:12 PM
>
> On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
> > Maybe the root cause is not NMI but INTR, so yield() can open hardware interrupt,
> > And then execute interrupt handler, but the interrupt handler make the SeaBIOS
> > stack broken, so that the BSP can't execute the instruction and occur exception,
> > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs except
> > the surface phenomenon.
>
> I can't see any reason why allowing interrupts at this location would
> be a problem.
>

Does it have any relationship with *extra stack* of SeaBIOS?

> > Kevin, can we drop yield() in smp_setup() ?
>
> It's possible to eliminate this instance of yield, but I think it
> would just push the crash to the next time interrupts are enabled.
>

Perhaps. I'm not sure.

> > Is it really useful and allowable for SeaBIOS? Maybe for other components?
> > I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
> > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
> > the current problem.
>
> If you apply the patches you had to prevent that NMI crash problem,
> does it also prevent the above crash?
>

Yes, but we cannot prevent the NMI injection (though I'll submit some patches to
forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).

Regards,
-Gonglei
On Sun, Dec 20, 2015 at 09:49:54AM +0000, Gonglei (Arei) wrote:
> > From: Kevin O'Connor [mailto:kevin@koconnor.net]
> > Sent: Saturday, December 19, 2015 11:12 PM
> > On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
> > > Maybe the root cause is not NMI but INTR, so yield() can open hardware interrupt,
> > > And then execute interrupt handler, but the interrupt handler make the SeaBIOS
> > > stack broken, so that the BSP can't execute the instruction and occur exception,
> > > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs except
> > > the surface phenomenon.
> >
> > I can't see any reason why allowing interrupts at this location would
> > be a problem.
> >
> Does it have any relationship with *extra stack* of SeaBIOS?

None that I can see. Also, the kvm trace seems to show the code
trying to execute at rip=0x03 - that will crash long before the extra
stack is used.

> > > Kevin, can we drop yield() in smp_setup() ?
> >
> > It's possible to eliminate this instance of yield, but I think it
> > would just push the crash to the next time interrupts are enabled.
> >
> Perhaps. I'm not sure.
>
> > > Is it really useful and allowable for SeaBIOS? Maybe for other components?
> > > I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
> > > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
> > > the current problem.
> >
> > If you apply the patches you had to prevent that NMI crash problem,
> > does it also prevent the above crash?
> >
> Yes, but we cannot prevent the NMI injection (though I'll submit some patches to
> forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).
>

-Kevin
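The NMI gating mentioned in the quoted reply above (bit 7 of the value written to port 0x70) can be sketched roughly as follows. This is only a minimal model, not code from QEMU, KVM, or SeaBIOS: the struct, the function names, and the idea of a single "allow" hook are assumptions made for illustration. The one fact it relies on is that bit 7 of a write to the CMOS index port 0x70 is the NMI-disable bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of a write to the CMOS/RTC index port (0x70) disables NMI. */
#define NMI_DISABLE_BIT 0x80

/* Hypothetical model: remembers the last byte the guest wrote to port 0x70. */
struct cmos_index_model {
    uint8_t last_write;
};

/* Record a guest write to port 0x70 (illustrative helper, not a real API). */
void cmos_index_write(struct cmos_index_model *m, uint8_t value)
{
    m->last_write = value;
}

/* True if the guest's last write had the NMI-disable bit set. */
bool guest_masked_nmi(const struct cmos_index_model *m)
{
    return (m->last_write & NMI_DISABLE_BIT) != 0;
}

/* Hypothetical policy: refuse a user-requested NMI while the guest masks NMI. */
bool allow_user_nmi(const struct cmos_index_model *m)
{
    return !guest_masked_nmi(m);
}
```

In this model, a write such as `0x80 | 0x0f` to the index port makes `allow_user_nmi()` return false until the guest writes a value with bit 7 clear again.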
Dear Kevin,

> -----Original Message-----
> From: Kevin O'Connor [mailto:kevin@koconnor.net]
> Sent: Sunday, December 20, 2015 10:33 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seabios@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
>
> On Sun, Dec 20, 2015 at 09:49:54AM +0000, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:kevin@koconnor.net]
> > > Sent: Saturday, December 19, 2015 11:12 PM
> > > On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
> > > > Maybe the root cause is not NMI but INTR, so yield() can open hardware interrupt,
> > > > And then execute interrupt handler, but the interrupt handler make the SeaBIOS
> > > > stack broken, so that the BSP can't execute the instruction and occur exception,
> > > > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs except
> > > > the surface phenomenon.
> > >
> > > I can't see any reason why allowing interrupts at this location would
> > > be a problem.
> > >
> > Does it have any relationship with *extra stack* of SeaBIOS?
>
> None that I can see. Also, the kvm trace seems to show the code
> trying to execute at rip=0x03 - that will crash long before the extra
> stack is used.
>

While the guest OS's GRUB is booting, the softirq path and the C function
send_disk_op() may use the SeaBIOS extra stack. If we inject an NMI at that
point, irqentry_extrastack (romlayout.S) is invoked and the extra stack is
used again. The stack of the first caller is then corrupted, so SeaBIOS
gets stuck.

You can easily reproduce the problem:

1. Start a guest.
2. Reset the guest.
3. Inject an NMI while the guest shows the GRUB screen.
4. The guest gets stuck.

If we disable the extra stack by setting CONFIG_ENTRY_EXTRASTACK=n,
the problem is gone.

Besides, I have another thought: is it possible that, while one CPU is using
the extra stack, other CPUs (APs) are still woken up by a hardware interrupt
after yield() or br->flags = F_IF, and use the extra stack again?

Regards,
-Gonglei

> > > > Kevin, can we drop yield() in smp_setup() ?
> > >
> > > It's possible to eliminate this instance of yield, but I think it
> > > would just push the crash to the next time interrupts are enabled.
> > >
> > Perhaps. I'm not sure.
> >
> > > > Is it really useful and allowable for SeaBIOS? Maybe for other components?
> > > > I'm not sure. Because we found that when SeaBIOS is booting, if we inject a
> > > > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same with
> > > > the current problem.
> > >
> > > If you apply the patches you had to prevent that NMI crash problem,
> > > does it also prevent the above crash?
> > >
> > Yes, but we cannot prevent the NMI injection (though I'll submit some patches to
> > forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).
> >
>
> -Kevin
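The extra-stack re-entrancy described in the message above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not SeaBIOS code: the array, the helper names, and the constants are all hypothetical, and the real entry path is assembly in romlayout.S. The model only assumes what the message claims, namely that a nested entry starts again from the top of the same extra stack while the first user is still on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXTRA_STACK_WORDS 8

/* Hypothetical stand-in for a single, shared extra stack. */
static uint32_t extra_stack[EXTRA_STACK_WORDS];

/* Model of an entry stub that always starts at the top of the extra
 * stack, without checking whether the stack is already in use. */
static uint32_t *enter_extra_stack(void)
{
    return &extra_stack[EXTRA_STACK_WORDS];
}

/* Push one word onto a downward-growing stack. */
static void push(uint32_t **sp, uint32_t value)
{
    assert(*sp > extra_stack); /* guard against overflow in the model */
    *--(*sp) = value;
}

/* Simulated nested handler (e.g. an NMI) that re-enters from the top. */
static void nested_handler(void)
{
    uint32_t *sp = enter_extra_stack();
    push(&sp, 0x11111111u); /* the handler's own frame data */
    push(&sp, 0x22222222u);
}

/* Returns 1 if the first user's saved word is intact afterwards, else 0. */
int first_user_survives(int inject_nested)
{
    const uint32_t saved_return = 0x000f1234u; /* arbitrary marker value */
    uint32_t *sp;

    memset(extra_stack, 0, sizeof(extra_stack));
    sp = enter_extra_stack();
    push(&sp, saved_return);  /* first user saves state on the extra stack */
    if (inject_nested)
        nested_handler();     /* nested entry overwrites the same slots */
    return *sp == saved_return;
}
```

In this model `first_user_survives(0)` reports the saved word as intact, while `first_user_survives(1)` reports it as overwritten, which mirrors the claim that the first caller's stack is broken by the second entry. It says nothing about whether this is the actual root cause of the hang discussed in the thread.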
diff --git a/src/fw/smp.c b/src/fw/smp.c
index 579acdb..dd23eda 100644
--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -136,7 +136,6 @@ smp_setup(void)
         " jc 1b\n"
         : "+m" (SMPLock), "+m" (SMPStack)
         : : "cc", "memory");
-    yield();
 
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;