Message ID | 20110603193157.GO2333@linux.vnet.ibm.com (mailing list archive)
---|---
State | New, archived
On Fri, 2011-06-03 at 12:31 -0700, Paul E. McKenney wrote:
> On Fri, Jun 03, 2011 at 10:54:19AM +0300, Sasha Levin wrote:
> > On Fri, 2011-06-03 at 09:34 +0200, Ingo Molnar wrote:
> > > * Sasha Levin <levinsasha928@gmail.com> wrote:
> > > >
> > > > with no apparent progress being made.
> > > >
> > > > Since it's something that worked in 2.6.37, I've looked into it to
> > > > find what might have caused this issue.
> > > >
> > > > I've bisected guest kernels and found that the problem starts with:
> > > >
> > > > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > Date:   Wed Jan 12 14:10:23 2011 -0800
> > > >
> > > >     rcu: move TREE_RCU from softirq to kthread
> > > >
> > > > Ingo, could you confirm that the problem goes away for you when you
> > > > use an earlier commit?
> > >
> > > testing will have to wait, but there's a recent upstream fix:
> > >
> > >   d72bce0e67e8: rcu: Cure load woes
> > >
> > > That *might* perhaps address this problem too.
> >
> > I've re-tested with Linus's current git, the problem is still there.
> >
> > > If not then this appears to be some sort of RCU related livelock with
> > > brutally overcommitted vcpus. On native this would show up too, in a
> > > less drastic form, as a spurious bootup delay.
> >
> > I don't think it was overcommitted by *that* much. With that commit it
> > usually hangs at 20-40 vcpus, while without it I can go up to 255.
>
> Here is a diagnostic patch, untested.  It assumes that your system
> has only a few CPUs (maybe 8-16) and that timers are still running.
> It dumps out some RCU state if grace periods extend for more than
> a few seconds.
>
> To activate it, call rcu_diag_timer_start() from process context.
> To stop it, call rcu_diag_timer_stop(), also from process context.

Since the hang happens in the guest kernel very early during boot, I can't
call rcu_diag_timer_start(). What would be a good place to put the
_start() code instead?
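The diagnostic patch itself is not reproduced in this archive. As background for the discussion that follows, here is a minimal sketch of what a timer-driven diagnostic of this shape could look like, assuming 2011-era kernel timer APIs; the rcu_diag_timer_start()/rcu_diag_timer_stop() names are taken from the thread, while everything else is an assumption rather than Paul's actual patch.

```c
#include <linux/kernel.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

#define RCU_DIAG_PERIOD	(10 * HZ)	/* dump every few seconds */

static struct timer_list rcu_diag_timer;

static void rcu_diag_timer_handler(unsigned long unused)
{
	/* The real patch dumps per-CPU RCU and rcuc kthread state here. */
	pr_info("rcu_diag: periodic RCU state dump\n");

	/* Re-arm so the dump repeats for as long as the stall lasts. */
	mod_timer(&rcu_diag_timer, jiffies + RCU_DIAG_PERIOD);
}

void rcu_diag_timer_start(void)		/* call from process context */
{
	setup_timer(&rcu_diag_timer, rcu_diag_timer_handler, 0);
	mod_timer(&rcu_diag_timer, jiffies + RCU_DIAG_PERIOD);
}

void rcu_diag_timer_stop(void)		/* call from process context */
{
	del_timer_sync(&rcu_diag_timer);
}
```

The timer-based design is what makes Sasha's question relevant: the dump only runs once something has armed the timer from process context, which a very early boot hang never reaches.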
On Fri, Jun 03, 2011 at 10:56:20PM +0300, Sasha Levin wrote:
> [...]
> Since the hang happens in the guest kernel very early during boot, I can't
> call rcu_diag_timer_start(). What would be a good place to put the
> _start() code instead?

Assuming that the failure occurs in the host OS rather than in the guest
OS, I suggest placing rcu_diag_timer_start() in the host code that starts
up the guest.

On the other hand, if the failure is occurring in the guest OS, then
I suggest placing the call to rcu_diag_timer_start() just after timer
initialization -- that is, assuming that interrupts are enabled at the
time of the failure.  If interrupts are not yet enabled at the time of
the failure, color me clueless.

							Thanx, Paul
On Fri, 2011-06-03 at 13:22 -0700, Paul E. McKenney wrote:
> [...]
> On the other hand, if the failure is occurring in the guest OS, then
> I suggest placing the call to rcu_diag_timer_start() just after timer
> initialization -- that is, assuming that interrupts are enabled at the
> time of the failure.  If interrupts are not yet enabled at the time of
> the failure, color me clueless.

It happens in the guest OS; the bisection was done on a guest kernel.

I've placed the rcu debug _start() right at the end of init_timers(),
which happens before the hang, but I'm not seeing any output from the
debug function.
On Sat, Jun 04, 2011 at 12:03:59AM +0300, Sasha Levin wrote:
> [...]
> I've placed the rcu debug _start() right at the end of init_timers(),
> which happens before the hang, but I'm not seeing any output from the
> debug function.

Sounds like the hang is happening before timers start?  Are scheduling
clock interrupts happening in the guest, as in scheduler_tick()?  If so,
I could just as easily put the debug there.

							Thanx, Paul
On Fri, 2011-06-03 at 14:20 -0700, Paul E. McKenney wrote:
> [...]
> Sounds like the hang is happening before timers start?  Are scheduling
> clock interrupts happening in the guest, as in scheduler_tick()?  If so,
> I could just as easily put the debug there.

Yes, scheduler_tick() is running before the hang.
Should I just call the handler function directly from there?
On Sat, Jun 04, 2011 at 01:54:45AM +0300, Sasha Levin wrote:
> [...]
> Yes, scheduler_tick() is running before the hang.
> Should I just call the handler function directly from there?

I suggest calling it no more than once every few seconds, but yes.
(Otherwise you will get a very full dmesg.)

							Thanx, Paul
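A sketch of the throttling Paul suggests here: call the dump from scheduler_tick(), but no more often than once every few seconds of jiffies time. rcu_diag_dump() is an assumed stand-in for the handler from the (unshown) debug patch, and the ten-second interval is arbitrary.

```c
#include <linux/jiffies.h>
#include <linux/types.h>

/* Assumed name for the routine that prints the RCU state (not shown here). */
extern void rcu_diag_dump(void);

static void rcu_diag_maybe_dump(void)
{
	static unsigned long next_dump;
	static bool armed;

	/*
	 * Unsynchronized statics: scheduler_tick() runs on every CPU, so
	 * this can race, but for a best-effort diagnostic an occasional
	 * duplicate or missed dump is harmless.
	 */
	if (!armed) {
		armed = true;
		next_dump = jiffies + 10 * HZ;
		return;
	}
	if (time_after(jiffies, next_dump)) {
		next_dump = jiffies + 10 * HZ;
		rcu_diag_dump();
	}
}

/*
 * Then a single call would go inside scheduler_tick() (kernel/sched.c in
 * the 2011 source layout):
 *
 *	rcu_diag_maybe_dump();
 */
```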
On Fri, 2011-06-03 at 16:05 -0700, Paul E. McKenney wrote:
> [...]
> I suggest calling it no more than once every few seconds, but yes.
> (Otherwise you will get a very full dmesg.)

Here is the last group of printk() output; it basically just repeats
itself when it's stuck:

KVM setup async PF for cpu 19
 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc17 2/46/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc18 2/23/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc19 0/0/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[ 0.527044] #20
 0000000000000000
Call Trace:
rcu_diag: rcuc3 0/0/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc4 2/358/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc5 2/340/1
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc6 2/321/1
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc7 2/301/1
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc8 2/277/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc9 2/255/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc10 2/226/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc11 2/214/1
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc12 2/193/1
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc13 2/172/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc14 2/146/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc15 2/130/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc16 2/107/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc17 2/86/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc18 2/62/1
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc19 2/39/0
kworker/0:1 R running task 0 0 14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
On Sat, Jun 04, 2011 at 09:26:15AM +0300, Sasha Levin wrote:
> [...]
> Here is the last group of printk() output; it basically just repeats
> itself when it's stuck:
> [...]
> rcu_diag: rcuc5 2/340/1

OK, this one (RCU's per-CPU kthread for CPU 5) has work to do (this
is the "1" after the final "/").  Does rcuc5 show up later in the
output?

							Thanx, Paul
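For readers sifting through dumps like the one above, a small userspace filter can pull out just the rcuc status lines and flag the kthreads reported as having work pending. Only the meaning of the final field is confirmed in the thread (nonzero means the per-CPU rcuc kthread has work to do, per Paul's note above), so the first two fields are passed through uninterpreted; the program and its invocation are hypothetical.

```c
/*
 * Userspace filter for "rcu_diag: rcucN a/b/c" lines.  Per the thread,
 * a nonzero final field means that CPU's rcuc kthread has work pending;
 * the first two fields are printed back without interpretation.
 */
#include <stdio.h>

int main(void)
{
	char line[256];
	int cpu, f1, f2, pending;

	while (fgets(line, sizeof(line), stdin)) {
		if (sscanf(line, "rcu_diag: rcuc%d %d/%d/%d",
			   &cpu, &f1, &f2, &pending) == 4 && pending)
			printf("rcuc%d has work pending (%d/%d/%d)\n",
			       cpu, f1, f2, pending);
	}
	return 0;
}
```

Run against the raw console log (for example `./rcuc-pending < console.log`, a hypothetical invocation), the dump above would flag rcuc5, rcuc6, rcuc7, rcuc11, rcuc12, and rcuc18.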
Sorry for the delay on this.

On Sat, 2011-06-04 at 09:30 -0700, Paul E. McKenney wrote:
> [...]
> > rcu_diag: rcuc5 2/340/1
>
> OK, this one (RCU's per-CPU kthread for CPU 5) has work to do (this
> is the "1" after the final "/").  Does rcuc5 show up later in the
> output?

The output grinds to a stop; I've pasted the last part of the prints --
there are no more prints at all past this one.
On Tue, Jun 14, 2011 at 06:26:01PM -0400, Sasha Levin wrote: > Sorry for the delay on this. Actually, you might have had excellent timing on this one. Could you please try Shaohua Li's patch at: https://lkml.org/lkml/2011/6/14/20 This patch makes RCU much less prone to causing massive scheduler lock contention, which might well be the cause of the problems that you are seeing. Thanx, Paul > On Sat, 2011-06-04 at 09:30 -0700, Paul E. McKenney wrote: > > On Sat, Jun 04, 2011 at 09:26:15AM +0300, Sasha Levin wrote: > > > On Fri, 2011-06-03 at 16:05 -0700, Paul E. McKenney wrote: > > > > On Sat, Jun 04, 2011 at 01:54:45AM +0300, Sasha Levin wrote: > > > > > On Fri, 2011-06-03 at 14:20 -0700, Paul E. McKenney wrote: > > > > > > On Sat, Jun 04, 2011 at 12:03:59AM +0300, Sasha Levin wrote: > > > > > > > On Fri, 2011-06-03 at 13:22 -0700, Paul E. McKenney wrote: > > > > > > > > On Fri, Jun 03, 2011 at 10:56:20PM +0300, Sasha Levin wrote: > > > > > > > > > On Fri, 2011-06-03 at 12:31 -0700, Paul E. McKenney wrote: > > > > > > > > > > On Fri, Jun 03, 2011 at 10:54:19AM +0300, Sasha Levin wrote: > > > > > > > > > > > On Fri, 2011-06-03 at 09:34 +0200, Ingo Molnar wrote: > > > > > > > > > > > > * Sasha Levin <levinsasha928@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > with no apparent progress being made. > > > > > > > > > > > > > > > > > > > > > > > > > > Since it's something that worked in 2.6.37, I've looked into it to > > > > > > > > > > > > > find what might have caused this issue. > > > > > > > > > > > > > > > > > > > > > > > > > > I've bisected guest kernels and found that the problem starts with: > > > > > > > > > > > > > > > > > > > > > > > > > > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit > > > > > > > > > > > > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 > > > > > > > > > > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org> > > > > > > > > > > > > > Date: Wed Jan 12 14:10:23 2011 -0800 > > > > > > > > > > > > > > > > > > > > > > > > > > rcu: move TREE_RCU from softirq to kthread > > > > > > > > > > > > > > > > > > > > > > > > > > Ingo, could you confirm that the problem goes away for you when you > > > > > > > > > > > > > use an earlier commit? > > > > > > > > > > > > > > > > > > > > > > > > testing will have to wait, but there's a recent upstream fix: > > > > > > > > > > > > > > > > > > > > > > > > d72bce0e67e8: rcu: Cure load woes > > > > > > > > > > > > > > > > > > > > > > > > That *might* perhaps address this problem too. > > > > > > > > > > > > > > > > > > > > > > > I've re-tested with Linus's current git, the problem is still there. > > > > > > > > > > > > > > > > > > > > > > > If not then this appears to be some sort of RCU related livelock with > > > > > > > > > > > > brutally overcommitted vcpus. On native this would show up too, in a > > > > > > > > > > > > less drastic form, as a spurious bootup delay. > > > > > > > > > > > > > > > > > > > > > > I don't think it was overcommited by *that* much. With that commit it > > > > > > > > > > > usually hangs at 20-40 vcpus, while without it I can go up to 255. > > > > > > > > > > > > > > > > > > > > Here is a diagnostic patch, untested. It assumes that your system > > > > > > > > > > has only a few CPUs (maybe 8-16) and that timers are still running. > > > > > > > > > > It dumps out some RCU state if grace periods extend for more than > > > > > > > > > > a few seconds. > > > > > > > > > > > > > > > > > > > > To activate it, call rcu_diag_timer_start() from process context. 
> > > > > > > > > > To stop it, call rcu_diag_timer_stop(), also from process context. > > > > > > > > > > > > > > > > > > Since the hang happens in guest kernel very early during boot, I can't > > > > > > > > > call rcu_diag_timer_start(). What would be a good place to put the > > > > > > > > > _start() code instead? > > > > > > > > > > > > > > > > Assuming that the failure occurs in the host OS rather than in the guest > > > > > > > > OS, I suggest placing rcu_diag_timer_start() in the host code that starts > > > > > > > > up the guest. > > > > > > > > > > > > > > > > On the other hand, if the failure is occuring in the guest OS, then > > > > > > > > I suggest placing the call to rcu_diag_timer_start() just after timer > > > > > > > > initialization -- that is, assuming that interrupts are enabled at the > > > > > > > > time of the failure. If interrupts are not yet enabled at the time of > > > > > > > > the failure, color me clueless. > > > > > > > > > > > > > > It happens in the guest OS, the bisection was done on a guest kernel. > > > > > > > > > > > > > > I've placed the rcu debug _start() right at the end of init_timers() > > > > > > > which happens before the hang, but I'm not seeing any output from the > > > > > > > debug function. > > > > > > > > > > > > Sounds like the hang is happening before timers start? Are scheduling > > > > > > clock interrupts happening in the guest, as in scheduler_tick()? If so, > > > > > > I could just as easily put the debug there. > > > > > > > > > > Yes, scheduler_tick() is running before the hang. > > > > > Should I just call the handler function directly from there? > > > > > > > > I suggest calling it no more than once every few seconds, but yes. > > > > (Otherwise you will get a very full dmesg.) > > > > > > Here is the last group of printk(), basically it just repeats itself > > > when it's stuck: > > > > > > KVM setup async PF for cpu 19 > > > 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc17 2/46/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc18 2/23/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc19 0/0/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > [ 0.527044] #20 > > > 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc3 0/0/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc4 2/358/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc5 2/340/1 > > > > OK, this one (RCU's per-CPU kthread for CPU 5) has work to do (this > > is the "1" after the final "/"). Does rcuc5 show up later in the > > output? > > > > The output grinds to a stop, I've pasted the last part of the prints - > there are no more prints at all past this one. > > > Thanx, Paul > > > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc6 2/321/1 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc7 2/301/1 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc8 2/277/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc9 2/255/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc10 2/226/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc11 2/214/1 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc12 2/193/1 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc13 2/172/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc14 2/146/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 
0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc15 2/130/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc16 2/107/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc17 2/86/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc18 2/62/1 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > rcu_diag: rcuc19 2/39/0 > > > kworker/0:1 R running task 0 0 14 0x00000008 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > Call Trace: > > > > > > > > > -- > > > > > > Sasha. > > > > > -- > > Sasha. > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
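For reference, a minimal sketch (not from the original thread) of the rate-limited direct call suggested above: the diagnostic handler is invoked from scheduler_tick() no more than once every few seconds instead of from a timer. It assumes the code sits next to the handler in kernel/rcutree.c so that rcu_diag_timer_handler() and RCU_DIAG_TIMER_PERIOD from the diagnostic patch at the end of this thread are in scope; the helper name rcu_diag_tick() is made up for illustration.

/*
 * Hypothetical helper, called once per tick from scheduler_tick().
 * Rate-limits the diagnostic dump to one invocation per
 * RCU_DIAG_TIMER_PERIOD (10 seconds) so dmesg is not flooded.
 */
static void rcu_diag_tick(void)
{
	static unsigned long last_diag;		/* jiffies value of the last dump */

	if (!last_diag ||
	    time_after_eq(jiffies, last_diag + RCU_DIAG_TIMER_PERIOD)) {
		last_diag = jiffies;
		rcu_diag_timer_handler(0);	/* call the timer handler directly */
	}
}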
On Tue, 2011-06-14 at 16:42 -0700, Paul E. McKenney wrote: > On Tue, Jun 14, 2011 at 06:26:01PM -0400, Sasha Levin wrote: > > Sorry for the delay on this. > > Actually, you might have had excellent timing on this one. > > Could you please try Shaohua Li's patch at: > > https://lkml.org/lkml/2011/6/14/20 > > This patch makes RCU much less prone to causing massive scheduler lock contention, which might well be the cause of the problems that you are seeing. This patch has solved the hang issue I've had. kvm tools now boots 60 vcpus easily with the 3.0 kernel.
On Tue, Jun 14, 2011 at 09:25:22PM -0400, Sasha Levin wrote: > On Tue, 2011-06-14 at 16:42 -0700, Paul E. McKenney wrote: > > On Tue, Jun 14, 2011 at 06:26:01PM -0400, Sasha Levin wrote: > > > Sorry for the delay on this. > > > > Actually, you might have had excellent timing on this one. > > > > Could you please try Shaohua Li's patch at: > > > > https://lkml.org/lkml/2011/6/14/20 > > > > This patch makes RCU much less prone to causing massive scheduler lock > > contention, which might well be the cause of the problems that you > > are seeing. > > This patch has solved the hang issue I've had. kvm tools now boots 60 > vcpus easily with 3.0 kernel. Thank you, Sasha -- I have added your Tested-by. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 99f9aa7..489ea1b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -80,6 +80,8 @@ extern void call_rcu_sched(struct rcu_head *head,
 extern void synchronize_sched(void);
 extern void rcu_barrier_bh(void);
 extern void rcu_barrier_sched(void);
+extern void rcu_diag_timer_start(void);
+extern void rcu_diag_timer_stop(void);
 
 static inline void __rcu_read_lock_bh(void)
 {
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 89419ff..bb61574 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2423,3 +2423,48 @@ void __init rcu_init(void)
 }
 
 #include "rcutree_plugin.h"
+
+/* Diagnostic code for boot-time hangs observed in early 3.0 days. */
+
+static int rcu_diag_timer_must_stop;
+struct timer_list rcu_diag_timer;
+#define RCU_DIAG_TIMER_PERIOD	(10 * HZ)
+
+static void rcu_diag_timer_handler(unsigned long unused)
+{
+	int cpu;
+
+	if (rcu_diag_timer_must_stop)
+		return;
+
+	if (ULONG_CMP_GE(jiffies,
+			 rcu_sched_state.gp_start + RCU_DIAG_TIMER_PERIOD))
+		for_each_online_cpu(cpu) {
+			printk(KERN_ALERT "rcu_diag: rcuc%d %u/%u/%d ",
+			       cpu,
+			       per_cpu(rcu_cpu_kthread_status, cpu),
+			       per_cpu(rcu_cpu_kthread_loops, cpu),
+			       per_cpu(rcu_cpu_has_work, cpu));
+			sched_show_task(current);
+		}
+
+	if (rcu_diag_timer_must_stop)
+		return;
+	mod_timer(&rcu_diag_timer, RCU_DIAG_TIMER_PERIOD + jiffies);
+}
+
+void rcu_diag_timer_start(void)
+{
+	rcu_diag_timer_must_stop = 0;
+	setup_timer(&rcu_diag_timer,
+		    rcu_diag_timer_handler, (unsigned long) NULL);
+	mod_timer(&rcu_diag_timer, RCU_DIAG_TIMER_PERIOD + jiffies);
+}
+EXPORT_SYMBOL_GPL(rcu_diag_timer_start);
+
+void rcu_diag_timer_stop(void)
+{
+	rcu_diag_timer_must_stop = 1;
+	del_timer(&rcu_diag_timer);
+}
+EXPORT_SYMBOL_GPL(rcu_diag_timer_stop);
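For completeness, a minimal usage sketch (not part of the patch above, purely illustrative): because rcu_diag_timer_start() and rcu_diag_timer_stop() are exported GPL symbols, they can also be driven from a trivial test module once module loading is available. The early-boot hang discussed in this thread could not be reached that way, which is why the handler ended up being called directly from scheduler_tick() instead.

/*
 * Hypothetical test module: start the RCU diagnostic timer on load,
 * stop it on unload. Assumes the diagnostic patch above is applied.
 */
#include <linux/module.h>
#include <linux/rcupdate.h>

static int __init rcu_diag_mod_init(void)
{
	rcu_diag_timer_start();
	return 0;
}

static void __exit rcu_diag_mod_exit(void)
{
	rcu_diag_timer_stop();
}

module_init(rcu_diag_mod_init);
module_exit(rcu_diag_mod_exit);
MODULE_LICENSE("GPL");	/* needed: the diag functions are EXPORT_SYMBOL_GPL */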