[v2,6/8] kvm tools: Add rwlock wrapper

Message ID 20110603193157.GO2333@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Paul E. McKenney June 3, 2011, 7:31 p.m. UTC
On Fri, Jun 03, 2011 at 10:54:19AM +0300, Sasha Levin wrote:
> On Fri, 2011-06-03 at 09:34 +0200, Ingo Molnar wrote:
> > * Sasha Levin <levinsasha928@gmail.com> wrote:
> > 
> > > > with no apparent progress being made.
> > > 
> > > Since it's something that worked in 2.6.37, I've looked into it to 
> > > find what might have caused this issue.
> > > 
> > > I've bisected guest kernels and found that the problem starts with:
> > > 
> > > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > Date:   Wed Jan 12 14:10:23 2011 -0800
> > > 
> > >     rcu: move TREE_RCU from softirq to kthread
> > > 
> > > Ingo, could you confirm that the problem goes away for you when you 
> > > use an earlier commit?
> > 
> > testing will have to wait, but there's a recent upstream fix:
> > 
> >   d72bce0e67e8: rcu: Cure load woes
> > 
> > That *might* perhaps address this problem too.
> > 
> I've re-tested with Linus's current git, the problem is still there.
> 
> > If not then this appears to be some sort of RCU related livelock with 
> > brutally overcommitted vcpus. On native this would show up too, in a 
> > less drastic form, as a spurious bootup delay.
> 
> I don't think it was overcommitted by *that* much. With that commit it
> usually hangs at 20-40 vcpus, while without it I can go up to 255.

Here is a diagnostic patch, untested.  It assumes that your system
has only a few CPUs (maybe 8-16) and that timers are still running.
It dumps out some RCU state if grace periods extend for more than
a few seconds.

To activate it, call rcu_diag_timer_start() from process context.
To stop it, call rcu_diag_timer_stop(), also from process context.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

rcu: diagnostic check of kthread state
    
Not-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
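
The patch body itself is not reproduced in this archive view. As a rough,
hypothetical sketch only -- not Paul's actual diff -- a grace-period watchdog
of the kind described above could be built around a self-rearming kernel
timer, with rcu_diag_timer_start() and rcu_diag_timer_stop() as the
process-context entry points; the ten-second period and the reporting details
are assumptions:

#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>

#define RCU_DIAG_PERIOD	(10 * HZ)	/* assumption: check every 10 seconds */

static struct timer_list rcu_diag_timer;

static void rcu_diag_timer_handler(unsigned long unused)
{
	/*
	 * The real patch presumably checks how long the current grace
	 * period has been pending and dumps the per-CPU rcuc kthread
	 * state; this sketch only marks where that reporting would go.
	 */
	pr_err("rcu_diag: checking for stalled RCU grace periods\n");

	/* re-arm for the next check */
	mod_timer(&rcu_diag_timer, jiffies + RCU_DIAG_PERIOD);
}

void rcu_diag_timer_start(void)		/* call from process context */
{
	setup_timer(&rcu_diag_timer, rcu_diag_timer_handler, 0);
	mod_timer(&rcu_diag_timer, jiffies + RCU_DIAG_PERIOD);
}

void rcu_diag_timer_stop(void)		/* call from process context */
{
	del_timer_sync(&rcu_diag_timer);
}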


Comments

Sasha Levin June 3, 2011, 7:56 p.m. UTC | #1
On Fri, 2011-06-03 at 12:31 -0700, Paul E. McKenney wrote:
> Here is a diagnostic patch, untested.  It assumes that your system
> has only a few CPUs (maybe 8-16) and that timers are still running.
> It dumps out some RCU state if grace periods extend for more than
> a few seconds.
> 
> To activate it, call rcu_diag_timer_start() from process context.
> To stop it, call rcu_diag_timer_stop(), also from process context.

Since the hang happens in the guest kernel very early during boot, I can't
call rcu_diag_timer_start(). What would be a good place to put the
_start() call instead?
Paul E. McKenney June 3, 2011, 8:22 p.m. UTC | #2
On Fri, Jun 03, 2011 at 10:56:20PM +0300, Sasha Levin wrote:
> Since the hang happens in the guest kernel very early during boot, I can't
> call rcu_diag_timer_start(). What would be a good place to put the
> _start() call instead?

Assuming that the failure occurs in the host OS rather than in the guest
OS, I suggest placing rcu_diag_timer_start() in the host code that starts
up the guest.

On the other hand, if the failure is occurring in the guest OS, then
I suggest placing the call to rcu_diag_timer_start() just after timer
initialization -- that is, assuming that interrupts are enabled at the
time of the failure.  If interrupts are not yet enabled at the time of
the failure, color me clueless.

							Thanx, Paul
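
As an illustration of the guest-side placement (hypothetical sketch; only the
final open_softirq() call of init_timers() is shown, and rcu_diag_timer_start()
is the function from the debug patch above):

void __init init_timers(void)
{
	/* ... existing timer initialization ... */
	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);

	/* arm the RCU diagnostic once timers can actually fire */
	rcu_diag_timer_start();
}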
Sasha Levin June 3, 2011, 9:03 p.m. UTC | #3
On Fri, 2011-06-03 at 13:22 -0700, Paul E. McKenney wrote:
> Assuming that the failure occurs in the host OS rather than in the guest
> OS, I suggest placing rcu_diag_timer_start() in the host code that starts
> up the guest.
> 
> On the other hand, if the failure is occurring in the guest OS, then
> I suggest placing the call to rcu_diag_timer_start() just after timer
> initialization -- that is, assuming that interrupts are enabled at the
> time of the failure.  If interrupts are not yet enabled at the time of
> the failure, color me clueless.

It happens in the guest OS; the bisection was done on a guest kernel.

I've placed the rcu debug _start() right at the end of init_timers(),
which happens before the hang, but I'm not seeing any output from the
debug function.
Paul E. McKenney June 3, 2011, 9:20 p.m. UTC | #4
On Sat, Jun 04, 2011 at 12:03:59AM +0300, Sasha Levin wrote:
> It happens in the guest OS; the bisection was done on a guest kernel.
> 
> I've placed the rcu debug _start() right at the end of init_timers(),
> which happens before the hang, but I'm not seeing any output from the
> debug function.

Sounds like the hang is happening before timers start?  Are scheduling
clock interrupts happening in the guest, as in scheduler_tick()?  If so,
I could just as easily put the debug there.

							Thanx, Paul
Sasha Levin June 3, 2011, 10:54 p.m. UTC | #5
On Fri, 2011-06-03 at 14:20 -0700, Paul E. McKenney wrote:
> Sounds like the hang is happening before timers start?  Are scheduling
> clock interrupts happening in the guest, as in scheduler_tick()?  If so,
> I could just as easily put the debug there.

Yes, scheduler_tick() is running before the hang.
Should I just call the handler function directly from there?
Paul E. McKenney June 3, 2011, 11:05 p.m. UTC | #6
On Sat, Jun 04, 2011 at 01:54:45AM +0300, Sasha Levin wrote:
> Yes, scheduler_tick() is running before the hang.
> Should I just call the handler function directly from there?

I suggest calling it no more than once every few seconds, but yes.
(Otherwise you will get a very full dmesg.)

							Thanx, Paul
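
A sketch of that suggestion (assumptions: a ten-second interval, and calling
the debug patch's rcu_diag_timer_handler() directly) would rate-limit the call
with jiffies inside scheduler_tick():

void scheduler_tick(void)
{
	static unsigned long next_rcu_diag;

	/* ... existing scheduler_tick() processing ... */

	/* fire the RCU diagnostic at most once every ~10 seconds */
	if (time_after(jiffies, next_rcu_diag)) {
		next_rcu_diag = jiffies + 10 * HZ;
		rcu_diag_timer_handler(0);	/* handler from the debug patch */
	}
}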
Sasha Levin June 4, 2011, 6:26 a.m. UTC | #7
On Fri, 2011-06-03 at 16:05 -0700, Paul E. McKenney wrote:
> I suggest calling it no more than once every few seconds, but yes.
> (Otherwise you will get a very full dmesg.)

Here is the last group of printk() output; it basically just repeats
itself when it's stuck:

KVM setup async PF for cpu 19
 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc17 2/46/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc18 2/23/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc19 0/0/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[    0.527044]  #20
 0000000000000000
Call Trace:
rcu_diag: rcuc3 0/0/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc4 2/358/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc5 2/340/1 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc6 2/321/1 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc7 2/301/1 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc8 2/277/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc9 2/255/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc10 2/226/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc11 2/214/1 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc12 2/193/1 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc13 2/172/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc14 2/146/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc15 2/130/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc16 2/107/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc17 2/86/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc18 2/62/1 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc19 2/39/0 
kworker/0:1     R  running task        0     0     14 0x00000008
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
Paul E. McKenney June 4, 2011, 4:30 p.m. UTC | #8
On Sat, Jun 04, 2011 at 09:26:15AM +0300, Sasha Levin wrote:
> rcu_diag: rcuc5 2/340/1 

OK, this one (RCU's per-CPU kthread for CPU 5) has work to do (this
is the "1" after the final "/").  Does rcuc5 show up later in the
output?

							Thanx, Paul

Sasha Levin June 14, 2011, 10:26 p.m. UTC | #9
Sorry for the delay on this.

On Sat, 2011-06-04 at 09:30 -0700, Paul E. McKenney wrote:
> > rcu_diag: rcuc5 2/340/1 
> 
> OK, this one (RCU's per-CPU kthread for CPU 5) has work to do (this
> is the "1" after the final "/").  Does rcuc5 show up later in the
> output?
> 

The output grinds to a stop; I've pasted the last part of the prints -
there are no more prints at all past this one.

Paul E. McKenney June 14, 2011, 11:42 p.m. UTC | #10
On Tue, Jun 14, 2011 at 06:26:01PM -0400, Sasha Levin wrote:
> Sorry for the delay on this.

Actually, you might have had excellent timing on this one.

Could you please try Shaohua Li's patch at:

	https://lkml.org/lkml/2011/6/14/20

This patch makes RCU much less prone to causing massive scheduler lock
contention, which might well be the cause of the problems that you
are seeing.

							Thanx, Paul

> On Sat, 2011-06-04 at 09:30 -0700, Paul E. McKenney wrote:
> > On Sat, Jun 04, 2011 at 09:26:15AM +0300, Sasha Levin wrote:
> > > On Fri, 2011-06-03 at 16:05 -0700, Paul E. McKenney wrote:
> > > > On Sat, Jun 04, 2011 at 01:54:45AM +0300, Sasha Levin wrote:
> > > > > On Fri, 2011-06-03 at 14:20 -0700, Paul E. McKenney wrote:
> > > > > > On Sat, Jun 04, 2011 at 12:03:59AM +0300, Sasha Levin wrote:
> > > > > > > On Fri, 2011-06-03 at 13:22 -0700, Paul E. McKenney wrote:
> > > > > > > > On Fri, Jun 03, 2011 at 10:56:20PM +0300, Sasha Levin wrote:
> > > > > > > > > On Fri, 2011-06-03 at 12:31 -0700, Paul E. McKenney wrote:
> > > > > > > > > > On Fri, Jun 03, 2011 at 10:54:19AM +0300, Sasha Levin wrote:
> > > > > > > > > > > On Fri, 2011-06-03 at 09:34 +0200, Ingo Molnar wrote:
> > > > > > > > > > > > * Sasha Levin <levinsasha928@gmail.com> wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > > with no apparent progress being made.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Since it's something that worked in 2.6.37, I've looked into it to 
> > > > > > > > > > > > > find what might have caused this issue.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I've bisected guest kernels and found that the problem starts with:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > > > > > > > > > > > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > > > > > > > > > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > > > > > > > > > > Date:   Wed Jan 12 14:10:23 2011 -0800
> > > > > > > > > > > > > 
> > > > > > > > > > > > >     rcu: move TREE_RCU from softirq to kthread
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Ingo, could you confirm that the problem goes away for you when you 
> > > > > > > > > > > > > use an earlier commit?
> > > > > > > > > > > > 
> > > > > > > > > > > > testing will have to wait, but there's a recent upstream fix:
> > > > > > > > > > > > 
> > > > > > > > > > > >   d72bce0e67e8: rcu: Cure load woes
> > > > > > > > > > > > 
> > > > > > > > > > > > That *might* perhaps address this problem too.
> > > > > > > > > > > > 
> > > > > > > > > > > I've re-tested with Linus's current git, the problem is still there.
> > > > > > > > > > > 
> > > > > > > > > > > > If not then this appears to be some sort of RCU related livelock with 
> > > > > > > > > > > > brutally overcommitted vcpus. On native this would show up too, in a 
> > > > > > > > > > > > less drastic form, as a spurious bootup delay.
> > > > > > > > > > > 
> > > > > > > > > > > I don't think it was overcommited by *that* much. With that commit it
> > > > > > > > > > > usually hangs at 20-40 vcpus, while without it I can go up to 255.
> > > > > > > > > > 
> > > > > > > > > > Here is a diagnostic patch, untested.  It assumes that your system
> > > > > > > > > > has only a few CPUs (maybe 8-16) and that timers are still running.
> > > > > > > > > > It dumps out some RCU state if grace periods extend for more than
> > > > > > > > > > a few seconds.
> > > > > > > > > > 
> > > > > > > > > > To activate it, call rcu_diag_timer_start() from process context.
> > > > > > > > > > To stop it, call rcu_diag_timer_stop(), also from process context.
> > > > > > > > > 
> > > > > > > > > Since the hang happens in guest kernel very early during boot, I can't
> > > > > > > > > call rcu_diag_timer_start(). What would be a good place to put the
> > > > > > > > > _start() code instead?
> > > > > > > > 
> > > > > > > > Assuming that the failure occurs in the host OS rather than in the guest
> > > > > > > > OS, I suggest placing rcu_diag_timer_start() in the host code that starts
> > > > > > > > up the guest.
> > > > > > > > 
> > > > > > > > On the other hand, if the failure is occuring in the guest OS, then
> > > > > > > > I suggest placing the call to rcu_diag_timer_start() just after timer
> > > > > > > > initialization -- that is, assuming that interrupts are enabled at the
> > > > > > > > time of the failure.  If interrupts are not yet enabled at the time of
> > > > > > > > the failure, color me clueless.
> > > > > > > 
> > > > > > > It happens in the guest OS, the bisection was done on a guest kernel.
> > > > > > > 
> > > > > > > I've placed the rcu debug _start() right at the end of init_timers()
> > > > > > > which happens before the hang, but I'm not seeing any output from the
> > > > > > > debug function.
> > > > > > 
> > > > > > Sounds like the hang is happening before timers start?  Are scheduling
> > > > > > clock interrupts happening in the guest, as in scheduler_tick()?  If so,
> > > > > > I could just as easily put the debug there.
> > > > > 
> > > > > Yes, scheduler_tick() is running before the hang.
> > > > > Should I just call the handler function directly from there?
> > > > 
> > > > I suggest calling it no more than once every few seconds, but yes.
> > > > (Otherwise you will get a very full dmesg.)
> > > 
> > > Here is the last group of printk(), basically it just repeats itself
> > > when it's stuck:
> > > 
> > > KVM setup async PF for cpu 19
> > >  0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc17 2/46/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc18 2/23/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc19 0/0/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > [    0.527044]  #20
> > >  0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc3 0/0/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc4 2/358/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc5 2/340/1 
> > 
> > OK, this one (RCU's per-CPU kthread for CPU 5) has work to do (this
> > is the "1" after the final "/").  Does rcuc5 show up later in the
> > output?
> > 
> 
> The output grinds to a stop, I've pasted the last part of the prints -
> there are no more prints at all past this one.
> 
> > 							Thanx, Paul
> > 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc6 2/321/1 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc7 2/301/1 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc8 2/277/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc9 2/255/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc10 2/226/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc11 2/214/1 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc12 2/193/1 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc13 2/172/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc14 2/146/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc15 2/130/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc16 2/107/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc17 2/86/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc18 2/62/1 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > rcu_diag: rcuc19 2/39/0 
> > > kworker/0:1     R  running task        0     0     14 0x00000008
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > Call Trace:
> > > 
> > > 
> > > -- 
> > > 
> > > Sasha.
> > > 
> 
> -- 
> 
> Sasha.
> 
Sasha Levin June 15, 2011, 1:25 a.m. UTC | #11
On Tue, 2011-06-14 at 16:42 -0700, Paul E. McKenney wrote:
> On Tue, Jun 14, 2011 at 06:26:01PM -0400, Sasha Levin wrote:
> > Sorry for the delay on this.
> 
> Actually, you might have had excellent timing on this one.
> 
> Could you please try Shaohua Li's patch at:
> 
> 	https://lkml.org/lkml/2011/6/14/20
> 
> This patch makes RCU much less prone to causing massive scheduler lock
> contention, which might well be the cause of the problems that you
> are seeing.

This patch has solved the hang issue I've had. kvm tools now boots 60
vcpus easily with a 3.0 kernel.
Paul E. McKenney June 15, 2011, 4:22 a.m. UTC | #12
On Tue, Jun 14, 2011 at 09:25:22PM -0400, Sasha Levin wrote:
> On Tue, 2011-06-14 at 16:42 -0700, Paul E. McKenney wrote:
> > On Tue, Jun 14, 2011 at 06:26:01PM -0400, Sasha Levin wrote:
> > > Sorry for the delay on this.
> > 
> > Actually, you might have had excellent timing on this one.
> > 
> > Could you please try Shaohua Li's patch at:
> > 
> > 	https://lkml.org/lkml/2011/6/14/20
> > 
> > This patch makes RCU much less prone to causing massive scheduler lock
> > contention, which might well be the cause of the problems that you
> > are seeing.
> 
> This patch has solved the hang issue I've had. kvm tools now boots 60
> vcpus easily with a 3.0 kernel.

Thank you, Sasha -- I have added your Tested-by.

							Thanx, Paul
diff mbox

Patch

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 99f9aa7..489ea1b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -80,6 +80,8 @@  extern void call_rcu_sched(struct rcu_head *head,
 extern void synchronize_sched(void);
 extern void rcu_barrier_bh(void);
 extern void rcu_barrier_sched(void);
+extern void rcu_diag_timer_start(void);
+extern void rcu_diag_timer_stop(void);
 
 static inline void __rcu_read_lock_bh(void)
 {
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 89419ff..bb61574 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2423,3 +2423,48 @@  void __init rcu_init(void)
 }
 
 #include "rcutree_plugin.h"
+
+/* Diagnostic code for boot-time hangs observed in early 3.0 days. */
+
+static int rcu_diag_timer_must_stop;
+struct timer_list rcu_diag_timer;
+#define RCU_DIAG_TIMER_PERIOD	(10 * HZ)
+
+static void rcu_diag_timer_handler(unsigned long unused)
+{
+	int cpu;
+
+	if (rcu_diag_timer_must_stop)
+		return;
+
+	if(ULONG_CMP_GE(jiffies,
+			rcu_sched_state.gp_start + RCU_DIAG_TIMER_PERIOD))
+		for_each_online_cpu(cpu) {
+			printk(KERN_ALERT "rcu_diag: rcuc%d %u/%u/%d ",
+			       cpu,
+			       per_cpu(rcu_cpu_kthread_status, cpu),
+			       per_cpu(rcu_cpu_kthread_loops, cpu),
+			       per_cpu(rcu_cpu_has_work, cpu));
+			sched_show_task(current);
+		}
+
+	if (rcu_diag_timer_must_stop)
+		return;
+	mod_timer(&rcu_diag_timer, RCU_DIAG_TIMER_PERIOD + jiffies);
+}
+
+void rcu_diag_timer_start(void)
+{
+	rcu_diag_timer_must_stop = 0;
+	setup_timer(&rcu_diag_timer,
+		    rcu_diag_timer_handler, (unsigned long) NULL);
+	mod_timer(&rcu_diag_timer, RCU_DIAG_TIMER_PERIOD + jiffies);
+}
+EXPORT_SYMBOL_GPL(rcu_diag_timer_start);
+
+void rcu_diag_timer_stop(void)
+{
+	rcu_diag_timer_must_stop = 1;
+	del_timer(&rcu_diag_timer);
+}
+EXPORT_SYMBOL_GPL(rcu_diag_timer_stop);
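
Editor's note: as a usage sketch only (not part of the original patch), the
diagnostic is driven entirely by the two exported entry points above, so the
simplest way to exercise it from process context is a throwaway module that
starts the timer on load and stops it on unload. The module name and file
below are arbitrary; the sketch assumes the patch above is applied, so that
rcu_diag_timer_start() and rcu_diag_timer_stop() are declared in
<linux/rcupdate.h> and exported to modules.

/* rcu_diag_mod.c -- illustrative helper, not part of the original patch. */
#include <linux/module.h>
#include <linux/rcupdate.h>

static int __init rcu_diag_mod_init(void)
{
	/* Process context: arm the periodic (10 * HZ) diagnostic timer. */
	rcu_diag_timer_start();
	return 0;
}

static void __exit rcu_diag_mod_exit(void)
{
	/* Process context: tell the handler to stop and delete the timer. */
	rcu_diag_timer_stop();
}

module_init(rcu_diag_mod_init);
module_exit(rcu_diag_mod_exit);
MODULE_LICENSE("GPL");

Once armed, each line the handler prints reads "rcuc<cpu> status/loops/has_work",
matching the three per_cpu values in the printk() above; as noted earlier in
the thread, a non-zero final field means that CPU's rcuc kthread still has
work pending.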