diff mbox

BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25

Message ID 20140422141650.7f43d5ba@gandalf.local.home (mailing list archive)
State New, archived
Headers show

Commit Message

Steven Rostedt April 22, 2014, 6:16 p.m. UTC
On Tue, 22 Apr 2014 13:48:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I need to take a deeper look into the actual code. But as trylocks on
> UP are nops (always succeed), and if it expects to be able to do
> something in a critical section that is protected by spinlocks (again
> nops on UP), this would be broken for UP.

Reading the code, I see it's broken. We should add something like this:

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---

Comments

Sebastian Andrzej Siewior April 23, 2014, 7:14 a.m. UTC | #1
On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
>  		__run_timers(base);
>  }
>  
> +#ifdef CONFIG_SMP
> +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> +#else
> +#define timer_should_raise_softirq(lock)	1
> +#endif
> +

No. The lock may be taken but it also may be available no matter if UP
or not. With this patch applied the lockdep splat will go away but the
FULL_NO_HZ people will come back because the timer softirq is scheduled
even if no timer has expired.

Sebastian
Peter Zijlstra April 23, 2014, 8:49 a.m. UTC | #2
On Wed, Apr 23, 2014 at 09:14:33AM +0200, Sebastian Andrzej Siewior wrote:
> On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> >  		__run_timers(base);
> >  }
> >  
> > +#ifdef CONFIG_SMP
> > +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> > +#else
> > +#define timer_should_raise_softirq(lock)	1
> > +#endif
> > +
> 
> No. The lock may be taken but it also may be available no matter if UP
> or not. With this patch applied the lockdep splat will go away but the
> FULL_NO_HZ people will come back because the timer softirq is scheduled
> even if no timer has expired.

You can do FULL_NO_HZ on UP ? That seems rather pointless, as there
isn't a 'spare' CPU to do all the timekeeping on.
Steven Rostedt April 23, 2014, 12:15 p.m. UTC | #3
On Wed, 23 Apr 2014 09:14:33 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> >  		__run_timers(base);
> >  }
> >  
> > +#ifdef CONFIG_SMP
> > +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> > +#else
> > +#define timer_should_raise_softirq(lock)	1
> > +#endif
> > +
> 
> No. The lock may be taken but it also may be available no matter if UP
> or not. With this patch applied the lockdep splat will go away but the
> FULL_NO_HZ people will come back because the timer softirq is scheduled
> even if no timer has expired.

Although, as Peter said FULL_NO_HZ is pretty pointless on UP, but I've
been thinking this from a non PREEMPT_RT viewpoint. In non PREEMPT_RT,
spin_locks() are nops, where this is an issue, but in PREEMPT_RT, they
are mutexes, BUT!

The rt_mutex use wait_lock to protect its internal state, which is a raw
spinlock, and in UP it's a nop. That means there's nothing preventing
an interrupt here from corrupting the rtmutex's internal state. I see
this is still an issue, and the warning is still valid.

-- Steve
Sebastian Andrzej Siewior May 2, 2014, 6:38 p.m. UTC | #4
* Steven Rostedt | 2014-04-22 14:16:50 [-0400]:

>On Tue, 22 Apr 2014 13:48:02 -0400
>Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> I need to take a deeper look into the actual code. But as trylocks on
>> UP are nops (always succeed), and if it expects to be able to do
>> something in a critical section that is protected by spinlocks (again
>> nops on UP), this would be broken for UP.
>
>Reading the code, I see it's broken. We should add something like this:
>
>Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>---
>diff --git a/kernel/timer.c b/kernel/timer.c
>index cc34e42..a03164a 100644
>--- a/kernel/timer.c
>+++ b/kernel/timer.c
>@@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> 		__run_timers(base);
> }
> 
>+#ifdef CONFIG_SMP
>+#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
>+#else
>+#define timer_should_raise_softirq(lock)	1
>+#endif
>+
> /*
>  * Called by the local, per-CPU timer interrupt on SMP.
>  */
>@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> 		return;
> 	}
> 
>-	if (!spin_do_trylock(&base->lock)) {
>+	if (timer_should_raise_softirq(&base->lock)) {
> 		raise_softirq(TIMER_SOFTIRQ);
> 		return;
> 	}

Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
wouldn't complain on UP. I still wouldn't say it is broken but that is a
different story.
We have two users of this trylock. run_local_timers() which pops up
quite often (and you patched here) and the other is
get_next_timer_interrupt(). What do you suggest we do here? It is
basically the same thing.

Sebastian
Thomas Gleixner May 2, 2014, 7:01 p.m. UTC | #5
On Fri, 2 May 2014, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2014-04-22 14:16:50 [-0400]:
> > /*
> >  * Called by the local, per-CPU timer interrupt on SMP.
> >  */
> >@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> > 		return;
> > 	}
> > 
> >-	if (!spin_do_trylock(&base->lock)) {
> >+	if (timer_should_raise_softirq(&base->lock)) {
> > 		raise_softirq(TIMER_SOFTIRQ);
> > 		return;
> > 	}
> 
> Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
> wouldn't complain on UP. I still wouldn't say it is broken but that is a
> different story.
> We have two users of this trylock. run_local_timers() which pops up
> quite often (and you patched here) and the other is
> get_next_timer_interrupt(). What do you suggest we do here? It is
> basically the same thing.

It's different as it CANNOT fail on UP. That's called from the idle
code and there is no way that anything holds that lock on UP when idle
runs.

Thanks,

	tglx
Thomas Gleixner May 2, 2014, 7:36 p.m. UTC | #6
On Fri, 2 May 2014, Thomas Gleixner wrote:
> On Fri, 2 May 2014, Sebastian Andrzej Siewior wrote:
> > * Steven Rostedt | 2014-04-22 14:16:50 [-0400]:
> > > /*
> > >  * Called by the local, per-CPU timer interrupt on SMP.
> > >  */
> > >@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> > > 		return;
> > > 	}
> > > 
> > >-	if (!spin_do_trylock(&base->lock)) {
> > >+	if (timer_should_raise_softirq(&base->lock)) {
> > > 		raise_softirq(TIMER_SOFTIRQ);
> > > 		return;
> > > 	}
> > 
> > Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
> > wouldn't complain on UP. I still wouldn't say it is broken but that is a
> > different story.
> > We have two users of this trylock. run_local_timers() which pops up
> > quite often (and you patched here) and the other is
> > get_next_timer_interrupt(). What do you suggest we do here? It is
> > basically the same thing.
> 
> It's different as it CANNOT fail on UP. That's called from the idle
> code and there is no way that anything holds that lock on UP when idle
> runs.

So yeah, you are right, that it's called from irq_exit() so it needs
an annotation at least. Maybe it's really cleaner to make it #if SMP
as well just for clarity raisins.

Thanks,

	tglx
diff mbox

Patch

diff --git a/kernel/timer.c b/kernel/timer.c
index cc34e42..a03164a 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1447,6 +1447,12 @@  static void run_timer_softirq(struct softirq_action *h)
 		__run_timers(base);
 }
 
+#ifdef CONFIG_SMP
+#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
+#else
+#define timer_should_raise_softirq(lock)	1
+#endif
+
 /*
  * Called by the local, per-CPU timer interrupt on SMP.
  */
@@ -1467,7 +1473,7 @@  void run_local_timers(void)
 		return;
 	}
 
-	if (!spin_do_trylock(&base->lock)) {
+	if (timer_should_raise_softirq(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}