
RFC: mutex: hung tasks on SMP platforms with asm-generic/mutex-xchg.h

Message ID alpine.LFD.2.02.1208091243080.5231@xanadu.home (mailing list archive)
State New, archived

Commit Message

Nicolas Pitre Aug. 9, 2012, 4:57 p.m. UTC
On Thu, 9 Aug 2012, Nicolas Pitre wrote:

> On Thu, 9 Aug 2012, Will Deacon wrote:
> 
> > I think we could actually fix this entirely in mutex-xchg.h by doing
> > something in fastpath_lock similar to what we do for trylock:
> > 
> > 
> > diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> > index 580a6d3..c082e99 100644
> > --- a/include/asm-generic/mutex-xchg.h
> > +++ b/include/asm-generic/mutex-xchg.h
> > @@ -25,8 +25,19 @@
> >  static inline void
> >  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> >  {
> > -       if (unlikely(atomic_xchg(count, 0) != 1))
> > -               fail_fn(count);
> > +       int prev = atomic_xchg(count, 0);
> > +
> > +       if (unlikely(prev != 1)) {
> > +               if (prev < 0)
> > +                       /*
> > +                        * The lock was contended, so we need to restore
> > +                        * its original state to ensure that any waiting
> > +                        * tasks are woken up by the unlock slow path.
> > +                        */
> > +                       prev = atomic_xchg(count, prev);
> > +               if (prev != 1)
> > +                       fail_fn(count);
> > +       }
> >  }
> > 
> > What do you reckon?
> 
> Yes, that looks fine.  I'd remove that if (prev < 0) entirely though.  
> We'll just swap a 0 for a 0 if the count wasn't < 0, or a 0 for a 1 if 
> the mutex just got unlocked which is also fine.  This is especially 
> beneficial when a native xchg processor instruction is used.

In fact, this suggestion isn't entirely correct either. The value stored 
by the inner xchg should be -1, not the previously observed count. If the 
count was 0 and the mutex becomes contended in the small window between 
the two xchgs, then the contention mark would be lost again.

In other words, I think this should look like this:



Nicolas

Comments

Will Deacon Aug. 9, 2012, 5:50 p.m. UTC | #1
On Thu, Aug 09, 2012 at 05:57:33PM +0100, Nicolas Pitre wrote:
> On Thu, 9 Aug 2012, Nicolas Pitre wrote:
> > Yes, that looks fine.  I'd remove that if (prev < 0) entirely though.  
> > We'll just swap a 0 for a 0 if the count wasn't < 0, or a 0 for a 1 if 
> > the mutex just got unlocked which is also fine.  This is especially 
> > beneficial when a native xchg processor instruction is used.
> 
> In fact, this suggestion isn't entirely correct either. The value stored 
> by the inner xchg should be -1, not the previously observed count. If the 
> count was 0 and the mutex becomes contended in the small window between 
> the two xchgs, then the contention mark would be lost again.
> 
> In other words, I think this should look like this:
> 
> diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> index 580a6d35c7..44a66c99c8 100644
> --- a/include/asm-generic/mutex-xchg.h
> +++ b/include/asm-generic/mutex-xchg.h
> @@ -25,8 +25,11 @@
>  static inline void
>  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
>  {
> -	if (unlikely(atomic_xchg(count, 0) != 1))
> -		fail_fn(count);
> +	if (unlikely(atomic_xchg(count, 0) != 1)) {
> +		/* Mark lock contention explicitly */
> +		if (likely(atomic_xchg(count, -1) != 1))
> +			fail_fn(count);
> +	}
>  }
>  
>  /**

Doesn't this mean that we're no longer just swapping 0 for a 0 if the lock
was taken, therefore needlessly sending the current owner down the slowpath
on unlock? If we have the explicit test, that doesn't happen and the
disassembly isn't much different (an additional cmp/conditional branch).

Will
Nicolas Pitre Aug. 9, 2012, 6:09 p.m. UTC | #2
On Thu, 9 Aug 2012, Will Deacon wrote:

> On Thu, Aug 09, 2012 at 05:57:33PM +0100, Nicolas Pitre wrote:
> > On Thu, 9 Aug 2012, Nicolas Pitre wrote:
> > > Yes, that looks fine.  I'd remove that if (prev < 0) entirely though.  
> > > We'll just swap a 0 for a 0 if the count wasn't < 0, or a 0 for a 1 if 
> > > the mutex just got unlocked which is also fine.  This is especially 
> > > beneficial when a native xchg processor instruction is used.
> > 
> > In fact, this suggestion isn't entirely correct either. The value stored 
> > by the inner xchg should be -1, not the previously observed count. If the 
> > count was 0 and the mutex becomes contended in the small window between 
> > the two xchgs, then the contention mark would be lost again.
> > 
> > In other words, I think this should look like this:
> > 
> > diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> > index 580a6d35c7..44a66c99c8 100644
> > --- a/include/asm-generic/mutex-xchg.h
> > +++ b/include/asm-generic/mutex-xchg.h
> > @@ -25,8 +25,11 @@
> >  static inline void
> >  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> >  {
> > -	if (unlikely(atomic_xchg(count, 0) != 1))
> > -		fail_fn(count);
> > +	if (unlikely(atomic_xchg(count, 0) != 1)) {
> > +		/* Mark lock contention explicitly */
> > +		if (likely(atomic_xchg(count, -1) != 1))
> > +			fail_fn(count);
> > +	}
> >  }
> >  
> >  /**
> 
> Doesn't this mean that we're no longer just swapping 0 for a 0 if the lock
> was taken, therefore needlessly sending the current owner down the slowpath
> on unlock?

If the lock was taken, this means the count was either 0 or -1.  If it 
was 1 then we just put a 0 there and we own it.  But if the count was 0 
then we should store -1 instead, which is what the inner xchg does.  If 
the count was already -1 then we store -1 back.  That more closely mimics 
what the atomic decrement does, which is what we want.


Nicolas
Will Deacon Aug. 9, 2012, 6:17 p.m. UTC | #3
On Thu, Aug 09, 2012 at 07:09:02PM +0100, Nicolas Pitre wrote:
> On Thu, 9 Aug 2012, Will Deacon wrote:
> > On Thu, Aug 09, 2012 at 05:57:33PM +0100, Nicolas Pitre wrote:
> > > On Thu, 9 Aug 2012, Nicolas Pitre wrote:
> > > diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> > > index 580a6d35c7..44a66c99c8 100644
> > > --- a/include/asm-generic/mutex-xchg.h
> > > +++ b/include/asm-generic/mutex-xchg.h
> > > @@ -25,8 +25,11 @@
> > >  static inline void
> > >  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> > >  {
> > > -	if (unlikely(atomic_xchg(count, 0) != 1))
> > > -		fail_fn(count);
> > > +	if (unlikely(atomic_xchg(count, 0) != 1)) {
> > > +		/* Mark lock contention explicitly */
> > > +		if (likely(atomic_xchg(count, -1) != 1))
> > > +			fail_fn(count);
> > > +	}
> > >  }
> > >  
> > >  /**
> > 
> > Doesn't this mean that we're no longer just swapping 0 for a 0 if the lock
> > was taken, therefore needlessly sending the current owner down the slowpath
> > on unlock?
> 
> If the lock was taken, this means the count was either 0 or -1.  If it 
> was 1 then we just put a 0 there and we own it.  But if the count was 0 
> then we should store -1 instead, which is what the inner xchg does.  If 
> the count was already -1 then we store -1 back.  That more closely mimics 
> what the atomic decrement does, which is what we want.

Ok, I just wasn't sure that marking the lock contended was required when it
was previously locked, given that we'll drop into spinning on the owner
anyway.

I'll add a commit message to the above and re-post if that's ok?

Will
Nicolas Pitre Aug. 9, 2012, 8:05 p.m. UTC | #4
On Thu, 9 Aug 2012, Will Deacon wrote:

> On Thu, Aug 09, 2012 at 07:09:02PM +0100, Nicolas Pitre wrote:
> > On Thu, 9 Aug 2012, Will Deacon wrote:
> > > On Thu, Aug 09, 2012 at 05:57:33PM +0100, Nicolas Pitre wrote:
> > > > On Thu, 9 Aug 2012, Nicolas Pitre wrote:
> > > > diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> > > > index 580a6d35c7..44a66c99c8 100644
> > > > --- a/include/asm-generic/mutex-xchg.h
> > > > +++ b/include/asm-generic/mutex-xchg.h
> > > > @@ -25,8 +25,11 @@
> > > >  static inline void
> > > >  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> > > >  {
> > > > -	if (unlikely(atomic_xchg(count, 0) != 1))
> > > > -		fail_fn(count);
> > > > +	if (unlikely(atomic_xchg(count, 0) != 1)) {
> > > > +		/* Mark lock contention explicitly */
> > > > +		if (likely(atomic_xchg(count, -1) != 1))
> > > > +			fail_fn(count);
> > > > +	}
> > > >  }
> > > >  
> > > >  /**
> > > 
> > > Doesn't this mean that we're no longer just swapping 0 for a 0 if the lock
> > > was taken, therefore needlessly sending the current owner down the slowpath
> > > on unlock?
> > 
> > If the lock was taken, this means the count was either 0 or -1.  If it 
> > was 1 then we just put a 0 there and we own it.  But if the count was 0 
> > then we should store -1 instead, which is what the inner xchg does.  If 
> > the count was already -1 then we store -1 back.  That more closely mimics 
> > what the atomic decrement does, which is what we want.
> 
> Ok, I just wasn't sure that marking the lock contended was required when it
> was previously locked, given that we'll drop into spinning on the owner
> anyway.

That's fine, and the owner will put 1 back when it unlocks, as well as 
processing the wait queue, which is what we need.

> I'll add a commit message to the above and re-post if that's ok?

Sure.  Don't forget to update __mutex_fastpath_lock_retval() as well.


Nicolas
Peter Zijlstra Aug. 13, 2012, 8:15 a.m. UTC | #5
On Thu, 2012-08-09 at 12:57 -0400, Nicolas Pitre wrote:
> 
> In other words, I think this should look like this:
> 
> diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> index 580a6d35c7..44a66c99c8 100644
> --- a/include/asm-generic/mutex-xchg.h
> +++ b/include/asm-generic/mutex-xchg.h
> @@ -25,8 +25,11 @@
>  static inline void
>  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
>  {
> -       if (unlikely(atomic_xchg(count, 0) != 1))
> -               fail_fn(count);
> +       if (unlikely(atomic_xchg(count, 0) != 1)) {
> +               /* Mark lock contention explicitly */
> +               if (likely(atomic_xchg(count, -1) != 1))
> +                       fail_fn(count);
> +       }
>  } 

OK, I like this. Thanks, guys! Will, will you send a final and complete
patch?
Will Deacon Aug. 13, 2012, 9:13 a.m. UTC | #6
Hi Peter,

On Mon, Aug 13, 2012 at 09:15:04AM +0100, Peter Zijlstra wrote:
> On Thu, 2012-08-09 at 12:57 -0400, Nicolas Pitre wrote:
> > 
> > In other words, I think this should look like this:
> > 
> > diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
> > index 580a6d35c7..44a66c99c8 100644
> > --- a/include/asm-generic/mutex-xchg.h
> > +++ b/include/asm-generic/mutex-xchg.h
> > @@ -25,8 +25,11 @@
> >  static inline void
> >  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> >  {
> > -       if (unlikely(atomic_xchg(count, 0) != 1))
> > -               fail_fn(count);
> > +       if (unlikely(atomic_xchg(count, 0) != 1)) {
> > +               /* Mark lock contention explicitly */
> > +               if (likely(atomic_xchg(count, -1) != 1))
> > +                       fail_fn(count);
> > +       }
> >  } 
> 
> OK, I like this. Thanks, guys! Will, will you send a final and complete
> patch?

I sent one out on Friday but somehow managed to drop you from CC -- sorry
about that:

  http://thread.gmane.org/gmane.linux.kernel/1341305

Cheers,

Will
Peter Zijlstra Aug. 13, 2012, 2:05 p.m. UTC | #7
On Mon, 2012-08-13 at 09:35 -0400, Nicolas Pitre wrote:
> Date: Fri, 10 Aug 2012 15:22:09 +0100
> From: Will Deacon <will.deacon@arm.com>
> Subject: [PATCH] mutex: place lock in contended state after fastpath_lock failure
> 
> ARM recently moved to asm-generic/mutex-xchg.h for its mutex
> implementation after the previous implementation was found to be missing
> some crucial memory barriers. However, this has revealed some problems
> running hackbench on SMP platforms due to the way in which the
> MUTEX_SPIN_ON_OWNER code operates.
> 
> The symptoms are that a bunch of hackbench tasks are left waiting on an
> unlocked mutex and therefore never get woken up to claim it. This boils
> down to the following sequence of events:
> 
>         Task A        Task B        Task C        Lock value
> 0                                                     1
> 1       lock()                                        0
> 2                     lock()                          0
> 3                     spin(A)                         0
> 4       unlock()                                      1
> 5                                   lock()            0
> 6                     cmpxchg(1,0)                    0
> 7                     contended()                    -1
> 8       lock()                                        0
> 9       spin(C)                                       0
> 10                                  unlock()          1
> 11      cmpxchg(1,0)                                  0
> 12      unlock()                                      1
> 
> At this point, the lock is unlocked, but Task B is in an uninterruptible
> sleep with nobody to wake it up.
> 
> This patch fixes the problem by ensuring we put the lock into the
> contended state if we fail to acquire it on the fastpath, ensuring that
> any blocked waiters are woken up when the mutex is released.
> 
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Chris Mason <chris.mason@fusionio.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Nicolas Pitre <nico@linaro.org> 

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


Will you carry this through the ARM tree or do you want me/Ingo to take
it?
Will Deacon Aug. 13, 2012, 2:11 p.m. UTC | #8
On Mon, Aug 13, 2012 at 03:05:17PM +0100, Peter Zijlstra wrote:
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Thanks Peter.

> Will you carry this through the ARM tree or do you want me/Ingo to take
> it?

Please can you guys take it? The move to mutex-dec can go through the ARM
tree but it's probably best if you take the core change.

Cheers,

Will
Peter Zijlstra Aug. 13, 2012, 2:45 p.m. UTC | #9
On Mon, 2012-08-13 at 15:11 +0100, Will Deacon wrote:
> 
> > Will you carry this through the ARM tree or do you want me/Ingo to take
> > it?
> 
> Please can you guys take it? The move to mutex-dec can go through the ARM
> tree but it's probably best if you take the core change. 

OK, done. Thanks!

Patch

diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
index 580a6d35c7..44a66c99c8 100644
--- a/include/asm-generic/mutex-xchg.h
+++ b/include/asm-generic/mutex-xchg.h
@@ -25,8 +25,11 @@ 
 static inline void
 __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
 {
-	if (unlikely(atomic_xchg(count, 0) != 1))
-		fail_fn(count);
+	if (unlikely(atomic_xchg(count, 0) != 1)) {
+		/* Mark lock contention explicitly */
+		if (likely(atomic_xchg(count, -1) != 1))
+			fail_fn(count);
+	}
 }
 
 /**