locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

Message ID 1406801797-20139-1-git-send-email-ilya.dryomov@inktank.com (mailing list archive)
State New, archived

Commit Message

Ilya Dryomov July 31, 2014, 10:16 a.m. UTC
This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.

This commit can lead to deadlocks by way of what at a high level
appears to be a missing wakeup on mutex_unlock() when
CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
their kernels.  In particular, it causes reproducible deadlocks in
libceph/rbd code under higher than moderate loads, with the evidence
pointing to the bowels of mutex_lock().

kernel/locking/mutex.c, __mutex_lock_common():
476         osq_unlock(&lock->osq);
477 slowpath:
478         /*
479          * If we fell out of the spin path because of need_resched(),
480          * reschedule now, before we try-lock the mutex. This avoids getting
481          * scheduled out right after we obtained the mutex.
482          */
483         if (need_resched())
484                 schedule_preempt_disabled(); <-- never returns
485 #endif
486         spin_lock_mutex(&lock->wait_lock, flags);
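
For reference, schedule_preempt_disabled() in kernels of this vintage is
just schedule() wrapped to drop and re-take the preempt count, which is
why "never returns" is anomalous: it should come back as soon as the
task is selected to run again.

kernel/sched/core.c, schedule_preempt_disabled():

void __sched schedule_preempt_disabled(void)
{
        sched_preempt_enable_no_resched();
        schedule();
        preempt_disable();
}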

We started bumping into deadlocks in QA the day our branch was rebased
onto 3.15 (the release this commit went into), but as part of the
debugging effort I enabled all locking debug options, which also
disabled CONFIG_MUTEX_SPIN_ON_OWNER and made the problem disappear,
which is why it wasn't looked into until now.  The revert makes the
problem go away, as confirmed by our users.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org # 3.15
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 kernel/locking/mutex.c |    7 -------
 1 file changed, 7 deletions(-)

Comments

Peter Zijlstra July 31, 2014, 11:57 a.m. UTC | #1
On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
> 
> This commit can lead to deadlocks by way of what at a high level
> appears to be a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels.  In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher than moderate loads, with the evidence
> pointing to the bowels of mutex_lock().
> 
> kernel/locking/mutex.c, __mutex_lock_common():
> 476         osq_unlock(&lock->osq);
> 477 slowpath:
> 478         /*
> 479          * If we fell out of the spin path because of need_resched(),
> 480          * reschedule now, before we try-lock the mutex. This avoids getting
> 481          * scheduled out right after we obtained the mutex.
> 482          */
> 483         if (need_resched())
> 484                 schedule_preempt_disabled(); <-- never returns
> 485 #endif
> 486         spin_lock_mutex(&lock->wait_lock, flags);
> 
> We started bumping into deadlocks in QA the day our branch was rebased
> onto 3.15 (the release this commit went into), but as part of the
> debugging effort I enabled all locking debug options, which also
> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made the problem disappear,
> which is why it wasn't looked into until now.  The revert makes the
> problem go away, as confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.
Ilya Dryomov July 31, 2014, 12:37 p.m. UTC | #2
On Thu, Jul 31, 2014 at 3:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
>> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>>
>> [...]
>
> This doesn't make sense and you fail to explain how this can possibly
> deadlock.

This didn't make sense to me at first either, and I'll be happy to be
proven wrong, but we can reproduce this with rbd very reliably under
higher than usual load, and the revert makes it go away.  What we are
seeing in the rbd scenario is the following.

Suppose foo needs mutexes A and B, and bar needs mutex B.  foo acquires
A and then wants to acquire B, but B is held by bar.  foo spins
a little and ends up calling schedule_preempt_disabled() on line 484
above, but that call never returns, even though a hundred usecs later
bar releases B.  foo ends up stuck in mutex_lock() indefinitely but
still holds A, and everybody else who needs A queues up behind it.
Given that this A happens to be a central libceph mutex, all rbd
activity halts.  Deadlock may not be the best term for this, but never
returning from mutex_lock(&B) even though B has been unlocked is *a*
problem.
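
To make the ordering concrete, here is a minimal kernel-C sketch of
that scenario.  The thread functions, mutex names and timing below are
illustrative stand-ins, not the actual libceph/rbd code paths:

#include <linux/delay.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(A);         /* stands in for the central libceph mutex */
static DEFINE_MUTEX(B);

/* "foo": takes A, then blocks on B while bar still holds it */
static int foo_thread(void *unused)
{
        mutex_lock(&A);
        mutex_lock(&B);         /* spins, sees need_resched(), calls
                                 * schedule_preempt_disabled() on line 484
                                 * and, per the above, never returns even
                                 * after bar has released B */
        mutex_unlock(&B);
        mutex_unlock(&A);       /* never reached, so everybody who needs A
                                 * piles up behind foo */
        return 0;
}

/* "bar": briefly holds B, then releases it */
static int bar_thread(void *unused)
{
        mutex_lock(&B);
        usleep_range(100, 200); /* do a little work */
        mutex_unlock(&B);       /* should let foo acquire B and move on */
        return 0;
}

When run as two kthreads with CONFIG_MUTEX_SPIN_ON_OWNER=y and enough
concurrent lockers, this is the pattern that ends up stuck for us.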

This obviously doesn't happen every time schedule_preempt_disabled() on
line 484 is called, so there must be some sort of race here.  I'll send
along the actual rbd stack traces shortly.

Thanks,

                Ilya

Patch

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index acca2c1a3c5e..746ff280a2fc 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -475,13 +475,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	}
 	osq_unlock(&lock->osq);
 slowpath:
-	/*
-	 * If we fell out of the spin path because of need_resched(),
-	 * reschedule now, before we try-lock the mutex. This avoids getting
-	 * scheduled out right after we obtained the mutex.
-	 */
-	if (need_resched())
-		schedule_preempt_disabled();
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);