Message ID | 1480601214-26583-6-git-send-email-nhaehnle@gmail.com (mailing list archive)
---|---
State | New, archived
On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> @@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> 	debug_mutex_lock_common(lock, &waiter);
> 	debug_mutex_add_waiter(lock, &waiter, task);
>
> -	/* add waiting tasks to the end of the waitqueue (FIFO): */
> -	list_add_tail(&waiter.list, &lock->wait_list);
> +	lock_contended(&lock->dep_map, ip);
> +
> +	if (!use_ww_ctx) {
> +		/* add waiting tasks to the end of the waitqueue (FIFO): */
> +		list_add_tail(&waiter.list, &lock->wait_list);
> +	} else {
> +		/* Add in stamp order, waking up waiters that must back off. */
> +		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
> +		if (ret)
> +			goto err_early_backoff;
> +
> +		waiter.ww_ctx = ww_ctx;
> +	}
> +
> 	waiter.task = task;

Would an unconditional waiter.ww_ctx = ww_ctx be cheap enough? (Same
cacheline write and all that?)

Makes the above clearer in that you have

	if (!ww_ctx) {
		list_add_tail();
	} else {
		ret = __ww_mutex_add_waiter(); /* no need to handle !ww_ctx */
		if (ret)
			goto err_early_backoff;
	}

	waiter.ww_ctx = ww_ctx;
	waiter.task = task;

>
> 	if (__mutex_waiter_is_first(lock, &waiter))
> 		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>
> -	lock_contended(&lock->dep_map, ip);
> -
> 	set_task_state(task, state);
> 	for (;;) {
> 		/*
> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> 		 * mutex_unlock() handing the lock off to us, do a trylock
> 		 * before testing the error conditions to make sure we pick up
> 		 * the handoff.
> +		 *
> +		 * For w/w locks, we always need to do this even if we're not
> +		 * currently the first waiter, because we may have been the
> +		 * first waiter during the unlock.
> 		 */
> -		if (__mutex_trylock(lock, first))
> +		if (__mutex_trylock(lock, use_ww_ctx || first))

I'm not certain about the magic of first vs HANDOFF. Afaict, first ==
HANDOFF and this patch breaks that relationship. I think you need to add
bool handoff; as a separate tracker to first.

> 			goto acquired;
>
> 		/*
> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> 		spin_unlock_mutex(&lock->wait_lock, flags);
> 		schedule_preempt_disabled();
>
> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> +		if (use_ww_ctx && ww_ctx) {
> +			/*
> +			 * Always re-check whether we're in first position. We
> +			 * don't want to spin if another task with a lower
> +			 * stamp has taken our position.
> +			 *
> +			 * We also may have to set the handoff flag again, if
> +			 * our position at the head was temporarily taken away.

Comment makes sense.

Ah. Should this be just if (use_ww_ctx) { /* always recheck... */ ?
Except that !ww_ctx are never gazumped in the list, so if they are
first, then they are always first.

Could you explain that as well (about why !ww_ctx is special here but
not above). And then it can even be reduced to if (ww_ctx) {} to match
the first chunk if the revision is acceptable.
-Chris
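[Editor's illustration] To make the suggested restructuring concrete, here is a minimal sketch (a reconstruction, not code from the patch) that fills in the arguments Chris elided, using only identifiers from the quoted hunk; whether the unconditional waiter.ww_ctx store is cheap enough is exactly the open question above:

	/*
	 * Sketch only: waiter.ww_ctx is written unconditionally, so NULL
	 * covers both mutex_lock() and ww_mutex_lock(lock, NULL) callers.
	 */
	lock_contended(&lock->dep_map, ip);

	if (!ww_ctx) {
		/* add waiting tasks to the end of the waitqueue (FIFO): */
		list_add_tail(&waiter.list, &lock->wait_list);
	} else {
		/* Add in stamp order, waking up waiters that must back off. */
		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
		if (ret)
			goto err_early_backoff;
	}

	waiter.ww_ctx = ww_ctx;
	waiter.task = task;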
On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> +static inline int __sched
> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
> +		      struct mutex *lock,
> +		      struct ww_acquire_ctx *ww_ctx)
> +{
> +	struct mutex_waiter *cur;
> +
> +	if (!ww_ctx) {
> +		list_add_tail(&waiter->list, &lock->wait_list);
> +		return 0;
> +	}
> +
> +	/*
> +	 * Add the waiter before the first waiter with a higher stamp.
> +	 * Waiters without a context are skipped to avoid starving
> +	 * them.
> +	 */
> +	list_for_each_entry(cur, &lock->wait_list, list) {
> +		if (!cur->ww_ctx)
> +			continue;
> +
> +		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
> +			/* Back off immediately if necessary. */
> +			if (ww_ctx->acquired > 0) {
> +#ifdef CONFIG_DEBUG_MUTEXES
> +				struct ww_mutex *ww;
> +
> +				ww = container_of(lock, struct ww_mutex, base);
> +				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> +				ww_ctx->contending_lock = ww;
> +#endif
> +				return -EDEADLK;
> +			}
> +
> +			continue;
> +		}
> +
> +		list_add_tail(&waiter->list, &cur->list);
> +		return 0;
> +	}
> +
> +	list_add_tail(&waiter->list, &lock->wait_list);
> +	return 0;
> +}

So you keep the list in order of stamp, and in general stamps come in,
in-order. That is, barring races on concurrent ww_mutex_lock(), things
are already ordered.

So it doesn't make sense to scan the entire list forwards; that almost
guarantees you scan the entire list every single time.

Or am I reading this wrong? Which in itself is a hint a comment might be
in place.
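[Editor's note] The "stamps come in, in-order" observation follows from how contexts get their stamps: ww_acquire_init() in include/linux/ww_mutex.h draws them from a per-class counter. A simplified sketch (lockdep and deadlock-injection bookkeeping omitted; not the verbatim kernel source):

	static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
					   struct ww_class *ww_class)
	{
		ctx->task = current;
		/* monotonically increasing per ww_class */
		ctx->stamp = atomic_long_inc_return(&ww_class->stamp);
		ctx->acquired = 0;
	}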
On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> 	 * mutex_unlock() handing the lock off to us, do a trylock
> 	 * before testing the error conditions to make sure we pick up
> 	 * the handoff.
> +	 *
> +	 * For w/w locks, we always need to do this even if we're not
> +	 * currently the first waiter, because we may have been the
> +	 * first waiter during the unlock.
> 	 */
> -	if (__mutex_trylock(lock, first))
> +	if (__mutex_trylock(lock, use_ww_ctx || first))
> 		goto acquired;

So I'm somewhat uncomfortable with this. The point is that with the
.handoff logic it is very easy to accidentally allow:

	mutex_lock(&a);
	mutex_lock(&a);

And I'm not sure this doesn't make that happen for ww_mutexes. We get to
this __mutex_trylock() without first having blocked.

> 	/*
> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> 	spin_unlock_mutex(&lock->wait_lock, flags);
> 	schedule_preempt_disabled();
>
> -	if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> +	if (use_ww_ctx && ww_ctx) {
> +		/*
> +		 * Always re-check whether we're in first position. We
> +		 * don't want to spin if another task with a lower
> +		 * stamp has taken our position.
> +		 *
> +		 * We also may have to set the handoff flag again, if
> +		 * our position at the head was temporarily taken away.
> +		 */
> +		first = __mutex_waiter_is_first(lock, &waiter);
> +
> +		if (first)
> +			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> +	} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> 		first = true;
> 		__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> 	}

So the point is that !ww_ctx entries are 'skipped' during the insertion
and therefore, if one becomes first, it must stay first?

> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> 	 * or we must see its unlock and acquire.
> 	 */
> 	if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
> -	    __mutex_trylock(lock, first))
> +	    __mutex_trylock(lock, use_ww_ctx || first))
> 		break;
>
> 	spin_lock_mutex(&lock->wait_lock, flags);
On 06.12.2016 16:36, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> +static inline int __sched
>> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
>> +		       struct mutex *lock,
>> +		       struct ww_acquire_ctx *ww_ctx)
>> +{
>> +	struct mutex_waiter *cur;
>> +
>> +	if (!ww_ctx) {
>> +		list_add_tail(&waiter->list, &lock->wait_list);
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * Add the waiter before the first waiter with a higher stamp.
>> +	 * Waiters without a context are skipped to avoid starving
>> +	 * them.
>> +	 */
>> +	list_for_each_entry(cur, &lock->wait_list, list) {
>> +		if (!cur->ww_ctx)
>> +			continue;
>> +
>> +		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
>> +			/* Back off immediately if necessary. */
>> +			if (ww_ctx->acquired > 0) {
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +				struct ww_mutex *ww;
>> +
>> +				ww = container_of(lock, struct ww_mutex, base);
>> +				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
>> +				ww_ctx->contending_lock = ww;
>> +#endif
>> +				return -EDEADLK;
>> +			}
>> +
>> +			continue;
>> +		}
>> +
>> +		list_add_tail(&waiter->list, &cur->list);
>> +		return 0;
>> +	}
>> +
>> +	list_add_tail(&waiter->list, &lock->wait_list);
>> +	return 0;
>> +}
>
> So you keep the list in order of stamp, and in general stamps come in,
> in-order. That is, barring races on concurrent ww_mutex_lock(), things
> are already ordered.
>
> So it doesn't make sense to scan the entire list forwards, that's almost
> guarantees you scan the entire list every single time.
>
> Or am I reading this wrong? Which in itself is a hint a comment might be
> in place.

No, it's a reasonable question. Some things to keep in mind:

1. Each ww_acquire_ctx may be used with hundreds of locks. It's not that
clear that things will be ordered in a contention scenario, especially
since the old stamp is re-used when a context backs off and goes into
the slow path (with ww_ctx->acquired == 0).

2. We want to add waiters directly before the first waiter with a higher
stamp. Keeping in mind that there may be non-ww_ctx waiters in between,
and we don't want to starve them, traversing the list backwards would
require keeping the best insertion point around in a separate variable.
Clearly possible, but it seemed more awkward.

In hindsight, backwards iteration may not be so bad.

Nicolai
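[Editor's illustration] To make the backwards-iteration alternative discussed above concrete, here is an untested sketch (not part of the patch) that walks the list tail-to-head while keeping the candidate insertion point in a separate variable; the CONFIG_DEBUG_MUTEXES bookkeeping from the patch is left out:

	static inline int __sched
	__ww_mutex_add_waiter(struct mutex_waiter *waiter,
			      struct mutex *lock,
			      struct ww_acquire_ctx *ww_ctx)
	{
		struct mutex_waiter *cur, *pos = NULL;

		if (!ww_ctx) {
			list_add_tail(&waiter->list, &lock->wait_list);
			return 0;
		}

		/*
		 * Walk backwards: stamps usually increase towards the tail,
		 * so a newly arriving (high-stamp) waiter terminates quickly.
		 */
		list_for_each_entry_reverse(cur, &lock->wait_list, list) {
			if (!cur->ww_ctx)
				continue;

			if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
				/*
				 * cur has a lower stamp: if we already hold
				 * other locks we must back off, otherwise we
				 * go behind it (before 'pos', if any).
				 */
				if (ww_ctx->acquired > 0)
					return -EDEADLK;
				break;
			}

			/* Remember the earliest higher-stamp waiter seen. */
			pos = cur;
		}

		if (pos)
			list_add_tail(&waiter->list, &pos->list); /* before pos */
		else
			list_add_tail(&waiter->list, &lock->wait_list);
		return 0;
	}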
Hi Peter and Chris,

(trying to combine the handoff discussion here)

On 06.12.2016 17:55, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> 	 * mutex_unlock() handing the lock off to us, do a trylock
>> 	 * before testing the error conditions to make sure we pick up
>> 	 * the handoff.
>> +	 *
>> +	 * For w/w locks, we always need to do this even if we're not
>> +	 * currently the first waiter, because we may have been the
>> +	 * first waiter during the unlock.
>> 	 */
>> -	if (__mutex_trylock(lock, first))
>> +	if (__mutex_trylock(lock, use_ww_ctx || first))
>> 		goto acquired;
>
> So I'm somewhat uncomfortable with this. The point is that with the
> .handoff logic it is very easy to accidentally allow:
>
> 	mutex_lock(&a);
> 	mutex_lock(&a);
>
> And I'm not sure this doesn't make that happen for ww_mutexes. We get to
> this __mutex_trylock() without first having blocked.

Okay, took me a while, but I see the problem. If we have:

	ww_mutex_lock(&a, NULL);
	ww_mutex_lock(&a, ctx);

then it's possible that another currently waiting task sets the HANDOFF
flag between those calls and we'll allow the second ww_mutex_lock to go
through.

The concern about picking up a handoff that we didn't request is real,
though it cannot happen in the first iteration. Perhaps this
__mutex_trylock can be moved to the end of the loop? See below...

>
>> 	/*
>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> 	spin_unlock_mutex(&lock->wait_lock, flags);
>> 	schedule_preempt_disabled();
>>
>> -	if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> +	if (use_ww_ctx && ww_ctx) {
>> +		/*
>> +		 * Always re-check whether we're in first position. We
>> +		 * don't want to spin if another task with a lower
>> +		 * stamp has taken our position.
>> +		 *
>> +		 * We also may have to set the handoff flag again, if
>> +		 * our position at the head was temporarily taken away.
>> +		 */
>> +		first = __mutex_waiter_is_first(lock, &waiter);
>> +
>> +		if (first)
>> +			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>> +	} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> 		first = true;
>> 		__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>> 	}
>
> So the point is that !ww_ctx entries are 'skipped' during the insertion
> and therefore, if one becomes first, it must stay first?

Yes. Actually, it should be possible to replace all the cases of
use_ww_ctx || first with ww_ctx. Similarly, all cases of use_ww_ctx &&
ww_ctx could be replaced by just ww_ctx.

>
>> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> 	 * or we must see its unlock and acquire.
>> 	 */
>> 	if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
>> -	    __mutex_trylock(lock, first))
>> +	    __mutex_trylock(lock, use_ww_ctx || first))
>> 		break;
>>
>> 	spin_lock_mutex(&lock->wait_lock, flags);

Change this code to:

	acquired = first &&
		   mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
					 &waiter);
	spin_lock_mutex(&lock->wait_lock, flags);

	if (acquired ||
	    __mutex_trylock(lock, use_ww_ctx || first))
		break;
}

This changes the trylock to always be under the wait_lock, but we
previously had that at the beginning of the loop anyway. It also removes
back-to-back calls to __mutex_trylock when going through the loop; and
for the first iteration, there is a __mutex_trylock under wait_lock
already before adding ourselves to the wait list.

What do you think?

Nicolai
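[Editor's illustration] Read as a whole, the restructured wait loop being proposed would look roughly like the following (a sketch of the proposal above, not code from the patch; the error checks and the w/w head re-check are abbreviated as comments). Every path now leaves the loop with wait_lock held:

	bool acquired;

	set_task_state(task, state);
	for (;;) {
		/*
		 * wait_lock is held here: on the first pass from the code
		 * above, on later passes from the bottom of the loop. The
		 * existing signal_pending_state() and
		 * __ww_mutex_lock_check_stamp() checks stay here, unchanged.
		 */

		spin_unlock_mutex(&lock->wait_lock, flags);
		schedule_preempt_disabled();

		/* re-check head position / set MUTEX_FLAG_HANDOFF as in the patch */

		acquired = first &&
			   mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
						 &waiter);

		spin_lock_mutex(&lock->wait_lock, flags);
		if (acquired || __mutex_trylock(lock, use_ww_ctx || first))
			break;			/* wait_lock still held */
	}

	/* acquired: remove the waiter with wait_lock held, as today */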
On 01.12.2016 16:59, Chris Wilson wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> @@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> 	debug_mutex_lock_common(lock, &waiter);
>> 	debug_mutex_add_waiter(lock, &waiter, task);
>>
>> -	/* add waiting tasks to the end of the waitqueue (FIFO): */
>> -	list_add_tail(&waiter.list, &lock->wait_list);
>> +	lock_contended(&lock->dep_map, ip);
>> +
>> +	if (!use_ww_ctx) {
>> +		/* add waiting tasks to the end of the waitqueue (FIFO): */
>> +		list_add_tail(&waiter.list, &lock->wait_list);
>> +	} else {
>> +		/* Add in stamp order, waking up waiters that must back off. */
>> +		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
>> +		if (ret)
>> +			goto err_early_backoff;
>> +
>> +		waiter.ww_ctx = ww_ctx;
>> +	}
>> +
>> 	waiter.task = task;
>
> Would an unconditional waiter.ww_ctx = ww_ctx be chep enough? (Same
> cacheline write and all that?)
>
> Makes the above clearer in that you have
>
> 	if (!ww_ctx) {
> 		list_add_tail();
> 	} else {
> 		ret = __ww_mutex_add_waiter(); /* no need to handle !ww_ctx */
> 		if (ret)
> 			goto err_early_backoff;
> 	}
>
> 	waiter.ww_ctx = ww_ctx;
> 	waiter.task = task;

I don't feel strongly either way. I thought it'd be nice to have an
explicit distinction between mutex_lock(&a) and ww_mutex_lock(&a, NULL)
though.

>
>>
>> 	if (__mutex_waiter_is_first(lock, &waiter))
>> 		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>>
>> -	lock_contended(&lock->dep_map, ip);
>> -
>> 	set_task_state(task, state);
>> 	for (;;) {
>> 		/*
>> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> 		 * mutex_unlock() handing the lock off to us, do a trylock
>> 		 * before testing the error conditions to make sure we pick up
>> 		 * the handoff.
>> +		 *
>> +		 * For w/w locks, we always need to do this even if we're not
>> +		 * currently the first waiter, because we may have been the
>> +		 * first waiter during the unlock.
>> 		 */
>> -		if (__mutex_trylock(lock, first))
>> +		if (__mutex_trylock(lock, use_ww_ctx || first))
>
> I'm not certain about the magic of first vs HANDOFF. Afaict, first ==
> HANDOFF and this patch breaks that relationship. I think you need to add
> bool handoff; as a separate tracker to first.
>
>> 			goto acquired;
>>
>> 		/*
>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> 		spin_unlock_mutex(&lock->wait_lock, flags);
>> 		schedule_preempt_disabled();
>>
>> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> +		if (use_ww_ctx && ww_ctx) {
>> +			/*
>> +			 * Always re-check whether we're in first position. We
>> +			 * don't want to spin if another task with a lower
>> +			 * stamp has taken our position.
>> +			 *
>> +			 * We also may have to set the handoff flag again, if
>> +			 * our position at the head was temporarily taken away.
>
> Comment makes sense.
>
> Ah. Should this be just if (use_ww_ctx) { /* always recheck... */ ?
> Except that !ww_ctx are never gazzumped in the list, so if they are
> first, then they are always first.

Right. See also the other mail.

Nicolai

>
> Could you explain that as well (about why !ww_ctx is special here but
> not above). And then it can even be reduced to if (ww_ctx) {} to match
> the first chunk if the revision is acceptable.
> -Chris
>
On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> Hi Peter and Chris,
>
> (trying to combine the handoff discussion here)
>
> On 06.12.2016 17:55, Peter Zijlstra wrote:
> >On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> >>@@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> * mutex_unlock() handing the lock off to us, do a trylock
> >> * before testing the error conditions to make sure we pick up
> >> * the handoff.
> >>+ *
> >>+ * For w/w locks, we always need to do this even if we're not
> >>+ * currently the first waiter, because we may have been the
> >>+ * first waiter during the unlock.
> >> */
> >>- if (__mutex_trylock(lock, first))
> >>+ if (__mutex_trylock(lock, use_ww_ctx || first))
> >> goto acquired;
> >
> >So I'm somewhat uncomfortable with this. The point is that with the
> >.handoff logic it is very easy to accidentally allow:
> >
> > mutex_lock(&a);
> > mutex_lock(&a);
> >
> >And I'm not sure this doesn't make that happen for ww_mutexes. We get to
> >this __mutex_trylock() without first having blocked.
>
> Okay, took me a while, but I see the problem. If we have:
>
> ww_mutex_lock(&a, NULL);
> ww_mutex_lock(&a, ctx);
>
> then it's possible that another currently waiting task sets the HANDOFF flag
> between those calls and we'll allow the second ww_mutex_lock to go through.

It's worse: __mutex_trylock() doesn't check if MUTEX_FLAG_HANDOFF is set;
if .handoff == true && __owner_task() == current, we 'acquire'.

And since 'use_ww_ctx' is unconditionally true for ww_mutex_lock(), the
sequence:

	ww_mutex_lock(&a, ...);
	ww_mutex_lock(&a, ...);

will 'work'.
On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> The concern about picking up a handoff that we didn't request is real,
> though it cannot happen in the first iteration. Perhaps this __mutex_trylock
> can be moved to the end of the loop? See below...
>
> >>@@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> * or we must see its unlock and acquire.
> >> */
> >> if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
> >>- __mutex_trylock(lock, first))
> >>+ __mutex_trylock(lock, use_ww_ctx || first))
> >> break;
> >>
> >> spin_lock_mutex(&lock->wait_lock, flags);
>
> Change this code to:
>
> acquired = first &&
> mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
> &waiter);
> spin_lock_mutex(&lock->wait_lock, flags);
>
> if (acquired ||
> __mutex_trylock(lock, use_ww_ctx || first))
> break;

	goto acquired;

will work lots better.

> }
>
> This changes the trylock to always be under the wait_lock, but we previously
> had that at the beginning of the loop anyway.

> It also removes back-to-back
> calls to __mutex_trylock when going through the loop;

Yeah, I had that explicitly. It allows taking the mutex when
mutex_unlock() is still holding the wait_lock.

> and for the first
> iteration, there is a __mutex_trylock under wait_lock already before adding
> ourselves to the wait list.

Correct.
On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> >>@@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> spin_unlock_mutex(&lock->wait_lock, flags);
> >> schedule_preempt_disabled();
> >>
> >>- if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >>+ if (use_ww_ctx && ww_ctx) {
> >>+ /*
> >>+ * Always re-check whether we're in first position. We
> >>+ * don't want to spin if another task with a lower
> >>+ * stamp has taken our position.
> >>+ *
> >>+ * We also may have to set the handoff flag again, if
> >>+ * our position at the head was temporarily taken away.
> >>+ */
> >>+ first = __mutex_waiter_is_first(lock, &waiter);
> >>+
> >>+ if (first)
> >>+ __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >>+ } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >> first = true;
> >> __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >> }
> >
> >So the point is that !ww_ctx entries are 'skipped' during the insertion
> >and therefore, if one becomes first, it must stay first?
>
> Yes. Actually, it should be possible to replace all the cases of use_ww_ctx
> || first with ww_ctx. Similarly, all cases of use_ww_ctx && ww_ctx could be
> replaced by just ww_ctx.

I'm not seeing how "use_ww_ctx || first" -> "ww_ctx" works.

And while "use_ww_ctx && ww_ctx" -> "ww_ctx" is correct, it didn't work
right on some older GCCs; they choked on value propagation for ww_ctx and
kept emitting code even if we passed in NULL. Hence use_ww_ctx.

Arnd is now looking to raise the minimum supported GCC version, so maybe
we should look at that again if he gets anywhere.
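[Editor's note] For readers unfamiliar with the use_ww_ctx idiom being defended here: the extra boolean is a compile-time constant at every call site, so the compiler can drop the w/w branches from the plain-mutex path even when it cannot prove that the ww_ctx pointer is NULL. A rough sketch of the pattern (parameter lists abbreviated, not the actual kernel prototypes):

	static __always_inline int
	__mutex_lock_common(struct mutex *lock, /* ... */
			    struct ww_acquire_ctx *ww_ctx,
			    const bool use_ww_ctx)
	{
		/* ... */
		if (use_ww_ctx && ww_ctx) {
			/* w/w-only work; dead code when use_ww_ctx == false */
		}
		/* ... */
		return 0;
	}

	/* call sites: the constant is what makes dead-code elimination easy */
	__mutex_lock_common(lock, /* ... */ NULL, false);	/* mutex_lock()          */
	__mutex_lock_common(lock, /* ... */ ctx, true);		/* ww_mutex_lock(l, ctx) */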
On 16.12.2016 18:15, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
>> The concern about picking up a handoff that we didn't request is real,
>> though it cannot happen in the first iteration. Perhaps this __mutex_trylock
>> can be moved to the end of the loop? See below...
>
>>>> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>> * or we must see its unlock and acquire.
>>>> */
>>>> if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
>>>> - __mutex_trylock(lock, first))
>>>> + __mutex_trylock(lock, use_ww_ctx || first))
>>>> break;
>>>>
>>>> spin_lock_mutex(&lock->wait_lock, flags);
>>
>> Change this code to:
>>
>> acquired = first &&
>> mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
>> &waiter);
>> spin_lock_mutex(&lock->wait_lock, flags);
>>
>> if (acquired ||
>> __mutex_trylock(lock, use_ww_ctx || first))
>> break;
>
> goto acquired;
>
> will work lots better.

Wasn't explicit enough, sorry. The idea was to get rid of the acquired
label and change things so that all paths exit the loop with wait_lock
held. That seems cleaner to me.

>> }
>>
>> This changes the trylock to always be under the wait_lock, but we previously
>> had that at the beginning of the loop anyway.
>
>> It also removes back-to-back
>> calls to __mutex_trylock when going through the loop;
>
> Yeah, I had that explicitly. It allows taking the mutex when
> mutex_unlock() is still holding the wait_lock.

mutex_optimistic_spin() already calls __mutex_trylock, and for the
no-spin case, __mutex_unlock_slowpath() only calls wake_up_q() after
releasing the wait_lock. So I don't see the purpose of the back-to-back
__mutex_trylocks, especially considering that if the first one succeeds,
we immediately take the wait_lock anyway.

Nicolai

>> and for the first
>> iteration, there is a __mutex_trylock under wait_lock already before adding
>> ourselves to the wait list.
>
> Correct.
>
On 16.12.2016 18:20, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
>>>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>> spin_unlock_mutex(&lock->wait_lock, flags);
>>>> schedule_preempt_disabled();
>>>>
>>>> - if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>>> + if (use_ww_ctx && ww_ctx) {
>>>> + /*
>>>> + * Always re-check whether we're in first position. We
>>>> + * don't want to spin if another task with a lower
>>>> + * stamp has taken our position.
>>>> + *
>>>> + * We also may have to set the handoff flag again, if
>>>> + * our position at the head was temporarily taken away.
>>>> + */
>>>> + first = __mutex_waiter_is_first(lock, &waiter);
>>>> +
>>>> + if (first)
>>>> + __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> + } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>>> first = true;
>>>> __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> }
>>>
>>> So the point is that !ww_ctx entries are 'skipped' during the insertion
>>> and therefore, if one becomes first, it must stay first?
>>
>> Yes. Actually, it should be possible to replace all the cases of use_ww_ctx
>> || first with ww_ctx. Similarly, all cases of use_ww_ctx && ww_ctx could be
>> replaced by just ww_ctx.
>
>
> I'm not seeing how "use_ww_ctx || first" -> "ww_ctx" works.

My bad, missing the '|| first'.

> And while
> "use_ww_ctx && ww_ctx" -> "ww_ctx" is correct, it didn't work right on
> some older GCCs, they choked on value propagation for ww_ctx and kept
> emitting code even if we passed in NULL. Hence use_ww_ctx.

Okay, I'll stick with use_ww_ctx. Thanks for the explanation.

Nicolai

> Arnd is now looking to raise the minimum supported GCC version, so maybe
> we should look at that again if he gets anywhere.
>
On Fri, Dec 16, 2016 at 07:11:41PM +0100, Nicolai Hähnle wrote:
> mutex_optimistic_spin() already calls __mutex_trylock, and for the no-spin
> case, __mutex_unlock_slowpath() only calls wake_up_q() after releasing the
> wait_lock.

mutex_optimistic_spin() is a no-op when !CONFIG_MUTEX_SPIN_ON_OWNER
On 16.12.2016 21:00, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 07:11:41PM +0100, Nicolai Hähnle wrote:
>> mutex_optimistic_spin() already calls __mutex_trylock, and for the no-spin
>> case, __mutex_unlock_slowpath() only calls wake_up_q() after releasing the
>> wait_lock.
>
> mutex_optimistic_spin() is a no-op when !CONFIG_MUTEX_SPIN_ON_OWNER

Does this change the conclusion in a meaningful way? I did mention the
no-spin case in the very part you quoted...

Again, AFAIU we're talking about the part of my proposal that turns what
is effectively

	__mutex_trylock(lock, ...);
	spin_lock_mutex(&lock->wait_lock, flags);

(independent of whether the trylock succeeds or not!) into

	spin_lock_mutex(&lock->wait_lock, flags);
	__mutex_trylock(lock, ...);

in an effort to streamline the code overall.

Also AFAIU, you're concerned that spin_lock_mutex(...) has to wait for an
unlock from mutex_unlock(), but when does that actually happen with
relevant probability?

When we spin optimistically, that could happen -- except that
__mutex_trylock is already called in mutex_optimistic_spin, so it doesn't
matter.

When we don't spin -- whether due to .config or !first -- then the chance
of overlap with mutex_unlock is exceedingly small. Even if we do overlap,
we'll have to wait for mutex_unlock to release the wait_lock anyway! So
what good does acquiring the lock first really do?

Anyway, this is really more of an argument about whether there's a good
reason to call __mutex_trylock twice in that loop. I don't think there
is, your arguments certainly haven't been convincing, but the issue can
be side-stepped for this patch by keeping the trylock calls as they are
and just setting first = true unconditionally for ww_ctx != NULL (but
keep the logic for when to set the HANDOFF flag as-is). Should probably
rename the variable s/first/handoff/ then.

Nicolai
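[Editor's illustration] One possible reading of that side-step, combined with Chris's earlier "bool handoff as a separate tracker" suggestion (untested sketch, not a posted revision): track "may consume a handoff" separately from "is currently first", set it unconditionally for w/w waiters once they have blocked at least once, and keep the MUTEX_FLAG_HANDOFF / optimistic-spin decisions tied to the actual head position:

	bool first = false, handoff = false;

	for (;;) {
		if (__mutex_trylock(lock, handoff))
			goto acquired;

		/* ... existing checks, drop wait_lock, schedule ... */

		first = __mutex_waiter_is_first(lock, &waiter);
		if (use_ww_ctx && ww_ctx) {
			/*
			 * We may have been first (and requested a handoff)
			 * at some point even if we are not first now, so we
			 * are always allowed to pick one up from here on.
			 */
			handoff = true;
			if (first)
				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
		} else if (first && !handoff) {
			handoff = true;
			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
		}

		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
		    __mutex_trylock(lock, handoff))
			break;

		spin_lock_mutex(&lock->wait_lock, flags);
	}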
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b97870f..118a3b6 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,6 +20,8 @@
 #include <linux/osq_lock.h>
 #include <linux/debug_locks.h>
 
+struct ww_acquire_ctx;
+
 /*
  * Simple, straightforward mutexes with strict semantics:
  *
@@ -75,6 +77,7 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
 struct mutex_waiter {
 	struct list_head list;
 	struct task_struct *task;
+	struct ww_acquire_ctx *ww_ctx;
 #ifdef CONFIG_DEBUG_MUTEXES
 	void *magic;
 #endif
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 585627f..d63eebb 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -628,6 +628,51 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
 	return 0;
 }
 
+static inline int __sched
+__ww_mutex_add_waiter(struct mutex_waiter *waiter,
+		      struct mutex *lock,
+		      struct ww_acquire_ctx *ww_ctx)
+{
+	struct mutex_waiter *cur;
+
+	if (!ww_ctx) {
+		list_add_tail(&waiter->list, &lock->wait_list);
+		return 0;
+	}
+
+	/*
+	 * Add the waiter before the first waiter with a higher stamp.
+	 * Waiters without a context are skipped to avoid starving
+	 * them.
+	 */
+	list_for_each_entry(cur, &lock->wait_list, list) {
+		if (!cur->ww_ctx)
+			continue;
+
+		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
+			/* Back off immediately if necessary. */
+			if (ww_ctx->acquired > 0) {
+#ifdef CONFIG_DEBUG_MUTEXES
+				struct ww_mutex *ww;
+
+				ww = container_of(lock, struct ww_mutex, base);
+				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
+				ww_ctx->contending_lock = ww;
+#endif
+				return -EDEADLK;
+			}
+
+			continue;
+		}
+
+		list_add_tail(&waiter->list, &cur->list);
+		return 0;
+	}
+
+	list_add_tail(&waiter->list, &lock->wait_list);
+	return 0;
+}
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
@@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	debug_mutex_lock_common(lock, &waiter);
 	debug_mutex_add_waiter(lock, &waiter, task);
 
-	/* add waiting tasks to the end of the waitqueue (FIFO): */
-	list_add_tail(&waiter.list, &lock->wait_list);
+	lock_contended(&lock->dep_map, ip);
+
+	if (!use_ww_ctx) {
+		/* add waiting tasks to the end of the waitqueue (FIFO): */
+		list_add_tail(&waiter.list, &lock->wait_list);
+	} else {
+		/* Add in stamp order, waking up waiters that must back off. */
+		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+		if (ret)
+			goto err_early_backoff;
+
+		waiter.ww_ctx = ww_ctx;
+	}
+
 	waiter.task = task;
 
 	if (__mutex_waiter_is_first(lock, &waiter))
 		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
 
-	lock_contended(&lock->dep_map, ip);
-
 	set_task_state(task, state);
 	for (;;) {
 		/*
@@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * mutex_unlock() handing the lock off to us, do a trylock
 		 * before testing the error conditions to make sure we pick up
 		 * the handoff.
+		 *
+		 * For w/w locks, we always need to do this even if we're not
+		 * currently the first waiter, because we may have been the
+		 * first waiter during the unlock.
 		 */
-		if (__mutex_trylock(lock, first))
+		if (__mutex_trylock(lock, use_ww_ctx || first))
 			goto acquired;
 
 		/*
@@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
 
-		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
+		if (use_ww_ctx && ww_ctx) {
+			/*
+			 * Always re-check whether we're in first position. We
+			 * don't want to spin if another task with a lower
+			 * stamp has taken our position.
+			 *
+			 * We also may have to set the handoff flag again, if
+			 * our position at the head was temporarily taken away.
+			 */
+			first = __mutex_waiter_is_first(lock, &waiter);
+
+			if (first)
+				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
+		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
 			first = true;
 			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
 		}
@@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * or we must see its unlock and acquire.
 		 */
 		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
-		    __mutex_trylock(lock, first))
+		    __mutex_trylock(lock, use_ww_ctx || first))
 			break;
 
 		spin_lock_mutex(&lock->wait_lock, flags);
@@ -761,6 +833,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_task_state(task, TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, task);
+err_early_backoff:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
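[Editor's note] For context on what the stamp ordering above protects, here is a condensed caller-side example of the wound/wait API, following the standard pattern from Documentation/locking/ww-mutex-design.txt (names like my_class and struct buffer are illustrative only). The -EDEADLK return is what a waiter sees when it must back off; the caller drops everything and retries with the same context, and hence the same stamp:

	#include <linux/ww_mutex.h>

	static DEFINE_WW_CLASS(my_class);	/* illustrative class name */

	struct buffer {				/* illustrative object */
		struct ww_mutex lock;
		/* ... */
	};

	static void lock_two_buffers(struct buffer *a, struct buffer *b)
	{
		struct buffer *first = a, *second = b;
		struct ww_acquire_ctx ctx;

		ww_acquire_init(&ctx, &my_class);

		/* cannot return -EDEADLK: the context holds nothing yet */
		ww_mutex_lock(&first->lock, &ctx);

		while (ww_mutex_lock(&second->lock, &ctx) == -EDEADLK) {
			/*
			 * An older context owns 'second' and we were asked to
			 * back off: drop what we hold, sleep on the contended
			 * lock (no deadlock check needed while holding
			 * nothing), then retry the other lock. Our stamp is
			 * unchanged, so we cannot be starved indefinitely.
			 */
			ww_mutex_unlock(&first->lock);
			ww_mutex_lock_slow(&second->lock, &ctx);
			swap(first, second);
		}
		ww_acquire_done(&ctx);

		/* ... critical section using a and b ... */

		ww_mutex_unlock(&a->lock);
		ww_mutex_unlock(&b->lock);
		ww_acquire_fini(&ctx);
	}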