
[v2,05/11] locking/ww_mutex: Add waiters in stamp order

Message ID 1480601214-26583-6-git-send-email-nhaehnle@gmail.com (mailing list archive)
State New, archived

Commit Message

Nicolai Hähnle Dec. 1, 2016, 2:06 p.m. UTC
From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Add regular waiters in stamp order. Keep adding waiters that have no
context in FIFO order and take care not to starve them.

While adding our task as a waiter, back off if we detect that there is a
waiter with a lower stamp in front of us.

Make sure to call lock_contended even when we back off early.

v2:
- rein in the indentation of __ww_mutex_add_waiter a bit
- set contending_lock in __ww_mutex_add_waiter (Chris Wilson)

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 include/linux/mutex.h  |  3 ++
 kernel/locking/mutex.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 83 insertions(+), 7 deletions(-)

Comments

Chris Wilson Dec. 1, 2016, 3:59 p.m. UTC | #1
On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> @@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	debug_mutex_lock_common(lock, &waiter);
>  	debug_mutex_add_waiter(lock, &waiter, task);
>  
> -	/* add waiting tasks to the end of the waitqueue (FIFO): */
> -	list_add_tail(&waiter.list, &lock->wait_list);
> +	lock_contended(&lock->dep_map, ip);
> +
> +	if (!use_ww_ctx) {
> +		/* add waiting tasks to the end of the waitqueue (FIFO): */
> +		list_add_tail(&waiter.list, &lock->wait_list);
> +	} else {
> +		/* Add in stamp order, waking up waiters that must back off. */
> +		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
> +		if (ret)
> +			goto err_early_backoff;
> +
> +		waiter.ww_ctx = ww_ctx;
> +	}
> +
>  	waiter.task = task;

Would an unconditional waiter.ww_ctx = ww_ctx be cheap enough? (Same
cacheline write and all that?)

Makes the above clearer in that you have

	if (!ww_ctx) {
		list_add_tail();
	} else {
		ret = __ww_mutex_add_waiter(); /* no need to handle !ww_ctx */
		if (ret)
			goto err_early_backoff;
	}

	waiter.ww_ctx = ww_ctx;
	waiter.task = task;

>  
>  	if (__mutex_waiter_is_first(lock, &waiter))
>  		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>  
> -	lock_contended(&lock->dep_map, ip);
> -
>  	set_task_state(task, state);
>  	for (;;) {
>  		/*
> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 * mutex_unlock() handing the lock off to us, do a trylock
>  		 * before testing the error conditions to make sure we pick up
>  		 * the handoff.
> +		 *
> +		 * For w/w locks, we always need to do this even if we're not
> +		 * currently the first waiter, because we may have been the
> +		 * first waiter during the unlock.
>  		 */
> -		if (__mutex_trylock(lock, first))
> +		if (__mutex_trylock(lock, use_ww_ctx || first))

I'm not certain about the magic of first vs HANDOFF. Afaict, first ==
HANDOFF and this patch breaks that relationship. I think you need to add
bool handoff; as a separate tracker to first.

>  			goto acquired;
>  
>  		/*
> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
>  
> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> +		if (use_ww_ctx && ww_ctx) {
> +			/*
> +			 * Always re-check whether we're in first position. We
> +			 * don't want to spin if another task with a lower
> +			 * stamp has taken our position.
> +			 *
> +			 * We also may have to set the handoff flag again, if
> +			 * our position at the head was temporarily taken away.

Comment makes sense.

Ah. Should this be just if (use_ww_ctx) { /* always recheck... */ ?
Except that !ww_ctx are never gazumped in the list, so if they are
first, then they are always first.

Could you explain that as well (about why !ww_ctx is special here but
not above). And then it can even be reduced to if (ww_ctx) {} to match
the first chunk if the revision is acceptable.
-Chris
Peter Zijlstra Dec. 6, 2016, 3:36 p.m. UTC | #2
On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> +static inline int __sched
> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
> +		      struct mutex *lock,
> +		      struct ww_acquire_ctx *ww_ctx)
> +{
> +	struct mutex_waiter *cur;
> +
> +	if (!ww_ctx) {
> +		list_add_tail(&waiter->list, &lock->wait_list);
> +		return 0;
> +	}
> +
> +	/*
> +	 * Add the waiter before the first waiter with a higher stamp.
> +	 * Waiters without a context are skipped to avoid starving
> +	 * them.
> +	 */
> +	list_for_each_entry(cur, &lock->wait_list, list) {
> +		if (!cur->ww_ctx)
> +			continue;
> +
> +		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
> +			/* Back off immediately if necessary. */
> +			if (ww_ctx->acquired > 0) {
> +#ifdef CONFIG_DEBUG_MUTEXES
> +				struct ww_mutex *ww;
> +
> +				ww = container_of(lock, struct ww_mutex, base);
> +				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> +				ww_ctx->contending_lock = ww;
> +#endif
> +				return -EDEADLK;
> +			}
> +
> +			continue;
> +		}
> +
> +		list_add_tail(&waiter->list, &cur->list);
> +		return 0;
> +	}
> +
> +	list_add_tail(&waiter->list, &lock->wait_list);
> +	return 0;
> +}

So you keep the list in order of stamp, and in general stamps come in,
in-order. That is, barring races on concurrent ww_mutex_lock(), things
are already ordered.

So it doesn't make sense to scan the entire list forwards; that almost
guarantees you scan the entire list every single time.

Or am I reading this wrong? Which in itself is a hint that a comment
might be in order.
Peter Zijlstra Dec. 6, 2016, 4:55 p.m. UTC | #3
On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 * mutex_unlock() handing the lock off to us, do a trylock
>  		 * before testing the error conditions to make sure we pick up
>  		 * the handoff.
> +		 *
> +		 * For w/w locks, we always need to do this even if we're not
> +		 * currently the first waiter, because we may have been the
> +		 * first waiter during the unlock.
>  		 */
> -		if (__mutex_trylock(lock, first))
> +		if (__mutex_trylock(lock, use_ww_ctx || first))
>  			goto acquired;

So I'm somewhat uncomfortable with this. The point is that with the
.handoff logic it is very easy to accidentally allow:

	mutex_lock(&a);
	mutex_lock(&a);

And I'm not sure this doesn't make that happen for ww_mutexes. We get to
this __mutex_trylock() without first having blocked.


>  		/*
> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
>  
> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> +		if (use_ww_ctx && ww_ctx) {
> +			/*
> +			 * Always re-check whether we're in first position. We
> +			 * don't want to spin if another task with a lower
> +			 * stamp has taken our position.
> +			 *
> +			 * We also may have to set the handoff flag again, if
> +			 * our position at the head was temporarily taken away.
> +			 */
> +			first = __mutex_waiter_is_first(lock, &waiter);
> +
> +			if (first)
> +				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> +		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>  			first = true;
>  			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>  		}

So the point is that !ww_ctx entries are 'skipped' during the insertion
and therefore, if one becomes first, it must stay first?

> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 * or we must see its unlock and acquire.
>  		 */
>  		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
> -		     __mutex_trylock(lock, first))
> +		     __mutex_trylock(lock, use_ww_ctx || first))
>  			break;
>  
>  		spin_lock_mutex(&lock->wait_lock, flags);
Nicolai Hähnle Dec. 16, 2016, 1:34 p.m. UTC | #4
On 06.12.2016 16:36, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> +static inline int __sched
>> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
>> +		      struct mutex *lock,
>> +		      struct ww_acquire_ctx *ww_ctx)
>> +{
>> +	struct mutex_waiter *cur;
>> +
>> +	if (!ww_ctx) {
>> +		list_add_tail(&waiter->list, &lock->wait_list);
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * Add the waiter before the first waiter with a higher stamp.
>> +	 * Waiters without a context are skipped to avoid starving
>> +	 * them.
>> +	 */
>> +	list_for_each_entry(cur, &lock->wait_list, list) {
>> +		if (!cur->ww_ctx)
>> +			continue;
>> +
>> +		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
>> +			/* Back off immediately if necessary. */
>> +			if (ww_ctx->acquired > 0) {
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +				struct ww_mutex *ww;
>> +
>> +				ww = container_of(lock, struct ww_mutex, base);
>> +				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
>> +				ww_ctx->contending_lock = ww;
>> +#endif
>> +				return -EDEADLK;
>> +			}
>> +
>> +			continue;
>> +		}
>> +
>> +		list_add_tail(&waiter->list, &cur->list);
>> +		return 0;
>> +	}
>> +
>> +	list_add_tail(&waiter->list, &lock->wait_list);
>> +	return 0;
>> +}
>
> So you keep the list in order of stamp, and in general stamps come in,
> in-order. That is, barring races on concurrent ww_mutex_lock(), things
> are already ordered.
>
> So it doesn't make sense to scan the entire list forwards; that almost
> guarantees you scan the entire list every single time.
>
> Or am I reading this wrong? Which in itself is a hint that a comment
> might be in order.

No, it's a reasonable question. Some things to keep in mind:

1. Each ww_acquire_ctx may be used with hundreds of locks. It's not that 
clear that things will be ordered in a contention scenario, especially 
since the old stamp is re-used when a context backs off and goes into 
the slow path (with ww_ctx->acquired == 0).

2. We want to add waiters directly before the first waiter with a higher 
stamp. Keeping in mind that there may be non-ww_ctx waiters in between, 
and we don't want to starve them, traversing the list backwards would 
require keeping the best insertion point around in a separate variable. 
Clearly possible, but it seemed more awkward.

In hindsight, backwards iteration may not be so bad.

Nicolai
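
For illustration, the backwards iteration mentioned above might look like the
sketch below. This is an editorial sketch against the helpers the patch
introduces (__ww_mutex_stamp_after, waiter->ww_ctx), not code from the series;
the CONFIG_DEBUG_MUTEXES bookkeeping on backoff is omitted, and the insertion
point is kept in a separate variable so that context-less waiters in between
are still not starved:

static inline int __sched
__ww_mutex_add_waiter_reverse(struct mutex_waiter *waiter,
			      struct mutex *lock,
			      struct ww_acquire_ctx *ww_ctx)
{
	struct mutex_waiter *cur, *pos = NULL;

	if (!ww_ctx) {
		list_add_tail(&waiter->list, &lock->wait_list);
		return 0;
	}

	/*
	 * Stamps mostly arrive in order, so walking from the tail
	 * usually finds the insertion point within a few steps.
	 */
	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
		if (!cur->ww_ctx)
			continue;

		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
			/* cur is older than us; we may not overtake it. */
			if (ww_ctx->acquired > 0)
				return -EDEADLK;
			break;
		}

		/* cur is younger than us; insert no later than here. */
		pos = cur;
	}

	if (pos)
		list_add_tail(&waiter->list, &pos->list); /* before pos */
	else
		list_add_tail(&waiter->list, &lock->wait_list);

	return 0;
}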
Nicolai Hähnle Dec. 16, 2016, 2:19 p.m. UTC | #5
Hi Peter and Chris,

(trying to combine the handoff discussion here)

On 06.12.2016 17:55, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		 * mutex_unlock() handing the lock off to us, do a trylock
>>  		 * before testing the error conditions to make sure we pick up
>>  		 * the handoff.
>> +		 *
>> +		 * For w/w locks, we always need to do this even if we're not
>> +		 * currently the first waiter, because we may have been the
>> +		 * first waiter during the unlock.
>>  		 */
>> -		if (__mutex_trylock(lock, first))
>> +		if (__mutex_trylock(lock, use_ww_ctx || first))
>>  			goto acquired;
>
> So I'm somewhat uncomfortable with this. The point is that with the
> .handoff logic it is very easy to accidentally allow:
>
> 	mutex_lock(&a);
> 	mutex_lock(&a);
>
> And I'm not sure this doesn't make that happen for ww_mutexes. We get to
> this __mutex_trylock() without first having blocked.

Okay, took me a while, but I see the problem. If we have:

	ww_mutex_lock(&a, NULL);
	ww_mutex_lock(&a, ctx);

then it's possible that another currently waiting task sets the HANDOFF 
flag between those calls and we'll allow the second ww_mutex_lock to go 
through.

The concern about picking up a handoff that we didn't request is real, 
though it cannot happen in the first iteration. Perhaps this 
__mutex_trylock can be moved to the end of the loop? See below...


>
>
>>  		/*
>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		spin_unlock_mutex(&lock->wait_lock, flags);
>>  		schedule_preempt_disabled();
>>
>> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> +		if (use_ww_ctx && ww_ctx) {
>> +			/*
>> +			 * Always re-check whether we're in first position. We
>> +			 * don't want to spin if another task with a lower
>> +			 * stamp has taken our position.
>> +			 *
>> +			 * We also may have to set the handoff flag again, if
>> +			 * our position at the head was temporarily taken away.
>> +			 */
>> +			first = __mutex_waiter_is_first(lock, &waiter);
>> +
>> +			if (first)
>> +				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>> +		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>  			first = true;
>>  			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>  		}
>
> So the point is that !ww_ctx entries are 'skipped' during the insertion
> and therefore, if one becomes first, it must stay first?

Yes. Actually, it should be possible to replace all the cases of 
use_ww_ctx || first with ww_ctx. Similarly, all cases of use_ww_ctx && 
ww_ctx could be replaced by just ww_ctx.

>
>> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		 * or we must see its unlock and acquire.
>>  		 */
>>  		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
>> -		     __mutex_trylock(lock, first))
>> +		     __mutex_trylock(lock, use_ww_ctx || first))
>>  			break;
>>
>>  		spin_lock_mutex(&lock->wait_lock, flags);

Change this code to:

		acquired = first &&
		    mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
					  &waiter);
		spin_lock_mutex(&lock->wait_lock, flags);
		
		if (acquired ||
		    __mutex_trylock(lock, use_ww_ctx || first))
			break;
	}

This changes the trylock to always be under the wait_lock, but we 
previously had that at the beginning of the loop anyway. It also removes 
back-to-back calls to __mutex_trylock when going through the loop; and 
for the first iteration, there is a __mutex_trylock under wait_lock 
already before adding ourselves to the wait list.

What do you think?

Nicolai
Nicolai Hähnle Dec. 16, 2016, 2:21 p.m. UTC | #6
On 01.12.2016 16:59, Chris Wilson wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> @@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  	debug_mutex_lock_common(lock, &waiter);
>>  	debug_mutex_add_waiter(lock, &waiter, task);
>>
>> -	/* add waiting tasks to the end of the waitqueue (FIFO): */
>> -	list_add_tail(&waiter.list, &lock->wait_list);
>> +	lock_contended(&lock->dep_map, ip);
>> +
>> +	if (!use_ww_ctx) {
>> +		/* add waiting tasks to the end of the waitqueue (FIFO): */
>> +		list_add_tail(&waiter.list, &lock->wait_list);
>> +	} else {
>> +		/* Add in stamp order, waking up waiters that must back off. */
>> +		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
>> +		if (ret)
>> +			goto err_early_backoff;
>> +
>> +		waiter.ww_ctx = ww_ctx;
>> +	}
>> +
>>  	waiter.task = task;
>
> Would an unconditional waiter.ww_ctx = ww_ctx be cheap enough? (Same
> cacheline write and all that?)
>
> Makes the above clearer in that you have
>
> 	if (!ww_ctx) {
> 		list_add_tail();
> 	} else {
> 		ret = __ww_mutex_add_waiter(); /* no need to handle !ww_ctx */
> 		if (ret)
> 			goto err_early_backoff;
> 	}
>
> 	waiter.ww_ctx = ww_ctx;
> 	waiter.task = task;

I don't feel strongly either way. I thought it'd be nice to have an 
explicit distinction between mutex_lock(&a) and ww_mutex_lock(&a, NULL) 
though.

>
>>
>>  	if (__mutex_waiter_is_first(lock, &waiter))
>>  		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>>
>> -	lock_contended(&lock->dep_map, ip);
>> -
>>  	set_task_state(task, state);
>>  	for (;;) {
>>  		/*
>> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		 * mutex_unlock() handing the lock off to us, do a trylock
>>  		 * before testing the error conditions to make sure we pick up
>>  		 * the handoff.
>> +		 *
>> +		 * For w/w locks, we always need to do this even if we're not
>> +		 * currently the first waiter, because we may have been the
>> +		 * first waiter during the unlock.
>>  		 */
>> -		if (__mutex_trylock(lock, first))
>> +		if (__mutex_trylock(lock, use_ww_ctx || first))
>
> I'm not certain about the magic of first vs HANDOFF. Afaict, first ==
> HANDOFF and this patch breaks that relationship. I think you need to add
> bool handoff; as a separate tracker to first.
>
>>  			goto acquired;
>>
>>  		/*
>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		spin_unlock_mutex(&lock->wait_lock, flags);
>>  		schedule_preempt_disabled();
>>
>> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> +		if (use_ww_ctx && ww_ctx) {
>> +			/*
>> +			 * Always re-check whether we're in first position. We
>> +			 * don't want to spin if another task with a lower
>> +			 * stamp has taken our position.
>> +			 *
>> +			 * We also may have to set the handoff flag again, if
>> +			 * our position at the head was temporarily taken away.
>
> Comment makes sense.
>
> Ah. Should this be just if (use_ww_ctx) { /* always recheck... */ ?
> Except that !ww_ctx are never gazumped in the list, so if they are
> first, then they are always first.

Right. See also the other mail.

Nicolai

>
> Could you explain that as well (about why !ww_ctx is special here but
> not above). And then it can even be reduced to if (ww_ctx) {} to match
> the first chunk if the revision is acceptable.
> -Chris
>
Peter Zijlstra Dec. 16, 2016, 2:46 p.m. UTC | #7
On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> Hi Peter and Chris,
> 
> (trying to combine the handoff discussion here)
> 
> On 06.12.2016 17:55, Peter Zijlstra wrote:
> >On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> >>@@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> 		 * mutex_unlock() handing the lock off to us, do a trylock
> >> 		 * before testing the error conditions to make sure we pick up
> >> 		 * the handoff.
> >>+		 *
> >>+		 * For w/w locks, we always need to do this even if we're not
> >>+		 * currently the first waiter, because we may have been the
> >>+		 * first waiter during the unlock.
> >> 		 */
> >>-		if (__mutex_trylock(lock, first))
> >>+		if (__mutex_trylock(lock, use_ww_ctx || first))
> >> 			goto acquired;
> >
> >So I'm somewhat uncomfortable with this. The point is that with the
> >.handoff logic it is very easy to accidentally allow:
> >
> >	mutex_lock(&a);
> >	mutex_lock(&a);
> >
> >And I'm not sure this doesn't make that happen for ww_mutexes. We get to
> >this __mutex_trylock() without first having blocked.
> 
> Okay, took me a while, but I see the problem. If we have:
> 
> 	ww_mutex_lock(&a, NULL);
> 	ww_mutex_lock(&a, ctx);
> 
> then it's possible that another currently waiting task sets the HANDOFF flag
> between those calls and we'll allow the second ww_mutex_lock to go through.

It's worse: __mutex_trylock() doesn't check if MUTEX_FLAG_HANDOFF is set;
if .handoff == true && __owner_task() == current, we 'acquire'.

And since 'use_ww_ctx' is unconditionally true for ww_mutex_lock(), the
sequence:

	ww_mutex_lock(&a, ...);
	ww_mutex_lock(&a, ...);

will 'work'.
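
To make the hazard concrete, the acquisition rule described here boils down to
roughly the following (a much-simplified editorial sketch; the real
__mutex_trylock() of this era is an atomic cmpxchg loop over lock->owner, with
the owner-free race elided here):

static inline bool __mutex_trylock_sketch(struct mutex *lock, bool handoff)
{
	struct task_struct *owner = __mutex_owner(lock);

	if (owner == current) {
		/*
		 * An unlocker handed the lock to us by writing our task
		 * as owner.  Note: MUTEX_FLAG_HANDOFF itself is never
		 * consulted -- only the caller-supplied 'handoff'.
		 */
		return handoff;
	}

	if (owner)
		return false;	/* owned by someone else */

	/* Lock looks free: in reality, cmpxchg ourselves in. */
	return true;
}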
Peter Zijlstra Dec. 16, 2016, 5:15 p.m. UTC | #8
On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> The concern about picking up a handoff that we didn't request is real,
> though it cannot happen in the first iteration. Perhaps this __mutex_trylock
> can be moved to the end of the loop? See below...


> >>@@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> 		 * or we must see its unlock and acquire.
> >> 		 */
> >> 		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
> >>-		     __mutex_trylock(lock, first))
> >>+		     __mutex_trylock(lock, use_ww_ctx || first))
> >> 			break;
> >>
> >> 		spin_lock_mutex(&lock->wait_lock, flags);
> 
> Change this code to:
> 
> 		acquired = first &&
> 		    mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
> 					  &waiter);
> 		spin_lock_mutex(&lock->wait_lock, flags);
> 		
> 		if (acquired ||
> 		    __mutex_trylock(lock, use_ww_ctx || first))
> 			break;

			goto acquired;

will work lots better.

> 	}
> 
> This changes the trylock to always be under the wait_lock, but we previously
> had that at the beginning of the loop anyway. 

> It also removes back-to-back
> calls to __mutex_trylock when going through the loop;

Yeah, I had that explicitly. It allows taking the mutex when
mutex_unlock() is still holding the wait_lock.

> and for the first
> iteration, there is a __mutex_trylock under wait_lock already before adding
> ourselves to the wait list.

Correct.
Peter Zijlstra Dec. 16, 2016, 5:20 p.m. UTC | #9
On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> >>@@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> 		spin_unlock_mutex(&lock->wait_lock, flags);
> >> 		schedule_preempt_disabled();
> >>
> >>-		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >>+		if (use_ww_ctx && ww_ctx) {
> >>+			/*
> >>+			 * Always re-check whether we're in first position. We
> >>+			 * don't want to spin if another task with a lower
> >>+			 * stamp has taken our position.
> >>+			 *
> >>+			 * We also may have to set the handoff flag again, if
> >>+			 * our position at the head was temporarily taken away.
> >>+			 */
> >>+			first = __mutex_waiter_is_first(lock, &waiter);
> >>+
> >>+			if (first)
> >>+				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >>+		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >> 			first = true;
> >> 			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >> 		}
> >
> >So the point is that !ww_ctx entries are 'skipped' during the insertion
> >and therefore, if one becomes first, it must stay first?
> 
> Yes. Actually, it should be possible to replace all the cases of use_ww_ctx
> || first with ww_ctx. Similarly, all cases of use_ww_ctx && ww_ctx could be
> replaced by just ww_ctx.


I'm not seeing how "use_ww_ctx || first" -> "ww_ctx" works. And while
"use_ww_ctx && ww_ctx" -> "ww_ctx" is correct, it didn't work right on
some older GCCs; they choked on value propagation for ww_ctx and kept
emitting code even if we passed in NULL. Hence use_ww_ctx.

Arnd is now looking to raise the minimum supported GCC version, so maybe
we should look at that again if he gets anywhere.
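
As a toy illustration of the idiom (hypothetical names, nothing from the
patch): every call site passes a literal for the flag, so constant propagation
should delete the unused branch outright, which is exactly what failed when
only the NULL-ness of the context pointer had to be propagated:

struct ctx;

static inline int do_lock(struct ctx *ctx, const bool use_ctx)
{
	if (use_ctx && ctx)
		return 1;	/* context path; folded away when use_ctx is false */
	return 0;		/* plain mutex path */
}

static int lock_plain(void)
{
	/* Compiler sees use_ctx == false and drops the context branch. */
	return do_lock(NULL, false);
}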
Nicolai Hähnle Dec. 16, 2016, 6:11 p.m. UTC | #10
On 16.12.2016 18:15, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
>> The concern about picking up a handoff that we didn't request is real,
>> though it cannot happen in the first iteration. Perhaps this __mutex_trylock
>> can be moved to the end of the loop? See below...
>
>
>>>> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>> 		 * or we must see its unlock and acquire.
>>>> 		 */
>>>> 		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
>>>> -		     __mutex_trylock(lock, first))
>>>> +		     __mutex_trylock(lock, use_ww_ctx || first))
>>>> 			break;
>>>>
>>>> 		spin_lock_mutex(&lock->wait_lock, flags);
>>
>> Change this code to:
>>
>> 		acquired = first &&
>> 		    mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
>> 					  &waiter);
>> 		spin_lock_mutex(&lock->wait_lock, flags);
>> 		
>> 		if (acquired ||
>> 		    __mutex_trylock(lock, use_ww_ctx || first))
>> 			break;
>
> 			goto acquired;
>
> will work lots better.

Wasn't explicit enough, sorry. The idea was to get rid of the acquired 
label and change things so that all paths exit the loop with wait_lock 
held. That seems cleaner to me.


>> 	}
>>
>> This changes the trylock to always be under the wait_lock, but we previously
>> had that at the beginning of the loop anyway.
>
>> It also removes back-to-back
>> calls to __mutex_trylock when going through the loop;
>
> Yeah, I had that explicitly. It allows taking the mutex when
> mutex_unlock() is still holding the wait_lock.

mutex_optimistic_spin() already calls __mutex_trylock, and for the 
no-spin case, __mutex_unlock_slowpath() only calls wake_up_q() after 
releasing the wait_lock.

So I don't see the purpose of the back-to-back __mutex_trylocks, 
especially considering that if the first one succeeds, we immediately 
take the wait_lock anyway.

Nicolai



>> and for the first
>> iteration, there is a __mutex_trylock under wait_lock already before adding
>> ourselves to the wait list.
>
> Correct.
>
Nicolai Hähnle Dec. 16, 2016, 6:12 p.m. UTC | #11
On 16.12.2016 18:20, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
>>>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>> 		spin_unlock_mutex(&lock->wait_lock, flags);
>>>> 		schedule_preempt_disabled();
>>>>
>>>> -		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>>> +		if (use_ww_ctx && ww_ctx) {
>>>> +			/*
>>>> +			 * Always re-check whether we're in first position. We
>>>> +			 * don't want to spin if another task with a lower
>>>> +			 * stamp has taken our position.
>>>> +			 *
>>>> +			 * We also may have to set the handoff flag again, if
>>>> +			 * our position at the head was temporarily taken away.
>>>> +			 */
>>>> +			first = __mutex_waiter_is_first(lock, &waiter);
>>>> +
>>>> +			if (first)
>>>> +				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> +		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>>> 			first = true;
>>>> 			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> 		}
>>>
>>> So the point is that !ww_ctx entries are 'skipped' during the insertion
>>> and therefore, if one becomes first, it must stay first?
>>
>> Yes. Actually, it should be possible to replace all the cases of use_ww_ctx
>> || first with ww_ctx. Similarly, all cases of use_ww_ctx && ww_ctx could be
>> replaced by just ww_ctx.
>
>
> I'm not seeing how "use_ww_ctx || first" -> "ww_ctx" works.

My bad, missing the '|| first'.


> And while
> "use_ww_ctx && ww_ctx" -> "ww_ctx" is correct, it didn't work right on
> some older GCCs; they choked on value propagation for ww_ctx and kept
> emitting code even if we passed in NULL. Hence use_ww_ctx.

Okay, I'll stick with use_ww_ctx. Thanks for the explanation.

Nicolai


> Arnd is now looking to raise the minimum supported GCC version, so maybe
> we should look at that again if he gets anywhere.
>
Peter Zijlstra Dec. 16, 2016, 8 p.m. UTC | #12
On Fri, Dec 16, 2016 at 07:11:41PM +0100, Nicolai Hähnle wrote:
> mutex_optimistic_spin() already calls __mutex_trylock, and for the no-spin
> case, __mutex_unlock_slowpath() only calls wake_up_q() after releasing the
> wait_lock.

mutex_optimistic_spin() is a no-op when !CONFIG_MUTEX_SPIN_ON_OWNER
Nicolai Hähnle Dec. 16, 2016, 10:35 p.m. UTC | #13
On 16.12.2016 21:00, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 07:11:41PM +0100, Nicolai Hähnle wrote:
>> mutex_optimistic_spin() already calls __mutex_trylock, and for the no-spin
>> case, __mutex_unlock_slowpath() only calls wake_up_q() after releasing the
>> wait_lock.
>
> mutex_optimistic_spin() is a no-op when !CONFIG_MUTEX_SPIN_ON_OWNER

Does this change the conclusion in a meaningful way? I did mention the 
no-spin case in the very part you quoted...

Again, AFAIU we're talking about the part of my proposal that turns what 
is effectively

	__mutex_trylock(lock, ...);
	spin_lock_mutex(&lock->wait_lock, flags);

(independent of whether the trylock succeeds or not!) into

	spin_lock_mutex(&lock->wait_lock, flags);
	__mutex_trylock(lock, ...);

in an effort to streamline the code overall.

Also AFAIU, you're concerned that spin_lock_mutex(...) has to wait for 
an unlock from mutex_unlock(), but when does that actually happen with 
relevant probability?

When we spin optimistically, that could happen -- except that 
__mutex_trylock is already called in mutex_optimistic_spin, so it 
doesn't matter. When we don't spin -- whether due to .config or !first 
-- then the chance of overlap with mutex_unlock is exceedingly small.

Even if we do overlap, we'll have to wait for mutex_unlock to release 
the wait_lock anyway! So what good does acquiring the lock first really do?

Anyway, this is really more of an argument about whether there's a good 
reason to call __mutex_trylock twice in that loop. I don't think there 
is; your arguments certainly haven't been convincing. But the issue can 
be side-stepped for this patch by keeping the trylock calls as they are 
and just setting first = true unconditionally for ww_ctx != NULL (but 
keeping the logic for when to set the HANDOFF flag as-is). Should 
probably rename the variable s/first/handoff/ then.

Nicolai
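
For concreteness, that side-step might look roughly like this in the wakeup
path (editorial sketch, not code from the series; how the optimistic-spin
condition would then avoid spinning while out of position is left open here):

		if (use_ww_ctx && ww_ctx) {
			/*
			 * A ww waiter must always be prepared to consume a
			 * handoff: it may have been first at unlock time
			 * even if it no longer is.
			 */
			handoff = true;

			/* But only re-arm the flag while actually first. */
			if (__mutex_waiter_is_first(lock, &waiter))
				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
		} else if (!handoff && __mutex_waiter_is_first(lock, &waiter)) {
			handoff = true;
			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
		}

with both __mutex_trylock() calls in the loop then taking plain 'handoff'
instead of 'use_ww_ctx || first'.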

Patch

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b97870f..118a3b6 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,6 +20,8 @@ 
 #include <linux/osq_lock.h>
 #include <linux/debug_locks.h>
 
+struct ww_acquire_ctx;
+
 /*
  * Simple, straightforward mutexes with strict semantics:
  *
@@ -75,6 +77,7 @@  static inline struct task_struct *__mutex_owner(struct mutex *lock)
 struct mutex_waiter {
 	struct list_head	list;
 	struct task_struct	*task;
+	struct ww_acquire_ctx	*ww_ctx;
 #ifdef CONFIG_DEBUG_MUTEXES
 	void			*magic;
 #endif
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 585627f..d63eebb 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -628,6 +628,51 @@  __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
 	return 0;
 }
 
+static inline int __sched
+__ww_mutex_add_waiter(struct mutex_waiter *waiter,
+		      struct mutex *lock,
+		      struct ww_acquire_ctx *ww_ctx)
+{
+	struct mutex_waiter *cur;
+
+	if (!ww_ctx) {
+		list_add_tail(&waiter->list, &lock->wait_list);
+		return 0;
+	}
+
+	/*
+	 * Add the waiter before the first waiter with a higher stamp.
+	 * Waiters without a context are skipped to avoid starving
+	 * them.
+	 */
+	list_for_each_entry(cur, &lock->wait_list, list) {
+		if (!cur->ww_ctx)
+			continue;
+
+		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
+			/* Back off immediately if necessary. */
+			if (ww_ctx->acquired > 0) {
+#ifdef CONFIG_DEBUG_MUTEXES
+				struct ww_mutex *ww;
+
+				ww = container_of(lock, struct ww_mutex, base);
+				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
+				ww_ctx->contending_lock = ww;
+#endif
+				return -EDEADLK;
+			}
+
+			continue;
+		}
+
+		list_add_tail(&waiter->list, &cur->list);
+		return 0;
+	}
+
+	list_add_tail(&waiter->list, &lock->wait_list);
+	return 0;
+}
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
@@ -677,15 +722,25 @@  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	debug_mutex_lock_common(lock, &waiter);
 	debug_mutex_add_waiter(lock, &waiter, task);
 
-	/* add waiting tasks to the end of the waitqueue (FIFO): */
-	list_add_tail(&waiter.list, &lock->wait_list);
+	lock_contended(&lock->dep_map, ip);
+
+	if (!use_ww_ctx) {
+		/* add waiting tasks to the end of the waitqueue (FIFO): */
+		list_add_tail(&waiter.list, &lock->wait_list);
+	} else {
+		/* Add in stamp order, waking up waiters that must back off. */
+		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+		if (ret)
+			goto err_early_backoff;
+
+		waiter.ww_ctx = ww_ctx;
+	}
+
 	waiter.task = task;
 
 	if (__mutex_waiter_is_first(lock, &waiter))
 		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
 
-	lock_contended(&lock->dep_map, ip);
-
 	set_task_state(task, state);
 	for (;;) {
 		/*
@@ -693,8 +748,12 @@  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * mutex_unlock() handing the lock off to us, do a trylock
 		 * before testing the error conditions to make sure we pick up
 		 * the handoff.
+		 *
+		 * For w/w locks, we always need to do this even if we're not
+		 * currently the first waiter, because we may have been the
+		 * first waiter during the unlock.
 		 */
-		if (__mutex_trylock(lock, first))
+		if (__mutex_trylock(lock, use_ww_ctx || first))
 			goto acquired;
 
 		/*
@@ -716,7 +775,20 @@  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
 
-		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
+		if (use_ww_ctx && ww_ctx) {
+			/*
+			 * Always re-check whether we're in first position. We
+			 * don't want to spin if another task with a lower
+			 * stamp has taken our position.
+			 *
+			 * We also may have to set the handoff flag again, if
+			 * our position at the head was temporarily taken away.
+			 */
+			first = __mutex_waiter_is_first(lock, &waiter);
+
+			if (first)
+				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
+		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
 			first = true;
 			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
 		}
@@ -728,7 +800,7 @@  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * or we must see its unlock and acquire.
 		 */
 		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
-		     __mutex_trylock(lock, first))
+		     __mutex_trylock(lock, use_ww_ctx || first))
 			break;
 
 		spin_lock_mutex(&lock->wait_lock, flags);
@@ -761,6 +833,7 @@  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_task_state(task, TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, task);
+err_early_backoff:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
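
For reference, the caller-visible effect of the early -EDEADLK backoff follows
the standard ww_mutex pattern; a minimal sketch, with hypothetical names
(example_class, lock_pair) and only the API calls taken from the documented
interface:

static DEFINE_WW_CLASS(example_class);

static void lock_pair(struct ww_mutex *a, struct ww_mutex *b)
{
	struct ww_mutex *first = a, *second = b;
	struct ww_acquire_ctx ctx;

	ww_acquire_init(&ctx, &example_class);
retry:
	ww_mutex_lock(first, &ctx);	/* acquired == 0: cannot return -EDEADLK */
	if (ww_mutex_lock(second, &ctx) == -EDEADLK) {
		/*
		 * A transaction with an older stamp is ahead of us on
		 * 'second'; with this patch that can be detected already
		 * while queueing, before we ever sleep.  Back off and
		 * retry with the contended lock taken first.
		 */
		ww_mutex_unlock(first);
		swap(first, second);
		goto retry;
	}
	ww_acquire_done(&ctx);

	/* ... both locks held ... */

	ww_mutex_unlock(a);
	ww_mutex_unlock(b);
	ww_acquire_fini(&ctx);
}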