[v9,04/19] qspinlock: Extract out the exchange of tail code word

Message ID 1397747051-15401-5-git-send-email-Waiman.Long@hp.com (mailing list archive)
State New, archived

Commit Message

Waiman Long April 17, 2014, 3:03 p.m. UTC
This patch extracts the logic for the exchange of new and previous tail
code words into a new xchg_tail() function which can be optimized in a
later patch.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/asm-generic/qspinlock_types.h |    2 +
 kernel/locking/qspinlock.c            |   61 +++++++++++++++++++++------------
 2 files changed, 41 insertions(+), 22 deletions(-)
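For context on "optimized in a later patch": once the tail exchange is
isolated in one helper, the sort of optimization that becomes possible
(the series later does this when NR_CPUS < 16K, so the tail code word
fills its own naturally aligned 16-bit halfword) is replacing the
cmpxchg loop with a single unconditional xchg. A minimal sketch of that
shape; the "struct __qspinlock" halfword overlay and _Q_TAIL_OFFSET are
assumptions here, and the @pval plumbing is omitted for clarity:

/*
 * Sketch only, not the actual follow-up patch: with the tail code word
 * in bits 16-31 it can be exchanged as a plain u16, so the
 * locked/pending bytes no longer have to be preserved by hand in a
 * cmpxchg retry loop.
 */
static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
	struct __qspinlock *l = (void *)lock;

	/* p,*,* -> n,*,* */
	return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
}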

Comments

Peter Zijlstra April 17, 2014, 3:49 p.m. UTC | #1
On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
> @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	node->next = NULL;
>  
>  	/*
> +	 * We touched a (possibly) cold cacheline; attempt the trylock once
> +	 * more in the hope someone let go while we weren't watching as long
> +	 * as no one was queuing.
>  	 */
> +	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
> +		goto release;

But you just did a potentially very expensive op; @val isn't
representative anymore!
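Concretely, the window in question looks like this (a sketch of the
surrounding slowpath; the re-read at the end is illustrative, not
something the patch does):

	/* @val was last read well before this point ...		*/

	node->locked = 0;	/* ... and these stores touch the	*/
	node->next = NULL;	/* cold per-cpu node cacheline		*/

	/* Illustrative only: a re-read here would make @val fresh again. */
	val = atomic_read(&lock->val);
	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
		goto release;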
Waiman Long April 17, 2014, 9:28 p.m. UTC | #2
On 04/17/2014 11:49 AM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
>> @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>   	node->next = NULL;
>>
>>   	/*
>> +	 * We touched a (possibly) cold cacheline; attempt the trylock once
>> +	 * more in the hope someone let go while we weren't watching as long
>> +	 * as no one was queuing.
>>   	 */
>> +	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
>> +		goto release;
> But you just did a potentially very expensive op; @val isn't
> representative anymore!

That is not true. I pass in a pointer to val to trylock_pending() (the 
pointer thing) so that it will store the latest value that it reads from 
the lock back into val. I did miss one in the PV qspinlock exit loop. I 
will add it back when I do the next version.
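Roughly, the convention is (a simplified sketch, not the exact code in
the series):

	/* trylock_pending() hands back its freshest read via @pval */
	static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
	{
		u32 val = atomic_read(&lock->val);

		/* ... pending-bit attempts, re-reading val on each retry ... */

		*pval = val;	/* refresh the caller's snapshot */
		return 0;	/* 0: fall through to the queueing path */
	}

	/* caller */
	if (trylock_pending(lock, &val))
		return;		/* lock acquired */
	/* here @val holds the most recent value trylock_pending() saw */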

-Longman
Peter Zijlstra April 18, 2014, 8:15 a.m. UTC | #3
On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote:
> On 04/17/2014 11:49 AM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
> >>@@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >>  	node->next = NULL;
> >>
> >>  	/*
> >>+	 * We touched a (possibly) cold cacheline; attempt the trylock once
> >>+	 * more in the hope someone let go while we weren't watching as long
> >>+	 * as no one was queuing.
> >>  	 */
> >>+	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
> >>+		goto release;
> >But you just did a potentially very expensive op; @val isn't
> >representative anymore!
> 
> That is not true. I pass in a pointer to val to trylock_pending() (the
> pointer thing) so that it will store the latest value that it reads from the
> lock back into val. I did miss one in the PV qspinlock exit loop. I will add
> it back when I do the next version.

But you did that read _before_ you touched a cold cacheline, that's 100s
of cycles. Whatever value you read back then is now complete nonsense.
Waiman Long April 18, 2014, 5:32 p.m. UTC | #4
On 04/18/2014 04:15 AM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote:
>> On 04/17/2014 11:49 AM, Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
>>>> @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>>>   	node->next = NULL;
>>>>
>>>>   	/*
>>>> +	 * We touched a (possibly) cold cacheline; attempt the trylock once
>>>> +	 * more in the hope someone let go while we weren't watching as long
>>>> +	 * as no one was queuing.
>>>>   	 */
>>>> +	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
>>>> +		goto release;
>>> But you just did a potentially very expensive op; @val isn't
>>> representative anymore!
>> That is not true. I pass in a pointer to val to trylock_pending() (the
>> pointer thing) so that it will store the latest value that it reads from the
>> lock back into val. I did miss one in the PV qspinlock exit loop. I will add
>> it back when I do the next version.
> But you did that read _before_ you touched a cold cacheline, that's 100s
> of cycles. Whatever value you read back then is now complete nonsense.

For spin_lock(), the lock cacheline is touched by a cmpxchg(). It can
take 100s of cycles whether it is hot or cold.

I will take the precheck out; it is not such a big deal anyway.

-Longman
Peter Zijlstra April 18, 2014, 5:53 p.m. UTC | #5
On Fri, Apr 18, 2014 at 01:32:47PM -0400, Waiman Long wrote:
> On 04/18/2014 04:15 AM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote:
> >>On 04/17/2014 11:49 AM, Peter Zijlstra wrote:
> >>>On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
> >>>>@@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >>>>  	node->next = NULL;
> >>>>
> >>>>  	/*
> >>>>+	 * We touched a (possibly) cold cacheline; attempt the trylock once
> >>>>+	 * more in the hope someone let go while we weren't watching as long
> >>>>+	 * as no one was queuing.
> >>>>  	 */
> >>>>+	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
> >>>>+		goto release;
> >>>But you just did a potentially very expensive op; @val isn't
> >>>representative anymore!
> >>That is not true. I pass in a pointer to val to trylock_pending() (the
> >>pointer thing) so that it will store the latest value that it reads from the
> >>lock back into val. I did miss one in the PV qspinlock exit loop. I will add
> >>it back when I do the next version.
> >But you did that read _before_ you touched a cold cacheline, that's 100s
> >of cycles. Whatever value you read back then is now complete nonsense.
> 
> For spin_lock(), the lock cacheline is touched by a cmpxchg(). It can take
> 100s of cycles whether it is hot or cold.

It's not the lock cacheline; you just touched the per-cpu node cacheline
for the first time, setting up the node.
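That is, the node setup at the top of the slowpath, roughly (a sketch;
names follow the series but details are abbreviated):

	static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]);

	node = this_cpu_ptr(&mcs_nodes[0]);	/* first reference to the */
	idx  = node->count++;			/* per-cpu node cacheline */
	tail = encode_tail(smp_processor_id(), idx);

	node += idx;
	node->locked = 0;	/* these stores are the "cold cacheline" */
	node->next = NULL;	/* touch, separate from the lock word    */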

Waiman Long April 18, 2014, 6:13 p.m. UTC | #6
On 04/18/2014 01:53 PM, Peter Zijlstra wrote:
> On Fri, Apr 18, 2014 at 01:32:47PM -0400, Waiman Long wrote:
>> On 04/18/2014 04:15 AM, Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote:
>>>> On 04/17/2014 11:49 AM, Peter Zijlstra wrote:
>>>>> On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
>>>>>> @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>>>>>   	node->next = NULL;
>>>>>>
>>>>>>   	/*
>>>>>> +	 * We touched a (possibly) cold cacheline; attempt the trylock once
>>>>>> +	 * more in the hope someone let go while we weren't watching as long
>>>>>> +	 * as no one was queuing.
>>>>>>   	 */
>>>>>> +	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
>>>>>> +		goto release;
>>>>> But you just did a potentially very expensive op; @val isn't
>>>>> representative anymore!
>>>> That is not true. I pass in a pointer to val to trylock_pending() (the
>>>> pointer thing) so that it will store the latest value that it reads from the
>>>> lock back into val. I did miss one in the PV qspinlock exit loop. I will add
>>>> it back when I do the next version.
>>> But you did that read _before_ you touched a cold cacheline, that's 100s
>>> of cycles. Whatever value you read back then is now complete nonsense.
>> For spin_lock(), the lock cacheline is touched by a cmpxchg(). It can take
>> 100s of cycles whether it is hot or cold.
> It's not the lock cacheline; you just touched the per-cpu node cacheline
> for the first time, setting up the node.
>

Thanks for the clarification; now I know what you mean.

-Longman

Patch

diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index bd25081..ed5d89a 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -61,6 +61,8 @@  typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS	(32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK	_Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_MASK		(_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
+
 #define _Q_LOCKED_VAL		(1U << _Q_LOCKED_OFFSET)
 #define _Q_PENDING_VAL		(1U << _Q_PENDING_OFFSET)
 
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index d35362a..fcf06cb 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -86,6 +86,34 @@  static inline struct mcs_spinlock *decode_tail(u32 tail)
 #define _Q_LOCKED_PENDING_MASK	(_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
 /**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queue spinlock structure
+ * @tail : The new queue tail code word
+ * @pval : Pointer to current value of the queue spinlock 32-bit word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32
+xchg_tail(struct qspinlock *lock, u32 tail, u32 *pval)
+{
+	u32 old, new, val = *pval;
+
+	for (;;) {
+		new = (val & _Q_LOCKED_PENDING_MASK) | tail;
+		old = atomic_cmpxchg(&lock->val, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+	*pval = new;
+	return old;
+}
+
+/**
  * trylock_pending - try to acquire queue spinlock using the pending bit
  * @lock : Pointer to queue spinlock structure
  * @pval : Pointer to value of the queue spinlock 32-bit word
@@ -192,36 +220,25 @@  void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	node->next = NULL;
 
 	/*
-	 * we already touched the queueing cacheline; don't bother with pending
-	 * stuff.
-	 *
-	 * trylock || xchg(lock, node)
-	 *
-	 * 0,0,0 -> 0,0,1 ; trylock
-	 * p,y,x -> n,y,x ; prev = xchg(lock, node)
+	 * We touched a (possibly) cold cacheline; attempt the trylock once
+	 * more in the hope someone let go while we weren't watching as long
+	 * as no one was queuing.
 	 */
-	for (;;) {
-		new = _Q_LOCKED_VAL;
-		if (val)
-			new = tail | (val & _Q_LOCKED_PENDING_MASK);
-
-		old = atomic_cmpxchg(&lock->val, val, new);
-		if (old == val)
-			break;
-
-		val = old;
-	}
+	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
+		goto release;
 
 	/*
-	 * we won the trylock; forget about queueing.
+	 * we already touched the queueing cacheline; don't bother with pending
+	 * stuff.
+	 *
+	 * p,*,* -> n,*,*
 	 */
-	if (new == _Q_LOCKED_VAL)
-		goto release;
+	old = xchg_tail(lock, tail, &val);
 
 	/*
 	 * if there was a previous node; link it and wait.
 	 */
-	if (old & ~_Q_LOCKED_PENDING_MASK) {
+	if (old & _Q_TAIL_MASK) {
 		prev = decode_tail(old);
 		ACCESS_ONCE(prev->next) = node;