
[v9,03/19] qspinlock: Add pending bit

Message ID 1397747051-15401-4-git-send-email-Waiman.Long@hp.com (mailing list archive)
State New, archived

Commit Message

Waiman Long April 17, 2014, 3:03 p.m. UTC
Because the qspinlock needs to touch a second cacheline; add a pending
bit and allow a single in-word spinner before we punt to the second
cacheline.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/asm-generic/qspinlock_types.h |   12 +++-
 kernel/locking/qspinlock.c            |  117 +++++++++++++++++++++++++++------
 2 files changed, 106 insertions(+), 23 deletions(-)
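
A worked example may help make the pending-bit idea concrete. Using the
(tail, pending, locked) notation and the bit layout introduced by the patch
below (_Q_LOCKED_VAL == 1, _Q_PENDING_VAL == 1 << 8), this minimal
user-space sketch -- illustrative only, not part of the patch -- walks the
lock word through the pending-bit hand-off:

	#include <stdio.h>

	#define _Q_LOCKED_VAL	(1U << 0)	/* bits 0-7: locked byte */
	#define _Q_PENDING_VAL	(1U << 8)	/* bit 8: pending */

	int main(void)
	{
		unsigned int val = 0;

		val |= _Q_LOCKED_VAL;		/* CPU0 trylock:      0,0,0 -> 0,0,1 */
		printf("locked:         0x%08x\n", val);	/* 0x00000001 */

		val |= _Q_PENDING_VAL;		/* CPU1 sets pending: 0,0,1 -> 0,1,1 */
		printf("locked|pending: 0x%08x\n", val);	/* 0x00000101 */

		val &= ~_Q_LOCKED_VAL;		/* CPU0 unlocks:      *,1,1 -> *,1,0 */
		printf("pending only:   0x%08x\n", val);	/* 0x00000100 */

		/* CPU1 takes ownership and clears pending: *,1,0 -> *,0,1 */
		val = (val & ~_Q_PENDING_VAL) | _Q_LOCKED_VAL;
		printf("handed over:    0x%08x\n", val);	/* 0x00000001 */
		return 0;
	}

A third CPU that observes a non-zero pending bit or tail gives up on the
in-word spin and queues on its MCS node instead.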

Comments

Peter Zijlstra April 17, 2014, 3:42 p.m. UTC | #1
On Thu, Apr 17, 2014 at 11:03:55AM -0400, Waiman Long wrote:
> +/**
> + * trylock_pending - try to acquire queue spinlock using the pending bit
> + * @lock : Pointer to queue spinlock structure
> + * @pval : Pointer to value of the queue spinlock 32-bit word
> + * Return: 1 if lock acquired, 0 otherwise
> + */
> +static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
> +{
> +	u32 old, new, val = *pval;

I'm not thrilled about you breaking this into a separate function; the
compiler will put it right back and now you get to have that ugly
pointer stuff.

It also makes the function control flow not match the state diagram
anymore.

> +
> +	/*
> +	 * trylock || pending
> +	 *
> +	 * 0,0,0 -> 0,0,1 ; trylock
> +	 * 0,0,1 -> 0,1,1 ; pending
> +	 */
> +	for (;;) {
> +		/*
> +		 * If we observe any contention; queue.
> +		 */
> +		if (val & ~_Q_LOCKED_MASK)
> +			return 0;
> +
> +		new = _Q_LOCKED_VAL;
> +		if (val == new)
> +			new |= _Q_PENDING_VAL;
> +
> +		old = atomic_cmpxchg(&lock->val, val, new);
> +		if (old == val)
> +			break;
> +
> +		*pval = val = old;
> +	}
> +
> +	/*
> +	 * we won the trylock
> +	 */
> +	if (new == _Q_LOCKED_VAL)
> +		return 1;
> +
> +	/*
> +	 * we're pending, wait for the owner to go away.
> +	 *
> +	 * *,1,1 -> *,1,0
> +	 */
> +	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
> +		arch_mutex_cpu_relax();

That was a cpu_relax().

> +
> +	/*
> +	 * take ownership and clear the pending bit.
> +	 *
> +	 * *,1,0 -> *,0,1
> +	 */
> +	for (;;) {
> +		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
> +
> +		old = atomic_cmpxchg(&lock->val, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +	return 1;
> +}
Waiman Long April 17, 2014, 9:20 p.m. UTC | #2
On 04/17/2014 11:42 AM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 11:03:55AM -0400, Waiman Long wrote:
>> +/**
>> + * trylock_pending - try to acquire queue spinlock using the pending bit
>> + * @lock : Pointer to queue spinlock structure
>> + * @pval : Pointer to value of the queue spinlock 32-bit word
>> + * Return: 1 if lock acquired, 0 otherwise
>> + */
>> +static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
>> +{
>> +	u32 old, new, val = *pval;
> I'm not thrilled about you breaking this into a separate function; the
> compiler will put it right back and now you get to have that ugly
> pointer stuff.
>
> It also makes the function control flow not match the state diagram
> anymore.

I separated it out primarily to break the pending bit logic away from the 
core MCS queuing logic, so that each of them is easier to understand; they 
are largely independent. I fully understand that the compiler will put 
them back together. As I pile on more code, the slowpath function will 
grow bigger, making it harder to comprehend and to see where the boundary 
between the two lies.
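
For reference, the shape of that split in this patch is (heavily abridged,
with the whole queuing body elided):

	void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
	{
		/* the pending bit handling lives in its own helper ... */
		if (trylock_pending(lock, &val))
			return;	/* Lock acquired */

		/* ... while the MCS queuing logic stays in the slowpath body. */
	}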

I will take a look at the state diagram to see what adjustment will be 
needed.

>> +
>> +	/*
>> +	 * trylock || pending
>> +	 *
>> +	 * 0,0,0 ->  0,0,1 ; trylock
>> +	 * 0,0,1 ->  0,1,1 ; pending
>> +	 */
>> +	for (;;) {
>> +		/*
>> +		 * If we observe any contention; queue.
>> +		 */
>> +		if (val & ~_Q_LOCKED_MASK)
>> +			return 0;
>> +
>> +		new = _Q_LOCKED_VAL;
>> +		if (val == new)
>> +			new |= _Q_PENDING_VAL;
>> +
>> +		old = atomic_cmpxchg(&lock->val, val, new);
>> +		if (old == val)
>> +			break;
>> +
>> +		*pval = val = old;
>> +	}
>> +
>> +	/*
>> +	 * we won the trylock
>> +	 */
>> +	if (new == _Q_LOCKED_VAL)
>> +		return 1;
>> +
>> +	/*
>> +	 * we're pending, wait for the owner to go away.
>> +	 *
>> +	 * *,1,1 ->  *,1,0
>> +	 */
>> +	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
>> +		arch_mutex_cpu_relax();
> That was a cpu_relax().

Yes, but arch_mutex_cpu_relax() is the same as cpu_relax() for x86.
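
For the generic case the fallback amounts to roughly this (sketching the
mutex header's generic definition; architectures such as s390 supply their
own):

	#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
	#define arch_mutex_cpu_relax()	cpu_relax()
	#endif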

-Longman
Ingo Molnar April 18, 2014, 7:42 a.m. UTC | #3
* Waiman Long <Waiman.Long@hp.com> wrote:

> Because the qspinlock needs to touch a second cacheline; add a pending
> bit and allow a single in-word spinner before we punt to the second
> cacheline.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>

This patch should have a "From: Peter" in it as well, right?

Thanks,

	Ingo
Peter Zijlstra April 18, 2014, 8:13 a.m. UTC | #4
On Thu, Apr 17, 2014 at 05:20:31PM -0400, Waiman Long wrote:
> >>+	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
> >>+		arch_mutex_cpu_relax();
> >That was a cpu_relax().
> 
> Yes, but arch_mutex_cpu_relax() is the same as cpu_relax() for x86.

Yeah, so why bother typing more?

Let the s390 people sort that if/when they try and make this work for
them.
Waiman Long April 18, 2014, 4:23 p.m. UTC | #5
On 04/18/2014 03:42 AM, Ingo Molnar wrote:
> * Waiman Long <Waiman.Long@hp.com> wrote:
>
>> Because the qspinlock needs to touch a second cacheline; add a pending
>> bit and allow a single in-word spinner before we punt to the second
>> cacheline.
>>
>> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> This patch should have a "From: Peter" in it as well, right?
>
> Thanks,
>
> 	Ingo

Do you mean a "From:" line in the mail header? It will be a bit hard to 
have a different "From:" header in the same patch series. I can certainly 
do that if there is an easy way to do it.

-Longman
Konrad Rzeszutek Wilk April 18, 2014, 4:35 p.m. UTC | #6
On Fri, Apr 18, 2014 at 12:23:29PM -0400, Waiman Long wrote:
> On 04/18/2014 03:42 AM, Ingo Molnar wrote:
> >* Waiman Long <Waiman.Long@hp.com> wrote:
> >
> >>Because the qspinlock needs to touch a second cacheline; add a pending
> >>bit and allow a single in-word spinner before we punt to the second
> >>cacheline.
> >>
> >>Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> >>Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> >This patch should have a "From: Peter" in it as well, right?
> >
> >Thanks,
> >
> >	Ingo
> 
> Do you mean a "From:" line in the mail header? It will be a bit hard
> to have different "From:" header in the same patch series. I can
> certainly do that if there is an easy way to do it.

It is pretty easy.

Just do 'git commit --amend --author "The Right Author"' and when
you send the patches (git send-email) it will include that.
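
With the author set that way, git format-patch / git send-email put an
in-body attribution at the top of the patch, along these lines:

	From: Peter Zijlstra <peterz@infradead.org>

	Because the qspinlock needs to touch a second cacheline; add a pending
	bit and allow a single in-word spinner before we punt to the second
	cacheline.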

> 
> -Longman
Waiman Long April 18, 2014, 5:07 p.m. UTC | #7
On 04/18/2014 04:13 AM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 05:20:31PM -0400, Waiman Long wrote:
>>>> +	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
>>>> +		arch_mutex_cpu_relax();
>>> That was a cpu_relax().
>> Yes, but arch_mutex_cpu_relax() is the same as cpu_relax() for x86.
> Yeah, so why bother typing more?
>
> Let the s390 people sort that if/when they try and make this work for
> them.

OK, I can revert the change if you wish. I have no objection to that.

-Longman
Waiman Long April 18, 2014, 6:12 p.m. UTC | #8
On 04/18/2014 12:35 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 18, 2014 at 12:23:29PM -0400, Waiman Long wrote:
>> On 04/18/2014 03:42 AM, Ingo Molnar wrote:
>>> * Waiman Long <Waiman.Long@hp.com> wrote:
>>>
>>>> Because the qspinlock needs to touch a second cacheline; add a pending
>>>> bit and allow a single in-word spinner before we punt to the second
>>>> cacheline.
>>>>
>>>> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>>> This patch should have a "From: Peter" in it as well, right?
>>>
>>> Thanks,
>>>
>>> 	Ingo
>> Do you mean a "From:" line in the mail header? It will be a bit hard
>> to have different "From:" header in the same patch series. I can
>> certainly do that if there is an easy way to do it.
> It is pretty easy.
>
> Just do 'git commit --amend --author "The Right Author"' and when
> you send the patches (git send-email) it will include that.

Thanks for the tip. I will do that in the next version.

-Longman

Patch

diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index f66f845..bd25081 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -39,8 +39,9 @@  typedef struct qspinlock {
  * Bitfields in the atomic value:
  *
  *  0- 7: locked byte
- *  8- 9: tail index
- * 10-31: tail cpu (+1)
+ *     8: pending
+ *  9-10: tail index
+ * 11-31: tail cpu (+1)
  */
 #define	_Q_SET_MASK(type)	(((1U << _Q_ ## type ## _BITS) - 1)\
 				      << _Q_ ## type ## _OFFSET)
@@ -48,7 +49,11 @@  typedef struct qspinlock {
 #define _Q_LOCKED_BITS		8
 #define _Q_LOCKED_MASK		_Q_SET_MASK(LOCKED)
 
-#define _Q_TAIL_IDX_OFFSET	(_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_OFFSET	(_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_BITS		1
+#define _Q_PENDING_MASK		_Q_SET_MASK(PENDING)
+
+#define _Q_TAIL_IDX_OFFSET	(_Q_PENDING_OFFSET + _Q_PENDING_BITS)
 #define _Q_TAIL_IDX_BITS	2
 #define _Q_TAIL_IDX_MASK	_Q_SET_MASK(TAIL_IDX)
 
@@ -57,5 +62,6 @@  typedef struct qspinlock {
 #define _Q_TAIL_CPU_MASK	_Q_SET_MASK(TAIL_CPU)
 
 #define _Q_LOCKED_VAL		(1U << _Q_LOCKED_OFFSET)
+#define _Q_PENDING_VAL		(1U << _Q_PENDING_OFFSET)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index b97a1ad..d35362a 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -83,23 +83,93 @@  static inline struct mcs_spinlock *decode_tail(u32 tail)
 	return per_cpu_ptr(&mcs_nodes[idx], cpu);
 }
 
+#define _Q_LOCKED_PENDING_MASK	(_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
+/**
+ * trylock_pending - try to acquire queue spinlock using the pending bit
+ * @lock : Pointer to queue spinlock structure
+ * @pval : Pointer to value of the queue spinlock 32-bit word
+ * Return: 1 if lock acquired, 0 otherwise
+ */
+static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
+{
+	u32 old, new, val = *pval;
+
+	/*
+	 * trylock || pending
+	 *
+	 * 0,0,0 -> 0,0,1 ; trylock
+	 * 0,0,1 -> 0,1,1 ; pending
+	 */
+	for (;;) {
+		/*
+		 * If we observe any contention; queue.
+		 */
+		if (val & ~_Q_LOCKED_MASK)
+			return 0;
+
+		new = _Q_LOCKED_VAL;
+		if (val == new)
+			new |= _Q_PENDING_VAL;
+
+		old = atomic_cmpxchg(&lock->val, val, new);
+		if (old == val)
+			break;
+
+		*pval = val = old;
+	}
+
+	/*
+	 * we won the trylock
+	 */
+	if (new == _Q_LOCKED_VAL)
+		return 1;
+
+	/*
+	 * we're pending, wait for the owner to go away.
+	 *
+	 * *,1,1 -> *,1,0
+	 */
+	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
+		arch_mutex_cpu_relax();
+
+	/*
+	 * take ownership and clear the pending bit.
+	 *
+	 * *,1,0 -> *,0,1
+	 */
+	for (;;) {
+		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
+
+		old = atomic_cmpxchg(&lock->val, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+	return 1;
+}
+
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
  * @val: Current value of the queue spinlock 32-bit word
  *
- * (queue tail, lock bit)
+ * (queue tail, pending bit, lock bit)
  *
- *              fast      :    slow                                  :    unlock
- *                        :                                          :
- * uncontended  (0,0)   --:--> (0,1) --------------------------------:--> (*,0)
- *                        :       | ^--------.                    /  :
- *                        :       v           \                   |  :
- * uncontended            :    (n,x) --+--> (n,0)                 |  :
- *   queue                :       | ^--'                          |  :
- *                        :       v                               |  :
- * contended              :    (*,x) --+--> (*,0) -----> (*,1) ---'  :
- *   queue                :         ^--'                             :
+ *              fast     :    slow                                  :    unlock
+ *                       :                                          :
+ * uncontended  (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0)
+ *                       :       | ^--------.------.             /  :
+ *                       :       v           \      \            |  :
+ * pending               :    (0,1,1) +--> (0,1,0)   \           |  :
+ *                       :       | ^--'              |           |  :
+ *                       :       v                   |           |  :
+ * uncontended           :    (n,x,y) +--> (n,0,0) --'           |  :
+ *   queue               :       | ^--'                          |  :
+ *                       :       v                               |  :
+ * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
+ *   queue               :         ^--'                             :
  *
  */
 void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
@@ -110,6 +180,9 @@  void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+	if (trylock_pending(lock, &val))
+		return;	/* Lock acquired */
+
 	node = this_cpu_ptr(&mcs_nodes[0]);
 	idx = node->count++;
 	tail = encode_tail(smp_processor_id(), idx);
@@ -119,15 +192,18 @@  void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	node->next = NULL;
 
 	/*
+	 * we already touched the queueing cacheline; don't bother with pending
+	 * stuff.
+	 *
 	 * trylock || xchg(lock, node)
 	 *
-	 * 0,0 -> 0,1 ; trylock
-	 * p,x -> n,x ; prev = xchg(lock, node)
+	 * 0,0,0 -> 0,0,1 ; trylock
+	 * p,y,x -> n,y,x ; prev = xchg(lock, node)
 	 */
 	for (;;) {
 		new = _Q_LOCKED_VAL;
 		if (val)
-			new = tail | (val & _Q_LOCKED_MASK);
+			new = tail | (val & _Q_LOCKED_PENDING_MASK);
 
 		old = atomic_cmpxchg(&lock->val, val, new);
 		if (old == val)
@@ -145,7 +221,7 @@  void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	/*
 	 * if there was a previous node; link it and wait.
 	 */
-	if (old & ~_Q_LOCKED_MASK) {
+	if (old & ~_Q_LOCKED_PENDING_MASK) {
 		prev = decode_tail(old);
 		ACCESS_ONCE(prev->next) = node;
 
@@ -153,18 +229,19 @@  void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	}
 
 	/*
-	 * we're at the head of the waitqueue, wait for the owner to go away.
+	 * we're at the head of the waitqueue, wait for the owner & pending to
+	 * go away.
 	 *
-	 * *,x -> *,0
+	 * *,x,y -> *,0,0
 	 */
-	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
+	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK)
 		arch_mutex_cpu_relax();
 
 	/*
 	 * claim the lock:
 	 *
-	 * n,0 -> 0,1 : lock, uncontended
-	 * *,0 -> *,1 : lock, contended
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,0,0 -> *,0,1 : lock, contended
 	 */
 	for (;;) {
 		new = _Q_LOCKED_VAL;