[v4,1/2] locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag

Message ID 1526420991-21213-2-git-send-email-longman@redhat.com (mailing list archive)
State New, archived

Commit Message

Waiman Long May 15, 2018, 9:49 p.m. UTC
There are use cases where a rwsem can be acquired by one task, but
released by another task. In these cases, optimistic spinning may need
to be disabled.  One example will be the filesystem freeze/thaw code
where the task that freezes the filesystem will acquire a write lock
on a rwsem and then un-owns it before returning to userspace. Later on,
another task will come along, acquire the ownership, thaw the filesystem
and release the rwsem.

Bit 0 of the owner field was used to designate that it is a reader
owned rwsem. It is now repurposed to mean that the owner of the rwsem
is not known. If only bit 0 is set, the rwsem is reader owned. If bit
0 and other bits are set, it is writer owned with an unknown owner.
One such value for the latter case is (-1L). So we can set owner to 1 for
reader-owned, -1 for writer-owned. The owner is unknown in both cases.
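
The encoding described above can be sketched in userspace C roughly as
follows. This is only an illustration: classify_owner(), owner_spinnable()
and the enum are hypothetical names that do not appear in the patch, and a
plain uintptr_t stands in for the struct task_struct pointer in the owner
field. Only RWSEM_ANONYMOUSLY_OWNED and the values 0, 1 and -1 come from
the changelog.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of the owner word, as defined by the patch. */
#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)

/* Hypothetical names for the four owner states; not part of the patch. */
enum owner_kind {
	OWNER_NONE,		/* 0: lock is free or owner not set yet */
	OWNER_READERS,		/* only bit 0 set */
	OWNER_ANON_WRITER,	/* bit 0 plus other bits, e.g. -1L */
	OWNER_KNOWN_WRITER,	/* an aligned task pointer, bit 0 clear */
};

enum owner_kind classify_owner(uintptr_t owner)
{
	if (!owner)
		return OWNER_NONE;
	if (owner == RWSEM_ANONYMOUSLY_OWNED)
		return OWNER_READERS;
	if (owner & RWSEM_ANONYMOUSLY_OWNED)
		return OWNER_ANON_WRITER;
	return OWNER_KNOWN_WRITER;
}

/* Same test as is_rwsem_owner_spinnable(): spin unless bit 0 is set. */
int owner_spinnable(uintptr_t owner)
{
	return !(owner & RWSEM_ANONYMOUSLY_OWNED);
}
```

With this encoding a single bit test answers "may a waiter spin?", while
telling readers apart from an anonymous writer needs the full comparison.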

To handle transfer of rwsem ownership, the higher level code should
set the owner field to -1 to indicate a write-locked rwsem with unknown
owner.  Optimistic spinning will be disabled in this case.

Once the higher level code figures out who the new owner is, it can then
set the owner field accordingly.
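
A minimal userspace sketch of that hand-off, under stated assumptions:
struct fake_rwsem, fake_disown(), fake_adopt(), fake_can_spin() and
FAKE_OWNER_UNKNOWN are all hypothetical names, and relaxed C11 atomics
stand in for the kernel's READ_ONCE()/WRITE_ONCE(). It only models the
owner field, not the lock count or the wait queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)
#define FAKE_OWNER_UNKNOWN	((uintptr_t)-1L)	/* hypothetical name */

/* Hypothetical model of a rwsem that tracks nothing but its owner. */
struct fake_rwsem {
	atomic_uintptr_t owner;
};

/* Task giving up ownership: mark the lock write-locked, owner unknown. */
void fake_disown(struct fake_rwsem *sem)
{
	atomic_store_explicit(&sem->owner, FAKE_OWNER_UNKNOWN,
			      memory_order_relaxed);
}

/* Task taking over ownership: record itself as the known writer. */
void fake_adopt(struct fake_rwsem *sem, const void *task)
{
	atomic_store_explicit(&sem->owner, (uintptr_t)task,
			      memory_order_relaxed);
}

/* Waiters may spin only while bit 0 of the owner word is clear. */
int fake_can_spin(struct fake_rwsem *sem)
{
	uintptr_t owner = atomic_load_explicit(&sem->owner,
					       memory_order_relaxed);

	return !(owner & RWSEM_ANONYMOUSLY_OWNED);
}
```

Between fake_disown() and fake_adopt() the owner word reads -1, so
fake_can_spin() returns 0 for the whole window.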

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 17 +++++++----------
 kernel/locking/rwsem.c      |  2 --
 kernel/locking/rwsem.h      | 30 +++++++++++++++++++++---------
 3 files changed, 28 insertions(+), 21 deletions(-)

Comments

Oleg Nesterov May 16, 2018, 10:48 a.m. UTC | #1
On 05/15, Waiman Long wrote:
>
> There are use cases where a rwsem can be acquired by one task, but
> released by another task. In these cases, optimistic spinning may need
> to be disabled.  One example will be the filesystem freeze/thaw code

You do not read my emails ;)

Let me repeat once again that in this particular case the writer will
never spin because of owner == NULL. freeze_super() checks SB_UNFROZEN
under sb->s_umount and only then calls sb_wait_write(). IOW, sb_wait_write()
can only be called when this rwsem was already released by the previous
writer.

I am not arguing with this change, percpu_rwsem_release/acquire may have
another user sometime, but the changelog is not accurate.

> +static inline bool is_rwsem_owner_spinnable(struct task_struct *owner)
>  {
> -	return owner && owner != RWSEM_READER_OWNED;
> +	return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED);
>  }

Perhaps you should add __attribute__(aligned) to struct rw_semaphore then...

I don't think it is really needed, but see the comment under struct address_space.

Oleg.
Peter Zijlstra May 16, 2018, 11:59 a.m. UTC | #2
On Wed, May 16, 2018 at 12:48:30PM +0200, Oleg Nesterov wrote:
> > +static inline bool is_rwsem_owner_spinnable(struct task_struct *owner)
> >  {
> > -	return owner && owner != RWSEM_READER_OWNED;
> > +	return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED);
> >  }
> 
> Perhaps you should add __attribute__(aligned) to struct rw_semaphore then...
> 
> I don't think it is really needed, but see the comment under struct address_space.

Luckily we just dropped CRIS support, but yeah, who knows if some other
dodgy arch also doesn't properly align things.

From a quick test, m68k is the only odd one, it seems to align pointers
on 2 bytes.
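
The alignment concern can be illustrated with a small userspace sketch.
It assumes GCC or Clang (for __attribute__((aligned))) and C11; struct
fake_task and pointer_leaves_flag_bit_clear() are hypothetical names, not
kernel code. The point is that bit 0 can only be used as a flag if every
pointer stored in the owner field is at least 2-byte aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)

/*
 * Hypothetical stand-in for a structure whose address is stored in the
 * owner field. With only char members its natural alignment would be 1,
 * so the attribute is what guarantees that bit 0 of its address is clear.
 */
struct fake_task {
	char comm[16];
} __attribute__((aligned(sizeof(long))));

/* Build-time check: 2-byte alignment is the minimum for one flag bit. */
_Static_assert(alignof(struct fake_task) >= 2,
	       "bit 0 of a fake_task pointer must always be clear");

/* Run-time check that a given pointer does not collide with the flag. */
int pointer_leaves_flag_bit_clear(const void *p)
{
	return !((uintptr_t)p & RWSEM_ANONYMOUSLY_OWNED);
}
```

Dropping the attribute from this stand-in makes the static assertion fail,
which is the kind of breakage an explicit alignment annotation guards
against. A 2-byte alignment is still sufficient for a single flag bit.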
Matthew Wilcox May 16, 2018, 12:19 p.m. UTC | #3
On Tue, May 15, 2018 at 05:49:50PM -0400, Waiman Long wrote:
> @@ -357,11 +357,8 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  
>  	rcu_read_lock();
>  	owner = READ_ONCE(sem->owner);
> -	if (!rwsem_owner_is_writer(owner)) {
> -		/*
> -		 * Don't spin if the rwsem is readers owned.
> -		 */
> -		ret = !rwsem_owner_is_reader(owner);
> +	if (!owner || !is_rwsem_owner_spinnable(owner)) {
> +		ret = !owner;	/* !owner is spinnable */
>  		goto done;
>  	}

This is confusingly written.  I think you mean ...

	if (!owner)
		goto done;
	if (!is_rwsem_owner_spinnable(owner)) {
		ret = false;
		goto done;
	}
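
A quick userspace check that the two forms decide the same thing, assuming
(this is not visible in the hunk) that ret is initialised to true before
the test. The enum, spinnable() and both early_exit_*() helpers are
hypothetical names used only for this comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)

/* Hypothetical outcome of the early-exit test in rwsem_can_spin_on_owner(). */
enum early_exit {
	DONE_TRUE,	/* goto done with ret == true */
	DONE_FALSE,	/* goto done with ret == false */
	FALL_THROUGH,	/* continue to the owner->on_cpu check */
};

static bool spinnable(uintptr_t owner)
{
	return !(owner & RWSEM_ANONYMOUSLY_OWNED);
}

/* Combined condition as written in the v4 patch. */
enum early_exit early_exit_v4(uintptr_t owner)
{
	if (!owner || !spinnable(owner))
		return !owner ? DONE_TRUE : DONE_FALSE;
	return FALL_THROUGH;
}

/* Split form suggested in the review. */
enum early_exit early_exit_split(uintptr_t owner)
{
	if (!owner)
		return DONE_TRUE;	/* ret keeps its initial value */
	if (!spinnable(owner))
		return DONE_FALSE;
	return FALL_THROUGH;
}
```

Both helpers return the same result for a NULL owner, a reader-owned
value (1), an anonymous writer (-1) and an ordinary aligned pointer, so
the split form changes only readability.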
Waiman Long May 16, 2018, 1:11 p.m. UTC | #4
On 05/16/2018 06:48 AM, Oleg Nesterov wrote:
> On 05/15, Waiman Long wrote:
>> There are use cases where a rwsem can be acquired by one task, but
>> released by another task. In these cases, optimistic spinning may need
>> to be disabled.  One example will be the filesystem freeze/thaw code
> You do not read my emails ;)
>
> Let me repeat once again that in this particular case the writer will
> never spin because of owner == NULL. freeze_super() checks SB_UNFROZEN
> under sb->s_umount and only then calls sb_wait_write(). IOW, sb_wait_write()
> can only be called when this rwsem was already released by the previous
> writer.
>
> I am not arguing with this change, percpu_rwsem_release/acquire may have
> another user sometime, but the changelog is not accurate.

I know the change may not be necessary in this particular case, but it
is a correctness issue. Optimistic spinning should be disabled when the
exact time delay between percpu_rwsem_release() and
percpu_rwsem_acquire() is indeterminate even though no one is supposed
to spin on the rwsem during that time.

If we don't do that now, we may forget this issue when some other use
cases show up or we extend rwsem to do reader optimistic spinning, for
instance. So it is better to address that now than debugging the same
issue again in the future.

Cheers,
Longman
Oleg Nesterov May 16, 2018, 3:27 p.m. UTC | #5
On 05/16, Waiman Long wrote:
>
> On 05/16/2018 06:48 AM, Oleg Nesterov wrote:
> > On 05/15, Waiman Long wrote:
> >> There are use cases where a rwsem can be acquired by one task, but
> >> released by another task. In these cases, optimistic spinning may need
> >> to be disabled.  One example will be the filesystem freeze/thaw code
> > You do not read my emails ;)
> >
> > Let me repeat once again that in this particular case the writer will
> > never spin because of owner == NULL. freeze_super() checks SB_UNFROZEN
> > under sb->s_umount and only then calls sb_wait_write(). IOW, sb_wait_write()
> > can only be called when this rwsem was already released by the previous
> > writer.
> >
> > I am not arguing with this change, percpu_rwsem_release/acquire may have
> > another user sometime, but the changelog is not accurate.
>
> I know the change may not be necessary in this particular case, but it
> is a correctness issue.

Really? I mean, performance-wise the unnecessary spinning is obviously bad,
but why is it a correctness issue?

And how does this differ from the case when down_write() is preempted right
before rwsem_set_owner()?

> Optimistic spinning should be disabled when the
> exact time delay between percpu_rwsem_release() and
> percpu_rwsem_acquire() is indeterminate even though no one is supposed
> to spin on the rwsem during that time.
>
> If we don't do that now, we may forget this issue when

See above, I never argued with this change. Just the changelog looks as if
we already have this issue in freeze/thaw code, this is not true.

Oleg.
Ingo Molnar May 18, 2018, 7:02 a.m. UTC | #6
* Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 15, 2018 at 05:49:50PM -0400, Waiman Long wrote:
> > @@ -357,11 +357,8 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> >  
> >  	rcu_read_lock();
> >  	owner = READ_ONCE(sem->owner);
> > -	if (!rwsem_owner_is_writer(owner)) {
> > -		/*
> > -		 * Don't spin if the rwsem is readers owned.
> > -		 */
> > -		ret = !rwsem_owner_is_reader(owner);
> > +	if (!owner || !is_rwsem_owner_spinnable(owner)) {
> > +		ret = !owner;	/* !owner is spinnable */
> >  		goto done;
> >  	}
> 
> This is confusingly written.  I think you mean ...
> 
> 	if (!owner)
> 		goto done;
> 	if (!is_rwsem_owner_spinnable(owner)) {
> 		ret = false;
> 		goto done;
> 	}

Yes, that's cleaner. Waiman, mind sending a followup patch that cleans this up?

Thanks,

	Ingo
Oleg Nesterov May 18, 2018, 8:41 a.m. UTC | #7
On 05/18, Ingo Molnar wrote:
>
>
> * Matthew Wilcox <willy@infradead.org> wrote:
>
> > This is confusingly written.  I think you mean ...
> >
> > 	if (!owner)
> > 		goto done;
> > 	if (!is_rwsem_owner_spinnable(owner)) {
> > 		ret = false;
> > 		goto done;
> > 	}
>
> Yes, that's cleaner. Waiman, mind sending a followup patch that cleans this up?

Or simply

	static inline bool owner_on_cpu(struct task_struct *owner)
	{
		return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
	}

	static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
	{
		struct task_struct *owner;
		bool ret = true;

		if (need_resched())
			return false;

		rcu_read_lock();
		owner = READ_ONCE(sem->owner);
		if (owner) {
			ret = is_rwsem_owner_spinnable(owner) &&
			      owner_on_cpu(owner);
		}
		rcu_read_unlock();
		return ret;
	}

note that rwsem_spin_on_owner() can use the new owner_on_cpu() helper too,

		if (need_resched() || !owner_on_cpu(owner)) {
			rcu_read_unlock();
			return false;
		}

looks a bit better than the current code:

		if (!owner->on_cpu || need_resched() ||
				vcpu_is_preempted(task_cpu(owner))) {
			rcu_read_unlock();
			return false;
		}

Oleg.
Ingo Molnar May 18, 2018, 9:40 a.m. UTC | #8
* Oleg Nesterov <oleg@redhat.com> wrote:

> On 05/18, Ingo Molnar wrote:
> >
> >
> > * Matthew Wilcox <willy@infradead.org> wrote:
> >
> > > This is confusingly written.  I think you mean ...
> > >
> > > 	if (!owner)
> > > 		goto done;
> > > 	if (!is_rwsem_owner_spinnable(owner)) {
> > > 		ret = false;
> > > 		goto done;
> > > 	}
> >
> > Yes, that's cleaner. Waiman, mind sending a followup patch that cleans this up?
> 
> Or simply
> 
> 	static inline bool owner_on_cpu(struct task_struct *owner)
> 	{
> 		return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
> 	}
> 
> 	static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> 	{
> 		struct task_struct *owner;
> 		bool ret = true;
> 
> 		if (need_resched())
> 			return false;
> 
> 		rcu_read_lock();
> 		owner = READ_ONCE(sem->owner);
> 		if (owner) {
> 			ret = is_rwsem_owner_spinnable(owner) &&
> 			      owner_on_cpu(owner);
> 		}
> 		rcu_read_unlock();
> 		return ret;
> 	}
> 
> note that rwsem_spin_on_owner() can use the new owner_on_cpu() helper too,
> 
> 		if (need_resched() || !owner_on_cpu(owner)) {
> 			rcu_read_unlock();
> 			return false;
> 		}
> 
> looks a bit better than the current code:
> 
> 		if (!owner->on_cpu || need_resched() ||
> 				vcpu_is_preempted(task_cpu(owner))) {
> 			rcu_read_unlock();
> 			return false;
> 		}
> 
> Oleg.

That looks good to me too - mind sending a patch on top of latest -tip?

Thanks,

	Ingo
Patch

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e795908..604d247 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -357,11 +357,8 @@  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!rwsem_owner_is_writer(owner)) {
-		/*
-		 * Don't spin if the rwsem is readers owned.
-		 */
-		ret = !rwsem_owner_is_reader(owner);
+	if (!owner || !is_rwsem_owner_spinnable(owner)) {
+		ret = !owner;	/* !owner is spinnable */
 		goto done;
 	}
 
@@ -382,11 +379,11 @@  static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner = READ_ONCE(sem->owner);
 
-	if (!rwsem_owner_is_writer(owner))
-		goto out;
+	if (!is_rwsem_owner_spinnable(owner))
+		return false;
 
 	rcu_read_lock();
-	while (sem->owner == owner) {
+	while (owner && (READ_ONCE(sem->owner) == owner)) {
 		/*
 		 * Ensure we emit the owner->on_cpu, dereference _after_
 		 * checking sem->owner still matches owner, if that fails,
@@ -408,12 +405,12 @@  static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		cpu_relax();
 	}
 	rcu_read_unlock();
-out:
+
 	/*
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
 	 */
-	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
+	return is_rwsem_owner_spinnable(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 30465a2..bc1e507 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -221,5 +221,3 @@  void up_read_non_owner(struct rw_semaphore *sem)
 EXPORT_SYMBOL(up_read_non_owner);
 
 #endif
-
-
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a17cba8..b9d0e72 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,20 +1,24 @@ 
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
  * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
+ * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
  * the owner field when it unlocks. A reader, on the other hand, will
  * not touch the owner field when it unlocks.
  *
- * In essence, the owner field now has the following 3 states:
+ * In essence, the owner field now has the following 4 states:
  *  1) 0
  *     - lock is free or the owner hasn't set the field yet
  *  2) RWSEM_READER_OWNED
  *     - lock is currently or previously owned by readers (lock is free
  *       or not set by owner yet)
- *  3) Other non-zero value
- *     - a writer owns the lock
+ *  3) RWSEM_ANONYMOUSLY_OWNED bit set with some other bits set as well
+ *     - lock is owned by an anonymous writer, so spinning on the lock
+ *       owner should be disabled.
+ *  4) Other non-zero value
+ *     - a writer owns the lock and other writers can spin on the lock owner.
  */
-#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)
+#define RWSEM_READER_OWNED	((struct task_struct *)RWSEM_ANONYMOUSLY_OWNED)
 
 #ifdef CONFIG_DEBUG_RWSEMS
 # define DEBUG_RWSEMS_WARN_ON(c)	DEBUG_LOCKS_WARN_ON(c)
@@ -51,14 +55,22 @@  static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
 }
 
-static inline bool rwsem_owner_is_writer(struct task_struct *owner)
+/*
+ * Return true if a rwsem waiter can spin on the rwsem's owner
+ * and steal the lock, i.e. the lock is not anonymously owned.
+ * N.B. !owner is considered spinnable.
+ */
+static inline bool is_rwsem_owner_spinnable(struct task_struct *owner)
 {
-	return owner && owner != RWSEM_READER_OWNED;
+	return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED);
 }
 
-static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+/*
+ * Return true if rwsem is owned by an anonymous writer or readers.
+ */
+static inline bool rwsem_has_anonymous_owner(struct task_struct *owner)
 {
-	return owner == RWSEM_READER_OWNED;
+	return (unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED;
 }
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)