[RFC,05/28] sched: Add cond_resched_rwlock

Message ID 20190926231824.149014-6-bgardon@google.com (mailing list archive)
State New, archived
Series kvm: mmu: Rework the x86 TDP direct mapped case

Commit Message

Ben Gardon Sept. 26, 2019, 11:18 p.m. UTC
Rescheduling while holding a spin lock is essential to keeping long-running
kernel operations running smoothly. Add the facility to cond_resched while
holding read/write spin locks.

RFC_NOTE: The current implementation of this patch set uses a read/write
lock to replace the existing MMU spin lock. See the next patch in this
series for more on why a read/write lock was chosen, and possible
alternatives.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 include/linux/sched.h | 11 +++++++++++
 kernel/sched/core.c   | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)
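
For illustration only (this sketch is not part of the patch): with kvm->mmu_lock
converted to an rwlock_t, as later patches in this series propose, a long-running
MMU operation could yield under the write lock roughly as below. The function
name and the zap step are hypothetical placeholders.

/* Illustrative sketch only; kvm_zap_gfn_range_example() is a made-up name. */
static void kvm_zap_gfn_range_example(struct kvm *kvm, gfn_t start, gfn_t end)
{
        gfn_t gfn;

        write_lock(&kvm->mmu_lock);
        for (gfn = start; gfn < end; gfn++) {
                /* ... zap the TDP mapping for gfn ... */

                /*
                 * Drop the lock and reschedule if needed.  A nonzero
                 * return means the lock was released, so any state
                 * cached while holding it must be revalidated.
                 */
                if (cond_resched_rwlock_write(&kvm->mmu_lock)) {
                        /* e.g. restart the walk from gfn */
                }
        }
        write_unlock(&kvm->mmu_lock);
}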

Comments

Sean Christopherson Nov. 27, 2019, 6:42 p.m. UTC | #1
On Thu, Sep 26, 2019 at 04:18:01PM -0700, Ben Gardon wrote:
> Rescheduling while holding a spin lock is essential to keeping long-running
> kernel operations running smoothly. Add the facility to cond_resched while
> holding read/write spin locks.
> 
> RFC_NOTE: The current implementation of this patch set uses a read/write
> lock to replace the existing MMU spin lock. See the next patch in this
> series for more on why a read/write lock was chosen, and possible
> alternatives.

This definitely needs to be run by the sched/locking folks sooner rather
than later.

> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>  include/linux/sched.h | 11 +++++++++++
>  kernel/sched/core.c   | 23 +++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 70db597d6fd4f..4d1fd96693d9b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1767,12 +1767,23 @@ static inline int _cond_resched(void) { return 0; }
>  })
>  
>  extern int __cond_resched_lock(spinlock_t *lock);
> +extern int __cond_resched_rwlock(rwlock_t *lock, bool write_lock);
>  
>  #define cond_resched_lock(lock) ({				\
>  	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
>  	__cond_resched_lock(lock);				\
>  })
>  
> +#define cond_resched_rwlock_read(lock) ({			\
> +	__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);	\
> +	__cond_resched_rwlock(lock, false);			\
> +})
> +
> +#define cond_resched_rwlock_write(lock) ({			\
> +	__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);	\
> +	__cond_resched_rwlock(lock, true);			\
> +})
> +
>  static inline void cond_resched_rcu(void)
>  {
>  #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f9a1346a5fa95..ba7ed4bed5036 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5663,6 +5663,29 @@ int __cond_resched_lock(spinlock_t *lock)
>  }
>  EXPORT_SYMBOL(__cond_resched_lock);
>  
> +int __cond_resched_rwlock(rwlock_t *lock, bool write_lock)
> +{
> +	int ret = 0;
> +
> +	lockdep_assert_held(lock);
> +	if (should_resched(PREEMPT_LOCK_OFFSET)) {
> +		if (write_lock) {

The existing __cond_resched_lock() checks for resched *or* lock
contention.  Is lock contention not something that needs (or can't?) be
considered?

> +			write_unlock(lock);
> +			preempt_schedule_common();
> +			write_lock(lock);
> +		} else {
> +			read_unlock(lock);
> +			preempt_schedule_common();
> +			read_lock(lock);
> +		}
> +
> +		ret = 1;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(__cond_resched_rwlock);
> +
>  /**
>   * yield - yield the current processor to other threads.
>   *
> -- 
> 2.23.0.444.g18eeb5a265-goog
>
Ben Gardon Dec. 6, 2019, 8:12 p.m. UTC | #2
Lock contention should definitely be considered. It was an oversight
on my part to not have a check for that.

On Wed, Nov 27, 2019 at 10:42 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> [...]
>
> > +     lockdep_assert_held(lock);
> > +     if (should_resched(PREEMPT_LOCK_OFFSET)) {
> > +             if (write_lock) {
>
> The existing __cond_resched_lock() checks for resched *or* lock
> contention.  Is lock contention not something that needs (or can't?) be
> considered?
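
For reference, __cond_resched_lock() keys off spin_needbreak() in addition to
should_resched(), so a rwlock variant would want an analogous query. Below is a
hedged sketch of how contention could be folded in, assuming a hypothetical
rwlock_needbreak() helper that reports whether another CPU is waiting on the
lock; no such helper exists in this series, so it would have to be added
alongside this change.

int __cond_resched_rwlock(rwlock_t *lock, bool write_lock)
{
        /* rwlock_needbreak() is hypothetical: "is someone waiting on this lock?" */
        int resched = should_resched(PREEMPT_LOCK_OFFSET);
        int ret = 0;

        lockdep_assert_held(lock);

        if (rwlock_needbreak(lock) || resched) {
                if (write_lock) {
                        write_unlock(lock);
                        if (resched)
                                preempt_schedule_common();
                        else
                                cpu_relax();
                        write_lock(lock);
                } else {
                        read_unlock(lock);
                        if (resched)
                                preempt_schedule_common();
                        else
                                cpu_relax();
                        read_lock(lock);
                }
                ret = 1;
        }
        return ret;
}

This mirrors the structure of the existing __cond_resched_lock(), which drops
the lock on contention even when no reschedule is pending and briefly calls
cpu_relax() to give the waiter a chance to take it.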

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 70db597d6fd4f..4d1fd96693d9b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1767,12 +1767,23 @@  static inline int _cond_resched(void) { return 0; }
 })
 
 extern int __cond_resched_lock(spinlock_t *lock);
+extern int __cond_resched_rwlock(rwlock_t *lock, bool write_lock);
 
 #define cond_resched_lock(lock) ({				\
 	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
 	__cond_resched_lock(lock);				\
 })
 
+#define cond_resched_rwlock_read(lock) ({			\
+	__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);	\
+	__cond_resched_rwlock(lock, false);			\
+})
+
+#define cond_resched_rwlock_write(lock) ({			\
+	__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);	\
+	__cond_resched_rwlock(lock, true);			\
+})
+
 static inline void cond_resched_rcu(void)
 {
 #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9a1346a5fa95..ba7ed4bed5036 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5663,6 +5663,29 @@  int __cond_resched_lock(spinlock_t *lock)
 }
 EXPORT_SYMBOL(__cond_resched_lock);
 
+int __cond_resched_rwlock(rwlock_t *lock, bool write_lock)
+{
+	int ret = 0;
+
+	lockdep_assert_held(lock);
+	if (should_resched(PREEMPT_LOCK_OFFSET)) {
+		if (write_lock) {
+			write_unlock(lock);
+			preempt_schedule_common();
+			write_lock(lock);
+		} else {
+			read_unlock(lock);
+			preempt_schedule_common();
+			read_lock(lock);
+		}
+
+		ret = 1;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(__cond_resched_rwlock);
+
 /**
  * yield - yield the current processor to other threads.
  *
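
A complementary read-side sketch (again illustrative, not part of the series):
a long walk that only reads MMU state could hold the lock for read and still
yield, assuming the same rwlock_t conversion of kvm->mmu_lock. The walker
function and its body are placeholders.

/* Illustrative only; walk_mappings_example() and its body are placeholders. */
static void walk_mappings_example(struct kvm *kvm, gfn_t start, gfn_t end)
{
        gfn_t gfn;

        read_lock(&kvm->mmu_lock);
        for (gfn = start; gfn < end; gfn++) {
                /* ... inspect the mapping for gfn, no modification ... */

                /*
                 * Yield if needed.  If the lock was dropped, cached
                 * MMU state is stale and the walk must re-read or
                 * restart.
                 */
                if (cond_resched_rwlock_read(&kvm->mmu_lock)) {
                        /* revalidate or restart as appropriate */
                }
        }
        read_unlock(&kvm->mmu_lock);
}

One design consequence worth noting: a reader that drops and reacquires the
lock for read gives a contending writer a chance to make progress, while other
readers that never yield continue to run concurrently the whole time.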