diff mbox

[PATCHv3,1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set

Message ID 1509999980-10828-1-git-send-email-eduval@amazon.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eduardo Valentin Nov. 6, 2017, 8:26 p.m. UTC
Currently, the existing qspinlock implementation will fallback to
test-and-set if the hypervisor has not set the PV_UNHALT flag.

This patch gives the opportunity to guest kernels to select
between test-and-set and the regular queueu fair lock implementation
based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
flag is not set, the code will still fall back to test-and-set,
but when the PV_DEDICATED flag is set, the code will use
the regular queue spinlock implementation.

With this patch, when in autoselect mode, the guest will
use the default spinlock implementation based on host feature
flags as follows:

PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
PV_DEDICATED = 0, PV_UNHALT = 0: default is tas

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Jan H. Schoenherr <jschoenh@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Suggested-by: Matt Wilson <msw@amazon.com>
Signed-off-by: Eduardo Valentin <eduval@amazon.com>
---
V3:
 - When PV_DEDICATED is set (1), qspinlock is selected,
   regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini. 
 - Refreshed on top of tip/master.
V2:
 - rebase on top of tip/master

 Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
 arch/x86/include/asm/qspinlock.h     | 4 ++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 2 ++
 4 files changed, 13 insertions(+)

Comments

Paolo Bonzini Nov. 7, 2017, 12:23 p.m. UTC | #1
On 06/11/2017 21:26, Eduardo Valentin wrote:
> Currently, the existing qspinlock implementation will fallback to
> test-and-set if the hypervisor has not set the PV_UNHALT flag.
> 
> This patch gives the opportunity to guest kernels to select
> between test-and-set and the regular queueu fair lock implementation
> based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> flag is not set, the code will still fall back to test-and-set,
> but when the PV_DEDICATED flag is set, the code will use
> the regular queue spinlock implementation.
> 
> With this patch, when in autoselect mode, the guest will
> use the default spinlock implementation based on host feature
> flags as follows:
> 
> PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> PV_DEDICATED = 0, PV_UNHALT = 0: default is tas

Hi Eduardo,

besides the suggestion to use a separate word than the one for features,
is this still needed after Waiman's patch to adaptively switch between
tas and pvqspinlock?

Paolo

> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Suggested-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
> V3:
>  - When PV_DEDICATED is set (1), qspinlock is selected,
>    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini. 
>  - Refreshed on top of tip/master.
> V2:
>  - rebase on top of tip/master
> 
>  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
>  arch/x86/include/asm/qspinlock.h     | 4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kernel/kvm.c                | 2 ++
>  4 files changed, 13 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..117066a 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
>                                     ||       || before enabling paravirtualized
>                                     ||       || spinlock support.
>  ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
> +                                   ||       || to determine if they run on
> +                                   ||       || dedicated vCPUs, allowing opti-
> +                                   ||       || mizations such as usage of
> +                                   ||       || qspinlocks.
> +------------------------------------------------------------------------------
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>                                     ||       || per-cpu warps are expected in
>                                     ||       || kvmclock.
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 5e16b5d..de42694 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -3,6 +3,8 @@
>  #define _ASM_X86_QSPINLOCK_H
>  
>  #include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +
>  #include <asm/cpufeature.h>
>  #include <asm-generic/qspinlock_types.h>
>  #include <asm/paravirt.h>
> @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>  	if (!static_branch_likely(&virt_spin_lock_key))
>  		return false;
>  
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> +		return false;
>  	/*
>  	 * On hypervisors without PARAVIRT_SPINLOCKS support we fall
>  	 * back to a Test-and-Set spinlock, because fair locks have
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 554aa8f..85a9875 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -25,6 +25,7 @@
>  #define KVM_FEATURE_STEAL_TIME		5
>  #define KVM_FEATURE_PV_EOI		6
>  #define KVM_FEATURE_PV_UNHALT		7
> +#define KVM_FEATURE_PV_DEDICATED	8
>  
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8bb9594..dacd7cf 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
>  {
>  	if (!kvm_para_available())
>  		return;
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> +		return;
>  	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>  	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>  		return;
>
Eduardo Valentin Nov. 7, 2017, 12:39 p.m. UTC | #2
On Tue, Nov 07, 2017 at 01:23:56PM +0100, Paolo Bonzini wrote:
> On 06/11/2017 21:26, Eduardo Valentin wrote:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> > 
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> > 
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> > 
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> 
> Hi Eduardo,
> 
> besides the suggestion to use a separate word than the one for features,

Ok. I will take a look.

> is this still needed after Waiman's patch to adaptively switch between
> tas and pvqspinlock?

Can you please point me to it ? Is it already in tip/master?

> 
> Paolo
> 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> > V3:
> >  - When PV_DEDICATED is set (1), qspinlock is selected,
> >    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini. 
> >  - Refreshed on top of tip/master.
> > V2:
> >  - rebase on top of tip/master
> > 
> >  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
> >  arch/x86/include/asm/qspinlock.h     | 4 ++++
> >  arch/x86/include/uapi/asm/kvm_para.h | 1 +
> >  arch/x86/kernel/kvm.c                | 2 ++
> >  4 files changed, 13 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> > index 3c65feb..117066a 100644
> > --- a/Documentation/virtual/kvm/cpuid.txt
> > +++ b/Documentation/virtual/kvm/cpuid.txt
> > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
> >                                     ||       || before enabling paravirtualized
> >                                     ||       || spinlock support.
> >  ------------------------------------------------------------------------------
> > +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
> > +                                   ||       || to determine if they run on
> > +                                   ||       || dedicated vCPUs, allowing opti-
> > +                                   ||       || mizations such as usage of
> > +                                   ||       || qspinlocks.
> > +------------------------------------------------------------------------------
> >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
> >                                     ||       || per-cpu warps are expected in
> >                                     ||       || kvmclock.
> > diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> > index 5e16b5d..de42694 100644
> > --- a/arch/x86/include/asm/qspinlock.h
> > +++ b/arch/x86/include/asm/qspinlock.h
> > @@ -3,6 +3,8 @@
> >  #define _ASM_X86_QSPINLOCK_H
> >  
> >  #include <linux/jump_label.h>
> > +#include <linux/kvm_para.h>
> > +
> >  #include <asm/cpufeature.h>
> >  #include <asm-generic/qspinlock_types.h>
> >  #include <asm/paravirt.h>
> > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> >  	if (!static_branch_likely(&virt_spin_lock_key))
> >  		return false;
> >  
> > +	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > +		return false;
> >  	/*
> >  	 * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> >  	 * back to a Test-and-Set spinlock, because fair locks have
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 554aa8f..85a9875 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -25,6 +25,7 @@
> >  #define KVM_FEATURE_STEAL_TIME		5
> >  #define KVM_FEATURE_PV_EOI		6
> >  #define KVM_FEATURE_PV_UNHALT		7
> > +#define KVM_FEATURE_PV_DEDICATED	8
> >  
> >  /* The last 8 bits are used to indicate how to interpret the flags field
> >   * in pvclock structure. If no bits are set, all flags are ignored.
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8bb9594..dacd7cf 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> >  {
> >  	if (!kvm_para_available())
> >  		return;
> > +	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > +		return;
> >  	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> >  	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> >  		return;
> > 
> 
>
Paolo Bonzini Nov. 7, 2017, 12:43 p.m. UTC | #3
On 07/11/2017 13:39, Eduardo Valentin wrote:
>> is this still needed after Waiman's patch to adaptively switch between
>> tas and pvqspinlock?
> Can you please point me to it ? Is it already in tip/master?
> 

No, he just posted it:

https://marc.info/?l=linux-kernel&m=150972337909996&w=2

Paolo
Eduardo Valentin Nov. 8, 2017, 8:45 a.m. UTC | #4
Paolo,

On Tue, Nov 07, 2017 at 01:43:15PM +0100, Paolo Bonzini wrote:
> On 07/11/2017 13:39, Eduardo Valentin wrote:
> >> is this still needed after Waiman's patch to adaptively switch between
> >> tas and pvqspinlock?
> > Can you please point me to it ? Is it already in tip/master?
> > 
> 
> No, he just posted it:
> 
> https://marc.info/?l=linux-kernel&m=150972337909996&w=2

OK, Thanks. I've reviewed his V2. I think the patch to have PV_DEDICATED is still interesting to have,
for the case the guest is in autoselect mode and honors what the host gives as hints.

> 
> Paolo
Radim Krčmář Nov. 8, 2017, 5:36 p.m. UTC | #5
2017-11-06 12:26-0800, Eduardo Valentin:
> Currently, the existing qspinlock implementation will fallback to
> test-and-set if the hypervisor has not set the PV_UNHALT flag.
> 
> This patch gives the opportunity to guest kernels to select
> between test-and-set and the regular queueu fair lock implementation
> based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> flag is not set, the code will still fall back to test-and-set,
> but when the PV_DEDICATED flag is set, the code will use
> the regular queue spinlock implementation.
> 
> With this patch, when in autoselect mode, the guest will
> use the default spinlock implementation based on host feature
> flags as follows:
> 
> PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Suggested-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
> V3:
>  - When PV_DEDICATED is set (1), qspinlock is selected,
>    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini. 
>  - Refreshed on top of tip/master.
> V2:
>  - rebase on top of tip/master
> 
>  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
>  arch/x86/include/asm/qspinlock.h     | 4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kernel/kvm.c                | 2 ++
>  4 files changed, 13 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..117066a 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
>                                     ||       || before enabling paravirtualized
>                                     ||       || spinlock support.
>  ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
> +                                   ||       || to determine if they run on
> +                                   ||       || dedicated vCPUs, allowing opti-
> +                                   ||       || mizations such as usage of
> +                                   ||       || qspinlocks.
> +------------------------------------------------------------------------------
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>                                     ||       || per-cpu warps are expected in
>                                     ||       || kvmclock.
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 5e16b5d..de42694 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -3,6 +3,8 @@
>  #define _ASM_X86_QSPINLOCK_H
>  
>  #include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +
>  #include <asm/cpufeature.h>
>  #include <asm-generic/qspinlock_types.h>
>  #include <asm/paravirt.h>
> @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>  	if (!static_branch_likely(&virt_spin_lock_key))
>  		return false;
>  
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> +		return false;

Hm, every spinlock slowpath calls cpuid, which causes a VM exit, so I
wouldn't expect it to be faster than the existing implementations.
(Using the static key would be better.)

How does this patch perform compared to user-forced qspinlock and hybrid
pvqspinlock?

Thanks.
Eduardo Valentin Nov. 9, 2017, 8:55 a.m. UTC | #6
Hello,

On Wed, Nov 08, 2017 at 06:36:52PM +0100, Radim Krčmář wrote:
> 2017-11-06 12:26-0800, Eduardo Valentin:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> > 
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> > 
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> > 
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> > 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> > V3:
> >  - When PV_DEDICATED is set (1), qspinlock is selected,
> >    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini. 
> >  - Refreshed on top of tip/master.
> > V2:
> >  - rebase on top of tip/master
> > 
> >  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
> >  arch/x86/include/asm/qspinlock.h     | 4 ++++
> >  arch/x86/include/uapi/asm/kvm_para.h | 1 +
> >  arch/x86/kernel/kvm.c                | 2 ++
> >  4 files changed, 13 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> > index 3c65feb..117066a 100644
> > --- a/Documentation/virtual/kvm/cpuid.txt
> > +++ b/Documentation/virtual/kvm/cpuid.txt
> > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
> >                                     ||       || before enabling paravirtualized
> >                                     ||       || spinlock support.
> >  ------------------------------------------------------------------------------
> > +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
> > +                                   ||       || to determine if they run on
> > +                                   ||       || dedicated vCPUs, allowing opti-
> > +                                   ||       || mizations such as usage of
> > +                                   ||       || qspinlocks.
> > +------------------------------------------------------------------------------
> >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
> >                                     ||       || per-cpu warps are expected in
> >                                     ||       || kvmclock.
> > diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> > index 5e16b5d..de42694 100644
> > --- a/arch/x86/include/asm/qspinlock.h
> > +++ b/arch/x86/include/asm/qspinlock.h
> > @@ -3,6 +3,8 @@
> >  #define _ASM_X86_QSPINLOCK_H
> >  
> >  #include <linux/jump_label.h>
> > +#include <linux/kvm_para.h>
> > +
> >  #include <asm/cpufeature.h>
> >  #include <asm-generic/qspinlock_types.h>
> >  #include <asm/paravirt.h>
> > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> >  	if (!static_branch_likely(&virt_spin_lock_key))
> >  		return false;
> >  
> > +	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > +		return false;
> 
> Hm, every spinlock slowpath calls cpuid, which causes a VM exit, so I
> wouldn't expect it to be faster than the existing implementations.
> (Using the static key would be better.)
> 
> How does this patch perform compared to user-forced qspinlock and hybrid
> pvqspinlock?

This patch should have same effect as user-forced qspinlock. However, the key aspect
here is this patch gives a way for the host to instruct the guest to use qspinlock.
Even with Longman's patch which allows guest to select the spinlock implementation,
there should still be the auto-select mode. In such mode, PV_DEDICATED should
allow the host to get the guest to use qspinlock, without, the guest will fallback
to tas when PV_UNHALT == 0.

> 
> Thanks.
>
Wanpeng Li Nov. 9, 2017, 12:43 p.m. UTC | #7
2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> Currently, the existing qspinlock implementation will fallback to
> test-and-set if the hypervisor has not set the PV_UNHALT flag.
>
> This patch gives the opportunity to guest kernels to select
> between test-and-set and the regular queueu fair lock implementation
> based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> flag is not set, the code will still fall back to test-and-set,
> but when the PV_DEDICATED flag is set, the code will use
> the regular queue spinlock implementation.
>
> With this patch, when in autoselect mode, the guest will
> use the default spinlock implementation based on host feature
> flags as follows:
>
> PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Suggested-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
> V3:
>  - When PV_DEDICATED is set (1), qspinlock is selected,
>    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
>  - Refreshed on top of tip/master.
> V2:
>  - rebase on top of tip/master
>
>  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
>  arch/x86/include/asm/qspinlock.h     | 4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kernel/kvm.c                | 2 ++
>  4 files changed, 13 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..117066a 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
>                                     ||       || before enabling paravirtualized
>                                     ||       || spinlock support.
>  ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
> +                                   ||       || to determine if they run on
> +                                   ||       || dedicated vCPUs, allowing opti-
> +                                   ||       || mizations such as usage of
> +                                   ||       || qspinlocks.
> +------------------------------------------------------------------------------
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>                                     ||       || per-cpu warps are expected in
>                                     ||       || kvmclock.
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 5e16b5d..de42694 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -3,6 +3,8 @@
>  #define _ASM_X86_QSPINLOCK_H
>
>  #include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +
>  #include <asm/cpufeature.h>
>  #include <asm-generic/qspinlock_types.h>
>  #include <asm/paravirt.h>
> @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>         if (!static_branch_likely(&virt_spin_lock_key))
>                 return false;
>
> +       if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> +               return false;
>         /*
>          * On hypervisors without PARAVIRT_SPINLOCKS support we fall
>          * back to a Test-and-Set spinlock, because fair locks have
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 554aa8f..85a9875 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -25,6 +25,7 @@
>  #define KVM_FEATURE_STEAL_TIME         5
>  #define KVM_FEATURE_PV_EOI             6
>  #define KVM_FEATURE_PV_UNHALT          7
> +#define KVM_FEATURE_PV_DEDICATED       8
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8bb9594..dacd7cf 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
>  {
>         if (!kvm_para_available())
>                 return;
> +       if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> +               return;
>         /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>         if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>                 return;
> --
> 2.7.4
>

You should also add a cpuid flag in kvm part.

Regards,
Wanpeng Li
Pankaj Gupta Nov. 9, 2017, 3:53 p.m. UTC | #8
> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> >
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> >
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> >
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> > V3:
> >  - When PV_DEDICATED is set (1), qspinlock is selected,
> >    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> >  - Refreshed on top of tip/master.
> > V2:
> >  - rebase on top of tip/master
> >
> >  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
> >  arch/x86/include/asm/qspinlock.h     | 4 ++++
> >  arch/x86/include/uapi/asm/kvm_para.h | 1 +
> >  arch/x86/kernel/kvm.c                | 2 ++
> >  4 files changed, 13 insertions(+)
> >
> > diff --git a/Documentation/virtual/kvm/cpuid.txt
> > b/Documentation/virtual/kvm/cpuid.txt
> > index 3c65feb..117066a 100644
> > --- a/Documentation/virtual/kvm/cpuid.txt
> > +++ b/Documentation/virtual/kvm/cpuid.txt
> > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest
> > checks this feature bit
> >                                     ||       || before enabling
> >                                     ||       || paravirtualized
> >                                     ||       || spinlock support.
> >  ------------------------------------------------------------------------------
> > +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature
> > bit
> > +                                   ||       || to determine if they run on
> > +                                   ||       || dedicated vCPUs, allowing
> > opti-
> > +                                   ||       || mizations such as usage of
> > +                                   ||       || qspinlocks.
> > +------------------------------------------------------------------------------
> >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no
> >  guest-side
> >                                     ||       || per-cpu warps are expected
> >                                     ||       || in
> >                                     ||       || kvmclock.
> > diff --git a/arch/x86/include/asm/qspinlock.h
> > b/arch/x86/include/asm/qspinlock.h
> > index 5e16b5d..de42694 100644
> > --- a/arch/x86/include/asm/qspinlock.h
> > +++ b/arch/x86/include/asm/qspinlock.h
> > @@ -3,6 +3,8 @@
> >  #define _ASM_X86_QSPINLOCK_H
> >
> >  #include <linux/jump_label.h>
> > +#include <linux/kvm_para.h>
> > +
> >  #include <asm/cpufeature.h>
> >  #include <asm-generic/qspinlock_types.h>
> >  #include <asm/paravirt.h>
> > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> >         if (!static_branch_likely(&virt_spin_lock_key))
> >                 return false;
> >
> > +       if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > +               return false;
> >         /*
> >          * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> >          * back to a Test-and-Set spinlock, because fair locks have
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h
> > b/arch/x86/include/uapi/asm/kvm_para.h
> > index 554aa8f..85a9875 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -25,6 +25,7 @@
> >  #define KVM_FEATURE_STEAL_TIME         5
> >  #define KVM_FEATURE_PV_EOI             6
> >  #define KVM_FEATURE_PV_UNHALT          7
> > +#define KVM_FEATURE_PV_DEDICATED       8
> >
> >  /* The last 8 bits are used to indicate how to interpret the flags field
> >   * in pvclock structure. If no bits are set, all flags are ignored.
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8bb9594..dacd7cf 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> >  {
> >         if (!kvm_para_available())
> >                 return;
> > +       if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > +               return;
> >         /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> >         if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> >                 return;
> > --
> > 2.7.4
> >
> 
> You should also add a cpuid flag in kvm part.

Also, I am thinking if PV_DEDICATED helps in performance and with conjunction
with PV TLB patch in other thread. For use-case e.g KVM-RT where we don't overcommit 
vCPU's and pin vCPU:pCPU 1:1 we need a way from host side with which user can decide 
to enable PV_DEDICATED option. Such that if vCPU's are unlikely going to preempt or 
sleep we should avoid traversing the cpulist in PV TLB code.

So, two things:

1] A way to configure PV_DEDICATED from host.
2] PV TLB should also behave as per option PV_DEDICATED for better performance.

Or I am missing any context here?
> 
> Regards,
> Wanpeng Li
>
Radim Krčmář Nov. 9, 2017, 4 p.m. UTC | #9
2017-11-09 20:43+0800, Wanpeng Li:
> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> >
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> >
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> >
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> 
> You should also add a cpuid flag in kvm part.

It is better without that.  The flag has no dependency on KVM (kernel
hypervisor) code.
Radim Krčmář Nov. 9, 2017, 4:05 p.m. UTC | #10
2017-11-09 10:53-0500, Pankaj Gupta:
> 
> 
> > 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> > > Currently, the existing qspinlock implementation will fallback to
> > > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> > >
> > > This patch gives the opportunity to guest kernels to select
> > > between test-and-set and the regular queueu fair lock implementation
> > > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > > flag is not set, the code will still fall back to test-and-set,
> > > but when the PV_DEDICATED flag is set, the code will use
> > > the regular queue spinlock implementation.
> > >
> > > With this patch, when in autoselect mode, the guest will
> > > use the default spinlock implementation based on host feature
> > > flags as follows:
> > >
> > > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> > >
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: x86@kernel.org
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Waiman Long <longman@redhat.com>
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-doc@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > > Cc: Anthony Liguori <aliguori@amazon.com>
> > > Suggested-by: Matt Wilson <msw@amazon.com>
> > > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > > ---
> > > V3:
> > >  - When PV_DEDICATED is set (1), qspinlock is selected,
> > >    regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> > >  - Refreshed on top of tip/master.
> > > V2:
> > >  - rebase on top of tip/master
> > >
> > >  Documentation/virtual/kvm/cpuid.txt  | 6 ++++++
> > >  arch/x86/include/asm/qspinlock.h     | 4 ++++
> > >  arch/x86/include/uapi/asm/kvm_para.h | 1 +
> > >  arch/x86/kernel/kvm.c                | 2 ++
> > >  4 files changed, 13 insertions(+)
> > >
> > > diff --git a/Documentation/virtual/kvm/cpuid.txt
> > > b/Documentation/virtual/kvm/cpuid.txt
> > > index 3c65feb..117066a 100644
> > > --- a/Documentation/virtual/kvm/cpuid.txt
> > > +++ b/Documentation/virtual/kvm/cpuid.txt
> > > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest
> > > checks this feature bit
> > >                                     ||       || before enabling
> > >                                     ||       || paravirtualized
> > >                                     ||       || spinlock support.
> > >  ------------------------------------------------------------------------------
> > > +KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature
> > > bit
> > > +                                   ||       || to determine if they run on
> > > +                                   ||       || dedicated vCPUs, allowing
> > > opti-
> > > +                                   ||       || mizations such as usage of
> > > +                                   ||       || qspinlocks.
> > > +------------------------------------------------------------------------------
> > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no
> > >  guest-side
> > >                                     ||       || per-cpu warps are expected
> > >                                     ||       || in
> > >                                     ||       || kvmclock.
> > > diff --git a/arch/x86/include/asm/qspinlock.h
> > > b/arch/x86/include/asm/qspinlock.h
> > > index 5e16b5d..de42694 100644
> > > --- a/arch/x86/include/asm/qspinlock.h
> > > +++ b/arch/x86/include/asm/qspinlock.h
> > > @@ -3,6 +3,8 @@
> > >  #define _ASM_X86_QSPINLOCK_H
> > >
> > >  #include <linux/jump_label.h>
> > > +#include <linux/kvm_para.h>
> > > +
> > >  #include <asm/cpufeature.h>
> > >  #include <asm-generic/qspinlock_types.h>
> > >  #include <asm/paravirt.h>
> > > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > >         if (!static_branch_likely(&virt_spin_lock_key))
> > >                 return false;
> > >
> > > +       if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > > +               return false;
> > >         /*
> > >          * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> > >          * back to a Test-and-Set spinlock, because fair locks have
> > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h
> > > b/arch/x86/include/uapi/asm/kvm_para.h
> > > index 554aa8f..85a9875 100644
> > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > @@ -25,6 +25,7 @@
> > >  #define KVM_FEATURE_STEAL_TIME         5
> > >  #define KVM_FEATURE_PV_EOI             6
> > >  #define KVM_FEATURE_PV_UNHALT          7
> > > +#define KVM_FEATURE_PV_DEDICATED       8
> > >
> > >  /* The last 8 bits are used to indicate how to interpret the flags field
> > >   * in pvclock structure. If no bits are set, all flags are ignored.
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 8bb9594..dacd7cf 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> > >  {
> > >         if (!kvm_para_available())
> > >                 return;
> > > +       if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > > +               return;
> > >         /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> > >         if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> > >                 return;
> > > --
> > > 2.7.4
> > >
> > 
> > You should also add a cpuid flag in kvm part.
> 
> Also, I am thinking if PV_DEDICATED helps in performance and with conjunction
> with PV TLB patch in other thread. For use-case e.g KVM-RT where we don't overcommit 
> vCPU's and pin vCPU:pCPU 1:1 we need a way from host side with which user can decide 
> to enable PV_DEDICATED option. Such that if vCPU's are unlikely going to preempt or 
> sleep we should avoid traversing the cpulist in PV TLB code.
> 
> So, two things:
> 
> 1] A way to configure PV_DEDICATED from host.

Userspace can already configure it through KVM_SET_CPUID2 ioctl
(regardless of KVM version).

> 2] PV TLB should also behave as per option PV_DEDICATED for better performance.

Right,

thanks.
Peter Zijlstra Nov. 9, 2017, 4:17 p.m. UTC | #11
On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> 2017-11-09 10:53-0500, Pankaj Gupta:
> > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> 
> Right,

Shouldn't KVM do flush_tlb_other() in any case? Not sure how
PV_DEDICATED can help with that.
Pankaj Gupta Nov. 9, 2017, 4:37 p.m. UTC | #12
> Subject: Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
> 
> On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > 2] PV TLB should also behave as per option PV_DEDICATED for better
> > > performance.
> > 
> > Right,
> 
> Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> PV_DEDICATED can help with that.

Yes.

If I understand it correctly, It will just avoid traversing all the 
cpus another time just to check/set 'KVM_VCPU_PREEMPTED'/FLUSH value and 
clear the cpu mask.

Thanks,
Pankaj
Radim Krčmář Nov. 9, 2017, 4:45 p.m. UTC | #13
2017-11-09 17:17+0100, Peter Zijlstra:
> On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > 
> > Right,
> 
> Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> PV_DEDICATED can help with that.

It will, the suggestion was based on recent extension of the
flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.

PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
IPI if the target VCPU is not running.  This would be a waste of time
with PV_DEDICATED as all VCPUs are expected to always running.

With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
Peter Zijlstra Nov. 9, 2017, 5:12 p.m. UTC | #14
On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> 2017-11-09 17:17+0100, Peter Zijlstra:
> > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > 
> > > Right,
> > 
> > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > PV_DEDICATED can help with that.
> 
> It will, the suggestion was based on recent extension of the
> flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> 
> PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> IPI if the target VCPU is not running.  This would be a waste of time
> with PV_DEDICATED as all VCPUs are expected to always running.
> 
> With PV_DEDICATED, the guest should keep using native_flush_tlb_others.

Is saving that for_each_cpu() really worth the effort compared to the
cost of actually doing the IPIs and CR3 write?

Also, you should not put cpumask_t on stack, that's 'broken'.
Peter Zijlstra Nov. 9, 2017, 5:15 p.m. UTC | #15
On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > > 
> > > > Right,
> > > 
> > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > PV_DEDICATED can help with that.
> > 
> > It will, the suggestion was based on recent extension of the
> > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> > 
> > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > IPI if the target VCPU is not running.  This would be a waste of time
> > with PV_DEDICATED as all VCPUs are expected to always running.
> > 
> > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
> 
> Is saving that for_each_cpu() really worth the effort compared to the
> cost of actually doing the IPIs and CR3 write?
> 
> Also, you should not put cpumask_t on stack, that's 'broken'.

Also, you'll want to use __cpumask_clear_cpu() there.
Peter Zijlstra Nov. 9, 2017, 5:28 p.m. UTC | #16
On Thu, Nov 09, 2017 at 06:15:11PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > > > 
> > > > > Right,
> > > > 
> > > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > > PV_DEDICATED can help with that.
> > > 
> > > It will, the suggestion was based on recent extension of the
> > > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> > > 
> > > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > > IPI if the target VCPU is not running.  This would be a waste of time
> > > with PV_DEDICATED as all VCPUs are expected to always running.
> > > 
> > > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
> > 
> > Is saving that for_each_cpu() really worth the effort compared to the
> > cost of actually doing the IPIs and CR3 write?
> > 
> > Also, you should not put cpumask_t on stack, that's 'broken'.
> 
> Also, you'll want to use __cpumask_clear_cpu() there.

Also^2, that patch split is crazy, after patch 2/3 your machine is
broken due to lost TLB flushes. You have to first add the SHOULD_FLUSH
handling and then clear CPUs from the native_flush_tlb_other() mask.
Radim Krčmář Nov. 9, 2017, 5:31 p.m. UTC | #17
2017-11-09 18:12+0100, Peter Zijlstra:
> On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > > 
> > > > Right,
> > > 
> > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > PV_DEDICATED can help with that.
> > 
> > It will, the suggestion was based on recent extension of the
> > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> > 
> > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > IPI if the target VCPU is not running.  This would be a waste of time
> > with PV_DEDICATED as all VCPUs are expected to always running.
> > 
> > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
> 
> Is saving that for_each_cpu() really worth the effort compared to the
> cost of actually doing the IPIs and CR3 write?

It is one line for a few percent (hopefully better for AMD with AVIC).

Still, keeping the decision completely on userspace would be cleaner.

> Also, you should not put cpumask_t on stack, that's 'broken'.

Good catch, thanks.
Radim Krčmář Nov. 9, 2017, 5:35 p.m. UTC | #18
2017-11-09 18:28+0100, Peter Zijlstra:
> On Thu, Nov 09, 2017 at 06:15:11PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
> > > On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > > > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > > > > 
> > > > > > Right,
> > > > > 
> > > > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > > > PV_DEDICATED can help with that.
> > > > 
> > > > It will, the suggestion was based on recent extension of the
> > > > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> > > > 
> > > > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > > > IPI if the target VCPU is not running.  This would be a waste of time
> > > > with PV_DEDICATED as all VCPUs are expected to always running.
> > > > 
> > > > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
> > > 
> > > Is saving that for_each_cpu() really worth the effort compared to the
> > > cost of actually doing the IPIs and CR3 write?
> > > 
> > > Also, you should not put cpumask_t on stack, that's 'broken'.
> > 
> > Also, you'll want to use __cpumask_clear_cpu() there.
> 
> Also^2, that patch split is crazy, after patch 2/3 your machine is
> broken due to lost TLB flushes. You have to first add the SHOULD_FLUSH
> handling and then clear CPUs from the native_flush_tlb_other() mask.

That should be fixed in v2 -- [2/3] must not enable this feature if the
host has not exposed it and [3/3] has to expose it.  (The ordering of
those two doesn't matter as they are separate kernel.)
Wanpeng Li Nov. 10, 2017, 2:07 a.m. UTC | #19
2017-11-10 1:15 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
>> On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
>> > 2017-11-09 17:17+0100, Peter Zijlstra:
>> > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
>> > > > 2017-11-09 10:53-0500, Pankaj Gupta:
>> > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
>> > > >
>> > > > Right,
>> > >
>> > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
>> > > PV_DEDICATED can help with that.
>> >
>> > It will, the suggestion was based on recent extension of the
>> > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
>> >
>> > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
>> > IPI if the target VCPU is not running.  This would be a waste of time
>> > with PV_DEDICATED as all VCPUs are expected to always running.
>> >
>> > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
>>
>> Is saving that for_each_cpu() really worth the effort compared to the
>> cost of actually doing the IPIs and CR3 write?
>>
>> Also, you should not put cpumask_t on stack, that's 'broken'.

Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:

/* These two declarations are only used in check_irq_vectors_for_cpu_disable()
 * below, which is protected by stop_machine().  Putting them on the stack
 * results in a stack frame overflow.  Dynamically allocating could result in a
 * failure so declare these two cpumasks as global.
 */
static struct cpumask affinity_new, online_new;

>
> Also, you'll want to use __cpumask_clear_cpu() there.

Will do.

Regards,
Wanpeng Li
Wanpeng Li Nov. 10, 2017, 6:07 a.m. UTC | #20
2017-11-10 0:00 GMT+08:00 Radim Krcmar <rkrcmar@redhat.com>:
> 2017-11-09 20:43+0800, Wanpeng Li:
>> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
>> > Currently, the existing qspinlock implementation will fallback to
>> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
>> >
>> > This patch gives the opportunity to guest kernels to select
>> > between test-and-set and the regular queueu fair lock implementation
>> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
>> > flag is not set, the code will still fall back to test-and-set,
>> > but when the PV_DEDICATED flag is set, the code will use
>> > the regular queue spinlock implementation.
>> >
>> > With this patch, when in autoselect mode, the guest will
>> > use the default spinlock implementation based on host feature
>> > flags as follows:
>> >
>> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
>> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
>> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>> >
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> > Cc: Jonathan Corbet <corbet@lwn.net>
>> > Cc: Thomas Gleixner <tglx@linutronix.de>
>> > Cc: Ingo Molnar <mingo@redhat.com>
>> > Cc: "H. Peter Anvin" <hpa@zytor.com>
>> > Cc: x86@kernel.org
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Waiman Long <longman@redhat.com>
>> > Cc: kvm@vger.kernel.org
>> > Cc: linux-doc@vger.kernel.org
>> > Cc: linux-kernel@vger.kernel.org
>> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
>> > Cc: Anthony Liguori <aliguori@amazon.com>
>> > Suggested-by: Matt Wilson <msw@amazon.com>
>> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
>> > ---
>>
>> You should also add a cpuid flag in kvm part.
>
> It is better without that.  The flag has no dependency on KVM (kernel
> hypervisor) code.

Do you mean -cpu host, +xxxx,I think it will result in "warning: host
doesn't support requested feature: CPUID.40000001H:eax.XXXXXX"

Regards,
Wanpeng Li
Peter Zijlstra Nov. 10, 2017, 7:59 a.m. UTC | #21
On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:

> >> Also, you should not put cpumask_t on stack, that's 'broken'.
> 
> Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:
> 
> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
>  * below, which is protected by stop_machine().  Putting them on the stack
>  * results in a stack frame overflow.  Dynamically allocating could result in a
>  * failure so declare these two cpumasks as global.
>  */
> static struct cpumask affinity_new, online_new;

That code no longer exists.. Also not entirely sure how it would be
helpful.

What you probably want to do is have a per-cpu cpumask, since
flush_tlb_others() is called with preemption disabled. But you probably
don't want an unconditionally allocated one, since most kernels will not
in fact be PV.

So you'll want something like:

	static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);

And then you need something like:

	for_each_possible_cpu(cpu) {
		zalloc_cpumask_var_node(per_cpu_ptr(&__pb_tlb_mask, cpu),
					GFP_KERNEL, cpu_to_node(cpu));
	}

before you set the pv-op or so.
Wanpeng Li Nov. 10, 2017, 8:04 a.m. UTC | #22
2017-11-10 15:59 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:
>
>> >> Also, you should not put cpumask_t on stack, that's 'broken'.
>>
>> Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:
>>
>> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
>>  * below, which is protected by stop_machine().  Putting them on the stack
>>  * results in a stack frame overflow.  Dynamically allocating could result in a
>>  * failure so declare these two cpumasks as global.
>>  */
>> static struct cpumask affinity_new, online_new;
>
> That code no longer exists.. Also not entirely sure how it would be
> helpful.
>
> What you probably want to do is have a per-cpu cpumask, since
> flush_tlb_others() is called with preemption disabled. But you probably
> don't want an unconditionally allocated one, since most kernels will not
> in fact be PV.
>
> So you'll want something like:
>
>         static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
>
> And then you need something like:
>
>         for_each_possible_cpu(cpu) {
>                 zalloc_cpumask_var_node(per_cpu_ptr(&__pb_tlb_mask, cpu),
>                                         GFP_KERNEL, cpu_to_node(cpu));
>         }
>
> before you set the pv-op or so.

Thanks Peterz, :) I will have a try.

Regards,
Wanpeng Li
Paolo Bonzini Nov. 10, 2017, 8:19 a.m. UTC | #23
On 10/11/2017 07:07, Wanpeng Li wrote:
>>> You should also add a cpuid flag in kvm part.
>> It is better without that.  The flag has no dependency on KVM (kernel
>> hypervisor) code.
> Do you mean -cpu host, +xxxx,I think it will result in "warning: host
> doesn't support requested feature: CPUID.40000001H:eax.XXXXXX"

There are some exceptions where QEMU overrides the values of
KVM_GET_SUPPORTED_CPUID.

I think it is better to add the flag to KVM *and* to QEMU's override in
kvm_arch_get_supported_cpuid (target/i386/kvm.c).

Paolo
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..117066a 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@  KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
+                                   ||       || to determine if they run on
+                                   ||       || dedicated vCPUs, allowing opti-
+                                   ||       || mizations such as usage of
+                                   ||       || qspinlocks.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 5e16b5d..de42694 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -3,6 +3,8 @@ 
 #define _ASM_X86_QSPINLOCK_H
 
 #include <linux/jump_label.h>
+#include <linux/kvm_para.h>
+
 #include <asm/cpufeature.h>
 #include <asm-generic/qspinlock_types.h>
 #include <asm/paravirt.h>
@@ -58,6 +60,8 @@  static inline bool virt_spin_lock(struct qspinlock *lock)
 	if (!static_branch_likely(&virt_spin_lock_key))
 		return false;
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
+		return false;
 	/*
 	 * On hypervisors without PARAVIRT_SPINLOCKS support we fall
 	 * back to a Test-and-Set spinlock, because fair locks have
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 554aa8f..85a9875 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@ 
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_PV_DEDICATED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..dacd7cf 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -642,6 +642,8 @@  void __init kvm_spinlock_init(void)
 {
 	if (!kvm_para_available())
 		return;
+	if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
+		return;
 	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
 	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
 		return;