[resend] x86,kvm: Add a kernel parameter to disable PV spinlock

Message ID 20170904142836.15446-1-osalvador@suse.de (mailing list archive)
State New, archived

Commit Message

Oscar Salvador Sept. 4, 2017, 2:28 p.m. UTC
This is just a resend of Waiman Long's patch.
I could not find out why it was not merged upstream, so I thought
I would give it another chance.
What follows is what Waiman Long wrote.

Xen has a kernel command line argument "xen_nopvspin" to disable
paravirtual spinlocks. This patch adds a similar "kvm_nopvspin"
argument to disable paravirtual spinlocks for KVM. This can be useful
for testing, as well as allowing administrators to choose unfair locks
for their KVM guests if they want to.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 Documentation/admin-guide/kernel-parameters.txt |  6 +++++-
 arch/x86/kernel/kvm.c                           | 14 +++++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

Comments

Peter Zijlstra Sept. 4, 2017, 2:40 p.m. UTC | #1
On Mon, Sep 04, 2017 at 04:28:36PM +0200, Oscar Salvador wrote:
> This is just a resend of Waiman Long's patch.
> I could not find why it was not merged to upstream, so I thought
> to give it another chance.
> What follows is what Waiman Long wrote.
> 
> Xen has an kernel command line argument "xen_nopvspin" to disable
> paravirtual spinlocks. This patch adds a similar "kvm_nopvspin"
> argument to disable paravirtual spinlocks for KVM. This can be useful
> for testing as well as allowing administrators to choose unfair lock
> for their KVM guests if they want to.

For testing it's trivial to hack your kernel, and I don't feel this is
something an admin can make reasonable decisions about.

So why? In general, fewer knobs is better.
Waiman Long Sept. 4, 2017, 7:32 p.m. UTC | #2
On 09/04/2017 10:40 AM, Peter Zijlstra wrote:
> On Mon, Sep 04, 2017 at 04:28:36PM +0200, Oscar Salvador wrote:
>> This is just a resend of Waiman Long's patch.
>> I could not find why it was not merged to upstream, so I thought
>> to give it another chance.
>> What follows is what Waiman Long wrote.
>>
>> Xen has an kernel command line argument "xen_nopvspin" to disable
>> paravirtual spinlocks. This patch adds a similar "kvm_nopvspin"
>> argument to disable paravirtual spinlocks for KVM. This can be useful
>> for testing as well as allowing administrators to choose unfair lock
>> for their KVM guests if they want to.
> For testing its trivial to hack your kernel and I don't feel this is
> something an Admin can make reasonable decisions about.

I almost forgot about this patch that I sent quite a while ago. I was
sending this patch out mainly to maintain consistency between KVM and
Xen. This patch is not that important to me, and that is why I didn't
push it further.

Cheers,
Longman
Davidlohr Bueso Sept. 4, 2017, 10:21 p.m. UTC | #3
On Mon, 04 Sep 2017, Peter Zijlstra wrote:

>For testing its trivial to hack your kernel and I don't feel this is
>something an Admin can make reasonable decisions about.
>
>So why? In general less knobs is better.

+1.

Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
to disable xen pv ticketlocks) has no justification as to why it's wanted
in the first place. The only thing I could find was from 15a3eac0784
(xen/spinlock: Document the xen_nopvspin parameter):

"Useful for diagnosing issues and comparing benchmarks in over-commit CPU scenarios."

So I vote for no additional knobs, especially for such core code.

Thanks,
Davidlohr
Juergen Gross Sept. 5, 2017, 6:28 a.m. UTC | #4
On 05/09/17 00:21, Davidlohr Bueso wrote:
> On Mon, 04 Sep 2017, Peter Zijlstra wrote:
> 
>> For testing its trivial to hack your kernel and I don't feel this is
>> something an Admin can make reasonable decisions about.
>>
>> So why? In general less knobs is better.
> 
> +1.
> 
> Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
> to disable xen pv ticketlocks) has no justification as to why its wanted
> in the first place. The only thing I could find was from 15a3eac0784
> (xen/spinlock: Document the xen_nopvspin parameter):
> 
> "Useful for diagnosing issues and comparing benchmarks in over-commit
> CPU scenarios."

Hmm, I think I should clarify the Xen knob, as I was the one requesting
it:

In my previous employment we had a configuration where dom0 ran
exclusively on a dedicated set of physical cpus. We experienced
scalability problems when doing I/O performance tests: with a decent
number of dom0 cpus we achieved throughput of 700 MB/s with only 20%
cpu load in dom0. A higher dom0 cpu count made the throughput drop to
about 150 MB/s while cpu load went up to 100%. The reason was the
additional load due to hypervisor interactions on a high-frequency lock.

So in special configurations, at least for Xen, the knob is useful in a
production environment.


Juergen
Oscar Salvador Sept. 5, 2017, 6:57 a.m. UTC | #5
On 09/05/2017 08:28 AM, Juergen Gross wrote:
> On 05/09/17 00:21, Davidlohr Bueso wrote:
>> On Mon, 04 Sep 2017, Peter Zijlstra wrote:
>>
>>> For testing its trivial to hack your kernel and I don't feel this is
>>> something an Admin can make reasonable decisions about.
>>>
>>> So why? In general less knobs is better.
>> +1.
>>
>> Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
>> to disable xen pv ticketlocks) has no justification as to why its wanted
>> in the first place. The only thing I could find was from 15a3eac0784
>> (xen/spinlock: Document the xen_nopvspin parameter):
>>
>> "Useful for diagnosing issues and comparing benchmarks in over-commit
>> CPU scenarios."
> Hmm, I think I should clarify the Xen knob, as I was the one requesting
> it:
>
> In my previous employment we had a configuration where dom0 ran
> exclusively on a dedicated set of physical cpus. We experienced
> scalability problems when doing I/O performance tests: with a decent
> number of dom0 cpus we achieved throughput of 700 MB/s with only 20%
> cpu load in dom0. A higher dom0 cpu count let the throughput drop to
> about 150 MB/s and cpu load was up to 100%. Reason was the additional
> load due to hypervisor interactions on a high frequency lock.
>
> So in special configurations at least for Xen the knob is useful for
> production environment.

It may be that the original patch was just to keep consistency between 
Xen and KVM, and also only for testing purposes.
But we found a case where a customer of ours is running some workloads
with a 1<->1 mapping between physical cores and virtual cores, and we
realized that with pv spinlocks disabled there is a 4-5% performance
gain.

A perf analysis showed that the application was very lock-intensive, with
a lot of time spent in __raw_callee_save___pv_queued_spin_unlock.
Peter Zijlstra Sept. 5, 2017, 6:58 a.m. UTC | #6
On Tue, Sep 05, 2017 at 08:28:10AM +0200, Juergen Gross wrote:
> On 05/09/17 00:21, Davidlohr Bueso wrote:
> > On Mon, 04 Sep 2017, Peter Zijlstra wrote:
> > 
> >> For testing its trivial to hack your kernel and I don't feel this is
> >> something an Admin can make reasonable decisions about.
> >>
> >> So why? In general less knobs is better.
> > 
> > +1.
> > 
> > Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
> > to disable xen pv ticketlocks) has no justification as to why its wanted
> > in the first place. The only thing I could find was from 15a3eac0784
> > (xen/spinlock: Document the xen_nopvspin parameter):
> > 
> > "Useful for diagnosing issues and comparing benchmarks in over-commit
> > CPU scenarios."
> 
> Hmm, I think I should clarify the Xen knob, as I was the one requesting
> it:
> 
> In my previous employment we had a configuration where dom0 ran
> exclusively on a dedicated set of physical cpus. We experienced
> scalability problems when doing I/O performance tests: with a decent
> number of dom0 cpus we achieved throughput of 700 MB/s with only 20%
> cpu load in dom0. A higher dom0 cpu count let the throughput drop to
> about 150 MB/s and cpu load was up to 100%. Reason was the additional
> load due to hypervisor interactions on a high frequency lock.
> 
> So in special configurations at least for Xen the knob is useful for
> production environment.

So the problem with qspinlock is that it will revert to a classic
test-and-set spinlock if you don't do paravirt but are running a HV.

And test-and-set is unfair and has all kinds of ugly starvation cases,
esp on slightly bigger hardware.

So if we'd want to cater to the 1:1 virt case, we'll need to come up
with something else. _IF_ it is an issue of course.
Juergen Gross Sept. 5, 2017, 7:35 a.m. UTC | #7
On 05/09/17 08:58, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 08:28:10AM +0200, Juergen Gross wrote:
>> On 05/09/17 00:21, Davidlohr Bueso wrote:
>>> On Mon, 04 Sep 2017, Peter Zijlstra wrote:
>>>
>>>> For testing its trivial to hack your kernel and I don't feel this is
>>>> something an Admin can make reasonable decisions about.
>>>>
>>>> So why? In general less knobs is better.
>>>
>>> +1.
>>>
>>> Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
>>> to disable xen pv ticketlocks) has no justification as to why its wanted
>>> in the first place. The only thing I could find was from 15a3eac0784
>>> (xen/spinlock: Document the xen_nopvspin parameter):
>>>
>>> "Useful for diagnosing issues and comparing benchmarks in over-commit
>>> CPU scenarios."
>>
>> Hmm, I think I should clarify the Xen knob, as I was the one requesting
>> it:
>>
>> In my previous employment we had a configuration where dom0 ran
>> exclusively on a dedicated set of physical cpus. We experienced
>> scalability problems when doing I/O performance tests: with a decent
>> number of dom0 cpus we achieved throughput of 700 MB/s with only 20%
>> cpu load in dom0. A higher dom0 cpu count let the throughput drop to
>> about 150 MB/s and cpu load was up to 100%. Reason was the additional
>> load due to hypervisor interactions on a high frequency lock.
>>
>> So in special configurations at least for Xen the knob is useful for
>> production environment.
> 
> So the problem with qspinlock is that it will revert to a classic
> test-and-set spinlock if you don't do paravirt but are running a HV.

In the Xen case we just use the bare metal settings when xen_nopvspin
has been specified. So paravirt, but without modifying any pv_lock_ops
functions.
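
For reference, the Xen side looks roughly like this (sketched from memory;
the exact code in arch/x86/xen/spinlock.c may differ):

static bool xen_pvspin = true;

static __init int xen_parse_nopvspin(char *arg)
{
        xen_pvspin = false;
        return 0;
}
early_param("xen_nopvspin", xen_parse_nopvspin);

void __init xen_init_spinlocks(void)
{
        if (!xen_pvspin) {
                printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
                /* leave pv_lock_ops alone; note that virt_spin_lock()
                 * still sees X86_FEATURE_HYPERVISOR */
                return;
        }

        pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
        pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
        pv_lock_ops.wait = xen_qlock_wait;
        pv_lock_ops.kick = xen_qlock_kick;
}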


Juergen

> 
> And test-and-set is unfair and has all kinds of ugly starvation cases,
> esp on slightly bigger hardware.
> 
> So if we'd want to cater to the 1:1 virt case, we'll need to come up
> with something else. _IF_ it is an issue of course.
>
Peter Zijlstra Sept. 5, 2017, 8:07 a.m. UTC | #8
On Tue, Sep 05, 2017 at 08:57:16AM +0200, Oscar Salvador wrote:
> It may be that the original patch was just to keep consistency between Xen
> and KVM, and also only for testing purposes.
> But we find a case when a customer of ours is running some workloads with
> 1<->1 mapping between physical cores and virtual cores, and we realized that
> with the pv spinlocks disabled there is a 4-5% of performance gain.

There are very definite downsides to using a test-and-set spinlock.

A much better option would be one that forces the use of native
qspinlock in the 1:1 case. That means you have to fail both pv_enabled()
and virt_spin_lock().
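
In other words (a condensed sketch, not the literal
kernel/locking/qspinlock.c flow), the slowpath has two gates, and the 1:1
case would want both of them to say no:

static void lock_slowpath_sketch(struct qspinlock *lock, u32 val)
{
        if (pv_enabled()) {
                /* pv slowpath selected: the vCPU may halt and wait for a
                 * kick from the host instead of spinning */
                __pv_queued_spin_lock_slowpath(lock, val);
                return;
        }

        if (virt_spin_lock(lock)) {
                /* hypervisor detected but no pv locks: unfair test-and-set;
                 * the lock is already held when this returns true */
                return;
        }

        /* both gates said no: fair, native MCS-queued slowpath */
        native_queued_spin_lock_slowpath(lock, val);
}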
Peter Zijlstra Sept. 5, 2017, 8:10 a.m. UTC | #9
On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
> > So the problem with qspinlock is that it will revert to a classic
> > test-and-set spinlock if you don't do paravirt but are running a HV.
> 
> In the Xen case we just use the bare metal settings when xen_nopvspin
> has been specified. So paravirt, but without modifying any pv_lock_ops
> functions.

See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.

And as the comment there says, this is a fallback for !paravirt enabled
hypervisors to avoid the worst of the lock holder preemption crud.

But this very much does not deal with the 1:1 case nicely.
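
For reference, that fallback is roughly the following (paraphrased from the
4.13-era header; details may differ):

static inline bool virt_spin_lock(struct qspinlock *lock)
{
        if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
                return false;

        /*
         * On hypervisors without PARAVIRT_SPINLOCKS support we fall back
         * to a test-and-set spinlock, because fair locks have horrible
         * lock holder preemption issues.
         */
        do {
                while (atomic_read(&lock->val))
                        cpu_relax();
        } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL));

        return true;
}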
Juergen Gross Sept. 5, 2017, 8:14 a.m. UTC | #10
On 05/09/17 10:10, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
>>> So the problem with qspinlock is that it will revert to a classic
>>> test-and-set spinlock if you don't do paravirt but are running a HV.
>>
>> In the Xen case we just use the bare metal settings when xen_nopvspin
>> has been specified. So paravirt, but without modifying any pv_lock_ops
>> functions.
> 
> See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
> X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.
> 
> And as the comment there says, this is a fallback for !paravirt enabled
> hypervisors to avoid the worst of the lock holder preemption crud.
> 
> But this very much does not deal with the 1:1 case nicely.
> 

Aah, now I've got it.

So maybe we should add virt_spin_lock() to pv_lock_ops? This way e.g.
xen_nopvspin could tweak just the virt_spin_lock() case by letting it
return false all the time?
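
Something along these lines, perhaps (only a sketch of the idea; the hook
and its wiring are hypothetical, there is no actual patch here):

/* today's test-and-set fallback, kept as the default implementation */
bool native_virt_spin_lock(struct qspinlock *lock);

/* asm/qspinlock.h would dispatch through the (new) pv_lock_ops member: */
static inline bool virt_spin_lock(struct qspinlock *lock)
{
        return pv_lock_ops.virt_spin_lock(lock);
}

/* an "always false" variant for xen_nopvspin (or kvm_nopvspin): */
static bool virt_spin_lock_never(struct qspinlock *lock)
{
        return false;   /* fall through to the fair native slowpath */
}

void __init xen_init_spinlocks(void)
{
        if (!xen_pvspin) {
                pv_lock_ops.virt_spin_lock = virt_spin_lock_never;
                return;
        }
        /* ... existing pv_lock_ops setup as before ... */
}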

In case you agree I can set up a patch...


Juergen
Peter Zijlstra Sept. 5, 2017, 8:28 a.m. UTC | #11
On Tue, Sep 05, 2017 at 10:14:21AM +0200, Juergen Gross wrote:
> On 05/09/17 10:10, Peter Zijlstra wrote:
> > On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
> >>> So the problem with qspinlock is that it will revert to a classic
> >>> test-and-set spinlock if you don't do paravirt but are running a HV.
> >>
> >> In the Xen case we just use the bare metal settings when xen_nopvspin
> >> has been specified. So paravirt, but without modifying any pv_lock_ops
> >> functions.
> > 
> > See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
> > X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.
> > 
> > And as the comment there says, this is a fallback for !paravirt enabled
> > hypervisors to avoid the worst of the lock holder preemption crud.
> > 
> > But this very much does not deal with the 1:1 case nicely.
> > 
> 
> Aah, now I've got it.
> 
> So maybe we should add virt_spin_lock() to pv_lock_ops? This way e.g.
> xen_nopvspin could tweak just the virt_spin_lock() case by letting it
> return false all the time?

Hmm, that might work. Could we somehow NOP out that call when
!X86_FEATURE_HYPERVISOR? That would save native from having to do the
call and would be a win for everyone.
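
One way that could look (a hypothetical sketch; the function and key names
here are made up): gate the fallback behind a static key that only gets
enabled when a hypervisor is detected, so bare metal is left with a
patched-out jump:

DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);

void __init native_pv_lock_init(void)
{
        /* decided once at boot; bare metal never enables the key */
        if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
                static_branch_enable(&virt_spin_lock_key);
}

static inline bool virt_spin_lock(struct qspinlock *lock)
{
        /* patched-out jump on bare metal, i.e. effectively a NOP */
        if (!static_branch_unlikely(&virt_spin_lock_key))
                return false;

        /* ... unfair test-and-set fallback as before ... */
        return true;
}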
Juergen Gross Sept. 5, 2017, 8:52 a.m. UTC | #12
On 05/09/17 10:28, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 10:14:21AM +0200, Juergen Gross wrote:
>> On 05/09/17 10:10, Peter Zijlstra wrote:
>>> On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
>>>>> So the problem with qspinlock is that it will revert to a classic
>>>>> test-and-set spinlock if you don't do paravirt but are running a HV.
>>>>
>>>> In the Xen case we just use the bare metal settings when xen_nopvspin
>>>> has been specified. So paravirt, but without modifying any pv_lock_ops
>>>> functions.
>>>
>>> See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
>>> X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.
>>>
>>> And as the comment there says, this is a fallback for !paravirt enabled
>>> hypervisors to avoid the worst of the lock holder preemption crud.
>>>
>>> But this very much does not deal with the 1:1 case nicely.
>>>
>>
>> Aah, now I've got it.
>>
>> So maybe we should add virt_spin_lock() to pv_lock_ops? This way e.g.
>> xen_nopvspin could tweak just the virt_spin_lock() case by letting it
>> return false all the time?
> 
> Hmm, that might work. Could we somehow nop that call when
> !X86_FEATURE_HYPERVISOR?, that saves native from having to do the call
> and would be a win for everyone.

So in fact we want an "always false" shortcut for bare metal and for any
virtualization environment selecting bare-metal behavior.

I'll give it a try.


Juergen

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d9c171ce4190..56c6e3acdf8e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1899,6 +1899,10 @@ 
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	kvm_nopvspin	[X86,KVM]
+			Disables the paravirtualized spinlock slowpath
+			optimizations for KVM.
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
@@ -4533,7 +4537,7 @@ 
 			never -- do not unplug even if version check succeeds
 
 	xen_nopvspin	[X86,XEN]
-			Disables the ticketlock slowpath using Xen PV
+			Disables the spinlock slowpath using Xen PV
 			optimizations.
 
 	xen_nopv	[X86]
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d04e30e3c0ff..51addf874fc1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -568,6 +568,18 @@  static void kvm_kick_cpu(int cpu)
 	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
 }
 
+static bool kvm_pvspin = true;
+
+/*
+ * Allow disabling of PV spinlock in kernel command line
+ */
+static __init int kvm_parse_nopvspin(char *arg)
+{
+	kvm_pvspin = false;
+	return 0;
+}
+early_param("kvm_nopvspin", kvm_parse_nopvspin);
+
 #include <asm/qspinlock.h>
 
 static void kvm_wait(u8 *ptr, u8 val)
@@ -633,7 +645,7 @@  asm(
  */
 void __init kvm_spinlock_init(void)
 {
-	if (!kvm_para_available())
+	if (!kvm_para_available() || !kvm_pvspin)
 		return;
 	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
 	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))