Message ID | 20170904142836.15446-1-osalvador@suse.de (mailing list archive)
State      | New, archived
On Mon, Sep 04, 2017 at 04:28:36PM +0200, Oscar Salvador wrote:
> This is just a resend of Waiman Long's patch.
> I could not find why it was not merged upstream, so I thought
> to give it another chance.
> What follows is what Waiman Long wrote.
>
> Xen has a kernel command line argument "xen_nopvspin" to disable
> paravirtual spinlocks. This patch adds a similar "kvm_nopvspin"
> argument to disable paravirtual spinlocks for KVM. This can be useful
> for testing as well as allowing administrators to choose unfair locks
> for their KVM guests if they want to.

For testing it's trivial to hack your kernel, and I don't feel this is
something an admin can make reasonable decisions about.

So why? In general, fewer knobs is better.
On 09/04/2017 10:40 AM, Peter Zijlstra wrote:
> On Mon, Sep 04, 2017 at 04:28:36PM +0200, Oscar Salvador wrote:
>> This is just a resend of Waiman Long's patch.
>> I could not find why it was not merged upstream, so I thought
>> to give it another chance.
>> What follows is what Waiman Long wrote.
>>
>> Xen has a kernel command line argument "xen_nopvspin" to disable
>> paravirtual spinlocks. This patch adds a similar "kvm_nopvspin"
>> argument to disable paravirtual spinlocks for KVM. This can be useful
>> for testing as well as allowing administrators to choose unfair locks
>> for their KVM guests if they want to.
>
> For testing it's trivial to hack your kernel, and I don't feel this is
> something an admin can make reasonable decisions about.

I almost forgot about this patch that I sent quite a while ago. I was
sending this patch out mainly to maintain consistency between KVM and
Xen. This patch is not that important to me, and that is why I didn't
push it further.

Cheers,
Longman
On Mon, 04 Sep 2017, Peter Zijlstra wrote:
> For testing it's trivial to hack your kernel, and I don't feel this is
> something an admin can make reasonable decisions about.
>
> So why? In general, fewer knobs is better.

+1.

Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
to disable xen pv ticketlocks) has no justification as to why it is wanted
in the first place. The only thing I could find was from 15a3eac0784
(xen/spinlock: Document the xen_nopvspin parameter):

  "Useful for diagnosing issues and comparing benchmarks in over-commit
  CPU scenarios."

So I vote for no additional knobs, especially for such core code.

Thanks,
Davidlohr
On 05/09/17 00:21, Davidlohr Bueso wrote:
> On Mon, 04 Sep 2017, Peter Zijlstra wrote:
>
>> For testing it's trivial to hack your kernel, and I don't feel this is
>> something an admin can make reasonable decisions about.
>>
>> So why? In general, fewer knobs is better.
>
> +1.
>
> Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
> to disable xen pv ticketlocks) has no justification as to why it is wanted
> in the first place. The only thing I could find was from 15a3eac0784
> (xen/spinlock: Document the xen_nopvspin parameter):
>
> "Useful for diagnosing issues and comparing benchmarks in over-commit
> CPU scenarios."

Hmm, I think I should clarify the Xen knob, as I was the one requesting it:

In my previous employment we had a configuration where dom0 ran
exclusively on a dedicated set of physical cpus. We experienced
scalability problems when doing I/O performance tests: with a decent
number of dom0 cpus we achieved a throughput of 700 MB/s with only 20%
cpu load in dom0. A higher dom0 cpu count let the throughput drop to
about 150 MB/s while cpu load went up to 100%. The reason was the
additional load due to hypervisor interactions on a high-frequency lock.

So in special configurations, at least for Xen, the knob is useful in a
production environment.

Juergen
On 09/05/2017 08:28 AM, Juergen Gross wrote:
> On 05/09/17 00:21, Davidlohr Bueso wrote:
>> On Mon, 04 Sep 2017, Peter Zijlstra wrote:
>>
>>> For testing it's trivial to hack your kernel, and I don't feel this is
>>> something an admin can make reasonable decisions about.
>>>
>>> So why? In general, fewer knobs is better.
>>
>> +1.
>>
>> Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
>> to disable xen pv ticketlocks) has no justification as to why it is wanted
>> in the first place. The only thing I could find was from 15a3eac0784
>> (xen/spinlock: Document the xen_nopvspin parameter):
>>
>> "Useful for diagnosing issues and comparing benchmarks in over-commit
>> CPU scenarios."
>
> Hmm, I think I should clarify the Xen knob, as I was the one requesting it:
>
> In my previous employment we had a configuration where dom0 ran
> exclusively on a dedicated set of physical cpus. We experienced
> scalability problems when doing I/O performance tests: with a decent
> number of dom0 cpus we achieved a throughput of 700 MB/s with only 20%
> cpu load in dom0. A higher dom0 cpu count let the throughput drop to
> about 150 MB/s while cpu load went up to 100%. The reason was the
> additional load due to hypervisor interactions on a high-frequency lock.
>
> So in special configurations, at least for Xen, the knob is useful in a
> production environment.

It may be that the original patch was just to keep consistency between
Xen and KVM, and also only for testing purposes.
But we found a case where a customer of ours is running some workloads
with a 1:1 mapping between physical cores and virtual cores, and we
realized that with the pv spinlocks disabled there is a 4-5% performance
gain.
A perf analysis showed that the application was very lock intensive,
with a lot of time spent in __raw_callee_save___pv_queued_spin_unlock.
On Tue, Sep 05, 2017 at 08:28:10AM +0200, Juergen Gross wrote:
> On 05/09/17 00:21, Davidlohr Bueso wrote:
> > On Mon, 04 Sep 2017, Peter Zijlstra wrote:
> >
> >> For testing it's trivial to hack your kernel, and I don't feel this is
> >> something an admin can make reasonable decisions about.
> >>
> >> So why? In general, fewer knobs is better.
> >
> > +1.
> >
> > Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
> > to disable xen pv ticketlocks) has no justification as to why it is wanted
> > in the first place. The only thing I could find was from 15a3eac0784
> > (xen/spinlock: Document the xen_nopvspin parameter):
> >
> > "Useful for diagnosing issues and comparing benchmarks in over-commit
> > CPU scenarios."
>
> Hmm, I think I should clarify the Xen knob, as I was the one requesting it:
>
> In my previous employment we had a configuration where dom0 ran
> exclusively on a dedicated set of physical cpus. We experienced
> scalability problems when doing I/O performance tests: with a decent
> number of dom0 cpus we achieved a throughput of 700 MB/s with only 20%
> cpu load in dom0. A higher dom0 cpu count let the throughput drop to
> about 150 MB/s while cpu load went up to 100%. The reason was the
> additional load due to hypervisor interactions on a high-frequency lock.
>
> So in special configurations, at least for Xen, the knob is useful in a
> production environment.

So the problem with qspinlock is that it will revert to a classic
test-and-set spinlock if you don't do paravirt but are running on a HV.

And test-and-set is unfair and has all kinds of ugly starvation cases,
especially on slightly bigger hardware.

So if we want to cater to the 1:1 virt case, we'll need to come up
with something else. _IF_ it is an issue, of course.
On 05/09/17 08:58, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 08:28:10AM +0200, Juergen Gross wrote:
>> On 05/09/17 00:21, Davidlohr Bueso wrote:
>>> On Mon, 04 Sep 2017, Peter Zijlstra wrote:
>>>
>>>> For testing it's trivial to hack your kernel, and I don't feel this is
>>>> something an admin can make reasonable decisions about.
>>>>
>>>> So why? In general, fewer knobs is better.
>>>
>>> +1.
>>>
>>> Also, note how b8fa70b51aa (xen, pvticketlocks: Add xen_nopvspin parameter
>>> to disable xen pv ticketlocks) has no justification as to why it is wanted
>>> in the first place. The only thing I could find was from 15a3eac0784
>>> (xen/spinlock: Document the xen_nopvspin parameter):
>>>
>>> "Useful for diagnosing issues and comparing benchmarks in over-commit
>>> CPU scenarios."
>>
>> Hmm, I think I should clarify the Xen knob, as I was the one requesting
>> it:
>>
>> In my previous employment we had a configuration where dom0 ran
>> exclusively on a dedicated set of physical cpus. We experienced
>> scalability problems when doing I/O performance tests: with a decent
>> number of dom0 cpus we achieved a throughput of 700 MB/s with only 20%
>> cpu load in dom0. A higher dom0 cpu count let the throughput drop to
>> about 150 MB/s while cpu load went up to 100%. The reason was the
>> additional load due to hypervisor interactions on a high-frequency lock.
>>
>> So in special configurations, at least for Xen, the knob is useful in a
>> production environment.
>
> So the problem with qspinlock is that it will revert to a classic
> test-and-set spinlock if you don't do paravirt but are running on a HV.

In the Xen case we just use the bare metal settings when xen_nopvspin
has been specified. So paravirt, but without modifying any pv_lock_ops
functions.

Juergen

>
> And test-and-set is unfair and has all kinds of ugly starvation cases,
> especially on slightly bigger hardware.
>
> So if we want to cater to the 1:1 virt case, we'll need to come up
> with something else. _IF_ it is an issue, of course.
>
On Tue, Sep 05, 2017 at 08:57:16AM +0200, Oscar Salvador wrote:
> It may be that the original patch was just to keep consistency between Xen
> and KVM, and also only for testing purposes.
> But we found a case where a customer of ours is running some workloads with
> a 1:1 mapping between physical cores and virtual cores, and we realized that
> with the pv spinlocks disabled there is a 4-5% performance gain.

There are very definite downsides to using a test-and-set spinlock. A
much better option would be one that forces the use of the native
qspinlock in the 1:1 case. That means you have to fail both
pv_enabled() and virt_spin_lock().
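[For context: the two checks Peter refers to sit at the top of
queued_spin_lock_slowpath(). Roughly, as a paraphrased sketch rather
than the literal kernel/locking/qspinlock.c source (the label name is
illustrative):

	if (pv_enabled())		/* pv spinlock hooks compiled in and active */
		goto queue;		/* pv-aware slowpath (halt/kick) */

	if (virt_spin_lock(lock))	/* on a hypervisor without pv hooks: TAS fallback */
		return;

	/* otherwise: the native, fair MCS-queue slowpath */

Forcing the native qspinlock in the 1:1 case therefore means making
both of those early exits come back false.]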
On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
> > So the problem with qspinlock is that it will revert to a classic
> > test-and-set spinlock if you don't do paravirt but are running on a HV.
>
> In the Xen case we just use the bare metal settings when xen_nopvspin
> has been specified. So paravirt, but without modifying any pv_lock_ops
> functions.

See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.

And as the comment there says, this is a fallback for !paravirt enabled
hypervisors to avoid the worst of the lock holder preemption crud.

But this very much does not deal with the 1:1 case nicely.
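[For readers following along, the fallback Peter points at looked
roughly like this at the time (paraphrased from
arch/x86/include/asm/qspinlock.h; treat the details as approximate, not
a verbatim quote of the file):

static inline bool virt_spin_lock(struct qspinlock *lock)
{
	if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
		return false;

	/*
	 * On hypervisors without PARAVIRT_SPINLOCKS support we fall
	 * back to a Test-and-Set spinlock, because fair locks have
	 * horrible lock 'holder' preemption issues.
	 */
	do {
		while (atomic_read(&lock->val) != 0)
			cpu_relax();
	} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);

	return true;
}

So with X86_FEATURE_HYPERVISOR set and no pv spinlock hooks installed,
every contended lock in the guest spins on this unfair test-and-set
loop.]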
On 05/09/17 10:10, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
>>> So the problem with qspinlock is that it will revert to a classic
>>> test-and-set spinlock if you don't do paravirt but are running on a HV.
>>
>> In the Xen case we just use the bare metal settings when xen_nopvspin
>> has been specified. So paravirt, but without modifying any pv_lock_ops
>> functions.
>
> See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
> X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.
>
> And as the comment there says, this is a fallback for !paravirt enabled
> hypervisors to avoid the worst of the lock holder preemption crud.
>
> But this very much does not deal with the 1:1 case nicely.

Aah, now I've got it.

So maybe we should add virt_spin_lock() to pv_lock_ops? This way e.g.
xen_nopvspin could tweak just the virt_spin_lock() case by letting it
return false all the time?

In case you agree I can set up a patch...

Juergen
On Tue, Sep 05, 2017 at 10:14:21AM +0200, Juergen Gross wrote:
> On 05/09/17 10:10, Peter Zijlstra wrote:
> > On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
> >>> So the problem with qspinlock is that it will revert to a classic
> >>> test-and-set spinlock if you don't do paravirt but are running on a HV.
> >>
> >> In the Xen case we just use the bare metal settings when xen_nopvspin
> >> has been specified. So paravirt, but without modifying any pv_lock_ops
> >> functions.
> >
> > See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
> > X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.
> >
> > And as the comment there says, this is a fallback for !paravirt enabled
> > hypervisors to avoid the worst of the lock holder preemption crud.
> >
> > But this very much does not deal with the 1:1 case nicely.
>
> Aah, now I've got it.
>
> So maybe we should add virt_spin_lock() to pv_lock_ops? This way e.g.
> xen_nopvspin could tweak just the virt_spin_lock() case by letting it
> return false all the time?

Hmm, that might work. Could we somehow nop that call when
!X86_FEATURE_HYPERVISOR? That saves native from having to do the call
and would be a win for everyone.
On 05/09/17 10:28, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 10:14:21AM +0200, Juergen Gross wrote:
>> On 05/09/17 10:10, Peter Zijlstra wrote:
>>> On Tue, Sep 05, 2017 at 09:35:40AM +0200, Juergen Gross wrote:
>>>>> So the problem with qspinlock is that it will revert to a classic
>>>>> test-and-set spinlock if you don't do paravirt but are running on a HV.
>>>>
>>>> In the Xen case we just use the bare metal settings when xen_nopvspin
>>>> has been specified. So paravirt, but without modifying any pv_lock_ops
>>>> functions.
>>>
>>> See arch/x86/include/asm/qspinlock.h:virt_spin_lock(). Unless you clear
>>> X86_FEATURE_HYPERVISOR you get a test-and-set spinlock.
>>>
>>> And as the comment there says, this is a fallback for !paravirt enabled
>>> hypervisors to avoid the worst of the lock holder preemption crud.
>>>
>>> But this very much does not deal with the 1:1 case nicely.
>>
>> Aah, now I've got it.
>>
>> So maybe we should add virt_spin_lock() to pv_lock_ops? This way e.g.
>> xen_nopvspin could tweak just the virt_spin_lock() case by letting it
>> return false all the time?
>
> Hmm, that might work. Could we somehow nop that call when
> !X86_FEATURE_HYPERVISOR? That saves native from having to do the call
> and would be a win for everyone.

So in fact we want an "always false" shortcut for bare metal and for
any virtualization environment selecting bare metal behavior.

I'll have a try.

Juergen
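[To make the proposal concrete, a minimal sketch of what such a hook
could look like. This is purely illustrative; the member and function
bodies below are made up for this sketch and are not taken from an
actual patch:

/* New pv_lock_ops member (sketch): */
struct pv_lock_ops {
	/* ... existing members: slowpath, unlock, wait, kick ... */
	bool (*virt_spin_lock)(struct qspinlock *lock);
};

/* Bare metal behavior: never take the test-and-set fallback, so the
 * native queued slowpath is used. */
static bool native_virt_spin_lock(struct qspinlock *lock)
{
	return false;
}

/* What Xen could then do for xen_nopvspin: keep every other member at
 * its native value and only neutralize the TAS fallback. */
static void __init xen_init_spinlocks(void)
{
	if (xen_nopvspin) {
		pv_lock_ops.virt_spin_lock = native_virt_spin_lock;
		return;
	}
	/* ... otherwise install the full pv spinlock hooks ... */
}

virt_spin_lock() in qspinlock.h would then call through pv_lock_ops
instead of open-coding the X86_FEATURE_HYPERVISOR check, which is what
would give a hook point for Peter's "nop it on bare metal" idea.]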
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d9c171ce4190..56c6e3acdf8e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1899,6 +1899,10 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	kvm_nopvspin	[X86,KVM]
+			Disables the paravirtualized spinlock slowpath
+			optimizations for KVM.
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
@@ -4533,7 +4537,7 @@
 			never  -- do not unplug even if version check succeeds
 
 	xen_nopvspin	[X86,XEN]
-			Disables the ticketlock slowpath using Xen PV
+			Disables the spinlock slowpath using Xen PV
 			optimizations.
 
 	xen_nopv	[X86]
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d04e30e3c0ff..51addf874fc1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -568,6 +568,18 @@ static void kvm_kick_cpu(int cpu)
 	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
 }
 
+static bool kvm_pvspin = true;
+
+/*
+ * Allow disabling of PV spinlock in kernel command line
+ */
+static __init int kvm_parse_nopvspin(char *arg)
+{
+	kvm_pvspin = false;
+	return 0;
+}
+early_param("kvm_nopvspin", kvm_parse_nopvspin);
+
 #include <asm/qspinlock.h>
 
 static void kvm_wait(u8 *ptr, u8 val)
@@ -633,7 +645,7 @@ asm(
  */
 void __init kvm_spinlock_init(void)
 {
-	if (!kvm_para_available())
+	if (!kvm_para_available() || !kvm_pvspin)
 		return;
 	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
 	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
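[Usage note: the knob is consumed via early_param(), so with a patch
like the above applied, disabling the pv spinlock slowpath in a guest
is just a matter of appending the parameter to the guest's kernel
command line, for example (illustrative boot entry; kernel path and
root device are made up):

	linux /boot/vmlinuz root=/dev/vda1 ro kvm_nopvspin
]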