| Message ID | 1564643196-7797-1-git-send-email-wanpengli@tencent.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available |
On 8/1/2019 9:06 AM, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
>
> The downside of guest-side polling is that polling is performed even
> when there are other runnable tasks in the host. Moreover, even if
> polling in KVM could be made aware of other runnable tasks on the same
> pCPU, it would still incur extra overhead in an over-subscribed
> scenario. So only enable guest polling when dedicated pCPUs are
> available.
>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>

Paolo, Marcelo, any comments?

> ---
>  drivers/cpuidle/cpuidle-haltpoll.c   | 3 ++-
>  drivers/cpuidle/governors/haltpoll.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
> index 9ac093d..7aee38a 100644
> --- a/drivers/cpuidle/cpuidle-haltpoll.c
> +++ b/drivers/cpuidle/cpuidle-haltpoll.c
> @@ -53,7 +53,8 @@ static int __init haltpoll_init(void)
>
>  	cpuidle_poll_state_init(drv);
>
> -	if (!kvm_para_available())
> +	if (!kvm_para_available() ||
> +	    !kvm_para_has_hint(KVM_HINTS_REALTIME))
>  		return 0;
>
>  	ret = cpuidle_register(&haltpoll_driver, NULL);
> diff --git a/drivers/cpuidle/governors/haltpoll.c b/drivers/cpuidle/governors/haltpoll.c
> index 797477b..685c7007 100644
> --- a/drivers/cpuidle/governors/haltpoll.c
> +++ b/drivers/cpuidle/governors/haltpoll.c
> @@ -141,7 +141,7 @@ static struct cpuidle_governor haltpoll_governor = {
>
>  static int __init init_haltpoll(void)
>  {
> -	if (kvm_para_available())
> +	if (kvm_para_available() && kvm_para_has_hint(KVM_HINTS_REALTIME))
>  		return cpuidle_register_governor(&haltpoll_governor);
>
>  	return 0;
On 01/08/19 18:51, Rafael J. Wysocki wrote:
> On 8/1/2019 9:06 AM, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpengli@tencent.com>
>>
>> [...]
>>
>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>
> Paolo, Marcelo, any comments?

Yes, it's a good idea.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
On Thu, Aug 01, 2019 at 06:54:49PM +0200, Paolo Bonzini wrote:
> On 01/08/19 18:51, Rafael J. Wysocki wrote:
> > [...]
> >
> > Paolo, Marcelo, any comments?
>
> Yes, it's a good idea.
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Paolo

I think KVM_HINTS_REALTIME is being abused somewhat. It has no clear
meaning and is used in different locations for different purposes.

For example, I think that using pv queued spinlocks together with
haltpoll is a desirable scenario, which the patch below disallows.

Wanpeng Li, currently the driver does not autoload, so polling in the
guest has to be enabled manually. Isn't that sufficient?
On Sun, 4 Aug 2019 at 04:21, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> [...]

Hi Marcelo,

Sorry for the late response.

> I think KVM_HINTS_REALTIME is being abused somewhat.
> It has no clear meaning and used in different locations
> for different purposes.

The documentation defines it as:

    KVM_HINTS_REALTIME (bit 0): the guest checks this feature bit to
    determine that vCPUs are never preempted for an unlimited time,
    allowing optimizations.

Today it disables pv queued spinlocks, pv TLB shootdown, and pv sched
yield, none of which are expected to be present in the "vCPUs are never
preempted for an unlimited time" scenario.

> For example, i think that using pv queued spinlocks and
> haltpoll is a desired scenario, which the patch below disallows.

Even if dedicated pCPUs are available, pv queued spinlocks should still
be chosen if something like vhost kernel threads is used instead of
DPDK/vhost-user. KVM's adaptive halt-polling will compete with the
vhost kthreads; worse, polling in the guest is unaware of other
runnable tasks on the host, which will defeat the vhost kthreads.

Regards,
Wanpeng Li
On 13/08/19 02:55, Wanpeng Li wrote:
>> I think KVM_HINTS_REALTIME is being abused somewhat.
>> It has no clear meaning and used in different locations
>> for different purposes.
>
> Now it disables pv queued spinlock, pv tlb shootdown, pv sched yield
> which are not expected present in vCPUs are never preempted for an
> unlimited time scenario.

Guest-side polling definitely matches the purpose of
KVM_HINTS_REALTIME. While host-side polling is conditional on
single_task_running(), this is obviously not true of guest-side
polling. The alternative would be to enable it only if
KVM_FEATURE_POLL_CONTROL is available, but I prefer Wanpeng's patch.

Paolo

>> For example, i think that using pv queued spinlocks and
>> haltpoll is a desired scenario, which the patch below disallows.
>
> [...]
On Tue, Aug 13, 2019 at 08:55:29AM +0800, Wanpeng Li wrote:
> On Sun, 4 Aug 2019 at 04:21, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > [...]
>
> ================== ============ =================================
> KVM_HINTS_REALTIME 0            guest checks this feature bit to
>                                 determine that vCPUs are never
>                                 preempted for an unlimited time,
>                                 allowing optimizations
> ================== ============ =================================

Does "unlimited time" mean infinite time, or does it mean 10s? 1s?

The previous definition was much better IMO: HINTS_DEDICATED.

> Now it disables pv queued spinlock,

OK.

> pv tlb shootdown,

OK.

> pv sched yield

"The idea is from Xen: when sending a call-function IPI-many to vCPUs,
yield if any of the IPI target vCPUs was preempted. A 17% performance
increase on the ebizzy benchmark can be observed in an over-subscribed
environment. (w/ kvm-pv-tlb disabled, testing TLB flush call-function
IPI-many since call-function is not easy to be triggered by userspace
workload)."

This can probably hurt if vCPUs are rarely preempted.

> which are not expected present in vCPUs are never preempted for an
> unlimited time scenario.
>
> So even if dedicated pCPU is available, pv queued spinlocks should
> still be chose if something like vhost-kthreads are used instead of
> DPDK/vhost-user.

Can't you enable the individual features you need for optimizing the
overcommitted case? This is how things have been done historically: if
a new feature is available, you enable it to get the desired
performance: x2apic, invariant TSC, cpuidle-haltpoll...

So in your case: enable pv sched yield, enable pv TLB shootdown.

> kvm adaptive halt-polling will compete with
> vhost-kthreads, however, poll in guest unaware other runnable tasks in
> the host which will defeat vhost-kthreads.

It depends on how much work the vhost kthreads need to do, how
successful halt-polling in the guest is, and what improvement
halt-polling brings. The amount of polling will be reduced to zero if
polling is not successful.
Cc Michael S. Tsirkin.

On Tue, 27 Aug 2019 at 04:42, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> [...]
>
> Unlimited time means infinite time, or unlimited time means
> 10s ? 1s ?

The former, I think. There is a discussion here:
https://lkml.org/lkml/2018/5/17/612

> The previous definition was much better IMO: HINTS_DEDICATED.
>
> [...]
>
> > pv sched yield
>
> [...]
>
> This can probably hurt if vcpus are rarely preempted.

That's why I add the KVM_HINTS_REALTIME check here.

> Can't you enable the individual features you need for optimizing
> the overcommitted case? [...]
>
> So in your case: enable pv schedyield, enable pv tlb shootdown.

Both of them optimize IPIs: pv sched yield for call-function
interrupts, and pv TLB shootdown for TLB invalidation. So they are
still different from haltpoll.

Our latest testing on an 80-pCPU host with three 80-vCPU VMs gives even
better numbers than the 64-pCPU host I used when posting the patches:

    ebizzy -M    vanilla    optimized    boost
    1 VM         31234      34489        10%
    2 VMs        5380       26664        396%
    3 VMs        2967       23140        679%

> > kvm adaptive halt-polling will compete with
> > vhost-kthreads, however, poll in guest unaware other runnable tasks in
> > the host which will defeat vhost-kthreads.
>
> It depends on how much work vhost-kthreads needs to do, how successful
> halt-poll in the guest is, and what improvement halt-polling brings.
> The amount of polling will be reduced to zero if polling
> is not successful.

We observe vhost kthreads competing with vCPU adaptive halt-polling in
KVM; it hurts performance in over-subscribed production environments,
and polling in the guest can make it worse.

Regards,
Wanpeng Li
On Tue, 27 Aug 2019 at 08:43, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> [...]

Hi Marcelo,

If you don't have any more concerns, I guess Rafael can apply this
patch now, since the merge window is not too far away.

Regards,
Wanpeng Li
On Wed, Aug 28, 2019 at 10:34 AM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> [...]
>
> If you don't have more concern, I guess Rafael can apply this patch
> now since the merge window is not too far.

I will likely queue it up later today and it will go to linux-next
early next week.

Thanks!
On Wed, 28 Aug 2019 at 16:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> [...]
>
> I will likely queue it up later today and it will go to linux-next
> early next week.

Thank you, Rafael.

Regards,
Wanpeng Li
On Wed, Aug 28, 2019 at 10:45:44AM +0200, Rafael J. Wysocki wrote:
> [...]
>
> I will likely queue it up later today and it will go to linux-next
> early next week.
>
> Thanks!

NACK patch.

Just don't load the haltpoll driver.
On Tue, Aug 27, 2019 at 08:43:13AM +0800, Wanpeng Li wrote:
> > > kvm adaptive halt-polling will compete with
> > > vhost-kthreads, however, poll in guest unaware other runnable tasks in
> > > the host which will defeat vhost-kthreads.
> >
> > It depends on how much work vhost-kthreads needs to do, how successful
> > halt-poll in the guest is, and what improvement halt-polling brings.
> > The amount of polling will be reduced to zero if polling
> > is not successful.
>
> We observe vhost-kthreads compete with vCPUs adaptive halt-polling in
> kvm, it hurt performance in over-subscribe product environment,
> polling in guest can make it worse.

Wanpeng,

Polling should not be performed if there is other work to do. For
example, halt-polling could check a host/guest shared memory region
indicating whether there are other runnable tasks on the host.

Disabling polling means you will not achieve the improvement even in
the transitional periods when the system is not overcommitted (which
should be frequent, given that idling is common).

Again, about your patch: it brings no benefit to anyone. Guest halt
polling should already be disabled by default (the driver has to be
loaded for guest polling to take place).
On Wed, Aug 28, 2019 at 11:48:58AM -0300, Marcelo Tosatti wrote:
> [...]
>
> Guest halt polling should be already disabled by default
> (the driver has to be loaded for guest polling to take place).

The most efficient solution would be to mwait on a memory region that
both host and guest would write to: no CPU cycles burned, full
efficiency. However, both host and guest would have to write to this
region, which brings security concerns.
On Wed, Aug 28, 2019 at 4:39 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> [...]
>
> NACK patch.

I got an ACK from Paolo on it, though. Convince Paolo to withdraw his
ACK if you want it to not be applied.

> Just don't load the haltpoll driver.

And why would that be better?
On Thu, Aug 29, 2019 at 01:37:35AM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
> > Just don't load the haltpoll driver.
>
> And why would that be better?

Split the group of all KVM users in two: the overcommit group and the
non-overcommit group.

Current situation regarding the haltpoll driver:

Overcommit group: the haltpoll driver is not loaded by default; they
are happy.

Non-overcommit group: boots without the "realtime hints" flag, loads
the haltpoll driver; happy.

Situation with the patch above:

Overcommit group: the haltpoll driver is not loaded by default; they
are happy.

Non-overcommit group: boots without the "realtime hints" flag; the
haltpoll driver cannot be loaded.
On Thu, 29 Aug 2019 at 20:04, Marcelo Tosatti <mtosatti@redhat.com> wrote: > > On Thu, Aug 29, 2019 at 01:37:35AM +0200, Rafael J. Wysocki wrote: > > On Wed, Aug 28, 2019 at 4:39 PM Marcelo Tosatti <mtosatti@redhat.com> wrote: > > > > > > On Wed, Aug 28, 2019 at 10:45:44AM +0200, Rafael J. Wysocki wrote: > > > > On Wed, Aug 28, 2019 at 10:34 AM Wanpeng Li <kernellwp@gmail.com> wrote: > > > > > > > > > > On Tue, 27 Aug 2019 at 08:43, Wanpeng Li <kernellwp@gmail.com> wrote: > > > > > > > > > > > > Cc Michael S. Tsirkin, > > > > > > On Tue, 27 Aug 2019 at 04:42, Marcelo Tosatti <mtosatti@redhat.com> wrote: > > > > > > > > > > > > > > On Tue, Aug 13, 2019 at 08:55:29AM +0800, Wanpeng Li wrote: > > > > > > > > On Sun, 4 Aug 2019 at 04:21, Marcelo Tosatti <mtosatti@redhat.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Aug 01, 2019 at 06:54:49PM +0200, Paolo Bonzini wrote: > > > > > > > > > > On 01/08/19 18:51, Rafael J. Wysocki wrote: > > > > > > > > > > > On 8/1/2019 9:06 AM, Wanpeng Li wrote: > > > > > > > > > > >> From: Wanpeng Li <wanpengli@tencent.com> > > > > > > > > > > >> > > > > > > > > > > >> The downside of guest side polling is that polling is performed even > > > > > > > > > > >> with other runnable tasks in the host. However, even if poll in kvm > > > > > > > > > > >> can aware whether or not other runnable tasks in the same pCPU, it > > > > > > > > > > >> can still incur extra overhead in over-subscribe scenario. Now we can > > > > > > > > > > >> just enable guest polling when dedicated pCPUs are available. > > > > > > > > > > >> > > > > > > > > > > >> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > > > >> Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > > > >> Cc: Radim Krčmář <rkrcmar@redhat.com> > > > > > > > > > > >> Cc: Marcelo Tosatti <mtosatti@redhat.com> > > > > > > > > > > >> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> > > > > > > > > > > > > > > > > > > > > > > Paolo, Marcelo, any comments? 
> > > > > > > > > >
> > > > > > > > > > Yes, it's a good idea.
> > > > > > > > > >
> > > > > > > > > > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > >
> > > > > Hi Marcelo,
> > > > >
> > > > > If you don't have more concern, I guess Rafael can apply this patch
> > > > > now since the merge window is not too far.
> > > >
> > > > I will likely queue it up later today and it will go to linux-next
> > > > early next week.
> > > >
> > > > Thanks!
> > >
> > > NACK patch.
> >
> > I got an ACK from Paolo on it, though. Convince Paolo to withdraw his
> > ACK if you want it to not be applied.
> >
> > > Just don't load the haltpoll driver.
> >
> > And why would that be better?
>
> Split the group of all kvm users in two: overcommit group and non-overcommit
> group.
>
> Current situation regarding haltpoll driver is:
>
> overcommit group: haltpoll driver is not loaded by default, they are
> happy.
>
> non overcommit group: boots without "realtime hints" flag, loads haltpoll driver,
> happy.
>
> Situation with patch above:
>
> overcommit group: haltpoll driver is not loaded by default, they are
> happy.
>
> non overcommit group: boots without "realtime hints" flag, haltpoll driver
> cannot be loaded.

non overcommit group, if they don't care latency/performance, they
don't need to enable haltpoll, "realtime hints" etc. Otherwise, they
should better tune.

Regards,
Wanpeng Li
On Thu, Aug 29, 2019 at 08:16:41PM +0800, Wanpeng Li wrote:
> > Current situation regarding haltpoll driver is:
> >
> > overcommit group: haltpoll driver is not loaded by default, they are
> > happy.
> >
> > non overcommit group: boots without "realtime hints" flag, loads haltpoll driver,
> > happy.
> >
> > Situation with patch above:
> >
> > overcommit group: haltpoll driver is not loaded by default, they are
> > happy.
> >
> > non overcommit group: boots without "realtime hints" flag, haltpoll driver
> > cannot be loaded.
>
> non overcommit group, if they don't care latency/performance, they
> don't need to enable haltpoll, "realtime hints" etc. Otherwise, they
> should better tune.

As mentioned before, "being overcommitted" is a property which is transitional.

A static true/false scheme reflects this poorly.

Therefore the OS should detect it and act accordingly.
On Thu, Aug 29, 2019 at 09:53:04AM -0300, Marcelo Tosatti wrote:
> On Thu, Aug 29, 2019 at 08:16:41PM +0800, Wanpeng Li wrote:
> > > Current situation regarding haltpoll driver is:
> > >
> > > overcommit group: haltpoll driver is not loaded by default, they are
> > > happy.
> > >
> > > non overcommit group: boots without "realtime hints" flag, loads haltpoll driver,
> > > happy.
> > >
> > > Situation with patch above:
> > >
> > > overcommit group: haltpoll driver is not loaded by default, they are
> > > happy.
> > >
> > > non overcommit group: boots without "realtime hints" flag, haltpoll driver
> > > cannot be loaded.
> >
> > non overcommit group, if they don't care latency/performance, they
> > don't need to enable haltpoll, "realtime hints" etc. Otherwise, they
> > should better tune.
>
> As mentioned before, "being overcommitted" is a property which is transitional.
>
> A static true/false scheme reflects this poorly.
>
> Therefore the OS should detect it and act accordingly.

Hi Wanpeng Li,

One suggestion for a dynamic "is overcommitted" scheme:

If the amount of stolen time, in the past record_steal_time window, is
more than 20% of the time in that window, then mark the system as
overcommitted. Otherwise, clear it.

Make that 20% configurable as a kvm module parameter.

Use that info to enable/disable overcommit features.

That should work, right?
diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index 9ac093d..7aee38a 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -53,7 +53,8 @@ static int __init haltpoll_init(void)
 
 	cpuidle_poll_state_init(drv);
 
-	if (!kvm_para_available())
+	if (!kvm_para_available() ||
+	    !kvm_para_has_hint(KVM_HINTS_REALTIME))
 		return 0;
 
 	ret = cpuidle_register(&haltpoll_driver, NULL);
diff --git a/drivers/cpuidle/governors/haltpoll.c b/drivers/cpuidle/governors/haltpoll.c
index 797477b..685c7007 100644
--- a/drivers/cpuidle/governors/haltpoll.c
+++ b/drivers/cpuidle/governors/haltpoll.c
@@ -141,7 +141,7 @@ static struct cpuidle_governor haltpoll_governor = {
 
 static int __init init_haltpoll(void)
 {
-	if (kvm_para_available())
+	if (kvm_para_available() && kvm_para_has_hint(KVM_HINTS_REALTIME))
 		return cpuidle_register_governor(&haltpoll_governor);
 
 	return 0;