Message ID | 20210925005528.1145584-1-seanjc@google.com (mailing list archive) |
---|---|
Series | KVM: Halt-polling fixes, cleanups and a new stat |
While looking into this series, I realized that David's patch

  commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
  Author:     David Matlack <dmatlack@google.com>
  AuthorDate: Fri Apr 17 15:14:46 2020 -0700
  Commit:     Paolo Bonzini <pbonzini@redhat.com>
  CommitDate: Fri Apr 24 12:53:17 2020 -0400

      kvm: add capability for halt polling

broke the possibility for an admin to disable halt polling for already running KVM
guests. In the past, doing

  echo 0 > /sys/module/kvm/parameters/halt_poll_ns

stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value that
was active during startup - even those that do not use KVM_CAP_HALT_POLL.

I guess this was not intended?
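To make the regression concrete: the commit moved the read of the module parameter
from halt time to VM-creation time. Below is a minimal userspace model of that
difference (illustrative only; the struct and helper names are invented stand-ins,
not KVM code):

#include <stdio.h>

/* Stand-in for the kvm.halt_poll_ns module parameter. */
static unsigned int halt_poll_ns = 200000;

/* Stand-in for the per-VM field added by acd05785e48c. */
struct kvm { unsigned int max_halt_poll_ns; };

/* Before the commit: the limit is read from the module param at halt time. */
static unsigned int old_effective_max(void)
{
        return halt_poll_ns;
}

/* After the commit: the limit was snapshotted at VM creation. */
static unsigned int new_effective_max(struct kvm *kvm)
{
        return kvm->max_halt_poll_ns;
}

int main(void)
{
        /* Models kvm_create_vm() copying the param into the VM. */
        struct kvm vm = { .max_halt_poll_ns = halt_poll_ns };

        /* Admin runs: echo 0 > /sys/module/kvm/parameters/halt_poll_ns */
        halt_poll_ns = 0;

        /* Old behavior stops polling (0); new behavior keeps polling (200000). */
        printf("old=%u new=%u\n", old_effective_max(), new_effective_max(&vm));
        return 0;
}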
On Mon, Sep 27, 2021, Christian Borntraeger wrote:
> While looking into this series, I realized that David's patch
>
>   commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>   Author:     David Matlack <dmatlack@google.com>
>   AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>   Commit:     Paolo Bonzini <pbonzini@redhat.com>
>   CommitDate: Fri Apr 24 12:53:17 2020 -0400
>
>       kvm: add capability for halt polling
>
> broke the possibility for an admin to disable halt polling for already running KVM
> guests. In the past, doing
>
>   echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>
> stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value that
> was active during startup - even those that do not use KVM_CAP_HALT_POLL.
>
> I guess this was not intended?

Ouch. I would go so far as to say that halt_poll_ns should be a hard limit on the
capability. What about having the per-VM variable track only the capability, and
then use the module param to cap the max when doing adjustments? E.g. add a variant
of this early in the series?

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 80f78daa6b8d..f50e4e31a0cf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1078,8 +1078,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
                goto out_err_no_arch_destroy_vm;
        }
 
-       kvm->max_halt_poll_ns = halt_poll_ns;
-
        r = kvm_arch_init_vm(kvm, type);
        if (r)
                goto out_err_no_arch_destroy_vm;
@@ -3136,7 +3134,8 @@ void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
        sigemptyset(&current->real_blocked);
 }
 
-static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu,
+                              unsigned int max_halt_poll_ns)
 {
        unsigned int old, val, grow, grow_start;
 
@@ -3150,8 +3149,8 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
        if (val < grow_start)
                val = grow_start;
 
-       if (val > vcpu->kvm->max_halt_poll_ns)
-               val = vcpu->kvm->max_halt_poll_ns;
+       if (val > max_halt_poll_ns)
+               val = max_halt_poll_ns;
 
        vcpu->halt_poll_ns = val;
 out:
@@ -3261,6 +3260,7 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
        bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
        bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
+       unsigned int max_halt_poll_ns;
        ktime_t start, cur, poll_end;
        bool waited = false;
        u64 halt_ns;
@@ -3304,19 +3304,25 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
        update_halt_poll_stats(vcpu, start, poll_end, !waited);
 
        if (halt_poll_allowed) {
+               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
+               if (max_halt_poll_ns)
+                       max_halt_poll_ns = min(max_halt_poll_ns, halt_poll_ns);
+               else
+                       max_halt_poll_ns = halt_poll_ns;
+
                if (!vcpu_valid_wakeup(vcpu)) {
                        shrink_halt_poll_ns(vcpu);
-               } else if (vcpu->kvm->max_halt_poll_ns) {
+               } else if (max_halt_poll_ns) {
                        if (halt_ns <= vcpu->halt_poll_ns)
                                ;
                        /* we had a long block, shrink polling */
                        else if (vcpu->halt_poll_ns &&
-                                halt_ns > vcpu->kvm->max_halt_poll_ns)
+                                halt_ns > max_halt_poll_ns)
                                shrink_halt_poll_ns(vcpu);
                        /* we had a short halt and our poll time is too small */
-                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
-                                halt_ns < vcpu->kvm->max_halt_poll_ns)
-                               grow_halt_poll_ns(vcpu);
+                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
+                                halt_ns < max_halt_poll_ns)
+                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
                } else {
                        vcpu->halt_poll_ns = 0;
                }
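For the record, the effect of the clamping Sean proposes, as a small userspace model
(the helper name and the example values are invented for illustration):

#include <stdio.h>

#define min(a, b) ((a) < (b) ? (a) : (b))

/*
 * cap_max    - value set per VM via KVM_CAP_HALT_POLL (0 if never set)
 * module_max - the kvm.halt_poll_ns module parameter
 */
static unsigned int effective_max(unsigned int cap_max, unsigned int module_max)
{
        if (cap_max)
                return min(cap_max, module_max);
        return module_max;
}

int main(void)
{
        /* VM asked for 500us, admin lowered the param to 100us: param wins. */
        printf("%u\n", effective_max(500000, 100000));  /* 100000 */
        /* VM never used the capability: the param applies directly. */
        printf("%u\n", effective_max(0, 200000));       /* 200000 */
        /* echo 0 > halt_poll_ns disables polling even for capability users. */
        printf("%u\n", effective_max(500000, 0));       /* 0 */
        return 0;
}

The third case is what would restore Christian's workflow: writing 0 to the module
parameter disables polling even for VMs that set a per-VM maximum.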
On 27/09/21 16:59, Sean Christopherson wrote:
>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>> Author:     David Matlack <dmatlack@google.com>
>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>> Commit:     Paolo Bonzini <pbonzini@redhat.com>
>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
>>
>>     kvm: add capability for halt polling
>>
>> broke the possibility for an admin to disable halt polling for already running KVM
>> guests. In the past, doing
>>
>>   echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>>
>> stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value
>> that was active during startup - even those that do not use KVM_CAP_HALT_POLL.
>>
>> I guess this was not intended?

No, but...

> I would go so far as to say that halt_poll_ns should be a hard limit on
> the capability

... this would not be a good idea, I think. Anything that wants to do a lot of
polling can just do "for (;;)".

So I think there are two possibilities that make sense:

* track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that

* just make halt_poll_ns read-only.

Paolo
On Mon, Sep 27, 2021, Paolo Bonzini wrote:
> On 27/09/21 16:59, Sean Christopherson wrote:
> > > commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
> > > Author:     David Matlack <dmatlack@google.com>
> > > AuthorDate: Fri Apr 17 15:14:46 2020 -0700
> > > Commit:     Paolo Bonzini <pbonzini@redhat.com>
> > > CommitDate: Fri Apr 24 12:53:17 2020 -0400
> > >
> > >     kvm: add capability for halt polling
> > >
> > > broke the possibility for an admin to disable halt polling for already running
> > > KVM guests. In the past, doing
> > >
> > >   echo 0 > /sys/module/kvm/parameters/halt_poll_ns
> > >
> > > stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value
> > > that was active during startup - even those that do not use KVM_CAP_HALT_POLL.
> > >
> > > I guess this was not intended?
>
> No, but...
>
> > I would go so far as to say that halt_poll_ns should be a hard limit on
> > the capability
>
> ... this would not be a good idea, I think. Anything that wants to do a lot
> of polling can just do "for (;;)".

Hmm, true, there is no danger to the system in having the capability override the
module param.

> So I think there are two possibilities that make sense:
>
> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns
>   follow that

I think this option makes more sense; making halt_poll_ns read-only is basically
forcing users to switch to KVM_CAP_HALT_POLL.

> * just make halt_poll_ns read-only.
>
> Paolo
On 27.09.21 17:03, Paolo Bonzini wrote:
> On 27/09/21 16:59, Sean Christopherson wrote:
>>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>>> Author:     David Matlack <dmatlack@google.com>
>>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>>> Commit:     Paolo Bonzini <pbonzini@redhat.com>
>>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
>>>
>>>     kvm: add capability for halt polling
>>>
>>> broke the possibility for an admin to disable halt polling for already running
>>> KVM guests. In the past, doing
>>>
>>>   echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>>>
>>> stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value
>>> that was active during startup - even those that do not use KVM_CAP_HALT_POLL.
>>>
>>> I guess this was not intended?
>
> No, but...
>
>> I would go so far as to say that halt_poll_ns should be a hard limit on
>> the capability
>
> ... this would not be a good idea, I think. Anything that wants to do a lot of
> polling can just do "for (;;)".
>
> So I think there are two possibilities that make sense:
>
> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that

What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL, and
the private number for those that did?

> * just make halt_poll_ns read-only.
>
> Paolo
On Mon, Sep 27, 2021 at 8:17 AM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
> On 27.09.21 17:03, Paolo Bonzini wrote:
> > On 27/09/21 16:59, Sean Christopherson wrote:
> >>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
> >>> Author:     David Matlack <dmatlack@google.com>
> >>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
> >>> Commit:     Paolo Bonzini <pbonzini@redhat.com>
> >>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
> >>>
> >>>     kvm: add capability for halt polling
> >>>
> >>> broke the possibility for an admin to disable halt polling for already running
> >>> KVM guests. In the past, doing
> >>>
> >>>   echo 0 > /sys/module/kvm/parameters/halt_poll_ns
> >>>
> >>> stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value
> >>> that was active during startup - even those that do not use KVM_CAP_HALT_POLL.
> >>>
> >>> I guess this was not intended?
> >
> > No, but...
> >
> >> I would go so far as to say that halt_poll_ns should be a hard limit on
> >> the capability
> >
> > ... this would not be a good idea, I think. Anything that wants to do a lot of
> > polling can just do "for (;;)".

I agree. It would also be a maintenance burden and subtle "gotcha" to
have to increase halt_poll_ns anytime one wants to increase
KVM_CAP_HALT_POLL.

> >
> > So I think there are two possibilities that make sense:
> >
> > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>
> What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL, and
> the private number for those that did?

None of these options would cover Christian's original use-case though.
(Write to module to disable halt-polling system-wide.)

What about adding a writable "enable_halt_polling" module parameter
that affects all VMs? Once that is in place we could also consider
getting rid of halt_poll_ns entirely.

> >
> > * just make halt_poll_ns read-only.
> >
> > Paolo
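A sketch of what such a knob might look like (hypothetical: "enable_halt_polling" is
an invented name and nothing in this thread implements it; shown kernel-style to
match the surrounding diffs, so it is a fragment rather than standalone code):

/* Hypothetical global switch; writable at runtime thanks to mode 0644. */
static bool enable_halt_polling = true;
module_param(enable_halt_polling, bool, 0644);

/* In kvm_vcpu_halt(), gate all polling on the global switch: */
bool halt_poll_allowed = !kvm_arch_no_poll(vcpu) &&
                         READ_ONCE(enable_halt_polling);

With 0644 permissions, `echo 0 > /sys/module/kvm/parameters/enable_halt_polling`
would stop polling for every running VM, with or without KVM_CAP_HALT_POLL.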
On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> > So I think there are two possibilities that make sense:
> >
> > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>
> What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL, and
> the private number for those that did?

Yes, that's what I meant. David pointed out that doesn't allow you to
disable halt polling altogether, but for that you can always ask each
VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
don't know about Google's use case, but mine was actually more about
using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!)

Paolo
On Mon, Sep 27, 2021, Paolo Bonzini wrote:
> On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
> > > So I think there are two possibilities that make sense:
> > >
> > > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
> >
> > What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL,
> > and the private number for those that did?
>
> Yes, that's what I meant. David pointed out that doesn't allow you to
> disable halt polling altogether, but for that you can always ask each
> VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
> don't know about Google's use case, but mine was actually more about
> using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!)

I kinda like the idea of special-casing halt_poll_ns=0, e.g. for testing or
in-the-field mitigation if halt-polling is broken. It'd be trivial to support, e.g.

@@ -3304,19 +3304,23 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
        update_halt_poll_stats(vcpu, start, poll_end, !waited);
 
        if (halt_poll_allowed) {
+               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
+               if (!max_halt_poll_ns || !halt_poll_ns)    <------ squish the max if halt_poll_ns==0
+                       max_halt_poll_ns = halt_poll_ns;
+
                if (!vcpu_valid_wakeup(vcpu)) {
                        shrink_halt_poll_ns(vcpu);
-               } else if (vcpu->kvm->max_halt_poll_ns) {
+               } else if (max_halt_poll_ns) {
                        if (halt_ns <= vcpu->halt_poll_ns)
                                ;
                        /* we had a long block, shrink polling */
                        else if (vcpu->halt_poll_ns &&
-                                halt_ns > vcpu->kvm->max_halt_poll_ns)
+                                halt_ns > max_halt_poll_ns)
                                shrink_halt_poll_ns(vcpu);
                        /* we had a short halt and our poll time is too small */
-                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
-                                halt_ns < vcpu->kvm->max_halt_poll_ns)
-                               grow_halt_poll_ns(vcpu);
+                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
+                                halt_ns < max_halt_poll_ns)
+                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
                } else {
                        vcpu->halt_poll_ns = 0;
                }
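The semantic difference from the earlier min()-based variant is subtle, so a small
userspace model may help (helper names invented; "capped" models the first diff,
"squished" models this one):

#include <stdio.h>

/* Earlier proposal: the module param is a hard ceiling on the capability. */
static unsigned int capped(unsigned int cap_max, unsigned int module_max)
{
        if (cap_max)
                return cap_max < module_max ? cap_max : module_max;
        return module_max;
}

/* This proposal: the param only takes over when the VM set no cap, or
 * when the admin wrote 0 (the "squish"). */
static unsigned int squished(unsigned int cap_max, unsigned int module_max)
{
        if (!cap_max || !module_max)
                return module_max;
        return cap_max;
}

int main(void)
{
        /* cap=500us, param=100us: the ceiling clamps, the squish does not. */
        printf("capped=%u squished=%u\n",
               capped(500000, 100000), squished(500000, 100000));
        /* param=0 disables polling in both schemes. */
        printf("capped=%u squished=%u\n",
               capped(500000, 0), squished(500000, 0));
        /* A VM that set cap=0 still polls when the param is nonzero in
         * both schemes - the concern Yanan raises later in the thread. */
        printf("capped=%u squished=%u\n",
               capped(0, 200000), squished(0, 200000));
        return 0;
}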
On 27.09.21 18:58, David Matlack wrote:
> On Mon, Sep 27, 2021 at 8:17 AM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>>
>> On 27.09.21 17:03, Paolo Bonzini wrote:
>>> On 27/09/21 16:59, Sean Christopherson wrote:
>>>>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>>>>> Author:     David Matlack <dmatlack@google.com>
>>>>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>>>>> Commit:     Paolo Bonzini <pbonzini@redhat.com>
>>>>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
>>>>>
>>>>>     kvm: add capability for halt polling
>>>>>
>>>>> broke the possibility for an admin to disable halt polling for already running
>>>>> KVM guests. In the past, doing
>>>>>
>>>>>   echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>>>>>
>>>>> stopped polling system-wide. Now all KVM guests will use the halt_poll_ns value
>>>>> that was active during startup - even those that do not use KVM_CAP_HALT_POLL.
>>>>>
>>>>> I guess this was not intended?
>>>
>>> No, but...
>>>
>>>> I would go so far as to say that halt_poll_ns should be a hard limit on
>>>> the capability
>>>
>>> ... this would not be a good idea, I think. Anything that wants to do a lot of
>>> polling can just do "for (;;)".
>
> I agree. It would also be a maintenance burden and subtle "gotcha" to
> have to increase halt_poll_ns anytime one wants to increase
> KVM_CAP_HALT_POLL.

I think the idea of the upper bound is not about preventing wasted CPUs but about
reconfiguring existing poll intervals on a global level. So I think this idea is a
bad one in itself, especially as the admin might not have access to the monitor of
the users' QEMUs.

>>>
>>> So I think there are two possibilities that make sense:
>>>
>>> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>>
>> What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL, and
>> the private number for those that did?
>
> None of these options would cover Christian's original use-case
> though. (Write to module to disable halt-polling system-wide.)
>
> What about adding a writable "enable_halt_polling" module parameter

That would then affect both classes, with and without KVM_CAP_HALT_POLL.

> that affects all VMs? Once that is in place we could also consider
> getting rid of halt_poll_ns entirely.

As far as I can tell QEMU does not yet use KVM_CAP_HALT_POLL, so having a system-wide
halt_poll_ns makes sense. And I think for all processes not using KVM_CAP_HALT_POLL
we should really follow what halt_poll_ns is NOW and not what it used to be.
Hi Sean, Paolo,

I recently also noticed the behavior change of the param halt_poll_ns. Now it loses
the ability to:
1) dynamically disable halt polling for all the running VMs by `echo 0 > /sys`
2) dynamically adjust the halt polling interval for all the running VMs by `echo * > /sys`

In our cases, we usually use the above two abilities, and KVM_CAP_HALT_POLL is not
used yet.

On 2021/9/28 1:33, Sean Christopherson wrote:
> On Mon, Sep 27, 2021, Paolo Bonzini wrote:
>> On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
>> <borntraeger@de.ibm.com> wrote:
>>>> So I think there are two possibilities that make sense:
>>>>
>>>> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>>> What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL,
>>> and the private number for those that did?
>> Yes, that's what I meant. David pointed out that doesn't allow you to
>> disable halt polling altogether, but for that you can always ask each
>> VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
>> don't know about Google's use case, but mine was actually more about
>> using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!)
> I kinda like the idea of special-casing halt_poll_ns=0, e.g. for testing or
> in-the-field mitigation if halt-polling is broken. It'd be trivial to support, e.g.

Do we have any plan to repost the diff as a fix? It would be very nice if this issue
could be solved.

Besides, I think we may need some documentation for users describing how halt_poll_ns
works with KVM_CAP_HALT_POLL, like "Documentation/virt/guest-halt-polling.rst".

> @@ -3304,19 +3304,23 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>         update_halt_poll_stats(vcpu, start, poll_end, !waited);
>
>         if (halt_poll_allowed) {
> +               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
> +               if (!max_halt_poll_ns || !halt_poll_ns)    <------ squish the max if halt_poll_ns==0
> +                       max_halt_poll_ns = halt_poll_ns;
> +

Does this mean that KVM_CAP_HALT_POLL will not be able to disable halt polling for a
VM individually when halt_poll_ns != 0?

>                 if (!vcpu_valid_wakeup(vcpu)) {
>                         shrink_halt_poll_ns(vcpu);
> -               } else if (vcpu->kvm->max_halt_poll_ns) {
> +               } else if (max_halt_poll_ns) {
>                         if (halt_ns <= vcpu->halt_poll_ns)
>                                 ;
>                         /* we had a long block, shrink polling */
>                         else if (vcpu->halt_poll_ns &&
> -                                halt_ns > vcpu->kvm->max_halt_poll_ns)
> +                                halt_ns > max_halt_poll_ns)
>                                 shrink_halt_poll_ns(vcpu);
>                         /* we had a short halt and our poll time is too small */
> -                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
> -                                halt_ns < vcpu->kvm->max_halt_poll_ns)
> -                               grow_halt_poll_ns(vcpu);
> +                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
> +                                halt_ns < max_halt_poll_ns)
> +                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
>                 } else {
>                         vcpu->halt_poll_ns = 0;
>                 }

Thanks,
Yanan
On Tue, Nov 15, 2022 at 11:28:56AM +0800, wangyanan (Y) wrote:
> Hi Sean, Paolo,
>
> I recently also noticed the behavior change of the param halt_poll_ns.
> Now it loses the ability to:
> 1) dynamically disable halt polling for all the running VMs
>    by `echo 0 > /sys`
> 2) dynamically adjust the halt polling interval for all the
>    running VMs by `echo * > /sys`
>
> In our cases, we usually use the above two abilities, and
> KVM_CAP_HALT_POLL is not used yet.

I think the right path forward is to make KVM_CAP_HALT_POLL a pure
override of halt_poll_ns, and restore the pre-existing behavior of
halt_poll_ns whenever KVM_CAP_HALT_POLL is not used. e.g. see the patch
below.

That will fix issues (1) and (2) above for any VM not using
KVM_CAP_HALT_POLL. If a VM is using KVM_CAP_HALT_POLL, it will ignore
all changes to halt_poll_ns. If we truly need a mechanism for admins to
disable halt-polling on VMs using KVM_CAP_HALT_POLL, we can introduce a
separate module parameter for that. But IMO, any setup that is
sophisticated enough to use KVM_CAP_HALT_POLL should also be able to use
KVM_CAP_HALT_POLL to disable halt polling.

If everyone is happy with this approach I can test and send a real patch
to the mailing list.

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e6e66c5e56f2..253ad055b6ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -788,6 +788,7 @@ struct kvm {
        struct srcu_struct srcu;
        struct srcu_struct irq_srcu;
        pid_t userspace_pid;
+       bool override_halt_poll_ns;
        unsigned int max_halt_poll_ns;
        u32 dirty_ring_size;
        bool vm_bugged;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 43bbe4fde078..479d0d0da0b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1198,8 +1198,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
                goto out_err_no_arch_destroy_vm;
        }
 
-       kvm->max_halt_poll_ns = halt_poll_ns;
-
        r = kvm_arch_init_vm(kvm, type);
        if (r)
                goto out_err_no_arch_destroy_vm;
@@ -3371,7 +3369,7 @@ void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
        sigemptyset(&current->real_blocked);
 }
 
-static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu, unsigned int max)
 {
        unsigned int old, val, grow, grow_start;
 
@@ -3385,8 +3383,8 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
        if (val < grow_start)
                val = grow_start;
 
-       if (val > vcpu->kvm->max_halt_poll_ns)
-               val = vcpu->kvm->max_halt_poll_ns;
+       if (val > max)
+               val = max;
 
        vcpu->halt_poll_ns = val;
 out:
@@ -3501,10 +3499,17 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
        bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
        bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
+       unsigned int max_halt_poll_ns;
        ktime_t start, cur, poll_end;
+       struct kvm *kvm = vcpu->kvm;
        bool waited = false;
        u64 halt_ns;
 
+       if (kvm->override_halt_poll_ns)
+               max_halt_poll_ns = kvm->max_halt_poll_ns;
+       else
+               max_halt_poll_ns = READ_ONCE(halt_poll_ns);
+
        start = cur = poll_end = ktime_get();
        if (do_halt_poll) {
                ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
@@ -3545,17 +3550,16 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
        if (halt_poll_allowed) {
                if (!vcpu_valid_wakeup(vcpu)) {
                        shrink_halt_poll_ns(vcpu);
-               } else if (vcpu->kvm->max_halt_poll_ns) {
+               } else if (max_halt_poll_ns) {
                        if (halt_ns <= vcpu->halt_poll_ns)
                                ;
                        /* we had a long block, shrink polling */
-                       else if (vcpu->halt_poll_ns &&
-                                halt_ns > vcpu->kvm->max_halt_poll_ns)
+                       else if (vcpu->halt_poll_ns && halt_ns > max_halt_poll_ns)
                                shrink_halt_poll_ns(vcpu);
                        /* we had a short halt and our poll time is too small */
-                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
-                                halt_ns < vcpu->kvm->max_halt_poll_ns)
-                               grow_halt_poll_ns(vcpu);
+                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
+                                halt_ns < max_halt_poll_ns)
+                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
                } else {
                        vcpu->halt_poll_ns = 0;
                }
@@ -4588,6 +4592,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                if (cap->flags || cap->args[0] != (unsigned int)cap->args[0])
                        return -EINVAL;
 
+               kvm->override_halt_poll_ns = true;
                kvm->max_halt_poll_ns = cap->args[0];
                return 0;
        }
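For completeness, the userspace half of this contract: a VMM opts a VM into per-VM
control by enabling the capability on the VM fd. A minimal sketch, assuming an
already-created VM fd (the helper name is invented; KVM_ENABLE_CAP and
struct kvm_enable_cap are the real UAPI):

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Enable KVM_CAP_HALT_POLL on a VM: args[0] is the per-VM max poll time
 * in ns, and 0 disables halt polling for this VM. Under David's patch,
 * the VM then ignores the kvm.halt_poll_ns module parameter entirely. */
static int set_max_halt_poll_ns(int vm_fd, unsigned long long max_ns)
{
        struct kvm_enable_cap cap;

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_HALT_POLL;
        cap.args[0] = max_ns;

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

For example, set_max_halt_poll_ns(vm_fd, 0) is Paolo's disable-polling-per-VM use
case, while never calling it leaves the VM tracking halt_poll_ns.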
On 2022/11/17 1:19, David Matlack wrote:
> On Tue, Nov 15, 2022 at 11:28:56AM +0800, wangyanan (Y) wrote:
>> Hi Sean, Paolo,
>>
>> I recently also noticed the behavior change of the param halt_poll_ns.
>> Now it loses the ability to:
>> 1) dynamically disable halt polling for all the running VMs
>>    by `echo 0 > /sys`
>> 2) dynamically adjust the halt polling interval for all the
>>    running VMs by `echo * > /sys`
>>
>> In our cases, we usually use the above two abilities, and
>> KVM_CAP_HALT_POLL is not used yet.
> I think the right path forward is to make KVM_CAP_HALT_POLL a pure
> override of halt_poll_ns, and restore the pre-existing behavior of
> halt_poll_ns whenever KVM_CAP_HALT_POLL is not used. e.g. see the patch
> below.

Agree with this. kvm.halt_poll_ns serves as a legacy method to control halt polling
globally. Once KVM_CAP_HALT_POLL is used for a VM, it should take 100% responsibility
for controlling the VM, including disabling polling. This strategy helps to keep the
two mechanisms decoupled.

> That will fix issues (1) and (2) above for any VM not using
> KVM_CAP_HALT_POLL. If a VM is using KVM_CAP_HALT_POLL, it will ignore
> all changes to halt_poll_ns. If we truly need a mechanism for admins to
> disable halt-polling on VMs using KVM_CAP_HALT_POLL, we can introduce a
> separate module parameter for that. But IMO, any setup that is
> sophisticated enough to use KVM_CAP_HALT_POLL should also be able to use
> KVM_CAP_HALT_POLL to disable halt polling.
>
> If everyone is happy with this approach I can test and send a real patch
> to the mailing list.
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..253ad055b6ad 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -788,6 +788,7 @@ struct kvm {
>         struct srcu_struct srcu;
>         struct srcu_struct irq_srcu;
>         pid_t userspace_pid;
> +       bool override_halt_poll_ns;
>         unsigned int max_halt_poll_ns;
>         u32 dirty_ring_size;
>         bool vm_bugged;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..479d0d0da0b5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1198,8 +1198,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>                 goto out_err_no_arch_destroy_vm;
>         }
>
> -       kvm->max_halt_poll_ns = halt_poll_ns;
> -
>         r = kvm_arch_init_vm(kvm, type);
>         if (r)
>                 goto out_err_no_arch_destroy_vm;
> @@ -3371,7 +3369,7 @@ void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
>         sigemptyset(&current->real_blocked);
>  }
>
> -static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
> +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu, unsigned int max)
>  {
>         unsigned int old, val, grow, grow_start;
>
> @@ -3385,8 +3383,8 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
>         if (val < grow_start)
>                 val = grow_start;
>
> -       if (val > vcpu->kvm->max_halt_poll_ns)
> -               val = vcpu->kvm->max_halt_poll_ns;
> +       if (val > max)
> +               val = max;
>
>         vcpu->halt_poll_ns = val;
>  out:
> @@ -3501,10 +3499,17 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>         bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
>         bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
> +       unsigned int max_halt_poll_ns;
>         ktime_t start, cur, poll_end;
> +       struct kvm *kvm = vcpu->kvm;
>         bool waited = false;
>         u64 halt_ns;
>
> +       if (kvm->override_halt_poll_ns)
> +               max_halt_poll_ns = kvm->max_halt_poll_ns;
> +       else
> +               max_halt_poll_ns = READ_ONCE(halt_poll_ns);
> +
>         start = cur = poll_end = ktime_get();
>         if (do_halt_poll) {
>                 ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
> @@ -3545,17 +3550,16 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>         if (halt_poll_allowed) {
>                 if (!vcpu_valid_wakeup(vcpu)) {
>                         shrink_halt_poll_ns(vcpu);
> -               } else if (vcpu->kvm->max_halt_poll_ns) {
> +               } else if (max_halt_poll_ns) {
>                         if (halt_ns <= vcpu->halt_poll_ns)
>                                 ;
>                         /* we had a long block, shrink polling */
> -                       else if (vcpu->halt_poll_ns &&
> -                                halt_ns > vcpu->kvm->max_halt_poll_ns)
> +                       else if (vcpu->halt_poll_ns && halt_ns > max_halt_poll_ns)
>                                 shrink_halt_poll_ns(vcpu);
>                         /* we had a short halt and our poll time is too small */
> -                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
> -                                halt_ns < vcpu->kvm->max_halt_poll_ns)
> -                               grow_halt_poll_ns(vcpu);
> +                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
> +                                halt_ns < max_halt_poll_ns)
> +                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
>                 } else {
>                         vcpu->halt_poll_ns = 0;
>                 }
> @@ -4588,6 +4592,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>                 if (cap->flags || cap->args[0] != (unsigned int)cap->args[0])
>                         return -EINVAL;
>
> +               kvm->override_halt_poll_ns = true;
>                 kvm->max_halt_poll_ns = cap->args[0];
>                 return 0;
>         }

Looks sensible to me overall. I will look at the RFC series; thanks for your quick
response.

Yanan