Message ID | 20190416203248.29429-3-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: lapic: Fix a variety of timer adv issues | expand |
On 16/04/19 22:32, Sean Christopherson wrote: > + /* ndelay uses delay_tsc whenever the hardware has TSC, thus always. */ > + if (guest_tsc < tsc_deadline) { > + ns = (tsc_deadline - guest_tsc) * 1000000ULL; > + do_div(ns, vcpu->arch.virtual_tsc_khz); > + ndelay(min_t(u32, ns, lapic_timer_advance_ns)); > + } > This min_t would allow the timer to expire in advance of the indicated deadline. If this happens, wouldn't it be a bug elsewhere? Paolo
On Wed, Apr 17, 2019 at 02:56:53PM +0200, Paolo Bonzini wrote: > On 16/04/19 22:32, Sean Christopherson wrote: > > + /* ndelay uses delay_tsc whenever the hardware has TSC, thus always. */ > > + if (guest_tsc < tsc_deadline) { > > + ns = (tsc_deadline - guest_tsc) * 1000000ULL; > > + do_div(ns, vcpu->arch.virtual_tsc_khz); > > + ndelay(min_t(u32, ns, lapic_timer_advance_ns)); > > + } > > > > This min_t would allow the timer to expire in advance of the indicated > deadline. If this happens, wouldn't it be a bug elsewhere? Maybe? In this exact variation of the code, no, since userspace can modify lapic_timer_advance_ns while the vCPU is running. Obviously that case goes away when lapic_timer_advance_ns is no longer directly consumed by vCPUS. Originally, the check was added because lockup occurred if the guest's TSC was sent backwards[1]. The discussion surrounding the bugfix patch[2] is probably more interesting/amusing/relevant. We're basically rehashing that discussion. :-) Anyways, capping at a KVM controlled value is prudent since not doing so can hang the host if something does go wrong. As for the __delay() vs ndelay() discussion, I'll try to dig more into how/why I saw the timer fire with an incorrect TSC, i.e. why ndelay() "works" but __delay() does not. https://bugzilla.redhat.com/show_bug.cgi?id=1337667 https://patchwork.kernel.org/patch/9130687/
On 17/04/19 16:33, Sean Christopherson wrote: > In this exact variation of the code, no, since userspace can modify > lapic_timer_advance_ns while the vCPU is running. Obviously that case > goes away when lapic_timer_advance_ns is no longer directly consumed by > vCPUS. Indeed, and if we weren't about to add per-vCPU lapic_timer_advance_ns, the old suggestion of making lapic_timer_advance_ns read-only would stand. > Anyways, capping at a KVM controlled value is prudent since not doing > so can hang the host if something does go wrong. Yes---for the per-vCPU lapic_timer_advance_ns it is completely reasonable. Paolo
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 92446cba9b24..5891c0badfa6 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1502,10 +1502,12 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline); - /* __delay is delay_tsc whenever the hardware has TSC, thus always. */ - if (guest_tsc < tsc_deadline) - __delay(min(tsc_deadline - guest_tsc, - nsec_to_cycles(vcpu, lapic_timer_advance_ns))); + /* ndelay uses delay_tsc whenever the hardware has TSC, thus always. */ + if (guest_tsc < tsc_deadline) { + ns = (tsc_deadline - guest_tsc) * 1000000ULL; + do_div(ns, vcpu->arch.virtual_tsc_khz); + ndelay(min_t(u32, ns, lapic_timer_advance_ns)); + } if (!lapic_timer_advance_adjust_done) { /* too early */