diff mbox

[2/2,V2] KVM: LAPIC: cap __delay at lapic_timer_advance_ns

Message ID 20160621013345.GA27706@amt.cnet (mailing list archive)
State New, archived
Headers show

Commit Message

Marcelo Tosatti June 21, 2016, 1:33 a.m. UTC
The host timer which emulates the guest LAPIC TSC deadline
timer has its expiration diminished by lapic_timer_advance_ns
nanoseconds. Therefore if, at wait_lapic_expire, a difference
larger than lapic_timer_advance_ns is encountered, delay at most
lapic_timer_advance_ns.

This fixes a problem where the guest can cause the host
to delay for large amounts of time.

Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v2: s/max/min/ !

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alan Jenkins June 21, 2016, 12:13 p.m. UTC | #1
On 21/06/2016, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> The host timer which emulates the guest LAPIC TSC deadline
> timer has its expiration diminished by lapic_timer_advance_ns
> nanoseconds. Therefore if, at wait_lapic_expire, a difference
> larger than lapic_timer_advance_ns is encountered, delay at most
> lapic_timer_advance_ns.
>
> This fixes a problem where the guest can cause the host
> to delay for large amounts of time.
>
> Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> ---
>
> v2: s/max/min/ !
>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -1164,7 +1164,8 @@ void wait_lapic_expire(struct kvm_vcpu *
>
>  	/* __delay is delay_tsc whenever the hardware has TSC, thus always.  */
>  	if (guest_tsc < tsc_deadline)
> -		__delay(tsc_deadline - guest_tsc);
> +		__delay(min(tsc_deadline - guest_tsc,
> +			nsec_to_cycles(vcpu, lapic_timer_advance_ns)));
>  }
>
>  static void start_apic_timer(struct kvm_lapic *apic)
>

Can we make sure this is sent to -stable?

Since I tested this approach (I think complete with the same max/min
correction :) before I started trying to boil the ocean, it looks good
to me.

I expect nsec_to_cycles() always rounds down. Any rounding up would be
undesirable :) - but this way, even if host sets tsc_khz=1, all that
can happen is the guest is woken 1 tick before the deadline is met. It
wouldn't be a whole millisecond early in real time either, just as
much as lapic_timer_advance_ns.

And talking of scaling, it will cut off the other potential lockup as
well where we currently fail to scale the guest delay down into host
TSC ticks.

I also suggested guests could observe early wakeup when sysadmins tune
lapic_timer_advance_ns (i.e. it was reduced after the timer was set).
But Paolo pointed out the nice way to fix that would be to make
lapic_timer_advance_ns read-only (explicit requirement to `rmmod kvm`
before you can try a different value).

Thanks for this
Alan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -1164,7 +1164,8 @@  void wait_lapic_expire(struct kvm_vcpu *
 
 	/* __delay is delay_tsc whenever the hardware has TSC, thus always.  */
 	if (guest_tsc < tsc_deadline)
-		__delay(tsc_deadline - guest_tsc);
+		__delay(min(tsc_deadline - guest_tsc,
+			nsec_to_cycles(vcpu, lapic_timer_advance_ns)));
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)