From patchwork Fri Sep 29 11:36:11 2023
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 13404121
Subject: [PATCH v2] KVM: x86: Use fast path for Xen timer delivery
From: David Woodhouse
To: kvm
Cc: David Woodhouse, Paul Durrant, Sean Christopherson, Paolo Bonzini,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H. Peter Anvin", linux-kernel@vger.kernel.org
Date: Fri, 29 Sep 2023 12:36:11 +0100
X-Mailing-List: kvm@vger.kernel.org

From: David Woodhouse

Most of the time there's no need to kick the vCPU and deliver the timer
event through kvm_xen_inject_timer_irqs(). Use kvm_xen_set_evtchn_fast()
directly from the timer callback, and only fall back to the slow path
when it's necessary to do so.

This gives a significant improvement in timer latency testing (using
nanosleep() for various periods and then measuring the actual time
elapsed).

However, there was a reason¹ the fast path was dropped when this support
was first added. The current code holds vcpu->mutex for all operations on
the vcpu->arch.xen.timer_expires field, and the fast path introduces
potential race conditions. So ensure the hrtimer is *cancelled* before
making changes in kvm_xen_start_timer(), and also when reading the values
out for KVM_XEN_VCPU_ATTR_TYPE_TIMER. Add sanity checks to assert that
all the other code paths really are run with the vCPU loaded. And use
hrtimer_cancel() directly from kvm_xen_destroy_vcpu() to avoid a false
positive from the check in kvm_xen_stop_timer().
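As a rough illustration of the latency measurement mentioned above (not
the actual harness used for this patch; the periods and iteration count
below are arbitrary), a minimal userspace sketch looks something like
this:

/*
 * Illustrative stand-in for the latency test described above: request a
 * series of short nanosleep() periods and report the mean overshoot of
 * the actual elapsed time over the requested period.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	/* Arbitrary example periods, in nanoseconds. */
	static const uint64_t periods[] = { 10000, 100000, 1000000, 10000000 };
	const int iterations = 100;
	size_t i;

	for (i = 0; i < sizeof(periods) / sizeof(periods[0]); i++) {
		struct timespec req = {
			.tv_sec = periods[i] / 1000000000ULL,
			.tv_nsec = periods[i] % 1000000000ULL,
		};
		uint64_t total_overshoot = 0;
		int n;

		for (n = 0; n < iterations; n++) {
			uint64_t start = now_ns();

			nanosleep(&req, NULL);
			total_overshoot += now_ns() - start - periods[i];
		}

		printf("period %8llu ns: mean overshoot %llu ns\n",
		       (unsigned long long)periods[i],
		       (unsigned long long)(total_overshoot / iterations));
	}

	return 0;
}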
¹ https://lore.kernel.org/kvm/846caa99-2e42-4443-1070-84e49d2f11d2@redhat.com/

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 • v2: Remember, and deal with, those races.

 arch/x86/kvm/xen.c | 64 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index fb1110b2385a..9d0d602a2466 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -117,6 +117,8 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 
 void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu)
 {
+	WARN_ON_ONCE(vcpu != kvm_get_running_vcpu());
+
 	if (atomic_read(&vcpu->arch.xen.timer_pending) > 0) {
 		struct kvm_xen_evtchn e;
 
@@ -136,18 +138,41 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
 {
 	struct kvm_vcpu *vcpu = container_of(timer, struct kvm_vcpu,
 					     arch.xen.timer);
+	struct kvm_xen_evtchn e;
+	int rc;
+
 	if (atomic_read(&vcpu->arch.xen.timer_pending))
 		return HRTIMER_NORESTART;
 
-	atomic_inc(&vcpu->arch.xen.timer_pending);
-	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
-	kvm_vcpu_kick(vcpu);
+	e.vcpu_id = vcpu->vcpu_id;
+	e.vcpu_idx = vcpu->vcpu_idx;
+	e.port = vcpu->arch.xen.timer_virq;
+	e.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+	rc = kvm_xen_set_evtchn_fast(&e, vcpu->kvm);
+	if (rc == -EWOULDBLOCK) {
+		atomic_inc(&vcpu->arch.xen.timer_pending);
+		kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
+		kvm_vcpu_kick(vcpu);
+	} else {
+		vcpu->arch.xen.timer_expires = 0;
+	}
 
 	return HRTIMER_NORESTART;
 }
 
 static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_ns)
 {
+	WARN_ON_ONCE(vcpu != kvm_get_running_vcpu());
+
+	/*
+	 * Avoid races with the old timer firing. Checking timer_expires
+	 * to avoid calling hrtimer_cancel() will only have false positives
+	 * so is fine.
+	 */
+	if (vcpu->arch.xen.timer_expires)
+		hrtimer_cancel(&vcpu->arch.xen.timer);
+
 	atomic_set(&vcpu->arch.xen.timer_pending, 0);
 	vcpu->arch.xen.timer_expires = guest_abs;
 
@@ -163,6 +188,8 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_
 
 static void kvm_xen_stop_timer(struct kvm_vcpu *vcpu)
 {
+	WARN_ON_ONCE(vcpu != kvm_get_running_vcpu());
+
 	hrtimer_cancel(&vcpu->arch.xen.timer);
 	vcpu->arch.xen.timer_expires = 0;
 	atomic_set(&vcpu->arch.xen.timer_pending, 0);
@@ -1019,13 +1046,38 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		r = 0;
 		break;
 
-	case KVM_XEN_VCPU_ATTR_TYPE_TIMER:
+	case KVM_XEN_VCPU_ATTR_TYPE_TIMER: {
+		bool pending = false;
+
+		/*
+		 * Ensure a consistent snapshot of state is captured, with a
+		 * timer either being pending, or fully delivered. Not still
+		 * lurking in the timer_pending flag for deferred delivery.
+		 */
+		if (vcpu->arch.xen.timer_expires) {
+			pending = hrtimer_cancel(&vcpu->arch.xen.timer);
+			kvm_xen_inject_timer_irqs(vcpu);
+		}
+
 		data->u.timer.port = vcpu->arch.xen.timer_virq;
 		data->u.timer.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
 		data->u.timer.expires_ns = vcpu->arch.xen.timer_expires;
+
+		/*
+		 * The timer may be delivered immediately, while the returned
+		 * state causes it to be set up and delivered again on the
+		 * destination system after migration. That's fine, as the
+		 * guest will not even have had a chance to run and process
+		 * the interrupt by that point, so it won't even notice the
+		 * duplicate IRQ.
+		 */
+		if (pending)
+			hrtimer_start_expires(&vcpu->arch.xen.timer,
+					      HRTIMER_MODE_ABS_HARD);
+
 		r = 0;
 		break;
-
+	}
 	case KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR:
 		data->u.vector = vcpu->arch.xen.upcall_vector;
 		r = 0;
@@ -2085,7 +2137,7 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)
 void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
 {
 	if (kvm_xen_timer_enabled(vcpu))
-		kvm_xen_stop_timer(vcpu);
+		hrtimer_cancel(&vcpu->arch.xen.timer);
 
 	kvm_gpc_deactivate(&vcpu->arch.xen.runstate_cache);
 	kvm_gpc_deactivate(&vcpu->arch.xen.runstate2_cache);