From patchwork Fri Feb 9 22:28:55 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13551963
Message-ID: <20240209222858.396696-2-seanjc@google.com>
In-Reply-To: <20240209222858.396696-1-seanjc@google.com>
References: <20240209222858.396696-1-seanjc@google.com>
Reply-To: Sean Christopherson
Date: Fri, 9 Feb 2024 14:28:55 -0800
Subject: [PATCH v4 1/4] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yan Zhao,
 Friedrich Weber, Kai Huang, Yuan Yao, Xu Yilun, Yu Zhang, Chao Peng,
 Fuad Tabba, Michael Roth, Isaku Yamahata, David Matlack

Retry page faults without acquiring mmu_lock, and without even faulting
the page into the primary MMU, if the resolved gfn is covered by an
active invalidation.  Contending for mmu_lock is especially problematic
on preemptible kernels, as the mmu_notifier invalidation task will yield
mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation,
and ultimately increase the latency of resolving the page fault.  And in
the worst case scenario, yielding will be accompanied by a remote TLB
flush, e.g. if the invalidation covers a large range of memory and vCPUs
are accessing addresses that were already zapped.

Faulting the page into the primary MMU is similarly problematic, as doing
so may acquire locks that need to be taken for the invalidation to
complete (the primary MMU has finer grained locks than KVM's MMU), and/or
may cause unnecessary churn (getting/putting pages, marking them
accessed, etc).

Alternatively, the yielding issue could be mitigated by teaching KVM's
MMU iterators to perform more work before yielding, but that wouldn't
solve the lock contention and would negatively affect scenarios where a
vCPU is trying to fault in an address that is NOT covered by the
in-progress invalidation.

Add a dedicated lockless version of the range-based retry check to avoid
false positives on the start+end sanity-check WARN, and so that it's
super obvious that checking for a racing invalidation without holding
mmu_lock is unsafe (though obviously useful).

Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that
pre-checking invalidation in a loop won't put KVM into an infinite loop,
e.g. due to caching the in-progress flag and never seeing it go to '0'.

Force a load of mmu_invalidate_seq as well, even though it isn't strictly
necessary to avoid an infinite loop, as doing so improves the probability
that KVM will detect an invalidation that already completed before
acquiring mmu_lock and bailing anyways.
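As an illustration of the hazard the READ_ONCE() guards against (a
hypothetical pre-check loop, not code from this patch; only the
kvm->mmu_invalidate_* fields are real):

        /*
         * If the loop body is fully visible to the compiler, nothing tells it
         * that mmu_invalidate_in_progress can change behind its back, so it
         * may load the flag once, cache the non-zero value in a register, and
         * spin forever even after the invalidation completes:
         */
        while (kvm->mmu_invalidate_in_progress)         /* load may be hoisted */
                cpu_relax();

        /* READ_ONCE() forces a fresh load on every iteration instead: */
        while (READ_ONCE(kvm->mmu_invalidate_in_progress))
                cpu_relax();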
Do the pre-check even for non-preemptible kernels, as waiting to detect
the invalidation until mmu_lock is held guarantees the vCPU will observe
the worst case latency in terms of handling the fault, and can generate
even more mmu_lock contention.  E.g. the vCPU will acquire mmu_lock,
detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
eventually re-acquire mmu_lock.  This behavior is also why there are no
new starvation issues due to losing the fairness guarantees provided by
rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e.
waiting on mmu_lock doesn't guarantee forward progress in the face of
_another_ mmu_notifier invalidation event.

Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the
READ_ONCE() may generate a load into a register instead of doing a direct
comparison (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking
the added cost is a few bytes of code and maaaaybe a cycle or three.

Reported-by: Yan Zhao
Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
Reported-by: Friedrich Weber
Cc: Kai Huang
Cc: Yan Zhao
Cc: Yuan Yao
Cc: Xu Yilun
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c   | 45 +++++++++++++++++++++++++++++++++++++++-
 include/linux/kvm_host.h | 26 +++++++++++++++++++++++
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c193b096b45..166cef0c3ff4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4400,11 +4400,37 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
                            unsigned int access)
 {
+        struct kvm_memory_slot *slot = fault->slot;
         int ret;
 
         fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
         smp_rmb();
 
+        /*
+         * Check for a relevant mmu_notifier invalidation event before getting
+         * the pfn from the primary MMU, and before acquiring mmu_lock.
+         *
+         * For mmu_lock, if there is an in-progress invalidation and the kernel
+         * allows preemption, the invalidation task may drop mmu_lock and yield
+         * in response to mmu_lock being contended, which is *very* counter-
+         * productive as this vCPU can't actually make forward progress until
+         * the invalidation completes.
+         *
+         * Retrying now can also avoid unnecessary lock contention in the
+         * primary MMU, as the primary MMU doesn't necessarily hold a single
+         * lock for the duration of the invalidation, i.e. faulting in a
+         * conflicting pfn can cause the invalidation to take longer by holding
+         * locks that are needed to complete the invalidation.
+         *
+         * Do the pre-check even for non-preemptible kernels, i.e. even if KVM
+         * will never yield mmu_lock in response to contention, as this vCPU is
+         * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
+         * to detect retry guarantees the worst case latency for the vCPU.
+         */
+        if (slot &&
+            mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
+                return RET_PF_RETRY;
+
         ret = __kvm_faultin_pfn(vcpu, fault);
         if (ret != RET_PF_CONTINUE)
                 return ret;
@@ -4412,9 +4438,21 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
         if (unlikely(is_error_pfn(fault->pfn)))
                 return kvm_handle_error_pfn(vcpu, fault);
 
-        if (unlikely(!fault->slot))
+        if (unlikely(!slot))
                 return kvm_handle_noslot_fault(vcpu, fault, access);
 
+        /*
+         * Check again for a relevant mmu_notifier invalidation event purely to
+         * avoid contending mmu_lock.  Most invalidations will be detected by
+         * the previous check, but checking is extremely cheap relative to the
+         * overall cost of failing to detect the invalidation until after
+         * mmu_lock is acquired.
+         */
+        if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn)) {
+                kvm_release_pfn_clean(fault->pfn);
+                return RET_PF_RETRY;
+        }
+
         return RET_PF_CONTINUE;
 }
 
@@ -4442,6 +4480,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
         if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
                 return true;
 
+        /*
+         * Check for a relevant mmu_notifier invalidation event one last time
+         * now that mmu_lock is held, as the "unsafe" checks performed without
+         * holding mmu_lock can get false negatives.
+         */
         return fault->slot &&
                mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..179df96b20f8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2031,6 +2031,32 @@ static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
                 return 1;
         return 0;
 }
+
+/*
+ * This lockless version of the range-based retry check *must* be paired with a
+ * call to the locked version after acquiring mmu_lock, i.e. this is safe to
+ * use only as a pre-check to avoid contending mmu_lock.  This version *will*
+ * get false negatives and false positives.
+ */
+static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
+                                                   unsigned long mmu_seq,
+                                                   gfn_t gfn)
+{
+        /*
+         * Use READ_ONCE() to ensure the in-progress flag and sequence counter
+         * are always read from memory, e.g. so that checking for retry in a
+         * loop won't result in an infinite retry loop.  Don't force loads for
+         * start+end, as the key to avoiding infinite retry loops is observing
+         * the 1=>0 transition of in-progress, i.e. getting false negatives
+         * due to stale start+end values is acceptable.
+         */
+        if (unlikely(READ_ONCE(kvm->mmu_invalidate_in_progress)) &&
+            gfn >= kvm->mmu_invalidate_range_start &&
+            gfn < kvm->mmu_invalidate_range_end)
+                return true;
+
+        return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
+}
 #endif
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
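Taken together, this patch leaves the fault path with three checkpoints for a
racing invalidation.  The condensed flow below is a restatement distilled from
the diff above, not literal kernel code (locals, error handling and unrelated
logic are omitted):

        /* 1) Lockless pre-check: bail before touching the primary MMU at all. */
        fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;         /* paired with smp_rmb() */
        if (slot && mmu_invalidate_retry_gfn_unsafe(kvm, fault->mmu_seq, fault->gfn))
                return RET_PF_RETRY;

        __kvm_faultin_pfn(vcpu, fault);                         /* resolve hva -> pfn */

        /* 2) Lockless re-check: drop the pfn instead of contending mmu_lock. */
        if (mmu_invalidate_retry_gfn_unsafe(kvm, fault->mmu_seq, fault->gfn)) {
                kvm_release_pfn_clean(fault->pfn);
                return RET_PF_RETRY;
        }

        /*
         * 3) Authoritative check under mmu_lock, in is_page_fault_stale(), just
         *    before installing the SPTE; only this one cannot get false negatives.
         */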
From patchwork Fri Feb 9 22:28:56 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13551964
Message-ID: <20240209222858.396696-3-seanjc@google.com>
In-Reply-To: <20240209222858.396696-1-seanjc@google.com>
References: <20240209222858.396696-1-seanjc@google.com>
Reply-To: Sean Christopherson
Date: Fri, 9 Feb 2024 14:28:56 -0800
Subject: [PATCH v4 2/4] KVM: x86/mmu: Move private vs. shared check above slot validity checks
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yan Zhao,
 Friedrich Weber, Kai Huang, Yuan Yao, Xu Yilun, Yu Zhang, Chao Peng,
 Fuad Tabba, Michael Roth, Isaku Yamahata, David Matlack

Prioritize private vs. shared gfn attribute checks above slot validity
checks to ensure a consistent userspace ABI.  E.g. as is, KVM will exit
to userspace if there is no memslot, but emulate accesses to the APIC
access page even if the attributes mismatch.
Fixes: 8dd2eee9d526 ("KVM: x86/mmu: Handle page fault for private memory")
Cc: Yu Zhang
Cc: Chao Peng
Cc: Fuad Tabba
Cc: Michael Roth
Cc: Isaku Yamahata
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 166cef0c3ff4..50bfaa53f3f2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4360,11 +4360,6 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
                 return RET_PF_EMULATE;
         }
 
-        if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
-                kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
-                return -EFAULT;
-        }
-
         if (fault->is_private)
                 return kvm_faultin_pfn_private(vcpu, fault);
 
@@ -4403,6 +4398,11 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
         struct kvm_memory_slot *slot = fault->slot;
         int ret;
 
+        if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+                kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+                return -EFAULT;
+        }
+
         fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
         smp_rmb();
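For context on the userspace side of the ABI being kept consistent: when the
attribute check fires, the fault is reported to the VMM as a
KVM_EXIT_MEMORY_FAULT.  The sketch below is illustrative only and is not part
of this series; it assumes the guest_memfd/private-memory uAPI of this kernel
cycle (KVM_SET_MEMORY_ATTRIBUTES, KVM_MEMORY_EXIT_FLAG_PRIVATE) and omits all
error handling:

        /* Hypothetical VMM-side handling of the memory fault exit. */
        struct kvm_run *run = vcpu_run;         /* mmap()ed vCPU run struct */

        if (run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
                struct kvm_memory_attributes attrs = {
                        .address    = run->memory_fault.gpa,
                        .size       = run->memory_fault.size,
                        /* Convert the range to whatever the guest is accessing it as. */
                        .attributes = (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
                                      KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
                };

                ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
                /* ... then re-enter the guest, which retakes the fault. */
        }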
From patchwork Fri Feb 9 22:28:57 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13551965
Message-ID: <20240209222858.396696-4-seanjc@google.com>
In-Reply-To: <20240209222858.396696-1-seanjc@google.com>
References: <20240209222858.396696-1-seanjc@google.com>
Reply-To: Sean Christopherson
Date: Fri, 9 Feb 2024 14:28:57 -0800
Subject: [PATCH v4 3/4] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yan Zhao,
 Friedrich Weber, Kai Huang, Yuan Yao, Xu Yilun, Yu Zhang, Chao Peng,
 Fuad Tabba, Michael Roth, Isaku Yamahata, David Matlack

Move the checks related to the validity of an access to a memslot from
the inner __kvm_faultin_pfn() to its sole caller, kvm_faultin_pfn().
This allows emulating accesses to the APIC access page, which don't need
to resolve a pfn, even if there is a relevant in-progress mmu_notifier
invalidation.  Ditto for accesses to KVM internal memslots from L2, which
KVM also treats as emulated MMIO.

More importantly, this will allow for future cleanup by having the
"no memslot" case bail from kvm_faultin_pfn() very early on.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 62 ++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 50bfaa53f3f2..505fc7eef533 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4333,33 +4333,6 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
         struct kvm_memory_slot *slot = fault->slot;
         bool async;
 
-        /*
-         * Retry the page fault if the gfn hit a memslot that is being deleted
-         * or moved.  This ensures any existing SPTEs for the old memslot will
-         * be zapped before KVM inserts a new MMIO SPTE for the gfn.
-         */
-        if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
-                return RET_PF_RETRY;
-
-        if (!kvm_is_visible_memslot(slot)) {
-                /* Don't expose private memslots to L2. */
-                if (is_guest_mode(vcpu)) {
-                        fault->slot = NULL;
-                        fault->pfn = KVM_PFN_NOSLOT;
-                        fault->map_writable = false;
-                        return RET_PF_CONTINUE;
-                }
-                /*
-                 * If the APIC access page exists but is disabled, go directly
-                 * to emulation without caching the MMIO access or creating a
-                 * MMIO SPTE.  That way the cache doesn't need to be purged
-                 * when the AVIC is re-enabled.
-                 */
-                if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
-                    !kvm_apicv_activated(vcpu->kvm))
-                        return RET_PF_EMULATE;
-        }
-
         if (fault->is_private)
                 return kvm_faultin_pfn_private(vcpu, fault);
 
@@ -4406,6 +4379,37 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
         fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
         smp_rmb();
 
+        if (!slot)
+                goto faultin_pfn;
+
+        /*
+         * Retry the page fault if the gfn hit a memslot that is being deleted
+         * or moved.  This ensures any existing SPTEs for the old memslot will
+         * be zapped before KVM inserts a new MMIO SPTE for the gfn.
+         */
+        if (slot->flags & KVM_MEMSLOT_INVALID)
+                return RET_PF_RETRY;
+
+        if (!kvm_is_visible_memslot(slot)) {
+                /* Don't expose KVM's internal memslots to L2. */
+                if (is_guest_mode(vcpu)) {
+                        fault->slot = NULL;
+                        fault->pfn = KVM_PFN_NOSLOT;
+                        fault->map_writable = false;
+                        return RET_PF_CONTINUE;
+                }
+
+                /*
+                 * If the APIC access page exists but is disabled, go directly
+                 * to emulation without caching the MMIO access or creating a
+                 * MMIO SPTE.  That way the cache doesn't need to be purged
+                 * when the AVIC is re-enabled.
+                 */
+                if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
+                    !kvm_apicv_activated(vcpu->kvm))
+                        return RET_PF_EMULATE;
+        }
+
         /*
          * Check for a relevant mmu_notifier invalidation event before getting
          * the pfn from the primary MMU, and before acquiring mmu_lock.
@@ -4427,10 +4431,10 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
          * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
          * to detect retry guarantees the worst case latency for the vCPU.
          */
-        if (slot &&
-            mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
+        if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
                 return RET_PF_RETRY;
 
+faultin_pfn:
         ret = __kvm_faultin_pfn(vcpu, fault);
         if (ret != RET_PF_CONTINUE)
                 return ret;
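After this change, all of the slot-related special casing lives at the top of
kvm_faultin_pfn() and the inner helper only resolves the pfn.  A condensed
restatement of the resulting flow (not literal kernel code; annotations added
for illustration):

        fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;         /* snapshot + smp_rmb() */

        if (!slot)
                goto faultin_pfn;               /* emulated MMIO, no memslot to validate */

        if (slot->flags & KVM_MEMSLOT_INVALID)
                return RET_PF_RETRY;            /* memslot is being deleted or moved */

        if (!kvm_is_visible_memslot(slot)) {
                /* internal memslot: hide it from L2, or emulate the disabled APIC page */
        }

        if (mmu_invalidate_retry_gfn_unsafe(kvm, fault->mmu_seq, fault->gfn))
                return RET_PF_RETRY;            /* lockless pre-check from patch 1 */

faultin_pfn:
        ret = __kvm_faultin_pfn(vcpu, fault);   /* now purely "get the pfn" */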
From patchwork Fri Feb 9 22:28:58 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13551966
Message-ID: <20240209222858.396696-5-seanjc@google.com>
In-Reply-To: <20240209222858.396696-1-seanjc@google.com>
References: <20240209222858.396696-1-seanjc@google.com>
Reply-To: Sean Christopherson
Date: Fri, 9 Feb 2024 14:28:58 -0800
Subject: [PATCH v4 4/4] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yan Zhao,
 Friedrich Weber, Kai Huang, Yuan Yao, Xu Yilun, Yu Zhang, Chao Peng,
 Fuad Tabba, Michael Roth, Isaku Yamahata, David Matlack

Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
after the private versus shared check, so that there's no need to
repeatedly query whether or not a slot exists.  This also makes it more
obvious that, except for private vs. shared attributes, the process of
faulting in a pfn simply doesn't apply to gfns without a slot.

Cc: David Matlack
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c          | 33 ++++++++++++++++++---------------
 arch/x86/kvm/mmu/mmu_internal.h |  5 ++++-
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 505fc7eef533..7a2874756b3f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3278,6 +3278,14 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
         return ret;
 }
 
+static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+                                              struct kvm_page_fault *fault)
+{
+        kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
+                                      PAGE_SIZE, fault->write, fault->exec,
+                                      fault->is_private);
+}
+
 static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
 {
         unsigned long hva = gfn_to_hva_memslot(slot, gfn);
@@ -3314,9 +3322,16 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 {
         gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
+        if (fault->is_private) {
+                kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+                return -EFAULT;
+        }
+
         vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
                              access & shadow_mmio_access_mask);
 
+        fault->pfn = KVM_PFN_NOSLOT;
+
         /*
          * If MMIO caching is disabled, emulate immediately without
          * touching the shadow page tables as attempting to install an
@@ -4296,14 +4311,6 @@ static inline u8 kvm_max_level_for_order(int order)
         return PG_LEVEL_4K;
 }
 
-static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
-                                              struct kvm_page_fault *fault)
-{
-        kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
-                                      PAGE_SIZE, fault->write, fault->exec,
-                                      fault->is_private);
-}
-
 static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
                                    struct kvm_page_fault *fault)
 {
@@ -4376,12 +4383,12 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
                 return -EFAULT;
         }
 
+        if (unlikely(!slot))
+                return kvm_handle_noslot_fault(vcpu, fault, access);
+
         fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
         smp_rmb();
 
-        if (!slot)
-                goto faultin_pfn;
-
         /*
          * Retry the page fault if the gfn hit a memslot that is being deleted
          * or moved.  This ensures any existing SPTEs for the old memslot will
@@ -4434,7 +4441,6 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
         if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
                 return RET_PF_RETRY;
 
-faultin_pfn:
         ret = __kvm_faultin_pfn(vcpu, fault);
         if (ret != RET_PF_CONTINUE)
                 return ret;
@@ -4442,9 +4448,6 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
         if (unlikely(is_error_pfn(fault->pfn)))
                 return kvm_handle_error_pfn(vcpu, fault);
 
-        if (unlikely(!slot))
-                return kvm_handle_noslot_fault(vcpu, fault, access);
-
         /*
          * Check again for a relevant mmu_notifier invalidation event purely to
          * avoid contending mmu_lock.  Most invalidations will be detected by
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0669a8a668ca..bd7d07e6c697 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -235,7 +235,10 @@ struct kvm_page_fault {
         /* The memslot containing gfn.  May be NULL. */
         struct kvm_memory_slot *slot;
 
-        /* Outputs of kvm_faultin_pfn. */
+        /*
+         * Outputs of kvm_faultin_pfn, guaranteed to be valid if and only if
+         * slot is non-NULL.
+         */
         unsigned long mmu_seq;
         kvm_pfn_t pfn;
         hva_t hva;
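With the full series applied, the ordering that this patch establishes at the
top of kvm_faultin_pfn() can be summarized as follows; a condensed restatement
of the final code, not a verbatim copy (error paths and helpers trimmed):

        /* 1) Private vs. shared attribute mismatch: prepare the memory fault
         *    exit and punt to userspace, regardless of whether a slot exists. */
        if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
                return -EFAULT;

        /* 2) No memslot: emulated MMIO (or -EFAULT for private gfns), handled
         *    entirely by kvm_handle_noslot_fault(), no pfn or mmu_seq needed. */
        if (unlikely(!fault->slot))
                return kvm_handle_noslot_fault(vcpu, fault, access);

        /* 3) Snapshot the invalidation sequence before resolving the pfn. */
        fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
        smp_rmb();

        /* 4) Slot validity: invalid slots retry, internal slots emulate (patch 3),
         *    then the lockless invalidation pre-check and re-check (patch 1). */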