From patchwork Mon Aug 5 23:31:08 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754226
Date: Mon, 5 Aug 2024 16:31:08 -0700
Message-ID: <20240805233114.4060019-2-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 1/7] Revert "KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs"
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

This reverts commit 85f44f8cc07b5f61bef30fe5343d629fd4263230.

Bring back the logic that walks down to leafs when zapping collapsible
SPTEs. Stepping down to leafs is technically unnecessary when zapping,
but the leaf SPTE will be used in a subsequent commit to construct a
huge SPTE and recover the huge mapping in place.

Note, this revert does not revert the function comment changes above
zap_collapsible_spte_range() and kvm_tdp_mmu_zap_collapsible_sptes()
since those are still relevant.

Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/tdp_iter.c |  9 +++++++
 arch/x86/kvm/mmu/tdp_iter.h |  1 +
 arch/x86/kvm/mmu/tdp_mmu.c  | 47 ++++++++++++++++++-------------------
 3 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 04c247bfe318..1279babbc72c 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -142,6 +142,15 @@ static bool try_step_up(struct tdp_iter *iter)
         return true;
 }
 
+/*
+ * Step the iterator back up a level in the paging structure. Should only be
+ * used when the iterator is below the root level.
+ */
+void tdp_iter_step_up(struct tdp_iter *iter)
+{
+        WARN_ON(!try_step_up(iter));
+}
+
 /*
  * Step to the next SPTE in a pre-order traversal of the paging structure.
 * To get to the next SPTE, the iterator either steps down towards the goal
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 2880fd392e0c..821fde2ac7b0 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -136,5 +136,6 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
                     int min_level, gfn_t next_last_level_gfn);
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_restart(struct tdp_iter *iter);
+void tdp_iter_step_up(struct tdp_iter *iter);
 
 #endif /* __KVM_X86_MMU_TDP_ITER_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c7dc49ee7388..ebe2ab3686c7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1628,49 +1628,48 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
         rcu_read_lock();
 
-        for_each_tdp_pte_min_level(iter, root, PG_LEVEL_2M, start, end) {
-retry:
+        tdp_root_for_each_pte(iter, root, start, end) {
                 if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
                         continue;
 
-                if (iter.level > KVM_MAX_HUGEPAGE_LEVEL ||
-                    !is_shadow_present_pte(iter.old_spte))
+                if (!is_shadow_present_pte(iter.old_spte) ||
+                    !is_last_spte(iter.old_spte, iter.level))
                         continue;
 
+                max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
+                                                              iter.gfn, PG_LEVEL_NUM);
+
+                WARN_ON(max_mapping_level < iter.level);
+
                 /*
-                 * Don't zap leaf SPTEs, if a leaf SPTE could be replaced with
-                 * a large page size, then its parent would have been zapped
-                 * instead of stepping down.
+                 * If this page is already mapped at the highest
+                 * viable level, there's nothing more to do.
                  */
-                if (is_last_spte(iter.old_spte, iter.level))
+                if (max_mapping_level == iter.level)
                         continue;
 
                 /*
-                 * If iter.gfn resides outside of the slot, i.e. the page for
-                 * the current level overlaps but is not contained by the slot,
-                 * then the SPTE can't be made huge. More importantly, trying
-                 * to query that info from slot->arch.lpage_info will cause an
-                 * out-of-bounds access.
+                 * The page can be remapped at a higher level, so step
+                 * up to zap the parent SPTE.
                  */
-                if (iter.gfn < start || iter.gfn >= end)
-                        continue;
-
-                max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
-                                                              iter.gfn, PG_LEVEL_NUM);
-                if (max_mapping_level < iter.level)
-                        continue;
+                while (max_mapping_level > iter.level)
+                        tdp_iter_step_up(&iter);
 
                 /* Note, a successful atomic zap also does a remote TLB flush. */
-                if (tdp_mmu_zap_spte_atomic(kvm, &iter))
-                        goto retry;
+                (void)tdp_mmu_zap_spte_atomic(kvm, &iter);
+
+                /*
+                 * If the atomic zap fails, the iter will recurse back into
+                 * the same subtree to retry.
+                 */
         }
 
         rcu_read_unlock();
 }
 
 /*
- * Zap non-leaf SPTEs (and free their associated page tables) which could
- * be replaced by huge pages, for GFNs within the slot.
+ * Zap non-leaf SPTEs (and free their associated page tables) which could be
+ * replaced by huge pages, for GFNs within the slot.
  */
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
                                        const struct kvm_memory_slot *slot)
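For readers skimming the interleaved diff above, the restored loop boils
down to the following condensed sketch. This is an illustration only; it
is simplified and elides the RCU/yield handling and the retry behaviour
shown in the real code:

        tdp_root_for_each_pte(iter, root, start, end) {
                /* Only present leaf SPTEs are collapse candidates. */
                if (!is_shadow_present_pte(iter.old_spte) ||
                    !is_last_spte(iter.old_spte, iter.level))
                        continue;

                max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
                                                              iter.gfn, PG_LEVEL_NUM);
                if (max_mapping_level == iter.level)
                        continue;       /* already mapped as large as possible */

                /* Step back up to the SPTE covering the whole huge region... */
                while (max_mapping_level > iter.level)
                        tdp_iter_step_up(&iter);

                /* ...and zap it; a later fault installs the huge mapping. */
                (void)tdp_mmu_zap_spte_atomic(kvm, &iter);
        }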
From patchwork Mon Aug 5 23:31:09 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754227
Date: Mon, 5 Aug 2024 16:31:09 -0700
Message-ID: <20240805233114.4060019-3-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 2/7] KVM: x86/mmu: Drop @max_level from kvm_mmu_max_mapping_level()
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

Drop the @max_level parameter from kvm_mmu_max_mapping_level(). All
callers pass in PG_LEVEL_NUM, so @max_level can be replaced with
PG_LEVEL_NUM in the function body.

No functional change intended.

Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/mmu.c          | 8 +++-----
 arch/x86/kvm/mmu/mmu_internal.h | 3 +--
 arch/x86/kvm/mmu/tdp_mmu.c      | 4 +---
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 901be9e420a4..1b4e14ac512b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3127,13 +3127,12 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
-                              const struct kvm_memory_slot *slot, gfn_t gfn,
-                              int max_level)
+                              const struct kvm_memory_slot *slot, gfn_t gfn)
 {
         bool is_private = kvm_slot_can_be_private(slot) &&
                           kvm_mem_is_private(kvm, gfn);
 
-        return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
+        return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
 }
 
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -6890,8 +6889,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
          * mapping if the indirect sp has level = 1.
          */
         if (sp->role.direct &&
-            sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
-                                                       PG_LEVEL_NUM)) {
+            sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn)) {
                 kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
 
                 if (kvm_available_flush_remote_tlbs_range())
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1721d97743e9..fee385e75405 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -344,8 +344,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
-                              const struct kvm_memory_slot *slot, gfn_t gfn,
-                              int max_level);
+                              const struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index ebe2ab3686c7..f881e79243b3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1636,9 +1636,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
                     !is_last_spte(iter.old_spte, iter.level))
                         continue;
 
-                max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
-                                                              iter.gfn, PG_LEVEL_NUM);
-
+                max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot, iter.gfn);
                 WARN_ON(max_mapping_level < iter.level);
 
                 /*
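As a usage note (illustrative only, with placeholder variable names), a
call site that previously had to spell out the maximum level:

        level = kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);

now reduces to:

        level = kvm_mmu_max_mapping_level(kvm, slot, gfn);

with PG_LEVEL_NUM folded into the helper itself.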
From patchwork Mon Aug 5 23:31:10 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754228
Date: Mon, 5 Aug 2024 16:31:10 -0700
Message-ID: <20240805233114.4060019-4-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 3/7] KVM: x86/mmu: Batch TLB flushes when zapping collapsible TDP MMU SPTEs
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

Set SPTEs directly to SHADOW_NONPRESENT_VALUE and batch up TLB flushes
when zapping collapsible SPTEs, rather than freezing them first.

Freezing the SPTE first is not required. It is fine for another thread
holding mmu_lock for read to immediately install a present entry before
TLBs are flushed because the underlying mapping is not changing. vCPUs
that translate through the stale 4K mappings or a new huge page mapping
will still observe the same GPA->HPA translations.

KVM must only flush TLBs before dropping RCU (to avoid use-after-free of
the zapped page tables) and before dropping mmu_lock (to synchronize
with mmu_notifiers invalidating mappings).

In VMs backed with 2MiB pages, batching TLB flushes improves the time it
takes to zap collapsible SPTEs to disable dirty logging:

 $ ./dirty_log_perf_test -s anonymous_hugetlb_2mb -v 64 -e -b 4g

 Before: Disabling dirty logging time: 14.334453428s (131072 flushes)
 After:  Disabling dirty logging time: 4.794969689s (76 flushes)

Skipping freezing SPTEs also avoids stalling vCPU threads on the frozen
SPTE for the time it takes to perform a remote TLB flush.
vCPUs faulting on the zapped mapping can now immediately install a new
huge mapping and proceed with guest execution.

Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/tdp_mmu.c | 54 +++++++-------------------------------
 1 file changed, 9 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f881e79243b3..fad2912d3d4c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -591,48 +591,6 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
         return 0;
 }
 
-static inline int __must_check tdp_mmu_zap_spte_atomic(struct kvm *kvm,
-                                                       struct tdp_iter *iter)
-{
-        int ret;
-
-        lockdep_assert_held_read(&kvm->mmu_lock);
-
-        /*
-         * Freeze the SPTE by setting it to a special, non-present value. This
-         * will stop other threads from immediately installing a present entry
-         * in its place before the TLBs are flushed.
-         *
-         * Delay processing of the zapped SPTE until after TLBs are flushed and
-         * the FROZEN_SPTE is replaced (see below).
-         */
-        ret = __tdp_mmu_set_spte_atomic(iter, FROZEN_SPTE);
-        if (ret)
-                return ret;
-
-        kvm_flush_remote_tlbs_gfn(kvm, iter->gfn, iter->level);
-
-        /*
-         * No other thread can overwrite the frozen SPTE as they must either
-         * wait on the MMU lock or use tdp_mmu_set_spte_atomic() which will not
-         * overwrite the special frozen SPTE value. Use the raw write helper to
-         * avoid an unnecessary check on volatile bits.
-         */
-        __kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE);
-
-        /*
-         * Process the zapped SPTE after flushing TLBs, and after replacing
-         * FROZEN_SPTE with 0. This minimizes the amount of time vCPUs are
-         * blocked by the FROZEN_SPTE and reduces contention on the child
-         * SPTEs.
-         */
-        handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
-                            SHADOW_NONPRESENT_VALUE, iter->level, true);
-
-        return 0;
-}
-
-
 /*
  * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
  * @kvm: KVM instance
@@ -1625,12 +1583,15 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
         gfn_t end = start + slot->npages;
         struct tdp_iter iter;
         int max_mapping_level;
+        bool flush = false;
 
         rcu_read_lock();
 
         tdp_root_for_each_pte(iter, root, start, end) {
-                if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
+                if (tdp_mmu_iter_cond_resched(kvm, &iter, flush, true)) {
+                        flush = false;
                         continue;
+                }
 
                 if (!is_shadow_present_pte(iter.old_spte) ||
                     !is_last_spte(iter.old_spte, iter.level))
@@ -1653,8 +1614,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
                 while (max_mapping_level > iter.level)
                         tdp_iter_step_up(&iter);
 
-                /* Note, a successful atomic zap also does a remote TLB flush. */
-                (void)tdp_mmu_zap_spte_atomic(kvm, &iter);
+                if (!tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE))
+                        flush = true;
 
                 /*
                  * If the atomic zap fails, the iter will recurse back into
@@ -1662,6 +1623,9 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
                  */
         }
 
+        if (flush)
+                kvm_flush_remote_tlbs_memslot(kvm, slot);
+
         rcu_read_unlock();
 }
 
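The shape of the change is the usual "defer and batch the flush"
pattern. A condensed sketch of the resulting loop, for illustration only
(it omits the cond_resched/yield path, which also consumes the pending
flush):

        bool flush = false;

        tdp_root_for_each_pte(iter, root, start, end) {
                /* ... select a collapsible leaf SPTE, as before ... */

                /* Zap in place without freezing; just note that a flush is owed. */
                if (!tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE))
                        flush = true;
        }

        /* One remote TLB flush for the whole memslot instead of one per SPTE. */
        if (flush)
                kvm_flush_remote_tlbs_memslot(kvm, slot);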
From patchwork Mon Aug 5 23:31:11 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754229
Date: Mon, 5 Aug 2024 16:31:11 -0700
Message-ID: <20240805233114.4060019-5-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 4/7] KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

Recover TDP MMU huge page mappings in-place instead of zapping them when
dirty logging is disabled, and rename functions that recover huge page
mappings when dirty logging is disabled to move away from the "zap
collapsible spte" terminology.

Before KVM flushes TLBs, guest accesses may be translated through either
the (stale) small SPTE or the (new) huge SPTE. This is already possible
when KVM is doing eager page splitting (where TLB flushes are also
batched), and when vCPUs are faulting in huge mappings (where TLBs are
flushed after the new huge SPTE is installed).

Recovering huge pages reduces the number of page faults when dirty
logging is disabled:

 $ perf stat -e kvm:kvm_page_fault -- \
     ./dirty_log_perf_test -s anonymous_hugetlb_2mb -v 64 -e -b 4g

 Before: 393,599 kvm:kvm_page_fault
 After:  262,575 kvm:kvm_page_fault

vCPU throughput and the latency of disabling dirty logging are about
equal compared to zapping, but avoiding faults can be beneficial to
remove vCPU jitter in extreme scenarios.
Signed-off-by: David Matlack
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          |  6 +++---
 arch/x86/kvm/mmu/spte.c         | 36 ++++++++++++++++++++++++++++++---
 arch/x86/kvm/mmu/spte.h         |  1 +
 arch/x86/kvm/mmu/tdp_mmu.c      | 32 +++++++++++++++++------------
 arch/x86/kvm/mmu/tdp_mmu.h      |  4 ++--
 arch/x86/kvm/x86.c              | 18 +++++++----------
 7 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 950a03e0181e..ed3b724db4d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1952,8 +1952,8 @@ void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
                                   const struct kvm_memory_slot *memslot,
                                   u64 start, u64 end,
                                   int target_level);
-void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
-                                   const struct kvm_memory_slot *memslot);
+void kvm_mmu_recover_huge_pages(struct kvm *kvm,
+                                const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
                                    const struct kvm_memory_slot *memslot);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1b4e14ac512b..34e59210d94e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6917,8 +6917,8 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
                 kvm_flush_remote_tlbs_memslot(kvm, slot);
 }
 
-void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
-                                   const struct kvm_memory_slot *slot)
+void kvm_mmu_recover_huge_pages(struct kvm *kvm,
+                                const struct kvm_memory_slot *slot)
 {
         if (kvm_memslots_have_rmaps(kvm)) {
                 write_lock(&kvm->mmu_lock);
@@ -6928,7 +6928,7 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
         if (tdp_mmu_enabled) {
                 read_lock(&kvm->mmu_lock);
-                kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot);
+                kvm_tdp_mmu_recover_huge_pages(kvm, slot);
                 read_unlock(&kvm->mmu_lock);
         }
 }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index d4527965e48c..979387d4ebfa 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -268,15 +268,14 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
         return wrprot;
 }
 
-static u64 make_spte_executable(u64 spte)
+static u64 modify_spte_protections(u64 spte, u64 set, u64 clear)
 {
         bool is_access_track = is_access_track_spte(spte);
 
         if (is_access_track)
                 spte = restore_acc_track_spte(spte);
 
-        spte &= ~shadow_nx_mask;
-        spte |= shadow_x_mask;
+        spte = (spte | set) & ~clear;
 
         if (is_access_track)
                 spte = mark_spte_for_access_track(spte);
@@ -284,6 +283,16 @@ static u64 make_spte_executable(u64 spte)
         return spte;
 }
 
+static u64 make_spte_executable(u64 spte)
+{
+        return modify_spte_protections(spte, shadow_x_mask, shadow_nx_mask);
+}
+
+static u64 make_spte_nonexecutable(u64 spte)
+{
+        return modify_spte_protections(spte, shadow_nx_mask, shadow_x_mask);
+}
+
 /*
  * Construct an SPTE that maps a sub-page of the given huge page SPTE where
  * `index` identifies which sub-page.
@@ -320,6 +329,27 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte,
         return child_spte;
 }
 
+u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int level)
+{
+        u64 huge_spte;
+
+        KVM_BUG_ON(!is_shadow_present_pte(small_spte), kvm);
+        KVM_BUG_ON(level == PG_LEVEL_4K, kvm);
+
+        huge_spte = small_spte | PT_PAGE_SIZE_MASK;
+
+        /*
+         * huge_spte already has the address of the sub-page being collapsed
+         * from small_spte, so just clear the lower address bits to create the
+         * huge page address.
+         */
+        huge_spte &= KVM_HPAGE_MASK(level) | ~PAGE_MASK;
+
+        if (is_nx_huge_page_enabled(kvm))
+                huge_spte = make_spte_nonexecutable(huge_spte);
+
+        return huge_spte;
+}
 
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 {
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index ef793c459b05..498c30b6ba71 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -503,6 +503,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
                bool host_writable, u64 *new_spte);
 u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte,
                               union kvm_mmu_page_role role, int index);
+u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int level);
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
 u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
 u64 mark_spte_for_access_track(u64 spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index fad2912d3d4c..3f2d7343194e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1575,15 +1575,16 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
                 clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
 
-static void zap_collapsible_spte_range(struct kvm *kvm,
-                                       struct kvm_mmu_page *root,
-                                       const struct kvm_memory_slot *slot)
+static void recover_huge_pages_range(struct kvm *kvm,
+                                     struct kvm_mmu_page *root,
+                                     const struct kvm_memory_slot *slot)
 {
         gfn_t start = slot->base_gfn;
         gfn_t end = start + slot->npages;
         struct tdp_iter iter;
         int max_mapping_level;
         bool flush = false;
+        u64 huge_spte;
 
         rcu_read_lock();
 
@@ -1608,18 +1609,19 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
                         continue;
 
                 /*
-                 * The page can be remapped at a higher level, so step
-                 * up to zap the parent SPTE.
+                 * Construct the huge SPTE based on the small SPTE and then step
+                 * back up to install it.
                  */
+                huge_spte = make_huge_spte(kvm, iter.old_spte, max_mapping_level);
                 while (max_mapping_level > iter.level)
                         tdp_iter_step_up(&iter);
 
-                if (!tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE))
+                if (!tdp_mmu_set_spte_atomic(kvm, &iter, huge_spte))
                         flush = true;
 
                 /*
-                 * If the atomic zap fails, the iter will recurse back into
-                 * the same subtree to retry.
+                 * If the cmpxchg fails, the iter will recurse back into the
+                 * same subtree to retry.
                  */
         }
 
@@ -1630,17 +1632,21 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 }
 
 /*
- * Zap non-leaf SPTEs (and free their associated page tables) which could be
- * replaced by huge pages, for GFNs within the slot.
+ * Recover huge page mappings within the slot by replacing non-leaf SPTEs with
+ * huge SPTEs where possible.
+ *
+ * Note that all huge page mappings are recovered, including NX huge pages that
+ * were split by guest instruction fetches and huge pages that were split for
+ * dirty tracking.
  */
-void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-                                       const struct kvm_memory_slot *slot)
+void kvm_tdp_mmu_recover_huge_pages(struct kvm *kvm,
+                                    const struct kvm_memory_slot *slot)
 {
         struct kvm_mmu_page *root;
 
         lockdep_assert_held_read(&kvm->mmu_lock);
 
         for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id)
-                zap_collapsible_spte_range(kvm, root, slot);
+                recover_huge_pages_range(kvm, root, slot);
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 1b74e058a81c..ddea2827d1ad 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -40,8 +40,8 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
                                        struct kvm_memory_slot *slot,
                                        gfn_t gfn, unsigned long mask,
                                        bool wrprot);
-void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-                                       const struct kvm_memory_slot *slot);
+void kvm_tdp_mmu_recover_huge_pages(struct kvm *kvm,
+                                    const struct kvm_memory_slot *slot);
 
 bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
                                    struct kvm_memory_slot *slot, gfn_t gfn,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af6c8cf6a37a..b83bebe53840 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13056,19 +13056,15 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 
         if (!log_dirty_pages) {
                 /*
-                 * Dirty logging tracks sptes in 4k granularity, meaning that
-                 * large sptes have to be split. If live migration succeeds,
-                 * the guest in the source machine will be destroyed and large
-                 * sptes will be created in the destination. However, if the
-                 * guest continues to run in the source machine (for example if
-                 * live migration fails), small sptes will remain around and
-                 * cause bad performance.
+                 * Recover huge page mappings in the slot now that dirty logging
+                 * is disabled, i.e. now that KVM does not have to track guest
+                 * writes at 4KiB granularity.
                  *
-                 * Scan sptes if dirty logging has been stopped, dropping those
-                 * which can be collapsed into a single large-page spte. Later
-                 * page faults will create the large-page sptes.
+                 * Dirty logging might be disabled by userspace if an ongoing VM
+                 * live migration is cancelled and the VM must continue running
+                 * on the source.
                  */
-                kvm_mmu_zap_collapsible_sptes(kvm, new);
+                kvm_mmu_recover_huge_pages(kvm, new);
         } else {
                 /*
                  * Initially-all-set does not require write protecting any page,
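The address arithmetic in make_huge_spte() is easier to see with
concrete numbers. The following worked example is illustrative only (the
SPTE address is made up) and assumes a 2MiB (PG_LEVEL_2M) mapping:

        /*
         * KVM_HPAGE_MASK(PG_LEVEL_2M) == ~(2MiB - 1) == 0xffffffffffe00000
         * ~PAGE_MASK                  ==  4KiB - 1   == 0x0000000000000fff
         *
         * ANDing with (KVM_HPAGE_MASK(level) | ~PAGE_MASK) clears only bits
         * 12..20, i.e. the part of the address below 2MiB alignment, while
         * preserving the low 12 attribute bits and the upper address bits.
         */
        u64 small_spte_addr = 0x123456000;      /* hypothetical 4KiB-mapping address */
        u64 mask = KVM_HPAGE_MASK(PG_LEVEL_2M) | ~PAGE_MASK;
        u64 huge_addr = small_spte_addr & mask; /* == 0x123400000, 2MiB-aligned base */

make_huge_spte() additionally sets PT_PAGE_SIZE_MASK on the result to
mark it as a huge mapping, as shown in the diff above.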
From patchwork Mon Aug 5 23:31:12 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754230
Date: Mon, 5 Aug 2024 16:31:12 -0700
Message-ID: <20240805233114.4060019-6-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 5/7] KVM: x86/mmu: Rename make_huge_page_split_spte() to make_small_spte()
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

Rename make_huge_page_split_spte() to make_small_spte(). This ensures
that the usage of "small_spte" and "huge_spte" is consistent between
make_huge_spte() and make_small_spte().

This should also reduce some confusion, as make_huge_page_split_spte()
almost reads like it will create a huge SPTE, when in fact it is
creating a small SPTE to split the huge SPTE.

No functional change intended.

Suggested-by: Sean Christopherson
Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/mmu.c     | 2 +-
 arch/x86/kvm/mmu/spte.c    | 4 ++--
 arch/x86/kvm/mmu/spte.h    | 4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 34e59210d94e..3610896cd9d6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6706,7 +6706,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm,
                         continue;
                 }
 
-                spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
+                spte = make_small_spte(kvm, huge_spte, sp->role, index);
                 mmu_spte_set(sptep, spte);
                 __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
         }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 979387d4ebfa..5b38b8c5ba51 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -300,8 +300,8 @@ static u64 make_spte_nonexecutable(u64 spte)
  * This is used during huge page splitting to build the SPTEs that make up the
  * new page table.
  */
-u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte,
-                              union kvm_mmu_page_role role, int index)
+u64 make_small_spte(struct kvm *kvm, u64 huge_spte,
+                    union kvm_mmu_page_role role, int index)
 {
         u64 child_spte = huge_spte;
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 498c30b6ba71..515d7e801f5e 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -501,8 +501,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
                unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
                u64 old_spte, bool prefetch, bool can_unsync,
                bool host_writable, u64 *new_spte);
-u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte,
-                              union kvm_mmu_page_role role, int index);
+u64 make_small_spte(struct kvm *kvm, u64 huge_spte,
+                    union kvm_mmu_page_role role, int index);
 u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int level);
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
 u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3f2d7343194e..9da319fd840e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1328,7 +1328,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
          * not been linked in yet and thus is not reachable from any other CPU.
          */
         for (i = 0; i < SPTE_ENT_PER_PAGE; i++)
-                sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, sp->role, i);
+                sp->spt[i] = make_small_spte(kvm, huge_spte, sp->role, i);
 
         /*
          * Replace the huge spte with a pointer to the populated lower level
From patchwork Mon Aug 5 23:31:13 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754231
Date: Mon, 5 Aug 2024 16:31:13 -0700
Message-ID: <20240805233114.4060019-7-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 6/7] KVM: x86/mmu: WARN if huge page recovery triggered during dirty logging
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

WARN and bail out of recover_huge_pages_range() if dirty logging is
enabled. KVM shouldn't be recovering huge pages during dirty logging
anyway, since KVM needs to track writes at 4KiB granularity. However,
it's not out of the question that this changes in the future.

If KVM wants to recover huge pages during dirty logging, make_huge_spte()
must be updated to write-protect the new huge page mapping. Otherwise,
writes through the newly recovered huge page mapping will not be tracked.

Note that this potential risk did not exist back when KVM zapped to
recover huge page mappings, since subsequent accesses would just be
faulted in at PG_LEVEL_4K if dirty logging was enabled.
Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/tdp_mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9da319fd840e..07d5363c9db7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1586,6 +1586,9 @@ static void recover_huge_pages_range(struct kvm *kvm,
         bool flush = false;
         u64 huge_spte;
 
+        if (WARN_ON_ONCE(kvm_slot_dirty_track_enabled(slot)))
+                return;
+
         rcu_read_lock();
 
         tdp_root_for_each_pte(iter, root, start, end) {
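Purely as a hypothetical illustration of the constraint described in the
changelog (this is not part of the series, assumes the memslot were
plumbed into make_huge_spte(), and real support would need more than
this): if huge page recovery were ever allowed while dirty logging is
on, make_huge_spte() would have to do something along the lines of:

        /*
         * Hypothetical: keep the recovered huge mapping write-protected so
         * that the first guest write still triggers dirty tracking.
         */
        if (kvm_slot_dirty_track_enabled(slot))
                huge_spte &= ~PT_WRITABLE_MASK;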
From patchwork Mon Aug 5 23:31:14 2024
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 13754232
Date: Mon, 5 Aug 2024 16:31:14 -0700
Message-ID: <20240805233114.4060019-8-dmatlack@google.com>
In-Reply-To: <20240805233114.4060019-1-dmatlack@google.com>
References: <20240805233114.4060019-1-dmatlack@google.com>
Subject: [PATCH 7/7] KVM: x86/mmu: Recheck SPTE points to a PT during huge page recovery
From: David Matlack
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, David Matlack

Recheck that iter.old_spte still points to a page table when recovering
huge pages. Since mmu_lock is held for read and tdp_iter_step_up()
re-reads iter.sptep, it's possible the SPTE was zapped or recovered by
another CPU in between stepping down and back up.

This avoids a useless cmpxchg (and possibly a remote TLB flush) if
another CPU is recovering huge SPTEs in parallel (e.g. the NX huge page
recovery worker, or vCPUs taking faults on the huge page region).

This also makes it clear that tdp_iter_step_up() re-reads the SPTE and
thus can see a different value, which is not immediately obvious when
reading the code.

Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/tdp_mmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 07d5363c9db7..bdc7fd476721 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1619,6 +1619,17 @@ static void recover_huge_pages_range(struct kvm *kvm,
                 while (max_mapping_level > iter.level)
                         tdp_iter_step_up(&iter);
 
+                /*
+                 * Re-check that iter.old_spte still points to a page table.
+                 * Since mmu_lock is held for read and tdp_iter_step_up()
+                 * re-reads iter.sptep, it's possible the SPTE was zapped or
+                 * recovered by another CPU in between stepping down and
+                 * stepping back up.
+                 */
+                if (!is_shadow_present_pte(iter.old_spte) ||
+                    is_last_spte(iter.old_spte, iter.level))
+                        continue;
+
                 if (!tdp_mmu_set_spte_atomic(kvm, &iter, huge_spte))
                         flush = true;