From patchwork Sat Apr  9 00:38:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson <seanjc@google.com>
X-Patchwork-Id: 12807301
Return-Path: <kvm-owner@kernel.org>
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Sat,  9 Apr 2022 00:38:42 +0000
In-Reply-To: <20220409003847.819686-1-seanjc@google.com>
Message-Id: <20220409003847.819686-2-seanjc@google.com>
Mime-Version: 1.0
References: <20220409003847.819686-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog
Subject: [PATCH 1/6] KVM: x86/mmu: Tag disallowed NX huge pages even if
 they're not tracked
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Mingwei Zhang <mizhang@google.com>
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

Tag shadow pages that cannot be replaced with an NX huge page even if
zapping the page would not allow KVM to create a huge page, e.g. because
something else prevents creating a huge page.  This will allow a future
patch to more precisely apply the mitigation by checking if an existing
shadow page can be replaced by a NX huge page.  Currently, KVM assumes
that any existing shadow page encountered cannot be replaced by a NX huge
page (if the mitigation is enabled), which prevents KVM from replacing
no-longer-necessary shadow pages with huge pages, e.g. after disabling
dirty logging, zapping from the mmu_notifier due to page migration,
etc...

Failure to tag shadow pages appropriately could theoretically lead to
false negatives, e.g. a fetch fault that requests a small page leaves its
shadow page untagged, so if a read/write fault later requests a huge page,
KVM will not reject the huge page as it should.

To avoid yet another flag, initialize the list_head and use list_empty()
to determine whether or not a page is on the list of NX huge pages that
should be recovered.

Opportunistically rename most of the variables/functions involved to
provide consistency, e.g. lpage vs huge page and NX huge vs huge NX, and
clarity, e.g. to make it obvious the flag applies only to the NX huge
page mitigation, not to any condition that prevents creating a huge page.

Fixes: 5bcaf3e1715f ("KVM: x86/mmu: Account NX huge page disallowed iff huge page was requested")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  6 +--
 arch/x86/kvm/mmu/mmu.c          | 75 ++++++++++++++++++++++-----------
 arch/x86/kvm/mmu/mmu_internal.h | 22 ++++++++--
 arch/x86/kvm/mmu/paging_tmpl.h  |  6 +--
 arch/x86/kvm/mmu/tdp_mmu.c      |  8 ++--
 5 files changed, 79 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c20f715f009..e4f7e7998928 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1063,7 +1063,7 @@ struct kvm_arch {
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
-	struct list_head lpage_disallowed_mmu_pages;
+	struct list_head possible_nx_huge_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 	/*
@@ -1219,8 +1219,8 @@ struct kvm_arch {
 	 *  - tdp_mmu_roots (above)
 	 *  - tdp_mmu_pages (above)
 	 *  - the link field of struct kvm_mmu_pages used by the TDP MMU
-	 *  - lpage_disallowed_mmu_pages
-	 *  - the lpage_disallowed_link field of struct kvm_mmu_pages used
+	 *  - possible_nx_huge_pages
+	 *  - the possible_nx_huge_page_link field of struct kvm_mmu_pages used
 	 *    by the TDP MMU
 	 * It is acceptable, but not necessary, to acquire this lock when
 	 * the thread holds the MMU lock in write mode.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 69a30d6d1e2b..d230d2d78ace 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -809,15 +809,43 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 }
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static void untrack_possible_nx_huge_page(struct kvm *kvm,
+					  struct kvm_mmu_page *sp)
 {
-	if (sp->lpage_disallowed)
+	if (list_empty(&sp->possible_nx_huge_page_link))
+		return;
+
+	--kvm->stat.nx_lpage_splits;
+	list_del_init(&sp->possible_nx_huge_page_link);
+}
+
+void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	sp->nx_huge_page_disallowed = false;
+
+	untrack_possible_nx_huge_page(kvm, sp);
+}
+
+static void track_possible_nx_huge_page(struct kvm *kvm,
+					struct kvm_mmu_page *sp)
+{
+	if (!list_empty(&sp->possible_nx_huge_page_link))
 		return;
 
 	++kvm->stat.nx_lpage_splits;
-	list_add_tail(&sp->lpage_disallowed_link,
-		      &kvm->arch.lpage_disallowed_mmu_pages);
-	sp->lpage_disallowed = true;
+	list_add_tail(&sp->possible_nx_huge_page_link,
+		      &kvm->arch.possible_nx_huge_pages);
+}
+
+void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible)
+{
+	sp->nx_huge_page_disallowed = true;
+
+	if (!nx_huge_page_possible)
+		untrack_possible_nx_huge_page(kvm, sp);
+	else
+		track_possible_nx_huge_page(kvm, sp);
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -837,13 +865,6 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
-void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	--kvm->stat.nx_lpage_splits;
-	sp->lpage_disallowed = false;
-	list_del(&sp->lpage_disallowed_link);
-}
-
 static struct kvm_memory_slot *
 gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    bool no_dirty_log)
@@ -1713,6 +1734,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 		sp->gfns = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
+	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+
 	/*
 	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
 	 * depends on valid pages being added to the head of the list.  See
@@ -2352,8 +2375,8 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 		zapped_root = !is_obsolete_sp(kvm, sp);
 	}
 
-	if (sp->lpage_disallowed)
-		unaccount_huge_nx_page(kvm, sp);
+	if (sp->nx_huge_page_disallowed)
+		unaccount_nx_huge_page(kvm, sp);
 
 	sp->role.invalid = 1;
 
@@ -2931,9 +2954,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 				      it.level - 1, true, ACC_ALL);
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->is_tdp && fault->huge_page_disallowed &&
-		    fault->req_level >= it.level)
-			account_huge_nx_page(vcpu->kvm, sp);
+		if (fault->is_tdp && fault->huge_page_disallowed)
+			account_nx_huge_page(vcpu->kvm, sp,
+					     fault->req_level >= it.level);
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
@@ -5717,7 +5740,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
-	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
 	r = kvm_mmu_init_tdp_mmu(kvm);
@@ -6328,23 +6351,25 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
 	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
 	to_zap = ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0;
 	for ( ; to_zap; --to_zap) {
-		if (list_empty(&kvm->arch.lpage_disallowed_mmu_pages))
+		if (list_empty(&kvm->arch.possible_nx_huge_pages))
 			break;
 
 		/*
 		 * We use a separate list instead of just using active_mmu_pages
-		 * because the number of lpage_disallowed pages is expected to
-		 * be relatively small compared to the total.
+		 * because the number of shadow pages that can be replaced with
+		 * an NX huge page is expected to be relatively small compared
+		 * to the total number of shadow pages, and because the TDP MMU
+		 * doesn't use active_mmu_pages.
 		 */
-		sp = list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages,
+		sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
 				      struct kvm_mmu_page,
-				      lpage_disallowed_link);
-		WARN_ON_ONCE(!sp->lpage_disallowed);
+				      possible_nx_huge_page_link);
+		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
 		if (is_tdp_mmu_page(sp)) {
 			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
 		} else {
 			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-			WARN_ON_ONCE(sp->lpage_disallowed);
+			WARN_ON_ONCE(sp->nx_huge_page_disallowed);
 		}
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1bff453f7cbe..5c460c727407 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -43,7 +43,13 @@ struct kvm_mmu_page {
 	bool tdp_mmu_page;
 	bool unsync;
 	u8 mmu_valid_gen;
-	bool lpage_disallowed; /* Can't be replaced by an equiv large page */
+
+	/*
+	 * The shadow page can't be replaced by an equivalent huge page
+	 * because it is being used to map an executable page in the guest
+	 * and the NX huge page mitigation is enabled.
+	 */
+	bool nx_huge_page_disallowed;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -73,7 +79,14 @@ struct kvm_mmu_page {
 		};
 	};
 
-	struct list_head lpage_disallowed_link;
+	/*
+	 * Used to track shadow pages that, if zapped, would allow KVM to create
+	 * an NX huge page.  A shadow page will have nx_huge_page_disallowed
+	 * set but not be on the list if a huge page is disallowed for other
+	 * reasons, e.g. because KVM is shadowing a PTE at the same gfn, the
+	 * memslot isn't properly aligned, etc...
+	 */
+	struct list_head possible_nx_huge_page_link;
 #ifdef CONFIG_X86_32
 	/*
 	 * Used out of the mmu-lock to avoid reading spte values while an
@@ -168,7 +181,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
-void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible);
+void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 66f1acf153c4..6c4549454a14 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -708,9 +708,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
 					      it.level - 1, true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
-			if (fault->huge_page_disallowed &&
-			    fault->req_level >= it.level)
-				account_huge_nx_page(vcpu->kvm, sp);
+			if (fault->huge_page_disallowed)
+				account_nx_huge_page(vcpu->kvm, sp,
+						     fault->req_level >= it.level);
 		}
 	}
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 566548a3efa7..7f949d48724b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -284,6 +284,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
+	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
 	sp->role = role;
@@ -390,8 +392,8 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
 	list_del(&sp->link);
-	if (sp->lpage_disallowed)
-		unaccount_huge_nx_page(kvm, sp);
+	if (sp->nx_huge_page_disallowed)
+		unaccount_nx_huge_page(kvm, sp);
 
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
@@ -1125,7 +1127,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
 	if (account_nx)
-		account_huge_nx_page(kvm, sp);
+		account_nx_huge_page(kvm, sp, true);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
 	return 0;

From patchwork Sat Apr  9 00:38:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson <seanjc@google.com>
X-Patchwork-Id: 12807302
Return-Path: <kvm-owner@kernel.org>
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Sat,  9 Apr 2022 00:38:43 +0000
In-Reply-To: <20220409003847.819686-1-seanjc@google.com>
Message-Id: <20220409003847.819686-3-seanjc@google.com>
Mime-Version: 1.0
References: <20220409003847.819686-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog
Subject: [PATCH 2/6] KVM: x86/mmu: Properly account NX huge page workaround
 for nonpaging MMUs
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Mingwei Zhang <mizhang@google.com>
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

Account and track NX huge pages for nonpaging MMUs so that a future
enhancement to precisely check if a shadow page cannot be replaced by an NX
huge page doesn't get false positives.  Without correct tracking, KVM can
get stuck in a loop if an instruction is fetching and writing data on the
same huge page, e.g. KVM installs a small executable page on the fetch
fault, replaces it with an NX huge page on the write fault, and faults
again on the fetch.

Alternatively, and perhaps ideally, KVM would simply not enforce the
workaround for nonpaging MMUs.  The guest has no page tables to abuse
and KVM is guaranteed to switch to a different MMU on CR0.PG being
toggled, so there are no security or performance concerns.  But getting
make_spte() to play nice now and in the future is unnecessarily complex.
In the current code base, make_spte() can enforce the mitigation if TDP
is enabled or the MMU is indirect, but other in-flight patches aim to
drop the @vcpu param[*].  Without a @vcpu, KVM could either pass in the
correct information and/or derive it from the shadow page, but the former
is ugly and the latter subtly non-trivial due to the possibility of
direct shadow pages in indirect MMUs.  Given that using shadow paging
with an unpaged guest is far from top priority in terms of performance,
_and_ has been subjected to the workaround since its inception, keep it
simple and just fix the accounting glitch.

[*] https://lore.kernel.org/all/20220321224358.1305530-5-bgardon@google.com

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu.h      |  9 +++++++++
 arch/x86/kvm/mmu/mmu.c  |  2 +-
 arch/x86/kvm/mmu/spte.c | 11 +++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 671cfeccf04e..89df062d5921 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -191,6 +191,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
 		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+
+		/*
+		 * Note, enforcing the NX huge page mitigation for nonpaging
+		 * MMUs (shadow paging, CR0.PG=0 in the guest) is completely
+		 * unnecessary.  The guest doesn't have any page tables to
+		 * abuse and is guaranteed to switch to a different MMU when
+		 * CR0.PG is toggled on (may not always be guaranteed when KVM
+		 * is using TDP).  See make_spte() for details.
+		 */
 		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
 
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d230d2d78ace..9416445afa3e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2954,7 +2954,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 				      it.level - 1, true, ACC_ALL);
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->is_tdp && fault->huge_page_disallowed)
+		if (fault->huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4739b53c9734..14ad821cb0c7 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -115,6 +115,17 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (!prefetch)
 		spte |= spte_shadow_accessed_mask(spte);
 
+	/*
+	 * For simplicity, enforce the NX huge page mitigation even if not
+	 * strictly necessary.  KVM could ignore the mitigation if paging is
+	 * disabled in the guest, but KVM would then have to ensure a new MMU
+	 * is loaded (or all shadow pages zapped) when CR0.PG is toggled on,
+	 * and that's a net negative for performance when TDP is enabled.  KVM
+	 * could ignore the mitigation if TDP is disabled and CR0.PG=0, as KVM
+	 * will always switch to a new MMU if paging is enabled in the guest,
+	 * but that adds complexity just to optimize a mode that is anything
+	 * but performance critical.
+	 */
 	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
 	    is_nx_huge_page_enabled()) {
 		pte_access &= ~ACC_EXEC_MASK;

From patchwork Sat Apr  9 00:38:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson <seanjc@google.com>
X-Patchwork-Id: 12807303
Return-Path: <kvm-owner@kernel.org>
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Sat,  9 Apr 2022 00:38:44 +0000
In-Reply-To: <20220409003847.819686-1-seanjc@google.com>
Message-Id: <20220409003847.819686-4-seanjc@google.com>
Mime-Version: 1.0
References: <20220409003847.819686-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog
Subject: [PATCH 3/6] KVM: x86/mmu: Set nx_huge_page_disallowed in TDP MMU
 before setting SPTE
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Mingwei Zhang <mizhang@google.com>
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

Set nx_huge_page_disallowed in TDP MMU shadow pages before making the SP
visible to other readers, i.e. before setting its SPTE.  This will allow
KVM to query the flag when determining if a shadow page can be replaced
by an NX huge page without violating the rules of the mitigation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 12 +++++-------
 arch/x86/kvm/mmu/mmu_internal.h |  5 ++---
 arch/x86/kvm/mmu/tdp_mmu.c      | 30 +++++++++++++++++-------------
 3 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9416445afa3e..bc86997f9339 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -809,8 +809,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 }
 
-static void untrack_possible_nx_huge_page(struct kvm *kvm,
-					  struct kvm_mmu_page *sp)
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	if (list_empty(&sp->possible_nx_huge_page_link))
 		return;
@@ -819,15 +818,14 @@ static void untrack_possible_nx_huge_page(struct kvm *kvm,
 	list_del_init(&sp->possible_nx_huge_page_link);
 }
 
-void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	sp->nx_huge_page_disallowed = false;
 
 	untrack_possible_nx_huge_page(kvm, sp);
 }
 
-static void track_possible_nx_huge_page(struct kvm *kvm,
-					struct kvm_mmu_page *sp)
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	if (!list_empty(&sp->possible_nx_huge_page_link))
 		return;
@@ -837,8 +835,8 @@ static void track_possible_nx_huge_page(struct kvm *kvm,
 		      &kvm->arch.possible_nx_huge_pages);
 }
 
-void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-			  bool nx_huge_page_possible)
+static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 bool nx_huge_page_possible)
 {
 	sp->nx_huge_page_disallowed = true;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 5c460c727407..75e830c648da 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -181,8 +181,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
-void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-			  bool nx_huge_page_possible);
-void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7f949d48724b..9966735601a6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -392,8 +392,10 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
 	list_del(&sp->link);
-	if (sp->nx_huge_page_disallowed)
-		unaccount_nx_huge_page(kvm, sp);
+	if (sp->nx_huge_page_disallowed) {
+		sp->nx_huge_page_disallowed = false;
+		untrack_possible_nx_huge_page(kvm, sp);
+	}
 
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
@@ -1102,16 +1104,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
  * @kvm: kvm instance
  * @iter: a tdp_iter instance currently on the SPTE that should be set
  * @sp: The new TDP page table to install.
- * @account_nx: True if this page table is being installed to split a
- *              non-executable huge page.
  * @shared: This operation is running under the MMU lock in read mode.
  *
  * Returns: 0 if the new page table was installed. Non-0 if the page table
  *          could not be installed (e.g. the atomic compare-exchange failed).
  */
 static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
-			   struct kvm_mmu_page *sp, bool account_nx,
-			   bool shared)
+			   struct kvm_mmu_page *sp, bool shared)
 {
 	u64 spte = make_nonleaf_spte(sp->spt, !shadow_accessed_mask);
 	int ret = 0;
@@ -1126,8 +1125,6 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
-	if (account_nx)
-		account_nx_huge_page(kvm, sp, true);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
 	return 0;
@@ -1140,6 +1137,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	struct kvm *kvm = vcpu->kvm;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
 	int ret;
@@ -1176,9 +1174,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 
 		if (!is_shadow_present_pte(iter.old_spte)) {
-			bool account_nx = fault->huge_page_disallowed &&
-					  fault->req_level >= iter.level;
-
 			/*
 			 * If SPTE has been frozen by another thread, just
 			 * give up and retry, avoiding unnecessary page table
@@ -1190,10 +1185,19 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			sp = tdp_mmu_alloc_sp(vcpu);
 			tdp_mmu_init_child_sp(sp, &iter);
 
-			if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) {
+			sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
+
+			if (tdp_mmu_link_sp(kvm, &iter, sp, true)) {
 				tdp_mmu_free_sp(sp);
 				break;
 			}
+
+			if (fault->huge_page_disallowed &&
+			    fault->req_level >= iter.level) {
+				spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+				track_possible_nx_huge_page(kvm, sp);
+				spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+			}
 		}
 	}
 
@@ -1481,7 +1485,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * correctness standpoint since the translation will be the same either
 	 * way.
 	 */
-	ret = tdp_mmu_link_sp(kvm, iter, sp, false, shared);
+	ret = tdp_mmu_link_sp(kvm, iter, sp, shared);
 	if (ret)
 		goto out;
 

From patchwork Sat Apr  9 00:38:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson <seanjc@google.com>
X-Patchwork-Id: 12807304
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Sat,  9 Apr 2022 00:38:45 +0000
In-Reply-To: <20220409003847.819686-1-seanjc@google.com>
Message-Id: <20220409003847.819686-5-seanjc@google.com>
Mime-Version: 1.0
References: <20220409003847.819686-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog
Subject: [PATCH 4/6] KVM: x86/mmu: Track the number of TDP MMU pages, but not
 the actual pages
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Mingwei Zhang <mizhang@google.com>
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

Track the number of TDP MMU "shadow" pages instead of tracking the pages
themselves. Now that the NX huge page list manipulation has been moved out
of the common linking flow, eliminating the list-based tracking means the
common paths for linking and unlinking a shadow page no longer need to
acquire a spinlock and can instead increment/decrement an atomic counter.

Keep a count rather than dropping the tracking entirely, as the WARN on
leaked shadow pages during TDP MMU teardown is very, very useful for
detecting KVM bugs.
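
A minimal userspace sketch of the counting scheme, assuming C11 atomics as a
stand-in for the kernel's atomic64_t. The struct and function names (toy_vm,
toy_vm_init(), toy_link_sp(), toy_unlink_sp(), toy_leaked_pages()) are
hypothetical; the sketch only models the page counter, not the NX huge page
list that still requires tdp_mmu_pages_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical per-VM state; only the page counter is modeled. */
struct toy_vm {
	atomic_llong tdp_mmu_pages;	/* linked, non-root TDP MMU pages */
};

static void toy_vm_init(struct toy_vm *vm)
{
	atomic_init(&vm->tdp_mmu_pages, 0);
}

/* Linking a page only bumps the counter: no spinlock on this path. */
static void toy_link_sp(struct toy_vm *vm)
{
	atomic_fetch_add(&vm->tdp_mmu_pages, 1);
}

/* Unlinking decrements it; going negative would indicate a bug. */
static void toy_unlink_sp(struct toy_vm *vm)
{
	long long old = atomic_fetch_sub(&vm->tdp_mmu_pages, 1);

	assert(old > 0);
}

/* Teardown check, analogous to WARN_ON(atomic64_read(...)). */
static int toy_leaked_pages(struct toy_vm *vm)
{
	long long n = atomic_load(&vm->tdp_mmu_pages);

	if (n)
		fprintf(stderr, "WARN: %lld TDP MMU page(s) leaked\n", n);
	return n != 0;
}
```

In the patch itself the decrement happens at the top of tdp_mmu_unlink_sp(),
before the early return, so pages that were never tracked as possible NX
huge pages skip the spinlock entirely.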

Tracking the number of pages will also make it trivial to expose the
counter to userspace as a stat in the future, which may or may not be
desirable.

Note, the TDP MMU needs to use a counter (and stat, if that ever comes to
be) separate from the existing n_used_mmu_pages. The TDP MMU neither
supports the shrinker nor honors KVM_SET_NR_MMU_PAGES (because the TDP MMU
consumes so few pages relative to shadow paging), and including TDP MMU
pages in that counter would break both the shrinker and the shadow MMU,
e.g. if a VM is using nested TDP.
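
To illustrate the last point, here is a toy model (not KVM code) of a
shadow MMU page budget. It assumes the limit is enforced by zapping shadow
pages until enough are available, and that TDP MMU pages are never on the
list that path can zap. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy page budget.  Only shadow pages are reclaimable here; TDP MMU pages
 * can never be freed by this path.
 */
struct toy_budget {
	unsigned long n_max_mmu_pages;    /* limit, e.g. KVM_SET_NR_MMU_PAGES */
	unsigned long n_used_mmu_pages;   /* counter the limit is checked against */
	unsigned long reclaimable_shadow; /* shadow pages this path could zap */
};

/*
 * Zap reclaimable shadow pages until @needed pages are available.
 * Returns false if the budget cannot be satisfied.
 */
static bool toy_make_pages_available(struct toy_budget *b, unsigned long needed)
{
	assert(needed <= b->n_max_mmu_pages);

	while (b->n_used_mmu_pages + needed > b->n_max_mmu_pages) {
		if (!b->reclaimable_shadow)
			return false;	/* nothing left that this path can free */
		b->reclaimable_shadow--;
		b->n_used_mmu_pages--;
	}
	return true;
}
```

With separate counters, a request for four pages against a limit of ten
with eight shadow pages in use succeeds after zapping two shadow pages. If
ten TDP MMU pages were folded into n_used_mmu_pages, the same request would
zap every shadow page and still fail.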

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 11 +++--------
 arch/x86/kvm/mmu/tdp_mmu.c      | 19 +++++++++----------
 2 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e4f7e7998928..19a352b5750b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1186,6 +1186,9 @@ struct kvm_arch {
 	 */
 	bool tdp_mmu_enabled;
 
+	/* The number of TDP MMU pages across all roots. */
+	atomic64_t tdp_mmu_pages;
+
 	/*
 	 * List of struct kvm_mmu_pages being used as roots.
 	 * All struct kvm_mmu_pages in the list should have
@@ -1206,18 +1209,10 @@ struct kvm_arch {
 	 */
 	struct list_head tdp_mmu_roots;
 
-	/*
-	 * List of struct kvmp_mmu_pages not being used as roots.
-	 * All struct kvm_mmu_pages in the list should have
-	 * tdp_mmu_page set and a tdp_mmu_root_count of 0.
-	 */
-	struct list_head tdp_mmu_pages;
-
 	/*
 	 * Protects accesses to the following fields when the MMU lock
 	 * is held in read mode:
 	 *  - tdp_mmu_roots (above)
-	 *  - tdp_mmu_pages (above)
 	 *  - the link field of struct kvm_mmu_pages used by the TDP MMU
 	 *  - possible_nx_huge_pages;
 	 *  - the possible_nx_huge_page_link field of struct kvm_mmu_pages used
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9966735601a6..d0e6b341652c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -29,7 +29,6 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 	kvm->arch.tdp_mmu_enabled = true;
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
 	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
-	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
 	kvm->arch.tdp_mmu_zap_wq = wq;
 	return 1;
 }
@@ -54,7 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 	/* Also waits for any queued work items.  */
 	destroy_workqueue(kvm->arch.tdp_mmu_zap_wq);
 
-	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages));
+	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
 	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
 
 	/*
@@ -386,16 +385,18 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool shared)
 {
+	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+
+	if (!sp->nx_huge_page_disallowed)
+		return;
+
 	if (shared)
 		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	else
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
-	list_del(&sp->link);
-	if (sp->nx_huge_page_disallowed) {
-		sp->nx_huge_page_disallowed = false;
-		untrack_possible_nx_huge_page(kvm, sp);
-	}
+	sp->nx_huge_page_disallowed = false;
+	untrack_possible_nx_huge_page(kvm, sp);
 
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
@@ -1123,9 +1124,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 		tdp_mmu_set_spte(kvm, iter, spte);
 	}
 
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	atomic64_inc(&kvm->arch.tdp_mmu_pages);
 
 	return 0;
 }

From patchwork Sat Apr  9 00:38:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson <seanjc@google.com>
X-Patchwork-Id: 12807305
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Sat,  9 Apr 2022 00:38:46 +0000
In-Reply-To: <20220409003847.819686-1-seanjc@google.com>
Message-Id: <20220409003847.819686-6-seanjc@google.com>
Mime-Version: 1.0
References: <20220409003847.819686-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog
Subject: [PATCH 5/6] KVM: x86/mmu: Add helper to convert SPTE value to its
 shadow page
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Mingwei Zhang <mizhang@google.com>
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

Add a helper to convert a SPTE to its shadow page to deduplicate a
variety of flows and hopefully avoid future bugs, e.g. if KVM attempts to
get the shadow page for a SPTE without dropping high bits.

Opportunistically add a comment in mmu_free_root_page() documenting why
it treats the root HPA as a SPTE.
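
A userspace sketch of why the helper masks before converting. The mask
assumes a 52-bit physical address width and 4 KiB pages (the real
PT64_BASE_ADDR_MASK may depend on the host configuration), and a small array
stands in for pfn_to_page() plus page_private(). toy_to_shadow_page() and
toy_spte_to_sp() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
/* Assumed: bits 51:12 hold the physical address. */
#define TOY_BASE_ADDR_MASK \
	(((1ULL << 52) - 1) & ~((1ULL << TOY_PAGE_SHIFT) - 1))

struct toy_mmu_page { int id; };

/* Stand-in for pfn_to_page() + page_private(): pfn -> shadow page. */
#define TOY_NR_PFNS 16
static struct toy_mmu_page *toy_page_private[TOY_NR_PFNS];

static struct toy_mmu_page *toy_to_shadow_page(uint64_t hpa)
{
	uint64_t pfn = hpa >> TOY_PAGE_SHIFT;

	/* High flag bits survive the shift and yield a bogus pfn. */
	return pfn < TOY_NR_PFNS ? toy_page_private[pfn] : NULL;
}

/* Like spte_to_sp(): always drop non-address bits before the lookup. */
static struct toy_mmu_page *toy_spte_to_sp(uint64_t spte)
{
	return toy_to_shadow_page(spte & TOY_BASE_ADDR_MASK);
}
```

Low flag bits (e.g. a present bit in a PAE root entry) are shifted out
anyway; it is high bits such as bit 63 that corrupt the pfn when the mask is
forgotten, which is the class of bug the helper is meant to rule out.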

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 14 +++++++++-----
 arch/x86/kvm/mmu/mmu_internal.h | 12 ------------
 arch/x86/kvm/mmu/spte.h         | 17 +++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 ++
 4 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bc86997f9339..8b4f3550710a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1823,7 +1823,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			continue;
 		}
 
-		child = to_shadow_page(ent & PT64_BASE_ADDR_MASK);
+		child = spte_to_sp(ent);
 
 		if (child->unsync_children) {
 			if (mmu_pages_add(pvec, child, i))
@@ -2237,7 +2237,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * so we should update the spte at this point to get
 		 * a new sp with the correct access.
 		 */
-		child = to_shadow_page(*sptep & PT64_BASE_ADDR_MASK);
+		child = spte_to_sp(*sptep);
 		if (child->role.access == direct_access)
 			return;
 
@@ -2258,7 +2258,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 		if (is_last_spte(pte, sp->role.level)) {
 			drop_spte(kvm, spte);
 		} else {
-			child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
+			child = spte_to_sp(pte);
 			drop_parent_pte(child, spte);
 
 			/*
@@ -2696,7 +2696,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 			struct kvm_mmu_page *child;
 			u64 pte = *sptep;
 
-			child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
+			child = spte_to_sp(pte);
 			drop_parent_pte(child, sptep);
 			flush = true;
 		} else if (pfn != spte_to_pfn(*sptep)) {
@@ -3227,7 +3227,11 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	if (!VALID_PAGE(*root_hpa))
 		return;
 
-	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
+	/*
+	 * The "root" may be a special root, e.g. a PAE entry, treat it as a
+	 * SPTE to ensure any non-PA bits are dropped.
+	 */
+	sp = spte_to_sp(*root_hpa);
 	if (WARN_ON(!sp))
 		return;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 75e830c648da..891ef217b877 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -106,18 +106,6 @@ struct kvm_mmu_page {
 
 extern struct kmem_cache *mmu_page_header_cache;
 
-static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
-{
-	struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
-
-	return (struct kvm_mmu_page *)page_private(page);
-}
-
-static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
-{
-	return to_shadow_page(__pa(sptep));
-}
-
 static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
 {
 	return role.smm ? 1 : 0;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 73f12615416f..149a23c6e981 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -207,6 +207,23 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
  */
 extern u8 __read_mostly shadow_phys_bits;
 
+static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
+{
+	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
+
+	return (struct kvm_mmu_page *)page_private(page);
+}
+
+static inline struct kvm_mmu_page *spte_to_sp(u64 spte)
+{
+	return to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+}
+
+static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
+{
+	return to_shadow_page(__pa(sptep));
+}
+
 static inline bool is_mmio_spte(u64 spte)
 {
 	return (spte & shadow_mmio_mask) == shadow_mmio_value &&
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index c163f7cc23ca..d3714200b932 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -5,6 +5,8 @@
 
 #include <linux/kvm_host.h>
 
+#include "spte.h"
+
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)

From patchwork Sat Apr  9 00:38:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson <seanjc@google.com>
X-Patchwork-Id: 12807306
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Sat,  9 Apr 2022 00:38:47 +0000
In-Reply-To: <20220409003847.819686-1-seanjc@google.com>
Message-Id: <20220409003847.819686-7-seanjc@google.com>
Mime-Version: 1.0
References: <20220409003847.819686-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog
Subject: [PATCH 6/6] KVM: x86/mmu: explicitly check nx_hugepage in
 disallowed_hugepage_adjust()
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Mingwei Zhang <mizhang@google.com>
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

From: Mingwei Zhang <mizhang@google.com>

Explicitly check if an NX huge page is disallowed when determining if a
page fault needs to be forced to use a smaller page size. KVM incorrectly
assumes that the NX huge page mitigation is the only scenario where KVM
will create a shadow page instead of a huge page. Any scenario that causes
KVM to zap leaf SPTEs may result in a shadow page (SP) that can be made
huge without violating the NX huge page mitigation, e.g. disabling dirty
logging, zapping from mmu_notifier due to page migration, guest MTRR
changes that affect the viability of a huge page, etc.
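
The adjustment can be modeled in userspace as below. The sketch is
hypothetical: it passes the existing SPTE's state as booleans instead of
decoding a real SPTE, assumes TOY_PAGES_PER_HPAGE() mirrors
KVM_PAGES_PER_HPAGE() with 9 bits per level, assumes the part of the
condition not visible in this hunk is a "level above 4 KiB" guard, and is
single-threaded, so it omits the smp_rmb()/smp_wmb() pairing the patch adds
for the case where mmu_lock is held for read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of KVM_PAGES_PER_HPAGE(): level 1 = 4 KiB, 512x per level. */
#define TOY_PAGES_PER_HPAGE(level) (1ULL << (((level) - 1) * 9))

struct toy_fault {
	uint64_t gfn;
	uint64_t pfn;
	int goal_level;
};

/*
 * @present/@large describe the existing SPTE; @nx_disallowed stands in for
 * spte_to_sp(spte)->nx_huge_page_disallowed.
 */
static void toy_hugepage_adjust(struct toy_fault *fault, int cur_level,
				bool present, bool large, bool nx_disallowed)
{
	uint64_t page_mask;

	assert(cur_level >= 1);

	if (cur_level <= 1 || cur_level != fault->goal_level ||
	    !present || large)
		return;

	/* The new check: a child SP not created for NX is no reason to shrink. */
	if (!nx_disallowed)
		return;

	/* Patch the gfn bits between cur_level and cur_level - 1 into the pfn. */
	page_mask = TOY_PAGES_PER_HPAGE(cur_level) -
		    TOY_PAGES_PER_HPAGE(cur_level - 1);
	fault->pfn |= fault->gfn & page_mask;
	fault->goal_level--;
}
```

For a 2 MiB fault (level 2) the mask is 512 - 1 = 0x1ff, so the low nine
gfn bits are patched into the pfn and the mapping drops to 4 KiB; at level 3
the mask is 0x3fe00. With the new check, a child page whose
nx_huge_page_disallowed is false leaves the fault at its original level.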

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: add barrier comments, use spte_to_sp()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 17 +++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c |  6 ++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8b4f3550710a..c6f018c6d2f5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2908,6 +2908,19 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte)) {
+		u64 page_mask;
+
+		/*
+		 * Ensure nx_huge_page_disallowed is read after checking for a
+		 * present shadow page.  A different vCPU may be concurrently
+		 * installing the shadow page if mmu_lock is held for read.
+		 * Pairs with the smp_wmb() in kvm_tdp_mmu_map().
+		 */
+		smp_rmb();
+
+		if (!spte_to_sp(spte)->nx_huge_page_disallowed)
+			return;
+
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch)
 		 * and __direct_map would like to create a large PTE
@@ -2915,8 +2928,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 		 * patching back for them into pfn the next 9 bits of
 		 * the address.
 		 */
-		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-				KVM_PAGES_PER_HPAGE(cur_level - 1);
+		page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+			    KVM_PAGES_PER_HPAGE(cur_level - 1);
 		fault->pfn |= fault->gfn & page_mask;
 		fault->goal_level--;
 	}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d0e6b341652c..5cae5cdcfcbc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1185,6 +1185,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			tdp_mmu_init_child_sp(sp, &iter);
 
 			sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
+			/*
+			 * Ensure nx_huge_page_disallowed is visible before the
+			 * SP is marked present, as mmu_lock is held for read.
+			 * Pairs with the smp_rmb() in disallowed_hugepage_adjust().
+			 */
+			smp_wmb();
 
 			if (tdp_mmu_link_sp(kvm, &iter, sp, true)) {
 				tdp_mmu_free_sp(sp);